GDPR compliance for robotics video datasets.
Compliance is not a checkbox at EgoVista; it is the architecture. Every dataset we deliver was collected, processed, and stored under EU law from day one, and the audit trail ships with the data.
Why GDPR matters for robotics datasets.
In 2026 the regulatory pressure on AI training data is no longer theoretical. The European AI Act is in application, GDPR enforcement has matured, and compliance audits of robotics products and software systems are now a standard step in enterprise procurement. For an ML team, using a dataset that was collected or processed in a non-compliant way creates a chain of risk: dataset recall, model retraining, deployment blockage, and in the worst case, fines that scale with company revenue.
The risk is not abstract. A robotics startup that ships a manipulation policy trained on a non-compliant dataset may have to retire that policy before launch, either because the dataset cannot be re-licensed under acceptable conditions or because a contributor withdrawal request cannot be honoured in the trained model. EgoVista was designed so the dataset side never becomes that bottleneck.
Face anonymization before any external processing.
The cornerstone of the EgoVista pipeline is a simple invariant: no frame containing an identifiable third-party face ever crosses an external boundary. Anonymization runs locally, before the frame reaches any cloud service, any LLM, or any GPU inference endpoint.
The technical implementation in plain terms:
- Multi-person face detection with MediaPipe Face Detection. The detector is tuned with a conservative confidence threshold so missed detections are minimised, at the cost of occasional false positives that are then also blurred.
- Gaussian blur applied to each detected face region, with a kernel sized to defeat off-the-shelf face recognition models on the resulting image. The parameters are calibrated to remain effective against current recognition systems, with a safety margin in case the underlying detector is updated.
- Verification pass after the blur: a second model attempts face detection on the anonymized frame. Any residual detection triggers a re-blur with a stronger filter before the frame leaves the local zone.
- The non-anonymized version of the video is never stored outside the contributor environment. The annotation pipeline only ever sees the anonymized derived version.
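The blur, verify, and re-blur steps above can be read as a small control loop. The sketch below is one possible shape for it, not the actual EgoVista implementation: the `detect`, `verify`, and `blur` callables, the `anonymize_frame` name, and the kernel sizes are all hypothetical placeholders, and the choice to fail closed when verification keeps failing is an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a face region
Frame = Any  # stand-in for whatever image type the pipeline uses


def anonymize_frame(
    frame: Frame,
    detect: Callable[[Frame], Sequence[Box]],
    verify: Callable[[Frame], Sequence[Box]],
    blur: Callable[[Frame, Sequence[Box], int], Frame],
    kernel_sizes: Sequence[int] = (31, 51, 99),  # illustrative values only
) -> Frame:
    """Blur detected faces, then re-blur with stronger kernels until the
    verifier finds nothing. Raises if every kernel size has been tried."""
    regions: List[Box] = list(detect(frame))
    current = frame
    for kernel in kernel_sizes:
        current = blur(current, regions, kernel)
        residual = list(verify(current))
        if not residual:
            return current  # nothing detectable: safe to leave the local zone
        # Include the residual detections in the next, stronger pass.
        regions.extend(box for box in residual if box not in regions)
    # Fail closed (an assumption): the frame is held back, never exported.
    raise RuntimeError("face still detectable after strongest blur")
```

Passing the detector, verifier, and blur in as callables keeps the loop independent of any particular library, so the primary detector and the verification model can be swapped without touching the invariant.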
European-first infrastructure.
Every component that touches personal data runs in the European Union. The choice is not branding; it is the simplest way to satisfy GDPR transfer rules and the data sovereignty expectations of enterprise clients in the EU. The full infrastructure stack:
- Storage: Cloudflare R2, EU region (Frankfurt). All raw, anonymized, and derived data sits in EU buckets with restricted access.
- Compute: RunPod data centres in Amsterdam and Frankfurt for the GPU-bound annotation layers (EgoHOS segmentation, hand pose, depth).
- LLM action labelling: Vertex AI in europe-west4 (Netherlands) for the natural language action descriptions, with the Google Cloud regional commitment that prevents cross-region failover.
- Database and authentication: Supabase EU for the metadata catalogue, contributor accounts, and capture session metadata.
- Domain, email, and operational tooling: EU-headquartered providers, with US sub-processors limited to email delivery and analytics, covered by Standard Contractual Clauses.
The full transfer mechanism for each sub-processor is documented in the privacy policy, so a compliance officer can audit the chain without having to ask.
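A compliance reviewer could express the rule behind this stack as a simple check: every processing location is either in the EU or covered by a documented transfer mechanism. The sketch below illustrates that idea under stated assumptions. The `Component` class, the `transfer_gaps` function, and the component names are hypothetical, and only locations stated in the text are used (Supabase is left out because its country is not given).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


@dataclass(frozen=True)
class Component:
    name: str
    countries: Tuple[str, ...]                # where processing happens
    transfer_mechanism: Optional[str] = None  # e.g. "SCC" for non-EU processing


def transfer_gaps(components: Sequence[Component]) -> List[str]:
    """List components that process data outside the EU with no mechanism."""
    gaps = []
    for comp in components:
        outside = [c for c in comp.countries if c not in EU_COUNTRIES]
        if outside and not comp.transfer_mechanism:
            gaps.append(f"{comp.name}: {', '.join(outside)} lacks a transfer mechanism")
    return gaps


# Locations taken from the list above; names are illustrative labels.
stack = [
    Component("storage", ("DE",)),
    Component("gpu_compute", ("NL", "DE")),
    Component("llm_labelling", ("NL",)),
    Component("email_delivery", ("US",), transfer_mechanism="SCC"),
]
```

Calling `transfer_gaps(stack)` on this list returns an empty list. Adding a non-EU component without a `transfer_mechanism` would surface it as a gap.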
Legal basis for each processing operation.
GDPR requires that every processing operation has a clear legal basis. The table below summarises the basis we rely on per step, with the GDPR article reference. The full version, including retention periods and sub-processor names, sits in the privacy policy.
| Processing | Legal basis | Justification |
|---|---|---|
| Contributor video capture | Explicit consent (Art. 6.1.a) | Each contributor signs a mission-specific consent form before recording. Consent is recorded, dated, and revocable. |
| Face anonymization | Legal obligation (Art. 6.1.c) and data protection by design (Art. 25) | Anonymization is mandated to make subsequent processing safe and to honour data protection by design. |
| Hand pose, depth, segmentation | Legitimate interest (Art. 6.1.f) | The data is already anonymized at this point. The processing is necessary to produce a usable dataset, the impact on data subjects is minimal, and the balancing test documents this. |
| Action labelling on Vertex AI | Legitimate interest (Art. 6.1.f) | Processing runs on EU infrastructure (europe-west4), only on anonymized frames. The balancing test and the sub-processor agreement are documented. |
| Delivery to client | Contract (Art. 6.1.b) | The delivery is the core of the contractual relationship between EgoVista and the client. The dataset is produced and shipped for that purpose. |
| Post-delivery retention | Legitimate interest (Art. 6.1.f) | Retention windows are short, documented, and serve a defined purpose: re-packing into another format, quality re-review on dispute, contractual support. |
A DPIA (Data Protection Impact Assessment) covering the full pipeline is available on request for enterprise clients under NDA.
Contributor rights and data subject access.
Contributors keep the standard set of GDPR data subject rights. The way each right is honoured on the EgoVista side:
- Right of access: a contributor can request a copy of the data captured during a mission, along with the processing log, within the legal one-month window.
- Right to withdraw consent: consent can be withdrawn at any time, in writing. Data that has not yet been delivered is deleted within thirty days. Data that has been delivered is covered by the contractual chain described to the contributor at signup.
- Right to erasure: within the limits set by the contractual chain, erasure requests are processed and propagated to the client where applicable.
- Right to portability: a contributor can request their data in a structured, machine-readable format.
- Complaint procedure: contact details for the data controller and for the relevant supervisory authority (CNIL in France) are provided in the privacy policy, in plain language.
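The two withdrawal paths described above (data still in the pipeline versus data already delivered) can be sketched as a small routing function. This is an illustration only: the `WithdrawalPlan` type, the `plan_withdrawal` function, and the action labels are hypothetical names, and the delivered path carries no deadline because none is stated here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

DELETION_WINDOW = timedelta(days=30)  # "deleted within thirty days"


@dataclass(frozen=True)
class WithdrawalPlan:
    action: str           # "delete_locally" or "forward_to_client"
    due: Optional[date]   # None where no deadline is stated


def plan_withdrawal(received: date, delivered: bool) -> WithdrawalPlan:
    """Route a consent withdrawal based on whether the data was delivered."""
    if not delivered:
        # Undelivered data: delete within the thirty-day window.
        return WithdrawalPlan("delete_locally", received + DELETION_WINDOW)
    # Delivered data: handled through the contractual chain with the client.
    return WithdrawalPlan("forward_to_client", None)
```

For a withdrawal received on 1 March 2026 for undelivered data, `plan_withdrawal(date(2026, 3, 1), delivered=False)` yields a deletion due date of 31 March 2026.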
AI Act and high-risk system data requirements.
Under the EU AI Act, robotics products that perform safety-critical functions can fall under the high-risk system category. Such systems require structured data governance, transparency about training data, and risk management. EgoVista contributes to that governance by shipping each dataset with:
- Provenance metadata: which contributor recorded which episode, when, where, on which equipment.
- Annotation provenance: which model version produced each annotation layer, when, on which compute zone.
- Per-dataset QA report: schema validation, completeness metrics, known limitations.
- Dataset card: a Hugging Face-style document with intended use, distribution characteristics, known biases, and recommended evaluation practice.
The dataset card is intended to slot into your AI Act compliance documentation without rework. We do not certify the downstream system (that is your team's responsibility), but we make sure the dataset side does not become the missing piece.
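To make the provenance fields concrete, the sketch below shows one way the per-episode and per-annotation metadata could be structured and serialised. The class names, field names, and sample values are assumptions for illustration; the actual schema shipped with a dataset may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AnnotationProvenance:
    layer: str          # e.g. "hand_pose", "depth", "segmentation"
    model_version: str  # which model version produced the layer
    produced_at: str    # ISO 8601 timestamp
    compute_zone: str   # where the inference ran


@dataclass
class EpisodeProvenance:
    episode_id: str
    contributor_id: str  # identifier format is an assumption
    recorded_at: str     # ISO 8601 timestamp
    location: str
    equipment: str
    annotations: List[AnnotationProvenance] = field(default_factory=list)


def to_json(record: EpisodeProvenance) -> str:
    """Serialise a provenance record, nested annotations included."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values, not real data.
example = EpisodeProvenance(
    episode_id="episode-0001",
    contributor_id="contributor-0042",
    recorded_at="2026-01-15T09:30:00Z",
    location="FR",
    equipment="head-mounted camera",
    annotations=[
        AnnotationProvenance("action_labels", "v0", "2026-01-16T12:00:00Z", "europe-west4"),
    ],
)
```

Keeping annotation provenance nested under the episode means a single record answers both "who recorded this" and "which model labelled it", which is the pairing an auditor usually asks for.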
How EgoVista handles client confidentiality.
On the client side, the same posture applies. Datasets are produced for the commissioning client and are not reused for another client, under the contractual exclusivity terms agreed at engagement. Raw footage used to build a delivered dataset is purged ninety days after delivery, unless extended retention is part of the engagement. NDAs are available before any technical conversation, and most enterprise engagements start with a mutual NDA. The storage and processing cost during the project window is absorbed in the delivery fee, with no per-gigabyte surprise on the invoice.
GDPR and compliance frequently asked questions.
Is your dataset legal to use in EU production deployments?
Yes, under standard conditions. Every processing step in our pipeline has a documented legal basis under GDPR, and the data is collected with informed contributor consent. The dataset card we ship documents what was collected, on which legal basis, and with which retention period. For deployment in a high-risk AI system under the EU AI Act, your team is responsible for the broader governance (risk management, transparency, post-market monitoring), but the dataset side is built to slot into that governance without rework.
Can you provide a DPIA for our compliance review?
Yes. A Data Protection Impact Assessment covering the EgoVista capture and annotation pipeline is available on request for enterprise clients, under a mutual NDA. The DPIA describes the data flows, the legal bases per processing operation, the risks identified and the mitigations applied, including the local anonymization step and the EU-only compute path. Your DPO or compliance team can use the document as a starting point for your own DPIA.
What happens if a contributor withdraws consent after delivery?
Contributors can withdraw consent at any time. For data still in our pipeline that has not been delivered, withdrawal triggers deletion within thirty days. For data already delivered to a client and contractually owned by that client, the contractual chain explains what the client can and cannot do, and we facilitate a deletion request to the client when withdrawal applies. The contributor agreement documents both paths in plain language so no party is surprised.
Are facial features completely removed or just blurred?
Faces are blurred with a conservative Gaussian filter that prevents identification while keeping the body context intact for hand and object segmentation. The blur radius is calibrated to defeat off-the-shelf face recognition models on the anonymized frame, and we verify the result before any external API call. For projects with stronger anonymization needs, we can apply a stronger filter or a mask-and-fill technique, with documented impact on downstream annotation quality.
Do you process any data outside the EU?
No. Every step that handles personal data runs in the EU: storage on Cloudflare R2 in the EU region, segmentation inference on RunPod data centres in Amsterdam and Frankfurt, action labelling on Vertex AI in europe-west4 (Netherlands), database on Supabase EU. Email notifications are sent via Resend, with the email address being the only piece of personal data exposed to a US sub-processor, covered by Standard Contractual Clauses. The full data flow is documented in the privacy policy.
Can you sign a DPA (Data Processing Agreement)?
Yes. We provide a standard DPA aligned with GDPR article 28, covering the scope of processing, the sub-processors involved, the location of processing, the security measures applied, and the procedure for handling data subject requests. The DPA is signed before any client data crosses into our pipeline. Custom amendments to the standard DPA are accepted on request, within the limits of our compliance posture.
How long is raw footage retained before deletion?
Raw footage that was used for a delivered dataset is purged ninety days after delivery, unless the engagement explicitly requires longer retention for re-export or quality re-review. Anonymized frames are retained alongside the dataset for thirty days post-delivery to allow re-packing into a different format. Logs and metadata used to reconstruct the annotation provenance are retained longer, but they do not contain identifiable content.
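The retention windows in this answer reduce to simple date arithmetic. The sketch below computes the purge dates from a delivery date. The `purge_schedule` function and its `extended_raw_days` parameter are hypothetical names, and the override only models the "longer retention" case in the loosest sense.

```python
from datetime import date, timedelta
from typing import Dict, Optional

RAW_RETENTION_DAYS = 90         # raw footage, counted from delivery
ANONYMIZED_RETENTION_DAYS = 30  # anonymized frames, counted from delivery


def purge_schedule(
    delivered_on: date,
    extended_raw_days: Optional[int] = None,  # set when the engagement requires longer retention
) -> Dict[str, date]:
    """Return the purge date for each data class after a delivery."""
    raw_days = RAW_RETENTION_DAYS if extended_raw_days is None else extended_raw_days
    return {
        "raw_footage": delivered_on + timedelta(days=raw_days),
        "anonymized_frames": delivered_on + timedelta(days=ANONYMIZED_RETENTION_DAYS),
    }
```

For a delivery on 1 January 2026, this gives a raw footage purge date of 1 April 2026 and an anonymized frame purge date of 31 January 2026.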
Request a compliance brief.
Your compliance team can review the EgoVista pipeline before any data is exchanged. We can send a DPA template, a high-level architecture description, and a DPIA summary under NDA. For related material, see the product overview, the LeRobot format details, and the RLDS format details.