GDPR compliance for robotics video datasets.

Compliance is not a checkbox at EgoVista; it is the architecture. Every dataset we deliver was collected, processed, and stored under EU law from day one, and the audit trail ships with the data.

Why GDPR matters for robotics datasets.

In 2026 the regulatory pressure on AI training data is no longer theoretical. The European AI Act is in application, GDPR enforcement has matured, and compliance audits of robotics products and software systems are now a standard step in enterprise procurement. For an ML team, using a dataset that was collected or processed in a non-compliant way creates a chain of risk: dataset recall, model retraining, deployment blockage, and in the worst case, fines that scale with company revenue.

The risk is not abstract. A robotics startup that trains a manipulation policy on a non-compliant dataset may have to retire that policy before launch, because the dataset cannot be re-licensed under acceptable conditions or because a contributor's withdrawal request cannot be honoured in the trained model. EgoVista was designed so the dataset side never becomes that bottleneck.

Face pseudonymisation before any external processing.

The cornerstone of the EgoVista pipeline is a simple invariant: no frame containing an identifiable third-party face ever crosses an external boundary. Facial blurring runs locally, before the frame reaches any cloud service, external model, or GPU inference endpoint. This blurring is a pseudonymisation measure (a security and data-minimisation step), not anonymisation: the processed video remains personal data, because identifiers other than the face may persist.

Pseudonymisation runs before any external API call in the EgoVista pipeline.

The technical implementation in plain terms:

Multi-person face detection with a face-detection model running locally. The detector is tuned with a conservative confidence threshold so missed detections are minimised, at the cost of occasional false positives that are then also blurred.
Gaussian blur applied to each detected face region with a kernel sized to defeat off-the-shelf face recognition models on the resulting image. The parameters are calibrated to remain effective against current recognition systems, with a margin for the underlying detector to be updated.
Verification pass after the blur: a second model attempts face detection on the pseudonymised frame. Any residual detection triggers a re-blur with a stronger filter before the frame leaves the local zone.
The non-pseudonymised version of the video is never stored outside the contributor environment. The annotation pipeline only ever sees the pseudonymised derived version.
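
The detect, blur, verify, and re-blur sequence described above can be sketched as a small control loop. This is an illustrative sketch only, not the production implementation: the function names, the `strengths` escalation schedule, and the choice to fail closed with `ResidualFaceError` when a face is still detectable are all assumptions. The detector, the verifier, and the blur are passed in as callables, so the sketch says nothing about which models or filter parameters are actually used.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]                 # (x, y, width, height); hypothetical layout
Detector = Callable[[Any], List[Box]]           # frame -> detected face regions
Blur = Callable[[Any, List[Box], float], Any]   # (frame, regions, strength) -> new frame


class ResidualFaceError(RuntimeError):
    """A face was still detectable after every blur strength was tried."""


def pseudonymise_frame(
    frame: Any,
    detect: Detector,
    verify: Detector,
    blur: Blur,
    strengths: Sequence[float] = (1.0, 2.0, 4.0),  # assumed escalation schedule
) -> Any:
    """Blur detected faces, then re-blur with stronger filters until the
    verification detector finds nothing. Raises instead of returning a frame
    that still contains a detectable face (assumed fail-closed behaviour)."""
    if not strengths:
        raise ValueError("at least one blur strength is required")

    boxes = detect(frame)  # first pass: primary local detector
    current = frame
    for strength in strengths:
        if boxes:
            current = blur(current, boxes, strength)
        boxes = verify(current)  # second model checks the blurred frame
        if not boxes:
            return current       # nothing detectable: safe to leave the local zone
    raise ResidualFaceError(f"{len(boxes)} face region(s) still detectable")
```

In a real deployment, `detect` and `verify` would wrap two different locally running face detectors and `blur` would wrap the Gaussian filter; only the frame returned by `pseudonymise_frame` would be handed to anything outside the local zone.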

European-first infrastructure.

Every component that touches personal data runs in the European Union. The choice is not branding; it is the simplest way to satisfy GDPR transfer rules and the data sovereignty expectations of enterprise clients in the EU. The full infrastructure stack:

Storage: object storage in the EU region. All raw, pseudonymised, and derived data sits in EU buckets with restricted access.
Compute: EU-region GPU compute for the GPU-bound annotation layers (segmentation, hand pose, depth).
Action labelling: EU-region infrastructure for the natural-language action descriptions, with a regional commitment that prevents cross-region failover.
Database and authentication: an EU-region database for the metadata catalogue, contributor accounts, and capture session metadata.
Domain, email, and operational tooling: EU-region providers, with US sub-processors limited to email delivery, covered by the EU-U.S. Data Privacy Framework with Standard Contractual Clauses as a fallback.

The full, dated list of sub-processors, with the functional category and the transfer mechanism for each, is published at egovista.app/legal/sub-processors and the data flow is documented in the privacy policy, so a compliance officer can audit the chain without having to ask.
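
For a compliance officer who prefers to check such a list mechanically, the residency rule above can be expressed as a short validation over a sub-processor manifest. The sketch below is a hypothetical illustration: the `Component` fields, the `"EU"`/`"US"` location labels, and the example entries are assumptions, not the format of the published sub-processor list. The only rule it encodes is the one stated in the text: personal data stays in the EU, except for email delivery, which must have a recorded transfer mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str                                 # hypothetical label for the entry
    category: str                             # e.g. "storage", "compute", "email delivery"
    location: str                             # "EU" or "US" in this sketch
    handles_personal_data: bool
    transfer_mechanism: Optional[str] = None  # e.g. "DPF + SCC fallback"


# Per the text, email delivery is the only category allowed outside the EU.
NON_EU_ALLOWED_CATEGORIES = {"email delivery"}


def residency_violations(components: List[Component]) -> List[str]:
    """Return a human-readable problem for every entry that breaks the rule."""
    problems = []
    for c in components:
        if not c.handles_personal_data or c.location == "EU":
            continue  # nothing to check for this entry
        if c.category not in NON_EU_ALLOWED_CATEGORIES:
            problems.append(f"{c.name}: {c.category} must run in the EU")
        elif not c.transfer_mechanism:
            problems.append(f"{c.name}: no transfer mechanism recorded")
    return problems


manifest = [
    Component("object-store", "storage", "EU", True),
    Component("gpu-pool", "compute", "EU", True),
    Component("mailer", "email delivery", "US", True, "DPF + SCC fallback"),
]
print(residency_violations(manifest))  # an empty list means the rule holds
```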

Legal basis for each processing operation.

GDPR requires that every processing operation has a clear legal basis. The table below summarises the basis we rely on per step, with the GDPR article reference. The full version, including retention periods and sub-processor names, sits in the privacy policy.

Processing	Legal basis	Justification
Contributor video capture	Explicit consent (Art. 6(1)(a))	Each contributor signs a mission-specific consent form before recording. Consent is recorded, dated, and revocable.
Face pseudonymisation	Data minimisation (Art. 5(1)(c)); legitimate interest in protecting third parties (Art. 6(1)(f)); data protection by design (Art. 25)	Facial blurring is applied as a security and data-minimisation measure before any subsequent processing; the processed video remains personal data.
Hand pose, depth, segmentation	Legitimate interest (Art. 6(1)(f))	The data is already pseudonymised at this point. The processing is necessary to produce a usable dataset, the impact on data subjects is minimal, and the balancing test documents this. An effective right to object (Art. 21) is available.
Action labelling	Legitimate interest (Art. 6(1)(f))	Processing runs on EU-region infrastructure, only on pseudonymised frames. The balancing test and the sub-processor agreement are documented.
Delivery to client	Contract (Art. 6(1)(b))	The delivery is the core of the contractual relationship between EgoVista and the client. The dataset is produced and shipped for that purpose.
Post-delivery retention	Legitimate interest (Art. 6(1)(f))	Retention windows are short, documented, and serve a defined purpose: re-packing into another format, quality re-review on dispute, contractual support.

A DPIA (Data Protection Impact Assessment) covering the full pipeline is available on request for enterprise clients under NDA.

Contributor rights and data subject access.

Contributors keep the standard set of GDPR data subject rights. The way each right is honoured on the EgoVista side:

Right of access: a contributor can request a copy of the data captured during a mission, along with the processing log, within the legal one-month window.
Right to withdraw consent: consent can be withdrawn at any time, in writing. Data that has not yet been delivered is deleted within thirty days. Data that has been delivered is covered by the contractual chain described to the contributor at signup.
Right to erasure: EgoVista erases the data from its own systems (production storage, backups, annotation pipeline) within thirty days and notifies the client of the request, in discharge of its notification obligation under Article 19 GDPR. EgoVista cannot guarantee erasure on a third party's systems, and Article 17(3) exceptions may apply on the client side.
Right to portability: a contributor can request their data in a structured, machine-readable format.
Complaint procedure: contact details for the data controller and for the relevant supervisory authority (CNIL in France) are provided in the privacy policy, in plain language.
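
The response windows above translate into simple date arithmetic. The sketch below is illustrative only: the function names and the request-type labels are hypothetical, and "one month" is read as the same calendar day in the following month, clamped to the month's last day. That reading is an assumption; it ignores extensions and any rule about deadlines that fall on non-working days.

```python
import calendar
from datetime import date, timedelta


def add_one_month(start: date) -> date:
    """Same day of the following month, clamped to that month's last day."""
    year = start.year + start.month // 12
    month = start.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_deadline(request_type: str, received: date) -> date:
    """Latest date by which a contributor request should be handled."""
    if request_type == "access":
        return add_one_month(received)           # one-month access window
    if request_type in ("withdrawal", "erasure"):
        return received + timedelta(days=30)     # thirty-day deletion window
    raise ValueError(f"no window defined in this sketch for {request_type!r}")


print(response_deadline("access", date(2026, 1, 31)))   # 2026-02-28
print(response_deadline("erasure", date(2026, 3, 1)))   # 2026-03-31
```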

AI Act and high-risk system data requirements.

Under the EU AI Act, robotics products that perform safety-critical functions can fall under the high-risk system category. Such systems require structured data governance, transparency about training data, and risk management. EgoVista contributes to that governance by shipping each dataset with:

Provenance metadata: which contributor recorded which episode, when, where, on which equipment.
Annotation provenance: which model version produced each annotation layer, when, on which compute zone.
Per-dataset QA report: schema validation, completeness metrics, known limitations.
Dataset card: a Hugging Face-style document covering intended use, a characterisation of the data distribution, known biases, and recommended evaluation practice.

The dataset card is intended to slot into your AI Act compliance documentation without rework. We do not certify the downstream system; that remains your team's responsibility. We do make sure the dataset side does not become the missing piece.
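
To make the provenance items listed above concrete, here is one way such a record could be modelled and serialised. It is a sketch under stated assumptions: the class names, field names, and example values are hypothetical and do not describe the schema EgoVista actually ships. The fields simply mirror the list: who recorded which episode, when, where, and on which equipment, plus which model version produced each annotation layer, when, and in which compute zone.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class AnnotationProvenance:
    layer: str           # e.g. "segmentation", "hand_pose", "depth"
    model_version: str   # version of the model that produced the layer
    produced_at: str     # ISO 8601 timestamp
    compute_zone: str    # hypothetical zone label


@dataclass(frozen=True)
class EpisodeProvenance:
    episode_id: str
    contributor_id: str  # a pseudonymous identifier is assumed here
    recorded_at: str     # ISO 8601 timestamp
    location: str
    equipment: str
    annotations: List[AnnotationProvenance] = field(default_factory=list)


def to_json(record: EpisodeProvenance) -> str:
    """Serialise a record with stable key order so it diffs cleanly in audits."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


example = EpisodeProvenance(
    episode_id="ep-0001",
    contributor_id="contrib-42",
    recorded_at="2026-01-15T09:30:00Z",
    location="FR",
    equipment="head-mounted camera",
    annotations=[
        AnnotationProvenance("segmentation", "v1.2.0", "2026-01-16T11:00:00Z", "eu-zone-a"),
    ],
)
print(to_json(example))
```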

How EgoVista handles client confidentiality.

On the client side, the same posture applies. Datasets are produced for the commissioning client and are not reused for another client, with the contractual exclusivity terms agreed at engagement. Raw footage used to build a delivered dataset is purged ninety days after delivery, unless an extended retention is part of the engagement. NDAs are available before any technical conversation, and most enterprise engagements start with a mutual NDA. The storage and processing cost during the project window is absorbed in the delivery fee, with no per-gigabyte surprise on the invoice.

GDPR and compliance frequently asked questions.

Is your dataset legal to use in EU production deployments?

Yes, under standard conditions. Every processing step in our pipeline has a documented legal basis under GDPR, and the data is collected with informed contributor consent. The dataset card we ship documents what was collected, on which legal basis, with which retention. For deployment in a high-risk AI system under the EU AI Act, your team is responsible for the broader governance (risk management, transparency, post-market monitoring), but the dataset side is built to slot into that governance without rework.

Can you provide a DPIA for our compliance review?

Yes. A Data Protection Impact Assessment covering the EgoVista capture and annotation pipeline is available on request for enterprise clients, under a mutual NDA. The DPIA describes the data flows, the legal bases per processing operation, the risks identified and the mitigations applied, including the local pseudonymisation step and the EU-only compute path. Your DPO or compliance team can use the document as a starting point for your own DPIA.

What happens if a contributor withdraws consent after delivery?

Contributors can withdraw consent at any time. For data still in our pipeline that has not been delivered, withdrawal triggers deletion within thirty days. For data already delivered to a client, EgoVista erases it from its own systems within thirty days and notifies the client of the request, in discharge of its notification obligation under Article 19 GDPR, so the client can give effect to it under its own GDPR procedure. EgoVista cannot guarantee an outcome that depends on the action of a third party, and Article 17(3) exceptions may apply on the client side. The contributor agreement documents both paths in plain language so no party is surprised.

Are facial features completely removed or just blurred?

Faces are blurred with a conservative Gaussian filter that prevents identification while keeping the body context intact for hand and object segmentation. The blur radius is calibrated to defeat off-the-shelf face recognition models on the pseudonymised frame, and we verify the result before any external API call. This facial blurring is a pseudonymisation measure (a security and data-minimisation step), not anonymisation: the processed video remains personal data because other identifiers may persist. For projects with stronger pseudonymisation needs, we can apply a stronger filter or a mask-and-fill technique, with documented impact on downstream annotation quality.

Do you process any data outside the EU?

No, with one documented exception. Every step of the capture and annotation pipeline that handles personal data runs in the EU: object storage in the EU region, segmentation inference on EU-region GPU compute, action labelling on EU-region infrastructure, and an EU-region database. The exception is email notifications, which are sent via a transactional email provider: the email address is the only piece of personal data exposed to a US sub-processor, and that transfer is covered by the EU-U.S. Data Privacy Framework with Standard Contractual Clauses as a fallback. The full, dated list of sub-processors is published at egovista.app/legal/sub-processors and the data flow is documented in the privacy policy.

Can you sign a DPA (Data Processing Agreement)?

Yes. We provide a standard DPA aligned with Article 28 GDPR, covering the scope of processing, the sub-processors involved, the location of processing, the security measures applied, and the procedure for handling data subject requests. The DPA is signed before any client data crosses into our pipeline. Custom amendments to the standard DPA are accepted on request, within the limits of our compliance posture.

How long is raw footage retained before deletion?

Raw footage that was used for a delivered dataset is purged ninety days after delivery, unless the engagement explicitly requires longer retention for re-export or quality re-review. Pseudonymised frames are retained alongside the dataset for thirty days post-delivery to allow re-packing into a different format. Logs and metadata used to reconstruct the annotation provenance are retained longer, but they do not contain identifiable content.
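
The two retention windows above can be checked with a few lines of date arithmetic. This is a sketch, not a description of the actual purge job: the dictionary keys, the function names, and the `extension_days` parameter (standing in for a contractually agreed longer retention) are assumptions.

```python
from datetime import date, timedelta

# Windows stated in the text, counted in days after delivery.
RETENTION_DAYS = {
    "raw_footage": 90,
    "pseudonymised_frames": 30,
}


def purge_date(kind: str, delivered_on: date, extension_days: int = 0) -> date:
    """First day on which data of this kind should no longer be retained."""
    if extension_days < 0:
        raise ValueError("extension_days cannot be negative")
    return delivered_on + timedelta(days=RETENTION_DAYS[kind] + extension_days)


def due_for_purge(kind: str, delivered_on: date, today: date,
                  extension_days: int = 0) -> bool:
    """True once the retention window for this kind of data has elapsed."""
    return today >= purge_date(kind, delivered_on, extension_days)


delivered = date(2026, 1, 1)
print(purge_date("raw_footage", delivered))           # 2026-04-01
print(purge_date("pseudonymised_frames", delivered))  # 2026-01-31
```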

Request a compliance brief.

Your compliance team can review the EgoVista pipeline before any data is exchanged. We can send a DPA template, a high-level architecture description, and a DPIA summary under NDA. For related material, see the product overview, the LeRobot format details, and the RLDS format details.