ava's blog

AI clones and data protection

A few days ago, news spread through the web about a Meta project that would let an AI run the social media account of a deceased person, emulating their activity by posting content and responding to messages. The goal was to maintain engagement on the platform and soften the grief when a person passes away. If you believe a screenshot going around, a poster on 4chan revealed this years prior, saying it had the internal name "Project Lazarus", referencing the Lazarus of Bethany.

While Meta spokespeople said they had no plans to pursue this (yet?), there are other services like ELIXIR AI that want to push digital immortality via an "eternal doppelganger from a customer's lifetime data".

In general, we are already dealing with a deluge of deepfakes online. Not only are people using AI to remove the clothes from images of people, but they are also creating new images, video, and audio material with a person's physical and vocal likeness, trained on anything from just a handful of photos up to terabytes of video material if the target is a popular and active YouTuber.

This also happens in the education and entertainment industries. Notable figures have digital copies in museums and other places to be interacted with, and deceased actors get "revived" to make appearances or to lend their voice to a character. Researchers talk about this as "spectral labour" in a "postmortal society", meaning the "exploitation of digital remains for aesthetically pleasing, politically charged, and communicative representations". The companies that provide these resurrection services are referred to as the "transcendence industry".

The tech and its availability are changing fast, and as with any developing field, it can be hard to apply existing legal frameworks that didn't have this specific use case in mind. While I have to leave the issues around general ethics and monetization for another day, I'd like to focus on (European) data protection and privacy laws!

☁️☁️☁️

First up, good to know: Can your body and voice be personal data? Yes! They make you identifiable.

You can also see this in Article 9 GDPR, which prohibits processing data related to racial or ethnic origin, genetic data, biometric data for the purpose of uniquely identifying a natural person, and data concerning health, unless the processing falls under very specific allowed purposes. Your body carries this type of information. Additionally, the European Data Protection Board has issued guidelines suggesting that voice data is inherently biometric.

That means making a model of you via a series of photos from different angles, motion capture, voice recordings etc. is processing personal data, some of it sensitive data under Article 9 GDPR. This is then further processed during AI training and finetuning to reproduce a person's physical or vocal likeness reliably.

Recital 51 of the GDPR mentions:

"The processing of photographs should not systematically be considered to be processing of special categories of personal data as they are covered by the definition of biometric data only when processed through a specific technical means allowing the unique identification or authentication of a natural person."

So, simply taking or editing some pictures is not considered processing of special (sensitive) personal data, as that would reach too far; it takes specific technical means that take measurements and turn the image into biometric data, like when you set up Face ID to unlock your phone, or when you get an eye or fingerprint scan to be able to unlock a door with your eye or finger. There are actually quite a few interesting discussions on whether taking a picture of someone wearing glasses is processing data about their health - but I digress.

AI models trained to reproduce your likeness reliably have turned you into a dataset, a bunch of measurements, a model, which generally counts as biometric data processing. Once data processing falls under Article 9 GDPR, the legal bases of Article 6 GDPR - like legitimate interest, fulfillment of a contract, compliance, etc. - fall away, as only the specific allowances of Article 9(2) GDPR provide exceptions to the general prohibition. In the case of the entertainment and education industries, that will likely reduce the options to the explicit consent of Article 9(2)(a):

"a) the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject"

This is impossible for people who have already passed away, but you can usually ask their estate or remaining family members for consent in their stead.

Consent, under GDPR, always needs to be given freely. Article 7 GDPR says, among other things,

"When assessing whether consent is freely given, utmost account shall be taken of whether, inter alia, the performance of a contract, including the provision of a service, is conditional on consent to the processing of personal data that is not necessary for the performance of that contract."

This is also referred to as the coupling prohibition. It may be difficult to avoid in the entertainment industry: What if getting the role is tied to agreeing to AI cloning, if not explicitly, then implicitly? What if refusing, at some point, gets you blacklisted? What if agreeing affects your success and income at an agency? Many actors now have to deal with this, as studios try to cut costs by using AI clones to reduce the time actors spend on set, and also want a backup AI clone in case an actor dies during production.

What's also problematic: How do you freely give informed consent to something you don't understand?

Of course, you don't need to be an expert in everything, but taking pictures, video, or audio recordings is usually pretty straightforward to explain. Explaining how AI models work has been very difficult, even for people deeply involved, and now we are likely dealing with studios that are completely removed from the company that actually handles the AI cloning.

And how do you properly inform someone contractually about how their data will be used and processed if the field and its possibilities develop so fast? It's difficult to anticipate which potential future use cases you would or wouldn't want. And if the data gets sent somewhere outside of the EEA, you have a so-called 'third country transfer' to worry about, which needs special considerations and safeguards.

Now, we have established that your body and voice are personal data, and that processing them in this way falls under the GDPR. What about your clone data within the training set, or the output itself?

This is a bit controversial at the moment! It makes sense that this would also be regarded as personal data, as it is still identifiably you when used with zero alterations. Where it gets complicated are use cases where you lend your likeness to something, especially your voice.

For example: use in an ad that is not supposed to literally embody you, but instead just offer a neutral voice-over; or you're the new voice for Siri; or you dub a cartoon character. Obviously, your friends and family could reliably recognize your voice, so it could still count. But there are data protection authorities in Germany who argue for a more usage-oriented interpretation, meaning: if your clone is used to identify you and represent you in some content, it is biometric identification, but if your voice is just used as one voice for a job, it's mere imitation or synthesis.

I don't agree with that, as the data itself and the identification methods are still the same, and data used for synthesis today can still be used for biometric identification later - but that's the discussion right now.

Okay, so this type of data generally falls under the GDPR. That means I have the same rights as usual, including the right to deletion. But as I said before in my post about AI and the GDPR, it can be hard or impossible to delete data from a training set. Deleting the entire model or retraining it would incur massive costs and losses; it would make more sense to have more individual models that can be separated and deleted more easily, if possible. But since that is not within the control of the person holding the rights, those rights might be hard to enforce.

It's equally difficult for the output of these models: That falls under the GDPR as well and would be affected by the deletion or restriction requests, but that's also where lots of contracts, laws and rights collide. It needs to be assessed in each case individually.

☁️☁️☁️

There was an interesting case in Germany a while ago: a YouTuber used an AI-generated voice of a famous voice actor in his videos, and the actor objected to it.

The YouTuber had around 190,000 subscribers and an associated online shop. He published two political satire videos on YouTube that used an AI-generated voice closely imitating the actor's voice, but didn't label it as AI-generated. Viewers in the comments identified the voice as the actor's as well. The videos ended with references to the online shop, which sold merchandise linked to the channel's political opinions.

The actor objected to the use of his voice and requested that the YouTuber stop, and wanted reimbursement of legal costs. The YouTuber agreed to cease, but refused to pay damages, arguing that the voice was synthetic, lawfully acquired from an AI voice provider, and used for satire rather than advertising. Meanwhile, the actor claimed that the AI-generated voice constituted use of his personal voice, that the processing occurred without consent, and that it created the impression that he endorsed the videos and products. He also sought compensation equivalent to his usual licensing fees.

The court sided with the actor and found that the YouTuber interfered with the actor's right to his own voice: despite being AI-generated, the voice closely imitated a distinctive personal characteristic. The court considered that a significant part of the audience would associate the voice with the data subject, which was sufficient to establish personal attribution. As expected and as explained above, the court rejected the reasoning of "legitimate interest" in Article 6(1)(f) GDPR, finding that the voice primarily served the YouTuber's commercial interests. No exemption applied under Article 85 GDPR, as the processing was neither journalistic nor genuinely artistic in a way that would justify overriding the data subject's rights, particularly given the commercial context and the lack of transparency about the AI-generated nature of the voice.

As a consequence, the court ordered the YouTuber to pay €4,000 as a fictitious license fee for the unauthorized use of the voice and €1,155.80 for reimbursable legal costs, plus interest.

☁️☁️☁️

I think it's important to talk about this as it doesn't only affect actors and voice actors, or historical people's likeness used in the classroom or at concerts, but also has the potential to affect you.

Your employer could ask to make an AI clone of you, for example.

At the data protection law conference I attended in Munich, the AI Officer of a big insurance firm said they hold the data protection trainings required for their employees via AI-generated videos and AI-generated avatars of him and his colleague. That means employees who need to do the training enter a digital environment with an avatar of him that responds, smiles, blinks, and leads them through the material, some of which is AI-generated as well.

Circling back to the research mentioned earlier: we are at a point in time where, depending on your job, your body and voice can work independently of you, and people can monetize you after your death, not by further selling what you produced in your lifetime, but by producing new things indefinitely that you had no hand in while you were alive, or by selling access to "you". Eerie, huh?

So it's important to know your rights and what's going on in the space :)


#2026 #data protection #tech