[Image: © ChatGPT]

Voice Alienability in the Age of AI

17/11/2025

When OpenAI released a new voice for ChatGPT that sounded uncannily like actress Scarlett Johansson, the company found itself at the centre of a public outcry over voice misappropriation. The resemblance wasn’t accidental. OpenAI had reportedly approached Johansson, hoping to capitalise on her iconic role as the voice of an AI assistant in the film Her. She declined, yet a voice eerily similar to hers still appeared in the system. After Johansson took legal action, the company removed the voice and insisted that it had hired a different actor.

This episode is one of many that have prompted people to ask whether we have a right to our own voice (Kimppa & Saarni, 2008; Scott et al., 2019) and whether existing legal frameworks can protect it (Patel, 2024).

In this blog post, I propose that these questions express a deeper philosophical problem brought about by AI. As I argue here, voice synthesis technology exemplifies a new and far-reaching development in what Stiegler (2010) calls the grammatisation of the body: the process by which human capacities are translated into technical systems, making them alienable, that is, separable, ownable, and marketable.

Who owns a synthetic voice?

Synthetic voices, also known as AI voices or text-to-speech (TTS) voices, are created by training computational models on recordings of human speech (Hande, 2014). Companies recruit “voice donors” whose vocal profiles (gender, tone, age, emotional timbre) match their desired persona. These recordings are then used to train models capable of generating new speech that can sound indistinguishable from the original.
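
To make this pipeline concrete, here is a minimal schematic sketch in Python. Every name in it (Recording, VoiceModel, and so on) is an illustrative placeholder rather than a real TTS library; the sketch only mirrors the steps described above (collect donor recordings, train a model, generate new speech).

    # Schematic sketch of the TTS pipeline described above.
    # All names are illustrative placeholders, not a real TTS library.
    from dataclasses import dataclass, field

    @dataclass
    class Recording:
        donor_id: str      # who supplied the speech
        transcript: str    # what was said
        audio_path: str    # where the waveform is stored

    @dataclass
    class VoiceModel:
        # Stand-in for a neural TTS model (e.g. a Tacotron- or VITS-style system).
        corpus: list[Recording] = field(default_factory=list)

        def train(self, recordings: list[Recording]) -> None:
            # Real systems fit acoustic and vocoder networks here; the donor's
            # vocal profile (tone, age, timbre) is absorbed into the weights.
            self.corpus = recordings

        def synthesise(self, text: str) -> str:
            # Generates speech the donor never uttered, in the donor's voice.
            return f"<waveform of {text!r} in {self.corpus[0].donor_id}'s voice>"

    model = VoiceModel()
    model.train([Recording("donor_01", "Hello world", "clips/hello_001.wav")])
    print(model.synthesise("Words the donor never actually said."))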

You’ve probably heard such voices on Siri, Google Maps, or public announcements on the train. They are now so human-like that distinguishing human from synthetic is often difficult, especially when the sentences they read are short and simple.

Yet the Johansson case is hardly unique. Many professional voice actors have discovered that their voices have been synthesised and used without their fully informed consent. This raises tricky questions: Do we own our voices? And do we own the synthesised versions of them?

Legally, the issue is murky. Voices may not fit neatly into intellectual property law, since they are not “fixed in a tangible medium” (Patel, 2024). Even if they were, ownership remains disputed. Should the voice donor alone be the rights-holder, as a personality-based theory might suggest? Or should ownership be shared between donor and developer?

The fact that we are even debating these questions signals a deeper shift. Until recently, your voice was inseparable from you: your identity, your body, your presence. Today, anyone with the right technology and a little of your speech data can make “you” speak words you never uttered. And unlike human impersonation, this technology is scalable and far more accurate.

The alienation of the human voice

Voice donors are increasingly reduced to sources of data — raw material for machine learning. Their voices, once a form of artistic or personal expression, become inputs for synthetic products.

This process mirrors what Stiegler (2010) calls grammatisation: the conversion of human gestures, actions, and knowledge into forms that machines can record, reproduce, and automate. According to Stiegler (2010), grammatisation underlies modern capitalism’s shift to cognitive labour, producing new forms of proletarianisation — the loss of both savoir-faire (know-how) and savoir-vivre (know-how-to-live).

In the case of voice technology, grammatisation turns the voice into a new kind of cognitive capital. Where once you needed a live speaker to record an audiobook or an advertisement, you now need only a few hours of training data. The model can then generate endless speech — fast, cheap, and tireless.
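
As a rough indication of how little is needed, here is a hedged sketch using the open-source Coqui TTS package’s voice-cloning interface. The model name and call signature follow its documented XTTS v2 usage but may differ across versions, and the file names are hypothetical.

    # Hedged sketch: few-shot voice cloning with the open-source Coqui TTS
    # package (pip install TTS). Model name and call follow its documented
    # XTTS v2 usage; details may differ across versions.
    from TTS.api import TTS

    # Load a multilingual voice-cloning model.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    # A short reference clip of the voice donor conditions the model,
    # which then generates arbitrary speech in that voice.
    tts.tts_to_file(
        text="Words the donor never actually spoke.",
        speaker_wav="donor_reference.wav",  # hypothetical local recording
        language="en",
        file_path="cloned_output.wav",
    )

The point is structural rather than practical: once a voice has been reduced to data, the marginal cost of reproducing it approaches zero.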

Yet there is something distinctive about the grammatisation of the human voice. The voice is not just a sound; it carries identity. A synthetic voice still bears traces of the donor in its tone, accent, and rhythm, even when used in contexts the donor never approved. The result is a peculiar paradox: your voice can work without you, yet it still sounds like you.

Voice actors, once paid for their performances, now compete with synthesised versions of themselves. They produce surplus value for companies while losing control over their own means of expression, a textbook case of technological alienation. This is the process underlying today’s debates over who owns a voice: voice donors are experiencing proletarianisation first-hand.

Whose voice and who is speaking?

Voice synthesis also feeds a broader cultural myth: the belief that AI systems can speak or understand like humans. As Filippo Santoni de Sio (2024) argues, this illusion supports the narrative of autonomous technology — that machines act and communicate independently.

In reality, what we hear is not an AI “speaking” but a complex ventriloquism: the text is either written by a human author or generated by an LLM trained on human labour, and the voice is a text-to-speech system trained on another person’s vocal labour. To sustain the illusion, the human disappears twice: first as author, then as speaker.

Linguists Emily Bender and Alexander Koller (2020) warned that such systems conflate form and meaning, mistaking the ability to produce grammatical sentences for genuine understanding. Synthetic voices amplify that confusion by adding an illusion of empathy, authority, or personality.

But this illusion breaks down when the voice is recognisable, when we hear Johansson, not ChatGPT. At that moment, we are reminded that behind every “AI voice” lies a real person’s identity and labour, re-packaged as a product. The voice has been turned into an alienable, marketable possession, yet it remains the voice of someone, even if not in the legal sense. Stiegler’s (2010) notion of grammatisation thus helps us understand voice synthesis within a politico-economic framework. At the same time, it shows that voice synthesis, as a form of grammatisation, opens up new questions about the machine automation and reproduction of identity-bearing human capacities.

Why establishing legal ownership is not enough

So, should Johansson be able to claim ownership of her synthetic voice? Legal scholars are beginning to say yes, proposing frameworks that treat synthetic voices as intellectual property (Patel, 2024). Yet law alone cannot capture what is at stake.

The problem is not just about ownership but about how technology reshapes what can be owned in the first place. Once a human trait like voice becomes replicable and tradable, the boundary between person and product blurs.

Stiegler’s (2010) pharmacological view of technology reminds us that every innovation is both poison and cure. Voice synthesis can empower — enabling speech for those who’ve lost it, preserving endangered languages, or creating accessible media. But it can also exploit, alienating the human voice from its owner and feeding systems that profit from imitation.

As voice synthesis tools become commonplace, the philosophical challenge is to rethink what it means to speak in one’s own voice. The question is no longer only “Who owns the voice?” but “What does it mean when our voices can speak without us?”

References

Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185–5198). https://aclanthology.org/2020.acl-main.463/

Hande, S. S. (2014). A Review on Speech Synthesis and Artificial Voice Production. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering. www.ijareeie.com

Kimppa, K. K., & Saarni, T. I. (2008). Right to one’s voice. In T. W. Bynum, M. Calzarossa, I. De Lotto, & S. Rogerson (Eds.), Conference Proceedings of ETHICOMP 2008: Living, Working and Learning beyond Technology. https://www.researchgate.net/publication/298211140

Patel, P. (2024). AI Voice Enters the Copyright Regime: Proposal of a Three-Part Framework. Fordham Intellectual Property, Media and Entertainment Law Journal, 34. https://ir.lawnet.fordham.edu/iplj/vol34/iss2/6

Santoni de Sio, F. (2024). Human Freedom in the Age of AI. Routledge.

Scott, K. M., Braude, D. A., Ashby, S., & Aylett, M. P. (2019, August 22). Who owns your voice? Ethically sourced voices for non-commercial TTS applications. ACM International Conference Proceeding Series. https://doi.org/10.1145/3342775.3342793

Author Bio

Matilde Nanni is a PhD candidate at the University of Inland Norway, exploring the ethical dimensions of voice and speech technologies at the crossroads of philosophy, linguistics, and artificial intelligence. She is especially interested in phenomena such as deepfakes and deathbots.