This is a living document. The last update is from: 22.10.2024
Voice carries our identities, emotions, labor and cultural heritage. Voice is by nature not a thing, but a process of exchange that arises in an ecosystem of senders and receivers, those who sound and those who listen. As we enter an era where voice is widely digitized and framed as "data" to fuel artificial intelligence, we must critically examine and take action in shaping the values that guide the computational collection, handling, sharing and storage of voice. This document aims to establish principles for engaging with digitised and digital voices with care and responsibility, and to balance innovation with accountability to the diversity of value systems around voice.
Voice technologies are ubiquitous, societally influential and rapidly evolving - however, these technologies are wholly dependent on human voice as their source material. This makes it crucial to establish ethical guidelines that ensure the development and use of these tools in ways that do not harmfully exploit individuals and communities of voice. Vocal Values Principles (VVP) represent a commitment to a future where voice technologies are created in such a way that their sources of voice are transparent and traceable, with due respect and acknowledgement for the humans whose contributions to these systems go beyond the reductionist notion of "data acquisition". By staking out these principles, we lay the groundwork for communities and practitioners of voice to organize behind a set of principles, and for organisations to act on their commitments to privacy, inclusivity, and fair treatment of those who contribute their voices.
A nuanced, ecological and dynamic understanding of consent is a core vocal value. Consent is about the right to say "no", to say "yes", and to understand that "no" and "yes" are complex and nuanced, not a binary. For example, consent includes the right to say "yes, but with conditions", and the right to change one's mind. Consent may be individual, collective, or somewhere in between. It may be explicitly given, or implicit in the form of a set of values in situations where explicit consent is not possible. In either case, an understanding of consent requires an understanding of context, values and intention.
In the context of capture and digitisation of voice, consent must be:
Vocal Values reject:
Compensation principles describe fair and reciprocal exchange surrounding the giving of voice, a realistic and comprehensive assessment of its value, and its downstream exploitation and monetization. They encompass ideas of exchange and of value in a broad sense, including financial remuneration, benefits from technological advancements, contributions to enhancing and respecting the value of vocal labor, and contributions to the preservation and enrichment of voice communities. The principles of compensation acknowledge that the value of voice in the economic and knowledge value chains of voice technologies should be understood holistically, not as a clean and simple exchange of goods, but as an exchange of body, culture and craft that requires care and diligence.
In the context of capture and digitisation of voice, the principles of Vocal Values require that compensation models include:
The principles of Vocal Values reject:
While it is indeed challenging to determine fair valuation for compensation, this challenge is not insurmountable. It first requires action to rectify the deeply problematic power imbalance between the individuals and communities who provide vocal labor, those who receive, handle and work with it, and the end users of systems built upon it. An actionable starting point towards this rebalancing is transparent and responsible reporting of how voice is received and handled by those who work with it. Another is to put in place equitable revenue sharing models if and when voice that was given leads to commercialized software products or AI models.
To have control is to have a choice in how one's voice is used. Vocal Principles of control also imply enforcement, which encompasses being aware, being heard, and having recourse when violations occur. These principles are essential in ensuring that individuals and communities retain agency over their voice data. The principles explicitly reject the creation of black-box AI systems and foundation models that obscure the origins and sources of training data, unless that obfuscation is a direct request of the givers of voice. They also reject situations in which voice is labelled or categorized with no involvement of the givers, or is used to infer subjective qualities about an individual or group based on automated voice analysis (e.g. trustworthiness).
In the context of capture and digitisation of voice, the Principles of Vocal Values advocate for the following principles of control:
We need to recognize the complex ecosystem of voice data, acknowledging that fair compensation must consider the various roles and contributions within this system. This voice data ecosystem encompasses a chain of participants, each playing a crucial role in the life cycle of voice data.
At the beginning of this chain are the Senders - individuals who create voice, projecting it from their bodies into the world, often unknowingly placing it into the hands of others. These Senders are the primary source of the unique, personal data that fuels voice technologies and AI systems.
Next are the Receivers, which can be people or machines. They actively seek out, accept, and collect these digitised voices, sometimes with the explicit consent of the Senders, but often without. The ethical implications of how these voices are received and collected form a critical part of the discussion around vocal values.
The ecosystem also includes Handlers - typically data scientists, AI engineers, and data labellers. These individuals or systems process and manipulate the collected voice data, transforming it into usable forms for AI systems. Their work potentially enriches the source voice, raising questions about how this added value should be recognized and compensated.
Finally, there are also Secondary Handlers, who interact with voice data indirectly through products created by the primary Handlers. This group may or may not include end-users of AI voice products or systems that build upon existing voice AI technologies.
We need to recognise this complex ecosystem of voice data, and consider the various roles and contributions of stakeholders within it, to approach vocal values holistically.
Vocal Values Principles describe foundational desires for a future where voice-enabled AI enhances human communication and creativity, and where technological, legal, and cultural innovation is directed towards the enrichment and propagation of the values of individuals and communities of voice, upon whom these technologies depend. We assert that such an approach to voice in AI systems is not just possible, but essential for the widely beneficial, culturally rich and non-homogenising development of this technology.
In practice, these principles require a rethinking of innovation beyond the profit motive, which simply leads our societies away from the creation of convivial technologies and only serves to increase and accelerate human exploitation and extractive mindsets that are patently unsustainable and, in many cases, run afoul of core human rights.
This rethinking requires a shift in consciousness, and a great innovative imagination, as well as coalition building between individuals and communities across the vocal ecosystem. In practical terms, we must develop technological, cultural and legal frameworks that implement:
These frameworks may be thought of as mechanisms of enforcement that ensure accountability and transparency, and provide pathways for recourse if voice is misused or agreements are violated. They include the necessary development of more flexible and nuanced licensing frameworks specifically for voice, able to close the current legal loopholes around "fair use" that allow for legally-sanctioned exploitation and decontextualisation of voice. They include auditing and review processes that adequately assess harms that potentially devalue, decontextualise and exploit individuals and communities of voice. They include the development of context-full and privacy-enhancing technologies for the storage and processing of digitized voice.
In pursuit of these practical implementations, we envision a world where developers, companies, policymakers, researchers, and individuals collaborate to create a responsible voice ecosystem that does not overly fixate on the individual, but also acknowledges collective ownership. The Vocal Values Principles serve as a beacon for this vision, having the potential to serve as the foundation for self-assessment frameworks and certifications.
We especially advocate for the creation of a Vocal Values Certification, which can be applied in a dynamic and flexible way, acknowledging different collaborators in the totality of a vocal ecosystem, which includes the givers, senders, receivers, handlers, transformers, and secondary users of voice. There are many examples in the wild already of formats by which an individual, community or organisation may receive a fair practice certification of one type or another. We advocate considering the effectiveness of these examples and reflecting on how they can be adapted to the unique embodied, situated and transmissive qualities of voice.
As voice AI and collection practices around voice are a moving target informed by cultural, technological and legal developments, we must remain vigilant and adaptable in upholding these principles. Human voice(s), in all its complexity and richness, must be protected as a fundamental aspect of identity, physiology, craft, labor and culture. Through the Vocal Values Principles, we affirm our dedication to shaping a future where voice technology enhances human potentials rather than devalues, exploits or extinguishes them, fosters a vocal ecology that is inclusive and transparent, and contributes positively to communities of voice.
By adhering to the Vocal Values Principles and striving for excellence in voice data management, we can build a future where voice technology not only innovates but also upholds the highest standards of ethics and quality, benefiting those whose voice was used to create these technologies.
The Vocal Values principles were originally developed in September 2024 as part of the S+T+ARTS AIR project D̴̼̥̉a̸͂͜ͅd̵͂͑ͅą̶̀͐s̵̯̈́e̶̜͔̍͝t̶̘͎̃s̶̫͙͂ and a collaboration between Jonathan Reus, Wiebke Hutiri, In4Art and over 40 individuals from diverse fields of practice, research and industry of voice. These ideas have been workshopped through a public event at Mozilla Fest Amsterdam 2024 titled "The Values of Voice", and through a second expert workshop of the same name held online on September 16th in collaboration with The Hmm - Platform for Internet Cultures. This document draws inspiration from numerous sources, including various manifestos, certification principles and ethical frameworks in the field of technology and data ethics.
♡ → This manifesto is a living document, open to revision and expansion based on ongoing dialogue and emerging insights in the field of voice AI ethics.
If you would like to give feedback or participate in the revision process, contact us via this contact form, or send an email to vocalvalues[at]jonathanreus.com
#VocalValuesPrinciples #AIEthics #VoiceRights #TechPolicy