Principles of Vocal Values

Preamble

Voice carries our identities, emotions, labor and cultural heritage. Voice is by nature not a thing, but a process of exchange that arises in an ecosystem of senders and receivers, those who sound and those who listen. As we enter an era where voice is widely digitized and framed as "data" to fuel artificial intelligence, we must critically examine and take action in shaping the values that guide the computational collection, handling, sharing and storage of voice. This document aims to establish principles for engaging with digitised and digital voices with care and responsibility, and to balance innovation with accountability to the diversity of value systems around voice.

Voice technologies are ubiquitous, societally influential and rapidly evolving. However, these technologies are wholly dependent on human voice as their source material. This makes it crucial to establish ethical guidelines that ensure these tools are developed and used in ways that do not harmfully exploit individuals and communities of voice. Vocal Values Principles (VVP) represent a commitment to a future where voice technologies are created in such a way that their sources of voice are transparent and traceable, with due respect and acknowledgement for the humans whose contributions to these systems go beyond the reductionist notion of "data acquisition". By staking out these principles, we lay the groundwork for communities and practitioners of voice to organize behind them, and for organisations to act on their commitments to privacy, inclusivity, and fair treatment of those who contribute their voices.

Vocal Values Core Principles

⩫ CONSENT ⩫

A nuanced, ecological and dynamic understanding of consent is a core vocal value. Consent is about the right to say "no", to say "yes", and to understand that "no" and "yes" are complex and nuanced, not a binary. For example, consent includes the right to say "yes, but with conditions", and the right to change one's mind. Consent may be individual, collective, or somewhere in between. It may be explicitly given, or implicit in the form of a set of values in situations where explicit consent is not possible. In either case, an understanding of consent requires an understanding of context, values and intention.

In the context of capture and digitisation of voice, consent must be:

Voluntary and respectful of individual and/or collective rights
Informed by the context and cultural norms of those who contribute their voices
Grounded in and bounded by permitted uses
Clearly and transparently communicated in colloquial language
Clearly defined in terms of the duration for which it is valid
Regularly re-evaluated
Clearly defined in terms of who grants consent, who receives consent, and who may revoke consent
Honoured when it is withdrawn
Responsive to possible future technological changes, articulating values that can be followed in case of such changes
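
To make the requirements above more concrete, here is a minimal sketch of how a consent record might be represented in software. It is an illustration under stated assumptions, not a prescribed implementation: the class name ConsentRecord, all field names and the example use labels are hypothetical, and no data structure can substitute for the contextual, cultural and relational work that consent requires. The sketch denies by default: a use is allowed only if it was explicitly permitted, falls within the agreed duration, is not overdue for re-evaluation, and has not been withdrawn.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of consent for one contribution of voice."""
    granted_by: str                  # who grants consent (individual or collective)
    granted_to: str                  # who receives consent
    may_revoke: FrozenSet[str]       # who may revoke consent
    permitted_uses: FrozenSet[str]   # consent is bounded by these uses only
    valid_from: date                 # start of the agreed duration
    valid_until: date                # end of the agreed duration
    review_by: date                  # date by which consent must be re-evaluated
    withdrawn_on: Optional[date] = None

    def withdraw(self, who: str, when: date) -> None:
        """Record a withdrawal; only parties named in may_revoke can do this."""
        if who not in self.may_revoke:
            raise PermissionError(f"{who} may not revoke this consent")
        self.withdrawn_on = when

    def allows(self, use: str, on: date) -> bool:
        """Return True only if `use` is explicitly permitted on the date `on`."""
        if self.withdrawn_on is not None and on >= self.withdrawn_on:
            return False  # withdrawal is honoured
        if not (self.valid_from <= on <= self.valid_until):
            return False  # outside the agreed duration
        if on > self.review_by:
            return False  # overdue for re-evaluation, so treated as not permitted
        return use in self.permitted_uses  # no extension to other purposes
```

A record that permits "community_archive" would, under this sketch, refuse a request for "model_training" even within its period of validity, reflecting the rejection of models where consent given for one purpose extends to all potential future uses.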

Vocal Values reject:

binary models of consent and one-time consent models that fail to capture the multiplicity of voice use and the nuances of its creators' values
models where consent given for one purpose extends to all potential future uses
models that treat the open release of software or AI models as a stand-in for true transparency or as a justification for a lack of true consent
models that treat openness as a concept that is defined by the handler of voice without taking into account the values of the creators of voice through the principles of consent described above

⩫⩫ COMPENSATION ⩫⩫

Compensation principles describe fair and reciprocal exchange surrounding the giving of voice, a realistic and comprehensive assessment of its value, and its downstream exploitation and monetization. They encompass ideas of exchange and of value in a broad sense, including financial remuneration, benefits from technological advancements, contributions to enhancing and respecting the value of vocal labor, and contributions to the preservation and enrichment of voice communities. The principles of compensation acknowledge that the value of voice in the economic and knowledge value chains of voice technologies should be understood holistically, not as a clean and simple exchange of goods, but as an exchange of body, culture and craft that requires care and diligence.

In the context of capture and digitisation of voice, the principles of Vocal Values require that compensation models must include:

A recognition that the value of vocal labor exceeds the instantaneous production of sound, and so cannot be captured by valuations based solely on minutes or hours of recordings
A recognition of the value of the development of vocal craft, which in some cases may take decades
A recognition of potential long-term contribution to devaluation of such craft, due to increasingly sophisticated attempts at automating both the labor of voice production and voice perception
A recognition of the value of voice beyond the individual(s) producing sound, as voice is simultaneously individual and collective, arising from a culture, community and in many cases artistic or linguistic traditions that have been built over generations
A recognition of the value of voice as an outline of a human body, imbued with physiological cues as to the identity and state of that body
A recognition of the value of data labelling and categorisation as not only "data processing", but as the perceptual labor of listening, of receiving and understanding voice, which is therefore also both embodied and cultural
A consideration of alternative economic and technological models if the currently used model cannot adequately compensate for the above valuation principles
An exploration of alternative economic models that respect the collective nature of language and voice

The principles of Vocal Values reject:

One-size-fits-all value models that undervalue vocal labor and overvalue engineering labor
Value models which only consider reciprocity in terms of financial exchange

While it is indeed challenging to determine fair valuation for compensation, this challenge is not insurmountable and first requires action to rectify the deeply problematic power imbalance between the individuals and communities who provide vocal labor, and those who receive it, handle and work with it, and the end users of systems built upon it. An actionable starting point towards this rebalancing can take the form of transparency and responsible reporting of how voice is received and handled by those who work with it. Another place to implement this rebalancing is in the form of putting in place equitable revenue sharing models if and when voice which was given leads to commercialized software products or AI models.

⩫⩫⩫ CONTROL ⩫⩫⩫

To have control is to have a choice in how one's voice is used. Vocal Principles of control also imply enforcement, which encompasses being aware, being heard, and having recourse when violations occur. These principles are essential in ensuring that individuals and communities retain agency over their voice data. The principles explicitly reject the creation of black-box AI systems and foundation models that obscure the origins and sources of training data, unless that obfuscation is a direct request of the givers of voice. They also reject situations in which voice is labelled or categorized with no involvement of the givers, or is used to infer subjective qualities about an individual or group based on automated voice analysis (e.g. trustworthiness).

In the context of capture and digitisation of voice, the Principles of Vocal Values advocate for the following principles of control:

The (co)development of systems - legal, technological and cultural - which empower individual and/or community givers of voice to maintain an element of control over voice they have given
Track and trace, such that individuals and communities of voice may know when the voice they have contributed is used as a resource - either handled, or added to a repository, such as datasets or recording archives
The ability for givers of voice to withdraw voice contributions from handlers or storage points that they do not agree with
AI systems that can "forget" voice when source recordings are withdrawn from their training datasets
Data Orality. The ability for givers of voice to specify a "period of listenability", or "period of presence" that is respected by legal, technological and cultural means, such that their voice ceases to exist or be useable in any encoded format, digital or otherwise, after the end of this period.
Input from voice contributors on how their voices are categorized and labeled
Encoded Exegesis. The ability for givers of voice to specify "wishes and interpretation of the voice itself for acceptable use and labelling" that transcend independent contracts for voice use and instead travel with, and are intrinsically bundled with, the voice itself, such that any receiver/handler/secondary user will encounter these wishes and be able to ascertain the intent and values of the voice's source bod(y)ies.
Digital voice representation formats and collections/datasets/archives with mechanisms that support Data Orality and Encoded Exegesis
Other yet-to-be-imagined mechanisms for the intents, beliefs, and wishes of voice contributors to be heard in situations where it is not possible for them to be physically present.
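
As one possible illustration of Data Orality and Encoded Exegesis working together, the sketch below bundles a recording with its giver's wishes and a "period of listenability". Everything here is hypothetical: the names VoiceBundle, NoLongerListenable and listen are invented for this example, and a real mechanism would depend on legal and cultural means as much as technological ones, since code alone cannot make a copied recording cease to exist. The sketch only shows the intended behaviour: the wishes are handed over before the audio, and access is refused once the period has ended.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class VoiceBundle:
    """Hypothetical container in which wishes travel with the voice itself."""
    audio: bytes                 # the encoded voice
    exegesis: str                # the giver's wishes and interpretation for acceptable use
    listenable_until: datetime   # end of the "period of listenability"


class NoLongerListenable(Exception):
    """Raised when a voice is requested after its period of listenability."""


def listen(bundle: VoiceBundle, now: datetime) -> Tuple[str, bytes]:
    """Return (exegesis, audio) while the voice is within its period of presence.

    The exegesis comes first in the returned pair so that any receiver,
    handler or secondary user encounters the giver's wishes together with
    the voice, never the voice alone.
    """
    if now >= bundle.listenable_until:
        raise NoLongerListenable("the period of listenability has ended")
    return bundle.exegesis, bundle.audio


# Usage with made-up values.
bundle = VoiceBundle(
    audio=b"\x00\x01",
    exegesis="For community listening only; do not label by emotion.",
    listenable_until=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
wishes, audio = listen(bundle, now=datetime(2025, 1, 1, tzinfo=timezone.utc))
```

Making the bundle immutable (frozen) is a deliberate choice in this sketch, so that a handler cannot silently detach or rewrite the wishes once they have been attached to the voice.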

The Techno-Social Web of Voice

We need to recognize the complex ecosystem of voice data, acknowledging that fair compensation must consider the various roles and contributions within this system. This voice data ecosystem encompasses a chain of participants, each playing a crucial role in the life cycle of voice data.

At the beginning of this chain are the Senders - individuals who create voice, projecting it from their bodies into the world, often unknowingly placing it into the hands of others. These Senders are the primary source of the unique, personal data that fuels voice technologies and AI systems.

Next are the Receivers, which can be people or machines. They actively seek out, accept, and collect these digitised voices, sometimes with the explicit consent of the Senders, but often without. The ethical implications of how these voices are received and collected form a critical part of the discussion around vocal values.

The ecosystem also includes Handlers, typically data scientists, AI engineers, and data labellers. These individuals or systems process and manipulate the collected voice data, transforming it into usable forms for AI systems. Their work potentially enriches the source voice, raising questions about how this added value should be recognized and compensated.

Finally, there are also Secondary Handlers, who interact with voice data indirectly through products created by the primary Handlers. This group may or may not include end-users of AI voice products or systems that build upon existing voice AI technologies.

We need to recognise this complex ecosystem of voice data, and consider the various roles and contributions of stakeholders within it, to approach vocal values holistically.

From Principles to Practice

Vocal Values Principles describe foundational desires for a future where voice-enabled AI enhances human communication and creativity, and where technological, legal, and cultural innovation is directed towards the enrichment and propagation of the values of individuals and communities of voice, upon whom these technologies depend. We assert that such an approach to voice in AI systems is not just possible, but essential for the widely beneficial, culturally rich and non-homogenising development of this technology.

In practice, these principles require a rethinking of innovation beyond the profit motive, which not only leads our societies away from the creation of convivial technologies, but also serves to increase and accelerate human exploitation and extractive mindsets, which are patently unsustainable and, in many cases, run afoul of core human rights.

This rethinking requires a shift in consciousness, and a great innovative imagination, as well as coalition building between individuals and communities across the vocal ecosystem. In practical terms, we must develop technological, cultural and legal frameworks that implement:

Consent that is truly informed, dynamic, and respectful of individual and collective rights
Compensation that fairly and accurately values the contributions of all participants in the voice data ecosystem
Control that empowers individuals and communities to follow, and have influence on, the fate of their voice

These frameworks may be thought of as mechanisms of enforcement that ensure accountability and transparency, and provide pathways for recourse if voice is misused or agreements are violated. They include the necessary development of more flexible and nuanced licensing frameworks specifically for voice, that are able to bypass the current legal loopholes around "fair use" that allow for legally-sanctioned exploitation and decontextualisation of voice. They include auditing and review processes that adequately assess the potential to devalue, decontextualise and exploit individuals and communities of voice. They include the development of context-full and privacy-enhancing technologies for the storage and processing of digitized voice.

In pursuit of these practical implementations, we envision a world where developers, companies, policymakers, researchers, and individuals collaborate to create a responsible voice ecosystem that does not overly fixate on the individual, but also acknowledges collective ownership. The Vocal Values Principles serve as a beacon for this vision, having the potential to serve as the foundation for self-assessment frameworks and certifications.

We especially advocate for the creation of a Vocal Values Certification, which can be applied in a dynamic and flexible way acknowledging different collaborators in the totality of a vocal ecosystem, which includes the givers, senders, receivers, handlers, transformers, and secondary users of voice. There are many examples in the wild already of formats by which an individual, community or organisation may receive a fair practice certification of one type or another. We advocate considering the effectiveness of these examples and reflecting on how they can be adapted to the unique embodied, situated and transmissive qualities of voice.

As voice AI and collection practices around voice are a moving target informed by cultural, technological and legal developments, we must remain vigilant and adaptable in upholding these principles. Human voice(s), in all their complexity and richness, must be protected as a fundamental aspect of identity, physiology, craft, labor and culture. Through the Vocal Values Principles, we affirm our dedication to shaping a future where voice technology enhances human potentials rather than devalues, exploits or extinguishes them, fosters a vocal ecology that is inclusive and transparent, and contributes positively to communities of voice.

By adhering to the Vocal Values Principles and striving for excellence in voice data management, we can build a future where voice technology not only innovates but also upholds the highest standards of ethics and quality, benefiting those whose voice was used to create these technologies.

Definitions

Voice/Data: Throughout this document, we use the term "voice" in place of "voice data", for the fundamental reason that we recognize human voice is intrinsically irreducible to the concept of "data" as it is commonly used in circles of research and industry. Therefore, "voice", as used in this document, is a holistic term meaning not only the sonic, physiological and cultural dimensions which create human voice, but also the myriad forms and re-representations it assumes throughout its transmission, transformation, listening and reception, including both acoustic and digital transmission and intermediary mediums, representations, recordings, transcriptions, and derived features.
Vocal Ecosystem: Adds to "voice" as described above, by also including the web of individuals, communities and technologies implicated in the creation, transmission, receiving, storage and transformation of "voice".
Voice-enabled AI System: Any computational system that is part of a "vocal ecosystem" which involves model training, analysis, categorization or generation of voice.
Consent: The right to say no, yes, yes (with conditions) and to change one's mind regarding the use of voice that has been given.
Compensation: A fair reciprocal relationship of exchange for contributions of voice, which may include financial remuneration, but must include a fair and complete consideration of the value (or devaluing) of voice at different stages of its movement - through senders, receivers, handlers, and secondary handlers.
Control: The ability of individuals or communities to determine how voice they have given is used, shared, classified and represented.
Enforcement: The mechanisms and processes that ensure adherence to agreed-upon terms of voice propagation, including pathways for recourse.

Acknowledgments

The Vocal Values principles were originally developed in September 2024 as part of the S+T+ARTS AIR project D̴̼̥̉a̸͂͜ͅd̵͂͑ͅą̶̀͐s̵̯̈́e̶̜͔̍͝t̶̘͎̃s̶̫͙͂ and a collaboration between Jonathan Reus, Wiebke Hutiri, In4Art and over 40 individuals from diverse fields of practice, research and industry of voice. These ideas have been workshopped through a public event at Mozilla Fest Amsterdam 2024 titled "The Values of Voice", and through a second expert workshop of the same name held online on September 16th in collaboration with The Hmm - Platform for Internet Cultures. This document draws inspiration from numerous sources, including various manifestos, certification principles and ethical frameworks in the field of technology and data ethics.

♡ → This manifesto is a living document, open to revision and expansion based on ongoing dialogue and emerging insights in the field of voice AI ethics.

If you would like to give feedback or participate in the revision process, contact us via this contact form, or send an email to vocalvalues[at]jonathanreus.com

#VocalValuesPrinciples #AIEthics #VoiceRights #TechPolicy
