Whither the Artificial Voice?

To explore the issue of artificiality, we first need to specify what we mean by non-artificial speech. For example, if we define natural speech as the linguistic behaviour exhibited by a normal human vocal tract in a quiet, non-reverberant environment, then we can start to distinguish the different ways in which artificiality might be introduced.

Four such processes can be itemised:

speech transformation (processes which alter the characteristics of a speech signal);
speech sampling (processes based on the selection of segments of pre-recorded speech);
speech simulation (processes which mimic some physical aspect of the speech production process); and
speech synthesis (methods for creating a speech signal from a parametric specification).

Examples of speech transformation would be reverberant environments, communication channels, vocoders, pitch modifiers and voice morphers. An example of speech sampling would be a concatenative/unit-selection text-to-speech system. Examples of speech simulation would be articulatory models and the Sheffield animatronic tongue/vocal tract (AnTon - http://staffwww.dcs.shef.ac.uk/people/R.Hofe/anton/anton.html). Examples of speech synthesis would be formant synthesis or hidden Markov model-based synthesis.
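For readers who think in code, the taxonomy above can be written down as a small lookup table. This is only an illustrative sketch: the names Artificiality, EXAMPLES and classify are hypothetical labels invented here, and the table contains nothing beyond the examples already listed in this post.

```python
from enum import Enum


class Artificiality(Enum):
    """The four processes itemised above (names are illustrative only)."""
    TRANSFORMATION = "alters the characteristics of a speech signal"
    SAMPLING = "selects segments of pre-recorded speech"
    SIMULATION = "mimics some physical aspect of the speech production process"
    SYNTHESIS = "creates a speech signal from a parametric specification"


# Examples taken from the paragraph above, keyed by a lower-case description.
EXAMPLES = {
    "reverberant environment": Artificiality.TRANSFORMATION,
    "communication channel": Artificiality.TRANSFORMATION,
    "vocoder": Artificiality.TRANSFORMATION,
    "pitch modifier": Artificiality.TRANSFORMATION,
    "voice morpher": Artificiality.TRANSFORMATION,
    "unit-selection text-to-speech": Artificiality.SAMPLING,
    "articulatory model": Artificiality.SIMULATION,
    "animatronic vocal tract": Artificiality.SIMULATION,
    "formant synthesis": Artificiality.SYNTHESIS,
    "hidden markov model-based synthesis": Artificiality.SYNTHESIS,
}


def classify(source):
    """Return the category for a listed example, or None if it is not listed."""
    return EXAMPLES.get(source.strip().lower())


if __name__ == "__main__":
    for name in ("Vocoder", "formant synthesis", "whispering"):
        category = classify(name)
        label = category.name if category else "not listed"
        print(f"{name}: {label}")
```

A real system may of course combine several categories, for instance a unit-selection synthesiser heard over a communication channel, so a one-to-one table like this is a simplification.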

Given this definition of natural speech, human speech emanating from a non-human source would also be classified as artificial. That is, even telephone speech could be viewed as artificial.

Roger Moore
Posted 20th September 2011 at 5:19 PM

Comments

I guess Chris was addressing some of the same questions in his thesis, but his focus was on the concepts of 'liveness' and 'mediatization' (after Auslander). On that view, yes, even the voice on the phone is somewhat mediatized.

Alistair Edwards
Commented 3rd October 2011 at 11:11 AM