Reactive Speech Synthesis

About five years ago I coined the term ‘reactive speech synthesis’ to refer to a speech synthesis system whose characteristics can be controlled in real time. For example, a reactive speech synthesiser should automatically talk louder (and perhaps more clearly) in a noisy environment. Such a device doesn't yet exist (although relevant research is under way#). However, it is already quite easy for a human operator to alter some of the characteristics of a synthetic voice in real time. For example, New Media Arts (a Belgian speech lab) has recently demonstrated a reactive speech synthesiser in which the pitch and timing of the speech are controlled using a hand-held device (see http://www.youtube.com/watch?v=HxQuSczW0rE).

I intend to reproduce the New Media Arts demonstration using the Holmes parallel formant synthesiser, which I have already implemented in Pure Data. Allowing dynamic control of pitch and timing (or any other parameter) is a relatively straightforward extension, so I've set it as an assignment on my Speech Processing course, which is running right now. This means that over 70 CS students will have a go at it, and I'll post the best solution as a video - probably sometime in early December. Then we can see whether it's suitable as one of the exhibits in the CREST 'arcade'.
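
To give a rough idea of what dynamic control of pitch and timing might involve, here is a minimal Python sketch. It is not the Pure Data patch, and it is not the New Media Arts method. It assumes the synthesiser is driven by a sequence of parameter frames, and it reduces each frame to just a pitch value and a duration. The names Frame, reactive_frames and read_controls are hypothetical, invented purely for this illustration; read_controls stands in for whatever hand-held device supplies the control values.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Frame:
    """One synthesiser control frame (hypothetical, heavily simplified)."""
    f0_hz: float        # fundamental frequency for this frame
    duration_ms: float  # how long this frame is held


def reactive_frames(
    frames: Iterable[Frame],
    read_controls: Callable[[], Tuple[float, float]],
) -> Iterator[Frame]:
    """Rescale each frame just before it is synthesised.

    read_controls() is assumed to return (pitch_scale, rate), both > 0.
    It is polled once per frame, so a change made by the operator takes
    effect on the very next frame.
    """
    for frame in frames:
        pitch_scale, rate = read_controls()
        yield replace(
            frame,
            f0_hz=frame.f0_hz * pitch_scale,        # raise or lower the pitch
            duration_ms=frame.duration_ms / rate,   # rate > 1 speeds up speech
        )


# Simulated controller: neutral for the first frame, then the operator
# doubles the pitch and halves the speaking rate.
_readings = iter([(1.0, 1.0), (2.0, 0.5)])
original = [Frame(100.0, 10.0), Frame(120.0, 10.0)]
result = list(reactive_frames(original, lambda: next(_readings)))
```

In this example the first frame passes through unchanged, while the second comes out at 240 Hz and lasts 20 ms. Polling the controller once per frame is the design choice that makes the voice 'reactive': nothing is precomputed for the whole utterance.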

# Moore, R. K., & Nicolao, M. (2011). Reactive speech synthesis: actively managing phonetic contrast along an H&H continuum. 17th International Congress of Phonetic Sciences (ICPhS), Hong Kong. [available as a download via the publications link on my homepage - http://www.dcs.shef.ac.uk/~roger]

Roger Moore
Posted 22nd October 2011 at 10:02 AM