Voice Expressivity & Emotion


We are building a "chatbot" – a conversational entity that stands alone as a tabletop exhibit. The entity uses speech recognition to infer what a spectator is saying, then speaks to the audience with one of four artificial voices, each representing a different emotional personality.


 Members: Sandra Pauletto, Chris Pidcock, Leonardo Bottaci, Jez Wells, Christophe Veaux, James Balentine, Bruce Balentine, Darren Mundy, Kevin Jones, Maria Aretoulaki

 Abstract: We want to build some kind of conversational entity, possibly one that could serve as one of the exhibits in the arcade. The entity uses speech activity detection or speech recognition to know that the user is speaking and to infer what the user may be saying. The entity then speaks to the user with an artificial voice. Some possibilities for the conversational purpose of the entity include:

  • maintain an ambiguous conversation for some time;
  • speak to the user with different voices; use the voice changes to convey different identities and emotions;
  • "borrow" the user's voice characteristics in some way;
  • ask the user to connect different voices with different physical bodies;
  • ask the user to locate or find an answer/treasure/secret; and/or,
  • TechnoMan (diskettes for eyes and cassettes for mouth).
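The speech activity detection mentioned above could, at its simplest, be an energy threshold on incoming audio frames. The sketch below is purely illustrative: the frame size, sample format, and threshold value are assumptions, not part of the VEEG design.

```python
import math

# Minimal energy-based speech activity detection sketch.
# Assumes mono audio frames of float samples in [-1.0, 1.0];
# the 0.02 threshold is an illustrative value, not a VEEG parameter.

def frame_energy(samples):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, threshold=0.02):
    """Flag a frame as speech when its RMS energy exceeds the threshold."""
    return frame_energy(samples) > threshold
```

A real exhibit would likely use a proper VAD with noise adaptation, but a threshold like this is enough for the entity to notice that someone has started talking.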

 Speculation grew in particular around a 4-personality entity, with 4 voices and 4 "heads" (LCD displays or physically modelled from wig stands). This interactive platform encourages the user to "match" the voices with the faces, and to identify which of the 4 personalities is actually engaged in the conversation (e.g., the ventriloquist). The "real" personality understands what the user is saying and says intelligent things back. But the "dummy" personalities all have voices, keep interrupting each other, and talk about the user behind her back.

See the following three PDF descriptions:

VEEG Artefact Specification describes all three documents.

VEEG State tables contains the spoken dialogues.

VEEG State Machine shows state-transition diagrams that describe the proposed behavior of the entity.

The documents were prepared in early December 2011, and reviewed by VEEG mid-January, 2012.
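The state-transition behaviour described in VEEG State Machine could be sketched as a simple transition table; the state and event names below are hypothetical placeholders, not taken from the VEEG documents.

```python
# Hypothetical sketch of a dialogue state machine for the entity.
# State and event names are illustrative only.
TRANSITIONS = {
    ("idle", "user_speaks"): "listening",
    ("listening", "user_stops"): "responding",
    ("responding", "done_speaking"): "idle",
}

def step(state, event):
    """Advance to the next state; stay put on events with no transition."""
    return TRANSITIONS.get((state, event), state)
```

A table-driven machine like this keeps the spoken-dialogue content (the state tables) separate from the control logic, which matches the split between the State tables and State Machine documents.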


These files contain the words that will be exchanged between the human user and the artificial voices of the entity. We will all contribute to them, and they will be the last thing to be cast in concrete. So feel free to post drafts here whenever possible, certainly before the April meeting.

eating apples at night by Kevin Jones

am I wearing a hat? by Kevin Jones

if you were on a desert island by Kevin Jones