Making virtual assistants sound human poses a challenge for designers

There’s a scene in the 2008 film Iron Man that offers a glimpse of future interactions between humans and artificial intelligence assistants. In it, Tony Stark’s virtual assistant J.A.R.V.I.S. responds with sarcasm and humour to Stark’s commands.

Tony Stark and his AI assistant J.A.R.V.I.S. work on a project.

Contemporary voice assistants like Siri and Alexa have yet to offer such natural, nuanced social chatter. Our team of computer science researchers at the University of British Columbia investigated what might be missing.

We found that voice interface designers face a dilemma: the tension between offering social conversation and getting things done.

Friendly or efficient?

Linguists classify human conversation into two categories: “social conversation,” such as greetings, humour and small talk, which expresses social relations and personal attitudes, and “transactional conversation,” which transmits factual or propositional information.

People effortlessly and naturally combine these two types of conversation. However, this blending happens somewhat subconsciously. Voice designers often fail to find the ideal blend because the two types of conversation are complementary but also conflicting.

The problem becomes pronounced when designers create voice assistants to help users complete tasks such as checking the weather or making a restaurant reservation. Designers try to enrich their voice agents’ dialogues with social courtesies such as sympathetic responses or chit-chat to make them sound more natural.

Our study showed that designers struggle to find an appropriate trade-off between designing an effective assistant and designing an affable companion. One participant noted that the more personality is added, the longer the dialogue becomes, leaving designers with a voice agent that is either overly chatty or cold and robotic.

Tool support and design guidelines for voice designers can help solve this problem. A proper scripting tool for designing voice assistant dialogues should help designers balance the trade-off. For example, an advanced dialogue-authoring tool might suggest that designers add friendly remarks to a script, or issue a warning if the social chatter runs too long.
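To make the idea concrete, here is a minimal sketch in Python of the kind of check such an authoring tool might run. It is an illustration only, not a tool from our study: the Turn class, the review_script function and the 40 per cent threshold are all hypothetical, and the sketch assumes the designer has already labelled each line of the script as social or transactional.

```python
from dataclasses import dataclass

# Hypothetical threshold: the largest share of a script's words
# that may be social chatter before the tool warns the designer.
MAX_SOCIAL_SHARE = 0.4


@dataclass
class Turn:
    text: str
    kind: str  # "social" or "transactional", labelled by the designer


def review_script(turns, max_social_share=MAX_SOCIAL_SHARE):
    """Return advisory messages for a drafted assistant script."""
    social_words = sum(len(t.text.split()) for t in turns if t.kind == "social")
    total_words = sum(len(t.text.split()) for t in turns)
    messages = []
    if total_words == 0:
        return messages  # nothing to review
    if social_words == 0:
        # No social content at all: nudge toward a friendlier script.
        messages.append("suggestion: consider adding a friendly remark")
    elif social_words / total_words > max_social_share:
        # Social content dominates: nudge toward a shorter script.
        messages.append("warning: social chatter may be too lengthy")
    return messages
```

A script containing only a weather report would trigger the suggestion, while one that buries a three-word answer under a long aside would trigger the warning. A real tool would need a far better measure of chattiness than a word count.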

Design guidelines should also provide prescriptive directions on how to combine these two types of conversation in different situations. For example, voice assistants should only use witty sarcasm when the user’s tone of voice suggests they are in a good mood.
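A guideline like the sarcasm rule could be expressed as a simple condition in a dialogue script. The sketch below is hypothetical: the choose_reply function and the "good" mood label are invented for illustration, and it assumes a separate component has already estimated the user's mood from their tone of voice.

```python
def choose_reply(plain_reply, witty_reply, detected_mood):
    """Use the witty variant only when the user's mood was detected as good."""
    # detected_mood is assumed to come from a separate tone-of-voice
    # classifier; any value other than "good" falls back to the plain reply.
    if detected_mood == "good":
        return witty_reply
    return plain_reply
```

Defaulting to the plain reply whenever the mood is unknown or negative keeps the assistant on the safe side when the mood estimate is unreliable.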

Collecting our emotions

To provide natural conversational experiences with voice agents, tech giants such as Apple, Amazon and Google will need to collect a lot of information about users’ conversation contexts, such as where they are, what they do, what they want and even how they feel. Indeed, scientists at Amazon are trying to understand our emotions based on our utterances.

By listening in on conversations, corporations can learn a lot about users’ health, finances and social lives. Are users willing to give extensive data away to these tech giants in service of more natural conversational experiences with voice agents? What is needed for a more ethical and desirable future with voice agents?

Through natural conversations with voice assistants, we should be able to access cutting-edge AI technologies easily, without the tedious learning process often experienced with graphical user interfaces. Recent technological advancements such as the development of nearly human-level language-generation models and speech synthesis promise the advent of truly natural voice agents.

Striking a balance between a benevolent assistant and a friendly interlocutor is within reach, but it will take more research to generate significantly better tool support for voice interface designers, and will require users to share their data.

Yelim Kim, a user experience designer at Ubisoft who was involved in the research, co-authored this article.