Ever wish in-game characters would turn round and start swearing at you after you mutter orders from your sofa? Gamer-character chats and squabbles could be here sooner than you think.
Nuance, the company behind Swype, S Voice and – reportedly – Apple’s virtual assistant Siri, is working to bring its voice recognition, control and biometrics tech to gaming.
The main goal is to let gamers hold interactive conversations with characters – questions, replies and clarifications – rather than simply barking basic voice commands at the screen.
The technology is being built by Nuance’s Innovation Team on top of the open-source Unity platform, which means we’re just as likely to see it on the iPad as we are on the PS4. The prototype will be finished “in a few weeks” and ready to show to game developers and manufacturers at the gaming conference E3 2014 in June. Kenneth Harper, a Technical Product Manager, says the demo will be a “first cut of what we think this immersive experience will look like.”
VIRTUAL REALITY AND IPAD GAMES
Seasoned gamers have seen plenty of attempts at natural voice control before, from bellowing ‘Fus Ro Dah!’ at a Kinect in Skyrim to the PS2 title Lifeline, an otherwise average survival horror game controlled by simple voice commands. So why now?
“Where this is going to take off even more is with gaming manufacturers working on virtual reality headsets,” says Harper. “That’s where speech and NLU (natural language understanding – see below) is going to play a much bigger part in our opinion.” For traditional consoles, it’s not confirmed yet whether Nuance would use controllers, gaming headsets or distance voice capture.
In the short term, it’s also a good fit for narrative and storytelling-based games such as the episodic Walking Dead games for iPhone and iPad by publisher Telltale Games. If you haven’t played them, the player moves through the story by tapping multiple-choice answers on screen when interacting with other characters. Nuance says this could be replaced by interactive conversations, and that developers such as Telltale, currently working on a Game of Thrones companion game, are the types of companies it wants to be working with.
JARGON BUSTER: NATURAL LANGUAGE UNDERSTANDING
How will it work? Nuance’s secret weapon is Natural Language Understanding.
In a nutshell, in Nuance products such as Dragon Mobile Assistant for Android and its customer care assistant Nina, the tech brings in context to extract meaning from users speaking naturally. That’s conversationally, as we would human-to-human, rather than relying purely on specific, predetermined keywords and phrases. Context can be supplied by structured knowledge stores on the web and previous interactions with that particular user.
Anyone who has used S Voice on a Galaxy S5 or the watered-down version on a Gear 2 knows this problem is far from fully solved. But Harper says that in the case of gaming, it would require application-specific development on top of Nuance technology. “In the case of Game of Thrones,” he says, “we’d be working with that developer to take NLU and feed in lots of questions that we would expect a user to be asking, or commands to direct the system to do something. A statistical model would then be created to handle those types of user queries.”
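Nuance hasn’t published how its gaming NLU works, but Harper’s description – feed in the questions and commands you expect players to say, then build a statistical model over them – maps onto a classic intent classifier. Here’s a minimal sketch in Python of that idea using a naive Bayes model; the intents, utterances and function names are all hypothetical, not Nuance’s actual system:

```python
from collections import Counter, defaultdict
import math

# Hypothetical training data: things a player might say, labelled by intent.
TRAINING = [
    ("attack the guard with my sword", "attack"),
    ("strike him down", "attack"),
    ("ask the innkeeper about the murder", "ask"),
    ("question the merchant about the stolen ring", "ask"),
    ("go to the castle gates", "move"),
    ("walk north along the river", "move"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count how often each token appears under each intent."""
    token_counts = defaultdict(Counter)
    intent_counts = Counter()
    for utterance, intent in examples:
        intent_counts[intent] += 1
        token_counts[intent].update(tokenize(utterance))
    return token_counts, intent_counts

def classify(query, token_counts, intent_counts):
    """Pick the intent with the highest naive Bayes score
    (log prior + smoothed log likelihood of each token)."""
    vocab = {t for counts in token_counts.values() for t in counts}
    total = sum(intent_counts.values())
    best_intent, best_score = None, float("-inf")
    for intent, n in intent_counts.items():
        score = math.log(n / total)  # prior
        denom = sum(token_counts[intent].values()) + len(vocab)  # Laplace smoothing
        for token in tokenize(query):
            score += math.log((token_counts[intent][token] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

token_counts, intent_counts = train(TRAINING)
print(classify("attack the knight", token_counts, intent_counts))  # → attack
```

Even a toy like this shows why the approach fits games: a query the developer never wrote down (“attack the knight”) still lands on the right intent because it shares vocabulary with the training utterances, and expanding coverage is just a matter of feeding in more labelled examples.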
The key to the future of voice is that companies like Nuance and Google use cloud data from gadgeteers trying out the feature to constantly expand what it’s capable of. With the right support, we’d hope updates to Nuance-powered games could be continuous.
“If we define an error as when the person doesn’t get the behaviour they expect out of the application, there have been really big advances in that. If we mean pure speech recognition, that hasn’t been the confounding problem for years now,” says Daniel Faulkner, VP for Voice To Text at Nuance.
“The trick has been the dialogue and the conversation and the design of the experience. Our ability to say to people ‘What do you want to do?’ and actually understand their answer and take the appropriate action is leaps and bounds ahead of where it was. It’s definitely not where it needs to be ultimately because even if you interact with the most sophisticated speech system, you’ll find its boundaries relatively quickly.
“What we see is a constant triangulation and iteration – somebody finds the boundaries of the technology, says ‘OK, that means I can’t ask this kind of question until the technology improves’. Then at some point it becomes known that now you can do these other sets of things. We’ve done our job when the user interface is unnoticeable and you don’t have to work at it.”