Heuristics: Lessons in the Art of Automated Conversation


Susan L. Hura, PhD
Intervoice, Inc.

July 2003

Applies to:
    Microsoft® Speech Technologies

Summary: Learn how to find the best path through the voice user interface (VUI) terrain and learn how to balance technological requirements and usability principles to produce superior speech interfaces. (3 printed pages)


The Lessons
Next Steps


The number of speech-enabled applications entering the consumer marketplace is rapidly expanding. Enterprises, government agencies, and wireless carriers are all choosing voice over touchtone for their automated telephone applications. Additionally, the rise of the SALT initiative brings a voice-enabled Web closer to reality. For voice user interfaces to be more than a novelty, speech developers need a roadmap for creating effective, likeable interactions.

Speech is hailed as a natural mode of interaction with applications, but how do you create a natural automated conversation? Callers want and expect to be able to speak and be understood with no undue effort when using a speech application. It is well established that effective speech-enabled applications require more than excellent speech technology. Even with near-perfect recognition performance, a speech interface must be easy to use if an application is to succeed. Voice user interfaces must be designed to anticipate the needs and preferences of the user and conform to the user's mental model of the domain and of spoken language. Both the performance of the technology and the design of the voice user interface are essential in speech applications.

What are the practical guidelines for balancing the needs of the user and the requirements of the recognizer in a voice-enabled interface? This paper and the Microsoft® Speech Technologies Webcast, Heuristics: Lessons in the Art of Automated Conversation, provide a roadmap for finding the best path through the voice user interface terrain—to balance technological requirements and usability principles to produce superior speech interfaces.

The Lessons

Intervoice has conducted usability evaluations of over thirty speech-enabled applications, providing us with a wealth of data about how to facilitate effective interactions between people and speech applications. Moreover, because we evaluate functional prototype applications, we have also learned a great deal about how traditional techniques for optimizing speech recognition performance can interfere with natural, conversational interaction. We have distilled usability findings across applications and domains into a set of heuristics, or rules of thumb, for voice user interface design. These guidelines will enable you to avoid pitting the user against the technology and create highly usable voice applications that make the most of speech recognition technology.

Many sets of heuristics for GUI and Web design have been published, and there is general consensus on the basic guidelines for creating usable GUIs. A subset of these principles applies across modalities and is equally relevant to usable VUI design. However, many of the principles presented here are specific to voice interaction. Below we present an overview of the heuristics for voice user interface design. In the Microsoft Speech Technologies Webcast referenced above, we will discuss each heuristic in detail, supported by relevant examples from real voice interfaces.

Lesson 1: Make It Real

Users come into their interaction with the automated system with a set of terminology, metaphors, and organizational structures already in place. That is, users have a mental model of the domain and their interaction. Usable applications tap into this knowledge to give users a head start in understanding how to interact with the application. This is an example of a modality-independent guideline that applies to any automated system.

Lesson 2: Clearly and Consistently Communicate System Capabilities

We are at a unique point in technological history. Today, many users have had more exposure to speech technologies via Star Trek than in real life. In the GUI realm, users draw upon their general computer experience, real-world experience in the domain of an application, and possibly previous experience with other applications. By contrast, many users have little or no previous experience interacting with VUIs. However, these same users are not naïve: their experience with spoken language is extensive and may dominate their interaction with the application.

Therefore, voice user interfaces must be carefully designed to help users understand the capabilities of the system. Interfaces need to unobtrusively guide users to speak predictable utterances and avoid the unconstrained conversational speech that we use talking to another person. The goal is to achieve a natural conversation within the technological boundaries of speech recognition.

Lesson 3: Minimize the Limitations of the Medium

Listening is a difficult task, especially if there are other demands on the user's attention. The user's auditory memory is limited to a few short items, and these are quickly forgotten. These limitations have substantial implications for a voice user interface. Navigation and overall architecture of the application must be transparent and easily retained for users to succeed. Moreover, the pace, word choice, and especially the intonation of prompts within a VUI are vital to helping users work in the auditory modality.
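One way to respect the limits of auditory memory is to keep each spoken menu short. The following is a minimal sketch in Python, not taken from the Webcast or from any speech platform API. The function names, the limit of three items per menu, and the "more options" wording are all illustrative assumptions.

```python
# Hypothetical sketch: split a long list of choices into short spoken menus.
# The default of three items per menu is an assumption, not a tested limit.

def chunk_menu(options, max_items=3):
    """Split options into consecutive groups of at most max_items."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return [options[i:i + max_items] for i in range(0, len(options), max_items)]


def render_prompt(chunk, has_more):
    """Build the spoken prompt for one group of options."""
    if len(chunk) == 1:
        spoken = chunk[0]
    elif len(chunk) == 2:
        spoken = f"{chunk[0]} or {chunk[1]}"
    else:
        spoken = ", ".join(chunk[:-1]) + ", or " + chunk[-1]
    prompt = f"You can say {spoken}."
    if has_more:
        # Tell the caller how to reach the choices not yet heard.
        prompt += " For other choices, say 'more options'."
    return prompt
```

For example, a five-item list would be presented as one three-item menu followed by a two-item menu, so the caller never has to hold more than three choices in memory at once.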

Lesson 4: Help the User Avoid Escalating Errors and Recover from Errors Gracefully

Speech recognition technology is imperfect, and users encounter failure for various reasons. Misrecognitions, time-outs, and out-of-vocabulary speech all occur regularly. To speech developers, these are distinct problems that require specialized remedies. The impression of the user, however, is simply that the system isn't working as they expect it to.

With careful consideration of error handling and cleverly designed help, we can produce applications that minimize the impact of problems that users will inevitably encounter. Applications should give users advice when they are likely to need it and allow callers to change their minds, make corrections, and try again.
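The idea of giving more help as errors accumulate can be sketched as follows. This is an illustrative Python example under stated assumptions, not the implementation of any particular speech platform: the error categories, prompt wording, and function name are hypothetical, and a real application would tune them through usability testing.

```python
# Hypothetical sketch of escalating error recovery. The developer sees distinct
# error types; the caller hears one consistent, increasingly helpful response.
from enum import Enum


class RecoError(Enum):
    NO_MATCH = "no_match"  # misrecognition or out-of-vocabulary speech
    TIMEOUT = "timeout"    # the caller said nothing


APOLOGY = {
    RecoError.NO_MATCH: "Sorry, I didn't understand.",
    RecoError.TIMEOUT: "Sorry, I didn't hear anything.",
}

# Each level gives more explicit guidance than the one before it.
GUIDANCE = [
    "Which account would you like?",
    "You can say 'checking' or 'savings'.",
    "Please say 'checking' or 'savings', or say 'agent' to speak with someone.",
]


def recovery_prompt(error, consecutive_errors):
    """Return a prompt whose guidance grows with the number of consecutive errors."""
    if consecutive_errors < 1:
        raise ValueError("consecutive_errors must be at least 1")
    level = min(consecutive_errors, len(GUIDANCE)) - 1
    return f"{APOLOGY[error]} {GUIDANCE[level]}"
```

The first failure simply repeats the question, while later failures spell out the accepted phrases and offer a way out, so the caller is not trapped in a loop of identical error messages.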

Lesson 5: Make the User Comfortable Using the Technology

Those of us in the speech business are technophiles who enjoy using new technology for its own sake. This is not true for most users of speech-enabled applications. Users tend to care more about accomplishing their goals than about cool technology. And recall that many users have little or no experience with speech technology, so they are unsure what to expect.

Applications need to provide users with reassurance that their spoken input was accepted and that their transactions will be processed appropriately. Remember, an automated application is valuable only if callers are comfortable enough to use it.
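A common way to provide this reassurance is to echo back what the system heard. The sketch below is a hypothetical Python illustration. It assumes the recognizer reports a confidence score between 0 and 1, which the article does not discuss, and the threshold of 0.6 and the prompt wording are arbitrary placeholders.

```python
# Hypothetical sketch: choose how to confirm spoken input.
# Assumes a recognizer confidence score in the range 0.0 to 1.0.

def confirmation_prompt(value, confidence, explicit_below=0.6):
    """Acknowledge input briefly when confident; ask the caller to verify otherwise."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < explicit_below:
        # Low confidence: let the caller correct the system before it proceeds.
        return f"I think you said {value}. Is that right?"
    # High confidence: reassure the caller without slowing the conversation.
    return f"Okay, {value}."
```

In either case the caller hears what the system accepted, which also gives them a natural point to make a correction.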

Next Steps

This article has laid out a set of heuristics for usable speech applications. Careful VUI design following these heuristics will produce applications that satisfy users by giving them a simple, effective method for solving their problems.

There is both art and science to designing voice user interfaces according to these guidelines. Even carefully designed applications benefit from a formal usability evaluation. The heuristics presented here are based upon results of numerous usability tests with speech-enabled applications. This does not eliminate the need for usability testing each new VUI. Techniques and metrics for VUI usability testing will be covered in a future article.