Tuesday, December 25, 2007

On Silence: Part II

3. Between categories of options

In our example, the system plays the user three stock-related commands to choose from, and then one more option for transferring to a manager. Since the fourth option is not a stock-related command, a one-second pause should be inserted between the last stock command option and the announcement for the next command, "You can also say..."

SYSTEM: What would you like to do next? You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

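For readers who script their prompts, here is a minimal sketch of how the grouping could be expressed in Python. The function name `build_prompt` and the segment format (strings for phrases, integers for silences in milliseconds) are my own assumptions for illustration, not part of any platform. The half-second value is the one suggested in Part I, and the one-second value is the category pause described above.

```python
OPTION_PAUSE_MS = 500     # after an announcement and between options (Part I)
CATEGORY_PAUSE_MS = 1000  # between one category of options and the next


def build_prompt(categories):
    """Flatten (announcement, options) groups into phrase and pause segments.

    Strings are phrases to play; integers are silences in milliseconds.
    """
    segments = []
    for index, (announcement, options) in enumerate(categories):
        if index > 0:
            # A longer pause signals that a new category is starting.
            segments.append(CATEGORY_PAUSE_MS)
        segments.append(announcement)
        for option in options:
            segments.append(OPTION_PAUSE_MS)
            segments.append(option)
    return segments


menu = build_prompt([
    ("What would you like to do next? You can say,",
     ["Get quotes,", "Buy stock, or", "Sell stock."]),
    ("You can also say,", ["Speak to a manager."]),
])
total_silence_ms = sum(s for s in menu if isinstance(s, int))
```

Keeping the two durations as separate named constants makes it easy to tune the category pause without touching the pauses between options.
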
4. When interacting with power-users

Most of the users of the stock-management application we are using for this example are going to be repeat users – that is, power-users who will not want to listen to all the menu options every time they call. In such heavy power-user applications, use silences prior to listing menu options. In this case, add a two-second pause after, "What would you like to do next?"

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

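The control flow behind this pause can be sketched roughly as follows, assuming the two seconds are meant as a window in which a repeat caller can speak before the menu starts. The `play` and `listen` callbacks are hypothetical stand-ins for whatever your telephony platform provides, and the five-second timeout after the menu is an arbitrary placeholder.

```python
EXPERT_PAUSE_S = 2.0  # pause after the question, per the guideline above


def ask_next_action(play, listen):
    """Ask the question, then list the menu only if the caller stays silent.

    `play(text)` and `listen(timeout_s)` are hypothetical callbacks;
    `listen` returns the caller's utterance, or None if nothing was said
    within the timeout.
    """
    play("What would you like to do next?")
    utterance = listen(EXPERT_PAUSE_S)
    if utterance is not None:
        return utterance  # a repeat caller answered during the pause
    play('You can say, "Get quotes," "Buy stock," or "Sell stock."')
    play('You can also say, "Speak to a manager."')
    return listen(5.0)  # assumed timeout for the response after the menu


# Simulate a power-user who answers right away.
played = []
answer = ask_next_action(played.append, lambda timeout_s: "Get quotes")
```

In the simulated call, only the opening question ends up in `played`, since the caller answered before the menu was needed.
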
5. After echoing

A brief echo from the system of the option selected by the user can serve as a reassuring confirmation that the system understood what the user said, or, in case of misrecognition, as a quick indication of error. In either case, insert a brief silence after the echo. In case of correct recognition, the silence will prepare the user for the next prompt, while in case of misrecognition, it will give the user an opportunity to barge in with a correction. (Of course, you will need to configure an error strategy that can elegantly recover from such an error.)

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

User: Get quotes.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am…

6. Before and after TTS prompts

As we have mentioned in a previous newsletter, avoid mixing recorded prompts with computer-generated TTS prompts whenever possible, since mixed prompts make for an unpleasant audio experience. In cases where you have no choice but to mix human-recorded and computer-generated prompts, insert a pause between the recorded prompts and the TTS prompts. The silence will alleviate the jarring transition and will increase the level of listener comprehension.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am.
[SILENCE]
IBM is trading at
[SILENCE]
eighty two dollars and thirty five cents
[SILENCE]
MicroStrategy at
[SILENCE]
one hundred and three dollars and twenty four cents
[SILENCE]
and Google at
[SILENCE]
three hundred seventy four dollars and thirteen cents
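
One way to apply this rule mechanically is to tag each segment with its source and insert a pause wherever the source changes. The sketch below is only an illustration: the function name, the `(source, text)` tuple format, and the 300 ms default are my assumptions, and which phrases are recorded and which are TTS is guessed for the sake of the example.

```python
def insert_boundary_pauses(segments, pause_ms=300):
    """Insert a pause wherever playback switches between recorded and TTS audio.

    `segments` is a list of (source, text) pairs with source "recorded" or
    "tts". The 300 ms default is a placeholder, not a value from this post.
    """
    result = []
    previous_source = None
    for source, text in segments:
        if previous_source is not None and source != previous_source:
            result.append(("pause", pause_ms))
        result.append((source, text))
        previous_source = source
    return result


quote = insert_boundary_pauses([
    ("recorded", "IBM is trading at"),
    ("tts", "eighty two dollars and thirty five cents"),
    ("recorded", "MicroStrategy at"),
    ("tts", "one hundred and three dollars and twenty four cents"),
])
```

Because the pause is tied to the change of source rather than to fixed positions, two consecutive recorded phrases would play back to back with no extra silence between them.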

Here is the entire interaction, with all silences inserted:

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

User: Get quotes.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am
[SILENCE]
IBM is trading at
[SILENCE]
eighty two dollars and thirty five cents
[SILENCE]
MicroStrategy at
[SILENCE]
one hundred and three dollars and twenty four cents
[SILENCE]
and Google at
[SILENCE]
three hundred seventy four dollars and thirteen cents

Wednesday, December 12, 2007

On Silence: Part I

Silence is to VUI design what the number zero is to algebra. As a concept and a tool, it is at the same time essential, ubiquitous, and taken for granted. In this post, I highlight the main cases where the use of silences and pauses can contribute to a smoother, more usable VUI.

Take the following brief interaction between an IVR stock management application and a human user.

System: What would you like to do next? You can say, "Get quotes," "Buy stock," or "Sell stock." You can also say, "Speak to a manager."

User: Get quotes.

System: Getting quotes. As of 10:25 am, IBM is trading at eighty two dollars and thirty five cents, MicroStrategy at one hundred three dollars and twenty four cents, and Google at three hundred seventy four dollars and thirteen cents.

Let's pinpoint where silences can enhance the usability of the voice interface. I will list two points in the dialog where silences are needed. I will list more in the next post.

1. Prior to listing menu options

When the system is about to provide the user with a list of options, a brief, half-second pause should be inserted between the announcement prompt and the first option that is played to the listener.

System: What would you like to do next? You can say,
[SILENCE]
"Get quotes," "Buy stock," or "Sell stock."
You can also say,
[SILENCE]
"Speak to a manager."

2. Between options in a menu list

When listing options for the user to choose from, separate consecutive options with half-second silences. The pauses will give the listener time to decide whether to select the option or wait for the next option.

System: What would you like to do next? You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock." You can also say,
[SILENCE]
"Speak to a manager."

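If your platform accepts SSML, the half-second pauses can be written with the standard `<break>` element. Here is a minimal sketch in Python that builds such a prompt; the function name `menu_ssml` is my own, and whether your TTS engine or IVR platform honors `<break>` exactly as written is something to check in its documentation.

```python
from xml.sax.saxutils import escape


def menu_ssml(intro, options, pause_ms=500):
    """Render an announcement and its options as SSML, separated by pauses."""
    brk = f'<break time="{pause_ms}ms"/>'
    # One pause after the announcement, then one between consecutive options.
    parts = [escape(intro)] + [escape(option) for option in options]
    return "<speak>" + f" {brk} ".join(parts) + "</speak>"


ssml = menu_ssml(
    "What would you like to do next? You can say,",
    ["Get quotes,", "Buy stock, or", "Sell stock."],
)
print(ssml)
```

The text is passed through `escape` so that characters such as `&` or `<` in a prompt do not break the markup.
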
Tuesday, December 11, 2007

Why I prefer automation when I prefer automation

Here are 5 scenarios where I find myself preferring automation over human service. Now mind you, I am someone who loves to use self-checkout in the supermarket even when I have a cart full of stuff -- including produce that needs to be weighed.

1. I don't have to wait a long time to get what I want done by using self-service

2. I suspect that the human who will try to help me is not going to be well trained

3. It would take longer to get what I need done with a human, even if I didn't have to wait

4. I don't want to spend the emotional energy needed to interact with a human

5. I suspect that the human will try to sell me something and I'm not in a mood to listen to a pitch

Wednesday, December 5, 2007

Bruce Balentine's new book

Just started reading Bruce Balentine's new book, "It's Better to Be a Good Machine Than a Bad Person," and from the few pages I have read so far, it promises to be a very good and useful book.
Here is a quick nugget:
So speech as a business has been driven by enterprise buyers that deploy the technology for end users who have no say in the matter. In other words, IVR. (Emphases are his.) (p. 197)

Then he goes on to make the following provocative statement:
End users do not value ASR highly. Nor are they likely to do so--at least in very large numbers--at any time in the future. (Emphasis is his.) (p. 197)

Now, before you rush to dump your Nuance stock, Balentine is quick to qualify his statement with, "this book is not at all negative about speech technology" and "certain speech market niches--for example, IVR--are so important from a business perspective that their value is unquestionable (and oddly, not yet very well-mined)".

Balentine's aim seems to be to shock us into lowering the prevailing unreasonable "Jetsonian" expectations about ASR and into settling down to solve real problems in the real world. Or as he puts it quite well:
The key to solving the business problem of speech technologies is to let go of the future and to start working on the present. (p. 199)

Amen!

As for what "Jetsonian" means:
Jetsonian thinking goes like this: 'No problem is so great that we can't overcome it with technology. And no task is so trivial that it's not worth automating. The future is open-ended, and advancement has no cost. It is our manifest destiny to create new and complicated things in the name of progress, and even in the absence of needs.' (p. 109)

Monday, December 3, 2007

Absurd VUI

So I'm dropping my son off at school this morning and I see a "SAFETRK" sticker on the back of a pick-up truck. This is the golden sticker with a toll-free number you can call to give feedback about how safely the driver of the vehicle bearing the sticker is driving. So, while stopped at the red light, I pick up my cell phone and call 1-800-SAFETRK to see what it gives me.

Well, first, I'm not liking having to hunt around for the letters. I just hate that. Then I get greeted by a pleasant enough male voice, but it keeps talking and talking -- all useful info, but a minute goes by and I am still listening. Then the kicker: If I am calling from a cell phone and it's not convenient for me to punch in answers, I should hang up and call again, so that I can safely interact with the system....

Talk about absurdity: most of the people calling will be calling from a car, using their cell phones. The input the system asks for is digits: a problem that was solved long ago by Automatic Speech Recognition....

I'll investigate the offering some more and will keep you posted....