Tuesday, December 25, 2007

On Silence: Part II

3. Between categories of options

In our example, the system plays three stock-related commands for the user to choose from, and then plays one more option for transferring to a manager. Since the fourth option is not a stock-related command, a one-second pause should be inserted between the last stock command option and the announcement for the next command, "You can also say..."

SYSTEM: What would you like to do next? You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

4. When interacting with power-users

Most of the users of the stock-management application we are using for this example are going to be repeat users – that is, power-users who will not want to listen to all the menu options every time they call. In such heavy power-user applications, insert a silence before the menu options are listed, so that callers who already know what they want can speak right away. In this case, add a two-second pause after, "What would you like to do next?"

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

5. After echoing

A brief echo from the system of the option selected by the user can serve as a reassuring confirmation that the system understood what the user said, or, in case of misrecognition, as a quick indication of error. In either case, insert a brief silence after the echo. In case of correct recognition, the silence will prepare the user for the next prompt, while in case of misrecognition, it will give the user an opportunity to barge in with a correction. (Of course, you will need to configure an error strategy that can elegantly recover from such an error.)

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

User: Get quotes.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am…

6. Before and after TTS prompts

As we have mentioned in a previous newsletter, avoid mixing recorded prompts with computerized, TTS prompts. Mixed prompts make for an unpleasant audio experience and should be avoided whenever possible. In cases where you have no choice but to mix human-recorded and computer-generated prompts, insert a pause between the recorded prompts and the TTS prompts. The silence will alleviate the jarring transition and will increase the level of listener comprehension.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am.
[SILENCE]
IBM is trading at
[SILENCE]
eighty two dollars and thirty five cents
[SILENCE]
MicroStrategy at
[SILENCE]
one hundred and three dollars and twenty four cents
[SILENCE]
and Google at
[SILENCE]
three hundred seventy four dollars and thirteen cents

Here is the entire interaction, with all silences inserted:

SYSTEM: What would you like to do next?
[SILENCE]
You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock."
[SILENCE]
You can also say,
[SILENCE]
"Speak to a manager."

User: Get quotes.

SYSTEM: Getting quotes.
[SILENCE]
As of 10:25 am
[SILENCE]
IBM is trading at
[SILENCE]
eighty two dollars and thirty five cents
[SILENCE]
MicroStrategy at
[SILENCE]
one hundred and three dollars and twenty four cents
[SILENCE]
and Google at
[SILENCE]
three hundred seventy four dollars and thirteen cents
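
The pause lengths suggested in this series (half a second before the first option and between options, one second between categories of options, two seconds after the opening question for power-users) can be kept in one place rather than scattered across prompts. Here is a minimal Python sketch that renders the menu as SSML, using the standard `<break time="..."/>` element for each silence. The function name `build_menu_ssml`, its parameters, and the constants are hypothetical, and the bare `<speak>` wrapper is an assumption: a real platform may require a `version` attribute and namespace, or may not accept SSML at all.

```python
from xml.sax.saxutils import escape

# Pause lengths taken from the guidelines in this series (milliseconds).
BEFORE_OPTIONS_MS = 500       # before the first option in a list
BETWEEN_OPTIONS_MS = 500      # between consecutive options
BETWEEN_CATEGORIES_MS = 1000  # before a new category of options
POWER_USER_MS = 2000          # after the opening question


def pause(ms):
    """Return an SSML break element for a silence of `ms` milliseconds."""
    return f'<break time="{ms}ms"/>'


def build_menu_ssml(question, categories, power_users=True):
    """Render a question plus categories of options as one SSML string.

    `categories` is a list of (announcement, [options]) pairs, for example
    ("You can say,", ["Get quotes", "Buy stock", "Sell stock"]).
    """
    parts = [escape(question)]
    for index, (announcement, options) in enumerate(categories):
        if index == 0:
            # Give repeat callers a chance to answer before the menu starts.
            if power_users:
                parts.append(pause(POWER_USER_MS))
        else:
            parts.append(pause(BETWEEN_CATEGORIES_MS))
        parts.append(escape(announcement))
        parts.append(pause(BEFORE_OPTIONS_MS))
        for position, option in enumerate(options):
            if position > 0:
                parts.append(pause(BETWEEN_OPTIONS_MS))
            parts.append(escape(option) + ".")
    return "<speak>" + " ".join(parts) + "</speak>"


menu = build_menu_ssml(
    "What would you like to do next?",
    [
        ("You can say,", ["Get quotes", "Buy stock", "Sell stock"]),
        ("You can also say,", ["Speak to a manager"]),
    ],
)
print(menu)
```

For simplicity the sketch drops the "or" before the last option; a recorded-prompt application would keep it in the audio itself.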

Wednesday, December 12, 2007

On Silence: Part I

Silence is for VUI design what the number zero is for algebra. As a concept and a tool, it is at the same time essential, ubiquitous, and taken for granted. In this post, I highlight the main cases where the use of silences and pauses can contribute to a smoother, more usable VUI.

Take the following brief interaction between an IVR stock management application and a human user.

System: What would you like to do next? You can say, "Get quotes," "Buy stock," or "Sell stock." You can also say, "Speak to a manager."

User: Get quotes.

System: Getting quotes. As of 10:25 am, IBM is trading at eighty two dollars and thirty five cents, MicroStrategy at one hundred three dollars and twenty four cents, and Google at three hundred seventy four dollars and thirteen cents.

Let's pinpoint where silences can enhance the usability of the voice interface. I will list three points in the dialog where silences are needed. I will list more in the next post.

1. Prior to listing menu options

When the system is about to provide the user with a list of options, a brief, half-second pause should be inserted between the announcement prompt and the first option that is played to the listener.

System: What would you like to do next? You can say,
[SILENCE]
"Get quotes," "Buy stock," or "Sell stock."
You can also say,
[SILENCE]
"Speak to a manager."

2. Between options in a menu list

When listing options for the user to choose from, separate consecutive options with half-second silences. The pauses will give the listener time to decide whether to select the option or wait for the next option.

System: What would you like to do next? You can say,
[SILENCE]
"Get quotes,"
[SILENCE]
"Buy stock," or
[SILENCE]
"Sell stock." You can also say,
[SILENCE]
"Speak to a manager."

Tuesday, December 11, 2007

Why I prefer automation when I prefer automation

Here are 5 scenarios where I find myself preferring automation over human service. Now mind you, I am someone who loves to self check out in the supermarket even when I have a cart full of stuff -- including produce that needs to be weighed.

1. I don't have to wait a long time to get what I want done by using self-service

2. I suspect that the human who will try to help me is not going to be well trained

3. It would take longer to get what I need done with a human, even if I didn't have to wait

4. I don't want to spend the emotional energy needed to interact with a human

5. I suspect that the human will try to sell me something and I'm not in a mood to listen to a pitch

Wednesday, December 5, 2007

Bruce Balentine's new book

Just started reading Bruce Balentine's new book, "It's Better to Be a Good Machine Than a Bad Person," and from the few pages I have read so far, it promises to be a very good and useful book.
Here is a quick nugget:
So speech as a business has been driven by enterprise buyers that deploy the technology for end users who have no say in the matter. In other words, IVR. (Emphases are his.) (p. 197)

Then he goes on to make the following provocative statement:
End users do not value ASR highly. Nor are they likely to do so--at least in very large numbers--at any time in the future. (Emphasis is his.) (p. 197)

Now, before you rush to dump your Nuance stock, Balentine is quick to qualify his statement with, "this book is not at all negative about speech technology" and "certain speech market niches--for example, IVR--are so important from a business perspective that their value is unquestionable (and oddly, not yet very well-mined)".

Balentine's aim seems to be to shock us into lowering the prevailing unreasonable "Jetsonian" expectations about ASR and to settle down to solving real problems in the real world. Or as he puts it quite well:
The key to solving the business problem of speech technologies is to let go of the future and to start working on the present. (p. 199)

Amen!

As for what "Jetsonian" means:
Jetsonian thinking goes like this: 'No problem is so great that we can't overcome it with technology. And no task is so trivial that it's not worth automating. The future is open-ended, and advancement has no cost. It is our manifest destiny to create new and complicated things in the name of progress, and even in the absence of needs.' (p. 109)

Monday, December 3, 2007

Absurd VUI

So I'm dropping my son off at school this morning and I see a "SAFETRK" sticker on the back of a pick-up truck. This is the golden sticker with a toll-free number you can call to give your feedback about how safe the driver of the vehicle bearing the sticker is being. So, while stopped at the red light, I pick up my cell phone and call 1-800-SAFETRK to see what it gives me.

Well, first, I'm not liking having to hunt around for the letters. I just hate that. Then I get greeted by a pleasant enough male voice, but it keeps talking and talking -- all useful info, but a minute goes by and I am still listening. Then the kicker: If I am calling from a cell phone and it's not convenient for me to punch in answers, I should hang up and call again, so that I can safely interact with the system....

Talk about absurdity: most of the people calling will be calling from a car and they will be using their cell phones. The input the system asks for is digits: a problem that Automatic Speech Recognition solved long ago....

I'll investigate the offering some more and will keep you posted....

Tuesday, November 27, 2007

Measuring satisfaction

I've always felt that post-call customer surveys are methodologically unsound -- or at best only as sound as online polls. The people who bother to take the surveys are self-selecting, and chances are that the people who respond were really happy about the service and were moved enough to trouble themselves with hanging on and providing their feedback, or -- which is more likely -- were really pissed and needed to vent their frustration.

But there is another layer still that makes these surveys suspect: agent interference. That is, agents selecting which callers to nudge to the survey, and when nudging, trying to influence them to give them good marks.

Check out this article from Service Untitled. It talks about agents "begging" callers to give them good marks!

Such surveys can still be useful if used, for instance, to identify which calls to listen to for training purposes (listen to the ones that got really bad or really good marks). They can also be used to route callers to a manager in case the marks entered are really bad, etc.

Friday, November 23, 2007

The Automation Quadrant

Here's a high-level visualization I've come up with of the interplay between task complexity, levels of task automation, and the possible resulting combinations of agent and user satisfaction levels.

#1: An easy task that is automated makes the agent happy because they don't have to deal with mundane, mindless tasks (What is my balance? What's the status of my order?).

#2: The user is happy because they can get through their call quickly, without having to wait for an agent. No waiting and the task is completed quickly.

#3: Trying to automate a very complex task probably means getting rid of agents.

#4: Having users go through complex tasks with an automated system rarely makes the user happy.

#5: The agent is unhappy because they are having to deal with mundane, mindless tasks (what is my balance, what's the status of my order). Their job is demoralizing, soul-deadening.

#6: The user is made to wait for an agent to do something that should take just a few seconds. And they get to interact with a demoralized agent.

#7: The agent is being used for complex tasks that cannot be easily automated: a fulfilling job.

#8: The user is glad that a human is helping them through a difficult task.

Barge-in: Part II

Here are 6 guidelines for when to turn barge-in on or off.

1. Turn off barge-in during the opening prompt

You usually want to turn your barge-in off at the very beginning of your application (unless it is used mainly by repeat users), where you are greeting the user and preparing them for the interaction. The same holds for the beginning of a new section in the call – e.g., a section in a survey.

2. Turn on barge-in when playing a menu prompt

Usually, the barge-in should be turned on while listing options, thus giving the user the ability to speak or enter their selection as soon as they hear it. However, in situations where it is important that the user hear all the options before making their selection, barge-in should be turned off. In either case, it is always a good idea to alert the user that they can either speak as soon as they hear the option they want to select, or that they will need to listen to the whole prompt before making their selection.

3. Turn off barge-in during transitions

Transitions are also sensitive points in an exchange, since they signal both the end of a phase and the beginning of the next phase. You do not want the marking of this important moment in the exchange to be marred by an unintentional barge-in.

4. Turn off barge-in when requesting confirmation

By definition, a system requests confirmation when the consequences of making a recognition error are significant. Minimize the likelihood of interruptions when requesting confirmation.

5. Turn off barge-in when providing information

Minimize user frustration by reducing the likelihood of having the system be inadvertently interrupted by the user when the prompt being played contains information that is of interest to the user.

6. Turn off barge-in in error prompts

For instance, if a user thinks the system is expecting a date when in fact it is expecting a zip code, switching barge-in off so that the user can hear, "Sorry, I didn't get that. Please give me your zip code," would help the user recover. If barge-in is on, the user is liable to interrupt the prompt after, "Sorry, I didn't get that," and again mistakenly give the application a date.
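
The six guidelines above amount to a small lookup table with two exceptions. Here is a minimal Python sketch of that table. The names `PromptType`, `BARGE_IN_DEFAULTS`, and `barge_in_enabled` are hypothetical and do not belong to any particular IVR platform; on a real platform the resulting value would be passed to whatever per-prompt barge-in setting it exposes.

```python
from enum import Enum


class PromptType(Enum):
    OPENING = "opening"
    MENU = "menu"
    TRANSITION = "transition"
    CONFIRMATION = "confirmation"
    INFORMATION = "information"
    ERROR = "error"


# Defaults from the six guidelines: only menu prompts allow interruption.
BARGE_IN_DEFAULTS = {
    PromptType.OPENING: False,       # guideline 1
    PromptType.MENU: True,           # guideline 2
    PromptType.TRANSITION: False,    # guideline 3
    PromptType.CONFIRMATION: False,  # guideline 4
    PromptType.INFORMATION: False,   # guideline 5
    PromptType.ERROR: False,         # guideline 6
}


def barge_in_enabled(prompt_type, repeat_users=False, must_hear_all_options=False):
    """Return True if the user should be allowed to interrupt this prompt."""
    if prompt_type is PromptType.OPENING and repeat_users:
        # Exception in guideline 1: applications used mainly by repeat users.
        return True
    if prompt_type is PromptType.MENU and must_hear_all_options:
        # Exception in guideline 2: the user must hear every option first.
        return False
    return BARGE_IN_DEFAULTS[prompt_type]


for prompt_type in PromptType:
    print(prompt_type.value, barge_in_enabled(prompt_type))
```

Keeping the defaults in one table makes it easy to review the whole policy at a glance and to override a single prompt when the call flow demands it.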

Wednesday, November 21, 2007

Barge-in: Part I

Barge-in -- the user's ability to interrupt a system prompt with voice or DTMF input -- is a very useful tool that the VUI designer can tap into to adapt to the various exchange settings the system may need to navigate. The challenge for the VUI designer is determining when to give the user the ability to interrupt and when to take it away from them.

As a rule, you should let the user interrupt system prompts with their input, unless there is a good reason to take that ability away from them. Three broad parameters need to be weighed when considering turning off barge-in.

User's level of experience with the system

The first and perhaps most obvious factor the designer must consider when debating whether to turn barge-in on or off is the user’s level of experience and familiarity with the application. If the vast majority of users are going to be repeat users and therefore very familiar with the call flow and the options available at any point in the call (e.g., employee check-in/check out), then the default setting for barge-in should be “on”. If the user is going to be a one-time or very infrequent user (e.g., a phone survey), then the barge-in setting should default to “off”.

Call environment

If your application is called from noisy environments (e.g., busy street, factory floor), consider doing two things: first, set the value of your speech recognizer’s sensitivity below the default setting (this will enable the system to tolerate a higher threshold of noise without taking it as input from the user), and second, turn barge-in off, thus at least ensuring that the system will be able to complete playing the prompt to the user.

Conversational context

The third dimension the VUI designer must keep in mind to decide whether to turn barge-in on or off is the conversational context – i.e., where in the structure of the application the call is.
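
The first two parameters can be turned into an application-wide default before conversational context is considered prompt by prompt. Here is a minimal Python sketch. `CallSettings` and `default_call_settings` are hypothetical names, and letting a noisy environment override user experience is my assumption about precedence, not a rule stated above.

```python
from dataclasses import dataclass


@dataclass
class CallSettings:
    barge_in: bool           # application-wide default for barge-in
    lower_sensitivity: bool  # set recognizer sensitivity below its default


def default_call_settings(repeat_users, noisy_environment):
    """Pick application-wide defaults from user experience and call environment."""
    if noisy_environment:
        # Noise can be mistaken for input, so protect the prompts first.
        return CallSettings(barge_in=False, lower_sensitivity=True)
    # Repeat users know the call flow and benefit from interrupting;
    # one-time or infrequent users should hear the prompts in full.
    return CallSettings(barge_in=repeat_users, lower_sensitivity=False)


print(default_call_settings(repeat_users=True, noisy_environment=False))
print(default_call_settings(repeat_users=True, noisy_environment=True))
```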

In the next couple of posts, I will list 6 guidelines as to when to turn barge-in on or off.

Saturday, November 17, 2007

Help fix Speech Technology Magazine's VUI

This month's SpeechTech Magazine lead editorial by known speech technology expert."

The expert apparently lambasted Myron by pointing out that the system opened with "a long, rambling monologue that provided virtually no value" and added that it "even had the ‘please listen carefully as our stuff has changed’ nonsense. Surprised that you didn’t have a Web hype or a couple of ‘Your call is important to us’ included. When I pressed 0, the system said that wasn’t valid and then turned around and told me to press 0 to reach an operator."

First, I give David credit for owning up to committing close to an egregious mistake by not making sure that the Magazine's IVR was a showcase of what speech technology can do.

I also give him credit for taking the next step to fix the problem:

[W]e’re turning to the readers of this magazine for help. I encourage independent VUI consultants to call 212-251-0608, navigate through our IVR system, and email me suggestions for improvement. The VUI designer with the best suggestions will be announced in the magazine, win a three-month, full-page ad placement in Speech Technology magazine (our production team will even create the ad for you), and we will also place the winner on our editorial advisory board for the next year. If this isn’t enough, we may even dance around the water cooler and chant your name.

One thing to note here is that David doesn't seem to understand something fundamental about design (VUI or otherwise): the need to talk to the customer and to understand what they want out of the IVR application when designing a system. Instead, he talks about the system as something out there that needs to be tweaked "objectively". I will ping him on this and maybe even offer to host their solution for free at Angel.com.

And by the way, I can bet my bottom dollar that the so-called "known speech technology expert" was none other than Walt Tetschner. He has three or four things he complains about constantly (and correctly I would say), and he touched on three of them here: length of opening prompt, the "please listen as our stuff has changed" and of course, pressing zero and getting to a human. But the real giveaway is the use of the word "nonsense" to describe behavior he doesn't like. A pretty predictable character, I must say....

Thursday, November 15, 2007

Dictionary of VUI Terms

Finally got around to doing something I've been meaning to do for a year now: publish an on-line, interactive dictionary of VUI terms.

Check it out at: http://www.lingospace.com/VUI

The dictionary is a work in progress, and I will be feeding a bunch of terms in the next few days. Please feel free to add to it via: http://www.lingospace.com/VUI/addword.asp

Sunday, November 4, 2007

Let sentences end in a preposition

If the more natural-sounding way of asking a question or providing information has you ending a sentence with a preposition, so be it. Do not twist sentences into weird-sounding phraseologies just to satisfy some English grammar rule that, in any case, is almost always disregarded in speech.

Here is a not-so-natural sounding prompt:

System: From where will you be leaving?

Here is a more natural sounding prompt:

System: Where are you leaving from?

Saturday, October 20, 2007

Put the most important information first

When providing requested information, put at the beginning of the prompt the most important elements of the system’s response. (And “important” is what the user wants to hear first, not what the company wants the user to hear.) Otherwise, you will risk having the user interrupt the prompt and miss the important information they were seeking.

Friday, October 12, 2007

Remember that the user will mimic the system

When you design your prompts, keep in mind that the user will be listening carefully to the language spoken by the system. If the system uses stiff, robotic, or verbose language, so will the user. The user will also closely mimic the very wording used by the system. This can cut both ways. First, if you have to use slang or jargon, make sure that your grammar includes such slang or jargon. By the same token, if you want the user to use specific phrases or adopt a certain style of responding to your questions, all you have to do is write the prompts in language that implicitly illustrates to the user how you want them to speak.
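
Because users echo the prompt wording, it is worth checking that every phrase the system offers is also a phrase the recognizer accepts. Here is a minimal Python sketch of such a check. The functions `normalize` and `missing_from_grammar` are hypothetical, and the sketch assumes the grammar can be exported as a flat list of accepted phrases, which is a simplification of how real speech grammars are written.

```python
import re


def normalize(phrase):
    """Lowercase a phrase and strip punctuation so comparisons are forgiving."""
    return " ".join(re.findall(r"[a-z0-9']+", phrase.lower()))


def missing_from_grammar(prompt_phrases, grammar_phrases):
    """Return the prompt phrases that the grammar would not recognize."""
    accepted = {normalize(phrase) for phrase in grammar_phrases}
    return [phrase for phrase in prompt_phrases if normalize(phrase) not in accepted]


prompt_options = ["Get quotes,", "Buy stock,", "Sell stock.", "Speak to a manager."]
grammar = ["get quotes", "buy stock", "sell stock"]
print(missing_from_grammar(prompt_options, grammar))
```

Any phrase the check reports is one the user is likely to repeat back and the system is likely to reject.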

Friday, October 5, 2007

Use language that is commonly used in conversation

The closer to a human agent’s language you can make your application sound, the more usable it will be. This is not a call to try to fool the user into thinking that they are talking to a real human being. Rather, it is a modest suggestion to avoid having the system say things that a human being would never say in normal circumstances. Don’t be tempted by formal, cramped language, or language that reads nicely but would never be spoken by a socially competent human being.

Here is a bad prompt:

System: Please tell me the date of your birth, including the month, day, and year.

Here is a better prompt:

System: What is your birthday?

Tuesday, September 18, 2007

Postpone the call-recording disclaimer

If you are recording only the interactions between users and agents, and not the automated part of calls, then delay the obligatory message, “This call may be recorded for quality assurance purposes,” until just before the call is transferred to an agent. Such disclaimers not only lengthen the opening prompt but have come to be taken as a cue that the user is about to be transferred to a live agent and may needlessly frustrate users when no such transfer follows.

Thursday, September 6, 2007

Don't mention the web site in the opening prompt

Chances are that the people calling you not only already know that you have a web site, but that they got hold of your phone number from the web site. Unless this is not the case (your users are answering an infomercial, for instance), don't risk insulting your users' intelligence or making them feel that you really don’t want to interact with them over the phone. If you have to mention the web site, try to mention a specific page where they can find help, and mention it at moments where you have determined that the help they are seeking could be obtained via the web. A good place to mention the web site is at the closing of the call.

Thursday, August 23, 2007

Drop "For English Press…."

If the application is multi-lingual and you need a language selector, drop the “For English” option from the list of language options if the application starts off speaking in English. Have it pause 1.5 or 2.0 seconds after the language instructions have completed and then proceed with the rest of the dialog in English if the user does not respond.

System: Widget Solutions. [Audio Icon]. Intelligence at your service. Para Espanol, oprima el dos.
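
The logic behind dropping "For English" is simply a timeout with a default. Here is a minimal Python sketch. `select_language`, `LANGUAGE_KEYS`, and the `wait_for_key` callable are hypothetical: `wait_for_key` stands in for whatever your platform offers to collect a DTMF key, and is assumed to return the key as a string or `None` if the timeout expires.

```python
DEFAULT_LANGUAGE = "en"      # the application already starts in English
LANGUAGE_KEYS = {"2": "es"}  # "Para Espanol, oprima el dos."


def select_language(wait_for_key, timeout_seconds=2.0):
    """Return the caller's language, falling back to English on silence.

    `wait_for_key` is called with the timeout and should return the key
    pressed as a string, or None if the caller pressed nothing in time.
    """
    key = wait_for_key(timeout_seconds)
    return LANGUAGE_KEYS.get(key, DEFAULT_LANGUAGE)


# Stand-ins for a real telephony call, for illustration only.
print(select_language(lambda timeout: "2"))
print(select_language(lambda timeout: None))
```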


Friday, August 10, 2007

Establish that they can use speech

If the application is speech enabled, make sure that your opening prompt conveys that upfront. Have the first interaction ask them to answer a yes/no question, for instance, by explicitly saying, “you can say ‘yes’ or ‘no’”. However, make sure that you don’t explicitly announce to the user that the application is speech enabled. Chances are that, whatever formulation you choose to make that announcement, some people will not understand what it means.

Friday, August 3, 2007

Never ever say, “Please listen carefully as our options have changed”

An awful invention that must be banned once and for all. Make excluding this expression from your VUI design a non-negotiable principle for which you are willing to walk away if any project manager insists on having it included.

Tuesday, July 24, 2007

Drop the "Welcome to..." and "Thank you for calling…"

Instead, why not simply have your application announce your company’s name, preceded or followed by an audio icon, and then followed by the company’s tag line, if the company has one? Such an opening will not only set your IVR apart from the garden-variety ones but will also shorten your opening prompt.

System: Widget Solutions. [Audio Icon]. Intelligence at your service.

Friday, July 20, 2007

My Beef with Paul English: Part II

What did I mean by "the propagation of the very ad-hoc, amateurish IVR deployments that Paul English is complaining about"?

The problem with the voice automation field is not a lack of knowledge, it is a lack of practice. We know and have known for many years what a good VUI sounds like. And we continue to learn and refine our knowledge. What is at the heart of what ails IVR deployments is the disconnect between what we know and what we encounter in the deployed IVR wilderness.

I am not going to theorize here about the root causes of such a disconnect. That would make for a fine research topic for a Technology Studies graduate student. I might venture to guess that it has something to do with the fact that telephony deployments have long been technically complex projects with the bulk of the challenge and expense being in just getting a system to work and keeping it running. So, your ops people, the smartest and most technically savvy members of your staff, end up slapping together the IVR system for you, thus reducing usability to an afterthought at best.

Another possibility may be that only very recently have we seen universities seriously taking up VUI design as a legitimate line of training. I can tell you from experience that there is no plethora of professionally trained VUI designers out there. The good people at EIG are fulfilling a great need, but by themselves they cannot make the impact that dozens of universities minting out thousands of Bachelor or Masters degrees in VUI design can.

So, when Paul English comes along and begins to conduct methodologically unsound surveys of his website visitors (see http://www.gethuman.com/standard/ for a blurb on his “methodology”), pretending that he is seriously building a useful corpus of knowledge that will guide the industry into better deployments, one wonders if the man is serious, and if he is, why the industry veterans who have decided to "support" him won't point out to him that the wheel he is supposedly building has already been invented.

Paul English can still be very useful to our industry and to the consumer rights movement in general. Instead of framing the struggle against bad automation in epistemological terms -- i.e., we don't have enough knowledge so let us start learning -- he could for instance engage his energies in propagating the good word about the large body of work and thinking that has already been done, or the many outfits that can help companies deploy sound interfaces. Or even more significantly, he could agitate for universities to train a solid generation of professional-grade VUI designers and developers.

Now, should we expect someone who runs a web site called gethuman.com and who has gained notoriety from publishing a cheat sheet that helps callers completely avoid automation to seriously undertake such a mission? I don’t think we should. I think it is up to us to get beyond Paul English and to start tackling the true causes of what ails voice automation as it is deployed today.

Thursday, July 19, 2007

My Beef with Paul English: Part I

Nowadays, when I open my newly arrived "Speech Technology Magazine" (and I do like the new format), I brace myself. I brace myself for yet another article announcing that Paul English "is right" and that he is "good for the industry". As I read, I literally cringe at the fawning for a man who has only contempt for what we do. And palpable fear and a hint of panic are what I read between the lines. Otherwise, how can one explain the flight from rational thought that has led many industry experts and veterans to conclude that the best way to deal with someone who heaps loathing upon you is to adopt him as a latter-day saint and a savior?

Here is the extent to which Paul English is right. He is right that people do not like IVR very much, or at all, and he is right that a good number of deployed IVR systems are not well designed. And that’s about it.

Now, my beef with Paul English is not that he is going about announcing the obvious. The best teachers start from the basic elements of truth, and those two observations are indeed good starting points.

My beef with Paul English is his second act: the project that he has launched to begin "reforming the industry".

Let's take a look at his gethuman.com web site. If you browse through his "gethuman standard" pages, the one thing that will strike you if you have been in the Voice User Interface (VUI) design field for any period of time is his "tabula-rasa" approach to his reformation "movement". It is as if there has been NOTHING done in the VUI design field before Paul English decided to take up the reform mantle. No mention of books written in VUI design, no mention of articles, forums and other resources. To someone who is not familiar with the industry, it will surely appear as if no one before Paul English had ever bothered to care about caller experience, no one had ever thought of writing down VUI best practices or tending to voice interface usability.

What does that tell us? Well, first and foremost, that the man is not serious about reforming automation. Read his "core principles" page, for example. It smacks of hurried, half-hearted amateurism. So we end up with inanities such as, "The system should be so easy, convenient and efficient to use that people will willingly choose to use it," or "Self-service applications should have logical flow," or "No prompt content should be included unless it improves efficiency of task completion for the user." And that's about the level of sophistication that one will get.

Now, as I said, stating the obvious is no sin in and of itself. But it is a sin if the obvious is misleadingly presented as the cutting-edge final word and not as a stimulus for more serious learning and investigation. The only references that I could find on Paul English’s gethuman web site to resources for those interested in better VUI were to Walt Tetschner's ASRNews and Walter Rolandi’s VUI consulting practice. (Both Tetschner and Rolandi are members of the gethuman team.) No mention of The Enterprise Integration Group, for instance, a very well known and respected consulting firm that offers top-quality training in VUI design. No mention of Vocalabs, a highly competent agency in IVR usability. No mention even of Nuance’s Speech University. One would have expected at least a mention of Nuance, given that Peter Mahoney, according to a speech he gave in Speechtek West in 2006, went to high school with Paul English and, at the time English was about to launch gethuman.com, had been chatting with him for hours at a time about his Cheat Sheet. Of course, mentioning Angel.com’s IVR university would be out of the question: Paul English considers Angel.com an arch-enemy, period. Why? Because we dared respond to his Cheat Sheet with our own IVR Cheat Sheet.

The consequence is the propagation of the very ad-hoc, amateurish IVR deployments that Paul English is complaining about.

Tuesday, July 17, 2007

Having it out with Walt Tetschner

My response to Walt Tetschner's sarcastic response to one of my posts on the VUIDs yahoogroups:

And thank you for making it hard for me to be sarcastic!

I only wish you would apply your severe standards of evidence to the methodology applied by the gethuman project to reach its conclusions.... There, self-selection and loaded questions, among other basic infractions of sound statistical practice, are tolerated as unremarkable..... See:
http://www.gethuman.com/standard/

Anyway, your posts remind me of that famous saying that goes something like: "The beatings shall continue until morale improves!" ;-)

This was Walt's post:

Thanks for your input to my survey. The denial scores continue to dominate. Have you attempted to contact Vodafone or the SpeechTech magazine author about the survey that substantiated that their customers are so satisfied? They don’t mention any details of the survey at all which makes it a bit difficult to blindly accept the results of the survey. If they truly are obtaining results that are as good as are being reported, then we should really find out what they are doing so well. I noticed that they have also found that long menus are good for improving customer satisfaction.

Walt

Monday, July 16, 2007

When users prefer IVR

A couple of interesting articles from Speech Technology Magazine. The first will probably send Walt&Walt into shock. It makes the brazen claim that users of a Vodafone Spain speech solution were happy with the speech deployment they called into. Of course, the numbers they cite -- 95 percent of its customers surveyed about their experience with the speech-enabled call center find the system easy to use, 89 percent think using the system is quick, and 96 percent are not bothered by the system at all -- may be complete lies and outright fabrications, but here is the article, for what it's worth, and for the record:
http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29711

A second, less shocking article, mentions a 2006 Gartner study that also gives some interesting numbers. Again, the numbers should be taken with a dry grain of salt as they may also be shameless fabrications that may very well not withstand the crushing weight of anecdotal evidence and irrefutable general (but sanguine) cultural sentiments:
http://multichannelmerchant.com/opsandfulfillment/contact_center_advisor/speech_IVR/

Tuesday, July 10, 2007

Opening the Call

You get only one shot at making a good first impression. In an IVR system, such an impression is formed by users and conveyed by VUI designers with the application’s opening prompt.

When writing your application’s opening prompt, keep the following three basic VUI guidelines in mind:

Be brief: Belabored, verbose opening prompts confirm the worst stereotype of the dumb, overbearing IVR system. If you force users to listen to 30 seconds of instructions, information, and disclaimers before they can take the first step towards solving their problem, you will not only have started your users off on the wrong note, but will also have given them a whole 30 seconds to push the zero-out button.

Be concise: Each and every single word in your opening prompt needs to be absolutely indispensable. If you can get rid of a word without losing meaning or effectiveness, do it.

Be polite: Politeness is not simply icing on the cake of a good VUI design. A system that is respectful of users is a system that is attentive to user needs, and therefore a system that will help users successfully accomplish the task they called about.

Sunday, July 1, 2007

The Invisibility of VUI

Perhaps the most frustrating thing about using a voice interface is the feeling of not knowing where precisely you are in the interaction and what exactly the system expects you to do next. A well-designed web site will show navigators where in the menu tree they are, but even without a menu path indicator, a web page usually has enough visual clues to tip users off about where they are in the site (a URL being one simple indicator). Not so with a voice interface, where the user can quickly feel lost for lack of mental markers telling them precisely where they are in the exchange with the system.

Mark the exchange: just like a well-designed web page will indicate where in the web site a user is, a good voice interface will tell the user where in the menu tree they are positioned. Usually, a word or two will suffice: “main menu” for the highest level menu, “here are your flights” before announcing a list of flight numbers, etc.

Trace the path: in applications where the menu structure is deep and wide, users can very easily become confused about where they are in the interaction, even when you mark the individual menu levels. In such situations, you can associate with each voice page that handles an interaction with users a “position page” that traces, starting from the main menu, the position of the user within the menu tree. “Restaurants, Chinese, Zip code”, for instance, would succinctly help the user understand that they chose “Restaurants”, then “Chinese”, and are now giving out a zip code to locate Chinese restaurants within that zip code. You can achieve path tracing by using a message page with a prompt describing the path and the “Go back” option for “Actions”.

Use earcons: an “earcon”, or “auditory icon”, is the voice-equivalent of a graphical interface’s icon. An icon is a small graphic that means something specific in the context of the interaction: for instance, an “arrow” pointing to the right may mean go to the next page, and one to the left may mean go back to the previous page. Earcons can be very useful in positioning the user within a menu structure or in announcing the type of action that is about to be undertaken. The sound of a keyboard clicking could be used to indicate to the user that the system is busy doing something (while dead silence may be taken by the user to mean that the system has crashed or the call has ended).
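
To make the path-tracing idea above a little more concrete, here is a minimal Python sketch of how a "position" prompt could be assembled from the menu choices made so far. The class and method names (MenuTrail, enter, go_back, position_prompt) are hypothetical and made up for illustration; they are not part of any IVR platform's API, and the menu labels are simply the ones from the example above.

```python
class MenuTrail:
    """Keeps track of the menu choices a caller has made so far."""

    def __init__(self, root="Main menu"):
        self.root = root   # label announced when the caller is at the top level
        self.steps = []    # menu labels chosen since leaving the top level

    def enter(self, label):
        """Record that the caller moved one level down into `label`."""
        self.steps.append(label)

    def go_back(self):
        """Move one level up; does nothing at the top level."""
        if self.steps:
            self.steps.pop()

    def position_prompt(self):
        """Return the short text a 'position page' would speak."""
        if not self.steps:
            return self.root + "."
        return ", ".join(self.steps) + "."


trail = MenuTrail()
for label in ["Restaurants", "Chinese", "Zip code"]:
    trail.enter(label)
print(trail.position_prompt())
```

Keeping the trail as a simple list means that a "Go back" action only has to drop the last label, and the position prompt stays as short as the path itself.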

Perhaps the one fundamental advantage that GUIs have over VUIs is the feeling that a graphical user has of control over both the medium and the interaction. A very bad GUI can certainly make one feel helpless and at the mercy of irrational forces, but it does take a very bad GUI to throw the user into a state of confusion. A VUI, on the other hand, because it is time-linear, uni-directional, and invisible, has to stumble only once in the interaction for the user to be thrown into a state of hopeless perplexity. Keeping in mind that there are key differences between designing a GUI and a VUI should help the alert VUI designer avoid the costly mistake of smuggling in GUI assumptions when engaged in VUI design.

Monday, June 18, 2007

The Uni-directionality of VUI

Compounding the linearity of speech is its unidirectional character. Just as time is a one way street, speech is a one-way medium. When you hear something, you can’t easily go back and listen to it again. Contrast this to reading a piece of text where you can readily scan a couple of paragraphs, or even pages, back and re-read the text.

Offer to repeat: one obvious way to alleviate this limitation is to offer the ability to repeat information. Of course, make sure that users are aware that they can have information repeated to them by informing them of this ability at the beginning of the call and whenever important information is given out to them.

Offer help: crucial information such as instructions given at the start of the interaction should be available for the user to tap into at any point in the exchange. Offer instruction on how to access help at the beginning of the call and at moments where the user seems at a loss over what to do (e.g., at a no-input or a no-match).

Offer summaries: in interactions where information is being gathered from the user or given out to them in a step-wise fashion, a powerful technique to overcome the uni-directionality of voice interfaces is to offer users the ability to ask for a summary of the information collected so far.
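
As a rough illustration of the summary technique, here is a small Python sketch that reads back whatever has been collected so far, in the order it was collected. The function name summary_prompt, the wording of the prompts, and the field names are all assumptions made up for this example, not something prescribed by any particular platform.

```python
def summary_prompt(collected):
    """Build a spoken summary of the fields gathered so far.

    `collected` maps a field name to the value the caller gave; a dict
    keeps insertion order, so fields are read back in collection order.
    """
    if not collected:
        return "I don't have any information from you yet."
    parts = [f"{field}: {value}" for field, value in collected.items()]
    return "So far, I have " + "; ".join(parts) + "."


# Hypothetical state of a flight-booking call after two questions.
collected = {"departure city": "Boston", "destination": "Chicago"}
print(summary_prompt(collected))
```

The same function can be called at any step of the dialog, so a caller who says "summary" halfway through hears only what has actually been gathered up to that point.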

Saturday, June 2, 2007

The Time Linearity of VUI

Unlike graphical interfaces, voice interfaces are linearly coupled with time. When you are reading text on a web page, for instance, you can easily skip ahead with your eyes to the section that you are interested in. Not so with a voice interface, where you must patiently listen to one word before you can hear the one that follows it.

Avoid long prompts: obviously, unnecessarily long prompts will quickly tax the user’s patience. Long prompts explaining how the application works, for instance, may be inevitable and necessary with a novice user, but they should not be forced upon an expert user. Differentiate at the outset of a call between novice and expert users, and use short, to-the-point prompts with the experts.

Use short menus: the length of an alphabetically sorted drop-down menu on a web page is a non-issue. A menu in a voice interface, on the other hand, should not exceed five or six options.

Put important information first: don’t annoy the user by having them wait through unnecessary noise for the information they need. Give them what they want upfront.

Allow interruptions: the ability to interrupt is usually a must-have when dealing with non-novice users. People who know what they want to do, what to say, and how to say it don’t want to wait for the system to finish talking before they give their response.

Offer shortcuts for the user who knows what to do: shortcuts are another must for non-novice users. They cut through menus and get the user to what they want to do or where they want to be in a menu structure.

Allow pauses: an enormous advantage that a graphical interface has over a voice interface is the ability to easily pause and pick up where you left off. We do this without even thinking about it when we are reading a piece of text. During interactions where the user may need to pause and do something, make sure that you offer that option to them. For instance, if the user needs to take down a long series of numbers (say a confirmation code), ask them to go ahead and get paper and pencil and to say, “continue,” when they are ready.
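
To illustrate the short-menu guideline above, here is a minimal Python sketch that splits a long list of options into consecutive menus and builds a prompt for each one. The limit of five, the function names split_menu and menu_prompt, the "more options" wording, and the sample options beyond the three stock commands are assumptions for the sake of the example.

```python
MAX_OPTIONS = 5  # assumed limit, following the "five or six" rule of thumb


def split_menu(options, max_options=MAX_OPTIONS):
    """Split a long option list into consecutive menus of at most max_options."""
    return [options[i:i + max_options]
            for i in range(0, len(options), max_options)]


def menu_prompt(chunk, has_more):
    """Build the spoken prompt for one menu; mention 'more options' if needed."""
    if not chunk:
        raise ValueError("a menu needs at least one option")
    quoted = [f'"{option}"' for option in chunk]
    if len(quoted) == 1:
        listing = quoted[0]
    elif len(quoted) == 2:
        listing = " or ".join(quoted)
    else:
        listing = ", ".join(quoted[:-1]) + ", or " + quoted[-1]
    prompt = "You can say, " + listing + "."
    if has_more:
        prompt += ' For more options, say "more options."'
    return prompt


# Hypothetical seven-option menu, read out as two shorter menus.
options = ["Get quotes", "Buy stock", "Sell stock", "Check balance",
           "Transfer funds", "Order history", "Speak to a manager"]
menus = split_menu(options)
for index, chunk in enumerate(menus):
    print(menu_prompt(chunk, has_more=index < len(menus) - 1))
```

Note that "more options" is itself one more thing for the caller to remember, so in practice you may want to count it against the limit.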