Writing Prompts for Speech Recognition

Best Practices for Voice Prompts and Recognition

Reduce confusion and misrecognition

Talkman devices rely on a text-to-speech (TTS) engine that interprets written words in certain ways. If the devices pronounce syllables or words in ways that are difficult for workers to understand, modify the prompt phrases and list item values in the VoiceForm files to fix the problems.

Reduce confusion with device prompts by following these best practices.

Additionally, technicians may have problems getting the voice application to understand the list items spoken. Every speech recognition system requires some amount of repeating or retraining words so it can understand human pronunciations and inflections.

Reduce the number of misrecognitions by following these tips.

Abbreviations

While a text-to-speech (TTS) engine usually pronounces abbreviations correctly, spelling them out ensures that they will be spoken in the correct context. For example, 1st should be spelled out as first.

If you are using a display for device prompts, consider spelling the abbreviations as phonetic substitutions in the task package rather than editing the prompts in the VoiceForm and making them hard to read. See Changing Phonetics in VoiceConsole.

Capital Letters

Use capital letters if you want the letters to be spoken. If there are spaces between the letters, however, they will be spoken using the phonetic alphabet. For example, APU will be spoken as "ay pea you," while A P U will be spoken as "alpha papa uniform." See Device Speech Rules.

Punctuation

Commas are used for adding pauses in sentences. Use them if a prompt sounds like a run-on sentence.
Question marks should be used if you want the prompt to sound like a question to the technician.
Hyphenated words are usually spoken with the word "dash," so you may want to spell the phrase phonetically. For example, T-Wheel will be spoken as "tee dash wheel." Without the hyphen, T Wheel will be spoken as "tango wheel."
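
The capital-letter and hyphen behavior described above can be modeled in a few lines to preview how a prompt might sound before it is loaded onto a device. The following Python sketch is illustrative only: the function name predict_spoken, the letter-name spellings, and the use of the standard NATO alphabet are assumptions, and the actual TTS engine may handle cases this simple model does not (for example, the words "A" and "I" in ordinary sentences).

```python
# Rough model of the speech rules described in this guide; not the device's actual logic.
NATO = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo lima "
        "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
        "xray yankee zulu").split()
LETTER_NAMES = ("ay bee see dee ee eff gee aitch eye jay kay ell em en oh pea cue "
                "are ess tee you vee double-you ex why zee").split()


def _is_caps(text):
    """True if text is non-empty and made up only of the letters A-Z."""
    return bool(text) and all("A" <= ch <= "Z" for ch in text)


def _speak_part(part, standalone):
    """Speak one piece of a token (a token is split on hyphens)."""
    if _is_caps(part):
        if len(part) == 1 and standalone:
            # A single capital letter on its own is spoken phonetically.
            return NATO[ord(part) - ord("A")]
        # Capital letters written together are spoken letter by letter.
        return " ".join(LETTER_NAMES[ord(ch) - ord("A")] for ch in part)
    return part.lower()


def predict_spoken(text):
    """Return an approximation of how the prompt text would be spoken."""
    spoken = []
    for token in text.split():
        parts = token.split("-")
        standalone = len(parts) == 1
        # Hyphens are spoken as the word "dash".
        spoken.append(" dash ".join(_speak_part(p, standalone) for p in parts))
    return " ".join(spoken)
```

Calling predict_spoken on each prompt in a VoiceForm file is one way to spot hyphens or spaced capitals that would be spoken in an unintended way.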

Words that Rhyme

Avoid using words in list item response options that sound the same and can be confused with each other. Small differences between pronunciations, such as "door" and "floor," can cause recognition issues.
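
A quick way to screen a list group for likely rhymes is to compare word endings. The Python sketch below is a crude, spelling-based heuristic, not a phonetic analysis: the function name find_rhyme_candidates and the three-letter default are assumptions, and it will miss rhymes that are spelled differently (and may flag words that do not actually rhyme). It assumes every list item is a non-empty phrase.

```python
from itertools import combinations


def find_rhyme_candidates(items, suffix_len=3):
    """Return pairs of list items whose final words end in the same letters."""
    def ending(item):
        # Compare only the last word of each item, ignoring case.
        return item.lower().split()[-1][-suffix_len:]

    return [(a, b) for a, b in combinations(items, 2) if ending(a) == ending(b)]
```

For example, find_rhyme_candidates(["door", "floor", "hood"]) flags the pair "door" and "floor" for review.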

Natural Human Speech

Write prompt phrases and multiple list selection items using conversational language. Existing entries in a database, on a paper form, or in a drop-down menu may need to be rewritten into phrases that are easier to speak and understand.

Prompt Length

Keep prompts short and to the point. If prompts are wordy and repetitive, technicians will find them annoying to listen to and will be slowed in performing their work.

If a short prompt confuses technicians even after training with the system, consider writing a longer help message to clarify instructions when a technician speaks the "details" command.

Only use longer prompts when the prompt occurs infrequently in assignments, and technicians are likely to require more information.

Separate Data Collection

Resist the urge to use VoiceNotes recordings and transcriptions to combine related inspection observations. Separate prompts and responses result in clear, distinct data. When the voice application asks specific questions and records one response per inspection task, you receive a data set that is easy to interpret, report, and audit.

Example:

Prompt: Record notes on the condition of the seats, seat belts, seat position adjusters, and seat back adjusters.

Response: "Ready." [tone] "Rip in driver's seat, passenger seat belt is worn, seat position adjusters are normal, driver's seat back adjuster is cracked, passenger seat back adjuster is jammed."

Change to:

Prompt: Is the driver's seat damaged?

Response: "Yes"

Prompt [condition on "yes" response]: Seat condition?

Valid responses: Worn, Ripped, Dirty, Broken

Prompt: Is the passenger seat damaged?

Response: "No"

Prompt: Is the driver's seat belt damaged?

Response: "No"

Prompt: Is the passenger seat belt damaged?

Response: "Yes"

Prompt [condition on "yes" response]: Seat belt condition?

Valid responses: Worn, Frayed, Dirty, Not functional

etc.
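
To show why separate prompts produce data that is easier to report on, the sequence above can be sketched as a list of steps with one validated response each. This Python sketch is hypothetical: VoiceForm files use their own format, and the names STEPS, ask_if, and collect are invented for illustration.

```python
# Hypothetical step definitions; VoiceForm files use their own format.
STEPS = [
    {"id": "driver_seat", "prompt": "Is the driver's seat damaged?",
     "valid": ["yes", "no"]},
    {"id": "driver_seat_condition", "prompt": "Seat condition?",
     "valid": ["worn", "ripped", "dirty", "broken"],
     "ask_if": ("driver_seat", "yes")},  # only asked after a "yes" response
    {"id": "passenger_seat", "prompt": "Is the passenger seat damaged?",
     "valid": ["yes", "no"]},
]


def collect(steps, answers):
    """Walk the steps in order and return one validated response per step asked."""
    results = {}
    for step in steps:
        condition = step.get("ask_if")
        if condition and results.get(condition[0]) != condition[1]:
            continue  # condition not met, so the follow-up prompt is skipped
        response = answers[step["id"]].lower()
        if response not in step["valid"]:
            raise ValueError(f"{response!r} is not valid for {step['prompt']!r}")
        results[step["id"]] = response
    return results
```

Each key in the returned dictionary maps to exactly one response, so the result can be exported or audited without parsing a free-form note.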

Yes/No Answers

If there are only two possible responses to a prompt, consider writing the prompt phrase as a yes/no question. This format not only simplifies responses but also increases speech recognition performance because technicians always train the words "yes" and "no" in their voice templates.

Example:

Prompt: Muffler condition?

Valid responses: Damaged, Normal

Change to:

Prompt: Is the muffler damaged?

Valid responses: Yes, No

The voice software running on Talkman devices does follow punctuation rules, so use question marks only when a yes/no response is expected.

Min and Max Lengths

Use length conditions, especially for value entries, when there is a known minimum or maximum word or character count in the expected response. With these parameters set, the voice application will not accept additional insertions and will move quickly to the next prompt.
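
The effect of a length condition can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not the voice application's implementation: the function name accept_value is hypothetical, and the input is assumed to be the sequence of characters the recognizer heard.

```python
def accept_value(recognized, min_len, max_len):
    """Keep at most max_len recognized characters; reject entries shorter than min_len."""
    entry = list(recognized)[:max_len]  # anything heard past the maximum is ignored
    if len(entry) < min_len:
        raise ValueError(f"expected at least {min_len} characters, got {len(entry)}")
    return "".join(entry)
```

With a minimum and maximum of 4, an entry of "12345" is accepted as "1234", so a stray fifth sound does not become part of the value.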

Confirm Input

Technicians responding to a Talkman device can make mistakes, and in a noisy environment, sounds can be mistaken for spoken input (referred to as insertions).

While some input can be validated automatically, such as dates or yes/no responses, other input is best confirmed by the technician.

For the confirm parameter that is available for most prompt types, set the value to "True" to tell the voice application to ask technicians to confirm an entry, or "False" to skip confirmation. Before setting this value, consider:

whether it is possible to validate the input or input format automatically,
whether the technicians will know without a confirmation that the input was recognized incorrectly,
the tradeoff between ensuring correct entries and reducing productivity.

Note that when the confirm parameter is turned off for multiple list selection prompts, the device still echoes the responses back to the technician, who can then hear mistakes and use the "undo last entry" command to correct the entries.

Data Type Consistency

When defining a step with a value entry, make sure that entries for Result Data Type and Characters parameters do not conflict.

Most value entry steps should be set to a string data type in order to accept a combination of characters. You should set the data type to integer, however, if you use the step result in a condition statement that performs a mathematical calculation.

If you create an inconsistency by setting the data type to integer and then allowing only alpha characters, technicians may hear an "invalid numeric entry" error. In addition, the response data could be stored as a mismatch and may affect both condition evaluation and the integrity of exported data.
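
A consistency check of this kind is easy to automate when reviewing step definitions. The Python sketch below is an assumption-laden illustration: the function name check_value_entry and the labels "integer", "string", "numeric", and "alpha" are placeholders, and the actual values of the Result Data Type and Characters parameters may be named differently.

```python
def check_value_entry(result_data_type, allowed_characters):
    """Return a list of warnings for conflicting settings (empty if none found)."""
    warnings = []
    # An integer result can only hold digits, so any other character set conflicts.
    if result_data_type == "integer" and allowed_characters != "numeric":
        warnings.append("integer result cannot hold non-numeric characters")
    return warnings
```

A string data type passes with any character set, which matches the advice to use string for most value entry steps.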

Consistent Phrasing

For prompts that technicians will hear frequently, use the same phrasing. For example, when prompting technicians to leave VoiceNotes, always refer to "record a note" rather than using a variety of phrases such as "leave a comment" or "speak a message."

Similarly, list items that technicians speak frequently should be constructed in the same way. For example, if "missing" is a common list item to describe an engine part that is not present, do not use synonyms, such as "not found," "lost," or "short," in other list groups.
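
Synonym drift across list groups can be caught with a simple review script. The following Python sketch assumes the list groups have been copied into a dictionary by hand; the function name find_inconsistent_items and both data structures are hypothetical and are not part of VoiceForm.

```python
def find_inconsistent_items(list_groups, preferred):
    """Report list items that should be replaced with a preferred term.

    list_groups: {group name: [list items]}
    preferred:   {preferred term: {discouraged synonyms}}
    """
    findings = []
    for group, items in list_groups.items():
        for item in items:
            for term, synonyms in preferred.items():
                if item.lower() in synonyms:
                    # Record where the synonym appears and what to use instead.
                    findings.append((group, item, term))
    return findings
```

Each finding names the list group, the item to change, and the preferred term, for example ("cabin", "not found", "missing").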

Words Often Spoken Together

While building prompts, you may discover that technicians are expected to respond frequently with certain words in the same sequence. If these words are always spoken together without pausing, consider adding them to embedded training to improve recognition.

When first training voice templates, operators are prompted to speak the numbers zero through nine in sequences of three—for example: one two one, three nine three. This training ensures that when operators speak numbers without pausing, the recognition engine is accustomed to the way the words sound when they run together.

Update the embedded training list by editing the task package in VoiceConsole, then reloading the task package to the devices.
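
One way to find candidate word sequences is to count how often words appear together in the responses you expect. The Python sketch below assumes the expected responses are available as plain text strings; the function name frequent_sequences and its parameters are hypothetical, and the output is only a starting point for deciding what to add to embedded training.

```python
from collections import Counter


def frequent_sequences(responses, length=2, min_count=2):
    """Count word sequences of the given length that occur at least min_count times."""
    counts = Counter()
    for response in responses:
        words = response.lower().split()
        # Slide a window of `length` words across each response.
        for i in range(len(words) - length + 1):
            counts[" ".join(words[i:i + length])] += 1
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]
```

For the responses "left front tire", "left front brake", and "right rear tire", only the sequence "left front" occurs twice and is returned.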

Phonetic Spellings

If you find that the device is mispronouncing words in prompts or that responses are often misrecognized, try defining these words with phonetic spellings in VoiceConsole. For example, replace "oi" with "oy" or replace "o" with "uh" if appropriate.

HOW TO:

Navigate to Task Packages in the VoiceConsole user interface (GUI), select the appropriate task package, and click the Edit selected task package action link.

On the edit page, open the Phonetic Sub. tab, and add a row with the word and its phonetic spelling.

Do not modify an inspection step's prompt, help, or list item phrases with phonetic spellings. These phrases are displayed on the web pages served by the Talkman devices where phonetic spellings would appear as typos and potentially confuse technicians.
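
The separation between displayed text and spoken text can be illustrated with a small Python sketch. It is a conceptual model only and does not reflect how VoiceConsole applies phonetic substitutions internally: the function name to_spoken and the example word "coil" with the spelling "coyl" are hypothetical.

```python
import re


def to_spoken(display_text, substitutions):
    """Return the text sent to speech; the display text itself is left unchanged."""
    spoken = display_text
    for word, phonetic in substitutions.items():
        # Replace whole words only, ignoring case; the lambda inserts the
        # phonetic spelling literally, with no regex escape processing.
        pattern = rf"\b{re.escape(word)}\b"
        spoken = re.sub(pattern, lambda _m, p=phonetic: p, spoken, flags=re.IGNORECASE)
    return spoken
```

Here to_spoken("Inspect the coil", {"coil": "coyl"}) yields "Inspect the coyl" for speech, while the prompt shown on the display remains "Inspect the coil".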

Word Training or Retraining

If technicians are having problems with devices understanding specific words, technicians can train these words and add them to their own voice templates. Using the Talkman buttons, access the device menu and select the "retrain word" option. Newly trained words are automatically added to a technician's template.

When retraining a standard command, ensure that the technician is at a prompt in the dialog where that command is available and that the technician accesses the device menu after the device finishes speaking the prompt. Otherwise, the word may not be present in the retrain word list.