System for storing voice recognizable identifiers using a...

Telephonic communications – Telephone line or system combined with diverse electrical... – Having transmission of a digital message signal over a...

Reexamination Certificate

Details

C379S088020, C379S088030, C704S251000

active

06728348

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to a system that identifies and stores new phonetically based identifiers, such as names for a voice dialer and, more particularly, uses a dictionary and a word generator to produce candidates from a limited text input device, such as a telephone DTMF key pad or a spelling recognizer where there are potentially multiple candidates for the letters of a name, to produce name candidates one of which is selected by a speech recognizer.
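By way of illustration only, the following minimal sketch shows why a telephone key pad yields multiple candidates: each key stands for three or four letters, so one key sequence can match several dictionary names. The function names and the toy dictionary are assumptions made for this sketch and are not taken from the patent; only the dictionary-filtering part of the idea is shown, not the word generator.

```python
# Standard letter layout of a telephone key pad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_digits(name):
    """Return the key sequence a caller would press to spell `name`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in name.lower())


def candidates(digits, dictionary):
    """Return the dictionary names whose spelling matches the pressed keys."""
    return [name for name in dictionary if to_digits(name) == digits]


# Hypothetical toy dictionary (letters only); a real name dictionary
# would be far larger.
NAME_DICTIONARY = ["jones", "jonas", "kones", "smith", "lopez"]

# "jones" and "kones" share the key sequence 5-6-6-3-7, so both survive.
print(candidates("56637", NAME_DICTIONARY))
```

Because more than one name can survive the filter, a further step (in the invention, a speech recognizer) is needed to select among the candidates.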
2. Description of the Related Art
In speech-controlled systems, that is, systems where the human voice is the primary or only mode of user input, human speech is processed by a subsystem called a speech recognizer (or simply a recognizer), which may contain both software and hardware components. A typical speech-controlled system obtains a speech input (called an utterance) from a human user and uses the speech recognizer subsystem to determine which words were spoken (called the recognized text); it then uses those words to determine the actions to be carried out. Of course the recognized text will not always correctly match the utterance, since speech recognizers are still imperfect.
The current state of the art in speech recognition technology does not permit so-called “open-set” recognition, in which the human user may say anything at all and the speech recognizer determines the correct word sequence. Instead, every system that uses a speech recognizer must supply a description of the possible word sequences that the system expects to hear from the user; we call these possibilities the in-set utterances. The manner in which the in-set utterances are specified depends on the speech recognizer.
The present invention is concerned with conventional recognizers that require, as part of the specification of in-set utterances, a description of the pronunciation of each word in those utterances. The pronunciation of each word is typically provided as a phonetic spelling, a transcription of the pronunciation in a phonetic alphabet. For example, the word “phone” could be specified as being pronounced “f ow n”, where “f”, “ow”, and “n” are elements of the alphabet. There are several phonetic alphabets, but any particular recognizer of this class uses only one.
These systems typically provide to the recognizer a list of all the words that occur in the in-set utterances, along with one or more phonetic spellings of each word. In the so-called speaker-independent systems with which the present invention is concerned, multiple phonetic spellings of a word are often necessary because of differences in the way people pronounce words; an example is “tomayto” and “tomahto”.
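As a rough illustration, a word list of this kind can be pictured as a mapping from each word to one or more phonetic spellings. The data layout, the function name, and the phoneme symbols used for “tomato” are assumptions chosen for this sketch; an actual recognizer defines its own phonetic alphabet and input format.

```python
# Each word maps to one or more phonetic spellings; each spelling is a
# list of phoneme symbols. The symbols are illustrative assumptions.
LEXICON = {
    "phone": [["f", "ow", "n"]],
    "tomato": [
        ["t", "ah", "m", "ey", "t", "ow"],  # "tomayto"
        ["t", "ah", "m", "aa", "t", "ow"],  # "tomahto"
    ],
}


def recognizer_entries(words, lexicon):
    """Flatten the lexicon into (word, phonetic spelling) pairs,
    one pair per pronunciation variant."""
    entries = []
    for word in words:
        for phonemes in lexicon[word]:
            entries.append((word, " ".join(phonemes)))
    return entries


for word, spelling in recognizer_entries(["phone", "tomato"], LEXICON):
    print(word, "->", spelling)
```

A word with two pronunciations contributes two entries, which is one reason the word list handed to a speaker-independent recognizer can be longer than the vocabulary itself.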
The maximum number of distinct words usable at any one time depends on the particular recognizer. For simple recognizers, the maximum may be only a few dozen, or even fewer. More complex recognizers can handle hundreds or thousands of words at a time. When each utterance consists of only a single word, some recognizers can handle a few tens of thousands of words. Recognizers that handle multi-word “continuous speech” utterances are currently restricted to a few thousand or tens of thousands of words at most.
As already mentioned, a speech recognition application must identify in advance all the legitimate “in-set” utterances. However, in certain applications it would be beneficial to provide the user with the ability to add new in-set utterances in the course of using the application.
For example, consider an application that permits a user to place telephone calls simply by speaking the name of the person desired. The user might say “Call John Jones”. The system responds “Dialing John Jones at 555-1234” and completes the call. Such a system is called a voice dialer.
Suppose that there is a need to provide a “personalized” voice dialing service, where a user may speak a name from a personal list, unique to that user. In other words, each user has a personal address book containing a list of names and associated phone numbers, and each user's address book is distinct from that of other users. The application must first identify the user to tell which address book to use; only after the correct address book is identified can the application provide the correct list of in-set utterances to the speech recognizer.
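The personalized service described above can be sketched as follows, purely as an assumption-laden illustration: the user is identified first, and only that user's address book determines the in-set utterances. The user identifiers, the function names, and every phone number other than 555-1234 are hypothetical.

```python
# Hypothetical per-user address books: name -> phone number.
ADDRESS_BOOKS = {
    "user-001": {"john jones": "555-1234", "mary smith": "555-9876"},
    "user-002": {"john jones": "555-4321"},
}


def in_set_utterances(user_id):
    """List the utterances the recognizer should expect from this user."""
    return ["call " + name for name in sorted(ADDRESS_BOOKS[user_id])]


def dial(user_id, recognized_text):
    """Map recognized text to a spoken response, or None if out of set."""
    prefix = "call "
    if not recognized_text.startswith(prefix):
        return None
    name = recognized_text[len(prefix):]
    number = ADDRESS_BOOKS[user_id].get(name)
    if number is None:
        return None
    return "Dialing {} at {}".format(name.title(), number)


print(in_set_utterances("user-001"))
print(dial("user-001", "call john jones"))
```

The same spoken name resolves to different numbers for different users, which is why the address book must be selected before recognition begins.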
What is needed is a system that will allow the addition of new names, with associated phone numbers, to the personal address books using only a telephone, without a computer terminal or keyboard or any other device at all, and without the need for human intervention in any way. What is more particularly needed is a system that acquires from the user, over the telephone, enough information to create a phonetic spelling of the name to be added (because that phonetic spelling must be provided to the recognizer for subsequent recognitions from this user's address book).
The reason that this problem is difficult is that a system cannot simply ask the user to pronounce the name to be added and process that utterance with a speech recognizer—since, by definition, we don't know the name to be added.
The present invention assumes that a conventional name dictionary is available, that is, a list of a large number of the most common names (perhaps several hundred thousand, covering about 95% of the population) with one or more phonetic spellings for each. However, the entire dictionary cannot be provided to the speech recognizer because it contains too many possible utterances. Moreover, the name that the user wishes to add may not be in the name dictionary, since it is impossible to compile an exhaustive list of names.
Notice that in this example the system does not actually need the English spelling of the name to be added (although having that spelling would suffice). The voice dialer does not need a text representation of the names in an address book since it never interacts with the user except over the telephone; it only needs a phonetic representation of each name (which is what must be loaded into the speech recognizer) and, for each name, the associated number to dial.
For simplicity, the discussion herein will continue to use this example as a typical one for our problem; that is, the specific problem is to obtain, by telephone only, the phonetic spelling of a name. But the general problem is to determine, using a limited character set input device, such as a telephone, a phonetic spelling of a word or phrase from a set much larger than can be managed by the speech recognizer, where the set (in general) is not completely known in advance.
Given a text representation of a name (that is, its spelling), determining an adequate phonetic spelling is conventional. For the fairly rare name that is not in the name dictionary, a conventional text-to-phoneme heuristic (e.g., the so-called Navy rules), which finds a reasonable phonetic transcription for a given text word, is used. With this approach, only an extremely rare name will yield a phonetic transcription so poor that recognition is impossible.
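The lookup-then-fallback approach just described might be sketched as below. This is a minimal illustration under stated assumptions: the one-entry dictionary, the function names, and the phoneme symbols are hypothetical, and the per-letter table is a deliberately crude stand-in that is not the Navy rules, which are context-sensitive.

```python
# Hypothetical one-entry name dictionary: name -> phonetic spellings.
NAME_DICTIONARY = {"jones": [["jh", "ow", "n", "z"]]}

# Toy one-sound-per-letter table. This is NOT the Navy rules; it only
# stands in for "some text-to-phoneme heuristic".
LETTER_SOUNDS = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "ow", "p": "p", "q": "k", "r": "r",
    "s": "s", "t": "t", "u": "ah", "v": "v", "w": "w", "x": "k s",
    "y": "y", "z": "z",
}


def letter_to_sound(name):
    """Crude fallback: translate each letter independently."""
    phonemes = []
    for ch in name.lower():
        # Characters outside a-z contribute no sound in this sketch.
        phonemes.extend(LETTER_SOUNDS.get(ch, "").split())
    return phonemes


def phonetic_spellings(name):
    """Prefer the dictionary; fall back to the heuristic otherwise."""
    key = name.lower()
    if key in NAME_DICTIONARY:
        return NAME_DICTIONARY[key]
    return [letter_to_sound(key)]


print(phonetic_spellings("Jones"))  # found in the dictionary
print(phonetic_spellings("Zub"))    # falls back to the heuristic
```

The dictionary result is preferred because a hand-checked transcription is generally more reliable than a rule-generated one; the heuristic only covers names the dictionary lacks.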
There are a number of different ways that a system can obtain a text representation of a name over the telephone.
One method is to recognize letter spelling using a speech recognizer. This is essentially a speech recognition problem with only twenty-six “words.” A phonetic spelling for each letter is created, any sequence of letters is permitted as a legitimate utterance, and the user is asked to spell the name. The problem with this method is that speech recognition of the alphabet is extremely poor, since (a) all letters but one consist of a single syllable, giving the recognizer little chance at differentiation, and (b) many letters sound very much alike except for subtle distinctions difficult to detect with current recognition technology. Using letter spelling in conjunction with a dictionary of names when the word being spelled is in the dictionary works better, but still not well enough for all applications, such as voice-based dialing, because it is highly possible that a surname will not
