Speech recognition apparatus and method


Filed: 2001-03-30
Issued: 2004-11-30
Examiner: Abebe, Daniel (Department: 2655)
Class: Data processing: speech signal processing, linguistics, language
Subclass: Speech signal processing
Subclass: Application

Classification code: C704S275000
Document type: Reexamination Certificate
Status: active
Patent number: 06826533
ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to the field of speech recognition, and in particular to a technique for controlling the sensitivity of the speech recognition unit based upon the operating state/mode of the unit.
Voice-controlled systems are becoming popular with the advances in technology. Their advantage is that the person controlling the system is not required to make direct contact with the controlled device in order to control it.
Speech recognition is performed by appropriate speech recognition algorithms that access a dictionary database. The speech recognition algorithms are such that a voice command directed to the system being controlled can be distinguished from ambient noises and similar phonetic sequences.
A problem arises when phonetic sequences are erroneously recognized as a valid voice command. Such an erroneous detection of a voice command is often described by the “false acceptance rate” (FAR) (i.e., a false positive), which specifies how many phonetic sequences were erroneously recognized as a voice command. Therefore, the FAR value is a measure of acceptance sensitivity or acceptance threshold.
An additional problem with speech recognition systems is that a valid voice command may not be properly recognized. This erroneous rejection of valid voice commands is described by the “false rejection rate” (FRR), which is a measure of how many valid voice commands were not recognized by the speech recognition system. The FRR value is a measure of the rejection sensitivity or rejection threshold at which a valid voice command is not recognized.
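The two error rates defined above can be computed from a set of labeled trials. The following is a minimal sketch, not taken from the patent: the Trial record, the error_rates helper, and the sample data are hypothetical, and it assumes each rate is expressed as a fraction of the relevant trials (FAR over non-command sequences, FRR over valid commands).

```python
from dataclasses import dataclass


@dataclass
class Trial:
    is_valid_command: bool  # ground truth: was a real voice command spoken?
    accepted: bool          # recognizer decision: was it treated as a command?


def error_rates(trials):
    """Return (FAR, FRR) as fractions in [0, 1] for a list of trials."""
    invalid = [t for t in trials if not t.is_valid_command]
    valid = [t for t in trials if t.is_valid_command]
    # FAR: share of non-command sequences that were wrongly accepted.
    far = sum(t.accepted for t in invalid) / len(invalid) if invalid else 0.0
    # FRR: share of valid commands that were wrongly rejected.
    frr = sum(not t.accepted for t in valid) / len(valid) if valid else 0.0
    return far, frr


# Made-up outcomes: 4 valid commands (1 rejected), 5 noise sequences (1 accepted).
trials = [
    Trial(True, True), Trial(True, True), Trial(True, False), Trial(True, True),
    Trial(False, False), Trial(False, True), Trial(False, False),
    Trial(False, False), Trial(False, False),
]
far, frr = error_rates(trials)
print(f"FAR={far:.0%}  FRR={frr:.0%}")
```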
When controlling a device by voice, without using an additional signal transducer, such as for example a sensor or switch, the system designer would like to simultaneously achieve the best possible FAR value and the best possible FRR value. Ideally, both of these error rates should be minimized.
However, on the basis of the speech recognition algorithm, these two error rates or sensitivities are often mutually exclusive. That is, an increasing FAR value is associated with a decreasing FRR value and vice versa, so that the two error rates cannot be simultaneously optimized. In the extreme case, no valid voice command is recognized (i.e., FAR=0%, FRR=100%), or all phonetic sequences are erroneously accepted as a valid voice command (i.e., FAR=100%, FRR=0%).
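The trade-off can be illustrated with a single acceptance threshold applied to recognizer confidence scores. This is only a sketch under stated assumptions: the patent does not specify a scoring scheme, so the rates_at helper, the score range of 0 to 1, and the sample scores are all hypothetical.

```python
def rates_at(threshold, command_scores, noise_scores):
    """Accept a sequence when its score >= threshold; return (FAR, FRR)."""
    far = sum(s >= threshold for s in noise_scores) / len(noise_scores)
    frr = sum(s < threshold for s in command_scores) / len(command_scores)
    return far, frr


# Made-up confidence scores, for illustration only.
command_scores = [0.9, 0.8, 0.7, 0.6, 0.4]   # valid voice commands
noise_scores = [0.1, 0.3, 0.5, 0.5, 0.7]     # ambient noise, similar phonetics

# Raising the threshold lowers FAR and raises FRR, and vice versa.
for threshold in (0.0, 0.35, 0.55, 0.75, 1.01):
    far, frr = rates_at(threshold, command_scores, noise_scores)
    print(f"threshold={threshold:.2f}  FAR={far:.0%}  FRR={frr:.0%}")
```

With these sample scores, the lowest threshold accepts everything and the highest rejects everything, which corresponds to the two extreme cases named in the text.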
In conventional speech recognition systems “keyword spotting” (i.e., recognition of a keyword) is performed to mark the beginning of a command sequence that activates the actual speech recognition function of the particular voice-controlled system. After a keyword has been recognized, the speech recognition algorithm waits for a voice command to be input. Once a valid voice command is detected, the appropriate menu item or control parameter associated with the voice command is selected. For example, the detection of the voice command “loudness” causes the speech recognition system to select the menu item for setting the loudness, while the subsequent voice command “soft” can set the appropriate loudness parameter. The command sequence is terminated through the optional input of a suitable termination command, such as, for example “end”. However, the speech recognition algorithm can also recognize the end of the command sequence from the given and previously run-through menu scheme.
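The command sequence described above (keyword, then menu item, then parameter, with an optional termination command) can be sketched as a small parser over already-recognized words. Everything here is illustrative: the keyword "computer", the MENU table, and the run_sequence function are assumptions, not part of the patent; only "loudness", "soft", and "end" come from the text.

```python
KEYWORD = "computer"                    # hypothetical keyword
END = "end"                             # termination command from the text
MENU = {"loudness": {"soft", "loud"}}   # hypothetical menu scheme


def run_sequence(words):
    """Walk a stream of recognized words; return (menu item, parameter) pairs."""
    settings = []
    listening = False   # True once the keyword has been heard
    item = None         # currently selected menu item, if any
    for word in words:
        if not listening:
            # Outside a command sequence only the keyword is of interest.
            listening = (word == KEYWORD)
        elif word == END:
            # Optional explicit termination of the command sequence.
            listening, item = False, None
        elif item is None:
            if word in MENU:
                item = word
        elif word in MENU[item]:
            # Menu scheme has been run through: the sequence ends here.
            settings.append((item, word))
            listening, item = False, None
    return settings


print(run_sequence(["computer", "loudness", "soft"]))
```

Without the leading keyword, the same words produce no setting, because the parser never leaves its initial state.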
These speech recognition systems often include several system operating states. For example, in a first state the speech recognition system waits to recognize a keyword. After the keyword has been recognized, the system transitions to a second state and tests subsequent voice commands in order to select a menu item or to set a corresponding parameter associated with the voice command. In these individual states, the conventional speech recognition system utilizes a constant/static FAR value and FRR value. However, since neither the static FAR value nor the static FRR value is optimized, the system operation is often less than desirable.
Therefore, there is a need for a speech recognition system that more accurately recognizes speech.
SUMMARY OF THE INVENTION
Briefly, according to the present invention, a speech recognition unit includes a memory device that stores an executable speech recognition routine, and a processor that receives and executes program instructions associated with the executable speech recognition routine. In a first operating state the processor regularly receives a digitized audio signal and processes the digitized audio signal to determine, using first detection criteria, if a keyword is within the digitized audio signal, and upon detection of the keyword the processor transitions to operate in a second operating state. In the second operating state the processor regularly receives and processes the digitized audio signal to determine, using a second detection criteria, if a first voice command is within the digitized audio signal. The first detection criteria is selected to provide a greater false rejection rate than the second detection criteria.
Speech is recognized with state-specific speech recognition parameters, which are referred to as “scores”. The speech recognition parameters thus are set to different values in the individual states so that, for example, the FAR value and the FRR value are influenced in a manner appropriately specific to the state. This procedure has the advantage that the speech recognition parameters can be optimized for each state.
It is especially advantageous for the FAR value to be lower in the state during which the system waits for input of the keyword than it is in the other states. The acceptance threshold corresponding to the FAR value at which a phonetic sequence is recognized as a voice command is thus increased. Therefore, the user is required to speak the keyword clearly and to repeat it if necessary. Reducing the FAR value generally results in an increase of the FRR value (i.e., the probability that a valid voice command is not recognized is increased).
After the keyword has been recognized, the FAR value can be increased and thus the acceptance threshold can be lowered. At the same time, the FRR value is decreased thus lowering the probability of an erroneous rejection of a voice command. This reduces the probability of erroneous execution of more complex voice commands and at the same time increases operating convenience.
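The state-specific adjustment described above can be sketched as a two-state recognizer that applies a strict acceptance threshold while waiting for the keyword and a more lenient one afterwards. This is a hedged illustration rather than the patented implementation: the State enum, the THRESHOLDS values, the keyword "computer", the command set, and the step function are all hypothetical, and a single score threshold stands in for whatever parameters a real recognizer would expose.

```python
from enum import Enum


class State(Enum):
    WAIT_KEYWORD = 1   # first operating state: waiting for the keyword
    WAIT_COMMAND = 2   # second operating state: waiting for voice commands


# Hypothetical thresholds: strict in the keyword state (low FAR, higher FRR),
# lenient once the keyword has been recognized (higher FAR, lower FRR).
THRESHOLDS = {State.WAIT_KEYWORD: 0.9, State.WAIT_COMMAND: 0.6}

KEYWORD = "computer"                    # hypothetical keyword
COMMANDS = {"loudness", "soft", "end"}  # commands named in the text


def step(state, hypothesis, score):
    """Process one recognizer hypothesis; return (new_state, accepted_word)."""
    if score < THRESHOLDS[state]:
        return state, None                      # below the state's threshold
    if state is State.WAIT_KEYWORD:
        if hypothesis == KEYWORD:
            return State.WAIT_COMMAND, hypothesis
        return state, None
    if hypothesis in COMMANDS:
        # "end" terminates the sequence and restores the strict state.
        next_state = State.WAIT_KEYWORD if hypothesis == "end" else state
        return next_state, hypothesis
    return state, None


state = State.WAIT_KEYWORD
for hypothesis, score in [("computer", 0.8), ("computer", 0.95), ("loudness", 0.7)]:
    state, accepted = step(state, hypothesis, score)
    print(hypothesis, score, "->", state.name, accepted)
```

In this sketch a keyword spoken with a score of 0.8 is rejected and must be repeated, while a command with a score of 0.7 is accepted once the second state has been reached.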
Adjusting the speech recognition parameters as a function of the state utilizes the fact that the probability of recognizing a control word or a control command after the keyword has been recognized is close to 100%, since the keyword is unlikely to occur on occasions other than the initial state.
A preferred application of the present invention is in the field of entertainment electronics. However, in principle the invention is suited for systems of arbitrary design, which are to be controlled by speech recognition. Furthermore, the invention is not limited to adjusting the FAR value and the FRR value, but can also be applied to dynamically adjusting other speech recognition parameters that are important for the speech recognition function.
These and other objects, features and advantages of the present invention will become more apparent in light of the following detailed description of preferred embodiments thereof, as illustrated in the accompanying drawings.


Inventors: Burchard Bernd; Fournier Jean-Philippe; Schneider Tobias; Volk Thomas
Examiner: Abebe Daniel
Law firm: Gauthier & Connors LLP
Corporate assignee: Micronas GmbH
