Noise-compensated speech recognition templates

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

Classification: C704S244000

Status: active

Patent number: 06381569


BACKGROUND OF THE INVENTION
I. Field of the Invention
The present invention relates to speech processing. More particularly, the present invention relates to a system and method for the automatic recognition of spoken words or phrases.
II. Description of the Related Art
Digital processing of speech signals has found widespread use, particularly in cellular telephone and PCS applications. One digital speech processing technique is speech recognition. Speech recognition is gaining importance for safety reasons. For example, speech recognition may be used to replace the manual task of pushing buttons on a cellular phone keypad. This is especially important when a user initiates a telephone call while driving a car. When using a phone without speech recognition, the driver must remove one hand from the steering wheel and look at the phone keypad while pushing the buttons to dial the call. These acts increase the likelihood of a car accident. Speech recognition allows the driver to place telephone calls while continuously watching the road and keeping both hands on the steering wheel. For safety reasons, handsfree carkits containing speech recognition will likely be a legislated requirement in future systems.
Speaker-dependent speech recognition, the most common type in use today, operates in two phases: a training phase and a recognition phase. In the training phase, the speech recognition system prompts the user to speak each of the words in the vocabulary once or twice so that it can learn the characteristics of the user's speech for these particular words or phrases. The recognition vocabulary is typically small (fewer than 50 words), and the speech recognition system achieves high recognition accuracy only for the user who trained it. An example vocabulary for a handsfree carkit system would include the digits on the keypad, the keywords “call”, “send”, “dial”, “cancel”, “clear”, “add”, “delete”, “history”, “program”, “yes”, and “no”, as well as the names of 20 commonly called coworkers, friends, or family members. Once training is complete, the user can initiate calls in the recognition phase by speaking the trained keywords. For example, if the name “John” was one of the trained names, the user can initiate a call to John by saying the phrase “Call John.” The speech recognition system recognizes the words “Call” and “John” and dials the number that the user had previously entered as John's telephone number.
A block diagram of a training unit 6 of a speaker-dependent speech recognition system is shown in FIG. 1. Training unit 6 receives as input s(n), a set of digitized speech samples for the word or phrase to be trained. The speech signal s(n) is passed through parameter determination block 7, which produces a template of N parameters {p(n), n = 1 . . . N} capturing the characteristics of the user's pronunciation of the particular word or phrase. Parameter determination block 7 may implement any of a number of speech parameter determination techniques, many of which are well known in the art. An exemplary embodiment of a parameter determination technique is the vocoder encoder described in U.S. Pat. No. 5,414,796, entitled “VARIABLE RATE VOCODER,” which is assigned to the assignee of the present invention and incorporated by reference herein. An alternative embodiment of a parameter determination technique is a fast Fourier transform (FFT), where the N parameters are the N FFT coefficients. Other embodiments derive parameters from the FFT coefficients. Each spoken word or phrase produces one template of N parameters that is stored in template database 8. After training is completed over M vocabulary words, template database 8 contains M templates, each containing N parameters. Template database 8 is stored in some type of non-volatile memory so that the templates remain resident when the power is turned off.
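As a rough illustration of the training phase, the following Python sketch stands in for parameter determination block 7 with a discrete Fourier transform and keeps the magnitudes of the first N coefficients as the template parameters (one way of deriving parameters from the FFT coefficients; the vocoder embodiment is not modeled). The function names, the synthetic tone "utterances", and the in-memory dict used for template database 8 are hypothetical; a real system would persist the database in non-volatile memory.

```python
import cmath
import math

def parameter_determination(samples, n_params):
    """Hypothetical stand-in for parameter determination block 7:
    magnitudes of the first n_params DFT coefficients of the samples.
    A direct O(L^2) DFT is used for clarity; an FFT yields the same values faster."""
    length = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / length)
                    for n, s in enumerate(samples)))
            for k in range(n_params)]

def train(recordings, n_params):
    """Sketch of training unit 6: one N-parameter template per vocabulary word,
    collected into a template database (here just an in-memory dict)."""
    return {word: parameter_determination(samples, n_params)
            for word, samples in recordings.items()}

def tone(freq_bin, length=64):
    """Hypothetical synthetic 'utterance': a pure tone centered on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * n / length) for n in range(length)]

# M = 2 vocabulary words, N = 16 parameters per template.
template_database = train({"call": tone(3), "send": tone(7)}, n_params=16)
print(len(template_database), len(template_database["call"]))  # 2 16
```

Each template here is dominated by the DFT bin of its tone, which is what makes the templates distinguishable from one another.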
FIG. 2 is a block diagram of speech recognition unit 10, which operates during the recognition phase of a speaker-dependent speech recognition system. Speech recognition unit 10 comprises template database 14, which in general will be template database 8 from training unit 6. The input to speech recognition unit 10 is digitized input speech x(n), which is the speech to be recognized. The input speech x(n) is passed into parameter determination block 12, which performs the same parameter determination technique as parameter determination block 7 of training unit 6. Parameter determination block 12 produces a recognition template of N parameters {t(n), n = 1 . . . N} that models the characteristics of input speech x(n). Recognition template t(n) is then passed to pattern comparison block 16, which performs a pattern comparison between template t(n) and all the templates stored in template database 14. The distances between template t(n) and each of the templates in template database 14 are forwarded to decision block 18, which selects from template database 14 the template that most closely matches recognition template t(n). The output of decision block 18 is the decision as to which word in the vocabulary was spoken.
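The comparison and decision steps can be sketched in the same spirit. The text does not specify the distance measure used by pattern comparison block 16, so this sketch assumes Euclidean distance; the function names and the four-parameter example templates are hypothetical and simply stand in for the output of parameter determination block 12 and the contents of template database 14.

```python
import math

def pattern_comparison(recognition_template, template_database):
    """Sketch of pattern comparison block 16: distance from the recognition
    template t(n) to every stored template (Euclidean distance is an assumption)."""
    return {word: math.dist(recognition_template, template)
            for word, template in template_database.items()}

def decision(distances):
    """Sketch of decision block 18: choose the word whose template is closest."""
    return min(distances, key=distances.get)

# Hypothetical trained templates (N = 4) and a recognition template t(n).
template_database = {
    "call": [1.0, 0.2, 0.0, 0.1],
    "send": [0.1, 0.9, 0.8, 0.0],
    "yes":  [0.0, 0.1, 0.2, 1.0],
}
t = [0.9, 0.3, 0.1, 0.1]

distances = pattern_comparison(t, template_database)
print(decision(distances))  # call
```

Keeping the distances as a separate result mirrors the split between blocks 16 and 18: the decision logic could later add, for example, a rejection threshold without touching the comparison step.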
Recognition accuracy is a measure of how well a recognition system correctly recognizes spoken words or phrases in the vocabulary. For example, a recognition accuracy of 95% indicates that the recognition unit correctly recognizes words in the vocabulary 95 times out of 100. In a traditional speech recognition system, recognition accuracy is severely degraded in the presence of noise. The main reason for the loss of accuracy is that training typically occurs in a quiet environment, whereas recognition typically occurs in a noisy environment. For example, a handsfree carkit speech recognition system is usually trained while the car is sitting in a garage or parked in the driveway, so the engine and air conditioning are not running and the windows are usually rolled up. However, recognition is normally used while the car is moving, so the engine is running, road and wind noise are present, the windows may be down, and so on. As a result of the disparity in noise level between the training and recognition phases, the recognition template does not form a good match with any of the templates obtained during training. This increases the likelihood of a recognition error or failure.
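The accuracy figure is simply the ratio of correct decisions to test utterances. A minimal sketch reproducing the 95-out-of-100 example (the function name and test data are hypothetical):

```python
def recognition_accuracy(decisions, spoken_words):
    """Percentage of test utterances whose recognized word matches the word spoken."""
    if not spoken_words or len(decisions) != len(spoken_words):
        raise ValueError("need exactly one decision per spoken word")
    correct = sum(1 for decided, spoken in zip(decisions, spoken_words)
                  if decided == spoken)
    return 100.0 * correct / len(spoken_words)

spoken = ["call"] * 100
decisions = ["call"] * 95 + ["send"] * 5   # 5 misrecognitions out of 100
print(recognition_accuracy(decisions, spoken))  # 95.0
```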
FIG. 3 illustrates a speech recognition unit 20 that must perform speech recognition in the presence of noise. As shown in FIG. 3, summer 22 adds speech signal x(n) to noise signal w(n) to produce noise-corrupted speech signal r(n). It should be understood that summer 22 is not a physical element of the system but an artifact of a noisy environment. The noise-corrupted speech signal r(n) is input to parameter determination block 24, which produces noise-corrupted template t₁(n). Pattern comparison block 28 compares template t₁(n) with all the templates in template database 26, which was constructed in a quiet environment. Since noise-corrupted template t₁(n) does not exactly match any of the training templates, there is a high probability that the decision produced by decision block 30 will be a recognition error or failure.
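The mismatch can be illustrated numerically. This sketch assumes white Gaussian noise for w(n), a synthetic tone for x(n), and a hypothetical DFT-magnitude parameterization as a stand-in for the parameter determination blocks. It shows only that the noise-corrupted template t₁(n) no longer coincides with the template trained in quiet; it does not estimate how often recognition actually fails.

```python
import cmath
import math
import random

def parameter_determination(samples, n_params=16):
    """Hypothetical stand-in for the parameter determination blocks:
    magnitudes of the first n_params DFT coefficients."""
    length = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / length)
                    for n, s in enumerate(samples)))
            for k in range(n_params)]

def add_noise(x, noise_std, seed=0):
    """Models summer 22: r(n) = x(n) + w(n), with w(n) assumed white Gaussian."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in x]

# Clean utterance of one vocabulary word (a synthetic tone stands in for speech).
x = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
training_template = parameter_determination(x)   # built in a quiet environment

r = add_noise(x, noise_std=0.5)                   # same word spoken in noise
t1 = parameter_determination(r)                   # noise-corrupted template t1(n)

clean_distance = math.dist(parameter_determination(x), training_template)
noisy_distance = math.dist(t1, training_template)
print(clean_distance, noisy_distance > clean_distance)  # 0.0 True
```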
SUMMARY OF THE INVENTION
The present invention is a system and method for the automatic recognition of spoken words or phrases in the presence of noise. Speaker-dependent speech recognition systems operate in two phases: a training phase and a recognition phase. In the training phase of a traditional speech recognition system, a user is prompted to speak all the words or phrases in a specified vocabulary. The digitized speech samples for each word or phrase are processed to produce a template of parameters characterizing the spoken words. The output of the training phase is a library of such templates. In the recognition phase, the user speaks a particular word or phrase to initiate a desired action. The spoken word or phrase is digitized and processed to produce a template, which is compared with all
