Dynamic speech recognition pattern switching for enhanced...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details


active

06631348

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to speech recognition systems. Specifically, this invention relates to a novel system and method that enhances the accuracy of speech recognition systems by dynamically switching between reference patterns corresponding to training information produced under different ambient noise levels.
2. Description of Related Art and General Background
Speech recognition systems afford users the capability of performing various tasks on recognition-enabled apparatuses via verbal commands.
FIG. 1A (Prior Art) is a high-level functional block diagram depicting a conventional speech recognition system 100. As indicated in FIG. 1A, system 100 comprises apparatus 105 and a sound capturing device 115 (e.g., a microphone). Apparatus 105 includes a speech recognition processing mechanism 110 for analyzing and processing sounds captured by device 115 and for generating an identified utterance signal u_i. Apparatus 105 also includes a statistical speech model 120 comprising a set of reference patterns, and related applications 125 for performing predetermined tasks t_i. It is to be noted that apparatus 105 may take the form of a computer, telephone, or any device capable of recognizing and processing verbal commands and executing tasks based on those commands.
FIG. 1B is a high-level flow diagram depicting the general operation of system 100, denoted as process 150. As indicated in FIG. 1B, the sounds or utterances captured by device 115 are received by speech recognition processing mechanism 110 in analog form in block B155. In block B160, mechanism 110 samples and digitizes the analog utterances and assembles the digitized utterances into frames. In block B165, mechanism 110 then extracts acoustical information from the utterance frames by employing any of a number of well-known techniques, including Linear Predictive Coding (LPC) and Filter Bank Analyses (FBA).
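The framing and feature extraction steps of blocks B160 and B165 can be illustrated with a minimal sketch. It assumes the samples are already digitized into a list of floats and uses the autocorrelation method with the Levinson-Durbin recursion as one common way to compute LPC coefficients. The function names, frame length, hop size, and predictor order are hypothetical choices for illustration and are not taken from the patent.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split digitized samples into fixed-length, possibly overlapping frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] such that
    s[n] is approximated by sum(a[k] * s[n - k]).
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


# Hypothetical input: 50 ms of a 440 Hz tone sampled at 8 kHz.
samples = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(400)]
frames = frame_signal(samples, frame_len=200, hop=100)
features = [lpc(frame, order=2) for frame in frames]
```

Each entry of `features` is one vector of acoustical information per frame. A filter bank analysis would replace `lpc` with per-band energy measurements while keeping the same framing step.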
In block B170, process 150 endeavors to “recognize” the speech captured by device 115 by having mechanism 110 compare the extracted acoustical information to a set of reference patterns stored in speech model 120. The reference patterns comprise a plurality of utterances to be recognized. As such, mechanism 110 determines the best match between the extracted acoustical information and reference patterns in order to identify the utterance received by mechanism 110. In performing the comparisons, mechanism 110 may employ a host of well-known statistical pattern matching techniques, including Hidden Markov Models, Neural Networks, Dynamic Time Warped models, Templates, or any other suitable word representation model. It is to be noted that the plurality of utterances comprising the reference patterns are based, at least in part, on speech training information produced during a training mode. Typically, in training mode, users recite a variety of selected verses into device 115 in order to acclimate mechanism 110 to the user's voice, prior to using system 100. To this end, the selected verses are designed to make the user articulate a wide range of sounds (e.g., diphones, phonemes, allophones, etc.).
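The best-match comparison of block B170 can be sketched with dynamic time warping over stored templates, one of the techniques named above. This is an illustrative sketch only: it assumes each utterance is a sequence of per-frame feature vectors, and the function names and the example labels are hypothetical.

```python
import math


def euclidean(u, v):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


def best_match(features, reference_patterns):
    """Return the label of the reference pattern closest to the input."""
    return min(reference_patterns,
               key=lambda label: dtw_distance(features, reference_patterns[label]))


# Hypothetical one-dimensional templates produced during training.
reference_patterns = {
    "open": [[0.0], [1.0], [2.0]],
    "close": [[5.0], [5.0], [5.0]],
}
identified = best_match([[0.0], [0.0], [1.0], [2.0]], reference_patterns)
```

Dynamic time warping tolerates differences in speaking rate because a frame in the input may align with several frames in the template, which is why the four-frame input above still matches the three-frame "open" template exactly.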
Based on the results of the comparison, mechanism 110, in block B175, generates an identified utterance signal u_i, indicating the best match between the utterance received by mechanism 110 and the stored reference patterns. Mechanism 110 then supplies signal u_i to applications 125 to perform the predetermined tasks t_i.
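The hand-off from the identified utterance signal u_i to a predetermined task t_i can be pictured as a lookup from utterance labels to callables. The sketch below is an assumption about how an application layer might be organized; the names `make_dispatcher` and `tasks` and the example commands are hypothetical.

```python
from typing import Callable, Dict


def make_dispatcher(tasks: Dict[str, Callable[[], str]]) -> Callable[[str], str]:
    """Build a function that runs the task registered for an utterance label."""

    def dispatch(utterance_id: str) -> str:
        task = tasks.get(utterance_id)
        if task is None:
            # No predetermined task is registered for this utterance.
            return "unrecognized"
        return task()

    return dispatch


# Hypothetical tasks for a word-processing application.
dispatch = make_dispatcher({
    "save": lambda: "document saved",
    "print": lambda: "document sent to printer",
})
result = dispatch("save")
```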
As noted in FIG. 1A, speech training is performed in the presence of ambient noise level n, and thus, the utterances comprising the stored reference patterns are affected by ambient noise. Given the reliance of the reference patterns on the speech training information, system 100 is particularly susceptible to the contextual nature of speech training. For example, suppose apparatus 105 is a portable computer equipped with applications 125, configured to convert speech into text for word-processing tasks, and mechanism 110, trained within a relatively serene environment (e.g., an office). Once moved from the serene environment into a noisier environment, such as, for example, an airplane, mechanism 110 may suffer a significant decrease in accuracy and fidelity. The reasons for such a decrease in performance may be two-fold. One reason may be that the ambient noise level n is so high that the sounds captured by the sound capturing device include a blend of speech and background noise, thus making it difficult to distinguish between the two.
Another reason, perhaps more common, is the fact that individuals have a tendency to manipulate their voices so as to ensure that the speech produced is understandable in the presence of substantial ambient noise. In doing so, individuals may, unwittingly, pronounce words with different phonological characteristics (e.g., level, inflections, stress, pitch, and rhythm) than normally produced during quieter conditions. As such, the performance of speech recognition processing mechanism 110, trained and acclimated to a user's pronunciations under certain conditions, may be adversely affected when mechanism 110 operates under different conditions.
Therefore, what is needed is a system and method that dynamically switches between reference patterns based on training information produced under different ambient noise levels to enhance speech recognition accuracy.
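The general idea of switching between reference patterns according to ambient noise can be sketched as follows. This is not the method claimed in the patent, whose details are not given in this section. It assumes, hypothetically, that each pattern set is tagged with the noise level (in dB relative to full scale) at which it was trained, and that the current ambient level is estimated from a stretch of noise-only samples.

```python
import math


def rms_level_db(samples, floor=1e-12):
    """Estimate signal level in dB from the mean squared sample value."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, floor))


def select_pattern_set(noise_db, pattern_sets):
    """Pick the pattern set whose training noise level is nearest noise_db.

    pattern_sets maps a training noise level in dB to a set of
    reference patterns (hypothetical layout, for illustration only).
    """
    nearest = min(pattern_sets, key=lambda level: abs(level - noise_db))
    return pattern_sets[nearest]


# Hypothetical pattern sets trained in a quiet office and a noisy cabin.
pattern_sets = {-60.0: "office patterns", -20.0: "airplane patterns"}
ambient = rms_level_db([0.1, -0.1, 0.1, -0.1])
active = select_pattern_set(ambient, pattern_sets)
```

Choosing the nearest training level is the simplest selection rule; a practical system would likely add hysteresis so that small fluctuations in ambient level do not cause rapid switching.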


REFERENCES:
patent: 4897878 (1990-01-01), Boll et al.
patent: 4905286 (1990-02-01), Sedgwick et al.
patent: 4933973 (1990-06-01), Porter
patent: 5293588 (1994-03-01), Satoh et al.
patent: 6381569 (2002-04-01), Sih et al.
patent: 6529872 (2003-03-01), Cerisara et al.
