Reexamination Certificate
1995-04-24
2001-01-23
Knepper, David D. (Department: 2748)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S244000
active
06178399
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus for recognizing time series signals, such as human speech and other acoustic signals.
2. Description of the Background Art
Conventionally, recognition of time series signals, such as speech recognition, has been achieved basically by first performing a so-called segmentation, in which a word boundary is detected in the time series signals, and then searching for a match between a reference pattern in a speech recognition dictionary and a word feature parameter extracted from the signal within the detected word boundary. Several speech recognition methods fall within this category of the prior art, including DP matching, HMM (Hidden Markov Model), and the Multiple Similarity (partial space) method.
However, in more realistic noisy environments there has been a practical problem: as many recognition errors are due to failure to detect the appropriate word boundary as are due to false pattern matching.
Namely, the detection of the word boundary has conventionally been performed with energy or pitch frequency as a parameter, so that highly accurate recognition can be achieved in tests performed in a quiet experiment room, but the recognition rate drastically decreases in more practical locations of use, such as inside offices, cars, stations, or factories.
To cope with this problem, a speech recognition method called word spotting (continuous pattern matching) has been proposed, in which the word boundary is taken to be flexible rather than fixed. However, this method is associated with another kind of recognition error problem.
This can be seen from the diagram of FIG. 1, in which an example of a time series for the energy of a signal is depicted along with indications for three different noise levels. As shown in FIG. 1, the word boundary for this signal progressively gets narrower as the noise level increases from N1 to N2 and to N3, which are indicated as intervals (S1, E1), (S2, E2), and (S3, E3), respectively. However, the speech recognition dictionary is usually prepared by using the word feature vectors obtained by using the specific word boundaries and the specific noise level, so that when such a conventional speech recognition dictionary is used with the word spotting method, the matching with the word feature vector obtained from an unfixed word boundary for a speech mixed with noise having a low signal/noise ratio becomes troublesome, and many recognition errors occur.
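The narrowing of the detected word boundary with rising noise level can be illustrated with a minimal sketch. It assumes a simple energy-threshold detector in which a frame belongs to the word only if its energy exceeds the noise level. The function name `detect_boundary`, the frame energies, and the three noise levels are hypothetical values chosen for illustration and are not taken from FIG. 1.

```python
def detect_boundary(energy, noise_level):
    """Return (start, end) frame indices of the span whose energy exceeds
    noise_level, or None if no frame does. Illustrative detector only."""
    above = [i for i, e in enumerate(energy) if e > noise_level]
    if not above:
        return None
    return above[0], above[-1]


# Hypothetical frame energies for one word: rise, peak, fall.
energy = [1, 2, 4, 7, 9, 10, 9, 7, 4, 2, 1]

# Three assumed noise levels, playing the roles of N1 < N2 < N3.
for name, level in [("N1", 1.5), ("N2", 3.0), ("N3", 6.0)]:
    print(name, detect_boundary(energy, level))
# N1 (1, 9)
# N2 (2, 8)
# N3 (3, 7)
```

A dictionary built from the widest interval would be matched against feature vectors cut from the narrower ones, which is the mismatch described above.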
On the other hand, for a speech recognition method using a fixed word boundary, there is a learning system for a speech recognition dictionary in which the speech variations are taken into account artificially, but no effective learning system is known for the word spotting method, so that the word spotting method has been plagued by the problem of excessive recognition errors.
Thus, although a sufficiently high recognition rate has been obtainable for experiments performed in a favorable noiseless environment, such as an experimental room, conducted by an experienced experimenter, a low recognition rate resulted in a more practical noisy environment with an inexperienced speaker because of errors in word boundary detection. This has been a major obstacle for realization of a practical speech recognition system. Furthermore, the speech recognition dictionary and the word boundary detection have been developed rather independently of each other, so that no effective learning system has been known for the speech recognition method using an unfixed word boundary, such as the word spotting method.
It is also to be noted that these problems are relevant not only to speech recognition, but also to the recognition of other time series signals, such as vibrations or various sensor signals.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a method and an apparatus for time series signal recognition capable of obtaining a high recognition rate even in noisy environments in which the signals are subjected to rather large variations.
According to one aspect of the present invention, there is provided an apparatus for time series signal recognition, comprising: means for inputting signal patterns for time series signals to be recognized; means for recognizing the time series signals, including: means for extracting a multiplicity of candidate feature vectors characterizing individual time series signal from the signal pattern, without fixing a boundary for individual time series signal in the signal patterns; recognition dictionary means for storing reference patterns with which the individual time series signals are matched; means for calculating similarity values for each of the multiplicity of candidate feature vectors and the reference patterns stored in the recognition dictionary means; means for determining a recognition result by selecting reference patterns stored in the recognition dictionary means for which the similarity value calculated by the calculating means is greater than a prescribed threshold value; and means for learning new reference patterns to be stored in the recognition dictionary means, including: means for artificially synthesizing signal patterns with variations for learning to be given to the recognizing means; means for extracting feature vectors for learning from the recognition results and the similarity values obtained by the recognizing means from the signal patterns with variations for learning; and means for obtaining the new reference patterns from the feature vectors for learning extracted by the extracting means.
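The recognizing means recited above can be pictured with a minimal sketch, under several assumptions that are not part of this section: the signal is a list of per-frame values, a candidate feature vector is a fixed-length resampling of a window of frames, and cosine similarity stands in for whatever similarity measure the apparatus actually uses. The names `spot_words`, `resample`, and `cosine_similarity`, the toy dictionary, and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def resample(segment, dim):
    """Pick dim evenly spaced frames so every candidate has the same length.
    Assumes dim >= 2 and len(segment) >= 1."""
    last = len(segment) - 1
    return [segment[round(i * last / (dim - 1))] for i in range(dim)]


def spot_words(signal, dictionary, dim=4, min_len=4, max_len=8, threshold=0.95):
    """Extract many candidate vectors without fixing a word boundary, score
    each against every reference pattern, and keep scores above threshold."""
    hits = []
    for end in range(min_len, len(signal) + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            vec = resample(signal[start:end], dim)
            for word, ref in dictionary.items():
                sim = cosine_similarity(vec, ref)
                if sim > threshold:
                    hits.append((sim, word, start, end))
    return sorted(hits, reverse=True)  # best-scoring candidate first


# Toy reference patterns and a signal with low-level background around a word.
dictionary = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}
signal = [0.1, 0.1, 1, 2, 3, 4, 0.1, 0.1]

best = spot_words(signal, dictionary)[0]
print(best[1:])  # ('up', 2, 6)
```

Every (start, end) pair is tried, so no separate boundary detection step is needed; the cost is that several overlapping candidates may exceed the threshold, and the sketch simply ranks them by similarity.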
According to another aspect of the present invention there is provided a method of time series signal recognition, comprising the steps of: inputting signal patterns for time series signals to be recognized; recognizing the time series signals, including the steps of: extracting a multiplicity of candidate feature vectors characterizing individual time series signal from the signal patterns, without fixing a boundary for individual time series signal in the signal patterns; storing reference patterns with which the individual time series signals are matched in recognition dictionary means; calculating similarity values for each of the multiplicity of candidate feature vectors and the reference patterns stored in the recognition dictionary means; and determining a recognition result by selecting reference patterns stored in the recognition dictionary means, for which the similarity value calculated at the calculating step is greater than a prescribed threshold value; and learning new reference patterns to be stored in the recognition dictionary means, including the steps of: artificially synthesizing signal patterns with variations for learning to be given to the recognizing step; extracting feature vectors for learning from the recognition results and the similarity values obtained by the recognizing step from the signal patterns with variations for learning; and obtaining the new reference patterns from the feature vectors for learning extracted by the extracting step.
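The learning steps recited above can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: the variation is additive uniform noise of a chosen amplitude, the candidate vectors are fixed-length sliding windows, similarity is cosine similarity, and the new reference pattern is the plain average of the accepted learning vectors. The section does not specify any of these choices, and all names (`synthesize`, `best_candidate`, `learn_reference`) are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def synthesize(clean, noise_amplitude, pad, rng):
    """Embed the clean pattern in silence and add uniform noise (assumed model)."""
    padded = [0.0] * pad + list(clean) + [0.0] * pad
    return [x + rng.uniform(0.0, noise_amplitude) for x in padded]


def best_candidate(signal, ref):
    """Slide a window over the signal without fixing a boundary and return
    (similarity, vector) for the window that best matches ref."""
    best = None
    for start in range(len(signal) - len(ref) + 1):
        vec = signal[start:start + len(ref)]
        sim = cosine_similarity(vec, ref)
        if best is None or sim > best[0]:
            best = (sim, vec)
    return best


def learn_reference(clean, ref, noise_amplitudes, threshold, rng):
    """Run recognition on synthesized noisy patterns, collect the recognized
    vectors as learning vectors, and average them into a new reference."""
    learning_vectors = []
    for amp in noise_amplitudes:
        noisy = synthesize(clean, amp, pad=3, rng=rng)
        sim, vec = best_candidate(noisy, ref)
        if sim > threshold:  # keep only vectors the recognizer accepted
            learning_vectors.append(vec)
    if not learning_vectors:
        return list(ref)  # nothing accepted: keep the old reference
    n = len(learning_vectors)
    return [sum(col) / n for col in zip(*learning_vectors)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [1.0, 2.0, 3.0, 4.0]
new_ref = learn_reference(clean, clean, [0.1, 0.3, 0.5], 0.9, rng)
print(new_ref)
```

Because the learning vectors are taken from the recognizer's own output on noisy input, the updated reference reflects the boundaries and noise conditions the word spotter actually encounters, which is the coupling between dictionary and boundary handling that the Background says was missing.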
Other features and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings.
REFERENCES:
patent: 4100370 (1978-07-01), Suzuki
patent: 4481593 (1984-11-01), Bahler
patent: 4720802 (1988-01-01), Damoulakis et al.
patent: 4783802 (1988-11-01), Takebayoshi
patent: 4852181 (1989-07-01), Morito et al.
patent: 178 509 (1985-09-01), None
C.Lee, et al., IEEE Int'l Conference on Acoustics Speech and Signal Processing, “Speech Recognition Under Additive Noise”, vol. 3, Mar. 19, 1984, pp. 3571-3572.
The ICAASP Space 84 Proceedings, Mar. 19-21, 1984, pp.3573-3574.
David Roe, IEEE Int'l Conference on Acoustics Speech and Signal Processing, “Speech Recognition with a Noise-Adapting Codebook”, vol. 2, Apr. 6, 1987, pp.1139-1140.
D. Paul, et al., Speech Tech '86, “Robust HMM Based Techniques for Recognition of Speech Produced Under Stress
Chimoto Hiroyuki
Kanazawa Hiroshi
Takebayashi Yoichi
Foley & Lardner
Kabushiki Kaisha Toshiba
Knepper David D.