Image analysis – Pattern recognition – Classification
Reexamination Certificate
1996-02-02
2001-01-09
Bella, Matthew C. (Department: 2721)
C382S159000, C382S228000, C704S244000, C704S251000
active
06173076
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a pattern recognition system and, more particularly, to a pattern adaptation system for adapting “a reference pattern” constituting a plurality of different categories using “an input pattern” as an aggregate of input samples. It is presently understood that the best field of utilization of the present invention is the speaker adaptation system in a speech recognition system. This system is based on a Hidden Markov model (HMM) of a mixed continuous distribution model type or the like in which the reference pattern output probability distribution is a mixed Gaussian distribution.
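As an illustration only, the mixed Gaussian output probability named above can be sketched as a weighted sum of Gaussian densities. The diagonal covariance, the function names, and the parameter layout are assumptions made for this sketch and are not taken from the specification.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mixture_output_prob(x, weights, means, variances):
    """Output probability of feature vector x under a Gaussian mixture.

    weights, means, and variances are hypothetical per-component
    parameters of one HMM state; the weights are expected to sum to 1.
    """
    return sum(
        w * math.exp(gaussian_diag_logpdf(x, m, v))
        for w, m, v in zip(weights, means, variances)
    )
```

With a single component of zero mean and unit variance, the function reduces to the standard normal density, which is a convenient sanity check.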
Recently, research and investigations concerning mechanical recognition of speech patterns have been made, and various methods (i.e., speech recognition methods) have been proposed. One typical method that is extensively applied is based on a method called dynamic programming (DP) matching.
Particularly, in the field of speech recognition systems using HMM, speaker-independent speech recognition systems that are capable of recognition of the speech of any person, have recently been extensively studied and developed.
The speaker-independent type of recognition system has an advantage over the speaker-dependent type of recognition system, where the speaker-dependent type is used by a definite user, because the user of a speaker-independent type need not register any speech in advance. However, the following problems in the speaker-independent recognition system are pointed out. A first problem is that the speaker-independent system is inferior to the speaker-dependent system for almost all speakers. A second problem is that the speaker-independent recognition performance is greatly deteriorated for some “particular speakers” (i.e., unique speakers).
In order to solve these problems, research and investigations have recently been started, which concern the application of the speaker adaptation techniques that are used mainly in speaker-dependent systems to speaker-independent systems as well. The speaker adaptation techniques have a concept of adapting a speech recognition system to new users (i.e., unknown speakers) by using a lesser amount of adaptation data than is used for the initial training. The speaker adaptation techniques are detailed in Sadaoki Furui, “Speaker Adaptation Techniques in Speech Recognition”, Television Study Association, Vol. 43, No. 9, 1989, pp. 929-934.
Speaker adaptation can be classified into two methods. One is “supervised speaker adaptation,” and the other is “unsupervised speaker adaptation.” Here, the “supervised signal” is a vocal sound expression series representing the speech contents of input speech. The “supervised speaker adaptation” thus refers to an adaptation method in the case where the vocal sound expression series for the input speech is known, and requires preliminary instruction of speech vocabularies to the unknown speaker for adaptation. The “unsupervised adaptation,” on the other hand, is an adaptation method used when the vocal sound expression series for the input speech is unknown, and imposes no limit on the speech contents of the unknown speaker's input speech, i.e., no speech vocabulary has to be instructed to the unknown speaker. In fact, unsupervised adaptation using the input speech that is itself the subject of speech recognition can occur without the unknown speaker being aware that adaptation is being done. Generally, however, the recognition rate after “unsupervised adaptation” is low compared to that after “supervised adaptation.” For this reason, “supervised adaptation” is presently used frequently.
From the above viewpoint, the need for the speaker adaptation system in the speech recognition system is increasing. The “adaptation” techniques as described are important not only in speech recognition systems but also in pattern recognition systems, the concept of which involves the speech recognition system. The “speaker adaptation system” in the speech recognition system can be generalized as the “pattern adaptation system” in the pattern recognition system.
In the prior art pattern adaptation systems of the type described, adaptation is executed in the same mode irrespective of whether the number of input samples for adaptation is large or small. Therefore, when the input samples are few, the amount of data may be insufficient, which deteriorates the accuracy of parameter estimation for the pattern adaptation.
The process of the speech recognition system, which is the most extensive application of the present invention, will now be described. A speech recognition system using HMM is described as an example, and the speaker adaptation techniques in this speech recognition system will also be mentioned with reference to FIG. 4.
A speaker's speech (i.e., input speech) is supplied to an input pattern generation device 42 for conversion to a feature vector time series for each unit, also called a “frame,” having a certain time length through such processes as analog-to-digital conversion and speech analysis. The “feature vector time series” is referred to as an input pattern. The time length of the frame is usually 10 to 100 ms. The feature vectors are obtained by extracting the feature quantity of the speech spectrum at corresponding instants, usually 10-dimensional to 100-dimensional (10-d to 100-d).
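The framing step described above can be sketched as follows. This is a minimal illustration, not the device 42 itself: the function names are hypothetical, the frames are assumed to be non-overlapping, and a single log-energy value stands in for the 10-dimensional to 100-dimensional spectral feature vector that a real speech analysis would produce.

```python
import math


def frame_signal(samples, sample_rate_hz, frame_ms=10.0):
    """Split digitized samples into consecutive, non-overlapping frames."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len  # trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def log_energy(frame):
    """Placeholder feature: log of the frame energy (not a spectral analysis)."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def input_pattern(samples, sample_rate_hz, frame_ms=10.0):
    """Return the input pattern: one (here 1-dimensional) feature vector per frame."""
    frames = frame_signal(samples, sample_rate_hz, frame_ms)
    return [[log_energy(f)] for f in frames]
```

For example, one second of speech sampled at 8,000 Hz with a 10 ms frame yields 100 frames of 80 samples each, and therefore an input pattern of 100 feature vectors.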
HMMs are stored as reference patterns in a reference pattern memory device 41. The HMMs are speech (sound) information source models, and the HMM parameters may be trained by using input speech. The HMMs will be mentioned in the description of a recognition device 43 given hereunder. An HMM is usually prepared for each recognition unit. Here, the case where the recognition unit is a sound element is taken as an example. In the speaker-independent recognition system, HMMs are stored in the reference pattern memory device 41, where the HMMs have been previously obtained for use with an unknown speaker through training on the speech of many speakers.
A case is now assumed, where 1,000 words are the subjects of recognition, that is, a case where a correct answer of one word is obtained among a set of recognition candidates of 1,000 words. For word recognition, HMMs of individual sound elements are coupled together to produce an HMM of a recognition candidate word (word HMM). When 1,000 words are recognized, word HMMs for 1,000 words are produced.
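The coupling of sound element HMMs into a word HMM can be sketched as a simple concatenation of state sequences. The `SoundElementHMM` class, the `build_word_hmm` function, and the example pronunciation are hypothetical names and data introduced for illustration; output and transition probabilities are omitted to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class SoundElementHMM:
    """Hypothetical stand-in for one sound element HMM (states only)."""
    name: str
    n_states: int


def build_word_hmm(sound_elements, models):
    """Couple sound element HMMs, in order, into one word HMM.

    Returns the word HMM as a list of (sound_element, state_index)
    labels running from the start state to the last state.
    """
    states = []
    for element in sound_elements:
        model = models[element]
        states.extend((element, s) for s in range(model.n_states))
    return states


# Assumed example: three sound elements with three states each.
models = {name: SoundElementHMM(name, 3) for name in ("k", "a", "t")}
word_hmm = build_word_hmm(["k", "a", "t"], models)
```

Repeating this for every entry of a 1,000-word vocabulary would give the 1,000 word HMMs mentioned above, while only the sound element HMMs themselves need to be stored.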
The recognition device 43 recognizes the input pattern using the word HMMs. This “pattern recognition” will now be described. In the HMM, a statistical concept is introduced into the description of the reference pattern to cope with variations of the speech pattern. The HMM is detailed in Seiichi Nakagawa, “Speech Recognition with Probability Models”, the Electronic Information Communication Engineer's Association, 1987 (hereinafter referred to as the Nakagawa Literature), pp. 40-44, 55-60 and 69-74.
Each sound element HMM usually comprises 1 to 10 states and inter-state transitions. Usually, the start (i.e., first) and last states are defined, and a symbol is taken out from each state for every unit time for inter-state transition. The speech of each sound element is expressed as a time series of symbols produced from individual states during the inter-state transition interval from the start state to the last state. For each state the symbol appearance probability (output probability) is defined, and for each inter-state transition the transition probability is defined. The HMM thus has an output probability parameter and a transition probability parameter. The output probability parameter represents a “sound color” sway of the speech pattern. The transition probability parameter represents a “time-wise” sway of the speech pattern. The generation probability of speech from the model (i.e., HMM) thereof, can be obtained by setting the start state probability to a certain value and multiplying the value by the output probabili
Foley & Lardner
NEC Corporation