Reexamination Certificate
2001-03-27
2004-10-05
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S254000
active
06801892
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to a speech recognition method and apparatus using a Hidden Markov Model, a program for executing speech recognition by computer, and a storage medium from which the stored program can be read by a computer.
BACKGROUND OF THE INVENTION
Methods using the Hidden Markov Model (referred to as “HMM” below) are the focus of continuing research and application as effective methods of speech recognition, and many speech recognition systems are currently in use.
FIG. 6 is a flowchart illustrating an example of conventional speech recognition using an HMM.
Step S1, which is a voice input step, subjects a voice signal that has been input from a microphone or the like to an analog-to-digital conversion to obtain a digital signal. Step S2 subjects the voice signal obtained by the conversion at step S1 to acoustic analysis and extracts a time series of feature vectors. In acoustic analysis, an analytical window having a window width of 30 ms is provided for a voice signal, which is a continuous waveform that varies with time, and the voice signal is subjected to acoustic analysis while the analytical window is shifted by one-half to one-third the window width (i.e., 10 to 15 ms). The analytical results within each of the windows are output as feature vectors. The voice signal is converted to feature-vector sequences O(t) (1≦t≦T), wherein t represents the frame number.
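The windowing described above can be illustrated with a minimal sketch. The function name `frame_signal`, the 16 kHz sampling rate, and the choice to discard a trailing partial window are assumptions made for illustration; the source specifies only the 30 ms window width and the 10 to 15 ms shift, and does not say which features are computed from each window.

```python
def frame_signal(samples, sample_rate_hz, window_ms=30, shift_ms=10):
    """Split a sampled signal into overlapping analysis windows.

    Each returned window would subsequently be reduced to one feature
    vector O(t); that analysis step is not specified in the source and
    is therefore omitted here.
    """
    window_len = sample_rate_hz * window_ms // 1000  # samples per window
    shift_len = sample_rate_hz * shift_ms // 1000    # samples per shift
    frames = []
    start = 0
    # Assumption: a trailing partial window is discarded.
    while start + window_len <= len(samples):
        frames.append(samples[start:start + window_len])
        start += shift_len
    return frames


# One second of silent audio at an assumed 16 kHz sampling rate.
signal = [0.0] * 16000
frames_10ms = frame_signal(signal, 16000, shift_ms=10)
frames_15ms = frame_signal(signal, 16000, shift_ms=15)
```

Under these assumptions, the number of windows returned corresponds to T, the number of frames in the feature-vector sequence; a shorter shift yields more frames for the same input.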
Next, processing proceeds to step S3. This step includes generating a search space, in which the two axes are HMM state sequences and feature-vector sequences of the input voice, by using an HMM database 5, which stores HMMs comprising prescribed structural units, and a dictionary 6 that describes the corresponding relationship between words to be recognized and HMM state sequences, and using the Viterbi algorithm to find, in this search space, the optimum path for which the maximum acoustic likelihood is obtained.
The details of a procedure for the search will be described with reference to FIG. 7.
FIG. 7 illustrates the search space and the manner in which the search is conducted in a case where two words “aki” and “aka” are subjected to continuous speech recognition using phoneme HMMs. In FIG. 7, the horizontal axis shows an example of feature-vector sequences and the vertical axis shows an example of the HMM state sequences.
First, HMM state sequences corresponding to one or more words to undergo recognition are generated from the HMM database 5 and dictionary 6, which describes the corresponding relationship between words to be recognized and the HMM state sequences. The HMM state sequences thus generated are as shown along the vertical axis in FIG. 7.
A two-dimensional, grid-like search space is formed from the HMM state sequences thus generated and feature-vector sequences.
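As a rough illustration of how such a grid might be assembled, the sketch below builds state sequences for the two example words and pairs them with frame numbers. The names `DICTIONARY`, `STATES_PER_PHONEME`, `word_state_sequence` and `build_search_space` are hypothetical, and the figure of three states per phoneme HMM is an assumption; the source does not give the number of states per phoneme.

```python
# Hypothetical stand-in for dictionary 6: word -> phoneme sequence.
DICTIONARY = {"aki": ["a", "k", "i"], "aka": ["a", "k", "a"]}

# Assumption: every phoneme HMM has three states.
STATES_PER_PHONEME = 3


def word_state_sequence(word):
    """Expand a word into its HMM state sequence.

    Each state is labelled (word, phoneme position, phoneme, state index).
    """
    return [
        (word, pos, phoneme, k)
        for pos, phoneme in enumerate(DICTIONARY[word])
        for k in range(STATES_PER_PHONEME)
    ]


def build_search_space(words, num_frames):
    """Return the vertical-axis states and every (state, frame) grid point."""
    states = [s for w in words for s in word_state_sequence(w)]
    grid = [
        (state_index, t)
        for state_index in range(len(states))
        for t in range(1, num_frames + 1)  # frames are numbered 1..T
    ]
    return states, grid


states, grid = build_search_space(["aki", "aka"], num_frames=5)
```

Here `states` plays the role of the vertical axis of the search space and the frame numbers 1 to T play the role of the horizontal axis.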
Next, with regard to all paths that originate from “START” and arrive at “END” in the search space of FIG. 7, an optimum path for which the maximum cumulative acoustic likelihood will be obtained is found from the state output probability at each grid point and HMM state transition probability corresponding to a transition between grid points.
Then, with regard to each of the grid points (state hypotheses) in search space, the cumulative acoustic likelihoods (state-hypothesis likelihoods) up to arrival at the respective grid points are calculated in numerical order from t=1 to t=T. A state-hypothesis likelihood H(s,t) of state s of frame t is calculated by the following equation:
$$H(s,t) = \max_{s' \in S'(s)} H(s',t-1) \times a(s',s) \times b[s,O(t)] \qquad \text{Eq. (1)}$$
where S′ (s) represents a set of states connected to state s, a(s′,s) represents the transition probability from state s′ to state s, and b[s,O(t)] represents the state output probability of state s with respect to a feature vector O(t).
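A single application of Equation (1) for one frame can be sketched as follows. The function name `viterbi_step`, the dictionary-based data layout, and the numeric probabilities are assumptions chosen for illustration and do not come from the source.

```python
def viterbi_step(prev_h, predecessors, trans, emit_t):
    """Apply Eq. (1) to every state for one frame t.

    prev_h[s']      : H(s', t-1)
    predecessors[s] : S'(s), the states connected to state s
    trans[(s', s)]  : a(s', s)
    emit_t[s]       : b[s, O(t)]
    """
    new_h = {}
    for s, preds in predecessors.items():
        # Maximum over all predecessor states s' in S'(s).
        best = max((prev_h[sp] * trans[(sp, s)] for sp in preds), default=0.0)
        new_h[s] = best * emit_t[s]
    return new_h


# Illustrative two-state, left-to-right example (values are made up).
prev_h = {0: 0.5, 1: 0.2}
predecessors = {0: [0], 1: [0, 1]}
trans = {(0, 0): 0.6, (0, 1): 0.4, (1, 1): 0.7}
emit_t = {0: 0.1, 1: 0.5}
new_h = viterbi_step(prev_h, predecessors, trans, emit_t)
```

For state 1 the candidates are 0.5 × 0.4 = 0.2 (from state 0) and 0.2 × 0.7 = 0.14 (from state 1), so the maximum 0.2 is kept and multiplied by the output probability 0.5. Practical implementations commonly work with log probabilities to avoid numerical underflow over long utterances; the sketch keeps plain products to match the form of Equation (1).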
By using the state-hypothesis likelihood calculated above, the acoustic likelihood of the optimum path leading to “END” is calculated in accordance with the following equation:
$$\max_{s \in S_f} H(s,T) \times a(s,s') \qquad \text{Eq. (2)}$$
where Sf represents a set of phoneme HMM states for which arrival at “END” is possible, i.e., a set of HMM final states representing each of the words to be recognized. Further, a(s,s′) denotes the probability of a transition from state s to other states.
When the state-hypothesis likelihood of each state hypothesis is calculated in the calculation process described above, the transition-origin state [s′ in Equation (1)] that maximizes the state-hypothesis likelihood is stored for each state hypothesis, and the optimum path for which the maximum acoustic likelihood is obtained is found by tracing back the stored values.
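The complete procedure, combining the recursion of Equation (1), the final maximization of Equation (2), and the trace-back through stored transition origins, can be sketched as below. This is a generic Viterbi sketch under stated assumptions rather than the patent's implementation: the function name `viterbi_decode`, the initial condition H(s,1) = init[s] × b[s,O(1)], the `exit_prob` table standing in for a(s,s′) on the transition to “END”, and all numeric values are hypothetical.

```python
def viterbi_decode(init, predecessors, trans, emit, exit_prob):
    """Return (maximum acoustic likelihood, optimum state path).

    init[s]         : assumed probability of entering state s from START
    predecessors[s] : S'(s)
    trans[(s', s)]  : a(s', s)
    emit[t-1][s]    : b[s, O(t)] for frames t = 1..T
    exit_prob[s]    : a(s, s') for final states s in S_f (transition to END)
    """
    states = list(init)
    h = {s: init[s] * emit[0][s] for s in states}  # H(s, 1), assumed form
    backptrs = []  # backptrs[i][s] = best origin s' for frame i + 2
    for emit_t in emit[1:]:
        new_h, bp = {}, {}
        for s in states:
            best_prev, best_score = None, 0.0
            for sp in predecessors[s]:
                score = h[sp] * trans[(sp, s)]
                if score > best_score:
                    best_prev, best_score = sp, score
            new_h[s] = best_score * emit_t[s]  # Eq. (1)
            bp[s] = best_prev                  # store the transition origin
        h = new_h
        backptrs.append(bp)
    # Eq. (2): maximize over the final states only.
    last = max(exit_prob, key=lambda s: h[s] * exit_prob[s])
    likelihood = h[last] * exit_prob[last]
    if likelihood == 0.0:
        return 0.0, []  # no path reaches END
    # Trace the stored origins back from the last frame to the first.
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return likelihood, path


# Illustrative two-state example over three frames (values are made up).
init = {0: 1.0, 1: 0.0}
predecessors = {0: [0], 1: [0, 1]}
trans = {(0, 0): 0.5, (0, 1): 0.5, (1, 1): 0.5}
emit = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}, {0: 0.1, 1: 0.9}]
exit_prob = {1: 0.5}
likelihood, path = viterbi_decode(init, predecessors, trans, emit, exit_prob)
```

In this example the path stays in state 0 for the first frame and occupies state 1 for the remaining two frames. Storing only the best origin per grid point keeps the trace-back linear in the number of frames.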
The HMM state sequences corresponding to the optimum path found through the above-described procedure are obtained and the recognized words corresponding to these state sequences are adopted as the results of recognition. In a case where the path indicated by the bold line in FIG. 7 is the optimum path for which the maximum cumulative acoustic likelihood is obtained, this path traverses the states of phoneme HMM /a/ /k/ /a/ and therefore the result of speech recognition in this instance is “aka”.
Finally, processing proceeds to step S4 in FIG. 6, where the result of recognition is displayed on a display unit or delivered to another process.
The search space shown in FIG. 7 increases in size in proportion to the number of words to be recognized and the duration of the input speech. This enlargement of the search space is accompanied by an enormous increase in the amount of processing needed to search for the optimum path. As a consequence, the response speed of speech recognition declines when implementing speech recognition applied to a large vocabulary and when implementing speech recognition using a computer that has an inferior processing capability.
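The proportional growth described above can be made concrete with a back-of-the-envelope count of grid points. The function name `grid_points`, the figure of nine states per word, and the 10 ms frame shift are illustrative assumptions; real vocabularies have words of varying length, and window edge effects are ignored.

```python
def grid_points(num_words, states_per_word, duration_ms, frame_shift_ms=10):
    """Estimate the number of grid points (state hypotheses) in the search space.

    Assumes every word contributes the same number of HMM states and
    that one frame is produced per frame shift.
    """
    num_frames = duration_ms // frame_shift_ms
    return num_words * states_per_word * num_frames


# Two words, one second of speech.
small = grid_points(num_words=2, states_per_word=9, duration_ms=1000)

# Assumed 20,000-word vocabulary, same one-second utterance.
large = grid_points(num_words=20000, states_per_word=9, duration_ms=1000)
```

Because Equation (1) is evaluated at every grid point, the amount of computation scales with this count, which is why a large vocabulary or a long utterance slows the search.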
SUMMARY OF THE INVENTION
Accordingly, an object of the present invention is to provide a speech recognition method, apparatus and storage medium wherein high-speed speech recognition is made possible by reducing the amount of processing needed for speech-recognition search processing.
According to the present invention, a speech recognition method for attaining the foregoing object comprises a speech recognition method comprising the steps of: extracting sequences of feature vectors from an input voice signal; and subjecting the voice signal to speech recognition using a search space in which an HMM-to-HMM transition is not allowed in specific feature-vector sequences.
Further, a speech recognition apparatus for attaining the foregoing object comprises a speech recognition apparatus comprising: extraction means for extracting sequences of feature vectors from an input voice signal; and recognition means for subjecting the voice signal to speech recognition using a search space in which an HMM-to-HMM transition is not allowed in specific feature-vector sequences.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.
REFERENCES:
patent: 5390278 (1995-02-01), Gupta et al.
patent: 5706391 (1998-01-01), Yamada et al.
patent: 5787396 (1998-07-01), Komori et al.
patent: 5799278 (1998-08-01), Cobbett et al.
patent: 5903865 (1999-05-01), Ishimitsu et al.
patent: 5940794 (1999-08-01), Abe
patent: 5956676 (1999-09-01), Shinoda
patent: 5956679 (1999-09-01), Komori et al.
patent: 5970445 (1999-10-01), Yamamoto et al.
patent: 5970453 (1999-10-01), Sharman
patent: 5983180 (1999-11-01), Robinson
patent: 6456970 (2002-09-01), Kao
Lewis Michael
McFadden Susan