Search and rescoring method for a speech recognition system

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S256000

active

06253178

ABSTRACT:

BACKGROUND OF THE INVENTION
A. Field of the Invention
The present invention relates to a speech recognition system and method. More particularly, the present invention relates to a speech recognition system that uses a search and rescoring method to recognize the speech.
B. Description of the Related Art
Speech recognition systems typically use a concatenation of phonemes to model the vocabulary of words spoken by a user. Phonemes are the basic units representing the sounds that comprise any given word, and their realization depends upon the context in which they arise in a word. Allophones are context-dependent phonemes and are often represented by Hidden Markov Models (HMMs), each comprising a sequence of states with associated transition probabilities. Thus, any word can be represented as a chain of concatenated HMMs, enabling speech to be modelled as a random walk through the HMMs for a word.
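The chaining of allophone models into a word model can be pictured with a small sketch. It is illustrative only and is not taken from the patent: the function `build_word_model`, the representation of each state by a single self-loop probability, and the toy allophone inventory are all assumptions made for the example.

```python
def build_word_model(pronunciation, allophone_models):
    """Concatenate per-allophone state chains into one word-level chain.

    Each state is returned as (allophone, state_index, self_loop_prob).
    The probability of advancing to the next state in the chain is taken
    to be 1 - self_loop_prob (an assumption of this sketch).
    """
    chain = []
    for allophone in pronunciation:
        for index, self_loop in enumerate(allophone_models[allophone]):
            chain.append((allophone, index, self_loop))
    return chain


# Hypothetical three-state allophone models (self-loop probability per state).
toy_allophones = {
    "k": [0.6, 0.5, 0.4],
    "ae": [0.7, 0.7, 0.6],
    "t": [0.5, 0.5, 0.5],
}

# A hypothetical pronunciation for one vocabulary word.
word_chain = build_word_model(["k", "ae", "t"], toy_allophones)
```

A random walk through `word_chain` either stays in the current state or advances to the next one, which is the left-to-right behaviour the paragraph above describes.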
To recognize an unknown utterance spoken by the user, the system must then compute the most likely sequence of states through the HMM. The well-known Viterbi method can be used to evaluate the most likely path by opening the HMM up into a trellis. The trellis has the same number of states as there are in the allophone model, and, thus, the total number of operations per frame for each trellis is proportional to the total number of transitions in the corresponding allophone model.
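As an illustration of the trellis search described above, the following is a minimal sketch of the Viterbi recursion in the log domain. It is not taken from the patent: the names `viterbi` and `safe_log`, the table layout, and the three-state left-to-right toy model are assumptions chosen for the example.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for one HMM.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    # First trellis column: start probability plus frame-0 likelihood.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Every transition into state s is evaluated at every frame,
            # which is why the work per frame grows with the transition count.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best-scoring state in the last column.
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best, path


# Toy left-to-right model: each state may loop or advance by one.
init = [safe_log(p) for p in (1.0, 0.0, 0.0)]
trans = [[safe_log(p) for p in row]
         for row in ((0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.0, 0.0, 1.0))]
emit = [[safe_log(p) for p in row]
        for row in ((0.9, 0.1, 0.1), (0.8, 0.2, 0.1),
                    (0.1, 0.9, 0.1), (0.1, 0.1, 0.9))]
best_log_prob, best_path = viterbi(init, trans, emit)
```

The two nested loops over states make the per-frame work proportional to the number of transitions. In this toy the transition table is dense; a production decoder would normally iterate only over the transitions the topology allows.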
The Viterbi method, however, has two main problems. First, the method is computationally complex because it evaluates every transition at every node of the entire vocabulary network. For speech recognition systems having a medium to large vocabulary, this computation can be very burdensome and greatly increases the cost of the computer hardware. Second, the complexity of the Viterbi method allows the computation for only a single recognition result. This precludes the use of more accurate post processing to refine the recognition result.
While other approaches have been developed which provide more than one choice for the recognition result, they have problems as well. For example, the stack decoding disclosed by P. Kenny et al., “A* Admissible Heuristics for Rapid Lexical Access,” Proceeding ICASSP, p. 689-92 (1991), provides alternative choices, but is ineffective when the heuristic partial path likelihoods are inaccurate. The approach disclosed by H. Ney et al., “Data Driven Organization of the Dynamic Programming Beam Search for Continuous Speech Recognition,” IEEE Transactions on Signal Processing, Vol. SP-40, No. 2, p. 272-81, February 1992, provides an efficient method to limit the search space. However, this method suffers when the speech contains noise common in telephone applications.
Also known are methods which can be used for continuous word recognition. See R. Schwartz et al., “The N-Best Algorithm: An Efficient and Exact Procedure for Finding the N Most Likely Sentence Hypothesis,” IEEE ICASSP-90, p. 81-84, Albuquerque, April 1990; V. Steinbiss, “Sentence-Hypothesis Generation in a Continuous-Speech Recognition System,” Proc. EuroSpeech-89, Vol. 2, p. 51-54, Paris, September 1989; L. Nguyen et al., “Search Algorithm for Software-Only Real-Time Recognition with Very Large Vocabularies,” Proceedings of ARPA Human Language Technology Workshop, p. 91-95, Plainsboro, N.J., March 1993. These methods, however, are not useful in recognizing strings of unrelated words, such as strings describing a location, a company name or person's name.
U.S. Pat. No. 5,195,167 (the '167 patent) describes a fast-match search which reduces the number of computations performed by the Viterbi method. The '167 patent teaches replacing each HMM transition probability at a certain time with the maximal value over the associated allophone. U.S. Pat. No. 5,515,475 discloses a two-pass search method that builds upon the idea of the '167 patent. The first pass identifies the N most likely candidates for the spoken word, the N most likely hypotheses, using a one-state allophone model having a two frame minimum duration. The second pass then decodes the N hypothesis choices using the Viterbi algorithm. The memory and processing time requirements of this approach, however, are unsatisfactory for some applications. Therefore, there is a demand for a speech recognition system which can operate at a high speed without requiring an excessive amount of memory or a high speed processor.
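The general shape of such a two-pass search can be sketched as follows. This is a schematic only and does not reproduce the method of either cited patent: `two_pass_recognize`, the two scoring callables, and the toy score tables are hypothetical stand-ins for a fast-match score and a full Viterbi score.

```python
import heapq


def two_pass_recognize(candidates, coarse_score, detailed_score, n_best):
    """Shortlist with a cheap score, then pick the winner with a costly one.

    coarse_score and detailed_score are callables returning a log-likelihood
    (higher is better). Both are assumptions of this sketch.
    """
    # Pass 1: score every candidate cheaply and keep only the N best.
    shortlist = heapq.nlargest(n_best, candidates, key=coarse_score)
    # Pass 2: spend the expensive computation on the shortlist only.
    best = max(shortlist, key=detailed_score)
    return best, shortlist


# Hypothetical scores for a four-word vocabulary.
coarse = {"austin": -12.0, "boston": -10.0, "houston": -30.0, "aston": -11.0}
detailed = {"austin": -20.0, "boston": -25.0, "houston": -60.0, "aston": -27.0}

winner, shortlist = two_pass_recognize(
    list(coarse), coarse.get, detailed.get, n_best=3
)
```

In this toy run the coarse pass ranks "boston" first, but the detailed pass prefers "austin". The expensive score is evaluated for three words rather than four. A correct word that falls outside the shortlist can never be recovered, so the value of N trades accuracy against cost.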
SUMMARY OF THE INVENTION
Systems and methods consistent with the present invention provide a speech recognition system which can operate at a high speed and does not require a large amount of memory or a high speed processor.
To achieve these and other advantages, a method for recognizing speech consistent with the present invention includes the step of calculating feature parameters for each frame of the input speech signal. The input speech signal is organized into a series of frames and is decimated to select K frames out of every L frames of the input speech signal according to a decimation rate K/L. A first set of model distances is then calculated for each of the K selected frames of the input speech signal, and a Hidden Markov Model (HMM) topology of a first set of models is reduced according to the decimation rate K/L. The system then selects a reduced set of model distances from the computed first set of model distances according to the reduced HMM topology and selects a first plurality of candidate choices for recognition according to the reduced set of model distances. A second set of model distances is computed, using a second set of models, for a second plurality of candidate choices, wherein the second plurality of candidate choices correspond to at least a subset of the first plurality of candidate choices. The second plurality of candidate choices are rescored using the second set of model distances, and a recognition result is selected from the second plurality of candidate choices according to the rescored second plurality of candidate choices.
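The frame decimation step can be illustrated with a short sketch. The summary does not say which K frames are kept within each group of L, so the even spacing used here is an assumption, as is the function name `decimate_frames`. Feature extraction, the reduced HMM topology, and the model-distance computations are not shown.

```python
def decimate_frames(frames, k, l):
    """Keep k frames out of every l consecutive frames (decimation rate k/l).

    Assumption of this sketch: the kept frames are spread as evenly as
    possible inside each block of l frames, starting with the first one.
    """
    if not 0 < k <= l:
        raise ValueError("decimation rate k/l requires 0 < k <= l")
    # Offsets inside a block of l frames that survive decimation.
    keep_offsets = {(j * l) // k for j in range(k)}
    return [frame for i, frame in enumerate(frames) if i % l in keep_offsets]


# Frame indices stand in for per-frame feature vectors.
frames = list(range(10))
half_rate = decimate_frames(frames, 1, 2)        # every other frame
two_thirds_rate = decimate_frames(frames, 2, 3)  # two frames out of three
```

At a rate of 1/2 the first pass computes model distances for half as many frames, which is the source of the saving described above.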
Both the foregoing general description and the following Detailed Description are exemplary and are intended to provide further explanation of the invention as claimed.


REFERENCES:
patent: 4587670 (1986-05-01), Levinson et al.
patent: 5195167 (1993-03-01), Bahl et al.
patent: 5241619 (1993-08-01), Schwartz et al.
patent: 5337394 (1994-08-01), Sejnoha
patent: 5386492 (1995-01-01), Wilson et al.
patent: 5515475 (1996-05-01), Gupta et al.
patent: 5799277 (1998-08-01), Takami
patent: 5884259 (1999-03-01), Bahl et al.
X. D. Huang, “Phoneme classification using semicontinuous hidden Markov models,” IEEE Transactions on Signal Processing, vol. 40, Issue 5, May 1992, pp. 1062-1067.*
Y. Zhao, “A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units,” IEEE Transactions on Speech and Audio, vol. 1, Issue 3, Jul. 1993, pp. 345-361.*
P. Kenny et al., “A* Admissible Heuristics for Rapid Lexical Access,” Proceeding ICASSP, p. 689-92 (1991).
H. Ney et al., “Data Driven Organization of the Dynamic Programming Beam Search for Continuous Speech Recognition,” IEEE Transactions on Signal Processing, vol. SP-40, No. 2, p. 272-81, Feb. 1992.
R. Schwartz et al., “The N-Best Algorithm: An Efficient and Exact Procedure for Finding the N Most Likely Sentence Hypothesis,” IEEE ICASSP-90, p. 81-84, Albuquerque, Apr. 1990.
V. Steinbiss, “Sentence-Hypothesis Generation in a Continuous Speech Recognition System,” Proc. EuroSpeech-89, vol. 2, p. 51-54, Paris, Sep. 1989.
L. Nguyen et al., “Search Algorithm for Software-Only Real-Time Recognition with Very Large Vocabularies,” Proceedings of ARPA Human Language Technology Workshop, p. 91-95, Plainsboro, New Jersey, Mar. 1993.
