Reexamination Certificate
1999-01-07
2001-08-14
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S242000
active
06275802
ABSTRACT:
TECHNICAL FIELD
The present invention generally relates to speech recognition systems, and more particularly, to search algorithms for recognizing sequences of words in an input utterance.
BACKGROUND ART
A speech recognition system compares input speech parameters to word models in the form of state sequences. That is, each word in the system vocabulary is modeled as a sequence of connected states in which the states, the connections, or both are characterized by probability distributions of the speech parameters. During the recognition search, multiple recognition hypotheses are maintained, each predicated on: 1) the input speech having arrived in a given state of a given word model, and 2) a given sequence of words having been spoken before that word. For the speech recognition system to operate at an acceptable speed, the number of active recognition hypotheses must be limited.
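The hypothesis bookkeeping described above can be sketched as follows. This is a minimal illustration; the class and function names, and the simple best-N pruning rule, are assumptions and do not come from the patent text:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    word: str            # word model being traversed
    state: int           # state reached within that word's state sequence
    history: tuple       # words assumed spoken before this word
    log_prob: float      # accumulated log occurrence probability

def extend(hyp: Hypothesis, state_log_prob: float) -> Hypothesis:
    """Advance a hypothesis to the next state, accumulating its score."""
    return Hypothesis(hyp.word, hyp.state + 1, hyp.history,
                      hyp.log_prob + state_log_prob)

def prune(hyps, max_active):
    """Bound the number of active hypotheses for acceptable speed."""
    return sorted(hyps, key=lambda h: h.log_prob, reverse=True)[:max_active]
```

In a real recognizer the state scores would come from acoustic likelihoods; here they are abstract log probabilities.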
Forward-backward search is a commonly known technique for efficient speech recognition. A discussion of this subject matter appears in Chapter 12 of Deller, Proakis & Hansen, Discrete-Time Processing of Speech Signals (Prentice Hall, 1987), which is incorporated herein by reference. Forward-backward search employs a two-level approach to search a vast space of possible word sequences in order to assess which word sequence is most likely to have been spoken. In the forward search pass, relatively simple models are used to create a first set of word recognition hypotheses of words which could have been spoken, along with their associated occurrence probabilities. The backward search pass in the reverse direction uses more complex models which require greater computational resources. The number of possible word sequences considered by the backward search pass is limited by starting from the set of recognition hypotheses produced by the forward search pass. A forward-backward search algorithm, as described in the prior art, performs a forward search on an input utterance until the utterance ends, and then searches backwards from the end to the beginning of the utterance. This leads to a system in which the recognized words are presented only after the end of the complete utterance.
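The two-pass structure described above can be sketched as a skeleton. The admission threshold and the per-word score functions are invented for the sketch; a real recognizer would use acoustic model likelihoods and time alignment rather than independent per-frame scores:

```python
def forward_pass(frames, simple_models, admit_threshold=-5.0):
    """Cheap pass: score every vocabulary word on every frame with a
    simple model, keeping plausible (frame, word) candidates."""
    candidates = {}
    for t, frame in enumerate(frames):
        for word, score in simple_models.items():
            s = score(frame)
            if s > admit_threshold:
                candidates.setdefault(t, {})[word] = s
    return candidates

def backward_pass(frames, complex_models, candidates):
    """Expensive pass, run from the end toward the beginning,
    restricted to the candidates the forward pass produced."""
    rescored = []
    for t in reversed(range(len(frames))):
        for word, fwd in candidates.get(t, {}).items():
            rescored.append((t, word, fwd + complex_models[word](frames[t])))
    return rescored
```

The key cost saving is visible in `backward_pass`: it iterates only over the forward pass's candidates, not over the whole vocabulary.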
One approach utilizing a forward-backward search, described by Schwartz et al. in U.S. Pat. No. 5,241,619, which is hereby incorporated herein by reference, uses a forward search employing a relatively simple algorithm, followed by a backward search which performs a more complex word dependent n-best search. For a given state in a given word, Schwartz requires that different recognition hypotheses be maintained for different possible word histories. These recognition hypotheses form a monolithic set, which is limited to a certain maximum number. When the best recognition hypothesis in the set has a probability score which is outside a given offset from the probability score of the overall best recognition hypothesis of that speech frame, all of the recognition hypotheses in the set are removed in a single operation.
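The set-level pruning attributed to Schwartz can be sketched as follows. The function name and the (history, score) tuple layout are assumptions for illustration:

```python
def prune_hypothesis_set(hyp_set, global_best, offset, max_in_set):
    """hyp_set: list of (history, log_prob) pairs maintained for one
    state of one word. If even the best member falls more than `offset`
    below the overall best score for this speech frame, the whole set
    is removed at once; otherwise it is capped at max_in_set members."""
    if not hyp_set:
        return []
    best = max(p for _, p in hyp_set)
    if best < global_best - offset:
        return []                      # remove all in a single operation
    ranked = sorted(hyp_set, key=lambda x: x[1], reverse=True)
    return ranked[:max_in_set]
```

This mirrors the two limiting mechanisms described: a cap on set size and an offset-based removal of an entire set in one operation.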
Thus, Schwartz describes a system with a two level state organization, with super-states that contain substates for different previous words. There are different mechanisms for limiting the number of super-states and the number of substates per super-state. The complexity of the state structure in Schwartz requires considerable computational time and resources.
SUMMARY OF THE INVENTION
A preferred embodiment of the present invention is directed to a speech recognition system and a method for processing an input speech signal represented by a sequence of parameters. A current block time period in the sequence has a duration long enough that it is likely to include at least one word of the input speech signal. The current block time period is searched at selected locations in the sequence with a forward search pass that uses a first set of word models having sequences of model states. For each state in a set of selected model states, a most likely forward pass recognition hypothesis is generated which ends in that state and corresponds to the input speech signal. A backward search pass back through the sequence of parameters within the current block time period is then performed, using a second set of word models having sequences of model states and the set of recognition hypotheses generated by the forward search pass. A current word graph is produced, composed of a set of word graph recognition hypotheses of at least one word that end in selected model states, and of nodes which connect adjacent words in the word graph. Word graph recognition hypotheses having an occurrence probability score less than a selected threshold are pruned. Any preceding word graph is updated by linking recognition hypotheses of the preceding word graph to the current word graph. The method is repeated for the next block time period until the sequence of parameters ends.
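A word graph of the kind described might be represented as below. The field names and the linking scheme are illustrative assumptions; the patent does not prescribe a concrete data layout:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Connects adjacent words in the word graph."""
    time: int                                    # boundary frame index
    successors: list = field(default_factory=list)

@dataclass
class WordGraphHyp:
    word: str
    end_node: Node
    log_prob: float       # occurrence probability score (log domain)

class WordGraph:
    def __init__(self):
        self.hyps = []

    def prune(self, threshold):
        """Drop hypotheses whose score is below the selected threshold."""
        self.hyps = [h for h in self.hyps if h.log_prob >= threshold]

    def link_from(self, preceding):
        """Update a preceding word graph by linking its hypotheses'
        end nodes forward into this (current) graph."""
        for h in preceding.hyps:
            h.end_node.successors.extend(x.end_node for x in self.hyps)
```

One graph of this shape would be built per block time period, pruned, and then linked back to the graph of the preceding block.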
In a further embodiment, the forward search pass over the current block time period begins operation with the forward pass recognition hypotheses that were active at the end of the immediately preceding forward search pass, if any. In addition, or alternatively, the backward search pass may continue into a portion of the sequence of parameters in an immediately preceding block time period, if any. In such an embodiment, the step of updating may also include continuing the backward search pass in a reduced search mode over a portion of the immediately preceding block time period. Such a reduced search mode includes creating, when the backward search pass at a given time reaches a beginning of a word, a new node in the current word graph for that word, and examining the previous word graph for a node for that word at that time. If such a node exists, a substitute pointer is created from that node in the previous word graph to the new node in the current word graph, and the backward search pass for that word is stopped. If no such node exists, the backward search pass is continued.
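The reduced search mode's node-matching step might look like the following sketch, where dictionaries keyed by (word, time) stand in for word-graph nodes. All names are assumptions:

```python
def on_word_begin(word, t, current_graph, previous_graph, substitutes):
    """Called when the backward pass reaches the beginning of `word` at
    time t while re-covering the preceding block. Returns True if the
    backward search for this word should continue."""
    node = {"word": word, "time": t}
    current_graph[(word, t)] = node        # new node in the current graph
    if (word, t) in previous_graph:
        # A node for this word at this time already exists in the
        # previous graph: record a substitute pointer from it to the
        # new node, and stop the backward search for this word.
        substitutes[(word, t)] = node
        return False
    return True                            # no such node: keep searching
```

Stopping early whenever a matching node exists is what keeps the re-search over the preceding block "reduced".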
In addition, or alternatively, the step of updating may further include deleting inactive recognition hypotheses of the preceding word graph. In a preferred embodiment, the step of updating may use pointers in the word graph to trace back from recognition hypotheses in the preceding word graph, and then reconnecting the backward pointers of active hypotheses of the preceding word graph to corresponding recognition hypotheses in the current word graph.
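The trace-back update can be sketched as follows. Hypotheses are plain dicts here, and matching a preceding-graph hypothesis to its current-graph counterpart by a (word, end_time) key is an assumed convention:

```python
def update_preceding(prev_hyps, current_by_key):
    """Trace back over the preceding graph's hypotheses, deleting the
    inactive ones (those with no counterpart in the current graph) and
    reconnecting the backward pointers of the active ones."""
    active = []
    for hyp in prev_hyps:
        key = (hyp["word"], hyp["end_time"])
        if key in current_by_key:
            hyp["back"] = current_by_key[key]   # reconnect pointer
            active.append(hyp)
    return active                               # inactive ones dropped
```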
In an alternative embodiment, each word graph contains layers of recognition hypotheses and the step of updating involves processing backward through the word graph layers. The word graph layers may be structured so that recognition hypotheses within a word graph layer point to preceding word graph layers so that all the hypotheses within each word graph layer are updated when the word graph layer is processed. All recognition hypotheses ending at the same time may be within the same word graph layer. In addition, time may be an indexed part of each word graph layer. In such an embodiment, the layers may be updated by redirecting links from recognition hypotheses in the preceding word graph to recognition hypotheses in the current word graph.
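The layered organization might be sketched like this: each layer carries its end time as an index and groups every hypothesis ending at that time, so one pass over a layer updates all of its hypotheses together. The dict layout is an assumption for illustration:

```python
def update_layers(prev_layers, current_by_key):
    """Process the preceding word graph backward, layer by layer,
    redirecting each hypothesis's link into the current word graph."""
    for layer in reversed(prev_layers):
        for hyp in layer["hyps"]:
            key = (hyp["word"], layer["time"])   # time indexed per layer
            if key in current_by_key:
                hyp["link"] = current_by_key[key]
```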
In a preferred embodiment, the step of updating may also include outputting at least one of the probable recognition hypotheses of the current word graph such as by displaying to a user at least one of the recognition hypotheses, for example, the most probable recognition hypothesis.
In addition, or alternatively, a preferred embodiment may include pruning the current word graph when the sequence of parameters continues for a predetermined length of time without pausing at the end of a phrase. Such pruning may include determining the most probable recognition hypothesis for the current word graph, selecting a boundary time between a pair of words near the end of the most probable recognition hypothesis, and treating the boundary time as the end of the sequence of parameters and the beginning of a new sequence of parameters.
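The forced segmentation of a long pause-free utterance can be sketched as below. The frame limit and the choice of the second-to-last boundary are assumptions standing in for "a boundary time between a pair of words near the end of the most probable recognition hypothesis":

```python
def maybe_split(frames_elapsed, max_frames, word_boundaries):
    """If the parameter sequence has run too long without a pause,
    return a boundary time to treat as the end of the sequence and the
    beginning of a new one; otherwise return None."""
    if frames_elapsed < max_frames or len(word_boundaries) < 2:
        return None
    return word_boundaries[-2]    # boundary between the last word pair
```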
In accordance with another preferred embodiment of the present inve
Bromberg & Sunstein LLP
Lernout & Hauspie Speech Products N.V.
Šmits Tālivaldis Ivars