Reexamination Certificate
1999-11-08
2002-08-27
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S251000, C704S270100
active
06442520
ABSTRACT:
TECHNICAL FIELD
The invention relates to automatic speech recognition and more particularly to a method and apparatus for continuous speech recognition using a self-adjusting decoder network having multiple layers.
DESCRIPTION OF THE PRIOR ART
Two decoder techniques are commonly known in the speech recognition field. The first is the best-first stack decoder. This technique accommodates long-span language models, multi-word grammars and increased vocabulary size. However, the best-first stack decoder requires good search heuristics to estimate the least upper bound of the speech recognition path score, so that search errors can be avoided and complexity reduced. The second technique is the breadth-first beam decoder. The breadth-first beam search does not require heuristics, and the search can be made frame synchronous to the incoming speech data frames. However, the breadth-first beam search decoder requires considerable circuit resources to support and maintain the large number of active nodes that are typically created and maintained during a beam search.
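As a rough illustration of why a breadth-first beam search keeps many nodes active, the following minimal Python sketch advances every active node by one speech frame and then prunes those that fall outside the beam. It is a generic textbook-style sketch, not the decoder of the invention; the names `beam_step`, `prune_beam`, `successors`, `frame_loglik` and `beam_width` are assumptions introduced for this example only.

```python
import math

def prune_beam(active, beam_width):
    """Keep only nodes whose log score is within beam_width of the best one."""
    if not active:
        return {}
    best = max(active.values())
    return {node: s for node, s in active.items() if s >= best - beam_width}

def beam_step(active, successors, frame_loglik, beam_width):
    """Advance all active nodes by one frame (frame synchronous), then prune.

    active:       {node: accumulated log score}
    successors:   {node: iterable of nodes reachable in one frame} (hypothetical)
    frame_loglik: {node: log-likelihood of the current frame for that node}
    """
    expanded = {}
    for node, score in active.items():
        for nxt in successors.get(node, ()):
            cand = score + frame_loglik.get(nxt, -math.inf)
            # Keep only the best-scoring way of reaching each node.
            if cand > expanded.get(nxt, -math.inf):
                expanded[nxt] = cand
    return prune_beam(expanded, beam_width)
```

A wider `beam_width` lowers the risk of pruning the correct path but leaves more nodes active per frame, which is the resource cost noted above.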
Thus, there is a need in the art for a speech recognition decoder that combines the resource advantages of the best-first stack search decoder and the accuracy advantages of the breadth-first beam search decoder into a new search decoder.
SUMMARY OF THE INVENTION
Briefly stated, in accordance with one aspect of the invention, the aforementioned need is met by providing a continuous speech recognizer that is based on three dynamically expanded networks, each having a self-adjusting capability.
In accordance with one aspect of the invention, the aforementioned need is met by a system for recognizing speech which includes a converter for converting input speech into frames of speech data. The speech data is input to a dynamic programming network, which receives the frames of speech data and builds nodes that represent likelihood scores of various pre-defined models corresponding to the speech data of the respective frame. An asynchronous phone expanding network operates in parallel with the dynamic programming network and provides phone rules that control which nodes of the dynamic programming network can be connected by arcs to other nodes, dependent upon the speech data. It should be noted that the word ‘phone’ in this application is taken directly from the Greek word ‘phone’, which means sound and/or speech. Additionally, an asynchronous word network operates in parallel with the phone network and the dynamic programming network to provide word rules that control which portions of the phone network correspond to recognizable words and which do not. The dynamic programming network, the phone network and the word network cooperate to process the speech data frames and recognize the input speech.
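The idea of phone rules restricting which nodes may be joined by arcs can be sketched, under simplifying assumptions, as a Viterbi-style dynamic programming pass in which a node for the current frame is created only when a rule permits the transition. The rule table `PHONE_RULES`, the functions `extend_network` and `decode`, and the per-frame score dictionaries are all hypothetical and introduced only for this example; the word network layer is omitted.

```python
# Hypothetical phone rules: each phone maps to the phones allowed to follow it.
PHONE_RULES = {"<s>": {"k"}, "k": {"k", "ae"}, "ae": {"ae", "t"}, "t": {"t"}}

def extend_network(prev_nodes, frame_scores, phone_rules):
    """Build the nodes for one frame.

    prev_nodes:   {phone: (accumulated log score, path of phones so far)}
    frame_scores: {phone: log-likelihood of this frame under that phone model}
    Only transitions permitted by phone_rules produce an arc.
    """
    new_nodes = {}
    for phone, (score, path) in prev_nodes.items():
        for nxt in phone_rules.get(phone, ()):
            if nxt not in frame_scores:
                continue  # no model score for this phone in this frame
            cand = score + frame_scores[nxt]
            # Keep the best incoming arc for each new node.
            if nxt not in new_nodes or cand > new_nodes[nxt][0]:
                new_nodes[nxt] = (cand, path + [nxt])
    return new_nodes

def decode(frames, phone_rules, start="<s>"):
    """Run the pass over all frames and return the best-scoring phone path."""
    nodes = {start: (0.0, [])}
    for frame_scores in frames:
        nodes = extend_network(nodes, frame_scores, phone_rules)
    if not nodes:
        return None
    return max(nodes.values(), key=lambda n: n[0])[1]
```

In this sketch a phone with a high acoustic score in some frame is still excluded if no rule allows an arc into it, which mirrors the controlling role the text assigns to the phone network.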
In accordance with another aspect of the invention, the aforementioned need is met by providing a speech recognition system including: a converter for converting input speech into frames of speech data; a dynamic programming process that establishes a plurality of nodes in response to the frames of speech data and arc paths connecting to others of the plurality of nodes, thereby forming a speech decoder network; a phone rule driven process that applies predetermined phone rules for the speech decoder network to establish a phone network and increase the accuracy of an output of the speech recognition system; and a word rule driven process that applies predetermined word rules for the speech decoder network and the phone network to further increase the accuracy of the output of the speech recognition system.
In accordance with another aspect of the invention, the aforementioned need is met by providing a decoder for continuous speech recognition using a processor and a memory having a plurality of memory locations. The decoder has a speech framer for regularly processing input speech into consecutive frames of acoustic data. Connected to the output of the speech framer are a word network process for storing and applying language rules, a phone network process for storing and applying phone rules, and a dynamic programming network process. The dynamic programming network process processes the acoustic data to build a network of nodes connected by arcs which provide possible decodings of the input speech. The dynamic programming network process also uses information from the word network process and the phone network process to direct the building of the nodes and the connection of each node to previous nodes by arcs.
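The speech framer's role of cutting input speech into consecutive frames at regular intervals can be pictured with the short Python sketch below. The function name `frame_signal` and the parameters `frame_len` and `hop` are assumptions made for this example; the source does not specify frame length, overlap or feature extraction.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into consecutive frames of frame_len samples,
    starting a new frame every hop samples. Trailing samples that do not
    fill a whole frame are dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]
```

Choosing `hop` smaller than `frame_len` yields overlapping frames; setting them equal yields back-to-back frames.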
REFERENCES:
patent: 5040127 (1991-08-01), Gerson
patent: 5168524 (1992-12-01), Kroeker et al.
patent: 5222190 (1993-06-01), Pawate et al.
patent: 5251129 (1993-10-01), Jacobs et al.
patent: 5268990 (1993-12-01), Cohen et al.
patent: 5329608 (1994-07-01), Bocchieri et al.
patent: 5333275 (1994-07-01), Wheatley et al.
patent: 5388183 (1995-02-01), Lynch
patent: 5509104 (1996-04-01), Lee et al.
patent: 5581655 (1996-12-01), Cohen et al.
patent: 5649057 (1997-07-01), Lee et al.
patent: 5706397 (1998-01-01), Chow
patent: 5719997 (1998-02-01), Brown et al.
patent: 5729656 (1998-03-01), Nahamoo et al.
patent: 5805772 (1998-09-01), Chou et al.
patent: 6006181 (1999-12-01), Buhrke et al.
patent: 6073095 (2000-06-01), Dharanipragada et al.
C-H. Lee et al., “A Frame-Synchronous Network Search Algorithm for Connected Word Recognition”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, No. 11, Nov. 1989, pp. 1649-1658.
P. Kenney et al., “A*—Admissible Heuristics For Rapid Lexical Access”, IEEE Transactions on Speech And Audio Processing, vol. 1, No. 1, 1993, pp. 49-58.
R. Haeb-Umbach et al., “Improvements In Beam Search For 10000-Word Continuous-Speech Recognition”, IEEE Transactions On Speech And Audio Processing, vol. 2, No. 2, Apr. 1994, pp. 353-356.
H. Ney et al., “Improvements In Beam Search For 10000-Word Continuous Speech Recognition”, ICASSP, Mar. 23-26, 1992, 0-7803-0532-9/92, IEEE, pp. I-9-I-12.
H. Van hamme et al., “An Adaptive Beam Pruning Technique For Continuous Speech Recognition”, pp. 2083-2086.
D. B. Paul et al., “The Lincoln Large-Vocabulary Stack-Decoder HMM CSR”, 0-7803-0946-4/93, 1993 IEEE, pp. II-660-II-663.
P. Kenney et al., “A*—Admissible Heuristics For Rapid Lexical Access”, CH2977-7/91/0000-0689, 1991 IEEE, pp. 689-692.
T. Kuhn et al., “DP-Based Wordgraph Pruning”, 0-7803-3192-3/96, 1996 IEEE, pp. 861-864.
X. L. Aubert, “Fast Look-Ahead Pruning Strategies In Continuous Speech Recognition”, CH2673-2/89/8008-0659, 1989 IEEE, pp. 659-662.
Buhrke Eric Rolfe
Chou Wu
Agere Systems Guardian Corp.
Dorvil Richemond
Nolan Daniel N.
Penrod Jack R.