N-best search for continuous speech recognition using...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

C704S256000, C704S242000

active

06374220

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
This invention relates to speech recognition and more particularly to performing N-best search with limited storage space.
BACKGROUND OF THE INVENTION
Speech recognition involves searching and comparing the input speech to speech models representing vocabulary to identify words and sentences.
The search speed and search space for large-vocabulary speech recognition have been an active research area for the past few years. Even on a state-of-the-art workstation, search can take hundreds of times real time for a large-vocabulary task (20K words). Most fast search algorithms use multiple passes: simple models (e.g., monophones) perform a quick, rough search and output a much smaller N-best sub-space; detailed models (e.g., clustered triphones with mixtures) then search that sub-space and output the final results (see Fil Alleva et al. “An Improved Search Algorithm Using Incremental Knowledge for Continuous Speech Recognition,” ICASSP 1993, Vol. 2, 307-310; Long Nguyen et al. “Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabulary,” ICASSP; and Hy Murveit et al. “Progressive-Search Algorithms for Large Vocabulary Speech Recognition,” ICASSP). The first pass, which uses monophones to reduce the search space, introduces error, so the reduced search space has to be large enough to contain the best path. Choosing that size requires many experiments and much fine-tuning.
The search process involves expanding a search tree according to the grammar and lexical constraints. The size of the search tree and the storage requirements grow exponentially with the size of the vocabulary. Viterbi beam search is used to prune away improbable branches of the tree; however, the tree is still very large for large vocabulary tasks.
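The beam pruning described above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `beam_prune`, the use of log scores keyed by state identifier, and the beam width value are all assumptions made for illustration. Hypotheses whose score falls more than a fixed margin below the best score in the current frame are discarded.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within beam_width of the best one.

    hypotheses: dict mapping a (hypothetical) state id to its log score
                at the current frame.
    beam_width: non-negative margin in the log domain (illustrative value).
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    # Everything below the threshold is treated as an improbable branch.
    return {state: score for state, score in hypotheses.items()
            if score >= threshold}


# Toy scores, invented for this example.
active = {"s1": -10.0, "s2": -12.5, "s3": -40.0}
pruned = beam_prune(active, beam_width=15.0)
```

A narrower beam removes more branches but raises the risk of discarding the path that would eventually have been best, which is why the tree can remain large for large-vocabulary tasks.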
A multi-pass algorithm is often used to speed up the search. Simple models (e.g., monophones) are used to do a quick, rough search and output a much smaller N-best sub-space. Because there are very few models, the search can be done much faster. However, the accuracy of these simple models is not good enough, so a sufficiently large N-best sub-space has to be preserved for the following search stages, which use more detailed models.
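The trade-off in the multi-pass approach can be shown with a toy sketch. The function `two_pass_search`, the candidate labels, and both score tables are hypothetical; a real recognizer scores paths through a network rather than a flat list of candidates.

```python
import heapq


def two_pass_search(candidates, coarse_score, detailed_score, n_best):
    """Shortlist with a cheap scorer, then pick the winner with a detailed one."""
    # First pass: a quick, rough search keeps only the n_best candidates.
    shortlist = heapq.nlargest(n_best, candidates, key=coarse_score)
    # Second pass: detailed models search only the reduced sub-space.
    return max(shortlist, key=detailed_score)


# Invented scores: the coarse models rank "c" last, the detailed models rank it first.
coarse = {"a": 0.9, "b": 0.8, "c": 0.7}
detailed = {"a": 0.2, "b": 0.5, "c": 0.9}
candidates = list(coarse)

small = two_pass_search(candidates, coarse.get, detailed.get, n_best=2)
large = two_pass_search(candidates, coarse.get, detailed.get, n_best=3)
```

With `n_best=2` the first pass has already discarded the candidate that the detailed models prefer, so the second pass cannot recover it. With `n_best=3` the sub-space is large enough to contain it.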
Another approach is to use a lexical tree to maximize the sharing of evaluation. See Mosur Ravishankar “Efficient Algorithms for Speech Recognition,” Ph.D. thesis, CMU-CS96-143, 1996. Also see Julian Odell “The Use of Context in Large Vocabulary Speech Recognition,” Ph.D. thesis, Queens' College, Cambridge University, 1995. For example, if both bake and baked are allowed at a certain grammar node, much of their evaluation can be shared because both words start with the phone sequence /b/ /ey/ /k/. If monophones are used in the first pass of search, then no matter how large the vocabulary is, there are only about 50 English phones the search can start with. This principle is called a lexical tree because sharing the initial evaluation and fanning out only when the phones differ looks like a tree structure. The effect of a lexical tree can be achieved by removing the word level of the grammar and then canonicalizing (removing redundancy from) the phone network. For example:
% more simple.cfg
start(<S>).
<S> ---> bake | baked.
bake ---> b ey k.
baked ---> b ey k t.
% cfg_merge simple.cfg | rg_from_rgdag | \
rg_canonicalize
start(<S>).
<S> ---> b, Z_1.
Z_1 ---> ey, Z_2.
Z_2 ---> k, Z_3.
Z_3 ---> t, Z_4.
Z_3 ---> "".
Z_4 ---> "".
The original grammar has two levels: a sentence grammar in terms of words, and a pronunciation grammar (lexicon) in terms of phones. After the word level is removed and the resulting one-level phone network is canonicalized, identical initial phone sequences are shared automatically. The recognizer outputs a phone sequence as the recognition result, which can be parsed (text only) to recover the words. Text parsing takes virtually no time compared to speech recognition parsing.
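The prefix sharing and the text-only parse described above can be sketched as a phone-level prefix tree. This is an illustration only and does not reproduce the grammar tools shown in the example; the function names `build_lexical_tree`, `count_arcs`, and `lookup`, and the dictionary-based node layout, are assumptions.

```python
def build_lexical_tree(lexicon):
    """Build a phone-level prefix tree from {word: [phones]}.

    Each node is a dict with 'children' (phone -> node) and 'words'
    (the words whose pronunciation ends at that node).
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)
    return root


def count_arcs(node):
    """Count phone arcs in the tree; a shared arc is counted once."""
    return sum(1 + count_arcs(child) for child in node["children"].values())


def lookup(root, phones):
    """Text-only parse: map a recognized phone sequence back to words."""
    node = root
    for phone in phones:
        node = node["children"].get(phone)
        if node is None:
            return []
    return list(node["words"])


lexicon = {"bake": ["b", "ey", "k"], "baked": ["b", "ey", "k", "t"]}
tree = build_lexical_tree(lexicon)
shared_arcs = count_arcs(tree)                               # arcs after sharing
unshared_arcs = sum(len(p) for p in lexicon.values())        # arcs without sharing
```

For the bake/baked lexicon, the tree has four phone arcs instead of the seven needed when each word is evaluated separately, and `lookup(tree, ["b", "ey", "k", "t"])` recovers the word from the phone sequence without any acoustic processing.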
It is desirable to provide a method that speeds up the search and reduces the resulting search space without introducing error, and that can be used independently of multi-pass search or a lexical tree.
SUMMARY OF THE INVENTION
In accordance with one embodiment of the present invention, an N-best search process with little increase in memory space and processing is provided by Viterbi-pruning word-level states to keep the best path while also keeping sub-optimal paths for sentence-level states.
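One possible reading of this summary is sketched below. It is a rough illustration under stated assumptions, not the claimed method: the function `merge_paths`, the representation of a path as a (log score, word history) pair, and the choice to keep one entry per distinct word history are all hypothetical details that the summary does not specify.

```python
import heapq


def merge_paths(incoming, is_sentence_level, n_best):
    """Merge the paths arriving at one state in one frame.

    incoming: list of (log_score, word_history) pairs, where word_history
              is a tuple of words (an assumed representation).
    is_sentence_level: True for a sentence-level state, False for a
              word-level state.
    """
    if not incoming:
        return []
    if not is_sentence_level:
        # Word-level state: ordinary Viterbi pruning, keep the best path only.
        return [max(incoming, key=lambda path: path[0])]
    # Sentence-level state: also keep sub-optimal paths. Assumption: keep
    # the best score per distinct word history, then the top n_best of those.
    best_by_history = {}
    for score, history in incoming:
        if history not in best_by_history or score > best_by_history[history]:
            best_by_history[history] = score
    ranked = heapq.nlargest(n_best, best_by_history.items(),
                            key=lambda item: item[1])
    return [(score, history) for history, score in ranked]


# Invented scores for three paths arriving at the same state.
incoming = [(-5.0, ("bake",)), (-7.0, ("baked",)), (-6.0, ("bake",))]
word_level = merge_paths(incoming, is_sentence_level=False, n_best=2)
sentence_level = merge_paths(incoming, is_sentence_level=True, n_best=2)
```

In this sketch the extra storage is confined to sentence-level states, which is consistent with the stated goal of only a small increase in memory, while word-level states carry a single path as in a standard Viterbi search.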


REFERENCES:
patent: 5241619 (1993-08-01), Schwartz et al.
patent: 5805772 (1998-09-01), Chou et al.
patent: 6236962 (2001-05-01), Kosaka et al.
Richard M. Schwartz and Stephen C. Austin “A Comparison of Several Approximate Algorithms for Finding Multiple (N-Best) Sentence Hypotheses,” Proc. IEEE ICASSP 1991, vol. 1, pp. 701-704, Apr. 1991.*
Fred K. Soong and Eng-Fong Huang, “A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition,” Proc. IEEE ICASSP 1991, vol. 1, pp. 705-708, Apr. 1991.*
F. Richardson, M. Ostendorf, and J. R. Rohlicek, “Lattice-Based Search Strategies for Large Vocabulary Speech Recognition,” Proc. IEEE ICASSP 1995, vol. 1, pp. 576-579, May 1995.*
Bach-Hiep Tran, Frank Seide, and Volker Steinbiss, “A Word Graph Based N-Best Search in Continuous Speech Recognition,” Proc. 4th Intl. Conf. on Spoken Language, ICSLP 1996, vol. 4, pp. 2127-2130, Oct. 1996.*
C. J. Waters and B. A. MacDonald, “Efficient Word-Graph Parsing and Search With a Stochastic Context-Free Grammar,” Proc. 1997 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 311-318, Dec. 1997.
