Continuous speech recognition apparatus and method

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


C704S231000, C704S251000

active

06484141

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a continuous speech recognition apparatus and method, and more particularly to a continuous speech recognition apparatus and method that improve both the speed and the accuracy of recognition.
2. Description of the Related Art
As an example of a conventional continuous speech recognition apparatus, see the paper by S. Ortmanns, "LANGUAGE-MODEL LOOK-AHEAD FOR LARGE VOCABULARY SPEECH RECOGNITION", ICSLP, 1996.
The conventional continuous speech recognition apparatus is shown in FIG. 6. Referring to FIG. 6, the conventional continuous speech recognition apparatus shown includes a hypothesis storage section 1, a hypothesis expansion section 3, a tree structure dictionary storage section 4, a language model section 7, and an acoustic model section 8.
In operation, the hypothesis storage section 1 stores hypotheses. The tree structure dictionary storage section 4 stores the words to be recognized as a tree structure dictionary (refer to FIG. 2). The acoustic model section 8 calculates an acoustic model score for each frame. The language model section 7 calculates a language model score.
The hypothesis expansion section 3 acquires, for each frame, the arc structure from the tree structure dictionary storage section 4 and, taking into account the acoustic model score from the acoustic model section 8 and the language model score from the language model section 7, expands each hypothesis present on an arc to a succeeding arc. Referring to FIG. 2, the tree structure dictionary is structured such that a word is reached by tracing arcs that branch in a tree structure from the root to a leaf (terminal arc).
The speech to be recognized is divided into short frames of a predetermined period, and the expansion described above (that is, expanding a hypothesis on an arc of the tree structure dictionary to a succeeding arc) is repeated from the first frame of the speech to the last. Finally, the word (terminal of the tree structure dictionary) through which the highest-scoring hypothesis has passed is determined as the recognition result.
Here, each hypothesis holds the position of an arc in the tree structure dictionary, the history leading to that position, and a score.
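The tree structure dictionary and the per-frame expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: the phone-level lexicon, the `acoustic_score` callback, and the names `Arc`, `Hypothesis`, `build_tree`, and `expand` are hypothetical, and scores are treated as log probabilities to be maximized.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Arc:
    """One arc of a tree structure dictionary; `label` is a phone (hypothetical unit)."""
    label: str
    children: Dict[str, "Arc"] = field(default_factory=dict)
    word: Optional[str] = None  # set only on the terminal arc where a word is settled


@dataclass
class Hypothesis:
    arc: Arc                  # position in the tree structure dictionary
    history: Tuple[str, ...]  # words settled so far (the context)
    score: float              # accumulated log score


def build_tree(lexicon: Dict[str, List[str]]) -> Arc:
    """Words that share leading phones share the same arcs from the root."""
    root = Arc("<root>")
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Arc(p))
        node.word = word
    return root


def expand(hyps: List[Hypothesis],
           acoustic_score: Callable[[str, int], float],
           frame: int) -> List[Hypothesis]:
    """One frame: each hypothesis stays on its arc or moves to a succeeding arc."""
    best: Dict[Tuple[int, Tuple[str, ...]], Hypothesis] = {}
    for h in hyps:
        for arc in [h.arc, *h.arc.children.values()]:
            cand = Hypothesis(arc, h.history, h.score + acoustic_score(arc.label, frame))
            key = (id(arc), h.history)
            # Keep only the best hypothesis per (arc, history) pair.
            if key not in best or cand.score > best[key].score:
                best[key] = cand
    return list(best.values())


lexicon = {"kata": ["k", "a", "t", "a"],
           "kaki": ["k", "a", "k", "i"],
           "sora": ["s", "o", "r", "a"]}
root = build_tree(lexicon)
hyps = [Hypothesis(a, (), 0.0) for a in root.children.values()]
hyps = expand(hyps, lambda label, t: 0.0 if label == "a" else -1.0, frame=0)
```

Note that "kata" and "kaki" share the arcs `k` and `a`, so until a hypothesis passes the branch point the word it belongs to is undetermined. Settling a word at a terminal arc and restarting from the root with the word appended to `history` is omitted for brevity.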
In a continuous speech recognition system in which a plurality of words are represented as a single tree structure dictionary (refer to FIG. 2), the word along which a hypothesis is currently being expanded cannot be identified until the hypothesis reaches a terminal arc.
Consequently, although an acoustic model score is calculated for every frame, a language model score can strictly be determined only when a hypothesis reaches a terminal arc of the tree structure dictionary.
To add a language model score as early as possible, the document mentioned above therefore discloses look-ahead of unigram language model scores and look-ahead of bigram language model scores.
In unigram language model look-ahead, the highest of the unigram language model scores of the words settled at the terminal arcs of the tree structure dictionary is given to each predecessor arc, and the unigram score given to an arc is temporarily added as the language model score of any hypothesis on that arc. When the hypothesis reaches a terminal arc and the word is settled, the unigram score used until then is discarded and the now-determined bigram language model score is added instead.
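A minimal sketch of unigram look-ahead, assuming log-probability scores and a phone-level tree. The names `unigram_lookahead` and `lm_score` and the toy scores are hypothetical. The maximum is propagated bottom-up, so each arc carries the best unigram score of any word it can still lead to; once a terminal arc is reached, that temporary value is replaced by the settled bigram score.

```python
import math
from typing import Dict, List, Optional, Tuple


class Arc:
    """Arc of a tree structure dictionary (hypothetical minimal structure)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.children: Dict[str, "Arc"] = {}
        self.word: Optional[str] = None    # settled word on a terminal arc
        self.lookahead: float = -math.inf  # unigram look-ahead score


def build_tree(lexicon: Dict[str, List[str]]) -> Arc:
    root = Arc("<root>")
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Arc(p))
        node.word = word
    return root


def unigram_lookahead(arc: Arc, unigram: Dict[str, float]) -> float:
    """Give each arc the highest unigram log score of any word settled at or below it."""
    best = unigram[arc.word] if arc.word is not None else -math.inf
    for child in arc.children.values():
        best = max(best, unigram_lookahead(child, unigram))
    arc.lookahead = best
    return best


def lm_score(arc: Arc, context: str, bigram: Dict[Tuple[str, str], float]) -> float:
    """Language model part of the score of a hypothesis sitting on `arc`."""
    if arc.word is not None:
        # Word settled: discard the unigram look-ahead and use the bigram score.
        return bigram.get((context, arc.word), -math.inf)
    return arc.lookahead  # temporary unigram look-ahead value


root = build_tree({"kata": ["k", "a", "t", "a"],
                   "kaki": ["k", "a", "k", "i"],
                   "sora": ["s", "o", "r", "a"]})
unigram_lookahead(root, {"kata": -0.7, "kaki": -1.6, "sora": -1.2})
```

The look-ahead table is computed once, independently of the context, which keeps it cheap; the trade-off is that it ignores whether a word can actually follow the current context.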
In bigram language model look-ahead, on the other hand, whenever a context is determined and a new tree structure dictionary is produced, the bigram language model scores of all words with respect to that context are calculated, and the highest of these scores is given to each predecessor arc. The bigram score given to an arc is then added as the language model score of any hypothesis present on that arc.
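The per-context bigram look-ahead can be sketched as below. This is an illustrative sketch only: `bigram_lookahead` is a hypothetical name, and a missing (context, word) pair is treated as `-inf` ("connection not permitted"), whereas a practical system would usually back off to a lower-order score. The function also counts visited arcs to make the cost visible: every arc is touched once per context.

```python
import math
from typing import Dict, List, Optional, Tuple


class Arc:
    """Arc of a tree structure dictionary (hypothetical minimal structure)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.children: Dict[str, "Arc"] = {}
        self.word: Optional[str] = None  # settled word on a terminal arc


def build_tree(lexicon: Dict[str, List[str]]) -> Arc:
    root = Arc("<root>")
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Arc(p))
        node.word = word
    return root


def bigram_lookahead(root: Arc, context: str,
                     bigram: Dict[Tuple[str, str], float]) -> Tuple[Dict[int, float], int]:
    """Map id(arc) -> best bigram log score of any word reachable via the arc,
    for ONE context. Also return how many arcs were visited."""
    table: Dict[int, float] = {}
    visited = 0

    def visit(arc: Arc) -> float:
        nonlocal visited
        visited += 1
        # Missing pair -> -inf, i.e. the word may not follow the context (assumption).
        best = bigram.get((context, arc.word), -math.inf) if arc.word is not None else -math.inf
        for child in arc.children.values():
            best = max(best, visit(child))
        table[id(arc)] = best
        return best

    visit(root)
    return table, visited


root = build_tree({"kata": ["k", "a", "t", "a"],
                   "kaki": ["k", "a", "k", "i"],
                   "sora": ["s", "o", "r", "a"]})
bigram = {("sora", "kata"): -0.4, ("sora", "kaki"): -2.0}
table, visited = bigram_lookahead(root, "sora", bigram)
```

Because the table depends on the context, it has to be rebuilt (or stored) for every new context, and each rebuild walks the entire tree; with a large vocabulary and many active contexts, this is where the memory and computation costs come from.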
The conventional speech recognition system has the following problems.
The first problem is that bigram language model look-ahead requires a large memory capacity and a large amount of computation.
The reason is that, with bigram look-ahead, every time a context is produced and a new tree structure dictionary is created, the processing must be repeated in full: an entire tree structure dictionary (not merely part of one) must be produced, all bigram language model scores for the context must be calculated, and the language model scores of all terminal arcs at which words are settled must be propagated back to every predecessor arc.
The second problem is that unigram language model look-ahead involves wasteful computation.
The reason is that, with unigram look-ahead, some arcs of the tree structure dictionary may lead only to words that linguistically cannot follow the context, yet hypotheses are still expanded onto such arcs, which wastes computation.
The third problem is as follows. Suppose strict look-ahead of a bigram or higher-order language model score is not performed in a frame synchronous beam search (for the frame synchronous beam search, see, for example, Hermann Ney, "Data Driven Search Organization for Continuous Speech Recognition", IEEE TRANSACTIONS ON SIGNAL PROCESSING, February 1992); that is, whether a word in the tree structure dictionary may linguistically connect to the context is not looked ahead. Then hypotheses are also expanded onto arcs that lead only to words that cannot linguistically follow the context, as described above for the second problem.
If the score of such a hypothesis is much higher than the others, all hypotheses on arcs that lead to words permitted to follow the context fall outside the beam and are pruned.
As a result, in the following frames the word cannot be connected to a next word at all, and speech uttered afterwards cannot be recognized. In other words, recognition processing cannot continue and no recognition result can be output.
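The failure mode of the third problem can be reproduced with a toy frame synchronous beam. The arc names, scores, beam width, and the `allowed` flag (a stand-in for looking ahead at whether any reachable word may linguistically follow the context) are invented for illustration; `Hyp` and `beam_prune` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hyp:
    arc: str       # hypothetical arc name
    allowed: bool  # can any word reachable via this arc follow the context?
    score: float   # accumulated log score


def beam_prune(hyps: List[Hyp], beam: float) -> List[Hyp]:
    """Frame synchronous beam: keep hypotheses within `beam` of the frame's best score."""
    if not hyps:
        return []
    best = max(h.score for h in hyps)
    return [h for h in hyps if h.score >= best - beam]


frame_hyps = [
    Hyp("arc_x", allowed=False, score=-10.0),  # leads only to words that cannot follow
    Hyp("arc_y", allowed=True, score=-25.0),
    Hyp("arc_z", allowed=True, score=-30.0),
]

# Without a connection check, the disallowed hypothesis sets the beam
# and every linguistically valid hypothesis is pruned.
unchecked = beam_prune(frame_hyps, beam=10.0)

# Filtering on connection possibility before pruning keeps the valid ones.
checked = beam_prune([h for h in frame_hyps if h.allowed], beam=10.0)
```

In `unchecked`, only `arc_x` survives, so no surviving hypothesis can ever settle a word that connects to the context; in `checked`, both valid hypotheses remain inside the beam.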
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a continuous speech recognition apparatus and method that improve the recognition speed and the recognition accuracy of continuous speech recognition.
To attain the object described above, according to an aspect of the present invention, there is provided a continuous speech recognition apparatus comprising a hypothesis storage section for storing hypotheses, hypothesis expansion discrimination means for determining whether or not a hypothesis may be expanded to a succeeding arc, a tree structure dictionary storage section for storing a tree structure dictionary and the context preceding the tree structure dictionary, a succeeding word speech part information storage section for storing, for each arc in the tree structure dictionary, information on whether or not each speech part is included among all of the succeeding words present behind that arc, a speech part connection information storage section for storing connection information between speech parts, means for providing a language model score to a hypothesis, means for providing an acoustic model score to a hypothesis, and hypothesis expansion means, operable in response to an expansion instruction received from the hypothesis expansion discrimination means, for acquiring the structure of an arc from the tree structure dictionary storage section and expanding a hypothesis prese
