Method and device for recognizing at least one keyword in...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S251000

active

06453293

ABSTRACT:

BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The invention relates to a method and a device for recognizing at least one keyword in spoken speech using a computer.
A method and a device for recognizing spoken speech are known from the reference by A. Hauenstein, titled “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung” [“Optimization of Algorithms and Design of a Processor for Automatic Speech Recognition”], Chair of Integrated Circuits, Technical University Munich, Dissertation, Jul. 19, 1993, chapter 2, pages 13 to 26. This publication also contains a basic introduction to the components of a device and a method for speech recognition, as well as to important techniques customary in speech recognition.
A keyword is a specific word that is to be recognized by a device for speech recognition in spoken speech. Such a keyword is mostly linked to a prescribed action, that is to say this action is executed after recognition of the keyword.
A method and a device for recognizing spoken speech are also described in the reference by N. Haberland et al., titled “Sprachunterricht - Wie funktioniert die computerbasierte Spracherkennung?” [“Language Instruction - How Does Computer-Based Speech Recognition Work?”], c't May 1998, Heinz Heise Verlag, Hannover 1998, pages 120 to 125. It follows therefrom, inter alia, that modeling by hidden Markov models permits adaptation to variations in the speaking rate of the speaker, and that during recognition the prescribed speech modules are therefore dynamically adapted to the spoken speech, in particular by compressing or expanding the time axis. This corresponds to a dynamic adaptation (also: dynamic programming), which is ensured, for example, by the Viterbi algorithm.
A distance between sounds or sound sequences is determined, for example, by determining a (multidimensional) distance between the feature vectors that describe the sounds of speech in digitized form. This distance is an example of a measure of similarity between sounds or sound sequences.
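The measure between two feature vectors can be illustrated with a short sketch. The text does not name a specific metric, so the Euclidean distance used here is an assumption, and the function name `distance` is hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two feature vectors of equal dimension.

    Assumption: the source only speaks of a multidimensional measure
    between feature vectors; the Euclidean metric is one common choice.
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Two 2-dimensional feature vectors; a small value means similar sounds.
    print(distance([0.0, 0.0], [3.0, 4.0]))
```

A smaller value indicates a closer match, which agrees with the later statement that a low measure of similarity characterizes good correspondence.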
SUMMARY OF THE INVENTION
It is accordingly an object of the invention to provide a method and a device for recognizing at least one keyword in spoken speech using a computer which overcome the above-mentioned disadvantages of the prior art methods and devices of this general type, in which the recognition is robust and insensitive to interference.
With the foregoing and other objects in view there is provided, in accordance with the invention, a speech recognition method in which a computer performs the steps of:
a) subdividing a keyword into key segments;
b) assigning each of the key segments a set of reference features;
c) subdividing a test pattern derived for spoken speech into test segments;
d) assigning each of the test segments of the test pattern a reference feature from the set of reference features from a corresponding one of the key segments being most similar to a respective test segment; and
e) recognizing the test pattern as the keyword, if a measure of similarity is determined to be below a prescribed value of an accumulated segment-wise comparison of the reference feature to the respective test segment for each of the test segments of the test pattern.
A method is specified for recognizing at least one keyword in spoken speech using a computer, the keyword being subdivided into segments and each segment being assigned a set of reference features. A test pattern which is included in the spoken speech is subdivided into segments, and each segment of the test pattern is assigned the reference feature most similar to it from the set of reference features for the corresponding segment of the keyword. The test pattern is recognized as a keyword when the measure of similarity for the accumulated segment-wise assignment of reference features of the keyword to the test pattern is below a prescribed bound. The test pattern is not recognized as a keyword if the measure of similarity is not below the prescribed bound. In this case, a low measure of similarity characterizes a good correspondence between the reference features of the keyword and the test pattern.
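The segment-wise comparison can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes one feature vector per test segment, a Euclidean distance as the measure of similarity, and the same number of segments for keyword and test pattern. The names `keyword_score` and `is_keyword` are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance (assumed metric) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def keyword_score(test_segments, key_segments):
    """Accumulate, segment by segment, the distance between each test
    segment and the most similar reference feature of the matching
    key segment.

    test_segments: list of feature vectors, one per test segment.
    key_segments:  list of lists; entry i holds the set of reference
                   features stored for key segment i.
    """
    if len(test_segments) != len(key_segments):
        raise ValueError("keyword and test pattern need equally many segments")
    total = 0.0
    for test_vec, references in zip(test_segments, key_segments):
        # Select the reference feature most similar to this test segment.
        total += min(distance(test_vec, ref) for ref in references)
    return total


def is_keyword(test_segments, key_segments, bound):
    """Recognize the test pattern only if the accumulated measure is
    strictly below the prescribed bound."""
    return keyword_score(test_segments, key_segments) < bound


if __name__ == "__main__":
    key = [[[0.0, 0.0], [1.0, 1.0]], [[2.0, 2.0], [5.0, 5.0]]]
    test = [[1.0, 1.0], [2.0, 5.0]]
    print(keyword_score(test, key), is_keyword(test, key, bound=4.0))
```

Because the minimum is taken independently in each segment, the best-matching reference pattern may combine reference features that originated from different training utterances.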
A brief account of the various terms and their meaning follows below. The test pattern is a pattern included in the spoken speech which is to be compared with the keyword and is recognized as the keyword, if appropriate. The measure of similarity characterizes the degree of correspondence between a test pattern and the keyword, or between a part of the test pattern and a part of the keyword. The segment is a section of the test pattern or of the keyword which has a prescribed duration. The reference feature is a sub-feature of the keyword which is referenced to a segment. A reference pattern contains the reference features characterizing a form of expression of the keyword. A word class contains all reference patterns which can be produced by different combinations of reference features; in particular, a plurality of reference features per segment is stored for the keyword. In a training phase, representatives of reference features of the respective keyword are determined and stored, while in a recognition phase a comparison of the test pattern with possible reference patterns of the keyword is carried out.
In the training phase, a prescribed set M of representatives of the reference features is preferably stored. If more reference features are available than free spaces M, averaging of the reference features, for example in the form of a sliding average, can be performed in order to take account of the information of the additional reference features in the representatives.
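One way to picture this training step is sketched below. The text does not state which representative absorbs a surplus reference feature or how the average is weighted, so the choices here are assumptions: the nearest representative (by Euclidean distance) is updated, and a running mean stands in for the sliding average. The class name `SegmentReferences` is hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance (assumed metric) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class SegmentReferences:
    """Holds at most M representatives of the reference features for one
    key segment."""

    def __init__(self, max_representatives):
        self.max_representatives = max_representatives  # the bound M
        self.representatives = []
        self.counts = []  # features merged into each representative

    def add(self, feature):
        # While free spaces remain, store the feature unchanged.
        if len(self.representatives) < self.max_representatives:
            self.representatives.append(list(feature))
            self.counts.append(1)
            return
        # Otherwise fold the feature into the nearest representative
        # (assumption) using an incremental mean update.
        idx = min(
            range(len(self.representatives)),
            key=lambda i: distance(self.representatives[i], feature),
        )
        self.counts[idx] += 1
        n = self.counts[idx]
        self.representatives[idx] = [
            r + (x - r) / n for r, x in zip(self.representatives[idx], feature)
        ]


if __name__ == "__main__":
    refs = SegmentReferences(max_representatives=2)
    for vec in ([0.0, 0.0], [10.0, 10.0], [2.0, 2.0]):
        refs.add(vec)
    print(refs.representatives)
```

The incremental update keeps memory bounded by M per segment while still letting every training utterance influence the stored representatives.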
A development of the invention consists in that the test pattern (and/or the keyword) is an independent sound unit, in particular a word. The test pattern and/or the keyword can also be a phoneme, a diphone, another sound composed of a plurality of phonemes, or a set of words.
Another development consists in that the number of segments for the keyword and for the test pattern is the same in each case.
Within the framework of an additional development, the test pattern is compared with a plurality of keywords, and the keyword most similar to the test pattern is output. This corresponds to a system for recognizing individual words, the plurality of keywords representing the individual words to be recognized in the spoken speech. In each case, the keyword which best fits the test pattern included in the spoken speech is output.
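Selecting among several keywords can be sketched as a search for the smallest accumulated measure. The sketch assumes a Euclidean distance, one feature vector per test segment, and equally many segments in every keyword and in the test pattern; `best_keyword` and the example vocabulary are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance (assumed metric) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def keyword_score(test_segments, key_segments):
    """Accumulated distance to the closest reference feature per segment."""
    return sum(
        min(distance(test_vec, ref) for ref in references)
        for test_vec, references in zip(test_segments, key_segments)
    )


def best_keyword(test_segments, keywords):
    """Return (name, score) of the keyword most similar to the test pattern.

    keywords: dict mapping a keyword name to its list of key segments,
              each key segment being a list of reference features.
    """
    scores = {
        name: keyword_score(test_segments, key_segments)
        for name, key_segments in keywords.items()
    }
    name = min(scores, key=scores.get)  # lowest measure = best fit
    return name, scores[name]


if __name__ == "__main__":
    vocabulary = {
        "yes": [[[0.0]], [[1.0]]],
        "no": [[[5.0]], [[6.0]]],
    }
    print(best_keyword([[0.5], [1.0]], vocabulary))
```

In an isolated-word recognizer built this way, the returned score could additionally be checked against the prescribed bound to reject utterances that match no keyword well.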
Another development is that feature vectors are used for storing the keyword and the test pattern, in which case the speech is digitized at prescribed sampling instants and one feature vector each is stored with the data characterizing the speech. This digitization of the speech signal takes place within the framework of preprocessing. A feature vector is preferably determined from the speech signal every 10 ms.
Another development consists in that there is stored for each segment a feature vector which is averaged over all the feature vectors of this segment and is further used as a characteristic of this segment. The digitized speech data, which occur every 10 ms, for example, are preferably preprocessed in overlapping time windows with a temporal extent of 25 ms. An LPC analysis, a spectral analysis or a cepstral analysis can be used for this purpose. A feature vector with n coefficients is available as a result of the respective analysis for each 10 ms section. The feature vectors of a segment are preferably averaged such that one feature vector is available per segment. It is possible within the framework of the training for recognizing the keyword to store a plurality of different reference features per segment from different sources for spoken speech, such that a plurality of averaged reference features (feature vectors for the keyword) are available.
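The framing and per-segment averaging can be sketched as follows. The 25 ms window and 10 ms step come from the text; the 8 kHz sampling rate in the example, the split into equally long segments, and the function names are assumptions. The LPC, spectral or cepstral analysis itself is not shown.

```python
def frame_bounds(num_samples, sample_rate, window_ms=25, hop_ms=10):
    """Return (start, end) sample indices of overlapping analysis windows.

    One window of window_ms is taken every hop_ms; windows that would run
    past the end of the signal are dropped.
    """
    window = sample_rate * window_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [(s, s + window) for s in range(0, num_samples - window + 1, hop)]


def average_per_segment(feature_vectors, num_segments):
    """Split the feature vector sequence into num_segments contiguous,
    roughly equal parts (assumption) and average each part into a single
    feature vector."""
    n = len(feature_vectors)
    if not 0 < num_segments <= n:
        raise ValueError("need between 1 and len(feature_vectors) segments")
    averaged = []
    for k in range(num_segments):
        chunk = feature_vectors[k * n // num_segments:(k + 1) * n // num_segments]
        dim = len(chunk[0])
        averaged.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return averaged


if __name__ == "__main__":
    # 1000 samples at an assumed 8 kHz rate: 200-sample windows, 80-sample step.
    bounds = frame_bounds(1000, 8000)
    print(len(bounds), bounds[0], bounds[-1])
    print(average_per_segment([[1.0], [3.0], [5.0], [7.0]], 2))
```

Averaging reduces each segment to a single vector, so a keyword with a fixed number of segments has a fixed-size representation regardless of how long the utterance was.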
Furthermore, a device is specified for recognizing at least one keyword in spoken speech, which has a processor unit which is set up in such a way that the f
