Dynamic phoneme dictionary for speech recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S249000

active

06804645

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to speech recognition and in particular to providing a dictionary for a speech recognition system.
2. Description of the Related Art
A considerable problem in computer-assisted speech recognition lies in the size of the stored vocabulary. Since computer-assisted speech recognition is implemented by comparing sequences of feature vectors, each sequence representing a word in a feature space, speech recognition becomes increasingly inexact as the number of words to be compared grows. The cause of this increasing imprecision lies in the larger number of sequences of feature vectors in the feature space. As the number of feature vectors increases, the distances between the feature vectors in the feature space become smaller and, thus, classification of the word to be recognized also becomes increasingly difficult.
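The crowding effect described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each word is reduced to a single random point in a feature space, whereas the text speaks of sequences of feature vectors; the function names, the dimension, and the vocabulary sizes are hypothetical and not taken from the patent.

```python
import math
import random


def min_pairwise_distance(vectors):
    """Return the smallest Euclidean distance between any two vectors."""
    best = math.inf
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            best = min(best, math.dist(vectors[i], vectors[j]))
    return best


def make_vocabulary(size, dim=8, seed=0):
    """Hypothetical vocabulary: one random feature vector per word."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(size)]


vocab = make_vocabulary(400)
for size in (10, 100, 400):
    # Larger vocabularies are supersets of the smaller ones, so the
    # closest pair of words can only stay the same or move closer.
    print(size, round(min_pairwise_distance(vocab[:size]), 3))
```

Because each larger vocabulary contains the smaller one, the distance between the two most similar entries never grows as words are added, which is the geometric reason why a smaller, application-specific vocabulary is easier to classify.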
This gives rise to the need to reduce the vocabulary. A problem in reducing the vocabulary, however, is that words that are not stored with their corresponding sequences of feature vectors are also not recognized.
Further, the considerable memory requirement is a disadvantage of storing a large vocabulary. This high memory requirement leads to considerable costs that are unavoidable when a large vocabulary that is not adapted to the application is employed.
The structure of a text-phoneme converter is known, for example, from the publications K. Wothke, Morphologically Based Automatic Phonetic Transcription, IBM Systems Journal, Vol. 32, No. 3, pp. 486-511, 1993; S. Besling, Heuristical and Statistical Methods for Grapheme-to-Phoneme Conversion, Proceedings CONVENS 1994, Vienna, Austria, pp. 23-31, 1994; and W. Daelemans et al., Data-oriented Methods for Grapheme-to-Phoneme Conversion, Proceedings EACL 1993, Utrecht, pp. 45-53, 1993.
A survey of the fundamentals of expert systems is given in the publication by K. Bauer et al., Expertensysteme: Einführung in Technik und Anwendung, Siemens AG, Engineering & Kommunikation, D. Nebendahl (Ed.), Siemens AG (Publishing Department), ISBN 3-8009-1495-6, pp. 27-82, 1987.
SUMMARY OF THE INVENTION
The invention is based on the problem of specifying an arrangement for producing a digital dictionary that enables flexible, cost-effective, and high-quality speech recognition. Another problem is to specify an arrangement for speech recognition that employs the digital dictionary. The invention is also based on the problem of specifying a computer-assisted method for producing a digital dictionary that enables flexible, cost-effective, and high-quality speech recognition on the basis of the digital dictionary.
The first described arrangement comprises a means for selecting and reading in arbitrary electronic documents, a first memory for permanent storage of standard words and their phoneme sequences, a second memory for temporary storage of additional words and their phoneme sequences, and a text-phoneme converter. By way of the means for selecting and reading in arbitrary electronic documents, those documents that contain a suitable vocabulary for a prescribable application situation are automatically selected and read in from an arbitrary set of electronic documents on the basis of the greatest variety of criteria. Added words are selected from the words of these documents, and the corresponding phoneme sequences are formed for them with the text-phoneme converter. The added words and the corresponding phoneme sequences are temporarily stored in the second memory. At least the standard words and the added words and their respective phoneme sequences form the digital dictionary.
This arrangement makes it possible to respectively determine an application-specific vocabulary for the duration of a specific, characteristic application and to temporarily store these words together with the corresponding phoneme sequences and temporarily incorporate them in the digital dictionary. When the application changes, new electronic documents are again identified and added words are again selected from them, these then being again temporarily stored in the second memory with the corresponding phoneme sequences. The inventive arrangement thus achieves a considerable reduction of the memory space required.
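The interplay of the two memories can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class `DynamicDictionary`, its method names, and the letter-by-letter `toy_text_to_phonemes` converter are hypothetical stand-ins and do not reproduce the text-phoneme converters cited above.

```python
import re


def toy_text_to_phonemes(word):
    """Hypothetical stand-in for a text-phoneme converter: one symbol per letter."""
    return [ch for ch in word.lower() if ch.isalpha()]


class DynamicDictionary:
    """Digital dictionary with a permanent and a temporary word store."""

    def __init__(self, standard_words, converter=toy_text_to_phonemes):
        self.permanent = dict(standard_words)  # first memory: standard words
        self.temporary = {}                    # second memory: added words
        self.converter = converter

    def load_application(self, documents):
        """Discard the previous added words and take new ones from the documents."""
        self.temporary.clear()
        for text in documents:
            for word in re.findall(r"[a-z]+", text.lower()):
                if word not in self.permanent:
                    self.temporary[word] = self.converter(word)

    def entries(self):
        """Standard words plus the added words of the current application."""
        return {**self.permanent, **self.temporary}


dictionary = DynamicDictionary({"yes": ["j", "e", "s"]})
dictionary.load_application(["Transfer funds"])
print(sorted(dictionary.entries()))
dictionary.load_application(["Book flight"])  # the application changes
print(sorted(dictionary.entries()))
```

In this sketch only the standard words persist across applications; the added words of the previous application are dropped when new documents are loaded, which keeps the vocabulary held at any one time small.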
Compared to the first described arrangement, the second disclosed arrangement additionally comprises a means for speaker-dependent speech recognition. By reducing the respectively investigated (compared) vocabulary in the speech recognition, it thereby becomes possible to achieve a better recognition performance than is possible with the known arrangements, since the plurality of words to be compared has been reduced without, however, leaving specific words characteristic of the application out of consideration. Further, the inventive arrangement can be very flexibly utilized under the greatest variety of application situations without requiring involved training phases of the vocabulary for each characteristic application situation. As a result thereof, a substantially lower requirement of memory space is achieved, this leading to a considerable saving of costs for the arrangement for speech recognition.
In the method according to patent claim 6, a digital dictionary that already comprises standard words and phoneme sequences allocated to the standard words at the beginning of the method is built up with the assistance of a computer. Electronic documents are identified in a first step and added words are selected from the electronic documents. A respective phoneme sequence that is allocated to the respective added word is formed for the added words with the assistance of the text-phoneme converter. The added words are temporarily stored and temporarily assigned to the digital dictionary. As a result of the inventive method, a digital dictionary is built up in a way that can be very flexibly adapted to changing applications. A speech recognition that uses the dictionary which has been built up according to the present invention is thus implemented rapidly and reliably with low costs since, first, the plurality of permanently stored words is reduced and, second, the density of the individual phoneme sequences in the feature space is likewise reduced, this leading to improved recognition performance in the speech recognition.
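The first two steps of the method, identifying documents and selecting added words, might look like the following sketch. The selection criteria are assumptions made for illustration only: documents are kept when they mention an application keyword, and a word is added when it occurs at least `min_count` times and is not already a standard word. The source does not specify these criteria, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def identify_documents(documents, keywords):
    """Keep documents that mention at least one keyword (assumed criterion)."""
    wanted = {k.lower() for k in keywords}
    return [doc for doc in documents if wanted & set(tokenize(doc))]


def select_added_words(documents, standard_words, min_count=2):
    """Pick frequent words that are not yet standard words (assumed criterion)."""
    counts = Counter(word for doc in documents for word in tokenize(doc))
    return sorted(
        word for word, count in counts.items()
        if count >= min_count and word not in standard_words
    )


documents = [
    "The invoice lists the invoice total",
    "Weather is sunny today",
    "Send the invoice total by mail",
]
chosen = identify_documents(documents, ["invoice"])
print(select_added_words(chosen, standard_words={"the"}))
```

The selected words would then be passed to the text-phoneme converter and stored temporarily, as described in the method.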
Advantageous developments of the inventive arrangements as well as advantageous developments of the inventive method are provided by a decision unit that is additionally provided for the selection of the added words from the selected electronic documents. A third memory may be provided for storing predetermined reserve words and phoneme sequences allocated to the reserve words that are temporarily stored for each application, whereby the phoneme sequences of the reserve words exhibit a higher quality than the phoneme sequences that are formed by the text-phoneme converter. In one embodiment, a fourth memory is provided for storing speech feature vectors of sayings and/or words that are prescribed by a user, whereby the speech feature vectors respectively characterize a part of the word. User feature vectors of a part of a digitalized voice signal that characterize the part of the digitalized voice signal are compared to stored phoneme feature vectors and/or to stored speech feature vectors, whereby the phoneme feature vectors respectively characterize the phoneme, and whereby the speech feature vectors respectively characterize a part of the word. As a preferred development, the determination of the electronic documents ensues according to at least one of the following rules: an unambiguous allocation of the electronic documents is predetermined; spoken words of a user are recognized by the arrangement for speech recognition and the determination ensues on the basis of the recognized w

Profile ID: LFUS-PAI-O-3284780