Segmentation technique increasing the active vocabulary of...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

C704S254000

Reexamination Certificate

active

06738741

ABSTRACT:

1 BACKGROUND OF THE INVENTION
1.1 Field of the Invention
The present invention relates to a speech recognition system and a method executed by a speech recognition system. More particularly, the invention relates to the vocabulary of a speech recognition system and its usage during the speech recognition process.
1.2 Description and Disadvantages of Prior Art
The invention may preferably be implemented in accordance with the IBM ViaVoice 98 speech recognition system developed by the present assignee. IBM ViaVoice 98 is a real time speech recognition system for large vocabularies which can be speaker-trained with little cost to the user. However, the invention is not limited to use with this particular system and may be used in accordance with other speech recognition systems.
The starting point in these known systems is the breakdown of the speech recognition process into a part based on acoustic data (decoding) and a language statistics part referring back to bodies of language or text for a specific area of application (language model). The decision on candidate words is thus derived both from a decoder and from a language model probability. For the user, fitting the vocabulary processed by this recognition system to the specific field, or even to individual requirements, is of particular significance.
With this speech recognition system, the acoustic decoding first supplies hypothetical words. The further evaluation of competing hypothetical words is then based on the language model. This represents estimates of word string frequencies obtained from application-specific bodies of text based on a collection of text samples from a desired field of application. The most frequent word forms and statistics on word sequences are generated from these text samples.
In the method used here for estimating the frequency of sequences of words, the frequency of occurrence of the so-called word form trigrams in a given text is estimated. In known speech recognition systems, the so-called Hidden Markov Model is frequently used for estimating the probabilities. Here, several frequencies observed in the text are set down. For a trigram “uvw” these are a nullgram term $f_0$, a unigram term $f(w)$, a bigram term $f(w \mid v)$ and a trigram term $f(w \mid uv)$. These terms correspond to the relative frequencies observed in the text, where the nullgram term has only a corrective significance.
If these terms are interpreted as probabilities of the word w under various conditions, a so-called latent variable can be added, from which one of the four conditions which produce the word w is achieved by substitution. If the transfer probabilities for the corresponding terms are designated $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$, then we obtain the following expression for the trigram probability sought:
$$\Pr(w \mid uv) = \lambda_0 f_0 + \lambda_1 f(w) + \lambda_2 f(w \mid v) + \lambda_3 f(w \mid uv)$$
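The interpolated trigram probability above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the uniform nullgram term (1 divided by the vocabulary size) and the fixed weights are assumptions chosen for illustration, and the relative frequencies are simple count ratios over a toy token list. The text does not say how the weights are obtained, so here they are simply passed in.

```python
from collections import Counter

def train_counts(tokens):
    """Collect unigram, bigram and trigram counts from a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def interpolated_trigram(u, v, w, counts, lambdas, vocab_size):
    """Pr(w|uv) = l0*f0 + l1*f(w) + l2*f(w|v) + l3*f(w|uv)."""
    uni, bi, tri = counts
    l0, l1, l2, l3 = lambdas
    total = sum(uni.values())
    f0 = 1.0 / vocab_size                      # nullgram term (assumed uniform)
    f1 = uni[w] / total if total else 0.0      # unigram relative frequency
    f2 = bi[(v, w)] / uni[v] if uni[v] else 0.0            # bigram term
    f3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0  # trigram term
    return l0 * f0 + l1 * f1 + l2 * f2 + l3 * f3

# Toy usage: hypothetical corpus and weights (the weights sum to 1).
tokens = "der schalter ist geschlossen der schalter ist offen".split()
counts = train_counts(tokens)
p = interpolated_trigram("schalter", "ist", "offen", counts,
                         lambdas=(0.1, 0.2, 0.3, 0.4),
                         vocab_size=len(counts[0]))
```

Because the lower-order terms are always mixed in, a trigram that never occurred in the training text still receives a non-zero probability as long as the nullgram or unigram weight is positive.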
The known speech recognition systems have the disadvantage that each word appears as a word form in the vocabulary of the system. For this reason there are relatively large demands on the memory capacity of the system. The generally very extensive vocabularies also have a disadvantageous effect on the speed of the recognition process.
Typical speech recognition systems work in real time on today's PCs. They have an active vocabulary of up to and exceeding 60,000 words and can recognize continuously and/or naturally spoken input without the need to adapt the system to specific characteristics of a speaker. S. Kunzmann, “VoiceType: A Multi-Lingual, Large Vocabulary Speech Recognition System for a PC”, Proceedings of the 2nd SQEL Workshop, Pilsen, Apr. 27-29, 1997 (ISBN 80-7082-314-3), gives an outline of these aspects. Given the actual vocabulary used in human communication, the order of magnitude of the vocabulary recognized by computer-based speech recognition systems must actually reach hundreds of thousands to several million words. Even if such large vocabulary sizes were available today, besides algorithmic limitations on recognizing these extremely large vocabulary sizes, issues like recognition accuracy, decoding speed and system resources (CPU, memory, disc) play a major role in classifying real-time speech recognition systems.
In the past several approaches have been suggested to increase the size of the active vocabulary for such recognition systems. In particular such state of the art approaches are related to the handling of compound words.
For instance, German patent DE 19510083 C2 assumes that compound words, e.g., German “Fahrbahnschalter” or “vorgehen”, are decomposed into constituents like “Fahrbahn-schalter” or “vor-gehen”. The assumption is that compounds are split into constituents which are sequences of legal words in the German language as well as in the recognition vocabulary (“Fahrbahn”, “Schalter” and “vor”, “gehen”). For each of these words statistics are computed, describing the most likely frequencies of each word (Fahrbahnschalter, vorgehen) in its context of occurrence, e.g., “Der Fahrbahnschalter ist geschlossen”. In addition, separate frequency statistics are computed which describe the sequence of these constituents within compound words. Both statistical models are used to decide whether the individual constituents are displayed to the user as single words or as a compound word. Cases like “Verfügbarkeit” (constituents: “verfügbar”+“keit”) or “Birnen” (constituents: “Birne”+“n”) are not covered, since “keit” and “n” are neither legal (standalone) words nor syllables in the German language and are thus not contained in the recognition vocabulary. According to this teaching, an additional, separate frequency model is required to resolve the problem of illegal word sequences during recombination of these arbitrary constituents into words (e.g. “vor”−“Verfügbar”).
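The restriction described above, that every constituent must itself be a legal vocabulary word, can be made concrete with a small Python sketch. It is only an illustration of that assumption, not the method of DE 19510083 C2: the function name `split_compound`, the toy vocabulary and the minimum constituent length of two characters are all hypothetical.

```python
def split_compound(word, vocabulary, min_len=2):
    """Return every split of `word` into two or more vocabulary words.

    `vocabulary` is a set of lowercase word forms; `min_len` (an assumption)
    is the shortest constituent that is tried.
    """
    word = word.lower()
    results = []

    def recurse(rest, parts):
        if not rest:
            if len(parts) >= 2:      # a compound needs at least two constituents
                results.append(parts)
            return
        for i in range(min_len, len(rest) + 1):
            head = rest[:i]
            if head in vocabulary:   # constituent must be a legal word
                recurse(rest[i:], parts + [head])

    recurse(word, [])
    return results

# Hypothetical toy vocabulary of standalone German word forms.
vocab = {"fahrbahn", "schalter", "vor", "gehen", "verfügbar", "birne"}
ok = split_compound("Fahrbahnschalter", vocab)
not_covered = split_compound("Verfügbarkeit", vocab)
```

With this vocabulary, “Fahrbahnschalter” and “vorgehen” each yield one split, while “Verfügbarkeit” and “Birnen” yield none, because “keit” and “n” are not standalone words in the vocabulary.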
The recent U.S. Pat. No. 5,754,972 teaches the introduction of a special dictation mode where the user either announces a “compound dictation mode” or the system is switched into a special recognition mode. This is exposed to the user by a specific user interface. In languages like German, compound words occur extremely frequently, so the need to switch to specific dictation modes is extremely cumbersome. In addition, the teaching of U.S. Pat. No. 5,754,972 is based on the same fundamental assumption as German patent DE 19510083 C2: compound words can be built only from constituents which are legal words of the vocabulary in their own right. To support the generation of new compound words, the spelling of the characters of the compound word is introduced within this special dictation mode.
A different approach is disclosed by G. Ruske, “Half words as processing units in automatic speech recognition”, Journal “Sprache und Datenverarbeitung”, Vol. 8, 1984, Part 1/2, pp. 5-16. A word of the recognition vocabulary is usually described via its orthography (spelling) and its associated (multiple) pronunciations via smallest recognition units. The recognition units are the smallest recognizable units for the decoder. G. Ruske defines these recognition units based on a set of syllables (around 5000 in German). For each spelling in the vocabulary, a sequence of syllables describes the pronunciation(s) of the individual word. Thus, according to the teaching of Ruske, words of the vocabulary are built from the recognition units of the decoder, which are identical to the syllables according to the pronunciation of the word in that language. The recombination of constituents to build words of the language is therefore limited to the recognition units of the decoder.
1.3 Objective of the Invention
The invention is based on the objective to provide a technology to increase the size of an active vocabulary recognized by speech recognition systems. It is a further objective of the current invention to reduce at the same time the algorithmic limitations on recognizing such extremely large vocabulary sizes for instance in terms of recog
