Reexamination Certificate
1999-11-02
2003-04-15
To, Doris H. (Department: 2655)
Data processing: speech signal processing, linguistics, language
Linguistics
Dictionary building, modification, or prioritization
C704S008000
active
06549883
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to the field of speech recognition and speech synthesis. This invention is particularly applicable to the generation of speech recognition dictionaries including transcriptions for use in speech recognition systems as may be used in a telephone directory assistance system, voice activated dialing (VAD) system, personal voice dialing system and other speech recognition enabled services. This invention is also applicable to text to speech synthesizers for generating suitable pronunciations of vocabulary items.
BACKGROUND OF THE INVENTION
Speech recognition enabled services are more and more popular today. The services may include stock quotes, directory assistance, reservations and many others.
In typical speech recognition systems, the user enters a request using isolated-word, connected-word or continuous speech via a microphone or telephone set. If valid speech is detected, the speech recognition layer of the system is invoked in an attempt to recognize the unknown utterance. Typically, entries in a speech recognition dictionary are scored in order to determine the most likely match to the utterance. The recognition of speech involves aligning an input audio signal with the most appropriate target speech model.
Speech recognition dictionaries used in such speech recognition systems typically comprise a group of transcriptions associated with a given vocabulary item. A transcription is a representation of the pronunciation of the associated vocabulary item when uttered by a human. Typically, a transcription is the acoustic representation of a vocabulary item as a sequence of sub-transcription units. A number of acoustic sub-transcription units can be used in a transcription, such as phonemes, allophones, triphones, syllables and dyads (demi-syllables). Commonly, the phoneme is used as the sub-transcription unit, and the representation in such a case is designated as a “phonemic transcription”.
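As a minimal sketch of the structure just described, a dictionary can be modelled as a mapping from each vocabulary item to its phonemic transcriptions. The phoneme symbols and the names `Transcription`, `dictionary` and `transcriptions_for` are illustrative assumptions and do not come from the patent.

```python
from typing import Dict, List, Tuple

# A phonemic transcription: a sequence of phoneme symbols.
# The symbols used below are illustrative, not a real phoneme inventory.
Transcription = Tuple[str, ...]

# Each vocabulary item maps to one or more transcriptions.
dictionary: Dict[str, List[Transcription]] = {
    "Robert": [
        ("r", "aa", "b", "er", "t"),  # hypothetical English-style pronunciation
        ("r", "oh", "b", "eh", "r"),  # hypothetical French-style pronunciation
    ],
}


def transcriptions_for(item: str) -> List[Transcription]:
    """Return the stored transcriptions for an item, or an empty list."""
    return dictionary.get(item, [])
```

A recognizer would score each transcription returned by `transcriptions_for` against the input utterance.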
In most cases, multiple transcriptions are provided for each vocabulary item thereby allowing for different pronunciations of the vocabulary item. Typically, a limit on the total number of transcriptions in a speech recognition dictionary is imposed due to the inherent computational limits of the speech recognizer as well as due to the memory requirements for storing the transcriptions. Commonly, the limit on the total number of transcriptions is put into practice by limiting the maximum number of transcriptions stored for each vocabulary item.
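The per-item limit described above could be enforced roughly as follows. This is a sketch under the assumption that each item's transcriptions are already ordered from most to least preferred; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Transcription = Tuple[str, ...]


def cap_transcriptions(
    dictionary: Dict[str, List[Transcription]], max_per_item: int
) -> Dict[str, List[Transcription]]:
    """Keep at most max_per_item transcriptions for each vocabulary item.

    Assumes each list is ordered from most to least preferred.
    """
    if max_per_item < 0:
        raise ValueError("max_per_item must be non-negative")
    return {item: ts[:max_per_item] for item, ts in dictionary.items()}


def total_transcriptions(dictionary: Dict[str, List[Transcription]]) -> int:
    """Count every transcription stored in the dictionary."""
    return sum(len(ts) for ts in dictionary.values())
```

Capping per item bounds the total at the number of items multiplied by `max_per_item`, which is how a per-item limit yields a limit on the whole dictionary.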
Of particular interest here are multi-lingual pronunciations. A common method is to provide, for each vocabulary item and for each language that the dictionary is intended to support, a transcription that accounts for the possible pronunciation of the vocabulary item in that language. A specific example will better illustrate this method. Suppose the vocabulary item is “Robert” and the languages that the dictionary is intended to support are French, English, German, Russian and Spanish. The dictionary will then comprise five transcriptions for each vocabulary item, one transcription for each language.
A deficiency of the above-described method is that it does not provide any mechanism for including language probability information in the selection of the transcriptions. Consequently, a large number of transcriptions having a low likelihood of being used by a speech processing device are stored. These transcriptions take up memory space and increase the computational load of speech processing devices making use of them, since more transcriptions have to be scored. Continuing the specific example of the vocabulary item “Robert”, it is unlikely for this vocabulary item to be pronounced on the basis of a Russian pronunciation since “Robert” is an uncommon name in that language.
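The cost of the one-transcription-per-language method can be illustrated with a short sketch. The function `placeholder_transcribe` is a hypothetical stand-in for a per-language letter-to-phoneme module, and its output format is invented for illustration only.

```python
from typing import Callable, Dict, Iterable, List

LANGUAGES = ["French", "English", "German", "Russian", "Spanish"]


def placeholder_transcribe(item: str, language: str) -> str:
    """Hypothetical stand-in for a per-language transcription generator."""
    return f"<{language}:{item.lower()}>"


def build_uniform_dictionary(
    items: Iterable[str],
    languages: List[str],
    transcribe: Callable[[str, str], str] = placeholder_transcribe,
) -> Dict[str, List[str]]:
    """Store one transcription per supported language for every item."""
    return {
        item: [transcribe(item, language) for language in languages]
        for item in items
    }


uniform = build_uniform_dictionary(["Robert", "Smith"], LANGUAGES)
# Every item carries one entry per language, likely or not.
entries_to_score = sum(len(ts) for ts in uniform.values())
```

With two items and five languages, `entries_to_score` is ten, and it grows as items times languages regardless of how plausible each pronunciation is.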
Thus, there exists a need in the industry to refine the process of generating a group of transcriptions capable of being used by a speech processing device such as a speech recognition dictionary or a text to speech synthesizer.
SUMMARY OF THE INVENTION
The invention provides a method and apparatus for generating transcriptions suitable for use in a speech-processing device. A vocabulary item is processed to derive a characteristic that allows a pool of available languages to be divided into a first sub-group and a second sub-group. The vocabulary item has a higher probability of belonging to any one of the languages in the first sub-group than of belonging to any language in the second sub-group. The vocabulary item is further processed to generate a group of transcriptions, the group of transcriptions being characterized by the absence of at least one transcription belonging to a language in the second sub-group of languages.
The advantage of this data structure over prior art data structures resides in the reduction of unnecessary transcriptions.
In a specific example of implementation, the vocabulary items in the sub-set are further associated with transcriptions belonging to a common default language.
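One way to read the default-language provision is that the default language is always transcribed, in addition to whichever languages were selected for the item. The sketch below assumes that reading; the function name and the choice of English as the default in the usage note are hypothetical.

```python
from typing import List


def languages_to_transcribe(
    first_sub_group: List[str], default_language: str
) -> List[str]:
    """Return the default language followed by the first sub-group, no duplicates."""
    ordered = [default_language]
    for language in first_sub_group:
        if language not in ordered:
            ordered.append(language)
    return ordered
```

For example, `languages_to_transcribe(["French", "English"], "English")` yields `["English", "French"]`, so the default language is covered exactly once.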
Preferably but not essentially, a characteristic allowing the pool of available languages to be divided into the first sub-group and the second sub-group is the etymology of the vocabulary item.
In accordance with another broad aspect, the invention further provides a method for generating a group of transcriptions suitable for use in a speech processing device. The method comprises providing a vocabulary item and processing it to derive a characteristic that allows a pool of available languages to be divided into a first sub-group and a second sub-group. The vocabulary item manifests a higher probability of belonging to any language in the first sub-group than of belonging to a language in the second sub-group. The method further comprises processing the vocabulary item to generate a group of transcriptions, the group of transcriptions being characterized by the absence of at least one transcription belonging to a language in the second sub-group of languages established for the vocabulary item. Optionally, the method further comprises storing the group of transcriptions on a computer readable storage medium in a format suitable for use by a speech-processing device.
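The method can be sketched as two steps: split the language pool using per-language probabilities, then generate transcriptions only for the first sub-group. The probability threshold is an assumption made here to realize the split; the text only requires that every language in the first sub-group be more probable than every language in the second. The probability values and the `transcribe` callable are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def split_languages(
    language_probs: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Divide the pool so every first-group language outranks every second-group one."""
    first = [lang for lang, p in language_probs.items() if p >= threshold]
    second = [lang for lang, p in language_probs.items() if p < threshold]
    return first, second


def generate_transcriptions(
    item: str,
    language_probs: Dict[str, float],
    transcribe: Callable[[str, str], str],
    threshold: float = 0.1,
) -> Dict[str, str]:
    """Generate transcriptions for first-sub-group languages only."""
    first, _second = split_languages(language_probs, threshold)
    return {lang: transcribe(item, lang) for lang in first}


# Hypothetical probabilities for "Robert"; not taken from the patent.
robert_probs = {
    "French": 0.35, "English": 0.40, "German": 0.15,
    "Russian": 0.02, "Spanish": 0.08,
}
group = generate_transcriptions(
    "Robert", robert_probs, lambda item, lang: f"<{lang}:{item.lower()}>"
)
```

With these illustrative numbers the group holds French, English and German transcriptions, while Russian and Spanish are omitted, so fewer entries are stored and scored.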
Preferably but not essentially, the method comprises processing the vocabulary item to generate transcriptions corresponding to each language belonging to the first sub-group.
Preferably but not essentially, the characteristic allowing the pool of available languages to be divided into the first sub-group and the second sub-group is the etymology of the vocabulary item.
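The source does not say here how etymology is derived. One possible approach, offered only as an assumption, is to estimate the language of origin from the item's spelling with an add-one smoothed character trigram model. The training names below are toy data chosen for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a word, with start (^) and end ($) markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(names_by_language: Dict[str, Iterable[str]], n: int = 3) -> Dict[str, Counter]:
    """Count n-grams per language from example names."""
    return {
        lang: Counter(g for name in names for g in char_ngrams(name, n))
        for lang, names in names_by_language.items()
    }


def language_probabilities(
    word: str, models: Dict[str, Counter], n: int = 3
) -> Dict[str, float]:
    """Normalized per-language likelihoods using add-one smoothing."""
    vocab_size = len(set().union(*(m.keys() for m in models.values()))) + 1
    log_scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        log_scores[lang] = sum(
            math.log((counts[g] + 1) / (total + vocab_size))
            for g in char_ngrams(word, n)
        )
    # Normalize in a numerically stable way so the values sum to one.
    top = max(log_scores.values())
    weights = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Toy training data, far too small for real use.
models = train({
    "French": ["Benoit", "Francois", "Thibault"],
    "English": ["Smith", "Johnson", "Williams"],
})
probs = language_probabilities("Benoit", models)
```

The resulting probabilities could then drive the division of the language pool into the two sub-groups; a practical system would need much larger training lists per language.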
In accordance with another broad aspect, the invention further provides an apparatus for implementing the above-described method.
In accordance with another broad aspect, the invention provides a computer readable medium comprising a program element suitable for execution by a computing apparatus for implementing the above-described method.
In accordance with another broad aspect, the invention further provides a computer readable medium containing a speech recognition dictionary comprising transcriptions generated by the above-described method.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
Fabiani Marc A.
Sabourin Michael G.
Nolan Daniel
Nortel Networks Limited
Smith Kevin L.
To Doris H.