Reexamination Certificate
2000-04-07
2004-05-18
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S008000, C704S240000
active
06738745
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to speech recognition systems and, more particularly, to methods and apparatus for detecting non-target languages in a monolingual speech recognition system.
BACKGROUND OF THE INVENTION
Speech recognition and audio indexing systems are generally developed for a specific target language. The lexica, grammar and acoustic models of such monolingual systems reflect the typical properties of the target language. In practice, however, these monolingual systems may be exposed to other, non-target languages, leading to poor performance, including improper transcription or indexing, misinterpretations, or false system reactions.
For example, many organizations, such as broadcast news organizations and information retrieval services, must process large amounts of audio information, for storage and retrieval purposes. Frequently, the audio information must be classified by subject or speaker name, or both. In order to classify audio information by subject, a speech recognition system initially transcribes the audio information into text for automated classification or indexing. Thereafter, the index can be used to perform query-document matching to return relevant documents to the user.
If the source audio information includes non-target language references, however, the speech recognition system may improperly transcribe the non-target language references, potentially leading to improper classification or indexing of the source information. A need therefore exists for a method and apparatus for detecting non-target language references in an audio transcription or speech recognition system.
With the trend in globalizing communication technologies and providing services to a wide, multilingual public, the ability to distinguish between languages has become increasingly important. The language-rejection problem is closely related to this ability and thus to the problem of automatic language identification (ALI). For a detailed discussion of automatic language identification techniques, see, for example, Y. K. Muthusamy et al., “Reviewing Automatic Language Identification,” IEEE Signal Processing Magazine, 11(4):33-41 (October 1994); J. Navrátil and W. Zühlke, “Phonetic-Context Mapping in Language Identification,” Proc. of the EUROSPEECH-97, Vol. 1, 71-74 (1997); and J. Navrátil and W. Zühlke, “An Efficient Phonotactic-Acoustic System for Language Identification,” Proc. of the Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2, 781-84, Seattle, Wash., IEEE (May, 1998), each incorporated by reference herein.
A number of automatic language identification techniques have been proposed or suggested for distinguishing languages based on various features contained in the speech signal. Several sources of language-discriminative information have been identified as relevant for the task of language identification including, for example, the prosody, the acoustics, and the grammatical and lexical structure. Automatic language identification techniques based on the prosody or acoustics of speech attempt to identify a given language based on typical melodic and pronunciation patterns, respectively.
Due to the complexity of automatic language identification techniques based on the grammatical and lexical structure, however, most proposals have advanced techniques based on acoustic-prosodic information or derived lexical features in order to represent the phonetic structure in a less complex manner. ALI techniques have been developed that model statistical dependencies inherent in phonetic chains, referred to as the phonotactics. In the statistical sense, phonotactics can be viewed as a subset of grammatical and lexical rules of a language. Since these rules differ among languages, the ability to discriminate among languages is naturally reflected in the phonotactic properties.
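As a rough illustration of phonotactic modeling, the following sketch trains one smoothed phone-bigram model per language and picks the language whose model assigns the highest average log-likelihood to a phone sequence. It is a minimal sketch, not the technique of any cited reference: the class and function names, the add-one smoothing, and the toy phone sequences are all assumptions made for the example.

```python
import math
from collections import Counter


class PhoneBigramModel:
    """Add-one smoothed bigram model over phone symbols (hypothetical helper)."""

    def __init__(self, sequences):
        self.bigrams = Counter()   # counts of (previous phone, current phone)
        self.contexts = Counter()  # counts of the previous phone alone
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq)  # "<s>" marks the sequence start
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def avg_log_likelihood(self, seq, vocab_size):
        """Average per-phone log-probability of seq under this model."""
        padded = ["<s>"] + list(seq)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + vocab_size
            total += math.log(num / den)
        return total / max(len(seq), 1)


def identify(models, seq):
    """Return the label of the model that scores seq highest."""
    # A shared vocabulary size keeps the smoothed scores comparable.
    shared = set(seq) | {"<s>"}
    for model in models.values():
        shared |= model.vocab
    return max(models, key=lambda name: models[name].avg_log_likelihood(seq, len(shared)))


# Toy training data: invented phone strings for two made-up languages.
models = {
    "A": PhoneBigramModel([list("kata"), list("taka"), list("kaka")]),
    "B": PhoneBigramModel([list("stri"), list("stra"), list("tris")]),
}
```

With these toy models, `identify(models, list("kata"))` selects language A because its phone transitions were frequent in A's training data and unseen in B's.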
SUMMARY OF THE INVENTION
Generally, methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using confidence scores. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. In one illustrative implementation, the probabilistic engine score provided by the speech recognition system is combined with the background model scores to normalize the engine score as well as to account for the potential presence of a non-target language. The normalization narrows the variability range of the scores across speakers and channels.
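One plausible way to combine the engine score with background model scores is a log-likelihood ratio, sketched below. The text here does not give the exact formula, so this is an assumption: the function names, the use of a log-mean-exp over the background scores, and the example language labels are all hypothetical. The sketch also shows why such a normalization can narrow score variability: an offset that shifts every score equally (for instance a channel effect) cancels out.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def normalized_confidence(engine_log_score, background_log_scores):
    """Engine log-score minus the pooled background log-score (assumed form).

    background_log_scores maps a language label to that background
    model's log-score for the same utterance.
    """
    return engine_log_score - log_mean_exp(list(background_log_scores.values()))


# Target-like utterance: the engine matches better than the background models.
target_conf = normalized_confidence(-1.0, {"de": -3.0, "fr": -4.0})

# Non-target-like utterance: a background model matches better than the engine.
foreign_conf = normalized_confidence(-6.0, {"de": -2.0, "fr": -5.0})

# A constant offset applied to every score leaves the confidence unchanged.
shifted_conf = normalized_confidence(-1.0 - 3.0, {"de": -3.0 - 3.0, "fr": -4.0 - 3.0})
```

Under this assumed form, a positive confidence indicates that the target-language engine explains the utterance better than the pooled background models, and a negative one indicates the opposite.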
The present invention identifies a non-target language utterance within an audio stream when the confidence score falls below a predefined criterion. According to one aspect of the invention, a language rejection mechanism interrupts or modifies the transcription process when speech in the non-target language is detected. In this manner, the present invention prevents improper transcription and indexing and false interpretations of the speech recognition output.
In the presence of non-target language utterances, the transcription system is not able to find a good match based on its native vocabulary, language models and acoustic models. The resulting recognized text will have associated lower engine score values. Thus, the engine score alone may be used to identify a non-target language when the engine score is below a predefined threshold.
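The threshold test on the engine score can be sketched as follows. This is an illustrative sketch only: the `Segment` type, the `reject_non_target` function, the choice to replace rejected text with `None`, and the scores and threshold value are assumptions, not details taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A recognized stretch of audio (hypothetical structure)."""
    text: str
    engine_score: float  # higher means a better match to the target language


def reject_non_target(segments: List[Segment], threshold: float) -> List[Optional[str]]:
    """Keep the text of segments at or above the threshold; mark the rest as None."""
    output = []
    for seg in segments:
        if seg.engine_score < threshold:
            output.append(None)  # treated as non-target: transcription suppressed
        else:
            output.append(seg.text)
    return output


segments = [
    Segment("good evening and welcome", engine_score=-1.2),
    Segment("gut and arbeit den", engine_score=-7.5),  # garbled output, low score
    Segment("now the weather", engine_score=-1.8),
]
transcript = reject_non_target(segments, threshold=-4.0)
```

Here the middle segment falls below the assumed threshold of -4.0 and is withheld from the transcript, so it would not reach a downstream indexer.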
The background models are created or trained based on speech data in several languages, which may or may not include the target language itself. A number of types of background language models may be employed for each modeled language, including one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models.
REFERENCES:
patent: 5724526 (1998-03-01), Kunita
patent: 5913185 (1999-06-01), Martino et al.
patent: 6047251 (2000-04-01), Pon et al.
patent: 6061646 (2000-05-01), Martino et al.
patent: 6085160 (2000-07-01), D'hoore et al.
patent: 2160184 (1996-06-01), None
Chen et al., “Clustering via the Bayesian Information Criterion with Applications in Speech Recognition,” IBM, T.J. Watson Research Center.
Beigi et al., “A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition,” IBM, T.J. Watson Research Center.
Chen et al., “Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion,” IBM, T.J. Watson Research Center.
Dharanipragada et al., “Experimental Results in Audio Indexing,” IBM, T.J. Watson Research Center.
Neti et al., “Audio-Visual Speaker Recognition for Video Broadcast News,” IBM, T.J. Watson Research Center.
Chen et al., “IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 HUB4 English Evaluation,” IBM, T.J. Watson Research Center.
Navratil et al., “Phonetic-Context Mapping in Language Identification,” Proc. of the EUROSPEECH-97, vol. 1, 71-74 (1997).
Navratil et al., “An Efficient Phonotactic-Acoustic System for Language Identification,” Proc. of the Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, 781-84, Seattle, WA, IEEE (May, 1998).
Ramabhadran et al., “Acoustics Only Based Automatic Phonetic Baseform Generation,” Proc. of the Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Seattle, WA, IEEE (May, 1998).
Navratil Jiri
Viswanathan Mahesh
Dang, Esq. Thu Ann
Dorvil Richemond
Ryan & Mason & Lewis, LLP
Storm Donald L.