Data processing: speech signal processing – linguistics – language – Linguistics
Reexamination Certificate
1999-11-19
2002-07-02
Edouard, Patrick N. (Department: 2747)
Data processing: speech signal processing, linguistics, language
Linguistics
C704S257000, C704S255000
active
06415248
ABSTRACT:
BACKGROUND
The present invention relates to a method that builds phrase grammars from a corpus of speech, text, phonemes or any kind of symbolic input (herein “the corpus”).
It has long been a goal of computing systems to interact with human users in the users' own natural language. That is, rather than restricting the user to predetermined syntactic commands, it would be preferable to have the user express a command in the most natural way for the user and to have a computer comprehend the command. Although modern computing systems have improved remarkably in their ability to recognize spoken words, comprehension of speech is still limited because these systems cannot ascribe meanings to the commands.
Significant advances have been made in the ability of modern computing systems to acquire phrases from a corpus. For example, acquisition techniques are disclosed in U.S. patent application Ser. No. 08/960,291, entitled “Automatic Generation of Superwords,” filed Oct. 29, 1997. Other examples may be found in E. Giachin, “Phrase Bigram for Continuous Speech Recognition,” Proc. ICASSP, pp. 225-228 (1995), and in K. Ries, et al., “Improved Language Modeling by Unsupervised Acquisition of Structure,” Proc. ICASSP, pp. 193-196 (1995).
Additionally, advances have been made in the ability of such systems to classify words that possess similar lexical significance. The inventors, for example, have developed a clustering technique as disclosed in co-pending U.S. patent application Ser. No. 207,326 entitled “Automatic Clustering of Tokens from a Corpus of Speech,” the disclosure of which is incorporated herein. Clustering processing also is disclosed in Kneser, et al., “Improved Clustering Techniques for Class-Based Statistical Language Modeling,” Eurospeech (1993) and in McCandless, et al., “Empirical Acquisition of Word and Phrase Classes in the ATIS Domain,” Third European Conf. Speech Comm. Tech. (1993).
While phrase acquisition and clustering techniques improve the ability of a computing system to comprehend speech, neither technique alone can build a structure model from a corpus of speech or text. Accordingly, there is a need in the art for a method for building a linguistic model from a corpus of speech or text.
SUMMARY
The present invention provides a method that combines clustering techniques and phrase acquisition techniques with a closed-loop optimization method to build complex linguistic models from a corpus. A set of features is initialized from the corpus. Thereafter, the method determines, according to a predetermined cost function, whether to process the features by phrase clustering processing or by phrase grammar learning processing. If phrase clustering processing is performed, the method then processes an interstitial set of features, comprising both the old features and the newly established clusters, by phrase grammar learning processing. The features obtained as an output of phrase grammar learning are re-indexed as the set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.
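The iteration described in the summary can be sketched in Python as follows. This is a minimal illustration of the control flow only, not the patented implementation: the function names (build_model, toy_cluster, toy_learn_grammar), the context-based grouping, the bigram count threshold, and the prefers_clustering decision rule are all assumptions made for the example, since the summary does not specify the cost function or the internals of either processing step.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Set

Corpus = List[List[str]]

def initial_features(corpus: Corpus) -> Set[str]:
    """Initialize the feature set from the corpus (here: its distinct tokens)."""
    return {tok for sent in corpus for tok in sent}

def toy_cluster(features: Set[str], corpus: Corpus) -> Set[str]:
    """Hypothetical stand-in for phrase clustering: group tokens that occur
    in exactly the same (left, right) neighbour contexts."""
    contexts = defaultdict(set)
    for sent in corpus:
        padded = [None] + sent + [None]
        for left, tok, right in zip(padded, padded[1:], padded[2:]):
            if tok in features:
                contexts[tok].add((left, right))
    groups = defaultdict(list)
    for tok, ctx in contexts.items():
        groups[frozenset(ctx)].append(tok)
    # Each group of two or more tokens becomes one new cluster feature.
    return {"{" + "|".join(sorted(g)) + "}" for g in groups.values() if len(g) > 1}

def toy_learn_grammar(features: Set[str], corpus: Corpus, min_count: int = 2) -> Set[str]:
    """Hypothetical stand-in for phrase grammar learning: add every adjacent
    token pair seen at least min_count times as a phrase feature."""
    pairs = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
    phrases = {f"{a} {b}" for (a, b), n in pairs.items() if n >= min_count}
    return set(features) | phrases

def build_model(
    corpus: Corpus,
    prefers_clustering: Callable[[Set[str], Corpus], bool],
    iterations: int = 2,
) -> Set[str]:
    """Closed loop: decide, optionally cluster, learn phrases, re-index."""
    features = initial_features(corpus)
    for _ in range(iterations):
        if prefers_clustering(features, corpus):
            # Interstitial set: the old features plus the new clusters.
            interstitial = features | toy_cluster(features, corpus)
        else:
            interstitial = features
        # The grammar-learning output becomes the next iteration's features.
        features = toy_learn_grammar(interstitial, corpus)
    return features

DEMO_CORPUS: Corpus = [
    ["fly", "to", "boston"],
    ["fly", "to", "denver"],
    ["drive", "to", "boston"],
    ["drive", "to", "denver"],
]
```

Calling build_model(DEMO_CORPUS, lambda feats, corpus: len(feats) >= 5) clusters "boston" with "denver" and "fly" with "drive", then adds the recurring word pairs as phrases. Unlike the method described, these stand-ins do not rewrite the corpus in terms of the new features, so later iterations add nothing further; a fuller version would need that step to grow a hierarchy.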
REFERENCES:
patent: 5839106 (1998-11-01), Bellegarda
patent: 6021384 (2000-02-01), Gorin et al.
patent: 6173261 (2001-01-01), Arai et al.
McCandless et al, “Empirical Acquisition of Word and Phrase Classes in the ATIS Domain”, Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Kneser et al, “Improved Clustering Techniques for Class-Based Statistical Language Modeling”, Philips GmbH Forschungslaboratorien, Weisshausstrasse, 2, D-52066 Aachen, Germany.
Abella et al, “Generating Semantically Consistent Inputs to a Dialog Manager”, AT&T Labs Research, Florham Park, New Jersey.
Saul et al, “Aggregate and Mixed Order Markov Models for Statistical Language Processing”, AT&T Labs—Research, Florham Park, New Jersey.
Bangalore Srinivas
Riccardi Giuseppe
AT&T Corp.
Edouard Patrick N.
Kenyon & Kenyon
Method for building linguistic models from a corpus