Method for building linguistic models from a corpus

Reexamination Certificate


Filed: 1999-11-19
Issued: 2002-07-02
Examiner: Edouard, Patrick N. (Department: 2747)
Class: Data processing: speech signal processing, linguistics, language
Subclass: Linguistics
Classification codes: C704S257000, C704S255000
Document type: Reexamination Certificate
Status: active
Patent number: 06415248
ABSTRACT:

BACKGROUND
The present invention relates to a method that builds phrase grammars from a corpus of speech, text, phonemes or any kind of symbolic input (herein “the corpus”).
It has long been a goal of computing systems to interact with human users in the users' own natural language. That is, rather than restricting the user to predetermined syntactic commands, it would be preferable to have the user express a command in the way most natural to the user and to have a computer comprehend the command. Although modern computing systems have improved remarkably in their ability to recognize spoken words, comprehension of speech still is limited because these systems cannot ascribe meanings to the commands.
Significant advances have been made in the ability of modern computing systems to acquire phrases from a corpus. For example, acquisition techniques are disclosed in U.S. patent application Ser. No. 08/960,291, entitled “Automatic Generation of Superwords,” filed Oct. 29, 1997. Other examples may be found in E. Giachin, “Phrase Bigram for Continuous Speech Recognition,” Proc. ICASSP, pp. 225-228 (1995), and in K. Ries, et al., “Improved Language Modeling by Unsupervised Acquisition of Structure,” Proc. ICASSP, pp. 193-196 (1995).
Additionally, advances have been made in the ability of such systems to classify words that possess similar lexical significance. The inventors, for example, have developed a clustering technique as disclosed in co-pending U.S. patent application Ser. No. 207,326, entitled “Automatic Clustering of Tokens from a Corpus of Speech,” the disclosure of which is incorporated herein. Clustering processing also is disclosed in Kneser, et al., “Improved Clustering Techniques for Class-Based Statistical Language Modeling,” Eurospeech (1993), and in McCandless, et al., “Empirical Acquisition of Word and Phrase Classes in the ATIS Domain,” Third European Conf. Speech Comm. Tech. (1993).
While phrase acquisition and clustering techniques improve the ability of a computing system to comprehend speech, neither technique alone can build a structure model from a corpus of speech or text. Accordingly, there is a need in the art for a method for building a linguistic model from a corpus of speech or text.
SUMMARY
The present invention provides a method that combines clustering techniques and phrase acquisition techniques with a closed-loop optimization method to build complex linguistic models from a corpus. A set of features is initialized from the corpus. Thereafter, the method determines, according to a predetermined cost function, whether to process the features by phrase clustering processing or by phrase grammar learning processing. If phrase clustering processing is performed, the method then processes an interstitial set of features, comprising both the old features and the newly established clusters, by phrase grammar learning processing. The features obtained as an output of phrase grammar learning are re-indexed as the set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.
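The iteration described in the summary can be illustrated with a minimal Python sketch. It is not the patented method: the summary does not specify the cost function, the clustering procedure, or the phrase grammar learner, so every function below is a hypothetical toy stand-in. Here `cluster_tokens` groups tokens that occur in identical left/right contexts, `learn_phrases` merges the most frequent adjacent pair into one token, and `type_token_ratio` with an arbitrary `threshold` plays the role of the predetermined cost function.

```python
from collections import Counter, defaultdict


def cluster_tokens(sentences):
    """Toy clustering (assumption): tokens seen in exactly the same
    (left, right) contexts are replaced by one shared class label."""
    contexts = defaultdict(set)
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        for left, tok, right in zip(padded, padded[1:], padded[2:]):
            contexts[tok].add((left, right))
    groups = defaultdict(list)
    for tok, ctx in contexts.items():
        groups[frozenset(ctx)].append(tok)
    mapping = {}
    for members in groups.values():
        if len(members) > 1:
            label = "<" + "|".join(sorted(members)) + ">"
            for tok in members:
                mapping[tok] = label
    return [[mapping.get(t, t) for t in s] for s in sentences], mapping


def learn_phrases(sentences, min_count=2):
    """Toy phrase learning (assumption): merge the most frequent adjacent
    token pair into a single phrase token, if it occurs often enough."""
    pairs = Counter()
    for s in sentences:
        pairs.update(zip(s, s[1:]))
    if not pairs:
        return sentences, None
    (a, b), count = pairs.most_common(1)[0]
    if count < min_count:
        return sentences, None
    merged = a + "_" + b
    out = []
    for s in sentences:
        t, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                t.append(merged)
                i += 2
            else:
                t.append(s[i])
                i += 1
        out.append(t)
    return out, merged


def type_token_ratio(sentences):
    """Hypothetical cost: distinct tokens divided by total tokens."""
    tokens = [t for s in sentences for t in s]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def build_model(corpus, iterations=3, threshold=0.5):
    """Closed loop: optionally cluster, then learn phrases, then reuse
    the output as the feature set for the next iteration."""
    features = [list(s) for s in corpus]  # features initialized from the corpus
    history = []
    for _ in range(iterations):
        if type_token_ratio(features) > threshold:
            # Old tokens plus new class labels form the interstitial set.
            features, classes = cluster_tokens(features)
            history.append(("cluster", classes))
        features, phrase = learn_phrases(features)
        history.append(("phrase", phrase))
        if phrase is None:  # nothing left to merge
            break
    return features, history


corpus = [["fly", "to", "boston"], ["fly", "to", "denver"]]
features, history = build_model(corpus)
print(features)
```

On this two-sentence toy corpus, the first pass clusters "boston" and "denver" into one class, after which both sentences collapse into the single hierarchical phrase token `fly_to_<boston|denver>`. Without the clustering step, the pairs "to boston" and "to denver" would each occur only once and would fall below `min_count`, which is the intuition behind combining the two kinds of processing in one loop.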

REFERENCES:
patent: 5839106 (1998-11-01), Bellegarda
patent: 6021384 (2000-02-01), Gorin et al.
patent: 6173261 (2001-01-01), Arai et al.
McCandless et al., “Empirical Acquisition of Word and Phrase Classes in the ATIS Domain”, Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Kneser et al., “Improved Clustering Techniques for Class-Based Statistical Language Modeling”, Philips GmbH Forschungslaboratorien, Weisshausstrasse 2, D-52066 Aachen, Germany.
Abella et al., “Generating Semantically Consistent Inputs to a Dialog Manager”, AT&T Labs Research, Florham Park, New Jersey.
Saul et al., “Aggregate and Mixed Order Markov Models for Statistical Language Processing”, AT&T Labs Research, Florham Park, New Jersey.

Affiliated with

Bangalore Srinivas (Inventor)
Riccardi Giuseppe (Inventor)

Also associated with

AT&T Corp. (Corporate Assignee)
Edouard Patrick N. (Examiner)
Kenyon & Kenyon (Law Firm)
