Automatic clustering of tokens from a corpus for grammar...

Data processing: speech signal processing – linguistics – language – Linguistics

Reexamination Certificate

Details

C704S255000

active

06751584

ABSTRACT:

BACKGROUND
The present invention relates to an application that builds linguistic models from a corpus of speech.
For a machine to comprehend speech, it must not only identify spoken (or typed) words but also understand the grammar of the language in order to comprehend the meaning of commands. Accordingly, much research has been devoted to constructing language models that a machine may use to ascribe meaning to spoken commands. Often, language models are preprogrammed. However, such predefined models increase the cost of a speech recognition system, and the language models obtained therefrom have narrow applications: unless a programmer predefines the language model to recognize a certain command, the speech recognition system that uses the model may not recognize that command. What is needed is a training system that automatically extracts grammatical relationships from a predefined corpus of speech.
SUMMARY
An embodiment of the present invention provides a method of learning grammar from a corpus, in which context words are identified from the corpus. For each remaining (non-context) word, the method counts occurrences of predetermined relationships with the context words and maps the counted occurrences to a vector in a multidimensional frequency space. Clusters are then grown from these frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.
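The summary does not say how context words are chosen, which relationships are counted, or how clusters are grown. The following Python sketch fills those gaps with illustrative assumptions only: context words are taken to be the most frequent tokens, the two counted relationships are "context word immediately to the left" and "context word immediately to the right", and clusters are grown by greedy agglomerative merging on cosine similarity (a swapped-in technique, not necessarily the one claimed). The function names `pick_context_words`, `frequency_vectors`, and `grow_clusters` and the `threshold` parameter are hypothetical.

```python
from collections import Counter
import math


def pick_context_words(tokens, n_context):
    """Assumption: the n_context most frequent tokens serve as context words."""
    return [w for w, _ in Counter(tokens).most_common(n_context)]


def frequency_vectors(tokens, context_words):
    """Map each non-context word to a count vector of length 2 * len(context_words).

    Assumed relationships: the first half counts context words seen immediately
    to the LEFT of the word, the second half those immediately to the RIGHT.
    """
    index = {w: i for i, w in enumerate(context_words)}
    k = len(context_words)
    vectors = {}
    for pos, word in enumerate(tokens):
        if word in index:
            continue  # context words are not themselves clustered here
        vec = vectors.setdefault(word, [0] * (2 * k))
        if pos > 0 and tokens[pos - 1] in index:
            vec[index[tokens[pos - 1]]] += 1
        if pos + 1 < len(tokens) and tokens[pos + 1] in index:
            vec[k + index[tokens[pos + 1]]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def grow_clusters(vectors, threshold):
    """Greedy agglomerative clustering (illustrative choice).

    Start with one cluster per word, then repeatedly merge the two clusters
    whose summed count vectors are most similar, stopping once the best
    similarity falls below `threshold`.
    """
    clusters = [([w], list(v)) for w, v in vectors.items()]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        words = clusters[i][0] + clusters[j][0]
        summed = [x + y for x, y in zip(clusters[i][1], clusters[j][1])]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (words, summed)
    return [sorted(words) for words, _ in clusters]


if __name__ == "__main__":
    tokens = ("i want to fly to boston please i want to fly to denver please "
              "i need to fly to dallas please").split()
    ctx = pick_context_words(tokens, 4)  # ['to', 'i', 'fly', 'please']
    vecs = frequency_vectors(tokens, ctx)
    print(sorted(grow_clusters(vecs, threshold=0.5)))
    # [['boston', 'dallas', 'denver'], ['need', 'want']]
```

On this toy corpus the verbs "want" and "need" share the same left/right context profile, as do the three city names, so they fall into two separate classes. Summing member vectors before comparing is equivalent, under cosine similarity, to comparing cluster centroids; a real system would likely also need smoothing or normalization for rare words.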

