Method, device and system for part-of-speech disambiguation

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

C704S010000

active

06182028

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to part-of-speech disambiguation, and more particularly to combining neural-network and stochastic processors into a hybrid system to accomplish such disambiguation.
BACKGROUND OF THE INVENTION
Part-of-speech disambiguation is the process of assigning the correct part of speech to each word in a sentence, based on the word's usage in the sentence. For example, the part of speech of the English word “record” may be either noun or verb, depending on the context in which the word is used; in the sentence “John wants to record a record”, the first occurrence of “record” is used as a verb and the second is used as a noun. The accurate recognition of this distinction is particularly important in a text-to-speech system, because “record” is pronounced differently depending on whether it is a noun or verb.
As shown in FIG. 1, numeral 100, to disambiguate the parts of speech of words in a text, part-of-speech disambiguation systems typically use the following three-step process. Step 1 is the tokenization step, in which a text stream (101) is tokenized into a sequence of text tokens (104) by a text tokenizer (102) as specified by a tokenization knowledge database (103). The tokenization knowledge database typically contains predetermined rules that are used to identify textual elements, which are classifiable by part of speech. Examples of such textual elements are words, punctuation marks, and special symbols such as “%” and “$”. Step 2 is the lexicon access step, in which each text token is looked up in a lexicon (106) by a lexicon accessor (105). The lexicon consists of a static lexicon (107) that contains a plurality of textual elements and corresponding part-of-speech tags, and a dynamic lexicon (108) that can generate part-of-speech tags for the textual elements that are not stored in the static lexicon. Because some textual elements (e.g., the word “record”) have more than one part of speech, the lexicon access step will result in at least one part-of-speech tag being assigned to each text token; the output of the lexicon access step is therefore a sequence of ambiguously tagged text tokens (109). Step 3 is the disambiguation step, in which all part-of-speech ambiguities in the sequence of ambiguously tagged text tokens are resolved by the disambiguator (110) as specified by the disambiguation knowledge database (111), thus resulting in a sequence of unambiguously tagged text tokens (112).
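For illustration only, Steps 1 and 2 of the process described above can be sketched in Python. This is a minimal sketch and not the implementation of any particular system: the regular expression stands in for the tokenization knowledge database, and the names `STATIC_LEXICON`, `tokenize`, `dynamic_lexicon` and `lexicon_access`, the tag choices, and the guessing heuristics are all assumptions introduced here.

```python
import re

# Hypothetical static lexicon: textual element -> possible part-of-speech tags.
# The entries and tag names are illustrative assumptions.
STATIC_LEXICON = {
    "john": ["NNP"],
    "wants": ["NNS", "VBZ"],
    "to": ["TO"],
    "record": ["NN", "VB"],
    "a": ["DT"],
    ".": ["."],
}

# Toy stand-in for tokenization rules: a run of word characters is one token,
# and every other non-space character (punctuation, "%", "$") is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Step 1: split a text stream into a sequence of text tokens."""
    return TOKEN_PATTERN.findall(text)


def dynamic_lexicon(token):
    """Generate candidate tags for a token missing from the static lexicon.

    These heuristics are placeholders chosen for the example.
    """
    if not token[0].isalnum():
        return ["SYM"]          # special symbol
    if token[0].isdigit():
        return ["CD"]           # number
    if token[0].isupper():
        return ["NNP"]          # capitalized: guess proper noun
    return ["NN", "VB"]         # otherwise leave the ambiguity open


def lexicon_access(tokens):
    """Step 2: attach one or more candidate tags to every token."""
    return [
        (token, STATIC_LEXICON.get(token.lower()) or dynamic_lexicon(token))
        for token in tokens
    ]


if __name__ == "__main__":
    for token, tags in lexicon_access(tokenize("John wants to record a record.")):
        print(token, tags)
```

Every token leaves Step 2 with at least one tag, so the result is a sequence of ambiguously tagged tokens ready for the disambiguation step.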
An example of the application of the above process is presented in FIG. 2, numeral 200. A text stream (201) is input into the tokenization step, which yields a sequence of untagged text tokens (202) as its output. The sequence of untagged text tokens is input into the lexicon access step, which yields a sequence of ambiguously tagged text tokens as its output. As may be seen in FIG. 2, several text tokens have more than one tag associated with them; for example, “wants” is an ambiguously tagged text token (204), because it may be used as either a plural noun (tag “NNS”) or a third-person, present tense verb (tag “VBZ”). The set of all possible tag sequences based on the sequence of ambiguously tagged text tokens is represented by a directed acyclic graph of tag sequences (203). The sequence of ambiguously tagged text tokens is input into the disambiguation step, which determines a best path (205) through the directed acyclic graph of tag sequences, thus yielding a sequence of unambiguously tagged text tokens (206).
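As a rough illustration of finding a best path through such a graph, the sketch below runs a Viterbi-style dynamic program over the tag lattice using tag-to-tag transition probabilities only. It is a simplified stochastic disambiguator written for this example, not the hybrid method of the invention; the function name `best_path`, the `<s>` start symbol, and every probability in `TRANSITIONS` are invented assumptions.

```python
import math

# Ambiguously tagged tokens for "John wants to record a record".
LATTICE = [
    ("John", ["NNP"]),
    ("wants", ["NNS", "VBZ"]),
    ("to", ["TO"]),
    ("record", ["NN", "VB"]),
    ("a", ["DT"]),
    ("record", ["NN", "VB"]),
]

# Hypothetical tag-bigram probabilities P(tag | previous tag); made-up values.
TRANSITIONS = {
    ("<s>", "NNP"): 0.4,
    ("NNP", "VBZ"): 0.3, ("NNP", "NNS"): 0.05,
    ("VBZ", "TO"): 0.2, ("NNS", "TO"): 0.1,
    ("TO", "VB"): 0.6, ("TO", "NN"): 0.05,
    ("VB", "DT"): 0.3, ("NN", "DT"): 0.05,
    ("DT", "NN"): 0.6, ("DT", "VB"): 0.01,
}


def best_path(lattice, transitions, default=1e-4, start="<s>"):
    """Return the highest-scoring tag sequence through the lattice.

    best maps each tag at the current position to a pair
    (log score of the best path ending in that tag, the path itself).
    Unseen tag pairs receive the small probability `default`.
    """
    best = {start: (0.0, [])}
    for _token, tags in lattice:
        new_best = {}
        for tag in tags:
            candidates = [
                (score + math.log(transitions.get((prev, tag), default)),
                 path + [tag])
                for prev, (score, path) in best.items()
            ]
            new_best[tag] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    tags = best_path(LATTICE, TRANSITIONS)
    print(list(zip([token for token, _ in LATTICE], tags)))
```

With these assumed probabilities, the selected path tags “wants” as VBZ, the first “record” as VB and the second as NN. A fuller stochastic tagger would also weight each tag by how likely the word is to carry it; that term is omitted here to keep the sketch short.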
It is known in the art that local context is a strong indicator of a word's part of speech; hence stochastic systems based on the statistical modeling of word and tag collocations have proven successful. However, these systems fail predictably for syntactic structures that involve non-local dependencies. Because non-local dependencies are beyond the limits of stochastic systems, such effects must be accounted for by systems that can process expanded context. Two problems to be considered in developing such systems are: identifying and placing appropriate limits on the amount of expanded context to be processed, and balancing the contribution of the evidence provided by local and expanded context processing.
Hence, there is a need for a method, device and system for part-of-speech disambiguation that advantageously combines the processing of both local and expanded context.

