Bootstrapping sense characterizations of occurrences of...

Data processing: speech signal processing – linguistics – language – Linguistics – Dictionary building – modification – or prioritization

Reexamination Certificate


Details

C704S009000

Reexamination Certificate

active

06253170

ABSTRACT:

TECHNICAL FIELD
The invention relates generally to the field of natural language processing, and, more specifically, to the field of automated lexicography.
BACKGROUND OF THE INVENTION
A dictionary is a resource that lists, defines, and gives usage examples of words and other terms. For example, a conventional dictionary might contain the following entries:
TABLE 1
flow, sense 100 (verb, intransitive): to run smoothly with unbroken continuity – “honey flows slowly”
run, sense 100 (verb, intransitive): to stride quickly
run, sense 115 (verb, intransitive): (of liquids, sand, etc.) to flow
run, sense 316 (noun): a movement or flow
run, sense 329 (verb, transitive, slang): to control – “the supervisor runs the flow of assignments”
run, sense 331 (verb, intransitive): (of computer program) execute – “analyze the efficiency of the flow of the program when the program runs”
The entries above include an entry for one sense of flow and entries for each of five different senses of run. Each entry identifies a word, a sense of the word, a part of speech, a definition, and, in some cases, a usage example. For example, the first entry above is for an intransitive verb sense of the word flow, sense 100. (“Transitive” characterizes a verb that takes an object, while “intransitive” characterizes a verb that does not take an object.) This sense of flow has the definition “to run smoothly with unbroken continuity,” and the usage example “honey flows slowly.”
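As a rough illustration of the entry structure just described, the entries of Table 1 could be held in a small record type. This is only a sketch: the class name `Entry`, its field names, and the helper `senses_of` are assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One dictionary entry (hypothetical layout, not the patent's)."""
    word: str
    sense: int
    part_of_speech: str               # "verb" or "noun"
    transitivity: Optional[str]       # "transitive", "intransitive", or None for non-verbs
    definition: str
    usage_example: Optional[str] = None
    labels: Tuple[str, ...] = ()      # annotations such as "slang"


# The six entries of Table 1.
ENTRIES = [
    Entry("flow", 100, "verb", "intransitive",
          "to run smoothly with unbroken continuity", "honey flows slowly"),
    Entry("run", 100, "verb", "intransitive", "to stride quickly"),
    Entry("run", 115, "verb", "intransitive", "(of liquids, sand, etc.) to flow"),
    Entry("run", 316, "noun", None, "a movement or flow"),
    Entry("run", 329, "verb", "transitive", "to control",
          "the supervisor runs the flow of assignments", ("slang",)),
    Entry("run", 331, "verb", "intransitive", "(of computer program) execute",
          "analyze the efficiency of the flow of the program when the program runs"),
]


def senses_of(word: str) -> list:
    """Return the sense numbers listed for a word, in table order."""
    return [e.sense for e in ENTRIES if e.word == word]
```

Calling `senses_of("run")` on this data lists the five senses of run, while `senses_of("flow")` lists the single sense of flow.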
Many languages contain polysemous words—that is, words that have multiple senses or meanings. Because different definitions and usage examples are appropriate for different senses of the same polysemous word, many dictionaries take care to subdivide polysemous words into their senses and provide separate entries, including separate definitions and usage examples, for each sense as shown above.
Dictionaries are generally produced for human readers, who are able to use their understanding of some words and word senses of a language in order to understand entries for other words or word senses with which they are not familiar. For example, a reader might know the different senses of the word run shown above, but might not know the word flow. To learn more about the word flow, the human reader would look up the entry shown above for flow. In reading the definition “to run smoothly with unbroken continuity” for flow, a human reader would employ his or her understanding of the different senses of run to determine that this definition of flow refers to the sand and liquid flowing sense of run (sense 115) rather than any other senses of run.
The field of natural language processing is directed to discerning the meaning of arbitrary natural language expressions, such as phrases, sentences, paragraphs, and longer documents, in a computer system. Given the existence of conventional dictionaries intended for human readers as described above, it is desirable to utilize such dictionaries as a basis for discerning the meaning of natural language expressions in a natural language processing system. The information in such a dictionary is not optimized for use by a natural language processing system, however. As noted above, the meaning of the occurrence of the word run in the definition for flow is ambiguous, thus rendering the entire definition for flow ambiguous. That is, the definition for flow may mean any of the following, depending upon the sense of run that is selected:
TABLE 2
Sense of run employed: Meaning
100: to <stride quickly> smoothly with unbroken continuity
115: to <flow like liquid or sand> smoothly with unbroken continuity
316: to <a movement or flow> smoothly with unbroken continuity
329: to <control> smoothly with unbroken continuity
331: to <execute, as a computer program> smoothly with unbroken continuity
While it is clear to a typical human reader that the second of these interpretations is by far the most plausible, a computer-based natural language processing system does not share the human intuitions that provide a basis for resolving the ambiguity among these five possible meanings. An automated method for augmenting a conventional dictionary, by adding word sense characterizations to word occurrences whose sense is not characterized in a representation of the dictionary, would therefore have significant utility for natural language processing systems: such systems would no longer need to select among multiple meanings of text strings in the dictionary representation that contain polysemous words. Such an augmented dictionary representation represents the relationships between word senses, rather than merely relationships between orthographic word shapes.
SUMMARY OF THE INVENTION
The present invention is directed to characterizing the senses of occurrences of polysemous words. In accordance with the invention, a sense characterization software facility (“the facility”) characterizes the sense in which words are used. In a representation of a dictionary such as a lexical knowledge base that contains sense characterizations for some word occurrences, the facility collects a number of dictionary text segments, such as definitions and usage examples, that all contain a common word, such as flow. This collection of dictionary text segments represents a context in which only a small subset of the total number of senses of polysemous words other than the common word are likely to be used. The facility then finds a word occurrence among the collected text segments that does not have a sense characterization, such as the occurrence of the word run in the following definition for the word flow: “to run smoothly with unbroken continuity.” The facility then identifies other occurrences of the same word, run, among the collected text segments that do have sense characterizations. In this regard, the collected text segments may include definitions and usage examples of run that each (a) contain sense-characterized occurrences of run, and (b) contain flow. The facility then selects one of the identified occurrences of run, and copies its sense characterization to the occurrence which does not have a sense characterization. If the text segment containing the occurrence receiving the new sense characterization occurs elsewhere in the dictionary representation, the new sense characterization is further copied to, or “propagated to,” the same word occurrence in the other occurrences of the text segment. This process is preferably repeated for a large number of word occurrences, substantially increasing the number of word occurrences in the dictionary representation having sense characterizations.
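The collect, find, copy, and propagate steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the facility's actual implementation: the names `Occurrence`, `Segment`, and `bootstrap` are hypothetical, each segment is assumed to list its headword as one of its occurrences, occurrences of the common word itself are skipped, and the selection step is reduced to taking the first candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occurrence:
    word: str                     # base form, e.g. "run"
    sense: Optional[int] = None   # None means "no sense characterization yet"


@dataclass
class Segment:
    text: str                     # a definition or usage example
    occurrences: List[Occurrence]


def contains(segment: Segment, word: str) -> bool:
    return any(o.word == word for o in segment.occurrences)


def bootstrap(segments: List[Segment], common_word: str) -> int:
    """Copy senses onto uncharacterized occurrences; return how many were copied."""
    # Step 1: collect the text segments that all contain the common word.
    context = [s for s in segments if contains(s, common_word)]
    copied = 0
    for seg in context:
        for i, occ in enumerate(seg.occurrences):
            # Step 2: look for an occurrence lacking a sense characterization.
            if occ.word == common_word or occ.sense is not None:
                continue
            # Step 3: find characterized occurrences of the same word in the context.
            candidates = [o for s in context for o in s.occurrences
                          if o.word == occ.word and o.sense is not None]
            if not candidates:
                continue
            # Step 4: select one (placeholder: the first) and copy its sense.
            chosen = candidates[0]
            # Step 5: propagate to every copy of the same text segment,
            # assuming identical text implies identical occurrence positions.
            for other in segments:
                if other.text == seg.text:
                    other.occurrences[i].sense = chosen.sense
            copied += 1
    return copied


flow_def = Segment("flow: to run smoothly with unbroken continuity",
                   [Occurrence("flow", 100), Occurrence("run")])
flow_def_copy = Segment("flow: to run smoothly with unbroken continuity",
                        [Occurrence("flow", 100), Occurrence("run")])
run_115 = Segment("run: (of liquids, sand, etc.) to flow",
                  [Occurrence("run", 115), Occurrence("flow")])
run_100 = Segment("run: to stride quickly", [Occurrence("run", 100)])

segments = [run_100, flow_def, run_115, flow_def_copy]
copied = bootstrap(segments, "flow")
```

In this toy data the definition of run, sense 100, is left out of the context because it does not contain flow, so the only characterized occurrence of run available for copying is sense 115. The second copy of the flow definition receives the same sense through propagation rather than through a separate selection.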
The facility preferably selects one of the identified word occurrences by first rejecting identified word occurrences having inappropriate senses. These include senses that have different parts of speech or (for verbs) transitivity attributes than the word occurrence without a sense characterization. In the example, the occurrence of run without a sense characterization is used as an intransitive verb, so the facility rejects identified word occurrences having parts of speech other than verb or that are transitive. The facility also preferably rejects identified word occurrences having specialized senses that are annotated as slang, archaic, or limited to a specific subject matter domain. After rejecting identified word occurrences having inappropriate senses, the facility selects one of the remaining identified word occurrences in a way that favors (a) identified word occurrences derived from the same dictionary as the word occurrence without a sense characterization and (b) identified word occurrences that have strong relationships with the common word in the dictionary text segments in which they appear. As an example of (b), the occurrence of run in the text segment “run (sense 115): (of liquids, sands, etc.) to flow” has a stronger relationship with the occurrence of flow than does the occurrence of run in the text segment “analyze the efficiency of the flow of the program when the program runs (sense
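The rejection and preference criteria described in this paragraph can be sketched as a filter followed by a ranking. Everything named here is an assumption for illustration: the `Candidate` fields, the `SPECIALIZED` label set, the numeric `relation_strength` values, and in particular the choice to rank by same-dictionary origin first and relationship strength second, since the text says only that both are favored and not how they are combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    sense: int
    part_of_speech: str
    transitive: Optional[bool]          # None for non-verbs
    labels: frozenset = frozenset()     # e.g. frozenset({"slang"})
    same_dictionary: bool = True
    relation_strength: float = 0.0      # hypothetical score for the tie to the common word


# Annotations treated as marking a specialized sense (assumed label names).
SPECIALIZED = {"slang", "archaic", "domain"}


def select_sense(target_pos: str, target_transitive: Optional[bool],
                 candidates: List[Candidate]) -> Optional[int]:
    """Reject inappropriate senses, then pick the most favored remaining one."""
    kept = [
        c for c in candidates
        if c.part_of_speech == target_pos
        and (target_pos != "verb" or c.transitive == target_transitive)
        and not (c.labels & SPECIALIZED)
    ]
    if not kept:
        return None
    # Assumed ordering: same dictionary first, then relationship strength.
    best = max(kept, key=lambda c: (c.same_dictionary, c.relation_strength))
    return best.sense


# Characterized occurrences of run in segments that contain flow.
# The strength values are made up for this example.
candidates = [
    Candidate(115, "verb", False, relation_strength=0.9),
    Candidate(316, "noun", None, relation_strength=0.8),
    Candidate(329, "verb", True, frozenset({"slang"}), relation_strength=0.5),
    Candidate(331, "verb", False, relation_strength=0.2),
]

# The uncharacterized occurrence of run is an intransitive verb.
chosen = select_sense("verb", False, candidates)
```

With these inputs the noun sense 316 and the transitive slang sense 329 are rejected, and sense 115 is preferred over sense 331 because of its stronger assumed relationship with flow.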
