Method and system for reducing lexical ambiguity

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C704S002000

active

06721697

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to language translation systems. More particularly, the present invention relates to a method for reducing lexical ambiguity.
2. Background Information
Multinational business dealings continue to grow as the global economy brings together business people of all nationalities, and travel between countries is easier and more frequent than ever. As a result, there is a compelling need for a machine-aided interpersonal communication system that provides accurate, near real-time language translation in spoken or written form. Such a system would relieve users of the need to possess specialized linguistic or translational knowledge.
A typical language translation system functions by using natural language processing. Natural language processing is generally concerned with the attempt to recognize a large pattern or sentence by decomposing it into small subpatterns according to linguistic rules. A natural language processing system uses considerable knowledge about the structure of the language, including what the words are, how words combine to form sentences, what the words mean, and how word meanings contribute to sentence meanings. However, linguistic behavior cannot be completely accounted for without also taking into account another aspect of what makes humans intelligent—their general world knowledge and their reasoning abilities. For example, to answer questions, to participate in a conversation, or to create and understand written language, a person not only must have knowledge about the structure of the language being used, but also must know about the world in general and the conversational setting in particular. Specifically, phonetic and phonological knowledge concerns how words are related to sounds that realize them. Morphological knowledge concerns how words are constructed from more basic units called morphemes. Syntactic knowledge concerns how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases. Typical syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are subparts of other phrases. This syntactic information is often presented in a tree form. Semantic knowledge concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning—the meaning a sentence has regardless of the context in which it is used. 
The representation of the context-independent meaning of a sentence is called its logical form. The logical form encodes possible word senses and identifies the semantic relationships between the words and phrases.
Natural language processing systems further include interpretation processes that map from one representation to another. For instance, the process that maps a sentence to its syntactic structure and logical form is called parsing, and it is performed by a component called a parser. The parser uses knowledge about words and word meanings (the lexicon) and a set of rules defining the legal structures (the grammar) to assign a syntactic structure and a logical form to an input sentence.
Formally, a context-free grammar of a language is a four-tuple comprising a nonterminal vocabulary, a terminal vocabulary, a finite set of production rules, and a starting symbol for all productions. The nonterminal and terminal vocabularies are disjoint. The set of terminal symbols is called the vocabulary of the language. Pragmatic knowledge concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
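The four-tuple definition of a context-free grammar given above can be written down directly as a data structure. The following is a minimal sketch, not part of the patent text: the class name Grammar, the field names, and the toy English rules are all assumptions chosen for illustration. It checks only the constraints stated above (disjoint vocabularies, a start symbol, a finite rule set).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar as a four-tuple (hypothetical representation)."""
    nonterminals: frozenset   # nonterminal vocabulary
    terminals: frozenset      # terminal vocabulary (the vocabulary of the language)
    productions: tuple        # finite set of rules, each (lhs, (rhs symbols...))
    start: str                # starting symbol

    def __post_init__(self):
        # The two vocabularies must be disjoint.
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminal and terminal vocabularies overlap")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # Context-free: every left-hand side is a single nonterminal.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for symbol in rhs:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r}")


# Toy grammar (an assumption) that covers "the table is ready".
toy_grammar = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V", "Adj"}),
    terminals=frozenset({"the", "table", "is", "ready"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "Adj")),
        ("Det", ("the",)),
        ("N", ("table",)),
        ("V", ("is",)),
        ("Adj", ("ready",)),
    ),
    start="S",
)
```

Constructing a Grammar whose vocabularies share a symbol raises ValueError, which mirrors the disjointness requirement in the definition.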
A natural language processor receives an input sentence, lexically separates the words in the sentence, syntactically determines the types of words, semantically understands the words, pragmatically determines the type of response to generate, and generates the response. The natural language processor employs many types of knowledge and stores different types of knowledge in different knowledge structures that separate the knowledge into organized types.
Lexical ambiguity in input sentences increases the complexity of natural language processing. Lexical ambiguity may arise because a particular word has more than one meaning. For example, the word bank can denote either a place where monetary exchange and handling take place or the land close to a river (the bank of the river). A word or a small group of words may also have two or more related meanings. For example, the adjective bright may be used as a synonym for “shining” (e.g., “The stars are bright tonight”) or as a synonym for “smart” (e.g., “She must be very bright if she made an ‘A’ on the test”). In the field of spoken language translation, the problem is compounded by words that are spelled differently but are pronounced the same and have different meanings. For example, the words night and knight are pronounced exactly the same although they are spelled differently, and they have very different meanings.
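The two kinds of ambiguity described above (one spelling with several senses, and one pronunciation with several spellings) can be illustrated with a small lookup. This is only a sketch: the sense labels and the informal pronunciation keys such as "nite" are invented for the example and do not follow any phonetic standard.

```python
# Hypothetical sense inventory: spelling -> possible senses.
SENSES = {
    "bank": ["financial institution", "land alongside a river"],
    "bright": ["shining", "smart"],
    "night": ["period of darkness"],
    "knight": ["mounted soldier"],
}

# Hypothetical, informal pronunciation keys (not a real phonetic alphabet).
PRONUNCIATION = {
    "bank": "bank",
    "bright": "brite",
    "night": "nite",
    "knight": "nite",
}


def candidates_for_text(word):
    """All (spelling, sense) readings of a written word."""
    return [(word, sense) for sense in SENSES.get(word, [])]


def candidates_for_speech(sound):
    """All (spelling, sense) readings of a spoken form.

    Every spelling that shares the pronunciation contributes its senses,
    so spoken input can be more ambiguous than written input.
    """
    readings = []
    for word, key in PRONUNCIATION.items():
        if key == sound:
            readings.extend(candidates_for_text(word))
    return readings


print(candidates_for_text("bank"))     # two senses for one spelling
print(candidates_for_speech("nite"))   # two spellings for one sound
```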
Factors causing lexical ambiguity vary from one language to another. In character-based languages such as Japanese, extracting information from an input sentence is a serious problem because Japanese sentences do not have spaces between words. Part-of-speech (POS) tags are another factor causing lexical ambiguity. In many languages, including both word-based and character-based natural languages, one word may have more than one POS tag depending on its context within the sentence. The word table, for example, can be a verb in some contexts (e.g., “He will table the motion”) and a noun in others (e.g., “The table is ready”). The existence of multiword expressions in many languages, including English, is yet another factor contributing to lexical ambiguity. That is, depending on the context, a group of words, such as “white house”, can be treated as a multiword expression (e.g., “I want to visit the White House”) or as separate words (e.g., “He lives in a white house across the street”).
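The POS and multiword ambiguities described above can be made concrete with a toy lexicon. The sketch below is an illustration only: the tag names, the dictionary contents, and the function names are assumptions, not anything defined in the source.

```python
# Hypothetical POS lexicon: word -> set of possible tags.
POS_LEXICON = {
    "he": {"PRON"},
    "lives": {"VERB", "NOUN"},
    "in": {"PREP"},
    "a": {"DET"},
    "the": {"DET"},
    "table": {"NOUN", "VERB"},
    "white": {"ADJ"},
    "house": {"NOUN", "VERB"},
}

# Hypothetical multiword lexicon: word sequence -> tag for the whole unit.
MULTIWORD_LEXICON = {("white", "house"): "PROPN"}


def candidate_tags(word):
    """Every tag the lexicon allows for a word, ignoring context."""
    return sorted(POS_LEXICON.get(word.lower(), {"UNKNOWN"}))


def multiword_candidates(tokens):
    """Spans (start, end, tag) where a multiword reading is possible.

    The lexicon alone cannot decide whether the span should be read as one
    unit or as separate words; it only reports that both readings exist.
    """
    lowered = [token.lower() for token in tokens]
    found = []
    for expression, tag in MULTIWORD_LEXICON.items():
        width = len(expression)
        for start in range(len(lowered) - width + 1):
            if tuple(lowered[start:start + width]) == expression:
                found.append((start, start + width, tag))
    return found


tokens = "He lives in a white house".split()
print(candidate_tags("table"))        # ['NOUN', 'VERB']
print(multiword_candidates(tokens))   # [(4, 6, 'PROPN')]
```

Note that the lookup reports a multiword candidate for “a white house” as well as for “the White House”; resolving which reading applies requires context beyond the lexicon.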
One current approach to lexical ambiguity in a Japanese input sentence treats each Japanese character as a word and lets the parser group the characters using the parsing grammar. After the parser defines the words, it must try all POS tags found for each word and rule out the impossible tags. As a result, parsing is time-consuming and requires a large amount of space. If a long or complicated sentence is involved, such a parser may not be able to perform the parsing at all.
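To see why trying every POS tag for every word becomes expensive, it helps to count the tag assignments involved. The sketch below is a simplification and an assumption: a real parser would not necessarily enumerate whole sequences up front, so the count is an upper bound on the combinations an unpruned search might face. The function names and tag lists are hypothetical.

```python
from itertools import product
from math import prod


def count_tag_sequences(candidates):
    """Number of distinct tag assignments for a sentence.

    `candidates` holds one list of possible tags per word (or per character,
    when each character is treated as a word).
    """
    return prod(len(tags) for tags in candidates)


def all_tag_sequences(candidates):
    """Materialize every assignment; only feasible for short inputs."""
    return list(product(*candidates))


# "He will table the motion" with hypothetical tag candidates.
example = [["PRON"], ["AUX", "NOUN"], ["NOUN", "VERB"], ["DET"], ["NOUN"]]
print(count_tag_sequences(example))   # 4
for sequence in all_tag_sequences(example):
    print(sequence)

# Growth is multiplicative: 20 units with 3 candidate tags each.
print(count_tag_sequences([["A", "B", "C"]] * 20))   # 3486784401
```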
Another current approach to deal with lexical ambiguity recognizes all the possible words in a Japanese sentence and then finds possible connections between adjacent words. The recognition of all the words is done using a morpheme dictionary. The morpheme dictionary defines Japanese morphemes with the names of POS tags. The connectivity is defined using a connection-pair grammar. The connection-pair grammar defines pairs of sets of morphemes that may occur adjacently in a sentence. Various costs are then applied to the morphemes to compare all possible segmentations of the input sentence. These various costs correspond to the likelihood of observing a word as a certain part of speech and to the likelihood of observing two words in adjacent positions. In this approach, the segmentation that has the lowest corresponding cost is selected from all the possible segmentations of the input sentence for further processing. However, the segmentation selected based upon the lowest costs may not correspond to the correct meaning of the input sentence. Since the syntactic parser is better equipped to recognize the correct meaning of the input sentence, making a selection before the parsing operation may result in loss of pertinent information. Consequently, this approach may
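The cost-based approach described in the preceding paragraph can be sketched as a shortest-path search over all dictionary segmentations. The code below is an illustrative assumption, not the method of any particular system: it uses an unspaced English string in place of Japanese text, and every morpheme, tag, and cost value is invented. Word costs stand in for the likelihood of a word having a given part of speech, and connection costs for the likelihood of two tags being adjacent; a missing pair means the connection is not allowed.

```python
import math

# Hypothetical morpheme dictionary: surface form -> [(pos_tag, word_cost)].
MORPHEMES = {
    "the": [("DET", 1.0)],
    "table": [("NOUN", 2.0), ("VERB", 4.0)],
    "tab": [("NOUN", 3.0)],
    "le": [("NOUN", 6.0)],
    "is": [("VERB", 1.0)],
    "ready": [("ADJ", 2.0)],
    "read": [("VERB", 2.0)],
    "y": [("NOUN", 8.0)],
}

# Hypothetical connection-pair costs: (left_tag, right_tag) -> cost.
CONNECTION = {
    ("BOS", "DET"): 0.5, ("DET", "NOUN"): 0.5, ("DET", "VERB"): 5.0,
    ("NOUN", "VERB"): 1.0, ("NOUN", "NOUN"): 3.0,
    ("VERB", "ADJ"): 1.0, ("VERB", "NOUN"): 2.0, ("VERB", "VERB"): 4.0,
}


def best_segmentation(text):
    """Return (cost, [(surface, tag), ...]) for the cheapest segmentation."""
    # best[(position, tag)] = (cost so far, backpointer or None)
    best = {(0, "BOS"): (0.0, None)}
    for end in range(1, len(text) + 1):
        for start in range(end):
            surface = text[start:end]
            for tag, word_cost in MORPHEMES.get(surface, []):
                for (pos, prev_tag), (prev_cost, _) in list(best.items()):
                    if pos != start:
                        continue
                    link = CONNECTION.get((prev_tag, tag))
                    if link is None:      # pair not allowed to be adjacent
                        continue
                    cost = prev_cost + link + word_cost
                    if cost < best.get((end, tag), (math.inf, None))[0]:
                        best[(end, tag)] = (cost, (start, prev_tag, surface))
    finals = [(c, key) for key, (c, _) in best.items() if key[0] == len(text)]
    if not finals:
        return None                       # no segmentation covers the input
    cost, key = min(finals)
    path = []
    while best[key][1] is not None:       # follow backpointers to the start
        start, prev_tag, surface = best[key][1]
        path.append((surface, key[1]))
        key = (start, prev_tag)
    return cost, path[::-1]


print(best_segmentation("thetableisready"))
```

The function returns only the single cheapest path and discards the alternatives, which is the early commitment the paragraph above identifies as a weakness: whatever the lowest-cost segmentation is, the syntactic parser never sees the others.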
