Method of and system for disambiguating syntactic word...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C707S793000

active

06260008

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method of and a system for disambiguating syntactic word multiples, such as pairs of words, which may be collocates (i.e. words frequently used together). Such a method and system may be used in natural language processing (NLP), for instance for assisting in machine translation between languages.
2. Description of the Related Art
The term “disambiguating” when applied to a word (or group of words) means clarifying the meaning of the word (or group) with reference to the context in which the word (or group) occurs. For example, the verb “fire” can be used to describe a shooting act, such as “fire pistol”, or the act of dismissal from employment, such as “fire employee”. Disambiguating the verb “fire” in “fire pistol” would comprise clarifying that the verb is used in the “shooting” sense.
It is known in NLP systems to provide syntactic analysis whereby a “parser” analyses input or stored text into the different “parts of speech”. However, in natural languages, the same word with the same spelling and part of speech can have different meanings according to the context in which it occurs. For example, as described above, the verb “fire” describes a shooting act in the context of “fire pistol” and an act of dismissal from employment in the context of “fire employee”. In such cases, syntactic analysis as available through conventional parsers is unable to clarify the meaning of words in context. There is therefore a need for “word disambiguation” in order to complement syntactic analysis in NLP systems.
A first step in word disambiguation may be performed by clustering word senses in terms of semantic similarity. Word senses may, for instance, be found in electronic dictionaries and thesauri. Semantic similarity can be assessed from electronic thesauri where synonymic links are disambiguated i.e. each synonymic link relates specific word senses.
Words are said to be “semantically similar” or “semantically congruent” if their meanings are sufficiently close. Closeness in meaning may be established in terms of equivalence or compatibility of usage. For example, the words “gun” and “pistol” are very close in meaning because they can be used to describe the same object. Similarly, the words “ale” and “beer” are very close in meaning because the first is a specific instance of the second, i.e. “ale” is a type of “beer”. The notion of semantic similarity can also be used in a relative sense to express the degree to which words are close in meaning. In this case, there may be no equivalence or compatibility in meaning. For example, there is a clear sense in which “doctor” and “nurse” are closer in meaning than “doctor” and “computer”. Although “doctor” and “nurse” are neither equivalent nor compatible in meaning because they describe distinct professions, they both refer to a person who is trained to attend sick people. The words “doctor” and “computer” share little else beyond the fact that both refer to concrete entities. In this case, the specificity of the shared concept (for instance “person who is trained to attend sick people” versus “concrete entity”) determines the relative semantic similarity between words.
There are several known techniques for finding semantic similarity between two words using machine readable thesauri. Examples of such techniques are disclosed in European Patent Application No. 91480001.6 and in papers published by Resnik in 1995 entitled “Using Information Content to Evaluate Semantic Similarity in a Taxonomy” (IJCAI-95) and “Disambiguating Noun Groupings with Respect to WordNet Senses” (third workshop on very large corpora, Association for Computational Linguistics). The techniques disclosed by Resnik make use of the WordNet Lexical Database disclosed by Beckwith et al, 1991, “WordNet: A Lexical Database Organised on Psycholinguistic Principles” in Lexical Acquisition, LEA, Hillsdale, N.J.
Resnik defines the semantic similarity between two words as the “entropy” value of the most informative concept subsuming or implied by the two words. This assessment is performed with reference to a lexical database such as WordNet mentioned hereinbefore, where word senses are hierarchically arranged in terms of subsumption links. For example, all senses of the nouns “clerk” and “salesperson” in WordNet are connected to the first sense of the nouns “employee”, “worker”, “person” so as to indicate that “clerk” and “salesperson” are a kind of “employee” which is a kind of “worker” which in turn is a kind of “person”. In this case, the semantic similarity between the words “clerk” and “salesperson” would correspond to the entropy value of “employee”, which is the most informative (i.e. most specific) concept shared by the two words.
The informative content (or Entropy) of a concept c (such as a set of synonymic words such as fire, dismiss, terminate, sack) is formally defined as:
−log p(c)
where p is the probability of c. The probability of c is obtained for each choice of text sample or collection K by dividing the frequency of c in K by the total number W of words which occur in K and which have the same part of speech as the word senses in c. This may be expressed as:
$$p(c_{pos}) = \frac{\mathrm{freq}(c_{pos})}{W_{pos}}$$
where pos specifies the same part of speech. The frequency of a concept is calculated by counting the occurrences of all words which are an instance of (i.e. subsumed by) the concept: every time that a word w is encountered in K, the count of all concepts subsuming w is increased by one. This may be expressed as:
$$\mathrm{freq}(c) = \sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)$$
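The counting rule and the definition of −log p(c) given above can be illustrated with a minimal sketch. The toy hierarchy, the sample of words and the function names below are assumptions made purely for illustration; they are not taken from WordNet or from any real corpus, and each word is treated as having a single sense for simplicity.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: each concept maps to its parent (None at the root).
PARENT = {
    "person": None,
    "worker": "person",
    "employee": "worker",
    "clerk": "employee",
    "salesperson": "employee",
}

def subsumers(concept):
    """Return the concept itself and every concept above it in the hierarchy."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def concept_frequencies(tokens):
    """Each time a word is encountered, add one to every concept subsuming it."""
    freq = Counter()
    for word in tokens:
        if word in PARENT:
            for c in subsumers(word):
                freq[c] += 1
    return freq

def information_content(concept, freq, total):
    """Compute -log p(c), with p(c) = freq(c) / total."""
    return -math.log(freq[concept] / total)

# Made-up sample K containing nouns only, so W_pos is simply its length.
sample = ["clerk", "clerk", "salesperson", "worker",
          "person", "person", "person", "person"]
freq = concept_frequencies(sample)
total = len(sample)
```

In this sketch the root concept "person" subsumes every word in the sample, so its probability is 1 and its informative content is zero, whereas the more specific concept "employee" receives a higher value.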
The semantic similarity between two words W1 and W2 is expressed as the entropy value of the most informative concept c which subsumes both W1 and W2:
$$\mathrm{sim}(W_1, W_2) = \max_{c \in \{x \mid \mathrm{subsumes}(x, W_1) \land \mathrm{subsumes}(x, W_2)\}} [-\log p(c)]$$
The specific senses of W1, W2 under which semantic similarity holds can be determined with respect to the subsumption relation linking c with W1, W2. For instance, if it is found that, in calculating the semantic similarity of the two verbs “fire” and “dismiss” using the WordNet lexical database, the most informative subsuming concept is represented by the synonym set containing the word sense remove_v_2, then it is known that the senses for “fire” and “dismiss” under which the similarity holds are fire_v_4 and dismiss_v_4 because these belong to the only set of word senses which are subsumed by remove_v_2 in the WordNet hierarchy.
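A minimal sketch of this similarity measure follows, using the “doctor”, “nurse” and “computer” example discussed earlier. The hierarchy, the concept names and the probabilities are invented for illustration and are not WordNet data; the sketch also assumes one sense per word and treats a concept as subsuming itself. Returning the subsuming concept alongside the score mirrors the way the text uses that concept to identify the relevant senses.

```python
import math

# Hypothetical hierarchy (child -> parent) with made-up concept probabilities.
PARENT = {
    "entity": None,
    "health_professional": "entity",
    "doctor": "health_professional",
    "nurse": "health_professional",
    "computer": "entity",
}
P = {
    "entity": 1.0,
    "health_professional": 0.05,
    "doctor": 0.02,
    "nurse": 0.02,
    "computer": 0.03,
}

def subsumers(concept):
    """Return the set holding the concept and all of its ancestors."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = PARENT[concept]
    return result

def sim(w1, w2):
    """Return (score, concept) for the most informative shared subsumer."""
    shared = subsumers(w1) & subsumers(w2)
    best = max(shared, key=lambda c: -math.log(P[c]))
    return -math.log(P[best]), best

score_dn, concept_dn = sim("doctor", "nurse")
score_dc, concept_dc = sim("doctor", "computer")
```

Under these assumed figures, “doctor” and “nurse” meet at the specific concept "health_professional", while “doctor” and “computer” meet only at the root, so the first pair scores higher.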
SUMMARY OF THE INVENTION
According to a first aspect of the invention, there is provided a method of disambiguating first and second words occurring in a first predetermined syntactic relationship, comprising the steps of:
(a) forming a plurality of first sets, each of which comprises: a first subset containing a plurality of senses of the first word; and a second subset containing a plurality of first word senses which are capable of being in the first predetermined syntactic relationship with the first word and which have semantically similar senses,
(b) forming a plurality of second sets, each of which comprises: a third subset containing a plurality of second word senses which are capable of being in the first predetermined syntactic relationship with the second word and which have semantically similar senses; and a fourth subset containing a plurality of senses of the second word, and
(c) selecting an output set comprising each sense of the first word and each sense of the second word which senses occur together in at least one of the first sets and in at least one of the second sets.
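One possible reading of selection step (c) can be sketched as follows. The representation of each set as a pair of subsets, the function names and most of the sense labels (for instance employee_n_1 and pistol_n_1) are hypothetical and chosen only for illustration; the sketch assumes that two senses “occur together” in a set when one lies in each of its two subsets, which is an interpretation rather than a statement of the claimed method.

```python
def pairs_in(sets_):
    """All (left sense, right sense) pairs that occur together in some set."""
    return {(a, b) for left, right in sets_ for a in left for b in right}

def select_output(first_sets, second_sets):
    """Keep the sense pairs found in at least one first set and one second set."""
    return pairs_in(first_sets) & pairs_in(second_sets)

# First sets: (senses of the first word, similar senses of words it relates to).
first_sets = [
    ({"fire_v_1"}, {"gun_n_1", "pistol_n_1"}),
    ({"fire_v_4"}, {"clerk_n_1", "employee_n_1"}),
]

# Second sets: (similar senses of words relating to the second word,
#               senses of the second word).
second_sets = [
    ({"fire_v_4", "dismiss_v_4"}, {"employee_n_1"}),
]

output = select_output(first_sets, second_sets)
```

With this made-up data for the word pair “fire” and “employee”, the only pair surviving the intersection is the dismissal sense of the verb together with employee_n_1; the shooting sense is discarded because it never co-occurs with a sense of “employee”.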
According to a second aspect of the invention, there is provided a system for disambiguating first and second words occurring in a first predetermined syntactic relationship, the system comprising a data processor programmed to perform the steps of
(a) forming a plurality of first sets, each of which comprises: a first subset containing a plurality of senses of the first word; and a second
