Thesaurus retrieval and synthesis system

Data processing: speech signal processing – linguistics – language – Linguistics – Dictionary building – modification – or prioritization

Reexamination Certificate

Details

C707S793000

Reexamination Certificate

active

06282509

ABSTRACT:

BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a system which can generate thesauruses as required by various operations to meet users' purposes.
Conventionally, text retrieval and automatic classification have been actively researched as part of natural language processing techniques. In such fields, the so-called knowledge base is increasingly important, particularly a thesaurus, in which relations among concepts are defined hierarchically, primarily on the basis of rank relations.
The knowledge processing approaches proposed so far are broadly divided into a rule-based approach and an example-based approach.
The rule-based approach performs knowledge processing by mapping the real world into a combination of defined rules. In a production system, which is a typical rule-based system, problems are solved by an inference engine referencing a rule base. This approach is characterized by the modularity and uniformity of rules and by natural knowledge representation. On the other hand, rules are difficult to create and maintain, both because handling exceptions increases the number of rules and so reduces efficiency, and because creating rules is intrinsically difficult.
On the other hand, the example-based approach constructs a knowledge base based on expressions and cases that actually exist.
Methods for the example-based approach are further broadly divided into a general-purpose method and a specific-use method.
The general-purpose method aims to describe world knowledge based on actual examples, as typified by the dictionary construction approach of the Japan Electronic Dictionary Research Institute (EDR). Because this method does not depend on specific syntax rules for dictionary description and the like, it has the advantage that non-specialists can, given sufficient manpower, construct a knowledge base holding a large volume of knowledge, while ensuring, as a rule base does, uniformity and accuracy for concepts that have actual examples. However, because each concept is generally polysemous, accuracy cannot be guaranteed, and reality is difficult to reflect in detail unless the situation is the same as the one in which the actual example was used. Also, since the knowledge base becomes very large, containing hundreds of thousands of concepts or more, individual pieces of data are easy to input but maintaining the entire knowledge base is very difficult.
As inventions relating to such thesaurus construction and generation, there are proposed “Knowledge Structure Creating Method” described in Japanese Published Unexamined Patent Application No. Hei 4-237332, “Thesaurus Generating Device” described in Japanese Published Unexamined Patent Application No. Hei 4-39769, and “Data Classification Device/Method, Data Classification Tree Generating Device/Method, Derivative Extracting Device/Method, Thesaurus Constructing Device/Method, and Data Processing Device” described in Japanese Published Unexamined Patent Application No. Hei 8-16620.
However, all of these inventions merely automate construction by introducing analysis techniques and only enjoy the merits of the example-based approach, providing no solution for the above-mentioned problem.
“Thesaurus Automatic Reorganization Device” described in Japanese Published Unexamined Patent Application No. Hei 3-276369, which describes the reconstruction of a thesaurus, assumes thesauruses intended for specific uses as described below, such as program parts and electronic parts, and provides no solution for the problem of polysemy in a general-purpose large-scale thesaurus.
A specific-use method constructs and uses a small-scale knowledge base tailored to the use. Because of its specific-use character, this method has the advantage of easy maintenance while reflecting reality. However, applying the knowledge base to another use, or reusing it after several modifications, requires as much time and effort as constructing a knowledge base from the beginning, because the situations in which it can be used are limited and its existence is unknown to others.
As an invention relating to this use, there is proposed “Field-Classified Thesaurus Generating Device” described in Japanese Published Unexamined Patent Application No. Hei 9-6789. The invention relates to a technique which constructs field-classified thesauruses from general-purpose thesauruses by using queries that define specific fields to remove inadequate words not relating to those fields. However, the invention provides no solution for the problem of the difficulty of constructing general-purpose thesauruses, and further provides no effective means for reuse.
As described above, it has so far been difficult to construct, from existing thesauruses, a new thesaurus that meets a user's purposes and uses.
SUMMARY OF THE INVENTION
The present invention has been made in consideration of the above-described conventional circumstances and its object is to generate a new thesaurus from existing thesauruses by operations in order to facilitate thesaurus expansion and maintenance.
Another object of the present invention is to generate a new thesaurus that meets a user's purposes and uses from a plurality of thesauruses distributed on a network in order to facilitate thesaurus expansion and maintenance.
To achieve the above-mentioned objects, the present invention defines basic operations to synthesize thesauruses meeting the objects and generates a new thesaurus by repeating operations on pages making up the thesauruses.
The structure of a thesaurus relating to the present invention is as shown in FIG. 1. The thesaurus consists of pages P, P1, P2, P3, and so on, each of which has nodes N, N1, N2, N3, and so on, and arcs A, A1, A2, A3, and so on indicating a lower relation with a corresponding node, respectively, and is constituted by relating the pages. The nodes N, N1, N2, N3, and so on are each provided with a concept and assigned a unique identifier ID, and the arcs A, A1, A2, A3, and so on each contain the identifier ID of a node related subordinate thereto. Therefore, individual nodes (i.e., pages) in the thesaurus can be located by an identifier ID.
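The page structure described above might be sketched as follows. This is only an illustrative model, not the patented implementation: the class names `Page` and `Thesaurus`, the method names, and the sample concepts are all assumptions introduced here. Each page is modeled as one node (a concept plus a unique identifier) together with arcs that hold the identifiers of subordinate nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    """One page: a node (concept + unique ID) plus arcs to subordinate nodes.

    Hypothetical model; the field names are not taken from the patent.
    """
    node_id: str                                    # unique identifier ID of the node
    concept: str                                    # concept the node is provided with
    arcs: List[str] = field(default_factory=list)   # IDs of subordinate nodes


class Thesaurus:
    """Pages indexed by identifier, so any node can be located by its ID."""

    def __init__(self) -> None:
        self.pages: Dict[str, Page] = {}

    def add(self, page: Page) -> None:
        # Identifiers must stay unique within one thesaurus.
        if page.node_id in self.pages:
            raise ValueError(f"duplicate identifier: {page.node_id}")
        self.pages[page.node_id] = page

    def locate(self, node_id: str) -> Page:
        return self.pages[node_id]

    def lower_nodes(self, node_id: str) -> List[Page]:
        # Follow each arc of the page to the page of the subordinate node.
        return [self.pages[arc] for arc in self.pages[node_id].arcs]


# Sample data (invented for illustration).
thesaurus = Thesaurus()
thesaurus.add(Page("N", "animal", arcs=["N1", "N2"]))
thesaurus.add(Page("N1", "mammal", arcs=["N3"]))
thesaurus.add(Page("N2", "bird"))
thesaurus.add(Page("N3", "dog"))
```

With this layout, following an arc is a single dictionary lookup, which corresponds to the statement that individual nodes can be located by an identifier ID.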
The present invention performs addition, product, Cartesian product, transposition, and other operations on thesauruses of the above structure in response to requests from a user, thereby generating a new thesaurus satisfying the objects.
For example, in a product operation, a matching concept is detected from lower nodes corresponding to arc identifiers among a plurality of inputted pages, a second page having the node of the matching concept is generated, and the second page is related with the arc of a new page, whereby a new thesaurus with the new page as the top node is generated.
As an embodiment of a product operation, a matching concept is detected from lower nodes corresponding to arc identifiers among a plurality of inputted pages, a second page having the node of a concept of a predetermined match level or higher is generated, and the second page is related with the arc of a new page, whereby a new thesaurus with the new page as the top node is generated.
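The product operation and its match-level embodiment could be sketched roughly as below. Several points are assumptions made for illustration and are not stated in the text: pages are stored as plain dictionaries, the "match level" of a concept is taken to be the fraction of input pages whose lower nodes carry that concept, new identifiers are generated as `new0`, `new1`, and so on, and the subtrees below the matched nodes are not copied. Under these assumptions, `min_level=1.0` corresponds to the plain product (the concept must match in every input page).

```python
from itertools import count


def node(concept, *arcs):
    """Hypothetical page record: a concept plus the IDs of subordinate nodes."""
    return {"concept": concept, "arcs": list(arcs)}


def product(thesaurus, page_ids, new_concept, min_level=1.0):
    """Build a new thesaurus whose top page links to the matching concepts.

    The match level is ASSUMED here to be the share of input pages whose
    lower nodes contain the concept; the source does not define it.
    """
    # Concepts of the lower nodes reached through each input page's arcs.
    lower = [
        {thesaurus[arc]["concept"] for arc in thesaurus[pid]["arcs"]}
        for pid in page_ids
    ]
    new_ids = (f"new{i}" for i in count())
    top_id = next(new_ids)
    result = {top_id: node(new_concept)}
    for concept in sorted(set().union(*lower)):
        level = sum(concept in concepts for concepts in lower) / len(lower)
        if level >= min_level:
            # Generate a second page for the concept and relate it to an
            # arc of the new top page.
            second_id = next(new_ids)
            result[second_id] = node(concept)
            result[top_id]["arcs"].append(second_id)
    return top_id, result


def concepts_under(top_id, result):
    return [result[arc]["concept"] for arc in result[top_id]["arcs"]]


# Sample data (invented for illustration).
T = {
    "pet": node("pet", "p1", "p2", "p3"),
    "mammal": node("mammal", "m1", "m2", "m3"),
    "hunter": node("hunter", "h1", "h2"),
    "p1": node("dog"), "p2": node("cat"), "p3": node("goldfish"),
    "m1": node("dog"), "m2": node("cat"), "m3": node("whale"),
    "h1": node("cat"), "h2": node("shark"),
}

inputs = ["pet", "mammal", "hunter"]
strict = concepts_under(*product(T, inputs, "pet*mammal*hunter"))
relaxed = concepts_under(*product(T, inputs, "pet*mammal*hunter", min_level=0.6))
```

In this sample, `strict` keeps only "cat", which appears under all three input pages, while `relaxed` also keeps "dog", which appears under two of the three.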
In a Cartesian product operation, the arcs of a new page are generated with new identifiers assigned to arcs thereof by combining the arcs of a plurality of inputted pages, second pages generated based on lower nodes corresponding to the original arcs subjected to the combination processing are related with the newly generated arcs, and the nodes of concepts common among lower nodes of the original nodes subjected to the combination processing are related with the arcs of the second pages, whereby a new thesaurus with the new page as the top node is generated.
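One possible reading of the Cartesian product operation is sketched below. It is an interpretation under stated assumptions rather than the patented procedure: pages are plain dictionaries, each combination of original arcs yields one second page labeled by joining the original concepts with " x ", and new identifiers are generated as `new0`, `new1`, and so on. All names and sample data are hypothetical.

```python
from itertools import count, product as combinations_of


def node(concept, *arcs):
    """Hypothetical page record: a concept plus the IDs of subordinate nodes."""
    return {"concept": concept, "arcs": list(arcs)}


def cartesian_product(thesaurus, page_ids, new_concept):
    """Combine the arcs of the input pages into a new three-level thesaurus."""
    if not page_ids:
        raise ValueError("at least one input page is required")
    new_ids = (f"new{i}" for i in count())
    top_id = next(new_ids)
    result = {top_id: node(new_concept)}
    arc_lists = [thesaurus[pid]["arcs"] for pid in page_ids]
    for combo in combinations_of(*arc_lists):
        # One newly identified arc and second page per combination of arcs.
        second_id = next(new_ids)
        label = " x ".join(thesaurus[arc]["concept"] for arc in combo)
        result[second_id] = node(label)
        result[top_id]["arcs"].append(second_id)
        # Concepts common among the lower nodes of the combined nodes.
        lower = [
            {thesaurus[child]["concept"] for child in thesaurus[arc]["arcs"]}
            for arc in combo
        ]
        for concept in sorted(set.intersection(*lower)):
            leaf_id = next(new_ids)
            result[leaf_id] = node(concept)
            result[second_id]["arcs"].append(leaf_id)
    return top_id, result


# Sample data (invented for illustration).
T = {
    "color": node("color", "red", "green"),
    "shape": node("shape", "round", "square"),
    "red": node("red", "r1", "r2"),
    "green": node("green", "g1"),
    "round": node("round", "o1", "o2", "o3"),
    "square": node("square", "s1", "s2"),
    "r1": node("apple"), "r2": node("brick"), "g1": node("lime"),
    "o1": node("apple"), "o2": node("lime"), "o3": node("ball"),
    "s1": node("brick"), "s2": node("box"),
}

top_id, result = cartesian_product(T, ["color", "shape"], "color x shape")
second_pages = [result[arc] for arc in result[top_id]["arcs"]]
```

Here the new top page has one arc per combination (for example "red x round"), and each second page lists only the concepts found under both of the combined original nodes.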
In a transposition operation, when a target node provided with a target concept is specified, a new thesaurus with the target node as the top node is generated by performing the operation so as to relate a node related
