Retrieval system of secondary data added documents in...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06697798

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a retrieval system, and a program for the system, having an interface that allows intuitive retrieval of secondary data added sentences stored in a database. The present invention further relates to a database suited to high-speed retrieval of secondary data added sentences through an intuitive interface.
More particularly, the present invention relates to a retrieval system having an intuitively searchable interface for an annotated corpus, which is one example of secondary data added documents.
BACKGROUND
A sub-string search technique (also known as a full-text search technique) is useful for retrieving sentences, which serve as carriers of information, from a collection of texts such as newspaper articles or patent specifications. The same technique is applied to the retrieval of HTML documents on the Internet. With this technique, only the character strings in the text displayed by a Web browser are searched, and the other parts of the HTML documents are ignored.
Although this technique allows plural key words to be used in one search and checks whether each key word matches a character string in a document, it does not consider the word order of the key words within a sentence, because the key words in such a query are related only by a simple conjunction.
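To make this limitation concrete, the following Python sketch contrasts a conjunctive key word test, which only checks that every key word occurs somewhere in a sentence, with an order-sensitive test. The function names are hypothetical, and matching is plain substring matching rather than the behavior of any particular search engine.

```python
def conjunctive_match(sentence: str, keywords: list[str]) -> bool:
    """Simple conjunction: every key word occurs somewhere; order is ignored."""
    return all(keyword in sentence for keyword in keywords)


def ordered_match(sentence: str, keywords: list[str]) -> bool:
    """Key words must occur in the given order (not necessarily adjacent)."""
    position = 0
    for keyword in keywords:
        index = sentence.find(keyword, position)
        if index < 0:
            return False
        # Continue searching after the end of this match.
        position = index + len(keyword)
    return True


if __name__ == "__main__":
    keywords = ["pretty", "woman"]
    for s in ["a pretty woman walked in", "the woman was pretty"]:
        print(s, conjunctive_match(s, keywords), ordered_match(s, keywords))
```

Both sentences pass the conjunctive test, but only the first passes the ordered test.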
In this specification, “a document” is a collection of sentences delimited by periods and the like, expressing organized character information such as a newspaper article. Each element of “a document” that ends with a period or the like is “a sentence”.
In a standard data interchange format such as Standard Generalized Markup Language (SGML), secondary data can be added to each related sentence as attributes in a tag. Secondary data added sentences in such a format have two advantages: various types of information can be included in a tag, and data interchange is easy because such sentences are essentially written in text format.
An annotated corpus, which applies these advantages to a corpus by holding not only sentences but also secondary data relating to each sentence, is currently attracting attention.
A corpus is generally a large computerized collection of linguistic data drawn from various documents such as newspaper articles or screenplays, intended to support language description or language analysis. In other words, a corpus is a large collection of examples of everyday usage in the form of electronic character data. In many cases, a corpus is searched with GREP, and the retrieval result is displayed on screen in Key Word in Context (KWIC) format. A corpus offers a convenient way to perform collocation searches that clarify how language expressions are used in practice, and it is useful for natural language description or language analysis. A corpus is a collection of documents, and a document is a collection of sentences.
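A KWIC display can be sketched in a few lines of Python. This is a minimal illustration, assuming a regular-expression match in the spirit of GREP and a fixed context width; the function name `kwic` and the `width` parameter are hypothetical.

```python
import re


def kwic(lines: list[str], pattern: str, width: int = 20) -> list[str]:
    """Return one Key Word in Context row per match of `pattern`.

    The left context is right-aligned to `width` characters so that every
    matched key word starts in the same column, which makes a KWIC listing
    easy to scan for collocations.
    """
    regex = re.compile(pattern)
    rows = []
    for line in lines:
        for match in regex.finditer(line):
            left = line[max(0, match.start() - width):match.start()]
            right = line[match.end():match.end() + width]
            rows.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return rows


if __name__ == "__main__":
    corpus = [
        "She is a pretty woman with a kind smile.",
        "It was a pretty good result for the team.",
    ]
    for row in kwic(corpus, r"\bpretty\b", width=15):
        print(row)
```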
Full-text search is widely used as a search method for a corpus. For example, Unexamined Patent Publication (Kokai) No.8-137898 discloses a document retrieval system that expands a key word input by a user into related key words by referring to a concept dictionary, and searches a corpus with the related key words so as to improve search accuracy.
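The general idea of key word expansion can be sketched as follows. This is not the method of the cited publication; the concept dictionary contents and the function names are hypothetical, and matching is plain substring matching.

```python
# Hypothetical concept dictionary: key word -> related key words.
CONCEPT_DICTIONARY: dict[str, list[str]] = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


def expand_keyword(keyword: str, concepts: dict[str, list[str]]) -> list[str]:
    """Return the key word followed by its related key words, if any."""
    return [keyword] + concepts.get(keyword, [])


def search_corpus(corpus: list[str], keyword: str,
                  concepts: dict[str, list[str]]) -> list[str]:
    """Return sentences containing the key word or any related key word."""
    terms = expand_keyword(keyword, concepts)
    return [sentence for sentence in corpus
            if any(term in sentence for term in terms)]


if __name__ == "__main__":
    corpus = ["The automobile stopped.", "A car passed by.", "The dog barked."]
    print(search_corpus(corpus, "car", CONCEPT_DICTIONARY))
```

Without expansion, only the second sentence would be found; with it, the first sentence is also retrieved.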
The annotated corpus mentioned above is a corpus in which secondary data, such as a part of speech and a lemma, are added as attributes in tagged form to each syntactic unit of a document, such as a word, a phrase or a chapter. For data input efficiency, most annotated corpuses in use adopt a format that adds secondary data to each word. The format of an annotated corpus is not limited to a tagged form; for example, a format in which items are simply divided by “/” can also be used. Annotated corpuses are widely used in language study and dictionary compilation. Known examples of annotated corpuses include The British National Corpus (BNC) and The Bank of English, each of which amounts to a few gigabytes.
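To illustrate that the same per-word annotation can be stored in either form, the following sketch reads a tagged sentence and a “/”-divided sentence into the same token list. The tag name `w`, the attribute names `pos` and `lemma`, and the word/part-of-speech/lemma field order are assumptions for illustration; they are not the actual markup of the BNC or The Bank of English.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str    # part of speech
    lemma: str


# Hypothetical tagged form: <w pos="..." lemma="...">word</w>
TAGGED_WORD = re.compile(r'<w pos="([^"]*)" lemma="([^"]*)">([^<]*)</w>')


def parse_tagged(sentence: str) -> list[Token]:
    """Read secondary data stored as tag attributes on each word."""
    return [Token(word=m.group(3), pos=m.group(1), lemma=m.group(2))
            for m in TAGGED_WORD.finditer(sentence)]


def parse_slashed(sentence: str) -> list[Token]:
    """Read secondary data stored as word/pos/lemma divided by "/"."""
    tokens = []
    for item in sentence.split():
        word, pos, lemma = item.split("/")
        tokens.append(Token(word, pos, lemma))
    return tokens


if __name__ == "__main__":
    tagged = ('<w pos="ADJ" lemma="pretty">pretty</w> '
              '<w pos="NOUN" lemma="woman">women</w>')
    slashed = "pretty/ADJ/pretty women/NOUN/woman"
    print(parse_tagged(tagged) == parse_slashed(slashed))
```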
One problem with these corpuses is that retrieval results from such a large corpus that do not make use of the secondary data (annotations) are often useless because too many matches occur. Therefore, many linguists want to use secondary data included in an annotated corpus, such as parts of speech, to limit the number of retrieved sentences.
However, to use secondary data, a query has to follow a special construction rule called “Corpus Query Language” (CQL), and a user has to learn the practical expressions of each corpus, CQL, UNIX commands, software, programming and so on.
Further, many annotated corpuses in use are not suited to fast retrieval, because their formats were chosen for data input efficiency. For example, retrieving a phrase of more than one word, such as “pretty woman”, from a corpus in which secondary data are added to each word as SGML attributes takes a long time.
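The following sketch, using hypothetical per-word markup, shows one reason such a search is costly: a plain substring search for the phrase fails because tags separate the words, so every sentence must first be parsed into words before the phrase can be compared. The markup and function names are assumptions for illustration.

```python
import re

# Hypothetical per-word markup: <w pos="...">word</w>
WORD = re.compile(r'<w [^>]*>([^<]*)</w>')


def phrase_in_raw_text(sgml: str, phrase: str) -> bool:
    """Naive substring search on the marked-up text."""
    return phrase in sgml


def phrase_in_words(sgml: str, phrase: str) -> bool:
    """Parse the words out of the markup, then look for the phrase
    as a run of consecutive words."""
    words = [m.group(1) for m in WORD.finditer(sgml)]
    target = phrase.split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


if __name__ == "__main__":
    sentence = '<w pos="ADJ">pretty</w> <w pos="NOUN">woman</w>'
    print(phrase_in_raw_text(sentence, "pretty woman"))  # False
    print(phrase_in_words(sentence, "pretty woman"))     # True
```

The second function gives the right answer, but it re-parses the markup of every sentence for every query, which is where the time goes on a multi-gigabyte corpus.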
SUMMARY OF THE INVENTION
In one aspect, the present invention relates to a retrieval system having an interface for intuitive searching of secondary data added documents, such as an annotated corpus, without detailed knowledge of the format of each corpus, commands or programming.
In another aspect, the present invention relates to a retrieval system and a database for retrieving secondary data added documents in a relatively short time.
In a further aspect, the present invention relates to a retrieval system for a database storing secondary data added documents, said system including:
means for transmitting a graphical user interface (GUI) for searching, having data entry fields arranged in a matrix, for display on a user's display;
means for storing retrieval data input in one or more data entry field(s) of the GUI;
means for locating each data entry field in which each datum is input;
means for generating a query comprising query units, each unit being generated from the set of retrieval data input in one column of data entry fields of said matrix, and each unit corresponding to one element of said document,
where retrieval data are input in more than one column of data entry fields, generating a query that retrieves sentences whose elements appear in the same order as said columns of data entry fields,
where retrieval data are input in only one column of data entry fields, generating a query that retrieves sentences having an element corresponding to said retrieval data;
means for interpreting said query and searching said database;
means for transmitting search results to display on a user's display.
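The query-generating means can be sketched as follows, assuming the matrix is represented as a mapping from (attribute row, column index) to the text the user typed. The row labels `word`, `pos` and `lemma`, the `Query` class and the function name are hypothetical; the text above does not specify a data structure. Blank fields are ignored, each remaining column becomes one query unit, and the units keep the left-to-right order of the columns (in this sketch, an empty column between filled ones is simply closed up).

```python
from dataclasses import dataclass


@dataclass
class Query:
    # One unit per column that holds retrieval data, in column order.
    units: list[dict[str, str]]
    # True when more than one column was filled, i.e. element order matters.
    ordered: bool


def generate_query(fields: dict[tuple[str, int], str]) -> Query:
    """Build query units from GUI matrix input.

    `fields` maps (attribute row, column index) -> entered text.
    """
    columns: dict[int, dict[str, str]] = {}
    for (attribute, column), value in fields.items():
        value = value.strip()
        if value:  # skip data entry fields left blank
            columns.setdefault(column, {})[attribute] = value
    units = [columns[c] for c in sorted(columns)]
    return Query(units=units, ordered=len(units) > 1)


if __name__ == "__main__":
    # Column 0: the word "pretty"; column 1: a noun whose lemma is "woman".
    matrix_input = {
        ("word", 0): "pretty",
        ("pos", 1): "NOUN",
        ("lemma", 1): "woman",
        ("word", 2): "",
    }
    print(generate_query(matrix_input))
```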
According to the fourth aspect of the present invention, we provide a program for retrieval from a database storing secondary data added documents, said program including the steps of:
transmitting a GUI for searching, having data entry fields arranged in a matrix, for display on a user's display;
storing retrieval data input in one or more data entry field(s) of the GUI;
locating each data entry field in which each datum is input;
generating a query comprising query units, each unit being generated from the set of retrieval data input in one column of data entry fields of said matrix, and each unit corresponding to one element of said document,
where retrieval data are input in more than one column of data entry fields, generating a query that retrieves sentences whose elements appear in the same order as said columns of data entry fields,
where retrieval data are input in only one column of data entry fields, generating a query that retrieves sentences having an element corresponding to said retrieval data;
interpreting said query and searching said database.
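The interpreting and searching step can be sketched in the same spirit. The sketch assumes each stored sentence is a list of elements (here, words carrying `word`, `pos` and `lemma` attributes) and that a query is a list of units, each unit a mapping from attribute to required value. It also assumes that consecutive columns correspond to adjacent elements; a looser reading, in which the units need only appear in order, would also be consistent with the text. All names are hypothetical.

```python
Element = dict[str, str]


def unit_matches(unit: dict[str, str], element: Element) -> bool:
    """An element matches a unit if every attribute in the unit agrees."""
    return all(element.get(attr) == value for attr, value in unit.items())


def sentence_matches(units: list[dict[str, str]], sentence: list[Element]) -> bool:
    """True if the units match a run of adjacent elements, in order.

    With a single unit this reduces to "some element matches".
    """
    n = len(units)
    if n == 0:
        return False
    for start in range(len(sentence) - n + 1):
        if all(unit_matches(u, sentence[start + i]) for i, u in enumerate(units)):
            return True
    return False


def search(database: list[list[Element]], units: list[dict[str, str]]) -> list[int]:
    """Return the indices of the sentences in the database that match."""
    return [i for i, s in enumerate(database) if sentence_matches(units, s)]


if __name__ == "__main__":
    database = [
        [{"word": "a", "pos": "DET", "lemma": "a"},
         {"word": "pretty", "pos": "ADJ", "lemma": "pretty"},
         {"word": "woman", "pos": "NOUN", "lemma": "woman"}],
        [{"word": "the", "pos": "DET", "lemma": "the"},
         {"word": "woman", "pos": "NOUN", "lemma": "woman"},
         {"word": "was", "pos": "VERB", "lemma": "be"},
         {"word": "pretty", "pos": "ADJ", "lemma": "pretty"}],
    ]
    print(search(database, [{"word": "pretty"}, {"lemma": "woman"}]))  # [0]
    print(search(database, [{"pos": "ADJ"}]))                          # [0, 1]
```

Because each element is already stored with its secondary data, matching compares attributes directly instead of re-parsing markup.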
As a result, secondary data added documents can be retrieved quickly and accurately, because the elements of sentences serve as the units of both data input and searching.
As a result, a retriev
