Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-08-27
2001-04-17
Alam, Hosain T. (Department: 2771)
C707S793000
active
06219664
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to a search method and system employing syntactic information, and in particular to a method and system for parsing/analyzing a sentence upon receiving a search request, and for searching through a document.
BACKGROUND OF THE INVENTION
Search systems that are currently in use on the World Wide Web (WWW) are either keyword search types or full-text search types. Because such systems return a very large number of search results, a user must expend a great deal of effort before the target document can be located. In attempts to resolve this problem, various corrective methods have been employed. According to one of them, a search request is submitted in the form of a sentence, not as the logical product or the logical sum of several keywords, and a search is made for sentences that resemble the sentence used to request the search. Technically, this method can be broken down into the following sub-methods:
(1) A vector space model method
(2) A keyword location constraint matching method
(3) A sentence matching method
The vector space model method (1) (“Automatic Text Processing: the transformation, analysis and retrieval of information by computer,” Salton G., Addison-Wesley Publishing, 1989) is a method whereby a document and a search request are respectively regarded as vectors, with their keywords acting as axes, and similarity is calculated by using the distance between the vectors. However, because this method merely assumes that each keyword in a search request appears independently, it cannot cope with a situation in which a keyword in the search request just happens to be included in a large document.
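As a minimal sketch of the idea behind method (1), the following treats the request and the document as term-frequency vectors and compares them. Cosine similarity is used here as one common choice of vector comparison; that choice, the whitespace tokenizer, and the function name are assumptions made for illustration and are not taken from the cited reference.

```python
import math
from collections import Counter


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between two term-frequency vectors (illustrative only)."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(q[term] * d[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Word order is invisible to this model: swapping the two parties
# yields the same vectors and therefore the same score.
same = cosine_similarity("lawsuit of XXXCo. to YYYCo.", "lawsuit of YYYCo. to XXXCo.")
```

Because each keyword is counted independently of where it occurs, the two sentences above are indistinguishable, which illustrates the weakness described for this method.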
The keyword location constraint matching method (2) (“Fast Method for Obtaining a Similarity for a Long Japanese Expression,” Hideki Tanaka, reference material for the Language Processing Research Group of the Information Processing Institute, NLWG121-10, 1997) is a method for extracting keywords from a search request, and for defining, as matching, those keywords that satisfy a total-order relationship concerning the locations at which the keywords appear. This method is superior to method (1), but is inferior to method (3), in that only the locations of the keywords are used as constraints.
Method (3) is one for analyzing a search request and a document and for obtaining a match at a syntactic tree level. Although this appears to be an ideal method, the accuracy and the speed of the syntactic analysis that it requires are unsatisfactory. Therefore, it is not widely employed.
It is therefore one object of the present invention to provide a search method and system for maintaining a balance between the accuracy and the speed attained during syntactic analysis.
It is another object of the present invention to provide a method and a system for performing an efficient network search.
It is an additional object of the present invention to provide a search method and system that does not syntactically analyze a document to be retrieved.
It is a further object of the present invention to provide a search method and system for the employment of location constraint data in a search request sentence.
SUMMARY OF THE INVENTION
To achieve the above objects, first, a search request is syntactically analyzed, and location constraint data that constitute partial-order relationships are extracted for keywords and function words (FNWORD). A document, which is to be retrieved and for which no syntactic analysis is required, is sought that contains a sentence matching the partial-order relationship. In the search for the document containing the sentence that matches the partial-order relationship, a sentence having a short context that matches the partial-order relationship is regarded as a sentence that has a higher similarity. With this arrangement, when compared with the matching method (2) of the keyword location constraint type, data providing more detailed location constraints can be extracted by employing syntactic analysis. In addition, when compared with the syntactic matching method (3), the problems of poor matching accuracy and low speed, which occur as the result of an incomplete syntactic analysis, can be resolved because a document to be retrieved is not syntactically analyzed.
FIG. 1 is a fundamental flowchart showing a search method according to the present invention. At step 110 a search request is syntactically analyzed, and at step 120 location constraint data (partial-order relationships) are extracted from the results of the analysis. Finally, at step 130 a sentence that matches the location constraint data (partial-order relationships) is obtained from a document to be retrieved.
With a search request QS, syntactic analysis tree QT is generally represented as follows.
Expression 1
QT=TREE
TREE=(WORD)|(CHILD+HEAD CHILD*)|(CHILD* HEAD CHILD+)
HEAD=(‘HEAD’ TREE)
CHILD=(FUNC TREE)|(TREE FUNC)
FUNC=‘FN’ WORD
where FUNC represents an existing modification relationship between HEAD and TREE. An example analysis tree for a search request is represented as follows.
Expression 2
“XXX sha no YYY sha heno teiso”
(((XXX sha)FN no)((YYY sha)FN heno)(HEAD(teiso)))
“lawsuit of XXXCo. to YYYCo.”
((HEAD(lawsuit))(FN of (XXXCo.))(FN to (YYYCo.)))
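The grammar of Expression 1 can be modeled as a small recursive data structure. The following is a sketch under stated assumptions: the class names `Word`, `Child`, and `Node`, the `fn_first` flag, and the `render` helper are hypothetical names introduced here for illustration, and the bracketed output format simply imitates the notation of Expression 2.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Word:
    text: str  # TREE = (WORD)


@dataclass
class Child:
    tree: "Tree"    # the modifying subtree
    fn: str         # function word: FUNC = 'FN' WORD
    fn_first: bool  # True for (FUNC TREE), False for (TREE FUNC)


@dataclass
class Node:
    head: "Tree"
    before: List[Child] = field(default_factory=list)  # CHILD items preceding HEAD
    after: List[Child] = field(default_factory=list)   # CHILD items following HEAD

    def __post_init__(self) -> None:
        # Both non-WORD alternatives of TREE require at least one CHILD.
        if not self.before and not self.after:
            raise ValueError("a Node needs at least one CHILD")


Tree = Union[Word, Node]


def render(tree: Tree) -> str:
    """Print a tree in the bracket notation used for the example trees."""
    if isinstance(tree, Word):
        return f"({tree.text})"

    def render_child(c: Child) -> str:
        inner = render(c.tree)
        return f"(FN {c.fn} {inner})" if c.fn_first else f"({inner}FN {c.fn})"

    parts = [render_child(c) for c in tree.before]
    parts.append(f"(HEAD{render(tree.head)})")
    parts += [render_child(c) for c in tree.after]
    return "(" + "".join(parts) + ")"


# The Japanese example: both modifiers precede the head, FN follows each modifier.
japanese = Node(
    head=Word("teiso"),
    before=[Child(Word("XXX sha"), "no", False), Child(Word("YYY sha"), "heno", False)],
)
# The English example: both modifiers follow the head, FN precedes each modifier.
english = Node(
    head=Word("lawsuit"),
    after=[Child(Word("XXXCo."), "of", True), Child(Word("YYYCo."), "to", True)],
)
```

Calling `render(japanese)` reproduces the bracketed form given for the Japanese request, which is a quick check that the structure mirrors the grammar.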
From the above syntactic analysis tree, location data existing between HEAD and CHILD are employed as location constraint data. The extracted location constraint data are represented as follows.
order constraint . . . The positional relationship between CHILD and HEAD must be maintained. For example, when HEAD is located after CHILD, the relationship should be described as CHILD→HEAD.
neighbor-order constraint . . . A HEAD word and an FN word in a NODE must be neighbors, while their positional relationship is maintained. Being neighbors means that these words are located within a distance delineated by a count of words that is equivalent to a numeral provided as a parameter. For example, when FNWORD is in the neighborhood that follows the NODE, the positional relationship is described by NODE→FNWORD.
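The two constraint types can be checked against a tokenized sentence without any syntactic analysis of that sentence. The following is a minimal sketch, assuming whitespace-separated single-word tokens; the function names and the `max_distance` parameter name are hypothetical, with `max_distance` standing in for the numeral that the text says is provided as a parameter.

```python
from typing import List


def positions(tokens: List[str], word: str) -> List[int]:
    """All indices at which `word` occurs."""
    return [i for i, t in enumerate(tokens) if t == word]


def satisfies_order(tokens: List[str], first: str, second: str) -> bool:
    """Order constraint first→second: some `first` occurs before some `second`."""
    a, b = positions(tokens, first), positions(tokens, second)
    return bool(a) and bool(b) and min(a) < max(b)


def satisfies_neighbor_order(
    tokens: List[str], first: str, second: str, max_distance: int
) -> bool:
    """Neighbor-order constraint: `second` follows `first` within `max_distance` words."""
    return any(
        0 < j - i <= max_distance
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


tokens = "lawsuit of XXXCo. to YYYCo.".split()
ok_order = satisfies_order(tokens, "lawsuit", "YYYCo.")
ok_neighbor = satisfies_neighbor_order(tokens, "of", "XXXCo.", max_distance=1)
```

A larger `max_distance` tolerates intervening words between the two neighbors, at the cost of a looser match.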
Therefore, the following location constraint data are obtained from the above example Japanese sentence.
Expression 3
XXX sha→teiso
YYY sha→teiso
XXX sha→no
YYY sha→heno
Also, the following location constraint data are obtained from the example English sentence.
Expression 4
lawsuit→XXXCo.
lawsuit→YYYCo.
of→XXXCo.
to→YYYCo.
These location constraint data are employed for the search.
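The extraction of these constraints from an analysis tree can be sketched as a recursive walk. Everything named below is hypothetical: the class names, the `fn_first` flag, and in particular the rule that a nested subtree is represented by its head word, which is an assumption made here because the examples in the text contain only single-level trees.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Word:
    text: str


@dataclass
class Child:
    tree: "Tree"
    fn: str         # function word attached to this modifier
    fn_first: bool  # True if FN precedes the modifier, False if it follows


@dataclass
class Node:
    head: "Tree"
    before: List[Child] = field(default_factory=list)
    after: List[Child] = field(default_factory=list)


Tree = Union[Word, Node]
Constraint = Tuple[str, str, str]  # (kind, earlier word, later word)


def head_word(tree: Tree) -> str:
    """Assumption: a subtree is represented by the word at the bottom of its HEAD chain."""
    return tree.text if isinstance(tree, Word) else head_word(tree.head)


def extract(tree: Tree) -> List[Constraint]:
    """Collect order and neighbor-order constraints as partial-order pairs."""
    if isinstance(tree, Word):
        return []
    out: List[Constraint] = []
    head = head_word(tree)
    for child, child_precedes in [(c, True) for c in tree.before] + [
        (c, False) for c in tree.after
    ]:
        word = head_word(child.tree)
        # Order constraint: keep the relative position of CHILD and HEAD.
        out.append(("order", word, head) if child_precedes else ("order", head, word))
        # Neighbor-order constraint: keep the function word next to its modifier.
        out.append(
            ("neighbor", child.fn, word) if child.fn_first else ("neighbor", word, child.fn)
        )
        out.extend(extract(child.tree))
    out.extend(extract(tree.head))
    return out


japanese = Node(
    head=Word("teiso"),
    before=[Child(Word("XXX sha"), "no", False), Child(Word("YYY sha"), "heno", False)],
)
english = Node(
    head=Word("lawsuit"),
    after=[Child(Word("XXXCo."), "of", True), Child(Word("YYYCo."), "to", True)],
)
```

For the two example requests, `extract` yields the same four pairs listed in Expressions 3 and 4, each tagged with the kind of constraint it came from.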
It should be noted, however, that the matching similarity is higher for a group having a shorter context that satisfies the constraint data, such as a paragraph that consists of two sentences rather than a single sentence.
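The preference for a shorter matching context can be sketched as a search for the smallest token window in which every constraint holds. This is only an illustration: the brute-force window scan, the treatment of all constraints as simple "earlier word before later word" pairs, and the `1 / length` score are assumptions made here, and the text does not specify a scoring formula.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (earlier word, later word)


def holds(tokens: List[str], first: str, second: str) -> bool:
    """True if `second` occurs somewhere after an occurrence of `first`."""
    seen_first = False
    for token in tokens:
        if token == second and seen_first:
            return True
        if token == first:
            seen_first = True
    return False


def shortest_context(tokens: List[str], constraints: List[Pair]) -> Optional[int]:
    """Length of the smallest window satisfying every constraint, or None."""
    best: Optional[int] = None
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            if all(holds(tokens[start:end], a, b) for a, b in constraints):
                length = end - start
                best = length if best is None else min(best, length)
                break  # extending the window from this start cannot shorten it
    return best


def similarity(tokens: List[str], constraints: List[Pair]) -> float:
    """Hypothetical score: shorter matching context gives a higher value."""
    length = shortest_context(tokens, constraints)
    return 0.0 if length is None else 1.0 / length


constraints = [
    ("lawsuit", "XXXCo."),
    ("lawsuit", "YYYCo."),
    ("of", "XXXCo."),
    ("to", "YYYCo."),
]
tight = "the lawsuit of XXXCo. to YYYCo. was filed".split()
loose = "lawsuit news : today of all days XXXCo. wrote to its partner YYYCo.".split()
```

Both token lists satisfy all four constraints, but `tight` does so within a much shorter span and therefore receives the higher score under this assumed formula.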
When compared with the vector space model in background art (1), the method of the present invention, like the matching method (2) that uses keyword location constraints, is superior in that it employs the location constraints associated with keywords. In addition, when compared with the syntactic matching method (3), the method of the invention is superior in that, since a document to be retrieved is not syntactically analyzed, the problems of incomplete syntactic analysis and low speeds for the matching of syntactic trees can be resolved. When compared with the matching method (2), which uses the keyword location constraints, a more flexible search can be performed by using location constraint data that are selected from a correlation of the syntactic trees. For example, when six keywords, A, B, C, D, E and F are present in the named order and form the following syntactic tree,
Expression 5
search request: A . . . B . . . C . . . D . . . E . . . F
syntactic tree: ((FN fn1 ((FN fn2 (A))(HEAD(B))))
(FN fn3 ((FN fn4 (C))(FN fn5 (D))(HEAD(E))))
(HEAD(F)))
document 1: . . . A . . . B . . . C . . . D . . . E . . . F . . .
document 2: . . . A . . . B . . . C . . . D . . . E . . . F . . .
documen
Alam Hosain T.
Corrielus Jean M.
Dougherty Anne Vachon
International Business Machines Corp.
Jordan Kevin M.