Method for creating an information closure model

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

Reexamination Certificate

active

06539378

ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
This invention relates to structured information retrieval and interpretation from disparate semistructured information resources. A particular application of the invention is extraction of information from public and semipublic databases through worldwide information sources, as facilitated by the Internet.
The Internet provides avenues for worldwide communication of information, ideas and messages. Although the Internet has been utilized by academia for decades, recently public interest has turned to the Internet and the information made available by it. The World Wide Web (or “the Web”) accounts for a significant part of the growth in the popularity of the Internet, due in part to the user-friendly graphical user interfaces (“GUIs”) that are readily available for accessing the Web.
The World Wide Web makes hypertext documents available to users over the Internet. A hypertext document does not present information linearly like a book, but instead provides the reader with links or pointers to other locations so that the user may jump from one location to another. The hypertext documents on the Web are written in the Hypertext Markup Language (“HTML”).
As the popularity of the World Wide Web grows, so too does the wealth of information it provides. Accordingly, there may be many sites and pages on the World Wide Web that contain information a user is seeking. However, the Web contains no built-in mechanism for searching for information of interest. Without a searching mechanism, finding sites of interest would be like finding a needle in a haystack. Fortunately, there exist a number of web sites (e.g., YAHOO, ALTA VISTA, EXCITE, etc.) that allow users to perform relatively simple keyword searches.
Although keyword searches are adequate for many applications, they fail miserably for many others. For example, there are numerous web sites that include multiple entries or lists on job openings, houses for sale, and the like. Keyword searches are inadequate to search these sites for many reasons. Keyword searches invariably turn up information that, although matching the keywords, is not of interest. This problem may be alleviated somewhat by narrowing the search parameters, but this has the attendant risk of missing information of interest. Additionally, the search terms supported may not allow identification of information of interest. As an example, one may not be able to specify in a keyword search query to find job listings that require less than three years of experience in computer programming.
Ideally, it would be desirable if information like job listings on multiple web sites could appear as a single relational database so that relational database queries could be utilized to find information of interest. However, there is no standard for the structure of information like job listings on the Web. This problem was addressed in co-owned U.S. Pat. No. 5,826,258, in the name of Ashish Gupta, et al., entitled “Method and Apparatus for Structuring the Querying and Interpretation of Semistructured Information,” which introduced the concept of “Wrappers” for retrieving and interpreting information from disparate semistructured information resources. Wrappers are programs that interact with web sites to obtain information stored in the web site and then to structure it according to a prespecified schema. In a copending U.S. patent application Ser. No. 10/000,743, in the name of Ashish Gupta, et al., entitled “Method and Apparatus for Creating Extractors, Field Information Objects and Inheritance Hierarchies in a Framework for Retrieving Semistructured Information,” methods for obtaining information using wrappers are disclosed. However, these methods do not teach the information closure techniques of the present invention.
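To make the contrast with keyword search concrete, the following is a minimal sketch of the kind of relational query described above, assuming that job listings from several sites have already been structured into one table. The table name `job_listings`, its columns, the site names and the sample rows are all hypothetical and serve only as illustration; they do not come from the patent.

```python
import sqlite3

# Hypothetical schema: one row per job listing, gathered from several sites.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_listings ("
    "site TEXT, title TEXT, skill TEXT, min_years_experience INTEGER)"
)

# Illustrative sample rows only (not real data).
sample_rows = [
    ("site-a.example", "Junior Developer", "computer programming", 1),
    ("site-b.example", "Senior Developer", "computer programming", 5),
    ("site-b.example", "Office Manager", "administration", 2),
]
conn.executemany("INSERT INTO job_listings VALUES (?, ?, ?, ?)", sample_rows)

# A condition that a plain keyword search cannot express:
# listings requiring less than three years of programming experience.
matches = conn.execute(
    "SELECT site, title FROM job_listings "
    "WHERE skill = ? AND min_years_experience < ? "
    "ORDER BY site",
    ("computer programming", 3),
).fetchall()

for site, title in matches:
    print(site, title)
```

The numeric comparison in the `WHERE` clause is what the keyword search cannot provide; it becomes possible only once the listings share a single schema.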
What is needed is a method of forming an information closure from related tuples of information for incorporation into a relational database.
SUMMARY OF THE INVENTION
According to the invention, a method is provided for forming an information closure of a plurality of rows in a linkage stack built by a wrapper program for accessing semistructured information. This method includes removing a first row from the linkage stack and computing a cross product of the fields in the first row. A step of adding this cross product to a list of accepted rows can also be part of the method. For each remaining row in the linkage stack, the method includes a step of computing a selective cross product according to a plurality of steps. In one step, a result is initialized to empty. Then, for each row in the list of accepted rows, a step of determining a first new row from the accepted row, extended with the non-empty fields of the remaining row, is performed. The method can also include a step of determining a second new row from the remaining row, extended with the non-empty fields in the accepted row. Thereupon, a step of adding the two new rows to the result can be performed. Repeating the determining steps and the adding step for all rows in the list of accepted rows, and removing from the result any identical rows, can provide an information closure.
Numerous benefits are achieved by way of the present invention over conventional Web search techniques, by enabling the use of a relational database to organize information obtained from a semistructured source, such as Web pages on the World Wide Web. In some embodiments, the present invention is easier to use than conventional user interfaces. The present invention can provide a way to automatically propagate information to related tuples. Some embodiments according to the invention are easier for new users to learn than known techniques. The present invention enables data mining to be accomplished using a relational database. These and other benefits are described throughout the present specification.
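The steps summarized above can be sketched in code as follows. This is only an illustrative reading of the summary, not the claimed implementation, and it rests on several assumptions that the summary does not state: each row is modeled as a mapping from field name to a list of values; "extended with the non-empty fields" is taken to mean that only empty fields of the base row are filled in; the de-duplicated result of each pass is taken to become the new list of accepted rows; and the first row is taken to be the row at index 0. All function and variable names are hypothetical.

```python
from itertools import product


def cross_product(row):
    """Expand a row of multi-valued fields into flat, single-valued rows."""
    fields = list(row)
    # Assumption: an empty field contributes None instead of emptying the product.
    choices = [row[f] if row[f] else [None] for f in fields]
    return [dict(zip(fields, combo)) for combo in product(*choices)]


def extend(base, other):
    """Copy `base`, filling its empty fields from the non-empty fields of `other`."""
    merged = dict(base)
    for field, value in other.items():
        if value is not None and merged.get(field) is None:
            merged[field] = value
    return merged


def remove_identical(rows):
    """Drop rows whose non-empty fields match an earlier row, keeping order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if v is not None))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def information_closure(linkage_stack):
    stack = list(linkage_stack)
    if not stack:
        return []
    # Remove the first row and seed the accepted list with its cross product.
    accepted = remove_identical(cross_product(stack.pop(0)))
    for remaining in stack:
        result = []  # the result is initialized to empty for each remaining row
        # Assumption: a multi-valued remaining row is flattened the same way.
        for flat in cross_product(remaining):
            for row in accepted:
                result.append(extend(row, flat))  # first new row
                result.append(extend(flat, row))  # second new row
        # Assumption: the de-duplicated result becomes the new accepted list.
        accepted = remove_identical(result)
    return accepted


# Hypothetical linkage stack: a company page linking to two job titles and a city.
stack = [
    {"company": ["Acme"], "title": ["Engineer", "Analyst"]},
    {"city": ["Austin"]},
]
closure = information_closure(stack)
for row in closure:
    print(row)
```

In this example the city found in the second row is propagated to both job tuples from the first row. Where two rows carry different non-empty values for the same field, the two new rows differ and both are kept, which is one possible reading of why the summary determines two rows per pair.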
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.


REFERENCES:
patent: 4631673 (1986-12-01), Haas et al.
patent: 4918588 (1990-04-01), Barrett et al.
patent: 4918593 (1990-04-01), Huber
patent: 5307484 (1994-04-01), Baker et al.
patent: 5386556 (1995-01-01), Hedin et al.
patent: 5457792 (1995-10-01), Virgil et al.
patent: 5544355 (1996-08-01), Chaudhuri et al.
patent: 5649186 (1997-07-01), Ferguson
patent: 5659729 (1997-08-01), Nielsen
patent: 5692181 (1997-11-01), Anand et al.
patent: 5706501 (1998-01-01), Horikiri et al.
patent: 5706507 (1998-01-01), Schloss
patent: 5708806 (1998-01-01), DeRose et al.
patent: 5708825 (1998-01-01), Sotomayor
patent: 5721851 (1998-02-01), Cline et al.
patent: 5721903 (1998-02-01), Anand et al.
patent: 5737592 (1998-04-01), Nguyen et al.
patent: 5748954 (1998-05-01), Mauldin
patent: 5761663 (1998-06-01), Lagarde et al.
patent: 5806066 (1998-09-01), Golshani et al.
patent: 5826258 (1998-10-01), Gupta et al.
patent: 5870739 (1999-02-01), Davis et al.
patent: 5873079 (1999-02-01), Davis et al.
patent: 5884304 (1999-03-01), Davis et al.
patent: 5895465 (1999-04-01), Guha
patent: 5903893 (1999-05-01), Kleewein et al.
patent: 5943665 (1999-08-01), Guha
patent: 5963949 (1999-10-01), Gupta et al.
patent: 6085190 (2000-07-01), Sakata
patent: 6094645 (2000-07-01), Aggarwal et al.
patent: 6102969 (2000-08-01), Christianson et al.
patent: 6108651 (2000-08-01), Guha
patent: 6108666 (2000-08-01), Floratos et al.
patent: 6167393 (2000-12-01), Davis et al.
patent: 6247018 (2001-06-01), Rheaume
patent: 6263327 (2001-07-01), Aggarwal et al.
patent: 6272495 (2001-08-01), Hetherington
patent: 6295533 (2001-09-01), Cohen
patent: 2001/0013035 (2001-08-01), Cohen
Flores
