Method and apparatus for creating extractors, field...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06571243

ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
This invention relates to structured information retrieval and interpretation from disparate semistructured information resources. A particular application of the invention is extraction of information from public and semipublic databases through worldwide information sources, as facilitated by the Internet.
The Internet provides avenues for worldwide communication of information, ideas and messages. Although the Internet has been utilized by academia for decades, recently public interest has turned to the Internet and the information made available by it. The World Wide Web (or “the Web”) accounts for a significant part of the growth in the popularity of the Internet, due in part to the user-friendly graphical user interfaces (“GUIs”) that are readily available for accessing the Web.
The World Wide Web makes hypertext documents available to users over the Internet. A hypertext document does not present information linearly like a book, but instead provides the reader with links or pointers to other locations so that the user may jump from one location to another. The hypertext documents on the Web are written in the Hypertext Markup Language (“HTML”).
As the popularity of the World Wide Web grows, so too does the wealth of information it provides. Accordingly, there may be many sites and pages on the World Wide Web that contain information a user is seeking. However, the Web contains no built-in mechanism for searching for information of interest. Without a searching mechanism, finding sites of interest would be like finding a needle in a haystack. Fortunately, there exist a number of web sites (e.g., YAHOO, ALTA VISTA, EXCITE, etc.) that allow users to perform relatively simple keyword searches.
Although keyword searches are adequate for many applications, they fail miserably for many others. For example, there are numerous web sites that include multiple entries or lists on job openings, houses for sale, and the like. Keyword searches are inadequate to search these sites for many reasons. Keyword searches invariably turn up information that, although matching the keywords, is not of interest. This problem may be alleviated somewhat by narrowing the search parameters, but this has the attendant risk of missing information of interest. Additionally, the search terms supported may not allow identification of information of interest. As an example, a keyword search query may not be able to specify job listings that require fewer than three years of experience in computer programming.
Ideally, information like job listings on multiple web sites would appear as a single relational database so that relational database queries could be used to find information of interest. However, there is no standard for the structure of information like job listings on the Web. This problem was addressed in co-owned U.S. Pat. No. 5,826,258, in the name of Ashish Gupta et al., entitled “Method and Apparatus for Structuring the Querying and Interpretation of Semistructured Information,” which introduced the concept of “Wrappers” for retrieving and interpreting information from disparate semistructured information resources. Wrappers are programs that interact with web sites to obtain information stored in the web site and then structure it according to a prespecified schema. Copending U.S. patent application Ser. No. 10/000,235, in the name of Ashish Gupta et al., entitled “Method for Creating an Information Closure Model,” discloses methods for forming the information closure of information gathered by a wrapper. However, the methods of the present invention for formulating extractors, field objects and inheritance hierarchies in a wrapper framework are heretofore not known in the art.
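As an illustration only, the following Python sketch shows the kind of relational query that becomes possible once listings from several web sites have been structured under one schema. The table name, column names, site names and rows are all hypothetical and do not come from the disclosure; an in-memory SQLite database stands in for the relational database.

```python
import sqlite3

# Hypothetical schema: one row per job listing gathered from any web site.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (site TEXT, title TEXT, field TEXT, min_years INTEGER)"
)

# Hypothetical rows, as a wrapper might produce them after structuring.
rows = [
    ("site-a.example", "Junior Developer", "computer programming", 1),
    ("site-b.example", "Senior Developer", "computer programming", 5),
    ("site-b.example", "Staff Accountant", "accounting", 2),
]
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)


def programming_jobs_under(connection, max_years=3):
    """Return (site, title) for programming jobs requiring fewer than max_years."""
    cursor = connection.execute(
        "SELECT site, title FROM jobs "
        "WHERE field = ? AND min_years < ? ORDER BY title",
        ("computer programming", max_years),
    )
    return cursor.fetchall()


result = programming_jobs_under(conn)
```

The numeric condition on required experience is the part a keyword search cannot express; here it is an ordinary comparison in the WHERE clause.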
What is needed is a method of formulating extractors, field objects and inheritance hierarchies for retrieving and interpreting information from semistructured resources for incorporation into a relational database.
SUMMARY OF THE INVENTION
According to the invention, a system is provided for extracting information from a semistructured information source. The system includes a listing stack for holding extracted information. A means for matching at least one extractor to the semistructured information to return a list of potential matches is also included. The system can also include a means for iterating through the list of potential matches and a means for retrieving information from a particular match in the list of potential matches. A means for adding a particular match into the listing stack can also be part of the system.
In another aspect of the present invention, a method for extracting information from a semistructured information source into a listing stack is provided. The step of matching at least one extractor to the semistructured information in order to return a list of potential matches is included in the method. A step of iterating through the list of potential matches can also be part of the method. Information from a particular match in the list of potential matches can be retrieved in another step. The method can also include a step of adding a particular match into the listing stack. Combinations of these steps can extract information from a semistructured information source.
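The steps of the method can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: the Extractor class, its regular-expression matching, the extract function and the sample text are all hypothetical, and the listing stack is modeled as a plain list.

```python
import re
from dataclasses import dataclass


@dataclass
class Extractor:
    """Hypothetical extractor: a named regular expression with named groups."""
    name: str
    pattern: re.Pattern

    def match(self, text):
        # Return the list of potential matches found in the source text.
        return list(self.pattern.finditer(text))


def extract(text, extractors):
    """Run each extractor over the text and collect results in a listing stack."""
    listing_stack = []  # holds the extracted information
    for extractor in extractors:
        potential_matches = extractor.match(text)  # match extractor to source
        for candidate in potential_matches:        # iterate through the matches
            info = candidate.groupdict()           # retrieve information from one match
            listing_stack.append((extractor.name, info))  # add it to the listing stack
    return listing_stack


# Hypothetical semistructured source: two job listings in loosely formatted HTML.
SAMPLE = """
<li>Title: Programmer; Experience: 2 years</li>
<li>Title: Analyst; Experience: 5 years</li>
"""

JOB = Extractor(
    "job",
    re.compile(r"Title: (?P<title>[^;]+); Experience: (?P<years>\d+) years"),
)

stack = extract(SAMPLE, [JOB])
```

Each entry on the stack pairs the extractor name with the fields retrieved from one match, which is the form a later step would need in order to load the entries into relational tuples.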
By enabling the use of a relational database to organize information obtained from a semistructured source, such as Web pages on the World Wide Web, the present invention achieves numerous benefits over conventional Web search techniques. In some embodiments, the present invention is easier to use than conventional user interfaces. The present invention can provide a way to automatically propagate information to related tuples. Some embodiments according to the invention are easier for new users to learn than known techniques. The present invention enables data mining to be accomplished using a relational database. These and other benefits are described throughout the present specification.
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.


REFERENCES:
patent: 4631673 (1986-12-01), Haas et al.
patent: 4917588 (1990-04-01), Grabener et al.
patent: 4918593 (1990-04-01), Huber
patent: 5307484 (1994-04-01), Baker et al.
patent: 5386556 (1995-01-01), Hedin et al.
patent: 5457792 (1995-10-01), Virgil et al.
patent: 5544355 (1996-08-01), Chaudhuri et al.
patent: 5649186 (1997-07-01), Ferguson
patent: 5659729 (1997-08-01), Nielsen
patent: 5692181 (1997-11-01), Anand et al.
patent: 5706501 (1998-01-01), Horikiri et al.
patent: 5706507 (1998-01-01), Schloss
patent: 5708806 (1998-01-01), DeRose et al.
patent: 5708825 (1998-01-01), Sotomayor
patent: 5721851 (1998-02-01), Cline et al.
patent: 5721903 (1998-02-01), Anand et al.
patent: 5737592 (1998-04-01), Nguyen et al.
patent: 5748954 (1998-05-01), Mauldin
patent: 5761663 (1998-06-01), Lagarde et al.
patent: 5806066 (1998-09-01), Golshani et al.
patent: 5826258 (1998-10-01), Gupta et al.
patent: 5864848 (1999-01-01), Horvitz et al.
patent: 5870739 (1999-02-01), Davis, III et al.
patent: 5873079 (1999-02-01), Davis, III et al.
patent: 5884304 (1999-03-01), Davis, III et al.
patent: 5890147 (1999-03-01), Peltonen et al.
patent: 5903893 (1999-05-01), Kleewein et al.
patent: 5913214 (1999-06-01), Madnick et al.
patent: 5926652 (1999-07-01), Reznak
patent: 5943665 (1999-08-01), Guha
patent: 5956720 (1999-09-01), Fernandez et al.
patent: 5963949 (1999-10-01), Gupta et al.
patent: 5991756 (1999-11-01), Wu
patent: 6009410 (1999-12-01), LeMole et al.
patent: 6029182 (2000-02-
