System and method for finding information in a distributed...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

active

06418432

ABSTRACT:

BACKGROUND OF THE INVENTION
(1) Field of the Invention
This invention relates to information retrieval systems. More particularly, the invention relates to information retrieval in a distributed information system, e.g., the Internet, using query learning and meta search.
(2) Description of the Prior Art
The World Wide Web (WWW) is currently filled with documents that collect together links to all known documents on a topic; henceforth, we will refer to documents of this sort as resource directories. While resource directories are often valuable, they can be difficult to create and maintain. Maintenance is especially problematic because the rapid growth in on-line documents makes it difficult to keep a resource directory up-to-date.
This invention describes machine learning methods that address the resource directory maintenance problem. In particular, we propose to treat a resource directory as an extensional definition of an unknown concept; i.e., documents pointed to by the resource list will be considered positive examples of the unknown concept, and all other documents will be considered negative examples of the concept. Machine learning methods can then be used to construct from these examples an intensional definition of the concept. If an appropriate learning method is used, this definition can be translated into a query for a WWW search engine, such as Altavista, Infoseek or Lycos. If the query is accurate, then re-submitting the query at a later date will detect any new instances of the concept that have been added. We will present experimental results on this problem with two implemented systems. One is an interactive system: an augmented WWW browser that allows the user to label any document, and to learn a search query from previously labeled examples. This system is useful in locating documents similar to those in a resource directory, thus making the directory more comprehensive. The other is a batch system which repeatedly learns queries from examples, and then collects and labels pages using these queries. In labeling examples, this system assumes that the original resource directory is complete, and hence can only be used with a nearly exhaustive initial resource directory; however, it can operate without human intervention.
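The idea of turning labeled examples into a search query can be illustrated with a minimal sketch. It assumes a simple greedy learner that builds a conjunction of words; this is only one possible learning method and is not taken from the specification. The names `tokenize`, `learn_query`, and `to_search_string`, and the `AND` query syntax, are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a document and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_query(positives: List[str], negatives: List[str], max_terms: int = 3) -> List[str]:
    """Greedily build a conjunction of words (an assumed learner, for illustration).

    Each step adds the word that covers the most remaining positive examples
    relative to the remaining negative examples, then keeps only the examples
    that still match the conjunction.
    """
    pos = [tokenize(d) for d in positives]
    neg = [tokenize(d) for d in negatives]
    query: List[str] = []
    while pos and neg and len(query) < max_terms:
        candidates = sorted(set().union(*pos) - set(query))
        if not candidates:
            break

        def score(word: str) -> int:
            return sum(word in d for d in pos) - sum(word in d for d in neg)

        best = max(candidates, key=score)  # ties go to the alphabetically first word
        if score(best) <= 0:
            break
        query.append(best)
        pos = [d for d in pos if best in d]
        neg = [d for d in neg if best in d]
    return query


def to_search_string(query: List[str]) -> str:
    """Render the conjunction in a hypothetical engine syntax."""
    return " AND ".join(query)


# Positive examples: pages listed in the resource directory.
positives = ["Machine learning course notes", "Lecture notes on machine learning"]
# Negative examples: other pages.
negatives = ["Course catalog for cooking", "Sailing notes and lecture schedule"]

learned = learn_query(positives, negatives)
print(to_search_string(learned))
```

On this toy data the learner stops after one word, because a single word already separates the listed pages from the others. A practical learner would need many more examples and some protection against overfitting.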
Prior art related to machine learning methods includes the following:
U.S. Pat. No. 5,278,980 issued Jan. 11, 1994 discloses an information retrieval system and method in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases, can then use one or more of the next adjacent non-stop words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop words for each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the “new” content words.
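The phrase construction described in that reference can be sketched roughly as follows. This is an illustrative reading of the description above, not the patented implementation; the stop-word list and the function name `matching_phrases` are assumptions.

```python
from typing import List

# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "for"}


def matching_phrases(text: str, query_word: str) -> List[str]:
    """Return one phrase per match: the matching word, any intervening
    stop-words, and the next adjacent non-stop (content) word."""
    words = text.lower().split()
    phrases: List[str] = []
    for i, word in enumerate(words):
        if word != query_word:
            continue
        j = i + 1
        while j < len(words) and words[j] in STOP_WORDS:
            j += 1  # skip over stop-words following the match
        if j < len(words):
            phrases.append(" ".join(words[i : j + 1]))
    return phrases


result = matching_phrases(
    "retrieval of the documents and retrieval in a large corpus", "retrieval"
)
for phrase in result:
    print(phrase)
```

The last word of each returned phrase ("documents", "large") is the content word an operator could pick as a new query word for the next search.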
Other prior art related to machine learning methods is disclosed in the references attached to the specification as Appendix 1.
None of the prior art discloses a system and method of adding documents to a resource directory in a distributed information system by using a learning means to generate from training data a plurality of items as positive and/or negative examples of a particular class and using a learning means to generate at least one query that can be submitted to any of a plurality of methods for searching the system for a new item, after which the new item is evaluated by the learning means with the aim of verifying that the new item is a new subset of the class.
SUMMARY OF THE INVENTION
An information retrieval system finds information in a Distributed Information System (DIS), e.g., the Internet, using query learning and meta search for adding documents to resource directories contained in the DIS. A selection means generates training data characterized as positive and negative examples of a particular class of data residing in the DIS. A learning means generates from the training data at least one query that can be submitted to any one of a plurality of search engines for searching the DIS to find “new” items of the particular class. An evaluation means determines and verifies that the new item(s) is a new subset of the particular class and adds or updates the particular class in the resource directory.
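The flow summarized above (submit a learned query to several search engines, evaluate what comes back, and update the directory) might be sketched as below. Everything here is a toy stand-in under stated assumptions: the in-memory `PAGES` index, the `make_engine` stubs, and the all-terms `is_member` check are hypothetical and do not represent any real search engine API or the evaluation means of the invention.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

SearchEngine = Callable[[List[str]], Iterable[str]]  # query terms -> candidate URLs

# Hypothetical in-memory "web" used in place of real pages.
PAGES: Dict[str, str] = {
    "http://example.org/a": "machine learning tutorial",
    "http://example.org/b": "machine repair manual",
    "http://example.org/c": "online learning with machine experts",
    "http://example.org/d": "learning to sail",
}


def make_engine(indexed_urls: List[str]) -> SearchEngine:
    """Build a stub engine that returns indexed pages matching ANY query term."""
    def engine(query: List[str]) -> List[str]:
        return [u for u in indexed_urls if any(t in PAGES[u].split() for t in query)]
    return engine


def meta_search(query: List[str], engines: List[SearchEngine]) -> Set[str]:
    """Submit the same query to every engine and merge the results."""
    found: Set[str] = set()
    for engine in engines:
        found.update(engine(query))
    return found


def update_directory(
    directory: Set[str],
    query: List[str],
    engines: List[SearchEngine],
    fetch: Callable[[str], Optional[str]],
    is_member: Callable[[str], bool],
) -> Set[str]:
    """Add unseen URLs that pass evaluation; return the set of additions."""
    added: Set[str] = set()
    for url in sorted(meta_search(query, engines) - directory):
        text = fetch(url)
        if text is not None and is_member(text):
            directory.add(url)
            added.add(url)
    return added


query = ["machine", "learning"]
directory = {"http://example.org/a"}
engines = [
    make_engine(["http://example.org/a", "http://example.org/b"]),
    make_engine(["http://example.org/c", "http://example.org/d"]),
]
# Stand-in evaluation step: accept a page only if it contains every query term.
added = update_directory(
    directory, query, engines, PAGES.get,
    lambda text: all(t in text.split() for t in query),
)
print(sorted(added))
```

Keeping the engines behind a plain callable type means the same loop works for any number of search back ends, which is the point of the meta-search step; the evaluation function is passed in separately so a learned classifier could replace the simple term check.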


REFERENCES:
patent: 5278980 (1994-01-01), Pedersen et al.
patent: 5488725 (1996-01-01), Turtle et al.
patent: 5491820 (1996-02-01), Belove et al.
patent: 5530852 (1996-06-01), Meske, Jr. et al.
patent: 5572643 (1996-11-01), Judson
patent: 5623652 (1997-04-01), Vora et al.
patent: 5717914 (1998-02-01), Husick et al.
patent: 5768578 (1998-06-01), Kirk et al.
patent: 5867799 (1999-02-01), Lang et al.
patent: 6081750 (2000-06-01), Hoffberg et al.
J. Kunze, IS&T UC Berkeley, Feb. 1995, “Functional Recommendations for Internet Locators”, pp. 1-10.
K. Sollins, MIT/LCS, and L. Masinter, Xerox Corporation, Dec. 1994, “Functional Requirements for Uniform Resource Names”, pp. 1-7.
T. Berners-Lee, CERN, Jun. 1994, “Universal Resource Identifiers in WWW”, pp. 1-25.
R. Fielding, UC Irvine, Jun. 1995, “Relative Uniform Resource Locators”, pp. 1-16.
M. Horton et al., AT&T Bell Laboratories, Dec. 1987, “Standard for Interchange of USENET Messages”, pp. 1-19.
T. Berners-Lee et al., Xerox Corporation, Dec. 1994, “Uniform Resource Locators (URL)”, pp. 1-25.
Donald H. Jones, IEEE Expert Magazine, Dec. 1995, “A Model for Commerce On The World Wide Web”, pp. 54-59.
R. Armstrong, D. Freitag, T. Joachims, and T.M. Mitchell. WebWatcher: a learning apprentice for the world wide web. In Proceedings of the 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments. Stanford, CA, 1995. AAAI Press.
Avrim Blum. Learning boolean functions in an infinite attribute space. In 22nd Annual Symposium on the Theory of Computing. ACM Press, 1990.
Avrim Blum. Empirical support for WINNOW and weighted majority algorithms: results on a calendar scheduling domain. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California, 1995. Morgan Kaufmann.
Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. In Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, pp. 382-391, May 1993. Submitted to the Journal of the ACM.
William W. Cohen. Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California, 1995. Morgan Kaufmann.
William W. Cohen. Learning to classify English text with ILP methods. In Luc De Raedt, editor, Advances in ILP. IOS Press, 1995.
William W. Cohen. Text categorization and relational learning. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California, 1995. Morgan Kaufmann.
William W. Cohen. Learning with set-valued features. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, 1996.
Ido Dagan and Shaun Engelson. Committee-based sampling for training probabilistic classifiers. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California, 1995. Morgan Kaufmann.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory, pp. 23-27. Springer-Verlag, 1995.
Donna Harman. Overview of the second text retrieval conference (TREC-2). Information Processing and Management, 3:271-289, 1995.
Robert Holte, Laine A
