Reexamination Certificate
2000-04-24
2003-03-18
Kindred, Alford W. (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
active
06535873
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to electronic text and more particularly to searching and indexing of electronic text.
BACKGROUND ART
Current technologies in digital media storage have allowed text to be stored in electronic format on a magnetic medium or an optical medium such as a compact disc. Storing text in electronic format has many advantages, including space savings and near-effortless mass distribution if required. Perhaps the biggest advantage is the ability to quickly search through the electronic text to retrieve the desired information. Two important factors in text searches are the speed and accuracy of the search. With increasing computing power, speed is becoming less of a concern. However, accuracy is an area where significant improvements can still be made. Search accuracy is the ability to search and locate relevant information on the subject of interest. Several criteria have been used to describe search accuracy. Search precision is the ratio of relevant results returned to all results returned, and search recall (also known as sensitivity) is the ratio of relevant results returned to all relevant results that exist. Therefore, one goal of a search is to increase the search precision without severely reducing the search recall.
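The two measures defined above can be illustrated with a minimal sketch. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; they do not appear in the patent.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for sets of returned and relevant document ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 2 of them relevant,
# and 5 relevant documents exist in the collection overall.
p, r = precision_recall(
    returned={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d7", "d8", "d9"},
)
print(p, r)
```

In this example half of the returned results are relevant, while only two of the five relevant documents are found, which is the trade-off the stated goal refers to.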
The Internet is a gigantic set of databases linked together by a decentralized network. Because of this gigantic array of databases, there is a vast amount of data, or information, that can be searched for relevant information for a subject of interest. However, as the amount of data increases, the search accuracy decreases as there is more extraneous data.
Typical search engines on the Internet, such as Lycos and Infoseek, use keyword search methods. Keyword search methods involve parsing the documents in a database through a search engine and selecting the documents or sections that contain the keyword(s). With keyword searches, the search accuracy is usually very low. The keyword search returns many irrelevant results even though the results contain the keywords. This low accuracy is caused by words having different meanings in different contexts and also by search words being in close proximity but not being used together semantically in the text. Even searching with multiple keywords combined in Boolean expressions does not yield a significant increase in accuracy. This lack of accuracy may be acceptable in the Internet environment, where a user may have ample time to sift through the irrelevant results. However, mission-critical users in other environments may not be as tolerant, as time is of the essence in obtaining the relevant information.
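The limitation described above can be sketched in a few lines. This is not how Lycos or Infoseek were implemented; it is a simplified Boolean AND keyword match, and the function name `keyword_search` and both sample passages are hypothetical.

```python
import re


def keyword_search(documents, keywords):
    """Return ids of documents that contain every keyword (Boolean AND), ignoring case."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if wanted <= words:  # every keyword occurs somewhere in the document
            matches.append(doc_id)
    return matches


# Hypothetical passages: only "a" is about treating a disease with penicillin.
documents = {
    "a": "Penicillin is a first-line treatment for streptococcal pharyngitis.",
    "b": "The patient refused treatment; penicillin allergy was noted in the chart.",
}
results = keyword_search(documents, ["penicillin", "treatment"])
print(results)
```

Both passages are returned because both contain the two keywords, even though in the second passage the words are not used together semantically.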
Health-care professionals in clinical environments need precise and timely information if they are to provide optimal patient care. It has been shown that tertiary references such as textbooks or edited reviews could meet the majority of these information needs. However, precise and timely extraction of information from these tertiary sources calls for the development of a system to efficiently search and index them.
Researchers have developed a variety of systems to improve the indexing and searching of medical text sources, with the primary goal of increasing the search precision without severely reducing recall. For example, in an article titled “MYCIN II: design and implementation of a therapy reference with complex content-based indexing” Proc AMIA Symp 1998: 175-179, Kim and associates built MYCIN II, a prototype information retrieval (IR) system capable of searching content-based markup in an electronic textbook on infectious disease. Users select a query from a pre-determined set of query templates (the query model), and the query is passed to a search engine for processing.
In an article titled “Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease” Proc AMIA Symp 1998:975, Berrios and colleagues developed a markup tool that provided the HTML indexing required for the MYCIN II search engine. Because the tools in this system were developed independently with minimal integration, the domain expert must repeat a significant amount of work to generate both the ontology of concepts in the concept model used during the markup process and the set of questions for the search engine in the query model.
A need therefore exists for a method and a highly integrated system to search and index electronic text for precise information retrieval.
OBJECTS AND ADVANTAGES
Accordingly, it is a primary object of the present invention to provide a method and a highly integrated system that will significantly increase the search precision while reducing the time necessary to prepare a file of electronic text for searching.
SUMMARY
The primary object of the present invention is to provide a method and a highly integrated system that will significantly increase the search precision while reducing the time necessary to prepare a file of electronic text for searching.
Accordingly, the present invention consists of an electronic text indexing and search system comprising a concept model, a markup tool, a query model, a query interface, and a search engine.
The concept model defines a set of concept-value pairs. The concept model is modified by a concept model tool and new concept-values can also be added by a query model tool.
The query model defines a set of queries for submission to the search engine in terms of a first subset of concept-value pairs in the concept model. Each query in the query model is a template for a number of possible queries that are defined when a user selects concept-values from a menu.
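The relationship between the concept model and the query model can be sketched as follows. The data layout, the function name `instantiate`, and all concept names and values are assumptions made for illustration; the patent does not prescribe this representation.

```python
# Hypothetical concept model: each concept maps to its permitted values.
concept_model = {
    "topic": {"dosage", "side effects"},
    "drug": {"penicillin", "vancomycin"},
    "disease": {"pharyngitis", "endocarditis"},
}

# Hypothetical query model: each template names the concepts it draws on.
query_model = [
    {"text": "What is the {topic} of {drug}?", "concepts": ["topic", "drug"]},
    {"text": "Which drugs are used for {disease}?", "concepts": ["disease"]},
]


def instantiate(template, choices, concept_model):
    """Turn a query template plus menu choices into query text and concept-value pairs."""
    pairs = set()
    for concept in template["concepts"]:
        value = choices[concept]
        if value not in concept_model[concept]:
            raise ValueError(f"{value!r} is not a value of concept {concept!r}")
        pairs.add((concept, value))
    return template["text"].format(**choices), pairs


text, pairs = instantiate(
    query_model[0], {"topic": "dosage", "drug": "penicillin"}, concept_model
)
print(text)
print(sorted(pairs))
```

One template thus stands for as many concrete queries as there are combinations of values for its concepts, and only the resulting concept-value pairs need to reach the search engine.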
The markup tool uses the first subset of concept-values used in the query model to create a set of allowable concept-values for assignment. The domain expert assigns the allowable set of concept-value pairs to the text. The markup tool also has the ability to suggest assignment of query and markup tags to the domain expert for marking up the electronic text.
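A minimal sketch of this step is given below, assuming the concept model and query model are held as plain dictionaries and lists. The names `allowable_pairs`, `suggest`, and `assign` are hypothetical, and the suggestion step uses naive substring matching as a stand-in, since the suggestion method is not detailed here.

```python
def allowable_pairs(query_model, concept_model):
    """Collect the concept-value pairs for concepts that the query model actually uses."""
    allowed = set()
    for template in query_model:
        for concept in template["concepts"]:
            for value in concept_model[concept]:
                allowed.add((concept, value))
    return allowed


def suggest(passage, allowed):
    """Assumed heuristic: suggest pairs whose value literally appears in the passage."""
    lowered = passage.lower()
    return sorted(pair for pair in allowed if pair[1].lower() in lowered)


def assign(markup, passage_id, pair, allowed):
    """Record a pair on a passage, rejecting pairs outside the allowable set."""
    if pair not in allowed:
        raise ValueError(f"{pair!r} is not an allowable concept-value pair")
    markup.setdefault(passage_id, set()).add(pair)


# Hypothetical models; "organism" is in the concept model but unused by any query.
concept_model = {
    "drug": {"penicillin", "vancomycin"},
    "topic": {"dosage"},
    "organism": {"streptococcus"},
}
query_model = [{"text": "What is the {topic} of {drug}?", "concepts": ["topic", "drug"]}]

allowed = allowable_pairs(query_model, concept_model)
passage = "The usual dosage of penicillin depends on the infection being treated."
suggestions = suggest(passage, allowed)

markup = {}
for pair in suggestions:  # here the domain expert accepts every suggestion
    assign(markup, "p1", pair, allowed)
print(markup)
```

Restricting assignment to pairs reachable from the query model means the domain expert marks up only what a query can later retrieve.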
The user query interface is generated automatically by using the query model. The user query interface allows the user to formulate a query to submit to the search engine.
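Automatic generation of the interface from the query model might look like the following sketch. The function name `build_interface` and the dictionary layout are assumptions; an actual interface would render these menus in a browser or application rather than return them as data.

```python
def build_interface(query_model, concept_model):
    """Derive one form per query template, with a sorted menu for each concept it uses."""
    forms = []
    for template in query_model:
        menus = {c: sorted(concept_model[c]) for c in template["concepts"]}
        forms.append({"prompt": template["text"], "menus": menus})
    return forms


# Hypothetical models used only for this illustration.
concept_model = {
    "topic": {"dosage", "side effects"},
    "drug": {"penicillin", "vancomycin"},
}
query_model = [{"text": "What is the {topic} of {drug}?", "concepts": ["topic", "drug"]}]

forms = build_interface(query_model, concept_model)
for form in forms:
    print(form["prompt"])
    for concept, options in form["menus"].items():
        print(f"  {concept}: {options}")
```

Because the forms are derived from the models, adding a template or a concept-value changes the interface without any separate authoring step.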
The search engine tries to match the concept-values submitted by the query to the subset assigned by the markup tool. If there are any matches, the search engine displays a results page that shows an excerpt from the matching text and gives the user the option to output the query to an external database.
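The matching step can be sketched as below. The sketch assumes that a passage matches when every submitted concept-value pair has been assigned to it. This subset rule, the function name `search`, and the sample passages are illustrative assumptions rather than the claimed matching procedure.

```python
def search(query_pairs, marked_passages, excerpt_chars=50):
    """Return id and excerpt for each passage whose markup covers all query pairs."""
    query_pairs = set(query_pairs)
    results = []
    for passage in marked_passages:
        if query_pairs <= passage["pairs"]:  # assumed rule: all query pairs must be assigned
            results.append(
                {"id": passage["id"], "excerpt": passage["text"][:excerpt_chars]}
            )
    return results


# Hypothetical marked-up passages.
marked_passages = [
    {
        "id": "p1",
        "text": "The usual dosage of penicillin depends on the infection being treated.",
        "pairs": {("drug", "penicillin"), ("topic", "dosage")},
    },
    {
        "id": "p2",
        "text": "A penicillin allergy should be recorded in the chart.",
        "pairs": {("drug", "penicillin"), ("topic", "side effects")},
    },
]

results = search({("drug", "penicillin"), ("topic", "dosage")}, marked_passages)
for hit in results:
    print(hit["id"], hit["excerpt"])
```

Unlike a keyword match, the second passage is not returned even though it mentions penicillin, because its assigned topic differs from the one in the query.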
The user query interface can be a computer program that calls a function that selects the concept-values to be submitted to the search engine. The search engine can also output the search results, the concept-values assigned to the search results, or the original concept-values submitted by the query, to an external electronic resource.
REFERENCES:
patent: 5309359 (1994-05-01), Katz et al.
patent: 5404295 (1995-04-01), Katz et al.
patent: 5404506 (1995-04-01), Fujisawa et al.
patent: 5418948 (1995-05-01), Turtle
patent: 5737739 (1998-04-01), Shirley et al.
patent: 5787234 (1998-07-01), Molloy
patent: 5799268 (1998-08-01), Boguraev
patent: 5873056 (1999-02-01), Liddy et al.
patent: 5918232 (1999-06-01), Pouschine et al.
patent: 5940821 (1999-08-01), Wical
patent: 5963940 (1999-10-01), Liddy et al.
Kim et al., MYCIN II: Design and Implementation of a Therapy Reference with Complex Content-Based Indexing, Stanford Medical Informatics, SMI-98-0752, Oct. 1998.*
Jonathan M. Dugan et al., Automation and Integration of Components for Generalized Semantic Markup of Electronic Medical Texts, Stanford Medical Informatics.
AMIA'99, Annual Symposium, Nov. 6-10, 1999, Marriott Wardman Park Hotel, Washington DC.
Dugan et al., Automation and Integration of Components for Generalized Semantic Markup of Electronic Medical Texts, Stanford Medical Informatics, SMI-1999-0792, Apr. 1999.
Berrios et al., Knowledge Requirements for Automated Inference of Medical Textbook Markup, Stanford Medical Informatics, SMI-1999-0802, Jun. 1999.
Kim et al.,MYCIN II: Design and Implement
Berrios Daniel
Dugan Jonathan
Fagan Lawrence
Kindred Alford W.
Lumen Intellectual Property Services Inc.
The Board of Trustees of the Leland Stanford Junior University
System and method for indexing electronic text