Reexamination Certificate
2000-03-09
2002-04-23
Black, Thomas (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06377945
ABSTRACT:
FIELD OF THE INVENTION
The present invention concerns a search system for information retrieval, particularly of information stored in the form of text, wherein a text T comprises words and/or symbols s and sequences S thereof, wherein the information retrieval takes place with a given or varying degree of matching between a query Q, wherein the query Q comprises words and/or symbols q and sequences P thereof, and retrieved information R comprising words and/or symbols and sequences thereof from the text T, wherein the search system comprises a data structure for storing at least a part of the text T, and a metric M which measures the degree of matching between the query Q and the retrieved information R, and wherein the search system implements search algorithms for executing a search, particularly a full text search on the basis of keywords kw.
The invention also concerns a method in a search system for information retrieval, particularly of information stored in the form of text, wherein a text T comprises words and/or symbols s and sequences S thereof, wherein the information retrieval takes place with a given or varying degree of matching between a query Q, wherein the query Q comprises words and/or symbols q and sequences P thereof, and retrieved information R comprising words and/or symbols and sequences thereof from the text T, wherein the search system comprises a data structure for storing at least a part of the text T, and a metric M which measures the degree of matching between the query Q and the retrieved information R, and wherein the search system implements search algorithms for executing a search, particularly a full text search on the basis of keywords kw, wherein the information in the text T is divided into words and word sequences S, the words being substrings of the entire text separated by word boundary terms and forming a sequence of symbols, and wherein each word is structured as a sequence of symbols in the word-forming sequence.
Finally, the invention concerns the use of the search system.
BACKGROUND OF THE INVENTION
A tremendous amount of information in various fields of human knowledge is collected and stored in computer memory systems. As computer memory systems are increasingly linked in publicly available data communication networks, there has been a growing effort to develop systems and methods for searching and retrieving information for public or personal use. Present search methods for data have, however, limitations that seriously reduce the possibility of efficiently retrieving and using information stored in this manner.
Information may be stored in the form of different data types, and in the context of information search and retrieval it is useful to distinguish between dynamic data and static data. Dynamic data is data that changes often and continuously, so that the set of valid data varies all the time, while static data changes only very seldom or never at all. For instance, economic data, such as stock values, or meteorological data are subject to very quick changes and are hence dynamic. On the other hand, archival stores of books and documents are usually permanent and hence static data. The concept of the volatility of data relates to how long the information is valid. The volatility of data has some bearing upon how the information should be searched and retrieved. Large volumes of data require some structure in order to facilitate searching, but the time cost of building such structures must not be higher than the time for which the data is valid. The cost of building a structure depends on the data volume, and hence the building of data structures for searching the information should take both the data volume and the volatility into consideration.
The information collected is stored in databases, and these may be structured or unstructured. Moreover, the databases may contain several types of documents, including compound documents which contain images, video, sound and formatted or annotated text. Structured databases in particular are usually furnished with indexes in order to facilitate searching and retrieving the data. The growth of the World Wide Web (WWW) offers a steadily growing collection of compound and hyperlinked documents. A great many of these are not collected in structured databases, and no indexes facilitating rapid searching are available. However, the need for searching documents in the World Wide Web is obvious, and as a result a number of so-called search engines have been developed, enabling searching of at least parts of the information in the World Wide Web.
A search engine is commonly understood to be one or more tools for searching and retrieving information. In addition to the search system proper, a search engine also contains an index, for instance comprising text from a large number of uniform resource locators (URLs). Examples of such search engines are Alta Vista, HotBot with Inktomi technology, Infoseek, Excite and Yahoo. All these offer facilities for performing search and retrieval of information in the World Wide Web. However, their speed and efficiency by no means match the huge amount of information available on the World Wide Web, and hence the search and retrieval efficiency of these search engines leaves much to be desired.
Searching a large collection of text documents can usually be done with several query types. The most common query type is matching and variants of this. By specifying a keyword or set of keywords that has to be present in the queried information, the search system retrieves all documents that fulfil this requirement. The basic search method is based on so-called single keyword matching. The keyword p is searched for, and all documents containing this word shall be retrieved. It is also possible to search for a keyword prefix p_j, and all documents where this prefix is present in any keyword in the documents will be retrieved. Instead of searching with single keywords, the search is sometimes based on so-called exact phrase matching, where the search uses several single keywords in a particular sequence. As is well known to persons skilled in the art, the matching of keywords in many search systems may be combined with Boolean operators, for instance AND, OR, and NOT, which allow a filtering of the information; e.g. using an AND phrase causes all documents containing both of the keywords linked by the AND operator to be returned. A NEAR operator has also been used for returning just the documents in which the matching keywords are located “near” to each other in the document text. In many structured databases the documents contained in the database have been annotated, e.g. provided with fields which denote certain parts or types of information in the document. This allows searching for matches in only parts of the documents and is useful when the type of queried information is known in advance.
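The query types described above can be illustrated with a minimal sketch in Python. It is not taken from the present disclosure: the positional inverted index, the function names (tokenize, build_index, match_keyword, match_prefix, match_phrase, match_near) and the sample documents are all assumptions introduced for illustration, and the NEAR window is measured in word positions, which is only one possible reading of "near".

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into words at non-alphanumeric boundaries."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build a positional inverted index: word -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def match_keyword(index, p):
    """Single keyword matching: all documents containing the word p."""
    return set(index.get(p, {}))

def match_prefix(index, prefix):
    """Prefix matching: all documents with any word starting with prefix."""
    return {doc_id
            for word, postings in index.items() if word.startswith(prefix)
            for doc_id in postings}

def match_phrase(index, words):
    """Exact phrase matching: the words occur consecutively, in this order."""
    hits = set()
    for doc_id, positions in index.get(words[0], {}).items():
        for start in positions:
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

def match_near(index, a, b, window):
    """NEAR: both words occur at most `window` word positions apart."""
    hits = set()
    for doc_id in match_keyword(index, a) & match_keyword(index, b):
        if any(abs(i - j) <= window
               for i in index[a][doc_id] for j in index[b][doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical sample collection, for illustration only.
docs = {
    1: "Stock values change quickly",
    2: "Archived books and documents change seldom",
    3: "Search engines index documents; stock search is slow",
}
index = build_index(docs)

# Boolean operators map directly onto set operations on the result sets.
stock_and_documents = match_keyword(index, "stock") & match_keyword(index, "documents")
stock_or_documents = match_keyword(index, "stock") | match_keyword(index, "documents")
not_change = set(docs) - match_keyword(index, "change")
```

In this sketch the prefix search scans the whole vocabulary, which is acceptable for a toy example but is exactly the kind of cost that a dedicated search structure is meant to avoid.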
When searching in text documents the data are structured and most likely present in some natural language, like English, Norwegian etc. When searching for documents with a certain context it is possible to apply proximity metrics for matching keywords or phrases that match the query approximately. Allowing errors in keywords and phrases is a common method for proximity matching; using a thesaurus is another. A proximity search requires only that there be a partial match between the information retrieved and the query. The published international application WO96/00945, titled “Variable length data sequence matching method and apparatus” (Döringer et al.) and assigned to International Business Machines Corp., discloses the building, maintenance and use of a database with a trie-like structure for storing entries and retrieving at least a partial match, preferably the longest partial match or all partial matches, of a search argument (input key) from the entries.
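Two of the ideas in this paragraph can be sketched briefly in Python. The first function uses the Levenshtein edit distance as one possible proximity metric that allows errors in keywords; the text does not name a specific metric, so this choice is an assumption. The second part is a generic trie with longest-match lookup; it is not the structure of WO96/00945, only an illustration of the idea of retrieving the longest stored entry that partially matches an input key. All names (edit_distance, approx_match, Trie, longest_match) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-symbol insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]

def approx_match(vocabulary, query, max_errors):
    """Return the words within max_errors edits of query, in vocabulary order."""
    return [w for w in vocabulary if edit_distance(w, query) <= max_errors]

class Trie:
    """Generic trie in which each node represents one symbol of an entry."""

    def __init__(self):
        self.children = {}
        self.is_entry = False

    def insert(self, key):
        """Store key, creating one node per symbol as needed."""
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.is_entry = True

    def longest_match(self, key):
        """Return the longest stored entry that is a prefix of key, or None."""
        node = self
        best = "" if self.is_entry else None
        for i, ch in enumerate(key):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_entry:
                best = key[:i + 1]
        return best

# Hypothetical entries, for illustration only.
trie = Trie()
for entry in ("se", "sea", "search"):
    trie.insert(entry)
```

The edit-distance scan compares the query against every vocabulary word, so its cost grows with the vocabulary size; the trie lookup, by contrast, takes time proportional to the length of the input key.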
SUMMARY OF THE INVENTION
The main object of the present invention is to provide a search system and a method for fast and efficient search and retrieval of information in large volumes of data. Particularly it is an object of the present inve
Black Thomas
Fast Search & Transfer ASA
Jacobson & Holman PLLC
Veillard Jacques