Method for generating numerical values indicative of...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06789084

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to information retrieval systems, and, more particularly, to information retrieval systems that direct users to objects containing information (e.g., images, text, and/or sounds).
2. Description of the Related Art
Information retrieval (IR) may be generally described as the study of systems for indexing, storing, searching, and retrieving data relevant to a human user's need for information. Indexing is the process of converting data within a collection of objects (e.g., documents) into a form suitable for easy search and retrieval. The goal of information retrieval (IR) systems is to direct the user to those objects in the collection that will best satisfy the user's need for information.
Almost all information retrieval (IR) systems today accept either Boolean text search queries, or text pattern search queries. Boolean text search queries typically include Boolean combinations of words (e.g., “information AND retrieval,” “vision OR sight,” “python AND (NOT monty)”). Text pattern search queries typically include word strings or phrases (e.g., “great barrier reef,” as opposed to the Boolean expression “great AND barrier AND reef”).
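The difference between the two query types can be illustrated with a minimal Python sketch. The helper names (`tokenize`, `boolean_and`, `phrase_match`) and the two sample documents are hypothetical and do not come from the patent; the sketch only shows that a Boolean AND query ignores word order and adjacency, while a text pattern query requires the words to appear consecutively.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_and(doc, *words):
    """True if every word occurs somewhere in the document (order ignored)."""
    tokens = set(tokenize(doc))
    return all(w in tokens for w in words)

def phrase_match(doc, phrase):
    """True if the words of the phrase occur consecutively and in order."""
    tokens = tokenize(doc)
    target = tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

# Hypothetical sample documents, invented for illustration.
docs = [
    "The Great Barrier Reef lies off the coast of Queensland.",
    "A great storm damaged the reef barrier near the harbor.",
]

# Boolean query: great AND barrier AND reef
boolean_hits = [boolean_and(d, "great", "barrier", "reef") for d in docs]
# Text pattern query: "great barrier reef"
phrase_hits = [phrase_match(d, "great barrier reef") for d in docs]

print(boolean_hits)
print(phrase_hits)
```

Both documents satisfy the Boolean expression, but only the first contains the phrase itself.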
A problem arises in that information retrieval (IR) systems typically rely upon measurements of similarity between objects in a collection during indexing, and between queries and stored data during data retrieval. Words and phrases, used by almost all information retrieval (IR) systems today, may be thought of as being “highly granular.” The resulting “fine granularity” leads to mistakes in recognizing degrees of similarity among objects during indexing, and between queries and objects during data retrieval. Such errors are foreign to human perception, which easily shifts its logical frame of reference to compensate for variations in granularity when judging similarity among objects.
For example, if a human subject were to visually scan two different areas of, for example, a beach, the subject would have little difficulty in recognizing both areas as belonging to the same general class of “Beach.” However, if a machine (e.g., a computer) were to consider only grains of sand from the two different areas of the beach, the machine might conclude that, due to differences in, for example, the size, texture, and/or coarseness of the grains of sand, the two areas do not in fact belong to the same general class of Beach.
Computers employed in information retrieval (IR) systems today make similar mistakes when attempting to determine degrees of similarity among objects during indexing, and between queries and objects during data retrieval. As a result, information retrieval (IR) systems are not always highly effective when retrieving data deemed relevant to a human user's need for information.
Known efforts to reduce the granularity problem inherent when using words and phrases for similarity measurement include statistical techniques such as latent semantic indexing (LSI) and singular value decomposition (SVD). In general, such techniques result in a smaller pool of words or phrases upon which to measure similarity among objects during indexing, and between queries and objects during data retrieval.
The Internet is a global network connecting millions of computers worldwide. In 1999, the Internet had over 200 million users in over 100 different countries. The World Wide Web (abbreviated WWW, and often referred to simply as “the Web”) is the portion of the Internet made up of servers supporting documents formatted according to the hypertext markup language (HTML). The hypertext markup language (HTML) supports links to graphics, audio, and video files, as well as links to other HTML documents (i.e., “hyperlinks”). Computer programs called “Web browsers” are commonly used to access HTML documents on the World Wide Web.
For example, assume an HTML document has links to graphics, audio, and/or video files, as well as links to other HTML documents. When a computer user accesses the HTML document, the graphics, audio, and/or video files may be presented on the user's computer. The user may transition from the HTML document to another HTML document simply by clicking on the link to the other HTML document.
The number of HTML documents accessible today via the Web may exceed one billion. “Search engines” are available to aid users in accessing specific HTML documents in this large number of HTML documents. A search engine is a computer program that accepts a user query including words called “keywords,” searches indexed HTML documents for the keywords, and returns a list of the indexed HTML documents including the keywords. The list typically includes hyperlinks to the corresponding HTML documents.
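The keyword lookup described above can be sketched with a small inverted index in Python. This is an illustrative sketch only, not the design of any particular search engine: the function names, the page names, and the page texts are hypothetical, and requiring a page to contain every keyword is an assumption, since the text does not say whether all or any of the keywords must match.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip('.,;:!?"')].add(doc_id)
    return index

def search(index, keywords):
    """Return sorted ids of documents containing every keyword (assumed AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical indexed pages, invented for illustration.
documents = {
    "page1.html": "Python is a programming language.",
    "page2.html": "The python is a large snake.",
    "page3.html": "Monty Python was a comedy group.",
}

index = build_index(documents)
results = search(index, ["python"])          # every page mentions the keyword
narrow = search(index, ["python", "snake"])  # only the page about the animal

print(results)
print(narrow)
```

A single keyword matches all three pages even though they concern unrelated topics, which is a small-scale picture of how keyword matching alone can surface results the user did not want.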
Despite careful design, search engines often return lists containing a large number of “junk results” along with a small number of “meaningful results” a user is interested in. Being more plentiful than the meaningful results, the junk results often obscure the meaningful results. As most users are only willing to look at the first few tens of results, the user may never discover highly relevant HTML documents in a list containing hundreds of results.
As the number of HTML documents available on the Web continues to grow, new information retrieval techniques are needed that will allow search engines to return greater numbers of “meaningful results” and/or smaller numbers of “junk results.”
The present invention is directed to methods that may solve, or at least reduce, some or all of the aforementioned problems, and to systems incorporating such methods.
SUMMARY OF THE INVENTION
A method is described that may be used to generate numerical values indicative of frequencies of selected features in one or more objects. The method includes arranging columns of a matrix in sum total order, wherein the matrix has one or more rows, and multiple intersecting columns. Each of the rows of the matrix represents a different object, and each of the columns represents a different one of multiple selected features. Values reside at row-column intersections. A given value residing at an intersection of a given row and a given column corresponds to the given row and the given column, and represents the frequency of the selected feature represented by the given column in the object represented by the given row.
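The matrix layout and the column arrangement can be sketched in Python as follows. This is a sketch under stated assumptions: “sum total order” is read here as ordering the columns by their column totals, and descending order is an assumption because the text does not give a direction. The feature names, the frequency values, and the function names are hypothetical.

```python
def column_totals(matrix):
    """Sum each column of a row-major matrix (one row per object)."""
    return [sum(col) for col in zip(*matrix)]

def arrange_by_sum_total(matrix, features):
    """Reorder columns by column total.

    Descending order is an assumption. Ties keep their original
    relative order because Python's sort is stable.
    """
    totals = column_totals(matrix)
    order = sorted(range(len(features)), key=lambda j: -totals[j])
    new_features = [features[j] for j in order]
    new_matrix = [[row[j] for j in order] for row in matrix]
    return new_matrix, new_features

# Hypothetical data: rows are objects, columns are selected features,
# and each value is the frequency of that feature in that object.
features = ["shell", "sand", "palm", "wave"]
matrix = [
    [1, 3, 0, 0],
    [0, 5, 1, 2],
    [0, 1, 0, 4],
]

sorted_matrix, sorted_features = arrange_by_sum_total(matrix, features)
print(sorted_features)
for row in sorted_matrix:
    print(row)
```

Here the column totals are 1, 9, 1, and 6, so the columns are rearranged as sand, wave, shell, palm.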
The matrix is converted to a binary matrix (e.g., comprising binary values ‘0’ and ‘1’). The columns of the matrix are divided into multiple segments of equal length. The matrix columns encompassed by each segment are replaced by a single column. The value at the intersection of each row and each such single column is set to a numerical value indicative of a ratio: the total number of one of the binary values (e.g., the number of ‘1’s) in the portion of the row encompassed by the corresponding segment, to the total number of that binary value (e.g., the number of ‘1’s) in the entire row.
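The binarization and segment-reduction steps can be sketched in Python as below. This is a minimal sketch, not the patented implementation. It assumes that any nonzero frequency maps to ‘1’, that the number of columns is an exact multiple of the segment length, that a row containing no ‘1’s yields ratios of 0.0, and that the plain ratio itself serves as the “numerical value indicative of” the ratio. The input matrix and function names are hypothetical.

```python
def binarize(matrix):
    """Convert frequencies to binary values (assumption: nonzero -> 1)."""
    return [[1 if v > 0 else 0 for v in row] for row in matrix]

def segment_ratios(binary_matrix, segment_length):
    """Replace each run of `segment_length` columns with one column.

    Each new value is (number of 1s in the segment) / (number of 1s in
    the whole row). Rows with no 1s get 0.0 (an assumption).
    """
    reduced = []
    for row in binary_matrix:
        if len(row) % segment_length != 0:
            raise ValueError("column count must be a multiple of segment_length")
        total_ones = sum(row)
        new_row = []
        for start in range(0, len(row), segment_length):
            ones = sum(row[start:start + segment_length])
            new_row.append(ones / total_ones if total_ones else 0.0)
        reduced.append(new_row)
    return reduced

# Hypothetical frequency matrix whose columns are already arranged.
matrix = [
    [3, 0, 1, 0],
    [5, 2, 0, 1],
    [1, 4, 0, 0],
]

binary = binarize(matrix)
reduced = segment_ratios(binary, segment_length=2)
for row in reduced:
    print(row)
```

With a segment length of 2, the four columns collapse to two. The second row, for instance, has three ‘1’s in total, two of them in the first segment, so its reduced values are 2/3 and 1/3.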
A computer system embodying the method is also described, as is a carrier medium including program instructions for carrying out the method. The carrier medium may be, for example, a computer-readable storage medium such as a floppy disk or a compact disk read only memory (CD-ROM) disk.


REFERENCES:
patent: 4495522 (1985-01-01), Matsunawa et al.
patent: 5675819 (1997-10-01), Schuetze
patent: 6615163 (2003-09-01), Rasoulian et al.
patent: 2002/0059161 (2002-05-01), Li
patent: 2002/0087508 (2002-07-01), Hull et al.
Joel L. Fagan “Automatic Phrase Indexing for Document Retrieval”, ACM 1987, pp. 91-101.*
Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A. (1990), “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, 41(6), 391-407.
