Classification of information sources using graph structures

Data processing: database and file management or data structures – Database design – Data structure types


Details

Classifications: C345S440000, C345S215000
Type: Reexamination Certificate
Status: active
Patent number: 06598043

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to methods and apparatus for the classification of information sources and the display of information to a user.
BACKGROUND OF THE INVENTION
The increasing popularity of high-speed computer networking has made large amounts of data available to individuals. Methods used in the past for dealing with information were adequate when the amount of information was small, but they do not scale up to handle the enormous amount of information that is now easily accessible.
Research is a fundamental activity of knowledge workers, whether they are scientists, engineers or business executives. While each discipline may have its own interpretation of research, the primary meaning of the word is a “careful and thorough search.” In most cases, the thing one is searching for is information. In other words, one of the most important activities of modern educated individuals is searching for information. Whole industries have arisen to meet the need for thorough searching. These include libraries, newspapers, magazines, abstracting services and online search services.
Not surprisingly, the search process itself has been studied at least since the 1930s, and a standard model was developed by the mid-1960s. In this model, the searcher has an “information need” which the searcher tries to satisfy using a large collection or “corpus” of information sources. The information sources that satisfy the searcher's needs are the “relevant” information sources. The searcher expresses an information need using a formal statement called a “query.” Queries may be expressed using topics, categories and/or words. The query is then given to a search intermediary. In the past, the intermediary was a person who specialized in searching. It is more common today for the intermediary to be a computer system. Such systems are called information retrieval systems or online search engines. The search intermediary tries to match the topics, categories and/or words from the query with information sources in the corpus. The intermediary responds with a set of information sources that, so it is hoped, satisfies the searcher's needs.
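As an illustration of the word-matching step described above, the following is a minimal sketch (not taken from the patent; the corpus, identifiers and function names are hypothetical) of an intermediary that indexes a toy corpus and returns the sources containing every query word:

# Hypothetical sketch of a word-matching intermediary: build an inverted
# index from words to information sources, then answer a query with the
# sources that contain every query word (boolean AND).
corpus = {
    "doc1": "graph structures for classifying information sources",
    "doc2": "a careful and thorough search of a corpus",
    "doc3": "feature extraction from images and video streams",
}

index = {}
for doc_id, text in corpus.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(query):
    """Return the sources that contain all words of the query."""
    results = None
    for word in query.split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

print(search("information sources"))   # {'doc1'}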
In accordance with the standard model, another commonly used technique for finding information in a corpus is to start with a document and then follow the citations or references within it to other documents in the corpus. References in those documents are then used to find further documents. This technique is called “browsing,” and online browsing tools are now very popular. Such tools allow a searcher to quickly follow references contained in information sources, often by simply “clicking” on a word or picture within the information source. In the standard model for information retrieval, a sharp distinction is made between searching using queries and searching using references.
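Since browsing amounts to following links between documents, it can be viewed as a traversal of a citation graph. A minimal sketch (the graph and names are invented for illustration) using breadth-first search:

from collections import deque

# Invented citation graph: each document lists the documents it references.
citations = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": [],
    "D": ["A"],
}

def browse(start):
    """Return documents reachable from `start` by following references."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        doc = queue.popleft()
        order.append(doc)
        for ref in citations.get(doc, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return order

print(browse("A"))   # ['A', 'B', 'C', 'D']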
Computerized search engines have been developed to assist in information retrieval. Some are based primarily on matching words in a query with words in text documents. In practice, this means that this type of search engine cannot search effectively for features of images and other kinds of multimedia. Non-word-based techniques employ approaches to extracting relevant information that are distinct from those used in word-based systems and generally involve extracting data “features” from the raw data. Features of images, sound and video streams can be represented in a computer system as a set of data structures stored in a database.
Features can be as simple as the value of an attribute such as brightness of an image, but many features are more complicated and are thus represented using a complex data structure. Typically, features can be extracted from structured documents by parsing the document to produce data structures, and can be extracted from unstructured documents by using one of the many feature extraction algorithms that have been developed for implementation on a computer. As in the case of structured documents, feature extraction from an unstructured document produces data structures.
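To make this concrete, here is a small, hypothetical example (not the patent's method) of extracting the simple brightness feature mentioned above from raw pixel data and representing it as a data structure:

# Hypothetical example: extract a simple "brightness" feature from raw
# pixel intensities and represent it as a data structure (a dict here).
def extract_brightness(pixels):
    """Mean gray value of a 2-D list of pixel intensities (0-255)."""
    values = [v for row in pixels for v in row]
    return sum(values) / len(values)

image = [
    [10, 200, 30],
    [40, 50, 255],
]

feature = {
    "name": "brightness",                 # a feature with a single component
    "value": extract_brightness(image),   # and a single attribute value
}
print(feature)   # {'name': 'brightness', 'value': 97.5}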
A large variety of feature extraction algorithms has been developed for media such as sound, images and video streams. For a discussion of such algorithms, see The Ninth International Conference on Image Analysis and Processing, A. Del Bimbo, editor, v. 1311, Springer Verlag and Company, September 1997, which is incorporated in its entirety by reference.
The data structures that represent features typically conform to a “data model” for the database that determines the kinds of components and attribute values that are allowed. Each feature can have one or more values associated with components of the data structure that represents the feature. In the simplest case, the data structure can have a single component with an associated value, and the feature can be represented by one attribute of the object. Features that are more complex can be represented by several inter-related components, each of which may have attribute values. The data model for features at the domain level is often called an “ontology.” An ontology models knowledge within a particular domain, such as, for example, medicine. An ontology can include a concept network, specialized vocabulary, syntactic forms and inference rules. In particular, an ontology specifies the features that objects can possess as well as how to extract features from objects. When the extracted features are represented as a computer data structure, the data structure is called a “knowledge representation” of the information source.
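The following hedged sketch illustrates this terminology; the domain, feature names and classes are invented, not taken from the patent:

from dataclasses import dataclass

# Illustrative data model: a feature is made of inter-related components,
# each carrying attribute values; an ontology constrains which features
# objects in a given domain may possess.
@dataclass
class Component:
    name: str
    attributes: dict

@dataclass
class Feature:
    name: str
    components: list

# A toy ontology for a medical domain: permitted features per object type.
ontology = {"xray_image": {"brightness", "lesion_region"}}

# Knowledge representation of one information source: its extracted features.
lesion = Feature("lesion_region", [
    Component("bounding_box", {"x": 12, "y": 40, "w": 8, "h": 8}),
    Component("intensity", {"mean": 0.73}),
])
knowledge_rep = {"object_type": "xray_image", "features": [lesion]}

# Every extracted feature must be one the ontology allows for this object.
assert all(f.name in ontology["xray_image"] for f in knowledge_rep["features"])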
In the standard model, the quality of a search is measured using two numbers. The first number represents how thorough the search was. It is the fraction of the total number of relevant information sources that are presented to the searcher. This number is called the “recall.” If the recall is less than 100%, then some relevant information sources have been missed. The second number represents the fraction of the total number of information sources that are presented to the searcher that are judged to be relevant. This number is called the “precision.” If the precision is less than 100%, then some irrelevant information sources were presented to the searcher.
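As a worked example of the two measures (the numbers are invented for illustration):

# Worked example of recall and precision with invented numbers.
relevant = {"d1", "d2", "d3", "d4", "d5"}   # all relevant sources in the corpus
retrieved = {"d1", "d2", "d6", "d7"}        # sources presented to the searcher

hits = relevant & retrieved                 # {"d1", "d2"}
recall = len(hits) / len(relevant)          # 2/5 = 40%: three relevant sources missed
precision = len(hits) / len(retrieved)      # 2/4 = 50%: two irrelevant sources shown

print(f"recall={recall:.0%}, precision={precision:.0%}")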
The recall can always be increased by adding many more information sources to those already presented, which can decrease the precision. Similarly, the precision can be increased by reducing the number of references retrieved and presented to the searcher, which can decrease the recall. Ideally, the recall and precision should be balanced so as to achieve a search that is as careful and thorough as possible. However, typical online search engines can achieve only about 60% recall and 40% precision. Surprisingly, these performance rates have not changed significantly in the last 20 years.
The standard model for information retrieval uses recall and precision as measures of “relevance.” Relevance is a central concept in human (as opposed to computer) communication. This was recognized as early as the 1940s, when information science was first being formed as a discipline. The first formal in-depth discussion of relevance occurred in 1959, and the topic was discussed intensively during the 1960s and early 1970s. As a result of such discussions, researchers began to study relevance from a human perspective. The two best-known studies were by Cuadra and Katter and by Rees and Schultz, both of which appeared in 1967. The main conclusion of these studies is that the recall and precision rates used in the standard model for information retrieval do not accurately represent how people perceive relevance. People perceive an information source to be relevant if it extends their knowledge; thus, relevance is determined by the difference between what is known and what is yet to be known. For example, if a search uncovers an information source that is already known to a searcher, the searcher will consider the source to be redundant rather than relevant. However, in accordance with the standard model for information retrieval, such a source would nonetheless be counted as relevant.
