Systems and methods for organizing text

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

Type: Reexamination Certificate
Status: active
Patent number: 06411962

Description

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates to systems and methods for organizing a collection of electronic text passages.
2. Description of Related Art
Document retrieval systems, such as World-Wide Web search engines, typically produce a set of result documents in response to a user's query. These search results are organized as a linear list of documents, typically ranked according to a degree of matching with the query. The documents are typically displayed by document title and, in some cases, are accompanied by a short extract from the beginning of the document or by an excerpted summary obtained from the document. The user navigates by viewing the list of titles and/or the extracted text, and successively accessing the documents in an arbitrary order. Words in the extracted documents that correspond to the words used in the query may be highlighted to facilitate review of the document by the user.
U.S. Pat. No. 5,708,825 discloses a system that uses automatically-identified terms to navigate or index document content, without requiring a query to be supplied by a user. This system automatically produces term-based indices. The indexed terms are presented as an alphabetically ordered list.
U.S. Pat. Nos. 5,519,608 and 5,696,962 describe document retrieval systems in which a user inputs a query in natural language, and in which terms are produced that are responsive to the query. The terms are called “answer hypotheses” because they are chosen as being possible answers when specific questions are input.
The World-Wide Web search engine Excite produces words or terms as an aid to the user in formulating a new query. In this system, search results are presented traditionally, as simple ranked lists of document titles, each with attendant summary information intended to be representative of the document as a whole.
The Hyper-Index Browser Prototype generates a “hyper-index” from the search results for a query and allows navigation by terms created from the search results, and also uses the terms for purposes of query expansion. It appears that all result terms shown to the user contain words that were part of the query. It further appears that all terms presented to the user must include all of the query terms.
U.S. Pat. Nos. 4,972,349 and 5,062,074 describe methods that recursively segment a document collection into separate non-overlapping groups of whole documents. Each new group is determined by the most frequently occurring word in the current group, and is labeled by that word. The recursive application of this method yields a hierarchical, or “tree”, description. This hierarchy is organized according to a maximum frequency count of a word.
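The recursive segmentation described above can be illustrated roughly as follows. This is only a minimal sketch of the general idea as summarized here, not the method of the cited patents: the function name `split_by_top_word`, the `"(other)"` label for documents lacking the chosen word, the `min_size` cutoff, and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import Counter

def split_by_top_word(docs, used=frozenset(), min_size=2):
    """Recursively partition whole documents into non-overlapping groups.

    docs: list of (doc_id, set_of_words) pairs.
    Each group is labeled by the unused word that occurs in the most
    documents of the current group. Leaves are lists of document ids.
    """
    counts = Counter(w for _, words in docs for w in words if w not in used)
    if len(docs) < min_size or not counts:
        return [doc_id for doc_id, _ in docs]
    # Most frequent word; ties broken alphabetically (an assumption).
    label = min(counts, key=lambda w: (-counts[w], w))
    with_word = [d for d in docs if label in d[1]]
    without = [d for d in docs if label not in d[1]]
    tree = {label: split_by_top_word(with_word, used | {label}, min_size)}
    if without:
        # Hypothetical catch-all label for documents lacking the word.
        tree["(other)"] = split_by_top_word(without, used | {label}, min_size)
    return tree

docs = [
    ("d1", {"cat", "pet"}),
    ("d2", {"cat", "toy"}),
    ("d3", {"dog"}),
]
tree = split_by_top_word(docs)
```

In this sketch every document ends up under exactly one branch, which is the non-overlapping, whole-document property attributed to the prior methods.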
SUMMARY OF THE INVENTION
This invention provides systems and methods for organizing text content of one or more text passages, such as text passages obtained in response to a search query, and/or other text passages, not obtained in response to a search query, using an organization based on concept terms obtained from the one or more text passages.
This invention separately provides methods and/or systems for organizing text content of at least one text passage, which may or may not have been obtained in response to a search query.
A hierarchical structure is used to organize the documents in a way that informs the user about co-occurrence relationships among terms that represent concepts, indicating the relative degree of co-occurrence and context of discussion of the terms within the search results.
In various exemplary embodiments, a plurality of terms from the at least one text passage are automatically selected, and at least some of the selected terms are organized into a hierarchy according to co-occurrence relationships among those terms. The hierarchy is then displayed.
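One way to picture organizing terms by co-occurrence is sketched below. It is a simplified illustration under stated assumptions, not the procedure of the exemplary embodiments: text units are assumed to be pre-tokenized sets of terms, and the function name `build_hierarchy` and the limits `max_children` and `max_depth` are hypothetical.

```python
from collections import Counter

def build_hierarchy(units, path=(), max_children=3, max_depth=3):
    """Nest terms by how often they co-occur within text units.

    units: list of sets of terms, one set per text unit (e.g. a sentence).
    path: terms already chosen on the way down; every unit in `units`
          contains all of them.
    Returns a list of (term, count, children) tuples, most frequent first,
    where count is the number of units containing the term together with
    every term in `path`.
    """
    if len(path) >= max_depth:
        return []
    counts = Counter(t for u in units for t in u if t not in path)
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))[:max_children]
    nodes = []
    for term in ranked:
        sub_units = [u for u in units if term in u]
        children = build_hierarchy(sub_units, path + (term,),
                                   max_children, max_depth)
        nodes.append((term, counts[term], children))
    return nodes

units = [
    {"solar", "panel", "cost"},
    {"solar", "panel"},
    {"solar", "cost"},
    {"wind", "cost"},
]
hierarchy = build_hierarchy(units)
```

Each count under a child term reflects co-occurrence with all of its ancestor terms, so the nesting itself conveys the relative degree of co-occurrence. The same term may appear in more than one branch.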
Before displaying a final hierarchy, one or more candidate hierarchies may be generated, with one or more respective candidate terms placed in the most-dominant position of the hierarchy or respective hierarchies. The one or more candidate hierarchies can be evaluated, and a final hierarchy for display can be selected based on the evaluation.
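The generate-evaluate-select step above might look like the following sketch. The evaluation criterion is not specified in this summary, so the `score` function here (text units covered multiplied by the number of distinct co-occurring terms) is purely a hypothetical stand-in, as are the names `candidate_hierarchy` and `select_hierarchy` and the two-level shape of each candidate.

```python
from collections import Counter

def candidate_hierarchy(units, root):
    """Build a two-level candidate with `root` in the most-dominant position."""
    covered = [u for u in units if root in u]
    children = Counter(t for u in covered for t in u if t != root)
    return {"root": root, "units": len(covered), "children": dict(children)}

def score(hierarchy):
    # Hypothetical evaluation: favor roots that cover many text units
    # and expose many distinct co-occurring terms.
    return hierarchy["units"] * len(hierarchy["children"])

def select_hierarchy(units, candidate_roots):
    """Generate one candidate per root term and keep the best-scoring one."""
    candidates = [candidate_hierarchy(units, r) for r in candidate_roots]
    # max() keeps the first candidate when several share the top score.
    return max(candidates, key=score)

units = [
    {"solar", "panel", "cost"},
    {"solar", "panel"},
    {"solar", "cost"},
    {"wind", "cost"},
]
best = select_hierarchy(units, ["solar", "cost", "wind"])
```

Any other evaluation function could be substituted for `score` without changing the surrounding structure.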
Selectable elements may be associated with at least one term of a hierarchy such that, when the selectable element is selected, a text passage associated with the term is displayed. In some exemplary embodiments, the display space required to indicate the content of many documents is reduced. This allows a user to view more results in a given display frame of a display device.
In some exemplary embodiments, terms are used that expose terminology contained in search results. This improves user feedback and provides the user with at least a preliminary indication of the content of the results, beyond the terminology used in a search query.
In some exemplary embodiments, organization continues until the text has been broken into the smallest possible concepts. This provides a finer level of description.
In the systems and methods according to this invention, document content can be summarized with or without a query supplied by a user. Furthermore, the internal content of documents, rather than entire documents, can be organized. This allows a finer level of description.
Additionally, terms can be organized according to their co-occurrence with other terms in a document or group of documents. This allows a finer level of description than when words or terms are organized only by their individual maximum frequency in a given group of documents.
Furthermore, in the systems and methods according to this invention, rather than relying on a single frequently-occurring word to label a group of different documents, a label term is used to label text units containing that term. The relation between a label term and a text unit containing the label term is therefore clearer than in the above-described prior method that uses a single label to characterize a group of whole documents.
Additionally, according to this invention, text units from a document may be referred to from arbitrary places in the tree. For example, the text units reached from a selectable element associated with a particular term may freely mix the content of several different documents. This provides a more useful organization than in the above-described prior methods in which, once a document is assigned to a label, that document's content cannot be referred to by any parts of the tree that are not dominated by the label. Furthermore, according to this invention, document content need not be segmented into non-overlapping groups. Rather, overlapping tree relationships can be built on the same content.
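The relationship between terms, selectable elements, and text units can be sketched as a simple index. This is an illustrative assumption about one possible data structure, not the implementation described in the patent: the names `index_text_units` and `select`, the whitespace tokenization, and the sample documents are all hypothetical.

```python
from collections import defaultdict

def index_text_units(documents, terms):
    """Map each term to the text units that contain it.

    documents: dict of doc_id -> list of text units (strings).
    terms: iterable of lowercase terms to index.
    """
    index = defaultdict(list)
    for doc_id, text_units in documents.items():
        for unit in text_units:
            # Naive tokenization (an assumption): lowercase, drop periods.
            words = set(unit.lower().replace(".", " ").split())
            for term in terms:
                if term in words:
                    index[term].append((doc_id, unit))
    return index

def select(index, term):
    """Simulate activating the selectable element attached to `term`."""
    return index.get(term, [])

documents = {
    "A": ["Solar panels cut cost.", "Wind is cheap."],
    "B": ["Solar cost fell."],
}
index = index_text_units(documents, ["solar", "cost"])
solar_units = select(index, "solar")
cost_units = select(index, "cost")
```

Here the units reached from "solar" come from both documents A and B, and the same units are also reachable from "cost", so the groups overlap rather than partitioning the documents.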
These and other features and advantages of this invention are described in or are apparent from the following detailed description of exemplary embodiments.


REFERENCES:
patent: 4972349 (1990-11-01), Kleinberger
patent: 5062074 (1991-10-01), Kleinberger
patent: 5519608 (1996-05-01), Kupiec
patent: 5619709 (1997-04-01), Caid et al.
patent: 5696962 (1997-12-01), Kupiec
patent: 5708825 (1998-01-01), Sotomayor
patent: 5794050 (1998-08-01), Dahlgren et al.
patent: 5963940 (1999-10-01), Liddy et al.
patent: 5966126 (1999-10-01), Szabo
patent: 6076088 (2000-06-01), Paik et al.
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6154213 (2000-11-01), Rennison et al.
patent: 6185550 (2001-02-01), Snow et al.
patent: 6199067 (2001-03-01), Geller
patent: 6236987 (2001-05-01), Horowitz et al.
“Deriving Concept Hierarchies From Text,” Mark Sanderson and Bruce Croft, SIGIR '99, 8/99, Berkeley, CA, pp. 206-213.
