Text management system

Patent

: 1994-09-26
: 1999-08-17
: Hofsass, Jeffery

: 395760, 395794, G06F 1520
: Patent
: active
: 059406240
DESCRIPTION:

BRIEF SUMMARY
BACKGROUND OF THE INVENTION

The invention relates generally to text management systems.
Each year organizations spend countless hours searching through documents and images and organizing filing systems and databases. Even with large information retrieval systems, considerable resources are needed to index documents, guess which key words will locate needed information, search through pages one query at a time, and sort through all the irrelevant data that the search actually yields.
A number of studies evaluating large information retrieval systems show that these systems retrieve less than 20 percent of the documents relevant to a particular search, and that, at the same time, only 30 percent of the retrieved information is actually relevant to the intended meaning of the search request. One of the key reasons for poor retrieval results is that the people who perform retrieval know only the general topics of their interest and do not know the exact words used in the texts or in the keyword descriptors used to index the documents.
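The two percentages above correspond to what the information retrieval literature usually calls recall (the share of relevant documents that were retrieved) and precision (the share of retrieved documents that are relevant). The following is a minimal sketch of how the two measures are computed; the document identifiers and set sizes are hypothetical and were chosen only to reproduce the 20 percent and 30 percent figures.

```python
def recall(relevant, retrieved):
    """Fraction of the relevant documents that the search returned."""
    return len(relevant & retrieved) / len(relevant)


def precision(relevant, retrieved):
    """Fraction of the returned documents that are actually relevant."""
    return len(relevant & retrieved) / len(retrieved)


# Hypothetical collection: 30 documents are relevant to the query (ids 0..29).
relevant_docs = set(range(30))

# Hypothetical search result: 6 relevant documents plus 14 irrelevant ones.
retrieved_docs = set(range(6)) | set(range(100, 114))

print(f"recall    = {recall(relevant_docs, retrieved_docs):.0%}")
print(f"precision = {precision(relevant_docs, retrieved_docs):.0%}")
```

In this made-up case the searcher sees 20 documents, 14 of which are noise, and still misses 24 of the 30 documents that matter.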
Another study analyzed how long it would take to index 5000 reports. It was assumed that each user was allowed 10 minutes to review each report, make indexing decisions by selecting the keywords, and record the information. At this rate, it would take 833 hours or 21 weeks for one full-time person (at 40 hours per week) to process the documents. The users would also need extra time to verify and correct the data. Under such an approach, the user must index incoming documents on a daily basis to keep the system from falling hopelessly behind. In addition, since the user chooses the relevant search terms, all unspecified terms are eliminated for search purposes. This creates a significant risk that documents containing pertinent information may not show up during a search because of the user's subjective judgments in selecting keywords.
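The effort estimate in the study above follows from simple arithmetic. The sketch below restates it; the function name and parameter names are hypothetical, and the inputs are the figures quoted in the text.

```python
def indexing_effort(reports, minutes_per_report, hours_per_week):
    """Return (total_hours, weeks) needed by one full-time indexer."""
    total_hours = reports * minutes_per_report / 60
    weeks = total_hours / hours_per_week
    return total_hours, weeks


hours, weeks = indexing_effort(reports=5000, minutes_per_report=10, hours_per_week=40)
print(f"{hours:.0f} hours, {weeks:.0f} weeks")  # prints "833 hours, 21 weeks"
```

The estimate covers only the first pass; the extra time for verification and correction mentioned in the text is not included.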
Many text retrieval systems utilize index files which contain words in the documents with the location within the documents for each word. The indexes provide significant advantages in the speed of retrieval. One major disadvantage of this approach is that for most of the systems the overhead of the index is 50 to 100 percent of the document database. This means that a 100 Mbyte document database will require an index ranging from 50 to 100 Mbytes. This adds mass storage costs and overhead to the system.
Automated indexing processes have been proposed. For example, in the book, INTRODUCTION TO MODERN INFORMATION RETRIEVAL, by Salton and McGill (McGraw Hill, 1983) a process for automatically indexing a document is presented. First, all the words of the document are compared to a stop list. Any words which are in the stop list are automatically not included in the index. Then, the stems of the remaining words are generated by removing suffixes and prefixes. The generated atoms are then processed to determine which will be most useful in the search process. The inverse document frequency function is an example of such a process. The resulting index of this document, and other documents, may then be searched for articles relevant to the user.
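The automatic indexing steps described above (stop-list filtering, stemming, and term weighting) can be sketched as follows. This is an illustrative simplification, not the procedure given by Salton and McGill: the stop list and suffix list are hypothetical, the stemmer strips suffixes only (no prefixes), and the weight uses one common form of inverse document frequency, log(N / df).

```python
import math
import re
from collections import Counter

# Hypothetical stop list and suffix list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "for"}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase the text, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def inverse_document_frequency(documents):
    """Map each stem to log(N / df), where df counts documents containing it."""
    n = len(documents)
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(index_terms(text)))
    return {term: math.log(n / df) for term, df in doc_freq.items()}


docs = [
    "Indexing the documents",
    "Searching indexed documents",
    "The retrieval of reports",
]
idf = inverse_document_frequency(docs)
for term in sorted(idf, key=idf.get, reverse=True):
    print(f"{term:10s} {idf[term]:.3f}")
```

Stems that occur in fewer documents receive a higher weight, so a rare stem such as "report" is treated as more useful for discriminating between documents than a stem such as "document" that appears in most of them.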
The technique of truncating words by deleting prefixes and suffixes has also been applied to reduce storage requirements and accessing times in a text processing machine for automatic spelling verification and hyphenation functions. U.S. Pat. No. 4,342,085, issued Jul. 27, 1982 to Glickman et al., describes a method for storing a word list file and accessing the word list file such that legal prefixes and suffixes are truncated and only the unique root element, or "stem", of a word is stored. A set of unique rules is provided for prefix/suffix removal during compilation of the word list file and subsequent accessing of the word list file. Spelling verification is accomplished by applying the rules to the words whose spelling is to be verified, and application of said rules provides, under most circumstances, a natural hyphenation break point at the prefix-stem and stem-suffix junctions.
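The general idea of a stem-only word list can be sketched as follows. This is not the rule set of the Glickman et al. patent, which is not reproduced here; the prefix list, suffix list, stored stems, and function names are all hypothetical and serve only to show how one lookup can support both spelling verification and hyphenation.

```python
# Hypothetical affix lists and stem-only word list, for illustration only.
PREFIXES = ("re", "un", "pre")
SUFFIXES = ("ing", "ed", "ment", "s")
STEMS = {"print", "pay", "load"}


def analyze(word):
    """Return (prefix, stem, suffix) if the word reduces to a stored stem, else None."""
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)]
            if stem in STEMS:
                return prefix, stem, suffix
    return None


def is_spelled_correctly(word):
    """A word is accepted when stripping legal affixes yields a stored stem."""
    return analyze(word.lower()) is not None


def hyphenate(word):
    """Place break points at the prefix-stem and stem-suffix junctions."""
    parts = analyze(word.lower())
    if parts is None:
        return None
    return "-".join(part for part in parts if part)


print(hyphenate("reprinting"))             # prints "re-print-ing"
print(hyphenate("prepayment"))             # prints "pre-pay-ment"
print(is_spelled_correctly("reprintng"))   # prints "False"
```

Three stored stems cover dozens of surface forms, which is the storage saving the passage describes. A rule set this naive also accepts non-words such as "unpays", which is why a practical system needs rules restricting which affixes are legal for which stems.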

SUMMARY OF THE INVENTI

REFERENCES:
patent: 4342085 (1982-07-01), Glickman et al.
patent: 4864501 (1989-09-01), Kucera et al.
patent: 5323316 (1994-06-01), Kadashevich et al.
patent: 5369577 (1994-11-01), Kadashevich et al.
Salton, G. & McGill, M.J. "Introduction to Modern Information Retrieval," McGraw-Hill Book Company, 1983, pp. 71-75.
Ozkarahan, E. "Database Machines and Database Management," Prentice Hall 1986, pp. 498-522.

Affiliated with

Clark Cheryl

Inventor

Harvey Mary F.

Inventor

Kadashevich A. Julie

Inventor

Also associated with

Clapp Gary D.

Representative

Hill Andrew

Examiner

Hofsass Jeffery

Examiner

Paglierani Ronald J.

Representative

Wang Laboratories, Inc.

Corporate Assignee
