Method and system for text analysis based on the tagging,...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

active

06658377

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a computer-implemented method and system of text analysis for enabling automated text processing.
2. Description of Related Art
In their pursuit of greater efficiency and profitability, businesses are increasingly replacing manual systems with computer technology and automated systems. Improvements in text processing technology and the development of electronic networks such as the Internet now readily permit spoken, handwritten, and scanned text to be recognized, processed, and stored as computer-accessible data. In view of the high cost of manually extracting information from such text, it is desirable to use computer technology to derive knowledge from it automatically. However, the nearly infinite variety of written and spoken text has proved to be an obstacle to the development of automated systems for analyzing content and deriving information from text.
Prior art technologies for automated text analysis can generally be categorized as “upstream technologies” that address the complexities of language (such as linguistics and natural language processing) and “downstream technologies” that enable computers to handle knowledge (such as machine understanding and artificial intelligence). These technologies are usually applied in isolation from each other, which has limited the overall potential of automated text analysis.
The prior art language and text analysis systems typically include a database module and a processing module. The database module contains definitions and/or semantic information corresponding to individual words. The processing module customarily performs a variety of processes upon the language or text to provide a simplified representation of the text that can be processed by a computer. Language or text analysis systems of this type are used by search engines and other information retrieval systems.
Certain prior art processing modules provide each word with a semantic tag and are therefore referred to as “taggers”. Processing modules can also be used to decompose a stream of text into individual sentences, fragments, and words. These individual words are sometimes referred to as “tokens,” and this analysis step is referred to as “tokenization”. Following tokenization, the stream of text can be subjected to further semantic or linguistic processing such as identification of basic units of grammar, subdivision into corresponding fragments, and application of higher level algorithms.
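The tokenization and tagging steps described above can be sketched as follows. This is a minimal illustration only: the ASCII-oriented regular expressions, the tiny `LEXICON` dictionary, and the tag names (`DET`, `NOUN`, `UNK`, and so on) are hypothetical placeholders rather than the tag set of any real tagger, and a practical system would use a trained tagger instead of a lookup table.

```python
import re

# Hypothetical toy lexicon standing in for a real tagger's dictionary;
# the tag names are illustrative, not from any published tag set.
LEXICON = {
    "the": "DET", "staff": "NOUN", "food": "NOUN", "was": "VERB",
    "very": "ADV", "friendly": "ADJ", "cold": "ADJ",
}


def split_sentences(text: str) -> list[str]:
    # Break after sentence-final punctuation that is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    # A token is a run of ASCII letters/digits (optionally followed by one
    # apostrophe group, as in "don't") or a single punctuation character.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\w\s]", sentence)


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Look each word up in the lexicon; unknown words get "UNK" and
    # punctuation gets "PUNCT".
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, LEXICON.get(tok.lower(), "UNK")))
    return tagged
```

Chaining the three functions (split into sentences, tokenize each sentence, then tag the tokens) mirrors the order described in the text: decomposition first, semantic labeling second.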
The prior art language and text analysis systems are subject to several known disadvantages. Some prior systems require symbolic representations of the words and/or tokens. Many prior systems are characterized by excessive and unnecessary levels of processing. Furthermore, to analyze text, many prior art systems require an understanding of the precise meaning of the language or text.
The prior art language and text analysis systems cannot readily be configured to automatically determine information such as the relevance, weight, or quantification of language or text. In particular, the prior art systems are not effective in deriving such information in correlation with the underlying purpose of the text. A system capable of doing so would be particularly suitable for automatically processing data acquired in response to a particular inquiry, such as survey results.
It would be an advantage to provide a method and system for automatically analyzing text. It would be a further advantage if such method and system were available to automatically convert text into a format that could be further automatically processed to derive information regarding text content.
SUMMARY AND OBJECTS OF THE INVENTION
The invention is a method and system for text analysis. In the invention, a computer is used to analyze, parse, and manipulate natural language text according to a series of specific steps. Text is decomposed into small, homogeneous segments that can be readily correlated to one another, to quantitative data, or to a knowledge database. The invention thereby enables the automated processing, analysis, and comparison of differing text streams to derive information and conclusions therefrom, and/or to build new or add to existing knowledge databases.
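To illustrate what correlating segments with quantitative data might look like, the sketch below averages a numeric score (for example, a rating attached to each survey response) over every response that produced a given segment. The input shape, the function name `mean_score_per_segment`, and the example segments are assumptions made for illustration; the text does not specify how the correlation is computed.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_segment(responses):
    """Average a numeric score per segment.

    responses: iterable of (segments, score) pairs, where segments is a list
    of normalized segment strings and score is a number such as a
    (hypothetical) survey rating.
    """
    scores = defaultdict(list)
    for segments, score in responses:
        # Count each distinct segment at most once per response.
        for seg in set(segments):
            scores[seg].append(score)
    return {seg: mean(vals) for seg, vals in scores.items()}
```

Because segments are small and homogeneous, identical segments from different responses collapse onto the same key, which is what makes this kind of aggregation possible.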
In the preferred embodiment of the present invention, the words of an input text are labeled with semantic tags. In one embodiment, the input text is acquired from a response to one or more input requests or prompts, such as a survey. A series of operations is then performed on the semantically labeled input text. These operations can include splitting text, translating idioms, combining text, editing word tags, deleting unnecessary or superfluous words, identifying phrases or combinations, and rearranging expressions.
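One way to picture the series of operations is as a chain of functions applied in order to tagged fragments. The sketch below implements three of the listed operations (translating idioms, splitting text, and deleting superfluous words). The idiom table, the tag names, and the choice to split on a conjunction tag are hypothetical, chosen only to make the example concrete.

```python
# A fragment is a list of (word, tag) pairs; every operation takes one
# fragment and returns zero or more fragments, so operations compose.
Fragment = list[tuple[str, str]]

# Hypothetical idiom table: idiom word sequence -> plain tagged replacement.
IDIOMS = {("over", "the", "moon"): [("happy", "ADJ")]}
# Hypothetical tags treated as superfluous and deleted.
SUPERFLUOUS_TAGS = {"DET", "PUNCT"}
# Hypothetical tag at which text is split into separate fragments.
SPLIT_TAG = "CONJ"


def translate_idioms(frag: Fragment) -> list[Fragment]:
    out, i = [], 0
    while i < len(frag):
        for idiom, replacement in IDIOMS.items():
            window = tuple(w.lower() for w, _ in frag[i:i + len(idiom)])
            if window == idiom:
                out.extend(replacement)  # replace the whole idiom
                i += len(idiom)
                break
        else:  # no idiom starts here: keep the token unchanged
            out.append(frag[i])
            i += 1
    return [out]


def split_text(frag: Fragment) -> list[Fragment]:
    # Start a new fragment at each split tag; the split token is dropped.
    parts, current = [], []
    for word, t in frag:
        if t == SPLIT_TAG:
            if current:
                parts.append(current)
            current = []
        else:
            current.append((word, t))
    if current:
        parts.append(current)
    return parts


def delete_superfluous(frag: Fragment) -> list[Fragment]:
    kept = [(w, t) for w, t in frag if t not in SUPERFLUOUS_TAGS]
    return [kept] if kept else []


def run_pipeline(fragments: list[Fragment], operations) -> list[Fragment]:
    # Apply each operation to every fragment, flattening the results.
    for op in operations:
        fragments = [new for frag in fragments for new in op(frag)]
    return fragments
```

Because every operation returns a list of fragments, splitting and deleting fit the same interface, and the order of operations is simply the order of the list passed to `run_pipeline`.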
Text fragments are portions of the input text that are obtained as an output of any intermediate step of the present invention. The combination of words that is generated at the completion of the text analysis is a segment that can then be further processed, for example, by a computer to derive statistical information, to generate a report, or to build a knowledge database.
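As one example of further processing, the final segments produced from many inputs (for instance, many survey responses) can be tallied into a simple frequency report. The function name, the report format, and the choice to count each segment at most once per response are assumptions for illustration.

```python
from collections import Counter


def segment_report(segments_per_response: list[list[str]], top_n: int = 3) -> list[str]:
    """Tally how many responses produced each segment and format the most
    frequent segments as report lines."""
    total = len(segments_per_response)
    if total == 0:
        return []
    counts = Counter()
    for segments in segments_per_response:
        counts.update(set(segments))  # count a segment at most once per response
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [f"{seg}: {n}/{total} responses ({100 * n / total:.0f}%)" for seg, n in ranked]
```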
In another embodiment, the various operations to be performed upon the text portions comprise the steps of searching the text for particular combinations of words and/or tags, and changing the combination according to a corresponding prescription or rule. In yet another embodiment, the step of providing each word with a semantic tag can be accomplished using a commercially available tagging program, such as CLAWS, developed by the University of Lancaster, England.
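The search-and-change step can be sketched as pattern rules over (word, tag) pairs: the text is scanned left to right for a sequence matching a rule's pattern, and each match is replaced by the rule's prescribed output. The `Rule` structure, the wildcard convention (`None` matches anything), and the example negation rule are hypothetical; the text does not specify how rules are encoded, and this sketch does not use CLAWS.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Token = tuple[str, str]  # (word, tag)


@dataclass
class Rule:
    # Each pattern element is (word, tag); None acts as a wildcard.
    pattern: list[tuple[Optional[str], Optional[str]]]
    # Builds the replacement tokens from the matched tokens.
    rewrite: Callable[[list[Token]], list[Token]]


def matches(elem: tuple[Optional[str], Optional[str]], token: Token) -> bool:
    word, tag = elem
    return (word is None or word == token[0].lower()) and (tag is None or tag == token[1])


def apply_rule(tokens: list[Token], rule: Rule) -> list[Token]:
    n = len(rule.pattern)
    if n == 0:  # an empty pattern would never advance; leave text unchanged
        return list(tokens)
    out, i = [], 0
    while i < len(tokens):
        window = tokens[i:i + n]
        if len(window) == n and all(matches(e, t) for e, t in zip(rule.pattern, window)):
            out.extend(rule.rewrite(window))  # apply the prescription
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical rule: merge a negation with the adjective that follows it.
NEGATE_ADJ = Rule(
    pattern=[("not", None), (None, "ADJ")],
    rewrite=lambda m: [(f"not_{m[1][0].lower()}", "ADJ")],
)
```

Applying a list of such rules one after another gives a simple, data-driven way to express the "combination plus prescription" idea without hard-coding each change.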
In a further embodiment, an initial preparation step can be performed upon the roughly separable text portions before the other recited steps, such as the step of providing words with semantic tags. This initial preparation step may include spell-checking, character replacement, parsing the roughly separable text into smaller preliminary fragments, and/or a variety of other cleaning operations. One purpose of this step may be to format the stream of text to fit a set of prescribed parameters for a commercially available tagging program.
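A minimal sketch of such a preparation step, assuming the downstream tagger expects ASCII quotation marks and input fragments of bounded length. The character map, the 200-character limit, and the splitting heuristics are hypothetical stand-ins for whatever parameters a real tagger prescribes, and spell-checking is omitted.

```python
import re

# Hypothetical character replacements: typographic quotes -> ASCII quotes.
CHAR_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})
MAX_FRAGMENT_CHARS = 200  # hypothetical input limit of a downstream tagger


def prepare(raw: str) -> list[str]:
    text = raw.translate(CHAR_MAP)
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    # Split into preliminary fragments after sentence punctuation or semicolons.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?;])\s+", text) if p.strip()]
    fragments = []
    for p in pieces:
        # Wrap overly long pieces at the last space before the limit.
        while len(p) > MAX_FRAGMENT_CHARS:
            cut = p.rfind(" ", 0, MAX_FRAGMENT_CHARS)
            if cut <= 0:  # no usable space: cut at the limit itself
                cut = MAX_FRAGMENT_CHARS
            fragments.append(p[:cut].strip())
            p = p[cut:].strip()
        if p:
            fragments.append(p)
    return fragments
```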


