Method and system for mining a document containing dirty text

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000

Reexamination Certificate

active

06978275

ABSTRACT:
A method and system for mining a document containing dirty text. Dirty text is removed or replaced, and the document is then processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, general cleaning is applied to all documents, regardless of the domain they belong to or the mining task to be performed. In the second stage, cleaning is specific to the anomalies of the domain and the mining task to be performed. In a third stage, after cleaning, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest-ranked sentences, and presents them as a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.
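
The staged flow described in the abstract (general cleaning, domain-specific cleaning, then sentence scoring and extraction) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not specify the cleaning rules or the relevance measure, so the regular expressions, the `STOPWORDS` set, the `replacements` mapping, the average term-frequency score, and all function names below are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "at",
             "is", "was", "after", "not", "will"}


def general_clean(text):
    """Stage 1 (assumed rules): domain-independent cleaning."""
    text = re.sub(r"<[^>]+>", " ", text)        # drop markup remnants
    text = re.sub(r"[^\x20-\x7E]", " ", text)   # drop non-printable debris (ASCII-only assumption)
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def domain_clean(text, replacements):
    """Stage 2 (assumed rules): replace domain-specific anomalies,
    e.g. abbreviations or recurring misspellings, via regex patterns."""
    for pattern, repl in replacements.items():
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, replacements, top_n=2):
    """Stage 3 (assumed scoring): rank sentences by the average document
    frequency of their non-stopword terms and return the top_n sentences
    in their original order."""
    cleaned = domain_clean(general_clean(text), replacements)
    sentences = split_sentences(cleaned)
    tokenized = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

For example, calling `summarize` on a short support note containing the abbreviation `drvr`, with `replacements = {r"\bdrvr\b": "driver"}`, would expand the abbreviation before scoring, so that sentences mentioning the driver reinforce each other. Keeping the two cleaning stages as separate functions mirrors the abstract's split between domain-independent and domain-specific cleaning, and lets the second stage be swapped per domain.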

REFERENCES:
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 4965763 (1990-10-01), Zamora
patent: 5754938 (1998-05-01), Herz et al.
patent: 5857179 (1999-01-01), Vaithyanathan et al.
patent: 6085206 (2000-07-01), Domini et al.
patent: 6199034 (2001-03-01), Wical
patent: 6308172 (2001-10-01), Agrawal et al.
patent: 6332138 (2001-12-01), Hull et al.
patent: 6374241 (2002-04-01), Lamburt et al.
patent: 6442545 (2002-08-01), Feldman et al.
patent: 6446061 (2002-09-01), Doerre et al.
patent: 6539376 (2003-03-01), Sundaresan et al.
patent: 6567789 (2003-05-01), Baker
patent: 2002/0103834 (2002-08-01), Thompson et al.
patent: 2002/0138528 (2002-09-01), Gong et al.
patent: 2002/0169788 (2002-11-01), Lee et al.
patent: 2002/0178002 (2002-11-01), Boguraev et al.
