Method and system for normalizing dirty text in a document

Data processing: presentation processing of document – operator i – Presentation processing of document – Layout

Reexamination Certificate


Details

C715S252000, C707S793000

Reexamination Certificate

active

07003725

ABSTRACT:
A method and system for normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases together with their known variations. Documents are run through the editor, and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with the corresponding standard terms from the thesaurus. The present invention also enables normalization of documents in cases where the list of standard terms must be inferred from the document corpus. The normalizer facilitates data mining applications that cannot function properly with dirty text, resulting in more accurate analysis of documents. As the thesaurus evolves and collects more words and phrases, the process of generating it becomes more automated.
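
The abstract does not disclose an implementation, so the following is only a minimal sketch of the two ideas it describes: replacing known variants with standard terms from a thesaurus, and inferring standard terms from a corpus when no list is given. Every name here (THESAURUS, normalize, infer_thesaurus, min_count, cutoff) is hypothetical, and the inference heuristic (frequent words are treated as standard, rare look-alikes as variants) is an assumption chosen for illustration, not the patented method.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical thesaurus: standard term -> known variants
# (misspellings, joined words, ad hoc abbreviations).
THESAURUS = {
    "hard drive": ["harddrive", "hard drv", "hdd"],
    "password": ["passwrd", "pwd", "pasword"],
}


def normalize(text, thesaurus):
    """Replace every known variant in text with its standard term."""
    # Invert the thesaurus so each variant points at its standard term.
    index = {
        variant.lower(): standard
        for standard, variants in thesaurus.items()
        for variant in variants
    }
    if not index:
        return text
    # Try longer variants first so multi-word variants are not shadowed.
    variants = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: index[m.group(1).lower()], text)


def infer_thesaurus(documents, min_count=3, cutoff=0.8):
    """Guess standard terms and variants from the corpus itself.

    Assumption: a word seen at least min_count times is standard, and a
    rarer word that closely resembles a standard word is a variant of it.
    """
    counts = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z]+", doc.lower())
    )
    standards = [w for w, c in counts.items() if c >= min_count]
    inferred = {s: [] for s in standards}
    for word, count in counts.items():
        if count >= min_count:
            continue
        match = get_close_matches(word, standards, n=1, cutoff=cutoff)
        if match:
            inferred[match[0]].append(word)
    return inferred
```

For example, normalize("Reset pwd after harddrive failure", THESAURUS) yields "Reset password after hard drive failure". A thesaurus produced by infer_thesaurus would normally be reviewed by a person before use, which matches the abstract's description of a process that becomes more automated only as the thesaurus grows.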

REFERENCES:
patent: 6353840 (2002-03-01), Saito et al.
patent: 6687873 (2004-02-01), Ballantyne et al.
patent: 2002/0103834 (2002-08-01), Thompson et al.

