Method and apparatus for content identification and...

Image analysis – Pattern recognition – Template matching

Reexamination Certificate


Details

C707S793000

active

06363174

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to data manipulation and categorization in general, and specifically to processing of textual data for categorization, content identification and authentication.
2. Background Information
With the advent of the electronic age and of the internet as a means of communicating and storing data, there is a need for systems that determine whether a given document was authored by a certain person, whether it is written in a particular language, or what type of material it deals with. Present methods of textual analysis do not address this need well. At best, it is currently possible to analyze a document using phrase or keyword searches and then have a human review the results in an attempt to determine the document's authorship, content, or language. What is needed is a methodology that produces a result a computer can readily analyze without human intervention. Also needed is a methodology that considers frequency of character utilization, keyword matches, and frequency of phrase occurrences all at once rather than discretely.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for content identification and categorization of data. In one embodiment, a Burrows-Wheeler Transform is performed on a document of textual data to produce a set of transformed textual data. The transformed textual data is divided into a set of one or more intervals. The transformed textual data of that set of intervals is transformed to produce a pattern map. The pattern map is compared to a reference pattern map thereby producing an indication of whether the subject textual data is of a type corresponding to the reference pattern map.
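The four steps summarized above (transform, divide into intervals, build a pattern map, compare against a reference) can be illustrated with a minimal Python sketch. The summary does not define the pattern map or the comparison, so everything past the Burrows-Wheeler Transform is an assumption made for illustration: the pattern map is taken to be a per-interval character histogram, the comparison is a mean cosine similarity, and the names `bwt`, `pattern_map`, `similarity`, and `n_intervals` are hypothetical rather than taken from the patent.

```python
import math
from collections import Counter
from typing import List


def bwt(text: str, eos: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column.

    The end-of-string sentinel `eos` is an illustrative choice; it must not
    occur in the input and sorts before ordinary characters.
    """
    if eos in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def pattern_map(transformed: str, n_intervals: int = 4) -> List[Counter]:
    """ASSUMPTION: the pattern map is one character histogram per interval.

    The transformed text is cut into exactly `n_intervals` contiguous slices
    of near-equal length (a slice may be empty for very short inputs).
    """
    n = len(transformed)
    bounds = [n * k // n_intervals for k in range(n_intervals + 1)]
    return [Counter(transformed[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def _cosine(x: Counter, y: Counter) -> float:
    """Cosine similarity of two histograms; 0.0 if either is empty."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def similarity(subject: List[Counter], reference: List[Counter]) -> float:
    """ASSUMPTION: compare maps by averaging per-interval cosine similarity."""
    if len(subject) != len(reference):
        raise ValueError("pattern maps must have the same number of intervals")
    if not subject:
        return 0.0
    return sum(_cosine(a, b) for a, b in zip(subject, reference)) / len(subject)
```

Under these assumptions, `bwt("banana")` returns `"annb\0aa"`, and a subject document would be scored with `similarity(pattern_map(bwt(subject_text)), pattern_map(bwt(reference_text)))`, with a threshold on the score deciding whether the subject matches the reference type. The rotation sort is quadratic in memory and is only suitable for short inputs; a suffix-array construction would be the usual replacement for real documents.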


REFERENCES:
patent: 4996707 (1991-02-01), O'Malley et al.
patent: 5077668 (1991-12-01), Doi
patent: 5390259 (1995-02-01), Withgott et al.
patent: 6075470 (2000-06-01), Little et al.
patent: 6119120 (2000-09-01), Miller
M. Burrows and D.J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, Digital Systems Research Center Research Report 124; http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/abstracts/src-rr-124.html.
M. Nelson, Data Compression with the Burrows-Wheeler Transform, Dr. Dobb's Journal, Sep. 1996; http://web2.airmail.net/markn/articlesbwt/bwt.htm.
Ning Lu, Fractal Imaging, Chapter 12. Entropy Coding, Academic Press, 1997.
Y. Yang, An Evaluation of Statistical Approaches to Text Categorization, Preprint, Apr. 10, 1997.
W.W. Cohen, Learning to Classify English Text with ILP Methods, Aug. 8, 1995.
D.D. Lewis, R.E. Schapire, J.P. Callan, and R. Papka, Training Algorithms for Linear Text Classifiers, in Proceedings of 19th ACM SIGIR Conference on R&D in Information Retrieval, 1996, pp. 298-306.
