Filtering invalid tokens from a document using high IDF...


Reexamination Certificate


Date: 2011-03-15
Examiner: Trujillo, James (Department: 2159)
Classification: Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval
Classification codes: C707S750000, C707S754000, C704S010000, C717S174000
Status: active
Patent number: 07908279

ABSTRACT:
Systems and methods are described for filtering tokens from a document in order to determine whether the document describes subject matter substantially similar to that of another document. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected, and the token within the selected field having the highest inverse document frequency (IDF) is selected. Tokens having a higher IDF than the selected token are then removed. Using the remaining tokens, a determination is made as to whether the first document describes subject matter substantially similar to that described by a second document, and an indication of the result of this determination is provided.
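The abstract does not fix an IDF formula or a similarity measure, so the following is only a minimal Python sketch of the filtering step under stated assumptions: a document is modeled as a dict mapping field names to token lists, IDF uses the common form log(N / df) computed over a small corpus, and the "substantially similar" test is stood in for by Jaccard overlap with a hypothetical 0.8 threshold. The function names (`compute_idf`, `filter_high_idf_tokens`, `is_substantially_similar`) and the toy corpus are illustrative and are not taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, List

# Assumption: a document is a dict mapping field name -> list of tokens.
Document = Dict[str, List[str]]


def compute_idf(corpus: List[Document]) -> Dict[str, float]:
    """IDF(t) = log(N / df(t)), where df(t) counts documents containing t in any field."""
    n_docs = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        # Count each token at most once per document.
        df.update({tok for tokens in doc.values() for tok in tokens})
    return {tok: math.log(n_docs / count) for tok, count in df.items()}


def filter_high_idf_tokens(doc: Document, field: str, idf: Dict[str, float]) -> Document:
    """Remove every token whose IDF exceeds that of the highest-IDF token in `field`."""
    # Assumption: tokens missing from the IDF table are treated as maximally rare.
    def idf_of(tok: str) -> float:
        return idf.get(tok, math.inf)

    if not doc.get(field):
        return doc  # no token to anchor the cutoff on; leave the document unchanged
    ceiling = max(idf_of(tok) for tok in doc[field])
    return {name: [tok for tok in tokens if idf_of(tok) <= ceiling]
            for name, tokens in doc.items()}


def all_tokens(doc: Document) -> set:
    return {tok for tokens in doc.values() for tok in tokens}


def is_substantially_similar(doc_a: Document, doc_b: Document, threshold: float = 0.8) -> bool:
    """Stand-in comparison: Jaccard overlap of the token sets vs. a hypothetical threshold."""
    a, b = all_tokens(doc_a), all_tokens(doc_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


# Toy corpus (hypothetical); "xq9z" plays the role of an invalid token.
corpus = [
    {"title": ["red", "kettle"], "desc": ["steel", "kettle", "xq9z"]},
    {"title": ["red", "kettle"], "desc": ["steel", "kettle"]},
    {"title": ["blue", "mug"], "desc": ["ceramic", "mug"]},
    {"title": ["red", "mug"], "desc": ["ceramic", "mug", "red"]},
]
idf = compute_idf(corpus)

raw_match = is_substantially_similar(corpus[0], corpus[1])  # 3/4 = 0.75 < 0.8
clean_0 = filter_high_idf_tokens(corpus[0], "title", idf)   # drops "xq9z"
clean_1 = filter_high_idf_tokens(corpus[1], "title", idf)
filtered_match = is_substantially_similar(clean_0, clean_1)  # identical token sets
print(raw_match, filtered_match)  # prints: False True
```

In this toy corpus the rare token "xq9z" keeps the two kettle listings just below the threshold; once every token rarer than the rarest title token is dropped, the listings match. Anchoring the cutoff on one field rests on the assumption that the chosen field (here the title) holds only valid, descriptive tokens.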

REFERENCES:
patent: 4849898 (1989-07-01), Adi
patent: 5062074 (1991-10-01), Kleinberger
patent: 5261112 (1993-11-01), Futatsugi
patent: 5835892 (1998-11-01), Kanno
patent: 5960383 (1999-09-01), Fleischer
patent: 6038561 (2000-03-01), Snyder
patent: 6075896 (2000-06-01), Tanaka
patent: 6076086 (2000-06-01), Masuichi
patent: 6167398 (2000-12-01), Wyard
patent: 6173251 (2001-01-01), Ito
patent: 6263121 (2001-07-01), Melen
patent: 6606744 (2003-08-01), Mikurak
patent: 6810376 (2004-10-01), Guan
patent: 6961721 (2005-11-01), Chaudhuri et al.
patent: 7113943 (2006-09-01), Bradford
patent: 7346839 (2008-03-01), Acharya
patent: 7386441 (2008-06-01), Kempe
patent: 7426507 (2008-09-01), Patterson
patent: 7529756 (2009-05-01), Haschart et al.
patent: 7562088 (2009-07-01), Daga et al.
patent: 7567959 (2009-07-01), Patterson
patent: 7599914 (2009-10-01), Patterson
patent: 7603345 (2009-10-01), Patterson
patent: 2002/0016787 (2002-02-01), Kanno
patent: 2003/0065658 (2003-04-01), Matsubayashi
patent: 2003/0101177 (2003-05-01), Matsubayashi
patent: 2006/0112128 (2006-05-01), Brants
patent: 2006/0282415 (2006-12-01), Shibata
patent: 2007/0067157 (2007-03-01), Kaku et al.
patent: 2009/0119281 (2009-05-01), Wang et al.
patent: 2009/0204609 (2009-08-01), Labrou et al.
patent: 1 380 966 (2004-01-01), None
Bilenko et al., “Adaptive Name Matching in Information Integration”, 2003, IEEE Computer Society, pp. 16-23.
J. Ramos, “Using TF-IDF to Determine Word Relevance in Document Queries”, 2001, Citeseer, pp. 1-4.
A. Kilgarriff, “Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity between Corpora”, 1997, Citeseer, pp. 231-245.
Conrad et al., “Online Duplicate Document Detection: Signature Reliability in a Dynamic Retrieval Environment”, Nov. 3-8, 2003, ACM, CIKM '03, pp. 443-452.
Ghahramani, Z., and K.A. Heller, “Bayesian Sets”, Advances in Neural Information Processing Systems 18 (2006), 8 pages.
“Google Sets”, ©2007 Google, <http://labs.google.com/sets> [retrieved Feb. 13, 2008].

Inventors: Emery Grant M.; Manoharan Aswath; Mohan Vijai; Terra Egidio; Thirumalai Srikanth

Corporate Assignee: Amazon Technologies Inc.
Attorney: Kowert Robert C.
Law Firm: Meyertons Hood Kivlin Kowert & Goetzel P.C.
Examiners: Shechtman Cheryl M; Trujillo James
