Method of identifying data type and locating in a file

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Patent

Details

704/1, 707/530, G06F 17/28

Patent

active

059917148

ABSTRACT:
A method of identifying the types of data contained in an electronic file of unknown data type by gathering exemplary files of each data type of interest; counting the number of unique n-grams within each exemplary file; determining a weight for each unique n-gram; listing the unique n-grams in the exemplary files of a particular data type by descending magnitude of weight for each data type of interest; selecting the top m weighted n-grams and their associated weights; establishing a threshold for each data type of interest; selecting a length of data from the electronic file; listing every n-gram in the data selected; giving each listed n-gram, that was also selected, the weight that that n-gram was given for each data type of interest; summing the weights given to each n-gram according to data type; comparing the sums to the thresholds established in order to determine the types, if any, of the selected data; recording the location of the selected data if it is of a data type of interest; stopping if the number of selected lengths of data reached a user-definable number, otherwise selecting another length of data from the file that is the same length as that selected previously, where the newly selected data overlaps with the previously selected data by at least one position; and repeating the steps from listing every n-gram to stopping using the newly selected data.
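The steps in the abstract can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. The abstract does not say how n-gram weights are computed, so the sketch assumes relative frequency across the exemplary files. It also assumes that a window matches a type when its sum is greater than or equal to that type's threshold. The names build_profile, scan, window, step, and max_windows are hypothetical.

```python
from collections import Counter


def ngrams(data: bytes, n: int):
    """Yield every overlapping n-gram (as bytes) in data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def build_profile(exemplars, n, m):
    """Return the top-m n-grams for one data type, mapped to their weights.

    Assumption: weight = n-gram count / total n-gram count over all
    exemplary files of this type. The abstract leaves the formula open.
    """
    counts = Counter()
    for sample in exemplars:
        counts.update(ngrams(sample, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common(m) lists n-grams by descending count, hence descending weight.
    return {gram: count / total for gram, count in counts.most_common(m)}


def scan(data, profiles, thresholds, n, window, step, max_windows):
    """Slide a fixed-length window over data and record typed locations.

    step must be smaller than window so that consecutive windows overlap
    by at least one position, as the abstract requires.
    Returns a list of (offset, data_type, score) tuples.
    """
    if not 0 < step < window:
        raise ValueError("step must be positive and smaller than window")
    hits = []
    starts = range(0, len(data) - window + 1, step)
    for index, start in enumerate(starts):
        if index >= max_windows:  # user-definable stopping point
            break
        grams = list(ngrams(data[start:start + window], n))
        for dtype, profile in profiles.items():
            # Only n-grams that were selected for this type contribute.
            score = sum(profile.get(gram, 0.0) for gram in grams)
            if score >= thresholds[dtype]:  # assumed comparison rule
                hits.append((start, dtype, score))
    return hits
```

A caller would build one profile per data type of interest from its exemplary files, choose a threshold for each type, and then pass the unknown file's bytes to scan. Each returned offset marks a window whose weighted n-gram sum met the threshold for that type.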

REFERENCES:
patent: 5062143 (1991-10-01), Schmitt
patent: 5371807 (1994-12-01), Register et al.
patent: 5418951 (1995-05-01), Damashek
patent: 5463773 (1995-10-01), Sakakibara et al.
patent: 5526443 (1996-06-01), Nakayama
patent: 5548507 (1996-08-01), Martino et al.
patent: 5706365 (1998-01-01), Rangarajan et al.
patent: 5717914 (1998-02-01), Husick et al.
patent: 5724593 (1998-03-01), Hargrave, III et al.

