Patent
1998-04-22
1999-11-23
Isen, Forester W.
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
704/1, 707/530, G06F 17/28
Patent
active
059917148
ABSTRACT:
A method of identifying the types of data contained in an electronic file of unknown data type by gathering exemplary files of each data type of interest; counting the number of unique n-grams within each exemplary file; determining a weight for each unique n-gram; listing the unique n-grams in the exemplary files of a particular data type by descending magnitude of weight for each data type of interest; selecting the top m weighted n-grams and their associated weights; establishing a threshold for each data type of interest; selecting a length of data from the electronic file; listing every n-gram in the data selected; giving each listed n-gram, that was also selected, the weight that that n-gram was given for each data type of interest; summing the weights given to each n-gram according to data type; comparing the sums to the thresholds established in order to determine the types, if any, of the selected data; recording the location of the selected data if it is of a data type of interest; stopping if the number of selected lengths of data reached a user-definable number, otherwise selecting another length of data from the file that is the same length as that selected previously, where the newly selected data overlaps with the previously selected data by at least one position; and repeating the steps from listing every n-gram to stopping using the newly selected data.
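The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how a weight is determined or how a threshold is established, so the sketch assumes a weight equal to the n-gram's share of all n-grams in the exemplary files and takes thresholds as caller-supplied values. The names build_profile, scan, window, step and max_windows are hypothetical.

```python
from collections import Counter


def ngrams(data: bytes, n: int):
    """Yield every overlapping n-gram (as bytes) in data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def build_profile(exemplars, n=2, m=50):
    """Return the top-m n-grams of one data type mapped to their weights.

    Assumption (not from the abstract): an n-gram's weight is its share
    of all n-grams counted across the exemplary files of that type.
    """
    counts = Counter()
    for blob in exemplars:
        counts.update(ngrams(blob, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common(m) lists n-grams by descending count, hence by weight.
    return {gram: c / total for gram, c in counts.most_common(m)}


def scan(data, profiles, thresholds, n=2, window=64, step=1, max_windows=None):
    """Slide a fixed-length window over data and record typed locations.

    Returns (offset, data_type, score) for every window whose summed
    weights reach that data type's threshold. Thresholds are supplied
    by the caller (assumption: the abstract does not say how they are set).
    """
    if not 0 < step < window:
        raise ValueError("step must be in 1..window-1 so windows overlap")
    hits = []
    examined = 0
    offset = 0
    while offset + window <= len(data):
        # Stop once the user-definable number of windows has been examined.
        if max_windows is not None and examined >= max_windows:
            break
        grams = list(ngrams(data[offset:offset + window], n))
        for dtype, profile in profiles.items():
            # n-grams outside the selected top-m set contribute nothing.
            score = sum(profile.get(g, 0.0) for g in grams)
            if score >= thresholds[dtype]:
                hits.append((offset, dtype, score))
        examined += 1
        offset += step
    return hits
```

A caller would build one profile per data type of interest from its exemplary files, choose a threshold for each, and pass both dictionaries to scan along with the bytes of the unknown file. Requiring step to be smaller than window keeps each newly selected length overlapping the previous one by at least one position.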
REFERENCES:
patent: 5062143 (1991-10-01), Schmitt
patent: 5371807 (1994-12-01), Register et al.
patent: 5418951 (1995-05-01), Damashek
patent: 5463773 (1995-10-01), Sakakibara et al.
patent: 5526443 (1996-06-01), Nakayama
patent: 5548507 (1996-08-01), Martino et al.
patent: 5706365 (1998-01-01), Rangarajan et al.
patent: 5717914 (1998-02-01), Husick et al.
patent: 5724593 (1998-03-01), Hargrave, III et al.
Edouard, Patrick N.
Isen, Forester W.
Morelli, Robert D.
The United States of America as represented by the National Secu
Method of identifying data type and locating in a file