Patent
1998-02-05
2000-07-18
Thomas, Joseph
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
704257, G06F 1727, G06F 1728
active
060920386
ABSTRACT:
System and methods for losslessly compressing n-gram language models for use in real-time decoding, whereby the size of the model is significantly reduced without increasing the decoding time of the recognizer. Lossless compression is achieved using various techniques. In one aspect, n-gram records of an n-gram language model are split into (i) a set of common history records that include subsets of n-tuple words having a common history and (ii) sets of hypothesis records that are associated with the common history records. The common history records are separated into a first group of common history records each having only one hypothesis record associated therewith and a second group of common history records each having more than one hypothesis record associated therewith. Each record in the first group is stored together with its corresponding hypothesis record in an index portion of a memory block comprising the n-gram language model, and the records in the second group are stored in the index together with addresses pointing to a memory location having the corresponding hypothesis records. Other compression techniques include, for instance, mapping word records of the hypothesis records into word numbers and storing a difference value between subsequent word numbers; segmenting the addresses and storing indexes to the addresses in each segment to multiples of the addresses; storing word records and probability records as fractions of bytes such that each pair of word-probability records occupies a multiple of bytes and storing flags indicating the length; and storing the probability records as indexes to sorted count values that are used to compute the probability on the fly.
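The following is a minimal sketch of two of the ideas named in the abstract: splitting records by common history, with single-hypothesis histories stored inline in the index, and storing differences between successive word numbers. It is an illustration under stated assumptions, not the patented implementation. All function names are hypothetical, words are assumed to be already mapped to integer word numbers, raw counts stand in for the probability records, and Python dicts and lists stand in for the packed memory block.

```python
from collections import defaultdict


def delta_encode(sorted_numbers):
    """Keep the first word number, then the gaps between successive numbers.

    Assumes the input is already sorted in ascending order.
    """
    return [sorted_numbers[0]] + [
        b - a for a, b in zip(sorted_numbers, sorted_numbers[1:])
    ]


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    numbers, total = [], 0
    for d in deltas:
        total += d
        numbers.append(total)
    return numbers


def build_model(trigrams):
    """Build an index of common histories from (w1, w2, w3, count) records.

    Histories with one hypothesis are stored inline in the index.
    Histories with several hypotheses store an address (here, a list
    position) that points to a separate block of hypothesis records.
    """
    groups = defaultdict(list)
    for w1, w2, w3, count in trigrams:
        groups[(w1, w2)].append((w3, count))

    index = {}
    hypothesis_blocks = []
    for history, hyps in sorted(groups.items()):
        if len(hyps) == 1:
            word, count = hyps[0]
            index[history] = ("inline", word, count)
        else:
            hyps.sort()  # ascending word number, so the gaps are non-negative
            words = [w for w, _ in hyps]
            counts = [c for _, c in hyps]
            index[history] = ("ptr", len(hypothesis_blocks))
            hypothesis_blocks.append((delta_encode(words), counts))
    return index, hypothesis_blocks


def lookup(index, hypothesis_blocks, history, word):
    """Return the stored count for (history, word), or None if absent."""
    entry = index.get(history)
    if entry is None:
        return None
    if entry[0] == "inline":
        _, stored_word, count = entry
        return count if stored_word == word else None
    deltas, counts = hypothesis_blocks[entry[1]]
    for w, c in zip(delta_decode(deltas), counts):
        if w == word:
            return c
    return None
```

In this sketch the history (3, 4) below would sit inline in the index, while (1, 2) would hold a pointer to a block whose word numbers 5, 9, 12 are stored as 5, 4, 3. A real implementation would pack these fields into bytes; the sketch only shows the layout decision.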
REFERENCES:
patent: 4342085 (1982-07-01), Glickman et al.
patent: 5467425 (1995-11-01), Lau et al.
patent: 5649060 (1997-07-01), Ellozy et al.
patent: 5724593 (1998-03-01), Hargrave, III et al.
patent: 5794249 (1998-08-01), Orsolini et al.
patent: 5835888 (1998-11-01), Kanevsky et al.
Kanevsky Dimitri
Rao Srinivasa Patibandla
International Business Machines Corporation
Thomas Joseph
Profile ID: LFUS-PAI-O-2047487