Reexamination Certificate
1999-01-07
2002-05-21
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S253000, C704S254000, C382S186000, C382S187000, C709S241000
active
06393395
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to computer systems, and more particularly to the recognition of handwriting and speech.
BACKGROUND OF THE INVENTION
Users who attempt to input information into desktop and hand-held computers via writing or speech can experience many recognition errors. This significantly slows the rate at which information is input, and significantly frustrates users. Improved recognition accuracy is continually sought.
The accurate recognition of cursive handwriting, for example, is a formidable task. A first difficulty arises in that cursive handwriting is initially represented as large quantities of coordinate pairs coming in via a digitizer over time, which must be processed in some manner. The higher the resolution of the digitizer, the more coordinate pairs are provided. To directly recognize handwriting from the coordinate data is beyond the capabilities of ordinary computers, and thus some pre-processing needs to be done on the data to make it more manageable.
One type of recognizer is based on a time-delayed neural network. In one such recognizer, described in the publication “Recognizing Cursive Handwriting,” David E. Rumelhart, Computational Learning & Cognition, Proceedings of the Third NEC Research Symposium, a neural network is trained to recognize a number of feature values representing known words. For example, two such values represent the net motions in the x and y directions, respectively, for that word. After training, when later attempting to recognize a word, an unknown input word is featurized according to the criteria on which the neural network was trained, and the resulting features are fed into the neural network. The neural net outputs a probability for possible letters (a-z) in the word, and a dynamic programming procedure finds the best fitting words from a dictionary to produce a ranked ordering of words.
While the above recognition technique clearly works to an extent, tests on large numbers of samples have shown an approximately seventeen percent average error rate in recognition. This is inadequate for most user applications. Thus, while neural network-based recognition is a promising technique, its recognition accuracy must improve before practical applications can benefit from it.
SUMMARY OF THE INVENTION
Briefly, the present invention provides a system and method that improve the recognition accuracy of a time-delayed neural network-based handwriting or speech recognizer via an improved training method, improvements in pre-processing and an improved neural network model architecture. To recognize handwriting, in a first pre-processing step, a partitioning mechanism partitions a user's handwritten electronic ink into lines of ink, or alternatively, into proposed words. A second step (via a mechanism for implementing same) smooths and resamples the ink to reduce any variability resulting from different writing speeds and sizes, and also eliminates jagged edges. The resampling is based on the second derivative of the ink over a particular area, which increases the number of points at the curves and cusps of a character relative to its straight portions. A third step examines the smoothed ink in time order to identify delayed strokes, i.e., strokes that dot an “i” or cross a “t” or “x”, which otherwise might confuse the neural net. Delayed strokes are removed from the ink and recorded as feature information.
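The curvature-sensitive resampling in the second step can be illustrated with a minimal sketch. The summary does not give the actual formula, so the following is an assumption: each original ink segment is given a weight equal to its length plus a bonus proportional to the discrete second difference at its endpoints, and output points are spaced evenly along that weighted length. The function name resample_by_curvature and the gain parameter are hypothetical.

```python
import math

def resample_by_curvature(points, n_out, gain=5.0):
    """Resample a stroke so that high-curvature regions receive more points.

    points: list of (x, y) tuples in time order.
    gain:   hypothetical tuning knob; 0 gives plain arc-length resampling.
    """
    if len(points) < 3 or n_out < 2:
        return list(points)

    # Magnitude of the discrete second difference at each point (0 at the ends).
    curv = [0.0]
    for i in range(1, len(points) - 1):
        ddx = points[i - 1][0] - 2 * points[i][0] + points[i + 1][0]
        ddy = points[i - 1][1] - 2 * points[i][1] + points[i + 1][1]
        curv.append(math.hypot(ddx, ddy))
    curv.append(0.0)

    # Cumulative weighted length: segment length plus a curvature bonus.
    cum = [0.0]
    for i in range(1, len(points)):
        seg = math.hypot(points[i][0] - points[i - 1][0],
                         points[i][1] - points[i - 1][1])
        cum.append(cum[-1] + seg + gain * 0.5 * (curv[i - 1] + curv[i]))
    total = cum[-1]
    if total == 0.0:
        return [points[0]] * n_out

    # Place n_out points at equal steps of weighted length.
    out, j = [], 0
    for k in range(n_out):
        t = total * k / (n_out - 1)
        while j < len(cum) - 2 and cum[j + 1] < t:
            j += 1
        span = cum[j + 1] - cum[j]
        a = 0.0 if span == 0.0 else (t - cum[j]) / span
        out.append((points[j][0] + a * (points[j + 1][0] - points[j][0]),
                    points[j][1] + a * (points[j + 1][1] - points[j][1])))
    return out
```

On an L-shaped stroke, for example, this places more of the output points near the corner than uniform arc-length resampling would, which matches the stated intent of emphasizing curves and cusps over straight portions.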
A segmenter provides a fourth step in which the recognizer process separates the ink into distinct segments based on the y-minima thereof. A featurizer implements a fifth step to featurize the segmented ink into a number of features, including Chebyshev coefficients, size and other stroke-related information. A sixth step then runs the features for each segment, including the delayed stroke feature information, through a time-delayed neural network.
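The segmentation and featurization steps can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: segments are cut at local minima of the y value as stored (the digitizer's axis orientation is not specified here), and Chebyshev coefficients are estimated for one coordinate of a segment by linearly interpolating the samples onto Chebyshev nodes and applying the standard discrete cosine sum. The names split_at_y_minima and chebyshev_coeffs, and the n_nodes parameter, are hypothetical.

```python
import math

def split_at_y_minima(points):
    """Cut a stroke into segments at local minima of y.

    points: list of (x, y) tuples in time order. Each cut point is shared
    by the segment it ends and the segment it starts.
    """
    cuts = [i for i in range(1, len(points) - 1)
            if points[i][1] < points[i - 1][1] and points[i][1] <= points[i + 1][1]]
    segments, start = [], 0
    for c in cuts:
        segments.append(points[start:c + 1])
        start = c
    segments.append(points[start:])
    return segments

def chebyshev_coeffs(values, degree, n_nodes=16):
    """Approximate Chebyshev coefficients c_0..c_degree of a sampled signal.

    values: samples of one coordinate, assumed evenly spaced over [-1, 1].
    Requires degree < n_nodes for the discrete orthogonality to hold.
    """
    if len(values) < 2:
        return [float(values[0])] + [0.0] * degree

    def interp(t):
        # Linear interpolation of the samples at parameter t in [-1, 1].
        pos = (t + 1.0) / 2.0 * (len(values) - 1)
        i = min(int(pos), len(values) - 2)
        return values[i] + (pos - i) * (values[i + 1] - values[i])

    coeffs = []
    for k in range(degree + 1):
        s = 0.0
        for j in range(n_nodes):
            theta = math.pi * (j + 0.5) / n_nodes
            s += interp(math.cos(theta)) * math.cos(k * theta)
        coeffs.append(s * (1.0 if k == 0 else 2.0) / n_nodes)
    return coeffs
```

A feature vector for a segment could then concatenate chebyshev_coeffs of its x values and of its y values with size information; the exact feature set used by the recognizer is not listed in this summary.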
The time-delayed neural network records the output in an x-y matrix, where the x-axis represents the strokes over time and the y-axis represents letter output scores assigned by the neural network for each letter. The improved architecture of the time-delayed neural network of the present invention outputs a separate score for whether a character is starting or continuing. In a seventh step, for every word or phrase generated from a trie structured dictionary and language model, a dynamic time warp (DTW) is run to find the most probable path through the output matrix for that word or phrase. Words or phrases are assigned a score based on the least costly path that can be traversed through the output matrix, and based on the assigned scores, the best words or phrases are returned from the recognizer. Note that as used herein the term phrase is intended to mean any plurality of words, whether they constitute a grammatically proper phrase, a complete sentence, or just any set of words not necessarily associated with one another.
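The path search through the output matrix can be sketched as a small dynamic program. The sketch below makes several assumptions that the summary does not state: network outputs are treated as probabilities, the cost of a cell is its negative log probability, every segment must be assigned to exactly one letter, and candidate words come from a plain list rather than a trie-structured dictionary and language model. The name word_cost and the data layout are hypothetical.

```python
import math

def word_cost(outputs, word):
    """Least-cost alignment of a candidate word to the network outputs.

    outputs: list over segments in time order; outputs[t][letter] is a
             (p_start, p_continue) pair for that letter at segment t.
    Returns the total cost of the cheapest path, or infinity if the word
    has more letters than there are segments.
    """
    T, N = len(outputs), len(word)
    if N == 0 or T < N:
        return math.inf

    def cost(p):
        # Negative log probability, floored to avoid log(0).
        return -math.log(max(p, 1e-12))

    # prev[i] = cheapest cost of reaching letter i at the previous segment.
    prev = [math.inf] * N
    prev[0] = cost(outputs[0][word[0]][0])  # the first letter must start at t = 0
    for t in range(1, T):
        cur = [math.inf] * N
        for i, ch in enumerate(word):
            p_start, p_cont = outputs[t][ch]
            stay = prev[i] + cost(p_cont)  # letter i continues
            advance = prev[i - 1] + cost(p_start) if i > 0 else math.inf  # letter i starts
            cur[i] = min(stay, advance)
        prev = cur
    return prev[-1]  # the path must end on the last letter

def rank_words(outputs, candidates):
    """Return the candidate words ordered from least to most costly."""
    return sorted(candidates, key=lambda w: word_cost(outputs, w))
```

Because each letter has separate start and continue outputs, a path can only advance to the next letter through a start score, which is how this sketch uses the improved architecture described above.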
A recognizer training method is also provided, the method using data labeled only at the word or phrase level. In general, the method enforces the correct number of letters and the correct order of the letters to be learned at network training time. To this end, the neural network is started with initially random weights, and for each word or phrase input during training, the ink is featurized as described above and run through the neural network in its current state. The label for the word is known, whereby a DTW matrix for that word is computed as at recognition time, recording at each matrix cell the backward step taken to reach it. The recorded steps are then followed backwards from the cell in the upper-right corner of the matrix to find the optimal path, setting a target of one for every network output that corresponds to the path, and a zero everywhere else.
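The target-generation step of the training method can be sketched in the same style. As before, this is an illustration under assumptions rather than the patented procedure: outputs are treated as probabilities with negative log costs, each letter has a start output and a continue output, and the function name alignment_targets and the data layout are hypothetical.

```python
import math

def alignment_targets(outputs, word, alphabet):
    """Derive per-segment training targets from a word-level label.

    outputs:  outputs[t][letter] = (p_start, p_continue) from the current network.
    word:     the known label for this ink sample.
    alphabet: every letter the network has outputs for.
    Returns targets[t][letter] = [start_target, continue_target], with a 1
    on the optimal path and 0 everywhere else.
    """
    T, N = len(outputs), len(word)
    if N == 0 or T < N:
        raise ValueError("need at least one segment per letter")

    def cost(p):
        return -math.log(max(p, 1e-12))

    best = [[math.inf] * N for _ in range(T)]
    started = [[False] * N for _ in range(T)]  # back-pointer: did letter i start at t?
    best[0][0] = cost(outputs[0][word[0]][0])
    started[0][0] = True
    for t in range(1, T):
        for i, ch in enumerate(word):
            p_start, p_cont = outputs[t][ch]
            stay = best[t - 1][i] + cost(p_cont)
            advance = best[t - 1][i - 1] + cost(p_start) if i > 0 else math.inf
            if advance < stay:
                best[t][i], started[t][i] = advance, True
            else:
                best[t][i] = stay

    # Follow the back-pointers from the final cell to the first.
    targets = [{ch: [0, 0] for ch in alphabet} for _ in range(T)]
    i = N - 1
    for t in range(T - 1, -1, -1):
        if started[t][i]:
            targets[t][word[i]][0] = 1
            i -= 1
        else:
            targets[t][word[i]][1] = 1
    return targets
```

Every path through this matrix contains exactly one start per letter of the label, in label order, so the targets enforce the correct number and order of letters even when the initial random weights give nearly uniform outputs.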
Speech recognition based on phoneme information instead of stroke information may also employ the recognition steps, improved neural network architecture and the training method of the present invention to increase recognition accuracy.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings.
REFERENCES:
patent: 3111646 (1963-11-01), Harmon
patent: 3127588 (1964-03-01), Harmon
patent: 3133266 (1964-05-01), Frishkopf
patent: 3969698 (1976-07-01), Bollinger et al.
patent: 3996557 (1976-12-01), Donahey
patent: 4610025 (1986-09-01), Blum et al.
patent: 4731857 (1988-03-01), Tappert
patent: 4754489 (1988-06-01), Bokser
patent: 4764972 (1988-08-01), Yoshida et al.
patent: 4918740 (1990-04-01), Ross
patent: 4933977 (1990-06-01), Ohnishi et al.
patent: 4987603 (1991-01-01), Ohnishi et al.
patent: 5034989 (1991-07-01), Loh
patent: 5052043 (1991-09-01), Gaborski
patent: 5313527 (1994-05-01), Guberman et al.
patent: 5440651 (1995-08-01), Martin
patent: 5442715 (1995-08-01), Gaborski et al.
patent: 5455892 (1995-10-01), Minot et al.
patent: 5467407 (1995-11-01), Guberman et al.
patent: 5528728 (1996-06-01), Matsuura et al.
patent: 5568591 (1996-10-01), Minot et al.
patent: 5764797 (1998-06-01), Adcock
patent: 5926566 (1999-07-01), Wang et al.
patent: 6018591 (2000-01-01), Hull et al.
patent: 0 543 590 (1993-05-01), None
patent: 0 858 047 (1998-08-01), None
patent: WO 94/07214 (1994-03-01), None
patent: WO 98/15914 (1998-04-01), None
Ha-Jin Yu and Yung-Hwan Oh, “A Neural Network for 500 Vocabulary Word Spotting Using Acoustic Sub-Word Units”, Proc. IEEE ICASSP 1997, vol. 4, pp. 3277-3280, Apr. 1997.*
N.Z. Hakim, J.J. Kaufman, G. Cerf, and H.E. Meadows, “Cursive Script Online Character Recognition with a Recurrent Neural Network Model,” Proc. International Joint Conference on Neural Networks, IJCNN 1992, vol. 3, pp. 711-716, Jun. 1992.*
Ehrich et al., “Experiments in the Contextual Recognition of Cursive Script,” IEEE Transactions on Computers, vol. C-24, No. 2, pp. 182-194 (Feb. 1975).
Guberman et al., “Simulation of Behavior and Intelligence,” Algorithm for the Recognition of Handwritten Text, Plenum Publishing Corporation, New York, NY, pp. 751-757 (1976).
Frishkopf et al., “Ma
Guha Angshuman
Haluptzok Patrick M.
Pittman James A.
Michalik & Wylie PLLC
Šmits Tālivaldis Ivars