Method and apparatus for the recognition of spelled spoken...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S251000, C704S257000, C704S231000

active

06694296

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to speech recognition. More specifically, the present invention relates to the recognition of spoken, spelled words.
In speech recognition systems, an input speech signal is converted into words that represent the verbal content of the speech signal. This conversion begins by converting the analog speech signal into a series of digital values. The digital values are then passed through a feature extraction unit, which computes a sequence of feature vectors based on the digital values. Each feature vector represents a section of the speech signal.
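The front end described above can be sketched as follows. This is a minimal illustration, not the patent's feature extractor. It assumes the signal has already been digitized into a list of samples, splits it into overlapping fixed-length frames, and uses per-frame log energy as a single stand-in feature. The 400-sample frame and 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative choices. Practical systems usually compute richer vectors, such as MFCCs.

```python
import math
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split digitized samples into overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Log of the mean squared amplitude; the floor avoids log(0) on digital silence."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))


def extract_features(samples: Sequence[float],
                     frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Return one feature vector per frame; each vector summarizes one section of the signal.

    Illustrative only: a single log-energy value stands in for a real feature vector.
    """
    return [[log_energy(frame)] for frame in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # One second of a constant-amplitude test signal at an assumed 16 kHz rate.
    features = extract_features([0.5] * 16000)
    print(len(features), features[0])
```

Overlapping frames are used so that each section of speech is seen in more than one feature vector. The frame and hop sizes are parameters, not values taken from the patent.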
The feature vectors are then used to identify the most likely sequence of words that would have generated the sequence of feature vectors. Typically, this involves applying the feature vectors to an acoustic model to determine the most likely sequences of sub-word units, usually senones, and then using a language model to determine which of these sequences of sub-word units is most likely to appear in the language. This most likely sequence of sub-word units is identified as the recognized speech.
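A small sketch can illustrate how acoustic model and language model scores are combined. It assumes the decoder has already produced a handful of candidate hypotheses, each with an acoustic log-likelihood log P(X | W). It then picks the one that maximizes log P(X | W) + λ·log P(W). A real recognizer searches over sub-word unit sequences rather than enumerating whole hypotheses. The scores and the weight `lm_weight` (λ) below are hypothetical.

```python
import math
from typing import Dict, Tuple


def best_hypothesis(acoustic_logprob: Dict[str, float],
                    lm_prob: Dict[str, float],
                    lm_weight: float = 1.0) -> Tuple[str, float]:
    """Return the hypothesis W maximizing log P(X|W) + lm_weight * log P(W), and its score."""
    best, best_score = None, -math.inf
    for hyp, acoustic in acoustic_logprob.items():
        p_lm = lm_prob.get(hyp, 0.0)
        if p_lm <= 0.0:
            continue  # the language model gives this word sequence zero probability
        score = acoustic + lm_weight * math.log(p_lm)
        if score > best_score:
            best, best_score = hyp, score
    if best is None:
        raise ValueError("no hypothesis has nonzero language-model probability")
    return best, best_score


if __name__ == "__main__":
    # Hypothetical scores: the acoustically better hypothesis loses on language-model probability.
    acoustic = {"recognize speech": -10.0, "wreck a nice beach": -9.5}
    lm = {"recognize speech": 1e-2, "wreck a nice beach": 1e-4}
    print(best_hypothesis(acoustic, lm))
```

Working in log space turns the product P(X | W)·P(W) into a sum and avoids numerical underflow on long utterances.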
In many systems, the sub-word units are concatenated to form words and sequences of words. A language model is accessed to determine a most likely sequence of words. The language model provides a statistical probability of any sequence of words. For example, a trigram language model provides the statistical probability of any three-word sequence. The structure and operation of such language models are well known.
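To illustrate the trigram case, the sketch below estimates P(w3 | w1, w2) by maximum likelihood from a toy corpus and chains those estimates to score a sentence. It pads each sentence with assumed `<s>` and `</s>` boundary markers and applies no smoothing, so any unseen trigram gets probability zero. Production language models add smoothing or back-off.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Counts = Tuple[Counter, Counter]


def _pad(words: List[str]) -> List[str]:
    # Two start markers give the first real word a full two-word history.
    return ["<s>", "<s>"] + list(words) + ["</s>"]


def train_trigram(sentences: Iterable[List[str]]) -> Counts:
    """Count trigrams and their two-word histories over a tokenized corpus."""
    trigrams, histories = Counter(), Counter()
    for words in sentences:
        padded = _pad(words)
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            histories[tuple(padded[i - 2:i])] += 1
    return trigrams, histories


def trigram_prob(model: Counts, w1: str, w2: str, w3: str) -> float:
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for an unseen history."""
    trigrams, histories = model
    denom = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / denom if denom else 0.0


def sentence_prob(model: Counts, words: List[str]) -> float:
    """Chain-rule product of trigram probabilities over the padded sentence."""
    padded = _pad(words)
    p = 1.0
    for i in range(2, len(padded)):
        p *= trigram_prob(model, padded[i - 2], padded[i - 1], padded[i])
    return p


if __name__ == "__main__":
    model = train_trigram([["spell", "the", "word"], ["spell", "the", "letter"]])
    print(sentence_prob(model, ["spell", "the", "word"]))
```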
Though some current speech recognition systems attain a high degree of accuracy, they do make mistakes. For example, in a dictation (or document creation) system, a user may be dictating rapidly into the speech recognition system. The system may also provide graphical output, such as a display of the words as they are recognized. If the user notices that a word has been mis-recognized, the user may attempt to correct it. This often entails the user first selecting the mis-recognized word by highlighting it with a mouse, keyboard, or other user input device. The user then attempts to correct the word using one of a number of techniques, such as re-speaking the word or spelling it out loud.
However, recognizing spoken, spelled words is very difficult and presents many problems, primarily because of the acoustic similarities among certain groups of letters. There are many confusable groups of letters, most notably the “E-set,” which consists of the letters b, c, d, e, g, p, t, v and z. Because of the minimal acoustic differences between letter pairs in the E-set, it is recognized as one of the most confusable sets in the task of recognizing spoken letters. A number of other, less confusable groups present similar problems as well.
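The following sketch makes the E-set problem concrete. It treats any two E-set letters as mutually confusable, which is a deliberate simplification because real confusions are graded rather than all-or-nothing. Under that assumption, it lists the lexicon words that a spelled letter sequence cannot be told apart from. The small lexicon is hypothetical.

```python
from typing import Iterable, List

# The E-set as defined above: b, c, d, e, g, p, t, v, z.
E_SET = frozenset("bcdegptvz")


def confusable(a: str, b: str) -> bool:
    """Assume two letters are confusable if they are identical or both belong to the E-set."""
    return a == b or (a in E_SET and b in E_SET)


def confusable_matches(spelled: str, lexicon: Iterable[str]) -> List[str]:
    """Return lexicon words of the same length whose letters are pairwise confusable with `spelled`."""
    return sorted(
        word for word in lexicon
        if len(word) == len(spelled)
        and all(confusable(x, y) for x, y in zip(spelled, word))
    )


if __name__ == "__main__":
    # "b-e-d" cannot be separated from "p-e-t", "t-e-d" or "d-e-e" on E-set letters alone,
    # while "b-a-d" is ruled out because "a" is outside the E-set.
    print(confusable_matches("bed", ["bed", "ted", "pet", "bad", "dee"]))
```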
Because of the problems in recognizing spoken letters, prior speech recognizers invoked dedicated spoken letter recognition systems. This required the user to take affirmative action to enter a special spelling recognition mode in order to spell words aloud. Still other systems required the user to spell using the military alphabet (i.e., alpha, bravo, charlie, etc.). However, this required the user to know the military alphabet and also required a special-purpose lexicon in the speech recognition system to recognize those words.
SUMMARY OF THE INVENTION
The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterance. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon. The present invention can also be practiced as a method.
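The summary does not state how the two language model outputs are weighted, so the following is only one plausible reading built on assumptions. Each candidate is a token sequence. The dictation and spelling model probabilities are linearly interpolated using a hypothetical `spelling_weight`, and the result is combined with an acoustic log-likelihood. When an active lexicon is supplied, a candidate made entirely of single letters is kept only if those letters spell a word in that lexicon. All probabilities and scores are invented for illustration.

```python
import math
from typing import Dict, Optional, Set, Tuple

Hyp = Tuple[str, ...]  # a candidate as a sequence of tokens (words or single letters)


def is_spelled(hyp: Hyp) -> bool:
    """Heuristic: treat a candidate as spelled if every token is a single letter."""
    return len(hyp) > 0 and all(len(tok) == 1 and tok.isalpha() for tok in hyp)


def combined_score(hyp: Hyp, acoustic: float, p_dictation: Dict[Hyp, float],
                   p_spelling: Dict[Hyp, float], spelling_weight: float) -> float:
    """Acoustic log-likelihood plus the log of the interpolated language model probability."""
    p_lm = ((1.0 - spelling_weight) * p_dictation.get(hyp, 0.0)
            + spelling_weight * p_spelling.get(hyp, 0.0))
    return acoustic + math.log(p_lm) if p_lm > 0.0 else -math.inf


def recognize(acoustic: Dict[Hyp, float], p_dictation: Dict[Hyp, float],
              p_spelling: Dict[Hyp, float], spelling_weight: float = 0.5,
              active_lexicon: Optional[Set[str]] = None) -> Optional[Hyp]:
    """Return the best-scoring candidate, or None if every candidate is excluded or scores -inf."""
    best, best_score = None, -math.inf
    for hyp, ac in acoustic.items():
        # Confine spelled candidates to the active lexicon when one is given.
        if (active_lexicon is not None and is_spelled(hyp)
                and "".join(hyp).lower() not in active_lexicon):
            continue
        score = combined_score(hyp, ac, p_dictation, p_spelling, spelling_weight)
        if score > best_score:
            best, best_score = hyp, score
    return best


if __name__ == "__main__":
    acoustic = {("b", "e", "d"): -5.0, ("p", "e", "t"): -4.8, ("bed",): -6.0}
    p_spelling = {("b", "e", "d"): 0.02, ("p", "e", "t"): 0.02}
    p_dictation = {("bed",): 0.001}
    print(recognize(acoustic, p_dictation, p_spelling))
    print(recognize(acoustic, p_dictation, p_spelling, active_lexicon={"bed", "bad"}))
```

With these invented numbers, the acoustically stronger spelling "p-e-t" wins when the search is unconstrained. Restricting spelled output to an active lexicon that contains "bed" but not "pet" yields "b-e-d" instead, which shows how lexicon confinement can resolve an E-set confusion. A single-letter word such as "a" would be misclassified as spelled by the heuristic above.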
Another feature of the present invention is directed to creation of the spelling language model. A lexicon is decomposed into individual letters and is then processed into the spelling language model.
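The decomposition step can be pictured with the sketch below. It splits every lexicon entry into its letters, adds hypothetical word-boundary markers `<w>` and `</w>`, and estimates letter-bigram probabilities from the resulting counts. The patent summary does not specify the model order or any smoothing, so the bigram order, the absence of smoothing and the three-word lexicon are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

SpellingModel = Dict[str, Dict[str, float]]


def build_spelling_bigram(lexicon: Iterable[str]) -> SpellingModel:
    """Decompose each word into letters and estimate P(next letter | previous letter)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word in lexicon:
        letters = ["<w>"] + list(word.lower()) + ["</w>"]
        for prev, cur in zip(letters, letters[1:]):
            counts[prev][cur] += 1
    # Normalize each row of counts into a conditional distribution.
    model: SpellingModel = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        model[prev] = {cur: n / total for cur, n in nxt.items()}
    return model


def spelling_prob(model: SpellingModel, letters: str) -> float:
    """Probability of a complete spelled sequence under the letter-bigram model."""
    seq = ["<w>"] + list(letters.lower()) + ["</w>"]
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        p *= model.get(prev, {}).get(cur, 0.0)
    return p


if __name__ == "__main__":
    model = build_spelling_bigram(["bed", "bad", "pet"])
    print(spelling_prob(model, "bed"), spelling_prob(model, "xyz"))
```

A bigram model also assigns nonzero probability to some letter strings that are not in the lexicon, such as "ped" here. Confining output to an active lexicon, as described in the summary, is therefore a separate step.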

