Reexamination Certificate
1997-12-01
2001-01-30
Hudspeth, David R. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S254000, C704S256000, C704S275000
active
06182038
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to computer speech recognition.
BACKGROUND OF THE INVENTION
Recent advances in computer hardware and software have allowed computer speech recognition (CSR) to cross the threshold of usability. Systems are now available for high-end personal computers that can be used for large-vocabulary, continuous-speech dictation. To obtain adequate performance, such systems need to be adapted to a specific user's voice and usage environment. In addition, these systems can only recognize words drawn from a certain vocabulary and are usually tied to a particular language model, which captures the relative probabilities of different sequences of words. Without all of these constraints, it is very difficult to get adequate performance from a CSR system.
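As an illustration of what it means for a language model to capture the relative probabilities of word sequences, the following is a minimal sketch of a bigram model estimated by counting. The function names, the "<s>" start marker, and the three-sentence corpus are hypothetical and chosen only for illustration; this is not the model used by any particular CSR system.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(next_word | previous_word) from example sentences by counting."""
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # "<s>" is an assumed start-of-sentence marker.
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


def sequence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence (0.0 if a pair is unseen)."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= model.get((prev, nxt), 0.0)
    return prob


# Hypothetical application-specific training text.
corpus = ["call the office", "call the bank", "check the balance"]
model = train_bigram_model(corpus)

print(sequence_probability(model, "call the office"))
print(sequence_probability(model, "office the call"))
```

Under such a model, a word order seen in the application's text scores higher than an unseen one, which is how a recognizer tied to that model favors plausible word sequences.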
In most CSR systems, the user- and environment-specific part, the acoustic models, is usually kept separate from the vocabulary and language models. However, because of the above constraints, any application that requires speech recognition needs access to both the user/environment-specific acoustic models and the application-specific vocabulary and language models.
This is a major obstacle to moving CSR systems beyond standalone dictation, to systems where many different users will need to access different applications, possibly in parallel and often over the internet or a local area network (LAN). The reason is that either: (a) each application will have to keep separate acoustic models for each user/environment; or (b) each user will need to maintain separate sets of vocabularies and language models for each application they wish to use. Since acoustic and language models are typically on the order of megabytes to tens of megabytes in size for a medium- to large-vocabulary application, it follows that in either scenario (a) or (b), the system's resources are going to be easily overwhelmed.
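The scaling behind scenarios (a) and (b) can be made concrete with a rough back-of-the-envelope sketch. The user count, application count, and the 10 MB model sizes below are assumed values chosen only to fall within the "megabytes to tens of megabytes" range stated above, and the function name is hypothetical.

```python
def storage_estimates_mb(num_users, num_apps, acoustic_mb, language_mb):
    """Rough total storage, in megabytes, under each scenario."""
    # Scenario (a): every application holds its own vocabulary/language model
    # plus one set of acoustic models per user.
    scenario_a = num_apps * (language_mb + num_users * acoustic_mb)
    # Scenario (b): every user holds their own acoustic models plus one
    # vocabulary/language model per application.
    scenario_b = num_users * (acoustic_mb + num_apps * language_mb)
    # Reference point only: a single copy of each model, nothing duplicated.
    single_copy = num_users * acoustic_mb + num_apps * language_mb
    return scenario_a, scenario_b, single_copy


# Assumed figures: 1000 users, 20 applications, 10 MB per model set.
a, b, single = storage_estimates_mb(
    num_users=1000, num_apps=20, acoustic_mb=10, language_mb=10
)
print(f"scenario (a): {a} MB")
print(f"scenario (b): {b} MB")
print(f"single copy of each model: {single} MB")
```

In both scenarios the dominant term is the product of the number of users and the number of applications, which is why the duplicated models grow far faster than the models themselves.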
One possibility is to store the acoustic models on a different machine from the vocabulary and language models, and to connect the machines via a LAN or the internet. However, in either (a) or (b), enormous amounts of network traffic will be generated as megabytes of data are shifted to the target recognizer.
Thus, a need exists for a CSR system that is independent of the vocabulary and language models of an application without sacrificing performance in terms of final recognition accuracy.
REFERENCES:
patent: 5293584 (1994-03-01), Brown et al.
patent: 5475792 (1995-12-01), Stanford et al.
patent: 5515475 (1996-05-01), Gupta et al.
patent: 5535120 (1996-07-01), Chong et al.
patent: 5555344 (1996-09-01), Zunkler
patent: 5615296 (1997-03-01), Stanford et al.
patent: 5621859 (1997-04-01), Schwartz et al.
patent: 5651096 (1997-07-01), Pallakoff et al.
patent: 5715367 (1998-02-01), Gillick et al.
patent: 5745649 (1998-04-01), Lubensky
patent: 5805710 (1998-09-01), Higgins et al.
patent: 5867817 (1999-02-01), Catallo et al.
patent: 5915001 (1999-06-01), Uppaluru
patent: 5960399 (1999-09-01), Barclay et al.
patent: 2230370 (1990-10-01), None
patent: 224023 (1991-07-01), None
patent: WO 98/08215 (1998-02-01), None
“Specialized Language Models for Speech Recognition”, IBM Technical Disclosure Bulletin, vol. 38, No. 2, Feb. 1995, pp. 155-157, XP000502428.
S.J. Young, M.G. Brown, J.T. Foote, G.J.F. Jones and K. Sparck Jones, “Acoustic Indexing for Multimedia Retrieval and Browsing”, Proc. ICASSP 97, pp. 1-4, Munich, Germany, Apr. 1997, IEEE.
Austin Stephen
Balakrishnan Sreeram
Bose Romi N.
Dunlop Hugh C.
Hudspeth David R.
Hughes Terri S.
Lerner Martin