Reexamination Certificate
2001-04-20
2004-08-31
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S270000, C704S270100, C704S201000, C704S273000, C379S088010
active
06785647
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention.
The present invention relates, in general, to voice recognition, and, more particularly, to systems, software and methods for performing voice and speech recognition over a distributed network.
2. Relevant Background
Voice and speech recognition systems are increasingly common interfaces for obtaining user input into computer systems. Speech recognition is used to provide enhanced services such as interactive voice response (IVR), automated phone attendants, voice mail, fax mail, and other applications. More sophisticated speech recognition systems perform speech-to-text conversion for dictation and transcription.
Voice and speech recognition systems are characterized by, among other things, their recognition accuracy, speed and vocabulary size. High-speed, accurate, large-vocabulary systems tend to be complex and so require significant computing resources to implement. Moreover, such systems have increased training demands to develop accurate models of users' speech patterns. In applications where computing resources are limited or the ability to train to a particular user's speech patterns is limited, speech recognition products tend to be slow and/or inaccurate. Currently, speech recognition enabled software applications must often compromise between complex but accurate solutions and simple but less accurate solutions. In many applications, however, the impracticality of meaningful training dictates that the application can only implement less accurate techniques.
Voice recognition is of two basic types: speaker-dependent and speaker-independent. A speaker-dependent system operates in environments where the system has relatively frequent contact with each speaker, where sizable vocabularies are involved, and where the cost of recognition errors is high. These systems are usually easier to develop, cheaper to buy and more accurate, but not as flexible as speaker-adaptive or speaker-independent systems. In a speaker-dependent system, a user trains the system by, for example, providing speech samples and creating a correlation between the samples and text of what was provided, usually with some manual effort on the part of the speaker. Such systems often use a generic engine coupled with substantial data files, called voice models, that characterize a particular speaker for which the system has been trained. The training process can involve significant effort to obtain high recognition rates. Moreover, the voice model files are tightly coupled to the recognition software so that it is difficult to port the training investment to other hardware/software platforms.
A speaker-independent system operates for any speaker of a particular type (e.g., American English). These systems are the most difficult to develop and the most expensive, and their accuracy is lower than that of speaker-dependent systems. However, they are highly useful in a wide variety of applications where many users must use the system such as answering services, interactive voice response (IVR) systems, call processing centers, data entry and the like. Such applications sacrifice the accuracy of speaker-dependent systems for the flexibility of enabling a heterogeneous group of speakers to use the system. Such applications are characterized in that high recognition rates are desirable, but the cost of recognition failure is relatively low.
A middle ground is sometimes defined as a speaker-adaptive system. A speaker-adaptive system dynamically adapts its operation to the characteristics of new speakers. These systems are more akin to speaker-dependent models, but allow the system to be trained over time. Adaptive systems can improve their vocabulary over time and result in complex but accurate speech models. Such systems still require significant training effort, however. As in speaker-dependent systems, the complex speech models cannot be readily ported to other systems.
Training methods tend to be very product specific. Moreover, the data structures in which the relationships between a user's speech and text are correlated tend to be product specific. Hence, the significant training effort applied to a first speech recognition program may not be reusable for any other program or system. In some cases, speakers must re-train systems between version updates of the same program. Temporary or permanent changes to a user's voice patterns affect performance and may require retraining. This significant training burden and lack of portability between products have worked against wide-scale adoption of speech recognition systems.
Moreover, even where a user has trained one or more speaker-dependent systems, this training effort cannot be leveraged to improve the performance of the many speaker-independent systems that are encountered. The speaker-independent systems cannot, by design, access or use speaker-dependent speech models to improve their performance. Hence, a need exists for improved speech recognition systems, software and methods that enable portable speech models that can be used for a wide variety of tasks and leverage the training efforts across a wide variety of systems.
The dichotomy between speaker-dependent and speaker-independent technologies has resulted in an interesting dilemma in industry. Many of the applications that could benefit most from accurate speech recognition (e.g., interactive voice response systems) cannot afford the complexity of highly accurate speaker-dependent systems, nor obtain the necessary voice models that would improve their accuracy. From a practical perspective, speakers will only invest the significant time required to develop a high quality voice model in applications where the result is worth the effort. The benefits realized by a business cannot compel individual speakers to submit to the necessary training regimens. Hence, these applications settle for speaker-independent solutions and invest heavily in improving the performance of such systems.
Increasingly, computer-implemented applications and services are targeting “thin clients” or computers with limited processing power and data storage capacity. Such devices are a cost-effective means of implementing user interfaces. Thin clients are becoming prominent in appliances such as televisions, telephones, Internet terminals and the like. However, the limited computing resources make it difficult to implement complex functionality such as voice and speech recognition. A need exists for voice processing systems, methods and software that can provide high quality voice processing services with reduced hardware requirements.
In the past, computers were used by one user, or perhaps a few users, to access a limited set of applications. As computers are used more frequently to provide interfaces to everyday appliances, the need to adapt user interfaces to multiple users becomes more pressing. Voice processing, in particular, represents a user input mode that is difficult to adapt to multiple users. In current systems, a voice model must be developed on and stored in each machine for each user. Not only does this tax the machine's resources, but it creates a burdensome need for each user to train each computer that they use.
Conversely, each user tends to access computer resources via a variety of computer-implemented interfaces and computing hardware. It is contemplated that any given user may wish to access voice-enabled television, voice-enabled software on a personal computer, voice-enabled automobile controls, and the like. The effort to train and maintain each of these systems individually becomes significant with only a few applications, and prohibitive with the large number of applications that could potentially become voice-enabled.
Hence, a need exists for speech recognition systems, methods and software that provide increased accuracy with reduced cost. Moreover, there is a need for systems that require reduced effort on the part of the speaker. Further, a need for systems and software that enable users to leverage training effort
Hogan & Hartson LLP
Langley Stuart T.