Reexamination Certificate
1999-10-06
2002-07-02
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S270000, C704S235000, C704S251000, C704S201000
active
06415258
ABSTRACT:
TECHNICAL FIELD
This invention relates generally to the field of multi-source data processing systems and, more particularly, to a background audio recovery system for speech recognition systems/software.
BACKGROUND OF THE INVENTION
Since the advent of the personal computer, human interaction with the computer has been primarily through the keyboard. Typically, when a user wants to input information or to enter a command into a computer, he types the information or the command on the keyboard attached to the computer. Other input devices that have supplemented the keyboard include the mouse, touch-screen displays, integrated pointer devices, and scanners. Use of these other input devices has decreased the amount of user time spent in entering data or commands into the computer.
Computer-based voice recognition and speech recognition systems have also been used for data or command input into personal computers. Speech recognition systems convert human speech into a format that can be understood by the computer. When a computer is equipped with a speech recognition system, data and command input can be performed by merely speaking the data or command to the computer. The speed at which the user can speak is typically faster than conventional data or command entry. Therefore, the inherent speed in disseminating data or commands through human speech is a sought-after advantage of incorporating voice recognition and speech recognition systems into personal computers.
The increased efficiency of users operating personal computers equipped with voice recognition and speech recognition systems has encouraged the use of such systems in the workplace. Many workers in a variety of industries now utilize voice recognition and speech recognition systems for numerous applications. For example, computer software programs utilizing voice recognition and speech recognition technologies have been created by DRAGON, IBM, and LERNOUT & HAUSPIE. When a user reads a document aloud or dictates to a voice recognition program, the program can enter the user's spoken words directly into a word processing program operating on a personal computer.
Generally, computer-based voice recognition and speech recognition programs convert human speech into a series of digitized frequencies. These frequencies are matched against a previously stored set of words, or phonemes. When the computer determines correct matches for the series of frequencies, computer recognition of that portion of human speech is accomplished. The frequency matches are compiled until sufficient information is collected for the computer to react. The computer can then react to certain spoken words by storing the human speech in a memory device, transcribing the human speech into a document for a word processing program, or executing a command in an application program.
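The matching step described above can be illustrated with a minimal sketch. It assumes each stored word is represented by a short vector of dominant frequencies and that recognition picks the closest stored vector; the template values, the `match_word` helper, and the `max_distance` cutoff are all hypothetical and are not taken from any particular recognition product.

```python
import math

# Hypothetical stored templates: each word maps to a short vector of
# dominant frequencies in Hz. The values are invented for illustration.
TEMPLATES = {
    "open": [310.0, 870.0, 2250.0],
    "save": [280.0, 1900.0, 2600.0],
    "print": [400.0, 1700.0, 2400.0],
}


def distance(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_word(features, templates=TEMPLATES, max_distance=150.0):
    """Return the stored word closest to `features`, or None if no
    template lies within `max_distance` (an assumed rejection cutoff)."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = distance(features, template)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= max_distance else None


if __name__ == "__main__":
    print(match_word([305.0, 880.0, 2240.0]))    # close to the "open" template
    print(match_word([1000.0, 1000.0, 1000.0]))  # far from every template
```

Returning None when no template is close enough reflects the point that a match must be found before recognition of that portion of speech is considered accomplished; a real system would use far richer features and statistical models.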
However, voice recognition and speech recognition systems are not 100% accurate. Even with hardware and software modifications, the most efficient voice recognition and speech recognition systems can attain approximately 97-99% accuracy. Internal and external factors can affect the reliability of voice recognition and speech recognition systems. Internal factors, which depend upon the recognition technology, include the mismatch between the finite set of stored words or phonemes and the vocabulary of a speaker. External factors, such as regional accents, external noise, and the type of microphone, can degrade the quality of the input, thus affecting the frequency of the user's words and introducing potential error into the word or phoneme matching.
Conventional speech recognition systems suffer from significant recognition error rates. Different solutions have been applied to increase the recognition rate and to decrease the number of recognition errors. One solution is to train the voice recognition or speech recognition program to recognize the frequencies for a specific human voice. In a speaker dependent speech recognition system, the system creates a voice profile that recognizes the pronunciation patterns unique to a specific human voice. Speech recognition systems that are not trained for a particular speaker are called speaker independent systems and are therefore more prone to recognition errors due to regional accents or differences in pronunciation.
Another solution uses a method called discrete speech input. Discrete speech input requires the operator to speak relatively slowly, pausing after each word before speaking the next. The pauses give the speech recognition system an opportunity to distinguish between the beginning and the end of each of the operator's words. Recognition systems relying upon discrete speech input are slow and cumbersome for users accustomed to speaking at a normal conversational speed.
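The role of the pauses in discrete speech input can be sketched as follows. This is an illustrative assumption, not a method stated in the source: the audio is taken to be already reduced to one energy value per frame, and a word boundary is declared once a run of quiet frames reaches a minimum length. The function name `segment_words` and the parameters `threshold` and `min_pause` are hypothetical.

```python
def segment_words(energies, threshold=0.1, min_pause=3):
    """Split a sequence of per-frame energies into word segments.

    A frame counts as speech when its energy is >= `threshold`. A word
    ends once `min_pause` consecutive quiet frames follow it. Returns a
    list of (start, end) frame indices with `end` exclusive.
    """
    segments = []
    start = None   # index of the first frame of the current word
    silence = 0    # quiet frames seen since the last speech frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause:
                # The word ended just before the quiet run began.
                segments.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Input ended mid-word or during a short trailing pause.
        segments.append((start, len(energies) - silence))
    return segments


if __name__ == "__main__":
    frames = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0.7, 0.8, 0, 0.9, 0, 0, 0, 0]
    print(segment_words(frames))
```

In this sketch a short dip inside a word (fewer than `min_pause` quiet frames) does not split it, which is why the speaker must pause clearly between words for the boundaries to be found.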
An alternative solution involves a method based upon continuous speech input. Continuous speech input systems require the user to speak a limited set of words that have been previously stored in the system vocabulary. Therefore, the speech recognition system relies upon a limited vocabulary of words. Optimum use of these systems occurs when the system is utilized by users in an environment with a specific vocabulary. For example, continuous speech input systems have been implemented in the medical industry in specific fields such as radiology, orthopedics, internal medicine, emergency medicine, mental health, etc. However, continuous speech input systems are constrained by their inherently limited vocabularies, which restricts their use in other industries or work environments.
Natural speech input systems will ultimately reach the marketplace. These systems will not require the user to speak in any particular way for the computer to understand, but will be able to understand the difference between a user's command to the computer and information to be entered into the computer.
Throughout the remainder of this disclosure, the terms “voice recognition” and “speech recognition” may be used interchangeably. In some instances, a distinction is made between voice recognition and speech recognition. However, both voice recognition and speech recognition systems suffer from some of the same reliability problems described above, and the same solutions have been applied to both recognition technologies to resolve the shortcomings of the prior art.
Problems of Conventional Art to Be Solved by the Present Invention
Many multi-source data processing systems include voice recognition software. As described above, conventional voice and speech recognition software has many drawbacks. One major drawback is that an application program employing the voice or speech recognition software, such as a word processing program, frequently loses or does not properly capture dictation generated by a user.
There are two major reasons for not properly capturing dictation. First, users frequently forget to activate the speech recognition software because the microphone status indicators or icons are difficult to locate on a display device. Second, users frequently assume that the microphone of the speech recognition software is turned on and start to dictate their thoughts. After a few minutes, however, the users discover that their voice commands and/or dictation were not recorded or properly processed by the speech recognition software. In such situations, users have to “turn-on” or “wake-up” the speech recognition software and re-dictate their thoughts.
Another cause of lost dictation is that the computers supporting the speech recognition software often have very slow processing speeds. Speech recognition software typically requires increased processing power relative to everyday applications, and many conventional computers do not sufficiently meet the needs of speech recognition software. In conventional computers, users may often utter a command and assume the command was properly captured by the computer.
Caulton David Allen
Kim Paul Kyong Hwan
Reynar Jeffrey C.
Rucker Erik
Dorvil Richemond
Microsoft Corporation
Nolan Daniel A.