Method and system for providing audio playback of a...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Filing date: 1999-10-27
Issue date: 2002-09-03
Examiner: Dorvil, Richemond (Department: 2654)
Classification: Data processing: speech signal processing, linguistics, language – Speech signal processing – Synthesis

Class codes: C704S235000, C704S278000, C704S275000
Type: Reexamination Certificate
Status: active
Patent number: 06446041
ABSTRACT:

TECHNICAL FIELD
The invention relates generally to the field of speech recognition, and more specifically to a multi-source input and playback utility for a display computer.
BACKGROUND OF THE INVENTION
Since the advent of the personal computer, human interaction with the computer has been primarily through the keyboard. Typically, when a user wants to input information into a computer, he types the information on a keyboard attached to the computer. Other input devices have supplemented the keyboard, including the mouse, touch-screen displays, integrated pointer devices, and scanners. Use of these other input devices has decreased the amount of user time spent in entering data or commands into the computer.
Computer-based voice recognition and speech recognition systems have also been used for data or command input into personal computers. Voice recognition and speech recognition systems convert human speech into a format understood by the computer. When a computer is equipped with a voice or speech recognition system, data input may be performed by merely speaking the data into a computer input device. The speed at which the user can speak is typically faster than conventional data entry. Therefore, the inherent speed of conveying data through human speech is an advantage of incorporating voice recognition and speech recognition systems into personal computers. The increased efficiency of users operating personal computers equipped with such systems has encouraged their use in the workplace. Many workers in a variety of industries now utilize voice recognition and speech recognition systems for numerous applications. For example, computer software programs utilizing voice recognition and speech recognition technologies have been created by Dragon Systems, Inc. (Newton, Mass.), IBM Corporation (Armonk, N.Y.), and Lernout & Hauspie (Burlington, Mass.). When a user reads a document aloud or dictates to a speech recognition program, the program may enter the user's spoken words directly into a word processing program or other application operating on a personal computer.
Generally, computer-based voice recognition and speech recognition programs convert human speech into a series of digitized frequencies. These frequencies are matched against a previously stored set of words or speech elements, called phonemes.
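The matching step described above can be illustrated with a toy nearest-template lookup. This is a minimal sketch under stated assumptions, not the method of any particular recognizer: the template table, the three-element feature vectors, and the function name `closest_phoneme` are all hypothetical, and Euclidean distance stands in for whatever statistical matching a real system would use.

```python
import math

# Hypothetical stored templates: phoneme label -> toy feature vector.
# The numbers are invented purely for illustration.
TEMPLATES = {
    "/b/": (1.0, 0.2, 0.1),
    "/p/": (0.9, 0.8, 0.1),
    "/s/": (0.1, 0.3, 0.9),
}


def closest_phoneme(frame):
    """Return the label whose stored template is nearest to the frame."""
    return min(TEMPLATES, key=lambda label: math.dist(frame, TEMPLATES[label]))


# A digitized input frame (also invented) is matched against the stored set.
print(closest_phoneme((0.95, 0.25, 0.15)))
```

The sketch only shows the idea of comparing digitized input against a previously stored set and picking the best match.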
A phoneme is the smallest unit of speech that distinguishes one sound from another in a spoken language. Each phoneme may have one or more corresponding allophones. An allophone is an acoustic manifestation of a phoneme. A particular phoneme may have many allophones, each sounding slightly different due to the position of the phoneme in a word or variant pronunciations in a language of the same letter set. For example, the phoneme /b/ is pronounced differently in the words “boy” and “beyond.” Each pronunciation is an allophone of the phoneme /b/.
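The one-to-many relationship between a phoneme and its allophones may be pictured as a simple mapping. The following Python sketch is illustrative only: the class name `Phoneme`, the use of example words as context labels, and the textual descriptions are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Phoneme:
    """A phoneme together with the allophones that realize it."""
    symbol: str
    # Maps a context label (here, an example word) to an allophone description.
    allophones: Dict[str, str] = field(default_factory=dict)

    def add_allophone(self, context: str, description: str) -> None:
        self.allophones[context] = description


# Entries mirror the "boy" / "beyond" example; descriptions are placeholders.
b = Phoneme("/b/")
b.add_allophone("boy", "variant of /b/ heard in 'boy'")
b.add_allophone("beyond", "variant of /b/ heard in 'beyond'")

print(b.symbol, "has", len(b.allophones), "allophones")
```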
The utility processes these phonemes and converts them to text based on the most likely textual representation of each phoneme, in a manner well known to those skilled in the art. The text is then displayed within a word processor or other application, such as a spreadsheet, database, web browser, or any program capable of receiving a voice input and converting it into display text or a program command. The multi-source input and playback utility may store the audio data. The audio data may be stored in a variety of formats on various storage media, including volatile RAM, long-term magnetic storage, or optical media such as a CD-ROM. The audio data may be further compressed in order to minimize storage requirements. The utility may also link the stored audio data to the text generated from the audio data for future playback.
When the computer determines correct matches for the series of frequencies, computer recognition of that portion of human speech is accomplished. The frequency matches are compiled until sufficient information is collected for the computer to react. The computer can then react to certain spoken words by storing the speech in a memory device, transcribing the speech as text in a document manipulable by a word processing program, or executing a command in an application program.
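The storage, compression, and linking of audio data to text described above might be sketched as follows. This is a hedged illustration, not the patented implementation: the `LinkedText` class and `link` function are hypothetical names, and zlib (a lossless general-purpose compressor from the Python standard library) is an assumption, since the text does not name a compression scheme.

```python
import zlib
from dataclasses import dataclass


@dataclass
class LinkedText:
    """Transcribed text together with the compressed audio it came from."""
    text: str
    compressed_audio: bytes

    def audio(self) -> bytes:
        # Decompress on demand, for example when playback is requested.
        return zlib.decompress(self.compressed_audio)


def link(text: str, raw_audio: bytes) -> LinkedText:
    # zlib is lossless, so the original bytes are fully recoverable.
    return LinkedText(text, zlib.compress(raw_audio))


# Placeholder bytes standing in for recorded samples.
raw = bytes(1000)
entry = link("recognize speech", raw)
print(len(raw), "bytes stored as", len(entry.compressed_audio))
```

Keeping the audio next to the text it produced is what makes later playback of a selected passage possible.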
Natural speech input systems are expected to ultimately reach the marketplace. Such systems will not require the user to speak in any particular way for the computer to understand, but instead will be able to understand the difference between a user's command to the computer and information to be entered into the computer.
Lacking this technological advance, contemporary voice recognition and speech recognition systems are not completely reliable. Even with hardware and software modifications, the most proficient voice recognition and speech recognition systems attain no greater than 97-99% reliability. Internal and external factors may affect the reliability of these systems. Factors dependent upon the recognition technology itself include the finite set of words or phonemes inherent in the speaker's language, and the vocabulary of words to which the speech recognition software may compare the speaker's input. Environmental factors such as regional accents, external noise, and microphone quality may degrade the quality of the input, thus affecting the frequency of the user's words and introducing potential error into the word or phoneme matching.
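To give a sense of scale for the reliability figures above, the short calculation below estimates how many words would be misrecognized in a dictated document. It assumes, purely for illustration, that "reliability" means per-word accuracy and that the document is 1,000 words long; neither assumption comes from the text, and the function name is hypothetical.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Expected number of misrecognized words, rounded to the nearest word."""
    return round(word_count * (1.0 - accuracy))


# Hypothetical 1,000-word dictation at the two ends of the stated range.
for accuracy in (0.97, 0.99):
    print(f"{accuracy:.0%} accuracy: about {expected_errors(1000, accuracy)} errors")
```

Under these assumptions, even the upper end of the range leaves on the order of ten errors per thousand words for a proofreader to find.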
Consequently, dictated documents transcribed by speech recognition software often contain recognition errors. Unlike typing errors, where simple mistakes such as the transposition of letters are easily identifiable and correctable, recognition errors are often more severe. Recognition errors typically are not the substitution or transposition of letters, but instead tend to be the wholesale substitution of similar-sounding words. For example, a classic speech recognition error is the transcription of the phrase “recognize speech” as “wreck a nice beach.” While these phrases sound similar, they have totally different meanings. Further, an editor proofreading a document containing this recognition error may not immediately recall the intended phrase, leading to unnecessary confusion.
Traditionally, users have attempted to minimize this confusion by reading words aloud as they proofread the document. This practice assists in identifying intended phrases, since the vocal similarities are apparent when the document is read aloud. However, where significant time elapses between dictating and editing a document, the user may forget what the intended phrase was.
Known current speech recognition products attempt to solve this problem by storing the dictation session as audio data, and linking the stored audio data to the individual transcribed words. Users may select single words or text sequences and request playback of the audio corresponding to the selected portion.
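Linking stored audio to individual transcribed words, and retrieving the audio for a selection, can be sketched as below. The `DictatedWord` class, the millisecond offsets, and the `audio_range` function are assumptions introduced for this example; they do not describe any specific product.

```python
from dataclasses import dataclass


@dataclass
class DictatedWord:
    text: str
    start_ms: int  # where this word's audio begins in the session recording
    end_ms: int    # where this word's audio ends


def audio_range(words, first, last):
    """Return the (start_ms, end_ms) span covering words[first] to words[last]."""
    selected = words[first:last + 1]
    return selected[0].start_ms, selected[-1].end_ms


# Timings are invented for illustration.
session = [
    DictatedWord("recognize", 0, 600),
    DictatedWord("speech", 600, 1100),
]

# Selecting both words yields one contiguous span of the recording to play.
print(audio_range(session, 0, 1))
```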
While this aids a user in recognizing the intended transcription, a severe problem arises in the event that the user has edited the document in the time between dictation and requesting audio playback. A user is then presented with the prospect of requesting playback for a portion of a document generated through mixed input sources.
For example, a user may have dictated “I wish my computer could recognize speech,” which the speech recognition system transcribed as “I wish my computer could wreck a nice beach.” If the user then types the word “really” between “I” and “wish,” the document has mixed input sources. Thus, when a user selects the sentence as it appears on the screen (“I really wish my computer could wreck a nice beach”) and requests playback, no audio data is linked to the word “really,” since it was typed and not dictated.
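The mixed-source situation in this example can be reproduced with a small data model. This is a sketch under stated assumptions: the `Word` class, the fixed 400 ms slice per dictated word, and the helper `words_without_audio` are hypothetical and serve only to show that the typed word carries no audio link.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Word:
    text: str
    # (start_ms, end_ms) into the dictation recording; None for typed words.
    audio: Optional[Tuple[int, int]] = None


def words_without_audio(words):
    """Return the text of every word that has no linked audio."""
    return [w.text for w in words if w.audio is None]


# The sentence as transcribed, with an invented 400 ms slice per word.
dictated = "I wish my computer could wreck a nice beach".split()
doc = [Word(w, (i * 400, (i + 1) * 400)) for i, w in enumerate(dictated)]

# The user then types "really" between "I" and "wish".
doc.insert(1, Word("really"))

print(" ".join(w.text for w in doc))
print(words_without_audio(doc))
```

The selection now spans dictated and typed words, and only the typed word lacks audio.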
Known current speech recognition platforms disable the playback option in this situation. Instead, the speech recognition system returns an error message to the user, stating that playback is not available because audio data does not exist for all of the selected text. This force

Affiliated with

Kim Paul Kyong Hvan (Inventor)
Reynar Jeffrey C. (Inventor)
Rucker Erik (Inventor)

Also associated with

Dorvil Richemond (Examiner)
Merchant & Gould P.C. (Law Firm)
Microsoft Corporation (Corporate Assignee)
Nolan Daniel (Examiner)
