Capture and application of sender voice dynamics to enhance...

Filed: 1999-01-28
Issued: 2001-01-16
Examiner: Hudspeth, David R. (Department: 2741)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; Recognition
Additional classes: C704S260000, C704S276000, C704S278000
Document type: Reexamination Certificate
Status: active
Patent number: 06175820
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to data processing systems, and in particular to a method and system for an enhanced speech recognition environment on a data processing system. Still more particularly, the present invention relates to a method and system for providing voice dynamics in a speech-to-text application within a speech recognition environment on a data processing system.
2. Description of the Related Art
Human speech recognition technology has been available for several years; it is well known in the art and is commercially available. Speech analysis and speech recognition algorithms, machines, and devices are becoming increasingly common, and such systems have become more powerful and less expensive. Those familiar with the technology are aware that various applications exist which recognize human speech and store it in various forms on a data processing system. One extension of this technology is the speech-to-text application, which provides a textual representation of human speech on a data processing system. Speech recognition software is utilized every day by hundreds of thousands of people.
Speech-to-text applications have emerged as one of the ultimate goals of speech recognition technology. Many current applications utilize this technology to convert spoken language into text form, which is then made accessible to a user of the data processing system.
In recent years, an explosion in the utilization of voice recognition systems has occurred. One goal of voice recognition systems is to provide a more humanistic interface for operating a data processing system. Voice recognition systems are typically utilized with other input/output devices, such as a mouse, keyboard, or printer, which supplement the input/output (I/O) processes of the voice recognition system.
Common implementations of voice recognition technology include Dragon™ (a product of COREL), and ViaVoice™ and IBM VoiceType™, both products of International Business Machines Corporation (IBM).
ViaVoice Executive Edition is IBM's most powerful continuous speech recognition software. ViaVoice Executive offers direct dictation into most popular Windows applications, voice navigation of the desktop and applications, and intuitive “natural language commands” for editing and formatting Microsoft Word documents.
For voice recognition to be useful to a user of a data processing system, various means of outputting the recognized speech signal through the user interface are required. This aspect of human speech recognition is developing quickly and is well known in the art.
Standard Generalized Markup Language (SGML) has been developed to attach additional information to text being output, so that the recipient receives a more detailed output. The Java Speech Markup Language (JSML) was developed specifically for marking up text that will be spoken on devices incorporating the Java Speech API (Java is a trademark of Sun Microsystems, Inc.).
The Java Speech Markup Language is utilized by applications to annotate text input to Java Speech Application Programming Interface (JSAPI) speech synthesizers. JSML elements provide a speech synthesizer equipped with the JSAPI with detailed information on how to say the text. JSML includes elements that describe the structure of a document, provide pronunciations of words and phrases, and place markers in the text. JSML also provides prosodic elements that control phrasing, emphasis, pitch, speaking rate, and other characteristics; appropriate markup of this kind improves the quality and naturalness of the synthesized voice. Because JSML utilizes the Unicode character set, it can be utilized to mark up text in most languages.
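To make concrete how a JSAPI synthesizer might consume such annotations, the following Python sketch walks a small JSML document and lists each stretch of text together with the emphasis and prosody settings in force for it. It assumes the upper-case JSML element and attribute names (`JSML`, `EMP` with `LEVEL`, `PROS` with `RATE` and `VOL`). The sample text, the attribute values, and the `spoken_segments` helper are hypothetical illustrations and are not part of JSAPI. The sketch also uses Python's standard `xml.etree.ElementTree` parser rather than a real synthesizer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Illustrative JSML input. Element names (JSML, EMP, PROS) and attribute
# names (LEVEL, RATE, VOL) are assumed from the JSML specification;
# the attribute values are hypothetical examples.
SAMPLE = (
    '<JSML>Meet me <EMP LEVEL="strong">today</EMP>, '
    '<PROS RATE="-20%" VOL="+30%">not tomorrow</PROS>.</JSML>'
)

Segment = Tuple[str, Dict[str, str]]


def spoken_segments(jsml_text: str) -> List[Segment]:
    """Flatten JSML into (text, active emphasis/prosody attributes) pairs."""
    segments: List[Segment] = []

    def walk(elem: ET.Element, inherited: Dict[str, str]) -> None:
        active = dict(inherited)
        if elem.tag in ("EMP", "PROS"):
            # Qualify each attribute with its element so EMP and PROS settings stay distinct.
            active.update({f"{elem.tag}.{k}": v for k, v in elem.attrib.items()})
        if elem.text and elem.text.strip():
            segments.append((elem.text.strip(), active))
        for child in elem:
            walk(child, active)
            # Text after a child's closing tag belongs to the enclosing element.
            if child.tail and child.tail.strip():
                segments.append((child.tail.strip(), active))

    walk(ET.fromstring(jsml_text), {})
    return segments


for text, attrs in spoken_segments(SAMPLE):
    print(f"{text!r:16} {attrs}")
```

Text that follows a closing tag (an element's `tail`) is attributed to the enclosing element, so the comma after "today" carries no emphasis. This sketch does not interpret other JSML elements, such as structure or pronunciation markers.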
The current market consists of different forms of voice recognition: Speaker Dependent, Speaker Independent, Command & Control, Discrete Speech Input, Continuous Speech Input, and Natural Speech Input.
Natural Speech Input is the ultimate goal in voice recognition technology. Under this goal, a user can talk to the computer in no specific manner and have the computer understand what the user wants, then apply the commands or words. One aspect of natural speech input is the ability to capture speaker voice dynamics to convey additional meaning in the text created. Currently, no application exists which can capture speech dynamics and convert them into a text document representing the spoken text.
As voice recognition technology evolves, there will be a need to retain subtleties that are often lost in the process. Much of a verbal message's value lies in its tone, emphasis, inflection, volume, etc., which are mostly or entirely lost today. If all or part of this information could be captured and passed along with the text message created by speech-to-text software, the information content delivered to the recipient would be greatly enhanced.
Further, although speech capture is well known, no current method or application bridges speech recognition and speech-to-text technology to create marked-up text which exhibits speech dynamics such as volume, pitch, range, and rate. Currently, most Extensible Markup Language (XML) of this kind is prepared by hand, without JSAPI-specific editors.
It would therefore be desirable to have a method and system for enhanced recognition of speech, including recognition of its dynamics such as volume, pitch, and tone. It would further be desirable to allow the real-time representation of such voice dynamics with speech in its textual form. It would further be desirable if such captured voice dynamics could be transmitted along with the text representation to an audible output as a marked-up document.
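Each of the dynamics named above can be reduced to a number. The Python sketch below shows one conventional way to do this, assuming mono audio samples scaled to [-1.0, 1.0]:

* **Volume** is an RMS level in dBFS.
* **Pitch** is a crude autocorrelation estimate.
* **Range** is the spread of voiced pitch across an utterance.
* **Rate** is words per minute.

These measures, the 75 to 400 Hz search band, and the 0.3 voicing threshold are assumptions chosen for illustration. The text does not specify how dynamics are to be measured, and practical systems would use more robust pitch trackers.

```python
import math
from typing import Iterable, Optional, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Volume: RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def pitch_hz(samples: Sequence[float], sample_rate: int,
             fmin: float = 75.0, fmax: float = 400.0) -> Optional[float]:
    """Pitch: crude autocorrelation estimate; None if no clear periodicity."""
    n = len(samples)
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    energy = sum(s * s for s in samples)
    if min_lag < 1 or max_lag <= min_lag or energy == 0:
        return None
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Assumed voicing threshold: reject weak peaks from noise or unvoiced sounds.
    if best_lag == 0 or best_corr < 0.3 * energy:
        return None
    return sample_rate / best_lag


def pitch_range_hz(pitches: Iterable[Optional[float]]) -> float:
    """Range: spread between highest and lowest voiced pitch in an utterance."""
    voiced = [p for p in pitches if p is not None]
    return max(voiced) - min(voiced) if voiced else 0.0


def rate_wpm(word_count: int, duration_s: float) -> float:
    """Rate: words per minute over a span of recognized speech."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * word_count / duration_s


# Example: a 200 Hz tone at half amplitude, 50 ms long, sampled at 8 kHz.
SR = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / SR) for i in range(400)]
print(rms_dbfs(tone), pitch_hz(tone, SR), rate_wpm(30, 12.0))
```

For the 200 Hz test tone, one period spans 40 samples (8000 / 200). The autocorrelation peak therefore falls at a lag of 40, and the pitch estimate is 200 Hz.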
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved data processing system.
It is another object of the present invention to provide a method and system for enhanced speech recognition on a data processing system.
It is yet another object of the present invention to provide a method and system for providing speaker voice dynamics in a speech-to-text application within a speech recognition environment on a data processing system.
The foregoing objects are achieved as is now described. A method is disclosed for providing voice dynamics of human utterances converted to and represented by text within a data processing system. The method first selects predetermined parameters for recognition and representation of dynamics in human utterances. The method then creates an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition. Also disclosed is the capturing of said dynamics in a human utterance utilizing said enhanced human speech recognition software, and the converting of said human utterance into a textual representation utilizing said speech-to-text ability of said software. Finally, a method is disclosed to merge said dynamics with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
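The final merge step can be sketched as follows. The summary leaves several details open, so this sketch makes these assumptions:

* The recognizer reports each word with its own measured volume and pitch.
* The "predetermined parameters" are modeled as hypothetical thresholds relative to the utterance average (`LOUD_DB`, `SOFT_DB`, `HIGH_PITCH`).
* The marked-up output uses JSML-style `EMP` and `PROS` elements.

`RecognizedWord` and `merge_to_jsml` are illustrative names and are not part of any existing API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional
from xml.sax.saxutils import escape


@dataclass
class RecognizedWord:
    """One word from the speech-to-text engine plus its captured dynamics (hypothetical)."""
    text: str
    volume_db: float            # measured level of the word, e.g. RMS in dBFS
    pitch_hz: Optional[float]   # None when no pitch was detected


# Hypothetical "predetermined parameters" selected before recognition.
LOUD_DB = 6.0      # this far above the utterance average -> strong emphasis
SOFT_DB = -6.0     # this far below the utterance average -> reduced emphasis
HIGH_PITCH = 1.2   # pitch ratio to the average at or above which pitch is marked


def merge_to_jsml(words: List[RecognizedWord]) -> str:
    """Merge recognized text with its dynamics into a JSML-style document."""
    if not words:
        return "<JSML></JSML>"
    avg_db = mean(w.volume_db for w in words)
    voiced = [w.pitch_hz for w in words if w.pitch_hz is not None]
    avg_pitch = mean(voiced) if voiced else None
    tokens = []
    for w in words:
        token = escape(w.text)  # escape &, <, > so the output stays well-formed
        delta_db = w.volume_db - avg_db
        if delta_db >= LOUD_DB:
            token = f'<EMP LEVEL="strong">{token}</EMP>'
        elif delta_db <= SOFT_DB:
            token = f'<EMP LEVEL="reduced">{token}</EMP>'
        if avg_pitch and w.pitch_hz is not None:
            ratio = w.pitch_hz / avg_pitch
            if ratio >= HIGH_PITCH:
                token = f'<PROS PITCH="+{round((ratio - 1) * 100)}%">{token}</PROS>'
        tokens.append(token)
    return "<JSML>" + " ".join(tokens) + "</JSML>"


utterance = [
    RecognizedWord("I", -20.0, 120.0),
    RecognizedWord("said", -20.0, 120.0),
    RecognizedWord("NOW", -8.0, 180.0),
]
print(merge_to_jsml(utterance))
```

Each word is measured against the utterance's own average rather than against fixed absolute levels, which keeps the markup independent of microphone gain. In the sample utterance, "NOW" is 8 dB louder and about 29% higher in pitch than average, so it is the only word that receives markup.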
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.

Inventor: Dietz, Timothy Alan
Examiner: Azad, Abul K.
Law firm: Felsman Bradley Vaden Gunter & Dillon, LLP
Attorney: Henkler, Richard A.
Examiner: Hudspeth, David R.
Corporate assignee: International Business Machines Corporation
