Text formatting from speech

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

active

06785649

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to converting between text and speech, and specifically to converting speech to text in the presence of speech intonation.
BACKGROUND OF THE INVENTION
Methods for converting between text and speech are known in the art. Text-to-speech conversion methods have been commercially produced for at least fifteen years, with improvements being made to the quality of the products as time has proceeded. Speech-to-text conversion is significantly more difficult to achieve than text-to-speech, and general-purpose, commercial speech-to-text systems have only been available in the last few years.
The Productivity Works, Inc., of Trenton, N.J., produces a “SoftVoice” text-to-speech product known as “SVTTS,” which analyzes text into phonemes, and generates speech from the phonemes. SoftVoice is a trademark of SoftVoice Inc. Tags and commands (which are not themselves converted to speech) may be embedded into the text so as to indicate to the SVTTS how the speech is to be generated. For example, there are tags for speaking in an English or Spanish accent, or in a whisper or speaking with a breathy quality.
IBM Corporation of Armonk, New York, produces a speech-to-text software package known as “ViaVoice.” ViaVoice is a registered trademark of International Business Machines Corporation. Preferably, the system uses a learning period, during which an operator is able to adjust to the system, and during which a computer upon which the system is installed becomes accustomed to the speech of the operator. During operation, the system converts speech to text and, inter alia, may be taught to recognize specific words and output them in a special format. For example, the system may be instructed to convert the spoken word “comma” to the punctuation mark “,”.
In an article titled “Super Resolution Pitch Determination of Speech Signals,” by Medan et al., in IEEE Transactions on Signal Processing 39:1 (January, 1991), which is incorporated herein by reference, the authors describe an algorithm giving extremely high resolution of pitch value measurements of speech. The algorithm may be implemented in real time to generate pitch spectral analyses.
In a book titled “Pitch Determination of Speech Signals” by W. Hess, (Springer-Verlag, 1983), which is incorporated herein by reference, the author gives a comprehensive survey of available pitch determination algorithms. The author points out that no single algorithm operates reliably for all applications.
SUMMARY OF THE INVENTION
It is an object of some aspects of the present invention to provide improved methods and apparatus for converting speech to text.
In preferred embodiments of the present invention, a speech/text processor automatically converts speech to text, while analyzing one or more non-verbal characteristics of the speech. Such non-verbal characteristics include, for example, the speed, pitch, and volume of the speech. The non-verbal characteristics are mapped to corresponding format characteristics of the text, which are applied by the speech/text processor in generating a text output. Such format characteristics can include, for example, font attributes such as different font faces and/or styles, character height, character width, character weight, character position, spacing between characters and/or words, and combinations of these characteristics. Text with such associated characteristics is herein termed expressive text, and cannot be generated by speech-to-text systems known in the art.
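As an illustration only, the mapping from non-verbal characteristics to format characteristics described above might be sketched as follows. The class names, field names, units, and threshold values are all hypothetical; the text does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class SpokenWord:
    """A recognized word with measured non-verbal characteristics (hypothetical fields)."""
    text: str
    pitch_hz: float    # average pitch over the word
    volume_db: float   # average loudness over the word
    rate_wps: float    # speaking rate in words per second


@dataclass
class FormattedWord:
    """One word of expressive text with its format characteristics."""
    text: str
    bold: bool
    font_size_pt: int
    letter_spacing_em: float


def to_expressive(word: SpokenWord) -> FormattedWord:
    # Thresholds are assumptions chosen for illustration, not values from the text.
    return FormattedWord(
        text=word.text,
        bold=word.volume_db >= 70.0,                          # loud speech -> heavier weight
        font_size_pt=16 if word.pitch_hz >= 200.0 else 12,    # high pitch -> taller characters
        letter_spacing_em=0.2 if word.rate_wps < 1.5 else 0.0,  # slow speech -> wider spacing
    )
```

Here each non-verbal characteristic drives one format characteristic; combinations of characteristics could be handled in the same way.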
The expressive text produced from the speech may be used, for example, in an electronic mail transmission and/or to produce a hardcopy of the speech. Alternatively, the expressive text may be converted to a marked-up text, by a custom mark-up language or a standard mark-up language, such as HTML (hypertext mark-up language). Associating format characteristics with text to register non-verbal characteristics of speech is an innovative and extremely useful way of converting between speech and text, and overcomes limitations of speech-to-text and text-to-speech methods known in the art.
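A minimal sketch of the conversion to HTML mentioned above, assuming each word already carries a weight flag and a font size. The function name `to_html` and the tuple layout are assumptions made for this example, not part of the described system.

```python
from html import escape


def to_html(words):
    """Render expressive text as HTML.

    words: iterable of (text, bold, font_size_pt) tuples (hypothetical layout).
    """
    parts = []
    for text, bold, size_pt in words:
        # Escape the word so that characters such as "<" are safe inside markup.
        span = f'<span style="font-size:{size_pt}pt">{escape(text)}</span>'
        parts.append(f"<b>{span}</b>" if bold else span)
    return " ".join(parts)
```

A custom mark-up language could be produced the same way by substituting other tag names.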
In some preferred embodiments of the present invention, the expressive text generated by the speech/text processor is converted back to speech by a speech synthesizer. The speech synthesizer recognizes the format characteristics of the expressive text, and applies them to generate speech so as to reproduce the non-verbal characteristics originally analyzed by the speech/text processor. Alternatively, similar format characteristics may be generated using a suitable word processor program, so that text that is input using a keyboard is reproduced by the speech synthesizer with certain desired non-verbal characteristics.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for converting speech to text, including:
receiving a spoken input having a non-verbal characteristic; and
automatically generating a text output, responsive to the spoken input, having a variable format characteristic corresponding to the non-verbal characteristic of the spoken input.
Preferably, receiving the spoken input includes analyzing the spoken input to identify the non-verbal characteristic.
Preferably, receiving the spoken input includes determining words and boundaries between words, and generating the text output includes generating text corresponding to the words.
Preferably, the non-verbal characteristic includes at least one characteristic of the words selected from a group consisting of a speed, a pitch, and a volume, of the words.
Preferably, receiving the spoken input includes determining parts of words and boundaries between parts of words in the spoken input, and the non-verbal characteristic includes at least one characteristic of the parts of the words selected from a group consisting of a speed, a pitch, and a volume of the parts of the words.
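One way to picture how a characteristic is attached to individual words or parts of words is to average frame-level measurements between the detected boundaries. The sketch below assumes that measurements arrive as timestamped samples and that boundaries are given as start and end times; both assumptions, and the function name, are illustrative.

```python
from statistics import mean


def per_word_average(frames, boundaries):
    """Average a measured characteristic over each word (or part of a word).

    frames: list of (time_s, value) samples, e.g. pitch or volume readings.
    boundaries: list of (label, start_s, end_s), with end_s exclusive.
    Returns a list of (label, average) pairs; average is None if no sample
    falls inside the segment.
    """
    result = []
    for label, start, end in boundaries:
        values = [v for t, v in frames if start <= t < end]
        result.append((label, mean(values) if values else None))
    return result
```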
Preferably, generating the text output includes encoding the text output as marked-up text.
Preferably, generating the text output includes generating the text output according to a predetermined mapping between the variable format characteristic and the non-verbal characteristic.
Further preferably, generating the text output includes normalizing a distribution of the non-verbal characteristic over a predetermined quantity of speech according to an adaptive mapping.
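The text does not say how the normalization is performed. As one possible reading, the sketch below standardizes each measurement against a sliding window of recent measurements, so that the mapping adapts to the current speaker. The class name, the window size, and the choice of a z-score are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveNormalizer:
    """Normalize a non-verbal characteristic over a fixed quantity of recent speech."""

    def __init__(self, window=50):
        # Keep only the most recent `window` measurements (hypothetical default).
        self.history = deque(maxlen=window)

    def normalize(self, value):
        """Return the value as a z-score relative to the recent window."""
        self.history.append(value)
        mu = mean(self.history)
        sigma = pstdev(self.history)
        # With a single sample, or a constant signal, there is no spread to scale by.
        return 0.0 if sigma == 0 else (value - mu) / sigma
```

A normalized value near zero would then map to the default format, and larger deviations to stronger formatting.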
Alternatively, generating the text output includes generating the variable format characteristic according to a user-alterable absolute mapping.
Preferably, generating the text output according to the predetermined mapping includes generating the text output according to a quantized mapping, wherein a range of values of the non-verbal characteristic is mapped to a discrete variable format characteristic.
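A quantized mapping of this kind can be sketched with a lookup over range edges. The edge values and the weight names below are hypothetical and serve only to show a range of values collapsing to one discrete format characteristic.

```python
from bisect import bisect_right

# Hypothetical volume range edges in dB, and one discrete font weight per range.
VOLUME_EDGES_DB = [50.0, 65.0, 80.0]
FONT_WEIGHTS = ["light", "normal", "bold", "extra-bold"]


def quantize_volume(volume_db):
    """Map a volume measurement to one of a fixed set of font weights."""
    # bisect_right counts how many edges are <= volume_db, which is the range index.
    return FONT_WEIGHTS[bisect_right(VOLUME_EDGES_DB, volume_db)]
```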
Alternatively, generating the text output according to the predetermined mapping includes generating the text output according to a continuous mapping, wherein a range of values of the non-verbal characteristic is mapped to a range of values of the variable format characteristic.
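By contrast, a continuous mapping might be sketched as linear interpolation between two ranges. The pitch range, the font-size range, and the choice of a linear function are assumptions for illustration; the text requires only that a range of values maps to a range of values.

```python
def pitch_to_font_size(pitch_hz, pitch_range=(80.0, 320.0), size_range=(8.0, 24.0)):
    """Map pitch linearly onto font size, clamping values outside the pitch range."""
    lo, hi = pitch_range
    s_lo, s_hi = size_range
    clamped = min(max(pitch_hz, lo), hi)
    fraction = (clamped - lo) / (hi - lo)
    return s_lo + fraction * (s_hi - s_lo)
```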
Preferably, automatically generating the text output includes:
applying the predetermined mapping at a transmitter;
encoding the text output with the variable format characteristic as a data bitstream at the transmitter;
transmitting the data bitstream from the transmitter to a receiver; and
decoding the data bitstream to generate the text output with the variable format characteristic at the receiver.
Preferably, applying the predetermined mapping at the transmitter includes altering the predetermined mapping at the transmitter.
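The transmitter-side variant can be sketched as follows, with the mapping applied before encoding so that the receiver only decodes. JSON over UTF-8 stands in for the data bitstream here purely for convenience; the text does not name an encoding, and the function names and the volume threshold are hypothetical.

```python
import json


def transmit(words, bold_threshold_db=70.0):
    """Transmitter: apply the mapping, then encode the formatted text as bytes.

    words: list of (text, volume_db) pairs.
    bold_threshold_db: the mapping parameter, alterable at the transmitter.
    """
    formatted = [
        {"text": text, "bold": volume_db >= bold_threshold_db}
        for text, volume_db in words
    ]
    return json.dumps(formatted).encode("utf-8")


def receive(bitstream):
    """Receiver: decode the bytes back into text with format characteristics."""
    return json.loads(bitstream.decode("utf-8"))
```

Because the mapping parameter is an argument of `transmit`, altering the mapping at the transmitter changes the format characteristics the receiver sees without any change on the receiving side.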
Alternatively, automatically generating the text output includes:
encoding the text output and the non-verbal characteristic as a data bitstream at a transmitter;
transmitting the data bitstream from the transmitter to a receiver;
decoding the data bitstream at the receiver; and
applying the predetermined mapping at the receiver, responsive to the non-verbal characteristic encoded in the data bitstream, so as to generate the text output with the variable format
