Method and system for delivering text-to-speech in a real...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S201000, C704S270000, C370S412000

active

06778961

ABSTRACT:

TECHNICAL FIELD
The present invention is generally related to communication methods and systems employing text-to-speech engines and, more particularly, to a method and system for delivering text-to-speech in a real time telephony environment.
BACKGROUND ART
Text-to-speech (TTS) engines are computing devices which convert written text into audible computer generated speech. Telephony based applications require TTS engines to convert email, news, stock quotes, sports scores, and many other types of textual data into speech for delivery to telephony users. In these types of telephony applications, a speech version of a text document is demanded in real time by telephony users. Because the text which is requested by telephony users is not known beforehand, the text must be converted in real time and delivered without delay to the telephony users.
Performing high quality text-to-speech conversion or synthesis is resource intensive. For example, given 4,000 bytes of textual data, a typical TTS engine produces an audio or speech file of about three million bytes to play for the telephony user. This is an expansion ratio of roughly 750 to one and presents a serious bottleneck for the synthesis of large textual documents. As a result, the telephony user is unlikely to wait the several minutes it may take to convert an entire textual document into speech before any speech is provided. Synthesizing the text into speech before the telephony user requests it is not a viable option, as it is generally not known what the telephony user will request. Additionally, the physical storage requirements for a large number of pre-synthesized audio files are prohibitive in many environments.
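The arithmetic behind the figures above can be checked directly; with the quoted sizes the ratio works out to 750 to one. The short sketch below also estimates what pre-synthesizing a document collection would cost in storage. The function names and the collection size in the usage note are illustrative assumptions, not figures from the patent.

```python
def expansion_ratio(text_bytes, speech_bytes):
    """Return how many bytes of speech are produced per byte of text."""
    return speech_bytes / text_bytes


def presynthesis_storage_bytes(num_documents, avg_text_bytes, ratio):
    """Estimate the storage needed to pre-synthesize a whole collection.

    All three inputs are assumptions supplied by the caller.
    """
    return int(num_documents * avg_text_bytes * ratio)


# Figures quoted in the background section: 4,000 bytes of text in,
# about 3,000,000 bytes of speech out.
ratio = expansion_ratio(4_000, 3_000_000)
```

As a usage example, `presynthesis_storage_bytes(100_000, 4_000, ratio)` gives the storage for a hypothetical collection of 100,000 documents of that size, which is about 300 GB.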
DISCLOSURE OF INVENTION
Accordingly, it is an object of the present invention to provide a method and system for delivering text-to-speech (TTS) in a real time telephony environment in which text documents of any size are efficiently converted into speech which is provided immediately to a telephony user.
It is another object of the present invention to provide a method and system for delivering TTS in a real time telephony environment in which a first part of a text is converted into a first speech segment and the first speech segment is delivered to a telephony user while a second part of the text is being converted into a second speech segment for delivery to the telephony user after the first speech segment has been delivered.
It is a further object of the present invention to provide a method and system for delivering TTS in a real time telephony environment in which a text is divided into text segments for conversion by a farm of TTS engines into speech segments which are then reassembled in the proper order and delivered to a telephony user.
It is still another object of the present invention to provide a method and system for delivering TTS in a real time telephony environment which employ a streaming buffer of speech converted from text for delivery to a telephony user in which the streaming buffer adapts to the bandwidth of the network delivering the speech to the telephony user.
It is still a further object of the present invention to provide a method and system for delivering TTS in a real time telephony environment which employ a streaming buffer for storing speech converted from text such that a first speech segment corresponding to a first text segment is delivered to the telephony user from the streaming buffer while a second speech segment corresponding to a second text segment is being delivered to the streaming buffer for future delivery to the telephony user.
In carrying out the above objects and other objects, the present invention provides a communication system for communicating information to a telephony user in response to a request for the information from the telephony user. The system includes a text data source having a plurality of text documents. A voice application is operable with the telephony user for receiving a request from the telephony user for information. The voice application is operable with the text data source for retrieving a text document related to the information requested by the telephony user. A text-to-speech (TTS) resource manager is operable for dividing the text document into text document segments and associating a sequence number with each text document segment. The TTS resource manager places the text document segments and the corresponding sequence numbers in a sequential order within a queue. A TTS engine farm has a plurality of TTS engines which are operable for receiving text document segments and the corresponding sequence numbers from the queue of the TTS resource manager in the sequential order for converting the text document segments into speech segments. Each text document segment is converted into a speech segment by one TTS engine. A buffer receives the speech segments and the corresponding sequence numbers from the TTS engines. The buffer uses the corresponding sequence numbers to reassemble the speech segments in the proper order and then delivers the speech segments in the proper order to the telephony user via the voice application in order to satisfy the request for information from the telephony user.
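The flow described above (divide, number, queue, convert in parallel, reassemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `fake_tts` is a hypothetical stand-in for a real TTS engine, worker threads stand in for the engine farm, and the whitespace-based splitting rule and `max_chars` parameter are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def split_into_segments(text, max_chars=200):
    """Divide text on whitespace and pair each segment with a sequence number."""
    segments, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = word
        else:
            current = candidate
    if current:
        segments.append(current)
    return list(enumerate(segments))


def fake_tts(segment):
    """Hypothetical TTS engine: returns placeholder bytes instead of audio."""
    return f"<speech:{segment}>".encode()


def convert_document(text, num_engines=3, max_chars=200):
    """Queue numbered segments, convert them in parallel, return them in order."""
    work_queue = Queue()
    for item in split_into_segments(text, max_chars):
        work_queue.put(item)

    results = {}  # sequence number -> speech segment

    def engine_worker():
        # Each worker plays the role of one engine pulling from the queue.
        while True:
            try:
                seq, segment = work_queue.get_nowait()
            except Empty:
                return
            results[seq] = fake_tts(segment)

    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        for _ in range(num_engines):
            pool.submit(engine_worker)

    # Reassemble by sequence number, whatever order the engines finished in.
    return [results[seq] for seq in sorted(results)]
```

Each segment is handled by exactly one worker, and the sequence numbers alone determine the output order, so a slow engine cannot scramble the speech.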
The TTS resource manager is operable to determine the rate at which speech segments are delivered to the telephony user from the buffer. The TTS resource manager divides the text document as a function of the rate at which speech segments are delivered to the telephony user such that the speech segments are delivered from the TTS engines to the buffer and from the buffer to the telephony user continuously.
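The text does not give a formula for sizing segments from the delivery rate, so the following is only one plausible heuristic. It assumes rates are measured in characters of source text per second, and the names `segment_chars`, `farm_keeps_up`, and `seconds_per_segment` are hypothetical.

```python
def segment_chars(delivery_chars_per_s, seconds_per_segment=2.0):
    """Size a segment so it covers a target number of seconds of playback.

    Assumes the delivery rate is expressed as characters of source text
    consumed per second of speech played to the user.
    """
    return max(1, int(delivery_chars_per_s * seconds_per_segment))


def farm_keeps_up(num_engines, synth_chars_per_s, delivery_chars_per_s):
    """Check whether the farm produces speech at least as fast as it is played.

    Assumes every engine synthesizes at the same rate.
    """
    return num_engines * synth_chars_per_s >= delivery_chars_per_s
```

Under these assumptions, playback stays continuous only while `farm_keeps_up` holds; smaller segments shorten the wait for the first audio, while larger ones reduce per-segment overhead.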
The TTS resource manager is further operable to determine the load of each of the TTS engines. The TTS resource manager delivers the text document segments to the TTS engines as a function of the load of the TTS engines.
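Load-based delivery could be approximated with a least-loaded-first policy, sketched below. The policy itself, the use of pending character counts as the load measure, and the engine names are assumptions made for illustration; the text only says that delivery is a function of engine load.

```python
import heapq


def assign_segments(segments, engine_loads):
    """Assign each (sequence number, text) pair to the least loaded engine.

    engine_loads maps a hypothetical engine name to its pending work,
    measured here in characters still to be synthesized.
    """
    heap = [(load, name) for name, load in engine_loads.items()]
    heapq.heapify(heap)
    assignment = {}
    for seq, text in segments:
        load, name = heapq.heappop(heap)  # engine with the least pending work
        assignment[seq] = name
        heapq.heappush(heap, (load + len(text), name))
    return assignment
```

For example, with engine "A" idle and engine "B" holding 3 pending characters, the segments `[(0, "aaaa"), (1, "bb"), (2, "c")]` go to A, then B, then A again.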
In operation, the buffer delivers a first speech segment to the telephony user via the voice application after the buffer has received a second speech segment from a TTS engine and while the buffer is receiving a third speech segment from a TTS engine such that the speech segments are delivered to the telephony user continuously. The buffer delivers the first speech segment to the telephony user via the voice application while a TTS engine is converting a fourth text document segment into a fourth speech segment.
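The buffering behavior described above can be sketched as a reorder buffer with a small lookahead: a segment is released for playback only once the segment after it has also arrived, so there is always speech in hand while later segments are still being converted. The class name, the `lookahead` parameter, and the `flush` method are hypothetical.

```python
class StreamingBuffer:
    """Release speech segments in sequence order, keeping a lookahead in hand."""

    def __init__(self, lookahead=1):
        self.lookahead = lookahead
        self._pending = {}  # sequence number -> speech segment
        self._next_seq = 0  # next sequence number owed to the listener

    def add(self, seq, speech):
        """Store a segment from an engine and return any segments now playable."""
        self._pending[seq] = speech
        return self._release(self.lookahead)

    def flush(self):
        """Call after the last segment arrives to drain what remains in order."""
        return self._release(0)

    def _release(self, lookahead):
        ready = []
        # Release the head only if it and the next `lookahead` segments are present.
        while all(self._next_seq + k in self._pending for k in range(lookahead + 1)):
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready
```

Segments may arrive in any order; nothing is released past a gap in the sequence numbers, which is what keeps the delivered speech in the proper order.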
The request from the telephony user may be an audio request. The voice application is operable for converting the audio request into a text request in order to retrieve a text document related to the information requested by the telephony user. Similarly, the request from the telephony user may be a dual tone multi-frequency request. The voice application is operable for converting the dual tone multi-frequency request into a text request in order to retrieve a text document related to the information requested by the telephony user.
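Converting a dual tone multi-frequency request into a text request might look like the lookup below. The menu layout is entirely hypothetical; only the categories (email, news, stock quotes, sports scores) come from the background section, and the handling of a trailing "#" is an assumption about how a caller ends input.

```python
# Hypothetical keypad menu; the digit assignments are illustrative only.
DTMF_MENU = {
    "1": "email",
    "2": "news",
    "3": "stock quotes",
    "4": "sports scores",
}


def dtmf_to_text_request(digits):
    """Map a string of keypad digits to a text request, or None if unknown."""
    key = digits.strip().rstrip("#")  # assume "#" terminates the caller's input
    return DTMF_MENU.get(key)
```

The returned string would then be used to retrieve the matching text document from the text data source.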
Further, in carrying out the above objects and other objects, the present invention provides a communication method for communicating information from a text data source having a plurality of text documents to a telephony user in response to a request for the information from the telephony user. The method includes receiving a request from the telephony user for information. A text document related to the information requested by the telephony user is then retrieved. The text document is then divided into text document segments and a sequence number is associated with each text document segment. The text document segments and the corresponding sequence numbers are then placed in a sequential order within a queue. Respective text document segments and the corresponding sequence numbers are then transferred from the queue in the sequential order to respective TTS engines. Respective text document segments are then converted into speech segments using one TTS engine for each respective text document segment. The speech segments and the corresponding sequence numbers from the TTS engines are then stored in a buffer. The stored speech segments are th

Profile ID: LFUS-PAI-O-3269874

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.