Text-to-speech native coding in a communication system

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S278000, C704S258000

active

06681208

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to text-to-speech synthesis, and more particularly to text-to-speech synthesis in a communication system using native speech coding.
BACKGROUND OF THE INVENTION
Radio communication devices, such as cellular phones, are no longer viewed as voice-only devices. With the advent of data-based wireless services available to consumers, serious problems arise for conventional cellular phones. For example, cellular phones are currently only capable of presenting data services in text format on a small screen, which requires screen scrolling or other user manipulation to read the data or message. Also, compared to landline systems, a wireless system has a much higher data error rate and faces spectrum constraints, which makes providing real-time streaming audio (i.e., real audio) to cellular users impractical. One way to deal with these problems is text-to-speech encoding.
The process of converting text to speech is generally broken down into two major blocks: text analysis and speech synthesis. Text analysis is the process by which text is converted into a linguistic description that can be synthesized. This linguistic description generally consists of the pronunciation of the speech to be synthesized, along with other properties that determine the prosody of the speech. These other properties can include (1) syllable, word, phrase, and clause boundaries; (2) syllable stress; (3) part-of-speech information; and (4) explicit representations of prosody, such as those provided by the ToBI labeling system, as known in the art and further described in Silverman et al., “TOBI: A Standard for Labeling English Prosody,” 2nd International Conference on Spoken Language Processing (ICSLP92), October 1992.
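To make the shape of such a linguistic description concrete, here is a minimal sketch of one possible data structure for the output of text analysis. The class and field names are hypothetical, the phone symbols are ARPAbet-style examples, and only a few ToBI labels (pitch accents such as `H*`, break indices 0 to 4, boundary tones such as `L-L%`) are represented; a real text-analysis front end would carry considerably more.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    phones: List[str]                  # pronunciation as a sequence of phonetic units
    stressed: bool = False             # syllable stress
    tobi_accent: Optional[str] = None  # hypothetical: a ToBI pitch-accent label such as "H*"


@dataclass
class Word:
    text: str
    part_of_speech: str                # part-of-speech tag, e.g. "NN"
    syllables: List[Syllable]
    break_index: int = 1               # ToBI break index after the word (0-4); higher = stronger boundary


@dataclass
class Phrase:
    words: List[Word]
    boundary_tone: Optional[str] = None  # hypothetical: e.g. "L-L%" for a declarative ending

    def pronunciation(self) -> List[str]:
        """Flatten the phrase into the phone sequence to be synthesized."""
        return [p for w in self.words for s in w.syllables for p in s.phones]


phrase = Phrase(
    words=[
        Word("hello", "UH", [Syllable(["HH", "AH"]),
                             Syllable(["L", "OW"], stressed=True, tobi_accent="H*")]),
        Word("world", "NN", [Syllable(["W", "ER", "L", "D"], stressed=True, tobi_accent="H*")],
             break_index=4),
    ],
    boundary_tone="L-L%",
)
```

The pronunciation is kept as the leaves of the structure, while boundaries, stress, part of speech, and prosodic labels sit at the level where they naturally apply.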
The pronunciation of speech included in the linguistic description is described as a sequence of phonetic units. These phonetic units are generally phones or phonics, which are particular physical speech sounds, or allophones, which are particular ways in which a phoneme may be expressed. (A phoneme is a speech sound perceived as distinct by the speakers of a language.) For example, the English phoneme “t” may be expressed as a closure followed by a burst, as a glottal stop, or as a flap. Each of these is a different allophone of “t”. Different sounds that may be produced when “t” is expressed as a flap represent different phonics. Other phonetic units that are sometimes used are demisyllables and diphones. Demisyllables are half-syllables, and diphones are sequences of two phonics.
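As an illustration of the diphone unit described above, the following sketch turns a phone sequence into the diphone sequence a concatenative synthesizer would look up. Padding with a silence symbol at both ends is a common convention but an assumption here, as are the symbol `sil` and the function name.

```python
from typing import List, Tuple


def to_diphones(phones: List[str], silence: str = "sil") -> List[Tuple[str, str]]:
    """Return the sequence of adjacent phone pairs, padded with silence at both ends."""
    padded = [silence] + phones + [silence]
    # each diphone pairs a phone with its right-hand neighbour
    return list(zip(padded, padded[1:]))


diphones = to_diphones(["HH", "AH", "L", "OW"])
```

A sequence of n phones yields n + 1 diphones once the leading and trailing silences are included, so the transitions into and out of the utterance are covered by stored units as well.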
Speech can be synthesized from phonics using a rule-based system. In such a system, the phonetic component holds target phoneme acoustic parameters (such as duration and intonation) for each segment type, along with rules for smoothing the parameter transitions between segments. In a typical concatenation system, by contrast, the phonetic component holds a parametric representation of segments occurring in natural speech and concatenates these recorded segments, smoothing the boundaries between them using predefined rules. The speech is then processed through a vocoder for transmission. Voice coders such as vector-sum excited linear prediction (VSELP) and code-excited linear prediction (CELP) vocoders are in general use in digital cellular communication devices. For example, U.S. Pat. No. 4,817,157, which is hereby incorporated by reference, describes such a vocoder implementation as used for the Global System for Mobile Communications (GSM), among other systems.
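The boundary smoothing in a concatenation system can be sketched as follows. This is a minimal illustration, assuming segments are plain lists of audio samples and that a linear cross-fade over a fixed number of samples stands in for the "predefined rules"; the function name and the overlap length are hypothetical choices, not the patent's method.

```python
from typing import List, Sequence


def concatenate(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join recorded segments, cross-fading up to `overlap` samples at each boundary."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    out: List[float] = list(segments[0]) if segments else []
    for seg in segments[1:]:
        # never overlap more samples than either side actually has
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            # linear fade: weight on the new segment rises from 1/(n+1) to n/(n+1)
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[n:])
    return out


joined = concatenate([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]], overlap=2)
```

Each boundary shortens the output by the overlap length, since the overlapping samples are blended rather than appended. In the conventional flow described above, the result would then still have to pass through the vocoder.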
Unfortunately, the text-to-speech process described above is computationally intensive. In existing digital communication devices, for example, vocoder technology already pushes the limits of the device's computational power in order to keep voice quality as high as possible, yet the text-to-speech process requires further signal processing on top of the vocoder processing. In other words, converting text to phonics, applying acoustic-parameter rules to each phonic, concatenating segments into a voiced signal, and then voice coding that signal requires more processing power than voice coding alone.
Accordingly, there is a need for an improved text-to-speech coding system that reduces the amount of signal processing required to provide a voiced output. In particular, it would be of benefit to be able to use the existing native speech coding incorporated into a communication device. It would also be advantageous if current low-cost technology could be used without the requirement for customized hardware.
SUMMARY OF THE INVENTION
The present invention finds use in communication devices, such as radiotelephones, that have audio capabilities and can take advantage of text-to-speech conversion of text messages.
One aspect of the present invention uses an existing vocoder with a stored code table containing coded speech parameters for use in text-to-speech conversion. These native speech parameters in a communication device can be used without the need to create and store new speech parameters. Instead, the native parameters can be modified if and when needed, for example to provide more natural-sounding speech.
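As a rough illustration of modifying stored native parameters, the sketch below changes a phonic's duration by repeating or dropping whole coded frames. Frame repetition is an assumption chosen for simplicity, not the patent's stated method, and `stretch` is a hypothetical helper. Because it copies frames unchanged, it works on opaque vocoder frames, but it can sound mechanical for large factors.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")  # any coded frame type, e.g. bytes from the code table


def stretch(frames: Sequence[Frame], factor: float) -> List[Frame]:
    """Lengthen (factor > 1) or shorten (factor < 1) a phonic by repeating or dropping frames."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    # nearest-frame resampling: output frame i reuses input frame floor(i / factor)
    return [frames[min(len(frames) - 1, int(i / factor))] for i in range(n_out)]


longer = stretch(["f0", "f1", "f2", "f3"], 2.0)
shorter = stretch(["f0", "f1", "f2", "f3"], 0.5)
```

Operating on whole frames keeps the decoder untouched, which fits the goal of reusing the device's native coding rather than adding new signal processing.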
Another aspect of the present invention involves dividing text messages into phonics, spaces, and special characters, and using white noise to emulate the spaces between words. This saves the time and processing that would otherwise be spent coding non-phonics, which contain no speech information.
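The division of a message into phonics, spaces, and special characters, with white noise standing in for word gaps, might look like the following sketch. The letter-group inventory, the greedy longest-match rule, the noise amplitude, and the function names are all hypothetical; a production system would use proper grapheme-to-phoneme rules.

```python
import random
from typing import List, Tuple

# Hypothetical phonic inventory: letter groups, matched longest-first.
PHONICS = {"sh", "ch", "th", "ee", "oo", "ng"} | set("abcdefghijklmnopqrstuvwxyz")
MAX_LEN = max(len(p) for p in PHONICS)


def segment(text: str) -> List[Tuple[str, str]]:
    """Split text into ("phonic", p), ("space", c) and ("special", c) tokens."""
    tokens: List[Tuple[str, str]] = []
    s = text.lower()
    i = 0
    while i < len(s):
        if s[i].isspace():
            tokens.append(("space", s[i]))
            i += 1
            continue
        for n in range(MAX_LEN, 0, -1):       # greedy longest match
            piece = s[i:i + n]
            if piece in PHONICS:
                tokens.append(("phonic", piece))
                i += len(piece)
                break
        else:
            tokens.append(("special", s[i]))  # digits, punctuation, symbols
            i += 1
    return tokens


def white_noise(n_samples: int, amplitude: float = 0.01, seed: int = 0) -> List[float]:
    """Low-level uniform white noise used instead of coded speech for a word gap."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n_samples)]


tokens = segment("She ate!")
gap = white_noise(160)
```

Only the "phonic" tokens would go on to the code-table look-up; "space" tokens are rendered directly as noise, so they never touch the speech coder.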
Another aspect of the present invention involves dividing text into phonics that can be mapped to the native coded speech parameters used in existing communication systems. For example, each distinct phonic can be mapped to a memory location index of predefined phonics in a look-up table, which points to a digitized wave file defining the equivalent native coded speech parameters from the code table.
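The look-up step can be sketched as a table of predefined phonics whose positions serve as memory location indices into a code table of stored coded frames. Everything here is a placeholder: the phonic list, the two-byte dummy frames, and the names `PREDEFINED_PHONICS`, `CODE_TABLE`, and `frames_for` are hypothetical, and real entries would be the vocoder's native bit-streams.

```python
from typing import Dict, List

# Hypothetical look-up table of predefined phonics; position = memory location index.
PREDEFINED_PHONICS: List[str] = ["sh", "ch", "th", "a", "e", "t", "s"]

# Hypothetical code table: for each index, the native coded speech frames
# (dummy two-byte stand-ins for opaque vocoder frames).
CODE_TABLE: List[List[bytes]] = [
    [bytes([i, f]) for f in range(3)] for i in range(len(PREDEFINED_PHONICS))
]

INDEX: Dict[str, int] = {p: i for i, p in enumerate(PREDEFINED_PHONICS)}


def frames_for(phonics: List[str]) -> List[bytes]:
    """Concatenate the stored coded frames for each phonic, in order."""
    frames: List[bytes] = []
    for p in phonics:
        idx = INDEX.get(p)
        if idx is None:
            raise KeyError(f"no stored parameters for phonic {p!r}")
        frames.extend(CODE_TABLE[idx])
    return frames


stream = frames_for(["sh", "e"])
```

Because the table already holds coded parameters, the resulting frame stream can presumably be handed to the device's existing speech decoder directly, skipping the separate synthesis and encoding stages of the conventional flow.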


REFERENCES:
patent: 4405983 (1983-09-01), Perez-Mendez
patent: 4817157 (1989-03-01), Gerson
patent: 4893197 (1990-01-01), Howells et al.
patent: 5119425 (1992-06-01), Rosenstrach et al.
patent: 5463715 (1995-10-01), Gagnon
patent: 5625687 (1997-04-01), Sayre, III
patent: 5673362 (1997-09-01), Matsumoto
patent: 5696879 (1997-12-01), Cline et al.
patent: 5745650 (1998-04-01), Otsuka et al.
patent: 5864812 (1999-01-01), Kamai et al.
patent: 5896393 (1999-04-01), Yard et al.
patent: 5924068 (1999-07-01), Richard et al.
patent: 5940791 (1999-08-01), Byrnes et al.
patent: 5956681 (1999-09-01), Yamakita
patent: 6070138 (2000-05-01), Iwata
patent: 6081780 (2000-06-01), Lumelsky
patent: 6125346 (2000-09-01), Nishimura et al.
patent: 6178402 (2001-01-01), Corrigan
patent: 6246983 (2001-06-01), Zou et al.
patent: 6272587 (2001-08-01), Irons
patent: 6516298 (2003-02-01), Kamai et al.
patent: 2002/0147882 (2002-10-01), Pua et al.
patent: 62-165267 (1987-07-01), None
patent: 05-173586 (1993-07-01), None
patent: 05-181492 (1993-07-01), None
patent: 08-160990 (1996-06-01), None
patent: 08-335096 (1996-12-01), None
patent: 2000-148175 (2000-05-01), None
Sagisaka. “Speech Synthesis From Text.” IEEE Communications Magazine; Jan. 1990.*
O'Malley, M. et al. “Text-To-Speech Conversion Technology.” IEEE; Aug. 1990, pp. 17-23.
Mobius, B. et al. “Modeling Segmental Duration in German Text-to-Speech Synthesis.” ICSLP 4th International Conference on Spoken Language Processing; Oct. 1996, vol. 4, pp. 2395-2398.
Sproat, R. et al. “EMU: an E-Mail Preprocessor for Text-To-Speech.” IEEE Second Workshop on Multimedia Signal Processing; Dec. 1998, pp. 239-244.
Silverman et al. “TOBI: A Standard for Labeling English Prosody.” 2nd International Conference on Spoken Language Processing (ICSLP92); Oct. 1992, pp. 867-870.