Speech synthesizing system and method for modifying prosody...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

active

06823309

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a speech synthesis system in which arbitrary input texts, input phonetic characters, or the like are converted into synthesized speech to be output therefrom.
BACKGROUND ART
In recent years, synthesized speech has been widely used in home electric appliances and various electronic appliances such as vehicle navigation systems and mobile phones, in which speech messages such as appliance status, operating instructions, and response messages are voiced by synthesized speech. In addition, synthesized speech has begun to be employed in personal computers and the like for such purposes as operating the apparatus by way of a voice interface and confirming the result of text recognition by optical character recognition (OCR).
One technique for performing such speech synthesis is to store speech data in a system in advance and play back the stored data when required. This technique is widely used in cases where a limited number of messages are to be vocalized. However, when a system according to this technique is applied to generate arbitrary speech, it requires a large-capacity storage system, which inevitably makes the system costly and thus limits its application.
Another technique, used in systems less expensive than the above, generates speech data from input texts or phonetic character strings using a predetermined speech data generating rule. However, with this rule-based technique it is difficult to generate natural sounding speech with various kinds of expression.
In view of these problems, Japanese Unexamined Patent Publication No. 8-87297, for example, discloses a speech synthesis system that employs both the speech synthesis by retrieving speech data from a database and the speech synthesis by using a speech sound generating rule. More specifically, this type of apparatus has, as shown in FIG. 13, a text input section 910, a speech information database 920 storing speech parameters and corresponding speech content data, the speech parameters being obtained by analyzing actual speech and extracting data therefrom, a speech data retrieving section 930 retrieving data from the speech information database 920, a speech sound generating section 940 generating a speech waveform, a speech sound generating rule 950 including a rule for generating a speech parameter from the input text or the input phonetic character string, and an electroacoustic transducer 960. This speech synthesis system operates in the following manner. If a text or a phonetic character string is inputted into the text input section 910, the speech data retrieving section 930 retrieves from the speech information database 920 speech data having speech content that matches the input text or the input phonetic character string. If a matching speech content is present in the database, corresponding speech data is transmitted to the speech sound generating section 940. If the matching speech content is absent, the speech data retrieving section 930 transmits the input text or the input phonetic character string as it is to the speech sound generating section 940. When the speech sound generating section 940 receives the retrieved speech data, the speech sound generating section 940 generates a synthesized speech based on the retrieved speech data. Alternatively, when the speech sound generating section 940 receives the input text or the input phonetic character string, the speech sound generating section 940 generates speech parameters based on the input text or input phonetic character string and the speech sound generating rule 950, and thereafter generates a synthesized speech.
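The control flow of this prior-art arrangement can be sketched as follows. This is a minimal illustration only, not code from the cited publication: the dictionary SPEECH_DB, the function names, and all parameter values are hypothetical stand-ins for the speech information database 920, the speech data retrieving section 930, the speech sound generating section 940, and the speech sound generating rule 950.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechData:
    content: str             # phonetic character string being voiced
    parameters: List[float]  # placeholder speech parameters
    source: str              # "database" or "rule"


# Hypothetical stand-in for speech information database 920:
# speech content data -> speech parameters extracted from actual speech.
SPEECH_DB = {
    "osaka ni sunde iru": [0.8, 0.6, 0.4],
}


def retrieve(text: str) -> Optional[List[float]]:
    """Stand-in for retrieving section 930: exact match or nothing."""
    return SPEECH_DB.get(text)


def generate_by_rule(text: str) -> List[float]:
    """Stand-in for generating rule 950: one flat placeholder value per word."""
    return [0.5 for _ in text.split()]


def synthesize(text: str) -> SpeechData:
    """Stand-in for generating section 940: use retrieved data if present,
    otherwise fall back to rule-generated parameters."""
    params = retrieve(text)
    if params is not None:
        return SpeechData(text, params, "database")
    return SpeechData(text, generate_by_rule(text), "rule")


if __name__ == "__main__":
    for phrase in ("osaka ni sunde iru", "watashi wa matsushita desu"):
        result = synthesize(phrase)
        print(phrase, "->", result.source, result.parameters)
```

Because the lookup is all-or-nothing, each phrase takes one of two entirely different paths, which is the behavior the following paragraphs identify as the source of the quality gap.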
By using the speech data retrieval and the speech sound generating rule as described above, an arbitrary input text can be converted into a synthesized speech to be outputted, and for a limited portion of the speech (where the retrieval can find a successful match), a natural sounding speech can be obtained.
One of the drawbacks of the above-described prior art speech synthesis system is that there is a large difference in sound quality between a synthesized speech in which the search has found a successful match and a synthesized speech in which it has not, that is, between a case where speech content data corresponding to the input text or the like is present in the speech information database and a case where the corresponding speech content data is absent. In addition, when speeches of such different sound qualities are concatenated, the resulting synthesized speech becomes even more unnatural. Further, the retrieval from the speech information database 920 is performed by simply detecting the presence or absence of a match between the input phonetic character string and the stored speech content data. Therefore, when matching speech content data is present in the database, the speech synthesis is performed based on the retrieved data regardless of other factors such as the construction of the sentence, which also leads to unnatural synthesized speech.
Specifically, assume that the system is required to synthesize, for example, a Japanese sentence transcribed in the Roman alphabet as “Osaka ni sunde iru watashi wa Matsushita desu”, which means “I, who live in Osaka, am Matsushita.” In this case, if the proper noun “Matsushita” is absent from the database, the corresponding portion of the speech tends to become a mechanical sounding synthesized speech. Also, when speech content data corresponding to the clause “Osaka ni sunde iru”, stored as speech data for the end of a sentence, is used to construct the required sentence, the resulting speech tends to sound unnatural, as if two separate sentences, “Osaka ni sunde iru” (meaning “I live in Osaka”) and “watashi wa Matsushita desu” (meaning “I am Matsushita”), were unnaturally concatenated.
DISCLOSURE OF THE INVENTION
In view of the foregoing and other drawbacks of prior art, it is an object of the present invention to provide a speech synthesis system capable of generating natural sounding synthesized speeches from arbitrary input texts, particularly a speech synthesis system capable of generating natural sounding synthesized speech having a good sound quality regardless of whether or not the speech information (prosodic information) database contains speech content data that matches the input text.
This and other objects are accomplished, in a first aspect of the present invention, by the provision of a speech synthesis system for generating a synthesized speech based on input data representing a speech to be synthesized, the system comprising:
a database storing prosodic data for use in synthesizing speech, the prosodic data corresponding to key data being used as a retrieval key;
means for retrieving the prosodic data according to a degree of matching between the input data and the key data;
means for modifying the prosodic data retrieved by the means for retrieving based on the input data, the degree of matching between the input data and the key data, and a predetermined modifying rule; and
means for synthesizing a synthesized speech based on the input data and the prosodic data modified by the means for modifying.
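The four elements of the first aspect can be pictured with a small sketch. Everything concrete here is an assumption made for illustration: this text does not say how the degree of matching is computed or what the modifying rule is. The sketch uses Python's difflib.SequenceMatcher ratio as a hypothetical degree of matching, a per-syllable pitch list as hypothetical prosodic data, and a hypothetical modifying rule that pulls the retrieved pitch toward an assumed neutral value as the match gets weaker.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical database: key data (syllable list) -> prosodic data (pitch in Hz).
PROSODY_DB: List[Tuple[List[str], List[float]]] = [
    (["o", "sa", "ka", "ni"], [120.0, 140.0, 135.0, 110.0]),
    (["wa", "ta", "shi", "wa"], [115.0, 130.0, 125.0, 105.0]),
]

DEFAULT_PITCH = 120.0  # assumed neutral pitch used by the illustrative rule


def degree_of_matching(input_key: List[str], stored_key: List[str]) -> float:
    """Assumed similarity measure in [0, 1]; 1.0 means identical sequences."""
    return SequenceMatcher(None, input_key, stored_key).ratio()


def retrieve(input_key: List[str]) -> Tuple[List[float], float]:
    """Return the prosodic data whose key matches best, with its degree."""
    key, prosody = max(PROSODY_DB, key=lambda e: degree_of_matching(input_key, e[0]))
    return prosody, degree_of_matching(input_key, key)


def modify(prosody: List[float], degree: float, target_len: int) -> List[float]:
    """Illustrative modifying rule: fit the contour to the input length, then
    blend toward DEFAULT_PITCH in proportion to how poor the match was."""
    n = len(prosody)
    fitted = [prosody[min(n - 1, i * n // target_len)] for i in range(target_len)]
    return [degree * p + (1.0 - degree) * DEFAULT_PITCH for p in fitted]


def prosody_for(input_key: List[str]) -> Tuple[List[float], float]:
    """Retrieve, then modify; a synthesizer would consume the returned contour."""
    prosody, degree = retrieve(input_key)
    return modify(prosody, degree, len(input_key)), degree


if __name__ == "__main__":
    print(prosody_for(["o", "sa", "ka", "ni"]))  # exact key match
    print(prosody_for(["o", "sa", "ka", "de"]))  # partial key match
```

The point of the design is that a partial match still yields usable prosodic data, adjusted according to how close the match was, rather than forcing a switch to a separate rule-only path.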
Second to sixth aspects of the invention are as follows. The input data and the key data may include a phonetic character string representing a phonetic attribute of the speech to be synthesized, and may further include linguistic data representing a linguistic attribute of the speech to be synthesized. The phonetic character string may include data substantially indicating at least one of a phonological segment string of the speech to be synthesized, an accent position in the speech to be synthesized, and the presence or absence, or the length, of a pause in the speech to be synthesized. Further, the li
