Method and apparatus for editing/creating synthetic speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S258000, C704S266000

active

06226614

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for editing/creating synthetic speech messages and a recording medium with the method recorded thereon. More particularly, the invention pertains to a speech message editing/creating method that permits easy and fast synthesis of speech messages with desired prosodic features.
Dialogue speech conveys the speaker's mental states, intentions and the like in addition to the linguistic meaning of the spoken dialogue. Such information contained in the speaker's voice, apart from its linguistic meaning, is commonly referred to as non-verbal information. The hearer takes in the non-verbal information from the intonation, accents and duration of the utterance being made. A “speech synthesis-by-rule” method that converts text to speech has been researched and developed as a so-called TTS (Text-To-Speech) message synthesis method. Unlike editing and synthesizing recorded speech, this method places no particular limitations on the output speech and solves the problem of requiring the original speaker's voice for subsequent partial modification of the message. However, since the prosody generation rules used are based on prosodic features of speech made in a recitation tone, the synthesized speech inevitably becomes recitation-type and hence monotonous. In natural conversations, by contrast, the prosodic features of dialogue speech often vary significantly with the speaker's mental states and intentions.
With a view to making speech synthesized by rule sound more natural, attempts have been made to edit the prosodic features, but such editing operations are difficult to automate; conventionally, the user must perform the edits based on his or her own experience and knowledge. In such editing it is hard to adopt an arrangement or configuration for arbitrarily correcting prosodic parameters such as intonation, fundamental frequency (pitch), amplitude value (power) and duration of the utterance unit to be synthesized. Accordingly, it is difficult to obtain a speech message with desired prosodic features by arbitrarily correcting prosodic or phonological parameters of the portion of the synthesized speech that sounds monotonous and hence recitative.
To facilitate the correction of prosodic parameters, there has also been proposed a method using a GUI (graphical user interface) that displays prosodic parameters of synthesized speech in graphic form on a display, lets the user visually correct and modify them using a mouse or similar pointing tool, and synthesizes a speech message with desired non-verbal information while the user confirms the corrections and modifications by listening to the synthesized speech output. Although this method corrects the prosodic parameters visually, the actual parameter correcting operation still requires experience and knowledge of phonetics, and hence is difficult for an ordinary operator.
U.S. Pat. No. 4,907,279 and Japanese Patent Application Laid-Open Nos. 5-307396, 3-189697 and 5-19780 each disclose a method that inserts phonological parameter control commands such as accents and pauses in a text and edits synthesized speech through the use of such control commands. With this method, too, the non-verbal information editing operation is still difficult for a person who has no knowledge of the relationship between the non-verbal information and prosody control.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a synthetic speech editing/creating method and apparatus with which it is possible for an operator to easily synthesize a speech message with desired prosodic parameters.
Another object of the present invention is to provide a synthetic speech editing/creating method and apparatus that permit varied expressions of non-verbal information which is not contained in verbal information, such as the speaker's mental states, attitudes and the degree of understanding.
Still another object of the present invention is to provide a synthetic speech message editing/creating method and apparatus that allow ease in visually recognizing the effect of prosodic parameter control in editing non-verbal information of a synthetic speech message.
According to a first aspect of the present invention, there is provided a method for editing non-verbal information of a speech message synthesized by rules in correspondence to a text, the method comprising the steps of:
(a) inserting in the text, at the position of a character or character string to be added with non-verbal information, a prosodic feature control command of a semantic layer (hereinafter referred to as an S layer) and/or an interpretation layer (hereinafter referred to as an I layer) of a multi-layered description language so as to effect prosody control corresponding to the non-verbal information, the multi-layered description language being composed of the S and I layers and a parameter layer (hereinafter referred to as a P layer), the P layer being a group of controllable prosodic parameters including at least pitch and power, the I layer being a group of prosodic feature control commands for specifying details of control of the prosodic parameters of the P layer, the S layer being a group of prosodic feature control commands each represented by a phrase or word indicative of an intended meaning of non-verbal information, for executing a command set composed of at least one prosodic feature control command of the I layer, and the relationship between each prosodic feature control command of the S layer and a set of prosodic feature control commands of the I layer and prosody control rules indicating details of control of the prosodic parameters of the P layer by the prosodic feature control commands of the I layer being prestored in a prosody control rule database;
(b) extracting from the text a prosodic parameter string of speech synthesized by rules;
(c) controlling that one of the prosodic parameters of the prosodic parameter string corresponding to the character or character string to be added with the non-verbal information, by referring to the prosody control rules stored in the prosody control rule database; and
(d) synthesizing speech from the prosodic parameter string containing the controlled prosodic parameter and outputting a synthetic speech message.
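Steps (a) to (c) above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the `@name{...}` tag syntax, the command names (`emphasis`, `doubt`, `raise_pitch`, and so on), the scaling factors, and the flat default prosody of 120 Hz. The rule-based synthesis of step (b) is replaced by a stand-in, and step (d), the actual waveform synthesis, is omitted.

```python
import re
from dataclasses import dataclass

# P layer: controllable prosodic parameters, here one record per character.
@dataclass
class Prosody:
    char: str
    pitch: float   # fundamental frequency in Hz (illustrative)
    power: float   # relative amplitude (illustrative)

# I layer (hypothetical commands): each one scales P-layer parameters.
I_RULES = {
    "raise_pitch": {"pitch": 1.2},
    "lower_pitch": {"pitch": 0.8},
    "louder": {"power": 1.5},
    "softer": {"power": 0.5},
}

# S layer (hypothetical commands): a meaning word mapped to a set of I-layer commands.
S_RULES = {
    "emphasis": ["raise_pitch", "louder"],
    "doubt": ["lower_pitch", "softer"],
}

# Assumed tag syntax: @command{text to be controlled}; nesting is not supported.
TAG = re.compile(r"@(\w+)\{([^{}]*)\}")

def separate(tagged):
    """Split tagged text into plain text and (command, start, end) spans."""
    parts, spans, pos, length = [], [], 0, 0
    for m in TAG.finditer(tagged):
        before = tagged[pos:m.start()]
        parts.append(before)
        length += len(before)
        spans.append((m.group(1), length, length + len(m.group(2))))
        parts.append(m.group(2))
        length += len(m.group(2))
        pos = m.end()
    parts.append(tagged[pos:])
    return "".join(parts), spans

def synthesize_by_rule(text):
    """Stand-in for step (b): a flat default prosodic parameter string."""
    return [Prosody(c, 120.0, 1.0) for c in text]

def apply_commands(params, spans):
    """Step (c): look up the rules and modify the parameters in each span."""
    for name, start, end in spans:
        # An S-layer command expands to its I-layer set; otherwise the
        # name is treated as an I-layer command used directly.
        for cmd in S_RULES.get(name, [name]):
            for field, factor in I_RULES[cmd].items():
                for p in params[start:end]:
                    setattr(p, field, getattr(p, field) * factor)
    return params

def edit(tagged):
    text, spans = separate(tagged)
    return text, apply_commands(synthesize_by_rule(text), spans)
```

Calling `edit("I @emphasis{really} mean it")` returns the plain text together with a parameter list in which only the characters of "really" have scaled pitch and power. The two dictionaries play the role of the prosody control rule database: an operator writes a meaning word from the S layer and does not need to know which parameter changes it implies.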
A synthetic speech message editing apparatus according to the first aspect of the present invention comprises:
a text/prosodic feature control command input part into which a prosodic feature control command to be inserted in an input text is input, the prosodic feature control command being described in a multi-layered description language composed of semantic, interpretation and parameter layers (hereinafter referred to simply as an S, an I and a P layer, respectively), the P layer being a group of controllable prosodic parameters including at least pitch and power, the I layer being a group of prosodic feature control commands for specifying details of control of the prosodic parameters of the P layer, and the S layer being a group of prosodic feature control commands each represented by a phrase or word indicative of an intended meaning of non-verbal information, for executing a command set composed of at least one prosodic feature control command of the I layer;
a text/prosodic feature control command separating part for separating the prosodic feature control command from the text;
a speech synthesis information converting part for generating a prosodic parameter string from the separated text based on a “synthesis-by-rule” method;
a prosodic feature control command analysis part for extracting, from the separated prosodic feature control command, information about its position in the text;
a prosodic feature control part for controlling and correcting the prosodic parameter string based on the extracted position information and the separated prosodic feature control command; and
a speech synthesis part for generating synthetic speech based on the corrected pros
