Reexamination Certificate
1999-04-26
2001-01-16
Hudspeth, David R. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S270000, C704S265000, C704S260000
active
06175821
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to apparatus for and a method of generating voice messages. It has particular utility in relation to voice message generators which generate one or more types of message, each type of message having an invariable portion which is common to all such messages and a variable portion which normally differs from one such message to the next.
2. Related Art
In many examples of such voice message apparatus it is undesirable to record, in their entirety, all possible messages that might be output by the apparatus. Instead, benefits are gained by storing only one instance of the invariable portion (sometimes known as a carrier phrase) and using that in the generation of all messages of that type. The variable portion of the message giving message-specific information can then be output with the carrier phrase to generate a specific message.
In some cases, recorded speech corresponding to each possible item of message-specific information can be used. In other cases, it is better to synthesise speech corresponding to the message-specific information.
To give an example of the former case, an information apparatus for use in a metropolitan railway network might be operable to output a chosen message for each of around 200 stations in the network. A carrier phrase for one type of message might then be ‘This train is now approaching . . . ’ Any one of the 200 station names (the message-specific information) might be inserted into the gap in generating a specific message. Those skilled in the art will realise that the cost and complexity of the information apparatus will be significantly reduced if a single recording of the carrier phrase is used for all of the 200 possible messages.
To give an example of the latter case, a voice message generator forms an important component in an apparatus operable to enable the telephonic retrieval of information stored in a database. If, say, the database contains the names and telephone numbers of millions of people, it is impractical to store a recording of every one of those names and corresponding numbers. Hence, voice messages output by such apparatuses include variable portions synthesised from a text signal representing the name and/or the number concerned. Again, a single recording of a carrier phrase such as ‘The number you require is . . . ’ can be used in generating any possible message of that type.
However, a drawback of conventional voice message generators is that the carrier phrase may have characteristic qualities which are different from those of the message-specific part. These qualities might include highness of voice, liveliness of intonation, speed of delivery, loudness and the like. This is especially so in messages containing both recorded and synthetic speech, since, owing to the constraints of conventional speech synthesis technology, it is likely that the synthesised voice will have lower pitch and duller intonation than the recorded voice.
Another situation in which such a disturbing change of quality might present a problem arises where a recorded word is inserted into the synthesised output of the text-to-speech apparatus. It might be necessary to do this because the text-to-speech apparatus is itself unable to say the word well.
The conventional solution to the above problems is to place a short pause before and/or after the variable portion of the message.
SUMMARY OF THE INVENTION
According to the present invention there is provided a method of generating a voice message signal representing all or part of a message comprising a variable portion and an invariable portion, said method comprising: obtaining a recorded carrier speech signal representing at least a major part of the invariable portion; obtaining a message-specific speech signal representing at least the variable portion; generating a transition signal on the basis of the carrier and message-specific speech signals; forming the voice message signal by concatenating all or part of one of the carrier speech signal and the message-specific speech signal, said transition signal and all or part of the other of said carrier speech signal and the message-specific speech signal.
Because the carrier and message-specific signals are merged rather than being separated by a pause, the output of an apparatus operating in accordance with the above method is more fluent than has hitherto been achieved.
The word ‘signal’ is intended in this specification to include electrical, electromagnetic (including optical) or like types of signal.
It is to be understood that the carrier and message-specific signals may derive from the same speaker. For example, the carrier phrase may be obtained directly from a recording of a speaker's voice, whilst the message-specific part is formed from the concatenation of phoneme segment-representing signals taken from a recording of the same speaker's voice. Also, a speaker's voice may vary between recording sessions or even during a recording session.
Preferably, said transition signal generating step involves the generation of a transition signal which represents a transition audio portion whose pitch varies from having an initial pitch similar to the end of the leading one of the carrier speech signal and the message-specific speech signal to having a final pitch similar to the beginning of the trailing one of the carrier speech signal and the message-specific speech signal. This has the advantage that the presence of a disturbing pitch discontinuity in the output voice message is avoided.
Preferably, the method further comprises the step of truncating one or both of the carrier speech signal and the message-specific speech signal to the extent that the total length removed is substantially equal to the length of the transition signal. This has the advantage that the duration of the voice message is not altered by the insertion of the transition audio portion represented by the transition signal.
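The truncation step can be illustrated with a minimal sketch. It assumes that the signals are plain lists of samples and that the removed length is split between the two signals; the function name `splice_with_transition` and the parameter `lead_cut` are hypothetical and do not appear in the specification.

```python
def splice_with_transition(leading, trailing, transition, lead_cut=None):
    """Concatenate leading + transition + trailing, trimming the two outer
    signals so that the total length equals len(leading) + len(trailing).

    lead_cut is the number of samples removed from the end of `leading`;
    the rest of the transition length is removed from the start of
    `trailing`. An even split is an assumption made for illustration.
    """
    n = len(transition)
    if lead_cut is None:
        lead_cut = n // 2
    trail_cut = n - lead_cut
    if lead_cut < 0 or trail_cut < 0:
        raise ValueError("lead_cut must lie between 0 and len(transition)")
    if lead_cut > len(leading) or trail_cut > len(trailing):
        raise ValueError("transition is longer than the material available to remove")
    kept_lead = list(leading[:len(leading) - lead_cut])
    kept_trail = list(trailing[trail_cut:])
    return kept_lead + list(transition) + kept_trail
```

With a four-sample transition and two ten-sample signals, two samples are removed from each signal and the output remains twenty samples long.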
Preferably, the transition signal generating step comprises: generating a plurality of transition pitchmarks, the spacing of which represents the pitch of a transition audio portion represented by said transition signal; windowing the carrier speech signal to provide carrier speech short-term signals; windowing the message-specific speech signal to provide message-specific speech short-term signals; and mapping the carrier speech short-term signals and the message-specific short-term signals onto said transition pitchmarks to generate the transition signal.
Because this method involves little computation, its use leads to a lower-cost voice message generator.
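The windowing and mapping steps resemble pitch-synchronous overlap-add processing. The following is a rough sketch under stated assumptions, not the patented implementation: a Hann window is assumed (the specification does not name a window shape), and the helper names `hann`, `short_term_signal` and `overlap_add` are hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann window of n points (window shape is an assumption)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def short_term_signal(samples, centre, half_width):
    """Window `samples` around index `centre`; returns 2*half_width + 1
    values, treating positions outside the signal as silence."""
    window = hann(2 * half_width + 1)
    out = []
    for k in range(-half_width, half_width + 1):
        idx = centre + k
        value = samples[idx] if 0 <= idx < len(samples) else 0.0
        out.append(value * window[k + half_width])
    return out

def overlap_add(frames, marks, length):
    """Centre each short-term signal on its transition pitchmark and sum
    the overlapping contributions into an output of `length` samples."""
    out = [0.0] * length
    for frame, mark in zip(frames, marks):
        half = len(frame) // 2
        for k, value in enumerate(frame):
            idx = mark - half + k
            if 0 <= idx < length:
                out[idx] += value
    return out
```

Each operation is a multiply or an add per sample, which is consistent with the low computational cost claimed for the method.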
In a preferred embodiment the transition pitchmark providing step involves a linear interpolation between values of the pitch of the voice message on either side of the transition audio portion. It is found that the use of a linear interpolation method represents a good compromise between the requirement for low computation and the requirement for a fluent output.
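A minimal sketch of such an interpolation follows. It assumes that pitch is expressed as a pitch period in samples and that the period is interpolated linearly with position across the transition; interpolating frequency instead would be an equally plausible reading. The function name `transition_pitchmarks` and its parameters are hypothetical.

```python
def transition_pitchmarks(start_period, end_period, duration):
    """Return pitchmark positions (in samples) within a transition of
    `duration` samples. The spacing between marks moves linearly from
    start_period (pitch period before the transition) towards end_period
    (pitch period after it)."""
    if start_period <= 0 or end_period <= 0 or duration <= 0:
        raise ValueError("periods and duration must be positive")
    marks = []
    pos = 0.0
    while pos < duration:
        marks.append(round(pos))
        frac = pos / duration  # 0.0 at the start of the transition
        pos += start_period + frac * (end_period - start_period)
    return marks
```

When the two periods are equal the marks are evenly spaced; when they differ, the spacing changes gradually, so that no abrupt pitch step occurs at either end of the transition.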
In a further refinement, the mapping comprises mapping a combination of a message-specific speech short-term signal and a carrier speech short-term signal to one or more of said plurality of transition pitchmarks. This has the advantage of providing a smooth change in the timbre of the voice message at the join between the two voice message portions.
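One way to picture this combination is as a weighted sum whose weight shifts from the leading signal to the trailing signal across the transition pitchmarks. The sketch below assumes a linear weight ramp and equal-length short-term signals; neither detail is stated in the specification, and the names `crossfade_weights` and `blend_short_term` are hypothetical.

```python
def crossfade_weights(n_marks):
    """Weight given to the trailing signal at each of n_marks transition
    pitchmarks, rising linearly from 0.0 to 1.0 (ramp shape is assumed)."""
    if n_marks < 1:
        raise ValueError("at least one pitchmark is required")
    if n_marks == 1:
        return [0.5]
    return [i / (n_marks - 1) for i in range(n_marks)]

def blend_short_term(leading_frame, trailing_frame, weight):
    """Combine two equal-length short-term signals sample by sample."""
    if len(leading_frame) != len(trailing_frame):
        raise ValueError("short-term signals must have equal length")
    return [(1.0 - weight) * a + weight * b
            for a, b in zip(leading_frame, trailing_frame)]
```

At the first pitchmark the blended signal is entirely the leading speaker's short-term signal, and at the last it is entirely the trailing one, which gives the gradual change in timbre described above.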
Preferably, the transition audio portion is located around the centre of a phoneme of the invariable portion, which phoneme is closest to the boundary between the invariable portion and the variable portion of the voice message. The effect of this is to increase the fluency of the voice message.
According to a second aspect of the present invention there is provided apparatus for generating a voice message signal representing a message comprising a variable portion and an invariable portion, said apparatus comprising:
means arranged in operation to receive a carrier speech signal representing at least a major part of the invariable portion;
means arranged in operation to receive a message-specific speech sign
Murrin Paul
Page Julian H.
Abebe Daniel
British Telecommunications public limited company
Hudspeth David R.
Nixon & Vanderhye P.C.