Reexamination Certificate
2000-06-13
2004-05-18
Weaver, Scott L. (Department: 2748)
Telephonic communications
Audio message storage, retrieval, or synthesis
Voice message synthesis
C379S088220, C704S258000
active
06738457
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to voice processing systems and the like, and in particular to the way in which such systems can interact with callers.
2. Description of the Related Art
Voice processing systems whereby callers interact over the telephone network with computerised equipment are well known in the art, and include voice mail systems, voice response units, and so on. Typically such systems ask a caller (or called party) questions using prerecorded prompts, and the caller inputs answers by pressing dual-tone multi-frequency (DTMF) keys on their telephone. In this manner, the caller can navigate through a hierarchy of prompt menus, for example to retrieve desired information, or eventually to be connected to a particular telephone extension or customer department.
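To make the menu navigation described above concrete, here is a minimal Python sketch. The menu tree, its prompts and the `navigate` function are hypothetical illustrations, not taken from any real voice processing product; each node maps a DTMF key either to a sub-menu or to a final destination such as an extension.

```python
from typing import Dict, Union

# Hypothetical menu node: a prompt plus a mapping from DTMF key to either
# another menu node or a string naming the final destination.
MenuNode = Dict[str, object]

MAIN_MENU: MenuNode = {
    "prompt": "For sales press 1, for customer support press 2.",
    "options": {
        "1": "sales department",
        "2": {
            "prompt": "For billing press 1, for technical help press 2.",
            "options": {"1": "billing desk", "2": "extension 4321"},
        },
    },
}


def navigate(menu: MenuNode, keys: str) -> str:
    """Follow DTMF key presses down the menu hierarchy.

    Returns the destination reached, or the prompt that should be played
    (or replayed, after an invalid key) if the caller is still in a menu.
    """
    node: Union[MenuNode, str] = menu
    for key in keys:
        if isinstance(node, str):
            break  # destination already reached; ignore any further keys
        options = node["options"]
        if key not in options:
            return "replay: " + node["prompt"]
        node = options[key]
    if isinstance(node, str):
        return node
    return "play: " + node["prompt"]


print(navigate(MAIN_MENU, "22"))  # extension 4321
```

Representing the hierarchy as nested mappings keeps each prompt next to the keys it offers, so an invalid key can simply trigger a replay of the current prompt.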
There has been an increasing tendency in recent years for voice processing systems to use speech recognition (also sometimes called voice recognition—the two terms are used interchangeably herein), in order to augment DTMF input. The adoption of speech recognition permits the handling of callers who do not have a DTMF phone, and also the acquisition of more complex information beyond simple numerals from the caller. Speech recognition in a telephony environment can be supported by a variety of hardware architectures. Many voice processing systems include a special DSP card for running speech recognition software (firmware or microcode), which is connected to a line interface unit for the transfer of telephony data via a time division multiplex (TDM) bus. Most commercial voice processing systems conform to one of two standard TDM bus architectures: either the Signal Computing System Architecture (SCSA), or the Multi-vendor Integration Protocol (MVIP). A somewhat different configuration is described in GB 2280820, in which a voice processing system is connected via a local area network to a remote server, which provides a voice recognition facility.
Voice processing systems such as interactive voice response systems (IVRs) run applications to play prerecorded prompts to callers. IVRs typically have a set of system provided audio segments for commonly used items, such as numbers, days of the week, and so on. Additional audio segments must then be recorded as required for any specific application. The prompts played to a caller for that application can then be formed from one or more system provided audio segments and/or one or more application specific audio segments, concatenated together as required.
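The way a prompt is assembled from the two kinds of segment can be sketched as a simple lookup. The segment names, file paths, the lookup order (system segments first) and `build_prompt` are all assumptions for illustration; the function returns an ordered playlist that records where each segment came from.

```python
from typing import List, Tuple

# Hypothetical system-provided segments (common items such as days and numbers).
SYSTEM_SEGMENTS = {
    "tuesday": "system/tuesday.pcm",
    "three": "system/three.pcm",
}

# Hypothetical segments recorded specifically for one application.
APPLICATION_SEGMENTS = {
    "your appointment is on": "app/appointment_on.pcm",
    "at": "app/at.pcm",
    "o'clock": "app/oclock.pcm",
}


def build_prompt(items: List[str]) -> List[Tuple[str, str]]:
    """Resolve each prompt item to (source, audio file), in playback order."""
    playlist = []
    for item in items:
        if item in SYSTEM_SEGMENTS:
            playlist.append(("system", SYSTEM_SEGMENTS[item]))
        elif item in APPLICATION_SEGMENTS:
            playlist.append(("application", APPLICATION_SEGMENTS[item]))
        else:
            raise KeyError(f"no recorded segment for {item!r}")
    return playlist


prompt = build_prompt(["your appointment is on", "tuesday", "at", "three", "o'clock"])
print([source for source, _ in prompt])
# ['application', 'system', 'application', 'system', 'application']
```

For this example prompt the playlist alternates between application and system segments; each alternation is a point at which the caller may hear a change of voice.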
One problem with this approach is that the voice used to record the application specific audio segments will generally sound different from the voice which was used to record the system provided audio segments. Therefore the output when a system provided audio segment is concatenated with an application specific prompt will sound slightly incongruous. One way around this difficulty is to have the person who records the application specific audio segments re-record the system provided audio segments, so that all are spoken with the same voice. However, the extra time for these re-recordings represents additional expense for the application developer, and the possible duplication of recorded audio segments can increase system storage requirements. These problems are particularly acute where the IVR runs two or more applications and the system audio segments are re-recorded separately for each application.
A similar problem arises specifically in voice mail systems (also termed voice messaging systems), which are used to store messages from incoming calls when the intended recipient is absent or otherwise engaged. The intended recipient (often referred to as a subscriber to the voice mail system) can then listen to their stored messages at some future time. A voice mail system is generally implemented either on special purpose computer hardware, or else on a standard computer workstation equipped with a suitable telephony interface. This system is then attached to (or into) the telephone network, typically via a switch or PBX. Such voice mail systems are well known; one example is the DirectTalkMail system, available from IBM Corporation (now marketed as the IBM Message Center). Other examples of voice mail systems are described in U.S. Pat. No. 4,811,381 and EPA 0588576.
An important feature of many voice mail systems is their ability to provide callers with a personalized greeting for the intended recipient, for example: “The party you have called, JOHN SMITH, is unavailable at present. Please leave a message after the tone, or hit the zero key for further assistance”. This greeting actually comprises three (or more) audio segments which the system automatically concatenates together for audio output:
(1) “The party you have called”
(2) “JOHN SMITH”
(3) “is unavailable at present. Please leave a message after the tone, or hit the zero key for further assistance”.
In this case the first and last segments may be standard audio segments provided by the voice mail system. By contrast, the middle segment (sometimes referred to as the “audio name”) is a separate audio segment that has to be recorded specifically by the subscriber. This is because spoken names are very difficult to generate automatically, for example with a text-to-speech system, given the very wide range of names (many with unusual spellings) and the variety of pronunciations used by different people even when they share the same name.
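A minimal sketch of the automatic concatenation, using Python's standard `wave` module. It assumes all three segments are already stored as raw mono 16-bit linear PCM at 8 kHz (a common telephony format, but an assumption here); the silent byte strings stand in for real recordings, and `concatenate_segments` and `silence` are hypothetical helpers.

```python
import io
import wave
from typing import List

SAMPLE_RATE = 8000  # assumed telephony sampling rate (Hz)
SAMPLE_WIDTH = 2    # assumed 16-bit linear PCM, mono


def concatenate_segments(segments: List[bytes]) -> bytes:
    """Join raw PCM segments of identical format into a single WAV file image."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(SAMPLE_RATE)
        for pcm in segments:
            out.writeframes(pcm)
    return buffer.getvalue()


def silence(seconds: float) -> bytes:
    """Placeholder audio: digital silence of the given length."""
    return b"\x00\x00" * round(SAMPLE_RATE * seconds)


# Placeholders for (1) the system prefix, (2) the subscriber's recorded
# audio name and (3) the system suffix.
greeting = concatenate_segments([silence(1.2), silence(0.8), silence(4.0)])
```

Because the segments are simply appended, the same audio name recording can be reused with any number of system prefixes and suffixes; the voice mismatch at each join is exactly what the following paragraphs discuss.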
Personalized greetings are also beneficial in voice mail systems because hearing the name, and indeed the recorded voice, of the subscriber reassures the caller that they have reached the correct mailbox. Nevertheless, the overall output can sound somewhat awkward, because the system provided audio segments (i.e. segments (1) and (3) above) may be spoken in a voice very different from that of the subscriber, and the result sounds cumbersome when they are concatenated with the subscriber's audio name.
One way to try to overcome this problem is to have the subscriber record the entire greeting, in other words all three segments above (possibly as one long segment). Although this removes any disparity in sound between the different parts of the greeting, it is still not entirely satisfactory. For example, not all subscribers may be prepared to make the additional effort required to produce the longer recording. This is particularly the case where the system provides different greetings for different situations (e.g. one for general unavailability, one for when the subscriber has left the office for the night, etc.) and can normally re-use the same audio name recording for all of them. Therefore, if each greeting is to be spoken in its entirety by the subscriber, subscribers may face recording multiple greetings rather than just a single audio name. Furthermore, even subscribers who are prepared to record whole greetings may produce one that is mumbled and difficult to understand, hesitant, lacking in information, or otherwise defective compared with the standard system audio segments. This in turn can reflect badly on the professionalism of the subscriber's organization.
SUMMARY OF THE INVENTION
Accordingly, the invention provides a voice processing system for connection to a telephone network and running at least one application for controlling interaction with calls over the telephone network, said system comprising:
means for providing at least one audio segment recorded by a first speaker for use by said at least one application;
means for providing at least one vocal parameter characteristic of a second speaker;
means for applying said at least one vocal parameter to said audio segment to produce a modified audio segment such that said modified audio segment sounds substantially as if spoken by said second speaker; and
means for outputting the modified audio segment over the telephone network.
Thus the inve
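The excerpt does not say which vocal parameters are used or how they are applied. Purely as an illustration, the sketch below assumes a single parameter, the second speaker's mean fundamental frequency (F0); it estimates the F0 of the first speaker's segment by autocorrelation and shifts the pitch by naive resampling. The functions `estimate_f0`, `shift_pitch` and `apply_vocal_parameter` are hypothetical, and a test tone stands in for recorded speech.

```python
import math
from typing import List


def estimate_f0(samples: List[float], rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude fundamental-frequency estimate from the autocorrelation peak.

    The sum is deliberately left unnormalised: fewer terms contribute at long
    lags, which biases the search toward the true period rather than a
    multiple of it (an "octave error").
    """
    min_lag = int(rate / fmax)
    max_lag = int(rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag


def shift_pitch(samples: List[float], ratio: float) -> List[float]:
    """Scale pitch by `ratio` via linear-interpolation resampling.

    Naive: the duration is scaled by 1/ratio as well.
    """
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


def apply_vocal_parameter(segment: List[float], rate: int, target_f0: float) -> List[float]:
    """Move the segment's pitch to the second speaker's (hypothetical) mean F0."""
    ratio = target_f0 / estimate_f0(segment, rate)
    return shift_pitch(segment, ratio)


# Stand-in for a recording by the first speaker: a 0.2 s, 100 Hz tone at 8 kHz.
RATE = 8000
segment = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(1600)]
modified = apply_vocal_parameter(segment, RATE, target_f0=150.0)
```

Naive resampling changes the segment's duration by the same factor as its pitch, and pitch alone does not capture a speaker's voice quality; a practical system would more likely use a duration-preserving method such as pitch-synchronous overlap-add (PSOLA) together with further parameters.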
Pickering John Brian
Tuttle Graham Hugh
Carstens Yee & Cahoon LLP
Clay A. Bruce
International Business Machines Corporation