Telephonic communications – Audio message storage, retrieval, or synthesis – Message signal analysis
Reexamination Certificate
1999-07-16
2004-04-27
Tsang, Fan (Department: 2645)
Telephonic communications
Audio message storage, retrieval, or synthesis
Message signal analysis
C379S088220, C704S201000
active
06728344
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to telephone answering devices, and in particular to a telephone answering device with a first speech coder for encoding/decoding fixed voice prompt messages based on a first set of codebooks and a second speech coder for encoding/decoding incoming and outgoing voice messages based on a second set of codebooks that is significantly larger in size than the first set.
DESCRIPTION OF RELATED ART
In telecommunication devices, such as digital telephone answering devices (DTADs), speech processing systems are employed to store and forward speech sounds. Conventional digital telecommunication devices provide for the storage and playback of incoming voice messages, outgoing voice messages, and fixed prompt voice messages. Incoming voice messages include messages transmitted over the telephone line by the calling party and recorded by the DTAD. Outgoing messages are the pre-recorded messages played by the DTAD in response to receiving a telephone call. For example, the outgoing message might state “I am presently unavailable. At the sound of the tone please leave a brief message.” Incoming and outgoing messages are stored in a read/write memory in the DTAD. These messages are limitless in terms of the number of utterances or phrases that may be expressed and the number of speakers, so long as the memory size is not exceeded. Another type of audio message played by the DTAD is a fixed voice prompt message or “voice read only message” (VROM), such as a date/time stamp, with significantly fewer utterances or phrases spoken by a single speaker. Since the fixed voice prompt messages need only be read, and not changed, they are stored in a read only memory (ROM).
In a conventional DTAD the VROM messages are stored in an external ROM and compressed using the same coding technique, for example, code-excited linear predictive coding (CELP), that is used for the storage of incoming and outgoing messages. Alternatively, the VROM messages may be stored on a linear predictive coding (LPC) synthesis chip; however, this provides lower quality than CELP coding. External voice ROMs and LPC synthesis chips are relatively large in size. The overall size of the circuitry may be reduced by storing the VROM messages in a smaller memory device, such as a digital signal processor read only memory (DSP ROM). However, the cost of the DSP ROM increases significantly as the available storage capacity increases. Thus, it is preferable to use a DSP ROM with a relatively small storage capacity. By way of example, in a 16 k DSP ROM approximately 12 k is used to store the speech encoding program and other programs, leaving only approximately 4 k words for the fixed voice prompts. The typical total recording time for time/day stamp fixed voice prompts is approximately 37 seconds. An encoding rate of 6.8 kbps, which is generally used in DTADs employing a codebook trained for a relatively large number of utterances and speakers, requires at least 15,725 words of storage. Thus, the overall storage requirements for the fixed voice prompts exceed the storage capacity of a typical low-cost DSP ROM. Although DSP ROMs having a larger storage capacity, such as 24 k or 32 k, may be used, they are significantly more expensive, and thus may be impracticable.
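The storage figures above can be checked with a short calculation. The sketch below is illustrative only: it assumes 16-bit memory words (the word size that reproduces the 15,725-word figure) and takes "4 k words" to mean 4,096 words; the function names are hypothetical.

```python
def words_needed(duration_s: int, bit_rate_bps: int, bits_per_word: int = 16) -> int:
    """Memory words needed to store speech coded at a constant bit rate.

    bits_per_word=16 is an assumption; the text does not state the word size.
    """
    total_bits = duration_s * bit_rate_bps
    # Ceiling division so that a partially filled word still counts as a word.
    return -(-total_bits // bits_per_word)


def max_bit_rate(available_words: int, duration_s: int, bits_per_word: int = 16) -> int:
    """Highest whole-number bit rate (bps) that fits the prompts in the given space."""
    return (available_words * bits_per_word) // duration_s


PROMPT_SECONDS = 37      # total recording time of the time/day stamp prompts
GENERAL_RATE_BPS = 6800  # 6.8 kbps rate used for incoming/outgoing messages
FREE_WORDS = 4096        # assumed meaning of "approximately 4 k words"

required = words_needed(PROMPT_SECONDS, GENERAL_RATE_BPS)
print(required)                                  # 15725
print(required - FREE_WORDS)                     # 11629 words over budget
print(max_bit_rate(FREE_WORDS, PROMPT_SECONDS))  # 1771
```

Under these assumptions the prompts would have to be coded at roughly 1.8 kbps or less to fit in the free space, which is the motivation for a lower compression bit rate.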
It is therefore desirable to develop a DTAD in which the fixed voice prompts are stored in a DSP ROM at a reduced compression bit rate while maintaining the quality of the reconstructed speech or voice data.
SUMMARY OF THE INVENTION
For the purposes of this invention, the term “set of codebooks” is defined to include an LPC codebook, an adaptive codebook, and a fixed codebook. In addition, the term “voice message” includes both incoming and outgoing voice messages. The terms “voice read only message” and “fixed voice prompt” are synonymous.
The digital telephone answering device in accordance with the present invention includes two separate coders: a first speech coder for encoding/decoding fixed voice prompts spoken by a single speaker and a second speech coder for encoding/decoding voice messages spoken by multiple speakers. The first speech coder uses a first set of codebooks generated by training on a first set of utterances spoken by a single speaker, while the second speech coder uses a second set of codebooks generated by training on a second set of utterances spoken by multiple speakers. Because the first set of utterances is significantly smaller than the second set of utterances, and the range of pitch periods is significantly narrower for the first set of utterances spoken by a single speaker than for the second set of utterances spoken by multiple speakers, the size of the first set of codebooks is significantly reduced relative to the size of the second set of codebooks. As a result, the fixed voice prompt messages may be compressed at a lower bit rate with a relatively high quality of encoding, thereby optimizing the codebook and reducing the amount of memory required for storing the encoded fixed voice prompts. Furthermore, the encoding of fixed voice prompts can occur off line, and thus need not be performed by the DSP in real time. Only decoding of the fixed voice prompts is performed by the DSP in real time.
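The link between codebook size and bit rate can be illustrated with a small calculation: an index into a codebook of N entries needs ceil(log2 N) bits, so smaller codebooks yield fewer bits per coded frame. The sketch below is not taken from the patent; every codebook size in it is a hypothetical value chosen for illustration, and it ignores gain and other side information that a real coder would also transmit.

```python
def index_bits(codebook_size: int) -> int:
    """Bits needed to address one entry in a codebook of the given size."""
    if codebook_size < 1:
        raise ValueError("codebook must have at least one entry")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (codebook_size - 1).bit_length()


def bits_per_frame(codebook_sizes: dict) -> int:
    """Total index bits for one frame: one index per codebook in the set."""
    return sum(index_bits(size) for size in codebook_sizes.values())


# Hypothetical sizes. "adaptive" stands for the number of candidate pitch lags,
# which shrinks when the pitch range covers a single speaker only.
multi_speaker_set = {"lpc": 1024, "adaptive": 128, "fixed": 512}
single_speaker_set = {"lpc": 128, "adaptive": 32, "fixed": 64}

print(bits_per_frame(multi_speaker_set))   # 26
print(bits_per_frame(single_speaker_set))  # 18
```

With these illustrative sizes the single-speaker set needs 18 index bits per frame instead of 26; the actual savings depend on the codebook sizes obtained from training.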
In addition, the present invention is directed to a method of using the telephone answering device described above. Fixed voice prompts are encoded using a first speech coder having a first set of codebooks generated by training on a first set of utterances spoken by a single speaker. Incoming/outgoing voice messages are encoded using a second speech coder having a second set of codebooks generated by training on a second set of utterances spoken by multiple speakers, wherein the second set of utterances is larger than the first set of utterances. The encoded fixed voice prompts and voice messages are stored in first and second memory devices, respectively, for future retrieval and playback.
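The method can be sketched structurally as two coders paired with two memories. This is a minimal sketch under stated assumptions, not the patented implementation: the class names, the string labels standing in for trained codebooks, and the placeholder encode/decode bodies are all hypothetical, and no actual speech coding is performed.

```python
from dataclasses import dataclass, field


@dataclass
class Coder:
    name: str
    codebooks: str  # label standing in for a trained set of codebooks

    def encode(self, speech: str) -> tuple:
        # Placeholder: a real coder would emit codebook indices.
        return (self.name, speech)

    def decode(self, coded: tuple) -> str:
        coder_name, speech = coded
        if coder_name != self.name:
            raise ValueError("data was encoded with a different coder")
        return speech


@dataclass
class AnsweringDevice:
    prompt_coder: Coder
    message_coder: Coder
    prompt_memory: dict = field(default_factory=dict)   # stands in for the ROM
    message_memory: list = field(default_factory=list)  # stands in for read/write memory

    def record_message(self, speech: str) -> None:
        # Voice messages are encoded at run time with the second coder.
        self.message_memory.append(self.message_coder.encode(speech))

    def play_prompt(self, key: str) -> str:
        # Prompts are only decoded at run time.
        return self.prompt_coder.decode(self.prompt_memory[key])

    def play_message(self, index: int) -> str:
        return self.message_coder.decode(self.message_memory[index])


prompt_coder = Coder("first", "single-speaker codebooks")
message_coder = Coder("second", "multi-speaker codebooks")

# Prompts are encoded ahead of time (off line) and placed in the prompt memory.
rom_image = {"monday": prompt_coder.encode("Monday")}

device = AnsweringDevice(prompt_coder, message_coder, prompt_memory=rom_image)
device.record_message("Please call me back.")
print(device.play_prompt("monday"))  # Monday
print(device.play_message(0))        # Please call me back.
```

Keeping the prompt memory read-only at run time mirrors the point that prompt encoding can happen off line, leaving only decoding to the DSP.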
Iyengar Vasu
Kroon Peter
Agere Systems Inc.
Escalante Ovidio
Synnestvedt & Lechner LLP
Tsang Fan