Reexamination Certificate
1996-01-30
2001-04-24
Tung, Kee M. (Department: 2671)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S211000
active
06223153
ABSTRACT:
The present invention relates to a voice processing system and method.
Voice processing systems, which are well known in the art (see, for example, “Voice Processing” by Walt Teschner, published by Artech House), perform a variety of functions, the most common of which is voice mail (also known as voice messaging), whereby callers who cannot reach their intended addressee can instead record a message for subsequent retrieval. It is occasionally desirable to skip through a stored voice mail message, either forwards to the more important issues raised in it or backwards to listen to points again. The DirectTalkMail system available from International Business Machines Corporation allows a user to skip backwards or forwards through a message eight seconds at a time, using keys seven and nine respectively (see DirectTalkMail Guide SC33-1221-XX, available from International Business Machines Corporation). However, such skipping does not allow the user to listen to the message at the same time; to achieve that, the system must provide a variable output speed for the stored voice data. Speeding up and slowing down the output of stored voice data is provided in the Aspen voice mail system available from Octel Communications Corporation, incorporated in Delaware, USA. One problem associated with speeding up or slowing down the output of a voice message is avoiding a significant variation in pitch, which substantially reduces the comprehensibility of the message. This variation in pitch can be avoided using digital signal processing techniques; one example is the ETSM product available from Entropic Speech, Inc, incorporated in California, USA. However, the digital signal processing techniques used are very processor intensive and place a significant drain on processor capacity, making it difficult to perform the necessary processing in a real-time telephony environment.
Accordingly, the present invention provides a method for varying the speed of playback of digitised audio data derived from a sequence of encoded audio data units, comprising the steps of storing a set of digitised audio data units, processing said digitised audio data units by omitting or repeating selected digitised audio data units in accordance with a desired variation in speed, and outputting said processed digitised audio data units.
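The omit-or-repeat step described above can be pictured as an index mapping from output positions to stored units. This is a minimal sketch under stated assumptions, not the patented implementation: the speed is given as an integer percentage (hypothetical parameter `speed_percent`, where 100 means normal speed), and output unit j is taken from stored unit floor(j × speed_percent / 100), so units are skipped when the speed is above 100% and repeated when it is below. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_unit_indices(num_units: int, speed_percent: int) -> List[int]:
    """Map output positions to stored unit indices (hypothetical helper).

    Output unit j reuses stored unit (j * speed_percent) // 100. Integer
    arithmetic avoids floating-point drift over long messages.
    """
    if speed_percent <= 0:
        raise ValueError("speed_percent must be positive")
    indices: List[int] = []
    j = 0
    while True:
        src = (j * speed_percent) // 100
        if src >= num_units:
            break
        # Above 100% some src values are never produced (omitted units);
        # below 100% the same src value appears more than once (repeated units).
        indices.append(src)
        j += 1
    return indices


def vary_speed(units: Sequence[T], speed_percent: int) -> List[T]:
    """Omit or repeat whole stored units; the units themselves are untouched."""
    return [units[i] for i in select_unit_indices(len(units), speed_percent)]


if __name__ == "__main__":
    print(select_unit_indices(6, 150))      # [0, 1, 3, 4]
    print(vary_speed(["a", "b", "c"], 50))  # ['a', 'a', 'b', 'b', 'c', 'c']
```

Because each unit is passed through unchanged, the audio inside a unit still plays at its original rate; only the number of units changes, which is why the duration changes while the pitch does not.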
The present invention allows the speed of output of a voice message to be varied whilst preserving its pitch. Because the pitch remains substantially unchanged, the comprehensibility of the voice message at higher or lower output speeds is much improved. Further, the present invention affords a very simple and computationally inexpensive way of varying the output speed of voice messages whilst maintaining pitch. As the processing involves only repeating or omitting digitised audio data units, without any further processing of them, the processor overhead is significantly reduced.
An embodiment provides a method wherein said digitised audio data units are encoded using a history-based encoding technique. History-based techniques, such as those which utilise differences between successive segments of audio data, are particularly effective for use in the present invention. Because history-based techniques contain information related to or derived from previous audio data units, they enable good-quality audio to be generated from the units even though previous audio data units have been omitted or repeated.
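To illustrate why a decoder that carries history copes with omitted or repeated units, the sketch below uses leaky first-order DPCM (differential pulse-code modulation) as one example of a history-based scheme. The text does not name a specific codec; the choice of DPCM, the `leak` factor, the absence of quantisation and the function names are all assumptions for illustration. The decoder's state (the last decoded sample) is carried across unit boundaries, so a repeated unit is decoded relative to the current history rather than replayed verbatim, and any mismatch introduced by an omission or repetition shrinks by a factor of `leak` on every following sample.

```python
from typing import List, Sequence


def dpcm_encode(samples: Sequence[float], unit_len: int,
                leak: float = 0.9) -> List[List[float]]:
    """Split samples into units of prediction residuals (no quantisation)."""
    residuals: List[float] = []
    prev = 0.0  # encoder's copy of the decoder history
    for x in samples:
        residuals.append(x - leak * prev)  # residual = sample - prediction
        prev = x  # without quantisation the decoder reconstructs x exactly
    return [residuals[i:i + unit_len] for i in range(0, len(residuals), unit_len)]


def dpcm_decode(units: Sequence[Sequence[float]], leak: float = 0.9) -> List[float]:
    """Decode any sequence of units, carrying history across unit boundaries."""
    out: List[float] = []
    prev = 0.0  # decoder history, shared by consecutive units
    for unit in units:
        for r in unit:
            prev = leak * prev + r  # prediction from history plus residual
            out.append(prev)
    return out


if __name__ == "__main__":
    units = dpcm_encode([1.0, 2.0, 3.0, 4.0], unit_len=2, leak=0.5)
    print(dpcm_decode(units, leak=0.5))  # [1.0, 2.0, 3.0, 4.0]
    # Repeat the first unit: the copy starts from the history left by the first.
    print(dpcm_decode([units[0], units[0], units[1]], leak=0.5))
    # [1.0, 2.0, 2.0, 2.5, 3.25, 4.125]
```

In the repeated case the last unit ends at 4.125 instead of 4.0: the error of 0.5 left after the repeat halves on each sample (0.25, then 0.125), which is the smoothing effect of the decoder history.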
An embodiment provides a method wherein said encoded audio data blocks represent Linear Predictive Coding (LPC) coefficients. Representing digitised voice by LPC coefficients has a dual benefit: first, very good quality speech can be derived from them and, secondly, they are very efficient in terms of storage and processing overhead. It is important for voice mail systems to be able to store data in compressed form in order to use storage capacity efficiently. Further, repeating or omitting LPC blocks reduces processor overhead because the omission or repetition is performed before the LPC blocks are decompressed or decoded. The amount of data processed is therefore substantially smaller than it would be for unencoded data, which reduces processor loading.
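The saving described above comes from choosing which LPC blocks to keep before any synthesis is done. The sketch below assumes a simple voiced-only LPC vocoder: each block (hypothetical `LpcBlock`) holds predictor coefficients a_1..a_p under the assumed convention A(z) = 1 + sum of a_k z^-k, a gain, and a pitch period in samples, and is excited by an impulse train. Real LPC coders also carry voicing decisions and noise excitation; those are left out. Because blocks are selected first and each selected block is then synthesised at the original sample rate with its original pitch period, the output is shorter or longer but the impulse spacing, and hence the pitch, is unchanged.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LpcBlock:
    coeffs: Tuple[float, ...]  # a_1..a_p, assuming A(z) = 1 + sum a_k z^-k
    gain: float
    pitch_period: int  # samples between excitation impulses (voiced only)


def select_blocks(blocks: Sequence[LpcBlock], speed_percent: int) -> List[LpcBlock]:
    """Omit or repeat encoded blocks before decoding (integer index mapping)."""
    if speed_percent <= 0:
        raise ValueError("speed_percent must be positive")
    chosen: List[LpcBlock] = []
    j = 0
    while (j * speed_percent) // 100 < len(blocks):
        chosen.append(blocks[(j * speed_percent) // 100])
        j += 1
    return chosen


def synthesize(blocks: Sequence[LpcBlock], samples_per_block: int) -> List[float]:
    """All-pole synthesis: y[n] = gain * e[n] - sum_k a_k * y[n - k]."""
    out: List[float] = []
    n = 0  # running sample index keeps impulse spacing continuous across blocks
    for block in blocks:
        for _ in range(samples_per_block):
            excitation = 1.0 if n % block.pitch_period == 0 else 0.0
            y = block.gain * excitation
            for k, a in enumerate(block.coeffs, start=1):
                if k <= len(out):
                    y -= a * out[-k]  # filter memory spans block boundaries
            out.append(y)
            n += 1
    return out


def play_at_speed(blocks: Sequence[LpcBlock], speed_percent: int,
                  samples_per_block: int = 160) -> List[float]:
    # Only the selected blocks are decoded, so fewer blocks means less work.
    return synthesize(select_blocks(blocks, speed_percent), samples_per_block)
```

With four blocks at 200%, only blocks 0 and 2 are synthesised; at 50%, each block is synthesised twice. In both cases the impulses remain `pitch_period` samples apart.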
Preferably the speed of playback is varied between 50% and 200% of normal speed. A practical implementation of the present invention indicates that the comprehensibility of the audio signal derived from the digitised audio data units starts to degrade when the speed of playback is outside this range.
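The effect of the preferred range on listening time is simple arithmetic: a message of duration D played at s% of normal speed lasts D × 100 / s. The sketch below clamps requests to the 50% to 200% range given above; clamping (rather than rejecting out-of-range requests) and the names used are assumptions.

```python
MIN_SPEED_PERCENT = 50   # below this, comprehensibility starts to degrade (per the text)
MAX_SPEED_PERCENT = 200  # above this, likewise


def clamp_speed(speed_percent: int) -> int:
    """Keep a requested speed inside the preferred 50%..200% range."""
    return max(MIN_SPEED_PERCENT, min(MAX_SPEED_PERCENT, speed_percent))


def playback_seconds(message_seconds: float, speed_percent: int) -> float:
    """Listening time for a message played at the (clamped) speed."""
    return message_seconds * 100 / clamp_speed(speed_percent)


if __name__ == "__main__":
    print(playback_seconds(60, 150))  # 40.0
    print(playback_seconds(60, 300))  # 30.0 (clamped to 200%)
    print(playback_seconds(60, 25))   # 120.0 (clamped to 50%)
```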
It is preferred that each digitised audio data unit represents between 5 msec and 50 msec of audio data. Using speech in blocks of between 5 msec and 50 msec allows a compromise between granularity and speed of searching; a practical implementation has found that 20 msec represents a good compromise. If the time period of audio data represented by the LPC coefficients is too small, the processor may become unduly loaded as a consequence of handling a large number of small blocks. In addition, it is believed that the LPC coefficients may impose a lower limit on the duration of speech that each block represents. This lower limit is determined by the dynamics of the human ear: an LPC block may have to allow slightly more than one complete cycle of the lowest frequency present to be derived from it in order for that cycle to be discernible by the human ear. However, if the time period represented by the LPC coefficients is too large, discernible repetition or stutter will be audible in the resulting audio signal.
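The trade-offs above can be put in numbers. Assuming 8 kHz sampling (common in telephony but not stated in the text; hypothetical parameter `sample_rate_hz`), the sketch below computes how many samples one unit holds, how many units must be handled per second of speech, and the lowest frequency whose complete cycle fits within one unit. For the preferred 20 msec unit this gives 160 samples, 50 units per second and 50 Hz; the names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSizing:
    samples: int                 # samples held by one unit
    units_per_second: float      # blocks handled per second of speech (load)
    lowest_full_cycle_hz: float  # lowest frequency completing one cycle per unit


def size_unit(unit_ms: int, sample_rate_hz: int = 8000) -> UnitSizing:
    samples, remainder = divmod(unit_ms * sample_rate_hz, 1000)
    if remainder:
        raise ValueError("unit duration is not a whole number of samples")
    per_second = 1000 / unit_ms
    # A full cycle fits in one unit only if its period is <= unit_ms,
    # i.e. frequency >= 1000 / unit_ms Hz (numerically equal to per_second).
    return UnitSizing(samples, per_second, 1000 / unit_ms)


if __name__ == "__main__":
    for ms in (5, 20, 50):
        print(ms, size_unit(ms))
```

Shorter units raise `units_per_second` (more blocks for the processor to handle) and raise the lowest frequency a single unit can fully contain; longer units lower both but, per the text, make repetition audible as stutter.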
The present invention also provides a voice mail system comprising means for storing voice messages comprising a set of digitised audio data units, means for playing back the stored message including means for varying the speed of playback, means for processing said digitised audio data units by omitting or repeating selected digitised audio data units in accordance with a desired variation in speed, and means for outputting said processed digitised audio data units.
REFERENCES:
patent: 4435832 (1984-03-01), Asada et al.
patent: 4864620 (1989-09-01), Bialick
patent: 5175769 (1992-12-01), Hejna, Jr. et al.
Aspen Quick Reference Guide, Octel Communications Corporation, 2 pages.
ETSM: Entropic Time-Scale Modification Software, promotional leaflet, 4 pages.
“Speech Coding and Speech Recognition Technologies: A Review”, IEEE International Symposium on Circuits and Systems, 1991, vol. 1, pp. 575-577.
Bowater Ronald John
Cobbett Michael
Staton Mervyn Aubony
International Business Machines Corporation
Ray-Yarletts Jeanine S.
Tung Kee M.