Variation in playback speed of a stored audio data signal...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S211000

active

06223153

ABSTRACT:

The present invention relates to a voice processing system and method.
Voice processing systems, which are well known in the art (see for example “Voice Processing”, by Walt Teschner, published by Artech House), perform a variety of functions, the most common of which is voice mail (also known as voice messaging), whereby callers who cannot reach their intended addressee can instead record a message for subsequent retrieval.
It is occasionally desirable to skip through a stored voice mail message, either forwards to the more important issues raised therein or backwards to listen to points again. The DirectTalkMail system available from International Business Machines Corporation allows one to skip through a message, backwards or forwards, using keys seven and nine respectively, eight seconds at a time (see DirectTalkMail Guide SC33-1221-XX, available from International Business Machines Corporation). However, such skipping does not allow one to listen to the message concurrently; to achieve that, the system must provide for variable speed of output of the stored voice data. The speeding up and slowing down of the rate of output of stored voice data is provided in the Aspen voice mail system available from Octel Communications Corporation, incorporated in Delaware, USA.
One of the problems associated with speeding up or slowing down the output of a voice message is avoiding a significant variation in pitch, which substantially reduces the comprehensibility of the voice message. It is possible to obviate this variation in pitch using digital signal processing techniques. One example is provided in the product ETSM available from Entropic Speech, Inc, incorporated in California, USA. However, the digital signal processing techniques utilised are very processor intensive and present a significant drain on processor capacity, thereby making it difficult to perform the necessary processing in a real-time telephony environment.
Accordingly, the present invention provides a method for varying the speed of playback of digitised audio data derived from a sequence of encoded audio data units, comprising the steps of storing a set of digitised audio data units, processing said digitised audio data units by omitting or repeating selected digitised audio data units in accordance with a desired variation in speed, and outputting said processed digitised audio data units.
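The omit-or-repeat step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resample_units`, the use of a speed ratio (2.0 meaning twice as fast), and the rule of taking the unit at position floor(k × speed) for output slot k are all hypothetical choices; the text does not say how units are selected.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_units(units: Sequence[T], speed: float) -> List[T]:
    """Omit units (speed > 1) or repeat units (speed < 1).

    `speed` is a ratio: 2.0 plays twice as fast, 0.5 at half speed.
    The selection rule below is an assumption made for illustration,
    not a rule taken from the source text.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    selected: List[T] = []
    k = 0
    while True:
        index = int(k * speed)  # source unit used for output slot k
        if index >= len(units):
            break
        selected.append(units[index])
        k += 1
    return selected


units = list(range(8))                # stand-ins for stored audio data units
faster = resample_units(units, 2.0)   # every second unit is omitted
slower = resample_units(units, 0.5)   # every unit is output twice
```

With eight units, `faster` holds four units and `slower` holds sixteen. The units themselves are passed through untouched, which is the property the method relies on.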
The present invention allows the speed of output of a voice message to be varied whilst preserving its pitch. Because the pitch remains substantially unchanged, the comprehensibility of the voice message at higher or lower speeds of output is much improved. Further, the present invention affords a very simple and computationally inexpensive way of varying the speed of output of voice messages whilst maintaining pitch. As the processing involves only repeating or omitting digitised audio data units, without further processing, the processor overhead is significantly reduced.
An embodiment provides a method wherein said digitised audio data units are encoded using a history based encoding technique. History based techniques, such as those which utilise differences between successive segments of audio data, are particularly effective for use in the present invention. Because history based techniques contain information related to or derived from previous audio data units, they enable good quality audio data to be generated notwithstanding that previous audio data units have been omitted or repeated.
An embodiment provides a method wherein said encoded audio data blocks represent Linear Predictive Coding (LPC) coefficients. The use of LPC coefficients to represent digitised voice has the dual benefit of, first, allowing very good quality speech to be derived therefrom and, secondly, being very efficient in terms of storage and processing overhead. It is important for voice mail systems to be able to store data in compressed form in order to efficiently utilise storage capacity. Further, the ability to repeat or omit the use of LPC blocks reduces processor overhead as the omission or repetition is performed before decompression or decoding of the LPC blocks. Thus the amount of data which is processed as compared with unencoded data is substantially reduced thereby reducing processor loading.
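The ordering described above, where blocks are selected while still encoded and only the selected blocks are decoded, might look like the following sketch. `CountingDecoder` is a hypothetical, stateless stand-in that merely counts calls and returns placeholder samples; it is not an LPC decoder, and the selection rule is the same illustrative assumption of taking block floor(k × speed) for output slot k.

```python
from typing import Callable, List, Sequence


class CountingDecoder:
    """Hypothetical stand-in for an LPC decoder; it only counts calls."""

    def __init__(self, samples_per_block: int) -> None:
        self.samples_per_block = samples_per_block
        self.calls = 0

    def __call__(self, block: bytes) -> List[int]:
        self.calls += 1
        # A real decoder would synthesise speech; this returns placeholders.
        return [block[0]] * self.samples_per_block


def play(
    blocks: Sequence[bytes],
    speed: float,
    decode: Callable[[bytes], List[int]],
) -> List[int]:
    """Select encoded blocks first, then decode only the selected ones."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    samples: List[int] = []
    k = 0
    while int(k * speed) < len(blocks):
        samples.extend(decode(blocks[int(k * speed)]))
        k += 1
    return samples


blocks = [bytes([i]) for i in range(10)]   # ten placeholder encoded blocks
decoder = CountingDecoder(samples_per_block=160)
samples = play(blocks, 2.0, decoder)       # double speed
```

At double speed the decoder in this sketch is called for five of the ten blocks, so the omitted blocks are never decoded at all.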
Preferably the percentage variation in the speed of playback is between 50% and 200%. A practical implementation of the present invention indicates that the comprehensibility of the audio signal derived from the digitised audio data units starts to degrade when the speed of playback is outside this range.
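As a small worked example of the preferred range, the sketch below converts a percentage into a speed ratio and computes the resulting playback time. The names `speed_ratio` and `playback_seconds` are hypothetical, and rejecting values outside the range is a design assumption: the text states only a preference, not a hard limit.

```python
MIN_SPEED_PERCENT = 50.0    # lower end of the preferred range
MAX_SPEED_PERCENT = 200.0   # upper end of the preferred range


def speed_ratio(percent: float) -> float:
    """Convert a playback speed in percent to a ratio (100% -> 1.0)."""
    if not MIN_SPEED_PERCENT <= percent <= MAX_SPEED_PERCENT:
        # Assumption: treat the preferred range as a hard limit.
        raise ValueError("speed outside the preferred 50% to 200% range")
    return percent / 100.0


def playback_seconds(message_seconds: float, percent: float) -> float:
    """Time needed to play a message at the given speed."""
    return message_seconds / speed_ratio(percent)
```

Under these assumptions a 60 second message takes 30 seconds at 200% and 120 seconds at 50%.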
It is preferred that the digitised audio data units represent between 5 msec and 50 msec of audio data. Using speech in blocks of between 5 msec and 50 msec enables a compromise to be reached between granularity and speed of searching. A practical implementation has found that 20 msec represents a good compromise. If the time period of audio data represented by the LPC coefficients is too small, the processor may become unduly loaded as a consequence of handling a large number of small blocks. In addition, it is believed that a lower limit on the duration of the speech may arise from the LPC coefficients. This lower limit is determined by the dynamics of the human ear; that is, an LPC block may have to allow slightly more than one complete cycle of the lowest frequency present to be derived from it in order for that cycle to be discernible by the human ear. However, if the time period represented by the LPC coefficients is too large, discernible repetition or stutter will be audible in the resultant audio signal.
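The trade-off between block sizes can be made concrete with some arithmetic. The sketch below assumes an 8 kHz sampling rate, which is common in telephony but is not stated in the text; `block_figures` and its field names are hypothetical. The last field is the frequency whose single cycle exactly fills the block, so by the reasoning above the lowest frequency present would have to lie slightly above it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockFigures:
    samples_per_block: int         # decoded samples represented by one block
    blocks_per_second: float       # blocks the processor handles per second
    one_cycle_frequency_hz: float  # frequency whose cycle exactly fills a block


def block_figures(duration_ms: int, sample_rate_hz: int = 8000) -> BlockFigures:
    """Derive per-block figures; the 8000 Hz default is an assumption."""
    if duration_ms <= 0 or sample_rate_hz <= 0:
        raise ValueError("duration and sample rate must be positive")
    return BlockFigures(
        samples_per_block=sample_rate_hz * duration_ms // 1000,
        blocks_per_second=1000 / duration_ms,
        one_cycle_frequency_hz=1000 / duration_ms,
    )


shortest = block_figures(5)    # 40 samples, 200 blocks per second, 200 Hz
preferred = block_figures(20)  # 160 samples, 50 blocks per second, 50 Hz
longest = block_figures(50)    # 400 samples, 20 blocks per second, 20 Hz
```

Under these assumptions a 5 msec block means four times as many blocks to handle per second as a 20 msec block, while a 50 msec block means each repeated or omitted unit covers 400 samples of audio.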
The present invention also provides a voice mail system comprising means for storing voice messages comprising a set of digitised audio data units, means for playing back the stored message including means for varying the speed of playback, means for processing said digitised audio data units by omitting or repeating selected digitised audio data units in accordance with a desired variation in speed, and means for outputting said processed digitised audio data units.


REFERENCES:
patent: 4435832 (1984-03-01), Asada et al.
patent: 4864620 (1989-09-01), Bialick
patent: 5175769 (1992-12-01), Hejna, Jr. et al.
Aspen Quick Reference Guide, Octel Communications Corporation, 2 pages.
ETSM: Entropic Time-Scale Modification Software promotional leaflet, 4 pages.
“Speech Coding and Speech Recognition Technologies: A Review”, IEEE International Symposium on Circuits and Systems, 1991, pp. 575-577, vol. 1.
