Method for coding speech and music signals

Data processing: speech signal processing – linguistics – language – speech signal processing – for storage or transmission

Reexamination Certificate


Details

U.S. classifications: C704S219000, C704S230000, C375S225000, C375S244000

Status: active

Patent number: 06658383

ABSTRACT:

FIELD OF THE INVENTION
This invention is directed in general to a method and an apparatus for coding signals, and more particularly, for coding both speech signals and music signals.
BACKGROUND OF THE INVENTION
Speech and music are intrinsically represented by very different signals. With respect to typical spectral features, the spectrum for voiced speech generally has a fine periodic structure associated with pitch harmonics, with the harmonic peaks forming a smooth spectral envelope, while the spectrum for music is typically much more complex, exhibiting multiple pitch fundamentals and harmonics. The spectral envelope may be much more complex as well. Coding technologies for the two signal modes are equally disparate: speech coding is dominated by model-based approaches such as Code Excited Linear Prediction (CELP) and sinusoidal coding, while music coding is dominated by transform coding techniques such as the Modulated Lapped Transform (MLT) used together with perceptual noise masking.
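The harmonic structure described above can be made concrete with a small numerical sketch. This is purely illustrative and not part of the patent; the window length, pitch bin, and harmonic amplitudes are made-up values. A voiced-speech-like signal built from a few harmonics of one fundamental concentrates its DFT energy at multiples of the pitch bin:

```python
import math

N = 256        # analysis window length (hypothetical)
F0_BIN = 8     # pitch fundamental, expressed in DFT bins (hypothetical)

# Three harmonics of the fundamental with a 1/h decaying envelope,
# mimicking the fine periodic structure of voiced speech.
x = [sum(math.cos(2 * math.pi * h * F0_BIN * n / N) / h for h in (1, 2, 3))
     for n in range(N)]

def dft_mag(signal, k):
    """Magnitude of the k-th DFT bin (naive, O(N) per bin)."""
    re = sum(signal[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    im = sum(signal[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
    return math.hypot(re, im)

# Energy peaks at the harmonic bins (8, 16, 24) and is near zero elsewhere;
# a typical music spectrum would instead show multiple fundamentals.
```

A music-like mixture of several unrelated fundamentals would spread such peaks across many non-harmonically-related bins, which is what makes a single coding model hard to share between the two signal classes.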
Demand for coding both speech and music signals has recently grown in applications such as Internet multimedia, TV/radio broadcasting, teleconferencing, and wireless media. However, producing a universal codec that reproduces both speech and music efficiently and effectively is not easily accomplished, since coders for the two signal types are optimally based on separate techniques. For example, linear prediction-based techniques such as CELP deliver high-quality reproduction of speech signals but yield unacceptable quality for music signals. Conversely, transform coding-based techniques provide good-quality reproduction of music signals, but their output degrades significantly for speech signals, especially at low bit rates.
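To make the linear-prediction side of this trade-off concrete, the following sketch runs an excitation signal through an all-pole LP synthesis filter, the core operation of CELP-style decoders: s[n] = e[n] + sum_k a_k * s[n-k]. The coefficients and input below are hypothetical, chosen only to show the recursion:

```python
def lp_synthesis(excitation, a):
    """All-pole synthesis filter 1/A(z): s[n] = e[n] + sum_k a[k-1]*s[n-k].

    a holds the LP coefficients [a1, ..., ap]; the values used below
    are made up for illustration, not taken from the patent.
    """
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s += ak * out[n - k]
        out.append(s)
    return out

# Impulse response of a hypothetical 2nd-order filter:
coeffs = [0.9, -0.2]
impulse = [1.0] + [0.0] * 7
response = lp_synthesis(impulse, coeffs)   # 1.0, 0.9, 0.61, 0.369, ...
```

In a real CELP coder the excitation is chosen by an analysis-by-synthesis codebook search rather than being an impulse, but the synthesis recursion is the same.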
An alternative is to design a multi-mode coder that can accommodate both speech and music signals. Early attempts at such coders include, for example, the Hybrid ACELP/Transform Coding Excitation coder and the Multi-mode Transform Predictive Coder (MTPC). Unfortunately, these coding algorithms are too complex and/or inefficient for practical coding of speech and music signals.
It is desirable to provide a simple and efficient hybrid coding algorithm and architecture for coding both speech and music signals, especially adapted for use in low bit-rate environments.
SUMMARY OF THE INVENTION
The invention provides a transform coding method for efficiently coding music signals. The method is suitable for use in a hybrid codec in which a common Linear Predictive (LP) synthesis filter is employed to reproduce both speech and music signals. The input to the LP synthesis filter is switched between a speech excitation generator and a transform excitation generator, according to whether a speech signal or a music signal is being coded. In a preferred embodiment, the LP synthesis filter operates on interpolated LP coefficients. For speech signals, a conventional CELP or other LP technique may be used, while for music signals an asymmetrical overlap-add transform technique is preferably applied. A potential advantage of the invention is that it enables a smooth output transition at points where the codec switches between speech coding and music coding.
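The switching architecture described in the summary can be sketched as follows. The two excitation generators below are trivial placeholders (the patent's actual CELP search and asymmetrical overlap-add transform are not reproduced here); only the structure, a single switch feeding one common LP synthesis filter, follows the text:

```python
def speech_excitation(frame):
    # Placeholder for a CELP-style codebook excitation (hypothetical output).
    return [0.5] * len(frame)

def transform_excitation(frame):
    # Placeholder for a transform (e.g. overlap-add) excitation (hypothetical).
    return [0.25] * len(frame)

def lp_synthesis(excitation, a):
    # Common all-pole synthesis filter shared by both modes.
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s += ak * out[n - k]
        out.append(s)
    return out

def decode_frame(frame, mode, lp_coeffs):
    # The only mode-dependent step is the choice of excitation generator;
    # the synthesis filter itself is common, as the summary describes.
    generate = speech_excitation if mode == "speech" else transform_excitation
    return lp_synthesis(generate(frame), lp_coeffs)
```

Because both paths terminate in the same filter, interpolating the LP coefficients across frames (as the preferred embodiment does) can smooth the output at a point where the codec switches between speech and music modes.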
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying figures.


REFERENCES:
patent: 5394473 (1995-02-01), Davidson
patent: 5717823 (1998-02-01), Kleijn
patent: 5734789 (1998-03-01), Swaminathan et al.
patent: 5751903 (1998-05-01), Swaminathan et al.
patent: 5778335 (1998-07-01), Ubale et al.
patent: 6108626 (2000-08-01), Cellario et al.
patent: 6134518 (2000-10-01), Cohen et al.
patent: 6240387 (2001-05-01), De Jaco
patent: 6310915 (2001-10-01), Wells et al.
patent: 6311154 (2001-10-01), Gersho et al.
patent: 6351730 (2002-02-01), Chen
patent: 2001/0023395 (2001-09-01), Su et al.
patent: WO 9827543 (1998-06-01), None
Lefebvre, et al., “High quality coding of wideband audio signals using transform coded excitation (TCX),” Apr. 1994, 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. I/193-I/196.*
Salami, et al., “A wideband codec at 16/24 kbit/s with 10 ms frames,” Sep. 1997, 1997 Workshop on Speech Coding for Telecommunications, pp. 103-104.*
ITU-T, G.722.1 (09/99), Series G: Transmission Systems and Media, Digital Systems and Networks, Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss.*
Saunders, J., “Real Time Discrimination of Broadcast Speech/Music,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 993-996 (May 1996).
Scheirer, E., et al., “Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator,” In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1331-1334 (Apr. 1997).
Combescure, P., et al., “A 16, 24, 32 kbit/s Wideband Speech Codec Based on ATCELP,” In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 5-8 (Mar. 1999).
Ellis, D., et al., “Speech/Music Discrimination Based on Posterior Probability Features,” In Proceedings of Eurospeech, 4 pages, Budapest (1999).
El Maleh, K., et al., “Speech/Music Discrimination for Multimedia Applications,” In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2445-2448 (Jun. 2000).
Houtgast, T., et al., “The Modulation Transfer Function in Room Acoustics as a Predictor of Speech Intelligibility,” Acustica, vol. 23, pp. 66-73 (1973).
Tzanetakis, G., et al., “Multifeature Audio Segmentation for Browsing and Annotation,” Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, pp. 103-106 (Oct. 1999).
J. Schnitzler, J. Eggers, C. Erdmann and P. Vary, “Wideband Speech Coding Using Forward/Backward Adaptive Prediction with Mixed Time/Frequency Domain Excitation,” in Proc. IEEE Workshop on Speech Coding, pp. 3-5, 1999.
B. Bessette, R. Salami, C. Laflamme and R. Lefebvre, “A Wideband Speech and Audio Codec at 16/24/32 kbit/s using Hybrid ACELP/TCX Techniques,” in Proc. IEEE Workshop on Speech Coding, pp. 7-9, 1999.
S.A. Ramprashad, “A Multimode Transform Predictive Coder (MTPC) for Speech and Audio,” in Proc. IEEE Workshop on Speech Coding, pp. 10-12, 1999.
L. Tancerel, R. Vesa, V.T. Ruoppila and R. Lefebvre, “Combined Speech and Audio Coding by Discrimination,” in Proc. IEEE Workshop on Speech Coding, pp. 154-156, 2000.
J-H. Chen and D. Wang, “Transform Predictive Coding of Wideband Speech Signals,” in Proc. International Conference on Acoustics, Speech, and Signal Processing, pp. 275-278, 1996.
A. Ubale and A. Gersho, “Multi-Band CELP Wideband Speech Coder,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Munich, pp. 1367-1370.
