Reexamination Certificate
2000-03-03
2004-08-24
Knepper, David D. (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S233000, C704S223000
active
06782361
ABSTRACT:
TECHNICAL FIELD
This invention relates to a method and apparatus for providing background acoustic noise during a discontinued/reduced rate transmission mode of a voice transmission system. The invention is especially applicable to digital voice communications and more particularly to wireless voice communications systems, and bit-rate sensitive applications including digital simultaneous voice and data (DSVD) systems, voice over internet-protocol (VOIP) and digital speech interpolation (DSI) systems.
BACKGROUND ART
In wireless voice communication systems, it is desirable to reduce the level of transmitted power so as to reduce co-channel interference and to prolong battery life of portable units. In cellular systems, interference reduction enhances spectral efficiency and increases system capacity. One way to reduce the power level of transmitted information is to reduce the overall transmission bit rate. A typical telephone conversation comprises approximately 40 per cent active speech and about 60 per cent silence and non-speech sounds, including acoustic background noise. Consequently, it is known to discontinue transmission during periods when there is no speech.
Other wireless systems require a continuous mode of transmission for system synchronization and channel monitoring. It is inefficient to use the full speech-coding rate mode for the background acoustic noise because it contains less information than the speech. When speech is absent, a lower rate coding mode is used to encode the background noise. In Code Division Multiple Access (CDMA) wireless communication systems, variable bit rate (VBR) coding is used to reduce the average bit rate and to increase system capacity. The very low bit rate used during speech gaps, however, is insufficient to avoid perceptible discontinuities between the background noise accompanying speech and the noise reproduced during speech gaps.
A disadvantage of simply discontinuing transmission, as done by early systems, is that the background noise stops along with the speech, and the resulting received signal sounds unnatural to the recipient.
This problem of discontinuities has been addressed by generating synthetic noise, known as “comfort noise”, at the receiver and substituting it for the received signal during the quiet periods. One such silence compression scheme using a combination of voice activity detection, discontinuous transmission, and synthetic noise insertion has been used by Global System for Mobile Communications (GSM) wireless voice communication systems. The GSM scheme employs a transmitter, which includes a voice activity detector (VAD) that discriminates between voice and non-voice signals, and a receiver, which includes a synthetic noise generator. When the user is speaking, the transmitter uses the full coding rate to encode the signal. During quiet periods, i.e. when no speech is detected, the transmitter is idle except for periodically updating background noise information characterizing the “real” background noise. When the receiver detects such quiet periods, it causes the synthetic noise generator to generate synthetic noise, i.e. comfort noise, and insert it into the received signal. During the quiet periods, the transmitter transmits to the receiver updated information about the background noise using what are known as Silence Insertion Descriptor (SID) frames, and the receiver uses the parameters to update its synthetic noise generator.
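The transmitter behaviour described above can be sketched as a per-frame scheduling decision. This is a minimal illustration only: the function name `schedule_frames`, the frame labels, and the `sid_interval` parameter are hypothetical, and the update interval used here is not the value specified for GSM.

```python
from typing import List

def schedule_frames(vad_flags: List[bool], sid_interval: int = 8) -> List[str]:
    """Label each frame as SPEECH, SID, or NO_TX.

    vad_flags[i] is True when a (hypothetical) VAD detects speech in frame i.
    sid_interval is an assumed number of frames between SID updates.
    """
    labels = []
    silent_run = 0  # quiet frames seen since the current quiet period began
    for is_speech in vad_flags:
        if is_speech:
            labels.append("SPEECH")  # full-rate coding while the user talks
            silent_run = 0
        else:
            # The first quiet frame and every sid_interval-th one after it
            # carry a SID update; the transmitter is otherwise idle.
            labels.append("SID" if silent_run % sid_interval == 0 else "NO_TX")
            silent_run += 1
    return labels

flags = [True, True, False, False, False, False, True]
labels = schedule_frames(flags, sid_interval=3)
print(labels)
```

At the receiver, frames labelled SID would update the synthetic noise generator, and frames labelled NO_TX would be filled with comfort noise generated from the most recent parameters.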
It is known to generate the synthetic noise by passing a spectrally-flat noise signal (white noise) through a synthesis filter in the receiver, the noise parameters transmitted in the SID frames then being coefficients for the synthesis filter. It has been found, however, that the human auditory system is capable of detecting relatively subtle differences, and a typical recipient can perceive, and be distracted by, differences between the real background noise and the synthetic noise. This problem was discussed in European patent application number EP 843,301 by K. Jarvinen et al., who recognized that a user can still perceive differences where the spectral content of the real background noise differs from that of the synthetic noise. In order to reduce the spectral quality differences, Jarvinen et al. disclosed passing the random noise excitation signal through a spectral control filter before applying it to the synthesis filter. While such spectral modification of the excitation signal might yield some improvement over conventional systems, it is not entirely satisfactory. Mobile telephones, in particular, may be used in a wide variety of locations and the typical user can still perceive the concomitant differences between the background noise accompanying speech and the synthetic noise inserted during non-speech intervals.
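The conventional approach of passing white noise through a synthesis filter can be illustrated as follows. This is a sketch under stated assumptions, not the implementation of any particular codec: it assumes the SID parameters have already been decoded into linear prediction coefficients for a filter 1/A(z) with A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, and the function name, the `gain` parameter, and the example coefficient are hypothetical.

```python
import random
from typing import List, Sequence

def synthesize_comfort_noise(lp_coeffs: Sequence[float], gain: float,
                             num_samples: int, seed: int = 0) -> List[float]:
    """Filter white Gaussian noise through the all-pole filter 1/A(z).

    Assumed convention: lp_coeffs = [a[1], ..., a[p]] with
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    """
    rng = random.Random(seed)
    order = len(lp_coeffs)
    history = [0.0] * order  # previous outputs, most recent first
    out = []
    for _ in range(num_samples):
        excitation = gain * rng.gauss(0.0, 1.0)  # spectrally flat excitation
        # Difference equation: y[n] = e[n] - sum_k a[k] * y[n-k]
        y = excitation - sum(a * h for a, h in zip(lp_coeffs, history))
        out.append(y)
        if order:
            history = [y] + history[:-1]
    return out

# One 160-sample frame shaped by a hypothetical first-order filter.
noise = synthesize_comfort_noise([-0.9], gain=0.1, num_samples=160)
print(len(noise))
```

Because the excitation here is always white noise, only the filter coefficients carry information about the real background noise, which is the limitation the passage goes on to discuss.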
DISCLOSURE OF INVENTION
An object of the present invention is to provide a background noise coding method and apparatus capable of providing synthetic noise (“comfort” noise) which sounds more like the actual background noise.
To this end, in communications systems embodying the present invention, the background noise is classified into one or more of a plurality of noise classes and the receiver selects one or more of a corresponding plurality of different excitation signals for use in generating the synthetic noise.
According to one aspect of the present invention, in a digital communications system comprising a transmitter and a receiver, the transmitter interrupting or reducing transmission of a voice signal during intervals absent speech and the receiver inserting synthetic noise into the received voice signals during said intervals, there is provided a method comprising the steps of assigning acoustic background noise in the voice signal to one or more of a plurality of noise classes, selecting a corresponding one of a plurality of excitation vectors each corresponding to at least one of the classes, using at least part of the selected excitation vector to synthesize the synthetic noise, and outputting the synthetic noise during a said interval.
According to a second aspect of the present invention, there is provided a digital communications system comprising a transmitter and a receiver, the transmitter having means for interrupting or reducing transmission of a voice signal during intervals absent speech and the receiver having means for inserting synthetic noise into the received voice signals during said intervals, there being provided means for assigning acoustic background noise in the voice signal to one or more of a plurality of noise classes, selecting a corresponding one of a plurality of excitation vectors each corresponding to at least one of the classes, using at least part of the selected excitation vector to synthesize the synthetic noise, and outputting the synthetic noise during a said interval.
In embodiments of either aspect, the transmitter may perform the classification of the background noise and transmit to the receiver a corresponding noise index and the receiver may select the corresponding excitation vector(s) in dependence upon the noise index. The receiver may select from a plurality of previously-stored vectors, or use a generator to generate an excitation vector with the appropriate parameters.
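The receiver-side selection from previously stored vectors might look like the following sketch. The codebook contents, the number of classes, and the name `select_excitation` are placeholders chosen for illustration; the wrap-around read is one possible way of using "at least part of" a stored vector and is an assumption.

```python
from typing import Dict, List

# Hypothetical stored excitation vectors, one per noise class index.
# Real vectors would be much longer; these values are placeholders.
EXCITATION_CODEBOOK: Dict[int, List[float]] = {
    0: [0.2, -0.1, 0.05, -0.3],
    1: [0.6, -0.6, 0.6, -0.6],
    2: [0.1, 0.1, -0.1, -0.1],
}

def select_excitation(noise_index: int, frame_length: int,
                      offset: int = 0) -> List[float]:
    """Return frame_length samples of the stored vector for noise_index.

    Reading starts at offset and wraps around the end of the vector.
    """
    vector = EXCITATION_CODEBOOK[noise_index]
    return [vector[(offset + i) % len(vector)] for i in range(frame_length)]

# The noise index would arrive from the transmitter alongside the SID data.
excitation = select_excitation(noise_index=1, frame_length=6)
print(excitation)
```

The selected samples would then take the place of the white-noise excitation at the input of the synthesis filter.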
The predefined noise classes may be defined by temporal and spectral features based upon a priori knowledge of expected input signals. Such features may include zero crossing rate, root-mean-square energy, critical band energies, and correlation coefficients. Preferably, however, noise classification uses line spectral frequencies (LSFs) of the signal, with a Gaussian fit to each LSF histogram.
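One way to read the preferred classifier is: for each noise class, fit a Gaussian (mean and variance) to the histogram of each LSF, then assign a frame to the class under which its LSFs are most likely. The sketch below takes that reading and additionally assumes the LSFs are treated as independent. The function names, the two-dimensional toy data, and the independence assumption are illustrative and are not taken from the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = List[Tuple[float, float]]  # (mean, variance) for each LSF

def fit_class(lsf_frames: Sequence[Sequence[float]]) -> Model:
    """Fit one Gaussian per LSF from training frames of a single class."""
    n = len(lsf_frames)
    model = []
    for k in range(len(lsf_frames[0])):
        values = [frame[k] for frame in lsf_frames]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        model.append((mean, max(var, 1e-9)))  # floor avoids zero variance
    return model

def log_likelihood(lsf: Sequence[float], model: Model) -> float:
    """Sum of per-LSF Gaussian log densities (independence assumed)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
               for x, (mean, var) in zip(lsf, model))

def classify(lsf: Sequence[float], models: Dict[int, Model]) -> int:
    """Return the index of the class with the highest log-likelihood."""
    return max(models, key=lambda c: log_likelihood(lsf, models[c]))

# Toy training data: two classes, two LSFs per frame (values in radians).
models = {
    0: fit_class([[0.30, 1.00], [0.34, 1.10], [0.32, 1.05]]),
    1: fit_class([[0.60, 1.60], [0.64, 1.70], [0.62, 1.65]]),
}
print(classify([0.33, 1.04], models), classify([0.61, 1.66], models))
```

The returned class index corresponds to the noise index that the transmitter would send to the receiver.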
Preferably, the noise classification is done on a frame-by-frame basis using relatively short segments of the input voice signal, conveniently about 20 milliseconds.
In preferred embodiments of either aspect of the invention, linear prediction (LP) analysis of the input signal is performed every 20 milliseconds using an autocorrelation method and windows each of length 240 samples overlapping by 80 samples. The LP coefficients then are calculated using the Levinson-Durbin alg
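
The framing figures quoted above are mutually consistent if the sampling rate is 8 kHz (an assumption; the passage does not state it): 20 milliseconds is then 160 samples, so 240-sample windows advanced by 160 samples overlap by 80. The sketch below shows that framing together with a textbook autocorrelation method and Levinson-Durbin recursion. The function names and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are assumptions, and no tapering window is applied because the passage does not specify one.

```python
from typing import Iterator, List, Sequence, Tuple

FRAME_SHIFT = 160  # 20 ms at an assumed 8 kHz sampling rate
WINDOW_LEN = 240   # consecutive windows overlap by 240 - 160 = 80 samples

def frame_windows(signal: Sequence[float]) -> Iterator[List[float]]:
    """Yield 240-sample analysis windows, one every 160 samples."""
    for start in range(0, len(signal) - WINDOW_LEN + 1, FRAME_SHIFT):
        yield list(signal[start:start + WINDOW_LEN])

def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one analysis window."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Return ([a[1], ..., a[p]], prediction error energy) from r[0..p]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate window: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Autocorrelation of an ideal first-order process with pole at 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

In use, each window from `frame_windows` would be passed to `autocorrelation` with `max_lag` equal to the LP order, and the result to `levinson_durbin`.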
El-Maleh Khaled Helmi
Kabal Peter
Adams Thomas
Knepper David D.
McGill University