Coded data generation or conversion – Digital code to digital code converters
Reexamination Certificate
1999-05-12
2001-08-07
Young, Brian (Department: 2819)
active
06271771
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to audio coding and decoding, respectively, and in particular to a method of and a device for performing a hearing-adapted quality assessment of audio signals.
BACKGROUND OF THE INVENTION AND PRIOR ART
As hearing-adapted digital coding methods have been standardized for some years (Kh. Brandenburg and G. Stoll, The ISO/MPEG-Audio codec: A generic standard for coding of high quality digital audio, 92nd AES-Convention, Vienna, 1992, Preprint 3336), they are being employed increasingly. Examples hereof are the digital compact cassette (DCC), the minidisk, digital terrestrial broadcasting (DAB; DAB=Digital Audio Broadcasting) and the digital video disk (DVD). The disturbances known from analog transmissions as a rule are no longer present in digital uncoded audio signal transmission. If no coding of the audio signals is carried out, measurement technology can be confined to the transition from analog to digital and vice versa.
In case of coding by means of hearing-adapted coding methods, however, audible artificial products or artifacts may occur that have not occurred in analog audio signal processing.
Known measurement values, such as the harmonic distortion factor or the signal-to-noise ratio, cannot be employed for hearing-adapted coding methods. Many hearing-adapted coded music signals have a signal-to-noise ratio of below 15 dB, without audible differences to the uncoded original signal being perceivable. Conversely, a signal-to-noise ratio of more than 40 dB may already lead to clearly audible disturbances.
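For reference, the conventional signal-to-noise ratio mentioned above can be sketched as follows. This is only a minimal illustration of the non-perceptual measure; the function name `snr_db` and the sample values are assumptions made for the example and do not come from the patent text.

```python
import math


def snr_db(original, processed):
    """Signal-to-noise ratio in dB, treating (original - processed) as noise.

    Hypothetical helper for illustration only. It weights all errors equally
    and therefore ignores masking, which is why it correlates poorly with
    audibility for hearing-adapted coders.
    """
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, processed))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no measurable noise
    return 10.0 * math.log10(signal_energy / noise_energy)


# Example: a processed signal attenuated by 10 percent relative to the original.
original = [1.0, -1.0, 1.0, -1.0]
processed = [0.9 * x for x in original]
print(snr_db(original, processed))
```

The value says nothing about whether the error is audible, which is the limitation the paragraph above describes.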
In recent years, various hearing-adapted measuring methods were introduced, of which the NMR method (NMR=Noise to Mask Ratio) is to be mentioned (Kh. Brandenburg and Th. Sporer. “NMR” and “Masking Flag”: Evaluation of quality using perceptual criteria. In Proceedings of the 11th International Conference of the AES, Portland, 1992).
In an implementation of the NMR method, a discrete Fourier transform of length 1024, using a Hann window with an advance of 512 sampling values, is calculated for an original signal and for a differential signal formed between the original signal and a processed signal. The spectral coefficients obtained therefrom are combined in frequency bands the width of which corresponds approximately to the frequency groups suggested by Zwicker in E. Zwicker, Psychoacoustics, publisher Springer-Verlag, Berlin Heidelberg New York, 1982, whereupon the energy density of each frequency band is determined. From the energy densities of the original signal, an actual masking or covering threshold is determined for each frequency band in consideration of the masking within the respective frequency group, the masking between the frequency groups and the post-masking, with said masking threshold being compared with the energy density of the differential signal. The resting threshold of the human ear is not fully considered since the input signals of the measuring method cannot be identified with fixed listening loudnesses, as a listener of audio signals usually has control over the loudness of the piece of music or audio piece he wants to listen to.
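The analysis front end described above (Hann window, DFT of length 1024, advance of 512 sampling values, combination of spectral coefficients into bands) can be sketched as follows. This is a minimal sketch and not the NMR implementation itself: the function names and the band edges are hypothetical, the real band layout follows Zwicker's frequency groups, and the masking threshold computation is not modelled.

```python
import cmath
import math

FRAME_LEN = 1024  # DFT length stated in the text
HOP = 512         # window advance stated in the text


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frame_spectra(signal):
    """Power spectra (bins 0..FRAME_LEN/2) of overlapping Hann-windowed frames."""
    w = hann(FRAME_LEN)
    spectra = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = [signal[start + i] * w[i] for i in range(FRAME_LEN)]
        bins = fft(frame)[: FRAME_LEN // 2 + 1]
        spectra.append([abs(c) ** 2 for c in bins])
    return spectra


def band_energies(power, edges):
    """Sum bin powers into bands; `edges` are bin indices (hypothetical layout)."""
    return [sum(power[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


# Example: original signal, processed signal, and their differential signal.
original = [math.sin(2.0 * math.pi * 64 * i / FRAME_LEN) for i in range(2048)]
processed = [0.99 * x for x in original]
difference = [x - y for x, y in zip(original, processed)]

edges = [0, 32, 128, 513]  # illustrative band edges only, not Zwicker's groups
for p_orig, p_diff in zip(frame_spectra(original), frame_spectra(difference)):
    print(band_energies(p_orig, edges), band_energies(p_diff, edges))
```

In the method described above, the band energies of the original signal would next be turned into a masking threshold per band and compared with the band energies of the differential signal; that step is deliberately left out here.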
It has turned out that the NMR method, for example, in case of a typical sampling rate of 44.1 kHz, has a frequency resolution of about 43 Hz and a time resolution of about 23 ms. The frequency resolution is too low in case of low frequencies, whereas the time resolution is too low in case of high frequencies. Nevertheless, the NMR method displays a good reaction to many temporal effects. When a sequence of beats, such as drum beats, is sufficiently slow, the block prior to the beat still has very low energy, so that a possibly occurring pre-echo can be recognized exactly. The advance of 11.6 ms for the analysis window permits the recognition of many pre-echoes. However, when the analysis window has an unfavorable position, a pre-echo may remain unrecognized.
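The resolution figures quoted above follow directly from the sampling rate, the DFT length and the window advance. A short sketch of the arithmetic, with variable names chosen for this example only:

```python
SAMPLE_RATE_HZ = 44_100  # typical sampling rate named in the text
FRAME_LEN = 1024         # DFT length
HOP = 512                # advance of the analysis window in sampling values

# Spacing between adjacent DFT bins.
freq_resolution_hz = SAMPLE_RATE_HZ / FRAME_LEN

# Duration covered by one analysis block.
time_resolution_ms = 1000.0 * FRAME_LEN / SAMPLE_RATE_HZ

# Time between the starts of consecutive analysis blocks.
hop_ms = 1000.0 * HOP / SAMPLE_RATE_HZ

print(freq_resolution_hz, time_resolution_ms, hop_ms)
```

The three values come out at roughly 43 Hz, 23 ms and 11.6 ms, matching the figures in the paragraph above.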
The difference between masking by tonal signals and by noise is not taken into consideration in the NMR method. The masking curves employed are empirical values obtained from subjective hearing tests. To this end, the frequency groups are located at fixed positions within the frequency spectrum, whereas the ear forms the frequency groups dynamically around particularly prominent sound events in the spectrum. Thus, more correct would be a dynamic arrangement about the centers of the energy densities. Due to the width of the fixed frequency groups, it is not possible to distinguish, for example, whether a sinusoidal signal is located in the center or at an edge of a frequency group. The masking curve thus is based on the most critical case, i.e. the lowest masking effect. The NMR method therefore sometimes indicates disturbances that cannot be heard by a human being.
The already mentioned low frequency resolution of only 43 Hz constitutes a limit to a hearing-adapted quality assessment of audio signals by means of the NMR method in particular in the lower frequency range. This has a particularly disadvantageous effect in the assessment of low-pitched voice signals, as produced for example by a male speaker, or sounds of very low-pitched instruments, such as e.g. a bass trombone.
To provide a better understanding of the present invention, some important psychoacoustic and cognitive fundamentals for the hearing-adapted quality assessment of audio signals will be indicated in the following. The most important term in the field of hearing-adapted coding and measuring technology is the term “Verdeckung” (= masking), which by analogy with the English term “masking” is often also referred to as “Maskierung”. A discretely occurring, perceivable sound event of low loudness is masked by a louder sound event, i.e. it is no longer perceived in the presence of the second, louder sound event. The masking effect is dependent both upon the time structure and upon the spectral structure of the masker (i.e. the masking signal) and the masked signal.
FIG. 1 is to illustrate the masking of sounds by narrow-band noise signals 1, 2, 3 at 250 Hz, 1,000 Hz and 4,000 Hz and a sound pressure level of 60 dB. FIG. 1 is taken from E. Zwicker and H. Fastl, Concerning the dependency of post-masking on disturbance pulse duration, in Acustica, Vol. 26, pages 78 to 82, 1982.
The human ear in this respect can be regarded as a bank of filters consisting of a large number of mutually overlapping band-pass filters. The distribution of these filters over the frequency is not constant. In particular, with low frequencies the frequency resolution is clearly better than with high frequencies. When looking at the smallest perceivable frequency difference, this value is about 3 Hz at frequencies below about 500 Hz, and above 500 Hz increases in proportion to the frequency or center frequency of the frequency groups. When the smallest perceivable differences are juxtaposed on the frequency scale, 640 perceivable stages are obtained. A frequency scale that is adapted to the frequency sensation of human beings is constituted by the Bark scale. The latter subdivides the entire audible range up to about 15.5 kHz into 24 sections.
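The mapping from frequency to the Bark scale can be illustrated with a commonly used closed-form approximation attributed to Zwicker and Terhardt. The formula and the function name `hz_to_bark` are assumptions introduced for this sketch; the text above does not state which mapping is used.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Uses the Zwicker-Terhardt style approximation (an assumption here,
    not taken from the patent text).
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


# The centre frequencies of the narrow-band noises in FIG. 1, plus the
# upper limit of about 15.5 kHz named in the text.
for f in (250, 1000, 4000, 15500):
    print(f, round(hz_to_bark(f), 1))
```

With this approximation, 15.5 kHz maps to roughly 24 Bark, consistent with the 24 sections mentioned above, and equal steps in Bark correspond to narrower frequency steps at low frequencies than at high frequencies.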
Due to the overlapping of filters of finite steepness, audio signals of low loudness in the vicinity of loud audio signals are masked. Thus, in FIG. 1 all sinusoidal audio signals present below the illustrated narrow-band noise curves 1, 2, 3, which in the spectrum are represented as an individual line, are masked and thereby are not audible.
The edge steepness of the individual masking filters of the bank of filters in the human ear, as assumed in the model, furthermore is dependent upon the sound pressure level of the signal heard and to a lesser extent on the center frequency of the respective band filter. The maximum masking is dependent upon the structure of the masker and is about −5 dB in case of masking by noise. In case of masking by sinusoidal sounds, the maximum masking is considerably lower and, depending on the center frequency, is &min
Seitzer Dieter
Sporer Thomas
Beyer Weaver & Thomas LLP
Fraunhofer-Gesellschaft zur Förderung der Angewandten e.V.
Young Brian