Reexamination Certificate
2000-02-25
2002-12-24
Knepper, David D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S228000, C370S508000, C370S519000
active
06499009
ABSTRACT:
FIELD OF INVENTION
The present invention relates to the field of telecommunications. More particularly, the present invention relates to estimating the quality of a speech signal.
BACKGROUND
In a conventional telecommunications system, the transmission chain over which a speech signal (e.g., a signal carrying a spoken sentence) must pass may include speech encoders, speech decoders, an air interface, public switched telephone network (PSTN) links, computer network links, receive buffering, signal processing logic, and/or playback equipment. As one skilled in the art will readily appreciate, any one or more of the elements that make up the transmission chain may distort the speech signal. Estimating the quality of speech signals is important in order to ensure that speech quality exceeds minimum acceptable standards, so that speech signals can be heard and understood by a listener.
Typically, estimating speech quality involves transmitting a reference speech signal (herein referred to as a “reference signal”) across a transmission chain to a receiving entity. The received signal, having been distorted by the various elements that make up the transmission chain, is herein referred to as the test signal. The test signal and the original reference signal are then forwarded to a speech quality estimation algorithm.
There are a number of conventional speech quality estimation algorithms. Most, however, employ the same basic technique, which is illustrated in FIG. 1. As shown, a reference signal 105 and a test signal 110 are divided into N short time frames (e.g., 20 msec. each). A new representation, such as a frequency representation, is then derived for each of the N time frames associated with the reference signal 105 and each of the N time frames associated with the test signal 110. A difference vector comprising N time frames is then derived by comparing the representation associated with each of the N time frames of the reference signal 105 with the corresponding representation associated with the test signal 110. The comparison might be accomplished by subtracting the corresponding representations on a frame-by-frame basis. For each frame, the difference between the corresponding representations may be summed so that a single distortion metric is derived for each of the N time frames. The N distortion metrics may then be averaged, where the average value can be used as a measure of total signal distortion or speech quality.
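The frame-by-frame comparison described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the naive DFT magnitude used as the "frequency representation", and the use of absolute differences when summing are all assumptions made for illustration. The frame length is given in samples; for example, 20 msec. at an assumed 8 kHz sampling rate would be 160 samples.

```python
import cmath
import math


def split_frames(signal, frame_len):
    """Divide a signal into consecutive, non-overlapping frames of frame_len samples."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..len/2 (assumed per-frame representation)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def average_distortion(reference, test, frame_len):
    """Average, over N frames, the summed absolute spectral difference per frame."""
    ref_frames = split_frames(reference, frame_len)
    test_frames = split_frames(test, frame_len)
    n = min(len(ref_frames), len(test_frames))
    if n == 0:
        raise ValueError("signals are shorter than one frame")
    per_frame = []
    for ref_frame, test_frame in zip(ref_frames[:n], test_frames[:n]):
        ref_rep = magnitude_spectrum(ref_frame)
        test_rep = magnitude_spectrum(test_frame)
        # One distortion metric per frame: sum of the bin-wise differences.
        per_frame.append(sum(abs(r - t) for r, t in zip(ref_rep, test_rep)))
    return sum(per_frame) / n


# Usage: an attenuated copy of the reference yields a non-zero distortion.
reference = [math.sin(2 * math.pi * 5 * t / 160) for t in range(160)]
attenuated = [0.5 * x for x in reference]
print(average_distortion(reference, reference, 40))
print(average_distortion(reference, attenuated, 40))
```

Because each test frame is compared only with the reference frame at the same index, any misalignment between the two signals changes the per-frame metrics even when the speech itself is undistorted.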
A problem with the above-identified speech quality estimation technique is that it is highly sensitive to time shifts (e.g., transmission delays); the greater the time shift, the more unreliable the speech quality estimation. In an attempt to avoid this problem, conventional speech quality estimation algorithms align the reference signal and the test signal before performing the speech quality estimation, as illustrated in FIG. 2. Of course, just as there are a number of conventional approaches for estimating speech quality, there are a number of conventional techniques for aligning a reference signal and a test signal.
One such technique for aligning a reference signal and a test signal utilizes a known, estimated “global” delay factor, as illustrated in FIG. 2. In accordance with this technique, the test signal or the reference signal is shifted in the time domain by an amount that is equivalent to an estimated global delay. Thereafter, the two signals may be fed to the speech quality estimation algorithm. Another well-known technique for aligning a reference signal and a test signal involves iteratively aligning the two signals in the time domain until a cross-correlation measurement, or other similar metric, is maximized. Still another technique involves transmitting the reference signal together with information that identifies one or more portions of the signal, for example, by inserting sinusoidal signals or chirps into the reference signal. Accordingly, these one or more portions of the test signal can be more easily recognized and aligned with the corresponding portions of the reference signal.
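As a rough illustration of the cross-correlation approach, the following Python sketch searches a range of candidate lags for the one that maximizes the cross-correlation, then shifts the test signal by that amount. The function names, the exhaustive search, and the `max_lag` parameter are assumptions for illustration only; the text above does not specify how the search is carried out.

```python
def estimate_global_delay(reference, test, max_lag):
    """Return the lag in [-max_lag, max_lag] that maximizes the cross-correlation.

    A positive lag means the test signal is delayed relative to the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(test):
                score += r * test[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(test, lag):
    """Undo a delay of `lag` samples: drop leading samples, or zero-pad if lag < 0."""
    if lag >= 0:
        return test[lag:]
    return [0.0] * (-lag) + test


# Usage: the test signal is the reference delayed by three samples.
reference = [0.0, 0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0]
test = [0.0, 0.0, 0.0] + reference
lag = estimate_global_delay(reference, test, max_lag=5)
print(lag)  # prints 3
print(align(test, lag) == reference)  # prints True
```

Note that a single lag is applied to the entire test signal, so this sketch corrects only a delay that is the same throughout the signal.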
Each of the above-identified techniques for aligning a reference signal and a test signal, however, assumes that the delay introduced by the various components which make up the transmission chain is a fixed delay, or a delay that changes slowly over time, such that periodic resynchronization is possible. In other words, it is assumed that a constant time shift exists between the reference signal and the test signal. While this may hold true for circuit switched networks, transmission delays are rarely fixed or constant in packet switched networks, for example, Internet Protocol (IP) based networks. For instance, in virtually all packet switched network scenarios, transmission delays vary with traffic load (i.e., the level of congestion in the network). Since traffic load generally changes on a continuous basis, the transmission delay experienced by a single speech signal traversing the network may vary. If these variable transmission delays go undetected, the reference signal and the test signal cannot be properly aligned, and the speech quality estimation algorithm cannot possibly perform an accurate speech quality estimation. Furthermore, the use of inexpensive personal computer systems as communications devices might also contribute to a speech signal experiencing variable delays.
SUMMARY OF THE INVENTION
The present invention involves a speech quality estimation technique that permits the use of an arbitrary speech quality estimation algorithm. In general, the present invention analyzes the reference signal and the test signal, and based on this analysis, identifies delay variations and/or discontinuities in the test signal, if any. These portions of the test signal are then removed so that the reference signal and the test signal are similarly scaled with respect to time. The reference signal and the test signal are then forwarded to a standard speech quality estimation algorithm. The resulting speech quality estimation is then adjusted based on an analysis of the portions of the test signal that were previously removed.
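The overall flow summarized above (remove the affected portions, run an arbitrary estimator, then adjust the result) might be sketched in Python as follows. This is a hedged illustration, not the patented method: the flagged segments are supplied by the caller rather than detected, `toy_estimator` is a hypothetical stand-in for a standard speech quality estimation algorithm, and the linear penalty controlled by `penalty_per_fraction` is an assumed form of the adjustment, which the summary does not specify.

```python
def remove_segments(signal, segments):
    """Drop the half-open sample ranges [start, end) listed in `segments`."""
    kept, cursor = [], 0
    for start, end in sorted(segments):
        kept.extend(signal[cursor:start])
        cursor = max(cursor, end)
    kept.extend(signal[cursor:])
    return kept


def toy_estimator(reference, test):
    """Hypothetical stand-in estimator: 1.0 for identical signals, lower otherwise."""
    n = min(len(reference), len(test))
    if n == 0:
        return 0.0
    mean_abs_diff = sum(abs(r - t) for r, t in zip(reference, test)) / n
    return 1.0 / (1.0 + mean_abs_diff)


def estimate_quality(reference, test, flagged_segments, base_estimator,
                     penalty_per_fraction=1.0):
    """Score the cleaned test signal, then penalize by the fraction removed.

    The penalty form is an assumption made for this sketch.
    """
    cleaned = remove_segments(test, flagged_segments)
    base_score = base_estimator(reference, cleaned)
    fraction_removed = (len(test) - len(cleaned)) / len(test) if test else 0.0
    return base_score - penalty_per_fraction * fraction_removed


# Usage: two extra samples (indices 2 and 3) were inserted into the test signal.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
test = [1.0, 2.0, 9.0, 9.0, 3.0, 4.0, 5.0, 6.0]
print(estimate_quality(reference, test, [(2, 4)], toy_estimator))  # prints 0.75
```

Passing the estimator in as a parameter mirrors the stated goal of permitting an arbitrary speech quality estimation algorithm: only the pre-processing and the final adjustment are specific to handling variable delay.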
Accordingly, it is an object of the present invention to provide a speech quality estimation technique that is capable of assessing speech quality despite the presence of variable transmission delays, including continuous and intermittent, variable transmission delays.
It is another object of the present invention to prevent the presence of variable transmission delays from precluding the use of a standard speech quality estimation algorithm.
In accordance with a first aspect of the present invention, the above-identified and other objectives are achieved by a method for estimating speech quality. The method involves identifying portions of a first speech signal that exhibit distortions caused by transmission delays. The identified portions are then removed from the first speech signal, and the first speech signal is compared to a second speech signal. A speech quality estimate is then generated, based on the comparison of the first speech signal and the second speech signal.
In accordance with a second aspect of the present invention, the above-identified and other objectives are achieved through a method of estimating speech quality in a telecommunications network, wherein a first speech signal is transported across a transmission chain to a receiving entity. The method involves aligning, at the receiving entity, each of a number of synchronization points along the first speech signal and a corresponding one of a number of synchronization points along a reference speech signal. A determination is then made as to whether any portions of the first speech signal reflect an intermittent delay variation, based on the alignment of the synchronization points along the first speech signal and the reference speech signal. The level of continuous delay variation exhibited by the first speech signal is then determined, and the first speech signal, or the reference speech signal, is adjusted to account for
Karlsson Anders
Lundberg Jonas
Steinarson Arne
Knepper David D.
Lerner Martin
Telefonaktiebolaget LM Ericsson