Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission
Reexamination Certificate
1999-06-30
2001-10-16
Dorvil, Richemond (Department: 2741)
C704S230000, C704S219000
active
06304842
ABSTRACT:
TECHNICAL FIELD
This invention is directed to linear predictive coding of speech sounds in a manner which more accurately represents the sudden energy variations which characterize unvoiced plosives.
BACKGROUND
Linear Predictive Coding (LPC) of speech involves estimating the coefficients of a time varying filter (henceforth called a “synthesis filter”) and providing appropriate excitation (input) to that time varying filter. The process is conventionally broken down into two steps known as encoding and decoding.
As shown in FIG. 1, in the encoding step, the original speech signal s is first filtered by pre-filter 10. The pre-filtered speech signal s_p is then analyzed by LPC Analysis block 14 to compute the coefficients of the synthesis filter. Then, an LPC analysis filter 12 is formed, using the same coefficients as the synthesis filter but having an inverse structure. The pre-filtered speech signal s_p is processed by analysis filter 12 to produce a residual output signal u called the “residue”. Information about the filter coefficients and the residue is passed to a decoder (not shown) for use in the decoding step.
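The encoding step described above can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion for the LPC analysis, which is one common choice and is not stated in the text; pre-filter 10 is omitted because its form is not given. The function names and the sign convention A(z) = 1 + a[1] z^-1 + ... are illustrative assumptions.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Levinson-Durbin recursion (an assumed analysis method).

    Returns a[0..order] with a[0] = 1, so that the analysis filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def analysis_filter(x, a):
    """Apply A(z) to x with zero initial state and return the residue u."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


# Hypothetical test signal: impulse response of a known two-pole filter.
s_p = [1.0, 1.3]
for _ in range(198):
    s_p.append(1.3 * s_p[-1] - 0.4 * s_p[-2])

a = lpc_coefficients(s_p, 2)
u = analysis_filter(s_p, a)
```

For this synthetic input the estimated coefficients should closely match the filter that generated the signal, and the residue should collapse to something close to a single impulse, which is the sense in which the analysis filter is the inverse of the synthesis filter.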
In the decoding step, a synthesis filter is formed using the coefficients obtained from the encoder. An appropriate excitation signal is applied to the synthesis filter, based on the information about the residue obtained from the encoder. The synthesis filter outputs a synthetic speech signal, which is ideally the closest possible approximation to the original speech signal, s.
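The decoding step can be sketched in the same spirit. The sketch below assumes an all-pole synthesis filter 1/A(z) with coefficients a[0..p], a[0] = 1, and feeds it a hypothetical excitation; in an actual decoder the excitation would be synthesized from the transmitted information about the residue rather than being the residue itself.

```python
def synthesis_filter(excitation, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_{j>=1} a[j] * s[n - j]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * out[n - j]
        out.append(acc)
    return out


def analysis_filter(x, a):
    """FIR filter A(z), the inverse structure of the synthesis filter."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


# Hypothetical coefficients and excitation, chosen only for illustration.
a = [1.0, -1.3, 0.4]
excitation = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, -0.25, 0.0]

synthetic = synthesis_filter(excitation, a)
recovered = analysis_filter(synthetic, a)  # should reproduce the excitation
```

Passing the synthetic output back through the analysis filter recovers the excitation, which shows why the quality of the excitation signal, rather than the filter alone, determines how well sudden energy changes are reproduced.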
This invention pertains to the processing of unvoiced plosives in the residue (i.e. the process steps shown in blocks 20-28 enclosed within the dashed outline portions of FIG. 1). During unvoiced speech, plosives (or stops) in the residue are characterized by sudden variations in energy from one block of speech samples to the next. Prior art linear predictive speech coding techniques have achieved only poor representation of unvoiced plosives. In particular, prior art techniques typically represent unvoiced plosives by interpolating energy variations between relatively few samples spaced relatively far apart. This yields a gradual variation in energy, which does not accurately reflect unvoiced plosives' sudden energy variations. This invention achieves more accurate location and coding of unvoiced plosives in the residue. Information about the location of the start of the sudden energy variation (burst portion of the unvoiced plosive) in the residue is encoded. This enables the decoder to produce a synthetic excitation signal having sudden energy variations during unvoiced plosives, thereby improving the quality of the synthetic speech considerably.
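The smoothing effect of sparse interpolation described above can be illustrated numerically. The energy values, the sampling positions, and the use of linear interpolation are all assumptions made for this toy example; the text does not specify the interpolation used by any particular prior art coder.

```python
def interpolate(anchor_positions, anchor_values, n):
    """Piecewise-linear interpolation of sparse energy samples onto n points."""
    out = []
    for i in range(n):
        for seg in range(len(anchor_positions) - 1):
            p0, p1 = anchor_positions[seg], anchor_positions[seg + 1]
            v0, v1 = anchor_values[seg], anchor_values[seg + 1]
            if p0 <= i <= p1:
                out.append(v0 + (v1 - v0) * (i - p0) / (p1 - p0))
                break
    return out


def largest_step(energy):
    """Largest energy change between adjacent blocks."""
    return max(abs(b - a) for a, b in zip(energy, energy[1:]))


# Hypothetical block energies with a burst starting at block 5.
true_energy = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]

# Energy known only at the first and last block, then interpolated.
coarse = interpolate([0, 8], [true_energy[0], true_energy[8]], len(true_energy))

true_jump = largest_step(true_energy)
coarse_jump = largest_step(coarse)
```

In this example the true contour changes by 8 units in a single block, while the interpolated contour spreads the same change evenly over eight blocks, so the burst is no longer localized.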
SUMMARY OF INVENTION
The invention provides a method of encoding signal segments which represent unvoiced plosives. The signal segments to be encoded are contained within a speech signal divided into N frames, indexed m=1, . . . , N. Each frame is subdivided into L subframes, indexed l=1, . . . , L. The speech signal has a gain g_m(l) within each subframe.
In accordance with the invention, an energy measure e_m(l) representative of the signal segments' energy content is defined. An energy threshold e_th(l) representative of a sudden energy change characteristic of an unvoiced plosive is also defined. For each frame, the energy measure e_m(l) and the energy threshold e_th(l) are derived for each subframe within that frame. If e_m(l) ≤ e_th(l) for each subframe within a particular frame, then a plosive locator l_pl=0 and a plosive index i_pl=0 are assigned to that frame to indicate absence of a plosive within that frame. If e_m(l) > e_th(l) for any subframe within the frame, then that frame's plosive locator l_pl is assigned a non-zero value indicating location of the plosive at a transition point immediately following that one of the subframes within the frame for which e_m(l) − e_th(l) is greatest; and, that frame's plosive index i_pl is assigned a non-zero value representing presence of a plosive within that frame.
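The per-frame decision described above can be sketched as follows. The energy measures and thresholds are taken as given inputs, since their derivation is not stated here; the function name and the convention that l_pl is returned as the 1-based index of the selected subframe are assumptions made for illustration.

```python
def locate_plosive(e_m, e_th):
    """Return the plosive locator l_pl for one frame.

    e_m and e_th hold the energy measure and threshold for subframes
    l = 1..L (list position 0 corresponds to l = 1). The result is 0 when
    no subframe exceeds its threshold; otherwise it is the 1-based index of
    the subframe for which e_m(l) - e_th(l) is greatest. On a tie the
    earliest subframe is kept (an assumption).
    """
    if len(e_m) != len(e_th):
        raise ValueError("e_m and e_th must have one entry per subframe")
    l_pl, best_excess = 0, 0.0
    for l, (energy, threshold) in enumerate(zip(e_m, e_th), start=1):
        excess = energy - threshold
        if excess > best_excess:  # strictly above threshold and above best so far
            l_pl, best_excess = l, excess
    return l_pl


# Hypothetical frames with L = 4 subframes.
no_plosive = locate_plosive([1.0, 2.0, 3.0, 1.0], [3.0, 3.0, 3.0, 3.0])
with_plosive = locate_plosive([1.0, 5.0, 9.0, 2.0], [3.0, 3.0, 3.0, 3.0])
```

In the second frame two subframes exceed the threshold, and the locator selects the one with the largest excess rather than the first one to cross it.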
The plosive index i_pl≠0 is assigned as:
    if (l_pl < L)
        i_pl = J(l_pl − 1) + k,   k = j if g_m(l_pl) ∈ (g_level(j−1), g_level(j)], j=1, . . . , J
    else
        i_pl = 2^K − 1
    end if
where, l_pl is the subframe for which the energy measure exceeds the energy measure threshold, J is the predefined value of the number of levels used in quantizing the gain, g_m(l_pl), K = ⌈log_2(J(L−1)+2)⌉ is the value of the number of bits used in encoding the plosive locator l_pl, and g_level is the predefined quantized gain decision level vector.
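The index assignment above can be sketched as follows. The sketch assumes g_level is supplied as a list of J+1 increasing decision levels g_level(0), . . . , g_level(J), and that a gain outside that range is an error; both are assumptions, as are the function names. The inverse mapping split_index is not described in the text and is included only to show that the index can be unpacked unambiguously.

```python
import math


def index_bits(J, L):
    """K = ceil(log2(J * (L - 1) + 2))."""
    return math.ceil(math.log2(J * (L - 1) + 2))


def gain_level(g, g_level):
    """Return k = j such that g lies in (g_level[j-1], g_level[j]], j = 1..J."""
    for j in range(1, len(g_level)):
        if g_level[j - 1] < g <= g_level[j]:
            return j
    raise ValueError("gain lies outside the decision levels (assumed an error)")


def plosive_index(l_pl, g, g_level, L):
    """Compute i_pl from the locator l_pl and the gain g = g_m(l_pl)."""
    J = len(g_level) - 1
    if l_pl == 0:
        return 0  # no plosive in this frame
    if l_pl < L:
        return J * (l_pl - 1) + gain_level(g, g_level)
    return 2 ** index_bits(J, L) - 1  # plosive at the last subframe


def split_index(i_pl, J, L):
    """Assumed inverse mapping: recover (l_pl, k) from i_pl; k is None
    when no gain level is carried by the index."""
    if i_pl == 0:
        return 0, None
    if i_pl == 2 ** index_bits(J, L) - 1:
        return L, None
    return (i_pl - 1) // J + 1, (i_pl - 1) % J + 1


# Hypothetical configuration: J = 4 gain levels, L = 4 subframes.
g_level = [0.0, 1.0, 2.0, 4.0, 8.0]
K = index_bits(4, 4)
i_pl = plosive_index(2, 3.0, g_level, 4)
```

With J gain levels and L subframes, the indices 1 through J(L−1) cover a plosive in any subframe but the last, and the two remaining codes cover the no-plosive case and a plosive at subframe L, which accounts for the J(L−1)+2 values inside the logarithm.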
The invention further provides a method of decoding a signal which has been encoded as above. Since the encoder's gain values are not directly available to the decoder, the encoder provides a quantized gain vector for use by the decoder. In order to minimize the encoded bit rate, the gain of only one subframe is quantized, with the remaining elements of the quantized gain vector being estimated in a manner which ensures reproduction of the sudden energy variations necessary for improved characterization of plosives.
Bhattacharya Bhaskar
Husain Mohammad Aamir
Abebe Daniel
Dorvil Richemond
Glenayre Electronics, Inc.
Oyen Wiggs Green & Mutala