Look-ahead pitch determination

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S219000, C704S223000

active

06564182

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is generally in the field of signal coding. In particular, the present invention is in the field of pitch determination for speech coding.
2. Background Art
Traditionally, all parametric speech coding methods make use of the redundancy inherent in the speech signal to reduce the amount of information that must be sent and to estimate the value of speech samples of a signal at short intervals. This redundancy primarily arises from the repetition of wave shapes at a periodic rate.
The redundancy of speech wave forms may be considered with respect to several different types of speech signal, such as voiced and unvoiced. For voiced speech, the speech signal is essentially periodic; however, this periodicity may be variable over the duration of a speech segment and the shape of the periodic wave usually changes gradually from segment to segment. As for the unvoiced speech, the signal is more like a random noise and has a smaller amount of predictability.
In either case, parametric coding may be used to reduce the redundancy of the speech segments by separating the excitation component of the speech from the spectral envelope component. The coding advantage arises from the slow rate at which the parameters change. However, it is difficult to estimate exactly the rate at which the parameters change. Yet, it is rare for the parameters to be significantly different from the values held within a few milliseconds. Accordingly, the sampling rate of the speech is such that the nominal frame duration is in the range of five to thirty milliseconds. In more recent standards such as EVRC, G.723, or EFR that have adopted the Code Excited Linear Prediction (“CELP”) technique, each frame includes 160 samples and is 20 milliseconds long.
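The frame figures quoted above can be checked with simple arithmetic. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration. It shows that 160 samples in 20 milliseconds corresponds to a sampling rate of 8000 samples per second.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def implied_sample_rate_hz(samples: int, frame_ms: int) -> int:
    """Sampling rate implied by a frame of `samples` samples lasting `frame_ms` ms."""
    return samples * 1000 // frame_ms
```

At that rate, a 10 millisecond span holds 80 samples.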
A robust estimation of the pitch or fundamental frequency of speech is one of the classic problems in the art of speech coding. Accurate pitch estimation is a key to any speech coding algorithm. In CELP, for example, the pitch estimation is performed for each frame. For pitch estimation purposes, each 20 ms frame is processed in two 10 ms subframes. First, the pitch lag of the first 10 ms subframe is estimated using an open loop pitch estimation method. Subsequently, the pitch lag of the second 10 ms subframe is estimated in a similar fashion. However, at the time of estimating the pitch lag of the second subframe, additional information, namely the pitch lag of the first subframe, is available. Traditionally, such information is used to better estimate and correct the pitch lag of the second subframe. The traditional approach allows for the past pitch information to be used for estimating the future pitch lag, since, as stated above, speech parameters are not significantly different from the values held a few milliseconds previously. In particular, the pitch changes very slowly during voiced speech.
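The text does not spell out the open loop method. One common choice, assumed here purely for illustration, is to pick the lag that maximizes the normalized correlation between a subframe and the signal one lag earlier. The function names, the lag range, and the use of a plain list of samples are all assumptions of this sketch, not details taken from any of the standards named above.

```python
import math
from typing import Sequence


def normalized_correlation(signal: Sequence[float], start: int,
                           length: int, lag: int) -> float:
    """Correlate a subframe with the span `lag` samples earlier.

    The result is normalized by the energies of both spans, so it lies
    in [-1, 1]; 0.0 is returned when either span is silent.
    """
    cross = energy_now = energy_past = 0.0
    for n in range(start, start + length):
        now, past = signal[n], signal[n - lag]
        cross += now * past
        energy_now += now * now
        energy_past += past * past
    denom = math.sqrt(energy_now * energy_past)
    return cross / denom if denom > 0.0 else 0.0


def open_loop_pitch_lag(signal: Sequence[float], start: int, length: int,
                        min_lag: int, max_lag: int) -> int:
    """Return the candidate lag with the highest normalized correlation."""
    if start - max_lag < 0:
        raise ValueError("not enough past samples for max_lag")
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_correlation(signal, start, length, lag))
```

For a signal that repeats every 40 samples, a search over lags 20 to 60 on an 80-sample subframe selects 40. A search range that also contains multiples of the true period can return one of those multiples instead, which is one reason a neighboring subframe's lag is useful as a correction hint.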
Referring to FIG. 2, an application of a conventional pitch lag estimation method is illustrated with reference to a speech signal 220. As shown, frame 1 212 is shown in two subframes for which pitch lag 0 231 and pitch lag 1 232 are estimated. The pitch lag 0 231 is obtained before the pitch lag 1 232 and is available for correcting the pitch lag 1 232. As further shown, the pitch lag information for each subframe of subsequent frames 213, 214, . . . 216 are computed in a sequential fashion. For example, the pitch lag 1 232 information would be available to help estimate pitch lag 0 of frame 2 213, pitch lag 0 233 would be available to help estimate pitch lag 1 234, and so on. Accordingly, the past pitch information is conventionally used to estimate subsequent pitch lags.
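The sequential dependency described above can be sketched as follows. The text does not say how the previous lag corrects the next one, so the rule used here (a small score bonus for candidates close to the previous lag) is an assumption, as are the names `pick_lag` and `track_sequentially`, the `tolerance` and `bonus` values, and the representation of each subframe as a mapping from candidate lag to score.

```python
from typing import Dict, List, Optional


def pick_lag(scores: Dict[int, float], previous_lag: Optional[int],
             tolerance: int = 4, bonus: float = 0.05) -> int:
    """Pick the best-scoring lag, favoring candidates near the previous lag."""
    def biased(lag: int) -> float:
        near = previous_lag is not None and abs(lag - previous_lag) <= tolerance
        return scores[lag] + (bonus if near else 0.0)
    return max(scores, key=biased)


def track_sequentially(per_subframe_scores: List[Dict[int, float]]) -> List[int]:
    """Estimate lags in time order; each choice sees only the lag before it."""
    lags: List[int] = []
    previous: Optional[int] = None
    for scores in per_subframe_scores:
        previous = pick_lag(scores, previous)
        lags.append(previous)
    return lags
```

With scores `{40: 0.90, 80: 0.70}` for one subframe and `{40: 0.80, 80: 0.82}` for the next, the second subframe alone would favor the doubled lag 80, but the bonus from the preceding lag of 40 keeps the track at 40. The same mechanism propagates a wrong lag when the preceding subframe is unreliable.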
The conventional approach suffers from incorrectly assuming that the past pitch lag information is always a proper indication of what follows. The conventional approach also lacks the ability to properly estimate the pitch in speech transition areas as well as other areas. Accordingly, there is a serious need in the art to provide a more accurate pitch estimation, especially in speech transition areas from unvoiced to voiced speech.
SUMMARY OF THE INVENTION
In accordance with the purpose of the present invention as broadly described herein, there is provided a method and system for speech coding.
The encoder of the present invention processes an input signal on a frame-by-frame basis. Each frame is divided into first half and second half subframes. For a first frame, a pitch of the first half subframe of a subsequent frame (look-ahead subframe) is estimated. Using the look-ahead pitch information, a pitch of the second half subframe of the first frame is estimated and corrected.
In one aspect of the present invention, a pitch of the first half subframe of the first frame is also estimated and used to better estimate and correct the pitch of the second half subframe of the first frame. In another aspect of the invention, the pitch of the look-ahead subframe is used as the pitch of the first half subframe of the subsequent frame.
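The order of operations summarized above can be sketched as follows. This is not the patented correction procedure, which the summary does not detail. The sketch only illustrates the data flow: the look-ahead lag is estimated before the second half subframe is decided, both neighbors act as hints, and the look-ahead lag is reused as the first half lag of the next frame. The score format, the bonus rule, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[int, float]  # candidate lag -> score (hypothetical format)


def best_lag(scores: Scores) -> int:
    """Lag with the highest raw score."""
    return max(scores, key=lambda lag: scores[lag])


def correct_second_half(scores: Scores, first_half_lag: int,
                        look_ahead_lag: int, tolerance: int = 4,
                        bonus: float = 0.05) -> int:
    """Choose the second half lag, favoring candidates near either neighbor."""
    def biased(lag: int) -> float:
        score = scores[lag]
        for neighbor in (first_half_lag, look_ahead_lag):
            if abs(lag - neighbor) <= tolerance:
                score += bonus  # assumed rule: reward agreement with a neighbor
        return score
    return max(scores, key=biased)


def estimate_frames(subframe_scores: List[Scores]) -> List[Tuple[int, int]]:
    """Return (first half lag, second half lag) for each frame.

    `subframe_scores` lists consecutive half-frame subframes and must end
    with the look-ahead subframe of the last frame, so its length is odd.
    """
    if len(subframe_scores) < 3 or len(subframe_scores) % 2 == 0:
        raise ValueError("need two subframes per frame plus one look-ahead")
    result: List[Tuple[int, int]] = []
    first_lag = best_lag(subframe_scores[0])
    for i in range(0, len(subframe_scores) - 2, 2):
        # The look-ahead subframe is estimated before the second half.
        look_ahead_lag = best_lag(subframe_scores[i + 2])
        second_lag = correct_second_half(subframe_scores[i + 1],
                                         first_lag, look_ahead_lag)
        result.append((first_lag, second_lag))
        first_lag = look_ahead_lag  # reused as the next frame's first half lag
    return result
```

Consider an unvoiced-to-voiced transition with scores `{57: 0.30, 40: 0.25}` for the first half, `{40: 0.60, 80: 0.62}` for the second half, and `{40: 0.90, 80: 0.70}` for the look-ahead subframe. The first half lag of 57 offers no useful hint, while the look-ahead lag of 40 tips the second half from 80 to 40.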
In yet another aspect of the invention, a normalized correlation is calculated using the pitch of the look-ahead subframe. The normalized correlation is used to correct and estimate the pitch of the second half subframe of the first frame.
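The summary does not give the formula for this normalized correlation. A commonly used definition, stated here as an assumption, evaluates the second half subframe against itself delayed by the look-ahead lag. The symbols are introduced only for this illustration: s(n) denotes the speech samples of the second half subframe, N its length in samples, and L_la the pitch lag of the look-ahead subframe.

```latex
R(L_{la}) =
  \frac{\sum_{n=0}^{N-1} s(n)\, s(n - L_{la})}
       {\sqrt{\sum_{n=0}^{N-1} s(n)^{2} \; \sum_{n=0}^{N-1} s(n - L_{la})^{2}}}
```

Under this definition R lies between -1 and 1, and a value close to 1 suggests that the look-ahead lag also describes the periodicity of the second half subframe.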


REFERENCES:
patent: 5159611 (1992-10-01), Tomita et al.
patent: 5226108 (1993-07-01), Hardwick et al.
patent: 5495555 (1996-02-01), Swaminathan
patent: 5596676 (1997-01-01), Swaminathan et al.
patent: 5734789 (1998-03-01), Swaminathan et al.
patent: 6003004 (1999-12-01), Hershkovits et al.
patent: 6055496 (2000-04-01), Heidari et al.
patent: 6104993 (2000-08-01), Ashley
patent: 6141638 (2000-10-01), Peng et al.
TIA/EIA Interim Standard Article: “Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems,” from Telecommunications Industry Association, No. TIA/EIA/IS-127, Jan. 1997, 6 pages (including cover page).
