Reexamination Certificate
2001-06-11
2003-06-24
To, Doris H. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S206000
active
06584437
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to the field of speech coding and, in particular, to the quantization of successive pitch periods.
BACKGROUND OF THE INVENTION
Based on the human speech processing mechanism, the pitch period contour of voiced speech evolves slowly in time. This phenomenon is exploited in many current speech coders by coding the difference between successive pitch periods, thereby increasing the coding efficiency. In a typical coder operating on a subframe basis, such as the code excited linear predictive (CELP) coder, the absolute pitch period is sent at least once per frame.
The difference between successive pitch periods is generally referred to as a delta period. In prior art, the delta periods may attain uniformly distributed values from a limited range facilitating their coding. This can be interpreted as a multi-dimensional rectangular lattice populated uniformly by points that define the delta periods over the frame. Accordingly, coding of the delta periods is carried out by using a uniform quantizer. That is, similar quantizers are used to code independently several successive delta periods. An encoder that uses such an approach is also known as a multi-dimensional rectangular lattice quantizer. In a multi-dimensional lattice quantizer, each dimension represents a pitch period in a corresponding subframe. Usually, the first dimension of a lattice is indicative of the absolute pitch period in the first subframe, while each of the remaining dimensions represents the difference between the pitch periods of the current and the preceding subframe. Thus, in a speech coding scheme where a speech frame is divided into four subframes for speech processing, the encoder for use in the quantization of successive pitch periods is referred to as a four-dimensional lattice quantizer, and the absolute pitch period in the first dimension and the delta periods in the remaining three dimensions are represented by a point $(p, d_1, d_2, d_3)$ in a four-dimensional pitch space. In the present invention, special attention is paid to a lattice structure containing the dimensions only for the delta periods $(d_1, d_2, d_3, \ldots, d_n)$.
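The mapping between per-subframe pitch periods and the point $(p, d_1, d_2, d_3)$ can be sketched as follows. This is a minimal illustration assuming integer pitch periods; the function names `to_delta_point` and `from_delta_point` and the sample values are hypothetical and do not come from the patent.

```python
def to_delta_point(pitch_periods):
    """Map per-subframe pitch periods to (p, d_1, ..., d_n).

    p is the absolute pitch period of the first subframe; each d_j is the
    difference between subframe j+1 and the subframe preceding it.
    """
    if not pitch_periods:
        raise ValueError("need at least one pitch period")
    p = pitch_periods[0]
    deltas = [cur - prev for prev, cur in zip(pitch_periods, pitch_periods[1:])]
    return (p, *deltas)


def from_delta_point(point):
    """Invert to_delta_point by accumulating the delta periods."""
    p, *deltas = point
    periods = [p]
    for d in deltas:
        periods.append(periods[-1] + d)
    return periods


# Hypothetical pitch periods (in samples) for four subframes.
point = to_delta_point([52, 53, 53, 51])
print(point)                    # (52, 1, 0, -2)
print(from_delta_point(point))  # [52, 53, 53, 51]
```

Because the contour changes slowly, the delta components stay close to zero while the absolute period does not, which is why only the delta dimensions are considered in the lattice discussed next.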
In most prior art speech coders utilizing differential coding, the lattice structure for n delta periods is described as a set of points with a regular arrangement in an n-dimensional pitch space such that the points are uniformly spaced throughout the pitch space. In addition to the uniform spacing of the points in the pitch space, the key feature of the prior art speech coders is the rectangular shape of the projection of the lattice points onto a two-dimensional plane. The structure of the lattice is usually constant regardless of the pitch period in the previous segment. An example of a typical two-dimensional lattice for delta periods is presented in FIG. 1, where the lattice L is defined by

$$L = \{(d_1, d_2) \mid d_{1\min} \le d_1 \le d_{1\max} \,\wedge\, d_{2\min} \le d_2 \le d_{2\max}\} \qquad (1)$$
The lattice covers all possible combinations of $d_1$ and $d_2$ between their respective minimum and maximum values. While the lattice, as shown in FIG. 1, is two-dimensional, higher-dimensional lattices can be easily derived from the two-dimensional case. In general, the minimum and maximum possible delta periods for the jth dimension are denoted by $d_{j\min}$ and $d_{j\max}$, respectively.
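A rectangular lattice of this kind can be enumerated directly as a Cartesian product of the per-dimension ranges. The sketch below assumes integer delta periods with a uniform integer step; `rectangular_lattice` and the bounds used are hypothetical names and values chosen for illustration only.

```python
from itertools import product


def rectangular_lattice(bounds, step=1):
    """Enumerate every point of a uniform rectangular lattice.

    bounds: one (d_min, d_max) integer pair per dimension (assumed values).
    step:   integer spacing between neighbouring points in each dimension.
    """
    axes = [range(lo, hi + 1, step) for lo, hi in bounds]
    return list(product(*axes))


# Two-dimensional case with hypothetical bounds -2 <= d_1, d_2 <= 2.
lattice = rectangular_lattice([(-2, 2), (-2, 2)])
print(len(lattice))  # 25

# A higher-dimensional lattice follows by adding one bounds pair per dimension.
lattice_3d = rectangular_lattice([(-2, 2)] * 3)
print(len(lattice_3d))  # 125
```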
Once the shape and the region of the lattice quantizer are defined, an important parameter is the density of the lattice, for the density determines the bit rate of the coder. The bit rate is a monotonically increasing function of the density. Thus, the density of the lattice quantizer reflects the accuracy used for pitch period information. Normally, fractional values are used instead of integers to improve the quality of the synthesized speech.
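The dependence of bit rate on density can be made concrete with a small calculation. The sketch below assumes that all delta periods of a frame are sent as a single joint index, so the cost is the number of bits needed to distinguish every lattice point; the range and resolutions are hypothetical, and `points_per_dimension` and `index_bits` are illustrative names.

```python
def points_per_dimension(d_min, d_max, steps_per_unit=1):
    """Count uniformly spaced values in [d_min, d_max] at 1/steps_per_unit resolution."""
    return (d_max - d_min) * steps_per_unit + 1


def index_bits(num_points):
    """Smallest number of bits that can index num_points distinct points."""
    return (num_points - 1).bit_length()


# Hypothetical range: each delta period lies in [-2, 2], two dimensions.
for steps in (1, 2):  # integer resolution, then half-sample resolution
    per_dim = points_per_dimension(-2, 2, steps)
    total = per_dim ** 2
    print(steps, per_dim, total, index_bits(total))
```

Under these assumptions the integer-resolution lattice has 25 points and needs 5 bits, while doubling the resolution gives 81 points and 7 bits, so a denser lattice costs more bits for the same region.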
In a typical lattice quantizer for delta periods, attention is usually paid to the boundary values ($d_{j\min}$, $d_{j\max}$) of the lattice while the rectangular shape of the lattice is kept constant. Attention is not paid, however, to the selection of a suitable set of lattice points to cover the regions of pitch space containing most of the source probability.
It is known that in a speech signal where pitch is a meaningful parameter, the evolution of pitch is smooth due to the characteristics of the human speech processing mechanism. In general, the pitch period contour of voiced speech evolves slowly in time, and abrupt changes in the contour are very unlikely to happen. It has been found that a rectangular lattice structure is far from optimal regarding the selection of lattice points to cover the regions of pitch space. Furthermore, in prior art, the search for differential pitch values is performed independently in each dimension. The use of rectangular lattices and the search method have not been optimized to reflect the known behavior of human speech.
It is advantageous and desirable to provide an improved method and system for the quantization of successive pitch periods in speech coders, taking advantage of the source probability in the pitch space to improve the quality of synthesized speech.
SUMMARY OF THE INVENTION
It is a primary object of the present invention to increase the efficiency of coding successive pitch periods, thereby improving the quality of synthesized speech in a speech coder utilizing differential coding to code the difference between successive pitch periods. This object can be achieved by defining an optimized, or more efficient, lattice structure which is shaped to cover the region of pitch space where the most probable points are located, based on a priori knowledge of the behavior of successive delta periods in voiced speech. Furthermore, regions with different point density representing different time resolution for pitch periods can be defined within the optimized lattice structure. With such an optimized lattice structure, a new method for assigning an index to a point in the optimized lattice structure and the search of the index in a codebook can be provided.
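The idea of shaping a lattice and assigning indices to its points can be illustrated with a toy example. The shaping rule used here (keep only points whose absolute delta periods sum to at most a bound) is an assumption made purely for illustration; the text above does not state the actual shape, and `shaped_lattice`, the bounds, and the enumeration-order indexing are all hypothetical.

```python
from itertools import product


def shaped_lattice(d_min, d_max, n_dims, max_abs_sum):
    """Keep only rectangular-lattice points that satisfy a shaping rule.

    Hypothetical rule (not from the patent): sum(|d_j|) <= max_abs_sum,
    which discards corner points that correspond to large cumulative jumps.
    """
    axis = range(d_min, d_max + 1)
    return [pt for pt in product(axis, repeat=n_dims)
            if sum(abs(d) for d in pt) <= max_abs_sum]


points = shaped_lattice(-2, 2, n_dims=2, max_abs_sum=2)

# Assumed indexing scheme: the codebook index is the enumeration position.
index_of = {pt: i for i, pt in enumerate(points)}

print(len(points))                     # 13, versus 25 for the full rectangle
print((len(points) - 1).bit_length())  # 4 bits instead of 5
print((2, 2) in index_of)              # False: corner point is excluded
```

With these assumed values the shaped lattice needs one bit less than the rectangular one over the same range; alternatively, the saved points could be spent on finer resolution near the origin.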
Thus, according to the first aspect of the present invention, a method of coding a sound signal in a plurality of signal frames each having a pitch period indicative of the sound signal in the respective signal frame, wherein each signal frame comprises a plurality of signal segments each representing a dimension in a pitch space, and the sound signal in each of the signal segments is characterized by a pitch value, and wherein the pitch values are representable by a point distribution pattern characteristic of the sound signal in a lattice structure for defining codebook indices in the pitch space, said method comprising the steps of:
shaping the lattice structure based on the point distribution pattern; and
providing a codebook index representing the pitch value in each dimension of the pitch space according to the shaped lattice structure for facilitating coding of the sound signal.
According to the first aspect of the present invention, the method further comprises the steps of:
obtaining an open-loop estimate of the pitch period by an open-loop search from the shaped lattice structure considering all of the dimensions of the pitch space; and
refining the open-loop estimate in each of the dimensions in the pitch space separately by a closed-loop search from the shaped lattice structure for obtaining a closed-loop search value indicative of the pitch value in the respective signal segment.
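The two search steps above can be sketched as a joint search over all dimensions followed by a per-dimension refinement that stays on the lattice. Everything specific in this sketch is an assumption: the squared-distance criterion for the open-loop stage, the per-dimension `cost` callback standing in for a closed-loop error measure, the search radius, and the toy lattice are hypothetical and are not taken from the patent.

```python
from itertools import product


def open_loop_search(lattice, estimate):
    """Pick the lattice point closest to a joint estimate over all dimensions."""
    return min(lattice,
               key=lambda pt: sum((a - b) ** 2 for a, b in zip(pt, estimate)))


def closed_loop_refine(lattice, start, cost, radius=1):
    """Refine one dimension at a time, considering only lattice members.

    cost(j, value) is a hypothetical per-dimension error; lower is better.
    start is assumed to be a lattice point, so a candidate always exists.
    """
    members = set(lattice)
    current = tuple(start)
    for j in range(len(current)):
        candidates = [current[:j] + (current[j] + s,) + current[j + 1:]
                      for s in range(-radius, radius + 1)]
        candidates = [c for c in candidates if c in members]
        current = min(candidates, key=lambda c: cost(j, c[j]))
    return current


# Toy shaped lattice (assumed rule: |d_1| + |d_2| <= 2).
lattice = [pt for pt in product(range(-2, 3), repeat=2)
           if sum(abs(d) for d in pt) <= 2]

coarse = open_loop_search(lattice, estimate=(0.6, -0.2))
target = (1, -1)  # stand-in for the values a closed-loop criterion would favour
refined = closed_loop_refine(lattice, coarse, lambda j, v: abs(v - target[j]))
print(coarse, refined)
```

Restricting the refinement candidates to lattice members is the design point of the sketch: each per-dimension decision remains representable by a codebook index of the shaped lattice.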
According to the present invention, the pitch value is indicative of a differential pitch period or an absolute pitch period.
According to the present invention, the pitch value in at least one of the signal segments is indicative of an absolute pitch period and the pitch value in each of the remaining signal segments is indicative of a differential pitch period.
Accordingly, when the signal segments comprise sequentially a first signal segment and three second signal segments, the pitch value in the first signal segment is indicative of an absolute pitch period and the pitch value in each of the second
Heikkinen Ari
Pietilä Samuli
Ruoppila Vesa T.
Nokia Mobile Phones Ltd.
Nolan Daniel A.
To Doris H.
Ware Fressola Van Der Sluys & Adolphson LLP