Multipulse search processing method and speech coding apparatus

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

: 2000-09-18
: 2002-12-31
: McFadden, Susan (Department: 2654)
: Data processing: speech signal processing, linguistics, language
: Speech signal processing
: For storage or transmission

: C704S219000, C704S220000
: Reexamination Certificate
: active
: 06502068
: ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a speech coding apparatus and, more particularly, to a speech coding apparatus for coding an input speech signal using an MPEG-4/CELP scheme as one of code excited linear prediction coding schemes of modeling a sound source using a multipulse.
MPEG-4/CELP (Moving Picture Experts Group phase 4) is one of CELP (Code Excited Linear Prediction) schemes as general-purpose speech coding schemes standardized by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) in February, 1999. There are two coding modes, MPE (MultiPulse Excitation) and RPE (Regular Pulse Excitation) in accordance with the type of sound source code book. In both the MPE and RPE modes, the sound source is modeled by a multipulse made up of a plurality of impulses. However, the degrees of freedom for the pulse position have a difference. The RPE mode uses a constant pulse interval, whereas the MPE mode has a high degree of freedom for the pulse position. Because of this difference, the MPE mode can achieve higher speech quality than in the RPE mode, but suffers a large required calculation amount.
The basic operation of a speech coding apparatus using the MPEG-4/CELP scheme in the MPE mode will be described with reference to FIG. 5.
As shown in FIG. 5, this speech coding apparatus is constituted by an LPC (Linear Predictive Coding) analysis unit 401, quantization unit 402, LPC filter 403, speech synthesis unit 404, and subtracter 412.
Speech coding is done by segmenting the input speech into frames of a predetermined duration and using each frame as a compression unit.
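The frame segmentation described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `split_into_frames` is hypothetical, the default of 40 samples is the frame length the text gives for the 8,300 bps rate, and zero-padding the final partial frame is an assumption of this sketch.

```python
def split_into_frames(samples, frame_len=40):
    """Split a sample sequence into consecutive frames of frame_len samples.

    Assumption of this sketch: a trailing partial frame is zero-padded.
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))  # pad the last frame if short
        frames.append(frame)
    return frames
```

Each returned frame would then be coded independently as one compression unit.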
An input speech signal as original speech is subjected to LPC analysis by the LPC analysis unit 401, and quantized by the quantization unit 402. A code speech-synthesized by the speech synthesis unit 404 and a code quantized by the quantization unit 402 are filtered by the LPC filter 403 to generate reproduced speech. The subtracter 412 calculates the difference between the original speech and the reproduced speech, and outputs an error signal 405. The error signal 405 is input to the speech synthesis unit 404 to select the parameters of the speech synthesis unit 404 so as to minimize the error signal 405. When the error signal 405 is minimized, the speech synthesis model and the input speech approximate each other. The parameters of the speech synthesis unit 404 which minimize the error signal 405 form an MPEG-4/CELP code.
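The closed-loop selection described above (synthesize, subtract, keep the parameters that minimize the error) can be sketched roughly as below. Everything here is an illustrative assumption rather than the standard's procedure: the helper names are hypothetical, the LPC filter is modeled as a plain all-pole filter, and the error measure is an unweighted sum of squared differences.

```python
def lpc_synthesis(excitation, lpc_coeffs):
    """All-pole filter: y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def squared_error(original, reproduced):
    """Energy of the error signal (assumed, unweighted error measure)."""
    return sum((o - r) ** 2 for o, r in zip(original, reproduced))


def select_excitation(original, candidates, lpc_coeffs):
    """Return (index, error) of the candidate excitation whose
    reproduced speech is closest to the original frame."""
    best_index, best_error = None, float("inf")
    for i, cand in enumerate(candidates):
        err = squared_error(original, lpc_synthesis(cand, lpc_coeffs))
        if err < best_error:
            best_index, best_error = i, err
    return best_index, best_error
```

The index returned by `select_excitation` plays the role of the transmitted code: the decoder can regenerate the same excitation from it.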
The speech synthesis unit 404 comprises multipliers 409 and 410, an adder 411, and three parameters: an ACB (Adaptive Code Book) 406, an MP (MultiPulse) code book 407, and a GCB (Gain Code Book) 408.
The ACB 406 is generated from many basic speech models of a corresponding person on the basis of the primitive period of the sound source, and generates a pitch period component. The MP code book 407 expresses the noise/error of the sound source by the positions and amplitudes of a plurality of pulses (multipulse), and generates a random component other than the pitch period component. The GCB 408 represents the mixing ratio of the ACB 406 and MP code book 407. That is, the multiplier 409 multiplies a pitch period component generated by the ACB 406 by the mixing ratio of the ACB 406 controlled by the GCB 408, while the multiplier 410 multiplies a random component generated by the MP code book 407 by the mixing ratio of the MP code book 407 controlled by the GCB 408. Outputs from the multipliers 409 and 410 are added by the adder 411 to perform speech synthesis.
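The mixing performed by the multipliers 409 and 410 and the adder 411 amounts to a gain-weighted sum of the two code book outputs. A minimal sketch, assuming both components are equal-length sample lists and using hypothetical names for the function and its arguments:

```python
def synthesize_excitation(acb_vector, mp_vector, acb_gain, mp_gain):
    """Scale the pitch period component (ACB) and the random component
    (MP code book) by their gains, then add them sample by sample."""
    return [acb_gain * a + mp_gain * m for a, m in zip(acb_vector, mp_vector)]
```

Here `acb_gain` and `mp_gain` stand in for the mixing ratios that the GCB 408 supplies.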
Processing of selecting a multipulse which minimizes the error signal 405 from the MP code book 407 is called multipulse search processing. The multipulse search processing method as the feature of the MPE mode is disclosed in Japanese Patent Laid-Open No. 7-160298.
In multipulse search processing, a position where each pulse can be set is uniquely determined for each pulse. Therefore, in multipulse search processing, distortions are calculated and added for respective set pulse position candidates in ascending order of pulse numbers, and a combination exhibiting the smallest distortion is obtained. The “distortion” is a correlation coefficient between adjacent pulses. Multipulse search processing creates a multipulse search table which stores a distortion for each pulse position candidate set for each pulse number, and determines the position and amplitude of each pulse based on the multipulse search table. This multipulse search table must be created for each frame serving as a speech compression unit.
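The idea of adding distortions in ascending order of pulse numbers and keeping the combination with the smallest total can be sketched as an exhaustive enumeration. This is a simplification for illustration only: it does not use the table-based method the text describes, and `pair_distortion` is a hypothetical caller-supplied cost between adjacent pulses, not the actual distortion measure.

```python
from itertools import product


def search_multipulse(candidates, pair_distortion):
    """Find the pulse position combination with the smallest total distortion.

    candidates: one list of candidate positions per pulse number,
                in ascending order of pulse number.
    pair_distortion(p, q): hypothetical cost between adjacent pulses
                           placed at positions p and q.
    """
    best_positions, best_total = None, float("inf")
    for positions in product(*candidates):
        # Add up the distortion between each pair of adjacent pulses.
        total = sum(pair_distortion(p, q)
                    for p, q in zip(positions, positions[1:]))
        if total < best_total:
            best_positions, best_total = positions, total
    return best_positions, best_total
```

With five pulses and eight candidates each, this loop would visit 8^5 = 32,768 combinations per frame, which gives a sense of why the MPE mode is described as computationally demanding.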
FIG. 6 shows the structure of the MP code book 407 for performing multipulse search processing in a conventional speech coding apparatus.
A search table creation unit 508 creates a multipulse search table 307 on the basis of an inter-pulse distortion table 301 and pulse position candidate table 302.
The contents of the pulse position candidate table 302 are shown in Table 1.
TABLE 1
Pulse Number    Pulse Position Candidate m_i
1               0, 5, 10, 15, 20, 25, 30, 35
2               1, 6, 11, 16, 21, 26, 31, 36
3               2, 7, 12, 17, 22, 27, 32, 37
4               3, 8, 13, 18, 23, 28, 33, 38
5               4, 9, 14, 19, 24, 29, 34, 39
The pulse position candidate table exists for each compression bit rate. Table 1 represents a pulse position candidate table for an MPEG-4/CELP compression bit rate of 8,300 bps. The number of pulses is five, and pulses are given by pulse numbers 1, 2, . . . , 5 sequentially from the top. For a bit rate of 8,300 bps, the number of samples in one frame serving as a compression unit is 40, and 40 pulses having an amplitude of ±1 are modeled to be expressed by five pulses. The pulse position candidate table in Table 1 has pulse position candidates for each pulse number. The pulse position candidate interval for each pulse number is uniquely determined.
As the modeling method, the pulse position candidate table is arranged at the nodes of a tree structure as shown in FIG. 7.
FIG. 8 shows the structure of the multipulse search table 307. The multipulse search table 307 stores a distortion 704 between adjacent pulses for each pulse position candidate 703 present for each pulse number 702. The pulse interval in obtaining each pulse position candidate and a distortion between adjacent pulses varies from 1 to the maximum number of samples of one frame at a pulse position candidate interval. Distortions are calculated every pulse interval, and stored as the inter-pulse distortion table 301 as shown in FIG. 6.
Multipulse search processing in the conventional speech coding apparatus will be explained with reference to the flow charts of FIGS. 9 and 10.
The multipulse search processing sequence has a quadruple loop structure made up of, sequentially from the outer loop, a loop whose end condition (step S901) is whether processing has been performed up to the maximum pulse position candidate interval from an initial value of 1 at a distance increment of 1 using an inter-pulse distance for obtaining a distortion as an index, a loop whose end condition (step S902) is whether processing has been performed for the maximum number of samples of one frame from an initial number of 1 at a pulse position candidate interval of 1, a loop whose end condition (step S903) is whether processing has been performed for the number of pulses to be modeled, i.e., pulse numbers, and a loop whose end condition (step S904) is whether processing has been performed for the number of pulse position candidates at each pulse number. Whether processing has been done for the maximum number of samples of one frame from an initial number of 1 at a pulse position candidate interval of 1 is determined (step S902). Then, a distortion between pulses having a distance set by the outermost loop is obtained, and distortions of one frame are stored in the inter-pulse distortion table 301 (step S905). In these loops, the multipulse search table 307 is created (s

Affiliated with

Misu Katsuya

Inventor

Also associated with

Dickstein Shapiro Morin & Oshinsky LLP.

Law Firm

McFadden Susan

Examiner
