Reexamination Certificate
2000-07-20
2002-12-17
Chawan, Vijay B. (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S221000
active
06496796
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to a voice coding apparatus for compressing a digital sound signal to a smaller information amount and a voice decoding apparatus for decoding voice code generated by the voice coding apparatus, etc., to reproduce the digital sound signal.
Most related-art voice coding and voice decoding apparatus separate input voice into spectrum envelope information and a sound source, code them in frame units to generate voice code, and then decode the voice code, combining the spectrum envelope information and the sound source through a combining filter to provide decoded voice.
A voice coding apparatus and a voice decoding apparatus using a code-excited linear prediction (CELP) technique are available as the most representative voice coding apparatus and voice decoding apparatus.
FIG. 15 shows the general configuration of a CELP base voice coding apparatus. In the figure, numeral 1 denotes input voice, numeral 2 denotes linear prediction analysis means, numeral 3 denotes linear prediction coefficient coding means, numeral 4 denotes adaptive sound source coding means, numeral 5 denotes drive sound source coding means, numeral 6 denotes gain coding means, numeral 7 denotes multiplexing means, and numeral 8 denotes voice code.
FIG. 16 shows the general configuration of a CELP base voice decoding apparatus. In the figure, numeral 9 denotes demultiplexing means, numeral 10 denotes linear prediction coefficient decoding means, numeral 11 denotes adaptive sound source decoding means, numeral 12 denotes drive sound source decoding means, numeral 13 denotes gain decoding means, numeral 14 denotes a combining filter, and numeral 15 denotes output voice.
The voice coding apparatus and the voice decoding apparatus in the related art perform processing in frame units with about 5 to 50 ms as a frame. The operation of the voice coding apparatus and the voice decoding apparatus in the related art is as follows:
First, in the voice coding apparatus, the input voice 1 is input to the linear prediction analysis means 2 and the adaptive sound source coding means 4. The linear prediction analysis means 2 analyzes the input voice 1 and extracts a linear prediction coefficient of voice spectrum envelope information. The linear prediction coefficient coding means 3 codes the linear prediction coefficient and outputs the code to the multiplexing means 7 and also outputs the coded linear prediction coefficient for coding a sound source.
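As an illustration of the analysis step, the following is a minimal sketch of one common way to obtain linear prediction coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. The text does not say which analysis method the linear prediction analysis means 2 uses, so the method, the function names, and the sign convention A(z) = 1 + sum a[k] z^-k are assumptions made for this sketch.

```python
def autocorrelation(frame, order):
    """Autocorrelation of one frame for lags 0..order (no windowing, for brevity)."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, err) where a[0] = 1 and A(z) = 1 + sum a[k] z^-k.

    r is the autocorrelation sequence r[0..order]; err is the residual
    prediction error energy after the final recursion step.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

A coder would typically call `levinson_durbin(autocorrelation(frame, p), p)` once per frame and pass the result on to the coefficient coding stage.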
The adaptive sound source coding means 4, in which past sound sources are previously stored as an adaptive sound source code book, prepares time-series vectors periodically repeating the past sound sources corresponding to the adaptive sound source codes. Next, the adaptive sound source coding means 4 multiplies each time-series vector by an appropriate gain and allows the result to pass through a combining filter using the coded linear prediction coefficient for providing a tentative composite tone. It examines the distance between the tentative composite tone and the input voice 1, selects an adaptive sound source code to minimize the distance, and outputs the time-series vector corresponding to the selected adaptive sound source code as the adaptive sound source. The adaptive sound source coding means 4 also outputs the input voice 1 or a signal provided by subtracting the composite tone based on the adaptive sound source from the input voice 1 to the drive sound source coding means 5 at the following stage.
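The adaptive sound source search described above can be sketched as follows. This is a simplified illustration, not the apparatus itself: it assumes the adaptive sound source code is a lag into the past excitation, that the "appropriate gain" is the least-squares optimal gain, that the distance is squared error, and that the combining filter starts each frame from zero state. All function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole combining filter 1/A(z), A(z) = 1 + sum lpc[k-1] z^-k, zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x - sum(c * out[n - k] for k, c in enumerate(lpc, start=1) if n - k >= 0)
        out.append(y)
    return out


def adaptive_vector(past, lag, length):
    """Time-series vector that periodically repeats the last `lag` past samples."""
    segment = past[-lag:]
    return [segment[n % lag] for n in range(length)]


def search_adaptive(target, past, lpc, lags):
    """Return (distance, lag, gain, vector) for the lag that minimizes the distance."""
    best = None
    for lag in lags:
        vec = adaptive_vector(past, lag, len(target))
        syn = synthesize(vec, lpc)  # tentative composite tone before the gain
        energy = sum(s * s for s in syn)
        if energy == 0.0:
            continue
        gain = sum(t * s for t, s in zip(target, syn)) / energy
        dist = sum((t - gain * s) ** 2 for t, s in zip(target, syn))
        if best is None or dist < best[0]:
            best = (dist, lag, gain, vec)
    return best
```

Because the filter is linear, scaling the vector before or after filtering gives the same composite tone, which is why the gain can be solved in closed form after a single filtering pass per candidate.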
The drive sound source coding means 5 first reads time-series vectors sequentially from a drive sound source code book stored in the drive sound source coding means 5 corresponding to drive sound source codes. Next, the drive sound source coding means 5 multiplies each time-series vector and the adaptive sound source by an appropriate gain, adds the results, and allows the addition result to pass through a combining filter using the coded linear prediction coefficient for providing a tentative composite tone. It uses the input voice 1 or the signal provided by subtracting the composite tone based on the adaptive sound source from the input voice 1 as a signal to be coded, examines the distance between the signal to be coded and the tentative composite tone, selects a drive sound source code to minimize the distance, and outputs the time-series vector corresponding to the selected drive sound source code as the drive sound source.
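A minimal sketch of the drive sound source search follows, using the second of the two options in the text: the signal to be coded is the input voice minus the composite tone based on the adaptive sound source. The squared-error distance, the closed-form gain, the zero-state filter, and the names `search_drive` and `adaptive_syn` (the already gain-scaled, filtered adaptive contribution) are assumptions of this sketch.

```python
def synthesize(excitation, lpc):
    """All-pole combining filter 1/A(z), A(z) = 1 + sum lpc[k-1] z^-k, zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x - sum(c * out[n - k] for k, c in enumerate(lpc, start=1) if n - k >= 0)
        out.append(y)
    return out


def search_drive(input_frame, adaptive_syn, codebook, lpc):
    """Return (distance, code, gain) for the code book entry that minimizes the distance."""
    # Signal to be coded: input voice minus the adaptive composite tone.
    target = [x - a for x, a in zip(input_frame, adaptive_syn)]
    best = None
    for code, vec in enumerate(codebook):  # read the code book sequentially
        syn = synthesize(vec, lpc)
        energy = sum(s * s for s in syn)
        if energy == 0.0:
            continue
        gain = sum(t * s for t, s in zip(target, syn)) / energy
        dist = sum((t - gain * s) ** 2 for t, s in zip(target, syn))
        if best is None or dist < best[0]:
            best = (dist, code, gain)
    return best
```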
The gain coding means 6 first reads gain vectors sequentially from a gain code book stored in the gain coding means 6 corresponding to gain codes. The gain coding means 6 multiplies the adaptive sound source and the drive sound source by each element of each gain vector, adds the results, and allows the addition result to pass through a combining filter using the coded linear prediction coefficient for providing a tentative composite tone. It examines the distance between the tentative composite tone and the input voice 1 and selects a gain code to minimize the distance.
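The gain search can be sketched in the same way. Here each gain vector is assumed to be a pair (adaptive gain, drive gain), the distance is assumed to be squared error, and the filter again starts from zero state; `search_gain` is a hypothetical name.

```python
def synthesize(excitation, lpc):
    """All-pole combining filter 1/A(z), A(z) = 1 + sum lpc[k-1] z^-k, zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x - sum(c * out[n - k] for k, c in enumerate(lpc, start=1) if n - k >= 0)
        out.append(y)
    return out


def search_gain(input_frame, adaptive, drive, gain_codebook, lpc):
    """Return (distance, code) for the gain vector that minimizes the distance."""
    best = None
    for code, (g_a, g_d) in enumerate(gain_codebook):
        # Scale both sound sources by the gain vector elements and add them.
        excitation = [g_a * a + g_d * d for a, d in zip(adaptive, drive)]
        syn = synthesize(excitation, lpc)  # tentative composite tone
        dist = sum((x - s) ** 2 for x, s in zip(input_frame, syn))
        if best is None or dist < best[0]:
            best = (dist, code)
    return best
```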
Last, the adaptive sound source coding means 4 multiplies the adaptive sound source and the drive sound source by each element of the gain vector corresponding to the selected gain code and adds the results, thereby preparing a sound source and updating the adaptive sound source code book.
The multiplexing means 7 multiplexes the linear prediction coefficient code, the adaptive sound source code, the drive sound source code, and the gain code and outputs a provided voice code 8.
In the voice decoding apparatus, the demultiplexing means 9 demultiplexes the voice code 8 into the linear prediction coefficient code, the adaptive sound source code, the drive sound source code, and the gain code.
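Multiplexing and demultiplexing of the four codes can be illustrated with simple fixed-width bit packing. The text gives no bit allocation, so the field widths in `FIELDS`, the field names, and the big-endian byte layout below are purely hypothetical choices for this sketch.

```python
# Hypothetical bit widths per frame; the text does not specify an allocation.
FIELDS = [("lpc", 10), ("adaptive", 7), ("drive", 12), ("gain", 6)]
TOTAL_BITS = sum(width for _, width in FIELDS)


def multiplex(codes):
    """Pack the four codes into one byte string, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = codes[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} code {value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes((TOTAL_BITS + 7) // 8, "big")


def demultiplex(data):
    """Recover the four codes from a byte string produced by multiplex()."""
    word = int.from_bytes(data, "big")
    codes = {}
    for name, width in reversed(FIELDS):  # last packed field sits in the low bits
        codes[name] = word & ((1 << width) - 1)
        word >>= width
    return codes
```

With these widths a frame occupies 35 bits, which the sketch pads to 5 bytes; `demultiplex(multiplex(codes))` returns the original codes.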
The linear prediction coefficient decoding means 10 decodes the linear prediction coefficient from the linear prediction coefficient code and sets the linear prediction coefficient as a coefficient of the combining filter 14.
Next, the adaptive sound source decoding means 11, in which past sound sources are previously stored as an adaptive sound source code book, outputs time-series vectors periodically repeating the past sound sources corresponding to the adaptive sound source codes. The drive sound source decoding means 12 outputs the time-series vector corresponding to the drive sound source code. The gain decoding means 13 outputs the gain vector corresponding to the gain code. The two time-series vectors are multiplied by each element of the gain vector and the results are added for preparing a sound source. This sound source is made to pass through the combining filter 14 to prepare an output voice 15.
Last, the adaptive sound source decoding means 11 uses the prepared sound source to update the adaptive sound source code book.
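The decoding steps above can be sketched as a small stateful class. This is an illustration under assumptions: the adaptive sound source code is taken to be a lag, the gain vector a pair (adaptive gain, drive gain), the linear prediction coefficients are passed in already decoded, and the class and method names are hypothetical.

```python
class CelpDecoderSketch:
    """Toy frame-by-frame decoder: excitation, combining filter, code book update."""

    def __init__(self, drive_codebook, gain_codebook, history_len):
        self.drive_codebook = drive_codebook
        self.gain_codebook = gain_codebook
        self.past = [0.0] * history_len  # adaptive sound source code book
        self.memory = []                 # recent output samples, newest last

    def decode_frame(self, lpc, lag, drive_code, gain_code, length):
        # Adaptive sound source: periodic repetition of the past sound source.
        segment = self.past[-lag:]
        adaptive = [segment[n % lag] for n in range(length)]
        drive = self.drive_codebook[drive_code]
        g_a, g_d = self.gain_codebook[gain_code]
        excitation = [g_a * a + g_d * d for a, d in zip(adaptive, drive)]

        # Combining filter 1/A(z), A(z) = 1 + sum lpc[k-1] z^-k, with memory
        # carried over from the previous frame.
        hist = list(self.memory)
        out = []
        for x in excitation:
            y = x - sum(c * hist[-k] for k, c in enumerate(lpc, start=1)
                        if k <= len(hist))
            hist.append(y)
            out.append(y)
        order = len(lpc)
        self.memory = hist[-order:] if order else []

        # Update the adaptive code book with the prepared sound source.
        self.past = (self.past + excitation)[-len(self.past):]
        return out
```

Keeping the filter memory and the excitation history inside the object reflects the fact that both persist across frames, which a per-frame function with zero state would lose.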
Next, related arts intended for improving the CELP base voice coding apparatus and voice decoding apparatus will be discussed.
Document 1 (KATAOKA Akitoshi, HAYASHI Shinji, MORITANI Takehiro, KURIHARA Shoko, MANO Kazunori, “CS-ACELP no kihon algorithm,” NTT R&D, Vol. 45, pp. 325-330, April 1996) discloses a CELP base voice coding apparatus and voice decoding apparatus that adopt a pulse sound source for coding the drive sound source, mainly to reduce the operation amount and the memory amount. In this related-art configuration, a drive sound source is represented only by the position information and polarity information of several pulses. Such a sound source, called an algebraic sound source, has a good coding characteristic for its simple structure and has been adopted in most recent standards.
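To make the idea of an algebraic sound source concrete, the sketch below builds a drive sound source vector from nothing but pulse positions and polarities. The frame length of 40 samples and the use of four pulses follow the figures this document gives for Document 1; the function name and the particular positions in the usage example are arbitrary illustrations, not entries taken from FIG. 17.

```python
def pulse_excitation(pulses, frame_length=40):
    """Build a drive sound source from (position, polarity) pairs.

    Each pulse has unit amplitude; polarity must be +1 or -1.
    """
    vec = [0.0] * frame_length
    for position, polarity in pulses:
        if not 0 <= position < frame_length:
            raise ValueError(f"pulse position {position} is outside the frame")
        if polarity not in (1, -1):
            raise ValueError("polarity must be +1 or -1")
        vec[position] += polarity
    return vec


# Four pulses at arbitrary example positions.
example = pulse_excitation([(0, 1), (6, -1), (12, 1), (38, -1)])
```

No code book of stored waveforms is needed: the decoder regenerates the vector from the transmitted positions and signs, which is where the memory saving mentioned above comes from.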
FIG. 17 is a table listing position candidates of pulse sound sources used in Document 1. In Document 1, the sound source coding frame length is 40 samples and each drive sound source consists of four pulses. The position candidates of each of the pulse sound sources with sound source numbers 1 to 3 are limited to eight positions as shown in FIG. 17, and each pulse position can be
Tasaki Hirohisa
Yamaura Tadashi
Chawan Vijay B.
Mitsubishi Denki Kabushiki Kaisha
Oblon, Spivak, McClelland, Maier & Neustadt, P.C.
Opsasnick Michael N.