Reexamination Certificate
2000-09-12
2004-03-16
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
active
06708151
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a reference pattern generating apparatus and method for generating a reference pattern having a high representation efficiency for use in speech recognition. The invention also relates to a computer readable medium having a reference pattern generating program embodied thereon that may be used to implement the method.
2. Description of the Prior Art
In general, speech recognition employs a pattern matching method that compares an input voice pattern with a set of reference patterns associated with words by computing pattern matching distances between the input voice pattern and the reference patterns, and furnishes the word exhibiting the minimum pattern matching distance as the recognition result. A word's reference pattern can be a time series of frames or feature vectors X(1), X(2), X(3), . . . , X(T) obtained from an input voice pattern of the word, where T is the length of the word (i.e. the number of frames). Since T depends on the word, the size of the reference pattern also depends on the word. Therefore, the size of memory used for storing a set of reference patterns cannot be determined in advance, even if the number of words is predetermined. In addition, as the number of frames T assigned to each word increases, the amount of storage used for storing the set of reference patterns also increases.
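The minimum-distance decision rule described above can be sketched as follows. This is only an illustration: the text does not say how the pattern matching distance is computed, so the `distance` function is a caller-supplied assumption, and the name `recognize` is hypothetical.

```python
def recognize(input_pattern, reference_patterns, distance):
    """Return the word whose reference pattern is closest to input_pattern.

    reference_patterns: dict mapping each word to its reference pattern.
    distance: caller-supplied function (input_pattern, reference) -> float;
              its definition is an assumption, not taken from the text.
    """
    return min(
        reference_patterns,
        key=lambda word: distance(input_pattern, reference_patterns[word]),
    )
```

Any distance measure that accepts an input pattern and a reference pattern can be plugged in; the sketch only shows the selection of the word with the smallest distance.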
In order to solve such problems, an apparatus for and method of compressing the time series of feature vectors X(1), X(2), X(3), . . . , X(T) obtained from an input voice pattern of each word and generating a predetermined number J(>1) of states of reference pattern for each word, J being independent of the number of frames of each word, have been studied.
Referring now to FIG. 17, there is illustrated a block diagram showing the structure of such a prior art reference pattern generating apparatus as disclosed in Japanese Patent Application Publication (TOKKAISHO) No. 64-44997, for example. In the figure, reference numeral 1 denotes an input terminal for receiving a voice signal 2 applied thereto, numeral 3 denotes an analysis unit for performing an acoustic analysis on the input voice signal 2, numeral 4 denotes a time series of feature vectors, which is obtained as an acoustic analysis result from the input voice signal 2 by the analysis unit 3, numeral 5 denotes an initial reference pattern generating unit for generating an initial reference pattern 6 from the time series 4 of feature vectors, and numeral 7 denotes a reference pattern generating unit for generating a reference pattern 8 from the initial reference pattern 6.
In operation, a voice signal generated from a person's voice is applied to the input terminal 1 to create a reference pattern. The analysis unit 3 analog-to-digital converts the voice signal 2 applied to the input terminal and then performs an acoustic analysis on the analog-to-digital converted voice signal on a per-frame basis (a certain short time interval is called a “frame”). Based on the acoustic analysis result, the analysis unit 3 extracts a speech region (a region in which speech is present) from the digital voice signal and then calculates a time series 4 of feature vectors X(1), X(2), X(3), . . . , X(T) from the speech region. Each feature vector X(t) (t=1, 2, 3, . . . , T) is calculated on a per-frame basis. T is the number of frames included in the speech region extracted from the digital voice signal, i.e. the number of feature vectors. Since it is difficult to extract the speech region precisely from the digital voice signal, the first and last few frames can include a pause. Each feature vector X(t) can be an LPC cepstrum obtained with linear prediction (LPC) analysis.
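The per-frame processing and speech region extraction described above might look roughly like the sketch below. It rests on stated assumptions: the text does not give the frame length or the criterion for detecting speech, so non-overlapping frames and a mean-energy threshold are used purely for illustration, and the names `split_into_frames` and `speech_region` are hypothetical. The LPC cepstrum computation is not shown.

```python
def split_into_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    A trailing partial frame is dropped. Non-overlapping framing is an
    assumption; the text only says a short time interval is a frame.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def speech_region(frames, threshold):
    """Return (first, last) 0-based indices of the frames whose mean energy
    exceeds threshold, or None if no frame does.

    The energy threshold is an illustrative stand-in for the unspecified
    acoustic analysis used to find the speech region.
    """
    energies = [sum(s * s for s in frame) / len(frame) for frame in frames]
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

With a simple threshold like this, quiet frames at the edges of the region can be kept or dropped depending on the threshold, which is one way the imprecision mentioned above can arise.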
The initial reference pattern generating unit 5 receives the time series 4 of feature vectors X(1), X(2), X(3), . . . , X(T) from the analysis unit 3, and then generates an initial value 6 of the reference pattern following a procedure, which will be mentioned later. Referring next to FIG. 18, there is illustrated a flow diagram showing the procedure of generating the initial reference pattern 6.
The initial reference pattern generating unit 5, in step ST101 of FIG. 18, divides the time series 4 of feature vectors X(1), X(2), X(3), . . . , X(T) into J (J>1) small sections B(1), B(2), B(3), . . . , B(J) in such a manner that any two adjacent small sections do not overlap each other and they are equal in length where possible. Start and end frames sz(j) and ez(j) of each small section B(j) (j=1, 2, 3, . . . , J) are given by the following equations (1) to (3):

$$L = \left[ \frac{T}{J} \right] \qquad (1)$$

$$sz(j) = \begin{cases} 1 & \text{for } j = 1 \\ ez(j-1) + 1 & \text{for } j = 2, \ldots, J \end{cases} \qquad (2)$$

$$ez(j) = \begin{cases} sz(j) + L - 1 & \text{for } j = 1, \ldots, J-1 \\ T & \text{for } j = J \end{cases} \qquad (3)$$

where [ ] of the equation (1) is an arithmetic operation to round off the number in the square brackets to produce an integer.
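Equations (1) to (3) can be sketched in a few lines of code. This is a minimal sketch under assumptions: the name `section_bounds` is hypothetical, and "round off" is interpreted here as round-half-up, which the text does not state explicitly. With that reading, a small T can leave the last section empty, so the sketch rejects such inputs.

```python
def section_bounds(T, J):
    """Return [(sz(1), ez(1)), ..., (sz(J), ez(J))] as 1-based inclusive
    frame indices, following equations (1) to (3)."""
    if J < 2:
        raise ValueError("J must be greater than 1")
    # Equation (1): L = [T/J]. Round-half-up in integer arithmetic
    # (an assumed reading of "round off").
    L = (2 * T + J) // (2 * J)
    # Guard (not in the text): the last section must contain a frame.
    if L < 1 or (J - 1) * L + 1 > T:
        raise ValueError("T is too small for J sections under this rounding")
    bounds = []
    sz = 1  # equation (2), case j = 1
    for j in range(1, J + 1):
        ez = T if j == J else sz + L - 1  # equation (3)
        bounds.append((sz, ez))
        sz = ez + 1  # equation (2), case j = 2, ..., J
    return bounds
```

For T = 15 and J = 5 this yields the sections (1, 3), (4, 6), (7, 9), (10, 12), (13, 15). When T is not a multiple of J, the last section absorbs the difference because equation (3) fixes ez(J) = T.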
FIG. 19 shows an example in which the number T of frames or feature vectors X(1), X(2), X(3), . . . , X(T) is 15, and the number J of small sections B(1), B(2), B(3), . . . , B(J) or states of the reference pattern is 5. In this example, the time series 4 of feature vectors X(1), X(2), X(3), . . . , X(15) is divided into the plurality of small sections B(1), B(2), B(3), . . . , B(J) in such a manner that they are equal in length and B(1) includes feature vectors X(1) to X(3), B(2) includes feature vectors X(4) to X(6), . . . , and B(5) includes feature vectors X(13) to X(15).
The initial reference pattern generating unit 5 then advances to step ST102 in which it averages part of the time series 4 of feature vectors included in each small section B(j) obtained in step ST101 according to the following equation (4) to generate an initial value Rz(j) (j=1, 2, 3, . . . , J):

$$Rz(j) = \frac{1}{ez(j) - sz(j) + 1} \sum_{k=sz(j)}^{ez(j)} X(k) \qquad (4)$$
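The averaging of equation (4) can be illustrated with a short sketch. It assumes feature vectors are plain lists of floats and that section boundaries are supplied as 1-based inclusive (sz, ez) pairs; the function name `initial_reference_pattern` is hypothetical.

```python
def initial_reference_pattern(X, bounds):
    """Compute Rz(1), ..., Rz(J) by averaging each section, per equation (4).

    X: list of T feature vectors, each a list of floats (X[0] is X(1)).
    bounds: list of 1-based inclusive (sz, ez) pairs, one per section.
    """
    Rz = []
    for sz, ez in bounds:
        section = X[sz - 1:ez]  # frames sz..ez as a 0-based slice
        n = len(section)        # equals ez - sz + 1
        dim = len(section[0])
        # Component-wise mean of the feature vectors in this section.
        Rz.append([sum(v[d] for v in section) / n for d in range(dim)])
    return Rz
```

Each of the J states stores a single averaged vector, so the size of the result depends on J and the vector dimension but not on T.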
The process of generating the initial value Rz(j) (j=1, 2, 3, . . . , J) for each small section, for the case where the number J of states of the reference pattern is 5, is shown in FIG. 19. The initial value Rz(1) for the first state is produced by averaging the time series of the leading three feature vectors X(1) to X(3) included in the first small section B(1), the initial value Rz(2) for the second state is produced by averaging the time series of the three feature vectors X(4) to X(6) included in the second small section B(2), . . . , and the initial value Rz(5) for the fifth state is produced by averaging the time series of the last three feature vectors X(13) to X(15) included in the fifth small section B(5).
Through the above-mentioned averaging process, the initial value Rz(j) for each small section is determined in such a manner that the sum D(j) of squared Euclidean distances between the state Rz(j) of the initial reference pattern and the feature vectors X(sz(j)) to X(ez(j)) included in the small section B(j), which is calculated according to the following equation (5), is minimized.

$$D(j) = \sum_{k=sz(j)}^{ez(j)} \left\| Rz(j) - X(k) \right\|^{2} \qquad (5)$$
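A short worked derivation, added here as an illustration and not taken from the original text, shows why the average of equation (4) minimizes D(j) of equation (5). Setting the gradient of D(j) with respect to Rz(j) to zero gives:

```latex
\begin{align*}
D(j) &= \sum_{k=sz(j)}^{ez(j)} \left\| Rz(j) - X(k) \right\|^{2} \\
\frac{\partial D(j)}{\partial Rz(j)}
  &= 2 \sum_{k=sz(j)}^{ez(j)} \bigl( Rz(j) - X(k) \bigr) = 0 \\
\bigl( ez(j) - sz(j) + 1 \bigr)\, Rz(j) &= \sum_{k=sz(j)}^{ez(j)} X(k) \\
Rz(j) &= \frac{1}{ez(j) - sz(j) + 1} \sum_{k=sz(j)}^{ez(j)} X(k)
\end{align*}
```

Because D(j) is a convex quadratic function of Rz(j), this stationary point is its minimum, and the last line is exactly equation (4).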
In this way, the initial reference pattern generating unit 5 completes the process of generating the initial reference pattern 6.
The reference pattern generating unit 7 receives the initial reference pattern 6 including the plurality of states Rz(1), Rz(2), Rz(3), . . . , Rz(J) generated by the initial reference pattern generating unit 5 and the time series 4 of feature vectors X(1), X(2), X(3), . . . , X(T), which are calculated from the input voice signal by the analysis unit 3. The reference patte
Birch & Stewart Kolasch & Birch, LLP
McFadden Susan
Mitsubishi Denki & Kabushiki Kaisha