Data processing: speech signal processing, linguistics, language / Speech signal processing / For storage or transmission
Reexamination Certificate
2000-01-05
2004-04-06
To, Doris H. (Department: 2655)
C704S243000
active
06718299
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to information processing apparatuses that integrate a plurality of feature parameters, and in particular to an information processing apparatus that, when speech recognition is performed based on speech and on an image of the speaker's lips captured while the speech was made, increases speech recognition performance by integrating the audio and image feature parameters so that they can be processed in optimal form.
2. Description of the Related Art
By way of example, speech is recognized by extracting feature parameters from the speech, and comparing the feature parameters with normal parameters (normal patterns) used as a reference.
When speech recognition is performed based on speech alone, there is a limit to how far the recognition rate can be increased. Accordingly, speech recognition may be performed based not only on the speech but also on a captured image of the speaker's lips.
In this case, it is also possible to integrate feature parameters extracted from the speech with feature parameters extracted from the lip image to form so-called "integrated parameters", and to use the integrated parameters to perform speech recognition. The assignee of the present patent application has proposed, in Japanese Patent Application No. 10-288038 (which was not open to the public when the present patent application was filed), a type of speech recognition that generates such integrated parameters from speech and lip-image feature parameters and performs recognition on them.
With reference to FIGS. 1 to 16, Japanese Patent Application No. 10-288038 is described below.
FIG. 1 shows an example of a speech recognition apparatus that performs speech recognition based on integrated parameters obtained by integrating feature parameters extracted from a plurality of types of input data.
In addition to the speech data (a user's utterance) to be recognized, the speech recognition apparatus sequentially receives, in time series, image data obtained by capturing the user's lips while the user spoke, noise data on the noise in the environment where the user spoke, and other data useful in recognizing the user's speech, such as a signal indicating the place where the user speaks, in the case where the apparatus is provided with an input unit for entering that place. The speech recognition apparatus takes these types of data into consideration, as required, when performing speech recognition.
Specifically, the speech data, the lip-image data, the noise data, and other data, which are in digital form, are input to a parameter unit 1. The parameter unit 1 includes signal processors 11₁ to 11ₙ (where N represents the number of data signals input to the parameter unit 1). The speech data, the lip-image data, the noise data, and other data are each processed by the corresponding signal processor among 11₁ to 11ₙ, which extracts feature parameters representing that type of data. The feature parameters extracted by the parameter unit 1 are supplied to an integrated parameter generating unit 2.
In the parameter unit 1 shown in FIG. 1, the signal processor (lip-signal processor) 11₁ processes the lip-image data, the signal processors (audio-signal processors) 11₂ to 11ₙ₋₁ process the speech data, and the signal processor (audio-signal processor) 11ₙ processes the noise data, etc. The feature parameters of sound data such as the speech data and the noise data include, for example, linear prediction coefficients, cepstrum coefficients, power, line spectrum pairs, and the zero-crossing rate. The feature parameters of the lip-image data include, for example, parameters defining an ellipse that approximates the shape of the lips (e.g., the major and minor axes of the ellipse).
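As an illustration of the audio side, frame-based power and zero-crossing-rate extraction might look as follows; the frame length, hop size, and NumPy implementation are assumptions for the sketch, not values from the patent.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Per-frame power and zero-crossing rate for a 1-D audio signal.

    A minimal sketch of two of the audio feature parameters mentioned
    in the text (power and zero cross); frame_len and hop are
    illustrative values.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = float(np.mean(frame ** 2))             # average energy in the frame
        signs = np.sign(frame)
        zcr = float(np.mean(signs[1:] != signs[:-1]))  # fraction of adjacent sign changes
        feats.append((power, zcr))
    return np.array(feats)                             # shape: (num_frames, 2)
```

Each row of the result is one frame's (power, zero-crossing rate) pair, ready to be passed on as that frame's audio feature parameters.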
The integrated parameter generating unit 2 includes an intermedia normalizer 21 and an integrated parameter generator 22, and generates integrated parameters by integrating the feature parameters of the signals from the parameter unit 1.
In other words, the intermedia normalizer 21 normalizes the feature parameters of the signals from the parameter unit 1 so that they can be processed with the same weight, and outputs the normalized parameters to the integrated parameter generator 22. The integrated parameter generator 22 integrates (combines) the normalized feature parameters of the signals from the intermedia normalizer 21, thereby generating integrated parameters, and outputs the integrated parameters to a matching unit 3.
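The normalize-then-combine step can be sketched as follows; the element-wise multiply by a per-medium coefficient and the simple concatenation are assumptions, since the text states only that the feature parameters are normalized and then integrated (combined).

```python
import numpy as np

def integrate(feature_sets, coefficients):
    """Normalize each medium's feature vector and concatenate them.

    A minimal sketch of the intermedia normalizer 21 and the
    integrated parameter generator 22: each medium's feature vector is
    multiplied by its normalization coefficient so that all media
    carry the same weight, then the normalized vectors are joined into
    one integrated parameter vector.
    """
    normalized = [np.asarray(f, dtype=float) * c
                  for f, c in zip(feature_sets, coefficients)]
    return np.concatenate(normalized)

# Hypothetical example: audio features, lip-ellipse features, and
# per-medium normalization coefficients.
audio = [0.5, 1.2, 0.8]   # e.g., power and cepstrum terms
lips = [12.0, 7.0]        # e.g., major and minor axis of the lip ellipse
integrated = integrate([audio, lips], [1.0, 0.1])
# integrated -> [0.5, 1.2, 0.8, 1.2, 0.7]
```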
The matching unit 3 compares the integrated parameters with normal patterns (a model to be recognized), and outputs the matching results to a determining unit 4. Specifically, the matching unit 3 includes a distance-transition matching unit 31 and a spatial distribution matching unit 32. The distance-transition matching unit 31 matches the integrated parameters against a distance-transition model by a distance-transition method (both described below), and outputs the matching results to the determining unit 4. The spatial distribution matching unit 32 matches the integrated parameters by a spatial distribution method (described below), and outputs the matching results to the determining unit 4.
The determining unit 4 recognizes the user's speech (sound) based on the outputs from the matching unit 3, i.e., the matching results from the distance-transition matching unit 31 and the spatial distribution matching unit 32, and outputs the result of recognition, e.g., a word. In this example the unit recognized is thus a word, but another unit, such as a phoneme, can also be recognized.
With reference to the flowchart shown in FIG. 2, processing by the speech recognition apparatus (shown in FIG. 1) is described below.
When the speech data, the lip-image data, the noise data, etc., are input to the speech recognition apparatus, they are supplied to the parameter unit 1.
In step S1, the parameter unit 1 extracts feature parameters from the supplied data and outputs them to the integrated parameter generating unit 2.
In step S2, the intermedia normalizer 21 (in the integrated parameter generating unit 2) normalizes the feature parameters from the parameter unit 1 and outputs the normalized feature parameters to the integrated parameter generator 22.
In step S3, the integrated parameter generator 22 generates integrated parameters by integrating the normalized feature parameters from the intermedia normalizer 21. The integrated parameters are supplied to the distance-transition matching unit 31 and the spatial distribution matching unit 32 in the matching unit 3.
In step S4, the distance-transition matching unit 31 matches the integrated parameters by the distance-transition method, and the spatial distribution matching unit 32 matches them by the spatial distribution method. Both matching results are supplied to the determining unit 4.
In step S5, based on the matching results from the matching unit 3, the determining unit 4 recognizes the speech data (the user's speech). After outputting the result of (speech) recognition, the determining unit 4 terminates its process.
As described above, the intermedia normalizer 21 (shown in FIG. 1) normalizes the feature parameters of the signals from the parameter unit 1 so that they can be processed with the same weight. The normalization is performed by multiplying each feature parameter by a normalization coefficient. This normalization coefficient is found by performing learning.
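The passage breaks off here, but one common way to learn such per-medium coefficients is to scale each medium by the inverse of its average feature spread over training data; the inverse-standard-deviation rule below is an illustrative assumption, not the patent's actual learning procedure.

```python
import numpy as np

def learn_coefficients(training_sets):
    """Learn one normalization coefficient per medium.

    training_sets: list of 2-D arrays, one per medium, each of shape
    (num_examples, num_features). Each coefficient is the inverse of
    the medium's average per-feature standard deviation, so that after
    multiplying by it every medium's features have comparable scale.
    This inverse-std rule is an assumption for illustration only.
    """
    coeffs = []
    for data in training_sets:
        spread = np.mean(np.std(np.asarray(data, dtype=float), axis=0))
        coeffs.append(1.0 / spread if spread > 0 else 1.0)
    return coeffs
```

For example, if the audio features vary with a standard deviation around 1 while the lip-ellipse features vary around 10, the learned coefficients come out near 1.0 and 0.1, which is what brings both media to the same weight before integration.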
Kondo Tetsujiro
Yoshiwara Norifumi
Frommer William S.
Frommer & Lawrence & Haug LLP
Kessler Gordon
Opsasnick Michael N.
Sony Corporation