Learning apparatus, learning method, recognition apparatus,...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission


U.S. classes: C704S211000, C704S236000
Type: Reexamination Certificate
Status: active
Patent number: 06449591

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a learning apparatus and a learning method, and particularly to a learning apparatus, a learning method, a recognition apparatus, a recognition method, and a recording medium which enable recognition of a signal including a nonlinear time component, such as speech or the like, without considering the time component.
Also, the present invention relates particularly to a learning apparatus, a learning method, a recognition apparatus, a recognition method, and a recording medium which are capable of improving a recognition rate by providing models capable of sufficiently expressing, for example, a transition of a state or the like.
Further, the present invention relates to a learning apparatus, a learning method, a recognition apparatus, a recognition method, and a recording medium which are capable of dealing with parameters concerning speech and images with equal weights, for example, where speech recognition is carried out based on both a speech and an image of the speaker's lips captured while the speech is pronounced.
For example, with respect to speech, the length of a word extends or contracts nonlinearly from utterance to utterance, even when the same person pronounces the same word twice. Therefore, speech recognition must cope with such nonlinear extension or contraction of length. For example, the DP (Dynamic Programming) matching method is known as a method in which matching against a standard pattern is carried out while the time axis is nonlinearly extended or contracted by DTW (Dynamic Time Warping).
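As an illustration only (not part of the patent text), the following is a minimal sketch of such DP matching: two sequences of feature vectors are aligned by nonlinear time-axis extension or contraction, and the accumulated cost is used as the matching score. The feature representation, the Euclidean local distance, and the symmetric step pattern are assumptions made for this sketch.

```python
import numpy as np

def dtw_distance(input_seq, template):
    """Minimal DTW: align two sequences of feature vectors by nonlinear
    time-axis extension/contraction and return the accumulated matching
    cost (smaller = better match)."""
    n, m = len(input_seq), len(template)
    # local distance between every pair of frames (Euclidean here)
    local = np.array([[np.linalg.norm(x - y) for y in template] for x in input_seq])
    acc = np.full((n, m), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # symmetric step pattern: diagonal match, insertion, deletion
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = local[i, j] + best_prev
    return acc[-1, -1] / (n + m)   # length-normalized cost

# usage (hypothetical templates): the recognized word is the standard
# pattern with the smallest DTW cost
# templates = {"hello": np.load(...), "world": np.load(...)}
# word = min(templates, key=lambda w: dtw_distance(features, templates[w]))
```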
However, even if the time-axis extension or contraction is carried out by the DP matching method, there is no guarantee that phonemes of an inputted speech properly correspond to phonemes of a standard pattern. If the phonemes do not correspond properly, a recognition error occurs.
Meanwhile, if matching can be performed without considering nonlinear time components of speech, recognition errors due to time-axis extension or contraction as described above can be prevented.
Also, the HMM (Hidden Markov Model) method has been conventionally known as an algorithm for recognizing speech. In the discrete HMM method, learning is carried out in advance so that models corresponding to recognition targets are obtained. From each model, the probability (observation probability) that an input series corresponding to an inputted speech is observed is calculated on the basis of the state transition probabilities given to the model (the probabilities that a state transits to another state, which normally include a transition to the state itself) and the output probabilities (the probabilities that a certain code (label or symbol) is outputted when a state transition occurs). Further, based on the observation probability, the inputted speech is recognized.
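The observation probability described above can be illustrated with the standard forward algorithm for a discrete HMM. This is a generic sketch, not the patent's own procedure; it uses the common formulation in which a symbol is emitted per state at each step (the description above associates output with transitions, which differs only in bookkeeping), and the parameter names pi, A, and B are assumptions.

```python
import numpy as np

def observation_probability(symbols, pi, A, B):
    """Forward algorithm for a discrete HMM.

    pi[i]   : probability of starting in state i
    A[i, j] : state transition probability from state i to state j
              (self-transitions A[i, i] are normally allowed)
    B[i, k] : probability that code/symbol k is output in state i
    symbols : observed code (label) sequence, e.g. obtained by
              vector-quantizing the input speech
    Returns P(symbols | model), the observation probability.
    """
    alpha = pi * B[:, symbols[0]]          # initialize with the first symbol
    for k in symbols[1:]:
        alpha = (alpha @ A) * B[:, k]      # propagate states, emit next symbol
    return alpha.sum()

# recognition: evaluate every model and pick the most likely one
# models = {"yes": (pi1, A1, B1), "no": (pi2, A2, B2)}  # hypothetical
# result = max(models, key=lambda w: observation_probability(seq, *models[w]))
```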
Meanwhile, with respect to learning in the HMM method, the manufacturer of a system determines the number of states and the form of state transitions (e.g., a constraint by which a transition from a state is limited to either the state itself or the state immediately to its right), and models defined in this way are used to carry out the learning.
However, the models thus determined by the system manufacturer do not always match the number of states or the form of state transitions that the recognition targets inherently have. If the models do not match them, some models cannot correctly express steady states or transient states, and as a result the recognition rate deteriorates.
Further, for example, recognition of speech is achieved by extracting a characteristic parameter from the speech and comparing the characteristic parameter with a standard parameter (standard pattern) serving as a reference.
Meanwhile, if recognition is carried out based only on the speech itself, improvement of the recognition rate is limited to some extent. Hence, a method can be considered in which the recognition rate is improved by additionally using an image obtained by picking up the lips of the speaker while the speech is pronounced.
In this case, a characteristic parameter extracted from the speech and a characteristic parameter extracted from the image of the lips are integrated (combined) into an integrated parameter, and recognition of the speech can be carried out using this integrated parameter.
However, if the characteristic parameter of the speech and the characteristic parameter of the image are simply joined in parallel to achieve recognition, the recognition may be influenced far more strongly by one of the two (i.e., the speech or the image is effectively weighted more than the other), which hinders improvement of the recognition rate.
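Purely as an illustration of this problem (the normalization shown is an assumed remedy, not the method of the present invention), the following sketch contrasts naive parallel joining of the two characteristic parameters with a per-component scaling that gives both modalities comparable weight:

```python
import numpy as np

def integrate(speech_param, image_param):
    """Naive integration: simply join the two characteristic parameters
    in parallel.  If the speech parameters have a much larger numerical
    range than the image parameters (or vice versa), any distance
    computed on the result is dominated by one modality."""
    return np.concatenate([speech_param, image_param])

def integrate_normalized(speech_param, image_param, speech_stats, image_stats):
    """One possible (hypothetical) remedy: scale each component by
    statistics gathered from training data so that both modalities
    contribute with comparable weight."""
    s = (speech_param - speech_stats["mean"]) / speech_stats["std"]
    v = (image_param - image_stats["mean"]) / image_stats["std"]
    return np.concatenate([s, v])
```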
SUMMARY OF THE INVENTION
An advantage of the present invention is, therefore, to improve the recognition rate by enabling recognition without considering the time component of a signal.
Another advantage of the present invention is to improve the recognition rate of speech and the like by providing a model which can sufficiently express the number of states and the like which a recognition target inherently has.
A further advantage of the present invention is to improve recognition performance by making it possible to deal with characteristic parameters of different inputs, such as speech and an image, with equal weights.
To this end, a learning apparatus according to an embodiment of the present invention is provided. The learning apparatus includes calculation means for calculating an expectation degree of each identifier, from a series of identifiers indicating code vectors, obtained from a time series of learning data.
A learning method according to an embodiment of the present invention calculates an expectation degree of each identifier, from a series of identifiers indicating code vectors, obtained from a time series of learning data.
A recording medium according to an embodiment of the present invention records a program having a calculation step of calculating an expectation degree of each identifier, from a series of identifiers indicating code vectors, obtained from a time series of learning data.
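As a non-authoritative sketch of what such a calculation step could look like, the following assumes that the time series of learning data is first vector-quantized against a codebook, and that the expectation degree of each identifier is obtained by counting (and normalizing) how often that identifier is observed; the codebook, the nearest-neighbor quantizer, and the counting scheme are all assumptions made for the sketch.

```python
import numpy as np

def vector_quantize(frames, codebook):
    """Map each feature vector to the identifier (index) of the nearest
    code vector in the codebook."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def learn_expectation_degrees(training_series, codebook):
    """Sketch of the learning step: for every training utterance of one
    recognition target, vector-quantize it and accumulate how often each
    identifier is observed.  The normalized counts serve as the
    expectation degree of each identifier."""
    counts = np.zeros(len(codebook))
    for frames in training_series:            # each item: (T, D) feature array
        ids = vector_quantize(frames, codebook)
        counts += np.bincount(ids, minlength=len(codebook))
    return counts / counts.sum()              # expectation degree per identifier
```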
A recognition apparatus according to the present invention includes vector quantization means for vector-quantizing input data and for outputting a series of identifiers indicating code vectors. Properness detection means are provided for obtaining properness as to whether or not the input data corresponds to the recognition target, with use of the series of identifiers obtained from the input data and expectation degrees of identifiers. Recognition means are provided for recognizing whether or not the input data corresponds to the recognition target, based on the properness.
A recognition method according to the present invention is characterized in that: input data is vector-quantized, thereby to output a series of identifiers indicating code vectors; properness as to whether or not the input data corresponds to a recognition target is obtained with use of the series of identifiers obtained from the input data and expectation degrees of the identifiers at which the identifiers are expected to be observed; and whether or not the input data corresponds to the recognition target is recognized, based on the properness.
A recording medium according to the present invention is characterized by recording a program including: a vector-quantization step of vector-quantizing the time series of input data pieces, thereby outputting a series of identifiers indicating code vectors; a properness detection step of obtaining properness as to whether or not the time series of input data pieces corresponds to the recognition target, with use of the series of identifiers obtained from the input data and expectation degrees of the identifiers at which the identifiers are expected to be observed; and a recognition step of recognizing whether or not the time series of input data pieces corresponds to the recognition target, based on the properness.
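A minimal sketch of such recognition, under the same assumptions as the learning sketch above (and with an average log expectation degree used as an assumed properness score), could look as follows; note that the score does not depend on the order of the identifiers, i.e. on the time component of the input:

```python
import numpy as np

def properness(identifier_series, expectation_degrees, eps=1e-12):
    """Score how well a series of identifiers (code-vector indices
    obtained by vector-quantizing the input data) matches the
    expectation degrees learned for one recognition target.  The
    average log expectation degree is used purely as an illustrative
    properness score."""
    return float(np.mean(np.log(expectation_degrees[identifier_series] + eps)))

def recognize(identifier_series, models, threshold=-10.0):
    """Pick the recognition target whose expectation degrees give the
    highest properness, or report no match below a threshold."""
    scores = {name: properness(identifier_series, degrees)
              for name, degrees in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

# usage (hypothetical): ids = vector_quantize(frames, codebook) as in the
# learning sketch; models = {"yes": degrees_yes, "no": degrees_no}
# print(recognize(ids, models))
```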
