Data processing: speech signal processing, linguistics, language – Speech signal processing – Recognition
Reexamination Certificate (active)
Patent Number: 06466908
Filed: 2000-01-14
Issued: 2002-10-15
Examiner: Dorvil, Richemond (Department: 2654)
U.S. Class: C704S240000
ABSTRACT:
STATEMENT OF GOVERNMENT INTEREST
The invention described herein may be manufactured and used by or for the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefor.
BACKGROUND OF THE INVENTION
(1) Field of the Invention
This invention relates to systems and methods for modeling physical phenomena, and more particularly to a system and method for modeling physical phenomena, such as speech, using a class-specific implementation of the Baum-Welch algorithm for estimating the parameters of a class-specific hidden Markov model (HMM).
(2) Description of the Prior Art
By way of example of the state of the art, reference is made to the following papers, which are incorporated herein by reference. Not all of these references may be deemed to be relevant prior art.
P. M. Baggenstoss, “Class-specific features in classification,” IEEE Trans. Signal Processing, December 1999.
S. Kay, “Sufficiency, classification, and the class-specific feature theorem,” to be published in IEEE Trans. Information Theory.
B. H. Juang, “Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains,” AT&T Technical Journal, vol. 64, no. 6, pp. 1235-1249, 1985.
L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, pp. 257-286, February 1989.
L. E. Baum, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Stat., vol. 41, pp. 164-171, 1970.
E. L. Lehmann, Theory of Point Estimation, New York: Wiley, 1983.
S. Kay, Modern Spectral Estimation: Theory and Applications, Prentice Hall, 1988.
E. J. Hannan, Multiple Time Series, Wiley, 1970.
M. H. Quenouille, “The joint distribution of serial correlation coefficients,” Ann. Math. Stat., vol. 20, pp. 561-571, 1949.
Many systems, e.g., communication, data processing and other information systems, can be described or characterized in terms of a series of transitions through a set of states. Hidden Markov models (HMMs) have found applications in modeling physical phenomena characterized by a finite number of states. Often these states represent distinct physical phenomena. In speech, for example, the human voice is characterized by distinct physical phenomena or modes, e.g., voiced speech, fricatives, stops, and nasal sounds. In speech processing applications, the speech modes or components are first modeled by HMMs using an algorithm to estimate parameters for the HMMs (referred to as the training phase). The trained HMMs can then be used to determine which speech components are present in a speech signal (referred to as the recognition phase).
For the classical hidden Markov model (HMM), all observations are assumed to be realizations of a random statistical model that depends on the Markov state. Although the statistical models, i.e. the observation probability density functions (PDF's), are different for each state, they are defined on the same observation space. The dimension of this observation space needs to be high to adequately observe the information content of the data for all states. The high dimension requires a large number of observations (or training samples) and leads to poor performance with limited amounts of training data.
In speech, for example, a different set of parameters controls the uttered sound during each of the speech modes. Furthermore, a distinct type of signal processing or feature extraction is best suited to estimate the corresponding parameters of each mode. But, since one cannot know a priori which mode is in effect at a given instant of time and cannot change the observation space accordingly, it is necessary to operate in a unified observation space.
This requires a feature set that carries enough information for the estimation of all modes. This in turn leads to dimensionality issues since there is only a finite amount of data with which to train the observation PDF estimates. In effect, the observation PDF's of each state are represented using a feature set with higher dimension than would be necessary if the other states did not exist. The amount of data required to estimate a PDF is exponentially dependent on feature dimension. Given limitations of computer storage and available data, feature dimensions above a certain point are virtually impossible to accurately characterize. As a result, one may be forced to use a subset of the intended feature set to reduce dimension or else suffer the effects of insufficient training data.
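The exponential dependence of training-data requirements on feature dimension can be illustrated with a simple histogram density estimator: with B bins per axis, the number of cells (each of which needs samples to be characterized) grows as B^d. The sketch below is illustrative only; the bin count is an assumption for the example, not a figure from the patent.

```python
# Illustrative only: a histogram-based PDF estimate needs samples in every
# cell, and the number of cells grows exponentially with feature dimension d.
def histogram_cells(bins_per_dim: int, dim: int) -> int:
    """Number of cells in a d-dimensional histogram with B bins per axis."""
    return bins_per_dim ** dim

# With 10 bins per axis, even modest dimensions become infeasible:
for d in (2, 5, 10):
    print(f"d={d:2d}: {histogram_cells(10, d):,} cells")
```

Note that halving the dimension from 10 to 5 reduces the cell count from 10^10 to 10^5, which is why even a modest reduction in feature dimension matters so much.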
Consider a hidden Markov model (HMM) for a process with N states numbered S_1 through S_N. Let the raw data be denoted X[t], for time steps t=1, 2, . . . , T. The parameters of the HMM, denoted λ, comprise the state transition matrix A={a_ij}, the state prior probabilities u_j, and the state observation densities b_j(X), where i and j range from 1 to N. These parameters can be estimated from training data using the Baum-Welch algorithm, as disclosed in the papers by Rabiner and Juang. But, because X[t] is often of high dimension, it may be necessary to reduce the raw data to a set of features z[t]=T(X[t]). We then define a new HMM with the same A and u_j but with observations z[t], t=1, 2, . . . , T, and the state densities b_j(z) (we allow the argument of the density functions to imply the identity of the function; thus b_j(X) and b_j(z) are distinct).
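In code, the classical HMM parameters λ = (A, u_j, b_j) and the forward recursion that underlies the Baum-Welch algorithm can be sketched as follows. This is a toy discrete-observation HMM; all of the numbers are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Toy HMM with N=2 states and a discrete observation alphabet of size 3.
A = np.array([[0.7, 0.3],          # state transition matrix {a_ij}
              [0.4, 0.6]])
u = np.array([0.6, 0.4])           # state prior probabilities u_j
B = np.array([[0.5, 0.4, 0.1],     # b_j(z): P(observation z | state S_j)
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Forward recursion: alpha[t, j] = P(z[1..t], state at t = S_j)."""
    alpha = np.zeros((len(obs), len(u)))
    alpha[0] = u * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

obs = [0, 1, 2]                    # an observation sequence z[1..T]
alpha = forward(obs)
likelihood = alpha[-1].sum()       # P(z[1..T] | lambda)
print(f"P(observations | model) = {likelihood:.6f}")
```

Baum-Welch iterates this forward pass (together with a corresponding backward pass) to reestimate A, u_j, and b_j until the likelihood converges.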
This is the approach used in speech processing today, where z[t] are usually a set of cepstral coefficients. If z[t] is of low dimension, it is practical to apply probability density function (PDF) estimation methods such as Gaussian mixtures to estimate the state observation densities. Such PDF estimation methods tend to give poor results above dimensions of about 5 to 10 unless the features are exceptionally “well-behaved,” i.e., close to independent or multivariate Gaussian. In human speech, it is doubtful that 5 to 10 features can capture all the relevant information in the data. Traditionally, the choices have been to (1) use a smaller and insufficient feature set, (2) use more features and suffer PDF estimation errors, or (3) apply methods of dimensionality reduction. Such methods include linear subspace analysis, projection pursuit, or simply assuming the features are independent (a factorable PDF). All these methods involve assumptions that do not hold in general.
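As a concrete example of the low-dimensional case, a one-dimensional two-component Gaussian mixture can be fit by the EM algorithm in a few lines. This is a generic EM sketch on synthetic data; the component means, variances, and sample counts are assumptions for the example and do not come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D feature samples drawn from a two-mode density.
z = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])

# EM for a 2-component Gaussian mixture: initialize, then alternate E/M steps.
w = np.array([0.5, 0.5])           # mixture weights
mu = np.array([-1.0, 1.0])         # component means
sig = np.array([1.0, 1.0])         # component standard deviations

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each sample.
    r = w * gauss(z[:, None], mu, sig)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: reestimate weights, means, and variances from responsibilities.
    nk = r.sum(axis=0)
    w = nk / len(z)
    mu = (r * z[:, None]).sum(axis=0) / nk
    sig = np.sqrt((r * (z[:, None] - mu) ** 2).sum(axis=0) / nk)

print(mu)  # the fitted component means should land near -2 and 3
```

In one dimension this works well; the difficulty described above is that the number of samples needed to fit such a mixture reliably grows rapidly with the dimension of z[t].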
The class-specific method was recently developed as a method of dimensionality reduction in classification, as disclosed in U.S. patent application Ser. No. 09/431,716 entitled “Class Specific Classifier.” Unlike other methods of dimension reduction, it is based on sufficient statistics and results in no theoretical loss of performance due to approximation. Because of the exponential relationship between training data size and dimension, even a mere factor of 2 reduction in dimension can result in a significant difference.
SUMMARY OF THE INVENTION
Accordingly, one object of the present invention is to reduce the number of data samples needed for training HMMs.
Another object of the present invention is to extend the idea of dimensionality reduction in classification to the problem of HMM modeling when each state of the HMM may have its own minimal sufficient statistic.
A further object of the present invention is to modify the Baum-Welch algorithm used to estimate parameters of class-specific HMMs.
The foregoing objects are attained by the method and system of the present invention. The present invention features a method of training a class-specific hidden Markov model (HMM) used for modeling physical phenomena characterized by a finite number of states. The method comprises the steps of receiving training data forming an observation sequence; estimating parameters of the class-specific HMM from the training data using a modified Baum-Welch algorithm, wherein the modified Baum-Welch algorithm u
Examiner: Dorvil, Richemond
Attorneys: Kasischke, James M.; McGowan, Michael J.; Oglo, Michael F.
Assignee: The United States of America as represented by the Secretary of