Acoustic modeling using a two-level decision tree in a...

Data processing: speech signal processing – linguistics – language – recognition

Reexamination Certificate


Details

C704S256000


active

06789063

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
The present invention relates to speech recognition and, more particularly, to a speech recognition system with a two-level decision tree.
2. Background Art
In large vocabulary continuous speech recognition systems, context-dependent phones, typically triphones, and continuous density HMM models are often used to obtain high accuracy acoustic models. The huge number of triphones and multivariate Gaussian mixture distributions results in a very large number of parameters. A central problem is maintaining a good balance between model complexity and the number of parameters that can be robustly estimated from the limited training data. Phonetic decision trees provide a good solution to this problem and have two advantages over bottom-up based approaches. First, by incorporating phonetic knowledge of the target language into the tree, they can synthesize unseen models or contexts, which do not appear in the training data but occur during recognition. Second, the splitting procedure of decision trees provides a way of controlling model complexity so that the parameters can be robustly estimated.
A phonetic decision tree is a type of classification and regression tree (CART). In decision-tree based acoustic modeling, phonetic decision trees are constructed either for each phone model or for each HMM state of each phone. Since the state-based approach provides a more detailed level of sharing and outperforms the model-based approach, it is widely used. The phonetic decision tree is a binary tree in which a yes-no question about the phonetic context is attached to each node. An example question is “Is the phone on the right of the current phone a vowel?” A set of states can be recursively partitioned into subsets according to the answers to the questions at each node when traversing the tree from the root node to its leaf nodes. All states that reach the same leaf node are considered similar and are clustered together. The question set can be either manually pre-defined using linguistic and phonetic knowledge of the language, or automatically generated.
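As a minimal sketch of the traversal just described (the class, question, and phone labels here are illustrative assumptions, not taken from the patent), answering the yes-no question at each node routes a triphone context down to a leaf cluster:

```python
# Hypothetical phonetic decision tree: each internal node holds a yes/no
# question about the phonetic context; leaves collect the states deemed
# similar enough to share parameters.

class Node:
    def __init__(self, question=None, yes=None, no=None, label=None):
        self.question = question  # function: context dict -> bool
        self.yes, self.no = yes, no
        self.label = label        # leaf cluster id (None for internal nodes)

def classify(node, context):
    """Walk from the root to a leaf; return the leaf's cluster label."""
    while node.label is None:
        node = node.yes if node.question(context) else node.no
    return node.label

# Example question: "Is the phone on the right of the current phone a vowel?"
VOWELS = {"aa", "ae", "ah", "iy", "uw"}
is_right_vowel = lambda ctx: ctx["right"] in VOWELS

tree = Node(question=is_right_vowel,
            yes=Node(label="cluster_A"),
            no=Node(label="cluster_B"))

print(classify(tree, {"left": "s", "center": "t", "right": "iy"}))  # cluster_A
print(classify(tree, {"left": "s", "center": "t", "right": "k"}))   # cluster_B
```

Because the routing depends only on the phonetic context, an unseen triphone still reaches some leaf, which is how the tree synthesizes models for contexts absent from the training data.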
The tree construction is a top-down, data-driven process based on a one-step greedy tree-growing algorithm. The goodness-of-split criterion is the maximum likelihood (ML) of the training data. Initially, all corresponding HMM states of all triphones that share the same basic phone are pooled in the root node, and the log-likelihood of the training data is calculated under the assumption that all states in the node are tied. The node is then split in two by the question that gives the maximum increase in the log-likelihood of the training data when partitioning the states in the node. This process is repeated until the increase falls below a threshold. To ensure that each leaf node has sufficient training data to robustly estimate its state, a minimum data count per leaf node is also enforced.
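The one-step greedy split above can be sketched as follows, assuming a single diagonal-covariance Gaussian per node (as in the traditional method); the function names and the `frames_of` mapping from state to its training frames are illustrative assumptions:

```python
import numpy as np

def pooled_loglik(frames):
    """Log-likelihood of frames under one tied diagonal Gaussian
    (the standard ML clustering criterion; constants kept for clarity)."""
    n, d = frames.shape
    var = frames.var(axis=0) + 1e-8          # ML diagonal covariance
    return -0.5 * n * (d * (1 + np.log(2 * np.pi)) + np.log(var).sum())

def best_split(states, questions, frames_of, min_count=100):
    """Greedy one-step split: choose the question with the largest
    log-likelihood gain over the pooled (tied) node."""
    pool = np.vstack([frames_of[s] for s in states])
    base = pooled_loglik(pool)
    best = (None, 0.0)
    for q in questions:
        yes = [s for s in states if q(s)]
        no = [s for s in states if not q(s)]
        if not yes or not no:
            continue                          # question does not partition the node
        fy = np.vstack([frames_of[s] for s in yes])
        fn = np.vstack([frames_of[s] for s in no])
        if len(fy) < min_count or len(fn) < min_count:
            continue                          # enforce the minimum data count
        gain = pooled_loglik(fy) + pooled_loglik(fn) - base
        if gain > best[1]:
            best = (q, gain)
    return best
```

Recursing with `best_split` on each child until the returned gain falls below a threshold reproduces the stopping rule described above.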
Although the traditional method provides an effective and efficient way to build a decision tree for continuous density HMM models based on the maximum likelihood criterion, it has several problems. One stems from the assumption that the initial unclustered states are modeled with only single mixture Gaussian distributions. After the tree is built, the clustered states have more training data, and the number of Gaussian components in each state is increased by a mixture-splitting procedure until the performance of the model set peaks on a development set. Single Gaussian distributions are used during tree construction because a multiple mixture Gaussian distribution for a tree node would need to be re-estimated from the training data, whereas the parameters of a single mixture Gaussian distribution can be calculated efficiently from the cluster members without re-accessing the original training data. However, a single Gaussian distribution is a very crude representation of the acoustic space of each state, and decision trees based on such initial models may not give good clustering of states. Several efforts address this problem. One approach incorporates a so-called m-level optimal subtree into the traditional tree construction to obtain a multiple mixture Gaussian distribution parameterization of each node, although each member state still has only a single Gaussian distribution, as in the traditional approach. Another approach directly estimates, by making some assumptions, the multiple mixture Gaussian distribution for a tree node from the statistics of the member states, which also have multiple mixture Gaussian distributions. Both approaches achieve some improvement. Yet another approach estimates the multiple mixture Gaussian distributions of the unclustered states by using the fixed state alignment provided by a previously trained, accurate model set.
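The efficiency argument above is worth making concrete: the ML parameters of a merged node follow directly from each member state's sufficient statistics (frame count, sum, and sum of squares), so no pass over the original training data is needed. A scalar sketch, where the `(n, sum_x, sum_x2)` record layout is an assumption for illustration:

```python
def merge_states(stats):
    """Single-Gaussian ML parameters of a merged cluster, computed purely
    from member-state sufficient statistics (count, sum, sum of squares)
    without re-accessing the training frames.

    stats: list of (n, sum_x, sum_x2) tuples, one per member state.
    """
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sx2 = sum(s[2] for s in stats)
    mean = sx / n
    var = sx2 / n - mean ** 2                 # pooled ML variance
    return n, mean, var

# Two hypothetical member states: one saw frames [1, 2, 3], the other [4, 5].
print(merge_states([(3, 6.0, 14.0), (2, 9.0, 41.0)]))  # (5, 3.0, 2.0)
```

No such closed-form merge exists for multiple mixture Gaussians, whose component responsibilities change when clusters merge, which is exactly why re-estimation from the training data would otherwise be required.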
However, this approach has not been shown to improve performance. Another problem with the standard tree-building process is that constructing an optimal tree is an NP-hard problem; instead, a sub-optimal one-step greedy algorithm is used. To make better decisions at each node split, look-ahead search may be applied, yet no improvement is obtained. Many efforts address other aspects of the traditional decision-tree based state-clustering approach, such as applying other goodness-of-split criteria, using cross-validation to automatically determine tree size by pruning back instead of using thresholds that must be tuned through many experiments, and expanding the question set to incorporate more knowledge of the language.


REFERENCES:
patent: 5657424 (1997-08-01), Farrell et al.
patent: 5857169 (1999-01-01), Seide
patent: 6192353 (2001-02-01), Assaleh et al.
patent: 6317712 (2001-11-01), Kao et al.
Chien et al., “Compact Decision Trees with Cluster Validity for Speech Recognition,” IEEE 2002, pp. 873-876.*
Kuhn et al., “Improved Decision Trees for Phonetic Modeling,” IEEE 1995, pp. 552-555.*
6th European Conference on Speech Communication and Technology, Eurospeech '99, Sep. 5-9, 1999, Budapest, Hungary, pp. 1-2, <telecom.tuc.gr/paperdb/eurospeech99/HTML/First.HTM>, retrieved from the Web on Aug. 7, 2003.
S.J. Young et al., “Tree-Based State Tying for High Accuracy Acoustic Modelling,” ARPA Human Language Technology Workshop, pp. 307-312, 1994.
C. Liu et al., “High Accuracy Acoustic Modeling Using Two-Level Decision-Tree Based Tying,” Proceedings of the 6th European Conference on Speech Communication and Technology (EuroSpeech), vol. 4, pp. 1703-1706, 1999.
