Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-05-14
2004-05-11
Robinson, Greta (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C704S256000
Reexamination Certificate
active
06735588
ABSTRACT:
Priority is claimed to Korean Patent Application No. 00-60262 filed on Oct. 13, 2000, herein incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and apparatus for recognizing unknown information using the Hidden Markov Model (HMM), and more particularly, to an information search method and apparatus using an Inverse Hidden Markov Model (IHMM) and the Viterbi algorithm, which can be used for finding the most likely path through the vocabulary network for a given utterance.
2. Description of the Related Art
Various information recognition methods are used for recognizing various types of information. In case that the information is in the form of speech, a speech recognition method is used for recognizing the information of speech. This will now be explained.
Speech recognition technology refers to technology for developing and implementing methods for enabling a machine or a computer to recognize human speech. To implement an interface which enables natural and smooth communications between a machine and a human being through speech, speech signal processing technology, particularly speech recognition technology, is necessary. Thanks to innovative developments in the fields of computer and signal processing technology, studies on speech recognition technology are making rapid progress. Also, the introduction of stochastic techniques, use of affluent speech data, combination of speech and language knowledge, and use of high-speed search algorithms have boosted this research.
In addition, due to the development of the semiconductor industry, very small-sized systems, which can be used for a long time with low power, have become commercially available. In communications fields, particularly the field of wireless communications, power consumption and device sizes can be reduced with hardware, i.e., by using a dedicated chip which performs speech recognition.
Because of their robust modeling capability and high recognition accuracy for speech signals, HMM-based speech recognition methods are extensively used in the speech recognition field. Speech recognition methods involve computing acoustic probabilities, and finding a best matching word using the acoustic probabilities. In particular, isolated word recognition using the HMM consists of two phases: a training phase and a search (recognition) phase. In the training phase, for each word in a predetermined dictionary to be used, the HMM parameters are estimated, and a distinct HMM is built up for each word using a training set of observations. In the search phase, the probability that a given utterance is similar to each word model of the dictionary is computed, and the highest likelihood word model is selected as the recognized word. The Viterbi algorithm is an efficient search technique for finding a best matching word by comparing an input utterance with each of the word models, i.e., reference speech models, in the dictionary, and thus is generally used in the search phase.
A typical speech recognition method is divided into a pre-process step, a core recognition step, and a post-process step. Detailed techniques applied to each of the steps will be described using an example. For the pre-process step, feature parameters which form reference utterances are estimated from an input speech signal. One of the methods to accomplish the pre-process step consists of linear predictive coding (LPC) and a filter bank front-end process.
The core recognition step involves matching and training processes. The estimated parameters express utterance features to match an input speech signal on a phonetic notation and word level. The utterance features of the input speech signal are expressed with the matching and training data set. The post-process step, as a search process, finds a best matching utterance sequence through the vocabulary network for a given utterance.
FIG. 1
is an exemplary state lattice of a conventional method illustrating a path along which an input speech utterance is recognized using a HMM. The state lattice includes a plurality of states
1
through
24
which are searched in order from state
1
to state
24
.
For example, if a word “abbat” is uttered, the state lattice for the word is expressed, as shown in
FIG. 1
, by the general HMM. The HMM applied for speech recognition has local transition paths. Transition to a next state is determined by a previously accumulated state probability (S
t−1
(i)), a transition probability (a
ij
), and a next state observation probability (b
ij
(W(t)), where t is a time variable, i is a variable denoting a state searched at time t in the lateral direction of the state lattice of
FIG. 1
, and j is a variable denoting a state searched at time t−1 in the lateral direction of the state lattice of FIG.
1
. The next state observation probability is determined by a probability density function (P(y|s
(i)
,&phgr;)). State transition to a next state is performed by a conventional speech information search method that uses a Viterbi algorithm based on the maximum likelihood.
A conventional speech information search method will be described with reference to FIG.
2
.
FIG. 2
shows a pseudo code for illustrating the conventional speech information search method consisting of steps
1
through
12
. Referring to
FIG. 2
, in step
1
, a variable Max which indicates the maximum likelihood score is initialized to 0. In step
2
, a variable all_reference_sequence, which indicates the location of each of the reference speech models stored in a dictionary, is set to 1. The variable t is set to 1 in step
3
, and i is set to 1 in step
4
. In step
5
, the maximum state probability (&phgr;
i
(t)) for the i-th state at time t is initialized to 0. In step
6
, j is set to 1.
After all the variables are initialized, the maximum state probability (&phgr;
i
(t)) for the i-th state at time t is computed in consideration of the accumulated state probability for the j-th state searched at time t−1(steps
6
through
8
). In step
8
, “endfor” means that the process goes back to step
6
if j≠N, and goes to step
9
if j=N. Steps
5
through
8
are iterated for all the states from
1
to N, so that N maximum state probabilities are obtained (steps
4
through
9
). The maximum state probabilities are computed for a period of time T to find an optimal path for a particular reference speech model, so that the similarity between the given reference speech model and an input unknown speech signal is obtained (steps
3
through
10
). Steps
3
through
10
are iterated for all the reference speech models (steps
2
through
12
) stored in the dictionary. In step
11
, &phgr;
all—reference—sequence
, which means the optimal path value computed for a particular reference speech model, is compared with the maximum likelihood score (Max), and the maximum likelihood score (Max) is updated with the greater of the two. With the updated maximum likelihood score (Max), steps
2
through
12
are iterated for another reference speech model.
The conventional speech information search method illustrated in
FIG. 2
needs probability computations for all states to obtain the maximum likelihood score. Due to the need for much computation, the conventional speech information search method is unsuitable for high-speed searching. For the conventional speech information search method, all the states are searched, and all the paths for each state must be stored, so that many hardware resources are consumed and many computations are inefficiently required. For these reasons, the conventional speech information search method has problems of increasing power consumption and computation time.
SUMMARY OF THE INVENTION
To solve the above problems, it is a first object of the present invention to provide an information search method using an Inverse Hidden Markov Model (IHMM), which finds an optimal path in a Hidden Markov Model (HMM) state lattice using a minimum unlikelihood score, rather than the maximum likelihood score, and u
Chang Young-hoon
Cho Jun-dong
Kim Bo-Sung
Park Sun-hee
Burns Doane Swecker & Mathis L.L.P.
Robinson Greta
LandOfFree
Information search method and apparatus using Inverse Hidden... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Information search method and apparatus using Inverse Hidden..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Information search method and apparatus using Inverse Hidden... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3215488