Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Reexamination Certificate
1999-04-30
2002-04-09
Šmits, Tālivaldis Ivars (Department: 2641)
C704S254000
active
06370505
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a system and methods for employment in speech recognition.
BACKGROUND OF THE INVENTION
It has long been a goal to provide a machine which recognises human speech and can act upon it, either to perform particular control functions or to transform the speech into written text.
In recent years considerable progress has been made towards this goal, firstly by the provision of systems which recognise individual words, and secondly by the provision of systems which recognise strings of words. Systems of the second kind often operate by assessing the likelihood of a received word being adjacent to other detected words, based upon both the likelihood of the word itself and the grammatical rules and vocabulary of the language being recognised. Whilst some available systems now do this with considerable accuracy, all such systems are computationally expensive: they require a great deal of processing power and high-speed processing circuitry to perform the recognition task at sufficient speed, particularly when assessing the probability that the received speech corresponds to known stored alternatives.
One such known speech recognition system, as part of its statistical assessment of received speech, uses Hidden Markov Models (HMMs) and the evaluation of continuous probability distributions to calculate the likelihood of a particular frame of speech corresponding to a particular output state. Whilst such an evaluation system is effective, it can account for up to 75% of the computational requirement of the whole recognition system.
An alternative system uses a discrete probability distribution (rather than the usual continuous one) for each possible output state, because with a discrete distribution a simple table look-up is all that is needed to determine the likelihood of each output state corresponding to the input speech. There is, however, a considerable reduction in accuracy compared with the employment of continuous probability distributions.
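The table look-up idea can be illustrated with a minimal sketch. It assumes the input vector is first quantised to the nearest entry of a codebook, which the text above does not specify; the codebook, the tables and all function names are hypothetical.

```python
# Hypothetical illustration of a discrete-distribution look-up.
# Assumption: the input vector is quantised to its nearest codeword first.

def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))

def discrete_state_likelihoods(vector, codebook, state_tables):
    """Quantise once, then read each state's likelihood from its table."""
    k = nearest_codeword(vector, codebook)
    return [table[k] for table in state_tables]

# Made-up example data: three codewords, two output states.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
state_tables = [
    [0.7, 0.2, 0.1],  # state 0: probability assigned to each codeword
    [0.1, 0.3, 0.6],  # state 1
]
likelihoods = discrete_state_likelihoods((1.8, 0.1), codebook, state_tables)
```

After quantisation, each state costs a single indexing operation, which is where the saving comes from; the accuracy loss arises because every vector mapped to the same codeword receives identical likelihoods.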
This simplified system has itself been improved by the employment of a semi-continuous system or tied mixture system, in which each possible output state is given a probability based upon a weighted sum of a set of Gaussian components, rather than one of a small set of discrete values. This improves accuracy, but is still not on a par with continuous distribution systems.
In such systems of the prior art, evaluation of the likelihood of the various output states corresponding to the speech vector is achieved by evaluating the likelihood of each mixture component and then summing these likelihoods for the respective output state. Repeating this for all possible output states determines the likelihood of each output state, but is computationally very expensive.
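The exhaustive evaluation described above might be sketched as follows. This is an illustration only: it assumes one-dimensional Gaussian mixture components combined with mixture weights, and every name and number is hypothetical rather than taken from any particular prior-art system.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def exhaustive_state_likelihoods(x, states):
    """Evaluate every mixture component of every state and sum per state.

    `states` holds one list of (weight, mean, var) tuples per output state.
    Returns the state likelihoods and the number of density evaluations made.
    """
    likelihoods = []
    evaluations = 0
    for components in states:
        total = 0.0
        for weight, mean, var in components:
            total += weight * gaussian_pdf(x, mean, var)
            evaluations += 1
        likelihoods.append(total)
    return likelihoods, evaluations

# Made-up example: two states with two and one components respectively.
states = [
    [(0.6, 0.0, 1.0), (0.4, 3.0, 2.0)],
    [(1.0, -2.0, 0.5)],
]
likelihoods, evaluations = exhaustive_state_likelihoods(0.5, states)
```

The evaluation count equals the total number of mixture components across all states, and it is incurred for every frame of speech, which is why this step is computationally expensive.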
SUMMARY OF THE INVENTION
The present invention is directed towards systems and methods using continuous probability distributions, and seeks to overcome some of the problems associated with them, such as their need for high processing speed and large amounts of processing capability.
According to a first aspect of the present invention there is provided a method of processing speech, the method comprising:
receiving the speech and determining therefrom an input speech vector ($o_r$) representing a sample of the speech to be processed; and,
determining the likelihoods of a number of possible output states (j) corresponding to the input speech vector ($o_r$), wherein each output state (j) is represented by a number of state mixture components (m), each state mixture component being a probability distribution function approximated by a weighted sum of a number of predetermined generic probability distribution components (x), the approximation including the step of determining a weighting parameter ($w_{jmx}$) for each generic probability distribution component (x) for each state mixture component (m),
the method of determining the output state (j) likelihoods comprising the steps of:
1) generating a correspondence probability signal representing a correspondence probability ($P_{rx}$), wherein the correspondence probability ($P_{rx}$) is the probability provided by each respective generic probability distribution component (x) based on the input speech vector ($o_r$);
2) generating a threshold signal, representing a threshold value $T_{mix}$;
3) selecting a number of output states (Nj);
4) determining, for each state mixture component (m) of each selected output state (j), whether a weighted probability ($g_{jmr}$) given by the scalar product of the weighting parameters ($w_{jmx}$) and the respective correspondence probabilities ($P_{rx}$), exceeds the threshold value $T_{mix}$; and,
5) generating a set of output signals representing state likelihoods ($b_j$) for each selected output state (j) by evaluating the likelihoods of the state mixture components (m) of the respective selected output state (j) which have a weighted probability ($g_{jmr}$) exceeding the threshold $T_{mix}$.
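Steps 1 to 5 above can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the distributions are one-dimensional Gaussians, the surviving components are combined using hypothetical mixture weights `c`, and all names and numbers are invented for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pruned_state_likelihoods(o_r, generic, states, selected, t_mix):
    """Return state likelihoods b_j and the number of full evaluations made."""
    # Step 1: correspondence probability of each generic component x.
    p = [gaussian_pdf(o_r, mean, var) for (mean, var) in generic]
    b = {}
    evaluated = 0
    for j in selected:  # step 3: only the selected output states
        total = 0.0
        for comp in states[j]:
            # Step 4: weighted probability g_jmr = sum over x of w_jmx * P_rx.
            g = sum(w * p_x for w, p_x in zip(comp["w"], p))
            if g > t_mix:  # step 2 supplies t_mix
                # Step 5: full evaluation only for components above threshold.
                # Assumption: survivors are summed using mixture weight "c".
                total += comp["c"] * gaussian_pdf(o_r, comp["mean"], comp["var"])
                evaluated += 1
        b[j] = total
    return b, evaluated

# Made-up example: two generic components, two output states.
generic = [(0.0, 1.0), (5.0, 1.0)]
states = {
    0: [
        {"c": 0.5, "mean": 0.0, "var": 1.0, "w": [1.0, 0.0]},
        {"c": 0.5, "mean": 5.0, "var": 1.0, "w": [0.0, 1.0]},
    ],
    1: [{"c": 1.0, "mean": 5.0, "var": 1.0, "w": [0.0, 1.0]}],
}
b, evaluated = pruned_state_likelihoods(0.2, generic, states, [0, 1], 1e-3)
```

In this example only one of the three state mixture components passes the threshold, so only one full evaluation is made; the generic components are evaluated once per input vector and shared by every state.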
According to a second aspect of the invention, there is provided a method of processing speech, the method comprising:
receiving the speech and determining therefrom an input speech vector ($o_r$) representing a sample of the speech to be processed; and,
determining the likelihoods of a number of possible output states (j) corresponding to the input speech vector ($o_r$), wherein each output state (j) is represented by a number of state mixture components (m), each state mixture component being a probability distribution function approximated by a weighted sum of a number of predetermined generic probability distribution components (x), the approximation including the step of determining a weighting parameter ($w_{jmx}$) for each generic probability distribution component (x) for each state mixture component (m),
the method of determining the output state (j) likelihoods involving determining whether a weighted probability ($g_{jmr}$) exceeds a threshold value $T_{mix}$ by determining whether a scalar product of the form:
$$S = \sum_{i=1}^{K} A_i \times B_i$$
exceeds the threshold T, where K is a predetermined integer, the determination comprising the steps of:
1) receiving a signal representing the value $A_i$, where $A_i$ represents one of the weighting parameters ($w_{jmx}$);
2) receiving a signal representing the value $B_i$, where $B_i$ represents the correspondence probability ($P_{rx}$) generated from the respective generic probability distribution component (x);
3) generating first, second and third signals representing the values $\log(A_i)$, $\log(B_i)$ and $\log(T)$, respectively;
4) comparing the first, second and third signals and generating an output signal indicating that $S > T$ if:
$$\log(A_i) > P \times \log(T) \quad \text{AND} \quad \log(B_i) > Q \times \log(T)$$
where $0 < P \le 1$ and $0 < Q \le 1$;
5) if no output signal has been generated, repeating steps 1 to 4 for subsequent values of i.
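The comparison in steps 1 to 5 can be sketched as an early-exit loop. The sketch rests on assumptions the text above does not state: that all terms are non-negative, that 0 < T < 1, and that P + Q ≤ 1. Under those assumptions a single term satisfying both inequalities has $A_i \times B_i > T^{P+Q} \ge T$, so S exceeds T without the full sum being computed. The function name and default values are hypothetical.

```python
import math

def proves_sum_exceeds(a, b, t, p=0.5, q=0.5):
    """Return True as soon as one term (A_i, B_i) shows that S > T.

    Returns False when no term passes the test, which does not by itself
    show that S <= T. Assumes non-negative terms, 0 < t < 1 and p + q <= 1.
    """
    log_t = math.log(t)  # step 3: log(T), computed once
    for a_i, b_i in zip(a, b):  # steps 1 and 2: receive A_i and B_i
        if a_i <= 0.0 or b_i <= 0.0:
            continue  # logarithm undefined; this term cannot pass the test
        # Step 4: compare the logarithms, with no multiplication of A_i by B_i.
        if math.log(a_i) > p * log_t and math.log(b_i) > q * log_t:
            return True
        # Step 5: otherwise move on to the next value of i.
    return False

# Made-up example values.
hit = proves_sum_exceeds([0.01, 0.5], [0.01, 0.4], 0.1)
miss = proves_sum_exceeds([0.2, 0.2], [0.2, 0.2], 0.1)
inconclusive = proves_sum_exceeds([0.3, 0.3], [0.2, 0.2], 0.1)
```

The test is sufficient rather than necessary: in the last example the sum is 0.12, which exceeds 0.1, yet no single term passes, so a caller would still need a fallback such as computing the sum directly.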
According to a third aspect of the invention, there is provided a method of processing speech, the method comprising:
receiving the speech and determining therefrom an input speech vector ($o_r$) representing a sample of the speech to be processed; and,
determining the likelihoods of a number of possible output states (j) corresponding to the input speech vector ($o_r$), wherein each output state (j) is represented by a number of state mixture components (m), each state mixture component being a probability distribution function approximated by a weighted sum of a number of predetermined generic probability distribution components (x), the approximation including the step of determining a weighting parameter ($w_{jmx}$) for each generic component (x) for each state mixture component (m),
wherein the method of determining the output state (j) likelihoods comprises determining a classification ($C_{jx}$) of each of the possible output states (j) for each generic component (x), the classification representing the likelihood ($L_{xm}$) of each output state (j) representing the input speech vector ($o_r$), the method of determining the classification comprising the steps of:
1) generat
Magee Theodore M.
Westman Champlin & Kelly PA
Šmits Tālivaldis Ivars