Distributed voice recognition system

Reexamination Certificate (active), No. 06594628, Class C704S270000
Data processing: speech signal processing; linguistics; language; recognition

FIELD OF THE INVENTION
The present invention relates to speech signal processing. More particularly, the present invention relates to a novel method and apparatus for realizing a distributed implementation of a standard voice recognition system.
DESCRIPTION OF THE RELATED ART
Voice recognition represents one of the most important techniques for endowing a machine with simulated intelligence to recognize user-voiced commands and to facilitate human interface with the machine. It also represents a key technique for human speech understanding. Systems that employ techniques to recover a linguistic message from an acoustic speech signal are called voice recognizers (VR). A voice recognizer is composed of an acoustic processor, which extracts a sequence of information-bearing features (vectors) necessary for VR from the incoming raw speech, and a word decoder, which decodes this sequence of features (vectors) to yield a meaningful and desired output format, such as a sequence of linguistic words corresponding to the input utterance. To increase the performance of a given system, training is required to equip the system with valid parameters. In other words, the system needs to learn before it can function optimally.
The acoustic processor represents a front-end speech analysis subsystem in a voice recognizer. In response to an input speech signal, it provides an appropriate representation to characterize the time-varying speech signal. It should discard irrelevant information such as background noise, channel distortion, speaker characteristics and manner of speaking. Efficient acoustic features furnish voice recognizers with higher acoustic discrimination power. The most useful characteristic is the short-time spectral envelope. In characterizing the short-time spectral envelope, the two most commonly used spectral analysis techniques are linear predictive coding (LPC) and filter-bank based spectral analysis models. However, it is readily shown (as discussed in Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals, Prentice Hall, 1978) that LPC not only provides a good approximation to the vocal tract spectral envelope, but is considerably less expensive in computation than the filter-bank model in all-digital implementations. Experience has also demonstrated that the performance of LPC based voice recognizers is comparable to or better than that of filter-bank based recognizers (Rabiner, L. R. and Juang, B. H., Fundamentals of Speech Recognition, Prentice Hall, 1993).
Referring to FIG. 1, in an LPC based acoustic processor, the input speech is provided to a microphone (not shown) and converted to an analog electrical signal. This electrical signal is then digitized by an A/D converter (not shown). The digitized speech signals are passed through preemphasis filter 2 in order to spectrally flatten the signal and to make it less susceptible to finite precision effects in subsequent signal processing. The preemphasis-filtered speech is then provided to segmentation element 4, where it is segmented, or blocked, into either temporally overlapped or nonoverlapped blocks. The frames of speech data are then provided to windowing element 6, where frame DC components are removed and a digital windowing operation is performed on each frame to lessen the blocking effects due to the discontinuity at frame boundaries. The most commonly used window function in LPC analysis is the Hamming window, w(n), defined as:
w(n) = 0.54 − 0.46·cos(2πn/(N−1)),  0 ≤ n ≤ N−1
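The front-end steps described above (preemphasis, segmentation into frames, DC removal, Hamming windowing) can be sketched in a few lines of plain Python. The function names and the 0.95 preemphasis coefficient are illustrative choices, not taken from the patent.

```python
import math

def preemphasize(x, alpha=0.95):
    # First-difference filter y[n] = x[n] - alpha*x[n-1] spectrally flattens the signal.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_len, hop):
    # Block the signal into frames; hop < frame_len gives temporally overlapped blocks.
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(N):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    # Remove the frame's DC component, then taper with the Hamming window.
    mean = sum(frame) / len(frame)
    return [(s - mean) * wn for s, wn in zip(frame, hamming(len(frame)))]
```

Note that the window equals 0.08 at both endpoints and 1.0 at the frame center, which is what suppresses the discontinuity at frame boundaries.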
The windowed speech is provided to LPC analysis element 8. In LPC analysis element 8, autocorrelation functions are calculated based on the windowed samples, and the corresponding LPC parameters are obtained directly from the autocorrelation functions.
Generally speaking, the word decoder translates the acoustic feature sequence produced by the acoustic processor into an estimate of the speaker's original word string. This is accomplished in two steps: acoustic pattern matching and language modeling. Language modeling can be avoided in applications of isolated word recognition. The LPC parameters from LPC analysis element 8 are provided to acoustic pattern matching element 10 to detect and classify possible acoustic patterns, such as phonemes, syllables, words, etc. The candidate patterns are provided to language modeling element 12, which models the rules of syntactic constraints that determine which sequences of words are grammatically well formed and meaningful. Syntactic information can be a valuable guide to voice recognition when acoustic information alone is ambiguous. Based on language modeling, the VR sequentially interprets the acoustic feature matching results and provides the estimated word string.
Both the acoustic pattern matching and the language modeling in the word decoder require a mathematical model, either deterministic or stochastic, to describe the speaker's phonological and acoustic-phonetic variations. The performance of a speech recognition system is directly related to the quality of these two models. Among the various classes of models for acoustic pattern matching, template-based dynamic time warping (DTW) and stochastic hidden Markov modeling (HMM) are the two most commonly used. However, it has been shown that the DTW based approach can be viewed as a special case of the HMM based one, which is a parametric, doubly stochastic model. HMM systems are currently the most successful speech recognition algorithms. The doubly stochastic property of HMM provides better flexibility in absorbing acoustic as well as temporal variations associated with speech signals. This usually results in improved recognition accuracy. Concerning the language model, a stochastic model called the k-gram language model, detailed in F. Jelinek, "The Development of an Experimental Discrete Dictation Recognizer", Proc. IEEE, vol. 73, pp. 1616-1624, 1985, has been successfully applied in practical large vocabulary voice recognition systems. In the small vocabulary case, a deterministic grammar has been formulated as a finite state network (FSN) in an airline reservation and information system (see Rabiner, L. R. and Levinson, S. Z., "A Speaker-Independent, Syntax-Directed, Connected Word Recognition System Based on Hidden Markov Model and Level Building", IEEE Trans. on ASSP, Vol. 33, No. 3, June 1985).
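As an illustration of the k-gram idea, a minimal bigram (k = 2) language model can be estimated by simple counting. The training data and function names below are invented for the example, and no smoothing is applied, so unseen pairs get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    # Count unigrams and adjacent word pairs, padding each sentence
    # with a start symbol so the first word is also conditioned.
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    # Maximum-likelihood estimate of P(word | prev); no smoothing in this sketch.
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
```

Trained on the two toy utterances "book a flight" and "book a hotel", the model gives P(a | book) = 1.0 and P(flight | a) = 0.5, which is exactly the kind of syntactic constraint the decoder exploits.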
Statistically, in order to minimize the probability of recognition error, the voice recognition problem can be formalized as follows: with acoustic evidence observation O, the operations of voice recognition are to find the most likely word string W* such that
W* = arg max P(W|O)  (1)
where the maximization is over all possible word strings W. In accordance with Bayes' rule, the a posteriori probability P(W|O) in the above equation can be rewritten as:
P(W|O) = P(W)P(O|W) / P(O)  (2)
Since P(O) is irrelevant to recognition, the word string estimate can be obtained alternatively as:
W* = arg max P(W)P(O|W)  (3)
Here P(W) represents the a priori probability that the word string W will be uttered, and P(O|W) is the probability that the acoustic evidence O will be observed given that the speaker uttered the word sequence W. P(O|W) is determined by acoustic pattern matching, while the a priori probability P(W) is defined by the language model utilized.
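Equation (3) amounts to scoring each candidate word string by the product P(W)·P(O|W), in practice computed as a sum of log probabilities to avoid underflow. The candidates and log-probability values below are purely illustrative.

```python
def map_decode(candidates, lm_logprob, am_logprob):
    # W* = argmax_W [log P(W) + log P(O|W)], the log form of Eq. (3).
    return max(candidates, key=lambda w: lm_logprob[w] + am_logprob[w])

# Hypothetical log-probability tables for two candidate word strings.
lm = {"recognize speech": -4.6, "wreck a nice beach": -9.2}  # log P(W)
am = {"recognize speech": -1.2, "wreck a nice beach": -0.7}  # log P(O|W)
best = map_decode(list(lm), lm, am)  # the language model prior settles the tie
```

Here the acoustic model slightly prefers the wrong string, but the language model prior is decisive, which is exactly the interplay between P(W) and P(O|W) that Equation (3) formalizes.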
In connected word recognition, if the vocabulary is small (less than 100), a deterministic grammar can be used to rigidly govern which words can logically follow other words to form legal sentences in the language. The deterministic grammar can be incorporated in the acoustic matching algorithm implicitly to constrain the search space of potential words and to reduce the computation dramatically. However, when the vocabulary size is either medium (greater than 100 but less than 1000) or large (greater than 1000), the pr