User model-improvement-data-driven selection and update of...

Data processing: speech signal processing – linguistics – language



Details

Classification: C704S251000

Type: Reexamination Certificate

Status: active

Patent number: 06363348

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to a method for recognizing an input pattern stored in a user station using a recognition unit of a server station; the server station and the user station being connected via a network; the recognition unit being operative to recognize the input pattern using a model collection of at least one recognition model; the method comprising:
performing an initial recognition enrolment step, comprising transferring model improvement data associated with a user of the user station from the user station to the recognition unit; and associating the user of the user station with a user identifier; and
for a recognition session between the user station and the server station, transferring a user identifier associated with a user of the user station and an input pattern representative of time sequential input generated by the user from the user station to the server station; and using the recognition unit to recognize the input pattern by incorporating at least one recognition model in the model collection which reflects the model improvement data associated with the user.
The invention further relates to a pattern recognition system comprising at least one user station storing an input pattern and a server station comprising a recognition unit; the recognition unit being operative to recognize the input pattern using a model collection of at least one recognition model; the server station being connected to the user station via a network;
the user station comprising means for initially transferring model improvement data associated with a user of the user station and a user identifier associated with the user to the server station; and for each recognition session between the user station and the server station transferring a user identifier associated with a user of the user station and an input pattern representative of time sequential input generated by the user to the server station; and
the server station comprising means for, for each recognition session between the user station and the server station, incorporating at least one recognition model in the model collection which reflects the model improvement data associated with a user from which the input pattern originated; and using the recognition unit to recognize the input pattern received from the user station.
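By way of illustration only, the enrolment and per-session flow described above could be sketched as follows; the class, field and function names (ServerStation, enrol, recognize, run_recognition) and the in-memory storage are hypothetical stand-ins, not the claimed implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ServerStation:
    """Server-side recognition unit holding model improvement data per enrolled user."""
    base_models: dict                                   # speaker-independent recognition models
    user_improvement_data: dict = field(default_factory=dict)

    def enrol(self, user_id, model_improvement_data):
        # Initial recognition enrolment step: store the model improvement data
        # under the user identifier associated with the user.
        self.user_improvement_data[user_id] = model_improvement_data

    def recognize(self, user_id, input_pattern):
        # For a recognition session: build a model collection that reflects the
        # model improvement data associated with this user, then recognize.
        model_collection = dict(self.base_models)
        model_collection.update(self.user_improvement_data.get(user_id, {}))
        return run_recognition(model_collection, input_pattern)

def run_recognition(model_collection, input_pattern):
    """Placeholder for the actual recognition unit (acoustic model, lexicon, language model)."""
    return f"<recognition result for {len(input_pattern)} observation vectors>"
```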
Pattern recognition systems, such as large vocabulary continuous speech recognition systems or handwriting recognition systems, typically use a collection of recognition models to recognize an input pattern. For instance, an acoustic model and a vocabulary may be used to recognize words and a language model may be used to improve the basic recognition result.
FIG. 1 illustrates a typical structure of a large vocabulary continuous speech recognition system 100 [refer L. Rabiner, B-H. Juang, “Fundamentals of speech recognition”, Prentice Hall 1993, pages 434 to 454]. The system 100 comprises a spectral analysis subsystem 110 and a unit matching subsystem. In the spectral analysis subsystem 110 the speech input signal (SIS) is spectrally and/or temporally analysed to calculate a representative vector of features (observation vector, OV). Typically, the speech signal is digitised (e.g. sampled at a rate of 6.67 kHz) and pre-processed, for instance by applying pre-emphasis. Consecutive samples are grouped (blocked) into frames, corresponding to, for instance, 32 msec. of speech signal. Successive frames partially overlap by, for instance, 16 msec. Often the Linear Predictive Coding (LPC) spectral analysis method is used to calculate for each frame a representative vector of features (observation vector). The feature vector may, for instance, have 24, 32 or 63 components.
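To make the frame blocking and feature extraction concrete, here is a minimal Python/NumPy sketch. The 6.67 kHz sampling rate, 32 msec frames and 16 msec shift are taken from the text above; the pre-emphasis coefficient, the function names and the FFT-based log-power features are illustrative assumptions (the LPC analysis mentioned in the text is not implemented here):

```python
import numpy as np

def pre_emphasize(signal, coeff=0.97):
    """First-order pre-emphasis filter (the 0.97 coefficient is illustrative)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def block_into_frames(signal, sample_rate=6670, frame_ms=32, shift_ms=16):
    """Group consecutive samples into partially overlapping frames
    (32 msec frames shifted by 16 msec, as in the text above)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples between frame starts
    assert len(signal) >= frame_len, "signal must contain at least one full frame"
    n_frames = 1 + (len(signal) - frame_len) // shift
    return np.stack([signal[i * shift : i * shift + frame_len] for i in range(n_frames)])

def observation_vectors(signal, n_features=24):
    """One feature (observation) vector per frame. A windowed log-power spectrum
    is used here as a simple stand-in for LPC analysis."""
    frames = block_into_frames(pre_emphasize(signal))
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    return np.log(power[:, :n_features] + 1e-10)     # shape (n_frames, n_features)
```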
The standard approach to large vocabulary continuous speech recognition is to assume a probabilistic model of speech production, whereby a specified word sequence W = w_1 w_2 w_3 . . . w_q produces a sequence of acoustic observation vectors Y = y_1 y_2 y_3 . . . y_T. The recognition error can be statistically minimised by determining the sequence of words w_1 w_2 w_3 . . . w_q which most probably caused the observed sequence of observation vectors y_1 y_2 y_3 . . . y_T (over time t=1, . . . , T), where the observation vectors are the outcome of the spectral analysis subsystem 110.
This results in determining the maximum a posteriori probability:
max P(W|Y), for all possible word sequences W.
By applying Bayes' theorem on conditional probabilities, P(W|Y) is given by:
P(W|Y) = P(Y|W)·P(W)/P(Y)
Since P(Y) is independent of W, the most probable word sequence is given by:
arg max P(Y|W)·P(W), for all possible word sequences W    (1)
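Read literally, equation (1) says: for each candidate word sequence W, combine the acoustic score P(Y|W) with the language model score P(W) and keep the best candidate. The brute-force sketch below works in the log domain; in a real recogniser the exhaustive loop over candidates is replaced by dynamic-programming search, and the two scoring functions are supplied by the acoustic and language models described next:

```python
import math

def most_probable_word_sequence(candidates, acoustic_log_prob, lm_log_prob):
    """Return arg max over W of P(Y|W) * P(W), computed in the log domain.

    candidates        -- iterable of word sequences (e.g. tuples of words)
    acoustic_log_prob -- function W -> log P(Y|W) (acoustic model score)
    lm_log_prob       -- function W -> log P(W)   (language model score)
    """
    best_w, best_score = None, -math.inf
    for w in candidates:
        score = acoustic_log_prob(w) + lm_log_prob(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```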
In the unit matching subsystem 120, an acoustic model provides the first term of equation (1). The acoustic model is used to estimate the probability P(Y|W) of a sequence of observation vectors Y for a given word string W. For a large vocabulary system, this is usually performed by matching the observation vectors against an inventory of speech recognition units. A speech recognition unit is represented by a sequence of acoustic references. Various forms of speech recognition units may be used. As an example, a whole word or even a group of words may be represented by one speech recognition unit. A word model (WM) provides for each word of a given vocabulary a transcription in a sequence of acoustic references. For systems, wherein a whole word is represented by a speech recognition unit, a direct relationship exists between the word model and the speech recognition unit. Other systems, in particular large vocabulary systems, may use for the speech recognition unit linguistically based sub-word units, such as phones, diphones or syllables, as well as derivative units, such as fenenes and fenones. For such systems, a word model is given by a lexicon 134, describing the sequence of sub-word units relating to a word of the vocabulary, and the sub-word models 132, describing sequences of acoustic references of the involved speech recognition unit. A word model composer 136 composes the word model based on the sub-word model 132 and the lexicon 134.
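As an illustration of how a word model composer might combine the lexicon and the sub-word models, the sketch below concatenates the acoustic references of a word's sub-word units. The dictionaries and identifiers are invented for illustration; they are not the structures of FIG. 1:

```python
# Hypothetical lexicon (word -> sub-word units) and sub-word models
# (sub-word unit -> sequence of acoustic reference identifiers).
LEXICON = {"hello": ["h", "e", "l", "o"]}
SUB_WORD_MODELS = {
    "h": ["h_1", "h_2", "h_3", "h_4"],
    "e": ["e_1", "e_2", "e_3", "e_4"],
    "l": ["l_1", "l_2", "l_3", "l_4"],
    "o": ["o_1", "o_2", "o_3", "o_4"],
}

def compose_word_model(word, lexicon=LEXICON, sub_word_models=SUB_WORD_MODELS):
    """Compose a word model as the concatenated acoustic references of its sub-word units."""
    references = []
    for unit in lexicon[word]:
        references.extend(sub_word_models[unit])
    return references
```

In this toy example, compose_word_model("hello") yields sixteen acoustic reference identifiers, four per sub-word unit, analogous to FIG. 2B where each sub-word model contributes a sequence of four acoustic references.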
FIG. 2A illustrates a word model 200 for a system based on whole-word speech recognition units, where the speech recognition unit of the shown word is modelled using a sequence of ten acoustic references (201 to 210). FIG. 2B illustrates a word model 220 for a system based on sub-word units, where the shown word is modelled by a sequence of three sub-word models (250, 260 and 270), each with a sequence of four acoustic references (251, 252, 253, 254; 261 to 264; 271 to 274).
The word models shown in FIG. 2 are based on Hidden Markov Models (HMMs), which are widely used to stochastically model speech and handwriting signals. Using this model, each recognition unit (word model or subword model) is typically characterised by an HMM, whose parameters are estimated from a training set of data. For large vocabulary speech recognition systems involving, for instance, 10,000 to 60,000 words, usually a limited set of, for instance 40, sub-word units is used, since it would require a lot of training data to adequately train an HMM for larger units. An HMM state corresponds to an acoustic reference (for speech recognition) or an allographic reference (for handwriting recognition). Various techniques are known for modelling a reference, including discrete or continuous probability densities.
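One common concrete form of such an HMM is a left-to-right chain with one state per acoustic reference and a continuous emission density per state. The minimal sketch below uses diagonal-covariance Gaussians; the class layout, parameter names and the 0.6 self-loop probability are illustrative assumptions rather than the models referred to in the text:

```python
import numpy as np

class LeftToRightHMM:
    """Minimal left-to-right HMM: one state per acoustic reference,
    each emitting observation vectors via a diagonal-covariance Gaussian."""

    def __init__(self, means, variances, self_loop_prob=0.6):
        # means, variances: arrays of shape (n_states, n_features),
        # typically estimated from a training set of data.
        self.means = np.asarray(means, dtype=float)
        self.variances = np.asarray(variances, dtype=float)
        self.self_loop_prob = self_loop_prob        # probability of staying in a state
        self.advance_prob = 1.0 - self_loop_prob    # probability of moving to the next state

    def emission_log_prob(self, state, observation):
        """log N(observation; mean[state], diag(variances[state]))."""
        mean, var = self.means[state], self.variances[state]
        return -0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (observation - mean) ** 2 / var
        )
```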
A word level matching system 130 matches the observation vectors against all sequences of speech recognition units and provides the likelihoods of a match between the vector and a sequence. If sub-word units are used, constraints are placed on the matching by using the lexicon 134 to limit the possible sequence of sub-word units to sequences in the lexicon 134. This reduces the outcome to possible sequences of words. A sentence level matching system 140 uses a language model (LM) to place further constraints on the matching, so that the paths investigated are those corresponding to word sequences which are proper sequences as specified by the language model.
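A toy illustration of the split between word level and sentence level matching: per-word acoustic scores come from matching observation vectors against word models, and a bigram language model then constrains which word sequences are worth pursuing. All probabilities and data structures below are invented for illustration:

```python
import math

# Hypothetical per-word acoustic log-likelihoods produced by word level matching,
# and bigram language-model log-probabilities used by sentence level matching.
WORD_ACOUSTIC_LOGP = {"hello": -42.0, "world": -37.5, "word": -36.0}
BIGRAM_LOGP = {("<s>", "hello"): math.log(0.2), ("hello", "world"): math.log(0.1)}

def sentence_score(words, floor=math.log(1e-6)):
    """Combine word level acoustic scores with bigram LM constraints.
    Word pairs unknown to the LM are heavily penalised, i.e. the LM constrains
    which paths are investigated."""
    score = 0.0
    prev = "<s>"
    for w in words:
        score += WORD_ACOUSTIC_LOGP[w]               # word level match
        score += BIGRAM_LOGP.get((prev, w), floor)   # sentence level (LM) constraint
        prev = w
    return score

# Example: ("hello", "world") scores higher than ("hello", "word") because the LM
# knows the bigram ("hello", "world") but not ("hello", "word").
```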
