On-demand language processing system and method

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S231000, C704S254000

active

06278973

ABSTRACT:

FIELD OF THE INVENTION
This application is related to the art of pattern recognition and more particularly to an application of pattern recognition methods for language processing.
BACKGROUND OF THE INVENTION
In the art of language processing, particularly in the application thereof exemplified herein of systems for recognition of natural language in spoken form, an ultimate objective will normally be the recognition and/or classification of an unknown pattern, such as a speech fragment, through use of a set of decision rules applied to a comparison of the unknown pattern and stored parameters representative of patterns of the same general class. In general, the steps in such a recognition/classification process can be described as:
performing feature extraction for a training set of patterns, thereby providing a parameterized description for each pattern in such a training set, and providing a similar feature extraction for an unknown pattern;
using a set of labelled training set patterns to infer decision rules, thereby defining a mapping from an unknown object (or pattern) to a known pattern in the training set; and
carrying out that mapping to define the recognition or classification of the unknown pattern.
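The three steps above can be illustrated with a deliberately small sketch. It is not the recognizer described in this document: the feature extractor (mean and range of a numeric sequence), the nearest-centroid decision rule, and all function names are assumptions chosen only to make the extract, infer, and map stages concrete.

```python
import math
from collections import defaultdict


def extract_features(pattern):
    """Toy feature extraction (an assumption): mean and range of the samples."""
    return (sum(pattern) / len(pattern), max(pattern) - min(pattern))


def train(labelled_patterns):
    """Infer a decision rule from labelled patterns: one centroid per label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pattern, label in labelled_patterns:
        mean, spread = extract_features(pattern)
        acc = sums[label]
        acc[0] += mean
        acc[1] += spread
        acc[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def classify(pattern, centroids):
    """Map an unknown pattern to the label whose centroid is nearest."""
    features = extract_features(pattern)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


training_set = [
    ([1, 2, 3], "low"),
    ([2, 3, 4], "low"),
    ([10, 20, 30], "high"),
    ([12, 18, 33], "high"),
]
centroids = train(training_set)
label = classify([2, 2, 5], centroids)
```

A real speech recognizer would use acoustic feature vectors and statistical models rather than centroids; the sketch only mirrors the shape of the process.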
In a system for recognizing spoken words or phrases from a natural language, a large corpus of training speech is segmented into elemental speech units such as phones, and multiple instances of each phone are used to develop an acoustic model of that phone. Ideally, the corpus of training speech will be sufficiently large that virtually all variants of each phone that may be encountered by the recognizer are included in the model for the phone. Each model will be defined in terms of the previously described parameterized description for the modeled speech sound.
The architecture for implementation of the modelling of sound patterns in such a speech recognizer system has become largely standardized. Specifically, the various levels of linguistic information, namely the language (or grammar) model, the word pronunciation (or lexicon) models, and the phone models, are represented by a cascade of networks linked by a single operation of substitution. Each network at a given level accepts sequences of symbols. Each of those symbols is modeled by a network at the level immediately below. For example, the pronunciation of each word in a word sequence accepted by the language model will be modeled by a phone string, or, more generally, by a phonetic network encoding alternative pronunciations. Each phone is itself modeled, typically by a Hidden Markov Model (“HMM”), to represent possible sequences of acoustic observations in realizations of the phone. Thus, to create the recognizer network, a phone string (or phonetic network) will be substituted for the corresponding word label in the language model, and an HMM will be substituted for each phone label in the lexicon model.
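The substitution cascade just described can be sketched in a few lines. Everything here is a toy under stated assumptions: the lexicon entries, the phone symbols, and the three-state naming used as a stand-in for an HMM are all hypothetical, and real systems substitute networks rather than flat lists.

```python
from itertools import product

# Hypothetical lexicon: each word maps to one or more phone strings
# (alternative pronunciations).
LEXICON = {
    "data": [["d", "ey", "t", "ax"], ["d", "ae", "t", "ax"]],
    "base": [["b", "ey", "s"]],
}


def hmm_states(phone):
    """Stand-in for a phone HMM: three named states per phone (an assumption)."""
    return [f"{phone}_{i}" for i in range(3)]


def expand_word_sequence(words):
    """Substitute phone strings for words, then HMM states for phones.

    Yields one state sequence per combination of pronunciations.
    """
    for pronunciations in product(*(LEXICON[w] for w in words)):
        phones = [p for pron in pronunciations for p in pron]
        yield [state for p in phones for state in hmm_states(p)]


sequences = list(expand_word_sequence(["data", "base"]))
```

Because "data" has two pronunciations in this toy lexicon, the word sequence expands into two alternative state sequences, which hints at how quickly full expansion grows.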
This architecture allows a wide range of implementations. In principle, the cascaded networks can be expanded in advance into a single network accepting sequences of the lowest-level inputs (e.g., acoustic observations) by recursively substituting a network for each symbol. However, if the networks involved are large, full expansion is not practical, since the resulting network would be too large to be computationally tractable. Instead, a typical recognizer system will use a hybrid policy in which some levels are fully expanded but others may be expanded on demand. In such a recognizer, sequences of hypotheses of units at level i are assembled until they correspond to paths from an initial node to a final node in some model of a unit of level i+1, at which point that higher-level unit is hypothesized.
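The last sentence, assembling level-i hypotheses until they complete a path through a level-(i+1) model, can be sketched as follows. This is a simplified illustration, not the recognizer's actual search: it assumes one linear pronunciation per word, error-free phone hypotheses, and no weights, and the lexicon is hypothetical.

```python
def hypothesize_words(phone_stream, lexicon):
    """Yield (end_index, word) each time a run of phones completes a pronunciation.

    `lexicon` maps each word to a single phone list (an assumption; real
    lexicons are networks with alternative pronunciations).
    """
    active = []  # partial matches as (word, next_position_in_pronunciation)
    for t, phone in enumerate(phone_stream):
        # Continue existing partial matches and allow any word to start here.
        candidates = active + [(word, 0) for word in lexicon]
        active = []
        for word, pos in candidates:
            pron = lexicon[word]
            if pron[pos] != phone:
                continue  # this partial path is abandoned
            if pos + 1 == len(pron):
                yield (t, word)  # reached the final node of the word model
            else:
                active.append((word, pos + 1))


toy_lexicon = {"cat": ["k", "ae", "t"], "at": ["ae", "t"], "a": ["ax"]}
hypotheses = list(hypothesize_words(["k", "ae", "t", "ax"], toy_lexicon))
```

Only word models that the phone stream actually enters are ever tracked, which is the essence of expanding a level on demand rather than in advance.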
The hybrid arrangement described above works well so long as the combination of modeling levels can be done by substitution alone. However, one improvement in recognizer systems has generally had the effect of limiting the application of on-demand modeling wherever that improvement is implemented. It has been determined in recent years that the use of context-dependent units at appropriate levels of representation significantly improves the performance of such a recognition system. By its very nature, a context-dependent model, u/c (for unit u in context c), can be substituted for an instance of u only when that instance appears in context c. In prior-art recognizer systems, this constraint is addressed in one of two main ways. If the cascaded networks involved are small enough, the full cascade is expanded in advance down to the level of context-dependent units, using a specialized expansion algorithm that folds in context dependency appropriately. If full expansion is not practical, as, for instance, in large-vocabulary recognition, the standard solution is to use restricted model structures and specialized algorithms to allow on-demand combination of representation levels. A particular problem occurs at word boundaries, where the context must be determined for each adjacent word that can appear in that position. Here, because of the multiplicity of possible contexts for a phone at a word boundary position, substitution does not work. A common restriction is to allow only one-sided context-dependent models at such word boundaries. But even where full context-dependency may be implemented, the particular context-dependency type (e.g., triphonic, pentaphonic) must be built into the decoder, thereby preventing any other form of context-dependency from being used with such a recognizer system.
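The word-boundary difficulty can be made concrete with a small triphone sketch. The `left-phone+right` label format, the `#` boundary symbol, and the toy lexicon are assumptions for illustration; the text above writes a context-dependent unit more generally as u/c.

```python
def triphones(phones, left="#", right="#"):
    """Label each phone with its left and right neighbours.

    `left` and `right` give the context outside the phone string
    ("#" is a hypothetical symbol for "no neighbour").
    """
    padded = [left] + list(phones) + [right]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def boundary_variants(word_phones, next_words, lexicon):
    """Collect every model needed for the final phone of a word.

    The right context depends on whichever word follows, so one phone
    instance needs a different model per possible successor.
    """
    variants = set()
    for nxt in next_words:
        first_phone_of_next = lexicon[nxt][0]
        variants.add(triphones(word_phones, right=first_phone_of_next)[-1])
    return variants


toy_lexicon = {"sat": ["s", "ae", "t"], "ran": ["r", "ae", "n"]}
inside = triphones(["k", "ae", "t"])
at_boundary = boundary_variants(["k", "ae", "t"], ["sat", "ran"], toy_lexicon)
```

Word-internal phones receive a single label each, while the final phone needs one variant per distinct following phone, which is why plain substitution of one model per phone label fails at boundaries.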
SUMMARY OF THE INVENTION
A language recognition methodology is disclosed whereby any finite-state model of context may be used in a very general class of decoding cascades (where paths through such cascades are defined by connections among a number of predetermined states), and without requiring specialized decoders or full network expansion. The methodology includes two fundamental improvements: (1) a simple generalization, weighted finite-state transducers, of existing network models, and (2) a novel on-demand execution technique for network combination. The steps of the methodology of the invention include: (1) formulating at least one of the network cascades as a finite state transducer, where that transducer represents a mapping function between the network cascade and an adjacent cascade; (2) selecting a state in a transducer at a selected level in the cascade having a correspondence to a portion of the network cascade to be expanded; (3) composing that transducer with a next successively higher level of the network cascade to prescribe a mapped portion of that next successively higher level corresponding to the portion of the network cascade selected to be expanded; and (4) iteratively repeating step (2) and step (3) for each successively higher level cascade representing the portion of the network selected to be expanded.
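The idea of combining weighted finite-state transducers on demand can be sketched as lazy composition. This is a minimal illustration under stated assumptions, not the claimed method: it handles only epsilon-free transducers, adds weights along a path (as with negative log probabilities), and uses hypothetical class names, symbols, and toy transducers throughout.

```python
class Fst:
    """Weighted transducer: arcs[state] = [(in_sym, out_sym, weight, next_state)]."""

    def __init__(self, start, finals, arcs):
        self.start, self.finals, self._arcs = start, set(finals), arcs

    def arcs(self, state):
        return self._arcs.get(state, [])

    def is_final(self, state):
        return state in self.finals


class LazyCompose:
    """Composition of a and b whose pair states are built only when visited."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.start = (a.start, b.start)
        self._cache = {}

    def is_final(self, state):
        return self.a.is_final(state[0]) and self.b.is_final(state[1])

    def arcs(self, state):
        if state not in self._cache:
            qa, qb = state
            # Match the output symbol of a against the input symbol of b.
            self._cache[state] = [
                (i, o, wa + wb, (na, nb))
                for (i, mid_a, wa, na) in self.a.arcs(qa)
                for (mid_b, o, wb, nb) in self.b.arcs(qb)
                if mid_a == mid_b
            ]
        return self._cache[state]

    @property
    def expanded_states(self):
        return len(self._cache)


def transduce(fst, inputs):
    """Return (weight, outputs) of the lowest-weight accepting path, or None."""
    hyps = {fst.start: (0.0, [])}
    for sym in inputs:
        nxt = {}
        for state, (w, outs) in hyps.items():
            for (i, o, aw, dest) in fst.arcs(state):
                if i == sym and (dest not in nxt or w + aw < nxt[dest][0]):
                    nxt[dest] = (w + aw, outs + [o])
        hyps = nxt
    finals = [hyp for state, hyp in hyps.items() if fst.is_final(state)]
    return min(finals) if finals else None


# Toy transducers (hypothetical symbols and weights).
lower = Fst(0, {0}, {0: [("x", "a", 0.5, 0), ("x", "b", 1.0, 0), ("y", "b", 0.25, 0)]})
upper = Fst(0, {1, 2}, {
    0: [("a", "A", 0.0, 1), ("b", "B", 0.5, 2)],
    1: [("b", "B", 0.125, 1)],
    2: [("a", "A", 0.0, 2), ("b", "B", 0.0, 3)],
})
cascade = LazyCompose(lower, upper)
best = transduce(cascade, ["x", "y"])
```

In this toy run the search touches only the pair states it needs, so a pair state that is reachable in the full composition but never reached before the input ends is never built. Production toolkits additionally handle epsilon transitions and other semirings, which this sketch omits.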


