System and methods for optimizing networks of weighted...

Data processing: artificial intelligence – Neural network – Learning task

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C706S012000, C704S256000, C704S257000

Reexamination Certificate

active

06587844

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
This invention is related to systems and methods for analyzing and manipulating weighted or unweighted finite state automata, such as those usable in continuous speech automatic speech recognition systems and methods.
2. Description of Related Art
Flexible and robust automated speech recognition systems have long been sought. Automatic speech recognition can be viewed as a processing pipeline or cascade. In each step of the processing cascade, an output string of data from an upstream processing element is input into a current processing element. The processing element of each step uses a directed graph, such as a finite state automaton or a finite state machine, to convert the input data string into an output data string. At each processing element, each portion of the input data string generates one or more possible paths, or hypotheses, through that processing element. The data portions can represent acoustic information, phonemes, words, text strings or the like, depending on the processing element.
In automatic speech recognition, the term “lattice” denotes an acyclic directed and labeled graph, which is usually weighted. In each lattice, there is typically a designated start, or initial, node and one or more final nodes. Each possible path through the lattice from the initial node to a final node induces a hypothesis based on the arc labels extending between each pair of nodes in the path. For example, in a word lattice, the arc labels are words and the various paths between the initial node and the final node form word strings, such as sentences.
Speech recognition systems have progressed from simple, isolated word tasks that recognize only a few words, to dictation systems that are capable of recognizing continuous speech, to systems for directory information retrieval. Continuous speech recognition systems often have active vocabularies of over 500,000 words. Directory information retrieval systems often need vocabularies having millions of words.
To support these larger applications, conventional speech recognition systems use weighted finite state transducers to represent the valid set of word strings, such as sentences, that can be accurately recognized. The weights of the weighted finite state transducers are typically determined from a statistical model. This statistical model is based on statistically analyzing a large corpus of text data.
In practice, conventional speech recognition systems use an acoustic weighted finite state transducer to convert spoken utterances into sequences of phonemes and at least a grammar weighted finite state transducer to convert the sequences of phonemes into recognized word strings, such as sentences. The weights of at least the grammar weighted finite state transducer are combined with the weights produced by the acoustic finite weighted state transducer to determine the probability of each recognition hypothesis for a given utterance. The combined weights are then used to prune out the less-likely hypotheses during a Viterbi beam search or the like. Accordingly, it is essential to accurately determine the weights on the acoustic and grammar finite state transducers if the speech recognition system is to viably handle the large-vocabulary speech recognition tasks outlined above.
If the large-vocabulary speech recognition task to be performed by the speech recognition system does not have an available training corpus, at least the grammar weighted finite state transducer might be left unweighted. This occurs, because, as outlined above, the weights on the weighted finite state transducers are determined statistically from the training corpus. However, it should be appreciated that, while an unweighted finite state transducer can be used, the speed and accuracy of the speech recognition system may be considerably reduced.
Classical shortest-paths problems in a weighted directed graph arise in various contexts. The problems divide into two related categories: single-source shortest-path problems and all-pairs shortest-path problems. Determining the single-source shortest-path problem in a weighted directed graph comprises determining the shortest path from a fixed source node “s” of the nodes of the weighted directed graph to all other nodes of the weighted directed graph. Determining the all-pairs shortest-path is more general than finding the single-source shortest-path, and comprises finding the shortest path or paths between all pairs of nodes of the weighted directed graph.
In the classical shortest-path problem, the weights on the transitions between the nodes of the weighted directed graph represent distances, costs, or any other real-value quantity that can be added along a path and that one wishes to minimize. These classical shortest-path problems can be generalized to use other types of transition weights and to use other mathematical operations. In particular, the weights and operations can be any type of weight and any type of operation that can be defined using semirings.
Semirings define an algebraic structure, as set forth in “Finite-State Transducers in Language and Speech Processing”, Mehryar Mohri, Computational Linguistics, 23:2, 1997 and in “Semirings, Automata, Languages”, W. Kuich et al.,
Monographs in Theoretical Computer Science
, Vol. 5, Springer-Verlag, Berlin, 1986, each incorporated herein by reference in its entirety. As defined in Kuich, semirings combine a “multiplication” operation, symbolized as “{circle around (X)}” and an “addition” operation, symbolized using “⊕”.
Classically, the transition weights are real numbers and the specific operations used to determine the shortest path include the addition and minimum operations. In particular, the transition weights are added along a path using the addition operation as the {circle around (X)} operation. Once all the path weights are determined by addition, the minimum operation is applied as the ⊕ operation to select the path having the minimum weight.
Thus, the transition weights of the directed set are elements of an arbitrary set K, which may be the set of real numbers, a set of strings, a set of regular expressions, subsets of another set, or any other quantity that can be multiplied along a path using the “{circle around (X)}” operation, and that can be “summed” using the “⊕” operation. That is, the weight of a path is obtained by “multiplying” the transition weights along that path using the “{circle around (X)}” operator. Then, the shortest distance from a source node “s” to an end, or final, node “f” is the “sum” of the weights of all paths from the source node “s” to the ended node “f” using the “⊕” operator.
SUMMARY OF THE INVENTION
Within the generalized definition of the shortest distances set forth above, the systems and methods according to this invention determine the shortest distances between a source node “s” and an end node “f” of a weighted directed graph, such as a weighted finite state automaton.
As indicated above, unweighted finite state automata may be used in conventional speech recognition systems. However, such unweighted finite state automata generally considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task that accurately reflects the a priori probabilities of different word and/or phoneme combinations is time consuming and expensive, if it is even possible.
For example, the training corpus for a directory information retrieval speech recognition system, given the huge numbers of given names and surnames used in the United States, and the potential variations in spelling and pronunciation, suggests that a training corpus for this speech recognition task would be prohibitively expensive and time consuming to compile.
Additionally, it is highly unlikely that any such training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. This occurs because the directory information speech recognition task is

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and methods for optimizing networks of weighted... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and methods for optimizing networks of weighted..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and methods for optimizing networks of weighted... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3062044

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.