System for analyzing and synthesis of multi-factor data

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000, C707S793000, C709S219000


active

06549899

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a system for analyzing data in order to classify content of unknown data or to recreate missing content of data. More particularly, it relates to analysis of data which can be represented as matrices of multiple factors.
BACKGROUND OF THE INVENTION
Many learning problems require recognition, classification or synthesis of data generated through the interaction of multiple independent factors. For example, an optical character recognition system may be used to recognize characters in an unfamiliar font. Speech analysis requires recognition of words spoken by different speakers having different tonal characteristics. An adaptive controller may need to produce state-space trajectories with new payload conditions. Each of these types of problems can be decomposed into two factors, each having multiple elements, which interact to create the actual data. For ease of discussion, the two factors will be called “content” and “style”. For example, in typography used for optical character recognition, each character includes as content a letter (A, B, C, etc.) and as style a font (Times, Helvetica, Courier, etc.). In both printing and handwriting, people can generally recognize letters independently of the font or handwriting. However, optical character recognition systems generally are based upon template comparisons. Thus, they do not operate well with unknown fonts, and are extremely poor with the variations in handwriting. Consequently, such systems cannot classify the elements of one factor (letter) independently of the other factor (font or handwriting).
Similarly, in speech analysis, the sound of spoken words (content) is greatly affected by the speaker (style). Thus, systems which analyze the sounds to determine patterns have difficulty with new speakers. This is also true for people, particularly when the speaker has a strong accent. However, after exposure to someone with a strong accent for a period of time, a listener can much more easily determine the words being spoken. The listener has learned to distinguish the content from the style in the speech. On the other hand, speech recognition systems must have specific training for each speaker. They do not generally recognize new speakers or accents, and cannot learn these over time.
Therefore, a need exists for a system which easily separates the content and style of data in order to recognize the content with new styles. A need exists for a system which can also create new content in a known style.
Theoretical work has been performed by others on modeling of data which is formed from a mixture of factors through Cooperative Vector Quantization (CVQ). G. E. Hinton and R. Zemel disclose theories relating to factorial mixtures in “Autoencoders, Minimum Description Length and Helmholtz Free Energy,” NIPS 6, (1994). Z. Ghahramani discloses a system which applies mixture models to data analysis in “Factorial Learning and the EM Algorithm,” NIPS 7, 657-674 (1995). In CVQ, as used in these systems, each element of each factor is assigned a code vector. Each data point is modeled as a linear combination of one code vector from each factor. Thus, the factors interact only additively. The linear nature of the models suggested by these researchers severely limits the modeling capability of their theories. Often, factors do not interact only additively. For example, in typography, the letter and font are not additively combined to form each character. Instead, the font can significantly modify certain characteristics of each letter.
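The limitation of purely additive interaction can be illustrated with a small sketch. It is not taken from the cited papers: it assumes scalar observations instead of code vectors, a complete table, and hypothetical names (`additive_fit`, `style_gain`, `content_value`). It fits the best additive model by least squares and compares the residual on additively and multiplicatively generated data.

```python
def additive_fit(table):
    """Least-squares fit of table[s][c] ~ style[s] + content[c] (complete table)."""
    n_s, n_c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_s * n_c)
    row_mean = [sum(row) / n_c for row in table]
    col_mean = [sum(table[s][c] for s in range(n_s)) / n_s for c in range(n_c)]
    # For a complete table the additive least-squares fit is
    # row mean + column mean - grand mean.
    return [[row_mean[s] + col_mean[c] - grand for c in range(n_c)]
            for s in range(n_s)]


def max_abs_error(a, b):
    """Largest absolute difference between two tables of equal shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy factors: one number per style and one per content element.
style_gain = [1.0, 2.0, 3.0]
content_value = [1.0, 2.0, 4.0]

additive_data = [[g + v for v in content_value] for g in style_gain]
multiplicative_data = [[g * v for v in content_value] for g in style_gain]

error_additive = max_abs_error(additive_data, additive_fit(additive_data))
error_multiplicative = max_abs_error(
    multiplicative_data, additive_fit(multiplicative_data))

print("additive data, additive model:      ", error_additive)
print("multiplicative data, additive model:", error_multiplicative)
```

Under these assumptions the additive model reproduces the additively generated table essentially exactly but leaves a clear residual on the multiplicative one, which is the kind of interaction the font and letter example describes.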
Therefore, a need exists for a system which models complex interactions between factors and yet which allows for simple processing of the model in analyzing data.
SUMMARY OF THE INVENTION
The deficiencies of existing systems and of theoretical approaches previously made on multiple factor problems are substantially overcome by the present invention which provides a computer-based system for analyzing multiple factor data.
According to one aspect of the invention, data is modeled as a product of two linear forms corresponding to parameters of each factor. The physical processes that generate the data may or may not themselves have the bilinear form used to model it. In either case, by sufficiently increasing the dimensionality of the bilinear forms, the model can represent known training data to arbitrary accuracy.
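A minimal sketch of such a bilinear form follows. The indexing convention `w[k][i][j]`, the function names, and the toy data are assumptions for illustration, not the patent's notation. It also shows one simple way the accuracy claim can hold: if each factor's parameter vector is as long as the number of elements of that factor (one-hot vectors here), the combination weights can store every training observation exactly.

```python
def bilinear(w, style_vec, content_vec):
    """Return y with y[k] = sum_i sum_j w[k][i][j] * style_vec[i] * content_vec[j]."""
    return [
        sum(w_k[i][j] * style_vec[i] * content_vec[j]
            for i in range(len(style_vec))
            for j in range(len(content_vec)))
        for w_k in w
    ]


def one_hot(index, size):
    """Parameter vector with a single 1.0 at position `index`."""
    return [1.0 if n == index else 0.0 for n in range(size)]


# Hypothetical training data: observations[s][c] is a 2-dimensional
# observation for style s and content c.
observations = [
    [[1.0, 0.5], [2.0, 0.0], [4.0, 1.5]],
    [[3.0, 1.0], [0.5, 2.5], [1.0, 1.0]],
]
n_styles, n_contents, n_dims = 2, 3, 2

# With full-dimensional one-hot parameter vectors, the weights simply
# memorize the table: w[k][s][c] = observations[s][c][k].
w = [[[observations[s][c][k] for c in range(n_contents)]
      for s in range(n_styles)]
     for k in range(n_dims)]

reconstructed = [
    [bilinear(w, one_hot(s, n_styles), one_hot(c, n_contents))
     for c in range(n_contents)]
    for s in range(n_styles)
]
print(reconstructed == observations)
```

Such a full-dimensional model only memorizes the training table; a lower-dimensional model trades exactness for parameters that can be shared across styles and contents.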
According to another aspect of the invention, the system determines the model parameters from training data having multiple factor interaction. Typically, the training data is a complete matrix of observations as to each content and style type. However, for some analyses, the training data may be fully labeled as to content and style, unlabeled, or partially labeled. Furthermore, the system can reasonably determine parameters from training data having unknown observations within the matrix. To determine the parameters, the system creates a model based upon parameter vectors for each of the factors and a combination matrix for combining the parameter vectors for each factor. The values of the elements in the parameter vectors and the combination matrix are iteratively determined based upon the training data, using Expectation-Maximization (EM) techniques.
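The idea of iteratively determining parameters from a table with unknown observations can be sketched as follows. This sketch deliberately swaps in plain alternating least squares for the EM procedure named above, and it assumes fully labeled scalar observations, one parameter per style and per content element, and a combination matrix fixed to 1. The function name `fit_rank_one` and the toy table are hypothetical.

```python
def fit_rank_one(table, n_iters=200):
    """Fit table[s][c] ~ style[s] * content[c], skipping entries that are None.

    Alternating least squares: hold one factor fixed and solve for the other
    in closed form. Assumes every row and column has at least one observed,
    nonzero entry so the denominators stay positive.
    """
    n_s, n_c = len(table), len(table[0])
    style = [1.0] * n_s
    content = [1.0] * n_c
    for _ in range(n_iters):
        for s in range(n_s):
            seen = [c for c in range(n_c) if table[s][c] is not None]
            style[s] = (sum(table[s][c] * content[c] for c in seen)
                        / sum(content[c] ** 2 for c in seen))
        for c in range(n_c):
            seen = [s for s in range(n_s) if table[s][c] is not None]
            content[c] = (sum(table[s][c] * style[s] for s in seen)
                          / sum(style[s] ** 2 for s in seen))
    return style, content


# Hypothetical training table generated as gain * value with
# gains (1, 2, 3) and values (1, 2, 4); the last entry (true value 12)
# is treated as an unknown observation.
table = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 8.0],
    [3.0, 6.0, None],
]

style, content = fit_rank_one(table)
predicted_missing = style[2] * content[2]
print(round(predicted_missing, 6))
```

The individual parameters are determined only up to a shared scale factor, so only their products are compared with the data.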
According to another aspect of the invention, once parameter vectors are obtained, the system can be used to analyze unknown data. The analysis can be used to categorize content of data in an unknown style. In this manner, the system can be used to recognize letters in new styles in connection with optical character recognition, or words with new speakers. The analysis can also be used to create new content in a known style. Thus, the system can complete missing data, such as generating missing characters for a given font.
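Both uses can be sketched with a deliberately simplified model. It assumes each content element has an already learned template vector and each style is a single scalar gain; the `templates` values and the function names are hypothetical. An observation in an unknown style is classified by fitting the style separately for each candidate content and keeping the best fit, and the fitted style is then reused to generate content not yet observed in that style.

```python
def fit_gain(observation, template):
    """Least-squares scalar gain g minimizing ||observation - g * template||^2."""
    num = sum(o * t for o, t in zip(observation, template))
    den = sum(t * t for t in template)
    return num / den


def residual(observation, template, gain):
    """Squared error of the observation under a given content template and gain."""
    return sum((o - gain * t) ** 2 for o, t in zip(observation, template))


def classify(observation, templates):
    """Return (label, gain) for the content whose best-fitting gain explains the data."""
    label = min(
        templates,
        key=lambda name: residual(
            observation, templates[name], fit_gain(observation, templates[name])),
    )
    return label, fit_gain(observation, templates[label])


# Hypothetical content templates, assumed learned beforehand from training data.
templates = {
    "A": [1.0, 0.0, 2.0],
    "B": [0.0, 3.0, 1.0],
    "C": [2.0, 2.0, 0.0],
}

# An observation in a style that was not seen in training (gain 2.5, content "B").
new_observation = [0.0, 7.5, 2.5]

label, new_style_gain = classify(new_observation, templates)
print(label, new_style_gain)

# Generate content "C" in the newly estimated style.
synthesized_c = [new_style_gain * t for t in templates["C"]]
print(synthesized_c)
```

With vector-valued style parameters the per-candidate fit becomes a small linear least-squares problem, but the structure of the search stays the same.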


