Data processing: artificial intelligence – Neural network – Learning task
Reexamination Certificate
1999-10-12
2002-12-31
Black, Thomas (Department: 2121)
C706S015000, C706S020000
active
06502082
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system and method for visually tracking objects by fusing results of multiple sensing modalities of a model, and in particular to a model, such as a Bayesian network, that can be trained offline from data collected from a sensor, and wherein dependencies considered in the model can be restructured with Bayesian learning methods that identify new dependencies.
2. Related Art
Applications of real-time vision-based object detection and tracking are becoming increasingly important for providing new classes of services to users based on an assessment of the presence, position, and trajectory of objects. Research on computer-based motion analysis of digital video scenes centers on the goal of detecting and tracking objects of interest, typically via the analysis of the content of a sequence of images. Each image is defined by plural objects, which are typically nebulous collections of pixels that satisfy some property. Each object can occupy a region or regions within each image and can change its relative location throughout subsequent images and the video scene. These objects are considered moving objects, which form motion within a video scene.
Facial objects of a human head, such as mouth, eyes, nose, etc., can be types of moving objects within a video scene. It is very desirable to automatically track movement of these facial objects because successful digital motion analysis of facial movement has numerous applications in real world environments. For example, one application includes facial expression analysis for automatically converting facial expressions into computer readable input for performing computer operations and for making decisions based on human emotions derived from the facial expressions. Another application is for digital speech recognition and “lip reading” for automatically recognizing human speech without requiring human vocal input or for receiving the speech as computer instructions. Another application is the visual identification of the nature of the ongoing activity of one or more individuals so as to provide context-sensitive informational display, assistance, and communications.
However, current real-time tracking systems, which depend on various visual processing modalities such as color, motion, and edge information, are often confused by waving hands or changing illumination. Also, specific visual processing modalities may work well in certain situations but fail dramatically in others, depending on the nature of the scene being processed. Current visual modalities, used singly, are neither consistent enough to detect all heads nor discriminating enough to detect heads robustly. Color, for example, changes with shifts in illumination, and “skin color” is not restricted to skin.
As such, in the past, a variety of techniques have been investigated to unify the results of sets of sensors. Recent techniques have attempted to perform real-time head tracking by combining multiple visual cues. One previous technique used variations of a probabilistic data association filter to combine color and edge data for tracking a variety of objects. Another previous technique used priors from color data to bias estimation based on edge data within its framework. Another technique uses edge and color data: head position estimates are made by comparing match scores based on image gradients and color histograms, and the estimate from the more reliable modality is returned. Another technique heuristically integrates color data, range data, and frontal face detection for tracking.
Methods employing dynamic models such as Bayesian networks have the ability to fuse the results of multiple modalities of visual analysis. The structure of such models can be based on key patterns of dependency including subassemblies of the overall dependency model that relate the inferred reliabilities of each modality to the true state of the world. The parameters of these models can be assessed manually through a reliance on expert knowledge about the probabilistic relationships.
Nevertheless, these systems and techniques do not reliably and effectively combine the results of multiple modes of analysis, nor do they make use of ideal parameters that are derived from a consideration of data that can be collected experimentally. Therefore, what is needed is a system and method for training a dynamic model, such as a Bayesian network, to effectively capture probabilistic dependencies between the true state of the object being tracked and evidence from the tracking modalities. Such a system can be used to enhance a model constructed by an expert, or to eliminate the need for a person to assess the ideal parameters of the Bayesian model.
SUMMARY OF THE INVENTION
To overcome the limitations in the related art described above, and to overcome other limitations that will become apparent upon reading and understanding the present application, the present invention is embodied in a system and method for training a dynamic model, such as a Bayesian network, to effectively capture probabilistic dependencies between the true state of an object being tracked and evidence from various tracking modalities. The system and method of the present invention fuse results of multiple sensing modalities to automatically infer the structure of a dynamic model, such as a Bayesian network, to achieve robust digital vision tracking. The model can be trained and structured offline using data collected from a sensor, which may be either vision-based or non-vision-based, in conjunction with position estimates from the sensing modalities. Further, models based on handcrafted structures and probability assessments can also be enhanced by training the models with experimentally derived real-world data.
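One simple way to picture the offline training step is as counting. The sketch below estimates, for a single modality, the probability that its report is accurate given a discretized reliability indicator, using training cases that pair the modality's report with a ground-truth position from a sensor. The function name `estimate_accuracy_table`, the tolerance threshold, the indicator labels, the sample numbers, and the use of Laplace smoothing are all illustrative assumptions, not details taken from the present application.

```python
from collections import defaultdict


def estimate_accuracy_table(cases, tolerance=10.0, alpha=1.0):
    """Estimate P(report accurate | indicator) from training cases.

    cases: iterable of (indicator, reported_position, true_position).
    A report counts as accurate when it lies within `tolerance` of the
    ground truth (an assumed definition). `alpha` is a Laplace smoothing
    count added to both outcomes.
    """
    counts = defaultdict(lambda: [0, 0])  # indicator -> [inaccurate, accurate]
    for indicator, reported, truth in cases:
        accurate = abs(reported - truth) <= tolerance
        counts[indicator][int(accurate)] += 1
    return {
        indicator: (acc + alpha) / (inacc + acc + 2 * alpha)
        for indicator, (inacc, acc) in counts.items()
    }


# Hypothetical training cases: (indicator, modality report, ground truth).
training_cases = [
    ("stable_light", 101.0, 100.0),
    ("stable_light", 98.0, 100.0),
    ("stable_light", 130.0, 100.0),
    ("changing_light", 160.0, 100.0),
    ("changing_light", 95.0, 100.0),
    ("changing_light", 40.0, 100.0),
]
table = estimate_accuracy_table(training_cases)
```

With these made-up cases the table assigns a higher accuracy probability under stable lighting than under changing lighting, which is the kind of context-dependent parameter that would otherwise have to be assessed by hand.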
Automated methods for identifying variable dependencies within the model are employed to discover new structures for the probabilistic dependency models that are more ideal in that they better explain the data. Dependencies considered in the model can be restructured with Bayesian learning methods that identify new dependencies in the model. Further, the model can automatically adapt its position estimates by detecting changes in indicators of reliability of one or more modalities.
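To make the idea of discovering a dependency concrete, the sketch below scores whether adding an edge from one discrete variable (a reliability indicator) to a binary variable (whether a modality's report was accurate) explains the data better than leaving the two independent. It uses the BIC score as a stand-in for a Bayesian structure score, which is an assumption made for brevity; the application does not state which scoring function is used. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(counts_by_parent):
    """Maximized log-likelihood of the child, summed over parent configurations."""
    ll = 0.0
    for child_counts in counts_by_parent.values():
        n = sum(child_counts.values())
        for c in child_counts.values():
            if c > 0:
                ll += c * math.log(c / n)
    return ll


def bic_gain_for_edge(pairs):
    """BIC with edge parent -> child minus BIC without it (binary child assumed).

    pairs: list of (parent_value, child_value) observations.
    A positive result favors keeping the dependency.
    """
    n = len(pairs)
    with_edge = {}
    for parent, child in pairs:
        with_edge.setdefault(parent, Counter())[child] += 1
    without_edge = {None: Counter(child for _, child in pairs)}
    # A binary child has one free parameter per parent configuration.
    extra_params = len(with_edge) - 1
    gain = log_likelihood(with_edge) - log_likelihood(without_edge)
    return gain - 0.5 * extra_params * math.log(n)


# Hypothetical data: accuracy tracks the indicator in `dependent`, not in `independent`.
dependent = [(0, 0)] * 9 + [(0, 1)] + [(1, 1)] * 9 + [(1, 0)]
independent = [(0, 0)] * 5 + [(0, 1)] * 5 + [(1, 0)] * 5 + [(1, 1)] * 5
gain_dependent = bic_gain_for_edge(dependent)
gain_independent = bic_gain_for_edge(independent)
```

A structure search would repeat this kind of comparison over candidate edges and keep those with positive gain; the parent's own likelihood term is the same in both structures and therefore cancels out of the difference.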
In general, context-sensitive accuracies are inferred for fusing the results of multiple vision processing modalities for tracking tasks in order to achieve robust vision tracking, such as head tracking. This is accomplished by fusing together reports from several distinct vision processing procedures. Beyond the reports, evidence with relevance to the accuracy of the reports of each modality is reported by the vision processing modalities.
Evidence about the operating context of the distinct modalities is considered, and the accuracy of each modality is inferred from sets of evidence relevant to identifying the operating regime in which that modality is operating. In other words, observations of evidence about features in the data being analyzed by the modalities, such as a vision scene, are considered in inferring the reliability of a method's report. The reliabilities are used in the Bayesian integration of multiple reports. Offline training of the model increases the accuracy of the inferences of object position that are derived from the model.
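As a rough illustration of reliability-weighted integration, the following sketch fuses one-dimensional position reports from several modalities. It is not the model of the present application: it assumes each report is Gaussian around the true position and that an inferred reliability in (0, 1] simply scales the report's precision. The names `Report` and `fuse_reports`, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Report:
    position: float     # modality's estimate of the object position
    variance: float     # nominal noise variance of the modality (assumed known)
    reliability: float  # inferred reliability in (0, 1]; assumed to scale precision


def fuse_reports(reports):
    """Return (mean, variance) of a precision-weighted fusion of the reports."""
    weights = []
    for r in reports:
        if r.variance <= 0 or not (0 < r.reliability <= 1):
            raise ValueError("variance must be > 0 and reliability in (0, 1]")
        weights.append(r.reliability / r.variance)
    total = sum(weights)
    mean = sum(w * r.position for w, r in zip(weights, reports)) / total
    return mean, 1.0 / total


# Hypothetical reports: a trusted color cue and a distracted motion cue.
reports = [
    Report(position=100.0, variance=4.0, reliability=0.9),
    Report(position=140.0, variance=4.0, reliability=0.1),
]
fused_position, fused_variance = fuse_reports(reports)
```

In this example the fused position stays close to the report of the modality judged reliable, so a modality that is momentarily confused (for instance by a waving hand) contributes little to the estimate.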
Specifically, dynamic Bayesian modality-accuracy models are built either manually or automatically by a system and method in accordance with the present invention. Reports from multiple vision processing modalities of the models are fused together with appropriate weighting to infer an object's position. Bayesian network learning algorithms are used to learn the dependencies among variables to infer the structure of the models as well as to restructure and increase the accuracy of the models through training. Structuring and training of the models may be accomplished by providing sets of training cases that incorporate ground truth data obtained by using a sensor to accurately provide object position information, estim
Horvitz Eric J.
Toyama Kentaro
Black Thomas
Holmes Michael B.
Lyon Richard
Lyon & Harr
Watson Mark