Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique
Reexamination Certificate
1999-06-01
2002-12-24
Black, Thomas (Department: 2121)
Data processing: artificial intelligence
Knowledge processing system
Knowledge representation and reasoning technique
C342S064000, C700S090000
Reexamination Certificate
active
06499025
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system and method for tracking objects, and in particular, to a system and method for fusing results of multiple sensing modalities for efficiently performing automated vision tracking, such as tracking human head movement and facial movement.
2. Related Art
Applications of real-time vision-based object detection and tracking are becoming increasingly important for providing new classes of services to users based on an assessment of the presence, position, and trajectory of objects. Research on computer-based motion analysis of digital video scenes centers on the goal of detecting and tracking objects of interest, typically via the analysis of the content of a sequence of images. Each image contains multiple objects, which are typically nebulous collections of pixels that satisfy some property. Each object can occupy one or more regions within an image and can change its relative location throughout subsequent images of the video scene. Such objects are considered moving objects, which form the motion within a video scene.
Facial objects of a human head, such as the mouth, eyes, nose, etc., can be types of moving objects within a video scene. It is very desirable to automatically track movement of these facial objects because successful digital motion analysis of facial movement has numerous applications in real-world environments. For example, one application includes facial expression analysis for automatically converting facial expressions into computer-readable input for performing computer operations and for making decisions based on human emotions derived from the facial expressions. Another application is digital speech recognition and “lip reading” for automatically recognizing human speech without requiring human vocal input or for receiving the speech as computer instructions. Another application is the visual identification of the nature of the ongoing activity of one or more individuals so as to provide context-sensitive assistance and communications.
However, current real-time tracking systems or visual processing modalities are often confused by waving hands or changing illumination, and systems that track only faces either do not run at realistic camera frame rates or do not succeed in real-world environments. Also, a visual processing modality may work well in certain situations but fail dramatically in others, depending on the nature of the scene being processed. Current visual modalities, used singly, are neither consistent enough to detect all heads nor discriminating enough to detect heads robustly. Color, for example, changes with shifts in illumination, and people move in different ways. Moreover, “skin color” is not restricted to skin, nor are people the only moving objects in the scene being analyzed.
As such, a variety of techniques have been investigated in the past to unify the results of sets of sensors. One previous technique used variations of a probabilistic data association filter to combine color and edge data for tracking a variety of objects. Another previous technique used priors from color data to bias estimation based on edge data within that framework. Recent techniques have attempted to perform real-time head tracking by combining multiple visual cues. For example, one technique uses edge and color data. Head position estimates are made by comparing match scores based on image gradients and color histograms, and the estimate from the more reliable modality is returned. Another technique heuristically integrates color data, range data, and frontal face detection for tracking.
Nevertheless, these systems and techniques are neither sufficiently efficient nor systematically trained to operate satisfactorily in real-world environments. Therefore, what is needed is a technique for fusing the results of multiple vision processing modalities for robustly and efficiently tracking objects in video scenes, such as human head movement and facial movement. What is also needed is a system and method that utilizes Bayesian networks to effectively capture probabilistic dependencies between the true state of the object being tracked and evidence obtained from tracking modalities, by incorporating evidence of reliability and integrating different sensing modalities. Whatever the merits of the above-mentioned systems and methods, they do not achieve the benefits of the present invention.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention is embodied in a system and method for efficiently performing automated vision tracking, such as tracking human head movement and facial movement. The system and method of the present invention fuses results of multiple sensing modalities to achieve robust digital vision tracking.
As a general characterization of the approach, context-sensitive accuracies are inferred for fusing the results of multiple vision processing modalities in order to achieve robust vision tracking. This is accomplished by fusing together reports from several distinct vision processing procedures. Beyond the reports themselves, each vision processing modality also reports information relevant to the accuracy of its report.
Specifically, Bayesian modality-accuracy models are built, and the reports from multiple vision processing modalities are fused together with appropriate weighting. Evidence about the operating context of the distinct modalities is considered, and the accuracy of different modalities is inferred from sets of evidence relevant to identifying the operating regime in which each modality is operating. In other words, observations of evidence about features in the data being analyzed by the modalities, such as a vision scene, are considered in inferring the reliability of a method's report. The reliabilities are used in the Bayesian integration of multiple reports. The model (a Bayesian network) can be built manually with expertise or trained offline from data collected from a non-vision-based sensor that reports an accurate measure of object position. In addition, the dependencies considered in a model can be restructured with Bayesian learning methods that identify new dependencies.
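The reliability-weighted fusion described above can be illustrated with a minimal sketch. The sketch below assumes, purely for illustration, that each modality's report is modeled as a Gaussian estimate of a scalar head position, and that a context-inferred reliability in (0, 1] inflates the report's variance before standard precision-weighted Gaussian fusion; the function names, the scaling rule, and the numbers are hypothetical and are not the patent's actual Bayesian network.

```python
import math

def fuse_reports(reports):
    """Precision-weighted Gaussian fusion of modality reports.

    Each report is a tuple (estimate, base_sigma, reliability), where
    reliability in (0, 1] is inferred from context evidence; a low
    reliability inflates the report's effective variance, so unreliable
    modalities contribute less to the fused estimate.
    """
    total_precision = 0.0
    weighted_sum = 0.0
    for estimate, base_sigma, reliability in reports:
        # Inflate uncertainty when context evidence suggests the
        # modality is outside its reliable operating regime
        # (hypothetical scaling rule, for illustration only).
        sigma = base_sigma / max(reliability, 1e-6)
        precision = 1.0 / (sigma * sigma)
        total_precision += precision
        weighted_sum += precision * estimate
    fused_mean = weighted_sum / total_precision
    fused_sigma = math.sqrt(1.0 / total_precision)
    return fused_mean, fused_sigma

# Example: three trackers report the head's x-position (in pixels).
# The motion tracker's reliability is low because context evidence
# (e.g., a waving hand in the scene) suggests it is being confused.
reports = [
    (120.0, 10.0, 0.9),  # color tracker: steady illumination
    (135.0, 10.0, 0.3),  # motion tracker: distracting motion present
    (122.0, 15.0, 0.8),  # edge tracker
]
mean, sigma = fuse_reports(reports)
```

Because the motion tracker's variance is heavily inflated, the fused estimate stays close to the two reliable reports, and the fused uncertainty is smaller than any single report's effective uncertainty, which is the qualitative behavior the context-sensitive weighting is meant to achieve.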
The foregoing and still further features and advantages of the present invention as well as a more complete understanding thereof will be made apparent from a study of the following detailed description of the invention in connection with the accompanying drawings and appended claims.
REFERENCES:
patent: 4939369 (1990-07-01), Elabd
patent: 5280565 (1994-01-01), Nomoto et al.
patent: 5289563 (1994-02-01), Nomoto et al.
patent: 5307289 (1994-04-01), Harris
patent: 5341142 (1994-08-01), Reis et al.
patent: 5572628 (1996-11-01), Denker et al.
patent: 5598512 (1997-01-01), Niwa
patent: 5673365 (1997-09-01), Basehore et al.
patent: 5687291 (1997-11-01), Smyth
patent: 5751915 (1998-05-01), Werbos
patent: 5809493 (1998-09-01), Ahamed et al.
Visual interaction with lifelike characters, Turk, M.; Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996, pp. 368-373.
VITS—a vision system for autonomous land vehicle navigation, Turk, M.A.; Morgenthaler, D.G.; Gremban, K.D.; Marra, M.; IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, May 1988, pp. 342-361.
Face recognition using eigenfaces, Turk, M.A.; Pentland, A.P.; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), 1991, pp. 586-591.
View-based interpretation of real-time optical flow for gesture recognition, Cutler, R.; Turk, M.; Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 416-421.
Tracking self-occluding articulated objects in dense disparity maps, Jojic, N.; Turk, M.; Huang, T.S.; Computer Vision, 1999. T
Horvitz Eric J.
Toyama Kentaro
Black Thomas
Holmes Michael B.
Lyon & Harr LLP
Microsoft Corporation
Watson Mark A.