Image analysis – Applications – Target tracking or detecting
Reexamination Certificate
1999-06-10
2003-06-17
Au, Amelia M. (Department: 2721)
C348S014100, C382S154000
active
06580810
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a method of image processing, particularly to a method of image processing using three facial feature points in 3-D head motion tracking.
2. Description of Related Art
In recent years, model-based image coding has drawn public attention, specifically its foreseeable application in visual communication, such as videophone or the virtual meeting mentioned in MPEG-4 Applications Document, ISO/IEC JTC1/SC29/WG11/N1729, July 1997. Since the primary subject in visual communication is the head section (and part of shoulders) of an image, the focus falls mainly on the head to reduce the data load in transmission. One possible approach is to introduce an explicit 3-D head model, such as the well-known CANDIDE face model (Mikael Rydfalk “CANDIDE-a Parameterised Face,” Linkoping University, Report LiTH-ISY-I-0866, October 1987) with texture mapping. A general system model of model-based face image coding is shown in FIG. 1. A user's face model is first inputted into an encoder 10 and then adapted to fit the face image, and analyses on the model are employed to extract meaningful face features, as well as head motion. These analysis data are then sent through a transmission medium 15 to a decoder 20 to synthesize a realistic face image.
In addition, methods for inferring 3-D motion from 2-D images can largely be divided into the following two classifications:
1. Use of 2-D feature points; and
2. Use of optic flow information.
In most methods, correspondences between 2-D feature points or selected pixels are first established, and the 3-D motion is then inferred under perspective projection, provided that only rigid body motion is involved.
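The perspective projection mentioned above can be sketched as follows. This is a minimal illustration, not the patent's formulation: it assumes a pinhole camera at the origin looking along +z, and the names `project`, `project_triangle`, and `focal_length` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(point: Point3D, focal_length: float = 1.0) -> Point2D:
    """Pinhole perspective projection of a 3-D point onto the image plane.

    Assumption: the camera sits at the origin and looks along +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def project_triangle(triangle: Sequence[Point3D],
                     focal_length: float = 1.0) -> List[Point2D]:
    """Project each vertex of a 3-D feature triangle to 2-D."""
    return [project(p, focal_length) for p in triangle]
```

Because the image coordinates depend on x/z and y/z, the same 2-D displacement can arise from different 3-D motions, which is why the rigid-body assumption is needed to constrain the inference.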
In the first classification, Thomas S. Huang and Arun N. Netravali, in “Motion and Structure from Feature Correspondences: A Review,” Proceedings of the IEEE, vol. 82, No. 2, pp. 252-268, February 1994, categorized and introduced different algorithms to infer 3-D motion from 3D-to-3D, 2D-to-3D, or 2D-to-2D feature correspondences. They concluded that at least five feature points are necessary to determine the actual 3-D motion. Further, Roger Y. Tsai and Thomas S. Huang, in “Uniqueness and Estimation of Three-Dimensional Motion Parameters of Rigid Objects with Curved Surfaces,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, pp. 13-27, January 1984, proposed a linear algorithm for solving the 2D-to-2D feature correspondences with eight feature points.
In the second classification, luminance differences between two 2-D images are utilized. Under the assumption of fixed lighting, the intensity differences between two consecutive video frames are mainly due to object motion. This method is adopted in OBASC (Object-Based Analysis-Synthesis Coder) and KBASC (Knowledge-Based Analysis-Synthesis Coder) developed at the University of Hannover (Liang Zhang, “Tracking a face for knowledge-based coding of videophone sequences,” Signal Processing: Image Communication, vol. 10, pp. 93-114, 1997; Jorn Ostermann, “Object-based analysis synthesis coding based on the source model of moving rigid 3D objects,” Signal Processing: Image Communication, vol. 6, pp. 143-161, 1994). Li et al. also elaborated on this concept and proposed a method that estimates motion with “no correspondence problem” (Haibo Li, Pertti Roivainen, and Robert Forchheimer, “3-D Motion Estimation in Model-Based Facial Image Coding,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, No. 6, pp. 545-555, June 1993). Moreover, Netravali and Salz (A. N. Netravali and J. Salz, “Algorithms for Estimation of Three-Dimensional Motion,” AT&T Technical Journal, vol. 64, No. 2, pp. 335-346, February 1985) derived robust algorithms for estimating the motion parameters of rigid bodies observed by a television camera. Their algorithms specifically take into account the capture rate of a television camera (30 frames/sec).
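The fixed-lighting assumption is commonly written as the brightness-constancy relation below. This is the standard textbook form, given here only for orientation and not quoted from the cited works; the symbols I (image intensity) and (u, v) (image-plane displacement per frame) are illustrative.

```latex
% Brightness constancy: a moving point keeps its intensity between frames.
I(x + u,\; y + v,\; t + 1) = I(x, y, t)

% A first-order Taylor expansion yields the optical-flow constraint:
\frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t} = 0
```

Each pixel contributes one such linear constraint on (u, v), so no explicit feature correspondence has to be established.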
However, in a practical software-based application for visual communication, the following two constraints have to be considered:
1. Real-time requirement for on-line communication;
2. Each additional feature point adds extra work load in pattern recognition.
SUMMARY OF THE INVENTION
Accordingly, this invention provides a method of image processing of three-dimensional (3-D) head motion with three facial feature points, including the following steps: providing a user's source image to a first processing device; capturing the user's first image and providing it to a second processing device; selecting three facial feature points of the first image from the second processing device to form a 3-D feature triangle; capturing the user's consecutive video frames and providing them to the second processing device as the user moves his or her head; tracking the three facial feature points in the consecutive video frames to form a series of actual 2-D feature triangles; rotating and translating the 3-D feature triangle freely to form a plurality of geometric transformations, selecting one of the geometric transformations with acceptable error between the two consecutive 2-D feature triangles, and repeating this step until the last of the consecutive video frames, so that geometric transformations corresponding to the consecutive video frames are formed; and providing the geometric transformations to the first processing device to generate a head motion corresponding to the user's source image.
The three facial feature points may be, for example, the lateral canthi of the two eyes and the nose, or the ear-tips and the lower chin. The three feature points form a feature triangle and are calibrated. The motion between two consecutive video frames is slight, and human head motion has the following characteristics: (1) the feature points are fixed, so the feature triangle they form can be considered a rigid body; (2) most head motions are rotation dominated; and (3) the rotation pivot of the head can be considered to be at the center of the neck. A 3-D head motion estimate can be inferred from consecutive video frames with a steepest-descent iterative method. Subsequently, if an estimate for 3-D head motion is not acceptable, error recovery can be performed on the estimate with a prediction process, such as the Grey System.
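Characteristics (1) and (3) can be illustrated with a short sketch that rotates a rigid feature triangle about a neck pivot. The rotation order (x, then y, then z), the coordinate values, and the names `rotate_about_pivot` and `side_lengths` are assumptions made for this example; the source does not specify them.

```python
import math


def rotate_about_pivot(point, pivot, rx, ry, rz):
    """Rotate `point` about `pivot` by angles in radians.

    Assumed convention: rotate about the x-axis, then y, then z.
    """
    x, y, z = (p - c for p, c in zip(point, pivot))
    # Rotation about the x-axis (nodding).
    y, z = (y * math.cos(rx) - z * math.sin(rx),
            y * math.sin(rx) + z * math.cos(rx))
    # Rotation about the y-axis (turning left or right).
    x, z = (x * math.cos(ry) + z * math.sin(ry),
            -x * math.sin(ry) + z * math.cos(ry))
    # Rotation about the z-axis (tilting sideways).
    x, y = (x * math.cos(rz) - y * math.sin(rz),
            x * math.sin(rz) + y * math.cos(rz))
    return (x + pivot[0], y + pivot[1], z + pivot[2])


def side_lengths(triangle):
    """Return the three side lengths of a triangle given as 3-D points."""
    a, b, c = triangle
    return [math.dist(a, b), math.dist(b, c), math.dist(c, a)]


# Hypothetical feature triangle (two eye corners and the nose) and neck pivot.
triangle = [(-3.0, 0.0, 50.0), (3.0, 0.0, 50.0), (0.0, -4.0, 48.0)]
neck = (0.0, -12.0, 56.0)
turned = [rotate_about_pivot(p, neck, 0.0, 0.1, 0.0) for p in triangle]
```

Since a rotation about a fixed pivot is a rigid motion, `side_lengths(turned)` matches `side_lengths(triangle)` up to rounding, which is the rigid-body property the method relies on.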
The embodiment disclosed in this invention presents a procedure that estimates head motion from two consecutive video frames using three feature points. A precise solution is not intended; rather, an approximate one is provided because, for a videophone application, one need not know how many degrees a user on the other side turns his or her head, but only needs to see a natural face orientation.
Also, a camera captures a new image every tenth to thirtieth of a second. During such a short period, changes in motion are small, and a simple steepest-descent iterative method that tries each transformation adaptively should be able to resolve the unknown transformation quickly. The local minimum obtained is usually close to the global minimum, since the 3-D positions obtained in the previous frame give a good initial guess for the iteration in the current frame.
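One way to read "tries each transformation adaptively" is a greedy coordinate-wise descent: at each iteration, try a small positive and negative step on each of the six motion parameters, keep the single step that lowers the 2-D triangle error the most, and halve the step sizes when nothing improves. The sketch below is a simplified stand-in for the steepest-descent iteration, not the patent's exact procedure; the error measure (sum of squared 2-D distances), the step sizes, the pinhole camera, and all function names are assumptions.

```python
import math


def transform(point, params, pivot):
    """Rotate `point` about `pivot` (x, then y, then z axis), then translate."""
    rx, ry, rz, tx, ty, tz = params
    x, y, z = (p - c for p, c in zip(point, pivot))
    y, z = (y * math.cos(rx) - z * math.sin(rx),
            y * math.sin(rx) + z * math.cos(rx))
    x, z = (x * math.cos(ry) + z * math.sin(ry),
            -x * math.sin(ry) + z * math.cos(ry))
    x, y = (x * math.cos(rz) - y * math.sin(rz),
            x * math.sin(rz) + y * math.cos(rz))
    return (x + pivot[0] + tx, y + pivot[1] + ty, z + pivot[2] + tz)


def project(point, focal_length):
    """Pinhole projection; assumes the point is in front of the camera."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def triangle_error(params, triangle_3d, observed_2d, pivot, focal_length):
    """Sum of squared 2-D distances between projected and observed vertices."""
    err = 0.0
    for p3, (u_obs, v_obs) in zip(triangle_3d, observed_2d):
        u, v = project(transform(p3, params, pivot), focal_length)
        err += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return err


def estimate_motion(triangle_3d, observed_2d, pivot, focal_length,
                    max_iters=500, tolerance=1e-6):
    """Greedy coordinate-wise descent over (rx, ry, rz, tx, ty, tz)."""
    params = [0.0] * 6                       # start from "no motion"
    steps = [0.02, 0.02, 0.02, 0.5, 0.5, 0.5]  # assumed initial step sizes
    best = triangle_error(params, triangle_3d, observed_2d, pivot, focal_length)
    for _ in range(max_iters):
        if best < tolerance:
            break
        candidate, candidate_err = None, best
        for i in range(6):
            for sign in (1.0, -1.0):
                trial = list(params)
                trial[i] += sign * steps[i]
                err = triangle_error(trial, triangle_3d, observed_2d,
                                     pivot, focal_length)
                if err < candidate_err:
                    candidate, candidate_err = trial, err
        if candidate is None:
            steps = [s * 0.5 for s in steps]  # nothing helped: refine steps
        else:
            params, best = candidate, candidate_err
    return params, best
```

Starting each frame from zero motion relative to the previous frame's 3-D positions reflects the point made above: with small inter-frame motion, the initial guess is already near the minimum, so few iterations should be needed.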
Furthermore, some characteristics of human head motion may help in designing good error criteria that guide the iterations toward the global minimum. However, incorrect estimations are still possible, so the capability to recover from incorrect results is necessary. Prediction algorithms are usually employed to provide a next possible step from previous history data, so a prediction algorithm is included in the motion estimation procedure for error recovery. In fact, the prediction algorithm can also help to smooth the estimated motions.
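As an illustration of predicting the next step from history data, the sketch below implements the classical GM(1,1) grey model. The source names the Grey System only as an example of a prediction process, so the choice of GM(1,1), the function name `gm11_predict`, and the minimum history length are assumptions. GM(1,1) is normally applied to positive series, so a signed motion parameter would have to be offset before use.

```python
import math


def gm11_predict(history):
    """Predict the next value of a short positive series with GM(1,1)."""
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four history values")
    # Accumulated generating operation: running sums of the raw series.
    x1, total = [], 0.0
    for value in history:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(history[1:])
    # Least-squares fit of y = slope * z + b, where slope = -a.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate history; cannot fit the model")
    slope = (m * szy - sz * sy) / det
    b = (sy - slope * sz) / m
    a = -slope
    if abs(a) < 1e-12:
        return b  # no growth or decay: increments are constant

    def x1_hat(k):
        """Fitted accumulated value at 0-based index k."""
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # The next raw value is the difference of consecutive accumulated values.
    return x1_hat(n) - x1_hat(n - 1)
```

In an error-recovery role, a value such as `gm11_predict(recent_values)` could replace an estimate whose triangle error is judged unacceptable; how that judgment is made is outside this sketch.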
Moreover, a calibration procedure that also utilizes human head characteristics to simplify calibration is designed in this embodiment.
Ouhyoung Ming
Wu Fu-Che
Yang Tzong-Jer
Au Amelia M.
Cyberlink Corp.
Fish & Richardson P.C.
Miller Martin