Image analysis – Applications – 3-d or stereo imaging analysis
Reexamination Certificate
1998-03-20
2001-11-20
Mancuso, Joseph (Department: 2623)
C345S440000
active
06320978
ABSTRACT:
BACKGROUND
1. Technical Field
The invention is related to a system and process for extracting 3D structure from plural, stereo, 2D images of a scene by representing the scene as a group of image layers characterized by estimated parameters including the layer's orientation and position, per-pixel color, per-pixel opacity, and optionally a residual depth map, and more particularly, to such a system and process for refining the estimates for these layer parameters.
2. Background Art
Extracting structure from stereo has long been an active area of research in the imaging field. However, the recovery of pixel-accurate depth and color information from multiple images remains largely unsolved. Existing stereo algorithms work well when matching feature points or the interiors of textured objects, but most techniques are not sufficiently robust and perform poorly around occlusion boundaries and in untextured regions.
For example, a common theme in recent attempts to solve these problems has been the explicit modeling of the 3D volume of the scene. The volume of the scene is first discretized, usually in terms of equal increments of disparity. The goal is then to find the so-called voxels which lie on the surfaces of the objects in the scene using a stereo algorithm. The potential benefits of these approaches include the equal and efficient treatment of a large number of images, the explicit modeling of occluded regions, and the modeling of mixed pixels at occlusion boundaries to obtain sub-pixel accuracy. However, discretizing space volumetrically introduces a huge number of degrees of freedom. Moreover, modeling surfaces by a discrete collection of voxels can lead to sampling and aliasing artifacts.
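As a rough illustration of discretizing in equal increments of disparity, the sketch below samples disparity uniformly and converts each level to depth. It assumes a rectified two-camera rig, for which depth is Z = f * B / d; the function names, focal length, and baseline are hypothetical values chosen for the example and do not come from the text.

```python
def disparity_levels(d_min, d_max, count):
    """Return `count` equally spaced disparities from d_min to d_max inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (d_max - d_min) / (count - 1)
    return [d_min + i * step for i in range(count)]


def depth_from_disparity(disparity, focal_px, baseline):
    """Depth for a rectified stereo pair, Z = f * B / d (assumed camera model)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity


# Hypothetical rig: 500-pixel focal length, 0.1-unit baseline.
levels = disparity_levels(1.0, 5.0, 5)
depths = [depth_from_disparity(d, focal_px=500.0, baseline=0.1) for d in levels]

# Spacing between consecutive depth samples, from far to near.
gaps = [depths[i] - depths[i + 1] for i in range(len(depths) - 1)]
```

Under these assumptions the depth samples are not evenly spaced: equal disparity steps place the samples densely near the camera and sparsely far from it.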
Another active area of research directed toward solving the aforementioned problems is the detection of multiple parametric motion transformations within image sequence data. The overall goal is the decomposition of the images into sub-images (or “layers”) such that the pixels within each sub-image move consistently with a single parametric transformation. Different sub-images are characterized by different sets of parameter values for the transformation. A transformation of particular importance is the 8-parameter homography (collineation), because it describes the motion of points on a rigid planar patch as either it or the camera moves. The 8 parameters of the homography are functions of the plane equations and camera matrices describing the motion.
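The dependence of the homography on the plane equation and camera motion can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation used in the patent: image coordinates are normalized (camera intrinsics taken as the identity), the plane is written as n . X = d in the first camera's frame, and the second camera sees X2 = R X + t. Under those assumptions H = R + t n^T / d. The matrix has 9 entries but is defined only up to scale, which leaves 8 free parameters. All function names are hypothetical.

```python
def plane_homography(R, t, n, d):
    """H = R + t n^T / d for the plane n . X = d in the first camera's frame."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]


def apply_homography(H, x, y):
    """Map a normalized image point (x, y) through H and dehomogenize."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def project(X):
    """Pinhole projection to normalized image coordinates."""
    return X[0] / X[2], X[1] / X[2]


# Hypothetical setup: pure translation between the two views, tilted plane.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.2, 0.0, 0.1]
n = [0.0, 0.6, 0.8]
X = [1.0, 2.0, 4.0]                      # a 3D point on the plane
d = sum(n[i] * X[i] for i in range(3))   # plane offset so that n . X = d

H = plane_homography(R, t, n, d)
x1 = project(X)                                                     # view 1
X2 = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
x2 = project(X2)                                                    # view 2
x2_from_H = apply_homography(H, *x1)     # view 1 point warped by H
```

In this setup the point warped by H coincides with the direct projection into the second view, which is the property that lets a single transformation describe every pixel on the planar patch.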
While existing layer extraction techniques have been successful in detecting multiple independent motions, the same cannot be said for scene modeling. For instance, the fact that the plane equations are constant in a static scene (or a scene imaged by several cameras simultaneously) has not been exploited. This is a consequence of the fact that, for the most part, existing approaches have focused on the two-frame problem. Even when multiple frames have been considered, it has primarily been for the purpose of using past segmentation data to initialize future frames. Another important omission is the proper treatment of transparency. With a few exceptions, the decomposition of an image into layers that are partially transparent (translucent) has not been attempted.
SUMMARY
The present invention relates to stereo reconstructions that optimally recover pixel-accurate depth and color information from multiple images, including around occlusion boundaries and in untextured regions. This is generally accomplished using an approach to the stereo reconstruction that represents the 3D scene as a collection of approximately planar layers, where each layer has an explicit 3D plane equation and a layer sprite image, and may also be characterized by a residual depth map. The layer sprite refers to a colored image with a defined per-pixel opacity (transparency). The residual depth map refers to a per-pixel depth value relative to the plane. Segregating the scene into planar components allows the modeling of a wider range of scenes. To recover the structure of the scene, standard techniques from parametric motion estimation, image alignment, and mosaicing can be employed.
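One way to picture the layer representation described above is as a small record type. The sketch below is illustrative only: the class name, field names, the plane convention n . X = d, and the use of nested lists of RGBA tuples for the sprite are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RGBA = Tuple[float, float, float, float]  # color plus per-pixel opacity in [0, 1]


@dataclass
class Layer:
    normal: Tuple[float, float, float]   # plane normal n (assumed n . X = d)
    distance: float                      # plane offset d
    sprite: List[List[RGBA]]             # per-pixel color and opacity
    residual_depth: Optional[List[List[float]]] = None  # optional offsets

    def __post_init__(self):
        # The sprite must be rectangular, and the optional residual depth
        # map must have the same shape as the sprite.
        width = len(self.sprite[0]) if self.sprite else 0
        if any(len(row) != width for row in self.sprite):
            raise ValueError("sprite rows must have equal length")
        if self.residual_depth is not None:
            same_rows = len(self.residual_depth) == len(self.sprite)
            if not same_rows or any(len(r) != width for r in self.residual_depth):
                raise ValueError("residual depth map must match sprite shape")

    def offset_at(self, row, col):
        """Depth offset of a pixel from the layer plane (0.0 if no map)."""
        if self.residual_depth is None:
            return 0.0
        return self.residual_depth[row][col]


# A 1x2 sprite: one opaque red pixel and one fully transparent pixel.
flat = Layer(normal=(0.0, 0.0, 1.0), distance=4.0,
             sprite=[[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]])
```

Keeping the residual depth map optional mirrors the text, where a layer may be characterized by the plane and sprite alone.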
More specifically, the approach to the stereo reconstruction based on representing the 3D scene as a collection of approximately planar layers involves first estimating the desired parameters (e.g. plane equation, sprite image and depth map) and then refining these estimates. The estimating phase can be accomplished via any appropriate method, such as the methods disclosed in a co-pending application entitled STEREO RECONSTRUCTION EMPLOYING A LAYERED APPROACH by the inventors of this application and assigned to the common assignee. This application was filed on Mar. 20, 1998 and assigned Ser. No. 09/045,519. The full approach disclosed in the application, which is believed to provide the best estimate of the layer parameters, and so the structure of the 3D scene, includes:
(a) inputting plural 2D images as well as camera projection matrices defining the location and orientation of the camera(s) responsible for creating each image, respectively;
(b) assigning each pixel making up each 2D image to one of the plural layers;
(c) estimating a plane equation for each layer that defines the orientation and position of that layer in 3D space;
(d) estimating a sprite image for each layer characterized by a per-pixel color and a per-pixel opacity;
(e) estimating a residual depth map for each layer wherein each residual depth map defines the distance each pixel of the associated layer is offset from the estimated plane of that layer;
(f) re-estimating each layer's sprite image based on the residual depth map associated with the layer;
(g) re-assigning pixels assigned to a particular layer to another layer by using the estimates for the plane equation, sprite image, and residual depth map for each layer as a guide;
(h) iteratively repeating steps (c) through (g) for each layer until the change in the value of at least one layer parameter relative to its value in an immediately preceding iteration falls below a prescribed threshold assigned to the parameter; and
(i) outputting data representative of the plane equation, sprite image and residual depth map estimates for each layer.
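The iteration in step (h) can be pictured as a loop with a per-parameter stopping test. The sketch below is a simplification under stated assumptions: each layer parameter is reduced to a single number, the `update` callable stands in for steps (c) through (g), and `max_iters` is an added safety limit. The function name, the toy update rule, and the threshold value are hypothetical.

```python
def iterate_until_converged(params, update, thresholds, max_iters=100):
    """Repeat `update` until the change in at least one parameter falls
    below the threshold assigned to it, or until max_iters is reached."""
    for iteration in range(1, max_iters + 1):
        new_params = update(params)
        changes = {k: abs(new_params[k] - params[k]) for k in thresholds}
        params = new_params
        if any(changes[k] < thresholds[k] for k in thresholds):
            return params, iteration
    return params, max_iters


def toy_update(params):
    """Stand-in for steps (c) through (g): move halfway toward 4.0."""
    return {"plane_offset": params["plane_offset"]
            + 0.5 * (4.0 - params["plane_offset"])}


final, iterations = iterate_until_converged(
    {"plane_offset": 0.0}, toy_update, {"plane_offset": 0.05})
```

With real layer parameters such as sprite images, the change would have to be measured with some norm over the image rather than a scalar difference; that choice is left open here.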
Only the input, pixel assignment, plane equation and sprite image estimation, and output modules (less the residual depth map) are necessary to produce a useable layered representation of the scene. However, the accuracy of the layered representation can be progressively improved with the respective addition of each of the remaining modules, i.e. the depth map estimation, sprite image re-estimation, and pixel re-assignment and iteration modules.
The initial estimates of the layer parameters are refined in accordance with systems and methods embodying the present invention. The refining process is accomplished using a re-synthesis, which takes into account both occlusions and mixed pixels. Approximate knowledge of the 3D structure (camera matrices and plane equations) allows reasoning about the image formation process. Specifically, a forward (generative) model of image synthesis is used, as well as a process of measuring how well the layers re-synthesize the input images. Optimizing this measure allows refinement of the layer sprite estimates, and, in particular, the estimate of their true colors and opacities. This approach results in the correct recovery of mixed pixels, a step which is necessary to obtain sub-pixel accuracy and to ensure robustness at occlusion boundaries. Once the layer sprite estimates are refined, the plane equations and residual depth maps (if employed) can be refined as well using the original estimation process, such as the one disclosed in the aforementioned co-pending application.
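A forward model of this kind, together with a re-synthesis measure, might look like the sketch below. It makes several assumptions that the text does not confirm: the layers have already been warped into the view being synthesized, they are combined back to front with the standard "over" compositing operator, colors are single grayscale values that are not premultiplied by opacity, and the measure is a sum of squared differences. Function names are hypothetical.

```python
def composite_over(layers_back_to_front):
    """Synthesize an image from layers of per-pixel (color, alpha) samples.

    Each layer is a 2D list of (color, alpha) pairs, already warped into
    the target view. Layers are combined back to front with `over`.
    """
    height = len(layers_back_to_front[0])
    width = len(layers_back_to_front[0][0])
    image = [[0.0] * width for _ in range(height)]
    for layer in layers_back_to_front:
        for r in range(height):
            for c in range(width):
                color, alpha = layer[r][c]
                image[r][c] = alpha * color + (1.0 - alpha) * image[r][c]
    return image


def resynthesis_error(synthesized, observed):
    """Sum of squared differences between synthesized and observed images."""
    return sum((s - o) ** 2
               for s_row, o_row in zip(synthesized, observed)
               for s, o in zip(s_row, o_row))


background = [[(0.2, 1.0), (0.2, 1.0)]]
# The second foreground pixel straddles an occlusion boundary (alpha 0.5).
foreground_mixed = [[(1.0, 1.0), (1.0, 0.5)]]
foreground_binary = [[(1.0, 1.0), (1.0, 1.0)]]
observed = [[1.0, 0.6]]

error_mixed = resynthesis_error(
    composite_over([background, foreground_mixed]), observed)
error_binary = resynthesis_error(
    composite_over([background, foreground_binary]), observed)
```

In this toy case the fractional opacity reproduces the observed boundary pixel, while a binary opacity leaves a residual, which is the sense in which optimizing the measure can recover mixed pixels.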
The layered approach to stereo reconstruction shares many of the advantages of the previously described volumetric approaches because the 3D information contained in the layers is used to reason about occlusion and mixed pixels. However, the layered approa
Anandan Padmananbhan
Baker Simon
Szeliski Richard S.
Bali Vikkram
Lyon Richard T.
Lyon, Harr & DeFrank
Mancuso Joseph
Microsoft Corporation