Image processing apparatus
Classification: Computer graphics processing and selective visual display system – Computer graphics processing – Three-dimension
Type: Reexamination Certificate
Filed: 2000-06-05
Issued: 2004-09-14
Examiner: Mancuso, Joseph (Department: 2697)
Status: active
Patent number: 06791540
ABSTRACT:
The present invention relates to the field of image-based rendering, that is, the processing of data defining pre-acquired images (real or synthetic, static or dynamic) to synthesise a new image from a desired viewpoint without relying upon a geometric model of the subject.
Images such as photographs, television pictures and video pictures provide a two-dimensional view of a scene only from fixed viewpoints determined by the positions of the cameras. However, it is often desirable to view the scene from a different viewing position/orientation, and accordingly a number of techniques have been developed to allow this.
In one approach, known as “model-based rendering”, a geometric model of the subject is created using geometric primitives such as polygons, and the model is then rendered from a desired viewing position and orientation, taking into account the reflectance properties of the subject's surface and parameters defining the position and characteristics of the light sources.
Such an approach suffers from many problems, however, not least the time and processing resources necessary to define the geometric model, surface reflectances and light sources sufficiently well that a realistic output image can be achieved.
As a result, a number of “image-based rendering” techniques have been developed which can generate an image for a viewing position/orientation different from those of the input images without using a geometric model of the subject.
For example, techniques based on interpolating the positions and colours of pixels in two images have been proposed to generate intermediate views, such as in “View Morphing” by Seitz and Dyer in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 21-30. However, intermediate views can be generated only for viewpoints lying on the line connecting the viewpoints of the two original images.
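As a very rough illustration of this kind of interpolation (not the View Morphing method itself, which additionally pre-warps the images so that the interpolation is physically valid), corresponding pixel positions and colours in two views might simply be blended linearly. The function below is a hypothetical sketch; the names and the purely linear blend are assumptions for illustration.

```python
import numpy as np

def interpolate_correspondences(points_a, colours_a, points_b, colours_b, s):
    """Blend matched pixel positions and colours from two source views.

    points_a, points_b   : (N, 2) arrays of corresponding pixel coordinates
    colours_a, colours_b : (N, 3) arrays of the corresponding RGB values
    s                    : interpolation parameter, 0 gives view A, 1 gives view B
    """
    positions = (1.0 - s) * points_a + s * points_b   # interpolated pixel positions
    colours = (1.0 - s) * colours_a + s * colours_b   # interpolated colours
    return positions, colours
```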
An image-based rendering technique which allows an image to be generated from an arbitrary viewing position/orientation is disclosed in “Light Field Rendering” by Levoy and Hanrahan in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 31-42, in which a four-dimensional light field defining radiance as a function of position and direction is generated. This function characterises the flow of light through unobstructed space in a static scene with fixed illumination. A new image is generated by extracting an appropriate two-dimensional slice of the light field. However, the number of input images required and the time and processing resources necessary to perform this technique are considerable.
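A minimal sketch of the idea, assuming the common two-plane parameterisation in which a ray is indexed by its intersections (u, v) and (s, t) with two parallel planes, might look as follows; the array layout, plane positions and nearest-neighbour lookup here are illustrative assumptions rather than the scheme of the paper.

```python
import numpy as np

def render_light_field_slice(light_field, ray_origins, ray_dirs, z_uv=0.0, z_st=1.0):
    """Look up radiance for a batch of viewing rays in a discretised 4D light
    field L(u, v, s, t) stored over two parallel planes.

    light_field : array of shape (nu, nv, ns, nt, 3), samples over [0, 1) plane coordinates
    ray_origins : (N, 3) ray start points; ray_dirs : (N, 3) ray directions
    Nearest-neighbour lookup only; real systems interpolate between samples.
    """
    t_uv = (z_uv - ray_origins[:, 2]) / ray_dirs[:, 2]   # ray parameter at the uv plane
    t_st = (z_st - ray_origins[:, 2]) / ray_dirs[:, 2]   # ray parameter at the st plane
    uv = ray_origins[:, :2] + t_uv[:, None] * ray_dirs[:, :2]
    st = ray_origins[:, :2] + t_st[:, None] * ray_dirs[:, :2]
    nu, nv, ns, nt, _ = light_field.shape
    iu = np.clip((uv[:, 0] * nu).astype(int), 0, nu - 1)
    iv = np.clip((uv[:, 1] * nv).astype(int), 0, nv - 1)
    js = np.clip((st[:, 0] * ns).astype(int), 0, ns - 1)
    jt = np.clip((st[:, 1] * nt).astype(int), 0, nt - 1)
    return light_field[iu, iv, js, jt]                    # (N, 3) radiance values
```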
“The Lumigraph” by Gortler et al in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 43-54 discloses a technique in which a simplified light field function is calculated by considering only light rays leaving points on a convex surface that encloses the object. In this technique, however, images can be synthesised only from viewpoints exterior to the convex hull of the object being modelled, and the number of input images required and the processing time and effort are still very high.
A further image-based rendering technique is described in “Multiple-Centre-of-Projection Images” by Rademacher and Bishop in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1998, pages 199-206. In this technique a multiple-centre-of-projection image of a scene is acquired, that is, a single two-dimensional image and a parameterised set of cameras meeting the conditions that (1) the cameras must lie on either a continuous curve or a continuous surface, (2) each pixel is acquired by a single camera, (3) viewing rays vary continuously across neighbouring pixels, and (4) two neighbouring pixels must either correspond to the same camera or to neighbouring cameras. In practice, the required multiple-centre-of-projection image is acquired by translating a one-dimensional CCD camera along a path so that one-dimensional image-strips are captured at discrete points on the path and concatenated into the image buffer. However, the scene must be static to prevent mismatched data as every image-strip is captured at a different time. To render an image of the scene from a new viewpoint, the reprojected location in world-space of each pixel from the multiple-centre-of-projection image is computed, and the reprojected points are then rendered to reconstruct a conventional range image from the new viewpoint. To perform the rendering, a splatting technique is proposed, which consists of directly rendering each point using a variable-size reconstruction kernel (e.g. a Gaussian blob), for example as described in “An Anti-Aliasing Technique for Splatting” by Swan et al in Proceedings IEEE Visualization 1997, pages 197-204. This technique suffers, inter alia, from the problem that a multiple-centre-of-projection image is required as input.
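The following is a small illustrative sketch of splatting reprojected points into an image buffer with a fixed-size Gaussian kernel; it is an assumption-laden simplification, not the anti-aliased, variable-kernel scheme of Swan et al, and depth ordering of the points is ignored.

```python
import numpy as np

def splat_points(points_px, colours, width, height, sigma=1.0, radius=2):
    """Render each reprojected point as a Gaussian-weighted blob.

    points_px : (N, 2) array of projected pixel coordinates in the new view
    colours   : (N, 3) array of point colours
    """
    accum = np.zeros((height, width, 3))
    weight = np.zeros((height, width))
    for (px, py), colour in zip(points_px, colours):
        x0, y0 = int(round(px)), int(round(py))
        for y in range(max(0, y0 - radius), min(height, y0 + radius + 1)):
            for x in range(max(0, x0 - radius), min(width, x0 + radius + 1)):
                w = np.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
                accum[y, x] += w * np.asarray(colour)
                weight[y, x] += w
    covered = weight > 0
    accum[covered] /= weight[covered, None]   # normalise by the total splat weight
    return accum, covered                     # `covered` marks pixels hit by at least one splat
```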
A number of hybrid approaches, which combine model-based rendering and image-based rendering, have been proposed.
For example, “View-based Rendering: Visualizing Real Objects from Scanned Range and Color Data” by Pulli et al in Proceedings Eurographics 8th Workshop on Rendering, June 1997, pages 23-34, discloses a technique in which a partial geometric model comprising a triangle mesh is created interactively for each input image, each input image originating from a different viewpoint. To synthesise an image from a new viewpoint, the partial models generated from input images at three viewpoints close to the new viewpoint are rendered separately and combined using a pixel-based weighting algorithm to give the synthesised image.
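By way of illustration only (the actual weighting in the paper depends on factors such as viewing angle and surface sampling quality, which are not reproduced here), a per-pixel weighted combination of three rendered views might be sketched as:

```python
import numpy as np

def blend_rendered_views(renders, weights):
    """Combine separately rendered partial models with per-pixel weights.

    renders : list of (H, W, 3) images rendered from the partial models
    weights : list of (H, W) per-pixel weight maps, one per rendered image
    """
    num = np.zeros_like(renders[0], dtype=float)
    den = np.zeros(renders[0].shape[:2], dtype=float)
    for img, w in zip(renders, weights):
        num += w[..., None] * img      # weight each view's contribution per pixel
        den += w
    den = np.where(den > 0, den, 1.0)  # avoid division by zero where no view contributes
    return num / den[..., None]
```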
“Constructing Virtual Worlds Using Dense Stereo” by Narayanan and Kanade in Proceedings 6th ICCV, 1998, pages 3-10, discloses a hybrid technique in which the intensity image and depth map for each camera view at each instant in time are processed to generate a respective textured polygon model for each camera, representing the scene visible to that camera. To generate an image for a user-given viewpoint, the polygon model which was generated from the camera closest to the user viewpoint (a so-called “reference” camera) is rendered, and holes in the resulting rendered view are filled by rendering the polygon models which were generated from two cameras neighbouring the reference camera. If any holes still remain, they are filled by interpolating pixel values from nearby filled pixels. Alternatively, a global polygon model of the whole scene can be constructed and rendered from the desired viewpoint.
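As an illustration of the final hole-filling step only (the paper does not specify the exact interpolation, so the neighbour-averaging scheme below is purely an assumption), remaining holes could be filled from already-filled neighbouring pixels:

```python
import numpy as np

def fill_remaining_holes(image, filled_mask):
    """Fill pixels still marked as holes from the average of their filled
    8-neighbours, repeating until no hole with a filled neighbour remains.

    image       : (H, W, 3) rendered view with holes
    filled_mask : (H, W) boolean array, True where a pixel already has a value
    """
    image = image.astype(float)
    filled = filled_mask.copy()
    h, w = filled.shape
    changed = True
    while changed:
        changed = False
        for y in range(h):
            for x in range(w):
                if filled[y, x]:
                    continue
                neighbours = [image[y + dy, x + dx]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (dy or dx)
                              and 0 <= y + dy < h and 0 <= x + dx < w
                              and filled[y + dy, x + dx]]
                if neighbours:
                    image[y, x] = np.mean(neighbours, axis=0)  # average of filled neighbours
                    filled[y, x] = True
                    changed = True
    return image
```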
In both of the hybrid techniques described above, a large number of closely-spaced cameras is required to provide the input data unless the viewpoints from which a new image can be generated are severely restricted and/or a degraded quality of generated image is accepted. This is because a partial geometric model must be available from each of a number of cameras that are close to the viewpoint from which the new image is to be rendered. For example, in the technique described in “Constructing Virtual Worlds Using Dense Stereo”, 51 cameras are mounted on a 5 meter geodesic dome to record a subject within the dome. In addition, processing time and resource requirements are increased due to the requirement to generate at least partial geometric models.
The present invention has been made with the above problems in mind.
According to the present invention, there is provided an image-based rendering method or apparatus, in which, to generate a value for a pixel in a virtual image from a user-defined viewpoint, input depth map images are tested to identify the pixel or pixels therein which represent the part of the scene potentially visible to the pixel in the virtual image, and a value for the pixel in the virtual image is calculated based on the pixel(s) which represent the part of the scene closest to the virtual image.
Preferably, a Z-buffer is used to maintain pixel values for the virtual image; the buffer is updated, as the input depth map images are tested, whenever the pixel or pixels identified from a depth map image represent a part of the scene closer to the virtual image than that represented by the value currently held in the buffer.
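A highly simplified sketch of this kind of Z-buffered synthesis from depth map images is given below. The camera models are represented by caller-supplied functions, and forward projection of every input pixel is used as a stand-in for the visibility test described above, so this should be read as an assumption-laden illustration rather than the claimed method itself.

```python
import numpy as np

def synthesise_virtual_view(depth_images, colour_images, cam_to_world_fns,
                            world_to_virtual_fn, width, height):
    """Forward-project every input depth-map pixel into the virtual view and
    keep, per virtual pixel, the contribution closest to the virtual camera.

    depth_images        : list of (h, w) depth maps, one per input camera
    colour_images       : list of (h, w, 3) colour images aligned with the depth maps
    cam_to_world_fns    : per-camera functions (x, y, depth) -> 3D world point (assumed)
    world_to_virtual_fn : function mapping a world point -> (u, v, z) in the virtual view (assumed)
    """
    zbuffer = np.full((height, width), np.inf)   # Z-buffer for the virtual image
    virtual = np.zeros((height, width, 3))
    for depth, colour, cam_to_world in zip(depth_images, colour_images, cam_to_world_fns):
        h, w = depth.shape
        for y in range(h):
            for x in range(w):
                world_pt = cam_to_world(x, y, depth[y, x])   # back-project the input pixel
                u, v, z = world_to_virtual_fn(world_pt)      # project into the virtual view
                ui, vi = int(round(u)), int(round(v))
                if 0 <= ui < width and 0 <= vi < height and 0 < z < zbuffer[vi, ui]:
                    zbuffer[vi, ui] = z                      # closer part of the scene: update
                    virtual[vi, ui] = colour[y, x]
    return virtual
```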
Inventor: Arnold Adam
Assignee: Canon Kabushiki Kaisha
Law firm: Fitzpatrick, Cella, Harper & Scinto
Examiner: Mancuso, Joseph