Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Reexamination Certificate
1998-02-19
2001-01-09
Cabeca, John W. (Department: 2752)
C711S117000
active
06173372
ABSTRACT:
TECHNICAL FIELD
The invention relates to the parallel processing of data matrices, such as images, e.g., in computer graphics or video systems which employ SIMD (single instruction multiple data) arrays of processor elements.
BACKGROUND OF THE INVENTION
The focus in this review of background art is on real-time computer graphics systems. However, no corresponding limitation of the scope of the invention is intended.
The range of technologies employed in computer graphics systems is well summarised in books such as “Computer Graphics: Principles and Practice”, J. D. Foley et al., Addison-Wesley, 1990. These also present valuable summaries of the various uses to which such systems have been and may be put. In a typical system, a task might be to present a two dimensional projection of a three dimensional model held in computer memory. Complex objects in the model are constructed from more basic components such as flat polygons. Each of these components has aspects of its position, appearance, and behaviour defined by a set of numbers. Constructing a two dimensional projection of the model is termed rendering. The rendered image might be displayed on a computer monitor, for example. In an interactive system, the renderer must generate not isolated images but a rapid succession of related frames.
Graphics rendering schemes generally include separate “rasterization” and “shading” processes. Rasterization determines which parts of the model contribute directly to which pixels (picture elements) of the image. The shading process then “paints in” the contributions of those parts. The degree of realism achieved in the final image depends very much on the behaviour of the shading algorithm. One popular technique for improving the accuracy or richness of rendered images is called texture mapping; see for example “Survey of Texture Mapping,” P. S. Heckbert, IEEE Computer Graphics and Applications, v6.11, p56-67, November 1986, or U.S. Pat. No. 5,490,240: “System and method of generating interactive computer graphic images incorporating three dimensional textures,” J. L. Foran et al., Silicon Graphics.
In this technique the model has data-sets added which can be used to define surface detail. A typical data-set might be a pattern of bricks to tile onto a wall or a user's photograph to personalise a mannequin. The business of reading texture data from shared memory into the image is part of the shading process.
Texture mapping is in part a re-sampling operation, because the texture sample grids and screen pixel grids do not tend to line up. Practical systems use interpolation to keep the re-sampling errors acceptably small. Aliasing artifacts are commonly avoided by making multiple copies of the textures available, each one having a different resolution. These multi-resolution data-sets are called MIP-maps (where MIP stands for multum in parvo, multiple image pyramid, or similar). With bilinear interpolation, texture values are calculated as linear weightings of the four closest texels (texture elements), as indicated in FIG. 1 of the accompanying drawings. The texels are taken from MIP-map levels which are chosen, on a per pixel basis, to give an acceptable compromise between too much aliasing and too much blurring. In a simple implementation, bilinear interpolation uses four times as many texel lookups as point sampling does. Many high-quality renderers use trilinear interpolation, which demands eight texels per pixel. Four of these are taken from one MIP-map level, and the other four from an adjacent level. This allows blur banding effects to be avoided.
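The weighting described above can be sketched in a few lines of Python. This is only an illustration, not the patent's method: the function names, the edge-clamping rule, the convention that texel (i, j) sits at integer coordinates, and the assumption that each coarser MIP-map level halves the resolution are all choices made here for the example.

```python
import math

def bilinear(level, u, v):
    """Sample a texel grid (list of rows) at continuous texel coordinates (u, v)."""
    h, w = len(level), len(level[0])
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    # Clamp the four neighbours to the texture edge (assumed addressing mode).
    xa, xb = min(max(x0, 0), w - 1), min(max(x0 + 1, 0), w - 1)
    ya, yb = min(max(y0, 0), h - 1), min(max(y0 + 1, 0), h - 1)
    top = (1 - fx) * level[ya][xa] + fx * level[ya][xb]
    bottom = (1 - fx) * level[yb][xa] + fx * level[yb][xb]
    return (1 - fy) * top + fy * bottom

def trilinear(mip, u, v, lod):
    """Blend bilinear samples from two adjacent MIP-map levels (lod >= 0)."""
    lo = min(math.floor(lod), len(mip) - 1)
    hi = min(lo + 1, len(mip) - 1)
    t = lod - math.floor(lod)
    # Assumption: level k has 1/2**k the resolution of level 0.
    a = bilinear(mip[lo], u / 2 ** lo, v / 2 ** lo)
    b = bilinear(mip[hi], u / 2 ** hi, v / 2 ** hi)
    return (1 - t) * a + t * b
```

Each call to `bilinear` reads four texels, so `trilinear` reads eight, which matches the lookup counts given in the text.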
The phrase “texture mapping” can refer to a wide range of rendering operations, including bump mapping, environment mapping, image resampling, lightsource mapping, and others.
However, there are many other data matrix processing tasks which also involve re-sampling and which hence share the need for fast interpolation. One such is motion-vector-based video processing. See for example U.S. Pat. No. 5,396,592: “Image signal interpolating circuit for calculating interpolated values for varying block sizes”, T. Fujimoto, Sony.
To address the problem of aliasing at polygon edges, supersampling is often employed. With “four times” supersampling, each final-image pixel receives contributions from four separately rasterized sub-pixels. The word “pixel”, used casually, often refers not just to final-image pixels but to sub-pixels as well.
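As a rough illustration of “four times” supersampling, the sketch below merges four sub-pixels into each final-image pixel. The 2x2 sub-pixel layout and the equal-weight (box filter) average are assumptions made for the example; the text does not specify either, and the name `resolve_4x` is hypothetical.

```python
def resolve_4x(subpixels):
    """Average each 2x2 block of sub-pixel values into one final-image pixel.

    Assumes an even number of rows and columns and equal weighting.
    """
    out = []
    for y in range(0, len(subpixels), 2):
        row = []
        for x in range(0, len(subpixels[0]), 2):
            total = (subpixels[y][x] + subpixels[y][x + 1]
                     + subpixels[y + 1][x] + subpixels[y + 1][x + 1])
            row.append(total / 4)
        out.append(row)
    return out
```

A sub-pixel grid of 2 rows by 4 columns therefore resolves to a single row of two final pixels.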
The problem of how to achieve adequate texture bandwidth is a persistent one in graphics system design. One strategy is to defer the shading, for any given region of screen space, until the rasterization of that region has finished. In this way, polygons and parts of polygons which turn out to be obscured in the final image do not ever get shaded. Another measure which can be taken is to hold copies of recently fetched texels in a cache. It should be apparent from FIG. 1 that this can cut the number of main-memory lookups by a significant factor.
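The saving from caching can be estimated with a small counting sketch. It assumes an idealised cache that never evicts, one texel per pixel of scale, and sample points at pixel centres; none of these figures comes from the text, and the names `quartet` and `count_fetches` are hypothetical.

```python
import math

def quartet(u, v):
    """The four texels nearest to (u, v) that bilinear interpolation reads."""
    x0, y0 = math.floor(u), math.floor(v)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]

def count_fetches(sample_points):
    """Return (texel requests, main-memory fetches) with an unbounded cache."""
    cache = set()
    requests = misses = 0
    for u, v in sample_points:
        for texel in quartet(u, v):
            requests += 1
            if texel not in cache:
                misses += 1          # first touch goes to main memory
                cache.add(texel)
    return requests, misses

# A 4x4 block of pixels sampled at pixel centres.
block = [(x + 0.5, y + 0.5) for y in range(4) for x in range(4)]
print(count_fetches(block))  # (64, 25)
```

Under these assumptions the 64 texel requests of the block touch only 25 distinct texels, because neighbouring pixels share most of their quartets.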
High performance renderers necessarily rely on parallel processing techniques. Significantly, an increasing number of new designs are using the SIMD (single instruction multiple data) approach to parallelism, in which the individual processor elements (PEs) operate in lockstep from a common instruction stream. The PixelFlow system is one such design; see “PixelFlow: High-Speed Rendering Using Image Composition”, S. Molnar, J. Eyles and J. Poulton, Computer Graphics, v26.2 (SIGGRAPH '92 Conference Proceedings), p231-240, July 1992. A small PixelFlow system might have 16 array chips, each containing 256 PEs. It would typically tackle the rendering task region by region, the regions being of a size which allocates one PE to each pixel.
Single instruction multiple data (SIMD) arrays intended specifically for graphics rendering tend to have quite limited facilities for interprocessor communications, compared for example to “systolic” arrays. However, a basic facility is often provided, for example to support the sub-pixel merging process in applications which demand supersampling.
Texel caching has been used to advantage in some MIMD (multiple instruction multiple data) arrays. However, none of the standard cache implementation architectures maps sensibly into the SIMD case. This is largely because of the pipelining which is a necessary feature of SIMD array memory subsystems.
The PixelFlow shared-memory subsystem serves as a good example of current art. Since the PEs act in lockstep, they must all wait for every one of any given set of texture reads to have completed before they attempt to use the data. The memory accesses are, in effect, heavily pipelined. Eight banks of memory are provided, each individually addressable. The default strategy is to allocate texels to banks according to the numbering of FIG. 2. This allows eight texels to be fetched for trilinear interpolation in one parallel read.
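The numbering of FIG. 2 is not reproduced in this text, so the sketch below uses an assumed parity-based allocation purely to show how eight banks can serve a trilinear footprint in one parallel read. The formula in `bank` is hypothetical and should not be read as the actual PixelFlow scheme.

```python
def bank(x, y, level):
    """Assumed allocation: x parity, y parity and MIP-level parity pick the bank."""
    return (x % 2) + 2 * (y % 2) + 4 * (level % 2)

def trilinear_banks(x0, y0, level, x1, y1):
    """Banks touched by a 2x2 quartet at (x0, y0) in `level`
    and a 2x2 quartet at (x1, y1) in the adjacent level `level + 1`."""
    banks = []
    for lvl, bx, by in ((level, x0, y0), (level + 1, x1, y1)):
        for dy in (0, 1):
            for dx in (0, 1):
                banks.append(bank(bx + dx, by + dy, lvl))
    return banks
```

Within a quartet the two columns differ in x parity and the two rows in y parity, and adjacent levels differ in level parity, so under this assumed scheme the eight texels always fall in eight different banks.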
Importantly, the PixelFlow system does include a mechanism for making texture lookups conditional on whether PEs do or do not need data. To see the value of this, consider the rasterization and texturing of a single triangle. After rasterization, each PE knows whether it is inside or outside of the triangle's perimeter. The mechanism means that no texture lookups need to take place for PEs which are outside the triangle. This basic step away from full determinism sets the state of the art in SIMD graphics memory subsystems.
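The effect of conditional lookups on a single triangle can be sketched as follows. The edge-function inside test, the pixel-centre sampling, the one-PE-per-pixel region, and the figure of eight texels per active PE (trilinear interpolation) are assumptions chosen for the example, not details taken from the PixelFlow design.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside_mask(tri, width, height):
    """One flag per PE (one PE per pixel): is the pixel centre inside the triangle?"""
    (ax, ay), (bx, by), (cx, cy) = tri
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            e0 = edge(ax, ay, bx, by, px, py)
            e1 = edge(bx, by, cx, cy, px, py)
            e2 = edge(cx, cy, ax, ay, px, py)
            # Accept either winding order; points on an edge count as inside.
            row.append((e0 >= 0 and e1 >= 0 and e2 >= 0)
                       or (e0 <= 0 and e1 <= 0 and e2 <= 0))
        mask.append(row)
    return mask

def lookups_needed(mask, texels_per_pixel=8):
    """Texel reads required when only PEs inside the triangle issue lookups."""
    active = sum(cell for row in mask for cell in row)
    return active * texels_per_pixel
```

For a 4x4 region and the triangle (0, 0), (4, 0), (0, 4), ten of the sixteen PEs are active, so 80 texel reads are needed instead of 128.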
SUMMARY OF PRESENT INVENTION
According to a first aspect of the present invention, there is provided a data matrix processing device which includes a SIMD array of processor elements, a shared memory block to which the processor elements have common read access, and control and/or communications means, the control means being configured to enable the read accesses in general, and texture lookups in particular, to be shared between multiple processor elements or processes.
According to a second aspect of the invention, there is provided a system for enabling the look up of two quartets, these being destined for interpolation on two dif
Cabeca John W.
Peugh Brian R.
PixelFusion Limited
Seyfarth Shaw