System for writing select non-contiguous bytes of data with...

Electrical computers and digital processing systems: processing – Processing control – Logic operation instruction processing

Reexamination Certificate

Details

C712S225000, C712S022000

active

06173393

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of computer systems and, in particular, to a system and micro-architecture for writing select, non-contiguous bytes of packed data in a single instruction.
2. Background Information
Computer technology continues to evolve at an ever-increasing rate. Gone are the days when the computer was merely a business tool primarily used for word-processing and spreadsheet applications. Today, with the evolution of multimedia applications, computer systems have become a common home electronic appliance, much like the television and home stereo system. Indeed, the line between computer systems and other consumer electronic appliances has become blurred, as multimedia applications executing on an appropriately configured computer system will function as a television set, a radio, a video playback device, and the like. Consequently, the market popularity of computer systems is often decided by the amount of memory they contain and the speed at which they can execute such multimedia applications.
Those skilled in the art will appreciate that multimedia and communications applications require the manipulation of large amounts of data represented in a small number of bits to provide the true-to-life renderings of audio and video we have come to expect. For example, to render a 3D graphic, large amounts of eight-bit data must be similarly processed. Prior art processors would have to issue a number of identical instructions to move each byte of data in order to render such a 3D graphic. To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, the Single Instruction, Multiple Data (SIMD) processor architecture has been developed to improve computer system performance by processing several bytes of information in a single instruction.
SIMD architectures take advantage of packing many bytes of data within one register or memory location, employing a data type known in the art as packed data. Packed data generally refers to the representation of multiple numbers by a single value. For example, four eight-bit integer numbers may be represented by a single 32-bit number having four eight-bit segments. Thus, a single instruction from the SIMD instruction set may be used to process four bytes of data that would have required three additional instructions using prior art instruction sets. Accordingly, multiple operations can be performed on separate data elements with one instruction, resulting in significant performance improvements.
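The packed data layout described above can be illustrated with a minimal C sketch. The function names (pack4, lane, invert_all_lanes) and the choice of placing the first byte in the lowest-order segment are assumptions made for illustration only; they do not come from the patent text.

```c
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit word.
 * Assumption: b0 occupies the lowest-order eight-bit segment. */
static uint32_t pack4(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0
         | ((uint32_t)b1 << 8)
         | ((uint32_t)b2 << 16)
         | ((uint32_t)b3 << 24);
}

/* Extract the eight-bit segment at index i (0..3) from a packed word. */
static uint8_t lane(uint32_t packed, unsigned i)
{
    return (uint8_t)(packed >> (8u * i));
}

/* A single bitwise operation acts on all four segments at once:
 * here, every byte of the packed word is inverted. */
static uint32_t invert_all_lanes(uint32_t packed)
{
    return ~packed;
}
```

A bitwise complement is used here only because it needs no carry handling between segments; true SIMD arithmetic instructions keep each segment independent in hardware.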
Because it can process multiple bytes of data with one instruction, the SIMD processor architecture has been shown to be theoretically capable of performance improvements of up to 4× over non-SIMD processor architectures, although improvements of 1.5× to 2× are more typical. There are a couple of reasons why the theoretical 4× performance improvement has not been reached. One reason is the manner in which prior art SIMD processor architectures process packed data. That is, the 4× performance mark of the SIMD processor architecture can only be achieved when the entire set of data embedded within the packed data is to be similarly processed by the instruction. In instances where select, non-contiguous bytes of the packed data are to be processed, inefficiencies result due to the need for multiple instructions and additional cache management. For example, a prior art move operation (MOVQ SRC1, DEST) typically moves packed data identified by a first operand (SRC1) to a location identified by a second operand (DEST). In this case, the entire packed data set identified by SRC1 will be moved to the location identified by DEST. Moving select, non-contiguous bytes of the packed data identified by SRC1 would require multiple instructions.
One example of a prior art approach to moving select, non-contiguous bytes of packed data is the test, branch and write series of instructions. In accordance with this prior art approach, each byte of the packed data is transferred to an integer register, along with a corresponding mask bit. The mask bit is tested and a branch is used to either write or bypass writing the byte to memory. This approach requires many more instructions, and also suffers a performance penalty for poor branch prediction.
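The test, branch and write sequence can be sketched in C as a per-byte loop. This is an illustrative model rather than the instruction sequence of any particular processor; the function name store_bytes_branch and the convention that the most significant bit of each mask byte is the write-enable bit are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Test-branch-write: each byte costs one mask test and one
 * data-dependent conditional branch before the optional store.
 * Assumption: bit 7 of mask[i] enables the write of src[i]. */
static void store_bytes_branch(uint8_t *dest, const uint8_t *src,
                               const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i] & 0x80u) {   /* test the mask bit */
            dest[i] = src[i];    /* write only when enabled */
        }                        /* otherwise bypass the write */
    }
}
```

Because the branch outcome depends on the mask contents, an irregular mask pattern gives the branch predictor little to work with, which is the penalty the paragraph above refers to.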
Another example of a prior art approach to moving select, non-contiguous bytes of packed data is the conditional move. In the conditional move, each byte of the packed data is transferred to an integer register, along with a corresponding mask bit. The mask bit is tested and used with a conditional move instruction to write the byte to memory. This approach avoids the performance penalties of the branch misprediction identified above, but still requires a number of instructions to identify and move select, non-contiguous bytes of the packed data.
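A branch-free variant in the spirit of the conditional move can be sketched as follows. This is a bitwise select written in portable C, not a literal conditional move instruction; the name store_bytes_select and the use of bit 7 of each mask byte as the write-enable bit are assumptions. Note that, unlike a true masked store, this sketch reads and rewrites every destination byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: the mask bit is widened to 0x00 or 0xFF and
 * used to choose between the source byte and the existing
 * destination byte, so no data-dependent branch is needed.
 * Assumption: bit 7 of mask[i] enables the write of src[i]. */
static void store_bytes_select(uint8_t *dest, const uint8_t *src,
                               const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* 0xFF when bit 7 is set, 0x00 otherwise */
        uint8_t m = (uint8_t)(0u - (unsigned)(mask[i] >> 7));
        dest[i] = (uint8_t)((src[i] & m) | (dest[i] & (uint8_t)~m));
    }
}
```

The loop still executes one iteration per byte, which reflects the remaining cost noted above: mispredictions are avoided, but the instruction count stays proportional to the number of bytes.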
Moreover, in addition to the performance loss incurred with the necessity of multiple instructions, the cache management associated with these multiple instructions also results in a performance loss in prior art SIMD processor architectures. That is, those skilled in the art will appreciate that a move instruction is a series of write instructions at the micro-architecture level and, as such, requires a corresponding number of writes to the local processor cache before updating the desired register or main memory location. Thus, the prior art move instructions often result in a number of intermediate writes to the local processor cache, wherein much of the data written to the cache will never again be accessed by the processor, resulting in wasted cache resources.
Thus, a need exists for an improved SIMD architecture which utilizes the packed data format in a more effective manner. Those skilled in the art will appreciate that the teachings of the present invention achieve these and other desired results, as will become apparent from the description to follow.
SUMMARY OF THE INVENTION
In accordance with the teachings of the present invention, a processor is presented comprising a decoder, an execution core and a bus controller. The decoder is operative to decode instructions received by the processor, including a move instruction comprising a first operand identifying a plurality of bytes of packed data and a second operand identifying a corresponding plurality of byte masks. The execution core, coupled to the decoder, is operative to receive the decoded move instruction and analyze each individual byte mask of the plurality of byte masks to identify corresponding bytes within the plurality of bytes of packed data that are write-enabled. The bus controller, coupled to the execution core, is operative to write select bytes of the plurality of bytes of packed data to an implicitly defined location based, at least in part, on the write-enabled byte masks identified by the execution core.
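The behavior of the byte-masked move summarized above can be modeled in software as a single operation taking a packed data operand and a byte mask operand. The following is a behavioral sketch only, not the claimed micro-architecture. Several details are assumptions made for illustration: the function name masked_move, the width of eight bytes per operand, the use of bit 7 of each byte mask as the write-enable bit, the mapping of the lowest-order byte to dest[0], and the explicit dest pointer, which stands in for the implicitly defined location.

```c
#include <stdint.h>

/* Behavioral model of a byte-masked move.
 * src  : eight bytes of packed data (first operand)
 * mask : eight byte masks, one per data byte (second operand)
 * dest : stand-in for the implicitly defined destination (assumption)
 * Returns the number of bytes actually written.
 * Assumption: bit 7 of each byte mask marks the byte write-enabled. */
static unsigned masked_move(uint8_t *dest, uint64_t src, uint64_t mask)
{
    unsigned written = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t byte_mask = (uint8_t)(mask >> (8u * i));
        if (byte_mask & 0x80u) {
            /* only write-enabled bytes reach the destination */
            dest[i] = (uint8_t)(src >> (8u * i));
            written++;
        }
    }
    return written;
}
```

For example, with src equal to 0x0807060504030201 and mask equal to 0x8000008000008000, only the bytes at positions 1, 4 and 7 are written, and the remaining destination bytes are left untouched. The loop exists only in the software model; the point of the summary above is that the hardware performs the selection within one instruction.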


