Method and system using tagged instructions to allow...

Electrical computers and digital processing systems: processing – Instruction decoding – Decoding by plural parallel decoders

Reexamination Certificate

Details

C712S200000, C712S210000

active

06212621

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to microprocessors and, more particularly, to decoding variable length instructions within a microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors by allowing the concurrent execution of multiple instructions. Due to the widespread acceptance of the x86 family of microprocessors, efforts have been undertaken by microprocessor manufacturers to develop superscalar microprocessors which execute x86 instructions. Such superscalar microprocessors achieve relatively high performance characteristics while advantageously maintaining backwards compatibility with the vast amount of existing software developed for previous microprocessor generations such as the 8086, 80286, 80386, and 80486.
The x86 instruction set is relatively complex and is characterized by a plurality of variable length instructions. A generic format illustrative of the x86 instruction set is shown in FIG. 1. As illustrated in the figure, an x86 instruction consists of from zero to four optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
The opcode field 104 defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes. For example, a prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times. The opcode field 104 follows the prefix bytes 102, if any, and may be one or two bytes in length. The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors. A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value. A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value. The next instruction field is the optional displacement field 110, which may be from one to four bytes in length. The displacement field 110 contains a constant used in address calculations. The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand. The shortest x86 instructions are only one byte long, and comprise a single opcode byte. The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
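The SIB address calculation described above can be illustrated with a minimal sketch. The function name and parameters are hypothetical and are not taken from the patent. The arguments are the values held in the base and index registers rather than register numbers, and the two-bit width of the scale field is an assumption drawn from general x86 documentation.

```python
def sib_effective_address(base: int, index: int, scale: int,
                          displacement: int = 0) -> int:
    """Return base + index * 2**scale + displacement, wrapped to 32 bits.

    `base` and `index` are register contents (hypothetical parameters).
    `scale` is the power of two applied to the index value.
    """
    if not 0 <= scale <= 3:
        # Assumption: the scale field is two bits wide, giving 1x, 2x, 4x, 8x.
        raise ValueError("scale must be in the range 0..3")
    # Shifting left by `scale` multiplies the index by 2**scale.
    return (base + (index << scale) + displacement) & 0xFFFFFFFF


# Base 0x1000, index 4 scaled by 2**2 = 4, plus a displacement of 8.
address = sib_effective_address(0x1000, 4, 2, 8)
```

Here the index contributes 4 * 4 = 16 bytes, so `address` is 0x1000 + 16 + 8 = 0x1018.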
The complexity of the x86 instruction set poses many difficulties in implementing high performance x86 compatible superscalar microprocessors. In particular, the variable length of x86 instructions makes decoding instructions difficult. Decoding instructions typically involves determining the boundaries of an instruction and then identifying each field within the instruction, e.g., the opcode and operands.
One method for determining the boundaries of instructions involves generating a number of predecode bits for each instruction byte read from main memory. The predecode bits provide information about the instruction byte they are associated with. For example, an asserted predecode start bit indicates that the associated instruction byte is the first byte of an instruction. Similarly, an asserted predecode end bit indicates that the associated instruction byte is the last byte of an instruction. Once the predecode bits for a particular instruction byte are calculated, they are stored together with the instruction byte in an instruction cache. When a “fetch” is performed, i.e., a number of instruction bytes are read from the instruction cache, the associated start and end bits are also read. The start and end bits may then be used to generate valid masks for the individual instructions within the fetch. A valid mask is a series of bits in which each bit corresponds to a particular instruction byte. Valid mask bits associated with the first byte of an instruction, the last byte of the instruction, and all bytes in between the first and last bytes of the instruction are asserted. All other valid mask bits are not asserted. Turning now to FIG. 2, an exemplary valid mask is shown. The figure illustrates a portion of a fetch 120 and its associated start and end bits 122 and 124. Assuming the valid mask for instruction B 128 is to be generated, start and end bits 122 and 124 would be used to generate valid mask 126. Valid mask 126 could then be used to mask off all bytes within fetch 120 that are not part of instruction B 128.
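The valid mask generation described above might be sketched as follows. This is an illustration under stated assumptions, not the circuit used in the patent: the function name is hypothetical, the bits are modeled as Python lists, and the example bit patterns are invented because the contents of FIG. 2 are not reproduced in this text.

```python
def valid_mask(start_bits, end_bits, first_byte):
    """Build the valid mask for the instruction that starts at `first_byte`.

    `start_bits` and `end_bits` hold one predecode bit per fetched byte.
    The mask is asserted from the start byte through the end byte inclusive.
    """
    if not start_bits[first_byte]:
        raise ValueError("no instruction starts at this byte")
    mask = [0] * len(start_bits)
    for i in range(first_byte, len(start_bits)):
        mask[i] = 1
        if end_bits[i]:
            return mask
    raise ValueError("instruction runs past the end of the fetch")


# Invented example: instruction A is bytes 0-1, B is bytes 2-4, C is bytes 5-7.
start = [1, 0, 1, 0, 0, 1, 0, 0]
end = [0, 1, 0, 0, 1, 0, 0, 1]
fetch = [0x50, 0x51, 0x8B, 0x45, 0x08, 0x90, 0x90, 0xC3]

mask_b = valid_mask(start, end, 2)
# Mask off every byte that is not part of instruction B.
bytes_b = [b for b, m in zip(fetch, mask_b) if m]
```

With these inputs `mask_b` is `[0, 0, 1, 1, 1, 0, 0, 0]` and `bytes_b` contains the three bytes at positions 2 through 4.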
Once the boundaries of an instruction have been determined, the fields within the instruction, e.g., the opcode and operand fields, may be identified. Once again, the variable length of x86 instructions complicates the identification process. In addition, the optional prefix bytes within an x86 instruction create further complications. For example, in some instructions the opcode will begin with the first byte of the instruction, while others may begin with the fourth byte.
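To illustrate how prefix bytes shift the position of the opcode, the sketch below skips leading prefix bytes to find where the opcode begins. The function name is hypothetical, and the set of prefix values comes from general x86 documentation rather than from this patent. The cap of four prefixes follows the instruction format described earlier.

```python
# Assumed prefix byte values (from general x86 references, not this patent):
# LOCK, REPNE, REP, six segment overrides, operand size, address size.
LEGACY_PREFIXES = frozenset(
    {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
)
MAX_PREFIXES = 4  # "zero to four optional prefix bytes"


def opcode_offset(instruction: bytes) -> int:
    """Return the zero-based index of the first opcode byte."""
    offset = 0
    while (offset < len(instruction)
           and offset < MAX_PREFIXES
           and instruction[offset] in LEGACY_PREFIXES):
        offset += 1
    return offset


no_prefix = opcode_offset(bytes([0x90]))                        # opcode is the first byte
three_prefixes = opcode_offset(bytes([0x66, 0x67, 0xF3, 0xA5]))  # opcode is the fourth byte
```

In the second case the three prefix bytes are skipped, so the function returns index 3, i.e. the opcode begins with the fourth byte of the instruction.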
To perform the difficult task of decoding x86 instructions, a number of cascaded levels of logic are typically used. Thus decoding may require a number of clock cycles and create a significant delay before any instructions are available to the functional stages of the microprocessor's pipeline. As microprocessors increase the number of instructions they are able to execute per clock cycle, instruction decoding may become a performance limiting factor. Therefore, an improved mechanism for rapidly decoding large numbers of instructions is needed.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by an instruction alignment and decode unit capable of out of order decoding. By allowing instructions to be decoded out of order, multiple decoders may be efficiently utilized in parallel, thereby reducing overall decode times. The possible performance advantages of out of order decoding are illustrated in FIGS. 3A-3C. FIG. 3A represents a number of fetches performed in program order to a cache. Each fetch may contain a varying number of instructions, and each instruction may vary in byte length. As a result, longer fetches may require more clock cycles to decode than short fetches. FIG. 3B is a timing diagram illustrating one possible timing relationship for two decoders that are capable of performing out of order decoding. In contrast, FIG. 3C illustrates the prior art method for decoding fetches, i.e., each fetch is decoded in order. Out of order decoding may be accomplished by assigning tags to fetches and to decoded instructions within each fetch. The tags may then be used to reorder the instructions after decode so that proper dependency checking may still be performed.
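The tag-based reordering described above can be sketched in software as a sort on a pair of tags. This is only an illustration of the idea, not the hardware mechanism of the patent. The class and function names are hypothetical, the tag encoding (a fetch sequence number plus a position within the fetch) is an assumption, and tag wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedInstruction:
    fetch_tag: int        # assumed: sequence number of the fetch in program order
    instruction_tag: int  # assumed: position of the instruction within its fetch
    text: str             # placeholder for the decoded instruction


def reorder(decoded):
    """Return decoded instructions in program order using their tags."""
    return sorted(decoded, key=lambda d: (d.fetch_tag, d.instruction_tag))


# A short fetch (tag 1) finishes decoding before a longer fetch (tag 0).
arrival_order = [
    DecodedInstruction(1, 0, "D"),
    DecodedInstruction(0, 0, "A"),
    DecodedInstruction(0, 1, "B"),
    DecodedInstruction(0, 2, "C"),
]
program_order = [d.text for d in reorder(arrival_order)]
```

After reordering, `program_order` is `["A", "B", "C", "D"]`, so later stages such as dependency checking see the instructions in their original sequence even though instruction D was decoded first.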
In one embodiment, a microprocessor configured to perform out of order decoding comprises an instruction cache, a tag generator, and a decode unit. The instruction cache is configured to receive a fetch address and in response output a group of instruction bytes corresponding to the fetch address. The tag generator is coupled to the instruction cache and is configured to generate a fetch tag for the group of instruction bytes. The decode unit is coupled to the tag generator and the instruction cache. The decode unit is configured to receive the group of instruction bytes and decode them into one or more instructions. The decode unit is also configured to generate an instruction tag for each decoded instruction that is indicative of the instruction's position in program order. The
