Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1998-09-09
2001-06-26
Nguyen, Than (Department: 2187)
C711S118000, C711S123000, C711S202000, C711S205000, C711S206000, C712S208000, C712S213000, C712S210000
active
06253287
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to storing and scanning variable-length instructions in a microprocessor.
2. Description of the Relevant Art
The number of software applications written for the x86 instruction set is quite large. As a result, despite the introduction of newer and more advanced instruction sets, microprocessor designers have continued to design microprocessors capable of executing the x86 instruction set.
The x86 instruction set is relatively complex and is characterized by a plurality of variable-length instructions. A generic format illustrative of the x86 instruction set is shown in FIG. 1. As illustrated in the figure, an x86 instruction consists of from one to five optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
The opcode field 104 defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes 102. For example, one of prefix bytes 102 may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times. The opcode field 104 follows prefix bytes 102, if present, and may be one or two bytes in length. The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors. A base field within SIB byte 108 specifies which register contains the base value for the address calculation, and an index field within SIB byte 108 specifies which register contains the index value. A scale field within SIB byte 108 specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value. The next instruction field is a displacement field 110, which is optional and may be from one to four bytes in length. Displacement field 110 contains a constant used in address calculations. The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand. The shortest x86 instructions are only one byte long, and comprise a single opcode byte. The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
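The address calculation described for the SIB byte can be illustrated with a minimal Python sketch. The function names are hypothetical, and the bit positions used to split the SIB byte (scale in bits 7-6, index in bits 5-3, base in bits 2-0) come from the standard x86 encoding rather than from the text above. Special encodings, such as the index value that means "no index register", are deliberately ignored.

```python
from typing import Tuple


def split_sib(sib: int) -> Tuple[int, int, int]:
    """Split a SIB byte into its (scale, index, base) fields.

    Assumes the standard x86 layout: scale = bits 7-6,
    index = bits 5-3, base = bits 2-0.
    """
    return (sib >> 6) & 0x3, (sib >> 3) & 0x7, sib & 0x7


def sib_effective_address(base_value: int, index_value: int,
                          scale: int, displacement: int = 0) -> int:
    """Return base + index * 2**scale + displacement, wrapped to 32 bits.

    base_value and index_value are the contents of the registers selected
    by the base and index fields; scale is the raw two-bit scale field.
    """
    if not 0 <= scale <= 3:
        raise ValueError("scale field is two bits wide, so it must be 0..3")
    return (base_value + (index_value << scale) + displacement) & 0xFFFFFFFF
```

For example, with a base register holding 0x1000, an index register holding 4, a scale field of 2 (multiply by four), and a displacement of 8, the sketch yields 0x1018.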
The complexity of the x86 instruction set poses many difficulties in implementing high performance x86-compatible microprocessors. In particular, the variable length of x86 instructions makes decoding instructions difficult. Decoding instructions typically involves determining the boundaries of an instruction and then identifying each field within the instruction, e.g., the opcode and operand fields. Decoding typically takes place once the instruction is fetched from the instruction cache before execution.
One method for determining the boundaries of instructions involves generating a number of predecode bits for each instruction byte read from main memory. The predecode bits provide information about the instruction byte they are associated with. For example, an asserted predecode start bit indicates that the associated instruction byte is the first byte of an instruction. Similarly, an asserted predecode end bit indicates that the associated instruction byte is the last byte of an instruction. Once the predecode bits for a particular instruction byte are calculated, they are stored together with the instruction byte in an instruction cache. When a “fetch” is performed, i.e., a number of instruction bytes are read from the instruction cache, the associated start and end bits are also read. The start and end bits may then be used to generate valid masks for the individual instructions within the fetch. A valid mask is a series of bits in which each bit corresponds to a particular instruction byte. Valid mask bits associated with the first byte of an instruction, the last byte of the instruction, and all bytes in between the first and last bytes of the instruction are asserted. All other valid mask bits are not asserted.
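The valid-mask generation described above may be sketched in software as follows. This is only an illustration of the rule stated in the text, not a model of any particular hardware; the function name `valid_masks` and the list-of-bits representation are assumptions made for the example. Instructions whose start byte lies outside the fetch are skipped, which is also an assumption.

```python
from typing import List


def valid_masks(start_bits: List[int], end_bits: List[int]) -> List[List[int]]:
    """Return one valid mask per complete instruction in a fetch.

    start_bits[i] and end_bits[i] are the predecode start and end bits
    of instruction byte i. In each mask, the bits for the first byte,
    the last byte, and every byte in between are asserted.
    """
    if len(start_bits) != len(end_bits):
        raise ValueError("start and end bit vectors must be the same length")

    masks = []
    start = None  # index of a start byte not yet matched by an end bit
    for i, (s, e) in enumerate(zip(start_bits, end_bits)):
        if s:
            start = i
        if e and start is not None:
            masks.append([1 if start <= j <= i else 0
                          for j in range(len(start_bits))])
            start = None
    return masks
```

A one-byte instruction has both its start bit and its end bit asserted on the same byte, so its mask contains a single asserted bit.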
Turning now to FIG. 2, an exemplary valid mask is shown. The figure illustrates a portion of a fetch 120 and its associated start and end bits 122 and 124. Assuming a valid mask 126 for instruction B 128 is to be generated, start and end bits 122 and 124 would be used to generate the mask. Valid mask 126 could then be used to mask off all bytes within fetch 120 that are not part of instruction B 128.
Once the boundaries of an instruction have been determined, the fields within the instruction, e.g., the opcode and operand fields, may be identified. Once again, the variable length of x86 instructions complicates the identification process. In addition, the optional prefix bytes within an x86 instruction create further complications. For example, in some instructions the opcode will begin with the first byte of the instruction, while others may begin with the second, third, or fourth byte.
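The effect of prefix bytes on the position of the opcode can be shown with a short Python sketch. The prefix values listed and the 0x0F escape for two-byte opcodes are taken from the standard x86 encoding, not from the text above, and the function name `opcode_span` is hypothetical. The sketch assumes its input is exactly one instruction whose boundaries are already known.

```python
from typing import Tuple

# Legacy x86 prefix byte values (from the standard encoding).
PREFIX_BYTES = frozenset({
    0xF0, 0xF2, 0xF3,                    # LOCK, REPNE/REPNZ, REP/REPE
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # segment overrides CS, SS, DS, ES, FS, GS
    0x66, 0x67,                          # operand-size and address-size overrides
})


def opcode_span(instruction: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the opcode field within one instruction."""
    offset = 0
    # Skip any leading prefix bytes; each one pushes the opcode back a byte.
    while offset < len(instruction) and instruction[offset] in PREFIX_BYTES:
        offset += 1
    if offset >= len(instruction):
        raise ValueError("no opcode byte found after the prefixes")
    # A first opcode byte of 0x0F signals a two-byte opcode.
    length = 2 if instruction[offset] == 0x0F else 1
    if offset + length > len(instruction):
        raise ValueError("instruction ends inside the opcode field")
    return offset, length
```

For instance, the single byte 0x90 (NOP) has its opcode at offset 0, whereas the sequence 0x66 0x0F 0xAF 0xC3 has a two-byte opcode starting at offset 1. Because the loop examines bytes serially, it mirrors the cascaded dependency that makes this step slow in hardware.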
To perform the difficult task of decoding x86 instructions, a number of cascaded levels of logic are typically used. Thus, decoding may require a number of clock cycles and may create a significant delay before any instructions are available to the functional stages of the microprocessor's pipeline. As microprocessors increase the number of instructions they are able to execute per clock cycle, instruction decoding may become a performance limiting factor. Therefore, a mechanism for simplifying the complexity and time required for instruction decoding is needed.
SUMMARY OF THE INVENTION
The problems outlined above may in part be solved by a microprocessor capable of predecoding instructions to fixed field lengths and then storing them in a “three-dimensional” instruction cache. Broadly speaking, in one embodiment a microprocessor capable of efficient instruction decoding comprises a predecode unit and an instruction cache. The predecode unit is coupled to receive variable-length instructions from a main memory subsystem. The variable-length instructions are then predecoded by detecting instruction field boundaries within each instruction. Once the instruction fields have been determined, the instruction is conveyed to the instruction cache for storage. The instruction cache is coupled to the predecode unit and comprises an array of instruction storage locations. Each instruction storage location in turn comprises an array of instruction field storage locations. Each instruction field storage location is configured to store a particular type of instruction field and comprises at least enough memory cells to store the maximum number of instruction bytes possible for the corresponding type of instruction field. The instruction cache may be logically configured as a three-dimensional array of memory cells. The instruction cache may also be physically configured as a three-dimensional array by forming the constituent memory cells on different layers of the microprocessor's die.
Using a three-dimensional configuration may advantageously allow each instruction to have the same length in two dimensions. This may in turn greatly simplify the task of determining the boundaries of instructions read from the instruction cache. Using a three-dimensional configuration may also allow instructions to be stored in fixed field width format. This, in turn, may potentially reduce or even eliminate the delay and hardware associated with determining instruction field boundaries for instructions read from the instruction cache.
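As a rough software analogy for fixed field width storage, the Python sketch below places each instruction field in a slot sized for the largest value that field may take, using the maximum field sizes given in the Description of the Relevant Art (five prefix bytes, two opcode bytes, one Mod R/M byte, one SIB byte, four displacement bytes, and four immediate bytes). The names `FIELD_WIDTHS`, `to_fixed_width`, and `read_field` are hypothetical, and zero padding is an assumption; an actual design would presumably also need some indication of which bytes in each slot are valid.

```python
from typing import Dict

# Maximum size in bytes of each field type, in instruction order.
FIELD_WIDTHS = {
    "prefix": 5,
    "opcode": 2,
    "modrm": 1,
    "sib": 1,
    "displacement": 4,
    "immediate": 4,
}


def to_fixed_width(fields: Dict[str, bytes]) -> bytes:
    """Pack predecoded fields into one fixed-length storage location."""
    unknown = set(fields) - set(FIELD_WIDTHS)
    if unknown:
        raise ValueError(f"unknown field types: {sorted(unknown)}")
    slot = bytearray()
    for name, width in FIELD_WIDTHS.items():
        data = fields.get(name, b"")
        if len(data) > width:
            raise ValueError(f"{name} field exceeds {width} bytes")
        slot += data.ljust(width, b"\x00")  # pad unused cells with zeros
    return bytes(slot)


def read_field(slot: bytes, name: str) -> bytes:
    """Read one field from a fixed position, with no boundary scanning."""
    offset = 0
    for field, width in FIELD_WIDTHS.items():
        if field == name:
            return slot[offset:offset + width]
        offset += width
    raise KeyError(name)
```

Every packed instruction occupies the same 17 bytes, so the opcode is always found at bytes 5 and 6 of a storage location regardless of how many prefix bytes the original instruction carried. The cost is unused storage for short instructions.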
In another embodiment, the instruction cache may be configured as a plurality of two-dimensional arrays, each configured to store a particular type of instruction field. These arrays may be formed either on
Advanced Micro Devices , Inc.
Christen Dan R.
Conley Rose & Tayon PC
Nguyen Than