Method and apparatus for speeding sequential access of a...

Electrical computers and digital processing systems: processing – Processing control – Instruction modification based on condition

Reexamination Certificate


Details

C712S208000, C712S211000, C712S233000, C712S237000, C712S239000, C712S240000


active

06237087

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of use of caches. More particularly, this invention is directed to a method and apparatus for speeding sequential access of a set-associative cache in a pipelined processor.
2. Description of Related Art
Currently, many processors are implemented with a pipelined architecture. These pipelined processors may also include one or more caches, such as an instruction cache for storing instructions for faster access. In such an architecture, there are distinct steps that an instruction undergoes as it is processed. Typically, these include: (1) loading the instruction from main memory into the instruction cache; (2) fetching the instruction from the instruction cache; (3) decoding the instruction; and (4) executing the instruction.
One instruction that often needs to be processed is an instruction pointer (IP) relative branch instruction. IP-relative branch instructions are conditional or unconditional branch instructions that compute the target address of the branch (the address of the next instruction to be executed if the branch is taken, also known as the “branch target”) by adding a value extracted from the branch instruction itself to the address contained in the instruction pointer (which is the address of the branch instruction). The IP-relative branch instruction is a common form of branch instruction in many computer instruction sets. In this context, branch instructions may include call and branch-and-link instructions.
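In code, the target computation described above is simple addition (a minimal sketch of the arithmetic only; `branch_target` is an illustrative name, and real hardware would wrap the sum to the architecture's address width):

```python
def branch_target(ip, displacement):
    """Target of an IP-relative branch: the signed displacement
    extracted from the branch instruction is added to the IP
    (the address of the branch instruction itself)."""
    return ip + displacement
```

A negative displacement encodes a backward branch, as is common at the bottom of a loop.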
The computation of the target address typically occurs in either the instruction decode step or in a subsequent execution step.
FIG. 1 contains a target address computation unit 100 that illustrates how this processing occurs. Target address computation unit 100 contains an IP register 110; a direct-mapped cache 112 including a set of cache tags 114, cache data 116 and a hit detection unit 117; a decoder 118; an adder 120 for calculating the branch target address; an incrementer 122 for handling non-branch cases; and a selector 124 that determines whether the next IP is obtained from incrementer 122 or adder 120. For an IP-relative branch, the circuit shown in FIG. 1 operates as follows. Some number of the less-significant bits (LSbs) of the IP (but not necessarily the least significant bits), sufficient to select one line from the plurality of lines in cache 112, are sent to the address inputs of the storage arrays that hold the set of cache tags 114 and cache data 116. The remaining more-significant bits (MSbs) of the IP are sent to hit detection unit 117, where they are compared with the tag of the accessed cache line to determine whether a cache hit occurred (that is, whether the instruction referenced by the IP is contained in cache 112).
Assuming a hit occurred, decoder 118 determines whether the instruction is a branch instruction and sends the displacement from the instruction word that is read out of cache 112 to adder 120. The displacement is then added to the IP. Assuming the branch is taken, selector 124, which is under the control of other logic (not shown), routes the output of adder 120 to IP register 110, where it becomes the next IP when the load signal is sent to IP register 110. If the instruction is not a branch instruction, or if the instruction contains a branch that is not taken, then the output of incrementer 122 becomes the next IP.
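The next-IP selection just described can be sketched as follows (a minimal Python sketch; the function and parameter names are illustrative, not from the patent):

```python
def next_ip(ip, instruction_length, branch_taken, displacement):
    # Selector 124 in FIG. 1: for a taken branch, the next IP comes
    # from the adder (IP + displacement); otherwise it comes from the
    # incrementer (IP + instruction length).
    if branch_taken:
        return ip + displacement        # adder 120
    return ip + instruction_length      # incrementer 122
```

For example, with 4-byte instructions, a not-taken branch at 0x1000 is followed by the instruction at 0x1004, while a taken branch with displacement 0x40 is followed by the instruction at 0x1040.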
FIG. 1 also illustrates the operation of a typical instruction cache. Two separate processes are involved: the process by which information is placed in the instruction cache, and the process by which information is fetched from the instruction cache. The two arrows coming into the top of cache 112 represent the process of putting information into cache 112. The information consists of an address and the data contained at that address in main memory. Assuming cache 112 is a simple cache in which cache lines are filled all at once, information is placed in the set-associative cache by the following process:
1) The set that the address belongs to is determined. The set is a function of a group of bits in the address. Generally, these are the least significant bits beyond those needed to identify an offset within the cache line, sufficient to address all the sets in the cache. For example, if a cache line is 32 bytes, then 5 address bits are needed to identify a byte within the line, so the sixth bit and up are used to select the set. If there are 128 sets (the number of sets is essentially always a power of two), then 7 bits are needed to select one set. In this example, the sixth through twelfth least significant bits select which set to use.
2) A line within that set is chosen. The selection of which line to use can be done via a number of different algorithms, such as Least-Recently-Used, Least-Recently-Written, Circular, or another algorithm that is known by one of ordinary skill in the art. In a direct mapped cache, there is only one line per set, so the selection process is trivial.
3) The data is written into the selected cache line's data storage, and the more significant bits (those not used in the selection of a byte within a line nor in the selection of which set to use) are placed in the tag for that line.
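The three fill steps above can be sketched for a direct-mapped cache with the example geometry (32-byte lines, 128 sets). The names and data structures here are illustrative assumptions, not from the patent:

```python
LINE_SIZE   = 32    # bytes per line -> 5 offset bits
NUM_SETS    = 128   # -> 7 index bits
OFFSET_BITS = 5
INDEX_BITS  = 7

def split_address(addr):
    """Step 1: carve the address into tag, set index, and line offset."""
    offset = addr & (LINE_SIZE - 1)                  # bits 0-4
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # bits 5-11
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)      # bits 12 and up
    return tag, index, offset

# One tag and one line of data per set: in a direct-mapped cache
# there is only one line per set, so step 2 (choosing a line within
# the set) is trivial.
tags = [None] * NUM_SETS
data = [None] * NUM_SETS

def fill_line(addr, line_bytes):
    """Step 3: write the data and store the tag for that line."""
    tag, index, _ = split_address(addr)
    tags[index] = tag
    data[index] = line_bytes
```

For address 0x1234, `split_address` yields tag 1, set index 17, and offset 20, matching the bit ranges in the example.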
One exemplary process of fetching information from cache 112 proceeds as follows:
1) The address of the next information desired from cache 112 is generated. If a branch has not occurred, this is done by incrementing the previous address that was used. If a branch has occurred, then the computed target address of the branch is used. For the type of branches covered by this invention, the branch target address is generated by adding a displacement contained within the instruction to the address of the instruction (or, in some architectures, the address of the next instruction that would have been executed if the branch had not been taken). The displacement is extracted from the instruction via decoder 118. Incrementer 122, adder 120, selector 124 and IP register 110 of FIG. 1 are used to generate the needed address, under the control of logic not shown in the diagram. If another form of branch is possible (such as a return from subroutine, or a branch to an address specified by a general-purpose register), then another input to selector 124 would be used to route the address generated by that branch into the circuit. This is not shown in the diagram but is obvious to one of ordinary skill in the art.
2) The bits of the generated address needed to select the set are fed into cache 112. In the example given above, these are the sixth through twelfth least significant bits. These bits select the lines belonging to one set, all of which then drive their tags and data out.
3) The tags coming out of the lines are compared to the more significant bits (in the example, the thirteenth least significant bit and all more significant bits) to determine whether a cache hit has occurred. If one of the lines from the set produces a hit, then the needed data is selected from its data using the least significant bits below those used to select the set (the fifth least significant bit and those bits less significant than it).
4) This fetched data is then fed into decoder 118, as described for the first step of execution.
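Steps 1 through 4 of the fetch process can be sketched for the direct-mapped case (illustrative names; real hardware compares all lines of a set against the tag in parallel rather than sequentially):

```python
OFFSET_BITS = 5   # 32-byte lines, as in the running example
INDEX_BITS  = 7   # 128 sets

def lookup(cache, addr):
    """cache: dict mapping set index -> (tag, line_bytes), one line
    per set (direct-mapped). Returns the addressed byte on a hit,
    or None on a miss."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    entry  = cache.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1][offset]   # hit: select the byte within the line
    return None                   # miss: tag mismatch or empty set
```

Two addresses that share the same set index but differ in their upper bits (for example 0x1234 and 0x9234) map to the same set; the tag comparison is what distinguishes a hit from a miss.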
In an address computation unit such as target address computation unit 100 of FIG. 1, the cache access and the decode/adder stages would often be implemented in separate pipestages. Thus, the target address of a branch instruction is not immediately available at the beginning of the clock cycle after the cycle in which the instruction was fetched from cache 112. This in turn means that the instruction that would be executed immediately after the branch, if the branch is taken, cannot be fetched in the cycle after the branch instruction was fetched. This causes one or more clock cycles during which no new instruction is issued into the execution pipeline.
