Electrical computers and digital processing systems: processing – Instruction fetching
Reexamination Certificate
2000-04-19
2004-06-15
Tsai, Henry W. H. (Department: 2183)
C712S210000
active
06751724
ABSTRACT:
FIELD OF THE INVENTION
This application relates generally to data processing systems, and more specifically, to instruction fetching in data processing systems.
RELATED ART
As data processing systems are becoming more widely used for a variety of applications, both speed and cost are becoming greater concerns. The goal in most designs is to reduce latency in order to improve speed and performance. For example, in many data processing systems, a central processing unit (CPU) increases instruction fetching efficiency by incorporating a number of instruction buffers and a wider data bus to memory. As the width of these instruction buffers and data buses increases, the bandwidth of data transfers increases, thus allowing for a more efficient CPU pipeline utilization. For example, a CPU may utilize a 32-bit bus which allows for 32-bit accesses. Therefore, for a processor having a 16-bit instruction length, two instructions may be accessed each cycle from a device that supports 32-bit accesses. However, in such data processing systems, a need exists to be able to also access instructions from devices, such as memories, supporting only 16-bit accesses. Devices having 16-bit access ports are generally cheaper and easier to manufacture than devices having 32-bit access ports since smaller port sizes allow for smaller packages. In the case of these 16-bit devices, the increased bandwidth offered by the 32-bit data buses internal to the data processing system may present a performance penalty rather than a performance improvement when the CPU requests a pair of 16-bit instructions since the 16-bit device is not capable of supplying a pair of instructions with the same latency as a single instruction.
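The width arithmetic in the paragraph above can be sketched as follows. This is an illustrative calculation only; the function names `external_accesses` and `instructions_per_request` are hypothetical and do not come from the patent.

```python
def external_accesses(request_bits: int, port_bits: int) -> int:
    """Number of device accesses needed to satisfy one fetch request."""
    if request_bits <= 0 or port_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-request_bits // port_bits)  # ceiling division


def instructions_per_request(request_bits: int, instruction_bits: int) -> int:
    """Number of whole instructions carried by one fetch request."""
    return request_bits // instruction_bits


# A 32-bit request carries two 16-bit instructions.
print(instructions_per_request(32, 16))  # 2
# A 32-bit device satisfies the request in one access ...
print(external_accesses(32, 32))         # 1
# ... while a 16-bit device needs two accesses for the same request.
print(external_accesses(32, 16))         # 2
```

Under these assumptions, the same 32-bit request that completes in one access on a 32-bit device takes two accesses on a 16-bit device, which is the source of the penalty described above.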
For example, FIG. 1 illustrates, in timing diagram form, the operation of a data processing system having a CPU utilizing 16-bit instructions coupled to a 32-bit internal data bus, a 16-bit external data bus, and a 16-bit external memory device. In this case, the CPU requests and fetches two instructions during each instruction access, since the internal data bus supports 32-bit fetches. In many sequences of instructions, though, greater pipeline stalls occur due to the fact that two instructions must be accessed before returning the fetched instructions to the CPU. For example, as illustrated in FIG. 1, a pair of instructions located at addresses 0 and 2 are accessed during the first two cycles by placing address 0 on the internal address bus (INT ADDR) and requesting a 32-bit fetch. The requested address corresponds to an external 16-bit memory, thus two 16-bit fetches must be performed (to address 0 and 2 respectively) in order to satisfy the CPU's request. In the instruction stream illustrated in the table of FIG. 1, the first two instructions stored at addresses 0 and 2, are branch (BRANCH) and instruction 1 (INST 1), respectively. Once the branch and instruction 1 are placed on the external data bus (EXT DATA) by the device being accessed, they are provided to the CPU as shown in FIG. 1 via the internal data bus (INT DATA). Therefore, the CPU does not begin to decode the branch instruction until both the branch and instruction 1 have been fetched from the accessed device.
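The way one 32-bit request breaks down into two 16-bit accesses can be sketched as below. The helper `split_fetch` is a hypothetical name introduced for illustration, and byte addressing with a 2-byte port is assumed from the addresses 0 and 2 used in the text.

```python
def split_fetch(address: int, request_bytes: int = 4, port_bytes: int = 2) -> list[int]:
    """Addresses of the narrower accesses that make up one wide fetch."""
    if request_bytes % port_bytes != 0:
        raise ValueError("request size must be a multiple of the port size")
    return [address + offset for offset in range(0, request_bytes, port_bytes)]


# A 32-bit fetch at address 0 on a 16-bit device becomes accesses to 0 and 2.
print(split_fetch(0))  # [0, 2]
```

In this sketch the CPU sees nothing until the last address in the returned list has been read, which corresponds to the branch not being decoded until instruction 1 has also arrived.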
While the branch is in the decode stage of the CPU pipeline, an access of the next two instructions has already been initiated, as illustrated by INT ADDR receiving address 4, indicating that address 4 has been accessed. No data is returned to the CPU until both instructions 2 and 3 (INST 2 and INST 3) corresponding to addresses 4 and 6, respectively, are placed on the external data bus. However, prior to completing the access of addresses 4 and 6, the branch was decoded and a target address generated. Because the branch instruction causes a change of flow in the instruction execution stream, the prefetched instructions 2 and 3 (located at addresses 4 and 6 respectively) will be discarded, and are not executed. Since the fetches of addresses 4 and 6 were already initiated, the CPU is stalled until both instructions 2 and 3 are fetched. Therefore, the fetch of instructions 2 and 3 introduces stall 2 into the CPU pipeline. Only after the access of instructions 2 and 3 can the access of the target instruction (TARGET) of the branch located at address 10 begin. Furthermore, the target of the branch is not received until after both the target and target 2 instructions (at addresses 10 and 12) have been placed on the external data bus and returned to the CPU, since a pair of instructions was requested, thus introducing stall 4 into the CPU pipeline.
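The delay described above can be approximated with a toy timing model. This is a sketch under stated assumptions, not a reproduction of FIG. 1: the device is assumed to deliver one 16-bit instruction per cycle with no wait states, a fetch returns data to the CPU only once all of its instructions have arrived, fetches are issued back to back, a fetch in flight cannot be cancelled, and branch decode is assumed to take one cycle. The function name `target_ready_cycle` is hypothetical, and the single-instruction case is included only as a baseline for comparison.

```python
def target_ready_cycle(instrs_per_fetch: int, decode_cycles: int = 1) -> int:
    """Cycle at the end of which the branch target reaches the CPU.

    The branch is assumed to be the first instruction of the first fetch,
    and cycles are counted from 1.
    """
    if instrs_per_fetch < 1 or decode_cycles < 1:
        raise ValueError("arguments must be at least 1")
    fetch_len = instrs_per_fetch                 # cycles occupied by one fetch
    branch_returned = fetch_len                  # first fetch ends here
    target_known = branch_returned + decode_cycles
    # Sequential prefetches keep starting every fetch_len cycles; the target
    # fetch has to wait for the one in flight when the target becomes known.
    bus_free = branch_returned
    while bus_free < target_known:
        bus_free += fetch_len
    return bus_free + fetch_len                  # target fetch completes


print(target_ready_cycle(2))  # paired 16-bit fetches: 6 in this model
print(target_ready_cycle(1))  # single 16-bit fetches: 3 in this model
```

In this model the paired case spends two cycles completing the discarded prefetch and two more fetching the target pair, which mirrors the two sources of delay the text attributes to the prefetch of instructions 2 and 3 and to waiting for target 2. The cycle counts are a property of the assumptions above and should not be read as the values shown in FIG. 1.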
The introduction of stalls 1 through 4 into the CPU pipeline results in increased latency and decreased performance of the data processing system. FIG. 1 illustrates one example of the latencies introduced into a data processing system; however, similar latencies arise in many data processing systems utilizing similar instruction fetches, especially when attempting to interface a data processing device with an external device having a smaller access port than the width of the data processing device's internal data bus. Therefore, a need exists for improved instruction fetching in order to reduce latency and achieve a more efficient data processing system.
Arends John H.
Moyer William C.
Scott Jeffrey W.
Thomas James S.
Vaglica John J.
Chiu Joanna G.
Motorola Inc.
Patrick Mark D.
Tsai Henry W. H.
Method and apparatus for instruction fetching