Title: Method to prevent pipeline stalls in superscalar stack...
Patent number: 06237086
Classification: Electrical computers and digital processing systems: processing – Processing control – Instruction modification based on condition
Additional classes: C712S202000, C712S209000, C712S215000, C712S026000
Type: Reexamination Certificate
Filed: 1998-04-22
Issued: 2001-05-22
Examiner: Chan, Eddie (Department: 2183)
Status: active
ABSTRACT:
CROSS-REFERENCE TO RELATED APPLICATIONS
This application relates to the co-pending application Ser. No. 09/064,642, filed Apr. 22, 1998, “REISSUE LOGIC FOR HANDLING TRAPS IN A MULTIISSUE STACK BASED COMPUTING SYSTEM”, by Koppala et al., owned by the assignee of this application and incorporated herein by reference.
This application relates to the co-pending application Ser. No. 09/064,686, filed Apr. 22, 1998, “STACK CACHE MISS HANDLING”, by Koppala et al., owned by the assignee of this application and incorporated herein by reference.
This application relates to the co-pending application Ser. No. 09/064,680, filed Apr. 22, 1998, “LENGTH DECODER FOR VARIABLE LENGTH DATA”, by Koppala et al., owned by the assignee of this application and incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computing systems and, in particular, to super-scalar stack based computing systems.
2. Discussion of Related Art
Most computing systems are coupled to a random-access memory system for storing and retrieving data. Various ways to increase the speed of computing systems using random-access memory systems are well known in the art. For example, using caches between a central processing unit of a computing system and the memory system can improve memory throughput. Furthermore, super-scalar architectures and pipelining can improve the performance of central processing units.
However, other memory architectures, such as stacks, are also used in computing systems. As shown in FIG. 1, a stack based computing system 110, which can implement, for example, the JAVA Virtual Machine, is coupled to a stack 120. In classical stack architectures, data is either “pushed” onto the stack or “popped” off the stack by stack based computing system 110. For example, to add the numbers 4 and 5, stack based computing system 110 first pushes the number 4 onto the top of stack 120. Then, stack based computing system 110 pushes the number 5 onto the stack. Finally, stack based computing system 110 performs an add operation, which pops the number 5 and the number 4 off stack 120 and pushes the number 9 onto the top of stack 120. A major advantage of stack based computing system 110 is that operations using data at the top of the stack do not need to use memory addresses. The top of the stack is also referred to as the first location of the stack, and the location just under the top of the stack is also referred to as the second location of the stack. Similarly, the memory location in the stack just after the second location is also referred to as the third location of the stack.
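The classic discipline is easy to see in code. Below is a minimal sketch in C of the add example above; the Stack type, the STACK_DEPTH bound, and the function names are illustrative assumptions rather than anything defined by the patent.

```c
#include <stdio.h>

/* A minimal sketch of the classic push/pop discipline described above;
   the Stack type and STACK_DEPTH bound are illustrative, not from the
   patent. */
#define STACK_DEPTH 64

typedef struct {
    int data[STACK_DEPTH];
    int top;                      /* index of the first free slot */
} Stack;

static void push(Stack *s, int v) { s->data[s->top++] = v; }
static int  pop(Stack *s)         { return s->data[--s->top]; }

int main(void) {
    Stack s = { .top = 0 };
    push(&s, 4);                  /* push 4 onto the top of the stack */
    push(&s, 5);                  /* push 5 onto the stack            */
    int a = pop(&s);              /* add operation: pop 5...          */
    int b = pop(&s);              /* ...pop 4...                      */
    push(&s, a + b);              /* ...and push 9 onto the top       */
    printf("top of stack = %d\n", pop(&s));
    return 0;
}
```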
Stack based computing system 110 can be made more flexible by also allowing it to use some random-access techniques with stack 120. Thus, in some implementations of stack based computing system 110 and stack 120, the memory locations in stack 120 are part of a random-access memory architecture, and each memory location in stack 120 has a memory address. As used herein, a memory location having a memory address equal to x is referred to as memory location x.
Even in stack based computing systems using random-access techniques, most operations by the stack based computing system use data from or near the top of stack 120. For example, assume a value V1 from a memory location ADDR1 is to be added to a value V2 from a memory location ADDR2, and the sum stored at a memory location ADDR3. Stack based computing system 110 first executes a stack load instruction, which retrieves value V1 from memory location ADDR1 and pushes value V1 onto the top of stack 120. Next, stack based computing system 110 executes another stack load instruction, which retrieves value V2 from memory location ADDR2 and pushes value V2 onto the top of stack 120. Then, stack based computing system 110 executes an add instruction, which pops the top two locations of stack 120, which now contain value V1 and value V2, and pushes the sum of value V1 and value V2 onto the top of stack 120. Finally, stack based computing system 110 executes a stack store instruction, which pops the value from the top of stack 120, i.e., the sum of value V1 and value V2, and stores the value in memory location ADDR3.
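The four-instruction sequence can be sketched the same way, reusing the Stack type, push, and pop from the earlier sketch; mem[] and the ADDR constants are hypothetical stand-ins for the random-access memory locations.

```c
/* Sketch of the load/load/add/store sequence, reusing Stack, push, and
   pop from the sketch above. mem[] stands in for random-access memory;
   the addresses are made up for illustration. */
enum { ADDR1 = 10, ADDR2 = 11, ADDR3 = 12 };
static int mem[64];

static void add_via_stack(Stack *s) {
    push(s, mem[ADDR1]);          /* stack load: push V1 onto the stack  */
    push(s, mem[ADDR2]);          /* stack load: push V2 onto the stack  */
    int v2 = pop(s);              /* add: pop V2 and V1...               */
    int v1 = pop(s);
    push(s, v1 + v2);             /* ...and push the sum                 */
    mem[ADDR3] = pop(s);          /* stack store: pop the sum into ADDR3 */
}
```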
Some of the techniques used to improve the performance of random-access memory systems can be adapted to improve stack performance. For example, as shown in FIG. 2, stack 120 can contain a data cache 210, a stack cache 220, a stack cache management unit 240, and a memory circuit 230. Data cache 210 is formed with fast memory circuits, such as SRAMs, to improve the throughput of memory circuit 230. Stack cache 220 specifically caches a top portion of stack 120 using fast memory circuits, such as SRAMs. Stack cache management unit 240 manages stack cache 220 by copying data from memory circuit 230 into stack cache 220 as data is popped off of stack 120, or by spilling data from stack cache 220 to memory circuit 230 as data is pushed onto stack 120. Thus, stack cache 220 maintains the top of stack 120 in fast memory circuits, so that a stack based computing system can perform stack operations with low stack latency. Specific implementations of stack caches and stack cache management units are described in U.S. patent application Ser. No. 08/828,899, entitled “Stack Caching Circuit with Overflow/Underflow Unit”, by Sailendra Koppala, which is hereby incorporated by reference.
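The spill/fill behavior just described can be sketched in a few lines of C. The StackCache type, the one-entry spill policy, and all names below are assumptions for illustration, not the design from the incorporated application.

```c
/* Hedged sketch of the spill/fill behavior of a stack cache management
   unit. The StackCache type, the one-entry spill policy, and all names
   are illustrative assumptions, not the incorporated design. */
#define CACHE_SLOTS 8

typedef struct {
    int  cache[CACHE_SLOTS];  /* fast SRAM-like copy of the top of stack */
    int  count;               /* entries currently held in the cache     */
    int *backing;             /* slower memory circuit behind the cache  */
    int  spilled;             /* entries already spilled to backing      */
} StackCache;

/* Push: if the fast cache is full, spill its oldest entry first. */
static void sc_push(StackCache *sc, int v) {
    if (sc->count == CACHE_SLOTS) {
        sc->backing[sc->spilled++] = sc->cache[0];   /* spill to memory */
        for (int i = 1; i < CACHE_SLOTS; i++)
            sc->cache[i - 1] = sc->cache[i];
        sc->count--;
    }
    sc->cache[sc->count++] = v;
}

/* Pop: if the fast cache runs dry, fill it from backing memory. */
static int sc_pop(StackCache *sc) {
    if (sc->count == 0 && sc->spilled > 0)
        sc->cache[sc->count++] = sc->backing[--sc->spilled];  /* fill */
    return sc->cache[--sc->count];
}
```

A real stack cache management unit would track entries with a circular buffer and spill or fill in the background rather than shifting an array on the critical path; the linear shift simply keeps the sketch short.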
Once stack latency is reduced, the operating speed of a stack based computing system may be limited by the rate at which stack operations can be performed. In general-purpose processing units, such as RISC microprocessors, pipelining and super-scalar implementations are used to improve the performance of the processing units. However, the techniques used for RISC processors are not easily adapted to stack based computing systems. For example, in super-scalar architectures, data dependencies determine which instructions can be issued simultaneously; because most stack operations use the top of the stack, nearly every pair of stack operations would appear to have a data dependency conflict. Hence, there is a need for an architecture that improves the performance of stack based computing systems.
SUMMARY
Accordingly, the present invention provides pipelining techniques to prevent pipeline stalls and a super-scalar architecture for stack based computing systems, which can issue multiple stack operations concurrently. In accordance with one embodiment of the present invention, a stack based computing system includes an instruction pipeline, which prevents many common causes of pipeline stalls. Specifically, one embodiment of the instruction pipeline includes a stack cache fetch stage to retrieve data from a stack cache and a data cache fetch stage to retrieve data from a data cache. If a stack cache miss occurs, instead of stalling, the instruction pipeline requests the data from the data cache in the data cache fetch stage. Data is not written out until a write stage of the instruction pipeline, as opposed to the execution stage in conventional pipelines.
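As a rough illustration of the miss-handling idea, the sketch below lets a request that misses the stack cache flow on to the data cache fetch stage instead of stalling. The address split, the PipeSlot structure, and the direct-indexed caches are all assumptions, not the patented pipeline.

```c
#include <stdio.h>

/* Hedged sketch: on a stack cache miss the request is not stalled but
   handled by the data cache fetch stage. All names, sizes, and the
   direct-indexed caches are illustrative assumptions. */
typedef struct { int addr; int value; int hit; } PipeSlot;

#define SC_SIZE 8                    /* stack cache covers [0, 8) (assumed) */
static int stack_cache[SC_SIZE];
static int data_cache[64];

/* Stack cache fetch stage: look up the address; a miss does not stall. */
static void stack_cache_fetch_stage(PipeSlot *s) {
    if (s->addr >= 0 && s->addr < SC_SIZE) {
        s->value = stack_cache[s->addr];
        s->hit = 1;
    } else {
        s->hit = 0;                  /* miss: let the next stage handle it */
    }
}

/* Data cache fetch stage: satisfy any request the stack cache missed. */
static void data_cache_fetch_stage(PipeSlot *s) {
    if (!s->hit) {
        s->value = data_cache[s->addr];
        s->hit = 1;
    }
}

int main(void) {
    data_cache[42] = 7;
    PipeSlot s = { .addr = 42, .value = 0, .hit = 0 };
    stack_cache_fetch_stage(&s);     /* misses the stack cache             */
    data_cache_fetch_stage(&s);      /* recovers the data without stalling */
    printf("value = %d\n", s.value); /* prints 7 */
    return 0;
}
```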
The instruction pipeline can be modified to reduce data coherency problems in accordance with another embodiment of the present invention. Specifically, a feedback path is coupled between the stack cache fetch stage and the pipeline stages following the stack cache fetch stage, such as the data cache fetch stage, the write stage, and the execution stage. A comparator is also coupled between the stack cache fetch stage and the stages following the stack cache fetch stage. If the address of a data request in the stack cache fetch stage matches the address of any data word in the stages following the stack cache fetch stage, the matching data word is fed to the stack cache fetch stage through the feedback path. Using the feedback path removes potential write-after-read hazards.
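A sketch of the comparator-and-feedback idea follows, assuming the later stages expose their in-flight data words as address/value pairs; the StageWord type and the function name are hypothetical.

```c
#include <stdio.h>

/* Hedged sketch of the feedback path and comparator described above:
   the address in the stack cache fetch stage is compared against the
   data words in the later pipeline stages, and any match is forwarded
   back instead of reading a possibly stale cache entry. */
typedef struct { int addr; int value; int valid; } StageWord;

/* Returns 1 and writes the forwarded value if any later stage holds a
   newer data word for fetch_addr; returns 0 to fall back to the cache. */
static int forward_from_later_stages(const StageWord later[], int n,
                                     int fetch_addr, int *out) {
    for (int i = 0; i < n; i++) {
        if (later[i].valid && later[i].addr == fetch_addr) {
            *out = later[i].value;   /* feedback path supplies the word */
            return 1;
        }
    }
    return 0;
}

int main(void) {
    /* data words currently in the data cache fetch, execution, and
       write stages (ordering assumed for illustration) */
    StageWord later[3] = { {7, 100, 1}, {9, 200, 1}, {0, 0, 0} };
    int v;
    if (forward_from_later_stages(later, 3, 9, &v))
        printf("forwarded value %d for address 9\n", v);  /* prints 200 */
    return 0;
}
```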
In addition to improving pipeline throughput, embodiments of the present invention ca
Inventors: Buchamwandla Ravinandan R.; Koppala Sailendra
Examiner: Chan Eddie
Attorneys/Agents: Gunnison, McKay & Hodgson LLP; McKay Philip; Patel Gautam R.
Assignee: Sun Microsystems Inc.