Stack cache miss handling

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

Classification: C711S126000, C712S219000
Type: Reexamination Certificate
Status: active
Patent number: 06275903

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computing systems and, in particular, to super-scalar stack based computing systems.
2. Discussion of Related Art
Most computing systems are coupled to a random access memory system for storing and retrieving data. Various ways to increase the speed of computing systems using random access memory systems are well known in the art. For example, using caches between a central processing unit of a computing system and the memory system can improve memory throughput. Furthermore, super-scalar architectures and pipelining can improve the performance of central processing units.
However, other memory architectures such as stacks are also used in computing systems. As shown in FIG. 1, a stack based computing system 110, which can implement, for example, the JAVA Virtual Machine, is coupled to a stack 120. In classical stack architectures, data is either “pushed” onto the stack or “popped” off the stack by stack based computing system 110. For example, to add the numbers 4 and 5, stack based computing system 110 first pushes the number 4 onto the top of stack 120. Then, stack based computing system 110 pushes the number 5 onto the stack. Then, stack based computing system 110 performs an add operation, which pops the number 5 off stack 120 and the number 4 off stack 120 and pushes the number 9 onto the top of stack 120. A major advantage of stack based computing system 110 is that operations using data at the top of the stack do not need to use memory addresses. The top of stack is also referred to as the first location of the stack, and the location just under the top of the stack is also referred to as the second location of the stack. Similarly, the memory location in the stack just after the second location is also referred to as the third location of the stack.
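For illustration, the push/pop add sequence just described can be modeled in a few lines of Python; this is a minimal sketch, and the class and method names are illustrative rather than part of the invention:

class OperandStack:
    # Minimal model of a classical stack: data is only pushed onto or popped off it.
    def __init__(self):
        self._data = []

    def push(self, value):
        self._data.append(value)      # the pushed value becomes the new top of stack

    def pop(self):
        return self._data.pop()       # remove and return the top of stack

    def add(self):
        # An add operation pops the top two entries and pushes their sum.
        b = self.pop()                # 5 in the example above
        a = self.pop()                # 4 in the example above
        self.push(a + b)              # 9 is pushed onto the top of stack

stack = OperandStack()
stack.push(4)
stack.push(5)
stack.add()
print(stack.pop())                    # prints 9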
Stack based computing system 110 can become more flexible by also allowing stack based computing system 110 to use some random access techniques with stack 120. Thus, in some implementations of stack based computing system 110 and stack 120, the memory locations in stack 120 are part of a random-access memory architecture, so that each memory location in stack 120 has a memory address. As used herein, a memory location having a memory address equal to x is referred to as memory location x.
Even in stack based computing systems using random-access techniques, most operations by the stack based computing system use data from or near the top of stack 120. For example, assume a value V1 from a memory location ADDR1 is to be added to a value V2 from a memory location ADDR2, and the sum stored at a memory location ADDR3. Stack based computing system 110 first executes a stack load instruction, which retrieves value V1 from memory location ADDR1 and pushes value V1 onto the top of stack 120. Next, stack based computing system 110 executes another stack load instruction, which retrieves value V2 from memory location ADDR2 and pushes value V2 onto the top of stack 120. Then, stack based computing system 110 executes an add instruction, which pops the top two locations of stack 120, which now contain value V1 and value V2, and pushes the sum of value V1 and value V2 onto the top of stack 120. Finally, stack based computing system 110 executes a stack store instruction, which pops the value from the top of stack 120, i.e., the sum of value V1 and value V2, and stores the value in memory location ADDR3.
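The same load/load/add/store sequence can be sketched in Python; the concrete addresses and values below are purely illustrative stand-ins for ADDR1, ADDR2, ADDR3, V1, and V2:

memory = {0x10: 7, 0x14: 3, 0x18: 0}     # illustrative addresses and initial values
ADDR1, ADDR2, ADDR3 = 0x10, 0x14, 0x18
stack = []

def stack_load(addr):
    stack.append(memory[addr])            # retrieve the value and push it onto the stack

def stack_store(addr):
    memory[addr] = stack.pop()            # pop the top of stack and store the value

def add():
    b, a = stack.pop(), stack.pop()       # pop the top two locations of the stack
    stack.append(a + b)                   # push the sum onto the top of the stack

stack_load(ADDR1)                         # push V1
stack_load(ADDR2)                         # push V2
add()                                     # push V1 + V2
stack_store(ADDR3)                        # store the sum at ADDR3
print(memory[ADDR3])                      # prints 10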
Some of the techniques used to improve the performance of random access memory systems can be adapted to improve stack performance. For example, as shown in FIG. 2, stack 120 can contain a data cache 210, a stack cache 220, a stack cache management unit 240, and a memory circuit 230. Data cache 210 is formed with fast memory circuits, such as SRAMs, to improve the throughput of memory circuit 230. Stack cache 220 specifically caches a top portion of stack 120 using fast memory circuits, such as SRAMs. Stack cache management unit 240 manages stack cache 220 by copying data from memory circuit 230 into stack cache 220 as data is popped off of stack 120, or by spilling data from stack cache 220 to memory circuit 230 as data is pushed onto stack 120. Thus, stack cache 220 maintains the top of stack 120 in fast memory circuits, so that a stack based computing system can perform stack operations with low stack latency. Specific implementations of stack caches and stack cache management units are described in U.S. patent application Ser. No. 08/828,899, entitled “Stack Caching Circuit with Overflow/Underflow unit”, by Sailendra Koppala, now U.S. Pat. No. 6,167,400, which is hereby incorporated by reference.
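The spill/fill behavior of the stack cache management unit can be sketched as follows; this is a simplified Python model in which the cache capacity, spill policy, and fill policy are assumptions made only for illustration:

CACHE_SIZE = 64                            # assumed capacity of the stack cache

class CachedStack:
    def __init__(self):
        self.stack_cache = []              # fast memory: holds the top portion of the stack
        self.memory = []                   # slower backing memory circuit

    def push(self, value):
        if len(self.stack_cache) == CACHE_SIZE:
            # Spill: move the oldest cached entry out to the memory circuit.
            self.memory.append(self.stack_cache.pop(0))
        self.stack_cache.append(value)

    def pop(self):
        value = self.stack_cache.pop()
        if not self.stack_cache and self.memory:
            # Fill: copy data back from the memory circuit into the stack cache.
            self.stack_cache.append(self.memory.pop())
        return value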
Once stack latency is reduced, the operating speed of a stack based computing system may be limited by the rate at which stack operations can be performed. In general-purpose processing units, such as RISC microprocessors, pipelining and super-scalar implementations are used to improve the performance of the processing units. However, the techniques used for RISC processors are not easily adapted to stack based computing systems. For example, in super-scalar architectures, data dependencies determine which instructions can be issued simultaneously. However, for stack based computing systems, most stack operations use the top of the stack and would thus have a data dependency conflict. Hence, there is a need for a stack based computing system architecture to improve the performance of stack based computing systems.
SUMMARY
Accordingly, the present invention provides pipelining techniques to prevent pipeline stalls and a super-scalar architecture for stack based computing systems, which can issue multiple stack operations concurrently. In accordance with one embodiment of the present invention, a stack based computing system includes an instruction pipeline, which prevents many common causes of pipeline stalls. Specifically, one embodiment of the instruction pipeline includes a stack cache fetch stage to retrieve data from a stack cache and a data cache fetch stage to retrieve data from a data cache. If a stack cache miss occurs, instead of stalling, the instruction pipeline requests the data from the data cache in the data cache fetch stage. Data is not written out until a write stage of the instruction pipeline, as opposed to the execution stage in conventional pipelines.
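As a rough illustration of this miss handling, the sketch below models the stack cache fetch and data cache fetch stages in Python; the Request class and the dictionary-based caches are assumptions made for the example, not details of the invention:

from dataclasses import dataclass

@dataclass
class Request:
    address: int
    data: int | None = None

def stack_cache_fetch(req, stack_cache):
    # Stack cache fetch stage: try to read the operand from the stack cache.
    req.data = stack_cache.get(req.address)     # None indicates a stack cache miss
    return req

def data_cache_fetch(req, data_cache):
    # Data cache fetch stage: on a stack cache miss the pipeline does not stall;
    # it simply requests the same data from the data cache in this later stage.
    if req.data is None:
        req.data = data_cache.get(req.address)
    return req

# Example: address 0x20 misses the stack cache but hits the data cache.
stack_cache = {0x00: 4, 0x04: 5}
data_cache = {0x20: 9}
req = data_cache_fetch(stack_cache_fetch(Request(0x20), stack_cache), data_cache)
print(req.data)                                 # prints 9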
The instruction pipeline can be modified to reduce data coherency problems in accordance with another embodiment of the present invention. Specifically, a feedback path is coupled between the stack cache fetch stage and pipeline stages following the stack cache fetch stage, such as the data cache fetch stage, the write stage, and the execution stage. A comparator is also coupled between the stack cache fetch stage and the stages following the stack cache fetch stage. If an address of a data request in the stack cache fetch stage matches the address of any data words in the stages following the stack cache fetch stage, the matching data word is fed to the stack cache fetch stage through the feedback path. Using the feedback path removes potential write after read hazards.
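A minimal sketch of this comparator and feedback path, with the later pipeline stages represented as simple address/data records (an assumption made only for illustration):

def forward_if_match(request_address, later_stages):
    # Comparator: compare the address of the request in the stack cache fetch stage
    # against the addresses of the data words in the following stages
    # (e.g. the data cache fetch, execution, and write stages).
    for stage in later_stages:
        if stage["address"] == request_address:
            # Feedback path: the matching data word is fed back to the stack cache
            # fetch stage, so the request does not read stale data from the stack cache.
            return stage["data"]
    return None                                  # no match: read the stack cache normally

later_stages = [
    {"address": 0x30, "data": 42},               # e.g. a data word in the write stage
    {"address": 0x34, "data": 7},                # e.g. a data word in the execution stage
]
print(forward_if_match(0x34, later_stages))      # prints 7 (the forwarded value)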
In addition to improving pipeline throughput, embodiments of the present invention can provide super-scalar operation of stack based computing systems. In accordance with one embodiment of the present invention, the instructions of a stack based computing system are separated into different instruction types. Common types include the load variable (LV) type, the store variable (SV) type, the operation (OP) type, the break group one (BG1) type, the break group two (BG2) type, and the non-foldable (NF) type. If instructions of various types occur in specific sequences, the instructions can form an instruction group, so that the instructions in the group can be executed concurrently. Common instruction groups include the LV-SV, LV-OP-SV, LV-OP, LV-LV-OP, LV-LV-OP-SV, LV-BG1, and LV-BG2 groups.
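One way such instruction groups might be detected is sketched below in Python; the group table lists only the sequences named above, and the matching rule is an illustrative assumption rather than a description of the actual grouping logic:

# Instruction-type sequences named above; instructions in a group can issue together.
FOLDABLE_GROUPS = [
    ("LV", "SV"),
    ("LV", "OP", "SV"),
    ("LV", "OP"),
    ("LV", "LV", "OP"),
    ("LV", "LV", "OP", "SV"),
    ("LV", "BG1"),
    ("LV", "BG2"),
]

def fold(instruction_types):
    # Return the longest group matching the head of the instruction stream,
    # so that those instructions can be executed concurrently.
    for group in sorted(FOLDABLE_GROUPS, key=len, reverse=True):
        if tuple(instruction_types[:len(group)]) == group:
            return group
    return (instruction_types[0],)               # no folding: issue a single instruction

print(fold(["LV", "LV", "OP", "SV", "NF"]))      # ('LV', 'LV', 'OP', 'SV')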
