Pipelined snooping of multiple L1 cache lines

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

US classification: C711S122000, C711S143000, C711S146000

Status: active

Patent number: 06438657

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
This invention generally relates to caches for computer systems, such as set-associative caches and direct-mapped caches, and more particularly to reducing snoop busy time.
2. Background Art
The use of caches for performance improvements in computing systems is well known and extensively used. See, for example, U.S. Pat. No. 5,418,922 by L. Liu for “History Table for Set Prediction for Accessing a Set Associative Cache”, and U.S. Pat. No. 5,392,410 by L. Liu for “History Table for Prediction of Virtual Address Translation for Cache Access”, the teachings of both of which are incorporated herein by reference.
A cache is a high-speed buffer which holds recently used memory data. Because programs exhibit locality of reference, most data accesses can be satisfied by the cache, in which case slower accesses to bulk memory are avoided.
In typical high performance processor designs, the cache access path forms a critical path. That is, the cycle time of the processor is affected by how fast cache accessing can be carried out.
A typical shared memory multiprocessor system implements a coherency mechanism for its memory subsystem. This memory subsystem contains one or more levels of cache memory associated with a local processor. These processor/cache subsystems share a bus connection to main memory. A snooping protocol is adopted under which certain accesses to memory require that the processor caches in the system be searched for the most recent (modified) version of the requested data. It is important to optimize this protocol so that the interference seen by local processors is minimized when snooping occurs. It is also important to move data out of the cache as quickly as possible when a memory access is waiting for cache data as the result of a snoop.
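As a rough illustration of the snoop search just described, the following sketch checks a single cache for a modified copy of a requested line. It is a minimal sketch in C; the simplified line states, structures, and the snoop() function are invented for illustration and do not come from the patent.

#include <stdint.h>
#include <stdio.h>

/* Illustrative line states; a real protocol (e.g. MESI) has more. */
enum line_state { INVALID, SHARED, MODIFIED };

struct cache_line {
    uint64_t tag;
    enum line_state state;
};

/* Return the line holding the most recent (modified) copy of `tag`,
 * or NULL if this cache need not supply data for the snooped access. */
static struct cache_line *snoop(struct cache_line *lines, int nlines,
                                uint64_t tag)
{
    for (int i = 0; i < nlines; i++)
        if (lines[i].tag == tag && lines[i].state == MODIFIED)
            return &lines[i];
    return NULL;
}

int main(void)
{
    struct cache_line l1[] = {
        { 0x10, SHARED },
        { 0x20, MODIFIED },  /* this line holds the most recent data */
    };
    struct cache_line *hit = snoop(l1, 2, 0x20);
    printf(hit ? "modified copy found: cache supplies the data\n"
               : "no modified copy: memory may respond\n");
    return 0;
}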
In accordance with an exemplary system, a two level cache subsystem is implemented in which the level 2 (L2) cache line size is a power-of-two multiple of the level 1 (L1) cache line size. Both caches implement writeback policies, and L1 is set-associative. L1 lines are subdivided into sublines which track which portions of the cache line contain modified data. The cache subsystem implements multi-level inclusion, wherein all blocks resident in L1 must also be resident in L2. Snoop requests from the bus are received at L2 and, if appropriate, forwarded on to L1. The snoop request forwarded to L1, however, requires accessing the L1 directory for all of the consecutive L1 cache entries which may contain data associated with the L2 cache line. Each directory access is sent to the L1 cache subsystem as an individual request. Each cache read access resulting from a directory access waits for cache directory information which indicates slot hit and subline offset. Slot hit information can be used in parallel with the cache access, but the subline offset must be used to generate the address in the cycle before the cache read.
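To make the directory-access count concrete, the following sketch computes the consecutive L1 line addresses covered by one forwarded L2 snoop. The sizes are assumptions for illustration only (128-byte L2 lines, 32-byte L1 lines); the patent text says only that the L2 line size is a power-of-two multiple of the L1 line size.

#include <stdint.h>
#include <stdio.h>

#define L2_LINE 128u  /* assumed L2 line size in bytes */
#define L1_LINE  32u  /* assumed L1 line size in bytes */

int main(void)
{
    uint64_t l2_addr = 0x4000;        /* L2-line-aligned snoop address */
    unsigned n = L2_LINE / L1_LINE;   /* L1 lines covered by one L2 line */

    /* In the exemplary system, each of these addresses is sent to the
     * L1 directory as an individual request. */
    for (unsigned i = 0; i < n; i++) {
        uint64_t l1_addr = l2_addr + (uint64_t)i * L1_LINE;
        printf("directory access %u: L1 line address 0x%llx\n",
               i, (unsigned long long)l1_addr);
    }
    return 0;
}

With these assumed sizes, a single L2 snoop fans out into four L1 directory accesses; any L1 line whose sublines are marked modified then also requires data transfers out of L1.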
Referring to FIG. 7, an example is given where a single forwarded L2 snoop request requires two L1 directory accesses. Two data transfers out of L1 are required for each directory access because both L1 lines have modified data in both of their sublines. This example demonstrates two problems with the design of this exemplary system.
(1) The processor associated with the L1 cache being snooped is prevented from accessing the L1 cache subsystem whenever either the L1 directory or the cache is being used by a snoop operation. This is illustrated by the processor pipe being held (cache busy) through cycles 1 through 9. Use of these resources occurs in different cycles, which extends the overall busy time for the snoop operation.
(2) Delay exists between the transfers of the first and second cache blocks, which in turn delays the point at which the memory access associated with the snoop can proceed.
It is, therefore, an object of the invention to reduce the number of cycles required for an L1 snoop operation.
It is a further object of the invention to avoid delays between first and second cache blocks which cause delays in memory access associated with snoops.
SUMMARY OF THE INVENTION
In accordance with the invention, an apparatus and method for operating a computing system including a cache includes accessing a directory for a second snoop request while evaluating a directory access from a first snoop request.
In accordance with a further aspect of the invention, an apparatus and method are provided for operating a computing system including a two level cache subsystem having an L1 cache and an L2 cache. During a REQUEST stage, a directory access snoop to the directory of the L1 cache is requested; responsive thereto, during a SNOOP stage, the directory is accessed; during an ACCESS stage, the cache arrays are accessed while the results from the SNOOP stage are processed. These stages are fully overlapped in a pipelined fashion. If multiple data transfers are required out of the L1 cache, a pipeline hold is issued to the REQUEST and SNOOP stages, and the ACCESS stage is repeated. During a FLUSH stage, cache data read from the L1 cache during the ACCESS stage is sent to the L2 cache.
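The stage overlap and the hold behavior can be pictured with a toy cycle-by-cycle model. This is a sketch under assumed timing (one cycle per stage) with invented bookkeeping; it follows the stage names above but is not the patent's implementation.

#include <stdio.h>

int main(void)
{
    int transfers[] = { 2, 2 };     /* data transfers needed per request */
    int nreq = 2;                   /* e.g. two L1 lines for one L2 snoop */
    int access_left[2] = { 0, 0 };  /* remaining ACCESS repeats per request */

    /* stage[i] holds the request occupying pipeline stage i, where
     * 0=REQUEST, 1=SNOOP, 2=ACCESS, 3=FLUSH; -1 means the stage is empty. */
    int stage[4] = { -1, -1, -1, -1 };
    int next = 0;

    for (int cycle = 1; cycle <= 10; cycle++) {
        /* Hold REQUEST and SNOOP while ACCESS still owes a transfer. */
        int hold = (stage[2] >= 0 && access_left[stage[2]] > 1);

        if (hold) {
            stage[3] = -1;             /* nothing reaches FLUSH this cycle */
            access_left[stage[2]]--;   /* the ACCESS stage is repeated */
        } else {
            stage[3] = stage[2];       /* ACCESS  -> FLUSH (data to L2) */
            stage[2] = stage[1];       /* SNOOP   -> ACCESS */
            stage[1] = stage[0];       /* REQUEST -> SNOOP */
            if (next < nreq) {         /* accept the next snoop request */
                stage[0] = next;
                access_left[next] = transfers[next];
                next++;
            } else {
                stage[0] = -1;
            }
        }

        printf("cycle %2d: REQ=%2d SNP=%2d ACC=%2d FLU=%2d%s\n",
               cycle, stage[0], stage[1], stage[2], stage[3],
               hold ? "  (hold)" : "");

        if (next >= nreq && stage[0] < 0 && stage[1] < 0 &&
            stage[2] < 0 && stage[3] < 0)
            break;                     /* pipeline has drained */
    }
    return 0;
}

Because the directory access for the second request proceeds while the first is still being evaluated, the two snoops complete in fewer cycles than the serialized sequence of FIG. 7.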
Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.


