Reexamination Certificate
1998-03-31
2001-03-20
An, Meng-Ai T. (Department: 2154)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S118000, C711S125000, C711S133000, C711S137000, C711S128000, C711S129000, C711S145000, C711S160000
Reexamination Certificate
active
06205520
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of processors, and specifically, to a method and apparatus for implementing non-temporal stores.
2. Background Information
The use of a cache memory with a processor is well known in the computer art. A primary purpose of utilizing cache memory is to bring the data closer to the processor in order for the processor to operate on that data. It is generally understood that memory devices closer to the processor operate faster than memory devices farther away on the data path from the processor. However, there is a cost trade-off in utilizing faster memory devices. The faster the data access, the higher the cost to store a bit of data. Accordingly, a cache memory tends to be much smaller in storage capacity than main memory, but is faster in accessing the data.
A computer system may utilize one or more levels of cache memory. The allocation and de-allocation schemes implemented for the caches of various known computer systems are generally similar in practice. That is, data that is required by the processor is cached in the cache memory (or memories). If a cache miss occurs, an allocation is made at the entry indexed by the access. The access can be for loading data to the processor or for storing data from the processor to memory. The cached information is retained by the cache memory until it is no longer needed, is made invalid, or is replaced by other data, at which point the cache entry is de-allocated.
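This allocation behavior can be made concrete with a small sketch. The model below is a hypothetical two-way set-associative cache; the set count, line size, and round-robin replacement are illustrative assumptions rather than details taken from the patent. On a hit the cached line is reused; on a miss, the victim way is de-allocated and re-allocated to the line indexed by the access.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SETS   128   /* hypothetical geometry: 128 sets...              */
#define NUM_WAYS   2     /* ...of two ways each (matches the example below) */
#define LINE_BYTES 32    /* hypothetical line size                          */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line_t;

static cache_line_t cache[NUM_SETS][NUM_WAYS];
static unsigned     next_victim[NUM_SETS];  /* simple round-robin replacement */

/* Look up addr; on a miss, de-allocate the victim way and allocate it to the
 * line at the set indexed by the access. Returns true on a hit. */
bool cache_access(uint32_t addr, const uint8_t *memory)
{
    uint32_t set = (addr / LINE_BYTES) % NUM_SETS;
    uint32_t tag = (addr / LINE_BYTES) / NUM_SETS;

    for (unsigned w = 0; w < NUM_WAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                      /* hit: data is already cached */
    }

    /* miss: evict (de-allocate) one way and allocate it to the requested line */
    unsigned w = next_victim[set];
    next_victim[set] = (next_victim[set] + 1) % NUM_WAYS;
    cache[set][w].valid = true;
    cache[set][w].tag   = tag;
    memcpy(cache[set][w].data,
           memory + (addr & ~(uint32_t)(LINE_BYTES - 1)),
           LINE_BYTES);
    return false;                             /* miss: line was just allocated */
}
```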
Recently, there has been an increase in demand on processors to provide high performance for graphics applications, especially three-dimensional graphics applications. This increased demand stems mainly from the fact that graphics applications tend to cause the processor to move large amounts of data (e.g., display data) from cache and/or system memory to a display device. This data, for the most part, is used once or at most only a few times (referred to as “non-reusable data”).
For example, assume a cache set with two ways, one holding data A and the other holding data B. Assume further that data A, data B, and data C all map to the same cache set, and that a program reads and writes data A and data B multiple times. If, in the middle of these reads and writes of data A and data B, the program accesses non-reusable data C, the cache will have to evict, for example, data A from way one and replace it with data C. If the program then tries to access data A again, a cache “miss” occurs, in which case data A is retrieved from external memory and data B is evicted from way two and replaced with data A. If the program then tries to access data B again, another cache “miss” occurs, in which case data B is retrieved from external memory and data C is evicted from way one and replaced with data B. Since data C is not reused by the program, this sequence wastes a considerable number of clock cycles, decreases efficiency, and pollutes the cache.
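Continuing the sketch above (same hypothetical cache model), the driver below replays this A/B/C access pattern. The addresses are chosen so that all three lines index the same set, so the single access to non-reusable data C turns the later re-uses of A and B into misses, exactly as described.

```c
#include <stdio.h>

int main(void)
{
    static uint8_t memory[1 << 20];      /* stand-in for external memory       */
    /* LINE_BYTES * NUM_SETS = 4096, so addresses 4096 bytes apart share a set */
    uint32_t A = 0x0000, B = 0x1000, C = 0x2000;

    cache_access(A, memory);             /* miss: A allocated in way one       */
    cache_access(B, memory);             /* miss: B allocated in way two       */
    cache_access(A, memory);             /* hit                                */
    cache_access(B, memory);             /* hit                                */
    cache_access(C, memory);             /* miss: non-reusable C evicts A      */
    printf("A hit? %d\n", cache_access(A, memory));  /* 0: miss, evicts B      */
    printf("B hit? %d\n", cache_access(B, memory));  /* 0: miss, evicts C      */
    return 0;
}
```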
Therefore, there is a need in the technology for a method and apparatus to efficiently write non-reusable data to external memory without polluting cache memory.
In addition to the processor, a further bottleneck in data-intensive applications, such as three-dimensional applications, is memory and bus bandwidth. That is, data-intensive applications require a considerable number of bus transactions to and from system memory.
Therefore, there is an additional need in the technology for a method and apparatus to efficiently write non-reusable data to external memory without polluting cache memory while minimizing bus transactions.
SUMMARY OF THE INVENTION
In one embodiment, the present invention is a processor that includes a decoder to decode instructions and a circuit. In response to a decoded instruction, the circuit detects an incoming write-back or write-through streaming store instruction that misses a cache and allocates a buffer in write-combining mode.
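This embodiment describes hardware behavior; for context, a hedged software-side sketch follows. The patent text here does not name a specific instruction, but the IA-32 SSE streaming store MOVNTPS, exposed by compilers as the _mm_stream_ps() intrinsic, is a well-known instance of the kind of non-temporal store discussed: the stored data bypasses the cache hierarchy and is merged in write-combining buffers, so full lines reach memory with fewer bus transactions. The copy routine below is an illustrative example, not the patent's own code, and assumes dst is 16-byte aligned.

```c
#include <xmmintrin.h>
#include <stddef.h>

/* Copy a buffer (e.g., display data) without polluting the cache: loads are
 * ordinary cacheable loads, stores are streaming (non-temporal) stores that
 * are merged in write-combining buffers before being written to memory. */
void copy_frame_nt(float *dst, const float *src, size_t n_floats)
{
    size_t i;
    for (i = 0; i + 4 <= n_floats; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);    /* ordinary load                   */
        _mm_stream_ps(dst + i, v);           /* non-temporal (streaming) store  */
    }
    for (; i < n_floats; i++)                /* scalar tail                     */
        dst[i] = src[i];
    _mm_sfence();  /* make the weakly-ordered streaming stores globally visible */
}
```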
REFERENCES:
patent: 5404484 (1995-04-01), Schlansker et al.
patent: 5526510 (1996-06-01), Akkary et al.
patent: 5630075 (1997-05-01), Joshi et al.
patent: 5671444 (1997-09-01), Akkary et al.
patent: 5680572 (1997-10-01), Akkary et al.
patent: 5829025 (1998-10-01), Mittal
patent: 5829026 (1998-10-01), Leung et al.
21164 Alpha Microprocessor Data Sheet, 1997 Samsung Electronics, pp. 1-77.
TM1000 Preliminary Data Book (TriMedia), 1997, Philips Electronics.
Visual Instruction Set (VIS) User's Guide, Sun Microsystems, version 1.1, Mar. 1997, pp. 1-127.
AMD-3D Technology manual, /Rev. B, Feb. 1998, pp. 1-58.
The UltraSPARC Processor—Technology White Paper The UltraSPARC Architecture, Sun Microsystems, Jul. 17, 1997, pp. 1-10.
Maiyuran Subramaniam
Palanca Salvador
Pentkovski Vladimir
Tsai Steve
An Meng-Ai T.
Blakely, Sokoloff, Taylor & Zafman LLP
El-Hady Nabil
Intel Corporation