Electrical computers and digital processing systems: memory – Address formation – Slip control – misaligning – boundary alignment
Reexamination Certificate
1993-10-18
2001-04-17
Ellis, Richard L. (Department: 2784)
C710S052000
active
06219773
ABSTRACT:
BACKGROUND OF THE INVENTION
In the field of microprocessors, the number of instructions executed per second is a primary performance measure. As is well known in the art, many factors in the design and manufacture of a microprocessor impact this measure. For example, the execution rate depends quite strongly on the clock frequency of the microprocessor. The frequency of the clock applied to a microprocessor is limited, however, by power dissipation concerns and by the switching characteristics of the transistors in the microprocessor.
The architecture of the microprocessor is also a significant factor in the execution rate of a microprocessor. For example, many modern microprocessors utilize a “pipelined” architecture to improve their execution rate if many of their instructions require multiple clock cycles for execution. According to conventional pipelining techniques, each microprocessor instruction is segmented into several stages, and separate circuitry is provided to perform each stage of the instruction. The execution rate of the microprocessor is thus increased by overlapping the execution of different stages of multiple instructions in each clock cycle. In this way, one multiple-cycle instruction may be completed in each clock cycle.
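The throughput claim above can be checked with a little arithmetic. The following is a minimal sketch, assuming an idealized pipeline with no stalls in which every instruction passes through the same number of stages; the function names are illustrative and do not come from the patent.

```python
def unpipelined_cycles(num_instructions: int, stages: int) -> int:
    """Without overlap, each instruction occupies the processor for all stages."""
    return num_instructions * stages


def pipelined_cycles(num_instructions: int, stages: int) -> int:
    """With overlap, the first instruction takes `stages` cycles to fill the
    pipeline, and each later instruction completes one cycle after the last."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # assumed workload: 100 instructions, 5 stages each
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

For long instruction sequences the pipelined count approaches one cycle per instruction, which is the sense in which one multiple-cycle instruction completes in each clock cycle.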
By way of further background, some microprocessor architectures are of the “superscalar” type, where multiple instructions are issued in each clock cycle for execution in parallel. Assuming no dependencies among instructions, the increase in instruction throughput is proportional to the degree of scalability.
Another known technique for improving the execution rate of a microprocessor and the system in which it is implemented is the use of a cache memory. Conventional cache memories are small high-speed memories that store program and data from memory locations which are likely to be accessed in performing later instructions, as determined by a selection algorithm. Since the cache memory can be accessed in a reduced number of clock cycles (often a single cycle) relative to main system memory, the effective execution rate of a microprocessor utilizing a cache is much improved over a non-cache system. Many cache memories are located on the same integrated circuit chip as the microprocessor itself, providing further performance improvement.
According to each of these architecture-related performance improvement techniques, certain events may occur that slow the microprocessor performance. For example, in both the pipelined and the superscalar architectures, multiple instructions may require access to the same internal circuitry at the same time, in which case one of the instructions will have to wait (i.e., “stall”) until the priority instruction is serviced by the circuitry.
One such conflict often occurs where one instruction requests a write to memory (including cache) at the same time that another instruction requests a read from the memory. If the instructions are serviced on a “first-come-first-served” basis, the later-arriving instruction will have to wait for the completion of the prior instruction before it is granted memory access. These and other stalls are, of course, detrimental to microprocessor performance.
It has been discovered that, for most instruction sequences (i.e., programs), reads from memory or cache are generally more time-critical than writes to memory or cache, especially where a large number of general-purpose registers are provided in the microprocessor architecture. This is because the instructions and input data are necessary at specific times in the execution of the program in order for the program to execute in an efficient manner; in contrast, since writes to memory are merely writing the result of the program execution, the actual time at which the writing occurs is not as critical since the execution of later instructions may not depend upon the result.
By way of further background, write buffers have been provided in microprocessors, with such write buffers logically located between on-chip cache memory and the bus to main memory. These conventional post-cache write buffers receive data from the cache for a write-through or write-back operation; the contents of the post-cache write buffer are written to main memory under the control of the bus controller, at times when the bus becomes available.
By way of further background, it is well known for microprocessors of conventional architectures, such as those having so-called “X86” compatibility, to effect write operations of byte sizes smaller than the capacity of the internal data bus.
It is an object of the present invention to provide a microprocessor architecture which buffers the writing of data from the CPU core into a write buffer, prior to retiring of the data to a cache, and in which misaligned writes may be easily handled with minimal loss of performance.
Other objects and advantages of the present invention will be apparent to those of ordinary skill in the art having reference to the following specification in combination with the drawings.
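To make the notion of a misaligned write concrete, the following is a minimal sketch. It assumes an 8-byte internal data bus, assumes that "misaligned" means the operand straddles a bus-width boundary, and assumes the operand is no wider than the bus; `BUS_BYTES` and `split_write` are hypothetical names, not taken from the patent.

```python
BUS_BYTES = 8  # assumed width of the internal data bus, in bytes


def split_write(address: int, size: int):
    """Return the (address, size) pieces a write occupies.

    One piece means the write fits inside a single bus-wide group. Two pieces
    mean it straddles a boundary, which this sketch treats as misaligned.
    Assumes 1 <= size <= BUS_BYTES.
    """
    first = min(size, BUS_BYTES - address % BUS_BYTES)
    pieces = [(address, first)]
    if first < size:
        pieces.append((address + first, size - first))
    return pieces


if __name__ == "__main__":
    print(split_write(0x1004, 4))  # fits in one group
    print(split_write(0x1006, 4))  # straddles the boundary at 0x1008
```

Under these assumptions, a 4-byte write at 0x1006 splits into two 2-byte pieces on either side of the boundary at 0x1008, while the same write at 0x1004 stays in one piece.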
SUMMARY OF THE INVENTION
The invention may be implemented in a microprocessor by providing a write buffer. The write buffer is logically located between the core of the microprocessor and the memory (including off-chip main or cache memory and on-chip cache). Each write to memory executed by the core is made to the write buffer, rather than to the memory bus or cache; in this way, cache or memory reads are not impacted by writes performed by the core. The contents of the write buffer are written into cache or memory in an asynchronous manner, when the memory bus or cache is available.
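The behavior described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: a single memory port, a fixed buffer depth, and a policy in which a pending read always takes the port before any buffered write. The class and method names are hypothetical, and the model deliberately omits the case of a read to an address that still has a write pending in the buffer.

```python
from collections import deque


class WriteBufferModel:
    """Toy model: the core posts writes to a FIFO; one memory port services a
    pending read first and retires buffered writes only when it is idle."""

    def __init__(self, depth: int = 4):  # depth is an assumed value
        self.depth = depth
        self.entries = deque()  # (address, data) pairs in program order
        self.memory = {}        # stands in for cache or main memory

    def core_write(self, address: int, data: int) -> bool:
        """Post a write. Returns False if the buffer is full (core must stall)."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, data))
        return True

    def cycle(self, read_address=None):
        """Advance one cycle. A requested read owns the memory port this cycle;
        otherwise the oldest buffered write is retired to memory.
        Read-after-write hazards are not modeled here."""
        if read_address is not None:
            return self.memory.get(read_address)
        if self.entries:
            address, data = self.entries.popleft()
            self.memory[address] = data
        return None


if __name__ == "__main__":
    wb = WriteBufferModel()
    wb.core_write(0x100, 7)
    wb.cycle(read_address=0x200)  # the read takes the port; the write waits
    wb.cycle()                    # port idle: the buffered write retires
    print(wb.memory)
```

In this model the core never waits on the memory port to issue a write unless the buffer is full, which is the decoupling the paragraph describes.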
Another feature of the present invention may be implemented in such a microprocessor with provisions for performing gathered writes from the write buffer to the cache. During allocation of the write buffer entries, the physical address of the currently allocated entry is compared with those of previously allocated entries to determine whether, at a minimum, the physical addresses fall within the same byte group, in which case the multiple writes may be gatherable, or mergeable, into a single write operation to the cache. Other constraints on gatherability can include that the bytes are contiguous with one another, and that the writes are from adjacent write instructions in program order. Gatherable write buffer entries are retired by loading a latch with the data from the write buffer entries, after shifting the data to place it in the proper byte lanes; the write is effected by presenting the address in combination with the contents of the latch.
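The gatherability test and the latch loading can be sketched as follows. This is an illustration rather than the patented circuit: it assumes an 8-byte byte group, handles only the case where the later write immediately follows the earlier one in ascending address order, and adds per-lane byte enables as an assumption of the sketch. `Entry`, `gatherable`, and `gather` are hypothetical names.

```python
from dataclasses import dataclass

GROUP_BYTES = 8  # assumed width of a byte group (one cache write)


@dataclass
class Entry:
    address: int  # physical address of the first byte
    data: bytes   # bytes to write, lowest address first
    seq: int      # position of the write instruction in program order


def gatherable(prev: Entry, cur: Entry) -> bool:
    """Apply the constraints named in the text: same byte group, contiguous
    bytes, and adjacent write instructions in program order."""
    same_group = prev.address // GROUP_BYTES == cur.address // GROUP_BYTES
    fits = cur.address % GROUP_BYTES + len(cur.data) <= GROUP_BYTES
    contiguous = prev.address + len(prev.data) == cur.address
    adjacent = cur.seq == prev.seq + 1
    return same_group and fits and contiguous and adjacent


def gather(prev: Entry, cur: Entry):
    """Shift each entry's data into its byte lanes of a latch and return
    (group base address, latch contents, per-lane byte enables)."""
    base = (prev.address // GROUP_BYTES) * GROUP_BYTES
    latch = bytearray(GROUP_BYTES)
    enables = [False] * GROUP_BYTES
    for entry in (prev, cur):
        lane = entry.address - base  # byte lane of the entry's first byte
        latch[lane:lane + len(entry.data)] = entry.data
        for i in range(lane, lane + len(entry.data)):
            enables[i] = True
    return base, bytes(latch), enables


if __name__ == "__main__":
    a = Entry(0x1002, b"\xAA\xBB", seq=0)
    b = Entry(0x1004, b"\xCC\xDD", seq=1)
    if gatherable(a, b):
        base, latch, enables = gather(a, b)
        print(hex(base), latch.hex(), enables)
```

Here two 2-byte writes at 0x1002 and 0x1004 merge into a single write to the group at 0x1000, with lanes 2 through 5 enabled.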
Garibay, Jr. Raul A.
Quattromani Marc A.
Carr & Ferrell LLP
VIA-Cyrix Inc.