Method and apparatus for facilitating multiple storage...

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate

Details

C712S215000, C712S223000, C712S235000

active

06192461

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to a processing system and more particularly to increasing the rate of store instruction completions in a processing system.
BACKGROUND OF THE INVENTION
In the continuing development of faster and more powerful computer systems, a significant type of microprocessor, known as a reduced instruction set computer (RISC) processor, has been utilized. Advances in the field of RISC processors have led to the development of superscalar processors. Superscalar processors, as their name implies, perform functions not commonly found in traditional scalar microprocessors. Included in these functions is the ability to execute instructions out of order with respect to the program order. Although the instructions execute out of order, the results of the executions appear to have occurred in program order, so that proper data coherency is maintained.
A common bottleneck in superscalar processor performance is the number of instructions which can be outstanding within the processor at a given time. Typically, the instruction unit includes a queue which indicates the number of outstanding instructions. The queue typically suspends any future dispatching of instructions if a maximum number is reached.
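The dispatch stall described above can be illustrated with a minimal sketch. The class name `CompletionQueue`, its methods, and the capacity of 2 are hypothetical and chosen only for illustration; they model the behavior in the text (dispatch is suspended once a maximum number of outstanding instructions is reached), not any particular processor.

```python
from collections import deque


class CompletionQueue:
    """Toy model of a queue tracking outstanding instructions (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def try_dispatch(self, instr):
        # Dispatch is suspended while the maximum number of
        # outstanding instructions has been reached.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr)
        return True

    def complete_oldest(self):
        # Completing the oldest instruction frees one slot.
        return self.entries.popleft()


q = CompletionQueue(capacity=2)
results = [q.try_dispatch(i) for i in ("add", "store", "load")]
q.complete_oldest()
after_completion = q.try_dispatch("load")
print(results, after_completion)  # [True, True, False] True
```

In this model the third dispatch stalls until an older instruction completes, which is why slow store completion translates directly into dispatch stalls.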
One type of instruction which can be slow to complete is the store instruction. Store instructions are slow to complete for a number of reasons, including the maximum number of stores which can be completed per cycle and the number of stores which can update the cache each cycle. Conventional superscalar processors typically complete only one store instruction per cycle. This often causes dispatch stalls. Accordingly, a need exists for a system that efficiently and effectively combats such problems and decreases the number of dispatch unit stalls due to the lack of store instruction completions, thereby enhancing overall processor performance.
SUMMARY OF THE INVENTION
Accordingly, the present invention is directed to increasing the number of store instructions that can be completed during a cycle. A method and system for handling multiple store instruction completions in a processing system after a stall condition are disclosed. The processing system includes an instruction unit, which itself includes a dispatch unit and a completion unit; a translation unit; and at least one execution unit. A load store unit comprises an instruction queue for receiving a plurality of instructions from the dispatch unit; at least one effective address (EA) unit for receiving the plurality of instructions from the instruction queue; and a store queue. The store queue is coupled to the translation unit, the at least one execution unit and the at least one EA unit. The store queue receives data and real address information relating to each of the plurality of instructions from the at least one execution unit prior to completion of each of the plurality of instructions.
In so doing, the bottleneck associated with conventional systems, i.e., a maximum number of instructions that can be dispatched by the instruction unit, is reduced.
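As a rough illustration of a store queue entry that collects its data and real address information before the store completes, consider the sketch below. The class `StoreQueueEntry`, its field names, and the example values are assumptions made for illustration; the text does not specify an entry layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreQueueEntry:
    """Hypothetical store queue entry; fields are filled in before completion."""

    tag: str
    real_address: Optional[int] = None  # real address information for the store
    data: Optional[int] = None          # data to be written by the store

    @property
    def ready(self):
        # The entry can be completed without further lookups once
        # both the real address and the data have arrived.
        return self.real_address is not None and self.data is not None


entry = StoreQueueEntry(tag="st0")
before = entry.ready
entry.real_address = 0x1000
entry.data = 42
after = entry.ready
print(before, after)  # False True
```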
According to one embodiment of the present invention, there is provided a circuit for use in superscalar processors which allows for rapid completion of Store instructions which are queued up in a completion table. According to this embodiment of the invention, a data queue is provided which stores the data for the Store instruction. After the Store instruction is executed, and its effective address is available, the circuit determines the location of the data required by the Store instruction from the GPR or rename registers. If the instruction which originally generated the data required by the Store instruction has successfully completed, then the data will be architected, i.e., stored in the general purpose registers (“GPRs”) or floating point registers (“FPRs”) depending on whether the instruction was a fixed or floating point operation. For purposes of the present discussion, fixed point instructions will be presumed. It will be understood that extension of the present invention to floating point instructions is well within the skill of one in the art.
If the instruction which generated the data required by the Store instruction has not yet completed, but has been processed to a final result by the relevant execution unit, then the data will be stored in the rename registers as valid. If the instruction has not yet generated the data required by the Store instruction, then the rename register set aside to receive the data will be marked by the processor as invalid.
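The three cases just described (data architected in a GPR, data held in a valid rename register, data not yet produced) can be sketched as a small lookup. The names `RenameRegister` and `locate_store_data`, and the string labels returned, are hypothetical and serve only to illustrate the decision.

```python
from dataclasses import dataclass


@dataclass
class RenameRegister:
    value: int = 0
    valid: bool = False  # set once the execution unit has produced a final result


def locate_store_data(producer_completed, gpr_value, rename):
    """Return (source, value) for the store's data, or ("pending", None)."""
    if producer_completed:
        # The producing instruction has completed: the data is architected.
        return ("gpr", gpr_value)
    if rename.valid:
        # Final result exists but the producer has not yet completed.
        return ("rename", rename.value)
    # The rename register is still marked invalid: no data yet.
    return ("pending", None)


print(locate_store_data(True, 7, RenameRegister()))                      # ('gpr', 7)
print(locate_store_data(False, 0, RenameRegister(value=9, valid=True)))  # ('rename', 9)
print(locate_store_data(False, 0, RenameRegister()))                     # ('pending', None)
```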
Once the data required by the Store instruction is available, whether in a GPR or in a valid rename register, then the data is passed into an entry in the data queue. Read ports are provided on the rename registers to facilitate this operation. Since, in the present version of the invention, completion must occur in program order, the step of passing data into the data queue may occur long before the completion queue pointer points to the Store instruction as the next instruction to complete. This is especially true if completion has been stalled by an instruction issued before the Store instruction. When it is time for the Store instruction to complete, the data required by the Store instruction is already in the data queue and may be sent to the cache; no access to the GPRs is required, as it is with conventional processors. This permits multiple Store instructions to be completed in a single clock cycle because there is no bottleneck at the GPR ports, since the data has been “preloaded” while the Store instructions were waiting for their turns in the completion window. This, of course, is not possible in conventional processors which have only a single read port on the GPRs. More ports could be added, but this would require additional design, chip real estate and complexity. Still other objects and advantages of the invention will become clear to those of skill in the art in view of the following disclosure of detailed embodiments of the invention.
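A minimal sketch of the preloading idea follows, assuming a data queue modeled as a dictionary keyed by a store tag and a hypothetical per-cycle limit `max_per_cycle`. The function name `complete_stores` and the tags are illustrative only; the point is that stores whose data is already preloaded can complete together, in program order, without reading the GPRs at completion time.

```python
from collections import deque


def complete_stores(completion_queue, data_queue, max_per_cycle):
    """Complete the oldest consecutive stores whose data is already preloaded."""
    sent_to_cache = []
    while completion_queue and len(sent_to_cache) < max_per_cycle:
        tag = completion_queue[0]
        if tag not in data_queue:
            break  # data not preloaded yet; completion must stay in program order
        completion_queue.popleft()
        sent_to_cache.append((tag, data_queue.pop(tag)))
    return sent_to_cache


completion_queue = deque(["st0", "st1", "st2"])
data_queue = {"st0": 10, "st1": 11}  # st2's data has not been produced yet

cycle1 = complete_stores(completion_queue, data_queue, max_per_cycle=2)
cycle2 = complete_stores(completion_queue, data_queue, max_per_cycle=2)
data_queue["st2"] = 12               # data arrives and is preloaded
cycle3 = complete_stores(completion_queue, data_queue, max_per_cycle=2)
print(cycle1, cycle2, cycle3)
# [('st0', 10), ('st1', 11)] [] [('st2', 12)]
```

Two stores complete in the first modeled cycle because their data was preloaded; the third waits only until its own data reaches the data queue.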
These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.
