Cache consistent control of subsequent overlapping memory...

Electrical computers and digital processing systems: processing – Processing architecture – Vector processor

Reexamination Certificate

Details

Classification: C711S144000, C712S006000, C712S216000, C712S225000
Type: Reexamination Certificate
Status: active
Patent number: 06816960

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to vector architecture information processing equipment, and more particularly to a vector scatter instruction control circuit.
BACKGROUND OF THE INVENTION
In vector architecture information processing equipment, data in a memory area accessed by a vector instruction is not usually entered in a cache.
The reason is that locality of reference generally does not apply well to data accessed by a vector instruction: such data, if entered in a cache memory, is immediately swapped out by other cache line data, and the cache hit ratio decreases.
Such equipment also provides vector-based memory access instructions, such as the VST (vector store) and VLD (vector load) instructions, in which the memory access addresses are defined by a start address and a distance (stride) of the vector data to be accessed.
The VLD instruction loads data from memory into a vector data storage area made up of a plurality of words arranged in a vector unit, called a “vector register”, in accordance with memory access addresses defined as described above.
Conversely, the VST instruction stores data from a vector register into memory.
In the case of the VST instruction, the addresses accessed by the instruction can be determined at the instruction issue stage. It is therefore relatively easy to improve performance by allowing an instruction that follows the VST instruction, such as a VLD instruction or a scalar load instruction, to be executed ahead of the VST instruction.
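As a rough illustration only (not taken from the patent; the vector length VL, the 64-bit element size, and the function names are assumptions), the following C sketch models the strided addressing of VLD/VST. It shows why the full address range of a VST is known as soon as the instruction is issued.

/* Illustrative model of strided VLD/VST addressing (assumptions: 64-bit
 * elements, fixed vector length VL, byte-addressed memory). */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define VL 256                              /* assumed vector length */

/* VLD: vreg[i] = M[start + i*stride] for i = 0..VL-1 */
static void vld(uint64_t vreg[VL], const uint8_t *mem,
                uint64_t start, uint64_t stride)
{
    for (size_t i = 0; i < VL; i++)
        memcpy(&vreg[i], mem + start + i * stride, sizeof vreg[i]);
}

/* VST: M[start + i*stride] = vreg[i]; the first and last addresses,
 * start and start + (VL-1)*stride, are known at issue time. */
static void vst(const uint64_t vreg[VL], uint8_t *mem,
                uint64_t start, uint64_t stride)
{
    for (size_t i = 0; i < VL; i++)
        memcpy(mem + start + i * stride, &vreg[i], sizeof vreg[i]);
}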
On the other hand, with so-called “list vector” instructions, such as the VGT (vector gather) and VSC (vector scatter) instructions, data stored in vector registers in the vector unit is used as the memory addresses to be accessed. The memory addresses are therefore identified only after the instruction reaches the vector unit, and they are generally random.
For the sake of better understanding of the present invention, a list vector instruction will be described with reference to FIG. 8.
First, as shown in FIG. 8(a), the VGT (vector gather) instruction loads data from memory in such a way that the memory data at an address VA(n), held in element n of a vector register Vy, is loaded into the corresponding element of a vector register Vx.
As shown in FIG. 8(b), the VSC (vector scatter) instruction stores data into memory in such a way that data of the vector register Vx is stored into the memory area whose address VA(n) is stored in the corresponding element of the vector register Vy.
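As a minimal sketch of these semantics (illustrative only; the vector length VL, element size, and function names are assumptions, not the patent's definitions), the gather and scatter operations can be modeled in C as follows. The point to note is that the addresses in Vy are data values, so they are effectively random and become known only inside the vector unit.

/* Illustrative model of VGT (gather) and VSC (scatter) using a list
 * vector Vy of byte addresses. */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define VL 256                              /* assumed vector length */

/* VGT: Vx[n] = M[VA(n)], where VA(n) is element n of Vy. */
static void vgt(uint64_t vx[VL], const uint64_t vy[VL], const uint8_t *mem)
{
    for (size_t n = 0; n < VL; n++)
        memcpy(&vx[n], mem + vy[n], sizeof vx[n]);
}

/* VSC: M[VA(n)] = Vx[n]; the addresses come from register contents, so
 * nothing about the accessed memory area is known at issue time. */
static void vsc(const uint64_t vx[VL], const uint64_t vy[VL], uint8_t *mem)
{
    for (size_t n = 0; n < VL; n++)
        memcpy(mem + vy[n], &vx[n], sizeof vx[n]);
}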
In contrast to vector memory access instructions, locality of reference generally does apply to the data accessed by scalar memory access instructions. As a result, a system is usually adopted in which data accessed by a scalar memory access instruction is stored in a cache memory so that the memory access latency is hidden.
SUMMARY OF THE INVENTION
When a vector memory access instruction that writes data into memory is issued on vector architecture information processing equipment fitted with a cache as described above, cache invalidation must be executed to ensure cache consistency whenever an address to be accessed is entered in the cache. This cache invalidation process stalls cache access instructions that follow the vector memory access instruction, and is one of the primary causes of performance degradation.
The cache invalidation process differs between the VST (vector store) instruction and the VSC (vector scatter) instruction.
In the case of the VST instruction, the start address and the distance are determined when the instruction is issued, so relatively high-speed cache invalidation can be realized from these two values. Furthermore, since the memory access start address and end address of the VST instruction can be calculated promptly, a scalar LD (load) instruction that follows the VST instruction may be allowed to execute ahead of the VST instruction if no address coincidence is detected between the two instructions.
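The following C sketch (an assumption-laden illustration, not the patent's circuit) shows how the address range of a strided VST can be computed at issue time and compared against a following scalar load; if the ranges do not overlap, the load can safely bypass the VST. The check is deliberately conservative: it covers the gaps between strided elements as well.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t start;                 /* lowest byte address touched */
    uint64_t end;                   /* highest byte address touched */
} addr_range;

/* Address range of a VST given its start address, stride, element count,
 * and element size (all known when the instruction is issued). */
static addr_range vst_range(uint64_t start, int64_t stride,
                            uint64_t count, uint64_t elem_size)
{
    uint64_t last = start + (count - 1) * (uint64_t)stride;
    addr_range r;
    r.start = (start < last) ? start : last;
    r.end   = ((start < last) ? last : start) + elem_size - 1;
    return r;
}

/* True if a scalar access to [addr, addr + size) overlaps the VST range,
 * i.e. an address coincidence is detected. */
static bool address_coincidence(addr_range r, uint64_t addr, uint64_t size)
{
    return addr <= r.end && addr + size - 1 >= r.start;
}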
On the other hand, in the case of the VSC (vector scatter) instruction, the addresses to be accessed are determined only after they are read from a vector register and, in addition, the address values are random. It is therefore necessary to send each invalidation address from the vector unit to a cache invalidation control unit (see 4 in FIG. 1) in the scalar unit in order to invalidate the cache data that matches the invalidation address.
As a result, none of the memory access instructions that follow the VSC instruction can be issued until this cache invalidation processing is completed, which degrades performance.
This problem will be described in more detail with reference to FIGS. 6 and 7.
First, in order to make the description easy to understand, the LDS instruction, which belongs to the scalar load (cache access) instructions, will be described with reference to FIG. 7. As with the VGT/VSC instructions, the LDS instruction comprises four fields: OPC (operation code) and operands X, Y, and Z. The memory access address is calculated as Ry+Rz, and the resulting data M(Ry+Rz), read from the memory area at address Ry+Rz, is stored into register Rx.
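For reference, the LDS semantics just described reduce to a single indexed load; the C sketch below (illustrative, with a flat byte-addressed memory model and 64-bit registers assumed) captures it.

#include <stdint.h>
#include <string.h>

/* LDS: Rx = M[Ry + Rz], modeled over a flat byte-addressed memory. */
static uint64_t lds(const uint8_t *mem, uint64_t ry, uint64_t rz)
{
    uint64_t rx;
    memcpy(&rx, mem + ry + rz, sizeof rx);
    return rx;
}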
In FIG. 6(a), after the VST (vector store) instruction is issued, the cache is invalidated and, almost at the same time, the data is written from the vector register into memory.
The LDS instruction following the VST instruction may be issued even while the cache is being invalidated, unless the access address of the LDS instruction overlaps with that of the VST instruction.
On the other hand, referring to FIG. 6(b), with the VSC (vector scatter) instruction, cache invalidation is performed when vector processing starts and the invalidation addresses are sent. In addition, since the addresses to be accessed are not known immediately after the VSC instruction is issued and are random, an LDS instruction that follows the VSC instruction is kept waiting in a hold state until the cache invalidation is completed.
As described above, none of the memory access instructions that follow the VSC instruction can be issued before the cache invalidation is completed, and this causes performance degradation.
In view of the foregoing, it is an object of the present invention to provide vector architecture processing equipment that prevents a following instruction from being delayed by the cache invalidation of a vector scatter instruction and that executes the following instruction ahead of the vector scatter instruction, thereby improving performance.
To achieve the above object, in accordance with one aspect of the present invention, there is provided a circuit comprising:
means for detecting whether an overlap exists between an address to be accessed by an area-specified vector scatter instruction, which specifies a range of memory access addresses, and an address to be accessed by a memory access instruction that follows the area-specified vector scatter instruction; and
means for holding the following memory access instruction for which an address coincidence is detected.
In accordance with one aspect of the present invention, there is provided a circuit for controlling a vector scatter instruction, wherein an area-specified vector scatter instruction specifying scattered areas is provided in the instruction set, the circuit comprising:
an address coincidence detection unit that detects whether an address to be accessed by the area-specified vector scatter instruction overlaps with an address to be accessed by a memory access instruction that follows the vector scatter instruction; and
a hold control unit that holds the memory access instruction that follows the vector scatter instruction if the addresses overlap.
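As a behavioral sketch of these two units (an assumed software model, not the patent's actual hardware; the structure and field names are illustrative), the area-specified range check and the hold decision could be expressed as follows.

#include <stdint.h>
#include <stdbool.h>

/* Registers of the assumed detection unit: the area declared by the
 * area-specified VSC, plus a flag indicating its invalidation is pending. */
typedef struct {
    uint64_t area_start;            /* area start address register */
    uint64_t area_end;              /* area end address register (inclusive) */
    bool     invalidation_pending;  /* VSC cache invalidation not yet done */
} vsc_coincidence_unit;

/* Address coincidence detection: does the following memory access,
 * touching [addr, addr + size), fall inside the declared VSC area? */
static bool coincides(const vsc_coincidence_unit *u,
                      uint64_t addr, uint64_t size)
{
    return addr <= u->area_end && addr + size - 1 >= u->area_start;
}

/* Hold control: hold the following memory access instruction only while
 * the VSC invalidation is pending AND its address coincides with the
 * area; otherwise it may be issued ahead of the VSC. */
static bool hold_following_access(const vsc_coincidence_unit *u,
                                  uint64_t addr, uint64_t size)
{
    return u->invalidation_pending && coincides(u, addr, size);
}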
In accordance with another aspect, there is provided vector architecture information processing equipment comprising:
a vector scatter address coincidence detection unit including:
registers for storing an area start address and an area end address of an area-specified vector scatter instruction in which the area start address and the area end address are specified; and
a circuit for checking if an address to be accessed by a memory access instruction following the a
