Information processing apparatus and process
Patent No.: US 6,507,894 (Reexamination Certificate; status: active)
Filed: 1999-12-10; Issued: 2003-01-14
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories (C711S131000, C711S213000)
Examiners: Matthew Kim (Department: 2186); Stephen Elmore
Attorney/Agent: McGinn & Gibb PLLC
Assignee: NEC Corporation
FIELD OF THE INVENTION
This invention relates to an information processing apparatus having a cache memory for holding a copy of the contents of a main memory, and to a corresponding information processing process.
BACKGROUND OF THE INVENTION
In a microprocessor system, a small-capacity, high-speed cache memory is placed close to the processor so that frequently performed memory accesses are served quickly, reducing the overall run time.
However, if large-scale data is handled, data transfer between the cache memory and the main memory occurs so often that frequently used data may be expelled from the cache memory, lowering performance. Moreover, since the cache memory usually groups plural data items arranged consecutively on the main memory into a line and exchanges data with the main memory on a per-line basis, unneeded data can be brought into the cache memory. This reduces the effective cache memory capacity and lowers performance further.
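As a minimal illustration of this line-granularity effect, consider the following C sketch (the 64-byte line size and the array size are assumptions for illustration, not taken from this disclosure): touching one double per line means seven eighths of every line brought into the cache is never used.

    #include <stdio.h>

    #define N (1 << 20)
    static double a[N];

    /* Strided traversal: with a stride of 8 doubles (64 bytes, one
     * assumed cache line), each access lands on a new line, yet only
     * 8 of the 64 bytes fetched per line are ever used. */
    double strided_sum(int stride)
    {
        double s = 0.0;
        for (int i = 0; i < N; i += stride)
            s += a[i];
        return s;
    }

    int main(void)
    {
        printf("%f\n", strided_sum(8));
        return 0;
    }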
In practice, in the field of supercomputers, the cache memory is not used when accessing large-scale data; the data is instead loaded directly from the main memory into registers. In this case, however, a large number of registers is required in order to hide the comparatively long latency of data transfer between the main memory and the registers.
For example, the vector supercomputer SX-4, manufactured by NEC, uses a technique in which a vector register of a large capacity is provided in the processor; if large-scale data is to be accessed, the data is loaded directly from the main memory into the vector register and the processor executes computations on the data there. Although the cache memory is used for data accesses from the scalar processor, the vector processor does not access data in the cache memory (see M. Inoue, K. Owada, T. Furui and M. Katagiri, Hardware of the SX-4 Series, NEC Technical Journal, Vol. 48, No. 11, pp. 13-22, 1995).
A treatise by Nakamura et al. (H. Nakamura, H. Imori and K. Nakazawa, Evaluation of Pseudo Vector Processor Based on a Register Window, Transactions of the Information Processing Society of Japan, Vol. 34, No. 4, pp. 669-679, 1993) discloses a technique for efficiently executing scientific-application computations on a microprocessor. Plural register sets are provided: while computations execute on one register set, data from the main memory is loaded directly into a separate register set, and data from another register set is stored directly back to the main memory. Here again, the cache memory is bypassed when accessing large-scale data, and data is exchanged directly between the registers and the main memory.
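The core of this scheme can be sketched in C as software double buffering (a rough analogue only; the buffer size is an assumption, and plain arrays stand in for the register sets that the described hardware would fill directly from the main memory, overlapped with computation):

    #include <stddef.h>

    #define BUF 64  /* assumed size of each register set */

    /* Compute on one buffer while the other is being refilled from
     * main memory. The refill is sequential here; on the described
     * hardware it overlaps the computation, hiding the memory
     * latency. Assumes n is a multiple of BUF. */
    double sum_double_buffered(const double *src, size_t n)
    {
        double buf[2][BUF];
        double s = 0.0;
        size_t nblk = n / BUF;

        for (size_t j = 0; j < BUF; j++)   /* prime the first buffer */
            buf[0][j] = src[j];

        for (size_t b = 0; b < nblk; b++) {
            size_t cur = b & 1, nxt = cur ^ 1;
            if (b + 1 < nblk)              /* refill the spare buffer */
                for (size_t j = 0; j < BUF; j++)
                    buf[nxt][j] = src[(b + 1) * BUF + j];
            for (size_t j = 0; j < BUF; j++)
                s += buf[cur][j];          /* compute on current set */
        }
        return s;
    }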
The above-described techniques bypass the cache memory when accessing large-scale data, transferring data directly between the main memory and the registers, and provide a large number of registers in the processor to hide the long latency of such transfers, thereby avoiding the performance loss ascribable to cache memory characteristics. This approach is herein termed the first conventional technique.
JP Patent Kokai JP-A-8-286928 discloses a technique for efficiently loading discretely arrayed data into a cache memory. Data arranged discretely on the main memory with a certain regularity is collected by an address conversion device and re-arrayed as a sequence of data having consecutive addresses. Since the program accesses the re-arrayed data, the cache memory holds only the data required by the processor, which can be expected to realize efficient use of the cache memory and of the bandwidth between the cache memory and the main memory. This technique is herein termed the second conventional technique.
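In software terms, the re-arraying amounts to a gather into a dense buffer, as in the following C sketch (the function name and stride parameter are illustrative assumptions; the disclosure performs the equivalent re-arraying in an address conversion device):

    #include <stddef.h>

    /* Collect elements scattered at a regular stride in main memory
     * into a dense buffer; the processor then works on dst, so every
     * byte of each cache line it fetches is useful data. */
    void gather_strided(double *dst, const double *src,
                        size_t count, size_t stride)
    {
        for (size_t i = 0; i < count; i++)
            dst[i] = src[i * stride];
    }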
SUMMARY OF THE DISCLOSURE
However, various problems have been encountered in the course of the investigations toward the present invention. For instance, the first conventional technique has the inconvenience that a unit must be provided in the processor to bypass the cache memory, and a large number of registers must be provided to hide the memory latency. Specialized, dedicated microprocessor components are therefore needed, all of which enlarge the circuit.
The second conventional technique leaves the following three problems unresolved.
The first problem is cache pollution: data that will subsequently be used is expelled from the cache when data is loaded on a large scale. If the expelled data is again requested by the processor, a cache miss occurs, and the processor must fetch data lying outside the large block just loaded. If this occurs frequently, the data transfer between the processor and the main memory device may become excessive and exceed the bandwidth of the channel, lowering the performance of the processor system.
The second problem is that, while data copying from the main memory to the cache memory occurs on a cache-line basis, this copying is necessarily started by a cache miss, which increases the average latency from the time the processor requests data until the data becomes available.
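To make the effect concrete (the figures below are illustrative assumptions, not taken from this disclosure), the standard average-memory-access-time relation is

    \text{AMAT} = t_{\text{hit}} + m \cdot t_{\text{penalty}}

With an assumed 2-cycle hit time, a 100-cycle miss penalty, and a 10% miss rate, AMAT = 2 + 0.1 × 100 = 12 cycles. Because demand-driven fills keep the miss rate high for streamed data, most of the average latency comes from the miss term, which is precisely what prefetching ahead of the request is meant to hide.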
The third problem is that, while a processor core has a low-speed interface on the main-memory side and a high-speed interface on the cache-memory side, copying from the main memory to the cache memory passes through the cache memory controller in the processor core, so the copying speed is governed by the low-speed interface between the main memory and the processor core.
It is therefore an object of the present invention to provide an information processing apparatus and process in which pre-fetch and post-store of data between the main memory and the cache memory are realized efficiently.
Other objects of the present invention will become apparent in the entire disclosure.
An information processing apparatus according to an aspect of the present invention has a main memory device, a cache memory for holding a copy of the main memory device, and a processor including a cache memory controller designed to supervise data in the cache memory as the apparatus refers to and updates control information and address information in the cache memory. The information processing apparatus includes a pre-fetch unit designed to transfer data from the main memory device to the cache memory without referring to or updating the control information and the address information.
The processor has a physical address space that includes a specified physical space area associated with a specified area on the cache memory in one-to-one correspondence. Under a memory-copying command directed at the specified physical space area, the pre-fetch unit transfers data directly between the cache memory and the main memory device without obstructing execution of processing by the processor.
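One way to picture this command mechanism is a memory-mapped command block, sketched below in C (the register layout, field names, and doorbell idiom are all assumptions for illustration; the disclosure specifies only that a memory-copy command addressed to the specified physical space area starts the transfer):

    #include <stdint.h>

    /* Hypothetical command registers occupying the specified physical
     * space area (all names and offsets are assumed). */
    typedef struct {
        volatile uint64_t src_addr;   /* source address in main memory */
        volatile uint64_t dst_offset; /* target area within the cache  */
        volatile uint64_t count;      /* number of elements to copy    */
        volatile uint64_t start;      /* writing 1 launches the copy   */
    } copy_cmd_t;

    /* Issue a pre-fetch; the transfer then proceeds in the pre-fetch
     * unit while the processor continues executing. */
    static void issue_prefetch(copy_cmd_t *cmd, uint64_t src,
                               uint64_t dst, uint64_t n)
    {
        cmd->src_addr   = src;
        cmd->dst_offset = dst;
        cmd->count      = n;
        cmd->start      = 1;
    }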
The cache memory has a first input/output port and a second input/output port. The first input/output port is connected to the main memory device via the cache memory controller of the processor. The second input/output port is connected to the main memory device via the pre-fetch unit.
The pre-fetch unit copies consecutive data, or discrete data arrayed at a fixed interval on the main memory device, into consecutive areas on the cache memory. It also copies a sequence of data located at main-memory addresses specified by pointers into consecutive areas on the cache memory; the pointers may themselves be consecutive or fixed-interval data on the main memory device, or data arranged consecutively in specified areas on the cache memory.
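A hedged C model of these copy modes follows (a software analogue only; the function names and element type are assumptions, and the disclosure realizes the copies in a hardware pre-fetch unit with its own port into the cache):

    #include <stddef.h>
    #include <stdint.h>

    /* Contiguous or fixed-stride copy from main memory into a
     * consecutive cache-side area (stride in elements; 1 means
     * contiguous source data). */
    void prefetch_strided(uint64_t *cache_area, const uint64_t *mem,
                          size_t count, size_t stride)
    {
        for (size_t i = 0; i < count; i++)
            cache_area[i] = mem[i * stride];
    }

    /* Pointer-indirect copy; the pointer list itself lies at a fixed
     * stride in main memory, or consecutively in a specified cache
     * area when ptrs points there and ptr_stride is 1. */
    void prefetch_indirect(uint64_t *cache_area,
                           const uint64_t *const *ptrs,
                           size_t count, size_t ptr_stride)
    {
        for (size_t i = 0; i < count; i++)
            cache_area[i] = *ptrs[i * ptr_stride];
    }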
An information processing apparatus according to a second aspect of the present invention has a main memory device, a cache memory holding a copy of the main memory device, ...