Title: Information processing system for read ahead buffer memory...
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate
Filed: 1998-10-29
Issued: 2002-01-22
Examiner: Yoo, Do Hyun (Department: 2185)
Other classes: C711S171000, C712S207000, C710S120000
Status: active
Patent number: 06341335
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to an information processing system comprising a processor for performing arithmetic operations, a memory, and a memory controller for controlling the memory; more particularly, it relates to a prefetch function in an information processing system that uses an embedded processor.
FIG. 13 shows the arrangement of a general information processing system as prior art. A processor 1 and a memory controller 2 are connected by a system bus 110, the memory controller 2 and a memory 3 are connected by a memory bus 111, and the memory controller 2 and another system are connected by an I/O bus (not shown). The processor 1 of the present system includes an on-chip cache (referred to hereinafter as the L1 cache) 12, and an L2 cache 14 is connected to the system bus 110. The memory controller 2 performs connection control not only over the memory 3 and the L2 cache 14 but also over the other system. The operation by which the processor 1 reads an instruction code (referred to hereinafter as a fetch) is summarized as follows. The processor 1 issues a memory access request to the memory controller 2 via its instruction processing part 11 and the system bus 110. The memory controller 2, in response to the request, reads an instruction code from the L2 cache 14 or the memory 3 and transmits it to the processor 1. The access size between the processor 1 and the memory 3 is influenced by the L1 cache 12, so reading code from the memory 3 is carried out in units of the line size, the management unit of the L1 cache 12.
Most processors are usually equipped, in addition to an L1 cache, with an L2 cache provided outside the processor core as a relatively high-speed memory. The word 'cache' as used herein refers to a memory that stores an instruction code once it has been accessed from memory, so that a later re-access to the same code can be served at high speed. To perform arithmetic operations, the processor accesses not only instruction codes but also various sorts of data, including operands and external registers; in some cases such data are also stored in a cache. This technique is already implemented in many systems, a personal computer being a typical example.
SUMMARY OF THE INVENTION
In an information processing system, in addition to the arithmetic performance of the processor, the performance of reading instruction codes from memory to the processor is also important. The delay from the processor's access request to its receipt of the data is known as the access latency. In recent years the core performance of processors has improved remarkably, but the ability of the memory to supply instruction codes has not kept pace. When the access latency becomes non-negligible because of this performance gap, the operation of the processor stalls; the processor cannot exhibit its full performance, and the memory system becomes a bottleneck in the system. This access latency problem occurs not only for instruction fetches but also for data and register operands.
Conventional methods for improving access latency include the first to fourth methods described below.
The first improvement method is to improve the performance of the system bus. This requires widening the bus and raising its operating frequency. However, such improvement is difficult because of (1) the large number of device pins needed to connect the system bus and (2) noise problems such as crosstalk.
The second improvement method is to speed up the memory. This can mean speeding up the memory itself or using a cache as the memory. However, a high-speed memory such as a high-speed SRAM or a processor-exclusive memory is expensive, which undesirably raises the cost of the entire system. The cache, meanwhile, has problems rooted in its principle of operation: it becomes effective only after a first access and is most useful when the same code is accessed repeatedly. In particular, a program executed on a so-called embedded processor tends to have low locality of reference; the re-use frequency of an instruction code is low, so the cache cannot work effectively. Instruction codes must then be read directly from memory, and this method cannot exploit the high-speed nature of the cache. Although the price/performance ratio of memory keeps improving, employing the latest high-speed memory involves high costs, and since systems have come to demand ever larger memory capacities in recent years, the cost increase becomes a serious problem.
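To make the locality argument concrete, here is a toy direct-mapped cache in C; the sizes and names are invented for the sketch. A purely sequential stream that never re-uses a line scores no hits, while a tight loop over a few lines hits almost always. Embedded code with low locality of reference behaves like the first case:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define SETS 64
#define LINE 16

static uint32_t tags[SETS];
static bool     valid[SETS];

/* Look up one address; on a miss, fill the set with the new line. */
static bool access_cache(uint32_t addr)
{
    uint32_t line = addr / LINE;
    uint32_t set  = line % SETS;
    bool hit = valid[set] && tags[set] == line;
    valid[set] = true;
    tags[set]  = line;
    return hit;
}

int main(void)
{
    int hits = 0;
    /* Sequential code with no re-use: one access per line, all misses. */
    for (uint32_t a = 0; a < 4096; a += LINE) hits += access_cache(a);
    printf("sequential, no re-use: %d/256 hits\n", hits);   /* 0 hits   */

    hits = 0;
    /* The same 4 lines executed 64 times (a tight loop): mostly hits.  */
    for (int i = 0; i < 64; i++)
        for (uint32_t a = 0; a < 4 * LINE; a += LINE) hits += access_cache(a);
    printf("tight loop: %d/256 hits\n", hits);              /* 252 hits */
    return 0;
}
```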
The third improvement method is to employ a so-called Harvard architecture, which separates instruction code accesses from data accesses. In other words, the processor is given one bus dedicated to instruction code access and another dedicated to data access. The Harvard architecture can be employed for the L1 cache, but employing it for the system bus requires mounting two channels of buses and therefore uses many device pins.
The fourth improvement method is, prior to the issuance of a fetch request for an instruction code by the arithmetic operation part of the processor, to read the instruction code in advance (prefetch) from the memory into a memory within the processor. Details of prefetching are disclosed in U.S. Pat. No. 5,257,359. That publication discloses that an instruction decoder in the arithmetic operation part decodes and analyzes the required instruction code to predict the instruction code to be accessed next and to read it in advance. In general, prefetching is effective when the instruction supply rate of the processor is higher than its instruction execution rate. However, since a prefetch performed within the processor passes over the system bus, the system bus becomes a bottleneck; moreover, such a prefetch contends with other external accesses, such as operand accesses, so a sufficient effect cannot be expected.
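The general read-ahead idea, not the specific mechanism of either patent, can be sketched in a few lines of C: after serving line N, a controller speculatively reads line N+1 into a buffer, so the next sequential fetch is served without waiting on memory. All names and sizes here are assumptions:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE 16
#define MEM_BYTES 4096

static uint8_t  mem[MEM_BYTES];
static uint8_t  ra_buf[LINE];      /* read-ahead buffer             */
static uint32_t ra_line = ~0u;     /* line currently held           */
static int      mem_reads = 0;     /* counts slow memory accesses   */

static void mem_read_line(uint32_t line, uint8_t *dst)
{
    memcpy(dst, &mem[line * LINE], LINE);
    mem_reads++;
}

/* Serve a fetch, then read ahead the next sequential line. */
static void serve_fetch(uint32_t line, uint8_t *dst)
{
    if (line == ra_line)
        memcpy(dst, ra_buf, LINE);      /* prefetch hit: latency hidden */
    else
        mem_read_line(line, dst);       /* demand read                  */
    if (line + 1 < MEM_BYTES / LINE) {  /* speculative read-ahead       */
        mem_read_line(line + 1, ra_buf);
        ra_line = line + 1;
    }
}

int main(void)
{
    uint8_t buf[LINE];
    for (uint32_t l = 0; l < 8; l++) serve_fetch(l, buf);
    /* 8 sequential fetches cost 1 demand read + 8 read-aheads; only
     * the first fetch waits on memory, the rest hit the buffer.      */
    printf("memory reads: %d\n", mem_reads);
    return 0;
}
```

In a real controller the read-ahead would overlap with the processor's execution of the current line, which is what hides the access latency; this sequential model only counts the accesses.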
The effect of prefetching generally depends on the characteristics of the instruction code being executed. The inventor of the present application has paid attention to the fact that an embedded program executed on an embedded-type processor contains many flows that collectively process an access to operand data placed in a peripheral register or memory together with a comparison judgement, and then select the next processing on the basis of the judgement result; that is, the program contains many instances of the "IF~THEN~ELSE~" construct of, for instance, the C language. In the collective processing of operand-data access and comparison judgement, the program proceeds highly sequentially and tends to have the low locality of reference already mentioned above. In the selection of the next processing based on the judgement result, on the other hand, a branch typically takes place at each processing unit of several to several tens of steps. That is, the embedded program is characterized by (1) highly sequential processing and (2) many branches. For such a program code, the access latency can be reduced by prefetching an instruction code several to several tens of steps ahead of the instruction code currently being executed. However, since the within-processor prefetch of the instruction code of several to
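The control-flow shape described above can be illustrated with a short, self-contained C fragment. The status register and the handlers are hypothetical names; on real hardware the register would be a volatile pointer to a fixed peripheral address:

```c
#include <stdint.h>
#include <stdio.h>

/* Simulated peripheral status register; on an embedded target this
 * would be, e.g., (*(volatile uint32_t *)ADDR) for some fixed ADDR.  */
static volatile uint32_t status_reg = 0x1u;

static void handle_ready(void) { puts("ready: start transfer"); }
static void handle_busy(void)  { puts("busy: retry later");     }

int main(void)
{
    /* Operand access + comparison judgement, then selection of the
     * next processing: the IF~THEN~ELSE shape the text describes,
     * recurring every several to several tens of steps.              */
    if (status_reg & 0x1u)
        handle_ready();
    else
        handle_busy();
    return 0;
}
```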
Inventors: Inoue Yasuo, Kanai Hiroki, Takamoto Yoshifumi
Attorney/Agent: Antonelli Terry Stout & Kraus LLP
Examiners: McLean Kimberly, Yoo Do Hyun