Method and apparatus to share instruction images in a...

Electrical computers and digital processing systems: memory – Addressing combined with specific memory configuration or... – Addressing cache memories

Reexamination Certificate

Details

C711S125000, C711S144000, C711S203000, C711S207000, C711S209000, C711S210000

active

06298411

ABSTRACT:

BACKGROUND OF THE INVENTION
Single-threaded microprocessors are defined by the fact that, although multiple “concurrent” processes appear to be running simultaneously, in reality, only one process or thread of execution is actually running at any time. This distinction becomes slightly blurred where multiple execution blocks, or arithmetic logic units (ALUs), can execute in parallel. In superscalar processors, multiple instructions may be issued and executed each cycle. Still, these instructions come from a single thread of execution. Simultaneous multithreaded processors, on the other hand, allow instructions from different threads to execute simultaneously.
FIG. 1 is a block diagram of a multithreaded instruction pipeline 10. Multiple program counters 12 (PCs) are maintained, one per thread. The example of FIG. 1 shows two PCs, one for each of two threads. The PCs are used to fetch instructions for their respective threads from the instruction cache 14 or other memory. The fetched instructions are queued up in the instruction queue 16 and then issued.
Registers within the virtual register files 18 are accessed as necessary for the issued instructions. The instructions are then sent to the execution box 20 for execution, which, in a superscalar architecture, may contain several arithmetic logic units (ALUs) which execute in parallel. Different ALUs may have different functions. For instance, some ALUs perform integer operations while others perform floating point operations. Finally, memory is accessed in block 22.
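As a rough sketch of the per-thread fetch stage in C: the two-thread count, the fixed 4-byte instruction size, and the helper names icache_fetch and iqueue_push are illustrative assumptions, not details taken from the patent.

```c
#include <stdint.h>

#define NUM_THREADS 2

/* One program counter per hardware thread, as in FIG. 1. */
static uint64_t thread_pc[NUM_THREADS];

/* Hypothetical helpers standing in for the instruction cache
 * and the instruction queue of FIG. 1. */
extern uint32_t icache_fetch(uint64_t vaddr);
extern void     iqueue_push(int thread, uint32_t insn);

/* Each cycle, fetch one instruction for each thread and queue it;
 * real fetch units use more elaborate arbitration policies. */
void fetch_cycle(void)
{
    for (int t = 0; t < NUM_THREADS; t++) {
        uint32_t insn = icache_fetch(thread_pc[t]);
        iqueue_push(t, insn);
        thread_pc[t] += 4;   /* assume fixed 4-byte instructions */
    }
}
```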
FIG. 2A is a chart illustrating a typical allocation 30 of ALU slots in a non-simultaneous multithreading superscalar processor. In this example, there are four ALUs, each represented by one of the four columns labeled 0-3. Instructions are allocated to ALUs as the ALUs become available. Thus, at time slot 32, two ALUs have been allocated. In the next cycle, time slot 34, three ALUs are allocated. However, there are many empty slots in which some of the ALUs sit idle, e.g., ALUs 2 and 3 in time slot 32. By allowing multiple threads to execute simultaneously, designers aim to fill these empty slots as often as possible to fully utilize the processor's resources.
FIG. 2B is a chart 40 similar to that of FIG. 2A, but illustrating the allocation of ALU slots in a simultaneous multithreading superscalar system. Allocated instructions associated with one thread, say thread 0, are indicated with a dot, while instructions associated with another thread, say thread 1, are indicated with an X. For example, in the time slot 42, ALUs 0 and 1 are allocated to instructions from thread 0, while ALUs 2 and 3 are allocated to instructions from thread 1. In time slot 44, ALUs 0, 1 and 3 are allocated to thread 0 while ALU 2 is allocated to thread 1.
While there may still be idle ALU slots, as in time slot 46, a comparison with FIG. 2A shows that idle ALU slots are far fewer in a simultaneous multithreading system. Thus, simultaneous multithreading systems are more efficient and, while not necessarily speeding up the execution of a single thread, dramatically speed up overall execution of multiple threads, compared to non-simultaneous multithreading systems.
FIG. 3 illustrates the concept of virtual to physical memory mapping. Typically, a program, or a thread, executes in its own virtual address space 50, organized into blocks called pages 52. Pages of physical memory 58 are allocated as needed, and the virtual pages 52 are mapped to the allocated physical pages 58 by a mapping function 54. Typically, this mapping function 54, or page table as it is more commonly known, is a large table stored in memory, requiring long memory access times.
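A minimal sketch of that mapping, assuming 8 KB pages and a flat page table (the page size, the flat layout, and the names page_table and virt_to_phys are illustrative assumptions; real systems use multi-level tables):

```c
#include <stdint.h>

#define PAGE_SHIFT 13                    /* assume 8 KB pages */
#define PAGE_SIZE  (1ull << PAGE_SHIFT)

/* Flat page table indexed by virtual page number; each entry holds a
 * page-aligned physical base address. Reading it costs a memory access,
 * which is why the lookup is slow without a TLB. */
extern uint64_t page_table[];

uint64_t virt_to_phys(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;       /* virtual page number */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);   /* offset within page  */
    return page_table[vpn] | offset;
}
```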
To reduce these long lookup times, a relatively small cache, called a translation lookaside buffer (TLB), is maintained. The TLB holds mappings for recently executed instructions with the expectation that these instructions will need to be fetched again in the near future. The TLB is generally a content-addressable memory (CAM) device having a virtual address as its lookup key.
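A software model of that CAM lookup might look like the sketch below; the entry count and field names are assumptions, and the linear scan stands in for the hardware's parallel compare of the key against every entry:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64   /* size is an illustrative assumption */

struct tlb_entry {
    bool     valid;
    uint64_t vpn;   /* virtual page number (the lookup key) */
    uint64_t pfn;   /* physical page frame number           */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* A CAM compares the key against all entries simultaneously; in
 * software we model that as a scan over the whole array. */
bool tlb_lookup(uint64_t vpn, uint64_t *pfn_out)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pfn_out = tlb[i].pfn;
            return true;             /* TLB hit */
        }
    }
    return false;                    /* miss: fall back to the page table */
}
```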
During a context switch, in which the executing process is swapped out and replaced with another process, much of the cached memory used by the first process becomes invalid. However, some processes may use common instruction code. There are various methods for sharing code, such as mapping different virtual addresses from different address spaces to the same physical address. This is often done with system memory space.
SUMMARY OF THE INVENTION
In a cache system simultaneously executing multiple threads, some number of thread slots are available. When a process is swapped in, it may be swapped into any of the thread slots. Thus, a process can be thought of more as a software device, while a thread can be thought of as a hardware device with which any process may be associated for some length of time. Each thread executes in its own virtual address space. Since different processes may share code, threads may also share code. Of course, where two executing threads refer to the same virtual address within their own address spaces, but the mapped physical addresses are different, different cache entries are required. However, when the mapped physical addresses are the same, it is desirable to use the same cache entry. For example, this happens in many systems when multiple instances of the same program are executing.
In executing a virtual cache lookup, it is necessary to use not only the virtual address to look up an entry, but the address space identifier as well, since matching virtual addresses alone do not guarantee a proper cache entry. Thus, an address space number (ASN), which is a sort of abbreviated process identifier, is maintained in each cache entry. If the ASN of a retrieved cache entry matches the address space of the requesting thread, the cache entry is valid for that lookup. An address space match (ASM) bit may also be maintained in each entry to indicate that the entry is good for all address spaces. This is useful, for example, where the shared space is system space.
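In C, that validity test might look like the following sketch; the entry layout and the names icache_entry and entry_hits are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

struct icache_entry {
    uint64_t vaddr;    /* virtual address tag                        */
    uint16_t asn;      /* address space number of the owning process */
    bool     asm_bit;  /* entry is valid for all address spaces      */
    /* ... instruction data ... */
};

/* An entry satisfies a lookup if its virtual tag matches and either
 * its ASM bit is set or its stored ASN matches the requester's ASN. */
bool entry_hits(const struct icache_entry *e,
                uint64_t lookup_vaddr, uint16_t thread_asn)
{
    return e->vaddr == lookup_vaddr &&
           (e->asm_bit || e->asn == thread_asn);
}
```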
Instruction cache hit rate is an important component of performance, and by allowing instruction images to share cache capacity, hit rate is improved. A problem occurs, however, where the virtual addresses for different threads are the same and map to the same physical address. Where the stored ASN is different from a requesting thread's ASN, a cache miss unnecessarily results. A preferred embodiment of the present invention solves this problem by providing thread address space match (TASM) bit indicators.
In accordance with a preferred embodiment of the present invention, a method of accessing information in a cache of a multithreaded system comprises:
providing a virtual address to locate information for a thread; and
upon a cache miss, comparing a physical address of the information with a physical address of information stored in the cache and, with a match of physical addresses, accessing the information from the cache.
In a preferred embodiment, the cache is searched for an entry having a virtual address which matches the virtual address of the information being accessed and having an indication of being associated with the thread which is accessing the information. Upon finding such an entry, the information is accessed from the cache. In addition, the information may be accessed from the cache upon finding an entry in the cache whose virtual address matches the information virtual address and which either has an address space matching the address space of the thread, or has an indication that the entry matches all address spaces.
Upon a cache miss, the accessed information's virtual address is mapped to a physical address. This is preferably performed by a translation lookaside buffer (TLB). In a preferred embodiment, the cache entry's virtual address is also mapped to a physical address, preferably by the TLB, whose look-up key comprises a virtual address and an address space number.
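The miss path just described might be sketched as follows. Here the TASM indicators are modeled as a per-thread bitmask on each entry, extending the earlier entry sketch; the helper name tlb_translate, the field names, and the exact sharing policy are assumptions layered on the summary, not the patent's literal design:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 4

struct icache_entry {
    uint64_t vaddr;    /* virtual address tag                     */
    uint16_t asn;      /* address space number of owning process  */
    bool     asm_bit;  /* valid for all address spaces            */
    uint8_t  tasm;     /* one TASM bit per hardware thread slot   */
};

/* Hypothetical TLB translation keyed by virtual address + ASN. */
extern bool tlb_translate(uint64_t vaddr, uint16_t asn, uint64_t *paddr);

/* On a virtual-tag/ASN miss, compare physical addresses; if the entry
 * maps to the same physical location, mark it shared with this thread
 * and treat the access as a hit instead of refilling the line. */
bool resolve_miss(struct icache_entry *e, int thread,
                  uint64_t vaddr, uint16_t asn)
{
    uint64_t pa_req, pa_entry;

    if (!tlb_translate(vaddr, asn, &pa_req) ||
        !tlb_translate(e->vaddr, e->asn, &pa_entry))
        return false;                        /* translation unavailable */

    if (pa_req == pa_entry) {
        e->tasm |= (uint8_t)(1u << thread);  /* share entry with thread */
        return true;                         /* physical match: a hit   */
    }
    return false;                            /* true miss: refill */
}
```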
