Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate (active)
Patent number: 06430657
Filed: 1998-10-12
Issued: 2002-08-06
Examiner: Kim, Matthew (Department: 2186)
U.S. classes: C711S207000, C711S143000, C711S118000
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to memory access operations in computer systems. More specifically, the present invention relates to atomic memory update operations typically used to access semaphores.
DESCRIPTION OF THE RELATED ART
In computer systems, it is common for two or more processes to contend for the same resource. For example, two or more processes may attempt to write a particular sequence of commands to a video controller. The processes may be executed by a single central processing unit (CPU), or may be executed by two or more CPUs in a multi-processor computer system. The terms CPU and processor will be used interchangeably herein.
Since the processes cannot access the resource at the same time, the operating system of the computer must provide some mechanism to schedule access to the resource. One common mechanism known in the art is the “take-a-number” scheduling algorithm. This algorithm is somewhat analogous to a group of customers that wish to be serviced by a single store clerk. When a customer enters the store, the customer takes a number. When the clerk calls that number, the customer is serviced by the clerk.
Using this analogy, the mechanism that provides the “number” to the process is known in the art as a semaphore. Typically, a semaphore is stored in a memory location. A process seeking to access the semaphore first reads the memory location, increments the value read from the memory location, and stores the result back in the memory location. The value read from the memory location acts as the “number” for the process, and the result written back to the memory location acts as the next “number” for the next process that attempts to access the resource. When the operating system indicates that the holder of a particular “number” may access the resource, the process holding that “number” does so.
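The read-increment-write sequence described above can be sketched with C11 atomics, which perform all three steps as one indivisible operation. This is an illustrative model only; the `ticket_lock` type and function names are hypothetical and do not come from the patent:

```c
#include <stdatomic.h>

/* Hypothetical "take-a-number" lock built around an atomic semaphore word. */
typedef struct {
    atomic_uint next_ticket;  /* the semaphore: the next "number" to hand out */
    atomic_uint now_serving;  /* the "number" currently allowed to proceed */
} ticket_lock;

/* Atomically read the semaphore, increment it, and write it back;
   the value read becomes this process's "number". */
unsigned take_ticket(ticket_lock *l) {
    return atomic_fetch_add(&l->next_ticket, 1);
}

/* Spin until this process's "number" is called. */
void wait_turn(ticket_lock *l, unsigned my_number) {
    while (atomic_load(&l->now_serving) != my_number)
        ;  /* busy-wait */
}

/* Pass the resource to the holder of the next "number". */
void release_turn(ticket_lock *l) {
    atomic_fetch_add(&l->now_serving, 1);
}
```

Starting from a zero-initialized lock, successive callers of `take_ticket` receive 0, 1, 2, and so on, each exactly once, which is the property the scheduling algorithm depends on.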
For the “take-a-number” scheduling algorithm to operate correctly, it is critical that the memory read, increment, and memory write operations occur “atomically”. In other words, there must be no chance that a second process can read the memory location holding the semaphore between the point at which the first process reads the memory location and the point at which the first process writes the incremented value back to the memory location. If such a read operation by the second process occurred, then the first and second processes would each have the same “number”, and may try to access the resource concurrently.
Ensuring that semaphore operations occur atomically is relatively simple in a single-CPU computer system in which no other devices coupled to the bus perform direct memory access (DMA) operations. For example, the 32-bit Intel® architecture (IA-32), which is used by the Intel® i486™, Pentium®, Pentium® Pro, Pentium® II, and Celeron™ CPUs, includes the “exchange and add” (XADD) instruction. When accessing a memory location containing a semaphore, the XADD instruction is typically used as follows:
XADD destination memory location, source register
This instruction stores the sum of the values contained in the destination memory location and the source register in a temporary register, stores the contents of the destination memory location in the source register, and stores the contents of the temporary register in the destination memory location. Accordingly, if the value “1” is stored in the source register when the instruction is executed, then when the instruction is completed the value in the destination memory location will be incremented by “1” and the value originally in the destination memory location will be stored in the source register. Since an interrupt will not be processed until an instruction is complete and the computer system in this example has a single CPU (and no other devices are performing DMA operations), no other process can access the semaphore during the read-modify-write operation performed by the XADD instruction. Accordingly, the semaphore operation occurs atomically. The IA-32 exchange (XCHG) instruction and compare and exchange (CMPXCHG) instruction are also commonly used to ensure atomic access to semaphores.
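The three steps performed by XADD can be modeled in C as follows. This is a behavioral sketch of the semantics described above, not the instruction itself: on the CPU the whole sequence executes as a single uninterruptible instruction, whereas this C function is not atomic. The names are illustrative:

```c
/* Behavioral model of "XADD destination, source". */
unsigned xadd_model(unsigned *destination, unsigned *source) {
    unsigned temp = *destination + *source; /* sum into a temporary register */
    *source = *destination;                 /* old destination value into source */
    *destination = temp;                    /* temporary into destination */
    return *destination;
}
```

With `*source` equal to 1, the destination is incremented by 1 and its original value is left in the source register, exactly as the text describes.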
In multi-processor computer systems and systems having devices that perform DMA operations, assuring atomicity is more complex because it is possible that a second CPU or device may attempt to access the semaphore before the first CPU increments and writes the semaphore back to the memory location. In such computer systems, atomicity is provided either by a bus lock mechanism or a cache coherency mechanism. Before discussing these mechanisms in detail, it is helpful to first consider the operation of CPU cache memories.
Cache memories are relatively small and fast memories that hold a subset of the contents of main memory. For example, a computer system based on a Pentium® II CPU has a level one (L1) cache on the same integrated circuit (IC) as the CPU, and a level two (L2) cache on the same module as the CPU, but on a separate IC. The L1 cache is smaller and faster than the L2 cache. Main memory contents are stored in cache memories in units called cache lines. The cache line size of the L1 and L2 caches in a Pentium® CPU is 32 bytes.
The Intel® i486™ CPU uses a “write-through” L1 cache. In such a cache, a memory write from the CPU is written to the cache and to main memory concurrently. Beginning with the Intel® Pentium® CPU, Intel® processors provide support for “write-back” caches. In a write-back cache, a memory write from the CPU is only written to the cache. The cache mechanism then determines whether (and when) the memory write is actually committed to main memory. This increases performance because the write to main memory can be deferred until main memory is not busy. In addition, it is possible that the memory operand may change several times before it is necessary to write the memory operand back to main memory. Also, a write-back cache has an opportunity to assemble a complete cache line of changes before writing the cache line back to memory, which is known in the art as coalescing.
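The two write policies can be contrasted with a small C sketch. The `line_model` structure and function names are hypothetical, used only to show when main memory is updated under each policy:

```c
/* Toy model of one cached word together with its main-memory copy. */
typedef struct {
    unsigned cached;  /* value held in the cache */
    unsigned memory;  /* value held in main memory */
    int dirty;        /* set when the cache holds changes not yet in memory */
} line_model;

/* Write-through: the cache and main memory are updated concurrently. */
void store_write_through(line_model *l, unsigned v) {
    l->cached = v;
    l->memory = v;
}

/* Write-back: only the cache is updated; the commit to memory is deferred. */
void store_write_back(line_model *l, unsigned v) {
    l->cached = v;
    l->dirty = 1;
}

/* The deferred commit, performed when the cache decides to (e.g. on eviction). */
void flush_line(line_model *l) {
    if (l->dirty) {
        l->memory = l->cached;
        l->dirty = 0;
    }
}
```

Note that several `store_write_back` calls can occur before a single `flush_line`, which is the deferral benefit described above: the operand may change many times while costing only one write to main memory.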
Cache coherency mechanisms ensure that memory contents stored in CPU caches and main memory remain coherent. For example, if the cache of a first CPU contains a cache line having changed (or “dirty”) contents that have not been written back to main memory, and a second CPU attempts to read the corresponding memory location from main memory, the cache coherency mechanism ensures that the second CPU is provided with the correct contents from the cache of the first CPU, not the incorrect contents currently stored in main memory. The cache coherency mechanism can accomplish this in several ways. One technique is to simply force the cache of the first CPU to write the changed cache line back to main memory. Another technique allows the cache of a second CPU to “snoop” changes to the cache of the first CPU, thereby allowing the second CPU cache to be continually updated with the changes made in the first CPU cache.
Furthermore, a CPU can request that a cache line be loaded as “shared” or “exclusive”. A shared cache line cannot be changed by the CPU, and therefore is advantageously used in situations where it is known that the contents of the cache line will not be changed (e.g., program code). An exclusive (or alternatively, “private”) cache line can be changed by the CPU. Typically, a “dirty bit” is associated with an exclusive cache line to indicate whether the contents have changed. If the dirty bit is set to indicate that the cache line has changed, the cache line must be written back to main memory. If the dirty bit is cleared to indicate that the cache line has not changed, the cache line can be discarded without being written back to main memory. Typically only one CPU can hold a particular cache line as exclusive at any given time.
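The shared/exclusive distinction and the dirty bit can be sketched as follows. This is a simplified, hypothetical model (real coherency protocols such as MESI track more states than these two):

```c
typedef enum { LINE_SHARED, LINE_EXCLUSIVE } line_mode;

/* Simplified cache line: mode, dirty bit, and a Pentium-era 32-byte payload. */
typedef struct {
    line_mode mode;
    int dirty;
    unsigned char data[32];
} cache_line;

/* A shared line cannot be changed; a write succeeds only on an exclusive
   line, and sets the dirty bit. Returns 1 on success, 0 otherwise. */
int line_write(cache_line *l, int offset, unsigned char value) {
    if (l->mode != LINE_EXCLUSIVE)
        return 0;
    l->data[offset] = value;
    l->dirty = 1;
    return 1;
}

/* On eviction: a dirty exclusive line must be written back to main memory;
   a clean line can simply be discarded. */
int must_write_back(const cache_line *l) {
    return l->mode == LINE_EXCLUSIVE && l->dirty;
}
```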
Returning to the topic of atomicity, early IA-32 CPUs provided atomicity by storing semaphores in non-cacheable memory or in memory cached using the write-through method, and by issuing a “bus lock” when accessing the semaphore. A bus lock
Title: COMPUTER SYSTEM THAT PROVIDES ATOMICITY BY USING A TLB TO...
Inventors: Hammond, Gary N.; Huck, Jerome C.; Mittal, Millind; Whittaker, Martin J.
Attorneys/Agents: Chace, Christian P.; Plether, David A.
Assignee: Institute for the Development of Emerging Architecture L.L.C.