Method and apparatus for advancing load operations

Electrical computers and digital processing systems: processing – Processing control – Processing sequence control

Reexamination Certificate

C712S225000

active

06658559

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to computers, and more particularly, to a computer product, method, and apparatus for load operations.
2. Description of the Related Art
Modern computers contain microprocessors, which are essentially the brains of the computer. In operation, the computer uses the microprocessor to run computer programs.
A computer program might be written in a high-level computer language, such as C or C++, using statements similar to English, which are then translated (by another program called a compiler) into numerous machine language instructions. A program might also be written in assembly language, and then translated (by another program called an assembler) into machine language instructions. In practice, every computer language above assembly language is a high-level language.
Each computer program contains numerous instructions which tell the computer what it must do to achieve the desired goal of the program. The computer runs a particular computer program by executing the instructions contained in that program.
Modern computers also contain memory. The memory might be used to store computer program data, or it might be used to store computer program instructions. In general, every individual location in a computer memory has an address associated with it. The address might be a physical address or a virtual address. A physical address is one that corresponds to a fixed hardware memory location; a virtual address does not. Specifically, in microprocessors that support virtual addressing, computer programs reference virtual addresses, which are then mapped by memory management hardware onto physical addresses before the memory is actually read or written.
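As a rough illustration of the virtual-to-physical mapping described above, the following sketch translates an address through a single-level page table. The 4 KiB page size, the dictionary-based page table, and the name translate are assumptions chosen for illustration; real memory management hardware uses multi-level tables, TLBs, and architecture-specific page sizes.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumption: 4 KiB pages; real page sizes vary by architecture


def translate(vaddr: int, page_table: Dict[int, int]) -> int:
    """Map a virtual address to a physical address with a single-level page table.

    page_table maps virtual page numbers to physical frame numbers. A missing
    entry raises KeyError, standing in for a page fault.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number and offset within the page
    frame = page_table[vpn]                 # the memory-management lookup
    return frame * PAGE_SIZE + offset       # the offset is carried over unchanged


page_table = {0: 7, 1: 3}  # hypothetical mapping: virtual page 1 -> physical frame 3
print(hex(translate(0x1234, page_table)))  # prints 0x3234
```

Only the page number is remapped; the offset within the page passes through untouched, which is why two programs can use the same virtual address yet reach different physical locations.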
A memory cache is a special sub-system in which frequently used data is stored for quick access, e.g. it stores the contents of frequently accessed memory locations and the address where those data items belong. When a microprocessor attempts to perform a load reference to an address in memory, the cache is checked to see whether it holds that address/data. If it does, the data is returned to the microprocessor from the cache and no reference is sent to memory. If it does not, a regular memory access occurs and the missing data is commonly copied from memory into the cache. When a microprocessor attempts to perform a store reference to an address in memory, again the cache is checked to see whether it holds that address. If it does, the cache will be updated with the store data. The store may also be sent to memory (write-through policy) or not (write-back policy). If the cache does not hold the store address (or the line in the cache is also contained within another device's cache, i.e. in a SHARED state), then the store may be sent directly to memory (write-through policy) or the missing data may be copied from memory into the cache and then updated (in the cache) with the store data (typical write-back policy). Accessing a memory cache is faster than accessing memory.
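The load and store cases in the paragraph above can be sketched as a toy cache model. This is a simplified sketch under stated assumptions: the class name ToyCache is hypothetical, each cached entry holds a single word rather than a multi-byte line, capacity is unlimited (no eviction), and the SHARED-state case is ignored. The write-through variant sends store misses directly to memory, and the write-back variant copies the missing data into the cache before updating it, matching the two policies described.

```python
class ToyCache:
    """Word-granular cache with unlimited capacity and no eviction (simplifying assumptions)."""

    def __init__(self, memory, write_back=True):
        self.memory = memory        # backing store: dict of address -> value
        self.lines = {}             # cached copies: address -> value
        self.dirty = set()          # addresses updated in the cache but not yet in memory
        self.write_back = write_back
        self.memory_reads = 0       # counts loads that actually go to memory

    def load(self, addr):
        if addr in self.lines:      # hit: data comes from the cache, no memory reference
            return self.lines[addr]
        self.memory_reads += 1      # miss: regular memory access ...
        self.lines[addr] = self.memory.get(addr, 0)  # ... and the data is copied into the cache
        return self.lines[addr]

    def store(self, addr, value):
        if self.write_back:
            if addr not in self.lines:
                self.load(addr)     # miss: copy the data into the cache first
            self.lines[addr] = value
            self.dirty.add(addr)    # memory is updated later, not by this store
        else:
            if addr in self.lines:
                self.lines[addr] = value  # hit: keep the cached copy current
            self.memory[addr] = value     # write-through: the store always reaches memory


memory = {0x10: 5}
cache = ToyCache(memory, write_back=True)
cache.load(0x10)           # miss, one memory read
cache.load(0x10)           # hit, still one memory read
cache.store(0x10, 9)       # write-back hit: memory still holds the old value
print(cache.memory_reads, memory[0x10], cache.load(0x10))  # prints 1 5 9
```

The dirty set records which write-back entries memory has not yet seen; a real cache would write those back when the line is evicted.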
RAM, or Random Access Memory, is a semiconductor-based memory that can be read and written by the microprocessor or other hardware devices. The storage locations can be accessed in any order. RAM is the type of memory frequently used as main memory on a personal computer.
Most modern microprocessors use a design technique called pipelining, where each operation is performed in a series of pipeline stages. In operation, a microprocessor fetches an instruction from memory and feeds it into one end of the pipeline. The pipeline is made up of several stages, each stage performing some function or process necessary or desirable to process the instruction before passing the instruction to the next stage. Thus the output of one stage serves as input to a second, the output of the second stage serves as input to the third, and so on. Therefore, in any clock cycle, more than one instruction may be in the process of execution (one per stage, or more than one per stage if the stages have multiple functional units).
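The overlap of instructions across stages can be made concrete with a small schedule generator. It assumes an ideal pipeline in which one instruction enters per clock cycle and nothing ever stalls; the five stage names are an illustrative assumption, since the text does not name the stages. Under those assumptions n instructions finish in (number of stages + n - 1) cycles instead of (number of stages × n).

```python
def pipeline_schedule(num_instructions, stages):
    """Return, for each clock cycle, which instruction occupies each stage (None if empty).

    Assumes an ideal pipeline: one instruction enters per cycle and none ever stall.
    """
    total_cycles = len(stages) + num_instructions - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for s, name in enumerate(stages):
            # instruction i enters stage 0 in cycle i, so it occupies stage s in cycle i + s
            i = cycle - s
            row[name] = i if 0 <= i < num_instructions else None
        schedule.append(row)
    return schedule


STAGES = ["fetch", "decode", "execute", "memory", "writeback"]  # illustrative 5-stage split
schedule = pipeline_schedule(3, STAGES)
for cycle, row in enumerate(schedule):
    print(cycle, row)
print(len(schedule), "cycles pipelined vs", 3 * len(STAGES), "unpipelined")  # 7 vs 15
```

In cycle 2, for example, instruction 0 is executing while instruction 1 is being decoded and instruction 2 is being fetched, which is the "more than one instruction in the process of execution" the paragraph describes.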
Ideally, pipelining speeds execution time by ensuring that the microprocessor does not have to wait for instructions; when it completes execution of one instruction, the next is ready and waiting.
In some advanced microprocessors, the pipeline is designed to support the processing of selected instructions speculatively. Speculative execution is a technique in which certain instructions are executed and results made available before they are determined to be needed by the program. Consequently, it also involves determining whether the need ever actually occurs, and if it does, making sure that the results of what was done ahead of time are still valid. Once all these questions about a speculatively executed instruction have been answered favorably, the instruction is said to be resolved, retired, or architecturally committed, and is no longer speculative.
One class of instructions frequently contained in a computer program is the store instruction. Store instructions are assembly or machine level instructions that cause information to be written by the executing processor into a particular location (address) in memory.
Another class of instructions frequently contained in a computer program is the load instruction. Load instructions are assembly or machine level instructions that cause data to be taken from a particular location (address) in memory and placed into a specified register within the executing processor, so that the data can be acted upon during execution of a subsequent instruction.
An important source of performance loss in modern microprocessors is waiting for data to be returned from long latency load operations. In the sequence of instructions contained in a computer program, a load instruction often closely precedes the instruction that acts upon the data loaded. Because such an instruction needs to wait for the load operation to complete before it can begin its execution, time spent waiting for completion of the load operation delays execution of the computer program.
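The cost of waiting on a load can be estimated with simple cycle arithmetic. This sketch assumes in-order issue and that the loaded value becomes usable exactly load_latency cycles after the load issues; the function name and parameters are hypothetical. Any cycles between the load's completion time and the consumer's issue slot that remain uncovered become stall cycles.

```python
def stall_cycles(load_cycle: int, use_cycle: int, load_latency: int) -> int:
    """Cycles a dependent instruction must wait for loaded data.

    Assumes in-order issue and that the loaded value is available
    load_latency cycles after the load issues.
    """
    ready_cycle = load_cycle + load_latency
    return max(0, ready_cycle - use_cycle)


# Load issued one cycle before its consumer:
print(stall_cycles(load_cycle=3, use_cycle=4, load_latency=1))  # 0: data arrives in time
print(stall_cycles(load_cycle=3, use_cycle=4, load_latency=3))  # 2: consumer waits two cycles
# Same latency, but the load starts four cycles before its consumer:
print(stall_cycles(load_cycle=0, use_cycle=4, load_latency=3))  # 0: latency fully hidden
```

The last case previews the idea behind advancing a load: the latency does not shrink, but starting the load earlier gives it room to complete before the dependent instruction needs the data.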
One technique used to reduce this delay involves changing the sequence of instructions in the computer program so that the load occurs earlier than it would in the normal sequence of instructions. This change in sequence may be done by the compiler. Moving a load upstream from its normal position in the sequence of instructions is sometimes called advancing the load or boosting the load. The basic idea is to start the load operation as early as possible, giving as much time as possible for the load operation to complete before any instructions dependent on the load are encountered in the sequence of instructions. Store instructions, however, limit how far ahead a load instruction may be advanced. This limit arises because the compiler often cannot determine whether a load instruction and a store instruction conflict, that is, whether they are reading from and writing to overlapping physical memory locations.
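Why a conflicting store blocks the move can be seen by running a store and a load in both orders against the same memory contents. This is an illustrative sketch, not the patent's method: memory is a dictionary of word addresses, and the function name load_value and its parameters are hypothetical. When the two addresses coincide, the advanced load returns stale data; when they differ, both orders agree.

```python
def load_value(store_addr, store_value, load_addr, memory, advance_load):
    """Return the value a load receives when executed after (normal order)
    or before (advanced) an earlier store."""
    mem = dict(memory)  # copy so both orders start from the same state

    def do_store():
        mem[store_addr] = store_value

    def do_load():
        return mem.get(load_addr, 0)

    if advance_load:    # load hoisted above the store
        value = do_load()
        do_store()
    else:               # original program order: store, then load
        do_store()
        value = do_load()
    return value


memory = {100: 7, 200: 9}
# Conflicting addresses: advancing the load changes the result.
print(load_value(100, 42, 100, memory, advance_load=False))  # 42
print(load_value(100, 42, 100, memory, advance_load=True))   # 7 (stale)
# Non-conflicting addresses: both orders agree.
print(load_value(100, 42, 200, memory, advance_load=True))   # 9
```

Because the compiler must produce code that is correct for every possible address, it can only hoist the load when it can rule out the first case.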
In the unoptimized sample code fragment,
    add r1+r2→r3
    store [r4], r5
    sub r6−r7→r8
    load [r9]→r10
    and r10, r11→r12
r1, r2, and so forth are registers. The brackets around r4 and r9 are used to denote that the contents of r4 and r9 are to be used as the addresses for the store and load operations. If the compiler cannot determine whether r4 and r9 are referring to overlapping physical memory locations, then r4 and r9 are referred to as being unresolved with respect to each other, or as undisambiguated memory addresses.
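The notion of "overlapping" and "unresolved" addresses can be expressed as a small query. In this sketch the addresses are treated as already-translated physical byte addresses, the access sizes are assumptions (the fragment does not state the width of the store or the load), and an address the compiler cannot determine is modeled as None. Real alias analysis can often do better, for example when both addresses share a known base register, so this is a deliberate simplification.

```python
from typing import Optional


def overlaps(addr_a: int, size_a: int, addr_b: int, size_b: int) -> bool:
    """True if byte ranges [addr_a, addr_a+size_a) and [addr_b, addr_b+size_b) intersect."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a


def may_conflict(store_addr: Optional[int], store_size: int,
                 load_addr: Optional[int], load_size: int) -> Optional[bool]:
    """Compile-time style conflict query for a store (e.g. to [r4]) and a load (e.g. from [r9]).

    None for an address means its value is unknown at compile time. Returns True or
    False when the answer is known, or None when the addresses are unresolved
    with respect to each other (undisambiguated).
    """
    if store_addr is None or load_addr is None:
        return None  # unresolved: the load cannot safely be moved above the store
    return overlaps(store_addr, store_size, load_addr, load_size)


print(may_conflict(0x100, 8, 0x104, 4))  # True: the 4-byte load lies inside the 8-byte store
print(may_conflict(0x100, 4, 0x104, 4))  # False: adjacent but disjoint
print(may_conflict(None, 8, 0x104, 4))   # None: r4 unknown, so r4 and r9 are unresolved
```

Only a False answer lets a compiler advance the load past the store; both True and None force it to keep the original order.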
In this example, since the load instruction (the next-to-last instruction) and the instruction that uses the loaded data (the last instruction, i.e. the “and” instruction) are separated by only one clock cycle, if the load instruction has a latency of more than one clock cycle, the microprocessor will not have the data needed by the “and” instruction available in time and, consequently, will need to defer or stall execution of the “and” instruction and potentially all later in
