Slack fetch to improve performance in a simultaneous and...

Electrical computers and digital processing systems: processing – Processing control

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Slack fetch to improve performance in a simultaneous and... Slack fetch to improve performance in a simultaneous and...

: 2000-05-30
: 2004-06-29
: Chan, Eddie (Department: 2783)
: Electrical computers and digital processing systems: processing
: Processing control

: C710S034000, C714S011000
: Reexamination Certificate
: active
: 06757811
: ABSTRACT:

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to microprocessors. More particularly, the present invention relates to a pipelined, multithreaded processor that can process the same instruction set in at least two separate threads. More particularly still, the invention relates to imposing “slack” between corresponding instructions in the threads of a multithreaded processor to improve the processor's performance.
2. Background of the Invention
Solid state electronics, such as microprocessors, are susceptible to transient hardware faults. For example, cosmic rays can alter the voltage levels that represent data values in microprocessors, which typically include tens or hundreds of thousands of transistors. Cosmic radiation can change the state of individual transistors causing faulty operation. Faults caused by cosmic radiation typically are temporary and the transistors eventually switch back to their normal state. The frequency of such transient faults is relatively low—typically less than one fault per year per thousand computers. Because of this relatively low failure rate, making computers fault tolerant currently is attractive more for mission-critical applications, such as online transaction processing and the space program, than computers used by average consumers. However, future microprocessors will be more prone to transient fault due to their smaller anticipated size, reduced voltage levels, higher transistor count, and reduced noise margins. Accordingly, even low-end personal computers may benefit from being able to protect against such faults.
One way to protect solid state electronics from faults resulting from cosmic radiation is to surround the potentially effected electronics by a sufficient amount of concrete. It has been calculated that the energy flux of the cosmic rays can be reduced to acceptable levels with six feet or more of concrete surrounding the computer containing the chips to be protected. For obvious reasons, protecting electronics from faults caused by cosmic ray with six feet of concrete usually is not feasible. Further, computers usually are placed in buildings that have already been constructed without this amount of concrete. Other techniques for protecting microprocessors from faults created by cosmic radiation also have been suggested or implemented.
Rather than attempting to create an impenetrable barrier through which cosmic rays cannot pierce, it is generally more economically feasible and otherwise more desirable to provide the effected electronics with a way to detect and recover from a fault caused by cosmic radiation. In this manner, a cosmic ray may still impact the device and cause a fault, but the device or system in which the device resides can detect and recover from the fault. This disclosure focuses on enabling microprocessors (referred to throughout this disclosure simply as “processors”) to recover from a fault condition. One technique, such as that implemented in the Compaq Himalaya system, includes two identical “lockstepped” microprocessors. Lockstepped processors have their clock cycles synchronized and both processors are provided with identical inputs (i.e., the same instructions to execute, the same data, etc.). A checker circuit compares the processors' data output (which may also include memory addressed for store instructions). The output data from the two processors should be identical because the processors are processing the same data using the same instructions, unless of course a fault exists. If an output data mismatch occurs, the checker circuit flags an error and initiates a software or hardware recovery sequence. Thus, if one processor has been affected by a cosmically-created fault, its output likely will differ from that of the other synchronized processor. Although lockstepped processors are generally satisfactory for creating a fault tolerant environment, implementing fault tolerance with two processors takes up valuable real estate.
A pipelined, simultaneous multithreaded, out-of-order processor generally can be lockstepped. A “pipelined” processor includes a series of functional units (e.g., fetch unit, decode unit, execution units, etc.), arranged so that several units can be simultaneously processing an appropriate part of several instructions. Thus, while one instruction is being decoded, an earlier fetched instruction can be executed. A “simultaneous multithreaded” (“SMT”) processor permits instructions from two or more different program threads (e.g., applications) to be processed through the processor simultaneously. An “out-of-order” processor permits instructions to be processed in an order that is different than the order in which the instructions are provided in the program (referred to as “program order”). Out-of-order processing potentially increases the throughput efficiency of the processor. Accordingly, an SMT processor can process two programs simultaneously. It is generally possible to cycle lock step on SMT processor.
An SMT processor can be modified so that the same program is simultaneously executed in two separate threads to provide fault tolerance within a single processor. Such a processor is called a simultaneously and redundantly threaded (“SRT”) processor. Some of the modifications to turn a SMT processor into an SRT processor are described in U.S. Provisional Application Serial No. 60/198,530. Executing the same program in two different threads permits the processor to detect faults such as may be caused by cosmic radiation, noted above. By comparing the output data from the two threads at appropriate times and locations within the SRT processor, it is possible to detect whether a fault has occurred. For example, data written to cache memory or registers that should be identical from corresponding instructions in the two threads can be compared. If the output data matches, there is no fault. Alternatively, if there is a mismatch in the output data, a fault has occurred in one or both of the threads.
Executing the same program in two separate threads advantageously affords the SRT processor some degree of fault tolerance, but also may cause several performance problems. For instance, any latency caused by a cache miss is exacerbated. Cache misses occur when an instruction requests data from memory that is not also available in cache memory. The processor first checks whether the requested data already resides in the faster access cache memory, which generally is onboard the processor die. If the requested data is not present in cache (a condition referred to as a cache “miss”), then the processor is forced to retrieve the data from main system memory which takes more time, thereby causing latency, than if the data could have been retrieved from the faster onboard cache. Because the two threads are executing the same instructions, any instruction in one thread that results in a cache miss will also experience the same cache miss when that same instruction is executed in other thread. That is, the cache latency will be present in both threads.
A second performance problem concerns branch misspeculation. A branch instruction requires program execution either to continue with the instruction immediately following the branch instruction if a certain condition is met, or branch to a different instruction if the particular condition is not met. Accordingly, the outcome of a branch instruction is not known until the instruction is executed. In a pipelined architecture, a branch instruction (or any instruction for that matter) may not be executed for at least several, and perhaps many, clock cycles after the branch instruction is fetched by the fetch unit in the processor. In order to keep the pipeline full (which is desirable for efficient operation), a pipelined processor includes branch prediction logic which predicts the outcome of a branch instruction before it is actually executed (also referred to as “speculating”). Branch prediction logi

Affiliated with

Mukherjee Shubhendu S.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Chan Eddie

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Harkness Charles

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Slack fetch to improve performance in a simultaneous and... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Slack fetch to improve performance in a simultaneous and..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Slack fetch to improve performance in a simultaneous and... will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-3310920

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure