Electrical computers and digital processing systems: processing – Processing architecture – Array processor
Reexamination Certificate
1997-06-25
2001-05-29
Eng, David Y. (Department: 2155)
active
06240502
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to microprocessors and, more particularly, to a system, method, and processor architecture for dynamically reconfiguring a processor between uniprocessor and selected multiprocessor configurations.
2. Relevant Background
Early computer processors (also called microprocessors) included a central processing unit or instruction execution unit that executed only one instruction at a time. As used herein, the term processor includes complex instruction set computers (CISC), reduced instruction set computers (RISC), and hybrids. The processor executes programs whose instructions are stored in main memory by fetching the instructions, decoding them, and executing them one after the other. In response to the need for improved performance, several techniques have been used to extend the capabilities of these early processors, including pipelining, superpipelining, superscalar execution, speculative instruction execution, and out-of-order instruction execution.
Pipelined architectures break the execution of instructions into a number of stages, where each stage corresponds to one step in the execution of the instruction. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction has finished. Pipelined architectures have been extended to “superpipelined” or “extended pipeline” architectures, in which each execution pipeline is broken down into even smaller stages (i.e., each stage performs a smaller portion of the work). Superpipelining increases the number of instructions that can be in the pipeline at any given time. “Superscalar” processors generally refer to a class of microprocessor architectures that include multiple pipelines that process instructions in parallel, so that on average they execute more than one instruction per clock cycle. Because instructions execute in parallel in two or more pipelines, more instructions can be processed per unit time. The execution pipelines may have differing numbers of stages. Some pipelines may be optimized for specialized functions such as integer or floating point operations, and in some cases execution pipelines are optimized for processing graphics, multimedia, or complex math instructions.
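As a rough illustration of why pipelining and superscalar issue raise throughput, the following Python sketch counts cycles for an idealized pipeline. It assumes no stalls, hazards, or branch penalties, and the function names are hypothetical; real pipelines rarely reach these bounds.

```python
import math

def cycles_unpipelined(n_instr: int, stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instr * stages

def cycles_pipelined(n_instr: int, stages: int, width: int = 1) -> int:
    # Idealized pipeline: no stalls or hazards. Filling the pipeline takes
    # `stages` cycles for the first group; afterwards `width` instructions
    # complete every cycle (width > 1 models a superscalar design).
    if n_instr == 0:
        return 0
    return stages + math.ceil(n_instr / width) - 1

def ipc(n_instr: int, cycles: int) -> float:
    # Instructions per cycle, the figure of merit discussed in the text.
    return n_instr / cycles
```

For example, 100 instructions on a five-stage pipeline take 500 cycles unpipelined, 104 cycles pipelined, and 54 cycles when two instructions can complete per cycle. Any dependency, cache miss, or misprediction adds cycles to these ideal figures, which is why achieved IPC falls short of the pipeline width.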
The goal of superscalar and superpipelined processors is to raise the number of instructions executed per cycle (IPC). The instruction-level parallelism (ILP) available in programs written for the processor can be exploited to realize this goal. However, many programs are not coded in a manner that takes full advantage of the deep, wide instruction execution pipelines in modern processors. Factors such as low cache hit rates, instruction interdependencies, frequent accesses to slow peripherals, and branch mispredictions cause the resources of a superscalar processor to be used inefficiently.
Superscalar architectures require that instructions be dispatched for execution at a rate high enough to keep the multiple pipelines busy. Conditional branch instructions create a problem for instruction fetching because the instruction fetch unit (IFU) cannot know with certainty which instructions to fetch until the conditional branch instruction is resolved. In addition, when a branch is detected, its target address must be predicted so that the instructions following the branch can be supplied for execution.
Recent processor architectures use a branch prediction unit to predict the outcome of branch instructions, allowing the fetch unit to fetch subsequent instructions according to the predicted outcome. These instructions are “speculatively executed” so that the processor can make forward progress while the branch instruction is being resolved.
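The text does not describe how the branch prediction unit forms its predictions. One widely used textbook scheme is a table of 2-bit saturating counters, sketched below purely as an example of the idea; the table size, the initial counter value, and indexing by branch address modulo the table size are all assumptions, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0-1 predict not taken, 2-3 predict taken)."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [1] * entries  # assumed start state: weakly not taken

    def _index(self, pc: int) -> int:
        # Assumed indexing: branch address modulo table size.
        return pc % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(pred: TwoBitPredictor, pc: int, outcomes) -> int:
    # Predict, compare with the resolved outcome, then train the counter.
    misses = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses
```

A loop branch that is taken nine times and then falls through is mispredicted twice on the first pass (warm-up and loop exit) and only once on each later pass. The two-bit hysteresis keeps a single loop exit from flipping the prediction for the next pass.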
Another approach to increasing processing power is multiprocessing. Multiprocessing is a hardware and operating system feature that allows multiple processors to work together to share the workload within a computing system. In a shared-memory multiprocessing system, all processors have access to the same physical memory. One limitation of multiprocessing is that programs that have not been optimized to run as multiple processes may not realize a significant performance gain from multiple processors. However, performance does improve when the operating system is able to run multiple programs concurrently, each on a separate processor.
Multithreaded software is a recent development that allows an application to be split into multiple independent threads, so that each thread can be assigned to a separate processor and executed independently, in parallel, as if it were a separate program. The results of these separate threads are then reassembled to produce a final result. By running each thread on a separate processor, multiple tasks are handled quickly and efficiently. Using multiple processors allows tasks or functions to be handled by more than a single CPU, so the computing power of the overall system is enhanced. However, because conventional multiprocessors are implemented using a plurality of discrete integrated circuits, communication between the devices limits the system clock frequency and the ability to share resources among the processors. As a result, conventional multiprocessor architectures duplicate resources, which increases cost and complexity.
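The split-execute-reassemble pattern described above can be sketched in software with Python's standard `concurrent.futures` module. The workload (a sum of squares) and the function names are hypothetical. Note that in CPython the global interpreter lock prevents threads from running CPU-bound Python code in parallel, so this shows the structure of the pattern rather than a speedup; `ProcessPoolExecutor` is the usual choice when true parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Work done independently by one thread on its share of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_threads=4):
    # Split the input into roughly equal, independent pieces (ceiling division).
    size = max(1, -(-len(data) // n_threads))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Run each piece on its own worker thread.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Reassemble the per-thread results into the final answer.
    return sum(partials)
```

For instance, `parallel_sum_of_squares(list(range(10)))` splits the list into four chunks and combines their partial sums into 285.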
Given the wide variety and mix of software used on general purpose processors, some programs often run most efficiently on superscalar, superpipelined uniprocessors, while others run most efficiently in a multiprocessor environment. Moreover, the more efficient architecture may change over time, depending on the mix of programs running at any given moment. Because the architecture was fixed by the CPU manufacturer and the system board producer, end users and programmers had little or no ability to configure it to use the hardware resources most efficiently for a given set of tasks.
SUMMARY OF THE INVENTION
Briefly stated, the present invention involves a system, method, and processor architecture that adapts a processor's hardware to support multiple applications running in parallel on a single integrated circuit chip. The processor in accordance with the present invention can be dynamically reconfigured to have one or more virtual processor units, each also called a strand. Each strand can run an independent application. Instructions from each application are fetched from the instruction cache in a round-robin fashion and deposited in the instruction scheduling window, which picks instructions from all active processes for execution. The processor includes retirement logic to retire instructions on a process-by-process basis. The configuration change from m strands to n strands is accomplished by an instruction issued either by the operating system or by an application.
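A toy software model of the fetch and retirement behavior summarized above may make it more concrete. It is a hypothetical sketch, not the patented design: the bundle size, the (strand, sequence number) tagging, and the function names are assumptions, and execution itself is abstracted into a set of instructions that have completed.

```python
from collections import deque

def round_robin_fetch(strands, bundle_size):
    """Fetch bundles from each strand in turn into one shared scheduling window.

    strands: one list of instruction names per strand (hypothetical encoding).
    Returns the window as (strand_id, seq_no, instr) tuples in fetch order.
    """
    queues = [deque(s) for s in strands]
    next_seq = [0] * len(queues)
    window = []
    while any(queues):                      # some strand still has instructions
        for sid, q in enumerate(queues):    # visit strands in round-robin order
            for _ in range(min(bundle_size, len(q))):
                window.append((sid, next_seq[sid], q.popleft()))
                next_seq[sid] += 1
    return window

def retire_per_strand(window, completed):
    """Retire each strand's instructions in program order, independently of other strands.

    completed: set of (strand_id, seq_no) pairs that have finished executing.
    Returns {strand_id: [retired instruction names]}.
    """
    retired = {}
    blocked = set()                          # strands stalled on an unfinished instruction
    for sid, seq_no, instr in window:
        retired.setdefault(sid, [])
        if sid in blocked:
            continue
        if (sid, seq_no) in completed:
            retired[sid].append(instr)
        else:
            blocked.add(sid)                 # younger instructions of this strand must wait
    return retired
```

With two strands, a bundle size of 2, and strand 0's second instruction still executing, strand 0 retires only its first instruction while strand 1 retires both of its instructions. Retiring on a per-strand basis is what keeps a stall in one application from holding up another.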
In one aspect, the present invention is a method for dynamically reconfiguring a processor that involves placing the processor in a first configuration having a first number (m) of virtual processors, or strands, while the coded instructions comprise instructions from m threads or processes. The instructions in each of the m threads are executed on one of the m strands using execution resources, at least some of which are shared among the m strands. While the coded instructions comprise instructions from a number (n) of threads, the processor is placed in a second configuration having a second number (n) of strands. The instructions in each of the n threads are executed on one of the n strands using execution resources, at least some of which are shared among the n strands.
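At a high level, the method can be modeled as a processor object whose strand count is changed by a configuration operation whenever the number of threads in the workload differs from the current configuration. This is an illustrative sketch only: the class name, the `max_strands` limit, the one-thread-per-strand mapping, and the reconfiguration counter are assumptions, and the shared execution resources are not modeled.

```python
class ReconfigurableProcessor:
    """Hypothetical model of a processor with 1..max_strands virtual processors (strands)."""

    def __init__(self, max_strands: int):
        self.max_strands = max_strands
        self.n_strands = 1          # start as a uniprocessor
        self.reconfigurations = 0   # number of configuration changes actually made

    def reconfigure(self, n: int) -> None:
        # Stands in for the configuration instruction issued by the OS or an application.
        if not 1 <= n <= self.max_strands:
            raise ValueError(f"unsupported strand count: {n}")
        if n != self.n_strands:
            self.n_strands = n
            self.reconfigurations += 1

    def assign(self, threads: list) -> dict:
        # One thread per strand; in hardware the strands would share execution resources.
        if len(threads) != self.n_strands:
            raise ValueError("thread count must match the current configuration")
        return dict(enumerate(threads))


def run_workload(cpu: ReconfigurableProcessor, threads: list) -> dict:
    # Place the processor in a configuration whose strand count matches the workload.
    cpu.reconfigure(len(threads))
    return cpu.assign(threads)
```

Running a single thread leaves the model in its one-strand configuration, while a three-thread workload triggers one change to three strands. A request beyond `max_strands` is rejected and leaves the current configuration unchanged.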
In another aspect, the present invention involves a processor that executes coded instructions from one or more applications. The processor includes a fetch unit operative to fetch selected bundles of instructions on a thread-by-thread basis and a marking
Hetherington Ricky C.
Panwar Ramesh
Eng David Y.
Gunnison McKay & Hodgson, L.L.P.
McKay Philip J.
Sun Microsystems Inc.