Master-slave latch circuit for multithreaded processing

Electrical computers and digital processing systems: processing – Processing control – Context preserving (e.g., context swapping, checkpointing, ...)


Details

Classification: C709S241000, C327S203000
Type: Reexamination Certificate
Status: active
Patent number: 06629236


FIELD OF THE INVENTION
The present invention relates to digital data processing systems, and in particular to high-speed latches used in register memory of digital computing devices.
BACKGROUND OF THE INVENTION
A modern computer system typically comprises a central processing unit (CPU) and supporting hardware necessary to store, retrieve and transfer information, such as communications buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, and communication lines coupled to a network. The CPU is the heart of the system. It executes the instructions which comprise a computer program and directs the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another, but each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, only much faster. Continuing improvements to computer systems therefore require that these systems be made ever faster.
The overall speed of a computer system (also called the throughput) may be crudely measured as the number of operations performed per unit of time. Conceptually, the simplest of all possible improvements to system speed is to increase the clock speeds of the various components, and particularly the clock speed of the processor(s). For example, if everything runs twice as fast but otherwise works in exactly the same manner, the system will perform a given task in half the time. Early computer processors, which were constructed from many discrete components, achieved significant speed improvements through shrinking component size, reducing component count, and, eventually, packaging the entire processor as an integrated circuit on a single chip. The reduced size made it possible to increase the clock speed of the processor, and accordingly increase system speed.
Despite the enormous improvement in speed obtained from integrated circuitry, the demand for ever faster computer systems has continued. Hardware designers have been able to obtain still further improvements in speed by greater integration (i.e., increasing the number of circuits packed onto a single chip), by further reducing the size of circuits, and by various other techniques. However, designers can see that physical size reductions cannot continue indefinitely, and there are limits to their ability to continue increasing processor clock speeds. Attention has therefore been directed to other approaches for further improvements in the overall speed of the computer system.
Without changing the clock speed, it is possible to improve system throughput by using multiple processors. The modest cost of individual processors packaged on integrated circuit chips has made this approach practical. However, one does not simply double a system's throughput by going from one processor to two. The introduction of multiple processors to a system creates numerous architectural problems. For example, the multiple processors will typically share the same main memory (although each processor may have its own cache). It is therefore necessary to devise mechanisms that avoid memory access conflicts and ensure that extra copies of data in caches are tracked in a coherent fashion. Furthermore, each processor puts additional demands on the other components of the system, such as storage, I/O, memory, and particularly the communications buses that connect the various components. As more processors are introduced, there is a greater likelihood that processors will spend significant time waiting for some resource being used by another processor.
Without delving into further architectural complications of multiple processor systems, it can still be observed that there are many reasons to improve the speed of the individual CPU, whether a system uses multiple CPUs or a single CPU. If the CPU clock speed is given, it is possible to further increase the work done by the individual CPU, i.e., the number of operations executed per unit time, by increasing the average number of operations executed per clock cycle.
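
That relationship is simple arithmetic: operations per second equal the clock rate multiplied by the average number of operations completed per cycle. A minimal sketch in C, using figures assumed purely for illustration (they do not come from the patent):

    #include <stdio.h>

    int main(void) {
        /* Illustrative values only: a 1 GHz clock and an average of
         * 0.8 operations completed per cycle. */
        double clock_hz = 1.0e9;
        double ops_per_cycle = 0.8;
        double throughput = clock_hz * ops_per_cycle;  /* operations per second */
        printf("%.2e operations/second\n", throughput); /* prints 8.00e+08 */
        return 0;
    }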
In order to boost CPU speed, it is common in high performance processor designs to employ instruction pipelining, as well as one or more levels of cache memory. Pipelined instruction execution allows subsequent instructions to begin execution before previously issued instructions have finished. Cache memories store frequently used and other data nearer the processor and allow instruction execution to continue, in most cases, without incurring the full access time of main memory.
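
The overlap that pipelining provides can be visualized with a toy model (a generic five-stage pipeline, not the circuit described in the patent): instruction i occupies stage s on cycle i + s, so once the pipeline is full, one instruction completes every cycle:

    #include <stdio.h>

    #define STAGES 5   /* fetch, decode, execute, memory, writeback */
    #define INSTRS 8

    int main(void) {
        const char *stage_name[STAGES] = {"IF", "ID", "EX", "MEM", "WB"};
        /* Instruction i enters stage s on cycle i + s; run until the
         * last instruction leaves writeback. */
        for (int cycle = 0; cycle < INSTRS + STAGES - 1; cycle++) {
            printf("cycle %2d:", cycle);
            for (int s = 0; s < STAGES; s++) {
                int instr = cycle - s;          /* instruction in stage s */
                if (instr >= 0 && instr < INSTRS)
                    printf("  %s=i%d", stage_name[s], instr);
            }
            printf("\n");
        }
        return 0;
    }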
Pipelines will stall under certain circumstances. An instruction that is dependent upon the results of a previously dispatched instruction that has not yet completed may cause the pipeline to stall. For instance, instructions dependent on a load/store instruction for which the necessary data is not in the cache, i.e., a cache miss, cannot be executed until the data becomes available in the cache. Maintaining in the cache the data necessary for continued execution, and sustaining a high hit ratio (i.e., the fraction of data requests that can be satisfied directly from the cache), is not trivial, especially for computations involving large data structures. A cache miss can cause the pipelines to stall for several cycles, and the total memory latency penalty will be severe if the data is not available most of the time. Although memory devices used for main memory are becoming faster, the speed gap between such memory chips and high-end processors is becoming increasingly larger. Accordingly, a significant amount of execution time in current high-end processor designs is spent waiting for resolution of cache misses.
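
The penalty can be quantified with the standard average-access-time model, average latency = hit time + miss rate × miss penalty; this is a textbook formula rather than anything specific to the patent, and the figures below are assumed for illustration:

    #include <stdio.h>

    int main(void) {
        /* Assumed figures for illustration only. */
        double hit_time = 1.0;       /* cycles for a cache hit */
        double miss_penalty = 50.0;  /* extra cycles to reach main memory */
        double hit_ratio = 0.95;     /* fraction of requests served by cache */

        double avg = hit_time + (1.0 - hit_ratio) * miss_penalty;
        printf("average access time: %.2f cycles\n", avg);  /* 3.50 cycles */
        return 0;
    }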
Reducing the amount of time that the processor is idle waiting for certain events, such as re-filling a pipeline or retrieving data from memory, will increase the average number of operations per clock cycle. One architectural innovation directed to this problem is called “hardware multithreading” or simply “multithreading”. This technique involves concurrently maintaining the state of multiple executable sequences of instructions, called threads, within a single CPU. Because each thread's state is already resident in the processor, it is relatively simple and fast to switch threads.
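
A rough software analogy, with hypothetical names and sizes, is shown below; the patent keeps these per-thread states in latch hardware, but the essential idea is that every thread's registers stay resident and a switch merely changes which set is selected:

    #include <stdio.h>

    #define NUM_THREADS 2   /* number of hardware thread contexts */
    #define NUM_REGS    32  /* architected registers per thread */

    /* One resident context per hardware thread (illustrative layout). */
    struct context {
        unsigned long pc;
        unsigned long regs[NUM_REGS];
    };

    static struct context ctx[NUM_THREADS];  /* all states kept "in the CPU" */
    static int active = 0;                   /* selector: which thread runs */

    /* Switching threads changes only the selector; no state is copied,
     * which is why a hardware thread switch can be fast. */
    static void switch_thread(void) {
        active = (active + 1) % NUM_THREADS;
    }

    int main(void) {
        ctx[0].pc = 0x1000;
        ctx[1].pc = 0x2000;
        printf("running thread %d at pc 0x%lx\n", active, ctx[active].pc);
        switch_thread();
        printf("running thread %d at pc 0x%lx\n", active, ctx[active].pc);
        return 0;
    }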
The term “multithreading” as defined in the computer architecture community is not the same as the software use of the term. In software, “multithreading” refers to one task being subdivided into multiple related threads. In the hardware definition, the threads being concurrently maintained in a processor are merely arbitrary sequences of instructions, which do not necessarily have any relationship with one another. The term “hardware multithreading” is therefore often used to distinguish the two uses of the term. As used herein, “multithreading” will refer to hardware multithreading.
There are two basic forms of multithreading. In the more traditional form, sometimes called “fine-grained multithreading”, the processor executes N threads concurrently by interleaving execution on a regular basis, such as cycle-by-cycle. This creates a gap in time between successive instructions of a single thread, which removes the need for the processor to wait out certain short-term latency events, such as re-filling an instruction pipeline. In the second form of multithreading, sometimes called “coarse-grained multithreading”, multiple instructions in a single thread are sequentially executed until the thread encounters a long-latency event, such as a cache miss, at which point the processor switches to another thread.
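
The two policies differ only in when the active thread changes, which the toy scheduler below makes explicit (a hypothetical sketch, not the patent's selection logic): fine-grained selection rotates every cycle, while coarse-grained selection stays put until a stall such as a cache miss is signaled:

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_THREADS 2

    /* Fine-grained: rotate to the next thread every cycle. */
    static int pick_fine_grained(int current) {
        return (current + 1) % NUM_THREADS;
    }

    /* Coarse-grained: keep the current thread until it stalls
     * (e.g., on a cache miss), then move to the next one. */
    static int pick_coarse_grained(int current, bool stalled) {
        return stalled ? (current + 1) % NUM_THREADS : current;
    }

    int main(void) {
        /* Assumed miss pattern for the demonstration. */
        bool miss[8] = {false, false, true, false, false, false, true, false};
        int fine = 0, coarse = 0;
        for (int cycle = 0; cycle < 8; cycle++) {
            printf("cycle %d: fine-grained runs t%d, coarse-grained runs t%d\n",
                   cycle, fine, coarse);
            fine = pick_fine_grained(fine);
            coarse = pick_coarse_grained(coarse, miss[cycle]);
        }
        return 0;
    }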
