Method and apparatus for handling cache misses in a computer...

Electrical computers and digital processing systems: multicomputer data transferring – Computer-to-computer data routing – Least weight routing

Reexamination Certificate


Details

Subclass: C709S241000
Kind: Reexamination Certificate
Status: active
Patent number: 06272516

ABSTRACT:

COPYRIGHT NOTICE
Contained herein is material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to the field of computer system architecture. More specifically, the present invention is related to a multitasking computer system architecture supporting multiple independent, specialized, loosely coupled processors. The architecture provides a novel approach to scheduling processes for execution on one of the multiple processors, migrating processes between the processors, rescheduling processes upon a cache miss, and distributing memory along pipeline stages in the processors. The computer system architecture is particularly optimized for operations related to data packet switching as may be performed by an International Standards Organization (ISO) Open Systems Interconnection (OSI) Layer 2 (i.e., media access control sublayer—MAC) based network switching device, i.e., a switching hub, in a data communications network. The architecture is further applicable to operations related to routing as may be performed by an ISO Layer 3 (i.e., network layer) based network device.
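The rescheduling-upon-a-cache-miss idea can be pictured with a small event-driven sketch. This is only an illustration of the general concept under assumed names (Process, Scheduler, on_cache_miss, on_line_filled are hypothetical), not the patent's implementation.

```python
import collections

class Process:
    """Illustrative packet-processing process; the fields are hypothetical."""
    def __init__(self, pid):
        self.pid = pid

class Scheduler:
    """Toy scheduler: when the running process misses in the cache it is
    parked rather than stalled, and another ready process is dispatched."""
    def __init__(self):
        self.ready = collections.deque()
        self.parked = {}                  # cache line address -> processes waiting on it

    def add(self, proc):
        self.ready.append(proc)

    def dispatch(self):
        return self.ready.popleft() if self.ready else None

    def on_cache_miss(self, proc, line_addr):
        # Park the process until the missing line is filled from memory.
        self.parked.setdefault(line_addr, []).append(proc)

    def on_line_filled(self, line_addr):
        # The memory system returned the line; parked processes become ready again.
        for proc in self.parked.pop(line_addr, []):
            self.ready.append(proc)

# Usage: P1 misses on line 0x40, so P2 runs instead of the processor stalling.
sched = Scheduler()
p1, p2 = Process(1), Process(2)
sched.add(p1); sched.add(p2)
running = sched.dispatch()            # P1
sched.on_cache_miss(running, 0x40)    # P1 parked on its miss
running = sched.dispatch()            # P2 keeps the processor busy
sched.on_line_filled(0x40)            # miss serviced, P1 is ready again
```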
2. Description of the Related Art
Introduction
In the prior art, a switching hub is typically designed around a computer system having a single processor. The computer system is controlled by software optimized to receive and transmit data packets between local or wide area network segments in a data communications network. As an optimized computer system, the prior art switching hub generally comprises the same components as a general purpose computer system, including a central programmable processor, an internal control bus, a data bus, and shared common memory controlled by the central programmable processor. Additionally, the prior art switching hub has a plurality of media access controllers (MACs), each having an associated port coupled to one of the local or wide area network segments.
A prior art switching hub may be further optimized, for example, by introducing memory subsystems and input/output (I/O) devices particularly suited for processing data packets. However, by definition, a general purpose computer system is not designed with a particular application, such as data packet switching, as its primary application. As a result, a switching hub based on a general purpose computer system generally does not fully utilize the capabilities of the computer system. Moreover, the maximum data packet processing throughput of the switching hub is limited by the general purpose computer system architecture. In general, for a particular application to be performed as quickly, inexpensively and efficiently as reasonably possible, what is needed is a computer system architecture designed to optimize the operations the computer system performs to carry out that application. In particular, what is needed is an improved computer system architecture designed to facilitate the extremely high data packet processing rates required by a high performance switching hub.
Overview of Switching Hub Functions vs. General Purpose Computer System Functions
A brief overview of some of the needs of a switching hub and the functions it typically performs, as opposed to the functions generally performed by a general purpose computer system, is given below. The overview serves to further identify the need for an improved computer system for use in a switching hub.
Latency and Throughput
A switching hub primarily performs data packet processing. A switching hub “switches” data packets from one network segment to another network segment. That is, the switching hub receives a data packet on a port coupled to a network segment, internally processes the packet, and transmits it out a port coupled to a different network segment. Data packet processing is very I/O intensive relative to the processing performed by a general purpose computer system. A switching hub may process data packets at very high rates. At these high rates, there is no long or medium term temporal locality of data because all the data (in the form of data packets) enters the switching hub and shortly thereafter leaves the hub. Furthermore, data packets received by the hub are generally independent of each other. Thus, traditional parallel processing techniques are more readily applied to data packet processing within the hub.
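Because packets are independent of one another, per-packet work parallelizes naturally. The following sketch only illustrates that point under simple assumptions; switch_packet, the eight-port modulo lookup, and the thread pool are illustrative stand-ins, not the patent's mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def switch_packet(packet):
    """Hypothetical per-packet work: resolve the output port and return the result.
    No packet depends on any other, so no shared state or ordering is assumed."""
    dst_port = hash(packet["dst_mac"]) % 8     # stand-in for a MAC address table lookup
    return dst_port, packet

packets = [{"dst_mac": f"00:a0:c9:00:00:{i:02x}", "payload": b"..."} for i in range(16)]

# Independent packets can be handed to independent workers; overall throughput,
# not per-packet latency, is what this arrangement improves.
with ThreadPoolExecutor(max_workers=4) as pool:
    for dst_port, pkt in pool.map(switch_packet, packets):
        pass  # transmit pkt on dst_port
```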
The volume of data packets switched by a switching hub is a very important factor when considering the performance of the hub. However, the time required to process a particular data packet is not as critical. In other words, latency, i.e., the delay in switching a data packet, is not so important a consideration as overall data packet throughput. This factor, combined with the fact that data packets are generally independent of each other, means it is not so important what task is being performed by the switching hub so long as at least some task is being performed at any given time.
However, a primary goal in the design of a general purpose computer system is to reduce instruction latency. To this end, a computer system uses well known pipeline techniques in an attempt to reduce the clock cycle time and thereby improve throughput. When using these techniques, each instruction executed by a general purpose computer system generally requires the results of the immediately preceding instruction. As a result, such systems typically incorporate a bypassing or feedforward technique, in which an instruction at a stage in the pipeline receives its arguments sooner than it otherwise would. However, introducing these techniques adds stages to the pipeline. While adding stages allows for a decrease in the clock cycle time of the system, introduction of the bypass logic requires the clock cycle time to be increased to allow time for the bypass logic to operate. What is needed is a computer architecture in which each instruction in a pipeline is executed for a different independent process, so that instructions do not depend on the one or more instructions preceding them in the pipeline, thus obviating the need for bypass logic and allowing simpler and deeper, i.e., longer, pipelines.
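The arrangement described above amounts to interleaving independent processes through the pipeline, one per stage, so that no instruction ever waits on the result of the one ahead of it. A minimal round-robin sketch, assuming five stages and one instruction issued per cycle (the stage names and process labels are illustrative, not taken from the patent):

```python
from collections import deque

STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def run_interleaved(processes, cycles):
    """Toy model: each cycle, every stage holds an instruction from a different
    independent process, so no result ever needs to be bypassed between stages."""
    pool = deque(processes)              # round-robin pool of independent processes
    pipeline = [None] * len(STAGES)      # pipeline[i] = process occupying stage i
    for cycle in range(cycles):
        pipeline.pop()                   # the oldest instruction retires from writeback
        issued = pool[0]
        pool.rotate(-1)                  # a different process issues on the next cycle
        pipeline.insert(0, issued)       # the issued instruction enters fetch
        print(f"cycle {cycle}:", list(zip(STAGES, pipeline)))

run_interleaved(["P0", "P1", "P2", "P3", "P4"], cycles=6)
```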
Temporal Locality
Most general purpose computer systems assume data have several properties, including temporal and spatial locality. Temporal locality refers to the notion that once a data item is accessed, it will generally be accessed again relatively soon. Thus, most general purpose computer systems have general registers which provide an extremely fast (and small) cache for recently accessed, i.e., important, data. Indeed, most general purpose computer systems require specific instructions to load data into or store data maintained in these general registers. While there is overhead associated with performing a load or store instruction, the overhead is minimal in most computing environments. In a switching hub environment in which data packet processing is the primary function, however, load and store instructions can comprise a large percentage of the overall instruction stream. What is needed, then, is a means of eliminating the need to load and store data in a general register.
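A rough, illustrative count makes the load/store share concrete; the field and operation counts below are assumptions chosen only to show the arithmetic, not measurements from the patent.

```python
# Touch N header fields of a packet once each, comparing a classic
# load / operate / store register model with operating on the data in place.
fields_touched = 10          # assumed number of header fields examined per packet
ops_per_field = 2            # assumed compute instructions per field

register_model = fields_touched * (1 + ops_per_field + 1)   # load + ops + store per field
in_place_model = fields_touched * ops_per_field             # ops only, no load/store

load_store_share = (register_model - in_place_model) / register_model
print(f"load/store share of the instruction stream: {load_store_share:.0%}")  # 50% here
```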
Temporal locality strongly influences the design of cache memory in most general purpose computer systems. As the size of the cache grows, it effectively increases the time scale for temporal locality. Larger caches allow the general purpose computer system to retain recently used data for a longer period of time in the relatively faster cache, thereby improving the average speed of retrieving and processing the data. However, in a switching hub environment, there is generally no long term temporal locality for a data packet, because once the data packet has been processed and transmitted, it is not accessed again.
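The contrast can be seen with a toy fully associative LRU cache simulation: a reuse-heavy trace benefits as the cache grows, while a streaming, packet-like trace in which each address is touched exactly once never benefits, no matter how large the cache. The trace shapes and sizes are assumptions made only for illustration.

```python
from collections import OrderedDict

def lru_hit_rate(trace, cache_lines):
    """Hit rate of a toy fully associative LRU cache over an address trace."""
    cache, hits = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)              # refresh recency on a hit
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)        # evict the least recently used line
    return hits / len(trace)

# Reuse-heavy trace: a small working set of 32 lines touched over and over.
reuse_trace = [addr for _ in range(100) for addr in range(32)]
# Streaming, packet-like trace: every line touched exactly once, never again.
stream_trace = list(range(3200))

for lines in (16, 64, 256):
    print(f"{lines:4d} lines | reuse hit rate {lru_hit_rate(reuse_trace, lines):.2f} "
          f"| streaming hit rate {lru_hit_rate(stream_trace, lines):.2f}")
```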
