Method and system for substantially registerless processing

Electrical computers and digital processing systems: processing – Instruction fetching

Reexamination Certificate


Details

US Classification: C712S228000
Type: Reexamination Certificate
Status: active
Patent Number: 06738895

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to processors for computer systems and, more specifically, to processors utilized in conjunction with and/or embedded within memory devices.
BACKGROUND OF THE INVENTION
Automated systems commonly utilize Central Processing Units (CPUs) connected to caches, memory storage devices, and numerous other peripherals over various buses and other interconnections. Generally, designers of automated systems have strived to improve system performance by increasing CPU processing speeds, bus speeds, memory utilization rates, and various other parameters. Additionally, significant efforts have been undertaken to simultaneously reduce the size and power requirements of such systems. While significant reductions in size and power requirements have occurred, the software programs used by many of today's systems have increased tremendously in size and complexity. As a result, today's designers are often faced with the daunting challenge of having to squeeze ever more data, including video data and audio data, through CPUs at ever increasing rates while decreasing the size and power requirements of such systems.
For many applications, the ability of CPUs to process large quantities of data is often dictated by how quickly, and in what quantities, the CPU can obtain information from and/or write to memory or other data storage devices. As is well known in the art, today's systems often include multiple data storage devices, such as Random Access Memory (RAM), Read Only Memory (ROM), and various other peripheral storage devices such as hard disk drives and writable/rewritable magnetic and optical storage devices. Additionally, CPUs often obtain data from various non-localized data storage devices via communications networks such as the Internet. Since each storage device often contains data which is specified in variable word lengths and since today's CPUs generally utilize registers of fixed widths, the CPU commonly has to repeatedly request segments of the data until an entire data word is processed.
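To make the last point concrete, the following is a minimal C sketch (not taken from the patent; the function name and sizes are assumptions chosen for illustration) of a 256-bit data word being moved through a CPU whose registers are only 64 bits wide, so the word must be fetched in four separate register-sized requests.

#include <stdint.h>
#include <stddef.h>

/* Illustration only: a 256-bit word stored in memory is moved through a CPU
 * whose registers are 64 bits wide, requiring four register-width accesses. */
void copy_wide_word(const uint64_t *src, uint64_t *dst)
{
    for (size_t i = 0; i < 256 / 64; ++i) {
        uint64_t reg = src[i];   /* one register-width memory read  */
        dst[i] = reg;            /* one register-width write-back   */
    }
}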
In most computer applications, the process of retrieving data from a memory location often takes longer than the time necessary to actually process the given quantity of data because the ability of the CPU to process information is significantly greater than its ability to retrieve information from memory storage devices. In order to speed up the processing capabilities of CPUs, many system designers utilize cache memory, which may be built onto the same chip as the processor itself. While caching certain segments of code is helpful in processing routine instructions, for many applications, such as data mining, speech recognition and video image processing, caching such information is generally not practical. As a result, for many applications, CPUs generally have to recall vast quantities of information from memory storage devices in segments whose size is set by the width of the CPU's registers.
Additionally, since registers are commonly provided in pre-set widths (i.e., 64 bits or 32 bits), multiple registers are often needed to download/retrieve large quantities of data from a storage device within a reasonable time period. These registers are often directed to download data and then hold it until the CPU is ready to perform a specific task. When configured in this manner, many systems result in CPUs with large numbers of registers, each of which increases power requirements and inhibits system miniaturization. For example, the popular Pentium III® processor utilizes over 100 registers to support its various features and functions.
As is commonly known in the art, CPUs often begin the processing of large quantities of data by first determining a location for the data (i.e., the address), then fetching the data provided at the address, processing the fetched data, determining a location (i.e., a second address) where the result of the data processing is to be sent, sending the result to the second location, and then determining an instruction pointer, which preferably contains the address for the next instruction. Generally, the first address, the data, the second address, the result location, and the instruction pointer are provided in a memory array in sequential order. The memory is generally configured in sequential order during compiling so that the number of JUMPs is limited and the processing needed to determine which instruction is to be processed next is reduced. While compiling a program to reduce the number of JUMPs is often desirable from a CPU processing viewpoint, compiling often results in memory arrays which are not utilized to their maximum capacity. Instead, many memories often contain significant blocks, in which data may be stored, that are never used.
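As a purely illustrative sketch (the structures and names below are hypothetical and not part of the patent), the sequential address/data/result/instruction-pointer flow described above can be modeled in C as a toy machine that simply advances its instruction pointer from one sequentially stored instruction to the next:

#include <stdio.h>

/* Toy model of sequential instruction flow: each instruction names the
 * addresses of its operands and of its result, and the instruction pointer
 * normally just advances to the next sequential entry (no JUMP needed). */
enum { OP_ADD = 1, OP_HALT = 0 };

typedef struct {
    int op;        /* operation to perform            */
    int src_a;     /* address of first operand        */
    int src_b;     /* address of second operand       */
    int dst;       /* address where the result is put */
} instr_t;

int main(void)
{
    int mem[8] = { 0, 0, 0, 0, 2, 3, 0, 0 };    /* data region            */
    instr_t prog[] = {
        { OP_ADD, 4, 5, 6 },                    /* mem[6] = mem[4]+mem[5] */
        { OP_HALT, 0, 0, 0 },
    };

    int ip = 0;                                 /* instruction pointer    */
    for (;;) {
        instr_t in = prog[ip];                  /* fetch                  */
        if (in.op == OP_HALT)
            break;
        if (in.op == OP_ADD)
            mem[in.dst] = mem[in.src_a] + mem[in.src_b];   /* execute    */
        ip++;                                   /* sequential fall-through */
    }
    printf("%d\n", mem[6]);                     /* prints 5               */
    return 0;
}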
Additionally, while compilers often attempt to create software instructions that flow from one sequence line to a next, in reality, much of today's software code contains JUMPs, conditional branches, loops, and other data flow techniques. Since these software programs often do not naturally flow from one line to the next, system designers generally must also keep track of code locations via address pointers, and various other devices, each of which require additional registers and additional power.
Additionally, currently available CPUs commonly require multiple instructions and processing steps to accomplish some of the simplest tasks, such as adding two operands. For example, currently available CPUs often execute an instruction requiring Operand 1 to be added to Operand 2 by performing the following steps (sketched in code after the list):
1. Fetch ADD instruction from location pointed to by Instruction Pointer (“IP”), and load the instruction into an instruction register;
2. Decode the instruction and store it in the instruction register;
3. Access a location in memory where a first operand is located, obtain the value for the first operand and store it in a temporary register;
4. Access a second location in memory where a second operand is located, obtain the value for the second operand and store it in a temporary register;
5. Perform the operation specified in the instruction register on the first and second operands by transferring the instruction and the first and second operands from their respective registers to the ALU;
6. Determine where the result of the ALU process is to be stored;
7. Store the results data to the determined location; and
8. Determine the next address for the next instruction, which may require a JUMP to another memory location.
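The eight steps above can be mimicked, very loosely, in C; the register names, structures, and ALU below are hypothetical stand-ins for hardware and are included only to make the sequence of register traffic concrete:

#include <stdio.h>

typedef struct { int opcode, addr_a, addr_b, addr_result; } encoded_instr;

/* Hypothetical ALU: opcode 0 means ADD. */
static int alu(int opcode, int a, int b) { return opcode == 0 ? a + b : 0; }

int main(void)
{
    int memory[8]       = { 7, 0, 0, 0, 35, 0, 0, 0 };
    encoded_instr rom[] = { { 0 /* ADD */, 0, 4, 6 } };

    int ip = 0;                                   /* instruction pointer       */
    encoded_instr ir = rom[ip];                   /* 1-2: fetch and decode into
                                                          the instruction reg  */
    int temp_a = memory[ir.addr_a];               /* 3: first operand -> temp  */
    int temp_b = memory[ir.addr_b];               /* 4: second operand -> temp */
    int result = alu(ir.opcode, temp_a, temp_b);  /* 5: ALU operation          */
    int dest   = ir.addr_result;                  /* 6: where to store result  */
    memory[dest] = result;                        /* 7: write the result back  */
    ip = ip + 1;                                  /* 8: next instruction addr  */

    printf("%d\n", memory[6]);                    /* prints 42                 */
    return 0;
}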
While the above operation may be accomplished extremely quickly for a single mathematical calculation, today's CPUs often are required to process millions of transactions a second. When utilized on this magnitude, the constant reading, storing, addressing, and writing to and from memory via registers may significantly degrade a system's performance.
Therefore, since today's CPUs often spend inordinate amounts of time determining from where data and instructions are to be obtained and/or stored, storing the data, processing the data, determining where the result of the data processing is to be stored, and then actually storing the result, a system is needed that reduces the amount of time a CPU spends determining where to obtain data and actually fetching the data needed for processing.
Additionally, many of today's systems control numerous input/output devices, all of which are constantly requesting processor time. Each time a processor determines that a different Input/Output (I/O) device or a different processing routine needs to be executed, the processor commonly performs a state change. In a Windows® multi-tasking environment, state changes occur often because the various devices connected to the I/O bus are continuously jostling for the attention of the processors.
As shown in FIG. 3A, the process by which many currently available processors perform a state change often requires numerous steps. The state change operation begins at 302 when a processor receives a request to stop processing a first task an
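The expense of such a state change comes largely from saving, and later restoring, the processor's register file. A hedged C sketch, with hypothetical structure and function names, of what that save and restore entails:

#include <string.h>
#include <stdint.h>

/* Illustration only: on a register-based CPU, every architectural register
 * must be copied out to memory before another task can run, and copied back
 * when the first task resumes. */
#define NUM_REGS 32

typedef struct {
    uint64_t regs[NUM_REGS];   /* general-purpose registers */
    uint64_t ip;               /* instruction pointer       */
    uint64_t flags;            /* status / condition codes  */
} cpu_state;

/* Save the running task's register file so a second task can use the CPU. */
void save_state(const cpu_state *cpu, cpu_state *saved)
{
    memcpy(saved, cpu, sizeof *saved);
}

/* Restore a previously saved register file when the first task is resumed. */
void restore_state(cpu_state *cpu, const cpu_state *saved)
{
    memcpy(cpu, saved, sizeof *cpu);
}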
