Apparatus and method for instruction cache access

Electrical computers and digital processing systems: processing – Instruction fetching

Reexamination Certificate


Details

: 2000-02-03
: 2002-03-19
: Robertson, David L. (Department: 2187)
: Electrical computers and digital processing systems: processing
: Instruction fetching

: C712S219000, C712S225000
: Reexamination Certificate
: active
: 06360310
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to data processing systems, and, more particularly, to the providing of instruction fields to a processing unit. The instruction fields control the processing of the data fields applied to the processing unit. Any delay in receiving the instruction fields can impact the performance of the data processing system.
2. Description of the Related Art
Microprocessor units and systems that use microprocessor units have attained wide-spread use throughout many industries. A goal of any microprocessor system is to process information quickly. One technique that increases the speed with which the microprocessor system processes information is to provide the microprocessor system with an architecture which includes at least one local memory called a cache unit.
A cache unit is used by the microprocessor system to temporarily store instructions and/or data. A cache unit that stores both instructions and data is referred to as a unified cache; a cache unit that stores only instructions is an instruction cache unit, and a cache unit that stores only data is a data cache unit. Providing a microprocessor architecture with a unified instruction and data cache or with an instruction cache and a data cache is a matter of design choice. Both data and instructions are represented by data signal groups or fields. In the following discussion, the relationship of the instruction cache unit with the processing unit will be emphasized. Referring to FIG. 1, a microprocessing system 10 displaying the components that are important to the present discussion is shown. The microprocessing system 10 includes a processing unit 11 and an instruction cache unit 12. The processing unit 11 performs operations under control of instructions or instruction fields retrieved from the instruction cache unit 12. The processing unit 11 and the instruction cache unit 12 are coupled by a bus 13 over which the instruction fields are transferred to the processing unit 11. The processing unit 11 includes a program counter 111 that determines the instruction cache unit location that is currently being accessed, the accessed locations in the instruction cache unit storing instruction fields required by the processing unit 11. Thus, the program counter fields determine which instruction cache unit locations are to be accessed. The program counter fields are therefore addresses or address fields for the instruction cache unit 12. In the following discussion, the terms program counter number and program counter address field will be used interchangeably to mean location values in the program counter unit.
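The relationship described above, in which the program counter value serves as the address applied to the instruction cache unit, can be sketched as a toy model. This is only an illustration: the class names, the `fetch` and `step` methods, the 4-byte sequential increment, and the assumption that every access hits in the cache are hypothetical and are not taken from the patent.

```python
# Toy model of a program counter addressing an instruction cache.
# All names and the 4-byte instruction size are illustrative assumptions.

class InstructionCache:
    def __init__(self, contents):
        # Maps an address field to the instruction field stored at that location.
        self._lines = dict(contents)

    def fetch(self, address):
        # Return the instruction field at the addressed location (a hit is assumed).
        return self._lines[address]


class ProcessingUnit:
    def __init__(self, icache, start_address=0):
        self.icache = icache
        self.program_counter = start_address

    def step(self):
        # The program counter value is the address applied to the instruction cache.
        instruction = self.icache.fetch(self.program_counter)
        # Assumed: fixed 4-byte instructions and purely sequential flow.
        self.program_counter += 4
        return instruction


icache = InstructionCache({0: "I1", 4: "I2", 8: "I3"})
unit = ProcessingUnit(icache)
fetched = [unit.step() for _ in range(3)]
print(fetched)
```

The sketch ignores branches and cache misses; it only shows that the sequence of program counter values fixes the sequence of instruction cache locations that are accessed.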
In order to increase the performance of microprocessor systems in the past, the clock cycle period, the basic unit of time for the operations performed by the microprocessor system, has been decreased. At some point, the individual processing operations could no longer be performed within a single clock cycle. In order to decrease the clock cycle period further, the technique of pipelining the microprocessor system and, in particular, the processing unit, was developed. In pipelining a microprocessor, an operation was divided into a plurality of sub-operations, each sub-operation requiring approximately the same amount of time. Because each sub-operation required less time, the clock cycle period could be further reduced, thereby increasing the performance. This increase in performance is accomplished at the expense of increased complexity of the microprocessor resulting from the partitioning of a single operation into a plurality of sub-operations. As a result of the pipelining procedure, a sequence of sub-operations can be completed at the reduced clock cycle period, even though the total operation itself requires a longer period of time to be completed.
Referring to FIG. 2A, an example of a five stage pipeline for the execution of an instruction by a processing unit is shown. As above, the interaction between the processing unit and the instruction cache unit is emphasized. During clock cycle 1, an access of the instruction cache unit (labeled IC in FIG. 2A) is performed. During clock cycle 2, the instruction field decode and register file read (RF) operations are executed. During clock cycle 3, the activity of the execution (EX) pipeline stage is performed. During clock cycle 4, the data cache access (DC) operation is executed. And during clock cycle 5, the update register file (WB) operation is executed. As is clear from FIG. 2A, each pipeline stage requires one clock cycle to accomplish the operations assigned thereto. These operations are actually sub-operations of an activity of the processing unit that was formerly performed in its entirety in one clock cycle. When the processor clock frequency goes up, the cycle time is reduced. Therefore, one stage of the pipeline activity can be completed during each of the reduced clock cycle periods. However, the total time to complete the activity of the pipeline is greater than the original time to execute the activity without the pipeline architecture.
Referring to FIG. 2B, the typical flow of instruction execution in a five stage pipeline, according to the prior art, is shown. For each clock cycle, the implementation of another instruction is begun. At t (clock cycle)=1, instruction I1 begins execution in the IC pipeline stage. At t (clock cycle)=2, instruction I1 is being implemented in the RF pipeline stage, while the next instruction I2 is being executed in the IC pipeline stage. At t (clock cycle)=3, instruction I1 is being executed in the EX pipeline stage, instruction I2 is being executed in the RF pipeline stage, and instruction I3 has begun execution in the IC pipeline stage. The progress of the instructions is illustrated in FIG. 2B until, at t (clock cycle)=5, instruction I1 is being executed in the last (WB) pipeline stage. At t (clock cycle)=6, instruction I1 has completed execution and is no longer being executed in the processor unit. At t (clock cycle)=j, the instruction Ij is being executed in the first or IC pipeline stage, instruction Ij-1 is being executed in the RF pipeline stage, instruction Ij-2 is being executed in the EX pipeline stage, instruction Ij-3 is being executed in the DC pipeline stage and instruction Ij-4 is being executed in the WB pipeline stage.
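The instruction flow described for FIG. 2B can be reproduced with a short sketch. The function names `stage_of` and `occupancy` are hypothetical, and the model assumes an ideal pipeline with no stalls, which is the case the paragraph above describes.

```python
# Ideal five stage pipeline: instruction I<n> enters the IC stage at clock cycle n
# and advances one stage per clock cycle. No stalls or branches are modeled.
STAGES = ["IC", "RF", "EX", "DC", "WB"]


def stage_of(instruction_index, cycle):
    """Stage occupied by instruction I<instruction_index> at the given clock cycle, or None."""
    offset = cycle - instruction_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None  # not yet fetched, or already completed


def occupancy(cycle):
    """Map each pipeline stage to the instruction occupying it at the given clock cycle."""
    result = {}
    for k, stage in enumerate(STAGES):
        index = cycle - k  # at t=j, stage k holds instruction I(j-k)
        if index >= 1:
            result[stage] = f"I{index}"
    return result


for t in range(1, 7):
    print(t, occupancy(t))
```

At t=3 the sketch places I3 in IC, I2 in RF and I1 in EX, and at t=6 instruction I1 no longer occupies any stage, matching the description above.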
As can be seen from FIG. 2A and FIG. 2B, the pipelined processor can complete the execution of an instruction every clock cycle. The clock cycle time is typically much shorter than the time to execute the instruction in a non-pipelined processor. However, this performance benefit carries a penalty: the (5 clock cycle) delay before the first instruction is completed and the completion of one instruction per clock cycle can begin. This delay is typically referred to as the (5 cycle) latency of the pipeline. The latency can provide an obstacle to achieving the full execution performance of the pipelined processing unit.
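The trade-off between latency and throughput can be made concrete with a small calculation. The cycle times used here (10,000 ps for the non-pipelined unit and 2,500 ps for the pipelined unit) are hypothetical values chosen only for illustration; they do not come from the patent. The function names are likewise assumptions.

```python
# Cycle counts and elapsed time for an ideal pipeline with no stalls.
# The cycle times below are hypothetical illustration values, in picoseconds.
NON_PIPELINED_CYCLE_PS = 10_000  # assumed: one instruction per long cycle
PIPELINED_CYCLE_PS = 2_500       # assumed: shorter cycle, one stage per cycle


def pipelined_cycles(n_instructions, n_stages=5):
    """Clock cycles until the last of n instructions leaves the final stage."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles (the latency);
    # each later instruction completes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


def pipelined_time_ps(n_instructions):
    return pipelined_cycles(n_instructions) * PIPELINED_CYCLE_PS


def non_pipelined_time_ps(n_instructions):
    return n_instructions * NON_PIPELINED_CYCLE_PS


for n in (1, 100):
    print(n, pipelined_cycles(n), pipelined_time_ps(n), non_pipelined_time_ps(n))
```

Under these assumed numbers a single instruction takes longer in the pipelined unit (12,500 ps against 10,000 ps), which is the latency penalty, while 100 instructions finish in 104 cycles, well ahead of the non-pipelined unit.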
The subdividing of the processing unit into pipeline stages can increase the performance of the processing unit. However, in each clock cycle, a plurality of operations are performed. For example, referring to FIG. 3, during the first pipeline (IC) stage, three separate sub-sub-operations are performed. First, the correct location in the instruction cache unit must be accessed and the instruction field stored therein transferred to the processing unit. Then the processing unit performs a decoding operation on a predecode subfield of the transferred instruction field. The predecode subfield is an instruction field component assisting in the determination of the next program counter (NPC) address. This program counter address identifies the location of the next instruction field. Thus, this activity must be completed before the beginning of the second (RF) clock cycle, because the next instruction field must be accessed and transferred during the secon

Affiliated with

Au Edmund

Inventor


Also associated with

NEC Electronics Inc.

Corporate Assignee


Robertson David L.

Examiner


Skjerven Morrill & MacPherson LLP

Law Firm


Terrile Stephen A.

Attorney

