Method and apparatus for improving dispersal performance in...

Electrical computers and digital processing systems: processing – Instruction issuing – Simultaneous issuance of multiple instructions

Reexamination Certificate

Filed: 2000-12-29
Issued: 2004-04-13
Examiner: Treat, William M. (Department: 2183)
Classification: Electrical computers and digital processing systems: processing; Instruction issuing; Simultaneous issuance of multiple instructions
Classification codes: C712S214000, C712S023000, C712S024000
Document type: Reexamination Certificate
Status: active
Patent number: 06721873

BACKGROUND OF THE INVENTION
The present invention pertains to a method and apparatus for improving processor performance. More particularly, the present invention pertains to improving processor performance through special handling of no-op instructions.
As is known in the art, a processor includes a variety of sub-modules, each adapted to carry out specific tasks. In one known processor, these sub-modules include the following: an instruction cache; an instruction fetch unit for fetching appropriate instructions from the instruction cache; an instruction (or decoupling) buffer that holds the fetched instructions; dispersal logic that schedules the dispersal of individual instructions into separate execution pipes; execution pipes that execute the individual instructions; and exception and retirement logic that checks for illegal operations and retires the exception-free instructions in program order. The components used for fetching instructions are often referred to as the “front end” of the processor. The components used for the execution of instructions are often referred to as the “back end” of the processor.
Programming code to be executed by the processor can sometimes be broken down into smaller components referred to as “threads.” A thread is a set of instructions whose execution achieves a given task.
In certain known architectures such as VLIW (Very Long Instruction Word) and EPIC (Explicitly Parallel Instruction Computing), instructions are packaged together into bundles. For example, in these architectures, several different templates may be used, each of which includes a plurality of instructions to be executed. Such instruction handling is provided in the IA-64 (Intel Architecture, 64-bit) architecture processors manufactured by Intel Corporation, Santa Clara, Calif. Each individual instruction may be referred to as a “syllable.” In these processors, two such bundles may be provided at a time from the decoupling buffer to the dispersal logic. In multithreaded processors, two or more decoupling buffers may be provided, corresponding to the number of simultaneous threads the processor can handle. Each decoupling buffer provides two bundles at a time to the dispersal logic. The two bundles consist of six syllables (three per bundle), or six instructions. Depending on the type of instruction performed, each syllable is broadly categorized into one of four execution unit types: Integer (I), Memory (M), Floating-point (F), and Branch (B). The syllables from a bundle are each assigned to different execution ports. Each execution port is designed to handle one or more of the four execution unit types previously mentioned. For example, a Memory port (“M port”) can handle Memory operations (“M-ops”) or Integer operations (“I-ops”). In common processor implementations, there is a fixed number of execution ports, consisting of, for example, 4 M ports, 2 I ports, 2 F ports, and 3 B ports.
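The assignment of syllables to execution ports described above can be illustrated with a minimal Python sketch. It uses the example port counts from the text and the one stated cross-type capability (an M port may take an I-op). The greedy, in-order policy, the stop-at-first-stall rule, the port names such as `M0`, and the function name `disperse` are all assumptions made for illustration; they are not taken from the patent.

```python
# Example port counts from the text: 4 M ports, 2 I ports, 2 F ports, 3 B ports.
PORT_COUNTS = {"M": 4, "I": 2, "F": 2, "B": 3}

# Port types that may execute each syllable type, in preference order.
# The text only states that an M port can also take I-ops; every other
# mapping is assumed to be one-to-one for this sketch.
ACCEPTING_PORTS = {"M": ["M"], "I": ["I", "M"], "F": ["F"], "B": ["B"]}


def disperse(syllables, port_counts=PORT_COUNTS):
    """Greedily assign syllables, in program order, to free ports.

    Returns (assignments, stalled): assignments is a list of
    (syllable_type, port_name) pairs, and stalled holds the syllables that
    could not issue. Issue stops at the first syllable without a free port
    (an assumed policy that keeps issue in order).
    """
    free = dict(port_counts)
    assignments = []
    for index, syllable in enumerate(syllables):
        for port_type in ACCEPTING_PORTS[syllable]:
            if free[port_type] > 0:
                free[port_type] -= 1
                # Ports of one type are numbered in the order they are taken.
                port_index = port_counts[port_type] - free[port_type] - 1
                assignments.append((syllable, f"{port_type}{port_index}"))
                break
        else:
            # No accepting port is free: this and all later syllables stall.
            return assignments, list(syllables[index:])
    return assignments, []


# Two bundles (MFI followed by MII) give six syllables for one cycle.
issued, stalled = disperse(list("MFIMII"))
```

In this sketch the third I-op finds both I ports taken and falls back to an M port, which is the only flexibility the text describes.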
The templates that are provided for the bundles include MFI, MII, MFB, MMI, etc. For example, the MFI template includes one Memory operation, one Floating-point operation, and one Integer operation.
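As a rough illustration, a template can be treated as a three-letter string whose letters name the syllable types it carries. The sketch below covers only the four templates listed in the text (the "etc." is left open), and the helper names `syllable_counts` and `bundle_pair_demand` are hypothetical.

```python
from collections import Counter

# Syllable types named in the text: Memory, Floating-point, Integer, Branch.
SYLLABLE_TYPES = {"M", "F", "I", "B"}

# Only the templates the text lists by name; others exist but are not given.
TEMPLATES = ("MFI", "MII", "MFB", "MMI")


def syllable_counts(template):
    """Count how many syllables of each type a template carries."""
    unknown = set(template) - SYLLABLE_TYPES
    if unknown:
        raise ValueError(f"unknown syllable type(s): {sorted(unknown)}")
    return Counter(template)


def bundle_pair_demand(first, second):
    """Per-type slot demand of the two bundles handed to dispersal together."""
    return syllable_counts(first) + syllable_counts(second)


# An MFI bundle followed by an MMI bundle asks for 3 M, 2 I and 1 F slots.
demand = bundle_pair_demand("MFI", "MMI")
```

Representing templates as strings keeps the per-type demand of a bundle pair easy to compare against a fixed set of execution ports.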
As is known in the art, an instruction may be provided to the execution unit that results in no significant task performance for the processor system. For example, in the Intel® x86 processor systems, a NOP (No operation) instruction causes the execution unit to take no action for an “instruction cycle.” An instruction cycle as used herein is a set number of processor clock cycles that are needed for the processor to execute an instruction. Because of the templates that are used, the compiler is forced to pad a bundle with a NOP instruction when no other instruction can be issued in parallel due to dependencies. Although these NOP instructions do not perform useful work, each NOP needs to be executed to maintain precise exception handling. For example, if the next integer operation is to take place after two memory operations, the MMI template may be used, but without an Integer operation. Instead, a NOP instruction is inserted in the Integer slot of the MMI template.
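The MMI example can be sketched as a small padding routine. This is only an assumed model of what a compiler might do: the function name `pad_bundle`, the `(type, name)` representation of operations, and the operation names `load_a` and `store_b` are hypothetical, and real compilers choose templates and schedule slots in far more involved ways.

```python
def pad_bundle(template, ready_ops):
    """Fill a template's slots from ready_ops and pad empty slots with NOPs.

    template:  string of syllable types, for example "MMI"
    ready_ops: list of (type, name) pairs with no outstanding dependencies
    Returns (bundle, leftover), where leftover holds the unplaced operations.
    """
    remaining = list(ready_ops)
    bundle = []
    for slot_type in template:
        # Take the first ready operation whose type matches this slot.
        match = next((op for op in remaining if op[0] == slot_type), None)
        if match is None:
            # Nothing suitable is ready: insert a type-specific NOP.
            bundle.append((slot_type, f"NOP.{slot_type.lower()}"))
        else:
            remaining.remove(match)
            bundle.append(match)
    return bundle, remaining


# Two memory operations are ready, but the next integer operation is not.
bundle, leftover = pad_bundle("MMI", [("M", "load_a"), ("M", "store_b")])
```

Here the Integer slot of the MMI bundle ends up holding `NOP.i`, matching the case described in the text.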
When a NOP instruction is one of the syllables in a bundle, the NOP is also assigned to an available execution port. NOPs are type specific (i.e., NOP.i, NOP.m, NOP.f, or NOP.b), so they must be executed on the specified port (I, M, F, or B). The NOP uses an execution port, but does not utilize the task performance capabilities of the execution port. Thus, the NOP may take up execution resources that would otherwise be available for other useful operations, such as those from a different thread in an SMT (simultaneous multithreading) implementation. As a result, in a simultaneous multithreaded processor, not all bundles may be issued in one cycle, because a NOP may have taken an execution port that a true operation would otherwise have used. NOPs can make up a significant percentage of programming code. Using current compilers, they can account for approximately 30% of the code density in service systems (e.g., transaction processing systems). Numerous NOPs, which may require little or no execution, thus occupy the limited execution resources and inherently limit the performance of the processor.
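The port pressure caused by NOPs can be made concrete with a small counting sketch. It assumes two threads that each supply an MFI bundle followed by an MFB bundle, with one of the two F slots per thread padded with NOP.f. These bundle contents and the helper names are hypothetical. For simplicity, demand is compared only against ports of the same type, ignoring the ability of an M port to take I-ops.

```python
from collections import Counter

# Example port counts from the text.
PORT_COUNTS = {"M": 4, "I": 2, "F": 2, "B": 3}


def port_demand(syllables):
    """Return (total, useful) per-type demand for (type, is_nop) syllables."""
    total = Counter(kind for kind, _ in syllables)
    useful = Counter(kind for kind, is_nop in syllables if not is_nop)
    return total, useful


def oversubscribed(demand, port_counts=PORT_COUNTS):
    """Map each type to the number of syllables exceeding its port count."""
    return {kind: count - port_counts[kind]
            for kind, count in demand.items()
            if count > port_counts[kind]}


# One thread: MFI then MFB, with the first F slot padded by a NOP.f.
thread = [("M", False), ("F", True), ("I", False),
          ("M", False), ("F", False), ("B", False)]

# Two threads presenting the same pattern in one cycle.
total, useful = port_demand(thread + thread)
blocked_with_nops = oversubscribed(total)    # {"F": 2}
blocked_useful_only = oversubscribed(useful)  # {}
```

Under these assumptions the four F syllables exceed the two F ports only because two of them are NOPs; the useful operations alone would fit within the available ports.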
In view of the above, there is a need for a method and apparatus for improving dispersal performance.

REFERENCES:
patent: 6195756 (2001-02-01), Hurd
patent: 6442701 (2002-08-01), Hurd
patent: 6564316 (2003-05-01), Perets et al.

Affiliated with

Kottapalli Sailesh (Inventor)
Sit Kinkee (Inventor)
Sun Andrew (Inventor)
Walterscheidt Udo (Inventor)
Yeh Thomas (Inventor)

Also associated with

Intel Corporation (Corporate Assignee)
Kenyon & Kenyon (Law Firm)
Treat William M. (Examiner)
