Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-12-21
2004-06-22
Ngo, Chuong Dinh (Department: 2124)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
C708S210000
Reexamination Certificate
active
06754685
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to bit-shift and population counter (“popcount”) circuits and more particularly to a dynamic popcount/shift circuit.
BACKGROUND OF THE INVENTION
A computer system typically comprises a processor, which is responsible for executing programs supplied by a user or software. Accordingly, computer processors comprise arithmetic, logic, and control circuitry that interpret and execute instructions from a computer program. Referring to
FIG. 1
, a typical computer system includes a microprocessor (
20
) having, among other components, a central processing unit (“CPU”) (
22
), a memory controller (
24
), and on-board, or Level 1, cache memory (
26
). Additionally, the microprocessor (
20
) is connected to external, or Level 2, cache memory (
28
) and a main memory (
30
). Typically, the three units of memory (
26
,
28
,
30
) all hold data and program instructions to be executed by the microprocessor (
20
). Internally, the execution of program instructions is carried out by the CPU (
22
). When instructions call for data operations to be carried out, the data is fetched by the memory controller (
24
) and loaded into internal registers (
32
) within the CPU (
22
). Upon command from the CPU (
22
), the memory controller (
24
) first searches for the needed data in the fast on-board cache (
26
), and if the memory controller (
24
) is unsuccessful locating the data in the on-board cache (
26
), then it searches next in the slower external cache (
28
). In the case that the memory controller (
24
) doesn't find the data in the external cache (
28
), then it must retrieve the data from the main memory (
30
), which is significantly slower than both forms of cache memory (
26
,
28
).
A computer system's memory comprises thousands of sequential storage locations, and each location is identified by a unique address. The memory addresses in a given computer usually range from 0 to a maximum value that depends on the amount of memory the system has installed.
In order for a program to be executed by a computer system, the program must be divided into individual instructions, which are recognizable by a processor. These instructions are first decoded by the processor in order for the processor to determine the type of operation that is needed for executing the instruction. The processor then executes the instruction by performing the operation on the data specified by the instruction. If the data is not already in the CPU, then the CPU, as described above, must retrieve the data via the memory controller. The individual instructions usually do not reside in contiguous memory locations. An instruction address is used to determine where a particular instruction resides in memory. Moreover, the data needed by a CPU to execute the instruction also do not usually reside in contiguous memory locations. Therefore, data, needed by the CPU, is usually referenced by the memory location address where it resides.
Typically, a program defines data by variables so that it can use the variable names instead of the actual data values that the variables represent. When a data value is defined by a variable, the memory system assigns the variable the memory location where the data resides. Hence, in a program, an instruction may reference a data value by its memory location instead of by referencing the variable that represents the data. This is advantageous because it allows a user to directly access a memory address. Because a memory address is a number, instructions often store the memory address in a variable called a pointer. Therefore, it follows that a pointer is a variable that holds the address of another variable.
A pointer can be incremented to reference other memory addresses. The amount by which a pointer increments can be based on a population count (“popcount”) of a sparse vector. A sparse vector is a vector having a relatively small number of nonzero elements. In other words, a sparse vector is a vector in which most of the elements are zero.
Popcount circuitry outputs the number of “1” bits in an input word, e.g., the population count (“popcount”) value of 110111011 is 7 and the popcount value of 0111101011010110 is 10. A popcount is performed on a sparse vector to determine the amount of nonzero elements within the sparse vector. Using the popcount value, a pointer can be incremented by shifting the pointer value by a number of bit positions equal to the amount of the popcount value. For example, if a sparse vector is 8 bits wide, then by performing a popcount on the sparse vector, a pointer can increment by shifting somewhere between 0 and 8 bit positions.
FIG. 2
is a block diagram of a three-stage popcount/shift circuit (
33
) designed to increment an 8-bit pointer (“P”) using an 8-bit sparse vector (“V”). The first stage comprises two parallel 4-bit popcount blocks (
34
,
36
) that both perform 4-bit popcount operations on V. When the two 4-bit popcount blocks (
34
,
36
) complete their respective 4-bit popcount operations, each 4-bit popcount block (
34
,
36
) generates 5 bits that denote the popcount values for the 4-bit popcount operations performed by the two 4-bit popcount blocks (
34
,
36
).
The second stage of the three-stage popcount/shift circuit (
33
) comprises a first 4-bit shift block (
38
) that performs a 4-bit shift operation on 8 bits from P based on the 5 bits generated by the first 4-bit popcount block (
34
) in the first stage. When the first 4-bit shift block (
38
) completes its 4-bit shift operation, the first 4-bit shift block (
38
) generates 12 bits to represent the shifted pointer value according to the 4 bits in V that were popcounted by the first 4-bit popcount block (
34
) in the first stage.
The third stage of the three-stage popcount/shift circuit (
33
) comprises a second 4-bit shift block (
40
) that uses the 12 bits generated by the first 4-bit shift block (
38
) and the 5 bits generated by the second 4-bit popcount block (
36
) to perform a 4-bit shift operation on those 12 bits to generate a 16 bit shifted pointer (“PS”). PS represents the pointer value that results when P is incremented by the population count value of V. Other prior art embodiments may use pointers and sparse vectors that are represented by a different amount of bits than the number of bits used in the preceding description.
SUMMARY OF THE INVENTION
In one aspect, the invention relates to a method for integrating population count operations with bit shift operations. This integration of population count operations with bit shift operations allows for increased efficiency when performing these types of operations.
In another aspect, the invention relates to a method for incrementing a pointer vector by a population count on a sparse vector. Oftentimes, a pointer is increased by the population count of a sparse vector, and this invention allows a pointer to be incrementing by shifting sets of bits from the pointer using population counts on sets of bits from a sparse vector. The vector can be divided into an indefinite amount of sets based on the amount of bits in the pointer and the amount of bits in the sparse vector.
In another aspect, the invention relates to a method for balancing loads at inputs of circuits so that the amount of bits remaining to be population counted at a certain time is reduced. By reducing the amount of bits left to be population counted, the speed of the circuit that performs population count and bit shift operations is increased.
In another aspect, the present invention balances the population count operations and bit shift operations such that they occur in parallel. This leads to a savings in computation time and resources.
In another aspect, the present invention relates to an apparatus that integrates population count circuitry with bit shift circuitry. Through this integration, separate circuitry for each type of operation is not necessary.
In another aspect, the present invention relates to an apparatus that comprises dynamic and static components such that the apparatus is
Ngo Chuong Dinh
Osha & May L.L.P.
Sun Microsystems Inc.
LandOfFree
Dynamic popcount/shift circuit does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Dynamic popcount/shift circuit, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Dynamic popcount/shift circuit will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3305346