Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Reexamination Certificate
1999-03-25
2001-08-21
Peikari, B. James (Department: 2186)
C365S220000
active
06279088
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to computer structures, and in particular to a parallel processing memory chip containing single instruction, multiple data path processors.
DESCRIPTION OF THE PRIOR ART
In conventional Von Neumann computer architectures, the speed of the processor is often restricted by the bandwidth of the interconnecting data bus, which is typically 8 to 64 bits in word width. In order to increase the speed of computers restricted by such constraints, parallel computer architectures have been designed, for example, those described briefly below.
In a structure called The Connection Machine, 64K processors are used with 4K bits of memory allocated to each processor. The memory permits two read functions and a write function in one processor cycle to support three operand instructions. The Connection Machine integrated circuit chip contains 16 processors and a hypercube routing node. A high performance interconnect network is a major feature of the architecture. The peak performance of The Connection Machine is about 1,000 MIPS, using a 32 bit addition function as a reference. A description of The Connection Machine may be found in the Scientific American article “Trends in Computers”, by W. Daniel Hillis, Special Issue/Vol. 1, page 24ff.
A structure referred to as the Massively Parallel Processor (MPP) constructed by Goodyear Aerospace contains several 128×128 processor planes. The MPP was designed to process Landsat images; it makes heavy use of its two dimensional grid connectivity. Processors are packaged eight to a chip.
The ICL Distributed Array Processor was designed to be an active memory module for an ICL type 29000 mainframe. Its first implementation was a 32×32 grid built from MSI TTL components. A CMOS version has since been made containing 16 processors. Each 1 bit processor consists of a full adder, a multiplexer to select data from neighbors, and three registers.
A computer MP-1 is described by MasPar Computer Corporation in preliminary product literature, the product being formed of chips containing 32 processors which will be assembled into machines with 1K-16K processors. The machine utilizes two instruction streams. Each processing element can elect to obey either of the streams, so both halves of an if-then-else statement can be concurrently followed without nesting.
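The two-stream behaviour described above can be pictured with a minimal sketch. It is not MasPar's implementation: the function name `run_two_streams` and the list-based model of processing elements are assumptions made purely for illustration. Each element holds one value and a locally computed condition, and obeys either the "then" stream or the "else" stream.

```python
def run_two_streams(values, predicate, then_op, else_op):
    """Model processing elements that each obey one of two broadcast streams.

    Hypothetical model: `values` holds one datum per processing element,
    `predicate` is the local condition, and `then_op` / `else_op` stand in
    for the two instruction streams.
    """
    # Every element evaluates its own condition and follows one stream,
    # so both halves of the if-then-else are covered in the same step.
    return [then_op(v) if predicate(v) else else_op(v) for v in values]


# Example: double the non-negative values, negate the negative ones.
result = run_two_streams(
    [1, -2, 3, -4],
    predicate=lambda v: v >= 0,
    then_op=lambda v: v * 2,
    else_op=lambda v: -v,
)
```

In a single-stream machine the two branches would instead be issued one after the other, with the elements not taking a branch sitting idle during it.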
NCR Corporation has produced a chip containing 6×12 serial processors which is called the Geometric Arithmetic Parallel Processor (GAPP). Each processor can communicate with its four nearest neighbors on its two dimensional grid and with a private 128 bit memory. The processing elements operate on instructions with five fields. Due to their complexity, these processing elements take up slightly more than half the chip. It has been found that yields are low and the cost is high.
In an article entitled “Building a 512×512 Pixel-Planes System” in Advanced Research in VLSI: Proceedings of the 1987 Stanford Conference, pages 57-71, 1987, by John Poulton et al., a pixel planes machine is described which integrates processing elements with memory. The machine was designed for computer graphics rendering. The pixel planes machine is connected to a host processor via a DMA channel. It is noted that for many operations, data transfer between the host and the pixel planes machine dominates the execution time.
SUMMARY OF THE INVENTION
In the aforenoted structures, while each uses plural processors, separate memory is accessed by the processors. Locating memory on different chips than the processor elements limits the degree of integration. The data path between the memory chips and the processors limits the bandwidth available at the sense amplifiers. In contrast, in an embodiment of the present invention, one processing element per sense amplifier can be achieved, the processing elements carrying out the same instruction on all bits of a memory row in parallel. Therefore an entire memory row (e.g. word) at a time can be read and processed in a minimum time, maximizing the parallel processing throughput to virtually the maximum bandwidth capacity of the memory.
While in prior art structures an entire memory row is addressed during each operation, typically only one bit at a time is operated on. The present invention exploits the unused memory bandwidth by operating on all bits in the entire row in parallel. Further, the memory is the same memory accessed by the main computer processor, and not special memory used for the parallel processing elements as in the prior art.
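The contrast between using one bit per row access and using the whole row can be sketched as follows. This is an illustrative model only: the 16-bit row width, the choice of XOR as the instruction, and the simplified access count are assumptions, not figures from the patent.

```python
ROW_BITS = 16  # assumed row width for illustration; real memory rows are much wider
ROW_MASK = (1 << ROW_BITS) - 1


def bit_at_a_time(row_a, row_b):
    """XOR two rows while consuming only one bit per row access."""
    result, row_accesses = 0, 0
    for bit in range(ROW_BITS):
        row_accesses += 1  # the whole row is addressed, but only one bit is used
        a = (row_a >> bit) & 1
        b = (row_b >> bit) & 1
        result |= (a ^ b) << bit
    return result, row_accesses


def row_at_a_time(row_a, row_b):
    """XOR two rows with one conceptual processing element per bit position."""
    # One access exposes every bit; the same instruction is applied to all of them.
    return (row_a ^ row_b) & ROW_MASK, 1


a, b = 0b1100110011001100, 0b1010101010101010
serial_result, serial_accesses = bit_at_a_time(a, b)
parallel_result, parallel_accesses = row_at_a_time(a, b)
```

Both functions produce the same row, but in this simplified count the bit-at-a-time version needs `ROW_BITS` accesses where the row-at-a-time version needs one.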
By locating the processors on the same chip as the memory, the present invention exploits the extremely wide data path and high data bandwidth available at the sense amplifiers.
In one embodiment of the present invention, integrated into the memory chip is one processing element per sense amplifier. The memory is preferred to be the main computer memory, accessible by the central processing unit.
Alternatively, each processor element can be connected to more than one sense amplifier. When sense amplifiers belong to different arrays (or “cores”) of memory, some of those cores need not perform a memory cycle, thereby reducing sensing power draw from a power supply.
In the prior art each parallel processor has its own memory, and the processors must communicate with each other; this communication is slow and is limited by the inter-processor bus word length. In the present invention the main memory is used directly and may be accessed by a conventional single microprocessor at the same rate as conventional memories. Yet virtually the maximum bandwidth of the memory can be utilized using the parallel on-chip processing elements.
It should be noted that in the aforenoted NCR GAPP device, processors are located on the same chip as the memory. However, because of the size of the processors, each processor communicates with 8 sense amplifiers, and requires extensive multiplexing. This slows the chip down because the maximum bandwidth of the memory cannot be utilized. In order to minimize the number of sense amplifiers dealt with by a single processor, the structure is limited to use with static memory cells, since the static memory cells are considerably wider in pitch than dynamic memory cells. Still, a very large number of sense amplifiers must be multiplexed to each processor element. Due to the smaller sense amplifier pitch required in a prior art DRAM chip, processors have not been put into a DRAM chip.
The present invention utilizes a unique form of processing element, based on a dynamic multiplexer, which we have found can be made substantially narrower in pitch than previous processing elements, such that the number of sense amplifiers per processing element can be reduced to 1, for static random access memories, and to 4 or fewer for dynamic random access memories. For the 1:1 ratio no multiplexing is required, and therefore in 1 memory cycle, with a single instruction given to all the processing elements, all the bits of a row can be read, processed and written back to memory in parallel. For the larger ratio, multiplexing of processing elements to sense amplifiers is required, but for the first time dynamic random access memories can have processing elements on the same chip, and can have a substantially increased number of parallel processing elements. For the dynamic memory, a typical ratio of sense amplifiers to processing elements would be 8:1 or 4:1, although as close to 1:1 as possible is preferred. The bandwidth of the processor to memory interface is thereby substantially increased, enormously increasing the processing speed.
Further, the invention allows direct memory access of the same memory having the on-chip processors by a remote processor. This makes the memory even more versatile, allowing flexibility in programming and applications.
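The effect of the number of sense amplifiers per processing element can be sketched as below. This is a rough model under stated assumptions, not the patented circuit: the function name `process_row`, the list-of-bits row, and the rule that each processing element serves a contiguous group of sense amplifiers are all hypothetical choices made for illustration.

```python
def process_row(row, op, sense_amps_per_pe):
    """Apply `op` to every bit of `row`, time-multiplexing each processing element.

    Returns the processed row and the number of multiplexer passes used.
    Assumes each processing element serves a contiguous group of sense amplifiers.
    """
    if sense_amps_per_pe < 1 or len(row) % sense_amps_per_pe != 0:
        raise ValueError("row width must be a positive multiple of sense_amps_per_pe")
    num_pes = len(row) // sense_amps_per_pe
    out = list(row)
    passes = 0
    for k in range(sense_amps_per_pe):   # one pass per multiplexer setting
        for pe in range(num_pes):        # in hardware, all elements act at once
            i = pe * sense_amps_per_pe + k
            out[i] = op(row[i])
        passes += 1
    return out, passes


row = [0, 1, 1, 0, 1, 0, 0, 1]
invert = lambda bit: 1 - bit
direct_out, direct_passes = process_row(row, invert, 1)  # one element per sense amplifier
muxed_out, muxed_passes = process_row(row, invert, 4)    # four sense amplifiers per element
```

With one sense amplifier per processing element the whole row is handled in a single pass; with four per element the same result takes four passes using a quarter as many elements.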
In accordance with another embodiment of the invention, a novel simultaneous bidirectional buffer is described, which can logically connect two buses and actively drive the signal in either direction, either into or out from each processin
Elliott Duncan G.
Snelgrove W. Martin
Baker Harold C.
Hendry Robert G.
Mosaid Technologies Incorporated
Peikari B. James
Wilkes Robert A.