Method and apparatus for using smart memories in computing

Electrical computers and digital processing systems: memory – Storage accessing and control – Access timing

Reexamination Certificate

C709S241000

Status: active

Patent number: 06807614

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a computing system and, more particularly, to a computing system that uses computing processors residing in data storage devices to process data in a highly parallel fashion.
2. Description of the Related Art
A computing system generally includes a Central Processing Unit (CPU), a cache, a main memory, a chip set, and a peripheral. The computing system normally receives data input from the peripheral and supplies the data to the CPU, where the data is processed. The processed data can then be stored back to the peripheral. The CPU can, for example, be an Arithmetic Logic Unit (ALU), a floating-point processor, a Single-Instruction-Multiple-Data (SIMD) execution unit, or a special functional unit. The peripheral can be a memory peripheral, such as a hard disk drive or any nonvolatile massive data storage device, to provide mass data storage, or an I/O peripheral device, such as a printer or graphics sub-system, to provide I/O capabilities. The main memory provides less data storage than the hard drive peripheral but has a faster access time. The cache provides even less data storage capability than the main memory, but at a much faster access time. The chip set contains supporting chips for the computing system and, in effect, expands the small number of I/O pins of the CPU so that the CPU can communicate with many peripherals.
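As a rough illustration of this storage hierarchy, the following C sketch models each level with an assumed capacity and access time; the figures are order-of-magnitude values chosen for this example only, not data taken from the patent.

/* Illustrative model of a conventional storage hierarchy.
 * Capacities and access times are assumed, order-of-magnitude
 * figures for illustration only; real systems vary widely. */
#include <stdio.h>

struct storage_level {
    const char *name;
    double capacity_bytes;   /* approximate capacity */
    double access_time_ns;   /* approximate access latency */
};

int main(void) {
    struct storage_level hierarchy[] = {
        { "register",    256.0,                              0.3 },
        { "cache",       1024.0 * 1024.0,                    1.0 },
        { "main memory", 1024.0 * 1024.0 * 1024.0,          60.0 },
        { "hard disk",   256.0 * 1024.0 * 1024.0 * 1024.0,  5.0e6 },
    };
    /* The farther a level sits from the CPU, the larger its capacity
     * and the slower its access time. */
    for (int i = 0; i < 4; i++)
        printf("%-12s %22.0f bytes %12.1f ns\n", hierarchy[i].name,
               hierarchy[i].capacity_bytes, hierarchy[i].access_time_ns);
    return 0;
}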
FIG. 1 illustrates a conventional system architecture of a general computing system. In FIG. 1, block 10 is a CPU. Block 11 is a cache that has a dedicated high-speed bus connecting it to the CPU for high performance. Block 12 is a chip set that connects the CPU with the main memory 13 and a fast peripheral 14, such as a graphics subsystem. Block 15 is another chip set that expands the bus, such as an RS-232 or parallel port, for slower peripherals. Note that the components discussed above are very general building blocks of a computing system. Those skilled in the art understand that a computing system may have different configurations and building blocks beyond these general building blocks.
An execution model indicates how a computing system works.
FIG. 2 illustrates an execution model of a typical scalar computing system. Between a CPU 10 and a hard disk 17, there are many different levels of data storage devices, such as a main memory 13, a cache 11, and a register 16. The farther the memory devices are positioned from the CPU 10, the greater their capacity and the slower their access speed. The CPU 10 fetches data from the hard disk 17, processes the data to obtain resulting data, and stores the resulting data into the various intermediate data storage devices, such as the main memory 13, the cache 11 or the register 16, depending on how often and for how long the data will be used. Each level of storage is a superset of the smaller and faster devices nearer to the CPU 10. The efficiency of this buffering scheme depends on temporal and spatial locality. Temporal locality means that data accessed now are very likely to be accessed again later. Spatial locality means that data in the same neighborhood as data accessed now are very likely to be accessed later. In today's technology, the CPU 10, the register 16, and two levels of cache 11 are integrated into a monolithic integrated circuit.
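To make the locality argument concrete, the following sketch contrasts a traversal with good spatial locality (row-major order, which touches neighboring memory locations together and so reuses data brought into the faster levels) with one that has poor spatial locality (column-major order, which strides across rows). It is an illustrative example written for this description, not code from the patent.

/* Illustrative access patterns: row-major traversal has good spatial
 * locality, since consecutive accesses touch adjacent addresses and
 * reuse data fetched into the cache; column-major traversal strides
 * across rows and reuses far less of each fetch. */
#define N 1024

long sum_row_major(int a[N][N]) {
    long sum = 0;
    for (int i = 0; i < N; i++)          /* good spatial locality */
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

long sum_col_major(int a[N][N]) {
    long sum = 0;
    for (int j = 0; j < N; j++)          /* poor spatial locality */
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    static int a[N][N];                  /* zero-initialized 4 MB array */
    return (int)(sum_row_major(a) + sum_col_major(a));
}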
FIG. 3 shows an execution model of a vector computer. A vector computer has an array of vector CPUs 210, an array of vector registers 216, a main memory 13, and a hard drive 17. The size of the vector array is usually a power of 2, such as 16 or 32, for example. The vector CPUs 210 fetch the data from the hard drive 17 through the main memory 13 to the vector registers 216 and then process an array of the data at the same time. Hence, the processing speed of the vector computer can be improved by a factor equal to the size of the array. Note that a vector computer can also have a scalar unit, such as the computer system described in FIG. 2, as well as many vector units such as those described in FIG. 3. Some vector computers also make use of caches.
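A simple way to picture this model is an element-wise operation applied to a whole vector register of operands per instruction. The sketch below stands in for the vector hardware with plain C loops and an assumed vector width of 16; it is illustrative only and not taken from the patent.

/* Illustrative sketch of vector-style execution: one "vector add"
 * processes VLEN elements per operation, so a loop over n elements
 * issues roughly n / VLEN vector operations instead of n scalar ones.
 * VLEN = 16 is an assumed width, one of the powers of 2 mentioned above. */
#include <stddef.h>

#define VLEN 16   /* assumed vector register width */

/* One vector operation: c[0..VLEN-1] = a[0..VLEN-1] + b[0..VLEN-1] */
static void vadd(const float *a, const float *b, float *c) {
    for (int i = 0; i < VLEN; i++)
        c[i] = a[i] + b[i];
}

/* Add two arrays whose length is a multiple of VLEN. */
void vector_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i += VLEN)
        vadd(a + i, b + i, c + i);       /* n / VLEN vector operations */
}

int main(void) {
    float a[64], b[64], c[64];
    for (int i = 0; i < 64; i++) { a[i] = (float)i; b[i] = 1.0f; }
    vector_add(a, b, c, 64);
    return c[63] == 64.0f ? 0 : 1;
}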
A vector computer is able to exploit data parallelism to speed up those special applications that can be vectorized. However, vector computers replicate many expensive hardware components, such as vector CPUs and vector register files, to achieve high performance. Moreover, vector computers require very high data bandwidth to support the vector CPUs. The end result is a very expensive, bulky, and power-hungry computing system.
In recent years, logic has been embedded into memories to provide a special purpose computing system that performs specific processing. Memories that include processing capabilities are sometimes referred to as "smart memory" or intelligent RAM. Research on embedding logic into memories has led to several technical publications, namely: (1) Duncan G. Elliott, "Computational RAM: A Memory-SIMD Hybrid and its Application to DSP," Custom Integrated Circuit Conference, Session 30.6, 1992, which simply describes a memory chip integrating bit-serial processors without any system architecture considerations; (2) Andreas Schilling et al., "Texram: A Smart Memory for Texturing," Proceedings of the Sixth International Symposium on High Performance Computer Architecture, IEEE, 1996, which describes a special purpose smart memory for texture mapping used in a graphics subsystem; (3) Stylianos Perissakis et al., "Scalable Processors to 1 Billion Transistors and Beyond: IRAM," IEEE Computer, September 1997, pp. 75-78, which is simply a highly integrated version of a vector computer without any enhancement at the architecture level; (4) Mark Horowitz et al., "Smart Memories: A Modular Configurable Architecture," International Symposium on Computer Architecture, June 2000, which describes a project that tries to integrate general purpose multi-processors and multi-threads on the same integrated circuit chip; and (5) Lewis Tucker, "Architecture and Applications of the Connection Machines," IEEE Computer, 1988, pp. 26-28, which describes massively distributed array processors built from many processors, memories, and routers connected among them. The granularity of the memory size, the bit-serial processors, and the I/O capability is so fine that these processors end up spending more time communicating than processing data.
Accordingly, there is a need for computing systems with improved efficiency and reduced costs as compared to conventional vector computers.
SUMMARY OF THE INVENTION
The invention pertains to a smart memory computing system that uses smart memory for massive data storage as well as for massively parallel execution. The data stored in the smart memory can be accessed just like conventional main memory, but the smart memory also has many execution units to process data in situ. The smart memory computing system offers improved performance and reduced costs for those programs having massive data-level parallelism. The invention is able to take advantage of data-level parallelism to improve execution speed through, for example, inventive aspects such as algorithm mapping, compiler techniques, architecture features, and specialized instruction sets.
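The distinction between the conventional model and in-situ processing can be sketched roughly as follows. The smart_memory structure and function names are hypothetical, invented here only to illustrate the idea of applying an operation where the data already resides instead of moving the data to the CPU and back; they do not describe an actual interface defined by the patent.

/* Hypothetical sketch of in-situ processing. The smart_memory type
 * and the functions below are invented for illustration only. */
#include <stddef.h>

typedef void (*elem_op)(float *x);          /* operation on one element */

struct smart_memory {
    float *data;                            /* data held in the smart memory */
    size_t count;
};

/* Conventional model: each element is fetched over the memory bus,
 * processed by the CPU, and written back over the bus. */
static void cpu_process(struct smart_memory *m, elem_op op) {
    for (size_t i = 0; i < m->count; i++) {
        float reg = m->data[i];             /* fetch to a CPU register */
        op(&reg);                           /* process in the CPU */
        m->data[i] = reg;                   /* write back to memory */
    }
}

/* Smart memory model: the same request is handed to execution units
 * inside the memory, which process the elements in place; the loop
 * below stands in for what would conceptually run in parallel. */
static void smart_memory_execute(struct smart_memory *m, elem_op op) {
    for (size_t i = 0; i < m->count; i++)
        op(&m->data[i]);                    /* processed in situ */
}

static void square(float *x) { *x = *x * *x; }

int main(void) {
    float buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct smart_memory m = { buf, 8 };
    smart_memory_execute(&m, square);       /* data never leaves the memory */
    cpu_process(&m, square);                /* contrast: data crosses the bus twice */
    return 0;
}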
The invention can be implemented in numerous ways, including as a method, a system, a device, and a computer readable medium. Several embodiments of the invention are discussed below.
As a smart memory computing system to process data in parallel, one embodiment of the invention includes at least: a central processing unit; a main memory unit that provides data storage for the central processing unit; a smart memory unit to not only store data for the central processing unit but also to process data therein; and a massive data storage that provides storage for a superset of data stored in the main memory system and in the smart memory system.
As a smart memory computing system to process data in parallel, another embodiment of the invention includes at least: a central processing unit; a main memory unit that provides data storage for the centra
