Method of constructing a very wide, very fast distributed...

Static information storage and retrieval – Addressing – Particular decoder or driver circuit

Reexamination Certificate

Details

C365S230010, C365S230080, C711S169000

active

06707754

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of semiconductor memory devices and, more particularly, to a very wide, very fast distributed memory and a method of constructing the same.
2. Description of the Related Art
In certain processor-based applications there is a need for a very wide memory having relatively few memory locations that are distributed throughout a single chip. For example, in a single instruction, multiple data (SIMD) massively parallel processor (MPP) array, each of a very large number of processing elements (PEs) typically contains a register file. The register file typically contains only a few memory words (e.g., sixty-four) organized as a single column of one-word rows. Because all of the PEs execute the same instruction at the same time, each PE uses the same address to access its respective register file. In addition, each register file is read from or written to at essentially the same time. In effect, the distributed register files act as a single, very wide memory device.
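The way distributed register files behave as one wide memory can be pictured with a minimal sketch. The function names (`make_register_files`, `broadcast_write`, `broadcast_read`) and the choice of four PEs in the usage note are illustrative assumptions, not part of the patent text; only the sixty-four-word file size comes from the description above.

```python
def make_register_files(num_pes, words_per_file=64):
    """Create one small register file (a single column of words) per PE."""
    return [[0] * words_per_file for _ in range(num_pes)]


def broadcast_write(register_files, address, values):
    """Write one value per PE at the same shared address in every file."""
    if len(values) != len(register_files):
        raise ValueError("need exactly one value per register file")
    for register_file, value in zip(register_files, values):
        register_file[address] = value


def broadcast_read(register_files, address):
    """Read the same address from every file; the result is one 'wide word'."""
    return [register_file[address] for register_file in register_files]
```

For instance, with four hypothetical PEs, `broadcast_write(files, 5, [1, 2, 3, 4])` followed by `broadcast_read(files, 5)` returns one entry per PE, which is the sense in which the separate files act as a single wide memory addressed by one shared address.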
It is impractical to implement this very wide memory as a single random access memory (RAM) array core. Such a large memory would be very slow and the routing difficulties associated with connecting thousands of data lines through the chip would be formidable. Therefore, several smaller memory cores are needed, with each core serving a small group of PEs. The use of several smaller memory cores, however, is not without its shortcomings. For instance, the address decoding logic responsible for decoding an address and selecting the appropriate word to be accessed from the memory array has to be repeated for every core, which takes up precious space on the chip.
A normal memory core 10 is illustrated in FIG. 1. A decode circuit 12 is positioned to one side of the memory bit array 20 and sense amplifiers and other select logic 30 are positioned beneath the array 20. Note that the address lines 14 are driven in vertically, along the length of the decoder circuit 12, to the decode logic 16 within the decode circuit 12. The address lines 14 are decoded by the decode circuit 12 and converted into a word line number/address corresponding to one of the word lines 18 in the core 10. A word select signal is then driven across the word line 18 and through the memory array 20 to activate the appropriate word or row of memory within the array 20.
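As a rough behavioral sketch of the decode step just described, the decode circuit can be modeled as turning a binary address into a one-hot set of word lines, exactly one of which activates a row of the array. The helper names `decode_address` and `read_word` are hypothetical, and the model ignores all electrical detail (bit lines, sense amplifiers, timing).

```python
def decode_address(address, num_words=64):
    """Model the decode circuit: binary address -> one-hot word-line vector."""
    if not 0 <= address < num_words:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(num_words)]


def read_word(array, word_lines):
    """Model the array: return the row whose word line is asserted."""
    active_rows = [row for row, line in enumerate(word_lines) if line]
    if len(active_rows) != 1:
        raise ValueError("exactly one word line must be asserted")
    return array[active_rows[0]]
```

In this sketch a conventional core performs both steps on every access, for example `read_word(array, decode_address(2, 4))` on a four-word array.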
For a read operation, the activated row couples all of the memory cells corresponding to the word line 18 to respective bit lines 22, which typically define the columns of the array 20. It should be noted that a register file typically consists of a single column and that column address decoding is typically not required. For a dynamic random access memory (DRAM), when a particular row is activated, the sense amplifiers 30 connected to the bit lines 22 detect and amplify the data bits transferred from the array 20 by measuring the potential difference between the activated bit lines 22 and a reference line (which may be an inactive bit line). As is known in the art, for a static random access memory (SRAM), the sense amplifier circuitry 30 would not be required. The read operation is completed by outputting the accessed data bits over input/output (I/O) lines 32.
Since the typical memory core 10 contains the decode circuit 12 and performs the address decode operation as part of the memory access operation (e.g., data read or write), the core 10 has a relatively long access time.
FIG. 2 illustrates an example of a timing diagram for the conventional memory core 10 illustrated in FIG. 1. For this example it is presumed that the memory core 10 is a SRAM device. The core 10 is driven by a clock signal CLOCK, and the read operation begins at time $t_0$ and ends at time $t_1$. The typical access time $t_{access}$ for the conventional memory core 10 includes the time required for the memory core circuitry to properly latch the address signals $t_{hold}$ (often referred to as the “hold time”), the time required to decode the address lines $t_{adec}$, the time required to drive the corresponding word line(s) $t_{wrd}$, the time required to drive the bit lines $t_{bit}$, and the time required by the output logic to output the accessed information $t_{op}$. Thus, for the conventional memory core 10 (FIG. 1), the access time $t_{access}$ is calculated as follows:
$$t_{access} = t_{hold} + t_{adec} + t_{wrd} + t_{bit} + t_{op}. \qquad (1)$$
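A small worked example may make equation (1) concrete. The delay values below are purely hypothetical figures (in picoseconds) chosen for easy arithmetic; the patent text gives no numbers, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreDelays:
    """Component delays of equation (1), in picoseconds (hypothetical values)."""
    t_hold: int  # time to latch the address signals
    t_adec: int  # time to decode the address lines
    t_wrd: int   # time to drive the word line(s)
    t_bit: int   # time to drive the bit lines
    t_op: int    # time for the output logic to output the data


def access_time(delays):
    """Equation (1): sum of all five component delays."""
    return (delays.t_hold + delays.t_adec + delays.t_wrd
            + delays.t_bit + delays.t_op)


# Hypothetical numbers, not taken from the patent.
example = CoreDelays(t_hold=100, t_adec=400, t_wrd=200, t_bit=300, t_op=150)
total = access_time(example)
print(total)  # 1150
```

With these assumed figures the address decode term accounts for 400 of the 1150 ps, which illustrates why the decode delay is a natural target when trying to shorten the access time.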
It is desirable to reduce the access time $t_{access}$ of the memory core so that the core could be used in a very wide, very fast, distributed memory device. It is also desirable to reduce the access time $t_{access}$ of the memory core so that the core could be used as a very wide, very fast, distributed register file in a SIMD MPP device.
Accordingly, there is a desire and need for a memory core having a substantially reduced access time so that the core can be implemented in a very wide, very fast, distributed memory device.
SUMMARY OF THE INVENTION
The present invention provides a memory core having a substantially reduced access time.
The present invention also provides a very wide, very fast, distributed memory device.
The present invention also provides a very wide, very fast, distributed register file in a SIMD MPP device.
The above and other features and advantages of the invention are achieved by providing a memory core with an access time that does not include a delay associated with decoding address information. Address decode logic is removed from the memory core and the address decode operation is performed in an addressing pipeline stage that occurs during a clock cycle prior to a clock cycle associated with a memory access operation for the decoded address. After decoding the address in a first pipeline stage, the external decode logic drives word lines connected to the memory core in a subsequent pipeline stage. Since the core is being driven by word lines, the appropriate memory locations are accessed without decoding the address information within the core. Thus, the delay associated with decoding the address information is removed from the access time of the memory core.
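The two-stage arrangement described above can be sketched as a simple cycle-by-cycle behavioral model: in one clock cycle the external logic decodes an address into word lines, and in the following cycle those word lines drive a core that contains no decoder of its own. The names `one_hot` and `run_pipeline`, and the representation of the pipeline register as a Python tuple, are assumptions made for illustration rather than details from the patent.

```python
def one_hot(address, num_words):
    """External decode logic: binary address -> one-hot word-line vector."""
    if not 0 <= address < num_words:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(num_words)]


def run_pipeline(core_rows, addresses):
    """Simulate pipelined reads; return (cycle, address, data) per access.

    Stage 1 (decode) handles the address presented in the current cycle.
    Stage 2 (access) uses the word lines decoded in the previous cycle,
    so the core itself never decodes an address.
    """
    num_words = len(core_rows)
    pending = None  # pipeline register between the decode and access stages
    results = []
    # One extra idle cycle at the end drains the last decoded address.
    for cycle, address in enumerate(list(addresses) + [None]):
        if pending is not None:
            previous_address, word_lines = pending
            data = core_rows[word_lines.index(1)]  # word lines select the row
            results.append((cycle, previous_address, data))
        if address is not None:
            pending = (address, one_hot(address, num_words))
        else:
            pending = None
    return results
```

In this model an address presented in cycle n is read out in cycle n+1, while the next address is being decoded in parallel. Each access therefore pays one cycle of latency for the external decode, but the decode delay no longer contributes to the access time of the core itself.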


