Computer-aided design and analysis of circuits and semiconductor – Nanotechnology related integrated circuit design
Reexamination Certificate
2008-07-08
2008-07-08
Chiang, Jack (Department: 2825)
Computer-aided design and analysis of circuits and semiconductor
Nanotechnology related integrated circuit design
C716S030000, C711S101000, C711S165000
Reexamination Certificate
active
07398484
ABSTRACT:
A schedule can be generated for physically transposing an array such that when the array is transferred from a first memory type to a second memory type, the number of block transfers performed is minimized. The array can be rearranged to ensure that most or all data elements in any block read into internal memory are used before that block's internal storage is reused. The algorithm can compute an offline schedule and then execute that schedule. The method can assemble the schedule during one or more passes with an algorithm. Scheduling passes can apply a permutation to a representation of the array's structure and then tile the representation to ensure efficient use of internal memory. Tiling may alter the permutation, so the algorithm can be reinvoked and run on the tiled representation. The algorithm can be run on successively retiled representations until no additional tiling is required. Optimizations for schedule generation and/or execution include: simplification of the data structure representing the array; alteration of the order in which tiles are accessed; inversion of permutations; processing non-canonical input; and the production of stream output. The method can also be modified for specialized architectures, including: block I/O architecture; fully associative caches; set-associative caches; and multi-level memory hierarchies.
REFERENCES:
patent: 6438747 (2002-08-01), Schreiber et al.
patent: 6507947 (2003-01-01), Schreiber et al.
patent: 2005/0102572 (2005-05-01), Oberlaender
patent: 2006/0026154 (2006-02-01), Altinel et al.
patent: 2006/0031652 (2006-02-01), Richter et al.
patent: 2006/0069874 (2006-03-01), Desai
patent: 2006/0143390 (2006-06-01), Kottapalli
patent: 2006/0224834 (2006-10-01), O'Connor et al.
Wolfe “High Performance Compilers for Parallel Computing,” pp. 137-165, 307-366, 367-384 and 514-527, Addison-Wesley, Redwood City, CA, 1996.
Aggarwal and Vitter, “The input/output complexity of sorting and related problems,”Communications of the ACM31(9):1116-1127, Sep. 1998.
Chatterjee et al., “Recursive array layouts and fast parallel matrix multiplication,” InACM Symposium on Parallel Algorithms and Architectures, pp. 222-231, 1999.
Cormen “Virtual Memory for Data-Parallel Computing,” PhD Thesis, Massachusetts Institute of Technology, 1992, mIT/LCS/TR-559.
Cormen “Fast permuting on disk arrays,”Journal of Parallel and Distributed Computing17(1-2):41-57, 1993.
Chatterjee and Sen, “Cache-efficient matrix transposition.” InHPCA, pp. 195-205, 2000.
Frigo et al., “Cache-oblivious algorithms,” InProceedings of the 40thAnnual Symposium on Foundations of Computer Science, p. 285, 1999.
Fraser “Array permutation by index-digit permutation,”Journal of the ACM, 23(2):298-309, 1976.
Jouppi “Cache write policies and performance,” Technical Report 91/12, DEC WRL, 1991.
Kaushik et al., “Efficient transposition algorithms for large matrices.” InProceedings of the 1993 ACM/IEEE Conference on Supercomputing, pp. 656-665, 1993.
Lin et al. “Efficient representation scheme for multidimensional array operations,”IEEE Transactions on Computers, 51(3):327-345, 2001.
Lam et al., “The cache performand and optimization of blocked algorithms,” InProceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 63-74, 1991.
Muchnick “Advanced Complier Design and Implementation,” Morgan Kaufmann, San Francisco, 1997.
Sen et al. “Towards a theory of cache-efficient algorithms,”Journal of the ACM, 9(6):828-858, 2002.
Swarztrauber “Transposing large arrays in extended memory,” pp. 283-288, 1988.
Wolf and Lam, “A data locality optimizing algorithm,” InProceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pp. 30-44, 1991.
Cormen et al., “Asymptotically tight bounds for performing BMMC permutations on parallel disk systems,”SIAM Journal on Computing, 28(1):105-136, 1998.
Chiang Jack
Klarquist & Sparkman, LLP
Memula Suresh
Microsoft Corporation
LandOfFree
Memory efficient array transposition via multi pass tiling does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Memory efficient array transposition via multi pass tiling, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Memory efficient array transposition via multi pass tiling will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2806207