A. Sudarsanam, T. Hauser, A. Dasu, and S. Young