A POWER EFFICIENT LINEAR EQUATION SOLVER ON A MULTI-FPGA ACCELERATOR

doi:10.2316/Journal.202.2010.1.202-2478

A. Sudarsanam, T. Hauser, A. Dasu, and S. Young

References

[1] S.M. Trimberger, Field-programmable gate array technology. Norwell, MA, USA: Kluwer Academic Publishers, 1994.
[2] P.S. Graham & M.B. Gokhale, Reconfigurable computing: Accelerating computation with field-programmable gate arrays. Springer, Dordrecht, The Netherlands, 1995.
[3] V. Kindratenko & D. Pointer, A case study in porting a production scientific supercomputing application to a reconfigurable computer, FCCM ’06: Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Washington, DC, USA: IEEE Computer Society, 2006, 13–22.
[4] S. Alam, P. Agarwal, M. Smith, J. Vetter, & D. Caliga, Using FPGA devices to accelerate biomolecular simulations, Computer, 40(3), March 2007, 66–73.
[5] A. Ditkowski, G. Fibich, & N. Gavish, Efficient solution of Ax^(k) = b^(k) using A^−1, Journal of Scientific Computing, 32(1), 2007, 29–44.
[6] T. Tierney, G. Dahlquist, A. Bjorck, & N. Anderson, Numerical Methods (New York: Courier Dover Publications, 2003).
[7] E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney, J. Du Croz, S. Hammerling, J. Demmel, C. Bischof, & D. Sorensen, LAPACK: A portable linear algebra library for high-performance computers, Supercomputing ’90: Proceedings of the 1990 conference on Supercomputing, Los Alamitos, CA, USA: IEEE Computer Society Press, 1990, 2–11.
[8] T. Hauser & M. Perl, Design of a low power cluster supercomputer for distributed processing of particle image velocimetry data, Journal of Aerospace Computing, Information, and Communication, 5(11), November 2008, 448–459.
[9] A. El-Amawy, A systolic architecture for fast dense matrix inversion, Computers, IEEE Transactions on, 38(3), March 1989, 449–455.
[10] K. Lau, M. Kumar, & R. Venkatesh, Parallel matrix inversion techniques, Algorithms and Architectures for Parallel Processing, 1996. ICAPP ’96. 1996 IEEE Second International Conference on, Jun. 1996, 515–521.
[11] F. Edman & V. Owall, FPGA implementation of a scalable matrix inversion architecture for triangular matrices, Proceedings of PIMRC, Beijing, China, 2003.
[12] L. Zhuo & V. Prasanna, High-performance designs for linear algebra operations on reconfigurable hardware, Computers, IEEE Transactions on, 57(8), Aug. 2008, 1057–1071.
[13] S. Choi & V.K. Prasanna, Time and energy efficient matrix factorization using FPGAs, Proceedings of Field Programmable Logic, 2003, 507–519.
[14] T. Hauser, A. Dasu, A. Sudarsanam, & S. Young, Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system, Journal of Scalable Computing: Practice and Experience, 8(4), 2007, 373–385.
[15] G. Govindu, V.K. Prasanna, V. Daga, S. Gangadharpalli, & V. Sridhar, Efficient floating-point based block LU decomposition on FPGAs, Proceedings of the Engineering of Reconfigurable Systems and Algorithms, 2004, 276–279.
[16] X. Wang & S.G. Ziavras, Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines: Research articles, Concurrency and Computation: Practice & Experience, 16(4), 2004, 319–343.
[17] X.-Q. Sheng & E. Kai-Ning Yung, Implementation and experiments of a hybrid algorithm of the MLFMA-enhanced FE-BI method for open-region inhomogeneous electromagnetic problems, Antennas and Propagation, IEEE Transactions on, 50(2), 2 Feb. 2002, 163–67.
[18] X.S. Li & J. Demmel, SuperLU_DIST: A scalable distributed memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software (TOMS), 29(2), 2003, 110–140.
[19] K. Forsman, W. Gropp, L. Kettunen, D. Levine, & J. Salonen, Solution of dense systems of linear equations arising from integral-equation formulations, Antennas and Propagation Magazine, IEEE, 37(6), Dec. 1995, 96–100.
[20] K. Chan, Parallel algorithms for direct solution of large sparse power system matrix equations, Generation, Transmission and Distribution, IEE Proceedings, 148(6), Nov. 2001, 615–622.
[21] K. Balasubramanya Murthy & C. Siva Ram Murthy, A new parallel algorithm for solving sparse linear systems, Circuits and Systems, 1995. ISCAS ’95., 1995 IEEE International Symposium on, 2, Apr. 3, May 1995, 1416–1419.
[22] Y. fai Fung, W. leung Cheung, M. Singh, & M. Ercan, A PC-based parallel LU decomposition algorithm for sparse matrices, Communications, Computers and Signal Processing, 2003. PACRIM. 2003 IEEE Pacific Rim Conference on, Vol. 2, Aug. 2003, 776–779.
[23] J.Q. Wu & A. Bose, Parallel solution of large sparse matrix equations and parallel power flow, Power Systems, IEEE Transactions on, 10(3), Aug. 1995, 1343–1349.
[24] S. Kratzer, Massively parallel sparse LU factorization, Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the, Oct. 1992, 136–140.
[25] X. Wang & S. Ziavras, A configurable multiprocessor and dynamic load balancing for parallel LU factorization, Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, April 2004, 234–241.
[26] L. Zhuo & V. Prasanna, Scalable hybrid designs for linear algebra on reconfigurable computing systems, Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on, Vol. 1, 12–15 July 2006, 87–95.
[27] X. Wang & S. Ziavras, Performance optimization of an FPGA-based configurable multiprocessor for matrix operations, Field-Programmable Technology (FPT), 2003. Proceedings. 2003 IEEE International Conference on, Dec. 2003, 303–306.
[28] G. Govindu, S. Choi, V. Prasanna, V. Daga, S. Gangadharpalli, & V. Sridhar, A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs, Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, April 2004, 149–156.
[29] J. Park & P.C. Diniz, Synthesis and estimation of memory interfaces for FPGA-based reconfigurable computing engines, in FCCM ’03: Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. Washington, DC, USA: IEEE Computer Society, 2003, 297.
[30] Z. Liu, K. Zheng, & B. Liu, FPGA implementation of hierarchical memory architecture for network processors, Field-Programmable Technology, 2004. Proceedings. 2004 IEEE International Conference on, Dec. 2004, 295–298.
[31] S. Heithecker, A.d.C. Lucas, & R. Ernst, A Mixed QoS SDRAM Controller for FPGA-based High-end Image Processing, Signal Processing Systems, 2003. SIPS 2003. IEEE Workshop on, Aug. 2003, 322–327.
[32] (2007) Hypercomputers from Starbridge. [Online]. Available: http://www.starbridgesystems.com/products/HypercomputerSpecSheet.pdf.
[33] (2007) Viva: A Graphical Programming Environment. [Online]. Available: http://www.starbridgesystems.com/products/VivaSpecSheet.pdf.
[34] (2007) Xilinx ISE 10.1 Manual. [Online]. Available: http://www.xilinx.com.
[35] K.D. Underwood & K.S. Hemmert, Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance, FCCM ’04: Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. Washington, DC, USA: IEEE Computer Society, 2004, 219–228.
[36] S. Young, A. Sudarsanam, A. Dasu, & T. Hauser, Memory support design for LU decomposition on the Starbridge hypercomputer, Field Programmable Technology, 2006. FPT 2006. IEEE International Conference on, Dec. 2006, 157–164.
[37] A. Sudarsanam, S. Young, A. Dasu, & T. Hauser, Multi-FPGA based high performance LU decomposition, 10th High Performance Embedded Computing (HPEC) workshop, 2006.
[38] S. Kamil, J. Shalf, & E. Strohmaier, Power efficiency in high performance computing, Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, April 2008, 1–8.
[39] J. Williams, A.D. George, J. Richardson, K. Gosrani, & S. Suresh, Computational density of fixed and reconfigurable multi-core devices for application acceleration, Proceedings of the Fourth Annual Reconfigurable Systems Summer Institute (RSSI ’08), 2008.
[40] (2008) Top 500 List. [Online]. Available: http://www.top500.org.
[41] (2007) Intel Math Kernel Library for Linux. [Online]. Available: http://developer.intel.com.

From Journal (202) International Journal of Computers and Applications - 2010