A POWER EFFICIENT LINEAR EQUATION SOLVER ON A MULTI-FPGA ACCELERATOR

A. Sudarsanam, T. Hauser, A. Dasu, and S. Young
