CODE OPTIMIZATION OF POLYNOMIAL APPROXIMATION FUNCTIONS ON CLUSTERED INSTRUCTION-LEVEL PARALLELISM PROCESSORS

M. Yang, J. Wang, S.Q. Zheng, and Y. Jiang

References

  [1] K. Kailas, K. Ebcioglu, & A. Agrawala, CARS: A new code generation framework for clustered ILP processors, Proc. 7th Int. Symp. on High-Performance Computer Architecture, Nuevo Leone, Mexico, 2001, 133–143. doi:10.1109/HPCA.2001.903258
  [2] B. Rau & J. Fisher, Instruction-level parallel processing: History, overview, and perspective, Journal of Supercomputing (Special Issue on Instruction-Level Parallelism), 7(1/2), 1993, 9–50.
  [3] Texas Instruments, TMS320C6701 floating-point digital signal processor (online), available at http://focus.ti.com/docs/prod.
  [4] Motorola, DSP56K central architecture overview, 1995.
  [5] Analog Devices, TigerSHARC processor (online), available at http://www.analog.com/processors/processors/tigersharc/index.html.
  [6] D. Das, K. Mukhopadhyaya, & B.P. Sinha, Implementation of four common functions on an LNS co-processor, IEEE Trans. on Computers, 44(1), 1995, 155–161. doi:10.1109/12.367997
  [7] Y.T. Hwang & Y.C. Chuang, High performance code generation for VLIW digital signal processors, Proc. IEEE Workshop on Signal Processing Systems (SiPS), Lafayette, LA, 2000, 683–692.
  [8] M. Lam, Software pipelining: An effective scheduling technique for VLIW machines, Proc. SIGPLAN '88 Conf. on Programming Language Design and Implementation, Atlanta, GA, 1988, 318–328. doi:10.1145/53990.54022
  [9] J.L. Hennessy & D.A. Patterson, Computer architecture: A quantitative approach, 3rd ed. (San Francisco, CA: Elsevier Science & Technology Books, 2002).
  [10] M. Schlansker, T.M. Conte, J. Dehnert, K. Ebcioglu, J.Z. Fang, & C.L. Thompson, Compilers for instruction-level parallelism, Computer, 30(12), 1997, 63–69. doi:10.1109/2.642817
  [11] S. Hanono & S. Devadas, Instruction selection, resource allocation, and scheduling in the AVIV retargetable code generator, Proc. 35th ACM/IEEE Design Automation Conf., San Francisco, CA, 1998, 510–515. doi:10.1145/277044.277184
  [12] S. Shang, S. Sun, & Q. Wang, An efficient parallel scheduling algorithm of dependent task graphs, Proc. 4th Int. Conf. on Parallel and Distributed Computing, Applications and Technologies, Chengdu, China, 2003, 595–598.
  [13] R. Sethi, Algorithms for minimal-length schedules (New York: John Wiley & Sons, 1976), chap. 2.
  [14] M. Abramowitz & I.A. Stegun, Handbook of mathematical functions (New York: Dover, 1965).
  [15] Engineering Fundamentals, Taylor series (online), available at http://www.efunda.com.
  [16] Texas Instruments, A collection of functions for the TMS320C30, 1990.
  [17] Texas Instruments, TMS320C62x/C67x library functions package with Code Composer Studio 1.2, 2001.
  [18] T.H. Cormen, C.E. Leiserson, R.L. Rivest, & C. Stein, Introduction to algorithms, 2nd ed. (Cambridge, MA: MIT Press, 2001).
  [19] M. Yang, Y. Wang, J. Wang, & S.Q. Zheng, Optimized scheduling and mapping of logarithm and arctangent functions on TI TMS320C67x processor, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, 2002, 3156–3159.
  [20] Texas Instruments, Code Composer Studio user's guide, 2000.
