A. Hossain,∗ D. Pease,∗∗ and A. El Kateeb∗


  1. [1] E. Rotenberg & S. Bennett, A trace cache microarchitecture and evaluation, IEEE Transactions on Computers, 48(2), February 1999. doi:10.1109/12.752652
  2. [2] S.J. Patel, Trace Cache design for wide-issue superscalar processors, Ph.D. dissertation, Department of Computer Science and Engineering, University of Michigan, 1999.
  3. [3] E. Rotenberg, et al., Trace processors, Proc. 30th IEEE/ACM International Symposium on Microarchitecture, 1997.
  4. [4] Q. Jacobson, E. Rotenberg, & J. Smith, Path-based next trace prediction, Proc. 30th International Symposium on Microarchitecture, December 1997.
  5. [5] T. Sato, Evaluating trace cache on moderate-scale processors, IEE Proceedings on Computers and Digital Techniques, 147(6), November 2000, 369–374. doi:10.1049/ip-cdt:20000889
  6. [6] T. Conte, K. Menezes, P. Mills, & B. Patel, Optimization of instruction fetch mechanisms for high issue rates, Proc. 22nd International Symposium on Computer Architecture, June 1995.
  7. [7] J.A. Fisher, Trace scheduling: A technique for global microcode compaction, IEEE Transactions on Computers, C-30(7), July 1981.
  8. [8] R.E. Hank, S.A. Mahlke, R.A. Bringmann, J.C. Gyllenhaal, & W.W. Hwu, Superblock formation using static program analysis, Proc. 26th Annual ACM/IEEE International Symposium on Microarchitecture, 1993.
  9. [9] S.A. Mahlke, D.C. Lin, W.Y. Chen, R.E. Hank et al., Effective compiler support for predicated execution using the hyperblock, Proc. 25th Annual Symposium on Microarchitecture, 1992.
  10. [10] S. Wallace & N. Bagherzadeh, Modeled and measured instruction fetching performance for superscalar microprocessors, IEEE Transactions on Parallel and Distributed Systems, 9(6), June 1998. doi:10.1109/71.689444
  11. [11] A. Seznec, S. Jourdan, P. Sainrat, & P. Michaud, Multiple-block ahead branch predictors, Proc. 7th International Conf. on Architectural Support for Programming Languages and Operating Systems, October 1996.
  12. [12] S. Reches & S. Weiss, Implementation and analysis of path history in dynamic branch prediction schemes, IEEE Transactions on Computers, 47(8), August 1998. doi:10.1109/12.707596
  13. [13] S. McFarling, Combining branch predictors, Technical Report TN-36, Digital Western Research Laboratory, June 1993.
  14. [14] T.-Y. Yeh & Y.N. Patt, Alternative implementations of two-level adaptive branch prediction, Proc. 19th International Symposium on Computer Architecture (ISCA), Gold Coast, Australia, May 19–21, 1992, 124–134.
  15. [15] M. Behar, A. Mendelson, & A. Kolodny, Trace Cache sampling filter, 14th International Conference on Parallel Architectures and Compilation Techniques, 2005, PACT 2005, September 2005, 255–266. doi:10.1109/PACT.2005.38
  16. [16] J.S. Hu, N. Vijaykrishnan, M. Kandemir, & M.J. Irwin, Power-efficient trace caches, Proceedings of the Conference and Exhibition on Design, Automation and Test in Europe, March 2002, 1091. doi:10.1109/DATE.2002.999209
  17. [17] B. Black, B. Rychlik, & J.P. Shen, The block-based trace cache, Proceedings of the 26th International Symposium on Computer Architecture, May 1999, 196–207.
  18. [18] A. Agarwal, Performance tradeoffs in multithreaded processors, IEEE Transactions on Parallel and Distributed Systems, 3(5), September 1992. doi:10.1109/71.159037
  19. [19] D. Thiebaut, On the fractal dimension of computer programs and its application to the prediction of cache miss ratio, IEEE Transactions on Computers, 38(7), July 1989. doi:10.1109/12.30852
  20. [20] J.S. Harper, D.J. Kerbyson, & G.R. Nudd, Analytical modeling of set-associative cache behavior, IEEE Transactions on Computers, 48(10), October 1999.
  21. [21] M. Vachharajani, Microarchitecture modeling for design-space exploration, Ph.D. Dissertation, Princeton University, 2004.
  22. [22] S.J. Eggers, Simultaneous multithreading: A platform for next-generation processors, IEEE Micro, September/October 1997.
  23. [23] J.S. Burns, Parallel on-chip simultaneous multithreading, Ph.D. dissertation, The University of Southern California, Los Angeles, May 2000.
  24. [24] C.-Y. Cher, Exploring and evaluating control-flow and thread-level parallelism, Ph.D. Dissertation, Purdue University, 2004.
  25. [25] M. Chaudhuri, Architectural extensions for executing coherence protocols on multi-threaded processors with integrated memory controllers, Ph.D. Dissertation, Cornell University, 2004.
  26. [26] J.D. Collins, Data prefetching via speculative precomputation on a simultaneous multithreaded processor, Ph.D. Dissertation, University of California, San Diego, 2004.
  27. [27] Multicore ties programmers in KNOTS, EE Times, Monday, October 24, 2005.
  28. [28] L.A. Belady & C.J. Kuehner, Dynamic space-sharing in computer systems, Communications of ACM, 12(5), May 1969. doi:10.1145/362946.363002
  29. [29] A. Agarwal, J. Hennessy, & M. Horowitz, Cache performance of operating systems and multiprogramming, ACM Transactions on Computer Systems, 6(4), November 1988. doi:10.1145/48012.48037
  30. [30] M. Kobayashi & M.H. MacDougall, The stack growth function: Cache line reference models, IEEE Transactions on Computers, 38(6), June 1989. doi:10.1109/12.24288
  31. [31] P.J. Denning, The working-set model for program behavior, Communications of the ACM, 11(5), May 1968. doi:10.1145/363095.363141
  32. [32] The Standard Performance Evaluation Corporation. Available at:
  33. [33] A. Hossain, Simultaneous multithreading with Trace Cache, Ph.D. dissertation, Syracuse University, May 2002.
  34. [34] E.J. Dudewicz & S.N. Mishra, Modern Mathematical Statistics (New York: Wiley & Sons, Inc., 1988).
  35. [35] D. Kang, Speculation-aware thread scheduling for simultaneous multithreading, Ph.D. Dissertation, University of Southern California, 2004.
