Hiroyoshi Jutori, Kanemitsu Ootsu, Takashi Yokota, and Takanobu Baba


  1. [1] M.S. Lam and R.P. Wilson, Limits of control flow on parallelism, Proc. 19th Ann. Int. Symp. on Computer Architecture, Gold Coast, Queensland, Australia, 1992, 46–57.
  2. [2] G.S. Sohi, S.E. Breach, and T.N. Vijaykumar, Multiscalar processors, Proc. 22nd Ann. Int. Symp. on Computer Architecture, Santa Margherita Ligure, Italy, 1995, 414–425.
  3. [3] J.-Y. Tsai, J. Huang, C. Amlo, D.J. Lilja, and P.-C. Yew, The Superthreaded processor architecture, IEEE Transactions on Computers, Special Issue on Multithreaded Architectures, 48 (9), 1999, 881–902.
  4. [4] H. Zhong, M. Mehrara, S.A. Lieberman, and S.A. Mahlke, Uncovering hidden loop level parallelism in sequential applications, IEEE 14th Int. Symp. on High Performance Computer Architecture, Salt Lake City, UT, 2008, 290–301.
  5. [5] L. Gao, J. Xue, and T.-F. Ngai, Loop recreation for thread-level speculation on multicore processors, Software: Practice and Experience, 40 (1), 2009, 45–72.
  6. [6] T. Yokota, M. Saito, F. Furukawa, K. Ootsu, and T. Baba, Two-path limited speculation method for static/dynamic optimization in multithreaded systems, Proc. 6th Int. Conf. on Parallel and Distributed Computing, Applications and Technologies, Dalian, China, 2005, 46–50.
  7. [7] T.Y. Yeh and Y.N. Patt, Two-level adaptive training branch prediction, Proc. 24th Ann. Int. Symp. on Microarchitecture, New Mexico, USA, 1991, 51–61.
  8. [8] T. Baba, T. Yokota, K. Ootsu, J. Yoneda, K. Sato, H. Jutori, and H. Yanome, Two-path limited speculative multithreading processor, Proc. 20th IASTED Int. Conf. on Parallel and Distributed Computing and Systems, Orlando, FL, 2008, 348–355.
  9. [9] T. Baba, T. Masuho, T. Yokota, and K. Ootsu, Design of a two-level hot path detector for path-based loop optimizations, Proc. IASTED Int. Conf. on Advances in Computer Science and Technology, Phuket, Thailand, 2007, 23–28.
  10. [10] D. Burger and T.M. Austin, The SimpleScalar tool set, Version 2.0, University of Wisconsin-Madison Computer Sciences Department Technical Report, No.1324, 1997.
  11. [11] M.R. de Alba and D.R. Kaeli, Characterization and evaluation of hardware loop unrolling, Proc. 1st Boston Area Architecture Conference, Cambridge, MA, 2003.
  12. [12] S. Wang, X. Dai, K.S. Yellajyosula, A. Zhai, and P.-C. Yew, Loop selection for thread-level speculation, Proc. 18th Int. Workshop on Languages and Compilers for Parallel Computing, Hawthorne, NY, 2005, 289–303.
  13. [13] C. Madriles, C.G. Quiñones, F.J. Sánchez, P. Marcuello, A. González, D.M. Tullsen, H. Wang, and J.P. Shen, Mitosis: a speculative multithreaded processor based on precomputation slices, IEEE Transactions on Parallel and Distributed Systems, 19 (7), 2008, 914–925.
  14. [14] T. Ohsawa, M. Takagi, S. Kawahara, and S. Matsushita, Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities, Proc. 38th Ann. IEEE/ACM Int. Symp. on Microarchitecture, Barcelona, Spain, 2005, 81–92.
  15. [15] Z. Dong, Y. Zhao, Y. Wei, X. Wang, and S. Song, Prophet: a speculative multi-threading execution model with architectural support based on CMP, Proc. 2009 Int. Conf. on Scalable Computing and Communications; 8th Int. Conf. on Embedded Computing, Dalian, China, 2009, 103–108.
  16. [16] V. Bala, E. Duesterwald, and S. Banerjia, Dynamo: a transparent dynamic optimization system, Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, Vancouver, Canada, 2000, 1–12.
  17. [17] H. Noori, N. Yoshimatsu, Y. Fujii, K. Eshima, M. Yoshida, T. Soga, T. Hayashida, and K. Murakami, An online profiling-based dynamically adaptable processor, Proc. 11th Int. CSI Conf., Tehran, Iran, 2006, 520–523.
  18. [18] B. Choi, L. Porter, and D.M. Tullsen, Accurate branch prediction for short threads, Proc. 13th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Houston, TX, 2008, 125–134.
  19. [19] P. Xekalakis and M. Cintra, Handling branches in TLS systems with multi-path execution, IEEE 16th Int. Symp. on High Performance Computer Architecture, Bangalore, India, 2010, 1–12.
  20. [20] J. Renau, K. Strauss, L. Ceze, W. Liu, S. Sarangi, J. Tuck, and J. Torrellas, Thread-level speculation on a CMP can be energy efficient, Proc. 19th Ann. Int. Conf. on Supercomputing, Cambridge, MA, 2005, 219–228.
