CURRENT RESEARCH AND PRACTICE IN PROACTIVE FAULT MANAGEMENT
Y. Li and Z. Lan
References
[1] S. Chakravorty, C. Mendes, & L. Kale, Proactive fault tolerance in large systems, Proc. HPCRI Workshop, 2005.
[2] S. Pertet & P. Narasimhan, Proactive recovery in distributed CORBA applications, Proc. Int. Conf. on Dependable Systems and Networks, 2004, 357–366.
[3] V. Castelli et al., Proactive management of software aging, IBM Journal of Research and Development, 45 (2), 2001, 331.
[4] Y. Huang, C. Kintala, N. Kolettis, & N. Fulton, Software rejuvenation: Analysis, module, and applications, Proc. Int. Symp. on Fault-Tolerance Computing, 1995, 381–390.
[5] A. Avizienis, J.-C. Laprie, & B. Randell, Dependability and its threats – A taxonomy, IFIP Congress Topical Sessions, 2004, 91–120.
[6] D. Tang, R. Iyer, & S. Subramani, Failure analysis and modelling of a VAXcluster system, Proc. Int. Symp. Fault Tolerance Computing, 1990, 244–251.
doi:10.1109/FTCS.1990.89372
[7] J. Xu, Z. Kalbarczyk, & R. Iyer, Networked Windows NT system field failure data analysis, Proc. Pacific Rim Int. Symp. on Dependable Computing, 1999.
[8] R. Sahoo, A. Sivasubramaniam, M. Squillante, & Y. Zhang, Failure data analysis of a large-scale heterogeneous server environment, Proc. Int. Conf. on Dependable Systems and Networks, 2004, 772.
[9] C. Lu, Scalable diskless checkpointing for large parallel systems, Ph.D. Thesis, University of Illinois, Urbana-Champaign, 2005.
[10] J. Brevik, D. Nurmi, & R. Wolski, Automatic methods for predicting machine availability in desktop grid and peer-to-peer systems, Proc. 2004 IEEE Int. Symp. on Cluster Computing and the Grid, 190–199.
[11] C. Leangsuksun, L. Shen, & S. Scott, Availability prediction and modelling of high availability OSCAR cluster, Proc. IEEE Int. Conf. on Cluster, 2003, 380–386.
doi:10.1109/CLUSTR.2003.1253337
[12] J. Hellerstein, F. Zhang, & P. Shahabuddin, A statistical approach to predictive detection, Computer Networks: The International Journal of Computer and Telecommunications Networking, 2001, 77–95.
[13] R. Vilalta et al., Predictive algorithms in the management of computer systems, IBM Systems Journal, 41(3), 2002.
[14] R. Sahoo et al., Critical event prediction for proactive management in large-scale computer clusters, Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2003, 426–435.
[15] G. Hamerly & C. Elkan, Bayesian approaches to failure prediction for disk drives, Proc. 18th Int. Conf. on Machine Learning, 2001, 1–9.
[16] G. Hoffmann, F. Salfner, & M. Malek, Advanced failure prediction in complex software systems, Research Report Number 172, Department of Computer Science, Humboldt University, Berlin, 2004.
[17] F. Salfner, Predicting failures with hidden Markov models, Proc. 5th European Dependable Computing Conf., April 2005.
[18] S. Garg, A. Puliafito, & K. Trivedi, Analysis of software rejuvenation using Markov regenerative stochastic Petri net, Proc. 6th Int. Symp. on Software Reliability Engineering, 1995, 180–187.
[19] Y. Li & Z. Lan, Exploit failure prediction for adaptive fault-tolerance in cluster computing, Proc. IEEE/ACM Int. Symp. on Cluster Computing and the Grid (CCGrid06), 2006, 531–538.
[20] Y. Zhang et al., Performance implications of failures in large-scale cluster scheduling, Proc. 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004, 233–252.
[21] A.J. Oliner, R.K. Sahoo, & A. Sivasubramaniam, Fault-aware job scheduling for BlueGene/L systems, Proc. 18th Int. Parallel and Distributed Processing Symp., 2004, 64.
doi:10.1109/IPDPS.2004.1302991
[22] R.L. Graham, G.M. Shipman, B.W. Barrett, R.H. Castain, G. Bosilca, & A. Lumsdaine, Open MPI: A high-performance, heterogeneous MPI, Proc. HeteroPar, 2006.
DOI:
10.2316/Journal.202.2007.4.202-2248
From Journal
(202) International Journal of Computers and Applications - 2007