Dynamic Multi-Resource Monitoring for Predictive Job Scheduling with ScoPro

A.C. Sodan and L. Liu (Canada)

Keywords

dynamic job scheduling, realtime monitoring, time sharing slowdowns, heterogeneous resources, grid computing, clusters

Abstract

Standard job schedulers for parallel machines apply dedicated resource allocation and typically rely on user estimates of runtime. Modern job schedulers are moving towards dynamic approaches such as time sharing or adaptive resource allocation to accommodate grid jobs or to better utilize local resources. In addition, the resources may be heterogeneous, and a proper distribution of the application’s workload may be hard to estimate. Our ScoPro monitoring tool makes it possible to obtain and store resource-related behavior information of parallel applications. This information is used to create an application signature for predictive use in future runs and to dynamically check for competition under time-shared execution or for workload imbalances on heterogeneous resources. ScoPro is applicable to production runs on standard clusters. As its main innovative contributions, ScoPro can be triggered by job-scheduling events, can monitor several coscheduled jobs simultaneously for accurate prediction of slowdowns, and performs realtime short-period measurements with low intrusion that is confined to the monitored period.
