Scheduling for high performance computing with reinforcement learning
Keywords:
High Performance Computing, Scheduling, Artificial Intelligence, Reinforcement Learning, Deep Learning
Synopsis
This is a Chapter in:
Book:
Competitive Tools, Techniques, and Methods
Print ISBN 978-1-6692-0008-6
Online ISBN 978-1-6692-0007-9
Series:
Chronicle of Computing
Chapter Abstract:
Job scheduling for high performance computing systems involves building a policy to optimize for a particular metric, such as minimizing job wait time or maximizing system utilization. Different administrators may value one metric over another, and the desired policy may change over time. Tuning a scheduling application to optimize for a particular metric is challenging, time-consuming, and error-prone. However, reinforcement learning can quickly learn different scheduling policies dynamically from log data and effectively apply those policies to other workloads. This research demonstrates that a reinforcement learning agent trained using the proximal policy optimization algorithm performs 18.44% better than algorithmic scheduling baselines for one metric and has comparable performance for another. Reinforcement learning can learn scheduling policies that optimize for multiple different metrics and can select not only which job in the queue to schedule next, but also the machine on which to run it. The agent considers jobs with three resource constraints (CPU, GPU, and memory) while respecting individual machine resource constraints.
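To make the scheduling setting concrete, the sketch below shows the kind of feasibility check and machine-selection step the abstract describes: a job with CPU, GPU, and memory demands may only be placed on a machine whose remaining resources cover all three. The `Job`/`Machine` names and the best-fit tie-breaking rule are illustrative assumptions, not the chapter's actual implementation (the agent learns this selection rather than using a fixed rule).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    cpu: int  # cores requested
    gpu: int  # GPUs requested
    mem: int  # memory requested (GB)


@dataclass
class Machine:
    cpu: int  # cores available
    gpu: int  # GPUs available
    mem: int  # memory available (GB)

    def fits(self, job: Job) -> bool:
        # A placement is feasible only if every resource demand fits.
        return job.cpu <= self.cpu and job.gpu <= self.gpu and job.mem <= self.mem


def pick_machine(job: Job, machines: List[Machine]) -> Optional[int]:
    """Illustrative best-fit baseline: among feasible machines, choose the
    one with the least total slack left after placing the job. Returns the
    machine index, or None if the job fits nowhere."""
    best, best_slack = None, None
    for i, m in enumerate(machines):
        if m.fits(job):
            slack = (m.cpu - job.cpu) + (m.gpu - job.gpu) + (m.mem - job.mem)
            if best_slack is None or slack < best_slack:
                best, best_slack = i, slack
    return best
```

For example, with `machines = [Machine(8, 0, 32), Machine(16, 2, 64)]`, a job needing one GPU can only go to the second machine, while a CPU-only job lands on the first (tighter) one. A learned policy replaces this hand-coded rule with one optimized for a chosen metric.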
Cite this paper as:
Hutchison S., Andresen D., Hsu W., Parsons B., Neilsen M. (2024) Scheduling for high performance computing with reinforcement learning. In: Tiako P.F. (ed) Competitive Tools, Techniques, and Methods. Chronicle of Computing. OkIP. APDC24#6. https://doi.org/10.55432/978-1-6692-0007-9_1
Presented at:
The 2024 OkIP International Conference on Advances in Parallel and Distributed Computing (APDC) in Oklahoma City, Oklahoma, USA, and Online, on April 3, 2024
Contact:
Scott Hutchison
scotthutch@ksu.edu
References
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. ArXiv Preprint ArXiv:1606.01540.
Chapin, S., Cirne, W., Feitelson, D., Jones, J., Leutenegger, S., Schwiegelshohn, U., . . . Talby, D. (1999). Benchmarks and standards for the evaluation of parallel job schedulers. Job Scheduling Strategies for Parallel Processing: IPPS/SPDP'99 Workshop, JSSPP'99 (pp. 67-90). San Juan: Springer Berlin Heidelberg.
Corbalan, J., & D’Amico, M. (2021). Modular workload format: Extending SWF for modular systems. Workshop on Job Scheduling Strategies for Parallel Processing (pp. 43-55). Cham: Springer International Publishing.
Dósa, G., & Sgall, J. (2014). Optimal analysis of best fit bin packing. International Colloquium on Automata, Languages, and Programming (pp. 429-441). Berlin, Heidelberg: Springer Berlin Heidelberg.
Girden, E. R. (1992). ANOVA: Repeated measures (No. 84). Sage.
Huang, S., & Ontañón, S. (2020). A closer look at invalid action masking in policy gradient algorithms. arXiv preprint arXiv:2006.14171.
Jette, M., & Grondona, M. (2003). SLURM: Simple Linux Utility for Resource Management. Proceedings of ClusterWorld Conference and Expo. San Jose, California.
Liang, S., Yang, Z., Jin, F., & Chen, Y. (2020). Data centers job scheduling with deep reinforcement learning. Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020 (pp. 906-917). Singapore: Springer International Publishing.
Mao, H., Alizadeh, M., Menache, I., & Kandula, S. (2016). Resource management with deep reinforcement learning. Proceedings of the 15th ACM workshop on hot topics in networks, (pp. 50-56).
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 22, pp. 1-8. Retrieved from http://jmlr.org/papers/v22/20-1364.html
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Student. (1908). The probable error of a mean. Biometrika, (pp. 1-25).
Ullman, J. D. (1975). NP-complete scheduling problems. Journal of Computer and System Sciences, 384-393.
Wang, Q., Zhang, H., Qu, C., Shen, Y., Liu, X., & Li, J. (2021). RLSchert: an HPC job scheduler using deep reinforcement learning and remaining time prediction. Applied Sciences, (p. 9448).
Zhang, D., Dai, D., He, Y., Bao, F. S., & Xie, B. (2020). RLScheduler: an automated HPC batch job scheduler using reinforcement learning. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). IEEE.