Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Timilsina, Sankalpa (ORCID:0009000939365078); Shannigrahi, Susmit (ORCID:0000000162137473)

doi:10.1145/3731599.3767457

Citation Details

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Distributed cloud environments running data-intensive applications often slow down because of network congestion, uneven bandwidth, and data shuffling between nodes. Traditional host metrics such as CPU or memory do not capture these factors. Scheduling without considering network conditions causes poor placement, longer data transfers, and weaker job performance. This work presents a network-aware job scheduler that uses supervised learning to predict job completion time. The system collects real-time telemetry from all nodes, uses a trained model to estimate how long a job would take on each node, and ranks nodes to choose the best placement. The scheduler is evaluated on a geo-distributed Kubernetes cluster on the FABRIC testbed using network-intensive Spark workloads. Compared to the default Kubernetes scheduler, which uses only current resource availability, the supervised scheduler shows 34–54% higher accuracy in selecting the optimal node. The contribution is the demonstration of supervised learning for real-time, network-aware job scheduling on a multi-site cluster. more »

Award ID(s):: 2430341 2126148 2019012

PAR ID:: 10647697

Author(s) / Creator(s):: Timilsina, Sankalpa; Shannigrahi, Susmit

Publisher / Repository:: ACM

Date Published:: 2025-11-15

Page Range / eLocation ID:: 848 to 853

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper:
https://doi.org/10.1145/3731599.3767457

More Like this