  1. Modern data center workloads are composed of multiserver jobs, computational jobs that require multiple servers in order to run. A data center can run many multiserver jobs in parallel, as long as it has sufficient resources to meet their individual demands. Multiserver jobs are generally stateful, meaning that job preemptions incur significant overhead from saving and reloading the state associated with running jobs. Hence, most systems try to avoid these costly job preemptions altogether. Given these constraints, a scheduling policy must determine what set of jobs to run in parallel at each moment in time to minimize the mean response time across a stream of arriving jobs. Unfortunately, simple non-preemptive policies such as First-Come First-Served (FCFS) may leave many servers idle, resulting in high mean response times or even system instability. Our goal is to design and analyze non-preemptive scheduling policies for multiserver jobs that maintain high system utilization to achieve low mean response time. One well-known non-preemptive scheduling policy, Most Servers First (MSF), prioritizes jobs with higher server needs and is known for achieving high resource utilization. However, MSF causes extreme variability in job waiting times, and can perform significantly worse than FCFS in practice. To address this issue, we propose and analyze a class of scheduling policies called Most Servers First with Quickswap (MSFQ) that performs well in a wide variety of cases. MSFQ reduces the variability of job waiting times by periodically granting priority to other jobs in the system. We provide both stability results and an analysis of mean response time under MSFQ to prove that our policy dramatically outperforms MSF in the case where jobs either request one server or all the servers. In more complex cases, we evaluate MSFQ in simulation. 
We show that, with some additional optimization, variants of the MSFQ policy can greatly outperform MSF and FCFS on real-world multiserver job workloads. 
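As an illustrative sketch of the non-preemptive selection problem described above, the Most Servers First rule can be written as a simple admission loop that prefers jobs with larger server needs. This is a minimal sketch under assumed job representations (the dictionary fields and function name are not the paper's notation):

```python
def msf_schedule(queue, free_servers):
    """Most Servers First: non-preemptively admit waiting jobs,
    preferring jobs with larger server needs, until capacity runs out."""
    admitted = []
    # Sort waiting jobs by server need, largest first (the MSF priority order).
    for job in sorted(queue, key=lambda j: j["servers"], reverse=True):
        if job["servers"] <= free_servers:
            admitted.append(job)
            free_servers -= job["servers"]
    return admitted, free_servers

# Example: 8 free servers. MSF admits the 6-server job and then a 2-server
# job; the 4-server job waits, illustrating how smaller-priority jobs can
# see highly variable waiting times under MSF.
queue = [{"id": "a", "servers": 2}, {"id": "b", "servers": 6},
         {"id": "c", "servers": 4}, {"id": "d", "servers": 2}]
admitted, free = msf_schedule(queue, 8)
```

MSFQ, as described above, would periodically hand priority to other waiting jobs (such as job "c" here) instead of always packing by server need.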
  2. Background When transitioning from high school, autistic job seekers often navigate three different pathways to employment: University, Job Coaching, and Self-Directed (defined as job seekers who complete the job search process independently, without formal support). Assistive technology may aid job seekers throughout the job-seeking process. The aim of this study is to learn more about the challenges autistic job seekers encounter, and the assistive technology they use, while navigating these three employment pathways. Methods Qualitative semi-structured interviews were conducted with fifteen stakeholders in the United States (autistic job seekers and support personnel) within each pathway of the hiring process to gather information about the challenges autistic job seekers encounter and the assistive technology they use to address those challenges. Results A thematic analysis of these interviews found that autistic job seekers along each pathway commonly move through the following phases of the hiring process, or "checkpoints": resume building, networking, job search, job application, and interviews. Autistic job seekers also face challenges within each checkpoint, such as knowing when and what to disclose; self-efficacy, anxiety, and communication challenges; and a lack of communication from potential employers. We also learned that some self-directed autistic job seekers, compared to those in the University and Job Coaching pathways, may not be using the assistive technologies available for the job search process. The interviews also revealed the types of assistive technology that autistic job seekers and their assistants use, which can be classified as organizational tools, connectivity tools, and visual media tools. Conclusion and implications Our findings reveal a need to connect self-directed autistic job seekers to the assistive technology available.
Based on these results, we present suggestions for future research and for the design of assistive technology for autistic job seekers. What this paper adds? We define three career pathways for autistic job seekers: University, Job Coaching, and Self-Directed. To learn more about the hiring process for autistic job seekers and the assistive technology used within each pathway, we conducted a needs-finding study. As a first contribution, we identify challenges at each checkpoint in the hiring process, as well as the various forms of assistive technology used to support autistic job seekers when they encounter those challenges. As a second contribution, we draw on these interviews to offer design suggestions for future assistive technology in the hiring process, potentially supporting the self-efficacy of autistic job seekers.
  3. Significant changes in the digital employment landscape, driven by rapid technological advancements and the COVID-19 pandemic, have introduced new opportunities for blind and visually impaired (BVI) individuals in developing countries like India. However, a significant portion of the BVI population in India remains unemployed despite extensive accessibility advancements and job search interventions. Therefore, we conducted semi-structured interviews with 20 BVI persons who were either pursuing or recently sought employment in the digital industry. Our findings reveal that despite gaining digital literacy and extensive training, BVI individuals struggle to meet industry requirements for fulfilling job openings. While they engage in self-reflection to identify shortcomings in their approach and skills, they lack constructive feedback from peers and recruiters. Moreover, the numerous job intervention tools are limited in their ability to meet the unique needs of BVI job seekers. Our results, therefore, provide key insights that inform the design of future collaborative intervention systems that offer personalized feedback for BVI individuals, effectively guiding their self-reflection process and subsequent job search behaviors, and potentially leading to improved employment outcomes. 
  4. This report offers the first comprehensive real-time mapping of how ethical, responsible, and public interest technology roles have evolved, and where they are headed. The report is based on analysis of over 5,700 job postings, interviews with key practitioners and critics across the ecosystem, and insights from this community.
  5. This work presents a framework for estimating job wait times in High-Performance Computing (HPC) scheduling queues, leveraging historical job scheduling data and real-time system metrics. Using machine learning techniques, specifically Random Forest and Multi-Layer Perceptron (MLP) models, we demonstrate high accuracy in predicting wait times, achieving 94.2% reliability within a 10-minute error margin. The framework incorporates key features such as requested resources, queue occupancy, and system utilization, with ablation studies revealing the significance of these features. Additionally, the framework offers users wait time estimates for different resource configurations, enabling them to select optimal resources, reduce delays, and accelerate computational workloads. Our approach provides valuable insights for both users and administrators to optimize job scheduling, contributing to more efficient resource management and faster time to scientific results.
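The "reliability within a 10-minute error margin" figure above is a coverage-style metric. A minimal sketch of how such a score could be computed (the function name and the sample wait times are assumptions for illustration, not the paper's evaluation code):

```python
def within_margin(y_true, y_pred, margin_s=600):
    """Fraction of predictions whose absolute error falls within
    `margin_s` seconds (600 s = the 10-minute margin used above)."""
    hits = sum(abs(t - p) <= margin_s for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

true_waits = [120, 3600, 900, 7200]   # hypothetical actual queue waits (s)
pred_waits = [300, 3000, 950, 8100]   # hypothetical model estimates (s)
score = within_margin(true_waits, pred_waits)  # 3 of 4 within 600 s
```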
  6. Queue scheduling, in which limited resources must be allocated to incoming customers, has numerous applications in service operations management. With increasing data availability and advances in predictive models, personalized scheduling—which leverages individual information about underlying stochastic processes beyond just probability distributions—has gained significant attention. A new study reveals that, even with noisy service-time predictions, the (predicted) shortest-job-first (SJF) policy can effectively optimize performance in many-server systems with impatient customers. The study also characterizes the impact of prediction errors on the policy’s effectiveness. Additionally, the study shows that a two-class priority rule, in which customers with shorter predicted service times (below a carefully designed threshold) are prioritized, can asymptotically match the performance of SJF, offering a simpler policy for implementation in practice.
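The two policies above can be sketched in a few lines: predicted-SJF sorts by noisy service-time predictions, while the two-class rule only checks predictions against a threshold. This is an illustrative sketch; the multiplicative lognormal noise model and all names are assumptions, not the study's setup:

```python
import random

def predicted_sjf_order(jobs, noise_sd=0.5, seed=0):
    """Predicted-SJF: order jobs by noisy service-time predictions.
    `jobs` maps job id -> true service time; the lognormal multiplicative
    noise is an illustrative assumption."""
    rng = random.Random(seed)
    preds = {j: s * rng.lognormvariate(0.0, noise_sd) for j, s in jobs.items()}
    return sorted(jobs, key=lambda j: preds[j]), preds

def two_class_priority(arrival_order, preds, threshold):
    """Two-class rule: jobs predicted below `threshold` get priority;
    arrival (FCFS) order is preserved within each class."""
    short = [j for j in arrival_order if preds[j] < threshold]
    long_ = [j for j in arrival_order if preds[j] >= threshold]
    return short + long_

# Hypothetical predictions: "a" and "c" fall below a threshold of 5.0,
# so they are served first, in arrival order, ahead of "b".
preds = {"a": 2.0, "b": 9.0, "c": 4.0}
order = two_class_priority(["a", "b", "c"], preds, threshold=5.0)
```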
  7. Recent advances in virtualization technologies used in cloud computing offer performance that closely approaches bare-metal levels. Combined with specialized instance types and high-speed networking services for cluster computing, cloud platforms have become a compelling option for high-performance computing (HPC). However, most current batch job schedulers in HPC systems are designed for homogeneous clusters and make decisions based on limited information about jobs and system status. Scientists typically submit computational jobs to these schedulers with a requested runtime that is often over- or under-estimated. More accurate runtime predictions can help schedulers make better decisions and reduce job turnaround times. They can also support decisions about migrating jobs to the cloud to avoid long queue wait times in HPC systems. In this study, we design neural network models to predict the runtime and resource utilization of jobs on integrated cloud and HPC systems. We developed two monitoring strategies to collect job and system resource utilization data using a workload management system and a cloud monitoring service. We evaluated our models on two Department of Energy (DOE) HPC systems and Amazon Web Services (AWS). Our results show that we can predict the runtime of a job with 31–41% mean absolute percentage error (MAPE), 14–17 seconds mean absolute error (MAE), and 0.99 R-squared (R²) score. An MAE of less than a minute corresponds to 100% accuracy, since the requested time for batch jobs is always specified in hours and/or minutes.
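The two error metrics reported above can be computed as follows. This is a generic sketch with hypothetical runtimes, not the paper's evaluation code:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (%), over nonzero actual runtimes."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the runtimes (seconds)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [100.0, 200.0, 400.0]   # hypothetical job runtimes (seconds)
pred   = [110.0, 180.0, 420.0]   # hypothetical model predictions (seconds)
```

Note that a short job mispredicted by a few seconds contributes little to MAE but a lot to MAPE, which is why the paper can report a large MAPE alongside a very small MAE.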
  8. The inequitable distribution of principal effectiveness raises concern among policymakers. Principal sorting likely contributes to wider achievement and opportunity gaps between low- and high-need schools. As a possible policy tool, policymakers proposed performance-based compensation systems (PBCS). Tennessee was one of the states that supported the implementation of PBCS. This study examined the relationship between PBCS and principal job performance in the state, using longitudinal administrative data, principal evaluation data, and unique PBCS data from 2012 to 2019. The study did not find consistently significant, positive relationships between PBCS and principal job performance. However, the relationships were generally more pronounced among high-need schools. The study concludes with detailed discussions about the results, the assumptions behind PBCS, limitations, and implications. 
  9. Dragonfly is an indispensable interconnect topology for exascale high-performance computing (HPC) systems. To link tens of thousands of compute nodes at a reasonable cost, Dragonfly shares network resources across the entire system, so network bandwidth is not exclusive to any single application. Since HPC systems are usually shared among multiple co-running applications, network competition between co-existing workloads is inevitable. This contention manifests as workload interference, in which a job’s network communication can be severely delayed by other jobs. This study presents a comprehensive examination of leveraging intelligent routing and flexible job placement to mitigate workload interference on Dragonfly systems. Specifically, we leverage the parallel discrete event simulation toolkit, the Structural Simulation Toolkit (SST), to investigate workload interference on Dragonfly with three contributions. We first present Q-adaptive routing, a multi-agent reinforcement learning routing scheme, and a flexible job placement strategy that, together, can mitigate workload interference based on workload communication characteristics. Next, we enhance SST with Q-adaptive routing and develop an automatic module that serves as the bridge between SST and the HPC job scheduler for automatic simulation configuration and automated simulation launching. Finally, we extensively examine workload interference under various job placement and routing configurations.
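At the core of value-based adaptive routing schemes like the one above is the tabular Q-learning update. The sketch below shows only that generic update; the paper's Q-adaptive scheme is multi-agent and considerably richer, and the toy states, actions, and reward model here are assumptions:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# Toy example: a router at state "s" chooses between a minimal and a
# non-minimal path; the reward is the negative observed queueing delay
# (a stand-in assumption for the real congestion signal).
q = {"s": {"minimal": 0.0, "nonminimal": 0.0},
     "t": {"minimal": 0.0, "nonminimal": 0.0}}
q_update(q, "s", "minimal", reward=-2.0, next_state="t")
```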
  10. Distributed cloud environments running data-intensive applications often slow down because of network congestion, uneven bandwidth, and data shuffling between nodes. Traditional host metrics such as CPU or memory do not capture these factors. Scheduling without considering network conditions causes poor placement, longer data transfers, and weaker job performance. This work presents a network-aware job scheduler that uses supervised learning to predict job completion time. The system collects real-time telemetry from all nodes, uses a trained model to estimate how long a job would take on each node, and ranks nodes to choose the best placement. The scheduler is evaluated on a geo-distributed Kubernetes cluster on the FABRIC testbed using network-intensive Spark workloads. Compared to the default Kubernetes scheduler, which uses only current resource availability, the supervised scheduler shows 34–54% higher accuracy in selecting the optimal node. The contribution is the demonstration of supervised learning for real-time, network-aware job scheduling on a multi-site cluster. 
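The placement step described above (predict per-node completion time, then rank nodes) reduces to a small selection routine. A minimal sketch, where `toy_predict` is a hypothetical stand-in for the trained supervised model and all node fields are assumptions:

```python
def rank_nodes(job, nodes, predict):
    """Rank candidate nodes by predicted job completion time, best first.
    `predict` stands in for the trained model served with live telemetry."""
    return sorted(nodes, key=lambda node: predict(job, node))

# Hypothetical predictor: completion time grows with node load and shrinks
# with available bandwidth (an illustrative assumption, not the real model).
def toy_predict(job, node):
    return job["size"] / node["bandwidth_mbps"] + node["load"]

job = {"size": 1000.0}
nodes = [{"name": "site-a", "bandwidth_mbps": 100.0, "load": 5.0},
         {"name": "site-b", "bandwidth_mbps": 500.0, "load": 1.0},
         {"name": "site-c", "bandwidth_mbps": 50.0, "load": 0.5}]
best = rank_nodes(job, nodes, toy_predict)[0]  # high-bandwidth, low-load node
```

The contrast with the default Kubernetes scheduler is visible here: ranking by current load alone would favor "site-c", while the network-aware prediction accounts for the transfer cost as well.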