NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Towards Agile and Judicious Metadata Load Balancing for Ceph File System via Matrix-based Modeling

https://doi.org/10.1145/3721483

Shao, Xinyang; Wang, Yiduo; Li, Cheng; Liang, Hengyu; Wang, Chenhan; Yan, Feng; Xu, Yinlong (March 2025, ACM Transactions on Storage)

To scale out massive metadata access, the Ceph distributed file system (CephFS) adopts a dynamic subtree partitioning method, splitting the hierarchical namespace and distributing subtrees across multiple metadata servers. However, this method suffers from a severe imbalance problem that can degrade performance, owing to inaccurate imbalance prediction, ignorance of workload characteristics, and unnecessary or invalid migration activities. To eliminate these inefficiencies, we propose Lunule, a novel CephFS metadata load balancer that employs an imbalance factor model to accurately determine when to trigger re-balancing and to tolerate harmless imbalanced situations. Lunule further adopts a workload-aware migration planner to appropriately select subtree migration candidates. Finally, we extend Lunule to Lunule⁺, which models metadata accesses as matrices and employs matrix-based formulas for more accurate load prediction and re-balance decisions. Compared to baselines, Lunule achieves better load balance, increases metadata throughput by up to 315.8%, and shortens tail job completion time by up to 64.6% across five real-world workloads and their mixture. In addition, Lunule handles metadata cluster expansion and workload growth, and scales linearly on a 16-node cluster. Compared to Lunule, Lunule⁺ achieves up to 64.96% better metadata load balance and 13.53–86.09% higher throughput.
Free, publicly-accessible full text available March 5, 2026
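The abstract describes an imbalance factor model that decides when to trigger re-balancing while tolerating harmless imbalance. The paper's actual model is not reproduced here. As a rough illustration only, the sketch below swaps in a simpler, well-known metric (the coefficient of variation of per-metadata-server load) and two hypothetical parameters, `tolerance` and `busy_threshold`, to show the general shape of a "rebalance only when imbalance is large and harmful" trigger.

```python
from statistics import mean, pstdev


def imbalance_degree(loads: list[float]) -> float:
    """Coefficient of variation of per-MDS load; 0.0 means perfectly balanced.

    This is a generic stand-in metric, NOT Lunule's imbalance factor model.
    """
    avg = mean(loads)
    if avg == 0:
        return 0.0
    return pstdev(loads) / avg


def should_rebalance(loads: list[float], tolerance: float = 0.2,
                     busy_threshold: float = 1000.0) -> bool:
    """Hypothetical trigger: migrate subtrees only when imbalance is both
    large and potentially harmful.

    tolerance      -- hypothetical cut-off on the coefficient of variation
    busy_threshold -- hypothetical load (e.g. metadata ops/s) below which even
                      the busiest server is assumed to have spare capacity
    """
    if max(loads) < busy_threshold:
        # Every server is lightly loaded, so the imbalance is tolerated.
        return False
    return imbalance_degree(loads) > tolerance


if __name__ == "__main__":
    print(should_rebalance([1200, 1150, 1250, 1200]))  # busy but balanced
    print(should_rebalance([4000, 100, 100, 100]))     # skewed and busy
    print(should_rebalance([400, 0, 0, 0]))            # skewed but idle
```

The coefficient of variation is scale-free, so one `tolerance` value applies regardless of absolute load; the separate `busy_threshold` check is one simple way to express "unharmful" imbalance under the stated assumptions.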
Enabling scalable and adaptive machine learning training via serverless computing on public cloud

https://doi.org/10.1016/j.peva.2024.102451

Ali, Ahsan; Ma, Xiaolong; Zawad, Syed; Aditya, Paarijaat; Akkus, Istemi Ekin; Chen, Ruichuan; Yang, Lei; Yan, Feng (March 2025, Performance Evaluation)

Free, publicly-accessible full text available March 1, 2026
FedCust: Offloading hyperparameter customization for federated learning

https://doi.org/10.1016/j.peva.2024.102450

Zawad, Syed; Ma, Xiaolong; Yi, Jun; Li, Cheng; Zhang, Minjia; Yang, Lei; Yan, Feng; He, Yuxiong (March 2025, Performance Evaluation)

Free, publicly-accessible full text available March 1, 2026
Speed Up Federated Learning in Heterogeneous Environments: A Dynamic Tiering Approach

https://doi.org/10.1109/JIOT.2024.3487473

Sajjadi Mohammadabadi, Seyed Mahmoud; Zawad, Syed; Yan, Feng; Yang, Lei (March 2025, IEEE Internet of Things Journal)

Free, publicly-accessible full text available March 1, 2026
DistDNAS: Search Efficient Feature Interactions within 2 Hours

https://doi.org/10.1109/BigData62323.2024.10825061

Zhang, Tunhou; Wen, Wei; Fedorov, Igor; Liu, Xi; Zhang, Buyun; Han, Fangqiu; Chen, Wen-Yen; Han, Yiping; Yan, Feng; Li, Hai; et al (December 2024, IEEE)

Full Text Available
Towards Automated Model Design on Recommender Systems

https://doi.org/10.1145/3706124

Zhang, Tunhou; Cheng, Dehua; He, Yuchen; Chen, Zhengxing; Dai, Xiaoliang; Xiong, Liang; Liu, Yudong; Cheng, Feng; Cao, Yufan; Yan, Feng; et al (December 2024, ACM Transactions on Recommender Systems)

The increasing popularity of deep learning models has created new opportunities for developing AI-based recommender systems. Designing recommender systems with deep neural networks requires careful architecture design, and further optimization demands extensive co-design effort to jointly optimize model architecture and hardware. Design automation, such as Automated Machine Learning (AutoML), is necessary to fully exploit the potential of recommender model design, including model choices and model-hardware co-design strategies. We introduce a novel paradigm that uses weight sharing to explore abundant solution spaces. Our paradigm creates a large supernet to search for optimal architectures and co-design strategies that address the challenges of data multi-modality and heterogeneity in the recommendation domain. From a model perspective, the supernet includes a variety of operators, dense connectivity, and dimension search options. From a co-design perspective, it encompasses versatile Processing-In-Memory (PIM) configurations to produce hardware-efficient models. The scale, heterogeneity, and complexity of our solution space pose several challenges, which we address by proposing various techniques for training and evaluating the supernet. Our crafted models show promising results on three Click-Through Rate (CTR) prediction benchmarks, outperforming both manually designed and AutoML-crafted models with state-of-the-art performance when focusing solely on architecture search. From a co-design perspective, we achieve 2 × FLOPs efficiency, 1.8 × energy efficiency, and 1.5 × performance improvements in recommender models.
Full Text Available
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning

https://doi.org/10.1109/ICDCS60910.2024.00069

Sajjadi Mohammadabadi, Seyed Mahmoud; Yang, Lei; Yan, Feng; Zhang, Junshan (July 2024, IEEE)

Full Text Available
Noctua: Towards Practical and Automated Fine-grained Consistency Analysis

https://doi.org/10.1145/3627703

Ma, Kai; Li, Cheng; Zhu, Enzuo; Chen, Ruichuan; Yan, Feng; Chen, Kang (April 2024, Proceedings of the ACM Nineteenth European Conference on Computer Systems)

Full Text Available
ZeRO++: Extremely Efficient Collective Communication for Large Model Training

Wang, Guanhua; Qin, Heyang; Jacobs, Sam; Wu, Xiaoxia; Holmes, Connor; Yao, Zhewei; Rajbhandari, Samyam; Ruwase, Olatunji; Yan, Feng; Yang, Lei; et al (March 2024, International Conference on Learning Representations)
MalleTrain: Deep Neural Networks Training on Unfillable Supercomputer Nodes

Ma, Xiaolong; Yan, Feng; Yang, Lei; Foster, Ian; Papka, Michael; Liu, Zhengchun; Kettimuthu, Rajkumar (March 2024, International Conference on Performance Engineering)

