Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models

Calle, Paul (ORCID:0009000018494481); Bates, Averi; Reynolds, Justin C; Liu, Yunlong; Cui, Haoyang; Ly, Sinaro; Wang, Chen; Zhang, Qinghao; de_Armendi, Alberto J; Shettar, Shashank S (ORCID:000000023484205X); Fu, Kar-Ming; Tang, Qinggong (ORCID:0000000194995384); Pan, Chongle (ORCID:0000000328600334)

doi:10.1016/j.cmpb.2025.109063

Citation Details

This content will become publicly available on December 1, 2026

Background and Objectives: The variability and biases in the real-world performance benchmarking of deep learning models for medical imaging compromise their trustworthiness for real-world deployment. The common approach of holding out a single fixed test set fails to quantify the variance in the estimation of test performance metrics. This study introduces NACHOS (Nested and Automated Cross-validation and Hyperparameter Optimization using Supercomputing) to reduce and quantify the variance of test performance metrics of deep learning models.

Methods: NACHOS integrates Nested Cross-Validation (NCV) and Automated Hyperparameter Optimization (AHPO) within a parallelized high-performance computing (HPC) framework. NACHOS was demonstrated on a chest X-ray repository and an Optical Coherence Tomography (OCT) dataset under multiple data partitioning schemes. Beyond performance estimation, DACHOS (Deployment with Automated Cross-validation and Hyperparameter Optimization using Supercomputing) is introduced to leverage AHPO and cross-validation to build the final model on the full dataset, improving expected deployment performance.

Results: The findings underscore the importance of NCV in quantifying and reducing estimation variance, AHPO in optimizing hyperparameters consistently across test folds, and HPC in ensuring computational feasibility.

Conclusions: By integrating these methodologies, NACHOS and DACHOS provide a scalable, reproducible, and trustworthy framework for DL model evaluation and deployment in medical imaging. To maximize public availability, the full open-source codebase is provided at https://github.com/thepanlab/NACHOS.
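The nested cross-validation idea summarized in the abstract can be illustrated with a small, self-contained Python sketch. This is not the NACHOS implementation: the toy threshold classifier, the grid search standing in for automated hyperparameter optimization, the refit on all non-test data, and every function name below are assumptions made for illustration only. The sketch shows how each outer fold serves once as the test set while hyperparameters are chosen using only the remaining folds, so that the spread of the per-fold test scores gives a variance estimate that a single fixed test set cannot.

```python
import random
import statistics


def k_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_threshold(xs, ys, margin):
    """Toy model (hypothetical): midpoint of the class means, shifted by `margin`."""
    mean0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2 + margin


def accuracy(threshold, xs, ys):
    """Fraction of samples where (x > threshold) matches the label."""
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(ys)


def subset(values, indices):
    return [values[i] for i in indices]


def nested_cv(xs, ys, k, candidate_margins):
    """Return (mean, sample std dev, per-fold list) of outer test accuracy."""
    folds = k_folds(len(xs), k)
    test_scores = []
    for t, test_idx in enumerate(folds):
        # Outer loop: hold out one fold as the test set.
        inner = [f for i, f in enumerate(folds) if i != t]
        best_margin, best_val = None, -1.0
        for margin in candidate_margins:
            # Inner loop: rotate the validation fold among the non-test folds.
            val_scores = []
            for v, val_idx in enumerate(inner):
                train_idx = [i for j, f in enumerate(inner) if j != v for i in f]
                thr = fit_threshold(subset(xs, train_idx), subset(ys, train_idx), margin)
                val_scores.append(accuracy(thr, subset(xs, val_idx), subset(ys, val_idx)))
            mean_val = statistics.mean(val_scores)
            if mean_val > best_val:
                best_margin, best_val = margin, mean_val
        # Assumption: refit on all non-test data with the selected hyperparameter,
        # then score exactly once on the held-out test fold.
        train_idx = [i for f in inner for i in f]
        thr = fit_threshold(subset(xs, train_idx), subset(ys, train_idx), best_margin)
        test_scores.append(accuracy(thr, subset(xs, test_idx), subset(ys, test_idx)))
    return statistics.mean(test_scores), statistics.stdev(test_scores), test_scores


# Synthetic data: alternating labels so every fold contains both classes.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
features = [y + rng.gauss(0.0, 0.5) for y in labels]

mean_acc, std_acc, per_fold = nested_cv(features, labels, k=5,
                                        candidate_margins=[-0.2, 0.0, 0.2])
print(f"test accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(per_fold)} folds")
```

Each outer iteration in this sketch is independent of the others, which is the property that makes this kind of evaluation amenable to parallel execution on an HPC system.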

Award ID(s): 2331409

PAR ID: 10658499

Author(s) / Creator(s): Calle, Paul; Bates, Averi; Reynolds, Justin C; Liu, Yunlong; Cui, Haoyang; Ly, Sinaro; Wang, Chen; Zhang, Qinghao; de_Armendi, Alberto J; Shettar, Shashank S; Fu, Kar-Ming; Tang, Qinggong; Pan, Chongle

Publisher / Repository: ELSEVIER

Date Published: 2025-12-01

Journal Name: Computer Methods and Programs in Biomedicine

Volume: 272

Issue: C

ISSN: 0169-2607

Page Range / eLocation ID: 109063

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1016/j.cmpb.2025.109063
