Magnetic resonance imaging (MRI) scans are known to suffer from a variety of acquisition artifacts as well as equipment‐based variations that impact image appearance and segmentation performance. It is still unclear whether a direct relationship exists between magnetic resonance (MR) image quality metrics (IQMs) (e.g., signal‐to‐noise, contrast‐to‐noise) and segmentation accuracy.
Deep learning (DL) approaches have shown significant promise for automated segmentation of brain tumors on MRI but depend on the quality of input training images. We sought to evaluate the relationship between IQMs of input training images and DL‐based brain tumor segmentation accuracy toward developing more generalizable models for multi‐institutional data.
We trained a 3D DenseNet model on the BraTS 2020 cohorts for segmentation of tumor subregions enhancing tumor (ET), peritumoral edematous, and necrotic and non‐ET on MRI; with performance quantified via a 5‐fold cross‐validated Dice coefficient. MRI scans were evaluated through the open‐source quality control tool MRQy, to yield 13 IQMs per scan. The Pearson correlation coefficient was computed between whole tumor (WT) dice values and IQM measures in the training cohorts to identify quality measures most correlated with segmentation performance. Each selected IQM was used to group MRI scans as “better” quality (BQ) or “worse” quality (WQ), via relative thresholding. Segmentation performance was re‐evaluated for the DenseNet model when (i) training on BQ MRI images with validation on WQ images, as well as (ii) training on WQ images, and validation on BQ images. Trends were further validated on independent test sets derived from the BraTS 2021 training cohorts.
For this study, multimodal MRI scans from the BraTS 2020 training cohorts were used to train the segmentation model and validated on independent test sets derived from the BraTS 2021 cohort. Among the selected IQMs, models trained on BQ images based on inhomogeneity measurements (coefficient of variance, coefficient of joint variation, coefficient of variation of the foreground patch) and the models trained on WQ images based on noise measurement peak signal‐to‐noise ratio (SNR) yielded significantly improved tumor segmentation accuracy compared to their inverse models.
Our results suggest that a significant correlation may exist between specific MR IQMs and DenseNet‐based brain tumor segmentation performance. The selection of MRI scans for model training based on IQMs may yield more accurate and generalizable models in unseen validation.