Title: Assessing Fit in Ordinal Factor Analysis Models: SRMR vs. RMSEA
This study introduces the statistical theory for using the Standardized Root Mean Squared Residual (SRMR) to test close fit in ordinal factor analysis. We also compare the accuracy of confidence intervals (CIs) and tests of close fit based on the SRMR with those based on the Root Mean Squared Error of Approximation (RMSEA). We use Unweighted Least Squares (ULS) estimation with a mean- and variance-corrected test statistic. The current (biased) implementation of the RMSEA never rejects the hypothesis that a model fits closely when data are binary, and it almost invariably rejects the model in large samples when the data have five categories. The unbiased RMSEA produces better rejection rates, but it is accurate enough only when the number of variables is small (e.g., p = 10) and the degree of misfit is small. In contrast, across all simulated conditions, the tests of close fit based on the SRMR yield acceptable Type I error rates. SRMR tests of close fit are also more powerful than those using the unbiased RMSEA.
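As a rough illustration of the index discussed in the abstract, the sketch below computes a plain sample SRMR as the root mean square of the residual correlations. It is not the unbiased estimator, CI, or close-fit test developed in the study. It assumes the (polychoric) correlation matrix and the model-implied correlation matrix have already been estimated elsewhere; the function name `sample_srmr`, the `include_diagonal` flag, and the example loadings are hypothetical.

```python
import numpy as np


def sample_srmr(sample_corr, implied_corr, include_diagonal=False):
    """Root mean square of residual correlations (naive sample SRMR).

    Assumption: both inputs are p x p correlation matrices. For correlation
    structures the diagonal is fixed at 1, so by default only the
    p(p - 1)/2 off-diagonal residuals are averaged.
    """
    s = np.asarray(sample_corr, dtype=float)
    m = np.asarray(implied_corr, dtype=float)
    if s.ndim != 2 or s.shape[0] != s.shape[1] or s.shape != m.shape:
        raise ValueError("inputs must be square matrices of the same shape")
    # k=-1 selects the strictly lower triangle, k=0 adds the diagonal.
    k = 0 if include_diagonal else -1
    rows, cols = np.tril_indices(s.shape[0], k=k)
    residuals = s[rows, cols] - m[rows, cols]
    return float(np.sqrt(np.mean(residuals ** 2)))


if __name__ == "__main__":
    # Hypothetical one-factor model: implied correlations are products of loadings.
    loadings = np.array([0.8, 0.7, 0.6])
    implied = np.outer(loadings, loadings)
    np.fill_diagonal(implied, 1.0)
    sample = np.array([[1.00, 0.50, 0.52],
                       [0.50, 1.00, 0.40],
                       [0.52, 0.40, 1.00]])
    print(sample_srmr(sample, implied))
```

Averaging only the off-diagonal elements is a design choice for correlation structures; pass `include_diagonal=True` to average over all p(p + 1)/2 unique elements instead.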
Award ID(s): 1659936
NSF-PAR ID: 10099860
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Structural Equation Modeling
ISSN: 1070-5511
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We examined the effect of three estimation methods (maximum likelihood [ML], unweighted least squares [ULS], and diagonally weighted least squares [DWLS]) on three population structural equation modeling (SEM) fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). We considered different types and levels of misspecification in factor analysis models: misspecified dimensionality, omitted cross-loadings, and ignored residual correlations. The estimation method had a substantial impact on the RMSEA and CFI, so different cutoff values need to be employed for different estimators. In contrast, the SRMR is robust to the method used to estimate the model parameters: the same criterion can be applied at the population level when using the SRMR to evaluate model fit, regardless of the estimation method.
  2. We examine the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models. In a simulation study, we found that under normality, the MV-corrected SRMR statistic provides reasonably accurate Type I error rates even in small samples and for large models, clearly outperforming the current standard, that is, the likelihood ratio (LR) test. When data show excess kurtosis, MV-corrected SRMR p values are accurate only in small models (p = 10), or in medium-sized models (p = 30) if no skewness is present and sample sizes are at least 500. Overall, when data are not normal, the MV-corrected LR test seems to outperform the MV-corrected SRMR. We elaborate on these findings by showing that the asymptotic approximation to the mean of the SRMR sampling distribution is quite accurate, while the asymptotic approximation to the standard deviation is not.
  3. Background

    Deep learning (DL)‐based automatic segmentation models can expedite manual segmentation yet require resource‐intensive fine‐tuning before deployment on new datasets. The generalizability of DL methods to new datasets without fine‐tuning is not well characterized.

    Purpose

    Evaluate the generalizability of DL‐based models by deploying pretrained models on independent datasets varying by MR scanner, acquisition parameters, and subject population.

    Study Type

    Retrospective based on prospectively acquired data.

    Population

    Overall test dataset: 59 subjects (26 females); Study 1: 5 healthy subjects (zero females), Study 2: 8 healthy subjects (eight females), Study 3: 10 subjects with osteoarthritis (eight females), Study 4: 36 subjects with various knee pathology (10 females).

    Field Strength/Sequence

    A 3‐T, quantitative double‐echo steady state (qDESS).

    Assessment

    Four annotators manually segmented knee cartilage. Each reader segmented one of four qDESS datasets in the test dataset. Two DL models, one trained on qDESS data and another on Osteoarthritis Initiative (OAI)‐DESS data, were assessed. Manual and automatic segmentations were compared by quantifying variations in segmentation accuracy, volume, and T2 relaxation times for superficial and deep cartilage.

    Statistical Tests

    Dice similarity coefficient (DSC) for segmentation accuracy. Lin's concordance correlation coefficient (CCC), Wilcoxon rank‐sum tests, and root‐mean‐squared error‐coefficient‐of‐variation to quantify manual vs. automatic T2 and volume variations. Bland–Altman plots for manual vs. automatic T2 agreement. A P value < 0.05 was considered statistically significant.

    Results

    DSCs for the qDESS‐trained model, 0.79–0.93, were higher than those for the OAI‐DESS‐trained model, 0.59–0.79. T2 and volume CCCs for the qDESS‐trained model, 0.75–0.98 and 0.47–0.95, were higher than respective CCCs for the OAI‐DESS‐trained model, 0.35–0.90 and 0.13–0.84. Bland–Altman 95% limits of agreement for superficial and deep cartilage T2 were lower for the qDESS‐trained model, ±2.4 msec and ±4.0 msec, than the OAI‐DESS‐trained model, ±4.4 msec and ±5.2 msec.

    Data Conclusion

    The qDESS‐trained model may generalize well to independent qDESS datasets regardless of MR scanner, acquisition parameters, and subject population.

    Evidence Level

    1

    Technical Efficacy

    Stage 1

     
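The two agreement metrics named under "Statistical Tests" in item 3 can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the study's own pipeline; the function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) variances and covariance, as in the usual definition.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    mean_x, mean_y = x.mean(), y.mean()
    covariance = np.mean((x - mean_x) * (y - mean_y))
    return float(2.0 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2))
```

Unlike a Pearson correlation, the CCC penalizes a constant offset between manual and automatic values, which is why it suits agreement checks on T2 and volume.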
  4. Purpose

    To improve the performance of neural networks for parameter estimation in quantitative MRI, in particular when the noise propagation varies throughout the space of biophysical parameters.

    Theory and Methods

    A theoretically well‐founded loss function is proposed that normalizes the squared error of each estimate by its respective Cramér–Rao bound (CRB), a theoretical lower bound for the variance of an unbiased estimator. This avoids a dominance of hard‐to‐estimate parameters and areas in parameter space, which are often of little interest. The normalization by the corresponding CRB balances the large errors of fundamentally more noisy estimates and the small errors of fundamentally less noisy estimates, allowing the network to better learn to estimate the latter. Further, the proposed loss function provides an absolute evaluation metric for performance: a network has an average loss of 1 if it is a maximally efficient unbiased estimator, which can be considered the ideal performance. The performance gain with the proposed loss function is demonstrated using the example of an eight‐parameter magnetization transfer model that is fitted to phantom and in vivo data.

    Results

    Networks trained with the proposed loss function perform close to optimal, that is, their loss converges to approximately 1, and their performance is superior to that of networks trained with the standard mean‐squared error (MSE). The proposed loss function reduces the bias of the estimates compared to the MSE loss, and improves the match of the noise variance to the CRB. This performance gain translates to in vivo maps that align better with the literature.

    Conclusion

    Normalizing the squared error with the CRB during the training of neural networks improves their performance in estimating biophysical parameters.

     
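The CRB normalization described in item 4 can be sketched in a few lines. This is a minimal NumPy illustration, assuming the CRB of every parameter has already been computed for every training sample; the function name, array layout, and example values are hypothetical, and a real training loop would use an autodiff framework instead.

```python
import numpy as np


def crb_normalized_loss(estimates, targets, crb):
    """Mean squared error with each term divided by its Cramér-Rao bound.

    Assumed layout: all arrays have shape (n_samples, n_parameters), and
    crb[i, j] is the variance bound for parameter j at sample i.
    """
    est = np.asarray(estimates, dtype=float)
    tgt = np.asarray(targets, dtype=float)
    bound = np.asarray(crb, dtype=float)
    if not (est.shape == tgt.shape == bound.shape):
        raise ValueError("estimates, targets, and crb must share one shape")
    if np.any(bound <= 0):
        raise ValueError("CRB values must be strictly positive")
    return float(np.mean((est - tgt) ** 2 / bound))


if __name__ == "__main__":
    # Simulated efficient unbiased estimator: error variance equals the CRB.
    rng = np.random.default_rng(0)
    targets = rng.uniform(0.0, 1.0, size=(100_000, 2))
    crb = np.tile([4.0, 0.01], (100_000, 1))  # one noisy, one precise parameter
    estimates = targets + rng.normal(size=targets.shape) * np.sqrt(crb)
    print(crb_normalized_loss(estimates, targets, crb))  # should be close to 1
```

With an unnormalized MSE, the noisy first parameter in this demo would dominate the loss by a factor of 400; after normalization both parameters contribute on the same scale.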
  5. Abstract

    Inflow anomalies at varying temporal scales, seasonally varying storage mandates, and multipurpose allocation requirements contribute to reservoir operational decisions. The difficulty of capturing these constraints across many basins in a generalized framework has limited the accuracy of streamflow estimates in land‐surface models for locations downstream of reservoirs. We develop a Piecewise Linear Regression Tree to learn generalized daily operating policies from 76 reservoirs from four major basins across the coterminous US. Reservoir characteristics, such as residence time and maximum storage, and daily state variables, such as storage and inflow, are used to group similar observations across all reservoirs. Linear regression equations are then fit between daily state variables and release for each group. We recommend two models—Model 1 (M1) that performs the best when simulating untrained records but is complex and Model 2 (M2) that is nearly as performant as M1 but more parsimonious. The simulated release median root mean squared error is 49.7% (53.2%) of mean daily release with a median Nash‐Sutcliffe efficiency of 0.62 (0.52) for M1 (M2). Long‐term residence time is shown to be useful in grouping similar operating reservoirs. Release from low residence time reservoirs can be mostly described using inflow‐based variables. Operations at higher residence time reservoirs are more related to previous release variables or storage variables, depending on the current inflow. The ability of the models presented to capture operational dynamics of many types of reservoirs indicates their potential to be used for untrained and limited data reservoirs.

     
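The two skill scores reported in item 5 can be computed as sketched below. This is a minimal illustration using the standard definitions of the Nash‐Sutcliffe efficiency and of RMSE expressed as a percentage of the mean observed release; the function names are hypothetical, and the study's exact aggregation across reservoirs is not reproduced here.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")
    denominator = np.sum((obs - obs.mean()) ** 2)
    if denominator == 0:
        raise ValueError("observed series has zero variance")
    return float(1.0 - np.sum((obs - sim) ** 2) / denominator)


def rmse_percent_of_mean(observed, simulated):
    """RMSE expressed as a percentage of the mean observed value."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return float(100.0 * rmse / obs.mean())
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than always predicting the mean observed release.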