
Title: The Effect of Estimation Methods on SEM Fit Indices
We examined the effect of estimation methods, maximum likelihood (ML), unweighted least squares (ULS), and diagonally weighted least squares (DWLS), on three population SEM (structural equation modeling) fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). We considered different types and levels of misspecification in factor analysis models: misspecified dimensionality, omitting cross-loadings, and ignoring residual correlations. Estimation methods had substantial impacts on the RMSEA and CFI so that different cutoff values need to be employed for different estimators. In contrast, SRMR is robust to the method used to estimate the model parameters. The same criterion can be applied at the population level when using the SRMR to evaluate model fit, regardless of the choice of estimation method.
Journal Name:
Educational and Psychological Measurement
Page Range / eLocation ID:
421 to 445
Sponsoring Org:
National Science Foundation
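As a rough illustration of the indices named in the abstract above, the sketch below computes the SRMR and an ML-based RMSEA from a population covariance matrix and a model-implied matrix. It is a minimal sketch, not the authors' code: the function names (`srmr`, `ml_discrepancy`, `rmsea`) are hypothetical, the SRMR uses one common definition (residuals standardized by the observed standard deviations and averaged over the unique elements), and the implied matrix is assumed to already be the minimizer of the chosen discrepancy function. The toy example fits an independence model to three variables with population correlations of 0.3.

```python
import numpy as np


def srmr(pop_cov, implied_cov):
    """SRMR under one common definition (an assumption of this sketch)."""
    s = np.asarray(pop_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    p = s.shape[0]
    # Standardize each residual by the observed standard deviations.
    sd = np.sqrt(np.diag(s))
    std_resid = (s - sigma) / np.outer(sd, sd)
    # Average the squared residuals over the p(p+1)/2 unique elements.
    rows, cols = np.tril_indices(p)
    return float(np.sqrt(np.mean(std_resid[rows, cols] ** 2)))


def ml_discrepancy(pop_cov, implied_cov):
    """ML discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    s = np.asarray(pop_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return float(logdet_sigma - logdet_s + np.trace(s @ np.linalg.inv(sigma)) - p)


def rmsea(f0, df):
    """Population RMSEA from a minimized discrepancy f0 and model df."""
    return float(np.sqrt(max(f0, 0.0) / df))


# Toy population: three variables, all correlations 0.3.
pop = np.full((3, 3), 0.3)
np.fill_diagonal(pop, 1.0)
# Independence model: implied matrix is the identity, df = p(p-1)/2 = 3.
implied = np.eye(3)

f0 = ml_discrepancy(pop, implied)
print("SRMR :", srmr(pop, implied))
print("RMSEA:", rmsea(f0, df=3))
```

The SRMR here depends only on the residual matrix, whereas the RMSEA depends on which discrepancy function is minimized; swapping `ml_discrepancy` for a ULS or DWLS discrepancy would change the RMSEA for the same model, which is the kind of estimator dependence the abstract describes.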
More Like this
  1. This study introduces the statistical theory of using the standardized root mean squared residual (SRMR) to test close fit in ordinal factor analysis. We also compare the accuracy of confidence intervals (CIs) and tests of close fit based on the SRMR with those based on the root mean squared error of approximation (RMSEA). We use unweighted least squares (ULS) estimation with a mean- and variance-corrected test statistic. The current (biased) implementation of the RMSEA never rejects that a model fits closely when data are binary and almost invariably rejects the model in large samples if data consist of five categories. The unbiased RMSEA produces better rejection rates, but it is only accurate enough when the number of variables is small (e.g., p = 10) and the degree of misfit is small. In contrast, across all simulated conditions, the tests of close fit based on the SRMR yield acceptable Type I error rates. SRMR tests of close fit are also more powerful than those using the unbiased RMSEA.
  2. Accurate, precise, and timely estimation of crop yield is key to a grower’s ability to proactively manage crop growth and predict harvest logistics. Such yield predictions typically are based on multi-parametric models and in-situ sampling. Here we investigate the extension of a greenhouse study to low-altitude unmanned aerial systems (UAS). Our principal objective was to investigate snap bean crop (Phaseolus vulgaris) yield using imaging spectroscopy (hyperspectral imaging) in the visible to near-infrared (VNIR; 400–1000 nm) region via UAS. We aimed to solve the problem of crop yield modelling by identifying spectral features explaining yield and evaluating the best time period for accurate, early yield prediction. We introduced a Python library, named Jostar, for spectral feature selection. Embedded in Jostar, we proposed a new ranking method for selected features that reaches an agreement between multiple optimization models. Moreover, we implemented a well-known denoising algorithm for the spectral data used in this study. This study benefited from two years of remotely sensed data, captured at multiple instances over the summers of 2019 and 2020, with 24 plots and 18 plots, respectively. Two harvest stage models, early and late harvest, were assessed at two different locations in upstate New York, USA. Six varieties of snap bean were quantified using two components of yield, pod weight and seed length. We used two different vegetation detection algorithms, the Red-Edge Normalized Difference Vegetation Index (RENDVI) and Spectral Angle Mapper (SAM), to subset the fields into vegetation vs. non-vegetation pixels. Partial least squares regression (PLSR) was used as the regression model. Among nine different optimization models embedded in Jostar, we selected the Genetic Algorithm (GA), Ant Colony Optimization (ACO), Simulated Annealing (SA), and Particle Swarm Optimization (PSO) and their resulting joint ranking. The findings show that pod weight can be explained with a high coefficient of determination (R² = 0.78–0.93) and low root-mean-square error (RMSE = 940–1369 kg/ha) for two years of data. Seed length yield assessment resulted in higher accuracies (R² = 0.83–0.98) and lower errors (RMSE = 4.245–6.018 mm). Among the optimization models used, ACO and SA outperformed the others, and the SAM vegetation detection approach showed improved results compared to the RENDVI approach when dense canopies were being examined. Wavelengths at 450, 500, 520, 650, 700, and 760 nm were identified in almost all data sets and harvest stage models used. The period of 44–55 days after planting (DAP) was the optimal time period for yield assessment. Future work should involve transferring the learned concepts to a multispectral system, for eventual operational use; further attention should also be paid to seed length as a ground truth data collection technique, since this yield indicator is far more rapid and straightforward.
  3. The association between elevation (agro-climatic zones, ACZs) and the mean annual total rainfall (MATRF) is not straightforward in different parts of the world. This study sought to estimate the amount of MATRF across four elevation zones of Jema watershed, which is situated in the northwestern highlands of Ethiopia, by employing an appropriate interpolation method. The elevation of the watershed ranges from 1895 to 3518 m a.s.l. For this study, 34 sample MATRF values, recorded from 1983 to 2010, were extracted from satellite and nearby gauge station data. These data sources were reconstructed by the International Research Institute for Climate and Society at Columbia University, USA, at a scale of 10 km by 10 km. An elevation data set generated from a digital elevation model with 30-m resolution (DEM 30 m) was considered as a covariable to estimate the MATRF. To identify the optimal interpolation model, mean errors were computed using cross-validation statistics. The root-mean-square error (RMSE) analysis showed that ordinary cokriging (OCK) was the most accurate model with a predictive power of 87.3%. The root-mean-square standardized error (RMSSE) analysis showed that the best precision value (0.72) occurred in OCK. Stable and Gaussian trend lines together with local polynomial types of trend removal, and an elliptical neighborhood search function could perform best to maximize the accuracy and the precision of estimating MATRF. Elevation, as a covariable, enhanced the degree of accuracy and precision of estimation. The value of the trend line function (least squares) between the MATRF and elevation was very weak (R² = 0.07), whereas the value of the trend line function (least squares) between the MATRF and the longitude coordinates (east–west direction) was medium (R² = 0.34). The estimated MATRF for the entire watershed under study ranged from 1228 to 1640 mm. To conclude, elevation could contribute to the estimation of the MATRF. The value of the MATRF showed a declining pattern from the lower to higher elevation areas of the watershed.
  4. To implement equilibrium hard-modeling of spectroscopic titration data, the analyst must make a variety of crucial data processing choices that address negative absorbance and molar absorptivity values. The efficacy of three such methodological options is evaluated via high-throughput Monte Carlo simulations, root-mean-square error surface mapping, and two mathematical theorems. Accuracy of the calculated binding constant values constitutes the key figure of merit used to compare different data analysis approaches. First, using singular value decomposition to filter the raw absorbance data prior to modeling often reduces the number of negative values involved but has little effect on the calculated binding constant despite its ability to address spectrometer noise. Second, both truncation of negative molar absorptivity values and the fast nonnegative least squares algorithms are superior to unconstrained regression because they avoid local minima; however, they introduce bias into the calculated binding constants in the presence of negative baseline offsets. Finally, we establish two theorems showing that negative values are best addressed when all the chemical solutions leading to the raw absorbance data are the result of mixing exactly two distinct stock solutions. This allows the raw absorbance data to be shifted up, eliminating negative baseline offsets, without affecting the concentration matrix, residual matrix, or calculated binding constants. Otherwise, the data cannot be safely upshifted. A comprehensive protocol for analyzing experimental absorbance datasets with is included.

  5. We introduce a novel approach to waveform inversion based on a data-driven reduced order model (ROM) of the wave operator. The presentation is for the acoustic wave equation, but the approach can be extended to elastic or electromagnetic waves. The data are time-resolved measurements of the pressure wave gathered by an acquisition system that probes the unknown medium with pulses and measures the generated waves. We propose to solve the inverse problem of velocity estimation by minimizing the square misfit between the ROM computed from the recorded data and the ROM computed from the modeled data, at the current guess of the velocity. We give a step-by-step computation of the ROM, which depends nonlinearly on the data and yet can be obtained from them in a noniterative fashion, using efficient methods from linear algebra. We also explain how to make the ROM robust to data inaccuracy. The ROM computation requires the full array response matrix gathered with colocated sources and receivers. However, we find that the computation can deal with an approximation of this matrix, obtained from towed-streamer data using interpolation and reciprocity on-the-fly. Although the full-waveform inversion approach of nonlinear least-squares data fitting is challenging without low-frequency information, due to multiple minima of the data fit objective function, we find that the ROM misfit objective function has better behavior, even for a poor initial guess. We also find by explicit computation of the objective functions in a simple setting that the ROM misfit objective function has convexity properties, whereas the least-squares data fit objective function displays multiple local minima.