In this article, a testlet hierarchical diagnostic classification model (TH-DCM) was introduced to take both attribute hierarchies and item bundles into account. The expectation-maximization algorithm with an analytic dimension reduction technique was used for parameter estimation. A simulation study was conducted to assess the parameter recovery of the proposed model under varied conditions, and to compare TH-DCM with the testlet higher-order CDM (THO-DCM; Hansen, M. (2013). Hierarchical item response models for cognitive diagnosis (Unpublished doctoral dissertation). UCLA; Zhan, P., Li, X., Wang, W.-C., Bian, Y., & Wang, L. (2015). The multidimensional testlet-effect cognitive diagnostic models. Acta Psychologica Sinica, 47(5), 689. https://doi.org/10.3724/SP.J.1041.2015.00689). Results showed that (1) ignoring large testlet effects worsened parameter recovery, (2) DCMs assuming equal testlet effects within each testlet performed as well as the testlet model assuming unequal testlet effects under most conditions, (3) misspecifications in the joint attribute distribution had a differential impact on parameter recovery, and (4) THO-DCM seems to be a robust alternative to TH-DCM under some hierarchical structures. A set of real data was also analyzed for illustration.
A testing based approach to the discovery of differentially correlated variable sets
Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
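The quantity at the center of the abstract, the difference in average pairwise correlation of a variable set between two sample conditions, can be illustrated with a minimal sketch. This is not the authors' DCM procedure: it omits the iterative search and the hypothesis tests, and only evaluates the score for a fixed candidate set. The function names (`pearson`, `mean_pairwise_correlation`, `differential_correlation`) and the toy data are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(columns, subset):
    """Average correlation over all pairs of variables in `subset`.

    `columns[i]` holds the observations of variable i under one condition.
    """
    pairs = list(combinations(subset, 2))
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def differential_correlation(cols_a, cols_b, subset):
    """Average pairwise correlation under condition A minus condition B."""
    return (mean_pairwise_correlation(cols_a, subset)
            - mean_pairwise_correlation(cols_b, subset))


# Hypothetical toy data: variables 0 and 1 move together under condition A
# and in opposite directions under condition B.
cols_a = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
cols_b = [[1, 2, 3, 4], [8, 6, 4, 2], [4, 1, 3, 2]]

score = differential_correlation(cols_a, cols_b, [0, 1])
print(f"differential correlation of set {{0, 1}}: {score:.3f}")
```

A large positive score marks a set whose variables are more strongly correlated under the first condition than under the second; the method described above searches for such sets adaptively rather than scoring a fixed one.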
- Award ID(s): 1633212
- PAR ID: 10073282
- Date Published:
- Journal Name: Annals of Applied Statistics
- Volume: 12
- Issue: 2
- ISSN: 1941-7330
- Page Range / eLocation ID: 1180-1203
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Subdata selection from big data is an active area of research that facilitates inferences based on big data with limited computational expense. For linear regression models, the optimal design-inspired Information-Based Optimal Subdata Selection (IBOSS) method is a computationally efficient method for selecting subdata that has excellent statistical properties. But the method can only be used if the subdata size, k, is at least twice the number of regression variables, p. In addition, even when $$k\ge 2p$$, under the assumption of effect sparsity, one can expect to obtain subdata with better statistical properties by trying to focus on active variables. Inspired by recent efforts to extend the IBOSS method to situations with a large number of variables p, we introduce a method called Combining Lasso And Subdata Selection (CLASS) that, as shown, improves on other proposed methods in terms of variable selection and building a predictive model based on subdata when the full data size n is very large and the number of variables p is large. In terms of computational expense, CLASS is more expensive than recent competitors for moderately large values of n, but the roles reverse under effect sparsity for extremely large values of n.
-
Many inverse problems involve two or more sets of variables that represent different physical quantities but are tightly coupled with each other. For example, image super-resolution requires joint estimation of the image and motion parameters from noisy measurements. Exploiting this structure is key for efficiently solving these large-scale optimization problems, which are often ill-conditioned. In this paper, we present a new method called Linearize And Project (LAP) that offers a flexible framework for solving inverse problems with coupled variables. LAP is most promising for cases when the subproblem corresponding to one of the variables is considerably easier to solve than the other. LAP is based on a Gauss–Newton method, and thus after linearizing the residual, it eliminates one block of variables through projection. Due to the linearization, this block can be chosen freely. Further, LAP supports direct, iterative, and hybrid regularization as well as constraints. Therefore LAP is attractive, e.g., for ill-posed imaging problems. These traits differentiate LAP from common alternatives for this type of problem such as variable projection (VarPro) and block coordinate descent (BCD). Our numerical experiments compare the performance of LAP to BCD and VarPro using three coupled problems whose forward operators are linear with respect to one block and nonlinear for the other set of variables.
-
This paper proposes a simple yet effective method for power system probabilistic transient stability assessment considering wind farm uncertainties and correlations. Specifically, the inverse Nataf-transformation-based three-point estimation method and the Cornish-Fisher expansion are integrated to deal with the uncertainties and the correlations among different wind farms. Then, by resorting to the extended dynamic security region approach, the transient stability criterion is derived as a linear combination of the nodal injection vector under a given fault condition. New indices for the identification of critical lines have also been developed. Extensive simulation results on four different systems, including the practical GZ power system in China, show that the computational efficiency of the proposed method is much higher than that of the Monte-Carlo-based method and other methods, with almost no loss of accuracy. The effectiveness of the proposed method under different penetrations of wind power with different degrees of correlation is also validated. It is shown that correlation among wind farms has a larger impact on the transient stability results at higher penetration levels of renewable energy.
-
This paper investigates the problem of selecting instrumental variables relative to a target causal influence X→Y from observational data generated by linear non-Gaussian acyclic causal models in the presence of unmeasured confounders. We propose a necessary condition for detecting variables that cannot serve as instrumental variables. Unlike many existing conditions for continuous variables, which require that at least two valid instrumental variables be present in the system, our condition is designed for a single instrumental variable. We then characterize the graphical implications of our condition in linear non-Gaussian acyclic causal models. Given that the existing graphical criteria for instrument validity are not directly testable from observational data, we further show whether and how such graphical criteria can be checked by exploiting our condition. Finally, we develop a method to select the set of candidate instrumental variables given observational data. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method.

