Search for: All records

Creators/Authors contains: "Zheng, X."

  1. Kochmar, E ; Bexte, M ; Burstein, J ; Horbach, A ; Laarmann-Quante, R ; Tack, A ; Yaneva, V ; Yuan, Z (Ed.)
    Free, publicly-accessible full text available June 20, 2025
  2. Kochmar, E ; Bexte, M ; Burstein, J ; Horbach, A ; Laarmann-Quante, R ; Tack, A ; Yaneva, V ; Yuan, Z (Ed.)
    The practice of soliciting self-explanations from students is widely recognized for its pedagogical benefits. However, the labor-intensive effort required to manually assess students’ explanations makes it impractical for classroom settings. As a result, current solutions for gauging students’ understanding during class are often limited to multiple-choice or fill-in-the-blank questions, which are less effective at exposing misconceptions or helping students understand and integrate new concepts. Recent advances in large language models (LLMs) present an opportunity to assess student explanations in real time, making explanation-based classroom response systems feasible. In this work, we investigate LLM-based approaches for assessing the correctness of students’ explanations in response to undergraduate computer science questions. We investigate alternative prompting approaches for multiple LLMs (i.e., Llama 2, GPT-3.5, and GPT-4) and compare their performance to fine-tuned FLAN-T5 models. The results suggest that the highest accuracy and weighted F1 score were achieved by fine-tuning FLAN-T5, while an in-context learning approach with GPT-4 attains the highest macro F1 score.
    Free, publicly-accessible full text available June 20, 2025
  3. Abelló, A ; Vassiliadis, P ; Romero, O ; Wrembel, R ; Bugiotti, F ; Gamper, J ; Vargas-Solar, G ; Zumpano, E (Ed.)
    Constructing knowledge graphs from heterogeneous data sources and evaluating their quality and consistency are important research questions in the field of knowledge graphs. We propose mapping rules to guide users in translating data from relational and graph sources into a meaningful knowledge graph, and we design a user-friendly language to specify the mapping rules. Given the mapping rules and constraints on the source data, equivalent constraints on the target graph can be inferred; these are referred to as data source constraints. Besides this type of constraint, we design two other types: user-specified constraints and general rules that a high-quality knowledge graph should adhere to. We translate the three types of constraints into uniform expressions in the form of graph functional dependencies and extended graph dependencies, which can be used for consistency checking. Our approach provides a systematic way to build and evaluate knowledge graphs from diverse data sources.
  4. Aims. We study the ensemble X-ray variability properties of active galactic nuclei (AGN) over large ranges of timescale (20 ks ≤ T ≤ 14 yr), redshift (0 ≤ z ≲ 3), luminosity (10^40 erg s^(-1) ≤ L_X ≤ 10^46 erg s^(-1)), and black hole (BH) mass (10^6 to 10^9 M_⊙). Methods. We propose the use of the variance-frequency diagram as a viable alternative to the study of the power spectral density (PSD), which is not yet accessible for distant, faint, and/or sparsely sampled AGN. Results. We show that the data collected from archival observations and previous literature studies are fully consistent with a universal PSD form, which does not show any evidence for systematic evolution of shape or amplitude with redshift or luminosity, even if there may be differences between individual AGN at a given redshift or luminosity. We find new evidence that the PSD bend frequency depends on BH mass and possibly on accretion rate. We finally discuss the implications for current and future AGN population and cosmological studies.
  5. Over the eastern North Atlantic (ENA) ocean, a total of 21 non-drizzling single-layer marine boundary layer (MBL) stratus and stratocumulus cloud case periods are selected to investigate the impacts of environmental variables on the aerosol-cloud interaction (ACI_r), using ground-based measurements from the Department of Energy Atmospheric Radiation Measurement (ARM) facility at the ENA site during the period 2016–2018. The ACI_r represents the relative change of cloud-droplet effective radius r_e with respect to the relative change of cloud condensation nuclei (CCN) number concentration (N_CCN) in the water vapor stratified environment. The ACI_r values vary from -0.004 to 0.207 with increasing precipitable water vapor (PWV) conditions, indicating that r_e is more sensitive to the CCN loading under sufficient water vapor supply, owing to the combined effect of enhanced condensational growth and coalescence processes associated with higher N_c and PWV. The environmental effects on ACI_r are examined by stratifying the data into different lower tropospheric stability (LTS) and vertical component of turbulence kinetic energy (TKE_w) regimes. Higher LTS is normally associated with a more adiabatic cloud layer and a lower boundary layer, and thus results in higher CCN to cloud droplet conversion and higher ACI_r. The ACI_r values under a range of PWV double from the low TKE_w regime to the high TKE_w regime, indicating a strong impact of turbulence on the ACI_r. The stronger boundary layer turbulence represented by higher TKE_w strengthens the connection and interaction between cloud microphysical properties and the underlying CCN and moisture sources. With sufficient water vapor and low CCN loading, the active coalescence process broadens the cloud droplet size distribution spectra and consequently results in an enlargement of r_e. The enhanced N_c conversion and condensational growth induced by more intrusions of CCN effectively decrease r_e, which jointly presents as the increased ACI_r. The TKE_w median value of 0.08 m^2 s^(-2) suggests a feasible way to distinguish the turbulence-enhanced aerosol-cloud interaction in non-drizzling MBL clouds.
  6. We develop a framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data. Our approach is based on a recent algebraic characterization of DAGs that led to a fully continuous program for score-based learning of DAG models parametrized by a linear structural equation model (SEM). We extend this algebraic characterization to nonparametric SEM by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models including GLMs, additive noise models, and index models as special cases. Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g. without additive noise), general differentiable loss functions, and generic black-box optimization routines.
  7. Spatial–temporal data arise frequently in biomedical, environmental, political and social science studies. Capturing dynamic changes of time-varying correlation structure is scientifically important in spatio-temporal data analysis. We approximate the time-varying empirical estimator of the spatial correlation matrix by groups of selected basis matrices representing substructures of the correlation matrix. After projecting the correlation structure matrix onto a space spanned by basis matrices, we also incorporate varying-coefficient model selection and estimation for signals associated with relevant basis matrices. The unique feature of the proposed method is that signals at local regions corresponding with time can be identified through the proposed penalized objective function. Theoretically, we show model selection consistency and the oracle property in detecting local signals for the varying-coefficient estimators. The proposed method is illustrated through simulation studies and brain fMRI data. 