This content will become publicly available on December 31, 2025

Title: Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims
This article presents the latest developments to ClaimBuster’s claim-spotting model, which tackles the critical task of identifying check-worthy claims from large streams of information. We introduce the first adversarially regularized, transformer-based claim-spotting model, which achieves state-of-the-art results on several benchmark datasets. In addition to analyzing model performance metrics, we also quantitatively and qualitatively analyze the impact of ClaimBuster’s real-world deployment. Moreover, to help facilitate reproducibility and community engagement, we publicly release our codebase, dataset, data curation platform, API, Google Colab notebooks, and various ClaimBuster-based demo systems at claimbuster.org.
Award ID(s):
2346261
PAR ID:
10644849
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Association for Computing Machinery
Date Published:
Journal Name:
ACM Transactions on Intelligent Systems and Technology
Volume:
15
Issue:
6
ISSN:
2157-6904
Page Range / eLocation ID:
1 to 25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable contradicts the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/.
  2. Abstract Discovering new materials is a challenging task in materials science crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly, with success heavily depending on experts’ heuristic knowledge. Here, we propose a deep learning based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators, and by more than 45% compared to our previous CubicGAN model. Density Functional Theory (DFT) calculations are used to validate the generated structures: 1869 materials out of 2000 are successfully optimized and deposited into the Carolina Materials Database (www.carolinamatdb.org), of which 39.6% have negative formation energy and 5.3% have energy-above-hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.
  3. Abstract We introduce a new framework called Machine Learning (ML) based Auroral Ionospheric electrodynamics Model (ML‐AIM). ML‐AIM solves a current continuity equation by utilizing the ML model of Field Aligned Currents of Kunduri et al. (2020, https://doi.org/10.1029/2020JA027908), the FAC‐derived auroral conductance model of Robinson et al. (2020, https://doi.org/10.1029/2020JA028008), and the solar irradiance conductance model of Moen and Brekke (1993, https://doi.org/10.1029/92gl02109). The ML‐AIM inputs are 60‐min time histories of solar wind plasma, interplanetary magnetic fields (IMF), and geomagnetic indices, and its outputs are ionospheric electric potential, electric fields, Pedersen/Hall currents, and Joule Heating. We conduct two ML‐AIM simulations for a weak geomagnetic activity interval on 14 May 2013 and a geomagnetic storm on 7–8 September 2017. ML‐AIM produces physically accurate ionospheric potential patterns such as the two‐cell convection pattern and the enhancement of electric potentials during active times. The cross polar cap potentials (ΦPC) from ML‐AIM, the Weimer (2005, https://doi.org/10.1029/2004ja010884) model, and the Super Dual Auroral Radar Network (SuperDARN) data‐assimilated potentials are compared to the ones from 3204 polar crossings of the Defense Meteorological Satellite Program F17 satellite, showing better performance of ML‐AIM than the others. ML‐AIM is unique and innovative because it predicts ionospheric responses to the time‐varying solar wind and geomagnetic conditions, while other traditional empirical models like Weimer (2005, https://doi.org/10.1029/2004ja010884) are designed to provide a quasi‐static ionospheric condition under quasi‐steady solar wind/IMF conditions. Plans are underway to improve ML‐AIM performance by including a fully ML network of models of aurora precipitation and ionospheric conductance, targeting its characterization of geomagnetically active times.
  4. Abstract Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter (DM) components that cannot be directly observed. Galaxy formation simulations can be used to study the relationship between DM density fields and galaxy distributions. However, this relationship can be sensitive to assumptions in cosmology and astrophysical processes embedded in galaxy formation models, which remain uncertain in many aspects. In this work, we develop a diffusion generative model to reconstruct DM fields from galaxies. The diffusion model is trained on the CAMELS simulation suite that contains thousands of state-of-the-art galaxy formation simulations with varying cosmological parameters and subgrid astrophysics. We demonstrate that the diffusion model can predict the unbiased posterior distribution of the underlying DM fields from the given stellar density fields while being able to marginalize over uncertainties in cosmological and astrophysical models. Interestingly, the model generalizes to simulation volumes ≈500 times larger than those it was trained on and across different galaxy formation models. The code for reproducing these results can be found at https://github.com/victoriaono/variational-diffusion-cdm.
  5. Abstract We compared the performance of DREAM3D simulations in reproducing the long‐term radiation belt dynamics observed by Van Allen Probes over the entire year of 2017 with various boundary conditions (BCs) and model inputs. Specifically, we investigated the effects of three different outer boundary conditions, two different low‐energy boundary conditions for seed electrons, four different radial diffusion (RD) coefficients (DLL), four hiss wave models, and two chorus wave models from the literature. Using the outer boundary condition driven by GOES data, our benchmark simulation generally reproduces the observed radiation belt dynamics well inside L* = 6, with better model performance at lower μ than at higher μ, where μ is the first adiabatic invariant. By varying the boundary conditions and inputs, we find that: (a) The data‐driven outer boundary condition is critical to the model performance, while adding in the data‐driven seed population doesn't further improve the performance. (b) The model shows comparable performance with DLL from Brautigam and Albert (2000, https://doi.org/10.1029/1999ja900344), Ozeke et al. (2014, https://doi.org/10.1002/2013ja019204), and Liu et al. (2016, https://doi.org/10.1002/2015gl067398), while with DLL from Ali et al. (2016, https://doi.org/10.1002/2016ja023002) the model shows less RD compared to data. (c) The model performance is similar with data‐based hiss models, but the results show faster loss is still needed inside the plasmasphere. (d) The model performs similarly with the two different chorus models, but better captures the electron enhancement at higher μ using the Wang et al. (2019, https://doi.org/10.1029/2018ja026183) model due to its stronger wave power, since local heating for higher energy electrons is under‐reproduced in the current model.