

Search for: All records

Creators/Authors contains: "Gong, Jiaqi"


  1. Free, publicly-accessible full text available December 1, 2025
  2. Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external data sources beyond their training sets and querying predefined knowledge bases to generate accurate, context-rich responses. Most RAG implementations use vector similarity searches, but the effectiveness of this approach and the representation of knowledge bases remain underexplored. Emerging research suggests knowledge graphs as a promising solution. Therefore, this paper presents StructuGraphRAG, which leverages document structures to inform the extraction process and constructs knowledge graphs to enhance RAG for social science research, specifically using NSDUH datasets. Our method parses document structures to extract entities and relationships, constructing comprehensive and relevant knowledge graphs. Experimental results show that StructuGraphRAG outperforms traditional RAG methods in accuracy, comprehensiveness, and contextual relevance. This approach provides a robust tool for social science researchers, facilitating precise analysis of social determinants of health and justice, and underscores the potential of structured document-informed knowledge graph construction in AI and social science research. 
    Free, publicly-accessible full text available November 8, 2025
  3. Data storytelling is the skill of communicating data effectively and efficiently. Effective data storytelling goes beyond data visualization and focuses on explanation with clear rhetorical functions. It starts with a set of data insights collected from the data science workflow and involves iterative and interactive processes of filtering those insights into story slices, from which data stories can be created through ordering, organizing, and narration. Data storytelling is an integral component of a well-rounded data science education, complementing foundational skills such as quantitative reasoning and programming. Despite its significance, a solid understanding of the theory and practice of developing data storytelling competency is lacking. Data storytelling is often perceived as a mythical process in which quantitative information magically transforms into compelling narratives. Designing scalable coaching tools for data storytelling requires multidisciplinary expertise from learning science, computer science, data science, communication science, and human-centered design. In this workshop, we will share some initial findings and reflections from our interdisciplinary team's search for effective methods and tools to support coaching data storytelling at scale. We will present results from literature reviews and expert interviews, which will be packaged into a set of foundational tools such as a mental model, cognitive processes and schema for story construction, and an assessment strategy, as well as preliminary ideas for tools to support data storytelling coaching. We hope to use this workshop to build a community of researchers and practitioners in coaching data storytelling in postsecondary formal and informal learning contexts.
  4. The Sava River Basin (SRB) includes six countries (Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Albania, and Montenegro), with the Sava River (SR) being a major tributary of the Danube River. The SR originates in the mountains (European Alps) of Slovenia and, because of a recent Slovenian government initiative to increase clean, sustainable energy, multiple hydropower facilities have been constructed within the past ~20 years. Given the importance of this river system for varying demands, including hydropower (energy production), knowledge of past (paleo) dry (drought) and wet (pluvial) periods would provide important information to water managers and planners. Recent research applying traditional regression techniques developed skillful reconstructions of seasonal (April–May–June–July–August–September or AMJJAS) streamflow using tree-ring-based proxies. The current research expands upon these efforts by investigating reconstructions of seasonal (AMJJAS) precipitation using novel Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) techniques. When comparing the reconstructed AMJJAS precipitation datasets, the AI/ML/DL techniques statistically outperformed traditional regression techniques. When comparing the SRB AMJJAS precipitation reconstruction developed in this research to the SRB AMJJAS streamflow reconstruction developed in previous research, the temporal variability of the two reconstructions compared favorably. However, pluvial magnitudes of extreme periods differed, while drought magnitudes of extreme periods were similar, suggesting that drought is likely better captured in tree-ring-based proxy reconstructions of hydrologic variables.
  5. Ultrasound computed tomography (USCT) shows great promise in nondestructive evaluation and medical imaging due to its ability to quickly scan and collect data from a region of interest. However, existing approaches trade off the accuracy of the prediction against the speed at which the data can be analyzed, and processing the collected data into a meaningful image requires both time and computational resources. We propose to develop convolutional neural networks (CNNs) to accelerate and enhance the inversion results to reveal underlying structures or abnormalities that may be located within the region of interest. For training, the ultrasonic signals were first processed using the full waveform inversion (FWI) technique for only a single iteration; the resulting image and the corresponding true model were used as the input and output, respectively. The proposed machine learning approach is based on implementing two-dimensional CNNs to find an approximate solution to the inverse problem of a partial differential equation-based model reconstruction. To alleviate the time-consuming and computationally intensive data generation process, a high-performance computing-based framework has been developed to generate the training data in parallel. At the inference stage, the acquired signals are first processed by FWI for a single iteration; the resulting image is then processed by a pre-trained CNN to generate the final output image almost instantaneously. The results showed that, once trained, the CNNs can quickly generate the predicted wave speed distributions with significantly enhanced speed and accuracy.
  6. null (Ed.)