NSF PAR Search | NSF Public Access Repository

Higher-order accurate two-sample network inference and network hashing

https://doi.org/10.1080/01621459.2025.2520459

Shao, Meijia; Xia, Dong; Zhang, Yuan; Wu, Qiong; Chen, Shuo (July 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available July 3, 2026
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation

Chi, Hongliang; Wu, Qiong; Zhou, Zhengyi; Ma, Yao (January 2025, ICLR)

Graph Neural Networks (GNNs) have demonstrated remarkable performance in various graph-based machine learning tasks, yet evaluating the importance of neighbors of testing nodes remains largely unexplored due to the challenge of assessing data importance without test labels. To address this gap, we propose Shapley-Guided Utility Learning (SGUL), a novel framework for graph inference data valuation. SGUL innovatively combines transferable data-specific and model-specific features to approximate test accuracy without relying on ground truth labels. By incorporating Shapley values as a preprocessing step and using feature Shapley values as input, our method enables direct optimization of Shapley value prediction while reducing computational demands. SGUL overcomes key limitations of existing methods, including poor generalization to unseen test-time structures and indirect optimization. Experiments on diverse graph datasets demonstrate that SGUL consistently outperforms existing baselines in both inductive and transductive settings. SGUL offers an effective, efficient, and interpretable approach for quantifying the value of test-time neighbors.
Free, publicly-accessible full text available January 22, 2026
A multivariate to multivariate approach for voxel‐wise genome‐wide association analysis

https://doi.org/10.1002/sim.10101

Wu, Qiong; Zhang, Yuan; Huang, Xiaoqi; Ma, Tianzhou; Hong, L Elliot; Kochunov, Peter; Chen, Shuo (August 2024, Statistics in Medicine)

The joint analysis of imaging‐genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel‐wise genome‐wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)‐voxel pairs. We attempt to identify underlying organized association patterns of SNP‐voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi‐clique graph structure (ie, a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP‐voxel bi‐cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel‐level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures on splenium and genu of the corpus callosum.
Full Text Available
Unraveling the depth-dependent causal dynamics of methanogenesis and methanotrophy in a high-latitude fen peatland

https://doi.org/10.1088/1748-9326/adaf44

Yang, Shuai; Tang, Jinyun; Li, Zhen; Yuan, Kunxiaojia; Wu, Qiong; Chang, Kuang-Yu; Hodgkins, Suzanne B; Wilson, Rachel M; Zhu, Qing; Grant, Robert F; et al (February 2025, Environmental Research Letters)

The dynamics of methane (CH₄) cycling in high-latitude peatlands through different pathways of methanogenesis and methanotrophy are still poorly understood due to the spatiotemporal complexity of microbial activities and biogeochemical processes. Additionally, long-term in situ measurements within soil columns are limited and associated with large uncertainties in microbial substrates (e.g. dissolved organic carbon, acetate, hydrogen). To better understand CH₄ cycling dynamics, we first applied an advanced biogeochemical model, ecosys, to explicitly simulate methanogenesis, methanotrophy, and CH₄ transport in a high-latitude fen (within the Stordalen Mire, northern Sweden). Next, to explore the vertical heterogeneity in CH₄ cycling, we applied the PCMCI/PCMCI+ causal detection framework with a bootstrap aggregation method to the modeling results, characterizing causal relationships among regulating factors (e.g. temperature, microbial biomass, soil substrate concentrations) through acetoclastic methanogenesis, hydrogenotrophic methanogenesis, and methanotrophy, across three depth intervals (0–10 cm, 10–20 cm, 20–30 cm). Our results indicate that temperature, microbial biomass, and methanogenesis and methanotrophy substrates exhibit significant vertical variations within the soil column. Soil temperature demonstrates strong causal relationships with both biomass and substrate concentrations at the shallower depth (0–10 cm), while these causal relationships decrease significantly at the deeper depth within the two methanogenesis pathways. In contrast, soil substrate concentrations show significantly greater causal relationships with depth, suggesting the substantial influence of substrates on CH₄ cycling. CH₄ production is found to peak in August, while CH₄ oxidation peaks predominantly in October, showing a lag response between production and oxidation. Overall, this research provides important insights into the causal mechanisms modulating CH₄ cycling across different depths, which will improve carbon cycling predictions and guide future field measurement strategies.
Free, publicly-accessible full text available February 11, 2026
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models

https://doi.org/10.1609/aaai.v38i19.30178

Zhang, Jiang; Wu, Qiong; Xu, Yiming; Cao, Cheng; Du, Zheng; Psounis, Konstantinos (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployment in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to bootstrapping and distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.
Full Text Available
A framework for integrating genomics, microbial traits, and ecosystem biogeochemistry

https://doi.org/10.1038/s41467-025-57386-5

Li, Zhen; Riley, William J; Marschmann, Gianna L; Karaoz, Ulas; Shirley, Ian A; Wu, Qiong; Bouskill, Nicholas J; Chang, Kuang-Yu; Crill, Patrick M; Grant, Robert F; et al (March 2025, Nature Communications)
GoPlaces: An App for Personalized Indoor Place Prediction

https://doi.org/10.1109/MASS58611.2023.00076

Sen, Pritam; Jiang, Xiaopeng; Wu, Qiong; Talasila, Manoop; Hsu, Wen-Ling; Borcea, Cristian (September 2023, IEEE)

Full Text Available
HiFlash: Communication-Efficient Hierarchical Federated Learning With Adaptive Staleness Control and Heterogeneity-Aware Client-Edge Association

https://doi.org/10.1109/TPDS.2023.3238049

Wu, Qiong; Chen, Xu; Ouyang, Tao; Zhou, Zhi; Zhang, Xiaoxi; Yang, Shusen; Zhang, Junshan (May 2023, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
Symphony in the Latent Space: Provably Integrating High-dimensional Techniques with Non-linear Machine Learning Models

https://doi.org/10.1609/aaai.v37i9.26233

Wu, Qiong; Li, Jian; Liu, Zhenming; Li, Yanhua; Cucuringu, Mihai (February 2023, AAAI)

This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interactions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually adopt an ad-hoc approach to integrating high-dimensional techniques into non-linear models, and recent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub the additive influence model. Under our modeling assumption, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interactions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, including designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector regressions. Extensive experiments on two common applications demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
Full Text Available
FedHome: Cloud-Edge Based Personalized Federated Learning for In-Home Health Monitoring

https://doi.org/10.1109/TMC.2020.3045266

Wu, Qiong; Chen, Xu; Zhou, Zhi; Zhang, Junshan (August 2022, IEEE Transactions on Mobile Computing)

Full Text Available
