

Search for: All records

Creators/Authors contains: "Wu, Qiong"


  1. The joint analysis of imaging‐genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel‐wise genome‐wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)‐voxel pairs. We attempt to identify underlying organized association patterns of SNP‐voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi‐clique graph structure (i.e., a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP‐voxel bi‐cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel‐level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures on splenium and genu of the corpus callosum.

     
    Free, publicly-accessible full text available August 20, 2025
  2. Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployment in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to bootstrapping and distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.

     
    Free, publicly-accessible full text available March 25, 2025
  3. This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interactions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually integrate high-dimensional techniques into non-linear models in an ad hoc way, and recent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub the additive influence model. Under our modeling assumption, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interactions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, including designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector regressions. Extensive experiments on two common applications demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
  4. Chemotrophic microorganisms face the steep challenge of limited energy resources in natural environments. This observation has important implications for interpreting and modeling the kinetics and thermodynamics of microbial reactions. Current modeling frameworks treat microbes as autocatalysts, and simulate microbial energy conservation and growth with fixed kinetic and thermodynamic parameters. However, microbes are capable of acclimating to the environment and modulating their parameters in order to gain competitive fitness. Here we constructed an optimization model and described microbes as self-adapting catalysts by linking microbial parameters to intracellular metabolic resources. From the optimization results, we related microbial parameters to the substrate concentration and the energy available in the environment, and simplified the relationship between the kinetics and the thermodynamics of microbial reactions. We took as examples Methanosarcina and Methanosaeta – the methanogens that produce methane from acetate – and showed how the acclimation model extrapolated laboratory observations to natural environments and improved the simulation of methanogenesis and the dominance of Methanosaeta over Methanosarcina in lake sediments. These results highlight the importance of physiological acclimation in shaping the kinetics and thermodynamics of microbial reactions and in determining the outcome of microbial interactions.
  5. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT The Monod equation has been widely applied as the general rate law of microbial growth, but its applications are not always successful. By drawing on the frameworks of kinetic and stoichiometric metabolic models and metabolic control analysis, the modeling reported here simulated the growth kinetics of a methanogenic microorganism and illustrated that different enzymes and metabolites control growth rate to various extents and that their controls peak at either very low, intermediate, or very high substrate concentrations. In comparison, with a single term and two parameters, the Monod equation only approximately accounts for the controls of rate-determining enzymes and metabolites at very high and very low substrate concentrations, but neglects the enzymes and metabolites whose controls are most notable at intermediate concentrations. These findings support a limited link between the Monod equation and methanogen growth, and unify the competing views regarding enzyme roles in shaping growth kinetics. The results also preclude a mechanistic derivation of the Monod equation from methanogen metabolic networks and highlight a fundamental challenge in microbiology: single-term expressions may not be sufficient for accurate prediction of microbial growth. IMPORTANCE The Monod equation has been widely applied to predict the rate of microbial growth, but its application is not always successful. Using a novel metabolic modeling approach, we simulated the growth of a methanogen and uncovered a limited mechanistic link between the Monod equation and the methanogen’s metabolic network. Specifically, the equation provides an approximation to the controls by rate-determining metabolites and enzymes at very low and very high substrate concentrations, but it is missing the remaining enzymes and metabolites whose controls are most notable at intermediate concentrations. 
These results support the Monod equation as a useful approximation of growth rates and highlight a fundamental challenge in microbial kinetics: single-term rate expressions may not be sufficient for accurate prediction of microbial growth. 
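For reference, the "single term and two parameters" form discussed in this abstract is the standard Monod expression, mu = mu_max * S / (K_s + S). The Python sketch below only evaluates that textbook formula; the function name and all numeric values are illustrative assumptions and are not taken from the paper.

```python
def monod_growth_rate(substrate, mu_max, k_s):
    """Specific growth rate from the standard Monod equation.

    substrate: substrate concentration S (same units as k_s)
    mu_max:    maximum specific growth rate (hypothetical value in the demo)
    k_s:       half-saturation constant (hypothetical value in the demo)
    """
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    if k_s <= 0:
        raise ValueError("half-saturation constant must be positive")
    return mu_max * substrate / (k_s + substrate)


if __name__ == "__main__":
    # Illustrative parameters only, not values reported in the study.
    mu_max, k_s = 1.0, 2.0
    for s in (0.02, 2.0, 200.0):  # very low, intermediate, very high substrate
        print(f"S = {s:g}: mu = {monod_growth_rate(s, mu_max, k_s):.4f}")
```

At very low S the expression is close to (mu_max / K_s) * S, and at very high S it approaches mu_max, which are the two regimes the abstract says the equation approximates.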
  6. Jin, Q ; Wu, Q ; Shapiro, B ; McKernan, S. (Ed.)
    The Monod equation has been widely applied as the general rate law of microbial growth, but its applications are not always successful. By drawing on the frameworks of kinetic and stoichiometric metabolic models and metabolic control analysis, the modeling reported here simulated the growth kinetics of a methanogenic microorganism and illustrated that different enzymes and metabolites control growth rate to various extents and that their controls peak at either very low, intermediate, or very high substrate concentrations. In comparison, with a single term and two parameters, the Monod equation only approximately accounts for the controls of rate-determining enzymes and metabolites at very high and very low substrate concentrations, but neglects the enzymes and metabolites whose controls are most notable at intermediate concentrations. These findings support a limited link between the Monod equation and methanogen growth, and unify the competing views regarding enzyme roles in shaping growth kinetics. The results also preclude a mechanistic derivation of the Monod equation from methanogen metabolic networks and highlight a fundamental challenge in microbiology: single-term expressions may not be sufficient for accurate prediction of microbial growth.
  7. The Q10 coefficient is the ratio of reaction rates at two temperatures 10°C apart, and has been widely applied to quantify the temperature sensitivity of organic matter decomposition. However, biogeochemists and ecologists have long recognized that a constant Q10 coefficient does not describe the temperature sensitivity of organic matter decomposition accurately. To examine the consequences of the constant Q10 assumption, we built a biogeochemical reaction model to simulate anaerobic organic matter decomposition in peatlands in the Upper Peninsula of Michigan, USA, and compared the simulation results to the predictions with Q10 coefficients. By accounting for the reactions of extracellular enzymes, mesophilic fermenting and methanogenic microbes, and their temperature responses, the biogeochemical reaction model reproduces the observations of previous laboratory incubation experiments, including the temporal variations in the concentrations of dissolved organic carbon, acetate, dihydrogen, carbon dioxide, and methane, and confirms that fermentation limits the progress of anaerobic organic matter decomposition. The modeling results illustrate the oversimplification inherent in the constant Q10 assumption and how the assumption undermines the kinetic prediction of anaerobic organic matter decomposition. In particular, the model predicts that between 5°C and 30°C, the decomposition rate increases almost linearly with increasing temperature, which stands in sharp contrast to the exponential relationship given by the Q10 coefficient. As a result, the constant Q10 approach tends to underestimate the rates of organic matter decomposition within the temperature ranges where Q10 values are determined, and overestimate the rates outside the temperature ranges. 
The results also show how biogeochemical reaction modeling, combined with laboratory experiments, can help uncover the temperature sensitivity of organic matter decomposition arising from underlying catalytic mechanisms. 
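The definition of the Q10 coefficient given in this abstract implies the extrapolation rule r(T) = r(T_ref) * Q10^((T - T_ref) / 10). The Python sketch below evaluates only that rule under a constant Q10, which is the assumption the abstract questions; the function name, reference rate, and Q10 value are hypothetical and do not come from the study.

```python
def q10_rate(rate_ref, q10, temp_c, temp_ref_c):
    """Rate at temp_c extrapolated from a reference rate with a constant Q10.

    rate_ref:   rate measured at temp_ref_c (hypothetical value in the demo)
    q10:        ratio of rates at two temperatures 10 degrees C apart
    temp_c:     target temperature in degrees C
    temp_ref_c: reference temperature in degrees C
    """
    if q10 <= 0:
        raise ValueError("Q10 must be positive")
    return rate_ref * q10 ** ((temp_c - temp_ref_c) / 10.0)


if __name__ == "__main__":
    # Illustrative numbers only: Q10 = 2 and a unit rate at 10 degrees C.
    for t in (5, 10, 20, 30):
        print(f"T = {t} C: rate = {q10_rate(1.0, 2.0, t, 10.0):.3f}")
```

With a fixed Q10 the predicted rate grows exponentially with temperature, which is the behavior the abstract contrasts with the nearly linear increase predicted by the reaction model between 5°C and 30°C.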