skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Ma, Lei"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Large Language Models (LLMs) have demonstrated unprecedented capabilities in code generation. However, there remains a limited understanding of code generation errors that LLMs can produce. To bridge the gap, we conducted an in-depth analysis of code generation errors across six representative LLMs on the HumanEval dataset. Specifically, we first employed open coding and thematic analysis to distill a comprehensive taxonomy of code generation errors. We analyzed two dimensions of error characteristics -- semantic characteristics and syntactic characteristics. Our analysis revealed that LLMs often made non-trivial, multi-line code generation errors in various locations and with various root causes. We further analyzed the correlation between these errors and task complexity as well as test pass rate. Our findings highlighted several challenges in locating and fixing code generation errors made by LLMs. In the end, we discussed several future directions to address these challenges. 
    more » « less
    Free, publicly-accessible full text available April 26, 2026
  2. Iwanowicz, Luke R (Ed.)
    ABSTRACT The mummichog,Fundulus heteroclitus, an abundant estuarine fish broadly distributed along the eastern coast of North America, has repeatedly evolved tolerance to otherwise lethal levels of aromatic hydrocarbon exposure. This tolerance is linked to reduced activation of the aryl hydrocarbon receptor (AHR) signaling pathway. In other animals, the AHR has been shown to influence the gastrointestinal-associated microbial community, particularly when activated by the model toxic pollutant 3,3′,4,4′,5-pentachlorobiphenyl (PCB-126) and other dioxin-like compounds. To understand host population and PCB-126 exposure effects on mummichog gut microbiota, we sampled two populations of wild fish, one from a PCB-contaminated environment (New Bedford Harbor, MA, USA) and the other from a much less polluted location (Scorton Creek, MA, USA), as well as laboratory-reared F2 generation fish originating from each of these populations. We examined the microbes associated with the gut of these fish using amplicon sequencing of bacterial and archaeal small subunit ribosomal RNA genes. Fish living in the PCB-polluted site had high microbial alpha and beta diversity compared to fish from the low PCB site. These differences between wild fish were not present in laboratory-reared F2 fish that originated from the same populations. Microbial compositional differences existed between wild and lab-reared fish, with the wild fish dominated by Vibrionaceae and the lab-reared fish by Enterococceae. These results suggest that mummichog habitat and/or environmental conditions have a stronger influence on the mummichog gut microbiome compared to population or hereditary-based influences. Mummichog are important eco-evolutionary model organisms; this work reveals their importance for exploring host-environmental-microbiome dynamics. IMPORTANCEThe mummichog fish, a common resident of North America's east coast estuaries, has evolved the ability to survive in waters contaminated with toxic chemicals that would typically be deadly. Our study investigates how living in and adapting to these toxic environments may affect their gut microbiomes. We compared mummichogs from a polluted area in Massachusetts with those from a non-polluted site and found significant differences in their gut microbes. Interestingly, when we raised the next generation of these fish in a lab, these differences disappeared, suggesting that the environment plays a more crucial role in shaping the gut microbiome than genetics. Understanding these changes helps shed light on how animals and their associated microbiomes adapt to pollution, which can inform conservation efforts and our broader understanding of environmental impacts on host-microbe dynamics. 
    more » « less
    Free, publicly-accessible full text available March 4, 2026
  3. Due to the scarcity of reliable anomaly labels, recent anomaly detection methods leveraging noisy auto-generated labels either select clean samples or refurbish noisy labels. However, both approaches struggle due to the unique properties of anomalies.Sample selectionoften fails to separate sufficiently many clean anomaly samples from noisy ones, whilelabel refurbishmenterroneously refurbishesmarginalclean samples. To overcome these limitations, we design Unity, thefirstlearning from noisy labels (LNL) approach for anomaly detection that elegantly leverages the merits of both sample selection and label refurbishment to iteratively prepare a diverse clean sample set for network training. Unity uses a pair of deep anomaly networks to collaboratively select samples with clean labels based on prediction agreement, followed by a disagreement resolution mechanism to capture marginal samples with clean labels. Thereafter, Unity utilizes unique properties of anomalies to design an anomaly-centric contrastive learning strategy that accurately refurbishes the remaining noisy labels. The resulting set, composed ofselected and refurbishedclean samples, will be used to train the anomaly networks in the next training round. Our experimental study on 10 real-world benchmark datasets demonstrates that Unity consistently outperforms state-of-the-art LNL techniques by up to 0.31 in F-1 Score (0.52 \rightarrow 0.83). 
    more » « less
    Free, publicly-accessible full text available February 10, 2026
  4. Log anomaly detection, critical in identifying system failures and preempting security breaches, finds irregular patterns within large volumes of log data. Modern log anomaly detectors rely on training deep learning models on clean anomaly-free log data. However, such clean log data requires expensive and tedious human labeling. In this paper, we thus propose a robust log anomaly detection framework, PlutoNOSPACE, that automatically selects a clean representative sample subset of the polluted log sequence data to train a Transformer-based anomaly detection model. Pluto features three innovations. First, due to localized concentrations of anomalies inherent in the embedding space of log data, Pluto partitions the sequence embedding space generated by the model into regions that then allow it to identify and discard regions that are highly polluted by our pollution level estimation scheme, based on our pollution quantification via Gaussian mixture modeling. Second, for the remaining more slightly polluted regions, we select samples that maximally purify the eigenvector spectrum, which can be transformed into the NP-hard facility location problem; allowing us to leverage its greedy solution with a (1-(1/e)) approximation guarantee in optimality. Third, by iteratively alternating between the above subset selection, a model re-training on the latest subset, and a subset filtering using dynamic training artifacts generated by the latest model, the data selected is progressively refined. The final sample set is used to retrain the final anomaly detection model. Our experiments on four real-world log benchmark datasets demonstrate that by retaining 77.7% (BGL) to 96.6% (ThunderBird) of the normal sequences while effectively removing 90.3% (BGL) to 100.0% (ThunderBird, HDFS) of the anomalies, Pluto provides a significant absolute F-1 improvement up to 68.86% (2.16% → 71.02%) compared to the state-of-the-art sample selection methods. The implementation of this work is available at https://github.com/LeiMa0324/Pluto-SIGMOD25. 
    more » « less
  5. Log anomaly detection, critical in identifying system failures and preempting security breaches, finds irregular patterns within large volumes of log data. Modern log anomaly detectors rely on training deep learning models on clean anomaly-free log data. However, such clean log data requires expensive and tedious human labeling. In this paper, we thus propose a robust log anomaly detection framework, PlutoNOSPACE, that automatically selects a clean representative sample subset of the polluted log sequence data to train a Transformer-based anomaly detection model. Pluto features three innovations. First, due to localized concentrations of anomalies inherent in the embedding space of log data, Pluto partitions the sequence embedding space generated by the model into regions that then allow it to identify and discard regions that are highly polluted by our pollution level estimation scheme, based on our pollution quantification via Gaussian mixture modeling. Second, for the remaining more slightly polluted regions, we select samples that maximally purify the eigenvector spectrum, which can be transformed into the NP-hard facility location problem; allowing us to leverage its greedy solution with a (1-(1/e)) approximation guarantee in optimality. Third, by iteratively alternating between the above subset selection, a model re-training on the latest subset, and a subset filtering using dynamic training artifacts generated by the latest model, the data selected is progressively refined. The final sample set is used to retrain the final anomaly detection model. Our experiments on four real-world log benchmark datasets demonstrate that by retaining 77.7\% (BGL) to 96.6\% (ThunderBird) of the normal sequences while effectively removing 90.3\% (BGL) to 100.0\% (ThunderBird, HDFS) of the anomalies, Pluto provides a significant absolute F-1 improvement up to 68.86\% (2.16\% → 71.02\%) compared to the state-of-the-art sample selection methods. The implementation of this work is available at https://github.com/LeiMa0324/Pluto-SIGMOD25. 
    more » « less
  6. Large Language Models (LLMs) have recently been widely used for code generation. Due to the complexity and opacity of LLMs, little is known about how these models generate code. We made the first attempt to bridge this knowledge gap by investigating whether LLMs attend to the same parts of a task description as human programmers during code generation. An analysis of six LLMs, including GPT-4, on two popular code generation benchmarks revealed a consistent misalignment between LLMs' and programmers' attention. We manually analyzed 211 incorrect code snippets and found five attention patterns that can be used to explain many code generation errors. Finally, a user study showed that model attention computed by a perturbation-based method is often favored by human programmers. Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust. 
    more » « less
  7. Machine learning is routinely used to automate consequential decisions about users in domains such as finance and healthcare, raising concerns of transparency and recourse for negative outcomes. Existing Explainable AI techniques generate a static counterfactual point explanation which recommends changes to a user's instance to obtain a positive outcome. Unfortunately, these recommendations are often difficult or impossible for users to realistically enact. To overcome this, we present FACET, the first interactive robust explanation system which generates personalized counterfactual region explanations. FACET's expressive explanation analytics empower users to explore and compare multiple counterfactual options and develop a personalized actionable plan for obtaining their desired outcome. Visitors to the demonstration will interact with FACET via a new web dashboard for explanations of a loan approval scenario. In doing so, visitors will experience how lay users can easily leverage powerful explanation analytics through visual interactions and displays without the need for a strong technical background. 
    more » « less
  8. Automated decision-making systems are increasingly deployed in domains such as hiring and credit approval where negative outcomes can have substantial ramifications for decision subjects. Thus, recent research has focused on providing explanations that help decision subjects understand the decision system and enable them to take actionable recourse to change their outcome. Popular counterfactual explanation techniques aim to achieve this by describing alterations to an instance that would transform a negative outcome to a positive one. Unfortunately, little user evaluation has been performed to assess which of the many counterfactual approaches best achieve this goal. In this work, we conduct a crowd-sourced between-subjects user study (N = 252) to examine the effects of counterfactual explanation type and presentation on lay decision subjects’ understandings of automated decision systems. We find that the region-based counterfactual type significantly increases objective understanding, subjective understanding, and response confidence as compared to the point-based type. We also find that counterfactual presentation significantly effects response time and moderates the effect of counterfactual type for response confidence, but not understanding. A qualitative analysis reveals how decision subjects interact with different explanation configurations and highlights unmet needs for explanation justification. Our results provide valuable insights and recommendations for the development of counterfactual explanation techniques towards achieving practical actionable recourse and empowering lay users to seek justice and opportunity in automated decision workflows. 
    more » « less