skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Lee, Justin"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available July 27, 2025
  2. Recent advances have greatly increased the capabilities of large language models (LLMs), but our understanding of the models and their safety has not progressed as fast. In this paper we aim to understand LLMs deeper by studying their individual neurons. We build upon previous work showing large language models such as GPT-4 can be useful in explaining what each neuron in a language model does. Specifically, we analyze the effect of the prompt used to generate explanations and show that reformatting the explanation prompt in a more natural way can significantly improve neuron explanation quality and greatly reduce computational cost. We demonstrate the effects of our new prompts in three different ways, incorporating both automated and human evaluations. 
    more » « less
  3. Islamophobia, a negative predilection towards the Muslim community, is present on social media platforms. In addition to causing harm to victims, it also hurts the reputation of social media platforms that claim to provide a safe online environment for all users. The volume of social media content is impossible to be manually reviewed, thus, it is important to find automated solutions to combat hate speech on social media platforms. Machine learning approaches have been used in the literature as a way to automate hate speech detection. In this paper, we use deep learning techniques to detect Islamophobia over Reddit and topic modeling to analyze the content and reveal topics from comments identified as Islamophobic. Some topics we identified include the Islamic dress code, religious practices, marriage, and politics. To detect Islamophobia, we used deep learning models. The highest performance was achieved with BERTbase+CNN, with an F1-Score of 0.92.

     
    more » « less
  4. Abstract Background

    The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms—two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.

    Results

    We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6–27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.

    Conclusions

    SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.

     
    more » « less
  5. Bluetongue virus (BTV) is an arthropod-borne, segmented double-stranded RNA virus that can cause severe disease in both wild and domestic ruminants. BTV evolves via several key mechanisms, including the accumulation of mutations over time and the reassortment of genome segments.Additionally, BTV must maintain fitness in two disparate hosts, the insect vector and the ruminant. The specific features of viral adaptation in each host that permit host-switching are poorly characterized. Limited field studies and experimental work have alluded to the presence of these phenomena at work, but our understanding of the factors that drive or constrain BTV's genetic diversification remains incomplete. Current research leveraging novel approaches and whole genome sequencing applications promises to improve our understanding of BTV's evolution, ultimately contributing to the development of better predictive models and management strategies to reduce future impacts of bluetongue epizootics. 
    more » « less
  6. null (Ed.)
  7. Abstract

    Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD), has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size, and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.

     
    more » « less