Creators/Authors contains: "Lan"

  1. Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques, such as deduplication and compression, are either LLM-oblivious or incompatible with each other, which limits their data reduction effectiveness. Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences that are well suited to delta compression; (2) bitwise similarity enables clustering LLMs into families; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses the XORed difference between a fine-tuned LLM and its base LLM. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.
    Free, publicly-accessible full text available May 4, 2027
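The ZipLLM abstract names two mechanisms: tensor-level deduplication and BitX, which compresses the XOR of a fine-tuned model's parameters against its base model. The sketch below illustrates both ideas under stated assumptions: tensors are float32 NumPy arrays (real checkpoints are often BF16), the deduplication key is a SHA-256 hash of dtype, shape, and raw bytes, and zlib stands in for BitX's actual coder, which the abstract does not describe. All function names are hypothetical; this is not the ZipLLM implementation.

```python
import hashlib
import zlib

import numpy as np


def tensor_key(t: np.ndarray) -> str:
    """Content hash over dtype, shape, and raw bytes (assumed tensor-level dedup key)."""
    h = hashlib.sha256()
    h.update(str(t.dtype).encode())
    h.update(str(t.shape).encode())
    h.update(np.ascontiguousarray(t).tobytes())
    return h.hexdigest()


def dedup(models: dict[str, dict[str, np.ndarray]]):
    """Store each distinct tensor once; each model maps tensor names to hashes."""
    store: dict[str, np.ndarray] = {}
    index: dict[str, dict[str, str]] = {}
    for model_name, tensors in models.items():
        index[model_name] = {}
        for tensor_name, t in tensors.items():
            key = tensor_key(t)
            store.setdefault(key, t)  # keep only the first copy of identical tensors
            index[model_name][tensor_name] = key
    return store, index


def xor_delta_compress(base: np.ndarray, finetuned: np.ndarray) -> bytes:
    """XOR the raw bit patterns of two float32 tensors, then compress (zlib as a stand-in)."""
    if base.dtype != np.float32 or finetuned.dtype != np.float32:
        raise ValueError("this sketch assumes float32 tensors")
    if base.shape != finetuned.shape:
        raise ValueError("base and fine-tuned tensors must have the same shape")
    delta = base.view(np.uint32) ^ finetuned.view(np.uint32)  # zero wherever bits match
    return zlib.compress(delta.tobytes(), 9)


def xor_delta_decompress(base: np.ndarray, blob: bytes) -> np.ndarray:
    """Invert xor_delta_compress exactly: base XOR delta reproduces the fine-tuned bits."""
    delta = np.frombuffer(zlib.decompress(blob), dtype=np.uint32).reshape(base.shape)
    return (base.view(np.uint32) ^ delta).view(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(4096).astype(np.float32)
    finetuned = base.copy()
    finetuned[::64] += np.float32(1e-3)  # synthetic sparse update, not real model data
    blob = xor_delta_compress(base, finetuned)
    print(f"{finetuned.nbytes} raw bytes -> {len(blob)} compressed bytes")
```

When fine-tuned and base parameters differ sparsely, as the characterization study reports, most XORed words are zero and compress well, and XORing the delta back onto the base restores the fine-tuned tensor bit for bit, which is what keeps the scheme lossless.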
  2. The deployment of deep learning-based malware detection systems has transformed cybersecurity, offering sophisticated pattern recognition capabilities that surpass traditional signature-based approaches. However, these systems introduce new vulnerabilities that require systematic investigation. This chapter examines adversarial attacks against graph neural network-based malware detection systems, focusing on semantics-preserving methods that evade detection while maintaining program functionality. We introduce a reinforcement learning (RL) framework that formulates the attack as a sequential decision-making problem, optimizing the insertion of no-operation (NOP) instructions to manipulate graph structure without altering program behavior. We compare it against three baseline methods: random insertion, hill-climbing, and gradient-approximation attacks. Our experimental evaluation on real-world malware datasets reveals significant differences in effectiveness: the RL approach achieves perfect evasion rates against both Graph Convolutional Network and Deep Graph Convolutional Neural Network architectures while requiring minimal program modifications. Our findings also reveal three critical research gaps: moving from abstract Control Flow Graph representations to executable binary manipulation, developing universal vulnerability discovery across different architectures, and systematically translating adversarial insights into defensive enhancements. This work contributes to understanding adversarial vulnerabilities in graph-based security systems and establishes frameworks for evaluating the robustness of machine learning-based malware detection.
    Free, publicly-accessible full text available December 1, 2026
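The chapter's attack operates on abstract Control Flow Graphs: inserting NOP blocks changes the graph a GNN sees while leaving program behavior intact. The toy sketch below shows only that invariant on a hypothetical adjacency-list CFG, splitting one edge with a NOP block and checking that skipping NOP blocks recovers the original control flow. It does not implement the RL agent, the baselines, or any binary rewriting; all names are hypothetical.

```python
CFG = dict[str, list[str]]  # basic block id -> ordered successor ids (hypothetical toy format)


def insert_nop(cfg: CFG, src: str, dst: str, nop_id: str) -> CFG:
    """Return a copy of cfg in which the edge src -> dst is routed through a new NOP block."""
    if dst not in cfg.get(src, []):
        raise ValueError(f"no edge {src} -> {dst}")
    if nop_id in cfg:
        raise ValueError(f"block {nop_id} already exists")
    new = {block: list(succs) for block, succs in cfg.items()}
    new[src] = [nop_id if s == dst else s for s in new[src]]
    new[nop_id] = [dst]  # a NOP block simply falls through to the original target
    return new


def collapse_nops(cfg: CFG, nops: set[str]) -> CFG:
    """Skip over NOP blocks so a modified CFG can be compared with the original."""

    def resolve(block: str) -> str:
        while block in nops:  # assumes NOP blocks never form a cycle among themselves
            (block,) = cfg[block]  # each NOP block has exactly one successor
        return block

    return {b: [resolve(s) for s in succs] for b, succs in cfg.items() if b not in nops}


def edge_count(cfg: CFG) -> int:
    return sum(len(succs) for succs in cfg.values())


if __name__ == "__main__":
    cfg = {"entry": ["loop", "exit"], "loop": ["loop", "exit"], "exit": []}
    mod = insert_nop(cfg, "loop", "loop", "nop0")
    print(edge_count(cfg), edge_count(mod))  # the graph structure changed
    print(collapse_nops(mod, {"nop0"}) == cfg)  # the control flow did not
```

A real attack must also keep the inserted instructions executable in the binary, which the chapter lists as an open gap.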
  3. Free, publicly-accessible full text available October 14, 2026
  4. Free, publicly-accessible full text available September 29, 2026
  5. Free, publicly-accessible full text available November 13, 2026
  6. In metazoans, autophagosomes fuse with late endosomes (LEs)/multivesicular bodies (MVBs) to form a hybrid organelle known as an amphisome. Upon subsequent fusion with lysosomes, the contents of amphisomes are degraded. While the formation of metazoan amphisomes is well established, it has remained an open question whether amphisomes form in plant cells and deliver their cargo to the central vacuole for degradation. In this mini review, we provide an update on recent discoveries in plant autophagy that demonstrate the formation of amphisome-like organelles, which are generated through several distinct autophagosome/MVB fusion pathways.
    Free, publicly-accessible full text available November 23, 2026
  7. Non-adiabatic molecular dynamics (NAMD) simulations have become an indispensable tool for investigating excited-state dynamics in solids. In this work, we propose a general framework, N2AMD (Neural-Network Non-Adiabatic Molecular Dynamics), which employs an E(3)-equivariant deep neural Hamiltonian to boost the accuracy and efficiency of NAMD simulations. Unlike conventional machine learning methods that predict key quantities in NAMD, N2AMD computes these quantities directly from a deep neural Hamiltonian, ensuring excellent accuracy, efficiency, and consistency. N2AMD not only performs NAMD simulations efficiently at the hybrid functional level within the framework of the classical path approximation (CPA), but also shows great potential for predicting non-adiabatic coupling vectors and suggests a route beyond the CPA. Furthermore, N2AMD demonstrates excellent generalizability and integrates seamlessly with advanced NAMD techniques and infrastructures. Taking several extensively investigated semiconductors as prototypical systems, we successfully simulate carrier recombination in both pristine and defective systems at large scales, where conventional NAMD often significantly underestimates lifetimes or even predicts them qualitatively incorrectly. This framework offers a reliable and efficient approach for conducting accurate NAMD simulations across various condensed-matter systems.
    Free, publicly-accessible full text available December 1, 2026
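Within the classical path approximation, the non-adiabatic coupling between adiabatic states can be estimated from overlaps of the eigenstates at consecutive time steps: d_jk(t + Δt/2) ≈ [⟨φ_j(t)|φ_k(t+Δt)⟩ − ⟨φ_j(t+Δt)|φ_k(t)⟩] / (2Δt). The sketch below applies this finite-difference formula to Hamiltonians that, in N2AMD, would come from the neural network; here a hypothetical 2×2 real symmetric Hamiltonian in an orthonormal basis stands in for it (a non-orthogonal atomic-orbital basis would require the generalized eigenproblem with an overlap matrix). Eigenvector signs are aligned between steps, and state reordering at crossings is ignored. Function names are hypothetical, and this is not N2AMD's code.

```python
import numpy as np


def aligned_eigh(H: np.ndarray, ref=None):
    """Diagonalize a real symmetric H; flip eigenvector signs to match ref's columns."""
    energies, C = np.linalg.eigh(H)  # columns of C are eigenvectors, ascending energy
    if ref is not None:
        signs = np.sign(np.sum(ref * C, axis=0))  # column-wise overlap with previous step
        signs[signs == 0] = 1.0
        C = C * signs
    return energies, C


def nac_finite_difference(H_t: np.ndarray, H_next: np.ndarray, dt: float) -> np.ndarray:
    """Estimate d_jk(t + dt/2) from eigenvector overlaps at t and t + dt."""
    _, C0 = aligned_eigh(H_t)
    _, C1 = aligned_eigh(H_next, ref=C0)
    S = C0.T @ C1  # S[j, k] = <phi_j(t) | phi_k(t + dt)> for real eigenvectors
    return (S - S.T) / (2.0 * dt)  # antisymmetric coupling matrix


if __name__ == "__main__":
    c, v, dt = 0.1, 1.0, 1e-3

    def H(t: float) -> np.ndarray:
        x = v * t  # diabatic energies move linearly in time; c is a constant coupling
        return np.array([[-x, c], [c, x]])

    print(nac_finite_difference(H(0.0), H(dt), dt))
```

For this avoided crossing, the magnitude of the off-diagonal coupling should be close to the analytic value c·v / [2(x² + c²)] evaluated at the midpoint of the step, which serves as a quick sanity check.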
  8. Free, publicly-accessible full text available July 20, 2026
  9. In high-performance computing (HPC), modern supercomputers typically provide exclusive computing resources to user applications. However, the interconnect network is shared among co-running workloads for both inter-node communication and cross-node I/O access, which inevitably leads to network interference. In this study, we develop MFNetSim, a multi-fidelity modeling framework that simultaneously simulates multiple types of traffic over the interconnect network, including inter-process communication and I/O traffic. By combining different levels of abstraction, MFNetSim can efficiently co-model the communication and I/O traffic occurring on HPC systems equipped with flash-based storage. We conduct simulation studies of hybrid workloads composed of traditional HPC applications and emerging ML applications on a 1,056-node Dragonfly system with various configurations. Our analysis yields several observations on how network interference affects communication and I/O traffic.
    Free, publicly-accessible full text available September 12, 2026
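The abstract reports a 1,056-node Dragonfly system but not its parameters. Under the standard Dragonfly sizing rules (p compute nodes per router, a routers per group, h global links per router, and at most g = a·h + 1 groups), the commonly used balanced choice a = 2p = 2h with p = 4 gives exactly 4 · 8 · 33 = 1,056 nodes. The sketch below computes this; that MFNetSim's system uses these parameters is an assumption, and the class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # compute nodes (terminals) per router
    a: int  # routers per group
    h: int  # global links per router

    @property
    def groups(self) -> int:
        # Maximum-size Dragonfly: one global link between every pair of groups.
        return self.a * self.h + 1

    @property
    def routers(self) -> int:
        return self.a * self.groups

    @property
    def nodes(self) -> int:
        return self.p * self.routers

    def is_balanced(self) -> bool:
        # Commonly used balanced configuration: a = 2p = 2h.
        return self.a == 2 * self.p == 2 * self.h


if __name__ == "__main__":
    df = Dragonfly(p=4, a=8, h=4)
    print(df.groups, df.routers, df.nodes, df.is_balanced())
```

Other (p, a, h) choices, or Dragonflies with fewer than a·h + 1 groups, can also reach 1,056 nodes.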
  10. In song-learning birds, vocalizations are species recognition signals and may act as premating reproductive barriers; for allopatric taxa, however, testing how these signals influence speciation is challenging. This study aims to understand genomic divergence and species recognition via songs in 2 allopatric taxa, eastern and western Nashville warblers (Leiothlypis ruficapilla ruficapilla vs. Leiothlypis ruficapilla ridgwayi). We performed playback experiments to assess their reciprocal behavioral responses; the results suggest an asymmetric barrier: the eastern L. r. ruficapilla discriminates between the 2 songs, but the western L. r. ridgwayi does not. Using whole-genome sequencing, we also examined the extent of the taxa’s genomic divergence and estimated their demographic history. We identified dozens of highly differentiated genomic regions, as well as fluctuations in historical effective population sizes that indicate independent demographic trajectories during the Pleistocene. To contextualize the magnitude of divergence between L. ruficapilla subspecies, we applied the same genomic analyses to 2 additional eastern-western pairs of parulid warblers, Setophaga virens vs. Setophaga townsendi and Setophaga coronata coronata vs. Setophaga coronata auduboni, which have existing behavioral studies but are not in strict allopatry. Our findings provide insights into the role of vocalizations in defining within-pair relationships and into the important legacy of isolation during the Pleistocene.
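The warbler abstract reports dozens of highly differentiated genomic regions without naming the statistic used. One common way to locate such regions is to compute F_ST in windows along the genome and look for outlier windows; the sketch below uses Hudson's estimator in its ratio-of-averages form (Bhatia et al. 2013). Whether the study used this estimator, this window scheme, or per-SNP allele frequencies as input is an assumption, and the function names and data are hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2, n1: int, n2: int) -> float:
    """Hudson's F_ST over a set of SNPs, computed as a ratio of averages.

    p1, p2: per-SNP allele frequencies in populations 1 and 2
    n1, n2: number of sampled chromosomes (haplotypes) in each population
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    if den.sum() == 0:
        return float("nan")  # e.g. every SNP fixed for the same allele in both populations
    return float(num.sum() / den.sum())


def windowed_fst(positions, p1, p2, n1: int, n2: int, window: int):
    """Return (window_start, F_ST) for each non-empty window of `window` base pairs."""
    positions = np.asarray(positions)
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    results = []
    for start in range(0, int(positions.max()) + 1, window):
        mask = (positions >= start) & (positions < start + window)
        if mask.any():
            results.append((start, hudson_fst(p1[mask], p2[mask], n1, n2)))
    return results


if __name__ == "__main__":
    pos = [100, 200, 15000, 15100]  # synthetic SNP positions
    east = [0.5, 0.5, 1.0, 1.0]  # synthetic allele frequencies
    west = [0.5, 0.5, 0.0, 0.0]
    for start, fst in windowed_fst(pos, east, west, n1=20, n2=20, window=10_000):
        print(start, round(fst, 3))
```

Windows whose F_ST falls in the upper tail of the genome-wide distribution would then be candidate differentiated regions.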