Search for: All records

Creators/Authors contains: "Shen, X"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Transformer models have been widely investigated across domains for their long-range dependency handling and global contextual awareness, driving the development of popular AI applications such as ChatGPT, Gemini, and Alexa. State Space Models (SSMs) have emerged as strong contenders in sequential modeling, challenging the dominance of Transformers. SSMs incorporate a selective mechanism that dynamically adjusts parameters based on input data, enhancing their performance. However, this mechanism also increases computational complexity and bandwidth demands, posing challenges for deployment on resource-constrained mobile devices. To address these challenges without sacrificing the accuracy of the selective mechanism, we propose a sparse learning framework that integrates architecture-aware compiler optimizations. We introduce an end-to-end solution, C4n kernel sparsity, which prunes n elements from every four contiguous weights, and develop a compiler-based acceleration solution to ensure execution efficiency for this sparsity on mobile devices. Based on the kernel sparsity, our framework generates optimized sparse models targeting specific sparsity or latency requirements for various model sizes. We further leverage pruned weights to compensate for the remaining weights, enhancing downstream task performance. For practical hardware acceleration, we propose C4n-specific optimizations combined with a layout transformation elimination strategy. This approach mitigates inefficiencies arising from fine-grained pruning in linear layers and improves performance across other operations. Experimental results demonstrate that our method achieves superior task performance compared to other semi-structured pruning methods and up to a 7x speedup over the llama.cpp framework on mobile devices.
    Free, publicly-accessible full text available April 1, 2026
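The C4n pattern described in this abstract lends itself to a short illustration. Below is a minimal NumPy sketch, assuming magnitude-based selection within each group of four; the function name c4n_prune is ours, and the paper's actual kernels and compiler passes are not reproduced here.

```python
# Hedged sketch of C4n kernel sparsity: out of every 4 contiguous weights,
# zero the n smallest-magnitude entries. Illustrative only.
import numpy as np

def c4n_prune(weights: np.ndarray, n: int) -> np.ndarray:
    """Zero the n smallest-magnitude weights in each contiguous group of 4."""
    flat = weights.reshape(-1, 4)                    # assumes weights.size % 4 == 0
    drop = np.argsort(np.abs(flat), axis=1)[:, :n]   # n smallest |w| per group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.random.randn(2, 8)
print(c4n_prune(w, n=2))   # 2 of every 4 contiguous weights survive (50% sparse)
```

Because exactly n of every four positions are zero, the nonzero layout is regular enough for a compiler to emit dense micro-kernels, which is what makes such semi-structured patterns attractive on mobile hardware.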
  2. While whistler-mode waves are generated by injected anisotropic electrons on the nightside, the observed day-night asymmetry of wave distributions raises an intriguing question about their generation on the dayside. In this study, we evaluate the distributions of whistler-mode wave amplitudes and electrons as a function of distance from the magnetopause (MP) on the dayside, from 6 to 18 hr in magnetic local time (MLT) and within ±18° of magnetic latitude, using Time History of Events and Macroscale Interactions during Substorms (THEMIS) measurements from June 2010 to August 2018. Specifically, under different levels of solar wind dynamic pressure and geomagnetic index, we conduct a statistical analysis of whistler-mode wave amplitude, as well as the anisotropy and phase space density (PSD) of source electrons across 1–20 keV energies, which potentially provide a source of free energy for wave generation. In coordinates relative to the MP, we find that lower-band (0.05–0.5 fce) waves occur much closer to the MP than upper-band (0.5–0.8 fce) waves, where fce is the electron cyclotron frequency. Our statistical results reveal that strong waves are associated with high anisotropy and high PSD of source electrons near the equator, indicating a preferred region for local wave generation on the dayside. Over 10–14 hr in MLT, as latitude increases, electron anisotropy decreases while whistler-mode wave amplitudes increase, suggesting that wave propagation from the equator to higher latitudes, along with amplification along the propagation path, is necessary to explain the observed waves on the dayside.
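The band boundaries quoted above are fixed fractions of the local electron cyclotron frequency. A small sketch, using standard physical constants and an invented field strength, shows how a measured wave frequency would be classified:

```python
# Band definitions from the abstract: lower band spans 0.05-0.5 fce and
# upper band 0.5-0.8 fce, where fce = q*B / (2*pi*m_e) is the electron
# cyclotron frequency. The field value below is illustrative.
import math

Q_E = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31   # electron mass, kg

def f_ce(b_tesla: float) -> float:
    """Electron cyclotron frequency in Hz for field strength B (tesla)."""
    return Q_E * b_tesla / (2.0 * math.pi * M_E)

def classify_band(f_wave: float, b_tesla: float) -> str:
    fce = f_ce(b_tesla)
    if 0.05 * fce <= f_wave < 0.5 * fce:
        return "lower band"
    if 0.5 * fce <= f_wave <= 0.8 * fce:
        return "upper band"
    return "outside whistler-mode bands"

b = 100e-9                    # ~100 nT, a representative magnetospheric field
print(f_ce(b))                # ~2.8 kHz
print(classify_band(1e3, b))  # 1 kHz ~ 0.36 fce -> lower band
```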
  3. During geomagnetic storms, relativistic outer radiation belt electron flux exhibits large variations on rapid time scales of minutes to days. Many competing acceleration and loss processes contribute to the dynamic variability of the radiation belts; however, distinguishing the relative contribution of each mechanism remains a major challenge, as they often occur simultaneously and over a wide range of spatiotemporal scales. In this study, we develop a new comprehensive model for storm-time radiation belt dynamics by incorporating electron wave-particle interactions with parallel-propagating whistler-mode waves into our global test-particle model of the outer belt. Electron trajectories are evolved through the electromagnetic fields generated by the Multiscale Atmosphere-Geospace Environment (MAGE) global geospace model. Pitch-angle scattering and energization of the test particles are derived from analytical expressions for quasi-linear diffusion coefficients that depend directly on the magnetic field and density from the magnetosphere simulation. In a case study of the 17 March 2013 geomagnetic storm, we demonstrate that resonance with lower-band chorus waves can produce rapid relativistic flux enhancements during the main phase of the storm. While electron loss from the outer radiation belt is dominated by loss through the magnetopause, wave-particle interactions drive significant atmospheric precipitation. We also show that the storm-time evolution of the magnetic field and cold plasma density produces strong local variations in the magnitude and energy of the wave-particle interactions and is critical to fully capturing the dynamic variability of the radiation belts caused by wave-particle interactions.
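As a rough illustration of how quasi-linear wave-particle interactions enter a test-particle code, one common approach applies stochastic pitch-angle kicks with variance 2*D_aa*dt each step. The constant diffusion coefficient below is a placeholder; in the model described above, the coefficients are analytical quasi-linear expressions driven by the simulated magnetic field and density.

```python
# Monte Carlo pitch-angle diffusion sketch: each step perturbs the pitch
# angle by a Gaussian increment with variance 2*D_aa*dt (Ito step for a
# diffusion process). D_aa here is a made-up constant, not a modeled value.
import numpy as np

rng = np.random.default_rng(0)

def scatter_step(alpha: np.ndarray, d_aa: float, dt: float) -> np.ndarray:
    """One pitch-angle diffusion step; angles in radians, folded to [0, pi/2]."""
    kick = rng.normal(0.0, np.sqrt(2.0 * d_aa * dt), size=alpha.shape)
    alpha = np.abs(alpha + kick)                              # reflect at 0
    return np.where(alpha > np.pi / 2, np.pi - alpha, alpha)  # reflect at 90 deg

alphas = np.full(10_000, np.deg2rad(30.0))   # test particles start at 30 degrees
for _ in range(1000):
    alphas = scatter_step(alphas, d_aa=1e-6, dt=1.0)  # placeholder D_aa, rad^2/s
print(np.rad2deg(alphas.std()))              # spread grows roughly as sqrt(2*D*t)
```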
  4. In explainable artificial intelligence, discriminative feature localization is critical to reveal a black-box model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance, and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide a statistically guaranteed interpretability of the localized features by measuring a generalized partial R2. We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. In the first, the compact image regions localized by the proposed method are visually appealing. Similarly, in the second, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles, while locating subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library, dnn-locate, that implements the proposed approach.
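To make the localize-then-quantify workflow concrete, here is a generic stand-in: gradient-magnitude saliency plus an R2-style importance score. This is not the dnn-locate algorithm, which localizes features via adversarial attacks and defines the generalized partial R2 with statistical guarantees; all names and the scoring formula below are illustrative.

```python
# Caricature of the two-stage workflow: (1) pick a small set of candidate
# features, (2) score how much of the model's predictive signal they carry.
import numpy as np

def localize_topk(grad: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Boolean mask of the top `frac` fraction of inputs by gradient magnitude."""
    k = max(1, int(frac * grad.size))
    thresh = np.partition(np.abs(grad).ravel(), -k)[-k]
    return np.abs(grad) >= thresh

def partial_r2(loss_masked: float, loss_full: float, loss_null: float) -> float:
    """R2-style share of the null-to-full loss gap attributable to the
    localized features (1.0 = masking them removes all predictive signal)."""
    return (loss_masked - loss_full) / (loss_null - loss_full)

grad = np.random.randn(28, 28)          # stand-in input gradient (MNIST-sized)
mask = localize_topk(grad, frac=0.05)
print(mask.sum(), partial_r2(loss_masked=2.0, loss_full=0.3, loss_null=2.3))
```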
  5. Pradeep Ravikumar (Ed.)
    Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag. 
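A toy version of recovering a topological order by nodewise regressions: repeatedly peel the variable with the smallest residual variance given the variables already peeled. The paper's peeling algorithm additionally exploits the unspecified interventions and carries consistency guarantees; this equal-error-variance heuristic is only a sketch of the general idea.

```python
# Simplified "peeling": under equal error variances, the next variable in a
# causal order is the one with the smallest residual variance after
# regressing on the variables peeled so far.
import numpy as np

def peel_order(x: np.ndarray) -> list[int]:
    """x: (n_samples, p) data matrix; returns an estimated causal order."""
    n, p = x.shape
    order, remaining = [], list(range(p))
    while remaining:
        best, best_var = None, np.inf
        for j in remaining:
            if order:
                z = x[:, order]                           # regress on peeled nodes
                beta, *_ = np.linalg.lstsq(z, x[:, j], rcond=None)
                resid = x[:, j] - z @ beta
            else:
                resid = x[:, j] - x[:, j].mean()
            if resid.var() < best_var:
                best, best_var = j, resid.var()
        order.append(best)
        remaining.remove(best)
    return order

# Chain x1 -> x2 -> x3: recovered order should be [0, 1, 2].
rng = np.random.default_rng(1)
x1 = rng.normal(size=2000)
x2 = 0.8 * x1 + rng.normal(size=2000)
x3 = 0.8 * x2 + rng.normal(size=2000)
print(peel_order(np.column_stack([x1, x2, x3])))
```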
  6. Interchange instability is known to drive fast radial transport of particles in Jupiter's inner magnetosphere. Magnetic flux tubes associated with the interchange instability often coincide with changes in particle distributions and plasma waves, but further investigations are required to understand their detailed characteristics. We analyze representative interchange events observed by Juno, which exhibit intriguing features of particle distributions and plasma waves, including Z‐mode and whistler‐mode waves. These events occurred at an equatorial radial distance of ∼9 Jovian radii on the nightside, with Z‐mode waves observed at mid‐latitude and whistler‐mode waves near the equator. We calculate the linear growth rate of whistler‐mode and Z‐mode waves based on the observed plasma parameters and electron distributions and find that both waves can be locally generated within the interchanged flux tube. Our findings are important for understanding particle transport and generation of plasma waves in the magnetospheres of Jupiter and other planetary systems. 
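The study computes linear growth rates from the full observed plasma parameters and electron distributions. As a back-of-the-envelope companion, the classic Kennel-Petschek condition for parallel-propagating whistlers says growth requires the electron temperature anisotropy A = T_perp/T_par - 1 to exceed w/(1 - w) at normalized frequency w = f/fce:

```python
# Kennel-Petschek marginal-stability check for a parallel whistler at
# normalized frequency w = f/fce: unstable (can grow) iff A > w / (1 - w).
def kp_unstable(anisotropy: float, w: float) -> bool:
    """True if the anisotropy exceeds the threshold at f/fce = w."""
    assert 0.0 < w < 1.0, "whistler band: 0 < f/fce < 1"
    return anisotropy > w / (1.0 - w)

print(kp_unstable(anisotropy=0.5, w=0.25))  # True:  0.5 > 0.333
print(kp_unstable(anisotropy=0.5, w=0.60))  # False: 0.5 < 1.5
```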
  7. Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models. 
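The U-shaped and inverted-U trends mentioned above can be made concrete by fitting a quadratic in log(scale) to per-task scores and checking whether the fitted extremum falls inside the observed range. The data points below are invented for illustration:

```python
# Classify a scaling trend from (scale, score) points via a quadratic fit
# in log10(scale): the sign of the curvature and the location of the vertex
# distinguish monotonic, U-shaped, and inverted-U trends.
import numpy as np

def trend_shape(scales, scores) -> str:
    x = np.log10(np.asarray(scales, dtype=float))
    a, b, _ = np.polyfit(x, np.asarray(scores, dtype=float), 2)
    if abs(a) < 1e-12:
        return "linear"
    vertex = -b / (2 * a)                      # extremum of the fitted parabola
    if not (x.min() < vertex < x.max()):
        return "monotonic over observed range"
    return "U-shaped" if a > 0 else "inverted-U"

params = [1e8, 1e9, 1e10, 1e11, 1e12]          # model sizes, illustrative
print(trend_shape(params, [0.70, 0.55, 0.40, 0.52, 0.68]))  # U-shaped
print(trend_shape(params, [0.40, 0.55, 0.62, 0.55, 0.41]))  # inverted-U
```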