Search for: All records

Creators/Authors contains: "Shen, X"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Transformer models have been widely investigated across domains for their long-range dependency handling and global contextual awareness, driving the development of popular AI applications such as ChatGPT, Gemini, and Alexa. State Space Models (SSMs) have emerged as strong contenders in sequential modeling, challenging the dominance of Transformers. SSMs incorporate a selective mechanism that dynamically adjusts parameters based on input data, enhancing their performance. However, this mechanism also increases computational complexity and bandwidth demands, posing challenges for deployment on resource-constrained mobile devices. To address these challenges without sacrificing the accuracy of the selective mechanism, we propose a sparse learning framework that integrates architecture-aware compiler optimizations. We introduce an end-to-end solution, C4n kernel sparsity, which prunes n elements from every four contiguous weights, and develop a compiler-based acceleration solution to ensure execution efficiency for this sparsity on mobile devices. Based on the kernel sparsity, our framework generates optimized sparse models targeting specific sparsity or latency requirements for various model sizes. We further leverage pruned weights to compensate for the remaining weights, enhancing downstream task performance. For practical hardware acceleration, we propose C4n-specific optimizations combined with a layout transformation elimination strategy. This approach mitigates inefficiencies arising from fine-grained pruning in linear layers and improves performance across other operations. Experimental results demonstrate that our method achieves superior task performance compared to other semi-structured pruning methods and up to a 7x speedup over the llama.cpp framework on mobile devices.
    Free, publicly-accessible full text available April 1, 2026
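The C4n pattern described in this abstract lends itself to a short illustration. Below is a minimal NumPy sketch, assuming magnitude-based selection within each group of four; the function name c4n_prune is ours, and the paper's actual kernels and compiler passes are not reproduced here.

```python
# Hedged sketch of C4n kernel sparsity: out of every 4 contiguous weights,
# zero the n smallest-magnitude entries. Illustrative only.
import numpy as np

def c4n_prune(weights: np.ndarray, n: int) -> np.ndarray:
    """Zero the n smallest-magnitude weights in each contiguous group of 4."""
    flat = weights.reshape(-1, 4)                    # assumes weights.size % 4 == 0
    drop = np.argsort(np.abs(flat), axis=1)[:, :n]   # n smallest |w| per group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.random.randn(2, 8)
print(c4n_prune(w, n=2))   # 2 of every 4 contiguous weights survive (50% sparse)
```

Because exactly n of every four positions are zero, the nonzero layout is regular enough for a compiler to emit dense micro-kernels, which is what makes such semi-structured patterns attractive on mobile hardware.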
  2. While whistler-mode waves are generated by injected anisotropic electrons on the nightside, the observed day-night asymmetry of wave distributions raises an intriguing question about their generation on the dayside. In this study, we evaluate the distributions of whistler-mode wave amplitudes and electrons as a function of distance from the magnetopause (MP) on the dayside, from 6 to 18 hr in magnetic local time (MLT) and within ±18° of magnetic latitude, using Time History of Events and Macroscale Interactions during Substorms (THEMIS) measurements from June 2010 to August 2018. Specifically, under different levels of solar wind dynamic pressure and geomagnetic index, we conduct a statistical analysis of whistler-mode wave amplitude, as well as the anisotropy and phase space density (PSD) of source electrons across 1–20 keV energies, which potentially provide a source of free energy for wave generation. In coordinates relative to the MP, we find that lower-band (0.05–0.5 fce) waves occur much closer to the MP than upper-band (0.5–0.8 fce) waves, where fce is the electron cyclotron frequency. Our statistical results reveal that strong waves are associated with high anisotropy and high PSD of source electrons near the equator, indicating a preferred region for local wave generation on the dayside. Over 10–14 hr in MLT, as latitude increases, electron anisotropy decreases while whistler-mode wave amplitudes increase, suggesting that wave propagation from the equator to higher latitudes, along with amplification along the propagation path, is necessary to explain the observed waves on the dayside.
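The band boundaries quoted above are fixed fractions of the local electron cyclotron frequency. A small sketch, using standard physical constants and an invented field strength, shows how a measured wave frequency would be classified:

```python
# Band definitions from the abstract: lower band spans 0.05-0.5 fce and
# upper band 0.5-0.8 fce, where fce = q*B / (2*pi*m_e) is the electron
# cyclotron frequency. The field value below is illustrative.
import math

Q_E = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31   # electron mass, kg

def f_ce(b_tesla: float) -> float:
    """Electron cyclotron frequency in Hz for field strength B (tesla)."""
    return Q_E * b_tesla / (2.0 * math.pi * M_E)

def classify_band(f_wave: float, b_tesla: float) -> str:
    fce = f_ce(b_tesla)
    if 0.05 * fce <= f_wave < 0.5 * fce:
        return "lower band"
    if 0.5 * fce <= f_wave <= 0.8 * fce:
        return "upper band"
    return "outside whistler-mode bands"

b = 100e-9                    # ~100 nT, a representative magnetospheric field
print(f_ce(b))                # ~2.8 kHz
print(classify_band(1e3, b))  # 1 kHz ~ 0.36 fce -> lower band
```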
  3. During geomagnetic storms, relativistic outer radiation belt electron flux exhibits large variations on rapid time scales of minutes to days. Many competing acceleration and loss processes contribute to the dynamic variability of the radiation belts; however, distinguishing the relative contribution of each mechanism remains a major challenge, as they often occur simultaneously and over a wide range of spatiotemporal scales. In this study, we develop a new comprehensive model for storm-time radiation belt dynamics by incorporating electron wave-particle interactions with parallel-propagating whistler-mode waves into our global test-particle model of the outer belt. Electron trajectories are evolved through the electromagnetic fields generated by the Multiscale Atmosphere-Geospace Environment (MAGE) global geospace model. Pitch-angle scattering and energization of the test particles are derived from analytical expressions for quasi-linear diffusion coefficients that depend directly on the magnetic field and density from the magnetosphere simulation. In a case study of the 17 March 2013 geomagnetic storm, we demonstrate that resonance with lower-band chorus waves can produce rapid relativistic flux enhancements during the main phase of the storm. While electron loss from the outer radiation belt is dominated by loss through the magnetopause, wave-particle interactions drive significant atmospheric precipitation. We also show that the storm-time evolution of the magnetic field and cold plasma density produces strong local variations in the magnitude and energy of the wave-particle interactions and is critical to fully capturing the dynamic variability of the radiation belts caused by wave-particle interactions.
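As a rough illustration of how quasi-linear wave-particle interactions enter a test-particle code, one common approach applies stochastic pitch-angle kicks with variance 2*D_aa*dt each step. The constant diffusion coefficient below is a placeholder; in the model described above, the coefficients are analytical quasi-linear expressions driven by the simulated magnetic field and density.

```python
# Monte Carlo pitch-angle diffusion sketch: each step perturbs the pitch
# angle by a Gaussian increment with variance 2*D_aa*dt (Ito step for a
# diffusion process). D_aa here is a made-up constant, not a modeled value.
import numpy as np

rng = np.random.default_rng(0)

def scatter_step(alpha: np.ndarray, d_aa: float, dt: float) -> np.ndarray:
    """One pitch-angle diffusion step; angles in radians, folded to [0, pi/2]."""
    kick = rng.normal(0.0, np.sqrt(2.0 * d_aa * dt), size=alpha.shape)
    alpha = np.abs(alpha + kick)                              # reflect at 0
    return np.where(alpha > np.pi / 2, np.pi - alpha, alpha)  # reflect at 90 deg

alphas = np.full(10_000, np.deg2rad(30.0))   # test particles start at 30 degrees
for _ in range(1000):
    alphas = scatter_step(alphas, d_aa=1e-6, dt=1.0)  # placeholder D_aa, rad^2/s
print(np.rad2deg(alphas.std()))              # spread grows roughly as sqrt(2*D*t)
```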
  4. In explainable artificial intelligence, discriminative feature localization is critical to reveal a black-box model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance, and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide a statistically guaranteed interpretability of the localized features by measuring a generalized partial R2. We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. In the first, the compact image regions localized by the proposed method are visually appealing. Similarly, in the second, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles, while locating subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library, dnn-locate, that implements the proposed approach.
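To make the localize-then-quantify workflow concrete, here is a generic stand-in: gradient-magnitude saliency plus an R2-style importance score. This is not the dnn-locate algorithm, which localizes features via adversarial attacks and defines the generalized partial R2 with statistical guarantees; all names and the scoring formula below are illustrative.

```python
# Caricature of the two-stage workflow: (1) pick a small set of candidate
# features, (2) score how much of the model's predictive signal they carry.
import numpy as np

def localize_topk(grad: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Boolean mask of the top `frac` fraction of inputs by gradient magnitude."""
    k = max(1, int(frac * grad.size))
    thresh = np.partition(np.abs(grad).ravel(), -k)[-k]
    return np.abs(grad) >= thresh

def partial_r2(loss_masked: float, loss_full: float, loss_null: float) -> float:
    """R2-style share of the null-to-full loss gap attributable to the
    localized features (1.0 = masking them removes all predictive signal)."""
    return (loss_masked - loss_full) / (loss_null - loss_full)

grad = np.random.randn(28, 28)          # stand-in input gradient (MNIST-sized)
mask = localize_topk(grad, frac=0.05)
print(mask.sum(), partial_r2(loss_masked=2.0, loss_full=0.3, loss_null=2.3))
```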
  5. Pradeep Ravikumar (Ed.)
    Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag. 
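A toy version of recovering a topological order by nodewise regressions: repeatedly peel the variable with the smallest residual variance given the variables already peeled. The paper's peeling algorithm additionally exploits the unspecified interventions and carries consistency guarantees; this equal-error-variance heuristic is only a sketch of the general idea.

```python
# Simplified "peeling": under equal error variances, the next variable in a
# causal order is the one with the smallest residual variance after
# regressing on the variables peeled so far.
import numpy as np

def peel_order(x: np.ndarray) -> list[int]:
    """x: (n_samples, p) data matrix; returns an estimated causal order."""
    n, p = x.shape
    order, remaining = [], list(range(p))
    while remaining:
        best, best_var = None, np.inf
        for j in remaining:
            if order:
                z = x[:, order]                           # regress on peeled nodes
                beta, *_ = np.linalg.lstsq(z, x[:, j], rcond=None)
                resid = x[:, j] - z @ beta
            else:
                resid = x[:, j] - x[:, j].mean()
            if resid.var() < best_var:
                best, best_var = j, resid.var()
        order.append(best)
        remaining.remove(best)
    return order

# Chain x1 -> x2 -> x3: recovered order should be [0, 1, 2].
rng = np.random.default_rng(1)
x1 = rng.normal(size=2000)
x2 = 0.8 * x1 + rng.normal(size=2000)
x3 = 0.8 * x2 + rng.normal(size=2000)
print(peel_order(np.column_stack([x1, x2, x3])))
```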
  6. Interchange instability is known to drive fast radial transport of particles in Jupiter's inner magnetosphere. Magnetic flux tubes associated with the interchange instability often coincide with changes in particle distributions and plasma waves, but further investigations are required to understand their detailed characteristics. We analyze representative interchange events observed by Juno, which exhibit intriguing features of particle distributions and plasma waves, including Z‐mode and whistler‐mode waves. These events occurred at an equatorial radial distance of ∼9 Jovian radii on the nightside, with Z‐mode waves observed at mid‐latitude and whistler‐mode waves near the equator. We calculate the linear growth rate of whistler‐mode and Z‐mode waves based on the observed plasma parameters and electron distributions and find that both waves can be locally generated within the interchanged flux tube. Our findings are important for understanding particle transport and generation of plasma waves in the magnetospheres of Jupiter and other planetary systems. 
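The study computes linear growth rates from the full observed plasma parameters and electron distributions. As a back-of-the-envelope companion, the classic Kennel-Petschek condition for parallel-propagating whistlers says growth requires the electron temperature anisotropy A = T_perp/T_par - 1 to exceed w/(1 - w) at normalized frequency w = f/fce:

```python
# Kennel-Petschek marginal-stability check for a parallel whistler at
# normalized frequency w = f/fce: unstable (can grow) iff A > w / (1 - w).
def kp_unstable(anisotropy: float, w: float) -> bool:
    """True if the anisotropy exceeds the threshold at f/fce = w."""
    assert 0.0 < w < 1.0, "whistler band: 0 < f/fce < 1"
    return anisotropy > w / (1.0 - w)

print(kp_unstable(anisotropy=0.5, w=0.25))  # True:  0.5 > 0.333
print(kp_unstable(anisotropy=0.5, w=0.60))  # False: 0.5 < 1.5
```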
  7. Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models. 
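The U-shaped and inverted-U trends mentioned above can be made concrete by fitting a quadratic in log(scale) to per-task scores and checking whether the fitted extremum falls inside the observed range. The data points below are invented for illustration:

```python
# Classify a scaling trend from (scale, score) points via a quadratic fit
# in log10(scale): the sign of the curvature and the location of the vertex
# distinguish monotonic, U-shaped, and inverted-U trends.
import numpy as np

def trend_shape(scales, scores) -> str:
    x = np.log10(np.asarray(scales, dtype=float))
    a, b, _ = np.polyfit(x, np.asarray(scores, dtype=float), 2)
    if abs(a) < 1e-12:
        return "linear"
    vertex = -b / (2 * a)                      # extremum of the fitted parabola
    if not (x.min() < vertex < x.max()):
        return "monotonic over observed range"
    return "U-shaped" if a > 0 else "inverted-U"

params = [1e8, 1e9, 1e10, 1e11, 1e12]          # model sizes, illustrative
print(trend_shape(params, [0.70, 0.55, 0.40, 0.52, 0.68]))  # U-shaped
print(trend_shape(params, [0.40, 0.55, 0.62, 0.55, 0.41]))  # inverted-U
```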