NSF PAR Search | NSF Public Access Repository

TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations

Guan, J; Hu, Z; Antonopoulus, C; Bellas, N; Lalis, S; Smirni, E; Zhou, G; Agrawal, G; Ren, B (June 2025, ACM - Proceedings of ICS 2025)

The demand for Deep Neural Network (DNN) execution (including both inference and training) on mobile systems-on-a-chip (SoCs) has surged, driven by factors such as the need for real-time latency, privacy, and reduced vendor costs. Mainstream mobile GPUs (e.g., Qualcomm Adreno GPUs) usually have a 2.5D L1 texture cache that offers throughput superior to that of on-chip memory. However, to date, there is limited understanding of the performance features of such a 2.5D cache, which limits the optimization potential. This paper introduces TMModel, a framework with three components: 1) a set of micro-benchmarks and a novel performance assessment methodology to characterize a poorly documented architecture with 2D memory, 2) a complete analytical performance model configurable for different data access pattern(s), tiling size(s), and other GPU execution parameters for a given operator (and associated size and shape), and 3) a compilation framework incorporating this model and generating optimized code with low overhead. TMModel is validated both on a set of DNN kernels and for training complete models on mobile GPUs.
Free, publicly-accessible full text available June 9, 2026
Halogen enrichment on the continental surface: a perspective from loess

https://doi.org/10.7185/geochemlet.2442

Han, P-Y; Rudnick, RL; Hu, Z-C; He, T; Marks, MAW; Chen, K (November 2024, Geochemical Perspectives Letters)

Halogen (F, Cl, Br, and I) concentrations for 129 loess samples from worldwide localities yield geometric means of 517 ± 53 μg/g F, 150 ± 20 μg/g Cl, 1.58 ± 0.16 μg/g Br, and 1.16 ± 0.11 μg/g I (2 standard errors). These concentrations, notably for Br and I, are substantially higher than previous estimates for the average upper continental crystalline bedrocks, with enrichment factors of 1.3 +0.7/−0.4 (F), 1.8 +2.4/−0.8 (Cl), 3.8 +1.3/−1.0 (Br), and 39 +71/−16 (I) (95 % confidence), documenting enrichment of halogens on the continental surface. These surface halogens are likely sourced from the oceans and may be influenced by climate fluctuations. Halogen ratios (Br/Cl, I/Cl, and Br/I) in loess are similar to those of organic-rich soils/sediments from both terrigenous and marine settings, suggesting that terrigenous and marine organic matter have indistinguishable halogen ratios. The Br/I ratios differ from those in the fine-grained matrix of glacial diamictites, indicating that another process (beyond biological influence) is responsible for fractionating halogens in the upper continental crust. Using a mixing model, we calculate that over 80–90 % of loess originates from crystalline bedrocks, while the remainder (<10–20 %) derives from the halogen- and organic-rich sedimentary cover or other sources (e.g., marine aerosols).
Free, publicly-accessible full text available November 1, 2025
Failures and successes to learn a core conceptual distinction from the statistics of language

Hu, Z; van_Paridon, J; Lupyan, G (July 2024, The Evolution of Language: Proceedings of the 15th International Conference (Evolang XV))
Nölle, J; Raviv, L; Graham, E; Hartmann, S; Jadoul, Y; Josserand, M; Matzinger, T; Mudd, K; Pleyer, M; Slonimska, A (Ed.)
Generic statements like “tigers are striped” and “cars have radios” communicate information that is, in general, true. However, while the first statement is true *in principle*, the second is true only statistically. People are exquisitely sensitive to this principled-vs-statistical distinction. It has been argued that this ability to distinguish between something being true by virtue of it being a category member versus being true because of mere statistical regularity is a general property of people’s conceptual machinery and cannot itself be learned. We investigate whether the distinction between principled and statistical properties can be learned from language itself. If so, it raises the possibility that language experience can bootstrap core conceptual distinctions and that it is possible to learn sophisticated causal models directly from language. We find that language models are all sensitive to statistical prevalence but struggle to represent the principled-vs-statistical distinction when controlling for prevalence, with the exception of GPT-4, which succeeds.
Full Text Available
On the Dependence of Simulated Convection on Domain Size in CRMs

https://doi.org/10.1029/2024MS004749

Jenney, A_M; Hu, Z.; Hannah, W_M (March 2025, Journal of Advances in Modeling Earth Systems)

We present a heuristic model to explain the suppression of deep convection in convection‐resolving models (CRMs) with a small number of grid columns, such as those used in super‐parameterized or multi‐scale modeling framework (MMF) general circulation models (GCMs) of the atmosphere. Domains with few grid columns require greater instability to sustain convection because they force a large convective fraction, driving strong compensating subsidence warming. Updraft dilution, which is stronger for reduced horizontal grid spacing, enhances this effect. Thus, suppression of deep convection in CRMs with few grid columns can be reduced by increasing grid spacing. Radiative‐convective equilibrium simulations using standalone CRM simulations with the System for Atmospheric Modeling (SAM) and using GCM‐coupled CRM simulations with the Energy Exascale Earth System Model (E3SM)‐MMF confirm the heuristic model results.
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering

Naeem, A; Li, T; Liao, H_R; Xu, J; Mathew, A M; Zhu, Z; Tan, Z; Jaiswal, A K; Salibian, R A; Hu, Z; et al (November 2024, https://doi.org/10.48550/arXiv.2411.17073)

Accurate diagnosis and prognosis assisted by pathology images are essential for cancer treatment selection and planning. Despite the recent trend of adopting deep-learning approaches for analyzing complex pathology images, such approaches fall short because they often overlook the domain-expert understanding of tissue structure and cell composition. In this work, we focus on a challenging Open-ended Pathology VQA (PathVQA-Open) task and propose a novel framework named Path-RAG, which leverages HistoCartography to retrieve relevant domain knowledge from pathology images and significantly improves performance on PathVQA-Open. Admitting the complexity of pathology image analysis, Path-RAG adopts a human-centered AI approach by retrieving domain knowledge using HistoCartography to select the relevant patches from pathology images. Our experiments suggest that domain guidance can significantly boost the accuracy of LLaVA-Med from 38% to 47%, with a notable gain of 28% for H&E-stained pathology images in the PathVQA-Open dataset. For longer-form question and answer pairs, our model consistently achieves significant improvements of 32.5% in ARCH-Open PubMed and 30.6% in ARCH-Open Books on H&E images.
Free, publicly-accessible full text available November 26, 2025
Navigating the Privacy Compliance Maze: Understanding Risks with Privacy-Configurable Mobile SDKs

Zhang, Y; Hu, Z; Wang, X; Hong, Y; Nan, Y; Wang, X; Cheng, J; Xing, L (August 2024, USENIX Security Symposium)

Full Text Available
Adaptive oracle-efficient online learning

Wang, G; Hu, Z; Muthukumar, V; Abernethy, J (November 2022, Neural Information Processing Systems 2022)

Full Text Available
A CFD–DEM study on the suffusion and shear behaviors of gap-graded soils under stress anisotropy

https://doi.org/10.1007/s11440-022-01755-7

Hu, Z.; Li, J. Z.; Zhang, Y. D.; Yang, Z. X.; Liu, J. K. (December 2022, Acta Geotechnica)

Full Text Available
Equations of motion for weakly compressible point vortices

https://doi.org/10.1098/rsta.2021.0052

Llewellyn Smith, Stefan G.; Chu, T.; Hu, Z. (June 2022, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences)

Equations of motion for compressible point vortices in the plane are obtained in the limit of small Mach number, $M$, using a Rayleigh–Jansen expansion and the method of Matched Asymptotic Expansions. The solution in the region between vortices is matched to solutions around each vortex core. The motion of the vortices is modified over long time scales $O(M^2 \log M)$ and $O(M^2)$. Examples are given for co-rotating and co-propagating vortex pairs. The former show a correction to the rotation rate and, in general, to the centre and radius of rotation, while the latter recover the known result that the steady propagation velocity is unchanged. For unsteady configurations, the vortex solution matches to a far field in which acoustic waves are radiated. This article is part of the theme issue ‘Mathematical problems in physical fluid dynamics (part 2)’.
Full Text Available
