NSF PAR Search | NSF Public Access Repository

Developing a novel AI enabled extended reality system for real-time automatic facial expression recognition and system performance evaluation

https://doi.org/10.1016/j.aei.2025.103207

Kashef, Amirarash; Wang, Yu; Assafi, Mohammad Nafe; Ma, Junfeng; Wang, Jun; Jones, J Adam; Thiamwong, Ladda (May 2025, Advanced Engineering Informatics)

Free, publicly-accessible full text available May 1, 2026
What do we know about Hugging Face? A systematic literature review and quantitative validation of qualitative claims

Jones, J; Jiang, W; Synovic, N; Thiruvathukal, GK; Davis, JC (October 2024, Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024)

Background: Software Package Registries (SPRs) are an integral part of the software supply chain. These collaborative platforms unite contributors, users, and packages, and they streamline package management. Much engineering work focuses on synthesizing packages from SPRs into a downstream project. Prior work has thoroughly characterized the SPRs associated with traditional software, such as NPM (JavaScript) and PyPI (Python). Pre-Trained Model (PTM) Registries are an emerging class of SPR of increasing importance, because they support the deep learning supply chain. Aims: A growing body of empirical research has examined PTM registries from various angles, such as vulnerabilities, reuse processes, and evolution. However, no existing research synthesizes them to provide a systematic understanding of the current knowledge. Furthermore, much of the existing research includes unsupported qualitative claims and lacks sufficient quantitative analysis. Our research aims to fill these gaps by providing a thorough knowledge synthesis and using it to inform further quantitative analysis. Methods: To consolidate existing knowledge on PTM reuse, we first conduct a systematic literature review (SLR). We then observe that some of the claims are qualitative and lack quantitative evidence. We identify quantifiable metrics associated with those claims, and measure them in order to substantiate these claims. Results: From our SLR, we identify 12 claims about PTM reuse on the Hugging Face platform, 4 of which lack quantitative validation. We successfully test 3 of these claims through a quantitative analysis, and directly compare one with traditional software. Our findings corroborate qualitative claims with quantitative measurements. Our two most notable findings are: (1) PTMs have a significantly higher turnover rate than traditional software, indicating a dynamic and rapidly evolving reuse environment within the PTM ecosystem; and (2) there is a strong correlation between documentation quality and PTM popularity. Conclusions: Our findings validate several qualitative research claims with concrete metrics, confirming prior qualitative and case study research. Our measures show further dynamics of PTM reuse, motivating further research infrastructure and new kinds of measurements.
Full Text Available
VR Geoscience Education: Building Spatial Reasoning Skills

https://doi.org/10.1109/VRW62533.2024.00034

Johanesen, Katharine E; Jones, J Adam; Poole, Territa; Ryker, Katherine; Green, Christopher (March 2024, IEEE)

We tested the impact of a 15-minute VR training on spatial skills and performance on a geoscience task, compared with a control group. The VR group improved more on the Water Level Task, a measure of understanding of horizontality (B = 0.68, p = 0.008). Both groups performed equally on the geology task, except for an orientation rule not well instructed in the VR module (B = -1.33, p = 0.0057). In the post-survey, the VR group reported higher ability to link knowledge (χ² = 4.45, p = 0.035) and more interest than in past activities (χ² = 8.47, p = 0.004). This is encouraging, given the brevity of the VR lesson.
Full Text Available
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software

https://doi.org/10.1145/3643991.3644907

Jiang, W; Yasmin, J; Jones, J; Synovic, N; Kuo, J; Bielanski, N; Tian, Y; Thiruvathukal, G K; Davis, J C (May 2024, 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR))

The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset’s comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model’s training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions. Our artifact is available at https://github.com/PurdueDualityLab/PeaTMOSS-Artifact. Our dataset is available at https://transfer.rcac.purdue.edu/file-manager?origin_id=ff978999-16c2-4b50-ac7a-947ffdc3eb1d&origin_path=%2F.
Full Text Available
Microbe surveillance in the amphibian pet trade: Results from a pilot study

https://doi.org/10.1002/ecs2.4968

Pearhill, R A; Gray, M J; Jones, J; Brinks, Z; Brunner, J L (August 2024, Ecosphere)

Regional and global trade of live animals can contribute to the spread and emergence of novel pathogens, including several important pathogens of amphibians. However, understanding the spread or even frequency of infections in large, complex amphibian trade networks has been difficult, in part because businesses tend to be reluctant to participate in surveillance programs. Thus, we developed a novel approach to surveillance in which anonymous participating businesses were sent surveillance kits through a trusted trade advocacy partner, samples were returned to researchers via anonymous prepaid envelopes, and results were provided via a secure website with access regulated by a unique personal identification number (PIN) created by the business. We tested samples for the amphibian pathogens Batrachochytrium salamandrivorans (Bsal), Batrachochytrium dendrobatidis (Bd), and Ranavirus spp. (Rv), as well as the beneficial microbe Janthinobacterium lividum (Jliv), using quantitative real‐time polymerase chain reaction (qPCR). Out of 120 businesses invited to complete an anonymous socioeconomic survey, 24 volunteered to participate in pathogen surveillance, of which 14 were sent surveillance kits. Eight of these businesses returned samples consisting of swabs collected from amphibians in 78 terrestrial habitats and water filters from 49 aquatic habitats. Copies of a highly conserved vertebrate gene (EBF3N), quantified using qPCR, were consistently low (<100 copies) in returned samples, but similar to those collected by researchers, indicating comparable sample quality. Three samples (from two facilities) had detectable levels of Bd DNA; Bsal, Rv, and Jliv were not detected. This pilot study provides evidence that information about pathogens in pet trade networks can be acquired by developing partnerships with industry, and business participation might be enhanced by ensuring anonymity and inclusion of a trade advocacy partner.
Precision spectroscopy of the hyperfine components of the 1S–2S transition in antihydrogen

https://doi.org/10.1038/s41567-024-02712-9

Baker, C J; Bertsche, W; Capra, A; Carruth, C; Cesar, C L; Charlton, M; Christensen, A; Collister, R; Cridland Mathad, A; Eriksson, S; et al. (February 2025, Nature Physics)

Free, publicly-accessible full text available February 1, 2026
Understanding barriers to collaborative governance for the food-energy-water nexus: The case of Phoenix, Arizona

https://doi.org/10.1016/j.envsci.2021.10.025

Jones, J. Leah; White, Dave D. (January 2022, Environmental Science & Policy)

Full Text Available
Thermodynamics contributes to high limonene productivity in cyanobacteria

https://doi.org/10.1016/j.mec.2022.e00193

Shinde, Shrameeta; Singapuri, Sonali; Jiang, Zhenxiong; Long, Bin; Wilcox, Danielle; Klatt, Camille; Jones, J. Andrew; Yuan, Joshua S.; Wang, Xin (June 2022, Metabolic Engineering Communications)

Full Text Available
