NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Twister2 Cross‐platform resource scheduler for big data

https://doi.org/10.1002/cpe.6502

Uyar, Ahmet; Gunduz, Gurhan; Kamburugamuve, Supun; Wickramasinghe, Pulasthi; Widanage, Chathura; Govindarajan, Kannan; Perera, Niranda; Abeykoon, Vibhatha; Akkas, Selahattin; Fox, Geoffrey (July 2021, Concurrency and Computation: Practice and Experience)

Abstract Twister2 is an open‐source big data hosting environment designed to process both batch and streaming data at scale. Twister2 runs jobs in both high‐performance computing (HPC) and big data clusters. It provides a cross‐platform resource scheduler to run jobs in diverse environments. Twister2 is designed with a layered architecture to support various clusters and big data problems. In this paper, we present the cross‐platform resource scheduler of Twister2. We identify required services and explain implementation details. We present job startup delays for single jobs and multiple concurrent jobs in Kubernetes and OpenMPI clusters. We compare job startup delays for Twister2 and Spark at a Kubernetes cluster. In addition, we compare the performance of terasort algorithm on Kubernetes and bare metal clusters at AWS cloud.
more » « less
A Windowed Temporal Saliency Rescaling Method for Interpreting Time Series Deep Learning Models

Islam, Md Khairul; Fox, Judy (March 2025, AAAI'25 Workshop: AI for Time Series Analysis: Theory, Algorithms, and Applications (AI4TS))
Song, Dongjin; Xie, Yao; Purushotham, Sanjay; Chen, Haifeng; Shen, Cong (Ed.)
Interpreting complex time series forecasting models is challenging due to the temporal dependencies between time steps and the dynamic relevance of input features over time. Existing interpretation methods are limited by focusing mostly on classification tasks, evaluating using custom baseline models instead of the latest time series models, using simple synthetic datasets, and requiring training another model. We introduce a novel interpretation method, Windowed Temporal Saliency Rescaling (WinTSR) addressing these limitations. WinTSR explicitly captures temporal dependencies among the past time steps and efficiently scales the feature importance with this time importance. We benchmark WinTSR against 10 recent interpretation techniques with 5 state-of-the-art deep-learning models of different architectures, including a time series foundation model. We use 3 real-world datasets for both time-series classification and regression. Our comprehensive analysis shows that WinTSR significantly outranks the other local interpretation methods in overall performance. Finally, we provide a novel and open-source framework to interpret the latest time series transformers and foundation models.
more » « less
Free, publicly-accessible full text available March 4, 2026
Large Language Models for Financial Aid in Financial Time-series Forecasting

https://doi.org/10.1109/BigData62323.2024.10824953

Islam, Md Khairul; Karmacharya, Ayush; Sue, Timothy; Fox, Judy (December 2024, IEEE Computer Society Digital Library)
na (Ed.)
Considering the difficulty of financial time series forecasting in financial aid, much of the current research focuses on leveraging big data analytics in financial services. One modern approach is to utilize ”predictive analysis”, analogous to forecasting financial trends. However, many of these time series data in Financial Aid (FA) pose unique challenges due to limited historical datasets and high dimensional financial information, which hinder the development of effective predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models to demonstrate their ability to outperform traditional approaches, even with minimal (”few-shot”) or no fine-tuning (”zero-shot”). Our benchmark study, which includes financial aid with seven other time series tasks, shows the potential of using LLMs for scarce financial datasets.
more » « less
Free, publicly-accessible full text available December 15, 2025
Does Differential Privacy Impact Bias in Pretrained Language Models?

Islam, MK; Wang, A; Wang, T; Ji, Y; Fox, J; Zhao, J (June 2024, IEEE Data Engineering Bulletin (Special Issue on Privacy-preserving Data Management) Vol. 48 No. 2, June 2024.)
Wang, H; Xiao, X (Ed.)
Differential privacy (DP) is applied when fine-tuning pre-trained language models (LMs) to limit leakage of training examples. While most DP research has focused on improving a model’s privacy-utility tradeoff, some find that DP can be unfair to or biased against underrepresented groups. In this work, we extensively analyze the impact of DP on bias in LMs. We find differentially private training can increase the model bias against protected groups w.r.t AUC-based bias metrics. DP makes it more difficult for the model to differentiate between the positive and negative examples from the protected groups and other groups in the rest of the population. Our results also show that the impact of DP on bias is affected by both the privacy protection level and the underlying distribution of the dataset.
more » « less
Full Text Available
Validation, Robustness, and Accuracy of Perturbation-Based Sensitivity Analysis Methods for Time-Series Deep Learning Models

https://doi.org/10.1609/aaai.v38i21.30559

Wang, Zhengguang (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)
Michael, Wooldridge; Dy, Jennifer; Natarajan, Sriraam (Ed.)
This work undertakes studies to evaluate Interpretability Methods for Time Series Deep Learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among the post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work will investigate perturbation-based sensitivity Analysis methods on modern Transformer models to benchmark their performances. Specifically, my work intends to answer three research questions: 1) Do different sensitivity analysis methods yield comparable outputs and attribute importance rankings? 2) Using the same sensitivity analysis method, do different Deep Learning models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth?
more » « less
Full Text Available
Temporal Dependencies and Spatio-Temporal Patterns of Time Series Models

https://doi.org/10.1609/aaai.v38i21.30396

Islam, Khairul (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

The widespread use of Artificial Intelligence (AI) has highlighted the importance of understanding AI model behavior. This understanding is crucial for practical decision-making, assessing model reliability, and ensuring trustworthiness. Interpreting time series forecasting models faces unique challenges compared to image and text data. These challenges arise from the temporal dependencies between time steps and the evolving importance of input features over time. My thesis focuses on addressing these challenges by aiming for more precise explanations of feature interactions, uncovering spatiotemporal patterns, and demonstrating the practical applicability of these interpretability techniques using real-world datasets and state-of-the-art deep learning models.
more » « less
Full Text Available
Opportunities for enhancing MLCommons efforts while leveraging insights from educational MLCommons earthquake benchmarks efforts

https://doi.org/10.3389/fhpcp.2023.1233877

von Laszewski, Gregor; Fleischer, J. P.; Knuuti, Robert; Fox, Geoffrey C.; Kolessar, Jake; Butler, Thomas S.; Fox, Judy (October 2023, Frontiers in High Performance Computing)

MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by others, educational institutions provide valuable opportunities for engagement. In this article, we identify numerous insights obtained from different viewpoints as part of efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we project if and how it is possible to make such efforts available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing the students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula as we witnessed from our practical experience across multiple organizations. As such, we have outlined the many lessons we learned throughout these efforts, culminating in the need forbenchmark carpentryfor scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark while focusing on the accuracy of the results and not only on the runtime; notedly, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing the power expenditure within HPC environments. Additionally, one of the insights is that in the short time of the project with limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline while developing and using software to generate jobs from the permutation of hyperparameters automatically. It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection calledcloudmeshwith its newly developed components, cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
more » « less
Full Text Available
Interpreting County-Level COVID-19 Infections using Transformer and Deep Learning Time Series Models

https://doi.org/10.1109/ICDH60066.2023.00046

Islam, Md Khairul; Liu, Yingzheng; Erkelens, Andrej; Daniello, Nick; Marathe, Aparna; Fox, Judy (July 2023, IEEE)

Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making.
more » « less
Interpreting County-Level COVID-19 Infections using Transformer and Deep Learning Time Series Models

https://doi.org/10.1109/ICDH60066.2023.00046

Islam, K; Liu, Y.; Erkelens, A.; Daniello, N.; Marathe, A.; Fox, J. (February 2023, IEEE International Conference on Digital Health (ICDH))

Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making.
more » « less
HySec-Flow: Privacy-Preserving Genomic Computing with SGX-based Big-Data Analytics Framework

https://doi.org/10.1109/CLOUD53861.2021.00098

Widanage, Chathura; Liu, Weijie; Li, Jiayu; Chen, Hongbo; Wang, XiaoFeng; Tang, Haixu; Fox, Judy (September 2021, 2021 IEEE 14th International Conference on Cloud Computing (CLOUD))

Full Text Available

« Prev Next »

Search for: All records