NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

DyESP: Accelerating Hyperparameter-Architecture Search via Dynamic Exploration and Space Pruning

https://doi.org/10.1609/aaaiss.v5i1.35585

Liu, Xukun; Lv, Haoze; Ma, Fenglong; Wang, Chi; Xu, Dongkuan DK (May 2025, Proceedings of the AAAI Symposium Series)

In this work, we introduce DyESP, a novel approach that unites dynamic exploration with space pruning to expedite the combined search of hyperparameters and architecture, enhancing the efficiency and accuracy of hyperparameter-architecture search (HAS). Central to DyESP are two innovative components: a meta-scheduler that customizes the search strategy for varying spaces and a pruner designed to minimize the hyperparameter space by discarding suboptimal configurations. The meta-scheduler leverages historical data to dynamically refine the search direction, targeting the most promising areas while minimizing unnecessary exploration. Meanwhile, the pruner employs a surrogate model, specifically a fine-tuned multilayer perceptron (MLP), to predict and eliminate inferior configurations based on static metrics, thereby streamlining the search and conserving computational resources. The results from the pruner, which identifies and removes underperforming configurations, are fed into the meta-scheduler. This process updates the historical dataset used by the meta-scheduler, enabling it to adjust the exploration degree and refine the sampling strategy for subsequent iterations. This integration ensures the meta-scheduler is continually updated with relevant data, allowing for more accurate and timely adjustments to the exploration strategy.Experiments on various benchmarks show that DyESP outperforms existing methods in terms of both speed and stability on almost all benchmarks.
more » « less
Free, publicly-accessible full text available May 28, 2026
Framework for Accurate Single-Molecule Spectroscopic Imaging Analyses Using Monte Carlo Simulation and Deep Learning

https://doi.org/10.1021/acs.analchem.5c01486

Mao, Hongjing; Liu, Yunshu; KanchanadeviVenkataraman, Obblivignes; Shahid, Md Abul; Laplante, Caroline; Xu, Dongkuan; Song, Ki-Hee; Zhang, Yang (July 2025, Analytical Chemistry)

Free, publicly-accessible full text available July 4, 2026
Adaptive Draft-Verification for Efficient Large Language Model Decoding

https://doi.org/10.1609/aaai.v39i23.34647

Liu, Xukun; Lei, Bowen; Zhang, Ruqi; Xu, Dongkuan DK (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires a separate forward pass through the model for each token generated, which is computationally inefficient and poses challenges for deploying LLMs in latency-sensitive scenarios.The main limitations of current decoding methods stem from their inefficiencies and resource demands. Existing approaches either necessitate fine-tuning smaller models, which is resource-intensive, or relying on fixed retrieval schemes to construct drafts for the next tokens, which lack adaptability and fail to generalize across different models and contexts.To address these issues, we introduce a novel methodology called Adaptix, which accelerates LLM decoding without requiring fine-tuning. Our approach involves an adaptive draft-verification process that evolves over time to improve efficiency. We utilize a tri-gram matrix-based LLM representation to dynamically approximate the output distribution of the LLM, allowing the model to adjust to changing token probabilities during the decoding process. Additionally, we implement a draft construction mechanism that effectively balances exploration and exploitation, ensuring that the drafts generated are both diverse and close to the true output distribution of the LLM.The importance of this design lies in its ability to optimize the draft distribution adaptively, leading to faster and more accurate decoding. Through extensive experiments on various benchmark datasets and LLM architectures, we demonstrate that Adaptix significantly accelerates the decoding process while maintaining high accuracy, making it suitable for deployment in a wide range of practical applications.
more » « less
Free, publicly-accessible full text available April 11, 2026
MerryQuery: A Trustworthy LLM-Powered Tool Providing Personalized Support for Educators and Students

https://doi.org/10.1609/aaai.v39i28.35372

Tabarsi, Benyamin; Basarkar, Aditya; Liu, Xukun; Xu, Dongkuan DK; Barnes, Tiffany (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

The potential of Large Language Models (LLMs) in education is not trivial, but concerns about academic misconduct, misinformation, and overreliance limit their adoption. To address these issues, we introduce MerryQuery, an AI-powered educational assistant using Retrieval-Augmented Generation (RAG), to provide contextually relevant, course-specific responses. MerryQuery features guided dialogues and source citation to ensure trust and improve student learning. Additionally, it enables instructors to monitor student interactions, customize response granularity, and input multimodal materials without compromising data fidelity. By meeting both student and instructor needs, MerryQuery offers a responsible way to integrate LLMs into educational settings.
more » « less
Free, publicly-accessible full text available April 11, 2026
Revolutionizing Wireless Modeling and Simulation with Network-Oriented LLMs

https://doi.org/10.1109/IPCCC59868.2024.10850042

Liu, Jiewen; Peng, Zhiyuan; Xu, Dongkuan; Liu, Yuchen (November 2024, IEEE)

Free, publicly-accessible full text available November 22, 2025
EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture

https://doi.org/10.1109/TCAD.2024.3443692

Dong, Peiyan; Zhuang, Jinming; Yang, Zhuoping; Ji, Shixin; Li, Yanyu; Xu, Dongkuan; Huang, Heng; Hu, Jingtong; Jones, Alex K; Shi, Yiyu; et al (November 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Free, publicly-accessible full text available November 1, 2025
Empowering Secondary School Teachers: Creating, Executing, and Evaluating a Transformative Professional Development Course on ChatGPT

https://doi.org/10.1109/FIE61694.2024.10893106

Reichert, Heidi; Tabarsi, Benyamin T; Zang, Zifan; Fennell, Cheri; Bhandari, Indira; Robinson, David; Drayton, Madeline; Crofton, Catherine; Lococo, Matthew; Xu, Dongkuan; et al (October 2024, IEEE)

Background and Context. This innovative practice full paper describes the development and implementation of a professional development (PD) opportunity for secondary teachers to learn about ChatGPT. Incorporating generative AI techniques from Large Language Models (LLMs) such as ChatGPT into educational environments offers unprecedented opportunities and challenges. Prior research has highlighted their potential to personalize feedback, assist in lesson planning, generate educational content, and reduce teachers' workload, alongside concerns such as academic integrity and student privacy. However, the rapid adoption of LLMs since ChatGPT's public release in late 2022 has left educators, particularly at the secondary level, with a lack of clear guidance on how LLMs work and can be effectively adopted. Objective. This study aims to introduce a comprehensive, free, and vetted ChatGPT course tailored for secondary teachers, with the objective of enhancing their technological competencies in LLMs and fostering innovative teaching practices. Method. We developed a five-session interactive course on ChatGPT capabilities, limitations, prompt-engineering techniques, ethical considerations, and strategies for incorporating ChatGPT into teaching. We introduced the course to six middle and high school teachers. Our curriculum emphasized active learning through peer discussions, hands-on activities, and project-based learning. We conducted pre- and post-course focus groups to determine the effectiveness of the course and the extent to which teachers' attitudes toward the use of LLMs in schools had changed. To identify trends in knowledge and attitudes, we asked teachers to complete feedback forms at the end of each of the five sessions. We performed a thematic analysis to classify teacher quotes from focus groups' transcripts as positive, negative, and neutral and calculated the ratio of positive to negative comments in the pre- and post-focus groups. We also analyzed their feedback on each individual session. Finally, we interviewed all participants five months after course completion to understand the longer-term impacts of the course. Findings. Our participants unanimously shared that all five of the sessions provided a deeper understanding of ChatGPT, featured enough opportunities for hands-on practice, and achieved their learning objectives. Our thematic analysis underlined that teachers gained a more positive and nuanced understanding of ChatGPT after the course. This change is evidenced quantitatively by the fact that quotes with positive connotations rose from 45% to 68% of the total number of positive and negative quotes. Participants shared that in the longer term, the course improved their professional development, understanding of ChatGPT, and teaching practices. Implications. This research underscores the effectiveness of active learning in professional development settings, particularly for technological innovations in computing like LLMs. Our findings suggest that introducing teachers to LLM tools through active learning can improve their work processes and give them a thorough and accurate understanding of how these tools work. By detailing our process and providing a model for similar initiatives, our work contributes to the broader discourse on teaching professional educators about computing and integrating emerging technologies in educational and professional development settings.
more » « less
Full Text Available
Towards Inductive and Efficient Explanations for Graph Neural Networks

https://doi.org/10.1109/TPAMI.2024.3362584

Luo, Dongsheng; Zhao, Tianxiang; Cheng, Wei; Xu, Dongkuan; Han, Feng; Yu, Wenchao; Liu, Xiao; Chen, Haifeng; Zhang, Xiang (August 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Despite recent progress in Graph Neural Networks (GNNs), explaining predictions made by GNNs remains a challenging and nascent problem. The leading method mainly considers the local explanations, i.e., important subgraph structure and node features, to interpret why a GNN model makes the prediction for a single instance, e.g. a node or a graph. As a result, the explanation generated is painstakingly customized at the instance level. The unique explanation interpreting each instance independently is not sufficient to provide a global understanding of the learned GNN model, leading to the lack of generalizability and hindering it from being used in the inductive setting. Besides, training the explanation model explaining for each instance is time-consuming for large-scale real-life datasets. In this study, we address these key challenges and propose PGExplainer, a parameterized explainer for GNNs. PGExplainer adopts a deep neural network to parameterize the generation process of explanations, which renders PGExplainer a natural approach to multi-instance explanations. Compared to the existing work, PGExplainer has better generalization ability and can be utilized in an inductive setting without training the model for new instances. Thus, PGExplainer is much more efficient than the leading method with significant speed-up. In addition, the explanation networks can also be utilized as a regularizer to improve the generalization power of existing GNNs when jointly trained with downstream tasks. Experiments on both synthetic and real-life datasets show highly competitive performance with up to 24.7% relative improvement in AUC on explaining graph classification over the leading baseline.
more » « less
Full Text Available
EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture

Dong, Peiyan; Zhuang, Jinming; Yang, Zhuoping; Ji, Shixin; Li, Yanyu; Xu, Dongkuan; Huang, Heng; Hu, Jingtong; Jones, Alex K; Shi, Yiyu; et al (October 2024, IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS)

While Vision Transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency real-time requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce model size and latency, but this often results in accuracy loss. To address the aforementioned constraints, in this work, we propose EQ-ViT, an end-to-end acceleration framework with novel algorithm and architecture co-design features to enable real-time ViT acceleration on AMD Versal Adaptive Compute Acceleration Platform (ACAP). The contributions are four-fold. First, we perform in-depth kernel- level performance profiling & analysis and explain the bottlenecks for existing acceleration solutions on GPU, FPGA, and ACAP. Second, on the hardware level, we introduce a new spatial and heterogeneous accelerator architecture, EQ-ViT architec- ture. This architecture leverages the heterogeneous features of ACAP, where both FPGA and artificial intelligence engines (AIEs) coexist on the same system-on-chip (SoC). Third, On the algorithm level, we create a comprehensive quantization-aware training strategy, EQ-ViT algorithm. This strategy concurrently quantizes both weights and activations into 8-bit integers, aiming to improve accuracy rather than compromise it during quanti- zation. Notably, the method also quantizes nonlinear functions for efficient hardware implementation. Fourth, we design EQ- ViT automation framework to implement the EQ-ViT architec- ture for four different ViT applications on the AMD Versal ACAP VCK190 board, achieving accuracy improvement with 2.4%, and average speedups of 315.0x, 3.39x, 3.38x, 14.92x, 59.5x, 13.1x over computing solutions of Intel Xeon 8375C vCPU, Nvidia A10G, A100, Jetson AGX Orin GPUs, and AMD ZCU102, U250 FPGAs. The energy efficiency gains are 62.2x, 15.33x, 12.82x, 13.31x, 13.5x, 21.9x.
more » « less
Full Text Available
Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks

https://doi.org/10.1109/JSAC.2024.3431575

Zhang, Zifan; Liu, Yuchen; Peng, Zhiyuan; Chen, Mingzhe; Xu, Dongkuan; Cui, Shuguang (January 2024, IEEE Journal on Selected Areas in Communications)

Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns, such as base station overload and unbalanced cache issues. This oversight can result in system crashes and degraded user experience. To bridge this gap, we introduce a novel digital twin-assisted optimization framework, called D-REC, which integrates reinforcement learning (RL) with diverse intervention modules to ensure reliable caching in nextG wireless networks. We first develop a joint vertical and horizontal twinning approach to efficiently create network digital twins, which are then employed by D-REC as RL optimizers and safeguards, providing ample datasets for training and predictive evaluation of our cache replacement policy. By incorporating reliability modules into a constrained Markov decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints, minimizing the risk of network failures. Theoretical analysis demonstrates comparable convergence rates between DREC and vanilla data-driven methods without compromising caching performance. Extensive experiments validate that D-REC outperforms conventional approaches in cache hit rate and load balancing while effectively enforcing predetermined reliability intervention modules.
more » « less
Full Text Available

« Prev Next »

Search for: All records