NSF PAR Search | NSF Public Access Repository


Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees

Nguyen, Thien H; Nguyen, Huy L (July 2025, Proceedings of Machine Learning Research)

We introduce two complementary techniques for efficient optimization that reduce memory requirements while accelerating training of large-scale neural networks. The first technique, the Subset-Norm step size, generalizes AdaGrad-Norm and AdaGrad(-Coordinate) through step-size sharing. Subset-Norm (SN) reduces AdaGrad’s memory footprint from O(d) to O(√d), where d is the model size. For non-convex smooth objectives under coordinate-wise sub-Gaussian noise, we show a noise-adapted high-probability convergence guarantee for SN with improved dimensional dependence over existing methods. Our second technique, Subspace-Momentum, reduces the momentum state’s memory footprint by restricting momentum to a low-dimensional subspace while performing SGD in the orthogonal complement. We prove a high-probability convergence result for Subspace-Momentum under standard assumptions. Empirical evaluation on pre-training and fine-tuning LLMs demonstrates the effectiveness of our methods. For instance, combining Subset-Norm with Subspace-Momentum reaches Adam’s validation perplexity for LLaMA 1B in approximately half the training tokens (6.8B vs. 13.1B) while reducing Adam’s optimizer-state memory footprint by more than 80% with minimal additional hyperparameter tuning.
Free, publicly-accessible full text available July 13, 2026
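
For a concrete picture of the step-size sharing described in the Subset-Norm abstract above, here is a minimal Python sketch. It is not the authors' implementation: the names `SubsetNormSGD` and `make_subsets` are hypothetical, coordinates are assumed to be split into contiguous subsets, and each subset's accumulator is assumed to add the squared norm of the gradient restricted to that subset. Under that assumption, a single subset recovers AdaGrad-Norm, singleton subsets recover coordinate-wise AdaGrad, and subsets of size about √d keep only about √d scalars of step-size state.

```python
import math
from typing import List


def make_subsets(d: int, subset_size: int) -> List[range]:
    """Partition coordinates 0..d-1 into contiguous subsets (the last may be smaller)."""
    return [range(start, min(start + subset_size, d)) for start in range(0, d, subset_size)]


class SubsetNormSGD:
    """Hypothetical sketch of an SGD step with a Subset-Norm adaptive step size.

    One scalar accumulator is kept per subset rather than per coordinate, so the
    step-size state has len(subsets) entries instead of d.
    """

    def __init__(self, d: int, subset_size: int, lr: float = 0.1, eps: float = 1e-8):
        self.subsets = make_subsets(d, subset_size)
        self.acc = [0.0] * len(self.subsets)  # shared step-size state, one per subset
        self.lr = lr
        self.eps = eps

    def step(self, x: List[float], g: List[float]) -> List[float]:
        new_x = list(x)
        for j, subset in enumerate(self.subsets):
            # Assumed rule: add the squared norm of the gradient restricted to this subset.
            self.acc[j] += sum(g[i] * g[i] for i in subset)
            step_size = self.lr / (math.sqrt(self.acc[j]) + self.eps)
            for i in subset:  # every coordinate in the subset shares this step size
                new_x[i] -= step_size * g[i]
        return new_x


# Usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
d = 16
x = [1.0] * d
opt = SubsetNormSGD(d, subset_size=math.isqrt(d), lr=0.1)
loss0 = 0.5 * sum(v * v for v in x)
for _ in range(50):
    x = opt.step(x, list(x))
loss = 0.5 * sum(v * v for v in x)
print(len(opt.acc), loss0, loss)  # 4 accumulators for d = 16; loss ends below loss0
```

With `d = 16` and `subset_size = math.isqrt(d)`, the sketch stores 4 accumulators instead of the 16 that coordinate-wise AdaGrad would need. Subspace-Momentum, the paper's second technique, is not covered here.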
Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective

https://doi.org/10.18653/v1/2024.emnlp-main.761

Pham, Van-Cuong; Nguyen, Thien Huu (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024))

Full Text Available
Modal analysis of blood flows in saccular aneurysms

https://doi.org/10.1063/5.0243383

Nguyen, Thien-Tam; Kasperski, Davina; Huynh, Phat Kim; Le, Trung Quoc; Le, Trung Bao (January 2025, Physics of Fluids)

Investigating aneurysmal hemodynamics from current in vivo data, such as Magnetic Resonance Imaging or Computed Tomography, is challenging because of the limited spatial and temporal resolution of these modalities. In this work, we investigate the use of modal analysis at various resolutions to examine its usefulness for analyzing blood flows in brain aneurysms. Two variants of Dynamic Mode Decomposition (DMD), (i) Hankel-DMD and (ii) Optimized-DMD, are used to extract the time-dependent dynamics of blood flows during one cardiac cycle. First, high-resolution hemodynamic data in patient-specific aneurysms are obtained using Computational Fluid Dynamics. Second, the dynamic modes, along with their spatial amplitudes and temporal magnitudes, are calculated using DMD analysis. Third, DMD analyses are performed over a range of spatial and temporal resolutions of the hemodynamic data to validate the applicability of DMD to low-resolution data similar to those acquired in clinical practice. Our results show that DMD is able to characterize the inflow jet dynamics by separating large-scale structures and flow instabilities even at low spatial and temporal resolutions. Its robustness in quantifying the flow dynamics using the energy spectrum is demonstrated across different resolutions in all aneurysms in our study population. Our work indicates that DMD can be used for analyzing blood flow patterns of brain aneurysms and is a promising tool to be explored in vivo.
Full Text Available
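
The aneurysm abstract above names Hankel-DMD and Optimized-DMD without spelling out the computation. The following Python/NumPy sketch shows plain exact DMD applied to a Hankel (time-delay) embedding, which is one common way to build Hankel-DMD; it does not implement Optimized-DMD and is not the authors' pipeline. The function names `hankel_embed` and `exact_dmd`, the synthetic cosine signal, and the chosen rank are illustrative assumptions.

```python
import numpy as np


def hankel_embed(series: np.ndarray, delays: int) -> np.ndarray:
    """Time-delay (Hankel) embedding: stack `delays` shifted copies of the snapshots.

    series has shape (n_space, n_time); the result has shape
    (n_space * delays, n_time - delays + 1).
    """
    n_time = series.shape[1]
    cols = n_time - delays + 1
    return np.vstack([series[:, i:i + cols] for i in range(delays)])


def exact_dmd(snapshots: np.ndarray, rank: int):
    """Exact DMD: fit a linear map A with snapshots[:, k+1] ~= A @ snapshots[:, k].

    Returns (eigenvalues, modes, amplitudes) of the rank-truncated operator.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator U^H Y V S^{-1}; dividing by s scales column j by 1/s[j].
    A_tilde = (U.conj().T @ Y @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes Phi = Y V S^{-1} W.
    modes = ((Y @ V) / s) @ W
    # Amplitudes b solve Phi b ~= first snapshot in the least-squares sense.
    amplitudes = np.linalg.lstsq(modes, snapshots[:, 0], rcond=None)[0]
    return eigvals, modes, amplitudes


# Usage: recover the frequency of an undamped oscillation from one scalar probe.
theta = 0.3                                      # phase advance per time step (illustrative)
signal = np.cos(theta * np.arange(60))[None, :]  # shape (1, 60)
H = hankel_embed(signal, delays=2)               # shape (2, 59)
eigvals, modes, amps = exact_dmd(H, rank=2)
print(np.abs(eigvals))                           # magnitudes close to 1 (no growth or decay)
print(np.abs(np.angle(eigvals)))                 # angles close to theta
```

A single scalar time series gives a 1x1 reduced operator and so cannot represent an oscillation by itself; the delay embedding lifts it to two rows, and the recovered eigenvalues then sit on the unit circle at angles ±θ. For CFD data, each column of `snapshots` would typically be a flattened flow field at one time step.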
ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning

https://doi.org/10.18653/v1/2024.emnlp-demo.24

Man, Hieu; Ngo, Nghia Trung; Dernoncourt, Franck; Nguyen, Thien Huu (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP 2024))

Full Text Available
Lifelong Event Detection via Optimal Transport

https://doi.org/10.18653/v1/2024.emnlp-main.701

Dao, Viet; Pham, Van-Cuong; Tran, Quyen; Le, Thanh-Thien; Ngo, Linh; Nguyen, Thien Huu (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024))

Full Text Available
Counterfactual Augmentation for Robust Authorship Representation Learning

https://doi.org/10.1145/3626772.3657956

Man, Hieu; Nguyen, Thien Huu (July 2024, Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024))

Full Text Available
Preserving Generalization of Language Models in Few-shot Continual Relation Extraction

https://doi.org/10.18653/v1/2024.emnlp-main.763

Tran, Quyen; Thanh, Nguyen Xuan; Anh, Nguyen Hoang; Hai, Nam Le; Le, Trung; Ngo, Linh; Nguyen, Thien Huu (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Full Text Available
Realistic Evaluation of Toxicity in Large Language Models

https://doi.org/10.18653/v1/2024.findings-acl.61

Luong, Tinh; Le, Thanh-Thien; Ngo, Linh; Nguyen, Thien Huu (August 2024, Association for Computational Linguistics)

Full Text Available
MCECR: A Novel Dataset for Multilingual Cross-Document Event Coreference Resolution

https://doi.org/10.18653/v1/2024.findings-naacl.245

Pouran Ben Veyseh, Amir; Lai, Viet; Nguyen, Chien; Dernoncourt, Franck; Nguyen, Thien Huu (June 2024, Findings of the Association for Computational Linguistics: NAACL 2024)

Full Text Available
Mastering Context-to-Label Representation Transformation for Event Causality Identification with Diffusion Models

Man, Hieu; Dernoncourt, Franck; Nguyen, Thien Huu (March 2024, Proceedings of the 38th AAAI Conference on Artificial Intelligence)
