NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Lin, Yujun; Tang, Haotian; Yang, Shang; Zhang, Zhekai; Xiao, Guangxuan; Gan, Chuang; Han, Song (May 2025, The Eighth Annual Conference on Machine Learning and Systems)

Free, publicly-accessible full text available May 12, 2026
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization

Li, Wanhua; Meng, Zibin; Zhou, Jiawei; Wei, Donglai; Gan, Chuang; Pfister, Hanspeter (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Wang, Yufei; Xian, Zhou; Chen, Feng; Wang, Tsun-Hsuan; Wang, Yian; Fragkiadaki, Katerina; Erickson, Zackory; Held, David; Gan, Chuang (June 2024, arxiv.org)

We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates simulation environments by populating pertinent assets with proper spatial configurations. Afterwards, the agent decomposes the proposed task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
Full Text Available
Aligning Large Multimodal Models with Factually Augmented RLHF

Sun, Zhiqing; Shen, Sheng; Cao, Shengcao; Liu, Haotian; Li, Chunyuan; Shen, Yikang; Gan, Chuang; Gui, Liangyan; Wang, Yu-Xiong; Yang, Yiming; et al (August 2024, Findings of the Association for Computational Linguistics (ACL Findings))

Full Text Available
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration

Lin, Ji; Tang, Jiaming; Tang, Haotian; Yang, Shang; Chen, Wei-Ming; Wang, Wei-Chen; Xiao, Guangxuan; Dang, Xingyu; Gan, Chuang; Han, Song (May 2024, The Seventh Annual Conference on Machine Learning and Systems (MLSys))

Full Text Available
PockEngine: Sparse and Efficient Fine-tuning in a Pocket

https://doi.org/10.1145/3613424.3614307

Zhu, Ligeng; Hu, Lanxiang; Lin, Ji; Chen, Wei-Ming; Wang, Wei-Chen; Gan, Chuang; Han, Song (October 2023, ACM)

Full Text Available
Learning Situation Hyper-Graphs for Video Question Answering

https://doi.org/10.1109/CVPR52729.2023.01429

Khan, Aisha Urooj; Kuehne, Hilde; Wu, Bo; Chheu, Kim; Bousselham, Walid; Gan, Chuang; Lobo, Niels; Shah, Mubarak (June 2023, IEEE Computer Society)

Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation that describes situations as scene sub-graphs for video frames and hyper-edges for connected sub-graphs and has been proposed to capture all such information in a compact structured form. In this work, we propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs, coined Situation Hyper-Graph based Video Question Answering (SHG-VQA). To this end, we train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip, and to use cross-attention between the predicted situation hyper-graphs and the question embedding to predict the correct answer. The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction. The effectiveness of the proposed architecture is extensively evaluated on two challenging benchmarks: AGQA and STAR. Our results show that learning the underlying situation hyper-graphs helps the system to significantly improve its performance for novel challenges of video question-answering tasks. Code will be available at https://github.com/aurooj/SHG-VQA.
Full Text Available
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

Zhou, Xian; Zhu, Bo; Xu, Zhenjia; Tung, Hsiao-Yu; Torralba, Antonio; Fragkiadaki, Katerina; Gan, Chuang (March 2023, International Conference on Learning Representations 2023)
On-Device Training Under 256KB Memory

Lin, Ji; Zhu, Ligeng; Chen, Wei-Ming; Wang, Wei-Chen; Gan, Chuang; Han, Song (December 2022, The Thirty-Sixth Annual Conference on Neural Information Processing Systems)

Full Text Available
PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification

Li, Xuan; Qiao, Yi-Ling; Chen, Peter Yichen; Jatavallabhula, Krishna Murthy; Lin, Ming; Jiang, Chenfanfu; Gan, Chuang (January 2023, International Conference on Learning Representations (ICLR))

Full Text Available
