Search Results
Search for: All records
Total Resources: 5
Author / Contributor
- Jain, Jitesh (5)
- Shi, Humphrey (5)
- Li, Jiachen (3)
- Chen, Fan (2)
- Kuo, Chia-Wen (2)
- Wang, Xinyao (2)
- Wen, Longyin (2)
- Xu, Lu (2)
- Zhu, Sijie (2)
- Huang, Yun (1)
- Huang, Zilong (1)
- Jin, Qiao (1)
- Lu, Xi (1)
- Meadan-Kaplansky, Hedda (1)
- Orlov, Nikita (1)
- Singh, Anukriti (1)
- Walton, Steven (1)
- Xiong, Jinjun (1)
- Yang, Jianwei (1)
- Zheng, Qingxiao (1)
-
Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of efficiently improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo, which incorporates Co-upcycled Top-K sparsely-gated Mixture-of-Experts blocks into both the vision encoder and the MLP connector, thereby enhancing the multimodal LLMs with negligible additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then initializes each expert in the MoE block from the pre-trained MLP block during the visual instruction tuning stage, with auxiliary losses to ensure a balanced loading of experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks within each model size group, all while training exclusively on open-sourced datasets.
Free, publicly-accessible full text available December 10, 2025.
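To make the co-upcycling recipe concrete, the sketch below shows a Top-K sparsely-gated MoE block whose experts are all initialized from a single pre-trained MLP, together with one common form of load-balancing auxiliary loss. This is a minimal PyTorch illustration under assumed shapes and names (`MLP`, `CoUpcycledMoE` are hypothetical), not the CuMo implementation; the paper's exact gating and auxiliary losses may differ.

```python
# Minimal sketch of a co-upcycled Top-K sparsely-gated MoE block.
# Assumes standard PyTorch; class/argument names are illustrative,
# not taken from the CuMo release.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Dense MLP block of the kind pre-trained before upcycling."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


class CoUpcycledMoE(nn.Module):
    """Top-K sparsely-gated MoE whose experts are all initialized
    from one pre-trained MLP (the 'co-upcycling' step)."""
    def __init__(self, pretrained_mlp, num_experts=4, top_k=2):
        super().__init__()
        dim = pretrained_mlp.fc1.in_features
        hidden_dim = pretrained_mlp.fc1.out_features
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)
        # Each expert starts as a copy of the pre-trained MLP weights.
        self.experts = nn.ModuleList(
            [MLP(dim, hidden_dim) for _ in range(num_experts)]
        )
        for expert in self.experts:
            expert.load_state_dict(pretrained_mlp.state_dict())

    def forward(self, x):
        # x: (num_tokens, dim)
        logits = self.router(x)                       # (tokens, experts)
        probs = logits.softmax(dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]
            weight = topk_probs[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += weight[mask] * expert(x[mask])

        # One common load-balancing auxiliary loss (Switch-Transformer
        # style): penalize uneven routing of tokens across experts.
        num_experts = probs.size(-1)
        token_fraction = F.one_hot(topk_idx[:, 0], num_experts).float().mean(0)
        prob_fraction = probs.mean(0)
        aux_loss = num_experts * (token_fraction * prob_fraction).sum()
        return out, aux_loss


# Example: upcycle a pre-trained connector MLP into a 4-expert MoE block.
dense = MLP(dim=1024, hidden_dim=4096)       # stands in for the stage-1 MLP
moe = CoUpcycledMoE(dense, num_experts=4, top_k=2)
tokens = torch.randn(16, 1024)
y, aux = moe(tokens)                          # only the top-2 experts run per token
```

Because only the top-k experts are active for each token, the number of activated parameters at inference stays close to that of the original dense MLP, which is the property the abstract refers to.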
-
Li, Jiachen; Wang, Xinyao; Zhu, Sijie; Kuo, Chia-Wen; Xu, Lu; Chen, Fan; Jain, Jitesh; Shi, Humphrey; Wen, Longyin (Advances in Neural Information Processing Systems 37 (NeurIPS 2024)).
Free, publicly-accessible full text available December 1, 2025.
-
Zheng, Qingxiao; Lu, Xi; Jin, Qiao; Jain, Jitesh; Meadan-Kaplansky, Hedda; Shi, Humphrey; Xiong, Jinjun; Huang, Yun (ACM).
Free, publicly-accessible full text available November 11, 2025.
-
Jain, Jitesh; Yang, Jianwei; Shi, Humphrey (IEEE).
-
Jain, Jitesh; Singh, Anukriti; Orlov, Nikita; Huang, Zilong; Li, Jiachen; Walton, Steven; Shi, Humphrey (IEEE).
