NSF PAR Search | NSF Public Access Repository

Towards Automated Model Design on Recommender Systems

https://doi.org/10.1145/3706124

Zhang, Tunhou; Cheng, Dehua; He, Yuchen; Chen, Zhengxing; Dai, Xiaoliang; Xiong, Liang; Liu, Yudong; Cheng, Feng; Cao, Yufan; Yan, Feng; et al (December 2024, ACM Transactions on Recommender Systems)

The increasing popularity of deep learning models has created new opportunities for developing AI-based recommender systems. Designing recommender systems using deep neural networks requires careful architecture design, and further optimization demands extensive co-design efforts on jointly optimizing model architecture and hardware. Design automation, such as Automated Machine Learning (AutoML), is necessary to fully exploit the potential of recommender model design, including model choices and model-hardware co-design strategies. We introduce a novel paradigm that utilizes weight sharing to explore abundant solution spaces. Our paradigm creates a large supernet to search for optimal architectures and co-design strategies to address the challenges of data multi-modality and heterogeneity in the recommendation domain. From a model perspective, the supernet includes a variety of operators, dense connectivity, and dimension search options. From a co-design perspective, it encompasses versatile Processing-In-Memory (PIM) configurations to produce hardware-efficient models. Our solution space’s scale, heterogeneity, and complexity pose several challenges, which we address by proposing various techniques for training and evaluating the supernet. Our crafted models show promising results on three Click-Through Rate (CTR) prediction benchmarks, outperforming both manually designed and AutoML-crafted models with state-of-the-art performance when focusing solely on architecture search. From a co-design perspective, we achieve 2× FLOPs efficiency, 1.8× energy efficiency, and 1.5× performance improvements in recommender models.
Full Text Available
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts

https://doi.org/10.1109/CVPR52734.2025.01227

Liang, Feng; Ma, Haoyu; He, Zecheng; Hou, Tingbo; Hou, Ji; Li, Kunpeng; Dai, Xiaoliang; Juefei-Xu, Felix; Azadi, Samaneh; Sinha, Animesh; et al (June 2025, IEEE)

Free, publicly-accessible full text available June 10, 2026
An Investigation on Hardware-Aware Vision Transformer Scaling

https://doi.org/10.1145/3611387

Li, Chaojian; Kim, Kyungmin; Wu, Bichen; Zhang, Peizhao; Zhang, Hang; Dai, Xiaoliang; Vajda, Peter; Lin, Yingyan (August 2023, ACM Transactions on Embedded Computing Systems)

Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks, and recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT’s architecture design space and its hardware-cost implications on different devices. In this work, by simply scaling ViT’s depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve an accuracy-efficiency trade-off comparable or superior to that of most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves a \(\uparrow 1.9\%\) higher ImageNet top-1 accuracy under the same FLOPs and a \(\uparrow 3.7\%\) better ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) “can these scaling strategies be transferred across different real hardware devices?”; and (2) “can these scaling strategies be transferred to different ViT variants and tasks?”. For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks.
In particular, when transferred to PiT, our scaling strategies boost ImageNet top-1 accuracy from \(74.6\%\) to \(76.7\%\) (\(\uparrow 2.1\%\)) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by \(\uparrow 0.7\%\) under a similar throughput on a V100 GPU.
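To make the scaling knobs in this abstract concrete, the following is a rough sketch of how depth, width, and input size drive the cost of a vanilla ViT. It is not the paper's cost model: the function name `vit_macs` and its defaults are assumptions for illustration, and it counts only multiply-accumulate operations in the patch embedding, attention, MLP, and classifier layers, ignoring softmax, normalization, and biases.

```python
def vit_macs(depth, width, image_size, patch_size=16, mlp_ratio=4,
             num_classes=1000, in_channels=3):
    """Approximate multiply-accumulates for one forward pass of a vanilla ViT.

    Hypothetical helper for illustration only; softmax, LayerNorm, and
    bias terms are ignored.
    """
    num_patches = (image_size // patch_size) ** 2
    tokens = num_patches + 1  # patches plus one class token

    # Linear projection of each flattened patch to the embedding width.
    patch_embed = num_patches * (patch_size * patch_size * in_channels) * width

    # Per transformer layer:
    proj = 4 * tokens * width * width            # Q, K, V, and output projections
    attn = 2 * tokens * tokens * width           # QK^T and attention-weighted sum of V
    mlp = 2 * mlp_ratio * tokens * width * width  # two MLP linear layers

    head = width * num_classes                   # final classifier
    return patch_embed + depth * (proj + attn + mlp) + head


if __name__ == "__main__":
    # A DeiT-Tiny-like configuration (assumed: depth 12, width 192, 224x224 input).
    print(vit_macs(depth=12, width=192, image_size=224))
```

The projection and MLP terms grow quadratically with width, while the attention term grows quadratically with the token count, which is why width and input size are not interchangeable when scaling under a fixed budget.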
Full Text Available
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

https://doi.org/10.1109/CVPR52729.2023.01387

You, Haoran; Xiong, Yunyang; Dai, Xiaoliang; Wu, Bichen; Zhang, Peizhao; Fan, Haoqi; Vajda, Peter; Lin, Yingyan Celine (June 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost as compared to convolutional neural networks (CNNs); one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens. Existing efficient ViTs adopt local attention or linear attention, which sacrifice ViTs' capabilities of capturing either global or local context. In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but then switches to having only linear-angular attention during inference. Our Castling-ViT leverages angular kernels to measure the similarities between queries and keys via spectral angles. We further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during inference. Extensive experiments validate the effectiveness of our Castling-ViT, e.g., achieving up to a 1.8% higher accuracy or 40% MACs reduction on classification and 1.2 higher mAP on detection under comparable FLOPs, as compared to ViTs with vanilla softmax-based attentions. Project page is available here.
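The idea of keeping only the linear term of an angular kernel can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes unit-normalized queries and keys, takes the angular similarity as 1 - θ/π with θ = arccos(q·k), and keeps the first-order expansion 1/2 + (q·k)/π. The row normalization and the function names are assumptions, and the depthwise convolution and masked softmax branches are omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def linear_angular_attention(Q, K, V):
    """Attention with sim(q, k) = 1/2 + (q.k)/pi, computed in linear time.

    Key/value statistics are summed once, so the cost grows linearly with
    the number of tokens instead of quadratically.
    """
    Q = [normalize(q) for q in Q]
    K = [normalize(k) for k in K]
    d, dv, n = len(K[0]), len(V[0]), len(K)

    v_sum = [sum(v[j] for v in V) for j in range(dv)]   # sum of values
    k_sum = [sum(k[i] for k in K) for i in range(d)]    # sum of keys
    kv = [[sum(k[i] * v[j] for k, v in zip(K, V)) for j in range(dv)]
          for i in range(d)]                            # K^T V, shape d x dv

    out = []
    for q in Q:
        num = [0.5 * v_sum[j] + sum(q[i] * kv[i][j] for i in range(d)) / math.pi
               for j in range(dv)]
        den = 0.5 * n + sum(q[i] * k_sum[i] for i in range(d)) / math.pi
        out.append([x / den for x in num])
    return out


def quadratic_reference(Q, K, V):
    """Same similarity, but with the explicit N x N weight matrix."""
    Q = [normalize(q) for q in Q]
    K = [normalize(k) for k in K]
    out = []
    for q in Q:
        w = [0.5 + sum(a * b for a, b in zip(q, k)) / math.pi for k in K]
        total = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / total
                    for j in range(len(V[0]))])
    return out
```

Both functions return the same values; the linear version simply reorders the products so that no token-by-token similarity matrix is ever formed.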
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

https://doi.org/10.1109/CVPR52729.2023.00682

Liang, Feng; Wu, Bichen; Dai, Xiaoliang; Li, Kunpeng; Zhao, Yinan; Zhang, Hang; Zhang, Peizhao; Vajda, Peter; Marculescu, Diana (June 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

Zhang, Tunhou; Cheng, Dehua; He, Yuchen; Chen, Zhengxing; Dai, Xiaoliang; Xiong, Liang; Yan, Feng; Li, Hai; Chen, Yiran; Wen, Wei (May 2023, 2023 ACM Web Conference (WWW 2023))

Full Text Available
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

https://doi.org/10.1145/3543507.3583446

Zhang, Tunhou; Cheng, Dehua; He, Yuchen; Chen, Zhengxing; Dai, Xiaoliang; Xiong, Liang; Yan, Feng; Li, Hai; Chen, Yiran; Wen, Wei (April 2023, the ACM Web Conference 2023)

The rise of deep neural networks offers new opportunities in optimizing recommender systems. However, optimizing recommender systems using deep neural networks requires delicate architecture fabrication. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures. The supernet incorporates a versatile choice of operators and dense connectivity to minimize human effort in finding priors. The scale and heterogeneity in NASRec impose several challenges, such as training inefficiency, operator imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our crafted models, NASRecNet, show promising results on three Click-Through Rate (CTR) prediction benchmarks, indicating that NASRec outperforms both manually designed models and existing NAS methods with state-of-the-art performance. Our work is publicly available here.
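The weight-sharing idea in this abstract can be illustrated with a toy supernet. This is a sketch under loud assumptions, not NASRec's code: the operator names, the `ToySupernet` class, the scalar stand-ins for weight tensors, and the placeholder gradient are all hypothetical. It only shows that each sampled sub-architecture picks one operator per block plus an arbitrary set of earlier inputs, and that training a sub-architecture updates weights that other sub-architectures reuse.

```python
import random

# Hypothetical operator choices; a real search space would differ.
OPERATORS = ["fc", "dot_product", "attention"]


class ToySupernet:
    def __init__(self, num_blocks, seed=0):
        self.num_blocks = num_blocks
        self.rng = random.Random(seed)
        # One shared weight per (block, operator). A float stands in for a tensor.
        self.weights = {(b, op): 0.0
                        for b in range(num_blocks) for op in OPERATORS}

    def sample_subnet(self):
        """Pick one operator per block and any non-empty set of earlier inputs."""
        subnet = []
        for b in range(self.num_blocks):
            op = self.rng.choice(OPERATORS)
            candidates = list(range(-1, b))  # -1 denotes the raw input features
            k = self.rng.randint(1, len(candidates))
            inputs = sorted(self.rng.sample(candidates, k))
            subnet.append((op, inputs))
        return subnet

    def train_step(self, subnet, lr=0.1):
        """Update only the weights used by the sampled sub-architecture."""
        for b, (op, _) in enumerate(subnet):
            placeholder_grad = 1.0  # stand-in for a real backpropagated gradient
            self.weights[(b, op)] -= lr * placeholder_grad


if __name__ == "__main__":
    net = ToySupernet(num_blocks=4)
    for _ in range(3):
        net.train_step(net.sample_subnet())
    print(net.weights)
```

Because every sub-architecture reads from the same weight table, candidate models can be scored after supernet training without training each one from scratch.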
Full Text Available
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

https://doi.org/10.1145/3543507.3583446

Zhang, Tunhou; Cheng, Dehua; He, Yuchen; Chen, Zhengxing; Dai, Xiaoliang; Xiong, Liang; Yan, Feng; Li, Hai; Chen, Yiran; Wen, Wei (April 2023, Proceedings of the ACM Web Conference)
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

https://doi.org/10.1109/CVPR42600.2020.01298

Wan, Alvin; Dai, Xiaoliang; Zhang, Peizhao; He, Zijian; Tian, Yuandong; Xie, Saining; Wu, Bichen; Yu, Matthew; Xu, Tao; Chen, Kan; et al (June 2020, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Differentiable Neural Architecture Search (DNAS) has demonstrated great success in designing state-of-the-art, efficient neural networks. However, DARTS-based DNAS's search space is small when compared to other search methods', since all candidate network layers must be explicitly instantiated in memory. To address this bottleneck, we propose a memory and computationally efficient DNAS variant: DMaskingNAS. This algorithm expands the search space by up to \(10^{14}\times\) over conventional DNAS, supporting searches over spatial and channel dimensions that are otherwise prohibitively expensive: input resolution and number of filters. We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands. Furthermore, we employ effective shape propagation to maximize per-FLOP or per-parameter accuracy. The searched FBNetV2s yield state-of-the-art performance when compared with all previous architectures. With up to 421× less search cost, DMaskingNAS finds models with 0.9% higher accuracy and 15% fewer FLOPs than MobileNetV3-Small, and with similar accuracy but 20% fewer FLOPs than EfficientNet-B0. Furthermore, our FBNetV2 outperforms MobileNetV3 by 2.6% in accuracy, with equivalent model size. FBNetV2 models are open-sourced at https://github.com/facebookresearch/mobile-vision.
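The masking mechanism for channel search can be sketched roughly as below. This is an illustration, not the released FBNetV2 code: it assumes each candidate channel count is represented by a binary mask over a single feature map computed at the maximum width, and that the masks are blended with softmax weights over learnable logits. The Gumbel noise typically used in differentiable search is deliberately left out, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def channel_mask(logits, channel_options, max_channels, temperature=1.0):
    """Blend one binary mask per candidate channel count into a soft mask.

    Option c contributes a mask with ones in the first c channels, so all
    options share one feature map instead of one copy each.
    """
    weights = softmax(logits, temperature)
    mask = [0.0] * max_channels
    for w, c in zip(weights, channel_options):
        for i in range(c):
            mask[i] += w
    return mask


def masked_output(features, mask):
    """Apply the soft mask channel by channel (one value per channel here)."""
    return [f * m for f, m in zip(features, mask)]


if __name__ == "__main__":
    mask = channel_mask(logits=[0.0, 1.0, 2.0],
                        channel_options=[2, 4, 6], max_channels=6)
    print(masked_output([1.0] * 6, mask))
```

Since only one feature map at the maximum width is kept in memory, adding more channel options changes the mask but not the number of stored activations.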
Full Text Available
