NSF PAR Search | NSF Public Access Repository


Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

Yuan, Xin; Savarese, Pedro; Maire, Michael (May 2024, NeurIPS Proceedings)

Full Text Available
HyperFields: Towards Zero-Shot Generation of NeRFs from Text

Babu, Sudarshan; Liu, Richard; Zhou, Avery; Maire, Michael; Shakhnarovich, Greg; Hanocka, Rana (July 2024, ICML 2024 in OpenReview.net)

Full Text Available
HyperFields: Towards zero-shot generation of NeRFs from text

Babu, Sudarshan; Liu, Richard; Zhou, Avery; Maire, Michael; Shakhnarovich, Greg; Hanocka, Rana (June 2024, ICML)

We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes, either zero-shot or with a few fine-tuning steps. Fine-tuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
Full Text Available
ChameleonAPI: Automatic and Efficient Customization of Neural Networks for ML Applications

Liu, Yuhan; Wan, Chengcheng; Du, Kuntai; Hoffmann, Henry; Jiang, Junchen; Lu, Shan; Maire, Michael (July 2024, Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation)

ML APIs have greatly relieved application developers of the burden of designing and training their own neural network models: classifying objects in an image can now be as simple as one line of Python code that calls an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal, as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via the existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
Full Text Available
ChameleonAPI: Automatic and Efficient Customization of Neural Networks for ML Applications

Liu, Yuhan; Wan, Chengcheng; Du, Kuntai; Hoffmann, Henry; Jiang, Junchen; Lu, Shan; Maire, Michael (July 2024, USENIX, OSDI 2024)

Full Text Available
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

Liu, Yuhan; Li, Hanchen; Cheng, Yihua; Ray, Siddhant; Huang, Yuyang; Zhang, Qizheng; Du, Kuntai; Yao, Jiayi; Lu, Shan; Ananthanarayanan, Ganesh; et al (August 2024, Association for Computing Machinery, New York, NY, United States)

As large language models (LLMs) take on complex tasks, their inputs are supplemented with longer contexts that incorporate domain knowledge. Yet using long contexts is challenging as nothing can be generated until the whole context is processed by the LLM. While the context-processing delay can be reduced by reusing the KV cache of a context across different inputs, fetching the KV cache, which contains large tensors, over the network can cause high extra network delays. CacheGen is a fast context-loading module for LLM systems. First, CacheGen uses a custom tensor encoder, leveraging KV cache's distributional properties to encode a KV cache into more compact bitstream representations with negligible decoding overhead, to save bandwidth usage. Second, CacheGen adapts the compression level of different parts of a KV cache to cope with changes in available bandwidth, in order to maintain low context-loading delay and high generation quality. We test CacheGen on popular LLMs and datasets. Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5–4.3× and the total delay in fetching and processing contexts by 3.2–3.7× with negligible impact on the LLM response quality. Our code is at: https://github.com/UChi-JCL/CacheGen.
Full Text Available
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

https://doi.org/10.1145/3651890.3672274

Liu, Yuhan; Li, Hanchen; Cheng, Yihua; Ray, Siddhant; Huang, Yuyang; Zhang, Qizheng; Du, Kuntai; Yao, Jiayi; Lu, Shan; Ananthanarayanan, Ganesh; et al (August 2024, ACM)

Full Text Available
Run-Time Prevention of Software Integration Failures of Machine Learning APIs

https://doi.org/10.1145/3622806

Wan, Chengcheng; Liu, Yuhan; Du, Kuntai; Hoffmann, Henry; Jiang, Junchen; Maire, Michael; Lu, Shan (October 2023, Proceedings of the ACM on Programming Languages)

Due to under-specified interfaces, developers face challenges in correctly integrating machine learning (ML) APIs in software. Even when the ML API and the software are well designed on their own, the resulting application misbehaves when the API output is incompatible with the software. It is desirable to have an adapter that converts ML API output at run time to better fit the software's needs and prevent integration failures. In this paper, we conduct an empirical study to understand ML API integration problems in real-world applications. Guided by this study, we present SmartGear, a tool that automatically detects and converts mismatching or incorrect ML API output at run time, serving as a middle layer between ML API and software. Our evaluation on a variety of open-source applications shows that SmartGear detects 70% of incompatible API outputs and prevents 67% of potential integration failures, outperforming alternative solutions.
Full Text Available
Automated testing of software that uses machine learning APIs

https://doi.org/10.1145/3510003.3510068

Wan, Chengcheng; Liu, Shicheng; Xie, Sophie; Liu, Yifan; Hoffmann, Henry; Maire, Michael; Lu, Shan (May 2022, 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE))

Full Text Available
Are Machine Learning Cloud APIs Used Correctly?

https://doi.org/10.1109/ICSE43902.2021.00024

Wan, Chengcheng; Liu, Shicheng; Hoffmann, Henry; Maire, Michael; Lu, Shan (May 2021, 43rd International Conference on Software Engineering)
Full Text Available

