NSF PAR Search | NSF Public Access Repository

Distilling Vision-Language Models on Millions of Videos

https://doi.org/10.1109/CVPR52733.2024.01245

Zhao, Yue; Zhao, Long; Zhou, Xingyi; Wu, Jialin; Chu, Chun-Te; Miao, Hui; Schroff, Florian; Adam, Hartwig; Liu, Ting; Gong, Boqing; et al. (June 2024, CVPR)

Full Text Available
ProvDB: Lifecycle Management of Collaborative Analysis Workflows

https://doi.org/10.1145/3077257.3077267

Miao, Hui; Chavan, Amit; Deshpande, Amol (May 2017, 2nd Workshop on Human-In-the-Loop Data Analytics)

As data-driven methods are becoming pervasive in a wide variety of disciplines, there is an urgent need to develop scalable and sustainable tools to simplify the process of data science, to make it easier to keep track of the analyses being performed and datasets being generated, and to enable introspection of the workflows. In this paper, we describe our vision of a unified provenance and metadata management system to support lifecycle management of complex collaborative data science workflows. We argue that a large amount of information about the analysis processes and data artifacts can, and should be, captured in a semi-passive manner; and we show that querying and analyzing this information can not only simplify bookkeeping and debugging tasks for data analysts but can also enable a rich new set of capabilities like identifying flaws in the data science process itself. It can also significantly reduce the time spent in fixing post-deployment problems through automated analysis and monitoring. We have implemented an initial prototype of our system, called ProvDB, on top of git (a version control system) and Neo4j (a graph database), and we describe its key features and capabilities.
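The kind of provenance query the abstract alludes to can be illustrated with a minimal in-memory sketch. It does not use git, Neo4j, or any ProvDB interface; the class `ProvenanceGraph`, its methods, and the file names are hypothetical and chosen only for illustration. The sketch records which artifacts each output was derived from and answers a lineage question by walking the graph upstream.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical toy provenance store: artifacts are nodes, derivations are edges."""

    def __init__(self):
        # Maps each artifact to the set of artifacts it was derived from.
        self._inputs = defaultdict(set)
        # Maps each artifact to the activity (e.g. a script) that produced it.
        self._activity = {}

    def record(self, output, inputs, activity):
        """Record that `activity` produced `output` from `inputs`."""
        self._inputs[output].update(inputs)
        self._activity[output] = activity

    def produced_by(self, artifact):
        """Return the activity that produced the artifact, or None if unknown."""
        return self._activity.get(artifact)

    def lineage(self, artifact):
        """Return every upstream artifact the given artifact depends on."""
        seen, stack = set(), [artifact]
        while stack:
            for parent in self._inputs.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = ProvenanceGraph()
graph.record("clean.csv", ["raw.csv"], "clean.py")
graph.record("model.pkl", ["clean.csv", "params.json"], "train.py")
print(sorted(graph.lineage("model.pkl")))
```

A graph database would express the same traversal as a path query; the explicit stack here only makes the upstream walk visible.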
Full Text Available
ModelHub: Deep Learning Lifecycle Management

https://doi.org/10.1109/ICDE.2017.192

Miao, Hui; Li, Ang; Davis, Larry S.; Deshpande, Amol (April 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE))

Deep learning has improved the state-of-the-art results in many domains, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of lifecycle management are largely ignored. The deep learning modeling lifecycle contains a rich set of artifacts and frequently conducted tasks; dealing with them is cumbersome and left to the users. To address these issues in a comprehensive manner, we demonstrate ModelHub, which includes a novel model versioning system (dlv), a domain-specific language for searching through model space (DQL), and a hosted service (ModelHub).
Full Text Available
Towards Unified Data and Lifecycle Management for Deep Learning

https://doi.org/10.1109/ICDE.2017.112

Miao, Hui; Li, Ang; Davis, Larry S.; Deshpande, Amol (April 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE))

Deep learning has improved state-of-the-art results in many important fields, and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle generates a rich set of data artifacts, e.g., learned parameters and training logs, and it comprises several frequently conducted tasks, e.g., understanding model behaviors and trying out new models. Dealing with such artifacts and tasks is cumbersome and largely left to the users. This paper describes our vision and implementation of a data and lifecycle management system for deep learning. First, we generalize model exploration and model enumeration queries from tasks commonly conducted by deep learning modelers, and propose a high-level domain-specific language (DSL), inspired by SQL, to raise the abstraction level and thereby accelerate the modeling process. Second, to manage the variety of data artifacts, especially the large amount of checkpointed float parameters, we design a novel model versioning system (dlv), and a read-optimized parameter archival storage system (PAS) that minimizes storage footprint and accelerates query workloads with minimal loss of accuracy. PAS archives versioned models using deltas in a multi-resolution fashion by separately storing the less significant bits, and features a novel progressive query (inference) evaluation algorithm. Third, we develop efficient algorithms for archiving versioned models using deltas under co-retrieval constraints. We conduct extensive experiments over several real datasets from the computer vision domain to show the efficiency of the proposed techniques.
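The idea of archiving deltas while storing the less significant bits separately can be made concrete with a small sketch. This is not the PAS implementation: the 16/16 bit split, the XOR delta, and the function names (`split_float32`, `join_float32`, `archive`, `restore`) are assumptions made for illustration. Each 32-bit float is split into a high half (sign, exponent, top 7 mantissa bits) and a low half, and a new model version is stored as two delta streams against a base version. Reading only the high stream yields a coarse approximation of the parameters; adding the low stream recovers them exactly.

```python
import struct


def split_float32(value):
    """Split a float32 into its 16 high-order and 16 low-order bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits >> 16, bits & 0xFFFF


def join_float32(high, low=0):
    """Rebuild a float32 from its halves; absent low bits are treated as zero."""
    (value,) = struct.unpack(">f", struct.pack(">I", (high << 16) | low))
    return value


def archive(base, new):
    """Encode `new` as XOR deltas against `base`, one stream per bit segment."""
    high_deltas, low_deltas = [], []
    for b, n in zip(base, new):
        base_high, base_low = split_float32(b)
        new_high, new_low = split_float32(n)
        high_deltas.append(base_high ^ new_high)
        low_deltas.append(base_low ^ new_low)
    return high_deltas, low_deltas


def restore(base, high_deltas, low_deltas=None):
    """Decode a version; pass only `high_deltas` for a low-resolution read."""
    restored = []
    for i, b in enumerate(base):
        base_high, base_low = split_float32(b)
        high = base_high ^ high_deltas[i]
        low = (base_low ^ low_deltas[i]) if low_deltas is not None else 0
        restored.append(join_float32(high, low))
    return restored
```

When consecutive checkpoints are close, the high-bit deltas are mostly zero and compress well, while the noisier low bits can be kept in a separate stream and fetched only when full precision is needed. That separation is what makes a progressive, coarse-to-fine read possible in this sketch.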
Full Text Available