Self-Enhancing Video Data Management System for Compositional Events with Large Language Models

Zhang, Enhao; Sullivan, Nicole; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena

doi:10.1145/3725352

This content will become publicly available on June 17, 2026

Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF generates multiple candidate UDFs and uses active learning to efficiently select the best one. With the self-enhancing capability, VOCAL-UDF significantly improves query performance across three video datasets.
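The idea of generating several candidate UDFs and choosing among them with a small labeling budget can be illustrated with a minimal sketch. This is not the VOCAL-UDF implementation: the box representation, the three hand-written candidates (standing in for LLM-generated Python functions), the `select_udf` helper, and the disagreement-based sampling rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical object representation: a bounding box with corner coordinates.
Box = Dict[str, float]
Pair = Tuple[Box, Box]
Candidate = Callable[[Box, Box], bool]


def box(x1: float, x2: float) -> Box:
    """Build a box; only the x-extent matters in this toy example."""
    return {"x1": x1, "y1": 0.0, "x2": x2, "y2": 1.0}


# Three plausible readings of the ambiguous concept "a is left of b".
def left_of_by_center(a: Box, b: Box) -> bool:
    return (a["x1"] + a["x2"]) / 2 < (b["x1"] + b["x2"]) / 2


def left_of_strict(a: Box, b: Box) -> bool:
    return a["x2"] < b["x1"]


def left_of_by_edge(a: Box, b: Box) -> bool:
    return a["x1"] < b["x1"]


def disagreement(candidates: Sequence[Candidate], sample: Pair) -> int:
    """Size of the minority vote; 0 means all candidates agree."""
    votes = [c(*sample) for c in candidates]
    yes = sum(votes)
    return min(yes, len(votes) - yes)


def select_udf(
    candidates: Sequence[Candidate],
    pool: Sequence[Pair],
    oracle: Candidate,
    budget: int,
) -> Candidate:
    """Label the most contested samples first, then keep the candidate
    that agrees most often with the labels (assumed selection rule)."""
    correct: List[int] = [0] * len(candidates)
    unlabeled = list(pool)
    for _ in range(min(budget, len(unlabeled))):
        sample = max(unlabeled, key=lambda s: disagreement(candidates, s))
        if disagreement(candidates, sample) == 0:
            break  # remaining samples cannot separate the candidates
        unlabeled.remove(sample)
        label = oracle(*sample)  # stands in for asking the user
        for i, c in enumerate(candidates):
            correct[i] += int(c(*sample) == label)
    best = max(range(len(candidates)), key=lambda i: correct[i])
    return candidates[best]


example_pool: List[Pair] = [
    (box(0, 1), box(2, 3)),   # every candidate says True
    (box(0, 4), box(3, 5)),   # overlapping boxes: strict reading differs
    (box(0, 10), box(1, 3)),  # wide box: edge reading differs
    (box(5, 6), box(0, 1)),   # every candidate says False
]
chosen = select_udf(
    [left_of_by_center, left_of_strict, left_of_by_edge],
    example_pool,
    oracle=left_of_by_center,  # pretend the user means "by center"
    budget=3,
)
print(chosen.__name__)
```

Samples on which all candidates agree carry no information about which reading the user intends, so the sketch spends its labeling budget only on contested samples.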

Award ID(s): 2211133

PAR ID: 10611949

Author(s) / Creator(s): Zhang, Enhao; Sullivan, Nicole; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena

Publisher / Repository: ACM DL

Date Published: 2025-06-17

Journal Name: Proceedings of the ACM on Management of Data

Volume: 3

Issue: 3

ISSN: 2836-6573

Page Range / eLocation ID: 1 to 29

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1145/3725352