Localizing video moments based on the movement patterns of objects is an important task in video analytics. Existing video analytics systems offer two types of querying interfaces, based on natural language and SQL, respectively. However, both types of interfaces have major limitations: SQL-based systems require high query specification time, whereas natural language-based systems require large training datasets to achieve satisfactory retrieval accuracy. To address these limitations, we present SketchQL, a video database management system (VDBMS) for offline, exploratory video moment retrieval that is both easy to use and generalizes well across multiple video moment datasets. To improve ease of use, SketchQL features a visual query interface that enables users to sketch complex visual queries through intuitive drag-and-drop actions. To improve generalizability, SketchQL operates on object-tracking primitives that are reliably extracted across various datasets using pre-trained models. We present a learned similarity search algorithm for retrieving video moments that closely match the user's visual query based on object trajectories. SketchQL trains the model on a diverse dataset generated with a novel simulator, which enhances its accuracy across a wide array of datasets and queries. We evaluate SketchQL on four real-world datasets with nine queries, demonstrating its superior usability and retrieval accuracy over state-of-the-art VDBMSs.
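SketchQL's learned similarity model and simulator are not reproduced here; purely as a hedged illustration of the retrieval step, a classical dynamic-time-warping baseline that ranks sliding windows of extracted object tracks against a sketched trajectory might look like the following (all function and variable names are hypothetical, and the paper's learned model would replace the DTW stand-in):

```python
# Hypothetical sketch: matching a user-drawn trajectory against object
# tracks with dynamic time warping (DTW). SketchQL itself uses a learned
# similarity model trained on simulator-generated data; DTW is only a
# simple stand-in to illustrate trajectory-based moment retrieval.
import numpy as np

def normalize(traj: np.ndarray) -> np.ndarray:
    """Scale a (T, 2) trajectory of (x, y) centroids into the unit box."""
    lo, hi = traj.min(axis=0), traj.max(axis=0)
    return (traj - lo) / np.maximum(hi - lo, 1e-9)

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) DTW over Euclidean point distances."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def retrieve(sketch: np.ndarray, tracks: dict, window: int, k: int = 5):
    """Rank sliding windows of each object track (keyed by int id) by
    similarity to the sketched trajectory; return the top-k moments."""
    sketch = normalize(sketch)
    scored = []
    for obj_id, track in tracks.items():          # track: (T, 2) array
        for start in range(0, max(1, len(track) - window)):
            segment = normalize(track[start:start + window])
            scored.append((dtw_distance(sketch, segment), obj_id, start))
    return sorted(scored)[:k]
```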
Visual Road: A Video Data Management Benchmark
Recently, video database management systems (VDBMSs) have re-emerged as an active area of research and development. To accelerate innovation in this area, we present Visual Road, a benchmark that evaluates the performance of these systems. Visual Road comes with a data generator and a suite of queries over cameras positioned within a simulated metropolitan environment. Visual Road's video data is automatically generated with a high degree of realism, and annotated using a modern simulation and visualization engine. This allows for VDBMS performance evaluation while scaling up the size of the input data. Visual Road is designed to evaluate a broad variety of VDBMSs: real-time systems, systems for longitudinal analytical queries, systems processing traditional videos, and systems designed for 360 videos. We use the benchmark to evaluate three recent VDBMSs both in capabilities and performance.
- Award ID(s): 1703051
- PAR ID: 10257096
- Date Published: 2019
- Journal Name: SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
- Page Range / eLocation ID: 972 to 987
- Sponsoring Org: National Science Foundation
More Like this
Current video database management systems (VDBMSs) fail to support the growing number of video datasets in diverse domains because these systems assume clean data and rely on pretrained models to detect known objects or actions. Existing systems also lack good support for compositional queries that seek events consisting of multiple objects with complex spatial and temporal relationships. In this paper, we propose VOCAL, a vision of a VDBMS that supports efficient data cleaning, exploration and organization, and compositional queries, even when no pretrained model exists to extract semantic content. These techniques utilize optimizations to minimize the manual effort required of users.
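VOCAL's query language is not given in the abstract; as a hedged sketch of what a compositional query over extracted objects evaluates, consider a predicate that finds moments where a car stays left of a person for a minimum number of consecutive frames (the names and the one-object-per-label simplification are ours, not the paper's):

```python
# Hypothetical compositional predicate: "car left of person for at least
# min_len consecutive frames". VOCAL's actual query language and optimizer
# are not shown in the abstract; this only illustrates the kind of
# spatial/temporal composition such queries express.
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def left_of(a: Box, b: Box) -> bool:
    return a[2] < b[0]  # a's right edge precedes b's left edge

def find_moments(frames: List[Dict[str, Box]], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges where 'car' is left of 'person'."""
    moments, run_start = [], None
    for t, objs in enumerate(frames):
        ok = ("car" in objs and "person" in objs
              and left_of(objs["car"], objs["person"]))
        if ok and run_start is None:
            run_start = t                      # a qualifying run begins
        elif not ok and run_start is not None:
            if t - run_start >= min_len:
                moments.append((run_start, t - 1))
            run_start = None
    if run_start is not None and len(frames) - run_start >= min_len:
        moments.append((run_start, len(frames) - 1))
    return moments
```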
State-of-the-art video database management systems (VDBMSs) often use lightweight proxy models to accelerate object retrieval and aggregate queries. The key assumption underlying these systems is that the proxy model is an order of magnitude faster than the heavyweight oracle model. However, recent advances in computer vision have invalidated this assumption. Inference time of recently proposed oracle models is on par with or even lower than the proxy models used in state-of-the-art (SoTA) VDBMSs. This paper presents Seiden, a VDBMS that leverages this radical shift in the runtime gap between the oracle and proxy models. Instead of relying on a proxy model, Seiden directly applies the oracle model over a subset of frames to build a query-agnostic index, and samples additional frames to answer the query using an exploration-exploitation scheme during query processing. By leveraging the temporal continuity of the video and the output of the oracle model on the sampled frames, Seiden delivers faster query processing and better query accuracy than SoTA VDBMSs. Our empirical evaluation shows that Seiden is on average 6.6× faster than SoTA VDBMSs across diverse queries and datasets.
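The abstract does not spell out Seiden's index or sampling policy; a rough sketch of the stated idea, applying the oracle to uniformly spaced anchor frames and then spending the remaining budget bisecting segments whose endpoint labels disagree (exploiting temporal continuity), might look like this, where `oracle` is a hypothetical stand-in for the heavyweight model:

```python
# Hedged sketch of oracle-only sampling in the spirit of Seiden: label
# uniformly spaced "anchor" frames with the oracle (exploration), then
# spend the remaining budget inside segments whose endpoint labels
# disagree, since temporal continuity makes agreeing endpoints likely
# uniform in between (exploitation). `oracle(i)` returns a frame label.

def build_index(num_frames: int, oracle, budget: int):
    step = max(1, num_frames // budget)
    anchors = list(range(0, num_frames, step))
    labels = {i: oracle(i) for i in anchors}            # exploration phase

    extra = budget - len(anchors)
    segments = [(anchors[i], anchors[i + 1]) for i in range(len(anchors) - 1)]
    while extra > 0 and segments:
        # prefer segments with disagreeing endpoint labels, widest first
        segments.sort(key=lambda s: (labels[s[0]] == labels[s[1]], -(s[1] - s[0])))
        lo, hi = segments.pop(0)
        if hi - lo <= 1:
            break                                       # nothing left to bisect
        mid = (lo + hi) // 2
        labels[mid] = oracle(mid)                       # exploitation sample
        segments += [(lo, mid), (mid, hi)]
        extra -= 1
    return labels
```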
Modern video data management systems store videos as a single encoded file, which significantly limits possible storage level optimizations. We design, implement, and evaluate TASM, a new tile-based storage manager for video data. TASM uses a feature in modern video codecs called "tiles" that enables spatial random access into encoded videos. TASM physically tunes stored videos by optimizing their tile layouts given the video content and a query workload. Additionally, TASM dynamically tunes that layout in response to changes in the query workload or if the query workload and video contents are incrementally discovered. Finally, TASM also produces efficient initial tile layouts for newly ingested videos. We demonstrate that TASM can speed up subframe selection queries by an average of over 50% and up to 94%. TASM can also improve the throughput of the full scan phase of object detection queries by up to 2×.
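Why tiling helps is easy to see in code: a subframe query only needs to decode the tiles its region of interest overlaps. The sketch below computes that overlap for a uniform grid; TASM's actual contribution, choosing and re-tuning the layout from content and workload, is more involved and not shown here (grid shape and names are our assumptions):

```python
# Minimal sketch of tile-aware subframe selection: given a uniform tile
# grid over the frame, decode only the tiles overlapping the query
# rectangle. TASM additionally *chooses* the grid based on content and
# workload; this illustrates only why tiling prunes decoding work.
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def tiles_for_query(frame_w: int, frame_h: int, rows: int, cols: int,
                    roi: Rect) -> List[Tuple[int, int]]:
    tile_w, tile_h = frame_w / cols, frame_h / rows
    x1, y1, x2, y2 = roi
    c1, c2 = int(x1 // tile_w), min(cols - 1, int((x2 - 1) // tile_w))
    r1, r2 = int(y1 // tile_h), min(rows - 1, int((y2 - 1) // tile_h))
    return [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

# e.g., a 4x4 grid over 1920x1080: a small ROI touches 1 of 16 tiles,
# so roughly 15/16 of the decoding work can be skipped.
print(tiles_for_query(1920, 1080, 4, 4, (100, 100, 300, 200)))  # [(0, 0)]
```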
In this research, we take an innovative approach to the Video Corpus Visual Answer Localization (VCVAL) task using the MedVidQA dataset, extending it with causal inference for medical videos, a novel approach in this field. By leveraging the state-of-the-art GPT-4 and Gemini Pro 1.5 models, the system localizes temporal segments in videos and analyzes cause-effect relationships from subtitles to enhance medical decision-making. This paper extends the work from the MedVidQA challenge by introducing causality extraction to enhance the interpretability of localized video content. Subtitles are segmented to identify causal units such as cause, effect, condition, action, and signal. Prompts guide GPT-4 and Gemini Pro 1.5 in detecting and quantifying causal structures while analyzing explicit and implicit relationships, including those spanning multiple subtitle fragments. Our results reveal that both models perform better when handling queries individually than in batch processing, for both temporal localization and causality extraction, and that performance varies considerably across videos. Despite these challenges, the successful integration of temporal localization with causal inference can significantly improve the scalability and overall performance of medical video analysis. Our work demonstrates how AI systems can uncover valuable insights from medical videos, driving progress in medical AI applications and Health Informatics.
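The paper's prompts are not included in the abstract; purely as a hypothetical illustration of segmenting subtitles and asking a model for causal units (cause, effect, condition, action, signal), one might build the request as follows, leaving the actual GPT-4 or Gemini Pro 1.5 call to whichever client is used:

```python
# Hypothetical prompt construction for causal-unit extraction from
# subtitles; the paper's actual prompts and model parameters are not
# given in the abstract. Sending the prompt is left to the reader's
# GPT-4 / Gemini client of choice.
from typing import Dict, List

CAUSAL_UNITS = ["cause", "effect", "condition", "action", "signal"]

def build_prompt(subtitles: List[Dict]) -> str:
    """subtitles: [{"start": float, "end": float, "text": str}, ...]"""
    numbered = "\n".join(
        f'[{i}] ({s["start"]:.1f}-{s["end"]:.1f}s) {s["text"]}'
        for i, s in enumerate(subtitles)
    )
    return (
        "Label each causal unit you find in the medical-video subtitles "
        f"below as one of {CAUSAL_UNITS}. Report explicit and implicit "
        "cause-effect relationships, including ones that span multiple "
        "fragments, as JSON objects with fields: unit, fragment_ids, text.\n\n"
        + numbered
    )

prompt = build_prompt([
    {"start": 0.0, "end": 4.2, "text": "If the wound keeps bleeding,"},
    {"start": 4.2, "end": 8.0, "text": "apply firm pressure for ten minutes."},
])
print(prompt)  # send to GPT-4 / Gemini Pro 1.5 via your preferred client
```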