V2V: Efficiently Synthesizing Video Results for Video Queries

Winecki, Dominik; Nandi, Arnab

doi:10.1109/ICDE60146.2024.00449

Querying video data has become increasingly popular and useful. Video queries can be complex, ranging from retrieval tasks (“find me the top videos that have … ”), to analytics (“how many videos contained object X per day?”), to excerpting tasks (“highlight and zoom into scenes with object X near object Y”), or combinations thereof. Results for video queries are still typically shown as either relational data or a primitive collection of clickable thumbnails on a web page. Presenting query results in this form creates an impedance mismatch with the video medium: such results are cumbersome to skim and differ from the source data in modality and information density. We describe V2V, a system to efficiently synthesize video results for video queries. V2V returns a fully edited video, allowing the user to consume results in the same manner as the source videos. A key challenge is that synthesizing video results from a collection of videos is computationally intensive, especially when results must be produced within interactive query response times. To address this, V2V features a grammar to express video transformations declaratively and a heuristic optimizer that improves the efficiency of V2V processing, similar to how databases optimize the execution of relational queries. Experiments show that our V2V optimizer enables video synthesis to run 3x faster.
