Text Query based Traffic Video Event Retrieval with Global-Local Fusion Embedding

Thang-Long Nguyen-Ho; Minh-Khoi Pham; Tien-Phat Nguyen; Minh N. Do; Tam V. Nguyen; Minh-Triet Tran

Retrieving event videos from textual descriptions is a promising research topic in the fast-growing field of data. Since traffic data grows every day, there is an essential need for an intelligent traffic system that speeds up traffic event search. We propose a multi-module system that outputs accurate results. Our solution uses a rule-based approach that considers neighboring entities related to the mentioned object, so that an event can be represented by the relationship among multiple objects. We also add a modified version of last year's Alibaba model with an explainable architecture. As the traffic data is vehicle-centric, we apply language and image modules to analyze the input data and obtain both the global properties of the context and the internal attributes of the vehicle. We introduce a one-on-one dual training strategy for each representation vector to optimize the interior features for the query. Finally, a refinement module gathers the previous results to enhance the final retrieval result. We benchmarked our approach on the AI City Challenge 2022 data and obtained competitive results with an MRR of 0.3611. We ranked in the top 4 on 50% of the test set and in the top 5 on the full set.
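The score quoted in the abstract is a mean reciprocal rank (MRR): for each text query, the retrieval system ranks the candidate vehicle tracks, and the query contributes 1/rank of its single ground-truth track. The sketch below illustrates the metric only. It assumes exactly one correct track per query and string track IDs; the function names `reciprocal_rank` and `mean_reciprocal_rank` and the example data are hypothetical and are not the challenge's official evaluation script.

```python
from typing import List


def reciprocal_rank(ranking: List[str], target: str) -> float:
    """Return 1/rank of `target` in `ranking` (1-based), or 0.0 if it is missing."""
    if target in ranking:
        return 1.0 / (ranking.index(target) + 1)
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], targets: List[str]) -> float:
    """Average the reciprocal rank of the ground-truth track over all queries."""
    if len(rankings) != len(targets):
        raise ValueError("expected exactly one ground-truth track per query")
    if not targets:
        return 0.0
    return sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets)) / len(targets)


# Three hypothetical queries whose correct tracks appear at ranks 1, 2 and 4.
rankings = [["t1", "t2", "t3"], ["t5", "t4", "t6"], ["t8", "t9", "t10", "t7"]]
targets = ["t1", "t4", "t7"]
print(mean_reciprocal_rank(rankings, targets))  # equals (1 + 1/2 + 1/4) / 3
```

Because MRR averages 1/rank, its reciprocal is the harmonic mean of the ranks of the correct tracks (when every correct track appears in the ranking). An MRR of 0.3611 therefore corresponds to a harmonic-mean rank of about 2.77.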

Award ID(s): 2025234

PAR ID: 10330105

Date Published: 2022-06-01

Journal Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops

ISSN: 2160-7516

Sponsoring Org: National Science Foundation

Full Text: Accepted Manuscript 1.0 (conference paper; DOI not currently available)