
Search for: All records

Creators/Authors contains: "Nguyen, Tam V."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be freely available during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available March 1, 2023
  2. Free, publicly-accessible full text available April 1, 2023
  3. Counting multi-vehicle motions via traffic cameras in urban areas is crucial for smart cities. Although several frameworks have been proposed for this task, no prior work focuses on highly common, dense, and size-variant vehicles such as motorcycles. In this paper, we propose a novel framework for vehicle motion counting with adaptive, label-independent tracking and counting modules that processes 12 frames per second. Our framework adapts its hyperparameters for multi-vehicle tracking, works reliably in complex traffic conditions, and is largely invariant to camera perspective. We achieve competitive results in terms of root-mean-square error and runtime performance. (A minimal tracking-and-counting sketch appears after this list.)
    Free, publicly-accessible full text available April 1, 2023
  4. Free, publicly-accessible full text available January 1, 2023
  5. Face recognition with wearable items has been a challenging task in computer vision and involves the problem of identifying humans wearing a face mask. Masked face analysis via multi-task learning can effectively improve performance in many areas of face analysis. In this paper, we propose a unified framework for predicting the age, gender, and emotion of people wearing face masks. We first construct FGNET-MASK, a masked face dataset for the problem. Then, we propose a multi-task deep learning model to tackle the problem. In particular, the model takes the data as input and shares weights across tasks to yield predictions of age, expression, and gender for the masked face. Through extensive experiments, the proposed framework is found to perform better than existing methods. (A shared-backbone, multi-head sketch appears after this list.)
    Free, publicly-accessible full text available October 1, 2022
  6. Free, publicly-accessible full text available October 1, 2022
  7. In this paper, we introduce a practical system for interactive video object mask annotation, which can support multiple back-end methods. To demonstrate the generality of our system, we introduce a novel approach for video object annotation. Our proposed system takes scribbles at a chosen key-frame from the end user via a user-friendly interface and produces masks of the corresponding objects at that key-frame via the Control-Point-based Scribbles-to-Mask (CPSM) module. The object masks at the key-frame are then propagated to the other frames and refined through the Multi-Referenced Guided Segmentation (MRGS) module. Finally, the user can correct wrong segmentations at some frames, and the corrected masks are continuously propagated to the other frames of the video via MRGS to produce object masks for all video frames. (A sketch of this annotate-propagate-correct loop appears after this list.)
  8. Traffic event retrieval is one of the important tasks for intelligent traffic system management. To find accurate candidate events in traffic videos corresponding to a specific text query, it is necessary to understand the query's attributes, represent the visual and motion attributes of vehicles in videos, and measure the similarity between them. Thus, we propose a promising method for vehicle event retrieval from a natural-language specification. We utilize both the appearance and motion attributes of a vehicle and adapt the COOT model to evaluate the semantic relationship between a query and a video track. Experiments on the test dataset of Track 5 in the AI City Challenge 2021 show that our method ranks among the top 6 with a score of 0.1560. (A simple text-to-track ranking sketch appears after this list.)
  9. In recent years, the need to exploit digitized document data has been increasing. In this paper, we address the problem of parsing digitized Vietnamese paper documents. Digitized Vietnamese documents are mainly scanned images with diverse layouts and special characters, which introduces many challenges. To this end, we first collect UIT-DODV, a novel Vietnamese document image dataset that includes scientific papers in Vietnamese from different scientific conferences. We compile images converted from PDF as well as images captured with a smartphone and with a physical scanner, which pose many new challenges. We further leverage a state-of-the-art object detector along with a fused loss function to efficiently parse Vietnamese paper documents. Extensive experiments conducted on the UIT-DODV dataset provide a comprehensive evaluation and insightful analysis. (A hedged sketch of a fused detection loss appears after this list.)
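
For item 3, the abstract describes adaptive, label-independent tracking and counting modules but gives no code. The sketch below is a generic tracking-by-detection counter, assuming per-frame bounding-box detections are already available; the greedy IoU matcher, the `Track` structure, the fixed thresholds, and the horizontal counting line are illustrative assumptions, not the paper's modules or its hyperparameter adaptation.

```python
"""Illustrative sketch of tracking-based vehicle motion counting (item 3)."""
from dataclasses import dataclass, field


@dataclass
class Track:
    track_id: int
    box: tuple                                     # (x1, y1, x2, y2) in pixels
    centers: list = field(default_factory=list)    # history of box centers


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def crossed(p_prev, p_curr, y_line):
    """True if a track center moved across a horizontal counting line."""
    return (p_prev[1] - y_line) * (p_curr[1] - y_line) < 0


def count_motions(frames, y_line=360, iou_thresh=0.3):
    """Greedy IoU tracking plus line-crossing counting over a detection stream."""
    tracks, next_id, count = [], 0, 0
    for detections in frames:                      # detections: list of boxes
        unmatched = list(detections)
        for trk in tracks:
            best = max(unmatched, key=lambda d: iou(trk.box, d), default=None)
            if best is not None and iou(trk.box, best) >= iou_thresh:
                unmatched.remove(best)
                prev_c = trk.centers[-1]
                curr_c = ((best[0] + best[2]) / 2, (best[1] + best[3]) / 2)
                if crossed(prev_c, curr_c, y_line):
                    count += 1                     # one vehicle motion counted
                trk.box = best
                trk.centers.append(curr_c)
        for box in unmatched:                      # start new tracks
            c = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
            tracks.append(Track(next_id, box, [c]))
            next_id += 1
    return count
```

Real deployments would also age out stale tracks and use camera-specific counting regions; those details are omitted to keep the sketch short.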
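
For item 5, the multi-task idea (one shared feature extractor whose weights serve age, gender, and expression heads) can be illustrated with a small PyTorch model. The ResNet-18 backbone, head sizes, class counts, and equal loss weights below are assumptions for illustration; the paper's actual architecture and the FGNET-MASK label format are not reproduced here.

```python
"""Illustrative sketch of a shared-backbone multi-task model (item 5)."""
import torch
import torch.nn as nn
from torchvision import models


class MaskedFaceMultiTask(nn.Module):
    def __init__(self, n_ages=8, n_emotions=7):
        super().__init__()
        backbone = models.resnet18(weights=None)          # shared feature extractor
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.age_head = nn.Linear(feat_dim, n_ages)       # age-group logits
        self.gender_head = nn.Linear(feat_dim, 2)         # gender logits
        self.emotion_head = nn.Linear(feat_dim, n_emotions)  # expression logits

    def forward(self, x):
        feats = self.backbone(x)                          # weights shared by all tasks
        return self.age_head(feats), self.gender_head(feats), self.emotion_head(feats)


def multitask_loss(outputs, targets):
    """Sum of per-task cross-entropy losses (equal weights assumed)."""
    ce = nn.CrossEntropyLoss()
    age_logits, gender_logits, emo_logits = outputs
    age_t, gender_t, emo_t = targets
    return ce(age_logits, age_t) + ce(gender_logits, gender_t) + ce(emo_logits, emo_t)


if __name__ == "__main__":
    model = MaskedFaceMultiTask()
    images = torch.randn(4, 3, 224, 224)                  # dummy masked-face batch
    outputs = model(images)
    targets = (torch.zeros(4, dtype=torch.long),) * 3     # dummy labels
    print(multitask_loss(outputs, targets).item())
```

Summing the per-task cross-entropies is the simplest way to train all heads jointly over the shared backbone; weighted or uncertainty-based combinations are common alternatives.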
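
For item 7, the annotate-propagate-correct workflow can be summarized as a control-flow sketch. The `cpsm` and `mrgs` callables below stand in for the paper's CPSM and MRGS modules, and their signatures (as well as `get_scribbles` and `needs_fix`) are hypothetical; only the loop structure described in the abstract is shown.

```python
"""Illustrative sketch of the annotate-propagate-correct loop (item 7)."""


def annotate_video(frames, key_idx, get_scribbles, cpsm, mrgs, needs_fix):
    """Interactive mask annotation over a list of video frames."""
    # 1. The user scribbles on a chosen key-frame; CPSM turns scribbles into masks.
    scribbles = get_scribbles(frames[key_idx])
    masks = {key_idx: cpsm(frames[key_idx], scribbles)}

    # 2. Key-frame masks are propagated to every other frame and refined via MRGS.
    for i, frame in enumerate(frames):
        if i != key_idx:
            masks[i] = mrgs(frame, reference_masks=masks)

    # 3. The user corrects wrong frames; corrections are re-propagated via MRGS.
    for i in list(masks):
        if needs_fix(frames[i], masks[i]):
            masks[i] = cpsm(frames[i], get_scribbles(frames[i]))
            for j, frame in enumerate(frames):
                if j != i:
                    masks[j] = mrgs(frame, reference_masks=masks)

    return masks
```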
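
For item 8, retrieval ultimately reduces to scoring query/track pairs in a joint space. The paper adapts the COOT model for this; the sketch below substitutes a plain cosine-similarity ranking over precomputed embeddings, with `encode_query` and `encode_track` as hypothetical stand-ins for learned encoders that would combine appearance and motion attributes.

```python
"""Illustrative sketch of text-to-track retrieval by embedding similarity (item 8)."""
import numpy as np


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def rank_tracks(query_text, tracks, encode_query, encode_track, top_k=5):
    """Return the top-k vehicle tracks most similar to a natural-language query."""
    q = encode_query(query_text)                      # joint-space query embedding
    scored = [(track_id, cosine(q, encode_track(t)))  # track embedding (appearance + motion)
              for track_id, t in tracks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```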
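
For item 9, the abstract mentions a fused loss function without spelling out its terms. The sketch below shows one plausible fusion, a weighted sum of a focal classification term and a smooth-L1 box-regression term, purely as an assumption to make the idea concrete; it is not the paper's actual loss.

```python
"""Illustrative sketch of a 'fused' detection loss (item 9)."""
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss over per-anchor class logits."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()


def fused_detection_loss(cls_logits, cls_targets, box_preds, box_targets,
                         cls_weight=1.0, box_weight=1.0):
    """Weighted sum of a classification term and a box-regression term."""
    cls_term = focal_loss(cls_logits, cls_targets)
    box_term = F.smooth_l1_loss(box_preds, box_targets)
    return cls_weight * cls_term + box_weight * box_term


if __name__ == "__main__":
    logits = torch.randn(16, 1)                       # dummy per-anchor logits
    labels = torch.randint(0, 2, (16, 1)).float()     # dummy binary labels
    boxes_p, boxes_t = torch.rand(16, 4), torch.rand(16, 4)
    print(fused_detection_loss(logits, labels, boxes_p, boxes_t).item())
```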