
Creators/Authors contains: "Do, Minh N."


  1. Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which aims to segment object instances in a query image given only a few annotated examples of novel categories. Conventional approaches address the task via prototype learning, a form of point estimation. However, this mechanism relies on prototypes (e.g., the mean over the K shots) for prediction, which makes performance unstable. To overcome this drawback of point estimation, we propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask given an object region and K-shot information. Inspired by augmentation approaches that perturb data with Gaussian noise to populate low-density regions of the data, we model the mask distribution with a diffusion probabilistic model. We also propose classifier-free guided mask sampling to integrate category information into the binary mask generation process. Without bells and whistles, our method consistently outperforms state-of-the-art methods on both base and novel classes of the COCO dataset while also being more stable than existing methods. The source code is available at: 
    Free, publicly-accessible full text available March 25, 2025
  2. Retrieving event videos from textual descriptions is a promising research topic in the fast-growing data field. Because traffic data increases every day, an intelligent traffic system is needed to speed up traffic event search. We propose a multi-module system that produces accurate results. Our solution uses rules to represent an event through the neighboring entities related to the mentioned object, so that an event can be described by the relationships among multiple objects. We also add a modified version of last year's Alibaba model with an explainable architecture. Because traffic data is vehicle-centric, we apply two language and image modules to analyze the input data and obtain both the global properties of the context and the internal attributes of the vehicle. We introduce a one-on-one dual training strategy for each representation vector to optimize the interior features for the query. Finally, a refinement module aggregates the previous results to improve the final retrieval result. We benchmarked our approach on the AI City Challenge 2022 data and obtained competitive results, with an MRR of 0.3611. We ranked in the top 4 on 50% of the test set and in the top 5 on the full set. 
  3. Rapid, simple, inexpensive, accurate, and sensitive point-of-care (POC) detection of viral pathogens in bodily fluids is a vital component of controlling the spread of infectious diseases. The predominant laboratory-based methods for sample processing and nucleic acid detection face limitations that prevent their wide adoption for POC applications in low-resource settings and self-testing scenarios. Here, we report the design and characterization of an integrated system for rapid sample-to-answer detection of a viral pathogen in a droplet of whole blood. The system consists of a 2-stage microfluidic cartridge for sample processing and nucleic acid amplification, and a clip-on detection instrument that interfaces with the image sensor of a smartphone. The cartridge is designed to release viral RNA from Zika virus in whole blood using chemical lysis, followed by mixing with the assay buffer for performing reverse-transcriptase loop-mediated isothermal amplification (RT-LAMP) reactions in six parallel microfluidic compartments. The battery-powered handheld detection instrument uniformly heats the compartments from below while an array of LEDs illuminates them from above, and the generation of fluorescent reporters in the compartments is kinetically monitored by collecting a series of smartphone images. We characterize the assay time and detection limits for detecting Zika RNA and gamma-ray-deactivated Zika virus spiked into buffer and whole blood, and compare the performance of the same assay when conducted in conventional PCR tubes. Our approach for kinetic monitoring of the fluorescence-generating process in the microfluidic compartments enables spatial analysis of early fluorescent "bloom" events for positive samples, an approach we call "Spatial LAMP" (S-LAMP). We show that S-LAMP image analysis reduces the time required to designate an assay as positive, compared to conventional analysis of the average fluorescent intensity of the entire compartment.
    S-LAMP enables the RT-LAMP process to be as short as 22 minutes, resulting in a total sample-to-answer time in the range of 17–32 minutes to distinguish positive from negative samples, while demonstrating a viral RNA detection limit as low as 2.70 × 10² copies/μl and a gamma-irradiated virus detection limit of 10³ virus particles in a single 12.5 μl droplet of blood. 
  4. Traffic event retrieval is an important task in intelligent traffic system management. To find accurate candidate events in traffic videos that correspond to a specific text query, a system must understand the query's attributes, represent the visual and motion attributes of vehicles in the videos, and measure the similarity between the two. We therefore propose a method for vehicle event retrieval from natural-language specifications. We use both the appearance and motion attributes of a vehicle and adapt the COOT model to evaluate the semantic relationship between a query and a video track. Experiments on the test dataset of Track 5 of the AI City Challenge 2021 show that our method ranks among the top 6, with a score of 0.1560. 
  5. Motivation

    Neural networks have been widely used to analyze high-throughput microscopy images. Their performance, however, can be significantly improved by encoding invariances known to hold for a particular task. For automated cell phenotyping from microscopy image data, rotation invariance is highly relevant. Here we consider two schemes for encoding rotation equivariance and invariance in a convolutional neural network for classifying microscopy images: the group-equivariant CNN (G-CNN) and a new architecture based on simple, efficient conic convolution. We additionally integrate the two-dimensional discrete Fourier transform (2D-DFT) as an effective means of encoding global rotational invariance. We call our new method the Conic Convolution and DFT Network (CFNet).

    Results

    We evaluated the efficacy of CFNet and G-CNN against a standard CNN on several image classification tasks, including simulated and real microscopy images of subcellular protein localization, and demonstrated improved performance. We believe CFNet has the potential to improve many high-throughput microscopy image analysis applications.

    Availability and implementation

    Source code of CFNet is available at:

    Supplementary information

    Supplementary data are available at Bioinformatics online.

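Item 1 contrasts MaskDiff with prototype-based point estimation. As a minimal sketch of that baseline idea (not code from the paper), the snippet below assumes each support example has already been encoded into a D-dimensional feature vector by some backbone; `class_prototype` and `cosine_scores` are hypothetical helper names. Every prediction hinges on a single mean vector, so replacing one of the K shots shifts every score.

```python
import numpy as np


def class_prototype(support_features: np.ndarray) -> np.ndarray:
    """Point estimate of a class: the mean of K support embeddings, (K, D) -> (D,)."""
    return support_features.mean(axis=0)


def cosine_scores(query_features: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """Cosine similarity between each query embedding (N, D) and the prototype (D,)."""
    q = query_features / np.linalg.norm(query_features, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return q @ p


# Hypothetical 3-shot support set and two query regions in a 2-D feature space.
support = np.array([[1.0, 0.0], [1.0, 0.2], [1.0, -0.2]])
queries = np.array([[2.0, 0.0], [0.0, 3.0]])
proto = class_prototype(support)       # [1.0, 0.0]
print(cosine_scores(queries, proto))   # [1. 0.]
```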
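Item 1 also describes modeling the mask distribution with a diffusion probabilistic model and sampling it with classifier-free guidance. The sketch below shows only the two generic ingredients in their common DDPM form, under assumptions the abstract does not state: a linear beta schedule, binary masks rescaled to [-1, 1], and the guidance combination (1 + w)·ε(x_t, c) − w·ε(x_t, ∅) from the classifier-free guidance literature. The denoising network is omitted, and `make_schedule`, `q_sample`, `guided_eps`, and `predict_x0` are hypothetical names; this is not MaskDiff's implementation.

```python
import numpy as np


def make_schedule(num_steps: int = 1000) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(mask01: np.ndarray, t: int, alpha_bar: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Forward diffusion: perturb a binary mask (rescaled to [-1, 1]) with Gaussian noise at step t."""
    x0 = 2.0 * mask01.astype(np.float64) - 1.0
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def guided_eps(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: push the category-conditioned noise estimate
    away from the unconditioned one by guidance weight w (w = 0 means no guidance)."""
    return (1.0 + w) * eps_cond - w * eps_uncond


def predict_x0(x_t: np.ndarray, t: int, alpha_bar: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Invert the forward process to estimate the clean mask from a noise estimate."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])


rng = np.random.default_rng(0)
alpha_bar = make_schedule(1000)
mask = (rng.random((8, 8)) > 0.5).astype(np.uint8)  # toy binary mask
noise = rng.standard_normal(mask.shape)
x_t = q_sample(mask, 500, alpha_bar, noise)
# With a perfect noise estimate, the clean mask (in [-1, 1]) is recovered.
x0_hat = predict_x0(x_t, 500, alpha_bar, noise)
print(np.allclose(x0_hat, 2.0 * mask - 1.0))  # True
```

During sampling, a denoiser would be queried twice per step, once with the category condition and once with a null condition, and the two estimates combined with `guided_eps`.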
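Item 2 reports a retrieval score of 0.3611. Assuming this is a mean reciprocal rank (MRR), which is how we understand the AI City Challenge natural-language retrieval track to be scored, the metric averages 1/rank of the correct track over all queries. A minimal sketch, with `mean_reciprocal_rank` as a hypothetical helper:

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         targets: Sequence[Hashable]) -> float:
    """Average of 1/rank of each query's correct item (1-based rank; contributes 0 if absent)."""
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    total = 0.0
    for ranking, target in zip(rankings, targets):
        ranking = list(ranking)
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(targets)


# Three hypothetical queries whose correct track is ranked 1st, 2nd, and 3rd.
rankings = [["t1", "t2", "t3"], ["t2", "t1", "t3"], ["t3", "t2", "t1"]]
targets = ["t1", "t1", "t1"]
print(mean_reciprocal_rank(rankings, targets))  # (1 + 1/2 + 1/3) / 3, about 0.611
```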
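Item 3 explains that analyzing where fluorescence first "blooms" inside a compartment flags a positive sample sooner than tracking the compartment's average intensity. The toy sketch below illustrates why on synthetic data; it is not the authors' S-LAMP algorithm. The tile size, the threshold, and the helper names `time_to_positive_mean` and `time_to_positive_tiles` are all hypothetical.

```python
from typing import Optional

import numpy as np


def first_crossing(signal: np.ndarray, threshold: float) -> Optional[int]:
    """Index of the first frame whose signal exceeds the threshold, or None."""
    hits = np.nonzero(signal > threshold)[0]
    return int(hits[0]) if hits.size else None


def time_to_positive_mean(frames: np.ndarray, threshold: float) -> Optional[int]:
    """Conventional analysis: rise of the whole-compartment mean over frame 0."""
    means = frames.mean(axis=(1, 2))
    return first_crossing(means - means[0], threshold)


def time_to_positive_tiles(frames: np.ndarray, threshold: float, tile: int) -> Optional[int]:
    """Spatial analysis: rise of the brightest tile (H and W must be divisible by tile)."""
    t, h, w = frames.shape
    tiles = frames.reshape(t, h // tile, tile, w // tile, tile).mean(axis=(2, 4))
    rise = (tiles - tiles[0]).max(axis=(1, 2))
    return first_crossing(rise, threshold)


# Synthetic 40-frame series: uniform background with one 4x4 "bloom" that
# starts brightening at frame 5 and gains 5 intensity units per frame.
frames = np.full((40, 32, 32), 10.0)
for k in range(40):
    frames[k, :4, :4] += 5.0 * max(0, k - 4)

print(time_to_positive_tiles(frames, threshold=2.0, tile=4))  # 5
print(time_to_positive_mean(frames, threshold=2.0))           # 30
```

The bloom covers only 16 of 1024 pixels, so its signal is diluted 64-fold in the compartment average, which is why the average crosses the same threshold much later.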
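Item 4 ranks video tracks by their semantic similarity to a text query using both appearance and motion attributes. The abstract does not say how the two are combined or how COOT produces its embeddings, so the sketch below simply assumes precomputed query and track embeddings for each attribute and fuses two cosine similarities with a hypothetical weight `alpha`; `rank_tracks` is likewise a hypothetical name.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between one vector a (D,) and each row of b (N, D)."""
    return (b @ a) / (np.linalg.norm(b, axis=1) * np.linalg.norm(a))


def rank_tracks(query_app: np.ndarray, query_mot: np.ndarray,
                track_app: np.ndarray, track_mot: np.ndarray, alpha: float = 0.5):
    """Return (track indices from most to least similar, fused scores).

    Fused score = alpha * appearance similarity + (1 - alpha) * motion similarity.
    """
    scores = alpha * cosine(query_app, track_app) + (1 - alpha) * cosine(query_mot, track_mot)
    return np.argsort(-scores, kind="stable"), scores


query_app, query_mot = np.array([1.0, 0.0]), np.array([0.0, 1.0])
track_app = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
track_mot = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
order, scores = rank_tracks(query_app, query_mot, track_app, track_mot)
print(order)  # [0 2 1]
```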
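Item 5 uses the DFT to encode global rotational invariance. The property being exploited is that the magnitude of a discrete Fourier transform does not change when its input is cyclically shifted. The toy sketch below shows this in a setting where a rotation becomes a cyclic shift: filtering with four 90° rotations of one kernel and global max pooling yields a 4-vector that shifts by one place when the input image is rotated by 90°, and a 1-D DFT magnitude along that orientation axis is then unchanged. This lifting step is a simplified stand-in in the spirit of a G-CNN, not CFNet's conic convolution or its 2D-DFT layer, and all function names are hypothetical.

```python
import numpy as np


def correlate_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2-D cross-correlation of a square image with a square kernel."""
    n, m = image.shape[0], kernel.shape[0]
    out = np.empty((n - m + 1, n - m + 1), dtype=np.result_type(image, kernel))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(image[i:i + m, j:j + m] * kernel)
    return out


def orientation_responses(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Global max response to the kernel rotated by 0, 90, 180, and 270 degrees.

    Rotating the input by 90 degrees cyclically shifts this 4-vector by one place.
    """
    return np.array([correlate_valid(image, np.rot90(kernel, k)).max() for k in range(4)])


def dft_magnitude(responses: np.ndarray) -> np.ndarray:
    """DFT magnitude along the orientation axis: unchanged by cyclic shifts."""
    return np.abs(np.fft.fft(responses))


rng = np.random.default_rng(0)
image = rng.integers(-5, 6, size=(9, 9))
kernel = rng.integers(-3, 4, size=(3, 3))
r = orientation_responses(image, kernel)
r_rot = orientation_responses(np.rot90(image), kernel)
print(np.array_equal(r_rot, np.roll(r, 1)))                 # True
print(np.allclose(dft_magnitude(r), dft_magnitude(r_rot)))  # True
```

Integer inputs keep the correlations exact, so the cyclic-shift check is an exact equality; the DFT magnitudes agree up to floating-point rounding.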