Search for: All records

Creators/Authors contains: "Gong, B."

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

Liu, M.; Zhou, Y.; Qi, C.R.; Gong, B.; Su, H.; Anguelov, D. (October 2022, European Conference on Computer Vision)

Full Text Available
Adversarially Adaptive Normalization for Single Domain Generalization

Fan, X.; Wang, Q.; Ke, J.; Yang, F.; Gong, B.; Zhou, M. (June 2021, IEEE/CVF Conference on Computer Vision and Pattern Recognition)
Full Text Available
A Fast and Accurate One-Stage Approach to Visual Grounding

https://doi.org/10.1109/ICCV.2019.00478

Yang, Z.; Gong, B.; Wang, L.; Huang, W.; Yu, D.; Luo, J. (October 2019, International Conference on Computer Vision)

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. The performance of existing propose-and-rank two-stage methods is capped by the quality of the region candidates they propose in the first stage: if none of the candidates covers the ground-truth region, the second stage has no hope of ranking the right region at the top. To avoid this limitation, we propose a one-stage model that enables end-to-end joint optimization. The main idea is as straightforward as fusing a text query’s embedding into the YOLOv3 object detector, augmented by spatial features to account for spatial mentions in the query. Despite its simplicity, this one-stage approach shows great potential in both accuracy and speed for phrase localization and referring expression comprehension, according to our experiments. Given these results, along with careful investigations into some popular region proposals, we advocate a paradigm shift in visual grounding from the conventional two-stage methods to the one-stage framework.
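
As a rough illustration of the fusion idea described in this abstract, the sketch below concatenates a visual feature map, a spatially tiled text embedding, and normalized coordinate features along the channel axis. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `fuse_features`, the tensor shapes, and the use of exactly two coordinate channels are hypothetical, and the YOLOv3 detector itself is omitted.

```python
import numpy as np


def fuse_features(visual, text_emb):
    """Concatenate visual, tiled text, and coordinate features (hypothetical helper).

    visual:   array of shape (C_v, H, W), a visual feature map
    text_emb: array of shape (C_t,), an embedding of the text query
    returns:  array of shape (C_v + C_t + 2, H, W)
    """
    visual = np.asarray(visual, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    _, h, w = visual.shape

    # Copy the same text embedding to every spatial location.
    text_map = np.broadcast_to(text_emb[:, None, None], (text_emb.shape[0], h, w))

    # Normalized (x, y) coordinates in [0, 1], one channel each (an assumption),
    # so that later layers can relate features to positions such as "left" or "top".
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij")
    coords = np.stack([xs, ys], axis=0)

    return np.concatenate([visual, text_map, coords], axis=0)


# Example with made-up sizes: 8 visual channels on a 4x5 grid, 3 text channels.
fused = fuse_features(np.zeros((8, 4, 5)), np.arange(3))
print(fused.shape)
```

In a full model, the fused map would be passed to detection layers trained end to end; that part is outside the scope of this sketch.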
Full Text Available
How Local Is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization

https://doi.org/10.1007/978-3-030-01237-3_10

Li, Y.; Wang, L.; Yang, T.; Gong, B. (January 2018, Computer Vision - ECCV 2018)

The large volume of video content and high viewing frequency demand automatic video summarization algorithms, a key property of which is the ability to model diversity. For lengthy videos, such as hours-long egocentric videos, it is necessary to track the temporal structure of the video and enforce local diversity. Local diversity means that the shots selected from a short time span are diverse, while visually similar shots may co-exist in the summary if they appear far apart in the video. In this project, we propose a novel probabilistic model, built upon SeqDPP [1], to dynamically control the time span of the video segment over which local diversity is imposed. In particular, we enable SeqDPP to learn to infer automatically, from the input video, how local the local diversity should be. The resulting model is extremely involved to train by the hallmark maximum likelihood estimation (MLE), which further suffers from exposure bias and non-differentiable evaluation metrics. To tackle these problems, we instead devise a reinforcement learning algorithm for training the proposed model. Extensive experiments verify the advantages of our model and the new learning algorithm over MLE-based methods.
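
To make the notion of diversity in this abstract concrete, the sketch below computes subset probabilities under a plain L-ensemble determinantal point process, P(Y) = det(L_Y) / det(L + I). This is only the basic DPP building block, not SeqDPP or the proposed model with dynamic ground sets; the function name `dpp_prob` and the toy shot features are hypothetical.

```python
import numpy as np


def dpp_prob(L, subset):
    """Probability of selecting exactly `subset` under an L-ensemble DPP.

    L:      (n, n) symmetric positive semi-definite kernel matrix
    subset: iterable of item indices
    """
    L = np.asarray(L, dtype=float)
    idx = list(subset)
    # The determinant of the empty submatrix is defined as 1.
    numerator = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    normalizer = np.linalg.det(L + np.eye(L.shape[0]))
    return numerator / normalizer


# Toy unit-norm features for three shots: shots 0 and 1 are near-duplicates,
# shot 2 is orthogonal to shot 0 (made-up values for illustration).
features = np.array([
    [1.0, 0.0],
    [0.9, np.sqrt(1.0 - 0.9 ** 2)],
    [0.0, 1.0],
])
kernel = features @ features.T

# The diverse pair {0, 2} is more probable than the redundant pair {0, 1}.
print(dpp_prob(kernel, [0, 2]) > dpp_prob(kernel, [0, 1]))
```

Restricting such a kernel to the shots within a time window is one way to think about local diversity; how wide that window should be is what the abstract proposes to learn.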
Full Text Available