NSF PAR Search | NSF Public Access Repository


Towards Long-Form Video Understanding

https://doi.org/10.1109/CVPR46437.2021.00192

Wu, Chao-Yuan; Krähenbühl, Philipp (June 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Our world offers a never-ending stream of visual stimuli, yet today’s vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.
Full Text Available
A Multigrid Method for Efficiently Training Video Models

https://doi.org/10.1109/CVPR42600.2020.00023

Wu, Chao-Yuan; et al (June 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Training competitive deep video models is an order of magnitude slower than training their counterpart image models. Slow training causes long research cycles, which hinders progress in video understanding research. Following standard practice for training image models, video model training has used a fixed mini-batch shape: a specific number of clips, frames, and spatial size. However, what is the optimal shape? High-resolution models perform well, but train slowly. Low-resolution models train faster, but are less accurate. Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule. The different shapes arise from resampling the training data on multiple sampling grids. Training is accelerated by scaling up the mini-batch size and learning rate when shrinking the other dimensions. We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU). As an illustrative example, the proposed multigrid method trains a ResNet-50 SlowFast network 4.5x faster (wall-clock time, same hardware) while also improving accuracy (+0.8% absolute) on Kinetics-400 compared to baseline training. Code is available online.
Full Text Available
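
The multigrid abstract above describes enlarging the mini-batch and learning rate whenever the temporal or spatial dimensions shrink. The following is a minimal sketch of that bookkeeping only, not the authors' released code. The names `BatchShape` and `rescale`, the square-crop assumption, the constant-cost rule (clips x frames x size^2 held roughly fixed), and the linear learning-rate scaling are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchShape:
    clips: int   # mini-batch size (number of clips)
    frames: int  # temporal length of each clip
    size: int    # spatial side length (square crops assumed)


def rescale(base: BatchShape, base_lr: float,
            frame_scale: float, size_scale: float):
    """Shrink frames/size, then grow clips so the per-step cost
    clips * frames * size**2 stays roughly constant (assumed rule).
    The learning rate is scaled linearly with the batch size (assumed)."""
    frames = max(1, round(base.frames * frame_scale))
    size = max(1, round(base.size * size_scale))
    base_cost = base.clips * base.frames * base.size ** 2
    clips = max(1, base_cost // (frames * size ** 2))
    lr = base_lr * clips / base.clips
    return BatchShape(clips, frames, size), lr


if __name__ == "__main__":
    base = BatchShape(clips=8, frames=16, size=224)
    # Hypothetical schedule: (frame_scale, size_scale) per training stage.
    for scales in [(0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]:
        shape, lr = rescale(base, 0.1, *scales)
        print(scales, shape, lr)
```

Under these assumptions, halving both the clip length and the crop side lets the batch grow eightfold at the same per-step cost, which is the kind of trade the abstract refers to; the actual schedule and scaling rules should be taken from the paper and its code.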
Memory Optimization for Deep Networks

Shah, A; Wu, Chao-Yuan; Mohan, Jayashree; Chidambaram, Vijay; Krähenbühl, P (April 2021, International Conference on Learning Representations (ICLR))
Full Text Available
A General Approach to Stereospecific Cross-Coupling Reactions of Nitrogen-Containing Stereocenters

https://doi.org/10.1016/j.chempr.2020.02.002

Ma, Xinghua; Zhao, Haoran; Binayeva, Meruyert; Ralph, Glenn; Diane, Mohamed; Zhao, Shibin; Wang, Chao-Yuan; Biscoe, Mark R. (March 2020, Chem)
Full Text Available
Long-Term Feature Banks for Detailed Video Understanding

https://doi.org/10.1109/cvpr.2019.00037

Wu, Chao-Yuan; Feichtenhofer, Christoph; Fan, Haoqi; He, Kaiming; Krähenbühl, Philipp; Girshick, Ross (June 2019, CVPR)

Full Text Available
A new calibration method for charm jet identification validated with proton-proton collision events at √s = 13 TeV

https://doi.org/10.1088/1748-0221/17/03/P03014

Tumasyan, Armen; Adam, Wolfgang; Andrejkovic, Janik Walter; Bergauer, Thomas; Chatterjee, Suman; Dragicevic, Marko; Escalante Del Valle, Alberto; Fruehwirth, Rudolf; Jeitler, Manfred; Krammer, Natascha; et al (March 2022, Journal of Instrumentation)

Many measurements at the LHC require efficient identification of heavy-flavour jets, i.e. jets originating from bottom (b) or charm (c) quarks. An overview of the algorithms used to identify c jets is described and a novel method to calibrate them is presented. This new method adjusts the entire distributions of the outputs obtained when the algorithms are applied to jets of different flavours. It is based on an iterative approach exploiting three distinct control regions that are enriched with either b jets, c jets, or light-flavour and gluon jets. Results are presented in the form of correction factors evaluated using proton-proton collision data with an integrated luminosity of 41.5 fb⁻¹ at √s = 13 TeV, collected by the CMS experiment in 2017. The closure of the method is tested by applying the measured correction factors on simulated data sets and checking the agreement between the adjusted simulation and collision data. Furthermore, a validation is performed by testing the method on pseudodata, which emulate various mismodelling conditions. The calibrated results enable the use of the full distributions of heavy-flavour identification algorithm outputs, e.g. as inputs to machine-learning models. Thus, they are expected to increase the sensitivity of future physics analyses.
Full Text Available
