Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Our world offers a never-ending stream of visual stimuli, yet today’s vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.more » « less
-
null (Ed.)Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.more » « less
-
null (Ed.)Training competitive deep video models is an order of magnitude slower than training their counterpart image models. Slow training causes long research cycles, which hinders progress in video understanding research. Following standard practice for training image models, video model training has used a fixed mini-batch shape: a specific number of clips, frames, and spatial size. However, what is the optimal shape? High resolution models perform well, but train slowly. Low resolution models train faster, but are less accurate. Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule. The different shapes arise from resampling the training data on multiple sampling grids. Training is accelerated by scaling up the mini-batch size and learning rate when shrinking the other dimensions. We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU). As an illustrative example, the proposed multigrid method trains a ResNet-50 SlowFast network 4.5 x faster (wall-clock time, same hardware) while also improving accuracy (+ 0.8% absolute) on Kinetics-400 compared to baseline training. Code is available online.more » « less
-
Abstract Many measurements at the LHC require efficient identification of heavy-flavour jets, i.e. jets originating from bottom (b) or charm (c) quarks. An overview of the algorithms used to identify c jets is described and a novel method to calibrate them is presented. This new method adjusts the entire distributions of the outputs obtained when the algorithms are applied to jets of different flavours. It is based on an iterative approach exploiting three distinct control regions that are enriched with either b jets, c jets, or light-flavour and gluon jets. Results are presented in the form of correction factors evaluated using proton-proton collision data with an integrated luminosity of 41.5 fb -1 at √s = 13 TeV, collected by the CMS experiment in 2017. The closure of the method is tested by applying the measured correction factors on simulated data sets and checking the agreement between the adjusted simulation and collision data. Furthermore, a validation is performed by testing the method on pseudodata, which emulate various mismodelling conditions. The calibrated results enable the use of the full distributions of heavy-flavour identification algorithm outputs, e.g. as inputs to machine-learning models. Thus, they are expected to increase the sensitivity of future physics analyses.more » « less
An official website of the United States government

Full Text Available