XL

5 records found

Efficiency in Deep Learning

Image and Video Deep Model Efficiency

Deep learning is the core algorithmic tool for automatically processing large amounts of data. Deep learning models are defined as a stack of functions (called layers) with millions of parameters, that are updated during training by fitting them to data. Deep learning models have ...

Objects do not disappear

Video object detection by single-frame object location anticipation

Objects in videos are typically characterized by continuous smooth motion. We exploit continuous smooth motion in three ways. 1) Improved accuracy by using object motion as an additional source of supervision, which we obtain by anticipating object locations from a static keyfram ...

Video BagNet

Short temporal receptive fields increase robustness in long-term action recognition

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model ...

No frame left behind

Full Video Action Recognition

Not all video frames are equally informative for recognizing an action. It is computationally infeasible to train deep networks on all video frames when actions develop over hundreds of frames. A common heuristic is uniformly sampling a small number of video frames and using thes ...
Cross domain image matching between image collections from different source and target domains is challenging in times of deep learning due to i) limited variation of image conditions in a training set, ii) lack of paired-image labels during training, iii) the existing of outlier ...