Abstract
Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved to extract informative motions that are executed at different time scales. To address this challenge, we present a novel convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions. Our proposed multi-temporal convolution (MTConv) blocks utilize two branches that focus on brief and prolonged spatio-temporal patterns, respectively. The extracted time-varying features are aligned in a third branch, with respect to global motion patterns, through recurrent cells. The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture, which yields a substantial reduction in computational costs. Extensive experiments on the Kinetics, Moments in Time and HACS action recognition benchmark datasets demonstrate competitive performance of MTConvs compared to the state-of-the-art, with a significantly lower computational footprint. Our code is available at: https://git.io/JfuPi.
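To make the idea of brief versus prolonged temporal patterns concrete, the following is a minimal one-dimensional sketch, not the authors' MTConv block. It applies two averaging kernels of different temporal extent to the same signal. The function names, the kernel lengths (3 and 7), and the use of plain averaging kernels are all assumptions made for illustration; the recurrent alignment branch described in the abstract is not modelled here.

```python
from typing import List, Tuple


def temporal_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid-mode 1D correlation of `signal` with `kernel` along time."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def two_scale_features(
    signal: List[float], short_len: int = 3, long_len: int = 7
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: filter one signal at two temporal extents.

    The short kernel reacts strongly to brief events, while the long
    kernel spreads the same event over a wider temporal window.
    """
    short_kernel = [1.0 / short_len] * short_len  # assumed averaging kernel
    long_kernel = [1.0 / long_len] * long_len     # assumed averaging kernel
    return temporal_conv(signal, short_kernel), temporal_conv(signal, long_kernel)


# A brief "motion" spike in an otherwise static signal.
spike = [0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
short_out, long_out = two_scale_features(spike)
print(max(short_out), max(long_out))
```

In this toy setting the short kernel gives a larger peak response to the brief spike than the long kernel does, which is the intuition behind processing the same input with more than one temporal extent.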
| Original language | English |
|---|---|
| Title of host publication | IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| ISBN (Electronic) | 9781665439008 |
| ISBN (Print) | 9781665445979 |
| DOIs | |
| Publication status | Published - 22 Jul 2021 |
| Event | 2021 International Joint Conference on Neural Networks, IJCNN 2021 - Virtual, Shenzhen, China. Duration: 18 Jul 2021 → 22 Jul 2021 |
Publication series
| Name | Proceedings of the International Joint Conference on Neural Networks (IJCNN) |
|---|---|
| ISSN (Print) | 2161-4393 |
| ISSN (Electronic) | 2161-4407 |
Conference
| Conference | 2021 International Joint Conference on Neural Networks, IJCNN 2021 |
|---|---|
| Country/Territory | China |
| City | Virtual, Shenzhen |
| Period | 18/07/21 → 22/07/21 |
Bibliographical note
Funding Information: This publication is supported by the Netherlands Organization for Scientific Research (NWO) with a TOP-C2 grant for Automatic recognition of bodily interactions (ARBITER).
Publisher Copyright:
© 2021 IEEE.
Fingerprint
Dive into the research topics of 'Multi-Temporal Convolutions for Human Action Recognition in Videos'. Together they form a unique fingerprint.