Abstract
The self-supervised pretraining paradigm has achieved great success in learning 3D action representations for skeleton-based action recognition using contrastive learning. However, learning effective representations for skeleton-based temporal action localization remains challenging and underexplored. Unlike video-level action recognition, detecting action boundaries requires temporally sensitive features that capture subtle differences between adjacent frames where labels change. To this end, we formulate a snippet discrimination pretext task for self-supervised pretraining, which densely projects skeleton sequences into non-overlapping segments and promotes features that distinguish them across videos via contrastive learning. Additionally, we build on strong backbones of skeleton-based action recognition models by fusing intermediate features with a U-shaped module to enhance feature resolution for frame-level localization. Our approach consistently improves existing skeleton-based contrastive learning methods for action localization on BABEL across diverse subsets and evaluation protocols. We also achieve state-of-the-art transfer learning performance on PKU-MMD with pretraining on NTU RGB+D and BABEL.
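
As a rough illustration of the snippet discrimination idea described in the abstract, the following minimal sketch splits skeleton sequences into non-overlapping snippets and scores two noisy views of them with an InfoNCE-style contrastive loss. It is not the authors' implementation: the function names, the mean-pooling stand-in encoder, the Gaussian-noise augmentation, the snippet length of 8 frames, and the temperature of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np

def split_into_snippets(seq, snippet_len):
    """Split a (T, D) sequence into non-overlapping (N, snippet_len, D) snippets.

    Trailing frames that do not fill a whole snippet are dropped (an assumption).
    """
    n = seq.shape[0] // snippet_len
    return seq[: n * snippet_len].reshape(n, snippet_len, seq.shape[1])

def embed_snippets(snippets):
    """Hypothetical stand-in encoder: temporal mean pooling, then L2 normalisation."""
    feats = snippets.mean(axis=1)
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)

def info_nce(query, key, temperature=0.1):
    """InfoNCE over snippets: query[i] is positive with key[i]; other keys are negatives."""
    logits = query @ key.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
# Two synthetic "videos": 64 frames, 25 joints x 3 coordinates flattened to 75 values.
seq_a = rng.normal(size=(64, 75))
seq_b = rng.normal(size=(64, 75))

# Snippets from both videos share one pool, so negatives come from within and across videos.
snippets = np.concatenate([split_into_snippets(s, 8) for s in (seq_a, seq_b)])

# Two views of every snippet; small Gaussian noise stands in for real skeleton augmentations.
view_1 = embed_snippets(snippets + 0.01 * rng.normal(size=snippets.shape))
view_2 = embed_snippets(snippets + 0.01 * rng.normal(size=snippets.shape))

loss = info_nce(view_1, view_2)
```

In a real pipeline the mean-pooling encoder would be replaced by a skeleton backbone producing per-snippet features, and the loss would be minimised by gradient descent; the sketch only shows how snippet-level positives and negatives could be arranged.
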
| Original language | English |
|---|---|
| Title of host publication | Pattern Recognition: 28th International Conference, ICPR 2026, Lyon, France, August 17–21, 2026, Proceedings, Part I. |
| Publisher | Springer |
| DOIs | |
| Publication status | Accepted/In press - 31 Mar 2026 |
| Event | 28th International Conference on Pattern Recognition, Lyon, France. Duration: 17 Aug 2026 → 22 Aug 2026. https://icpr2026.org/ |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 28th International Conference on Pattern Recognition |
|---|---|
| Abbreviated title | ICPR 2026 |
| Country/Territory | France |
| City | Lyon |
| Period | 17/08/26 → 22/08/26 |
| Internet address | |
Fingerprint
Dive into the research topics of 'Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization'. Together they form a unique fingerprint.