Abstract
Most recent view-invariant action recognition and performance assessment approaches rely on a large amount of annotated 3D skeleton data to extract view-invariant features. However, acquiring 3D skeleton data can be cumbersome, if not impractical, in in-the-wild scenarios. To overcome this problem, we present a novel unsupervised approach that learns to extract view-invariant 3D human pose representations from a 2D image, without using 3D joint data.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames from different viewpoints, and their equivariant properties between augmented frames from the same viewpoint. We evaluate the learned view-invariant pose representations on two downstream tasks. In comparative experiments, we improve on the state-of-the-art unsupervised cross-view action classification accuracy on NTU RGB+D by a significant margin, on both RGB and depth images. We also demonstrate the effectiveness of transferring the representations learned on NTU RGB+D by obtaining the first-ever unsupervised cross-view and cross-subject rank correlation results on the multi-view human movement quality dataset QMAR, marginally improving on the state-of-the-art supervised results for this dataset. Finally, we carry out ablation studies to examine the contributions of the different components of our proposed network.
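As a rough illustration of the two training signals the abstract describes, the sketch below pairs a cross-view invariance term (embeddings of simultaneous frames from different viewpoints should agree) with an equivariance term (the embedding of an augmented frame should be a predictable transform of the original frame's embedding). This is a minimal sketch under our own assumptions, not the authors' implementation; all names (`view_invariance_loss`, `equivariance_loss`, the linear stand-in transform) are hypothetical.

```python
# Minimal sketch of the two self-supervised objectives (not the paper's code).
import torch
import torch.nn.functional as F

def view_invariance_loss(z_view_a, z_view_b):
    """Pull together embeddings of simultaneous frames from two viewpoints."""
    return F.mse_loss(z_view_a, z_view_b)

def equivariance_loss(z_orig, z_aug, transform):
    """Embeddings should transform predictably under augmentation:
    applying `transform` to the original embedding should match the
    embedding of the augmented frame."""
    return F.mse_loss(transform(z_orig), z_aug)

# Toy example with random stand-in embeddings (batch of 8, 128-dim).
torch.manual_seed(0)
z_a, z_b = torch.randn(8, 128), torch.randn(8, 128)   # two simultaneous views
z_o, z_g = torch.randn(8, 128), torch.randn(8, 128)   # original / augmented
rot = torch.nn.Linear(128, 128)  # hypothetical stand-in for the transform
loss = view_invariance_loss(z_a, z_b) + 0.5 * equivariance_loss(z_o, z_g, rot)
print(loss.item())
```

In practice the embeddings would come from a shared pose encoder applied to each frame, and the relative weight of the two terms would be a tuned hyperparameter; the 0.5 here is arbitrary.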
| Original language | English |
|---|---|
| Title of host publication | Proc. 32nd British Machine Vision Conference (BMVC) |
| Publisher | BMVA Press |
| Publication status | Published - 25 Nov 2021 |
| Event | The 32nd British Machine Vision Conference - Online |
| Duration | 22 Nov 2021 → 25 Nov 2021 |
| Conference number | 32 |
| Internet addresses | https://www.bmvc2021-virtualconference.com/, https://www.bmvc2021.com/ |
Conference
| Conference | The 32nd British Machine Vision Conference |
|---|---|
| Abbreviated title | BMVC 2021 |
| Period | 22/11/21 → 25/11/21 |
| Internet address | https://www.bmvc2021.com/ |
Student theses
- View-invariant human movement assessment
  Sardari, F. (Author), Mirmehdi, M. (Supervisor) & Paiement, A. T. M. (Supervisor), 21 Jun 2022
  Student thesis: Doctoral Thesis › Doctor of Philosophy (PhD)