Towards Egocentric 3D Hand Pose Estimation in Unseen Domains

Wiktor Mucha, Michael Wray, Martin Kampel

Research output: Contribution to conference, Conference Paper

Abstract

We present V-HPOT, a novel approach for improving 3D hand pose estimation from egocentric images in diverse, unseen domains. State-of-the-art methods demonstrate strong performance when trained and tested within the same domain. However, they struggle to generalise to new environments because of limited training data and because their depth perception overfits to specific camera intrinsics. Our method addresses this by estimating keypoint z-coordinates in a virtual camera space, normalised by focal length and image size, enabling camera-agnostic depth prediction. We further leverage this invariance to camera intrinsics to propose a self-supervised test-time optimisation strategy that refines the model's depth perception during inference. This is achieved by applying a 3D consistency loss between predicted and in-space scale-transformed hand poses, allowing the model to adapt to target domain characteristics without requiring ground truth annotations. V-HPOT significantly improves 3D hand pose estimation performance in cross-domain scenarios, achieving a 71% reduction in mean pose error on the H2O dataset and a 41% reduction on the AssemblyHands dataset. Compared to state-of-the-art methods, V-HPOT outperforms all single-stage approaches across all datasets and competes closely with two-stage methods, despite needing approximately 3.5 to 14 times less data.
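
The abstract does not give the exact normalisation, so the following is only a minimal sketch of one plausible way to express depth in a virtual camera space. It assumes a pinhole camera, a focal length normalised by image size, and a hypothetical virtual camera with a fixed normalised focal length; the function names, the parameter `virtual_focal_norm`, and the formula itself are illustrative assumptions, not the authors' implementation.

```python
def to_virtual_depth(z_mm, focal_px, image_size_px, virtual_focal_norm=1.0):
    """Map metric depth to depth in a hypothetical virtual camera.

    Assumption: under a pinhole model, apparent size in normalised image
    units is proportional to (focal_px / image_size_px) / z. The virtual
    depth is the depth at which the virtual camera would see the same
    apparent size.
    """
    focal_norm = focal_px / image_size_px  # focal length in image-size units
    return z_mm * virtual_focal_norm / focal_norm


def from_virtual_depth(z_virtual, focal_px, image_size_px, virtual_focal_norm=1.0):
    """Invert to_virtual_depth using the real camera's intrinsics."""
    focal_norm = focal_px / image_size_px
    return z_virtual * focal_norm / virtual_focal_norm


# Two hypothetical cameras in which a hand has the same apparent size:
# camera A sees it at 400 mm, camera B (twice the focal length) at 800 mm.
z_a = to_virtual_depth(400.0, focal_px=600.0, image_size_px=1200.0)
z_b = to_virtual_depth(800.0, focal_px=1200.0, image_size_px=1200.0)
```

Under these assumptions both cameras map to the same virtual depth, so a network regressing that quantity would not need to learn either camera's intrinsics; metric depth is recovered afterwards with `from_virtual_depth` and the known intrinsics of the target camera.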
Original language: English
Number of pages: 15
Publication status: Accepted/In press - 11 Nov 2025
Event: IEEE/CVF Winter Conference on Applications of Computer Vision - Tucson, United States
Duration: 6 Mar 2026 to 10 Mar 2026

Conference

Conference: IEEE/CVF Winter Conference on Applications of Computer Vision
Country/Territory: United States
City: Tucson
Period: 6/03/26 to 10/03/26

Research Groups and Themes

  • Intelligent Systems Laboratory (MaVi)
