Use Your Head: Improving Long-Tail Video Recognition

Research output: Contribution to conference › Conference Paper › peer-review



This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction (LMR), which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at:
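The core idea described in the abstract — reconstructing a few-shot ("tail") sample as a weighted combination of head-class samples and mixing the labels accordingly — can be sketched as follows. This is a minimal illustration of that idea, not the authors' implementation: the function name, the cosine-similarity weighting, and the blending factor `alpha` are all illustrative assumptions.

```python
import numpy as np

def lmr_sketch(tail_feat, head_feats, head_labels, tail_label,
               num_classes, alpha=0.5):
    """Illustrative sketch of the reconstruction-plus-label-mixing idea.

    A tail feature is rebuilt as a similarity-weighted combination of
    head-class features, then blended with the original; its label is
    mixed with the head labels using the same weights.
    """
    # Cosine similarity of the tail sample to each head sample.
    sims = head_feats @ tail_feat / (
        np.linalg.norm(head_feats, axis=1) * np.linalg.norm(tail_feat) + 1e-8)
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax weights, sum to 1

    # Reconstruct the tail feature from head features, then blend.
    recon = weights @ head_feats
    mixed_feat = alpha * tail_feat + (1.0 - alpha) * recon

    # Label mixing: blend the one-hot tail label with weighted head labels.
    tail_onehot = np.eye(num_classes)[tail_label]
    head_onehots = np.eye(num_classes)[head_labels]
    mixed_label = alpha * tail_onehot + (1.0 - alpha) * (weights @ head_onehots)
    return mixed_feat, mixed_label
```

Because the softmax weights and the one-hot rows each sum to one, the mixed label remains a valid probability distribution, which is what allows it to be used directly as a soft training target.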
Original language: English
Number of pages: 12
Publication status: Published - 23 Jun 2023
Event: IEEE/CVF Computer Vision and Pattern Recognition - Vancouver, Canada
Duration: 18 Jun 2023 - 23 Jun 2023


Conference: IEEE/CVF Computer Vision and Pattern Recognition
Abbreviated title: CVPR


