• Ahmad Dar Khalil (Contributor)
  • Dandan Shan (Contributor)
  • Bin Zhu (Contributor)
  • Jian Ma (Contributor)
  • Amlan Kar (Contributor)
  • Richard Higgins (Contributor)
  • Sanja Fidler (Contributor)
  • David Fouhey (Contributor)
  • Dima Damen (Creator)



We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which brings a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. The data is published under the Creative Commons Attribution-NonCommercial 4.0 International License.
Date made available: 11 Aug 2022
Publisher: University of Bristol
Date of data production: Aug 2022