a unique viewpoint on people’s interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale
dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object
interactions. Our videos depict nonscripted daily activities, as recording started every time a participant entered their kitchen.
Recording took place in 4 countries by participants belonging to 10 different nationalities, resulting in highly diverse kitchen habits and
cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action
segments and 454.2K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after
recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and
anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. We introduce new baselines
that highlight the multimodal nature of the dataset and the importance of explicit temporal modelling to discriminate fine-grained actions
(e.g. ‘closing a tap’ from ‘opening’ it up).
|Journal|IEEE Transactions on Pattern Analysis and Machine Intelligence|
|Publication status|Accepted/In press - 25 Apr 2020|
1/02/20 → 31/01/25
4/07/16 → 3/05/18
Aldamen, D. (Creator), Moltisanti, D. (Creator), Kazakos, V. (Creator), Doughty, H. R. (Creator), Munro, J. P. N. (Creator), Price, W. (Creator), Wray, M. (Creator), Perrett, T. J. (Creator), Fidler, S. (Contributor), Farinella, G. M. (Contributor) & Furnari, A. (Contributor), University of Bristol, 3 Apr 2018