On Semantic Similarity in Video Retrieval

Michael Wray, Hazel R Doughty*, Dima Damen

*Corresponding author for this work

Research output: Contribution to conference › Conference Paper › peer-review



Current video retrieval efforts all base their evaluation on an instance-based assumption: that only a single caption is relevant to a query video, and vice versa. We demonstrate that this assumption often produces performance comparisons that are not indicative of models' retrieval capabilities. We propose a move to semantic similarity video retrieval, where (i) multiple videos/captions can be deemed equally relevant, and their relative ranking does not affect a method's reported performance, and (ii) retrieved videos/captions are ranked by their similarity to a query. We propose several proxies to estimate semantic similarities in large-scale retrieval datasets without additional annotations. Our analysis is performed on three commonly used video retrieval datasets (MSR-VTT, YouCook2 and EPIC-KITCHENS).
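The evaluation shift the abstract describes, where equally relevant items may be ranked in any order without penalty and more-relevant items should rank above less-relevant ones, is naturally captured by a rank-aware metric such as nDCG. Below is a minimal sketch (not the paper's implementation; the relevance values are illustrative stand-ins for the proposed semantic-similarity proxies) showing that two rankings which only swap equally relevant captions receive identical scores:

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance values."""
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(positions + 1)))

def ndcg(model_scores, proxy_relevance):
    """nDCG for one query: rank items by the model's scores and compare
    the gain of that ranking to the ideal (relevance-sorted) ranking."""
    order = np.argsort(-model_scores)        # model's ranking, best first
    ideal = np.sort(proxy_relevance)[::-1]   # best achievable ordering
    return dcg(proxy_relevance[order]) / dcg(ideal)

# Toy query with two equally relevant captions (proxy relevance 0.9 each).
rel = np.array([0.9, 0.9, 0.2, 0.0])

# Two model score vectors that differ only in how they order the two
# equally relevant captions score identically under nDCG.
a = ndcg(np.array([4.0, 3.0, 2.0, 1.0]), rel)
b = ndcg(np.array([3.0, 4.0, 2.0, 1.0]), rel)
# a == b == 1.0: swapping equally relevant items does not change the score
```

Under an instance-based metric such as recall@1 with a single "correct" caption, one of these two rankings would be penalised; a similarity-aware metric treats them the same.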
Original language: English
Number of pages: 15
Publication status: Accepted/In press - 6 Mar 2021
Event: Computer Vision and Pattern Recognition 2021 - Online
Duration: 19 Jun 2021 – 25 Jun 2021


Conference: Computer Vision and Pattern Recognition 2021
Abbreviated title: CVPR


