It’s Just Another Day: Unique Video Captioning by Discriminative Prompting

Toby J Perrett, Tengda Han, Dima Damen, Andrew Zisserman

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

Long videos contain many repeating actions, events and shots.
These repetitions are frequently given identical captions, which makes it
difficult to retrieve the exact desired clip using a text search. In this
paper, we formulate the problem of unique captioning: given multiple
clips with the same caption, we generate a new caption for each clip
that uniquely identifies it. We propose Captioning by Discriminative
Prompting (CDP), which predicts a property that can separate identically
captioned clips and uses it to generate unique captions. We introduce
two benchmarks for unique captioning, based on egocentric footage
and timeloop movies, where repeating actions are common. We demonstrate
that captions generated by CDP improve text-to-video R@1 by
15% for egocentric videos and 10% for timeloop movies.
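
As a rough illustration of why identical captions hurt the text-to-video R@1 metric mentioned above, the following minimal sketch computes R@1 from a caption-to-clip similarity matrix. It is not the paper's evaluation code: the function name `recall_at_1`, the similarity scores, and the tie-breaking rule (first highest-scoring clip wins) are all assumptions made for illustration.

```python
def recall_at_1(similarity):
    """Fraction of captions whose top-ranked clip is the correct one.

    similarity[i][j] is a hypothetical score between caption i and clip j;
    caption i is assumed to describe clip i. Ties go to the lowest clip
    index (an assumption, not taken from the paper).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # max() returns the first index among equal scores
        best = max(range(len(row)), key=lambda j: row[j])
        if best == i:
            hits += 1
    return hits / len(similarity)


# Three clips sharing one caption: every caption scores all clips equally,
# so only one of the three queries retrieves its own clip.
shared_caption_scores = [
    [0.9, 0.9, 0.9],
    [0.9, 0.9, 0.9],
    [0.9, 0.9, 0.9],
]

# Illustrative scores for unique captions: each caption now scores its own
# clip highest, so every query retrieves the right clip.
unique_caption_scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.1, 0.2, 0.7],
]

r1_shared = recall_at_1(shared_caption_scores)
r1_unique = recall_at_1(unique_caption_scores)
```

With the made-up scores above, the shared-caption case retrieves one clip in three correctly, while the unique-caption case retrieves all three.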
Original language: English
Title of host publication: Computer Vision – ACCV 2024
Subtitle of host publication: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings
Publisher: Springer, Singapore
Publication status: Accepted/In press - 11 Oct 2024
Event: Asian Conference on Computer Vision - Hanoi, Viet Nam
Duration: 8 Dec 2024 → 12 Dec 2024
https://accv2024.org

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15480
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Asian Conference on Computer Vision
Abbreviated title: ACCV
Country/Territory: Viet Nam
City: Hanoi
Period: 8/12/24 → 12/12/24
