Abstract
We address the task of generating temporally consistent and physically plausible images of actions and object state transformations. Given an input image and a text prompt describing the targeted transformation, our generated images preserve the environment and transform objects in the initial image. Our contributions are threefold. First, we leverage a large body of instructional videos and automatically mine a dataset of triplets of consecutive frames corresponding to initial object states, actions, and resulting object transformations. Second, equipped with this data, we develop and train a conditioned diffusion model dubbed GenHowTo. Third, we evaluate GenHowTo on a variety of objects and actions and show superior performance compared to existing methods. In particular, we introduce a quantitative evaluation where GenHowTo achieves 88% and 74% on seen and unseen interaction categories, respectively, outperforming prior work by a large margin.
Original language | English |
---|---|
Title of host publication | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Publication status | Accepted/In press - 17 Jun 2024 |
Event | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, United States. Duration: 17 Jun 2024 → 21 Jun 2024. https://cvpr.thecvf.com |
Publication series
Name | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
---|---|
Publisher | IEEE |
ISSN (Print) | 1063-6919 |
ISSN (Electronic) | 2575-7075 |
Conference
Conference | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 17/06/24 → 21/06/24 |
Internet address | https://cvpr.thecvf.com |
Fingerprint
Dive into the research topics of 'GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos'. Together they form a unique fingerprint.
- 8030 EPSRC via Oxford EP/T028572/1 Visual AI
  Damen, D. (Principal Investigator)
  1/12/20 → 30/11/25
  Project: Research, Parent
- UMPIRE: United Model for the Perception of Interactions for visual Recognition
  Damen, D. (Principal Investigator)
  1/02/20 → 31/01/25
  Project: Research