Detecting inappropriate material used to train AI image generation models

Jeremy Budd*, Gandhar Joshi, Lucia Noelle, Matthew Pickering, Siddharth Setlur, Irina Starikova, Charles Morehead, Ningyuan Xu, Kairui Zhang, Maxim Zyskin

*Corresponding author for this work

Research output: Other contribution

Abstract

Generative AI models, such as diffusion models, have emerged as the state of the art for generating novel images from a text prompt. Open-source models can furthermore be fine-tuned to produce images similar to a given dataset. However, bad actors may fine-tune a model on illegal images so that it produces inappropriate and harmful content. We investigate methods for detecting whether such images have been used to fine-tune a given diffusion model. This task raises two key challenges: (1) the audit must not itself generate images from the suspect model, and (2) any prompts that yield inappropriate images may be obfuscated. We propose a multi-layered framework to overcome these challenges, combining embedding analysis, trajectory classification, parameter inspection, and neural network encoding, and suggest that controlled experiments be conducted to test this strategy in future work.
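The abstract names four detection layers. As a rough illustration of the parameter-inspection layer only, a minimal sketch follows, assuming the auditor holds both the known base checkpoint and the suspect fine-tuned checkpoint; the toy models, the layerwise relative-shift statistic, and the 0.01 flagging threshold are illustrative assumptions, not the authors' method.

# Hypothetical sketch of a parameter-inspection check: compare a suspect
# fine-tuned checkpoint to its known base model, layer by layer, without
# generating any images. Toy models and the threshold are illustrative only.
import torch
import torch.nn as nn

def layerwise_weight_shift(base: nn.Module, suspect: nn.Module) -> dict:
    """Relative L2 distance between corresponding parameters of two models."""
    base_params = dict(base.named_parameters())
    shifts = {}
    for name, p_suspect in suspect.named_parameters():
        p_base = base_params[name]
        shifts[name] = ((p_suspect - p_base).norm() / (p_base.norm() + 1e-12)).item()
    return shifts

# Toy stand-ins for a base diffusion model and a suspect fine-tuned copy.
torch.manual_seed(0)
base = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
suspect = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
suspect.load_state_dict(base.state_dict())
with torch.no_grad():  # simulate a fine-tune that altered only the last layer
    suspect[2].weight.add_(0.05 * torch.randn_like(suspect[2].weight))

for name, shift in layerwise_weight_shift(base, suspect).items():
    flag = "  <-- unusually large shift" if shift > 0.01 else ""
    print(f"{name}: {shift:.4f}{flag}")

In this sketch, a fine-tune that altered only the final layer is flagged by its large relative weight shift while untouched layers report zero; any real audit along these lines would need calibration against shifts produced by benign fine-tunes of the same base model.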
Original language: English
Publication status: Published - 5 Feb 2026

Bibliographical note

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.

Keywords

  • Diffusion models
  • Forensics
  • Image-free auditing
  • Radon–Nikodym theory
