Abstract
Generative AI models, such as diffusion models, have emerged as state-of-the-art methods for generating novel images from a text prompt. Open-source models can furthermore be fine-tuned to produce images resembling a given dataset. However, bad actors may fine-tune a model on illegal images so that it produces inappropriate and harmful content. We investigate methods for detecting whether such images have been used to fine-tune a given diffusion model. This task raises two key challenges: (1) the audit should not itself require generating images from the suspect model, and (2) any prompts that yield inappropriate images may be obfuscated. We propose a multi-layered framework to overcome these challenges, combining embedding analysis, trajectory classification, parameter inspection, and neural network encoding, and we suggest that controlled experiments be conducted to test this strategy in future work.
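To make the idea concrete, the following is a minimal, hypothetical sketch of one signal such a framework might use; it is not the paper's method. It measures the per-image denoising error of a suspect fine-tuned Stable Diffusion model, which tends to be lower on images the model was fine-tuned on, and it never generates images from the suspect model. The `diffusers` library, the model ID, the prompt, and the timestep are illustrative assumptions.

```python
# Hypothetical sketch: denoising-error signal for auditing a suspect fine-tuned
# diffusion model without generating any images from it.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"  # placeholder: replace with the suspect model
).to(device)

@torch.no_grad()
def denoising_error(pixels: torch.Tensor, prompt: str, timestep: int = 500) -> float:
    """Noise-prediction MSE at a fixed timestep for one image batch scaled to [-1, 1]."""
    # Encode the candidate image into the suspect model's latent space.
    latents = pipe.vae.encode(pixels.to(device)).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    # Condition on a (possibly obfuscated) prompt via the text encoder.
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt").to(device)
    text_emb = pipe.text_encoder(tokens.input_ids)[0]

    # Add noise at timestep t and measure how well the UNet predicts it;
    # no images are generated from the suspect model at any point.
    noise = torch.randn_like(latents)
    t = torch.tensor([timestep], device=device)
    noisy = pipe.scheduler.add_noise(latents, noise, t)
    pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
    return torch.nn.functional.mse_loss(pred, noise).item()
```

Scores for a suspect image set could then be compared against a reference distribution computed on held-out images; markedly lower error is weak evidence that the images influenced fine-tuning, and would be combined with the other layers of the framework rather than used alone.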
| Original language | English |
|---|---|
| DOIs | |
| Publication status | Published - 5 Feb 2026 |
Bibliographical note
The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.

Keywords
- Diffusion models
- Forensics
- Image-free auditing
- Radon–Nikodym theory