Core Concept
Images generated by Latent Diffusion Models (LDMs) can be effectively detected by identifying artifacts introduced by their autoencoders, eliminating the need for training on synthetic data and reducing computational costs.
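One way to read this idea is as a labeling scheme: if the detector only needs to recognize autoencoder artifacts, training pairs can be built from real images and their autoencoder reconstructions, with no generated images involved. The sketch below illustrates that labeling step only. It assumes this reading of the method, and `toy_reconstruct` and `build_training_pairs` are hypothetical names. The quantization is a toy stand-in for a real LDM autoencoder round trip, not the actual procedure.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[int]  # toy stand-in: a flat list of 8-bit pixel values


def toy_reconstruct(image: Sequence[int], step: int = 8) -> Image:
    """Hypothetical stand-in for an LDM autoencoder round trip (encode, then decode).

    Quantizing pixel values is NOT what a real autoencoder does; it only
    mimics the fact that the round trip is lossy and leaves a trace.
    """
    return [min(255, (p // step) * step + step // 2) for p in image]


def build_training_pairs(
    real_images: Sequence[Sequence[int]],
    reconstruct: Callable[[Sequence[int]], Image],
) -> List[Tuple[Image, int]]:
    """Label each real image 0 and its reconstruction 1.

    No image produced by a diffusion model is needed for this step.
    """
    pairs: List[Tuple[Image, int]] = []
    for img in real_images:
        pairs.append((list(img), 0))         # untouched real image
        pairs.append((reconstruct(img), 1))  # carries autoencoder-like artifacts
    return pairs


real = [[0, 17, 128, 255], [3, 64, 200, 90]]
pairs = build_training_pairs(real, toy_reconstruct)
labels = [label for _, label in pairs]
```

In a real pipeline, `reconstruct` would be replaced by the encode and decode pass of the LDM's autoencoder, and the resulting pairs would be fed to a binary classifier.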
Statistics
Users were creating more than 2 million images daily using DALL-E in 2022.
Adobe Firefly generated over 1 billion images within 3 months of its launch in 2023.
The LAION-5B dataset, containing 5 billion image and text pairs, was used for training.
Images were divided into 14 groups based on resolution, ranging from 300² to 6000² pixels.
The test dataset included images generated by 12 different LDMs, including Stable Diffusion, DiT, Kandinsky 3, and user-trained models.
Three different detector architectures were used: ConvNeXt, EVA-02 ViT, and EfficientNet-V2 B0.
ConvNeXt Large achieved a true positive rate (TPR) of up to 99.8% when detecting images from LDMs not included in its training data.
EVA-02 ViT L/14 demonstrated the highest robustness to JPEG compression and image resizing.
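For reference, the TPR quoted above is the fraction of generated images that the detector correctly flags. A minimal sketch of that calculation follows. The function name and the sample labels are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def true_positive_rate(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Return TP / (TP + FN), where 1 marks a generated image and 0 a real one."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    # Keep only the predictions made on truly generated images.
    on_positives = [p for y, p in zip(labels, predictions) if y == 1]
    if not on_positives:
        raise ValueError("TPR is undefined without any positive samples")
    return sum(1 for p in on_positives if p == 1) / len(on_positives)


# Illustrative values only: four generated images followed by two real ones.
sample_labels = [1, 1, 1, 1, 0, 0]
sample_predictions = [1, 1, 1, 0, 0, 1]
tpr = true_positive_rate(sample_labels, sample_predictions)
```

A robustness check could reuse the same function by recomputing the TPR on predictions obtained after JPEG compression or resizing of the test images.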