The paper presents a method called DiffHarmony that adapts a pre-trained latent diffusion model, specifically Stable Diffusion, to the image harmonization task. The key challenges addressed are:
Computational resource consumption of training diffusion models from scratch: DiffHarmony leverages the pre-trained Stable Diffusion model to quickly converge on the image harmonization task.
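Reconstruction error induced by the VAE compression in latent diffusion models: Two strategies are proposed to mitigate this issue: using higher-resolution images during inference, and adding a refinement stage that sharpens the initially harmonized image.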
Extensive experiments on the iHarmony4 dataset demonstrate the superiority of the proposed DiffHarmony method compared to state-of-the-art image harmonization approaches. The method achieves the best overall performance in terms of PSNR, MSE, and foreground MSE metrics. Further analysis shows that DiffHarmony particularly excels when the foreground region is large, compensating for the reconstruction loss from the VAE compression.
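To make the three reported metrics concrete, here is a minimal sketch of how MSE, foreground MSE (fMSE), and PSNR are commonly computed. It is not the paper's evaluation code: the function names, the flat-list image representation, and the 255 peak value are assumptions made for illustration.

```python
import math

# Illustrative helpers (hypothetical names, not from the DiffHarmony code).
# Images are flat sequences of pixel values in [0, 255]; mask marks foreground with 1.

def mse(pred, target):
    """Mean squared error averaged over every pixel value."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def fmse(pred, target, mask):
    """Mean squared error averaged over foreground pixels only."""
    fg = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(fg) / len(fg)

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10 * math.log10(max_val ** 2 / err)

# Toy example: errors occur only at the two foreground pixels.
pred = [100, 110, 120, 130]
target = [100, 100, 120, 120]
mask = [0, 1, 0, 1]

print(mse(pred, target))         # 50.0
print(fmse(pred, target, mask))  # 100.0
print(psnr(pred, target))
```

Because MSE averages over the whole image, a small foreground dilutes the error, which is why fMSE is usually reported alongside it.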
Source: Pengfei Zhou... (arxiv.org, 04-10-2024)
https://arxiv.org/pdf/2404.06139.pdf