The paper presents DiffHarmony, a method that adapts a pre-trained latent diffusion model, specifically Stable Diffusion, to the image harmonization task. It addresses two key challenges:
High computational cost of training diffusion models from scratch: DiffHarmony leverages the pre-trained Stable Diffusion model to converge quickly on the image harmonization task (see the first sketch after this list).
Reconstruction error induced by VAE compression in latent diffusion models: Two strategies are proposed to mitigate this issue, namely performing inference at a higher image resolution and applying an additional refinement stage (see the second sketch after this list).
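The summary does not spell out how DiffHarmony feeds the composite image and foreground mask to the network, so the following is only a minimal sketch of a standard way to adapt a pre-trained Stable Diffusion UNet to an image-conditioned task (the same pattern as the official inpainting variant, not necessarily DiffHarmony's exact design): the VAE latent of the composite and a downsampled mask are concatenated to the noisy latent, and the input convolution is widened with zero-initialized extra channels so training starts from the pre-trained weights. The checkpoint ID is a placeholder.

```python
import torch
from diffusers import UNet2DConditionModel

# Placeholder checkpoint ID; any Stable Diffusion v1.x UNet works the same way.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Assumed input layout (mirroring SD inpainting):
# [noisy latent | composite latent | mask] = 4 + 4 + 1 = 9 channels.
old_conv = unet.conv_in
new_conv = torch.nn.Conv2d(
    9,
    old_conv.out_channels,
    kernel_size=old_conv.kernel_size,
    padding=old_conv.padding,
)
with torch.no_grad():
    new_conv.weight.zero_()                   # extra channels start at zero ...
    new_conv.weight[:, :4] = old_conv.weight  # ... so pre-trained behaviour is preserved
    new_conv.bias.copy_(old_conv.bias)
unet.conv_in = new_conv
unet.register_to_config(in_channels=9)        # keep the model config in sync
```

At train time the 9-channel latent would be passed to the UNet together with the timestep and the usual text (or null) embedding, exactly as in standard Stable Diffusion fine-tuning.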
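To make the second challenge concrete, here is a small sketch (not from the paper) that measures the information lost by a Stable Diffusion VAE encode/decode round trip at two working resolutions; running the latent-space pipeline at the higher resolution and downsampling afterwards is one way the compression error becomes less visible in the final output. The checkpoint ID and the random test image are placeholders.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

# Placeholder checkpoint; any Stable Diffusion VAE behaves the same way here.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def round_trip(img):
    """VAE encode/decode for an image tensor in [-1, 1], shape (1, 3, H, W)."""
    latents = vae.encode(img).latent_dist.mode()  # deterministic latent
    return vae.decode(latents).sample

def psnr(a, b):
    """PSNR between two tensors after rescaling to [0, 1]."""
    mse = F.mse_loss((a + 1) / 2, (b + 1) / 2)
    return (10 * torch.log10(1.0 / mse)).item()

# Stand-in for a composite image; in practice this would be a real sample.
img_512 = torch.rand(1, 3, 512, 512) * 2 - 1
img_256 = F.interpolate(img_512, size=256, mode="bilinear", antialias=True)

with torch.no_grad():
    rec_256 = round_trip(img_256)                       # round trip at 256
    rec_512_down = F.interpolate(round_trip(img_512),   # round trip at 512,
                                 size=256, mode="bilinear", antialias=True)

print("PSNR, 256 round trip:       ", psnr(img_256, rec_256))
print("PSNR, 512 round trip -> 256:", psnr(img_256, rec_512_down))
```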
Extensive experiments on the iHarmony4 dataset demonstrate the superiority of DiffHarmony over state-of-the-art image harmonization approaches: it achieves the best overall performance in terms of PSNR, MSE, and foreground MSE (fMSE). Further analysis shows that DiffHarmony particularly excels when the foreground region is large, compensating for the reconstruction loss introduced by VAE compression.
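For reference, these metrics are typically computed on 8-bit images, with fMSE restricting the error to the composite's foreground mask. A minimal sketch, assuming HxWx3 uint8 prediction and ground truth arrays and an HxW binary mask, and using the common convention of averaging the squared error over foreground pixels:

```python
import numpy as np

def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean squared error over the whole image (uint8 HxWx3 inputs in [0, 255])."""
    return float(np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2))

def psnr(pred: np.ndarray, gt: np.ndarray) -> float:
    """PSNR in dB for 8-bit images."""
    m = mse(pred, gt)
    return float("inf") if m == 0 else 10.0 * np.log10(255.0 ** 2 / m)

def fmse(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """MSE restricted to the composite's foreground region (mask: HxW in {0, 1})."""
    diff2 = (pred.astype(np.float64) - gt.astype(np.float64)) ** 2
    fg = mask.astype(bool)
    return float(diff2[fg].mean()) if fg.any() else 0.0
```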
Source: Pengfei Zhou et al., arxiv.org, 04-10-2024, https://arxiv.org/pdf/2404.06139.pdf