Core Concepts
Identifying layers within text-to-image models that control visual attributes can facilitate efficient model editing through closed-form updates.
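A closed-form update of this kind can be written as a ridge-regularized least-squares problem over one projection matrix in a localized layer. The derivation below is a sketch under stated assumptions: W_old is a key or value projection in one of the identified cross-attention layers, c_i are embeddings of prompts containing the attribute to edit, c_i^* are embeddings of the desired target prompts, and lambda > 0 is a regularization weight. These symbols and this objective follow earlier closed-form cross-attention editing methods and are not necessarily the exact LOCOEDIT formulation.

```latex
\min_{W}\; \sum_{i} \bigl\lVert W c_i - W_{\text{old}}\, c_i^{*} \bigr\rVert_2^2
  + \lambda \bigl\lVert W - W_{\text{old}} \bigr\rVert_F^2

% Setting the gradient with respect to W to zero:
W \Bigl(\sum_i c_i c_i^{\top} + \lambda I\Bigr)
  = W_{\text{old}} \Bigl(\sum_i c_i^{*} c_i^{\top} + \lambda I\Bigr)

% For \lambda > 0 the bracket on the left is positive definite, hence invertible:
W = W_{\text{old}} \Bigl(\sum_i c_i^{*} c_i^{\top} + \lambda I\Bigr)
    \Bigl(\sum_i c_i c_i^{\top} + \lambda I\Bigr)^{-1}
```

Each edited layer costs a single matrix solve, so restricting the update to the few layers that localization identifies keeps the edit cheap.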
Summary
The paper examines how well knowledge localization works across various open-source text-to-image models. It first observes that causal tracing, while effective at localizing the control points for visual attributes in early Stable-Diffusion variants, fails to generalize to newer text-to-image models such as DeepFloyd and SD-XL.
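In its general form, causal tracing compares a clean run, a run with a corrupted prompt, and corrupted runs in which one layer's clean activation is restored; the layer whose restoration recovers most of the clean output is taken as the place where the attribute is controlled. The sketch below illustrates that procedure on a hypothetical toy model: TEXT_WEIGHTS, run and causal_trace are illustrative names, the per-layer sensitivities are made up, and a scalar output stands in for an image score.

```python
from typing import List, Optional, Tuple

# Hypothetical per-layer sensitivity to the attribute in the prompt.
# Illustrative numbers only; a real model would be a UNet, not a list.
TEXT_WEIGHTS = [0.0, 0.2, 2.5, 0.3, 0.0]


def run(t: float, patch: Optional[dict] = None) -> Tuple[float, List[float]]:
    """Toy forward pass. t=1.0 is the clean prompt, t=0.0 the corrupted one.
    `patch` maps a layer index to an activation that overrides that layer."""
    h, acts = 1.0, []
    for i, w in enumerate(TEXT_WEIGHTS):
        a = w * t + 0.1            # layer activation: attribute signal + constant
        if patch is not None and i in patch:
            a = patch[i]           # restore the cached clean activation
        acts.append(a)
        h += a                     # residual connection
    return h, acts


def causal_trace() -> List[float]:
    """Fraction of the clean-vs-corrupted output gap recovered per restored layer."""
    clean_out, clean_acts = run(t=1.0)
    corrupt_out, _ = run(t=0.0)
    scores = []
    for i in range(len(TEXT_WEIGHTS)):
        patched_out, _ = run(t=0.0, patch={i: clean_acts[i]})
        scores.append((patched_out - corrupt_out) / (clean_out - corrupt_out))
    return scores


scores = causal_trace()
print([round(s, 3) for s in scores])
```

In this toy, restoring layer 2 recovers most of the gap. In a real model the same per-layer scores become diffuse when knowledge is spread out, which is the failure mode described for newer models.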
To address this limitation, the paper introduces LOCOGEN, a method that identifies the locations within the UNet that control visual attributes across diverse text-to-image models. Using these locations, the paper then evaluates closed-form model editing with LOCOEDIT across a range of text-to-image models.
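A simplified sketch of the localization search follows. It assumes that LOCOGEN's intervention amounts to feeding an altered prompt (for example, one with the attribute removed) to a window of consecutive cross-attention layers while the remaining layers receive the original prompt, then checking whether the attribute disappears from the output. SENSITIVITY, attribute_score and locogen_style_search are hypothetical stand-ins; a real run would generate images and score them, for example with CLIP.

```python
from typing import List, Sequence, Tuple

# Hypothetical sensitivity of the generated attribute to the prompt seen by each
# of 8 cross-attention layers. Made-up numbers, for illustration only.
SENSITIVITY = [0.0, 0.1, 0.2, 3.0, 2.5, 0.1, 0.0, 0.0]


def attribute_score(layer_prompts: Sequence[str]) -> float:
    """Toy stand-in for 'generate an image and measure how strongly the attribute shows'.
    Only layers that receive the original prompt contribute the attribute."""
    return sum(s for s, p in zip(SENSITIVITY, layer_prompts) if p == "original")


def locogen_style_search(num_layers: int, window: int) -> Tuple[int, List[float]]:
    """Slide a window of `window` layers and return the start index whose
    intervention suppresses the attribute most, plus all window scores."""
    scores = []
    for start in range(num_layers - window + 1):
        # Layers inside the window get the altered prompt; all others keep the original.
        prompts = ["altered" if start <= i < start + window else "original"
                   for i in range(num_layers)]
        scores.append(attribute_score(prompts))
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


best, scores = locogen_style_search(num_layers=len(SENSITIVITY), window=2)
print(best)  # 3 -> layers 3 and 4 are where the toy attribute is controlled
```

Window size is a free parameter in this sketch: a wider window makes the intervention stronger but the resulting location coarser.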
Notably, for certain visual attributes such as "style", the paper finds that knowledge can be traced to a small subset of neurons and edited by applying a simple dropout layer to them, which points to the possibility of neuron-level model editing.
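Neuron-level editing of this kind can be sketched as a fixed dropout mask: the selected neurons in a layer's output are zeroed at inference, and the rest pass through unchanged. How the neurons are chosen is not specified here, so select_neurons uses a hypothetical heuristic (largest activation change when the style phrase is added to the prompt); both function names and the example activations are illustrative assumptions.

```python
from typing import List, Sequence, Set


def select_neurons(acts_with: Sequence[float], acts_without: Sequence[float], k: int) -> Set[int]:
    """Hypothetical heuristic: the k neurons whose activation changes most
    between a prompt with the style phrase and the same prompt without it."""
    diffs = [abs(a - b) for a, b in zip(acts_with, acts_without)]
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return set(ranked[:k])


def drop_neurons(acts: Sequence[float], neurons: Set[int]) -> List[float]:
    """Deterministic 'dropout' at inference time: zero the selected neurons.
    No 1/(1-p) rescaling, because this is an edit, not training-time dropout."""
    return [0.0 if i in neurons else a for i, a in enumerate(acts)]


# Illustrative activations of one layer for a styled and an unstyled prompt.
with_style = [0.2, 1.5, 0.1, -0.9, 0.3]
without_style = [0.2, 0.1, 0.1, 0.4, 0.25]

style_neurons = select_neurons(with_style, without_style, k=2)
print(sorted(style_neurons))                    # [1, 3]
print(drop_neurons(with_style, style_neurons))  # [0.2, 0.0, 0.1, 0.0, 0.3]
```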
Statistics
Text-to-image models like Stable-Diffusion, OpenJourney, SD-XL, and DeepFloyd have 70 to 227 cross-attention layers in the UNet.
For SD-v1-5 and SD-v2-1, knowledge about "style" is controlled from layer 8, while "objects" and "facts" are controlled from layer 6.
For SD-XL, knowledge about "style" and "facts" is controlled from layer 45, while "objects" are controlled from layer 15.
DeepFloyd exhibits prompt-dependent localization, unlike other models.
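The statistics above can be captured as a lookup table that an editing pipeline might consult before applying a closed-form update. This is a minimal sketch: LOCALIZED_LAYER and edit_target are hypothetical names, the layer numbers are copied from the list, and DeepFloyd maps to None because its localization is prompt-dependent and would need a per-prompt search instead.

```python
from typing import Dict, Optional

# Layer from which each attribute is controlled, copied from the statistics above.
# None marks prompt-dependent localization (DeepFloyd), which has no fixed layer.
LOCALIZED_LAYER: Dict[str, Dict[str, Optional[int]]] = {
    "SD-v1-5": {"style": 8, "objects": 6, "facts": 6},
    "SD-v2-1": {"style": 8, "objects": 6, "facts": 6},
    "SD-XL": {"style": 45, "objects": 15, "facts": 45},
    "DeepFloyd": {"style": None, "objects": None, "facts": None},
}


def edit_target(model: str, attribute: str) -> Optional[int]:
    """Hypothetical helper: the layer to edit, or None if it must be found per prompt."""
    return LOCALIZED_LAYER[model][attribute]


print(edit_target("SD-XL", "objects"))   # 15
print(edit_target("DeepFloyd", "style"))  # None
```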
Quotes
"Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates."
"Extending this framework, we observe that for recent models (e.g., SD-XL, DeepFloyd), causal tracing fails in pinpointing localized knowledge, highlighting challenges in model editing."
"Leveraging LOCOGEN, we probe knowledge locations for different visual attributes across popular open-source text-to-image models such as Stable-Diffusion-v1, Stable-Diffusion-v2, OpenJourney, SD-XL (Podell et al., 2023) and DeepFloyd."