[Paper Reading] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

2025/4/2 17:49:06 · Source: https://www.cnblogs.com/fariver/p/18377021

Date: 23.11
Institution: Stanford

TL;DR

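The paper proposes ControlNet, a model that adds spatial conditioning controls to a pretrained text-to-image diffusion model. The authors train ControlNet on 50k to 1M samples of conditions such as edges, depth, segmentation, and pose, and get good generation results in every case. This is a great convenience for downstream text-to-image users.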

Method

ZeroConv
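FreezeNet (the frozen pretrained model) and ControlNet fuse their features in the decoder. The layers on ControlNet's decoder side are all zero-initialized convolutions (ZeroConv). Per the paper's formulation, the features that ControlNet feeds into FreezeNet are therefore all zeros at the start of training, so adding them to the frozen model leaves its original output unchanged.
Benefits of this design:
Quality:
a) The original encoder parameters are preserved. b) Because the decoder-side layers are ZeroConv, ControlNet's contribution is learned gradually rather than injected all at once.
Performance: the frozen FreezeNet weights need no gradient computation in the backward pass, which speeds up training and reduces GPU memory use.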
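The formula this paragraph refers to did not survive in this copy of the post. As a hedged reconstruction from the ControlNet paper (the symbols below follow the paper, not this post): let $\mathcal{F}(\cdot;\Theta)$ be a frozen block, $\Theta_c$ its trainable copy, $\mathcal{Z}(\cdot;\Theta_{z1})$ and $\mathcal{Z}(\cdot;\Theta_{z2})$ two zero convolutions, $x$ the input feature map, and $c$ the condition.

```latex
y_c = \mathcal{F}(x;\Theta)
    + \mathcal{Z}\bigl(\mathcal{F}\bigl(x + \mathcal{Z}(c;\Theta_{z1});\,\Theta_c\bigr);\,\Theta_{z2}\bigr)

% At initialization every weight and bias in \Theta_{z1}, \Theta_{z2} is zero,
% so both \mathcal{Z}(\cdot) terms evaluate to 0 and
y_c = \mathcal{F}(x;\Theta) = y
```

So the first training step starts from exactly the pretrained model's output.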
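The two quality points above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of tensors, treats a 1x1 convolution as a per-pixel linear map, and all function names and feature values are hypothetical. It shows that a zero-initialized convolution leaves the frozen feature untouched, and that its weight gradient is still nonzero, which is why ControlNet can start learning from that state.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def conv1x1(weight: Matrix, bias: Vector, pixel: Vector) -> Vector:
    """Apply a 1x1 convolution at one spatial location (a per-pixel linear map)."""
    return [
        sum(w * p for w, p in zip(row, pixel)) + b
        for row, b in zip(weight, bias)
    ]


def zero_conv_params(out_ch: int, in_ch: int) -> Tuple[Matrix, Vector]:
    """ZeroConv: a 1x1 convolution whose weight and bias both start at zero."""
    return [[0.0] * in_ch for _ in range(out_ch)], [0.0] * out_ch


def fuse(frozen_feat: Vector, control_feat: Vector,
         weight: Matrix, bias: Vector) -> Vector:
    """Add the ZeroConv projection of the ControlNet feature to the frozen feature."""
    projected = conv1x1(weight, bias, control_feat)
    return [f + p for f, p in zip(frozen_feat, projected)]


def weight_grad(upstream_grad: Vector, control_feat: Vector) -> Matrix:
    """For y = W @ control_feat + b: dL/dW[i][j] = dL/dy[i] * control_feat[j]."""
    return [[g * c for c in control_feat] for g in upstream_grad]


# Hypothetical 3-channel features at a single pixel, chosen only for illustration.
frozen_feat = [0.5, -1.0, 2.0]
control_feat = [1.0, 2.0, -3.0]

weight, bias = zero_conv_params(out_ch=3, in_ch=3)
fused = fuse(frozen_feat, control_feat, weight, bias)
print(fused == frozen_feat)  # True: the frozen output is unchanged at init

# The gradient w.r.t. W depends on the input feature, not on W itself,
# so it is nonzero even though W is all zeros.
grad = weight_grad([1.0, 1.0, 1.0], control_feat)
print(any(g != 0.0 for row in grad for g in row))  # True
```

After one gradient step the ZeroConv weights stop being zero, and the control signal begins to influence the fused feature.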

As tested on a single NVIDIA A100 PCIE 40GB, optimizing Stable Diffusion with ControlNet requires only about 23% more GPU memory and 34% more time in each training iteration, compared to optimizing Stable Diffusion without ControlNet.