Adapter Guidance Distillation Improves Efficiency of Diffusion Model Image Generation

More Efficient Image Generation with Diffusion Models through Adapter-Based Distillation
Image generation with diffusion models has made enormous progress in recent years. A key factor in the quality of the generated images is Classifier-Free Guidance (CFG), a technique that steers the generation process toward the text prompt so that the output matches the desired content. However, CFG doubles the number of neural function evaluations (NFEs) required per sampling step, which increases both computational cost and generation time. A new approach called Adapter Guidance Distillation (AGD) promises to make CFG significantly more efficient.
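To make that cost concrete, here is a minimal sketch of a single CFG denoising step in PyTorch-style code. The names (eps_model, text_cond, null_cond, guidance_scale) are illustrative and not taken from the paper; the point is simply why CFG needs two network evaluations per step.

def cfg_step(eps_model, x_t, t, text_cond, null_cond, guidance_scale=7.5):
    # Classifier-free guidance needs two forward passes (two NFEs) per step:
    # one conditioned on the text prompt and one on a null/empty prompt.
    eps_cond = eps_model(x_t, t, text_cond)    # first NFE
    eps_uncond = eps_model(x_t, t, null_cond)  # second NFE
    # Extrapolate from the unconditional toward the conditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

AGD, described below, aims to reproduce the output of such a step with a single model evaluation.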
AGD simulates the behavior of CFG in a single forward pass instead of the usual two. At its core are lightweight adapters that learn to approximate the CFG-guided prediction. The adapters are trained on top of the existing diffusion model without modifying its original weights, so the base model remains intact and can still be used for other tasks. A further advantage is that adapters trained on different checkpoints of the same base model can be combined.
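The sketch below illustrates this general idea under two assumptions that go beyond the article: that the adapters are LoRA-style low-rank layers attached to frozen linear layers, and that the distillation target is the teacher's two-pass CFG prediction. The actual adapter architecture and training objective in AGD may differ.

import torch.nn as nn
import torch.nn.functional as F

class LowRankAdapter(nn.Module):
    """Illustrative LoRA-style adapter wrapped around a frozen linear layer."""
    def __init__(self, base_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)        # base weights stay untouched
        self.down = nn.Linear(base_linear.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base_linear.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

def agd_distillation_loss(student_eps, teacher_cfg_eps):
    # The single-pass student output should match the two-pass CFG
    # prediction of the frozen teacher (see cfg_step above).
    return F.mse_loss(student_eps, teacher_cfg_eps.detach())

Because the base layer is frozen and the adapter starts as a zero update, the adapted model initially behaves exactly like the original checkpoint; only the small adapter learns the guidance behavior.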
In contrast to previous distillation methods, which fine-tune the entire model, AGD trains only a small set of additional parameters (approximately 2% of the model). This significantly reduces the resource requirements of training and makes it possible to distill large models (up to 2.6 billion parameters) on a single consumer GPU with 24 GB of VRAM, whereas previous approaches required multiple high-end GPUs for models of this size.
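A quick way to see what "approximately 2%" means in practice: with the base model frozen and only the adapter layers trainable, the trainable share of the parameters can be checked directly. The snippet below is a generic sketch for any PyTorch model with attached adapters, not code from the AGD release.

def trainable_fraction(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# For a 2.6-billion-parameter model, ~2% trainable corresponds to roughly
# 50 million adapter parameters, which is what makes training feasible on
# a single 24 GB consumer GPU.
# Hypothetical usage: trainable_fraction(adapted_unet) -> ~0.02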
An important aspect of AGD is how it handles the mismatch between training and inference. Existing distillation methods typically train on standard diffusion trajectories, whereas AGD trains on CFG-guided trajectories. Because the adapter sees the same kind of inputs during training that it will encounter at inference time, this narrows the train-inference gap and improves the quality of the generated images.
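One plausible reading of "training on CFG-guided trajectories" is sketched below: instead of drawing noisy training inputs from the standard forward-diffusion process, the teacher sampler is rolled out with CFG enabled and its intermediate states are collected as training examples for the adapter. The scheduler.step interface here is an assumed, diffusers-like API, not the paper's actual code.

def collect_cfg_guided_states(eps_model, scheduler, x_T, timesteps,
                              cond, null_cond, guidance_scale=7.5):
    # Roll out the teacher sampler *with* CFG and record every intermediate
    # state, so the adapter is trained on the same distribution of inputs
    # it will see during single-pass inference.
    x_t = x_T
    training_pairs = []
    for t in timesteps:
        eps_c = eps_model(x_t, t, cond)
        eps_u = eps_model(x_t, t, null_cond)
        eps_cfg = eps_u + guidance_scale * (eps_c - eps_u)
        training_pairs.append((x_t, t, cond, eps_cfg))  # input/target pair
        x_t = scheduler.step(eps_cfg, t, x_t)           # assumed scheduler API
    return training_pairs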
Extensive experiments show that AGD achieves Fréchet Inception Distance (FID) scores, a common quality metric for generated images, that are comparable to or better than standard CFG, while requiring only half the NFEs and thus roughly doubling generation speed. The developers of AGD plan to release their implementation publicly, which should make it easy to integrate into existing diffusion pipelines.
The development of AGD represents an important step towards increasing the efficiency of diffusion models. By reducing the computational cost and generation time while maintaining or even improving image quality, AGD opens up new possibilities for the use of diffusion models in various application areas, from the creation of artwork to the development of new drugs.
The combination of efficiency, flexibility, and the ability to train large models on commercially available hardware makes AGD a promising approach for the future of image generation with AI.