Distilling Diversity: New Research Improves Efficiency and Control in Diffusion Models

Distillation of Diversity and Control in Diffusion Models: A Breakthrough for More Efficient AI Image Generation
Diffusion models have established themselves as powerful tools for image generation, producing high-quality and diverse images from text descriptions or other input modalities. For large diffusion models, however, the computational cost of inference, i.e., of actually generating an image, is considerable. Distilled diffusion models address this problem: derived from the original model (the base model), they drastically reduce the number of denoising steps required and thus significantly increase inference speed. Until now, these distilled models have had a notable drawback: the diversity of the generated images is reduced compared to the base model.
New research now presents a promising approach to overcoming this limitation. The study shows that, despite the loss of diversity, distilled models retain the fundamental concept representations of the base model. This enables what the authors call "control distillation": control mechanisms such as Concept Sliders and LoRA (Low-Rank Adaptation) modules trained on the base model can be transferred directly to the distilled model, and vice versa, without any additional training.
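To make this concrete, the following is a minimal sketch of such a control transfer using the Hugging Face diffusers library. It assumes SDXL-Turbo as the distilled model and a Concept-Slider-style LoRA file trained on the SDXL base model; the directory and file name are placeholders for illustration, not artifacts from the paper.

```python
# Minimal sketch: applying a LoRA trained on the SDXL base model to a
# distilled model (SDXL-Turbo) without any retraining.
# "path/to/sliders" and "age_slider_sdxl.safetensors" are placeholders.
import torch
from diffusers import StableDiffusionXLPipeline

# Load the distilled model; it shares the base model's weight structure,
# so LoRAs trained on the base model remain compatible.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

# Attach and fuse the control that was trained on the base model.
pipe.load_lora_weights("path/to/sliders", weight_name="age_slider_sdxl.safetensors")
pipe.fuse_lora(lora_scale=1.0)

# SDXL-Turbo inference: few steps, no classifier-free guidance.
image = pipe(
    "portrait photo of a person", num_inference_steps=4, guidance_scale=0.0
).images[0]
image.save("controlled_output.png")
```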
To investigate the causes of the diversity loss during distillation, the researchers developed a new analysis tool: "Diffusion Target (DT) Visualization". It visualizes what final image a model "predicts" at intermediate steps of the generation process, making artifacts and inconsistencies in that process visible. The analysis revealed that the initial timesteps of the diffusion process have the greatest influence on the diversity of the output, while later steps mainly refine details.
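The underlying idea can be sketched in a few lines: for a standard noise-predicting (epsilon-parameterized) diffusion model, the noise estimate at any timestep can be converted into an estimate of the final, fully denoised image. The helper below is an illustrative sketch under that assumption, not the authors' implementation; the names unet, scheduler, latents, and prompt_embeds follow common diffusers conventions.

```python
import torch

@torch.no_grad()
def predict_diffusion_target(unet, scheduler, latents, t, prompt_embeds):
    """Estimate the final (fully denoised) sample from an intermediate timestep."""
    # Noise prediction of the model at timestep t.
    eps = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
    # Epsilon parameterization: x0_hat = (x_t - sqrt(1 - a_bar_t) * eps) / sqrt(a_bar_t)
    alpha_bar_t = scheduler.alphas_cumprod[t]
    x0_hat = (latents - torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
    # Decoding x0_hat with the VAE at several timesteps yields the DT visualization.
    return x0_hat
```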
Based on these findings, the researchers propose a hybrid inference approach they call "diversity distillation": the base model is used only for the first, critical timestep before inference switches to the more efficient distilled model. Experiments show that this simple modification not only restores the base model's diversity in the distilled model but can even surpass it, while inference remains almost as fast as using the distilled model alone, all without additional training or model modifications.
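A simplified version of such a hybrid sampling loop might look as follows. This is only an illustrative sketch: routing the first step(s) to the base model follows the idea described above, while the variable names and the single shared scheduler are assumptions, not the paper's actual code.

```python
import torch

@torch.no_grad()
def hybrid_sample(base_unet, distilled_unet, scheduler, latents, prompt_embeds,
                  num_steps=4, base_steps=1):
    """Diversity-distillation-style inference: base model first, distilled model after."""
    scheduler.set_timesteps(num_steps)
    for i, t in enumerate(scheduler.timesteps):
        # Early timesteps determine global structure and thus diversity,
        # so only they are routed through the slower base model.
        unet = base_unet if i < base_steps else distilled_unet
        noise_pred = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```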
The Key Research Findings at a Glance:
Control mechanisms can be transferred between base and distilled models.
The initial timesteps of the diffusion are crucial for the diversity of the output.
Diversity distillation allows for increased diversity while maintaining high inference speed.
These research results open up new possibilities for the efficient generation of diverse and high-quality images with diffusion models. The findings could lead to the development of new and improved distillation methods and further advance the use of AI image generation in various application areas. Especially for companies like Mindverse, which develop customized AI solutions, these results offer valuable potential for optimizing chatbots, voicebots, AI search engines, and knowledge systems.