CoRe² Improves Speed and Quality of Text-to-Image Generation


Developing text-to-image (T2I) models that generate high-quality images quickly is a central research topic in artificial intelligence. Previous approaches either improved the visual quality of the generated images at the expense of speed, or accelerated the generation process without improving the underlying model's capacity. Furthermore, most inference methods could not guarantee stable performance on both diffusion models (DMs) and visual autoregressive models (ARMs).

A new inference paradigm called CoRe² aims to close this gap. It consists of three sub-processes: Collect, Reflect, and Refine.

In the first step, Collect, CoRe² records the sampling trajectories produced with classifier-free guidance (CFG). CFG is a technique that steers generation toward the prompt by combining the model's conditional and unconditional predictions, which improves image quality but requires two model evaluations per sampling step. The collected data is used in the second step, Reflect, to train a "weak" model. This weak model learns the easy-to-capture image content, thereby halving the number of function evaluations during inference. In the final step, Refine, CoRe² uses "weak-to-strong" guidance to refine the conditional output. This improves the model's ability to generate high-frequency, realistic content that is difficult for the base model to capture.
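The guidance arithmetic behind these steps can be illustrated with a small Python sketch. The `cfg_combine` function is the standard CFG formula. The `weak_to_strong` function and the two evaluation counters are hypothetical: the article does not give the exact CoRe² update rule, so the sketch assumes the Refine step reuses the same extrapolation form with the weak model's output as the baseline, and that the weak model is cheap enough not to count as a function evaluation.

```python
import numpy as np


def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: start from the unconditional
    prediction and extrapolate toward the conditional one."""
    return uncond + scale * (cond - uncond)


def weak_to_strong(strong, weak, scale):
    """Hypothetical Refine step (an assumption, not the paper's exact rule):
    extrapolate from the weak model's output toward the strong output."""
    return weak + scale * (strong - weak)


def nfe_with_cfg(steps):
    """CFG needs a conditional and an unconditional evaluation per step."""
    return 2 * steps


def nfe_with_weak_model(steps):
    """Assumed: one base-model evaluation per step, with the weak model
    standing in for the second CFG evaluation at negligible cost."""
    return steps


# Toy two-dimensional "predictions" standing in for model outputs.
cond = np.array([1.0, 2.0])
uncond = np.array([0.0, 1.0])

guided = cfg_combine(cond, uncond, scale=3.0)
refined = weak_to_strong(strong=guided, weak=cond, scale=1.5)
```

With a guidance scale of 1, `cfg_combine` returns the conditional prediction unchanged, and larger scales push the result further away from the unconditional one. Under the stated assumptions, a 28-step sampler would need 56 evaluations with CFG and 28 with the weak model, which matches the halving described above.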

According to its authors, CoRe² is the first method to demonstrate both efficiency and effectiveness across a broad spectrum of DMs, including SDXL, SD3.5, and FLUX, as well as ARMs such as LlamaGen. It has shown significant performance improvements on benchmarks such as HPD v2, Pick-of-Pic, Drawbench, GenEval, and T2I-Compbench.

Another advantage of CoRe² is that it integrates with state-of-the-art techniques such as Z-Sampling. Combined with Z-Sampling, CoRe² outperforms Z-Sampling alone by 0.3 on PickScore and 0.16 on AES, while saving 5.64 seconds of generation time when using SD3.5.

CoRe² represents a significant advance in T2I generation. By combining its three sub-processes, it improves both the speed and the quality of image generation. Its compatibility with various model architectures and its ability to integrate with existing methods underscore its potential for future applications in AI-driven image synthesis.

For companies like Mindverse, which specialize in AI-powered content creation, CoRe² opens up new possibilities. The improved efficiency and quality of image generation can shorten development time and increase the performance of AI tools for content creation. This applies to both the creation of marketing materials and the development of customized AI solutions such as chatbots, voicebots, AI search engines, and knowledge systems.
