Parallel Processing Boosts Autoregressive Image Generation


Autoregressive image generation has made significant progress in recent years. These models build an image step by step, predicting each image token (historically, each pixel) from the tokens generated before it, much as a language model predicts each word from the preceding context. Traditionally, generation follows a fixed order, such as left to right and top to bottom (raster-scan order). This strictly sequential approach limits generation speed and is a poor fit for tasks like image inpainting or outpainting, where the known pixels do not arrive in a predefined order.
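The raster-scan procedure described above can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation; `sample_next` is a hypothetical stand-in for a trained model's sampler, and the key point is that every one of the H×W tokens costs a separate sequential step.

```python
import random

def sample_next(context):
    # Hypothetical sampler: a real model would condition on `context`.
    return random.randint(0, 255)

def generate_raster(height, width):
    """Generate a height x width token grid strictly left-to-right, top-to-bottom."""
    grid = [[None] * width for _ in range(height)]
    context = []
    for r in range(height):
        for c in range(width):
            tok = sample_next(context)   # one token per step: H*W sequential steps
            grid[r][c] = tok
            context.append(tok)
    return grid

img = generate_raster(4, 4)              # 16 tokens require 16 sequential steps
```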

A new approach, known as "randomized parallel decoding," promises to overcome these limitations. Instead of producing tokens in a fixed order, the model can generate them in a random order and several at a time. Because multiple tokens are decoded simultaneously, generation speed increases substantially. One example of this approach is ARPG (Autoregressive Image Generation with Randomized Parallel Decoding).
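The idea can be sketched in a few lines: shuffle the token positions, then decode a small batch of them per step. This is a toy sketch under assumed names (`sample_tokens` is a hypothetical batched sampler), not ARPG's actual code; it only shows how a random order plus batched decoding cuts the step count.

```python
import random

def sample_tokens(context, positions):
    # Hypothetical batched sampler: one token per queried position.
    return [random.randint(0, 255) for _ in positions]

def generate_parallel(num_tokens, tokens_per_step=4):
    order = list(range(num_tokens))
    random.shuffle(order)                      # randomized generation order
    tokens, filled = [None] * num_tokens, []
    for i in range(0, num_tokens, tokens_per_step):
        batch = order[i:i + tokens_per_step]   # decode several positions at once
        for pos, tok in zip(batch, sample_tokens(filled, batch)):
            tokens[pos] = tok
        filled.extend(batch)
    return tokens

out = generate_parallel(256, tokens_per_step=4)  # 256 tokens in 64 steps
```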

ARPG uses a novel guided decoding procedure. At its core is the separation of positional information from image content: the position of the next token to be generated is encoded as a query, while the already-generated image content is represented as key-value pairs. This separation lets the model explicitly control where the next token is generated without entangling that choice with the representation of the image content.
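The query/key-value separation can be illustrated with a single attention step. The sketch below is a simplified, assumed version (one head, random position table, no projections or multi-layer structure): queries are built only from the embeddings of the *target* positions, while keys and values come only from already-generated content.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # embedding width (illustrative)
pos_emb = rng.normal(size=(16, d))       # position table (stand-in for a learned one)

def guided_attention(content, target_positions):
    """Queries encode only *where* to generate next; keys/values encode
    only *what* has been generated so far."""
    q = pos_emb[target_positions]              # (t, d): target-position queries
    k = v = content                            # (n, d): generated-content keys/values
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                         # (t, d): one output per target position

content = rng.normal(size=(5, d))        # 5 tokens generated so far
out = guided_attention(content, [7, 11]) # predict positions 7 and 11 in parallel
```

Because the content representation never depends on which position is queried next, any number of positions can be queried against the same key-value state in one pass.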

By integrating this guidance into the causal attention mechanism, ARPG can be trained and can generate in a completely random token order. The bidirectional attention used in other models is no longer needed, which simplifies the model architecture and enables parallel decoding.

The advantages of ARPG are particularly evident in zero-shot tasks such as image inpainting, outpainting, and resolution expansion. Because the generation order is flexible, ARPG can handle these tasks efficiently without any task-specific fine-tuning.
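Inpainting falls out of the flexible order almost for free: the known tokens serve as fixed context, and only the missing positions are decoded, in any order. The sketch below reuses the hypothetical `sample_tokens` sampler from before and is an illustration of the ordering idea, not ARPG's implementation.

```python
import random

def sample_tokens(context, positions):
    # Hypothetical sampler conditioned on the known-token context.
    return [random.randint(0, 255) for _ in positions]

def inpaint(known, num_tokens, tokens_per_step=2):
    """Fill only the missing positions; known tokens stay fixed as context."""
    tokens = [known.get(p) for p in range(num_tokens)]
    missing = [p for p in range(num_tokens) if p not in known]
    random.shuffle(missing)                      # any decoding order works
    for i in range(0, len(missing), tokens_per_step):
        batch = missing[i:i + tokens_per_step]
        for pos, tok in zip(batch, sample_tokens(tokens, batch)):
            tokens[pos] = tok
    return tokens

result = inpaint({0: 12, 3: 7, 8: 99}, num_tokens=9)  # toy 3x3 grid, 3 known tokens
```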

Initial results on the ImageNet-1K 256×256 benchmark demonstrate the potential of this approach. With only 64 sampling steps, ARPG achieves an FID (Fréchet Inception Distance) of 1.94. Compared with autoregressive models of similar size, this represents a more than 20-fold increase in throughput while reducing memory consumption by over 75%.
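To put the step count in perspective: assuming the common setup of a 16×16 latent token grid for a 256×256 image (an assumption, not stated in the text), 64 sampling steps means four tokens are decoded per step, versus one per step for a strict raster scan.

```python
# Assumption: a 16x16 latent token grid (256 tokens) for a 256x256 image.
tokens = 16 * 16                   # 256 tokens total
steps = 64                         # ARPG's reported number of sampling steps
tokens_per_step = tokens // steps  # tokens decoded in parallel each step

# A strictly sequential raster-scan model would need `tokens` steps instead,
# a 4x reduction in step count from parallelism alone; the reported >20x
# throughput gain also reflects other implementation factors.
```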

The development of ARPG and similar approaches represents an important step in autoregressive image generation. The parallel processing and the flexible generation order open up new possibilities for the application of these models in various fields, from image editing to the generation of synthetic data for training other AI systems.

Bibliography:
- Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang. "Autoregressive Image Generation with Randomized Parallel Decoding." arXiv preprint arXiv:2503.10568 (2025).
- Huang, J., et al. "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Stern, M., et al. "Blockwise Parallel Decoding for Deep Autoregressive Models." Advances in Neural Information Processing Systems. 2018.
- Tao, C. "Autoregressive Models in Vision Survey." GitHub repository, github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey.
- Esser, P., et al. "Parallelized Autoregressive Visual Generation." ResearchGate, 2024.