Scalable and Versatile 3D Generation with Structured 3D Latents

3D content creation is progressing rapidly thanks to artificial intelligence (AI). New generative models can create 3D objects from text prompts or images, opening up possibilities for applications ranging from game development to product design. One promising approach in this area uses structured 3D latents, which enable scalable and flexible generation of high-quality 3D assets.

Structured 3D Latents: A New Approach

Traditional 3D generation methods usually model a 3D representation such as a polygon mesh, voxel grid, or implicit function directly. These approaches can be limited both in how well they scale and in the quality of the objects they produce. An approach that addresses these challenges is the Structured LATent (SLAT) representation, which combines a sparse 3D structure with dense visual features extracted from powerful pre-trained image models.
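
To make the scalability point concrete, here is a small Python sketch (our own illustration, not code from the TRELLIS authors) that counts how many cells of an N×N×N grid actually touch a surface. A sphere of radius 0.4 inside the unit cube stands in for a real object, and "touches the surface" is approximated by a conservative half-diagonal distance test; the shape, radius, and test are all assumptions chosen for illustration. A dense grid always stores N³ cells, while the surface cells grow roughly with N², so doubling the resolution multiplies dense storage by 8 but the sparse set by only about 4.

```python
import math


def shell_voxels(resolution: int, radius: float = 0.4) -> int:
    """Count cells of a resolution**3 grid over [-0.5, 0.5]^3 whose centers lie within
    half a cell diagonal of a sphere's surface (a conservative 'touches the surface' test;
    it may include a few cells that only come close to the surface)."""
    cell = 1.0 / resolution
    tol = 0.5 * math.sqrt(3.0) * cell
    count = 0
    for i in range(resolution):
        x = (i + 0.5) * cell - 0.5
        for j in range(resolution):
            y = (j + 0.5) * cell - 0.5
            for k in range(resolution):
                z = (k + 0.5) * cell - 0.5
                # Signed distance from the cell center to the sphere surface.
                if abs(math.sqrt(x * x + y * y + z * z) - radius) <= tol:
                    count += 1
    return count


for n in (32, 64):
    sparse = shell_voxels(n)
    dense = n ** 3
    print(f"N={n}: {sparse} surface cells vs {dense} dense cells ({100 * sparse / dense:.1f}%)")
```

The exact counts depend on the toy shape, but the surface cells' share of the dense grid shrinks as the resolution grows, which is what makes a sparse structure attractive.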

The core idea of SLAT is to define local latents on the active voxels that intersect the surface of a 3D object. These local latents are encoded by fusing and processing image features extracted from multiple rendered views of the object. The image features come from pre-trained vision models (DINOv2 in the TRELLIS paper); they capture detailed geometry and appearance and complement the coarse structure provided by the active voxels. Because the latent representation is shared, different decoders can turn it into different 3D formats such as radiance fields, 3D Gaussians, or polygon meshes. This allows the output format to be chosen freely while keeping the generated objects high in quality.
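
The encoding step above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TRELLIS implementation: surface samples are assumed to lie in [-0.5, 0.5]³, the default 64³ resolution is an assumption, and each view is modeled by a hypothetical `ViewSampler` callable that returns the image feature at a 3D point's projection (or `None` when the point is not visible in that view). Plain averaging is used as the fusion rule, and the learned encoder that turns the fused features into compact local latents is omitted.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]
# Hypothetical per-view sampler: image feature at the projection of a 3D point,
# or None if the point is hidden or outside that view.
ViewSampler = Callable[[Point], Optional[List[float]]]


def active_voxels(points: Sequence[Point], resolution: int = 64) -> Set[Voxel]:
    """Cells of a resolution**3 grid over [-0.5, 0.5]^3 that contain at least one surface sample."""
    active: Set[Voxel] = set()
    for p in points:
        # Map each coordinate to a cell index and clamp it into the grid.
        i, j, k = (min(max(int((c + 0.5) * resolution), 0), resolution - 1) for c in p)
        active.add((i, j, k))
    return active


def voxel_center(v: Voxel, resolution: int) -> Point:
    i, j, k = v
    return ((i + 0.5) / resolution - 0.5,
            (j + 0.5) / resolution - 0.5,
            (k + 0.5) / resolution - 0.5)


def fuse_features(active: Iterable[Voxel], views: Sequence[ViewSampler],
                  resolution: int = 64) -> Dict[Voxel, List[float]]:
    """Average the features that each view sees at every active voxel's center."""
    fused: Dict[Voxel, List[float]] = {}
    for v in sorted(active):
        center = voxel_center(v, resolution)
        samples = [f for f in (view(center) for view in views) if f is not None]
        if not samples:
            continue  # not visible in any view: this sketch simply skips the voxel
        dim = len(samples[0])
        fused[v] = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    return fused
```

Two surface samples that fall into the same cell produce a single active voxel, and a voxel seen by only one view simply inherits that view's feature.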

TRELLIS: A Powerful 3D Generation Model

Building on SLAT, the authors developed TRELLIS, a family of large 3D generation models. TRELLIS uses rectified flow transformers as its backbone and is conditioned on either text or images. Generation proceeds in two stages: one model first generates the sparse structure of SLAT (which voxels are active), and a second model then generates the latent vectors for those non-empty voxels. TRELLIS models with up to 2 billion parameters were trained on a dataset of 500,000 diverse 3D objects.
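
The two-stage generation described above can be illustrated with a toy rectified-flow sampler. In this sketch (not the authors' code), each trained transformer is replaced by a hypothetical `oracle_velocity` that knows a single target sample, which keeps the straight rectified-flow path easy to verify. We assume the common convention x_t = (1 - t) x_0 + t ε with velocity v = ε - x_0, and integrate with Euler steps from noise at t = 1 back to data at t = 0. Stage 1 samples a dense occupancy field on a tiny 4×4×4 grid and thresholds it to get the active cells; stage 2 samples all local latents jointly for those cells. The grid size, latent width of 8, and threshold of 0.5 are illustrative assumptions, and the real model's compression of the occupancy grid and its text or image conditioning are omitted.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(velocity: VelocityFn, dim: int, steps: int = 25, seed: int = 0) -> Vector:
    """Integrate dx/dt = v(x, t) from Gaussian noise at t=1 back to a sample at t=0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps, 0, -1):
        t = i * dt
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target: Vector) -> VelocityFn:
    """Toy stand-in for a trained transformer: the exact rectified-flow velocity when the
    data distribution is the single point `target`. With x_t = (1 - t) * x0 + t * eps and
    v = eps - x0, eliminating eps gives v = (x_t - x0) / t."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(xi - x0) / t for xi, x0 in zip(x, target)]
    return velocity


def generate_two_stage(structure_velocity: VelocityFn,
                       latent_velocity_for: Callable[[List[int]], VelocityFn],
                       grid: int = 4, latent_dim: int = 8,
                       threshold: float = 0.5) -> Dict[int, Vector]:
    # Stage 1: sample a dense occupancy field over grid**3 cells (flattened)
    # and keep the cells above the threshold as the sparse structure.
    occupancy = euler_sample(structure_velocity, grid ** 3, seed=1)
    active = [i for i, o in enumerate(occupancy) if o > threshold]
    # Stage 2: sample all local latents jointly, conditioned on the active positions,
    # then split the flat vector into one latent per active cell.
    flat = euler_sample(latent_velocity_for(active), len(active) * latent_dim, seed=2)
    return {cell: flat[k * latent_dim:(k + 1) * latent_dim] for k, cell in enumerate(active)}
```

With this single-point oracle the flow path is a straight line, so even coarse Euler steps land on the target; a learned model only approximates such straight paths.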

The reported results show that TRELLIS generates high-quality 3D objects with detailed geometry and vibrant textures, and that it significantly surpasses existing methods in both quality and flexibility. TRELLIS also supports flexible local editing of 3D objects, such as deleting, adding, or replacing regions, guided by text or image prompts.
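
Local editing can be framed as bookkeeping on the sparse latent: keep every active voxel outside the edited region and let a generator fill the region again. The Python sketch below shows only that bookkeeping, under simplifying assumptions: `regenerate` is a hypothetical stand-in for a text- or image-conditioned generator, the axis-aligned box region is our own choice, and this is not a description of how TRELLIS implements editing internally. Deletion corresponds to a generator that returns nothing for the region; adding or replacing corresponds to returning new voxels and latents inside it.

```python
from typing import Callable, Dict, List, Tuple

Voxel = Tuple[int, int, int]
Slat = Dict[Voxel, List[float]]   # active voxel -> local latent
Box = Tuple[Voxel, Voxel]         # inclusive (min corner, max corner)
# Hypothetical conditioned generator: sees the kept context, proposes voxels and latents.
Regenerate = Callable[[Slat], Slat]


def in_box(v: Voxel, box: Box) -> bool:
    lo, hi = box
    return all(l <= c <= h for c, l, h in zip(v, lo, hi))


def edit_region(slat: Slat, box: Box, regenerate: Regenerate) -> Slat:
    """Keep every voxel outside `box` and refill the box with the generator's proposal."""
    kept = {v: z for v, z in slat.items() if not in_box(v, box)}
    proposal = regenerate(kept)
    # Ignore anything proposed outside the region so unedited parts stay untouched.
    refill = {v: z for v, z in proposal.items() if in_box(v, box)}
    return {**kept, **refill}
```

Passing `lambda context: {}` deletes the region, while a generator that returns new voxels inside the box adds or replaces content there; the input dictionary itself is never modified.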

Applications and Future Perspectives

TRELLIS and SLAT have diverse applications. In game development, 3D assets can be generated quickly and efficiently. In product design, prototypes can be created and tested virtually. In architecture, building models could be generated from text descriptions. The free choice of output format allows the approach to adapt to different use cases and workflows.

The development of structured 3D latents and models like TRELLIS represents a significant advance in the field of 3D generation. The combination of sparse structures with dense visual features enables the scalable and versatile generation of high-quality 3D objects. Future research could focus on improving the generation quality, expanding the supported 3D formats, and developing new interactive 3D editing features. The advancements in this area will further simplify the creation of 3D content and open up new possibilities for creative applications.
