WISE Benchmark Evaluates World Knowledge in Text-to-Image Generation

The Importance of World Knowledge for Text-to-Image Generation: A New Benchmark Called WISE

Text-to-image (T2I) models have made remarkable progress in recent years and are capable of generating impressive images from text input. However, the focus of previous research and evaluation has been mainly on the realism of the generated images and the superficial correspondence between text and image. The ability of these models to understand more complex semantic relationships and integrate world knowledge into the generation process has received less attention so far.

To address this gap, researchers have developed WISE (World Knowledge-Informed Semantic Evaluation), a benchmark designed specifically to evaluate how well text-to-image models integrate world knowledge. WISE goes beyond the simple mapping of words to pixels and challenges models with 1000 carefully crafted prompts covering 25 sub-areas across three domains: cultural common sense, spatio-temporal reasoning, and natural science.

The Limits of Conventional Metrics and the Introduction of WiScore

Traditional metrics such as the CLIP score, which are often used to evaluate T2I models, mainly measure the similarity between text and image at a fairly superficial level. They do not capture whether a model understands deeper semantic relationships and world knowledge and incorporates them into the generated image. To address this shortcoming, WISE introduces WiScore, a quantitative metric that specifically evaluates how well the generated image agrees with the world knowledge implied by the prompt.
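The contrast between the two kinds of metric can be sketched in a few lines of Python. The first function follows the commonly used CLIPScore formulation (a cosine similarity between text and image embeddings, clipped at zero and rescaled by 2.5); the embeddings here are toy vectors rather than real CLIP outputs. The second function is only an illustration of a knowledge-weighted composite: the three rubric dimensions, the 0 to 2 rating scale, and the weights 0.7/0.2/0.1 are assumptions that this article does not state, so the WISE paper should be consulted for the actual WiScore definition.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_score(text_embedding, image_embedding, weight=2.5):
    """CLIPScore-style metric: rescaled cosine similarity, clipped at zero.

    It only measures how close the two embeddings are; it has no notion of
    whether the image is factually correct for the prompt.
    """
    similarity = cosine_similarity(text_embedding, image_embedding)
    return weight * max(similarity, 0.0)


def knowledge_weighted_score(consistency, realism, aesthetics,
                             weights=(0.7, 0.2, 0.1), max_rating=2):
    """Hypothetical composite that weights knowledge consistency most heavily.

    Assumption: each dimension is rated on a 0..max_rating scale by a judge,
    and the weights sum to 1. The result is normalized to the range 0..1.
    """
    w_consistency, w_realism, w_aesthetics = weights
    total = (w_consistency * consistency
             + w_realism * realism
             + w_aesthetics * aesthetics)
    return total / max_rating


if __name__ == "__main__":
    # Toy embeddings, chosen by hand for illustration only.
    print(clip_style_score([1.0, 0.0], [1.0, 0.0]))
    # A realistic, attractive image that gets the underlying fact wrong
    # still scores low under the knowledge-weighted composite.
    print(knowledge_weighted_score(consistency=0, realism=2, aesthetics=2))
```

Under these assumed weights, an image that looks realistic and attractive but contradicts the knowledge behind the prompt still receives a low score, which is the behavior a similarity-only metric cannot provide.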

Comprehensive Tests and Results

As part of the development of WISE, 20 models were tested: 10 dedicated T2I models and 10 unified multimodal models. Each model was given the 1000 structured prompts from the 25 sub-areas. The results showed that current T2I models still have considerable difficulty integrating and applying world knowledge during generation. This highlights the need for further research in this area to improve the next generation of T2I models.
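An evaluation of this shape (many models, each scored on prompts grouped into sub-areas) is typically summarized by averaging per-prompt scores within each sub-area. The following is a minimal sketch of that aggregation step, not the authors' evaluation code; the record layout (`model`, `category`, `score`) and the example model and category names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_category(results):
    """Average per-prompt scores for each (model, category) pair.

    `results` is a list of dicts with the hypothetical keys
    "model", "category", and "score" (one record per generated image).
    Returns {model: {category: mean_score}}.
    """
    grouped = defaultdict(list)
    for record in results:
        grouped[(record["model"], record["category"])].append(record["score"])

    summary = defaultdict(dict)
    for (model, category), scores in grouped.items():
        summary[model][category] = mean(scores)
    return {model: dict(categories) for model, categories in summary.items()}


def overall_score(category_means):
    """Unweighted mean over a model's category averages (an assumption;
    a real benchmark may weight categories by their number of prompts)."""
    return mean(category_means.values())


if __name__ == "__main__":
    # Made-up scores for two placeholder models, for illustration only.
    example = [
        {"model": "model_a", "category": "culture", "score": 0.5},
        {"model": "model_a", "category": "culture", "score": 1.0},
        {"model": "model_a", "category": "physics", "score": 0.25},
        {"model": "model_b", "category": "culture", "score": 0.0},
    ]
    summary = summarize_by_category(example)
    for model, category_means in summary.items():
        print(model, category_means, overall_score(category_means))
```

Reporting per-category averages alongside an overall score makes it visible when a model handles one knowledge domain well but fails in another.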

Outlook and Significance for the Future of T2I Models

WISE provides an important foundation for the future development and evaluation of T2I models. By focusing on the integration of world knowledge, WISE enables a more nuanced assessment of these models' capabilities and identifies areas where improvement is needed. As a specialized metric, WiScore helps extend the evaluation of T2I models beyond mere image quality and emphasizes the importance of semantic understanding and knowledge integration.

The results of the WISE benchmark tests show that integrating world knowledge is a central challenge that must be addressed in order to build more powerful and versatile T2I models. With WISE and WiScore, researchers and developers have tools for measuring progress on this problem.

The Most Important Findings of WISE at a Glance:

- Focus on world knowledge in text-to-image generation
- 1000 carefully crafted prompts in 25 sub-areas
- WiScore: a new metric for evaluating knowledge integration
- Tests with 20 different T2I and multimodal models
- Identification of weaknesses in knowledge integration

Bibliography:

Niu, Y. et al. (2025). WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation. arXiv preprint arXiv:2503.07265.
Offline Evaluation of Set-Based Text-to-Image Generation. ResearchGate.
Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge. Advances in Neural Information Processing Systems.
Interactive Visual Assessment for Text-to-Image Generation Models. ResearchGate.
CVPR 2024 Accepted Papers. The IEEE/CVF Computer Vision and Pattern Recognition Conference.
ScienceDirect - Journal of Multivariate Analysis.
Transactions on Machine Learning Research.
NeurIPS 2024 Papers.