Evaluating Multimodal Generative AI with the Korean National Educational Test Benchmark (KoNET)


Generative AI systems are developing rapidly and are increasingly capable of complex tasks once reserved for human intelligence. To explore their potential in education and to measure their performance objectively, standardized evaluation criteria are needed. In this context, benchmarks built on established educational standards are gaining importance.

A promising approach in this direction is the Korean National Educational Test Benchmark (KoNET). KoNET uses Korea's national educational tests to evaluate the performance of multimodal generative AI systems. These tests are rigorous and diverse in their questions, providing a solid basis for a comprehensive analysis of AI performance.

The Structure of KoNET

KoNET comprises four examinations covering distinct educational levels:

The Korean Elementary General Educational Development Test (KoEGED) for elementary school, the Korean Middle General Educational Development Test (KoMGED) for middle school, the Korean High General Educational Development Test (KoHGED) for high school, and finally the Korean College Scholastic Ability Test (KoCSAT), which assesses college readiness. This tiered structure allows for the analysis of the performance of AI systems across different age groups and difficulty levels.
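The four tiers can be represented as a simple lookup table. The exam names and levels below come from the benchmark description; the dictionary layout and the helper function are an illustrative sketch, not KoNET's official dataset schema.

```python
# Illustrative sketch of KoNET's four-tier structure. Exam IDs, names, and
# levels follow the benchmark description; the dict layout and the helper
# below are hypothetical conveniences, not the official schema.
KONET_EXAMS = {
    "KoEGED": {"name": "Korean Elementary General Educational Development Test",
               "level": "elementary school"},
    "KoMGED": {"name": "Korean Middle General Educational Development Test",
               "level": "middle school"},
    "KoHGED": {"name": "Korean High General Educational Development Test",
               "level": "high school"},
    "KoCSAT": {"name": "Korean College Scholastic Ability Test",
               "level": "college readiness"},
}

# Difficulty ordering, easiest first (insertion order of KONET_EXAMS).
LEVEL_ORDER = ["elementary school", "middle school", "high school", "college readiness"]

def exams_up_to(level: str) -> list[str]:
    """Return exam IDs from the easiest tier up to the given level (hypothetical helper)."""
    cutoff = LEVEL_ORDER.index(level)
    return [eid for eid, info in KONET_EXAMS.items()
            if LEVEL_ORDER.index(info["level"]) <= cutoff]
```

Ordering the exams this way makes it easy to run a model against progressively harder tiers, e.g. `exams_up_to("high school")` selects the three GED-style exams while excluding KoCSAT.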

The Significance of KoNET for AI Research

KoNET offers valuable insights into the performance of AI models in less-studied languages such as Korean. Because it is grounded in an established education system, its results can be compared and interpreted internationally. The evaluation covers open-source and open-access models as well as closed commercial APIs, and considers aspects such as task difficulty, subject diversity, and human error rates.
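The comparison of model accuracy against human error rates per exam can be sketched as follows. The record format (`(exam_id, is_correct)` pairs) and both function names are hypothetical, since the source does not specify KoNET's result schema.

```python
from collections import defaultdict

def accuracy_by_exam(graded):
    """Aggregate per-question correctness into a per-exam accuracy score.

    `graded` is a list of (exam_id, is_correct) pairs — a hypothetical
    record format, since the source does not specify KoNET's output schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for exam_id, is_correct in graded:
        total[exam_id] += 1
        correct[exam_id] += int(is_correct)
    return {eid: correct[eid] / total[eid] for eid in total}

def model_vs_human(model_acc, human_error_rate):
    """Compare model error (1 - accuracy) with the human error rate per exam.

    A positive value means the model errs more often than human test-takers
    on that exam; a negative value means it errs less often.
    """
    return {eid: (1.0 - model_acc[eid]) - human_error_rate[eid]
            for eid in model_acc if eid in human_error_rate}
```

This kind of per-exam breakdown is what makes a tiered benchmark informative: a model may match human error rates on the elementary tier yet fall well behind on KoCSAT.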

Open Access to KoNET

A key advantage of KoNET is the planned open-source release of the code and the dataset builder on GitHub. This allows other researchers to use the benchmark data and conduct their own experiments. Such transparency and accessibility promote collaboration and exchange within the AI community and accelerate the further development of generative AI systems.

Outlook

KoNET represents an important step toward the standardized evaluation of multimodal generative AI systems. By building on established educational tests, it offers a solid foundation for the objective measurement of AI performance, and the release of the code and data enables wide usage and contributes to further AI research. Future work could expand the benchmark to other languages and educational systems to gain an even more comprehensive understanding of the capabilities of generative AI systems.

Bibliography: Park, S., & Kim, G. (2025). Evaluating Multimodal Generative AI with Korean Educational Standards. arXiv preprint arXiv:2502.15422. https://arxiv.org/abs/2502.15422