Human Perception of Uncertainty in Large Language Models

The Uncertainty of Large Language Models as Reflected in Human Perception
Large language models (LLMs) have made enormous progress in recent years and have become an integral part of many applications, from chatbots and automated text generation to research tools. As these models grow more capable, however, questions about their reliability and how to assess their uncertainty are moving into focus. A deeper understanding of LLM uncertainty is crucial both for improving control over the models and for strengthening user trust.
Current research is working intensively on quantifying the uncertainty of LLMs. The approaches being pursued range from theoretically grounded measures to observations of model behavior. One promising line of work focuses on how well the uncertainty of LLMs correlates with human perception of uncertainty.
A recently published study (Moore et al., 2025) investigates a range of uncertainty measures to identify those that correlate with human uncertainty at the group level. The results show that Bayesian measures and a variant of entropy, top-k entropy, tend to align with human behavior depending on model size. Interestingly, some otherwise strong measures become less similar to human uncertainty as model size increases.
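To make the top-k entropy measure concrete, the following Python sketch computes it from a model's next-token probability distribution. This is a minimal illustration under assumptions: the example probability vector and the choice of k are made up for demonstration and are not taken from the study.

```python
import numpy as np

def top_k_entropy(probs, k=10):
    """Shannon entropy over the k most probable next tokens (renormalized).

    probs: 1-D array of next-token probabilities from the model.
    k:     number of top tokens to keep (illustrative default).
    """
    top = np.sort(probs)[::-1][:k]        # k largest probabilities
    top = top / top.sum()                 # renormalize to a distribution
    return float(-(top * np.log(top + 1e-12)).sum())

# Illustrative distribution: one dominant token -> low top-k entropy.
probs = np.array([0.70, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01])
print(top_k_entropy(probs, k=5))
```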
Using multiple linear regression, the study further shows that combining several uncertainty measures achieves a comparable match with human uncertainty while reducing the dependence on model size. This finding opens up new possibilities for developing more robust and reliable LLMs.
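The kind of combination described above can be illustrated with an ordinary least-squares fit. The sketch below is purely illustrative: the three feature columns stand in for hypothetical per-item uncertainty measures, and the synthetic data and weights do not reproduce the study's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows = prompts; columns = hypothetical model-side uncertainty measures
# (e.g. top-k entropy, a Bayesian measure, verbalized confidence).
rng = np.random.default_rng(0)
X = rng.random((50, 3))

# Synthetic "human uncertainty" target, generated only for this example.
human_uncertainty = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 50)

reg = LinearRegression().fit(X, human_uncertainty)
print("R^2:", reg.score(X, human_uncertainty))   # fit quality of the combination
print("weights:", reg.coef_)                     # contribution of each measure
```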
The Significance for AI Applications
The findings of this research are particularly relevant for companies like Mindverse that specialize in the development and implementation of AI solutions. A better understanding of LLM uncertainty enables the development of tailored solutions such as chatbots, voicebots, AI search engines, and knowledge systems that are more precise, reliable, and trustworthy.
The ability to assess the uncertainty of an LLM can, for example, be used to qualify the responses of a chatbot. If the model is uncertain about a particular query, it could inform the user or request additional information instead of providing a potentially incorrect answer. This improves the user experience and increases trust in the AI solution.
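As a rough sketch of how such qualification could work in practice, the function below only returns the model's answer when an uncertainty score falls below a threshold. Here `generate`, `uncertainty`, and the threshold value are hypothetical placeholders, not part of any specific product or of the study discussed above.

```python
def answer_with_fallback(question, generate, uncertainty, threshold=0.7):
    """Answer only when the model's uncertainty is below a threshold.

    `generate` and `uncertainty` are hypothetical callables wrapping an LLM:
    generate(question) -> answer text, uncertainty(question) -> score in [0, 1].
    The threshold of 0.7 is illustrative, not a recommendation.
    """
    if uncertainty(question) > threshold:
        # Fall back instead of risking a confidently wrong answer.
        return ("I am not confident about this answer. Could you rephrase the "
                "question or provide more context?")
    return generate(question)

# Illustrative usage with stand-in callables.
print(answer_with_fallback(
    "What is the capital of France?",
    generate=lambda q: "Paris.",
    uncertainty=lambda q: 0.1,
))
```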
Furthermore, uncertainty quantification plays an important role in the development of AI systems for critical applications, such as in the medical field. Here, it is essential that the system is aware of its uncertainty and communicates it transparently to avoid incorrect decisions.
Outlook
Research on the uncertainty of LLMs is a dynamic field with great potential. Future studies could focus on developing new uncertainty measures that correlate even more closely with human perception. Investigating how different training data and training methods affect LLM uncertainty is another promising direction. The results of this research will help make the next generation of AI systems even more capable, reliable, and trustworthy.
Bibliography
Moore, K., Roberts, J., Watson, D., & Wisniewski, P. (2025). Investigating Human-Aligned Large Language Model Uncertainty. arXiv preprint arXiv:2503.12528.
Desai, S., & Durrett, G. (2023). Calibration of Large Language Models for Open-Ended Text Generation. arXiv preprint arXiv:2310.11732.
Lee, J., Wallace, E., & Singh, S. (2025). Uncertainty-Aware Prompting for Large Language Models. arXiv preprint arXiv:2502.10709.
Nguyen, V., Nguyen, T., Tang, H., & Phung, D. (2024). Calibrating Large Language Models with Holistic Feedback. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 7970–7985.
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., ... & Amodei, D. (2021). Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
Perez, E., Ringer, S., Lukosiute, K., Nguyen, K., Chen, E., Heiner, S., ... & Irving, G. (2024). Discovering Language Model Behaviors with Model-Written Evaluations. arXiv preprint arXiv:2409.00621.
Finlayson, M. A., Bowers, J., Guha, A., Plank, B., & Jurafsky, D. (2024). Can LLMs Express Uncertainty? An Empirical Evaluation of Uncertainty Measures. arXiv preprint arXiv:2405.14758.
Amini, A., Gabriel, R. A., Lin, P., Nguyen, R., Johnson, N., Ramesh, A., ... & Schulman, J. (2022). Math word problem solving with transformers through decomposition and symbolic reasoning. arXiv preprint arXiv:2212.03024.
Lai, F., Zhou, Y., Zhang, J., Chen, Y., Liu, Y., & Liu, S. (2024). Less is More: Deliberate Reasoning with Uncertainty Estimation Improves Factuality, Calibration, and Out-of-Distribution Robustness in Large Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 7881–7904.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2023). Aligning language models with human values. arXiv preprint arXiv:2302.02921.
Bevendorff, J., Benton, A., Finlayson, M. A., Guha, A., Kummerfeld, J. K., & Plank, B. (2024). Survey on Uncertainty Estimation in Large Language Models: Sources, Methods, Applications, Challenges. arXiv preprint arXiv:2412.14591.