Relative Confidence Estimation Improves Reliability of Language Models

Language Models and their Reliability: Focus on Relative Confidence Estimation
Language models (LMs) have made enormous progress in recent years and are being used in a growing number of areas. A crucial factor for their successful deployment is the reliability of their statements. Users need to be able to recognize when an LM makes a mistake and when it makes sense to consult human expertise. A promising approach to assessing reliability is confidence estimation: the model's ability to judge how certain it is of its own answers.
A common method is absolute confidence estimation, in which the model rates its confidence on a scale from 0 to 1. In practice, however, LMs struggle to estimate their confidence in absolute terms: the resulting scores are often inaccurate and offer only limited value for judging whether an answer is correct. A further drawback is that the ratings tend to be coarse-grained, which makes it hard to differentiate between questions.
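To make this concrete, here is a minimal sketch of how an absolute confidence score might be elicited; the prompt wording and the parsing format are illustrative assumptions, not the exact setup used in the study:

```python
# Sketch of absolute confidence elicitation. The prompt text and the
# "Answer: ... | Confidence: ..." format are illustrative assumptions.

def absolute_confidence_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Answer the question, then rate how confident you are that your "
        "answer is correct as a number between 0.0 and 1.0.\n"
        "Format: Answer: <answer> | Confidence: <score>"
    )

def parse_absolute_confidence(model_output: str) -> float:
    # Expects the "Confidence: <score>" suffix produced by the prompt above.
    return float(model_output.rsplit("Confidence:", 1)[-1].strip())
```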
A newer line of research therefore pursues relative confidence estimation. Questions are presented to the model in pairs, and it must decide which of the two it is more confident it can answer correctly. This relative judgment sidesteps the problems of absolute estimation and allows reliability to be gauged more precisely.
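A pairwise query can be sketched in much the same way; again, the prompt text and the "A"/"B" answer format are assumptions for illustration rather than the study's exact wording:

```python
# Sketch of a pairwise (relative) confidence query: the model sees two
# questions and picks the one it is more confident it could answer correctly.

def relative_confidence_prompt(question_a: str, question_b: str) -> str:
    return (
        "You will see two questions. Decide which one you are more "
        "confident you could answer correctly. Reply with 'A' or 'B'.\n"
        f"A: {question_a}\n"
        f"B: {question_b}"
    )

def parse_preference(model_output: str) -> str:
    # Returns "A" or "B"; ties and refusals would need extra handling in practice.
    answer = model_output.strip().upper()
    return "A" if answer.startswith("A") else "B"
```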
Relative confidence estimation works much like a tournament. Each question represents a "player," and the model's preferences correspond to the results of the individual "matches." Ranking algorithms such as Elo rating or the Bradley-Terry model then translate these "match results" into confidence scores, producing a ranking of the questions sorted by how confident the model is in its answers.
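The sketch below shows how such "match results" could be aggregated with a simple Elo update. It is a generic implementation of the idea, not necessarily the exact rank-aggregation setup used in the paper; fitting a Bradley-Terry model would be a drop-in alternative:

```python
# Minimal Elo-style aggregation of pairwise preferences into per-question
# confidence scores.
from collections import defaultdict

def elo_scores(matches, k=32, base=400.0, init=1000.0):
    """matches: iterable of (winner_question_id, loser_question_id) pairs."""
    ratings = defaultdict(lambda: init)
    for winner, loser in matches:
        # Expected win probability of `winner` under the logistic Elo model.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / base))
        ratings[winner] += k * (1.0 - expected)
        ratings[loser] -= k * (1.0 - expected)
    return dict(ratings)

# Example: question q1 was preferred over q2 and q3; q2 was preferred over q3.
print(elo_scores([("q1", "q2"), ("q1", "q3"), ("q2", "q3")]))
```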
Studies have shown that relative confidence estimation leads to more reliable results than the absolute method. In experiments with current LMs such as GPT-4, Gemini 1.5 Pro, and Llama 3.1 405B, relative confidence estimation was compared with absolute confidence estimation and self-consistency methods across 14 challenging tasks from STEM, the social sciences, and commonsense reasoning. Relative confidence estimation consistently performed best: compared to direct absolute confidence estimation, it improved selective-classification AUC (area under the ROC curve) by 3.5% on average, and compared to self-consistency approaches, the gain averaged 1.7%.
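For context, this metric is straightforward to compute once each question has a confidence score and a correctness label for the model's answer; the sketch below uses scikit-learn, and the data values are made up purely for illustration:

```python
# AUC measures how well the confidence scores separate correctly answered
# questions from incorrectly answered ones. Values here are illustrative.
from sklearn.metrics import roc_auc_score

correct = [1, 0, 1, 1, 0, 1]                   # 1 = model answered correctly
confidence = [0.9, 0.4, 0.7, 0.8, 0.55, 0.6]   # e.g. Elo scores rescaled to [0, 1]

print(f"AUC: {roc_auc_score(correct, confidence):.3f}")
```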
For companies like Mindverse, which develop AI-powered content solutions, these findings are of great importance. Integrating relative confidence estimation into tools for text generation, chatbots, knowledge bases, and AI search engines can significantly increase the reliability and transparency of these systems. Users benefit from more precise information about the certainty of the AI-generated content and can make more informed decisions.
Research on confidence estimation in language models is still ongoing, but the results of relative confidence estimation so far are promising. It offers the potential to further improve the usability and trust in AI systems, thus paving the way for innovative applications in various fields.
Bibliography:
https://www.arxiv.org/abs/2502.01126
https://arxiv.org/html/2502.01126v1
https://paperreading.club/page?id=281736
https://aclanthology.org/2023.trustnlp-1.28.pdf
https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
https://aclanthology.org/2023.findings-acl.847.pdf
https://openreview.net/forum?id=bxfKIYfHyx
https://pmc.ncbi.nlm.nih.gov/articles/PMC9528860/
https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf