Managing Contradictions in Retrieval-Augmented Generation

Retrieval-Augmented Generation: Challenges and New Approaches in Dealing with Conflicting Information
Retrieval-Augmented Generation (RAG) has established itself as a promising method for improving the accuracy and factuality of Large Language Models (LLMs). By incorporating external information sources, LLMs can draw on a broader knowledge base and generate more comprehensive and precise answers. In practice, however, RAG systems are often confronted with ambiguous user queries and potentially conflicting information from different sources, while also having to filter out inaccurate information from noisy or irrelevant documents. Previous research has mostly addressed these challenges in isolation, focusing on a single aspect at a time, such as handling ambiguity or ensuring robustness against misinformation.
The recently introduced RAMDocs dataset (Retrieval with Ambiguity and Misinformation in Documents) and the associated MADAM-RAG framework enable a new approach that considers multiple factors simultaneously. RAMDocs simulates complex and realistic scenarios of conflicting evidence for a user query, including ambiguity, misinformation, and noise. MADAM-RAG is based on a multi-agent approach in which LLM agents discuss the merits of candidate answers over multiple rounds. An aggregator then consolidates the answers that refer to distinct valid entities and discards misinformation and noise. In this way, different sources of conflict can be addressed jointly.
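To make the mechanism concrete, here is a minimal sketch of such a multi-agent debate with aggregation. This is an illustrative toy, not the actual MADAM-RAG implementation: the `agent` callable stands in for an LLM prompted with one retrieved document (plus its peers' previous answers in later rounds), and the aggregator here uses a simple support threshold to discard answers backed by only a single source, treating them as likely noise or misinformation.

```python
from collections import Counter
from typing import Callable, List

def debate(documents: List[str],
           agent: Callable[[str, List[str]], str],
           rounds: int = 2) -> List[str]:
    """Multi-agent discussion sketch: each agent answers from one document;
    in later rounds it also sees the other agents' previous answers and
    may revise its own (a real system would prompt an LLM here)."""
    answers = [agent(doc, []) for doc in documents]
    for _ in range(rounds - 1):
        answers = [
            agent(doc, [a for j, a in enumerate(answers) if j != i])
            for i, doc in enumerate(documents)
        ]
    return answers

def aggregate(answers: List[str], min_support: int = 2) -> List[str]:
    """Keep every distinct answer with enough independent support;
    answers backed by a single agent are discarded as noise/misinformation.
    (Hypothetical heuristic; the paper's aggregator is itself an LLM.)"""
    counts = Counter(answers)
    return sorted(a for a, c in counts.items() if c >= min_support)

# Toy agent: the "answer" is simply the claim stated in its document.
def toy_agent(document: str, peer_answers: List[str]) -> str:
    return document.split(":", 1)[1].strip()

docs = [
    "doc1: Paris",  # well-supported answer
    "doc2: Paris",
    "doc3: Lyon",   # misinformation from a single noisy source
]
final = aggregate(debate(docs, toy_agent))  # the singleton "Lyon" is dropped
```

Note that the aggregator returns a list rather than a single answer: for a genuinely ambiguous query, several distinct answers can each have sufficient support, and all of them should be presented.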
The effectiveness of MADAM-RAG has been demonstrated on established benchmarks such as AmbigDocs and FaithEval. AmbigDocs requires presenting all valid answers to ambiguous queries, while FaithEval focuses on suppressing misinformation. The results show that MADAM-RAG achieves significant improvements over established RAG baselines with both closed-source and open-source models: on AmbigDocs, gains of up to 11.40% were achieved, and on FaithEval with Llama3.3-70B-Instruct, up to 15.80% absolute improvement.
RAMDocs remains a significant challenge for existing RAG baselines: Llama3.3-70B-Instruct, for instance, achieves an exact-match score of only 32.60%. Although MADAM-RAG represents an important step toward handling conflicting information, analyses show considerable room for improvement, particularly as the imbalance between supporting evidence and misinformation grows.
Research in the field of RAG is increasingly focusing on the development of more robust and reliable systems that can deliver accurate and fact-based results even in complex information landscapes. The consideration of ambiguities, misinformation, and noise in training data and the development of conflict resolution strategies are central challenges. Multi-agent approaches like MADAM-RAG offer promising possibilities to overcome these challenges and further enhance the performance of RAG systems. The development of effective evaluation metrics that adequately capture the ability of RAG systems to handle conflicting information remains an important aspect of future research.