Cooperative AI: Multi-Agent LLM Training Improves Reasoning Abilities

Large language models (LLMs) have made impressive strides in natural language processing in recent years. However, they are typically deployed as individual models, with their outputs reviewed and refined by humans. The potential of jointly trained, collaborative models has remained largely unexplored: while multi-agent communication and debate have shown promising results, little progress has been made in actually training models to work together on tasks.

A new research approach called "Multi-Agent LLM Training" (MALT) aims to improve the reasoning abilities of LLMs through the collaboration of multiple specialized agents. At its core, MALT assigns different LLMs specific roles and has them solve problems in a sequential pipeline: a generator model first drafts solution proposals, a verifier model checks their validity, and a refiner model improves the proposals based on the verifier's feedback.
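To make the pipeline concrete, the following is a minimal Python sketch of one generator-verifier-refiner pass. The `query_model` helper and the role prompts are assumptions for illustration, not interfaces taken from the paper.

```python
# A minimal sketch of one generator-verifier-refiner pass, assuming a
# generic chat-completion backend. `query_model` and the role prompts
# are illustrative placeholders, not interfaces from the paper.

def query_model(role_prompt: str, content: str) -> str:
    """Placeholder for a call to an LLM endpoint (e.g., Llama 3.1 8B)."""
    raise NotImplementedError("Wire this to your inference backend.")

def malt_pass(problem: str) -> str:
    # 1. Generator proposes an initial solution.
    draft = query_model(
        "You are a generator. Solve the problem step by step.", problem
    )
    # 2. Verifier critiques the draft and flags errors.
    critique = query_model(
        "You are a verifier. Check each step of the solution for correctness.",
        f"Problem: {problem}\nProposed solution: {draft}",
    )
    # 3. Refiner produces the final answer from the draft and the critique.
    return query_model(
        "You are a refiner. Rewrite the solution, fixing the flagged issues.",
        f"Problem: {problem}\nDraft: {draft}\nCritique: {critique}",
    )
```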

This pipeline is supported by a novel synthetic data generation scheme based on trajectory expansion, which allows the system to learn from both successful and failed solution attempts. A reward signal based on the joint outcome governs how credit is assigned to the individual agents, so the specialized skills of each model can be improved autonomously within the shared sequential system.
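One plausible reading of this scheme, sketched below, is to branch several samples at each role, grade the final answers against the ground truth, and score each intermediate step by the success rate of the trajectories that pass through it. The `sample` helper, the branching factor, and the exact credit rule are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trajectory:
    draft: str
    critique: str
    answer: str
    correct: bool  # binary outcome reward from comparison with ground truth

def expand_trajectories(problem, truth, sample, n=3):
    """Branch n ways at each role, yielding n**3 trajectories per problem.

    `sample(role, context, n)` is a hypothetical helper returning n
    sampled completions from the model playing `role`.
    """
    trajectories = []
    for draft in sample("generator", problem, n):
        for critique in sample("verifier", (problem, draft), n):
            for answer in sample("refiner", (problem, draft, critique), n):
                trajectories.append(
                    Trajectory(draft, critique, answer, answer == truth)
                )
    return trajectories

def step_values(trajectories):
    """Score each generator/verifier step by the success rate of the
    trajectories passing through it, a simple proxy for per-agent credit."""
    wins, total = defaultdict(int), defaultdict(int)
    for t in trajectories:
        for key in (("draft", t.draft), ("critique", (t.draft, t.critique))):
            wins[key] += t.correct
            total[key] += 1
    return {key: wins[key] / total[key] for key in total}
```

Steps whose success rate clears a chosen threshold could then serve as positive training examples for the corresponding agent, with the rest as negatives, which is one way a joint outcome reward can be distributed over a sequential system.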

Initial Successes and Future Prospects

Initial tests of MALT with Llama 3.1 8B models on datasets for mathematical reasoning (MATH), grade-school math word problems (GSM8K), and commonsense question answering (CSQA) show promising results. MALT achieved relative improvements of 14.14% on MATH, 7.12% on GSM8K, and 9.40% on CSQA compared to the same base model.
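For clarity, the relative improvement reported above is the gain divided by the baseline score:

```latex
\text{relative improvement} = \frac{\mathrm{acc}_{\mathrm{MALT}} - \mathrm{acc}_{\mathrm{base}}}{\mathrm{acc}_{\mathrm{base}}}
```

For an illustrative baseline accuracy of 50% on MATH (an assumed figure, not one reported here), a 14.14% relative improvement would correspond to roughly 57.1% absolute accuracy.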

These results suggest early progress in the cooperative performance of multi-agent systems in mathematical and common-sense reasoning tasks. MALT offers a concrete research direction for multi-agent LLM training approaches and could pave the way for more autonomous and powerful AI systems. The ability of LLMs to verify both facts and conclusions plays a crucial role, enabling more robust evaluation and filtering of proposed solutions.
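As a minimal illustration of such verifier-driven filtering, consider best-of-n selection: sample several candidate solutions and keep the one the verifier scores highest. The `score` helper below is an assumed interface, not one from the paper.

```python
def best_of_n(problem: str, candidates: list[str], score) -> str:
    """Return the candidate the verifier rates highest.

    `score(problem, candidate)` is a hypothetical helper that returns a
    scalar confidence from the verifier model.
    """
    return max(candidates, key=lambda c: score(problem, c))
```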

Further research is necessary to investigate the scalability and applicability of MALT to more complex problems. In particular, fine-tuning the interaction between agents, optimizing reward mechanisms, and developing efficient training methods are important areas for future work.

The development of MALT is situated within a broader research landscape that explores various approaches to improving LLMs. These include prompting techniques, self-consistency methods, and the integration of external knowledge. MALT represents a complementary approach that leverages the potential of collaboration between specialized LLMs to enhance the reasoning and problem-solving abilities of AI systems.
