Enhancing Reasoning Abilities in Large Language Models through Reinforcement Learning and Tool Use

Strengthening Thought: Advances in R1-like Reasoning in Large Language Models
The development of AI systems capable of complex reasoning is a central concern of current research. A promising approach focuses on so-called "Slow-Thinking" models, which, in contrast to fast, intuitive responses, work through problems deliberately and step by step. A recent research report from the STILL project highlights advances in this area and demonstrates how reinforcement learning (RL) and tool use can significantly enhance the capabilities of large language models (LLMs) in R1-like reasoning, i.e., the long, deliberate chain-of-thought reasoning exemplified by DeepSeek-R1.
Reinforcement Learning as a Key Technology
The report emphasizes the importance of RL training for the development of Slow-Thinking models. Systematic experiments varying key factors in RL training, applied to both base models and fine-tuned models, illustrate the positive effect of the method. For example, RL training consistently improved the performance of the Qwen2.5-32B base model, reflected both in longer responses and in higher test accuracy.
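To make this concrete, the following is a minimal sketch of the kind of verifiable, rule-based reward and group-relative advantage computation commonly used in such RL setups (in the style of GRPO). The \boxed{} answer convention and the helper names are illustrative assumptions, not details taken from the report.

```python
import re

def extract_boxed_answer(completion: str) -> str | None:
    # Hypothetical helper: pull the content of the last \boxed{...} span,
    # assuming the model is prompted to box its final answer.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def rule_based_reward(completion: str, reference_answer: str) -> float:
    # +1 for a correct final answer, 0 otherwise: the kind of verifiable,
    # rule-based signal commonly used for RL on math reasoning tasks.
    predicted = extract_boxed_answer(completion)
    return 1.0 if predicted == reference_answer.strip() else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    # Group-relative advantages (GRPO-style): score each sampled completion
    # against the group mean so the policy gradient favors better samples.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

The appeal of such a reward is that it requires no learned reward model: correctness of a math answer can be checked mechanically, which keeps the RL signal cheap and hard to game.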
Particularly noteworthy is the finding that even already strong models such as DeepSeek-R1-Distill-Qwen-1.5B can be further optimized through RL training. In this case, the model reached an accuracy of 39.33% after training on AIME 2024, a competition-mathematics benchmark used to evaluate mathematical and logical reasoning.
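As an aside on how such figures arise: AIME 2024 comprises 30 problems, so a single greedy run produces scores in multiples of 1/30 (e.g., 26/30 = 86.67%), while fractional values such as 39.33% typically come from averaging pass rates over several sampled runs per problem. The sketch below shows that common avg@k protocol; whether the report uses exactly this scheme is an assumption.

```python
def avg_at_k_accuracy(per_problem_passes: list[list[bool]]) -> float:
    # per_problem_passes[i] holds k pass/fail outcomes for problem i.
    # Average the per-problem pass rates, then average across problems.
    rates = [sum(passes) / len(passes) for passes in per_problem_passes]
    return sum(rates) / len(rates)

# Hypothetical example: 3 problems, 4 sampled runs each.
outcomes = [
    [True, True, False, True],    # solved in 3 of 4 runs
    [False, False, False, False], # never solved
    [True, True, True, True],     # always solved
]
print(f"avg@4 accuracy: {avg_at_k_accuracy(outcomes):.2%}")  # 58.33%
```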
Tool Use for Enhanced Performance
In addition to RL training, the STILL project also investigated tool use as a way to improve the reasoning abilities of LLMs. The results show that integrating tools markedly increases performance in R1-like reasoning: with a greedy search strategy, the tool-augmented approach reached an impressive accuracy of 86.67% on AIME 2024. This underscores the potential of tools to extend the capabilities of LLMs in complex reasoning.
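The following is a minimal sketch of what such a tool-integrated reasoning loop can look like: the model alternates between generating text and emitting code, the code is executed in a Python interpreter, and the output is fed back into the context until a final answer appears. The generate() callable and the <tool>/<answer> tag conventions are hypothetical stand-ins, not the interface used in the report.

```python
import contextlib
import io
import re

def run_python(code: str) -> str:
    # Execute model-written code and capture stdout as the tool observation.
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue()

def solve_with_tool(generate, question: str, max_turns: int = 8) -> str | None:
    # generate() is a hypothetical greedily decoded LLM call: str -> str.
    context = question
    for _ in range(max_turns):
        step = generate(context)
        context += step
        code = re.search(r"<tool>(.*?)</tool>", step, re.DOTALL)
        if code:
            # The model requested code execution: run it and feed the
            # interpreter output back so the next step can build on it.
            context += f"\nOutput:\n{run_python(code.group(1))}\n"
            continue
        answer = re.search(r"<answer>(.*?)</answer>", step, re.DOTALL)
        if answer:
            return answer.group(1).strip()
    return None  # no final answer within the turn budget
```

The design choice here is to hand off exact computation to the interpreter while the model focuses on planning the solution, which is the usual rationale for tool use in mathematical reasoning.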
Outlook and Significance for the Future of AI
The results presented in the report mark an important step in the development of AI systems with improved reasoning capabilities. The combination of RL training and tool use opens up promising avenues for the future of R1-like reasoning in LLMs. The STILL project's findings deepen the understanding of Slow-Thinking models and help unlock their application potential in many areas. From solving complex mathematical problems to supporting decision-making processes, the ability of AI systems to reason deliberately, step by step, holds enormous potential for future innovations.
The resources developed in the STILL project are publicly available and offer the research community the opportunity to build upon the achieved progress and further advance the development of Slow-Thinking models. This underlines the collaborative nature of AI research and the shared ambition to continuously push the boundaries of what is possible.
Bibliography:
- https://arxiv.org/abs/2503.04548
- https://arxiv.org/html/2503.04548v1
- http://paperreading.club/page?id=289725
- https://github.com/RUCAIBox/Slow_Thinking_with_LLMs
- https://huggingface.co/papers
- https://huggingface.co/papers/2501.12948
- https://papers.cool/arxiv/cs.CL?sort=1
- https://openreview.net/forum?id=1Y5hMMuCFU
- https://d-nb.info/1191912698/34
- https://aclanthology.org/2024.findings-emnlp.206.pdf