Efficient Reasoning with Large Language Models: A Survey

Large language models (LLMs) have demonstrated remarkable capabilities in tackling complex tasks. Advances in large reasoning models (LRMs), such as OpenAI's o1 or DeepSeek-R1, have further enhanced performance in so-called System-2 thinking domains like mathematics and programming. This is achieved through supervised fine-tuning (SFT) and reinforcement learning (RL) techniques that optimize step-by-step reasoning, known as Chain-of-Thought (CoT). While longer CoT sequences often lead to better results, they also cause significant computational overhead due to verbose and redundant outputs, a phenomenon known as "overthinking."
The challenge lies in increasing the efficiency of reasoning without compromising the models' performance. This article provides an overview of current advances in research on efficient reasoning with LLMs. The various approaches can be broadly categorized into three groups:
Model-Based Efficient Reasoning
This approach focuses on optimizing the models themselves, in two main directions: compressing existing reasoning models and directly training efficient ones. Compression aims to reduce model size, and thus computational cost, without significantly impacting performance. Direct training, on the other hand, develops models that are designed for resource-efficient reasoning from the outset.
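One common compression route is knowledge distillation: a smaller student model is trained to match the output distribution of a large reasoning model. The following is a minimal sketch of the standard temperature-softened distillation loss, not the specific method of any paper surveyed here; all function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of raw logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between the softened
    output distributions; the student minimizes this during training."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Identical logits give zero loss; a mismatched student is penalized.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A higher temperature softens both distributions, which exposes the teacher's relative preferences among wrong answers and is widely credited with making distillation effective.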
Inference Output-Based Efficient Reasoning
Here, the focus is on dynamically adapting the reasoning process during inference. By analyzing intermediate results, the system identifies and eliminates redundant or unnecessary steps, for example by terminating the CoT sequence early once a sufficiently confident result has been obtained. The goal is to reduce the number of steps and the length of the output, thereby lowering computational costs.
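The early-termination idea can be sketched as a simple decoding loop that stops as soon as an intermediate answer clears a confidence threshold. This is an illustrative sketch under assumed interfaces: in a real system each `(text, confidence)` pair would come from the model, and the confidence score might be derived from token probabilities or answer consistency.

```python
def early_exit_cot(steps, confidence_threshold=0.9, max_steps=10):
    """Consume (step_text, confidence) pairs and stop as soon as the
    model is sufficiently confident in an intermediate answer.

    Returns the emitted trace, the stop reason, and the step count.
    """
    trace = []
    for i, (text, conf) in enumerate(steps):
        trace.append(text)
        if conf >= confidence_threshold:   # confident enough: exit early
            return trace, "early_exit", i + 1
        if i + 1 >= max_steps:             # hard cap on chain length
            break
    return trace, "max_steps", len(trace)

# Simulated trace: confidence rises as reasoning accumulates, so the
# fourth step is never generated.
simulated = [("step 1", 0.4), ("step 2", 0.7), ("step 3", 0.95), ("step 4", 0.99)]
trace, reason, n_steps = early_exit_cot(simulated)
```

With the simulated scores above, the loop exits after three steps, saving the cost of every step that would have followed.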
Input Prompt-Based Efficient Reasoning
This approach focuses on the design of the input prompts. By adjusting prompt attributes, such as signaling task difficulty or imposing explicit length limits, the efficiency of the reasoning process can be influenced. For example, targeted prompts can encourage the model to generate more concise and less redundant answers, and specific keywords or phrasing can further improve reasoning efficiency.
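A length-limiting prompt can be as simple as prepending a reasoning budget to the question. The helper below is a hypothetical sketch of this pattern; the exact wording is illustrative, not taken from any surveyed paper.

```python
def budgeted_prompt(question, budget_tokens=50):
    """Prepend an instruction asking the model for a concise chain of
    thought capped at roughly `budget_tokens` tokens of reasoning."""
    return (
        f"Solve the problem. Use at most {budget_tokens} tokens "
        f"of reasoning, then state the final answer.\n\n"
        f"Q: {question}"
    )

prompt = budgeted_prompt("What is 17 * 24?", budget_tokens=30)
```

The model is not guaranteed to respect the stated budget, so prompt-based control is typically combined with a hard decoding limit (e.g., a maximum-token setting at inference time).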
In addition to these three main categories, there are other relevant research areas, such as the use of efficient training data, the exploration of the reasoning capabilities of smaller language models, and the development of suitable evaluation methods and benchmarks. Research in this area is dynamic and promising. Efficient reasoning is crucial for the widespread application of LLMs in real-world scenarios, especially in resource-constrained environments.