Boosting Open Source Models for Automated Bug Fixing with SoRFT

AI-Powered Debugging: A New Approach to Improving Open-Source Models

Debugging software is time-consuming and complex. Commercial AI models have achieved considerable success in this area, but they often come with high costs and data privacy concerns. Open-source models are an attractive alternative, yet they have so far lagged in generalization and efficiency. A new research approach called SoRFT (Subtask-oriented Reinforced Fine-Tuning) aims to close this gap and significantly improve the bug-fixing performance of open-source models.

SoRFT: A Two-Stage Training Process

SoRFT breaks bug fixing down into four structured subtasks: locating the affected file, narrowing down the relevant function, identifying the faulty line, and finally generating the corrected code. Training then takes place in two phases:
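The subtask chain can be pictured as a small pipeline in which each step sees the results of the previous ones. The following is only a minimal sketch: the `Model` interface, the subtask names, and the canned `toy_model` answers are hypothetical illustrations, not part of SoRFT itself.

```python
from typing import Callable, Dict, List

# The four subtasks named in the text, in the order they are applied.
SUBTASKS: List[str] = ["file", "function", "line", "edit"]

# Hypothetical model interface: (subtask name, context so far) -> model output.
Model = Callable[[str, Dict[str, str]], str]


def resolve_issue(issue: str, model: Model) -> Dict[str, str]:
    """Run the subtasks in sequence, feeding each result to the next step."""
    context: Dict[str, str] = {"issue": issue}
    for subtask in SUBTASKS:
        context[subtask] = model(subtask, context)
    return context


def toy_model(subtask: str, context: Dict[str, str]) -> str:
    """Stand-in for an LLM call; returns canned answers for illustration."""
    canned = {
        "file": "utils/dates.py",
        "function": "parse_date",
        "line": "42",
        "edit": "return datetime.strptime(text, fmt)",
    }
    return canned[subtask]


result = resolve_issue("parse_date crashes on ISO strings", toy_model)
print(result["file"], result["function"], result["line"])
```

In a real setup, `toy_model` would be replaced by a call to the fine-tuned language model, with one prompt per subtask.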

In the first phase, called "Rejection-Sampled Supervised Fine-Tuning," the model is trained on Chain-of-Thought (CoT) data. Before training, this data is filtered against ground-truth information, and responses that do not match are discarded. As a result, the model learns from relevant and correct examples from the start.
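The filtering step might look roughly like the sketch below. The acceptance rule (keep a response if its answer overlaps the ground truth) and the sample layout with `cot` and `answer` keys are assumptions made for illustration; the text above does not spell out the exact criterion.

```python
from typing import Dict, List, Set


def keep_sample(predicted: Set[str], ground_truth: Set[str]) -> bool:
    """Assumed acceptance rule: keep a sample if it names at least one
    ground-truth item (e.g. a file that the real fix actually touched)."""
    return len(predicted & ground_truth) > 0


def rejection_sample(samples: List[Dict], ground_truth: Set[str]) -> List[Dict]:
    """Drop sampled CoT responses whose final answer misses the ground truth."""
    return [s for s in samples if keep_sample(set(s["answer"]), ground_truth)]


samples = [
    {"cot": "The traceback points at dates.py ...", "answer": ["utils/dates.py"]},
    {"cot": "Probably a config problem ...", "answer": ["settings.py"]},
]
kept = rejection_sample(samples, ground_truth={"utils/dates.py"})
print(len(kept))  # 1
```

Only the surviving samples would then be used as supervised fine-tuning data.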

The second phase uses rule-based reinforcement learning. The model is optimized with the Proximal Policy Optimization (PPO) algorithm, and its rewards are computed by rules that compare the model's output with the ground truth. This allows the model to keep refining its debugging strategies beyond the examples seen in the first phase.
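To make "ground-truth-based rewards" concrete, here is one plausible rule-based reward, sketched under assumptions: it scores the overlap between predicted and ground-truth items with an F-beta score and penalizes unparseable output with -1. Both the choice of F-beta and the penalty value are illustrative assumptions, not details stated in the text above.

```python
from typing import Optional, Set


def f_beta(predicted: Set[str], truth: Set[str], beta: float = 1.0) -> float:
    """F-beta overlap between predicted and ground-truth items."""
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def rule_based_reward(predicted: Optional[Set[str]], truth: Set[str]) -> float:
    """Assumed rule: unparseable output gets -1, otherwise the F-beta score."""
    if predicted is None:  # response could not be parsed into an answer
        return -1.0
    return f_beta(predicted, truth)


truth = {"utils/dates.py"}
exact = rule_based_reward({"utils/dates.py"}, truth)                 # perfect match
noisy = rule_based_reward({"utils/dates.py", "settings.py"}, truth)  # one extra file
broken = rule_based_reward(None, truth)                              # format error
print(exact, noisy, broken)
```

A scalar like this needs no learned reward model; PPO can use it directly as the reward for each sampled response.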

Impressive Results and Improved Generalization

SoRFT was evaluated on the established benchmarks SWE-Bench Verified and SWE-Bench Lite. The results show that SoRFT-trained models achieve state-of-the-art performance among open-source models. For example, SoRFT-Qwen-7B fixed 21.4% of the bugs in SWE-Bench Verified.
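For a sense of scale, the percentage can be turned into a task count. The sketch below assumes the 500 tasks that make up SWE-Bench Verified, a figure not stated above; under that assumption, 21.4% corresponds to 107 fixed bugs.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Share of benchmark tasks fixed, as a percentage with one decimal."""
    return round(100 * resolved / total, 1)


TOTAL_TASKS = 500  # assumed size of SWE-Bench Verified
rate = resolve_rate(107, TOTAL_TASKS)
print(rate)  # 21.4
```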

Furthermore, SoRFT shows improved generalization. The model copes better with previously unseen error types and codebases than conventionally trained models. This is a crucial advantage in practice, as software projects vary widely.

Cost-Effective Alternative to Commercial Solutions

SoRFT offers a promising and cost-effective alternative to commercial AI models for debugging. By leveraging open-source resources and efficient training methods, companies and developers can benefit from the advantages of AI-powered debugging without incurring high costs or data privacy risks.

The research results on SoRFT highlight the potential of open-source models in software development. With training methods like SoRFT, these models can improve further and contribute to greater automation and efficiency in software development.

Bibliography:
- https://huggingface.co/papers/2502.20127
- https://huggingface.co/papers
- https://arxiv.org/abs/2401.08967
- https://neurips.cc/virtual/2024/poster/95369
- https://github.com/azminewasi/Awesome-LLMs-ICLR-24
- https://openreview.net/forum?id=1vDArHJ68h
- https://www.sciencedirect.com/science/article/abs/pii/S0957417423028117
- https://aclanthology.org/volumes/2024.acl-long/
- https://dl.acm.org/doi/pdf/10.1613/jair.1.15278
- https://icml.cc/virtual/2024/session/35596
- https://arxiv.org/abs/2502.20127