Basic Concepts
Integrating a self-driven reasoning augmentation process based on Monte Carlo Tree Search (SRA-MCTS) significantly improves the code-generation capabilities of large language models, particularly on complex problems. It does so by enabling the models to autonomously generate and evaluate diverse reasoning paths.
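The core loop is standard MCTS adapted to reasoning-step generation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `propose_steps` and `score_path` are hypothetical stand-ins for the LLM calls that, in SRA-MCTS, propose candidate next reasoning steps and self-evaluate partial plans.

```python
import math
import random

# Hypothetical stand-ins for the model calls; the real pipeline prompts an
# LLM to propose and score natural-language reasoning steps.
def propose_steps(path, k=3):
    return [f"step{len(path)}.{i}" for i in range(k)]

def score_path(path):
    return random.random()  # stands in for LLM self-evaluation of the partial plan

class Node:
    def __init__(self, path, parent=None):
        self.path = path          # sequence of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated evaluation score

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(iterations=50, max_depth=4):
    root = Node(path=[])
    for _ in range(iterations):
        # 1. Selection: descend by UCT until an unexpanded leaf
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: ask the model for candidate next reasoning steps
        if len(node.path) < max_depth:
            for step in propose_steps(node.path):
                node.children.append(Node(node.path + [step], parent=node))
            node = random.choice(node.children)
        # 3. Evaluation: score the (partial) reasoning path
        reward = score_path(node.path)
        # 4. Backpropagation: push the score up to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited path as the synthesized reasoning plan
    best = max(root.children, key=lambda n: n.visits)
    return best.path

if __name__ == "__main__":
    print(mcts())
```

In the paper's setting, the surviving high-scoring paths become natural-language training data paired with code, rather than being used directly at inference time.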
Statistics
The performance of SRA-MCTS was evaluated on the HumanEval, HumanEval+, MBPP, and MBPP+ benchmarks.
The study used gemma-2-2b-it, Meta-Llama-3.1-8B-Instruct, and Qwen2.5-14B-Instruct as baseline models.
The LeetCode dataset, focusing on medium and hard problems, was used for training.
After decontamination, the training dataset contained approximately 2,000 samples (a sketch of a typical decontamination filter appears after these statistics).
In the 2B model category, SRA-MCTS yielded an average gain of 2 points on HumanEval and HumanEval+ over training on data synthesized by a 70B model.
In the 8B model category, SRA-MCTS showed similar gains of over 2 points.
On the MBPP+ benchmark, the 2B model trained with SRA-MCTS showed a nearly 7-point increase compared to the model trained without natural language data.
For the 8B model, the gap between training with and without natural language data averaged around 7 points on the MBPP benchmarks.
The largest performance gap was observed in the 14B model on MBPP+ at pass@10, a 13-point difference between models trained with and without natural language data (pass@k is computed as sketched below).
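The paper does not spell out its exact filtering procedure, so the decontamination step referenced above is sketched here under a common assumption: drop any training sample that shares an n-gram with a benchmark prompt. The function names and the choice of n are illustrative.

```python
# Minimal decontamination sketch, assuming a simple n-gram overlap test
# against the evaluation benchmarks' prompts.
def ngrams(text, n=10):
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_samples, benchmark_prompts, n=10):
    bench_grams = set()
    for prompt in benchmark_prompts:
        bench_grams |= ngrams(prompt, n)
    # keep only training samples sharing no n-gram with any benchmark prompt
    return [s for s in train_samples if not (ngrams(s, n) & bench_grams)]
```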
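For reference, pass@k metrics such as the pass@10 figure above are conventionally computed with the unbiased estimator from Chen et al. (2021); the concrete n and c values in the usage example are illustrative only.

```python
import math

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021):
    n = total samples generated per problem, c = number that pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 20 generations, 5 correct: probability at least one of 10 passes
print(round(pass_at_k(n=20, c=5, k=10), 3))  # -> 0.984
```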
Quotes
"The experiments conducted by ScaleAI provide concrete experimental validation for previous work on answers that providing LLMs with correct solutions in natural language as a part of the answer, even if incomplete (just 10-20 tokens), can substantially boost the performance on benchmarks."
"This demonstrates that providing solutions to large models can guide and inspire their reasoning process, and the correctness of the solution directly impacts the accuracy of the final result."