Key concepts
This paper introduces MdEval, a new benchmark for evaluating the code debugging capabilities of large language models across 18 programming languages, addressing the limitations of existing benchmarks that primarily focus on Python.
Liu, S., Chai, L., Yang, J., Shi, J., Zhu, H., Wang, L., ... & Li, Z. (2024). MdEval: Massively Multilingual Code Debugging. arXiv preprint arXiv:2411.02310v1.
MdEval is designed to assess the code debugging capabilities of large language models (LLMs) in a multilingual setting. The authors aim to address the shortcomings of existing code debugging benchmarks, which focus mainly on Python and lack diversity in both programming languages and bug types.