Semi-Instruct bridges the gap between Natural-Instruct and Self-Instruct to improve code large language models by converting diverse but improper code into proper instruction-code pairs.
Large language models blur the lines between machine- and human-authored code, but DetectCodeGPT offers a novel method to detect machine-generated code by capturing distinct stylized patterns.
ProCQA introduces a large-scale programming question answering dataset from StackOverflow, improving code retrieval models.
Differentiable programming enables end-to-end differentiation of complex computer programs, allowing for gradient-based optimization of program parameters.
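As a rough illustration of the idea, the sketch below differentiates an ordinary Python function (one that contains a loop) using forward-mode automatic differentiation with dual numbers, then uses the derivative for gradient descent. The `Dual` class, the toy data, and the names `loss_program`, `derivative`, and `fit` are assumptions made for this example only; real differentiable-programming systems are far more general.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one input."""
    val: float
    grad: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


# Hypothetical toy data: the targets follow y = 2 * x.
XS = [1.0, 2.0, 3.0]
YS = [2.0, 4.0, 6.0]


def loss_program(w):
    """An ordinary program with a loop: squared error of the model w * x."""
    total = Dual(0.0)
    for x, y in zip(XS, YS):
        err = w * x - y
        total = total + err * err
    return total


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).grad


def fit(steps=100, lr=0.01, w=0.0):
    """Plain gradient descent on the program's single parameter."""
    for _ in range(steps):
        w -= lr * derivative(loss_program, w)
    return w
```

Calling `fit()` moves `w` toward 2, the value that minimizes the loss on this toy data. Forward mode is used here only because it is short to write; reverse mode is the usual choice when a program has many parameters.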
This work applies large language models (LLMs) to decompilation, introducing the first open-source LLMs dedicated to the task and a benchmark that emphasizes re-compilability and re-executability.
Criticism from junior programmers should be valued for potential insights and learning opportunities, even if it challenges the expertise of senior team members.
The author developed TinyPy Generator, a tool that generates random Python programs using context-free grammars to address challenges in obtaining high-quality code data. The approach ensures correctness, executability, privacy preservation, and low computational cost.
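The general technique of sampling programs from a context-free grammar can be sketched as follows. This is not TinyPy Generator's actual grammar or code: the `GRAMMAR` table, the depth limit, and the function names are hypothetical, chosen so that every generated program is small and runs without errors (variables are initialized first, and division is omitted to avoid division by zero).

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of productions,
# and each production is a list of symbols. Strings that are not keys of
# GRAMMAR are terminals and are emitted verbatim.
GRAMMAR = {
    "PROGRAM": [["INIT", "STMT", "STMT", "STMT"]],
    "INIT": [["a = ", "NUM", "\n", "b = ", "NUM", "\n"]],
    "STMT": [["VAR", " = ", "EXPR", "\n"],
             ["print(", "EXPR", ")\n"]],
    "EXPR": [["TERM"],
             ["TERM", " ", "OP", " ", "TERM"],
             ["(", "EXPR", ") ", "OP", " ", "TERM"]],
    "TERM": [["VAR"], ["NUM"]],
    "VAR": [["a"], ["b"]],
    "OP": [["+"], ["-"], ["*"]],
    "NUM": [[str(d)] for d in range(10)],
}


def expand(symbol, rng, depth=0, max_depth=6):
    """Recursively expand a symbol into program text."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, force the shortest production so that
        # recursive rules such as EXPR always terminate.
        productions = [min(productions, key=len)]
    production = rng.choice(productions)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in production)


def generate_program(seed):
    """Generate one random program; the same seed gives the same program."""
    rng = random.Random(seed)
    return expand("PROGRAM", rng)
```

For example, `print(generate_program(0))` shows one generated program, and `exec(generate_program(0))` runs it. Because the output is derived entirely from the grammar and a seed, it contains no human-written code, which is the sense in which this style of generation avoids privacy concerns.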