The paper introduces eigenpruning, a novel method for improving the performance of large language models (LLMs) on specific tasks. The key insights are:
Existing automated circuit discovery approaches, such as ACDC and Attribution Patching, use "big" nodes (attention heads and MLP layers) in their definitions, which may not capture the true computations in an LLM.
Instead of directly removing edges from the computational graph, eigenpruning removes singular values from weight matrices, which can lead to more natural changes in the model's activation distribution.
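The idea of removing singular values from a weight matrix, rather than cutting edges, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `prune_singular_values`, the random matrix `W`, and the choice of which index to drop are all assumptions made for illustration; the criterion eigenpruning uses to select singular values is not described in this summary and is not reproduced here.

```python
import numpy as np


def prune_singular_values(W, drop_idx):
    """Return a copy of W with the singular values at drop_idx set to zero.

    Hypothetical helper: drop_idx is supplied by the caller, so this sketch
    says nothing about how singular values should be selected.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    S_pruned = S.copy()
    S_pruned[list(drop_idx)] = 0.0
    # Scale the columns of U by the pruned singular values, then recombine.
    return (U * S_pruned) @ Vt


# Toy stand-in for a weight matrix (assumed shape, random values).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# Illustrative choice only: drop the smallest singular value.
W_pruned = prune_singular_values(W, drop_idx=[3])
print(np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_pruned))
```

The pruned matrix keeps the shape of the original, so it can replace the original weights without changing the surrounding computation; only its rank is reduced.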
The eigenpruning method works as follows:
The authors test eigenpruning on two synthetic datasets (integer addition and multiplication) and three tasks from the SuperGLUE benchmark (CB, COPA, and RTE). They find that eigenpruning can significantly improve the performance of the Phi-2 model, particularly on the synthetic tasks. The results on the NLP tasks are more modest but still promising, with a 6% improvement on the COPA task.
The authors acknowledge several limitations, including the need to test eigenpruning on a wider range of models and the risk of overfitting to the synthetic datasets. They also note that the effects of combining finetuning with eigenpruning need further study.
Overall, the eigenpruning method presents a novel and computationally efficient approach to improving LLM performance on specific tasks, with the potential to provide insights into the inner workings of these complex models.
Key insights distilled from
by Tomá... at arxiv.org 04-05-2024
https://arxiv.org/pdf/2404.03147.pdf