The paper proposes a novel decoding technique called "permutation self-consistency" to improve the listwise ranking ability of large language models (LLMs). The key idea is to marginalize out the positional biases in LLMs by repeatedly shuffling the input list, passing it through the LLM, and then aggregating the resulting rankings.
The authors first demonstrate that LLMs exhibit positional biases, especially toward the middle of long input lists, which degrade ranking performance. To address this, they introduce permutation self-consistency, which has two main steps: (1) shuffle the input list several times and collect the LLM's output ranking for each shuffled order, and (2) aggregate the resulting rankings into a single central ranking, marginalizing out position-dependent noise.
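The two-step procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: `noisy_llm_rank` is a hypothetical stub standing in for a black-box LLM call, and a simple Borda-style positional average stands in for the paper's Kemeny-optimal aggregation.

```python
import random

def noisy_llm_rank(items, rng):
    # Hypothetical stand-in for a black-box LLM ranker: sorts items by
    # their true value plus noise, mimicking position-dependent errors.
    return sorted(items, key=lambda x: x + rng.gauss(0, 0.75))

def permutation_self_consistency(items, rank_fn, m=20, seed=0):
    rng = random.Random(seed)
    # Step 1: shuffle the input list m times and re-rank each permutation.
    rankings = []
    for _ in range(m):
        shuffled = list(items)
        rng.shuffle(shuffled)
        rankings.append(rank_fn(shuffled, rng))
    # Step 2: aggregate the m rankings. A Borda-style positional average is
    # used here for brevity; the paper uses Kemeny-optimal aggregation.
    avg_pos = {x: 0.0 for x in items}
    for r in rankings:
        for pos, x in enumerate(r):
            avg_pos[x] += pos / m
    return sorted(items, key=avg_pos.get)

print(permutation_self_consistency([3, 1, 4, 1.5, 9, 2.6], noisy_llm_rank))
```

Because each shuffle places every item at a different position, averaging over many permutations cancels out biases tied to any one position in the list.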
The authors provide theoretical guarantees, showing that the Kemeny-Young optimal ranking used in the aggregation step can recover the true ranking under certain noise distributions.
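For short lists, the Kemeny-Young optimal ranking can be computed exactly by brute force: enumerate every permutation of the items and return the one minimizing the total Kendall tau distance (number of pairwise disagreements) to the observed rankings. A sketch under that assumption (function names are illustrative, not from the paper):

```python
from itertools import permutations

def kendall_tau_dist(r1, r2):
    # Count item pairs that the two rankings order differently.
    pos = {x: i for i, x in enumerate(r2)}
    n, d = len(r1), 0
    for i in range(n):
        for j in range(i + 1, n):
            if pos[r1[i]] > pos[r1[j]]:
                d += 1
    return d

def kemeny_optimal(rankings):
    # Exhaustive Kemeny-Young aggregation: pick the candidate ranking with
    # minimum total Kendall tau distance to all observed rankings. This is
    # exponential in list length, so it is only viable for short lists;
    # practical implementations use integer programming or approximations.
    items = rankings[0]
    best = min(permutations(items),
               key=lambda cand: sum(kendall_tau_dist(list(cand), r)
                                    for r in rankings))
    return list(best)

votes = [["a", "b", "c", "d"],
         ["a", "c", "b", "d"],
         ["b", "a", "c", "d"]]
print(kemeny_optimal(votes))  # -> ['a', 'b', 'c', 'd']
```

In this example, a majority of the three rankings prefers a over b, b over c, and everything over d, so the Kemeny-optimal ranking follows that consensus even though no single vote matches it exactly in all positions.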
Empirically, the authors evaluate permutation self-consistency on three sorting tasks (math expressions, words, and sentences) and two passage reranking datasets. They observe consistent improvements over conventional inference: up to 34-52% for the Mistral model, 7-18% for GPT-3.5, and 8-16% for LLaMA v2 (70B). They also conduct analyses to justify their design choices, such as the number of aggregated rankings and the use of Kemeny ranking over alternative aggregation methods.
Overall, the paper introduces a novel and effective technique for improving the listwise ranking capabilities of black-box LLMs, with potential applications in various domains that require high-quality ranking, such as information retrieval, recommendation systems, and question answering.
Key insights from the original content by Raphael Tang... at arxiv.org, 04-23-2024
https://arxiv.org/pdf/2310.07712.pdf