The article introduces Chatbot Arena, an open platform for evaluating LLMs based on human preferences. It gathers diverse user input through crowdsourced pairwise comparisons and has been operational for several months, accumulating over 240K votes. The article discusses the challenges of evaluating LLMs, the methodology of Chatbot Arena, the statistical methods used for evaluation, and the credibility of the platform. It also highlights the platform's unique value, its openness, and collaborations with leading model developers, emphasizing the need for an open, live evaluation platform based on human preference that better reflects real-world usage.
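To make the pairwise-comparison idea concrete, here is a minimal sketch of how crowdsourced head-to-head votes can be converted into model ratings with an online Elo-style update. This is an illustration under stated assumptions, not the authors' exact statistical method; the model names and parameter values (`k`, `base`, `scale`) are hypothetical.

```python
# Sketch: turning pairwise human votes into ratings via an online
# Elo update. Hypothetical example, not the paper's implementation.
from collections import defaultdict

def elo_ratings(battles, k=4.0, base=1500.0, scale=400.0):
    """battles: iterable of (model_a, model_b, winner),
    where winner is 'a' or 'b'. Returns {model: rating}."""
    ratings = defaultdict(lambda: base)
    for a, b, winner in battles:
        ra, rb = ratings[a], ratings[b]
        # Expected score of model a under the Elo logistic model.
        ea = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
        sa = 1.0 if winner == "a" else 0.0
        ratings[a] = ra + k * (sa - ea)
        ratings[b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return dict(ratings)

# Hypothetical votes: "model-x" wins all three comparisons.
votes = [("model-x", "model-y", "a"),
         ("model-x", "model-y", "a"),
         ("model-y", "model-x", "b")]
ratings = elo_ratings(votes)
```

After these votes, `ratings["model-x"]` exceeds `ratings["model-y"]`, reflecting its consistent wins; with many diverse voters, such ratings aggregate human preference into a leaderboard.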
Key insights distilled from content by Wei-Lin Chia... on arxiv.org, 03-08-2024
https://arxiv.org/pdf/2403.04132.pdf