The article introduces Chatbot Arena, an open platform for evaluating LLMs based on human preferences. The platform uses pairwise comparisons and crowdsourcing to gather diverse user input, and over several months of operation it has accumulated more than 240K votes. The article discusses the challenges of evaluating LLMs, the methodology of Chatbot Arena, the statistical methods used to rank models, and the credibility of the platform. It also highlights the platform's unique value, its openness, and its collaborations with leading model developers, emphasizing the need for an open, live evaluation platform based on human preference that better reflects real-world usage.
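As background on how pairwise votes can be turned into a model ranking, below is a minimal sketch of a Bradley-Terry-style fit, in the spirit of the statistical approach the paper describes; the vote data, function name, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import math

def fit_bradley_terry(battles, num_models, iters=500, lr=0.5):
    """Estimate Bradley-Terry log-strengths from pairwise outcomes.

    battles: list of (winner_index, loser_index) tuples, one per vote.
    Returns one score per model; higher means stronger.
    """
    scores = [0.0] * num_models
    for _ in range(iters):
        grads = [0.0] * num_models
        for w, l in battles:
            # Probability the recorded winner beats the loser under current scores
            p = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
            grads[w] += 1.0 - p   # push the winner's score up
            grads[l] -= 1.0 - p   # push the loser's score down
        for i in range(num_models):
            scores[i] += lr * grads[i] / len(battles)
    return scores

# Toy example with three hypothetical models (indices 0, 1, 2)
votes = ([(0, 1)] * 60 + [(1, 0)] * 40 +
         [(0, 2)] * 70 + [(2, 0)] * 30 +
         [(1, 2)] * 55 + [(2, 1)] * 45)
scores = fit_bradley_terry(votes, num_models=3)
print(sorted(range(3), key=lambda i: -scores[i]))  # ranking, best first
```

In practice the paper also reports uncertainty in the estimated scores (e.g., confidence intervals), which a sketch like this would obtain by refitting on resampled votes.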
Key insights distilled from the paper by Wei-Lin Chia... at arxiv.org, 03-08-2024
https://arxiv.org/pdf/2403.04132.pdf