The article introduces Chatbot Arena, an open platform for evaluating LLMs based on human preferences. It employs a pairwise comparison approach and crowdsourcing to gather diverse user input. The platform has been operational for several months, accumulating over 240K votes. The article discusses the challenges in evaluating LLMs, the methodology of Chatbot Arena, statistical methods used for evaluation, and the credibility of the platform. It also highlights the platform's unique value, openness, and collaborations with leading model developers. The article emphasizes the need for an open, live evaluation platform based on human preference to better reflect real-world usage.
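To make the pairwise-comparison methodology concrete, the sketch below shows one common way to turn crowdsourced head-to-head votes into a leaderboard: fitting a Bradley-Terry model, the statistical model the Chatbot Arena paper uses for its rankings. This is an illustrative sketch, not the authors' implementation; the model names, win counts, and the Elo-style rescaling at the end are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical vote counts: wins[i][j] = times model i beat model j (assumed data).
models = ["model-a", "model-b", "model-c"]
wins = np.array([
    [0, 30, 45],
    [20, 0, 35],
    [15, 25, 0],
], dtype=float)

def bradley_terry(wins, iters=200, tol=1e-8):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix
    using the standard minorization-maximization (MM) updates."""
    n = wins.shape[0]
    p = np.ones(n)                 # initial strengths
    total = wins + wins.T          # total comparisons between each pair
    w = wins.sum(axis=1)           # total wins per model
    for _ in range(iters):
        denom = total / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = w / denom.sum(axis=1)
        p_new /= p_new.sum()       # normalize for identifiability
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

strengths = bradley_terry(wins)
# Report on an Elo-like log scale anchored at 1000, purely for readability.
ratings = 400 * np.log10(strengths / strengths.mean()) + 1000
for name, rating in sorted(zip(models, ratings), key=lambda x: -x[1]):
    print(f"{name}: {rating:.0f}")
```

The key property of this approach is that models never need to be compared against every opponent directly: as long as the comparison graph is connected, the fitted strengths induce a single global ranking from sparse pairwise votes.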
Key insights derived from: Wei-Lin Chia... (arxiv.org, 03-08-2024), https://arxiv.org/pdf/2403.04132.pdf