The paper explores the use of large language models (LLMs) for data preprocessing (DP) through instruction-tuning, centered on the construction of the Jellyfish dataset. It discusses the difficulty of building generic solutions for DP tasks and highlights the strengths of LLMs in processing natural language. Experiments show that the Jellyfish models, particularly Jellyfish-13B, outperform non-LLM methods on both seen and unseen datasets, demonstrating that they generalize to DP tasks beyond those they were tuned for. An analysis of tuning with single-task versus multi-task data reveals how individual tasks contribute to overall DP performance.
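To make the instruction-tuning framing concrete, the sketch below serializes a DP task (here, entity matching, one of the tasks the paper targets) as a natural-language instruction and queries an instruction-tuned causal LLM through the Hugging Face transformers API. This is a minimal illustration, not the authors' exact prompt template: the checkpoint identifier, prompt wording, and sample records are assumptions, and loading a 13B model this way presumes the accelerate package and sufficient GPU memory.

```python
# Minimal sketch: framing an entity-matching DP task as an instruction prompt
# for an instruction-tuned causal LLM. Model name and prompt text are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "NECOUDBFM/Jellyfish-13B"  # assumed Hugging Face identifier; substitute your checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Serialize two records and ask whether they refer to the same real-world entity.
record_a = "name: iPhone 12 Pro; brand: Apple; storage: 128GB"
record_b = "name: Apple iPhone12 Pro (128 GB); brand: Apple"
prompt = (
    "You are performing entity matching.\n"
    f"Record A: {record_a}\n"
    f"Record B: {record_b}\n"
    "Do Record A and Record B refer to the same entity? Answer Yes or No."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
# Decode only the newly generated tokens, i.e. the model's Yes/No answer.
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())
```

The same pattern applies to other DP tasks (e.g., error detection or data imputation): only the instruction text and the serialized record change, which is what makes a single instruction-tuned model reusable across tasks.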
by Haochen Zhan... at arxiv.org, 03-14-2024
https://arxiv.org/pdf/2312.01678.pdf