The paper explores the use of large language models (LLMs) for data preprocessing (DP) through instruction-tuning, focusing on the creation of the Jellyfish dataset. It discusses the challenges of developing generic solutions for DP tasks and highlights the strengths of LLMs in processing natural language. The experiments show that the Jellyfish models, particularly Jellyfish-13B, outperform non-LLM methods on both seen and unseen datasets, demonstrating that the models generalize to DP tasks beyond those they were tuned for. An analysis of tuning with single-task versus multi-task data shows how the individual DP tasks contribute to overall performance.
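To make "instruction-tuning for DP" concrete, the sketch below shows the kind of (instruction, input, output) record such a tuning corpus might contain, using entity matching as the example task. The field names, schema, and wording are illustrative assumptions, not the actual Jellyfish dataset format.

```python
import json

# A minimal sketch of one instruction-tuning record for a DP task
# (entity matching). Schema and phrasing are hypothetical.
record = {
    "instruction": (
        "You are performing entity matching. Decide whether the two product "
        "records below refer to the same real-world entity. Answer Yes or No."
    ),
    "input": (
        'Record A: {"title": "Apple iPhone 12, 64GB, Black", "price": "699"}\n'
        'Record B: {"title": "iPhone 12 64 GB (Black) by Apple", "price": "699.00"}'
    ),
    "output": "Yes",
}

# Serialized as one JSON line, a common layout for instruction-tuning corpora.
print(json.dumps(record))
```

A collection of such records across several DP tasks (e.g., error detection, data imputation, schema matching, entity matching) is what would distinguish multi-task tuning from tuning on a single task.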