Key Concepts
Efficiently optimize Transformer models for long-text classification on limited GPU resources.
Summary
The paper addresses the challenges that NLP researchers in Indonesia face when fine-tuning Transformer models for long-text classification with limited computational resources. The study investigates the impact of different pre-trained models, text-shortening strategies, and hyperparameter optimization procedures on achieving optimal performance under these constraints.
Index:
- Abstract
- Researchers face limitations in hyperparameter optimization (HPO) for long-text classification using free computational services.
- Introduction
- Fine-tuning Transformers on long texts is computationally intensive because self-attention scales quadratically with sequence length, which discourages pursuing optimal results on limited hardware.
- Related Work
- Limited studies exist on Indonesian long-text classification using Transformer models.
- Methodology
- Dataset: IndoSum, with 18k news articles, is used for benchmarking.
- Tokenization Output Length Investigation: monolingual Indonesian models whose tokenizers add the least extra token length are recommended (see the tokenizer comparison sketch after this index).
- Shortening Strategies: Removing stopwords improves performance significantly.
- Truncation Strategy: 256-token sequences outperform 512-token sequences.
- Result and Discussion
- Tokenization output length impacts model performance; removing stopwords enhances it significantly.
- Conclusion and Future Work
- Recommendations include using efficient Indonesian models and optimizing text-shortening strategies further.
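The tokenizer output-length comparison mentioned in the index can be reproduced with a short script. This is a minimal sketch, assuming the Hugging Face transformers library; the checkpoint names and sample texts are illustrative placeholders, not necessarily the exact models benchmarked in the paper.

```python
# Minimal sketch: compare average tokenizer output length across checkpoints.
# Checkpoint names and sample texts are assumptions for illustration only.
from transformers import AutoTokenizer

CHECKPOINTS = [
    "indobenchmark/indobert-base-p1",   # monolingual Indonesian example
    "bert-base-multilingual-cased",     # multilingual baseline example
]

SAMPLE_TEXTS = [
    "Contoh artikel berita berbahasa Indonesia yang cukup panjang untuk pengujian.",
    "Artikel kedua digunakan untuk membandingkan panjang keluaran tokenizer.",
]

def avg_token_count(texts, tokenizer):
    """Average number of tokens the tokenizer produces per document."""
    return sum(len(tokenizer.tokenize(t)) for t in texts) / len(texts)

for name in CHECKPOINTS:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {avg_token_count(SAMPLE_TEXTS, tok):.1f} tokens on average")
```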
Statistics
"Using the best hack found, we then compare 512, 256, and 128 tokens length."
"Most tokenizers of Indonesian models produce 10%-14% more tokens on average."
"Removing stopwords while keeping punctuation and low-frequency words is the best hack."
Quotes
"The findings could help developers to efficiently pursue optimal performance of the models using limited resources."