Core Concepts
The authors introduce the Yi model family, attributing its strong multi-dimensional capabilities primarily to data quality and sustained engineering effort. The performance of the Yi models rests on scalable super-computing infrastructure and a standard Transformer architecture.
Summary
The Yi model family, developed by 01.AI, delivers advanced language and multimodal capabilities through a range of models, including chat models and vision-language models. The models achieve strong performance on benchmarks such as MMLU and high human preference win rates. Data quality is central to this success: extensive data processing and cleaning pipelines ensure high-quality training data. Pretraining builds a massive corpus of English and Chinese tokens, while finetuning relies on a meticulously curated instruction dataset. The architecture follows the standard decoder-only Transformer with modifications such as grouped-query attention for improved efficiency. Capability extension covers long-context modeling, vision-language adaptation, and depth upscaling to further improve performance.
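To make the data-cleaning cascade concrete, the following is a minimal Python sketch of a deduplication pass followed by heuristic quality filters. The function names, filter rules, and thresholds here are illustrative assumptions, not the specific pipeline described in the paper.

```python
import hashlib
import re

def exact_dedup(docs):
    """Drop documents whose normalized text hashes to a value already seen
    (a simple stand-in for a cascaded deduplication stage)."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.md5(re.sub(r"\s+", " ", doc.lower()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

def quality_filter(doc, min_words=20, max_symbol_ratio=0.3):
    """Illustrative heuristics: a minimum word count and a cap on the fraction
    of non-alphanumeric characters (both thresholds are assumptions)."""
    words = doc.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in doc if not ch.isalnum() and not ch.isspace())
    return symbols / max(len(doc), 1) <= max_symbol_ratio

def clean_corpus(docs):
    """Cascade: deduplicate first, then apply the quality filters."""
    return [d for d in exact_dedup(docs) if quality_filter(d)]

if __name__ == "__main__":
    corpus = [
        "A short line.",                   # removed: too short
        "This is a longer document " * 5,  # kept
        "This is a longer document " * 5,  # removed: duplicate
    ]
    print(len(clean_corpus(corpus)))  # -> 1
```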
Statistics
The base models achieve strong performance on a wide range of benchmarks, including MMLU.
Pretraining uses a corpus of 3.1 trillion English and Chinese tokens, built with a cascaded data deduplication and quality filtering pipeline.
Finetuning polishes a small-scale instruction dataset (fewer than 10,000 examples) over multiple iterations.
The Yi model uses byte-pair encoding (BPE) with a vocabulary size of 64,000.
Model configurations specify the hidden size, the number of query heads and key-value heads (grouped-query attention), the layer count, the pretraining sequence length, and the maximum learning rate; a configuration sketch follows below.
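The sketch below ties the tokenizer and configuration statistics together as a single config object: the 64,000-entry BPE vocabulary plus the architecture fields listed above. The numeric values are placeholders in the spirit of the paper's configuration table, not the exact released Yi settings.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Fields mirror the statistics above; numeric values are illustrative
    placeholders, not the released Yi hyperparameters."""
    vocab_size: int = 64_000         # BPE vocabulary size reported for Yi
    hidden_size: int = 4096          # placeholder hidden dimension
    num_q_heads: int = 32            # query heads
    num_kv_heads: int = 4            # key/value heads (< q_heads => grouped-query attention)
    num_layers: int = 32             # placeholder transformer layer count
    pretrain_seq_len: int = 4096     # placeholder pretraining sequence length
    max_learning_rate: float = 3e-4  # placeholder peak learning rate

    @property
    def gqa_group_size(self) -> int:
        """Number of query heads sharing each key/value head under grouped-query attention."""
        return self.num_q_heads // self.num_kv_heads

cfg = ModelConfig()
print(cfg.gqa_group_size)  # -> 8 query heads per KV head with these placeholder values
```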
Quotes
"Our base models achieve strong performance on a wide range of benchmarks like MMLU."
"For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline."