Core Concepts
BAHE, a novel hierarchical encoding approach, decouples the representation extraction of atomic behaviors from the learning of behavior interactions, significantly improving the efficiency and effectiveness of LLM-based CTR prediction with long user sequences.
Abstract
The paper proposes the Behavior Aggregated Hierarchical Encoding (BAHE) method to address the efficiency bottleneck of large language models (LLMs) when processing long textual user behavior sequences for click-through rate (CTR) prediction.
Key highlights:
- The efficiency bottleneck arises from the redundant encoding of identical user behaviors across different sequences and the tight coupling between behavior representation extraction and behavior interaction modeling.
- BAHE introduces a hierarchical architecture that decouples these two components.
- Firstly, BAHE employs the pre-trained lower layers of the LLM to extract embeddings of atomic user behaviors and stores them in an offline database. This converts the encoding from token-level to behavior-level, substantially reducing sequence length.
- Subsequently, BAHE utilizes the deeper, trainable layers of the LLM to model the interactions between the retrieved atomic behavior embeddings, generating comprehensive user representations.
- This separation allows the learning of high-level user representations to be independent of low-level behavior encoding, significantly reducing computational complexity.
- Extensive experiments show that BAHE cuts training time and memory usage fivefold compared with traditional LLM-based CTR models, while also improving overall CTR performance.
- BAHE has been successfully deployed in a real-world industrial CTR prediction system, enabling daily model updates on 50 million CTR records using 8 A100 GPUs.
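The two-stage flow described in the highlights can be illustrated with a toy sketch. This is not the paper's implementation: `frozen_encode` is a hypothetical stand-in for the frozen lower LLM layers (it only hashes the text into a small vector), `BehaviorStore` is a hypothetical in-memory substitute for the offline database, and a single self-attention step with mean pooling stands in for the trainable upper layers. The point it shows is that a behavior shared by several users is encoded once and then reused, and that the interaction stage operates on one vector per behavior rather than one per token.

```python
import hashlib
import math
import random

EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration


def frozen_encode(behavior_text):
    """Hypothetical stand-in for the frozen lower LLM layers.

    Maps a behavior's text to a deterministic pseudo-embedding. A real
    system would run the behavior's tokens through pre-trained layers.
    """
    digest = hashlib.sha256(behavior_text.encode("utf-8")).digest()
    # Scale the first EMBED_DIM bytes into [-1, 1).
    return [(byte - 128) / 128.0 for byte in digest[:EMBED_DIM]]


class BehaviorStore:
    """Hypothetical in-memory stand-in for the offline embedding database."""

    def __init__(self):
        self._table = {}
        self.encode_calls = 0  # how often the frozen encoder actually ran

    def get(self, behavior_text):
        # Encode each distinct atomic behavior once, then reuse the result.
        if behavior_text not in self._table:
            self._table[behavior_text] = frozen_encode(behavior_text)
            self.encode_calls += 1
        return self._table[behavior_text]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def user_representation(behaviors, store, w_q, w_k, w_v):
    """Stand-in for the trainable upper layers: one self-attention step
    over behavior-level embeddings, followed by mean pooling."""
    xs = [store.get(b) for b in behaviors]
    qs = [matvec(w_q, x) for x in xs]
    ks = [matvec(w_k, x) for x in xs]
    vs = [matvec(w_v, x) for x in xs]
    attended = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(EMBED_DIM)
                  for k in ks]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attended.append([sum(e / total * v[d] for e, v in zip(exps, vs))
                         for d in range(EMBED_DIM)])
    return [sum(row[d] for row in attended) / len(attended)
            for d in range(EMBED_DIM)]


rng = random.Random(0)


def random_matrix():
    # Randomly initialised weights; in training these would be learned.
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for _ in range(EMBED_DIM)]


w_q, w_k, w_v = random_matrix(), random_matrix(), random_matrix()
store = BehaviorStore()

# Made-up behaviors; two of them are shared between the users.
user_a = ["paid electricity bill", "searched running shoes", "opened food delivery"]
user_b = ["searched running shoes", "paid electricity bill", "opened ticket booking"]

rep_a = user_representation(user_a, store, w_q, w_k, w_v)
rep_b = user_representation(user_b, store, w_q, w_k, w_v)
```

After both calls, `store.encode_calls` counts only the distinct behaviors seen across the two users, which is the redundancy saving the first highlight refers to. Only `w_q`, `w_k`, and `w_v` would receive gradients in this sketch; the cached embeddings stay fixed.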
Stats
The dataset contains around 50 million CTR records collected over a week, with six text features, such as user bills, searches, and mini-program visits, along with item titles.
Each user sequence contains 50 behaviors, averaging 5 tokens each. The dataset contains 10 million atomic behaviors in total.
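As a rough illustration of what these figures imply for sequence length, the arithmetic below compares token-level and behavior-level encoding of a single sequence. It assumes one sequence of 50 behaviors at exactly the average of 5 tokens each, and it uses the standard fact that self-attention compares every position with every other position. The variable names are illustrative, and the result is a back-of-the-envelope estimate, not a derivation of the measured speedup reported in the paper.

```python
behaviors_per_sequence = 50   # from the stats above
avg_tokens_per_behavior = 5   # from the stats above

# Token-level encoding feeds every token of every behavior to the model.
token_level_length = behaviors_per_sequence * avg_tokens_per_behavior

# Behavior-level encoding feeds one embedding per behavior.
behavior_level_length = behaviors_per_sequence

# Reduction in sequence length seen by the interaction layers.
length_reduction = token_level_length / behavior_level_length

# Self-attention scores all position pairs, so pair count scales quadratically.
attention_pair_reduction = token_level_length ** 2 / behavior_level_length ** 2

print(token_level_length, behavior_level_length)
print(length_reduction, attention_pair_reduction)
```

Under these assumptions the interaction layers see a sequence of 50 positions instead of 250, which is a 5x shorter input and 25x fewer attention pairs per layer.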