Core Concepts
Generative models, including auto-encoding, auto-regressive, adversarial, and diffusion models, have significantly enhanced the capabilities of modern recommender systems by enabling them to model and sample from complex data distributions beyond just user-item interactions.
Summary
The survey reviews how generative models have advanced recommender systems (RS), an area it refers to as Gen-RecSys. It covers four areas:
Interaction-Driven Recommendation:
- Auto-encoding models like Variational Autoencoders (VAEs) are used for collaborative filtering, sequential recommendation, and slate generation.
- Auto-regressive models like Recurrent Neural Networks (RNNs) and self-attentive models are applied to session-based, sequential, and bundle recommendations.
- Generative Adversarial Networks (GANs) are used for selecting informative training samples, synthesizing user preferences, and generating recommendation lists.
- Diffusion models are leveraged to learn user future interaction probabilities and augment training sequences.
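To make the auto-encoding bullet concrete, here is a minimal sketch of the per-user training loss that VAE-based collaborative filtering typically minimizes: a multinomial reconstruction term plus a KL penalty toward a standard normal prior. This is one common formulation, not necessarily the one used by every model the survey covers. The function names, the `beta` weight, and the toy inputs are assumptions for illustration; in a real model the `logits`, `mu`, and `logvar` would come from learned decoder and encoder networks.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of item scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def vae_cf_loss(interactions, logits, mu, logvar, beta=1.0):
    """Negative ELBO for one user (hypothetical helper).

    interactions: 0/1 list marking the items the user interacted with
    logits:       decoder scores, one per item (assumed given here)
    mu, logvar:   encoder outputs for a diagonal Gaussian posterior
    beta:         weight on the KL term (assumed hyperparameter)
    """
    # Multinomial negative log-likelihood of the observed interactions.
    log_probs = log_softmax(logits)
    nll = -sum(x * lp for x, lp in zip(interactions, log_probs))
    # KL( N(mu, exp(logvar)) || N(0, I) ) in closed form.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return nll + beta * kl


# Toy usage: 4 items, 2 latent dimensions, posterior equal to the prior.
loss = vae_cf_loss([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(round(loss, 4))
```

With uniform decoder scores and a posterior that matches the prior, the KL term vanishes and the loss reduces to the log-likelihood of two interactions under a uniform distribution over four items.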
Large Language Models (LLMs) in Recommendation:
- Encoder-only LLM-based recommendation approaches use dense retrieval or item-preference fusion for rating prediction and top-k recommendation.
- Generative recommendation with LLMs explores zero-shot, few-shot, and fine-tuned/prompt-tuned approaches for generating recommendations, explanations, and ratings.
- Retrieval-augmented generation and LLM-based input generation combine LLMs with traditional RS components.
- Conversational recommendation utilizes LLMs for multi-turn, multi-task dialogues with recommendation, explanation, and preference elicitation.
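The dense-retrieval idea in the first bullet can be sketched as nearest-neighbor search over embeddings: items and a user (or query) are mapped to vectors, and the top-k items by cosine similarity are returned. The sketch below assumes the vectors already exist; the hard-coded toy embeddings, the item ids, and the helper names are hypothetical stand-ins for the output of an encoder-only language model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_items(user_vec, item_vecs, k):
    """Return the ids of the k items most similar to the user vector."""
    scored = [(cosine(user_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]


# Toy embeddings (assumed); a real system would compute these with a text encoder.
item_vecs = {
    "item_a": [1.0, 0.0],
    "item_b": [0.7, 0.7],
    "item_c": [0.0, 1.0],
}
recommended = top_k_items([0.9, 0.1], item_vecs, k=2)
print(recommended)
```

A production system would usually replace the exhaustive scan with an approximate nearest-neighbor index, but the scoring rule stays the same.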
Multimodal Recommendation:
- Motivations and challenges for developing multimodal RS are discussed, including the cold-start problem, complex user requests, and virtual try-on capabilities.
- Contrastive learning approaches like CLIP and ALBEF are used to align text and image modalities.
- Generative multimodal models leverage VAEs, diffusion models, and multimodal LLMs for tasks like text-to-image generation.
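The contrastive alignment mentioned above can be illustrated with a CLIP-style symmetric loss: matching text-image pairs in a batch are pulled together and mismatched pairs pushed apart. This is a simplified sketch, not the full training objective of CLIP or ALBEF. The function names and the temperature value are illustrative assumptions, and the embeddings are assumed to be non-zero vectors produced elsewhere by text and image encoders.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    text_embs[i] and image_embs[i] are assumed to describe the same item.
    """
    texts = [_normalize(v) for v in text_embs]
    images = [_normalize(v) for v in image_embs]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [
        [sum(a * b for a, b in zip(t, im)) / temperature for im in images]
        for t in texts
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the text-to-image and image-to-text directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < misaligned)
```

The loss is close to zero when each text embedding points in the same direction as its paired image embedding, and grows when the pairing is swapped.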
Evaluation of Gen-RecSys:
- Offline evaluation considers accuracy metrics, computational efficiency, and benchmarks.
- Online and longitudinal evaluations measure real-world performance and long-term impacts.
- Conversational evaluation uses task-specific and objective-specific metrics, as well as human evaluation.
- Evaluating for potential harms considers content, privacy, autonomy, transparency, fairness, and societal effects.
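As an illustration of the accuracy metrics listed under offline evaluation, the sketch below computes Recall@k and NDCG@k for a single user with binary relevance. The summary does not say which metrics or variants the survey uses, so these two are assumed as common examples; the function names and toy data are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), ...
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model's ranking for one user (toy data)
relevant = {"a", "c"}           # held-out items the user interacted with
print(recall_at_k(ranked, relevant, k=3))
print(round(ndcg_at_k(ranked, relevant, k=3), 4))
```

In practice both values are averaged over all test users, and the choice of k depends on how many recommendations the interface shows.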
The survey highlights the significant advancements in Gen-RecSys and the need for holistic evaluation frameworks to assess their performance and potential impacts.