This survey provides a comprehensive review of the emerging field of integrating large language models (LLMs) into the reinforcement learning (RL) paradigm, known as LLM-enhanced RL. It proposes a structured taxonomy that systematically categorizes the functions LLMs serve within the classical agent-environment interaction: information processor, reward designer, decision-maker, and generator.
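To make the taxonomy concrete, the sketch below frames the four roles as minimal Python interfaces hooked into the agent-environment loop. This is an illustration, not code from the survey; every class and method name here is a hypothetical placeholder.

```python
from typing import Protocol, Sequence

# Hypothetical interfaces for the four LLM roles in the RL loop.
# All names are illustrative, not taken from the survey.

class InformationProcessor(Protocol):
    def encode(self, raw_observation: str) -> Sequence[float]:
        """Compress raw (e.g., textual) observations into features for the agent."""

class RewardDesigner(Protocol):
    def reward(self, observation: str, action: str) -> float:
        """Score a transition against the stated task objective."""

class DecisionMaker(Protocol):
    def propose_actions(self, observation: str, k: int) -> Sequence[str]:
        """Return k candidate actions to narrow the RL agent's search."""

class Generator(Protocol):
    def simulate(self, observation: str, action: str) -> str:
        """Predict the next observation, acting as a learned world model."""
```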
For each role, the survey summarizes the methodologies, analyzes the specific RL challenges they mitigate, and offers insights into future directions. As information processors, LLMs can extract meaningful feature representations or translate natural-language information into formal specifications, reducing the burden on RL agents. As reward designers, LLMs can implicitly provide reward values or explicitly generate executable reward function code based on their understanding of task objectives and observations. As decision-makers, LLMs can directly generate actions or indirectly provide action candidates and reference policies to guide the RL agent's decision-making. As generators, LLMs can serve as world model simulators that synthesize accurate trajectories for model-based RL, or provide policy explanations to improve interpretability.
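As a hedged illustration of the reward-designer role, the sketch below asks an LLM to rate a transition and parses the score into a shaped reward. The `query_llm` helper is a hypothetical stand-in for a real model call, and the prompt wording and 0-to-1 scale are assumptions, not details from the survey.

```python
import re

def query_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a real LLM API call.
    return "0.7"

def llm_reward(task: str, observation: str, action: str) -> float:
    """Implicit reward design: ask the LLM to rate how well an action
    advances the task, then parse a numeric score from its reply."""
    prompt = (
        f"Task: {task}\n"
        f"Observation: {observation}\n"
        f"Action taken: {action}\n"
        "On a scale from 0 (useless) to 1 (ideal), rate this action. "
        "Reply with a single number."
    )
    reply = query_llm(prompt)
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else 0.0  # neutral fallback on parse failure

if __name__ == "__main__":
    r = llm_reward("stack the red block on the blue block",
                   "red and blue blocks both on the table",
                   "pick up the red block")
    print(f"shaped reward: {r}")
```

The same pattern extends to the decision-maker role: swap the rating prompt for one that asks for k candidate actions, and let the RL policy choose among them.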
The survey also discusses the overall characteristics of LLM-enhanced RL, including its ability to process multi-modal information, facilitate multi-task learning and generalization, improve sample efficiency, cope with long-horizon tasks, and generate reward signals. Finally, it analyzes the potential applications, opportunities, and challenges of this interdisciplinary field to provide a roadmap for future research.
Source: Yuji Cao, Hua..., arxiv.org, 04-02-2024, https://arxiv.org/pdf/2404.00282.pdf