Toward Efficient Exploration by Large Language Model Agents

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces a novel approach to reinforcement learning (RL) that leverages Large Language Models (LLMs) to implement an existing RL algorithm, Posterior Sampling for Reinforcement Learning (PSRL). Rather than having LLMs implicitly learn exploration strategies through techniques like in-context learning, the authors use distinct LLMs to perform PSRL's core functions: posterior updating, posterior sampling, and acting optimally with respect to the sampled hypothesis. Empirical results on natural-language tasks, including a combination lock problem and Wordle, as well as a simplified RiverSwim environment, suggest this method can achieve data-efficient exploration by explicitly implementing a known algorithm's mechanism for handling uncertainty. However, scaling to more complex stochastic environments, along with limitations inherited from Thompson sampling, points to directions for future work, such as exploring information-directed sampling with LLMs.
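The decomposition described above maps naturally onto a simple agent loop. The sketch below is a minimal, hypothetical illustration of that structure, not the paper's actual prompts or code: the `call_llm` stub, the prompt wording, and the `env` interface are all illustrative assumptions.

```python
# Minimal sketch of LLM-based PSRL: three LLM roles (posterior sampling,
# policy execution, posterior updating) drive one episode of a standard
# episodic RL loop. `call_llm` is a hypothetical stand-in for any
# chat-completion API.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real chat-completion API."""
    raise NotImplementedError

def run_psrl_episode(env, posterior_text: str) -> tuple[str, list]:
    # 1. Posterior sampling: commit to ONE plausible hypothesis about
    #    the environment, drawn from the current (textual) posterior.
    hypothesis = call_llm(
        f"Current beliefs about the environment:\n{posterior_text}\n"
        "Sample one concrete hypothesis consistent with these beliefs."
    )

    # 2. Optimal policy execution: act for a full episode as if the
    #    sampled hypothesis were the true environment.
    transitions = []
    obs, done = env.reset(), False
    while not done:
        action = call_llm(
            f"Assume the environment is: {hypothesis}\n"
            f"Observation: {obs}\nChoose the optimal action."
        )
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs

    # 3. Posterior updating: revise the belief text in light of the
    #    episode's evidence, ready for the next episode.
    posterior_text = call_llm(
        f"Prior beliefs:\n{posterior_text}\n"
        f"Observed transitions:\n{transitions}\n"
        "Rewrite the beliefs to reflect this new evidence."
    )
    return posterior_text, transitions
```

Committing to a single sampled hypothesis for an entire episode is what gives PSRL its directed, deep exploration, in contrast to dithering strategies such as epsilon-greedy that randomize action by action.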