Best AI papers explained
A podcast by Enoch H. Kang
534 Episodes
RLAD: Training LLMs to Discover Abstractions
Published: 10/29/2025
How to Train Your Advisor: Steering Black-Box LLMs with ADVISOR MODELS
Published: 10/29/2025
Self-improving LLM agents at Test-Time
Published: 10/27/2025
KL-Regularized Reinforcement Learning is designed to Mode Collapse
Published: 10/27/2025
How do LLMs use their depth?
Published: 10/27/2025
Thought Communication in Multiagent Collaboration
Published: 10/27/2025
Reasoning with Sampling: Base Models Outperform RL
Published: 10/26/2025
Continual Learning via Sparse Memory Finetuning
Published: 10/26/2025
Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences
Published: 10/24/2025
The Coverage Principle: How Pre-Training Enables Post-Training
Published: 10/24/2025
The Era of Real-World Human Interaction: RL from User Conversations
Published: 10/24/2025
Agent Learning via Early Experience
Published: 10/24/2025
Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL
Published: 10/22/2025
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Published: 10/22/2025
A Definition of AGI
Published: 10/22/2025
Provably Learning from Language Feedback
Published: 10/21/2025
In-Context Learning for Pure Exploration
Published: 10/21/2025
On the Role of Preference Variance in Preference Optimization
Published: 10/20/2025
Training LLM Agents to Empower Humans
Published: 10/20/2025
Richard Sutton Declares LLMs a Dead End
Published: 10/20/2025
Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
