Lectures in Active Sequential Hypothesis Testing and Adaptive Exploration in Reinforcement Learning - Lecture 4
- Starts: 4:00 pm on Wednesday, November 19, 2025
- Ends: 6:00 pm on Wednesday, November 19, 2025
Lecture 4: Instance-dependent lower bounds for Markov Decision Processes (MDPs) and algorithm design
In this lecture we extend the best-arm identification (BAI) problem to tabular reinforcement learning: identifying a target property (e.g., the best policy) in a controlled Markov chain. Using change-of-measure arguments over transition kernels, we derive KL-divergence-based lower bounds that depend on state-action occupancy measures rather than arm-pull counts. The resulting characteristic time shows how mixing, reachability, and visitation constraints shape the fundamental sample complexity.
Lecture notes will be provided in advance.
- Registration:
- https://forms.gle/ohC8KJPPbBt6jrQH8