Lectures in Active Sequential Hypothesis Testing and Adaptive Exploration in Reinforcement Learning - Lecture 4

  • Starts: 4:00 pm on Wednesday, November 19, 2025
  • Ends: 6:00 pm on Wednesday, November 19, 2025
Lecture 4: Instance-dependent lower bounds for Markov Decision Processes (MDPs) and algorithm design

In this lecture we extend the best-arm identification (BAI) problem to tabular reinforcement learning (RL): identifying a target property (e.g., the best policy) of a controlled Markov chain. Using a change-of-measure argument over transition kernels, we derive KL-divergence-based lower bounds that depend on state-action occupancy measures rather than on arm-pull counts. The resulting characteristic time shows how mixing, reachability, and visitation constraints shape the fundamental sample complexity. Lecture notes will be provided in advance.
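
For orientation, bounds of this kind are often written in the following generic form. This is a sketch under assumed notation that does not come from the announcement; the lecture notes remain the reference for the exact statement and its conditions. Here M is the true MDP, Alt(M) is the set of MDPs with a different answer (e.g., a different best policy), τ is the stopping time of a δ-correct algorithm, ω is a state-action occupancy measure, p_M is the transition kernel of M, and q_M(· | s, a) is the joint law of reward and next state under M.

```latex
% Asymptotic instance-dependent lower bound (generic form, assumed notation)
\[
  \liminf_{\delta \to 0} \frac{\mathbb{E}_{M}[\tau]}{\log(1/\delta)} \;\ge\; T^{\star}(M),
  \qquad
  T^{\star}(M)^{-1}
  \;=\;
  \sup_{\omega \in \Omega(M)} \;
  \inf_{M' \in \mathrm{Alt}(M)} \;
  \sum_{s,a} \omega(s,a)\,
  \mathrm{KL}\!\big( q_{M}(\cdot \mid s,a) \,\big\|\, q_{M'}(\cdot \mid s,a) \big).
\]

% Visitation (flow-conservation) constraints on the occupancy measure
\[
  \Omega(M)
  \;=\;
  \Big\{ \omega \in \Delta(\mathcal{S} \times \mathcal{A}) \;:\;
  \sum_{a} \omega(s,a) \;=\; \sum_{s',a'} p_{M}(s \mid s',a')\, \omega(s',a')
  \;\; \text{for all } s \in \mathcal{S} \Big\}.
\]
```

In the bandit (BAI) case there is no state, so the constraint set reduces to the whole simplex over arms and ω becomes the vector of arm-pull proportions. In an MDP the learner cannot choose ω freely, since it must be consistent with the transition dynamics, which is where mixing and reachability enter the characteristic time.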