Lectures in Active Sequential Hypothesis Testing and Adaptive Exploration in Reinforcement Learning - Lecture 5
- Starts: 4:00 pm on Monday, November 24, 2025
- Ends: 6:00 pm on Monday, November 24, 2025
Lecture 5: Instance-dependent lower bounds for Markov Decision Processes (MDPs) and algorithm design
Building on the sample complexity bounds derived in the previous lecture, we will design an algorithm that is asymptotically optimal in the confidence parameter. We will see why the proof for Markov Decision Processes is more challenging than for classical bandit models, and identify the critical steps in the proof of the sample complexity upper bound.
Lecture notes will be provided in advance.
- Registration: https://forms.gle/ohC8KJPPbBt6jrQH8