CISE Seminar: Rayadurgam Srikant, Professor of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

December 10, 2021
3:00PM-4:00PM
Location: Zoom

The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation

When the state and action spaces are large, solving MDPs can be computationally prohibitive even if the probability transition matrix is known. In practice, a number of techniques are therefore used to approximately solve the dynamic programming problem, including lookahead, approximate policy evaluation using an m-step return, and function approximation. In a recent paper, Efroni et al. (2019) studied the impact of lookahead on the convergence rate of approximate dynamic programming. In this talk, we will show that these convergence results change dramatically when function approximation is used in conjunction with lookahead and approximate policy evaluation using an m-step return. Specifically, we show that when linear function approximation is used to represent the value function, a certain minimum amount of lookahead and multi-step return is needed for the algorithm to even converge. When this condition is met, we characterize the finite-time performance of policies obtained using such approximate policy iteration. Our results are presented for two different procedures to compute the function approximation: linear least-squares regression and gradient descent. Joint work with Anna Winnicki, Michael Livesay, and Joseph Lubars.

Rayadurgam Srikant is the Co-Director of the C3.ai Digital Transformation Institute and the Fredrick G. and Elizabeth H. Nearing Endowed Professor of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois at Urbana-Champaign. His research interests include machine learning and communication networks. He is a winner of the ACM SIGMETRICS Achievement Award, the IEEE Koji Kobayashi Computers and Communication Award, and the IEEE INFOCOM Achievement Award. He has won several best paper awards, including the Applied Probability Society’s Best Publication Award, the IEEE INFOCOM Best Paper Award, and the WiOpt Best Paper Award.

Faculty Host: Venkatesh Saligrama
Student Host: Zili Wang