• Starts: 2:30 pm on Monday, February 23, 2026
  • Ends: 4:30 pm on Monday, February 23, 2026

ECE PhD Thesis Defense: Haoxing Tian

Title: Finite-time Analysis for Distributed and Neural Reinforcement Learning

Presenter: Haoxing Tian

Advisors: Professor Yannis Paschalidis and Professor Alex Olshevsky

Chair: TBA

Committee: Professor Yannis Paschalidis, Professor Alex Olshevsky, Professor Aldo Pacchiano, Professor Christos Cassandras

Google Scholar Link: https://scholar.google.com/citations?user=g8jTnD0AAAAJ&hl=en&oi=ao

Abstract: This dissertation develops a modular finite-sample theory for reinforcement learning under two realities of modern practice that challenge classical analysis: nonlinear function approximation and trajectory-based, temporally dependent data, including distributed and continuing settings. The central objective is to provide reproducible end-to-end convergence guarantees that explicitly separate the roles of approximation error, Markov dependence, and algorithm design. The theory is organized around four tightly connected components, namely policy evaluation, policy improvement, distributed learning, and continuing learning, with bounds that remain interpretable across regimes.
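To make the shape of such guarantees concrete, a schematic of the typical three-term finite-sample bound (with illustrative symbols, not quantities taken from the dissertation) is

$$\mathbb{E}\,\|\theta_T - \theta^\star\|^2 \;\lesssim\; \underbrace{(1 - c\,\alpha)^{T}\,\|\theta_0 - \theta^\star\|^2}_{\text{descent (algorithm design)}} \;+\; \underbrace{\alpha\,\tau_{\mathrm{mix}}\,\sigma^2}_{\text{stochastic error under Markov sampling}} \;+\; \underbrace{\varepsilon_{\mathrm{approx}}}_{\text{approximation floor}},$$

where $\alpha$ is the stepsize, $\tau_{\mathrm{mix}}$ the mixing time of the underlying Markov chain, $\sigma^2$ a noise proxy, and $\varepsilon_{\mathrm{approx}}$ the error floor induced by the function class.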

For policy evaluation, the dissertation analyzes neural temporal-difference learning through a gradient-splitting viewpoint that decomposes the mean update into a dominant descent term plus a curvature remainder, yielding Lyapunov recursions that cleanly isolate descent, stochastic error under Markov sampling, and an approximation floor. For policy improvement, it studies neural actor–critic methods and derives finite-sample stationarity guarantees by combining actor progress with critic tracking and closing the resulting feedback loop via a small-gain argument, making the required balance among stepsizes, approximation, and mixing explicit. For distributed learning, it establishes finite-sample guarantees for TD($\lambda$) with one-shot averaging and identifies regimes with linear speedup in the dominant error term while remaining communication-efficient under Markov sampling. For continuing learning, it develops and analyzes average-reward TD-style algorithms that enforce a steady-state normalization using either a double-chain construction or a single-chain auxiliary recursion, proving finite-sample convergence rates that expose dependence on stepsizes, feature conditioning, and mixing time.
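As a concrete illustration of the distributed scheme mentioned above, the following minimal Python sketch runs linear TD($\lambda$) with eligibility traces independently on several workers and combines their parameters with a single averaging step at the end. The toy Markov reward process, features, and stepsizes are made-up placeholders, and the code sketches the general one-shot-averaging idea rather than the dissertation's algorithm.

```python
import numpy as np

# Minimal sketch of TD(lambda) with one-shot averaging (illustrative only):
# N workers each run linear TD(lambda) with eligibility traces on their own
# trajectory from the same toy Markov reward process, and their parameters
# are averaged once at the end, so only a single round of communication occurs.

# Toy 3-state Markov reward process and features (made-up values).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])          # transition matrix
r = np.array([1.0, 0.0, -1.0])           # reward received in each state
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])             # one feature row per state
gamma, lam, alpha = 0.9, 0.7, 0.05       # discount, trace decay, stepsize


def td_lambda_worker(T, theta0, seed):
    """Run linear TD(lambda) for T steps on one locally generated trajectory."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    z = np.zeros_like(theta)             # eligibility trace
    s = rng.integers(3)
    for _ in range(T):
        s_next = rng.choice(3, p=P[s])
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta  # TD error
        z = gamma * lam * z + Phi[s]     # accumulate the trace
        theta = theta + alpha * delta * z
        s = s_next
    return theta


# One-shot averaging: workers learn independently, then average parameters once.
N, T = 8, 5000
thetas = [td_lambda_worker(T, np.zeros(2), seed=k) for k in range(N)]
theta_avg = np.mean(thetas, axis=0)
print("averaged TD(lambda) parameters:", theta_avg)
```

The appeal of this scheme, as the abstract notes, is that communication stays at a single round while the final averaging step reduces the noise contribution of each worker in the dominant error term.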

Location:
PHO 339