FRP Reinforcement Learning Symposium

Date: Friday, May 10th, 2024

Time: 10:00 am – 5:30 pm ET

Location (In-person Only): Boston University, Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1750 (17th floor), Boston, MA

Register Here

Symposium Mission: Reinforcement Learning (RL), a field in AI inspired by learning mechanisms in biological systems, has emerged as a powerful generalized paradigm for a diverse set of applications, particularly those requiring adaptive reasoning, such as large language model training (e.g., ChatGPT), education and rehabilitation technologies, transportation and energy-grid optimization, robotics, and more. However, its impact has thus far been limited by optimization, implementation, efficiency, and safety challenges. Through invited talks, panels, and discussions, this symposium will uncover fundamental challenges in reinforcement learning frameworks and directions toward addressing them, particularly toward closing the current gap between theory, AI model training, and real-world applications and users.

The Symposium is organized by the Optimal Bio-Inspired Design of Holistic Rehabilitation Systems Focused Research Program, which is led by BU College of Engineering faculty Eshed Ohn-Bar, Assistant Professor (ECE, CS), and Alex Olshevsky, Associate Professor (ECE, SE, CS).

Detailed Program & Speakers:

10:00AM – 10:15AM Welcome & Opening Remarks: BU College of Engineering faculty Eshed Ohn-Bar, Assistant Professor (ECE, CS), and Alex Olshevsky, Associate Professor (ECE, SE, CS)
10:15AM – 11:00AM Speaker: Antonin Raffin, Research Engineer in Robotics and Machine Learning, German Aerospace Center (DLR)

Talk Title: Designing and Running Real-World RL Experiments

Abstract: This talk covers the challenges and best practices for designing and running real-world reinforcement learning (RL) experiments. The idea is to walk through the different steps of RL experimentation (task design, choosing the right algorithm, implementing safety layers) and also provide practical advice on how to run experiments and troubleshoot common problems.

Bio: Antonin Raffin is a research engineer in robotics and machine learning at the German Aerospace Center (DLR). Previously, he worked on state representation learning in the ENSTA robotics lab (U2IS), where he created the Stable-Baselines library together with Ashley Hill. His research now focuses on applying reinforcement learning directly to real robots, for which he continues to maintain the Stable-Baselines3 library.

11:00AM – 11:45AM Speaker: Alec Koppel, AI Research Lead/VP in the Multiagent Learning and Simulation Group within Artificial Intelligence Research, JP Morgan Chase & Co.

Talk Title: Exploration Incentives in Model-Based Reinforcement Learning

Abstract: Reinforcement Learning (RL) is a form of stochastic adaptive control in which one seeks to estimate the parameters of a controller only from data, and it has gained popularity in recent years. However, technological applications of RL are often hindered by the astronomical sample complexity demanded by training. Model-based reinforcement learning (MBRL) is known to provide a practically sample-efficient approach; however, its performance certificates in terms of Bayesian regret often require restrictive Gaussian assumptions, and may fail to distinguish between vastly different performance in sparse or dense reward settings. Motivated by these gaps, we propose a way to make MBRL, namely Posterior Sampling combined with Model-Predictive Control (MPC), computationally efficient for mixture distributions, based on a novel application of integral probability metrics and kernelized Stein discrepancy. Then, we build upon this insight to pose a new exploration incentive called Stein Information Gain, which permits us to derive a variant of information-directed sampling (IDS) whose exploration incentive can be evaluated in closed form. Bayesian and information-theoretic regret bounds for the proposed algorithms are presented. Finally, experimental validation on environments from OpenAI Gym and the DeepMind Control Suite illuminates the merits of the proposed methodologies in the sparse-reward setting.
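
The posterior-sampling-plus-MPC idea at the heart of the abstract can be illustrated with a minimal sketch (this is a generic toy, not the speaker's kernelized-Stein method): a scalar system with an unknown gain is controlled by sampling a model from a Gaussian posterior and planning one step under the sample. The system, prior, and all constants below are hypothetical.

```python
import random

random.seed(0)

# Hypothetical toy system: x = theta * u + noise, with unknown gain theta.
# Posterior sampling: sample theta from a Gaussian posterior, then apply a
# one-step "MPC" control that drives x toward a target under the sample.
TRUE_THETA = 2.0
NOISE_STD = 0.1
TARGET = 1.0

mu, var = 1.0, 0.25          # Gaussian prior over the unknown gain
noise_var = NOISE_STD ** 2

for _ in range(200):
    theta_hat = random.gauss(mu, var ** 0.5)
    if abs(theta_hat) < 0.1:          # keep the control well defined
        theta_hat = 0.1
    u = TARGET / theta_hat                                # one-step plan
    x = TRUE_THETA * u + random.gauss(0.0, NOISE_STD)     # observe outcome
    # Conjugate Bayesian update for the linear-Gaussian model x = theta*u.
    precision = 1.0 / var + u * u / noise_var
    mu = (mu / var + u * x / noise_var) / precision
    var = 1.0 / precision
```

After 200 interactions the posterior concentrates near the true gain, which is the mechanism that lets posterior sampling trade exploration against control performance.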

Bio: Alec Koppel is an AI Research Lead (Senior Scientist) at JP Morgan AI Research in the Multi-agent Learning and Simulation Group. From 2021 to 2022, he was a Research Scientist at Amazon within Supply Chain Optimization Technologies (SCOT). From 2017 to 2021, he was a Research Scientist with the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate (CISD). He completed his Master’s degree in Statistics and Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August 2017. Before coming to Penn, he completed his Master’s degree in Systems Science and Mathematics and Bachelor’s degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. He is a recipient of the 2016 UPenn ESE Dept. Award for Exceptional Service, an awardee of the Science, Mathematics, and Research for Transformation (SMART) Scholarship, a co-author of a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers, a finalist for the 2019 ARL Honorable Scientist Award, an awardee of the 2020 ARL Director’s Research Award Translational Research Challenge (DIRA-TRC), a 2020 Honorable Mention from the IEEE Robotics and Automation Letters, and mentor to the 2021 ARL Summer Symposium Best Project Awardee. His academic work focuses on approximate Bayesian inference, reinforcement learning, and decentralized optimization. He has worked on applications spanning robotics and autonomy; vendor selection and sourcing; and financial markets of various types.

11:45AM – 12:30PM Speaker: Kaiqing Zhang, Assistant Professor, Electrical and Computer Engineering (ECE), Institute for Systems Research (ISR), University of Maryland, College Park

Talk Title: Independent Learning in Stochastic Games: Where Strategic Decision-Making Meets RL

Abstract: Reinforcement learning (RL) has recently achieved great successes in many sequential decision-making applications. Many of the forefront applications of RL involve the decision-making of multiple strategic agents, e.g., playing chess and Go, autonomous driving, and robotics. Unfortunately, the classical RL framework is inappropriate for multi-agent learning, as it assumes that an agent’s environment is stationary and does not account for the adaptive nature of the other agents’ behavior. In this talk, I focus on stochastic games for multi-agent reinforcement learning in dynamic environments, and develop independent learning dynamics for stochastic games: each agent is myopic and chooses best-response-type actions to other agents’ strategies independently, meaning without any coordination with her opponents. I will present our independent learning dynamics that guarantee convergence in stochastic games, including two-player zero-sum, identical-interest, and multi-player zero-sum settings. Time permitting, I will also discuss our other results on learning in stochastic games, including both positive ones on the sample and iteration complexity of certain (partially observable) multi-agent RL algorithms, and negative ones on the computational complexity of general-sum stochastic games that lead to a sharp difference between single-agent and multi-agent sequential decision-making.
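
The best-response flavor of independent learning can be previewed with classical fictitious play on matching pennies, a textbook example (not the speaker's convergent dynamics): each player independently best-responds to the opponent's empirical action frequencies, with no coordination, and the empirical frequencies converge to the mixed Nash equilibrium.

```python
# Matching pennies: the row player wants to match, the column player to
# mismatch. A holds the row player's payoffs; the column player gets -A.
A = [[1, -1], [-1, 1]]

counts1 = [0, 0]  # player 1's tally of player 2's past actions
counts2 = [0, 0]  # player 2's tally of player 1's past actions
a1, a2 = 0, 0     # arbitrary initial actions

for _ in range(20000):
    counts1[a2] += 1
    counts2[a1] += 1
    # Each player independently best-responds to the opponent's
    # empirical action frequencies (ties broken toward action 0).
    a1 = max((0, 1), key=lambda a: sum(A[a][b] * counts1[b] for b in (0, 1)))
    a2 = max((0, 1), key=lambda b: -sum(A[a][b] * counts2[a] for a in (0, 1)))

freq1 = counts2[0] / sum(counts2)  # empirical frequency of row action 0
freq2 = counts1[0] / sum(counts1)  # empirical frequency of column action 0
```

Both empirical frequencies approach 1/2, the unique mixed equilibrium, even though the stage-by-stage play keeps cycling.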

Bio: Kaiqing Zhang is currently an Assistant Professor at the Department of Electrical and Computer Engineering (ECE) and the Institute for Systems Research (ISR), at the University of Maryland, College Park. He is also a member of the Maryland Robotics Center (MRC), UMIACS, and Center for Machine Learning. During the deferral time before joining Maryland, he was a postdoctoral scholar affiliated with LIDS and CSAIL at MIT, and a Research Fellow at the Simons Institute for the Theory of Computing at Berkeley. He finished his Ph.D. at the Department of ECE and CSL at the University of Illinois at Urbana-Champaign (UIUC). He also received an M.S. in both ECE and Applied Math from UIUC, and a B.E. from Tsinghua University. His research interests lie broadly in Control and Decision Theory, Game Theory, Robotics, Reinforcement/Machine Learning, Computation, and their intersections. He serves as an area chair for ICML/NeurIPS/ICLR/UAI, and is the recipient of several awards and fellowships, including the Hong, McCully, and Allen Fellowship, Simons-Berkeley Research Fellowship, CSL Thesis Award, IEEE Robotics and Automation Society TC Best-Paper Award, and ICML Outstanding Paper Award.

12:30PM – 1:30PM Lunch
1:30PM – 2:15PM Speaker: Alejandro Ribeiro, Professor, Electrical and Systems Engineering (ESE), University of Pennsylvania

Talk Title: Constrained Reinforcement Learning

Abstract: Constrained reinforcement learning (CRL) involves multiple rewards that must individually accumulate to given thresholds. CRL arises naturally in cyberphysical systems which are most often specified by a set of requirements. We explain in this talk that CRL problems have null duality gaps even though they are not convex. These facts imply that they can be solved in the dual domain but that standard dual gradient descent algorithms may fail to find optimal policies. We circumvent this limitation with the introduction of a state augmented algorithm in which Lagrange multipliers are incorporated in the state space. We show that state augmented algorithms sample from stochastic policies that achieve target rewards. We further introduce resilient CRL as a mechanism to relax constraints when requirements are overspecified. We illustrate results and implications with a brief discussion of safety constraints.
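
The dual-domain viewpoint in the abstract can be sketched on a hypothetical two-action problem with a cost budget (a deliberately simplified stand-in; the talk's state-augmented algorithm is more sophisticated): the primal step acts greedily on the Lagrangian reward, and the dual step runs gradient ascent on the multiplier. The time-averaged play then mixes the two actions so that the budget is met.

```python
# Two hypothetical actions given as (reward, cost) pairs, plus a budget.
ACTIONS = [(1.0, 1.0), (0.5, 0.0)]
BUDGET = 0.4           # constraint: long-run average cost <= 0.4
ETA = 0.01             # dual step size

lam = 0.0              # Lagrange multiplier
picks = []
for _ in range(10000):
    # Primal step: act greedily on the Lagrangian reward r(a) - lam * c(a).
    a = max(range(2), key=lambda i: ACTIONS[i][0] - lam * ACTIONS[i][1])
    picks.append(a)
    # Dual step: raise the multiplier when over budget, lower it otherwise.
    lam = max(0.0, lam + ETA * (ACTIONS[a][1] - BUDGET))

avg_cost = sum(ACTIONS[a][1] for a in picks) / len(picks)
avg_reward = sum(ACTIONS[a][0] for a in picks) / len(picks)
```

Neither deterministic action alone is optimal here: the multiplier oscillates around its optimal value and the resulting time-averaged play is a stochastic mixture, which is exactly the phenomenon that motivates sampling-based (state-augmented) schemes over plain dual descent.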

Bio: Alejandro Ribeiro received the B.Sc. degree in Electrical Engineering from the Universidad de la República Oriental del Uruguay in 1998 and the M.Sc. and Ph.D. degrees in electrical engineering from the Department of Electrical and Computer Engineering at the University of Minnesota in 2005 and 2007. He joined the University of Pennsylvania (Penn) in 2008, where he is currently Professor of Electrical and Systems Engineering. His research is in wireless autonomous networks, machine learning on network data, and distributed collaborative learning. Papers coauthored by Dr. Ribeiro received the 2022 IEEE Signal Processing Society Best Paper Award, the 2022 IEEE Brain Initiative Student Paper Award, the 2021 Cambridge Ring Publication of the Year Award, the 2020 IEEE Signal Processing Society Young Author Best Paper Award, the 2014 O. Hugo Schuck Best Paper Award, and paper awards at EUSIPCO 2021, ICASSP 2020, EUSIPCO 2019, CDC 2017, SSP Workshop 2016, SAM Workshop 2016, Asilomar SSC Conference 2015, ACC 2013, ICASSP 2006, and ICASSP 2005. His teaching has been recognized with the 2017 Lindback Award for distinguished teaching and the 2012 S. Reid Warren, Jr. Award presented by Penn’s undergraduate student body for outstanding teaching. Dr. Ribeiro received an Outstanding Researcher Award from Intel University Research Programs in 2019. He is a Penn Fellow class of 2015, a Fulbright scholar class of 2003, husband to Gabriela, and father to Miranda, Guillermo, and Ariel.

2:15PM – 3:00PM Speaker: Bahman Gharesifard, Professor, Electrical & Computer Engineering, University of California, Los Angeles

Talk Title: Single-Timescale Actor-Critic: A Small-Gain Analysis

Abstract: We consider the practically used setting of actor-critic, in which proportional step sizes are used for both the actor and the critic, with only one critic update, using a single sample from the stationary distribution, per actor step. Using a small-gain analysis, we prove convergence to a stationary point, with a sample complexity that improves on the state of the art. The key technical challenge is in connecting actor-critic to a perturbed gradient descent, a connection that is often obtained by allowing infinitely many critic steps and is not possible in single-timescale settings. This is joint work with Alex Olshevsky at Boston University.
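
The setting can be made concrete on a hypothetical one-state, two-action MDP (a toy, not the talk's analysis): a softmax actor and a scalar critic are each updated exactly once per step, with fixed proportional step sizes, i.e., on a single timescale.

```python
import math
import random

random.seed(0)

# Toy single-state MDP with two actions; action 0 pays 1, action 1 pays 0.
theta = [0.0, 0.0]        # actor (softmax) parameters
v = 0.0                   # critic: a single value estimate
ALPHA, BETA = 0.05, 0.1   # proportional actor/critic step sizes

def policy(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

for _ in range(5000):
    pi = policy(theta)
    a = 0 if random.random() < pi[0] else 1
    r = 1.0 if a == 0 else 0.0
    delta = r - v                     # TD error (single state, no bootstrap)
    v += BETA * delta                 # exactly one critic update per step
    for i in range(2):                # policy-gradient actor update
        grad = (1.0 if i == a else 0.0) - pi[i]
        theta[i] += ALPHA * delta * grad

pi = policy(theta)
```

Even with the critic lagging behind the actor at every step, the coupled updates converge here: the policy concentrates on the better action. Quantifying this coupling in general is what the small-gain analysis addresses.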

Bio: Bahman Gharesifard is currently a Professor and Area Director for Signals and Systems at the Electrical & Computer Engineering Department, University of California, Los Angeles. He was an Associate Professor, from 2019 to 2021, and an Assistant Professor, from 2013 to 2019, with the Department of Mathematics and Statistics at Queen’s University. He was an Alexander von Humboldt research fellow with the Institute for Systems Theory and Automatic Control at the University of Stuttgart in 2019-2020. He held postdoctoral positions with the Department of Mechanical and Aerospace Engineering at the University of California, San Diego from 2009 to 2012, and with the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign from 2012 to 2013. He received the 2019 CAIMS-PIMS Early Career Award, a Humboldt research fellowship for experienced researchers from the Alexander von Humboldt Foundation in 2019, an NSERC Discovery Accelerator Supplement in 2019, the SIAG/CST Best SICON Paper Prize in 2021, and the Canadian Society for Information Theory Best Paper Award in 2022. He has served on the Conference Editorial Board of the IEEE Control Systems Society and IEEE Control Systems Letters, and is currently an Associate Editor for the IEEE Transactions on Control of Network Systems. His research interests include systems and control, distributed control, distributed optimization, machine learning, social and economic networks, game theory, geometric control theory, geometric mechanics, and applied Riemannian geometry.

3:00PM – 3:45PM Speaker: Na (Lina) Li, Winokur Family Professor, Electrical Engineering and Applied Mathematics, Harvard University School of Engineering and Applied Sciences (SEAS)

Talk Title: Representation-based Learning and Control for Dynamical Systems

Abstract: The explosive growth of machine learning and data-driven methodologies has revolutionized numerous fields. Yet, translating these successes to the domain of dynamical physical systems remains a significant challenge. Closing the loop from data to actions in these systems faces many difficulties, stemming from the need for sample efficiency and computational feasibility, along with many other requirements such as verifiability, robustness, and safety. In this talk, we bridge this gap by introducing innovative representations to develop nonlinear stochastic control and reinforcement learning methods. Key to the representation is embedding the stochastic, nonlinear dynamics linearly in a nonlinear feature space. We present a comprehensive framework for developing control and learning strategies that achieve efficiency, safety, and robustness with provable performance. We also show how the representation can be used to close the sim-to-real gap.
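
The core representational idea, nonlinear dynamics that become linear in a feature space, can be sketched on a hypothetical scalar system (a classical Koopman-style illustration, not the speaker's framework): lifting the state x to the features (x, x**3) makes the dynamics linear in feature space, so plain least squares recovers them.

```python
# Hypothetical nonlinear scalar dynamics: x' = 0.8*x + 0.1*x**3.
def step(x):
    return 0.8 * x + 0.1 * x ** 3

# Sample state transitions and lift each state to the features (x, x**3).
xs = [-1.0 + 0.2 * i for i in range(11)]
data = [((x, x ** 3), step(x)) for x in xs]

# In feature space the dynamics are linear: x' = w0*x + w1*x**3.
# Solve the 2x2 normal equations of least squares by hand.
s00 = sum(f[0] * f[0] for f, _ in data)
s01 = sum(f[0] * f[1] for f, _ in data)
s11 = sum(f[1] * f[1] for f, _ in data)
b0 = sum(f[0] * y for f, y in data)
b1 = sum(f[1] * y for f, y in data)
det = s00 * s11 - s01 * s01
w0 = (s11 * b0 - s01 * b1) / det
w1 = (s00 * b1 - s01 * b0) / det
```

Because the true dynamics lie exactly in the span of the chosen features, the fit recovers the coefficients (0.8, 0.1); with a good feature space in hand, linear control and learning machinery can then be applied to a nonlinear system.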

Bio: Na (Lina) Li is a Winokur Family Professor of Electrical Engineering and Applied Mathematics at Harvard University. She received her Bachelor’s degree in Mathematics from Zhejiang University in 2007 and her Ph.D. degree in Control and Dynamical Systems from the California Institute of Technology in 2013. She was a postdoctoral associate at the Massachusetts Institute of Technology from 2013 to 2014. She has held a variety of short-term visiting appointments, including at the Simons Institute for the Theory of Computing, MIT, and Google Brain. Her research lies in the control, learning, and optimization of networked systems, including theory development, algorithm design, and applications to real-world cyber-physical and societal systems. She has been an associate editor for IEEE Transactions on Automatic Control, Systems & Control Letters, and IEEE Control Systems Letters, and has served on the organizing committees of several conferences. She received the NSF CAREER Award (2016), AFOSR Young Investigator Award (2017), ONR Young Investigator Award (2019), Donald P. Eckman Award (2019), McDonald Mentoring Award (2020), and the IFAC Manfred Thoma Medal (2023), among other awards.

3:45PM – 4:30PM Speaker: Daniel Russo, Associate Professor, Decision, Risk, and Operations Division, Columbia Business School

Talk Title: Posterior Sampling by Autoregressive Generation

Abstract: Conventionally trained neural networks excel at prediction but often struggle to model uncertainty in their own predictions. We explore this challenge in the cold-start content exploration problem for recommendation systems. We present a scalable approach to Bayesian uncertainty quantification by posing it as a problem of autoregressive generative modeling.  First, we pre-train a generative model to predict the next user’s response to a recommended item based on that item’s features and previous recommendation responses for the item from other users. At inference time, our algorithm makes item recommendations based on limited previous responses and autoregressively generated hypothetical future responses. Far from a heuristic, we synthesize insights from the literature to show our method is a novel implementation of Thompson (posterior) sampling, a prominent bandit algorithm. We prove that the algorithm has low regret whenever the pre-trained autoregressive model has near optimal prediction loss. We then empirically demonstrate the scalability of our approach on a news recommendation problem where text features are required for the best performance.
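
Classical Thompson (posterior) sampling, the bandit algorithm the abstract says the method implements, can be sketched for a two-armed Bernoulli bandit with hypothetical success rates and Beta priors (this is the textbook algorithm, not the autoregressive-generation implementation of the talk):

```python
import random

random.seed(0)

# Two-armed Bernoulli bandit with hypothetical success rates.
TRUE_RATES = [0.7, 0.3]
# Beta(1, 1) priors per arm, stored as (successes + 1, failures + 1).
params = [[1, 1], [1, 1]]
pulls = [0, 0]

for _ in range(2000):
    # Sample a plausible success rate for each arm from its posterior,
    # then act greedily with respect to the sampled rates.
    samples = [random.betavariate(a, b) for a, b in params]
    arm = max(range(2), key=lambda i: samples[i])
    pulls[arm] += 1
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    params[arm][0] += reward
    params[arm][1] += 1 - reward
```

Sampling from the posterior, rather than acting on its mean, is what drives exploration; the talk's contribution is realizing this sampling step via autoregressive generation from a pre-trained sequence model instead of a conjugate posterior.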

Bio: Daniel Russo is the Philip H. Geier Jr. Associate Professor in the Decision, Risk, and Operations Division of Columbia Business School. His research lies at the intersection of statistical machine learning and online decision making, mostly falling under the broad umbrella of reinforcement learning. His work has been recognized by the Frederick W. Lanchester Prize, an INFORMS Junior Faculty Interest Group Best Paper Award, and first place in the George Nicholson Student Paper Competition. Daniel serves as an associate editor at the journals Operations Research, Management Science, and Stochastic Systems. Outside academia, he works with Spotify to apply reinforcement learning and large language models in audio recommendations.

4:30PM – 5:15PM Speaker: Amin Karbasi, Associate Professor, Electrical Engineering & Computer Science, Yale University

Talk Title: Replicability in Interactive Learning

Bio: Amin Karbasi is currently an associate professor of Electrical Engineering, Computer Science, and Statistics & Data Science at Yale University. He is also a staff scientist at Google NY. He has been the recipient of the National Science Foundation (NSF) Career Award, Office of Naval Research (ONR) Young Investigator Award, Air Force Office of Scientific Research (AFOSR) Young Investigator Award, DARPA Young Faculty Award, National Academy of Engineering Grainger Award, Amazon Research Award, Nokia Bell-Labs Award, Google Faculty Research Award, Microsoft Azure Research Award, Simons Research Fellowship, and ETH Research Fellowship. His work has also been recognized with a number of paper awards, including Graphs in Biomedical Image Analysis (GRAIL), Medical Image Computing and Computer Assisted Interventions Conference (MICCAI), International Conference on Artificial Intelligence and Statistics (AISTATS), IEEE ComSoc Data Storage, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ACM SIGMETRICS, and IEEE International Symposium on Information Theory (ISIT). His Ph.D. thesis received the Patrick Denantes Memorial Prize from the School of Computer and Communication Sciences at EPFL, Switzerland.

5:15PM – 5:30PM Closing Remarks: Eshed Ohn-Bar, Assistant Professor (ECE, CS), and Alex Olshevsky, Associate Professor (ECE, SE, CS)


Registration Form