ORIE Colloquium: Qiaomin Xie (Cornell University), April 7, 2021
From Henry Lam
Title: Power of Monte Carlo Methods: Tree-Search, Convergence and Stability
Monte Carlo methods, also known as simulation-based methods, are a powerful paradigm for online planning and decision-making. In this talk, I will discuss my work on understanding Monte Carlo reinforcement learning methods and applying them to challenging network control problems.
First, I will focus on Monte Carlo Tree Search (MCTS), which enjoys remarkable empirical success but lacks theoretical understanding. We provide a complete and rigorous non-asymptotic analysis of MCTS. Our analysis develops a general framework based on a hierarchy of bandits, and highlights the importance of using a non-standard confidence bound (also used by AlphaGo) for convergence. Our results show that MCTS provides high-quality samples for supervised learning architectures, which together serve as a strong policy improvement operator. I will further discuss extensions to Markov games, and new variants of MCTS for continuous action spaces using adaptive quantization.
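To make the "hierarchy of bandits" view concrete: each node of an MCTS tree runs a bandit algorithm over its children, scoring each child by its empirical value plus an exploration bonus. A minimal single-node sketch is below; the two arms, their means, the noise model, and the exact exponents and constants in the polynomial bonus are all illustrative assumptions, not the precise form from the analysis discussed in the talk.

```python
import math
import random

random.seed(0)

# Toy two-armed bandit standing in for one node of the search tree.
# Arm means are hypothetical; in real MCTS a "pull" is a rollout or a
# recursive call into the child subtree.
TRUE_MEANS = {"left": 0.4, "right": 0.6}

def pull(arm):
    """Simulated rollout return for an arm (uniform noise around its mean)."""
    return TRUE_MEANS[arm] + random.uniform(-0.2, 0.2)

def polynomial_bonus(t, n, beta=1.0):
    # A polynomial (non-logarithmic) confidence bound, roughly
    # beta * t^(1/4) / sqrt(n). Illustrative only: the paper's exponents
    # and constants differ.
    return beta * t ** 0.25 / math.sqrt(n)

def select_and_update(rounds=2000):
    counts = {a: 1 for a in TRUE_MEANS}            # one initial pull per arm
    sums = {a: pull(a) for a in TRUE_MEANS}
    for t in range(len(TRUE_MEANS) + 1, rounds + 1):
        # Pick the arm maximizing empirical mean + exploration bonus.
        arm = max(TRUE_MEANS,
                  key=lambda a: sums[a] / counts[a]
                  + polynomial_bonus(t, counts[a]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    # The value backed up to the parent node is the best empirical mean.
    best = max(TRUE_MEANS, key=lambda a: sums[a] / counts[a])
    return best, counts

best, counts = select_and_update()
print(best, counts)
```

Because the polynomial bonus shrinks only slowly in the pull count, the suboptimal arm keeps receiving some exploration, but the better arm dominates the budget; stacking such bandits level by level gives the tree search its convergence guarantee.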
In the second part of the talk, I will discuss applications to stochastic network control problems. These problems feature long-run average-cost criteria and unbounded state spaces, making most existing RL approaches inapplicable. For these problems, stability is a key performance metric: it is the first step toward optimality, and a necessary subroutine for many algorithms. We argue that the local policy improvement property of Monte Carlo methods makes them well suited for stabilizing the system on the fly. We develop an online planning algorithm that provably achieves stability under a Lyapunov function assumption. Given a stable policy, we further show that a model-based RL method converges to the optimal policy. Our method demonstrates promising performance in a variety of network control problems including routing, dynamic server allocation, and switch scheduling.
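The stability criterion above can be illustrated on a toy example: a policy is stabilizing if some Lyapunov function V has negative expected drift, E[V(X') − V(X) | X = x] ≤ −ε, outside a bounded set. The sketch below estimates this drift by Monte Carlo for a single-server queue under an always-serve policy; the arrival and service rates and the choice V(x) = x are hypothetical, chosen only to make the drift check concrete.

```python
import random

random.seed(1)

# Toy single-server queue: one Bernoulli(0.4) arrival per slot, and the
# server completes a job with probability 0.7 when the queue is nonempty.
# Rates are hypothetical. With V(x) = x, negative drift for x >= 1
# certifies stability.
ARRIVAL_P, SERVICE_P = 0.4, 0.7

def step(x):
    """One transition of the queue length under the always-serve policy."""
    arrivals = 1 if random.random() < ARRIVAL_P else 0
    served = 1 if (x > 0 and random.random() < SERVICE_P) else 0
    return x - served + arrivals

def estimate_drift(x, n=50_000):
    """Monte Carlo estimate of E[V(X') - V(X) | X = x] with V(x) = x."""
    return sum(step(x) - x for _ in range(n)) / n

drift = estimate_drift(10)
print(drift)  # close to 0.4 - 0.7 = -0.3 for a nonempty queue
```

An online planner in this spirit would run such simulated drift comparisons across candidate actions at the current state and pick one that pushes V downward, stabilizing the system before any optimality considerations enter.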
Qiaomin Xie is a visiting assistant professor in the School of Operations Research and Information Engineering (ORIE) at Cornell. Prior to that, she was a postdoctoral researcher with LIDS at MIT, and a research fellow at the Simons Institute during Fall 2016. Qiaomin received her Ph.D. degree in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign, and her B.E. degree in Electronic Engineering from Tsinghua University. Her research interests lie in the fields of applied probability, reinforcement learning, and stochastic networks. She is the recipient of a Google System Research Award (2020), the UIUC CSL PhD Thesis Award (2017), and the Best Paper Award at the IFIP Performance conference (2011).