Algorithms & Randomness Center (ARC)
Ashwin Pananjady (Georgia Tech)
Monday, April 26, 2021
Virtual via Bluejeans - 11:00 am
Title: Toward instance-optimal policy evaluation: From lower bounds to algorithms
Abstract: The paradigm of reinforcement learning has now made inroads in a wide range of applied problem domains. This empirical research has revealed the limitations of our theoretical understanding: popular RL algorithms exhibit a variety of behaviors across domains and problem instances, and existing theoretical bounds, which are generally based on worst-case assumptions, can often produce pessimistic predictions. An important theoretical goal is to develop instance-specific analyses that help to reveal what aspects of a given problem make it "easy" or "hard", and allow distinctions to be drawn between ostensibly similar algorithms. Taking an approach grounded in nonparametric statistics, we initiate a study of this question for the policy evaluation problem. We show via information-theoretic lower bounds that many popular variants of temporal difference (TD) learning algorithms *do not* exhibit the optimal instance-specific performance in the finite-sample regime. On the other hand, making careful modifications to these algorithms does result in automatic adaptation to the intrinsic difficulty of the problem. When there is function approximation involved, our bounds also characterize the instance-optimal tradeoff between "bias" and "variance" in solving projected fixed-point equations, a general class of problems that includes policy evaluation as a special case.
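For readers unfamiliar with the setup, the following is a minimal sketch of tabular TD(0) policy evaluation on a small Markov reward process. The transition matrix, rewards, discount factor, and step-size schedule are all illustrative assumptions, not taken from the talk, and the instance-adaptive modifications the abstract alludes to are not shown here.

```python
import numpy as np

# Hypothetical 3-state Markov reward process (illustrative values only).
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # transition matrix
r = np.array([1.0, 0.0, 2.0])    # expected one-step rewards
gamma = 0.9                      # discount factor

# Ground-truth value function solves the Bellman fixed point
# V = r + gamma * P @ V, i.e. V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

def td0(num_steps=200_000):
    """Plain TD(0): stochastic approximation of the Bellman fixed point."""
    V = np.zeros(3)
    s = 0
    for t in range(num_steps):
        s_next = rng.choice(3, p=P[s])
        alpha = 1.0 / (1.0 + t / 100.0)          # decaying step size (assumed)
        # Temporal-difference update toward the sampled Bellman target.
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V

V_est = td0()
print("max error:", np.max(np.abs(V_est - V_true)))
```

The lower bounds discussed in the talk concern how fast such estimates can converge on a *given* instance (P, r, gamma), rather than in the worst case over all instances.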
The talk is based on joint work with Koulik Khamaru, Wenlong Mou, Feng Ruan, Martin Wainwright, and Michael Jordan. No prior knowledge of reinforcement learning will be assumed.
Videos of recent talks are available at: http://arc.gatech.edu/node/121