Lunch Talk: Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

The University of Toronto Operations Research Group (UTORG) is hosting a lunch talk by Michael Gimelfarb. The talk is entitled “Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach”.  Lunch and coffee will be provided.  Hope to see you there!

Who: Michael Gimelfarb, Ph.D. candidate, University of Toronto


When: Thursday, September 20th @ 12:00pm – 1:00pm

Where: MB101


Bio-sketch: Michael Gimelfarb has been a full-time PhD student in MIE since September 2017, supervised jointly by Professor Chi-Guhn Lee and Professor Scott Sanner. He received his BBA in Finance from the Schulich School of Business in 2014 and his MASc from MIE in 2016, where his thesis focused on the theoretical analysis of the Thompson sampling algorithm applied to queuing control problems. His current research focuses on applying Bayesian methods and deep learning techniques to reinforcement learning problems. His current and recent work includes reward shaping, decision tree classification using bandits, and automated curriculum learning.

Abstract: Potential-based reward shaping is a powerful technique for accelerating the convergence of reinforcement learning algorithms. Typically, the shaping information includes an estimate of the optimal value function and is provided by a human expert or another source of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this work, we apply Bayesian Model Combination with multiple experts in a way that learns to trust the best combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence of various reinforcement learning algorithms across many domains.
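
For readers new to the topic, the following is a minimal sketch of potential-based reward shaping with several expert potentials, not the speaker's implementation. It uses the standard shaping term gamma * Phi(s') - Phi(s) and assumes the combined potential is a weighted average of the experts' potentials. The function names, the integer states, the two toy experts, and the fixed weights are all hypothetical; in the approach described above, the weights are learned as training progresses rather than fixed.

```python
from typing import Callable, Sequence

State = int  # assumption: states are plain integers in this toy example
Potential = Callable[[State], float]


def combined_potential(experts: Sequence[Potential],
                       weights: Sequence[float],
                       state: State) -> float:
    """Weighted average of the experts' potential estimates at a state."""
    return sum(w * phi(state) for w, phi in zip(weights, experts))


def shaped_reward(reward: float, state: State, next_state: State,
                  experts: Sequence[Potential], weights: Sequence[float],
                  gamma: float = 0.99) -> float:
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    phi_s = combined_potential(experts, weights, state)
    phi_next = combined_potential(experts, weights, next_state)
    return reward + gamma * phi_next - phi_s


# Two hypothetical experts: one roughly helpful, one misleading.
experts = [lambda s: float(s), lambda s: -float(s)]
# Fixed weights for illustration only (assumption); they sum to 1.
weights = [0.75, 0.25]

example = shaped_reward(0.0, state=2, next_state=3,
                        experts=experts, weights=weights, gamma=1.0)
print(example)
```

Putting more weight on the helpful expert makes the shaping term reward moves toward higher-valued states, while the misleading expert's influence shrinks with its weight.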