Lunch Talk: Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models

The University of Toronto Operations Research Group (UTORG) is hosting a lunch talk by Buser Say. The talk is entitled “Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models”. Lunch and coffee will be provided. Hope to see you there!

Who: Buser Say, Ph.D. candidate, University of Toronto

When: Wednesday, August 1 @ 12:00pm – 1:00pm

Where: MB101

Bio-sketch: Buser is a Ph.D. candidate at the University of Toronto under the supervision of Professor Scott Sanner, and a member of the Data-Driven Decision Making Lab (D3M). He previously completed his BASc. in Industrial Engineering at the University of Toronto (2014) with an emphasis on Operations Research, and earned his MASc. from the University of Toronto (2016) as a member of the Toronto Intelligent Decision Engineering Laboratory (TIDEL). His main research focus is the application of Operations Research techniques and Deep Neural Networks to the Data-Driven Automated Hybrid Planning framework.

Abstract: In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. To directly exploit this transition structure for planning, we present two novel compilations of the learned factored planning problem with BNNs, based on reductions to Boolean Satisfiability (FD-SAT-Plan) and Binary Linear Programming (FD-BLP-Plan). Experimentally, we show the effectiveness of learning complex transition models with BNNs and test the runtime efficiency of both encodings on the learned factored planning problem. Following this initial investigation, we present an incremental constraint generation algorithm based on generalized landmark constraints to improve the planning accuracy of our encodings. Finally, we show how to extend the best-performing encoding (FD-BLP-Plan+) beyond goals to handle factored planning problems with rewards.