Course Information

Deep reinforcement learning (DeepRL) is an emerging research field that has made tremendous advances in the last few years. The most notable among them are AlphaGo beating the world champion in the ancient game of Go, Deep Q-Networks achieving human-level performance on a wide range of computer games, and numerous advances in robotics.

RL deals with the problem of how to make broadly competent decisions in a wide range of dynamic and uncertain environments. Since the problem of decision making under uncertainty is prevalent in almost every field, the set of algorithmic and modeling tools provided by RL is broadly applicable. This course aims to provide an overview of recent developments in RL combined with advances in deep learning. We plan to take a broad perspective on RL as a problem setting and cover a wide range of methods: model-free RL, model-based RL, imitation learning, search and trajectory optimization. The expected course outcomes are:

Understand a modern take on RL (+ deep learning).
Be able to spot applications of RL in a wide variety of settings.
Develop sufficient proficiency to be able to perform research in this field.

Instructors

Aravind Rajeswaran, Office Hours: 3:30pm-4:30pm Mondays CSE 021
Kendall Lowrey, Office Hours: 10am-11am Thursdays CSE 220

Logistics

Time: Monday + Wednesday 2:00pm to 3:20pm
Venue: THO 134
Slack: https://uwcse599g1.slack.com

Prerequisites

The course assumes a strong footing in the following, comparable to taking a graduate or advanced undergraduate level course:

Linear algebra (refresher notes by Zico Kolter and Chuong Do)
Machine learning (see CSE446 or CSE546)
Proficiency in Python (quick tutorial). In addition, homeworks and the project report should be typeset in LaTeX (quick tutorial).

Homeworks and Grading

We will have 2 (or 3) homeworks and a course project. The homeworks are intended primarily to get you started with resources for studying RL questions and algorithms, and hence are expected to be (a) fairly light and (b) somewhat open-ended. The course project should be regarded as the principal outcome of this course.

Project: 60%
Homeworks: 30%
Discussions and participation: 10%

Acknowledgements

We thank Profs. Emanuel Todorov and Sham Kakade for providing feedback and overseeing the course.
We thank all the robotics and ML faculty at UW for providing excellent feedback and supporting the course.
We thank Sergey Levine and his students for sharing the course material used at Berkeley.
The course homepage is based on the design of the DLSys course at UW. We thank the course staff of DLSys.