Machine Learning Systems (Spring 2022)

Course Description

The recent success of AI has been due in large part to advances in hardware and software systems. These systems have enabled training increasingly complex models on ever larger datasets. In the process, they have also simplified model development, enabling the rapid growth of the machine learning community. These new hardware and software systems include a new generation of GPUs and hardware accelerators (e.g., the TPU); open-source frameworks such as Theano, TensorFlow, PyTorch, MXNet, Apache Spark, Clipper, Horovod, and Ray; and a myriad of systems deployed internally at companies, to name just a few. At the same time, we are witnessing a flurry of ML/RL applications that improve hardware and system designs, job scheduling, program synthesis, and circuit layouts.

In this course, we will describe the latest trends in systems design to better support the next generation of AI applications, as well as applications of AI to optimize the architecture and performance of systems. The format of this course will be a mix of lectures, seminar-style discussions, and student presentations. Students will be responsible for paper readings and for completing a hands-on project. For projects, we will strongly encourage teams that contain both AI and systems students.

New Course Format

Two previous versions of this course were offered, in Spring 2019 and Fall 2019. The format of this third offering is slightly different. Each week will cover a different research area in AI-Systems. The lecture will be organized around a mini program committee meeting for the week's readings. Students will be required to submit detailed reviews for a subset of the papers and to lead the paper review discussions. For some of the topics, we have also invited prominent researchers, who will present an overview of the field, followed by discussion of the questions raised during the "committee meeting". The goal of this new format is to build mastery of the material, develop a deeper understanding of how to evaluate and review research, and, we hope, provide insight into how to write better papers.

Course Syllabus

This is a tentative schedule. Specific readings are subject to change as new material is published.

Week 1 (1/24/22): Introduction and Course Overview
This lecture will be an overview of the class, its requirements, and an introduction to the history of machine learning and systems research.

Week 2 (1/31/22): Big Data Systems
Guest Speaker: Reynold Xin (Databricks)

Week 3 (2/07/22): Hardware for Machine Learning
Guest Speaker: Prof. Sophia Shao (UC Berkeley)

Week 4 (2/14/22): Distributed Deep Learning, Part I: Systems
Guest Speaker: Microsoft DeepSpeed Team

Week 5 (2/21/22): Holiday (Presidents Day)

Week 6 (2/28/22): Distributed Deep Learning, Part II: Scaling Constraints
Guest Speaker: Michael Houston (Nvidia)

Week 7 (3/07/22): Project Proposals

Week 8 (3/14/22): Machine Learning Applied to Systems
Guest Speaker: Prof. Tim Kraska (MIT)

Week 9 (3/21/22): Spring Break

Week 10 (3/28/22): Machine Learning Frameworks and Automatic Differentiation
Guest Speaker: Prof. Tianqi Chen (OctoML and CMU)

Week 11 (4/04/22): Efficient Machine Learning
Guest Speaker: Vikas Chandra (Facebook)

Week 12 (4/11/22): Fundamentals of Machine Learning in the Cloud; the Modern Data Stack
Guest Speaker: Prof. Matei Zaharia (Databricks and Stanford)

Week 13 (4/18/22): Benchmarking Machine Learning Workloads
Guest Speaker: Prof. Vijay Reddi (Harvard)

Week 14 (4/25/22): Machine Learning and Security

Week 15 (5/02/22): RRR Week

Week 16 (5/09/22): Project Presentations

Projects

Detailed descriptions of candidate projects will be posted shortly. However, students are encouraged to find projects that relate to their ongoing research.

Grading

Grades will be based largely on class participation and projects. In addition, we will require weekly paper summaries, submitted before class.