CS6216 Advanced Topics in Machine Learning

The rise of Large Generative Models (LGMs) has revolutionized AI capabilities, but building efficient systems to support them is a critical next step. This course covers modern machine learning systems for LGMs, from fundamentals to cutting-edge topics. Students will learn the system design principles for training, inference, and serving of LGMs; scaling techniques to handle ever-growing models; memory reduction strategies to optimize resource utilization; and acceleration techniques to improve model performance. The course provides background for students who would like to pursue engineering or research in machine learning systems.

Prerequisites: undergraduate machine learning, undergraduate operating systems, Python coding.
Instructor: Yao Lu
TAs: Shenggan Cheng, Xuanlei Zhao
When and where: Wed 10:00-12:00 COM1-0212 (SR3)

Schedule:

Lecture date	Plan	Lecturer if not Yao	Note
Aug 14	Week 1: Introduction [slides]		[HW1 Release]
Aug 21	Week 2: MLsys foundations [slides]		Jeff Dean and Prateek Jain's talk in the first hour
Aug 28	Week 3: Automatic differentiation [slides]		HW1 due (Aug 31)
Sep 04	Week 4: Hardware acceleration [slides]		[HW2 Release]
Sep 11	Week 5: Parallelism and training techniques [slides]
Sep 18	Week 6: Transformers, Attention and Optimizations [slides]		HW2 due, Project proposal due (Sep 21)
Sep 25	Recess week
Oct 02	Week 7: Serving LLMs [slides]		[HW3 Release]
Oct 09	Week 8: Fine-tuning and alignment techniques [slides]
Oct 16	Week 9: AI for systems	Guest lecture: Dr. Jialin Ding	HW3 due, Mid-term project report due
Oct 23	Week 10: Application Systems: server design, AI Agents and RAGs [slides]
Oct 30	Week 11: ML compilers	Guest lecture: Dr. Tianqi Chen	[HW4 Release]
Nov 06	Week 12: Cloud systems for AI [slides]
Nov 13	Week 13: Project presentations [Schedule]		HW4 due, Final project report due

Grading schemes:

Mandatory: (1) Paper reading and discussion, and (2) HW1 for individual students.
Elective: (1) HW2-4 for individual students, and (2) course project for groups of 2-3 students.
You can choose to do more (or all) of the homework and less (or none) of the project, or the other way around.
Note: pure ML/AI/CV/NLP projects are not acceptable. The project must demonstrate systems design and implementation that lead to improvements in system efficiency, robustness, or generalizability.