Modern data systems are evolving to cope with the growing scale and complexity of multi-modal data used in emerging applications such as search engines, business intelligence, and large generative models. This course begins with individual databases designed for various data models, progresses to cloud databases such as data lakes and warehouses, and then covers optimization techniques, tuning, and data integration strategies. Our goal is to provide a contemporary perspective that supports practical, data-intensive applications.

Instructor: Yao Lu

When and where: Fridays, LT15, 18:30-20:30 (lecture), 20:30-21:30 (tutorial)

Schedule:

Lecture date   Plan   Note

Jan 17   Week 1: Introduction [lecture slides]
Jan 24   Week 2: Relational Databases I. Concepts
Jan 31   Week 3: Relational Databases II. Tuning Strategies
         Tutorial session: Labs for relational DB design
         [HW1 Release: relational DB design]
Feb 07   Week 4: Modern Databases I. Key-Value and Vector Databases
         Tutorial session: Labs for vector DB
Feb 14   Week 5: Modern Databases II. Streaming and Time Series Databases
         Tutorial session: Labs for time series DB
Feb 21   Week 6: Modern Databases III. Document Databases
         Tutorial session: Labs for relational DB tuning
         [HW2 Release: relational DB tuning]
Feb 28   Recess week, no lecture
Mar 07   Week 7: Cloud Databases I: MapReduce and Spark
         Tutorial session: team project presentations, vector DB
Mar 14   Week 8: Cloud Databases II: Data Lakes and Warehouses
         Tutorial session: team project presentations, time series DB
Mar 21   Week 9: Query Optimization
         Tutorial session: team project presentations, document DB
         [Final project release]
Mar 28   Well-Being Day, no lecture
Apr 04   Week 11: Data Integration
         Tutorial session: TBD
Apr 11   Week 12: Data Curation for Machine Learning
         Tutorial session: TBD
Apr 18   Final project presentations, time & location: TBD

Grading scheme:

4 Tutorials: 10%
2 Homeworks: 40%
1 Team Project: 20%
Final Project: 30%