Modern data systems are evolving to cope with the growing scale and complexity of multi-modal data used in emerging applications such as search engines, business intelligence, and large generative models. This course begins with individual databases designed for various data models, progresses to cloud databases such as data lakes and warehouses, and then covers optimization techniques, tuning, and data integration strategies. Our goal is to provide a contemporary perspective that supports practical, data-intensive applications.

Instructor: Yao Lu
When and where: Fridays, LT15 18:30-20:30 (lecture), 20:30-21:30 (tutorial)

Schedule:

Lecture date   Plan   Note
Jan 17 Week 1: Introduction
[lecture slides]
Jan 24 Week 2: Relational Databases I. Concepts
Jan 31 Week 3: Relational Databases II. Tuning Strategies

Tutorial session: Labs for relational DB design
[HW1 Release: relational DB design]
Feb 07 Week 4: Modern Databases I. Key-Value and Vector Databases

Tutorial session: Labs for vector DB
Feb 14 Week 5: Modern Databases II. Streaming and Time Series Databases

Tutorial session: Labs for time series DB

Feb 21 Week 6: Modern Databases III. Document Databases

Tutorial session: Labs for relational DB tuning
[HW2 Release: relational DB tuning]
Feb 28 Recess week, no lecture
Mar 07 Week 7: Cloud Databases I: MapReduce and Spark

Tutorial session: team project presentations, vector DB
Mar 14 Week 8: Cloud Databases II: Data Lakes and Warehouses

Tutorial session: team project presentations, time series DB

Mar 21 Week 9: Query Optimization

Tutorial session: team project presentations, document DB
[Final project release]
Mar 28 Well-Being Day, no lecture
Apr 04 Week 11: Data Integration

Tutorial session: TBD
Apr 11 Week 12: Data Curation for Machine Learning

Tutorial session: TBD
Apr 18 Final project presentations, time & location: TBD
Grading scheme:
  • 4 Tutorials: 10%, 2 Homework Assignments: 40%, 1 Team Project: 20%, Final Project: 30%