dsc100-fa22

The Data Lifecycle

  DSC 100: Introduction to Data Management

Description:

Databases are at the heart of modern commercial application development. Their use also extends to many other environments and domains where large amounts of data must be stored for efficient update, retrieval, and analysis. The purpose of this course is to provide a comprehensive introduction to the use of database management systems for applications. Topics covered include data models, query languages, query evaluation and optimization, database design, and transactions.

Instructional team:

Instructor:

Babak Salimi, bsalimi@ucsd.edu

Course Assistants:

Aditya Lahiri, adlahiri@ucsd.edu

Baharan Khatami, skhatami@ucsd.edu

Divija Devarla, ddevarla@ucsd.edu

Manasi Agrawal, maagrawa@ucsd.edu

Lectures:

Lectures for this class will be SYNCHRONOUS.

Office Hours:

Manasi - Monday 11AM-12PM

Aditya - Tuesday 4PM-5PM

Divija - Wednesday 12PM-1PM

Baharan - Thursday 12:50PM-1:50PM

Note: Office hours will be held via Zoom.

Piazza: link (requires the access code posted on Canvas)

Have questions? For questions about logistics, please email both Babak Salimi (bsalimi@ucsd.edu) and one of the TAs. All other questions SHOULD be discussed on Piazza.

Calendar:

Date | Topic | Discussion | Remarks | Lecture Materials | Optional Reading
September 22 | Introduction | | | Introduction |
September 27 | Relational Data Model | | Homework 1: SQLite and SQL Basics (released) | Data Models, Demo | Sec. 2.1, 2.2, 2.3
September 29 | SQL Basics | SQL Intro Discussion | WQ1: Data Models and Simple SQL (released) | SQL Basics, Demo |
October 4 | Joins | | | Joins, Demo |
October 6 | Grouping and Aggregation | SQL Grouping and Aggregation | | Grouping and Aggregation, Demo, Data | Sec. 6.1, 6.2
October 11 | Nested SQL Queries and Set Operations | | | Nested SQL Queries |
October 13 | Nested SQL Queries and Set Operations | Nested Queries | | Set Operations, Demo, Data |
October 18 | Formal Query Languages (Part 1) | | | Formal Query Languages |
October 20 | Formal Query Languages (Part 2) | Midterm Sample Questions | | Formal Query Languages |
October 25 | Midterm | Midterm | | |
October 27 | Query Evaluation | Relational Algebra, Algebra of Bags | | Slides, Video |
November 1 | Basics of Data Storage and Indexes (Part 1) | Indexes Discussion | | Slides, Video |
November 3 | Basics of Data Storage and Indexes (Part 2) | | | Slides, Video |
November 8 | Conceptual Design | Conceptual Design, ER Diagram | | Slides, Video | Sec. 4.1-4.6
November 10 | Integrity Constraints | | | Slides |
November 14 | Design Theory | | | Slides |
November 17 | Normal Forms | Database Design Theory and BCNF | | Slides |
November 22 | NoSQL Databases | | | Slides |
November 29 | Basics of Data Cleaning | | | Slides |

Note: Some slides are adapted from the UW database group.

Workload:

(subject to change)

Homework (60%): There will be weekly homework assignments, each based on the previous one or two lectures. They are of two types:

  1. Written problem-solving and programming assignments (50%):    Start early and allocate enough time to solve these problems!
  2. Gradiance exercises (10%):    Gradiance is an online service pioneered by one of the authors of the textbook, Prof. Jeffrey Ullman at Stanford. One of its best features is that you may test yourself on a particular topic as many times as you like. You receive immediate feedback on each attempt, which avoids the shortcoming of traditional submit-and-wait assignments, where a single misunderstanding can carry over into solutions to several problems and go uncorrected until much later. We encourage you to keep retaking each topic until you score 100%; your highest score will be recorded. The questions are the same on every attempt, but the answer choices are selected at random.

Midterm (15%) and final (25%): Details will be posted later.

Extra Credit:

   - Some homework assignments have extra credit questions.

   - A large number of good answers on Piazza.

Resources / Communication / Toolkits:

Book: A textbook is not required for this course; the lecture slides and recorded videos are sufficient. The following textbook is optional but recommended:

Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008.

Canvas: All weekly homework assignments should be turned in via Canvas.

Communication and Piazza: All important announcements will be sent through Piazza.

All questions that may be of general interest to the class should be directed to Piazza. You will get your questions answered faster on Piazza than by emailing the instructional team personally, because Piazza is monitored closely by everyone in the class, not just the course staff. You are strongly encouraged to answer each other's questions on Piazza (you will get extra credit for the number of good answers you post), and the instructional team will endorse or add to those answers.