dsc261-fa25

Image generated using DALL·E

DSC 261: Responsible Data Science

Description:

This course explores the foundations of Responsible Data Science, emphasizing rigorous methodology and ethical application. Students begin with Causal Inference, moving beyond correlation to uncover true causal relationships, and continue with Algorithmic Fairness, learning to detect and mitigate bias for equitable outcomes. The course also highlights Data Valuation, teaching how to measure and attribute the impact of datasets on model performance, alongside Debugging and Data Selection, where students develop practical skills for identifying data issues, refining inputs, and ensuring robustness. By integrating theory with applied techniques, the course prepares students to design, evaluate, and deploy data science methods that are both technically sound and socially responsible.

Instructional team:

Instructor:

Babak Salimi, bsalimi@ucsd.edu

Course Assistants:

Jiongli Zhu, jiz143@ucsd.edu

Parjanya Prashant, pprashant@ucsd.edu

Lectures:

MWF 12:00p-12:50p

Office Hours: Babak: Wednesday 4pm-5pm Link

TAs (Jiongli/Parjanya): Wednesday 1pm-2pm Link

Note: Office hours will be held via Zoom (link can be found on the Canvas calendar).

Course Workload

This course is built around team projects and paper presentations, emphasizing teamwork, practical application of data science techniques, and effective communication. Groups of 5-6 students will collaborate to present papers and develop projects on topics relevant to the class.

Team formation: Team formation details will be announced on Canvas.

This course will involve paper presentations and projects. The grading breakdown is as follows: Paper Presentations (30%), Project Proposal (5%), Midterm Update (5%), Project Report (30%), Project Presentation (20%), Class Participation (10%).

Paper Presentation (Mid-Quarter)

Each team will present 2 related papers in one domain. The presentation will be 20 minutes, followed by a 5-minute discussion. Teams will sign up for the presentation topics here.

Grading for the paper presentation is split as follows:

Depth of Analysis (12%) – Understanding, critique, and synthesis of the papers.
Clarity & Organization (9%) – Clear structure, effective visuals, and presentation.
Discussion Engagement (6%) – Thoughtful questions and interaction with the audience.
Team Collaboration (3%) – Equal contribution from all members.

The presentation slides must be submitted on Canvas within one week of the presentation.

Links to video recordings of presentations:

Nov 10 (Use password: N^3w^gNC)

Nov 12 (Use password: N^3w^gNC)

Nov 17 (Use password: %QLCjb8#)

Nov 19 (Use password: tGy7Eva%)

Nov 21 (Use password: 0r%+68W6)

Nov 24 (Passcode: +Xr*88xD)

Dec 1 (Passcode: Q%M!kK22)

Dec 3 (Passcode: 0zXqZ&.4)

Dec 5 noon (Passcode: 5V5uq=+*)

Dec 5 afternoon (Passcode: NPYKiF7&)

Projects:

Guidelines for Project: Each project must

Present a clearly defined hypothesis and contribute in at least one of the following areas:
- Propose a novel theory or algorithm.
- Apply an existing approach to a new domain.
- Reimplement a paper (published at top ML venues like NeurIPS/ICLR/ICML) where no public code is available.
- Write a comprehensive tutorial on one of the topics listed in the presentation sheet. The tutorial must include empirical results and analysis.
Experiments should be rigorously designed, with well-defined, measurable metrics. All results must include error bars.
The topic has to be picked from the list of paper presentation topics.
For theory-focused projects, formal proofs are required.
Please use this template for the project proposal and reports.
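As one illustration of the error-bar requirement above, the following is a minimal sketch of reporting a metric as a mean with an approximate 95% interval over repeated runs. The function name `mean_with_error_bar`, the normal-approximation multiplier `z=1.96`, and the accuracy values are assumptions made for illustration; they are not part of the course template, and other choices (such as bootstrap intervals or a t-quantile for few runs) may suit a given project better.

```python
import math
import statistics


def mean_with_error_bar(values, z=1.96):
    """Return (mean, half_width) for a list of repeated measurements.

    half_width is z times the standard error of the mean, i.e. a
    normal-approximation confidence interval (z=1.96 is roughly 95%).
    """
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Hypothetical test accuracies from five runs with different random seeds.
accuracies = [0.81, 0.79, 0.83, 0.80, 0.82]
mean, half_width = mean_with_error_bar(accuracies)
print(f"accuracy = {mean:.3f} ± {half_width:.3f}")
```

With only a handful of runs, the normal approximation tends to understate uncertainty, so stating the number of runs and the interval method alongside each result makes the error bars easier to interpret.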

Note on Project grading: Projects will be graded on contribution, effort, writing, and execution. Projects with thoughtfully designed experiments and solid execution will not be penalized for yielding negative results.

Project Timeline: The course will have three project checkpoints:

Initial Proposal: Due October 12, 11:59 PM PST. The proposal must follow the provided template and be at most two pages. Please include in the proposal: problem statement, brief related work, collaboration plan, and datasets to be used.
Midterm Update: Due November 2, 11:59 PM PST. The midterm report must follow the provided template and be no more than 3 pages, excluding references. The report should include data collection and initial analysis, and reflect feedback received on the proposal.
Final Report: Date to be announced

Students are encouraged to meet with the instructor team regularly for guidance. The project will culminate in a short in-class presentation, where students will also create visual posters and dynamic presentations.

Titles and abstracts of projects from the last offering can be found here: Past Projects

Course Calendar:

Week	Date	Lecture Topics	Slides	Readings
1	Sep 30	Introduction	Introduction and Course Overview · Introduction to Causal Inference	* Yuval Noah Harari argues that AI has hacked the operating system of human civilization * Book Extract: Yuval Noah Harari on AI * President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence * AI, Fake News, and Misinformation * Generative AI is the Ultimate Disinformation Amplifier
2	Oct 7	Graphical Models and Potential Outcomes	Graphical Models · Potential Outcomes
3	Oct 14	Structural Causal Models	Structural Causal Models
4	Oct 21	Algorithmic Fairness	Algorithmic Fairness
5	Oct 28	Bias Mitigation	Bias Mitigation	* Optimal Transport Blog Post * Paper 1 · Paper 2
6	Nov 4	—
7	Nov 11	—
8	Nov 18	—
9	Nov 25	—
10	Dec 2	—

Textbooks

Causal Inference in Statistics: A Primer

Trustworthy Machine Learning

Fairness and Machine Learning: Limitations and Opportunities

Interpretable Machine Learning: A Guide for Making Black Box Models Explainable