This course delves into Responsible Data Science, emphasizing the importance of conscientious practices in data analysis and application. It commences with Causal Inference, enabling students to distinguish between correlation and causation for accurate data interpretation. Subsequent modules cover Algorithmic Fairness to promote unbiased AI development, and Explainable AI, which aims to enhance the transparency and reliability of AI outputs. The curriculum concludes with Data Cleaning, Profiling, and Debiasing, where students learn to refine data quality and mitigate inherent biases. The course is tailored for those seeking to apply data science principles responsibly in practical scenarios.
Instructor:
Babak Salimi, bsalimi@ucsd.edu
Course Assistants:
Baharan Khatami, skhatami@ucsd.edu
Jiongli Zhu, jiz143@ucsd.edu
Parjanya Prashant, pprashant@ucsd.edu
Lectures:
Tuesdays and Thursdays at 5:00pm-6:20pm
Office Hours:
Baharan Khatami: Mon 3:00pm-4:00pm. Zoom link
Jiongli Zhu: Thu 10:00am-11:00am. Zoom link
Parjanya Prashant: Wed 1:00pm-2:00pm. Zoom link
Note: Office hours will be held via Zoom (link can be found on the Canvas calendar).
In this course, students will engage in a project-based learning experience, emphasizing teamwork, practical application of data science techniques, and effective communication. Groups of 2-3 students will collaborate to develop projects on topics relevant to the class. These projects can range from open-ended research that could potentially lead to a paper to the implementation and analysis of algorithms, building tools with GUIs for applications in responsible data science, or analyzing real and synthetic datasets. The focus is on applying responsible data science methodologies to real-world problems and data scenarios, rather than just reading papers or writing surveys.
The course will have three checkpoints: an initial proposal, a midterm update, and a final report. Students are encouraged to meet with the instructor regularly for guidance. The project will culminate in a short in-class presentation. Evaluation will consider the team’s effort, the results, the quality of the related work survey, and the effectiveness of the presentation and final report. Students will also participate in peer reviews, offering and receiving constructive feedback. To enhance the learning experience, students will create visual posters and dynamic presentations, aiming to foster a deep understanding of responsible data science. For detailed project ideas, refer to the provided Google Docs.
(subject to change)
Week | Date | Lecture Topics | Slides | Readings |
---|---|---|---|---|
1 | Jan 9 | Course Overview, Introduction to Causal Inference | Slides Slides | Understanding Simpson’s Paradox |
2 | Jan 15 | Potential Outcomes, Structural Causal Models | Slides Slides | d-Separation Without Tears |
3 | Jan 22 | Project Brainstorming | | |
4 | Jan 29 | Structural Causal Models, Identification | Slides Recording | Structural Causal Models — A Quick Introduction |
5 | Feb 5 | Causal Estimation (Guest Lecture), Fair Ranking (Guest Lecture) | Recording | Matching Methods for Causal Inference |
6 | Feb 12 | Algorithmic Fairness | Slides | A Survey on Bias and Fairness in Machine Learning |
7 | Feb 19 | Fairness and Privacy (Guest Lecture), Algorithmic Fairness | Recording Slides | Direct and Indirect Effects, Interventional Fairness |
8 | Feb 26 | Data Profiling, Data Profiling (Guest Lecture) | Slides Slides | |
9 | Mar 4 | Data Cleaning | Slides | |
10 | Mar 11 | Final Project Presentations | | |
Note: The readings and slides are placeholders and should be replaced with actual links to resources.
Causal Inference in Statistics: A Primer
Fairness and Machine Learning: Limitations and Opportunities
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable