Course website for CSSS/SOC/STAT 321: Data Science and Statistics for the Social Sciences I
University of Washington
Instructor: Charles Lanfear
Lecture
lecture_room: lecture_day lecture_start-lecture_end
Lab
lab_room: lab_day lab_start-lab_end
My Office
Savery 255: office_day
TBA
Imai:
“Would universal health insurance improve the health of the poor? Do patterns of arrests in US cities show evidence of racial profiling? What accounts for who votes and their choice of candidates? This course will teach students how to address these and other social science questions by analyzing quantitative data. The course introduces basic principles of statistical inference and programming skills for data analysis. The goal is to provide students with the foundation necessary to analyze data in their own research and to become critical consumers of statistical claims made in the news media, in policy reports, and in academic research.”
This course is intended as an accessible, holistic, and case-based introduction to quantitative data analysis using programming and statistics. General topics we will focus on include:
By the end of this course you should feel confident conducting basic research with quantitative data. While the programming component focuses on the R language, the concepts you learn, such as logic, algorithmic thinking, and project management, will be applicable to other programming languages and to research in general.
No specific courses are required and I anticipate this will be the first course in which most students encounter programming. Programming is difficult and can be learned effectively only through time and effort. Trial and error is a key aspect of learning to code and you will always find yourself searching for answers or asking for help. Therefore the only prerequisites for this course are the capability to work hard at something difficult and a willingness to ask for help.
The course website is accessible without a UW NetID and features all of the slides, R code, and assignments. It will be updated continually throughout the quarter and will remain available as a reference after the term has ended.
Labs are in lab_room on lab_day from lab_start to lab_end. Labs will consist of working through lab notebooks as well as question-and-answer sessions for homework and projects. Using your own computer is recommended, but computers are provided in the lab. Lab attendance is mandatory, as lab notebooks must be completed during lab.
This course features both a Slack channel and a mailing list for obtaining extra help. As you develop your skills, you will find that learning how to ask questions well makes finding answers much easier. Students are encouraged and expected to assist one another with technical problems, both in and out of class. Diagnosing problems in others’ code is a very effective way to improve your understanding of programming.
This course features a mailing list for asking questions. I encourage you to use this list as your primary means of asking long technical questions. To use the mailing list, address your email to mailing_list_address and it will be sent to the instructor and all students in the class. Unless you specifically request otherwise, emailed questions directed to me (clanfear@uw.edu) that may be useful to others in the class will be answered with a response to the class mailing list.
We will also use a Slack channel for communication. You may ask and answer questions in the Slack channel instead of the mailing list. Slack is best-suited for short technical questions or extended conversations. Your @uw.edu email must be used to register for the channel. You will receive an invitation in the first week of class.
Grades will be assessed with the following breakdown:
Item | Number | Percent of Grade |
---|---|---|
Problem Sets | 5 | 40 |
Lab Notebooks | 8 | 20 |
swirl Assignments | 8 | 10 |
Project Proposal | 1 | 10 |
Project Report | 1 | 20 |
A score of at least 95% will guarantee a 4.0 in the course. Every 2.5 percentage points below 95 will correspond to a .1 difference in the final grade (e.g. 90/100 is a 3.8).
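The grading arithmetic above can be sketched in R. This is only an illustration, not the official grade calculation: the helper names `course_percent()` and `grade_points()` and the example scores are hypothetical, and the sketch assumes the 0.1-per-2.5-points rule is applied linearly, rounded to one decimal place, and bounded between 0.0 and 4.0.

```r
# Weights from the grade breakdown table (percent of final grade).
weights <- c(problem_sets = 40, lab_notebooks = 20, swirl = 10,
             proposal = 10, report = 20)

# Hypothetical helper: combine per-component scores (each a proportion
# between 0 and 1) into an overall course percentage.
course_percent <- function(scores) {
  stopifnot(setequal(names(scores), names(weights)))
  sum(weights[names(scores)] * scores)
}

# Hypothetical helper: convert a course percentage to the 4.0 scale.
# Assumption: 0.1 grade points are lost per 2.5 percentage points below 95,
# applied linearly, rounded to one decimal, and bounded to [0.0, 4.0].
grade_points <- function(percent) {
  raw <- 4.0 - 0.1 * (95 - percent) / 2.5
  round(pmin(4.0, pmax(0.0, raw)), 1)
}

# Made-up scores for illustration only.
example_scores <- c(problem_sets = 0.90, lab_notebooks = 1.00, swirl = 1.00,
                    proposal = 0.85, report = 0.80)

pct <- course_percent(example_scores)  # overall course percentage
grade_points(pct)                      # final grade on the 4.0 scale
grade_points(90)                       # 3.8, matching the example in the text
```

Because the percentages in the table sum to 100, multiplying each weight by the proportion earned in that component gives the course percentage directly.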
Problem sets must be turned in as knitted R Markdown documents, which we will learn to create in the first week and for which templates are provided. They are designed first and foremost to develop skills rather than “prove” you have learned concepts. Problem sets will be uploaded to Canvas as both a knitted HTML file and an R Markdown document with all the code required to produce the output document. They will be graded on a 0 to 5 point scale based on a simple effort-focused rubric found on the homework page.
I encourage you to communicate and work together, so long as you write and explain your code yourself and do not copy work wholesale. You can learn a lot from replicating others’ code, but you will learn nothing if you copy it without knowing how it works. Use of “found” or “borrowed” code is permitted only if you provide a citation (a link is sufficient). If you collaborate with others, limit it to a total group size of THREE (including yourself) and list your collaborators on the front page of your problem set. Problem sets containing uncited code from other sources will be treated as plagiarism. Evidence of collaboration with others without attribution, or in groups larger than THREE, will result in a 0 for the assignment for everyone with the shared code.
In each lab, students will complete exercises in an R Markdown document. These exercises will be turned in as both a knitted HTML document and the original R Markdown (.Rmd) file at the end of lab. Each notebook is worth 2 points and is evaluated as complete (2), at least 50% complete (1), or less than 50% complete (0). The course has 9 labs available, but students are required to complete only 8 of them. Completing all 9 labs will yield up to 2.5 percentage points of extra credit.
swirl Assignments
Students will complete interactive R tutorials outside of class using the swirl package. These tutorials give students the opportunity to learn R commands at their own pace in a structured environment. Each assignment is evaluated only as complete or incomplete.
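As a minimal sketch of getting started with the tutorials, the commands below install swirl from CRAN and open its interactive menu. Which lessons are assigned, and how completion is reported, are not specified here, so this covers only the generic setup step.

```r
# Install swirl once per computer (requires an internet connection).
install.packages("swirl")

# Load the package and start the interactive tutorial menu.
# swirl() prompts for a name and then lists the courses available to run.
library(swirl)
swirl()
```

Run these lines in the R console of an interactive session; swirl cannot be completed from inside a knitted document because it waits for typed responses.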
Groups of 2 to 4 students will choose from a diverse series of quantitative data sets and prepare an analytical report. Projects will combine descriptive statistics, visualization, and statistical models in a readable format for the general population. These will be similar in style and content to a research memo in a workplace or an exploratory research proposal for a graduate-level project. They will include:
Project reports will be submitted as two files like homework assignments: (1) a knitted HTML file with no visible raw code and (2) an R Markdown file with all necessary code embedded in the document. Reports must be between 2500 and 5000 words and feature between 2 and 5 figures or tables. You will turn in a project proposal at the start of Week 9 which includes the introduction, a proposal for the methods section, and some exploratory results from the data. You will receive detailed feedback on your proposal. Proposals and final reports will be evaluated using the rubric found on the project report page.
The required text for this course is Kosuke Imai’s Quantitative Social Science. It is a hands-on introduction to data analysis and statistics which integrates R code, interactive tutorials, and real data analysis projects. We will supplement this textbook with Jeff Arnold’s tidyverse adaptation of the code used in the book.
There are many free texts and resources which may offer alternative perspectives on the content we will cover. Here are some suggestions:
R for Data Science by Garrett Grolemund and Hadley Wickham, a great general introduction to R programming for data management and analysis.
OpenIntro Statistics by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel, a free and open-source introduction to statistics using R.
Data Visualization: A Practical Introduction by Kieran Healy, an introduction to visualizing data and R programming.
Additional recommended readings will be posted on the website over the term.
The following resources are recommended and may prove useful.
RStudio Primers: Interactive web-based tutorials for R using modern approaches.
RStudio Cheat Sheets: Handy reference sheets covering a variety of common R tasks and packages.
Stack Overflow: The internet’s largest programming help community. Nearly any question you have about programming in R has probably already been asked and answered here, and if not, this is the place to ask.
This is a course in programming and statistics using R. You are welcome to use lab computers; however, R and RStudio are free software, so you are encouraged to use your own computer for familiarity and accessibility. You can acquire R from the Comprehensive R Archive Network (CRAN) and RStudio from the RStudio home page (you want the free RStudio desktop version). Installation instructions may be found here. The instructor can also provide support with installing R or RStudio in office hours or over email or Slack.
Lectures will cover both R programming and statistical concepts. The focus is on identifying problems then applying an appropriate method using computation and statistics. Labs will be used to apply quantitative social science concepts in a collaborative environment using structured exercises.
Week | Topic | Reading | Due |
---|---|---|---|
1 | Causality | TBA | |
2 | Causality | TBA | HW1 |
3 | Measurement | TBA | |
4 | Measurement | TBA | HW2 |
5 | Prediction | TBA | |
6 | Prediction | TBA | HW3 |
7 | Probability | TBA | |
8 | Probability | TBA | HW4 |
9 | Uncertainty | TBA | Report Draft |
10 | Uncertainty | TBA | HW5 |
Finals | None | None | Report Final |
You are permitted to submit one problem set up to four days late without penalty, with no explanation required. Other late assignments will be penalized by 25% per day unless you provide official documentation of an unavoidable cause for absence or inability to complete the assignment on time. Valid unavoidable causes are limited to those described in Student Governance Policies Chapter 112, Subsection 1.B.
Your experience in this class is important to me. If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. If you have non-disability-related concerns about participating in this class, such as difficulty with course materials or the pace of the course, please contact me. I want this course to be maximally useful and accessible to all students.
As a University of Washington student, you are bound by the university’s student conduct policy. Academic misconduct, including plagiarism–which includes copying any material from any source including fellow students without attribution–will be referred to the Community Standards & Student Conduct office. If you are not sure a particular practice is acceptable, please contact me via email or in office hours.