class: center, top, title-slide

.title[
# Quantitative Data Management
]

.subtitle[
## CRMW Workshop 2
]

.author[
### Charles Lanfear
]

.date[
### 25 Jan 2024
Updated: 23 Jan 2024
]

---

# Today

A research question:

.text-center[
*Controlling for **density**, how is **deprivation** related to **crime** in London?*
]

<br>

![](slides_quantitative-data-management_files/figure-html/mod-dag-1.svg)

<br>

--

.pull-left[
Today we will:

* Create a basic project
* Load data we will need
* Prepare the data for analysis
]

--

.pull-right[
Next time we will:

* Visualize our data
* Model our outcomes
* Diagnose and (try to) address problems
]

---

# Setup

1. Open RStudio

--

2. In the project menu (top right) select *New Project...*

--

3. Select *New Directory*
   * Place it wherever you want

--

4. Using the files tab of the bottom-right panel
   * Make sure you are in your project's main directory
   * Create a new folder called `code`
   * Create a new folder called `data`
   * Create folders called `raw` and `derived` in `data`

--

5. Browse to this lecture on the course website
   * [](

---

# The Data

Save these to the `data` directory in your project

* London Crime data - 2022
  + Crime outcomes
  + [Direct download link (.zip)]( **multiple files!**
  + Source: [``](
* LSOA Indices of Deprivation - 2019
  + Deprivation predictors
  + [Direct download link (.csv)](
  + Source: [``](
* LSOA Population Density - 2014
  + Density control
  + [Direct download link (.xlsx)]( a multi-sheet **Excel file!**
  + Source: [](

---

# Get to Work!

We want to produce **analysis-ready data**:

* Cross-sectional (one row per unit)
* Columns for predictors
* Columns for outcomes

--

The process:

* **Load and clean** each file in a *separate script*
* **Save derived data** as *separate files*
* **Join together** in another script and save the analysis data

--

We'll have *at least* **four scripts**!

* We'll start by making `1_process-metro.R`
* Numbers make **run order** clear

--

.text-center[
*Let's work on this together!*
]

---
class: inverse

# Cleaning Data

<br>

![](img/owl.jpg)

---
class: inverse

# Wrap-Up

### Practice!

* Data management is the hardest and most time-consuming part of any project
* You get good with **practice** and **intentional improvement**
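
---

# Appendix: Workflow Sketch

A minimal sketch of the process from *Get to Work!* (load and clean each file, save derived data, then join), using base R only. It is not the workshop's actual code: the toy data, the column names (`lsoa`, `n_crimes`, `deprivation`, `density`), the file names, and the temporary project folder are all invented so the example runs on its own. The real files will need more cleaning than this.

```r
# Toy project root: a temporary folder stands in for the RStudio project directory.
root    <- file.path(tempdir(), "crmw-sketch")
raw     <- file.path(root, "data", "raw")
derived <- file.path(root, "data", "derived")
for (d in c(file.path(root, "code"), raw, derived)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Invented raw files (hypothetical columns, made-up values) so the sketch runs.
write.csv(data.frame(lsoa = c("A", "A", "B", "C", "C", "C")),
          file.path(raw, "crime.csv"), row.names = FALSE)
write.csv(data.frame(lsoa = c("A", "B", "C"), deprivation = c(0.2, 0.5, 0.9)),
          file.path(raw, "deprivation.csv"), row.names = FALSE)
write.csv(data.frame(lsoa = c("A", "B", "C"), density = c(40, 75, 120)),
          file.path(raw, "density.csv"), row.names = FALSE)

# Script 1: load incident-level crime data, count incidents per LSOA, save.
crime <- read.csv(file.path(raw, "crime.csv"))
crime_counts <- as.data.frame(table(lsoa = crime$lsoa),
                              responseName = "n_crimes",
                              stringsAsFactors = FALSE)
saveRDS(crime_counts, file.path(derived, "crime.rds"))

# Scripts 2 and 3: load each predictor file and save a derived copy.
for (name in c("deprivation", "density")) {
  saveRDS(read.csv(file.path(raw, paste0(name, ".csv"))),
          file.path(derived, paste0(name, ".rds")))
}

# Script 4: join the derived files on the shared key and save the analysis data.
analysis <- Reduce(
  function(x, y) merge(x, y, by = "lsoa"),
  lapply(c("crime", "deprivation", "density"),
         function(name) readRDS(file.path(derived, paste0(name, ".rds"))))
)
stopifnot(anyDuplicated(analysis$lsoa) == 0)  # cross-sectional: one row per unit
saveRDS(analysis, file.path(derived, "analysis.rds"))
```

In a real project each commented section would live in its own numbered script in `code`, so any one step can be rerun without touching the others.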