Reproducibility is not replication.
Reproducible studies can still be wrong... and reproducibility makes proving studies wrong much easier.
Reproducibility means:
Any study that isn't reproducible can be trusted only on faith.
Reproducibility comes in three forms (Stodden 2014):
Empirical: Repeatability in data collection.
Statistical: Verification with alternate methods of inference.
Computational: Reproducibility in cleaning, organizing, and presenting data and results.
R is particularly well suited to enabling computational reproducibility.
It will not fix flawed research design, nor offer a remedy for improper application of statistical methods.
Those are the difficult, non-automatable things you want skills in.
Elements of computational reproducibility:
Shared data
Shared code
Documentation
Version Control
For academic papers, degrees of reproducibility vary:
"Read the article"
Shared data with documentation
Shared data and all code
Literate programming
Research compendium
Docker compendium: Self-contained ecosystem
Literate programming combines code and text in the same self-contained document.
Literate programming allows a reader to examine your computational methods within the document itself.
By re-running the code, they reproduce your results on demand.
Common Platforms:
We'll cover and practice this in a bit
A research compendium is a portable, reproducible distribution of a project.
Research compendia feature:
A literate programming document as the foundation
Files organized in a recognizable structure
Clear separation of data, method, and output. Data are read only.
Well-documented or even preserved computational environment (e.g. Docker)
Compendia are commonly managed via git repositories:
Or on platforms like Harvard's Dataverse:
Quarto and R Markdown's bookdown generate properly formatted articles, books, and dissertations.
Integrate citations and automate reference pages using BibTeX files (even directly from Zotero)
More accessible than LaTeX for typesetting and reference management
More consistent, flexible, and reproducible than Word and Google Docs
Generate submission-ready .docx or publication-ready .html or .pdf documents at the same time.
We'll get to these in a bit.
Organizing research projects is something you either do accidentally—and badly—or purposefully with some upfront labor.
Uniform organization makes switching between or revisiting projects easier.
I suggest something like the following:
project/
    code/
        functions.R
        models.R
    data/
        derived/
            processed_data.RData
        raw/
            core_data.csv
    docs/
        memo_2022-06-22.pdf
        memo_2022-09-01.pdf
        paper.Qmd
    readme.md
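A minimal sketch of setting up a folder skeleton like the one suggested above, using only base R. It builds everything inside a temporary directory so it is safe to run anywhere; `project_root` and `project_dirs` are illustrative names, not part of the original slides.

```r
# Build the suggested project skeleton inside a temporary directory.
# `project_root` is a stand-in; in practice it would be your project folder.
project_root <- file.path(tempdir(), "project")

# Folder names mirror the suggested layout
project_dirs <- c("code", "data/raw", "data/derived", "docs")
for (d in project_dirs) {
  # recursive = TRUE creates parent folders (e.g. data/) as needed
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# An empty readme at the project root as a placeholder
file.create(file.path(project_root, "readme.md"))

# List the resulting folders, relative to the project root
list.dirs(project_root, full.names = FALSE, recursive = TRUE)
```

Creating the same skeleton for every project is what makes switching between projects cheap: you always know where raw data, derived data, code, and documents live.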
To summarize Jenny Bryan, one should separate workflow from projects.
Workflow:
The software you use to write your code (e.g. RStudio)
The location you store a project
The specific computer you use
The code you ran earlier or typed into your console
Project:
The raw data
The code that operates on your raw data
The packages you use
The output files or documents
Projects should not modify anything outside of the project nor need to be modified by someone else (or future you) to run.
Projects should be independent of your workflow.
For research to be reproducible, it must also be portable. Portable software operates independently of workflow details such as fixed file locations.
Do Not:
Use setwd() in scripts, .Rmd, or .Qmd files.
Use absolute paths: read_csv("C:/my_project/data/my_data.csv")
Use install.packages() in script or .Rmd files.
Use rm(list=ls()) anywhere but your console.
Do:
Use the here package or R/Qmd docs to set directories.
Use relative paths: read_csv("./data/my_data.csv")
Load packages with library().
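As a rough illustration of portable paths, the sketch below builds file locations from pieces relative to a project root rather than hard-coding a drive letter. It uses base R only; `project_root` and `read_project_csv()` are hypothetical names, and a temporary folder stands in for a real project. With the here package, `here::here("data", "my_data.csv")` plays a similar role by resolving paths from the project root.

```r
# Hypothetical stand-in for a project root; in a real project this would be
# the folder holding your project files, not a temporary directory.
project_root <- file.path(tempdir(), "portable_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Write a small example file so the sketch is self-contained
write.csv(head(cars), file.path(project_root, "data", "my_data.csv"),
          row.names = FALSE)

# Build the path from pieces instead of hard-coding "C:/my_project/..."
read_project_csv <- function(root, ...) {
  read.csv(file.path(root, ...))
}

my_data <- read_project_csv(project_root, "data", "my_data.csv")
nrow(my_data)
```

Because nothing in the reading code mentions a specific computer or drive, the same lines work unchanged when the project folder is moved or shared.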
Usually you do not want to include all code for a project in one .Qmd file.
There are two ways to deal with this:
Use separate .R scripts or .Qmd files which save results from complicated parts of a project, then load these results in the main .Qmd file.
Use source() to run external .R scripts when the .Qmd knits.
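The two approaches can be combined, as in this minimal sketch. All file names are hypothetical, a temporary folder stands in for the project, and the external script is written out by the example itself only so that it runs on its own; normally `models.R` would already exist in your code folder.

```r
# Hypothetical file names; a temporary folder stands in for the project.
project_root <- file.path(tempdir(), "split_project")
dir.create(file.path(project_root, "code"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "data", "derived"),
           recursive = TRUE, showWarnings = FALSE)

model_file  <- file.path(project_root, "data", "derived", "model.rds")
script_file <- file.path(project_root, "code", "models.R")

# A separate script does the slow work and saves its result to disk.
# It is created here only to keep the sketch self-contained.
writeLines(
  "saveRDS(lm(dist ~ speed, data = cars), model_file)",
  script_file
)

# source() runs the external script, e.g. from a setup chunk.
# local = TRUE evaluates it in the current environment, where model_file lives.
source(script_file, local = TRUE)

# The main document then only loads the saved result.
fit <- readRDS(model_file)
round(coef(fit), 2)
```

Saving with `saveRDS()` and loading with `readRDS()` means the main document does not need to refit the model each time it is rendered, as long as the saved file is kept up to date.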
Professional researchers and teams design projects as a pipeline.
A pipeline is a series of consecutive processing elements (e.g., scripts and functions).
Each stage of a pipeline...
This means...
Every stage (oval) has an unambiguous input and output. Everything that precedes a given stage is a dependency—something required to run it.
{targets} is a package for managing R research pipelines.
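To make the idea of stages with unambiguous inputs and outputs concrete, here is a small sketch written as plain functions. It does not use the {targets} API; the function names are hypothetical, and `cars` is a dataset that ships with R.

```r
# Each stage is a function with one clear input and one clear output.

# Stage 1: raw data in, cleaned data out
clean_data <- function(raw) {
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Stage 2: cleaned data in, fitted model out
fit_model <- function(cleaned) {
  lm(dist ~ speed, data = cleaned)
}

# Stage 3: fitted model in, small results table out
summarize_model <- function(model) {
  data.frame(term = names(coef(model)), estimate = unname(coef(model)))
}

# Run the stages in order; each depends only on the one before it
cleaned <- clean_data(cars)
model   <- fit_model(cleaned)
results <- summarize_model(model)
results
```

Writing stages this way is what lets a pipeline manager work out which steps are dependencies of which, and therefore which steps need re-running after a change.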
Donald Knuth, creator of TeX, and thus also of tears
Quarto and R Markdown are powerful tools:
Document analyses by combining text, code, and output
Flexible output: .html, .docx, and .pdf (PDF requires LaTeX, e.g. via {tinytex})
Works with LaTeX, HTML, and CSS for math and more formatting control
Both work basically the same way—Quarto can even render all basic R Markdown docs!
We'll focus on Quarto today
Let's try making a Quarto file:
my_first_qmd.Qmd
You may also open up the file in your computer's browser if you so desire, using the Open in Browser button at the top of the preview window.
The header of a Quarto file is a YAML (YAML Ain't Markup Language[1]) code block, and everything else is part of the main document.
[1] Nerds love recursive acronyms.
---
title: "Untitled"
author: "Charles Lanfear"
date: "February 5, 2025"
format: html
---
To mess with global formatting, you can modify the header[2].
[2] Be careful though, YAML is space-sensitive; indents matter!
format:
  html:
    theme: pulse
Quarto headers have autocomplete!
bold/strong emphasis
italic/normal emphasis
Block quote from famous person
    **bold/strong emphasis**
    *italic/normal emphasis*

    # Header
    ## Subheader
    ### Subsubheader

    > Block quote from
    > famous person

    1. Ordered lists
    1. Are real easy
        1. Even with sublists
        1. Or when lazy with numbering

    * Unordered lists
    * Are also real easy
        + Also even with sublists

    [URLs are trivial](http://www.uw.edu)
You can put some math $y = \left( \frac{2}{3} \right)^2$ right up in there.
$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$
Or a sentence with `code-looking font`.
Or a block of code:
    y <- 1:5
    z <- y^2
    You can put some math $y= \left( \frac{2}{3} \right)^2$ right up in there

    $$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$

    Or a sentence with `code-looking font`.

    Or a block of code:

    ```
    y <- 1:5
    z <- y^2
    ```
To stay dead-simple, Quarto and R Markdown lack some features you might occasionally want to use. Your options for fancier documents are:
For day-to-day use, plain vanilla HTML docs do the job.
For handouts, memos, and homework, default PDFs look surprisingly good!
[1] These slides were created using Xaringan, a blend of R Markdown and CSS.
[2] Here be dragons! LaTeX is powerful but exacts a terrible price.
Inside R Markdown, lines of R code are called chunks. Code is sandwiched between sets of three backticks and {r}. This chunk of code...
```{r}
summary(cars)
```
Produces this output in your document:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
Insert chunks using Ctrl-Alt-I (PC) or ⌘-Option-I (Mac)
Chunks have options that control what happens with their code, such as:
```{r}
#| echo: false
summary(cars)
```
`echo: false`: Keeps R code from being shown in the document, but still shows the result
`eval: false`: Shows R code in the document without running it
`include: false`: Hides all output but still runs code (good for setup chunks where you load packages!)
`fig-height: 5`, `fig-width: 5`: Modify the dimensions of any plots that are generated in the chunk (units are in inches)
There are a lot of other options!
You can label chunks in the chunk header or with the `label:` option[1]
[1] Chunks need unique labels or you'll get an error!
```{r summarize-cars-1}
#| echo: false
summary(cars)
```
```{r}
#| label: summarize-cars-2
#| echo: false
summary(cars)
```
Labels enable browsing in the lower-left Chunk Label menu
We can insert values directly into our text using code in backticks starting with r.
Write this in your Quarto doc:
Four score and seven years ago is the same as `r 4*20 + 7` years.
And you'll get this in your output doc:
Four score and seven years ago is the same as 87 years.
Maybe we've saved a variable in a chunk we want to reference in the text:
x <- sqrt(77) # <- is how we assign objects
The value of `x` rounded to two decimal places is `r round(x, 2)`.
The value of x rounded to two decimal places is 8.77.
Never wonder where a value came from: Look it up in your code!
Consistency! No "find/replace" mishaps or manually updating if calculations change (e.g. reporting sample sizes).
Fewer mistakes: You are more likely to mistype a "hard-coded" number than to write R code that works but gives you the wrong value.
Reference management works this way too:
This is all huge for writing and, especially, rewriting journal articles
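A small sketch of the consistency point, assuming the built-in `cars` data as the analysis sample. Each reported value is computed once and then reused; `sprintf()` stands in for inline code here so the example runs as a plain script, and the variable names are illustrative.

```r
# Values computed once, then reused wherever the text needs them
n_cars     <- nrow(cars)          # sample size
mean_speed <- mean(cars$speed)    # average speed in mph

# In a Quarto doc these would be written inline (e.g. `r n_cars`);
# sprintf() plays that role in this plain-script sketch.
sentence <- sprintf(
  "We analyzed %d cars with a mean speed of %.1f mph.",
  n_cars, mean_speed
)
sentence
```

If the sample changes, for example after dropping cases, the sentence updates on the next render with no manual editing.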
Make a file hierarchy
Make an RStudio project
Make a Quarto doc
Preview features