Charles Lanfear
4/19/2018
This presentation focuses on reproducible articles with R using Ben Marwick's rrtools.
After an overview of rrtools, the package is demonstrated using an actual research project in progress: Lanfear, C. and R. Matsueda, “A Dynamic Intrafamily Model of Child Behavior Problems and Birth Timing”.
The compendium repository for this project is available at github.com/clanfear/birthtiming, or it can be installed as the R package birthtiming using the following code:
devtools::install_github("clanfear/birthtiming")
The current article draft and these slides may be viewed from the links on the project repository.
birthtiming has convenience functions for opening these files:
- birthtiming::browse_paper() opens the paper in a browser window.
- birthtiming::browse_presentation() opens the slides in a browser window.
- birthtiming::open_paper_dir() should open the local directory with the paper draft.

Reproducibility comes in three forms (Stodden 2014):
R is particularly well suited to enabling computational reproducibility.
R will not fix flawed experimental methods or observations, nor offer a remedy for improper application of statistical methods.
Elements of computational reproducibility:
R packages are a convenient method of sharing data and code with documentation, which are easily combined with tools like git for version control.
For academic papers, degrees of reproducibility vary:
A research compendium is an R package used not to share statistical or computational methods but to organize and share reproducible projects.
Research compendia feature:
rrtools provides a simplified workflow to accomplish this.
rrtools compendia facilitate:
You can install rrtools directly from GitHub:
devtools::install_github("benmarwick/rrtools")
The simple README on the rrtools GitHub page outlines the one-time function calls used to prepare a compendium for use.
Most important of these:
- use_compendium("pkgname") generates a top-level project directory, subfolders, and necessary files.
- devtools::use_github() initializes a GitHub repository for the project.
- rrtools::use_readme_rmd() generates an RMarkdown file to produce a repository README.
- rrtools::use_analysis() creates directories and bookdown files for an article.

For exact usage, see the link above.
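As a rough sketch, the one-time setup could be preceded by a check that the needed packages are installed. The check below is illustrative only. It assumes `usethis::use_github()` as the current home of the GitHub helper that the list above shows under devtools, and the setup calls themselves are wrapped in `if (FALSE)` so that nothing is created by accident.

```r
# Packages the setup calls depend on. usethis is an assumption: it is where
# use_github() lives in current releases; the slides list devtools::use_github().
needed <- c("rrtools", "usethis")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before setting up a compendium: ",
          paste(missing_pkgs, collapse = ", "))
}

if (FALSE) {  # not run: these calls create files and a remote repository
  rrtools::use_compendium("pkgname")  # project directory and package skeleton
  # The remaining calls are run once from inside the new project.
  usethis::use_github()               # GitHub repository for the project
  rrtools::use_readme_rmd()           # README.Rmd that knits to README.md
  rrtools::use_analysis()             # directories and paper.Rmd template
}
```

Argument defaults may differ between rrtools versions, so the README remains the reference for exact usage.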
Use of the preceding rrtools functions will result in a compendium (R package) with the following structure, familiar to users who have created R packages:
- README.md describing the project
- DESCRIPTION text file documenting dependencies and metadata
- inst/data/ directory for raw data files
- inst/analysis/ or inst/paper/ for scripts, reports, and makefiles
  - paper.Rmd pre-formatted article template
- R/ for scripts with reusable functions
- man/ for documentation of functions (roxygen suggested)

Once the compendium has been created, you can work in its directories:
- inst/data/ for raw data files
- R/ for functions for processing data or doing analysis
- inst/paper/ or inst/analysis/ for the article and its scripts
- inst/figures/ directory for figures

Loading the compendium package at the start of the paper's .Rmd file makes all data and functions in the other directories available.
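A minimal sketch of how a paper's setup chunk might locate a file that ships with the installed compendium, using base R's `system.file()`. The helper name `compendium_path` and the file name mentioned in the comment are hypothetical. The runnable call uses the stats package only because every R installation includes it.

```r
# Hypothetical helper: return the full path to a file inside an installed
# package, and stop with a readable error if the file is not there.
compendium_path <- function(package, ...) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop("File not found in package '", package, "': ", file.path(...))
  }
  path
}

# Demonstrated on a file that every R installation has. In a compendium the
# call might instead look like
#   compendium_path("birthtiming", "data", "raw.csv")
# where "raw.csv" is a made-up file name.
desc_path <- compendium_path("stats", "DESCRIPTION")
file.exists(desc_path)
```

Resolving paths through the package rather than hard-coding them means the paper compiles on any machine where the compendium is installed.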
If using your compendium to generate an article, thesis, or dissertation, rrtools makes formatting simple by integrating bookdown.
bookdown provides an accessible alternative to manually writing LaTeX for typesetting and reference management.
You can integrate citations and automate reference page generation using BibTeX files simply by placing the .bib file (such as one produced by Zotero) in inst/paper/ and then choosing an appropriate citation format in the csl: field of the header of paper.Rmd.
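For illustration, a .bib file can also be produced from within R. The sketch below writes the BibTeX entry for R itself to a temporary file with base R's `citation()` and `toBibtex()`. The file name `references.bib` and the style file name in the comments are assumptions, not names taken from the project.

```r
# Write a BibTeX entry to a .bib file. citation() with no arguments returns
# the reference for R itself; a Zotero export would normally fill this role.
bib_path <- file.path(tempdir(), "references.bib")  # stand-in for inst/paper/references.bib
writeLines(toBibtex(citation()), bib_path)

# The header of paper.Rmd would then name the file and a citation style:
#   bibliography: references.bib
#   csl: my-journal-style.csl      (hypothetical style file name)
# Each entry needs a citation key before it can be cited in the text with
# Pandoc's [@key] syntax.

bib_lines <- readLines(bib_path)
head(bib_lines, 3)
```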
bookdown supports .html output for ease and speed and also renders .pdf files through LaTeX for publication-ready documents.
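Switching between the two outputs can be sketched as a small wrapper around `rmarkdown::render()` with bookdown's `html_document2` and `pdf_document2` formats. The function names `paper_output_format` and `render_paper` are hypothetical, and the default file name `paper.Rmd` is assumed from the template described earlier.

```r
# Map a short format name onto the corresponding bookdown output format.
paper_output_format <- function(format = c("html", "pdf")) {
  format <- match.arg(format)
  switch(format,
    html = "bookdown::html_document2",
    pdf  = "bookdown::pdf_document2"
  )
}

# Hypothetical convenience wrapper. Requires the rmarkdown and bookdown
# packages, plus a LaTeX installation for pdf output. Defined but not called.
render_paper <- function(rmd = "paper.Rmd", format = "html") {
  rmarkdown::render(rmd, output_format = paper_output_format(format))
}

paper_output_format("pdf")
# [1] "bookdown::pdf_document2"
```

Rendering to .html while drafting and to .pdf only for submission keeps the edit cycle short, since the LaTeX pass is the slower of the two.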
For University of Washington theses and dissertations, consider Ben Marwick's huskydown package, which uses Markdown but renders via a UW-approved LaTeX template.
rrtools provides a relatively simple framework for organizing reproducible projects.
A small upfront investment in organization pays large dividends.
It is much easier to start in a reproducible framework than move to one later.
In general, reproducible frameworks reduce mistakes, improve organization, and protect work.
There is evidence that reproducible and shared research may be more likely to be cited, and definitely contributes more to the discipline (see Marwick et al. 2017).
Lanfear, C. and R. Matsueda, “A Dynamic Intrafamily Model of Child Behavior Problems and Birth Timing”
This project combines the question of Hao & Matsueda (2006) with the estimation technique of Rosenzweig & Wolpin (1995):
How does mother's age at birth impact child behavior problems if we account for the possibility that child behavior impacts future fertility decisions?
Note: Focusing on mother's age in general, not teen births.
Estimates obtained using lavaan; unstandardized parameters with standard errors.
All data manipulation and estimation, including generation of the diagrams, occur on the fly when the birthtiming package compiles the paper.
- Data manipulation (dplyr, reshape2)
- Models estimated in lavaan and estimates extracted.
- sweave calls generate the diagrams separately (LaTeX with tikz)
- Diagrams converted to .png images (pdftools)
- Paper rendered by bookdown with bibtex references as either .pdf or .html (via Pandoc)

Stodden, V. 2014. “What scientific idea is ready for retirement? Reproducibility.” Edge. URL: https://www.edge.org/response-detail/25340
Marwick, B., C. Boettiger & L. Mullen. 2017. “Packaging data analytical work reproducibly using R (and friends).” PeerJ Preprints 5:e3192v1 https://doi.org/10.7287/peerj.preprints.3192v1
Hao, L. & R. Matsueda. 2006. “Family Dynamics Through Childhood: A sibling model of behavior problems.” Social Science Research 35(2):500–524. http://www.sciencedirect.com/science/article/pii/S0049089X04001024
Rosenzweig, M. & K. Wolpin. 1995. “Sisters, Siblings, and Mothers: The effect of teen-age childbearing on birth outcomes in a dynamic family context.” Econometrica 63(2):303–26. http://www.jstor.org/stable/2951628