Reproducibility workshop

Max Joseph
12/4/2013

Why worry about reproducibility?

  • part of the scientific method
  • crisis: Amgen cancer researchers could reproduce only 6 of 53 “landmark” findings
  • convenience

“Open science: a preference for leaving an honest mess for others to clean up rather than a tidy lie for them to admire” - Jake Vanderplas

Outline

  1. knitr + Rmarkdown
  2. git + GitHub
  3. GNU Make

knitr

  • R package by Yihui Xie
  • produces reports
  • joins text with executed code

Rmarkdown

  • markdown + R code chunks
  • simple syntax

(knitr demo)

What's knitr useful for?

What's knitr better than?

  • storing code/commentary in Word
  • storing model results in Excel
  • manually inserting R output in webpages & slideshows

git + GitHub

git

  • version control system
  • used to manage files (often code)
  • a way to collaborate, back up, and organize work

GitHub

  • online hub for repositories
  • multiuser collaboration
  • issue tracking & wiki system

Getting started

(GitHub example)
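
A minimal getting-started sketch, assuming git is installed. The directory name `analysis`, the file `analysis.R`, the user identity, and the commit messages are all hypothetical. The GitHub URL is a placeholder, so the publishing step is left commented out.

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a new repository and move into it.
git init -q analysis
cd analysis

# Set an identity for this repository only (hypothetical values).
git config user.name "Example User"
git config user.email "user@example.com"

# First version of a script: stage it, then commit it.
echo 'x <- rnorm(100)' > analysis.R
git add analysis.R
git commit -q -m "Add first draft of analysis script"

# Second version: -a stages changes to already-tracked files.
echo 'summary(x)' >> analysis.R
git commit -q -am "Summarize simulated data"

# Show the history: one line per commit, newest first.
git log --oneline

# Publishing to GitHub would look roughly like this (placeholder URL):
# git remote add origin https://github.com/<user>/<repo>.git
# git push -u origin HEAD
```

Each commit is a named snapshot, which replaces the habit of saving many copies of a file under different names.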

What is git useful for?

  • managing collaborative projects
  • sharing code
  • maintaining transparency
  • contributing to open source projects

What is git better than?

  • many file versions w/ silly names
  • (only) manual backups on physical hard drives

What is Make?

A utility to build files from source code

  • executes instructions in “makefiles”
  • often used for software
  • also useful for papers

(make example)
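
A small sketch of the idea, assuming GNU Make is installed. The file names `data.csv` and `summary.txt` and the line-counting recipe are hypothetical stand-ins for a real analysis step. The makefile is written with `printf` because each recipe line must begin with a tab character.

```shell
set -eu

# Work in a throwaway directory.
builddir=$(mktemp -d)
cd "$builddir"

# A tiny input file standing in for real data (hypothetical contents).
printf 'a,b\n1,2\n3,4\n' > data.csv

# One rule: summary.txt depends on data.csv.
# Format is "target: prerequisite", then a tab-indented recipe line.
printf 'summary.txt: data.csv\n\twc -l < data.csv > summary.txt\n' > Makefile

# First run: summary.txt does not exist yet, so the recipe is executed.
make

# Second run: summary.txt is newer than data.csv, so make reports it is up to date.
make

cat summary.txt
```

In a paper, the same pattern could chain data to figures to the final document, so that only the steps whose inputs changed are rerun.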

What is make good for?

  • integrating analysis/computations with text
  • R, Python, BEAST, MrBayes, etc. (anything)
  • w/ GitHub repo → reproducible analysis

Thanks