Tutorial 17.4 - Knitr and reproducible research
18 Nov 2018
Overview
Tutorials 17.1 and 17.2 introduced two document markup languages for the preparation of PDF and HTML respectively. Tutorial 17.3 introduced the markdown language and pandoc - the universal document conversion tool.
Reproducible research is a data analysis concept that promotes publishing all analysis source code, outcomes and supporting commentary (such as a description of methodologies and interpretation of results) in such a way that others can reproduce the findings for verification. This works best when the documentation and source code are woven together into a single document. Traditionally, this involved substantial 'cutting and pasting' from statistical software into document authoring tools such as LaTeX, HTML or Microsoft Word. Any minor change in the analyses then required replacing the code in the document as well as any affected figures or tables, and keeping everything synchronized was a constant battle.
This is where packages like knitr come in. Knitr evaluates blocks of code within a document and converts both the code and its output into the same format as the surrounding document (e.g. LaTeX or HTML). This scheme greatly facilitates reproducible research by allowing the document and all source code to be contained in a single file or a set of related files.
How it works
Within the surrounding document, code blocks (called 'chunks') are delimited by language-specific tag pairs:
- for LaTeX
<<>>= ... @
- for HTML
<!--begin.rcode ... end.rcode -->
For a first example, we will add a single simple code block to minimal LaTeX and HTML documents.
LaTeX
The workflow consists of the source document/code (in this case a text file called min.Rtex), an R session in which the knit() function from the knitr package is used to 'knit' the code blocks into the surrounding document format, and finally compilation into PDF via xelatex (or pdflatex).
LaTeX code (min.Rtex):

    \documentclass[a4paper,12pt]{article}
    \begin{document}
    \section{A section}\label{sec:s1}
    This is a minimum \LaTeX~document with embedded R code.
    <<Summary>>=
    x = rnorm(10)
    summary(x)
    @
    \end{document}

Within R:

    library(knitr)
    knit('min.Rtex', 'min.tex')

Command line:

    xelatex min.tex
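The knit and compile steps above can also be combined from within R. The following is a minimal sketch, assuming that xelatex is installed and on the PATH and that min.Rtex is in the working directory. It uses knitr's knit2pdf() function, which knits the source and then compiles the resulting .tex file.

```r
library(knitr)

# Assumption: xelatex is on the PATH and min.Rtex is in the working directory.
if (file.exists("min.Rtex")) {
  # Knit min.Rtex to min.tex, then compile min.tex to PDF in one call
  knit2pdf("min.Rtex", compiler = "xelatex")
}
```

Keeping the two steps separate, as in the workflow above, makes it easier to inspect the intermediate min.tex when something goes wrong.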
HTML
The workflow consists of the source document/code (in this case a text file called min.Rhtml), and an R session in which the knit() function from the knitr package is used to 'knit' the code blocks into the surrounding document format.
HTML code (min.Rhtml):

    <!DOCTYPE html>
    <html>
    <head>
    <title> Simple document </title>
    </head>
    <body>
    <h1>A Section</h1>
    This is a minimum HTML document with embedded R code
    <!--begin.rcode Summary1
    x = rnorm(10)
    summary(x)
    end.rcode-->
    </body>
    </html>

Within R:

    library(knitr)
    knit('min.Rhtml', 'min.html')
knitr options
In the examples above, a single option was provided as a knitr 'chunk' argument. This option was the chunk label, which gives the chunk a name (chunks can refer to other chunks by label). The following table lists other common options (for a full list, visit the knitr chunk options website).
Option | Description | LaTeX | HTML |
---|---|---|---|
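Text output | | | |
echo | TRUE or FALSE (whether to include the R source code in the output) | Yes | Yes |
eval | TRUE or FALSE (whether to evaluate the code in the chunk) | Yes | Yes |
results | one of 'markup' (mark up text output to suit the document format), 'asis' (write raw output unaltered), 'hold' (hold all text output until the end of the chunk) or 'hide' (suppress text output) | Yes | Yes |
warning | TRUE or FALSE (whether to include warnings in the output) | Yes | Yes |
error | TRUE or FALSE (whether to include error messages in the output and carry on, rather than stopping at the first error) | Yes | Yes |
message | TRUE or FALSE (whether to include messages in the output) | Yes | Yes |
Code decoration | | | |
tidy | TRUE or FALSE (whether to reformat the echoed code with the formatR package) | Yes | Yes |
tidy.opts | a list of options passed on to the tidying function. For example, tidy.opts = list(width.cutoff = 60) asks for the echoed R code to be wrapped at about 60 characters. | Yes | Yes |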
prompt | TRUE or FALSE (whether to include R prompts in the echoed code) | Yes | Yes |
comment | the comment character used before output (defaults to ##) | Yes | Yes |
highlight | TRUE or FALSE (whether to apply syntax highlighting to the code) | Yes | Yes |
size | the font size for code and output | Yes | No |
background | the color of the code and output background | Yes | No |
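To illustrate how a few of these options behave, here is a minimal sketch that knits a short source held in a character vector rather than a file. The chunk labels (hidden, skipped) and the option values are illustrative choices, not part of the earlier examples. Options are supplied after the chunk label, separated by commas, and opts_chunk$set() sets defaults for all subsequent chunks.

```r
library(knitr)

# Illustrative default for all chunks: prefix output with "#>" instead of "##"
opts_chunk$set(comment = "#>")

# A small knitr source held in a character vector (chunk labels are made up)
src <- c(
  "<<hidden, echo=FALSE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "<<skipped, eval=FALSE>>=",
  "y <- sqrt(16)",
  "@"
)

# Knit from text rather than a file; the knitted LaTeX comes back as a string
out <- knit(text = src, quiet = TRUE)
cat(out)
```

In the knitted text, the first chunk should contribute only its result (echo=FALSE drops the source), while the second should contribute only its source code (eval=FALSE means it is never run).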