knitr – Elegant, flexible, and fast dynamic report generation with R – Yihui Xie, 谢益辉

knitr

Overview

The knitr package was designed to be a transparent engine for dynamic report generation with R, to solve some long-standing problems in Sweave, and to combine features from other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more).

  • Transparency means that the user has full access to every piece of the input and output, e.g., 1 + 2 produces [1] 3 in an R terminal, and knitr lets the user decide whether to put 1 + 2 between \begin and \end, or <div class="rsource"> and </div>, and put [1] 3 in \begin and \end; see the hooks page for details
  • knitr tries to be consistent with users' expectations by running R code as if it were pasted in an R terminal, e.g., qplot(x, y) directly produces the plot (no need to print() it), and all the plots in a code chunk will be written to the output by default
  • Packages like pgfSweave and cacheSweave have added useful features to Sweave (high-quality tikz graphics and cache), and knitr has simplified the implementations
  • The design of knitr allows any input language (e.g. R, Python and awk) and any output markup language (e.g. LaTeX, HTML, Markdown, AsciiDoc, and reStructuredText)
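
As a rough illustration of the transparency point above, the sketch below wraps the echoed source code in <div class="rsource"> and </div> through a source hook. It assumes the knitr package is installed; the three-line Markdown-style input and the body of the hook function are made up for this example and are not taken from the knitr documentation.

```r
library(knitr)

# A tiny Markdown-style input document with a single R chunk
# (hypothetical input, written only for this sketch).
input <- c("```{r}", "1 + 2", "```")

# Start from the stock Markdown hooks, then override only the source hook so
# that echoed code is wrapped in <div class="rsource"> ... </div>.
render_markdown()
knit_hooks$set(source = function(x, options) {
  paste0('<div class="rsource">', paste(x, collapse = "\n"), "</div>\n")
})

# With the `text` argument, knit() returns the woven document as a string
# instead of writing a file.
out <- knit(text = input, quiet = TRUE)
cat(out)

# Put the default hooks back so later documents are unaffected.
knit_hooks$restore()
```

The same mechanism applies to the other output pieces (for example the output, warning and error hooks), so each piece can be wrapped in whatever markup the target format needs.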

This package is developed on GitHub; for installation instructions and FAQs, see the README. This website serves as the full documentation of knitr, and you can find the main manual, the graphics manual and other demos / examples here. You can also watch a 5-minute video introduction. For a more organized reference, see the knitr book.

Motivation

One of the difficulties with extending Sweave is that we have to copy a large amount of code from the utils package (the file SweaveDrivers.R has more than 700 lines of R code), and this is what the two packages mentioned above have done. Once the code is copied, the package authors have to pay close attention to what is changing in the version in official R, which is evidently an extra burden. The knitr package tries to modularize the whole process of weaving a document into small manageable functions, so it is hopefully easier to maintain and extend (e.g. easy to support HTML output); on the other hand, knitr has many built-in features, so it should rarely be necessary to hack at the core components of this package. By the way, several FAQs in the Sweave manual are solved in knitr directly.

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

– Donald E. Knuth, Literate Programming, 1984

Features

The ideas are borrowed from other packages, and some of them are re-implemented in a different way (like cache). A selected list of features includes:

  • faithful output: using evaluate as the backend to evaluate R code, knitr writes everything that you see in an R terminal into the output by default, including printed results, plots and even warnings, messages as well as errors (they should not be disregarded in serious computations, especially warnings)
    • a minor issue is that for grid-based graphics packages like ggplot2 or lattice, users often forget to print() the plot objects, because they can get the output in an R terminal without actually calling print(); in knitr, what you get is what you expect
  • built-in cache: the ideas are like those in cacheSweave, but knitr directly uses base R functions to implement caching and lazy loading; another significant difference is that a cached chunk can still have output (in cacheSweave, cached chunks no longer have any output, even if you explicitly print() an object; knitr actually caches the chunk output as well)
  • formatting R code: the formatR package is used to reformat R code automatically (wrap long lines, add spaces and indent, etc), without sacrificing comments as keep.source=FALSE does
  • more than 20 graphics devices are directly supported: with dev='CairoPNG' in the chunk options, you can switch to the CairoPNG() device in Cairo in a second; with dev='tikz', the tikz() device in tikzDevice is used. Could anything be easier than that? These built-in devices (strictly speaking, wrappers) use inches as units, even for bitmap devices (pixels are converted to inches by the option dpi, which defaults to 72)
  • even more flexibility on graphics:
    • the width and height of plots in the output document can be specified separately (the fig.width option is for the graphics device, and out.width is for the output document; think out.width='.8\\textwidth')
    • locations of plots can be rearranged: they can either show up exactly in the place where they are created, or go to the end of a chunk together (option fig.show='hold')
    • multiple plots per code chunk are recorded, unless you really want to keep the last plot only (option fig.keep='last')
  • R code can come not only from code chunks in the input document, but also from an external R script, which makes it easier to run the code as you write the document (this will especially benefit LyX)
  • for power users, further customization is still possible:
    • the regular expressions to parse R code can be defined, i.e., you do not have to use <<>>= and @ or \Sexpr{}; if you like, you can use any patterns, e.g., %% begin.rcode and %% end.rcode
    • hooks can be defined to control the output; e.g. you may want to put errors in red bold text, or you want the source code to be italic, etc; hooks can also be defined to be executed before or after a code chunk, and there are infinite possibilities to extend the power of this package by hooks (e.g. animations, rgl 3D plots, …)

A lot of effort has gone into producing beautiful output and enhancing readability by default. For example, code chunks are highlighted and put in a shaded environment in LaTeX with a very light gray background (the framed package), so they can stand out a little bit from other text. The reading experience is hopefully better than with the verbatim or Verbatim environments. The leading characters > and + (called prompts) in the output are not added by default (you can bring them back with prompt=TRUE, though). I find them really annoying when I read the output document, because it is so inconvenient to copy and run code that is messed up by these characters.
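
The chunk options named in the feature list above can also be set globally from R with opts_chunk$set(). The following is a minimal sketch, assuming knitr is installed; the particular values (a 5 x 4 inch PNG device, caching switched on, and so on) are illustrative choices for this example, not recommendations from the package.

```r
library(knitr)

# Remember the current defaults so they can be restored afterwards.
old <- opts_chunk$get()

# Illustrative global chunk options (values are assumptions for this sketch).
opts_chunk$set(
  dev        = "png",            # graphics device; e.g. "CairoPNG" or "tikz" if installed
  dpi        = 72,               # converts pixels to inches for bitmap devices
  fig.width  = 5,                # size of the graphics device, in inches
  fig.height = 4,
  out.width  = ".8\\textwidth",  # size of the plot in the output document
  fig.show   = "hold",           # move all plots to the end of the chunk
  fig.keep   = "last",           # keep only the last plot of each chunk
  cache      = TRUE,             # cache chunk results, including their output
  prompt     = TRUE              # bring back the > and + prompts
)

# Inspect what is now in effect for two of the options.
chosen <- opts_chunk$get(c("fig.show", "fig.keep"))
str(chosen)

# Restore the defaults saved above.
opts_chunk$restore(old)
```

The same options can be given per chunk in the chunk header instead, which is usually the better place for settings that apply to a single figure.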

Acknowledgements

I thank the authors of Sweave, pgfSweave, cacheSweave, brew, decumar, R2HTML, tikzDevice, highlight, digest, evaluate, roxygen2 and of course, R, for the many inspiring ideas and tools. I really appreciate the feedback from many early beta testers. This package was initiated based on the design of decumar.

knitr is proudly affiliated with the Foundation for Open Access Statistics (FOAS). FOAS is a nonprofit public benefit corporation with a worldwide mission to promote free software, open access publishing, and reproducible research in statistics.
