Tutorial 3.1 - Package management
27 Mar 2017
One of the great strengths of R is the ease with which it can be extended via the creation of new functions. This means that the functionality of the environment is not limited by the development priorities and economics of a commercial enterprise. Moreover, collections of related functions can be assembled together into what is called a package (sometimes loosely called a library). These packages can be distributed to others to use or modify, and thus the community and its capacity grow.
One of the key ideas behind packages is that they extend the functionality only when it is required. Currently (2013), there are in excess of 4000 packages available on CRAN (Comprehensive R Archive Network) and an additional 2000 packages available via other sources. If all of that functionality were available simultaneously, the environment would be impaired by bloat. In any given session, the amount of extended functionality required is likely to be relatively low, so it makes sense to only 'load' functionality into memory when it is required.
The R environment comprises the core language itself (with its built-in data, memory and control structures, along with parsers, error handlers and built-in operators and constants) together with any number of packages. Even a brand new install of R includes some packages. These tend to provide crucial or common functions, and as such many of them are automatically loaded at the start of an R session.
To see what packages are currently loaded in your session, enter the following:
(.packages())
[1] "nlme" "stats" "graphics" "grDevices" "utils" "datasets" "methods" [8] "base"
A more general alternative to the .packages() function is the search() function, which lists the full search path.
search()
[1] ".GlobalEnv" "package:nlme" "package:stats" "package:graphics" [5] "package:grDevices" "package:utils" "package:datasets" "package:methods" [9] "Autoloads" "package:base"
The search path is the ordered list of places (environments) in which R looks for the objects (functions, data and so on) named in a statement. The search starts in the global environment (.GlobalEnv), where the objects you create yourself are stored. If the object is not found in .GlobalEnv, the search continues within the next search location (in my case the nlme package, then the stats package and so on). When you load an additional package (such as the car package), this package (along with any other packages that it depends on) is placed towards the start of the search path. The logic is that if you have just loaded the package, then chances are you intend to use its functionality, and therefore your statements are likely to be evaluated faster (because there is less to search through before locating the relevant objects).
library(car)
search()
[1] ".GlobalEnv" "package:car" "package:nlme" "package:stats" [5] "package:graphics" "package:grDevices" "package:utils" "package:datasets" [9] "package:methods" "Autoloads" "package:base"
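To see the search path in action, the following sketch (base R only; the name median is simply a convenient example) creates an object in .GlobalEnv with the same name as a function in the stats package. Because .GlobalEnv is searched first, the new object masks the stats version until it is removed, whereas the :: operator skips the search path and asks a specific package directly.

```r
# Create an object in the global environment whose name is also used by a
# function in the stats package ("median" is just a convenient example)
median <- function(x) "the global-environment median"

median(c(1, 5, 9))         # the .GlobalEnv copy is found first
find("median")             # every search() location that holds a "median"
stats::median(c(1, 5, 9))  # pkg::name bypasses the search path

rm(median)                 # remove the masking copy ...
median(c(1, 5, 9))         # ... and the search now reaches package:stats
```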
Listing installed packages
The installed.packages() function tabulates all the packages currently installed on your system. It returns a matrix with one row per package giving the package name, the library path (where it resides on your system) and the version number, along with columns such as "Priority", "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", "OS_type", "License" and "Built". To display just some of these columns, index the matrix by column name (the fields= argument, by contrast, requests additional DESCRIPTION fields on top of the default columns).
installed.packages()
installed.packages()[, c("Package", "LibPath", "Version", "Depends", "Built")]
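Because installed.packages() returns an ordinary character matrix (with the package names as row names), the usual indexing tools apply to it. A few illustrative queries are sketched below; car is only an example package name and need not be installed for this to run.

```r
ip <- installed.packages()

colnames(ip)                                     # the columns available
head(ip[, c("Package", "Version", "Priority")])  # a narrower view

# count packages by priority: "base", "recommended" or NA (everything else)
table(ip[, "Priority"], useNA = "ifany")

# is a particular package installed? (row names are package names)
"car" %in% rownames(ip)
# an alternative that checks whether the package can actually be loaded
requireNamespace("car", quietly = TRUE)
```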
Yet more information about any single package can be obtained with the packageDescription() function and with library(help=...). The latter provides all the information of the former, followed by a descriptive index of all the functions and datasets defined within the package.
packageDescription('car')
library(help='car')
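Both of the statements above print a lot of text. If only a single piece of information is needed, packageDescription() can return individual fields, and packageVersion() returns a version object that can be compared directly. The sketch below uses the stats package (always installed) rather than car, so that it runs on any system.

```r
# The whole DESCRIPTION file as a list-like object
desc <- packageDescription("stats")
desc$Version

# Just one field, returned as a character string
packageDescription("stats", fields = "Imports")

# packageVersion() returns a version object that compares sensibly
packageVersion("stats") >= "3.0.0"
```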
Installing packages
The R community contains some of the brightest and most generous mathematicians, statisticians and practitioners, who continue to actively develop and maintain concepts and routines. Most of these routines end up being packaged as a collection of functions and then hosted on one or more publicly available sites so that others can benefit from their efforts.
The locations of collections of packages are called repositories, or 'repos' for short. The four main repositories are CRAN, Bioconductor, R-Forge and github. By default, R is only 'tuned in' to CRAN; that is, any package queries or actions pertain just to the CRAN repository.
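The repositories that R is 'tuned in' to are held in the repos option. A minimal sketch of inspecting and changing it for the current session is given below; the CRAN mirror and R-Forge URLs are the ones used elsewhere in this tutorial, and the names CRAN and RForge are just labels. The setting lasts only for the current session unless it is also placed in a startup file such as .Rprofile, and the interactive setRepositories() function offers a menu of repositories as an alternative.

```r
# Repositories currently consulted by install.packages(),
# available.packages(), old.packages() and friends
getOption("repos")

# Point this session at a specific CRAN mirror (so that no mirror prompt
# appears) and also at R-Forge; "RForge" is just a label
options(repos = c(CRAN   = "http://cran.csiro.au",
                  RForge = "http://R-Forge.R-project.org"))
getOption("repos")
```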
To get a tabulated list of all the packages available on CRAN (warning: there are over 4000 packages, so this will be a large table):
available.packages()
Comprehensive R Archive Network - CRAN
CRAN is a repository of R packages mirrored across 90 sites throughout the world. Packages are installed from CRAN using the install.packages() function. The first (and only mandatory) argument to the install.packages() function is the name of the package(s) to install (pkgs=). If no other arguments are provided, the install.packages() function will search CRAN for the specified package(s) and install it along with any of its dependencies that are not yet installed on your system.
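By default, the dependencies that install.packages() pulls in are those listed in the Depends, Imports and LinkingTo fields of a package's DESCRIPTION file. The tools package can list these from a table of package metadata. The sketch below uses the installed packages as that table (rather than CRAN, so no network access is needed) and the always-present stats package as the example.

```r
# installed.packages() returns the same kind of metadata matrix that
# available.packages() returns for a repository
db <- installed.packages()

# Direct "strong" dependencies (Depends, Imports, LinkingTo) of stats
deps <- tools::package_dependencies("stats", db = db)
deps

# Follow the dependencies of those dependencies as well
tools::package_dependencies("stats", db = db, recursive = TRUE)
```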
Note that unless you have started the session with administrator (root) privileges, packages will be installed within a path in your home folder. Whilst this is not necessarily a bad thing, it does mean that the package is not globally available to all users on your system (not that it is common to have multiple users of a single system these days). Moreover, it means that R packages reside in multiple locations across your system: the packages that came with your R install will be in one location (or a couple of related locations) and the packages that you have installed will be in another.
To see the locations currently used on your system, you can issue the following statement.
.libPaths()
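Building on .libPaths(), the sketch below shows which library install.packages() writes to by default (the first element of .libPaths(), unless the lib= argument says otherwise) and how to find where a particular installed package lives. The stats package is used only because it is always present.

```r
.libPaths()              # every library R searches, in order
.libPaths()[1]           # install.packages() installs here unless lib= is given

find.package("stats")    # the directory a given package is found in
# the same information is recorded in the installed.packages() table
installed.packages()["stats", "LibPath"]
```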
To install a specific package (and its dependencies):
install.packages("devtools")
By indicating a specific repository, you can avoid being prompted for a mirror. For example, I chose to use a CRAN mirror at Melbourne University (Australia), and therefore the following statement gives me direct access:
install.packages("devtools", repos="http://cran.csiro.au")
Finally, you could provide a vector of repository URLs if you were unsure which repository was likely to contain the package you were after. This can also be useful if your preferred mirror regularly experiences downtime: R combines the package listings of all the repositories in the vector, so a package can still be obtained from the alternative mirror when the first cannot be reached.
Bioconductor
Bioconductor is an open source and open development project devoted to genomic data analysis tools, most of which are available as R packages. Whilst the packages initially focused primarily on the manipulation and analysis of DNA microarrays, as the scope of the project has expanded, so too has the functional scope of the packages hosted there.
source("http://bioconductor.org/biocLite.R")
biocLite("limma")
Or, to install multiple packages from Bioconductor:
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicFeatures", "AnnotationDbi"))
R-Forge
Unlike both CRAN and Bioconductor (which are essentially package repositories), R-Forge is an entire R package development platform. Package development is supported through a range of services including:
- version control (SVN) - allowing multiple collaborators to maintain current and historical versions of files by facilitating simultaneous editing, conflict resolution and rolling back
- daily package checking and building - so packages are always up to date
- bug tracking and feature request tools
- mailing lists and message boards
- full backup and archival system
Installing packages from R-Forge is the same as installing them from CRAN, except that the URL of the repository needs to be specified with the repos= argument.
install.packages("lme4.0", repos="http://R-Forge.R-project.org")
Github
Github builds upon the philosophy of the development platform promoted by the SourceForge family (including R-Forge) by adding the ability to fork a project. Forking is when the direction of a project is split so that multiple new opportunities can be explored without jeopardizing the stability and integrity of the parent source. If the change in direction proves valuable, the project (package) can either become a new package or else feed back into the development of the original package.
Hadley Wickham has yet again come up with a set of outrageously useful tools (the devtools package). This package is a set of functions that simplify (albeit slightly dictatorially) the processes of package authoring, building, releasing and installing. For now, we will concentrate on the last of these.
In order to use this package to install packages from github, the devtools package must itself be installed. It is recommended that this install take place from CRAN (as outlined above). Thereafter, the devtools package can be loaded onto the search path (with library(devtools)) and the install_github() function used to retrieve and install a nominated package or packages from github. Packages are nominated in the form "username/repository".
library(devtools)
install_github("tidyverse/ggplot2")
As described above, github is a development platform and therefore it is also a source of 'bleeding edge' development versions of packages. Whilst the development versions are less likely to be as stable or even as statistically rigorous as the final release versions, they do offer the very latest ideas and routines, providing a snapshot of where the developers are currently at.
Most of the time users only want the stable release versions of a package. However there are times when having the ability to try out new developments as they happen can be very rewarding. The dev_mode() function within the devtools package provides a switch that can be used to toggle your system in and out of development mode. When in development mode, installed packages are quarantined within a separate path (R-dev) to prevent them overriding or conflicting with the stable versions that are critical for your regular analyses.
library(devtools)
# switch to development mode
dev_mode(on = TRUE)
# install the development version of ggplot2
install_github("tidyverse/ggplot2")
# use the development version of ggplot2
library(ggplot2)
# switch development mode off
dev_mode(on = FALSE)
# packages loaded from now on come from the stable library;
# restart R so that the stable version of ggplot2 is engaged
Manual download and install
Packages are made available on the various repositories in compressed form, both as source packages (.tar.gz) and, for Windows and Mac OS X, as pre-built binary packages. The web pages of these repositories all provide ways of browsing or searching for specific packages, and the packages (compressed files) can be downloaded directly from these sites.
Additionally, some packages are not available on any repository, and firewalls and proxies can sometimes prevent R from accessing the repositories directly. In these cases, packages must be manually downloaded and installed.
There are a number of ways to install a package that resides locally. Note: do not uncompress the package file.
- From the command line (outside of R), where packagename is the path to the compressed package file.
R CMD INSTALL packagename
- Using the install.packages() function, giving the path to the compressed file and specifying repos=NULL (add type="source" for a source package).
install.packages('packagename', repos=NULL)
- Via the Windows RGui, select the Install package(s) from local zip files... option of the Packages menu and select the compressed package.
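Below is a self-contained sketch of the install.packages() route. So that it runs without any download, it first builds a tiny throw-away source package with R CMD build (the name toypkg, its hello() function and the temporary paths are all hypothetical), and then installs the still-compressed .tar.gz file into a private library with repos=NULL and type="source". With a real downloaded file, only the install.packages() line, pointed at that file, would be needed.

```r
# 1. Create a minimal source package in a temporary folder
pkg_dir <- file.path(tempdir(), "toypkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: toypkg",
  "Version: 0.1.0",
  "Title: A Throw-Away Package",
  "Description: Demonstrates installing a package from a local file.",
  "Author: Nobody",
  "Maintainer: Nobody <nobody@example.com>",
  "License: GPL-2"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
writeLines('hello <- function() "hello from toypkg"',
           file.path(pkg_dir, "R", "hello.R"))

# 2. Build the compressed package (standing in for a downloaded file);
#    R CMD build writes toypkg_0.1.0.tar.gz into the working directory
old_wd <- setwd(tempdir())
status <- system2(shQuote(file.path(R.home("bin"), "R")),
                  c("CMD", "build", "toypkg"))
setwd(old_wd)
if (status != 0) stop("R CMD build failed")
tarball <- file.path(tempdir(), "toypkg_0.1.0.tar.gz")

# 3. Install the still-compressed file; repos = NULL means "a local file",
#    and the private library keeps the demo away from your real packages
lib <- file.path(tempdir(), "Rlib")
dir.create(lib, showWarnings = FALSE)
install.packages(tarball, repos = NULL, type = "source", lib = lib)

library(toypkg, lib.loc = lib)
hello()
```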
Updating packages
An integral component of package management is being able to maintain an up to date system. Many packages are regularly updated so as to adopt new ideas and functionality. Indeed, it is the speed of functional evolution that sets R apart from most other statistical environments.
Along with the install.packages() function, there are three other functions to help manage and maintain the packages on your system.
- old.packages() compares the versions of packages you have installed with the versions of those packages available in the current repositories.
It tabulates the names, install paths and versions of old packages on your system.
old.packages()
old.packages(repos="http://R-Forge.R-project.org")
#or even multiple repos
old.packages(repos=c("http://cran.csiro.au","http://R-Forge.R-project.org"))
- new.packages() returns the names of packages that are available on the repositories but are not yet installed on your system.
Note: with over 4000 packages available on CRAN, unless the repos= parameter is pointing to somewhere very specific (and with a narrow subset of packages), this function is rarely of much use.
new.packages()
- update.packages() downloads and installs newer versions of the packages identified as 'old' by the old.packages() function. By default it asks for confirmation for each package; use ask=FALSE to update them all without prompting.
Just like old.packages(), alternative or multiple repositories can be specified.
update.packages()
#or from alternative multiple repos
update.packages(repos=c("http://cran.csiro.au","http://R-Forge.R-project.org"))
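old.packages() and update.packages() decide whether an installed package is 'old' by comparing its version number with the one offered by the repository. Version numbers have to be compared component by component rather than as text, which the following base R sketch illustrates (it shows the comparison rule only, not the internals of old.packages()).

```r
# Compared as plain character strings, "1.10.0" sorts before "1.9.2" ...
"1.10.0" > "1.9.2"                                     # FALSE
# ... but compared as version numbers, 1.10.0 is the newer release
package_version("1.10.0") > package_version("1.9.2")   # TRUE
compareVersion("1.10.0", "1.9.2")                     # 1 (first is newer)

# The same kind of comparison against a package installed on this system
packageVersion("stats") >= "3.0.0"
```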
Reinstalling packages following re-installation of R
Just as development continues on packages, so too does development continue on the main base R system. Periodically, a new version of R comes out with a new set of features, performance enhancements, bug fixes and requirements. When a new stable version is released, some packages are also altered to reflect the changes. Hence, some new functionality can necessitate not only an update of a package, but also an update of the base R system.
Updating the entire system can be a lengthy and inconvenient process as not only does R need to be re-installed, typically all of the packages need to be re-installed or updated. The following steps can be used to reduce the pain by semi-automating the process and ensuring that no package is forgotten.
- Start R and change the current working directory to a convenient location (such as your home, desktop or downloads folder).
- Get a vector of all packages currently installed on your system
packages <- installed.packages()[,"Package"]
- Get a vector of packages installed on your system that are part of the base install and exclude them from the packages vector
base <- installed.packages(priority="base")[,"Package"]
packages <- packages[!packages %in% base]
- Get a vector of packages that are available across the desired repositories and compare it to your vector of packages to produce:
- A vector of packages that will need to be re-installed from the repositories. If the contriburl= argument is omitted, then only the repositories in the current repos option (usually only CRAN) will be searched.
rep <- available.packages(contriburl=contrib.url(c("http://cran.csiro.au", "http://R-Forge.R-project.org")))[,"Package"]
toGetFromRepos <- packages[packages %in% rep]
- A vector of packages that will have to be acquired and installed from other locations
toGetFromElse <- packages[!packages %in% rep]
toGetFromElse
    acepack       akima  assertthat         AUC   backports
  "acepack"     "akima" "assertthat"       "AUC" "backports"
...
- Save the vector of packages to be re-installed from the repositories and the vector of packages to be installed from elsewhere
save(toGetFromRepos, file="Rpackages")
save(toGetFromElse, file="Otherpackages")
- Download and install the latest version of R
- Start R, preferably with administrator (root) privileges, and make sure that the current working directory points to the location where the Rpackages file was saved.
- Load the saved vectors of packages to be re-installed from the repositories and from elsewhere.
load(file="Rpackages")
load(file="Otherpackages")
- Loop through the vector of packages and install those that are not already on the system
for (p in setdiff(toGetFromRepos, installed.packages()[,"Package"])) { install.packages(p, repos=c("http://cran.csiro.au","http://R-Forge.R-project.org")) }
- Manually install those packages that do not reside in repositories (the vector toGetFromElse serves as a reminder of what these packages are).
The above steps are also useful when setting up R on an additional machine. It helps minimize incompatibilities between the machines.
The above steps could also be put into a shell script to automate the process even further.
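As one possible version of such a script, the steps above are condensed below into two helper functions. The names snapshot_packages() and restore_packages(), and the use of saveRDS() instead of save(), are choices made for this sketch. It also departs from the steps above in two small ways: it excludes "recommended" as well as "base" packages (both ship with R), and it removes duplicate names for packages installed in more than one library. Only the snapshot half is run in the usage lines, because the restore half needs network access.

```r
# Hypothetical helpers: the names and the .rds file are choices for this sketch

# Step 1 (old R): record the add-on packages that are installed
snapshot_packages <- function(file) {
  ip <- installed.packages()
  # "base" and "recommended" packages come with every R install
  addon <- !ip[, "Priority"] %in% c("base", "recommended")
  pkgs <- sort(unique(ip[addon, "Package"]))
  saveRDS(pkgs, file)
  invisible(pkgs)
}

# Step 2 (new R): reinstall whatever the repositories offer and return the
# names of the packages that still have to be installed by hand
restore_packages <- function(file,
                             repos = c("http://cran.csiro.au",
                                       "http://R-Forge.R-project.org")) {
  pkgs <- readRDS(file)
  missing <- setdiff(pkgs, rownames(installed.packages()))
  available <- rownames(available.packages(contriburl = contrib.url(repos)))
  from_repos <- intersect(missing, available)
  if (length(from_repos) > 0) {
    install.packages(from_repos, repos = repos)
  }
  setdiff(missing, available)
}

# Usage: take the snapshot now ...
snapshot_file <- file.path(tempdir(), "Rpackages.rds")
pkgs <- snapshot_packages(snapshot_file)
length(pkgs)
# ... and, after installing the new R, run:
# restore_packages(snapshot_file)
```

Passing all the package names to a single install.packages() call, rather than looping over them, lets R work out the shared dependencies in one pass.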
Creating packages
Coming soon - based on devtools