Tutorial 6 - The Grammar of Graphics in R (ggplot2)

25 Feb 2012

This Tutorial has been thrown together a little hastily and is therefore not very well organized - sorry! Graphical features are demonstrated either via tables of properties or as clickable graphics that reveal the required R code. Click on a graphic to reveal/toggle the source code or to navigate to an expanded section.

This tutorial is intended to be viewed sequentially. It begins with the basic ggplot framework and then progressively builds up more and more features as default elements are gradually replaced to yield more customized graphics.

Having said that, I am going to start with a sort of showcase of graphics which should act as quick navigation to entire sections devoted to the broad series of graphs related to each of the featured graphics. I have intentionally titled each graph according to the main feature it encapsulates rather than any specific functions that are used to produce the features as often a single graphic requires a combination of features and thus functions. Furthermore, the grammar of graphics specifications are sufficiently unfamiliar to many that the relationships between the types of graphical features a researcher wishes to produce and the specific syntax required to achieve the desired result can be difficult to recognize.

Each graphic is intended to encapsulate a broad series of related graph types.

Basic plot types
Boxplots
Histograms
Density plots
Scatterplots
Line graphs
Smoothers
Trendlines
Bar charts
Stacked bar charts
Bar graphs
Interaction plots
Scatterplot matrix
Heat maps
Contour maps
Additions
Segments
Confidence bands
Error bars
Horizontal lines
Vertical lines
Range bars
Axes rugs
Text plots
Customizations
Colours
Line types
Plotting symbols
Transparency
Sizes
Date and time
Layouts
facet_wrap
facet_grid
viewports

The Grammar of Graphics

The Grammar of Graphics was first presented by Wilkinson and Wills (2006) as a new graphics philosophy that laid down a series of rules to govern the production of quantitative graphics. Essentially, the proposed graphics infrastructure considers a graphic as comprising a plot (defined by a coordinate system, scales and panelling) over which one or more data layers are applied.

Each layer is defined as:

  • the data - a data frame
  • mapping specifications that establish the visual aesthetics (colour, line type and thickness, shapes etc) of each variable
  • statistical methods that determine how the data rows should be summarised (stat)
  • geometric instructions (geom) on how each summary should be represented (bar, line, point etc)
  • positional mechanism for dealing with overlapping data (position)
The visual aspects of all the graphical features are then governed by themes.

Following a very short example, the next section will largely concentrate on describing each of the above graphical components. Having then established the workings of these components, we can then put them together to yield specific graphics.

Hadley Wickham's interpretation of these principles in an R context is implemented via the ggplot2 package. In addition, the following packages are commonly used alongside ggplot2 to extend its flexibility.

  • grid
  • gridExtra
  • scales
library(ggplot2)
library(grid)
library(gridExtra)
library(scales)

The following very simple graphic illustrates the above specification by explicitly stating many of the default specifications. It uses a Cartesian coordinate system, continuous axis scales and a single facet (panel), and defines two layers on the same data frame (BOD), each with an identity (no summarisation) statistic: the first is visualised with a point geom and the second with a line geom.

p <- ggplot() +
  coord_cartesian() +      # cartesian coordinates
  scale_x_continuous() +   # continuous x axis
  scale_y_continuous() +   # continuous y axis
  # first layer: points
  layer(data = BOD,        # data.frame
    mapping = aes(y = demand, x = Time),
    stat = "identity",     # use original data
    geom = "point",        # plot data as points
    position = "identity"  # no position adjustment (required by newer ggplot2)
  ) +
  # second layer: line
  layer(data = BOD,        # data.frame
    mapping = aes(y = demand, x = Time),
    stat = "identity",     # use original data
    geom = "line",         # plot data as a line
    position = "identity"  # no position adjustment
  )
p # print the plot

Or, leaving out everything that is a default:
p <- ggplot(data = BOD, mapping = aes(y = demand, x = Time)) +
    geom_point() + geom_line()
p

Note, the following important features of the grammar of graphics as implemented in R:

  • the order in which each of the above components in the first code snippet were added is unimportant. They each add additional information to the overall graphical object. The object itself is evaluated as a whole when it is printed.
  • multiple layers are laid down in the order that they appear in the statement
  • in the second code snippet (the shorter version), a layer is created for each of the two geoms
  • the data and mapping used by both geom_point() and geom_line() are inherited from the main ggplot() function.
  • since layers are ordered, the points are drawn first and the line over the top
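
The points above can be checked by inspecting the plot object itself. The following is a minimal sketch that assumes a reasonably recent ggplot2 (2.0 or later), in which a plot keeps its layers in a `layers` list and each layer stores its geom as a ggproto object. Those internals are an assumption about the installed version, not something the rest of this tutorial depends on.

```r
library(ggplot2)

# Data and mapping are declared once in ggplot() and inherited by both geoms
p <- ggplot(data = BOD, mapping = aes(y = demand, x = Time)) +
  geom_point() +
  geom_line()

# One layer per geom, stored in the order in which they were added
n_layers <- length(p$layers)
layer_geoms <- sapply(p$layers, function(l) class(l$geom)[1])

n_layers
layer_geoms
```

Swapping the order of geom_point() and geom_line() in the expression swaps the order of the entries in `layer_geoms`, and therefore which geom is drawn on top.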

In an attempt to illustrate the use of ggplot for elegant graphics, we will drill down into each of the plot and layer specifications. Although the geoms and thus layers are amongst the last features to be constructed by the system, the data and aesthetic features of the data impact on how the coordinate system, scales and panelling work. Therefore, we will explore the geoms first.

Geometric objects - geom_ and stat_

Geometric objects (geoms) are visual representations of observations. For example, there is a geom to represent points based on a set of x,y coordinates. All graphics need at least one geom and each geom is mapped to its own layer. Multiple geoms can be added to a graphic and the order that they are added to the expression determines the order that their respective layer is constructed.

When a ggplot expression is being evaluated, geoms are coupled together with a stat_ function. This function is responsible for generating data appropriate for the geom. For example, the stat_boxplot is responsible for generating the quantiles, whiskers and outliers for the geom_boxplot function.

In addition to certain specific stat_ functions, all geoms can be coupled to a stat_identity function. In mathematical contexts, an identity function maps each element to itself - this essentially means that each element passes straight through the identity function unaltered. Coupling a geom to an identity function is useful when the characteristics of the data that you wish to represent are already present in the data frame. For example, your data frame may contain the x,y coordinates for a series of points and you wish for them to be used unaltered as the x,y coordinates on the graph. Similarly, your data frame may contain pre-calculated information about the quantiles, whiskers and outliers and you wish these to be used in the construction of a boxplot (rather than have the internals of ggplot perform the calculations on raw data).
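
As a hedged illustration of the pre-calculated boxplot case, the sketch below computes the five boxplot statistics outside ggplot and hands them to geom_boxplot via stat = "identity". The object names (`groups`, `box_stats`) are made up for this example, and boxplot.stats() uses hinges, which can differ slightly from the quantiles that stat_boxplot would compute.

```r
library(ggplot2)

# Pre-compute the five boxplot statistics of CO2 uptake for each Type
groups <- split(CO2$uptake, CO2$Type)
box_stats <- do.call(rbind, lapply(names(groups), function(g) {
  # boxplot.stats()$stats holds: lower whisker, lower hinge, median,
  # upper hinge, upper whisker
  q <- boxplot.stats(groups[[g]])$stats
  data.frame(Type = g, ymin = q[1], lower = q[2], middle = q[3],
             upper = q[4], ymax = q[5])
}))

# stat = "identity" passes the pre-computed values straight to the geom
p <- ggplot(box_stats) +
  geom_boxplot(aes(x = Type, ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity")
p
```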

Since geom_ and stat_ functions are coupled together, a geometric representation can be expressed from either a geom_ function OR a stat_ function. That is, you either:

  • specify a geom_ function that itself calls a stat_ function to provide the data for the geom function.
    ggplot(CO2) + geom_smooth(aes(x = conc, y = uptake),
        stat = "smooth")
    
  • specify a stat_ function that itself calls a geom_ function to visually represent the data.
    ggplot(CO2) + stat_smooth(aes(x = conc, y = uptake),
        geom = "smooth")
    
It does not really make any difference which way around you do this. For the remainder of this tutorial, we will directly engage the geom_ function for all examples.

The geom_ functions all have numerous arguments, many of which are common to all geoms.

  • data - the data frame containing the data. Typically this is inherited from the ggplot function.
  • mapping - the aesthetic mapping instructions. The aesthetic mapping controls the visual characteristics of the geometric features (such as colour, point sizes, shapes etc). The aesthetic mapping can be inherited from the ggplot function. Common aesthetic features (mapped via an aes function) include:
    • alpha - transparency
    • colour - colour of the geometric features
    • fill - fill colour of geometric features
    • linetype - type of lines used in geometric features (dotted, dashed, etc)
    • size - size of geometric features such as points or text
    • shape - shape of geometric features such as points
    • weight - weightings of values
  • stat - the stat_ function coupled to the geom_ function
  • position - the position adjustment for overlapping objects
    • identity - leave objects where they are
    • dodge - shift objects to the side to prevent overlapping
    • stack - stack objects on top of each other
    • fill - stack objects on top of each other and standardize each group to equal height

Currently, there are a large number of available geom_ and stat_ functions within the ggplot system. This tutorial is still a work in progress and therefore does not include all of them - I have focused on the more commonly used ones.

In an attempt to break up the set of geom_ and stat_ functions, I have somewhat arbitrarily divided them into primary and secondary geometric features. Primary geometric features are those that could be viewed as graphics in their own right, whereas secondary geometric features are those that are added to other geometric features to provide additional information (but would rarely be considered a graphic in their own right).

Primary geometric objects

The following list provides an overview of the geometric features described in this section.
geom_bar
geom_boxplot
geom_density
geom_point
geom_line
geom_smooth
geom_tile
geom_contour

geom_bar and stat_bin

geom_bar constructs bar charts and histograms. By default, the bins of each bar along with the associated bar heights are calculated by the stat_bin function. The following lists describe the mapping aesthetic properties associated with geom_bar and stat_bin. The entries in bold are compulsory.
geom_bar
  • x - x axis value (categorical)
  • alpha - transparency
  • colour - colour of the bar outlines
  • fill - colour of the bar
  • linetype - type of lines used to construct bar
  • size - thickness of the bar outlines
  • weight - weightings of values
stat_bin
  • x - a vector that is to be binned
  • y - optional y axis value (continuous)

The following table shows the first six rows of the diamonds dataset (supplied with ggplot2) that will be used for the following examples.
carat cut color clarity depth table price x y z
1 0.23 Ideal E SI2 61.50 55.00 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.80 61.00 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.90 65.00 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.40 58.00 334 4.20 4.23 2.63
5 0.31 Good J SI2 63.30 58.00 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.80 57.00 336 3.94 3.96 2.48


Feature | geom | stat | position | Aesthetic parameters / Notes | Example
barchart | _bar | _bin | stack | x, y, size, linetype, colour, fill, alpha, weight. Bar heights represent the number of items in each level of the categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut))
barchart | _bar | _bin | stack | Bar heights represent the number of items in each level of a categorical vector, stacked according to another categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut,
    fill = clarity))
barchart | _bar | _bin | dodge | Bar heights represent the number of items in each combination of levels of multiple categorical vectors, displayed side by side
ggplot(diamonds) + geom_bar(aes(x = cut,
    fill = clarity), position = "dodge")
barchart | _bar | _identity | stack | Bar heights represent the value of y for each x
diamonds1 <- as.data.frame(table(diamonds$cut))
ggplot(diamonds1) + geom_bar(aes(x = Var1,
    y = Freq), stat = "identity")
bargraph | _bar | _summary | stack | Bar heights represent the mean of y within each level of a categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut,
    y = carat), stat = "summary",
    fun.y = mean)
histogram | _bar | _bin | stack | Bar heights represent counts within a binned continuous vector
ggplot(diamonds) + geom_bar(aes(x = carat))
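
The "fill" position adjustment listed earlier is not shown in the table above. As a sketch (assuming a ggplot2 recent enough to provide the layer_data() helper), the following draws the stacked bar chart with each stack standardised to the same height, then checks the computed bar tops rather than relying on the picture alone. The names `p_fill`, `bars` and `top_of_stack` are illustrative only.

```r
library(ggplot2)

# Stacked bars, with every stack rescaled to a total height of 1
p_fill <- ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity), position = "fill")

# Pull out the data computed for the first (and only) layer
bars <- layer_data(p_fill, 1)

# The highest ymax within each x position is the top of that stack
top_of_stack <- tapply(bars$ymax, as.integer(bars$x), max)
top_of_stack

p_fill
```

With position = "fill" the y axis becomes a proportion, so this form suits comparisons of composition across categories rather than comparisons of counts.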

geom_boxplot and stat_boxplot

geom_boxplot constructs boxplots. The values of the various elements of the boxplot (quantiles, whiskers etc) are calculated by its main pairing function (stat_boxplot). The following lists describe the mapping aesthetic properties associated with geom_boxplot and stat_boxplot. The entries in bold are compulsory. Note that boxplots are usually specified via the geom_boxplot function, which will engage stat_boxplot to calculate the quantiles, whiskers and outliers. Therefore, confusingly, when calling geom_boxplot, the compulsory parameters are actually those required by stat_boxplot (unless you indicate that stat_identity should be used).
geom_boxplot
  • x - x axis value (categorical)
  • lower - value of the lower box line (25th percentile)
  • middle - value of the middle box line (50th percentile - median)
  • upper - value of the upper box line (75th percentile)
  • ymax - value of the upper whisker
  • ymin - value of the lower whisker
  • alpha - transparency
  • colour - colour of the lines
  • fill - colour of the boxplot
  • linetype - type of lines used to construct boxplot
  • shape - symbol shape for outliers
  • size - symbol size for outliers
  • weight - weightings of values
stat_boxplot
  • x - x axis value (categorical)
  • y - y axis value (continuous)
Feature | geom | stat | position | Notes, additional parameters | Example
boxplot | _boxplot | _boxplot | dodge | Plot of quantiles, whiskers and outliers

  • outlier.colour
  • outlier.shape
  • outlier.size
  • notch - whether to include a notch or not
  • notchwidth - width of notch (fraction of box width)
ggplot(diamonds) + geom_boxplot(aes(x = "carat",
    y = carat))
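
The additional parameters listed above are passed directly to geom_boxplot rather than through aes(). A small sketch, assuming a ggplot2 that provides layer_data(); the particular colours and sizes are arbitrary choices for illustration:

```r
library(ggplot2)

# One notched box per cut, with small red open circles for the outliers
p_box <- ggplot(diamonds) +
  geom_boxplot(aes(x = cut, y = carat),
               notch = TRUE, notchwidth = 0.5,
               outlier.colour = "red", outlier.shape = 1,
               outlier.size = 1)

# The statistics that stat_boxplot computed for each box
box_data <- layer_data(p_box, 1)
box_data[, c("x", "ymin", "lower", "middle", "upper", "ymax")]

p_box
```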

geom_density and stat_density

geom_density constructs smooth density distributions from continuous vectors. The actual smoothed densities are calculated by its main pairing function (stat_density). The following lists describe the mapping aesthetic properties associated with geom_density and stat_density. The entries in bold are compulsory. Note that density plots are usually specified via the geom_density function, which will engage stat_density. Therefore, confusingly, when calling geom_density, the compulsory parameters are actually those required by stat_density (unless you indicate that stat_identity should be used).
geom_density
  • x - x axis value (continuous)
  • y - y axis value (densities)
  • alpha - transparency
  • colour - colour of the lines
  • fill - colour of the density
  • linetype - type of lines used to construct density
  • size - thickness of the lines
  • weight - weightings of values
stat_density
  • x - a continuous vector from which to create density distribution
  • fill - fill colour
  • y

Feature | geom | stat | position | Notes, additional parameters | Example
density | _density | _density | dodge | Density plot of a distribution of a vector

  • adjust - smoothness
  • kernel - kernel density
  • trim - whether to trim densities to data range
ggplot(diamonds) + geom_density(aes(x = carat))

geom_point

geom_point draws points (scatterplot). Typically the stat used is stat_identity as we wish to use the values in two continuous vectors as the coordinates of each point. The following list describes the mapping aesthetic properties associated with geom_point. The entries in bold are compulsory.
geom_point
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • alpha - transparency
  • colour - colour of the points
  • fill - fill colour of the points (for symbols that can be filled)
  • shape - symbol shape
  • size - symbol size
Note, it is possible to combine geom_point with other stats (such as stat_summary), so as to plot summaries of the data rather than raw data.
Feature | geom | stat | position | Notes, additional parameters | Example
point | _point | _identity | identity | Scatterplot

ggplot(BOD) + geom_point(aes(x = Time,
    y = demand))
means point | _point | _summary | identity | Means plot

ggplot(CO2) + geom_point(aes(x = conc,
    y = uptake), stat = "summary",
    fun.y = mean)

geom_line

geom_line draws lines joining coordinates. Typically the stat used is stat_identity as we wish to use the values in two continuous vectors as the coordinates of each line segment. The following list describes the mapping aesthetic properties associated with geom_line. The entries in bold are compulsory.
geom_line
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • alpha - transparency
  • colour - colour of the lines
  • linetype - type of lines used to construct line
  • size - thickness of the lines
Note, it is possible to combine geom_line with other stats (such as stat_summary), so as to plot summaries of the data rather than raw data.
Feature | geom | stat | position | Notes, additional parameters | Example
line | _line | _identity | identity | Line plot

ggplot(BOD) + geom_line(aes(x = Time,
    y = demand))
means line | _line | _summary | identity | Means line plot

ggplot(CO2) + geom_line(aes(x = conc,
    y = uptake), stat = "summary",
    fun.y = mean)

geom_smooth and stat_smooth

geom_smooth draws smooth lines (and 95% confidence intervals) through data clouds. Typically the stat used is stat_smooth, which in turn engages one of the available smoothing methods (e.g. lm, glm, gam, loess or rlm). The following lists describe the mapping aesthetic properties associated with geom_smooth and stat_smooth. The entries in bold are compulsory.
geom_smooth
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • alpha - transparency
  • colour - colour of the smooth line
  • fill - fill colour of the confidence band
  • linetype - line type of the smooth line
  • size - thickness of the smooth line
  • weight - for weighting data
stat_smooth
  • x - x axis value (continuous)
  • y - y axis value (continuous)
stat_smooth also has the following optional arguments:
  • method - the smoothing method (function). One of "lm", "glm", "gam", "loess" or "rlm"
  • formula - the formula for the smoothing function, expressed relative to x and y. E.g. "y~x", "y~s(x)"
  • se - whether to display confidence intervals
  • fullrange - whether the fit should span the full range of the plot rather than just the range of the data
  • level - confidence level (e.g. 0.95)
  • n - number of points to evaluate smoother at

Feature | geom | stat | position | Notes, additional parameters | Example
smooth | _smooth | _smooth | identity | Linear smoother

ggplot(CO2) + geom_smooth(aes(x = conc,
    y = uptake), method = "lm")
Lowess smoother | _smooth | _smooth | identity | Lowess smoother

ggplot(CO2) + geom_smooth(aes(x = conc,
    y = uptake), method = "loess")
Gam smoother | _smooth | _smooth | identity | Cubic regression spline smoother

library(mgcv)
ggplot(CO2) + geom_smooth(aes(x = conc,
    y = uptake), method = "gam",
    formula = y ~ s(x, bs = "cr",
        k = 4))

geom_tile

geom_tile constructs heat maps given x,y coordinates and a z value to associate with the fill of each tile. The following list describes the mapping aesthetic properties associated with geom_tile. The entries in bold are compulsory.
geom_tile
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • alpha - transparency
  • colour - colour of the borders around tiles
  • fill - colour of the fill of tiles
  • linetype - type of lines to use as borders to each tile
  • size - line thickness
Feature | geom | stat | position | Notes, additional parameters | Example
tile | _tile | _identity | identity | Heat map

library(reshape)
volcano.df <- melt(volcano, varnames = c("X",
    "Y"))
ggplot(volcano.df) + geom_tile(aes(x = X,
    y = Y, fill = value))

geom_contour and stat_contour

geom_contour constructs contour maps given x,y coordinates and a z value from which to calculate each contour. The following list describes the mapping aesthetic properties associated with geom_contour and stat_contour. The entries in bold are compulsory.
geom_contour
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • alpha - transparency
  • colour - colour of the contour lines
  • linetype - line type of the contour lines
  • size - line thickness
  • weight
stat_contour
  • x - x axis value (continuous)
  • y - y axis value (continuous)
  • z - z axis value (continuous)
  • order
Feature | geom | stat | position | Notes, additional parameters | Example
contour | _contour | _contour | identity | Contour map

library(reshape)
volcano.df <- melt(volcano, varnames = c("X",
    "Y"))
ggplot(volcano.df) + geom_contour(aes(x = X,
    y = Y, z = value))

Secondary geometric objects

geom_segment
geom_ribbon
geom_errorbar
geom_hline
geom_vline
geom_pointrange
geom_rug
geom_text

geom_segment

geom_segment draws segments joining coordinates. The following list describes the mapping aesthetic properties associated with geom_segment. The entries in bold are compulsory.
geom_segment
  • x - x coordinates for the start of lines
  • xend - x coordinates for the end of lines
  • y - y coordinates for the start of lines
  • yend - y coordinates for the end of lines
  • alpha - transparency
  • colour - colour of the segments
  • linetype - type of lines used to construct the segments
  • size - thickness of the segments
geom_segment also has the following optional arguments:
  • arrow - specification of how arrows should be constructed
  • lineend - style of the line end

Feature | geom | stat | position | Notes, additional parameters | Example
segment | _segment | _identity | identity | Segments on a plot - useful for drawing lots of lines or arrows

BOD.lm <- lm(demand ~ Time, data = BOD)
BOD$fitted <- fitted(BOD.lm)
BOD$resid <- resid(BOD.lm)
ggplot(BOD) + geom_segment(aes(x = Time,
    y = demand, xend = Time, yend = fitted))
segment | _segment | _identity | identity | Segments with arrow heads (arrow and unit come from the grid package)

BOD.lm <- lm(demand ~ Time, data = BOD)
BOD$fitted <- fitted(BOD.lm)
BOD$resid <- resid(BOD.lm)
ggplot(BOD) + geom_segment(aes(x = Time,
    y = demand, xend = Time, yend = fitted),
    arrow = arrow(length = unit(0.5,
        "cm")))

geom_ribbon

geom_ribbon draws ribbons based on upper and lower levels of y associated with each level of x. The following list describes the mapping aesthetic properties associated with geom_ribbon. The entries in bold are compulsory.
geom_ribbon
  • x - x coordinates
  • ymin - y coordinates of the lower limits
  • ymax - y coordinates of the upper limits
  • alpha - transparency
  • colour - colour of the ribbons
  • fill - colour of the ribbon
  • linetype - type of lines used to construct the borders of the ribbon
  • size - thickness of the lines used to border the ribbon

Feature | geom | stat | position | Notes, additional parameters | Example
ribbon | _ribbon | _identity | identity | Ribbons on a plot - useful for depicting confidence envelopes

BOD.lm <- lm(demand ~ Time, data = BOD)
xs <- seq(min(BOD$Time), max(BOD$Time),
    l = 100)
pred <- data.frame(predict(BOD.lm,
    newdata = data.frame(Time = xs),
    interval = "confidence"))
pred$x <- xs
ggplot(pred) + geom_ribbon(aes(x = x,
    ymin = lwr, ymax = upr))
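
Since a ribbon is a secondary geometric feature, it is usually combined with other layers. The sketch below repeats the prediction step above and then overlays the fitted line and the raw observations on the confidence ribbon; the fill colour and transparency are arbitrary choices for illustration.

```r
library(ggplot2)

# Fit the model and predict over a fine grid of Time values
BOD.lm <- lm(demand ~ Time, data = BOD)
xs <- seq(min(BOD$Time), max(BOD$Time), length.out = 100)
pred <- data.frame(predict(BOD.lm, newdata = data.frame(Time = xs),
                           interval = "confidence"))
pred$Time <- xs

# Ribbon first (bottom), then the fitted line, then the observations (top)
p <- ggplot() +
  geom_ribbon(data = pred, aes(x = Time, ymin = lwr, ymax = upr),
              fill = "grey70", alpha = 0.5) +
  geom_line(data = pred, aes(x = Time, y = fit)) +
  geom_point(data = BOD, aes(x = Time, y = demand))
p
```

Each layer is given its own data frame here, which is why no data or mapping is supplied to ggplot() itself.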

geom_errorbar

geom_errorbar draws errorbars based on upper and lower levels of y associated with each level of x. The following list describes the mapping aesthetic properties associated with geom_errorbar. The entries in bold are compulsory.
geom_errorbar
  • x - x coordinates
  • ymin - y coordinates of the lower limits
  • ymax - y coordinates of the upper limits
  • alpha - transparency
  • colour - colour of the errorbars
  • fill - colour of the errorbar
  • linetype - type of lines used to construct the borders of the errorbar
  • size - thickness of the lines used to border the errorbar
  • width - width of the errorbars

Feature | geom | stat | position | Notes, additional parameters | Example
errorbar | _errorbar | _identity | identity | Error bars on a plot - useful for adding to means plots etc

library(plyr)
warpbreaks.df <- ddply(warpbreaks,
    ~wool, function(x) {
        Hmisc::smean.cl.boot(x$breaks)
    })
ggplot(warpbreaks.df) + geom_errorbar(aes(x = wool,
    ymin = Lower, ymax = Upper))
errorbar | _errorbar | _summary | identity | Error bars calculated from the raw data by stat_summary

ggplot(warpbreaks) + geom_errorbar(aes(x = wool,
    y = breaks), stat = "summary",
    fun.data = "mean_cl_boot")
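
As a further sketch, error bars can be combined with a means plot in a single expression. This version uses mean_se(), a helper exported by ggplot2 that returns the mean plus and minus one standard error, so it does not need Hmisc; choosing standard errors over bootstrapped confidence limits is my substitution for illustration, not the approach used above.

```r
library(ggplot2)

# Means (points) with +/- 1 standard error (error bars) for each wool type
p <- ggplot(warpbreaks, aes(x = wool, y = breaks)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  stat_summary(fun.data = mean_se, geom = "point")

# What mean_se() computes for a single group
se_A <- mean_se(warpbreaks$breaks[warpbreaks$wool == "A"])
se_A

p
```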
library(ggmap)
hdf <- get_map(location = c(lon = 146.8, lat = -19.20), zoom = 12, maptype = "satellite")
ggmap(hdf, extent = 'normal')

Coordinate system - coord

The coordinate system controls the nature and scale of the axes.
System | Parameters | Example
Regular cartesian coordinate system

coord_cartesian


xlim - x limits
ylim - y limits
ggplot(BOD) + coord_cartesian() +
    geom_line(aes(y = demand, x = Time))
Polar coordinate system

coord_polar


theta="x" - angle variable
start=0 - initial angle from 12 o'clock
direction=1 - 1=clockwise, -1=anticlockwise
ggplot(BOD) + coord_polar() + geom_line(aes(y = demand,
    x = Time))
Flipped axes

coord_flip


xlim=NULL - x limits (drawn on the vertical axis after flipping)
ylim=NULL - y limits (drawn on the horizontal axis after flipping)
ggplot(BOD) + coord_flip() + geom_line(aes(y = demand,
    x = Time))
Fixed ratio of axes dimensions

coord_fixed


ratio=1 - y/x ratio
xlim=NULL - x limits
ylim=NULL - y limits
ggplot(BOD) + coord_fixed(ratio = 0.25) +
    geom_line(aes(y = demand, x = Time))
1:1 (equal) ratio of axes dimensions
same as coord_fixed(ratio=1)

coord_equal



xlim=NULL - x limits
ylim=NULL - y limits
ggplot(BOD) + coord_equal() + geom_line(aes(y = demand,
    x = Time))
Map projection coordinate system

coord_map



projection="mercator" - mapping projection
orientation=c(90,0,mean(range(x))) - map orientation
# get high resolution map of
# Australia (and islands) data
library(maps)
library(mapdata)
aus <- map_data("worldHires", region = "Australia")
# Orthographic coordinates
ggplot(aus, aes(x = long, y = lat,
    group = group)) + coord_map("ortho",
    orientation = c(-20, 125, 23.5)) +
    geom_polygon()

Altering the axes scales via the coordinate system

Modifying scales with coords affects the zoom on the graph. That is, it defines the extent and nature of the axes coordinates. By contrast, altering limits via scale_ routines will alter the scope of data included in a manner analogous to operating on a subset of the data.

Default scale | Scale via coord_ (Zoom) | Scale via scale_
# Default scales
ggplot(BOD, aes(y = demand, x = Time)) + geom_point() + geom_smooth(method = "lm")
# Zoom on x-axis
ggplot(BOD, aes(y = demand, x = Time)) + coord_cartesian(xlim = c(2, 6)) + geom_point() + geom_smooth(method = "lm")
# Scale (subset) the x data
ggplot(BOD, aes(y = demand, x = Time)) + scale_x_continuous(limits = c(2, 6)) + geom_point() + geom_smooth(method = "lm")
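
The difference between zooming and subsetting can also be seen numerically. The following sketch (assuming a ggplot2 that provides layer_data()) counts how many points keep a usable x value under each approach, and then uses lm() directly to show why the fitted trend differs when the data are subset. The object names are illustrative only.

```r
library(ggplot2)

base <- ggplot(BOD, aes(y = demand, x = Time)) + geom_point()
zoomed    <- base + coord_cartesian(xlim = c(2, 6))        # zoom only
subsetted <- base + scale_x_continuous(limits = c(2, 6))   # drops data

# Points outside scale limits are converted to NA before anything is drawn
n_zoomed    <- sum(!is.na(layer_data(zoomed, 1)$x))
n_subsetted <- sum(!is.na(layer_data(subsetted, 1)$x))

# A trend fitted to the retained data therefore differs from the full fit
slope_all    <- coef(lm(demand ~ Time, data = BOD))[["Time"]]
slope_subset <- coef(lm(demand ~ Time,
                        data = subset(BOD, Time >= 2 & Time <= 6)))[["Time"]]

c(zoomed = n_zoomed, subsetted = n_subsetted)
c(all = slope_all, subset = slope_subset)
```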

In addition to altering the zoom of the axes, axes (coordinate system) scales can be transformed to other scales via the coord_trans function. Transformations of the coordinate system take place after statistics have been calculated and geoms derived. Therefore the shape of each geom is altered.

The coord_trans() function has the following arguments:

  • xtrans - a transformer that will operate on the x scale
  • ytrans - a transformer that will operate on the y scale
  • limx - limits of the x axis
  • limy - limits of the y axis

A transformer is a function that defines a transformation along with its inverse and rules on how to generate pretty breaks (tick marks) and their labels.

To illustrate the distinction between coord_trans and scale_, we will generate some curvilinear data.
set.seed(1)
n <- 50
dat <- data.frame(x = exp((1:n + rnorm(n, sd = 2))/10), y = 1:n + rnorm(n, sd = 2))
Linear scales | coord_trans | scale_
Axis ticks: linear spacing of axis ticks | log10 spacing of axis ticks on a linear scale | linear spacing of axis ticks on a log10 scale
# Linear scales
ggplot(dat, aes(y = y, x = x)) + geom_point()
# coord_trans
ggplot(dat, aes(y = y, x = x)) + geom_point() + coord_trans(xtrans = log10_trans())
# scale_
ggplot(dat, aes(y = y, x = x)) + geom_point() + scale_x_continuous(trans = log10_trans())
Trend: linear trend applied to curved data | linear trend applied to curved data, then bent by coordinates rescaling | linear trend applied to scaled (linear) data
# Linear scales
ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm")
# coord_trans
ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + coord_trans(xtrans = log10_trans())
# scale_
ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans())
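
The distinction can be confirmed by looking at the data each plot hands to its geoms. This is a sketch under two assumptions about the installed ggplot2: that layer_data() is available, and that coord_trans() takes its transformer through an argument named x (recent releases) rather than xtrans as written above. scale_x_log10() is used as shorthand for a log10 scale transformation.

```r
library(ggplot2)

set.seed(1)
n <- 50
dat <- data.frame(x = exp((1:n + rnorm(n, sd = 2))/10),
                  y = 1:n + rnorm(n, sd = 2))

# Transformation applied by the scale: happens before statistics are computed
p_scale <- ggplot(dat, aes(y = y, x = x)) + geom_point() + scale_x_log10()

# Transformation applied by the coordinate system: happens at drawing time
p_coord <- ggplot(dat, aes(y = y, x = x)) + geom_point() +
  coord_trans(x = "log10")

x_scale <- layer_data(p_scale, 1)$x   # already on the log10 scale
x_coord <- layer_data(p_coord, 1)$x   # still on the raw scale

range(x_scale)
range(x_coord)
```

This is why a linear smoother is straight under scale_ (it is fitted to the logged x values) but bent under coord_trans (it is fitted to the raw x values and only then redrawn on log10 axes).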

Transformers

trans_new

The trans_new function itself defines and returns a list structure comprising:

  • name - a name to be given to the transformation
  • transform - a function (or name of a function) that performs the transformation
  • inverse - a function (or name of a function) that performs the inverse of the transformation
  • breaks - a function that generates tick breaks. Operates on the raw data
  • format - a function that formats labels for the breaks
  • domain - the range over which the transformation is valid

To illustrate the trans_new function, let's define a natural log (ln) transformer to apply to our artificial data.

ggplot(dat, aes(y=y,x=x)) + geom_point() +
  geom_smooth(method="lm") +
  scale_x_continuous(trans=trans_new(name="ln",
                                transform=function(x) log(x),
                                inverse=function(x) exp(x),
                                breaks=function(x) pretty(x),
                                domain=c(1e-100,Inf)))
## Error: 'from' cannot be NA, NaN or infinite
ln_trans <- function() {
    name <- "ln"
        trans <- function(x) log(x)
    inv <- function(x) exp(x)
        breaks <- function(x) pretty(x)
    format <- function(x) x
    domain <- c(1e-100,Inf)
    trans_new(name,transform=trans,inverse=inv,
              breaks=breaks, domain=domain)
}

ggplot(dat, aes(y=y,x=x)) + geom_point()+
  geom_smooth(method="lm")+
  scale_x_continuous(trans=ln_trans())
## Error: 'from' cannot be NA, NaN or infinite
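
Both attempts above fail for what appears to be the same reason: the breaks function receives limits on the raw scale, and pretty() then proposes a tick at 0, whose logarithm is -Inf. One possible workaround, offered as a sketch rather than the only fix, is to generate the breaks with log_breaks() from the scales package, which only proposes positive values. The name ln_trans_safe is made up for this example, and the trans argument of scale_x_continuous is used as elsewhere in this tutorial.

```r
library(ggplot2)
library(scales)

set.seed(1)
n <- 50
dat <- data.frame(x = exp((1:n + rnorm(n, sd = 2))/10),
                  y = 1:n + rnorm(n, sd = 2))

# Natural log transformer whose breaks are powers of e, hence always > 0
ln_trans_safe <- function() {
  trans_new(name = "ln",
            transform = function(x) log(x),
            inverse = function(x) exp(x),
            breaks = log_breaks(base = exp(1)),
            domain = c(1e-100, Inf))
}

p <- ggplot(dat, aes(y = y, x = x)) + geom_point() +
  geom_smooth(method = "lm") +
  scale_x_continuous(trans = ln_trans_safe())
p
```

The tick labels produced this way are powers of e and so are not round numbers; a format function could be supplied to trans_new to tidy them.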
coord_trans(xtrans = "identity", ytrans = "identity",limx = NULL, limy = NULL)

Finer control of the transformation can be exercised. Consider the following examples using the same dataset.

In order to be able to use trans_new effectively, it is necessary to understand the data passed to each of the functions within the transformer. In the following demonstration, I have placed print statements within each of the functions so as to illustrate the sequence in which the functions are called (relative to each other and other external functions) as well as the input data of each function. Note that for this demonstration I have omitted the smoother, as it would also result in calls to these functions and therefore complicate the sequencing.

p <- ggplot(dat, aes(y = y, x = x)) + geom_point() +
  # geom_smooth(method = "lm") +   # smoother omitted (see above)
  scale_x_continuous(trans = trans_new(name = "",
    transform = function(x) {
      cat("**Tranform begin**\n")
      print(x)
      log10(x)
    },
    inverse = function(x) {
      cat("**Inverse**\n")
      print(x)
      10^(x)
    },
    breaks = function(x) {
      cat("**Breaks**\n")
      print(x)
      pretty(x)
    },
    format = function(x) {
      cat("**Format**\n")
      print(x)
      x
    },
    domain = c(1e-100, Inf)))
p + theme(plot.background = element_rect(fill = "transparent", colour = NA))
**Tranform begin**
 [1]   0.975   1.267   1.142   2.052   1.761
 [6]   1.546   2.220   2.580   2.760   2.557
[11]   4.065   3.589   3.241   2.604   5.612
[16]   4.909   5.456   7.307   7.879   8.321
[21]   9.814  10.553  10.124   7.405  13.790
[26]  13.313  14.423  12.254  16.517  21.837
[31]  29.129  24.033  29.298  29.643  25.143
[36]  33.683  37.380  44.174  61.560  63.601
[41]  58.387  63.391  84.723  91.043  78.433
[46]  86.358 118.264 141.699 131.306 177.013
**Inverse**
[1] -0.1239  2.3610
**Breaks**
[1]   0.7517 229.5905
**Tranform begin**
[1]   0.7517 229.5905
**Tranform begin**
[1]   0  50 100 150 200 250
**Inverse**
[1]  -Inf 1.699 2.000 2.176 2.301    NA
**Format**
[1]   0  50 100 150 200  NA
Error: 'from' cannot be NA, NaN or infinite
The sequence is as follows:
  1. The first call to the **Transform** function is passed the raw data
  2. The first call to the **Inverse** function is passed the computed axis limits (on the log10 scale). These originate in another part of the ggplot engine. Following the action of the transformation function, other functions determine the limits of the axes based on the transformed data as well as the nominated axis expansion factor (which places a buffer beyond the data so that geoms do not overlap the axes). The inverse function then converts these limits into limits on the original raw data scale.
  3. The first call to the **Breaks** function is passed the axis limits on the scale of the raw data and defines the spacing of the axis tick marks
  4. The second call to the **Transform** function takes the axis limits and rescales them onto the log10 scale
  5. The third call to the **Transform** function takes the axis tick mark positions from **Breaks** and rescales them onto the log10 scale
  6. The second call to the **Inverse** function takes the axis tick mark positions on the log10 scale and rescales them onto the scale of the raw data
  7. Finally, the **Format** function is used to define the labels to be applied to the tick marks
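The error in the trace appears to stem from the breaks function: pretty() proposes tick positions at 0 and 250, both outside the axis limits, and log10(0) is -Inf. The following base R sketch replays steps 2 to 5 by hand, using the limits printed in the trace. Note that safe_breaks is a hypothetical helper introduced for illustration; it is not part of ggplot2 or scales.

```r
# Replay of the transformer call sequence using base R only.
transform <- function(x) log10(x)
inverse <- function(x) 10^x

# Step 2: axis limits on the log10 scale (copied from the trace above)
log_limits <- c(-0.1239, 2.3610)
raw_limits <- inverse(log_limits)   # roughly 0.75 and 229.6

# Step 3: pretty() may return breaks beyond the limits, including 0
brks <- pretty(raw_limits)

# Step 5: transforming a break of 0 gives -Inf, which the scale cannot place
bad <- transform(brks)

# Hypothetical alternative: keep only positive breaks inside the limits
safe_breaks <- function(x) {
  b <- pretty(x)
  b[b > 0 & b >= min(x) & b <= max(x)]
}
good <- transform(safe_breaks(raw_limits))
```

Supplying a function such as safe_breaks as the breaks argument of trans_new would be one way to avoid the error; the log_breaks() function from the scales package is another option worth trying.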
Axes in the scale of observations
Axes in the scale of logarithms
*_trans transformers

The _trans family of transformers are convenient wrappers for the trans_new function.

Transformer    Description
asn_trans() Arc-sin square-root transformation (of proportions/percentages).
atanh_trans() Inverse hyperbolic tangent (arc-tanh) transformation
boxcox_trans(p) Box-Cox power transformation
When the power exponent (p) is equal to 0, values are logged
For exponents other than zero, the value is raised to the power of the exponent, 1 is subtracted, and the result is divided by the exponent.
date_trans
exp_trans
identity_trans
log10_trans
log1p_trans
log2_trans
log_trans
logit_trans
probability_trans
probit_trans
reciprocal_trans
reverse_trans
sqrt_trans
time_trans
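As a rough illustration of how such a wrapper might be assembled, the sketch below writes the Box-Cox rule described above by hand and wraps it with trans_new from the scales package. The names boxcox_manual, boxcox_manual_inv and my_boxcox_trans are hypothetical and exist only for this example; they are not the implementation used by boxcox_trans.

```r
library(scales)

# Box-Cox by hand: log when p is 0, otherwise (x^p - 1) / p
boxcox_manual <- function(x, p) {
  if (p == 0) log(x) else (x^p - 1) / p
}

# Matching inverse, which trans_new also requires
boxcox_manual_inv <- function(y, p) {
  if (p == 0) exp(y) else (y * p + 1)^(1 / p)
}

# Hypothetical wrapper in the style of the *_trans helpers
my_boxcox_trans <- function(p) {
  trans_new(
    name = paste0("my-boxcox-", p),
    transform = function(x) boxcox_manual(x, p),
    inverse = function(y) boxcox_manual_inv(y, p)
  )
}

tr <- my_boxcox_trans(0.5)
tr$transform(4)               # (sqrt(4) - 1) / 0.5
tr$inverse(tr$transform(4))   # round trip back to the raw scale
```

A transformer built this way could then be handed to the trans argument of a continuous scale, in the same way as the built-in transformers.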
Transform axes scale (logs)1:1 axes scales
# log10 axes scales
ggplot(BOD) + coord_trans(xtrans = "log10", ytrans = "log10") +
    geom_line(aes(y = demand, x = Time))

Modifying scales with coords affects the zoom on the graph. That is, it defines the extent and nature of the axes coordinates. By contrast, altering limits via scale_ routines will alter the scope of data included in a manner analogous to operating on a subset of the data.
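The distinction between zooming and sub-setting can be checked on a small dataset. The sketch below assumes a reasonably recent ggplot2 (one that provides layer_data()) and uses coord_cartesian for the zoom; it counts how many x values have been replaced by missing values once the plot is built.

```r
library(ggplot2)

p <- ggplot(BOD, aes(x = Time, y = demand)) + geom_point()

# Zoom: all observations are kept, only the visible window changes
zoomed <- p + coord_cartesian(xlim = c(2, 5))

# Subset: observations outside the limits are replaced by NA
subsetted <- p + scale_x_continuous(limits = c(2, 5))

# Number of x values lost in each case
n_lost_zoomed <- sum(is.na(layer_data(zoomed)$x))
n_lost_subsetted <- sum(is.na(layer_data(subsetted)$x))
```

Because the scale limits act before any statistics are computed, a smoother or boxplot added to the second plot would be fitted to the reduced data only.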

Scales

The idea of scales is that you present the plotting engine with data or characteristics in one scale and use the various scale_ functions to convert those data into another scale. In the grammar of graphics, scales are synonymous with the units of data, colours, shapes, sizes etc. of plotting features, and the axes and guides (legends) provide a visual cue for what the scales are. For example:
  • you might include data that ranges from 10 to 20 units, yet you wish to produce a plot that zooms in on the range 12-16.
  • you have presented grouped data (data with multiple trends) and instructed the graphing engine to assign different colour codes to each trend. You can then define a colour scale to adjust the exact colours rendered.
  • similarly, you might have indicated how plotting symbol shape and size are to be distinguished in your data set. You can then assign scales that define the exact shapes and symbol sizes rendered.

Technically, scales determine how attributes of the data are mapped into aesthetic geom properties. The majority of geoms (geometric objects) have the following aesthetic properties:

  • x - the x position (coordinates) of the geom
  • y - the y position (coordinates) of the geom
  • size - the size of the geom (e.g. the size of a point)
  • shape - the shape of the geom
  • linetype - the type of line associated with the geom's outline (solid, dashed etc)
  • colour - the colour of the geom's outline (note the English spelling of the word colour)
  • fill - the colour of the geom's fill
  • alpha - the transparency of the geom (0=transparent, through to 1=opaque)

In turn, each of these properties is mapped to a scale, the default of which is selected automatically according to what is appropriate for the sort of data. For example, data can be on a continuous or discrete (categorical) scale. Most data types have the following possible scales for each of the above properties:

  • _continuous - when you want the scale increments (such as the different point sizes, colours etc) to be determined from a continuous vector in your data frame.
  • _discrete - when you want the scale increments (such as the different point sizes colours etc) to be determined from a categorical vector in your data frame.
  • _manual - is a variation on _discrete and is used when you wish to manually indicate the characteristic of each increment. You need to provide as many values as there are levels of your discrete vector.
  • _identity - is another variation on _discrete and is used when you wish for the values in your categorical vector to be used un-scaled as the characteristics of the data. For example, your data frame might contain a vector of colour names or point sizes.

Some properties, such as colour, also have additional scales that are specific to the characteristic. The scales affect not only the characteristics of the geoms, they also affect the guides (legends) that accompany the geoms.

Scaling functions comprise the prefix scale_, followed by the name of an aesthetic property and suffixed by the type of scale. Hence a function to manually define a colour scale would be scale_colour_manual.
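The naming rule can be illustrated with a few lines of R. The helper scale_fun_name is hypothetical (it is not a ggplot2 function); it simply pastes the three parts together so that the result can be compared against the functions that ggplot2 exports.

```r
library(ggplot2)

# Hypothetical helper that assembles a scale function name
scale_fun_name <- function(aesthetic, type) {
  paste("scale", aesthetic, type, sep = "_")
}

nm <- scale_fun_name("colour", "manual")
nm

# Check that a function of this name is exported by ggplot2
nm %in% getNamespaceExports("ggplot2")
```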

All scales have the following arguments available:
  • name - a title applied to the scale. In the case of scales for x and y (the x,y coordinates of geoms), the name is the axis title. For all other scales, the name is the title of the guide (legend).
  • breaks - the increments on the guide. For scale_x_ and scale_y_, breaks are the axis tick locations. For all other scales, the breaks indicate the increments of the characteristic in the legend (e.g. how many point shapes are featured in the legend).
  • labels - the labels given to the increments on the guide. For scale_x_ and scale_y_, labels are the axis tick labels. For all other scales, the labels are the labels given to items in the legend.
  • limits - the span/range of data represented in the scale. Note, if the range is inside the range of the data, the data are sub-setted.
  • trans - scale transformations applied - obviously this is only relevant to scales that are associated with continuous data.
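As a minimal sketch of how these common arguments combine, the following defines a single x-axis scale for the CO2 data with a name, explicit breaks, matching labels and limits. The particular break positions and limits are arbitrary choices made for illustration.

```r
library(ggplot2)

# One continuous x scale using four of the common arguments
x_scale <- scale_x_continuous(
  name = "CO2 conc",                          # axis title
  breaks = c(250, 500, 750, 1000),            # tick locations
  labels = c("250", "500", "750", "1000"),    # one label per break
  limits = c(0, 1100)                         # span of the axis
)

p <- ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + x_scale
```

Storing the scale in an object like this also makes it easy to reuse the same axis definition across several plots.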

Scaling the x and y values (scale_x_)

The scale_x_ and scale_y_ scales control the x and y axes. In addition to the common arguments listed above, the following optional arguments are available for specific scales:
  • expand - a vector of length two that indicates multiplicative and additive constants used to expand the axes away from the data thereby ensuring that geoms do not intersect with the axes.
  • minor_breaks - the increments for the minor breaks along the axis. The minor breaks have a grid line yet no tick marks or labels.
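The effect of the expand argument can be worked through by hand. The sketch below is a simplified reading of the description above (a multiplicative constant applied to the data range plus an additive constant, applied at both ends); expand_range_sketch is a hypothetical helper, not the routine ggplot2 uses internally.

```r
# Hypothetical helper: widen a data range by a multiplicative
# and an additive constant, applied at both ends
expand_range_sketch <- function(limits, expand) {
  mult <- expand[1]
  add <- expand[2]
  pad <- mult * diff(limits) + add
  c(limits[1] - pad, limits[2] + pad)
}

conc_range <- range(CO2$conc)

# Proportional padding of 5% of the data range
expand_range_sketch(conc_range, c(0.05, 0))

# Fixed padding of 200 units at each end
expand_range_sketch(conc_range, c(0, 200))
```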
scale_x_continuous examples: linear scaling; linear with nice title; linear with more space
# Linear axes scales with altered axis title
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_continuous(name = "CO2 conc")
# Linear axes scales with more complex title
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_continuous(name = expression(paste("Ambient ",
        CO[2], " concentration (mg/l)", sep = "")))
# Linear axes scales with more space along the x
# axis
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_continuous(name = "CO2 conc", expand = c(0,
        200))
scale_x_log10 - Log10 scale. Shortcut for scale_x_continuous(trans=log10_trans())
scale_x_sqrt - Square-root scale. Shortcut for scale_x_continuous(trans=sqrt_trans())
scale_x_reverse - Reverse scale
# log10 axes scales
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_log10(name = "CO2 conc", breaks = as.vector(c(1,
        2, 5, 10) %o% 10^(-1:2)))
# square-root transformation
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_sqrt(name = "CO2 conc")
# reverse the data
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    scale_x_reverse(name = "CO2 conc")
scale_x_date, scale_x_datetime and scale_x_discrete
For scale_x_date and scale_x_datetime, see date_breaks for more info on date breaks and strptime for more date formats.
# Date format
library(scales)
CO2$Date <- as.Date(paste(2000 + as.numeric(as.factor(CO2$conc)),
    "-01-01", sep = ""))
ggplot(CO2, aes(y = uptake, x = Date)) + geom_point() +
    scale_x_date(name = "Year", breaks = "2 years",
        minor_breaks = "6 month", labels = date_format("%Y"))
# POSIX format
library(scales)
CO2$DateTime <- as.POSIXct(paste(2000, "-0", as.numeric(as.factor(CO2$conc)),
    "-01 09:00:00", sep = ""))
ggplot(CO2, aes(y = uptake, x = DateTime)) + geom_point() +
    scale_x_datetime(name = "Time (days)", breaks = "2 months",
        minor_breaks = "1 months", labels = date_format("%b"))
# categorical axis
ggplot(CO2, aes(y = uptake, x = Treatment)) + geom_point() +
    scale_x_discrete(name = "Treatment")

Scaling the size of geoms (scale_size_)

The scale_size_ scales control the size of geoms (such as the size of points). In addition to the common scale arguments, the following optional arguments are available:
  • range - the minimum and maximum size
  • values - the specific sizes to use (for _manual scale)
  • guide - whether to include a guide and what sort of guide to include (e.g. "legend")
scale_size_continuous - Scale the geoms according to a continuous vector (range - minimum and maximum geom size)
scale_size_discrete - Scale the geoms according to a categorical vector
scale_size_manual - Manually determine the size of geoms (values - a set of values to use for sizes)
# size determined by continuous covariate
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = cv)) +
    scale_size_continuous(name = "Temperature")
# Discrete sizes ranging in size from 2 to 4
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Type)) +
    scale_size_discrete(name = "Type", range = c(2,
        4))
# Manual sizes of exactly 2 and 4
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Type)) +
    scale_size_manual(name = "Type", values = c(2,
        4))
scale_size_identity - Size the geoms according to the values of a continuous vector (don't scale); guide - whether to include a guide (legend)
# Sizes provided by a covariate
set.seed(123)
CO2$Count <- runif(nrow(CO2), 0, 10)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Count)) +
    scale_size_identity(name = "Type", guide = "legend")

Scaling the shape of geoms (scale_shape_)

The scale_shape_ scales control the shape of geoms (such as the shape of the plotting point). In addition to all of the regular arguments, the following optional arguments are available:
  • solid - whether the shapes should be solid (TRUE) or outlined (FALSE)
scale_shape_discrete - Geom shapes determined (scaled) by categorical variable
scale_shape_manual - Geom shapes determined (scaled) manually (values - a set of values (or shape names) to use for shapes)
scale_shape_identity - Geom shapes determined by categorical variable (no scaling)
# Discrete shapes determined by the combination of
# Type and Treatment The items in the guide are
# then rearranged and re-labelled
CO2$Comb <- interaction(CO2$Type, CO2$Treatment)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Comb)) +
    scale_shape_discrete(name = "Type", breaks = c("Quebec.nonchilled",
        "Quebec.chilled", "Mississippi.nonchilled",
        "Mississippi.chilled"), labels = c("Quebec non-chilled",
        "Quebec chilled", "Miss. non-chilled", "Miss. chilled"))
# Manual shapes
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Treatment),
    size = 2) + scale_shape_manual(name = "Treatment",
    values = c(16, 21))
# Identity shapes
set.seed(123)
CO2$Count <- runif(nrow(CO2), 0, 10)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Count)) +
    scale_shape_identity(name = "Species", guide = "legend")

Scaling the linetype associated with geoms (scale_linetype_)

The scale_linetype_ scales control the type of lines used in geoms and have the following additional optional arguments available:
  • values - values supplied to manually determine the line types
scale_linetype_discrete - Geom linetypes determined (scaled) by categorical variable
scale_linetype_manual - Geom linetypes determined (scaled) manually (values - a set of values (or linetype names) to use for linetypes)
scale_linetype_identity - Geom linetypes determined by categorical variable (no scaling)
# Discrete linetypes determined by the combination of
# Type and Treatment. The items in the guide are
# then rearranged and re-labelled
CO2$Comb <- interaction(CO2$Type, CO2$Treatment)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Comb)) +
    scale_linetype_discrete(name = "Type", breaks = c("Quebec.nonchilled",
        "Quebec.chilled", "Mississippi.nonchilled",
        "Mississippi.chilled"), labels = c("Quebec non-chilled",
        "Quebec chilled", "Miss. non-chilled", "Miss. chilled"))
# Manual linetypes
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Treatment)) +
    scale_linetype_manual(name = "Treatment", values = c("dashed",
        "dotted"))
# Identity linetypes
CO2$Lines <- factor(CO2$Treatment, levels = c("nonchilled",
    "chilled"), labels = c("dotted", "dashed"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Lines)) +
    scale_linetype_identity(name = "Temperature", guide = "legend",
        breaks = c("dotted", "dashed"), labels = c("Low",
            "High"))

Scaling the colour (or fill) associated with geoms (scale_colour_ & scale_fill_)

The scale_colour_ and scale_fill_ scales control the colour of geoms and have the following additional optional arguments available:
  • low - colour of low end of the colour spectrum
  • high - colour of high end of the colour spectrum
  • guide - what sort of legend (e.g. colorbar)
scale_colour_continuous - Geom colours determined (scaled) by a continuous variable
scale_colour_gradient - Geom colours determined (scaled) by a two-colour (low to high) gradient palette
scale_colour_gradient2 - Geom colours determined (scaled) by a diverging (low, mid, high) gradient palette
# colour determined by continuous covariate
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) +
    scale_colour_continuous(name = "Temperature", low = "blue",
        high = "red")
# colour determined by continuous covariate
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) +
    scale_colour_gradient(name = "Temperature")
# colour determined by continuous covariate
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) +
    scale_colour_gradient2(name = "Temperature")
scale_colour_gradientn - Geom colours determined (scaled) by a specific palette of n colours
# colour determined by continuous covariate use a
# predefined gradient based colour palette
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) +
    scale_colour_gradientn(name = "Temperature", colours = terrain.colors(5))
scale_colour_hue - Evenly spaced geom colours determined (scaled) by hue
scale_colour_grey - Geom colours determined (scaled) by a grey palette
scale_colour_brewer - Geom colours determined (scaled) by a colour brewer palette (see the color brewer site for more info)
# Discrete colours for hue
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) +
    scale_colour_hue(name = "Temperature", l = 80,
        c = 130)
# Discrete colours
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) +
    scale_colour_grey(name = "Temperature", start = 0.2,
        end = 0.8)
# Discrete colours selected from a colour brewer
# palette it automatically knows how many colours
# are required
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) +
    scale_colour_brewer(name = "Temperature", type = "seq",
        palette = "Reds")
scale_colour_manual - Geom colours determined (scaled) by a specific, manually supplied palette
scale_colour_identity - Geom colours taken directly from the values of a vector (no scaling)
# Manual colours
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) +
    scale_colour_manual(name = "Temperature", values = c("red",
        "#00AA00", 1))
# identity colours
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("red", "#00AA00", 1))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(colour = Temp)) +
    scale_colour_identity(name = "Temperature", guide = "legend",
        labels = c("Low", "Medium", "High"))

Scaling the alpha level of colour associated with geoms (scale_alpha_)

The scale_alpha_ scales control the transparency of geoms and have the following additional optional arguments available:
  • range - the alpha range (0,1)
  • values - alpha values between 0 and 1
  • guide - what sort of legend (e.g. "legend")
scale_alpha_continuous - Geom alphas determined (scaled) by a continuous variable
scale_alpha_discrete - Geom alphas determined (scaled) by a categorical variable
scale_alpha_manual - Geom alphas determined (scaled) manually
# alpha determined by continuous covariate
set.seed(123)
CO2$cv <- runif(nrow(CO2), 10, 50)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = cv)) +
    scale_alpha_continuous(name = "Temperature", range = c(0.3,
        1))
# Discrete alphas
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Temp)) +
    scale_alpha_discrete(name = "Temperature")
# Manual alphas
set.seed(123)
CO2$cv <- runif(nrow(CO2), 0, 100)
CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100),
    labels = c("Low", "Medium", "High"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Temp)) +
    scale_alpha_manual(name = "Temperature", values = c(0.3,
        0.6, 0.95))
scale_alpha_identity - Geom alphas taken directly from the values of a vector (no scaling)
# Identity alphas
set.seed(123)
CO2$Alpha <- runif(nrow(CO2), 0, 1)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Alpha)) +
    scale_alpha_identity(name = "Temperature")

Facets (panels)

Faceting splits the data up into a matrix of panels on the basis of one or more categorical vectors. Since facets display subsets of the data, they are very useful for examining trends in hierarchical designs.

There are two faceting functions, reflecting two alternative approaches:

  • facet_wrap(~cell) - creates a set of panels based on a factor and wraps the panels into a 2-d matrix. cell represents a categorical vector or set of categorical vectors
  • facet_grid(row~column) - creates a set of panels laid out as a 2-d grid. row and column represent the categorical vectors used to define the rows and columns of the grid respectively

The following lists describe the arguments of the facet_wrap and facet_grid functions. The entries in bold are compulsory.

facet_wrap
  • facets - formula specifying faceting variables to use in faceting
  • nrow - number of rows
  • ncol - number of columns
  • scales - should all scales be fixed or free
    • "fixed" - (default) all scales the same
    • "free" - all scales free
    • "free_x" - all x-axis scales free
    • "free_y" - all y-axis scales free
  • as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right
  • drop - drop factor combinations that lack data

facet_grid
  • facets - formula specifying faceting variables to use in faceting
  • margins - whether to include marginal trends
  • scales - should all scales be fixed or free
    • "fixed" - (default) all scales the same
    • "free" - all scales free
    • "free_x" - all x-axis scales free
    • "free_y" - all y-axis scales free
  • space - should all panels take up the same space
    • "fixed" - (default) all panels take up the same space
    • "free" - panel heights and widths vary
    • "free_x" - panel widths vary
    • "free_y" - panel heights vary
  • labeller - a function used to label the panel strips
  • as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right
  • drop - drop factor combinations that lack data
Facet    Notes, additional parameters    Example
_wrap Matrix of panels split by a single categorical vector

ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_wrap(~Plant)
_wrap Matrix of panels split by a single categorical vector with different y-axis scale range for each panel
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_wrap(~Plant, scales = "free_y")
_grid Matrix of panels split by two categorical vectors (one defining the rows, the other the columns)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_grid(Type ~ Treatment)
_grid Matrix of panels split by two categorical vectors with a different y-axis scale range for each row of panels
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_grid(Type ~ Treatment, scales = "free_y")
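One way to see the difference between the two faceting functions is to count the panels each produces for the CO2 data. The sketch below assumes a ggplot2 version that provides layer_data(), and relies on the PANEL column that it returns for each layer.

```r
library(ggplot2)

base <- ggplot(CO2, aes(y = uptake, x = conc)) + geom_point()

# One panel per level of Plant, wrapped into a matrix
wrapped <- base + facet_wrap(~Plant)

# One row of panels per Type and one column per Treatment
gridded <- base + facet_grid(Type ~ Treatment)

# Count the distinct panels that the points are assigned to
n_wrapped <- length(unique(layer_data(wrapped)$PANEL))
n_gridded <- length(unique(layer_data(gridded)$PANEL))
```

With facet_wrap the number of panels equals the number of levels of the factor, whereas with facet_grid it is the product of the numbers of row and column levels.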

Themes

Themes govern the overall style of the graphic. In particular, they control:

  • the look and positioning of the axes (and their ticks, titles and labels)
  • the look and positioning of the legends (size, alignment, font, direction)
  • the look of plots (spacing and titles)
  • the look of panels (background, grid lines)
  • the look of panel strips (background, alignment, font)

Theme    Notes, additional parameters    Example
_bw Black and white theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + theme_bw()
_classic Classic theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + theme_classic()
_grey Grey theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + theme_grey()
_minimal Minimal theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + theme_minimal()

Along with these pre-fabricated themes, it is possible to create your own theme. This is done via the theme() function. Each themable element is either a line, a rectangle or text. Therefore, they can all be modified via one of the following functions:

  • element_blank() - remove the element
  • element_line() - set the properties of a line
  • element_rect() - set the properties of a rectangle
  • element_text() - set the properties of text

library(grid)  # provides unit(), used for plot.margin below
ggplot(CO2, aes(y=uptake, x=conc)) + geom_smooth(aes(colour=Type)) + geom_point() +
theme(panel.grid.major = element_blank(), # no major grid lines
  panel.grid.minor = element_blank(), # no minor grid lines
  panel.background = element_blank(), # no background
  panel.border = element_blank(), # no plot border
  axis.title.y=element_text(size=15, vjust=0,angle=90), # y-axis title
  axis.text.y=element_text(size=12), # y-axis labels
  axis.title.x=element_text(size=15, vjust=-2), # x-axis title
  axis.text.x=element_text(size=12), # x-axis labels
  axis.line = element_line(),
  legend.position=c(1,0),
  legend.justification=c(1,0),
  plot.margin=unit(c(0.5,0.5,2,2),"lines")) # plot margins
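Since much the same block of theme settings recurs throughout the examples in this tutorial, one option is to store the settings once and add them to each plot. The sketch below is only a suggestion; theme_plain_sketch is a hypothetical name and it carries just a subset of the settings shown above.

```r
library(ggplot2)
library(grid)  # unit()

# Hypothetical reusable theme holding a subset of the settings above
theme_plain_sketch <- function() {
  theme(
    panel.grid.major = element_blank(),  # no major grid lines
    panel.grid.minor = element_blank(),  # no minor grid lines
    panel.background = element_blank(),  # no background
    panel.border = element_blank(),      # no plot border
    axis.line = element_line(),          # draw axis lines
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    plot.margin = unit(c(0.5, 0.5, 2, 2), "lines")
  )
}

p <- ggplot(CO2, aes(y = uptake, x = conc)) +
  geom_point() +
  theme_plain_sketch()
```

Individual plots can still override any element by adding a further theme() call after the shared one.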

Examples

Exploring distributions

Boxplots - geom_boxplot & stat_boxplot

Univariate boxplots
Basic boxplot and plain boxplot
# Univariate boxplot
ggplot(BOD) + geom_boxplot(aes(y = demand, x = "Demand"))
# Plain univariate boxplot
p <- ggplot(BOD) +
 geom_boxplot(aes(y=demand,x=1)) +
 scale_y_continuous("Biochemical oxygen demand (mg/l)") +
 scale_x_continuous(limits=c(0,2),breaks=NULL)

p + theme(panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  panel.border = element_blank(),
  axis.title.y=element_text(size=15, vjust=0,angle=90),
  axis.text.y=element_text(size=12),
  axis.title.x=element_blank(),
  axis.text.x=element_blank(),
  axis.line = element_line(),
  plot.margin=unit(c(0.5,0.5,0.5,2),"lines")
 )

Conditional (factorial) boxplots

Basic factorial boxplot and plain factorial boxplot
# Conditional boxplot
ggplot(warpbreaks) + geom_boxplot(aes(y = breaks, x = wool))
# Plain conditional boxplot
p <- ggplot(warpbreaks) + geom_boxplot(aes(y = breaks,
    x = wool)) + scale_y_continuous("Number of wool breaks") +
    scale_x_discrete("Type of wool")

p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 12), axis.line = element_line(),
    plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
Basic factorial boxplot and plain factorial boxplot (filled by tension)
ggplot(warpbreaks) + geom_boxplot(aes(y = breaks, x = wool,
    fill = tension))
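p <- ggplot(warpbreaks) +
 geom_boxplot(aes(y=breaks,x=wool, fill=tension)) +
 scale_y_continuous("Number of wool breaks") +
 scale_x_discrete("Type of wool")+
 # the function name was lost from this line; scale_fill_grey is assumed from its start/end arguments
 scale_fill_grey(labels=c("Low","Medium","High"),start=0.5,end=1)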
p + theme(panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(), 
  panel.border = element_blank(),
  axis.title.y=element_text(size=15, vjust=0,angle=90),
  axis.text.y=element_text(size=12),
  axis.title.x=element_text(size=15, vjust=-1),
  axis.text.x=element_text(size=12),
  axis.line = element_line(),
  legend.position=c(1,1),legend.justification=c(1,1),
  plot.margin=unit(c(0.5,0.5,2,2),"lines")
 )

Violin Plot - geom_violin

Violin plot and plain violin plot
ggplot(warpbreaks, aes(y = breaks, x = wool)) + geom_violin()
library(grid)
library(scales)
p<-ggplot(warpbreaks, aes(y=breaks, x=wool))+
  geom_violin()+
  scale_x_discrete("Wool type")+
  scale_y_continuous("Number of breaks", expand=c(0.05,0), labels=comma)
p +       theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,0.2),legend.justification=c(1,0),
        plot.margin=unit(c(0.5,0.5,2,2),"lines"),
    legend.key=element_blank()
  )

Histograms - geom_histogram, geom_bar & stat_bin

Univariate histograms

Basic histogram and plain histogram
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers))
# OR
ggplot(data = data.frame(rivers)) + geom_histogram(aes(x = rivers))
p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+
 scale_x_continuous("Length of rivers (miles)")+
 scale_y_continuous("Frequency", expand=c(0,0))+
 coord_cartesian(xlim=c(0,4000))
p + theme(panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  panel.border = element_blank(),
  axis.title.y=element_text(size=15, vjust=0,angle=90),
  axis.text.y=element_text(size=12),
  axis.title.x=element_text(size=15,vjust=-1),
  axis.text.x=element_text(size=12),
  axis.line = element_line(),
  plot.margin=unit(c(0.5,0.5,2,2),"lines")
 )
Number of bins and plain bin width
# Histogram with customized bin widths
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers),
    binwidth = 50)
# OR
ggplot(data = data.frame(rivers)) + geom_histogram(aes(x = rivers), binwidth = 50)
# Plain histogram with custom bin widths
# use expand = c(0, 0) so that the axes start exactly at the data limits
p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),binwidth=50,colour='black',fill='gray')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+
  scale_y_continuous("Frequency", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )
Scaled x-values and plain transformed x-values
# Histogram on log transformed data
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) +
    scale_x_continuous(trans = "log10")
# OR
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) +
    scale_x_log10()
# Plain histogram of log transformed data
p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0),trans="log10")+
  scale_y_continuous("Frequency", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )
Scaled x-axis | Plain transformed x-coordinates
# Histogram of linear data on log transformed axis
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) +
    coord_trans(xtrans = "log1p")
# Plain histogram of linear data on log transformed axis
p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+
  coord_trans(xtrans="log1p")+
  scale_y_continuous("Frequency", expand=c(0,0))
p +  theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )
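The two preceding sets of graphics differ in where the transformation is applied. A scale transformation (scale_x_log10() or trans = "log10") transforms the data before they are binned, whereas a coordinate transformation (coord_trans()) is applied after binning and only changes how the bins are drawn. The sketch below illustrates the idea with base R's hist() rather than with ggplot2 itself, so the bin edges are not those that ggplot2 would choose; the object names are purely illustrative.

```r
# Bin the log10-transformed data: bins have equal width on the log scale
# (analogous to transforming the scale before binning).
log_bins <- hist(log10(rivers), breaks = 10, plot = FALSE)

# Bin the raw data: bins have equal width in miles; a coordinate
# transformation would only stretch or squash these bins when drawing.
raw_bins <- hist(rivers, breaks = 10, plot = FALSE)

# Bin edges expressed in miles for both approaches
round(10^log_bins$breaks)
raw_bins$breaks
```

The back-transformed edges of the first set of bins are unevenly spaced in miles, while the second set is evenly spaced.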
Gradient fill | Plain scaled grey gradient fill
ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers,
    fill = ..count..))
# Plain histogram with gradient fill
# the second geom_bar layer draws black outlines without adding a second legend
p <- ggplot(data=data.frame(rivers)) +
  geom_bar(aes(x=rivers,fill=..count..))+
  geom_bar(aes(x=rivers, fill=..count..),colour="black",show_guide=FALSE)+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+
  scale_y_continuous("Frequency", expand=c(0,0))+
  scale_fill_gradient(low="grey90", high="grey40")
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )

Conditional (factorial) histograms

Basic histogram | Plain histogram
ggplot(data = iris) + geom_bar(aes(x = Sepal.Length,
    fill = Species), position = "identity")
# OR
ggplot(data = iris) + geom_histogram(aes(x = Sepal.Length,
    fill = Species), position = "identity")
#Conditional histogram
p <- ggplot(data=iris) +
  geom_bar(aes(x=Sepal.Length, fill=Species), position="identity")+
  scale_x_continuous("Sepal length (cm)", expand=c(0,0))+
  scale_y_continuous("Frequency", expand=c(0,0))+
  scale_fill_grey()
p + theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )
Dodged histogram | Plain transparent histogram
ggplot(data = iris) + geom_bar(aes(x = Sepal.Length,
    fill = Species), position = "dodge")
# OR
ggplot(data = iris) + geom_histogram(aes(x = Sepal.Length,
    fill = Species), position = "dodge")
# Transparent conditional histogram
# semi-transparent bars with a stepped outline for each species
p <- ggplot(data=iris)+
  geom_bar(aes(x=Sepal.Length,fill=Species), alpha=0.5, stat="bin", position="identity")+
  geom_step(aes(x=Sepal.Length, colour=Species),stat="bin", position=position_identity())+
  scale_x_continuous("Sepal length (cm)", expand=c(0,0))+
  scale_y_continuous("Frequency", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )

Density plots - geom_density & stat_density

Univariate density plots
Basic density plot | Plain density plot
ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers))
p <- ggplot(data=data.frame(rivers)) +
  geom_density(aes(x=rivers),colour='black',fill='grey90')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+
  scale_y_continuous(expression(paste("Density (",phantom() %*% 10^-4,")")), expand=c(0,0),
         labels=function(x){format(x*10000,nsmall=1,scientific=FALSE)})
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=12),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )
Basic smoother density plot | Plain smoother density plot
ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers),
    adjust = 5)
myF <- function(x) {
    format(x * 10000, nsmall = 1, scientific = FALSE)
}
p <- ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers),
    adjust = 5, colour = "black", fill = "grey90") +
    scale_x_continuous("Length of rivers (miles)",
        expand = c(0, 0)) + scale_y_continuous("Density (/10000)",
    expand = c(0, 0), labels = myF)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 12), axis.line = element_line(),
    plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
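The adjust argument multiplies the default smoothing bandwidth, so adjust = 5 yields a much smoother curve. As far as I understand, geom_density() hands this argument on to stats::density(), so the effect can be checked directly in base R. The object names below are illustrative only.

```r
# Kernel density estimates of river lengths with the default bandwidth
# and with the bandwidth multiplied by 5
d1 <- density(rivers)
d5 <- density(rivers, adjust = 5)

# Compare the bandwidths that were actually used
c(default = d1$bw, adjusted = d5$bw)
```

Values of adjust below 1 have the opposite effect and produce a more wiggly curve.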
Density plot on log-scaled x-axis | Plain density plot on log-scaled x-axis
ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers)) +
    scale_x_continuous(trans = "log10")
p <- ggplot(data=data.frame(rivers)) +
  geom_density(aes(x=rivers),colour='black',fill='grey90')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0),
    trans="log10", breaks=c(250,500,1000,2000,3000),labels=c(250,500,1000,2000,3000))+
  scale_y_continuous("Density", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )

Factorial density

Basic factorial density plot | Plain factorial density plot
ggplot(data = iris) + geom_density(aes(x = Sepal.Length,
    colour = Species))
# Plain conditional density plot
p <- ggplot(data=iris) + geom_density(aes(x=Sepal.Length, colour=Species))+
  scale_x_continuous("Sepal length (cm)", expand=c(0,0))+
  scale_y_continuous("Density", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
Basic factorial density plot | Plain factorial density plot
# Conditional density plot
ggplot(data = iris) + geom_density(aes(x = Sepal.Length,
    fill = Species))
# Plain conditional density plot
p <- ggplot(data=iris) +
  geom_density(aes(x=Sepal.Length,fill=Species), alpha=0.4, colour=NA)+
  geom_density(aes(x=Sepal.Length,fill=Species, colour=Species), alpha=0.0, show_guide=FALSE)+
  scale_x_continuous("Sepal length (cm)", expand=c(0,0))+
  scale_y_continuous("Density", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
    axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Line graphs - geom_line

Basic line graph | Plain line graph
ggplot(BOD) + geom_line(aes(y = demand, x = Time))
# Plain line plot
p <- ggplot(data=BOD) +
  geom_line(aes(y=demand,x=Time),size=2)+
  scale_x_continuous("Time (days)", expand=c(0.05,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0.05,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
Basic line graph with points | Plain line graph with points
ggplot(BOD, aes(y = demand, x = Time)) + geom_line() +
    geom_point()
# Plain line plot with points
p <- ggplot(data=BOD) +
  geom_line(aes(y=demand,x=Time),size=2)+
  geom_point(aes(y=demand,x=Time),size=3)+
  scale_x_continuous("Time (days)", expand=c(0.05,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0.05,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Scatterplots - geom_point, geom_line, geom_smooth, stat_smooth & stat_summary

Simple scatterplots

Basic scatterplot | Plain scatterplot
ggplot(BOD) + geom_point(aes(y = demand, x = Time))
# Plain scatterplot
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand,x=Time),size=3)+
  scale_x_continuous("Time (days)", expand=c(0,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Trends and smoothers

Linear trend | Plain linear trend with 95% confidence bands
ggplot(BOD) + geom_point(aes(y = demand, x = Time)) +
    geom_smooth(aes(y = demand, x = Time), method = "lm")
# fit linear model (in order to get confidence bands)
BOD.lm <- lm(demand~Time, data=BOD)
xs <- seq(min(BOD$Time), max(BOD$Time), length.out=1000)
BOD.predict <- predict(BOD.lm,
  newdata=data.frame(Time=xs),interval='confidence', se.fit=TRUE)
# combine the fitted values and bounds (fit, lwr, upr) with the standard errors
BOD.predict <- data.frame(BOD.predict$fit, se=BOD.predict$se.fit,Time=xs)
# Create a plain scatterplot with smoother and confidence bands
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand, x=Time),colour='grey',size=2)+
  geom_line(aes(y=demand,x=Time),stat="smooth", method="lm")+
  scale_x_continuous("Time (days)", limits=c(1,7))+
  scale_y_continuous("Demand (mg/l)")
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the confidence bands
p+geom_line(data=BOD.predict, aes(y=upr,x=Time), linetype=2)+
  geom_line(data=BOD.predict, aes(y=lwr,x=Time), linetype=2)
Loess smoother | Plain loess smoother with standard error
ggplot(BOD) + geom_point(aes(y = demand, x = Time)) +
    geom_smooth(aes(y = demand, x = Time), method = "loess",
        degree = 1, se = TRUE)
# Fit a loess smoother
BOD.loess <- loess(demand~Time, data=BOD, degree=1)
xs <- seq(min(BOD$Time), max(BOD$Time), length.out=1000)
BOD.predict <- predict(BOD.loess,newdata=data.frame(Time=xs), se=TRUE)
# the bands are the fit plus and minus one standard error
BOD.predict <- with(BOD.predict,data.frame(fit,lwr=fit-se.fit,upr=fit+se.fit,Time=xs))
# Plain scatterplot with loess smoother and standard error bands
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand, x=Time),colour='grey',size=2)+
  geom_line(data=BOD.predict,aes(y=fit,x=Time))+
  scale_x_continuous("Time (days)", limits=c(1,7))+
  scale_y_continuous("Demand (mg/l)")
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    axis.title.y=element_text(size=15, vjust=0,angle=90),
    axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the standard error bands
p+geom_line(data=BOD.predict, aes(y=upr,x=Time), linetype=2)+
  geom_line(data=BOD.predict, aes(y=lwr,x=Time), linetype=2)
Generalized additive model (GAM) | Plain GAM with 95% CI based on 1.96 (dashed; ggplot default) and qt(0.975,df)
library(mgcv)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    stat_smooth(geom = "smooth", method = "gam", formula = y ~
        s(x, k = 6))
# manually fit a GAM
library(mgcv)
CO2.gam <- gam(uptake~s(conc,k=6), data=CO2)
xs <- seq(min(CO2$conc), max(CO2$conc), length.out=1000)
CO2.predict <- predict(CO2.gam,newdata=data.frame(conc=xs),se.fit=TRUE)
# effective degrees of freedom of the smooth term (the intercept is dropped)
df <- sum(CO2.gam$edf[-1])
# 95% CI based on a multiplier of 1.96 standard errors
CO2.predict1 <- with(CO2.predict,data.frame(fit,lwr=fit-(1.96*se.fit),
                                            upr=fit+(1.96*se.fit),conc=xs))
# 95% CI based on a t multiplier of qt(0.975,df) standard errors
CO2.predict2 <- with(CO2.predict,data.frame(fit,lwr=fit-(qt(0.975,df)*se.fit),
                                            upr=fit+(qt(0.975,df)*se.fit),conc=xs))
p <- ggplot(data=CO2) +
  geom_point(aes(y=uptake, x=conc),colour='grey',size=2)+
  #stat_smooth(aes(y=uptake, x=conc),geom="smooth",method="gam",formula=y~s(x,k=6))+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mL/L)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the trendline and confidence bands (dashed: 1.96 SE; dotted: qt(0.975,df) SE)
p+geom_line(data=CO2.predict1, aes(y=fit,x=conc))+
  geom_line(data=CO2.predict1, aes(y=upr,x=conc), linetype=2)+
  geom_line(data=CO2.predict1, aes(y=lwr,x=conc), linetype=2)+
  geom_line(data=CO2.predict2, aes(y=upr,x=conc), linetype=3)+
  geom_line(data=CO2.predict2, aes(y=lwr,x=conc), linetype=3)
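The two sets of bands differ only in the multiplier applied to the standard errors: 1.96 is (to two decimal places) the normal quantile qnorm(0.975), whereas qt(0.975,df) is larger and approaches it as the degrees of freedom increase. The short sketch below tabulates the two multipliers for a handful of arbitrarily chosen degrees of freedom (the values in dfs are illustrative, not taken from the fitted model).

```r
# Arbitrary example degrees of freedom
dfs <- c(3, 5, 10, 30, 100)

# t-based and normal-based multipliers for a two-sided 95% interval
multipliers <- data.frame(df = dfs,
                          t = qt(0.975, dfs),
                          normal = qnorm(0.975))
multipliers
```

With few degrees of freedom the t-based bands are therefore noticeably wider than the 1.96-based bands.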
Means plot | Plain means plot with 95% confidence intervals
p <- ggplot(CO2, aes(y = uptake, x = conc)) + geom_pointrange(stat = "summary",
    fun.data = "mean_cl_boot")
p
p <- ggplot(data=CO2, aes(y=uptake, x=conc)) +
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mL/L)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the errorbars
p+geom_pointrange(stat="summary", fun.data="mean_cl_normal")
Means plot with error bars | Plain means plot with error bars and GAM smoother
ggplot(CO2, aes(y = uptake, x = conc)) + geom_errorbar(stat = "summary",
    fun.data = "mean_cl_boot") + geom_point(stat = "summary",
    fun.y = "mean")
p <- ggplot(data=CO2, aes(y=uptake, x=conc)) +
  stat_smooth(geom="smooth",method="gam",formula=y~s(x,k=6),se=FALSE, colour="gray")+
  geom_errorbar(stat="summary", fun.data="mean_cl_boot") +geom_point(stat="summary",fun.y="mean")+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mL/L)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p +
  theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Bargraphs (dynamite plots) - geom_bar & stat_summary

Simple bargraphs

Bargraph | Plain bargraph
ggplot(warpbreaks, aes(y = breaks, x = tension)) +
    geom_bar(stat = "summary", fun.y = mean) + geom_errorbar(stat = "summary",
    fun.data = "mean_cl_normal", width = 0.1)
p <- ggplot(data=warpbreaks, aes(y=breaks, x=tension)) +
  geom_bar(stat="summary", fun.y=mean,color="black",fill="grey80")+
  geom_errorbar(stat="summary", fun.data="mean_cl_normal", width=0.1)+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of breaks", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
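The error bars above come from the "mean_cl_normal" summary, which relies on the Hmisc package and, as far as I understand, returns the mean together with t-based confidence limits. The sketch below reproduces that calculation in base R for each tension level, which may help when Hmisc is not installed or when the numbers need to be reported. The function name ci_normal() is hypothetical.

```r
# Hypothetical helper: mean and t-based confidence limits for one group
ci_normal <- function(x, conf = 0.95) {
  n <- length(x)
  half_width <- qt((1 + conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(y = mean(x), ymin = mean(x) - half_width, ymax = mean(x) + half_width)
}

# One row per tension level (L, M, H)
breaks_ci <- t(sapply(split(warpbreaks$breaks, warpbreaks$tension), ci_normal))
breaks_ci
```

The column names (y, ymin, ymax) follow the names that ggplot2 summary functions are expected to return, so a function of this form could in principle be supplied via fun.data.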

Conditional (factorial) bargraphs

Conditional bargraph | Plain conditional bargraph
ggplot(warpbreaks, aes(y = breaks, x = tension, group = wool)) +
    geom_bar(aes(fill = wool), position = position_dodge(0.9),
        stat = "summary", fun.y = mean) + geom_errorbar(position = position_dodge(0.9),
    stat = "summary", fun.data = "mean_cl_normal",
    width = 0.1)
p <- ggplot(data = warpbreaks, aes(y = breaks, x = tension,
    group = wool)) + geom_bar(aes(fill = wool), position = "dodge",
    stat = "summary", fun.y = mean) + geom_bar(aes(fill = wool),
    position = "dodge", stat = "summary", fun.y = mean,
    color = "black", show_guide = FALSE) + scale_fill_grey("Wool type") +
    geom_errorbar(position = position_dodge(0.9), stat = "summary",
        fun.data = "mean_cl_normal", width = 0.1) +
    scale_x_discrete("Tension") + scale_y_continuous("Number of wool breaks",
    expand = c(0, 0))
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 1), legend.justification = c(1,
        1), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))

Bar charts - geom_bar

Stacked barchart | Plain stacked barchart
# based on pre-calculated counts
library(plyr)  # provides ddply()
warpbreaks.c <- ddply(warpbreaks, ~wool + tension,
    function(x) data.frame(count = sum(x$breaks)))
ggplot(warpbreaks.c, aes(x = tension, y = count, fill = wool)) +
    geom_bar(stat = "identity") + ylab("Number of breaks")
#based on pre-calculated counts
warpbreaks.c<-ddply(warpbreaks,~wool+tension, function(x) data.frame(count=sum(x$breaks)))
p <- ggplot(warpbreaks.c, aes(x=tension,y=count,fill=wool))+
  geom_bar(aes(fill=wool), stat='identity')+
  geom_bar(aes(fill=wool), stat='identity', colour="black",show_guide=FALSE)+
  scale_fill_grey("Wool type")+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of wool breaks", expand=c(0,0))
p +
  theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    axis.title.y=element_text(size=15, vjust=0,angle=90),
    axis.text.y=element_text(size=12),
    axis.title.x=element_text(size=15,vjust=-1),
    axis.text.x=element_text(size=10),
    axis.line = element_line(),
    legend.position=c(1,1),legend.justification=c(1,1),
    plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
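The pre-calculated counts do not have to come from ddply(). If the plyr package is not available, the same table of totals can be sketched with base R's aggregate(); the object name warpbreaks.agg is used here only to avoid overwriting warpbreaks.c.

```r
# Total number of breaks for each wool by tension combination
warpbreaks.agg <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = sum)

# Rename the summed column to match the 'count' column used in the plots
names(warpbreaks.agg)[names(warpbreaks.agg) == "breaks"] <- "count"
warpbreaks.agg
```

The resulting data frame has one row per wool and tension combination and can be passed to ggplot() in place of warpbreaks.c.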
Horizontal stacked barchart | Plain horizontal stacked barchart
# based on pre-calculated counts
warpbreaks.c <- ddply(warpbreaks, ~wool + tension,
    function(x) data.frame(count = sum(x$breaks)))
ggplot(warpbreaks.c, aes(x = tension, y = count, fill = wool)) +
    geom_bar(stat = "identity") + ylab("Number of breaks") +
    coord_flip()
#based on pre-calculated counts
warpbreaks.c<-ddply(warpbreaks,~wool+tension, function(x) data.frame(count=sum(x$breaks)))
p <- ggplot(warpbreaks.c, aes(x=tension,y=count,fill=wool))+ coord_flip()+
  geom_bar(aes(fill=wool), stat='identity')+
  geom_bar(aes(fill=wool), stat='identity', colour="black",show_guide=FALSE)+
  scale_fill_grey("Wool type")+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of wool breaks", expand=c(0.05,0))
p +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Interaction plots - geom_point, geom_line, geom_smooth

Interaction plot | Plain interaction plot
ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_point() + geom_smooth(method = "lm")
library(scales)  # provides the comma label formatter used below
p <- ggplot(ToothGrowth, aes(y = len, x = dose, linetype = supp)) +
    geom_point(aes(shape = supp)) +
    geom_smooth(method = "lm") +
    scale_linetype_manual(name = "Supplement type", values = c(1, 2),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_shape_manual(name = "Supplement type", values = c(21, 16),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_x_continuous("Dose (mg)", labels = comma) +
    scale_y_continuous("Tooth length (mm)", expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
Interaction plot with errorbars | Plain interaction plot with errorbars
ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot") +
    geom_point(stat = "summary", fun.y = "mean") +
    geom_line(stat = "summary", fun.y = "mean")
p <- ggplot(ToothGrowth, aes(y = len, x = dose, group = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot", width = 0.05) +
    geom_line(aes(linetype = supp), stat = "summary", fun.y = "mean") +
    geom_point(aes(shape = supp, fill = supp), stat = "summary", fun.y = "mean") +
    scale_shape_manual(name = "Supplement type", values = c(21, 16),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_fill_manual(name = "Supplement type", values = c("white", "black"),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_linetype_manual(name = "Supplement type", values = c(1, 2),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_x_continuous("Dose (mg)", labels = comma) +
    scale_y_continuous("Tooth length (mm)", expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"),
    legend.key = element_blank())
Interaction plot with dodge | Plain interaction plot with dodge
ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        position = position_dodge(0.2)) +
    geom_point(stat = "summary", fun.y = "mean", position = position_dodge(0.2)) +
    geom_line(stat = "summary", fun.y = "mean", position = position_dodge(0.2))
p <- ggplot(ToothGrowth, aes(y = len, x = dose, group = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        width = 0.05, position = position_dodge(0.2)) +
    geom_line(aes(linetype = supp), stat = "summary",
        fun.y = "mean", position = position_dodge(0.2)) +
    geom_point(aes(shape = supp, fill = supp), size = 3,
        stat = "summary", fun.y = "mean", position = position_dodge(0.2)) +
    scale_shape_manual(name = "Supplement type", values = c(21, 16),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_fill_manual(name = "Supplement type", values = c("white", "black"),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_linetype_manual(name = "Supplement type", values = c(1, 2),
        breaks = c("OJ", "VC"), labels = c("Orange juice", "Vitamin C")) +
    scale_x_continuous("Dose (mg)", labels = comma) +
    scale_y_continuous("Tooth length (mm)", expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"),
    legend.key = element_blank())

Scatterplot matrix - plotmatrix

Scatterplot matrix
p <- plotmatrix(iris) + geom_point(aes(colour = Species))
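To my knowledge, plotmatrix() was deprecated and is no longer available in more recent ggplot2 releases. One commonly used alternative is ggpairs() from the separate GGally package. The sketch below assumes that GGally is installed (it is not used elsewhere in this tutorial) and simply skips the plot otherwise; the object name pm is illustrative.

```r
library(ggplot2)

# GGally is an assumption here: a separate package that must be installed
if (requireNamespace("GGally", quietly = TRUE)) {
  # Scatterplot matrix of the four numeric iris columns, coloured by species
  pm <- GGally::ggpairs(iris, mapping = aes(colour = Species), columns = 1:4)
} else {
  message("GGally is not installed; skipping the scatterplot matrix sketch")
}
```

Printing pm draws the matrix of pairwise plots.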

Multi-panel (facetted) plot - facet_grid & facet_wrap

2D grid of panels - facet_grid

Grid of panels | Plain grid of panels
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_grid(Type ~ Treatment)
library(grid)
library(scales)
# Create a new instance of the dataset to facilitate more informative panel titles
CO2.a <- CO2
# Re-define the factor labels
CO2.a$Treatment <- factor(CO2.a$Treatment, labels=c(expression(paste("Non-Chilled ", (symbol("\076")*15 * degree * C), sep="")),
expression(paste("Chilled ",(symbol("\074")*5 * degree * C), sep=""))))
CO2.a$Type <- factor(CO2.a$Type, labels=c("Origin:Quebec", "Origin:Mississippi"))
p<-ggplot(CO2.a,aes(y=uptake,x=conc))+
  geom_ribbon(aes(ymin=..ymin.., ymax=..ymax..),linetype=2,fill="transparent",colour="black",stat='smooth',method='loess')+
  geom_smooth(se=FALSE)+
  geom_point()+
  facet_grid(Type~Treatment, labeller=label_parsed)+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mL/L)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_rect(fill="transparent",colour="black"),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        #axis.line = element_line(),
        strip.background=element_rect(fill="transparent", colour="black"),
        #legend.position=c(1,0.2),legend.justification=c(1,0),
        plot.margin=unit(c(0.5,0.5,2,2),"lines"),
        legend.key=element_blank()
  )
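
The section heading also names facet_wrap, which is not demonstrated above. As a minimal sketch (not one of the original examples), facet_wrap takes a single conditioning variable and wraps its panels into rows and columns, rather than crossing two variables as facet_grid does. The choice of Plant as the conditioning variable, ncol = 4 and the name p_wrap are assumptions made purely for illustration.

```r
library(ggplot2)

# One panel per plant (the CO2 data contain 12 plants),
# wrapped into 4 columns (an illustrative choice)
p_wrap <- ggplot(CO2, aes(y = uptake, x = conc)) +
    geom_line() +
    geom_point() +
    facet_wrap(~Plant, ncol = 4)
print(p_wrap)
```

Changing ncol (or supplying nrow instead) alters how the same panels are wrapped without changing the data shown in each.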

Multiple graphs per graphic

Viewports

Figures: Grid of panels; Plain grid of panels (code for each follows in that order)
p1 <- ggplot(CO2, aes(y = uptake, x = Treatment, fill = Type)) + geom_boxplot()
p2 <- ggplot(CO2, aes(y = uptake, x = conc, colour = Type)) + geom_smooth()
p3 <- ggplot(CO2, aes(x = uptake, fill = Type)) + geom_density(alpha = 0.4) + facet_grid(~Treatment)

grid.newpage()
pushViewport(viewport(layout = grid.layout(4, 5)))

pushViewport(viewport(layout.pos.col = 1:2, layout.pos.row = 1:2))
print(p1 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col = 3:4, layout.pos.row = 1:2))
print(p2 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col = 1:4, layout.pos.row = 3:4))
print(p3 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

library(gridExtra)
tmp <- ggplot_gtable(ggplot_build(p3))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
# legend <- gTree(legend, vp=viewport(layout.pos.col=5, layout.pos.row=1:4))
# Using grid.arrange for convenience (could also manually push viewports):
# grid.arrange(arrangeGrob(p3 + theme(legend.position='none'), p2 + theme(legend.position='none'),
#     main = 'this is a title', left = 'This is my global Y-axis title'), legend,
#     widths = unit.c(unit(1, 'npc') - legend$width, legend$width), nrow = 1)

pushViewport(viewport(layout.pos.col = 5, layout.pos.row = 1:4))
# print(p3+theme(keep='legend_box'), newpage=FALSE)
grid.draw(legend)
popViewport(0)
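
The push/print/pop pattern used above can be shortened. As a sketch, print() for ggplot objects also accepts a vp argument, so each plot can be drawn directly into a range of cells in the layout. The helper name layout_cell is hypothetical (it is not part of grid or ggplot2), and the 2 x 2 layout is an arbitrary choice for illustration.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: a viewport occupying the given rows and columns
# of the layout defined by the parent viewport
layout_cell <- function(rows, cols) {
    viewport(layout.pos.row = rows, layout.pos.col = cols)
}

p1 <- ggplot(CO2, aes(y = uptake, x = Treatment, fill = Type)) + geom_boxplot()
p2 <- ggplot(CO2, aes(y = uptake, x = conc, colour = Type)) + geom_smooth()

grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 2)))
# When vp is supplied, print() draws into that viewport and does not start a new page
print(p1 + theme(legend.position = "none"), vp = layout_cell(1, 1))
print(p2, vp = layout_cell(1:2, 2))
popViewport(0)
```

The explicit pushViewport()/popViewport() form is still needed when something other than a ggplot (such as an extracted legend grob) is drawn into a cell.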
# Create a new plain theme (despite its name, element_plain() returns a theme, not a theme element)
element_plain <- function(base_size = 12, base_family = "") {
  structure(list(
    axis.line = element_line(),
    #axis.text.x = element_text(family = base_family, size = base_size * 0.8, vjust = 1, lineheight = 0.9), 
    #axis.text.y = element_text(family = base_family, size = base_size * 0.8, hjust = 1, lineheight = 0.9),
    #axis.ticks = element_line(colour = "black", size = 0.2), 
    #axis.title.x = element_text(family = base_family, size = base_size, vjust = 0.5), 
    #axis.title.y = element_text(family = base_family, size = base_size, vjust = 0.5, angle = 90), 
    #axis.ticks.length = unit(0.15, "cm"), 
    #axis.ticks.margin = unit(0.1, "cm"), 
    #legend.background=element_blank(),
    #legend.margin=unit(0.2,"cm"),
    #legend.key = element_rect(colour = "grey80"),
    #legend.key.size=unit(1.2,"lines"),
    #legend.key.height=NULL,
    #legend.key.width=NULL,
    #legend.text=element_text(family = base_family, size = base_size * 0.8),
    #legend.text.align=NULL,
    #legend.title=element_text(family = base_family, face = "bold", size = base_size * 0.8, hjust = 0),
    #legend.title.align=NULL,
    #legend.position = "right",
    #legend.direction=NULL,
    #legend.justification="center",
    #legend.box=NULL,
    #panel.background = element_blank(), 
    #panel.border = element_blank(), 
    #panel.grid.major = element_blank(), 
    #panel.grid.minor = element_blank(), 
    #panel.margin = unit(0.25, "lines"),
    #strip.background=element_rect(fill="transparent", colour="black"),
    #strip.text.x=element_text(family = base_family, size = base_size * 0.8),
    #strip.text.y=element_text(family = base_family, size = base_size * 0.8, angle = -90),
    #plot.background = element_blank(),
    plot.margin=unit(c(0.5,0.5,1,1),"lines"),
    plot.title=element_blank()
  ), class = "theme")
}
#Construct the boxplots
p1 <- ggplot(CO2, aes(y=uptake,x=Treatment,fill=Type))+
  geom_boxplot(alpha=0.4)+scale_fill_manual(values=c("white","grey"))+
  element_plain()

# show.legend replaces the older show_guide argument (ggplot2 >= 2.0.0)
p2 <- ggplot(CO2, aes(y=uptake, x=conc, linetype=Type))+
 geom_smooth(color="black",se=FALSE)+
 geom_smooth(color="black", show.legend=FALSE)+
 scale_linetype()+
 element_plain()

# show.legend replaces the older show_guide argument (ggplot2 >= 2.0.0)
p3<-ggplot(CO2, aes(x=uptake, fill=Type)) +
 geom_density(alpha=0.4, colour=NA)+
 geom_density(alpha=0, show.legend=FALSE)+
 facet_grid(~Treatment)+
 scale_fill_manual(values=c("white","grey"))+
 element_plain()

library(gridExtra)
tmp <- ggplot_gtable(ggplot_build(p3))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]

tmp <- ggplot_gtable(ggplot_build(p2))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend1 <- tmp$grobs[[leg]]
##legend <- gTree(legend, vp=viewport(layout.pos.col=5, layout.pos.row=1:4))

grid.newpage()
pushViewport(viewport(layout=grid.layout(4,5)))

pushViewport(viewport(layout.pos.col=1:2, layout.pos.row=1:2))
print(p1+theme(legend.position="none"), newpage=FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col=3:4, layout.pos.row=1:2))
print(p2+theme(legend.position="none"), newpage=FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col=1:4, layout.pos.row=3:4))
print(p3+theme(legend.position="none"), newpage=FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col=5, layout.pos.row=1:4))
grid.draw(legend)
grid.draw(legend1)
popViewport(0)
- axis ticks etc

Welcome to the end of this tutorial