Tutorial 6 - The Grammar of Graphics in R (ggplot2)

25 Feb 2012

This Tutorial has been thrown together a little hastily and is therefore not very well organized - sorry! Graphical features are demonstrated either via tables of properties or as clickable graphics that reveal the required R code. Click on a graphic to reveal/toggle the source code or to navigate to an expanded section.

This tutorial is intended to be viewed sequentially. It begins with the basic ggplot framework and then progressively builds up more and more features as default elements are gradually replaced to yeild more customized graphics.

Having said that, I am going to start with a sort of showcase of graphics which should act as quick navigation to entire sections devoted to the broad series of graphs related to each of the featured graphics. I have intentionally titled each graph according to the main feature it encapsulates rather than any specific functions that are used to produce the features as often a single graphic requires a combination of features and thus functions. Furthermore, the grammar of graphics specifications are sufficiently unfamiliar to many that the relationships between the types of graphical features a researcher wishes to produce and the specific syntax required to achieve the desired result can be difficult to recognize.

Each graphic is intended to encapsulate a broad series of related graph types.

Basic plot types

Boxplots	Histograms	Density plots	Scatterplots	Line graphs	Smoothers
Trendlines	Bar charts	Stacked bar charts	Bar graphcs	Interaction plots	Scatterplot matrix
Heat maps	Contour maps

Additions

Segments	Confidence bands	Error bars	Horizontal lines	Vertical lines	Range bars
Axes rugs	Text plots

Customizations

Colours

Line types

Plotting symbols

Transparency

Sizes

Date and time

Layouts

facet_wrap

facet_grid

viewpots

The Grammar of Graphics

The Grammar of Graphics was first introduced/presented by Wilkinson and Wills (2006) as a new graphics philosophy that laid down a series of rules to govern the production of quantitative graphics. Essentially the proposed graphics infrastructure considers a graphic as comprising a plot (defined by a coordinate system, scales and panelling) over which one or more data layers are applied.

Each layer is defined as:

the data - a data frame
mapping specifications that establish the visual aesthetics (colour, line type and thickness, shapes etc) of each variable
statistical methods that determine how the data rows should be summarised (stat)
geometric instructions (geom) on how each summary should be represented (bar, line, point etc)
positional mechanism for dealing with overlapping data (position)

The visual aspects of all the graphical features are then governed by themes.

Following a very short example, the next section will largely concentrate on describing each of the above graphical components. Having then established the workings of these components, we can then put them together to yield specific graphics.

Hadley Wickham's interpretation of these principals in an R context is implimented via the ggplot2 package. In addition the following packages are also commonly used alongside ggplot so as to expand on the flexibility etc.

grid
gridExtra
scales

library(ggplot2)
library(grid)
library(gridExtra)
library(scales)

The following very simple graphic will be used to illustrate the above specification by implicitly stating many of the default specifications. It will use a cartesian coordinate system, continuous axes scales, a single facet (panel) and then define a single layer with a dataframe (BOD), with red points, identity (no summarisation) statistic visualised as a point geometric.

plotGgplot,

 p <- ggplot() +
  coord_cartesian() + #cartesian coordinates
  scale_x_continuous() + #continuous x axis
  scale_y_continuous() + #continuous y axis
  #single layer
  layer( data=BOD, #data.frame
    mapping=aes(y=demand,x=Time),
    stat="identity", #use original data
    geom="point" #plot data as points
  )+
  layer( data=BOD, #data.frame
    mapping=aes(y=demand,x=Time),
    stat="identity", #use original data
    geom="line" #plot data as a line
  )
p #print the plot

OR, by leaving out all the default stuff

p <- ggplot(data = BOD, map = aes(y = demand, x = Time)) +
    geom_point() + geom_line()
p

Note, the following important features of the grammar of graphics as implemented in R:

the order in which each of the above components in the first code snippet were added is unimportant. They each add additional information to the overall graphical object. The object itself is evaluated as a whole when it is printed.
multiple layers are laid down in the order that they appear in the statement
in the second code snippet (the shorter version), a layer is created for each of the two geoms
the data and mapping used by both geom_point() and geom_line are inherited from the main ggplot() function.
since layers are ordered, the points are drawn first and the line over the top

In an attempt to illustrate the use of ggplot for elegant graphics, we will drill down into each of the plot and layer specifications. Although the geoms and thus layers are amongst the last features to be constructed by the system, the data and aesthetic features of the data impact on how the coordinate system, scales and panelling work. Therefore, we will explore the geoms first.

Geometric objects - `geom_` and `stat_`

Geometric objects (geoms) are visual representations of observations. For example, there is a geom to represent points based on a set of x,y coordinates. All graphics need at least one geom and each geom is mapped to its own layer. Multiple geoms can be added to a graphic and the order that they are added to the expression determines the order that their respective layer is constructed.

When a ggplot expression is being evaluated, geoms are coupled together with a stat_ function. This function is responsible for generating data appropriate for the geom. For example, the stat_boxplot is responsible for generating the quantiles, whiskers and outliers for the geom_boxplot function.

In addition to certain specific stat_ functions, all geoms can be coupled to a stat_identity function. In mathematical contexts, identity functions map each element to themselves - this essentially means that each element passes straight through the identity function unaltered. Coupling a geom to an identity function is useful when the characteristics of the data that you wish to represent are present in the data frame. For example, your dataframe may contain the x,y coordinates for a series of points and you wish for them to be used unaltered as the x,y coordinates on the graph. Moreover, your dataframe may contain pre-calculated information about the quantiles, whiskers and outliers and you wish these to be used in the construction of a boxplot (rather than have the internals of ggplot perform the calculations on raw data).

Since geom_ and stats_ functions are coupled together, a geometric representation can be expressed from either a geom_ function OR a stats_ function. That is, you either:

specify a geom_ function that itself calls a stat_ function to provide the data for the geom function..
ggplot(CO2) + geom_smooth(aes(x = conc, y = uptake), stat = "smooth")
specify a stat_ function that itself calls a geom_ function to visually represent the data..
ggplot(CO2) + stat_smooth(aes(x = conc, y = uptake), geom = "smooth")

It does not really make any difference which way around you do this. For the remainder of this tutorial, we will directly engage the geom_ function for all examples.

The geom_ functions all have numerous arguments, many of which are common to all geoms_.

data - the data frame containing the data. Typically this is inherited from the ggplot function.
mapping - the aesthetic mapping instructions. Through the aesthetic mapping the aesthetic visual characteristics of the geometric features can be controlled (such as colour, point sizes, shapes etc). The aesthetic mapping can be inherited from the ggplot function. Common aesthetic features (mapped via a aes function) include:
- alpha - transparency
- colour - colour of the geometric features
- fill - fill colour of geometric features
- linetype - type of lines used in geometric features (dotted, dashed, etc)
- size - size of geometric features such as points or text
- shape - shape of geometric features such as points
- weight - weightings of values
stat - the stat_ function coupled to the geom_ function
position - the position adjustment for overlapping objects
- identity - leave objects were they are
- dodge - shift objects to the side to prevent overlapping
- stack - stack objects on top of each other
- fill - stack objects on top of each other and standardize each group to equal height

Currently, there are a large number of available geoms_ and stat_ functions within the ggplot system. This tutorial is still a work in progress and therefore does not include all of them - I have focused on the more commonly used ones.

In an attempt to break up the set of geoms_ and stat_ functions, I have somewhat arbitrarily divided them up into primary and secondary geometric features. Primary geometric features are those that could be viewed as graphics in their own right, whereas secondary geometric features are those that are added to other geometric features to provide additional information (but would rarely be considered a graphic in their own right).

Primary geometric objects

The following icon matrix provides navigation and an overview to the geometric features described in this section.

geom_bar	geom_bar	geom_bar	geom_bar	geom_boxplot	geom_density
geom_point	geom_line	geom_smooth	geom_smooth	geom_tile	geom_contour

`geom_bar` and `stats_bin`

geom_bar constructs barcharts and histograms. By default, the bins of each bar along with the associated bar heights are calculated by the stats_bin function. The following list describes the mapping aesthetic properties associated with geom_bar and stats_bin. The entries in bold are compulsory.

`geom_bar`	`stat_bar`
x - x axis value (categorical) alpha - transparency colour - colour of the lines fill - colour of the bar linetype - type of lines used to construct bar size - symbol size for outliers weight - weightings of values	x - a vector that is to be binned y - optional y axis value (continuous)

The following table illustrates the first six rows of the diamonds dataset (comes with R) that will be used for the following examples.

	carat	cut	color	clarity	depth	table	price	x	y	z
1	0.23	Ideal	E	SI2	61.50	55.00	326	3.95	3.98	2.43
2	0.21	Premium	E	SI1	59.80	61.00	326	3.89	3.84	2.31
3	0.23	Good	E	VS1	56.90	65.00	327	4.05	4.07	2.31
4	0.29	Premium	I	VS2	62.40	58.00	334	4.20	4.23	2.63
5	0.31	Good	J	SI2	63.30	58.00	335	4.34	4.35	2.75
6	0.24	Very Good	J	VVS2	62.80	57.00	336	3.94	3.96	2.48

Feature	geom	stat	position	Aesthetic parameters / Notes
barchart	`_bar`	`_bin`	stack	x,y,size,linetype,colour,fill,alpha, weight bar heights represent number of items in each level of the categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut))
barchart	`_bar`	`_bin`	stack	Bar heights represent the number of items in each level of a categorical vector and stacked according to another categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut, fill = clarity))
barchart	`_bar`	`_bin`	dodge	bar heights represent the number of items in each combination of levels of multiple categorical vectors displayed side by side
ggplot(diamonds) + geom_bar(aes(x = cut, fill = clarity), position = "dodge")
barchart	`_bar`	`_identity`	stack	bar heights represent value of y for each x
diamonds1 <- as.data.frame(table(diamonds$cut)) ggplot(diamonds1) + geom_bar(aes(x = Var1, y = Freq), stat = "identity")
bargraph	`_bar`	`_summary`	stack	bar heights represent mean y within each level of categorical vector
ggplot(diamonds) + geom_bar(aes(x = cut, y = carat), stat = "summary", fun.y = mean)
histogram	`_bar`	`_bin`	stack	bar heights represent counts within a binned continuous vector
ggplot(diamonds) + geom_bar(aes(x = carat))

id="boxplot"`geom_boxplot` and `stat_boxplot`

geom_boxplot constructs boxplots. The values of the various elements of the boxplot (quantiles, whiskers etc) are calculated by its main pairing function (stat_boxplot). The following list describes the mapping aesthetic properties associated with geom_boxplot. The entries in bold are compulsory. Note that boxplots are usually specified via the geom_boxplot function which will engage the stat_boxplot to calculate the quantiles, whiskers and outliers. Therefore, confusingly, when calling geom_boxplot, the compulsory paramters are actually those required by stat_boxplot (unless you indicated to use stat_identity).

`geom_boxplot`	`stat_boxplot`
x - x axis value (categorical) lower - value of the lower box line (25% percentile) middle - value of the middle box line (50% percentile - median) lower - value of the upper box line (75% percentile) ymax - value of the upper whisker ymax - value of the lower whisker alpha - transparency colour - colour of the lines fill - colour of the boxplot linetype - type of lines used to construct boxplot shape - symbol shape for outliers size - symbol size for outliers weight - weightings of values	x - x axis value (categorical) y - y axis value (continuous)

geom_boxplot

stat_boxplot

x - x axis value (categorical)
lower - value of the lower box line (25% percentile)
middle - value of the middle box line (50% percentile - median)
lower - value of the upper box line (75% percentile)
ymax - value of the upper whisker
ymax - value of the lower whisker
alpha - transparency
colour - colour of the lines
fill - colour of the boxplot
linetype - type of lines used to construct boxplot
shape - symbol shape for outliers
size - symbol size for outliers
weight - weightings of values

x - x axis value (categorical)
y - y axis value (continuous)

Feature	geom	stat	position	Notes, additional parameters	Example
boxplot	`_boxplot`	`_boxplot`	dodge	Plot of quantiles, whiskers and outliers `outlier.colour` `outlier.shape` `outlier.size` `notch` - whether to include a notch or not `notchwidth` - width of notch (fraction of box width)
ggplot(diamonds) + geom_boxplot(aes(x = "carat", y = carat))

`geom_density` and `stat_density`

geom_density constructs smooth density distributions from continuous vectors. The actual smoothed densities are calculated by its main pairing function (stat_density). The following list describes the mapping aesthetic properties associated with geom_density and stat_density. The entries in bold are compulsory. Note that density plots are usually specified via the geom_density function which will engage the stat_density. Therefore, confusingly, when calling geom_density, the compulsory paramaters are actually those required by stat_density (unless you indicated to use stat_identity).

`geom_density`	`stat_density`
x - x axis value (continuous) y - y axis value (densities) alpha - transparency colour - colour of the lines fill - colour of the density linetype - type of lines used to construct density shape - symbol shape for outliers size - symbol size for outliers weight - weightings of values	x - a continuous vector from which to create density distribution fill - fill colour y

Feature	geom	stat	position	Notes, additional parameters	Example
density	`_density`	`_density`	dodge	Density plot of a distribution of a vector `adjust` - smoothness `kernel` - kernel density `trim` - whether to trim densities to data range
ggplot(diamonds) + geom_density(aes(x = carat))

`geom_point`

geom_point draws points (scatterplot). Typically the stat used is stat_identity as we wish to use the values in two continuous vectors as the coordinates of each point. The following list describes the mapping aesthetic properties associated with geom_point. The entries in bold are compulsory.

`geom_point`
x - x axis value (continuous) y - y axis value (densities) alpha - transparency colour - colour of the lines fill - colour of the point linetype - type of lines used to construct point shape - symbol shape for outliers size - symbol size for outliers

Note, it is possible to combine geom_point with other stats (such as stat_summary), so as to plot summaries of the data rather than raw data.

Feature	geom	stat	position	Notes, additional parameters
point	`_point`	`_identity`	identity	Scatterplot
ggplot(BOD) + geom_point(aes(x = Time, y = demand))
means point	`_point`	`_summary`	identity	Means plot
ggplot(CO2) + geom_point(aes(x = conc, y = uptake), stat = "summary", fun.y = mean)

`geom_line`

geom_line draws lines joining coordinates. Typically the stat used is stat_identity as we wish to use the values in two continuous vectors as the coordinates of each line segment. The following list describes the mapping aesthetic properties associated with geom_line. The entries in bold are compulsory.

`geom_line`
x - x axis value (continuous) y - y axis value (densities) alpha - transparency colour - colour of the lines fill - colour of the line linetype - type of lines used to construct line size - symbol size for outliers

Note, it is possible to combine geom_line with other stats (such as stat_summary), so as to plot summaries of the data rather than raw data.

Feature	geom	stat	position	Notes, additional parameters
line	`_line`	`_identity`	identity	Line plot
ggplot(BOD) + geom_line(aes(x = Time, y = demand))
means line	`_line`	`_summary`	identity	Means line plot
ggplot(CO2) + geom_line(aes(x = conc, y = uptake), stat = "summary", fun.y = mean)

`geom_smooth` and `stat_smooth`

geom_smooth draws smooths lines (and 95% confidence intervals) through data clouds. Typically the stat used is stat_smooth which in turn engages one of the available smoothing methods (e.g. lm, glm, gam, loess or rlm). The following list describes the mapping aesthetic properties associated with geom_smooth and stat_smooth. The entries in bold are compulsory.

`geom_smooth`	`stat_smooth`
x - x axis value (continuous) y - y axis value (densities) alpha - transparency colour - colour of the smooths fill - colour of the smooth linetype - type of smooths used to construct smooth size - symbol size for outliers weight - for weighting data	x - x axis value (continuous) y - y axis value (densities)

stat_smooth also has the following optional arguments:

method - the smoothing method (function). One of "lm", "glm", "gam", "loess" or "rlm"
formula - the formula for the smoothing function, expressed relative to x and y. E.g. "y~x", "y~s(x)"
se - whether to display confidence intervals
fullrange - whether the fit should span the full range of the data
level - confidence level (e.g. 0.95)
n - number of points to evaluate smoother at

Feature	geom	stat	position	Notes, additional parameters
smooth	`_smooth`	`_identity`	identity	Linear smoother
ggplot(CO2) + geom_smooth(aes(x = conc, y = uptake), method = "lm")
Lowess smoother	`_smooth`	`_stat`	identity	Lowess smoother
ggplot(CO2) + geom_smooth(aes(x = conc, y = uptake), method = "loess")
Gam smoother	`_smooth`	`_stat`	identity	Cupic regression spline smoother
library(mgcv) ggplot(CO2) + geom_smooth(aes(x = conc, y = uptake), method = "gam", formula = y ~ s(x, bs = "cr", k = 4))

`geom_tile`

geom_tile constructs heat maps given x,y coordinates and a z value to associate with the fill of each tile. The following list describes the mapping aesthetic properties associated with geom_tile and stat_tile. The entries in bold are compulsory.

`geom_tile`
x - x axis value (continuous) y - y axis value (continuous) alpha - transparency colour - colour of the borders around tiles fill - colour of the fill of tiles linetype - type of lines to use as borders to each tile size - line thickness

Feature	geom	stat	position	Notes, additional parameters	Example
tile	`_identity`	`_identity`	identity	Heat map
library(reshape) volcano.df <- melt(volcano, varnames = c("X", "Y")) ggplot(volcano.df) + geom_tile(aes(x = X, y = Y, fill = value))

`geom_contour` and `stat_contour`

geom_contour constructs contour maps given x,y coordinates and a z value from which to calculate each contour. The following list describes the mapping aesthetic properties associated with geom_contour and stat_contour. The entries in bold are compulsory.

`geom_contour`	`stat_contour`
x - x axis value (continuous) y - y axis value (continuous) alpha - transparency colour - colour of the contour lines linetype - line type of the contour lines size - line thickness weight	x - x axis value (continuous) y - y axis value (continuous) z - z axis value (continuous) order

Feature	geom	stat	position	Notes, additional parameters	Example
contour	`_identity`	`_identity`	identity	Heat map
library(reshape) volcano.df <- melt(volcano, varnames = c("X", "Y")) ggplot(volcano.df) + geom_contour(aes(x = X, y = Y, z = value))

Secondary geometric objects

geom_segment	geom_ribbon	geom_errorbar	geom_hline	geom_vline	geom_pointrange
geom_rug	geom_text

`geom_segment`

geom_segment draws segments joining coordinates. The following list describes the mapping aesthetic properties associated with geom_segment. The entries in bold are compulsory.

`geom_segment`
x - x coordinates for the start of lines xend - x coordinates for the end of lines y - y coordinates for the start of lines yend - y coordinates for the end of lines alpha - transparency colour - colour of the segments fill - colour of the segment linetype - type of segments used to construct segment size - symbol size for outliers

geom_segment also has the following optional arguments:

arrow - specification of how arrows should be constructed
lineend - style of the line end

Feature	geom	stat	position	Notes, additional parameters
segment	`_identity`	`_identity`	identity	Segments on a plot - useful for drawing lots of lines or arrows
BOD.lm <- lm(demand ~ Time, data = BOD) BOD$fitted <- fitted(BOD.lm) BOD$resid <- resid(BOD.lm) ggplot(BOD) + geom_segment(aes(x = Time, y = demand, xend = Time, yend = fitted))
segment	`_identity`	`_identity`	identity	Segments on a plot - useful for drawing lots of lines or arrows
BOD.lm <- lm(demand ~ Time, data = BOD) BOD$fitted <- fitted(BOD.lm) BOD$resid <- resid(BOD.lm) ggplot(BOD) + geom_segment(aes(x = Time, y = demand, xend = Time, yend = fitted), arrow = arrow(length = unit(0.5, "cm")))

`geom_ribbon`

geom_ribbon draws ribbons based on upper and lower levels of y associated with each level of x. The following list describes the mapping aesthetic properties associated with geom_ribbon. The entries in bold are compulsory.

`geom_ribbon`
x - x coordinates ymin - y coordinates of the lower limits ymax - y coordinates of the upper limits alpha - transparency colour - colour of the ribbons fill - colour of the ribbon linetype - type of lines used to construct the borders of the ribbon size - thickness of the lines used to border the ribbon

Feature geom stat position Notes, additional parameters Example

ribbon

Feature	geom	stat	position	Notes, additional parameters	Example
ribbon	`_identity`	`_identity`	identity	Ribbons on a plot - useful for depicting confidence envelopes
BOD.lm <- lm(demand ~ Time, data = BOD) xs <- seq(min(BOD$Time), max(BOD$Time), l = 100) pred <- data.frame(predict(BOD.lm, newdata = data.frame(Time = xs), interval = "confidence")) pred$x <- xs ggplot(pred) + geom_ribbon(aes(x = x, ymin = lwr, ymax = upr))

_identity

identity

Ribbons on a plot - useful for depicting confidence envelopes

BOD.lm <- lm(demand ~ Time, data = BOD)
xs <- seq(min(BOD$Time), max(BOD$Time),
    l = 100)
pred <- data.frame(predict(BOD.lm,
    newdata = data.frame(Time = xs),
    interval = "confidence"))
pred$x <- xs
ggplot(pred) + geom_ribbon(aes(x = x,
    ymin = lwr, ymax = upr))

`geom_errorbar`

geom_errorbar draws errorbars based on upper and lower levels of y associated with each level of x. The following list describes the mapping aesthetic properties associated with geom_errorbar. The entries in bold are compulsory.

`geom_errorbar`
x - x coordinates ymin - y coordinates of the lower limits ymax - y coordinates of the upper limits alpha - transparency colour - colour of the errorbars fill - colour of the errorbar linetype - type of lines used to construct the borders of the errorbar size - thickness of the lines used to border the errorbar width - width of the errorbars

Feature	geom	stat	position	Notes, additional parameters
errorbar	`_identity`	`_identity`	identity	Error bars on a plot - useful for adding to means plots etc
library(plyr) warpbreaks.df <- ddply(warpbreaks, ~wool, function(x) { Hmisc:::smean.cl.boot(x$breaks) }) ggplot(warpbreaks.df) + geom_errorbar(aes(x = wool, ymin = Lower, ymax = Upper))
errorbar	`_identity`	`_summary`	identity	Error bars on a plot - useful for adding to means plots etc
ggplot(warpbreaks) + geom_errorbar(aes(x = wool, y = breaks), stat = "summary", fun.data = "mean_cl_boot")

library(ggmap) hdf <- get_map(location=c(lon=146.8,lat=-19.20), zoom=12, maptype="satellite") ggmap(hdf, extent = 'normal')

Coordinate system - coord

The coordinate system controls the nature and scale of the axes.

System	Parameters	Example
Regular cartesian coordinate system `coord_cartesian`	`xlim` - x limits `ylim` - y limits
ggplot(BOD) + coord_cartesian() + geom_line(aes(y = demand, x = Time))
Polar coordinate system `coord_polar`	`theta="x"` - angle variable `start=0` - initial angle from 12 oclock `direction=1` - 1=clockwise, -1=anticlockwise
ggplot(BOD) + coord_polar() + geom_line(aes(y = demand, x = Time))
Flipped the axes `coord_flip`	`xlim=NULL` - y limits `ylim=NULL` - x limits
ggplot(BOD) + coord_flip() + geom_line(aes(y = demand, x = Time))
Fix the ratio of axes dimesions `coord_fixed`	`ratio=1` - y/x ratio `xlim=NULL` - x limits `ylim=NULL` - y limits
ggplot(BOD) + coord_fixed(ratio = 0.25) + geom_line(aes(y = demand, x = Time))
1:1 (equal) ratio of axes dimesions same as `coord_fixed(ratio=1)` `coord_equal`	`xlim=NULL` - x limits `ylim=NULL` - y limits
ggplot(BOD) + coord_equal() + geom_line(aes(y = demand, x = Time))
Map projection coordinate system `coord_map`	`projection="mercator"` - mapping projection `orientation=c(90,0,mean(range(x)))` - map orientation
# get high resolution map of # Australia (and islands) data library(maps) library(mapdata) aus <- map_data("worldHires", region = "Australia") # Orthographic coordinates ggplot(aus, aes(x = long, y = lat, group = group)) + coord_map("ortho", orientation = c(-20, 125, 23.5)) + geom_polygon()

Altering the axes scales via the coordinate system

Modifying scales with coords affects the zoom on the graph. That is, it defines the extent and nature of the axes coordinates. By contrast, altering limits via scale_ routines will alter the scope of data included in a manner analogous to operating on a subset of the data.

Default scale	Scale via `coord_` (Zoom)	Scale via `scale_`

# Default scales ggplot(BOD, aes(y = demand, x = Time)) + geom_point() + geom_smooth(method = "lm") # Zoom on x-axis ggplot(BOD, aes(y = demand, x = Time)) + coord_cartesian(xlim = c(2, 6)) + geom_point() + geom_smooth(method = "lm") # Scale (subset) the x data ggplot(BOD, aes(y = demand, x = Time)) + scale_x_continuous(limits = c(2, 6)) + geom_point() + geom_smooth(method = "lm")

Default scale

Scale via coord_ (Zoom)

Scale via scale_

# Default scales
ggplot(BOD, aes(y = demand, x = Time)) + geom_point() + geom_smooth(method = "lm")

# Zoom on x-axis
ggplot(BOD, aes(y = demand, x = Time)) + coord_cartesian(xlim = c(2, 6)) + geom_point() + geom_smooth(method = "lm")

# Scale (subset) the x data
ggplot(BOD, aes(y = demand, x = Time)) + scale_x_continuous(limits = c(2, 6)) + geom_point() + geom_smooth(method = "lm")

In addition to altering the zoom of the axes, axes (coordinate system) scales can be transformed to other scales via the coord_trans function. Transformations of the coordinate system take place after statistics have been calculated and geoms derived. Therefore the shape of geoms are altered.

The coord_trans() function has the following argments:

xtrans: a transformer that will operate on the x scale
ytrans: a transformer that will operate on the y scale
limx: limits of the x axis
limy: limits of the y axis

A transformer is a function that defines a transformation along with its inverse and rules on how to generate pretty breaks (tick marks) and their labels

To illustrate the distinction between coord_trans and scale_, we will generate some curvilinear data.

set.seed(1)
n <- 50
dat <- data.frame(x = exp((1:n + rnorm(n, sd = 2))/10), y = 1:n + rnorm(n, sd = 2))

Linear scales	`coord_trans`	`scales_`
Linear spacing of axis ticks	Log₁₀ spacing of axis ticks on a linear scale	Linear spacing of axis ticks on a log₁₀ scale
ggplot(dat, aes(y = y, x = x)) + geom_point() ggplot(dat, aes(y = y, x = x)) + geom_point() + coord_trans(xtrans = log10_trans()) ggplot(dat, aes(y = y, x = x)) + geom_point() + scale_x_continuous(trans = log10_trans())
Linear trend applied to curved data	Linear trend applied to curved data, then bent by coordinates rescaling	Linear trend applied to scaled (linear) data
ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + coord_trans(xtrans = log10_trans()) ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans())

More details of transformers

Transformers

`trans_new`

The trans_new function itself defines and returns a list structure comprising;

name
transform
inverse
breaks
format
domain

To illustrate the trans_new function, lets define a natural log (ln) transformer to apply to our artificial data.

ggplot(dat, aes(y=y,x=x)) + geom_point() +
  geom_smooth(method="lm") +
  scale_x_continuous(trans=trans_new(name="ln",
                                transform=function(x) log(x),
                                inverse=function(x) exp(x),
                                breaks=function(x) pretty(x),
                                domain=c(1e-100,Inf)))

## Error: 'from' cannot be NA, NaN or infinite

ln_trans <- function() {
    name <- "ln"
        trans <- function(x) log(x)
    inv <- function(x) exp(x)
        breaks <- function(x) pretty(x)
    format <- function(x) x
    domain <- c(1e-100,Inf)
    trans_new(name,transform=trans,inverse=inv,
              breaks=breaks, domain=domain)
}

ggplot(dat, aes(y=y,x=x)) + geom_point()+
  geom_smooth(method="lm")+
  scale_x_continuous(trans=ln_trans())

## Error: 'from' cannot be NA, NaN or infinite

coord_trans(xtrans = "identity", ytrans = "identity",limx = NULL, limy = NULL)

Finer control of the transformation can be exersized. Consider the following examples using the same dataset.

In order to be able to use trans_new effectively, it is necessary to understand the data parsed to each of the functions within the transformer. In the following demonstration, I have placed print statements within each of the functions so as to illustrate the sequence in which the functions are called (relative to each other and other external functions) as well as the input data of each function. Note for this demonstration, I have ommitted the smoother as it would also result in calls to these functions and therefore compound the sequencing.

p<-ggplot(dat, aes(y=y,x=x)) + geom_point()+# + geom_smooth(method="lm") +
  scale_x_continuous(trans=trans_new(name="",transform=function(x) {
     cat("**Tranform begin**\n");
     print(x);
     log10(x);
         },
                                inverse=function(x) {
     cat("**Inverse**\n");
     print(x);
     10^(x);
     },
                                breaks=function(x) {
         cat("**Breaks**\n");
     print(x);
         pretty(x);
     },
                            format=function(x) {cat("**Format**\n");print(x);x;},
                                domain=c(1e-100,Inf)))
p+ theme(plot.background = element_rect(fill = "transparent",colour = NA))

**Tranform begin**
 [1]   0.975   1.267   1.142   2.052   1.761
 [6]   1.546   2.220   2.580   2.760   2.557
[11]   4.065   3.589   3.241   2.604   5.612
[16]   4.909   5.456   7.307   7.879   8.321
[21]   9.814  10.553  10.124   7.405  13.790
[26]  13.313  14.423  12.254  16.517  21.837
[31]  29.129  24.033  29.298  29.643  25.143
[36]  33.683  37.380  44.174  61.560  63.601
[41]  58.387  63.391  84.723  91.043  78.433
[46]  86.358 118.264 141.699 131.306 177.013
**Inverse**
[1] -0.1239  2.3610
**Breaks**
[1]   0.7517 229.5905
**Tranform begin**
[1]   0.7517 229.5905
**Tranform begin**
[1]   0  50 100 150 200 250
**Inverse**
[1]  -Inf 1.699 2.000 2.176 2.301    NA
**Format**
[1]   0  50 100 150 200  NA

Error: 'from' cannot be NA, NaN or infinite

The sequence is as follows;

The first call to the **Transform** function is parsed the raw data
The first call to the **Inverse** function is parsed the computed axes limits (on the log₁₀ scale). These originate in another part of the ggplot engine. Following the action of the transformation function, other functions determine the limits of the axes based on the transformed data as well as the nominated axis expansion factor (places a buffers beyond the data such that geoms do not overlapp axes). The inverse function then converts these limits into limits in the original raw data scale.
The first call to the **Breaks** function is parsed the axes limits on the scale of the raw data and defines the spacing of axes tick marks
The second call to the **Transform** function takes the axis limits and rescales them into the log₁₀ scale
The third call to the **Transform** function takes the axis tick mark spacing from **Breaks** and rescales them into the log₁₀ scale
The second call to the **Inverse** function takes the axis tick marks spacing on the log₁₀ scale and rescales into the scale of the raw data
Finally, the **Format** function is used to define the labels to be applied to the tick marks

Axes in the scale of observations
	## Error: 'from' cannot be NA, NaN or infinite
p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans(), breaks = trans_breaks(function(x) log10(x), function(x) 10^(x))) p p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans(), breaks = trans_breaks(function(x) x, function(x) x)) p p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans(), breaks = as.vector(c(1, 2, 5) %o% 10^(-1:2))) p
Axes in the scale of logarithms

p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous("x log", trans = log10_trans(), breaks = trans_breaks(function(x) log10(x), function(x) 10^(x)), labels = trans_format(function(x) log10(x), format = format_format())) p p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans(), breaks = trans_breaks(function(x) log10(x), function(x) 10^(x)), labels = trans_format(function(x) log10(x), format = scientific_format())) p p <- ggplot(dat, aes(y = y, x = x)) + geom_point() + geom_smooth(method = "lm") + scale_x_continuous(trans = log10_trans(), breaks = trans_breaks(function(x) log10(x), function(x) 10^(x)), labels = trans_format(function(x) log10(x), format = math_format(10^.x))) p

`*_trans` transformers

The _trans family of transformers are convienient wrappers for the trans_new function.

Transformer	Desciption
`asn_trans()`	Arc-sin square-root transformation (of proportions/percentages).
`atanh_trans()`	Arc-tangent transformation
`boxcox_trans(p)`	Box-Cox power transformation When the power exponent (`p`) is equal to 0, values are logged For exponents other than zero, 1 is subtracted from the value are raised to the power of the exponent and this is then divided by the exponent.
`date_trans`
`exp_trans`
`identity_trans`
`log10_trans`
`log1p_trans`
`log2_trans`
`log_trans`
`logit_trans`
`probability_trans`
`probit_trans`
`reciprocal_trans`
`reverse_trans`
`sqrt_trans`
`time_trans`

Transform axes scale (logs)	1:1 axes scales
## 'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1) ## theme_rect is deprecated. Use 'element_rect' instead. (Deprecated; last used in version 0.9.1)
# log10 axes scales ggplot(BOD) + coord_trans(xtrans = "log10", ytrans = "log10") + geom_line(aes(y = demand, x = Time))

Transform axes scale (logs)

1:1 axes scales

## 'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
## theme_rect is deprecated. Use 'element_rect' instead. (Deprecated; last used in version 0.9.1)

# log10 axes scales
ggplot(BOD) + coord_trans(xtrans = "log10", ytrans = "log10") +
    geom_line(aes(y = demand, x = Time))

Modifying scales with coords affects the zoom on the graph. That is, it defines the extent and nature of the axes coordinates. By contrast, altering limits via scale_ routines will alter the scope of data included in a manner analogous to operating on a subset of the data.

Scales

The idea of scales is that you present the plotting engine with data or characteristics in one scale and use the various scale_ functions to convert those data into another scale. In the grammar of graphics, scales are synonymous for units of data, colors, shapes, sizes etc of plotting features and the axes and guides (legends) provide a visual cue for what the scales are. For example;

you might include data that ranges from 10 to 20 units, yet you wish to produce a plot that zooms in on the range 12-16.
you have presented grouped data (data with multiple trends) and instructed the graphing engine to assign different colour codes to each trend. You can then define a colour scale to adjust the exact colours rendered.
similarly, you might have indicated how plotting symbol shape and size are to be distinguished in your data set. You can then assign scales that define the exact shapes and symbol sizes rendered.

Technically, scales determine how attributes of the data are mapped into aesthetic geom properties. The majority of geom's (geometric objects) have the following aesthetic properties:

x - the x position (coordinates) of the geom
y - the y position (coordinates) of the geom
size - the size of the geom (e.g. the size of a point)
shape - the shape of the geom
linetype - the type of line associated with the geom's outline (solid, dashed etc)
colour - the colour of the geom's outline (note the English spelling of the word colour)
fill - the colour of the geom's fill
alpha - the transparency of the geom (0=transparent, through to 1=opaque)

In turn, each of these properties are mapped to a scale - the defaults of which are automatically selected according to what is appropriate for the sort of data. For example, data can be on a continuous or discrete (categorical) scale. Most data type have the following possible scales for each of the above properties:

_continuous - when you want the scale increments (such as the different point sizes, colours etc) to be determined from a continuous vector in your data frame.
_discrete - when you want the scale increments (such as the different point sizes colours etc) to be determined from a categorical vector in your data frame.
_manual - is a variation on _discrete and is used when you wish to manually indicate the characteristic of each increment. You need to provide as many values as there are levels of your discrete vector.
_identity - is another variation on _discrete and is used when you wish for the values in your categorical vector to be used un-scaled as the characteristics of the data. For example, your data frame might contain a vector of colour names or point sizes.

Some properties, such as colour also have additional scales that are specific to the characteristic. The scales effect not only the characteristics of the geoms, they also effect the guides (legends) that accompany the geoms.

Scaling functions comprise the prefix scale_, followed by the name of an aesthetic property and suffixed by the type of scale. Hence a function to manually define a colour scale would be scale_colour_manual.

All scales have the following arguments available:

name - a title applied to the scale. In the case of scales for x and y (the x,y coordinates of geoms), the name is the axis title. For all other scales, the name is the title of the guide (legend).
breaks - the increments on the guide. For scale_x_ and scale_y_, breaks are the axis tick locations. For all other scales, the breaks indicate the increments of the characteristic in the legend (e.g. how many point shapes are featured in the legend).
labels - the labels given to the increments on the guide. For scale_x_ and scale_y_, labels are the axis tick labels. For all other scales, the labels are the labels given to items in the legend.
limits - the span/range of data represented in the scale. Note, if the range is inside the range of the data, the data are sub-setted.
trans - scale transformations applied - obviously this is only relevant to scales that are associated with continuous data.

Scaling the x and y values (`scale_x_`)

The scale_x_ and scale_y_ scales control the x and y axes and in addition to the common arguments listed above, the following optional arguments available for specific scales:

expand - a vector of length two that indicates multiplicative and additive constants used to expand the axes away from the data thereby ensuring that geoms do not intersect with the axes.
minor_breaks - the increments for the minor breaks along the axis. The minor breaks have a grid line yet no tick marks or labels.

`scale_x_continuous`	`scale_x_continuous`	`scale_x_continuous`
linear scaling	linear with nice title	linear with more space

# Linear axes scales with altered axis title ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_continuous(name = "CO2 conc") # Linear axes scales with more complex title ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_continuous(name = expression(paste("Ambient ", CO[2], " concentration (mg/l)", sep = ""))) # Linear axes scales with more space along the x # axis ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_continuous(name = "CO2 conc", expand = c(0, 200))
`scale_x_log10`	`scale_x_sqrt`	`scale_x_reverse`
Log10 scale	Square-root scale	Reverse scale
Shortcut for `scale_x_continuous(trans= log10_trans())`	Shortcut for `scale_x_continuous(trans= sqrt_trans())`
# log10 axes scales ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_log10(name = "CO2 conc", breaks = as.vector(c(1, 2, 5, 10) %o% 10^(-1:2))) # square-root transformation ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_sqrt(name = "CO2 conc") # reverse the data ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() + scale_x_reverse(name = "CO2 conc")
`scale_x_date`	`scale_x_datetime`	`scale_x_discrete`
For more info on date breaks see `date_breaks` For more date formats see `strptime`	For more info on date breaks see `date_breaks` For more date formats see `strptime`
# Date format library(scales) CO2$Date <- as.Date(paste(2000 + as.numeric(as.factor(CO2$conc)), "-01-01", sep = "")) ggplot(CO2, aes(y = uptake, x = Date)) + geom_point() + scale_x_date(name = "Year", breaks = "2 years", minor_breaks = "6 month", labels = date_format("%Y")) # POSIX format library(scales) CO2$DateTime <- as.POSIXct(paste(2000, "-0", as.numeric(as.factor(CO2$conc)), "-01 09:00:00", sep = "")) ggplot(CO2, aes(y = uptake, x = DateTime)) + geom_point() + scale_x_datetime(name = "Time (days)", breaks = "2 months", minor_breaks = "1 months", labels = date_format("%b")) # categorical axis ggplot(CO2, aes(y = uptake, x = Treatment)) + geom_point() + scale_x_discrete(name = "Treatment")

Scaling the size of geoms (`scale_size_`)

The scale_size_ scales control the size of geoms (such as the size of points) and in addition to the common scale arguments, the following optional arguments available:

range - the minimum and maximum size
values - the specific sizes to use (for _manual scale)
guide - whether to include a guide and what sort of guide to include (e.g. "legend")

`scale_size_continuous`	`scale_size_discrete`	`scale_size_manual`
Scale the geoms according to a continuous vector	Scale the geoms according to a categorical vector	Manually determine the size of geoms
`range` - minimum and maximum geom size		`values` - a set of values to use for sizes
# size determined by continuous covariate set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = cv)) + scale_size_continuous(name = "Temperature") # Discrete sizes ranging in size from 2 to 4 ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Type)) + scale_size_discrete(name = "Type", range = c(2, 4)) # Manual sizes of exactly 2 and 4 ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Type)) + scale_size_manual(name = "Type", values = c(2, 4))
`scale_size_identity`
Size the geoms according to the values of a continuous vector (don't scale)
`guide` - whether to include a guide (legend)
# Sizes provided by a covariate set.seed(123) CO2$Count <- runif(nrow(CO2), 0, 10) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(size = Count)) + scale_size_identity(name = "Type", guide = "legend")

Scaling the shape of geoms (`scale_shape_`)

The scale_shape_ scales control the shape of geoms (such as the shape of the plotting point) an in addition to all of the regular arguments, the following optional arguments are available:

solid - whether the shapes should be solid (TRUE) or outlined (FALSE)

scale_shape_discrete scale_shape_manual scale_shape_identity

Geom shapes determined (scaled) by categorical variable Geom shapes determined (scaled) manually Geom shapes determined by categorical variable (no scaling)

`scale_shape_discrete`	`scale_shape_manual`	`scale_shape_identity`
Geom shapes determined (scaled) by categorical variable	Geom shapes determined (scaled) manually	Geom shapes determined by categorical variable (no scaling)
	`values` - a set of values (or shape names) to use for shapes
# Discrete shapes determined by the combination of # Type and Treatment The items in the guide are # then rearranged and re-labelled CO2$Comb <- interaction(CO2$Type, CO2$Treatment) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Comb)) + scale_shape_discrete(name = "Type", breaks = c("Quebec.nonchilled", "Quebec.chilled", "Mississippi.nonchilled", "Mississippi.chilled"), labels = c("Quebec non-chilled", "Quebec chilled", "Miss. non-chilled", "Miss. chilled")) # Manual shapes ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Treatment), size = 2) + scale_shape_manual(name = "Treatment", values = c(16, 21)) # Identity shapes set.seed(123) CO2$Count <- runif(nrow(CO2), 0, 10) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Count)) + scale_shape_identity(name = "Species", guide = "legend")

plot of chunk plotGgplotScaleShapeDiscrete

plot of chunk plotGgplotScaleShapeManual

values - a set of values (or shape names) to use for shapes

plot of chunk plotGgplotScaleShapeIdentity

# Discrete shapes determined by the combination of
# Type and Treatment The items in the guide are
# then rearranged and re-labelled
CO2$Comb <- interaction(CO2$Type, CO2$Treatment)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Comb)) +
    scale_shape_discrete(name = "Type", breaks = c("Quebec.nonchilled",
        "Quebec.chilled", "Mississippi.nonchilled",
        "Mississippi.chilled"), labels = c("Quebec non-chilled",
        "Quebec chilled", "Miss. non-chilled", "Miss. chilled"))

# Manual shapes
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Treatment),
    size = 2) + scale_shape_manual(name = "Treatment",
    values = c(16, 21))

# Identity shapes
set.seed(123)
CO2$Count <- runif(nrow(CO2), 0, 10)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(shape = Count)) +
    scale_shape_identity(name = "Species", guide = "legend")

Scaling the linetype associated with geoms (`scale_linetype_`)

The scale_size_ scales control the type of lines used in geoms and have the following additional optional arguments available:

values - values supplied to manually determine the line types

scale_linetype_discrete scale_linetype_manual scale_linetype_identity

Geom linetypes determined (scaled) by categorical variable Geom linetypes determined (scaled) manually Geom linetypes determined by categorical variable (no scaling)

`scale_linetype_discrete`	`scale_linetype_manual`	`scale_linetype_identity`
Geom linetypes determined (scaled) by categorical variable	Geom linetypes determined (scaled) manually	Geom linetypes determined by categorical variable (no scaling)
	`values` - a set of values (or linetype names) to use for linetypes
# Discrete shapes determined by the combination of # Type and Treatment The items in the guide are # then rearranged and re-labelled CO2$Comb <- interaction(CO2$Type, CO2$Treatment) ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Comb)) + scale_linetype_discrete(name = "Type", breaks = c("Quebec.nonchilled", "Quebec.chilled", "Mississippi.nonchilled", "Mississippi.chilled"), labels = c("Quebec non-chilled", "Quebec chilled", "Miss. non-chilled", "Miss. chilled")) # Manual linetypes ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Treatment)) + scale_linetype_manual(name = "Treatment", values = c("dashed", "dotted")) # Identity linetypes CO2$Lines <- factor(CO2$Treatment, levels = c("nonchilled", "chilled"), labels = c("dotted", "dashed")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Lines)) + scale_linetype_identity(name = "Temperature", guide = "legend", breaks = c("dotted", "dashed"), labels = c("Low", "High"))

plot of chunk plotGgplotScaleLinetypeDiscrete

plot of chunk plotGgplotScaleLinetypeManual

values - a set of values (or linetype names) to use for linetypes

plot of chunk plotGgplotScaleLinetypeIdentity

# Discrete shapes determined by the combination of
# Type and Treatment The items in the guide are
# then rearranged and re-labelled
CO2$Comb <- interaction(CO2$Type, CO2$Treatment)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Comb)) +
    scale_linetype_discrete(name = "Type", breaks = c("Quebec.nonchilled",
        "Quebec.chilled", "Mississippi.nonchilled",
        "Mississippi.chilled"), labels = c("Quebec non-chilled",
        "Quebec chilled", "Miss. non-chilled", "Miss. chilled"))

# Manual linetypes
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Treatment)) +
    scale_linetype_manual(name = "Treatment", values = c("dashed",
        "dotted"))

# Identity linetypes
CO2$Lines <- factor(CO2$Treatment, levels = c("nonchilled",
    "chilled"), labels = c("dotted", "dashed"))
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(linetype = Lines)) +
    scale_linetype_identity(name = "Temperature", guide = "legend",
        breaks = c("dotted", "dashed"), labels = c("Low",
            "High"))

Scaling the colour (or fill) associated with geoms (`scale_colour_` & `scale_fill_`)

The scale_size_ scales control the colour of geoms and have the following additional optional arguments available:

low - colour of low end of the colour spectrum
high - colour of high end of the colour spectrum
guide - what sort of legend (e.g. colorbar)

`scale_colour_continuous`	`scale_colour_gradient`	`scale_colour_gradient2`
Geom colours determined (scaled) by continuous variable	Geom colours determined (scaled) palette	Geom colours determined by a different palette

# colour determined by continuous covariate set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) + scale_colour_continuous(name = "Temperature", low = "blue", high = "red") # colour determined by continuous covariate set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) + scale_colour_gradient(name = "Temperature") # colour determined by continuous covariate set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) + scale_colour_gradient2(name = "Temperature")
`scale_colour_gradientn`
Geom colours determined (scaled) a specific palette

# colour determined by continuous covariate use a # predefined gradient based colour palette set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = cv)) + scale_colour_gradientn(name = "Temperature", colours = terrain.colors(5))

`scale_colour_hue`	`scale_colour_grey`	`scale_colour_brewer`
Evenly spaced geom colours determined (scaled) by hue	Geom colours determined (scaled) palette	Geom colours determined by a different palette
		See the color brewer site for more info
# Discrete colours for hue set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) + scale_colour_hue(name = "Temperature", l = 80, c = 130) # Discrete colours set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) + scale_colour_grey(name = "Temperature", start = 0.2, end = 0.8) # Discrete colours selected from a colour brewer # palette it automatically knows how many colours # are required set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) + scale_colour_brewer(name = "Temperature", type = "seq", palette = "Reds")
`scale_colour_manual`	`scale_colour_identity`
Geom colours determined (scaled) a specific palette

# Manual colours set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(colour = Temp)) + scale_colour_manual(name = "Temperature", values = c("red", "#00AA00", 1)) # identity colours set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("red", "#00AA00", 1)) ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth(aes(colour = Temp)) + scale_colour_identity(name = "Temperature", guide = "legend", labels = c("Low", "Medium", "High"))

Scaling the alpha level of colour associated with geoms (`scale_alpha_`)

The scale_alpha_ scales control the transparency of geoms and have the following additional optional arguments available:

range - the alpha range (0,1)
values - alpha values between 0 and 1
guide - what sort of legend (e.g. colorbar)

`scale_alpha_continuous`	`scale_alpha_discrete`	`scale_alpha_manual`
Evenly spaced geom alphas determined (scaled) by continuous	Geom alphas determined (scaled) palette	Geom alphas determined by a different palette

# colour determined by continuous covariate set.seed(123) CO2$cv <- runif(nrow(CO2), 10, 50) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = cv)) + scale_alpha_continuous(name = "Temperature", range = c(0.3, 1)) # Discrete alphas set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Temp)) + scale_alpha_discrete(name = "Temperature") # Manual alphas set.seed(123) CO2$cv <- runif(nrow(CO2), 0, 100) CO2$Temp <- cut(CO2$cv, breaks = c(0, 33, 66, 100), labels = c("Low", "Medium", "High")) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Temp)) + scale_alpha_manual(name = "Temperature", values = c(0.3, 0.6, 0.95))
`scale_alpha_identity`
Geom alphas determined (scaled) a specific palette

# Identity alphas set.seed(123) CO2$Alpha <- runif(nrow(CO2), 0, 1) ggplot(CO2, aes(y = uptake, x = conc)) + geom_point(aes(alpha = Alpha)) + scale_alpha_identity(name = "Temperature")

Faceting splits the data up into a matrix of panels on the basis of one or more categorical vectors. Since facets display subsets of the data, they are very useful for examining trends in hierarchical designs.

There are two faceting function, that reflect two alternative approaches:

facet_wrap(~cell) - creates a set of panels based on a factor and wraps the panels into a 2-d matrix. cell represents a categorical vector or set of categorical vectors
facet_wrap(row~column) - creates a set of panels based on a factor and wraps the panels into a 2-d matrix. row and column represents the categorical vectors used to define the rows and columns of the matrix respectively

facet_wrap

facet_grid

The following list describes the mapping aesthetic properties associated with facet_wrap and facet_grid functions. The entries in bold are compulsory.

`facet_wrap`	`facet_grid`
facets - formula specifying faceting variables to use in faceting nrow - number of rows ncol - number of columns scales - should all scaled be fixed or free "fixed" - (default) all scales the same "free" - all scales free "free_x" - all x-axis scales free "free_y" - all y-axis scales free as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right drop - drop factor combinations that lack data	facets - formula specifying faceting variables to use in faceting margins - whether to include marginal trends scales - should all scaled be fixed or free "fixed" - (default) all scales the same "free" - all scales free "free_x" - all x-axis scales free "free_y" - all y-axis scales free space- should all panels take up the same space "fixed" - (default) all panels take up the same space "free" - panel heights and widths vary "free_x" - panel widths vary "free_y" - panel heights vary labeller - a function used to label the panel strips as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right drop - drop factor combinations that lack data

facet_wrap

facet_grid

facets - formula specifying faceting variables to use in faceting
nrow - number of rows
ncol - number of columns
scales - should all scaled be fixed or free
- "fixed" - (default) all scales the same
- "free" - all scales free
- "free_x" - all x-axis scales free
- "free_y" - all y-axis scales free
as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right
drop - drop factor combinations that lack data

facets - formula specifying faceting variables to use in faceting
margins - whether to include marginal trends
scales - should all scaled be fixed or free
- "fixed" - (default) all scales the same
- "free" - all scales free
- "free_x" - all x-axis scales free
- "free_y" - all y-axis scales free
space- should all panels take up the same space
- "fixed" - (default) all panels take up the same space
- "free" - panel heights and widths vary
- "free_x" - panel widths vary
- "free_y" - panel heights vary
labeller - a function used to label the panel strips
as.table - if TRUE, laid out from top left to bottom right, if FALSE: bottom left to top right
drop - drop factor combinations that lack data

Facet	Notes, additional parameters	Example
`_wrap`	Matrix of panels split by a single categorical vector
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + facet_wrap(~Plant)
`_wrap`	Matrix of panels split by a single categorical vector with different y-axis scale range for each panel
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + facet_wrap(~Plant, scales = "free_y")
`_grid`	Matrix of panels split by a single categorical vector with different y-axis scale range for each panel
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + facet_grid(Type ~ Treatment)
`_grid`	Matrix of panels split by a single categorical vector with different y-axis scale range for each panel
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + facet_grid(Type ~ Treatment, scales = "free_y")

Themes

Themes govern the overall style of the graphic. In particular, they control:

the look and positioning of the axes (and their ticks, titles and labels)
the look and positioning of the legends (size,alignment, font, direction)
the look of plots (spacing and titles)
the look of panels (background, grid lines)
the look of panels strips (background, alignment, font)

Theme	Notes, additional parameters	Example
`_bw`	Black and white theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + theme_bw()
`_classic`	Classic theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + theme_classic()
`_grey`	Grey theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + theme_grey()
`_minimal`	Minimal theme
ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() + geom_point() + theme_minimal()

Along with these pre-fabricated themes, it is possible to create your own theme. This is done via the theme() function. Each themable element comprises of either a line, rectangle or text. Therefore, they can all be modified via one of the following functions:

element_blank() - remove the element
element_line() - set the properties of a line
element_rect() - set the properties of a rectangle
element_text() - set the properties of text

library(gridExtra)
ggplot(CO2, aes(y=uptake, x=conc)) + geom_smooth(aes(colour=Type)) + geom_point() +
theme(panel.grid.major = element_blank(), # no major grid lines
  panel.grid.minor = element_blank(), # no minor grid lines
  panel.background = element_blank(), # no background
  panel.border = element_blank(), # no plot border
  axis.title.y=element_text(size=15, vjust=0,angle=90), # y-axis title
  axis.text.y=element_text(size=12), # y-axis labels
  axis.title.x=element_text(size=15, vjust=-2), # x-axis title
  axis.text.x=element_text(size=12), # x-axis labels
  axis.line = element_line(),
  legend.position=c(1,0),
  legend.justification=c(1,0),
  plot.margin=unit(c(0.5,0.5,2,2),"lines")) # plot margins

Examples

Exploring distributions

Boxplots - geom_boxplot & stat_boxplot

Univariate boxplots

Basic boxplot

Plain boxplot

# Univariate boxplot
ggplot(BOD) + geom_boxplot(aes(y = demand, x = "Demand"))

#Conditional boxplot
p <- ggplot(BOD) +
 geom_boxplot(aes(y=demand,x=1)) +
 scale_y_continuous("Biochemical oxygen demand (mg/l)") +
 scale_x_continuous(limits=c(0,2),breaks=NULL)

p + theme(panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  panel.border = element_blank(),
  axis.title.y=element_text(size=15, vjust=0,angle=90),
  axis.text.y=element_text(size=12),
  axis.title.x=element_blank(),
  axis.text.x=element_blank(),
  axis.line = element_line(),
  plot.margin=unit(c(0.5,0.5,0.5,2),"lines")
 )

Conditional (factorial) boxplots

Basic factorial boxplot	Plain factorial boxplot

# Conditional boxplot ggplot(warpbreaks) + geom_boxplot(aes(y = breaks, x = wool)) # Plain conditional boxplot p <- ggplot(warpbreaks) + geom_boxplot(aes(y = breaks, x = wool)) + scale_y_continuous("Number of wool breaks") + scale_x_discrete("Type of wool") p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y = element_text(size = 15, vjust = 0, angle = 90), axis.text.y = element_text(size = 12), axis.title.x = element_text(size = 15, vjust = -1), axis.text.x = element_text(size = 12), axis.line = element_line(), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
Basic factorial boxplot	Plain factorial boxplot

ggplot(warpbreaks) + geom_boxplot(aes(y = breaks, x = wool, fill = tension)) p <- ggplot(warpbreaks) + geom_boxplot(aes(y=breaks,x=wool, fill=tension)) + scale_y_continuous("Number of wool breaks") + scale_x_discrete("Type of wool")+ labels=c("Low","Medium","High"),start=0.5,end=1) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15, vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )

Violin Plot - geom_violin

Violin plot

Plain violin plot

ggplot(warpbreaks, aes(y = breaks, x = wool)) + geom_violin()

library(grid)
library(scales)
p<-ggplot(warpbreaks, aes(y=breaks, x=wool))+
  geom_violin()+
  scale_x_discrete("Wool type")+
  scale_y_continuous("Number of breaks", expand=c(0.05,0), labels=comma)
p +       theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,0.2),legend.justification=c(1,0),
        plot.margin=unit(c(0.5,0.5,2,2),"lines"),
    legend.key=element_blank()
  )

Histograms - geom_histogram, geom_bar & stat_bin

Univariate histograms

Basic histogram	Plain histogram

ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) # OR ggplot(data = data.frame(rivers)) + geom_histogram(aes(x = rivers)) p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+ scale_x_continuous("Length of rivers (miles)")+ scale_y_continuous("Frequency", expand=c(0,0))+ coord_cartesian(xlim=c(0,4000)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Number of bins	Plain bin width

# Histogram with customized bin widths ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers), binwidth = 50) # OR ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) #Plain histogram with custom bin widths #use the expand() to scale the axis zero to 0 p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),binwidth=50,colour='black',fill='gray')+ scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+ scale_y_continuous("Frequency", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), plot.margin=unit(c(0.5,0.5,2,2),"lines") )

Scaled x-values	Plain transformed x-values

# Histogram on log transformed data ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) + scale_x_continuous(trans = "log10") # OR ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) + scale_x_log10() # Plain histogram of log transformed data #define a new axis label formattter p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+ scale_x_continuous("Length of rivers (miles)", expand=c(0,0),trans="log10")+ scale_y_continuous("Frequency", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Scaled x-axis	Plain transformed x- coordinates

# Histogram of linear data on log transformed axis ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers)) + coord_trans(xtrans = "log1p") # Plain histogram of linear data on log transformed axis #define a new axis label formattter p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers),colour='black',fill='gray')+ scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+ coord_trans(xtrans="log1p")+ scale_y_continuous("Frequency", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Gradient fill	Plain scaled grey gradient fill

ggplot(data = data.frame(rivers)) + geom_bar(aes(x = rivers, fill = ..count..)) # Plain histogram with gradient fill #define a new axis label formattter p <- ggplot(data=data.frame(rivers)) + geom_bar(aes(x=rivers,fill=..count..))+ geom_bar(aes(x=rivers, fill=..count..),colour="black",guide=FALSE)+ scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+ scale_y_continuous("Frequency", expand=c(0,0))+ scale_fill_gradient(low="grey90", high="grey40") p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )

Conditional (factorial) histograms

Basic histogram	Plain histogram

ggplot(data = iris) + geom_bar(aes(x = Sepal.Length, fill = Species), position = "identity") # OR ggplot(data = iris) + geom_histogram(aes(x = Sepal.Length, fill = Species), position = "identity") #Conditional histogram p <- ggplot(data=iris) + geom_bar(aes(x=Sepal.Length, fill=Species), position="identity")+ scale_x_continuous("Sepal length (mm)", expand=c(0,0))+ scale_y_continuous("Frequency", expand=c(0,0))+ scale_fill_grey() p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Basic histogram	Plain histogram

ggplot(data = iris) + geom_bar(aes(x = Sepal.Length, fill = Species), position = "dodge") # OR ggplot(data = iris) + geom_histogram(aes(x = Sepal.Length, fill = Species), position = "dodge") #Transparent Conditional Histogram p <-ggplot(data=iris)+ geom_bar(aes(x=Sepal.Length,fill=Species), alpha=0.5, stat="bin", position="identity")+ geom_step(aes(x=Sepal.Length, colour=Species,fill=Species),stat="bin", position=position_identity())+#, scale_x_continuous("Sepal length (mm)", expand=c(0,0))+ scale_y_continuous("Frequency", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )

Density plots - geom_density & stat_density

Univariate density plots

Basic density plot	Plain density plot

ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers)) p <- ggplot(data=data.frame(rivers)) + geom_density(aes(x=rivers),colour='black',fill='grey90')+ scale_x_continuous("Length of rivers (miles)", expand=c(0,0))+ scale_y_continuous(expression(paste("Density (",phantom() %% 10^-4,")")), expand=c(0,0), labels=function(x){format(x10000,nsmall=1,scientific=FALSE)}) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=12), axis.line = element_line(), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Basic smoother density plot	Plain smoother density plot

ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers), adjust = 5) myF <- function(x) { format(x * 10000, nsmall = 1, scientific = FALSE) } p <- ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers), adjust = 5, colour = "black", fill = "grey90") + scale_x_continuous("Length of rivers (miles)", expand = c(0, 0)) + scale_y_continuous("Density (/10000)", expand = c(0, 0), labels = myF) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y = element_text(size = 15, vjust = 0, angle = 90), axis.text.y = element_text(size = 12), axis.title.x = element_text(size = 15, vjust = -1), axis.text.x = element_text(size = 12), axis.line = element_line(), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))

Basic smoother density plot

Plain smoother density plot

ggplot(data = data.frame(rivers)) + geom_density(aes(x = rivers)) +
    scale_x_continuous(trans = "log10")

p <- ggplot(data=data.frame(rivers)) +
  geom_density(aes(x=rivers),colour='black',fill='grey90')+
  scale_x_continuous("Length of rivers (miles)", expand=c(0,0),
    trans="log10", breaks=c(250,500,1000,2000,3000),label=c(250,500,1000,2000,3000))+
  scale_y_continuous("Density", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
        )

Factorial density

Basic factorial density plot	Plain factorial density plot

ggplot(data = iris) + geom_density(aes(x = Sepal.Length, colour = Species)) # Plain conditional density plot p <- ggplot(data=iris) + geom_density(aes(x=Sepal.Length, colour=Species))+ scale_x_continuous("Sepal length (mm)", expand=c(0,0))+ scale_y_continuous("Density", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=10), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )
Basic factorial density plot	Plain factorial density plot

# Conditional density plot ggplot(data = iris) + geom_density(aes(x = Sepal.Length, fill = Species)) # Plain conditional density plot p <- ggplot(data=iris) + geom_density(aes(x=Sepal.Length,fill=Species), alpha=0.4, colour=NA)+ geom_density(aes(x=Sepal.Length,fill=Species, colour=Species), alpha=0.0, show_guide=FALSE)+ scale_x_continuous("Sepal length (mm)", expand=c(0,0))+ scale_y_continuous("Density", expand=c(0,0)) p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), axis.title.y=element_text(size=15, vjust=0,angle=90), axis.text.y=element_text(size=12), axis.title.x=element_text(size=15,vjust=-1), axis.text.x=element_text(size=10), axis.line = element_line(), legend.position=c(1,1),legend.justification=c(1,1), plot.margin=unit(c(0.5,0.5,2,2),"lines") )

Line graphs - geom_line

Basic line graph

Plain line graph

ggplot(BOD) + geom_line(aes(y = demand, x = Time))

# Plain line plot
p <- ggplot(data=BOD) +
  geom_line(aes(y=demand,x=Time),size=2)+
  scale_x_continuous("Time (days)", expand=c(0.05,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0.05,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Basic line graph

Plain line graph

ggplot(BOD, aes(y = demand, x = Time)) + geom_line() +
    geom_point()

# Plain line plot
p <- ggplot(data=BOD) +
  geom_line(aes(y=demand,x=Time),size=2)+
  scale_x_continuous("Time (days)", expand=c(0.05,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0.05,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Scatterplots - geom_point, geom_line, geom_smooth, stat_smooth & stat_summary

Simple scatterplots

Basic scatterplot

Plain scatterplot

ggplot(BOD) + geom_point(aes(y = demand, x = Time))

# Plain scatterplot
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand,x=Time),size=3)+
  scale_x_continuous("Time (days)", expand=c(0,0), limits=c(0,8))+
  scale_y_continuous("Demand (mg/l)", expand=c(0,0), limits=c(8,20))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Trends and smoothers

Linear trend

Plain linear trend with
95% confidence and

plot of chunk ggplotScatterplotLinearPlain

ggplot(BOD) + geom_point(aes(y = demand, x = Time)) +
    geom_smooth(aes(y = demand, x = Time), method = "lm")

# fit linear model (in order to get confidence bands)
BOD.lm <- lm(demand~Time, data=BOD)
xs <- seq(min(BOD$Time), max(BOD$Time), l=1000)
BOD.predict <- predict(BOD.lm,
  newdata=data.frame(Time=xs),interval='confidence', se=TRUE)
BOD.predict <- data.frame(BOD.predict$fit, se=BOD.predict$se.fit,Time=xs)
# Create a plain scatterplot with smoother and confidence bands
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand, x=Time),colour='grey',size=2)+
  geom_line(aes(y=demand,x=Time),stat="smooth", method="lm")+
  scale_x_continuous("Time (days)", limits=c(1,7))+
  scale_y_continuous("Demand (mg/l)")
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the confidence bands
p+geom_line(data=BOD.predict, aes(y=upr,x=Time), linetype=2)+
  geom_line(data=BOD.predict, aes(y=lwr,x=Time), linetype=2)

Loess smoother

Plain loess smoother with standard error

plot of chunk ggplotScatterplotLoessPlain

ggplot(BOD) + geom_point(aes(y = demand, x = Time)) +
    geom_smooth(aes(y = demand, x = Time), method = "loess",
        degree = 1, se = TRUE)

# Fit a loess smoother
BOD.loess <- loess(demand~Time, data=BOD, degree=1)
xs <- seq(min(BOD$Time), max(BOD$Time), l=1000)
BOD.predict <- predict(BOD.loess,newdata=data.frame(Time=xs), se=TRUE)
BOD.predict <- with(BOD.predict,data.frame(fit,lwr=fit-se.fit,upr=fit+se.fit,Time=xs))
# Plain scatterplot with loess smoother and confidence bands
p <- ggplot(data=BOD) +
  geom_point(aes(y=demand, x=Time),colour='grey',size=2)+
  geom_line(data=BOD.predict,aes(y=fit,x=Time))+
  scale_x_continuous("Time (days)", limits=c(1,7))+
  scale_y_continuous("Demand (mg/l)")
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    axis.title.y=element_text(size=15, vjust=0,angle=90),
    axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add confidence bands
p+geom_line(data=BOD.predict, aes(y=upr,x=Time), linetype=2)+
  geom_line(data=BOD.predict, aes(y=lwr,x=Time), linetype=2)

Generalized additive model (GAM)

Plain gam with 95% CI based on
1.96 (dashed; ggplot default) and qt(0.975,df)

## Error: unused argument (k = 6)

## Error: error in evaluating the argument 'object' in selecting a method for function 'predict': Error: object 'CO2.gam' not found

## Error: object 'CO2.gam' not found

## Error: object 'CO2.predict' not found

## Error: object 'CO2.predict' not found

## Error: object 'CO2.predict1' not found

library(mgcv)
ggplot(CO2, aes(y = uptake, x = conc)) + geom_point() +
    stat_smooth(geom = "smooth", method = "gam", formula = y ~
        s(x, k = 6))

#manually fit a GAM
library(mgcv)
CO2.gam <- gam(uptake~s(conc,k=6), data=CO2)
xs <- seq(min(CO2$conc), max(CO2$conc), l=1000)
CO2.predict <- predict(CO2.gam,newdata=data.frame(conc=xs),se.fit=TRUE)
df <- sum(CO2.gam$edf[-1])
#generate 95% CI predictions based on 1.96SE and degrees of freedom	  
CO2.predict1 <-with(CO2.predict,data.frame(fit,lwr=fit-(1.96*se.fit),
                                                               upr=fit+(1.96*se.fit),conc=xs))
CO2.predict2 <- with(CO2.predict,data.frame(fit,lwr=fit-(qt(0.975,df)*se.fit),
                                                upr=fit+(qt(0.975,df)*se.fit),conc=xs))
p <- ggplot(data=CO2) +
  geom_point(aes(y=uptake, x=conc),colour='grey',size=2)+
  #stat_smooth(aes(y=uptake, x=conc),geom="smooth",method="gam",formula=y~s(x,k=6))+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mg/l)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the trendline and confidence bands
p+geom_line(data=CO2.predict1, aes(y=fit,x=conc),)+
  geom_line(data=CO2.predict1, aes(y=upr,x=conc), linetype=2)+
  geom_line(data=CO2.predict1, aes(y=lwr,x=conc), linetype=2)+
  geom_line(data=CO2.predict2, aes(y=upr,x=conc), linetype=3)+
  geom_line(data=CO2.predict2, aes(y=lwr,x=conc), linetype=3)

Means plot

Plain gam with 95% CI based on
1.96 (dashed; ggplot default) and qt(0.975,df)

p <- ggplot(CO2, aes(y = uptake, x = conc)) + geom_pointrange(stat = "summary",
    fun.data = "mean_cl_boot")
p

p <- ggplot(data=CO2, aes(y=uptake, x=conc)) +
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mg/l)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p <- p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )
# add the errorbars
p+geom_pointrange(stat="summary", fun.data="mean_cl_normal")

Means plot

Plain gam with 95% CI based on
1.96 (dashed; ggplot default) and qt(0.975,df)

ggplot(CO2, aes(y = uptake, x = conc)) + geom_errorbar(stat = "summary",
    fun.data = "mean_cl_boot") + geom_point(stat = "summary",
    fun.y = "mean")

p <- ggplot(data=CO2, aes(y=uptake, x=conc)) +
  stat_smooth(geom="smooth",method="gam",formula=y~s(x,k=6),se=FALSE, colour="gray")+
  geom_errorbar(stat="summary", fun.data="mean_cl_boot") +geom_point(stat="summary",fun.y="mean")+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mg/l)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p +
  theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Bargraphs (dynamite plots) - geom_bar & stat_summary

Simple bargraphs

Bargraph

Plain bargraph

ggplot(warpbreaks, aes(y = breaks, x = tension)) +
    geom_bar(stat = "summary", fun.y = mean) + geom_errorbar(stat = "summary",
    fun.data = "mean_cl_normal", width = 0.1)

p <- ggplot(data=warpbreaks, aes(y=breaks, x=tension)) +
  geom_bar(stat="summary", fun.y=mean,color="black",fill="grey80")+
  geom_errorbar(stat="summary", fun.data="mean_cl_normal", width=0.1)+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of breaks", expand=c(0,0))
p + theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Conditional (factorial) bargraphs

Conditional bargraph

Plain conditional bargraph

plot of chunk ggplotFactorialBargraphPlain

ggplot(warpbreaks, aes(y = breaks, x = tension, group = wool)) +
    geom_bar(aes(fill = wool), position = position_dodge(0.9),
        stat = "summary", fun.y = mean) + geom_errorbar(position = position_dodge(0.9),
    stat = "summary", fun.data = "mean_cl_normal",
    width = 0.1)

p <- ggplot(data = warpbreaks, aes(y = breaks, x = tension,
    group = wool)) + geom_bar(aes(fill = wool), position = "dodge",
    stat = "summary", fun.y = mean) + geom_bar(aes(fill = wool),
    position = "dodge", stat = "summary", fun.y = mean,
    color = "black", show_guide = FALSE) + scale_fill_grey("Wool type") +
    geom_errorbar(position = position_dodge(0.9), stat = "summary",
        fun.data = "mean_cl_normal", width = 0.1) +
    scale_x_discrete("Tension") + scale_y_continuous("Number of wool breaks",
    expand = c(0, 0))
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 1), legend.justification = c(1,
        1), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))

Bar charts - geom_bar

Stacked barchart

Plain stacked barchart

# based on pre-calculated counts
warpbreaks.c <- ddply(warpbreaks, ~wool + tension,
    function(x) data.frame(count = sum(x$breaks)))
ggplot(warpbreaks.c, aes(x = tension, y = count, fill = wool)) +
    geom_bar(stat = "identity") + ylab("Number of breaks")

#based on pre-calculated counts
warpbreaks.c<-ddply(warpbreaks,~wool+tension, function(x) data.frame(count=sum(x$breaks)))
p <- ggplot(warpbreaks.c, aes(x=tension,y=count,fill=wool))+
  geom_bar(aes(fill=wool), stat='identity')+
  geom_bar(aes(fill=wool), stat='identity', colour="black",show_guide=FALSE)+
  scale_fill_grey("Wool type")+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of wool breaks", expand=c(0,0))
p +
  theme(panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    axis.title.y=element_text(size=15, vjust=0,angle=90),
    axis.text.y=element_text(size=12),
    axis.title.x=element_text(size=15,vjust=-1),
    axis.text.x=element_text(size=10),
    axis.line = element_line(),
    legend.position=c(1,1),legend.justification=c(1,1),
    plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Horizontal stacked barchart

Plain horizontal stacked barchart

# based on pre-calculated counts
warpbreaks.c <- ddply(warpbreaks, ~wool + tension,
    function(x) data.frame(count = sum(x$breaks)))
ggplot(warpbreaks.c, aes(x = tension, y = count, fill = wool)) +
    geom_bar(stat = "identity") + ylab("Number of breaks") +
    coord_flip()

#based on pre-calculated counts
warpbreaks.c<-ddply(warpbreaks,~wool+tension, function(x) data.frame(count=sum(x$breaks)))
p <- ggplot(warpbreaks.c, aes(x=tension,y=count,fill=wool))+ coord_flip()+
  geom_bar(aes(fill=wool), stat='identity')+
  geom_bar(aes(fill=wool), stat='identity', colour="black",show_guide=FALSE)+
  scale_fill_grey("Wool type")+
  scale_x_discrete("Tension")+
  scale_y_continuous("Number of wool breaks", expand=c(0.05,0))
p +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        axis.line = element_line(),
        legend.position=c(1,1),legend.justification=c(1,1),
        plot.margin=unit(c(0.5,0.5,2,2),"lines")
  )

Interaction plots - geom_point, geom_line, geom_smooth

Interaction plot

Plain interaction plot

plot of chunk ggplotInteractionPlotPlain

ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_point() + geom_smooth(method = "lm")

p <- ggplot(ToothGrowth, aes(y = len, x = dose, linetype = supp)) +
    geom_point(aes(shape = supp)) + geom_smooth(method = "lm") +
    scale_linetype_manual(name = "Suppliment type",
        values = c(1, 2), breaks = c("OJ", "VC"), labels = c("Orange juice",
            "Vitamine C")) + scale_shape_manual(name = "Suppliment type",
    values = c(21, 16), breaks = c("OJ", "VC"), labels = c("Orange juice",
        "Vitamine C")) + scale_x_continuous("Dose (mg)",
    labels = comma) + scale_y_continuous("Tooth length (mm)",
    expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))

Interaction plot with errorbars

Plain interaction plot with errorbars

plot of chunk ggplotInteractionPlotPlain2

ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot") +
    geom_point(stat = "summary", fun.y = "mean") +
    geom_line(stat = "summary", fun.y = "mean")

p <- ggplot(ToothGrowth, aes(y = len, x = dose, group = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        width = 0.05) + geom_line(aes(linetype = supp),
    stat = "summary", fun.y = "mean") + geom_point(aes(shape = supp,
    fill = supp), stat = "summary", fun.y = "mean") +
    scale_shape_manual(name = "Suppliment type", values = c(21,
        16), breaks = c("OJ", "VC"), labels = c("Orange juice",
        "Vitamine C")) + scale_fill_manual(name = "Suppliment type",
    values = c("white", "black"), breaks = c("OJ",
        "VC"), labels = c("Orange juice", "Vitamine C")) +
    scale_linetype_manual(name = "Suppliment type",
        values = c(1, 2), breaks = c("OJ", "VC"), labels = c("Orange juice",
            "Vitamine C")) + scale_x_continuous("Dose (mg)",
    labels = comma) + scale_y_continuous("Tooth length (mm)",
    expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"),
    legend.key = element_blank())

Interaction plot with dodge

Plain interaction plot with dodge

plot of chunk ggplotInteractionPlotPlain4

ggplot(ToothGrowth, aes(y = len, x = dose, colour = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        position = position_dodge(0.2)) + geom_point(stat = "summary",
    fun.y = "mean", position = position_dodge(0.2)) +
    geom_line(stat = "summary", fun.y = "mean", position = position_dodge(0.2))

p <- ggplot(ToothGrowth, aes(y = len, x = dose, group = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        width = 0.05, position = position_dodge(0.2)) +
    geom_line(aes(linetype = supp), stat = "summary",
        fun.y = "mean", position = position_dodge(0.2)) +
    geom_point(aes(shape = supp, fill = supp), size = 3,
        stat = "summary", fun.y = "mean", position = position_dodge(0.2)) +
    scale_shape_manual(name = "Suppliment type", values = c(21,
        16), breaks = c("OJ", "VC"), labels = c("Orange juice",
        "Vitamine C")) + scale_fill_manual(name = "Suppliment type",
    values = c("white", "black"), breaks = c("OJ",
        "VC"), labels = c("Orange juice", "Vitamine C")) +
    scale_linetype_manual(name = "Suppliment type",
        values = c(1, 2), breaks = c("OJ", "VC"), labels = c("Orange juice",
            "Vitamine C")) + scale_x_continuous("Dose (mg)",
    labels = comma) + scale_y_continuous("Tooth length (mm)",
    expand = c(0.05, 0), labels = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"),
    legend.key = element_blank())

Scatterplot matrix - plotmatrix

Means plot

Plain gam with 95% CI based on
1.96 (dashed; ggplot default) and qt(0.975,df)

p <- plotmatrix(iris) + geom_point(aes(colour = Species))

library(grid)
library(scales)
p <- ggplot(ToothGrowth, aes(y = len, x = dose, group = supp)) +
    geom_errorbar(stat = "summary", fun.data = "mean_cl_boot",
        width = 0.05, position = position_dodge(0.2)) +
    geom_line(aes(linetype = supp), stat = "summary",
        fun.y = "mean", position = position_dodge(0.2)) +
    geom_point(aes(shape = supp, fill = supp), size = 3,
        stat = "summary", fun.y = "mean", position = position_dodge(0.2)) +
    scale_shape_manual(name = "Suppliment type", values = c(21,
        16), breaks = c("OJ", "VC"), labels = c("Orange juice",
        "Vitamine C")) + scale_fill_manual(name = "Suppliment type",
    values = c("white", "black"), breaks = c("OJ",
        "VC"), labels = c("Orange juice", "Vitamine C")) +
    scale_linetype_manual(name = "Suppliment type",
        values = c(1, 2), breaks = c("OJ", "VC"), labels = c("Orange juice",
            "Vitamine C")) + # scale_fill_grey('Wool type')+
scale_x_continuous("Dose (mg)", label = comma) + scale_y_continuous("Tooth length (mm)",
    expand = c(0.05, 0), label = comma)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(), panel.border = element_blank(),
    axis.title.y = element_text(size = 15, vjust = 0,
        angle = 90), axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15, vjust = -1),
    axis.text.x = element_text(size = 10), axis.line = element_line(),
    legend.position = c(1, 0.1), legend.justification = c(1,
        0), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"),
    legend.key = element_blank())

2D grid of panels - facet_grid

Grid of panels

Plain grid of panels

ggplot(CO2, aes(y = uptake, x = conc)) + geom_smooth() +
    geom_point() + facet_grid(Type ~ Treatment)

library(grid)
library(scales)
# Create a new instance of the dataset to facilitate more informative panel titles
CO2.a <- CO2
# Re-define the factor labels
CO2.a$Treatment <- factor(CO2.a$Treatment, labels=c(expression(paste("Non-Chilled ", (symbol("\076")*15 * degree * C), sep="")),
expression(paste("Chilled ",(symbol("\074")*5 * degree * C), sep=""))))
CO2.a$Type <- factor(CO2.a$Type, labels=c("Origin:Quebec", "Origin:Mississippi"))
p<-ggplot(CO2.a,aes(y=uptake,x=conc))+
  geom_ribbon(aes(ymin=..ymin.., ymax=..ymax..),linetype=2,fill="transparent",colour="black",stat='smooth',method='loess')+
  geom_smooth(se=FALSE)+
  geom_point()+
  facet_grid(Type~Treatment, labeller=label_parsed)+
  scale_x_continuous(expression(paste("Ambient ",CO[2]," concentration (mg/l)", sep="")))+
  scale_y_continuous(expression(paste(CO[2]," uptake rate (",mu*mol/m^2/sec,")", sep="")))
p +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_rect(fill="transparent",colour="black"),
        axis.title.y=element_text(size=15, vjust=0,angle=90),
        axis.text.y=element_text(size=12),
        axis.title.x=element_text(size=15,vjust=-1),
        axis.text.x=element_text(size=10),
        #axis.line = element_line(),
    strip.background=element_rect(fill="transparent", colour="black"),
        #legend.position=c(1,0.2),legend.justification=c(1,0),
        plot.margin=unit(c(0.5,0.5,2,2),"lines"),
    legend.key=element_blank()
  )

Multiple graphs per graphic

Viewports

Grid of panels

Plain grid of panels

## Error: argument is of length zero

## Error: argument is of length zero

## Error: argument is of length zero

p1 <- ggplot(CO2, aes(y = uptake, x = Treatment, fill = Type)) + geom_boxplot()
p2 <- ggplot(CO2, aes(y = uptake, x = conc, colour = Type)) + geom_smooth()
p3 <- ggplot(CO2, aes(x = uptake, fill = Type)) + geom_density(alpha = 0.4) + facet_grid(~Treatment)

grid.newpage()
pushViewport(viewport(layout = grid.layout(4, 5)))

pushViewport(viewport(layout.pos.col = 1:2, layout.pos.row = 1:2))
print(p1 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col = 3:4, layout.pos.row = 1:2))
print(p2 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

pushViewport(viewport(layout.pos.col = 1:4, layout.pos.row = 3:4))
print(p3 + theme(legend.position = "none"), newpage = FALSE)
popViewport(1)

library(gridExtra)
tmp <- ggplot_gtable(ggplot_build(p3))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
# legend <- gTree(legend, vp=viewport(layout.pos.col=5, layout.pos.row=1:4)) using grid.arrange for
# convenience could also manually push viewports grid.arrange(arrangeGrob(p3 +
# theme(legend.position='none'), p2 + theme(legend.position='none'), main ='this is a title', left =
# 'This is my global Y-axis title'), legend, widths=unit.c(unit(1, 'npc') - legend$width,
# legend$width), nrow=1)

pushViewport(viewport(layout.pos.col = 5, layout.pos.row = 1:4))
# print(p3+theme(keep='legend_box'), newpage=FALSE) print(p3+theme(keep='legend_box'), newpage=FALSE)
grid.draw(legend)
popViewport(0)

#Create a new plain element
element_plain <- function (base_size = 12,base_family=""){ #
  structure(list(
    axis.line = element_line(),
    #axis.text.x = element_text(family = base_family, size = base_size * 0.8, vjust = 1, lineheight = 0.9), 
    #axis.text.y = element_text(family = base_family, size = base_size * 0.8, hjust = 1, lineheight = 0.9),
    #axis.ticks = element_line(colour = "black", size = 0.2), 
    #axis.title.x = element_text(family = base_family, size = base_size, vjust = 0.5), 
    #axis.title.y = element_text(family = base_family, size = base_size, vjust = 0.5, angle = 90), 
    #axis.ticks.length = unit(0.15, "cm"), 
    #axis.ticks.margin = unit(0.1, "cm"), 
    #legend.background=element_blank(),
    #legend.margin=unit(0.2,"cm"),
    #legend.key = element_rect(colour = "grey80"),
    #legend.key.size=unit(1.2,"lines"),
    #legend.key.height=NULL,
    #legend.key.width=NULL,
    #legend.text=element_text(family = base_family, size = base_size * 0.8),
    #legend.text.align=NULL,
    #legend.title=element_text(family = base_family, face = "bold", size = base_size * 0.8, hjust = 0),
    #legend.title.align=NULL,
    #legend.position = "right",
    #legend.direction=NULL,
    #legend.justification="center",
    #legend.box=NULL,
    #panel.background = element_blank(), 
    #panel.border = element_blank(), 
    #panel.grid.major = element_blank(), 
    #panel.grid.minor = element_blank(), 
    #panel.margin = unit(0.25, "lines"),
    #strip.background=element_rect(fill="transparent", colour="black"),
    #strip.text.x=element_text(family = base_family, size = base_size * 0.8),
    #strip.text.y=element_text(family = base_family, size = base_size * 0.8, angle = -90),
    #plot.background = element_blank(),
    plot.margin=unit(c(0.5,0.5,1,1),"lines"),
    plot.title=element_blank()
  ), class = "theme")
}
#Construct the boxplots
p1 <- ggplot(CO2, aes(y=uptake,x=Treatment,fill=Type))+
  geom_boxplot(alpha=0.4)+scale_fill_manual(values=c("white","grey"))+
  element_plain()

p2 <- ggplot(CO2, aes(y=uptake, x=conc, linetype=Type))+
 geom_smooth(color="black",se=FALSE)+
 geom_smooth(color="black", show_guide=FALSE)+
 scale_linetype()+
 element_plain()

p3<-ggplot(CO2, aes(x=uptake, fill=Type)) +
 geom_density(alpha=0.4, colour=NA)+
 geom_density(alpha=0, show_guide=FALSE)+
 facet_grid(~Treatment)+
 scale_fill_manual(values=c("white","grey"))+
 element_plain()

library(gridExtra)
tmp <- ggplot_gtable(ggplot_build(p3))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]

tmp <- ggplot_gtable(ggplot_build(p2))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend1 <- tmp$grobs[[leg]]
##legend <- gTree(legend, vp=viewport(layout.pos.col=5, layout.pos.row=1:4))

 grid.newpage()
  pushViewport(viewport(layout=grid.layout(4,5)))

  pushViewport(viewport(layout.pos.col=1:2, layout.pos.row=1:2))
  print(p1+theme(legend.position="none"), newpage=FALSE)
  popViewport(1)

  pushViewport(viewport(layout.pos.col=3:4, layout.pos.row=1:2))
  print(p2+theme(legend.position="none"), newpage=FALSE)
  popViewport(1)

  pushViewport(viewport(layout.pos.col=1:4, layout.pos.row=3:4))
  print(p3+theme(legend.position="none"), newpage=FALSE)
  popViewport(1)

pushViewport(viewport(layout.pos.col=5, layout.pos.row=1:4))
grid.draw(legend)
grid.draw(legend1)
  popViewport(0)

- axis ticks etc

Tutorial 6 - The Grammar of Graphics in R (ggplot2)

The Grammar of Graphics

Geometric objects - geom_ and stat_

Primary geometric objects

geom_bar and stats_bin

id="boxplot"geom_boxplot and stat_boxplot

geom_density and stat_density

geom_point

geom_line

geom_smooth and stat_smooth

geom_tile

geom_contour and stat_contour

Secondary geometric objects

geom_segment

geom_ribbon

geom_errorbar

Coordinate system - coord

Altering the axes scales via the coordinate system

Transformers

trans_new

*_trans transformers

Scales

Scaling the x and y values (scale_x_)

Scaling the size of geoms (scale_size_)

Scaling the shape of geoms (scale_shape_)

Scaling the linetype associated with geoms (scale_linetype_)

Scaling the colour (or fill) associated with geoms (scale_colour_ & scale_fill_)

Scaling the alpha level of colour associated with geoms (scale_alpha_)

Facets (panels)

Themes

Examples

Exploring distributions

Boxplots - geom_boxplot & stat_boxplot

Univariate boxplots

Conditional (factorial) boxplots

Violin Plot - geom_violin

Histograms - geom_histogram, geom_bar & stat_bin

Univariate histograms

Conditional (factorial) histograms

Density plots - geom_density & stat_density

Univariate density plots

Factorial density

Line graphs - geom_line

Scatterplots - geom_point, geom_line, geom_smooth, stat_smooth & stat_summary

Simple scatterplots

Trends and smoothers

Bargraphs (dynamite plots) - geom_bar & stat_summary

Simple bargraphs

Conditional (factorial) bargraphs

Bar charts - geom_bar

Interaction plots - geom_point, geom_line, geom_smooth

Scatterplot matrix - plotmatrix

Multi-panel (facetted) plot - facet_grid & facet_wrap

2D grid of panels - facet_grid

Multiple graphs per graphic

Viewports

Welcome to the end of this tutorial

Geometric objects - `geom_` and `stat_`

`geom_bar` and `stats_bin`

id="boxplot"`geom_boxplot` and `stat_boxplot`

`geom_density` and `stat_density`

`geom_point`

`geom_line`

`geom_smooth` and `stat_smooth`

`geom_tile`

`geom_contour` and `stat_contour`

`geom_segment`

`geom_ribbon`

`geom_errorbar`

`trans_new`

`*_trans` transformers

Scaling the x and y values (`scale_x_`)

Scaling the size of geoms (`scale_size_`)

Scaling the shape of geoms (`scale_shape_`)

Scaling the linetype associated with geoms (`scale_linetype_`)

Scaling the colour (or fill) associated with geoms (`scale_colour_` & `scale_fill_`)

Scaling the alpha level of colour associated with geoms (`scale_alpha_`)