Tutorial 3.2 - Matrix algebra
27 Mar 2017
Overview
A matrix represents a collection of numbers arranged in a rectangle. Since statistical procedures are entirely numerical operations, the matrix is the natural representation of analyzable data. By convention, the columns of a matrix represent variables (the characteristics or descriptors such as length, count etc) and the rows represent the sampling units or objects (such as sites, quadrats, individuals, etc).
The general form of a matrix is outlined below along with the nomenclature used to refer to different parts of the matrix.
Such is the importance of matrices in numeric ecology and statistics that they warrant special attention. Even though a researcher performing statistical analyses may not directly perform advanced matrix manipulations, matrices are used extensively behind the scenes of almost all analyses.
A matrix is typically denoted by an uppercase, boldface letter (such as $\textbf{Y}$), although occasionally matrices are also indicated in general elemental form $[y_{ij}]$. $$\textbf{Y} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} &. &. &. & y_{1p}\\ y_{21} & y_{22} &. &. &. & y_{2p}\\ . &. &. &. &. & .\\ . &. &. &. &. & .\\ . &. &. &. &. & .\\ y_{n1} & y_{n2} &. &. &. & y_{np}\\ \end{bmatrix} $$
In R, matrices can be constructed with the matrix function. This function folds a vector (data=) into a nominated number of rows (nrow=) and columns (ncol=), filling by columns (folding the data vertically) unless byrow=TRUE is specified.
> matrix(c(1,5,10,
+          0,4,2,
+          3,6,1,
+          7,4,1),nrow=4,ncol=3)
     [,1] [,2] [,3]
[1,]    1    4    1
[2,]    5    2    7
[3,]   10    3    4
[4,]    0    6    1
> matrix(c(1,5,10,
+          0,4,2,
+          3,6,1,
+          7,4,1),nrow=4,ncol=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    5   10
[2,]    0    4    2
[3,]    3    6    1
[4,]    7    4    1
The order of a matrix is its dimensions. In the example above, the matrix has 4 rows and 3 columns and thus the matrix has order $4\times 3$. In R, the order of a matrix is confirmed using the dim() function.
> Y <- matrix(c(1,5,10,
+               0,4,2,
+               3,6,1,
+               7,4,1),nrow=4,ncol=3)
> dim(Y)
[1] 4 3
Matrices can be any rectangular shape. Some of the common types of matrices are listed below.
- square matrix where there are the same number of rows as columns.
We refer to this as either a square matrix of order $n$ or more simply, a matrix of order $n$.
$$
\textbf{Y}_{nn}=[y_{ij}]=
\begin{bmatrix}
y_{11} & y_{12} &. &. &. & y_{1n}\\
y_{21} & y_{22} &. &. &. & y_{2n}\\
.& & & & & .\\
.& & & & & .\\
.& & & & & .\\
y_{n1} & y_{n2} &. &. &. & y_{nn}\\
\end{bmatrix}
$$
Elements with matching subscripts ($y_{11}$, $y_{22}$, etc) are called the diagonals
- diagonal matrix is a square matrix in which the off-diagonal elements are all zero.
$$ \begin{bmatrix} 3 & 0 & 0\\ 0 & 5 & 0\\ 0 & 0 & 1\\ \end{bmatrix} $$
> diag(c(3,5,1))
     [,1] [,2] [,3]
[1,]    3    0    0
[2,]    0    5    0
[3,]    0    0    1
The sum of the diagonals is called the trace.
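In R, the diag() function also works in the other direction: applied to an existing matrix it extracts the diagonal elements, so the trace is simply their sum. A minimal illustration:

```r
# diag() applied to an existing matrix extracts the diagonal elements,
# so the trace is just the sum of that vector
D <- diag(c(3, 5, 1))
diag(D)       # the diagonal elements: 3 5 1
sum(diag(D))  # the trace: 9
```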
- identity matrix (I) is a diagonal matrix in which all the diagonals are equal to 1.
$$ \textbf{I}= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix} $$
> diag(1,nrow=3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
In matrix algebra, multiplying by an identity matrix is the equivalent of multiplying by 1 in ordinary algebra.
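This is easily verified in R: pre-multiplying a matrix by a conformable identity matrix returns the matrix unchanged.

```r
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
I <- diag(1, nrow = 2)
# Multiplying by the identity matrix leaves A unchanged
all(I %*% A == A)  # TRUE
```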
- scalar matrix is a diagonal matrix in which all the diagonals are equal to the same constant value; it is equivalent to an identity matrix multiplied by a scalar.
$$ \textbf{3I}= \begin{bmatrix} 3 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 3\\ \end{bmatrix} $$
> diag(3,nrow=3)
     [,1] [,2] [,3]
[1,]    3    0    0
[2,]    0    3    0
[3,]    0    0    3
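Multiplying by a scalar matrix is therefore equivalent to multiplying every element by the constant, which can be confirmed in R:

```r
A <- matrix(1:9, nrow = 3)
S <- diag(3, nrow = 3)   # scalar matrix with 3 on the diagonal
# Pre-multiplying by S is the same as the element-wise product 3 * A
all(S %*% A == 3 * A)  # TRUE
```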
- triangular matrix when a square matrix is symmetrical, that is $y_{12}$ is the same as $y_{21}$ and so on, the matrix can be stored as a triangular matrix in which either the upper or lower corner of the matrix is omitted.
$$ \begin{bmatrix} y_{11} & \\ y_{21} & y_{22}\\ \end{bmatrix} $$
> Y <- matrix(c(1,2,3,4,
+               5,6,7,8,
+               9,10,11,12,
+               13,14,15,16),4,4)
> lower.tri(Y)
      [,1]  [,2]  [,3]  [,4]
[1,] FALSE FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE FALSE
[3,]  TRUE  TRUE FALSE FALSE
[4,]  TRUE  TRUE  TRUE FALSE
> lower.tri(Y)*Y
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    2    0    0    0
[3,]    3    7    0    0
[4,]    4    8   12    0
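The complementary upper.tri() function works the same way for the upper corner, and setting diag=TRUE in either function retains the diagonal elements as well:

```r
Y <- matrix(1:16, 4, 4)
# Retain the upper corner of the matrix, including the diagonal
upper.tri(Y, diag = TRUE) * Y
```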
- row matrix where there is only a single row $$ \begin{bmatrix} y_{11} & y_{12} & y_{13} &. &. &. & y_{1p}\\ \end{bmatrix} $$
- column matrix where there is only a single column $$ \begin{bmatrix} y_{11}\\ y_{21}\\ .\\ .\\ .\\ y_{n1}\\ \end{bmatrix} $$
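In R, row and column matrices are constructed by fixing nrow=1 or ncol=1 (note that a plain R vector has no dimensions and is not itself a matrix until folded into one):

```r
# A 1 x 4 row matrix
row_mat <- matrix(c(3, 5, 7, 6), nrow = 1)
# A 4 x 1 column matrix
col_mat <- matrix(c(3, 5, 7, 6), ncol = 1)
dim(row_mat)  # 1 4
dim(col_mat)  # 4 1
```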
Transposed matrix
A transposed matrix is created by swapping the rows and columns around. In R, matrix transposing is performed by the t() function.
If a matrix ($\textbf{Y}$) is: $$ \textbf{Y}=[y_{ij}]= \begin{bmatrix} 7 & 18 & -2 & 22\\ -16 & 3 & 55 & 1\\ 9 & -4 & 0 & 31\\ \end{bmatrix} $$ Then the transposed matrix ($\textbf{Y'}$) is: $$ \textbf{Y'}=[y_{ji}]= \begin{bmatrix} 7 & -16 & 9\\ 18 & 3 & -4\\ -2 & 55 & 0\\ 22 & 1 & 31\\ \end{bmatrix} $$
> Y <- matrix(c(7,18,-2,22,
+               -16,3,55,1,
+               9,-4,0,31
+               ),nrow=3,ncol=4,byrow=TRUE)
> Y
     [,1] [,2] [,3] [,4]
[1,]    7   18   -2   22
[2,]  -16    3   55    1
[3,]    9   -4    0   31
> #transpose Y
> t(Y)
     [,1] [,2] [,3]
[1,]    7  -16    9
[2,]   18    3   -4
[3,]   -2   55    0
[4,]   22    1   31
Matrix algebra
Algebraic operations (such as addition and multiplication) can be performed on matrices just like any number, although the operations actually apply to the elements within the matrices: multiplying a matrix by two multiplies each of the values by two rather than duplicating the matrix. Nevertheless, in order to perform such operations, the matrices must be conformable. For addition and subtraction, which are performed element-wise (on each element in turn), the matrices must be of the same order. Matrix multiplication has its own conformability rule, described below; for example, it is not possible to multiply a $5\times 3$ matrix by a $4\times 4$ matrix because the number of columns of the first (3) does not match the number of rows of the second (4).
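Multiplication by a single number (a scalar) needs no conformability check since the scalar is applied to every element. A quick sketch of both points:

```r
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
# Multiplying by a scalar scales every element rather than duplicating the matrix
2 * A
# Element-wise operations on matrices of different order are not conformable;
# adding a 2x3 matrix to a 2x2 matrix raises a "non-conformable arrays" error
tryCatch(A + matrix(1:4, nrow = 2), error = function(e) conditionMessage(e))
```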
Matrix addition
To be conformable for addition two matrices must be of the same order (must have the same number of rows and columns). The resulting matrix has the same number of rows and columns as the original matrices and the elements are the pairwise sums of the original matrix elements.
$$ \textbf{A}= \begin{bmatrix} 1&2&3\\ 4&5&6\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 7&8&9\\ 10&11&12\\ \end{bmatrix} $$ \begin{align*} \textbf{A+B} &= \begin{bmatrix} 1+7 & 2+8 & 3+9\\ 4+10 & 5+11 & 6+12\\ \end{bmatrix}\\ &=\begin{bmatrix} 8 & 10 & 12\\ 14 & 16 & 18\\ \end{bmatrix} \end{align*}
> (A <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> (B <- matrix(c(7,8,9,10,11,12),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
> A+B
     [,1] [,2] [,3]
[1,]    8   10   12
[2,]   14   16   18
Matrix subtraction
Matrix subtraction is actually matrix addition following multiplication of the right-hand matrix by -1.
$$ \textbf{A}= \begin{bmatrix} 1&2&3\\ 4&5&6\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 7&8&9\\ 10&11&12\\ \end{bmatrix} $$ \begin{align*} \textbf{B-A} &=\textbf{B+(-1)A}\\ &=\begin{bmatrix} 7+(-1) & 8+(-2) & 9+(-3)\\ 10+(-4) & 11+(-5) & 12+(-6)\\ \end{bmatrix}\\ &=\begin{bmatrix} 6 & 6 & 6\\ 6 & 6 & 6\\ \end{bmatrix} \end{align*}
> (A <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> (B <- matrix(c(7,8,9,10,11,12),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
> B-A
     [,1] [,2] [,3]
[1,]    6    6    6
[2,]    6    6    6
Matrix multiplication
To be conformable for multiplication the number of columns in the first matrix must be equal to the number of rows in the second matrix.
Formally, if $\textbf{A}$ has order $n\times p$ and $\textbf{B}$ has order $m\times q$, then they are only conformable if $p=m$, and the result is a matrix of order $n\times q$. Each element of the result is obtained by multiplying pairs of elements from the corresponding row of the first matrix and column of the second matrix and then summing these products. $$ \textbf{A}= \begin{bmatrix} 1&2&3\\ 4&5&6\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 1&5\\ -3&2\\ 0&-1\\ \end{bmatrix} $$ \begin{align*} \textbf{AB} &= \begin{bmatrix} (1\times 1)+(2\times -3)+(3\times 0)&(1\times 5)+(2\times 2)+(3\times -1)\\ (4\times 1)+(5\times -3)+(6\times 0)&(4\times 5)+(5\times 2)+(6\times -1)\\ \end{bmatrix}\\ &=\begin{bmatrix} -5&6\\ -11&24\\ \end{bmatrix} \end{align*} In R, matrix multiplication is performed by the special %*% operator.
> (A <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> (B <- matrix(c(1,-3,0,5,2,-1),nrow=3,ncol=2))
     [,1] [,2]
[1,]    1    5
[2,]   -3    2
[3,]    0   -1
> A %*% B
     [,1] [,2]
[1,]   -5    6
[2,]  -11   24
Element wise products
Hadamard product
For two matrices of the same order, the Hadamard product multiplies corresponding cells together to create a product matrix of the same order as the original matrices.
$$ \textbf{A}= \begin{bmatrix} 1&2&3\\ 4&5&6\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 7&8&9\\ 10&11&12\\ \end{bmatrix} $$ \begin{align*} \textbf{A}\odot\textbf{B} &= \begin{bmatrix} 1\times 7 & 2\times 8 & 3\times 9\\ 4\times 10 & 5\times 11 & 6\times 12\\ \end{bmatrix}\\ &=\begin{bmatrix} 7 & 16 & 27\\ 40 & 55 & 72\\ \end{bmatrix} \end{align*}
> (A <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3, byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> (B <- matrix(c(7,8,9,10,11,12),nrow=2,ncol=3, byrow=TRUE))
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
> A*B
     [,1] [,2] [,3]
[1,]    7   16   27
[2,]   40   55   72
Kronecker product
In the Kronecker (direct) product of two matrices, each of the elements of the first matrix is multiplied by the entire second matrix. Unlike regular matrix multiplication, there is no conformability requirement: the Kronecker product of an $n\times p$ matrix and an $m\times q$ matrix is defined for any orders and has order $nm\times pq$.
$$ \textbf{A}= \begin{bmatrix} 1&2&3\\ 4&5&6\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 1&5\\ -3&2\\ 0&-1\\ \end{bmatrix} $$ \begin{align*} \textbf{A}\otimes\textbf{B} &= \begin{bmatrix} 1\times\textbf{B}& 2\times\textbf{B}& 3\times\textbf{B}\\ 4\times\textbf{B}& 5\times\textbf{B}& 6\times\textbf{B}\\ \end{bmatrix}\\ &=\begin{bmatrix} 1&5&2&10&3&15\\ -3&2&-6&4&-9&6\\ 0&-1&0&-2&0&-3\\ 4&20&5&25&6&30\\ -12&8&-15&10&-18&12\\ 0&-4&0&-5&0&-6\\ \end{bmatrix} \end{align*}
> (A <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> (B <- matrix(c(1,-3,0,5,2,-1),nrow=3,ncol=2))
     [,1] [,2]
[1,]    1    5
[2,]   -3    2
[3,]    0   -1
> A%x%B
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    2   10    3   15
[2,]   -3    2   -6    4   -9    6
[3,]    0   -1    0   -2    0   -3
[4,]    4   20    5   25    6   30
[5,]  -12    8  -15   10  -18   12
[6,]    0   -4    0   -5    0   -6
Matrix modifications via matrix algebra
Multiplying rows or columns
Let's say we had a $3\times 4$ matrix and wanted to multiply the first row by 2, the second row by 3 and leave the third row alone (multiply it by 1). This can be performed by pre-multiplying the matrix by a specific diagonal matrix.
$$ \textbf{A}= \begin{bmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12\\ \end{bmatrix} ~,~ \textbf{D}= \begin{bmatrix} 2&0&0\\ 0&3&0\\ 0&0&1\\ \end{bmatrix} $$ \begin{align*} \textbf{DA} &= \begin{bmatrix} 2&4&6&8\\ 15&18&21&24\\ 9&10&11&12\\ \end{bmatrix} \end{align*}
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> (D <- diag(c(2,3,1)))
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    3    0
[3,]    0    0    1
> D%*%A
     [,1] [,2] [,3] [,4]
[1,]    2    4    6    8
[2,]   15   18   21   24
[3,]    9   10   11   12
Swapping entire rows or columns
To interchange an entire row (or column) of a matrix we multiply a modified identity matrix by the matrix. So if we wanted to swap the second and third rows in a $3\times4$ matrix:
$$ \textbf{A}= \begin{bmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12\\ \end{bmatrix} ~,~ \textbf{I}= \begin{bmatrix} 1&0&0\\ 0&0&1\\ 0&1&0\\ \end{bmatrix} $$ \begin{align*} \textbf{IA} &= \begin{bmatrix} 1&2&3&4\\ 9&10&11&12\\ 5&6&7&8\\ \end{bmatrix} \end{align*}
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> (I <- matrix(c(1,0,0,0,0,1,0,1,0), nrow=3, byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    0    1
[3,]    0    1    0
> I%*%A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    9   10   11   12
[3,]    5    6    7    8
Adding or subtracting rows or columns together
A modified identity matrix can also be used to cause rows (or columns) to be summed (or subtracted). If, for example, we wanted to modify a $3\times4$ matrix such that each successive row represented a progressive accumulation from the original matrix:
$$ \textbf{A}= \begin{bmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12\\ \end{bmatrix} ~,~ \textbf{I}= \begin{bmatrix} 1&0&0\\ 1&1&0\\ 1&1&1\\ \end{bmatrix} $$ \begin{align*} \textbf{IA} &= \begin{bmatrix} 1&2&3&4\\ 6&8&10&12\\ 15&18&21&24\\ \end{bmatrix} \end{align*}
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> (I <- matrix(c(1,0,0,1,1,0,1,1,1), nrow=3, byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    1    1    0
[3,]    1    1    1
> I%*%A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    6    8   10   12
[3,]   15   18   21   24
Applying element-wise functions
The apply family
The apply family of functions applies a function to one of the margins of an array (or matrix). For example, we could calculate the mean of each row (MARGIN=1) or the sum of each column (MARGIN=2).
$$ \textbf{A}= \begin{bmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12\\ \end{bmatrix} $$
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> apply(A,MARGIN=1,mean)
[1]  2.5  6.5 10.5
> apply(A,MARGIN=2,sum)
[1] 15 18 21 24
The apply function is therefore useful when we want to apply a non-constant function throughout a matrix. For example, we might want to center each value in a matrix by its column mean. In other words, we want to subtract the mean of each column from each value in that column.
> A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE)
> apply(A,MARGIN=2,function(x) x-mean(x))
     [,1] [,2] [,3] [,4]
[1,]   -4   -4   -4   -4
[2,]    0    0    0    0
[3,]    4    4    4    4
Note that missing values propagate through most summary functions; additional arguments (such as na.rm=TRUE) can be passed on to the function via apply.
> A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE)
> (A[1,3] <- NA)
[1] NA
> apply(A,MARGIN=1,mean)
[1] NA 6.5 10.5
> apply(A,MARGIN=1,mean, na.rm=TRUE)
[1] 2.333333 6.500000 10.500000
sweep element-wise operations
The sweep function extends the flexibility of apply a little further by allowing us to 'sweep' a vector of summary statistics (STATS=) through a margin of the array using a nominated function (FUN=). For example, we could use sweep to subtract (the FUN) the mean of each column (the STATS) so as to center the data by column (as above):
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
[,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 5 6 7 8 [3,] 9 10 11 12
> sweep(A,MARGIN=2,STATS=apply(A,2,mean),FUN='-')
[,1] [,2] [,3] [,4] [1,] -4 -4 -4 -4 [2,] 0 0 0 0 [3,] 4 4 4 4
$$ \textbf{A}= \begin{bmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12\\ \end{bmatrix} ~,~ \textbf{B}= \begin{bmatrix} 3&5&7&6\\ \end{bmatrix} $$
> (A <- matrix(1:12,nrow=3,ncol=4,byrow=TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> (B <- c(3,5,7,6))
[1] 3 5 7 6
> sweep(A,MARGIN=2,STATS=B,FUN='-')
     [,1] [,2] [,3] [,4]
[1,]   -2   -3   -4   -2
[2,]    2    1    0    2
[3,]    6    5    4    6
Ordinary least squares via matrix algebra
Ordinary least squares regression is an elegant estimating technique that solves a series of simultaneous equations so as to estimate a number of parameters (typically an intercept and one or more slopes).
$$ \textbf{Y}=\textbf{X}\beta+\epsilon $$ $$ \hat{\beta} = (\textbf{X'}\textbf{X})^{-1}\textbf{X'}\textbf{Y} $$ And the standard errors as the square-root of the diagonals from the variance-covariance matrix $$ Var(\hat{\beta}|\textbf{X}) = \frac{1}{n-k}\hat{\epsilon}'\hat{\epsilon}(\textbf{X'}\textbf{X})^{-1} $$ where $n$ is the number of observations, $k$ is the number of columns in the model matrix and $\hat{\epsilon}$ is the vector of residuals.
> y <- c(3,2.5,6,5.5,9,8.6,12)
> x <- 0:6
> X <- model.matrix(~x)
> X
  (Intercept) x
1           1 0
2           1 1
3           1 2
4           1 3
5           1 4
6           1 5
7           1 6
attr(,"assign")
[1] 0 1
> beta <- solve(t(X) %*% X) %*% t(X) %*% y
> beta
                [,1]
(Intercept) 2.135714
x           1.507143
> # standard error
> ## Calculate the residuals
> epsilon <- as.matrix(y - beta[1] - beta[2]*x)
> ## Calculate Variance-Covariance Matrix
> VCV <- 1/(length(y)-ncol(X)) * as.numeric(t(epsilon)%*%epsilon) * solve(t(X)%*%X)
> ## Standard errors of the estimated coefficients
> sqrt(diag(VCV))
(Intercept)           x
  0.7849672   0.2177107
> # Compare to lm
> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
      1       2       3       4       5       6       7
 0.8643 -1.1429  0.8500 -1.1571  0.8357 -1.0714  0.8214

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1357     0.7850   2.721 0.041737 *
x             1.5071     0.2177   6.923 0.000965 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.152 on 5 degrees of freedom
Multiple R-squared:  0.9055,	Adjusted R-squared:  0.8866
F-statistic: 47.92 on 1 and 5 DF,  p-value: 0.0009648
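As a cross-check on the calculation above, the same estimates can be obtained by solving the normal equations $\textbf{X'X}\hat{\beta}=\textbf{X'y}$ directly with crossprod(), which avoids forming the explicit inverse. This is just the same estimator written in a numerically steadier form, not a different method:

```r
y <- c(3, 2.5, 6, 5.5, 9, 8.6, 12)
x <- 0:6
X <- model.matrix(~x)
# Solve X'X beta = X'y without explicitly inverting X'X
beta <- solve(crossprod(X), crossprod(X, y))
beta  # same estimates as above: intercept 2.1357, slope 1.5071
```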