Matrices

Review basic matrix algebra
Published August 28, 2024


1 Definition

1.1 Vector

  • A list of numbers (commonly used within the field of computer science).

  • A line segment that has both magnitude and direction (commonly used within the field of physics).

  • If a vector consists of only a single number, it is termed a scalar.
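As a minimal sketch, a vector can be created in R with the function c() (the variable name v here is an arbitrary choice):

```r
# create a numeric vector of four elements
v <- c(2, 0, 5, 9)
v
# [1] 2 0 5 9

# the number of elements (the vector's length)
length(v)
# [1] 4
```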

1.2 Matrix

  • A rectangular array of elements (numbers or expressions).

  • It has dimensions \(m \times n\) (known also as matrix order), where \(m\) is the number of rows and \(n\) is the number of columns.

  • It is usually denoted by an uppercase letter.

  • It can also be considered as an array of vectors contained in the same object.

  • Each element can be indexed by the number of its row and column (i.e., the element in the \(i^{th}\) row and \(j^{th}\) column is denoted by \(a_{ij}\)).

  • For example, consider the following matrix \(A\):

    • \(A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}\).

    • \(A\) is a \(3\times 4\) matrix that has \(3\) rows and \(4\) columns.

    • The element \(a_{23}\) is located in the second row and the third column.

  • If the matrix has one row (dimensions \(1\times n\)), it is referred to as a row matrix, for example: \(A = \begin{bmatrix} 2 & 0 & 5 & 9 \end{bmatrix}\).

  • If the matrix has one column (dimensions \(m\times 1\)), it is referred to as a column matrix, for example: \(A = \begin{bmatrix} 2 \\ 0 \\ 5 \\ 9 \end{bmatrix}\).

  • In R, matrices can be created using the function matrix():

    A <- matrix(c(2, 0, 5, 9),  # element values
                nrow = 2,       # specify the number of rows
                byrow = TRUE)   # fill the matrix by rows
    A
         [,1] [,2]
    [1,]    2    0
    [2,]    5    9
    # extract the element in the second row and first column (5)
    # using its row and column index
    
    A[2, 1]
    [1] 5

1.3 Transpose of a matrix

  • For a matrix \(A\) with dimensions \(m\times n\), the transpose of \(A\), denoted as \(A^T\) or \(A'\), is a matrix with dimensions \(n\times m\) formed by interchanging the rows and columns of \(A\) (i.e., the rows of \(A\) become the columns of \(A^T\), and the columns of \(A\) become the rows of \(A^T\)).

  • Example:

    • \(A = \begin{bmatrix} \color{red} 3 & \color{#0466c8} 8 & \color{green} 4 \\ \color{red} 5 & \color{#0466c8} 7 & \color{green} 6 \end{bmatrix} \Rightarrow A^T = \begin{bmatrix} \color{red} 3 & \color{red}5 \\ \color{#0466c8}8 & \color{#0466c8}7 \\ \color{green}4 & \color{green}6 \end{bmatrix}\).
  • R can be used to get the transpose of a matrix using the function t():

    A <- matrix(c(3, 8, 4, 5, 7, 6), 
                nrow = 2, 
                byrow = TRUE)
    A
         [,1] [,2] [,3]
    [1,]    3    8    4
    [2,]    5    7    6
    # get the transpose of the matrix A
    t(A) 
         [,1] [,2]
    [1,]    3    5
    [2,]    8    7
    [3,]    4    6
  • Some properties of the transpose of a matrix:

    • \((A^T)^T = A\).

    • \((A + B)^T = A^T + B^T\), where \(A\) and \(B\) are matrices of the same dimensions.

    • \((cA)^T = cA^T\), where \(c\) is a scalar (constant).
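These properties can be verified numerically in R; the matrices A and B below are illustrative choices, not taken from the text:

```r
A <- matrix(c(3, 8, 4, 5, 7, 6), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)

all(t(t(A)) == A)             # (A^T)^T = A
# [1] TRUE
all(t(A + B) == t(A) + t(B))  # (A + B)^T = A^T + B^T
# [1] TRUE
all(t(2 * A) == 2 * t(A))     # (cA)^T = cA^T, with c = 2
# [1] TRUE
```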

2 Special types of matrices

2.1 Zero matrix

  • It is a matrix having all of its elements equal to zero, for example: \(A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\).

2.2 Square matrix

  • A square matrix has an equal number of rows and columns (i.e., \(m = n\)).

  • The diagonal consists of the elements running from the top-left corner to the bottom-right corner of the matrix.

  • The trace of a square matrix is the sum of the elements of the matrix diagonal.

  • Elements other than the diagonal are termed off-diagonal.

  • Example:

    • \(A = \begin{bmatrix} \color{red} {5} & 9 & 4 & 15 \\ 2 & \color{red} {6} & 3 & 9 \\ 1 & 4 & \color{red} {11} & 21 \\ 10 & 17 & 12 & \color{red} {8} \end{bmatrix}\).

    • The diagonal consists of the elements \(5, 6, 11\) and \(8\).

    • The trace \(= 5+6+11+8 = 30\).

    • The diagonal elements can be extracted from a matrix in R using the function diag():

      A <- matrix(c(5, 9, 4, 15, 2, 6, 3, 9, 1, 4, 11, 21, 10, 17, 12, 8), 
                  nrow = 4, 
                  byrow = TRUE)
      A
           [,1] [,2] [,3] [,4]
      [1,]    5    9    4   15
      [2,]    2    6    3    9
      [3,]    1    4   11   21
      [4,]   10   17   12    8
      diag(A)  # get the diagonal elements of the matrix A
      [1]  5  6 11  8
    • The trace can be calculated in R using the function tr() from the psych package:

      library(psych)
      tr(A)  # get the trace of the matrix A
      [1] 30
  • Symmetric and skew-symmetric square matrices:

    • If \(A^T = A\), matrix \(A\) is referred to as symmetric.

    • If \(A^T = -A\), matrix \(A\) is referred to as skew-symmetric.
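Both conditions can be checked in R; the matrices S and K below are small illustrative examples assumed for demonstration:

```r
S <- matrix(c(1, 2,
              2, 3), nrow = 2, byrow = TRUE)
all(t(S) == S)   # S is symmetric
# [1] TRUE

K <- matrix(c( 0, 2,
              -2, 0), nrow = 2, byrow = TRUE)
all(t(K) == -K)  # K is skew-symmetric
# [1] TRUE
```

Base R also provides isSymmetric() for the first check.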

2.3 Diagonal matrix

  • It is a square matrix where all off-diagonal elements are zeros, for example: \(B = \begin{bmatrix} \color{red} {5} & 0 & 0 & 0 \\ 0 & \color{red} {6} & 0 & 0 \\ 0 & 0 & \color{red} {11} & 0 \\ 0 & 0 & 0 & \color{red} {8} \end{bmatrix}\)
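In R, passing a vector to diag() builds a diagonal matrix with those values on the diagonal:

```r
# build a diagonal matrix from the diagonal values
B <- diag(c(5, 6, 11, 8))
B
#      [,1] [,2] [,3] [,4]
# [1,]    5    0    0    0
# [2,]    0    6    0    0
# [3,]    0    0   11    0
# [4,]    0    0    0    8
```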

2.4 Identity matrix

  • It is a diagonal matrix where all the diagonal elements are equal to 1, e.g., \(I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\).

  • In linear algebra, the identity matrix functions like the unit scalar of 1 in ordinary scalar algebra:

    • If a square matrix \(A\) with dimensions \(n\times n\) is multiplied by the identity matrix \(I_n\), the result is the matrix \(A\) itself (i.e., \(A \times I_n = I_n \times A = A\)).
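In R, calling diag() with a single integer returns the identity matrix of that size, so the property above can be verified directly (the matrix A below is an arbitrary example):

```r
I3 <- diag(3)               # 3 x 3 identity matrix
A  <- matrix(1:9, nrow = 3) # an arbitrary 3 x 3 matrix

all(A %*% I3 == A) && all(I3 %*% A == A)
# [1] TRUE
```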

2.5 Invertible matrix

  • A square matrix \(A\) with dimensions \(n\times n\) is invertible if there exists another \(n\times n\) matrix \(B\) such that \(AB = BA = I_n\), where \(I_n\) is the identity matrix with dimensions \(n\times n\).

  • The inverse of a matrix \(A\) is denoted by \(A^{-1}\).

  • An invertible matrix is also referred to as a non-singular matrix.

  • Invertible matrices have non-zero determinant:

    • The determinant of a matrix is a scalar value (i.e., a single numerical value) used when solving systems of linear equations or calculating the inverse of a matrix.

    • For a \(2 \times 2\) matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\), the determinant, denoted by \(|A|\) or \(det(A)\), is calculated as:

      • \(|A| = ad - bc\).

      • For example, if \(A = \begin{bmatrix} 4 & 7 \\ 2 & 6 \end{bmatrix} \Rightarrow |A| = 4 \times 6 - 7 \times 2 = 24-14 = 10\).

      • The determinant and the inverse of a matrix can be calculated in R using the functions det() and solve(), respectively:

        A <- matrix(c(4, 7, 2, 6), 
                    nrow = 2, 
                    byrow = TRUE)
        A
             [,1] [,2]
        [1,]    4    7
        [2,]    2    6
        det(A)    # get the determinant of the matrix A     
        [1] 10
        solve(A)  # get the inverse of the matrix A   
             [,1] [,2]
        [1,]  0.6 -0.7
        [2,] -0.2  0.4
        round(A %*% solve(A), 10)  # check if the product of A and its inverse is equal to the identity matrix 
             [,1] [,2]
        [1,]    1    0
        [2,]    0    1
    • The formulae for the determinant of matrices with dimensions greater than \(2 \times 2\) are complicated and are not covered here.

    • If a matrix has a determinant equal to zero, it is referred to as singular and is not invertible:

      • For example, \(A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} \Rightarrow |A| = 2 \times 2 - 4 \times 1 = 4-4 = 0\), so matrix \(A\) is singular and not invertible.

      • A singular matrix has linearly dependent rows or columns (i.e., one row or one column can be expressed as a linear combination of the others).

      • In the above example, the second row is linearly dependent on the first row (i.e., you can get the second row by multiplying the first row by \(\frac{1}{2}\)). Similarly, the second column can be obtained by multiplying the first column by \(2\).

      • A singular coefficient matrix means that the corresponding system of linear equations cannot be solved by inverting the matrix (it has no unique solution).

Rank of a matrix \((r)\)
  • It is defined as the maximum number of linearly independent rows or columns.

  • The rank cannot exceed the number of rows (\(m\)) or the number of columns (\(n\)), so the maximum possible rank is \(\min(m, n)\), the smaller of the two dimensions.

  • A matrix \(A\) is said to have full rank if its rank is equal to the smaller of the number of rows or columns, i.e., \(r = \min(m, n)\).

  • A matrix \(A\) is rank deficient if \(r \lt \min(m, n)\).

  • If a square matrix with dimensions \(n\times n\) is rank deficient, then it is singular and not invertible.

  • Example:

    • \(A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 2 & 3 \end{bmatrix}\)

    • Rows:

      • Row \(2\) is linearly dependent on row \(1\) (row \(2 =\) \(2\ \times\) row \(1\)).

      • Row \(3\) is identical to row \(1\), i.e., linearly dependent on it (row \(3 =\) \(1\ \times\) row \(1\)).

      • Therefore, there is only one independent row.

    • Columns:

      • Column \(2\) is linearly dependent on column \(1\) (column \(2 =\) \(2\ \times\) column \(1\)).

      • Column \(3\) is linearly dependent on column \(1\) (column \(3 =\) \(3\ \times\) column \(1\)).

      • Therefore, there is only one independent column.

    • Therefore, \(r=1\).

    • The rank of a matrix can be calculated in R as follows:

A <- matrix(c(1, 2, 3,
              2, 4, 6,
              1, 2, 3), 
            nrow = 3, 
            byrow = TRUE)

# get the rank of the matrix A through the QR decomposition method
qr(A)$rank                   
[1] 1
# Confirm matrix A is singular by calculating its determinant
det(A)                       
[1] 0
# Get the inverse of the matrix A
solve(A) # The error confirms that the matrix A is not invertible
Error in solve.default(A): Lapack routine dgesv: system is exactly singular: U[2,2] = 0

2.6 Orthogonal matrix

  • A square matrix \(A\) with dimensions \(n\times n\) is referred to as orthogonal if \(AA^T = A^TA = I_n\), or equivalently \(A^T = A^{-1}\).
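A classic example of an orthogonal matrix is a 2D rotation matrix; this sketch (using an arbitrarily chosen angle) checks the defining property in R:

```r
theta <- pi / 6  # an arbitrary rotation angle
A <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)),
            nrow = 2, byrow = TRUE)

round(A %*% t(A), 10)  # should equal the 2 x 2 identity matrix
#      [,1] [,2]
# [1,]    1    0
# [2,]    0    1
```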

2.7 Sparse matrix

  • It is a matrix in which most of the elements are zero (in contrast to a dense matrix, where most of the elements are non-zero).

  • Example: \(\begin{bmatrix} 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \end{bmatrix}\)

  • Storing sparse matrices in the usual form would waste a lot of memory. Therefore, special methods are used to store only the non-zero elements and their locations.
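The Matrix package (shipped with standard R installations) provides such compressed storage; a minimal sketch using the example matrix above:

```r
library(Matrix)

# sparse = TRUE stores only the non-zero elements and their positions
M <- Matrix(c(0, 0, 0, 0, 5,
              0, 0, 0, 3, 0,
              0, 0, 0, 0, 0,
              2, 0, 0, 0, 0),
            nrow = 4, byrow = TRUE, sparse = TRUE)

nnzero(M)  # number of non-zero elements actually stored
# [1] 3
```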

3 Mathematical operations on matrices

3.1 Addition and subtraction

  • Matrices can be added or subtracted only if they have the same dimensions.

  • The sum or difference of two matrices \(A\) and \(B\) is a matrix \(C\) with the same dimensions as \(A\) and \(B\), where each element of \(C\) is the sum or difference of the corresponding elements of \(A\) and \(B\) (i.e., the element \(c_{ij}\) of the matrix \(C\) is calculated as \(c_{ij} = a_{ij} \pm b_{ij}\)).

Calculate \(A+B\) for the matrices \(A = \begin{bmatrix} 1 & 5 \\ 2 & 3 \end{bmatrix}\) and \(B = \begin{bmatrix} 4 & -6 \\ 2 & 4 \end{bmatrix}\)

\[ A +B = \begin{bmatrix} 1 & 5 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 4 & -6 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1+4 & 5-6 \\ 2+2 & 3+4 \end{bmatrix} = \begin{bmatrix} 5 & -1 \\ 4 & 7 \end{bmatrix} \]
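The same computation can be done in R, where the `+` operator works element-wise on matrices of equal dimensions:

```r
A <- matrix(c(1, 5, 2, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(4, -6, 2, 4), nrow = 2, byrow = TRUE)

A + B
#      [,1] [,2]
# [1,]    5   -1
# [2,]    4    7
```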

3.2 Multiplication

  • Matrices can be multiplied only if the number of columns of the first matrix is equal to the number of rows of the second matrix.

  • The product of two matrices \(A_{m\times n}\) and \(B_{n\times p}\) is a matrix \(C\) with dimensions \(m\times p\). Note that matrix multiplication is not commutative: in general \(AB \neq BA\), and \(BA\) may not even be defined unless \(p = m\).

  • The element \(c_{ij}\) of the matrix \(C\) is calculated as \(c_{ij} =\displaystyle \sum_{k=1}^{n} a_{ik}b_{kj}\) where \(k\) is the index of the column of matrix \(A\) and the row of matrix \(B\). In other words, the element \(c_{ij}\) is the dot product of the \(i^{th}\) row of matrix \(A\) and the \(j^{th}\) column of matrix \(B\) (i.e., multiply the elements of each row of the first matrix by the elements of each column in the second matrix and then calculate the sum of the products).

  • \(I_m \times A_{m\times n} = A_{m\times n} \times I_n= A_{m \times n}\), where \(I_m\) and \(I_n\) are identity matrices with dimensions \(m\times m\) and \(n\times n\), respectively.

Calculate \(AB\) for the matrices \(A = \begin{bmatrix} 1 & 5 \\ -3 & 2 \end{bmatrix}\) and \(B = \begin{bmatrix} -5 & 6 \\ 4 & 2 \end{bmatrix}\)

\[ AB = \begin{bmatrix} 1 & 5 \\ -3 & 2 \end{bmatrix} \times \begin{bmatrix} -5 & 6 \\ 4 & 2 \end{bmatrix} = \begin{bmatrix} (1 \times -5) + (5 \times 4) & (1 \times 6) + (5 \times 2) \\ (-3 \times -5) + (2 \times 4) & (-3 \times 6) + (2 \times 2) \end{bmatrix} = \begin{bmatrix} 15 & 16 \\ 23 & -14 \end{bmatrix} \]
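In R, matrix multiplication uses the %*% operator (note that plain `*` would multiply element-wise instead):

```r
A <- matrix(c(1, 5, -3, 2), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 6, 4, 2), nrow = 2, byrow = TRUE)

A %*% B
#      [,1] [,2]
# [1,]   15   16
# [2,]   23  -14
```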

3.3 Scalar multiplication

  • If a matrix \(A\) is multiplied by any number (constant) \(c\), every element in \(A\) is multiplied by \(c\).
  • Example: \(2 \times \begin{bmatrix} 1 & 5 \\ -3 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 10 \\ -6 & 4 \end{bmatrix}\)
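In R, the `*` operator applied to a scalar and a matrix performs exactly this element-wise scaling:

```r
A <- matrix(c(1, 5, -3, 2), nrow = 2, byrow = TRUE)

2 * A
#      [,1] [,2]
# [1,]    2   10
# [2,]   -6    4
```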

4 Using matrices to solve a system of linear equations

  • Assume a system that consists of three linear equations as follows:

    • \(2x + 3y + z = 1\)

    • \(4x + 5y + 2z = 2\)

    • \(2x + 10y + 9z = 3\)

  • The system can be represented in matrix form as \(AX = B\), where:

    • \(A = \begin{bmatrix} 2 & 3 & 1 \\ 4 & 5 & 2 \\ 2 & 10 & 9 \end{bmatrix}\), the matrix of coefficients

    • \(X = \begin{bmatrix} x \\ y \\ z \end{bmatrix}\), the matrix of variables (unknowns)

    • \(B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\), the matrix of constants

  • The unknowns can be calculated as \(X = A^{-1}B\).

  • The solution can be obtained in R as follows:

A <- matrix(c(2, 3, 1, 4, 5, 2, 2, 10, 9), 
            nrow = 3, 
            byrow = TRUE)

B <- matrix(c(1, 2, 3),
            nrow = 3)

X <- solve(A) %*% B
X
      [,1]
[1,] 0.375
[2,] 0.000
[3,] 0.250
  • So, \(x = 0.375,\ y = 0,\ z = 0.25\) is the solution to this system of linear equations.

  • This is the basis of calculating the least squares coefficients in multiple linear regression that will be covered under regression analysis.

5 Eigenvalues and eigenvectors

  • For a square matrix \(A\) with dimensions \(n\times n\), a scalar (constant) \(\lambda\) and a non-zero vector \(v\) satisfying the equality \(Av = \lambda v\) are known as an eigenvalue and its corresponding eigenvector, respectively.

  • This means that transforming the vector \(v\) via the matrix \(A\) is equivalent to simply scaling \(v\) by the scalar \(\lambda\).

  • This concept will be revisited while discussing principal component analysis (PCA) and factor analysis.

  • Example:

    • \(A = \begin{bmatrix} 2 & 3 \\ 6 & 1 \end{bmatrix}\)

    • The eigenvalues and eigenvectors can be calculated in R using the function eigen():

    A <- matrix(c(2, 3, 6, 1), 
                nrow = 2, 
                byrow = TRUE)
    
    eig_A <- eigen(A)
    eig_A
    eigen() decomposition
    $values
    [1]  5.772002 -2.772002
    
    $vectors
              [,1]       [,2]
    [1,] 0.6224656 -0.5322295
    [2,] 0.7826472  0.8466001
    • There are two eigenvalues \(5.772002\) and \(-2.772002\).

    • There are two corresponding eigenvectors \(\begin{bmatrix} 0.6224656 \\ 0.7826472 \end{bmatrix}\) and \(\begin{bmatrix} -0.5322295 \\ 0.8466001 \end{bmatrix}\).

    • The eigenvalues and eigenvectors satisfy the following equations:

      • \(\begin{bmatrix} 2 & 3 \\ 6 & 1 \end{bmatrix} \times \begin{bmatrix} 0.6224656 \\ 0.7826472 \end{bmatrix} = 5.772002 \times \begin{bmatrix} 0.6224656 \\ 0.7826472 \end{bmatrix}\) = \(\begin{bmatrix} 3.592873 \\ 4.517441 \end{bmatrix}\).

      • \(\begin{bmatrix} 2 & 3 \\ 6 & 1 \end{bmatrix} \times \begin{bmatrix} -0.5322295 \\ 0.8466001 \end{bmatrix} = -2.772002 \times \begin{bmatrix} -0.5322295 \\ 0.8466001 \end{bmatrix}\) = \(\begin{bmatrix} 1.475341 \\ -2.346777 \end{bmatrix}\).

    # extract the first eigenvalue
    eig_value_1 <- eig_A$values[1]
    eig_value_1
    [1] 5.772002
    # extract the second eigenvalue
    eig_value_2 <- eig_A$values[2]
    eig_value_2
    [1] -2.772002
    # extract the first eigenvector
    eig_vector_1 <- eig_A$vectors[, 1]
    eig_vector_1
    [1] 0.6224656 0.7826472
    # extract the second eigenvector
    eig_vector_2 <- eig_A$vectors[, 2]
    eig_vector_2
    [1] -0.5322295  0.8466001
    # Check if the first equality holds
    all(round(A %*% eig_vector_1, 6) == round(eig_value_1 * eig_vector_1, 6))
    [1] TRUE
    # Check if the second equality holds
    all(round(A %*% eig_vector_2, 6) == round(eig_value_2 * eig_vector_2, 6))
    [1] TRUE
    • The function all() in the above code checks whether the equality holds for every element, since the vectors on both sides each have two elements.
