You're reading from Hands-On Mathematics for Deep Learning Build a solid mathematical foundation for training efficient deep neural networks

Product type: Paperback
Published: Jun 2020
Publisher: Packt
ISBN-13: 9781838647292
Length: 364 pages
Edition: 1st Edition
Languages: Python
Tools: Pandas
Concepts: Deep Learning
Author: Jay Dawani


Matrix decompositions

Matrix decompositions are a set of methods that we use to describe matrices in terms of simpler, more interpretable matrices, which give us insight into the original matrices' properties.

Determinant

Earlier, we got a quick glimpse of the determinant of a square 2×2 matrix when we wanted to determine whether a square matrix was invertible. The determinant is a very important concept in linear algebra and is used frequently in solving systems of linear equations.

Note: The determinant is only defined for square matrices.

Notationally, the determinant is usually written as either $\det(A)$ or $|A|$.

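Let's take an arbitrary n×n matrix A, as follows:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

We will also take its determinant, as follows:

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$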

The determinant reduces the matrix to a real number (or, in other words, maps A onto a real number).

We start by checking whether a square matrix is invertible. Let's take a 2×2 matrix, $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. From the earlier definition, we know that multiplying a matrix by its inverse produces the identity matrix. This works no differently than multiplying a number $x$ by $x^{-1}$ (only possible when $x \neq 0$), which produces 1, except that here we are working with matrices. Therefore, $AA^{-1} = I$.

Let's go ahead and find the inverse of our matrix, as follows:

$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

A is invertible only when $ad - bc \neq 0$, and this value, $\det(A) = ad - bc$, is what we call the determinant.
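The 2×2 formulas above translate directly into code. The following is a minimal sketch in plain Python. It assumes matrices are stored as nested lists of rows. The helper names `det_2x2`, `inverse_2x2`, and `matmul` are our own, not from any library, and the example matrix is illustrative.

```python
def det_2x2(m):
    """Return ad - bc for a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse_2x2(m):
    """Invert a 2x2 matrix using A^-1 = 1/(ad - bc) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = det_2x2(m)
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


A = [[4, 7], [2, 6]]       # illustrative matrix: det = 4*6 - 7*2 = 10
A_inv = inverse_2x2(A)
print(det_2x2(A))          # 10
print(matmul(A, A_inv))    # approximately the identity matrix
```

Because the entries of the inverse are floats, $AA^{-1}$ matches I only up to rounding error.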

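Now that we know how to find the determinant in the 2×2 case, let's move on to a 3×3 matrix and find its determinant. It looks like this:

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

This produces the following:

$$\det(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$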

I know that probably looks more intimidating, but it's really not. Take a moment to look carefully at what we did and how this would work for a larger n×n matrix.

If an n×n matrix is triangular (upper or lower), or has been reduced to triangular form by elimination without row exchanges, then its determinant is the product of all the pivot values, that is, the entries on the diagonal. For the sake of simplicity, we will denote such a triangular matrix by T, with diagonal entries $t_{11}, \ldots, t_{nn}$. Therefore, the determinant can be written like so:

$$\det(T) = \prod_{i=1}^{n} t_{ii}$$
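As a rough illustration of the pivot rule, here is a small Gaussian-elimination sketch in plain Python. It uses nested lists and a hypothetical function name, `det_by_elimination`. The text assumes no row exchanges, but this sketch does swap rows when it needs a better pivot (partial pivoting), flipping the sign of the result for each swap. That is a standard extension of the same idea.

```python
def det_by_elimination(a):
    """Determinant as the product of pivots after Gaussian elimination.

    Rows are swapped to get the largest available pivot (partial pivoting);
    each swap flips the sign of the determinant.
    """
    m = [[float(v) for v in row] for row in a]   # work on a float copy
    n = len(m)
    sign = 1.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot_row][col] == 0.0:
            return 0.0                            # no usable pivot: A is singular
        if pivot_row != col:
            m[col], m[pivot_row] = m[pivot_row], m[col]
            sign = -sign
        for r in range(col + 1, n):               # eliminate entries below the pivot
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    result = sign
    for i in range(n):
        result *= m[i][i]                         # multiply the pivot values
    return result


T = [[2, 5, 1], [0, 3, 7], [0, 0, 4]]             # already upper triangular
print(det_by_elimination(T))                      # 24.0
print(det_by_elimination([[0, 1], [1, 0]]))       # -1.0 (one row exchange)
```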

Looking at the preceding 3×3 matrix example, I'm sure you've figured out that computing the determinant for matrices where n > 3 is quite a lengthy process. Luckily, there is a way in which we can simplify the calculation, and this is where the Laplace expansion comes to the rescue.

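When we want to find the determinant of an n×n matrix, the Laplace expansion expresses it in terms of the determinants of (n−1)×(n−1) submatrices, and repeats this until we reach 2×2 matrices. In general, then, we can calculate the determinant of any n×n matrix using only 2×2 determinants.

Let's again take an n-dimensional square matrix, $A \in \mathbb{R}^{n \times n}$. We then expand for all $i, j = 1, \ldots, n$, as follows:

Expansion along row i:

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij})$$

Expansion along column j:

$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij})$$

Here, $A_{ij} \in \mathbb{R}^{(n-1) \times (n-1)}$ is the submatrix of A that we get after removing row i and column j.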

For example, we have a 3×3 matrix, as follows:

We want to find its determinant using the Laplace expansion along the first row. This results in the following:

We can now use the preceding equation from the 2×2 case and calculate the determinant for A, as follows:
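A direct recursive translation of the Laplace expansion along the first row might look like the following sketch. The function names and the 3×3 matrix are illustrative assumptions, not the book's example.

```python
def minor(a, i, j):
    """Return the submatrix A_ij: A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det_laplace(a):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    if n == 2:
        return a[0][0] * a[1][1] - a[0][1] * a[1][0]      # 2x2 base case: ad - bc
    # Row i = 0: sum over j of (-1)^(0 + j) * a_0j * det(A_0j)
    return sum((-1) ** j * a[0][j] * det_laplace(minor(a, 0, j)) for j in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]   # illustrative matrix
print(det_laplace(A))                    # -3
```

Each call spawns n smaller determinants, so the cost grows factorially with n. Beyond small matrices, elimination-based methods such as the pivot product are far cheaper.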

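Here are some important properties of determinants to know:

$\det(I) = 1$
$\det(A^T) = \det(A)$
$\det(AB) = \det(A)\det(B)$
$\det(A^{-1}) = \frac{1}{\det(A)}$, when A is invertible
$\det(cA) = c^n \det(A)$, for $A \in \mathbb{R}^{n \times n}$ and any scalar c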

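There is one additional property of the determinant: we can use it to find the volume of an object in $\mathbb{R}^n$ whose edges, starting from the origin, are the column vectors of the matrix. The absolute value of the determinant gives this volume.

As an example, let's take a parallelogram in $\mathbb{R}^2$ spanned by the vectors $\mathbf{u} = [u_1, u_2]^T$ and $\mathbf{v} = [v_1, v_2]^T$. By taking the determinant of the 2×2 matrix whose columns are these vectors, we find the area of the shape (we can only find a volume for objects in $\mathbb{R}^3$ or higher), as follows:

$$\text{Area} = \left|\det\begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix}\right| = |u_1 v_2 - v_1 u_2|$$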

You are welcome to try it for any 3×3 matrix yourself as practice.
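To try this numerically, the sketch below computes the area of a parallelogram and the volume of a parallelepiped as absolute determinants, placing the spanning vectors in the columns. The function names and example vectors are our own illustrative choices.

```python
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def parallelogram_area(u, v):
    """Area spanned by 2-D vectors u and v: |det| of the matrix with columns u, v."""
    return abs(u[0] * v[1] - v[0] * u[1])


def parallelepiped_volume(u, v, w):
    """Volume spanned by 3-D vectors u, v, w: |det| of the matrix with columns u, v, w."""
    columns = [[u[r], v[r], w[r]] for r in range(3)]
    return abs(det3(columns))


print(parallelogram_area([3, 0], [1, 2]))                     # 6
print(parallelepiped_volume([1, 0, 0], [1, 1, 0], [0, 0, 2]))  # 2
```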

Eigenvalues and eigenvectors

Let's imagine an arbitrary real n×n matrix, A. It is very possible that when we apply this matrix to some vector, the vector is simply scaled by a constant value. If this is the case, we say that the nonzero n-dimensional vector $x \in \mathbb{R}^n$ is an eigenvector of A, and it corresponds to an eigenvalue λ. We write this as follows:

$$Ax = \lambda x$$

Note: The zero vector (0) cannot be an eigenvector of A, since A0 = 0 = λ0 for all λ.

Let's consider again a matrix A that has an eigenvector x and a corresponding eigenvalue λ. Then, the following rules will apply:

If the matrix A is shifted to $A + \gamma I$, then it still has the eigenvector x, now with the corresponding eigenvalue $\lambda + \gamma$, for all $\gamma \in \mathbb{R}$, so that $(A + \gamma I)x = (\lambda + \gamma)x$.
If the matrix A is invertible, then x is also an eigenvector of the inverse of the matrix, $A^{-1}$, with the corresponding eigenvalue $\lambda^{-1}$.
$A^k x = \lambda^k x$ for any $k \in \mathbb{N}$.
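The three rules can be checked on a concrete eigenpair. This sketch assumes the illustrative matrix A = [[2, 1], [1, 2]], which has eigenvector x = (1, 1) with eigenvalue 3. It uses Python's `fractions.Fraction` so that the inverse is computed exactly. The helper names are hypothetical.

```python
from fractions import Fraction


def matvec(m, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(c, x):
    """Multiply every entry of vector x by the scalar c."""
    return [c * x_i for x_i in x]


def inverse_2x2(m):
    """Exact 2x2 inverse using Fraction arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 2]]   # illustrative matrix
x = [1, 1]             # eigenvector of A
lam = 3                # its eigenvalue
assert matvec(A, x) == scale(lam, x)

# Shift: (A + gamma I) x = (lambda + gamma) x
gamma = 5
shifted = [[A[i][j] + (gamma if i == j else 0) for j in range(2)] for i in range(2)]
rule_shift = matvec(shifted, x) == scale(lam + gamma, x)

# Inverse: A^-1 x = (1 / lambda) x
rule_inverse = matvec(inverse_2x2(A), x) == scale(Fraction(1, lam), x)

# Powers: A^k x = lambda^k x
k = 4
A_k = A
for _ in range(k - 1):
    A_k = matmul(A_k, A)
rule_power = matvec(A_k, x) == scale(lam ** k, x)

print(rule_shift, rule_inverse, rule_power)   # True True True
```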

We know from earlier in the chapter that multiplying a vector by a matrix generally changes the vector's direction, but this is not the case with eigenvectors: Ax lies along the same line as x, so the direction of x is preserved (or exactly reversed). The eigenvalue, being a scalar value, tells us whether the eigenvector is being scaled and, if so, by how much, and its sign tells us whether the direction of the vector has been flipped.

Another fascinating property of the determinant is that it is equal to the product of the eigenvalues of the matrix (counted with multiplicity), which is written as follows:

$$\det(A) = \prod_{i=1}^{n} \lambda_i$$

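But this isn't the only relationship between the determinant and eigenvalues. We can rewrite $Ax = \lambda x$ in the form $(A - \lambda I)x = 0$. Since a nonzero vector x is mapped to zero, $A - \lambda I$ is a non-invertible matrix, and therefore its determinant must be equal to zero: $\det(A - \lambda I) = 0$. This lets us use the determinant to find the eigenvalues. Let's see how.

Suppose we have $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then, the determinant of $A - \lambda I$ is as follows:

$$\det(A - \lambda I) = \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = (a - \lambda)(d - \lambda) - bc$$

We can rewrite this as the following quadratic equation:

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

We know that the quadratic equation will give us both eigenvalues, $\lambda_1$ and $\lambda_2$. So, we plug our values into the quadratic formula and get our roots:

$$\lambda_{1,2} = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - bc)}}{2}$$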
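The quadratic formula above can be coded directly. This is a minimal sketch, and the name `eigenvalues_2x2` is hypothetical. It uses `cmath.sqrt` so that it also returns the complex eigenvalues a real matrix can have when the discriminant is negative. It also checks that the product of the eigenvalues equals the determinant, as stated earlier.

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of lambda^2 - (a + d) lambda + (ad - bc) = 0 for m = [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)   # complex square root never fails
    return (trace + disc) / 2, (trace - disc) / 2


A = [[4, 1], [2, 3]]               # illustrative matrix: trace 7, determinant 10
l1, l2 = eigenvalues_2x2(A)
print(l1.real, l2.real)            # 5.0 2.0
print(abs(l1 * l2 - 10) < 1e-12)   # product of eigenvalues equals det(A): True

R = [[0, -1], [1, 0]]              # a 90-degree rotation has no real eigenvalues
print(eigenvalues_2x2(R))          # a conjugate pair: +i and -i
```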

Another interesting property is that for triangular matrices, such as the ones we found earlier in this chapter, the eigenvalues are the pivot values, that is, the entries on the diagonal. So, if we want to find the determinant of a triangular matrix, all we have to do is multiply together all the entries along the diagonal.

Trace

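Given an n×n matrix A, the sum of all the entries on the diagonal is called the trace. We write it like so:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

The following are four important properties of the trace:

$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$
$\operatorname{tr}(cA) = c\operatorname{tr}(A)$, for any scalar c
$\operatorname{tr}(A^T) = \operatorname{tr}(A)$
$\operatorname{tr}(AB) = \operatorname{tr}(BA)$

A very interesting property of the trace is that it is equal to the sum of the matrix's eigenvalues (counted with multiplicity), so that the following applies:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$$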
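The following sketch checks the definition and the trace properties numerically on two small illustrative matrices, which are our own choices. The eigenvalues of A, 5 and 2, are the roots of $\lambda^2 - 7\lambda + 10 = 0$, worked out by hand. The helper names are hypothetical.

```python
def trace(m):
    """Sum of the diagonal entries of a square nested-list matrix."""
    return sum(m[i][i] for i in range(len(m)))


def matmul(a, b):
    """Matrix-matrix product for nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


A = [[4, 1], [2, 3]]   # illustrative; eigenvalues 5 and 2
B = [[1, 2], [0, 1]]   # illustrative
c = 3
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
cA = [[c * v for v in row] for row in A]

print(trace(A))                                   # 7
print(trace(matmul(A, B)), trace(matmul(B, A)))   # 11 11
print(trace(transpose(A)) == trace(A))            # True
print(trace(A_plus_B) == trace(A) + trace(B))     # True
print(trace(cA) == c * trace(A))                  # True
print(trace(A) == 5 + 2)                          # trace equals the sum of the eigenvalues: True
```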

Orthogonal matrices

The concept of orthogonality arises frequently in linear algebra. It's really just a fancy word for perpendicularity, except it goes beyond two dimensions or a pair of vectors.

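But to get an understanding, let's start with two column vectors $x, y \in \mathbb{R}^n$. If they are orthogonal, then the following holds:

$$x^T y = 0$$

Orthogonal matrices are a special kind of square matrix whose columns are pairwise orthonormal, that is, mutually orthogonal unit vectors. What this means is that we have a matrix $Q = [q_1 \cdots q_n] \in \mathbb{R}^{n \times n}$ with the following property:

$$q_i^T q_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \quad \text{or, equivalently,} \quad Q^T Q = I$$

Then, we can deduce that $Q^T = Q^{-1}$ (that is, the transpose of Q is also the inverse of Q).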

As with other types of matrices, orthogonal matrices have some special properties.

Firstly, they preserve inner products, so that the following applies:

$$(Qx)^T(Qy) = x^T Q^T Q y = x^T y$$

This brings us to the second property, which states that orthogonal matrices preserve 2-norms, as we see here:

$$\|Qx\|_2 = \sqrt{(Qx)^T(Qx)} = \sqrt{x^T x} = \|x\|_2$$

You can think of multiplication by an orthogonal matrix as a transformation that preserves length, although the vector may be rotated about the origin (or reflected).
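Both properties can be observed on two familiar orthogonal matrices: a 2-D rotation by 30 degrees and a reflection across the x-axis. This sketch uses plain nested lists. The helper names and the vectors x and y are illustrative assumptions.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def rotation(theta):
    """2-D rotation by theta radians about the origin (an orthogonal matrix)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


Q = rotation(math.pi / 6)
R = [[1.0, 0.0], [0.0, -1.0]]     # reflection across the x-axis, also orthogonal
x, y = [3.0, 4.0], [-1.0, 2.0]

for M in (Q, R):
    Mx, My = matvec(M, x), matvec(M, y)
    print(matmul(transpose(M), M))                          # I, up to rounding
    print(dot(Mx, My), dot(x, y))                           # inner product preserved (both about 5)
    print(math.sqrt(dot(Mx, Mx)), math.sqrt(dot(x, x)))     # 2-norm preserved (both about 5)
```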

The most well-known orthogonal matrix is a special matrix we have dealt with a few times already: the identity matrix I. Since its columns are unit vectors pointing along the coordinate axes, we generally refer to them as the standard basis.

Diagonalization and symmetric matrices

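Let's suppose we have a matrix $A \in \mathbb{R}^{n \times n}$ that has n linearly independent eigenvectors, $x_1, \ldots, x_n$. We put these vectors into the columns of a matrix $X = [x_1 \cdots x_n]$, which is invertible because its columns are linearly independent, and multiply A by X. This gives us the following:

$$AX = [Ax_1 \cdots Ax_n] = [\lambda_1 x_1 \cdots \lambda_n x_n]$$

We know from $Ax_i = \lambda_i x_i$ that, when dealing with matrices, this becomes $AX = X\Lambda$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and each $x_i$ has its corresponding $\lambda_i$. Therefore, $A = X\Lambda X^{-1}$.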
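Here is a small exact check of $AX = X\Lambda$ and $A = X\Lambda X^{-1}$. It assumes the illustrative matrix A = [[4, 1], [2, 3]], whose eigenvalues 5 and 2 were worked out by hand. It builds each eigenvector from the first row of $(A - \lambda I)x = 0$, which only works when the top-right entry b is nonzero. `Fraction` keeps the inverse exact, and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Exact 2x2 inverse using Fraction arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[4, 1], [2, 3]]          # illustrative matrix
eigenvalues = [5, 2]          # roots of lambda^2 - 7 lambda + 10 = 0

# For A = [[a, b], [c, d]] with b != 0, the first row of (A - lambda I) x = 0
# reads (a - lambda) x1 + b x2 = 0, so x = (b, lambda - a) is an eigenvector.
a, b = A[0]
eigenvectors = [[b, lam - a] for lam in eigenvalues]   # (1, 1) and (1, -2)

X = [list(col) for col in zip(*eigenvectors)]          # eigenvectors as columns
Lam = [[eigenvalues[0], 0], [0, eigenvalues[1]]]       # diagonal eigenvalue matrix

print(matmul(A, X) == matmul(X, Lam))                  # AX = X Lambda: True
print(matmul(matmul(X, Lam), inverse_2x2(X)) == A)     # X Lambda X^-1 = A: True
```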

Let's move on to symmetric matrices. These are special matrices that, when transposed, are the same as the original, implying that $A = A^T$ and $a_{ij} = a_{ji}$ for all $i, j = 1, \ldots, n$. This may seem rather trivial, but its implications are rather strong.

The spectral theorem states that if a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric, then there exists an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of A.

This theorem is important to us because it allows us to factorize symmetric matrices. We call this spectral decomposition (also sometimes referred to as eigendecomposition).

Suppose we have an orthogonal matrix $Q = [q_1 \cdots q_n]$ whose columns are this orthonormal basis of eigenvectors, and let $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ be the diagonal matrix of the corresponding eigenvalues.

From earlier, we know that $Aq_i = \lambda_i q_i$ for all $i = 1, \ldots, n$; therefore, we have the following:

$$AQ = Q\Lambda$$

Note: Λ comes after Q because it is a diagonal matrix, and the $\lambda_i$s need to multiply the individual columns of Q.

By multiplying both sides on the right by $Q^T$ (and using $QQ^T = I$), we get the following result:

$$A = Q\Lambda Q^T$$
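For a symmetric 2×2 matrix, a single plane rotation (one Jacobi rotation, a standard technique we swap in here for illustration) is enough to find Q and Λ. The sketch below chooses the rotation angle so that $Q^T A Q$ is diagonal. It reads each eigenvalue off as $q_i^T A q_i$ and then rebuilds $A = Q\Lambda Q^T$. Function names and the example matrix are hypothetical.

```python
import math


def spectral_2x2(a):
    """Return (Q, eigenvalues) such that A = Q diag(eigenvalues) Q^T for symmetric 2x2 A.

    Uses one Jacobi rotation: theta is chosen so that the off-diagonal
    entry of Q^T A Q is zero.
    """
    (p, b), (b2, d) = a
    if b != b2:
        raise ValueError("matrix must be symmetric")
    theta = 0.5 * math.atan2(2 * b, p - d)
    c, s = math.cos(theta), math.sin(theta)
    q1, q2 = (c, s), (-s, c)                     # orthonormal columns of Q

    def rayleigh(v):                             # v^T A v: the eigenvalue for unit eigenvector v
        return v[0] * (p * v[0] + b * v[1]) + v[1] * (b * v[0] + d * v[1])

    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]
    return Q, [rayleigh(q1), rayleigh(q2)]


def reconstruct(Q, lams):
    """Compute Q Lambda Q^T as the sum over k of lambda_k q_k q_k^T."""
    n = len(Q)
    return [[sum(lams[k] * Q[i][k] * Q[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2.0, 1.0], [1.0, 2.0]]      # illustrative symmetric matrix, eigenvalues 3 and 1
Q, lams = spectral_2x2(A)
print(lams)                       # approximately [3.0, 1.0]
print(reconstruct(Q, lams))       # approximately A
```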

Singular value decomposition

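Singular Value Decomposition (SVD) is widely used in linear algebra and is known for its strength, which arises particularly from the fact that every matrix has an SVD. It looks like this:

$$A = U\Sigma V^T$$

For our purposes, let's suppose $A \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times n}$, and $V \in \mathbb{R}^{n \times n}$, and that U and V are orthogonal matrices, whereas Σ is a matrix that contains the singular values of A (denoted by $\sigma_i$, with $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$) along its diagonal and zeros everywhere else.

Σ in the preceding equation looks like this (shown for the case m ≥ n):

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

We can also write the SVD like so:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

Here, r is the rank of A, and $u_i$ and $v_i$ are the i-th column vectors of U and V, respectively.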
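Computing an SVD robustly is usually left to a numerical library, but for a small case the link to the spectral decomposition gives a compact sketch. If $A^T A = V\operatorname{diag}(\lambda_i)V^T$, then $\sigma_i = \sqrt{\lambda_i}$ and $u_i = Av_i / \sigma_i$. The code below assumes a full-rank 2×2 matrix and diagonalizes $A^T A$ with one Jacobi rotation. All function names and the example matrix are hypothetical helpers written for this illustration.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def sym_eig_2x2(s):
    """Eigenpairs (lambda, unit eigenvector) of a symmetric 2x2 matrix, largest first.

    One Jacobi rotation diagonalizes a symmetric 2x2 matrix exactly.
    """
    (p, b), (_, d) = s
    theta = 0.5 * math.atan2(2 * b, p - d)
    c, sn = math.cos(theta), math.sin(theta)
    pairs = []
    for v in ([c, sn], [-sn, c]):
        sv = matvec(s, v)
        pairs.append((v[0] * sv[0] + v[1] * sv[1], v))   # eigenvalue v^T S v
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def svd_2x2(a):
    """Return (sigmas, us, vs) with A = sum_i sigma_i u_i v_i^T, for a full-rank 2x2 A."""
    sigmas, us, vs = [], [], []
    for lam, v in sym_eig_2x2(matmul(transpose(a), a)):   # A^T A = V diag(lam) V^T
        sigma = math.sqrt(max(lam, 0.0))                 # sigma_i = sqrt(lambda_i)
        if sigma < 1e-12:
            raise ValueError("this sketch assumes A has full rank")
        av = matvec(a, v)
        sigmas.append(sigma)
        us.append([av[0] / sigma, av[1] / sigma])        # u_i = A v_i / sigma_i
        vs.append(v)
    return sigmas, us, vs


def rank_one_sum(sigmas, us, vs):
    """Rebuild A as the sum of rank-one terms sigma_i u_i v_i^T."""
    return [[sum(s * u[r] * v[c] for s, u, v in zip(sigmas, us, vs)) for c in range(2)]
            for r in range(2)]


A = [[3.0, 0.0], [4.0, 5.0]]          # illustrative; A^T A = [[25, 20], [20, 25]]
sigmas, us, vs = svd_2x2(A)
print(sigmas)                         # approximately [6.708, 2.236], i.e. sqrt(45) and sqrt(5)
print(rank_one_sum(sigmas, us, vs))   # approximately A
```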

Cholesky decomposition

As I'm sure you've figured out by now, there is more than one way to factorize a matrix, and there are special methods for special matrices.

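The Cholesky decomposition is square-root-like and works only on symmetric positive definite matrices.

This works by factorizing A into the form $A = LL^T$. Here, L, as before, is a lower triangular matrix.

To develop some intuition, here is what it looks like:

$$A = LL^T = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & \cdots & l_{n1} \\ 0 & l_{22} & \cdots & l_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & l_{nn} \end{bmatrix}$$

Here, L is called the Cholesky factor of A.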

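Let's take a look at the case where $A \in \mathbb{R}^{3 \times 3}$.

We know from the preceding matrix that $A = LL^T$; therefore, we have the following:

$$A = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{bmatrix}$$

Let's multiply the lower and upper triangular matrices on the right, as follows:

$$A = \begin{bmatrix} l_{11}^2 & l_{21}l_{11} & l_{31}l_{11} \\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & l_{31}l_{21} + l_{32}l_{22} \\ l_{31}l_{11} & l_{31}l_{21} + l_{32}l_{22} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{bmatrix}$$

Writing out A fully and equating it to our preceding matrix gives us the following:

$$\begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} l_{11}^2 & l_{21}l_{11} & l_{31}l_{11} \\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & l_{31}l_{21} + l_{32}l_{22} \\ l_{31}l_{11} & l_{31}l_{21} + l_{32}l_{22} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{bmatrix}$$

We can then compare, element-wise, the corresponding entries of A and $LL^T$ and solve algebraically for the $l_{ij}$, as follows:

$$l_{11} = \sqrt{a_{11}}, \quad l_{22} = \sqrt{a_{22} - l_{21}^2}, \quad l_{33} = \sqrt{a_{33} - (l_{31}^2 + l_{32}^2)}$$
$$l_{21} = \frac{a_{21}}{l_{11}}, \quad l_{31} = \frac{a_{31}}{l_{11}}, \quad l_{32} = \frac{1}{l_{22}}\left(a_{32} - l_{31}l_{21}\right)$$

We can repeat this process for any symmetric positive definite matrix and compute the $l_{ij}$ values given the $a_{ij}$.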
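The element-wise procedure above generalizes to a short loop that fills L one row at a time. Each entry uses only values already computed. This is a minimal sketch with a hypothetical function name, `cholesky`, and an illustrative symmetric positive definite matrix. It reads only the lower triangle of A and does not verify symmetry.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(diag)             # l_ii = sqrt(a_ii - sum_k l_ik^2)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]     # l_ij = (a_ij - sum_k l_ik l_jk) / l_jj
    return l


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]   # illustrative SPD matrix
L = cholesky(A)
print(L)   # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]

# Verify A = L L^T entry by entry
LLt = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
print(LLt == [[float(v) for v in row] for row in A])   # True
```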
