References: edX online course MIT 8.05.1x Week 3.
Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.
A linear operator $T$ can be represented as a matrix with elements $T_{ij}$, but in order to do this, we need to specify which basis we’re using for the vector space $V$. Suppose we have a set of basis vectors $\{v\} = (v_1, v_2, \ldots, v_n)$ and we know the result of operating on each basis vector with $T$. We can express the result of $Tv_j$ as another vector $v'_j$, which can be written in terms of the original basis vectors as

$$Tv_j = \sum_{i=1}^{n} T_{ij} v_i \tag{1}$$
This defines the matrix elements $T_{ij}$ in the basis $\{v\}$. [In Zwiebach’s notes, he usually uses $v$ to represent the basis vectors, while in his lectures he tends to use $e$. I’ll stick to $v$ to be consistent with the notes.]
Equation 1 may not look quite right, since we are summing over the rows of the matrix $T_{ij}$ multiplied by the vectors $v_i$. Usually in matrix multiplication, we sum over the columns of the matrix on the left and the rows of the matrix (or vector) on the right. However, equation 1 isn’t actually a matrix multiplication formula, since each $v_i$ is an entire basis vector, and not a component from one vector.
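As a concrete illustration (the numbers here are made up purely for this example), take a two-dimensional space with basis vectors $v_1, v_2$ and suppose an operator $T$ acts on them as

$$Tv_1 = 2v_1 + 3v_2, \qquad Tv_2 = -v_1 + 4v_2$$

Reading off the coefficients using $Tv_j = \sum_i T_{ij} v_i$ gives $T_{11} = 2$, $T_{21} = 3$, $T_{12} = -1$ and $T_{22} = 4$, so that

$$T = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}$$

The coefficients of $Tv_j$ fill the $j$th column of the matrix, which is why the sum in the definition runs over the row index.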
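To see that this formula does make sense, and does coincide with the usual definition of matrix multiplication, suppose we have an orthonormal basis where each vector $v_j$ is represented as a column vector with all entries equal to zero except for the $j$th element, which is 1. In that case, the result of operating on one particular basis vector $v_j$ with $T$ is

$$\begin{aligned}
Tv_j &= \begin{bmatrix} T_{11} & \cdots & T_{1j} & \cdots & T_{1n} \\ \vdots & & \vdots & & \vdots \\ T_{n1} & \cdots & T_{nj} & \cdots & T_{nn} \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \\
&= \begin{bmatrix} T_{1j} \\ \vdots \\ T_{nj} \end{bmatrix} \\
&= \sum_{i=1}^{n} T_{ij} v_i
\end{aligned}$$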
In the column vector in the first line, all entries are zero except for the $j$th entry, which is 1. Multiplying a square matrix into this column vector using the normal rules for matrix multiplication simply copies the $j$th column of $T$ into a column vector, as shown in the second line.
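This column-copying behaviour is easy to check numerically. The following is a minimal sketch in plain Python; the $3 \times 3$ matrix entries are arbitrary illustrative values, and `mat_vec` and `basis_vector` are hypothetical helper names introduced only for this example.

```python
def mat_vec(T, x):
    """Ordinary matrix-vector product: (T x)_i = sum_k T[i][k] * x[k]."""
    return [sum(T[i][k] * x[k] for k in range(len(x))) for i in range(len(T))]


def basis_vector(j, n):
    """Column vector with a 1 in position j (0-based) and zeros elsewhere."""
    return [1 if i == j else 0 for i in range(n)]


# Arbitrary example matrix (illustrative values only).
T = [
    [2, -1, 0],
    [3, 4, 5],
    [7, 1, -2],
]

n = len(T)

# Acting on each basis vector should reproduce the matching column of T.
columns_from_action = [mat_vec(T, basis_vector(j, n)) for j in range(n)]
columns_of_T = [[T[i][j] for i in range(n)] for j in range(n)]

for j in range(n):
    print(f"T v_{j + 1} =", columns_from_action[j])
```

Here `columns_from_action[j]` is obtained purely by matrix multiplication, while `columns_of_T[j]` is read directly from the matrix entries; the two lists agree column by column.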
Although the matrix entries in general depend on the basis, the matrix of the identity operator $I$ is the same in all bases. Since $Iv_j = v_j$ for every basis vector, we must have

$$v_j = \sum_{i=1}^{n} I_{ij} v_i$$
Because the basis vectors are linearly independent, this can be true for all $j$ only if $I_{ij} = \delta_{ij}$.
If we start with equation 1 as the definition of the matrix elements of a linear operator $T$, we can actually derive the traditional formula for matrix multiplication from it. If we didn’t know the matrix multiplication formula beforehand (that is, the formula where we multiply a row of the left matrix into a column of the right matrix), we might naively assume that in order to multiply two matrices, we just multiply together the corresponding entries in the two matrices. If that were true, then matrix multiplication could be defined only for two matrices that had the same dimensions, as in two $m \times n$ matrices, say.
As you probably know, the accepted formula for the product of two matrices is valid if the number of columns in the left matrix equals the number of rows in the right matrix. To see how this formula arises naturally out of the matrix representation of linear operators, we’ll consider two vectors $a$ and $b$ and look at their components along some basis $\{v\}$ in a vector space $V$. That is, we can expand $a$ and $b$ as (to save writing, I’ll use the summation convention in which any pair of repeated indices is assumed to be summed):

$$a = a_i v_i$$

$$b = b_i v_i \tag{8}$$
Now suppose there is a linear operator $T$ that transforms $a$ into $b$, so that

$$b = Ta$$
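If we know the effect of operating on each basis vector with $T$, we can plug equation 1 into this equation to get

$$\begin{aligned}
b &= T(a_j v_j) \\
&= a_j Tv_j \\
&= a_j T_{ij} v_i \\
&= (T_{ij} a_j) v_i
\end{aligned}$$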
In the second line, we used the fact that the $a_j$ are just numbers (not vectors), so they commute with $T$. In the last line, the quantity $T_{ij} a_j$ is the sum over the columns of $T$ and the rows of $a$ (we’re writing $a$ as a column vector), and so is a traditional product of an $n \times n$ matrix into an $n$-component column vector. Also, referring back to the expansion $b = b_i v_i$ in equation 8, we see that $T_{ij} a_j$ is $b_i$, the component of $b$ along the basis vector $v_i$.
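The same result can be checked numerically. The sketch below, in plain Python with arbitrary illustrative numbers and a hypothetical helper `combine`, uses a basis that is deliberately not the standard one. It builds $Ta$ in two ways: from the definition $Tv_j = T_{ij} v_i$ together with linearity, and from the matrix-vector product $b_i = T_{ij} a_j$ followed by $b = b_i v_i$.

```python
def combine(coeffs, vectors):
    """Return the linear combination sum_k coeffs[k] * vectors[k]."""
    dim = len(vectors[0])
    return [sum(c * v[d] for c, v in zip(coeffs, vectors)) for d in range(dim)]


# A non-standard basis of the plane, written in ordinary coordinates.
basis = [[1, 1], [1, -1]]

# Matrix elements T[i][j] of the operator in that basis (illustrative values).
T = [[2, -1],
     [3, 4]]

# Components a_j of the vector a along the basis.
a = [5, -2]
n = len(basis)

# Route 1: use T v_j = sum_i T_ij v_i, then linearity: T a = a_j (T v_j).
T_on_basis = [combine([T[i][j] for i in range(n)], basis) for j in range(n)]
Ta_from_definition = combine(a, T_on_basis)

# Route 2: matrix-vector product b_i = T_ij a_j, then b = b_i v_i.
b = [sum(T[i][j] * a[j] for j in range(n)) for i in range(n)]
Ta_from_matrix_product = combine(b, basis)

print("components b_i:", b)
print("route 1:", Ta_from_definition)
print("route 2:", Ta_from_matrix_product)
```

Both routes give the same vector, which is the content of the statement that $T_{ij} a_j$ is the component $b_i$.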
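We can apply similar logic to the product of two operators, $T$ and $S$. Suppose the product $TS$ operates on a basis vector $v_j$.

$$\begin{aligned}
(TS)v_j &= T(Sv_j) \\
&= T(S_{kj} v_k) \\
&= S_{kj} Tv_k \\
&= S_{kj} T_{ik} v_i \\
&= (T_{ik} S_{kj}) v_i \\
&= (TS)_{ij} v_i
\end{aligned}$$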
In this derivation, we’ve used the fact that the matrix elements $S_{kj}$ and $T_{ik}$ are just numbers, so they commute with all operators. We also applied equation 1 in the second and fourth lines. By comparing the last two lines, we see that the matrix element $(TS)_{ij}$ of the product is formed by taking the usual matrix product of $T$ and $S$, that is, by multiplying rows of $T$ into columns of $S$:

$$(TS)_{ij} = T_{ik} S_{kj}$$
Thus the traditional matrix product is actually a consequence of a consistent definition of a product of linear operators.
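As a numerical illustration of this conclusion, the following plain-Python sketch composes two operators by acting on basis vectors (first $S$, then $T$) and compares the resulting matrix with both the row-into-column product and the naive entry-by-entry product mentioned earlier. The matrices are arbitrary illustrative values, and all function names are hypothetical helpers written for this example.

```python
def mat_vec(M, x):
    """Ordinary matrix-vector product: (M x)_i = sum_k M[i][k] * x[k]."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


def mat_mul(A, B):
    """Row-into-column product: (AB)_ij = sum_k A_ik B_kj."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def entrywise(A, B):
    """Naive product that multiplies corresponding entries."""
    return [[A[i][j] * B[i][j] for j in range(len(A[0]))] for i in range(len(A))]


def matrix_of_composition(T, S):
    """Matrix of 'apply S, then T', built from its action on basis vectors."""
    n = len(S[0])
    columns = []
    for j in range(n):
        v_j = [1 if i == j else 0 for i in range(n)]
        columns.append(mat_vec(T, mat_vec(S, v_j)))  # column j is (TS) v_j
    # columns[j][i] holds entry (i, j), so transpose into row-major form.
    return [[columns[j][i] for j in range(n)] for i in range(len(T))]


# Arbitrary example matrices (illustrative values only).
T = [[1, 2],
     [0, 1]]
S = [[3, 0],
     [4, 5]]

composed = matrix_of_composition(T, S)

print("composition:   ", composed)
print("row-into-column:", mat_mul(T, S))
print("entry-by-entry: ", entrywise(T, S))
```

The composed operator matches the row-into-column product and differs from the entry-by-entry product, in line with the derivation above.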