
# Welcome to Physics Pages

This blog consists of my notes and solutions to problems in various areas of mainstream physics. An index to the topics covered is contained in the links in the sidebar on the right, or in the menu at the top of the page.

This isn’t a “popular science” site, in that most posts use a fair bit of mathematics to explain their concepts. Thus this blog aims mainly to help those who are learning or reviewing physics in depth. More details on what the site contains and how to use it are on the welcome page.

Despite Stephen Hawking’s caution that every equation included in a book (or, I suppose in a blog) would halve the readership, this blog has proved very popular since its inception in December 2010. Details of the number of visits and distinct visitors are given on the hit statistics page.

Many thanks to my loyal followers and best wishes to everyone who visits. I hope you find it useful. Constructive criticism (or even praise) is always welcome, so feel free to leave a comment in response to any of the posts.

I should point out that although I did study physics at the university level, this was back in the 1970s and by the time I started this blog in December 2010, I had forgotten pretty much everything I had learned back then. This blog represents my journey back to some level of literacy in physics. I am by no means a professional physicist or an authority on any aspect of the subject. I offer this blog as a record of my own notes and problem solutions as I worked through various books, in the hope that it will help, and possibly even inspire, others to explore this wonderful subject.

Before leaving a comment, you may find it useful to read the “Instructions for commenters”.

# Eigenvalues and eigenvectors

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 5.

While studying quantum mechanics, we have made extensive use of the eigenvalues and eigenvectors (the latter usually called eigenstates in quantum theory) of hermitian operators. An observable quantity in quantum mechanics is always represented by a hermitian operator, and the spectrum of possible values of that observable is the set of eigenvalues of the operator.

It’s useful to re-examine eigenvalues and eigenvectors from a strictly mathematical viewpoint, since this allows us to put precise definitions on many of the terms in common use. As usual, suppose we start with a vector space ${V}$ and an operator ${T}$. Suppose there is a one-dimensional subspace ${U}$ of ${V}$ with the property that for any vector ${u\in U}$, ${Tu=\lambda u}$ for some scalar ${\lambda}$ (by linearity, the same ${\lambda}$ works for every vector in ${U}$). That is, the operator ${T}$ maps any vector ${u}$ back into another vector in the same subspace ${U}$. In that case, ${U}$ is said to be an invariant subspace under the operator ${T}$.

You can think of this in geometric terms if we have some ${n}$-dimensional vector space ${V}$, and a one-dimensional subspace ${U}$ consisting of all vectors parallel to some straight line within ${V}$. The operator ${T}$ acting on any vector ${u}$ parallel to that line produces another vector which is also parallel to the same line. Of course we can’t push the geometric illustration too far, since in general ${V}$ and ${U}$ can be complex vector spaces, so the result of acting on ${u}$ with ${T}$ might give you some complex number ${\lambda}$ multiplied by ${u}$.

The equation

$\displaystyle Tu=\lambda u \ \ \ \ \ (1)$

is called an eigenvalue equation, and the number ${\lambda\in\mathbb{F}}$ is called the eigenvalue. The vector ${u}$ itself is called the eigenvector corresponding to the eigenvalue ${\lambda}$. Since we can multiply both sides of this equation by any number ${c}$, any multiple of ${u}$ is also an eigenvector corresponding to ${\lambda}$, so any vector ‘parallel’ to ${u}$ is also an eigenvector. (I’ve put ‘parallel’ in quotes, since we’re allowing for multiplication of ${u}$ by complex as well as real numbers.)

It can happen that, for a particular value of ${\lambda}$, there are two or more linearly independent (that is, non-parallel) eigenvectors. In that case, the subspace spanned by the eigenvectors is two- or higher-dimensional.

Another way of writing 1 is by introducing the identity operator ${I}$:

$\displaystyle \left(T-\lambda I\right)u=0 \ \ \ \ \ (2)$

If this equation has a solution other than ${u=0}$, then the operator ${T-\lambda I}$ has a non-trivial null space, which in turn means that ${T-\lambda I}$ is not injective (not one-to-one) and therefore not invertible. Also, the eigenvectors of ${T}$ with eigenvalue ${\lambda}$ are those vectors ${u}$ in the null space of ${T-\lambda I}$.
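
To make this concrete, here's a quick numerical check (my own addition, in Python with numpy; the particular hermitian matrix is arbitrary) that the eigenvalue equation 1 holds and that ${T-\lambda I}$ is singular at each eigenvalue:

```python
import numpy as np

# An arbitrary 2x2 hermitian (here real symmetric) matrix, for illustration only.
T = np.array([[2.0, 1.0],
              [1.0, 2.0]])
evals, evecs = np.linalg.eigh(T)            # eigenvalues and orthonormal eigenvectors
for lam, u in zip(evals, evecs.T):
    assert np.allclose(T @ u, lam * u)      # the eigenvalue equation Tu = lambda*u
    # T - lambda*I has a non-trivial null space, so its determinant vanishes:
    assert np.isclose(np.linalg.det(T - lam * np.eye(2)), 0.0)
print(evals)                                # [1. 3.]
```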

An important result is

Theorem 1 Suppose ${\lambda_{1},\ldots,\lambda_{m}}$ are distinct eigenvalues of ${T}$ and ${v_{1},\ldots,v_{m}}$ are the corresponding non-zero eigenvectors. Then the set ${v_{1},\ldots,v_{m}}$ is linearly independent.

Proof: Suppose to the contrary that ${v_{1},\ldots,v_{m}}$ is linearly dependent. Let ${k}$ be the smallest positive integer such that ${v_{k}}$ can be written as a linear combination of ${v_{1},\ldots,v_{k-1}}$; by this minimality, the set ${v_{1},\ldots,v_{k-1}}$ is linearly independent. Then there are numbers ${a_{1},\ldots,a_{k-1}\in\mathbb{F}}$ such that

$\displaystyle v_{k}=\sum_{i=1}^{k-1}a_{i}v_{i} \ \ \ \ \ (3)$

If we apply the operator ${T}$ to both sides and use the eigenvalue equation, we have

$\displaystyle \begin{array}{rcl} Tv_{k} & = & \lambda_{k}v_{k} \ \ \ \ \ (4)\\ & = & \sum_{i=1}^{k-1}a_{i}Tv_{i} \ \ \ \ \ (5)\\ & = & \sum_{i=1}^{k-1}a_{i}\lambda_{i}v_{i} \ \ \ \ \ (6) \end{array}$

We can multiply both sides of 3 by ${\lambda_{k}}$ and subtract to get

$\displaystyle \begin{array}{rcl} \left(\lambda_{k}-\lambda_{k}\right)v_{k} & = & \sum_{i=1}^{k-1}a_{i}\left(\lambda_{i}-\lambda_{k}\right)v_{i} \ \ \ \ \ (7)\\ & = & 0 \ \ \ \ \ (8) \end{array}$

Since the set of vectors ${v_{1},\ldots,v_{k-1}}$ is linearly independent, and ${\lambda_{k}\ne\lambda_{i}}$ for ${i=1,\ldots,k-1}$, the only solution of this equation is ${a_{i}=0}$ for ${i=1,\ldots,k-1}$. But this would make ${v_{k}=0}$, contrary to our assumption that ${v_{k}}$ is a non-zero eigenvector of ${T}$. Therefore the set ${v_{1},\ldots,v_{m}}$ is linearly independent. $\Box$

It turns out that there are some operators on real vector spaces that don’t have any eigenvalues. A simple example is the 2-dimensional vector space consisting of the ${xy}$ plane. The rotation operator which rotates any vector about the origin, by some angle that is not an integer multiple of ${\pi}$, doesn’t leave any vector parallel to itself and thus has no eigenvalues or eigenvectors. (A rotation by ${\pi}$ does have an eigenvalue, namely ${-1}$, since it sends every vector ${v}$ to ${-v}$.)
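
A quick numerical illustration (my own, in Python; the angle ${\pi/3}$ is an arbitrary choice): asking numpy for the eigenvalues of a rotation matrix returns complex numbers, so there are no real eigenvalues, although complex ones do exist, as the next theorem guarantees.

```python
import numpy as np

theta = np.pi / 3                            # any angle that isn't a multiple of pi
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(np.linalg.eig(Rot)[0])                 # [0.5+0.866j  0.5-0.866j]: none real
```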

However, in a complex vector space, things are a bit neater. This leads to the following theorem:

Theorem 2 Every operator on a finite-dimensional, nonzero, complex vector space has at least one eigenvalue.

Proof: Suppose ${V}$ is a complex vector space with dimension ${n>0}$. Choosing any nonzero vector ${v\in V}$, we can write the ${n+1}$ vectors

$\displaystyle v,Tv,T^{2}v,\ldots,T^{n}v \ \ \ \ \ (9)$

Because we have ${n+1}$ vectors in an ${n}$-dimensional vector space, these vectors must be linearly dependent, which means we can find complex numbers ${a_{0},\ldots,a_{n}\in\mathbb{C}}$, not all zero, such that

$\displaystyle 0=a_{0}v+a_{1}Tv+\ldots+a_{n}T^{n}v \ \ \ \ \ (10)$

We can consider a polynomial in ${z}$ with the ${a_{i}}$ as coefficients:

$\displaystyle p\left(z\right)=a_{0}+a_{1}z+\ldots+a_{n}z^{n} \ \ \ \ \ (11)$

The Fundamental Theorem of Algebra states that any polynomial of degree ${n}$ with complex coefficients can be factored into ${n}$ linear factors over ${\mathbb{C}}$. In our case, the actual degree of ${p\left(z\right)}$ is ${m\le n}$, since ${a_{n}}$ (and other high-order coefficients) could be zero. Note also that ${m\ge1}$: if all of ${a_{1},\ldots,a_{n}}$ were zero, then 10 would reduce to ${a_{0}v=0}$ with ${v\ne0}$, forcing ${a_{0}=0}$ as well, contradicting the fact that not all the ${a_{i}}$ are zero. So we can factor ${p\left(z\right)}$ as follows:

$\displaystyle p\left(z\right)=c\left(z-\lambda_{1}\right)\ldots\left(z-\lambda_{m}\right) \ \ \ \ \ (12)$

where ${c\ne0}$.

Comparing this to 10, we can write that equation as

$\displaystyle \begin{array}{rcl} 0 & = & a_{0}v+a_{1}Tv+\ldots+a_{n}T^{n}v \ \ \ \ \ (13)\\ & = & \left(a_{0}I+a_{1}T+\ldots+a_{n}T^{n}\right)v \ \ \ \ \ (14)\\ & = & c\left(T-\lambda_{1}I\right)\ldots\left(T-\lambda_{m}I\right)v \ \ \ \ \ (15) \end{array}$

All the ${T-\lambda_{i}I}$ operators in the last line commute with each other, since ${I}$ commutes with everything and ${T}$ commutes with itself. Now apply the factors to ${v}$ one at a time, starting from the right. Since ${v\ne0}$ and the full product gives 0, there must be a first factor ${T-\lambda_{i}I}$ that sends a nonzero vector (namely, the result of applying all the factors to its right) to zero. That is, there is at least one ${\lambda_{i}}$ such that ${T-\lambda_{i}I}$ has a nonzero null space, which means ${\lambda_{i}}$ is an eigenvalue.$\Box$
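
The proof is constructive enough to run as code. Here's a sketch of my own (Python with numpy; the random operator and vector are arbitrary) that extracts the coefficients ${a_{i}}$ from the linear dependence of ${v,Tv,\ldots,T^{n}v}$ and checks that one of the roots of ${p\left(z\right)}$ is indeed an eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # any nonzero vector

# The n+1 columns v, Tv, ..., T^n v are linearly dependent; the last right
# singular vector of K gives coefficients a with K @ a = 0.
K = np.column_stack([np.linalg.matrix_power(T, k) @ v for k in range(n + 1)])
a = np.linalg.svd(K)[2].conj()[-1]           # a = (a_0, ..., a_n), lowest power first
roots = np.roots(a[::-1])                    # np.roots wants the highest power first
# At least one root lambda_i must make T - lambda_i*I singular:
print(min(abs(np.linalg.det(T - lam * np.eye(n))) for lam in roots))   # ~ 0
```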

# Matrix representation of linear operators: change of basis

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.

We’ve seen that the matrix representation of a linear operator depends on the basis we’ve chosen within a vector space ${V}$. We now look at how the matrix representation changes if we change the basis. In what follows, we’ll consider two sets of basis vectors ${\left\{ v\right\} }$ and ${\left\{ u\right\} }$ and two operators ${A}$ and ${B}$. Operator ${A}$ transforms the basis ${\left\{ v\right\} }$ into the basis ${\left\{ u\right\} }$, while ${B}$ does the reverse. That is

$\displaystyle Av_{i} = u_{i} \ \ \ \ \ (1)$

$\displaystyle Bu_{i} = v_{i} \ \ \ \ \ (2)$

for all ${i=1,\ldots,n}$. From this definition, we can see that ${A=B^{-1}}$ and ${B=A^{-1}}$, since

$\displaystyle u_{i} = Av_{i}=ABu_{i} \ \ \ \ \ (3)$

$\displaystyle v_{i} = Bu_{i}=BAv_{i} \ \ \ \ \ (4)$

Theorem 1 An operator (like ${A}$ or ${B}$ above) that transforms one set of basis vectors into another has the same matrix representation in both bases.

Proof: In matrix form, we have (remember we’re using the summation convention on repeated indices):

$\displaystyle Av_{i} = A_{ji}\left(\left\{ v\right\} \right)v_{j} \ \ \ \ \ (5)$

$\displaystyle Au_{i} = A_{ji}\left(\left\{ u\right\} \right)u_{j} \ \ \ \ \ (6)$

Note that the matrix elements depend on different bases in the two equations.

We can now operate with ${A}$ again, using 1, to get

$\displaystyle \begin{array}{rcl} Au_{i} & = & A\left(Av_{i}\right) \ \ \ \ \ (7)\\ & = & A\left(A_{ji}\left(\left\{ v\right\} \right)v_{j}\right) \ \ \ \ \ (8)\\ & = & A_{ji}\left(\left\{ v\right\} \right)Av_{j} \ \ \ \ \ (9)\\ & = & A_{ji}\left(\left\{ v\right\} \right)u_{j} \ \ \ \ \ (10) \end{array}$

Comparing the last line with 6, we see that

$\displaystyle A_{ji}\left(\left\{ v\right\} \right)=A_{ji}\left(\left\{ u\right\} \right)$

Since the matrix elements are just numbers, this means that the elements in the two matrices ${A_{ji}\left(\left\{ v\right\} \right)}$ and ${A_{ji}\left(\left\{ u\right\} \right)}$ are the same.

We could do the same analysis using the ${B}$ operator with the same result:

$\displaystyle B_{ji}\left(\left\{ v\right\} \right)=B_{ji}\left(\left\{ u\right\} \right) \ \ \ \ \ (11)$

$\Box$

We can now turn to the matrix representations of a general operator ${T}$ in two different bases. In this case, ${T}$ can perform any linear transformation, so it doesn’t necessarily transform one set of basis vectors into another set of basis vectors. Consider first the case where ${T}$ operates on each set of basis vectors given above:

$\displaystyle Tv_{i} = T_{ji}\left(\left\{ v\right\} \right)v_{j} \ \ \ \ \ (12)$

$\displaystyle Tu_{i} = T_{ji}\left(\left\{ u\right\} \right)u_{j} \ \ \ \ \ (13)$

Unless ${T}$ is an operator like ${A}$ or ${B}$ above, in general ${T_{ji}\left(\left\{ v\right\} \right)\ne T_{ji}\left(\left\{ u\right\} \right)}$. We can see how these two matrices are related by using operators ${A}$ and ${B}$ above to write

$\displaystyle \begin{array}{rcl} Tu_{i} & = & T\left(A_{ji}v_{j}\right) \ \ \ \ \ (14)\\ & = & A_{ji}Tv_{j} \ \ \ \ \ (15)\\ & = & A_{ji}T_{kj}\left(\left\{ v\right\} \right)v_{k} \ \ \ \ \ (16)\\ & = & A_{ji}T_{kj}\left(\left\{ v\right\} \right)Bu_{k} \ \ \ \ \ (17)\\ & = & A_{ji}T_{kj}\left(\left\{ v\right\} \right)A^{-1}u_{k} \ \ \ \ \ (18)\\ & = & A_{ji}T_{kj}\left(\left\{ v\right\} \right)A_{pk}^{-1}u_{p} \ \ \ \ \ (19)\\ & = & \left[A_{pk}^{-1}T_{kj}\left(\left\{ v\right\} \right)A_{ji}\right]u_{p} \ \ \ \ \ (20)\\ & = & T_{pi}\left(\left\{ u\right\} \right)u_{p} \ \ \ \ \ (21) \end{array}$

We don’t need to specify the basis for the ${A}$ or ${B}$ matrices since the matrices are the same in both bases as we just saw above. The last line is just the expansion of ${Tu_{i}}$ in terms of the ${\left\{ u\right\} }$ basis. In the penultimate line, we see that the quantity in square brackets is the product of 3 matrices:

$\displaystyle A_{pk}^{-1}T_{kj}\left(\left\{ v\right\} \right)A_{ji}=\left[A^{-1}T\left(\left\{ v\right\} \right)A\right]_{pi} \ \ \ \ \ (22)$

The required transformation is therefore

$\displaystyle T\left(\left\{ u\right\} \right)=A^{-1}T\left(\left\{ v\right\} \right)A \ \ \ \ \ (23)$

where ${u_{i}=Av_{i}}$.

As a check, note that if ${T=A}$ or ${T=B=A^{-1}}$, we reclaim the result in the theorem above, namely that ${A\left(\left\{ u\right\} \right)=A\left(\left\{ v\right\} \right)}$ and ${B\left(\left\{ u\right\} \right)=B\left(\left\{ v\right\} \right)}$.

Trace and determinant

The trace of a matrix is the sum of its diagonal elements, written as ${\mbox{tr }T}$. A useful property of the trace is that

$\displaystyle \mbox{tr }\left(AB\right)=\mbox{tr }\left(BA\right) \ \ \ \ \ (24)$

We can prove this by looking at the components. If ${C=AB}$ then

$\displaystyle C_{ij}=A_{ik}B_{kj} \ \ \ \ \ (25)$

The trace of ${C}$ is the sum of its diagonal elements, written as ${C_{ii}}$, so

$\displaystyle \begin{array}{rcl} \mbox{tr }C & = & \mbox{tr }\left(AB\right) \ \ \ \ \ (26)\\ & = & A_{ik}B_{ki} \ \ \ \ \ (27)\\ & = & B_{ki}A_{ik} \ \ \ \ \ (28)\\ & = & \left[BA\right]_{kk} \ \ \ \ \ (29)\\ & = & \mbox{tr }\left(BA\right) \ \ \ \ \ (30) \end{array}$

From this we can generalize to the case of the trace of a product of any number of matrices and obtain the cyclic rule:

$\displaystyle \mbox{tr}\left(A_{1}A_{2}\ldots A_{n}\right)=\mbox{tr}\left(A_{n}A_{1}A_{2}\ldots A_{n-1}\right) \ \ \ \ \ (31)$

Going back to 23, we have

$\displaystyle \begin{array}{rcl} \mbox{tr }T\left(\left\{ u\right\} \right) & = & \mbox{tr}\left(A^{-1}T\left(\left\{ v\right\} \right)A\right) \ \ \ \ \ (32)\\ & = & \mbox{tr}\left(AA^{-1}T\left(\left\{ v\right\} \right)\right) \ \ \ \ \ (33)\\ & = & \mbox{tr }T\left(\left\{ v\right\} \right) \ \ \ \ \ (34) \end{array}$

Thus the trace of any linear operator is invariant under a change of basis.

For the determinant, we have the results that the determinant of a product of matrices is equal to the product of the determinants, and the determinant of a matrix inverse is the reciprocal of the determinant of the original matrix. Therefore

$\displaystyle \begin{array}{rcl} \mbox{det}\left(T\left(\left\{ u\right\} \right)\right) & = & \mbox{det}\left(A^{-1}T\left(\left\{ v\right\} \right)A\right) \ \ \ \ \ (35)\\ & = & \frac{\mbox{det}A}{\mbox{det}A}\mbox{det}T\left(\left\{ v\right\} \right) \ \ \ \ \ (36)\\ & = & \mbox{det}T\left(\left\{ v\right\} \right) \ \ \ \ \ (37) \end{array}$

Thus the determinant is also invariant under a change of basis.
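
Both invariance results, along with the cyclic rule 31, are easy to confirm numerically (another Python sketch of my own; all the matrices are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(4)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))   # cyclic rule

S = rng.standard_normal((3, 3))          # change-of-basis matrix (invertible)
Tv = rng.standard_normal((3, 3))         # T in the basis {v}
Tu = np.linalg.inv(S) @ Tv @ S           # T in the basis {u}
assert np.isclose(np.trace(Tu), np.trace(Tv))                 # trace invariant
assert np.isclose(np.linalg.det(Tu), np.linalg.det(Tv))       # determinant invariant
```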

# Matrix representation of linear operators; matrix multiplication

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.

A linear operator ${T}$ can be represented as a matrix with elements ${T_{ij}}$, but in order to do this, we need to specify which basis we’re using for the vector space ${V}$. Suppose we have a set of basis vectors ${\left\{ v\right\} =\left(v_{1},v_{2},\ldots,v_{n}\right)}$ and we know the result of operating on each basis vector with ${T}$. We can express the result of ${Tv_{j}}$ as another vector ${v_{j}^{\prime}}$ which can be written in terms of the original basis vectors as

$\displaystyle v_{j}^{\prime}=\sum_{i=1}^{n}T_{ij}v_{i} \ \ \ \ \ (1)$

This defines the matrix elements ${T_{ij}}$ in the basis ${\left\{ v\right\} }$. [In Zwiebach’s notes, he usually uses ${v_{i}}$ to represent the basis vectors, while in his lectures he tends to use ${e_{i}}$. I’ll stick to ${v_{i}}$ to be consistent with the notes.]

Equation 1 may not look quite right, since we are summing over the rows of the matrix ${T_{ij}}$ multiplied by the vectors ${v_{i}}$. Usually in matrix multiplication, we sum over the columns of the matrix on the left and the rows of the matrix (or vector) on the right. However, 1 isn’t actually a matrix multiplication formula, since each ${v_{i}}$ is an entire basis vector, and not a component from one vector.

To see that this formula does make sense, and does coincide with the usual definition of matrix multiplication, suppose we have an orthonormal basis where each vector ${v_{i}}$ is represented as a column vector with all entries equal to zero except for the ${i}$th element, which is 1. In that case, the result of operating on one particular basis vector ${v_{k}}$ with ${T}$ is

$\displaystyle \begin{array}{rcl} Tv_{k} & = & \left[\begin{array}{ccc} \ldots & T_{1k} & \ldots\\ \vdots & T_{2k} & \vdots\\ \vdots & \vdots & \vdots\\ \vdots & \vdots & \vdots\\ \ldots & T_{nk} & \ldots \end{array}\right]\left[\begin{array}{c} 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right] \ \ \ \ \ (2)\\ & = & \left[\begin{array}{c} T_{1k}\\ T_{2k}\\ \vdots\\ \vdots\\ T_{nk} \end{array}\right] \ \ \ \ \ (3)\\ & = & T_{1k}\left[\begin{array}{c} 1\\ 0\\ \vdots\\ \vdots\\ 0 \end{array}\right]+T_{2k}\left[\begin{array}{c} 0\\ 1\\ 0\\ \vdots\\ 0 \end{array}\right]+\ldots+T_{nk}\left[\begin{array}{c} 0\\ \vdots\\ \vdots\\ \vdots\\ 1 \end{array}\right] \ \ \ \ \ (4)\\ & = & \sum_{i=1}^{n}T_{ik}v_{i} \ \ \ \ \ (5) \end{array}$

In the column vector in the first line, all entries are zero except for the ${k}$th entry which is 1. Multiplying a square ${n\times n}$ matrix ${T_{ij}}$ into this column vector using the normal rules for matrix multiplication simply copies the ${k}$th column of ${T_{ij}}$ into a column vector, as shown in the second line.
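
In numpy this column-copying is a one-liner (my own illustration; the matrix is arbitrary):

```python
import numpy as np

T = np.arange(9.0).reshape(3, 3)    # an arbitrary 3x3 matrix
e2 = np.array([0.0, 1.0, 0.0])      # the second standard basis vector
print(T @ e2)                       # [1. 4. 7.]: exactly the second column T[:, 1]
```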

Although the matrix entries ${T_{ij}}$ in general depend on the basis, the identity operator ${I}$ has the same matrix in every basis. Writing its matrix elements as ${I_{ij}}$, the requirement ${Iv_{j}=v_{j}}$ means

$\displaystyle Iv_{j}=\sum_{i=1}^{n}I_{ij}v_{i}=v_{j} \ \ \ \ \ (6)$

This can be true for all ${v_{j}}$ only if ${I_{ij}=\delta_{ij}}$.

Matrix multiplication

If we start with 1 as the definition of the matrix elements of a linear operator ${T}$, we can actually derive the traditional formula for matrix multiplication from it. If we didn’t know the matrix multiplication formula beforehand (that is, the formula where we multiply a row of the left matrix into a column of the right matrix), we might naively assume that in order to multiply two matrices, we just multiply together the corresponding entries in the two matrices. If that were true, then matrix multiplication could be defined only for two matrices that had the same dimensions, as in two ${n\times m}$ matrices, say.

As you probably know, the accepted formula for the product of two matrices is valid if the number of columns in the left matrix equals the number of rows in the right matrix. To see how this formula arises naturally out of the matrix representation of linear operators, we’ll consider two vectors ${a}$ and ${b}$ and look at their components along some basis ${\left\{ v\right\} }$ in a vector space ${V}$. That is, we can expand ${a}$ and ${b}$ as (to save writing, I’ll use the summation convention in which any pair of repeated indices is assumed to be summed):

$\displaystyle a = a_{i}v_{i} \ \ \ \ \ (7)$

$\displaystyle b = b_{i}v_{i} \ \ \ \ \ (8)$

Now suppose there is a linear operator ${T}$ that transforms ${a}$ into ${b}$, so that

$\displaystyle b=Ta \ \ \ \ \ (9)$

If we know the effect of operating on each basis vector ${v_{i}}$ with ${T}$, we can plug 1 into this equation to get

$\displaystyle \begin{array}{rcl} b & = & Ta_{i}v_{i} \ \ \ \ \ (10)\\ & = & a_{i}Tv_{i} \ \ \ \ \ (11)\\ & = & a_{i}T_{ji}v_{j} \ \ \ \ \ (12)\\ & = & \left(T_{ji}a_{i}\right)v_{j} \ \ \ \ \ (13) \end{array}$

In the second line, we used the fact that the ${a_{i}}$ are just numbers (not vectors), so they commute with ${T}$. In the last line, the quantity ${T_{ji}a_{i}}$ is the sum over the columns of ${T}$ and the rows of ${a}$ (we’re writing ${a}$ as a column vector), and so is a traditional product of an ${n\times n}$ matrix into an ${n}$-component column vector. Also, referring back to 8, we see that ${T_{ji}a_{i}}$ is ${b_{j}}$, the component of ${b}$ along the basis vector ${v_{j}}$.

We can apply similar logic to the product of two operators, ${T}$ and ${S}$. Suppose the product ${TS}$ operates on a basis vector ${v_{j}}$.

$\displaystyle \begin{array}{rcl} \left(TS\right)v_{j} & = & T\left(Sv_{j}\right) \ \ \ \ \ (14)\\ & = & TS_{pj}v_{p} \ \ \ \ \ (15)\\ & = & S_{pj}Tv_{p} \ \ \ \ \ (16)\\ & = & S_{pj}T_{ip}v_{i} \ \ \ \ \ (17)\\ & = & \left(T_{ip}S_{pj}\right)v_{i} \ \ \ \ \ (18)\\ & = & \left(TS\right)_{ij}v_{i} \ \ \ \ \ (19) \end{array}$

In this derivation, we’ve used the fact that the matrix elements ${T_{ij}}$ and ${S_{ij}}$ are just numbers, so they commute with all operators. We also applied 1 in the second and fourth lines. By comparing the last two lines, we see that the matrix element ${\left(TS\right)_{ij}}$ of the product is formed by taking the usual matrix product of ${T}$ and ${S}$, that is, by multiplying rows of ${T}$ into columns of ${S}$:

$\displaystyle \left(TS\right)_{ij}=T_{ip}S_{pj} \ \ \ \ \ (20)$

Thus the traditional matrix product is actually a consequence of a consistent definition of a product of linear operators.
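
As a check (my own Python sketch, with arbitrary random matrices), we can build the matrix of the composition ${TS}$ column by column by applying the two operators in succession to the standard basis vectors, and compare the result with the ordinary matrix product:

```python
import numpy as np

rng = np.random.default_rng(2)
T, S = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

# Column j of the composed operator is T(S e_j), by eq. 1 applied twice.
cols = [T @ (S @ e) for e in np.eye(3)]
assert np.allclose(np.column_stack(cols), T @ S)   # (TS)_{ij} = T_{ip} S_{pj}
```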

# Inverses of linear operators

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.

On a vector space ${V}$, a linear operator ${T}$ has an inverse ${S}$ if ${TSv=STv=v}$ for all ${v\in V}$. Here, we’re restricting the arguments given in Axler’s section 3.D by assuming that all linear operators act from ${V}$ back onto ${V}$. (Axler’s arguments allow ${T}$ to map vectors from ${V}$ into another vector space ${W}$.) We can show first that ${S}$ is unique:

Theorem 1 If a linear operator ${T}$ has an inverse, the inverse is unique.

Proof: Suppose that there are two distinct inverses ${S_{1}}$ and ${S_{2}}$. Then

$\displaystyle S_{1}=S_{1}I=S_{1}\left(TS_{2}\right)=\left(S_{1}T\right)S_{2}=IS_{2}=S_{2}$

We’ve made use of the fact that ${TS_{2}=S_{1}T=I}$, the identity operator, by the definition of the inverse, and we’ve also used the associativity of operator multiplication to go from the third to the fourth term. $\Box$

Because the inverse is unique, we can refer to it by the notation ${T^{-1}}$.

There is an important general result about operator inverses:

Theorem 2 A linear operator is invertible if and only if it is both injective and surjective.

Proof: We first recall the definitions of injective and surjective. An injective operator is one-to-one, so that if ${Tv_{1}=Tv_{2}}$, then ${v_{1}=v_{2}}$. A surjective operator has the entire vector space ${V}$ as its range.

As this is an ‘if and only if’ proof, we need to prove the theorem in both directions. First, assume that ${T^{-1}}$ exists, so that the operator is invertible. Then for ${u,v\in V}$ suppose that ${Tu=Tv}$. Applying the inverse, we get

$\displaystyle T^{-1}Tu=u=T^{-1}Tv=v \ \ \ \ \ (1)$

so we must have ${u=v}$, making ${T}$ injective.

Next, to prove that the existence of an inverse implies surjectivity, we note that we can write any vector ${v\in V}$ as

$\displaystyle v=T\left(T^{-1}v\right) \ \ \ \ \ (2)$

That is, there is a vector ${T^{-1}v}$ which, when operated on by ${T}$, gives any vector ${v\in V}$. Thus every vector ${v\in V}$ is in the range of ${T}$, making ${T}$ surjective.

Now we need to prove that if ${T}$ is both injective and surjective, it has an inverse. Let the operator ${S}$ be defined by the property that for any vector ${w\in V}$, ${v=Sw}$ is a vector ${v\in V}$ such that ${Tv=w}$. Because ${T}$ is injective, ${v=Sw}$ is unique, since only one vector ${v}$ can satisfy ${Tv=w}$. We also know that ${v}$ exists for every ${w\in V}$ because ${T}$ is surjective, so its range includes every vector in ${V}$. Since ${Tv=T\left(Sw\right)=\left(TS\right)w=w}$, we must have ${TS=I}$, so ${S}$ is a right inverse of ${T}$. To prove that ${ST=I}$ as well, we have

$\displaystyle T\left(STv\right)=\left(TS\right)Tv=ITv=Tv \ \ \ \ \ (3)$

Comparing first and last terms, we see that ${ST=I}$. Thus ${TS=ST=I}$ and ${S=T^{-1}}$ so ${T}$ has an inverse.

[The full proof also requires that we show that ${S}$ is linear; the details are done in Axler’s theorem 3.56 if you’re interested.] $\Box$

In the above proof, we did not need to assume anything about the dimension of the vector space ${V}$, so the result is valid for both finite and infinite-dimensional vector spaces. For a finite-dimensional vector space, the result can be made even stronger. The fundamental theorem of linear maps states that for a finite-dimensional vector space ${V}$ and linear operator ${T}$

$\displaystyle \mbox{dim }V=\mbox{dim null }T+\mbox{dim range }T \ \ \ \ \ (4)$

If ${T}$ is injective, then ${\mbox{dim null }T=0}$, since 0 is the only vector in the null space. Thus ${\mbox{dim range }T=\mbox{dim }V}$ and ${T}$ is surjective. Conversely, if we know that ${\mbox{dim range }T=\mbox{dim }V}$, then the theorem tells us that the null space has 0 dimensions. In other words, for a finite-dimensional vector space, injectivity implies surjectivity and vice versa. Thus, in this case, if we know that ${T}$ is either injective or surjective, we know it has an inverse.
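
A small numerical illustration of the finite-dimensional case (my own addition, in Python; the rank-1 matrix is just an example of an operator that is neither injective nor surjective):

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # rank 1: not injective, not surjective
rank = np.linalg.matrix_rank(T)
print(rank, T.shape[1] - rank)          # 1 1: dim range T + dim null T = dim V = 2
try:
    np.linalg.inv(T)
except np.linalg.LinAlgError:
    print("T is singular, so no inverse exists")
```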

# Linear operators: null space, range, injectivity and surjectivity

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.

We’ve looked at some basic properties of linear operators, so we’ll carry on with a few more definitions and theorems.

Null space and injectivity

First, we define the null space or kernel of an operator ${T}$ to be the set of all vectors which ${T}$ maps to the zero vector:

$\displaystyle \mbox{null }T=\left\{ v\in V:Tv=0\right\} \ \ \ \ \ (1)$

Theorem 1 The null space is a subspace.

Proof: Since ${T}$ is a linear operator, it maps the zero vector 0 to the zero vector, so ${0\in\mbox{null }T}$. If two other vectors ${u,v\in\mbox{null }T}$, then so is their sum, by the additivity property of linear operators. By the homogeneity property, if ${u\in\mbox{null }T}$ and ${\lambda\in\mathbb{F}}$, then ${T\left(\lambda u\right)=\lambda Tu=0}$. Thus ${\mbox{null }T}$ is closed under addition and scalar multiplication, and contains the 0 vector, so it is a subspace. $\Box$

An operator ${T}$ is injective if ${Tu=Tv\rightarrow u=v}$. That is, no two vectors are mapped to the same vector by the operator. An injective operator is also called one-to-one (or, in Zwiebach’s notes, two-to-two).

A useful result is the following:

Theorem 2 A linear operator ${T}$ is injective if and only if ${\mbox{null }T=\left\{ 0\right\} }$. That is, if the only vector mapped to 0 is 0 itself, then ${T}$ is one-to-one.

Proof: If ${T}$ is injective, then the only vector mapped to 0 is 0 itself, since ${T0=0}$ for any linear operator and injectivity forbids a second vector from sharing that image. To prove the converse, suppose ${\mbox{null }T=\left\{ 0\right\} }$ and assume there are two different vectors ${u,v}$ such that ${Tu=Tv}$. Then ${Tu-Tv=0=T\left(u-v\right)}$. However, since ${\mbox{null }T=\left\{ 0\right\} }$ the only vector in the null space is 0, so we must have ${u-v=0}$ or ${u=v}$. Thus ${T}$ is injective. $\Box$

Range and surjectivity

The range of an operator ${T}$ is the set of all vectors produced by operating on vectors with ${T}$. That is

$\displaystyle W=\mbox{range }T=\left\{ Tv:v\in V\right\} \ \ \ \ \ (2)$

The notation ${T\in\mathcal{L}\left(V,W\right)}$ means that the operator ${T}$ maps the vector space ${V}$ into the vector space ${W}$; the range of ${T}$ is then a subspace of ${W}$.

Theorem 3 The range is a subspace.

Proof: As before, we must have ${T\left(0\right)=0}$, so the zero vector is in the range. To show that the range is closed under addition, choose two vectors ${v_{1},v_{2}\in V}$. The corresponding vectors in the range are ${w_{1}=Tv_{1}}$ and ${w_{2}=Tv_{2}}$. By linearity we must have ${T\left(v_{1}+v_{2}\right)=Tv_{1}+Tv_{2}=w_{1}+w_{2}}$, so ${W}$ is closed under addition. Similarly for scalar multiplication, if ${w=Tv}$ then ${T\left(\lambda v\right)=\lambda Tv=\lambda w}$. $\Box$

An operator is surjective if ${W=V}$, that is, if the range is the same as the original vector space upon which ${T}$ operates.

The null space and range of an operator obey the fundamental theorem of linear maps:

Theorem 4 If ${V}$ is finite-dimensional and ${T\in\mathcal{L}\left(V,W\right)}$ then ${\mbox{range }T}$ is finite-dimensional and

$\displaystyle \mbox{dim }V=\mbox{dim null }T+\mbox{dim range }T \ \ \ \ \ (3)$

That is, the dimension of the null space and the dimension of the range add up to the dimension of the original vector space on which ${T}$ operates. Note that this makes sense only for finite-dimensional vector spaces.

Proof: Let ${u_{1},\ldots,u_{m}}$ be a basis of ${\mbox{null }T}$. If ${\mbox{dim null }T<\mbox{dim }V}$, we can add some more vectors to the basis ${u_{1},\ldots,u_{m}}$ to get a basis for ${V}$ (we haven’t proved this, but the proof is in Axler section 2.33). Suppose we need another ${n}$ vectors to do this. We then have a basis for ${V}$:

$\displaystyle u_{1},\ldots,u_{m},v_{1},\ldots,v_{n} \ \ \ \ \ (4)$

The dimension of ${V}$ is therefore ${m+n}$. To complete the proof, we need to show that ${\mbox{dim range }T=n}$, which we can do if we can show that ${Tv_{1},\ldots,Tv_{n}}$ is a basis for ${\mbox{range }T}$.

Since 4 is a basis for ${V}$, we can write any vector ${v\in V}$ as a linear combination

$\displaystyle v=\sum_{i=1}^{m}a_{i}u_{i}+\sum_{j=1}^{n}b_{j}v_{j} \ \ \ \ \ (5)$

Operating on this equation with ${T}$ and using the fact that ${Tu_{i}=0}$ since the ${u_{i}}$ are a basis for the null space of ${T}$, we have

$\displaystyle Tv=\sum_{j=1}^{n}b_{j}Tv_{j} \ \ \ \ \ (6)$

Thus any vector in ${\mbox{range }T}$ can be written as a linear combination of the vectors ${Tv_{i}}$, so the vectors ${Tv_{i}}$ span the range of ${T}$.

To complete the proof, we need to show that the vectors ${Tv_{i}}$ are linearly independent and thus form a basis for ${\mbox{range }T}$. To do this, suppose we have the equation

$\displaystyle \sum_{i=1}^{n}c_{i}Tv_{i}=0 \ \ \ \ \ (7)$

By linearity, we have

$\displaystyle T\sum_{i=1}^{n}c_{i}v_{i}=0 \ \ \ \ \ (8)$

so ${\sum_{i=1}^{n}c_{i}v_{i}\in\mbox{null }T}$ and we can write

$\displaystyle \sum_{i=1}^{n}c_{i}v_{i}=\sum_{i=1}^{m}d_{i}u_{i} \ \ \ \ \ (9)$

However, because the list 4 of vectors is a basis for ${V}$, all the vectors in this list are linearly independent. Moving everything in 9 to one side gives a linear combination of ${v_{1},\ldots,v_{n},u_{1},\ldots,u_{m}}$ that equals zero, so all the coefficients must vanish; that is, all the ${c_{i}}$s and ${d_{i}}$s are zero. Going back to 7, this means that the only solution is ${c_{i}=0}$ for all ${i}$, which means that the vectors ${Tv_{i}}$ are linearly independent and span ${\mbox{range }T}$, so they form a basis for ${\mbox{range }T}$. Thus ${\mbox{dim range }T=n}$.$\Box$
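
The dimension formula 3 is easy to check numerically (a Python sketch of my own; the shapes and the rank-3 construction are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
# A linear map from a 7-dimensional space to a 5-dimensional one, built
# to have rank (at most, and here exactly) 3:
T = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))
rank = np.linalg.matrix_rank(T)         # dim range T
nullity = T.shape[1] - rank             # dim null T
assert rank + nullity == T.shape[1]     # dim V = dim null T + dim range T
print(rank, nullity)                    # 3 4
```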

# Linear operators & commutators

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 3.

Having looked at some of the properties of a vector space, we can now look at linear maps. A linear map ${T}$ is defined as a function that maps one vector space ${V}$ into another (possibly the same) vector space ${W}$, written as

$\displaystyle T:V\rightarrow W \ \ \ \ \ (1)$

The linear map ${T}$ must satisfy the two properties

1. Additivity: ${T\left(u+v\right)=Tu+Tv}$ for all ${u,v\in V}$.
2. Homogeneity: ${T\left(\lambda v\right)=\lambda\left(Tv\right)}$ for all ${\lambda\in\mathbb{F}}$ and all ${v\in V}$. As usual, the field ${\mathbb{F}}$ is either the set of real or complex numbers.

This definition of a linear map is general in the sense that the two vector spaces ${V}$ and ${W}$ can be any two vector spaces. In physics, it’s more common to have ${V=W}$, and in such a case, the linear map ${T}$ is called a linear operator.

With a couple of extra definitions, the set ${\mathcal{L}\left(V\right)}$ of all linear operators on ${V}$ is itself a vector space, with the operators being the vectors. In order for this to be true, we need the following:

1. Zero operator: A zero operator, written as just 0 (the same symbol now being used for three distinct objects: the scalar 0, the vector 0 and the operator 0; again the correct meaning is usually easy to deduce from the context) which has the property that the result of acting with 0 on any vector produces the 0 vector. That is ${0v=0}$, where the 0 on the LHS is the zero operator and the 0 on the RHS is the zero vector.
2. Identity operator: An identity operator ${I}$ (sometimes written as 1) leaves any vector unchanged, so that ${Iv=v}$ for all ${v\in V}$.

With these definitions, together with the obvious pointwise definitions of operator addition and scalar multiplication (${\left(S+T\right)v\equiv Sv+Tv}$ and ${\left(\lambda T\right)v\equiv\lambda\left(Tv\right)}$), ${\mathcal{L}\left(V\right)}$ is now a vector space: it satisfies the additivity and homogeneity properties and contains an additive identity (the zero operator). The identity operator serves as a multiplicative identity for the product of operators defined next.

In addition, there is a natural definition of the multiplication of two linear operators ${S}$ and ${T}$, written as ${ST}$. When a product operates on a vector ${v\in V}$, we just operate from right to left in succession, so that

$\displaystyle \left(ST\right)v=S\left(Tv\right) \ \ \ \ \ (2)$

The product of two operators produces another operator also in ${\mathcal{L}\left(V\right)}$, since this product also satisfies additivity and homogeneity:

$\displaystyle \begin{array}{rcl} \left(ST\right)\left(u+v\right) & = & S\left(T\left(u+v\right)\right) \ \ \ \ \ (3)\\ & = & S\left(Tu+Tv\right) \ \ \ \ \ (4)\\ & = & STu+STv \ \ \ \ \ (5)\\ \left(ST\right)\left(\lambda v\right) & = & S\left(T\left(\lambda v\right)\right) \ \ \ \ \ (6)\\ & = & S\left(\lambda Tv\right) \ \ \ \ \ (7)\\ & = & \lambda S\left(Tv\right) \ \ \ \ \ (8)\\ & = & \lambda\left(ST\right)v \ \ \ \ \ (9) \end{array}$

A very important property of operator multiplication is that it is not commutative. We’ve already seen many examples of this in our journey through quantum mechanics with operators such as position and momentum, angular momentum and so on. The non-commutativity is a fundamental mathematical property however, and can be seen in other examples that have nothing to do with quantum theory.

For example, consider the left shift operator ${L}$ and right shift operator ${R}$, defined to act on the vector space consisting of infinite sequences of numbers. That is, our vector space ${V}$ is such that

$\displaystyle v=\left(x_{1},x_{2},x_{3},\ldots\right) \ \ \ \ \ (10)$

where ${x_{i}\in\mathbb{F}}$. The shift operators have the following effects:

$\displaystyle Lv = \left(x_{2},x_{3},\ldots\right) \ \ \ \ \ (11)$

$\displaystyle Rv = \left(0,x_{1},x_{2},x_{3},\ldots\right) \ \ \ \ \ (12)$

The ${L}$ operator removes the first element in the sequence, while the ${R}$ operator inserts a 0 (number!) as the new first element in the sequence. Note that 0 is the only number we could insert into a sequence in order that ${R}$ be a linear operator, since from additivity above, we must have ${R0=0}$. That is, if we start with ${v=0}$ (the vector all of whose elements ${x_{i}=0}$), then ${R0}$ must also give the zero vector.

The two products ${LR}$ and ${RL}$ produce different results:

$\displaystyle LRv = L\left(0,x_{1},x_{2},x_{3},\ldots\right)=\left(x_{1},x_{2},x_{3},\ldots\right)=v \ \ \ \ \ (13)$

$\displaystyle RLv = R\left(x_{2},x_{3},\ldots\right)=\left(0,x_{2},x_{3},\ldots\right)\ne v \ \ \ \ \ (14)$

The difference ${\left[L,R\right]\equiv LR-RL}$ is called the commutator of the two operators ${L}$ and ${R}$. If we introduce the operator which projects out the first element in the sequence:

$\displaystyle \begin{array}{rcl} P_{1}v & \equiv & P_{1}\left(x_{1},x_{2},x_{3},\ldots\right)=\left(x_{1},0,0,\ldots\right) \ \ \ \ \ (15)\\ Iv-P_{1}v & = & \left(x_{1},x_{2},x_{3},\ldots\right)-\left(x_{1},0,0,\ldots\right) \ \ \ \ \ (16)\\ & = & \left(0,x_{2},x_{3},\ldots\right) \ \ \ \ \ (17) \end{array}$

then we have

$\displaystyle \begin{array}{rcl} \left[L,R\right]v & = & Iv-\left(Iv-P_{1}v\right) \ \ \ \ \ (18)\\ & = & P_{1}v \ \ \ \ \ (19) \end{array}$
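
The shift operators are easy to play with in code. Here's a minimal Python sketch of my own (a finite array stands in for the infinite sequence; conveniently, ${R}$ lengthens the array by one while ${L}$ shortens it, so both compositions return an array of the original length):

```python
import numpy as np

def L(v):
    """Left shift: drop the first element."""
    return v[1:]

def R(v):
    """Right shift: prepend a zero."""
    return np.concatenate(([0.0], v))

v = np.array([1.0, 2.0, 3.0, 4.0])   # finite stand-in for (x1, x2, x3, ...)
print(L(R(v)))                       # [1. 2. 3. 4.]  LR = I
print(R(L(v)))                       # [0. 2. 3. 4.]  RL = I - P1
print(L(R(v)) - R(L(v)))             # [1. 0. 0. 0.]  [L,R] = P1
```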

# Vector spaces: span, linear independence and basis

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 2.

Here, we investigate the ideas of the span of a vector space and see how this leads to the idea of linear independence of a set of vectors. I’ll summarize the main definitions and results here for future use; a more complete explanation together with some examples is given in Axler’s book, Chapter 2.

Span of a list of vectors

A list of vectors is just a subset of the vectors in a vector space, with the condition that the number of vectors in the subset is finite. The set of all linear combinations of the vectors ${\left(v_{1},\ldots,v_{m}\right)}$ in a list is called the span of that list. Since a general linear combination has the form

$\displaystyle v=\sum_{i=1}^{m}a_{i}v_{i} \ \ \ \ \ (1)$

where ${a_{i}\in\mathbb{F}}$ (recall that the field ${\mathbb{F}}$ is always taken to be either the real numbers ${\mathbb{R}}$ or the complex numbers ${\mathbb{C}}$), the span of a list itself forms a vector space which is a subspace of the original vector space. One result we can show is

Theorem 1 The span of a list of vectors in a vector space ${V}$ is the smallest subspace of ${V}$ containing all the vectors in the list.

Proof: Let the list be ${L\equiv\left(v_{1},\ldots,v_{m}\right)}$. Then ${S\equiv\mbox{span}\left(v_{1},\ldots,v_{m}\right)}$ is a subspace: it contains the zero vector (set all the ${a_{i}}$s to zero in 1), and since it contains all linear combinations of the list, it is closed under addition and scalar multiplication.

The span ${S}$ contains all ${v_{j}\in L}$ (just set ${a_{i}=\delta_{ij}}$ in 1). Now if we look at a subspace of ${V}$ that contains all the ${v_{i}}$s, it must also contain every vector in the span ${S}$, since a subspace must be closed under addition and scalar multiplication. Thus ${S}$ is the smallest subspace of ${V}$ that contains all the vectors in ${L}$. $\Box$

If ${S\equiv\mbox{span}\left(v_{1},\ldots,v_{m}\right)=V}$, that is, the span of a list is the same as the original vector space, then we say that ${\left(v_{1},\ldots,v_{m}\right)}$ spans ${V}$. This leads to the definition that a vector space is called finite-dimensional if it is spanned by some list of vectors. (Remember that all lists are finite in length!) A vector space that is not finite-dimensional is called (not surprisingly) infinite-dimensional.

Linear independence

Suppose ${\left(v_{1},\ldots,v_{m}\right)}$ is a list of vectors in ${V}$ and ${v}$ is a vector such that ${v\in\mbox{span}\left(v_{1},\ldots,v_{m}\right)}$. This means that ${v}$ is a linear combination of ${\left(v_{1},\ldots,v_{m}\right)}$, so that 1 is true. However, using only the definitions above, there is no guarantee that there is only one choice for the scalars ${a_{i}}$ that satisfies 1. We might also have, for example

$\displaystyle v=\sum_{i=1}^{m}c_{i}v_{i} \ \ \ \ \ (2)$

where ${c_{i}\ne a_{i}}$. This means that we can write the zero vector as

$\displaystyle 0=\sum_{i=1}^{m}\left(a_{i}-c_{i}\right)v_{i} \ \ \ \ \ (3)$

Now, if the only way we can satisfy this equation is to require that ${a_{i}=c_{i}}$ for all ${i}$, then we say that the list ${\left(v_{1},\ldots,v_{m}\right)}$ is linearly independent. (For completeness, the empty list (containing no vectors) is also declared to be linearly independent.) By reversing the above argument, we see that if the list ${\left(v_{1},\ldots,v_{m}\right)}$ is linearly independent, then there is only one set of scalars ${a_{i}}$ such that 1 is satisfied. In other words, any vector ${v\in\mbox{span}\left(v_{1},\ldots,v_{m}\right)}$ has only one representation as a linear combination of the vectors in the list.

A list that is not linearly independent is, again not surprisingly, defined to be linearly dependent. This leads to the linear dependence lemma:

Lemma 2 Suppose ${\left(v_{1},\ldots,v_{m}\right)}$ is a linearly dependent list in ${V}$. Then there exists some ${j\in\left\{ 1,2,\ldots,m\right\} }$ such that

(a) ${v_{j}\in\mbox{span}\left(v_{1},\ldots,v_{j-1}\right)}$;

(b) if ${v_{j}}$ is removed from the list ${\left(v_{1},\ldots,v_{m}\right)}$, the span of the remaining list, containing ${m-1}$ vectors, equals the span of the original list.

Proof: Because ${\left(v_{1},\ldots,v_{m}\right)}$ is linearly dependent, we can write

$\displaystyle \sum_{i=1}^{m}a_{i}v_{i}=0 \ \ \ \ \ (4)$

where not all of the ${a_{i}}$s are zero. Suppose ${j}$ is the largest index where ${a_{j}\ne0.}$ Then we can divide through by ${a_{j}}$ to get

$\displaystyle v_{j}=-\frac{1}{a_{j}}\sum_{i=1}^{j-1}a_{i}v_{i} \ \ \ \ \ (5)$

Thus ${v_{j}}$ is a linear combination of other vectors in the list, which proves part (a). Part (b) follows from the fact that we can represent any vector ${u\in\mbox{span}\left(v_{1},\ldots,v_{m}\right)}$ as

$\displaystyle u=\sum_{i=1}^{m}a_{i}v_{i} \ \ \ \ \ (6)$

We can replace ${v_{j}}$ in this sum by 5, so ${u}$ can be written as a linear combination of all the vectors in the list ${\left(v_{1},\ldots,v_{m}\right)}$ except for ${v_{j}}$. Thus (b) is true. $\Box$

We can use this lemma to prove the main result about linearly independent lists:

Theorem 3 In a finite-dimensional vector space ${V}$, the length of every linearly independent list is less than or equal to the length of every list that spans ${V}$.

Proof: Suppose the list ${A\equiv\left(u_{1},\ldots,u_{m}\right)}$ is linearly independent in ${V}$, and suppose another list ${B\equiv\left(w_{1},\ldots,w_{n}\right)}$ spans ${V}$. We want to prove that ${m\le n}$.

Since ${B}$ already spans ${V}$, if we add any other vector from ${V}$ to the list ${B}$, we will get a linearly dependent list, since this newly added vector can, by the definition of a span, be expressed as a linear combination of the vectors in ${B}$. In particular, if we add ${u_{1}}$ from the list ${A}$ to ${B}$, then the list ${\left(u_{1},w_{1},\ldots,w_{n}\right)}$ is linearly dependent. By the linear independence lemma above, we can therefore remove one of the ${w_{i}}$s from ${B}$ so that the remaining list still spans ${V}$, and contains ${n}$ vectors. For the sake of argument, let’s say we remove ${w_{n}}$ (we can always order the ${w_{i}}$s in the list so that the element we remove is at the end). Then we’re left with the revised list ${B_{1}=\left(u_{1},w_{1},\ldots,w_{n-1}\right)}$.

We can repeat this process ${m}$ times, each time adding the next element ${u_{i}}$ from list ${A}$ (placing it after the previously added ${u}$s) and removing one of the ${w_{i}}$s. The linear dependence lemma guarantees that the vector removed at each step can always be chosen to be a ${w_{i}}$ rather than a ${u_{i}}$: the lemma gives some vector that is a linear combination of the vectors before it, and since the ${u}$s at the front of the list are linearly independent, that vector must be one of the ${w}$s. Thus there must be at least as many ${w_{i}}$s as ${u_{i}}$s. In other words, ${m\le n}$, which is what we wanted to prove. $\Box$

This theorem can be used to show easily that any list of more than ${n}$ vectors in ${n}$-dimensional space cannot be linearly independent, since we know that we can span ${n}$-dimensional space with ${n}$ vectors (for example, the 3 coordinate axes in 3-d space). Conversely, since we can find a list of ${n}$ vectors in ${n}$-dimensional space that is linearly independent, any list of fewer than ${n}$ vectors cannot span ${n}$-dimensional space.

Basis of a finite-dimensional vector space

A basis of a finite-dimensional vector space is defined to be a list that is both linearly independent and spans the space. The dimension of the vector space is defined to be the length of a basis list. For example, in 3-d space, the list ${\left\{ \left(1,0,0\right),\left(0,1,0\right),\left(0,0,1\right)\right\} }$ is a basis, and since the length is 3, the dimension of the vector space is also 3. Any proper subset (that is, a subset with fewer than 3 members) of this basis is also linearly independent, but it does not span the space so is not a basis. For example, the list ${\left\{ \left(1,0,0\right),\left(0,1,0\right)\right\} }$ is linearly independent, but spans only the ${xy}$ plane.
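
In code (a small numpy sketch of my own), both linear independence and span can be read off from the rank of the matrix whose columns are the vectors in the list:

```python
import numpy as np

# The list is linearly independent iff the rank equals the number of vectors,
# and spans R^3 iff the rank equals 3.
vecs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]]).T      # two vectors in R^3, as columns
print(np.linalg.matrix_rank(vecs))        # 2: independent, but doesn't span R^3
```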

# Subspaces and direct sums

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 1.

Subspaces

Having defined a general vector space, we can now consider subspaces of a vector space. Put simply, a subspace is a subset of a vector space that is itself a vector space. That is, the elements of the subspace must satisfy all the conditions of a vector space, which are

1. A vector space is a set ${V}$ with two operations, addition and scalar multiplication, defined on the set.
2. The addition property is a function that assigns an element ${u+v\in V}$ to each pair of elements ${u,v\in V}$. Note that this definition implies completeness, in the sense that every sum of two vectors in ${V}$ must also be in ${V}$. This definition includes the traditional notion of vector addition in 2-d or 3-d space (that is, where a vector is represented by an arrow, and vector addition is performed by putting the tail of the second vector onto the head of the first and drawing the resulting vector as the sum), but vector addition is much more general than that.
3. Scalar multiplication means that we can take an ordinary number ${\lambda}$ from some field ${\mathbb{F}}$ (in quantum theory, ${\mathbb{F}}$ will always be either the set of real numbers ${\mathbb{R}}$ or the set of complex numbers ${\mathbb{C}}$) and define a function in which the vector obtained by multiplying an existing vector ${v}$ by ${\lambda}$ gives another vector ${\lambda v\in V}$. Note that again, completeness is implied by this definition: every vector ${\lambda v}$ obtained through scalar multiplication must also be in the space ${V}$.
4. Addition is commutative, so that ${u+v=v+u}$.
5. Addition and scalar multiplication are associative, so that ${\left(u+v\right)+w=u+\left(v+w\right)}$ and ${\left(ab\right)v=a\left(bv\right)}$, where ${u,v,w\in V}$ and ${a,b\in\mathbb{F}}$.
6. There is an additive identity element ${0\in V}$ such that ${v+0=v}$ for all ${v\in V}$. Note that here 0 is a vector, not a scalar. In practice, there is also a zero scalar number which is also denoted by 0, so we need to rely on the context to tell whether 0 refers to a vector or a number. Usually this isn’t too hard.
7. Every vector ${v\in V}$ has an additive inverse ${w\in V}$ with the property that ${v+w=0}$. The additive inverse of ${v}$ is written as ${-v}$ and ${w-v}$ is defined to be ${w+\left(-v\right)}$.
8. There is a (scalar) multiplicative identity number 1 with the property that ${1v=v}$ for all ${v\in V}$.
9. Scalar multiplication is distributive, in the sense that ${a\left(u+v\right)=au+av}$ and ${\left(a+b\right)v=av+bv}$ for all ${a,b\in\mathbb{F}}$ and all ${u,v\in V}$.

Since a subset will automatically satisfy all the conditions except possibly for numbers 2, 3 and 6, we need test only these three conditions to verify that a subset is a subspace. In particular, we need to check that

• The subset is closed under addition (satisfying property 2 above).
• The subset is closed under scalar multiplication (satisfying property 3).
• The subset contains the additive identity element 0 (satisfying property 6).

As an example, suppose we start with a vector space consisting of all the vectors pointing to locations in 3-d space (here, a ‘vector’ is the traditional line with an arrow on the end). Any subspace of this vector space must contain the origin (the additive identity). One possible subspace is just the origin on its own, since it satisfies all the above properties.

Any other subspace must be infinite in extent, since multiplication by a scalar can increase any non-zero vector to an arbitrarily large length. Thus any finite 3-d volume cannot be a subspace, even if it includes the origin. The possible proper subspaces are all planes that contain the origin (giving 2-d subspaces), and all lines that contain the origin (1-d subspaces); the whole space is of course also a subspace of itself.

Given any two subspaces ${U_{1}}$ and ${U_{2}}$, their intersection ${U_{1}\cap U_{2}}$ is also a subspace. Being a physicist, I’m not going to give a rigorous proof of this, but the argument would go something like this. Any subspace is closed under both addition and scalar multiplication, so if both ${U_{1}}$ and ${U_{2}}$ contain the vectors ${u}$ and ${v}$, they must both also contain all vectors of the form ${au+bv}$, where ${a,b\in\mathbb{F}}$. Thus their intersection will also contain all these vectors, making ${U_{1}\cap U_{2}}$ a subspace.

Direct sums

A vector space ${V}$ can be written as a direct sum of subspaces ${U_{1},\ldots,U_{m}}$, written as

$\displaystyle V=U_{1}\oplus\ldots\oplus U_{m} \ \ \ \ \ (1)$

provided that any vector ${v\in V}$ can be written uniquely as

$\displaystyle v=u_{1}+\ldots+u_{m} \ \ \ \ \ (2)$

where ${u_{i}\in U_{i}}$. That is, any vector can be written as a sum consisting of one vector from each subspace. Note that since all subspaces contain the zero vector 0, one or more of the ${u_{i}}$ could be 0. [Axler uses an ordinary plus sign + to denote a direct sum.]

The subspaces ${U_{i}}$ in a direct sum cannot overlap (apart from containing 0). That is, ${U_{i}\cap U_{j}=\left\{ 0\right\} }$ if ${i\ne j}$. We can see this by considering some nonzero vector ${w\in U_{i}\cap U_{j}}$; then ${-w\in U_{i}\cap U_{j}}$ as well (since ${-w}$ is the additive inverse of ${w}$; see condition 7 above). Thus the following two decompositions of a vector ${v}$ are both valid:

$\displaystyle v = u_{1}+\ldots+u_{i}+\ldots+u_{j}+\ldots+u_{m} \ \ \ \ \ (3)$

$\displaystyle v = u_{1}+\ldots+\left(u_{i}+w\right)+\ldots+\left(u_{j}-w\right)+\ldots+u_{m} \ \ \ \ \ (4)$

Thus the decomposition isn’t unique unless ${U_{i}\cap U_{j}=\left\{ 0\right\} }$ if ${i\ne j}$.

Another way of saying this is that the only way of writing 0 as a decomposition is if ${u_{i}=0}$ for all ${i}$. (Since that is a valid decomposition of 0, and it must be unique.)

As an example, the space of 3-d points considered above can be decomposed into one plane and one line not in that plane (for example, the ${xy}$ plane and the ${z}$ axis), or into 3 non-coplanar lines (for example, the ${x,y}$ and ${z}$ axes).

As another example, consider the space of polynomials ${p\left(z\right)}$ of degree at most ${N}$. We can decompose this into a subspace of polynomials containing only even powers of ${z}$ and another subspace containing only odd powers of ${z}$. Both subspaces contain 0 (obtained by setting all the coefficients to zero), and any polynomial of degree at most ${N}$ can be formed from the sum of one polynomial containing even powers and another containing odd powers.
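
In terms of coefficient lists, this decomposition is a couple of lines of code (my own illustration, in Python; the particular polynomial is arbitrary):

```python
import numpy as np

p = np.array([3.0, 1.0, -2.0, 5.0])     # 3 + z - 2z^2 + 5z^3 (lowest power first)
even = p.copy(); even[1::2] = 0.0       # 3 - 2z^2
odd = p.copy(); odd[0::2] = 0.0         # z + 5z^3
assert np.array_equal(even + odd, p)    # unique even + odd decomposition
```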

# Vector spaces: definitions and examples

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 1.

It appears that one of my stumbling blocks in trying to get to grips with quantum field theory is an insufficient understanding of linear algebra, so here we’ll start looking at this subject in a bit more depth than is typical in an introductory course. In my undergraduate physics degree (back in the 1970s) I didn’t really get any further with quantum theory than the level covered in Griffiths’s introductory textbook. As good as this book is, it doesn’t give you enough background to leap into quantum field theory.

The foundation of linear algebra is the concept of a vector space. The definition of a vector space is as follows:

• A vector space is a set ${V}$ with two operations, addition and scalar multiplication, defined on the set.
• The addition property is a function that assigns an element ${u+v\in V}$ to each pair of elements ${u,v\in V}$. Note that this definition implies completeness, in the sense that every sum of two vectors in ${V}$ must also be in ${V}$. This definition includes the traditional notion of vector addition in 2-d or 3-d space (that is, where a vector is represented by an arrow, and vector addition is performed by putting the tail of the second vector onto the head of the first and drawing the resulting vector as the sum), but vector addition is much more general than that.
• Scalar multiplication means that we can take an ordinary number ${\lambda}$ from some field ${\mathbb{F}}$ (in quantum theory, ${\mathbb{F}}$ will always be either the set of real numbers ${\mathbb{R}}$ or the set of complex numbers ${\mathbb{C}}$) and define a function in which the vector obtained by multiplying an existing vector ${v}$ by ${\lambda}$ gives another vector ${\lambda v\in V}$. Note that again, completeness is implied by this definition: every vector ${\lambda v}$ obtained through scalar multiplication must also be in the space ${V}$.
• Addition is commutative, so that ${u+v=v+u}$.
• Addition and scalar multiplication are associative, so that ${\left(u+v\right)+w=u+\left(v+w\right)}$ and ${\left(ab\right)v=a\left(bv\right)}$, where ${u,v,w\in V}$ and ${a,b\in\mathbb{F}}$.
• There is an additive identity element ${0\in V}$ such that ${v+0=v}$ for all ${v\in V}$. Note that here 0 is a vector, not a scalar. In practice, there is also a zero scalar number which is also denoted by 0, so we need to rely on the context to tell whether 0 refers to a vector or a number. Usually this isn’t too hard.
• Every vector ${v\in V}$ has an additive inverse ${w\in V}$ with the property that ${v+w=0}$. The additive inverse of ${v}$ is written as ${-v}$ and ${w-v}$ is defined to be ${w+\left(-v\right)}$.
• There is a (scalar) multiplicative identity number 1 with the property that ${1v=v}$ for all ${v\in V}$.
• Scalar multiplication is distributive, in the sense that ${a\left(u+v\right)=au+av}$ and ${\left(a+b\right)v=av+bv}$ for all ${a,b\in\mathbb{F}}$ and all ${u,v\in V}$.

Real and complex vector spaces

A real vector space is a vector space in which all the scalars are drawn from the set of real numbers ${\mathbb{R}}$, and a complex vector space is one where all the scalars are drawn from the set of complex numbers ${\mathbb{C}}$. It is important to note that we do not refer to the actual vectors as real or complex; they are simply vectors. The nature of the vector space is determined by the field ${\mathbb{F}}$ from which the scalars are taken. This can be confusing to beginners, since the temptation is to look at some of the vectors in a vector space to see if they contain real or complex numbers and label the vector space based on that. That doesn’t always work, as the following example shows.

Example 1 The set of ${N\times N}$ complex hermitian matrices is a real (not complex!) vector space. Recall that a hermitian matrix ${M}$ is one whose complex conjugate transpose equals the original matrix, that is, ${M^{\dagger}=M}$.

To see this, look at a general ${2\times2}$ hermitian matrix, which has the form

$\displaystyle M=\left[\begin{array}{cc} c+d & a-ib\\ a+ib & c-d \end{array}\right] \ \ \ \ \ (1)$

where ${a,b,c,d\in\mathbb{R}}$. Each matrix ${M}$ is a vector in this vector space. Note that, with the general definition above, a matrix can perfectly well be a vector; the notion of a vector is much more general than a line with an arrow on one end.

With addition defined as the usual matrix addition, and scalar multiplication by a real number ${x\in\mathbb{R}}$ also defined in the usual way for a matrix, that is

$\displaystyle xM=\left[\begin{array}{cc} x\left(c+d\right) & x\left(a-ib\right)\\ x\left(a+ib\right) & x\left(c-d\right) \end{array}\right] \ \ \ \ \ (2)$

we can grind through the requirements above to verify that this set is a vector space. For example, if we have two hermitian matrices ${M_{1}}$ and ${M_{2}}$ then ${M_{1}+M_{2}}$ is also hermitian. Also ${\left(xM\right)^{\dagger}=xM}$ and so on.

However, if we had chosen a complex number ${z\in\mathbb{C}}$ as the scalar to multiply by, we’d get

$\displaystyle zM=\left[\begin{array}{cc} z\left(c+d\right) & z\left(a-ib\right)\\ z\left(a+ib\right) & z\left(c-d\right) \end{array}\right] \ \ \ \ \ (3)$

$\displaystyle \left(zM\right)^{\dagger}=\left[\begin{array}{cc} z^{*}\left(c+d\right) & z^{*}\left(a-ib\right)\\ z^{*}\left(a+ib\right) & z^{*}\left(c-d\right) \end{array}\right]\ne zM \ \ \ \ \ (4)$

Thus even though the vectors ${M}$ in this vector space contain complex numbers, the vector space to which they belong is a real vector space because the scalars used in scalar multiplication must be real.
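The same point can be made numerically (a minimal sketch; the particular ${2\times2}$ matrices below are arbitrary examples):

```python
import numpy as np

def is_hermitian(M):
    # A matrix is hermitian if it equals its conjugate transpose.
    return np.allclose(M, M.conj().T)

M1 = np.array([[1.0, 2 - 3j], [2 + 3j, -4.0]])   # hermitian
M2 = np.array([[0.5, 1 + 1j], [1 - 1j, 2.0]])    # hermitian

print(is_hermitian(M1 + M2))    # True: closed under addition
print(is_hermitian(2.7 * M1))   # True: closed under *real* scalars
print(is_hermitian(1j * M1))    # False: a complex scalar leaves the space
```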

Example 2 The set of polynomials of degree ${\le N}$ is a vector space. Whether it is real or complex depends on which set of scalars we choose. A general polynomial of degree ${n}$, where ${n}$ is an integer with ${0\le n\le N}$, is

$\displaystyle p\left(z\right)=a_{0}+a_{1}z+a_{2}z^{2}+\ldots+a_{n}z^{n} \ \ \ \ \ (5)$

If all the ${a_{i}}$s are real and ${z\in\mathbb{R}}$, and we choose our scalars from ${\mathbb{R}}$, we have a real vector space. Addition of polynomials follows the usual rule. If

$\displaystyle q\left(z\right)=b_{0}+b_{1}z+b_{2}z^{2}+\ldots+b_{n}z^{n} \ \ \ \ \ (6)$

then

$\displaystyle \left(p+q\right)\left(z\right)=\left(a_{0}+b_{0}\right)+\left(a_{1}+b_{1}\right)z+\left(a_{2}+b_{2}\right)z^{2}+\ldots+\left(a_{n}+b_{n}\right)z^{n} \ \ \ \ \ (7)$

from which it’s fairly obvious that ${p+q}$ is another polynomial of degree at most ${n}$ (the leading terms may cancel, which is why we take the space to be polynomials of degree ${\le N}$ rather than of degree exactly ${N}$). Scalar multiplication also works as expected:

$\displaystyle xp\left(z\right)=xa_{0}+xa_{1}z+xa_{2}z^{2}+\ldots+xa_{n}z^{n} \ \ \ \ \ (8)$

so that ${xp}$ is also in the vector space. The additive inverse of ${p}$ above would be ${q}$ if ${b_{i}=-a_{i}}$ for all ${i}$.
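In code, with coefficient arrays standing in for polynomials (a sketch with arbitrary example coefficients):

```python
import numpy as np

# Represent p(z) = a_0 + a_1 z + ... + a_n z^n by its coefficient array.
p = np.array([1.0, -2.0, 0.0, 5.0])   # 1 - 2z + 5z^3
q = np.array([3.0, 3.0, -1.0, 0.0])   # 3 + 3z - z^2

print(p + q)      # coefficientwise sum, as in 7
print(2.5 * p)    # scalar multiplication, as in 8
print(p + (-p))   # adding the additive inverse gives the zero polynomial
```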

Example 3 The set of complex functions ${f\left(x\right)}$ on a finite interval ${x\in\left[0,L\right]}$ forms a complex vector space. Addition and scalar multiplication are defined in the usual way as

$\displaystyle \left(f_{1}+f_{2}\right)\left(x\right)=f_{1}\left(x\right)+f_{2}\left(x\right) \ \ \ \ \ (9)$

$\displaystyle \left(af\right)\left(x\right)=af\left(x\right) \ \ \ \ \ (10)$

We’ve seen such functions as solutions of the infinite square well in ordinary quantum mechanics.
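A minimal sketch of 9 and 10 in code, using a couple of square-well modes as sample vectors:

```python
import numpy as np

# Functions on [0, L] as vectors: addition and scalar multiplication
# act pointwise, as in eqs. (9) and (10).
L = 1.0
f1 = lambda x: np.sin(np.pi * x / L)        # first square-well mode
f2 = lambda x: np.sin(2 * np.pi * x / L)    # second square-well mode

add = lambda f, g: (lambda x: f(x) + g(x))  # (f+g)(x) = f(x) + g(x)
scale = lambda a, f: (lambda x: a * f(x))   # (af)(x) = a f(x)

h = add(scale(2j, f1), f2)   # 2i*f1 + f2 is again a function on [0, L]
print(h(0.25))               # (1 + 1.414...j)
```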

There are several other properties of vector spaces which follow from the requirements above. We won’t go through all of them, but the proofs of a couple of the simpler ones are instructive as to how these sorts of results are derived. Note in the following that we need to verify each step by stating which of the above properties we’re using to justify that step.

Theorem The additive identity ${0}$ is unique.

Proof: (by contradiction). Suppose there are two distinct additive identities ${0}$ and ${0^{\prime}}$. Then

$\displaystyle 0^{\prime}=0^{\prime}+0\mbox{ (since 0 is an additive identity)} \ \ \ \ \ (11)$

$\displaystyle 0^{\prime}+0=0+0^{\prime}\mbox{ (commutative addition)} \ \ \ \ \ (12)$

$\displaystyle 0+0^{\prime}=0\mbox{ (since \ensuremath{0^{\prime}} is an additive identity)} \ \ \ \ \ (13)$

Theorem The additive inverse of each vector in a vector space is unique.

Proof: (again by contradiction). Suppose ${v\in V}$ has two different additive inverses ${w}$ and ${w^{\prime}}$. Then

$\displaystyle w=w+0\mbox{ (additive identity)} \ \ \ \ \ (14)$

$\displaystyle w+0=w+\left(v+w^{\prime}\right)\mbox{ (\ensuremath{w^{\prime}} is an additive inverse of \ensuremath{v})} \ \ \ \ \ (15)$

$\displaystyle w+\left(v+w^{\prime}\right)=\left(w+v\right)+w^{\prime}\mbox{ (addition is associative)} \ \ \ \ \ (16)$

$\displaystyle \left(w+v\right)+w^{\prime}=0+w^{\prime}\mbox{ (\ensuremath{w} is an additive inverse of \ensuremath{v})} \ \ \ \ \ (17)$

$\displaystyle 0+w^{\prime}=w^{\prime}\mbox{ (additive identity)} \ \ \ \ \ (18)$

# Every attractive 1-dimensional potential has a bound state

Reference: Lecture by Barton Zwiebach in MIT course 8.05.1x, week 1.

An interesting application of the variational principle in quantum mechanics is the following theorem:

Theorem Every 1-dimensional attractive potential has at least one bound state.

To prove this, we need first to define what we mean by an attractive potential ${V\left(x\right)}$. ${V\left(x\right)}$ must satisfy the following conditions:

• ${V\left(x\right)\rightarrow0}$ as ${x\rightarrow\pm\infty}$.
• ${V\left(x\right)<0}$ everywhere.
• ${V\left(x\right)}$ is piecewise continuous. This means that it may have a finite number of jump discontinuities.

One possible form for ${V\left(x\right)}$ is as shown:

[Figure: a simple smooth potential well, dipping below zero and rising back to zero as ${x\rightarrow\pm\infty}$.]

This is a particularly simple potential that satisfies the above conditions. We could introduce a few step functions, multiple local maxima and minima, and so on, provided we don’t violate any of the 3 conditions above.

Since ${V\left(x\right)<0}$ everywhere, we can write it as

$\displaystyle V\left(x\right)=-\left|V\left(x\right)\right| \ \ \ \ \ (1)$

What we would like to prove is that for any hamiltonian of the form

$\displaystyle H=-\frac{\hbar^{2}}{2m}\frac{d^{2}}{dx^{2}}-\left|V\left(x\right)\right| \ \ \ \ \ (2)$

the ground state is a bound state, that is, its energy ${E_{0}}$ satisfies

$\displaystyle E_{0}<0 \ \ \ \ \ (3)$

We can apply the variational principle, which states

If ${\psi}$ is any normalized function and ${H}$ is a hamiltonian, then the ground state energy ${E_{0}}$ of this hamiltonian has an upper bound given by

$\displaystyle E_{0}\le\left\langle \psi\left|H\right|\psi\right\rangle \equiv\left\langle H\right\rangle \ \ \ \ \ (4)$

The use of the variational principle to prove the above theorem involves a bit of a convoluted argument, but the mathematics involved is fairly simple. Our goal is to find some wave function ${\psi_{\alpha}}$ (where ${\alpha}$ is some parameter that we can vary) so that

$\displaystyle E_{0}\le\left\langle \psi_{\alpha}\left|H\right|\psi_{\alpha}\right\rangle =\left\langle H\right\rangle _{\psi_{\alpha}}<0 \ \ \ \ \ (5)$

From 2 we have

$\displaystyle \left\langle \hat{H}\right\rangle _{\psi_{\alpha}}=\int dx\;\psi_{\alpha}\left(x\right)\hat{H}\psi_{\alpha}\left(x\right) \ \ \ \ \ (6)$

$\displaystyle =\left\langle T\right\rangle _{\psi_{\alpha}}-\left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}} \ \ \ \ \ (7)$

where

$\displaystyle \left\langle T\right\rangle _{\psi_{\alpha}}=-\int dx\;\psi_{\alpha}\left(x\right)\frac{\hbar^{2}}{2m}\frac{d^{2}}{dx^{2}}\psi_{\alpha}\left(x\right) \ \ \ \ \ (8)$

$\displaystyle \left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}}=\int dx\;\psi_{\alpha}\left(x\right)\left|V\left(x\right)\right|\psi_{\alpha}\left(x\right) \ \ \ \ \ (9)$

We can integrate 8 by parts once to get

$\displaystyle \left\langle T\right\rangle _{\psi_{\alpha}}=-\int dx\;\psi_{\alpha}\left(x\right)\frac{\hbar^{2}}{2m}\frac{d^{2}}{dx^{2}}\psi_{\alpha}\left(x\right) \ \ \ \ \ (10)$

$\displaystyle =-\left.\frac{\hbar^{2}}{2m}\psi_{\alpha}\left(x\right)\frac{d}{dx}\psi_{\alpha}\left(x\right)\right|_{-\infty}^{\infty}+\frac{\hbar^{2}}{2m}\int dx\;\left(\frac{d}{dx}\psi_{\alpha}\left(x\right)\right)^{2} \ \ \ \ \ (11)$

$\displaystyle =\frac{\hbar^{2}}{2m}\int_{-\infty}^{\infty}dx\;\left(\frac{d}{dx}\psi_{\alpha}\left(x\right)\right)^{2} \ \ \ \ \ (12)$

where we invoke the usual requirement that ${\psi_{\alpha}}$ and its first derivative vanish at infinity.

We therefore see that, since the integrand in the last line is never negative (and we’re assuming that ${\psi_{\alpha}}$ is not zero everywhere), ${\left\langle T\right\rangle _{\psi_{\alpha}}>0}$. Likewise, from 9, ${\left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}}>0}$. Thus in order that ${\left\langle H\right\rangle _{\psi_{\alpha}}<0}$, we must have

$\displaystyle \left\langle T\right\rangle _{\psi_{\alpha}}<\left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}} \ \ \ \ \ (13)$

To get any further, we need to choose a test function ${\psi_{\alpha}\left(x\right)}$. We’ll pick (because it works!)

$\displaystyle \psi_{\alpha}=\left(\frac{\alpha}{\pi}\right)^{1/4}e^{-\frac{1}{2}\alpha x^{2}} \ \ \ \ \ (14)$

The factor of ${\left(\frac{\alpha}{\pi}\right)^{1/4}}$ is required so that ${\psi_{\alpha}}$ is normalized. The integral in 12 can be done using standard methods; I’ll just use Maple, and we find

$\displaystyle \left\langle T\right\rangle _{\psi_{\alpha}}=\frac{\hbar^{2}\alpha}{4m} \ \ \ \ \ (15)$
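If you don’t have Maple handy, the same computation can be checked with a free computer algebra system; here’s a sketch using Python’s sympy, which confirms both the normalization of 14 and the result 15:

```python
import sympy as sp

x, alpha, hbar, m = sp.symbols('x alpha hbar m', positive=True)

# The Gaussian test function of eq. (14).
psi = (alpha / sp.pi) ** sp.Rational(1, 4) * sp.exp(-alpha * x**2 / 2)

# Normalization: the integral of psi^2 over the real line should be 1.
norm = sp.integrate(psi**2, (x, -sp.oo, sp.oo))
print(sp.simplify(norm))   # 1

# Kinetic energy expectation value, eq. (8).
T = sp.integrate(-psi * hbar**2 / (2 * m) * sp.diff(psi, x, 2),
                 (x, -sp.oo, sp.oo))
print(sp.simplify(T))      # alpha*hbar**2/(4*m), agreeing with eq. (15)
```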

The integral 9 of course can’t be done exactly if we don’t know what ${V}$ is, so we have just

$\displaystyle \left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}}=\int dx\;\psi_{\alpha}^{2}\left(x\right)\left|V\left(x\right)\right| \ \ \ \ \ (16)$

(No need for modulus signs around ${\psi_{\alpha}}$ since the function 14 is real.) To progress further, we need to start invoking some inequalities to get where we want to go. The argument consists of several steps, so watch carefully as we go along.

Combining 13, 15 and 16, we have to show that we can satisfy the condition

$\displaystyle \frac{\left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}}}{\left\langle T\right\rangle _{\psi_{\alpha}}}=\frac{4m}{\hbar^{2}\sqrt{\pi}}\frac{1}{\sqrt{\alpha}}\int_{-\infty}^{\infty}e^{-\alpha x^{2}}\left|V\left(x\right)\right|dx>1 \ \ \ \ \ (17)$

Since ${V}$ is arbitrary subject to the 3 conditions above, the only thing we can legitimately fiddle with is the value of ${\alpha}$. We can see that if we choose ${\alpha}$ small enough, we should be able to satisfy this inequality, since for small ${\alpha}$, the ${1/\sqrt{\alpha}}$ term gets large, while the ${e^{-\alpha x^{2}}}$ term in the integrand is bounded between 0 and 1. We need to find some upper limit for ${\alpha}$.
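To see this concretely, here is a numerical sketch (with a made-up weak Gaussian well ${V\left(x\right)=-V_{0}e^{-x^{2}}}$ and units in which ${\hbar=m=1}$, both my own choices for illustration) showing the left side of 17 climbing above 1 as ${\alpha}$ shrinks:

```python
import numpy as np
from scipy.integrate import quad

hbar = m = 1.0
V0 = 0.1                               # a weak attractive well (made up)
absV = lambda x: V0 * np.exp(-x**2)    # |V(x)| for V(x) = -V0 exp(-x^2)

def ratio(alpha):
    # Left-hand side of eq. (17): <|V|> / <T>
    integral, _ = quad(lambda x: np.exp(-alpha * x**2) * absV(x),
                       -np.inf, np.inf)
    return 4 * m / (hbar**2 * np.sqrt(np.pi)) * integral / np.sqrt(alpha)

for alpha in [1.0, 0.1, 0.01]:
    print(alpha, ratio(alpha))   # roughly 0.28, 1.2, 4.0

# As alpha -> 0 the integral tends to the finite number integral(|V|),
# so the ratio grows like 1/sqrt(alpha) and must eventually exceed 1,
# no matter how weak the well is.
```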

In what follows, you’ll need to refer to the following diagram:

[Figure: the potential well ${V\left(x\right)}$, showing a point ${x_{0}}$ where ${V\left(x_{0}\right)=-2v_{0}}$ and points ${x_{1}<x_{0}<x_{2}}$ where ${V=-v_{0}}$.]

First, we choose some point ${x_{0}}$ at which ${V\left(x_{0}\right)}$ is continuous (that is, we ensure that ${x_{0}}$ isn’t at one of the points where ${V\left(x\right)}$ has a jump discontinuity). The value of ${V\left(x_{0}\right)}$ is defined as ${-2v_{0}}$, where ${v_{0}>0}$. Because ${V\rightarrow0}$ as ${x\rightarrow\pm\infty}$, there must be points ${x_{1}}$ and ${x_{2}}$ on either side of ${x_{0}}$ where ${V}$ has the value ${-v_{0}}$. (Actually, I’m not sure this is strictly true: since ${V}$ is allowed a few jumps, it might jump over the point where it’s equal to ${-v_{0}}$. However, as the number of jumps is required to be finite, there must be some points ${x_{1}}$ and ${x_{2}}$ on either side of ${x_{0}}$ where ${V}$ attains a value between ${-2v_{0}}$ and 0, and I think the argument below still works if we choose those points instead.)

Now for the first inequality. We know that, because the integrand is positive

$\displaystyle \int_{-\infty}^{\infty}e^{-\alpha x^{2}}\left|V\left(x\right)\right|dx>\int_{x_{1}}^{x_{2}}e^{-\alpha x^{2}}\left|V\left(x\right)\right|dx \ \ \ \ \ (18)$

Second inequality: in the interval ${x_{1}}$ to ${x_{2}}$ the well is deeper than ${-v_{0}}$, that is, ${\left|V\left(x\right)\right|>v_{0}}$ (see the diagram!), so we have

$\displaystyle \int_{x_{1}}^{x_{2}}e^{-\alpha x^{2}}\left|V\left(x\right)\right|dx>v_{0}\int_{x_{1}}^{x_{2}}e^{-\alpha x^{2}}dx \ \ \ \ \ (19)$

The last integral has no elementary closed form (it can be written in terms of the error function), but we know that in the interval ${x_{1}}$ to ${x_{2}}$

$\displaystyle e^{-\alpha x^{2}}>e^{-\alpha\max\left(x_{1}^{2},x_{2}^{2}\right)} \ \ \ \ \ (20)$

Therefore

$\displaystyle v_{0}\int_{x_{1}}^{x_{2}}e^{-\alpha x^{2}}dx>v_{0}\int_{x_{1}}^{x_{2}}e^{-\alpha\max\left(x_{1}^{2},x_{2}^{2}\right)}dx \ \ \ \ \ (21)$

$\displaystyle =v_{0}\left(x_{2}-x_{1}\right)e^{-\alpha\max\left(x_{1}^{2},x_{2}^{2}\right)} \ \ \ \ \ (22)$

Now suppose we choose ${\alpha}$ to be

$\displaystyle \alpha<\frac{1}{\max\left(x_{1}^{2},x_{2}^{2}\right)} \ \ \ \ \ (23)$

Then

$\displaystyle e^{-\alpha\max\left(x_{1}^{2},x_{2}^{2}\right)}>e^{-1} \ \ \ \ \ (24)$

We can now summarize as follows:

$\displaystyle \int_{-\infty}^{\infty}e^{-\alpha x^{2}}\left|V\left(x\right)\right|dx>v_{0}\left(x_{2}-x_{1}\right)e^{-1} \ \ \ \ \ (25)$

provided we choose ${\alpha}$ according to 23. Plugging this back into 17 we have

$\displaystyle \frac{\left\langle \left|V\left(x\right)\right|\right\rangle _{\psi_{\alpha}}}{\left\langle T\right\rangle _{\psi_{\alpha}}}>\frac{4m}{\hbar^{2}\sqrt{\pi}}\frac{v_{0}\left(x_{2}-x_{1}\right)}{e}\frac{1}{\sqrt{\alpha}} \ \ \ \ \ (26)$

This expression will now be greater than 1 provided that

$\displaystyle \sqrt{\alpha}<\frac{4m}{\hbar^{2}\sqrt{\pi}}\frac{v_{0}\left(x_{2}-x_{1}\right)}{e} \ \ \ \ \ (27)$

$\displaystyle \alpha<\left[\frac{4m}{\hbar^{2}\sqrt{\pi}}\frac{v_{0}\left(x_{2}-x_{1}\right)}{e}\right]^{2} \ \ \ \ \ (28)$

Comparing 23 and 28, we see that we can satisfy both conditions if we take

$\displaystyle \alpha<\min\left\{ \frac{1}{\max\left(x_{1}^{2},x_{2}^{2}\right)},\left[\frac{4m}{\hbar^{2}\sqrt{\pi}}\frac{v_{0}\left(x_{2}-x_{1}\right)}{e}\right]^{2}\right\} \ \ \ \ \ (29)$

This condition depends on ${x_{1}}$ and ${x_{2}}$, but that doesn’t matter: both quantities on the RHS of 29 are positive, so there is always some positive value of ${\alpha}$ that satisfies the condition. In other words, going right back to 17 and then to 7, we can always find a value of ${\alpha}$ so that ${\left\langle H\right\rangle <0}$, which by 4 means that the ground state energy of ${H}$ must be negative, making the ground state a bound state.
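As a final sanity check, here’s a numerical sketch (using the same made-up Gaussian well and units as in the earlier snippet) that evaluates ${\left\langle H\right\rangle _{\psi_{\alpha}}}$ directly; it goes negative for suitably small ${\alpha}$, confirming the bound state:

```python
import numpy as np
from scipy.integrate import quad

hbar = m = 1.0
V0 = 0.1    # the same made-up weak well, V(x) = -V0 exp(-x^2)

def H_expectation(alpha):
    # <T> from eq. (15), plus <V> = -<|V|> from eq. (16),
    # with psi_alpha^2 = sqrt(alpha/pi) exp(-alpha x^2).
    T = hbar**2 * alpha / (4 * m)
    V, _ = quad(lambda x: np.sqrt(alpha / np.pi) * np.exp(-alpha * x**2)
                * (-V0 * np.exp(-x**2)), -np.inf, np.inf)
    return T + V

for alpha in [1.0, 0.1, 0.01]:
    print(alpha, H_expectation(alpha))
# <H> is positive at alpha = 1 but negative by alpha = 0.1, so by the
# variational principle E_0 < 0: even this weak well has a bound state.
```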