Infinite square well – force to decrease well width

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 5.2, Exercise 5.2.4.

One way of comparing the classical and quantum pictures of a particle in an infinite square well is to calculate the force exerted on the walls by the particle. If a particle is in state ${\left|n\right\rangle }$, its energy is

$\displaystyle E_{n}=\frac{\left(n\pi\hbar\right)^{2}}{2mL^{2}} \ \ \ \ \ (1)$

If the particle remains in this state as the walls are slowly pushed in, so that ${L}$ slowly decreases, then its energy ${E_{n}}$ will increase, meaning that work is done on the system. The force is the change in energy per unit distance, so the force required is

$\displaystyle F=-\frac{\partial E_{n}}{\partial L}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (2)$

If we treat the system classically, then a particle with energy ${E_{n}}$ between the walls is effectively a free particle in this region (since the potential ${V=0}$ there), so all its energy is kinetic. That is

$\displaystyle E_{n}=\frac{1}{2}mv^{2} \ \ \ \ \ (3)$

$\displaystyle v=\sqrt{\frac{2E_{n}}{m}} \ \ \ \ \ (4)$

$\displaystyle =\frac{n\pi\hbar}{mL} \ \ \ \ \ (5)$

The classical particle bounces elastically between the two walls, which means its velocity is exactly reversed at each collision. The momentum transfer in such a collision is

$\displaystyle \Delta p=2mv=\frac{2n\pi\hbar}{L} \ \ \ \ \ (6)$

The time between successive collisions on the same wall is

$\displaystyle \Delta t=\frac{2L}{v}=\frac{2mL^{2}}{n\pi\hbar} \ \ \ \ \ (7)$

Thus the average force exerted on one wall is

$\displaystyle \bar{F}=\frac{\Delta p}{\Delta t}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (8)$

Comparing with 2, we see that the quantum and classical forces in this case are the same.
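As a quick numerical cross-check of this agreement, here is a minimal sketch that evaluates the quantum force as a finite-difference derivative of the energy and the classical force as momentum transfer per round trip. The unit choice ${\hbar=1}$, the sample values of ${n}$, ${m}$ and ${L}$, and all function names are assumptions made purely for illustration.

```python
import math

HBAR = 1.0  # natural units, an assumption for this illustration


def energy(n, m, L):
    """Energy of level n in a well of width L, equation (1)."""
    return (n * math.pi * HBAR) ** 2 / (2.0 * m * L ** 2)


def quantum_force(n, m, L, dL=1e-6):
    """F = -dE_n/dL, estimated with a central finite difference."""
    return -(energy(n, m, L + dL) - energy(n, m, L - dL)) / (2.0 * dL)


def classical_force(n, m, L):
    """Average force: momentum transfer 2mv once every round trip 2L/v."""
    v = math.sqrt(2.0 * energy(n, m, L) / m)
    delta_p = 2.0 * m * v
    delta_t = 2.0 * L / v
    return delta_p / delta_t


n, m, L = 3, 1.5, 2.0  # arbitrary sample values
f_q = quantum_force(n, m, L)
f_c = classical_force(n, m, L)
f_exact = (n * math.pi * HBAR) ** 2 / (m * L ** 3)  # equations (2) and (8)
print(f_q, f_c, f_exact)
```

The three printed numbers should agree to within the accuracy of the finite difference.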

Non-denumerable basis: position and momentum states

References: edX online course MIT 8.05 Section 5.6.

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10; Exercises 1.10.1 – 1.10.3.

Although we’ve looked at position and momentum operators in quantum mechanics before, it’s worth another look at the ways that Zwiebach and Shankar introduce them.

First, we’ll have a look at Shankar’s treatment. He begins by considering a string fixed at each end, at positions ${x=0}$ and ${x=L}$, then asks how we could convey the shape of the string to an observer who cannot see the string directly. We could note the position at some fixed finite number of points between 0 and ${L}$, but then the remote observer would have only a partial knowledge of the string’s shape; the locations of those portions of the string between the points at which it was measured are still unknown, although the observer could probably get a reasonable picture by interpolating between these points.

We can increase the number of points at which the position is measured to get a better picture, but to convey the exact shape of the string, we need to measure its position at an infinite number of points. This is possible (in principle) but leads to a problem with the definition of the inner product. For two vectors defined on a finite vector space with an orthonormal basis, the inner product is given by the usual formula for the dot product:

 $\displaystyle \left\langle f\left|g\right.\right\rangle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{n}f_{i}g_{i}\ \ \ \ \ (1)$ $\displaystyle \left\langle f\left|f\right.\right\rangle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{n}f_{i}^{2} \ \ \ \ \ (2)$

where ${f_{i}}$ and ${g_{i}}$ are the components of ${f}$ and ${g}$ in the orthonormal basis. If we’re taking ${f}$ to be the displacement of a string and we try to increase the accuracy of the picture by increasing the number ${n}$ of points at which measurements are taken, then the value of ${\left\langle f\left|f\right.\right\rangle }$ continues to increase as ${n}$ increases (provided that ${f}$ is not zero everywhere). As ${n\rightarrow\infty}$, ${\left\langle f\left|f\right.\right\rangle \rightarrow\infty}$ as well, even though the system we’re measuring (a string of finite length with finite displacement) is certainly not infinite in any practical sense.

Shankar proposes getting around this problem by simply redefining the inner product for a finite vector space to be

$\displaystyle \left\langle f\left|g\right.\right\rangle =\sum_{i=1}^{n}f\left(x_{i}\right)g\left(x_{i}\right)\Delta \ \ \ \ \ (3)$

where ${\Delta\equiv L/\left(n+1\right)}$. That is, ${\Delta}$ now becomes the distance between adjacent points at which measurements are taken. If we let ${n\rightarrow\infty}$ this leads to the definition of the inner product as an integral

 $\displaystyle \left\langle f\left|g\right.\right\rangle$ $\displaystyle =$ $\displaystyle \int_{0}^{L}f\left(x\right)g\left(x\right)\;dx\ \ \ \ \ (4)$ $\displaystyle \left\langle f\left|f\right.\right\rangle$ $\displaystyle =$ $\displaystyle \int_{0}^{L}f^{2}\left(x\right)\;dx \ \ \ \ \ (5)$

This looks familiar enough, if you’ve done any work with inner products in quantum mechanics, but there is a subtle point which Shankar overlooks. In going from 1 to 3, we have introduced a factor ${\Delta}$ which, in the string example at least, has the dimensions of length, so the physical interpretation of these two equations is different. The units of ${\left\langle f\left|g\right.\right\rangle }$ appear to be different in the two cases. Now in quantum theory, inner products of the continuous type usually involve the wave function multiplied by its complex conjugate, with possibly another operator thrown in if we’re trying to find the expectation value of some observable. The square modulus of the wave function, ${\left|\Psi\right|^{2}}$, is taken to be a probability density, so it has units of inverse length (in one dimension) or inverse volume (in three dimensions), which makes the integral work out properly.

Admittedly, when we’re using ${f}$ to represent the displacement of a string, it’s not obvious what meaning the inner product of ${f}$ with anything else would actually have, so maybe the point isn’t worth worrying about. However, it does seem to be something that deserves a comment from Shankar.
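The difference between the plain sum 1 and the weighted sum 3 is easy to see numerically. The following is a minimal sketch, assuming a string shape ${f\left(x\right)=\sin\left(\pi x/L\right)}$ with ${L=1}$ (chosen only for illustration); the function names are hypothetical.

```python
import math


def grid(n, L):
    """n interior sample points x_i = i*delta, i = 1..n, with delta = L/(n+1)."""
    delta = L / (n + 1)
    return [i * delta for i in range(1, n + 1)], delta


def plain_sum(f, g, n, L):
    """Equation (1): sum of products of components, no weight."""
    xs, _ = grid(n, L)
    return sum(f(x) * g(x) for x in xs)


def weighted_sum(f, g, n, L):
    """Equation (3): the same sum multiplied by the point spacing delta."""
    xs, delta = grid(n, L)
    return sum(f(x) * g(x) for x in xs) * delta


L = 1.0


def f(x):
    # Hypothetical string displacement: the fundamental mode.
    return math.sin(math.pi * x / L)


for n in (10, 100, 1000):
    print(n, plain_sum(f, f, n, L), weighted_sum(f, f, n, L))
```

The unweighted sum grows roughly in proportion to ${n}$, while the weighted sum stays close to the integral ${\int_{0}^{L}\sin^{2}\left(\pi x/L\right)dx=L/2}$.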

From this point, Shankar continues by saying that this infinite dimensional vector space is spanned by basis vectors ${\left|x\right\rangle }$, with one basis vector for each value of ${x}$. We require this basis to be orthogonal, which means that we must have, if ${x\ne x^{\prime}}$

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =0 \ \ \ \ \ (6)$

We then generalize the identity operator to be

$\displaystyle I=\int\left|x\right\rangle \left\langle x\right|dx \ \ \ \ \ (7)$

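Inserting this operator in front of ${\left|f\right\rangle }$ and projecting onto ${\left\langle x\right|}$ gives

$\displaystyle \left\langle x\left|f\right.\right\rangle =\int\left\langle x\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime} \ \ \ \ \ (8)$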

The bra-ket ${\left\langle x\left|f\right.\right\rangle }$ is the projection of the vector ${\left|f\right\rangle }$ onto the ${\left|x\right\rangle }$ basis vector, so it is just ${f\left(x\right)}$. This means

$\displaystyle f\left(x\right)=\int\left\langle x\left|x^{\prime}\right.\right\rangle f\left(x^{\prime}\right)dx^{\prime} \ \ \ \ \ (9)$

which leads to the definition of the Dirac delta function as the normalization of ${\left\langle x\left|x^{\prime}\right.\right\rangle }$:

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =\delta\left(x-x^{\prime}\right) \ \ \ \ \ (10)$

Shankar then describes some properties of the delta function and its derivative, most of which we’ve already covered. For example, we’ve seen these two results for the delta function:

 $\displaystyle \delta\left(ax\right)$ $\displaystyle =$ $\displaystyle \frac{\delta\left(x\right)}{\left|a\right|}\ \ \ \ \ (11)$ $\displaystyle \frac{d\theta\left(x-x^{\prime}\right)}{dx}$ $\displaystyle =$ $\displaystyle \delta\left(x-x^{\prime}\right) \ \ \ \ \ (12)$

where ${\theta}$ is the step function

$\displaystyle \theta\left(x-x^{\prime}\right)\equiv\begin{cases} 0 & x\le x^{\prime}\\ 1 & x>x^{\prime} \end{cases} \ \ \ \ \ (13)$

One other result is that for a function ${f\left(x\right)}$ with zeroes at a number of points ${x_{i}}$, we have

$\displaystyle \delta\left(f\left(x\right)\right)=\sum_{i}\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (14)$

To see this, consider one of the ${x_{i}}$ where ${f\left(x_{i}\right)=0}$. Expanding in a Taylor series about this point, we have

$\displaystyle f\left(x_{i}+\left(x-x_{i}\right)\right)=f\left(x_{i}\right)+\left(x-x_{i}\right)\frac{df}{dx_{i}}+\ldots \ \ \ \ \ (15)$

$\displaystyle \approx0+\left(x-x_{i}\right)\frac{df}{dx_{i}} \ \ \ \ \ (16)$

From 11 we have

$\displaystyle \delta\left(\left(x-x_{i}\right)\frac{df}{dx_{i}}\right)=\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (17)$

The behaviour is the same at all points ${x_{i}}$ and since ${\delta\left(x_{i}-x\right)=0}$ at all other ${x_{j}\ne x_{i}}$ where ${f\left(x_{j}\right)=0}$, we can just add the delta functions for each zero of ${f}$.
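Equation 14 can be checked numerically by replacing the delta function with a narrow Gaussian. The sketch below assumes ${f\left(x\right)=x^{2}-a^{2}}$ (zeroes at ${x=\pm a}$ with ${\left|df/dx\right|=2a}$) and a test function ${g\left(x\right)=e^{x}}$; both choices, the Gaussian width and the function names are illustrative assumptions.

```python
import math


def delta_eps(u, eps):
    """Narrow normalized Gaussian standing in for the delta function."""
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


def integrate(func, lo, hi, steps):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


a = 1.0
eps = 0.01


def f(x):
    return x * x - a * a  # zeroes at x = +a and x = -a


g = math.exp  # arbitrary smooth test function

# Left side: integral of delta(f(x)) g(x) with the smeared delta function.
lhs = integrate(lambda x: delta_eps(f(x), eps) * g(x), -3.0, 3.0, 20000)

# Right side: sum over zeroes of g(x_i) / |df/dx| at x_i, from equation (14).
rhs = (g(a) + g(-a)) / abs(2.0 * a)

print(lhs, rhs)
```

The two values agree up to corrections that shrink as the Gaussian width `eps` is reduced.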

Turning now to Zwiebach’s treatment, he begins with the basis states ${\left|x\right\rangle }$ and position operator ${\hat{x}}$ with the eigenvalue equation

$\displaystyle \hat{x}\left|x\right\rangle =x\left|x\right\rangle \ \ \ \ \ (18)$

and simply defines the inner product between two position states to be

$\displaystyle \left\langle x\left|y\right.\right\rangle =\delta\left(x-y\right) \ \ \ \ \ (19)$

With this definition, 9 follows immediately. We can therefore write a quantum state ${\left|\psi\right\rangle }$ as

$\displaystyle \left|\psi\right\rangle =I\left|\psi\right\rangle =\int\left|x\right\rangle \left\langle x\left|\psi\right.\right\rangle dx=\int\left|x\right\rangle \psi\left(x\right)dx \ \ \ \ \ (20)$

That is, the vector ${\left|\psi\right\rangle }$ is the integral of its projections ${\psi\left(x\right)}$ onto the basis vectors ${\left|x\right\rangle }$.

The position operator ${\hat{x}}$ is hermitian as can be seen from

$\displaystyle \left\langle x_{1}\left|\hat{x}^{\dagger}\right|x_{2}\right\rangle =\left\langle x_{2}\left|\hat{x}\right|x_{1}\right\rangle ^* \ \ \ \ \ (21)$

$\displaystyle =x_{1}\left\langle x_{2}\left|x_{1}\right.\right\rangle ^* \ \ \ \ \ (22)$

$\displaystyle =x_{1}\delta\left(x_{2}-x_{1}\right)^* \ \ \ \ \ (23)$

$\displaystyle =x_{1}\delta\left(x_{2}-x_{1}\right) \ \ \ \ \ (24)$

$\displaystyle =x_{2}\delta\left(x_{2}-x_{1}\right) \ \ \ \ \ (25)$

$\displaystyle =\left\langle x_{1}\left|\hat{x}\right|x_{2}\right\rangle \ \ \ \ \ (26)$

The fourth line follows because the delta function is real, and the fifth follows because ${\delta\left(x_{2}-x_{1}\right)}$ is non-zero only when ${x_{1}=x_{2}}$.

Zwiebach then introduces the momentum eigenstates ${\left|p\right\rangle }$ which are analogous to the position states ${\left|x\right\rangle }$, in that

 $\displaystyle \left\langle p^{\prime}\left|p\right.\right\rangle$ $\displaystyle =$ $\displaystyle \delta\left(p^{\prime}-p\right)\ \ \ \ \ (27)$ $\displaystyle I$ $\displaystyle =$ $\displaystyle \int dp\left|p\right\rangle \left\langle p\right|\ \ \ \ \ (28)$ $\displaystyle \hat{p}\left|p\right\rangle$ $\displaystyle =$ $\displaystyle p\left|p\right\rangle \ \ \ \ \ (29)$ $\displaystyle \tilde{\psi}\left(p\right)$ $\displaystyle =$ $\displaystyle \left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (30)$

By the same calculation as for ${\hat{x}}$, we see that ${\hat{p}}$ is hermitian.

To get a relation between the ${\left|x\right\rangle }$ and ${\left|p\right\rangle }$ bases, we require that ${\left\langle x\left|p\right.\right\rangle }$ is the wave function for a particle with momentum ${p}$ in the ${x}$ basis, which we’ve seen is

$\displaystyle \psi\left(x\right)=\frac{1}{\sqrt{2\pi\hbar}}e^{ipx/\hbar} \ \ \ \ \ (31)$

Zwiebach then shows that this is consistent with the equation

$\displaystyle \left\langle x\left|\hat{p}\right|\psi\right\rangle =\frac{\hbar}{i}\frac{d}{dx}\left\langle x\left|\psi\right.\right\rangle =\frac{\hbar}{i}\frac{d\psi\left(x\right)}{dx} \ \ \ \ \ (32)$

We can get a similar relation by switching ${x}$ and ${p}$:

 $\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle$ $\displaystyle =$ $\displaystyle \int dx\left\langle p\left|x\right.\right\rangle \left\langle x\left|\hat{x}\right|\psi\right\rangle \ \ \ \ \ (33)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (34)$

From 31 we see

$\displaystyle \left\langle x\left|p\right.\right\rangle ^*=\frac{1}{\sqrt{2\pi\hbar}}e^{-ipx/\hbar} \ \ \ \ \ (35)$

$\displaystyle \left\langle x\left|p\right.\right\rangle ^*x=i\hbar\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^* \ \ \ \ \ (36)$

$\displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle =i\hbar\int dx\;\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (37)$

$\displaystyle =i\hbar\frac{d}{dp}\int dx\;\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (38)$

$\displaystyle =i\hbar\frac{d}{dp}\int dx\;\left\langle p\left|x\right.\right\rangle \left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (39)$

$\displaystyle =i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (40)$

In the fourth line, we took the ${\frac{d}{dp}}$ outside the integral since ${p}$ occurs in only one term, and in the last line we used 7. Thus we have

$\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle =i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (41)$
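The relation 41 can be verified numerically for a specific wave function. This is a minimal sketch, assuming units with ${\hbar=1}$ and a Gaussian ${\psi\left(x\right)}$ centred at a hypothetical point ${x_{0}=0.7}$; the integration range, step count and function names are likewise illustrative choices.

```python
import cmath
import math

HBAR = 1.0  # natural units, an assumption for this illustration
X0 = 0.7    # hypothetical centre of the Gaussian wave packet


def psi(x):
    """Normalized Gaussian wave function <x|psi>."""
    return math.pi ** -0.25 * math.exp(-0.5 * (x - X0) ** 2)


def bracket_p_x(p, x):
    """<p|x> = conj(<x|p>) = exp(-i p x / hbar) / sqrt(2 pi hbar)."""
    return cmath.exp(-1j * p * x / HBAR) / math.sqrt(2.0 * math.pi * HBAR)


def integrate(func, lo=-12.0, hi=12.0, steps=4000):
    """Midpoint rule; the Gaussian is negligible outside [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


def psi_tilde(p):
    """Momentum-space wave function <p|psi>, equation (30)."""
    return integrate(lambda x: bracket_p_x(p, x) * psi(x))


def p_xhat_psi(p):
    """<p| x-hat |psi>, computed directly from equation (34)."""
    return integrate(lambda x: bracket_p_x(p, x) * x * psi(x))


p, dp = 0.4, 1e-4
lhs = p_xhat_psi(p)
# Right side of equation (41), with the derivative as a central difference.
rhs = 1j * HBAR * (psi_tilde(p + dp) - psi_tilde(p - dp)) / (2.0 * dp)
print(lhs, rhs)
```

Both sides are complex numbers, and they should agree to within the finite-difference error.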

Exponentials of operators – Baker-Campbell-Hausdorff formula

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.9.

Although the result in this post isn’t covered in Shankar’s book, it’s a result that is frequently used in quantum theory, so it’s worth including at this point.

We’ve seen how to define a function of an operator if that function can be expanded in a power series. A common operator function is the exponential:

$\displaystyle f\left(\Omega\right)=e^{i\Omega} \ \ \ \ \ (1)$

If ${\Omega}$ is hermitian, the exponential ${e^{i\Omega}}$ is unitary. If we try to calculate the exponential of two operators such as ${e^{A+B}}$, the result isn’t as simple as we might hope if ${A}$ and ${B}$ don’t commute. To see the problem, we can write this out as a power series

 $\displaystyle e^{A+B}$ $\displaystyle =$ $\displaystyle \sum_{n=0}^{\infty}\frac{\left(A+B\right)^{n}}{n!}\ \ \ \ \ (2)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I+A+B+\frac{1}{2}\left(A+B\right)\left(A+B\right)+\ldots\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I+A+B+\frac{1}{2}\left(A^{2}+AB+BA+B^{2}\right)+\ldots \ \ \ \ \ (4)$

The problem appears first in the fourth term in the series, since we can’t condense the ${AB+BA}$ sum into ${2AB}$ if ${\left[A,B\right]\ne0}$. In fact, the expansion of ${e^{A}e^{B}}$ can be written entirely in terms of the commutators of ${A}$ and ${B}$ with each other, nested to increasingly higher levels. This formula is known as the Baker-Campbell-Hausdorff formula. Up to the fourth order commutator, the BCH formula gives

$\displaystyle e^{A}e^{B}=\exp\left[A+B+\frac{1}{2}\left[A,B\right]+\frac{1}{12}\left(\left[A,\left[A,B\right]\right]+\left[B,\left[B,A\right]\right]\right)-\frac{1}{24}\left[B,\left[A,\left[A,B\right]\right]\right]+\ldots\right] \ \ \ \ \ (5)$

There is no known closed form expression for this result. However, an important special case that occurs frequently in quantum theory is the case where ${\left[A,B\right]=cI}$, where ${c}$ is a complex scalar and ${I}$ is the usual identity matrix. Since ${cI}$ commutes with all operators, all terms from the third order upwards are zero, and we have

$\displaystyle e^{A}e^{B}=e^{A+B+\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (6)$

We can prove this result as follows. Start with the operator function

$\displaystyle G\left(t\right)\equiv e^{t\left(A+B\right)}e^{-tA} \ \ \ \ \ (7)$

where ${t}$ is a scalar parameter (not necessarily time!).

From its definition,

$\displaystyle G\left(0\right)=I \ \ \ \ \ (8)$

The inverse is

$\displaystyle G^{-1}\left(t\right)=e^{tA}e^{-t\left(A+B\right)} \ \ \ \ \ (9)$

and the derivative is

 $\displaystyle \frac{dG\left(t\right)}{dt}$ $\displaystyle =$ $\displaystyle \left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A \ \ \ \ \ (10)$

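Note that we have to preserve the operator ordering here because ${\left[A,B\right]\ne0}$: the factor ${\left(A+B\right)}$ from differentiating ${e^{t\left(A+B\right)}}$ stays on the left, and the factor ${A}$ from differentiating ${e^{-tA}}$ stays on the right. Now we multiply: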

$\displaystyle G^{-1}\frac{dG}{dt}=e^{tA}e^{-t\left(A+B\right)}\left[\left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A\right] \ \ \ \ \ (11)$

$\displaystyle =e^{tA}\left(A+B\right)e^{-tA}-A \ \ \ \ \ (12)$

$\displaystyle =e^{tA}Ae^{-tA}+e^{tA}Be^{-tA}-A \ \ \ \ \ (13)$

$\displaystyle =e^{tA}Be^{-tA} \ \ \ \ \ (14)$

$\displaystyle =B+t\left[A,B\right] \ \ \ \ \ (15)$

$\displaystyle =B+ctI \ \ \ \ \ (16)$

We used Hadamard’s lemma in the penultimate line, which in this case reduces to

$\displaystyle e^{tA}Be^{-tA}=B+t\left[A,B\right] \ \ \ \ \ (17)$

because ${\left[A,B\right]=cI}$ so all higher order commutators are zero.

We end up with an expression in which ${A}$ has disappeared. This gives the differential equation for ${G}$:

$\displaystyle G^{-1}\frac{dG}{dt}=B+ctI \ \ \ \ \ (18)$

We try a solution of the form (this apparently appears from divine inspiration):

$\displaystyle G\left(t\right)=e^{\alpha tB}e^{\beta ct^{2}} \ \ \ \ \ (19)$

From which we get

 $\displaystyle G^{-1}$ $\displaystyle =$ $\displaystyle e^{-\alpha tB}e^{-\beta ct^{2}}\ \ \ \ \ (20)$ $\displaystyle \frac{dG}{dt}$ $\displaystyle =$ $\displaystyle \left(\alpha B+2\beta ct\right)e^{\alpha tB}e^{\beta ct^{2}}\ \ \ \ \ (21)$ $\displaystyle G^{-1}\frac{dG}{dt}$ $\displaystyle =$ $\displaystyle \alpha B+2\beta ct \ \ \ \ \ (22)$

Comparing this to 18, we have

 $\displaystyle \alpha$ $\displaystyle =$ $\displaystyle 1\ \ \ \ \ (23)$ $\displaystyle \beta$ $\displaystyle =$ $\displaystyle \frac{1}{2}\ \ \ \ \ (24)$ $\displaystyle G\left(t\right)$ $\displaystyle =$ $\displaystyle e^{tB}e^{\frac{1}{2}ct^{2}} \ \ \ \ \ (25)$

Setting this equal to the original definition of ${G}$ in 7 and then taking ${t=1}$ we have

 $\displaystyle e^{A+B}e^{-A}$ $\displaystyle =$ $\displaystyle e^{B}e^{c/2}\ \ \ \ \ (26)$ $\displaystyle e^{A+B}$ $\displaystyle =$ $\displaystyle e^{B}e^{A}e^{\frac{1}{2}c}\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{B}e^{A}e^{\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (28)$

If we swap ${A}$ with ${B}$ and use the fact that ${A+B=B+A}$, and also ${\left[A,B\right]=-\left[B,A\right]}$, we have

$\displaystyle e^{A+B}=e^{A}e^{B}e^{-\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (29)$

This is the restricted form of the BCH formula for the case where ${\left[A,B\right]}$ is a scalar.
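A numerical check of 6 and 29 needs some care: for finite matrices the trace of a commutator is zero, so ${\left[A,B\right]=cI}$ with ${c\ne0}$ cannot be realized. The sketch below therefore assumes the slightly more general situation in which ${\left[A,B\right]}$ commutes with both ${A}$ and ${B}$, for which the higher nested commutators vanish in the same way. The ${3\times3}$ matrices and helper names are illustrative assumptions.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(X, Y, s=1.0):
    """Return X + s*Y elementwise."""
    n = len(X)
    return [[X[i][j] + s * Y[i][j] for j in range(n)] for i in range(n)]


def commutator(X, Y):
    return combine(matmul(X, Y), matmul(Y, X), -1.0)


def expm(X, terms=30):
    """Matrix exponential from its power series (adequate for small matrices)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[z / k for z in row] for row in matmul(term, X)]
        result = combine(result, term)
    return result


def max_diff(X, Y):
    n = len(X)
    return max(abs(X[i][j] - Y[i][j]) for i in range(n) for j in range(n))


# Strictly upper triangular matrices: [A, B] is non-zero but commutes with both.
A = [[0.0, 0.7, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.3], [0.0, 0.0, 0.0]]
C = commutator(A, B)
zero = [[0.0] * 3 for _ in range(3)]

lhs = matmul(expm(A), expm(B))               # e^A e^B
rhs = expm(combine(combine(A, B), C, 0.5))   # e^{A+B+[A,B]/2}, equation (6)
# Equation (29): e^{A+B} = e^A e^B e^{-[A,B]/2}
rhs29 = matmul(lhs, expm(combine(zero, C, -0.5)))

print(max_diff(lhs, rhs), max_diff(expm(combine(A, B)), rhs29))
```

Both printed differences should be at the level of floating-point rounding.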

Lorentz transformations as 2×2 matrices

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes (available at time of writing, Sep 2016).

Continuing our examination of general Lorentz transformations, recall that a Lorentz transformation can be represented by a ${4\times4}$ matrix ${\Lambda}$ which preserves the Minkowski length ${x_{\mu}x^{\mu}}$ of all four-vectors ${x}$. This leads to the condition

$\displaystyle \Lambda^{T}g\Lambda=g \ \ \ \ \ (1)$

where ${g}$ is the flat-space Minkowski metric

$\displaystyle g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)$

It turns out that we can map any 4-vector ${x}$ to a ${2\times2}$ Hermitian matrix ${\widehat{x}}$ defined as

$\displaystyle \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (3)$

[Recall that a Hermitian matrix ${H}$ is equal to the complex conjugate of its transpose:

$\displaystyle H=\left(H^{T}\right)^*\equiv H^{\dagger} \ \ \ \ \ (4)$

Also note that Jaffe uses an unconventional notation for the Hermitian conjugate, as he uses a superscript * rather than a superscript ${\dagger}$. This can be confusing since usually a superscript * indicates just complex conjugate, without the transpose. I’ll use the more usual superscript ${\dagger}$ for Hermitian conjugate here.]

Although we’re used to the scalar product of two vectors, it is also useful to define the scalar product of two matrices as

$\displaystyle \left\langle A,B\right\rangle \equiv\frac{1}{2}\mbox{Tr}\left(A^{\dagger}B\right) \ \ \ \ \ (5)$

where ‘Tr’ means the trace of a matrix, which is the sum of its diagonal elements. Note that the scalar product of ${\widehat{x}}$ with itself is

$\displaystyle \left\langle \widehat{x},\widehat{x}\right\rangle =\frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (6)$

$\displaystyle =\frac{1}{2}\left[\left(x_{0}+x_{3}\right)^{2}+2\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)+\left(x_{0}-x_{3}\right)^{2}\right] \ \ \ \ \ (7)$

$\displaystyle =x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2} \ \ \ \ \ (8)$

The determinant of ${\widehat{x}}$ is

 $\displaystyle \det\widehat{x}$ $\displaystyle =$ $\displaystyle \left(x_{0}+x_{3}\right)\left(x_{0}-x_{3}\right)-\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)\ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2}\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{\mu}x^{\mu} \ \ \ \ \ (11)$

Thus ${\det\widehat{x}}$ is the Minkowski length squared.

From 3, we observe that we can write ${\widehat{x}}$ as a sum:

$\displaystyle \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (12)$

where the ${\sigma_{\mu}}$ are four Hermitian matrices:

 $\displaystyle \sigma_{0}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (13)$ $\displaystyle \sigma_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (14)$ $\displaystyle \sigma_{2}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (15)$ $\displaystyle \sigma_{3}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (16)$

The last three are the Pauli spin matrices that we met when looking at spin-${\frac{1}{2}}$ in quantum mechanics.

The ${\sigma_{\mu}}$ are orthonormal under the scalar product operation, as we can verify by direct calculation. For example

 $\displaystyle \left\langle \sigma_{2},\sigma_{3}\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right]\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left(0+0\right)\ \ \ \ \ (18)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (19)$

And:

 $\displaystyle \left\langle \sigma_{2},\sigma_{2}\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left(1+1\right)\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (22)$

The other products work out similarly, so we have

$\displaystyle \left\langle \sigma_{\mu},\sigma_{\nu}\right\rangle =\delta_{\mu\nu} \ \ \ \ \ (23)$

We can work out the inverse transformation to 3 by taking the scalar product of 12 with ${\sigma_{\nu}}$:

$\displaystyle \left\langle \sigma_{\nu},\widehat{x}\right\rangle =\sum_{\mu=0}^{3}x_{\mu}\left\langle \sigma_{\nu},\sigma_{\mu}\right\rangle \ \ \ \ \ (24)$

$\displaystyle =\sum_{\mu=0}^{3}x_{\mu}\delta_{\nu\mu} \ \ \ \ \ (25)$

$\displaystyle =x_{\nu} \ \ \ \ \ (26)$
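The map from a 4-vector to ${\widehat{x}}$ and back can be sketched in a few lines. The code below builds ${\widehat{x}}$ from the four matrices ${\sigma_{0},\ldots,\sigma_{3}}$, checks that its determinant is the Minkowski length squared, and recovers the components with the scalar product 5. The sample 4-vector and the function names are assumptions made for illustration.

```python
SIGMA = [
    [[1, 0], [0, 1]],        # sigma_0 = I
    [[0, 1], [1, 0]],        # sigma_1
    [[0, -1j], [1j, 0]],     # sigma_2
    [[1, 0], [0, -1]],       # sigma_3
]


def to_matrix(x):
    """x-hat = sum over mu = 0..3 of x_mu sigma_mu, as in equation (3)."""
    return [[sum(complex(x[mu] * SIGMA[mu][i][j]) for mu in range(4))
             for j in range(2)] for i in range(2)]


def scalar_product(A, B):
    """Equation (5): <A, B> = (1/2) Tr(A^dagger B)."""
    return 0.5 * sum(complex(A[k][i]).conjugate() * B[k][i]
                     for i in range(2) for k in range(2))


def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def from_matrix(M):
    """Inverse map, equation (26): x_nu = <sigma_nu, x-hat>."""
    return [scalar_product(SIGMA[nu], M).real for nu in range(4)]


x = (2.0, 0.3, -1.1, 0.5)  # arbitrary sample 4-vector (x0, x1, x2, x3)
xhat = to_matrix(x)
interval = x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

print(det(xhat), interval)
print(from_matrix(xhat))
```

The determinant should match the interval, and `from_matrix` should return the original components.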

Now a few more theorems that will be useful later.

Irreducible Sets of Matrices

A set of matrices ${\mathfrak{U}}$ is called irreducible if the only matrix ${C}$ that commutes with every matrix in ${\mathfrak{U}}$ is the identity matrix ${I}$ (or a multiple of ${I}$). Any two of the three Pauli matrices ${\sigma_{i}}$, ${i=1,2,3}$ above form an irreducible set of ${2\times2}$ Hermitian matrices. This can be shown by direct calculation, which Jaffe does in detail in his article. For example, if we define ${C}$ to be some arbitrary matrix

$\displaystyle C=\left[\begin{array}{cc} a & b\\ c & d \end{array}\right] \ \ \ \ \ (27)$

where ${a,b,c,d}$ are complex numbers, then

 $\displaystyle C\sigma_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} b & a\\ d & c \end{array}\right]\ \ \ \ \ (28)$ $\displaystyle \sigma_{1}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} c & d\\ a & b \end{array}\right] \ \ \ \ \ (29)$

If ${C}$ is to commute with ${\sigma_{1}}$, we must therefore require ${b=c}$ and ${a=d}$.

Similarly, for ${\sigma_{2}}$ we have

 $\displaystyle C\sigma_{2}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} ib & -ia\\ id & -ic \end{array}\right]\ \ \ \ \ (30)$ $\displaystyle \sigma_{2}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} -ic & -id\\ ia & ib \end{array}\right] \ \ \ \ \ (31)$

so that ${C\sigma_{2}=\sigma_{2}C}$ requires ${b=-c}$ and ${a=d}$.

And for ${\sigma_{3}}$:

 $\displaystyle C\sigma_{3}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} a & -b\\ c & -d \end{array}\right]\ \ \ \ \ (32)$ $\displaystyle \sigma_{3}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} a & b\\ -c & -d \end{array}\right] \ \ \ \ \ (33)$

so that ${C\sigma_{3}=\sigma_{3}C}$ requires ${b=-b}$ and ${c=-c}$, so ${b=c=0}$ (no conditions can be inferred for ${a}$ or ${d}$).

If we form a set ${\mathfrak{U}}$ containing ${\sigma_{3}}$ and one of ${\sigma_{1}}$ or ${\sigma_{2}}$, we see that ${b=c=0}$ and ${a=d}$, so ${C}$ is a multiple of ${I}$. If we form ${\mathfrak{U}}$ from ${\sigma_{1}}$ and ${\sigma_{2}}$ we again have ${a=d}$, but we must have simultaneously ${b=c}$ and ${b=-c}$ which can be true only if ${b=c=0}$, so again ${C}$ is a multiple of ${I}$.
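The conditions derived above can be spot-checked with a few concrete matrices. This is only a sketch with hypothetical sample values of ${a,b,c,d}$; it tests particular matrices rather than proving irreducibility in general.

```python
SIGMA1 = [[0, 1], [1, 0]]
SIGMA2 = [[0, -1j], [1j, 0]]
SIGMA3 = [[1, 0], [0, -1]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutes(C, S, tol=1e-12):
    """True if C S = S C to within tol."""
    CS, SC = matmul(C, S), matmul(S, C)
    return all(abs(CS[i][j] - SC[i][j]) < tol
               for i in range(2) for j in range(2))


def commutes_with_all(C, matrices):
    return all(commutes(C, S) for S in matrices)


scalar = [[2.5, 0], [0, 2.5]]   # a = d, b = c = 0: a multiple of I
diagonal = [[1, 0], [0, 2]]     # b = c = 0 but a != d
symmetric = [[1, 3], [3, 1]]    # a = d and b = c != 0

for name, C in (("scalar", scalar), ("diagonal", diagonal),
                ("symmetric", symmetric)):
    print(name,
          commutes_with_all(C, [SIGMA1, SIGMA2]),
          commutes_with_all(C, [SIGMA1, SIGMA3]),
          commutes_with_all(C, [SIGMA2, SIGMA3]))
```

Only the multiple of ${I}$ commutes with every pair; the diagonal sample commutes with ${\sigma_{3}}$ alone and the symmetric sample with ${\sigma_{1}}$ alone.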

Unitary Matrices

A unitary matrix is one whose Hermitian conjugate is its inverse, so that ${U^{\dagger}=U^{-1}}$. Some properties of unitary matrices are given on the Wikipedia page, so we’ll just use those without going through the proofs. First, a unitary matrix is normal, which means that ${U^{\dagger}U=UU^{\dagger}}$ (this actually follows from the condition ${U^{\dagger}=U^{-1}}$). Second, there is another unitary matrix ${V}$ which diagonalizes ${U}$, that is

$\displaystyle V^{\dagger}UV=D \ \ \ \ \ (34)$

where ${D}$ is a diagonal, unitary matrix.

Third,

$\displaystyle \left|\det U\right|=1 \ \ \ \ \ (35)$

(The determinant can be complex, but has magnitude 1.)

From this it follows that ${\left|\det D\right|=1}$ and since ${D}$ is unitary and diagonal, each diagonal element ${d_{j}}$ of ${D}$ must satisfy ${\left|d_{j}\right|=1}$. (Remember that ${d_{j}}$ could be a complex number.) That means that ${d_{j}=e^{i\lambda_{j}}}$ for some real number ${\lambda_{j}}$, so we can write

$\displaystyle D=e^{i\Lambda} \ \ \ \ \ (36)$

where ${\Lambda}$ is a diagonal hermitian matrix whose only non-zero elements are the real numbers ${\lambda_{j}}$ along its diagonal: ${\Lambda_{ij}=\lambda_{j}\delta_{ij}}$. As usual, the exponential of a matrix is interpreted in terms of its power series, so that

$\displaystyle e^{i\Lambda}=I+i\Lambda+\frac{\left(i\Lambda\right)^{2}}{2!}+\frac{\left(i\Lambda\right)^{3}}{3!}+\ldots \ \ \ \ \ (37)$

For a diagonal matrix ${\Lambda}$ with diagonal elements ${\Lambda_{jj}=\lambda_{j}}$, the diagonal elements of ${\Lambda^{n}}$ are just ${\Lambda_{jj}^{n}=\lambda_{j}^{n}}$.

From 34, we have

 $\displaystyle U$ $\displaystyle =$ $\displaystyle VDV^{\dagger}\ \ \ \ \ (38)$ $\displaystyle$ $\displaystyle =$ $\displaystyle Ve^{i\Lambda}V^{\dagger} \ \ \ \ \ (39)$

Now we also have, since ${VV^{\dagger}=I}$

 $\displaystyle V\Lambda^{n}V^{\dagger}$ $\displaystyle =$ $\displaystyle V\Lambda\left(VV^{\dagger}\right)\Lambda\left(VV^{\dagger}\right)\ldots\Lambda V^{\dagger}\ \ \ \ \ (40)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(V\Lambda V^{\dagger}\right)^{n} \ \ \ \ \ (41)$

Therefore, from 37

 $\displaystyle U$ $\displaystyle =$ $\displaystyle Ve^{i\Lambda}V^{\dagger}\ \ \ \ \ (42)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{iV\Lambda V^{\dagger}}\ \ \ \ \ (43)$ $\displaystyle$ $\displaystyle \equiv$ $\displaystyle e^{iH} \ \ \ \ \ (44)$

where ${H=V\Lambda V^{\dagger}}$ is another Hermitian matrix. In other words, we can always write a unitary matrix as the exponential of a Hermitian matrix.

In the case where ${H}$ is a ${2\times2}$ matrix, we can write it in terms of the ${\sigma_{\mu}}$ matrices above as

$\displaystyle H=\sum_{\mu=0}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (45)$

where the ${a_{\mu}}$ are real. This follows because the ${\sigma_{\mu}}$ form an orthonormal basis for the ${2\times2}$ Hermitian matrices, so that ${a_{\mu}=\left\langle \sigma_{\mu},H\right\rangle =\frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}H\right)}$, which is real when both ${\sigma_{\mu}}$ and ${H}$ are Hermitian. [For some reason, Jaffe refers to the ${a_{\mu}}$ as ${\lambda_{\mu}}$, which is confusing since he has used ${\lambda_{\mu}}$ as the diagonal elements of ${\Lambda}$ above, and they’re not the same thing.]

If ${\det U=+1}$, then

 $\displaystyle \det U$ $\displaystyle =$ $\displaystyle \det\left(VDV^{\dagger}\right)\ \ \ \ \ (46)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det\left(VV^{\dagger}D\right)\ \ \ \ \ (47)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det D\ \ \ \ \ (48)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det e^{i\Lambda} \ \ \ \ \ (49)$

The second line follows because the determinant of a product of matrices is the product of the determinants, so we can rearrange the multiplication order. To evaluate the last line, we observe that for a diagonal matrix ${\Lambda}$, using 37 and applying the result to each diagonal element

$\displaystyle e^{i\Lambda}=\left[\begin{array}{cc} e^{i\Lambda_{11}} & 0\\ 0 & e^{i\Lambda_{22}} \end{array}\right] \ \ \ \ \ (50)$

Therefore

$\displaystyle \det e^{i\Lambda}=e^{i\left(\Lambda_{11}+\Lambda_{22}\right)}=e^{i\mbox{Tr}\Lambda} \ \ \ \ \ (51)$

[By the way, the relation ${\det e^{A}=e^{\mbox{Tr}A}}$ is actually true for any square matrix ${A}$, and is a corollary of Jacobi’s formula.]

We can now use the cyclic property of the trace (another matrix algebra theorem) which says that for 3 matrices ${A,B,C}$,

$\displaystyle \mbox{Tr}\left(ABC\right)=\mbox{Tr}\left(CAB\right)=\mbox{Tr}\left(BCA\right) \ \ \ \ \ (52)$

This gives us

$\displaystyle \mbox{Tr}H=\mbox{Tr}\left(V\Lambda V^{\dagger}\right)=\mbox{Tr}\left(V^{\dagger}V\Lambda\right)=\mbox{Tr}\Lambda \ \ \ \ \ (53)$

Finally, from 45 and the fact that the traces of the ${\sigma_{i}}$ are all zero for ${i=1,2,3}$, and ${\mbox{Tr}\sigma_{0}=2}$, we have

$\displaystyle \det U=\det e^{i\Lambda}=e^{i\mbox{Tr}H}=e^{2ia_{0}}=1 \ \ \ \ \ (54)$

Thus ${a_{0}=n\pi}$ for some integer ${n}$, but as all values of ${n}$ give the same original unitary matrix ${U}$, we can choose ${n=0}$ so that ${a_{0}=0}$ and

$\displaystyle H=\sum_{\mu=1}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (55)$
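As a rough numerical check of 54 and 55, we can exponentiate ${H=\sum a_{\mu}\sigma_{\mu}}$ directly and look at the determinant. This is only a sketch: the coefficient values and the helper names (`matmul`, `expm`, `hermitian`) are made up for illustration, and the exponential is computed from a truncated power series rather than a library routine.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(M, terms=40):
    """Matrix exponential from its truncated power series (fine for small 2x2 M)."""
    result = [[1, 0], [0, 1]]
    term = [[1, 0], [0, 1]]
    for n in range(1, terms):
        # term becomes M^n / n!
        term = [[x / n for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# sigma_0 is the identity, sigma_1..3 are the Pauli matrices
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def hermitian(a):
    """H = sum_mu a_mu sigma_mu for real coefficients a = (a0, a1, a2, a3)."""
    return [[sum(a[mu] * SIGMA[mu][i][j] for mu in range(4)) for j in range(2)]
            for i in range(2)]

def unitary(a):
    """U = exp(iH)."""
    return expm([[1j * x for x in row] for row in hermitian(a)])

a = (0.4, 0.3, -1.1, 0.7)            # arbitrary illustrative values
det_general = det(unitary(a))        # expected: exp(i Tr H) = exp(2i a0)
det_special = det(unitary((0.0,) + a[1:]))  # a0 = 0, so expected: 1

print(abs(det_general - cmath.exp(2j * a[0])), abs(det_special - 1))
```

Both printed differences should be at the level of rounding error, which is consistent with ${\det U=e^{2ia_{0}}}$ in general and ${\det U=1}$ once ${a_{0}=0}$.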

Rotation group: basics, generators and commutators

Reference: Lewis H. Ryder (1996), Quantum Field Theory, Second edition, Cambridge University Press. Section 2.3.

A group important in physics is the 3-d rotation group, which consists of all rotations about the origin in 3-d space. Such a rotation can be written as

$\displaystyle r^{\prime}=Rr \ \ \ \ \ (1)$

where ${r=\left(\begin{array}{c} x\\ y\\ z \end{array}\right)}$ is the vector before rotation, ${r^{\prime}=\left(\begin{array}{c} x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{array}\right)}$ is the vector after rotation and ${R}$ is the ${3\times3}$ rotation matrix. Because all rotations are about an axis going through the origin, the distances are unchanged, so that ${r^{2}=r^{\prime2}}$. In matrix notation, using a superscript ${T}$ to indicate the transpose, we have

 $\displaystyle x^{2}+y^{2}+z^{2}=r^{T}r$ $\displaystyle =$ $\displaystyle r^{\prime T}r^{\prime}\ \ \ \ \ (2)$ $\displaystyle$ $\displaystyle =$ $\displaystyle r^{T}R^{T}Rr \ \ \ \ \ (3)$

[The transpose of the product of two or more matrices is the product of the transposes, in reverse order.]

From this we see that

$\displaystyle R^{T}R=I \ \ \ \ \ (4)$

where ${I}$ is the ${3\times3}$ identity matrix. This is the definition of an orthogonal matrix, that is, a matrix whose rows and columns are orthogonal unit vectors. If this equation is written out in terms of components, it provides 6 independent constraints on the elements of ${R}$. To get a diagonal element ${I_{kk}=1}$, for ${k=1,2,3}$ we multiply the ${k}$th row of ${R^{T}}$ into the ${k}$th column of ${R}$. However, the ${k}$th row of ${R^{T}}$ is the ${k}$th column of ${R}$, so

 $\displaystyle I_{kk}=1$ $\displaystyle =$ $\displaystyle \left(R^{T}R\right)_{kk}\ \ \ \ \ (5)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{3}R_{ik}R_{ik}\ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{3}R_{ik}^{2} \ \ \ \ \ (7)$

This provides 3 conditions, one for each value of ${k}$.

An off-diagonal element such as ${I_{12}=0}$ is obtained from

 $\displaystyle I_{12}=0$ $\displaystyle =$ $\displaystyle \left(R^{T}R\right)_{12}\ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{3}R_{i1}R_{i2} \ \ \ \ \ (9)$

In general, the off-diagonal elements give the condition

$\displaystyle I_{kj}=0=\sum_{i=1}^{3}R_{ik}R_{ij}\qquad\left(k\ne j\right) \ \ \ \ \ (10)$

The sum on the RHS is symmetric under exchange of ${k}$ and ${j}$ so, for example, the condition ${I_{21}=0}$ gives the same equation as ${I_{12}=0}$. Thus the off-diagonal elements provide only 3 more independent constraints, which can be taken as ${I_{12}=I_{13}=I_{23}=0}$.

The rotation matrices satisfy the four conditions for forming a group. They are closed under multiplication: if we have two rotation matrices ${R_{1}}$ and ${R_{2}}$, then the combined rotation ${R_{1}R_{2}}$ is also a rotation matrix, since it satisfies

$\displaystyle \left(R_{1}R_{2}\right)^{T}R_{1}R_{2}=R_{2}^{T}R_{1}^{T}R_{1}R_{2}=R_{2}^{T}IR_{2}=R_{2}^{T}R_{2}=I \ \ \ \ \ (11)$

Rotation is associative, since a series of rotations is just performed in order from right to left, so inserting parentheses is meaningless: ${\left(R_{1}R_{2}\right)R_{3}=R_{1}\left(R_{2}R_{3}\right)}$. The identity rotation is just ${R=I}$ (that is, no rotation at all), and the inverse is just the transpose: ${R^{-1}=R^{T}}$, since ${R^{T}R=I}$.

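The group of all ${3\times3}$ matrices satisfying 4 is known as the ${O\left(3\right)}$ group (and in ${n}$ dimensions as ${O\left(n\right)}$). Strictly speaking, ${O\left(3\right)}$ also contains reflections, which have ${\det R=-1}$; the proper rotations, with ${\det R=+1}$, form the subgroup ${SO\left(3\right)}$.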

Because the condition 4 imposes 6 constraints on a matrix with 9 elements, we must specify 3 parameters to define a rotation uniquely. Those with a background in classical mechanics might be familiar with the three Euler angles; however, for our purposes we can just define rotations about the ${x}$, ${y}$ and ${z}$ axes with angles ${\phi}$, ${\psi}$ and ${\theta}$ respectively. We then get the three usual rotation matrices:

 $\displaystyle R_{x}\left(\phi\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & \cos\phi & \sin\phi\\ 0 & -\sin\phi & \cos\phi \end{array}\right]\ \ \ \ \ (12)$ $\displaystyle R_{y}\left(\psi\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} \cos\psi & 0 & -\sin\psi\\ 0 & 1 & 0\\ \sin\psi & 0 & \cos\psi \end{array}\right]\ \ \ \ \ (13)$ $\displaystyle R_{z}\left(\theta\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} \cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (14)$

Any rotation about the origin can be decomposed into a succession of these three rotations in some order. If we do two rotations about two different axes, the operation is not commutative as you can verify by direct substitution. For example, if we do a rotation by ${\theta=\frac{\pi}{2}}$ about the ${z}$ axis followed by ${\phi=\frac{\pi}{2}}$ about the ${x}$ axis, the result is

 $\displaystyle R_{x}\left(\frac{\pi}{2}\right)R_{z}\left(\frac{\pi}{2}\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{array}\right] \ \ \ \ \ (16)$

Reversing the order we get

 $\displaystyle R_{z}\left(\frac{\pi}{2}\right)R_{x}\left(\frac{\pi}{2}\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right]\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 0 & 1\\ -1 & 0 & 0\\ 0 & -1 & 0 \end{array}\right] \ \ \ \ \ (18)$

Thus the rotation group is non-Abelian.
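The two products 16 and 18 are easy to check by machine. The following is a minimal sketch using plain nested lists; the function names (`rot_x`, `rot_z`, `close`) are my own, and the sign conventions are those of 12 and 14.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot_x(phi):
    """Rotation about the x axis, with the sign convention of equation 12."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_z(theta):
    """Rotation about the z axis, with the sign convention of equation 14."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def close(A, B, tol=1e-12):
    """Element-wise comparison that tolerates rounding in cos(pi/2)."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rx, Rz = rot_x(math.pi / 2), rot_z(math.pi / 2)

xz = matmul(Rx, Rz)   # z rotation first, then x rotation
zx = matmul(Rz, Rx)   # x rotation first, then z rotation

orthogonal = close(matmul(transpose(xz), xz), IDENTITY)  # closure: product obeys R^T R = I
commute = close(xz, zx)                                  # False for these two rotations

print(orthogonal, commute)
```

The product still satisfies ${R^{T}R=I}$, as required by 11, while the two orderings give different matrices.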

Finally, we can define three curious objects called generators of the group. Their definitions will look bizarre, but be patient as we will see that they do have useful properties. We have

 $\displaystyle J_{x}$ $\displaystyle \equiv$ $\displaystyle -i\left.\frac{dR_{x}\left(\phi\right)}{d\phi}\right|_{\phi=0}=\left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\ \ \ \ \ (19)$ $\displaystyle J_{y}$ $\displaystyle \equiv$ $\displaystyle -i\left.\frac{dR_{y}\left(\psi\right)}{d\psi}\right|_{\psi=0}=\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]\ \ \ \ \ (20)$ $\displaystyle J_{z}$ $\displaystyle \equiv$ $\displaystyle -i\left.\frac{dR_{z}\left(\theta\right)}{d\theta}\right|_{\theta=0}=\left[\begin{array}{ccc} 0 & -i & 0\\ i & 0 & 0\\ 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (21)$

For an infinitesimal rotation ${\delta\theta}$ about the ${z}$ axis we can, to first order, approximate ${\cos\delta\theta\approx1}$ and ${\sin\delta\theta\approx\delta\theta}$, so from 14 and 21 we have, to first order

 $\displaystyle R_{z}\left(\delta\theta\right)$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & \delta\theta & 0\\ -\delta\theta & 1 & 0\\ 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I+iJ_{z}\delta\theta \ \ \ \ \ (23)$

For a finite rotation through an angle ${\theta}$, we can apply ${R_{z}\left(\delta\theta\right)}$ ${N}$ times so that if ${\delta\theta=\theta/N}$:

 $\displaystyle R_{z}\left(\theta\right)$ $\displaystyle =$ $\displaystyle \left[R_{z}\left(\delta\theta\right)\right]^{N}\ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[I+iJ_{z}\frac{\theta}{N}\right]^{N} \ \ \ \ \ (25)$

We can now take the limit ${N\rightarrow\infty}$ and use the definition (covered in introductory calculus books) of ${e^{x}\equiv\lim_{N\rightarrow\infty}\left(1+\frac{x}{N}\right)^{N}}$ to get

$\displaystyle R_{z}\left(\theta\right)=e^{iJ_{z}\theta} \ \ \ \ \ (26)$

Since ${J_{z}}$ in the exponent is a matrix, we can interpret this exponential in terms of its series expansion:

 $\displaystyle e^{iJ_{z}\theta}$ $\displaystyle =$ $\displaystyle I+iJ_{z}\theta-J_{z}^{2}\frac{\theta^{2}}{2!}-iJ_{z}^{3}\frac{\theta^{3}}{3!}+J_{z}^{4}\frac{\theta^{4}}{4!}+\ldots\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right]+\theta\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]-\frac{\theta^{2}}{2!}\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle$ $\displaystyle -\frac{\theta^{3}}{3!}\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]+\frac{\theta^{4}}{4!}\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]+\ldots \ \ \ \ \ (29)$

The matrices on the RHS alternate between ${\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]}$ and ${\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]}$, so the terms on the RHS can be seen to be the series expansions of ${\pm\mbox{sine}}$ (in the 1,2 and 2,1 elements) and cosine (in the 1,1 and 2,2 elements), giving 14 again:

$\displaystyle e^{iJ_{z}\theta}=\left[\begin{array}{ccc} \cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (30)$

This might seem like a pointless exercise, since we’ve just managed to express the rotation matrix in a fancy form as a complex exponential, but we’ll need to be patient to see how useful this turns out to be later.
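For anyone who wants to see 25 and 27 converge numerically, here is a small sketch. It uses the fact that ${iJ_{z}}$ is a real matrix, so no complex arithmetic is needed. The angle, the number of series terms and the choice ${N=2^{20}}$ (evaluated by repeated squaring) are arbitrary choices of mine.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def identity():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# i*J_z, read off from equation 21
I_JZ = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

def rot_z(theta):
    """The rotation matrix of equation 14."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def exp_series(theta, terms=30):
    """Truncated power series of exp(i J_z theta), as in equation 27."""
    result, term = identity(), identity()
    for n in range(1, terms):
        # term becomes (i J_z theta)^n / n!
        term = [[x * theta / n for x in row] for row in matmul(term, I_JZ)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def product_limit(theta, doublings=20):
    """[I + i J_z theta/N]^N with N = 2**doublings, via repeated squaring."""
    n_steps = 2 ** doublings
    eye = identity()
    M = [[eye[i][j] + I_JZ[i][j] * theta / n_steps for j in range(3)]
         for i in range(3)]
    for _ in range(doublings):
        M = matmul(M, M)
    return M

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

theta = 0.8
series_error = max_diff(exp_series(theta), rot_z(theta))
limit_error = max_diff(product_limit(theta), rot_z(theta))
print(series_error, limit_error)
```

The series agrees with 14 to rounding error, while the finite product differs from it by a small amount that shrinks as ${N}$ grows.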

Finally, we can work out the commutators of the generators. [As you might recall from quantum mechanics, commutators play an important role in the theory, and it turns out to be the same here, so we’ll need these for later.] The commutators can be calculated explicitly by doing a bit of matrix arithmetic, directly from 19, 20 and 21. For example

 $\displaystyle \left[J_{x},J_{y}\right]$ $\displaystyle \equiv$ $\displaystyle J_{x}J_{y}-J_{y}J_{x}\ \ \ \ \ (31)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]-\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\ \ \ \ \ (32)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 0 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]-\left[\begin{array}{ccc} 0 & -1 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (33)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (34)$ $\displaystyle$ $\displaystyle =$ $\displaystyle iJ_{z} \ \ \ \ \ (35)$

By similar calculations, we can get the other two commutators, which are just cyclic permutations of this one:

 $\displaystyle \left[J_{z},J_{x}\right]$ $\displaystyle =$ $\displaystyle iJ_{y}\ \ \ \ \ (36)$ $\displaystyle \left[J_{y},J_{z}\right]$ $\displaystyle =$ $\displaystyle iJ_{x} \ \ \ \ \ (37)$

Except for a factor of ${\hbar}$, these are the same commutation relations as those satisfied by angular momentum in quantum mechanics.
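The three commutators can also be checked mechanically from the matrices in 19, 20 and 21. This is just a sketch with hand-rolled helpers (`matmul`, `commutator`, `scale`, `close` are names I've chosen), using Python's built-in complex numbers.

```python
def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

# Generators from equations 19, 20 and 21
JX = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
JY = [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]]
JZ = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]

checks = {
    "[Jx,Jy] = i Jz": close(commutator(JX, JY), scale(1j, JZ)),
    "[Jz,Jx] = i Jy": close(commutator(JZ, JX), scale(1j, JY)),
    "[Jy,Jz] = i Jx": close(commutator(JY, JZ), scale(1j, JX)),
}
for name, ok in checks.items():
    print(name, ok)
```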

Ongoing work

25 April 2016

All upgrades of plots are now complete. Normal service has resumed.

24 April 2016

At the moment, I’m working through those posts that have Maple-generated plots in them and upgrading the quality of the plots. This involves removing the old plots and uploading new ones to replace them, so occasionally you might find a post where there is a ‘broken image’ icon in place of a plot. This should last for only a couple of minutes or so for each affected post, so try waiting a bit and then refreshing the page before you send in a comment that the image is broken.

Dirac equation: the gamma matrices

References: Robert D. Klauber, Student Friendly Quantum Field Theory, (Sandtrove Press, 2013) – Chapter 4, Problems 4.2 – 4.3.

The Dirac equation in relativistic quantum mechanics can be written as

$\displaystyle i\frac{\partial}{\partial t}\left|\psi\right\rangle =\left(\boldsymbol{\alpha}\cdot\mathbf{p}+\beta m\right)\left|\psi\right\rangle \ \ \ \ \ (1)$

where the matrices are given by

 $\displaystyle \alpha_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (2)$ $\displaystyle \alpha_{2}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & -i\\ 0 & 0 & i & 0\\ 0 & -i & 0 & 0\\ i & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (3)$ $\displaystyle \alpha_{3}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{array}\right]\ \ \ \ \ (4)$ $\displaystyle \beta$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (5)$

The matrices all satisfy the relations

$\displaystyle \alpha_{i}^{2}=\beta^{2}=I \ \ \ \ \ (6)$

It turns out that products of these matrices are used more often than ${\boldsymbol{\alpha}}$ and ${\beta}$ in studies of the Dirac equation, so we’ll introduce those here. They are denoted by ${\gamma^{\mu}}$ where ${\mu=0,\ldots,3}$ and are defined as follows:

 $\displaystyle \gamma^{0}$ $\displaystyle =$ $\displaystyle \beta=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (7)$ $\displaystyle \gamma^{1}$ $\displaystyle =$ $\displaystyle \beta\alpha_{1}=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0\\ -1 & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (9)$ $\displaystyle \gamma^{2}$ $\displaystyle =$ $\displaystyle \beta\alpha_{2}=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 0 & -i\\ 0 & 0 & i & 0\\ 0 & -i & 0 & 0\\ i & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & -i\\ 0 & 0 & i & 0\\ 0 & i & 0 & 0\\ -i & 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (11)$ $\displaystyle \gamma^{3}$ $\displaystyle =$ $\displaystyle \beta\alpha_{3}=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{array}\right]\ \ \ \ \ (12)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{array}\right] \ \ \ \ \ (13)$

The Hermitian conjugates of the ${\gamma^{\mu}}$ satisfy the relation

$\displaystyle \gamma^{\mu\dagger}=\gamma^{0}\gamma^{\mu}\gamma^{0} \ \ \ \ \ (14)$

This can be verified by direct calculation. For ${\gamma^{0}}$ we get (using 6)

$\displaystyle \gamma^{0}\gamma^{0}\gamma^{0}=\beta^{2}\gamma^{0}=\gamma^{0} \ \ \ \ \ (15)$

Since ${\gamma^{0}}$ is a real, diagonal matrix, we must have ${\gamma^{0\dagger}=\gamma^{0}}$, so 14 is correct for ${\gamma^{0}}$.

For ${\gamma^{1}}$ we have

 $\displaystyle \gamma^{0}\gamma^{1}\gamma^{0}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0\\ -1 & 0 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & -1\\ 0 & 0 & -1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{array}\right]=\gamma^{1\dagger} \ \ \ \ \ (18)$

For ${\gamma^{2}}$:

 $\displaystyle \gamma^{0}\gamma^{2}\gamma^{0}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 0 & -i\\ 0 & 0 & i & 0\\ 0 & i & 0 & 0\\ -i & 0 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & -i\\ 0 & 0 & i & 0\\ 0 & -i & 0 & 0\\ i & 0 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 0 & i\\ 0 & 0 & -i & 0\\ 0 & -i & 0 & 0\\ i & 0 & 0 & 0 \end{array}\right]=\gamma^{2\dagger} \ \ \ \ \ (21)$

And for ${\gamma^{3}}$:

 $\displaystyle \gamma^{0}\gamma^{3}\gamma^{0}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{array}\right]=\gamma^{3\dagger} \ \ \ \ \ (24)$
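Since these checks are nothing more than ${4\times4}$ matrix arithmetic, they are easy to repeat by machine. The sketch below builds the ${\gamma^{\mu}}$ from ${\boldsymbol{\alpha}}$ and ${\beta}$ as in 7 to 13 and tests 6 and 14; the helper names are my own.

```python
def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def dagger(A):
    """Hermitian conjugate: transpose and complex-conjugate."""
    return [[A[j][i].conjugate() for j in range(4)] for i in range(4)]

IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Matrices from equations 2 to 5
ALPHA = [
    [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
    [[0, 0, 0, -1j], [0, 0, 1j, 0], [0, -1j, 0, 0], [1j, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, -1, 0, 0]],
]
BETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

# gamma^0 = beta, gamma^i = beta alpha_i
GAMMA = [BETA] + [matmul(BETA, alpha) for alpha in ALPHA]
G0 = GAMMA[0]

# Equation 6: every alpha_i and beta squares to the identity
squares_ok = all(matmul(M, M) == IDENTITY for M in ALPHA + [BETA])

# Equation 14: gamma^mu dagger = gamma^0 gamma^mu gamma^0
hermiticity_ok = all(dagger(g) == matmul(matmul(G0, g), G0) for g in GAMMA)

print(squares_ok, hermiticity_ok)
```

All entries here are small integers or multiples of ${i}$, so the arithmetic is exact and the comparisons can use plain equality.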

‘Latex path not specified’ errors

I’ve been getting a few reports of readers seeing the message “Latex path not specified” in place of mathematical formulas in some of my posts. It appears that this problem is not unique to my site and that it started around 19 November (see here and here for discussions on WordPress blogs).

I don’t see the error myself so I can’t test any solutions. However, I view my pages using a Windows 7 desktop and Windows 8.1 laptop only, and there have been some reports that this error occurs only on smaller devices like smart phones (which I don’t have). If you do see the error and want to report it, please let me know what device and operating system you are using.

In any case, it doesn’t look like there’s anything I can do to fix it. I haven’t changed anything in the way I generate equations in my posts recently, anyway. I guess we’ll just have to wait for someone at WordPress to fix it.

Update (24 November): it appears that this error has now been fixed by WordPress staff. If you still see “Latex path not specified” errors on my pages, try refreshing the page. If that doesn’t work, try clearing your browser’s cache (instructions here) and then refreshing the page.

Creation and annihilation operators: commutators and anticommutators

References: Amitabha Lahiri & P. B. Pal, A First Book of Quantum Field Theory, Second Edition (Alpha Science International, 2004) – Chapter 1, Problems 1.1 – 1.2.

As a bit of background to the quantum field theoretic use of creation and annihilation operators we’ll look again at the harmonic oscillator. The creation and annihilation operators (called raising and lowering operators by Griffiths) are defined in terms of the position and momentum operators as

 $\displaystyle a^{\dagger}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2\hbar m\omega}}\left[-ip+m\omega x\right]\ \ \ \ \ (1)$ $\displaystyle a$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2\hbar m\omega}}\left[ip+m\omega x\right] \ \ \ \ \ (2)$

From the commutator ${\left[x,p\right]=i\hbar}$ we can work out

 $\displaystyle \left[a,a^{\dagger}\right]$ $\displaystyle =$ $\displaystyle \frac{1}{2\hbar m\omega}\left(-2im\omega\left[x,p\right]\right)\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (4)$

The annihilation operator ${a}$ acting on the vacuum or ground state ${\left|0\right\rangle }$ gives 0, and the creation operator ${a^{\dagger}}$ produces a state ${a^{\dagger}\left|0\right\rangle =\left|1\right\rangle }$ with energy eigenvalue ${\frac{3}{2}\hbar\omega}$. Successive applications of ${a^{\dagger}}$ produce states with higher energy, where each quantum of energy is ${\hbar\omega}$.

Normalization

Given that the ground state is normalized so that ${\left\langle \left.0\right|0\right\rangle =1}$, we can find the factor required to normalize higher states so that ${\left\langle \left.n\right|n\right\rangle =1}$. Consider ${n=2}$. We have

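$\displaystyle \left|2\right\rangle =Aa^{\dagger}a^{\dagger}\left|0\right\rangle \ \ \ \ \ (5)$

where ${A}$ is to be determined. Requiring ${\left\langle \left.2\right|2\right\rangle =1}$ means that ${\left\langle 0\left|aaa^{\dagger}a^{\dagger}\right|0\right\rangle =1/A^{2}}$. Using ${aa^{\dagger}=1+a^{\dagger}a}$ from 4, we have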

 $\displaystyle \left\langle 0\left|aaa^{\dagger}a^{\dagger}\right|0\right\rangle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|a\left(1+a^{\dagger}a\right)a^{\dagger}\right|0\right\rangle \ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|aa^{\dagger}\right|0\right\rangle +\left\langle 0\left|aa^{\dagger}aa^{\dagger}\right|0\right\rangle \ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|\left(1+a^{\dagger}a\right)\right|0\right\rangle +\left\langle 0\left|aa^{\dagger}\left(1+a^{\dagger}a\right)\right|0\right\rangle \ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle \left.0\right|0\right\rangle +\left\langle 0\left|aa^{\dagger}\right|0\right\rangle \ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle \left.0\right|0\right\rangle +\left\langle 0\left|\left(1+a^{\dagger}a\right)\right|0\right\rangle \ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle \left.0\right|0\right\rangle +\left\langle \left.0\right|0\right\rangle \ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 2\ \ \ \ \ (12)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{A^{2}}\ \ \ \ \ (13)$ $\displaystyle A$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2}} \ \ \ \ \ (14)$

For ${n=3}$ we need ${\left\langle 0\left|aaaa^{\dagger}a^{\dagger}a^{\dagger}\right|0\right\rangle }$. We commute the rightmost ${a}$ through the three ${a^{\dagger}}$ operators to its right. Each time it passes an ${a^{\dagger}}$, the 1 in ${aa^{\dagger}=1+a^{\dagger}a}$ generates a term with that ${a^{\dagger}}$ removed, and the final term with ${a}$ at the far right vanishes because ${a\left|0\right\rangle =0}$. Each of the 3 surviving terms is ${\left\langle 0\left|aaa^{\dagger}a^{\dagger}\right|0\right\rangle }$ and we already know that this term produces a factor of 2. Therefore

$\displaystyle \left\langle 0\left|aaaa^{\dagger}a^{\dagger}a^{\dagger}\right|0\right\rangle =3\times2=6 \ \ \ \ \ (15)$

We can extend this result to the general case:

$\displaystyle \left\langle 0\left|a^{n}\left(a^{\dagger}\right)^{n}\right|0\right\rangle =n! \ \ \ \ \ (16)$

The normalization must then be

$\displaystyle \left|n\right\rangle =\frac{1}{\sqrt{n!}}\left(a^{\dagger}\right)^{n}\left|0\right\rangle \ \ \ \ \ (17)$
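The reordering argument behind 16 can be automated. In the sketch below, an operator product is written as a string in which `a` stands for ${a}$ and `d` stands for ${a^{\dagger}}$ (this encoding and the function name `vev` are my own inventions). The only rules used are ${aa^{\dagger}=1+a^{\dagger}a}$, ${a\left|0\right\rangle =0}$ and ${\left\langle 0\right|a^{\dagger}=0}$.

```python
import math

def vev(word):
    """Vacuum expectation value <0| word |0>.

    'a' is the annihilation operator and 'd' is the creation operator.
    Each adjacent pair 'ad' is rewritten using a a^dagger = 1 + a^dagger a.
    """
    i = word.find("ad")
    if i == -1:
        # No 'ad' pair left, so the word is normally ordered (all d's, then all a's).
        # Only the empty word (the identity) has a non-zero vacuum expectation value.
        return 1 if word == "" else 0
    contracted = word[:i] + word[i + 2:]        # the '1' term
    swapped = word[:i] + "da" + word[i + 2:]    # the 'a^dagger a' term
    return vev(contracted) + vev(swapped)

# <0| a^n (a^dagger)^n |0> for n = 0, ..., 5
norms = [vev("a" * n + "d" * n) for n in range(6)]
print(norms)
print([math.factorial(n) for n in range(6)])
```

The recursion always terminates, since each step either shortens the word or moves an `a` one place to the right.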

Number operator

We’ve met the number operator ${N}$ in the field case, but there is an analogous operator for the harmonic oscillator. We have

$\displaystyle N\equiv a^{\dagger}a \ \ \ \ \ (18)$

As with the field case, we can work out its commutators:

 $\displaystyle \left[N,a^{\dagger}\right]$ $\displaystyle =$ $\displaystyle a^{\dagger}aa^{\dagger}-a^{\dagger}a^{\dagger}a\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle a^{\dagger}a^{\dagger}a+a^{\dagger}-a^{\dagger}a^{\dagger}a\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle a^{\dagger}\ \ \ \ \ (21)$ $\displaystyle \left[N,a\right]$ $\displaystyle =$ $\displaystyle a^{\dagger}aa-aa^{\dagger}a\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle a^{\dagger}aa-a-a^{\dagger}aa\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -a \ \ \ \ \ (24)$

Applying this to ${\left|n\right\rangle }$ we get

$\displaystyle N\left|n\right\rangle =\frac{1}{\sqrt{n!}}N\left(a^{\dagger}\right)^{n}\left|0\right\rangle \ \ \ \ \ (25)$

We get

 $\displaystyle N\left(a^{\dagger}\right)^{n}$ $\displaystyle =$ $\displaystyle \left[a^{\dagger}+a^{\dagger}N\right]\left(a^{\dagger}\right)^{n-1}\ \ \ \ \ (26)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(a^{\dagger}\right)^{n}+\left(a^{\dagger}\right)^{2}\left(1+N\right)\left(a^{\dagger}\right)^{n-2}\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \ldots\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle n\left(a^{\dagger}\right)^{n}+\left(a^{\dagger}\right)^{n}N\ \ \ \ \ (29)$ $\displaystyle$ $\displaystyle =$ $\displaystyle n\left(a^{\dagger}\right)^{n}+\left(a^{\dagger}\right)^{n}a^{\dagger}a \ \ \ \ \ (30)$

When operating on ${\left|0\right\rangle }$, the last term gives 0, so

$\displaystyle N\left|n\right\rangle =\frac{n}{\sqrt{n!}}\left(a^{\dagger}\right)^{n}\left|0\right\rangle =n\left|n\right\rangle \ \ \ \ \ (31)$
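One way to see these relations concretely is to represent the operators as matrices in the basis ${\left|0\right\rangle ,\left|1\right\rangle ,\ldots}$, cut off at some maximum ${n}$. The sketch below assumes the standard matrix elements ${\left\langle n-1\left|a\right|n\right\rangle =\sqrt{n}}$ (which follow from the normalization 17); the cutoff `DIM` and the helper names are arbitrary choices of mine.

```python
import math

DIM = 6  # keep the states |0>, ..., |5>; the cutoff is arbitrary

def matmul(A, B):
    """Product of two DIM x DIM matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(DIM)] for i in range(DIM)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(DIM) for j in range(DIM))

# Assumed matrix elements: <m|a|n> = sqrt(n) if m = n - 1, else 0
A = [[math.sqrt(n) if m == n - 1 else 0.0 for n in range(DIM)]
     for m in range(DIM)]
ADAG = [list(row) for row in zip(*A)]   # a^dagger is the transpose (A is real)
NUM = matmul(ADAG, A)                   # N = a^dagger a

eigenvalues = [NUM[n][n] for n in range(DIM)]  # N is diagonal in this basis
raises = close(commutator(NUM, ADAG), ADAG)                     # [N, a^dagger] = a^dagger
lowers = close(commutator(NUM, A), [[-x for x in r] for r in A])  # [N, a] = -a

print(eigenvalues, raises, lowers)
```

The diagonal of `NUM` reproduces ${N\left|n\right\rangle =n\left|n\right\rangle }$. The relation ${\left[a,a^{\dagger}\right]=1}$ itself fails in the last diagonal entry of any finite truncation, but the two commutators with ${N}$ survive the cutoff exactly.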

Multiple oscillators

If we now have a system of ${N}$ non-interacting harmonic oscillators with equal masses and frequencies ${\omega_{i}}$, ${i=1,\ldots,N}$, the Hamiltonian is

$\displaystyle H=\frac{1}{2m}\sum_{i}\left(p_{i}^{2}+m^{2}\omega_{i}^{2}x_{i}^{2}\right) \ \ \ \ \ (32)$

Since the oscillators are not coupled, the creation and annihilation operators for different oscillators all commute, so that

$\displaystyle \left[a_{i},a_{j}^{\dagger}\right]=\delta_{ij} \ \ \ \ \ (33)$

so the normalized state where oscillator ${i}$ is in the ${n_{i}}$th excited state is

$\displaystyle \left|n_{1}n_{2}\ldots n_{N}\right\rangle =\prod_{i=1}^{N}\frac{\left(a_{i}^{\dagger}\right)^{n_{i}}}{\sqrt{n_{i}!}}\left|0\right\rangle \ \ \ \ \ (34)$

The number operator in this case is

$\displaystyle \mathcal{N}=\sum_{i=1}^{N}\left(a_{i}^{\dagger}a_{i}\right) \ \ \ \ \ (35)$

This works because the commutation relation 33 allows each term ${a_{i}^{\dagger}a_{i}}$ in the sum to pick out the number of quanta of oscillator ${i}$.

Anticommutators

Now suppose that instead of the commutation relations 33 we have anticommutation relations as follows:

 $\displaystyle \left\{ a_{i},a_{j}^{\dagger}\right\}$ $\displaystyle \equiv$ $\displaystyle a_{i}a_{j}^{\dagger}+a_{j}^{\dagger}a_{i}=\delta_{ij}\ \ \ \ \ (36)$ $\displaystyle \left\{ a_{i}^{\dagger},a_{j}^{\dagger}\right\}$ $\displaystyle =$ $\displaystyle \left\{ a_{i},a_{j}\right\} =0 \ \ \ \ \ (37)$

If we start with the vacuum state ${\left|0\right\rangle }$ and require ${a_{i}^{\dagger}\left|0\right\rangle =\left|0\ldots1_{i}\ldots0\right\rangle }$ (that is, ${a_{i}^{\dagger}}$ creates one quantum in category ${i}$), then if we try to create another quantum in the same state, we get

 $\displaystyle \left\langle 0\left|a_{i}a_{i}a_{i}^{\dagger}a_{i}^{\dagger}\right|0\right\rangle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|a_{i}\left(1-a_{i}^{\dagger}a_{i}\right)a_{i}^{\dagger}\right|0\right\rangle \ \ \ \ \ (38)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|a_{i}a_{i}^{\dagger}\right|0\right\rangle -\left\langle 0\left|a_{i}a_{i}^{\dagger}a_{i}a_{i}^{\dagger}\right|0\right\rangle \ \ \ \ \ (39)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|a_{i}a_{i}^{\dagger}\right|0\right\rangle -\left\langle 0\left|a_{i}a_{i}^{\dagger}\left(1-a_{i}^{\dagger}a_{i}\right)\right|0\right\rangle \ \ \ \ \ (40)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle 0\left|a_{i}a_{i}^{\dagger}\right|0\right\rangle -\left\langle 0\left|a_{i}a_{i}^{\dagger}\right|0\right\rangle +\left\langle 0\left|a_{i}a_{i}^{\dagger}a_{i}^{\dagger}a_{i}\right|0\right\rangle \ \ \ \ \ (41)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (42)$

Thus, attempting to create two quanta in the same state produces zero, so at most one quantum can occupy each state. The commutator case 33 thus behaves like bosons and the anticommutator case like fermions.
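The contrast between the two cases can be seen by running the same reordering with either sign. The sketch below treats a single mode (so the index ${i}$ is dropped), writes `a` for the annihilation operator and `d` for the creation operator, and takes a `sign` argument: ${+1}$ uses ${aa^{\dagger}=1+a^{\dagger}a}$ and ${-1}$ uses ${aa^{\dagger}=1-a^{\dagger}a}$. The string encoding and the function name are my own.

```python
def vev(word, sign):
    """Vacuum expectation value <0| word |0> for a single mode.

    'a' annihilates, 'd' creates. sign = +1 applies the commutator rule
    a d = 1 + d a; sign = -1 applies the anticommutator rule a d = 1 - d a.
    """
    i = word.find("ad")
    if i == -1:
        # Normally ordered: only the empty word survives between vacuum states.
        return 1 if word == "" else 0
    contracted = word[:i] + word[i + 2:]        # the '1' term
    swapped = word[:i] + "da" + word[i + 2:]    # the reordered term
    return vev(contracted, sign) + sign * vev(swapped, sign)

one_quantum_boson = vev("ad", +1)      # <0| a a^dagger |0>
one_quantum_fermion = vev("ad", -1)
two_quanta_boson = vev("aadd", +1)     # <0| a a a^dagger a^dagger |0>
two_quanta_fermion = vev("aadd", -1)

print(one_quantum_boson, one_quantum_fermion)
print(two_quanta_boson, two_quanta_fermion)
```

With either sign the one-quantum state has norm 1, but only the commutator case gives a non-zero norm for two quanta in the same state.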

Number operator

References: Mark Srednicki, Quantum Field Theory, (Cambridge University Press, 2007) – Chapter 1, Problem 1.3.

The number operator is defined as

$\displaystyle N\equiv\int d^{3}x\;a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right) \ \ \ \ \ (1)$

Applied to a quantum state, it counts the number of particles in that state:

$\displaystyle Na^{\dagger}\left(\mathbf{x}_{1}\right)\ldots a^{\dagger}\left(\mathbf{x}_{n}\right)\left|0\right\rangle =na^{\dagger}\left(\mathbf{x}_{1}\right)\ldots a^{\dagger}\left(\mathbf{x}_{n}\right)\left|0\right\rangle \ \ \ \ \ (2)$

Another property of ${N}$ is that it commutes with any other operator that contains an equal number of creation and annihilation operators. To see this, look at the individual commutators as follows (where ${a_{i}\equiv a\left(\mathbf{x}_{i}\right)}$).

 $\displaystyle \left[N,a_{i}^{\dagger}\right]$ $\displaystyle =$ $\displaystyle \int d^{3}x\;\left(a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)a_{i}^{\dagger}-a_{i}^{\dagger}a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)\right)\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int d^{3}x\;\left[a^{\dagger}\left(\mathbf{x}\right)\left(\delta\left(\mathbf{x}-\mathbf{x}_{i}\right)+a_{i}^{\dagger}a\left(\mathbf{x}\right)\right)-a_{i}^{\dagger}a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)\right]\ \ \ \ \ (4)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int d^{3}x\;a^{\dagger}\left(\mathbf{x}\right)\delta\left(\mathbf{x}-\mathbf{x}_{i}\right)\ \ \ \ \ (5)$ $\displaystyle$ $\displaystyle =$ $\displaystyle a_{i}^{\dagger}\ \ \ \ \ (6)$ $\displaystyle \left[N,a_{i}\right]$ $\displaystyle =$ $\displaystyle \int d^{3}x\;\left(a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)a_{i}-a_{i}a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)\right)\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int d^{3}x\;\left[a^{\dagger}\left(\mathbf{x}\right)a\left(\mathbf{x}\right)a_{i}-\left(\delta\left(\mathbf{x}-\mathbf{x}_{i}\right)+a^{\dagger}\left(\mathbf{x}\right)a_{i}\right)a\left(\mathbf{x}\right)\right]\ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\int d^{3}x\;a\left(\mathbf{x}\right)\delta\left(\mathbf{x}-\mathbf{x}_{i}\right)\ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -a_{i} \ \ \ \ \ (10)$

Here we’ve used the commutation relations

 $\displaystyle \left[a\left(\mathbf{x}\right),a\left(\mathbf{x}^{\prime}\right)\right]$ $\displaystyle =$ $\displaystyle 0\ \ \ \ \ (11)$ $\displaystyle \left[a^{\dagger}\left(\mathbf{x}\right),a^{\dagger}\left(\mathbf{x}^{\prime}\right)\right]$ $\displaystyle =$ $\displaystyle 0\ \ \ \ \ (12)$ $\displaystyle \left[a\left(\mathbf{x}\right),a^{\dagger}\left(\mathbf{x}^{\prime}\right)\right]$ $\displaystyle =$ $\displaystyle \delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (13)$

Now suppose we have an operator ${X}$ which is a product of ${n}$ creation operators ${a_{ik}^{\dagger}}$, ${k=1,\ldots,n}$, and ${m}$ annihilation operators ${a_{jk}}$, ${k=1,\ldots,m}$:

$\displaystyle X=a_{i1}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm} \ \ \ \ \ (14)$

Then

 $\displaystyle \left[N,X\right]$ $\displaystyle =$ $\displaystyle Na_{i1}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}-a_{i1}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}N\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(a_{i1}^{\dagger}N+a_{i1}^{\dagger}\right)a_{i2}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}-a_{i1}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}N\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle X+a_{i1}^{\dagger}\left[N,a_{i2}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}\right] \ \ \ \ \ (17)$

We can see that the commutator in the last line can be worked out recursively until we’ve processed all the creation operators up to ${a_{in}^{\dagger}}$, giving

$\displaystyle \left[N,X\right]=nX+a_{i1}^{\dagger}\ldots a_{in}^{\dagger}\left[N,a_{j1}\ldots a_{jm}\right] \ \ \ \ \ (18)$
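
The recursion rests on the product rule for commutators, which holds for any two operators (the labels ${A}$ and ${B}$ are introduced here purely for illustration):

$\displaystyle \left[N,AB\right]=\left[N,A\right]B+A\left[N,B\right]$

As a small worked case, take just two creation operators and no annihilation operators, so that ${n=2}$ and ${m=0}$. Applying the product rule together with 6 gives

$\displaystyle \left[N,a_{i1}^{\dagger}a_{i2}^{\dagger}\right]=\left[N,a_{i1}^{\dagger}\right]a_{i2}^{\dagger}+a_{i1}^{\dagger}\left[N,a_{i2}^{\dagger}\right]=2a_{i1}^{\dagger}a_{i2}^{\dagger}$

Each creation operator contributes one copy of the original product, which is where the factor ${n}$ in 18 comes from. In the general case, the commutator ${\left[N,a_{j1}\ldots a_{jm}\right]}$ involving the annihilation operators is still to be evaluated.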

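The last commutator can be worked out by the same recursion, now using ${\left[N,a_{j}\right]=-a_{j}}$ from 10, so that each annihilation operator contributes a factor of ${-1}$: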

 $\displaystyle \left[N,a_{j1}\ldots a_{jm}\right]$ $\displaystyle =$ $\displaystyle Na_{j1}\ldots a_{jm}-a_{j1}\ldots a_{jm}N\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(a_{j1}N-a_{j1}\right)a_{j2}\ldots a_{jm}-a_{j1}\ldots a_{jm}N\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -a_{j1}\ldots a_{jm}+a_{j1}\left[N,a_{j2}\ldots a_{jm}\right]\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -m\left(a_{j1}\ldots a_{jm}\right) \ \ \ \ \ (22)$

Therefore

 $\displaystyle a_{i1}^{\dagger}\ldots a_{in}^{\dagger}\left[N,a_{j1}\ldots a_{jm}\right]$ $\displaystyle =$ $\displaystyle -m\left(a_{i1}^{\dagger}\ldots a_{in}^{\dagger}a_{j1}\ldots a_{jm}\right)\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -mX\ \ \ \ \ (24)$ $\displaystyle \left[N,X\right]$ $\displaystyle =$ $\displaystyle \left(n-m\right)X \ \ \ \ \ (25)$

So if ${n=m}$ (the numbers of creation and annihilation operators are equal), the operator ${X}$ commutes with ${N}$. In particular, the hamiltonian we met last time satisfies this criterion, so ${\left[N,H\right]=0}$ and this hamiltonian conserves particle number.
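
The result 25 can be checked numerically with a rough sketch like the one below. It is not the continuum calculation above: it assumes the field is replaced by a small lattice of discrete modes, so that ${a\left(\mathbf{x}\right)}$ becomes one ladder operator per site and the integral in ${N}$ becomes a sum, and it assumes each mode is truncated at a maximum occupation. The names `ladder`, `embed`, `check`, `n_sites` and `d` are all hypothetical choices made for the illustration. The truncation spoils ${\left[a,a^{\dagger}\right]=1}$ at the cutoff, but ${N}$ is diagonal in the occupation basis, so the relation ${\left[N,X\right]=\left(n-m\right)X}$ should still hold for the truncated matrices.

```python
import numpy as np
from functools import reduce


def ladder(d):
    """Single-mode annihilation operator, truncated to occupations 0..d-1."""
    return np.diag(np.sqrt(np.arange(1, d)), k=1)


def embed(op, site, n_sites, d):
    """Act with `op` on mode `site` and with the identity on every other mode."""
    factors = [np.eye(d)] * n_sites
    factors[site] = op
    return reduce(np.kron, factors)


def commutator(A, B):
    return A @ B - B @ A


# Assumed (hypothetical) lattice size and occupation cutoff
n_sites, d = 3, 4

a = [embed(ladder(d), s, n_sites, d) for s in range(n_sites)]
adag = [op.T for op in a]  # the matrices are real, so dagger = transpose

# Lattice analogue of N = integral of a^dagger(x) a(x): a sum over sites
N = sum(ad @ an for ad, an in zip(adag, a))


def check(creators, annihilators):
    """True if [N, X] = (n - m) X, with X = creators followed by annihilators.

    `creators` and `annihilators` are lists of site indices (repeats allowed).
    """
    ops = [adag[i] for i in creators] + [a[j] for j in annihilators]
    X = reduce(np.matmul, ops)
    n, m = len(creators), len(annihilators)
    return np.allclose(commutator(N, X), (n - m) * X)


print(check([0], []))         # single creation operator, as in 6
print(check([], [1]))         # single annihilation operator, as in 10
print(check([0, 1], [2]))     # n = 2, m = 1
print(check([0, 2], [1, 1]))  # n = m = 2, so X should commute with N
```

Calling `check` with equal numbers of creation and annihilation indices tests the number-conserving case ${n=m}$, which is the situation relevant to the hamiltonian.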