# Spherical Bessel functions – behaviour for small arguments

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 12, Exercise 12.6.7.

[If some equations are too small to read easily, use your browser’s magnifying option (Ctrl + on Chrome, probably something similar on other browsers).]

The general solution for a free particle in spherical coordinates involves the radial function, which turns out to be

$\displaystyle \frac{R_{l+1}}{\rho^{l+1}}=\left(-\frac{1}{\rho}\frac{d}{d\rho}\right)^{l+1}\frac{R_{0}}{\rho^{0}} \ \ \ \ \ (1)$

where ${l}$ is the total angular momentum quantum number and

 $\displaystyle k^{2}$ $\displaystyle \equiv$ $\displaystyle \frac{2\mu E}{\hbar^{2}}\ \ \ \ \ (2)$ $\displaystyle \rho$ $\displaystyle \equiv$ $\displaystyle kr \ \ \ \ \ (3)$

We can rewrite this as

$\displaystyle R_{l}=\left(-\rho\right)^{l}\left(\frac{1}{\rho}\frac{d}{d\rho}\right)^{l}R_{0} \ \ \ \ \ (4)$

We saw earlier that the solutions for ${l=0}$ are, with ${U_{l}=\rho R_{l}}$

 $\displaystyle U_{0}^{A}\left(\rho\right)$ $\displaystyle =$ $\displaystyle \sin\rho\ \ \ \ \ (5)$ $\displaystyle U_{0}^{B}\left(\rho\right)$ $\displaystyle =$ $\displaystyle -\cos\rho \ \ \ \ \ (6)$

Thus the two solutions for ${l=0}$ are

 $\displaystyle R_{0}^{A}$ $\displaystyle =$ $\displaystyle \frac{\sin\rho}{\rho}\ \ \ \ \ (7)$ $\displaystyle R_{0}^{B}$ $\displaystyle =$ $\displaystyle -\frac{\cos\rho}{\rho} \ \ \ \ \ (8)$

From these starting points, we can generate all the solutions for higher values of ${l}$ using 4. These functions are

 $\displaystyle j_{l}\left(\rho\right)$ $\displaystyle =$ $\displaystyle \left(-\rho\right)^{l}\left(\frac{1}{\rho}\frac{d}{d\rho}\right)^{l}\frac{\sin\rho}{\rho}\ \ \ \ \ (9)$ $\displaystyle n_{l}\left(\rho\right)$ $\displaystyle =$ $\displaystyle -\left(-\rho\right)^{l}\left(\frac{1}{\rho}\frac{d}{d\rho}\right)^{l}\frac{\cos\rho}{\rho} \ \ \ \ \ (10)$

and are known as spherical Bessel functions ${j_{l}}$ and spherical Neumann functions ${n_{l}}$.

The asymptotic behaviour is given by

 $\displaystyle j_{l}$ $\displaystyle \underset{\rho\rightarrow\infty}{\longrightarrow}$ $\displaystyle \frac{1}{\rho}\sin\left(\rho-\frac{l\pi}{2}\right)\ \ \ \ \ (11)$ $\displaystyle n_{l}$ $\displaystyle \underset{\rho\rightarrow\infty}{\longrightarrow}$ $\displaystyle -\frac{1}{\rho}\cos\left(\rho-\frac{l\pi}{2}\right) \ \ \ \ \ (12)$
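These large-argument forms can be checked numerically. A quick sketch, assuming SciPy is available (SciPy's `spherical_yn` is the spherical Neumann function ${n_l}$); the value of ${\rho}$ is arbitrary but large:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

# Compare j_l and n_l at large rho with their asymptotic forms;
# the next correction is O(1/rho^2), so agreement should be ~1e-8 here.
rho = 1.0e4
for l in range(4):
    j_asym = np.sin(rho - l * np.pi / 2) / rho
    n_asym = -np.cos(rho - l * np.pi / 2) / rho
    assert abs(spherical_jn(l, rho) - j_asym) < 1e-6
    assert abs(spherical_yn(l, rho) - n_asym) < 1e-6
```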

For ${\rho\rightarrow0}$, we have

 $\displaystyle j_{l}$ $\displaystyle \underset{\rho\rightarrow0}{\longrightarrow}$ $\displaystyle \frac{\rho^{l}}{\left(2l+1\right)!!}\ \ \ \ \ (13)$ $\displaystyle n_{l}$ $\displaystyle \underset{\rho\rightarrow0}{\longrightarrow}$ $\displaystyle -\frac{\left(2l-1\right)!!}{\rho^{l+1}} \ \ \ \ \ (14)$

We can verify equation 13 for a couple of cases with small ${l}$. From 9, we can generate the first few ${j_{l}}$s:

 $\displaystyle j_{0}$ $\displaystyle =$ $\displaystyle \frac{\sin\rho}{\rho}\ \ \ \ \ (15)$ $\displaystyle j_{1}$ $\displaystyle =$ $\displaystyle -\rho\frac{1}{\rho}\frac{d}{d\rho}\left(\frac{\sin\rho}{\rho}\right)\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\rho\frac{1}{\rho}\left(\frac{\cos\rho}{\rho}-\frac{\sin\rho}{\rho^{2}}\right)\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\sin\rho}{\rho^{2}}-\frac{\cos\rho}{\rho}\ \ \ \ \ (18)$ $\displaystyle j_{2}$ $\displaystyle =$ $\displaystyle \left(-\rho\right)^{2}\frac{1}{\rho}\frac{d}{d\rho}\left[\frac{1}{\rho}\frac{d}{d\rho}\left(\frac{\sin\rho}{\rho}\right)\right]\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(-\rho\right)^{2}\frac{1}{\rho}\frac{d}{d\rho}\left[\frac{1}{\rho}\left(\frac{\cos\rho}{\rho}-\frac{\sin\rho}{\rho^{2}}\right)\right]\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\frac{3}{\rho^{3}}-\frac{1}{\rho}\right)\sin\rho-\frac{3\cos\rho}{\rho^{2}} \ \ \ \ \ (21)$
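The explicit forms above can be checked against SciPy's implementation. A minimal sketch, assuming SciPy is available; the test value of ${\rho}$ is arbitrary:

```python
import numpy as np
from scipy.special import spherical_jn

# Closed forms for j_0, j_1, j_2 derived above, compared with scipy
rho = 1.7
j0 = np.sin(rho) / rho
j1 = np.sin(rho) / rho**2 - np.cos(rho) / rho
j2 = (3 / rho**3 - 1 / rho) * np.sin(rho) - 3 * np.cos(rho) / rho**2
for l, val in enumerate((j0, j1, j2)):
    assert abs(val - spherical_jn(l, rho)) < 1e-12
```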

We can get the limits for ${\rho\rightarrow0}$ by expanding the sine and cosine. That is, we use the limiting forms

 $\displaystyle \sin\rho$ $\displaystyle \rightarrow$ $\displaystyle \rho-\frac{\rho^{3}}{3!}+\ldots\ \ \ \ \ (22)$ $\displaystyle \cos\rho$ $\displaystyle \rightarrow$ $\displaystyle 1-\frac{1}{2}\rho^{2}+\ldots \ \ \ \ \ (23)$

We need to retain enough terms in the expansions of the sine and cosine so that we keep everything up to the first power of ${\rho}$ that doesn’t cancel out when we do the algebra. We get

 $\displaystyle j_{0}$ $\displaystyle \rightarrow$ $\displaystyle 1=\frac{\rho^{0}}{1!!}\ \ \ \ \ (24)$ $\displaystyle j_{1}$ $\displaystyle \rightarrow$ $\displaystyle \frac{1}{\rho}-\frac{\rho}{6}-\frac{1}{\rho}\left(1-\frac{1}{2}\rho^{2}\right)\ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\rho}{3}=\frac{\rho^{1}}{3!!}\ \ \ \ \ (26)$ $\displaystyle j_{2}$ $\displaystyle \rightarrow$ $\displaystyle \left(\frac{3}{\rho^{3}}-\frac{1}{\rho}\right)\left(\rho-\frac{\rho^{3}}{6}+\frac{\rho^{5}}{120}\right)-\frac{3}{\rho^{2}}\left(1-\frac{1}{2}\rho^{2}+\frac{1}{24}\rho^{4}\right)\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle \rightarrow$ $\displaystyle \left(\frac{1}{6}+\frac{1}{40}-\frac{1}{8}\right)\rho^{2}\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{20+3-15}{120}\rho^{2}\ \ \ \ \ (29)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\rho^{2}}{15}=\frac{\rho^{2}}{5!!} \ \ \ \ \ (30)$
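The small-argument limit 13 can also be confirmed numerically; a sketch assuming SciPy (whose `factorial2` computes the double factorial), evaluating at a small ${\rho}$ where the corrections, of order ${\rho^{2}}$, are negligible:

```python
import numpy as np
from scipy.special import spherical_jn, factorial2

# For small rho, j_l ~ rho^l / (2l+1)!! with relative error O(rho^2)
rho = 1e-3
for l in range(3):
    leading = rho**l / factorial2(2 * l + 1)
    assert abs(spherical_jn(l, rho) / leading - 1) < 1e-5
```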

# Equation display may be misaligned

Today I switched the plugin used to display LaTeX equations to the “Beautiful maths” feature of Jetpack. For a short time, this caused some of the equations to appear out of alignment. In particular, in multiple-line equations, the equals sign may not be centered with respect to the rest of the equation.

If you see this, do a complete reload of the page (Ctrl + F5 on most desktop browsers). On mobiles and tablets without either Ctrl or F5 keys, you will probably have to clear the cache (which you can do on Android devices by selecting History from the menu of the Chrome browser; sorry but I don’t know if this works on other devices or device browsers).

# Direct product of vector spaces: 2-dim examples

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 10, Exercise 10.1.2.

To help with understanding the direct product of two vector spaces, some examples with a couple of 2-d vector spaces are useful. Suppose the one-particle Hilbert space is two-dimensional, with basis vectors ${\left|+\right\rangle }$ and ${\left|-\right\rangle }$. Now suppose we have two such particles, each in its own 2-d space, ${\mathbb{V}_{1}}$ for particle 1 and ${\mathbb{V}_{2}}$ for particle 2. We can define a couple of operators by their matrix elements in these two spaces. We define

 $\displaystyle \sigma_{1}^{\left(1\right)}$ $\displaystyle \equiv$ $\displaystyle \left[\begin{array}{cc} a & b\\ c & d \end{array}\right]\ \ \ \ \ (1)$ $\displaystyle \sigma_{2}^{\left(2\right)}$ $\displaystyle \equiv$ $\displaystyle \left[\begin{array}{cc} e & f\\ g & h \end{array}\right] \ \ \ \ \ (2)$

where the first column and row refer to basis vector ${\left|+\right\rangle }$ and the second column and row to ${\left|-\right\rangle }$. Recall that the subscript on each ${\sigma}$ refers to the particle and the superscript refers to the vector space. Thus ${\sigma_{1}^{\left(1\right)}}$ is an operator in space ${\mathbb{V}_{1}}$ for particle 1.

Now consider the direct product space ${\mathbb{V}_{1}\otimes\mathbb{V}_{2}}$, which is spanned by the four basis vectors formed by direct products of the two basis vectors in each of the one-particle spaces, that is by ${\left|+\right\rangle \otimes\left|+\right\rangle }$, ${\left|+\right\rangle \otimes\left|-\right\rangle }$, ${\left|-\right\rangle \otimes\left|+\right\rangle }$ and ${\left|-\right\rangle \otimes\left|-\right\rangle }$. Each of the ${\sigma}$ operators has a corresponding version in the product space, which is formed by taking the direct product of the one-particle version for one of the particles with the identity operator for the other particle. That is

 $\displaystyle \sigma_{1}^{\left(1\right)\otimes\left(2\right)}$ $\displaystyle =$ $\displaystyle \sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\ \ \ \ \ (3)$ $\displaystyle \sigma_{2}^{\left(1\right)\otimes\left(2\right)}$ $\displaystyle =$ $\displaystyle I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)} \ \ \ \ \ (4)$

To get the matrix elements in the product space, we need the form of the identity operators in the one-particle spaces. They are, as usual

 $\displaystyle I^{\left(1\right)}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]\ \ \ \ \ (5)$ $\displaystyle I^{\left(2\right)}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right] \ \ \ \ \ (6)$

I’ve written the two identity operators as separate equations since although they have the same numerical form as a matrix, the two operators operate on different spaces, so they are technically different operators. To get the matrix elements of ${\sigma_{1}^{\left(1\right)\otimes\left(2\right)}}$ we can expand the direct product (Shankar suggests using the ‘method of images’, although I have no idea what this is. I doubt that it’s the same method of images used in electrostatics, and Google draws a blank for any other kind of method of images.) In any case, we can form the product by taking the corresponding matrix elements. For example

 $\displaystyle \left\langle ++\left|\sigma_{1}^{\left(1\right)\otimes\left(2\right)}\right|++\right\rangle$ $\displaystyle =$ $\displaystyle \left(\left\langle +\right|\otimes\left\langle +\right|\right)\sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left(\left|+\right\rangle \otimes\left|+\right\rangle \right)\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle +\left|\sigma_{1}^{\left(1\right)}\right|+\right\rangle \left\langle +\left|I^{\left(2\right)}\right|+\right\rangle \ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle a\times1=a \ \ \ \ \ (9)$

When working out the RHS of the first line, remember that operators with a superscript (1) operate only on bras and kets from the space ${\mathbb{V}_{1}}$ and operators with a superscript (2) operate only on bras and kets from the space ${\mathbb{V}_{2}}$. Applying the same technique for the remaining elements gives

$\displaystyle \sigma_{1}^{\left(1\right)\otimes\left(2\right)}=\sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}=\left[\begin{array}{cccc} a & 0 & b & 0\\ 0 & a & 0 & b\\ c & 0 & d & 0\\ 0 & c & 0 & d \end{array}\right] \ \ \ \ \ (10)$

Another less tedious way of getting this result is to note that we can form the direct product by taking each element in the first matrix ${\sigma_{1}^{\left(1\right)}}$ from 1 and multiply it into the second matrix ${I^{\left(2\right)}}$ from 6. Thus the top ${2\times2}$ elements in ${\sigma_{1}^{\left(1\right)\otimes\left(2\right)}}$ are obtained by taking the element ${\left\langle +\left|\sigma_{1}^{\left(1\right)}\right|+\right\rangle =a}$ from 1 and multiplying it into the matrix ${I^{\left(2\right)}}$ from 6. That is, the upper left ${2\times2}$ block is formed from

 $\displaystyle aI_{2\times2}^{\left(2\right)}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} a & 0\\ 0 & a \end{array}\right] \ \ \ \ \ (11)$

and so on for the other three ${2\times2}$ blocks in the complete matrix. Note that it’s important to get things in the right order, as the direct product is not commutative.
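This recipe is exactly the Kronecker product, available in NumPy as `np.kron`. A quick check of 10, with arbitrary illustrative values for ${a,b,c,d}$:

```python
import numpy as np

a, b, c, d = 1, 2, 3, 4
sigma1 = np.array([[a, b], [c, d]])
I2 = np.eye(2, dtype=int)

# sigma_1 (x) I: each element of sigma1 multiplies the whole of I2
big = np.kron(sigma1, I2)
expected = np.array([[a, 0, b, 0],
                     [0, a, 0, b],
                     [c, 0, d, 0],
                     [0, c, 0, d]])
assert np.array_equal(big, expected)

# the direct product is not commutative: I (x) sigma1 differs
assert not np.array_equal(np.kron(sigma1, I2), np.kron(I2, sigma1))
```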

To get the other direct product, we can apply the same technique:

$\displaystyle \sigma_{2}^{\left(1\right)\otimes\left(2\right)}=I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)}=\left[\begin{array}{cccc} e & f & 0 & 0\\ g & h & 0 & 0\\ 0 & 0 & e & f\\ 0 & 0 & g & h \end{array}\right] \ \ \ \ \ (12)$

Again, note that

$\displaystyle I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)}\ne\sigma_{2}^{\left(2\right)}\otimes I^{\left(1\right)}=\left[\begin{array}{cccc} e & 0 & f & 0\\ 0 & e & 0 & f\\ g & 0 & h & 0\\ 0 & g & 0 & h \end{array}\right] \ \ \ \ \ (13)$

Finally, we can work out the direct product version of the product of two one-particle operators. That is, we want

$\displaystyle \left(\sigma_{1}\sigma_{2}\right)^{\left(1\right)\otimes\left(2\right)}=\sigma_{1}^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)} \ \ \ \ \ (14)$

We can do this in two ways. First, we can apply the same recipe as in the previous example. We take each element of ${\sigma_{1}^{\left(1\right)}}$ and multiply it into the full matrix ${\sigma_{2}^{\left(2\right)}}$:

 $\displaystyle \sigma_{1}^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} ae & af & be & bf\\ ag & ah & bg & bh\\ ce & cf & de & df\\ cg & ch & dg & dh \end{array}\right] \ \ \ \ \ (15)$

Second, we can take the matrix product of ${\sigma_{1}^{\left(1\right)\otimes\left(2\right)}}$ from 10 with ${\sigma_{2}^{\left(1\right)\otimes\left(2\right)}}$ from 12:

 $\displaystyle \left(\sigma_{1}\sigma_{2}\right)^{\left(1\right)\otimes\left(2\right)}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} a & 0 & b & 0\\ 0 & a & 0 & b\\ c & 0 & d & 0\\ 0 & c & 0 & d \end{array}\right]\left[\begin{array}{cccc} e & f & 0 & 0\\ g & h & 0 & 0\\ 0 & 0 & e & f\\ 0 & 0 & g & h \end{array}\right]\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} ae & af & be & bf\\ ag & ah & bg & bh\\ ce & cf & de & df\\ cg & ch & dg & dh \end{array}\right] \ \ \ \ \ (17)$
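That the two routes agree is an instance of the mixed-product property of the direct product, ${\left(A\otimes B\right)\left(C\otimes D\right)=AC\otimes BD}$. A numerical sketch assuming NumPy, with random matrices standing in for the ${\sigma}$s:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma1 = rng.standard_normal((2, 2))
sigma2 = rng.standard_normal((2, 2))
I2 = np.eye(2)

# (sigma1 (x) I)(I (x) sigma2) = sigma1 (x) sigma2
lhs = np.kron(sigma1, I2) @ np.kron(I2, sigma2)
assert np.allclose(lhs, np.kron(sigma1, sigma2))
```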

# WordPress help requested

As regular visitors will know, this blog occasionally goes off line due to problems connecting to the WordPress database that stores the posts. I have contacted my hosting provider and they say that this is due to an excessive number of database connections that are opened but then not closed again, but are unable to provide any help beyond that.

As I do not want to mess with any of the WordPress code (for two reasons: 1 – changing the code could introduce security problems and 2 – I don’t know enough about either WordPress or PHP to mess with their code) I was wondering if any readers have experience with running WordPress blogs and know of any ways to prevent excessive database access. I have just installed the “W3 Total Cache” plugin which may help, but if anyone has any other suggestions I’d be very grateful.

# Thermodynamics of harmonic oscillators – classical and quantum

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 7.5, Exercise 7.5.4.

One application of harmonic oscillator theory is in the behaviour of crystals as a function of temperature. A reasonable model of a crystal is of a number of atoms that vibrate as harmonic oscillators. From statistical mechanics, the probability ${P\left(i\right)}$ of finding a system in a state ${i}$ is given by the Boltzmann formula

$\displaystyle P\left(i\right)=\frac{e^{-\beta E\left(i\right)}}{Z} \ \ \ \ \ (1)$

where ${\beta=1/kT}$, with ${k}$ being Boltzmann’s constant and ${T}$ the absolute temperature, and ${Z}$ is the partition function

$\displaystyle Z=\sum_{i}e^{-\beta E\left(i\right)} \ \ \ \ \ (2)$

The thermal average energy of the system is then

 $\displaystyle \bar{E}$ $\displaystyle =$ $\displaystyle \sum_{i}E\left(i\right)P\left(i\right)\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\sum_{i}E\left(i\right)e^{-\beta E\left(i\right)}}{Z}\ \ \ \ \ (4)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\frac{\partial\left(\ln Z\right)}{\partial\beta} \ \ \ \ \ (5)$

For a classical harmonic oscillator, the energy is a continuous function of the position ${x}$ and momentum ${p}$:

$\displaystyle E_{cl}=\frac{p^{2}}{2m}+\frac{1}{2}m\omega^{2}x^{2} \ \ \ \ \ (6)$

The classical partition function is then

 $\displaystyle Z_{cl}$ $\displaystyle =$ $\displaystyle \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\beta p^{2}/2m}e^{-\beta m\omega^{2}x^{2}/2}dp\;dx\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int_{-\infty}^{\infty}e^{-\beta p^{2}/2m}dp\int_{-\infty}^{\infty}e^{-\beta m\omega^{2}x^{2}/2}dx\ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sqrt{\frac{2\pi m}{\beta}}\sqrt{\frac{2\pi}{\beta m\omega^{2}}}\ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{2\pi}{\omega\beta} \ \ \ \ \ (10)$

where we used the standard formula for Gaussian integrals to get the third line. The average classical energy is, from 5

$\displaystyle \bar{E}_{cl}=-\frac{\partial\left(\ln Z_{cl}\right)}{\partial\beta}=\frac{1}{\beta}=kT \ \ \ \ \ (11)$

The average energy of a classical oscillator thus depends only on the temperature, and not on the frequency ${\omega}$.
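A quick numerical check of 10 and 11, assuming SciPy; the values of ${m}$, ${\omega}$ and ${\beta}$ are arbitrary, in units where ${k=1}$:

```python
import numpy as np
from scipy.integrate import quad

# arbitrary illustrative values
m, omega, beta = 1.3, 0.7, 2.0

# the two Gaussian integrals making up Z_cl
Zp, _ = quad(lambda p: np.exp(-beta * p**2 / (2 * m)), -np.inf, np.inf)
Zx, _ = quad(lambda x: np.exp(-beta * m * omega**2 * x**2 / 2), -np.inf, np.inf)
assert abs(Zp * Zx - 2 * np.pi / (omega * beta)) < 1e-8

# E = -d(ln Z)/d(beta) = 1/beta, checked by central difference
h = 1e-6
lnZ = lambda b: np.log(2 * np.pi / (omega * b))
E = -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h)
assert abs(E - 1 / beta) < 1e-6
```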

For a quantum oscillator, the energies are quantized with values of

$\displaystyle E\left(n\right)=\hbar\omega\left(n+\frac{1}{2}\right) \ \ \ \ \ (12)$

The quantum partition function is therefore

$\displaystyle Z_{qu}=e^{-\beta\hbar\omega/2}\sum_{n=0}^{\infty}e^{-\beta\hbar\omega n} \ \ \ \ \ (13)$

The sum is a geometric series, so we can use the standard result for ${\left|x\right|<1}$:

$\displaystyle \sum_{n=0}^{\infty}x^{n}=\frac{1}{1-x} \ \ \ \ \ (14)$

This gives

$\displaystyle Z_{qu}=\frac{e^{-\beta\hbar\omega/2}}{1-e^{-\beta\hbar\omega}} \ \ \ \ \ (15)$

The mean quantum energy is again found from 5, although this time the derivative is a bit messier, so it is most easily done using Maple. However, by hand, you’d get

 $\displaystyle \bar{E}_{qu}$ $\displaystyle =$ $\displaystyle -\frac{\partial\left(\ln Z_{qu}\right)}{\partial\beta}\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\frac{1-e^{-\beta\hbar\omega}}{e^{-\beta\hbar\omega/2}}\left[-\frac{1}{2}\frac{\hbar\omega e^{-\beta\hbar\omega/2}}{1-e^{-\beta\hbar\omega}}-\frac{\hbar\omega e^{-\beta\hbar\omega/2}e^{-\beta\hbar\omega}}{\left(1-e^{-\beta\hbar\omega}\right)^{2}}\right]\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\hbar\omega}{2}\left(\frac{1+e^{-\beta\hbar\omega}}{1-e^{-\beta\hbar\omega}}\right)\ \ \ \ \ (18)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\hbar\omega}{2}\left(\frac{1-e^{-\beta\hbar\omega}+2e^{-\beta\hbar\omega}}{1-e^{-\beta\hbar\omega}}\right)\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \hbar\omega\left(\frac{1}{2}+\frac{1}{e^{\beta\hbar\omega}-1}\right) \ \ \ \ \ (20)$

The average energy is the ground state energy ${\hbar\omega/2}$ plus a quantity that increases with increasing temperature (decreasing ${\beta}$). For small ${\beta}$ we have

 $\displaystyle \bar{E}_{qu}$ $\displaystyle \rightarrow$ $\displaystyle \hbar\omega\left(\frac{1}{2}+\frac{1}{1+\beta\hbar\omega-1}\right)\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\hbar\omega}{2}+\frac{1}{\beta}\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle \rightarrow$ $\displaystyle kT \ \ \ \ \ (23)$

since as ${\beta\rightarrow0}$, ${\frac{1}{\beta}\gg\frac{\hbar\omega}{2}}$. Thus the quantum energy reduces to the classical energy 11 for high temperatures. The ‘high temperature’ condition is that

 $\displaystyle \frac{1}{\beta}$ $\displaystyle \gg$ $\displaystyle \frac{\hbar\omega}{2}\ \ \ \ \ (24)$ $\displaystyle T$ $\displaystyle \gg$ $\displaystyle \frac{\hbar\omega}{2k} \ \ \ \ \ (25)$
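These results can be checked numerically. The sketch below (assuming NumPy, with an arbitrary ${\beta}$ in units where ${\hbar\omega=1}$) differentiates ${\ln Z_{qu}}$ by central difference and confirms both 20 and the high-temperature limit:

```python
import numpy as np

hbar_omega, beta = 1.0, 0.8  # arbitrary values, units with hbar*omega = 1

# ln Z_qu from equation 15
lnZ = lambda b: -b * hbar_omega / 2 - np.log(1 - np.exp(-b * hbar_omega))

# E = -d(ln Z)/d(beta) by central difference, vs the closed form 20
h = 1e-6
E_num = -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h)
E_formula = hbar_omega * (0.5 + 1 / np.expm1(beta * hbar_omega))
assert abs(E_num - E_formula) < 1e-6

# high-temperature (small beta) limit approaches kT = 1/beta
small = 1e-4
E_hot = hbar_omega * (0.5 + 1 / np.expm1(small * hbar_omega))
assert abs(E_hot * small - 1) < 1e-3
```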

So far, we’ve considered the average behaviour of only one oscillator. Suppose we now have a 3-d crystal with ${N_{0}}$ atoms. Assuming small oscillations, we can approximate its behaviour by a system of ${3N_{0}}$ decoupled oscillators. In the classical case, the average energy is found from 11:

$\displaystyle \bar{\mathcal{E}}_{cl}=3N_{0}\bar{E}_{cl}=3N_{0}kT \ \ \ \ \ (26)$

The heat capacity per atom is the amount of heat (energy) ${\Delta E}$ required to raise the temperature by ${\Delta T}$, so

$\displaystyle C_{cl}=\frac{1}{N_{0}}\frac{\partial\bar{\mathcal{E}}_{cl}}{\partial T}=3k \ \ \ \ \ (27)$

For the quantum system, we have from 20

 $\displaystyle \bar{\mathcal{E}}_{qu}$ $\displaystyle =$ $\displaystyle 3N_{0}\bar{E}_{qu}\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 3N_{0}\hbar\omega\left(\frac{1}{2}+\frac{1}{e^{\beta\hbar\omega}-1}\right) \ \ \ \ \ (29)$

The quantum heat capacity is therefore

 $\displaystyle C_{qu}$ $\displaystyle =$ $\displaystyle \frac{1}{N_{0}}\frac{\partial\bar{\mathcal{E}}_{qu}}{\partial T}\ \ \ \ \ (30)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 3\hbar\omega\frac{\partial}{\partial\beta}\left(\frac{1}{e^{\beta\hbar\omega}-1}\right)\frac{d\beta}{dT}\ \ \ \ \ (31)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 3\frac{\hbar^{2}\omega^{2}}{kT^{2}}\frac{e^{\hbar\omega/kT}}{\left(e^{\beta\hbar\omega}-1\right)^{2}} \ \ \ \ \ (32)$

We can define the Einstein temperature as

$\displaystyle \theta_{E}\equiv\frac{\hbar\omega}{k} \ \ \ \ \ (33)$

which gives the heat capacity as

$\displaystyle C_{qu}=3k\frac{\theta_{E}^{2}}{T^{2}}\frac{e^{\theta_{E}/T}}{\left(e^{\theta_{E}/T}-1\right)^{2}} \ \ \ \ \ (34)$

For large temperatures, the exponent ${\theta_{E}/T}$ becomes small, so we have

 $\displaystyle C_{qu}$ $\displaystyle \underset{T\gg\theta_{E}}{\longrightarrow}$ $\displaystyle 3k\frac{\theta_{E}^{2}}{T^{2}}\frac{1+\theta_{E}/T}{\left(1+\theta_{E}/T-1\right)^{2}}\ \ \ \ \ (35)$ $\displaystyle$ $\displaystyle \rightarrow$ $\displaystyle 3k \ \ \ \ \ (36)$

For low temperatures ${e^{\theta_{E}/T}\gg1}$ so we have

 $\displaystyle C_{qu}$ $\displaystyle \underset{T\ll\theta_{E}}{\longrightarrow}$ $\displaystyle 3k\frac{\theta_{E}^{2}}{T^{2}}\frac{e^{\theta_{E}/T}}{e^{2\theta_{E}/T}}\ \ \ \ \ (37)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 3k\frac{\theta_{E}^{2}}{T^{2}}e^{-\theta_{E}/T} \ \ \ \ \ (38)$

The heat capacity again reduces to the classical value for high temperatures. The observed behaviour at low temperatures is that ${C_{qu}\propto T^{3}}$, so this simple model fails for very low temperatures. However, as is shown by Shankar’s figure 7.3, Einstein’s quantum model is actually quite good for all but the lowest temperatures.
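The two limits of the Einstein heat capacity 34 can be verified numerically; a sketch assuming NumPy, where ${t=T/\theta_{E}}$ is a dimensionless temperature:

```python
import numpy as np

def C_over_k(t):
    """Einstein heat capacity per atom, divided by k, with t = T/theta_E."""
    x = 1.0 / t  # x = theta_E / T
    return 3 * x**2 * np.exp(x) / np.expm1(x)**2

# high-temperature limit: the classical value 3k
assert abs(C_over_k(100.0) - 3.0) < 1e-3

# low temperatures: exponentially suppressed
assert C_over_k(0.04) < 1e-6
```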

# 10 million hits

For anyone who is following such things, physicspages.com has just (in the past hour) had its 10 millionth hit. I’m still amazed and grateful that a site with so much mathematics on it has proved so popular. Many thanks to everyone who has visited.

# Infinite square well – force to decrease well width

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 5.2, Exercise 5.2.4.

One way of comparing the classical and quantum pictures of a particle in an infinite square well is to calculate the force exerted on the walls by the particle. If a particle is in state ${\left|n\right\rangle }$, its energy is

$\displaystyle E_{n}=\frac{\left(n\pi\hbar\right)^{2}}{2mL^{2}} \ \ \ \ \ (1)$

If the particle remains in this state as the walls are slowly pushed in, so that ${L}$ slowly decreases, then its energy ${E_{n}}$ will increase, meaning that work is done on the system. The force is the change in energy per unit distance, so the force required is

$\displaystyle F=-\frac{\partial E_{n}}{\partial L}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (2)$

If we treat the system classically, then a particle with energy ${E_{n}}$ between the walls is effectively a free particle in this region (since the potential ${V=0}$ there), so all its energy is kinetic. That is

 $\displaystyle E_{n}$ $\displaystyle =$ $\displaystyle \frac{1}{2}mv^{2}\ \ \ \ \ (3)$ $\displaystyle v$ $\displaystyle =$ $\displaystyle \sqrt{\frac{2E_{n}}{m}}\ \ \ \ \ (4)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{n\pi\hbar}{mL} \ \ \ \ \ (5)$

The classical particle bounces elastically between the two walls, which means its velocity is exactly reversed at each collision. The momentum transfer in such a collision is

$\displaystyle \Delta p=2mv=\frac{2n\pi\hbar}{L} \ \ \ \ \ (6)$

The time between successive collisions on the same wall is

$\displaystyle \Delta t=\frac{2L}{v}=\frac{2mL^{2}}{n\pi\hbar} \ \ \ \ \ (7)$

Thus the average force exerted on one wall is

$\displaystyle \bar{F}=\frac{\Delta p}{\Delta t}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (8)$

Comparing with 2, we see that the quantum and classical forces in this case are the same.
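The equality can be confirmed with numbers; a sketch assuming NumPy, using illustrative values for an electron in a (hypothetical) 1 nm well:

```python
import numpy as np

hbar = 1.054571817e-34          # J s
m, L, n = 9.109e-31, 1e-9, 2    # electron mass, 1 nm well, state n = 2

# quantum: F = -dE_n/dL
E_n = (n * np.pi * hbar)**2 / (2 * m * L**2)
F_quantum = (n * np.pi * hbar)**2 / (m * L**3)

# classical: momentum transfer per bounce over time between bounces
v = np.sqrt(2 * E_n / m)        # speed of a classical particle with energy E_n
dp = 2 * m * v                  # momentum reversal at an elastic collision
dt = 2 * L / v                  # time between hits on the same wall
F_classical = dp / dt

assert abs(F_quantum / F_classical - 1) < 1e-12
```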

# Non-denumerable basis: position and momentum states

References: edX online course MIT 8.05, Section 5.6.

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10; Exercises 1.10.1 – 1.10.3.

Although we’ve looked at position and momentum operators in quantum mechanics before, it’s worth another look at the ways that Zwiebach and Shankar introduce them.

First, we’ll have a look at Shankar’s treatment. He begins by considering a string fixed at each end, at positions ${x=0}$ and ${x=L}$, then asks how we could convey the shape of the string to an observer who cannot see the string directly. We could note the position at some fixed finite number of points between 0 and ${L}$, but then the remote observer would have only a partial knowledge of the string’s shape; the locations of those portions of the string between the points at which it was measured are still unknown, although the observer could probably get a reasonable picture by interpolating between these points.

We can increase the number of points at which the position is measured to get a better picture, but to convey the exact shape of the string, we need to measure its position at an infinite number of points. This is possible (in principle) but leads to a problem with the definition of the inner product. For two vectors defined on a finite vector space with an orthonormal basis, the inner product is given by the usual formula for the dot product:

 $\displaystyle \left\langle f\left|g\right.\right\rangle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{n}f_{i}g_{i}\ \ \ \ \ (1)$ $\displaystyle \left\langle f\left|f\right.\right\rangle$ $\displaystyle =$ $\displaystyle \sum_{i=1}^{n}f_{i}^{2} \ \ \ \ \ (2)$

where ${f_{i}}$ and ${g_{i}}$ are the components of ${f}$ and ${g}$ in the orthonormal basis. If we’re taking ${f}$ to be the displacement of a string and we try to increase the accuracy of the picture by increasing the number ${n}$ of points at which measurements are taken, then the value of ${\left\langle f\left|f\right.\right\rangle }$ continues to increase as ${n}$ increases (provided that ${f\ne0}$ everywhere). As ${n\rightarrow\infty}$ then ${\left\langle f\left|f\right.\right\rangle \rightarrow\infty}$ as well, even though the system we’re measuring (a string of finite length with finite displacement) is certainly not infinite in any practical sense.

Shankar proposes getting around this problem by simply redefining the inner product for a finite vector space to be

$\displaystyle \left\langle f\left|g\right.\right\rangle =\sum_{i=1}^{n}f\left(x_{i}\right)g\left(x_{i}\right)\Delta \ \ \ \ \ (3)$

where ${\Delta\equiv L/\left(n+1\right)}$. That is, ${\Delta}$ now becomes the distance between adjacent points at which measurements are taken. If we let ${n\rightarrow\infty}$ this leads to the definition of the inner product as an integral

 $\displaystyle \left\langle f\left|g\right.\right\rangle$ $\displaystyle =$ $\displaystyle \int_{0}^{L}f\left(x\right)g\left(x\right)\;dx\ \ \ \ \ (4)$ $\displaystyle \left\langle f\left|f\right.\right\rangle$ $\displaystyle =$ $\displaystyle \int_{0}^{L}f^{2}\left(x\right)\;dx \ \ \ \ \ (5)$

This looks familiar enough, if you’ve done any work with inner products in quantum mechanics, but there is a subtle point which Shankar overlooks. In going from 1 to 3, we have introduced a factor ${\Delta}$ which, in the string example at least, has the dimensions of length, so the physical interpretation of these two equations is different. The units of ${\left\langle f\left|g\right.\right\rangle }$ appear to be different in the two cases. Now in quantum theory, inner products of the continuous type usually involve the wave function multiplied by its complex conjugate, with possibly another operator thrown in if we’re trying to find the expectation value of some observable. The square modulus of the wave function, ${\left|\Psi\right|^{2}}$, is taken to be a probability density, so it has units of inverse length (in one dimension) or inverse volume (in three dimensions), which makes the integral work out properly.

Admittedly, when we’re using ${f}$ to represent the displacement of a string, it’s not obvious what meaning the inner product of ${f}$ with anything else would actually have, so maybe the point isn’t worth worrying about. However, it does seem to be something that it would be worth Shankar including a comment about.

From this point, Shankar continues by saying that this infinite dimensional vector space is spanned by basis vectors ${\left|x\right\rangle }$, with one basis vector for each value of ${x}$. We require this basis to be orthogonal, which means that we must have, if ${x\ne x^{\prime}}$

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =0 \ \ \ \ \ (6)$

We then generalize the identity operator to be

$\displaystyle I=\int\left|x\right\rangle \left\langle x\right|dx \ \ \ \ \ (7)$

so that, for any vector ${\left|f\right\rangle }$,

$\displaystyle \left\langle x\left|f\right.\right\rangle =\int\left\langle x\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime} \ \ \ \ \ (8)$

The bra-ket ${\left\langle x\left|f\right.\right\rangle }$ is the projection of the vector ${\left|f\right\rangle }$ onto the ${\left|x\right\rangle }$ basis vector, so it is just ${f\left(x\right)}$. This means

$\displaystyle f\left(x\right)=\int\left\langle x\left|x^{\prime}\right.\right\rangle f\left(x^{\prime}\right)dx^{\prime} \ \ \ \ \ (9)$

which leads to the definition of the Dirac delta function as the normalization of ${\left\langle x\left|x^{\prime}\right.\right\rangle }$:

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =\delta\left(x-x^{\prime}\right) \ \ \ \ \ (10)$

Shankar then describes some properties of the delta function and its derivative, most of which we’ve already covered. For example, we’ve seen these two results for the delta function:

 $\displaystyle \delta\left(ax\right)$ $\displaystyle =$ $\displaystyle \frac{\delta\left(x\right)}{\left|a\right|}\ \ \ \ \ (11)$ $\displaystyle \frac{d\theta\left(x-x^{\prime}\right)}{dx}$ $\displaystyle =$ $\displaystyle \delta\left(x-x^{\prime}\right) \ \ \ \ \ (12)$

where ${\theta}$ is the step function

$\displaystyle \theta\left(x-x^{\prime}\right)\equiv\begin{cases} 0 & x\le x^{\prime}\\ 1 & x>x^{\prime} \end{cases} \ \ \ \ \ (13)$

One other result is that for a function ${f\left(x\right)}$ with zeroes at a number of points ${x_{i}}$, we have

$\displaystyle \delta\left(f\left(x\right)\right)=\sum_{i}\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (14)$

To see this, consider one of the ${x_{i}}$ where ${f\left(x_{i}\right)=0}$. Expanding in a Taylor series about this point, we have

 $\displaystyle f\left(x_{i}+\left(x-x_{i}\right)\right)$ $\displaystyle =$ $\displaystyle f\left(x_{i}\right)+\left(x-x_{i}\right)\frac{df}{dx_{i}}+\ldots\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0+\left(x-x_{i}\right)\frac{df}{dx_{i}} \ \ \ \ \ (16)$

From 11 we have

$\displaystyle \delta\left(\left(x-x_{i}\right)\frac{df}{dx_{i}}\right)=\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (17)$

The behaviour is the same at all points ${x_{i}}$ and since ${\delta\left(x_{i}-x\right)=0}$ at all other ${x_{j}\ne x_{i}}$ where ${f\left(x_{j}\right)=0}$, we can just add the delta functions for each zero of ${f}$.
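Equation 14 can be tested numerically by approximating the delta function with a narrow Gaussian; a sketch assuming SciPy, using the hypothetical example ${f\left(x\right)=x^{2}-1}$ (zeros at ${\pm1}$, with ${\left|df/dx\right|=2}$ there) integrated against ${g\left(x\right)=x+3}$:

```python
import numpy as np
from scipy.integrate import quad

# narrow Gaussian as a delta-function approximation
eps = 1e-2
delta = lambda u: np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

g = lambda x: x + 3.0
# integral of delta(f(x)) g(x); 'points' tells quad where the peaks are
lhs, _ = quad(lambda x: delta(x**2 - 1) * g(x), -5, 5,
              points=[-1.0, 1.0], limit=200)
# equation 14 predicts sum over zeros of g(x_i)/|f'(x_i)|
rhs = (g(1.0) + g(-1.0)) / 2.0
assert abs(lhs - rhs) < 1e-2
```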

Turning now to Zwiebach’s treatment, he begins with the basis states ${\left|x\right\rangle }$ and position operator ${\hat{x}}$ with the eigenvalue equation

$\displaystyle \hat{x}\left|x\right\rangle =x\left|x\right\rangle \ \ \ \ \ (18)$

and simply defines the inner product between two position states to be

$\displaystyle \left\langle x\left|y\right.\right\rangle =\delta\left(x-y\right) \ \ \ \ \ (19)$

With this definition, 9 follows immediately. We can therefore write a quantum state ${\left|\psi\right\rangle }$ as

$\displaystyle \left|\psi\right\rangle =I\left|\psi\right\rangle =\int\left|x\right\rangle \left\langle x\left|\psi\right.\right\rangle dx=\int\left|x\right\rangle \psi\left(x\right)dx \ \ \ \ \ (20)$

That is, the vector ${\left|\psi\right\rangle }$ is the integral of its projections ${\psi\left(x\right)}$ onto the basis vectors ${\left|x\right\rangle }$.

The position operator ${\hat{x}}$ is hermitian as can be seen from

 $\displaystyle \left\langle x_{1}\left|\hat{x}^{\dagger}\right|x_{2}\right\rangle$ $\displaystyle =$ $\displaystyle \left\langle x_{2}\left|\hat{x}\right|x_{1}\right\rangle ^*\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{1}\left\langle x_{2}\left|x_{1}\right.\right\rangle ^*\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{1}\delta\left(x_{2}-x_{1}\right)^*\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{1}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{2}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\langle x_{1}\left|\hat{x}\right|x_{2}\right\rangle \ \ \ \ \ (26)$

The fourth line follows because the delta function is real, and the fifth follows because ${\delta\left(x_{2}-x_{1}\right)}$ is non-zero only when ${x_{1}=x_{2}}$.

Zwiebach then introduces the momentum eigenstates ${\left|p\right\rangle }$ which are analogous to the position states ${\left|x\right\rangle }$, in that

 $\displaystyle \left\langle p^{\prime}\left|p\right.\right\rangle$ $\displaystyle =$ $\displaystyle \delta\left(p^{\prime}-p\right)\ \ \ \ \ (27)$ $\displaystyle I$ $\displaystyle =$ $\displaystyle \int dp\left|p\right\rangle \left\langle p\right|\ \ \ \ \ (28)$ $\displaystyle \hat{p}\left|p\right\rangle$ $\displaystyle =$ $\displaystyle p\left|p\right\rangle \ \ \ \ \ (29)$ $\displaystyle \tilde{\psi}\left(p\right)$ $\displaystyle =$ $\displaystyle \left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (30)$

By the same calculation as for ${\left|x\right\rangle }$, we see that ${\hat{p}}$ is hermitian.

To get a relation between the ${\left|x\right\rangle }$ and ${\left|p\right\rangle }$ bases, we require that ${\left\langle x\left|p\right.\right\rangle }$ is the wave function for a particle with momentum ${p}$ in the ${x}$ basis, which we’ve seen is

$\displaystyle \psi\left(x\right)=\frac{1}{\sqrt{2\pi\hbar}}e^{ipx/\hbar} \ \ \ \ \ (31)$

Zwiebach then shows that this is consistent with the equation

$\displaystyle \left\langle x\left|\hat{p}\right|\psi\right\rangle =\frac{\hbar}{i}\frac{d}{dx}\left\langle x\left|\psi\right.\right\rangle =\frac{\hbar}{i}\frac{d\psi\left(x\right)}{dx} \ \ \ \ \ (32)$

We can get a similar relation by switching ${x}$ and ${p}$:

 $\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle$ $\displaystyle =$ $\displaystyle \int dx\left\langle p\left|x\right.\right\rangle \left\langle x\left|\hat{x}\right|\psi\right\rangle \ \ \ \ \ (33)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (34)$

From 31 we see

 $\displaystyle \left\langle x\left|p\right.\right\rangle ^*$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2\pi\hbar}}e^{-ipx/\hbar}\ \ \ \ \ (35)$ $\displaystyle \left\langle x\left|p\right.\right\rangle ^*x$ $\displaystyle =$ $\displaystyle i\hbar\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\ \ \ \ \ (36)$ $\displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle$ $\displaystyle =$ $\displaystyle i\hbar\int dx\;\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (37)$ $\displaystyle$ $\displaystyle =$ $\displaystyle i\hbar\frac{d}{dp}\int dx\;\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (38)$ $\displaystyle$ $\displaystyle =$ $\displaystyle i\hbar\frac{d}{dp}\int dx\;\left\langle p\left|x\right.\right\rangle \left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (39)$ $\displaystyle$ $\displaystyle =$ $\displaystyle i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (40)$

In the fourth line, we took the ${\frac{d}{dp}}$ outside the integral since ${p}$ occurs in only one term, and in the last line we used 7. Thus we have

$\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle =i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (41)$

# Exponentials of operators – Baker-Campbell-Hausdorff formula

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.9.

Although the result in this post isn’t covered in Shankar’s book, it’s a result that is frequently used in quantum theory, so it’s worth including at this point.

We’ve seen how to define a function of an operator if that function can be expanded in a power series. A common operator function is the exponential:

$\displaystyle f\left(\Omega\right)=e^{i\Omega} \ \ \ \ \ (1)$

If ${\Omega}$ is hermitian, the exponential ${e^{i\Omega}}$ is unitary. If we try to calculate the exponential of two operators such as ${e^{A+B}}$, the result isn’t as simple as we might hope if ${A}$ and ${B}$ don’t commute. To see the problem, we can write this out as a power series

 $\displaystyle e^{A+B}$ $\displaystyle =$ $\displaystyle \sum_{n=0}^{\infty}\frac{\left(A+B\right)^{n}}{n!}\ \ \ \ \ (2)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I+A+B+\frac{1}{2}\left(A+B\right)\left(A+B\right)+\ldots\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I+A+B+\frac{1}{2}\left(A^{2}+AB+BA+B^{2}\right)+\ldots \ \ \ \ \ (4)$

The problem appears first in the fourth term in the series, since we can’t condense the ${AB+BA}$ sum into ${2AB}$ if ${\left[A,B\right]\ne0}$. In fact, the exponent in the expansion of ${e^{A}e^{B}}$ can be written entirely in terms of ${A}$, ${B}$ and their commutators with each other, nested to increasing depth. This result is known as the Baker-Campbell-Hausdorff formula. Up to fourth order in the operators, the BCH formula gives

$\displaystyle e^{A}e^{B}=\exp\left[A+B+\frac{1}{2}\left[A,B\right]+\frac{1}{12}\left(\left[A,\left[A,B\right]\right]+\left[B,\left[B,A\right]\right]\right)-\frac{1}{24}\left[B,\left[A,\left[A,B\right]\right]\right]+\ldots\right] \ \ \ \ \ (5)$

There is no known closed form expression for this result. However, an important special case that occurs frequently in quantum theory is the case where ${\left[A,B\right]=cI}$, where ${c}$ is a complex scalar and ${I}$ is the usual identity matrix. Since ${cI}$ commutes with all operators, all terms from the third order upwards are zero, and we have

$\displaystyle e^{A}e^{B}=e^{A+B+\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (6)$

We can prove this result as follows. Start with the operator function

$\displaystyle G\left(t\right)\equiv e^{t\left(A+B\right)}e^{-tA} \ \ \ \ \ (7)$

where ${t}$ is a scalar parameter (not necessarily time!).

From its definition,

$\displaystyle G\left(0\right)=I \ \ \ \ \ (8)$

The inverse is

$\displaystyle G^{-1}\left(t\right)=e^{tA}e^{-t\left(A+B\right)} \ \ \ \ \ (9)$

and the derivative is

 $\displaystyle \frac{dG\left(t\right)}{dt}$ $\displaystyle =$ $\displaystyle \left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A \ \ \ \ \ (10)$

Note that we have to keep the ${\left(A+B\right)}$ factor to the left of the ${A}$ factor because ${\left[A,B\right]\ne0}$. Now we multiply:

 $\displaystyle G^{-1}\frac{dG}{dt}$ $\displaystyle =$ $\displaystyle e^{tA}e^{-t\left(A+B\right)}\left[\left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A\right]\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{tA}\left(A+B\right)e^{-tA}-A\ \ \ \ \ (12)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{tA}Ae^{-tA}+e^{tA}Be^{-tA}-A\ \ \ \ \ (13)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{tA}Be^{-tA}\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle B+t\left[A,B\right]\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle B+ctI \ \ \ \ \ (16)$

We used Hadamard’s lemma in the penultimate line, which in this case reduces to

$\displaystyle e^{tA}Be^{-tA}=B+t\left[A,B\right] \ \ \ \ \ (17)$

because ${\left[A,B\right]=cI}$ so all higher order commutators are zero.

We end up with an expression in which ${A}$ has disappeared. This gives the differential equation for ${G}$:

$\displaystyle G^{-1}\frac{dG}{dt}=B+ctI \ \ \ \ \ (18)$

We try a solution of the form (this apparently appears from divine inspiration):

$\displaystyle G\left(t\right)=e^{\alpha tB}e^{\beta ct^{2}} \ \ \ \ \ (19)$

From which we get

 $\displaystyle G^{-1}$ $\displaystyle =$ $\displaystyle e^{-\alpha tB}e^{-\beta ct^{2}}\ \ \ \ \ (20)$ $\displaystyle \frac{dG}{dt}$ $\displaystyle =$ $\displaystyle \left(\alpha B+2\beta ct\right)e^{\alpha tB}e^{\beta ct^{2}}\ \ \ \ \ (21)$ $\displaystyle G^{-1}\frac{dG}{dt}$ $\displaystyle =$ $\displaystyle \alpha B+2\beta ct \ \ \ \ \ (22)$

Comparing this to 18, we have

 $\displaystyle \alpha$ $\displaystyle =$ $\displaystyle 1\ \ \ \ \ (23)$ $\displaystyle \beta$ $\displaystyle =$ $\displaystyle \frac{1}{2}\ \ \ \ \ (24)$ $\displaystyle G\left(t\right)$ $\displaystyle =$ $\displaystyle e^{tB}e^{\frac{1}{2}ct^{2}} \ \ \ \ \ (25)$

Setting this equal to the original definition of ${G}$ in 7 and then taking ${t=1}$ we have

 $\displaystyle e^{A+B}e^{-A}$ $\displaystyle =$ $\displaystyle e^{B}e^{c/2}\ \ \ \ \ (26)$ $\displaystyle e^{A+B}$ $\displaystyle =$ $\displaystyle e^{B}e^{A}e^{\frac{1}{2}c}\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{B}e^{A}e^{\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (28)$

If we swap ${A}$ with ${B}$ and use the fact that ${A+B=B+A}$, and also ${\left[A,B\right]=-\left[B,A\right]}$, we have

$\displaystyle e^{A+B}=e^{A}e^{B}e^{-\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (29)$

This is the restricted form of the BCH formula for the case where ${\left[A,B\right]}$ is a scalar.
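As a side note, ${\left[A,B\right]=cI}$ with ${c\ne0}$ is impossible for finite-dimensional matrices, since the trace of any commutator is zero while ${\mbox{Tr}\left(cI\right)=cn}$. However, the derivation above really only used the fact that ${\left[A,B\right]}$ commutes with both ${A}$ and ${B}$, and 29 holds under that weaker condition too, so we can check it numerically with nilpotent matrices. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import expm

# Matrix units: A = E12, B = E23. Then [A, B] = E13, which commutes
# with both A and B (though it isn't a multiple of I).
A = np.zeros((3, 3)); A[0, 1] = 1.0
B = np.zeros((3, 3)); B[1, 2] = 1.0
comm = A @ B - B @ A  # equals E13

# Check 29: e^{A+B} = e^A e^B e^{-[A,B]/2}
lhs = expm(A + B)
rhs = expm(A) @ expm(B) @ expm(-0.5 * comm)
print(np.allclose(lhs, rhs))  # True
```

Because all three matrices here are nilpotent, the exponentials are finite polynomials and the identity can also be verified by hand in a few lines.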

# Lorentz transformations as 2×2 matrices

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, recall that a Lorentz transformation can be represented by a ${4\times4}$ matrix ${\Lambda}$ which preserves the Minkowski length ${x_{\mu}x^{\mu}}$ of all four-vectors ${x}$. This leads to the condition

$\displaystyle \Lambda^{T}g\Lambda=g \ \ \ \ \ (1)$

where ${g}$ is the flat-space Minkowski metric

$\displaystyle g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)$

It turns out that we can map any 4-vector ${x}$ to a ${2\times2}$ Hermitian matrix ${\widehat{x}}$ defined as

$\displaystyle \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (3)$

[Recall that a Hermitian matrix ${H}$ is equal to the complex conjugate of its transpose:

$\displaystyle H=\left(H^{T}\right)^*\equiv H^{\dagger} \ \ \ \ \ (4)$

Also note that Jaffe uses an unconventional notation for the Hermitian conjugate, as he uses a superscript * rather than a superscript ${\dagger}$. This can be confusing since usually a superscript * indicates just the complex conjugate, without the transpose. I’ll use the more usual superscript ${\dagger}$ for the Hermitian conjugate here.]

Although we’re used to the scalar product of two vectors, it is also useful to define the scalar product of two matrices as

$\displaystyle \left\langle A,B\right\rangle \equiv\frac{1}{2}\mbox{Tr}\left(A^{\dagger}B\right) \ \ \ \ \ (5)$

where ‘Tr’ means the trace of a matrix, which is the sum of its diagonal elements. Note that the scalar product of ${\widehat{x}}$ with itself is

 $\displaystyle \left\langle \widehat{x},\widehat{x}\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left[\left(x_{0}+x_{3}\right)^{2}+2\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)+\left(x_{0}-x_{3}\right)^{2}\right]\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2} \ \ \ \ \ (8)$

The determinant of ${\widehat{x}}$ is

 $\displaystyle \det\widehat{x}$ $\displaystyle =$ $\displaystyle \left(x_{0}+x_{3}\right)\left(x_{0}-x_{3}\right)-\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)\ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2}\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{\mu}x^{\mu} \ \ \ \ \ (11)$

Thus ${\det\widehat{x}}$ is the Minkowski length squared.
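Both of these properties are easy to verify numerically for a random 4-vector (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x0, x1, x2, x3 = rng.normal(size=4)

# The 2x2 Hermitian matrix of 3
xhat = np.array([[x0 + x3, x1 - 1j * x2],
                 [x1 + 1j * x2, x0 - x3]])

# det(xhat) gives the Minkowski length squared (11)
minkowski_ok = np.isclose(np.linalg.det(xhat).real,
                          x0**2 - x1**2 - x2**2 - x3**2)
# <xhat, xhat> = (1/2) Tr(xhat^dagger xhat) gives the Euclidean form (8)
inner = 0.5 * np.trace(xhat.conj().T @ xhat).real
euclid_ok = np.isclose(inner, x0**2 + x1**2 + x2**2 + x3**2)
print(minkowski_ok, euclid_ok)  # True True
```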

From 3, we observe that we can write ${\widehat{x}}$ as a sum:

$\displaystyle \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (12)$

where the ${\sigma_{\mu}}$ are four Hermitian matrices:

 $\displaystyle \sigma_{0}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (13)$ $\displaystyle \sigma_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (14)$ $\displaystyle \sigma_{2}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (15)$ $\displaystyle \sigma_{3}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (16)$

The last three are the Pauli spin matrices that we met when looking at spin-${\frac{1}{2}}$ in quantum mechanics.

The ${\sigma_{\mu}}$ are orthonormal under the scalar product operation, as we can verify by direct calculation. For example

 $\displaystyle \left\langle \sigma_{2},\sigma_{3}\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right]\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left(0+0\right)\ \ \ \ \ (18)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (19)$

And:

 $\displaystyle \left\langle \sigma_{2},\sigma_{2}\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left(1+1\right)\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (22)$

The other products work out similarly, so we have

$\displaystyle \left\langle \sigma_{\mu},\sigma_{\nu}\right\rangle =\delta_{\mu\nu} \ \ \ \ \ (23)$
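All 16 products in 23 can be checked at once numerically (a sketch assuming NumPy; `inner` implements the scalar product 5):

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),                    # sigma_0 = I
         np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_1
         np.array([[0, -1j], [1j, 0]]),               # sigma_2
         np.array([[1, 0], [0, -1]], dtype=complex)]  # sigma_3

def inner(a, b):
    """The matrix scalar product <A, B> = (1/2) Tr(A^dagger B) of 5."""
    return 0.5 * np.trace(a.conj().T @ b)

# Gram matrix of all pairwise scalar products; should be the identity
gram = np.array([[inner(s, t) for t in sigma] for s in sigma])
print(np.allclose(gram, np.eye(4)))  # True
```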

We can work out the inverse transformation to 3 by taking the scalar product of 12 with ${\sigma_{\nu}}$:

 $\displaystyle \left\langle \sigma_{\nu},\widehat{x}\right\rangle$ $\displaystyle =$ $\displaystyle \sum_{\mu=0}^{3}x_{\mu}\left\langle \sigma_{\nu},\sigma_{\mu}\right\rangle \ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{\mu=0}^{3}x_{\mu}\delta_{\nu\mu}\ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x_{\nu} \ \ \ \ \ (26)$

Now a few more theorems that will be useful later.

Irreducible Sets of Matrices

A set of matrices ${\mathfrak{U}}$ is called irreducible if the only matrix ${C}$ that commutes with every matrix in ${\mathfrak{U}}$ is the identity matrix ${I}$ (or a multiple of ${I}$). Any two of the three Pauli matrices ${\sigma_{i}}$, ${i=1,2,3}$ above form an irreducible set of ${2\times2}$ Hermitian matrices. This can be shown by direct calculation, which Jaffe does in detail in his article. For example, if we define ${C}$ to be some arbitrary matrix

$\displaystyle C=\left[\begin{array}{cc} a & b\\ c & d \end{array}\right] \ \ \ \ \ (27)$

where ${a,b,c,d}$ are complex numbers, then

 $\displaystyle C\sigma_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} b & a\\ d & c \end{array}\right]\ \ \ \ \ (28)$ $\displaystyle \sigma_{1}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} c & d\\ a & b \end{array}\right] \ \ \ \ \ (29)$

If ${C}$ is to commute with ${\sigma_{1}}$, we must therefore require ${b=c}$ and ${a=d}$.

Similarly, for ${\sigma_{2}}$ we have

 $\displaystyle C\sigma_{2}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} ib & -ia\\ id & -ic \end{array}\right]\ \ \ \ \ (30)$ $\displaystyle \sigma_{2}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} -ic & -id\\ ia & ib \end{array}\right] \ \ \ \ \ (31)$

so that ${C\sigma_{2}=\sigma_{2}C}$ requires ${b=-c}$ and ${a=d}$.

And for ${\sigma_{3}}$:

 $\displaystyle C\sigma_{3}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} a & -b\\ c & -d \end{array}\right]\ \ \ \ \ (32)$ $\displaystyle \sigma_{3}C$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cc} a & b\\ -c & -d \end{array}\right] \ \ \ \ \ (33)$

so that ${C\sigma_{3}=\sigma_{3}C}$ requires ${b=-b}$ and ${c=-c}$, so ${b=c=0}$ (no conditions can be inferred for ${a}$ or ${d}$).

If we form a set ${\mathfrak{U}}$ containing ${\sigma_{3}}$ and one of ${\sigma_{1}}$ or ${\sigma_{2}}$, we see that ${b=c=0}$ and ${a=d}$, so ${C}$ is a multiple of ${I}$. If we form ${\mathfrak{U}}$ from ${\sigma_{1}}$ and ${\sigma_{2}}$ we again have ${a=d}$, but we must have simultaneously ${b=c}$ and ${b=-c}$ which can be true only if ${b=c=0}$, so again ${C}$ is a multiple of ${I}$.
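This can also be checked numerically: writing the condition ${\left[C,M\right]=0}$ as a linear system acting on the four entries of ${C}$, the solution space for the pair ${\sigma_{1},\sigma_{2}}$ should be one-dimensional (multiples of ${I}$). A sketch assuming NumPy:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])

def comm_op(M):
    """4x4 matrix acting on vec(C) (column-major) that gives vec(CM - MC).
    The null-space dimension is the same under either vec convention."""
    I = np.eye(2)
    return np.kron(M.T, I) - np.kron(I, M)

# Stack the constraints [C, s1] = 0 and [C, s2] = 0 into one system
L = np.vstack([comm_op(s1), comm_op(s2)])
null_dim = 4 - np.linalg.matrix_rank(L)
print(null_dim)  # 1: only multiples of I commute with both
```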

Unitary Matrices

A unitary matrix is one whose Hermitian conjugate is its inverse, so that ${U^{\dagger}=U^{-1}}$. Some properties of unitary matrices are given on the Wikipedia page, so we’ll just use those without going through the proofs. First, a unitary matrix is normal, which means that ${U^{\dagger}U=UU^{\dagger}}$ (this actually follows from the condition ${U^{\dagger}=U^{-1}}$). Second, there is a unitary matrix ${V}$ which diagonalizes ${U}$, that is

$\displaystyle V^{\dagger}UV=D \ \ \ \ \ (34)$

where ${D}$ is a diagonal, unitary matrix.

Third,

$\displaystyle \left|\det U\right|=1 \ \ \ \ \ (35)$

(The determinant can be complex, but has magnitude 1.)

From this it follows that ${\left|\det D\right|=1}$ and since ${D}$ is unitary and diagonal, each diagonal element ${d_{j}}$ of ${D}$ must satisfy ${\left|d_{j}\right|=1}$. (Remember that ${d_{j}}$ could be a complex number.) That means that ${d_{j}=e^{i\lambda_{j}}}$ for some real number ${\lambda_{j}}$, so we can write

$\displaystyle D=e^{i\Lambda} \ \ \ \ \ (36)$

where ${\Lambda}$ is a diagonal Hermitian matrix whose only non-zero elements are the real numbers along its diagonal: ${\Lambda_{ij}=\lambda_{j}\delta_{ij}}$. As usual, the exponential of a matrix is interpreted in terms of its power series, so that

$\displaystyle e^{i\Lambda}=1+i\Lambda+\frac{\left(i\Lambda\right)^{2}}{2!}+\frac{\left(i\Lambda\right)^{3}}{3!}+\ldots \ \ \ \ \ (37)$

For a diagonal matrix ${\Lambda}$ with diagonal elements ${\Lambda_{jj}=\lambda_{j}}$, the diagonal elements of ${\Lambda^{n}}$ are just ${\Lambda_{jj}^{n}=\lambda_{j}^{n}}$.

From 34, we have

 $\displaystyle U$ $\displaystyle =$ $\displaystyle VDV^{\dagger}\ \ \ \ \ (38)$ $\displaystyle$ $\displaystyle =$ $\displaystyle Ve^{i\Lambda}V^{\dagger} \ \ \ \ \ (39)$

Now we also have, since ${VV^{\dagger}=I}$

 $\displaystyle V\Lambda^{n}V^{\dagger}$ $\displaystyle =$ $\displaystyle V\Lambda\left(VV^{\dagger}\right)\Lambda\left(VV^{\dagger}\right)\ldots\Lambda V^{\dagger}\ \ \ \ \ (40)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(V\Lambda V^{\dagger}\right)^{n} \ \ \ \ \ (41)$

Therefore, from 37

 $\displaystyle U$ $\displaystyle =$ $\displaystyle Ve^{i\Lambda}V^{\dagger}\ \ \ \ \ (42)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{iV\Lambda V^{\dagger}}\ \ \ \ \ (43)$ $\displaystyle$ $\displaystyle \equiv$ $\displaystyle e^{iH} \ \ \ \ \ (44)$

where ${H=V\Lambda V^{\dagger}}$ is another Hermitian matrix. In other words, we can always write a unitary matrix as the exponential of a Hermitian matrix.
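We can verify this decomposition numerically: generate a random unitary by QR-factorizing a random complex matrix, take ${H=-i\log U}$, and confirm that ${H}$ is Hermitian with ${e^{iH}=U}$. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
# A random 2x2 unitary: QR-factorize a random complex matrix and keep Q
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

H = -1j * logm(Q)  # candidate Hermitian generator
hermitian_ok = np.allclose(H, H.conj().T)
exp_ok = np.allclose(expm(1j * H), Q)
print(hermitian_ok, exp_ok)  # True True
```

The principal matrix logarithm used here picks the branch with eigenvalue phases in ${\left(-\pi,\pi\right]}$; any other branch would give a different but equally valid Hermitian ${H}$.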

In the case where ${H}$ is a ${2\times2}$ matrix, we can write it in terms of the ${\sigma_{\mu}}$ matrices above as

$\displaystyle H=\sum_{\mu=0}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (45)$

where the ${a_{\mu}}$ are real, since the diagonal elements of a Hermitian matrix must be real. This follows because the ${\sigma_{\mu}}$ form an orthonormal basis for the ${2\times2}$ Hermitian matrices. [For some reason, Jaffe refers to the ${a_{\mu}}$ as ${\lambda_{\mu}}$, which is confusing since he has used ${\lambda_{\mu}}$ as the diagonal elements of ${\Lambda}$ above, and they’re not the same thing.]

If ${\det U=+1}$, then

 $\displaystyle \det U$ $\displaystyle =$ $\displaystyle \det\left(VDV^{\dagger}\right)\ \ \ \ \ (46)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det\left(VV^{\dagger}D\right)\ \ \ \ \ (47)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det D\ \ \ \ \ (48)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \det e^{i\Lambda} \ \ \ \ \ (49)$

The second line follows because the determinant of a product of matrices is the product of the determinants, so we can rearrange the multiplication order. To evaluate the last line, we observe that for a diagonal matrix ${\Lambda}$, using 37 and applying the result to each diagonal element

$\displaystyle e^{i\Lambda}=\left[\begin{array}{cc} e^{i\Lambda_{11}} & 0\\ 0 & e^{i\Lambda_{22}} \end{array}\right] \ \ \ \ \ (50)$

Therefore

$\displaystyle \det e^{i\Lambda}=e^{i\left(\Lambda_{11}+\Lambda_{22}\right)}=e^{i\mbox{Tr}\Lambda} \ \ \ \ \ (51)$

[By the way, the relation ${\det e^{A}=e^{\mbox{Tr}A}}$ is actually true for any square matrix ${A}$, and is a corollary of Jacobi’s formula.]
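This corollary is also easy to test numerically for a random complex matrix (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# Jacobi's formula corollary: det(e^A) = e^{Tr A} for any square A
ok = np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))
print(ok)  # True
```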

We can now use the cyclic property of the trace (another matrix algebra theorem), which says that for three matrices ${A,B,C}$,

$\displaystyle \mbox{Tr}\left(ABC\right)=\mbox{Tr}\left(CAB\right)=\mbox{Tr}\left(BCA\right) \ \ \ \ \ (52)$

This gives us

$\displaystyle \mbox{Tr}H=\mbox{Tr}\left(V\Lambda V^{\dagger}\right)=\mbox{Tr}\left(V^{\dagger}V\Lambda\right)=\mbox{Tr}\Lambda \ \ \ \ \ (53)$

Finally, from 45 and the fact that the traces of the ${\sigma_{i}}$ are all zero for ${i=1,2,3}$, and ${\mbox{Tr}\sigma_{0}=2}$, we have

$\displaystyle \det U=\det e^{i\Lambda}=e^{i\mbox{Tr}H}=e^{2ia_{0}}=1 \ \ \ \ \ (54)$

Thus ${a_{0}=n\pi}$ for some integer ${n}$, but as all values of ${n}$ give the same original unitary matrix ${U}$, we can choose ${n=0}$ so that ${a_{0}=0}$ and

$\displaystyle H=\sum_{\mu=1}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (55)$