
Direct product of vector spaces: 2-dim examples

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 10, Exercise 10.1.2.

To help with understanding the direct product of two vector spaces, some examples with a couple of 2-d vector spaces are useful. Suppose the one-particle Hilbert space is two-dimensional, with basis vectors {\left|+\right\rangle } and {\left|-\right\rangle }. Now suppose we have two such particles, each in its own 2-d space, {\mathbb{V}_{1}} for particle 1 and {\mathbb{V}_{2}} for particle 2. We can define a couple of operators by their matrix elements in these two spaces. We define

\displaystyle   \sigma_{1}^{\left(1\right)} \displaystyle  \equiv \displaystyle  \left[\begin{array}{cc} a & b\\ c & d \end{array}\right]\ \ \ \ \ (1)
\displaystyle  \sigma_{2}^{\left(2\right)} \displaystyle  \equiv \displaystyle  \left[\begin{array}{cc} e & f\\ g & h \end{array}\right] \ \ \ \ \ (2)

where the first column and row refer to basis vector {\left|+\right\rangle } and the second column and row to {\left|-\right\rangle }. Recall that the subscript on each {\sigma} refers to the particle and the superscript refers to the vector space. Thus {\sigma_{1}^{\left(1\right)}} is an operator in space {\mathbb{V}_{1}} for particle 1.

Now consider the direct product space {\mathbb{V}_{1}\otimes\mathbb{V}_{2}}, which is spanned by the four basis vectors formed by direct products of the two basis vectors in each of the one-particle spaces, that is by {\left|+\right\rangle \otimes\left|+\right\rangle }, {\left|+\right\rangle \otimes\left|-\right\rangle }, {\left|-\right\rangle \otimes\left|+\right\rangle } and {\left|-\right\rangle \otimes\left|-\right\rangle }. Each of the {\sigma} operators has a corresponding version in the product space, which is formed by taking the direct product of the one-particle version for one of the particles with the identity operator for the other particle. That is

\displaystyle   \sigma_{1}^{\left(1\right)\otimes\left(2\right)} \displaystyle  = \displaystyle  \sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\ \ \ \ \ (3)
\displaystyle  \sigma_{2}^{\left(1\right)\otimes\left(2\right)} \displaystyle  = \displaystyle  I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)} \ \ \ \ \ (4)

To get the matrix elements in the product space, we need the form of the identity operators in the one-particle spaces. They are, as usual

\displaystyle   I^{\left(1\right)} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]\ \ \ \ \ (5)
\displaystyle  I^{\left(2\right)} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right] \ \ \ \ \ (6)

I’ve written the two identity operators as separate equations since although they have the same numerical form as a matrix, the two operators operate on different spaces, so they are technically different operators. To get the matrix elements of {\sigma_{1}^{\left(1\right)\otimes\left(2\right)}} we can expand the direct product. (Shankar suggests using the ‘method of images’, although I have no idea what this is; I doubt that it’s the same method of images used in electrostatics, and Google draws a blank for any other kind of method of images.) In any case, we can form the product by taking the corresponding matrix elements. For example

\displaystyle   \left\langle ++\left|\sigma_{1}^{\left(1\right)\otimes\left(2\right)}\right|++\right\rangle \displaystyle  = \displaystyle  \left(\left\langle +\right|\otimes\left\langle +\right|\right)\sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left(\left|+\right\rangle \otimes\left|+\right\rangle \right)\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \left\langle +\left|\sigma_{1}^{\left(1\right)}\right|+\right\rangle \left\langle +\left|I^{\left(2\right)}\right|+\right\rangle \ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  a\times1=a \ \ \ \ \ (9)

When working out the RHS of the first line, remember that operators with a superscript (1) operate only on bras and kets from the space {\mathbb{V}_{1}} and operators with a superscript (2) operate only on bras and kets from the space {\mathbb{V}_{2}}. Applying the same technique for the remaining elements gives

\displaystyle  \sigma_{1}^{\left(1\right)\otimes\left(2\right)}=\sigma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}=\left[\begin{array}{cccc} a & 0 & b & 0\\ 0 & a & 0 & b\\ c & 0 & d & 0\\ 0 & c & 0 & d \end{array}\right] \ \ \ \ \ (10)

Another less tedious way of getting this result is to note that we can form the direct product by taking each element of the first matrix {\sigma_{1}^{\left(1\right)}} from 1 and multiplying it into the second matrix {I^{\left(2\right)}} from 6. Thus the top {2\times2} elements in {\sigma_{1}^{\left(1\right)\otimes\left(2\right)}} are obtained by taking the element {\left\langle +\left|\sigma_{1}^{\left(1\right)}\right|+\right\rangle =a} from 1 and multiplying it into the matrix {I^{\left(2\right)}} from 6. That is, the upper left {2\times2} block is formed from

\displaystyle   aI_{2\times2}^{\left(2\right)} \displaystyle  = \displaystyle  \left[\begin{array}{cc} a & 0\\ 0 & a \end{array}\right] \ \ \ \ \ (11)

and so on for the other three {2\times2} blocks in the complete matrix. Note that it’s important to get things in the right order, as the direct product is not commutative.

To get the other direct product, we can apply the same technique:

\displaystyle  \sigma_{2}^{\left(1\right)\otimes\left(2\right)}=I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)}=\left[\begin{array}{cccc} e & f & 0 & 0\\ g & h & 0 & 0\\ 0 & 0 & e & f\\ 0 & 0 & g & h \end{array}\right] \ \ \ \ \ (12)

Again, note that

\displaystyle  I^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)}\ne\sigma_{2}^{\left(2\right)}\otimes I^{\left(1\right)}=\left[\begin{array}{cccc} e & 0 & f & 0\\ 0 & e & 0 & f\\ g & 0 & h & 0\\ 0 & g & 0 & h \end{array}\right] \ \ \ \ \ (13)

Finally, we can work out the direct product version of the product of two one-particle operators. That is, we want

\displaystyle  \left(\sigma_{1}\sigma_{2}\right)^{\left(1\right)\otimes\left(2\right)}=\sigma_{1}^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)} \ \ \ \ \ (14)

We can do this in two ways. First, we can apply the same recipe as in the previous example. We take each element of {\sigma_{1}^{\left(1\right)}} and multiply it into the full matrix {\sigma_{2}^{\left(2\right)}}:

\displaystyle   \sigma_{1}^{\left(1\right)}\otimes\sigma_{2}^{\left(2\right)} \displaystyle  = \displaystyle  \left[\begin{array}{cccc} ae & af & be & bf\\ ag & ah & bg & bh\\ ce & cf & de & df\\ cg & ch & dg & dh \end{array}\right] \ \ \ \ \ (15)

Second, we can take the matrix product of {\sigma_{1}^{\left(1\right)\otimes\left(2\right)}} from 10 with {\sigma_{2}^{\left(1\right)\otimes\left(2\right)}} from 12:

\displaystyle   \left(\sigma_{1}\sigma_{2}\right)^{\left(1\right)\otimes\left(2\right)} \displaystyle  = \displaystyle  \left[\begin{array}{cccc} a & 0 & b & 0\\ 0 & a & 0 & b\\ c & 0 & d & 0\\ 0 & c & 0 & d \end{array}\right]\left[\begin{array}{cccc} e & f & 0 & 0\\ g & h & 0 & 0\\ 0 & 0 & e & f\\ 0 & 0 & g & h \end{array}\right]\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} ae & af & be & bf\\ ag & ah & bg & bh\\ ce & cf & de & df\\ cg & ch & dg & dh \end{array}\right] \ \ \ \ \ (17)
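
As a quick check of these matrix forms, note that the direct (Kronecker) product is built into most matrix libraries. Here is a minimal sketch in Python with NumPy (my own check, not part of Shankar's exercise); the numerical entries standing in for {a,b,\ldots,h} are arbitrary.

import numpy as np

# One-particle operators with arbitrary entries standing in for a..d and e..h
sigma1 = np.array([[1.0, 2.0],
                   [3.0, 4.0]])          # plays the role of sigma_1^(1)
sigma2 = np.array([[5.0, 6.0],
                   [7.0, 8.0]])          # plays the role of sigma_2^(2)
I2 = np.eye(2)

# Product-space versions, equations 3 and 4
S1 = np.kron(sigma1, I2)                 # sigma_1^(1) x I^(2), equation 10
S2 = np.kron(I2, sigma2)                 # I^(1) x sigma_2^(2), equation 12

# The direct product is not commutative (equation 13)
print(np.allclose(np.kron(I2, sigma2), np.kron(sigma2, I2)))   # False

# Equation 15 agrees with the matrix product in equations 16-17
print(np.allclose(np.kron(sigma1, sigma2), S1 @ S2))           # True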

WordPress help requested

As regular visitors will know, this blog occasionally goes offline due to problems connecting to the WordPress database that stores the posts. I have contacted my hosting provider, who say that this is due to an excessive number of database connections being opened but then not closed again, but they are unable to provide any help beyond that.

As I do not want to mess with any of the WordPress code (for two reasons: 1 – changing the code could introduce security problems and 2 – I don’t know enough about either WordPress or PHP to mess with their code) I was wondering if any readers have experience with running WordPress blogs and know of any ways to prevent excessive database access. I have just installed the “W3 Total Cache” plugin which may help, but if anyone has any other suggestions I’d be very grateful.

Thermodynamics of harmonic oscillators – classical and quantum

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 7.5, Exercise 7.5.4.

One application of harmonic oscillator theory is in the behaviour of crystals as a function of temperature. A reasonable model of a crystal is of a number of atoms that vibrate as harmonic oscillators. From statistical mechanics, the probability {P\left(i\right)} of finding a system in a state {i} is given by the Boltzmann formula

\displaystyle  P\left(i\right)=\frac{e^{-\beta E\left(i\right)}}{Z} \ \ \ \ \ (1)

where {\beta=1/kT}, with {k} being Boltzmann’s constant and {T} the absolute temperature, and {Z} is the partition function

\displaystyle  Z=\sum_{i}e^{-\beta E\left(i\right)} \ \ \ \ \ (2)

The thermal average energy of the system is then

\displaystyle   \bar{E} \displaystyle  = \displaystyle  \sum_{i}E\left(i\right)P\left(i\right)\ \ \ \ \ (3)
\displaystyle  \displaystyle  = \displaystyle  \frac{\sum_{i}E\left(i\right)e^{-\beta E\left(i\right)}}{Z}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  -\frac{\partial\left(\ln Z\right)}{\partial\beta} \ \ \ \ \ (5)

For a classical harmonic oscillator, the energy is a continuous function of the position {x} and momentum {p}:

\displaystyle  E_{cl}=\frac{p^{2}}{2m}+\frac{1}{2}m\omega^{2}x^{2} \ \ \ \ \ (6)

The classical partition function is then

\displaystyle   Z_{cl} \displaystyle  = \displaystyle  \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\beta p^{2}/2m}e^{-\beta m\omega^{2}x^{2}/2}dp\;dx\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \int_{-\infty}^{\infty}e^{-\beta p^{2}/2m}dp\int_{-\infty}^{\infty}e^{-\beta m\omega^{2}x^{2}/2}dx\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \sqrt{\frac{2\pi m}{\beta}}\sqrt{\frac{2\pi}{\beta m\omega^{2}}}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{2\pi}{\omega\beta} \ \ \ \ \ (10)

where we used the standard formula for Gaussian integrals to get the third line. The average classical energy is, from 5

\displaystyle  \bar{E}_{cl}=-\frac{\partial\left(\ln Z_{cl}\right)}{\partial\beta}=\frac{1}{\beta}=kT \ \ \ \ \ (11)

The average energy of a classical oscillator thus depends only on the temperature, and not on the frequency {\omega}.

For a quantum oscillator, the energies are quantized with values of

\displaystyle  E\left(n\right)=\hbar\omega\left(n+\frac{1}{2}\right) \ \ \ \ \ (12)

The quantum partition function is therefore

\displaystyle  Z_{qu}=e^{-\beta\hbar\omega/2}\sum_{n=0}^{\infty}e^{-\beta\hbar\omega n} \ \ \ \ \ (13)

The sum is a geometric series, so we can use the standard result for {\left|x\right|<1}:

\displaystyle  \sum_{n=0}^{\infty}x^{n}=\frac{1}{1-x} \ \ \ \ \ (14)

This gives

\displaystyle  Z_{qu}=\frac{e^{-\beta\hbar\omega/2}}{1-e^{-\beta\hbar\omega}} \ \ \ \ \ (15)

The mean quantum energy is again found from 5, although this time the derivative is a bit messier, so it is most easily done using Maple. However, by hand, you’d get

\displaystyle   \bar{E}_{qu} \displaystyle  = \displaystyle  -\frac{\partial\left(\ln Z_{qu}\right)}{\partial\beta}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1-e^{-\beta\hbar\omega}}{e^{-\beta\hbar\omega/2}}\left[-\frac{1}{2}\frac{\hbar\omega e^{-\beta\hbar\omega/2}}{1-e^{-\beta\hbar\omega}}-\frac{\hbar\omega e^{-\beta\hbar\omega/2}e^{-\beta\hbar\omega}}{\left(1-e^{-\beta\hbar\omega}\right)^{2}}\right]\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \frac{\hbar\omega}{2}\left(\frac{1+e^{-\beta\hbar\omega}}{1-e^{-\beta\hbar\omega}}\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{\hbar\omega}{2}\left(\frac{1-e^{-\beta\hbar\omega}+2e^{-\beta\hbar\omega}}{1-e^{-\beta\hbar\omega}}\right)\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \hbar\omega\left(\frac{1}{2}+\frac{1}{e^{\beta\hbar\omega}-1}\right) \ \ \ \ \ (20)

The average energy is the ground state energy {\hbar\omega/2} plus a quantity that increases with increasing temperature (decreasing {\beta}). For small {\beta} we have

\displaystyle   \bar{E}_{qu} \displaystyle  \rightarrow \displaystyle  \hbar\omega\left(\frac{1}{2}+\frac{1}{1+\beta\hbar\omega-1}\right)\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \frac{\hbar\omega}{2}+\frac{1}{\beta}\ \ \ \ \ (22)
\displaystyle  \displaystyle  \rightarrow \displaystyle  kT \ \ \ \ \ (23)

since as {\beta\rightarrow0}, {\frac{1}{\beta}\gg\frac{\hbar\omega}{2}}. Thus the quantum energy reduces to the classical energy 11 for high temperatures. The ‘high temperature’ condition is that

\displaystyle   \frac{1}{\beta} \displaystyle  \gg \displaystyle  \frac{\hbar\omega}{2}\ \ \ \ \ (24)
\displaystyle  T \displaystyle  \gg \displaystyle  \frac{\hbar\omega}{2k} \ \ \ \ \ (25)
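
To see these limits numerically, here is a short Python sketch (my own check, not from Shankar) that evaluates the quantum average energy 20 and compares it with the classical result {kT} from 11, in units where {\hbar\omega=1} and {k=1}.

import numpy as np

hbar_omega = 1.0   # work in units where hbar*omega = 1
k = 1.0            # and Boltzmann's constant k = 1

def E_quantum(T):
    """Mean energy of a quantum oscillator, equation 20."""
    beta = 1.0 / (k * T)
    return hbar_omega * (0.5 + 1.0 / (np.exp(beta * hbar_omega) - 1.0))

for T in [0.1, 1.0, 10.0, 100.0]:
    print(f"T = {T:6.1f}   E_qu = {E_quantum(T):9.4f}   E_cl = kT = {k*T:9.4f}")

# At low T the quantum energy approaches the ground state value hbar*omega/2 = 0.5,
# while at high T it approaches the classical value kT.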

So far, we’ve considered the average behaviour of only one oscillator. Suppose we now have a 3-d crystal with {N_{0}} atoms. Assuming small oscillations we can approximate its behaviour by a system of {3N_{0}} decoupled oscillators. In the classical case, the average energy is found from 11:

\displaystyle  \bar{\mathcal{E}}_{cl}=3N_{0}\bar{E}_{cl}=3N_{0}kT \ \ \ \ \ (26)

The heat capacity per atom is the rate at which the average energy changes with temperature, per atom, so

\displaystyle  C_{cl}=\frac{1}{N_{0}}\frac{\partial\bar{\mathcal{E}}_{cl}}{\partial T}=3k \ \ \ \ \ (27)

For the quantum system, we have from 20

\displaystyle   \bar{\mathcal{E}}_{qu} \displaystyle  = \displaystyle  3N_{0}\bar{E}_{qu}\ \ \ \ \ (28)
\displaystyle  \displaystyle  = \displaystyle  3N_{0}\hbar\omega\left(\frac{1}{2}+\frac{1}{e^{\beta\hbar\omega}-1}\right) \ \ \ \ \ (29)

The quantum heat capacity is therefore

\displaystyle   C_{qu} \displaystyle  = \displaystyle  \frac{1}{N_{0}}\frac{\partial\bar{\mathcal{E}}_{qu}}{\partial T}\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  3\hbar\omega\frac{\partial}{\partial\beta}\left(\frac{1}{e^{\beta\hbar\omega}-1}\right)\frac{d\beta}{dT}\ \ \ \ \ (31)
\displaystyle  \displaystyle  = \displaystyle  3\frac{\hbar^{2}\omega^{2}}{kT^{2}}\frac{e^{\beta\hbar\omega}}{\left(e^{\beta\hbar\omega}-1\right)^{2}} \ \ \ \ \ (32)

We can define the Einstein temperature as

\displaystyle  \theta_{E}\equiv\frac{\hbar\omega}{k} \ \ \ \ \ (33)

which gives the heat capacity as

\displaystyle  C_{qu}=3k\frac{\theta_{E}^{2}}{T^{2}}\frac{e^{\theta_{E}/T}}{\left(e^{\theta_{E}/T}-1\right)^{2}} \ \ \ \ \ (34)

For large temperatures, the exponent {\theta_{E}/T} becomes small, so we have

\displaystyle   C_{qu} \displaystyle  \underset{T\gg\theta_{E}}{\longrightarrow} \displaystyle  3k\frac{\theta_{E}^{2}}{T^{2}}\frac{1+\theta_{E}/T}{\left(1+\theta_{E}/T-1\right)^{2}}\ \ \ \ \ (35)
\displaystyle  \displaystyle  \rightarrow \displaystyle  3k \ \ \ \ \ (36)

For low temperatures {e^{\theta_{E}/T}\gg1} so we have

\displaystyle   C_{qu} \displaystyle  \underset{T\ll\theta_{E}}{\longrightarrow} \displaystyle  3k\frac{\theta_{E}^{2}}{T^{2}}\frac{e^{\theta_{E}/T}}{e^{2\theta_{E}/T}}\ \ \ \ \ (37)
\displaystyle  \displaystyle  = \displaystyle  3k\frac{\theta_{E}^{2}}{T^{2}}e^{-\theta_{E}/T} \ \ \ \ \ (38)

The heat capacity again reduces to the classical value for high temperatures. The observed behaviour at low temperatures is that {C_{qu}\propto T^{3}}, so this simple model fails at very low temperatures. However, as is shown by Shankar’s figure 7.3, Einstein’s quantum model is actually quite good for all but the lowest temperatures.
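
The high and low temperature limits 36 and 38 are easy to verify numerically. A minimal Python sketch (my own, with {k=1} and an arbitrary Einstein temperature):

import numpy as np

k = 1.0          # Boltzmann's constant in our units
theta_E = 1.0    # Einstein temperature, chosen arbitrarily for illustration

def C_quantum(T):
    """Einstein heat capacity per atom, equation 34."""
    x = theta_E / T
    return 3.0 * k * x**2 * np.exp(x) / (np.exp(x) - 1.0)**2

print(C_quantum(100.0))                                        # close to 3k = 3 for T >> theta_E
print(C_quantum(0.05))                                         # exponentially small for T << theta_E
print(3 * k * (theta_E / 0.05)**2 * np.exp(-theta_E / 0.05))   # low-T approximation, equation 38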

10 million hits

For anyone who is following such things, physicspages.com has just (in the past hour) had its 10 millionth hit. I’m still amazed and grateful that a site with so much mathematics on it has proved so popular. Many thanks to everyone who has visited.

Infinite square well – force to decrease well width

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 5.2, Exercise 5.2.4.

One way of comparing the classical and quantum pictures of a particle in an infinite square well is to calculate the force exerted on the walls by the particle. If a particle is in state {\left|n\right\rangle }, its energy is

\displaystyle  E_{n}=\frac{\left(n\pi\hbar\right)^{2}}{2mL^{2}} \ \ \ \ \ (1)

If the particle remains in this state as the walls are slowly pushed in, so that {L} slowly decreases, then its energy {E_{n}} will increase, meaning that work is done on the system. The force is the change in energy per unit distance, so the force required is

\displaystyle  F=-\frac{\partial E_{n}}{\partial L}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (2)

If we treat the system classically, then a particle with energy {E_{n}} between the walls is effectively a free particle in this region (since the potential {V=0} there), so all its energy is kinetic. That is

\displaystyle   E_{n} \displaystyle  = \displaystyle  \frac{1}{2}mv^{2}\ \ \ \ \ (3)
\displaystyle  v \displaystyle  = \displaystyle  \sqrt{\frac{2E_{n}}{m}}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \frac{n\pi\hbar}{mL} \ \ \ \ \ (5)

The classical particle bounces elastically between the two walls, which means its velocity is exactly reversed at each collision. The momentum transfer in such a collision is

\displaystyle  \Delta p=2mv=\frac{2n\pi\hbar}{L} \ \ \ \ \ (6)

The time between successive collisions on the same wall is

\displaystyle  \Delta t=\frac{2L}{v}=\frac{2mL^{2}}{n\pi\hbar} \ \ \ \ \ (7)

Thus the average force exerted on one wall is

\displaystyle  \bar{F}=\frac{\Delta p}{\Delta t}=\frac{\left(n\pi\hbar\right)^{2}}{mL^{3}} \ \ \ \ \ (8)

Comparing with 2, we see that the quantum and classical forces in this case are the same.
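
Since both results are simple algebraic expressions, a symbolic check is straightforward. Here is a short sketch using Python with SymPy (my own verification, not part of Shankar's exercise):

import sympy as sp

n, hbar, m, L, pi = sp.symbols('n hbar m L pi', positive=True)

E_n = (n * pi * hbar)**2 / (2 * m * L**2)      # equation 1
F_quantum = -sp.diff(E_n, L)                   # equation 2

v = sp.sqrt(2 * E_n / m)                       # classical speed, equations 4-5
F_classical = (2 * m * v) / (2 * L / v)        # delta p / delta t, equations 6-8

print(sp.simplify(F_quantum - F_classical))    # 0: the two forces agree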

Non-denumerable basis: position and momentum states

References: edX online course MIT 8.05 Section 5.6.

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10; Exercises 1.10.1 – 1.10.3.

Although we’ve looked at position and momentum operators in quantum mechanics before, it’s worth another look at the ways that Zwiebach and Shankar introduce them.

First, we’ll have a look at Shankar’s treatment. He begins by considering a string fixed at each end, at positions {x=0} and {x=L}, then asks how we could convey the shape of the string to an observer who cannot see the string directly. We could note the position at some fixed finite number of points between 0 and {L}, but then the remote observer would have only a partial knowledge of the string’s shape; the locations of those portions of the string between the points at which it was measured are still unknown, although the observer could probably get a reasonable picture by interpolating between these points.

We can increase the number of points at which the position is measured to get a better picture, but to convey the exact shape of the string, we need to measure its position at an infinite number of points. This is possible (in principle) but leads to a problem with the definition of the inner product. For two vectors defined on a finite vector space with an orthonormal basis, the inner product is given by the usual formula for the dot product:

\displaystyle \left\langle f\left|g\right.\right\rangle \displaystyle = \displaystyle \sum_{i=1}^{n}f_{i}g_{i}\ \ \ \ \ (1)
\displaystyle \left\langle f\left|f\right.\right\rangle \displaystyle = \displaystyle \sum_{i=1}^{n}f_{i}^{2} \ \ \ \ \ (2)

where {f_{i}} and {g_{i}} are the components of {f} and {g} in the orthonormal basis. If we’re taking {f} to be the displacement of a string and we try to increase the accuracy of the picture by increasing the number {n} of points at which measurements are taken, then the value of {\left\langle f\left|f\right.\right\rangle } continues to increase as {n} increases (provided that {f} is not identically zero). As {n\rightarrow\infty} then {\left\langle f\left|f\right.\right\rangle \rightarrow\infty} as well, even though the system we’re measuring (a string of finite length with finite displacement) is certainly not infinite in any practical sense.

Shankar proposes getting around this problem by simply redefining the inner product for a finite vector space to be

\displaystyle \left\langle f\left|g\right.\right\rangle =\sum_{i=1}^{n}f\left(x_{i}\right)g\left(x_{i}\right)\Delta \ \ \ \ \ (3)

 

where {\Delta\equiv L/\left(n+1\right)}. That is, {\Delta} now becomes the distance between adjacent points at which measurements are taken. If we let {n\rightarrow\infty} this leads to the definition of the inner product as an integral

\displaystyle \left\langle f\left|g\right.\right\rangle \displaystyle = \displaystyle \int_{0}^{L}f\left(x\right)g\left(x\right)\;dx\ \ \ \ \ (4)
\displaystyle \left\langle f\left|f\right.\right\rangle \displaystyle = \displaystyle \int_{0}^{L}f^{2}\left(x\right)\;dx \ \ \ \ \ (5)

This looks familiar enough, if you’ve done any work with inner products in quantum mechanics, but there is a subtle point which Shankar overlooks. In going from 1 to 3, we have introduced a factor {\Delta} which, in the string example at least, has the dimensions of length, so the physical interpretation of these two equations is different. The units of {\left\langle f\left|g\right.\right\rangle } appear to be different in the two cases. Now in quantum theory, inner products of the continuous type usually involve the wave function multiplied by its complex conjugate, with possibly another operator thrown in if we’re trying to find the expectation value of some observable. The square modulus of the wave function, {\left|\Psi\right|^{2}}, is taken to be a probability density, so it has units of inverse length (in one dimension) or inverse volume (in three dimensions), which makes the integral work out properly.

Admittedly, when we’re using {f} to represent the displacement of a string, it’s not obvious what meaning the inner product of {f} with anything else would actually have, so maybe the point isn’t worth worrying about. However, it does seem to be something that it would be worth Shankar including a comment about.
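
The role of the factor {\Delta} is easy to see numerically: with it included as in 3, the discrete sum converges to the integral 5 as {n} grows, while the bare sum 2 just keeps increasing. A minimal Python sketch (my own, using an arbitrary displacement function on a string of length {L=1}):

import numpy as np

L = 1.0
f = lambda x: np.sin(np.pi * x / L)      # arbitrary string displacement with f(0) = f(L) = 0

for n in [10, 100, 1000, 10000]:
    delta = L / (n + 1)
    x = np.arange(1, n + 1) * delta      # the n interior measurement points
    bare = np.sum(f(x)**2)               # equation 2: grows without bound as n increases
    weighted = np.sum(f(x)**2) * delta   # equation 3 with g = f: converges
    print(n, bare, weighted)

# The weighted sums approach the integral of sin^2 over [0, L], which is L/2 = 0.5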

From this point, Shankar continues by saying that this infinite dimensional vector space is spanned by basis vectors {\left|x\right\rangle }, with one basis vector for each value of {x}. We require this basis to be orthogonal, which means that we must have, if {x\ne x^{\prime}}

\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =0 \ \ \ \ \ (6)

We then generalize the identity operator to be

\displaystyle I=\int\left|x\right\rangle \left\langle x\right|dx \ \ \ \ \ (7)

 

which leads to

\displaystyle \left\langle x\left|f\right.\right\rangle =\int\left\langle x\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime} \ \ \ \ \ (8)

The bra-ket {\left\langle x\left|f\right.\right\rangle } is the projection of the vector {\left|f\right\rangle } onto the {\left|x\right\rangle } basis vector, so it is just {f\left(x\right)}. This means

\displaystyle f\left(x\right)=\int\left\langle x\left|x^{\prime}\right.\right\rangle f\left(x^{\prime}\right)dx^{\prime} \ \ \ \ \ (9)

 

which leads to the definition of the Dirac delta function as the normalization of {\left\langle x\left|x^{\prime}\right.\right\rangle }:

\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =\delta\left(x-x^{\prime}\right) \ \ \ \ \ (10)

Shankar then describes some properties of the delta function and its derivative, most of which we’ve already covered. For example, we’ve seen these two results for the delta function:

\displaystyle \delta\left(ax\right) \displaystyle = \displaystyle \frac{\delta\left(x\right)}{\left|a\right|}\ \ \ \ \ (11)
\displaystyle \frac{d\theta\left(x-x^{\prime}\right)}{dx} \displaystyle = \displaystyle \delta\left(x-x^{\prime}\right) \ \ \ \ \ (12)

where {\theta} is the step function

\displaystyle \theta\left(x-x^{\prime}\right)\equiv\begin{cases} 0 & x\le x^{\prime}\\ 1 & x>x^{\prime} \end{cases} \ \ \ \ \ (13)

One other result is that for a function {f\left(x\right)} with zeroes at a number of points {x_{i}}, we have

\displaystyle \delta\left(f\left(x\right)\right)=\sum_{i}\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (14)

To see this, consider one of the {x_{i}} where {f\left(x_{i}\right)=0}. Expanding in a Taylor series about this point, we have

\displaystyle f\left(x_{i}+\left(x-x_{i}\right)\right) \displaystyle = \displaystyle f\left(x_{i}\right)+\left(x-x_{i}\right)\frac{df}{dx_{i}}+\ldots\ \ \ \ \ (15)
\displaystyle \displaystyle = \displaystyle 0+\left(x-x_{i}\right)\frac{df}{dx_{i}} \ \ \ \ \ (16)

From 11 we have

\displaystyle \delta\left(\left(x-x_{i}\right)\frac{df}{dx_{i}}\right)=\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (17)

The behaviour is the same at all points {x_{i}} and since {\delta\left(x_{i}-x\right)=0} at all other {x_{j}\ne x_{i}} where {f\left(x_{j}\right)=0}, we can just add the delta functions for each zero of {f}.
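
We can also check 14 numerically by replacing the delta function with a narrow normalized Gaussian. The sketch below (my own check) integrates {\delta\left(f\left(x\right)\right)g\left(x\right)} for a simple {f} with two zeros and compares the result with {\sum_{i}g\left(x_{i}\right)/\left|df/dx_{i}\right|}.

import numpy as np

eps = 1e-3
delta_eps = lambda u: np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

f = lambda x: x**2 - 1.0      # zeros at x = +1 and x = -1, where |df/dx| = 2
g = lambda x: np.cos(x)       # arbitrary test function

x = np.linspace(-3.0, 3.0, 600_001)
dx = x[1] - x[0]
lhs = np.sum(delta_eps(f(x)) * g(x)) * dx     # integral of delta(f(x)) g(x)
rhs = g(1.0) / 2.0 + g(-1.0) / 2.0            # sum over zeros of g(x_i)/|df/dx_i|
print(lhs, rhs)                               # the two agree to good accuracy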

Turning now to Zwiebach’s treatment, he begins with the basis states {\left|x\right\rangle } and position operator {\hat{x}} with the eigenvalue equation

\displaystyle \hat{x}\left|x\right\rangle =x\left|x\right\rangle \ \ \ \ \ (18)

and simply defines the inner product between two position states to be

\displaystyle \left\langle x\left|y\right.\right\rangle =\delta\left(x-y\right) \ \ \ \ \ (19)

With this definition, 9 follows immediately. We can therefore write a quantum state {\left|\psi\right\rangle } as

\displaystyle \left|\psi\right\rangle =I\left|\psi\right\rangle =\int\left|x\right\rangle \left\langle x\left|\psi\right.\right\rangle dx=\int\left|x\right\rangle \psi\left(x\right)dx \ \ \ \ \ (20)

That is, the vector {\left|\psi\right\rangle } is the integral of its projections {\psi\left(x\right)} onto the basis vectors {\left|x\right\rangle }.

The position operator {\hat{x}} is hermitian as can be seen from

\displaystyle \left\langle x_{1}\left|\hat{x}^{\dagger}\right|x_{2}\right\rangle \displaystyle = \displaystyle \left\langle x_{2}\left|\hat{x}\right|x_{1}\right\rangle ^*\ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle x_{1}\left\langle x_{2}\left|x_{1}\right.\right\rangle ^*\ \ \ \ \ (22)
\displaystyle \displaystyle = \displaystyle x_{1}\delta\left(x_{2}-x_{1}\right)^*\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle x_{1}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle x_{2}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (25)
\displaystyle \displaystyle = \displaystyle \left\langle x_{1}\left|\hat{x}\right|x_{2}\right\rangle \ \ \ \ \ (26)

The fourth line follows because the delta function is real, and the fifth follows because {\delta\left(x_{2}-x_{1}\right)} is non-zero only when {x_{1}=x_{2}}.

Zwiebach then introduces the momentum eigenstates {\left|p\right\rangle } which are analogous to the position states {\left|x\right\rangle }, in that

\displaystyle \left\langle p^{\prime}\left|p\right.\right\rangle \displaystyle = \displaystyle \delta\left(p^{\prime}-p\right)\ \ \ \ \ (27)
\displaystyle I \displaystyle = \displaystyle \int dp\left|p\right\rangle \left\langle p\right|\ \ \ \ \ (28)
\displaystyle \hat{p}\left|p\right\rangle \displaystyle = \displaystyle p\left|p\right\rangle \ \ \ \ \ (29)
\displaystyle \tilde{\psi}\left(p\right) \displaystyle = \displaystyle \left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (30)

By the same calculation as for {\left|x\right\rangle }, we see that {\hat{p}} is hermitian.

To get a relation between the {\left|x\right\rangle } and {\left|p\right\rangle } bases, we require that {\left\langle x\left|p\right.\right\rangle } is the wave function for a particle with momentum {p} in the {x} basis, which we’ve seen is

\displaystyle \psi\left(x\right)=\frac{1}{\sqrt{2\pi\hbar}}e^{ipx/\hbar} \ \ \ \ \ (31)

 

Zwiebach then shows that this is consistent with the equation

\displaystyle \left\langle x\left|\hat{p}\right|\psi\right\rangle =\frac{\hbar}{i}\frac{d}{dx}\left\langle x\left|\psi\right.\right\rangle =\frac{\hbar}{i}\frac{d\psi\left(x\right)}{dx} \ \ \ \ \ (32)

We can get a similar relation by switching {x} and {p}:

\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle \displaystyle = \displaystyle \int dx\left\langle p\left|x\right.\right\rangle \left\langle x\left|\hat{x}\right|\psi\right\rangle \ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (34)

From 31 we see

\displaystyle \left\langle x\left|p\right.\right\rangle ^* \displaystyle = \displaystyle \frac{1}{\sqrt{2\pi\hbar}}e^{-ipx/\hbar}\ \ \ \ \ (35)
\displaystyle \left\langle x\left|p\right.\right\rangle ^*x \displaystyle = \displaystyle i\hbar\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\ \ \ \ \ (36)
\displaystyle \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle \displaystyle = \displaystyle i\hbar\int dx\;\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (37)
\displaystyle \displaystyle = \displaystyle i\hbar\frac{d}{dp}\int dx\;\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (38)
\displaystyle \displaystyle = \displaystyle i\hbar\frac{d}{dp}\int dx\;\left\langle p\left|x\right.\right\rangle \left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (39)
\displaystyle \displaystyle = \displaystyle i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (40)

In the fourth line, we took the {\frac{d}{dp}} outside the integral since {p} occurs in only one term, and in the last line we used 7. Thus we have

\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle =i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (41)
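
As a numerical illustration of this result (my own sketch, not in either reference), we can take a Gaussian wave packet, build {\tilde{\psi}\left(p\right)} from 30 and 31 by direct integration, and compare the left side of 41, computed from 34, with {i\hbar\; d\tilde{\psi}/dp} computed by finite differences.

import numpy as np

hbar = 1.0
x = np.linspace(-15.0, 15.0, 3001)
dx = x[1] - x[0]
psi = np.pi**-0.25 * np.exp(-(x - 1.0)**2 / 2)    # normalized Gaussian centred at x = 1

p = np.linspace(-4.0, 4.0, 401)
dp = p[1] - p[0]
# <x|p> = exp(ipx/hbar)/sqrt(2 pi hbar), equation 31
plane = np.exp(1j * np.outer(p, x) / hbar) / np.sqrt(2 * np.pi * hbar)

psi_p = (np.conj(plane) * psi).sum(axis=1) * dx          # psi~(p) = <p|psi>, equation 30
lhs = (np.conj(plane) * (x * psi)).sum(axis=1) * dx      # <p|x-hat|psi>, from equation 34
rhs = 1j * hbar * np.gradient(psi_p, dp)                 # i hbar d psi~/dp, equation 41
print(np.max(np.abs(lhs - rhs)))                         # small; limited by the finite grid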

Exponentials of operators – Baker-Campbell-Hausdorff formula

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.9.

Although the result in this post isn’t covered in Shankar’s book, it’s a result that is frequently used in quantum theory, so it’s worth including at this point.

We’ve seen how to define a function of an operator if that function can be expanded in a power series. A common operator function is the exponential:

\displaystyle  f\left(\Omega\right)=e^{i\Omega} \ \ \ \ \ (1)

If {\Omega} is hermitian, the exponential {e^{i\Omega}} is unitary. If we try to calculate the exponential of two operators such as {e^{A+B}}, the result isn’t as simple as we might hope if {A} and {B} don’t commute. To see the problem, we can write this out as a power series

\displaystyle   e^{A+B} \displaystyle  = \displaystyle  \sum_{n=0}^{\infty}\frac{\left(A+B\right)^{n}}{n!}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  I+A+B+\frac{1}{2}\left(A+B\right)\left(A+B\right)+\ldots\ \ \ \ \ (3)
\displaystyle  \displaystyle  = \displaystyle  I+A+B+\frac{1}{2}\left(A^{2}+AB+BA+B^{2}\right)+\ldots \ \ \ \ \ (4)

The problem appears first in the fourth term in the series, since we can’t condense the {AB+BA} sum into {2AB} if {\left[A,B\right]\ne0}. In fact, the expansion of {e^{A}e^{B}} can be written entirely in terms of the commutators of {A} and {B} with each other, nested to increasingly higher levels. This formula is known as the Baker-Campbell-Hausdorff formula. Up to the fourth order commutator, the BCH formula gives

\displaystyle  e^{A}e^{B}=\exp\left[A+B+\frac{1}{2}\left[A,B\right]+\frac{1}{12}\left(\left[A,\left[A,B\right]\right]+\left[B,\left[B,A\right]\right]\right)-\frac{1}{24}\left[B,\left[A,\left[A,B\right]\right]\right]+\ldots\right] \ \ \ \ \ (5)

There is no known closed form expression for this result. However, an important special case that occurs frequently in quantum theory is the case where {\left[A,B\right]=cI}, where {c} is a complex scalar and {I} is the usual identity matrix. Since {cI} commutes with all operators, all terms from the third order upwards are zero, and we have

\displaystyle  e^{A}e^{B}=e^{A+B+\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (6)

We can prove this result as follows. Start with the operator function

\displaystyle  G\left(t\right)\equiv e^{t\left(A+B\right)}e^{-tA} \ \ \ \ \ (7)

where {t} is a scalar parameter (not necessarily time!).

From its definition,

\displaystyle  G\left(0\right)=I \ \ \ \ \ (8)

The inverse is

\displaystyle  G^{-1}\left(t\right)=e^{tA}e^{-t\left(A+B\right)} \ \ \ \ \ (9)

and the derivative is

\displaystyle   \frac{dG\left(t\right)}{dt} \displaystyle  = \displaystyle  \left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A \ \ \ \ \ (10)

Note that we have to keep the {\left(A+B\right)} factor to the left of the {A} factor because {\left[A,B\right]\ne0}. Now we multiply:

\displaystyle   G^{-1}\frac{dG}{dt} \displaystyle  = \displaystyle  e^{tA}e^{-t\left(A+B\right)}\left[\left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A\right]\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  e^{tA}\left(A+B\right)e^{-tA}-A\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  e^{tA}Ae^{-tA}+e^{tA}Be^{-tA}-A\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  e^{tA}Be^{-tA}\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  B+t\left[A,B\right]\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  B+ctI \ \ \ \ \ (16)

We used Hadamard’s lemma in the penultimate line, which in this case reduces to

\displaystyle  e^{tA}Be^{-tA}=B+t\left[A,B\right] \ \ \ \ \ (17)

because {\left[A,B\right]=cI} so all higher order commutators are zero.

We end up with an expression in which {A} has disappeared. This gives the differential equation for {G}:

\displaystyle  G^{-1}\frac{dG}{dt}=B+ctI \ \ \ \ \ (18)

We try a solution of the form (this apparently appears from divine inspiration):

\displaystyle  G\left(t\right)=e^{\alpha tB}e^{\beta ct^{2}} \ \ \ \ \ (19)

From which we get

\displaystyle   G^{-1} \displaystyle  = \displaystyle  e^{-\alpha tB}e^{-\beta ct^{2}}\ \ \ \ \ (20)
\displaystyle  \frac{dG}{dt} \displaystyle  = \displaystyle  \left(\alpha B+2\beta ct\right)e^{\alpha tB}e^{\beta ct^{2}}\ \ \ \ \ (21)
\displaystyle  G^{-1}\frac{dG}{dt} \displaystyle  = \displaystyle  \alpha B+2\beta ct \ \ \ \ \ (22)

Comparing this to 18, we have

\displaystyle   \alpha \displaystyle  = \displaystyle  1\ \ \ \ \ (23)
\displaystyle  \beta \displaystyle  = \displaystyle  \frac{1}{2}\ \ \ \ \ (24)
\displaystyle  G\left(t\right) \displaystyle  = \displaystyle  e^{tB}e^{\frac{1}{2}ct^{2}} \ \ \ \ \ (25)

Setting this equal to the original definition of {G} in 7 and then taking {t=1} we have

\displaystyle   e^{A+B}e^{-A} \displaystyle  = \displaystyle  e^{B}e^{c/2}\ \ \ \ \ (26)
\displaystyle  e^{A+B} \displaystyle  = \displaystyle  e^{B}e^{A}e^{\frac{1}{2}c}\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  e^{B}e^{A}e^{\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (28)

If we swap {A} with {B} and use the fact that {A+B=B+A}, and also {\left[A,B\right]=-\left[B,A\right]}, we have

\displaystyle  e^{A+B}=e^{A}e^{B}e^{-\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (29)

This is the restricted form of the BCH formula for the case where {\left[A,B\right]} is a scalar.
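
A small numerical check of this result is possible, with one caveat: for finite matrices a commutator is always traceless, so {\left[A,B\right]=cI} with {c\ne0} can never hold exactly. However, the derivation above only used the fact that {\left[A,B\right]} commutes with both {A} and {B}, so we can test 29 with strictly upper triangular (Heisenberg algebra) matrices, for which this is true. A sketch in Python (my own check, not from Shankar):

import numpy as np
from scipy.linalg import expm

# Strictly upper triangular 3x3 matrices: here [A, B] commutes with both A and B,
# which is all the derivation above actually required.
A = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])
B = np.array([[0., 0., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
comm = A @ B - B @ A     # a single 1 in the top right corner; commutes with A and B

lhs = expm(A + B)
rhs = expm(A) @ expm(B) @ expm(-0.5 * comm)    # equation 29
print(np.allclose(lhs, rhs))                   # True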

Lorentz transformations as 2×2 matrices

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, recall that a Lorentz transformation can be represented by a {4\times4} matrix {\Lambda} which preserves the Minkowski length {x_{\mu}x^{\mu}} of all four-vectors {x}. This leads to the condition

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (1)

where {g} is the flat-space Minkowski metric

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

It turns out that we can map any 4-vector {x} to a {2\times2} Hermitian matrix {\widehat{x}} defined as

\displaystyle  \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (3)

[Recall that a Hermitian matrix {H} is equal to the complex conjugate of its transpose:

\displaystyle  H=\left(H^{T}\right)^*\equiv H^{\dagger} \ \ \ \ \ (4)

Also note that Jaffe uses an unconventional notation for the Hermitian conjugate, as he uses a superscript * rather than a superscript {\dagger}. This can be confusing since usually a superscript * indicates just the complex conjugate, without the transpose. I’ll use the more usual superscript {\dagger} for Hermitian conjugate here.]

Although we’re used to the scalar product of two vectors, it is also useful to define the scalar product of two matrices as

\displaystyle  \left\langle A,B\right\rangle \equiv\frac{1}{2}\mbox{Tr}\left(A^{\dagger}B\right) \ \ \ \ \ (5)

where ‘Tr’ means the trace of a matrix, which is the sum of its diagonal elements. Note that the scalar product of {\widehat{x}} with itself is

\displaystyle   \left\langle \widehat{x},\widehat{x}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left[\left(x_{0}+x_{3}\right)^{2}+2\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)+\left(x_{0}-x_{3}\right)^{2}\right]\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2} \ \ \ \ \ (8)

The determinant of {\widehat{x}} is

\displaystyle   \det\widehat{x} \displaystyle  = \displaystyle  \left(x_{0}+x_{3}\right)\left(x_{0}-x_{3}\right)-\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  x_{\mu}x^{\mu} \ \ \ \ \ (11)

Thus {\det\widehat{x}} is the Minkowski length squared.

From 3, we observe that we can write {\widehat{x}} as a sum:

\displaystyle  \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (12)

where the {\sigma_{\mu}} are four Hermitian matrices:

\displaystyle   \sigma_{0} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (13)
\displaystyle  \sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (14)
\displaystyle  \sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (15)
\displaystyle  \sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (16)

The last three are the Pauli spin matrices that we met when looking at spin-{\frac{1}{2}} in quantum mechanics.

The {\sigma_{\mu}} are orthonormal under the scalar product operation, as we can verify by direct calculation. For example

\displaystyle   \left\langle \sigma_{2},\sigma_{3}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right]\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left(0+0\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (19)

And:

\displaystyle   \left\langle \sigma_{2},\sigma_{2}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left(1+1\right)\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  1 \ \ \ \ \ (22)

The other products work out similarly, so we have

\displaystyle  \left\langle \sigma_{\mu},\sigma_{\nu}\right\rangle =\delta_{\mu\nu} \ \ \ \ \ (23)

We can work out the inverse transformation to 3 by taking the scalar product of 12 with {\sigma_{\nu}}:

\displaystyle   \left\langle \sigma_{\nu},\widehat{x}\right\rangle \displaystyle  = \displaystyle  \sum_{\mu=0}^{3}x_{\mu}\left\langle \sigma_{\nu},\sigma_{\mu}\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  \sum_{\mu=0}^{3}x_{\mu}\delta_{\nu\mu}\ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  x_{\nu} \ \ \ \ \ (26)
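
Here is a quick numerical sketch (my own, not in the references) of the map 3, the determinant result 11 and the component recovery 24 to 26, for an arbitrary four-vector:

import numpy as np

sigma = [np.eye(2),                         # sigma_0, equation 13
         np.array([[0, 1], [1, 0]]),        # sigma_1, equation 14
         np.array([[0, -1j], [1j, 0]]),     # sigma_2, equation 15
         np.array([[1, 0], [0, -1]])]       # sigma_3, equation 16

x = np.array([2.0, 0.3, -1.1, 0.7])         # arbitrary four-vector (x_0, x_1, x_2, x_3)

# Equations 3 and 12: x-hat = sum over mu of x_mu sigma_mu
xhat = sum(x[mu] * sigma[mu] for mu in range(4))

# det(x-hat) equals the Minkowski length squared, equation 11
print(np.isclose(np.linalg.det(xhat).real, x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2))

# Components recovered from the scalar product 5, as in equations 24-26
recovered = [0.5 * np.trace(s.conj().T @ xhat).real for s in sigma]
print(np.allclose(recovered, x))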

Now a few more theorems that will be useful later.

Irreducible Sets of Matrices

A set of matrices {\mathfrak{U}} is called irreducible if the only matrix {C} that commutes with every matrix in {\mathfrak{U}} is the identity matrix {I} (or a multiple of {I}). Any two of the three Pauli matrices {\sigma_{i}}, {i=1,2,3} above form an irreducible set of {2\times2} Hermitian matrices. This can be shown by direct calculation, which Jaffe does in detail in his article. For example, if we define {C} to be some arbitrary matrix

\displaystyle  C=\left[\begin{array}{cc} a & b\\ c & d \end{array}\right] \ \ \ \ \ (27)

where {a,b,c,d} are complex numbers, then

\displaystyle   C\sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} b & a\\ d & c \end{array}\right]\ \ \ \ \ (28)
\displaystyle  \sigma_{1}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} c & d\\ a & b \end{array}\right] \ \ \ \ \ (29)

If {C} is to commute with {\sigma_{1}}, we must therefore require {b=c} and {a=d}.

Similarly, for {\sigma_{2}} we have

\displaystyle   C\sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} ib & -ia\\ id & -ic \end{array}\right]\ \ \ \ \ (30)
\displaystyle  \sigma_{2}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} -ic & -id\\ ia & ib \end{array}\right] \ \ \ \ \ (31)

so that {C\sigma_{2}=\sigma_{2}C} requires {b=-c} and {a=d}.

And for {\sigma_{3}}:

\displaystyle   C\sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} a & -b\\ c & -d \end{array}\right]\ \ \ \ \ (32)
\displaystyle  \sigma_{3}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} a & b\\ -c & -d \end{array}\right] \ \ \ \ \ (33)

so that {C\sigma_{3}=\sigma_{3}C} requires {b=-b} and {c=-c}, so {b=c=0} (no conditions can be inferred for {a} or {d}).

If we form a set {\mathfrak{U}} containing {\sigma_{3}} and one of {\sigma_{1}} or {\sigma_{2}}, we see that {b=c=0} and {a=d}, so {C} is a multiple of {I}. If we form {\mathfrak{U}} from {\sigma_{1}} and {\sigma_{2}} we again have {a=d}, but we must have simultaneously {b=c} and {b=-c} which can be true only if {b=c=0}, so again {C} is a multiple of {I}.

Unitary Matrices

A unitary matrix is one whose Hermitian conjugate is its inverse, so that {U^{\dagger}=U^{-1}}. Some properties of unitary matrices are given on the Wikipedia page, so we’ll just use those without going through the proofs. First, a unitary matrix is normal, which means that {U^{\dagger}U=UU^{\dagger}} (this actually follows from the condition {U^{\dagger}=U^{-1}}). Second, there is another unitary matrix {V} which diagonalizes {U}, that is

\displaystyle  V^{\dagger}UV=D \ \ \ \ \ (34)

where {D} is a diagonal, unitary matrix.

Third,

\displaystyle  \left|\det U\right|=1 \ \ \ \ \ (35)

(The determinant can be complex, but has magnitude 1.)

From this it follows that {\left|\det D\right|=1} and since {D} is unitary and diagonal, each diagonal element {d_{j}} of {D} must satisfy {\left|d_{j}\right|=1}. (Remember that {d_{j}} could be a complex number.) That means that {d_{j}=e^{i\lambda_{j}}} for some real number {\lambda_{j}}, so we can write

\displaystyle  D=e^{i\Lambda} \ \ \ \ \ (36)

where {\Lambda} is a diagonal hermitian matrix whose only non-zero elements are the real numbers {\lambda_{j}} along its diagonal: {\Lambda_{ij}=\lambda_{j}\delta_{ij}}. As usual, the exponential of a matrix is interpreted in terms of its power series, so that

\displaystyle  e^{i\Lambda}=1+i\Lambda+\frac{\left(i\Lambda\right)^{2}}{2!}+\frac{\left(i\Lambda\right)^{3}}{3!}+\ldots \ \ \ \ \ (37)

For a diagonal matrix {\Lambda} with diagonal elements {\Lambda_{jj}=\lambda_{j}}, the diagonal elements of {\Lambda^{n}} are just {\Lambda_{jj}^{n}=\lambda_{j}^{n}}.

From 34, we have

\displaystyle   U \displaystyle  = \displaystyle  VDV^{\dagger}\ \ \ \ \ (38)
\displaystyle  \displaystyle  = \displaystyle  Ve^{i\Lambda}V^{\dagger} \ \ \ \ \ (39)

Now we also have, since {V^{\dagger}V=I}

\displaystyle   V\Lambda^{n}V^{\dagger} \displaystyle  = \displaystyle  V\Lambda\left(V^{\dagger}V\right)\Lambda\left(V^{\dagger}V\right)\ldots\Lambda V^{\dagger}\ \ \ \ \ (40)
\displaystyle  \displaystyle  = \displaystyle  \left(V\Lambda V^{\dagger}\right)^{n} \ \ \ \ \ (41)

Therefore, from 37

\displaystyle   U \displaystyle  = \displaystyle  Ve^{i\Lambda}V^{\dagger}\ \ \ \ \ (42)
\displaystyle  \displaystyle  = \displaystyle  e^{iV\Lambda V^{\dagger}}\ \ \ \ \ (43)
\displaystyle  \displaystyle  \equiv \displaystyle  e^{iH} \ \ \ \ \ (44)

where {H=V\Lambda V^{\dagger}} is another Hermitian matrix. In other words, we can always write a unitary matrix as the exponential of a Hermitian matrix.
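
This decomposition is easy to check numerically: a random Hermitian {H} gives a unitary {U=e^{iH}}, and conversely a generic unitary matrix can be rebuilt as {Ve^{i\Lambda}V^{\dagger}} from its eigenvalues and eigenvectors. A sketch in Python (my own check, not from Jaffe):

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# A random Hermitian matrix H gives a unitary U = exp(iH)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = 0.5 * (M + M.conj().T)
U = expm(1j * H)
print(np.allclose(U @ U.conj().T, np.eye(2)))     # True: U is unitary

# Going the other way: diagonalize U (equation 34) and read off Lambda
# from the phases of its eigenvalues, so that D = exp(i Lambda)
d, V = np.linalg.eig(U)
Lam = np.diag(np.angle(d))
print(np.allclose(V.conj().T @ V, np.eye(2)))             # V is (numerically) unitary
print(np.allclose(V @ expm(1j * Lam) @ V.conj().T, U))    # U = V exp(i Lambda) V-dagger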

In the case where {H} is a {2\times2} matrix, we can write it in terms of the {\sigma_{\mu}} matrices above as

\displaystyle  H=\sum_{\mu=0}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (45)

where the {a_{\mu}} are real, since the diagonal elements of a Hermitian matrix must be real. This follows because the {\sigma_{\mu}} form an orthonormal basis for the {2\times2} Hermitian matrices. [For some reason, Jaffe refers to the {a_{\mu}} as {\lambda_{\mu}}, which is confusing since he has used {\lambda_{\mu}} as the diagonal elements of {\Lambda} above, and they’re not the same thing.]

If {\det U=+1}, then

\displaystyle   \det U \displaystyle  = \displaystyle  \det\left(VDV^{\dagger}\right)\ \ \ \ \ (46)
\displaystyle  \displaystyle  = \displaystyle  \det\left(VV^{\dagger}D\right)\ \ \ \ \ (47)
\displaystyle  \displaystyle  = \displaystyle  \det D\ \ \ \ \ (48)
\displaystyle  \displaystyle  = \displaystyle  \det e^{i\Lambda} \ \ \ \ \ (49)

The second line follows because the determinant of a product of matrices is the product of the determinants, so we can rearrange the multiplication order. To evaluate the last line, we observe that for a diagonal matrix {\Lambda}, using 37 and applying the result to each diagonal element

\displaystyle  e^{i\Lambda}=\left[\begin{array}{cc} e^{i\Lambda_{11}} & 0\\ 0 & e^{i\Lambda_{22}} \end{array}\right] \ \ \ \ \ (50)

Therefore

\displaystyle  \det e^{i\Lambda}=e^{i\left(\Lambda_{11}+\Lambda_{22}\right)}=e^{i\mbox{Tr}\Lambda} \ \ \ \ \ (51)

[By the way, the relation {\det e^{A}=e^{\mbox{Tr}A}} is actually true for any square matrix {A}, and is a corollary of Jacobi’s formula.]

We can now use the cyclic property of the trace (another matrix algebra theorem) which says that for 3 matrices {A,B,C},

\displaystyle  \mbox{Tr}\left(ABC\right)=\mbox{Tr}\left(CAB\right)=\mbox{Tr}\left(BCA\right) \ \ \ \ \ (52)

This gives us

\displaystyle  \mbox{Tr}H=\mbox{Tr}\left(V\Lambda V^{\dagger}\right)=\mbox{Tr}\left(V^{\dagger}V\Lambda\right)=\mbox{Tr}\Lambda \ \ \ \ \ (53)

Finally, from 45 and the fact that the traces of the {\sigma_{i}} are all zero for {i=1,2,3}, and {\mbox{Tr}\sigma_{0}=2}, we have

\displaystyle  \det U=\det e^{i\Lambda}=e^{i\mbox{Tr}H}=e^{2ia_{0}}=1 \ \ \ \ \ (54)

Thus {a_{0}=n\pi} for some integer {n}, but as all values of {n} give the same original unitary matrix {U}, we can choose {n=0} so that {a_{0}=0} and

\displaystyle  H=\sum_{\mu=1}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (55)

Rotation group: basics, generators and commutators

Reference: Lewis H. Ryder (1996), Quantum Field Theory, Second edition, Cambridge University Press. Section 2.3.

A group important in physics is the 3-d rotation group, which consists of all rotations about the origin in 3-d space. Such a rotation can be written as

\displaystyle r^{\prime}=Rr \ \ \ \ \ (1)

where {r=\left(\begin{array}{c} x\\ y\\ z \end{array}\right)} is the vector before rotation, {r^{\prime}=\left(\begin{array}{c} x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{array}\right)} is the vector after rotation and {R} is the {3\times3} rotation matrix. Because all rotations are about an axis going through the origin, the distances are unchanged, so that {r^{2}=r^{\prime2}}. In matrix notation, using a superscript {T} to indicate the transpose, we have

\displaystyle x^{2}+y^{2}+z^{2}=r^{T}r \displaystyle = \displaystyle r^{\prime T}r^{\prime}\ \ \ \ \ (2)
\displaystyle \displaystyle = \displaystyle r^{T}R^{T}Rr \ \ \ \ \ (3)

[The transpose of the product of two or more matrices is the product of the transposes, in reverse order.]

From this we see that

\displaystyle R^{T}R=I \ \ \ \ \ (4)

 

where {I} is the {3\times3} identity matrix. This is the definition of an orthogonal matrix, that is, a matrix whose rows and columns are orthogonal unit vectors. If this equation is written out in terms of components, it provides 6 independent constraints on the elements of {R}. To get a diagonal element {I_{kk}=1}, for {k=1,2,3} we multiply the {k}th row of {R^{T}} into the {k}th column of {R}. However, the {k}th row of {R^{T}} is the {k}th column of {R}, so

\displaystyle I_{kk}=1 \displaystyle = \displaystyle \left(R^{T}R\right)_{kk}\ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{3}R_{ik}R_{ik}\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{3}R_{ik}^{2} \ \ \ \ \ (7)

This provides 3 conditions, one for each value of {k}.

An off-diagonal element such as {I_{12}=0} is obtained from

\displaystyle I_{12}=0 \displaystyle = \displaystyle \left(R^{T}R\right)_{12}\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{3}R_{i1}R_{i2} \ \ \ \ \ (9)

In general, the off-diagonal elements give the condition

\displaystyle I_{kj}=\sum_{i=1}^{3}R_{ik}R_{ij} \ \ \ \ \ (10)

The sum on the RHS is symmetric under exchange of {k} and {j} so, for example, the condition {I_{21}=0} gives the same equation as {I_{12}=0}. Thus the off-diagonal elements provide only 3 more independent constraints, which can be taken as {I_{12}=I_{13}=I_{23}=0}.

The rotation matrices satisfy the four conditions for forming a group. The set is closed, since if we have two rotation matrices {R_{1}} and {R_{2}}, the combined rotation {R_{1}R_{2}} is also a rotation matrix, because it satisfies

\displaystyle \left(R_{1}R_{2}\right)^{T}R_{1}R_{2}=R_{2}^{T}R_{1}^{T}R_{1}R_{2}=R_{2}^{T}IR_{2}=R_{2}^{T}R_{2}=I \ \ \ \ \ (11)

Rotation is associative, since a series of rotations is just performed in order from right to left, so inserting parentheses is meaningless: {\left(R_{1}R_{2}\right)R_{3}=R_{1}\left(R_{2}R_{3}\right)}. The identity rotation is just {R=I} (that is, no rotation at all), and the inverse is just the transpose: {R^{-1}=R^{T}}, since {R^{T}R=I}.

The rotation group in 3-d is known as the {O\left(3\right)} group (and in {n} dimensions as {O\left(n\right)}).

Because the condition 4 imposes 6 constraints on a matrix with 9 elements, we must specify 3 parameters to define a rotation uniquely. Those with a background in classical mechanics might be familiar with the three Euler angles; however, for our purposes we can just define rotations about the {x}, {y} and {z} axes with angles {\phi}, {\psi} and {\theta} respectively. We then get the three usual rotation matrices:

\displaystyle R_{x}\left(\phi\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & \cos\phi & \sin\phi\\ 0 & -\sin\phi & \cos\phi \end{array}\right]\ \ \ \ \ (12)
\displaystyle R_{y}\left(\psi\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} \cos\psi & 0 & -\sin\psi\\ 0 & 1 & 0\\ \sin\psi & 0 & \cos\psi \end{array}\right]\ \ \ \ \ (13)
\displaystyle R_{z}\left(\theta\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} \cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (14)

Any rotation about the origin can be decomposed into a succession of these three rotations in some order. If we do two rotations about two different axes, the operation is not commutative as you can verify by direct substitution. For example, if we do a rotation by {\theta=\frac{\pi}{2}} about the {z} axis followed by {\phi=\frac{\pi}{2}} about the {x} axis, the result is

\displaystyle R_{x}\left(\frac{\pi}{2}\right)R_{z}\left(\frac{\pi}{2}\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (15)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{array}\right] \ \ \ \ \ (16)

Reversing the order we get

\displaystyle R_{z}\left(\frac{\pi}{2}\right)R_{x}\left(\frac{\pi}{2}\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right]\ \ \ \ \ (17)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 0 & 1\\ -1 & 0 & 0\\ 0 & -1 & 0 \end{array}\right] \ \ \ \ \ (18)

Thus the rotation group is non-Abelian.

Finally, we can define three curious objects called generators of the group. Their definitions will look bizarre, but be patient as we will see that they do have useful properties. We have

\displaystyle J_{x} \displaystyle \equiv \displaystyle -i\left.\frac{dR_{x}\left(\phi\right)}{d\phi}\right|_{\phi=0}=\left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\ \ \ \ \ (19)
\displaystyle J_{y} \displaystyle \equiv \displaystyle -i\left.\frac{dR_{y}\left(\psi\right)}{d\psi}\right|_{\psi=0}=\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]\ \ \ \ \ (20)
\displaystyle J_{z} \displaystyle \equiv \displaystyle -i\left.\frac{dR_{z}\left(\theta\right)}{d\theta}\right|_{\theta=0}=\left[\begin{array}{ccc} 0 & -i & 0\\ i & 0 & 0\\ 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (21)

For an infinitesimal rotation {\delta\theta} about the {z} axis we can, to first order, approximate {\cos\delta\theta\approx1} and {\sin\delta\theta\approx\delta\theta}, so from 14 and 21 we have, to first order

\displaystyle R_{z}\left(\delta\theta\right) \displaystyle = \displaystyle \left[\begin{array}{ccc} 1 & \delta\theta & 0\\ -\delta\theta & 1 & 0\\ 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (22)
\displaystyle \displaystyle = \displaystyle I+iJ_{z}\delta\theta \ \ \ \ \ (23)

For a finite rotation through an angle {\theta}, we can apply {R_{z}\left(\delta\theta\right)} {N} times so that if {\delta\theta=\theta/N}:

\displaystyle R_{z}\left(\theta\right) \displaystyle = \displaystyle \left[R_{z}\left(\delta\theta\right)\right]^{N}\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle \left[I+iJ_{z}\frac{\theta}{N}\right]^{N} \ \ \ \ \ (25)

We can now take the limit {N\rightarrow\infty} and use the definition (covered in introductory calculus books) of {e^{x}\equiv\lim_{N\rightarrow\infty}\left(1+\frac{x}{N}\right)^{N}} to get

\displaystyle R_{z}\left(\theta\right)=e^{iJ_{z}\theta} \ \ \ \ \ (26)

Since {J_{z}} in the exponent is a matrix, we can interpret this exponential in terms of its series expansion:

\displaystyle e^{iJ_{z}\theta} \displaystyle = \displaystyle I+iJ_{z}\theta-J_{z}^{2}\frac{\theta^{2}}{2!}-iJ_{z}^{3}\frac{\theta^{3}}{3!}+J_{z}^{4}\frac{\theta^{4}}{4!}+\ldots\ \ \ \ \ (27)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right]+\theta\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]-\frac{\theta^{2}}{2!}\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (28)
\displaystyle \displaystyle \displaystyle -\frac{\theta^{3}}{3!}\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]+\frac{\theta^{4}}{4!}\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]+\ldots \ \ \ \ \ (29)

The matrices on the RHS alternate between {\left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]} and {\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right]}, so the terms on the RHS can be seen to be the series expansions of {\pm\mbox{sine}} (in the 1,2 and 2,1 elements) and cosine (in the 1,1 and 2,2 elements), giving 14 again:

\displaystyle e^{iJ_{z}\theta}=\left[\begin{array}{ccc} \cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (30)

This might seem like a pointless exercise, since we’ve just managed to express the rotation matrix in a fancy form as a complex exponential, but we’ll need to be patient to see how useful this turns out to be later.

Finally, we can work out the commutators of the generators. [As you might recall from quantum mechanics, commutators play an important role in the theory, and it turns out to be the same here, so we’ll need these for later.] The commutators can be calculated explicitly by doing a bit of matrix arithmetic, directly from 19, 20 and 21. For example

\displaystyle \left[J_{x},J_{y}\right] \displaystyle \equiv \displaystyle J_{x}J_{y}-J_{y}J_{x}\ \ \ \ \ (31)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]-\left[\begin{array}{ccc} 0 & 0 & i\\ 0 & 0 & 0\\ -i & 0 & 0 \end{array}\right]\left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -i\\ 0 & i & 0 \end{array}\right]\ \ \ \ \ (32)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 0 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]-\left[\begin{array}{ccc} 0 & -1 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right]\ \ \ \ \ (34)
\displaystyle \displaystyle = \displaystyle iJ_{z} \ \ \ \ \ (35)

By similar calculations, we can get the other two commutators, which are just cyclic permutations of this one:

\displaystyle \left[J_{z},J_{x}\right] \displaystyle = \displaystyle iJ_{y}\ \ \ \ \ (36)
\displaystyle \left[J_{y},J_{z}\right] \displaystyle = \displaystyle iJ_{x} \ \ \ \ \ (37)

Except for a factor of {\hbar}, these are the same commutation relations as those satisfied by angular momentum in quantum mechanics.
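
Both the exponential form 26 and the commutation relations 35 to 37 can be confirmed by direct numerical calculation. A short Python sketch (my own check, not from Ryder):

import numpy as np
from scipy.linalg import expm

# Generators from equations 19-21
Jx = np.array([[0, 0, 0], [0, 0, -1j], [0, 1j, 0]])
Jy = np.array([[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]])
Jz = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]])

theta = 0.7
Rz = np.array([[np.cos(theta),  np.sin(theta), 0],     # equation 14
               [-np.sin(theta), np.cos(theta), 0],
               [0,              0,             1]])

print(np.allclose(expm(1j * Jz * theta), Rz))          # equations 26 and 30
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz))         # equation 35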

Ongoing work

25 April 2016

All upgrades of plots are now complete. Normal service has resumed.

24 April 2016

At the moment, I’m working through those posts that have Maple-generated plots in them and upgrading the quality of the plots. This involves removing the old plots and uploading new ones to replace them, so occasionally you might find a post where there is a ‘broken image’ icon in place of a plot. This should last for only a couple of minutes or so for each affected post, so try waiting a bit and then refreshing the page before you send in a comment that the image is broken.