Tag Archives: Hermitian operators

Spherically symmetric potentials: hermiticity of the radial function

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 12, Exercise 12.6.3.


The Schrödinger equation in 3-d for a potential that depends only on {r} is

\displaystyle -\frac{\hbar^{2}}{2\mu}\left[\frac{1}{r^{2}}\frac{\partial}{\partial r}\left(r^{2}\frac{\partial\psi}{\partial r}\right)+\frac{1}{r^{2}\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi}{\partial\theta}\right)+\frac{1}{r^{2}\sin^{2}\theta}\left(\frac{\partial^{2}\psi}{\partial\phi^{2}}\right)\right]+V\psi=E\psi \ \ \ \ \ (1)

Eigenfunctions of this equation can be written in the separated form

\displaystyle \psi=R_{Elm}\left(r\right)Y_{l}^{m}\left(\theta,\phi\right) \ \ \ \ \ (2)

where the subscript {Elm} refers to the energy {E} and the angular momentum quantum numbers {l} and {m}. {Y_{l}^{m}} is a spherical harmonic and {R_{El}} is the radial function, which depends on the potential {V} but not on {m}. With the substitution

\displaystyle R_{El}\left(r\right)=\frac{U_{El}\left(r\right)}{r} \ \ \ \ \ (3)

the differential equation reduces to

\displaystyle \left[-\frac{\hbar^{2}}{2\mu}\frac{d^{2}}{dr^{2}}+V\left(r\right)+\frac{l\left(l+1\right)\hbar^{2}}{2\mu r^{2}}\right]U_{El}=EU_{El} \ \ \ \ \ (4)

 

The quantity in the square brackets is an operator which we will call {D_{l}\left(r\right)}:

\displaystyle D_{l}\left(r\right)\equiv-\frac{\hbar^{2}}{2\mu}\frac{d^{2}}{dr^{2}}+V\left(r\right)+\frac{l\left(l+1\right)\hbar^{2}}{2\mu r^{2}} \ \ \ \ \ (5)

 

Equation 4 is similar to the 1-d Schrödinger equation except that the variable {r} goes from 0 to {\infty} rather than from {-\infty} to {\infty}, and the potential is modified by the ‘centrifugal term’ {\frac{l\left(l+1\right)\hbar^{2}}{2\mu r^{2}}}. Because {r} begins at 0 rather than {-\infty}, the usual boundary conditions on {U} (that it tend to zero at {\pm\infty}) must also be modified. We can get the new boundary conditions by imposing the hermiticity condition, which says that

\displaystyle \int_{0}^{\infty}U_{1}^*\left(D_{l}U_{2}\right)dr \displaystyle = \displaystyle \left[\int_{0}^{\infty}U_{2}^*\left(D_{l}U_{1}\right)dr\right]^*\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle \int_{0}^{\infty}\left(D_{l}U_{1}\right)^*U_{2}dr \ \ \ \ \ (7)

The two terms {V\left(r\right)+\frac{l\left(l+1\right)\hbar^{2}}{2\mu r^{2}}} in 5 are real and multiplicative, so the hermiticity condition is automatically satisfied for them. For the derivative term, we can use the usual integration by parts:

\displaystyle \int_{0}^{\infty}U_{1}^*\left(\frac{d^{2}}{dr^{2}}U_{2}\right)dr \displaystyle = \displaystyle \left.U_{1}^*\frac{dU_{2}}{dr}\right|_{0}^{\infty}-\int_{0}^{\infty}\frac{dU_{1}^*}{dr}\frac{dU_{2}}{dr}dr\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle \left.U_{1}^*\frac{dU_{2}}{dr}\right|_{0}^{\infty}-\left.U_{2}\frac{dU_{1}^*}{dr}\right|_{0}^{\infty}+\int_{0}^{\infty}U_{2}\left(\frac{d^{2}}{dr^{2}}U_{1}^*\right)dr \ \ \ \ \ (9)

If we require

\displaystyle \left.U_{1}^*\frac{dU_{2}}{dr}\right|_{0}^{\infty}-\left.U_{2}\frac{dU_{1}^*}{dr}\right|_{0}^{\infty}=0 \ \ \ \ \ (10)

then we have

\displaystyle \int_{0}^{\infty}U_{1}^*\left(\frac{d^{2}}{dr^{2}}U_{2}\right)dr \displaystyle = \displaystyle \int_{0}^{\infty}U_{2}\left(\frac{d^{2}}{dr^{2}}U_{1}^*\right)dr\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle \left[\int_{0}^{\infty}U_{2}^*\left(\frac{d^{2}}{dr^{2}}U_{1}\right)dr\right]^* \ \ \ \ \ (12)

and the hermiticity condition 6 is satisfied.
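As a quick numerical check (my own sketch, not part of Shankar's exercise), we can discretize {D_{l}} on a finite grid with the Dirichlet conditions {U\left(0\right)=U\left(r_{\max}\right)=0}, which make the boundary terms in 10 vanish, and verify that the resulting matrix is hermitian. Here the units {\hbar=2\mu=1}, the choice {V=0}, {l=1}, and the grid parameters are all arbitrary.

```python
import numpy as np

# Discretize D_l = -d^2/dr^2 + l(l+1)/r^2 (hbar = 2*mu = 1, V = 0)
# on the interior of (0, r_max); U = 0 at both ends (Dirichlet),
# which is the condition that kills the boundary terms in eq. 10.
N, rmax, l = 200, 10.0, 1
r = np.linspace(0.0, rmax, N + 2)[1:-1]    # interior grid points
h = r[1] - r[0]

# standard three-point second derivative with Dirichlet boundaries
D2 = (np.diag(np.ones(N - 1), -1) - 2.0*np.eye(N)
      + np.diag(np.ones(N - 1), 1))/h**2
Dl = -D2 + np.diag(l*(l + 1)/r**2)         # centrifugal term on the diagonal

print(np.allclose(Dl, Dl.conj().T))        # True: D_l is hermitian here
```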

Total angular momentum is Hermitian

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 12, Exercise 12.5.9.


The square of the total angular momentum operator, {L^{2}}, can be written in spherical coordinates as

\displaystyle L^{2}=-\hbar^{2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)+\frac{1}{\sin^{2}\theta}\frac{\partial^{2}}{\partial\phi^{2}}\right] \ \ \ \ \ (1)

 

As {L^{2}} is an observable, it should be Hermitian. We can verify this by showing that

\displaystyle \left\langle \psi_{2}\left|L^{2}\right|\psi_{1}\right\rangle =\left\langle \psi_{1}\left|L^{2}\right|\psi_{2}\right\rangle ^* \ \ \ \ \ (2)

In spherical coordinates, this becomes

\displaystyle \int\psi_{2}^*\left(L^{2}\psi_{1}\right)d\Omega=\left[\int\psi_{1}^*\left(L^{2}\psi_{2}\right)d\Omega\right]^* \ \ \ \ \ (3)

 

The element of solid angle {d\Omega=\sin\theta\;d\theta\;d\phi}, so the full integral is

\displaystyle \int\psi_{2}^*\left(L^{2}\psi_{1}\right)d\Omega=\int_{0}^{2\pi}\int_{0}^{\pi}\psi_{2}^*\left(L^{2}\psi_{1}\right)\sin\theta\;d\theta\;d\phi \ \ \ \ \ (4)

We can verify 3 by showing that it is true for each of the two terms in 1 separately. As usual for these sorts of integrals, we need to use integration by parts. To simplify things, we’ll consider {-L^{2}/\hbar^{2}} so we can deal only with the terms in the brackets in 1. We’ll also use the shorthand notation

\displaystyle s \displaystyle \equiv \displaystyle \sin\theta\ \ \ \ \ (5)
\displaystyle c \displaystyle \equiv \displaystyle \cos\theta \ \ \ \ \ (6)

Also, a prime indicates a derivative with respect to {\theta}: {\psi_{1}^{\prime}\equiv\frac{\partial\psi_{1}}{\partial\theta}}, etc.

For the first term, we have, considering only the integration over {\theta}:

\displaystyle \int_{0}^{\pi}\psi_{2}^*\frac{1}{s}\frac{\partial}{\partial\theta}\left(s\frac{\partial\psi_{1}}{\partial\theta}\right)s\;d\theta \displaystyle = \displaystyle \int_{0}^{\pi}\left[\psi_{2}^*c\psi_{1}^{\prime}+\psi_{2}^*s\psi_{1}^{\prime\prime}\right]d\theta\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle \left.\psi_{2}^*c\psi_{1}\right|_{0}^{\pi}+\left.\psi_{2}^*s\psi_{1}^{\prime}\right|_{0}^{\pi}-\ \ \ \ \ (8)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\left(\psi_{2}^*\right)^{\prime}c\psi_{1}-\psi_{2}^*s\psi_{1}\right]d\theta-\ \ \ \ \ (9)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\left(\psi_{2}^*\right)^{\prime}s\psi_{1}^{\prime}+\psi_{2}^*c\psi_{1}^{\prime}\right]d\theta \ \ \ \ \ (10)

The second term in 8 is zero since {\sin0=\sin\pi=0}, but we can’t ignore the first term, which is not, in general, zero. Thus we are left with

\displaystyle \int_{0}^{\pi}\psi_{2}^*\frac{\partial}{\partial\theta}\left(s\frac{\partial\psi_{1}}{\partial\theta}\right)\;d\theta \displaystyle = \displaystyle \left.\psi_{2}^*c\psi_{1}\right|_{0}^{\pi}-\ \ \ \ \ (11)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\left(\psi_{2}^*\right)^{\prime}c\psi_{1}-\psi_{2}^*s\psi_{1}\right]d\theta-\ \ \ \ \ (12)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\left(\psi_{2}^*\right)^{\prime}s\psi_{1}^{\prime}+\psi_{2}^*c\psi_{1}^{\prime}\right]d\theta \ \ \ \ \ (13)

We can now integrate the last line by parts again to get rid of the derivatives of {\psi_{1}}:

\displaystyle -\int_{0}^{\pi}\left[\left(\psi_{2}^*\right)^{\prime}s\psi_{1}^{\prime}+\psi_{2}^*c\psi_{1}^{\prime}\right]d\theta \displaystyle = \displaystyle -\left.\left(\psi_{2}^*\right)^{\prime}s\psi_{1}\right|_{0}^{\pi}-\left.\psi_{2}^*c\psi_{1}\right|_{0}^{\pi}+\ \ \ \ \ (14)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\psi_{1}\left(\psi_{2}^*\right)^{\prime\prime}s+\left(\psi_{2}^*\right)^{\prime}c\psi_{1}\right]d\theta+\ \ \ \ \ (15)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\psi_{1}\left(\psi_{2}^*\right)^{\prime}c-\psi_{2}^*s\psi_{1}\right]d\theta\ \ \ \ \ (16)
\displaystyle \displaystyle = \displaystyle -\left.\psi_{2}^*c\psi_{1}\right|_{0}^{\pi}+\ \ \ \ \ (17)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\psi_{1}\left(\psi_{2}^*\right)^{\prime\prime}s+\left(\psi_{2}^*\right)^{\prime}c\psi_{1}\right]d\theta+\ \ \ \ \ (18)
\displaystyle \displaystyle \displaystyle \int_{0}^{\pi}\left[\psi_{1}\left(\psi_{2}^*\right)^{\prime}c-\psi_{2}^*s\psi_{1}\right]d\theta \ \ \ \ \ (19)

In the second step the integrated term {-\left.\left(\psi_{2}^*\right)^{\prime}s\psi_{1}\right|_{0}^{\pi}} vanishes, again because {\sin0=\sin\pi=0}. Inserting this back into 11 and cancelling terms, we have

\displaystyle \int_{0}^{\pi}\psi_{2}^*\frac{\partial}{\partial\theta}\left(s\frac{\partial\psi_{1}}{\partial\theta}\right)\;d\theta=\int_{0}^{\pi}\left[\psi_{1}\left(\psi_{2}^*\right)^{\prime\prime}s+\left(\psi_{2}^*\right)^{\prime}c\psi_{1}\right]d\theta \ \ \ \ \ (20)

Comparing this with 7, we see that

\displaystyle \int_{0}^{\pi}\psi_{2}^*\frac{\partial}{\partial\theta}\left(s\frac{\partial\psi_{1}}{\partial\theta}\right)\;d\theta=\left[\int_{0}^{\pi}\psi_{1}^*\frac{\partial}{\partial\theta}\left(s\frac{\partial\psi_{2}}{\partial\theta}\right)\;d\theta\right]^* \ \ \ \ \ (21)

Thus the first term in 1 is Hermitian. (As this first term involves no derivatives with respect to {\phi}, the integration over {\phi} is automatically Hermitian.)

For the second term in 1, we need to consider only the integral over {\phi}, so we have

\displaystyle \int_{0}^{2\pi}\psi_{2}^*\frac{1}{\sin^{2}\theta}\frac{\partial^{2}\psi_{1}}{\partial\phi^{2}}\sin\theta\;d\phi=\frac{1}{s}\int_{0}^{2\pi}\psi_{2}^*\frac{\partial^{2}\psi_{1}}{\partial\phi^{2}}\;d\phi \ \ \ \ \ (22)

(As we’re integrating over {\phi}, terms in {\theta} act as constants and can be taken outside the integral.) The first integration by parts gives (where a prime now indicates a derivative with respect to {\phi}):

\displaystyle \int_{0}^{2\pi}\psi_{2}^*\psi_{1}^{\prime\prime}\;d\phi \displaystyle = \displaystyle \left.\psi_{2}^*\psi_{1}^{\prime}\right|_{0}^{2\pi}-\int_{0}^{2\pi}\left(\psi_{2}^*\right)^{\prime}\psi_{1}^{\prime}\;d\phi\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle -\int_{0}^{2\pi}\left(\psi_{2}^*\right)^{\prime}\psi_{1}^{\prime}\;d\phi \ \ \ \ \ (24)

This time, we’re able to set the integrated term to zero, since {\phi=0} and {\phi=2\pi} refer to the same angle. A second integration by parts gives

\displaystyle -\int_{0}^{2\pi}\left(\psi_{2}^*\right)^{\prime}\psi_{1}^{\prime}\;d\phi \displaystyle = \displaystyle -\left.\left(\psi_{2}^*\right)^{\prime}\psi_{1}\right|_{0}^{2\pi}+\int_{0}^{2\pi}\left(\psi_{2}^*\right)^{\prime\prime}\psi_{1}\;d\phi\ \ \ \ \ (25)
\displaystyle \displaystyle = \displaystyle \int_{0}^{2\pi}\left(\psi_{2}^*\right)^{\prime\prime}\psi_{1}\;d\phi\ \ \ \ \ (26)
\displaystyle \displaystyle = \displaystyle \left[\int_{0}^{2\pi}\psi_{1}^*\psi_{2}^{\prime\prime}\;d\phi\right]^* \ \ \ \ \ (27)

Thus both terms in 1 are Hermitian, so the complete operator {L^{2}} is also Hermitian.
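As a sanity check (my own sketch, not part of the exercise), we can apply the differential operator 1 to a pair of well-behaved test functions and verify 3 by explicit integration. The choices below are arbitrary: {\psi_{1}} is proportional to {Y_{1}^{1}} while {\psi_{2}} is not an eigenfunction of {L^{2}}, and we set {\hbar=1}.

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', real=True)

def L2(psi):
    # -[ (1/sin t) d/dt (sin t dpsi/dt) + (1/sin^2 t) d^2 psi/dphi^2 ], hbar = 1
    return -(sp.diff(sp.sin(theta)*sp.diff(psi, theta), theta)/sp.sin(theta)
             + sp.diff(psi, phi, 2)/sp.sin(theta)**2)

def braket(f, g):
    # <f|g> = integral of f* g over the full solid angle
    return sp.integrate(sp.conjugate(f)*g*sp.sin(theta),
                        (theta, 0, sp.pi), (phi, 0, 2*sp.pi))

psi1 = sp.sin(theta)*sp.exp(sp.I*phi)        # proportional to Y_1^1
psi2 = sp.sin(theta)**3*sp.exp(sp.I*phi)     # not an eigenfunction of L^2

lhs = braket(psi2, L2(psi1))
rhs = sp.conjugate(braket(psi1, L2(psi2)))
print(sp.simplify(lhs - rhs))                # 0, as eq. 3 requires
```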

Differential operators – matrix elements and hermiticity

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10.

Here, we’ll revisit the differential operator on a continuous vector space which we looked at earlier in its role as the momentum operator. This time around, we’ll use the bra-ket notation and vector space results to analyze it, hopefully putting it on a slightly firmer mathematical footing.

We define the differential operator {D} acting on a vector {\left|f\right\rangle } in a continuous vector space as having the action

\displaystyle D\left|f\right\rangle =\left|\frac{df}{dx}\right\rangle \ \ \ \ \ (1)

This notation means that {D} operating on {\left|f\right\rangle } produces the vector (ket) {\left|\frac{df}{dx}\right\rangle } corresponding to the function whose form in the {\left|x\right\rangle } basis is {\frac{df\left(x\right)}{dx}}. That is, the projection of {\left|\frac{df}{dx}\right\rangle } onto the basis vector {\left|x\right\rangle } is

\displaystyle \frac{df\left(x\right)}{dx}=\left\langle x\left|\frac{df}{dx}\right.\right\rangle =\left\langle x\left|D\right|f\right\rangle \ \ \ \ \ (2)

By a similar argument to that which we used to deduce the matrix element {\left\langle x\left|x^{\prime}\right.\right\rangle }, we can work out the matrix elements of {D} in the {\left|x\right\rangle } basis. Inserting the unit operator, we have

\displaystyle \left\langle x\left|D\right|f\right\rangle \displaystyle = \displaystyle \int dx^{\prime}\left\langle x\left|D\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle \ \ \ \ \ (3)
\displaystyle \displaystyle = \displaystyle \int dx^{\prime}\left\langle x\left|D\right|x^{\prime}\right\rangle f\left(x^{\prime}\right) \ \ \ \ \ (4)

We need this to be equal to {\frac{df}{dx}}. To get this, we can introduce the derivative of the delta function, except this time the delta function is a function of {x-x^{\prime}} rather than just {x} on its own. To see the effect of this derivative, consider the integral

\displaystyle \int dx^{\prime}\frac{d\delta\left(x-x^{\prime}\right)}{dx}f\left(x^{\prime}\right)=\frac{d}{dx}\int dx^{\prime}\delta\left(x-x^{\prime}\right)f\left(x^{\prime}\right)=\frac{df\left(x\right)}{dx} \ \ \ \ \ (5)

In the first step, we could take the derivative outside the integral since {x} is a constant with respect to the integration. Comparing this with 4 we see that

\displaystyle \left\langle x\left|D\right|x^{\prime}\right\rangle \equiv D_{xx^{\prime}}=\frac{d\delta\left(x-x^{\prime}\right)}{dx}=\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (6)

Here the prime in {\delta^{\prime}} means derivative with respect to {x}, not {x^{\prime}}. [Note that this is not the same formula as that quoted in the earlier post, where we had {f\left(x\right)\delta^{\prime}\left(x\right)=-f^{\prime}\left(x\right)\delta\left(x\right)} because in that formula it was the same variable {x} that was involved in the derivative of the delta function and in the integral.]

The operator {D} is not hermitian as it stands. Since the delta function is real, the matrix elements of the adjoint are {D_{xx^{\prime}}^{\dagger}=\left\langle x^{\prime}\left|D\right|x\right\rangle ^*}, so

\displaystyle D_{xx^{\prime}}^{\dagger}=\delta^{\prime}\left(x^{\prime}-x\right)=-\delta^{\prime}\left(x-x^{\prime}\right)\ne D_{xx^{\prime}} \ \ \ \ \ (7)

Thus {D} is anti-hermitian. It is easy to fix this and create a hermitian operator by multiplying by an imaginary number, such as {-i} (this choice is, of course, to make the new operator consistent with the momentum operator). Calling this new operator {K\equiv-iD} we have

\displaystyle K_{xx^{\prime}}^{\dagger}=\left\langle x^{\prime}\left|K\right|x\right\rangle ^*=i\delta^{\prime}\left(x^{\prime}-x\right)=-i\delta^{\prime}\left(x-x^{\prime}\right)=K_{xx^{\prime}} \ \ \ \ \ (8)

A curious fact about {K} (and thus about the momentum operator as well) is that it is not automatically hermitian even with this correction. We’ve seen that it satisfies the hermiticity property with respect to its matrix elements in the position basis, but to be fully hermitian, it must satisfy

\displaystyle \left\langle g\left|K\right|f\right\rangle =\left\langle f\left|K\right|g\right\rangle ^* \ \ \ \ \ (9)

for any two vectors {\left|f\right\rangle } and {\left|g\right\rangle }. Suppose we are interested in {x} over some range {\left[a,b\right]}. Then by inserting a couple of identity operators, we have

\displaystyle \left\langle g\left|K\right|f\right\rangle \displaystyle = \displaystyle \int_{a}^{b}\int_{a}^{b}\left\langle g\left|x\right.\right\rangle \left\langle x\left|K\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx\;dx^{\prime}\ \ \ \ \ (10)
\displaystyle \displaystyle = \displaystyle -i\int_{a}^{b}g^*\left(x\right)\frac{df}{dx}dx\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle -i\left.g^*\left(x\right)f\left(x\right)\right|_{a}^{b}+i\int_{a}^{b}f\left(x\right)\frac{dg^*}{dx}dx\ \ \ \ \ (12)
\displaystyle \displaystyle = \displaystyle -i\left.g^*\left(x\right)f\left(x\right)\right|_{a}^{b}+\left\langle f\left|K\right|g\right\rangle ^* \ \ \ \ \ (13)

The result is hermitian only if the first term in the last line is zero, which happens only for certain choices of {f} and {g}. If the limits are infinite, so we’re integrating over all space, and the system is bounded so that both {f} and {g} go to zero at infinity, then we’re OK, and {K} is hermitian. Another option is for {g} and {f} to be periodic, with the range of integration equal to an integral multiple of the period; then {g^*f} has the same value at each end and the term becomes zero.
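To see the periodic case concretely, here is a minimal sketch (my own, with arbitrary grid parameters) that represents {K=-iD} as a central-difference matrix on a periodic grid, where the wrap-around plays the role of the vanishing boundary term, and checks hermiticity numerically:

```python
import numpy as np

# K = -i d/dx as a central-difference matrix on a periodic grid [0, L)
N, L = 64, 2*np.pi
dx = L/N
# D[i, j] = (delta_{j,i+1} - delta_{j,i-1})/(2 dx), with wrap-around (periodic BCs)
D = (np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1))/(2*dx)
K = -1j*D

print(np.allclose(K, K.conj().T))   # True: K is hermitian with periodic BCs
```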

However, as we’ve seen, in quantum mechanics there are cases where we deal with functions such as {e^{ikx}} (for {k} real) that oscillate indefinitely, no matter how large {x} is (see the free particle, for example). There isn’t any mathematically airtight way around such cases (as far as I know), but a hand-wavy way of defining a limit for such oscillating functions is to consider their average behaviour as {x\rightarrow\pm\infty}. The average defined by Shankar is given as

\displaystyle \lim_{x\rightarrow\infty}e^{ikx}e^{-ik^{\prime}x}=\lim_{\substack{L\rightarrow\infty\\ \Delta\rightarrow\infty } }\frac{1}{\Delta}\int_{L}^{L+\Delta}e^{i\left(k-k^{\prime}\right)x}dx \ \ \ \ \ (14)

This is interpreted as looking at the function very far out on the {x} axis (at position {L}), and then considering a very long interval {\Delta} starting at point {L}. Since the integral of {e^{i\left(k-k^{\prime}\right)x}} over one period is zero (it’s just a combination of sine and cosine functions), the integral is always bounded between 0 and the area under half a cycle, as successive half-cycles cancel each other. Dividing by {\Delta}, which is monotonically increasing, ensures that the limit is zero.
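We can watch this averaging at work numerically (a sketch; the values of {k-k^{\prime}} and {L} below are arbitrary, and the exact antiderivative of {e^{i\left(k-k^{\prime}\right)x}} is used):

```python
import numpy as np

# (1/Delta) * integral of e^{iax} from L to L+Delta, a = k - k', in closed form
a, L = -1.0, 100.0                      # arbitrary choices
for Delta in [1e1, 1e3, 1e5]:
    avg = (np.exp(1j*a*(L + Delta)) - np.exp(1j*a*L))/(1j*a*Delta)
    print(f"Delta = {Delta:.0e}, |average| = {abs(avg):.2e}")
# the magnitude falls off like 1/Delta, so the limit in eq. 14 is zero
```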

This isn’t an ideal solution, but it’s just one of many cases where an infinitely oscillating function is called upon to do seemingly impossible things. The theory seems to hang together fairly well in any case.

Functions of hermitian operators

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercises 1.9.1 – 1.9.3.

One of the most common ways to define a function of an operator is to consider the case where the function can be expressed as a power series. That is, given an operator {\Omega}, a function {f\left(\Omega\right)} can be defined as

\displaystyle f\left(\Omega\right)=\sum_{n=0}^{\infty}a_{n}\Omega^{n} \ \ \ \ \ (1)

where the coefficients {a_{n}} are, in general, complex scalars. This definition can still be difficult to deal with if {\Omega} is not diagonalizable since, in that case, powers of {\Omega} have no simple form, so it can be hard to tell if the series converges.

We can avoid this problem by restricting ourselves to hermitian operators, since such operators are always diagonalizable according to the spectral theorem and all eigenvalues of hermitian operators are real. Then, working in the eigenbasis, powers of {\Omega} are easy to calculate, since if the {i}th diagonal element of {\Omega} is {\omega_{i}}, the {i}th diagonal element of {\Omega^{n}} is {\omega_{i}^{n}}. The problem of finding {f\left(\Omega\right)} is then reduced to examining whether the series converges for each diagonal element.

Example 1 Suppose we have the simplest such power series

\displaystyle f\left(\Omega\right)=\sum_{n=0}^{\infty}\Omega^{n} \ \ \ \ \ (2)

 

If we look at this series in the eigenbasis (the basis of orthonormal eigenvectors that diagonalizes {\Omega}), then we have

\displaystyle f\left(\Omega\right)=\left[\begin{array}{cccc} \sum_{n=0}^{\infty}\omega_{1}^{n}\\ & \sum_{n=0}^{\infty}\omega_{2}^{n}\\ & & \ddots\\ & & & \sum_{n=0}^{\infty}\omega_{m}^{n} \end{array}\right] \ \ \ \ \ (3)

 

{\Omega} here is an {m\times m} matrix with eigenvalues {\omega_{i}}, {i=1,\ldots,m} (it’s possible that some of the eigenvalues could be equal, if {\Omega} is degenerate, but that doesn’t affect the argument).

It’s known that the geometric series

\displaystyle f\left(x\right)=\sum_{n=0}^{\infty}x^{n}=\frac{1}{1-x} \ \ \ \ \ (4)

converges as shown, provided that {\left|x\right|<1}. Thus we see that {f\left(\Omega\right)} converges provided all its eigenvalues satisfy {\left|\omega_{i}\right|<1}. The function is then

\displaystyle f\left(\Omega\right)=\left[\begin{array}{cccc} \frac{1}{1-\omega_{1}}\\ & \frac{1}{1-\omega_{2}}\\ & & \ddots\\ & & & \frac{1}{1-\omega_{m}} \end{array}\right] \ \ \ \ \ (5)

 

To see what operator it converges to, we consider the function

\displaystyle g\left(\Omega\right)=\left(I-\Omega\right)^{-1} \ \ \ \ \ (6)

Still working in the eigenbasis where {\Omega} is diagonal, the matrix {I-\Omega} is also diagonal with diagonal elements {1-\omega_{i}}. The inverse of a diagonal matrix is another diagonal matrix whose diagonal elements are the reciprocals of those in the original matrix, so {\left(I-\Omega\right)^{-1}} has diagonal elements {\frac{1}{1-\omega_{i}}}. Comparing with 5, we see that

\displaystyle f\left(\Omega\right)=\sum_{n=0}^{\infty}\Omega^{n}=\left(I-\Omega\right)^{-1} \ \ \ \ \ (7)

provided all the eigenvalues of {\Omega} satisfy {\left|\omega_{i}\right|<1}.
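A quick numerical illustration (a sketch with an arbitrary random matrix): we build a hermitian {\Omega} whose eigenvalues all lie inside the unit interval, sum the series to many terms, and compare with {\left(I-\Omega\right)^{-1}}.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4)) + 1j*rng.standard_normal((4, 4))
Omega = (A + A.conj().T)/2                           # hermitian by construction
Omega /= 2*np.abs(np.linalg.eigvalsh(Omega)).max()   # force all |w_i| <= 1/2

# partial sums of sum_n Omega^n versus the closed form (I - Omega)^(-1)
S = np.zeros_like(Omega)
P = np.eye(4, dtype=complex)
for _ in range(200):
    S += P
    P = P @ Omega

print(np.allclose(S, np.linalg.inv(np.eye(4) - Omega)))   # True
```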

Example 2 If {H} is a hermitian operator, then {e^{iH}} is unitary. To see this, we again work in the eigenbasis of {H}. By expressing {e^{iH}} as a power series and using the same argument as in the previous example, we see that

\displaystyle U=e^{iH}=\left[\begin{array}{cccc} e^{i\omega_{1}}\\ & e^{i\omega_{2}}\\ & & \ddots\\ & & & e^{i\omega_{m}} \end{array}\right] \ \ \ \ \ (8)

 

The adjoint of {e^{iH}} is found by looking at the power series:

\displaystyle U^{\dagger}=\left(e^{iH}\right)^{\dagger} \displaystyle = \displaystyle \left[\sum_{n=0}^{\infty}\frac{\left(iH\right)^{n}}{n!}\right]^{\dagger}\ \ \ \ \ (9)
\displaystyle \displaystyle = \displaystyle \sum_{n=0}^{\infty}\frac{\left(-iH^{\dagger}\right)^{n}}{n!}\ \ \ \ \ (10)
\displaystyle \displaystyle = \displaystyle \sum_{n=0}^{\infty}\frac{\left(-iH\right)^{n}}{n!}\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle e^{-iH} \ \ \ \ \ (12)

where in the third line we used the hermitian property {H^{\dagger}=H}. Therefore

\displaystyle \left(e^{iH}\right)^{\dagger} \displaystyle = \displaystyle e^{-iH}=\left[\begin{array}{cccc} e^{-i\omega_{1}}\\ & e^{-i\omega_{2}}\\ & & \ddots\\ & & & e^{-i\omega_{m}} \end{array}\right]\ \ \ \ \ (13)
\displaystyle U^{\dagger}U=\left(e^{iH}\right)^{\dagger}e^{iH} \displaystyle = \displaystyle \left[\begin{array}{cccc} e^{-i\omega_{1}}\\ & e^{-i\omega_{2}}\\ & & \ddots\\ & & & e^{-i\omega_{m}} \end{array}\right]\left[\begin{array}{cccc} e^{i\omega_{1}}\\ & e^{i\omega_{2}}\\ & & \ddots\\ & & & e^{i\omega_{m}} \end{array}\right]\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle I \ \ \ \ \ (15)

Thus {\left(e^{iH}\right)^{\dagger}=\left(e^{iH}\right)^{-1}} and {e^{iH}} is unitary.

From 8 we can find the determinant of {e^{iH}}:

\displaystyle \det U=\det e^{iH}=\exp\left[i\sum_{i=1}^{m}\omega_{i}\right]=\exp\left(i\mbox{Tr}H\right) \ \ \ \ \ (16)

since the trace of a hermitian matrix is the sum of its eigenvalues.
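Both results are easy to verify numerically for a random hermitian matrix (a sketch; scipy's matrix exponential supplies {e^{iH}}):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))
H = (A + A.conj().T)/2                    # hermitian by construction
U = expm(1j*H)

print(np.allclose(U.conj().T @ U, np.eye(3)))                  # U is unitary
print(np.allclose(np.linalg.det(U), np.exp(1j*np.trace(H))))   # det U = e^{i Tr H}
```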

Simultaneous diagonalization of hermitian matrices

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercise 1.8.10.

The spectral theorem guarantees that any normal operator can be unitarily diagonalized. For commuting hermitian operators we can go one step further and show that a set of such operators can be simultaneously diagonalized with a single unitary transformation. The proof is a bit lengthy and is spelled out in full both in Zwiebach’s notes (chapter 6) and in Shankar’s book (chapter 1, theorem 13) so I won’t reproduce it in full here. To summarize the main points:

We can start by considering two operators {\Omega} and {\Lambda} and assume that at least one of them, say {\Omega}, is nondegenerate, that is, for each eigenvalue there is only one eigenvector (up to multiplication by a scalar). Then for one eigenvalue {\omega_{i}} of {\Omega} we have

\displaystyle  \Omega\left|\omega_{i}\right\rangle =\omega_{i}\left|\omega_{i}\right\rangle \ \ \ \ \ (1)

Applying {\Lambda} to both sides of 1 and using {\Lambda\Omega=\Omega\Lambda}, we have

\displaystyle  \Omega\Lambda\left|\omega_{i}\right\rangle =\Lambda\Omega\left|\omega_{i}\right\rangle =\omega_{i}\Lambda\left|\omega_{i}\right\rangle \ \ \ \ \ (2)

so that {\Lambda\left|\omega_{i}\right\rangle } is also an eigenvector of {\Omega} for eigenvalue {\omega_{i}}. However, since {\Omega} is nondegenerate, {\Lambda\left|\omega_{i}\right\rangle } must be a multiple of {\left|\omega_{i}\right\rangle } so that

\displaystyle  \Lambda\left|\omega_{i}\right\rangle =\lambda_{i}\left|\omega_{i}\right\rangle \ \ \ \ \ (3)

so that {\left|\omega_{i}\right\rangle } is an eigenvector of {\Lambda} for eigenvalue {\lambda_{i}}. Therefore a unitary transformation that diagonalizes {\Omega} will also diagonalize {\Lambda}. Note that in this case we did need {\Omega} and {\Lambda} to commute (in 2), but we didn’t need to assume that {\Lambda} is nondegenerate.

If both {\Omega} and {\Lambda} are degenerate, things are a bit more complicated, but the basic idea is this. Suppose we find a basis that diagonalizes {\Omega} and arrange the basis vectors within the unitary matrix {U} in an order that groups all equal eigenvalues together, so that all the eigenvectors corresponding to eigenvalue {\omega_{1}} occur first, followed by all the eigenvectors corresponding to eigenvalue {\omega_{2}} and so on, up to eigenvalue {\omega_{m}} where {m<n} is the number of distinct eigenvalues (which is less than the dimension {n} of the matrix {\Omega} because {\Omega} is degenerate).

Each subset of eigenvectors corresponding to a single eigenvalue forms a subspace, and we can show that the other matrix {\Lambda}, operating on a vector from that subspace, transforms it into another vector that also lies within the same subspace. Now, any linearly independent selection of basis vectors within the subspace will still diagonalize {\Omega} for that eigenvalue, so we can select a set of basis vectors within that subspace that also diagonalizes {\Lambda} within that subspace. The process can be repeated for each eigenvalue of {\Omega}, resulting in a set of basis vectors that diagonalizes both matrices.

Obviously, I’ve left out the technical details of just how this is done, but you can refer to either Zwiebach’s notes or Shankar’s book for the details.

As an example, consider the two matrices

\displaystyle   \Omega \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 1 & 0 & 1\\ 0 & 0 & 0\\ 1 & 0 & 1 \end{array}\right]\ \ \ \ \ (4)
\displaystyle  \Lambda \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 2 & 1 & 1\\ 1 & 0 & -1\\ 1 & -1 & 2 \end{array}\right] \ \ \ \ \ (5)

We can verify that they commute:

\displaystyle  \Omega\Lambda=\Lambda\Omega=\left[\begin{array}{ccc} 3 & 0 & 3\\ 0 & 0 & 0\\ 3 & 0 & 3 \end{array}\right] \ \ \ \ \ (6)

We can find the eigenvalues and eigenvectors of {\Omega} and {\Lambda} in the usual way. For {\Omega} we have

\displaystyle   \det\left(\Omega-\omega I\right) \displaystyle  = \displaystyle  0\ \ \ \ \ (7)
\displaystyle  \left(1-\omega\right)\left[\left(-\omega\left(1-\omega\right)\right)\right]+\omega \displaystyle  = \displaystyle  0\ \ \ \ \ (8)
\displaystyle  \omega\left(2\omega-\omega^{2}\right) \displaystyle  = \displaystyle  0\ \ \ \ \ (9)
\displaystyle  \omega \displaystyle  = \displaystyle  0,0,2 \ \ \ \ \ (10)

Solving the eigenvector equation, we get, for {\omega=0}

\displaystyle   \left(\Omega-\omega I\right)\left[\begin{array}{c} a\\ b\\ c \end{array}\right] \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (11)
\displaystyle  \left[\begin{array}{ccc} 1 & 0 & 1\\ 0 & 0 & 0\\ 1 & 0 & 1 \end{array}\right]\left[\begin{array}{c} a\\ b\\ c \end{array}\right] \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (12)
\displaystyle  a \displaystyle  = \displaystyle  -c\ \ \ \ \ (13)
\displaystyle  b \displaystyle  = \displaystyle  \mbox{anything} \ \ \ \ \ (14)

Thus two orthonormal eigenvectors are

\displaystyle   \left|0_{1}\right\rangle \displaystyle  = \displaystyle  \frac{1}{\sqrt{2}}\left[\begin{array}{c} 1\\ 0\\ -1 \end{array}\right]\ \ \ \ \ (15)
\displaystyle  \left|0_{2}\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 1\\ 0 \end{array}\right] \ \ \ \ \ (16)

For {\omega=2}:

\displaystyle   \left[\begin{array}{ccc} -1 & 0 & 1\\ 0 & -2 & 0\\ 1 & 0 & -1 \end{array}\right]\left[\begin{array}{c} a\\ b\\ c \end{array}\right] \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (17)
\displaystyle  a \displaystyle  = \displaystyle  c\ \ \ \ \ (18)
\displaystyle  b \displaystyle  = \displaystyle  0\ \ \ \ \ (19)
\displaystyle  \left|2\right\rangle \displaystyle  = \displaystyle  \frac{1}{\sqrt{2}}\left[\begin{array}{c} 1\\ 0\\ 1 \end{array}\right] \ \ \ \ \ (20)

For {\Lambda}, we can go through the same procedure to find

\displaystyle   \det\left(\Lambda-\lambda I\right) \displaystyle  = \displaystyle  0\ \ \ \ \ (21)
\displaystyle  -\lambda\left(2-\lambda\right)^{2}+\lambda-2+\lambda-2-2+\lambda \displaystyle  = \displaystyle  0\ \ \ \ \ (22)
\displaystyle  \left(\lambda-2\right)\left[\lambda\left(2-\lambda\right)+3\right] \displaystyle  = \displaystyle  0\ \ \ \ \ (23)
\displaystyle  \lambda \displaystyle  = \displaystyle  -1,2,3 \ \ \ \ \ (24)

We could calculate the eigenvectors from scratch, but from the simultaneous diagonalization theorem, we know that the eigenvector {\left|2\right\rangle } from {\Omega} must be an eigenvector of {\Lambda}, and we find by direct calculation that

\displaystyle  \Lambda\left|2\right\rangle =3\left|2\right\rangle \ \ \ \ \ (25)

so {\left|2\right\rangle } is the eigenvector for {\lambda=3}.

For the other two eigenvalues of {\Lambda}, we know the eigenvectors must be linear combinations of {\left|0_{1}\right\rangle } and {\left|0_{2}\right\rangle } from {\Omega}. Such a combination must have form

\displaystyle  a\left|0_{1}\right\rangle +b\left|0_{2}\right\rangle =\left[\begin{array}{c} a\\ b\\ -a \end{array}\right] \ \ \ \ \ (26)

so we must have

\displaystyle  \Lambda\left[\begin{array}{c} a\\ b\\ -a \end{array}\right]=\left[\begin{array}{c} a+b\\ 2a\\ -a-b \end{array}\right]=\lambda\left[\begin{array}{c} a\\ b\\ -a \end{array}\right] \ \ \ \ \ (27)

for {\lambda=-1,2}. For {\lambda=2}, we have

\displaystyle   a \displaystyle  = \displaystyle  b\ \ \ \ \ (28)
\displaystyle  \left|\lambda=2\right\rangle \displaystyle  = \displaystyle  \frac{1}{\sqrt{3}}\left[\begin{array}{c} 1\\ 1\\ -1 \end{array}\right] \ \ \ \ \ (29)

For {\lambda=-1}:

\displaystyle   b \displaystyle  = \displaystyle  -2a\ \ \ \ \ (30)
\displaystyle  \left|\lambda=-1\right\rangle \displaystyle  = \displaystyle  \frac{1}{\sqrt{6}}\left[\begin{array}{c} 1\\ -2\\ -1 \end{array}\right] \ \ \ \ \ (31)

The columns of the unitary transformation matrix are therefore given by 29, 31 and 20, so we have

\displaystyle   U \displaystyle  = \displaystyle  \left[\begin{array}{ccc} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0\\ -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \end{array}\right]\ \ \ \ \ (32)
\displaystyle  U^{\dagger} \displaystyle  = \displaystyle  \left[\begin{array}{ccc} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array}\right] \ \ \ \ \ (33)

By matrix multiplication, we can verify that this transformation diagonalizes both {\Omega} and {\Lambda}:

\displaystyle   U^{\dagger}\Omega U \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 2 \end{array}\right]\ \ \ \ \ (34)
\displaystyle  U^{\dagger}\Lambda U \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 2 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 3 \end{array}\right] \ \ \ \ \ (35)
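We can confirm 34 and 35 numerically with a short sketch reproducing the example above (since {U} is real, {U^{\dagger}=U^{T}}):

```python
import numpy as np

Omega = np.array([[1, 0, 1], [0, 0, 0], [1, 0, 1]], dtype=float)
Lam   = np.array([[2, 1, 1], [1, 0, -1], [1, -1, 2]], dtype=float)

# columns are the eigenvectors from eqs. 29, 31 and 20
U = np.column_stack([np.array([1, 1, -1])/np.sqrt(3),
                     np.array([1, -2, -1])/np.sqrt(6),
                     np.array([1, 0, 1])/np.sqrt(2)])

print(np.round(U.T @ Omega @ U, 12))   # diag(0, 0, 2)
print(np.round(U.T @ Lam @ U, 12))     # diag(2, -1, 3)
```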

Hermitian matrices – example with 4 matrices

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercise 1.8.8.

Suppose we have four hermitian matrices {M^{i}} for {i=1,2,3,4} that obey the relation

\displaystyle M^{i}M^{j}+M^{j}M^{i}=2\delta^{ij}I \ \ \ \ \ (1)

We can find the possible eigenvalues as follows. Suppose we choose an orthonormal basis (such a basis always exists for a hermitian matrix) {\left\{ e\right\} } in which {M^{i}} is diagonal for one particular value of {i}. That is, for a basis vector {\left|e_{k}\right\rangle } in this basis, we have {M^{i}\left|e_{k}\right\rangle =\omega_{k}^{i}\left|e_{k}\right\rangle }, where {\omega_{k}^{i}} is the {k}th eigenvalue of {M^{i}}.

Then with {i=j} above, we have

\displaystyle 2\left(M^{i}\right)^{2} \displaystyle = \displaystyle 2I\ \ \ \ \ (2)
\displaystyle \left(M^{i}\right)^{2} \displaystyle = \displaystyle I \ \ \ \ \ (3)

Operating on a basis vector {\left|e_{k}\right\rangle } and using the eigenvalue equation twice, we get

\displaystyle \left(M^{i}\right)^{2}\left|e_{k}\right\rangle \displaystyle = \displaystyle \left|e_{k}\right\rangle \ \ \ \ \ (4)
\displaystyle \displaystyle = \displaystyle \left(\omega_{k}^{i}\right)^{2}\left|e_{k}\right\rangle \ \ \ \ \ (5)

Therefore, the possible values of {\omega_{k}^{i}} are {\pm1}. We didn’t choose any particular value for {i}, so this is true of all four matrices.

Now, for {i\ne j} we have

\displaystyle M^{i}M^{j}=-M^{j}M^{i} \ \ \ \ \ (6)

 

We can find the trace of {M^{j}} as follows. Assuming {i\ne j}

\displaystyle \mbox{Tr}M^{j} \displaystyle = \displaystyle \mbox{Tr}\left(M^{i}M^{i}M^{j}\right)\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle -\mbox{Tr}\left(M^{i}M^{j}M^{i}\right)\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle -\mbox{Tr}\left(M^{i}M^{i}M^{j}\right)\ \ \ \ \ (9)
\displaystyle \displaystyle = \displaystyle -\mbox{Tr}M^{j} \ \ \ \ \ (10)

In line 1 we used 3, in line 2 we used 6 and in line 3 we used the cyclic property of the trace. Thus {\mbox{Tr}M^{j}=-\mbox{Tr}M^{j}=0}.

Since each {M^{j}} has zero trace, the trace is the sum of the eigenvalues, and the only possible eigenvalues are {\pm1}, the eigenvalue {+1} must occur the same number of times as {-1}. Thus each {M^{j}} has an even number of eigenvalues, so the matrices must be even-dimensional.
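As a concrete illustration (my own addition; the exercise itself doesn't require it), the smallest dimension in which four such matrices exist is 4, realized for example by the Dirac matrices {\alpha_{x},\alpha_{y},\alpha_{z},\beta} built from the Pauli matrices. The sketch below checks the anticommutation relation 1, the {\pm1} eigenvalues and the zero traces:

```python
import numpy as np

# Pauli matrices
s = [np.array([[0, 1], [1, 0]], complex),
     np.array([[0, -1j], [1j, 0]], complex),
     np.array([[1, 0], [0, -1]], complex)]
I2, Z = np.eye(2), np.zeros((2, 2), complex)

M = [np.block([[Z, sk], [sk, Z]]) for sk in s]   # alpha_x, alpha_y, alpha_z
M.append(np.block([[I2, Z], [Z, -I2]]))          # beta

for i in range(4):
    for j in range(4):
        anti = M[i] @ M[j] + M[j] @ M[i]
        assert np.allclose(anti, 2*(i == j)*np.eye(4))   # eq. 1 holds

for Mi in M:
    print(np.linalg.eigvalsh(Mi).round(12), np.trace(Mi).real)
    # eigenvalues: [-1, -1, 1, 1]; trace: 0.0
```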

Spectral theorem for normal operators

References: edX online course MIT 8.05 Week 6.

We’ll now look at a central theorem about normal operators, known as the spectral theorem.

We’ve seen that if a matrix {M} has a set {v} of eigenvectors that span the space, then we can diagonalize {M} by means of the similarity transformation

\displaystyle D_{M}=A^{-1}MA \ \ \ \ \ (1)

where {D_{M}} is diagonal and the columns of {A} are the eigenvectors of {M}. In the general case, there’s no guarantee that the eigenvectors of {M} are orthonormal. However, if there is an orthonormal basis in which {M} is diagonal, then {M} is said to be unitarily diagonalizable. Suppose we start with some arbitrary orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} (we can always construct such a basis using the Gram-Schmidt procedure). Then if the set of eigenvectors of {M} form an orthonormal basis {\left(u_{1},\ldots,u_{n}\right)}, there is a unitary matrix {U} that transforms the {e_{i}} basis into the {u_{i}} basis (since unitary operators preserve inner products):

\displaystyle u_{i}=Ue_{i}=\sum_{j}U_{ji}e_{j} \ \ \ \ \ (2)

Using this unitary operator, we therefore have for a unitarily diagonalizable operator {M}

\displaystyle D_{M}=U^{-1}MU=U^{\dagger}MU \ \ \ \ \ (3)

 

The spectral theorem now states:

Theorem An operator {M} in a complex vector space has an orthonormal basis of eigenvectors (that is, it’s unitarily diagonalizable) if and only if {M} is normal.

Proof: Since this is an ‘if and only if’ theorem, we need to prove it in both directions. First, suppose that {M} is unitarily diagonalizable, so that 3 holds for some {U}. Then

\displaystyle M \displaystyle = \displaystyle UD_{M}U^{\dagger}\ \ \ \ \ (4)
\displaystyle M^{\dagger} \displaystyle = \displaystyle UD_{M}^{\dagger}U^{\dagger} \ \ \ \ \ (5)

The commutator is then, since {U^{\dagger}U=I}

\displaystyle \left[M^{\dagger},M\right] \displaystyle = \displaystyle UD_{M}^{\dagger}D_{M}U^{\dagger}-UD_{M}D_{M}^{\dagger}U^{\dagger}\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle U\left[D_{M}^{\dagger},D_{M}\right]U^{\dagger}\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle 0 \ \ \ \ \ (8)

where the result follows because all diagonal matrices commute. Thus {M} is normal, and this completes one direction of the proof.

Going the other way is a bit trickier. We need to show that for any normal matrix {M} with elements defined on some arbitrary orthonormal basis (that is, a basis that is not necessarily composed of eigenvectors of {M}), there is a unitary matrix {U} such that {U^{\dagger}MU} is diagonal. Since we started with an orthonormal basis and {U} preserves inner products, the new basis is also orthonormal which will prove the theorem.

The proof uses mathematical induction, in which we first prove that the result is true for one specific dimension of vector space, say {\mbox{dim }V=1}. We can then assume the result is true for some dimension {n-1} and from that assumption, prove it is also true for the next higher dimension {n}.

Since any {1\times1} matrix is diagonal (it consists of only one element), the result is true for {\mbox{dim }V=1}. So we now assume it’s true for a dimension of {n-1} and prove it’s true for a dimension of {n}.

We take an arbitrary orthonormal basis of the {n}-dimensional {V} to be {\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)}. In that basis, the matrix {M} has elements {M_{ij}=\left\langle i\left|M\right|j\right\rangle }. We know that {M} has at least one eigenvalue {\lambda_{1}} with a normalized eigenvector {\left|x_{1}\right\rangle }:

\displaystyle M\left|x_{1}\right\rangle =\lambda_{1}\left|x_{1}\right\rangle \ \ \ \ \ (9)

and, since {M} is normal, the eigenvector {\left|x_{1}\right\rangle } is also an eigenvector of {M^{\dagger}}:

\displaystyle M^{\dagger}\left|x_{1}\right\rangle =\lambda_{1}^*\left|x_{1}\right\rangle \ \ \ \ \ (10)

 

Starting with a basis of {V} containing {\left|x_{1}\right\rangle }, we can use Gram-Schmidt to generate an orthonormal basis {\left(\left|x_{1}\right\rangle ,\ldots,\left|x_{n}\right\rangle \right)}. We now define an operator {U_{1}} as follows:

\displaystyle U_{1}\equiv\sum_{i}\left|x_{i}\right\rangle \left\langle i\right| \ \ \ \ \ (11)

{U_{1}} is unitary, since

\displaystyle U_{1}^{\dagger} \displaystyle = \displaystyle \sum_{i}\left|i\right\rangle \left\langle x_{i}\right|\ \ \ \ \ (12)
\displaystyle U_{1}^{\dagger}U_{1} \displaystyle = \displaystyle \sum_{i}\sum_{j}\left|i\right\rangle \left\langle x_{i}\left|x_{j}\right.\right\rangle \left\langle j\right|\ \ \ \ \ (13)
\displaystyle \displaystyle = \displaystyle \sum_{i}\sum_{j}\left|i\right\rangle \delta_{ij}\left\langle j\right|\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \sum_{i}\left|i\right\rangle \left\langle i\right|\ \ \ \ \ (15)
\displaystyle \displaystyle = \displaystyle I \ \ \ \ \ (16)

From its definition

\displaystyle U_{1}\left|1\right\rangle \displaystyle = \displaystyle \left|x_{1}\right\rangle \ \ \ \ \ (17)
\displaystyle U_{1}^{\dagger}\left|x_{1}\right\rangle \displaystyle = \displaystyle \left|1\right\rangle \ \ \ \ \ (18)

Now consider the matrix {M_{1}} defined as

\displaystyle M_{1}\equiv U_{1}^{\dagger}MU_{1} \ \ \ \ \ (19)

 

{M_{1}} is also normal, as can be verified by calculating the commutator and using {\left[M^{\dagger},M\right]=0}. Further

\displaystyle M_{1}\left|1\right\rangle \displaystyle = \displaystyle U_{1}^{\dagger}MU_{1}\left|1\right\rangle \ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle U_{1}^{\dagger}M\left|x_{1}\right\rangle \ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle \lambda_{1}U_{1}^{\dagger}\left|x_{1}\right\rangle \ \ \ \ \ (22)
\displaystyle \displaystyle = \displaystyle \lambda_{1}\left|1\right\rangle \ \ \ \ \ (23)

Thus {\left|1\right\rangle } is an eigenvector of {M_{1}} with eigenvalue {\lambda_{1}}.

The matrix elements in the first column of {M_{1}} in the original basis {\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)} are

\displaystyle \left\langle j\left|M_{1}\right|1\right\rangle =\lambda_{1}\left\langle j\left|1\right.\right\rangle =\lambda_{1}\delta_{1j} \ \ \ \ \ (24)

Thus all entries in the first column are zero except for the first row, where it is {\lambda_{1}}. How about the first row? Using 10 we have

\displaystyle \left\langle 1\left|M_{1}\right|j\right\rangle \displaystyle = \displaystyle \left(\left\langle j\left|M_{1}^{\dagger}\right|1\right\rangle \right)^*\ \ \ \ \ (25)
\displaystyle \displaystyle = \displaystyle \left(\lambda_{1}^*\left\langle j\left|1\right.\right\rangle \right)^*\ \ \ \ \ (26)
\displaystyle \displaystyle = \displaystyle \lambda_{1}\delta_{1j} \ \ \ \ \ (27)

Thus all entries in the first row, except the first, are also zero. Thus in the original basis {\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)} we have

\displaystyle M_{1}=\left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\\ 0 \end{array}\right] \ \ \ \ \ (28)

where {M^{\prime}} is an {\left(n-1\right)\times\left(n-1\right)} matrix. We have

\displaystyle M_{1}^{\dagger} \displaystyle = \displaystyle \left[\begin{array}{cccc} \lambda_{1}^* & 0 & \ldots & 0\\ 0\\ \vdots & & \left(M^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\ \ \ \ \ (29)
\displaystyle M_{1}^{\dagger}M_{1} \displaystyle = \displaystyle \left[\begin{array}{cccc} \left|\lambda_{1}\right|^{2} & 0 & \ldots & 0\\ 0\\ \vdots & & \left(M^{\prime}\right)^{\dagger}M^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (30)
\displaystyle M_{1}M_{1}^{\dagger} \displaystyle = \displaystyle \left[\begin{array}{cccc} \left|\lambda_{1}\right|^{2} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\left(M^{\prime}\right)^{\dagger}\\ 0 \end{array}\right] \ \ \ \ \ (31)

Since {M_{1}} is normal, we must have {M_{1}M_{1}^{\dagger}=M_{1}^{\dagger}M_{1}}, which implies

\displaystyle \left(M^{\prime}\right)^{\dagger}M^{\prime}=M^{\prime}\left(M^{\prime}\right)^{\dagger} \ \ \ \ \ (32)

so that {M^{\prime}} is also a normal matrix. By the induction hypothesis, since {M^{\prime}} is an {\left(n-1\right)\times\left(n-1\right)} normal matrix, it is unitarily diagonalizable by some unitary matrix {U^{\prime}}, that is

\displaystyle U^{\prime\dagger}M^{\prime}U^{\prime}=D_{M^{\prime}} \ \ \ \ \ (33)

 

is diagonal. We can extend {U^{\prime}} to an {n\times n} unitary matrix by adding a 1 to the upper left:

\displaystyle U=\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right] \ \ \ \ \ (34)

We can check that {U} is unitary by direct calculation, using {\left(U^{\prime}\right)^{\dagger}U^{\prime}=I}

\displaystyle U^{\dagger}U \displaystyle = \displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (35)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (36)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0 & 1\\ \vdots & & 1\\ 0 & & & 1 \end{array}\right]=I \ \ \ \ \ (37)

We then have, using 33

\displaystyle U^{\dagger}M_{1}U \displaystyle = \displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\\ 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (38)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime\dagger}M^{\prime}U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (39)
\displaystyle \displaystyle = \displaystyle \left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & D_{M^{\prime}}\\ 0 \end{array}\right] \ \ \ \ \ (40)

That is, {U^{\dagger}M_{1}U} is diagonal. From the definition 19 of {M_{1}}, we now have

\displaystyle U^{\dagger}M_{1}U \displaystyle = \displaystyle U^{\dagger}U_{1}^{\dagger}MU_{1}U\ \ \ \ \ (41)
\displaystyle \displaystyle = \displaystyle \left(U_{1}U\right)^{\dagger}M\left(U_{1}U\right) \ \ \ \ \ (42)

Since the product of two unitary matrices is unitary, we have found a unitary operator {U_{1}U} that diagonalizes {M}, which proves the result. \Box

Notice that the proof didn’t assume that the eigenvalues are nondegenerate, so that even if there are several linearly independent eigenvectors corresponding to one eigenvalue, it is still possible to find an orthonormal basis consisting of the eigenvectors. In other words, for any hermitian or unitary operator, it is always possible to find an orthonormal basis of the vector space consisting of eigenvectors of the operator.

In the general case, a normal matrix {M} in an {n}-dimensional vector space can have {m} distinct eigenvalues, where {1\le m\le n}. If {n=m}, there is no degeneracy and each eigenvalue has a unique (up to a scalar multiple) eigenvector. If {m<n}, then one or more of the eigenvalues occurs more than once, and the eigenvector subspace corresponding to a degenerate eigenvalue has a dimension larger than 1. However, the spectral theorem guarantees that it is possible to choose an orthonormal basis within each subspace, and that each subspace is orthogonal to all other subspaces.

More precisely, the vector space {V} can be decomposed into {m} subspaces {U_{k}} for {k=1,\ldots,m}, with the dimension {d_{k}} of subspace {U_{k}} equal to the degeneracy of eigenvalue {\lambda_{k}}. The full space {V} is the direct sum of these subspaces

\displaystyle V \displaystyle = \displaystyle U_{1}\oplus U_{2}\oplus\ldots\oplus U_{m}\ \ \ \ \ (43)
\displaystyle n \displaystyle = \displaystyle \sum_{k=1}^{m}d_{k} \ \ \ \ \ (44)

It’s usually most convenient to order the eigenvectors as follows:

\displaystyle \left(u_{1}^{\left(1\right)},\ldots,u_{d_{1}}^{\left(1\right)},u_{1}^{\left(2\right)},\ldots,u_{d_{2}}^{\left(2\right)},\ldots,u_{1}^{\left(m\right)},\ldots,u_{d_{m}}^{\left(m\right)}\right) \ \ \ \ \ (45)

The notation {u_{j}^{\left(k\right)}} means the {j}th eigenvector belonging to the {k}th eigenvalue {\lambda_{k}}.

In practice, there is a lot of freedom in choosing orthonormal eigenvectors for degenerate eigenvalues, since we can pick any {d_{k}} mutually orthogonal vectors within the subspace of dimension {d_{k}}. For example, in 3-d space we usually choose the {x,y,z} unit vectors as the orthonormal set, but we can rotate these three vectors about the origin, or even reflect them in a plane passing through the origin, and still get an orthonormal set of 3 vectors.

The diagonal form of the normal matrix {M} in this orthonormal basis is

\displaystyle D_{M}=\left[\begin{array}{ccccccc} \lambda_{1}\\ & \ddots\\ & & \lambda_{1}\\ & & & \ddots\\ & & & & \lambda_{m}\\ & & & & & \ddots\\ & & & & & & \lambda_{m} \end{array}\right] \ \ \ \ \ (46)

Here, eigenvalue {\lambda_{k}} occurs {d_{k}} times along the diagonal.
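Numerically, one way to exhibit the unitary diagonalization of a normal matrix is through the complex Schur decomposition {M=ZTZ^{\dagger}}, since for a normal {M} the upper-triangular factor {T} comes out diagonal. A sketch (the random matrix below is normal by construction):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
# a normal matrix: unitary conjugation of a diagonal matrix
d = rng.standard_normal(4) + 1j*rng.standard_normal(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4))
                    + 1j*rng.standard_normal((4, 4)))
M = Q @ np.diag(d) @ Q.conj().T

T, Z = schur(M, output='complex')                # M = Z T Z^dagger
print(np.allclose(T, np.diag(np.diag(T))))       # True: T is diagonal
print(np.allclose(Z.conj().T @ Z, np.eye(4)))    # True: Z is unitary
```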

Hermitian operators – a few examples

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercise 1.6.2.

Here are a few more results about hermitian operators.

Suppose we are given two hermitian operators {\Omega} and {\Lambda}. We’ll look at some combinations of these operators.

The operator {\Omega\Lambda} has the hermitian conjugate

\displaystyle  \left(\Omega\Lambda\right)^{\dagger}=\Lambda^{\dagger}\Omega^{\dagger}=\Lambda\Omega \ \ \ \ \ (1)

Thus the product operator {\Omega\Lambda} is hermitian only if {\Lambda} and {\Omega} commute.

The operator {\Omega\Lambda+\lambda\Omega} for some complex scalar {\lambda} has the hermitian conjugate

\displaystyle   \left(\Omega\Lambda+\lambda\Omega\right)^{\dagger} \displaystyle  = \displaystyle  \Lambda^{\dagger}\Omega^{\dagger}+\lambda^*\Omega^{\dagger}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\Omega+\lambda^*\Omega \ \ \ \ \ (3)

This operator is therefore hermitian only if {\Lambda} and {\Omega} commute and {\lambda} is real.

The commutator has the hermitian conjugate

\displaystyle   \left[\Omega,\Lambda\right]^{\dagger} \displaystyle  = \displaystyle  \left(\Omega\Lambda-\Lambda\Omega\right)^{\dagger}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\Omega-\Omega\Lambda\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \left[\Lambda,\Omega\right]\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  -\left[\Omega,\Lambda\right] \ \ \ \ \ (7)

Thus the commutator is anti-hermitian (the hermitian conjugate is the negative of the original operator).

Finally, what happens if we multiply the commutator by {i}?

\displaystyle   \left(i\left[\Omega,\Lambda\right]\right)^{\dagger} \displaystyle  = \displaystyle  -i\left(\Omega\Lambda-\Lambda\Omega\right)^{\dagger}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  -i\left(\Lambda\Omega-\Omega\Lambda\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  -i\left[\Lambda,\Omega\right]\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  i\left[\Omega,\Lambda\right] \ \ \ \ \ (11)

Thus the operator {i\left[\Omega,\Lambda\right]} is hermitian.
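These properties are easy to spot-check with random hermitian matrices (a sketch; the seed and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
def rand_herm(n):
    A = rng.standard_normal((n, n)) + 1j*rng.standard_normal((n, n))
    return (A + A.conj().T)/2             # hermitian by construction

Om, Lm = rand_herm(4), rand_herm(4)
C = Om @ Lm - Lm @ Om                     # the commutator [Omega, Lambda]

print(np.allclose((Om @ Lm).conj().T, Lm @ Om))   # (OL)^dagger = LO
print(np.allclose(C.conj().T, -C))                # [O, L] is anti-hermitian
print(np.allclose((1j*C).conj().T, 1j*C))         # i[O, L] is hermitian
```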

Hermitian operators – a few theorems

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 7.

A hermitian operator {T} satisfies {T=T^{\dagger}}. [Axler (and most mathematicians, probably) refers to a hermitian operator as self-adjoint and uses the notation {T^*} for {T^{\dagger}}.]

As preparation for discussing hermitian operators, we need the following theorem.

Theorem 1 Let {T} be a linear operator on a complex vector space {V}. If {\left\langle v,Tv\right\rangle =0} for all {v\in V}, then {T=0}.

Proof: The idea is to show something even more general, namely that {\left\langle u,Tv\right\rangle =0} for all {u,v\in V}. If we can do this, then setting {u=Tv} means that {\left\langle Tv,Tv\right\rangle =0} for all {v\in V}, which in turn implies that {Tv=0} for all {v\in V}, implying further that {T=0}.

Zwiebach goes through a few stages in developing the proof, but the end result is that we can write

\displaystyle   \left\langle u,Tv\right\rangle \displaystyle  = \displaystyle  \frac{1}{4}\left[\left\langle u+v,T\left(u+v\right)\right\rangle -\left\langle u-v,T\left(u-v\right)\right\rangle \right]+\ \ \ \ \ (1)
\displaystyle  \displaystyle  \displaystyle  \frac{1}{4i}\left[\left\langle u+iv,T\left(u+iv\right)\right\rangle -\left\langle u-iv,T\left(u-iv\right)\right\rangle \right] \ \ \ \ \ (2)

Note that all the terms on the RHS are of the form {\left\langle x,Tx\right\rangle } for some {x}. Thus if we require {\left\langle x,Tx\right\rangle =0} for all {x\in V}, then all four terms are separately 0, meaning that {\left\langle u,Tv\right\rangle =0} as desired, completing the proof. \Box

Although we’ve used the imaginary number {i} in this proof, we might wonder if it really does restrict the result to complex vector spaces. That is, is there some other decomposition of {\left\langle u,Tv\right\rangle } that doesn’t require complex numbers but would still work?

In fact, we don’t need to worry about this, since there is a simple counter-example to the theorem if we consider a real vector space. In 2-d or 3-d space, an operator {T} that rotates a vector through {\frac{\pi}{2}} always produces a vector orthogonal to the original, resulting in {\left\langle v,Tv\right\rangle =0} for all {v}. In this case, {T\ne0} so the theorem is definitely not true for real vector spaces.
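A one-line check of this counter-example (a sketch; the test vectors are random):

```python
import numpy as np

# rotation by pi/2 in the real plane: <v, Tv> = 0 for every v, yet T != 0
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])
v = np.random.default_rng(7).standard_normal((2, 5))  # five random vectors
print(np.einsum('ik,ik->k', v, T @ v))                # all exactly zero
```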

Now we can turn to a few theorems about hermitian operators. First, since every operator on a finite-dimensional complex vector space has at least one eigenvalue, we know that every hermitian operator has at least one eigenvalue. This leads to the first theorem on hermitian operators.

Theorem 2 All eigenvalues of hermitian operators are real.

Proof: Since at least one eigenvalue {\lambda} exists, let {v} be the corresponding non-zero eigenvector, so that {Tv=\lambda v}. We have

\displaystyle  \left\langle v,Tv\right\rangle =\left\langle v,\lambda v\right\rangle =\lambda\left\langle v,v\right\rangle \ \ \ \ \ (3)

Since {T=T^{\dagger}} we also have

\displaystyle  \left\langle v,Tv\right\rangle =\left\langle T^{\dagger}v,v\right\rangle =\left\langle Tv,v\right\rangle =\left\langle \lambda v,v\right\rangle =\lambda^*\left\langle v,v\right\rangle \ \ \ \ \ (4)

Equating the last two equations, and remembering that {\left\langle v,v\right\rangle \ne0}, we have {\lambda=\lambda^*}, so {\lambda} is real. \Box

Next, a theorem on the eigenvectors of distinct eigenvalues.

Theorem 3 Eigenvectors associated with different eigenvalues of a hermitian operator are orthogonal.

Proof: Suppose {\lambda_{1}\ne\lambda_{2}} are two eigenvalues of {T}, and {v_{1}} and {v_{2}} are the corresponding eigenvectors. Then {Tv_{1}=\lambda_{1}v_{1}} and {Tv_{2}=\lambda_{2}v_{2}}. Taking an inner product, we have

\displaystyle   \left\langle v_{2},Tv_{1}\right\rangle \displaystyle  = \displaystyle  \lambda_{1}\left\langle v_{2},v_{1}\right\rangle \ \ \ \ \ (5)
\displaystyle  \left\langle v_{2},Tv_{1}\right\rangle \displaystyle  = \displaystyle  \left\langle Tv_{2},v_{1}\right\rangle \ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \lambda_{2}\left\langle v_{2},v_{1}\right\rangle \ \ \ \ \ (7)

where in the last line we used the fact that {\lambda_{2}} is real when taking it outside the inner product. Equating the first and last lines and using {\lambda_{1}\ne\lambda_{2}}, we see that {\left\langle v_{2},v_{1}\right\rangle =0} as required.\Box

Anti-hermitian operators

Required math: calculus, vectors

Required physics: none

References: Griffiths, David J. (2005), Introduction to Quantum Mechanics, 2nd Edition; Pearson Education – Problem 3.26.

A hermitian operator is equal to its hermitian conjugate (which, remember, is the complex conjugate of the transpose of the matrix representing the operator). That is,

\displaystyle  \hat{Q}^{\dagger}=\hat{Q} \ \ \ \ \ (1)

This has the consequence that for inner products

\displaystyle   \langle f|\hat{Q}g\rangle \displaystyle  = \displaystyle  \langle\hat{Q}^{\dagger}f|g\rangle\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \langle\hat{Q}f|g\rangle \ \ \ \ \ (3)

An anti-hermitian operator is equal to the negative of its hermitian conjugate, that is

\displaystyle  \hat{Q}^{\dagger}=-\hat{Q} \ \ \ \ \ (4)

In inner products, this means

\displaystyle   \langle f|\hat{Q}g\rangle \displaystyle  = \displaystyle  \langle\hat{Q}^{\dagger}f|g\rangle\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  -\langle\hat{Q}f|g\rangle \ \ \ \ \ (6)

The expectation value of an anti-hermitian operator is:

\displaystyle   \langle f|\hat{Q}f\rangle \displaystyle  = \displaystyle  \langle\hat{Q}^{\dagger}f|f\rangle\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  -\langle\hat{Q}f|f\rangle\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  -\langle Q\rangle^* \ \ \ \ \ (9)

But {\langle f|\hat{Q}f\rangle=\langle Q\rangle} so {\langle Q\rangle=-\langle Q\rangle^*}, which means the expectation value must be pure imaginary.

For two hermitian operators {\hat{Q}} and {\hat{R}} we have

\displaystyle   \left[\hat{Q},\hat{R}\right] \displaystyle  = \displaystyle  \hat{Q}\hat{R}-\hat{R}\hat{Q}\ \ \ \ \ (10)
\displaystyle  {}[\hat{Q},\hat{R}]^{\dagger} \displaystyle  = \displaystyle  \hat{R}^{\dagger}\hat{Q}^{\dagger}-\hat{Q}^{\dagger}\hat{R}^{\dagger}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  \hat{R}\hat{Q}-\hat{Q}\hat{R}\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \left[\hat{R},\hat{Q}\right]\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  -[\hat{Q},\hat{R}] \ \ \ \ \ (14)

where we have used the hermitian property {\hat{Q}^{\dagger}=\hat{Q}} to get the third line. Thus the commutator of two hermitian operators is anti-hermitian.

If two operators {\hat{S}} and {\hat{T}} are anti-hermitian, a similar derivation shows that {[\hat{S},\hat{T}]^{\dagger}=-[\hat{S},\hat{T}]} also:

\displaystyle   \left[\hat{S},\hat{T}\right] \displaystyle  = \displaystyle  \hat{S}\hat{T}-\hat{T}\hat{S}\ \ \ \ \ (15)
\displaystyle  {}[\hat{S},\hat{T}]^{\dagger} \displaystyle  = \displaystyle  \hat{T}^{\dagger}\hat{S}^{\dagger}-\hat{S}^{\dagger}\hat{T}^{\dagger}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  (-\hat{T})(-\hat{S})-(-\hat{S})(-\hat{T})\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  -[\hat{S},\hat{T}] \ \ \ \ \ (18)
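A numerical spot-check of both results (a sketch with arbitrary random matrices): the expectation value of an anti-hermitian operator comes out pure imaginary, and the commutator of two anti-hermitian operators is again anti-hermitian.

```python
import numpy as np

rng = np.random.default_rng(5)
def rand_antiherm(n):
    A = rng.standard_normal((n, n)) + 1j*rng.standard_normal((n, n))
    return (A - A.conj().T)/2            # Q^dagger = -Q by construction

S, T = rand_antiherm(4), rand_antiherm(4)
f = rng.standard_normal(4) + 1j*rng.standard_normal(4)

print(np.vdot(f, S @ f))                 # pure imaginary (real part ~ 1e-16)
C = S @ T - T @ S
print(np.allclose(C.conj().T, -C))       # True: [S, T] is anti-hermitian
```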