# Spectral theorem for normal operators

References: edX online course MIT 8.05 Week 6.

We’ll now look at a central theorem about normal operators, known as the spectral theorem.

We’ve seen that if a matrix ${M}$ has a set ${v}$ of eigenvectors that span the space, then we can diagonalize ${M}$ by means of the similarity transformation

$\displaystyle D_{M}=A^{-1}MA \ \ \ \ \ (1)$

where ${D_{T}}$ is diagonal and the columns of ${A}$ are the eigenvectors of ${M}$. In the general case, there’s no guarantee that the eigenvectors of ${M}$ are orthonormal. However, if there is an orthonormal basis in which ${M}$ is diagonal, then ${M}$ is said to be unitarily diagonalizable. Suppose we start with some arbitrary orthonormal basis ${\left(e_{1},\ldots,e_{n}\right)}$ (we can always construct such a basis using the Gram-Schmidt procedure). Then if the set of eigenvectors of ${M}$ form an orthonormal basis ${\left(u_{1},\ldots,u_{n}\right)}$, there is a unitary matrix ${U}$ that transforms the ${e_{i}}$ basis into the ${u_{i}}$ basis (since unitary operators preserve inner products):

$\displaystyle u_{i}=Ue_{i}=\sum_{j}U_{ji}e_{j} \ \ \ \ \ (2)$

Using this unitary operator, we therefore have for a unitarily diagonalizable operator ${M}$

$\displaystyle D_{M}=U^{-1}MU=U^{\dagger}MU \ \ \ \ \ (3)$

The spectral theorem now states:

Theorem An operator ${M}$ in a complex vector space has an orthonormal basis of eigenvectors (that is, it’s unitarily diagonalizable) if and only if ${M}$ is normal.

Proof: Since this is an ‘if and only if’ theorem, we need to prove it in both directions. First, suppose that ${M}$ is unitarily diagonalizable, so that 3 holds for some ${U}$. Then

 $\displaystyle M$ $\displaystyle =$ $\displaystyle UD_{M}U^{\dagger}\ \ \ \ \ (4)$ $\displaystyle M^{\dagger}$ $\displaystyle =$ $\displaystyle UD_{M}^{\dagger}U^{\dagger} \ \ \ \ \ (5)$

The commutator is then, since ${U^{\dagger}U=I}$

 $\displaystyle \left[M^{\dagger},M\right]$ $\displaystyle =$ $\displaystyle UD_{M}^{\dagger}D_{M}U^{\dagger}-UD_{M}D_{M}^{\dagger}U^{\dagger}\ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle U\left[D_{M}^{\dagger},D_{M}\right]U^{\dagger}\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (8)$

where the result follows because all diagonal matrices commute. Thus ${M}$ is normal, and this completes one direction of the proof.

Going the other way is a bit trickier. We need to show that for any normal matrix ${M}$ with elements defined on some arbitrary orthonormal basis (that is, a basis that is not necessarily composed of eigenvectors of ${M}$), there is a unitary matrix ${U}$ such that ${U^{\dagger}MU}$ is diagonal. Since we started with an orthonormal basis and ${U}$ preserves inner products, the new basis is also orthonormal which will prove the theorem.

The proof uses mathematical induction, in which we first prove that the result is true for one specific dimension of vector space, say ${\mbox{dim }V=1}$. We can then assume the result is true for some dimension ${n-1}$ and from that assumption, prove it is also true for the next higher dimension ${n}$.

Since any ${1\times1}$ matrix is diagonal (it consists of only one element), the result is true for ${\mbox{dim }V=1}$. So we now assume it’s true for a dimension of ${n-1}$ and prove it’s true for a dimension of ${n}$.

We take an arbitrary orthonormal basis of the ${n}$-dimensional ${V}$ to be ${\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)}$. In that basis, the matrix ${M}$ has elements ${M_{ij}=\left\langle i\left|M\right|j\right\rangle }$. We know that ${M}$ has at least one eigenvalue ${\lambda_{1}}$ with a normalized eigenvector ${\left|x_{1}\right\rangle }$:

$\displaystyle M\left|x_{1}\right\rangle =\lambda_{1}\left|x_{1}\right\rangle \ \ \ \ \ (9)$

and, since ${M}$ is normal, the eigenvector ${\left|x_{1}\right\rangle }$ is also an eigenvector of ${M^{\dagger}}$:

$\displaystyle M^{\dagger}\left|x_{1}\right\rangle =\lambda_{1}^*\left|x_{1}\right\rangle \ \ \ \ \ (10)$

Starting with a basis of ${V}$ containing ${\left|x_{1}\right\rangle }$, we can use Gram-Schmidt to generate an orthonormal basis ${\left(\left|x_{1}\right\rangle ,\ldots,\left|x_{n}\right\rangle \right)}$. We now define an operator ${U_{1}}$ as follows:

$\displaystyle U_{1}\equiv\sum_{i}\left|x_{i}\right\rangle \left\langle i\right| \ \ \ \ \ (11)$

${U_{1}}$ is unitary, since

 $\displaystyle U_{1}^{\dagger}$ $\displaystyle =$ $\displaystyle \sum_{i}\left|i\right\rangle \left\langle x_{i}\right|\ \ \ \ \ (12)$ $\displaystyle U_{1}^{\dagger}U$ $\displaystyle =$ $\displaystyle \sum_{i}\sum_{j}\left|i\right\rangle \left\langle x_{i}\left|x_{j}\right.\right\rangle \left\langle j\right|\ \ \ \ \ (13)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{i}\sum_{j}\left|i\right\rangle \delta_{ij}\left\langle j\right|\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{i}\left|i\right\rangle \left\langle i\right|\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle I \ \ \ \ \ (16)$

From its definition

 $\displaystyle U_{1}\left|1\right\rangle$ $\displaystyle =$ $\displaystyle \left|x_{1}\right\rangle \ \ \ \ \ (17)$ $\displaystyle U_{1}^{\dagger}\left|x_{1}\right\rangle$ $\displaystyle =$ $\displaystyle \left|1\right\rangle \ \ \ \ \ (18)$

Now consider the matrix ${M_{1}}$ defined as

$\displaystyle M_{1}\equiv U_{1}^{\dagger}MU_{1} \ \ \ \ \ (19)$

${M_{1}}$ is also normal, as can be verified by calculating the commutator and using ${\left[M^{\dagger},M\right]=0}$. Further

 $\displaystyle M_{1}\left|1\right\rangle$ $\displaystyle =$ $\displaystyle U_{1}^{\dagger}MU_{1}\left|1\right\rangle \ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle U_{1}^{\dagger}M\left|x_{1}\right\rangle \ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \lambda_{1}U_{1}^{\dagger}\left|x_{1}\right\rangle \ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \lambda_{1}\left|1\right\rangle \ \ \ \ \ (23)$

Thus ${\left|1\right\rangle }$ is an eigenvector of ${M_{1}}$ with eigenvalue ${\lambda_{1}}$.

The matrix elements in the first column of ${M_{1}}$ in the original basis ${\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)}$ are

$\displaystyle \left\langle j\left|M_{1}\right|1\right\rangle =\lambda_{1}\left\langle j\left|1\right.\right\rangle =\lambda_{1}\delta_{1j} \ \ \ \ \ (24)$

Thus all entries in the first column are zero except for the first row, where it is ${\lambda_{1}}$. How about the first row? Using 10 we have

 $\displaystyle \left\langle 1\left|M_{1}\right|j\right\rangle$ $\displaystyle =$ $\displaystyle \left(\left\langle j\left|M_{1}^{\dagger}\right|1\right\rangle \right)^*\ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\lambda_{1}^*\left\langle j\left|1\right.\right\rangle \right)^*\ \ \ \ \ (26)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \lambda_{1}\delta_{1j} \ \ \ \ \ (27)$

Thus all entries in the first row, except the first, are also zero. Thus in the original basis ${\left(\left|1\right\rangle ,\ldots,\left|n\right\rangle \right)}$ we have

$\displaystyle M_{1}=\left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\\ 0 \end{array}\right] \ \ \ \ \ (28)$

where ${M^{\prime}}$ is an ${\left(n-1\right)\times\left(n-1\right)}$ matrix. We have

 $\displaystyle M_{1}^{\dagger}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} \lambda_{1}^* & 0 & \ldots & 0\\ 0\\ \vdots & & \left(M^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\ \ \ \ \ (29)$ $\displaystyle M_{1}^{\dagger}M_{1}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} \left|\lambda_{1}\right|^{2} & 0 & \ldots & 0\\ 0\\ \vdots & & \left(M^{\prime}\right)^{\dagger}M^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (30)$ $\displaystyle M_{1}M_{1}^{\dagger}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} \left|\lambda_{1}\right|^{2} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\left(M^{\prime}\right)^{\dagger}\\ 0 \end{array}\right] \ \ \ \ \ (31)$

Since ${M_{1}}$ is normal, we must have ${M_{1}M_{1}^{\dagger}=M_{1}^{\dagger}M_{1}}$, which implies

$\displaystyle \left(M^{\prime}\right)^{\dagger}M^{\prime}=M^{\prime}\left(M^{\prime}\right)^{\dagger} \ \ \ \ \ (32)$

so that ${M^{\prime}}$ is also a normal matrix. By the induction hypotheses, since ${M^{\prime}}$ is an ${\left(n-1\right)\times\left(n-1\right)}$ normal matrix, it is unitarily diagonalizable by some unitary matrix ${U^{\prime}}$, that is

$\displaystyle U^{\prime\dagger}M^{\prime}U^{\prime}=D_{M^{\prime}} \ \ \ \ \ (33)$

is diagonal. We can extend ${U^{\prime}}$ to an ${n\times n}$ unitary matrix by adding a 1 to the upper left:

$\displaystyle U=\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right] \ \ \ \ \ (34)$

We can check that ${U}$ is unitary by direct calculation, using ${\left(U^{\prime}\right)^{\dagger}U^{\prime}=I}$

 $\displaystyle U^{\dagger}U$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (35)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (36)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0 & 1\\ \vdots & & 1\\ 0 & & & 1 \end{array}\right]=I \ \ \ \ \ (37)$

We then have, using 33

 $\displaystyle U^{\dagger}M_{1}U$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & \left(U^{\prime}\right)^{\dagger}\\ 0 \end{array}\right]\left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & M^{\prime}\\ 0 \end{array}\right]\left[\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (38)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & U^{\prime\dagger}M^{\prime}U^{\prime}\\ 0 \end{array}\right]\ \ \ \ \ (39)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{cccc} \lambda_{1} & 0 & \ldots & 0\\ 0\\ \vdots & & D_{M^{\prime}}\\ 0 \end{array}\right] \ \ \ \ \ (40)$

That is, ${U^{\dagger}M_{1}U}$ is diagonal. From the definition 19 of ${M_{1}}$, we now have

 $\displaystyle U^{\dagger}M_{1}U$ $\displaystyle =$ $\displaystyle U^{\dagger}U_{1}^{\dagger}MU_{1}U\ \ \ \ \ (41)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(U_{1}U\right)^{\dagger}M\left(U_{1}U\right) \ \ \ \ \ (42)$

Since the product of two unitary matrices is unitary, we have found a unitary operator ${U_{1}U}$ that diagonalizes ${M}$, which proves the result. $\Box$

Notice that the proof didn’t assume that the eigenvalues are nondegenerate, so that even if there are several linearly independent eigenvectors corresponding to one eigenvalue, it is still possible to find an orthonormal basis consisting of the eigenvectors. In other words, for any hermitian or unitary operator, it is always possible to find an orthonormal basis of the vector space consisting of eigenvectors of the operator.

In the general case, a normal matrix ${M}$ in an ${n}$-dimensional vector space can have ${m}$ distinct eigenvalues, where ${1\le m\le n}$. If ${n=m}$, there is no degeneracy and each eigenvalue has a unique (up to a scalar multiple) eigenvector. If ${m, then one or more of the eigenvalues occurs more than once, and the eigenvector subspace corresponding to a degenerate eigenvalue has a dimension larger than 1. However, the spectral theorem guarantees that it is possible to choose an orthonormal basis within each subspace, and that each subspace is orthogonal to all other subspaces.

More precisely, the vector space ${V}$ can be decomposed into ${m}$ subspaces ${U_{k}}$ for ${k=1,\ldots,m}$, with the dimension ${d_{k}}$ of subspace ${U_{k}}$ equal to the degeneracy of eigenvalue ${\lambda_{k}}$. The full space ${V}$ is the direct sum of these subspaces

 $\displaystyle V$ $\displaystyle =$ $\displaystyle U_{1}\oplus U_{2}\oplus\ldots\oplus U_{m}\ \ \ \ \ (43)$ $\displaystyle n$ $\displaystyle =$ $\displaystyle \sum_{k=1}^{m}d_{k} \ \ \ \ \ (44)$

It’s usually most convenient to order the eigenvectors as follows:

$\displaystyle \left(u_{1}^{\left(1\right)},\ldots,u_{d_{1}}^{\left(1\right)},u_{1}^{\left(2\right)},\ldots,u_{d_{2}}^{\left(2\right)},\ldots,u_{1}^{\left(m\right)},\ldots,u_{d_{m}}^{\left(m\right)}\right) \ \ \ \ \ (45)$

The notation ${u_{j}^{\left(k\right)}}$ means the ${j}$th eigenvector belonging to eigenvalue ${k}$.

In practice, there is a lot of freedom in choosing orthonormal eigenvectors for degenerate eigenvalues, since we can pick any ${d_{k}}$ mutually orthogonal vectors within the subspace of dimension ${d_{k}}$. For example, in 3-d space we usually choose the ${x,y,z}$ unit vectors as the orthonormal set, but we can pivot these three vectors about the origin, or even reflect them in a plane passing through the origin, and still get an orthonormal set of 3 vectors.

The diagonal form of the normal matrix ${M}$ in this orthonormal basis is

$\displaystyle D_{M}=\left[\begin{array}{ccccccc} \lambda_{1}\\ & \ddots\\ & & \lambda_{1}\\ & & & \ddots\\ & & & & \lambda_{m}\\ & & & & & \ddots\\ & & & & & & \lambda_{m} \end{array}\right] \ \ \ \ \ (46)$

Here, eigenvalue ${\lambda_{k}}$ occurs ${d_{k}}$ times along the diagonal.