Tag Archives: vector space

Direct product of two vector spaces

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Chapter 10, Exercise 10.1.1.

Although we’ve studied quantum systems of more than one particle before (for example, systems of fermions and bosons) as covered by Griffiths’s book, the wave functions associated with such particles were just given as products of single-particle wave functions (or linear combinations of these products). We didn’t examine the linear algebra behind these functions. In his chapter 10, Shankar begins by describing the algebra of a direct product vector space, so we’ll review this here.

The physics begins with an extension of the postulate of quantum mechanics that, for a single particle, the position and momentum obey the commutation relation

\displaystyle \left[X,P\right]=i\hbar \ \ \ \ \ (1)

To extend this to multi-particle systems, we propose

\displaystyle \left[X_{i},P_{j}\right] \displaystyle = \displaystyle i\hbar\delta_{ij}\ \ \ \ \ (2)
\displaystyle \left[X_{i},X_{j}\right] \displaystyle = \displaystyle \left[P_{i},P_{j}\right]=0 \ \ \ \ \ (3)

where the subscripts refer to the particle we’re considering.

These postulates are obtained from the classical Poisson brackets by the usual prescription: to get the quantum commutator, we multiply the classical Poisson bracket by {i\hbar}. The physics in these relations is that the positions of different particles can be measured simultaneously, as can their momenta, and the position of one particle can be measured simultaneously with the momentum of another. Only the position and momentum of the same particle are still governed by the uncertainty principle.

We’ll now restrict our attention to a two-particle system. In such a system, the eigenstate of the position operators is written as {\left|x_{1}x_{2}\right\rangle } and satisfies the eigenvalue equation

\displaystyle X_{i}\left|x_{1}x_{2}\right\rangle =x_{i}\left|x_{1}x_{2}\right\rangle \ \ \ \ \ (4)

Operators referring to particle {i} effectively ignore any quantities associated with the other particle.

So what exactly are these states {\left|x_{1}x_{2}\right\rangle }? They are a set of vectors that span a Hilbert space that describes the state of two particles. Note that we can use any two commuting operators {\Omega_{1}\left(X_{1},P_{1}\right)} and {\Omega_{2}\left(X_{2},P_{2}\right)} to create a set of eigenkets {\left|\omega_{1}\omega_{2}\right\rangle } which also span the space. Any operator that is a function of the position and momentum of only one of the particles always commutes with a similar operator that is a function of only the other particle, since the position and momentum operators of which it is a function commute with those of the other operator. That is

\displaystyle \left[\Omega\left(X_{1},P_{1}\right),\Lambda\left(X_{2},P_{2}\right)\right]=0 \ \ \ \ \ (5)


The space spanned by {\left|x_{1}x_{2}\right\rangle } can also be written as a direct product of two one-particle spaces. This space is written as {\mathbb{V}_{1\otimes2}} where the symbol {\otimes} is the direct product symbol (it’s also the logo of the X-Men, but we won’t pursue that). The direct product is composed of the two single-particle spaces {\mathbb{V}_{1}} (spanned by {\left|x_{1}\right\rangle }) and {\mathbb{V}_{2}} (spanned by {\left|x_{2}\right\rangle }). The notation gets quite cumbersome at this point, so let’s spell it out carefully. For an operator {\Omega}, we can specify which particle it acts on by a subscript, and which space it acts on by a superscript. Thus {X_{1}^{\left(1\right)}} is the position operator for particle 1, which operates on the vector space {\mathbb{V}_{1}}. It might seem redundant at this point to specify both the particle and the space, since it would seem that these are always the same. However, be patient…

From the two one-particle spaces, we can form the two-particle space by taking the direct product of the two one-particle states. Thus the state in which particle 1 is in state {\left|x_{1}\right\rangle } and particle 2 is in state {\left|x_{2}\right\rangle } is written as

\displaystyle \left|x_{1}x_{2}\right\rangle =\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle \ \ \ \ \ (6)


It is important to note that this object is composed of two vectors from different vector spaces. The inner and outer products we’ve dealt with up to now, such as {\left\langle \psi_{1}\left|\psi_{2}\right.\right\rangle } (used, for example, to find the probability that a measurement gives a particular value) and {\left|\psi_{1}\right\rangle \left\langle \psi_{2}\right|}, are composed of two vectors from the same vector space, so no direct product is needed.

Recall the direct sum of two vector spaces

\displaystyle \mathbb{V}_{1\oplus2}=\mathbb{V}_{1}\oplus\mathbb{V}_{2} \ \ \ \ \ (7)

In that case, the dimension of {\mathbb{V}_{1\oplus2}} is the sum of the dimensions of {\mathbb{V}_{1}} and {\mathbb{V}_{2}}. For a direct product, we see from 6 that for each basis vector {\left|x_{1}\right\rangle } there is one basis vector {\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle } for each basis vector {\left|x_{2}\right\rangle }. Thus the number of basis vectors is the product of the numbers of basis vectors in the two one-particle spaces. In other words, the dimension of a direct product is the product of the dimensions of the two vector spaces of which it is composed. [In the case here, both {\mathbb{V}_{1}} and {\mathbb{V}_{2}} have infinite dimension, so the dimension of {\mathbb{V}_{1\otimes2}} is, in effect, ‘doubly infinite’. Where {\mathbb{V}_{1}} and {\mathbb{V}_{2}} have finite dimension, we can simply multiply these dimensions to get the dimension of {\mathbb{V}_{1\otimes2}}.]
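A minimal Python sketch of this counting, assuming finite-dimensional stand-ins for {\mathbb{V}_{1}} and {\mathbb{V}_{2}} (dimensions 2 and 3, chosen only for illustration). Kets are plain lists of components, and the hypothetical helper `kron_vec` orders the components of a direct product with the second index varying fastest.

```python
def kron_vec(u, v):
    """Direct product |u> ⊗ |v> of two kets given as component lists.

    Component i * len(v) + j equals u[i] * v[j] (second index varies fastest).
    """
    return [a * b for a in u for b in v]


def basis(n):
    """Standard orthonormal basis of an n-dimensional space."""
    return [[1 if k == i else 0 for k in range(n)] for i in range(n)]


n1, n2 = 2, 3  # assumed finite dimensions of V1 and V2

# One product basis vector |e_i> ⊗ |e_j> for every pair (i, j)
product_basis = [kron_vec(e1, e2) for e1 in basis(n1) for e2 in basis(n2)]

print(len(product_basis))  # 6 = n1 * n2 for the direct product
print(n1 + n2)             # 5 for the direct sum, by comparison
```

Each product vector has a single 1 in a different slot, so the six of them are exactly the standard basis of a 6-dimensional space.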

As {\mathbb{V}_{1\otimes2}} is a vector space with basis vectors {\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle }, any linear combination of the basis vectors is also a vector in the space {\mathbb{V}_{1\otimes2}}. Thus the vector

\displaystyle \left|\psi\right\rangle =\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle +\left|y_{1}\right\rangle \otimes\left|y_{2}\right\rangle \ \ \ \ \ (8)

is in {\mathbb{V}_{1\otimes2}}, although (provided {x_{1}\ne y_{1}} and {x_{2}\ne y_{2}}) it can’t be written as a single direct product of one vector from {\mathbb{V}_{1}} with one vector from {\mathbb{V}_{2}}.
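Whether a vector like 8 factorizes can be tested mechanically in finite dimensions. The sketch below rests on assumptions not in the original: each one-particle space is replaced by a two-dimensional toy space in which {\left|x_{i}\right\rangle } and {\left|y_{i}\right\rangle } are orthonormal basis vectors, and `is_product_state` is a hypothetical helper. It uses the fact that a nonzero vector {\sum_{ij}C_{ij}\left|i\right\rangle \otimes\left|j\right\rangle } is a single direct product exactly when the coefficient matrix {C} has rank 1, i.e. when every 2×2 minor of {C} vanishes.

```python
from itertools import combinations


def kron_vec(u, v):
    """|u> ⊗ |v>: component i * len(v) + j is u[i] * v[j]."""
    return [a * b for a in u for b in v]


def is_product_state(psi, n1, n2):
    """Return True if the nonzero vector psi in V1⊗V2 equals |a> ⊗ |b>.

    psi[i * n2 + j] is the coefficient C[i][j] of |i>⊗|j>; psi is a product
    exactly when every 2x2 minor of C vanishes (rank 1).
    """
    C = [[psi[i * n2 + j] for j in range(n2)] for i in range(n1)]
    for r1, r2 in combinations(range(n1), 2):
        for c1, c2 in combinations(range(n2), 2):
            if C[r1][c1] * C[r2][c2] - C[r1][c2] * C[r2][c1] != 0:
                return False
    return True


# Two-level toy model: |x> -> [1, 0] and |y> -> [0, 1] in each one-particle space
x, y = [1, 0], [0, 1]
psi = [p + q for p, q in zip(kron_vec(x, x), kron_vec(y, y))]  # analogue of (8)

print(is_product_state(kron_vec([1, 2], [3, 4]), 2, 2))  # True
print(is_product_state(psi, 2, 2))                        # False
```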

Having defined the direct product space, we now need to consider operators in this space. Although Shankar states that it ‘is intuitively clear’ that a single-particle operator such as {X_{1}^{\left(1\right)}} must have a corresponding operator in the product space that has the same effect as {X_{1}^{\left(1\right)}} has on the single-particle state, it seems to me to be more of a postulate. In any case, it is proposed that if

\displaystyle X_{1}^{\left(1\right)}\left|x_{1}\right\rangle =x_{1}\left|x_{1}\right\rangle \ \ \ \ \ (9)

then in the product space there must be an operator {X_{1}^{\left(1\right)\otimes\left(2\right)}} that operates only on particle 1, with the same effect, that is

\displaystyle X_{1}^{\left(1\right)\otimes\left(2\right)}\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle =x_{1}\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle \ \ \ \ \ (10)

The notation can be explained as follows. The subscript 1 in {X_{1}^{\left(1\right)\otimes\left(2\right)}} means that the operator operates on particle 1, while the superscript {\left(1\right)\otimes\left(2\right)} means that the operator operates in the product space {\mathbb{V}_{1\otimes2}}. In effect, the operator {X_{1}^{\left(1\right)\otimes\left(2\right)}} is the direct product of two one-particle operators: {X_{1}^{\left(1\right)}}, which operates on space {\mathbb{V}_{1}}, and the identity operator {I_{2}^{\left(2\right)}}, which operates on space {\mathbb{V}_{2}}. That is, we can write

\displaystyle X_{1}^{\left(1\right)\otimes\left(2\right)} \displaystyle = \displaystyle X_{1}^{\left(1\right)}\otimes I_{2}^{\left(2\right)}\ \ \ \ \ (11)
\displaystyle X_{1}^{\left(1\right)\otimes\left(2\right)}\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle \displaystyle = \displaystyle \left|X_{1}^{\left(1\right)}x_{1}\right\rangle \otimes\left|I_{2}^{\left(2\right)}x_{2}\right\rangle \ \ \ \ \ (12)
\displaystyle \displaystyle = \displaystyle x_{1}\left|x_{1}\right\rangle \otimes\left|x_{2}\right\rangle \ \ \ \ \ (13)

Generally, if we have two one-particle operators {\Gamma_{1}^{\left(1\right)}} and {\Lambda_{2}^{\left(2\right)}}, each of which operates on a different one-particle state, then we can form a direct product operator with the property

\displaystyle \left(\Gamma_{1}^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\right)\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle =\left|\Gamma_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (14)

That is, each one-particle operator in a direct product operator acts only on the factor of the direct product vector that belongs to its own one-particle space. Given this property, it’s fairly easy to derive a few properties of direct product operators.
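A finite-dimensional check of property 14, as a sketch: the operators are small integer matrices (lists of rows) chosen arbitrarily, and `kron`, `kron_vec` and `matvec` are hypothetical helpers, not library functions. The same index ordering is used for matrices and kets, which is what makes the two sides agree component by component. The last lines check the special case 11 to 13 with a toy diagonal "position" operator.

```python
def kron(A, B):
    """Matrix of A ⊗ B: entry ((i,k),(j,l)) is A[i][j] * B[k][l]."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def kron_vec(u, v):
    """|u> ⊗ |v>, with the second index varying fastest (same ordering as kron)."""
    return [a * b for a in u for b in v]


def matvec(M, v):
    """Apply the matrix M to the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Arbitrary (assumed) one-particle operators and kets
Gamma = [[1, 2], [0, 3]]                 # acts on V1 (2-dim)
Lam = [[0, 1, 0], [1, 0, 0], [2, 0, 1]]  # acts on V2 (3-dim)
w1, w2 = [1, -1], [2, 0, 5]

lhs = matvec(kron(Gamma, Lam), kron_vec(w1, w2))
rhs = kron_vec(matvec(Gamma, w1), matvec(Lam, w2))
print(lhs == rhs)  # True: each factor acts only on its own ket, as in (14)

# Special case (11)-(13): X1 ⊗ I2 on an eigenket of X1 just multiplies by x1
X1 = [[5, 0], [0, 7]]                   # toy "position" operator, eigenvalues 5 and 7
I2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ket = kron_vec([1, 0], w2)              # |x1 = 5> ⊗ |w2>
print(matvec(kron(X1, I2), ket) == [5 * c for c in ket])  # True
```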

\displaystyle \left[\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)},I^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\right]\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \displaystyle = \displaystyle \Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}I^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle -\ \ \ \ \ (15)
\displaystyle \displaystyle \displaystyle I^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (16)
\displaystyle \displaystyle = \displaystyle \Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left|I^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle -\ \ \ \ \ (17)
\displaystyle \displaystyle \displaystyle I^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|I^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|I^{\left(2\right)}\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle -\ \ \ \ \ (19)
\displaystyle \displaystyle \displaystyle \left|I^{\left(1\right)}\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle -\left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle 0 \ \ \ \ \ (22)

This derivation shows that the identity operators effectively cancel out and we’re left with the earlier commutator 5 between two operators that operate on different spaces.
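The cancellation can also be checked numerically. In this sketch, arbitrary 2×2 integer matrices stand in for {\Omega_{1}^{\left(1\right)}} and {\Lambda_{2}^{\left(2\right)}} and the helper names are hypothetical. The two matrices do not commute with each other as 2×2 matrices, yet their embeddings in the product space commute because they act on different factors.

```python
def kron(A, B):
    """Matrix of A ⊗ B (lists of rows)."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def matmul(A, B):
    """Ordinary matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(AB, BA)]


Omega = [[1, 2], [3, 4]]  # arbitrary operator on V1 (assumed)
Lam = [[0, 1], [5, 2]]    # arbitrary operator on V2 (assumed)

C = commutator(kron(Omega, identity(2)), kron(identity(2), Lam))
print(all(entry == 0 for row in C for entry in row))  # True
```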

The next derivation involves the successive operation of two direct product operators.

\displaystyle \left(\Omega_{1}^{\left(1\right)}\otimes\Gamma_{2}^{\left(2\right)}\right)\left(\theta_{1}^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\right)\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \displaystyle = \displaystyle \left(\Omega_{1}^{\left(1\right)}\otimes\Gamma_{2}^{\left(2\right)}\right)\left|\theta_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle \left|\Omega_{1}^{\left(1\right)}\theta_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Gamma_{2}^{\left(2\right)}\Lambda_{2}^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle \left(\Omega_{1}^{\left(1\right)}\theta_{1}^{\left(1\right)}\right)\otimes\left(\Gamma_{2}^{\left(2\right)}\Lambda_{2}^{\left(2\right)}\right)\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (25)
\displaystyle \displaystyle = \displaystyle \left\{ \left(\Omega\theta\right)^{\left(1\right)}\otimes\left(\Gamma\Lambda\right)^{\left(2\right)}\right\} \left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (26)
\displaystyle \left(\Omega_{1}^{\left(1\right)}\otimes\Gamma_{2}^{\left(2\right)}\right)\left(\theta_{1}^{\left(1\right)}\otimes\Lambda_{2}^{\left(2\right)}\right) \displaystyle = \displaystyle \left(\Omega\theta\right)^{\left(1\right)}\otimes\left(\Gamma\Lambda\right)^{\left(2\right)} \ \ \ \ \ (27)
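Result 27 is what is usually called the mixed-product rule for Kronecker products. A small self-contained check, with arbitrarily chosen integer matrices (assumptions, not from the text) and hypothetical helper names:

```python
def kron(A, B):
    """Matrix of A ⊗ B: entry ((i,k),(j,l)) is A[i][j] * B[k][l]."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def matmul(A, B):
    """Ordinary matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Arbitrary (assumed) operators: Omega and theta act on V1; Gamma and Lam act on V2
Omega = [[1, 2], [0, 1]]
theta = [[3, 0], [1, -1]]
Gamma = [[0, 1], [1, 1]]
Lam = [[2, 1], [1, 0]]

lhs = matmul(kron(Omega, Gamma), kron(theta, Lam))    # left side of (27)
rhs = kron(matmul(Omega, theta), matmul(Gamma, Lam))  # right side of (27)
print(lhs == rhs)  # True
```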

Next, another commutator identity. Given

\displaystyle \left[\Omega_{1}^{\left(1\right)},\Lambda_{1}^{\left(1\right)}\right]=\Gamma_{1}^{\left(1\right)} \ \ \ \ \ (28)

we have

\displaystyle \left[\Omega_{1}^{\left(1\right)\otimes\left(2\right)},\Lambda_{1}^{\left(1\right)\otimes\left(2\right)}\right]\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \displaystyle = \displaystyle \left[\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)},\Lambda_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\right]\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \left|\left[\Omega_{1}^{\left(1\right)},\Lambda_{1}^{\left(1\right)}\right]\omega_{1}\right\rangle \otimes\left|I^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle \left|\Gamma_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|I^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (31)
\displaystyle \displaystyle = \displaystyle \Gamma_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (32)
\displaystyle \left[\Omega_{1}^{\left(1\right)\otimes\left(2\right)},\Lambda_{1}^{\left(1\right)\otimes\left(2\right)}\right] \displaystyle = \displaystyle \Gamma_{1}^{\left(1\right)}\otimes I^{\left(2\right)} \ \ \ \ \ (33)
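A sketch checking 33 with a pair of non-commuting 2×2 matrices acting on {\mathbb{V}_{1}} and an assumed 3-dimensional {\mathbb{V}_{2}}. The matrices are arbitrary choices (their commutator here is diag(1, -1)), and the helper names are hypothetical.

```python
def kron(A, B):
    """Matrix of A ⊗ B (lists of rows)."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(AB, BA)]


Omega = [[0, 1], [0, 0]]  # acts on V1
Lam = [[0, 0], [1, 0]]    # acts on V1; does not commute with Omega
I2 = identity(3)          # identity on an assumed 3-dimensional V2

Gamma = commutator(Omega, Lam)                    # [Omega, Lambda] = Gamma, as in (28)
lhs = commutator(kron(Omega, I2), kron(Lam, I2))  # left side of (33)
print(Gamma)                   # [[1, 0], [0, -1]]
print(lhs == kron(Gamma, I2))  # True
```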

Finally, the square of the sum of two operators:

\displaystyle \left(\Omega_{1}^{\left(1\right)\otimes\left(2\right)}+\Omega_{2}^{\left(1\right)\otimes\left(2\right)}\right)^{2}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \displaystyle = \displaystyle \left(\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}+I^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}\right)^{2}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (34)
\displaystyle \displaystyle = \displaystyle \left(\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\right)^{2}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle +\ \ \ \ \ (35)
\displaystyle \displaystyle \displaystyle \Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}I^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle +\ \ \ \ \ (36)
\displaystyle \displaystyle \displaystyle I^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}\Omega_{1}^{\left(1\right)}\otimes I^{\left(2\right)}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle +\ \ \ \ \ (37)
\displaystyle \displaystyle \displaystyle \left(I^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}\right)^{2}\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (38)
\displaystyle \displaystyle = \displaystyle \left|\left(\Omega_{1}^{2}\right)^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|I^{\left(2\right)}\omega_{2}\right\rangle +\ \ \ \ \ (39)
\displaystyle \displaystyle \displaystyle \left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Omega_{2}^{\left(2\right)}\omega_{2}\right\rangle +\ \ \ \ \ (40)
\displaystyle \displaystyle \displaystyle \left|\Omega_{1}^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\Omega_{2}^{\left(2\right)}\omega_{2}\right\rangle +\ \ \ \ \ (41)
\displaystyle \displaystyle \displaystyle \left|I^{\left(1\right)}\omega_{1}\right\rangle \otimes\left|\left(\Omega_{2}^{2}\right)^{\left(2\right)}\omega_{2}\right\rangle \ \ \ \ \ (42)
\displaystyle \displaystyle = \displaystyle \left(\left(\Omega_{1}^{2}\right)^{\left(1\right)}\otimes I^{\left(2\right)}+2\Omega_{1}^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}+I^{\left(1\right)}\otimes\left(\Omega_{2}^{2}\right)^{\left(2\right)}\right)\left|\omega_{1}\right\rangle \otimes\left|\omega_{2}\right\rangle \ \ \ \ \ (43)
\displaystyle \left(\Omega_{1}^{\left(1\right)\otimes\left(2\right)}+\Omega_{2}^{\left(1\right)\otimes\left(2\right)}\right)^{2} \displaystyle = \displaystyle \left(\Omega_{1}^{2}\right)^{\left(1\right)}\otimes I^{\left(2\right)}+2\Omega_{1}^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}+I^{\left(1\right)}\otimes\left(\Omega_{2}^{2}\right)^{\left(2\right)} \ \ \ \ \ (44)

In this derivation, we used the fact that the identity operator leaves its operand unchanged, and thus that {\left(I^{2}\right)^{\left(i\right)}=I^{\left(i\right)}} for either space {i}.
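Result 44 can be confirmed in the same way. This sketch uses arbitrary 2×2 integer matrices for {\Omega_{1}} and {\Omega_{2}}; `madd` and `scale` are hypothetical helpers for entrywise sums and scalar multiples.

```python
def kron(A, B):
    """Matrix of A ⊗ B (lists of rows)."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def madd(*Ms):
    """Entrywise sum of equally shaped matrices."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*Ms)]


def scale(c, M):
    """Scalar multiple c M."""
    return [[c * x for x in row] for row in M]


Omega1 = [[1, 2], [3, 0]]  # acts on V1 (assumed)
Omega2 = [[0, 1], [1, 4]]  # acts on V2 (assumed)
I = identity(2)

S = madd(kron(Omega1, I), kron(I, Omega2))  # Omega_1 ⊗ I + I ⊗ Omega_2
lhs = matmul(S, S)
rhs = madd(kron(matmul(Omega1, Omega1), I),
           scale(2, kron(Omega1, Omega2)),
           kron(I, matmul(Omega2, Omega2)))
print(lhs == rhs)  # True
```

The cross terms combine into {2\Omega_{1}^{\left(1\right)}\otimes\Omega_{2}^{\left(2\right)}} only because the two embedded operators commute, which is the result of the commutator derivation earlier.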

Vector spaces – number of dimensions

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercises 1.4.1 – 1.4.2.

Here are a couple of theorems that arise from the subspace theorem we proved earlier, which is:

If {U} is a subspace of {V}, then {V=U\oplus U^{\perp}}. (Recall the direct sum.) Here, the orthogonal complement {U^{\perp}} of {U} is the set of all vectors that are orthogonal to all vectors {u\in U}.

First, we can show that:

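Theorem 1 The dimensionality of a vector space is {n_{\perp}}, the maximum number of mutually orthogonal nonzero vectors in the space.

Proof: A set of nonzero, mutually orthogonal vectors is linearly independent, since taking the inner product of a vanishing linear combination with each member of the set shows that each coefficient must be zero. If the set is the largest such set, any vector {v\in V} can be written as a linear combination of its members: otherwise, subtracting from {v} its projections onto each member would leave a nonzero vector orthogonal to all of them, which could be added to the set, contradicting its maximality. Since the set spans {V}, the dimension of the space cannot be greater than {n_{\perp}}. Since the set is linearly independent, no member of the set can be written as a linear combination of the remaining members, so the dimension can’t be less than {n_{\perp}}. Thus the dimension must be equal to {n_{\perp}}. \Box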
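The projection step in this argument is Gram-Schmidt orthogonalization. The Python sketch below (real vectors, exact rational arithmetic via `fractions.Fraction`; the example vectors are arbitrary) keeps a vector only if something nonzero is left after subtracting its projections on the orthogonal vectors found so far, so the number of vectors kept equals the dimension of their span. `orthogonalize` is a hypothetical helper name.

```python
from fractions import Fraction


def dot(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(vectors):
    """Gram-Schmidt: nonzero, mutually orthogonal vectors with the same span.

    A vector whose residual (after removing its projections on the vectors
    kept so far) is zero already lies in their span and is dropped.
    """
    ortho = []
    for vec in vectors:
        w = [Fraction(x) for x in vec]
        for q in ortho:
            c = dot(q, w) / dot(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        if any(w):
            ortho.append(w)
    return ortho


# Vectors spanning R^3; the third is 2*(first) + (second), so it leaves no residual
vecs = [[1, 1, 0], [1, 0, 1], [3, 2, 1], [0, 1, 1]]
ortho = orthogonalize(vecs)
print(len(ortho))  # 3
```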

Now we look at a couple of other theorems.

Theorem 2 In a vector space {V^{n}} of dimension {n}, the set {V_{\perp}} of all vectors orthogonal to a given vector {v\ne\left|0\right\rangle } forms a subspace {V^{n-1}} of dimension {n-1}.

Proof: From the subspace theorem above, if we take {U} to be the subspace spanned by {v}, then {U^{\perp}} is the orthogonal subspace. Since the dimension of {U} is 1 and {V^{n}=U\oplus U^{\perp}}, the dimension of {U^{\perp}=V^{n-1}} is {n-1}.\Box
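A sketch of theorem 2 for a concrete {v} in a real 4-dimensional space (the vector is an arbitrary assumption). Orthogonalizing {v} followed by the standard basis produces {n} orthogonal vectors, and the {n-1} of them after {v} span the set of vectors orthogonal to {v}. `orthogonalize` and `complement_basis` are hypothetical helpers.

```python
from fractions import Fraction


def dot(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(vectors):
    """Gram-Schmidt (real case): nonzero, mutually orthogonal vectors with the same span."""
    ortho = []
    for vec in vectors:
        w = [Fraction(x) for x in vec]
        for q in ortho:
            c = dot(q, w) / dot(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        if any(w):
            ortho.append(w)
    return ortho


def complement_basis(v):
    """Orthogonal basis of the set of vectors orthogonal to a nonzero v."""
    n = len(v)
    standard = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    ortho = orthogonalize([v] + standard)  # n vectors in total; the first is v itself
    return ortho[1:]


v = [1, 2, 3, 4]  # arbitrary nonzero vector in a 4-dimensional real space
perp = complement_basis(v)
print(len(perp))  # 3 = n - 1
```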

Theorem 3 Given two subspaces {V_{1}^{n_{1}}} and {V_{2}^{n_{2}}} such that every vector {v_{1}\in V_{1}} is orthogonal to every vector {v_{2}\in V_{2}}, the dimension of {V_{1}\oplus V_{2}} is {n_{1}+n_{2}}.

Proof: An orthonormal basis of {V_{1}} consists of {n_{1}} mutually orthogonal vectors in {V_{1}}, and similarly, an orthonormal basis of {V_{2}} consists of {n_{2}} mutually orthogonal vectors in {V_{2}}. These bases consist of the maximum number of mutually orthogonal vectors in their respective spaces. In the direct sum {V_{1}\oplus V_{2}}, we therefore have a set of {n_{1}+n_{2}} mutually orthogonal vectors, which is the maximum number of such vectors in {V_{1}\oplus V_{2}}. This follows because a vector {w\in V_{1}\oplus V_{2}} must be a linear combination of a vector {v_{1}\in V_{1}} and a vector {v_{2}\in V_{2}}, where {v_{i}} is, in turn, a linear combination of the basis of space {V_{i}}. Thus {w=v_{1}+v_{2}} must be a linear combination of vectors from the two bases combined. Hence the dimension of {V_{1}\oplus V_{2}} is {n_{1}+n_{2}}.\Box
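The key step, that combining the two bases gives {n_{1}+n_{2}} nonzero, mutually orthogonal (hence linearly independent) vectors, can be checked for a small example. The two subspaces below are arbitrary choices in a real 4-dimensional space, and `is_orthogonal_set` is a hypothetical helper.

```python
def dot(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if the vectors are nonzero and mutually orthogonal (hence independent)."""
    nonzero = all(dot(v, v) != 0 for v in vectors)
    pairwise = all(dot(p, q) == 0
                   for i, p in enumerate(vectors) for q in vectors[i + 1:])
    return nonzero and pairwise


basis1 = [[1, 1, 0, 0], [0, 0, 1, 0]]  # orthogonal basis of V1, n1 = 2 (assumed example)
basis2 = [[1, -1, 0, 0]]               # orthogonal basis of V2, n2 = 1 (assumed example)

combined = basis1 + basis2
print(is_orthogonal_set(combined), len(combined))  # True 3
```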


Vector spaces & linear independence – some examples

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Exercises 1.1.1 – 1.1.5.

Here are a few examples of vector space problems.

Given the axioms of a vector space, we can derive a few more properties. I’ll use Shankar’s notation for vectors, which is essentially Dirac’s bra-ket notation.

Theorem 1 The additive identity {0} is unique.

Proof: (by contradiction) Suppose there are two distinct additive identities {\left|0\right\rangle } and {\left|0^{\prime}\right\rangle }. Then

\displaystyle   \left|0^{\prime}\right\rangle \displaystyle  = \displaystyle  \left|0^{\prime}\right\rangle +\left|0\right\rangle \mbox{ (since \ensuremath{\left|0\right\rangle } is an additive identity)}\ \ \ \ \ (1)
\displaystyle  \displaystyle  = \displaystyle  \left|0\right\rangle +\left|0^{\prime}\right\rangle \mbox{ (commutative addition)}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \left|0\right\rangle \mbox{ (since \ensuremath{\left|0^{\prime}\right\rangle } is an additive identity)} \ \ \ \ \ (3)

Thus {\left|0^{\prime}\right\rangle =\left|0\right\rangle }, contradicting the assumption that they are distinct.\Box


Theorem 2 Multiplication of any vector by the zero scalar gives the zero vector.

Proof: We wish to show that {0\left|v\right\rangle =\left|0\right\rangle } for all {v\in V}. We have

\displaystyle   \left|0\right\rangle \displaystyle  = \displaystyle  \left(0+1\right)\left|v\right\rangle +\left|-v\right\rangle \ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  0\left|v\right\rangle +\left|v\right\rangle +\left|-v\right\rangle \ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  0\left|v\right\rangle +\left|0\right\rangle \ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  0\left|v\right\rangle \ \ \ \ \ (7)

where the third line follows because {\left|-v\right\rangle } is the additive inverse of {\left|v\right\rangle } and the last line follows because {\left|0\right\rangle } is the additive identity vector.\Box

Theorem 3 {\left|-v\right\rangle =-\left|v\right\rangle }. That is, {-\left|v\right\rangle } is the additive inverse of {\left|v\right\rangle }.

Proof: The negative {-\left|v\right\rangle } of a vector is defined as the product of {\left|v\right\rangle } with the scalar {-1}, so

\displaystyle   \left|v\right\rangle +\left(-\left|v\right\rangle \right) \displaystyle  = \displaystyle  \left(1+\left(-1\right)\right)\left|v\right\rangle \ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  0\left|v\right\rangle \ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left|0\right\rangle \ \ \ \ \ (10)

by theorem 2. Thus {-\left|v\right\rangle } is an additive inverse of {\left|v\right\rangle }. Since the additive inverse is unique (theorem 4, whose proof below does not use this result), {-\left|v\right\rangle =\left|-v\right\rangle }.\Box

Theorem 4 The additive inverse {\left|-v\right\rangle } is unique.

Proof: Suppose there is another vector {\left|w\right\rangle } for which {\left|v\right\rangle +\left|w\right\rangle =\left|0\right\rangle }. Since also {\left|v\right\rangle +\left|-v\right\rangle =\left|0\right\rangle }, we must have {\left|v\right\rangle +\left|w\right\rangle =\left|v\right\rangle +\left|-v\right\rangle }. Adding {\left|-v\right\rangle } to both sides and using the associativity and commutativity of addition gives

\displaystyle   \left|-v\right\rangle +\left|v\right\rangle +\left|w\right\rangle \displaystyle  = \displaystyle  \left|-v\right\rangle +\left|v\right\rangle +\left|-v\right\rangle \ \ \ \ \ (11)
\displaystyle  \left|0\right\rangle +\left|w\right\rangle \displaystyle  = \displaystyle  \left|0\right\rangle +\left|-v\right\rangle \ \ \ \ \ (12)
\displaystyle  \left|w\right\rangle \displaystyle  = \displaystyle  \left|-v\right\rangle \ \ \ \ \ (13)

where the second line follows because {\left|-v\right\rangle } is the additive inverse of {\left|v\right\rangle } and the third line follows because {\left|0\right\rangle } is the additive identity.\Box

Example 1 Consider the set of all entities {\left(a,b,c\right)} where the entries are real numbers. Addition and scalar multiplication are defined as

\displaystyle   \left(a,b,c\right)+\left(d,e,f\right) \displaystyle  \equiv \displaystyle  \left(a+d,b+e,c+f\right)\ \ \ \ \ (14)
\displaystyle  \alpha\left(a,b,c\right) \displaystyle  \equiv \displaystyle  \left(\alpha a,\alpha b,\alpha c\right) \ \ \ \ \ (15)

The null vector is

\displaystyle  \left|0\right\rangle =\left(0,0,0\right) \ \ \ \ \ (16)

The inverse of {\left(a,b,c\right)} is {\left(-a,-b,-c\right)}. As the set is closed under addition and scalar multiplication (the remaining axioms follow from the properties of real numbers), it is a vector space. However, a subset such as {\left(a,b,1\right)} is not a vector space since it is not closed under addition or scalar multiplication:

\displaystyle   \left(a,b,1\right)+\left(d,e,1\right) \displaystyle  = \displaystyle  \left(a+d,b+e,2\right)\ \ \ \ \ (17)
\displaystyle  2\left(a,b,1\right) \displaystyle  = \displaystyle  \left(2a,2b,2\right) \ \ \ \ \ (18)

Neither of the vectors on the RHS is of the form {\left(a,b,1\right)}, so neither lies in the set.

Example 2 The set of all functions {f\left(x\right)} defined on an interval {0\le x\le L} forms a vector space if we define addition pointwise, {\left(f+g\right)\left(x\right)=f\left(x\right)+g\left(x\right)} for all {x}, and scalar multiplication by {a} as {\left(af\right)\left(x\right)=af\left(x\right)}.

Some subsets of this vector space are also vector spaces. For example, the set of all functions that satisfy {f\left(0\right)=f\left(L\right)=0} is a vector space, because the sum of any two such functions also satisfies {\left(f+g\right)\left(0\right)=\left(f+g\right)\left(L\right)=0}, and scalar multiplication leaves the endpoint values at 0 as well.

The subset of periodic functions {f\left(0\right)=f\left(L\right)} (not necessarily equal to 0) is also a vector space. Adding any two functions from this subset gives a sum such that

\displaystyle   f\left(0\right)+g\left(0\right) \displaystyle  = \displaystyle  f\left(L\right)+g\left(L\right)\ \ \ \ \ (19)
\displaystyle  \left(f+g\right)\left(0\right) \displaystyle  = \displaystyle  \left(f+g\right)\left(L\right) \ \ \ \ \ (20)

Multiplying by a scalar gives

\displaystyle   a\left(f\left(0\right)+g\left(0\right)\right) \displaystyle  = \displaystyle  a\left(f\left(L\right)+g\left(L\right)\right)\ \ \ \ \ (21)
\displaystyle  a\left(f+g\right)\left(0\right) \displaystyle  = \displaystyle  a\left(f+g\right)\left(L\right) \ \ \ \ \ (22)

However, a subset such as all functions with {f\left(0\right)=4} is not a vector space, since adding two such functions gives a sum with {\left(f+g\right)\left(0\right)=8}, and multiplying by a scalar {a\ne1} gives a function with {af\left(0\right)=4a\ne4}; neither result is in the subset.
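A Python sketch of pointwise addition and scalar multiplication, treating functions as vectors. The interval length, the sample functions and the helper names (`add`, `scale`, `vanishes_at_ends`) are illustrative assumptions; floating-point evaluation means the endpoint test uses a small tolerance.

```python
import math

L = 2.0  # assumed length of the interval 0 <= x <= L


def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(a, f):
    """Scalar multiple: (a f)(x) = a * f(x)."""
    return lambda x: a * f(x)


def vanishes_at_ends(f, tol=1e-12):
    """True if f(0) = f(L) = 0, up to floating-point rounding."""
    return abs(f(0)) < tol and abs(f(L)) < tol


def f(x):
    return math.sin(math.pi * x / L)  # f(0) = f(L) = 0


def g(x):
    return x * (L - x)                # g(0) = g(L) = 0


def h(x):
    return 4 + x                      # h(0) = 4


print(vanishes_at_ends(add(f, g)), vanishes_at_ends(scale(3.5, f)))  # True True
print(add(h, h)(0), scale(3, h)(0))  # 8 12: neither has the value 4 at x = 0
```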

Now a couple of examples of linear independence.

Example 3 We have three vectors from the vector space of real {2\times2} matrices:

\displaystyle   \left|1\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 0 & 0 \end{array}\right]\ \ \ \ \ (23)
\displaystyle  \left|2\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 1\\ 0 & 1 \end{array}\right]\ \ \ \ \ (24)
\displaystyle  \left|3\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{cc} -2 & -1\\ 0 & -2 \end{array}\right] \ \ \ \ \ (25)

These are not linearly independent, because {\left|3\right\rangle =\left|1\right\rangle -2\left|2\right\rangle }.

Example 4 We have 3 row vectors

\displaystyle   \left|1\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 1 & 1 & 0\end{array}\right]\ \ \ \ \ (26)
\displaystyle  \left|2\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 1 & 0 & 1\end{array}\right]\ \ \ \ \ (27)
\displaystyle  \left|3\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 3 & 2 & 1\end{array}\right] \ \ \ \ \ (28)

These are linearly dependent, since {\left|3\right\rangle =2\left|1\right\rangle +\left|2\right\rangle }.

Now we look at the 3 vectors

\displaystyle   \left|1\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 1 & 1 & 0\end{array}\right]\ \ \ \ \ (29)
\displaystyle  \left|2\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 1 & 0 & 1\end{array}\right]\ \ \ \ \ (30)
\displaystyle  \left|3\right\rangle \displaystyle  = \displaystyle  \left[\begin{array}{ccc} 0 & 1 & 1\end{array}\right] \ \ \ \ \ (31)

We can show that these are linearly independent by attempting to solve the equation

\displaystyle  0=a\left|1\right\rangle +b\left|2\right\rangle +c\left|3\right\rangle \ \ \ \ \ (32)

Looking at each component, we have

\displaystyle   a+b \displaystyle  = \displaystyle  0\ \ \ \ \ (33)
\displaystyle  a+c \displaystyle  = \displaystyle  0\ \ \ \ \ (34)
\displaystyle  b+c \displaystyle  = \displaystyle  0 \ \ \ \ \ (35)

Solving the last two equations for {a} and {b} in terms of {c} and substituting into the first equation, we get

\displaystyle   -2c \displaystyle  = \displaystyle  0\ \ \ \ \ (36)
\displaystyle  c \displaystyle  = \displaystyle  0 \ \ \ \ \ (37)

Thus we find that the only solution is {a=b=c=0}, which proves linear independence.
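The same conclusions follow from computing the rank of each set, which equals the number of vectors exactly when they are linearly independent. This sketch uses exact Gauss-Jordan elimination with `fractions.Fraction`, and flattens each 2×2 matrix of example 3 into a 4-component vector (row by row), which preserves all linear relations. `rank_of` and `independent` are hypothetical helper names.

```python
from fractions import Fraction


def rank_of(vectors):
    """Rank of a list of equal-length vectors, by exact Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in vec] for vec in vectors]
    r = 0  # number of pivots found so far
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def independent(vectors):
    """Vectors are linearly independent exactly when their rank equals their number."""
    return rank_of(vectors) == len(vectors)


# Example 3: each 2x2 matrix flattened row by row into 4 components
mats = [[0, 1, 0, 0], [1, 1, 0, 1], [-2, -1, 0, -2]]
# Example 4: the two sets of row vectors
rows_a = [[1, 1, 0], [1, 0, 1], [3, 2, 1]]
rows_b = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

print(independent(mats), independent(rows_a), independent(rows_b))  # False False True
```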

Unitary operators

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 7.

Another important type of operator is the unitary operator {U}, which is defined by the condition that it is surjective and that

\displaystyle  \left|Uu\right|=\left|u\right| \ \ \ \ \ (1)

for all {u\in V}. That is, a unitary operator preserves the norm of all vectors. The identity operator {I} is a special case of a unitary operator, as it doesn’t change any vector, but multiplying {I} by any complex number {\alpha} with {\left|\alpha\right|=1} also preserves the norm, so {\alpha I} is another unitary operator.

Because {U} preserves the norm of all vectors, the only vector that can be in the null space of {U} is the zero vector, meaning that {U} is also injective. As it is both injective and surjective, it is invertible.

Theorem 1 For a unitary operator {U}, {U^{\dagger}=U^{-1}}.

Proof: From its definition and the properties of an adjoint operator, we have

\displaystyle   \left|Uu\right|^{2} \displaystyle  = \displaystyle  \left\langle Uu,Uu\right\rangle \ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,U^{\dagger}Uu\right\rangle \ \ \ \ \ (3)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,u\right\rangle \ \ \ \ \ (4)

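Thus {\left\langle u,\left(U^{\dagger}U-I\right)u\right\rangle =0} for all {u\in V}. Since {U^{\dagger}U-I} is hermitian, this forces {U^{\dagger}U-I=0}, so {U^{\dagger}U=I}; as {U} is invertible, {U^{\dagger}=U^{-1}}.\Box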

Theorem 2 Unitary operators preserve inner products, meaning that {\left\langle Uu,Uv\right\rangle =\left\langle u,v\right\rangle } for all {u,v\in V}.

Proof: Since {U^{\dagger}=U^{-1}} we have

\displaystyle  \left\langle Uu,Uv\right\rangle =\left\langle u,U^{\dagger}Uv\right\rangle =\left\langle u,v\right\rangle \ \ \ \ \ (5)
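A numerical sketch of theorems 1 and 2. The 2×2 matrix below is an assumed example of a unitary operator (a rotation through angle {t} dressed with phases {e^{\pm i\phi}}), and the inner product is taken antilinear in its first argument, matching {\left\langle u,U^{\dagger}Uu\right\rangle } above. Comparisons use a tolerance because of floating-point rounding; all helper names are hypothetical.

```python
import cmath
import math


def dagger(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inner(u, v):
    """<u, v>, antilinear in the first argument (physics convention)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def close(a, b, tol=1e-10):
    return abs(a - b) < tol


# Assumed example of a unitary: rotation through t, dressed with phases e^{±i phi}
t, phi = 0.7, 1.1
U = [[math.cos(t), -cmath.exp(-1j * phi) * math.sin(t)],
     [cmath.exp(1j * phi) * math.sin(t), math.cos(t)]]

# Theorem 1: U^dagger U = I
UdU = matmul(dagger(U), U)
print(all(close(UdU[i][j], 1 if i == j else 0) for i in range(2) for j in range(2)))  # True

# Theorem 2: <Uu, Uv> = <u, v>
u, v = [1 + 2j, -1j], [0.5, 3 - 1j]
print(close(inner(matvec(U, u), matvec(U, v)), inner(u, v)))  # True
```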


Theorem 3 Acting on an orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} with a unitary operator {U} produces another orthonormal basis.

Proof: Suppose the orthonormal basis is converted to another set of vectors {\left(f_{1},\ldots,f_{n}\right)} by {U}:

\displaystyle  f_{i}=Ue_{i} \ \ \ \ \ (6)


\displaystyle  \left\langle f_{i},f_{j}\right\rangle =\left\langle Ue_{i},Ue_{j}\right\rangle =\left\langle e_{i},e_{j}\right\rangle =\delta_{ij} \ \ \ \ \ (7)

Thus {\left(f_{1},\ldots,f_{n}\right)} is an orthonormal set, and orthonormal vectors are linearly independent. Since the orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} spans {V} (by assumption), {V} has dimension {n}, so the {n} linearly independent vectors {\left(f_{1},\ldots,f_{n}\right)} also form an orthonormal basis for {V}.\Box

Theorem 4 If one orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} is converted to another {\left(f_{1},\ldots,f_{n}\right)} by a unitary operator {U}, then the matrix elements of {U} are the same in both bases.

Proof: This is just a special case of the more general theorem that states that any operator that transforms one set of basis vectors into another has the same matrix elements in both bases. In this case, the proof is especially simple:

\displaystyle   U_{ki}\left(\left\{ e\right\} \right) \displaystyle  = \displaystyle  \left\langle e_{k},Ue_{i}\right\rangle \ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \left\langle U^{-1}f_{k},f_{i}\right\rangle \ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left\langle U^{\dagger}f_{k},f_{i}\right\rangle \ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \left\langle f_{k},Uf_{i}\right\rangle \ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  U_{ki}\left(\left\{ f\right\} \right) \ \ \ \ \ (12)
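A numerical sketch of theorems 3 and 4, using an assumed 2×2 unitary (a rotation through angle {t} dressed with phases {e^{\pm i\phi}}) and an inner product antilinear in its first argument. It maps the standard basis to {f_{i}=Ue_{i}} and checks both orthonormality and the equality of matrix elements {\left\langle e_{k},Ue_{i}\right\rangle =\left\langle f_{k},Uf_{i}\right\rangle }. The helper names are hypothetical.

```python
import cmath
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inner(u, v):
    """<u, v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def close(a, b, tol=1e-10):
    return abs(a - b) < tol


# Assumed 2x2 unitary: rotation through t, dressed with phases e^{±i phi}
t, phi = 0.7, 1.1
U = [[math.cos(t), -cmath.exp(-1j * phi) * math.sin(t)],
     [cmath.exp(1j * phi) * math.sin(t), math.cos(t)]]

e = [[1, 0], [0, 1]]             # orthonormal basis {e}
f = [matvec(U, ei) for ei in e]  # f_i = U e_i

# Theorem 3: the f_i are orthonormal
ortho_ok = all(close(inner(f[i], f[j]), 1 if i == j else 0)
               for i in range(2) for j in range(2))
print(ortho_ok)  # True

# Theorem 4: U_ki({e}) = <e_k, U e_i> equals U_ki({f}) = <f_k, U f_i>
same_ok = all(close(inner(e[k], matvec(U, e[i])), inner(f[k], matvec(U, f[i])))
              for k in range(2) for i in range(2))
print(same_ok)  # True
```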


Hermitian operators – a few theorems

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 7.

A hermitian operator {T} satisfies {T=T^{\dagger}}. [Axler (and most mathematicians, probably) refers to a hermitian operator as self-adjoint and uses the notation {T^*} for {T^{\dagger}}.]

As preparation for discussing hermitian operators, we need the following theorem.

Theorem 1 Let {T} be a linear operator on a complex inner product space {V}. If {\left\langle v,Tv\right\rangle =0} for all {v\in V}, then {T=0}.

Proof: The idea is to show something even more general, namely that {\left\langle u,Tv\right\rangle =0} for all {u,v\in V}. If we can do this, then setting {u=Tv} means that {\left\langle Tv,Tv\right\rangle =0} for all {v\in V}, which in turn implies that {Tv=0} for all {v\in V}, implying further that {T=0}.

Zwiebach goes through a few stages in developing the proof, but the end result is that we can write

\displaystyle   \left\langle u,Tv\right\rangle \displaystyle  = \displaystyle  \frac{1}{4}\left[\left\langle u+v,T\left(u+v\right)\right\rangle -\left\langle u-v,T\left(u-v\right)\right\rangle \right]+\ \ \ \ \ (1)
\displaystyle  \displaystyle  \displaystyle  \frac{1}{4i}\left[\left\langle u+iv,T\left(u+iv\right)\right\rangle -\left\langle u-iv,T\left(u-iv\right)\right\rangle \right] \ \ \ \ \ (2)

Note that all the terms on the RHS are of the form {\left\langle x,Tx\right\rangle } for some {x}. Thus if we require {\left\langle x,Tx\right\rangle =0} for all {x\in V}, then all four terms are separately 0, meaning that {\left\langle u,Tv\right\rangle =0} as desired, completing the proof. \Box
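The identity in equations 1 and 2 is easy to check numerically. The following pure-Python sketch evaluates both sides for a random 3×3 complex matrix; the helper names and the random data are our own illustrative choices, not anything from the course.

```python
import random

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def combine(x, y, a):
    # Return the vector x + a*y
    return [xi + a * yi for xi, yi in zip(x, y)]

def polarization_rhs(T, u, v):
    # Right-hand side of 1-2, built only from quantities of the form <x, Tx>
    def q(x):
        return inner(x, matvec(T, x))
    return ((q(combine(u, v, 1)) - q(combine(u, v, -1))) / 4
            + (q(combine(u, v, 1j)) - q(combine(u, v, -1j))) / 4j)

def rand_c():
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))

random.seed(0)
n = 3
T = [[rand_c() for _ in range(n)] for _ in range(n)]
u = [rand_c() for _ in range(n)]
v = [rand_c() for _ in range(n)]

lhs = inner(u, matvec(T, v))
rhs = polarization_rhs(T, u, v)
print(lhs, rhs)   # equal up to rounding
```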

Although we’ve used the imaginary number {i} in this proof, we might wonder whether the result is really restricted to complex vector spaces. That is, is there some other decomposition of {\left\langle u,Tv\right\rangle } that doesn’t require complex numbers and would still work?

In fact, we don’t need to worry about this, since there is a simple counter-example to the theorem in a real vector space. In the real 2-d plane, the operator {T} that rotates a vector through {\frac{\pi}{2}} always produces a vector orthogonal to the original, so {\left\langle v,Tv\right\rangle =0} for all {v}. Since {T\ne0}, the theorem is definitely not true for real vector spaces. (In 3-d this example fails, since a vector along the rotation axis is left unchanged.)
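The counter-example can be checked directly. In this pure-Python sketch (function names are our own), rotate_90 implements the rotation through {\frac{\pi}{2}} in the real plane; the last lines show that over {\mathbb{C}} the same operator has {\left\langle v,Tv\right\rangle \ne0} for some {v}, consistent with Theorem 1.

```python
def real_inner(u, v):
    # Ordinary dot product on R^2
    return sum(a * b for a, b in zip(u, v))

def complex_inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def rotate_90(v):
    # Rotation through pi/2: (x, y) -> (-y, x); works for real or complex entries
    x, y = v
    return (-y, x)

samples = [(1.0, 0.0), (0.3, -2.5), (4.0, 7.0), (-1.5, 0.2)]
real_values = [real_inner(v, rotate_90(v)) for v in samples]
print(real_values)   # every entry is 0, although T is not the zero operator

# Over C the same matrix has the eigenvector (1, -i) with eigenvalue i,
# so <w, Tw> = i <w, w> = 2i, which is not zero
w = (1 + 0j, -1j)
print(complex_inner(w, rotate_90(w)))
```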

Now we can turn to a few theorems about hermitian operators. First, since every operator on a finite-dimensional, nonzero complex vector space has at least one eigenvalue, we know that every hermitian operator on such a space has at least one eigenvalue. This leads to the first theorem on hermitian operators.

Theorem 2 All eigenvalues of hermitian operators are real.

Proof: Since at least one eigenvalue {\lambda} exists, let {v} be the corresponding non-zero eigenvector, so that {Tv=\lambda v}. We have

\displaystyle  \left\langle v,Tv\right\rangle =\left\langle v,\lambda v\right\rangle =\lambda\left\langle v,v\right\rangle \ \ \ \ \ (3)

Since {T=T^{\dagger}} we also have

\displaystyle  \left\langle v,Tv\right\rangle =\left\langle T^{\dagger}v,v\right\rangle =\left\langle Tv,v\right\rangle =\left\langle \lambda v,v\right\rangle =\lambda^*\left\langle v,v\right\rangle \ \ \ \ \ (4)

Equating the last two equations, and remembering that {\left\langle v,v\right\rangle \ne0}, we have {\lambda=\lambda^*}, so {\lambda} is real. \Box
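For a 2×2 example the eigenvalues can be written down with the quadratic formula, which makes Theorem 2 easy to see numerically. In this pure-Python sketch the matrices {H} and {R} and the helper names are illustrative choices: {H} is hermitian and has real eigenvalues, while the (non-hermitian) rotation {R} has eigenvalues {\pm i}.

```python
import cmath

def eigenvalues_2x2(A):
    # Roots of the characteristic polynomial lambda^2 - (tr A) lambda + det A = 0
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root

def is_hermitian(A, tol=1e-12):
    # A equals its conjugate transpose
    n = len(A)
    return all(abs(A[i][j] - A[j][i].conjugate()) < tol for i in range(n) for j in range(n))

H = [[2 + 0j, 1 - 3j],
     [1 + 3j, -1 + 0j]]    # hermitian
R = [[0j, -1 + 0j],
     [1 + 0j, 0j]]         # rotation through pi/2: not hermitian

print(is_hermitian(H), eigenvalues_2x2(H))   # real eigenvalues
print(is_hermitian(R), eigenvalues_2x2(R))   # purely imaginary eigenvalues
```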

Next, a theorem on the eigenvectors of distinct eigenvalues.

Theorem 3 Eigenvectors associated with different eigenvalues of a hermitian operator are orthogonal.

Proof: Suppose {\lambda_{1}\ne\lambda_{2}} are two eigenvalues of {T}, and {v_{1}} and {v_{2}} are the corresponding eigenvectors. Then {Tv_{1}=\lambda_{1}v_{1}} and {Tv_{2}=\lambda_{2}v_{2}}. Taking an inner product, we have

\displaystyle   \left\langle v_{2},Tv_{1}\right\rangle \displaystyle  = \displaystyle  \lambda_{1}\left\langle v_{2},v_{1}\right\rangle \ \ \ \ \ (5)
\displaystyle  \left\langle v_{2},Tv_{1}\right\rangle \displaystyle  = \displaystyle  \left\langle Tv_{2},v_{1}\right\rangle \ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \lambda_{2}\left\langle v_{2},v_{1}\right\rangle \ \ \ \ \ (7)

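where in the last line we used the fact that {\lambda_{2}} is real when taking it outside the inner product (in general {\left\langle \lambda_{2}v_{2},v_{1}\right\rangle =\lambda_{2}^*\left\langle v_{2},v_{1}\right\rangle }). Equating the first and last lines gives {\left(\lambda_{1}-\lambda_{2}\right)\left\langle v_{2},v_{1}\right\rangle =0}, and since {\lambda_{1}\ne\lambda_{2}}, we see that {\left\langle v_{2},v_{1}\right\rangle =0} as required.\Box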
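For a concrete check of Theorem 3, the sketch below (pure Python; the matrix {H} and the helper names are illustrative choices, not from the course) builds eigenvectors of a 2×2 hermitian matrix with eigenvalues 4 and -3 and confirms that they are orthogonal.

```python
def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def eigenvector_2x2(H, lam):
    # For H = [[a, b], [b*, d]] with b != 0, the vector (b, lam - a) satisfies H x = lam x
    a, b = H[0][0], H[0][1]
    return [b, lam - a]

H = [[2 + 0j, 1 - 3j],
     [1 + 3j, -1 + 0j]]   # hermitian, with eigenvalues 4 and -3
v1 = eigenvector_2x2(H, 4)
v2 = eigenvector_2x2(H, -3)
print(inner(v2, v1))      # zero: the eigenvectors are orthogonal
```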

Linear functionals and adjoint operators

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapters 3.F, 7.

Linear functionals

A linear functional is a linear map {\phi} from a vector space {V} to the number field {\mathbb{F}}, which satisfies the two properties

  1. {\phi\left(v_{1}+v_{2}\right)=\phi\left(v_{1}\right)+\phi\left(v_{2}\right)}, with {v_{1},v_{2}\in V}.
  2. {\phi\left(av\right)=a\phi\left(v\right)} for all {v\in V} and {a\in\mathbb{F}}.

That is, a linear functional acts on a vector and produces a number as output.

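The set of all linear functionals on {V} is itself a vector space (Axler calls it the dual space {V^{\prime}}), with addition and scalar multiplication defined by {\left(\phi+\psi\right)\left(v\right)=\phi\left(v\right)+\psi\left(v\right)} and {\left(a\phi\right)\left(v\right)=a\phi\left(v\right)}. Most of the vector space axioms follow directly because the values {\phi\left(v\right)} lie in the field {\mathbb{F}}. The additive identity is the zero functional, which sends every {v\in V} to 0. Note also that every linear functional satisfies {\phi\left(0\right)=0}, since property 1 gives {\phi\left(v\right)=\phi\left(v+0\right)=\phi\left(v\right)+\phi\left(0\right)}.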

From the definition above, we can prove that, on a finite-dimensional inner product space, any linear functional can be written as an inner product.

Theorem 1 For any linear functional {\phi} on {V} there is a unique vector {u\in V} such that {\phi\left(v\right)=\left\langle u,v\right\rangle } for all {v\in V}.

Proof: We can write any vector {v} in terms of an orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} as

\displaystyle v=\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i} \ \ \ \ \ (1)

By applying the two properties of a linear functional above, we have

\displaystyle \phi\left(v\right) \displaystyle = \displaystyle \phi\left(\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}\right)\ \ \ \ \ (2)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{n}\left\langle e_{i},v\right\rangle \phi\left(e_{i}\right)\ \ \ \ \ (3)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{n}\left\langle e_{i},\phi\left(e_{i}\right)v\right\rangle \ \ \ \ \ (4)
\displaystyle \displaystyle = \displaystyle \sum_{i=1}^{n}\left\langle \phi\left(e_{i}\right)^*e_{i},v\right\rangle \ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle \left\langle \left[\sum_{i=1}^{n}\phi\left(e_{i}\right)^*e_{i}\right],v\right\rangle \ \ \ \ \ (6)
\displaystyle \displaystyle \equiv \displaystyle \left\langle u,v\right\rangle \ \ \ \ \ (7)


where

\displaystyle u\equiv\sum_{i=1}^{n}\phi\left(e_{i}\right)^*e_{i} \ \ \ \ \ (8)

We were able to move {\phi\left(e_{i}\right)} inside the inner product in the third line above since {\phi\left(e_{i}\right)} is just a number and the inner product is linear in its second slot.

To prove that {u} is unique, as usual we suppose there is another {u^{\prime}} that gives the same result as {u} for all {v\in V}. This means that {\left\langle u^{\prime},v\right\rangle =\left\langle u,v\right\rangle } or {\left\langle u^{\prime}-u,v\right\rangle =0} for all {v}. We can then choose {v=u^{\prime}-u}, giving {\left\langle u^{\prime}-u,u^{\prime}-u\right\rangle =0}, which implies that {u^{\prime}-u=0} so {u^{\prime}=u}. \Box
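The construction in equation 8 is easy to try out. In this pure-Python sketch, phi is a hypothetical linear functional on {\mathbb{C}^{3}} chosen only for illustration; the code builds {u=\sum_{i}\phi\left(e_{i}\right)^*e_{i}} from the standard orthonormal basis and checks that {\phi\left(v\right)=\left\langle u,v\right\rangle } on a couple of test vectors.

```python
def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def phi(v):
    # A hypothetical linear functional on C^3, chosen only for illustration
    return 2 * v[0] - 1j * v[1] + (0.5 + 1j) * v[2]

n = 3
basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # standard orthonormal basis

# Equation 8: u = sum_i phi(e_i)^* e_i
u = [sum(phi(e).conjugate() * e[k] for e in basis) for k in range(n)]

test_vectors = [[1, 2j, -1], [0.3 - 1j, 4, 2 + 2j]]
for v in test_vectors:
    print(phi(v), inner(u, v))   # the two numbers agree
```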

Adjoint operators

Suppose we have some linear operator {T} and some fixed vector {u}. We can then form the inner product

\displaystyle \phi\left(v\right)=\left\langle u,Tv\right\rangle \ \ \ \ \ (9)

{\phi\left(v\right)} is a linear functional since it satisfies the two properties specified earlier. Zwiebach’s notes then state that, because {\phi} is a linear functional, we can write it in the form {\left\langle w,v\right\rangle } for some vector {w}. At first sight this might seem to require {T} to be surjective (that is, that the range of {T} is the entire space {V}), but it doesn’t: the argument of {\phi} is {v}, not {Tv}, and {\phi} is defined and linear for every {v\in V}. Thus {\phi} is a linear functional on all of {V}, and Theorem 1 guarantees a unique {w}. The motivation behind this step is the definition of the adjoint operator; in Axler’s book (chapter 7.A), an adjoint is just defined directly without any motivation from linear functionals.

With this established, we can follow Zwiebach’s argument. For a suitable vector {w} we have

\displaystyle \left\langle u,Tv\right\rangle =\left\langle w,v\right\rangle \ \ \ \ \ (10)

The vector {w} depends on both the operator {T} and the vector {u}, so we can write it as a function of {u}, using the notation

\displaystyle w=T^{\dagger}u \ \ \ \ \ (11)

This gives us the relation

\displaystyle \left\langle u,Tv\right\rangle =\left\langle T^{\dagger}u,v\right\rangle \ \ \ \ \ (12)

At this stage, we can’t be sure that {T^{\dagger}} is a linear operator; it may be some non-linear map from one vector to another. However, we have

Theorem 2 The operator {T^{\dagger}}, called the adjoint of {T}, is a linear operator: {T^{\dagger}\in\mathcal{L}\left(V\right)}.

Proof: Consider

\displaystyle \left\langle u_{1}+u_{2},Tv\right\rangle \displaystyle = \displaystyle \left\langle T^{\dagger}\left(u_{1}+u_{2}\right),v\right\rangle \ \ \ \ \ (13)
\displaystyle \left\langle u_{1}+u_{2},Tv\right\rangle \displaystyle = \displaystyle \left\langle u_{1},Tv\right\rangle +\left\langle u_{2},Tv\right\rangle \ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \left\langle T^{\dagger}u_{1},v\right\rangle +\left\langle T^{\dagger}u_{2},v\right\rangle \ \ \ \ \ (15)
\displaystyle \displaystyle = \displaystyle \left\langle T^{\dagger}u_{1}+T^{\dagger}u_{2},v\right\rangle \ \ \ \ \ (16)

Comparing the first and last lines gives us

\displaystyle T^{\dagger}\left(u_{1}+u_{2}\right)=T^{\dagger}u_{1}+T^{\dagger}u_{2} \ \ \ \ \ (17)

A similar argument can be used for multiplication by a number:

\displaystyle \left\langle au,Tv\right\rangle \displaystyle = \displaystyle \left\langle T^{\dagger}\left(au\right),v\right\rangle \ \ \ \ \ (18)
\displaystyle \left\langle au,Tv\right\rangle \displaystyle = \displaystyle a^*\left\langle u,Tv\right\rangle \ \ \ \ \ (19)
\displaystyle \displaystyle = \displaystyle a^*\left\langle T^{\dagger}u,v\right\rangle \ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \left\langle aT^{\dagger}u,v\right\rangle \ \ \ \ \ (21)

Again, comparing the first and last lines we have

\displaystyle T^{\dagger}\left(au\right)=aT^{\dagger}u \ \ \ \ \ (22)

Thus {T^{\dagger}} satisfies the two conditions required for linearity. \Box

A couple of other results follow fairly easily (proofs are in Zwiebach’s notes, if you’re interested):

\displaystyle \left(ST\right)^{\dagger} \displaystyle = \displaystyle T^{\dagger}S^{\dagger}\ \ \ \ \ (23)
\displaystyle \left(S^{\dagger}\right)^{\dagger} \displaystyle = \displaystyle S \ \ \ \ \ (24)
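In an orthonormal basis the adjoint is represented by the conjugate transpose (derived below in 31), so the identities 23 and 24 can be spot-checked with matrices. A minimal pure-Python sketch with random complex 3×3 matrices (the helper names are our own); the last line shows that the order of the factors matters.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    # Conjugate transpose: (A^dagger)_ij = (A_ji)^*
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def rand_matrix(n):
    return [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

random.seed(1)
S, T = rand_matrix(3), rand_matrix(3)
print(max_diff(dagger(matmul(S, T)), matmul(dagger(T), dagger(S))))  # (ST)^dagger = T^dagger S^dagger
print(max_diff(dagger(dagger(S)), S))                                # (S^dagger)^dagger = S
print(max_diff(dagger(matmul(S, T)), matmul(dagger(S), dagger(T))))  # wrong order: not small
```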

A very important result is the representation of adjoint operators in matrix form.

If we have an orthonormal basis {\left(e_{1},\ldots,e_{n}\right)} and an operator {T}, then {T} transforms the basis according to (using the summation convention):

\displaystyle Te_{k}=T_{ik}e_{i} \ \ \ \ \ (25)

Thus the matrix elements in this basis are found by taking the inner product with {e_{j}}:

\displaystyle \left\langle e_{j},T_{ik}e_{i}\right\rangle =T_{ik}\left\langle e_{j},e_{i}\right\rangle =T_{ik}\delta_{ji}=T_{jk} \ \ \ \ \ (26)


In the same way, the adjoint {T^{\dagger}} has matrix elements {T_{ij}^{\dagger}} defined by

\displaystyle T^{\dagger}e_{j}=T_{ij}^{\dagger}e_{i} \ \ \ \ \ (27)

Taking the inner product on the right with {e_{k}} we get

\displaystyle \left\langle T^{\dagger}e_{j},e_{k}\right\rangle =\left\langle T_{ij}^{\dagger}e_{i},e_{k}\right\rangle =\left(T_{ij}^{\dagger}\right)^*\left\langle e_{i},e_{k}\right\rangle =\left(T^{\dagger}\right)_{kj}^* \ \ \ \ \ (28)

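We can take the {T_{ik}} and {T_{ij}^{\dagger}} outside the inner product as they are just numbers (a number taken out of the first slot is complex conjugated). Setting {u=e_{j}} and {v=e_{k}} in 12 and using 25, we have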

\displaystyle \left\langle T^{\dagger}e_{j},e_{k}\right\rangle \displaystyle = \displaystyle \left\langle e_{j},T_{ik}e_{i}\right\rangle \ \ \ \ \ (29)
\displaystyle \left(T^{\dagger}\right)_{kj}^* \displaystyle = \displaystyle T_{jk} \ \ \ \ \ (30)

That is, in an orthonormal basis, the adjoint matrix is the complex conjugate transpose of the original matrix:

\displaystyle T^{\dagger}=\left(T^*\right)^{T} \ \ \ \ \ (31)

The superscript {T} indicates ‘transpose’, not another operator!
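A quick numerical check of 31: with the conjugate transpose as {T^{\dagger}}, the defining relation 12 holds, whereas the plain transpose fails for a complex matrix. Pure Python; the random data and helper names are illustrative choices.

```python
import random

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dagger(A):
    # Conjugate transpose, equation 31
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def rand_c():
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))

random.seed(2)
n = 3
T = [[rand_c() for _ in range(n)] for _ in range(n)]
u = [rand_c() for _ in range(n)]
v = [rand_c() for _ in range(n)]

lhs = inner(u, matvec(T, v))            # <u, T v>
rhs = inner(matvec(dagger(T), u), v)    # <T^dagger u, v>
print(abs(lhs - rhs))                   # rounding error only

# The plain transpose does not satisfy 12 for a complex matrix
transpose = [[T[j][i] for j in range(n)] for i in range(n)]
print(abs(lhs - inner(matvec(transpose, u), v)))
```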

Projection operators

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 6.

Continuing from our examination of orthonormal bases and the orthogonal complement in a vector space {V}, we can now look at the orthogonal projection, sometimes known in physics as a projection operator.

Suppose we have defined a subspace {U} of {V} and its orthogonal complement {U^{\perp}}, so that {V=U\oplus U^{\perp}}. We can define a linear operator {P_{U}} called the orthogonal projection operator. It has the property that, given any vector {v\in V}, it ‘projects’ out the component of {v} that lies in {U}. That is, if we write

\displaystyle v=u+w \ \ \ \ \ (1)

where {u\in U} and {w\in U^{\perp}}, then

\displaystyle P_{U}v=u \ \ \ \ \ (2)

An example of a projection operator is an operator in 3-d space that projects a vector onto the {xy} plane. Then the {xy} plane is the subspace {U} and the {z} axis is the orthogonal complement {U^{\perp}}.

From the definition of {P_{U}} we can list a few properties:

  1. {P_{U}} is not surjective (unless {U=V}), that is, its range {U} is smaller than the entire space {V}.
  2. {P_{U}} is not injective (unless {U=V}), since it maps every vector {u+w} with {w\in U^{\perp}} to the same vector {u}. Thus it is a many-to-one mapping.
  3. {P_{U}} is not invertible, since it is not injective.
  4. Its null space is {\mbox{null }P_{U}=U^{\perp}}.
  5. Once {P_{U}} is applied to any vector {v}, all subsequent applications of {P_{U}} have no effect. That is, once you’ve projected out the component of {v} that lies in {U}, all further projections into {U} just give the same result. In other words {P_{U}^{n}=P_{U}} for all integers {n>0}.
  6. {\left|P_{U}v\right|\le\left|v\right|}. This follows from the Pythagorean theorem, since {u} and {w} are orthogonal, so {\left|v\right|^{2}=\left|u\right|^{2}+\left|w\right|^{2}\ge\left|u\right|^{2}=\left|P_{U}v\right|^{2}}. Geometrically, a projection operator cannot increase the ‘length’ (norm) of a vector. This property relies on the fact that the projection is an orthogonal projection. Other projections can increase the length of a vector (think of the shadow cast by a stick; if the surface onto which the shadow falls is nearly parallel to the direction of the incoming light, the shadow is much longer than the stick).

An explicit form for {P_{U}v} can be obtained from the decomposition we had earlier, where {\left(e_{1},\ldots,e_{n}\right)} is an orthonormal basis of {U}:

\displaystyle v=\underbrace{\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}}_{\in U}+\underbrace{v-\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}}_{\in U^{\perp}} \ \ \ \ \ (3)

From this,

\displaystyle P_{U}v=\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i} \ \ \ \ \ (4)
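Equation 4 translates directly into code. The pure-Python sketch below is illustrative only: {U} is taken to be the {xy} plane in {\mathbb{C}^{3}}, described by an orthonormal basis chosen so that the answer is easy to predict, and project is our own helper name. It checks the result against the expected {\left(v_{1},v_{2},0\right)}, and also checks properties 5 and 6 and that {v-P_{U}v} lies in {U^{\perp}}.

```python
import math

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def project(basis_U, v):
    # Equation 4: P_U v = sum_i <e_i, v> e_i for an orthonormal basis (e_i) of U
    result = [0j] * len(v)
    for e in basis_U:
        c = inner(e, v)
        result = [r + c * ei for r, ei in zip(result, e)]
    return result

# Orthonormal basis of the xy plane: e_1 = (1, i, 0)/sqrt(2), e_2 = (i, 1, 0)/sqrt(2)
s = 1 / math.sqrt(2)
U_basis = [[s, 1j * s, 0j], [1j * s, s, 0j]]
v = [1 + 2j, -3 + 0.5j, 4 - 1j]

Pv = project(U_basis, v)
PPv = project(U_basis, Pv)       # property 5: projecting again changes nothing
print(Pv)                        # (v_1, v_2, 0) up to rounding
print(norm(Pv) <= norm(v))       # property 6
```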

From the definition, it seems reasonable that a vector space {V} can be decomposed into a direct sum of {\mbox{range }P_{U}} and {\mbox{null }P_{U}}. We can in fact prove this.

Theorem 1 If {P} is an orthogonal projection on the vector space {V}, then

\displaystyle V=\mbox{null }P\oplus\mbox{range }P \ \ \ \ \ (5)

Proof: Let {P=P_{U}} be the orthogonal projection onto a subspace {U}. Since {P_{U}v\in U} for every {v} and {P_{U}u=u} for every {u\in U}, we have {\mbox{range }P=U}. From our earlier theorem, we know that {V=U\oplus U^{\perp}}, so we need to show that {U^{\perp}=\mbox{null }P}. First, any {w\in U^{\perp}} has the decomposition {w=0+w}, so {Pw=0}, and therefore {U^{\perp}\subset\mbox{null }P}. Conversely, suppose {Px=0}. Decomposing {x=u+w} with {u\in U} and {w\in U^{\perp}}, we have {Px=u=0}, so {x=w\in U^{\perp}}. Thus {\mbox{null }P\subset U^{\perp}}, so {\mbox{null }P=U^{\perp}} and {V=\mbox{null }P\oplus\mbox{range }P}. \Box

From property 5 above, we have {P_{U}^{2}=P_{U}}. If {P_{U}v=\lambda v} with {v\ne0}, then {\lambda v=P_{U}v=P_{U}^{2}v=\lambda^{2}v}, so {\lambda^{2}=\lambda} and the only possible eigenvalues of {P_{U}} are 0 and 1. The eigenvectors belong to either the subspace {U} (for eigenvalue 1) or to the orthogonal complement {U^{\perp}} (for eigenvalue 0).

We can choose an orthonormal basis of {V} that divides into two separate lists of vectors, with one list {\left(e_{1},\ldots,e_{m}\right)} spanning the subspace {U} and the other list {\left(f_{1},\ldots,f_{k}\right)} spanning {U^{\perp}} (just combine orthonormal bases of {U} and {U^{\perp}}; together they form a basis of {V} since {V=U\oplus U^{\perp}}). A matrix representation of {P_{U}} can be obtained by considering the action of {P_{U}} on each of these basis vectors. We have

\displaystyle P_{U}e_{i} \displaystyle = \displaystyle e_{i}\ \ \ \ \ (6)
\displaystyle P_{U}f_{i} \displaystyle = \displaystyle 0 \ \ \ \ \ (7)

In general, the matrix representation of an operator {T} is defined in terms of its action on the basis vectors {v_{i}} by

\displaystyle Tv_{j}=\sum_{i=1}^{n}T_{ij}v_{i} \ \ \ \ \ (8)

For a projection operator, if we list the basis vectors in the order {\left(e_{1},\ldots,e_{m},f_{1},\ldots,f_{k}\right)}, equation 6 says that for {j\le m} the {j}th column has a 1 in row {j} and zeros elsewhere, while equation 7 says that the last {k} columns are entirely zero. That is, {P_{U}} is an {\left(m+k\right)\times\left(m+k\right)} diagonal matrix whose first {m} diagonal elements equal 1, with all other elements equal to zero.

In this basis, we see that {\mbox{det }P_{U}=0} provided {U\ne V} (because there is then at least one zero element on the diagonal) and {\mbox{tr }P_{U}=m}, which is the dimension of the subspace {U}. As the trace and determinant are invariant under a change of basis, these properties hold in any basis.
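The basis independence of the trace and determinant can be checked in a basis where {P_{U}} is not diagonal. This pure-Python sketch (helper names are our own; {U} is an arbitrary 2-dimensional example in {\mathbb{C}^{3}}) builds the matrix of {P_{U}} in the standard basis from {P_{U}v=\sum_{i}\left\langle e_{i},v\right\rangle e_{i}}, which gives {\left(P_{U}\right)_{jk}=\sum_{i}\left(e_{i}\right)_{j}\left(e_{i}\right)_{k}^*}.

```python
import math

def projection_matrix(basis_U, n):
    # Matrix of P_U in the standard basis: P_jk = sum_i (e_i)_j * conj((e_i)_k)
    return [[sum(e[j] * e[k].conjugate() for e in basis_U) for k in range(n)]
            for j in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det3(M):
    # Cofactor expansion along the first row of a 3x3 matrix
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

s = 1 / math.sqrt(2)
U_basis = [[s, 1j * s, 0j], [0j, 0j, 1 + 0j]]   # orthonormal basis of a 2-dimensional U (m = 2)
P = projection_matrix(U_basis, 3)               # not diagonal: P[0][1] = -i/2

trace = sum(P[i][i] for i in range(3))
print(trace, det3(P))   # trace = m = 2, determinant = 0 (up to rounding)
```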

Orthonormal basis and orthogonal complement

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 6.

Once we have defined an inner product on a vector space {V}, we can create an orthonormal basis for {V}. A list of vectors {\left(e_{1},e_{2},\ldots,e_{n}\right)} is orthonormal if

\displaystyle  \left\langle e_{i},e_{j}\right\rangle =\delta_{ij} \ \ \ \ \ (1)

That is, any pair of vectors is orthogonal, and all the vectors have norm 1. In 3-d space, the unit vectors along the three axes form an orthonormal list.

Given an orthonormal list, we can construct a vector from the vectors in that list by

\displaystyle   v \displaystyle  = \displaystyle  a_{1}e_{1}+a_{2}e_{2}+\ldots+a_{n}e_{n}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \sum_{i=1}^{n}a_{i}e_{i} \ \ \ \ \ (3)

for {a_{i}\in\mathbb{F}}. The norm of {v} has a simple form:

\displaystyle   \left|v\right|^{2} \displaystyle  = \displaystyle  \left\langle \sum_{i=1}^{n}a_{i}e_{i},\sum_{j=1}^{n}a_{j}e_{j}\right\rangle \ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \sum_{i=1}^{n}\left\langle a_{i}e_{i},a_{i}e_{i}\right\rangle +\mbox{zero terms}\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \sum_{i=1}^{n}\left|a_{i}\right|^{2} \ \ \ \ \ (6)

The ‘zero terms’ in the second line are terms involving {\left\langle a_{i}e_{i},a_{j}e_{j}\right\rangle } for {i\ne j} which are all zero because of 1.

This result shows that an orthonormal list of vectors is linearly independent, since if we form the linear combination

\displaystyle  v=a_{1}e_{1}+a_{2}e_{2}+\ldots+a_{n}e_{n}=0 \ \ \ \ \ (7)

then {\left\langle v,v\right\rangle =0} so from 6 we must have all {a_{i}=0}, which means the list is linearly independent.

If we have an orthonormal list {\left(e_{1},e_{2},\ldots,e_{n}\right)} that is also a basis for {V}, then any vector {v\in V} can be written as

\displaystyle  v=a_{1}e_{1}+a_{2}e_{2}+\ldots+a_{n}e_{n} \ \ \ \ \ (8)

The coefficients {a_{i}} can be found by taking the inner product {\left\langle e_{i},v\right\rangle =a_{i}} (using 1), so we have

\displaystyle  v=\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i} \ \ \ \ \ (9)
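Equations 6 and 9 can be checked numerically. In this pure-Python sketch (the basis and the vector are arbitrary illustrative choices), the coefficients {a_{i}=\left\langle e_{i},v\right\rangle } rebuild {v}, and {\sum\left|a_{i}\right|^{2}} reproduces {\left|v\right|^{2}}.

```python
import math

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]        # an orthonormal basis of C^2 (illustrative choice)
v = [3 + 1j, -2j]

coeffs = [inner(e, v) for e in basis]                                        # a_i = <e_i, v>
rebuilt = [sum(a * e[k] for a, e in zip(coeffs, basis)) for k in range(2)]   # equation 9
print(rebuilt)                                                               # v, up to rounding
print(inner(v, v).real, sum(abs(a) ** 2 for a in coeffs))                    # equation 6
```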

For example, in 3-d space, the 3 unit vectors along the {x,y,z} axes form an orthonormal basis for the space. However, the unit vectors along the {x,y} axes form an orthonormal list, but this is not a basis for 3-d space since no vector with a {z} component can be written as a linear combination of these two vectors.

If we have any basis (not necessarily orthonormal), we can form an orthonormal basis using the Gram-Schmidt orthogonalization procedure. We’ve already met this in the context of quantum mechanics, and the derivation for a general finite-dimensional vector space is much the same, so I’ll just quote the result. The procedure is iterative and follows these steps:

The first vector {e_{1}} in the orthonormal basis is defined by

\displaystyle  e_{1}=\frac{v_{1}}{\left|v_{1}\right|} \ \ \ \ \ (10)

where {v_{1}} is the first vector (well, any vector, really) in the non-orthonormal basis.

Given the vectors {e_{1},\ldots,e_{j-1}} already constructed, we can form {e_{j}} from the formula

\displaystyle  e_{j}=\frac{v_{j}-\sum_{i=1}^{j-1}\left\langle e_{i},v_{j}\right\rangle e_{i}}{\left|v_{j}-\sum_{i=1}^{j-1}\left\langle e_{i},v_{j}\right\rangle e_{i}\right|} \ \ \ \ \ (11)

{e_{j}} clearly has norm 1, and we can check that {\left\langle e_{i},e_{j}\right\rangle =\delta_{ij}} by direct calculation. Note that although we’ve indexed the vectors {v_{i}} in the original basis, we can take them in any order when calculating the orthonormal basis via the Gram-Schmidt procedure.
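Equations 10 and 11 translate almost line for line into code. Here is a minimal pure-Python sketch of the classical Gram-Schmidt procedure (function names are our own; the input vectors are arbitrary linearly independent vectors in {\mathbb{C}^{3}}).

```python
import math

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors, tol=1e-12):
    """Turn a list of linearly independent vectors into an orthonormal list, following 10 and 11."""
    basis = []
    for v in vectors:
        # Numerator of 11: subtract the components of v along e_1, ..., e_{j-1}
        w = list(v)
        for e in basis:
            c = inner(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = norm(w)
        if length < tol:
            raise ValueError("the input vectors are linearly dependent")
        basis.append([wi / length for wi in w])   # denominator of 11: normalize
    return basis

vs = [[1 + 0j, 1j, 0j], [1 + 0j, 0j, 1 + 0j], [0j, 1 + 0j, 1 + 0j]]
es = gram_schmidt(vs)
for ei in es:
    print([round(abs(inner(ei, ej)), 12) for ej in es])   # rows of the identity matrix
```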

Orthogonal complement

Suppose we have a subset {U} (not necessarily a subspace) of {V}. Then we can define the orthogonal complement {U^{\perp}} of {U} as the set of all vectors that are orthogonal to all vectors {u\in U}. More formally:

\displaystyle  U^{\perp}\equiv\left\{ v\in V|\left\langle v,u\right\rangle =0\mbox{ for all }u\in U\right\} \ \ \ \ \ (12)

A useful general theorem is as follows.

Theorem 1 If {U} is a subspace of {V}, then {V=U\oplus U^{\perp}}. (Recall the direct sum.)

Proof: Given an orthonormal basis of {U}: {\left(e_{1},e_{2},\ldots,e_{n}\right)}, we can write any {v\in V} as the sum

\displaystyle  v=\underbrace{\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}}_{\in U}+\underbrace{v-\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}}_{\in U^{\perp}} \ \ \ \ \ (13)

On the RHS, we’ve just added and subtracted the same term from {v}. Since the first term is a linear combination of the basis vectors of {U}, it is a vector in {U}. To see that the second term is in {U^{\perp}}, take the inner product with any of the basis vectors {e_{k}}:

\displaystyle   \left\langle e_{k},v-\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}\right\rangle \displaystyle  = \displaystyle  \left\langle e_{k},v\right\rangle -\left\langle e_{k},\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle e_{i}\right\rangle \ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \left\langle e_{k},v\right\rangle -\sum_{i=1}^{n}\left\langle e_{i},v\right\rangle \delta_{ik}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \left\langle e_{k},v\right\rangle -\left\langle e_{k},v\right\rangle \ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (17)

Since the second term is orthogonal to every basis vector {e_{k}} of {U}, it is orthogonal to every linear combination of them, and so lies in {U^{\perp}}. Finally, since the two vector spaces in a direct sum can have only the zero vector in their intersection, we need to show that {U\cap U^{\perp}=\left\{ 0\right\} }. However, if a vector {v} is in both {U} and {U^{\perp}}, then it must be orthogonal to itself, so {\left\langle v,v\right\rangle =0}, which implies {v=0}. \Box

Thus any finite-dimensional inner product space {V} can be decomposed into a subspace {U} and its orthogonal complement {U^{\perp}}; the decomposition is non-trivial whenever {U} is neither {\left\{ 0\right\} } nor {V} itself.

Inner products and Hilbert spaces

References: edX online course MIT 8.05.1x Week 4.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 6.

An inner product defined on a vector space {V} is a function that maps each ordered pair of vectors {\left(u,v\right)} of {V} to a number denoted by {\left\langle u,v\right\rangle \in\mathbb{F}}. The inner product satisfies the following axioms:

  1. positivity: {\left\langle v,v\right\rangle \ge0} for all {v\in V}.
  2. definiteness: {\left\langle v,v\right\rangle =0} if and only if {v=0} (the zero vector).
  3. additivity in the second slot: {\left\langle u,v+w\right\rangle =\left\langle u,v\right\rangle +\left\langle u,w\right\rangle }.
  4. homogeneity in the second slot: {\left\langle u,\lambda v\right\rangle =\lambda\left\langle u,v\right\rangle } where {\lambda\in\mathbb{F}}.
  5. conjugate symmetry: {\left\langle u,v\right\rangle =\left\langle v,u\right\rangle ^*}.

[Axler requires additivity and homogeneity in the first slot rather than the second, but Zwiebach uses the conditions above, which are more usual for physics.]

The norm {\left|v\right|} of a vector {v} is defined as

\displaystyle  \left|v\right|^{2}\equiv\left\langle v,v\right\rangle \ \ \ \ \ (1)

All the above applies to both real and complex vector spaces, although it should be noted that in the case of conjugate symmetry in a real space, {\left\langle v,u\right\rangle ^*=\left\langle v,u\right\rangle } so in that case, the condition reduces to {\left\langle u,v\right\rangle =\left\langle v,u\right\rangle }.

Conditions 3 and 4 apply to the first slot as well, though with a slight difference for complex vector spaces. [These properties can actually be derived from the 5 axioms above, so aren’t listed as separate axioms]:

\displaystyle   \left\langle u+v,w\right\rangle \displaystyle  = \displaystyle  \left\langle u,w\right\rangle +\left\langle v,w\right\rangle \ \ \ \ \ (2)
\displaystyle  \left\langle \lambda u,v\right\rangle \displaystyle  = \displaystyle  \lambda^*\left\langle u,v\right\rangle \ \ \ \ \ (3)

Two vectors are orthogonal if {\left\langle u,v\right\rangle =\left\langle v,u\right\rangle =0}. By this definition, the zero vector is orthogonal to all vectors, including itself.

The inner product is non-degenerate, meaning that any vector that is orthogonal to all vectors must be zero: such a vector is in particular orthogonal to itself, so definiteness forces it to be 0.

An orthogonal decomposition is defined for a vector {u} as follows: Suppose {u,v\in V} and {v\ne0}. We can write {u} as

\displaystyle   u \displaystyle  = \displaystyle  \frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}v+u-\frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}v\ \ \ \ \ (4)
\displaystyle  \displaystyle  \equiv \displaystyle  cv+w \ \ \ \ \ (5)

where

\displaystyle   c \displaystyle  = \displaystyle  \frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}\in\mathbb{F}\ \ \ \ \ (6)
\displaystyle  w \displaystyle  = \displaystyle  u-\frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}v \ \ \ \ \ (7)

Since the inner product is antilinear in its first slot, {\left\langle w,v\right\rangle =\left\langle u,v\right\rangle -c^*\left|v\right|^{2}=\left\langle u,v\right\rangle -\left\langle v,u\right\rangle ^*=0}, so we’ve decomposed {u} into a component {cv} ‘parallel’ to {v} and another component {w} orthogonal to {v}. (With Axler’s convention of linearity in the first slot, the coefficient is written {\left\langle u,v\right\rangle /\left|v\right|^{2}} instead.)
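With the convention used here (linear in the second slot), the coefficient that makes {w} orthogonal to {v} is {\left\langle v,u\right\rangle /\left|v\right|^{2}}; for complex vectors the ordering inside the inner product matters. A pure-Python sketch with arbitrary illustrative vectors (decompose is our own helper name):

```python
def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def decompose(u, v):
    # Orthogonal decomposition u = c v + w with c = <v, u> / |v|^2
    c = inner(v, u) / inner(v, v)
    w = [ui - c * vi for ui, vi in zip(u, v)]
    return c, w

u = [1 + 2j, 3 - 1j]
v = [2j, 1 + 1j]
c, w = decompose(u, v)
print(inner(w, v))        # zero: w is orthogonal to v

# With the other ordering, c' = <u, v> / |v|^2, orthogonality fails for these complex vectors
c_wrong = inner(u, v) / inner(v, v)
w_wrong = [ui - c_wrong * vi for ui, vi in zip(u, v)]
print(inner(w_wrong, v))  # 2i Im<u, v>, which is not zero here
```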

There are a couple of important theorems for which we’ll run through the proofs:

Theorem 1 Pythagorean theorem. If {u} and {v} are orthogonal then

\displaystyle  \left|u+v\right|^{2}=\left|u\right|^{2}+\left|v\right|^{2} \ \ \ \ \ (8)

Proof: From the definition of the norm

\displaystyle   \left|u+v\right|^{2} \displaystyle  = \displaystyle  \left\langle u+v,u+v\right\rangle \ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,u\right\rangle +\left\langle u,v\right\rangle +\left\langle v,u\right\rangle +\left\langle v,v\right\rangle \ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,u\right\rangle +\left\langle v,v\right\rangle \ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  \left|u\right|^{2}+\left|v\right|^{2} \ \ \ \ \ (12)

where in the third line we used the orthogonality condition {\left\langle u,v\right\rangle =\left\langle v,u\right\rangle =0}. \Box

Theorem 2 Schwarz (or Cauchy-Schwarz) inequality. For all vectors {u,v\in V}

\displaystyle  \left|\left\langle u,v\right\rangle \right|\le\left|u\right|\left|v\right| \ \ \ \ \ (13)

Proof: The inequality is obviously true (as an equality) if {v=0}, so we need to prove it for {v\ne0}. In that case we can form an orthogonal decomposition of {u}:

\displaystyle  u=\frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}v+w \ \ \ \ \ (14)

Since {v} and {w} are orthogonal, we can apply the Pythagorean theorem:

\displaystyle   \left|u\right|^{2} \displaystyle  = \displaystyle  \left|\frac{\left\langle v,u\right\rangle }{\left|v\right|^{2}}v\right|^{2}+\left|w\right|^{2}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{\left|\left\langle v,u\right\rangle \right|^{2}}{\left|v\right|^{4}}\left|v\right|^{2}+\left|w\right|^{2}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \frac{\left|\left\langle u,v\right\rangle \right|^{2}}{\left|v\right|^{2}}+\left|w\right|^{2}\ \ \ \ \ (17)
\displaystyle  \displaystyle  \ge \displaystyle  \frac{\left|\left\langle u,v\right\rangle \right|^{2}}{\left|v\right|^{2}} \ \ \ \ \ (18)

In the third line we used {\left|\left\langle v,u\right\rangle \right|=\left|\left\langle u,v\right\rangle \right|}, which follows from conjugate symmetry. Rearranging the last line and taking the positive square root (since norms are always non-negative) we have

\displaystyle  \left|\left\langle u,v\right\rangle \right|\le\left|u\right|\left|v\right| \ \ \ \ \ (19)
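A randomized spot-check (not a proof) of the Schwarz inequality, together with the equality case where {u} is a multiple of {v}. Pure Python; the helper names and the Gaussian test vectors are illustrative choices.

```python
import math
import random

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(3)
trials = [(rand_vec(4), rand_vec(4)) for _ in range(1000)]
worst = max(abs(inner(u, v)) / (norm(u) * norm(v)) for u, v in trials)
print(worst)   # never exceeds 1

# Equality holds when u is a multiple of v
v = rand_vec(4)
u = [(2 - 3j) * x for x in v]
print(abs(inner(u, v)), norm(u) * norm(v))
```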


Theorem 3 Triangle inequality. For all {u,v\in V}

\displaystyle  \left|u+v\right|\le\left|u\right|+\left|v\right| \ \ \ \ \ (20)

Proof: As with the Schwarz inequality, we start with the square of the norm:

\displaystyle   \left|u+v\right|^{2} \displaystyle  = \displaystyle  \left\langle u+v,u+v\right\rangle \ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,u\right\rangle +\left\langle v,v\right\rangle +\left\langle u,v\right\rangle +\left\langle v,u\right\rangle \ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \left\langle u,u\right\rangle +\left\langle v,v\right\rangle +\left\langle u,v\right\rangle +\left\langle u,v\right\rangle ^*\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \left|u\right|^{2}+\left|v\right|^{2}+2\mathfrak{R}\left\langle u,v\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  \le \displaystyle  \left|u\right|^{2}+\left|v\right|^{2}+2\left|\left\langle u,v\right\rangle \right| \ \ \ \ \ (25)

The last line follows because {\left\langle u,v\right\rangle } is a complex number, so

\displaystyle  \left|\left\langle u,v\right\rangle \right|=\sqrt{\left(\mathfrak{R}\left\langle u,v\right\rangle \right)^{2}+\left(\mathfrak{I}\left\langle u,v\right\rangle \right)^{2}}\ge\mathfrak{R}\left\langle u,v\right\rangle \ \ \ \ \ (26)

We can now apply the Schwarz inequality to the last term in the last line to get

\displaystyle   \left|u+v\right|^{2} \displaystyle  \le \displaystyle  \left|u\right|^{2}+\left|v\right|^{2}+2\left|u\right|\left|v\right|\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  \left(\left|u\right|+\left|v\right|\right)^{2} \ \ \ \ \ (28)

Taking the positive square root, we get

\displaystyle  \left|u+v\right|\le\left|u\right|+\left|v\right| \ \ \ \ \ (29)
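A similar randomized spot-check (again not a proof) of the triangle inequality and of the expansion in equation 24, in pure Python with our own helper names and arbitrary Gaussian test vectors:

```python
import math
import random

def inner(u, v):
    # <u, v>: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(4)
violations = 0
worst_expansion_error = 0.0
for _ in range(1000):
    u, v = rand_vec(3), rand_vec(3)
    total = [a + b for a, b in zip(u, v)]
    if norm(total) > norm(u) + norm(v) + 1e-12:
        violations += 1
    # Equation 24: |u+v|^2 = |u|^2 + |v|^2 + 2 Re<u, v>
    expansion = norm(u) ** 2 + norm(v) ** 2 + 2 * inner(u, v).real
    worst_expansion_error = max(worst_expansion_error, abs(norm(total) ** 2 - expansion))

print(violations)              # 0
print(worst_expansion_error)   # rounding error only
```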


A finite-dimensional complex vector space with an inner product is a Hilbert space. An infinite-dimensional complex vector space with an inner product is also a Hilbert space if it is complete, meaning that every Cauchy sequence of vectors (with respect to the norm) converges to a vector in the space. Completeness is a technical property that is always satisfied in quantum mechanics, so we can assume that any infinite-dimensional complex vector spaces we encounter in quantum theory are Hilbert spaces.

Eigenvalues and eigenvectors

References: edX online course MIT 8.05.1x Week 3.

Sheldon Axler (2015), Linear Algebra Done Right, 3rd edition, Springer. Chapter 5.

While studying quantum mechanics, we have made extensive use of the eigenvalues and eigenvectors (the latter usually called eigenstates in quantum theory) of hermitian operators, since an observable quantity in quantum mechanics is always represented by a hermitian operator, and the possible values for a given observable are the eigenvalues of that operator.

It’s useful to re-examine eigenvalues and eigenvectors from a strictly mathematical viewpoint, since this allows us to put precise definitions on many of the terms in common use. As usual, suppose we start with a vector space {V} and an operator {T}. Suppose there is a one-dimensional subspace {U} of {V} which has the property that for any vector {u\in U}, {Tu=\lambda u} for some fixed number {\lambda}. That is, the operator {T} maps every vector in {U} to a vector that is also in {U}. In that case, {U} is said to be an invariant subspace under the operator {T}.

We can think of this in geometric terms if we have some {n}-dimensional vector space {V}, and a one-dimensional subspace {U} consisting of all vectors parallel to some straight line within {V}. The operator {T} acting on any vector {u} parallel to that line produces another vector which is also parallel to the same line. Of course we can’t push the geometric illustration too far, since in general {V} and {U} can be complex vector spaces, so acting on {u} with {T} might give {u} multiplied by some complex number {\lambda}.

The equation

\displaystyle  Tu=\lambda u \ \ \ \ \ (1)

is called an eigenvalue equation. If it holds for some nonzero vector {u}, the number {\lambda\in\mathbb{F}} is called an eigenvalue of {T}, and {u} is called an eigenvector corresponding to the eigenvalue {\lambda}. (The zero vector satisfies 1 trivially for every {\lambda}, so it is excluded.) Because {T} is linear, {T\left(cu\right)=cTu=\lambda\left(cu\right)} for any number {c}, so any nonzero multiple of {u} is also an eigenvector corresponding to {\lambda}; that is, any vector ‘parallel’ to {u} is also an eigenvector. (I’ve put ‘parallel’ in quotes, since we’re allowing for multiplication of {u} by complex as well as real numbers.)
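For a concrete illustration, here is a small Python sketch using a {2\times2} matrix of my own choosing, {T=\left[\begin{array}{cc}2 & 1\\1 & 2\end{array}\right]}. It has eigenvalue 3 with eigenvector {\left(1,1\right)}. The sketch checks that multiplying that eigenvector by real or complex scalars gives further eigenvectors with the same eigenvalue. The helper `apply` is a hypothetical name for a plain matrix-vector product.

```python
def apply(T, u):
    # Matrix-vector product for an operator given as a list of rows
    return [sum(T[i][j] * u[j] for j in range(len(u))) for i in range(len(T))]

T = [[2, 1],
     [1, 2]]
lam = 3
u = [1, 1]

assert apply(T, u) == [lam * x for x in u]  # T u = 3 u

for c in (2, -0.5, 1j, 3 - 4j):
    cu = [c * x for x in u]
    # Linearity: T(cu) = c T u = lam (cu), so cu is also an eigenvector
    assert apply(T, cu) == [lam * x for x in cu]
```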

It can happen that, for a particular value of {\lambda}, there are two or more linearly independent eigenvectors (for two vectors, this just means they are not ‘parallel’). In that case, the subspace spanned by the eigenvectors belonging to that {\lambda} is two- or higher-dimensional.

Another way of writing 1 is by introducing the identity operator {I}:

\displaystyle  \left(T-\lambda I\right)u=0 \ \ \ \ \ (2)

If this equation has a solution other than {u=0}, then the operator {T-\lambda I} has a non-trivial null space, which in turn means that {T-\lambda I} is not injective (not one-to-one) and therefore not invertible. Also, the eigenvectors of {T} with eigenvalue {\lambda} are exactly the nonzero vectors {u} in the null space of {T-\lambda I}.
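In a finite-dimensional space there is a convenient test for this non-invertibility: {T-\lambda I} fails to be invertible exactly when {\det\left(T-\lambda I\right)=0}. This is a standard fact, though it isn't used elsewhere in this post. The Python sketch below applies the test to the same example matrix {T=\left[\begin{array}{cc}2 & 1\\1 & 2\end{array}\right]}. The helpers `det2` and `shifted` are hypothetical names. The sketch also checks that {\left(1,1\right)} lies in the null space of {T-3I}.

```python
def det2(M):
    # Determinant of a 2x2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def shifted(T, lam):
    # The operator T - lam*I as a 2x2 matrix
    return [[T[0][0] - lam, T[0][1]],
            [T[1][0], T[1][1] - lam]]

T = [[2, 1],
     [1, 2]]

for lam in (1, 3):
    # det(T - lam I) = 0, so T - lam I is not invertible: lam is an eigenvalue
    assert det2(shifted(T, lam)) == 0
assert det2(shifted(T, 5)) != 0  # T - 5I is invertible; 5 is not an eigenvalue

# The null space of T - 3I contains (1, 1)
M = shifted(T, 3)
print([M[0][0] * 1 + M[0][1] * 1, M[1][0] * 1 + M[1][1] * 1])  # [0, 0]
```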

An important result is

Theorem 1 Suppose {\lambda_{1},\ldots,\lambda_{m}} are distinct eigenvalues of {T} and {v_{1},\ldots,v_{m}} are the corresponding non-zero eigenvectors. Then the set {v_{1},\ldots,v_{m}} is linearly independent.

Proof: Suppose to the contrary that {v_{1},\ldots,v_{m}} is linearly dependent. Let {k} be the smallest positive integer such that {v_{k}} is in the span of {v_{1},\ldots,v_{k-1}}. Such a {k} exists because the list is linearly dependent, and {k\ge2} because {v_{1}\ne0}. Because {k} is the smallest such integer, the set {v_{1},\ldots,v_{k-1}} is linearly independent. In that case, there are numbers {a_{1},\ldots,a_{k-1}\in\mathbb{F}} such that

\displaystyle  v_{k}=\sum_{i=1}^{k-1}a_{i}v_{i} \ \ \ \ \ (3)

If we apply the operator {T} to both sides and use the eigenvalue equation, we have

\displaystyle   Tv_{k} \displaystyle  = \displaystyle  \lambda_{k}v_{k}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \sum_{i=1}^{k-1}a_{i}Tv_{i}\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \sum_{i=1}^{k-1}a_{i}\lambda_{i}v_{i} \ \ \ \ \ (6)

Multiplying both sides of 3 by {\lambda_{k}} and subtracting the result from 6, we get

\displaystyle   \left(\lambda_{k}-\lambda_{k}\right)v_{k} \displaystyle  = \displaystyle  \sum_{i=1}^{k-1}a_{i}\left(\lambda_{i}-\lambda_{k}\right)v_{i}\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (8)

Since the set of vectors {v_{1},\ldots,v_{k-1}} is linearly independent, and {\lambda_{k}\ne\lambda_{i}} for {i=1,\ldots,k-1}, the only solution of this equation is {a_{i}=0} for {i=1,\ldots,k-1}. But this would make {v_{k}=0}, contrary to our assumption that {v_{k}} is a non-zero eigenvector of {T}. Therefore the set {v_{1},\ldots,v_{m}} is linearly independent. \Box
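Here is a small example of the theorem in action. The Python sketch below uses an upper triangular {3\times3} matrix of my own choosing, with distinct eigenvalues 1, 2 and 3. It verifies an eigenvector for each eigenvalue. It then checks linear independence by computing the determinant of the matrix whose columns are those eigenvectors, using the standard fact that a nonzero determinant means the columns are independent. The helpers `apply` and `det3` are hypothetical names.

```python
def apply(T, u):
    # Matrix-vector product for an operator given as a list of rows
    return [sum(T[i][j] * u[j] for j in range(len(u))) for i in range(len(T))]

def det3(M):
    # Cofactor expansion along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

T = [[1, 1, 0],
     [0, 2, 1],
     [0, 0, 3]]
# (eigenvalue, eigenvector) pairs with distinct eigenvalues
pairs = [(1, [1, 0, 0]), (2, [1, 1, 0]), (3, [1, 2, 2])]

for lam, v in pairs:
    assert apply(T, v) == [lam * x for x in v]

# Matrix whose columns are the eigenvectors
V = [[v[i] for _, v in pairs] for i in range(3)]
print(det3(V))  # 2, nonzero, so the eigenvectors are linearly independent
```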

It turns out that there are some operators on real vector spaces that don’t have any eigenvalues. A simple example is found in the 2-dimensional real vector space consisting of the {xy} plane. An operator that rotates every vector about the origin by an angle that is not an integer multiple of {\pi} doesn’t leave any nonzero vector parallel to itself, and thus has no eigenvalues or eigenvectors. (A rotation by {\pi} sends every vector {v} to {-v}, so it does have the eigenvalue {-1}.)
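We can make this explicit. Assume the usual counterclockwise rotation matrix {R=\left[\begin{array}{cc}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{array}\right]}. Then {\det\left(R-zI\right)=z^{2}-2z\cos\theta+1}, whose discriminant {4\left(\cos^{2}\theta-1\right)} is negative unless {\theta} is a multiple of {\pi}. The Python sketch below (the function name is my own) solves this quadratic. It confirms that the roots are non-real for generic angles, and that over {\mathbb{C}} they are {e^{\pm i\theta}}.

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Roots of the characteristic polynomial of the 2x2 rotation matrix
    [[cos t, -sin t], [sin t, cos t]]:  z^2 - 2 cos(t) z + 1 = 0."""
    b = -2 * math.cos(theta)
    disc = b * b - 4          # = 4 (cos^2 t - 1) <= 0
    root = cmath.sqrt(disc)
    return ((-b + root) / 2, (-b - root) / 2)

for theta in (math.pi / 3, math.pi / 2, 2.0):
    z1, z2 = rotation_eigenvalues(theta)
    # Non-real roots: no real eigenvalue, so no eigenvector in the real xy plane
    assert abs(z1.imag) > 1e-9 and abs(z2.imag) > 1e-9
    # Over C the eigenvalues are e^{+i theta} and e^{-i theta}
    assert cmath.isclose(z1, cmath.exp(1j * theta))
    assert cmath.isclose(z2, cmath.exp(-1j * theta))
```

For {\theta=\pi} the discriminant vanishes and both roots equal {-1}, matching the remark that a rotation by {\pi} sends {v} to {-v}.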

However, in a complex vector space, things are a bit neater. This leads to the following theorem:

Theorem 2 Every operator on a finite-dimensional, nonzero, complex vector space has at least one eigenvalue.

Proof: Suppose {V} is a complex vector space with dimension {n>0}, and choose any nonzero vector {v\in V}. Consider the {n+1} vectors

\displaystyle  v,Tv,T^{2}v,\ldots,T^{n}v \ \ \ \ \ (9)

Because we have {n+1} vectors in an {n}-dimensional vector space, these vectors must be linearly dependent, which means we can find complex numbers {a_{0},\ldots,a_{n}\in\mathbb{C}}, not all zero, such that

\displaystyle  0=a_{0}v+a_{1}Tv+\ldots+a_{n}T^{n}v \ \ \ \ \ (10)

We can consider a polynomial in {z} with the {a_{i}} as coefficients:

\displaystyle  p\left(z\right)=a_{0}+a_{1}z+\ldots+a_{n}z^{n} \ \ \ \ \ (11)

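The Fundamental Theorem of Algebra implies that any polynomial with complex coefficients of degree {m\ge1} can be factored into {m} linear factors. In our case, the actual degree of {p\left(z\right)} is some {m\le n}, since {a_{n}} (and possibly other leading coefficients) could be zero. Also {m\ge1} as long as {v\ne0} (which the argument requires): if {a_{1},\ldots,a_{n}} were all zero, then {a_{0}\ne0} and 10 would reduce to {a_{0}v=0}, forcing {v=0}. So we can factor {p\left(z\right)} as follows: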

\displaystyle  p\left(z\right)=c\left(z-\lambda_{1}\right)\ldots\left(z-\lambda_{m}\right) \ \ \ \ \ (12)

where {c\ne0}.

Comparing this to 10, we can write that equation as

\displaystyle   0 \displaystyle  = \displaystyle  a_{0}v+a_{1}Tv+\ldots+a_{n}T^{n}v\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \left(a_{0}I+a_{1}T+\ldots+a_{n}T^{n}\right)v\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  c\left(T-\lambda_{1}I\right)\ldots\left(T-\lambda_{m}I\right)v \ \ \ \ \ (15)

The operator in the last line is {c} times a composition of the operators {T-\lambda_{i}I}. If every {T-\lambda_{i}I} were injective, their composition would also be injective, and since {c\ne0} and {v\ne0} (the argument requires choosing a nonzero {v}), the last line could not be zero. So at least one {T-\lambda_{i}I} is not injective. That is, there is at least one {\lambda_{i}} such that {T-\lambda_{i}I} has a nonzero null space, which means {\lambda_{i}} is an eigenvalue. (The corresponding eigenvector need not be {v} itself.)\Box
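The proof is constructive enough to follow by hand in the smallest nontrivial case. Below is a Python sketch of it for {n=2} only; the function names are my own. Starting from a nonzero {v}, it forms {v,Tv,T^{2}v}. It finds a linear dependence {T^{2}v=\alpha v+\beta Tv} by Cramer's rule, so that {p\left(z\right)=z^{2}-\beta z-\alpha}, and factors {p} with the quadratic formula. It then finds which factor {T-\lambda_{i}I} has a nonzero null space. Unlike the exact proof, it uses a numerical tolerance to decide when a vector is zero. It is an illustration of the argument, not a practical eigenvalue algorithm.

```python
import cmath

def apply(T, u):
    # Matrix-vector product for a 2x2 matrix given as a list of rows
    return [T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1]]

def det_cols(a, b):
    # Determinant of the 2x2 matrix with columns a and b
    return a[0] * b[1] - a[1] * b[0]

def find_eigenpair(T, v, tol=1e-12):
    """Follow the proof of Theorem 2 for a 2x2 complex matrix T and a nonzero
    vector v. Returns (lam, w) with w nonzero and T w = lam w."""
    Tv = apply(T, v)
    d = det_cols(v, Tv)
    if abs(d) < tol:
        # v and Tv are already dependent: p(z) can be taken of degree 1,
        # and Tv = lam v directly
        k = 0 if abs(v[0]) > abs(v[1]) else 1
        return Tv[k] / v[k], v
    T2v = apply(T, Tv)
    # Solve T^2 v = alpha v + beta Tv, i.e. p(z) = z^2 - beta z - alpha
    alpha = det_cols(T2v, Tv) / d
    beta = det_cols(v, T2v) / d
    root = cmath.sqrt(beta * beta + 4 * alpha)
    lam1, lam2 = (beta + root) / 2, (beta - root) / 2
    # p(T) v = (T - lam1 I)(T - lam2 I) v = 0 with v != 0,
    # so one of the two factors must have a nonzero null space
    w = [x - lam2 * y for x, y in zip(Tv, v)]   # w = (T - lam2 I) v
    if max(abs(x) for x in w) < tol:
        return lam2, v            # (T - lam2 I) v = 0
    return lam1, w                # (T - lam1 I) w = 0 with w != 0

# Rotation by 90 degrees, which has no real eigenvalues
lam, w = find_eigenpair([[0, -1], [1, 0]], [1, 0])  # lam = i, w = (i, 1)
```

In this usage example neither {\left(T-iI\right)v} nor {\left(T+iI\right)v} is zero for {v=\left(1,0\right)}; the eigenvector found is {w=\left(T+iI\right)v}. This is why the proof only concludes that some factor has a nonzero null space.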