Category Archives: Relativity

Lorentz transformation as product of a pure boost and pure rotation

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, we can now complete the demonstration that a general Lorentz transformation is the product of a pure boost (motion at a constant velocity) multiplied by a pure rotation. We’ll follow Corollary IV.2 in Jaffe’s article.

In the last post, we saw that we could write a general Lorentz transformation in the form

\displaystyle  \widehat{\Lambda x}=A\widehat{x}A^{\dagger} \ \ \ \ \ (1)

where {x} is the 4-vector of a spacetime event, {\Lambda} is the Lorentz transformation as a {4\times4} matrix, {A} is a {2\times2} matrix with complex elements and a hat over a symbol means we’re looking at the {2\times2} complex matrix representing that object. We also saw in the last post that this representation restricts {A} to be a member of {SL\left(2,\mathbb{C}\right)} (that is, {\det A=1}), determined by {\Lambda} only up to a sign.

Jaffe goes through a rather involved proof that the transformation {\Lambda\left(A\right)} defined by 1 is a member of the physically relevant group with {\det\Lambda=+1} and {\Lambda_{00}\ge1}, but this involves a lot of somewhat obscure matrix theorems that I don’t want to get into here, and these techniques don’t seem to be required for the rest of the demonstration, so we’ll just accept this fact for now.

What we really want to do is find out how we can calculate {\Lambda} given the {2\times2} matrix {A}. We can do this by using the result we got earlier for the components of the 4-vector {x}:

\displaystyle  \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (2)

where the {\sigma_{\mu}} are four Hermitian matrices:

\displaystyle   \sigma_{0} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (3)
\displaystyle  \sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (4)
\displaystyle  \sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (5)
\displaystyle  \sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (6)

We can invert 2 to get

\displaystyle  x_{\nu}=\left\langle \sigma_{\nu},\widehat{x}\right\rangle =\frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}\widehat{x}\right) \ \ \ \ \ (7)
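
As a quick numerical check of 2 and 7, here is a minimal numpy sketch (the sample 4-vector is an arbitrary choice):

import numpy as np

# the four Hermitian matrices 3-6 (sigma_0 is the identity)
sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

x = np.array([1.0, 0.3, -0.7, 2.0])                  # arbitrary (x_0, x_1, x_2, x_3)
x_hat = sum(x[mu] * sigma[mu] for mu in range(4))    # equation 2

# invert with x_nu = (1/2) Tr(sigma_nu x_hat), equation 7
x_back = np.array([0.5 * np.trace(sigma[nu] @ x_hat) for nu in range(4)]).real
print(np.allclose(x_back, x))                        # True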

Reverting back to the {4\times4} matrix {\Lambda} (no hat), we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda\left(A\right)_{\mu\nu}x_{\nu}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda\left(A\right)x\right)_{\mu}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{\Lambda\left(A\right)x}\right\rangle \ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sum_{\nu=0}^{3}x_{\nu}\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle x_{\nu} \ \ \ \ \ (13)

We used 1 in the fourth line and 2 in the fifth line. Comparing the first and last lines, we see that

\displaystyle   \Lambda\left(A\right)_{\mu\nu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}^{\dagger}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right) \ \ \ \ \ (16)

where in the last line we used the fact that all the {\sigma_{\mu}} are Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}.
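
To see 16 in action, here is a minimal numerical sketch (assuming numpy; the matrix {A} is an arbitrary complex matrix rescaled to have determinant 1):

import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Lambda_of(A):
    # equation 16: Lambda(A)_{mu nu} = (1/2) Tr(sigma_mu A sigma_nu A^dagger)
    return np.array([[0.5 * np.trace(sigma[m] @ A @ sigma[n] @ A.conj().T).real
                      for n in range(4)] for m in range(4)])

A = np.array([[1.0 + 0.5j, 0.2], [-0.3j, 1.0]])
A = A / np.sqrt(np.linalg.det(A))        # force det A = 1

L = Lambda_of(A)
g = np.diag([1.0, -1.0, -1.0, -1.0])
print(np.allclose(L.T @ g @ L, g))       # True: Lambda(A) preserves the 4-vector length
print(L[0, 0] >= 1)                      # True: Lambda(A) is orthochronous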

In order for {\Lambda\left(A\right)} to be a valid Lorentz transformation, clearly its elements must be real numbers. We can show this is true as follows. The complex conjugate is represented by drawing a bar over a quantity. We get

\displaystyle  \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \ \ \ \ \ (17)

We can now use the fact that the trace of a product of matrices remains unchanged if we cyclically permute the order of multiplication. In particular {\mbox{Tr}\left(XB^{\dagger}\right)=\mbox{Tr}\left(B^{\dagger}X\right)}. Also, {\mbox{Tr}\left(B^{\dagger}X\right)=\mbox{Tr}\left(\left(\overline{X^{\dagger}B}\right)^{T}\right)=\mbox{Tr}\left(\overline{X^{\dagger}B}\right)} since the trace of a matrix is equal to the trace of its transpose. In 17, we can set {X^{\dagger}=\sigma_{\mu}} and {B=A\sigma_{\nu}A^{\dagger}} and use the fact that the {\sigma_{\mu}} are all Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}:

\displaystyle   \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\left(A\sigma_{\nu}A^{\dagger}\right)^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(A\sigma_{\nu}A^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(A\right)_{\mu\nu} \ \ \ \ \ (21)

where in the third line we cyclically permuted the matrices in the trace. Thus the elements of {\Lambda\left(A\right)} are real.

Now we consider two cases. First, suppose that {A=U}, where {U} is a unitary matrix, so that {U^{\dagger}=U^{-1}}. Using {\sigma_{0}=I}, we find from 16 that {\Lambda\left(U\right)_{00}} is:

\displaystyle   \Lambda\left(U\right)_{00} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(UU^{\dagger}\right)\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}I\ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  1 \ \ \ \ \ (25)

The other elements in the first row and first column of {\Lambda} are all zero, as we can see by using 16 again:

\displaystyle   \Lambda\left(U\right)_{0i} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U^{\dagger}U\sigma_{i}\right)\ \ \ \ \ (28)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (30)

since {\mbox{Tr}\sigma_{i}=0} for {i=1,2,3}. A similar argument works for the first column of {\Lambda\left(U\right)} as well:

\displaystyle   \Lambda\left(U\right)_{i0} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (31)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}UU^{\dagger}\right)\ \ \ \ \ (32)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (33)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (34)

For the other elements, we have

\displaystyle   \Lambda\left(U\right)_{ij} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{j}U^{\dagger}\right)\ \ \ \ \ (35)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\left(U^{-1}\right)^{\dagger}\sigma_{j}U^{-1}\right)\ \ \ \ \ (36)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{j}U^{-1}\sigma_{i}\left(U^{-1}\right)^{\dagger}\right)\ \ \ \ \ (37)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(U^{-1}\right)_{ji}\ \ \ \ \ (38)
\displaystyle  \displaystyle  = \displaystyle  \left[\Lambda\left(U\right)\right]_{ji}^{-1} \ \ \ \ \ (39)

where in the last line we used the fact that {\Lambda\left(AB\right)=\Lambda\left(A\right)\Lambda\left(B\right)} (which follows directly from 1, since {\left(AB\right)\widehat{x}\left(AB\right)^{\dagger}=A\left(B\widehat{x}B^{\dagger}\right)A^{\dagger}}), so that {\Lambda\left(U^{-1}\right)=\left[\Lambda\left(U\right)\right]^{-1}}. That is

\displaystyle  \left[\Lambda\left(U\right)\right]^{T}=\Lambda\left(U\right)^{-1} \ \ \ \ \ (40)

so that

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (41)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. In other words, when {A=U}, {\Lambda} is a pure rotation.

The other case we need to examine is when {A=H}, where {H} is a Hermitian matrix, so that {H^{\dagger}=H}. In that case, from 16

\displaystyle   \Lambda\left(H\right)_{\mu\nu} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}H\sigma_{\nu}H\right)\ \ \ \ \ (42)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(H\sigma_{\mu}H\sigma_{\nu}\right)\ \ \ \ \ (43)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}H\sigma_{\mu}H\right)\ \ \ \ \ (44)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(H\right)_{\nu\mu} \ \ \ \ \ (45)

so {\Lambda\left(H\right)} is a symmetric matrix. (We used two cyclic permutations in the trace here.) Although we haven’t proved that a symmetric Lorentz transformation always represents a pure boost, this has been verified (see, for example, Wikipedia; I can’t be bothered going through it all here).

Now we are ready to get our final result. To do this, we need to use a theorem from matrix algebra which says that every matrix {A} in the group {SL\left(2,\mathbb{C}\right)} (that is, a {2\times2} matrix with complex elements and determinant +1) has a unique polar decomposition into a strictly positive Hermitian matrix {H} and a unitary matrix {U}, so that we always have

\displaystyle  A=HU \ \ \ \ \ (46)

To connect this with what we’ve done above, we can define

\displaystyle   H \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{1/2}\ \ \ \ \ (47)
\displaystyle  U \displaystyle  = \displaystyle  H^{-1}A=\left(AA^{\dagger}\right)^{-1/2}A \ \ \ \ \ (48)

[The square root of a matrix {M} is defined to be the matrix {S=M^{1/2}} such that {S^{2}=M}.] This definition is consistent with {H} being Hermitian, since with {S=\left(AA^{\dagger}\right)^{1/2}} we have

\displaystyle   \left(S^{2}\right)^{\dagger} \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{\dagger}=AA^{\dagger}\ \ \ \ \ (49)
\displaystyle  \displaystyle  = \displaystyle  S^{2}\ \ \ \ \ (50)
\displaystyle  \left(S^{2}\right)^{\dagger}=\left(SS\right)^{\dagger} \displaystyle  = \displaystyle  \left(S^{\dagger}\right)^{2}\ \ \ \ \ (51)
\displaystyle  \displaystyle  = \displaystyle  S^{2} \ \ \ \ \ (52)

Thus if we restrict {S} to be the positive square root, we must have {S^{\dagger}=S}.

The definition is also consistent with {U} being unitary, since

\displaystyle   UU^{\dagger} \displaystyle  = \displaystyle  \left(H^{-1}A\right)\left(H^{-1}A\right)^{\dagger}\ \ \ \ \ (53)
\displaystyle  \displaystyle  = \displaystyle  H^{-1}AA^{\dagger}H^{-1}\ \ \ \ \ (54)
\displaystyle  \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{-1/2}AA^{\dagger}\left(AA^{\dagger}\right)^{-1/2}\ \ \ \ \ (55)
\displaystyle  \displaystyle  = \displaystyle  I \ \ \ \ \ (56)

[We define {\left(AA^{\dagger}\right)^{-1/2}} to be the inverse of {\left(AA^{\dagger}\right)^{1/2}}.]

Therefore, we can uniquely decompose any Lorentz transformation {\Lambda\left(A\right)} into

\displaystyle  \Lambda\left(A\right)=\Lambda\left(H\right)\Lambda\left(U\right) \ \ \ \ \ (57)

that is, the product of a pure boost {\Lambda\left(H\right)} and a pure rotation {\Lambda\left(U\right)} (with the rotation applied first, followed by the boost).
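
A minimal numerical sketch of the polar decomposition 46-48 and the factorization 57 (assuming numpy; the sample {A} is arbitrary, and the square root of the Hermitian matrix {AA^{\dagger}} is taken through its eigendecomposition):

import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Lambda_of(A):
    # equation 16
    return np.array([[0.5 * np.trace(sigma[m] @ A @ sigma[n] @ A.conj().T).real
                      for n in range(4)] for m in range(4)])

A = np.array([[1.2, 0.4 - 0.1j], [0.3j, 0.9]])
A = A / np.sqrt(np.linalg.det(A))                  # det A = 1

w, V = np.linalg.eigh(A @ A.conj().T)              # AA^dagger is Hermitian and positive
H = V @ np.diag(np.sqrt(w)) @ V.conj().T           # positive square root, equation 47
U = np.linalg.inv(H) @ A                           # equation 48

print(np.allclose(H, H.conj().T))                              # H is Hermitian
print(np.allclose(U @ U.conj().T, np.eye(2)))                  # U is unitary
print(np.allclose(Lambda_of(A), Lambda_of(H) @ Lambda_of(U)))  # equation 57
print(np.allclose(Lambda_of(H), Lambda_of(H).T),               # boost part is symmetric
      np.isclose(Lambda_of(U)[0, 0], 1.0))                     # rotation part leaves time alone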

Lorentz transformations and the special linear group SL(2,C)

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, we start off with the representation of a spacetime 4-vector as a {2\times2} complex Hermitian matrix:

\displaystyle  \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (1)

Our ultimate goal is to show that any Lorentz transformation can be represented as the product of a pure boost {B} and a pure rotation {R}: {\Lambda=BR}. The step shown in this post may look like little more than an exercise in matrix algebra, but be patient; it takes a while to get to our final goal.

We start by looking at the matrices belonging to the special linear group {SL\left(2,\mathbb{C}\right)}, which consists of {2\times2} matrices containing general complex numbers as elements, and with determinant 1. Each matrix {A\in SL\left(2,\mathbb{C}\right)} can be used to define a linear transformation of the Hermitian matrix 1:

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}A^{\dagger} \ \ \ \ \ (2)

Because the determinant of a product is equal to the product of the determinants, and {\det A=\det A^{\dagger}=1}, {\det\widehat{x}^{\prime}=\det\widehat{x}=x_{\mu}x^{\mu}}. Thus such a transformation leaves the 4-vector length unchanged, so qualifies as a Lorentz transformation. Also, as a general complex {2\times2} matrix contains 4 elements, each with a real and imaginary part, there are 8 parameters. The condition {\det A=1} provides 2 constraints (one on the real part and one on the imaginary part), leaving 6 independent parameters, which is the same as the number of free parameters in a general Lorentz transformation.
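
Here is a minimal numerical illustration of this determinant argument (assuming numpy; the 4-vector and the {SL\left(2,\mathbb{C}\right)} matrix are arbitrary choices):

import numpy as np

def x_hat(x):
    # equation 1: the 2x2 Hermitian matrix built from the 4-vector x
    x0, x1, x2, x3 = x
    return np.array([[x0 + x3, x1 - 1j * x2],
                     [x1 + 1j * x2, x0 - x3]])

x = np.array([2.0, 0.5, -1.0, 0.3])
interval = x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2

A = np.array([[1.0, 0.2 + 0.4j], [0.0, 1.0]])               # det A = 1
xp_hat = A @ x_hat(x) @ A.conj().T                          # equation 2

print(np.isclose(np.linalg.det(x_hat(x)).real, interval))   # det of x_hat is x_mu x^mu
print(np.isclose(np.linalg.det(xp_hat).real, interval))     # and it is preserved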

We can give a more detailed proof that {A} provides a Lorentz transformation as follows. Suppose we start with two matrices {A,B\in SL\left(2,\mathbb{C}\right)} and define a transformation

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}B \ \ \ \ \ (3)

[Remember that the hats on {\widehat{x}} and {\widehat{x}^{\prime}} mean that we’re considering the {2\times2} matrix version 1 of the 4-vectors {x} and {x^{\prime}}.] The transformed matrix {\widehat{x}^{\prime}} must be Hermitian for all {\widehat{x}}, so we must have

\displaystyle   \left(A\widehat{x}B\right)^{\dagger} \displaystyle  = \displaystyle  A\widehat{x}B\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  B^{\dagger}\widehat{x}A^{\dagger} \ \ \ \ \ (5)

We now left-multiply by {\left(B^{\dagger}\right)^{-1}} and right-multiply by {B^{-1}} to get

\displaystyle  \left(B^{\dagger}\right)^{-1}A\widehat{x}=\widehat{x}A^{\dagger}B^{-1} \ \ \ \ \ (6)

But we also have

\displaystyle  \left(B^{\dagger}\right)^{-1}A=\left(A^{\dagger}B^{-1}\right)^{\dagger} \ \ \ \ \ (7)

so the matrix

\displaystyle  T\equiv\left(B^{\dagger}\right)^{-1}A \ \ \ \ \ (8)

is Hermitian. We can therefore write 6 as

\displaystyle  T\widehat{x}=\widehat{x}T^{\dagger}=\widehat{x}T \ \ \ \ \ (9)

so {T} commutes with {\widehat{x}} for all {\widehat{x}}.

Now we can choose {\widehat{x}=\sigma_{2}} and then {\widehat{x}=\sigma_{3}}, where the {\sigma_{i}} are two of the Pauli matrices which we showed (together with the identity matrix {\sigma_{0}}) form a basis for the space of {2\times2} Hermitian matrices. We’ve also seen that {\sigma_{2}} and {\sigma_{3}} form an irreducible set, and we saw that any matrix {T} that commutes with all the members of an irreducible set must be a multiple of the identity matrix. Thus we must have

\displaystyle  T=\lambda I \ \ \ \ \ (10)

for some constant {\lambda}. However, since {T} is the product of two matrices {A} and {\left(B^{\dagger}\right)^{-1}}, both of which have determinant 1, {\det T=1} also, which means that {\lambda^{2}=1} and {\lambda=\pm1}. Therefore

\displaystyle   \left(B^{\dagger}\right)^{-1}A \displaystyle  = \displaystyle  \pm I\ \ \ \ \ (11)
\displaystyle  A \displaystyle  = \displaystyle  \pm B^{\dagger} \ \ \ \ \ (12)

Thus the transformation 3 can be written as

\displaystyle  \widehat{x}^{\prime}=\pm A\widehat{x}A^{\dagger} \ \ \ \ \ (13)

To eliminate the {-} sign, suppose that

\displaystyle  \widehat{x}^{\prime}=-A\widehat{x}A^{\dagger} \ \ \ \ \ (14)

A Lorentz transformation giving this result can be written as

\displaystyle  \widehat{x}^{\prime}=\widehat{\Lambda x} \ \ \ \ \ (15)

where {\Lambda} is the {4\times4} matrix giving the Lorentz transformation of the original 4-vector {x}. In the original 4-vector notation, we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda_{\mu\nu}x_{\nu}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda x\right)_{\mu} \ \ \ \ \ (17)

From the relation between the 4-vector and {2\times2} matrix representations, we have

\displaystyle  x_{\mu}^{\prime}=\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (18)

where {\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle } is the inner product of the two matrices. Therefore from 14

\displaystyle   \left(\Lambda x\right)_{\mu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (21)

If we choose {x=\left(1,0,0,0\right)}, we have

\displaystyle   \left(\Lambda x\right)_{0} \displaystyle  = \displaystyle  \Lambda_{00}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\sigma_{0}A^{\dagger}\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},AA^{\dagger}\right\rangle \ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{2}\mbox{Tr}\left(AA^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  \le \displaystyle  0 \ \ \ \ \ (27)

where the penultimate line follows from the definition of the inner product. The last line follows because

\displaystyle  \mbox{Tr}\left(AA^{\dagger}\right)=\left|A_{11}\right|^{2}+\left|A_{12}\right|^{2}+\left|A_{21}\right|^{2}+\left|A_{22}\right|^{2}\ge0 \ \ \ \ \ (28)

Since we’re requiring the transformation to be orthochronous, we must have {\Lambda_{00}\ge1}, so we must exclude the {-} sign in 13, giving 2.

Finally, we can show that the transformation matrix {A} is unique, up to a sign. We can prove this by supposing that there are two different {SL\left(2,\mathbb{C}\right)} matrices {A} and {B} that give the same transformation for all {\widehat{x}}, that is

\displaystyle  A\widehat{x}A^{\dagger}=B\widehat{x}B^{\dagger} \ \ \ \ \ (29)

This implies

\displaystyle   B^{-1}A\widehat{x}A^{\dagger}\left(B^{\dagger}\right)^{-1} \displaystyle  = \displaystyle  \widehat{x}\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  B^{-1}A\widehat{x}\left(B^{-1}A\right)^{\dagger} \ \ \ \ \ (31)

We can now choose {\widehat{x}=I}, which shows that

\displaystyle  \left(B^{-1}A\right)^{\dagger}=\left(B^{-1}A\right)^{-1} \ \ \ \ \ (32)

which means (by definition), {B^{-1}A} is unitary, so for all {\widehat{x}}

\displaystyle  \widehat{x}=B^{-1}A\widehat{x}\left(B^{-1}A\right)^{-1} \ \ \ \ \ (33)

This means that {B^{-1}A} commutes with {\widehat{x}} for all {\widehat{x}} (that’s the only way we can cancel {B^{-1}A} off the RHS). Using the same argument as above, we can choose {\widehat{x}} to be two of the Pauli matrices, which form an irreducible set. Since {B^{-1}A} commutes with both these matrices, it must be a multiple {\lambda} of the identity:

\displaystyle   B^{-1}A \displaystyle  = \displaystyle  \lambda I\ \ \ \ \ (34)
\displaystyle  A \displaystyle  = \displaystyle  \lambda B \ \ \ \ \ (35)

Since {\det A=\det B=1} and for a {2\times2} matrix {\det\left(\lambda B\right)=\lambda^{2}\det B}, we have {\lambda^{2}=1}, so {\lambda=\pm1}. Therefore {A} is unique up to a sign.

In summary, what we’ve done in this post is show that a restricted Lorentz transformation {\Lambda} (that is, one where {\det\Lambda=+1} and {\Lambda_{00}\ge1}) can be represented by a matrix {A\in SL\left(2,\mathbb{C}\right)} where {A} is unique up to a sign.

Lorentz transformations as rotations

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Before we apply Noether’s theorem to Lorentz transformations, we need to take a step back and look at a generalized version of the Lorentz transformation. Most introductory treatments of special relativity derive the Lorentz transformation as the transformation between two inertial frames that are moving at some constant velocity with respect to each other. This form of the transformations allows us to derive the usual consequences of special relativity such as length contraction and time dilation. However, it’s useful to look at a Lorentz transformation in a more general way.

The idea is to define a Lorentz transformation as any transformation that leaves the magnitude of all four-vectors {x} unchanged, where this magnitude is defined using the usual flat space metric {g^{\mu\nu}} so that

\displaystyle  x^{2}=x_{\mu}x^{\mu}=g^{\mu\nu}x_{\mu}x_{\nu}=x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2} \ \ \ \ \ (1)

The flat space (Minkowski) metric is

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

We know that the traditional Lorentz transformation between two inertial frames in relative motion satisfies this condition, but in fact a rotation of the coordinate system in 3-d space (leaving the time coordinate unchanged) also satisfies this condition, so a Lorentz transformation defined in this more general way includes more transformations than the traditional one.

We can define this general transformation in terms of a {4\times4} matrix {\Lambda}, so that a four-vector {x} transforms to another vector {x^{\prime}} according to

\displaystyle  x^{\prime}=\Lambda x \ \ \ \ \ (3)

We can define the scalar product of two 4-vectors using the notation

\displaystyle  \left\langle x,y\right\rangle \equiv\sum_{i=0}^{3}x_{i}y_{i} \ \ \ \ \ (4)

The scalar product in flat space using the Minkowski metric {g} is therefore

\displaystyle  \left\langle x,gy\right\rangle =g^{\mu\nu}x_{\mu}y_{\nu}=x_{0}y_{0}-x_{1}y_{1}-x_{2}y_{2}-x_{3}y_{3} \ \ \ \ \ (5)

In matrix notation, in which {x} and {y} are column vectors, this is

\displaystyle  \left\langle x,gy\right\rangle =x^{T}gy \ \ \ \ \ (6)

In this way, the condition that {\Lambda} leaves the magnitude unchanged is

\displaystyle  \left\langle \Lambda x,g\Lambda x\right\rangle =\left\langle x,gx\right\rangle \ \ \ \ \ (7)

for all {x}. In matrix notation, this is

\displaystyle  \left(\Lambda x\right)^{T}g\Lambda x=x^{T}\Lambda^{T}g\Lambda x=x^{T}gx \ \ \ \ \ (8)

from which we get one condition on {\Lambda}:

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (9)

[Note that Jaffe uses a superscript {tr} to indicate a matrix transpose; I find this confusing as {tr} usually means the trace of a matrix, and a superscript {T} is more usual for the transpose.]

Because both sides of 9 refer to a symmetric matrix (on the LHS, {\left(\Lambda^{T}g\Lambda\right)^{T}=\Lambda^{T}g^{T}\left(\Lambda^{T}\right)^{T}=\Lambda^{T}g\Lambda}), this equation gives 10 independent equations for the elements of {\Lambda}, so the number of parameters that can be specified arbitrarily is {4\times4-10=6}.

The set {\mathcal{L}} of all Lorentz transformations forms a group under matrix multiplication, known as the Lorentz group. We can demonstrate this by showing that the four group properties are satisfied.

First, closure. If we perform two transformations in succession on a 4-vector {x} then we get {x^{\prime}=\Lambda_{2}\Lambda_{1}x}. The compound transformation satisfies 9:

\displaystyle   \left(\Lambda_{2}\Lambda_{1}\right)^{T}g\Lambda_{2}\Lambda_{1} \displaystyle  = \displaystyle  \Lambda_{1}^{T}\Lambda_{2}^{T}g\Lambda_{2}\Lambda_{1}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{1}^{T}g\Lambda_{1}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (12)

Thus the group is closed under multiplication.

Second, associativity is automatically satisfied as matrix multiplication is associative.

An identity element exists in the form of the identity matrix {I}, which is itself a Lorentz transformation as it satisfies 9.

Finally, we need to show that every matrix {\Lambda} has an inverse that is also part of the set {\mathcal{L}}. Taking the determinant of 9 we have

\displaystyle   \det\left(\Lambda^{T}g\Lambda\right) \displaystyle  = \displaystyle  \left(\det\Lambda^{T}\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \left(\det\Lambda\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  -\left(\det\Lambda\right)^{2} \ \ \ \ \ (15)

since {\det g=-1} from 2. From the RHS of 9, this must equal {\det g=-1} so we have

\displaystyle   -\left(\det\Lambda\right)^{2} \displaystyle  = \displaystyle  -1\ \ \ \ \ (16)
\displaystyle  \det\Lambda \displaystyle  = \displaystyle  \pm1 \ \ \ \ \ (17)

From a basic theorem in matrix algebra, any matrix with a non-zero determinant has an inverse, so {\Lambda^{-1}} exists. To show that {\Lambda^{-1}} is a Lorentz transformation, we can take the inverse of 9 and use the fact that {g^{-1}=g}:

\displaystyle   \left(\Lambda^{T}g\Lambda\right)^{-1} \displaystyle  = \displaystyle  g^{-1}=g\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{T}\right)^{-1}\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T} \ \ \ \ \ (20)

since the inverse and transpose operations commute (another basic theorem in matrix algebra). Therefore {\Lambda^{-1}} is also a valid Lorentz transformation.

We can also see that {\Lambda^{T}} is a valid transformation by left-multiplying by {\Lambda} and right-multiplying by {\Lambda^{T}}:

\displaystyle   g \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T}\ \ \ \ \ (21)
\displaystyle  \Lambda g\Lambda^{T} \displaystyle  = \displaystyle  \left(\Lambda\Lambda^{-1}\right)g\left(\Lambda^{-1}\right)^{T}\Lambda^{T}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (23)

We need one more property of {\Lambda} concerning the element {\Lambda_{00}}. Again starting from 9, the 00 component of the RHS is {g_{00}=1}, and writing out the 00 component of the LHS explicitly we have

\displaystyle  \left[\Lambda^{T}g\Lambda\right]_{00}=\Lambda_{00}^{2}-\sum_{i=1}^{3}\Lambda_{i0}^{2}=1 \ \ \ \ \ (24)

This gives

\displaystyle  \Lambda_{00}=\pm\sqrt{1+\sum_{i=1}^{3}\Lambda_{i0}^{2}} \ \ \ \ \ (25)

Thus either {\Lambda_{00}\ge1} or {\Lambda_{00}\le-1}.

From the determinant and {\Lambda_{00}}, we can classify a particular transformation matrix {\Lambda} as being in one of four so-called connected components. Jaffe spells out in detail the proof that these four components are disjoint, that is, we can’t define some parameter {s} that can be varied continuously to move a matrix {\Lambda} from one connected component to another connected component. The notation {\mathcal{L}_{+}^{\uparrow}} indicates the set of matrices with {\det\Lambda=+1} (indicated by the + subscript) and {\Lambda_{00}\ge1} (indicated by the {\uparrow} superscript). The other three connected components are {\mathcal{L}_{-}^{\uparrow}} ({\det\Lambda=-1}, {\Lambda_{00}\ge1}); {\mathcal{L}_{+}^{\downarrow}} ({\det\Lambda=+1}, {\Lambda_{00}\le-1}); and {\mathcal{L}_{-}^{\downarrow}} ({\det\Lambda=-1}, {\Lambda_{00}\le-1}). Not all of these subsets of {\mathcal{L}} form groups, as some of them are not closed under multiplication.

If {\det\Lambda=+1}, {\Lambda} is called proper, and if {\det\Lambda=-1}, {\Lambda} is called improper. If {\Lambda_{00}\ge+1}, {\Lambda} is orthochronous, and if {\Lambda_{00}\le-1}, {\Lambda} is non-orthochronous. From here on, we’ll consider only proper orthochronous transformations, that is, the connected component {\mathcal{L}_{+}^{\uparrow}}.

Members of {\mathcal{L}_{+}^{\uparrow}} can be subdivided again into two types: pure rotations and pure boosts. A pure rotation is a rotation (about the origin) in 3-d space, leaving the time coordinate unchanged. That is, {\Lambda_{00}=+1}. Such a transformation can be written as

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (26)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. We know that the off-diagonal elements in the first column must be zero, since if {\Lambda_{00}=+1}, we have from 25 that

\displaystyle  \sum_{i=1}^{3}\Lambda_{i0}^{2}=0 \ \ \ \ \ (27)

Since {\Lambda^{T}} must also be a valid transformation, this gives the analogous equation

\displaystyle  \sum_{i=1}^{3}\Lambda_{0i}^{2}=0 \ \ \ \ \ (28)

Thus the off-diagonal elements of the top row of {\Lambda} are also zero.

Since {\det\Lambda=1}, we must have {\det\mathcal{R}=1}. From 9, {\mathcal{R}} must also be an orthogonal matrix, that is, its rows must be orthonormal (as must its columns). For example, if we pick the 2,3 element in the product 9, we have

\displaystyle   \left[\Lambda^{T}g\Lambda\right]_{23} \displaystyle  = \displaystyle  g_{23}=0\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  -\sum_{i=1}^{3}\Lambda_{i2}\Lambda_{i3} \ \ \ \ \ (30)

Thus columns 2 and 3 must be orthogonal.

These matrices form a group known as {SO\left(3\right)}, the group of real, orthogonal, {3\times3} matrices with {\det\mathcal{R}=+1}. A familiar example is a rotation by an angle {\theta} about the {z} axis, for which

\displaystyle  \mathcal{R}=\left[\begin{array}{ccc} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (31)

giving the full transformation matrix as

\displaystyle  \Lambda=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (32)

In general, a rotation can be about any axis through the origin, in which case {\mathcal{R}} gets more complicated, but the idea is the same.

We’ve already seen that a pure boost, that is, a transformation into a second inertial frame moving at some constant velocity in a given direction relative to the first frame, can be written as a rotation, if we use hyperbolic functions instead of trig functions. In this case {\Lambda_{00}>+1}. The standard situation from introductory special relativity is that of a frame {S^{\prime}} moving along the {x_{1}} axis at some constant speed {\beta}. If we define

\displaystyle   \cosh\chi \displaystyle  \equiv \displaystyle  \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (33)
\displaystyle  \sinh\chi \displaystyle  \equiv \displaystyle  \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (34)

then the transformation is

\displaystyle  \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (35)

This has determinant +1 since {\cosh^{2}\chi-\sinh^{2}\chi=1}. We can verify by direct substitution that 9 is satisfied.
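
The same check is easy to do numerically. Here is a minimal numpy sketch verifying 9 for the rotation 32, the boost 35, and their product (the angle and speed are arbitrary choices, in units with {c=1}):

import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

theta = 0.7
R4 = np.array([[1, 0, 0, 0],
               [0, np.cos(theta), -np.sin(theta), 0],
               [0, np.sin(theta),  np.cos(theta), 0],
               [0, 0, 0, 1]])          # equation 32

beta = 0.6
chi = np.arctanh(beta)                 # rapidity, equations 33-34
B4 = np.array([[np.cosh(chi), np.sinh(chi), 0, 0],
               [np.sinh(chi), np.cosh(chi), 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])          # equation 35

for L in (R4, B4, B4 @ R4):
    # Lorentz condition 9, proper (det = +1), orthochronous (Lambda_00 >= 1)
    print(np.allclose(L.T @ g @ L, g), np.isclose(np.linalg.det(L), 1.0), L[0, 0] >= 1)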

It turns out that all proper, orthochronous Lorentz transformations can be written as the product of a pure rotation and a pure boost, that is

\displaystyle  \Lambda=BR \ \ \ \ \ (36)

where the pure rotation {R} is applied first, followed by a pure boost {B}. (Jaffe doesn’t prove this at this point; we’ll return to this later.)

Decomposition of a rank 2 tensor

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.4, Problem 2.

In Zee’s book, he defines a tensor as “something that transforms like a tensor”. For a tensor with {N} indices, under a rotation specified by the matrix {R}, the transformation of the tensor is given by multiplying the original tensor by one copy of {R} for each index. For a 2-index tensor, for example

\displaystyle  T^{\prime ij}=R^{ik}R^{j\ell}T^{k\ell} \ \ \ \ \ (1)

Another way of looking at this transformation is to think of each component {T^{ij}} of the tensor as a separate object in its own right. We can then arrange these objects in a column matrix (I’m avoiding calling this column matrix a ‘vector’ since, as Zee points out, vectors have a specific transformation property that this column matrix doesn’t have, namely that it must transform under a rotation by a multiplication by a single instance of a rotation matrix {R}). For 3-d, for example, we have the 9-component matrix

\displaystyle  \mathcal{T}=\left[\begin{array}{c} T^{11}\\ T^{12}\\ \vdots\\ T^{33} \end{array}\right] \ \ \ \ \ (2)

Under a rotation, we see from 1 that the transformed tensor component {T^{\prime ij}} is a linear combination of the original components {T^{k\ell}}, where the coefficients of this linear transformation are found from the elements of the rotation matrix {R}. This means that we could define a matrix {\mathcal{D}} which, in 3-d, is of size {9\times9} and whose elements are composed of combinations of the elements of {R}. That is

\displaystyle  \mathcal{T}^{\prime}=\mathcal{D}\mathcal{T} \ \ \ \ \ (3)

For example

\displaystyle  \mathcal{T}^{\prime1}=T^{\prime11}=\mathcal{D}^{1j}\mathcal{T}^{j} \ \ \ \ \ (4)

where the index {j} is summed from {j=1} to {j=9}. We can read off the first row of {\mathcal{D}} from 1, as this is the row of {\mathcal{D}} which provides the coefficients for producing the transformed component {T^{\prime11}}.

\displaystyle  \mathcal{D}^{1j}=\left[\begin{array}{cccccc} R^{11}R^{11} & R^{11}R^{12} & R^{11}R^{13} & R^{12}R^{11} & \ldots & R^{13}R^{13}\end{array}\right] \ \ \ \ \ (5)
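
With the components ordered as in 2, the whole {9\times9} matrix is just the Kronecker product of {R} with itself. A minimal numpy sketch (the rotation is an arbitrary rotation about the {z} axis and the tensor is random):

import numpy as np

theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 3))                    # a general rank 2 tensor, no symmetry assumed

T_prime = np.einsum('ik,jl,kl->ij', R, R, T)   # equation 1

D = np.kron(R, R)                              # rows of D are the products R^{ik} R^{jl}
print(np.allclose(D @ T.flatten(), T_prime.flatten()))    # equation 3
print(np.allclose(D[0], np.kron(R[0], R[0])))              # first row matches 5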

For a general rank 2 tensor (a tensor having 2 indices), there aren’t any pre-defined symmetries, so all the elements are independent of each other. As such, a transformed component {T^{\prime ij}} could have a contribution from all 9 of the original components {T^{k\ell}}. However, it’s possible to create linear combinations of the original {T^{ij}}s such that a subset of these linear combinations transform into each other.

One such subset contains the antisymmetric combinations

\displaystyle  A^{ij}\equiv T^{ij}-T^{ji} \ \ \ \ \ (6)

Zee shows that an antisymmetric component transforms as

\displaystyle  A^{\prime ij}=R^{ik}R^{j\ell}A^{k\ell} \ \ \ \ \ (7)

That is, antisymmetric components transform as linear combinations of only other antisymmetric components. In 3-d the index {i} in {A^{ij}} can have 3 values, while {j} can have only 2 (since {A^{ii}} is always zero by definition, we don’t count it). Also, since we’re after only components that are linearly independent of each other, we don’t count {A^{ji}} once we’ve counted {A^{ij}}, so there are a total of {\frac{3\times2}{2}=3} independent {A^{ij}}. In {D} dimensions, there are {\frac{1}{2}D\left(D-1\right)} independent antisymmetric combinations. These components transform entirely within their own private subset.

We can also define a set {S^{ij}} of symmetric components as

\displaystyle  S^{ij}\equiv T^{ij}+T^{ji} \ \ \ \ \ (8)

These components transform as follows:

\displaystyle   S^{\prime ij} \displaystyle  = \displaystyle  T^{\prime ij}+T^{\prime ji}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  R^{ik}R^{j\ell}T^{k\ell}+R^{jk}R^{i\ell}T^{k\ell}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  R^{ik}R^{j\ell}T^{k\ell}+R^{j\ell}R^{ik}T^{\ell k}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  R^{ik}R^{j\ell}\left(T^{k\ell}+T^{\ell k}\right)\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  R^{ik}R^{j\ell}S^{k\ell} \ \ \ \ \ (13)

In the third line, we swapped the dummy summed indices {k} and {\ell}. Thus the symmetric combinations also transform within their own subset. There are {\frac{1}{2}D\left(D-1\right)} independent off-diagonal components plus the {D} diagonal components {S^{ii}} (no sum), which are in general non-zero, for a total of {\frac{1}{2}D\left(D+1\right)} symmetric components. Together the antisymmetric and symmetric components contain all {\frac{1}{2}D\left(D-1\right)+\frac{1}{2}D\left(D+1\right)=D^{2}} independent linear combinations in the original tensor {T^{ij}}. This means that any of the original tensor components can be written as a combination of the {A^{ij}} and {S^{ij}} as

\displaystyle  T^{ij}=\frac{1}{2}\left(A^{ij}+S^{ij}\right) \ \ \ \ \ (14)

This decomposition also works for diagonal elements since {A^{ii}=0} and {S^{ii}=2T^{ii}} (no sums).
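
A minimal numpy sketch of these statements (arbitrary rotation and tensor; the matrix form {T^{\prime}=RTR^{T}} is just 1 written without indices):

import numpy as np

theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3))
A = T - T.T                                             # antisymmetric combinations, equation 6
S = T + T.T                                             # symmetric combinations, equation 8

T_prime = R @ T @ R.T                                   # equation 1
print(np.allclose(T_prime - T_prime.T, R @ A @ R.T))    # A' involves only A, equation 7
print(np.allclose(T_prime + T_prime.T, R @ S @ R.T))    # S' involves only S, equation 13
print(np.allclose(T, 0.5 * (A + S)))                    # equation 14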

If we write the original tensor in terms of the {A^{ij}} and {S^{ij}}, then (in 3-d) the matrix {\mathcal{D}} decomposes into a block diagonal matrix with a {3\times3} block for the {A^{ij}} and a {6\times6} block for the {S^{ij}}. That is, the transformation equation becomes

\displaystyle   T^{\prime ij} \displaystyle  = \displaystyle  \frac{1}{2}\left(A^{\prime ij}+S^{\prime ij}\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}R^{ik}R^{j\ell}\left(A^{k\ell}+S^{k\ell}\right) \ \ \ \ \ (16)

For example, if we want {T^{\prime32}} we have

\displaystyle   T^{\prime32} \displaystyle  = \displaystyle  \frac{1}{2}\left(A^{\prime32}+S^{\prime32}\right)\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left(-A^{\prime23}+S^{\prime23}\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}R^{2k}R^{3\ell}\left(-A^{k\ell}+S^{k\ell}\right) \ \ \ \ \ (19)

The sums over {A^{k\ell}} and {S^{k\ell}} can now be worked out using the symmetry properties of these elements. For {A^{k\ell}} we have

\displaystyle   -R^{2k}R^{3\ell}A^{k\ell} \displaystyle  = \displaystyle  -R^{21}R^{32}A^{12}-R^{21}R^{33}A^{13}-R^{22}R^{31}A^{21}-R^{22}R^{33}A^{23}-R^{23}R^{31}A^{31}-R^{23}R^{32}A^{32}\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \left(R^{22}R^{31}-R^{21}R^{32}\right)A^{12}+\left(R^{23}R^{31}-R^{21}R^{33}\right)A^{13}+\left(R^{23}R^{32}-R^{22}R^{33}\right)A^{23} \ \ \ \ \ (21)

Thus the third row of the {3\times3} block in the matrix {\mathcal{D}} (which is used to calculate {A^{\prime23}}) is

\displaystyle  \left[\begin{array}{ccc} \left(R^{21}R^{32}-R^{22}R^{31}\right) & \left(R^{21}R^{33}-R^{23}R^{31}\right) & \left(R^{22}R^{33}-R^{23}R^{32}\right)\end{array}\right] \ \ \ \ \ (22)

We could do a similar calculation for {S^{ij}} except this time we’d get 6 terms in the transformation.

In fact the symmetric part of {\mathcal{D}} can be decomposed further by observing that the trace of the symmetric submatrix is invariant under rotation, as Zee shows in his equation 6 (sum implied over {i}):

\displaystyle  S^{\prime ii}=S^{ii} \ \ \ \ \ (23)

Therefore the {6\times6} matrix breaks into a {1\times1} matrix and a {5\times5} matrix. Zee shows that the combinations {\tilde{S}^{ij}} that transform among themselves under the {5\times5} block (a {\left[\frac{1}{2}D\left(D+1\right)-1\right]\times\left[\frac{1}{2}D\left(D+1\right)-1\right]} block in the {D}-dimensional case) are given by

\displaystyle  \tilde{S}^{ij}=S^{ij}-\delta^{ij}\frac{S^{kk}}{D} \ \ \ \ \ (24)

Zee gives an example in 3-d showing that the components of {\tilde{S}^{ij}} do indeed transform into themselves.
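
A short numerical check of 24 along the same lines (this is a sketch, not Zee’s worked example):

import numpy as np

theta, D = 0.4, 3
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 3))
S = T + T.T
S_tilde = S - np.eye(D) * np.trace(S) / D                # equation 24

S_prime = R @ S @ R.T
S_tilde_prime = R @ S_tilde @ R.T
print(np.isclose(np.trace(S_tilde_prime), 0.0))          # still traceless after rotation
print(np.allclose(S_tilde_prime,
                  S_prime - np.eye(D) * np.trace(S_prime) / D))   # it is the traceless part of S'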

Tensors transform like tensors

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.4, Problem 1.

I’ve done a lot of posts involving tensors, but Zee’s approach is a simpler way of looking at them, so it’s worth revisiting the definition of a tensor. Zee’s definition of a tensor is that it is “something that transforms like a tensor”. Of course, in order for this definition to make sense, we need to know how a tensor transforms. If we have a tensor with 2 indices such as {T^{ij}}, then it must transform under a rotation according to (using the summation convention):

\displaystyle  T^{\prime ij}=R^{ik}R^{j\ell}T^{k\ell} \ \ \ \ \ (1)

[At this stage, we’re not distinguishing between upper and lower indices; that will come later.] Thus a vector is a tensor with a single index.

Example 1 The gradient of a scalar is a tensor (actually a vector), as we can see using the following.

\displaystyle  \left(\nabla\phi\right)^{i}\equiv\frac{\partial\phi}{\partial x^{i}} \ \ \ \ \ (2)

If we rotate the coordinate system, then the new coordinates are related to the old ones by

\displaystyle  x^{\prime i}=R^{ij}x^{j} \ \ \ \ \ (3)

so using the chain rule

\displaystyle   \frac{\partial}{\partial x^{\prime i}} \displaystyle  = \displaystyle  \frac{\partial x^{j}}{\partial x^{\prime i}}\frac{\partial}{\partial x^{j}}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  R^{ij}\frac{\partial}{\partial x^{j}} \ \ \ \ \ (5)

Therefore the rotation matrix is given by

\displaystyle  R^{ij}=\frac{\partial x^{j}}{\partial x^{\prime i}} \ \ \ \ \ (6)

The rotated gradient is thus

\displaystyle   \left(\nabla^{\prime}\phi\right)^{i} \displaystyle  = \displaystyle  \frac{\partial\phi}{\partial x^{\prime i}}\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \frac{\partial x^{j}}{\partial x^{\prime i}}\frac{\partial\phi}{\partial x^{j}}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  R^{ij}\frac{\partial\phi}{\partial x^{j}}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  R^{ij}\left(\nabla\phi\right)^{j} \ \ \ \ \ (10)

The gradient therefore transforms like a tensor so it is a tensor.

Example 2 What about the square of the gradient of a scalar? We have

\displaystyle  \left(\nabla\phi\right)\cdot\left(\nabla\phi\right)=\sum_{i}\left(\frac{\partial\phi}{\partial x^{i}}\right)^{2} \ \ \ \ \ (11)

Rotating the coordinates gives

\displaystyle   \left(\nabla^{\prime}\phi\right)\cdot\left(\nabla^{\prime}\phi\right) \displaystyle  = \displaystyle  \sum_{i}\left(\frac{\partial\phi}{\partial x^{\prime i}}\right)^{2}\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  R^{ij}\frac{\partial\phi}{\partial x^{j}}R^{ik}\frac{\partial\phi}{\partial x^{k}} \ \ \ \ \ (13)

We can now use the fact that {R^{T}=R^{-1}} so {R^{ij}=\left(R^{-1}\right)^{ji}}:

\displaystyle   \left(\nabla^{\prime}\phi\right)\cdot\left(\nabla^{\prime}\phi\right) \displaystyle  = \displaystyle  \left(R^{-1}\right)^{ji}R^{ik}\frac{\partial\phi}{\partial x^{j}}\frac{\partial\phi}{\partial x^{k}}\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \delta^{jk}\frac{\partial\phi}{\partial x^{j}}\frac{\partial\phi}{\partial x^{k}}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \sum_{j}\left(\frac{\partial\phi}{\partial x^{j}}\right)^{2}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \left(\nabla\phi\right)\cdot\left(\nabla\phi\right) \ \ \ \ \ (17)

Thus {\left(\nabla\phi\right)\cdot\left(\nabla\phi\right)} is invariant under rotation, so it’s a scalar.

Example 3 The Laplacian of a scalar is also a scalar. The Laplacian is defined as

\displaystyle   \nabla^{2}\phi \displaystyle  \equiv \displaystyle  \sum_{i}\frac{\partial^{2}\phi}{\left(\partial x^{i}\right)^{2}}\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{\partial}{\partial x^{i}}\left(\frac{\partial\phi}{\partial x^{i}}\right) \ \ \ \ \ (19)

Under rotation, we have

\displaystyle   \nabla^{\prime2}\phi \displaystyle  = \displaystyle  \frac{\partial}{\partial x^{\prime i}}\left(\frac{\partial\phi}{\partial x^{\prime i}}\right)\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  R^{ij}\frac{\partial}{\partial x^{j}}\left(R^{ik}\frac{\partial\phi}{\partial x^{k}}\right)\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \left(R^{-1}\right)^{ji}R^{ik}\frac{\partial}{\partial x^{j}}\left(\frac{\partial\phi}{\partial x^{k}}\right)\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \delta^{jk}\frac{\partial}{\partial x^{j}}\left(\frac{\partial\phi}{\partial x^{k}}\right)\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \frac{\partial}{\partial x^{j}}\left(\frac{\partial\phi}{\partial x^{j}}\right)\ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  \nabla^{2}\phi \ \ \ \ \ (25)

Note that the rotation matrix {R} is a constant with respect to {\frac{\partial}{\partial x^{j}}} since we’re considering a fixed rotation. The derivative measures how the scalar field {\phi} varies as we move from one point to another, whereas {R} relates one set of fixed coordinates to another set of fixed coordinates.

Thus {\nabla^{2}\phi} is invariant under rotation, so it’s a scalar.
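
As a concrete check of this argument, here is a minimal sympy sketch (the scalar field {\phi}, the rotation axis and the primed symbols are arbitrary choices introduced only for this illustration):

import sympy as sp

theta = sp.symbols('theta', real=True)
x, y, z = sp.symbols('x y z', real=True)
xp, yp, zp = sp.symbols('xp yp zp', real=True)      # the rotated (primed) coordinates

# rotation about the z axis
R = sp.Matrix([[sp.cos(theta), -sp.sin(theta), 0],
               [sp.sin(theta),  sp.cos(theta), 0],
               [0, 0, 1]])

phi = x**2 * y + x * sp.exp(z)                      # an arbitrary scalar field

# express phi in the primed coordinates: unprimed = R^T * primed
old = R.T * sp.Matrix([xp, yp, zp])
phi_p = phi.subs({x: old[0], y: old[1], z: old[2]}, simultaneous=True)

lap_old = sp.diff(phi, x, 2) + sp.diff(phi, y, 2) + sp.diff(phi, z, 2)
lap_new = sp.diff(phi_p, xp, 2) + sp.diff(phi_p, yp, 2) + sp.diff(phi_p, zp, 2)

# evaluate both at the same physical point (primed = R * unprimed) and compare
pt = R * sp.Matrix([x, y, z])
print(sp.simplify(lap_new.subs({xp: pt[0], yp: pt[1], zp: pt[2]}, simultaneous=True) - lap_old))   # 0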

Average of a vector over all directions

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.3, Problem 5.

In this problem, Zee gives us a 3-d vector {\vec{p}} and asks us to find the quantity {p^{i}p^{j}} averaged over the direction of {\vec{p}}. I wasn’t entirely clear what this question was asking, since once you’ve defined {\vec{p}}, surely its direction is fixed so how can you average over a fixed direction? I think what he means is: suppose you rotate {\vec{p}} through all possible angles in 3-d while keeping its magnitude {\left|\vec{p}\right|} fixed. What then is the average of {p^{i}p^{j}} over all these rotations?

There is also a mistake in the formula he gives for the average. If {\theta} and {\phi} are the usual spherical angles, then the average of {p^{i}p^{j}} is given by

\displaystyle  \left\langle p^{i}p^{j}\right\rangle =\frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta p^{i}p^{j} \ \ \ \ \ (1)

(not {\cos\theta} in the integral).

We can work out the integral by writing {p^{i}} in rectangular coordinates in the usual way:

\displaystyle   p^{x} \displaystyle  = \displaystyle  p\sin\theta\cos\phi\ \ \ \ \ (2)
\displaystyle  p^{y} \displaystyle  = \displaystyle  p\sin\theta\sin\phi\ \ \ \ \ (3)
\displaystyle  p^{z} \displaystyle  = \displaystyle  p\cos\theta \ \ \ \ \ (4)

From symmetry, {\left\langle p^{i}p^{j}\right\rangle =0} if {i\ne j}. If this isn’t obvious, suppose we’re looking at {\left\langle p^{x}p^{y}\right\rangle }, and we pick one direction specified by angles {\theta_{1}} and {\phi_{1}}. Then the initial coordinates are

\displaystyle   p_{1}^{x} \displaystyle  = \displaystyle  p\sin\theta_{1}\cos\phi_{1}\ \ \ \ \ (5)
\displaystyle  p_{1}^{y} \displaystyle  = \displaystyle  p\sin\theta_{1}\sin\phi_{1} \ \ \ \ \ (6)

If we then rotate {\vec{p}} by keeping {\theta_{1}} constant and rotating {\phi} by {2\left(\frac{\pi}{2}-\phi_{1}\right)} so that {\phi_{2}=\phi_{1}+2\left(\frac{\pi}{2}-\phi_{1}\right)=\pi-\phi_{1}} then

\displaystyle   p_{2}^{x} \displaystyle  = \displaystyle  p\sin\theta_{1}\cos\left(\pi-\phi_{1}\right)=-p\sin\theta_{1}\cos\phi_{1}=-p_{1}^{x}\ \ \ \ \ (7)
\displaystyle  p_{2}^{y} \displaystyle  = \displaystyle  p\sin\theta_{1}\sin\left(\pi-\phi_{1}\right)=p\sin\theta_{1}\sin\phi_{1}=p_{1}^{y} \ \ \ \ \ (8)

Thus this position for {\vec{p}} cancels the original position in the average, so the overall average {\left\langle p^{x}p^{y}\right\rangle =0}. A similar argument applies to {\left\langle p^{x}p^{z}\right\rangle } and {\left\langle p^{y}p^{z}\right\rangle }.

We could also work out these averages by direct integration. For example

\displaystyle   \left\langle p^{x}p^{y}\right\rangle \displaystyle  = \displaystyle  \frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta\left(p\sin\theta\cos\phi\right)\left(p\sin\theta\sin\phi\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin^{3}\theta\sin\phi\cos\phi\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (11)

For the other terms, we have

\displaystyle   \left\langle p^{x}p^{x}\right\rangle \displaystyle  = \displaystyle  \frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta\left(p\sin\theta\cos\phi\right)^{2}\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin^{3}\theta\cos^{2}\phi\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{3}\ \ \ \ \ (14)
\displaystyle  \left\langle p^{y}p^{y}\right\rangle \displaystyle  = \displaystyle  \frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta\left(p\sin\theta\sin\phi\right)^{2}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin^{3}\theta\sin^{2}\phi\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{3}\ \ \ \ \ (17)
\displaystyle  \left\langle p^{z}p^{z}\right\rangle \displaystyle  = \displaystyle  \frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta\left(p\cos\theta\right)^{2}\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta\cos^{2}\theta\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \frac{p^{2}}{3} \ \ \ \ \ (20)

In summary

\displaystyle  \left\langle p^{i}p^{j}\right\rangle =\frac{1}{4\pi}\int_{0}^{\pi}\int_{0}^{2\pi}d\theta d\phi\sin\theta p^{i}p^{j}=\frac{p^{2}}{3}\delta^{ij} \ \ \ \ \ (21)
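
We can confirm 21 numerically by averaging over many random directions; a minimal numpy sketch (a Monte Carlo estimate, so the off-diagonal entries are only approximately zero; the magnitude and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
p_mag = 2.5                                  # fixed magnitude of p
n = 200_000

# directions uniform over the sphere: cos(theta) uniform on [-1, 1], phi uniform on [0, 2*pi)
cos_t = rng.uniform(-1.0, 1.0, n)
sin_t = np.sqrt(1.0 - cos_t**2)
phi = rng.uniform(0.0, 2.0 * np.pi, n)

p = p_mag * np.stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t], axis=1)

avg = p.T @ p / n                            # the matrix of averages <p^i p^j>
print(np.round(avg, 2))                      # approximately (p^2 / 3) times the identity
print(p_mag**2 / 3)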

Vectors and rotations

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.3, Problem 1.

When you first meet vectors in a linear algebra course, it’s easy to get the impression that a vector is (in 3-d) just 3 numbers placed in a column, like

\displaystyle  \vec{a}=\left[\begin{array}{c} 1\\ 2\\ 3 \end{array}\right] \ \ \ \ \ (1)

However, as used by Zee, a vector is a specific type of object that transforms under a rotation according to

\displaystyle  \vec{a}^{\prime}=R\vec{a} \ \ \ \ \ (2)

where {R} is a 3-d rotation matrix. One consequence of this is that if we start with a vector {\vec{p}} (that is, an object that does transform properly under a rotation), then we can’t just take the components of {\vec{p}} and combine them in an arbitrary way to create another object that is a column of 3 numbers, expecting that new object to be a vector in the sense just defined. That is, the new object probably won’t transform properly under a rotation.

For example, suppose we have 2 vectors {\vec{p}} and {\vec{q}} that do transform properly under a rotation. This means that (using the summation convention)

\displaystyle   p^{\prime i} \displaystyle  = \displaystyle  R^{ij}p^{j}\ \ \ \ \ (3)
\displaystyle  q^{\prime i} \displaystyle  = \displaystyle  R^{ij}q^{j} \ \ \ \ \ (4)

Now consider the array

\displaystyle  A=\left[\begin{array}{c} p^{2}q^{3}\\ p^{3}q^{1}\\ p^{1}q^{2} \end{array}\right] \ \ \ \ \ (5)

Is this a vector? To check, we must use the transformation equations above for the components of {\vec{p}} and {\vec{q}}, since we know how these transform. We get, for the first component:

\displaystyle  A^{\prime1}=p^{\prime2}q^{\prime3}=R^{2j}p^{j}R^{3k}q^{k} \ \ \ \ \ (6)

In order for this to transform properly, we would need this to be equal to {R^{1m}A^{m}}, that is

\displaystyle  R^{1m}A^{m}=R^{11}p^{2}q^{3}+R^{12}p^{3}q^{1}+R^{13}p^{1}q^{2} \ \ \ \ \ (7)

In order for that to be true, we’d need

\displaystyle   R^{22}R^{33} \displaystyle  = \displaystyle  R^{11}\ \ \ \ \ (8)
\displaystyle  R^{23}R^{31} \displaystyle  = \displaystyle  R^{12}\ \ \ \ \ (9)
\displaystyle  R^{21}R^{32} \displaystyle  = \displaystyle  R^{13} \ \ \ \ \ (10)

This isn’t true as can be seen by looking at a rotation about the {x} axis:

\displaystyle  R_{x}\left(\theta_{x}\right)=\left(\begin{array}{ccc} 1 & 0 & 0\\ 0 & \cos\theta_{x} & \sin\theta_{x}\\ 0 & -\sin\theta_{x} & \cos\theta_{x} \end{array}\right) \ \ \ \ \ (11)

In this case, for example, {R^{22}R^{33}=\cos^{2}\theta_{x}\ne R^{11}=1}.

We can show that the vector cross product {\vec{p}\times\vec{q}} does transform properly under a general 3-d rotation. In this case

\displaystyle  \vec{p}\times\vec{q}=\left[\begin{array}{c} p^{2}q^{3}-p^{3}q^{2}\\ p^{3}q^{1}-p^{1}q^{3}\\ p^{1}q^{2}-p^{2}q^{1} \end{array}\right] \ \ \ \ \ (12)

Transforming the first component by transforming {\vec{p}} and {\vec{q}} separately first gives

\displaystyle   \left(p^{2}q^{3}-p^{3}q^{2}\right)^{\prime} \displaystyle  = \displaystyle  R^{2j}R^{3k}p^{j}q^{k}-R^{2k}R^{3j}p^{j}q^{k}\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  p^{j}q^{k}\left(R^{2j}R^{3k}-R^{2k}R^{3j}\right) \ \ \ \ \ (14)

Treating {\vec{p}\times\vec{q}} as a vector and transforming it directly gives

\displaystyle  \left(\vec{p}\times\vec{q}\right)^{\prime1}=R^{11}\left(p^{2}q^{3}-p^{3}q^{2}\right)+R^{12}\left(p^{3}q^{1}-p^{1}q^{3}\right)+R^{13}\left(p^{1}q^{2}-p^{2}q^{1}\right) \ \ \ \ \ (15)

Comparing this with 14 shows that the two expressions are equal if

\displaystyle   R^{11} \displaystyle  = \displaystyle  R^{22}R^{33}-R^{23}R^{32}\nonumber
\displaystyle  R^{12} \displaystyle  = \displaystyle  R^{31}R^{23}-R^{21}R^{33}\ \ \ \ \ (16)
\displaystyle  R^{13} \displaystyle  = \displaystyle  R^{21}R^{32}-R^{31}R^{22}\nonumber

The terms in 14 with {j=k} are all zero.

To verify these equations, we need to recall the two properties that {R} must satisfy: {\det R=1} and {R^{T}=R^{-1}}. The determinant condition gives us (expanding about the first row):

\displaystyle  R^{11}\left(R^{22}R^{33}-R^{23}R^{32}\right)-R^{12}\left(R^{21}R^{33}-R^{31}R^{23}\right)+R^{13}\left(R^{21}R^{32}-R^{31}R^{22}\right)=1 \ \ \ \ \ (17)

The condition {R^{T}=R^{-1}} gives

\displaystyle  R^{ij}R^{kj}=\delta^{ik} \ \ \ \ \ (18)

so for {i=k=1} we have

\displaystyle  \left(R^{11}\right)^{2}+\left(R^{12}\right)^{2}+\left(R^{13}\right)^{2}=1 \ \ \ \ \ (19)

This is the equation of a sphere of radius 1 in a rectangular coordinate system with coordinates {R^{11}}, {R^{12}} and {R^{13}}. Equation 17 is the equation of a plane in the same coordinate system, and the coefficients in that plane are just the components of the cross product of the second and third rows of {R}, which is itself a unit vector since those rows are orthonormal. The plane therefore lies at distance 1 from the origin, so it is tangent to the sphere and touches it at a single point, namely the point where {\left(R^{11},R^{12},R^{13}\right)} equals that cross product. This requires equations 16 to be true, so the cross product is indeed a proper vector that transforms correctly under rotation.
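
A quick numerical check of both claims in this post (a minimal numpy sketch; the vectors and the rotation angle are arbitrary):

import numpy as np

theta = 0.8
Rx = np.array([[1, 0, 0],
               [0, np.cos(theta), np.sin(theta)],
               [0, -np.sin(theta), np.cos(theta)]])    # equation 11

p = np.array([1.0, -2.0, 0.5])
q = np.array([0.3, 1.1, -0.7])

# the array 5 built from components of p and q
A = np.array([p[1] * q[2], p[2] * q[0], p[0] * q[1]])
pr, qr = Rx @ p, Rx @ q
A_rot = np.array([pr[1] * qr[2], pr[2] * qr[0], pr[0] * qr[1]])
print(np.allclose(A_rot, Rx @ A))                            # False: 5 is not a vector

# the cross product 12 does transform as a vector
print(np.allclose(np.cross(pr, qr), Rx @ np.cross(p, q)))    # True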

Lie’s method of generating rotations

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.3, Problem 3.

As an introduction to the idea of invariance under a coordinate transformation, Zee looks at rotations in detail in his chapter I.3. He does most of the derivation in 2 dimensions, so we’ll summarize his results, but use 3 dimensions to illustrate.

A vector is defined as an object that transforms like the rectangular coordinates under rotation, that is for a vector {\vec{p}}, its form {\vec{p}^{\prime}} in a rotated system is given by the matrix equation

\displaystyle  \vec{p}^{\prime}=R\left(\theta\right)\vec{p} \ \ \ \ \ (1)

where {R} is the rotation matrix and {\theta} is the angle of rotation. In 2 dimensions, we’re assuming that {\theta} is a counterclockwise rotation about the origin; in higher dimensions, we need to specify the axis of rotation, so there will be different {R} matrices for different axes.

The square of a vector (equivalent to the square of its length) must remain unchanged by a rotation, so that

\displaystyle  \vec{p}^{\prime T}\cdot\vec{p}^{\prime}=\vec{p}^{T}\cdot\vec{p} \ \ \ \ \ (2)

where {T} denotes the transpose of the vector (converting a column vector into a row vector).

Since this holds for all vectors, it also holds for a sum of two vectors (which is also a vector), so:

\displaystyle   \left(\vec{u}^{\prime T}+\vec{v}^{\prime T}\right)\cdot\left(\vec{u}^{\prime}+\vec{v}^{\prime}\right) \displaystyle  = \displaystyle  \left(\vec{u}^{T}+\vec{v}^{T}\right)\cdot\left(\vec{u}+\vec{v}\right)\ \ \ \ \ (3)
\displaystyle  \vec{u}^{\prime T}\cdot\vec{v}^{\prime} \displaystyle  = \displaystyle  \vec{u}^{T}\cdot\vec{v} \ \ \ \ \ (4)

where the last line comes from multiplying out the first line and applying 2, and using the fact that {\vec{u}^{T}\cdot\vec{v}=\vec{v}^{T}\cdot\vec{u}}. Thus the scalar or dot product is also an invariant under rotation. This gives us a condition on {R}, since

\displaystyle   \vec{u}^{\prime T}\cdot\vec{v}^{\prime} \displaystyle  = \displaystyle  \left(R\vec{u}\right)^{T}\left(R\vec{v}\right)\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \vec{u}^{T}\left(R^{T}R\right)\vec{v}\ \ \ \ \ (6)
\displaystyle  R^{T}R \displaystyle  = \displaystyle  I \ \ \ \ \ (7)

where {I} is the identity matrix.

At this point, most introductory courses use a bit of geometry and trigonometry to arrive at the rotation matrix {R} in terms of sines and cosines. Zee takes an alternative approach by introducing Lie’s (pronounced “Lee”) method of deriving the rotation matrix. Lie starts by considering an infinitesimal rotation. Since no rotation at all amounts to {R=I} (no change in any vector), we can write

\displaystyle  R=I+A \ \ \ \ \ (8)

for an infinitesimal rotation, where {A} is a matrix containing only infinitesimal elements, so that terms of order {A^{n}} with {n\ge2} can be neglected. From 7 we get

\displaystyle   R^{T}R \displaystyle  = \displaystyle  \left(I+A\right)^{T}\left(I+A\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  I+A^{T}+A+\mathcal{O}\left(A^{2}\right)\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  I \ \ \ \ \ (11)

Therefore, {A} must be antisymmetric:

\displaystyle  A^{T}=-A \ \ \ \ \ (12)

An antisymmetric matrix always has only zeroes on the diagonal, so the only elements that can be specified independently are those in the upper (or lower) triangle. In {D} dimensions, there are {\sum_{n=1}^{D-1}n=\frac{1}{2}D\left(D-1\right)} such elements. Thus in 2 dimensions, there is only 1 element, in 3 dimensions there are 3, in 4 dimensions there are 6 and so on.

Looking at 3 dimensions, the 3 independent antisymmetric matrices are

\displaystyle   \mathcal{J}_{x} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right)\ \ \ \ \ (13)
\displaystyle  \mathcal{J}_{y} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & -1\\ 0 & 0 & 0\\ 1 & 0 & 0 \end{array}\right)\ \ \ \ \ (14)
\displaystyle  \mathcal{J}_{z} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{array}\right) \ \ \ \ \ (15)

These 3 matrices are known as the generators of the 3-d rotation group {SO\left(3\right)}.

Any 3-d antisymmetric matrix {A} can be written as a linear combination of these 3 matrices:

\displaystyle  A=\theta_{x}\mathcal{J}_{x}+\theta_{y}\mathcal{J}_{y}+\theta_{z}\mathcal{J}_{z} \ \ \ \ \ (16)

Let’s consider a rotation about the {x} axis, so that {\theta_{y}=\theta_{z}=0}. If we want to rotate through some finite (not infinitesimal) angle {\theta_{x}} we can do it by applying lots of infinitesimal rotations. For some large number {N}, we can rotate through {\theta_{x}} by applying {N} rotations of size {\theta_{x}/N}. That is, the rotation matrix {R} for a finite rotation is, in the limit,

\displaystyle   R\left(\theta_{x}\right) \displaystyle  = \displaystyle  \lim_{N\rightarrow\infty}\left[R\left(\frac{\theta_{x}}{N}\right)\right]^{N}\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \lim_{N\rightarrow\infty}\left[I+\frac{\theta_{x}\mathcal{J}_{x}}{N}\right]^{N} \ \ \ \ \ (18)

This limit turns out to be one of the definitions of the exponential function, that is

\displaystyle  e^{x}=\lim_{N\rightarrow\infty}\left(1+\frac{x}{N}\right)^{N} \ \ \ \ \ (19)

This can be proved by noting that the limit equals 1 at {x=0} and is equal to its own derivative, which are the two properties that define {e^{x}}. Taking the derivative:

\displaystyle   \frac{d}{dx}e^{x} \displaystyle  = \displaystyle  \lim_{N\rightarrow\infty}\left[N\frac{1}{N}\left(1+\frac{x}{N}\right)^{N-1}\right]\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \lim_{N\rightarrow\infty}\left[\left(1+\frac{x}{N}\right)^{-1}\left(1+\frac{x}{N}\right)^{N}\right]\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \lim_{N\rightarrow\infty}\left(1+\frac{x}{N}\right)^{N}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  e^{x} \ \ \ \ \ (23)
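As a quick numerical illustration of 19 (the value of {x} here is an arbitrary choice):

import math

x = 1.5
for N in (10, 1000, 100000):
    print(N, (1 + x / N) ** N)   # 4.0456..., 4.4767..., 4.4816...
print("exp(x) =", math.exp(x))   # 4.48168...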

Thus we can write 18 as

\displaystyle  R\left(\theta_{x}\right)=e^{\theta_{x}\mathcal{J}_{x}} \ \ \ \ \ (24)

The exponential of a matrix can be evaluated using the Taylor series for the exponential function:

\displaystyle  e^{x}=1+\sum_{n=1}^{\infty}\frac{x^{n}}{n!} \ \ \ \ \ (25)

[I’ve separated out the 1 to avoid problems when {x=0}, for which we’d get {0^{0}} in the sum if we started at {n=0}.]

This might not seem to get us anywhere, since we’re faced with an infinite series of powers of the matrix {\mathcal{J}_{x}}. However, there are only 4 distinct values of these powers:

\displaystyle   \mathcal{J}_{x} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right)\ \ \ \ \ (26)
\displaystyle  \mathcal{J}_{x}^{2} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1 \end{array}\right)\ \ \ \ \ (27)
\displaystyle  \mathcal{J}_{x}^{3} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & -1\\ 0 & 1 & 0 \end{array}\right)=-\mathcal{J}_{x}\ \ \ \ \ (28)
\displaystyle  \mathcal{J}_{x}^{4} \displaystyle  = \displaystyle  \left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right)=-\mathcal{J}_{x}^{2} \ \ \ \ \ (29)

After this, the powers repeat themselves in cycles of 4. Separating the odd and even powers, we get

\displaystyle   e^{\theta_{x}\mathcal{J}_{x}} \displaystyle  = \displaystyle  I+\sum_{k=1}^{\infty}\left(-1\right)^{k}\frac{\theta_{x}^{2k}}{\left(2k\right)!}\left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right)+\nonumber
\displaystyle  \displaystyle  \displaystyle  \sum_{k=1}^{\infty}\left(-1\right)^{k-1}\frac{\theta_{x}^{2k-1}}{\left(2k-1\right)!}\left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{array}\right) \ \ \ \ \ (30)

The sums apply only to the lower right {2\times2} submatrix, for which we can use the series expansions of sine and cosine:

\displaystyle   \sin x \displaystyle  = \displaystyle  \sum_{k=1}^{\infty}\left(-1\right)^{k-1}\frac{x^{2k-1}}{\left(2k-1\right)!}\ \ \ \ \ (31)
\displaystyle  \cos x \displaystyle  = \displaystyle  1+\sum_{k=1}^{\infty}\left(-1\right)^{k}\frac{x^{2k}}{\left(2k\right)!} \ \ \ \ \ (32)

We therefore get

\displaystyle  R_{x}\left(\theta_{x}\right)=e^{\theta_{x}\mathcal{J}_{x}}=\left(\begin{array}{ccc} 1 & 0 & 0\\ 0 & \cos\theta_{x} & \sin\theta_{x}\\ 0 & -\sin\theta_{x} & \cos\theta_{x} \end{array}\right) \ \ \ \ \ (33)
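We can verify this numerically, both by computing the matrix exponential directly and by brute-forcing the product limit 18 with a large {N}. A minimal Python sketch (the angle and the value of {N} are arbitrary choices):

import numpy as np
from scipy.linalg import expm

Jx = np.array([[0, 0, 0],
               [0, 0, 1],
               [0, -1, 0]], dtype=float)

theta = 0.9
c, s = np.cos(theta), np.sin(theta)
R_closed = np.array([[1, 0, 0],
                     [0, c, s],
                     [0, -s, c]])     # equation 33

print(np.allclose(expm(theta * Jx), R_closed))   # True

N = 1_000_000
R_limit = np.linalg.matrix_power(np.eye(3) + theta * Jx / N, N)
print(np.abs(R_limit - R_closed).max())          # tiny (~1e-7), as expected from 18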

By the same sort of calculation, we can get the rotation matrix about the {y} axis:

\displaystyle   R_{y}\left(\theta_{y}\right) \displaystyle  = \displaystyle  e^{\theta_{y}\mathcal{J}_{y}}\ \ \ \ \ (34)
\displaystyle  \displaystyle  = \displaystyle  \left(\begin{array}{ccc} \cos\theta_{y} & 0 & -\sin\theta_{y}\\ 0 & 1 & 0\\ \sin\theta_{y} & 0 & \cos\theta_{y} \end{array}\right) \ \ \ \ \ (35)

[Note the sign convention: the negative term {-\sin\theta_{y}} is in the upper right rather than the lower left. This is true for a right-handed {xyz} system.]

Note that rotations about the {x} and {y} axes don’t commute:

\displaystyle   R_{x}\left(\theta_{x}\right)R_{y}\left(\theta_{y}\right) \displaystyle  = \displaystyle  \left(\begin{array}{ccc} \cos\theta_{y} & 0 & -\sin\theta_{y}\\ \sin\theta_{x}\sin\theta_{y} & \cos\theta_{x} & \sin\theta_{x}\cos\theta_{y}\\ \cos\theta_{x}\sin\theta_{y} & -\sin\theta_{x} & \cos\theta_{x}\cos\theta_{y} \end{array}\right)\ \ \ \ \ (36)
\displaystyle  R_{y}\left(\theta_{y}\right)R_{x}\left(\theta_{x}\right) \displaystyle  = \displaystyle  \left(\begin{array}{ccc} \cos\theta_{y} & \sin\theta_{x}\sin\theta_{y} & -\sin\theta_{y}\cos\theta_{x}\\ 0 & \cos\theta_{x} & \sin\theta_{x}\\ \sin\theta_{y} & -\sin\theta_{x}\cos\theta_{y} & \cos\theta_{x}\cos\theta_{y} \end{array}\right)\ \ \ \ \ (37)
\displaystyle  \displaystyle  \ne \displaystyle  R_{x}\left(\theta_{x}\right)R_{y}\left(\theta_{y}\right) \ \ \ \ \ (38)
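A quick numerical confirmation of the non-commutativity (angles arbitrary):

import numpy as np
from scipy.linalg import expm

Jx = np.array([[0, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float)
Jy = np.array([[0, 0, -1], [0, 0, 0], [1, 0, 0]], dtype=float)

Rx = expm(0.4 * Jx)
Ry = expm(1.1 * Jy)
print(np.allclose(Rx @ Ry, Ry @ Rx))   # False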

Conservation of momentum

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.2, problem 1.

The nature of the dependence of a force or potential on the underlying position coordinates can determine certain conservation laws. In his chapter I.2, Zee shows that a central force (a force that is always directed towards its source, such as the Earth’s gravity or a point charge’s electrostatic field) conserves angular momentum. His derivation is actually a generalization to any number {D\ge2} of dimensions of the more familiar proof in 3-d, which goes like this:

The angular momentum of a mass {m} is defined as

\displaystyle  \mathbf{L}=\mathbf{r}\times\mathbf{p} \ \ \ \ \ (1)

where {\mathbf{p}=m\mathbf{v}=m\dot{\mathbf{r}}} is the linear momentum. Taking the time derivative we get

\displaystyle   \dot{\mathbf{L}} \displaystyle  = \displaystyle  \dot{\mathbf{r}}\times\mathbf{p}+\mathbf{r}\times\dot{\mathbf{p}}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  m\dot{\mathbf{r}}\times\dot{\mathbf{r}}+\mathbf{r}\times\mathbf{F}\ \ \ \ \ (3)
\displaystyle  \displaystyle  = \displaystyle  0+0\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (5)

where the first zero in the third line comes from {\dot{\mathbf{r}}\times\dot{\mathbf{r}}=0}, and the second from the fact that {\mathbf{r}\parallel\mathbf{F}} for a central force, so their cross product is zero. Thus {\mathbf{L}} doesn’t change with time.
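We can watch this conservation happen numerically by integrating the equation of motion for an attractive inverse-square (central) force and monitoring {\mathbf{L}=\mathbf{r}\times\mathbf{p}}. The sketch below uses a simple leapfrog integrator; the coupling constant, mass, initial conditions and step size are all arbitrary choices for illustration:

import numpy as np

k, m = 1.0, 1.0                      # arbitrary coupling constant and mass
r = np.array([1.0, 0.0, 0.0])        # arbitrary initial position
v = np.array([0.0, 1.2, 0.3])        # arbitrary initial velocity
dt = 1e-4

def accel(r):
    return -k * r / np.linalg.norm(r) ** 3   # central, inverse-square force

L0 = m * np.cross(r, v)
for _ in range(100_000):             # leapfrog (kick-drift-kick)
    v += 0.5 * dt * accel(r)
    r += dt * v
    v += 0.5 * dt * accel(r)

print(np.abs(m * np.cross(r, v) - L0).max())   # essentially zero (round-off only)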

Now suppose that a force {F} is the negative gradient of a potential function {V} so that by Newton’s law:

\displaystyle  m_{a}\frac{d^{2}x_{a}^{i}}{dt^{2}}=-\frac{\partial V\left(x\right)}{\partial x_{a}^{i}} \ \ \ \ \ (6)

where the index {a} refers to particle {a} in a collection of {N} interacting particles, and {i} is the component of the coordinate {x}. Note that the {x} in {V\left(x\right)} represents all {D} components of {x} (if we’re doing the calculation in {D}-dimensional space) and not just the magnitude of the distance.

Now suppose that {V} is a function only of the coordinate differences {x_{a}^{i}-x_{b}^{i}} between particles {a} and {b}, where {a,b=1,\ldots,N} with {a\ne b}. In this case, the total linear momentum is given by

\displaystyle  p^{i}=\sum_{a}m_{a}\frac{dx_{a}^{i}}{dt} \ \ \ \ \ (7)

The time derivative is

\displaystyle   \dot{p}^{i} \displaystyle  = \displaystyle  \sum_{a}m_{a}\frac{d^{2}x_{a}^{i}}{dt^{2}}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  -\sum_{a}\frac{\partial V\left(x\right)}{\partial x_{a}^{i}} \ \ \ \ \ (9)

Since {V} is a function of the set of all possible differences {x_{a}^{i}-x_{b}^{i}}, the terms in the sum 9 cancel in pairs. For example, for 3 particles if {V=f\left(\left(x_{a}^{i}-x_{b}^{i}\right),\left(x_{a}^{i}-x_{c}^{i}\right),\left(x_{b}^{i}-x_{c}^{i}\right)\right)}, then {\frac{\partial V}{\partial x_{a}^{i}}} will contain a term equivalent to {\frac{\partial V}{\partial\left(x_{a}^{i}-x_{b}^{i}\right)}} (plus another term resulting from {\frac{\partial V}{\partial\left(x_{a}^{i}-x_{c}^{i}\right)}}). However, {\frac{\partial V}{\partial x_{b}^{i}}} will contain a term equivalent to {-\frac{\partial V}{\partial\left(x_{a}^{i}-x_{b}^{i}\right)}} which cancels the first term. That is

\displaystyle   \frac{\partial V}{\partial x_{a}^{i}} \displaystyle  = \displaystyle  \frac{\partial f}{\partial\left(x_{a}^{i}-x_{b}^{i}\right)}+\frac{\partial f}{\partial\left(x_{a}^{i}-x_{c}^{i}\right)}\ \ \ \ \ (10)
\displaystyle  \frac{\partial V}{\partial x_{b}^{i}} \displaystyle  = \displaystyle  -\frac{\partial f}{\partial\left(x_{a}^{i}-x_{b}^{i}\right)}+\frac{\partial f}{\partial\left(x_{b}^{i}-x_{c}^{i}\right)}\ \ \ \ \ (11)
\displaystyle  \frac{\partial V}{\partial x_{c}^{i}} \displaystyle  = \displaystyle  -\frac{\partial f}{\partial\left(x_{a}^{i}-x_{c}^{i}\right)}-\frac{\partial f}{\partial\left(x_{b}^{i}-x_{c}^{i}\right)} \ \ \ \ \ (12)

so adding up all the derivatives causes the terms to cancel in pairs. The argument is fairly easily extended to {N} particles.

Thus {\dot{p}^{i}=0} and linear momentum is conserved. [I realize this isn’t a very elegant or mathematical way of proving it; there is probably a better way of writing it down, but hopefully you get the idea.]
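For what it’s worth, here’s a small numerical check of the cancellation: if {V} depends only on coordinate differences, the sum over all particles of {\partial V/\partial x_{a}^{i}} comes out to zero. The particular function chosen for {V} and the particle positions below are arbitrary; the derivatives are estimated by central finite differences.

import numpy as np

def V(x):
    # x holds one coordinate for each of N particles; V is an arbitrary
    # smooth function of the differences x_a - x_b only
    diffs = x[:, None] - x[None, :]
    return np.sum(np.sin(diffs) + diffs ** 4)

x = np.array([0.3, -1.1, 2.4, 0.9])
h = 1e-6
grad = np.zeros_like(x)
for a in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[a] += h
    xm[a] -= h
    grad[a] = (V(xp) - V(xm)) / (2 * h)

print(grad.sum())   # ~ 0: the pairwise terms cancel, so total momentum is conserved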

Deflection of light in Newtonian gravity

References: Anthony Zee, Einstein Gravity in a Nutshell, (Princeton University Press, 2013) – Chapter I.1, problem 3.

As we’ve seen, general relativity using the Schwarzschild metric predicts that light rays are bent as they pass near to a massive body such as the Sun. Newton believed that light was made up of what he called ‘corpuscles’, or small particles of matter which he presumably believed had mass. Thus in Newton’s gravitational theory, light would also be bent as it passed near a massive body.

To get an idea of how much a light corpuscle would be bent, we can assume that it is in a path with an energy per unit mass {\epsilon>0}, that is, its orbit is unbound. Our previous derivation is valid for {\epsilon>0} as well, so we have

\displaystyle u\left(\theta\right)=\frac{1}{r}=\frac{\kappa}{\ell^{2}}\left(1+\sqrt{\frac{2\epsilon\ell^{2}}{\kappa^{2}}+1}\cos\theta\right) \ \ \ \ \ (1)


where {\kappa=GM} and {\ell} is the angular momentum per unit mass. This result is a solution of the equations

\displaystyle \frac{1}{2}\dot{r}^{2}+\frac{\ell^{2}}{2r^{2}}-\frac{\kappa}{r} \displaystyle = \displaystyle \epsilon\ \ \ \ \ (2)
\displaystyle \dot{\theta} \displaystyle = \displaystyle \frac{\ell}{r^{2}} \ \ \ \ \ (3)

In an unbound orbit, {r\rightarrow\infty} at either end of the orbit, meaning {u\rightarrow0}, so the angles {\theta_{\pm}} at which this occurs are, from 1:

\displaystyle \theta_{\pm}=\pm\arccos\left[-\left(\frac{2\epsilon\ell^{2}}{\kappa^{2}}+1\right)^{-1/2}\right] \ \ \ \ \ (4)


This problem is essentially one of Rutherford scattering, except the force is due to gravity rather than electrostatics and is attractive rather than repulsive. As in that problem, it’s usual to express the result in terms of the impact parameter {b}, which is the distance of closest approach to the mass {M} that the particle would have if there were no force acting on it.

The angular momentum {\ell} is a constant of the motion, so we can work it out when {r\rightarrow\infty}. In this limit

\displaystyle \ell \displaystyle = \displaystyle pr\sin\theta\ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle pb \ \ \ \ \ (6)

where {p} is the linear momentum per unit mass. For {r\rightarrow\infty} all the motion is in the {r} direction (so that {\dot{\theta}\rightarrow0}), so {p\rightarrow\dot{r}} and from 2

\displaystyle p \displaystyle = \displaystyle \sqrt{2\epsilon}\ \ \ \ \ (7)
\displaystyle \ell \displaystyle = \displaystyle b\sqrt{2\epsilon} \ \ \ \ \ (8)

From 4

\displaystyle \theta_{\pm}=\pm\arccos\left[-\left(\frac{4\epsilon^{2}b^{2}}{\kappa^{2}}+1\right)^{-1/2}\right] \ \ \ \ \ (9)

For weak gravitational fields, we expect {\kappa} to be fairly small, so we can expand this as a series around {\kappa=0} (I used Maple for this, but you can grind through the derivatives if you like) and get

\displaystyle \theta_{\pm}=\pm\left[\frac{\pi}{2}+\frac{\kappa}{2b\epsilon}-\frac{\kappa^{3}}{24b^{3}\epsilon^{3}}+\mathcal{O}\left(\kappa^{5}\right)\right] \ \ \ \ \ (10)

First, we note that if there is no mass, {\kappa=GM=0} and {\theta_{\pm}=\pm\frac{\pi}{2}}. That is, the angular difference between the incoming and outgoing photon is {\pi}, corresponding to a straight line with no deflection, as we’d expect. As we increase the mass, the deflection from a straight line is, to first order in {\kappa}:

\displaystyle \Delta\theta \displaystyle = \displaystyle \theta_{+}-\theta_{-}-\pi\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle \frac{\kappa}{b\epsilon} \ \ \ \ \ (12)

The deflection increases with mass, but decreases with increasing photon energy (in Newtonian physics, ‘increasing photon energy’ means the photon is going faster; relativity was unknown at the time) and with increasing impact parameter (the larger the closest approach to the mass, the smaller the deflection). This all seems to make sense.
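As a rough sanity check (my own numbers, not from the references), we can put in values for a corpuscle grazing the Sun. If we assume the corpuscle moves at speed {c} far from the Sun, then {p=\sqrt{2\epsilon}=c}, so {\epsilon=c^{2}/2}; taking {b} equal to the solar radius then gives a deflection of about 0.87 seconds of arc, the classic Newtonian value (half the prediction of general relativity):

import numpy as np

G = 6.674e-11        # m^3 kg^-1 s^-2
M_sun = 1.989e30     # kg
R_sun = 6.96e8       # m
c = 2.998e8          # m/s

kappa = G * M_sun    # GM
b = R_sun            # grazing incidence (assumption)
eps = c ** 2 / 2     # energy per unit mass, assuming speed c far from the Sun

# exact angle from equation 9, first-order deflection from equation 12
theta_plus = np.arccos(-1.0 / np.sqrt(4 * eps ** 2 * b ** 2 / kappa ** 2 + 1))
deflection_exact = 2 * theta_plus - np.pi        # theta_+ - theta_- - pi
deflection_approx = kappa / (b * eps)

rad_to_arcsec = 180 / np.pi * 3600
print(deflection_exact * rad_to_arcsec)    # ~ 0.875 arcsec
print(deflection_approx * rad_to_arcsec)   # ~ 0.875 arcsec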