Tag Archives: Lorentz transformations

Lorentz transformation as product of a pure boost and pure rotation

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, we can now complete the demonstration that a general Lorentz transformation is the product of a pure boost (motion at a constant velocity) multiplied by a pure rotation. We’ll follow Corollary IV.2 in Jaffe’s article.

In the last post, we saw that we could write a general Lorentz transformation in the form

\displaystyle  \widehat{\Lambda x}=A\widehat{x}A^{\dagger} \ \ \ \ \ (1)

where {x} is the 4-vector of a spacetime event, {\Lambda} is the Lorentz transformation as a {4\times4} matrix, {A} is a {2\times2} matrix with complex elements and a hat over a symbol means we’re looking at the {2\times2} complex matrix representing that object. We also saw in the last post that {A} is a member of the group {SL\left(2,\mathbb{C}\right)} and is unique up to a sign.

Jaffe goes through a rather involved proof that the transformation {\Lambda\left(A\right)} defined by 1 is a member of the physically relevant group with {\det\Lambda=+1} and {\Lambda_{00}\ge1}, but this involves a lot of somewhat obscure matrix theorems that I don’t want to get into here, and these techniques don’t seem to be required for the rest of the demonstration, so we’ll just accept this fact for now.

What we really want to do is find out how we can calculate {\Lambda} given the {2\times2} matrix {A}. We can do this by using the result we got earlier for the components of the 4-vector {x}:

\displaystyle  \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (2)

where the {\sigma_{\mu}} are four Hermitian matrices:

\displaystyle   \sigma_{0} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (3)
\displaystyle  \sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (4)
\displaystyle  \sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (5)
\displaystyle  \sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (6)

We can invert 2 to get

\displaystyle  x_{\nu}=\left\langle \sigma_{\nu},\widehat{x}\right\rangle =\frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}\widehat{x}\right) \ \ \ \ \ (7)

Reverting back to the {4\times4} matrix {\Lambda} (no hat), we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda\left(A\right)_{\mu\nu}x_{\nu}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda\left(A\right)x\right)_{\mu}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{\Lambda\left(A\right)x}\right\rangle \ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sum_{\nu=0}^{3}x_{\nu}\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle x_{\nu} \ \ \ \ \ (13)

We used 1 in the fourth line and 2 in the fifth line. Comparing the first and last lines, we see that

\displaystyle   \Lambda\left(A\right)_{\mu\nu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}^{\dagger}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right) \ \ \ \ \ (16)

where in the last line we used the fact that all the {\sigma_{\mu}} are Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}.
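As a rough numerical illustration of 16, the sketch below builds {\Lambda\left(A\right)} from a given {A} using only the Python standard library. The helper names (matmul, dagger, lorentz_from_sl2c) and the particular diagonal {A} are assumptions made for the example; they don’t come from Jaffe’s notes.

```python
import math

# The four Hermitian basis matrices sigma_0..sigma_3 from eqs. 3-6.
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def lorentz_from_sl2c(a):
    """Return Lambda(A) with entries (1/2) Tr(sigma_mu A sigma_nu A^dagger)."""
    a_dag = dagger(a)
    lam = [[0j] * 4 for _ in range(4)]
    for mu in range(4):
        for nu in range(4):
            m = matmul(matmul(SIGMA[mu], a), matmul(SIGMA[nu], a_dag))
            lam[mu][nu] = 0.5 * (m[0][0] + m[1][1])
    return lam

# Illustrative choice (an assumption, not from the text): diagonal A with det A = 1.
chi = 0.7
A = [[math.exp(chi / 2), 0], [0, math.exp(-chi / 2)]]
LAMBDA = lorentz_from_sl2c(A)
```

For this diagonal {A}, working through 16 by hand gives {\Lambda_{00}=\Lambda_{33}=\cosh\chi} and {\Lambda_{03}=\Lambda_{30}=\sinh\chi}, with {\Lambda_{11}=\Lambda_{22}=1}, so the sketch reproduces a transformation that mixes only the {x_{0}} and {x_{3}} components.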

In order for {\Lambda\left(A\right)} to be a valid Lorentz transformation, clearly its elements must be real numbers. We can show this is true as follows. The complex conjugate is represented by drawing a bar over a quantity. We get

\displaystyle  \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \ \ \ \ \ (17)

We can now use the fact that the trace of a product of matrices remains unchanged if we cyclically permute the order of multiplication. In particular {\mbox{Tr}\left(XB^{\dagger}\right)=\mbox{Tr}\left(B^{\dagger}X\right)}. Also, {\mbox{Tr}\left(B^{\dagger}X\right)=\mbox{Tr}\left(\left(\overline{X^{\dagger}B}\right)^{T}\right)=\mbox{Tr}\left(\overline{X^{\dagger}B}\right)} since the trace of a matrix is equal to the trace of its transpose. In 17, we can set {X^{\dagger}=\sigma_{\mu}} and {B=A\sigma_{\nu}A^{\dagger}} and use the fact that the {\sigma_{\mu}} are all Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}:

\displaystyle   \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\left(A\sigma_{\nu}A^{\dagger}\right)^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(A\sigma_{\nu}A^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(A\right)_{\mu\nu} \ \ \ \ \ (21)

where in the third line we cyclically permuted the matrices in the trace. Thus the elements of {\Lambda\left(A\right)} are real.
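The reality of the elements can also be spot-checked numerically. The following is only a sketch: the matrix {A} is an arbitrary complex matrix chosen for illustration, with its last entry fixed so that {\det A=1}, and the helper names are hypothetical.

```python
# Hermitian basis sigma_0..sigma_3 (eqs. 3-6).
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def lorentz_from_sl2c(a):
    """Entries (1/2) Tr(sigma_mu A sigma_nu A^dagger), kept as complex numbers."""
    a_dag = dagger(a)
    return [[0.5 * sum(matmul(matmul(SIGMA[mu], a),
                              matmul(SIGMA[nu], a_dag))[k][k] for k in range(2))
             for nu in range(4)] for mu in range(4)]

# Arbitrary illustrative entries (assumptions); the last one enforces det A = 1.
A11, A12, A21 = 1 + 2j, 0.5 - 1j, 2j
A = [[A11, A12], [A21, (1 + A12 * A21) / A11]]

LAMBDA = lorentz_from_sl2c(A)
# Largest imaginary part over all 16 entries; it should vanish up to rounding.
MAX_IMAG = max(abs(entry.imag) for row in LAMBDA for entry in row)
```

Because the entries are deliberately stored as complex numbers, the smallness of MAX_IMAG is a genuine check rather than something forced by the code.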

Now we consider two cases. First, suppose that {A=U}, where {U} is a unitary matrix, so that {U^{\dagger}=U^{-1}}. From 16 we find that {\Lambda\left(U\right)_{00}} is, using {\sigma_{0}=I}:

\displaystyle   \Lambda\left(U\right)_{00} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(UU^{\dagger}\right)\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}I\ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  1 \ \ \ \ \ (25)

The other elements in the first row and first column of {\Lambda} are all zero, as we can see by using 16 again:

\displaystyle   \Lambda\left(U\right)_{0i} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U^{\dagger}U\sigma_{i}\right)\ \ \ \ \ (28)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (30)

since {\mbox{Tr}\sigma_{i}=0} for {i=1,2,3}. A similar argument works for the first column of {\Lambda\left(U\right)} as well:

\displaystyle   \Lambda\left(U\right)_{i0} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (31)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}UU^{\dagger}\right)\ \ \ \ \ (32)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (33)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (34)

For the other elements, we have

\displaystyle   \Lambda\left(U\right)_{ij} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{j}U^{\dagger}\right)\ \ \ \ \ (35)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\left(U^{-1}\right)^{\dagger}\sigma_{j}U^{-1}\right)\ \ \ \ \ (36)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{j}U^{-1}\sigma_{i}\left(U^{-1}\right)^{\dagger}\right)\ \ \ \ \ (37)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(U^{-1}\right)_{ji}\ \ \ \ \ (38)
\displaystyle  \displaystyle  = \displaystyle  \left[\Lambda\left(U\right)\right]_{ji}^{-1} \ \ \ \ \ (39)

That is

\displaystyle  \left[\Lambda\left(U\right)\right]^{T}=\Lambda\left(U\right)^{-1} \ \ \ \ \ (40)

so that

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (41)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. In other words, when {A=U}, {\Lambda} is a pure rotation.
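To see the block structure 41 in a concrete case, we can feed a simple unitary matrix into 16. The choice {U=\mbox{diag}\left(e^{-i\theta/2},e^{i\theta/2}\right)} below is an assumption made for the example (it is not taken from the text), and the helper names are hypothetical; for this choice the {3\times3} block should come out as a rotation by {\theta} about the {x_{3}} axis.

```python
import cmath
import math

# Hermitian basis sigma_0..sigma_3 (eqs. 3-6).
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def lorentz_from_sl2c(a):
    """Entries (1/2) Tr(sigma_mu A sigma_nu A^dagger)."""
    a_dag = dagger(a)
    return [[0.5 * sum(matmul(matmul(SIGMA[mu], a),
                              matmul(SIGMA[nu], a_dag))[k][k] for k in range(2))
             for nu in range(4)] for mu in range(4)]

theta = 0.4
# Assumed example: a diagonal unitary matrix with determinant 1.
U = [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

# The entries are real (shown in the text), so we keep only the real parts.
LAMBDA = [[entry.real for entry in row] for row in lorentz_from_sl2c(U)]
# Spatial 3x3 block, i.e. the matrix called R in eq. 41.
R = [row[1:] for row in LAMBDA[1:]]
```

The first row and column of LAMBDA reduce to {\left(1,0,0,0\right)}, and the block R satisfies {\mathcal{R}^{T}\mathcal{R}=I}, as 40 requires.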

The other case we need to examine is when {A=H}, where {H} is a Hermitian matrix, so that {H^{\dagger}=H}. In that case, from 16

\displaystyle   \Lambda\left(H\right)_{\mu\nu} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}H\sigma_{\nu}H\right)\ \ \ \ \ (42)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(H\sigma_{\mu}H\sigma_{\nu}\right)\ \ \ \ \ (43)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}H\sigma_{\mu}H\right)\ \ \ \ \ (44)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(H\right)_{\nu\mu} \ \ \ \ \ (45)

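so {\Lambda\left(H\right)} is a symmetric matrix. (We used two cyclic permutations in the trace here.) Symmetry alone is not quite enough to single out a pure boost (a rotation by {\pi} about an axis is also symmetric), but for the positive Hermitian {H} that we use below, {\Lambda\left(H\right)} does represent a pure boost. We haven’t proved this here, but it has been verified (see, for example, Wikipedia; I can’t be bothered going through it all here).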

Now we are ready to get our final result. To do this, we need to use a theorem from matrix algebra which says that every matrix {A} in the group {SL\left(2,\mathbb{C}\right)} (that is, a {2\times2} matrix with complex elements and determinant +1) has a unique polar decomposition into a strictly positive Hermitian matrix {H} and a unitary matrix {U}, so that we always have

\displaystyle  A=HU \ \ \ \ \ (46)

To connect this with what we’ve done above, we can define

\displaystyle   H \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{1/2}\ \ \ \ \ (47)
\displaystyle  U \displaystyle  = \displaystyle  H^{-1}A=\left(AA^{\dagger}\right)^{-1/2}A \ \ \ \ \ (48)

[The square root of a matrix {M} is defined to be the matrix {S=M^{1/2}} such that {S^{2}=M}.] This definition is consistent with {H} being Hermitian. Writing {M=AA^{\dagger}}, which is Hermitian, we have

\displaystyle   \left(S^{2}\right)^{\dagger} \displaystyle  = \displaystyle  M^{\dagger}=M\ \ \ \ \ (49)
\displaystyle  \displaystyle  = \displaystyle  \left(SS\right)^{\dagger}\ \ \ \ \ (50)
\displaystyle  \displaystyle  = \displaystyle  \left(S^{\dagger}\right)^{2}\ \ \ \ \ (51)
\displaystyle  \displaystyle  = \displaystyle  S^{2} \ \ \ \ \ (52)

Thus if we restrict {S} to be the positive square root, we must have {S^{\dagger}=S}.

The definition is also consistent with {U} being unitary, since

\displaystyle   UU^{\dagger} \displaystyle  = \displaystyle  \left(H^{-1}A\right)\left(H^{-1}A\right)^{\dagger}\ \ \ \ \ (53)
\displaystyle  \displaystyle  = \displaystyle  H^{-1}AA^{\dagger}H^{-1}\ \ \ \ \ (54)
\displaystyle  \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{-1/2}AA^{\dagger}\left(AA^{\dagger}\right)^{-1/2}\ \ \ \ \ (55)
\displaystyle  \displaystyle  = \displaystyle  I \ \ \ \ \ (56)

[We define {\left(AA^{\dagger}\right)^{-1/2}} to be the inverse of {\left(AA^{\dagger}\right)^{1/2}}.]

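From 1, {\widehat{\Lambda\left(HU\right)x}=HU\widehat{x}U^{\dagger}H^{\dagger}}, which is the transformation {\Lambda\left(U\right)} followed by {\Lambda\left(H\right)}, so {\Lambda\left(HU\right)=\Lambda\left(H\right)\Lambda\left(U\right)}. Therefore, we can uniquely decompose any Lorentz transformation {\Lambda\left(A\right)} into

\displaystyle  \Lambda\left(A\right)=\Lambda\left(H\right)\Lambda\left(U\right) \ \ \ \ \ (57)

that is, the product of a pure boost {\Lambda\left(H\right)} and a pure rotation {\Lambda\left(U\right)}.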
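The decomposition can be tried out numerically. The sketch below is not Jaffe’s construction: to avoid a general matrix square root it uses a closed form that holds for a {2\times2} positive matrix {M} with {\det M=1}, namely {M^{1/2}=\left(M+I\right)/\sqrt{\mbox{Tr}M+2}} (this follows from the Cayley-Hamilton relation {M^{2}=\left(\mbox{Tr}M\right)M-I}). The function names and the sample matrix {A} are assumptions made for the example.

```python
import math

# Hermitian basis sigma_0..sigma_3 (eqs. 3-6).
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def max_diff(a, b):
    """Largest absolute entrywise difference between two square matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

def lorentz_from_sl2c(a):
    """Entries (1/2) Tr(sigma_mu A sigma_nu A^dagger), eq. 16."""
    a_dag = dagger(a)
    return [[0.5 * sum(matmul(matmul(SIGMA[mu], a),
                              matmul(SIGMA[nu], a_dag))[k][k] for k in range(2))
             for nu in range(4)] for mu in range(4)]

def polar_decompose(a):
    """Split A in SL(2,C) as A = H U, H positive Hermitian, U unitary."""
    m = matmul(a, dagger(a))                 # M = A A^dagger, with det M = 1
    tr = (m[0][0] + m[1][1]).real
    scale = 1.0 / math.sqrt(tr + 2.0)        # closed-form square root, see lead-in
    h = [[(m[i][j] + (1 if i == j else 0)) * scale for j in range(2)]
         for i in range(2)]
    # det H = 1, so the inverse of H is simply its adjugate.
    h_inv = [[h[1][1], -h[0][1]], [-h[1][0], h[0][0]]]
    return h, matmul(h_inv, a)

# Sample matrix with det A = 1 (an arbitrary illustrative choice).
A = [[1 + 1j, 1j], [0, 1 / (1 + 1j)]]
H, U = polar_decompose(A)

LAM_A = lorentz_from_sl2c(A)
LAM_H = lorentz_from_sl2c(H)
LAM_U = lorentz_from_sl2c(U)
PRODUCT = matmul(LAM_H, LAM_U)               # should reproduce LAM_A (eq. 57)
```

Comparing PRODUCT with LAM_A entry by entry, and checking that LAM_H is symmetric while U is unitary, gives a quick sanity test of 46 to 57 for this one matrix; it is of course not a proof.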

Lorentz transformations and the special linear group SL(2,C)

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, we start off with the representation of a spacetime 4-vector as a {2\times2} complex Hermitian matrix:

\displaystyle  \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (1)

Our ultimate goal is to show that any Lorentz transformation can be represented as the product of a pure rotation {R} and a pure boost {B}: {\Lambda=RB}. The step shown in this post may look like little more than an exercise in matrix algebra, but be patient; it takes a while to get to our final goal.

We start by looking at the matrices belonging to the special linear group {SL\left(2,\mathbb{C}\right)}, which consists of {2\times2} matrices containing general complex numbers as elements, and with determinant 1. Each matrix {A\in SL\left(2,\mathbb{C}\right)} can be used to define a linear transformation of the Hermitian matrix 1:

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}A^{\dagger} \ \ \ \ \ (2)

Because the determinant of a product is equal to the product of the determinants, and {\det A=\det A^{\dagger}=1}, {\det\widehat{x}^{\prime}=\det\widehat{x}=x_{\mu}x^{\mu}}. Thus such a transformation leaves the 4-vector length unchanged, so qualifies as a Lorentz transformation. Also, as a general complex {2\times2} matrix contains 4 elements, each with a real and imaginary part, there are 8 parameters. The condition {\det A=1} provides 2 constraints (one on the real part and one on the imaginary part), leaving 6 independent parameters, which is the same as the number of free parameters in a general Lorentz transformation.
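A short numerical check of this invariance may help. The sketch below maps a 4-vector to its matrix 1, applies 2, and reads the new components back off the matrix entries; the function names, the sample vector and the sample matrix {A} are all assumptions made for illustration.

```python
def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def hat(x):
    """The 2x2 Hermitian matrix of eq. 1 for x = (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = x
    return [[x0 + x3, x1 - 1j * x2], [x1 + 1j * x2, x0 - x3]]

def unhat(m):
    """Read (x0, x1, x2, x3) back off the entries of a matrix of the form 1."""
    return [((m[0][0] + m[1][1]) / 2).real,
            m[1][0].real,
            m[1][0].imag,
            ((m[0][0] - m[1][1]) / 2).real]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def interval(x):
    """Minkowski length x0^2 - x1^2 - x2^2 - x3^2."""
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

# Sample data (arbitrary illustrative choices); det A = 1 by construction.
A = [[1 + 1j, 1j], [0, 1 / (1 + 1j)]]
X = [2.0, 0.3, -1.1, 0.7]

X_HAT_PRIME = matmul(matmul(A, hat(X)), dagger(A))   # eq. 2
X_PRIME = unhat(X_HAT_PRIME)
```

Comparing interval(X_PRIME) with interval(X) confirms, for this example, that the transformation preserves the 4-vector length, and rebuilding hat(X_PRIME) shows that the transformed matrix is still of the Hermitian form 1.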

We can give a more detailed proof that {A} provides a Lorentz transformation as follows. Suppose we start with two matrices {A,B\in SL\left(2,\mathbb{C}\right)} and define a transformation

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}B \ \ \ \ \ (3)

[Remember that the hats on {\widehat{x}} and {\widehat{x}^{\prime}} mean that we’re considering the {2\times2} matrix version 1 of the 4-vectors {x} and {x^{\prime}}.] The transformed matrix {\widehat{x}^{\prime}} must be Hermitian for all {\widehat{x}}, so we must have

\displaystyle   \left(A\widehat{x}B\right)^{\dagger} \displaystyle  = \displaystyle  A\widehat{x}B\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  B^{\dagger}\widehat{x}A^{\dagger} \ \ \ \ \ (5)

We now left-multiply by {\left(B^{\dagger}\right)^{-1}} and right-multiply by {B^{-1}} to get

\displaystyle  \left(B^{\dagger}\right)^{-1}A\widehat{x}=\widehat{x}A^{\dagger}B^{-1} \ \ \ \ \ (6)

But we also have

\displaystyle  \left(B^{\dagger}\right)^{-1}A=\left(A^{\dagger}B^{-1}\right)^{\dagger} \ \ \ \ \ (7)

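so if we define the matrix

\displaystyle  T\equiv\left(B^{\dagger}\right)^{-1}A \ \ \ \ \ (8)

then 6 reads {T\widehat{x}=\widehat{x}T^{\dagger}}. Choosing {\widehat{x}=I} gives {T=T^{\dagger}}, so {T} is Hermitian. We can therefore write 6 as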

\displaystyle  T\widehat{x}=\widehat{x}T^{\dagger}=\widehat{x}T \ \ \ \ \ (9)

so {T} commutes with {\widehat{x}} for all {\widehat{x}}.

Now we can choose {\widehat{x}=\sigma_{2}} and then {\widehat{x}=\sigma_{3}}, where the {\sigma_{i}} are two of the Pauli matrices which we showed (together with the identity matrix {\sigma_{0}}) form a basis for the space of {2\times2} Hermitian matrices. We’ve seen that {\sigma_{2}} and {\sigma_{3}} also form an irreducible set, and that any matrix {T} that commutes with all the members of an irreducible set must be a multiple of the identity matrix. Thus we must have

\displaystyle  T=\lambda I \ \ \ \ \ (10)

for some constant {\lambda}. However, since {T} is the product of two matrices {A} and {\left(B^{\dagger}\right)^{-1}}, both of which have determinant 1, {\det T=1} also, which means that {\lambda^{2}=1} and {\lambda=\pm1}. Therefore

\displaystyle   \left(B^{\dagger}\right)^{-1}A \displaystyle  = \displaystyle  \pm I\ \ \ \ \ (11)
\displaystyle  A \displaystyle  = \displaystyle  \pm B^{\dagger} \ \ \ \ \ (12)

Thus the transformation 3 can be written as

\displaystyle  \widehat{x}^{\prime}=\pm A\widehat{x}A^{\dagger} \ \ \ \ \ (13)

To eliminate the {-} sign, suppose that

\displaystyle  \widehat{x}^{\prime}=-A\widehat{x}A^{\dagger} \ \ \ \ \ (14)

A Lorentz transformation giving this result can be written as

\displaystyle  \widehat{x}^{\prime}=\widehat{\Lambda x} \ \ \ \ \ (15)

where {\Lambda} is the {4\times4} matrix giving the Lorentz transformation of the original 4-vector {x}. In the original 4-vector notation, we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda_{\mu\nu}x_{\nu}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda x\right)_{\mu} \ \ \ \ \ (17)

From the relation between the 4-vector and {2\times2} matrix representations, we have

\displaystyle  x_{\mu}^{\prime}=\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (18)

where {\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle } is the inner product of the two matrices. Therefore from 14

\displaystyle   \left(\Lambda x\right)_{\mu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (21)

If we choose {x=\left(1,0,0,0\right)}, we have

\displaystyle   \left(\Lambda x\right)_{0} \displaystyle  = \displaystyle  \Lambda_{00}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\sigma_{0}A^{\dagger}\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},AA^{\dagger}\right\rangle \ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{2}\mbox{Tr}\left(AA^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  \le \displaystyle  0 \ \ \ \ \ (27)

where the penultimate line follows from the definition of the inner product. The last line follows because

\displaystyle  \mbox{Tr}\left(AA^{\dagger}\right)=\left|A_{11}\right|^{2}+\left|A_{12}\right|^{2}+\left|A_{21}\right|^{2}+\left|A_{22}\right|^{2}\ge0 \ \ \ \ \ (28)

Since we’re requiring the transformation to be orthochronous, we must have {\Lambda_{00}\ge1}, so we must exclude the {-} sign in 13, giving 2.

Finally, we can show that the transformation matrix {A} is unique, up to a sign. We can prove this by supposing that there are two different {SL\left(2,\mathbb{C}\right)} matrices {A} and {B} that give the same transformation for all {\widehat{x}}, that is

\displaystyle  A\widehat{x}A^{\dagger}=B\widehat{x}B^{\dagger} \ \ \ \ \ (29)

This implies

\displaystyle   B^{-1}A\widehat{x}A^{\dagger}\left(B^{\dagger}\right)^{-1} \displaystyle  = \displaystyle  \widehat{x}\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  B^{-1}A\widehat{x}\left(B^{-1}A\right)^{\dagger} \ \ \ \ \ (31)

We can now choose {\widehat{x}=I}, which shows that

\displaystyle  \left(B^{-1}A\right)^{\dagger}=\left(B^{-1}A\right)^{-1} \ \ \ \ \ (32)

which means (by definition), {B^{-1}A} is unitary, so for all {\widehat{x}}

\displaystyle  \widehat{x}=B^{-1}A\widehat{x}\left(B^{-1}A\right)^{-1} \ \ \ \ \ (33)

This means that {B^{-1}A} commutes with {\widehat{x}} for all {\widehat{x}} (that’s the only way we can cancel {B^{-1}A} off the RHS). Using the same argument as above, we can choose {\widehat{x}} to be two of the Pauli matrices, which form an irreducible set. Since {B^{-1}A} commutes with both these matrices, it must be a multiple {\lambda} of the identity:

\displaystyle   B^{-1}A \displaystyle  = \displaystyle  \lambda I\ \ \ \ \ (34)
\displaystyle  A \displaystyle  = \displaystyle  \lambda B \ \ \ \ \ (35)

Since {\det A=\det B=1} and for a {2\times2} matrix {\det\left(\lambda B\right)=\lambda^{2}\det B}, we have {\lambda^{2}=1}, so {\lambda=\pm1}. Therefore {A} is unique up to a sign.
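As a one-line check that the sign really is undetermined, we can replace {A} by {-A} in 2:

\displaystyle  \left(-A\right)\widehat{x}\left(-A\right)^{\dagger}=\left(-1\right)^{2}A\widehat{x}A^{\dagger}=A\widehat{x}A^{\dagger}

Since {\det\left(-A\right)=\left(-1\right)^{2}\det A=1} for a {2\times2} matrix, {-A} is also in {SL\left(2,\mathbb{C}\right)}, so {A} and {-A} are two distinct matrices giving the same transformation of {\widehat{x}}.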

In summary, what we’ve done in this post is show that a restricted Lorentz transformation {\Lambda} (that is, one where {\det\Lambda=+1} and {\Lambda_{00}\ge1}) can be represented by a matrix {A\in SL\left(2,\mathbb{C}\right)} where {A} is unique up to a sign.

Lorentz transformations as rotations

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Before we apply Noether’s theorem to Lorentz transformations, we need to take a step back and look at a generalized version of the Lorentz transformation. Most introductory treatments of special relativity derive the Lorentz transformation as the transformation between two inertial frames that are moving at some constant velocity with respect to each other. This form of the transformations allows us to derive the usual consequences of special relativity such as length contraction and time dilation. However, it’s useful to look at a Lorentz transformation in a more general way.

The idea is to define a Lorentz transformation as any transformation that leaves the magnitude of all four-vectors {x} unchanged, where this magnitude is defined using the usual flat space metric {g^{\mu\nu}} so that

\displaystyle  x^{2}=x_{\mu}x^{\mu}=g^{\mu\nu}x_{\mu}x_{\nu}=x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2} \ \ \ \ \ (1)

The flat space (Minkowski) metric is

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

We know that the traditional Lorentz transformation between two inertial frames in relative motion satisfies this condition, but in fact a rotation of the coordinate system in 3-d space (leaving the time coordinate unchanged) also satisfies this condition, so a Lorentz transformation defined in this more general way includes more transformations than the traditional one.

We can define this general transformation in terms of a {4\times4} matrix {\Lambda}, so that a four-vector {x} transforms to another vector {x^{\prime}} according to

\displaystyle  x^{\prime}=\Lambda x \ \ \ \ \ (3)

We can define the scalar product of two 4-vectors using the notation

\displaystyle  \left\langle x,y\right\rangle \equiv\sum_{i=0}^{3}x_{i}y_{i} \ \ \ \ \ (4)

The scalar product in flat space using the Minkowski metric {g} is therefore

\displaystyle  \left\langle x,gy\right\rangle =g^{\mu\nu}x_{\mu}y_{\nu}=x_{0}y_{0}-x_{1}y_{1}-x_{2}y_{2}-x_{3}y_{3} \ \ \ \ \ (5)

In matrix notation, in which {x} and {y} are column vectors, this is

\displaystyle  \left\langle x,gy\right\rangle =x^{T}gy \ \ \ \ \ (6)

In this way, the condition that {\Lambda} leaves the magnitude unchanged is

\displaystyle  \left\langle \Lambda x,g\Lambda x\right\rangle =\left\langle x,gx\right\rangle \ \ \ \ \ (7)

for all {x}. In matrix notation, this is

\displaystyle  \left(\Lambda x\right)^{T}g\Lambda x=x^{T}\Lambda^{T}g\Lambda x=x^{T}gx \ \ \ \ \ (8)

from which we get one condition on {\Lambda}:

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (9)

[Note that Jaffe uses a superscript {tr} to indicate a matrix transpose; I find this confusing as {tr} usually means the trace of a matrix, and a superscript {T} is more usual for the transpose.]

Because both sides of 9 refer to a symmetric matrix (on the LHS, {\left(\Lambda^{T}g\Lambda\right)^{T}=\Lambda^{T}g^{T}\left(\Lambda^{T}\right)^{T}=\Lambda^{T}g\Lambda}), this equation gives 10 independent equations for the elements of {\Lambda}, so the number of parameters that can be specified arbitrarily is {4\times4-10=6}.

The set {\mathcal{L}} of all Lorentz transformations forms a group under matrix multiplication, known as the Lorentz group. We can demonstrate this by showing that the four group properties are satisfied.

First, closure. If we perform two transformations in succession on a 4-vector {x} then we get {x^{\prime}=\Lambda_{2}\Lambda_{1}x}. The compound transformation satisfies 9:

\displaystyle   \left(\Lambda_{2}\Lambda_{1}\right)^{T}g\Lambda_{2}\Lambda_{1} \displaystyle  = \displaystyle  \Lambda_{1}^{T}\Lambda_{2}^{T}g\Lambda_{2}\Lambda_{1}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{1}^{T}g\Lambda_{1}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (12)

Thus the group is closed under multiplication.

Second, associativity is automatically satisfied as matrix multiplication is associative.

An identity element exists in the form of the identity matrix {I}, which is itself a Lorentz transformation as it satisfies 9.

Finally, we need to show that every matrix {\Lambda} has an inverse that is also part of the set {\mathcal{L}}. Taking the determinant of 9 we have

\displaystyle   \det\left(\Lambda^{T}g\Lambda\right) \displaystyle  = \displaystyle  \left(\det\Lambda^{T}\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \left(\det\Lambda\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  -\left(\det\Lambda\right)^{2} \ \ \ \ \ (15)

since {\det g=-1} from 2. From the RHS of 9, this must equal {\det g=-1} so we have

\displaystyle   -\left(\det\Lambda\right)^{2} \displaystyle  = \displaystyle  -1\ \ \ \ \ (16)
\displaystyle  \det\Lambda \displaystyle  = \displaystyle  \pm1 \ \ \ \ \ (17)

From a basic theorem in matrix algebra, any matrix with a non-zero determinant has an inverse, so {\Lambda^{-1}} exists. To show that {\Lambda^{-1}} is a Lorentz transformation, we can take the inverse of 9 and use the fact that {g^{-1}=g}:

\displaystyle   \left(\Lambda^{T}g\Lambda\right)^{-1} \displaystyle  = \displaystyle  g^{-1}=g\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{T}\right)^{-1}\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T} \ \ \ \ \ (20)

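since the inverse and transpose operations commute (another basic theorem in matrix algebra). This is condition 9 for the matrix {\left(\Lambda^{-1}\right)^{T}}. The same condition for {\Lambda^{-1}} itself follows by multiplying 9 on the left by {\left(\Lambda^{T}\right)^{-1}} and on the right by {\Lambda^{-1}}, which gives {g=\left(\Lambda^{-1}\right)^{T}g\Lambda^{-1}}. Therefore {\Lambda^{-1}} is also a valid Lorentz transformation.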

We can also see that {\Lambda^{T}} is a valid transformation by left-multiplying by {\Lambda} and right-multiplying by {\Lambda^{T}}:

\displaystyle   g \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T}\ \ \ \ \ (21)
\displaystyle  \Lambda g\Lambda^{T} \displaystyle  = \displaystyle  \left(\Lambda\Lambda^{-1}\right)g\left(\Lambda^{-1}\right)^{T}\Lambda^{T}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (23)

We need one more property of {\Lambda} concerning the element {\Lambda_{00}}. Again starting from 9, the 00 component of the RHS is {g_{00}=1}, and writing out the 00 component of the LHS explicitly we have

\displaystyle  \left[\Lambda^{T}g\Lambda\right]_{00}=\Lambda_{00}^{2}-\sum_{i=1}^{3}\Lambda_{i0}^{2}=1 \ \ \ \ \ (24)

This gives

\displaystyle  \Lambda_{00}=\pm\sqrt{1+\sum_{i=1}^{3}\Lambda_{i0}^{2}} \ \ \ \ \ (25)

Thus either {\Lambda_{00}\ge1} or {\Lambda_{00}\le-1}.

From the determinant and {\Lambda_{00}}, we can classify a particular transformation matrix {\Lambda} as being in one of four so-called connected components. Jaffe spells out in detail the proof that these four components are disjoint, that is, we can’t define some parameter {s} that can be varied continuously to move a matrix {\Lambda} from one connected component to another connected component. The notation {\mathcal{L}_{+}^{\uparrow}} indicates the set of matrices with {\det\Lambda=+1} (indicated by the + subscript) and {\Lambda_{00}\ge1} (indicated by the {\uparrow} superscript). The other three connected components are {\mathcal{L}_{-}^{\uparrow}} ({\det\Lambda=-1}, {\Lambda_{00}\ge1}); {\mathcal{L}_{+}^{\downarrow}} ({\det\Lambda=+1}, {\Lambda_{00}\le-1}); and {\mathcal{L}_{-}^{\downarrow}} ({\det\Lambda=-1}, {\Lambda_{00}\le-1}). Not all of these subsets of {\mathcal{L}} form groups, as some of them are not closed under multiplication.

If {\det\Lambda=+1}, {\Lambda} is called proper, and if {\det\Lambda=-1}, {\Lambda} is called improper. If {\Lambda_{00}\ge+1}, {\Lambda} is orthochronous, and if {\Lambda_{00}\le-1}, {\Lambda} is non-orthochronous. From here on, we’ll consider only proper orthochronous transformations, that is, the connected component {\mathcal{L}_{+}^{\uparrow}}.
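The classification can be written as a small routine. This is only a sketch: the function names are hypothetical, and the three test matrices (spatial inversion, time reversal and total inversion) are standard examples that I’ve added for illustration rather than cases discussed above.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def classify(lam, tol=1e-9):
    """Return (proper/improper, orthochronous/non-orthochronous) for Lambda."""
    d = det(lam)
    if abs(abs(d) - 1) > tol or abs(lam[0][0]) < 1 - tol:
        raise ValueError("not a Lorentz transformation: need det = +-1, |L00| >= 1")
    kind = "proper" if d > 0 else "improper"
    time_sense = "orthochronous" if lam[0][0] > 0 else "non-orthochronous"
    return kind, time_sense

def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

IDENTITY = diag([1, 1, 1, 1])
SPACE_INVERSION = diag([1, -1, -1, -1])     # flips the three spatial axes
TIME_REVERSAL = diag([-1, 1, 1, 1])         # flips the time axis
TOTAL_INVERSION = diag([-1, -1, -1, -1])    # flips all four axes
```

Each of the four matrices lands in a different connected component, which is a convenient way to remember what the labels mean.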

Two special types of transformation in {\mathcal{L}_{+}^{\uparrow}} are pure rotations and pure boosts. A pure rotation is a rotation (about the origin) in 3-d space, leaving the time coordinate unchanged. That is, {\Lambda_{00}=+1}. Such a transformation can be written as

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (26)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. We know that the off-diagonal elements in the first column must be zero, since if {\Lambda_{00}=+1}, we have from 25 that

\displaystyle  \sum_{i=1}^{3}\Lambda_{i0}^{2}=0 \ \ \ \ \ (27)

Since {\Lambda^{T}} must also be a valid transformation, this gives the analogous equation

\displaystyle  \sum_{i=1}^{3}\Lambda_{0i}^{2}=0 \ \ \ \ \ (28)

Thus the off-diagonal elements of the top row of {\Lambda} are also zero.

Since {\det\Lambda=1}, we must have {\det\mathcal{R}=1}. From 9, {\mathcal{R}} must also be an orthogonal matrix, that is, its rows must be mutually orthogonal (as must its columns). For example, if we pick the 2,3 element in the product 9, we have

\displaystyle   \left[\Lambda^{T}g\Lambda\right]_{23} \displaystyle  = \displaystyle  g_{23}=0\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  -\sum_{i=1}^{3}\Lambda_{i2}\Lambda_{i3} \ \ \ \ \ (30)

Thus columns 2 and 3 must be orthogonal.

These matrices form a group known as {SO\left(3\right)}, the group of real, orthogonal, {3\times3} matrices with {\det\mathcal{R}=+1}. A familiar example is a rotation by an angle {\theta} about the {z} axis, for which

\displaystyle  \mathcal{R}=\left[\begin{array}{ccc} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (31)

giving the full transformation matrix as

\displaystyle  \Lambda=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (32)

In general, a rotation can be about any axis through the origin, in which case {\mathcal{R}} gets more complicated, but the idea is the same.

We’ve already seen that a pure boost, that is, a transformation into a second inertial frame moving at some constant velocity in a given direction relative to the first frame, can be written as a rotation, if we use hyperbolic functions instead of trig functions. In this case {\Lambda_{00}>+1}. The standard situation from introductory special relativity is that of a frame {S^{\prime}} moving along the {x_{1}} axis at some constant speed {\beta}. If we define

\displaystyle   \cosh\chi \displaystyle  \equiv \displaystyle  \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (33)
\displaystyle  \sinh\chi \displaystyle  \equiv \displaystyle  \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (34)

then the transformation is

\displaystyle  \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (35)

This has determinant +1 since {\cosh^{2}\chi-\sinh^{2}\chi=1}. We can verify by direct substitution that 9 is satisfied.
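The direct substitution can be delegated to a few lines of code. In this sketch the function names and the value {\beta=0.6} are assumptions chosen for the example.

```python
import math

# Minkowski metric of eq. 2.
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Matrix transpose."""
    return [list(col) for col in zip(*a)]

def boost_x1(beta):
    """Boost along the x1 axis, eq. 35, with cosh(chi) = gamma, sinh(chi) = beta*gamma."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    ch, sh = gamma, beta * gamma
    return [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def preserves_metric(lam, tol=1e-12):
    """True if Lambda^T g Lambda = g (eq. 9) holds to within tol."""
    lhs = matmul(matmul(transpose(lam), G), lam)
    return all(abs(lhs[i][j] - G[i][j]) < tol
               for i in range(4) for j in range(4))

B = boost_x1(0.6)        # gamma = 1.25 for beta = 0.6
```

A matrix that merely stretches the time axis, such as one with 2 in the 00 slot and 1s elsewhere on the diagonal, fails the same test, which shows that 9 is a real restriction.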

It turns out that all proper, orthochronous Lorentz transformations can be written as the product of a pure rotation and a pure boost, that is

\displaystyle  \Lambda=BR \ \ \ \ \ (36)

where the pure rotation {R} is applied first, followed by a pure boost {B}. (Jaffe doesn’t prove this at this point; we’ll return to this later.)
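
The properties quoted for 32, 35 and 36 can be checked numerically. The following is a minimal sketch in plain Python, assuming that the defining condition for a Lorentz transformation is the usual {\Lambda^{T}g\Lambda=g} with {g=\mbox{diag}\left(1,-1,-1,-1\right)} (the overall sign of {g} doesn’t affect the test). The helper names and the sample values {\theta=0.3} and {\beta=0.6} are arbitrary choices for illustration.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(a):
    """Determinant by Laplace expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def rotation_z(theta):
    """Equation 32: rotation by theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def boost_x(beta):
    """Equation 35, with cosh(chi) = gamma and sinh(chi) = beta * gamma."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    ch, sh = gamma, beta * gamma
    return [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Assumed metric; flipping its overall sign gives the same test.
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def is_lorentz(lam, tol=1e-12):
    """Check Lambda^T g Lambda = g element by element."""
    lhs = matmul(matmul(transpose(lam), G), lam)
    return all(abs(lhs[i][j] - G[i][j]) < tol for i in range(4) for j in range(4))

R = rotation_z(0.3)
B = boost_x(0.6)
BR = matmul(B, R)  # rotation applied first, then the boost

for name, lam in (("R", R), ("B", B), ("BR", BR)):
    print(name, is_lorentz(lam), round(det(lam), 12), round(lam[0][0], 12))
```

For these sample values each matrix passes the test with determinant +1, and the product has {\Lambda_{00}=\gamma=1.25}, since the rotation leaves the time row and column untouched.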

Lorentz invariance in Klein-Gordon momentum states

Reference: Michael E. Peskin & Daniel V. Schroeder, An Introduction to Quantum Field Theory, (Perseus Books, 1995) – Chapter 2.

The vacuum state {\left|0\right\rangle } in Klein-Gordon field theory is a postulated state which gives 0 when operated on by any annihilation operator {a_{\mathbf{p}}}. Applying a creation operator {a_{\mathbf{p}}^{\dagger}} to the vacuum converts it to a state with a single particle of momentum {\mathbf{p}}, that is, the state {\left|\mathbf{p}\right\rangle }. The vacuum state is normalized so that {\left\langle 0\left|0\right.\right\rangle =1}. If we require all single-particle momentum states to be orthogonal for different momenta then, since {\mathbf{p}} is a continuous variable, we might expect that a suitable normalization would be

\displaystyle  \left\langle \mathbf{p}\left|\mathbf{q}\right.\right\rangle =\left(2\pi\right)^{3}\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right) \ \ \ \ \ (1)

[Again, the factors of {2\pi} depend on how other quantities in the theory are defined, since sometimes these factors turn up elsewhere.] The problem with this normalization is that it’s not Lorentz invariant. That is, if we view the system from a frame moving with velocity {\beta} in the {x_{3}} direction, say, the delta function doesn’t remain invariant. Since 4-momentum is a 4-vector, it transforms under Lorentz transformations, so that

\displaystyle   E^{\prime} \displaystyle  = \displaystyle  \gamma\left(E+\beta p_{3}\right)\ \ \ \ \ (2)
\displaystyle  p_{3}^{\prime} \displaystyle  = \displaystyle  \gamma\left(p_{3}+\beta E\right) \ \ \ \ \ (3)

where {\gamma=1/\sqrt{1-\beta^{2}}} and {E=p_{0}} as usual.

How does the delta function transform? We can use the formula

\displaystyle  \delta\left(f\left(x\right)-f\left(x_{0}\right)\right)=\frac{1}{\left|f^{\prime}\left(x_{0}\right)\right|}\delta\left(x-x_{0}\right) \ \ \ \ \ (4)

We want to find {\delta^{\left(3\right)}\left(\mathbf{p}^{\prime}-\mathbf{q}^{\prime}\right)} when we change from {p_{3}} to {p_{3}^{\prime}} so we need

\displaystyle   \frac{dp_{3}^{\prime}}{dp_{3}} \displaystyle  = \displaystyle  \gamma\left(1+\beta\frac{dE}{dp_{3}}\right)\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \gamma\left(1+\beta\frac{d}{dp_{3}}\sqrt{\mathbf{p}\cdot\mathbf{p}+m^{2}}\right)\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \gamma\left(1+\frac{\beta}{E}p_{3}\right)\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \frac{\gamma}{E}\left(E+\beta p_{3}\right)\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \frac{E^{\prime}}{E} \ \ \ \ \ (9)

So the delta function transforms as

\displaystyle  \delta^{\left(3\right)}\left(\mathbf{p}^{\prime}-\mathbf{q}^{\prime}\right)=\frac{E}{E^{\prime}}\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right) \ \ \ \ \ (10)

and is not invariant. However, multiplying through by {E^{\prime}} shows that

\displaystyle  E^{\prime}\delta^{\left(3\right)}\left(\mathbf{p}^{\prime}-\mathbf{q}^{\prime}\right)=E\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right) \ \ \ \ \ (11)

so the quantity {E\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right)} is Lorentz invariant. As a result, the momentum state is usually normalized so that

\displaystyle   \left|\mathbf{p}\right\rangle \displaystyle  = \displaystyle  \sqrt{2E_{\mathbf{p}}}a_{\mathbf{p}}^{\dagger}\left|0\right\rangle \ \ \ \ \ (12)
\displaystyle  \left\langle \mathbf{p}\left|\mathbf{q}\right.\right\rangle \displaystyle  = \displaystyle  \left(2\pi\right)^{3}2E_{\mathbf{p}}\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right) \ \ \ \ \ (13)

[The extra factor of 2 is inserted to make other calculations easier. It doesn’t affect the Lorentz invariance.]
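
The key step in this argument is the Jacobian {dp_{3}^{\prime}/dp_{3}=E^{\prime}/E}. As a sanity check, here is a small sketch that estimates the derivative by a central difference and compares it with {E^{\prime}/E}. The momentum components, mass and {\beta} are arbitrary sample values, and the function names are just for illustration.

```python
import math

def energy(p1, p2, p3, m):
    """On-shell energy E = sqrt(p.p + m^2)."""
    return math.sqrt(p1 ** 2 + p2 ** 2 + p3 ** 2 + m ** 2)

def boost_p3(p1, p2, p3, m, beta):
    """Return (E', p3') for a boost along x_3, following equations 2 and 3."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    e = energy(p1, p2, p3, m)
    return gamma * (e + beta * p3), gamma * (p3 + beta * e)

def jacobian(p1, p2, p3, m, beta, h=1e-6):
    """Central-difference estimate of dp3'/dp3 at fixed p1 and p2."""
    _, up = boost_p3(p1, p2, p3 + h, m, beta)
    _, down = boost_p3(p1, p2, p3 - h, m, beta)
    return (up - down) / (2 * h)

# Arbitrary sample point.
p1, p2, p3, m, beta = 0.3, -0.4, 1.2, 1.0, 0.6
e = energy(p1, p2, p3, m)
e_prime, _ = boost_p3(p1, p2, p3, m, beta)

numeric = jacobian(p1, p2, p3, m, beta)
print(numeric, e_prime / e)
```

The two printed numbers should agree to within the accuracy of the finite difference, which is what makes {E\delta^{\left(3\right)}\left(\mathbf{p}-\mathbf{q}\right)} invariant.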

Since we’re defining states to preserve Lorentz invariance, if we apply a Lorentz transformation {\Lambda} to a state to get a new state {\left|\Lambda\mathbf{p}\right\rangle } we require

\displaystyle  \left\langle \Lambda\mathbf{p}\left|\Lambda\mathbf{p}\right.\right\rangle =\left\langle \mathbf{p}\left|\mathbf{p}\right.\right\rangle \ \ \ \ \ (14)

This means that a Lorentz transformation is represented on the states by a unitary operator {U\left(\Lambda\right)}, since it leaves the bracket unchanged. We can write this as

\displaystyle  U\left(\Lambda\right)\left|\mathbf{p}\right\rangle =\left|\Lambda\mathbf{p}\right\rangle \ \ \ \ \ (15)

Because of 12 and the fact that operators transform under a unitary transformation according to {Q^{\prime}=UQU^{-1}}, we get the transformation rule

\displaystyle   \sqrt{2E_{\Lambda\mathbf{p}}}a_{\Lambda\mathbf{p}}^{\dagger} \displaystyle  = \displaystyle  U\left(\Lambda\right)\sqrt{2E_{\mathbf{p}}}a_{\mathbf{p}}^{\dagger}U^{-1}\left(\Lambda\right)\ \ \ \ \ (16)
\displaystyle  U\left(\Lambda\right)a_{\mathbf{p}}^{\dagger}U^{-1}\left(\Lambda\right) \displaystyle  = \displaystyle  \sqrt{\frac{E_{\Lambda\mathbf{p}}}{E_{\mathbf{p}}}}a_{\Lambda\mathbf{p}}^{\dagger} \ \ \ \ \ (17)

Another useful relation can be derived concerning the integration of Lorentz invariant functions. First, consider the four dimensional ‘volume’ element {d^{4}p=d^{3}pdp_{0}=d^{3}pdE}. The four-momentum transforms under Lorentz transformations in the same way as the four-vector representing spacetime, with energy playing the role of time and the three components of {\mathbf{p}} playing the role of the components of {\mathbf{x}}. Thus an increment of energy {dE} will be dilated by the factor {\gamma} in the same way that time intervals are dilated, and the momentum ‘volume’ element {d^{3}p} will be contracted by the factor {1/\gamma} in the same way that the spatial volume element {d^{3}x} is contracted in the direction of relative motion. Thus in the 4-volume element {d^{4}p} these two factors cancel out, meaning that {d^{4}p} is Lorentz invariant.

When we integrate over the four components of four-momentum, we must constrain the integral so that it satisfies the relativistic energy-momentum formula

\displaystyle  E^{2}=\mathbf{p}^{2}+m^{2} \ \ \ \ \ (18)

We can do this by means of a delta-function {\delta\left(p^{2}-m^{2}\right)} where {p^{2}=p^{\mu}p_{\mu}=E^{2}-\mathbf{p}^{2}} is the square of a 4-vector and thus is also Lorentz invariant. Now suppose we have some function {f\left(p\right)} (where {p} is the four-momentum) that is also Lorentz invariant. In that case, the integral

\displaystyle  \left.\int\frac{d^{4}p}{\left(2\pi\right)^{4}}\left(2\pi\right)f\left(p\right)\delta\left(p^{2}-m^{2}\right)\right|_{p^{0}>0} \ \ \ \ \ (19)

is Lorentz invariant because all of the factors in the integrand are invariant. (The factors of {2\pi} are there for consistency with the rest of the theory.) The subscript {p^{0}>0} reminds us that relativistic energy is always positive, so the integral over {p^{0}} is taken only over this interval. Another way of writing this is to use the Heaviside step function {\theta\left(p^{0}\right)} which is 1 for {p^{0}>0} and 0 for {p^{0}<0}:

\displaystyle  \int\frac{d^{4}p}{\left(2\pi\right)^{4}}\left(2\pi\right)f\left(p\right)\delta\left(p^{2}-m^{2}\right)\theta\left(p^{0}\right) \ \ \ \ \ (20)

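We can transform the delta function by using 4, where this time the delta function is taken to be a function of {p^{0}}. The argument vanishes at {p^{0}=\pm E_{\mathbf{p}}}, but only the positive root lies in the region {p^{0}>0}, so within the integral

\displaystyle  \delta\left(\left(p^{0}\right)^{2}-\mathbf{p}^{2}-m^{2}\right)=\frac{1}{2\sqrt{\mathbf{p}^{2}+m^{2}}}\delta\left(p^{0}-\sqrt{\mathbf{p}^{2}+m^{2}}\right)=\frac{1}{2E_{\mathbf{p}}}\delta\left(p^{0}-E_{\mathbf{p}}\right) \ \ \ \ \ (21)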

The integral over {p^{0}} can now be done with the result

\displaystyle  \left.\int\frac{d^{4}p}{\left(2\pi\right)^{4}}\left(2\pi\right)f\left(p\right)\delta\left(p^{2}-m^{2}\right)\right|_{p^{0}>0}=\int\frac{d^{3}p}{\left(2\pi\right)^{3}}\frac{f\left(E_{\mathbf{p}},\mathbf{p}\right)}{2E_{\mathbf{p}}} \ \ \ \ \ (22)

where {f} on the RHS is now a function of the four-momentum with energy and 3-momentum properly related to each other, rather than the general {p} that appeared in the integral on the LHS. In particular, if {f=1}, the integral

\displaystyle  \int\frac{d^{3}p}{\left(2\pi\right)^{3}}\frac{1}{2E_{\mathbf{p}}} \ \ \ \ \ (23)

is a Lorentz invariant ‘measure’ integral.
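
The factor {1/2E_{\mathbf{p}}} that appears when the {p^{0}} integral is done can be checked numerically. The sketch below replaces the delta function by a narrow Gaussian (a stand-in of width `eps`, which is an assumption of this illustration, not part of the theory) and integrates an arbitrary smooth test function {g\left(p^{0}\right)} over {p^{0}>0} with a simple midpoint rule.

```python
import math

def nascent_delta(u, eps):
    """Narrow normalized Gaussian standing in for delta(u)."""
    return math.exp(-u * u / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def integrate_p0(g, e_p, eps=0.01, upper=6.0, steps=120000):
    """Midpoint estimate of the integral over 0 < p0 < upper of
    g(p0) * delta(p0^2 - E_p^2)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        p0 = (i + 0.5) * h
        total += g(p0) * nascent_delta(p0 * p0 - e_p * e_p, eps)
    return total * h

def g(p0):
    """Arbitrary smooth test function."""
    return math.exp(-p0)

e_p = 2.0
numeric = integrate_p0(g, e_p)
exact = g(e_p) / (2 * e_p)   # value predicted by g(E_p) / (2 E_p)
print(numeric, exact)
```

The two numbers agree up to a small correction that shrinks as `eps` is reduced, provided the integration step stays much smaller than the width of the Gaussian peak in {p^{0}}.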

One example of this is {f=\left|\mathbf{p}\right\rangle \left\langle \mathbf{p}\right|}, for which the Lorentz invariant integral is

\displaystyle  \mathbf{1}=\int\frac{d^{3}p}{\left(2\pi\right)^{3}}\frac{\left|\mathbf{p}\right\rangle \left\langle \mathbf{p}\right|}{2E_{\mathbf{p}}} \ \ \ \ \ (24)

This is the Lorentz invariant form of the expansion of the unit operator in terms of momentum states. It can be verified that it gives the correct answer by inserting it into 13 above:

\displaystyle   \left\langle \mathbf{p}\left|\mathbf{q}\right.\right\rangle \displaystyle  = \displaystyle  \left\langle \mathbf{p}\right|\int\frac{d^{3}p^{\prime}}{\left(2\pi\right)^{3}}\frac{\left|\mathbf{p}^{\prime}\right\rangle \left\langle \mathbf{p}^{\prime}\right|}{2E_{\mathbf{p}^{\prime}}}\left|\mathbf{q}\right\rangle \ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \mathbf{p}\right|\int\frac{d^{3}p^{\prime}}{\left(2\pi\right)^{3}}\frac{\left|\mathbf{p}^{\prime}\right\rangle \left\langle \mathbf{p}^{\prime}\left|\mathbf{q}\right.\right\rangle }{2E_{\mathbf{p}^{\prime}}}\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \mathbf{p}\right|\int\frac{d^{3}p^{\prime}}{\left(2\pi\right)^{3}}\frac{\left|\mathbf{p}^{\prime}\right\rangle 2E_{\mathbf{p}^{\prime}}\left(2\pi\right)^{3}\delta^{\left(3\right)}\left(\mathbf{p}^{\prime}-\mathbf{q}\right)}{2E_{\mathbf{p}^{\prime}}}\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \mathbf{p}\left|\mathbf{q}\right.\right\rangle \ \ \ \ \ (28)

Lorentz transformation for infinitesimal relative velocity

References: Amitabha Lahiri & P. B. Pal, A First Book of Quantum Field Theory, Second Edition (Alpha Science International, 2004) – Chapter 1, Problems 1.5 – 1.6.

In special relativity, Lahiri & Pal use the opposite metric to the one we’ve been using so far, in that {g_{\mu\nu}=\mbox{diag}\left(+1,-1,-1,-1\right)}, that is, the time component is positive and the spatial components are negative. With this definition, lowering or raising the 0 index of a tensor has no effect on the sign, while lowering or raising index 1, 2 or 3 changes the sign.

With the usual spacetime four-vector

\displaystyle  x^{\mu}\equiv\left(x^{0},x^{i}\right)=\left(ct,\mathbf{x}\right) \ \ \ \ \ (1)

the lowered version is

\displaystyle  x_{\mu}=g_{\mu\nu}x^{\nu}=\left(ct,-\mathbf{x}\right) \ \ \ \ \ (2)

Under a Lorentz transformation, the {x^{\mu}} transform as

\displaystyle  x^{\prime\mu}=\Lambda_{\;\nu}^{\mu}x^{\nu} \ \ \ \ \ (3)

The transformation for {x_{\mu}} is therefore

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  g_{\mu\nu}x^{\prime\nu}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  g_{\mu\nu}\Lambda_{\;\sigma}^{\nu}x^{\sigma}\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{\mu\sigma}g^{\sigma\rho}x_{\rho}\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{\mu}^{\;\rho}x_{\rho} \ \ \ \ \ (7)

The matrix {\Lambda_{\mu}^{\;\rho}} is the original matrix {\Lambda_{\;\rho}^{\mu}} with the first index lowered and second raised. If {\mu=\rho=0} or if both {\mu} and {\rho} are spatial indices, the matrix element remains unchanged: {\Lambda_{\mu}^{\;\rho}=\Lambda_{\;\rho}^{\mu}}. If, however, exactly one index is zero (with the other index being spatial), the element changes sign: {\Lambda_{\mu}^{\;\rho}=-\Lambda_{\;\rho}^{\mu}}.
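
The sign pattern just described is easy to confirm with a short sketch. It takes a sample boost along {x} (with an arbitrary {\beta=0.6}), forms {g\Lambda g} as the matrix version of lowering the first index and raising the second, and compares each element with the original. The function names are illustrative only.

```python
import math

# Metric used by Lahiri & Pal; its inverse has the same components.
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def lower_first_raise_second(lam):
    """Lambda_mu^rho = g_{mu nu} Lambda^nu_sigma g^{sigma rho}."""
    return matmul(matmul(G, lam), G)

def expected_sign(mu, rho):
    """+1 if both indices are 0 or both are spatial, otherwise -1."""
    return 1 if (mu == 0) == (rho == 0) else -1

beta = 0.6
gamma = 1 / math.sqrt(1 - beta ** 2)
lam = [[gamma, -gamma * beta, 0, 0],
       [-gamma * beta, gamma, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

mixed = lower_first_raise_second(lam)
for mu in range(4):
    print([round(value, 6) for value in mixed[mu]])
```

Only the two elements that mix the time index with the {x} index change sign here, as expected.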

Infinitesimal relative velocity

In the standard case where the primed frame is moving relative to the unprimed frame at speed {v} along the {x} axis, the Lorentz transformations are

\displaystyle   t^{\prime} \displaystyle  = \displaystyle  \gamma\left(t-\frac{vx}{c^{2}}\right)\ \ \ \ \ (8)
\displaystyle  x^{\prime} \displaystyle  = \displaystyle  \gamma\left(x-vt\right)\ \ \ \ \ (9)
\displaystyle  y^{\prime} \displaystyle  = \displaystyle  y\ \ \ \ \ (10)
\displaystyle  z^{\prime} \displaystyle  = \displaystyle  z \ \ \ \ \ (11)

with

\displaystyle  \gamma\equiv\frac{1}{\sqrt{1-v^{2}/c^{2}}} \ \ \ \ \ (12)

If {\frac{v}{c}} is very small we can expand these equations to first order in {\beta\equiv\frac{v}{c}}. To this order

\displaystyle   \gamma \displaystyle  = \displaystyle  1+\frac{\beta^{2}}{2}+\ldots\ \ \ \ \ (13)
\displaystyle  \displaystyle  \approx \displaystyle  1 \ \ \ \ \ (14)

and

\displaystyle   ct^{\prime} \displaystyle  = \displaystyle  ct-x\beta\ \ \ \ \ (15)
\displaystyle  x^{\prime} \displaystyle  = \displaystyle  x-ct\beta \ \ \ \ \ (16)

so

\displaystyle  \Lambda_{\;\nu}^{\mu}=\left[\begin{array}{cccc} 1 & -\beta & 0 & 0\\ -\beta & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (17)

Lowering the first index we get

\displaystyle   \Lambda_{\mu\nu} \displaystyle  = \displaystyle  g_{\mu\rho}\Lambda_{\;\nu}^{\rho}\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]\left[\begin{array}{cccc} 1 & -\beta & 0 & 0\\ -\beta & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} 1 & -\beta & 0 & 0\\ \beta & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (20)

We can write this as the sum of {g_{\mu\nu}} and an antisymmetric matrix {\omega_{\mu\nu}=-\omega_{\nu\mu}}:

\displaystyle   \Lambda_{\mu\nu} \displaystyle  = \displaystyle  g_{\mu\nu}+\omega_{\mu\nu}\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right]+\left[\begin{array}{cccc} 0 & -\beta & 0 & 0\\ \beta & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (22)
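
To see how good the first-order result is, we can start from the exact boost, lower the first index and subtract {g_{\mu\nu}}. The sketch below does this for an arbitrary small value {\beta=10^{-3}}; the symmetric part of the resulting {\omega_{\mu\nu}} should be of order {\beta^{2}}, while the off-diagonal elements should match {\mp\beta} up to terms of order {\beta^{3}}. The function names are illustrative.

```python
import math

G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def exact_boost(beta):
    """Exact Lambda^mu_nu for a boost along x (equations 8 to 11 with c = 1)."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    return [[gamma, -gamma * beta, 0, 0],
            [-gamma * beta, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def omega(beta):
    """omega_{mu nu} = g_{mu rho} Lambda^rho_nu - g_{mu nu} for the exact boost."""
    lam = exact_boost(beta)
    # G is diagonal, so lowering the first index rescales row mu by G[mu][mu].
    return [[G[m][m] * lam[m][n] - G[m][n] for n in range(4)] for m in range(4)]

beta = 1e-3
w = omega(beta)
symmetric_part = max(abs(w[m][n] + w[n][m]) for m in range(4) for n in range(4))
print(w[0][1], w[1][0], symmetric_part)
```

The antisymmetry of {\omega_{\mu\nu}} is therefore a statement about the first-order terms only; the exact matrix picks up symmetric corrections at second order in {\beta}.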

Another trip to Alpha Centauri: more Lorentz transformation examples

Reference: Carroll, Bradley W. & Ostlie, Dale A. (2007), An Introduction to Modern Astrophysics, 2nd Edition; Pearson Education – Chapter 4, Problems 4.6-4.7.

We’re now on another trip to {\alpha} Centauri to get a few more examples of length contraction and time dilation. For simplicity, we’ll take the distance to {\alpha} Centauri as exactly 4 ly (the actual distance is 4.367 ly). [This analysis is similar to that in which we considered the twin paradox.]

The ship leaves Earth at time {t=t'=0}, with this event also being the time when the origins of {S} and {S'} coincide. The ship’s speed in Earth’s frame {S} is {\beta=0.8} so the time taken to reach {\alpha} Centauri in frame {S} is

\displaystyle  t=\frac{4}{0.8}=5\mbox{ years} \ \ \ \ \ (1)

The elapsed time as measured on the ship, in frame {S'} is

\displaystyle  t'=\frac{t}{\gamma}=t\sqrt{1-\beta^{2}}=5\times\frac{3}{5}=3\mbox{ years} \ \ \ \ \ (2)

In this case, only a single clock is needed to measure the time.

The distance to {\alpha} Centauri as measured in {S'} is

\displaystyle  d'=\frac{4}{\gamma}=2.4\mbox{ ly} \ \ \ \ \ (3)

A radio signal is sent from Earth to the ship every 6 months, as measured by a clock on Earth. In frame {S}, when the {n^{th}} signal is sent, the ship is at a distance

\displaystyle  d_{n}=0.8\frac{n}{2}=0.4n\mbox{ ly} \ \ \ \ \ (4)

To find when (as measured by {S}) the ship receives the signal, let’s say that the ship receives the {n^{th}} signal when it is at a distance {D_{n}} from Earth. Since the signal travels at {c=1} (1 ly per year), the time it takes to reach the ship is just {D_{n}} years, during which time the ship has moved from {d_{n}} to {D_{n}} at speed 0.8, so

\displaystyle   D_{n} \displaystyle  = \displaystyle  \frac{D_{n}-d_{n}}{0.8}\ \ \ \ \ (5)
\displaystyle  D_{n} \displaystyle  = \displaystyle  5d_{n}=2n\mbox{ ly} \ \ \ \ \ (6)

Each successive signal takes 2 years longer to reach the ship than the previous one, so the time interval between successive receptions by the ship is 2 years plus the 6 months between successive transmissions, or 2.5 years in frame {S}. The ship therefore receives 2 transmissions during its outbound journey, with the second arriving just as the ship arrives at {\alpha} Centauri. The time interval between receptions as measured on the ship is {2.5/\gamma=1.5\mbox{ years}}.
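
The numbers for the outbound Earth signals can be reproduced with exact fractions. This is a minimal sketch using {c=1}, years and light years; the function name and the tuple it returns are just conveniences for the illustration.

```python
from fractions import Fraction

BETA = Fraction(4, 5)        # ship speed in units of c
INV_GAMMA = Fraction(3, 5)   # sqrt(1 - BETA**2), exact for BETA = 4/5

def reception_of_earth_signal(n):
    """Return (D_n, reception time in S, reception time on the ship clock)
    for the n-th Earth signal on the outbound leg."""
    t_sent = Fraction(n, 2)          # one signal every 6 months
    d_n = BETA * t_sent              # ship position when the signal leaves
    big_d = d_n / (1 - BETA)         # solves D_n = (D_n - d_n) / BETA
    t_received = t_sent + big_d      # light covers D_n in D_n years
    ship_time = t_received * INV_GAMMA
    return big_d, t_received, ship_time

for n in (1, 2):
    print(n, reception_of_earth_signal(n))
```

For {n=2} the signal is received at {D_{2}=4} ly and {t=5} years, which is the arrival at {\alpha} Centauri, at a ship time of 3 years.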

The ship also sends a radio signal back to Earth every 6 months as measured by the ship. When are these messages received on Earth? To solve this, we can find how far from Earth the ship is when it sends each message. In frame {S'}, the {n^{th}} message is sent at coordinates {\left(t_{n}',x_{n}'\right)=\left(\frac{n}{2},0\right)} so applying an inverse Lorentz transformation to find the ship’s coordinates in the {S} frame:

\displaystyle   t_{n} \displaystyle  = \displaystyle  \gamma\left(t_{n}'+\beta x_{n}'\right)\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\frac{n}{2}=\frac{5}{6}n\ \ \ \ \ (8)
\displaystyle  x_{n} \displaystyle  = \displaystyle  \gamma\left(x_{n}'+\beta t_{n}'\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\times\frac{4}{5}\times\frac{n}{2}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \frac{2}{3}n \ \ \ \ \ (11)

The time of arrival is the time the pulse was sent ({t_{n}}) plus the travel time, which is just {x_{n}}:

\displaystyle  t_{arr}=t_{n}+x_{n}=\frac{3}{2}n\mbox{ years} \ \ \ \ \ (12)

Thus the radio pulses are received on Earth every 1.5 years (Earth time), which is the same interval as the pulses are received on the ship. This is to be expected, since the two viewpoints are symmetric. [Note that we could have also used this method to calculate the times when the Earth signals are received by the ship.]
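
The same calculation for the ship’s signals can be sketched in a few lines, applying the inverse Lorentz transformation of equations 7 and 9 to the emission event and then adding the light travel time. The function name is an illustrative choice.

```python
from fractions import Fraction

BETA = Fraction(4, 5)
GAMMA = Fraction(5, 3)   # 1 / sqrt(1 - BETA**2), exact for BETA = 4/5

def earth_arrival_of_ship_signal(n):
    """Return (t_n, x_n, arrival time on Earth) for the n-th ship signal,
    sent at ship coordinates (t', x') = (n/2, 0) on the outbound leg."""
    t_ship, x_ship = Fraction(n, 2), Fraction(0)
    t = GAMMA * (t_ship + BETA * x_ship)   # equation 7
    x = GAMMA * (x_ship + BETA * t_ship)   # equation 9
    return t, x, t + x                     # light takes x years to cover x ly

for n in range(1, 7):
    print(n, earth_arrival_of_ship_signal(n))
```

The arrival times come out as multiples of {\frac{3}{2}} years, with the sixth signal (sent as the ship reaches the star) arriving at {t=9} years.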

Because of the relative motion of source and observer, the radio signals are Doppler shifted. If the source wavelength is {\lambda=15\mbox{ cm}}, the wavelength received is

\displaystyle  \lambda_{r}=\sqrt{\frac{1+\beta}{1-\beta}}\lambda=3\lambda=45\mbox{ cm} \ \ \ \ \ (13)
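
The Doppler factor of 3 is the same factor that stretches the 6 month emission interval into the 1.5 year reception interval found above. A quick sketch, using the standard result that replacing {\beta} by {-\beta} gives the factor for an approaching source (the function name is illustrative):

```python
import math

def doppler_factor(beta):
    """Ratio of received to emitted wavelength (or period) for a source
    receding at speed beta; a negative beta describes an approaching source."""
    return math.sqrt((1 + beta) / (1 - beta))

beta = 0.8
receding = doppler_factor(beta)       # outbound leg
approaching = doppler_factor(-beta)   # source and observer closing at 0.8

print(receding * 15.0)     # received wavelength in cm for a 15 cm source
print(receding * 0.5)      # reception interval in years for 6 month pulses
print(approaching * 0.5)   # reception interval in years when approaching
```

When the source and observer approach each other at {\beta=0.8}, the factor is {\frac{1}{3}}, so pulses emitted every 6 months arrive every {\frac{1}{6}} year.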

The ship immediately reverses direction and heads back to Earth at {\beta=0.8} once it has reached {\alpha} Centauri, but both the ship and Earth continue to send radio signals at 6 month intervals, as measured by their respective clocks. We can analyze the situation using the second method above. From the ship’s frame, it is the Earth that appears to suddenly reverse direction. On the outward journey, the Earth sends signals at {t_{n}'=5n/6} when it is at a distance {x_{n}'=-2n/3} (the Earth is to the left of the ship, so {x_{n}'} is negative). The ship receives these signals every 1.5 years so it receives one when it is halfway to the star and another just as it reaches the star.

At the time of the turn-around, the ship’s frame changes from {S'} to a new frame {S^{\prime\prime}} with a velocity {-\beta}. If we choose the origin of {S^{\prime\prime}} to coincide with the origins of {S} and {S'} then the time and position in {S^{\prime\prime}} when the ship reverses are

\displaystyle   t^{\prime\prime} \displaystyle  = \displaystyle  \gamma\left(t-\left(-\beta\right)x\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\left(5+0.8\times4\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{41}{3}\mbox{ years}\ \ \ \ \ (16)
\displaystyle  x^{\prime\prime} \displaystyle  = \displaystyle  \gamma\left(x+\beta t\right)\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\left(4+0.8\times5\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{40}{3}\mbox{ ly} \ \ \ \ \ (19)

The time on Earth, as viewed from the ship, jumps as the ship reverses. Before the reverse, {t'=3} so, since on Earth {x=0}:

\displaystyle   t' \displaystyle  = \displaystyle  \gamma\left(t-\beta x\right)\ \ \ \ \ (20)
\displaystyle  3 \displaystyle  = \displaystyle  \frac{5}{3}\left(t-0\right)\ \ \ \ \ (21)
\displaystyle  t \displaystyle  = \displaystyle  \frac{9}{5} \ \ \ \ \ (22)

After the reverse

\displaystyle   t^{\prime\prime} \displaystyle  = \displaystyle  \gamma\left(t+\beta x\right)\ \ \ \ \ (23)
\displaystyle  \frac{41}{3} \displaystyle  = \displaystyle  \frac{5}{3}t\ \ \ \ \ (24)
\displaystyle  t \displaystyle  = \displaystyle  \frac{41}{5} \ \ \ \ \ (25)

When the ship arrives back on Earth, {t=10} so {t^{\prime\prime}} is

\displaystyle   t^{\prime\prime} \displaystyle  = \displaystyle  \gamma\left(t+\beta x\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\times10=\frac{50}{3} \ \ \ \ \ (27)

Thus in the ship’s frame, the return journey again takes {\frac{50}{3}-\frac{41}{3}=3} years.
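
The turnaround arithmetic is easy to get wrong by a sign, so here is a small exact check. It is a sketch using {c=1} and the two transformations from {S} to {S'} and from {S} to {S^{\prime\prime}}; the function and variable names are illustrative.

```python
from fractions import Fraction

BETA = Fraction(4, 5)
GAMMA = Fraction(5, 3)   # exact for BETA = 4/5

def to_outbound(t, x):
    """Transform an event from S to S' (ship moving at +BETA)."""
    return GAMMA * (t - BETA * x), GAMMA * (x - BETA * t)

def to_inbound(t, x):
    """Transform an event from S to S'' (ship moving at -BETA)."""
    return GAMMA * (t + BETA * x), GAMMA * (x + BETA * t)

arrival_outbound = to_outbound(5, 4)    # reaching the star, seen in S'
turn_t, turn_x = to_inbound(5, 4)       # the same event in S''
earth_before = Fraction(3) / GAMMA      # Earth clock (x = 0) simultaneous with t' = 3
earth_after = turn_t / GAMMA            # Earth clock (x = 0) simultaneous with t'' = 41/3
home_t, home_x = to_inbound(10, 0)      # arrival back on Earth in S''

print(arrival_outbound, (turn_t, turn_x))
print(earth_before, earth_after, home_t - turn_t)
```

Note that the ship’s position in {S^{\prime\prime}} is {\frac{40}{3}} both at the turnaround and on arrival home, as it must be for an object at rest in that frame.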

To analyze the signals sent between Earth and the ship, we can use the technique above. We’ve already seen that when the ship is outbound, signals are received at intervals of 1.5 years by each end. What happens when the ship is returning?

In the {S^{\prime\prime}} frame, the position and time for the event when Earth sends a signal are:

\displaystyle   x^{\prime\prime} \displaystyle  = \displaystyle  \gamma\left(x+\beta t\right)\ \ \ \ \ (28)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{3}\times\frac{4}{5}\times\frac{n}{2}\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  \frac{2}{3}n\ \ \ \ \ (30)
\displaystyle  t^{\prime\prime} \displaystyle  = \displaystyle  \gamma t\ \ \ \ \ (31)
\displaystyle  \displaystyle  = \displaystyle  \frac{5}{6}n \ \ \ \ \ (32)

These have the same magnitudes as in the {S'} frame (only the sign of the position differs), but the difference here is that in {S^{\prime\prime}} the ship’s position is {\frac{40}{3}} so the distance to Earth is {\frac{40}{3}-\frac{2}{3}n}. Thus the times at which the signals arrive are

\displaystyle  t_{arr}^{\prime\prime}=\frac{5}{6}n+\frac{40}{3}-\frac{2}{3}n=\frac{n}{6}+\frac{40}{3} \ \ \ \ \ (33)

Thus signals now arrive at intervals of {\frac{1}{6}} year (6 per year) instead of every 1.5 years. In the case of the signals sent from Earth, the first two arrive on the outbound leg and account for 3 years of ship time. The remaining 18 signals (there are a total of 20 signals sent over the 10 years of the trip as measured on Earth) arrive at intervals of {\frac{1}{6}} year so take {\frac{18}{6}=3} years, giving a round trip of 6 years as measured on the ship.

For the signals sent from the ship, there are 6 signals sent on the outbound leg and 6 more on the return trip. The first 6 arrive at intervals of 1.5 years, accounting for {6\times1.5=9} years. The last 6 arrive at intervals of {\frac{1}{6}} year, accounting for the final year, giving a total of 10 years as measured in {S}.

To summarize:

For signals sent from Earth:

Time sent (Earth clock, years) Time arrived (ship clock, years)
0.5 {\frac{3}{2}}
1.0 3
1.5 {3\frac{1}{6}}
2.0 {3\frac{2}{6}}
2.5 {3\frac{3}{6}}
3.0 {3\frac{4}{6}}
3.5 {3\frac{5}{6}}
4.0 {4}
4.5 {4\frac{1}{6}}
5.0 {4\frac{2}{6}}
5.5 {4\frac{3}{6}}
6.0 {4\frac{4}{6}}
6.5 {4\frac{5}{6}}
7.0 5
7.5 {5\frac{1}{6}}
8.0 {5\frac{2}{6}}
8.5 {5\frac{3}{6}}
9.0 {5\frac{4}{6}}
9.5 {5\frac{5}{6}}
10.0 6

For signals sent from the ship:

Time sent (ship clock, years) Time arrived (Earth clock, years)
0.5 1.5
1.0 3
1.5 4.5
2.0 6.0
2.5 7.5
3.0 9.0
3.5 {9\frac{1}{6}}
4.0 {9\frac{2}{6}}
4.5 {9\frac{3}{6}}
5.0 {9\frac{4}{6}}
5.5 {9\frac{5}{6}}
6.0 10.0
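
Both tables can be regenerated by working entirely in Earth’s frame {S}: follow the ship’s worldline (out at {\beta=0.8} until {t=5}, then back at the same speed), intersect it with each light signal, and convert Earth time to ship time by dividing by {\gamma}. This is an independent cross-check rather than the method used above, and the function names are illustrative.

```python
from fractions import Fraction

BETA = Fraction(4, 5)
INV_GAMMA = Fraction(3, 5)   # sqrt(1 - BETA**2), exact for BETA = 4/5
T_TURN = Fraction(5)         # Earth time of the turnaround
X_TURN = Fraction(4)         # distance of the star in ly

def ship_position(t):
    """Ship position in S at Earth time t (0 <= t <= 10)."""
    return BETA * t if t <= T_TURN else X_TURN - BETA * (t - T_TURN)

def ship_clock(t):
    """Ship proper time at Earth time t; the speed is the same on both legs."""
    return INV_GAMMA * t

def earth_signal_received(t_sent):
    """Ship-clock time at which a signal sent from Earth at t_sent arrives."""
    t_r = t_sent / (1 - BETA)   # meets the outbound worldline
    if t_r > T_TURN:            # otherwise it meets the return worldline
        t_r = (t_sent + X_TURN + BETA * T_TURN) / (1 + BETA)
    return ship_clock(t_r)

def ship_signal_received(tau_sent):
    """Earth time at which a signal sent at ship-clock time tau_sent arrives."""
    t = tau_sent / INV_GAMMA
    return t + ship_position(t)

print("Earth -> ship")
for n in range(1, 21):
    print(float(Fraction(n, 2)), earth_signal_received(Fraction(n, 2)))

print("ship -> Earth")
for n in range(1, 13):
    print(float(Fraction(n, 2)), ship_signal_received(Fraction(n, 2)))
```

The arrival times are printed as exact fractions, so {3\frac{1}{6}} appears as 19/6 and {9\frac{1}{6}} as 55/6.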

Lorentz transformations and causality

Reference: Carroll, Bradley W. & Ostlie, Dale A. (2007), An Introduction to Modern Astrophysics, 2nd Edition; Pearson Education – Chapter 4, Problem 4.2.

To determine whether two observers can disagree about the temporal order of two events, we can calculate the invariant interval between the events. Using four-vector notation:

\displaystyle  \Delta s^{2}\equiv\left(\Delta x\right)_{i}\left(\Delta x\right)^{i} \ \ \ \ \ (1)

This gives three possible types of pairs of events:

  1. Timelike: If {\Delta s^{2}<0}, then it is possible to find a frame in which the two events occur at the same spatial point, but at different times, since it is the time component {-\left(\Delta x^{0}\right)^{2}} which is negative.
  2. Lightlike: If {\Delta s^{2}=0} then {c^{2}\left(\Delta t\right)^{2}=\Delta x^{2}} (if the motion is along the {x} axis; the argument is similar for arbitrary directions), so the events can be connected by a light signal.
  3. Spacelike: If {\Delta s^{2}>0}, then it is possible to find a frame in which the two events occur at the same time but at different places. Different observers may disagree about which event occurs first.

The preservation of causality can also be seen directly from the Lorentz transformations. Using relativistic units where {c=1}, they are, in 2 dimensions:

\displaystyle   t' \displaystyle  = \displaystyle  \gamma\left(t-\beta x\right)\ \ \ \ \ (2)
\displaystyle  x' \displaystyle  = \displaystyle  \gamma\left(x-\beta t\right) \ \ \ \ \ (3)

Suppose we observe two events, 1 and 2, in frame {S}, and that

\displaystyle  \Delta x\equiv x_{2}-x_{1}=\alpha\left(t_{2}-t_{1}\right)\equiv\alpha\Delta t \ \ \ \ \ (4)

where {\alpha} is a positive constant (that is, we’re assuming the observer {S} sees event 2 to the right of event 1, and time {t_{2}} is after {t_{1}}). If {\alpha=1}, then {\Delta x=\Delta t} and the two events could be connected by a light signal, so event 2 could be caused by event 1. If {\alpha<1}, then {\Delta x<\Delta t} so it is possible to travel from {x_{1}} to {x_{2}} at less than the speed of light, which means that event 2 could also be caused by event 1. If {\alpha>1} then it is impossible to travel from event 1 to event 2 at less than the speed of light, so the two events cannot be causally connected.

From the Lorentz transformations, we get

\displaystyle   \Delta t' \displaystyle  = \displaystyle  \gamma\left(\Delta t-\beta\Delta x\right)\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \gamma\Delta t\left(1-\alpha\beta\right) \ \ \ \ \ (6)

Since {\beta<1}, if {\alpha\le1} then {\alpha\beta<1} and {\Delta t'} has the same sign as {\Delta t}, so both observers will always agree about the order in which the events occur. In other words, if event 1 can cause event 2, then event 1 must always precede event 2 in all reference frames.

However, if {\alpha>1}, then if {\frac{1}{\alpha}<\beta<1}, the sign of {\Delta t'} is opposite to the sign of {\Delta t} so that the two observers will disagree about the order of the events. Since the two events are not causally connected in this case, there is no need for the observers to agree on their order.
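
A short numerical scan illustrates the argument. The sketch below evaluates equation 6 for a range of frame speeds with {\left|\beta\right|<1} and reports whether any of them reverses the sign of the time separation. The grid of {\beta} values and the sample values of {\alpha} are arbitrary choices, and the function names are illustrative.

```python
import math

def delta_t_prime(delta_t, alpha, beta):
    """Equation 6: time separation in S' for events with delta_x = alpha * delta_t."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    return gamma * delta_t * (1 - alpha * beta)

# Frame speeds from -0.99 to 0.99 in steps of 0.01.
betas = [i / 100 for i in range(-99, 100)]

def order_can_flip(alpha, delta_t=1.0):
    """True if some sampled frame sees the two events in the opposite order."""
    return any(delta_t_prime(delta_t, alpha, b) * delta_t < 0 for b in betas)

for alpha in (0.5, 1.0, 2.0):
    print(alpha, order_can_flip(alpha))
```

Only the spacelike case {\alpha=2} admits a frame in which the order is reversed, and only for {\beta>\frac{1}{2}}.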

Lorentz transformations: derivation from symmetry

Reference: Carroll, Bradley W. & Ostlie, Dale A. (2007), An Introduction to Modern Astrophysics, 2nd Edition; Pearson Education – Chapter 4, Problem 4.1.

Although we’ve already looked at special relativity several times in this blog, it’s worth working through Chapter 4 in Carroll & Ostlie since they offer a few different ways of looking at some of our previous results.

We can start with the Lorentz transformations. The derivation we studied most recently is that from Griffiths’s book on electromagnetism, in which he first derives the time dilation and length contraction effects and then uses these to derive the Lorentz transformations. Carroll & Ostlie take a somewhat simpler and more elegant approach, but there are still a few points that could be filled in.

The arguments rely on using various symmetries, and also the postulate of the constancy of the speed of light to finish things off.

First, we can use translational invariance to show that the Lorentz transformations must be linear. We’ve already shown that this is the case using a rather involved argument, but in fact there is a simple criterion that can be applied. We start by using the line-painting thought experiment to show that coordinates perpendicular to the direction of relative motion are unaffected, so if we use our usual two coordinate systems {S} and {S'}, with {S} at rest relative to the observer, {S'} moving with speed {u} in the {+x} direction and the two frames aligned so that all three of their coordinate axis pairs ({x} and {x'}, {y} and {y'} and {z} and {z'}) are parallel with the origins coinciding at {t=t'=0}, then

\displaystyle y' \displaystyle = \displaystyle y\ \ \ \ \ (1)
\displaystyle z' \displaystyle = \displaystyle z \ \ \ \ \ (2)

For the remaining two coordinates, the most general linear transformation is

\displaystyle x' \displaystyle = \displaystyle a_{11}x+a_{12}y+a_{13}z+a_{14}t\ \ \ \ \ (3)
\displaystyle t' \displaystyle = \displaystyle a_{41}x+a_{42}y+a_{43}z+a_{44}t \ \ \ \ \ (4)

Why linear? Well, suppose we consider the length of a rod in the two frames. The rod is at rest in {S} with one endpoint at {x_{1}=0} and the other at {x_{2}=L}. We know that {S} and {S'} disagree about the length of the rod, but one thing we are sure of is that each observer will obtain only one result for the length. In {S} the length is {L} and in {S'} the length is {L'}. But suppose we changed the origin in {S} and {S'} by shifting it along the {x} axis by a distance of 1. Then in {S}, where the rod is at rest, the coordinates of its endpoints are now {x_{1}=-1} and {x_{2}=L-1} so that the length is still given by {x_{2}-x_{1}=L}. Now whatever the Lorentz transformation is, it has to give {L'} for the length as measured by {S'}. In the original frames (before we shifted the origins) the length in {S'} (taking the rod to lie on the {x} axis so that {y=z=0}) is

\displaystyle L' \displaystyle = \displaystyle x_{2}'-x_{1}'\ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle a_{11}\left(x_{2}-x_{1}\right)+a_{14}\left(t_{2}-t_{1}\right)\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle a_{11}\left(x_{2}-x_{1}\right)\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle a_{11}\left(L-0\right)\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle a_{11}L \ \ \ \ \ (9)

where the third line is true because the two events defining the measurement of the length of the rod occur at the same time in {S} so {t_{1}=t_{2}}. Now if we use the shifted origins

\displaystyle L' \displaystyle = \displaystyle a_{11}\left(x_{2}-x_{1}\right)+a_{14}\left(t_{2}-t_{1}\right)\ \ \ \ \ (10)
\displaystyle \displaystyle = \displaystyle a_{11}\left(L-1-\left(-1\right)\right)\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle a_{11}L \ \ \ \ \ (12)

Thus shifting the origin leaves the length {L'} the same. However, if we use a non-linear transformation {f\left(x\right)} instead of the {a_{11}x} term, then with the original origins

\displaystyle L'=f\left(L\right)-f\left(0\right) \ \ \ \ \ (13)

and with the shifted origins

\displaystyle L'=f\left(L-1\right)-f\left(-1\right) \ \ \ \ \ (14)

and in general these two values won’t be equal. Even if we do find some non-linear function that gives the same values for these particular choices for {x_{1}} and {x_{2}}, what we really need is a transformation that gives the same values for {L'} for all lengths, at any location on the {x} axis. The only transformation that does that is linear.
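
As a concrete illustration of 13 and 14, the sketch below compares a linear map with one hypothetical non-linear alternative, {f\left(x\right)=x^{2}}, for a rod of length {L=3}. The value of the coefficient and the choice of non-linear function are arbitrary and are only meant to show the effect of shifting the origin.

```python
def measured_length(f, x1, x2):
    """Length in S' obtained by transforming both endpoints with f."""
    return f(x2) - f(x1)

def linear(x, a11=1.25):
    """Linear transformation of the x coordinate (a11 is an arbitrary sample value)."""
    return a11 * x

def quadratic(x):
    """Hypothetical non-linear alternative to the a11 * x term."""
    return x * x

L = 3.0
for name, f in (("linear", linear), ("quadratic", quadratic)):
    original = measured_length(f, 0.0, L)        # endpoints at 0 and L
    shifted = measured_length(f, -1.0, L - 1.0)  # origin shifted by 1
    print(name, original, shifted)
```

The linear map gives the same {L'} for both choices of origin, while the quadratic map gives 9 in one case and 3 in the other.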

So much for translational symmetry. Next, we can apply rotational symmetry. If we rotate both coordinate systems by {180^{\circ}} about the {x} axis so that {y} goes to {-y} and {z} to {-z}, all we’ve done is change the coordinate system used to describe the problem; we haven’t actually changed any of the events that occur. Thus, the equations 3 and 4 must give the same results with {y\rightarrow-y} and {z\rightarrow-z}. By choosing an event with {y\ne0} and {z=0} we have

\displaystyle a_{11}x+a_{12}y+a_{14}t \displaystyle = \displaystyle a_{11}x-a_{12}y+a_{14}t\ \ \ \ \ (15)
\displaystyle a_{12}y \displaystyle = \displaystyle -a_{12}y \ \ \ \ \ (16)

so we conclude that {a_{12}=0}. Choosing {y=0} and {z\ne0} gives us {a_{13}=0}. Thus 3 becomes

\displaystyle x'=a_{11}x+a_{14}t \ \ \ \ \ (17)

 

A similar argument applied to 4 gives {a_{42}=a_{43}=0} so

\displaystyle t'=a_{41}x+a_{44}t \ \ \ \ \ (18)

The origin of {S'} is moving to the right with speed {u} in {S}, so at time {t} its {x} coordinate is {x=ut}, but {x'=0} always since the origin of {S'} is at rest in {S'}. Therefore 17 becomes for this origin

\displaystyle 0 \displaystyle = \displaystyle a_{11}ut+a_{14}t\ \ \ \ \ (19)
\displaystyle a_{14} \displaystyle = \displaystyle -a_{11}u \ \ \ \ \ (20)

The transformations up to this point are

\displaystyle x' \displaystyle = \displaystyle a_{11}\left(x-ut\right)\ \ \ \ \ (21)
\displaystyle y' \displaystyle = \displaystyle y\ \ \ \ \ (22)
\displaystyle z' \displaystyle = \displaystyle z\ \ \ \ \ (23)
\displaystyle t' \displaystyle = \displaystyle a_{41}x+a_{44}t \ \ \ \ \ (24)

To get the final three constants, we need to invoke the constancy of the speed of light {c}. Suppose that at {t=t'=0} a pulse of light is generated at the common origins of {S} and {S'}. Because {c} is the same in both frames, both observers will see a spherical shell of light expand from their respective origins. The equations of this shell in the two frames are of the same form:

\displaystyle x^{2}+y^{2}+z^{2} \displaystyle = \displaystyle \left(ct\right)^{2}\ \ \ \ \ (25)
\displaystyle x'^{2}+y'^{2}+z'^{2} \displaystyle = \displaystyle \left(ct'\right)^{2} \ \ \ \ \ (26)

Substituting 21 to 24 into 26 gives

\displaystyle a_{11}^{2}\left(x-ut\right)^{2}+y^{2}+z^{2} \displaystyle = \displaystyle c^{2}\left(a_{41}x+a_{44}t\right)^{2}\ \ \ \ \ (27)
\displaystyle x^{2}\left(a_{11}^{2}-a_{41}^{2}c^{2}\right)+y^{2}+z^{2} \displaystyle = \displaystyle t^{2}\left(c^{2}a_{44}^{2}-u^{2}a_{11}^{2}\right)+2xt\left(c^{2}a_{41}a_{44}+ua_{11}^{2}\right) \ \ \ \ \ (28)

Comparing this with 25 we get

\displaystyle a_{11}^{2}-a_{41}^{2}c^{2} \displaystyle = \displaystyle 1\ \ \ \ \ (29)
\displaystyle c^{2}a_{44}^{2}-u^{2}a_{11}^{2} \displaystyle = \displaystyle c^{2}\ \ \ \ \ (30)
\displaystyle c^{2}a_{41}a_{44}+ua_{11}^{2} \displaystyle = \displaystyle 0 \ \ \ \ \ (31)

Multiply 29 by {u^{2}} and add it to 30, then multiply 29 by {u} and subtract 31 from it:

\displaystyle c^{2}a_{44}^{2}-a_{41}^{2}u^{2}c^{2} \displaystyle = \displaystyle c^{2}+u^{2}\ \ \ \ \ (32)
\displaystyle -a_{41}^{2}c^{2}u-c^{2}a_{41}a_{44} \displaystyle = \displaystyle u \ \ \ \ \ (33)

Multiply 33 by {-u} and add it to 32:

\displaystyle c^{2}a_{44}^{2}+uc^{2}a_{44}a_{41}-c^{2} \displaystyle = \displaystyle 0\ \ \ \ \ (34)
\displaystyle a_{44}^{2}+ua_{44}a_{41}-1 \displaystyle = \displaystyle 0 \ \ \ \ \ (35)

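Writing 33 as {c^{2}a_{41}\left(a_{44}+ua_{41}\right)=-u} and 35 as {a_{44}\left(a_{44}+ua_{41}\right)=1}, then dividing the first by the second, gives {a_{41}=-ua_{44}/c^{2}}. Substituting this into 35: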

\displaystyle a_{44}^{2}-\frac{u^{2}}{c^{2}}a_{44}^{2} \displaystyle = \displaystyle 1\ \ \ \ \ (36)
\displaystyle a_{44} \displaystyle = \displaystyle \frac{1}{\sqrt{1-u^{2}/c^{2}}} \ \ \ \ \ (37)

From 30:

\displaystyle a_{11}^{2} \displaystyle = \displaystyle \frac{c^{2}}{u^{2}}\left(a_{44}^{2}-1\right)\ \ \ \ \ (38)
\displaystyle \displaystyle = \displaystyle \frac{c^{2}}{u^{2}}\left(\frac{u^{2}/c^{2}}{1-u^{2}/c^{2}}\right)\ \ \ \ \ (39)
\displaystyle \displaystyle = \displaystyle \frac{1}{1-u^{2}/c^{2}}\ \ \ \ \ (40)
\displaystyle a_{11} \displaystyle = \displaystyle \frac{1}{\sqrt{1-u^{2}/c^{2}}} \ \ \ \ \ (41)

Finally, from 29, taking the negative root because 31 requires {a_{41}a_{44}<0} when {u>0}:

\displaystyle a_{41}^{2} \displaystyle = \displaystyle \frac{1}{c^{2}}\left(a_{11}^{2}-1\right)\ \ \ \ \ (42)
\displaystyle \displaystyle = \displaystyle \frac{1}{c^{2}}\left(\frac{u^{2}/c^{2}}{1-u^{2}/c^{2}}\right)\ \ \ \ \ (43)
\displaystyle a_{41} \displaystyle = \displaystyle -\frac{u}{c^{2}\sqrt{1-u^{2}/c^{2}}} \ \ \ \ \ (44)

Putting it all together gives the familiar Lorentz transformations:

\displaystyle x' \displaystyle = \displaystyle \frac{1}{\sqrt{1-u^{2}/c^{2}}}\left(x-ut\right)\ \ \ \ \ (45)
\displaystyle y' \displaystyle = \displaystyle y\ \ \ \ \ (46)
\displaystyle z' \displaystyle = \displaystyle z\ \ \ \ \ (47)
\displaystyle t' \displaystyle = \displaystyle \frac{t-ux/c^{2}}{\sqrt{1-u^{2}/c^{2}}} \ \ \ \ \ (48)
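As a sanity check on the algebra, here is a minimal sketch that builds the four coefficients numerically, confirms that they satisfy the three constraints 29 to 31, and verifies that an event on the light shell in {S} also lies on the light shell in {S'}. The function names and the choice of units with {c=1} are our own assumptions for illustration.

```python
import math

def lorentz_coefficients(u, c=1.0):
    """Return (a11, a14, a41, a44) for a frame moving at speed u along x."""
    gamma = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    a11 = gamma
    a14 = -gamma * u          # from a14 = -a11 * u
    a41 = -gamma * u / c**2   # negative root, so that c^2 a41 a44 + u a11^2 = 0
    a44 = gamma
    return a11, a14, a41, a44

def lorentz_transform(event, u, c=1.0):
    """Map an event (x, y, z, t) in S to (x', y', z', t') in S'."""
    x, y, z, t = event
    a11, a14, a41, a44 = lorentz_coefficients(u, c)
    return (a11 * x + a14 * t, y, z, a41 * x + a44 * t)

def shell_residual(event, c=1.0):
    """x^2 + y^2 + z^2 - (ct)^2, which vanishes for events on the light shell."""
    x, y, z, t = event
    return x**2 + y**2 + z**2 - (c * t) ** 2

u = 0.6                        # illustrative speed, in units where c = 1
event = (3.0, 4.0, 0.0, 5.0)   # lies on the shell, since 3^2 + 4^2 = 5^2
event_prime = lorentz_transform(event, u)

print(event_prime)
print(shell_residual(event), shell_residual(event_prime))
```

With these values {\gamma=1.25}, and up to rounding the transformed event is {\left(0,4,0,4\right)}, which again satisfies {x'^{2}+y'^{2}+z'^{2}=\left(ct'\right)^{2}}.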

Rapidity

References: Griffiths, David J. (2007), Introduction to Electrodynamics, 3rd Edition; Pearson Education – Chapter 12, Post 19.

An alternative way of writing the Lorentz transformations is to define a quantity called the rapidity:

\displaystyle  \theta\equiv\tanh^{-1}\beta \ \ \ \ \ (1)

where {\beta} is the relative speed of the two frames divided by {c}. Using this definition, we have

\displaystyle   \gamma \displaystyle  = \displaystyle  \frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\sqrt{1-\tanh^{2}\theta}}\ \ \ \ \ (3)
\displaystyle  \displaystyle  = \displaystyle  \frac{\cosh\theta}{\sqrt{\cosh^{2}\theta-\sinh^{2}\theta}}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \cosh\theta \ \ \ \ \ (5)

since {\cosh^{2}\theta-\sinh^{2}\theta=1}.

Also

\displaystyle  \gamma\beta=\cosh\theta\tanh\theta=\sinh\theta \ \ \ \ \ (6)

so

\displaystyle   \Lambda \displaystyle  = \displaystyle  \left[\begin{array}{cccc} \gamma & -\beta\gamma & 0 & 0\\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} \cosh\theta & -\sinh\theta & 0 & 0\\ -\sinh\theta & \cosh\theta & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (8)

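This is similar to a rotation through an angle {\theta} in 3-d space, except that hyperbolic functions replace the trigonometric ones and both sinh terms are negative, whereas the two sine terms in a rotation matrix have opposite signs.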
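The equivalence of the two forms of the matrix is easy to check numerically. The following is a minimal sketch using only the non-trivial {2\times2} block; the value of {\beta} and the variable names are arbitrary choices for illustration.

```python
import math

beta = 0.6                                # illustrative speed in units of c
theta = math.atanh(beta)                  # rapidity
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Upper-left 2x2 block of the boost matrix in its two equivalent forms
block_gamma = [[gamma, -beta * gamma],
               [-beta * gamma, gamma]]
block_rapidity = [[math.cosh(theta), -math.sinh(theta)],
                  [-math.sinh(theta), math.cosh(theta)]]

# Determinant is cosh^2 - sinh^2, which should be 1
determinant = (block_rapidity[0][0] * block_rapidity[1][1]
               - block_rapidity[0][1] * block_rapidity[1][0])

print(block_gamma)
print(block_rapidity)
print(determinant)
```

The two blocks agree to within rounding, and the determinant is 1, just as for an ordinary rotation.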

The velocity addition formula becomes

\displaystyle   \bar{u} \displaystyle  = \displaystyle  \frac{u+v}{1+uv/c^{2}}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{\beta_{u}+\beta_{v}}{1+\beta_{u}\beta_{v}}c\ \ \ \ \ (10)
\displaystyle  \beta_{\bar{u}} \displaystyle  = \displaystyle  \frac{\tanh\theta_{u}+\tanh\theta_{v}}{1+\tanh\theta_{u}\tanh\theta_{v}}\ \ \ \ \ (11)
\displaystyle  \tanh\theta_{\bar{u}} \displaystyle  = \displaystyle  \tanh\left(\theta_{u}+\theta_{v}\right) \ \ \ \ \ (12)

where in the last line we’ve used the formula for the tanh of a sum of two arguments.

Since tanh is one-to-one, the rapidities simply add, which makes rapidity a more convenient measure of relativistic velocity than {\beta}:

\displaystyle  \theta_{\bar{u}}=\theta_{u}+\theta_{v} \ \ \ \ \ (13)
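As a small check, the sketch below compares the velocity addition formula with the result of adding rapidities and converting back. Velocities are expressed as fractions of {c}, and the function names are hypothetical.

```python
import math

def add_velocities(beta_u, beta_v):
    """Relativistic velocity addition, with speeds given as fractions of c."""
    return (beta_u + beta_v) / (1.0 + beta_u * beta_v)

def add_via_rapidity(beta_u, beta_v):
    """Convert to rapidities, add them, and convert back to a speed."""
    theta_total = math.atanh(beta_u) + math.atanh(beta_v)
    return math.tanh(theta_total)

beta_u, beta_v = 0.5, 0.5
direct = add_velocities(beta_u, beta_v)
from_rapidity = add_via_rapidity(beta_u, beta_v)

print(direct, from_rapidity)
```

Two speeds of {0.5c} combine to {0.8c} by either route, and since tanh never reaches 1 for finite arguments, no finite sum of rapidities can give a speed of {c} or more.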

Compound Lorentz transformations

References: Griffiths, David J. (2007), Introduction to Electrodynamics, 3rd Edition; Pearson Education – Chapter 12, Post 18.

The Lorentz transformations can be written in matrix form as

\displaystyle  \Lambda_{x}=\left[\begin{array}{cccc} \gamma & -\beta\gamma & 0 & 0\\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (1)

where the 0 ({ct}) component is the first row and first column, followed by the 1, 2, and 3 directions in order. This matrix is for relative motion along the 1 axis.

The Galilean transformations can be written as a matrix as well, where the first coordinate is just {t} rather than {ct}:

\displaystyle  \Gamma=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ -v & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (2)

or, if we want to use the same symbols as in the Lorentz case, with {ct} rather than {t} as the first coordinate, we can write

\displaystyle  \Gamma=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ -\beta & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (3)

The Lorentz transformation along the 2 ({y}) axis is obtained by putting the transformation terms in row and column 2:

\displaystyle  \Lambda_{y}=\left[\begin{array}{cccc} \gamma & 0 & -\beta\gamma & 0\\ 0 & 1 & 0 & 0\\ -\beta\gamma & 0 & \gamma & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (4)

If we apply a Lorentz transformation first in the {x} and then in the {y} direction (with different relative velocities), we get the compound matrix:

\displaystyle   \Lambda_{y}\Lambda_{x} \displaystyle  = \displaystyle  \left[\begin{array}{cccc} \gamma_{y} & 0 & -\beta_{y}\gamma_{y} & 0\\ 0 & 1 & 0 & 0\\ -\beta_{y}\gamma_{y} & 0 & \gamma_{y} & 0\\ 0 & 0 & 0 & 1 \end{array}\right]\left[\begin{array}{cccc} \gamma_{x} & -\beta_{x}\gamma_{x} & 0 & 0\\ -\beta_{x}\gamma_{x} & \gamma_{x} & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right]\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} \gamma_{y}\gamma_{x} & -\gamma_{y}\gamma_{x}\beta_{x} & -\beta_{y}\gamma_{y} & 0\\ -\beta_{x}\gamma_{x} & \gamma_{x} & 0 & 0\\ -\beta_{y}\gamma_{y}\gamma_{x} & \gamma_{y}\gamma_{x}\beta_{x}\beta_{y} & \gamma_{y} & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (6)

Note that although {\Lambda_{x}} and {\Lambda_{y}} are both symmetric, their product is not. Because the opposite ordering is just the transpose of this product, applying the transformations in the opposite order gives a different result:

\displaystyle   \Lambda_{x}\Lambda_{y} \displaystyle  = \displaystyle  \Lambda_{x}^{T}\Lambda_{y}^{T}\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda_{y}\Lambda_{x}\right)^{T}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{cccc} \gamma_{y}\gamma_{x} & -\beta_{x}\gamma_{x} & -\beta_{y}\gamma_{y}\gamma_{x} & 0\\ -\gamma_{y}\gamma_{x}\beta_{x} & \gamma_{x} & \gamma_{y}\gamma_{x}\beta_{x}\beta_{y} & 0\\ -\beta_{y}\gamma_{y} & 0 & \gamma_{y} & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (9)
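The failure of the two boosts to commute can also be seen numerically. The following is a minimal sketch in plain Python, with the matrices stored as nested lists; the helper names and the two speeds are our own choices for illustration.

```python
import math

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

def boost_x(beta):
    """Lorentz boost along the 1 (x) axis, coordinates ordered (ct, x, y, z)."""
    g = gamma(beta)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def boost_y(beta):
    """Lorentz boost along the 2 (y) axis."""
    g = gamma(beta)
    return [[g, 0.0, -beta * g, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-beta * g, 0.0, g, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def max_difference(a, b):
    """Largest absolute difference between corresponding elements."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

beta_x, beta_y = 0.6, 0.8                            # illustrative speeds
x_then_y = matmul(boost_y(beta_y), boost_x(beta_x))  # Lambda_y Lambda_x
y_then_x = matmul(boost_x(beta_x), boost_y(beta_y))  # Lambda_x Lambda_y

print(max_difference(x_then_y, y_then_x))             # clearly non-zero
print(max_difference(y_then_x, transpose(x_then_y)))  # zero up to rounding
```

The first number is non-zero, so the two orderings differ, while the second vanishes to within rounding, confirming that {\Lambda_{x}\Lambda_{y}} is the transpose of {\Lambda_{y}\Lambda_{x}}.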