
Welcome to Physics Pages

This blog consists of my notes and solutions to problems in various areas of mainstream physics. An index to the topics covered is contained in the links in the sidebar on the right, or in the menu at the top of the page.

This isn’t a “popular science” site, in that most posts use a fair bit of mathematics to explain their concepts. Thus this blog aims mainly to help those who are learning or reviewing physics in depth. More details on what the site contains and how to use it are on the welcome page.

Despite Stephen Hawking’s caution that every equation included in a book (or, I suppose in a blog) would halve the readership, this blog has proved very popular since its inception in December 2010. Details of the number of visits and distinct visitors are given on the hit statistics page.

Many thanks to my loyal followers and best wishes to everyone who visits. I hope you find it useful. Constructive criticism (or even praise) is always welcome, so feel free to leave a comment in response to any of the posts.

Before leaving a comment, you may find it useful to read the “Instructions for commenters”.

Lagrangian for the Schrödinger equation

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 3, Section 3.1.

As a prelude to ‘proper’ quantum field theory, we’ll first look at turning the non-relativistic quantum theory based on the Schrödinger equation into a field theory. Before quantizing, we’ll treat the wave function {\psi\left(\mathbf{x},t\right)} as a classical (that is, non-quantum) field. The Schrödinger equation is

\displaystyle i\hbar\frac{\partial\psi}{\partial t}=-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi \ \ \ \ \ (1)

 

where {V\left(\mathbf{x},t\right)} is, as usual, the potential function.

In order to apply the techniques of classical field theory, we need a Lagrangian density {\mathcal{L}}. There doesn’t seem to be any way of actually deriving Lagrangian densities; presumably they are found through trial and error, with perhaps a bit of physical intuition. In any case, the Lagrangian density for the Schrödinger equation turns out to be

\displaystyle \mathcal{L}\left(\psi,\nabla\psi,\dot{\psi}\right)=i\hbar\psi^*\dot{\psi}-\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi-V\left(\mathbf{x},t\right)\psi^*\psi \ \ \ \ \ (2)

 

As {\psi} is a complex function, it has real and imaginary parts, so we can treat {\psi} and {\psi^*} as independent fields. As we saw earlier, we can derive the Euler-Lagrange equations for multiple fields from the principle of least action and end up with

\displaystyle \frac{\partial\mathcal{L}}{\partial\phi^{r}}-\frac{\partial}{\partial q^{\mu}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}^{r}}\right)=0 \ \ \ \ \ (3)

 

where the {\phi^{r}} are the fields and {q^{\mu}=\left(\mathbf{x},t\right)}. In this case, the two fields are {\psi} and {\psi^*} and we get the two equations

\displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}}=0\ \ \ \ \ (4)
\displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*}-\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}^*}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*} \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi^*}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*}=0 \ \ \ \ \ (5)

The second form of each equation just introduces the divergence {\nabla\cdot} as a shorthand for the {\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}}} and {\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}^*}} terms.

We can plug 2 into these two equations to verify that we recover the original Schrödinger equation 1 and its complex conjugate. From 4 we have

\displaystyle \frac{\partial\mathcal{L}}{\partial\psi} \displaystyle = \displaystyle -V\left(\mathbf{x},t\right)\psi^*\ \ \ \ \ (6)
\displaystyle \nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi} \displaystyle = \displaystyle -\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*\ \ \ \ \ (7)
\displaystyle \frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle i\hbar\dot{\psi}^*\ \ \ \ \ (8)
\displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle -i\hbar\dot{\psi}^*+\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*-V\left(\mathbf{x},t\right)\psi^*=0\ \ \ \ \ (9)
\displaystyle -i\hbar\dot{\psi}^* \displaystyle = \displaystyle -\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*+V\left(\mathbf{x},t\right)\psi^* \ \ \ \ \ (10)

which is the complex conjugate of 1. Plugging 2 into 5 just reproduces 1.
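This last claim is easy to check symbolically. The sketch below (restricted to one spatial dimension, using sympy; the function names are our own choices) applies the Euler-Lagrange equation 5 to the Lagrangian density 2 and confirms that the Schrödinger equation 1 comes out:

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
psi = sp.Function('psi')(x, t)          # psi and its conjugate are treated
psis = sp.Function('psistar')(x, t)     # as independent fields
V = sp.Function('V')(x, t)

# Lagrangian density, eq. 2 (one spatial dimension)
L = (sp.I*hbar*psis*psi.diff(t)
     - hbar**2/(2*m)*psi.diff(x)*psis.diff(x)
     - V*psis*psi)

# Euler-Lagrange equation 5: variation with respect to psistar
EL = (L.diff(psis)
      - L.diff(psis.diff(x)).diff(x)
      - L.diff(psis.diff(t)).diff(t))

# Schrodinger equation 1, rearranged into the form (...) = 0
schrodinger = sp.I*hbar*psi.diff(t) + hbar**2/(2*m)*psi.diff(x, 2) - V*psi

print(sp.simplify(EL - schrodinger))    # 0
```

The same computation with the roles of {\psi} and {\psi^*} swapped reproduces the complex conjugate equation.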

The conjugate momentum density {\pi} can be calculated for the two fields {\psi} and {\psi^*}. We get

\displaystyle \pi_{1}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\dot{\psi}}=i\hbar\psi^*\left(\mathbf{x},t\right)\ \ \ \ \ (11)
\displaystyle \pi_{2}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\dot{\psi}^*}=0 \ \ \ \ \ (12)

The Hamiltonian density is defined as

\displaystyle \mathcal{H} \displaystyle = \displaystyle \sum_{r}\pi_{r}\dot{\phi}^{r}-\mathcal{L}\ \ \ \ \ (13)
\displaystyle \displaystyle = \displaystyle i\hbar\psi^*\dot{\psi}-\left[i\hbar\psi^*\dot{\psi}-\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi-V\left(\mathbf{x},t\right)\psi^*\psi\right]\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi+V\left(\mathbf{x},t\right)\psi^*\psi \ \ \ \ \ (15)

The total Hamiltonian is the integral of this over 3-d space:

\displaystyle H \displaystyle = \displaystyle \int d^{3}x\;\mathcal{H}\ \ \ \ \ (16)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\left[\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi+V\left(\mathbf{x},t\right)\psi^*\psi\right] \ \ \ \ \ (17)

We can integrate the first term by parts, by integrating the {\nabla\psi^*} term and invoking the usual assumption that {\psi^*\rightarrow0} fast enough at infinity that the integrated term is zero. We then get

\displaystyle H \displaystyle = \displaystyle \int d^{3}x\left[-\frac{\hbar^{2}}{2m}\psi^*\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi^*\psi\right]\ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\;\psi^*\left[-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi\right] \ \ \ \ \ (19)

Referring back to quantum mechanics for a moment, we see that this last integral is just {\left\langle \psi\left|\hat{H}\right|\psi\right\rangle }, that is, the expectation value of the Hamiltonian operator, which is the total energy of the system.
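We can illustrate the equivalence of the gradient form 17 and the Laplacian form 19 numerically. The following is only a sketch, in one dimension with {\hbar=m=1} and a harmonic potential, using the oscillator ground state (a real wave function, so {\psi^*=\psi}), whose energy is {1/2}:

```python
import numpy as np

# 1-D harmonic-oscillator ground state with hbar = m = omega = 1
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
psi = np.pi**-0.25 * np.exp(-x**2/2)    # normalized; <H> should be 1/2
V = 0.5*x**2

dpsi = np.gradient(psi, dx)
H_grad = np.sum(0.5*dpsi**2 + V*psi**2) * dx       # gradient form, eq. 17
d2psi = np.gradient(dpsi, dx)
H_lap = np.sum(psi*(-0.5*d2psi + V*psi)) * dx      # Laplacian form, eq. 19

print(H_grad, H_lap)    # both ~ 0.5
```

The two integrals agree (to discretization error), as the integration by parts guarantees, since the Gaussian vanishes rapidly at the boundary.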

Finally, we can write down the Poisson brackets, since these are general results for any field {\psi} and its conjugate momentum {\pi}:

\displaystyle \left\{ \psi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle \delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\ \ \ \ \ (20)
\displaystyle \left\{ \psi\left(\mathbf{x},t\right),\psi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle 0\ \ \ \ \ (21)
\displaystyle \left\{ \pi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle 0 \ \ \ \ \ (22)

These brackets will be used later when we quantize the theory.

Noether’s theorem and conservation of angular momentum

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Now that we’ve seen that a general Lorentz transformation can be represented as a product of a pure boost and a pure 3-d rotation, we can return to Noether’s theorem and see what conserved property it predicts when we require a physical system to be invariant under a Lorentz transformation. As usual, we consider an infinitesimal transformation, which we can write as

\displaystyle x^{\prime\mu}=x^{\mu}+\delta\omega^{\mu\nu}x_{\nu} \ \ \ \ \ (1)

 

where {\delta\omega^{\mu\nu}} is the infinitesimal rotation in 4-dimensional spacetime. Here, we are treating a pure boost as a rotation; for example, a boost in the {x_{1}} direction is given by the Lorentz transformation

\displaystyle x^{\prime}=\Lambda x \ \ \ \ \ (2)

 


where

\displaystyle \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (3)

 

for some ‘angle’ {\chi}. This reduces to the more familiar form found in introductory relativity courses if we set

\displaystyle \cosh\chi \displaystyle \equiv \displaystyle \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (4)
\displaystyle \sinh\chi \displaystyle \equiv \displaystyle \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (5)
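These identities are easy to verify numerically, since 4 and 5 together give {\tanh\chi=\beta}, so the ‘angle’ {\chi} (usually called the rapidity) is {\chi=\tanh^{-1}\beta}. A quick check:

```python
import numpy as np

beta = 0.6
gamma = 1/np.sqrt(1 - beta**2)
chi = np.arctanh(beta)            # the rapidity corresponding to beta

print(np.cosh(chi), gamma)        # both ~ 1.25, eq. 4
print(np.sinh(chi), beta*gamma)   # both ~ 0.75, eq. 5
```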

Returning to 1, we require, to first order in {\delta\omega_{\mu\nu}}, that the Minkowski length of the 4-vector is the same before and after the transformation. That is,

\displaystyle x^{\prime\mu}x_{\mu}^{\prime} \displaystyle = \displaystyle \left(x^{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}\right)\left(x_{\mu}+\delta\omega_{\mu}^{\;\tau}x_{\tau}\right)\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}x_{\mu}+\delta\omega_{\mu}^{\;\tau}x_{\tau}x^{\mu}\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}x_{\mu}+\delta\omega^{\mu\tau}x_{\tau}x_{\mu}\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+2\delta\omega^{\mu\nu}x_{\mu}x_{\nu} \ \ \ \ \ (9)

In the last line, we renamed the dummy indices {\sigma} and {\tau} to {\nu}. To first order, we require the last term in the last line to be zero for all {x_{\mu}}, which means we must impose a condition on {\delta\omega^{\mu\nu}}. We can write this term as

\displaystyle 2\delta\omega^{\mu\nu}x_{\mu}x_{\nu}=x_{\mu}x_{\nu}\left(\delta\omega^{\mu\nu}+\delta\omega^{\nu\mu}\right) \ \ \ \ \ (10)

From this, we see that we must have

\displaystyle \delta\omega^{\mu\nu}=-\delta\omega^{\nu\mu} \ \ \ \ \ (11)

so {\delta\omega^{\mu\nu}} must be antisymmetric.

Incidentally, if this condition seems to be violated in the pure boost matrix 3, remember that 2 is an ordinary matrix product, while the last term in 1 is the product of a tensor and 4-vector, and thus includes the effect of the metric tensor

\displaystyle g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (12)

 

To first order in {\chi}, 3 is

\displaystyle \Lambda=\left[\begin{array}{cccc} 1 & \chi & 0 & 0\\ \chi & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (13)

while for the infinitesimal rotation, we have

\displaystyle \delta\omega=\left[\begin{array}{cccc} 0 & -\chi & 0 & 0\\ \chi & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (14)

In matrix notation, 1 becomes

\displaystyle x^{\prime}=\left(I+\delta\omega\,g\right)x \ \ \ \ \ (15)

from which we can see that this gives the same result as 2.
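As a numerical sketch of 13-15 (with the signature {(+,-,-,-)} of 12), we can check that the first-order form agrees with the exact boost 3 up to terms of order {\chi^{2}}, and that the Minkowski length is preserved to the same order:

```python
import numpy as np

chi = 1e-6
g = np.diag([1.0, -1.0, -1.0, -1.0])   # metric tensor, eq. 12

dw = np.zeros((4, 4))                  # infinitesimal parameters, eq. 14
dw[0, 1], dw[1, 0] = -chi, chi         # antisymmetric, eq. 11

Lam = np.eye(4)                        # exact boost, eq. 3
Lam[0, 0] = Lam[1, 1] = np.cosh(chi)
Lam[0, 1] = Lam[1, 0] = np.sinh(chi)

approx = np.eye(4) + dw @ g            # first-order form of eq. 1

x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.max(np.abs(Lam @ x - approx @ x)))    # O(chi^2)

interval = lambda y: y @ g @ y                 # Minkowski length
print(interval(approx @ x) - interval(x))      # O(chi^2)
```

Note that `dw @ g` flips the sign of the {-\chi} entry, reproducing the symmetric matrix 13 even though {\delta\omega} itself is antisymmetric, which is the point made in the previous paragraph.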

In order to apply Noether’s theorem, we also need to know how the fields transform under a Lorentz transformation. The assumption is that, for infinitesimal transformations, the transformed field {\phi_{r}^{\prime}\left(x^{\prime}\right)} depends linearly on both the original fields {\phi_{r}\left(x\right)} and on the rotation {\delta\omega_{\mu\nu}}. That is, we assume that

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\frac{1}{2}\delta\omega_{\mu\nu}\left(I^{\mu\nu}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (16)

 

where {I^{\mu\nu}} are the infinitesimal generators of the Lorentz transformation. G & R don’t really explain this, apart from giving a reference to another book, but we won’t need to delve into the details to get the result needed in this post, so we’ll leave it for now.

From here, it’s a matter of plugging 1 and 16 into the equations for Noether’s theorem and seeing what comes out. Noether’s theorem says that

\displaystyle \partial^{\mu}f_{\mu}\left(x\right)=0 \ \ \ \ \ (17)

 

where

\displaystyle f_{\mu}\left(x\right) \displaystyle \equiv \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\delta\phi_{r}\left(x\right)-\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)\delta x^{\nu} \ \ \ \ \ (19)

From 16

\displaystyle \delta\phi_{r}\left(x\right) \displaystyle = \displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}\left(x\right)\ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega_{\mu\nu}\left(I^{\mu\nu}\right)_{rs}\phi_{s}\left(x\right)\ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (22)

In the last line, we renamed the indices {\mu} and {\nu} to {\nu} and {\lambda} respectively to avoid confusing the {\mu} in the first term of 19 with any of the indices in {\delta\phi_{r}\left(x\right)}.

Similarly, from 1 we have

\displaystyle \delta x^{\nu} \displaystyle = \displaystyle x^{\prime\nu}-x^{\nu}\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle \delta\omega^{\nu\lambda}x_{\lambda} \ \ \ \ \ (24)

We also had the energy-momentum tensor

\displaystyle T_{\mu\nu}\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right) \ \ \ \ \ (25)

 

Putting all this together, we have

\displaystyle f_{\mu}\left(x\right)=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda} \ \ \ \ \ (26)

Because {\delta\omega^{\nu\lambda}=-\delta\omega^{\lambda\nu}}

\displaystyle T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda} \displaystyle = \displaystyle \frac{1}{2}\left[T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda}-T_{\mu\nu}\delta\omega^{\lambda\nu}x_{\lambda}\right]\ \ \ \ \ (27)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\left[T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda}-T_{\mu\lambda}\delta\omega^{\nu\lambda}x_{\nu}\right]\ \ \ \ \ (28)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right] \ \ \ \ \ (29)

In the second line, we swapped the dummy indices {\lambda} and {\nu} in the second term, which is allowed because both indices are summed. Therefore

\displaystyle f_{\mu}\left(x\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right]\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega^{\nu\lambda}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right]\ \ \ \ \ (31)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}\left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\left(T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right)\right]\ \ \ \ \ (32)
\displaystyle \displaystyle \equiv \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}M_{\mu\nu\lambda}\left(x\right) \ \ \ \ \ (33)

In the second line, we lowered the {\nu\lambda} indices on {I^{\nu\lambda}} and raised them on {\delta\omega_{\nu\lambda}}. This is OK because the index pair is fully contracted, so the product is unchanged. The last line defines the quantity {M_{\mu\nu\lambda}\left(x\right)}.
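The index gymnastics in 27-29 can be checked numerically with arbitrary arrays, since the identity relies only on {\delta\omega} being antisymmetric. A sketch using `einsum` (the random arrays are just test data):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 4))    # stand-in for T_{mu nu}
x = rng.normal(size=4)
w = rng.normal(size=(4, 4))
w = w - w.T                    # delta-omega, antisymmetric per eq. 11

# T_{mu nu} dw^{nu la} x_la  (left side of eq. 27)
lhs = np.einsum('mn,nl,l->m', T, w, x)

# (1/2) dw^{nu la} (T_{mu nu} x_la - T_{mu la} x_nu)  (eq. 29)
rhs = 0.5*(np.einsum('nl,mn,l->m', w, T, x)
           - np.einsum('nl,ml,n->m', w, T, x))

print(np.allclose(lhs, rhs))   # True
```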

Because the infinitesimal rotations are arbitrary (subject to the condition that {\delta\omega^{\nu\lambda}} is antisymmetric), we can choose all of them to be zero except for one. For each such choice, we have a different {f_{\mu}\left(x\right)}, which leads to a conservation law for each choice. From Noether’s theorem, the quantity that is conserved is the integral of {f_{0}} over 3-space, so we have the conserved quantities

\displaystyle M_{\nu\lambda} \displaystyle = \displaystyle \int d^{3}x\left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\left(T_{0\nu}x_{\lambda}-T_{0\lambda}x_{\nu}\right)\right]\ \ \ \ \ (34)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\left[T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)\right] \ \ \ \ \ (35)

From 25, we see that the first two terms are (taking {\nu} and {\lambda} to be spatial indices, so that the {g_{0\nu}\mathcal{L}} and {g_{0\lambda}\mathcal{L}} terms vanish)

\displaystyle T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda} \displaystyle = \displaystyle x_{\nu}\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}}\partial_{\lambda}\phi_{r}-x_{\lambda}\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}}\partial_{\nu}\phi_{r}\ \ \ \ \ (36)
\displaystyle \displaystyle = \displaystyle x_{\nu}\pi_{r}\partial_{\lambda}\phi_{r}-x_{\lambda}\pi_{r}\partial_{\nu}\phi_{r} \ \ \ \ \ (37)

where {\pi_{r}} is the conjugate momentum density, defined by

\displaystyle \pi_{r}\equiv\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}} \ \ \ \ \ (38)

Using the physical momentum density

\displaystyle p_{\lambda}=\pi_{r}\partial_{\lambda}\phi_{r} \ \ \ \ \ (39)

we find that

\displaystyle T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}=x_{\nu}p_{\lambda}-x_{\lambda}p_{\nu} \ \ \ \ \ (40)

This is one component of the angular momentum density {\mathbf{r}\times\mathbf{p}}, so the integral

\displaystyle \int d^{3}x\left(T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}\right) \ \ \ \ \ (41)

is one component of the total angular momentum.

The other term in 35 is

\displaystyle \int d^{3}x\;\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (42)

This integral depends on the generators {I_{\nu\lambda}}, and thus on the specific way in which the fields transform. G & R tell us that this term describes the spin angular momentum, but at this stage, we just have to accept this on faith.

In any case, the overall conservation rule 35 shows that the sum of the ‘traditional’ angular momentum from the first two terms in the integrand, together with this mysterious other term, is a conserved quantity, so interpreting it as some other form of angular momentum seems reasonable. We’ll just have to wait and see how this plays out.

Lorentz transformation as product of a pure boost and pure rotation

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes (available at the time of writing, Sep 2016).

Continuing our examination of general Lorentz transformations, we can now complete the demonstration that a general Lorentz transformation is the product of a pure boost (motion at a constant velocity) multiplied by a pure rotation. We’ll follow Corollary IV.2 in Jaffe’s article.

In the last post, we saw that we could write a general Lorentz transformation in the form

\displaystyle  \widehat{\Lambda x}=A\widehat{x}A^{\dagger} \ \ \ \ \ (1)

where {x} is the 4-vector of a spacetime event, {\Lambda} is the Lorentz transformation as a {4\times4} matrix, {A} is a {2\times2} matrix with complex elements and a hat over a symbol means we’re looking at the {2\times2} complex matrix representing that object. We also saw in the last post that this representation restricts {A} to the special linear group {SL\left(2,\mathbb{C}\right)}, that is, to {2\times2} complex matrices with {\det A=1}.

Jaffe goes through a rather involved proof that the transformation {\Lambda\left(A\right)} defined by 1 is a member of the physically relevant group with {\det\Lambda=+1} and {\Lambda_{00}\ge1}, but this involves a lot of somewhat obscure matrix theorems that I don’t want to get into here, and these techniques don’t seem to be required for the rest of the demonstration, so we’ll just accept this fact for now.

What we really want to do is find out how we can calculate {\Lambda} given the {2\times2} matrix {A}. We can do this by using the result we got earlier for the components of the 4-vector {x}:

\displaystyle  \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (2)

where the {\sigma_{\mu}} are four Hermitian matrices:

\displaystyle   \sigma_{0} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (3)
\displaystyle  \sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (4)
\displaystyle  \sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (5)
\displaystyle  \sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (6)

We can invert 2 to get

\displaystyle  x_{\nu}=\left\langle \sigma_{\nu},\widehat{x}\right\rangle =\frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}\widehat{x}\right) \ \ \ \ \ (7)
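The expansion 2 and its inverse 7 are easy to verify numerically, working directly with the components (the sample 4-vector is our own test data):

```python
import numpy as np

# the four Hermitian matrices of eqs. 3-6
sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

x = np.array([1.0, 0.5, -2.0, 0.3])
x_hat = sum(x[mu]*sigma[mu] for mu in range(4))    # eq. 2

# invert via x_nu = (1/2) Tr(sigma_nu x_hat), eq. 7
recovered = np.array([0.5*np.trace(sigma[nu] @ x_hat).real
                      for nu in range(4)])
print(recovered)    # recovers x
```

The inversion works because the {\sigma_{\mu}} are orthonormal under the inner product {\left\langle X,Y\right\rangle =\frac{1}{2}\mbox{Tr}\left(X^{\dagger}Y\right)}.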

Reverting back to the {4\times4} matrix {\Lambda} (no hat), we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda\left(A\right)_{\mu\nu}x_{\nu}\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda\left(A\right)x\right)_{\mu}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{\Lambda\left(A\right)x}\right\rangle \ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sum_{\nu=0}^{3}x_{\nu}\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle x_{\nu} \ \ \ \ \ (13)

We used 1 in the fourth line and 2 in the fifth line. Comparing the first and last lines, we see that

\displaystyle   \Lambda\left(A\right)_{\mu\nu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle \ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}^{\dagger}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right) \ \ \ \ \ (16)

where in the last line we used the fact that all the {\sigma_{\mu}} are Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}.

In order for {\Lambda\left(A\right)} to be a valid Lorentz transformation, clearly its elements must be real numbers. We can show this is true as follows. The complex conjugate is represented by drawing a bar over a quantity. We get

\displaystyle  \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \ \ \ \ \ (17)

We can now use the fact that the trace of a product of matrices remains unchanged if we cyclically permute the order of multiplication. In particular {\mbox{Tr}\left(XB^{\dagger}\right)=\mbox{Tr}\left(B^{\dagger}X\right)}. Also, {\mbox{Tr}\left(B^{\dagger}X\right)=\mbox{Tr}\left(\left(\overline{X^{\dagger}B}\right)^{T}\right)=\mbox{Tr}\left(\overline{X^{\dagger}B}\right)} since the trace of a matrix is equal to the trace of its transpose. In 17, we can set {X^{\dagger}=\sigma_{\mu}} and {B=A\sigma_{\nu}A^{\dagger}} and use the fact that the {\sigma_{\mu}} are all Hermitian so that {\sigma_{\mu}^{\dagger}=\sigma_{\mu}}:

\displaystyle   \overline{\Lambda\left(A\right)_{\mu\nu}}=\frac{1}{2}\mbox{Tr}\left(\overline{\sigma_{\mu}A\sigma_{\nu}A^{\dagger}}\right) \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\left(A\sigma_{\nu}A^{\dagger}\right)^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(A\sigma_{\nu}A^{\dagger}\sigma_{\mu}\right)\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}A\sigma_{\nu}A^{\dagger}\right)\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(A\right)_{\mu\nu} \ \ \ \ \ (21)

where in the third line we cyclically permuted the matrices in the trace. Thus the elements of {\Lambda\left(A\right)} are real.

Now we consider two cases. First, suppose that {A=U}, where {U} is a unitary matrix, so that {U^{\dagger}=U^{-1}}. From 16 we find that {\Lambda\left(U\right)_{00}} is, using {\sigma_{0}=I}:

\displaystyle   \Lambda\left(U\right)_{00} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(UU^{\dagger}\right)\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}I\ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  1 \ \ \ \ \ (25)

The other elements in the first row and first column of {\Lambda} are all zero, as we can see by using 16 again:

\displaystyle   \Lambda\left(U\right)_{0i} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{0}U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U\sigma_{i}U^{\dagger}\right)\ \ \ \ \ (27)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(U^{\dagger}U\sigma_{i}\right)\ \ \ \ \ (28)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (30)

since {\mbox{Tr}\sigma_{i}=0} for {i=1,2,3}. A similar argument works for the first column of {\Lambda\left(U\right)} as well:

\displaystyle   \Lambda\left(U\right)_{i0} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{0}U^{\dagger}\right)\ \ \ \ \ (31)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}UU^{\dagger}\right)\ \ \ \ \ (32)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\right)\ \ \ \ \ (33)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (34)

For the other elements, we have

\displaystyle   \Lambda\left(U\right)_{ij} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}U\sigma_{j}U^{\dagger}\right)\ \ \ \ \ (35)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{i}\left(U^{-1}\right)^{\dagger}\sigma_{j}U^{-1}\right)\ \ \ \ \ (36)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{j}U^{-1}\sigma_{i}\left(U^{-1}\right)^{\dagger}\right)\ \ \ \ \ (37)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(U^{-1}\right)_{ji}\ \ \ \ \ (38)
\displaystyle  \displaystyle  = \displaystyle  \left[\Lambda\left(U\right)^{-1}\right]_{ji} \ \ \ \ \ (39)

That is

\displaystyle  \left[\Lambda\left(U\right)\right]^{T}=\Lambda\left(U\right)^{-1} \ \ \ \ \ (40)

so that

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (41)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. In other words, when {A=U}, {\Lambda} is a pure rotation.
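For a concrete check of 22-41, we can build {\Lambda\left(A\right)} directly from 16 and feed it a unitary matrix. The sketch below (the particular {U}, which turns out to be a rotation about the 3-axis, is our own choice) confirms the block structure 41:

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Lam(A):
    """Lambda(A)_{mu nu} = (1/2) Tr(sigma_mu A sigma_nu A^dagger), eq. 16.
    The text shows the entries are real, so we keep only the real part."""
    return np.array([[0.5*np.trace(sigma[mu] @ A @ sigma[nu] @ A.conj().T).real
                      for nu in range(4)] for mu in range(4)])

theta = 0.7
U = np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)])   # unitary, det U = 1

LU = Lam(U)
print(np.isclose(LU[0, 0], 1))                 # True: eq. 25
print(np.allclose(LU[0, 1:], 0))               # True: eq. 30
print(np.allclose(LU[1:, 0], 0))               # True: eq. 34
R = LU[1:, 1:]
print(np.allclose(R.T @ R, np.eye(3)))         # True: R is a rotation, eq. 41
```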

The other case we need to examine is when {A=H}, where {H} is a Hermitian matrix, so that {H^{\dagger}=H}. In that case, from 16

\displaystyle   \Lambda\left(H\right)_{\mu\nu} \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}H\sigma_{\nu}H\right)\ \ \ \ \ (42)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(H\sigma_{\mu}H\sigma_{\nu}\right)\ \ \ \ \ (43)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left(\sigma_{\nu}H\sigma_{\mu}H\right)\ \ \ \ \ (44)
\displaystyle  \displaystyle  = \displaystyle  \Lambda\left(H\right)_{\nu\mu} \ \ \ \ \ (45)

so {\Lambda\left(H\right)} is a symmetric matrix. (We used two cyclic permutations in the trace here.) Although we haven’t proved that a symmetric Lorentz transformation always represents a pure boost, this has been verified (see, for example, Wikipedia; I can’t be bothered going through it all here).

Now we are ready to get our final result. To do this, we need to use a theorem from matrix algebra which says that every matrix {A} in the group {SL\left(2,\mathbb{C}\right)} (that is, a {2\times2} matrix with complex elements and determinant +1) has a unique polar decomposition into a strictly positive Hermitian matrix {H} and a unitary matrix {U}, so that we always have

\displaystyle  A=HU \ \ \ \ \ (46)

To connect this with what we’ve done above, we can define

\displaystyle   H \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{1/2}\ \ \ \ \ (47)
\displaystyle  U \displaystyle  = \displaystyle  H^{-1}A=\left(AA^{\dagger}\right)^{-1/2}A \ \ \ \ \ (48)

[The square root of a matrix {M} is defined to be a matrix {S=M^{1/2}} such that {S^{2}=M}.] This definition is consistent with {H} being Hermitian since, with {M=AA^{\dagger}} (which is Hermitian),

\displaystyle   \left(S^{\dagger}\right)^{2} \displaystyle  = \displaystyle  \left(SS\right)^{\dagger}\ \ \ \ \ (49)
\displaystyle  \displaystyle  = \displaystyle  \left(S^{2}\right)^{\dagger}\ \ \ \ \ (50)
\displaystyle  \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{\dagger}\ \ \ \ \ (51)
\displaystyle  \displaystyle  = \displaystyle  AA^{\dagger}=S^{2} \ \ \ \ \ (52)

Thus if we restrict {S} to be the positive square root, we must have {S^{\dagger}=S}.
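A short numerical illustration of the positive square root (the construction via the eigendecomposition is a standard technique, not something from the text; the random matrix is our own test input):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
M = B @ B.conj().T                 # Hermitian and positive for generic B

w, v = np.linalg.eigh(M)           # real, positive eigenvalues
S = v @ np.diag(np.sqrt(w)) @ v.conj().T   # the positive square root

print(np.allclose(S @ S, M))       # True: S^2 = M
print(np.allclose(S, S.conj().T))  # True: S is Hermitian, as argued above
```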

The definition is also consistent with {U} being unitary, since

\displaystyle   UU^{\dagger} \displaystyle  = \displaystyle  \left(H^{-1}A\right)\left(H^{-1}A\right)^{\dagger}\ \ \ \ \ (53)
\displaystyle  \displaystyle  = \displaystyle  H^{-1}AA^{\dagger}H^{-1}\ \ \ \ \ (54)
\displaystyle  \displaystyle  = \displaystyle  \left(AA^{\dagger}\right)^{-1/2}AA^{\dagger}\left(AA^{\dagger}\right)^{-1/2}\ \ \ \ \ (55)
\displaystyle  \displaystyle  = \displaystyle  I \ \ \ \ \ (56)

[We define {\left(AA^{\dagger}\right)^{-1/2}} to be the inverse of {\left(AA^{\dagger}\right)^{1/2}}.]

Therefore, we can uniquely decompose any Lorentz transformation {\Lambda\left(A\right)} into

\displaystyle  \Lambda\left(A\right)=\Lambda\left(H\right)\Lambda\left(U\right) \ \ \ \ \ (57)

that is, the product of a pure boost {\Lambda\left(H\right)} and a pure rotation {\Lambda\left(U\right)}.
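As a numerical sketch of the whole decomposition (building {\Lambda\left(A\right)} from 16 and {H} and {U} from 47 and 48; the random matrix is our own test input):

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Lam(A):
    """Lambda(A)_{mu nu} = (1/2) Tr(sigma_mu A sigma_nu A^dagger), eq. 16."""
    return np.array([[0.5*np.trace(sigma[mu] @ A @ sigma[nu] @ A.conj().T).real
                      for nu in range(4)] for mu in range(4)])

rng = np.random.default_rng(3)
M = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
A = M / np.sqrt(np.linalg.det(M) + 0j)     # scale so det A = 1

# polar decomposition A = H U, eqs. 46-48
w, v = np.linalg.eigh(A @ A.conj().T)
H = v @ np.diag(np.sqrt(w)) @ v.conj().T   # positive Hermitian square root
U = np.linalg.inv(H) @ A

print(np.allclose(U @ U.conj().T, np.eye(2)))   # True: U is unitary
print(np.allclose(Lam(A), Lam(H) @ Lam(U)))     # True: eq. 57
print(np.allclose(Lam(H), Lam(H).T))            # True: boost factor symmetric
```

The factorization {\Lambda\left(A\right)=\Lambda\left(H\right)\Lambda\left(U\right)} holds numerically because {A\mapsto\Lambda\left(A\right)} respects matrix products, which follows from the completeness of the {\sigma_{\mu}} as a basis for {2\times2} matrices.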

Lorentz transformations and the special linear group SL(2,C)

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes (available at the time of writing, Sep 2016).

Continuing our examination of general Lorentz transformations, we start off with the representation of a spacetime 4-vector as a {2\times2} complex Hermitian matrix:

\displaystyle  \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (1)

Our ultimate goal is to show that any Lorentz transformation can be represented as the product of a pure boost {B} and a pure rotation {R}: {\Lambda=BR}. The step shown in this post may look like little more than an exercise in matrix algebra, but be patient; it takes a while to get to our final goal.

We start by looking at the matrices belonging to the special linear group {SL\left(2,\mathbb{C}\right)}, which consists of {2\times2} matrices containing general complex numbers as elements, and with determinant 1. Each matrix {A\in SL\left(2,\mathbb{C}\right)} can be used to define a linear transformation of the Hermitian matrix 1:

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}A^{\dagger} \ \ \ \ \ (2)

Because the determinant of a product is equal to the product of the determinants, and {\det A=\det A^{\dagger}=1}, {\det\widehat{x}^{\prime}=\det\widehat{x}=x_{\mu}x^{\mu}}. Thus such a transformation leaves the 4-vector length unchanged, so qualifies as a Lorentz transformation. Also, as a general complex {2\times2} matrix contains 4 elements, each with a real and imaginary part, there are 8 parameters. The condition {\det A=1} provides 2 constraints (one on the real part and one on the imaginary part), leaving 6 independent parameters, which is the same as the number of free parameters in a general Lorentz transformation.
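We can check this invariance numerically. The following numpy sketch (an illustration, with arbitrary test values) builds {\widehat{x}} from a random 4-vector, applies 2 with a random {SL\left(2,\mathbb{C}\right)} matrix, and confirms that the determinant (the Minkowski length) and Hermiticity are preserved:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random SL(2,C) matrix, normalized so det A = 1.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = A / np.sqrt(np.linalg.det(A))

# Random 4-vector (x0, x1, x2, x3) and its 2x2 Hermitian form.
x = rng.normal(size=4)
xhat = np.array([[x[0] + x[3], x[1] - 1j * x[2]],
                 [x[1] + 1j * x[2], x[0] - x[3]]])

xhat_p = A @ xhat @ A.conj().T   # the transformation x-hat' = A x-hat A-dagger

assert np.allclose(np.linalg.det(xhat_p), np.linalg.det(xhat))  # length preserved
assert np.allclose(xhat_p, xhat_p.conj().T)                     # still Hermitian
```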

We can give a more detailed proof that {A} provides a Lorentz transformation as follows. Suppose we start with two matrices {A,B\in SL\left(2,\mathbb{C}\right)} and define a transformation

\displaystyle  \widehat{x}^{\prime}=A\widehat{x}B \ \ \ \ \ (3)

[Remember that the hats on {\widehat{x}} and {\widehat{x}^{\prime}} mean that we’re considering the {2\times2} matrix version 1 of the 4-vectors {x} and {x^{\prime}}.] The transformed matrix {\widehat{x}^{\prime}} must be Hermitian for all {\widehat{x}}, so we must have

\displaystyle   \left(A\widehat{x}B\right)^{\dagger} \displaystyle  = \displaystyle  A\widehat{x}B\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  B^{\dagger}\widehat{x}A^{\dagger} \ \ \ \ \ (5)

We now left-multiply by {\left(B^{\dagger}\right)^{-1}} and right-multiply by {B^{-1}} to get

\displaystyle  \left(B^{\dagger}\right)^{-1}A\widehat{x}=\widehat{x}A^{\dagger}B^{-1} \ \ \ \ \ (6)

But we also have

\displaystyle  \left(B^{\dagger}\right)^{-1}A=\left(A^{\dagger}B^{-1}\right)^{\dagger} \ \ \ \ \ (7)

so the matrix

\displaystyle  T\equiv\left(B^{\dagger}\right)^{-1}A \ \ \ \ \ (8)

is Hermitian. We can therefore write 6 as

\displaystyle  T\widehat{x}=\widehat{x}T^{\dagger}=\widehat{x}T \ \ \ \ \ (9)

so {T} commutes with {\widehat{x}} for all {\widehat{x}}.

We can now choose {\widehat{x}=\sigma_{2}} and then {\widehat{x}=\sigma_{3}}, where the {\sigma_{i}} are two of the Pauli matrices which we showed (together with the identity matrix {\sigma_{0}}) form a basis for the space of {2\times2} Hermitian matrices. We've also seen that {\sigma_{2}} and {\sigma_{3}} form an irreducible set, and that any matrix {T} that commutes with all the members of an irreducible set must be a multiple of the identity matrix. Thus we must have

\displaystyle  T=\lambda I \ \ \ \ \ (10)

for some constant {\lambda}. However, since {T} is the product of two matrices {A} and {\left(B^{\dagger}\right)^{-1}}, both of which have determinant 1, {\det T=1} also, which means that {\lambda^{2}=1} and {\lambda=\pm1}. Therefore

\displaystyle   \left(B^{\dagger}\right)^{-1}A \displaystyle  = \displaystyle  \pm I\ \ \ \ \ (11)
\displaystyle  A \displaystyle  = \displaystyle  \pm B^{\dagger} \ \ \ \ \ (12)

Thus the transformation 3 can be written as

\displaystyle  \widehat{x}^{\prime}=\pm A\widehat{x}A^{\dagger} \ \ \ \ \ (13)

To eliminate the {-} sign, suppose that

\displaystyle  \widehat{x}^{\prime}=-A\widehat{x}A^{\dagger} \ \ \ \ \ (14)

A Lorentz transformation giving this result can be written as

\displaystyle  \widehat{x}^{\prime}=\widehat{\Lambda x} \ \ \ \ \ (15)

where {\Lambda} is the {4\times4} matrix giving the Lorentz transformation of the original 4-vector {x}. In the original 4-vector notation, we have

\displaystyle   x_{\mu}^{\prime} \displaystyle  = \displaystyle  \sum_{\nu=0}^{3}\Lambda_{\mu\nu}x_{\nu}\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \left(\Lambda x\right)_{\mu} \ \ \ \ \ (17)

From the relation between the 4-vector and {2\times2} matrix representations, we have

\displaystyle  x_{\mu}^{\prime}=\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (18)

where {\left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle } is the inner product of the two matrices. Therefore from 14

\displaystyle   \left(\Lambda x\right)_{\mu} \displaystyle  = \displaystyle  \left\langle \sigma_{\mu},\widehat{x}^{\prime}\right\rangle \ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\widehat{x}A^{\dagger}\right\rangle \ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{\mu},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (21)

If we choose {x=\left(1,0,0,0\right)}, we have

\displaystyle   \left(\Lambda x\right)_{0} \displaystyle  = \displaystyle  \Lambda_{00}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\left(\sum_{\nu=0}^{3}\sigma_{\nu}x_{\nu}\right)A^{\dagger}\right\rangle \ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},A\sigma_{0}A^{\dagger}\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  -\left\langle \sigma_{0},AA^{\dagger}\right\rangle \ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{2}\mbox{Tr}\left(AA^{\dagger}\right)\ \ \ \ \ (26)
\displaystyle  \displaystyle  \le \displaystyle  0 \ \ \ \ \ (27)

where the penultimate line follows from the definition of the inner product. The last line follows because

\displaystyle  \mbox{Tr}\left(AA^{\dagger}\right)=\sum_{i,j}\left|A_{ij}\right|^{2}\ge0 \ \ \ \ \ (28)

Since we’re requiring the transformation to be orthochronous, we must have {\Lambda_{00}\ge1}, so we must exclude the {-} sign in 13, giving 2.

Finally, we can show that the transformation matrix {A} is unique, up to a sign. We can prove this by supposing that there are two different {SL\left(2,\mathbb{C}\right)} matrices {A} and {B} that give the same transformation for all {\widehat{x}}, that is

\displaystyle  A\widehat{x}A^{\dagger}=B\widehat{x}B^{\dagger} \ \ \ \ \ (29)

This implies

\displaystyle   B^{-1}A\widehat{x}A^{\dagger}\left(B^{\dagger}\right)^{-1} \displaystyle  = \displaystyle  \widehat{x}\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  B^{-1}A\widehat{x}\left(B^{-1}A\right)^{\dagger} \ \ \ \ \ (31)

We can now choose {\widehat{x}=I}, which shows that

\displaystyle  \left(B^{-1}A\right)^{\dagger}=\left(B^{-1}A\right)^{-1} \ \ \ \ \ (32)

which means (by definition), {B^{-1}A} is unitary, so for all {\widehat{x}}

\displaystyle  \widehat{x}=B^{-1}A\widehat{x}\left(B^{-1}A\right)^{-1} \ \ \ \ \ (33)

This means that {B^{-1}A} commutes with {\widehat{x}} for all {\widehat{x}} (that’s the only way we can cancel {B^{-1}A} off the RHS). Using the same argument as above, we can choose {\widehat{x}} to be two of the Pauli matrices, which form an irreducible set. Since {B^{-1}A} commutes with both these matrices, it must be a multiple {\lambda} of the identity:

\displaystyle   B^{-1}A \displaystyle  = \displaystyle  \lambda I\ \ \ \ \ (34)
\displaystyle  A \displaystyle  = \displaystyle  \lambda B \ \ \ \ \ (35)

Since {\det A=\det B=1} and for a {2\times2} matrix {\det\left(\lambda B\right)=\lambda^{2}\det B}, we have {\lambda^{2}=1}, so {\lambda=\pm1}. Therefore {A} is unique up to a sign.

In summary, what we’ve done in this post is show that a restricted Lorentz transformation {\Lambda} (that is, one where {\det\Lambda=+1} and {\Lambda_{00}\ge1}) can be represented by a matrix {A\in SL\left(2,\mathbb{C}\right)} where {A} is unique up to a sign.
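We can make this correspondence concrete with a short numerical sketch (my own illustration, using the inner product 18 of the previous post to extract the components {\Lambda_{\mu\nu}=\left\langle \sigma_{\mu},A\sigma_{\nu}A^{\dagger}\right\rangle }), verifying that the resulting {4\times4} matrix is a proper orthochronous Lorentz transformation:

```python
import numpy as np

# sigma_0 = I plus the three Pauli matrices.
sigma = [np.eye(2),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]
g = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = A / np.sqrt(np.linalg.det(A))      # det A = 1

# Lambda_{mu nu} = <sigma_mu, A sigma_nu A-dagger> = (1/2) Tr(sigma_mu A sigma_nu A-dagger)
Lam = np.array([[0.5 * np.trace(sigma[m] @ A @ sigma[n] @ A.conj().T).real
                 for n in range(4)] for m in range(4)])

assert np.allclose(Lam.T @ g @ Lam, g)     # Lorentz condition
assert np.isclose(np.linalg.det(Lam), 1)   # proper
assert Lam[0, 0] >= 1                      # orthochronous
```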

Lorentz transformations as 2×2 matrices

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Continuing our examination of general Lorentz transformations, recall that a Lorentz transformation can be represented by a {4\times4} matrix {\Lambda} which preserves the Minkowski length {x_{\mu}x^{\mu}} of all four-vectors {x}. This leads to the condition

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (1)

where {g} is the flat-space Minkowski metric

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

It turns out that we can map any 4-vector {x} to a {2\times2} Hermitian matrix {\widehat{x}} defined as

\displaystyle  \widehat{x}\equiv\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right] \ \ \ \ \ (3)

[Recall that a Hermitian matrix {H} is equal to the complex conjugate of its transpose:

\displaystyle  H=\left(H^{T}\right)^*\equiv H^{\dagger} \ \ \ \ \ (4)

Also note that Jaffe uses an unconventional notation for the Hermitian conjugate, as he uses a superscript * rather than a superscript {\dagger}. This can be confusing since usually a superscript * indicates just the complex conjugate, without the transpose. I’ll use the more usual superscript {\dagger} for Hermitian conjugate here.]

Although we’re used to the scalar product of two vectors, it is also useful to define the scalar product of two matrices as

\displaystyle  \left\langle A,B\right\rangle \equiv\frac{1}{2}\mbox{Tr}\left(A^{\dagger}B\right) \ \ \ \ \ (5)

where ‘Tr’ means the trace of a matrix, which is the sum of its diagonal elements. Note that the scalar product of {\widehat{x}} with itself is

\displaystyle   \left\langle \widehat{x},\widehat{x}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\left[\begin{array}{cc} x_{0}+x_{3} & x_{1}-ix_{2}\\ x_{1}+ix_{2} & x_{0}-x_{3} \end{array}\right]\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left[\left(x_{0}+x_{3}\right)^{2}+2\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)+\left(x_{0}-x_{3}\right)^{2}\right]\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2} \ \ \ \ \ (8)

The determinant of {\widehat{x}} is

\displaystyle   \det\widehat{x} \displaystyle  = \displaystyle  \left(x_{0}+x_{3}\right)\left(x_{0}-x_{3}\right)-\left(x_{1}-ix_{2}\right)\left(x_{1}+ix_{2}\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  x_{\mu}x^{\mu} \ \ \ \ \ (11)

Thus {\det\widehat{x}} is the Minkowski length squared.
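As a quick numerical check (an illustration with arbitrary values, not from the references), we can verify both identities: the determinant of {\widehat{x}} gives the Minkowski length squared 11, while the scalar product 8 gives the Euclidean sum of squares:

```python
import numpy as np

x = np.array([2.0, 0.3, -1.1, 0.7])   # (x0, x1, x2, x3)
xhat = np.array([[x[0] + x[3], x[1] - 1j * x[2]],
                 [x[1] + 1j * x[2], x[0] - x[3]]])

minkowski = x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2   # x_mu x^mu
euclidean = np.sum(x**2)

# det x-hat is the Minkowski length squared; <x-hat, x-hat> is the Euclidean one.
assert np.isclose(np.linalg.det(xhat).real, minkowski)
assert np.isclose(0.5 * np.trace(xhat @ xhat).real, euclidean)
```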

From 3, we observe that we can write {\widehat{x}} as a sum:

\displaystyle  \widehat{x}=\sum_{\mu=0}^{3}x_{\mu}\sigma_{\mu} \ \ \ \ \ (12)

where the {\sigma_{\mu}} are four Hermitian matrices:

\displaystyle   \sigma_{0} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]=I\ \ \ \ \ (13)
\displaystyle  \sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right]\ \ \ \ \ (14)
\displaystyle  \sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (15)
\displaystyle  \sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right] \ \ \ \ \ (16)

The last three are the Pauli spin matrices that we met when looking at spin-{\frac{1}{2}} in quantum mechanics.

The {\sigma_{\mu}} are orthonormal under the scalar product operation, as we can verify by direct calculation. For example

\displaystyle   \left\langle \sigma_{2},\sigma_{3}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right]\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left(0+0\right)\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  0 \ \ \ \ \ (19)

And:

\displaystyle   \left\langle \sigma_{2},\sigma_{2}\right\rangle \displaystyle  = \displaystyle  \frac{1}{2}\mbox{Tr}\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\left[\begin{array}{cc} 0 & -i\\ i & 0 \end{array}\right]\ \ \ \ \ (20)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2}\left(1+1\right)\ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  1 \ \ \ \ \ (22)

The other products work out similarly, so we have

\displaystyle  \left\langle \sigma_{\mu},\sigma_{\nu}\right\rangle =\delta_{\mu\nu} \ \ \ \ \ (23)

We can work out the inverse transformation to 3 by taking the scalar product of 12 with {\sigma_{\nu}}:

\displaystyle   \left\langle \sigma_{\nu},\widehat{x}\right\rangle \displaystyle  = \displaystyle  \sum_{\mu=0}^{3}x_{\mu}\left\langle \sigma_{\nu},\sigma_{\mu}\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  \sum_{\mu=0}^{3}x_{\mu}\delta_{\nu\mu}\ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  x_{\nu} \ \ \ \ \ (26)
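The forward map 12 and this inverse can be checked round-trip numerically. Here is a minimal numpy sketch (my own illustration) that expands a 4-vector in the {\sigma_{\mu}} basis and recovers its components with the scalar product:

```python
import numpy as np

sigma = [np.eye(2),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]

x = np.array([1.5, -0.4, 2.2, 0.9])   # arbitrary test 4-vector

# Forward map: x-hat = sum_mu x_mu sigma_mu
xhat = sum(x[m] * sigma[m] for m in range(4))

# Inverse map: x_nu = <sigma_nu, x-hat> = (1/2) Tr(sigma_nu x-hat)
x_back = np.array([0.5 * np.trace(sigma[n] @ xhat).real for n in range(4)])

assert np.allclose(x_back, x)   # round trip recovers the original components
```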

Now a few more theorems that will be useful later.

Irreducible Sets of Matrices

A set of matrices {\mathfrak{U}} is called irreducible if the only matrix {C} that commutes with every matrix in {\mathfrak{U}} is the identity matrix {I} (or a multiple of {I}). Any two of the three Pauli matrices {\sigma_{i}}, {i=1,2,3} above form an irreducible set of {2\times2} Hermitian matrices. This can be shown by direct calculation, which Jaffe does in detail in his article. For example, if we define {C} to be some arbitrary matrix

\displaystyle  C=\left[\begin{array}{cc} a & b\\ c & d \end{array}\right] \ \ \ \ \ (27)

where {a,b,c,d} are complex numbers, then

\displaystyle   C\sigma_{1} \displaystyle  = \displaystyle  \left[\begin{array}{cc} b & a\\ d & c \end{array}\right]\ \ \ \ \ (28)
\displaystyle  \sigma_{1}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} c & d\\ a & b \end{array}\right] \ \ \ \ \ (29)

If {C} is to commute with {\sigma_{1}}, we must therefore require {b=c} and {a=d}.

Similarly, for {\sigma_{2}} we have

\displaystyle   C\sigma_{2} \displaystyle  = \displaystyle  \left[\begin{array}{cc} ib & -ia\\ id & -ic \end{array}\right]\ \ \ \ \ (30)
\displaystyle  \sigma_{2}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} -ic & -id\\ ia & ib \end{array}\right] \ \ \ \ \ (31)

so that {C\sigma_{2}=\sigma_{2}C} requires {b=-c} and {a=d}.

And for {\sigma_{3}}:

\displaystyle   C\sigma_{3} \displaystyle  = \displaystyle  \left[\begin{array}{cc} a & -b\\ c & -d \end{array}\right]\ \ \ \ \ (32)
\displaystyle  \sigma_{3}C \displaystyle  = \displaystyle  \left[\begin{array}{cc} a & b\\ -c & -d \end{array}\right] \ \ \ \ \ (33)

so that {C\sigma_{3}=\sigma_{3}C} requires {b=-b} and {c=-c}, so {b=c=0} (no conditions can be inferred for {a} or {d}).

If we form a set {\mathfrak{U}} containing {\sigma_{3}} and one of {\sigma_{1}} or {\sigma_{2}}, we see that {b=c=0} and {a=d}, so {C} is a multiple of {I}. If we form {\mathfrak{U}} from {\sigma_{1}} and {\sigma_{2}} we again have {a=d}, but we must have simultaneously {b=c} and {b=-c} which can be true only if {b=c=0}, so again {C} is a multiple of {I}.
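Irreducibility can also be checked numerically: the matrices commuting with both members of the set form the null space of a linear map, and for an irreducible set that null space is one-dimensional (multiples of {I} only). A small numpy sketch (my own illustration, for the pair {\sigma_{1},\sigma_{2}}):

```python
import numpy as np

sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)
sigma2 = np.array([[0, -1j], [1j, 0]])

# Build the linear map C -> ([C, sigma1], [C, sigma2]) as an 8x4 matrix,
# one column per basis matrix E_ij of the 2x2 complex matrices.
cols = []
for i in range(2):
    for j in range(2):
        E = np.zeros((2, 2), dtype=complex)
        E[i, j] = 1
        col = np.concatenate([(E @ sigma1 - sigma1 @ E).ravel(),
                              (E @ sigma2 - sigma2 @ E).ravel()])
        cols.append(col)
M = np.array(cols).T

# The null space of M is the set of matrices commuting with both sigma1 and sigma2.
s = np.linalg.svd(M, compute_uv=False)
null_dim = int(np.sum(s < 1e-12))
assert null_dim == 1   # only multiples of the identity commute with both
```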

Unitary Matrices

A unitary matrix is one whose Hermitian conjugate is its inverse, so that {U^{\dagger}=U^{-1}}. Some properties of unitary matrices are given on the Wikipedia page, so we’ll just use those without going through the proofs. First, a unitary matrix is normal, which means that {U^{\dagger}U=UU^{\dagger}} (this actually follows from the condition {U^{\dagger}=U^{-1}}). Second, there is another unitary matrix {V} which diagonalizes {U}, that is

\displaystyle  V^{\dagger}UV=D \ \ \ \ \ (34)

where {D} is a diagonal, unitary matrix.

Third,

\displaystyle  \left|\det U\right|=1 \ \ \ \ \ (35)

(The determinant can be complex, but has magnitude 1.)

From this it follows that {\left|\det D\right|=1} and since {D} is unitary and diagonal, each diagonal element {d_{j}} of {D} must satisfy {\left|d_{j}\right|=1}. (Remember that {d_{j}} could be a complex number.) That means that {d_{j}=e^{i\lambda_{j}}} for some real number {\lambda_{j}}, so we can write

\displaystyle  D=e^{i\Lambda} \ \ \ \ \ (36)

where {\Lambda} is a diagonal Hermitian matrix whose diagonal elements are the real numbers {\lambda_{j}}, that is, {\Lambda_{ij}=\lambda_{j}\delta_{ij}}. As usual, the exponential of a matrix is interpreted in terms of its power series, so that

\displaystyle  e^{i\Lambda}=1+i\Lambda+\frac{\left(i\Lambda\right)^{2}}{2!}+\frac{\left(i\Lambda\right)^{3}}{3!}+\ldots \ \ \ \ \ (37)

For a diagonal matrix {\Lambda} with diagonal elements {\Lambda_{jj}=\lambda_{j}}, the diagonal elements of {\Lambda^{n}} are just {\Lambda_{jj}^{n}=\lambda_{j}^{n}}.

From 34, we have

\displaystyle   U \displaystyle  = \displaystyle  VDV^{\dagger}\ \ \ \ \ (38)
\displaystyle  \displaystyle  = \displaystyle  Ve^{i\Lambda}V^{\dagger} \ \ \ \ \ (39)

Now we also have, since {VV^{\dagger}=I}

\displaystyle   V\Lambda^{n}V^{\dagger} \displaystyle  = \displaystyle  V\Lambda\left(VV^{\dagger}\right)\Lambda\left(VV^{\dagger}\right)\ldots\Lambda V^{\dagger}\ \ \ \ \ (40)
\displaystyle  \displaystyle  = \displaystyle  \left(V\Lambda V^{\dagger}\right)^{n} \ \ \ \ \ (41)

Therefore, from 37

\displaystyle   U \displaystyle  = \displaystyle  Ve^{i\Lambda}V^{\dagger}\ \ \ \ \ (42)
\displaystyle  \displaystyle  = \displaystyle  e^{iV\Lambda V^{\dagger}}\ \ \ \ \ (43)
\displaystyle  \displaystyle  \equiv \displaystyle  e^{iH} \ \ \ \ \ (44)

where {H=V\Lambda V^{\dagger}} is another Hermitian matrix. In other words, we can always write a unitary matrix as the exponential of a Hermitian matrix.
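This construction can be tested numerically. The sketch below (my own illustration, using scipy's complex Schur decomposition to get a unitary diagonalizing matrix {V}) recovers {H} from a random unitary {U} and confirms {U=e^{iH}}:

```python
import numpy as np
from scipy.linalg import expm, schur

rng = np.random.default_rng(3)

# A random 2x2 unitary U from the QR decomposition of a random complex matrix.
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

# For a normal matrix the complex Schur form is diagonal, with unitary V.
D, V = schur(U, output='complex')
lam = np.angle(np.diag(D))           # d_j = e^{i lambda_j}
H = V @ np.diag(lam) @ V.conj().T    # H = V Lambda V-dagger, Hermitian

assert np.allclose(H, H.conj().T)    # H is Hermitian
assert np.allclose(expm(1j * H), U)  # U = e^{iH}
```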

In the case where {H} is a {2\times2} matrix, we can write it in terms of the {\sigma_{\mu}} matrices above as

\displaystyle  H=\sum_{\mu=0}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (45)

where the {a_{\mu}} are real. This follows because the {\sigma_{\mu}} form an orthonormal basis for the space of {2\times2} Hermitian matrices, so each coefficient {a_{\mu}=\left\langle \sigma_{\mu},H\right\rangle =\frac{1}{2}\mbox{Tr}\left(\sigma_{\mu}H\right)} is real when {H} is Hermitian. [For some reason, Jaffe refers to the {a_{\mu}} as {\lambda_{\mu}}, which is confusing since he has used {\lambda_{\mu}} for the diagonal elements of {\Lambda} above, and they’re not the same thing.]

If {\det U=+1}, then

\displaystyle   \det U \displaystyle  = \displaystyle  \det\left(VDV^{\dagger}\right)\ \ \ \ \ (46)
\displaystyle  \displaystyle  = \displaystyle  \det\left(VV^{\dagger}D\right)\ \ \ \ \ (47)
\displaystyle  \displaystyle  = \displaystyle  \det D\ \ \ \ \ (48)
\displaystyle  \displaystyle  = \displaystyle  \det e^{i\Lambda} \ \ \ \ \ (49)

The second line follows because the determinant of a product of matrices is the product of the determinants, so we can rearrange the multiplication order. To evaluate the last line, we observe that for a diagonal matrix {\Lambda}, using 37 and applying the result to each diagonal element

\displaystyle  e^{i\Lambda}=\left[\begin{array}{cc} e^{i\Lambda_{11}} & 0\\ 0 & e^{i\Lambda_{22}} \end{array}\right] \ \ \ \ \ (50)

Therefore

\displaystyle  \det e^{i\Lambda}=e^{i\left(\Lambda_{11}+\Lambda_{22}\right)}=e^{i\mbox{Tr}\Lambda} \ \ \ \ \ (51)

[By the way, the relation {\det e^{A}=e^{\mbox{Tr}A}} is actually true for any square matrix {A}, and is a corollary of Jacobi’s formula.]
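The general relation {\det e^{A}=e^{\mbox{Tr}A}} is easy to confirm numerically for a random matrix (a quick illustration using scipy's matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# Corollary of Jacobi's formula: det e^A = e^{Tr A} for any square matrix A.
assert np.allclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))
```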

We can now use the cyclic property of the trace (another matrix algebra theorem), which says that for 3 matrices {A,B,C},

\displaystyle  \mbox{Tr}\left(ABC\right)=\mbox{Tr}\left(CAB\right)=\mbox{Tr}\left(BCA\right) \ \ \ \ \ (52)

This gives us

\displaystyle  \mbox{Tr}H=\mbox{Tr}\left(V\Lambda V^{\dagger}\right)=\mbox{Tr}\left(V^{\dagger}V\Lambda\right)=\mbox{Tr}\Lambda \ \ \ \ \ (53)

Finally, from 45 and the fact that the traces of the {\sigma_{i}} are all zero for {i=1,2,3}, and {\mbox{Tr}\sigma_{0}=2}, we have

\displaystyle  \det U=\det e^{i\Lambda}=e^{i\mbox{Tr}H}=e^{2ia_{0}}=1 \ \ \ \ \ (54)

Thus {a_{0}=n\pi} for some integer {n}, but as all values of {n} give the same original unitary matrix {U}, we can choose {n=0} so that {a_{0}=0} and

\displaystyle  H=\sum_{\mu=1}^{3}a_{\mu}\sigma_{\mu} \ \ \ \ \ (55)

Lorentz transformations as rotations

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Before we apply Noether’s theorem to Lorentz transformations, we need to take a step back and look at a generalized version of the Lorentz transformation. Most introductory treatments of special relativity derive the Lorentz transformation as the transformation between two inertial frames that are moving at some constant velocity with respect to each other. This form of the transformation allows us to derive the usual consequences of special relativity such as length contraction and time dilation. However, it’s useful to look at a Lorentz transformation in a more general way.

The idea is to define a Lorentz transformation as any transformation that leaves the magnitude of all four-vectors {x} unchanged, where this magnitude is defined using the usual flat space metric {g^{\mu\nu}} so that

\displaystyle  x^{2}=x_{\mu}x^{\mu}=g^{\mu\nu}x_{\mu}x_{\nu}=x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2} \ \ \ \ \ (1)

The flat space (Minkowski) metric is

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

We know that the traditional Lorentz transformation between two inertial frames in relative motion satisfies this condition, but in fact a rotation of the coordinate system in 3-d space (leaving the time coordinate unchanged) also satisfies this condition, so a Lorentz transformation defined in this more general way includes more transformations than the traditional one.

We can define this general transformation in terms of a {4\times4} matrix {\Lambda}, so that a four-vector {x} transforms to another vector {x^{\prime}} according to

\displaystyle  x^{\prime}=\Lambda x \ \ \ \ \ (3)

We can define the scalar product of two 4-vectors using the notation

\displaystyle  \left\langle x,y\right\rangle \equiv\sum_{i=0}^{3}x_{i}y_{i} \ \ \ \ \ (4)

The scalar product in flat space using the Minkowski metric {g} is therefore

\displaystyle  \left\langle x,gy\right\rangle =g^{\mu\nu}x_{\mu}y_{\nu}=x_{0}y_{0}-x_{1}y_{1}-x_{2}y_{2}-x_{3}y_{3} \ \ \ \ \ (5)

In matrix notation, in which {x} and {y} are column vectors, this is

\displaystyle  \left\langle x,gy\right\rangle =x^{T}gy \ \ \ \ \ (6)

In this way, the condition that {\Lambda} leaves the magnitude unchanged is

\displaystyle  \left\langle \Lambda x,g\Lambda x\right\rangle =\left\langle x,gx\right\rangle \ \ \ \ \ (7)

for all {x}. In matrix notation, this is

\displaystyle  \left(\Lambda x\right)^{T}g\Lambda x=x^{T}\Lambda^{T}g\Lambda x=x^{T}gx \ \ \ \ \ (8)

from which we get one condition on {\Lambda}:

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (9)

[Note that Jaffe uses a superscript {tr} to indicate a matrix transpose; I find this confusing as {tr} usually means the trace of a matrix, and a superscript {T} is more usual for the transpose.]

Because both sides of 9 refer to a symmetric matrix (on the LHS, {\left(\Lambda^{T}g\Lambda\right)^{T}=\Lambda^{T}g^{T}\left(\Lambda^{T}\right)^{T}=\Lambda^{T}g\Lambda}), this equation gives 10 independent equations for the elements of {\Lambda}, so the number of parameters that can be specified arbitrarily is {4\times4-10=6}.

The set {\mathcal{L}} of all Lorentz transformations forms a group under matrix multiplication, known as the Lorentz group. We can demonstrate this by showing that the four group properties are satisfied.

First, closure. If we perform two transformations in succession on a 4-vector {x}, we get {x^{\prime}=\Lambda_{2}\Lambda_{1}x}. The compound transformation satisfies 9:

\displaystyle   \left(\Lambda_{2}\Lambda_{1}\right)^{T}g\Lambda_{2}\Lambda_{1} \displaystyle  = \displaystyle  \Lambda_{1}^{T}\Lambda_{2}^{T}g\Lambda_{2}\Lambda_{1}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{1}^{T}g\Lambda_{1}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (12)

Thus the group is closed under multiplication.

Second, associativity is automatically satisfied as matrix multiplication is associative.

An identity element exists in the form of the identity matrix {I}, which is itself a Lorentz transformation as it satisfies 9.

Finally, we need to show that every matrix {\Lambda} has an inverse that is also part of the set {\mathcal{L}}. Taking the determinant of 9 we have

\displaystyle   \det\left(\Lambda^{T}g\Lambda\right) \displaystyle  = \displaystyle  \left(\det\Lambda^{T}\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \left(\det\Lambda\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  -\left(\det\Lambda\right)^{2} \ \ \ \ \ (15)

since {\det g=-1} from 2. From the RHS of 9, this must equal {\det g=-1} so we have

\displaystyle   -\left(\det\Lambda\right)^{2} \displaystyle  = \displaystyle  -1\ \ \ \ \ (16)
\displaystyle  \det\Lambda \displaystyle  = \displaystyle  \pm1 \ \ \ \ \ (17)

From a basic theorem in matrix algebra, any matrix with a non-zero determinant has an inverse, so {\Lambda^{-1}} exists. To show that {\Lambda^{-1}} is a Lorentz transformation, we can take the inverse of 9 and use the fact that {g^{-1}=g}:

\displaystyle   \left(\Lambda^{T}g\Lambda\right)^{-1} \displaystyle  = \displaystyle  g^{-1}=g\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{T}\right)^{-1}\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T} \ \ \ \ \ (20)

since the inverse and transpose operations commute (another basic theorem in matrix algebra). Therefore {\Lambda^{-1}} is also a valid Lorentz transformation.

We can also see that {\Lambda^{T}} is a valid transformation by left-multiplying by {\Lambda} and right-multiplying by {\Lambda^{T}}:

\displaystyle   g \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T}\ \ \ \ \ (21)
\displaystyle  \Lambda g\Lambda^{T} \displaystyle  = \displaystyle  \left(\Lambda\Lambda^{-1}\right)g\left(\Lambda^{-1}\right)^{T}\Lambda^{T}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (23)

We need one more property of {\Lambda} concerning the element {\Lambda_{00}}. Again starting from 9, the 00 component of the RHS is {g_{00}=1}, and writing out the 00 component of the LHS explicitly we have

\displaystyle  \left[\Lambda^{T}g\Lambda\right]_{00}=\Lambda_{00}^{2}-\sum_{i=1}^{3}\Lambda_{i0}^{2}=1 \ \ \ \ \ (24)

This gives

\displaystyle  \Lambda_{00}=\pm\sqrt{1+\sum_{i=1}^{3}\Lambda_{i0}^{2}} \ \ \ \ \ (25)

Thus either {\Lambda_{00}\ge1} or {\Lambda_{00}\le-1}.

From the determinant and {\Lambda_{00}}, we can classify a particular transformation matrix {\Lambda} as being in one of four so-called connected components. Jaffe spells out in detail the proof that these four components are disjoint, that is, we can’t define some parameter {s} that can be varied continuously to move a matrix {\Lambda} from one connected component to another connected component. The notation {\mathcal{L}_{+}^{\uparrow}} indicates the set of matrices with {\det\Lambda=+1} (indicated by the + subscript) and {\Lambda_{00}\ge1} (indicated by the {\uparrow} superscript). The other three connected components are {\mathcal{L}_{-}^{\uparrow}} ({\det\Lambda=-1}, {\Lambda_{00}\ge1}); {\mathcal{L}_{+}^{\downarrow}} ({\det\Lambda=+1}, {\Lambda_{00}\le-1}); and {\mathcal{L}_{-}^{\downarrow}} ({\det\Lambda=-1}, {\Lambda_{00}\le-1}). Not all of these subsets of {\mathcal{L}} form groups, as some of them are not closed under multiplication.

If {\det\Lambda=+1}, {\Lambda} is called proper, and if {\det\Lambda=-1}, {\Lambda} is called improper. If {\Lambda_{00}\ge+1}, {\Lambda} is orthochronous, and if {\Lambda_{00}\le-1}, {\Lambda} is non-orthochronous. From here on, we’ll consider only proper orthochronous transformations, that is, the connected component {\mathcal{L}_{+}^{\uparrow}}.

Members of {\mathcal{L}_{+}^{\uparrow}} can be subdivided again into two types: pure rotations and pure boosts. A pure rotation is a rotation (about the origin) in 3-d space, leaving the time coordinate unchanged. That is, {\Lambda_{00}=+1}. Such a transformation can be written as

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (26)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. We know that the off-diagonal elements in the first column must be zero, since if {\Lambda_{00}=+1}, we have from 25 that

\displaystyle  \sum_{i=1}^{3}\Lambda_{i0}^{2}=0 \ \ \ \ \ (27)

Since {\Lambda^{T}} must also be a valid transformation, this gives the analogous equation

\displaystyle  \sum_{i=1}^{3}\Lambda_{0i}^{2}=0 \ \ \ \ \ (28)

Thus the off-diagonal elements of the top row of {\Lambda} are also zero.

Since {\det\Lambda=1}, we must have {\det\mathcal{R}=1}. From 9, {\mathcal{R}} must also be an orthogonal matrix, that is, its rows must be mutually orthogonal (as must its columns). For example, if we pick the 2,3 element in the product 9, we have

\displaystyle   \left[\Lambda^{T}g\Lambda\right]_{23} \displaystyle  = \displaystyle  g_{23}=0\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  -\sum_{i=1}^{3}\Lambda_{i2}\Lambda_{i3} \ \ \ \ \ (30)

Thus columns 2 and 3 must be orthogonal.

These matrices form a group known as {SO\left(3\right)}, the group of real, orthogonal, {3\times3} matrices with {\det\mathcal{R}=+1}. A familiar example is a rotation by an angle {\theta} about the {z} axis, for which

\displaystyle  \mathcal{R}=\left[\begin{array}{ccc} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (31)

giving the full transformation matrix as

\displaystyle  \Lambda=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (32)

In general, a rotation can be about any axis through the origin, in which case {\mathcal{R}} gets more complicated, but the idea is the same.

We’ve already seen that a pure boost, that is, a transformation into a second inertial frame moving at some constant velocity in a given direction relative to the first frame, can be written as a rotation if we use hyperbolic functions instead of trig functions. In this case {\Lambda_{00}>+1}. The standard situation from introductory special relativity is that of a frame {S^{\prime}} moving along the {x_{1}} axis at some constant speed {\beta}. If we define

\displaystyle   \cosh\chi \displaystyle  \equiv \displaystyle  \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (33)
\displaystyle  \sinh\chi \displaystyle  \equiv \displaystyle  \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (34)

then the transformation is

\displaystyle  \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (35)

This has determinant +1 since {\cosh^{2}\chi-\sinh^{2}\chi=1}. We can verify by direct substitution that 9 is satisfied.

It turns out that all proper, orthochronous Lorentz transformations can be written as the product of a pure rotation and a pure boost, that is

\displaystyle  \Lambda=BR \ \ \ \ \ (36)

where the pure rotation {R} is applied first, followed by a pure boost {B}. (Jaffe doesn’t prove this at this point; we’ll return to this later.)
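A numerical sketch of this decomposition (with arbitrary sample parameters, purely as an illustration): multiplying the boost 35 by the rotation 32 gives a matrix that is still a proper, orthochronous Lorentz transformation.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

def rotation(theta):
    """Pure spatial rotation about the z axis, eq (32)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1.0]])

def boost(chi):
    """Pure boost along x_1 with rapidity chi, eq (35)."""
    ch, sh = np.cosh(chi), np.sinh(chi)
    return np.array([[ch, sh, 0, 0],
                     [sh, ch, 0, 0],
                     [0, 0, 1.0, 0],
                     [0, 0, 0, 1.0]])

Lam = boost(0.5) @ rotation(1.2)   # Lambda = B R, rotation applied first

# The product still satisfies the Lorentz condition, is proper
# (det = +1) and orthochronous (Lambda_00 >= 1).
assert np.allclose(Lam.T @ g @ Lam, g)
assert np.isclose(np.linalg.det(Lam), 1.0)
assert Lam[0, 0] >= 1.0
```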

Noether’s theorem and conservation of energy and momentum

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

An important example of Noether’s theorem is the conservation of energy and momentum as consequences of the invariance of the action under coordinate translation in spacetime. Noether’s theorem applies to the situation where we transform the coordinates according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)

 

resulting in a variation of the fields

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)

 

If this variation in coordinates and fields leaves the action integral unchanged, Noether’s theorem says that the following condition must be satisfied:

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial_{\nu}\phi_{r}\delta x^{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)=0 \ \ \ \ \ (3)

 

By integrating this over 3-d space and using Gauss’s law, we find a conserved quantity {G}, given by

\displaystyle G\equiv\int_{V}d^{3}x\;\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial_{\nu}\phi_{r}\delta x^{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{0}\right) \ \ \ \ \ (4)

 

Suppose we consider a translation in spacetime, so that the coordinates transform according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\epsilon_{\mu} \ \ \ \ \ (5)

where the {\epsilon_{\mu}}s are infinitesimal (and independent) constants. That is, we’re free to vary any (or all) of the coordinates by some infinitesimal amount. In particular, we can choose to make only one of the {\epsilon_{\mu}} variations non-zero. For example, we might choose {\epsilon_{0}} to be non-zero with the remaining three {\epsilon_{i}=0}, which amounts to a translation in time but not in position.

Such a translation means that we perform the same experiment (the same ‘physics’) at a different time and/or at a different place, and we require that we get the same result under all such translations. Note that this does not mean that the behaviour of a system is independent of time or space. Rather, what it is saying is that if we imagine that the only thing that exists in the universe is the physical system we’re studying, it shouldn’t matter if we move the system to some other location, or start the experiment at an earlier or later time; in all cases we should observe the same behaviour. The system might evolve to different states as time passes, but the time-dependence of the system will be the same, as measured from the starting point we have chosen.

In terms of the fields, this amounts to saying that the fields will have exactly the same form when expressed in terms of the translation coordinates, that is

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right) \ \ \ \ \ (6)

[Recall from our earlier discussion that {x^{\prime}} and {x} both refer to the same point, but written in different coordinate systems. Under a translation, the value of a scalar field remains the same, as does a vector field, since all we’ve done is move the coordinate axes parallel to themselves. This is different from a rotation of the coordinates, under which a vector does change its components in the new coordinate system (although its length remains unchanged).]

Thus we have

\displaystyle \delta\phi_{r}\left(x\right)=0 \ \ \ \ \ (7)

and from 3

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)\epsilon^{\nu}=0 \ \ \ \ \ (8)

where we’ve used {\delta x_{\nu}=\epsilon_{\nu}} and used the metric tensor {g_{\mu\nu}} to lower the index: {\epsilon_{\mu}=g_{\mu\nu}\epsilon^{\nu}}. Since the {\epsilon_{\nu}} are arbitrary, we must have

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (9)

 

for each value of {\nu=0,1,2,3} separately. Thus we get four conservation laws.

For {\nu=0} we can apply the same procedure that was used to derive 4 from 3. That is, we integrate over 3-d space and use Gauss’s law:

\displaystyle \int_{V}d^{3}x\;\partial^{i}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{i}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{i0}\mathcal{L}\left(x\right)\right)=\partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)\right) \ \ \ \ \ (10)

On the LHS, the index {i} runs over the spatial indices 1, 2 and 3, and we’ve set {\nu=0} on both sides. The integral on the LHS is a divergence, so we use Gauss’s law to convert this to a surface integral and extend the surface to infinity, requiring the integrand to go to zero fast enough that the integral vanishes in the limit. We then get

\displaystyle \partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (11)

 

so that the integral is a conserved quantity (it has zero time derivative).

Comparing this with Hamilton’s equations of motion, we have the conjugate momentum density

\displaystyle \pi_{r}=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)} \ \ \ \ \ (12)

so the integrand of 11 becomes Hamilton’s equation for the Hamiltonian density (using {g_{00}=1} in flat space):

\displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)=\pi_{r}\dot{\phi}_{r}-\mathcal{L}=\mathcal{H} \ \ \ \ \ (13)

Since {\mathcal{H}} is the energy density, 11 says that the total energy of the system is constant in time, so energy is conserved.

We can repeat the procedure for the other three values of {\nu} to get

\displaystyle \partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{i}\phi_{r}-g_{0i}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (14)

 

where the index {i=1,2,3}.

Since {g_{0i}=0} in flat space, the integrand reduces to

\displaystyle p_{i}=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{i}\phi_{r}=\pi_{r}\frac{\partial\phi_{r}}{\partial x^{i}} \ \ \ \ \ (15)

As we’ve seen earlier, we can interpret this quantity as the physical momentum density, so 14 says that each component of the total physical momentum is conserved. Thus requiring a physical system to be invariant under translation in spacetime results in the laws of conservation of energy and linear momentum.
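To make these conservation laws concrete, here is a minimal numerical sketch for one specific choice of field (an assumption for illustration, not from G & R): a free massless scalar field in one spatial dimension, with energy density {\mathcal{H}=\frac{1}{2}\pi^{2}+\frac{1}{2}\left(\partial_{x}\phi\right)^{2}} as in 13 (with no potential term) and momentum density {\pi\,\partial_{x}\phi} as in 15. Integrating the field equation {\partial_{t}^{2}\phi=\partial_{x}^{2}\phi} on a periodic grid, the total energy and momentum stay constant in time.

```python
import numpy as np

# Free massless scalar field phi_tt = phi_xx on a periodic 1D grid.
N = 128
dx = 1.0 / N
dt = 0.5 * dx
x = np.arange(N) * dx

phi = np.sin(2 * np.pi * x)                # initial field
pi = -2 * np.pi * np.cos(2 * np.pi * x)    # pi = phi_t for a right-moving wave

def laplacian(f):
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

def grad_f(f):                             # forward difference (matches laplacian)
    return (np.roll(f, -1) - f) / dx

def grad_c(f):                             # centered difference
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def energy(phi, pi):
    # Integral of H = pi^2/2 + (d_x phi)^2/2, cf. eq (13)
    return np.sum(pi**2 / 2 + grad_f(phi)**2 / 2) * dx

def momentum(phi, pi):
    # Integral of p = pi d_x phi, cf. eq (15)
    return np.sum(pi * grad_c(phi)) * dx

E0, P0 = energy(phi, pi), momentum(phi, pi)

# Velocity-Verlet (leapfrog) time stepping up to t = 1
a = laplacian(phi)
for _ in range(2 * N):
    pi = pi + 0.5 * dt * a
    phi = phi + dt * pi
    a = laplacian(phi)
    pi = pi + 0.5 * dt * a

E1, P1 = energy(phi, pi), momentum(phi, pi)
assert abs(E1 - E0) < 1e-2 * abs(E0)   # energy conserved (up to O(dt^2))
assert abs(P1 - P0) < 1e-2 * abs(P0)   # momentum conserved (up to O(dt^2))
```

The forward difference is used in the energy so that the discrete gradient matches the discrete Laplacian, and the centered (antisymmetric) difference in the momentum; with these choices both quantities are conserved exactly by the spatially discretized equations, so the only drift comes from the finite time step.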

Going back to 9, the general conservation law says that

\displaystyle \partial^{\mu}T_{\mu\nu}=0 \ \ \ \ \ (16)

where

\displaystyle T_{\mu\nu}\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right) \ \ \ \ \ (17)

is the energy-momentum tensor and is defined for all values of {\mu} and {\nu} in the range 0,1,2,3. [G & R use the symbol {\Theta_{\mu\nu}} for this tensor, but as it’s the same as the stress-energy tensor, we’ll try to keep the notation consistent at this point.]
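As a symbolic check of 16 and 17 (a SymPy sketch assuming a specific field for illustration: a plane-wave solution of the free Klein-Gordon equation in 1+1 dimensions, which is not the general case treated by G & R), we can verify that {\partial^{\mu}T_{\mu\nu}=0} holds on shell.

```python
import sympy as sp

t, x = sp.symbols('t x', real=True)
k, m = sp.symbols('k m', positive=True)
w = sp.sqrt(k**2 + m**2)        # dispersion relation, so phi is on shell

# Plane-wave solution of the free Klein-Gordon equation in 1+1 dimensions
phi = sp.cos(w*t - k*x)

# Free scalar Lagrangian density: L = ((d_t phi)^2 - (d_x phi)^2 - m^2 phi^2)/2
L = (sp.diff(phi, t)**2 - sp.diff(phi, x)**2 - m**2 * phi**2) / 2

# T_mu_nu = d_mu phi d_nu phi - g_mu_nu L, with g = diag(1, -1), eq (17)
coords = [t, x]
g = [[1, 0], [0, -1]]
T = [[sp.diff(phi, coords[mu]) * sp.diff(phi, coords[nu]) - g[mu][nu] * L
      for nu in range(2)] for mu in range(2)]

# d^mu T_mu_nu = d_t T_0nu - d_x T_1nu should vanish on shell, eq (16)
div = [sp.simplify(sp.diff(T[0][nu], t) - sp.diff(T[1][nu], x))
       for nu in range(2)]
assert div == [0, 0]
```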

Noether’s theorem and conservation laws

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

We can now apply the formulas resulting from coordinate transformations to derive Noether’s theorem, which states that for each coordinate transformation that leaves the physics of a system unchanged, there is a corresponding conserved quantity. More precisely, if we transform the coordinates according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)

 

this results in a variation of the fields

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)

 

We can insert these varied quantities into the Lagrangian density to get its variation:

\displaystyle \mathcal{L}^{\prime}\left(x^{\prime}\right)=\mathcal{L}\left(x\right)+\delta\mathcal{L}\left(x\right) \ \ \ \ \ (3)

 

[Note that the Lagrangian density actually depends on the fields {\phi_{r}} and their derivatives {\partial_{\mu}\phi_{r}}, both of which depend, in turn, on the coordinates {x_{\mu}}, but having to write {\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)} everywhere would get very tedious, so we’ll use {\mathcal{L}\left(x\right)=\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)} as a shorthand.] The function {\mathcal{L}^{\prime}\left(x^{\prime}\right)} is just {\mathcal{L}\left(x\right)} with {x} replaced by {x^{\prime}} and {\phi_{r}\left(x\right)} replaced by {\phi_{r}^{\prime}\left(x^{\prime}\right)}.

The mathematical interpretation of the phrase “the physics doesn’t change” is expressed by requiring the action of the system to remain the same when {\mathcal{L}} is varied. Using G & R’s symbol {W} (rather than the more usual {S}) for the action, this requirement is

\displaystyle \delta W\equiv\int_{\Omega^{\prime}}d^{4}x^{\prime}\mathcal{L}^{\prime}\left(x^{\prime}\right)-\int_{\Omega}d^{4}x\;\mathcal{L}\left(x\right)=0 \ \ \ \ \ (4)

 

As explained earlier, both {x^{\prime}} and {x} refer to the same point in spacetime, written in two different coordinate systems. The volume {\Omega^{\prime}} is the same volume as {\Omega}, but written in the {x^{\prime}} coordinate system.

From here, we can follow G & R’s derivation of Noether’s theorem, which I find somewhat easier to follow than the one in Peskin & Schroeder, which I looked at earlier. In order to understand what 4 is saying, we first need to express everything in terms of one coordinate system, which we’ll take to be {x}. First, we look at the volume element {d^{4}x^{\prime}} in the first integral. We can express this in terms of the volume element {d^{4}x} by using the Jacobian determinant, using 1 to calculate the derivatives.

\displaystyle d^{4}x^{\prime} \displaystyle = \displaystyle \left|\frac{\partial\left(x_{\mu}^{\prime}\right)}{\partial\left(x_{\nu}\right)}\right|d^{4}x\ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle \left|\begin{array}{cccc} 1+\frac{\partial\delta x_{0}}{\partial x_{0}} & \frac{\partial\delta x_{0}}{\partial x_{1}} & \frac{\partial\delta x_{0}}{\partial x_{2}} & \frac{\partial\delta x_{0}}{\partial x_{3}}\\ \frac{\partial\delta x_{1}}{\partial x_{0}} & 1+\frac{\partial\delta x_{1}}{\partial x_{1}} & \frac{\partial\delta x_{1}}{\partial x_{2}} & \frac{\partial\delta x_{1}}{\partial x_{3}}\\ \frac{\partial\delta x_{2}}{\partial x_{0}} & \frac{\partial\delta x_{2}}{\partial x_{1}} & 1+\frac{\partial\delta x_{2}}{\partial x_{2}} & \frac{\partial\delta x_{2}}{\partial x_{3}}\\ \frac{\partial\delta x_{3}}{\partial x_{0}} & \frac{\partial\delta x_{3}}{\partial x_{1}} & \frac{\partial\delta x_{3}}{\partial x_{2}} & 1+\frac{\partial\delta x_{3}}{\partial x_{3}} \end{array}\right|d^{4}x \ \ \ \ \ (6)

Since we’re considering only infinitesimal variations, we need to keep only up to first order terms in this determinant. If we expand the determinant about the first row, the first term is

\displaystyle   \left(1+\frac{\partial\delta x_{0}}{\partial x_{0}}\right)\left|\begin{array}{ccc} 1+\frac{\partial\delta x_{1}}{\partial x_{1}} & \frac{\partial\delta x_{1}}{\partial x_{2}} & \frac{\partial\delta x_{1}}{\partial x_{3}}\\ \frac{\partial\delta x_{2}}{\partial x_{1}} & 1+\frac{\partial\delta x_{2}}{\partial x_{2}} & \frac{\partial\delta x_{2}}{\partial x_{3}}\\ \frac{\partial\delta x_{3}}{\partial x_{1}} & \frac{\partial\delta x_{3}}{\partial x_{2}} & 1+\frac{\partial\delta x_{3}}{\partial x_{3}} \end{array}\right| \displaystyle  = \displaystyle  \left(1+\frac{\partial\delta x_{0}}{\partial x_{0}}\right)\left[\left(1+\frac{\partial\delta x_{1}}{\partial x_{1}}\right)\left(\left(1+\frac{\partial\delta x_{2}}{\partial x_{2}}\right)\left(1+\frac{\partial\delta x_{3}}{\partial x_{3}}\right)-\frac{\partial\delta x_{2}}{\partial x_{3}}\frac{\partial\delta x_{3}}{\partial x_{2}}\right)+\ldots\right]\ \ \ \ \ (7)
\displaystyle  \displaystyle  = \displaystyle  1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}+\ldots

In the second line, all the terms represented by the … are of second or higher order in {\delta x_{\mu}} so can be omitted from the final result. Note that we’re summing over {\mu} in the last line. All terms arising from the remaining 3 terms in the expansion about the first row of 6 are also of second or higher order, so the final result, valid to first order, is

\displaystyle d^{4}x^{\prime}=\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)d^{4}x \ \ \ \ \ (8)
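We can verify this first-order result numerically (a NumPy sketch, purely illustrative): for a random matrix of derivatives {\partial\delta x_{\mu}/\partial x_{\nu}} scaled by a small {\epsilon}, the exact Jacobian determinant agrees with {1+\epsilon\,\partial\delta x_{\mu}/\partial x_{\mu}} (i.e. one plus the trace), with an error that shrinks as {\epsilon^{2}}.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random "derivative matrix" standing in for d(delta x_mu)/d x_nu of an
# arbitrary infinitesimal transformation (illustrative assumption).
M = rng.standard_normal((4, 4))

for eps in [1e-4, 1e-6]:
    jac = np.linalg.det(np.eye(4) + eps * M)   # exact Jacobian determinant
    first_order = 1.0 + eps * np.trace(M)      # 1 + trace, the first-order result
    # The discrepancy is second order in eps
    assert abs(jac - first_order) < 1e3 * eps**2
```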

So much for the volume element. The only remaining task is to express {\mathcal{L}^{\prime}\left(x^{\prime}\right)} in the {x} coordinate system. To do this, we can use 3

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega^{\prime}}d^{4}x^{\prime}\left[\delta\mathcal{L}\left(x\right)+\mathcal{L}\left(x\right)\right]-\int_{\Omega}d^{4}x\;\mathcal{L}\left(x\right)\ \ \ \ \ (9)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\left[\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)\left(\delta\mathcal{L}\left(x\right)+\mathcal{L}\left(x\right)\right)-\mathcal{L}\left(x\right)\right]\ \ \ \ \ (10)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\left[\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)\delta\mathcal{L}\left(x\right)+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right)\right]\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\delta\mathcal{L}\left(x\right)+\int_{\Omega}d^{4}x\;\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right) \ \ \ \ \ (12)

where in the last line, we kept only terms up to first order. In the second line, we can replace the volume of integration {\Omega^{\prime}} by {\Omega} in all integrals, since we’ve changed the integration variable from {x^{\prime}} to {x}, and {\Omega} and {\Omega^{\prime}} both represent the same volume, as mentioned above.

Now we can use the total variation, which is

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right)=\delta\mathcal{L}\left(x\right)-\frac{\partial\mathcal{L}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \ \ \ \ \ (13)

 

We get

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\tilde{\delta}\mathcal{L}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\right]+\int_{\Omega}d^{4}x\;\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right)\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\tilde{\delta}\mathcal{L}\left(x\right)+\partial^{\mu}\left(\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right] \ \ \ \ \ (15)

using the product rule (backwards) in the last line.

Now, remembering that {\mathcal{L}\left(x\right)=\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)}, we can use the chain rule to expand the total variation of {\mathcal{L}}:

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\left(\partial^{\mu}\phi_{r}\left(x\right)\right) \ \ \ \ \ (16)

We can now add and subtract the same term to the RHS (equivalent to adding zero) to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)\right]\ \ \ \ \ (17)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\left(\partial^{\mu}\phi_{r}\left(x\right)\right) \ \ \ \ \ (18)

As we saw earlier, the total variation operation {\tilde{\delta}} commutes with differentiation with respect to {x_{\mu}} so we can interchange the {\tilde{\delta}} and {\partial^{\mu}} in the last term to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)\right]\ \ \ \ \ (19)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial^{\mu}\left(\tilde{\delta}\phi_{r}\left(x\right)\right) \ \ \ \ \ (20)

We can now use the reverse product rule on the last two terms to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\right]\tilde{\delta}\phi_{r}\left(x\right)\ \ \ \ \ (21)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\phi_{r}\left(x\right)\right) \ \ \ \ \ (22)

The term in square brackets is just the Euler-Lagrange equation and is zero if the fields {\phi_{r}} satisfy the equations of motion:

\displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)=0 \ \ \ \ \ (23)

We can therefore insert 22 back into 15 to get

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\phi_{r}\left(x\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right]\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right] \ \ \ \ \ (25)

The requirement that {\delta W=0} must mean that the integrand is zero, since the volume {\Omega} over which the integration is done is arbitrary. Thus we get

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)=0 \ \ \ \ \ (26)

To see what this means, we can define the function {f} as

\displaystyle f_{\mu}\left(x\right)\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu} \ \ \ \ \ (27)

so that

\displaystyle \partial^{\mu}f_{\mu}\left(x\right)=0 \ \ \ \ \ (28)

 

If we integrate this over 3-d space and use Gauss’s theorem to convert the integral of a divergence to a surface integral, we have

\displaystyle \int_{V}d^{3}x\;\partial^{\mu}f_{\mu}\left(x\right) \displaystyle = \displaystyle \int_{V}d^{3}x\;\partial^{0}f_{0}\left(x\right)+\int_{V}d^{3}x\nabla\cdot\mathbf{f}\left(x\right)\ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right)+\int_{S}d\mathbf{a}\cdot\mathbf{f}\left(x\right)\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right) \ \ \ \ \ (31)

where we make the usual assumption in the second line that {\mathbf{f}\left(x\right)\rightarrow0} fast enough at infinity that the surface integral is zero. However, the requirement 28 implies that the result of this volume integral must be zero as well, so that

\displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right)=0 \ \ \ \ \ (32)

This implies that the volume integral of {f_{0}\left(x\right)} is a conserved quantity, since it is constant in time; {f_{0}\left(x\right)} itself is the corresponding conserved density. This is Noether’s theorem, which we can state as:

A continuous symmetry transformation (given by 1 and 2) that leaves the physics unchanged (that is, there is no change in the action integral 4) leads to a conservation law, with the conserved quantity {G} given by

\displaystyle G \displaystyle \equiv \displaystyle \int_{V}d^{3}x\;f_{0}\left(x\right)\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle \int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{0}\right) \ \ \ \ \ (34)

Coordinate transformations in classical field theory

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

The various conservation laws of physics (energy, linear and angular momentum) can be derived from the invariance of a system under coordinate transformations. To prepare for Noether’s theorem, which is a general theorem allowing us to derive these conservation laws, we need to consider how the fields themselves transform under coordinate transformations.

In what follows, we’ll consider only infinitesimal transformations, and we define a general transformation as

\displaystyle  x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)

Note that {x_{\mu}} and {x_{\mu}^{\prime}} both refer to the same physical point in space; they simply represent two different coordinate systems referring to this same point.

Under this transformation, the mathematical function describing the field will change as well, so we can write

\displaystyle  \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)

where the subscript {r} labels which field we’re talking about.

Again, {\phi_{r}^{\prime}\left(x^{\prime}\right)} and {\phi_{r}\left(x\right)} both represent the same field at the same point in space-time; they are just expressed in different coordinate systems.

At this point, it’s useful to have a look at a specific example. Suppose the field {\phi} is a vector field in two dimensions (we’ll drop the {r} subscript, as we’re dealing with only one field). We’ll see what happens if we rotate the coordinate system through an angle {\theta}, as in the diagram, where the unprimed system is drawn in black and the primed system in blue.

In the unprimed system, {\phi} consists of horizontal vectors with a magnitude equal to their {x_{2}} coordinate.

\displaystyle  \phi\left(x\right)=\left[\begin{array}{c} x_{2}\\ 0 \end{array}\right] \ \ \ \ \ (3)

Under a rotation, the coordinates transform according to

\displaystyle  x^{\prime}=\left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{cc} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{array}\right]\left[\begin{array}{c} x_{1}\\ x_{2} \end{array}\right]=\left[\begin{array}{c} x_{1}\cos\theta+x_{2}\sin\theta\\ -x_{1}\sin\theta+x_{2}\cos\theta \end{array}\right] \ \ \ \ \ (4)

Inverting the rotation gives

\displaystyle  \left[\begin{array}{c} x_{1}\\ x_{2} \end{array}\right]=\left[\begin{array}{cc} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{array}\right]\left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\cos\theta-x_{2}^{\prime}\sin\theta\\ x_{1}^{\prime}\sin\theta+x_{2}^{\prime}\cos\theta \end{array}\right] \ \ \ \ \ (5)

For our example vector field 3, we have

\displaystyle  \phi^{\prime}\left(x^{\prime}\right)=\left[\begin{array}{c} x_{2}\cos\theta\\ -x_{2}\sin\theta \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\sin\theta\cos\theta+x_{2}^{\prime}\cos^{2}\theta\\ -x_{1}^{\prime}\sin^{2}\theta-x_{2}^{\prime}\sin\theta\cos\theta \end{array}\right] \ \ \ \ \ (6)

As we can see from the diagram by looking at the magenta vector, the vector in the unprimed system is parallel to the {x_{1}} axis, with length {x_{2}} as given by 3. If we rotate the coordinate axes by the angle {\theta} we get the primed system shown as the blue axes, and we can see that in that system, the magenta vector has a positive component in the {x_{1}^{\prime}} direction and a negative component in the {x_{2}^{\prime}} direction. However, the length of the vector remains the same in both systems, since the vector itself doesn’t change when we simply rotate the coordinates.

Since we’ll deal primarily with infinitesimal transformations from now on, we’ll do the rest of the analysis using that approximation. For the rotation example above, if {\theta} is now an infinitesimal angle (I suppose I should write it as {\delta\theta} but this just clutters up the notation, so just remember that {\theta} is infinitesimal and all will be well.), then we have, to first order in {\theta}, {\cos\theta=1} and {\sin\theta=\theta}, so for a general rotation

\displaystyle   x^{\prime} \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{c} x_{1}+x_{2}\theta\\ -x_{1}\theta+x_{2} \end{array}\right]\ \ \ \ \ (7)
\displaystyle  \delta x \displaystyle  = \displaystyle  x^{\prime}-x=\left[\begin{array}{c} x_{2}\theta\\ -x_{1}\theta \end{array}\right] \ \ \ \ \ (8)

For the specific example above, to first order in {\theta}

\displaystyle  \phi^{\prime}\left(x^{\prime}\right)=\left[\begin{array}{c} x_{2}\\ -x_{2}\theta \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\theta+x_{2}^{\prime}\\ -x_{2}^{\prime}\theta \end{array}\right] \ \ \ \ \ (9)

Plugging 3 and 9 into 2, we get

\displaystyle  \delta\phi\left(x\right)=\phi^{\prime}\left(x^{\prime}\right)-\phi\left(x\right)=\left[\begin{array}{c} 0\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (10)

Up to now, we’ve considered what happens at one specific point when the coordinate system is varied. The variation {\delta\phi\left(x\right)} is the result of varying both the coordinate system and the effect this variation has on the form of the field expression. In practice, another kind of variation, called the modified or total variation is defined by

\displaystyle  \tilde{\delta}\phi_{r}\left(x\right)\equiv\phi_{r}^{\prime}\left(x\right)-\phi_{r}\left(x\right) \ \ \ \ \ (11)

Note that the difference between {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)} is that the {\phi_{r}^{\prime}} term is evaluated at {x} in the former and at {x^{\prime}} in the latter. This notation is somewhat confusing, since in 2, both {x^{\prime}} and {x} refer to the same point in the plane, while in 11, the {x} in {\phi_{r}^{\prime}\left(x\right)} is a different point from the {x} in {\phi_{r}\left(x\right)}. We can illustrate this by looking again at the above diagram. The point {x} in the unprimed system is at around {\left(x_{1},x_{2}\right)=\left(1,2\right)} (it’s the location of the tail of the magenta vector, identified by the dotted black lines). The notation {\phi_{r}^{\prime}\left(x\right)} means that we insert the same numerical values for {\left(x_{1},x_{2}\right)} into the function {\phi_{r}^{\prime}}, that is, we set {\left(x_{1}^{\prime},x_{2}^{\prime}\right)=\left(1,2\right)}. This gives the location indicated by the tail of the green vector, as identified by the dotted blue lines. Since this location is higher up the {x_{2}} axis than the magenta vector, the green vector is longer than the magenta vector, so that {\phi_{r}^{\prime}\left(x\right)} and {\phi_{r}\left(x\right)} now refer to two different vectors. The quantity {\tilde{\delta}\phi_{r}\left(x\right)} therefore measures the change in the field due solely to the transformation of the coordinates.

We can, nevertheless, derive a relation between {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)}. Starting from 11, we have

\displaystyle   \tilde{\delta}\phi_{r}\left(x\right) \displaystyle  = \displaystyle  \phi_{r}^{\prime}\left(x\right)-\phi_{r}\left(x\right)\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \phi_{r}^{\prime}\left(x\right)-\phi_{r}^{\prime}\left(x^{\prime}\right)+\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}\left(x\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  -\left(\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}^{\prime}\left(x\right)\right)+\delta\phi_{r}\left(x\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}^{\prime}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \ \ \ \ \ (16)

In the penultimate line, we replaced {\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}^{\prime}\left(x\right)} by its first order term in the Taylor expansion, and in the last line, we approximated {\phi_{r}^{\prime}\left(x\right)} by {\phi_{r}\left(x\right)}, again valid to first order.

As an example, we can apply this formula to the above vector field. Starting with 11, we have, using 9 and 3

\displaystyle   \tilde{\delta}\phi\left(x\right) \displaystyle  = \displaystyle  \phi^{\prime}\left(x\right)-\phi\left(x\right)\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta+x_{2}\\ -x_{2}\theta \end{array}\right]-\left[\begin{array}{c} x_{2}\\ 0 \end{array}\right]\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (19)

Now we can check 16. From 3 we have

\displaystyle   \frac{\partial\phi\left(x\right)}{\partial x_{1}} \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]\ \ \ \ \ (20)
\displaystyle  \frac{\partial\phi\left(x\right)}{\partial x_{2}} \displaystyle  = \displaystyle  \left[\begin{array}{c} 1\\ 0 \end{array}\right] \ \ \ \ \ (21)

From 8, we have

\displaystyle   \frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]\delta x_{1}+\left[\begin{array}{c} 1\\ 0 \end{array}\right]\delta x_{2}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]+\left[\begin{array}{c} -x_{1}\theta\\ 0 \end{array}\right]\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  -\left[\begin{array}{c} x_{1}\theta\\ 0 \end{array}\right] \ \ \ \ \ (24)

Combining this with 10 we get

\displaystyle   \tilde{\delta}\phi_{r}\left(x\right) \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ -x_{2}\theta \end{array}\right]+\left[\begin{array}{c} x_{1}\theta\\ 0 \end{array}\right]\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (27)

which agrees with 19.
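Since every quantity in this example is explicit, we can also verify 16 by machine. Here is a short sketch using Python's sympy (my own check, not part of G & R's treatment), comparing {\tilde{\delta}\phi} computed from its definition with the right-hand side of 16:

```python
import sympy as sp

x1, x2, th = sp.symbols('x1 x2 theta')

# The field phi(x) = (x2, 0) and its transform phi'(x) = (x1*theta + x2, -x2*theta)
phi = sp.Matrix([x2, 0])
phi_rot = sp.Matrix([x1*th + x2, -x2*th])

# Total variation: tilde-delta phi(x) = phi'(x) - phi(x), which should give 19
tilde_delta = phi_rot - phi
assert tilde_delta == sp.Matrix([x1*th, -x2*th])

# Coordinate variations for the infinitesimal rotation: delta x = (x2*theta, -x1*theta)
dx1, dx2 = x2*th, -x1*th

# delta phi(x) = (0, -x2*theta), taken from 10
delta_phi = sp.Matrix([0, -x2*th])

# Right-hand side of 16: delta phi - (d phi / d x_mu) delta x_mu
rhs = delta_phi - (phi.diff(x1)*dx1 + phi.diff(x2)*dx2)

assert sp.simplify(tilde_delta - rhs) == sp.zeros(2, 1)
```

The two computations of {\tilde{\delta}\phi} agree, as in 27.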

Finally, we can note a couple of formulas concerning the derivative of the two variations {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)}. Since {\tilde{\delta}\phi_{r}\left(x\right)} depends only on {x} (and not on {x^{\prime}}), the derivative commutes with the variation:

\displaystyle  \frac{\partial}{\partial x_{\mu}}\tilde{\delta}\phi_{r}\left(x\right)=\tilde{\delta}\left(\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\right) \ \ \ \ \ (28)

The other variation {\delta\phi_{r}\left(x\right)} is a bit trickier, since it involves {x^{\prime}} as well as {x}. However, using the chain rule, we can find its derivative. I’ll use the shorthand {\partial_{\mu}\equiv\partial/\partial x_{\mu}} and {\partial_{\mu}^{\prime}\equiv\partial/\partial x_{\mu}^{\prime}}.

\displaystyle   \partial_{\mu}\left(\delta\phi_{r}\left(x\right)\right) \displaystyle  = \displaystyle  \partial_{\mu}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}\phi_{r}\left(x\right)\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  \left[\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}\phi_{r}\left(x\right)\right]+\partial_{\mu}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  \delta\left(\partial_{\mu}\phi_{r}\left(x\right)\right)+\left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\left(\partial_{\mu}x^{\prime\nu}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right) \ \ \ \ \ (31)

We can now use 1 on the middle term:

\displaystyle   \partial_{\mu}x^{\prime\nu} \displaystyle  = \displaystyle  \partial_{\mu}\left(x^{\nu}+\delta x^{\nu}\right)\ \ \ \ \ (32)
\displaystyle  \displaystyle  = \displaystyle  \delta^{\mu\nu}+\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (33)

Combining the last two terms, we get

\displaystyle   \left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\left(\delta^{\mu\nu}+\partial_{\mu}\delta x^{\nu}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right) \displaystyle  = \displaystyle  \left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\partial_{\mu}\delta x^{\nu}\ \ \ \ \ (34)
\displaystyle  \displaystyle  = \displaystyle  \left(\partial_{\nu}\phi_{r}\left(x\right)\right)\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (35)

Again, the last step is valid to first order in the variations. Thus we have

\displaystyle  \partial_{\mu}\left(\delta\phi_{r}\left(x\right)\right)=\delta\left(\partial_{\mu}\phi_{r}\left(x\right)\right)+\left(\partial_{\nu}\phi_{r}\left(x\right)\right)\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (36)
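We can check 36 explicitly on the rotation example, where {\phi^{\prime}} is linear in the coordinates so the first-order quantities are easy to isolate. A sympy sketch (again my own check, not from the text):

```python
import sympy as sp

x1, x2, th = sp.symbols('x1 x2 theta')
x = [x1, x2]

# phi(x) = (x2, 0); the transformed field as a function of its argument:
# phi'(y) = (theta*y1 + y2, -y2*theta)
phi = sp.Matrix([x2, 0])
def phi_prime(y1, y2):
    return sp.Matrix([th*y1 + y2, -y2*th])

# Infinitesimal coordinate change x' = x + delta x for the rotation
dx = [x2*th, -x1*th]

# delta phi(x) = phi'(x') - phi(x), keeping only terms up to first order in theta
delta_phi = (phi_prime(x1 + dx[0], x2 + dx[1]) - phi).applyfunc(sp.expand)
delta_phi = delta_phi.applyfunc(lambda e: e.coeff(th, 0) + th*e.coeff(th, 1))

y1, y2 = sp.symbols('y1 y2')
for mu in range(2):
    lhs = delta_phi.diff(x[mu])                       # d_mu (delta phi)
    d_prime = phi_prime(y1, y2).diff([y1, y2][mu])    # d'_mu phi' (constant, since phi' is linear)
    delta_dphi = d_prime - phi.diff(x[mu])            # delta(d_mu phi)
    # Right-hand side of 36: delta(d_mu phi) + (d_nu phi) d_mu delta x^nu
    rhs = delta_dphi + sum((phi.diff(x[nu]) * sp.diff(dx[nu], x[mu])
                            for nu in range(2)), sp.zeros(2, 1))
    assert sp.simplify(lhs - rhs) == sp.zeros(2, 1)
```

Both sides of 36 agree for {\mu=1,2}, confirming that the extra term {\left(\partial_{\nu}\phi_{r}\right)\partial_{\mu}\delta x^{\nu}} is exactly what stops {\partial_{\mu}} from commuting with {\delta}.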

Poisson brackets and Hamilton’s equations of motion

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.2.

Although I’ve looked at Poisson brackets before, it’s worth going through G & R’s treatment as it is a fair bit simpler and gives clearer results.

First, we need the time derivative of a functional. In the simplest case, a functional {F\left[\phi\right]} depends on a function {\phi}, which in turn depends on an independent variable {x}. The functional itself does not depend on {x}, however, usually because {F} is defined as the integral of {\phi\left(x\right)} over some range of {x} values, so the dependence on {x} disappears in the integration.

We can generalize things a bit by taking {\phi} as a function of two variables, say {x} and {t}. If {F} is defined in the same way (say, as an integral of {\phi} over {x}), then the variable {t} also appears in the functional, so we can write this as {F\left(t\right)}, which is

\displaystyle F\left(t\right)=\int dx\;g\left(\phi\left(x,t\right)\right) \ \ \ \ \ (1)

where {g\left(\phi\right)} is some function of {\phi}. Since {F} now depends on {t}, we can take the derivative {dF/dt} which comes out to

\displaystyle \dot{F}\equiv\frac{dF}{dt}=\int dx\frac{dg}{d\phi}\frac{\partial\phi}{\partial t}=\int dx\frac{dg}{d\phi}\dot{\phi}\left(x,t\right) \ \ \ \ \ (2)

As we’ve seen before, the functional derivative of {F} in this case is

\displaystyle \frac{\delta F\left(t\right)}{\delta\phi\left(y,t\right)}=\frac{dg\left(\phi\left(y,t\right)\right)}{d\phi} \ \ \ \ \ (3)

where the notation means that we evaluate the derivative on the RHS at the point {\left(y,t\right)}. Using this result, we can therefore write {\dot{F}} as

\displaystyle \dot{F}\left(t\right)=\int dx\frac{\delta F\left(t\right)}{\delta\phi\left(x,t\right)}\dot{\phi}\left(x,t\right) \ \ \ \ \ (4)
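We can test 4 on a concrete case. Taking {g\left(\phi\right)=\phi^{2}} and {\phi\left(x,t\right)=e^{-t}\sin x} on {\left[0,\pi\right]} (an arbitrary choice of mine for illustration), sympy confirms that the direct time derivative of {F} matches the chain-rule integral:

```python
import sympy as sp

x, t = sp.symbols('x t')

# A concrete field and functional: phi(x,t) = exp(-t)*sin(x), g(phi) = phi**2,
# F(t) = integral of g(phi) over x in [0, pi]
phi = sp.exp(-t) * sp.sin(x)
F = sp.integrate(phi**2, (x, 0, sp.pi))       # = (pi/2) * exp(-2t)

# Direct time derivative of the functional
F_dot = sp.diff(F, t)

# The form 4: integral of (delta F / delta phi) * phi_dot = 2*phi*phi_t, using 3
rhs = sp.integrate(2*phi*sp.diff(phi, t), (x, 0, sp.pi))

assert sp.simplify(F_dot - rhs) == 0
```

Both routes give {\dot{F}=-\pi e^{-2t}}.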

We can generalize this to 4-d spacetime, so that {x} now indicates the four-vector {x=\left(\mathbf{x},t\right)}, and the integral is over 3-d space:

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\dot{\phi}\left(x\right) \ \ \ \ \ (5)

Generalizing even further, we can make {F} a functional of two fields, {\phi} and {\pi}, so we get

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\dot{\phi}\left(x\right)+\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\dot{\pi}\left(x\right)\right] \ \ \ \ \ (6)

Interpreting {\phi} as the field and {\pi} as its conjugate momentum, we can now use Hamilton’s equations of motion

\displaystyle \dot{\phi} \displaystyle = \displaystyle \frac{\delta H}{\delta\pi}\ \ \ \ \ (7)
\displaystyle \dot{\pi} \displaystyle = \displaystyle -\frac{\delta H}{\delta\phi} \ \ \ \ \ (8)

Substituting these into 6, we get

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\frac{\delta H}{\delta\pi}-\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\frac{\delta H}{\delta\phi}\right] \ \ \ \ \ (9)

The quantity on the RHS is defined to be the Poisson bracket:

\displaystyle \left\{ F,H\right\} _{PB}\equiv\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\frac{\delta H}{\delta\pi}-\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\frac{\delta H}{\delta\phi}\right] \ \ \ \ \ (10)

 

We thus have the general result that the time derivative of a functional is equal to its Poisson bracket with the Hamiltonian:

\displaystyle \boxed{\dot{F}\left(t\right)=\left\{ F,H\right\} _{PB}} \ \ \ \ \ (11)
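This result is easy to test on a lattice. The sketch below (my own discretization, not from G & R) puts a free scalar field on a periodic 1-d grid, takes {F=\frac{1}{2}\int\phi^{2}dx}, and compares the Poisson-bracket prediction {\left\{ F,H\right\} =\int\phi\pi\,dx} with a centred numerical derivative of {F} along the Hamiltonian flow:

```python
import numpy as np

# Free-field lattice: H = sum_i [ pi_i^2/2 + (phi_{i+1}-phi_i)^2/(2 dx^2) ] dx,
# on a periodic grid, with arbitrary initial data
N = 64
dx = 2*np.pi / N
xg = np.arange(N) * dx
phi = np.sin(xg)
pi = 0.3 * np.sin(xg)

def dH_dphi(phi):
    # delta H / delta phi_i = -(phi_{i+1} - 2 phi_i + phi_{i-1}) / dx^2
    return -(np.roll(phi, -1) - 2*phi + np.roll(phi, 1)) / dx**2

def F(phi):
    # The functional F = (1/2) * integral of phi^2 dx
    return 0.5 * np.sum(phi**2) * dx

# Poisson-bracket prediction 10: {F, H} = integral (dF/dphi)(dH/dpi) dx = int phi*pi dx
F_dot_pb = np.sum(phi * pi) * dx

# Direct numerical derivative: evolve with Hamilton's equations 7 and 8
# (phi_dot = delta H/delta pi = pi, pi_dot = -delta H/delta phi)
dt = 1e-5
phi_p = phi + dt*pi; pi_p = pi - dt*dH_dphi(phi)
phi_m = phi - dt*pi; pi_m = pi + dt*dH_dphi(phi)
F_dot_num = (F(phi_p) - F(phi_m)) / (2*dt)

assert abs(F_dot_pb - F_dot_num) < 1e-8
```

The two values of {\dot{F}} agree to numerical precision, as 11 requires.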

 

We can use this result in a rather curious way to re-derive Hamilton’s equations of motion. We first observe that we can write the field {\phi} as an integral:

\displaystyle \phi\left(\mathbf{x},t\right)=\int d^{3}\mathbf{x}^{\prime}\;\phi\left(\mathbf{x}^{\prime},t\right)\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (12)

This effectively defines an ordinary function {\phi} as a functional depending on itself. In this case, both {\mathbf{x}} and {t} are parameters that are present on both sides of the equation; it is the dummy variable {\mathbf{x}^{\prime}} that is the variable of integration.

Taking the variation on both sides, we get

\displaystyle \delta\phi\left(\mathbf{x},t\right)=\int d^{3}\mathbf{x}^{\prime}\;\delta\phi\left(\mathbf{x}^{\prime},t\right)\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (13)

[Be careful not to get the {\delta}s confused here: {\delta\phi} is a variation of the function {\phi} while {\delta^{3}} is the 3-d delta function.] Comparing this to the definition of the functional derivative

\displaystyle \delta F\left[\phi\right]\equiv\int d^{3}\mathbf{x}\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}\delta\phi\left(x\right) \ \ \ \ \ (14)

 

we see that we have the functional derivative of {\phi} with respect to itself:

\displaystyle \frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}=\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (15)

We could use the same argument on the conjugate momentum, so we also have

\displaystyle \frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}=\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (16)

Since {\phi} and {\pi} are independent fields, we also have

\displaystyle \frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}=\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}=0 \ \ \ \ \ (17)
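On a lattice, 15 acquires a concrete finite-dimensional meaning: the Dirac delta becomes a Kronecker delta divided by the grid spacing. The following numpy sketch (my own discretization, not from the text) verifies this by writing {\phi\left(x_{i_{0}}\right)} in the form 12 and perturbing individual sites:

```python
import numpy as np

# Discrete analog of 15: delta phi_i / delta phi_j -> delta_ij / dx,
# the lattice version of the Dirac delta function
N, dx = 32, 0.1
phi = np.random.default_rng(0).normal(size=N)

# The "functional" phi(x_{i0}) written as the discrete version of 12:
# phi(x) = int dx' phi(x') delta(x - x')
i0 = 7
delta_grid = np.zeros(N); delta_grid[i0] = 1.0/dx   # discretized delta function
F = np.sum(phi * delta_grid) * dx                   # recovers phi[i0]
assert np.isclose(F, phi[i0])

# Perturb site j: (dF/dphi_j)/dx plays the role of delta F / delta phi(x_j)
eps = 1e-6
for j in (i0, i0 + 3):
    phi_j = phi.copy(); phi_j[j] += eps
    func_deriv = (np.sum(phi_j*delta_grid)*dx - F) / (eps*dx)
    expected = 1.0/dx if j == i0 else 0.0           # delta_ij / dx
    assert np.isclose(func_deriv, expected)
```

The functional derivative is {1/\Delta x} at the site itself and zero everywhere else, which is exactly the discretization of {\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)}.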

 

We can now use 11 to find the time derivatives of {\phi} and {\pi} by treating them as functionals:

\displaystyle \dot{\phi}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \left\{ \phi\left(\mathbf{x},t\right),H\right\} \ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (19)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-0\right]\ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x},t\right)} \ \ \ \ \ (21)

This gives the first Hamilton equation of motion 7. We can work out the second equation similarly:

\displaystyle \dot{\pi}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \left\{ \pi\left(\mathbf{x},t\right),H\right\} \ \ \ \ \ (22)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[0-\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle -\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x},t\right)} \ \ \ \ \ (25)

Finally, we can work out the Poisson brackets of the fields with each other, using the definition 10 and the results above.

\displaystyle \left\{ \phi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (26)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime\prime}\right)\delta^{3}\left(\mathbf{x}^{\prime}-\mathbf{x}^{\prime\prime}\right)-0\right]\ \ \ \ \ (27)
\displaystyle \displaystyle = \displaystyle \delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (28)
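The same lattice picture reproduces 28: the bracket of {\phi_{i}} with {\pi_{j}} collapses to {\delta_{ij}/\Delta x}, the discrete stand-in for {\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)}. A minimal numpy check (my own analog, not from G & R):

```python
import numpy as np

# Discrete sketch of 26-28: with grid spacing dx, the functional derivatives
# become delta_ik/dx, so {phi_i, pi_j} = sum_k (delta_ik/dx)(delta_jk/dx) dx = delta_ij/dx
N, dx = 16, 0.25

def bracket_phi_pi(i, j):
    # delta phi_i / delta phi_k = delta_ik/dx; delta pi_j / delta pi_k = delta_jk/dx
    dphi = np.zeros(N); dphi[i] = 1.0/dx
    dpi = np.zeros(N); dpi[j] = 1.0/dx
    return np.sum(dphi * dpi) * dx          # the integral over x'' becomes a sum times dx

assert np.isclose(bracket_phi_pi(3, 3), 1.0/dx)   # delta_ij/dx on the diagonal
assert bracket_phi_pi(3, 5) == 0.0                # zero for distinct sites
```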

The other two Poisson brackets are zero because of 17:

\displaystyle \left\{ \phi\left(\mathbf{x},t\right),\phi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\phi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\phi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[0-0\right]\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle 0\ \ \ \ \ (31)
\displaystyle \left\{ \pi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (32)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[0-0\right]\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle 0 \ \ \ \ \ (34)