
Lagrangian for the Schrödinger equation

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 3, Section 3.1.

As a prelude to ‘proper’ quantum field theory, we’ll look first at turning the non-relativistic quantum theory based on the Schrödinger equation into a field theory. Before quantizing anything, we treat the wave function {\psi\left(\mathbf{x},t\right)} as a classical (that is, non-quantum) field. The Schrödinger equation is

\displaystyle i\hbar\frac{\partial\psi}{\partial t}=-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi \ \ \ \ \ (1)


where {V\left(\mathbf{x},t\right)} is, as usual, the potential function.

In order to apply the techniques of classical field theory, we need a Lagrangian density {\mathcal{L}}. There doesn’t seem to be any way of actually deriving Lagrangian densities; presumably they are found through trial and error, with perhaps a bit of physical intuition. In any case, the Lagrangian density for the Schrödinger equation turns out to be

\displaystyle \mathcal{L}\left(\psi,\nabla\psi,\dot{\psi}\right)=i\hbar\psi^*\dot{\psi}-\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi-V\left(\mathbf{x},t\right)\psi^*\psi \ \ \ \ \ (2)


As {\psi} is a complex function, it has real and imaginary parts, so we can treat {\psi} and {\psi^*} as independent fields. As we saw earlier, we can derive the Euler-Lagrange equations for multiple fields from the principle of least action and end up with

\displaystyle \frac{\partial\mathcal{L}}{\partial\phi^{r}}-\frac{\partial}{\partial q^{\mu}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}^{r}}\right)=0 \ \ \ \ \ (3)


where the {\phi^{r}} are the fields and {q^{\mu}=\left(\mathbf{x},t\right)}. In this case, the two fields are {\psi} and {\psi^*} and we get the two equations

\displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}}=0\ \ \ \ \ (4)
\displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*}-\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}^*}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*} \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi^*}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*}=0 \ \ \ \ \ (5)

The second form in each row just introduces the gradient symbol {\nabla} as a shorthand for the {\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}}} and {\frac{\partial}{\partial x^{i}}\frac{\partial\mathcal{L}}{\partial\psi_{,i}^*}} terms.

We can plug 2 into these two equations to verify that we recover the original Schrödinger equation 1 and its complex conjugate. From 4 we have

\displaystyle \frac{\partial\mathcal{L}}{\partial\psi} \displaystyle = \displaystyle -V\left(\mathbf{x},t\right)\psi^*\ \ \ \ \ (6)
\displaystyle \nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi} \displaystyle = \displaystyle -\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*\ \ \ \ \ (7)
\displaystyle \frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle i\hbar\dot{\psi}^*\ \ \ \ \ (8)
\displaystyle \frac{\partial\mathcal{L}}{\partial\psi}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}} \displaystyle = \displaystyle -i\hbar\dot{\psi}^*+\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*-V\left(\mathbf{x},t\right)\psi^*=0\ \ \ \ \ (9)
\displaystyle -i\hbar\dot{\psi}^* \displaystyle = \displaystyle -\frac{\hbar^{2}}{2m}\nabla^{2}\psi^*+V\left(\mathbf{x},t\right)\psi^* \ \ \ \ \ (10)

which is the complex conjugate of 1 (for a real potential {V}). Plugging 2 into 5 just reproduces 1.
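As a sketch of the step that is only stated here, the three terms of 5 can be worked out in the same way as 6 to 8, by differentiating 2 with respect to {\psi^*} and its derivatives (note that {\dot{\psi}^*} does not appear in 2, so the time-derivative term vanishes):

\displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*} \displaystyle = \displaystyle i\hbar\dot{\psi}-V\left(\mathbf{x},t\right)\psi
\displaystyle \nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi^*} \displaystyle = \displaystyle -\frac{\hbar^{2}}{2m}\nabla^{2}\psi
\displaystyle \frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*} \displaystyle = \displaystyle 0
\displaystyle \frac{\partial\mathcal{L}}{\partial\psi^*}-\nabla\cdot\frac{\partial\mathcal{L}}{\partial\nabla\psi^*}-\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot{\psi}^*} \displaystyle = \displaystyle i\hbar\dot{\psi}+\frac{\hbar^{2}}{2m}\nabla^{2}\psi-V\left(\mathbf{x},t\right)\psi=0

Moving the last two terms to the right-hand side gives {i\hbar\dot{\psi}=-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi}, which is 1.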

The conjugate momentum density {\pi} can be calculated for the two fields {\psi} and {\psi^*}. We get

\displaystyle \pi_{1}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\dot{\psi}}=i\hbar\psi^*\left(\mathbf{x},t\right)\ \ \ \ \ (11)
\displaystyle \pi_{2}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}}{\partial\dot{\psi}^*}=0 \ \ \ \ \ (12)

The Hamiltonian density is defined as

\displaystyle \mathcal{H} \displaystyle = \displaystyle \sum_{r}\pi_{r}\dot{\phi}^{r}-\mathcal{L}\ \ \ \ \ (13)
\displaystyle \displaystyle = \displaystyle i\hbar\psi^*\dot{\psi}-\left[i\hbar\psi^*\dot{\psi}-\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi-V\left(\mathbf{x},t\right)\psi^*\psi\right]\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi+V\left(\mathbf{x},t\right)\psi^*\psi \ \ \ \ \ (15)

The total Hamiltonian is the integral of this over 3-d space:

\displaystyle H \displaystyle = \displaystyle \int d^{3}x\;\mathcal{H}\ \ \ \ \ (16)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\left[\frac{\hbar^{2}}{2m}\nabla\psi^*\cdot\nabla\psi+V\left(\mathbf{x},t\right)\psi^*\psi\right] \ \ \ \ \ (17)

We can integrate the first term by parts, by integrating the {\nabla\psi^*} term and invoking the usual assumption that {\psi^*\rightarrow0} fast enough at infinity that the integrated term is zero. We then get

\displaystyle H \displaystyle = \displaystyle \int d^{3}x\left[-\frac{\hbar^{2}}{2m}\psi^*\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi^*\psi\right]\ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\;\psi^*\left[-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V\left(\mathbf{x},t\right)\psi\right] \ \ \ \ \ (19)

Referring back to quantum mechanics for a moment, we see that this last integral is just {\left\langle \psi\left|\hat{H}\right|\psi\right\rangle }, that is, the expectation value of the Hamiltonian operator, which is the total energy of the system.
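The integration by parts that takes the kinetic term from 17 to 18 can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not in the original: one spatial dimension, units in which {\hbar^{2}/2m=1}, and a real Gaussian test function {\psi=e^{-x^{2}/2}} that vanishes at the ends of the grid. The function names are hypothetical.

```python
import math

def gaussian(x):
    """Real test wave function exp(-x^2/2); an assumed example, not from the text."""
    return math.exp(-0.5 * x * x)

def kinetic_integrals(n=4001, half_width=10.0):
    """Return the kinetic integral in gradient form and in Laplacian form.

    Gradient form:  integral of (d psi/dx)^2
    Laplacian form: integral of -psi * d^2 psi/dx^2
    Both use central finite differences on a uniform grid, with hbar^2/2m = 1.
    """
    dx = 2.0 * half_width / (n - 1)
    xs = [-half_width + i * dx for i in range(n)]
    psi = [gaussian(x) for x in xs]
    grad_form = 0.0
    lap_form = 0.0
    for i in range(1, n - 1):
        dpsi = (psi[i + 1] - psi[i - 1]) / (2.0 * dx)
        d2psi = (psi[i + 1] - 2.0 * psi[i] + psi[i - 1]) / (dx * dx)
        grad_form += dpsi * dpsi * dx
        lap_form += -psi[i] * d2psi * dx
    return grad_form, lap_form

if __name__ == "__main__":
    grad_form, lap_form = kinetic_integrals()
    print(grad_form, lap_form, math.sqrt(math.pi) / 2.0)
```

For this test function both integrals can be done by hand and equal {\sqrt{\pi}/2}, so the two numerical values should agree with each other and with that number up to the finite-difference error.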

Finally, we can write down the Poisson brackets, since these are general results for any field {\psi} and its conjugate momentum {\pi}:

\displaystyle \left\{ \psi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle \delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\ \ \ \ \ (20)
\displaystyle \left\{ \psi\left(\mathbf{x},t\right),\psi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle 0\ \ \ \ \ (21)
\displaystyle \left\{ \pi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle = \displaystyle 0 \ \ \ \ \ (22)

These brackets will be used later when we quantize the theory.

Noether’s theorem and conservation of angular momentum

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Now that we’ve seen that a general Lorentz transformation can be represented as a product of a pure boost and a pure 3-d rotation, we can return to Noether’s theorem and see what conserved property it predicts when we require a physical system to be invariant under a Lorentz transformation. As usual, we consider an infinitesimal transformation, which we can write as

\displaystyle x^{\prime\mu}=x^{\mu}+\delta\omega^{\mu\nu}x_{\nu} \ \ \ \ \ (1)


where {\delta\omega^{\mu\nu}} is the infinitesimal rotation in 4-dimensional spacetime. Here, we are treating a pure boost as a rotation; for example, a boost in the {x_{1}} direction is given by the Lorentz transformation

\displaystyle x^{\prime}=\Lambda x \ \ \ \ \ (2)



\displaystyle \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (3)


for some ‘angle’ {\chi}. This reduces to the more familiar form found in introductory relativity courses if we set

\displaystyle \cosh\chi \displaystyle \equiv \displaystyle \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (4)
\displaystyle \sinh\chi \displaystyle \equiv \displaystyle \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (5)

Returning to 1, we require, to first order in {\delta\omega_{\mu\nu}}, that the Minkowski length of the 4-vector is the same before and after the transformation. That is,

\displaystyle x^{\prime\mu}x_{\mu}^{\prime} \displaystyle = \displaystyle \left(x^{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}\right)\left(x_{\mu}+\delta\omega_{\mu}^{\;\tau}x_{\tau}\right)\ \ \ \ \ (6)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}x_{\mu}+\delta\omega_{\mu}^{\;\tau}x_{\tau}x^{\mu}\ \ \ \ \ (7)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+\delta\omega^{\mu\sigma}x_{\sigma}x_{\mu}+\delta\omega^{\mu\tau}x_{\tau}x_{\mu}\ \ \ \ \ (8)
\displaystyle \displaystyle = \displaystyle x^{\mu}x_{\mu}+2\delta\omega^{\mu\nu}x_{\mu}x_{\nu} \ \ \ \ \ (9)

In the last line, we renamed the dummy indices {\sigma} and {\tau} to {\nu}. To first order, we require the last term in the last line to be zero for all {x_{\mu}}, which means we must impose a condition on {\delta\omega^{\mu\nu}}. We can write this term as

\displaystyle 2\delta\omega^{\mu\nu}x_{\mu}x_{\nu}=x_{\mu}x_{\nu}\left(\delta\omega^{\mu\nu}+\delta\omega^{\nu\mu}\right) \ \ \ \ \ (10)

From this, we see that we must have

\displaystyle \delta\omega^{\mu\nu}=-\delta\omega^{\nu\mu} \ \ \ \ \ (11)

so {\delta\omega^{\mu\nu}} must be antisymmetric.
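The antisymmetry condition can be illustrated numerically. The sketch below applies the infinitesimal transformation 1 to a sample 4-vector and compares the change in the Minkowski length for an antisymmetric and a symmetric {\delta\omega}. The particular matrices, the sample vector and the function names are arbitrary choices made for illustration only.

```python
METRIC = [1.0, -1.0, -1.0, -1.0]  # diagonal of the Minkowski metric (+, -, -, -)

# Arbitrary sample patterns, scaled by a small parameter eps below.
ANTISYMMETRIC_BASE = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
SYMMETRIC_BASE = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]

def scaled(base, eps):
    """Return eps * base as a matrix of floats (components delta-omega^{mu nu})."""
    return [[eps * v for v in row] for row in base]

def minkowski_norm(x_up):
    """x^mu x_mu for contravariant components x_up."""
    return sum(METRIC[m] * x_up[m] ** 2 for m in range(4))

def transform(x_up, d_omega):
    """x'^mu = x^mu + delta-omega^{mu nu} x_nu, with x_nu obtained by lowering the index."""
    x_down = [METRIC[n] * x_up[n] for n in range(4)]
    return [x_up[m] + sum(d_omega[m][n] * x_down[n] for n in range(4))
            for m in range(4)]

def norm_change(x_up, d_omega):
    """Change in the Minkowski length produced by the transformation."""
    return minkowski_norm(transform(x_up, d_omega)) - minkowski_norm(x_up)

if __name__ == "__main__":
    x = [2.0, 0.5, -1.0, 0.3]
    eps = 1e-4
    print(norm_change(x, scaled(ANTISYMMETRIC_BASE, eps)))  # second order in eps
    print(norm_change(x, scaled(SYMMETRIC_BASE, eps)))      # first order in eps
```

With the antisymmetric matrix only a term of order {\epsilon^{2}} survives, whereas the symmetric matrix changes the length at first order in {\epsilon}.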

Incidentally, if this condition seems to be violated in the pure boost matrix 3, remember that 2 is an ordinary matrix product, while the last term in 1 is the product of a tensor and 4-vector, and thus includes the effect of the metric tensor

\displaystyle g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (12)


To first order in {\chi}, 3 is

\displaystyle \Lambda=\left[\begin{array}{cccc} 1 & \chi & 0 & 0\\ \chi & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (13)

while for the infinitesimal rotation, we have

\displaystyle \delta\omega=\left[\begin{array}{cccc} 0 & -\chi & 0 & 0\\ \chi & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (14)

In matrix notation, 1 becomes

\displaystyle x^{\prime}=x+\delta\omega\times g\times x \ \ \ \ \ (15)

from which we can see that this gives the same result as 2 with {\Lambda} given by 13.
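The agreement between 13 and 14 can be checked with a few lines of code. This sketch builds the first-order boost matrix 13, the matrix {I+\delta\omega g} formed from 14 and the metric 12, and the exact boost 3, and compares them entry by entry. The function names are hypothetical and the matrices are stored as plain nested lists.

```python
import math

# Minkowski metric with signature (+, -, -, -), as in equation 12.
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def exact_boost(chi):
    """Boost along x_1 with hyperbolic angle chi (equation 3)."""
    c, s = math.cosh(chi), math.sinh(chi)
    return [[c, s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def infinitesimal_boost(chi):
    """First-order form of the boost (equation 13)."""
    m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    m[0][1] = chi
    m[1][0] = chi
    return m

def delta_omega(chi):
    """Antisymmetric infinitesimal rotation matrix (equation 14)."""
    d = [[0.0] * 4 for _ in range(4)]
    d[0][1] = -chi
    d[1][0] = chi
    return d

def identity_plus_rotation(chi):
    """The matrix I + delta_omega * g that acts on x in the matrix form of equation 1."""
    dg = matmul(delta_omega(chi), G)
    return [[(1.0 if i == j else 0.0) + dg[i][j] for j in range(4)]
            for i in range(4)]

def max_difference(a, b):
    """Largest absolute entry of a - b."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

if __name__ == "__main__":
    chi = 1e-3
    print(max_difference(infinitesimal_boost(chi), identity_plus_rotation(chi)))
    print(max_difference(exact_boost(chi), identity_plus_rotation(chi)))
```

The first difference is zero up to rounding, and the second is of order {\chi^{2}}, as expected for a first-order approximation.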

In order to apply Noether’s theorem, we also need to know how the fields transform under a Lorentz transformation. The assumption is that, for infinitesimal transformations, the transformed field {\phi_{r}^{\prime}\left(x^{\prime}\right)} depends linearly on both the original fields {\phi_{r}\left(x\right)} and on the rotation {\delta\omega_{\mu\nu}}. That is, we assume that

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\frac{1}{2}\delta\omega_{\mu\nu}\left(I^{\mu\nu}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (16)


where {I^{\mu\nu}} are the infinitesimal generators of the Lorentz transformation. G & R don’t really explain this, apart from giving a reference to another book, but we won’t need to delve into the details to get the result needed in this post, so we’ll leave it for now.

From here, it’s a matter of plugging 1 and 16 into the equations for Noether’s theorem and seeing what comes out. Noether’s theorem says that

\displaystyle \partial^{\mu}f_{\mu}\left(x\right)=0 \ \ \ \ \ (17)



\displaystyle f_{\mu}\left(x\right) \displaystyle \equiv \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\delta\phi_{r}\left(x\right)-\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)\delta x^{\nu} \ \ \ \ \ (19)

From 16

\displaystyle \delta\phi_{r}\left(x\right) \displaystyle = \displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}\left(x\right)\ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega_{\mu\nu}\left(I^{\mu\nu}\right)_{rs}\phi_{s}\left(x\right)\ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (22)

In the last line, we renamed the indices {\mu} and {\nu} to {\nu} and {\lambda} respectively to avoid confusing the {\mu} in the first term of 19 with any of the indices in {\delta\phi_{r}\left(x\right)}.

Similarly, from 1

\displaystyle \delta x^{\nu} \displaystyle = \displaystyle x^{\prime\nu}-x^{\nu}\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle \delta\omega^{\nu\lambda}x_{\lambda} \ \ \ \ \ (24)

We also had the energy-momentum tensor

\displaystyle T_{\mu\nu}\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right) \ \ \ \ \ (25)


Putting all this together, we have

\displaystyle f_{\mu}\left(x\right)=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda} \ \ \ \ \ (26)

Because {\delta\omega^{\nu\lambda}=-\delta\omega^{\lambda\nu}}

\displaystyle T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda} \displaystyle = \displaystyle \frac{1}{2}\left[T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda}-T_{\mu\nu}\delta\omega^{\lambda\nu}x_{\lambda}\right]\ \ \ \ \ (27)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\left[T_{\mu\nu}\delta\omega^{\nu\lambda}x_{\lambda}-T_{\mu\lambda}\delta\omega^{\nu\lambda}x_{\nu}\right]\ \ \ \ \ (28)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right] \ \ \ \ \ (29)

In the second line, we swapped the dummy indices {\lambda} and {\nu} in the second term, which is allowed because both indices are summed. Therefore

\displaystyle f_{\mu}\left(x\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right]\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\frac{1}{2}\delta\omega^{\nu\lambda}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\frac{1}{2}\delta\omega^{\nu\lambda}\left[T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right]\ \ \ \ \ (31)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}\left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\left(T_{\mu\nu}x_{\lambda}-T_{\mu\lambda}x_{\nu}\right)\right]\ \ \ \ \ (32)
\displaystyle \displaystyle \equiv \displaystyle \frac{1}{2}\delta\omega^{\nu\lambda}M_{\mu\nu\lambda}\left(x\right) \ \ \ \ \ (33)

In the second line, we swapped the positions of the {\nu\lambda} indices on the terms {\delta\omega_{\nu\lambda}\left(I^{\nu\lambda}\right)_{rs}}. This is OK provided they are both summed over. The last line defines the term {M_{\mu\nu\lambda}\left(x\right)}.

Because the infinitesimal rotations are arbitrary (subject to the condition that {\delta\omega^{\nu\lambda}} is antisymmetric), we can choose all of them to be zero except for one. For each such choice, we have a different {f_{\mu}\left(x\right)}, which leads to a conservation law for each choice. From Noether’s theorem, the quantity that is conserved is the integral of {f_{0}} over 3-space, so we have the conserved quantities

\displaystyle M_{\nu\lambda} \displaystyle = \displaystyle \int d^{3}x\left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)-\left(T_{0\nu}x_{\lambda}-T_{0\lambda}x_{\nu}\right)\right]\ \ \ \ \ (34)
\displaystyle \displaystyle = \displaystyle \int d^{3}x\left[T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right)\right] \ \ \ \ \ (35)

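From 25, we see that for spatial indices {\nu} and {\lambda} (for which {g_{0\nu}=g_{0\lambda}=0}, so the {g_{\mu\nu}\mathcal{L}} term in 25 drops out), the first two terms are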

\displaystyle T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda} \displaystyle = \displaystyle x_{\nu}\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}}\partial_{\lambda}\phi_{r}-x_{\lambda}\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}}\partial_{\nu}\phi_{r}\ \ \ \ \ (36)
\displaystyle \displaystyle = \displaystyle x_{\nu}\pi_{r}\partial_{\lambda}\phi_{r}-x_{\lambda}\pi_{r}\partial_{\nu}\phi_{r} \ \ \ \ \ (37)

where {\pi_{r}} is the conjugate momentum density, defined by

\displaystyle \pi_{r}\equiv\frac{\partial\mathcal{L}}{\partial\dot{\phi}_{r}} \ \ \ \ \ (38)

Using the physical momentum density

\displaystyle p_{\lambda}=\pi_{r}\partial_{\lambda}\phi_{r} \ \ \ \ \ (39)

we find that

\displaystyle T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}=x_{\nu}p_{\lambda}-x_{\lambda}p_{\nu} \ \ \ \ \ (40)

This is one component of the angular momentum density {\mathbf{r}\times\mathbf{p}}, so the integral

\displaystyle \int d^{3}x\left(T_{0\lambda}x_{\nu}-T_{0\nu}x_{\lambda}\right) \ \ \ \ \ (41)

is one component of the total angular momentum.
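To see concretely how the antisymmetric combination {x_{\nu}p_{\lambda}-x_{\lambda}p_{\nu}} relates to {\mathbf{r}\times\mathbf{p}}, here is a small sketch for spatial indices only. It treats {\mathbf{r}} and {\mathbf{p}} as ordinary 3-vectors at a single point (an illustrative simplification, not a field calculation) and lowers indices with the metric {\left(+,-,-,-\right)}, so that {x_{i}=-x^{i}}. The function names are hypothetical.

```python
def cross(r, p):
    """Ordinary 3-d cross product r x p."""
    return [r[1] * p[2] - r[2] * p[1],
            r[2] * p[0] - r[0] * p[2],
            r[0] * p[1] - r[1] * p[0]]

def lower(v3):
    """Covariant spatial components: v_i = -v^i for the metric (+, -, -, -)."""
    return [-c for c in v3]

def antisymmetric_moment(r, p):
    """Matrix of x_i p_j - x_j p_i for spatial indices i, j = 1, 2, 3."""
    x, q = lower(r), lower(p)
    return [[x[i] * q[j] - x[j] * q[i] for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    r = [1.0, 2.0, 3.0]
    p = [4.0, 5.0, 6.0]
    m = antisymmetric_moment(r, p)
    # The (1,2), (2,3) and (3,1) entries are the z, x and y components of r x p.
    print(m[0][1], m[1][2], m[2][0])
    print(cross(r, p))
```

The two minus signs from lowering the indices cancel in each product, so the three independent entries of the antisymmetric matrix are exactly the three components of the cross product.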

The other term in 35,

\displaystyle \int d^{3}x\;\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(I_{\nu\lambda}\right)_{rs}\phi_{s}\left(x\right) \ \ \ \ \ (42)

depends on the generators {I_{\nu\lambda}}, and thus on the specific way in which the fields transform. G & R tell us that this term describes the spin angular momentum, but at this stage, we just have to accept this on faith.

In any case, the overall conservation rule 35 shows that the sum of the ‘traditional’ angular momentum from the first two terms in the integrand, together with this mysterious other term, is a conserved quantity, so interpreting it as some other form of angular momentum seems reasonable. We’ll just have to wait and see how this plays out.

Lorentz transformations as rotations

References: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

Arthur Jaffe, Lorentz transformations, rotations and boosts, online notes available (at time of writing, Sep 2016) here.

Before we apply Noether’s theorem to Lorentz transformations, we need to take a step back and look at a generalized version of the Lorentz transformation. Most introductory treatments of special relativity derive the Lorentz transformation as the transformation between two inertial frames that are moving at some constant velocity with respect to each other. This form of the transformations allows us to derive the usual consequences of special relativity such as length contraction and time dilation. However, it’s useful to look at a Lorentz transformation in a more general way.

The idea is to define a Lorentz transformation as any transformation that leaves the magnitude of all four-vectors {x} unchanged, where this magnitude is defined using the usual flat space metric {g^{\mu\nu}} so that

\displaystyle  x^{2}=x_{\mu}x^{\mu}=g^{\mu\nu}x_{\mu}x_{\nu}=x_{0}^{2}-x_{1}^{2}-x_{2}^{2}-x_{3}^{2} \ \ \ \ \ (1)

The flat space (Minkowski) metric is

\displaystyle  g=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (2)

We know that the traditional Lorentz transformation between two inertial frames in relative motion satisfies this condition, but in fact a rotation of the coordinate system in 3-d space (leaving the time coordinate unchanged) also satisfies this condition, so a Lorentz transformation defined in this more general way includes more transformations than the traditional one.

We can define this general transformation in terms of a {4\times4} matrix {\Lambda}, so that a four-vector {x} transforms to another vector {x^{\prime}} according to

\displaystyle  x^{\prime}=\Lambda x \ \ \ \ \ (3)

We can define the scalar product of two 4-vectors using the notation

\displaystyle  \left\langle x,y\right\rangle \equiv\sum_{i=0}^{3}x_{i}y_{i} \ \ \ \ \ (4)

The scalar product in flat space using the Minkowski metric {g} is therefore

\displaystyle  \left\langle x,gy\right\rangle =g^{\mu\nu}x_{\mu}y_{\nu}=x_{0}y_{0}-x_{1}y_{1}-x_{2}y_{2}-x_{3}y_{3} \ \ \ \ \ (5)

In matrix notation, in which {x} and {y} are column vectors, this is

\displaystyle  \left\langle x,gy\right\rangle =x^{T}gy \ \ \ \ \ (6)

In this way, the condition that {\Lambda} leaves the magnitude unchanged is

\displaystyle  \left\langle \Lambda x,g\Lambda x\right\rangle =\left\langle x,gx\right\rangle \ \ \ \ \ (7)

for all {x}. In matrix notation, this is

\displaystyle  \left(\Lambda x\right)^{T}g\Lambda x=x^{T}\Lambda^{T}g\Lambda x=x^{T}gx \ \ \ \ \ (8)

from which we get one condition on {\Lambda}:

\displaystyle  \Lambda^{T}g\Lambda=g \ \ \ \ \ (9)

[Note that Jaffe uses a superscript {tr} to indicate a matrix transpose; I find this confusing as {tr} usually means the trace of a matrix, and a superscript {T} is more usual for the transpose.]

Because both sides of 9 refer to a symmetric matrix (on the LHS, {\left(\Lambda^{T}g\Lambda\right)^{T}=\Lambda^{T}g^{T}\left(\Lambda^{T}\right)^{T}=\Lambda^{T}g\Lambda}), this equation gives 10 independent equations for the elements of {\Lambda}, so the number of parameters that can be specified arbitrarily is {4\times4-10=6}.

The set {\mathcal{L}} of all Lorentz transformations forms a group under matrix multiplication, known as the Lorentz group. We can demonstrate this by showing that the four group properties are satisfied.

First, closure. If we perform two transformations in succession on a 4-vector {x} then we get {x^{\prime}=\Lambda_{2}\Lambda_{1}x}. The compound transformation satisfies 9:

\displaystyle   \left(\Lambda_{2}\Lambda_{1}\right)^{T}g\Lambda_{2}\Lambda_{1} \displaystyle  = \displaystyle  \Lambda_{1}^{T}\Lambda_{2}^{T}g\Lambda_{2}\Lambda_{1}\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \Lambda_{1}^{T}g\Lambda_{1}\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (12)

Thus the group is closed under multiplication.

Second, associativity is automatically satisfied as matrix multiplication is associative.

An identity element exists in the form of the identity matrix {I}, which is itself a Lorentz transformation as it satisfies 9.

Finally, we need to show that every matrix {\Lambda} has an inverse that is also part of the set {\mathcal{L}}. Taking the determinant of 9 we have

\displaystyle   \det\left(\Lambda^{T}g\Lambda\right) \displaystyle  = \displaystyle  \left(\det\Lambda^{T}\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \left(\det\Lambda\right)\left(\det g\right)\left(\det\Lambda\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  -\left(\det\Lambda\right)^{2} \ \ \ \ \ (15)

since {\det g=-1} from 2. From the RHS of 9, this must equal {\det g=-1} so we have

\displaystyle   -\left(\det\Lambda\right)^{2} \displaystyle  = \displaystyle  -1\ \ \ \ \ (16)
\displaystyle  \det\Lambda \displaystyle  = \displaystyle  \pm1 \ \ \ \ \ (17)

From a basic theorem in matrix algebra, any matrix with a non-zero determinant has an inverse, so {\Lambda^{-1}} exists. To show that {\Lambda^{-1}} is a Lorentz transformation, we can take the inverse of 9 and use the fact that {g^{-1}=g}:

\displaystyle   \left(\Lambda^{T}g\Lambda\right)^{-1} \displaystyle  = \displaystyle  g^{-1}=g\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{T}\right)^{-1}\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T} \ \ \ \ \ (20)

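since the inverse and transpose operations commute (another basic theorem in matrix algebra). This shows that {\left(\Lambda^{-1}\right)^{T}} satisfies 9. For {\Lambda^{-1}} itself, multiplying 9 on the left by {\left(\Lambda^{T}\right)^{-1}} and on the right by {\Lambda^{-1}} gives {g=\left(\Lambda^{-1}\right)^{T}g\Lambda^{-1}}. Therefore {\Lambda^{-1}} is also a valid Lorentz transformation.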

We can also see that {\Lambda^{T}} is a valid transformation by left-multiplying by {\Lambda} and right-multiplying by {\Lambda^{T}}:

\displaystyle   g \displaystyle  = \displaystyle  \Lambda^{-1}g\left(\Lambda^{-1}\right)^{T}\ \ \ \ \ (21)
\displaystyle  \Lambda g\Lambda^{T} \displaystyle  = \displaystyle  \left(\Lambda\Lambda^{-1}\right)g\left(\Lambda^{-1}\right)^{T}\Lambda^{T}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  g \ \ \ \ \ (23)
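The closure, inverse and transpose properties can be spot-checked numerically. The sketch below uses two sample matrices as inputs, a boost along {x_{1}} in hyperbolic form and a rotation about the {z} axis; these, the parameter values and the function names are illustrative assumptions. The inverse is computed as {g\Lambda^{T}g}, which follows from 9 because {g^{2}=I}.

```python
import math

# Minkowski metric with signature (+, -, -, -), as in equation 2.
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_lorentz(lam, tol=1e-12):
    """True if Lambda^T g Lambda equals g to within tol (equation 9)."""
    lhs = matmul(matmul(transpose(lam), G), lam)
    return all(abs(lhs[i][j] - G[i][j]) < tol
               for i in range(4) for j in range(4))

def boost_x(chi):
    """Sample input: boost along x_1 with hyperbolic angle chi."""
    c, s = math.cosh(chi), math.sinh(chi)
    return [[c, s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """Sample input: rotation by theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]

def inverse(lam):
    """Inverse of a Lorentz matrix as g Lambda^T g (uses equation 9 and g g = I)."""
    return matmul(matmul(G, transpose(lam)), G)

if __name__ == "__main__":
    product = matmul(boost_x(0.7), rotation_z(0.4))
    print(is_lorentz(product))             # closure
    print(is_lorentz(inverse(product)))    # inverse
    print(is_lorentz(transpose(product)))  # transpose
```

A check like this does not prove the group properties, but it is a quick way to catch a sign or index slip when working through the algebra by hand.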

We need one more property of {\Lambda} concerning the element {\Lambda_{00}}. Again starting from 9, the 00 component of the RHS is {g_{00}=1}, and writing out the 00 component of the LHS explicitly we have

\displaystyle  \left[\Lambda^{T}g\Lambda\right]_{00}=\Lambda_{00}^{2}-\sum_{i=1}^{3}\Lambda_{i0}^{2}=1 \ \ \ \ \ (24)

This gives

\displaystyle  \Lambda_{00}=\pm\sqrt{1+\sum_{i=1}^{3}\Lambda_{i0}^{2}} \ \ \ \ \ (25)

Thus either {\Lambda_{00}\ge1} or {\Lambda_{00}\le-1}.

From the determinant and {\Lambda_{00}}, we can classify a particular transformation matrix {\Lambda} as being in one of four so-called connected components. Jaffe spells out in detail the proof that these four components are disjoint, that is, we can’t define some parameter {s} that can be varied continuously to move a matrix {\Lambda} from one connected component to another connected component. The notation {\mathcal{L}_{+}^{\uparrow}} indicates the set of matrices with {\det\Lambda=+1} (indicated by the + subscript) and {\Lambda_{00}\ge1} (indicated by the {\uparrow} superscript). The other three connected components are {\mathcal{L}_{-}^{\uparrow}} ({\det\Lambda=-1}, {\Lambda_{00}\ge1}); {\mathcal{L}_{+}^{\downarrow}} ({\det\Lambda=+1}, {\Lambda_{00}\le-1}); and {\mathcal{L}_{-}^{\downarrow}} ({\det\Lambda=-1}, {\Lambda_{00}\le-1}). Not all of these subsets of {\mathcal{L}} form groups, as some of them are not closed under multiplication.

If {\det\Lambda=+1}, {\Lambda} is called proper, and if {\det\Lambda=-1}, {\Lambda} is called improper. If {\Lambda_{00}\ge+1}, {\Lambda} is orthochronous, and if {\Lambda_{00}\le-1}, {\Lambda} is non-orthochronous. From here on, we’ll consider only proper orthochronous transformations, that is, the connected component {\mathcal{L}_{+}^{\uparrow}}.
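The classification can be written as a short routine. The following sketch assumes its input already satisfies {\Lambda^{T}g\Lambda=g}, so that only the signs of {\det\Lambda} and {\Lambda_{00}} need to be examined. The sample matrices (identity, spatial inversion, time reversal, and their product) are standard examples that are not taken from the text, and the function names are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def classify(lam):
    """Label a Lorentz matrix by the sign of its determinant and of Lambda_00.

    Since det is +1 or -1 and Lambda_00 is either >= 1 or <= -1,
    testing the signs is enough and is robust against rounding.
    """
    kind = "proper" if det(lam) > 0 else "improper"
    time_sense = "orthochronous" if lam[0][0] > 0 else "non-orthochronous"
    return kind, time_sense

def diagonal(entries):
    """Diagonal matrix with the given entries."""
    n = len(entries)
    return [[float(entries[i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    samples = {
        "identity": diagonal([1, 1, 1, 1]),
        "spatial inversion": diagonal([1, -1, -1, -1]),
        "time reversal": diagonal([-1, 1, 1, 1]),
        "full inversion": diagonal([-1, -1, -1, -1]),
    }
    for name, lam in samples.items():
        print(name, classify(lam))
```

Each of the four sample matrices lands in a different connected component, which gives one concrete representative of each.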

Two important types of transformation in {\mathcal{L}_{+}^{\uparrow}} are pure rotations and pure boosts. A pure rotation is a rotation (about the origin) in 3-d space, leaving the time coordinate unchanged. That is, {\Lambda_{00}=+1}. Such a transformation can be written as

\displaystyle  \Lambda=\left[\begin{array}{cc} 1 & 0\\ 0 & \mathcal{R} \end{array}\right] \ \ \ \ \ (26)

where {\mathcal{R}} is a {3\times3} matrix, and the 0s represent 3 zero components in the top row and first column. We know that the off-diagonal elements in the first column must be zero, since if {\Lambda_{00}=+1}, we have from 25 that

\displaystyle  \sum_{i=1}^{3}\Lambda_{i0}^{2}=0 \ \ \ \ \ (27)

Since {\Lambda^{T}} must also be a valid transformation, this gives the analogous equation

\displaystyle  \sum_{i=1}^{3}\Lambda_{0i}^{2}=0 \ \ \ \ \ (28)

Thus the off-diagonal elements of the top row of {\Lambda} are also zero.

Since {\det\Lambda=1}, we must have {\det\mathcal{R}=1}. From 9, {\mathcal{R}} must also be an orthogonal matrix, that is, its rows must be mutually orthogonal (as must its columns). For example, if we pick the 2,3 element in the product 9, we have

\displaystyle   \left[\Lambda^{T}g\Lambda\right]_{23} \displaystyle  = \displaystyle  g_{23}=0\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  -\sum_{i=1}^{3}\Lambda_{i2}\Lambda_{i3} \ \ \ \ \ (30)

Thus columns 2 and 3 must be orthogonal.

These matrices form a group known as {SO\left(3\right)}, the group of real, orthogonal, {3\times3} matrices with {\det\mathcal{R}=+1}. A familiar example is a rotation by an angle {\theta} about the {z} axis, for which

\displaystyle  \mathcal{R}=\left[\begin{array}{ccc} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (31)

giving the full transformation matrix as

\displaystyle  \Lambda=\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (32)

In general, a rotation can be about any axis through the origin, in which case {\mathcal{R}} gets more complicated, but the idea is the same.
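As an illustration of the general case, one standard way to build {\mathcal{R}} for an arbitrary axis is Rodrigues' rotation formula, {\mathcal{R}=\cos\theta\,I+\sin\theta\,\left[\mathbf{n}\right]_{\times}+\left(1-\cos\theta\right)\mathbf{n}\mathbf{n}^{T}}, where {\mathbf{n}} is the unit vector along the axis. This formula is not discussed in the references above; it is brought in here only as a sketch, and the function names are hypothetical. The code checks that the result is orthogonal with determinant +1 and embeds it in a {4\times4} matrix of the form 26.

```python
import math

def rotation_about_axis(axis, theta):
    """3x3 rotation by theta about the given axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    n = [a / norm for a in axis]
    # Cross-product matrix [n]_x, defined so that [n]_x v = n x v.
    cross = [[0.0, -n[2], n[1]],
             [n[2], 0.0, -n[0]],
             [-n[1], n[0], 0.0]]
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (1.0 if i == j else 0.0) + s * cross[i][j]
             + (1.0 - c) * n[i] * n[j]
             for j in range(3)] for i in range(3)]

def embed(rot3):
    """Place a 3x3 rotation in the lower-right block of a 4x4 matrix (equation 26)."""
    lam = [[0.0] * 4 for _ in range(4)]
    lam[0][0] = 1.0
    for i in range(3):
        for j in range(3):
            lam[i + 1][j + 1] = rot3[i][j]
    return lam

def is_orthogonal(rot3, tol=1e-12):
    """True if the columns of rot3 are orthonormal."""
    return all(abs(sum(rot3[k][i] * rot3[k][j] for k in range(3))
                   - (1.0 if i == j else 0.0)) < tol
               for i in range(3) for j in range(3))

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

if __name__ == "__main__":
    r = rotation_about_axis([1.0, 2.0, 2.0], 0.9)
    print(is_orthogonal(r), det3(r))
    for row in embed(r):
        print(row)
```

Choosing the axis along {z} reduces this to the matrix in 31, which is a useful consistency check on the sign conventions.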

We’ve already seen that a pure boost, that is, a transformation into a second inertial frame moving at some constant velocity in a given direction relative to the first frame, can be written as a rotation, if we use hyperbolic functions instead of trig functions. In this case {\Lambda_{00}>+1}. The standard situation from introductory special relativity is that of a frame {S^{\prime}} moving along the {x_{1}} axis at some constant speed {\beta}. If we define

\displaystyle   \cosh\chi \displaystyle  \equiv \displaystyle  \gamma=\frac{1}{\sqrt{1-\beta^{2}}}\ \ \ \ \ (33)
\displaystyle  \sinh\chi \displaystyle  \equiv \displaystyle  \beta\gamma=\frac{\beta}{\sqrt{1-\beta^{2}}} \ \ \ \ \ (34)

then the transformation is

\displaystyle  \Lambda=\left[\begin{array}{cccc} \cosh\chi & \sinh\chi & 0 & 0\\ \sinh\chi & \cosh\chi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (35)

This has determinant +1 since {\cosh^{2}\chi-\sinh^{2}\chi=1}. We can verify by direct substitution that 9 is satisfied.
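The direct substitution can also be done numerically. This sketch builds the boost 35 from a given speed {\beta} (in units where {c=1}), using {\chi=\tanh^{-1}\beta}, which follows from dividing 34 by 33. It then checks the relations 33 and 34 and measures how far {\Lambda^{T}g\Lambda} is from {g}. The function names are hypothetical.

```python
import math

# Minkowski metric with signature (+, -, -, -), as in equation 2.
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def lorentz_defect(lam):
    """Largest absolute entry of Lambda^T g Lambda - g (zero for a Lorentz matrix)."""
    lhs = matmul(matmul(transpose(lam), G), lam)
    return max(abs(lhs[i][j] - G[i][j]) for i in range(4) for j in range(4))

def boost_from_beta(beta):
    """Return (chi, gamma, Lambda) for a boost along x_1 with speed beta, |beta| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    chi = math.atanh(beta)  # tanh(chi) = sinh(chi) / cosh(chi) = beta
    c, s = math.cosh(chi), math.sinh(chi)
    lam = [[c, s, 0.0, 0.0], [s, c, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    return chi, gamma, lam

if __name__ == "__main__":
    chi, gamma, lam = boost_from_beta(0.6)
    print(chi, gamma)
    print(lam[0][0] - gamma, lam[0][1] - 0.6 * gamma)  # both close to zero
    print(lorentz_defect(lam))                         # close to zero
```

For {\beta=0.6} the factor {\gamma} is 1.25, so the top-left block of {\Lambda} has entries close to 1.25 and 0.75.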

It turns out that all proper, orthochronous Lorentz transformations can be written as the product of a pure rotation and a pure boost, that is

\displaystyle  \Lambda=BR \ \ \ \ \ (36)

where the pure rotation {R} is applied first, followed by a pure boost {B}. (Jaffe doesn’t prove this at this point; we’ll return to this later.)

Noether’s theorem and conservation of energy and momentum

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

An important example of Noether’s theorem is the conservation of energy and momentum as consequences of the invariance of the action under coordinate translation in spacetime. Noether’s theorem applies to the situation where we transform the coordinates according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)


resulting in a variation of the fields

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)


If this variation in coordinates and fields leaves the action integral unchanged, Noether’s theorem says that the following condition must be satisfied:

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial_{\nu}\phi_{r}\delta x^{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)=0 \ \ \ \ \ (3)


By integrating this over 3-d space and using Gauss’s theorem (assuming the fields vanish fast enough at infinity), we find a conserved quantity {G}, given by

\displaystyle G\equiv\int_{V}d^{3}x\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial_{\nu}\phi_{r}\delta x^{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{0}\right) \ \ \ \ \ (4)


Suppose we consider a translation in spacetime, so that the coordinates transform according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\epsilon_{\mu} \ \ \ \ \ (5)

where the {\epsilon_{\mu}}s are infinitesimal (and independent) constants. That is, we’re free to vary any (or all) of the coordinates by some infinitesimal amount. In particular, we can choose to make only one of the {\epsilon_{\mu}} variations non-zero. For example, we might choose {\epsilon_{0}} to be non-zero with the remaining three {\epsilon_{i}=0}, which amounts to a translation in time but not in position.

Such a translation means that we perform the same experiment (the same ‘physics’) at a different time and/or at a different place, and we require that we get the same result under all such translations. Note that this does not mean that the behaviour of a system is independent of time or space. Rather, what it is saying is that if we imagine that the only thing that exists in the universe is the physical system we’re studying, it shouldn’t matter if we move the system to some other location, or start the experiment at an earlier or later time; in all cases we should observe the same behaviour. The system might evolve to different states as time passes, but the time-dependence of the system will be the same, as measured from the starting point we have chosen.

In terms of the fields, this amounts to saying that the fields will have exactly the same form when expressed in terms of the translated coordinates, that is

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right) \ \ \ \ \ (6)

[Recall from our earlier discussion that {x^{\prime}} and {x} both refer to the same point, but written in different coordinate systems. Under a translation, the value of a scalar field remains the same, as does a vector field, since all we’ve done is move the coordinate axes parallel to themselves. This is different from a rotation of the coordinates, under which a vector does change its components in the new coordinate system (although its length remains unchanged).]

Thus we have

\displaystyle \delta\phi_{r}\left(x\right)=0 \ \ \ \ \ (7)

and from 3

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)\epsilon^{\nu}=0 \ \ \ \ \ (8)

where we’ve used {\delta x_{\nu}=\epsilon_{\nu}} and used the metric tensor {g_{\mu\nu}} to lower the index: {\epsilon_{\mu}=g_{\mu\nu}\epsilon^{\nu}}. Since the {\epsilon_{\nu}} are arbitrary, we must have

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (9)


for each value of {\nu=0,1,2,3} separately. Thus we get four conservation laws.

For {\nu=0} we can apply the same procedure that was used to derive 4 from 3. That is, we separate the time and space derivatives in 9, integrate over 3-d space and use Gauss’s theorem:

\displaystyle \int_{V}d^{3}x\;\partial^{i}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{i}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{i0}\mathcal{L}\left(x\right)\right)=-\partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)\right) \ \ \ \ \ (10)

On the LHS, the index {i} runs over the spatial indexes 1,2 and 3, and we’ve set {\nu=0} on both sides. The integrand on the LHS is a divergence, so we use Gauss’s theorem to convert the integral to a surface integral and extend the surface to infinity, requiring the integrand to go to zero fast enough that the integral is zero in the limit. We then get that

\displaystyle \partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (11)


so that the integral is a conserved quantity (it has zero time derivative).

Comparing this with the Hamiltonian formulation, we recognize the conjugate momentum density

\displaystyle \pi_{r}=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)} \ \ \ \ \ (12)

so the integrand of 11 is just the Hamiltonian density (using {g_{00}=1} in flat space):

\displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{0}\phi_{r}-g_{00}\mathcal{L}\left(x\right)=\pi_{r}\dot{\phi}_{r}-\mathcal{L}=\mathcal{H} \ \ \ \ \ (13)

Since {\mathcal{H}} is the energy density, 11 says that the total energy of the system is constant in time, so energy is conserved.

We can repeat the procedure for the other three values of {\nu} to get

\displaystyle \partial^{0}\int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{i}\phi_{r}-g_{0i}\mathcal{L}\left(x\right)\right)=0 \ \ \ \ \ (14)


where the index {i=1,2,3}.

Since {g_{0i}=0} in flat space, the integrand reduces to

\displaystyle p_{i}=\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\partial_{i}\phi_{r}=\pi_{r}\frac{\partial\phi_{r}}{\partial x^{i}} \ \ \ \ \ (15)

As we’ve seen earlier, we can interpret this quantity as the physical momentum density, so 14 says that each component of the total physical momentum is conserved. Thus requiring a physical system to be invariant under translation in spacetime results in the laws of conservation of energy and linear momentum.
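As a rough numerical illustration (not from G & R), we can check 11 and 14 for a simple case. The sketch below assumes a massless scalar field in one space dimension with {\mathcal{L}=\frac{1}{2}\left(\dot{\phi}^{2}-\phi_{x}^{2}\right)} and units where {c=1}, so that {\pi=\dot{\phi}}, {\mathcal{H}=\frac{1}{2}\left(\dot{\phi}^{2}+\phi_{x}^{2}\right)} and the momentum density from 15 is {\dot{\phi}\phi_{x}}. The two-pulse solution, the grid and the step sizes are all hypothetical choices made only for this example.

```python
import math

def phi(x, t):
    # Hypothetical solution of phi_tt = phi_xx: a right-moving pulse
    # plus a smaller left-moving pulse (any f(x-t) + g(x+t) would do).
    return math.exp(-(x - t) ** 2) + 0.5 * math.exp(-(x + t + 1.0) ** 2)

def partial(x, t, wrt, h=1e-5):
    # Central finite difference with respect to "x" or "t".
    if wrt == "x":
        return (phi(x + h, t) - phi(x - h, t)) / (2 * h)
    return (phi(x, t + h) - phi(x, t - h)) / (2 * h)

def energy_and_momentum(t, lo=-15.0, hi=15.0, n=4000):
    # Trapezoid-rule integrals of the Hamiltonian density (eq 13)
    # and of the momentum density pi * d(phi)/dx (eq 15).
    dx = (hi - lo) / n
    energy = momentum = 0.0
    for k in range(n + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        phi_t = partial(x, t, "t")
        phi_x = partial(x, t, "x")
        energy += weight * 0.5 * (phi_t ** 2 + phi_x ** 2) * dx
        momentum += weight * phi_t * phi_x * dx
    return energy, momentum

for t in (0.0, 1.0, 2.0):
    E, P = energy_and_momentum(t)
    print(f"t = {t}: E = {E:.8f}, P = {P:.8f}")
```

The pulses move and overlap differently at each time, yet the two integrals should agree at all three times to within the numerical error of the finite differences.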

Going back to 9, the general conservation law says that

\displaystyle \partial^{\mu}T_{\mu\nu}=0 \ \ \ \ \ (16)


where

\displaystyle T_{\mu\nu}\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial_{\nu}\phi_{r}-g_{\mu\nu}\mathcal{L}\left(x\right) \ \ \ \ \ (17)

is the energy-momentum tensor and is defined for all values of {\mu} and {\nu} in the range 0,1,2,3. [G & R use the symbol {\Theta_{\mu\nu}} for this tensor, but as it’s the same as the stress-energy tensor, we’ll try to keep the notation consistent at this point.]
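To see 16 at work pointwise, here is a small sketch under the same kind of assumptions as before: a massless scalar field in one space dimension, {\mathcal{L}=\frac{1}{2}\left(\dot{\phi}^{2}-\phi_{x}^{2}\right)}, metric {g=\mathrm{diag}\left(1,-1\right)} and {c=1}. For this Lagrangian {\partial\mathcal{L}/\partial\left(\partial^{\mu}\phi\right)=\partial_{\mu}\phi}, so 17 becomes {T_{\mu\nu}=\partial_{\mu}\phi\partial_{\nu}\phi-g_{\mu\nu}\mathcal{L}}. The particular solution and the sample points are hypothetical, chosen only for illustration.

```python
import math

def first_derivs(x, t):
    # Analytic derivatives of the hypothetical solution
    # phi = exp(-(x-t)^2) + 0.5*exp(-(x+t+1)^2) of phi_tt = phi_xx.
    u, v = x - t, x + t + 1.0
    fp = -2.0 * u * math.exp(-u * u)   # derivative of the right-mover
    gp = -v * math.exp(-v * v)         # derivative of the left-mover
    return -fp + gp, fp + gp           # (phi_t, phi_x)

G = ((1.0, 0.0), (0.0, -1.0))          # flat metric, signature (+,-)

def T(mu, nu, x, t):
    # Energy-momentum tensor of eq 17 for L = (phi_t^2 - phi_x^2)/2.
    phi_t, phi_x = first_derivs(x, t)
    lagrangian = 0.5 * (phi_t ** 2 - phi_x ** 2)
    d_lower = (phi_t, phi_x)           # (d_0 phi, d_1 phi)
    return d_lower[mu] * d_lower[nu] - G[mu][nu] * lagrangian

def divergence(nu, x, t, h=1e-4):
    # d^mu T_{mu nu}, with d^0 = d/dt and d^1 = -d/dx.
    dt = (T(0, nu, x, t + h) - T(0, nu, x, t - h)) / (2 * h)
    dx = (T(1, nu, x + h, t) - T(1, nu, x - h, t)) / (2 * h)
    return dt - dx

for nu in (0, 1):
    print(nu, divergence(nu, 0.3, 0.2))
```

Each of the two terms in the divergence is of order 1 at the sample point, so a result close to zero reflects a genuine cancellation between the time and space derivatives.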

Noether’s theorem and conservation laws

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

We can now apply the formulas resulting from coordinate transformations to derive Noether’s theorem, which states that for each coordinate transformation that leaves the physics of a system unchanged, there is a corresponding conserved quantity. More precisely, if we transform the coordinates according to

\displaystyle x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)


this results in a variation of the fields

\displaystyle \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)


We can insert these varied quantities into the Lagrangian density to get its variation:

\displaystyle \mathcal{L}^{\prime}\left(x^{\prime}\right)=\mathcal{L}\left(x\right)+\delta\mathcal{L}\left(x\right) \ \ \ \ \ (3)


[Note that the Lagrangian density actually depends on the fields {\phi_{r}} and their derivatives {\partial_{\mu}\phi_{r}}, both of which depend, in turn, on the coordinates {x_{\mu}}, but having to write {\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)} everywhere would get very tedious, so we’ll use {\mathcal{L}\left(x\right)=\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)} as a shorthand.] The function {\mathcal{L}^{\prime}\left(x^{\prime}\right)} is just {\mathcal{L}\left(x\right)} with {x} replaced by {x^{\prime}} and {\phi_{r}\left(x\right)} replaced by {\phi_{r}^{\prime}\left(x^{\prime}\right)}.

The mathematical interpretation of the phrase “the physics doesn’t change” is expressed by requiring the action of the system to remain the same when {\mathcal{L}} is varied. Using G & R’s symbol {W} (rather than the more usual {S}) for the action, this requirement is

\displaystyle \delta W\equiv\int_{\Omega^{\prime}}d^{4}x^{\prime}\mathcal{L}^{\prime}\left(x^{\prime}\right)-\int_{\Omega}d^{4}x\;\mathcal{L}\left(x\right)=0 \ \ \ \ \ (4)


As explained earlier, both {x^{\prime}} and {x} refer to the same point in spacetime, written in two different coordinate systems. The volume {\Omega^{\prime}} is the same volume as {\Omega}, but written in the {x^{\prime}} coordinate system.

From here, we can follow G & R’s derivation of Noether’s theorem, which I find somewhat easier to follow than the one in Peskin & Schroeder, which I looked at earlier. In order to understand what 4 is saying, we first need to express everything in terms of one coordinate system, which we’ll take to be {x}. First, we look at the volume element {d^{4}x^{\prime}} in the first integral. We can express this in terms of the volume element {d^{4}x} by using the Jacobian determinant, using 1 to calculate the derivatives.

\displaystyle d^{4}x^{\prime} \displaystyle = \displaystyle \left|\frac{\partial\left(x_{\mu}^{\prime}\right)}{\partial\left(x_{\nu}\right)}\right|d^{4}x\ \ \ \ \ (5)
\displaystyle \displaystyle = \displaystyle \left|\begin{array}{cccc} 1+\frac{\partial\delta x_{0}}{\partial x_{0}} & \frac{\partial\delta x_{0}}{\partial x_{1}} & \frac{\partial\delta x_{0}}{\partial x_{2}} & \frac{\partial\delta x_{0}}{\partial x_{3}}\\ \frac{\partial\delta x_{1}}{\partial x_{0}} & 1+\frac{\partial\delta x_{1}}{\partial x_{1}} & \frac{\partial\delta x_{1}}{\partial x_{2}} & \frac{\partial\delta x_{1}}{\partial x_{3}}\\ \frac{\partial\delta x_{2}}{\partial x_{0}} & \frac{\partial\delta x_{2}}{\partial x_{1}} & 1+\frac{\partial\delta x_{2}}{\partial x_{2}} & \frac{\partial\delta x_{2}}{\partial x_{3}}\\ \frac{\partial\delta x_{3}}{\partial x_{0}} & \frac{\partial\delta x_{3}}{\partial x_{1}} & \frac{\partial\delta x_{3}}{\partial x_{2}} & 1+\frac{\partial\delta x_{3}}{\partial x_{3}} \end{array}\right|d^{4}x \ \ \ \ \ (6)

Since we’re considering only infinitesimal variations, we need to keep only up to first order terms in this determinant. If we expand the determinant about the first row, the first term is

\displaystyle \left(1+\frac{\partial\delta x_{0}}{\partial x_{0}}\right)\left|\begin{array}{ccc} 1+\frac{\partial\delta x_{1}}{\partial x_{1}} & \frac{\partial\delta x_{1}}{\partial x_{2}} & \frac{\partial\delta x_{1}}{\partial x_{3}}\\ \frac{\partial\delta x_{2}}{\partial x_{1}} & 1+\frac{\partial\delta x_{2}}{\partial x_{2}} & \frac{\partial\delta x_{2}}{\partial x_{3}}\\ \frac{\partial\delta x_{3}}{\partial x_{1}} & \frac{\partial\delta x_{3}}{\partial x_{2}} & 1+\frac{\partial\delta x_{3}}{\partial x_{3}} \end{array}\right| \displaystyle =
\displaystyle \left(1+\frac{\partial\delta x_{0}}{\partial x_{0}}\right)\left[\left(1+\frac{\partial\delta x_{1}}{\partial x_{1}}\right)\left(\left(1+\frac{\partial\delta x_{2}}{\partial x_{2}}\right)\left(1+\frac{\partial\delta x_{3}}{\partial x_{3}}\right)-\frac{\partial\delta x_{2}}{\partial x_{3}}\frac{\partial\delta x_{3}}{\partial x_{2}}\right)+\ldots\right] \ \ \ \ \ (7) \displaystyle =
\displaystyle 1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}+\ldots

In the second line, all the terms represented by the … are of second or higher order in {\delta x_{\mu}} so can be omitted from the final result. Note that we’re summing over {\mu} in the last line. All terms arising from the remaining 3 terms in the expansion about the first row of 6 are also of second or higher order, so the final result, valid to first order, is

\displaystyle d^{4}x^{\prime}=\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)d^{4}x \ \ \ \ \ (8)
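The first-order result 8 is easy to test numerically. The following sketch builds a Jacobian matrix of the form in 6 from an arbitrary (hypothetical) matrix {A} standing in for the derivatives {\partial\delta x_{\mu}/\partial x_{\nu}}, scaled by a small parameter `eps`, and compares the exact determinant with {1+\epsilon\,\mathrm{tr}A}. The entries of {A} are made up for the example.

```python
def det(m):
    # Determinant by Laplace expansion along the first row
    # (inefficient in general, but fine for a 4x4 matrix).
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

# Hypothetical values standing in for d(delta x_mu)/d(x_nu), before scaling.
A = [
    [0.3, -0.7, 0.2, 0.5],
    [0.9, -0.4, 0.1, -0.6],
    [-0.2, 0.8, 0.6, 0.3],
    [0.4, -0.1, -0.5, 0.7],
]

def jacobian_vs_first_order(eps):
    # Returns (exact determinant of 1 + eps*A, first-order estimate 1 + eps*tr A).
    J = [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(4)]
         for i in range(4)]
    trace = sum(A[i][i] for i in range(4))
    return det(J), 1.0 + eps * trace

for eps in (1e-2, 1e-3, 1e-4):
    exact, approx = jacobian_vs_first_order(eps)
    print(eps, exact, approx, abs(exact - approx))
```

The difference between the two columns should shrink roughly by a factor of 100 each time `eps` is reduced by a factor of 10, as expected for an error of second order.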

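So much for the volume element. The only remaining task is to express {\mathcal{L}^{\prime}\left(x^{\prime}\right)} in the {x} coordinate system. To do this, we can use 3 to replace {\mathcal{L}^{\prime}\left(x^{\prime}\right)} by {\mathcal{L}\left(x\right)+\delta\mathcal{L}\left(x\right)}: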

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega^{\prime}}d^{4}x^{\prime}\left[\delta\mathcal{L}\left(x\right)+\mathcal{L}\left(x\right)\right]-\int_{\Omega}d^{4}x\;\mathcal{L}\left(x\right)\ \ \ \ \ (9)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\left[\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)\left(\delta\mathcal{L}\left(x\right)+\mathcal{L}\left(x\right)\right)-\mathcal{L}\left(x\right)\right]\ \ \ \ \ (10)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\left[\left(1+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\right)\delta\mathcal{L}\left(x\right)+\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right)\right]\ \ \ \ \ (11)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\delta\mathcal{L}\left(x\right)+\int_{\Omega}d^{4}x\;\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right) \ \ \ \ \ (12)

where in the last line, we kept terms up to first order only. In the second line, we can replace the volume of integration {\Omega^{\prime}} by {\Omega} in all integrals, since we’ve changed the integration variable from {x^{\prime}} to {x}, and {\Omega} and {\Omega^{\prime}} both represent the same volume, as mentioned above.

Now we can use the total variation, which is

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right)=\delta\mathcal{L}\left(x\right)-\frac{\partial\mathcal{L}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \ \ \ \ \ (13)


We get

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\tilde{\delta}\mathcal{L}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\right]+\int_{\Omega}d^{4}x\;\frac{\partial\delta x_{\mu}}{\partial x_{\mu}}\mathcal{L}\left(x\right)\ \ \ \ \ (14)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\tilde{\delta}\mathcal{L}\left(x\right)+\partial^{\mu}\left(\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right] \ \ \ \ \ (15)

using the product rule (backwards) in the last line.

Now, remembering that {\mathcal{L}\left(x\right)=\mathcal{L}\left(\phi_{r}\left(x\right),\partial_{\mu}\phi_{r}\left(x\right)\right)}, we can use the chain rule to expand the total variation of {\mathcal{L}}:

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\left(\partial^{\mu}\phi_{r}\left(x\right)\right) \ \ \ \ \ (16)

We can now add and subtract the same term to the RHS (equivalent to adding zero) to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)\right]\ \ \ \ \ (17)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\left(\partial^{\mu}\phi_{r}\left(x\right)\right) \ \ \ \ \ (18)

As we saw earlier, the total variation operation {\tilde{\delta}} commutes with differentiation with respect to {x_{\mu}} so we can interchange the {\tilde{\delta}} and {\partial^{\mu}} in the last term to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}\tilde{\delta}\phi_{r}\left(x\right)-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)\right]\ \ \ \ \ (19)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\tilde{\delta}\phi_{r}\left(x\right)+\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\partial^{\mu}\left(\tilde{\delta}\phi_{r}\left(x\right)\right) \ \ \ \ \ (20)

We can now use the reverse product rule on the last two terms to get

\displaystyle \tilde{\delta}\mathcal{L}\left(x\right) \displaystyle = \displaystyle \left[\frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)\right]\tilde{\delta}\phi_{r}\left(x\right)\ \ \ \ \ (21)
\displaystyle \displaystyle \displaystyle +\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\phi_{r}\left(x\right)\right) \ \ \ \ \ (22)

The term in square brackets is just the Euler-Lagrange equation and is zero if the fields {\phi_{r}} satisfy the equations of motion:

\displaystyle \frac{\partial\mathcal{L}\left(x\right)}{\partial\phi_{r}}-\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\right)=0 \ \ \ \ \ (23)

We can therefore insert 22 back into 15 to get

\displaystyle \delta W \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\tilde{\delta}\phi_{r}\left(x\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right]\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle \int_{\Omega}d^{4}x\;\left[\partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)\right] \ \ \ \ \ (25)

The requirement that {\delta W=0} must mean that the integrand is zero, since the volume {\Omega} over which the integration is done is arbitrary. Thus we get

\displaystyle \partial^{\mu}\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu}\right)=0 \ \ \ \ \ (26)

To see what this means, we can define the function {f} as

\displaystyle f_{\mu}\left(x\right)\equiv\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{\mu}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{\mu} \ \ \ \ \ (27)

so that

\displaystyle \partial^{\mu}f_{\mu}\left(x\right)=0 \ \ \ \ \ (28)


If we integrate this over 3-d space and use Gauss’s theorem to convert the integral of a divergence to a surface integral, we have

\displaystyle \int_{V}d^{3}x\;\partial^{\mu}f_{\mu}\left(x\right) \displaystyle = \displaystyle \int_{V}d^{3}x\;\partial^{0}f_{0}\left(x\right)+\int_{V}d^{3}x\nabla\cdot\mathbf{f}\left(x\right)\ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right)+\int_{S}d\mathbf{a}\cdot\mathbf{f}\left(x\right)\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right) \ \ \ \ \ (31)

where we make the usual assumption that {\mathbf{f}\left(x\right)\rightarrow0} fast enough at infinity that the surface integral in the second line is zero. However, the requirement 28 implies that the result of this volume integral must be zero as well, so that

\displaystyle \frac{d}{dx_{0}}\int_{V}d^{3}x\;f_{0}\left(x\right)=0 \ \ \ \ \ (32)

This implies that the volume integral of {f_{0}\left(x\right)} is a conserved quantity, as it is constant over time. This is Noether’s theorem, which we can state as:

A continuous symmetry transformation (given by 1 and 2) that leaves the physics unchanged (that is, there is no change in the action integral 4) leads to a conservation law, with the conserved quantity {G} given by

\displaystyle G \displaystyle \equiv \displaystyle \int_{V}d^{3}x\;f_{0}\left(x\right)\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle \int_{V}d^{3}x\;\left(\frac{\partial\mathcal{L}\left(x\right)}{\partial\left(\partial^{0}\phi_{r}\right)}\left(\delta\phi_{r}\left(x\right)-\partial^{\nu}\phi_{r}\delta x_{\nu}\right)+\mathcal{L}\left(x\right)\delta x_{0}\right) \ \ \ \ \ (34)

Coordinate transformations in classical field theory

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.4.

The various conservation laws of physics (energy, linear and angular momentum) can be derived from the invariance of a system under coordinate transformations. To prepare for Noether’s theorem, which is a general theorem allowing us to derive these conservation laws, we need to consider how the fields themselves transform under coordinate transformations.

In what follows, we’ll consider only infinitesimal transformations, and we define a general transformation as

\displaystyle  x_{\mu}^{\prime}=x_{\mu}+\delta x_{\mu} \ \ \ \ \ (1)

Note that {x_{\mu}} and {x_{\mu}^{\prime}} both refer to the same physical point in space; they simply represent two different coordinate systems referring to this same point.

Under this transformation, the mathematical function describing the field will change as well, so we can write

\displaystyle  \phi_{r}^{\prime}\left(x^{\prime}\right)=\phi_{r}\left(x\right)+\delta\phi_{r}\left(x\right) \ \ \ \ \ (2)

where the subscript {r} labels which field we’re talking about.

Again, {\phi_{r}^{\prime}\left(x^{\prime}\right)} and {\phi_{r}\left(x\right)} both represent the same field at the same point in space-time; they are just expressed in different coordinate systems.

At this point, it’s useful to have a look at a specific example. Suppose the field {\phi} is a vector field in two dimensions (we’ll drop the {r} subscript, as we’re dealing with only one field). We’ll see what happens if we rotate the coordinate system through an angle {\theta}, as in the diagram, where the unprimed system is drawn in black and the primed system in blue.

In the unprimed system, {\phi} consists of horizontal vectors with a magnitude equal to their {x_{2}} coordinate.

\displaystyle  \phi\left(x\right)=\left[\begin{array}{c} x_{2}\\ 0 \end{array}\right] \ \ \ \ \ (3)

Under a rotation, the coordinates transform according to

\displaystyle  x^{\prime}=\left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{cc} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{array}\right]\left[\begin{array}{c} x_{1}\\ x_{2} \end{array}\right]=\left[\begin{array}{c} x_{1}\cos\theta+x_{2}\sin\theta\\ -x_{1}\sin\theta+x_{2}\cos\theta \end{array}\right] \ \ \ \ \ (4)

Inverting the rotation gives

\displaystyle  \left[\begin{array}{c} x_{1}\\ x_{2} \end{array}\right]=\left[\begin{array}{cc} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{array}\right]\left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\cos\theta-x_{2}^{\prime}\sin\theta\\ x_{1}^{\prime}\sin\theta+x_{2}^{\prime}\cos\theta \end{array}\right] \ \ \ \ \ (5)

For our example vector field 3, we have

\displaystyle  \phi^{\prime}\left(x^{\prime}\right)=\left[\begin{array}{c} x_{2}\cos\theta\\ -x_{2}\sin\theta \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\sin\theta\cos\theta+x_{2}^{\prime}\cos^{2}\theta\\ -x_{1}^{\prime}\sin^{2}\theta-x_{2}^{\prime}\sin\theta\cos\theta \end{array}\right] \ \ \ \ \ (6)

As we can see from the diagram by looking at the magenta vector, the vector in the unprimed system is parallel to the {x_{1}} axis, with length {x_{2}} as given by 3. If we rotate the coordinate axes by the angle {\theta} we get the primed system shown as the blue axes, and we can see that in that system, the magenta vector has a positive component in the {x_{1}^{\prime}} direction and a negative component in the {x_{2}^{\prime}} direction. However, the length of the vector remains the same in both systems, since the vector itself doesn’t change when we simply rotate the coordinates.

Since we’ll deal primarily with infinitesimal transformations from now on, we’ll do the rest of the analysis using that approximation. For the rotation example above, suppose {\theta} is now an infinitesimal angle. (I suppose I should write it as {\delta\theta} but this just clutters up the notation, so just remember that {\theta} is infinitesimal and all will be well.) Then we have, to first order in {\theta}, {\cos\theta=1} and {\sin\theta=\theta}, so for a general rotation

\displaystyle   x^{\prime} \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime} \end{array}\right]=\left[\begin{array}{c} x_{1}+x_{2}\theta\\ -x_{1}\theta+x_{2} \end{array}\right]\ \ \ \ \ (7)
\displaystyle  \delta x \displaystyle  = \displaystyle  x^{\prime}-x=\left[\begin{array}{c} x_{2}\theta\\ -x_{1}\theta \end{array}\right] \ \ \ \ \ (8)

For the specific example above, to first order in {\theta}

\displaystyle  \phi^{\prime}\left(x^{\prime}\right)=\left[\begin{array}{c} x_{2}\\ -x_{2}\theta \end{array}\right]=\left[\begin{array}{c} x_{1}^{\prime}\theta+x_{2}^{\prime}\\ -x_{2}^{\prime}\theta \end{array}\right] \ \ \ \ \ (9)

Plugging 3 and 9 into 2, we get

\displaystyle  \delta\phi\left(x\right)=\phi^{\prime}\left(x^{\prime}\right)-\phi\left(x\right)=\left[\begin{array}{c} 0\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (10)

Up to now, we’ve considered what happens at one specific point when the coordinate system is varied. The variation {\delta\phi\left(x\right)} combines two effects: the change in the coordinates themselves and the resulting change in the form of the field expression. In practice, another kind of variation, called the modified or total variation, is defined by

\displaystyle  \tilde{\delta}\phi_{r}\left(x\right)\equiv\phi_{r}^{\prime}\left(x\right)-\phi_{r}\left(x\right) \ \ \ \ \ (11)

Note that the difference between {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)} is that the {\phi_{r}^{\prime}} term is evaluated at {x} in the former and at {x^{\prime}} in the latter. This notation is somewhat confusing, since in 2, both {x^{\prime}} and {x} refer to the same point in the plane, while in 11, the {x} in {\phi_{r}^{\prime}\left(x\right)} is a different point from the {x} in {\phi_{r}\left(x\right)}. We can illustrate this by looking again at the above diagram. The point {x} in the unprimed system is at around {\left(x_{1},x_{2}\right)=\left(1,2\right)} (it’s the location of the tail of the magenta vector, identified by the dotted black lines). The notation {\phi_{r}^{\prime}\left(x\right)} means that we insert the same numerical values for {\left(x_{1},x_{2}\right)} into the function {\phi_{r}^{\prime}}, that is, we set {\left(x_{1}^{\prime},x_{2}^{\prime}\right)=\left(1,2\right)}. This gives the location indicated by the tail of the green vector, as identified by the dotted blue lines. Since this location is higher up the {x_{2}} axis than the magenta vector, the green vector is longer than the magenta vector, so that {\phi_{r}^{\prime}\left(x\right)} and {\phi_{r}\left(x\right)} now refer to two different vectors. The quantity {\tilde{\delta}\phi_{r}\left(x\right)} therefore measures the change in the field due solely to the transformation of the coordinates.

We can, nevertheless, derive a relation between {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)}. Starting from 11, we have

\displaystyle   \tilde{\delta}\phi_{r}\left(x\right) \displaystyle  = \displaystyle  \phi_{r}^{\prime}\left(x\right)-\phi_{r}\left(x\right)\ \ \ \ \ (12)
\displaystyle  \displaystyle  = \displaystyle  \phi_{r}^{\prime}\left(x\right)-\phi_{r}^{\prime}\left(x^{\prime}\right)+\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}\left(x\right)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  -\left(\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}^{\prime}\left(x\right)\right)+\delta\phi_{r}\left(x\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}^{\prime}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \ \ \ \ \ (16)

In the penultimate line, we replaced {\phi_{r}^{\prime}\left(x^{\prime}\right)-\phi_{r}^{\prime}\left(x\right)} by its first order term in the Taylor expansion, and in the last line, we approximated {\phi_{r}^{\prime}\left(x\right)} by {\phi_{r}\left(x\right)}, again valid to first order.

As an example, we can apply this formula to the above vector field. Starting with 11, we have, using 9 and 3

\displaystyle   \tilde{\delta}\phi\left(x\right) \displaystyle  = \displaystyle  \phi^{\prime}\left(x\right)-\phi\left(x\right)\ \ \ \ \ (17)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta+x_{2}\\ -x_{2}\theta \end{array}\right]-\left[\begin{array}{c} x_{2}\\ 0 \end{array}\right]\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (19)

Now we can check 16. From 3 we have

\displaystyle   \frac{\partial\phi\left(x\right)}{\partial x_{1}} \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]\ \ \ \ \ (20)
\displaystyle  \frac{\partial\phi\left(x\right)}{\partial x_{2}} \displaystyle  = \displaystyle  \left[\begin{array}{c} 1\\ 0 \end{array}\right] \ \ \ \ \ (21)

From 8, we have

\displaystyle   \frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu} \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]\delta x_{1}+\left[\begin{array}{c} 1\\ 0 \end{array}\right]\delta x_{2}\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ 0 \end{array}\right]+\left[\begin{array}{c} -x_{1}\theta\\ 0 \end{array}\right]\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  -\left[\begin{array}{c} x_{1}\theta\\ 0 \end{array}\right] \ \ \ \ \ (24)

Combining this with 10 we get

\displaystyle   \tilde{\delta}\phi_{r}\left(x\right) \displaystyle  = \displaystyle  \delta\phi_{r}\left(x\right)-\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\delta x_{\mu}\ \ \ \ \ (25)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} 0\\ -x_{2}\theta \end{array}\right]+\left[\begin{array}{c} x_{1}\theta\\ 0 \end{array}\right]\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  \left[\begin{array}{c} x_{1}\theta\\ -x_{2}\theta \end{array}\right] \ \ \ \ \ (27)

which agrees with 19.
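The same check can be done numerically without making the small-angle approximation by hand. The sketch below uses the exact rotation 4 and its inverse 5, evaluates both variations at the sample point {\left(x_{1},x_{2}\right)=\left(1,2\right)} for a small {\theta}, and compares 11 with the right-hand side of 16. The function names and the value of {\theta} are hypothetical choices for this example.

```python
import math

def phi(x1, x2):
    # The example field of eq 3: horizontal vectors of length x2.
    return (x2, 0.0)

def rotate(theta, v1, v2):
    # Rotation of the coordinate axes, eq 4.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v1 + s * v2, -s * v1 + c * v2)

def phi_prime(theta, y1, y2):
    # The field as a function of primed coordinates: map back to the
    # unprimed coordinates (eq 5), evaluate phi, rotate its components.
    c, s = math.cos(theta), math.sin(theta)
    x1, x2 = c * y1 - s * y2, s * y1 + c * y2
    return rotate(theta, *phi(x1, x2))

def variations(theta, x1, x2):
    xp1, xp2 = rotate(theta, x1, x2)
    dx2 = xp2 - x2                      # delta x_2 (delta x_1 is not needed)
    f = phi(x1, x2)
    fp_at_xp = phi_prime(theta, xp1, xp2)
    fp_at_x = phi_prime(theta, x1, x2)
    delta = (fp_at_xp[0] - f[0], fp_at_xp[1] - f[1])       # eq 2
    delta_tilde = (fp_at_x[0] - f[0], fp_at_x[1] - f[1])   # eq 11
    # Right side of eq 16, using d(phi)/dx1 = (0,0), d(phi)/dx2 = (1,0).
    rhs = (delta[0] - 1.0 * dx2, delta[1])
    return delta_tilde, rhs

print(variations(1e-3, 1.0, 2.0))
```

The two pairs should agree up to terms of order {\theta^{2}}, and both should be close to the first-order result {\left(x_{1}\theta,-x_{2}\theta\right)} found in 19 and 27.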

Finally, we can note a couple of formulas concerning the derivatives of the two variations {\tilde{\delta}\phi_{r}\left(x\right)} and {\delta\phi_{r}\left(x\right)}. Since {\tilde{\delta}\phi_{r}\left(x\right)} depends only on {x} (and not on {x^{\prime}}), the derivative commutes with the variation:

\displaystyle  \frac{\partial}{\partial x_{\mu}}\tilde{\delta}\phi_{r}\left(x\right)=\tilde{\delta}\left(\frac{\partial\phi_{r}\left(x\right)}{\partial x_{\mu}}\right) \ \ \ \ \ (28)

The other variation {\delta\phi_{r}\left(x\right)} is a bit trickier, since it involves {x^{\prime}} as well as {x}. However, using the chain rule, we can find its derivative. I’ll use the shorthand {\partial_{\mu}\equiv\partial/\partial x_{\mu}} and {\partial_{\mu}^{\prime}\equiv\partial/\partial x_{\mu}^{\prime}}.

\displaystyle   \partial_{\mu}\left(\delta\phi_{r}\left(x\right)\right) \displaystyle  = \displaystyle  \partial_{\mu}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}\phi_{r}\left(x\right)\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  \left[\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}\phi_{r}\left(x\right)\right]+\partial_{\mu}\phi_{r}^{\prime}\left(x^{\prime}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\ \ \ \ \ (30)
\displaystyle  \displaystyle  = \displaystyle  \delta\left(\partial_{\mu}\phi_{r}\left(x\right)\right)+\left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\left(\partial_{\mu}x^{\prime\nu}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right) \ \ \ \ \ (31)

We can now use 1 on the middle term:

\displaystyle   \partial_{\mu}x^{\prime\nu} \displaystyle  = \displaystyle  \partial_{\mu}\left(x^{\nu}+\delta x^{\nu}\right)\ \ \ \ \ (32)
\displaystyle  \displaystyle  = \displaystyle  \delta^{\mu\nu}+\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (33)

Combining the last two terms, we get

\displaystyle   \left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\left(\delta^{\mu\nu}+\partial_{\mu}\delta x^{\nu}\right)-\partial_{\mu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right) \displaystyle  = \displaystyle  \left(\partial_{\nu}^{\prime}\phi_{r}^{\prime}\left(x^{\prime}\right)\right)\partial_{\mu}\delta x^{\nu}\ \ \ \ \ (34)
\displaystyle  \displaystyle  = \displaystyle  \left(\partial_{\nu}\phi_{r}\left(x\right)\right)\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (35)

Again, the last step is valid to first order in the variations. Thus we have

\displaystyle  \partial_{\mu}\left(\delta\phi_{r}\left(x\right)\right)=\delta\left(\partial_{\mu}\phi_{r}\left(x\right)\right)+\left(\partial_{\nu}\phi_{r}\left(x\right)\right)\partial_{\mu}\delta x^{\nu} \ \ \ \ \ (36)
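As a quick sanity check of 36, we can try a one-dimensional dilation acting on a scalar field. This choice of transformation is my own assumption, made purely for illustration; it isn't taken from G & R. With {x^{\prime}=\left(1+\epsilon\right)x} we have {\delta x=\epsilon x}, and a scalar field satisfies {\phi^{\prime}\left(x^{\prime}\right)=\phi\left(x\right)}, so {\delta\phi=0} and the LHS of 36 vanishes. To first order in {\epsilon}, the two terms on the RHS are

```latex
\begin{aligned}
\delta\left(\partial\phi\right) &= \partial^{\prime}\phi^{\prime}\left(x^{\prime}\right)-\partial\phi\left(x\right)
  = \frac{1}{1+\epsilon}\frac{d\phi}{dx}-\frac{d\phi}{dx}\approx-\epsilon\frac{d\phi}{dx}\\
\left(\partial\phi\right)\partial\left(\delta x\right) &= \epsilon\frac{d\phi}{dx}
\end{aligned}
```

The two terms cancel, so the RHS also vanishes, as it should.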

Poisson brackets and Hamilton’s equations of motion

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.2.

Although I’ve looked at Poisson brackets before, it’s worth going through G & R’s treatment as it is a fair bit simpler and gives clearer results.

First, we need the time derivative of a functional. In the simplest case, a functional {F\left[\phi\right]} depends on a function {\phi} which in turn depends on an independent variable {x}. The functional itself does not depend on {x}, however, usually because {F} is defined as the integral of {\phi\left(x\right)} over some range of {x} values, so the dependence on {x} disappears in the integration.

We can generalize things a bit by taking {\phi} as a function of two variables, say {x} and {t}. If {F} is defined in the same way (say, as an integral of {\phi} over {x}), then the variable {t} also appears in the functional, so we can write this as {F\left(t\right)}, which is

\displaystyle F\left(t\right)=\int dx\;g\left(\phi\left(x,t\right)\right) \ \ \ \ \ (1)

where {g\left(\phi\right)} is some function of {\phi}. Since {F} now depends on {t}, we can take the derivative {dF/dt} which comes out to

\displaystyle \dot{F}\equiv\frac{dF}{dt}=\int dx\frac{dg}{d\phi}\frac{\partial\phi}{\partial t}=\int dx\frac{dg}{d\phi}\dot{\phi}\left(x,t\right) \ \ \ \ \ (2)

As we’ve seen before, the functional derivative of {F} in this case is

\displaystyle \frac{\delta F\left(t\right)}{\delta\phi\left(y,t\right)}=\frac{dg\left(\phi\left(y,t\right)\right)}{d\phi} \ \ \ \ \ (3)

where the notation means that we evaluate the derivative on the RHS at the point {\left(y,t\right)}. Using this result, we can therefore write {\dot{F}} as

\displaystyle \dot{F}\left(t\right)=\int dx\frac{\delta F\left(t\right)}{\delta\phi\left(x,t\right)}\dot{\phi}\left(x,t\right) \ \ \ \ \ (4)
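A small numerical check can make 2 and 4 more concrete. The sketch below is only an illustration under assumptions of my own: the sample field {\phi\left(x,t\right)=e^{-t}e^{-x^{2}}}, the choice {g\left(\phi\right)=\phi^{2}} and the helper names are all hypothetical and not taken from G & R. It compares the integral formula for {\dot{F}} with a direct finite-difference derivative of {F\left(t\right)}.

```python
import math

def phi(x, t):
    # Assumed sample field: a Gaussian in x whose amplitude decays in time.
    return math.exp(-t) * math.exp(-x * x)

def phi_dot(x, t):
    # Partial time derivative of the sample field above.
    return -phi(x, t)

def g(p):
    # Assumed integrand g(phi) = phi^2.
    return p * p

def dg_dphi(p):
    # Ordinary derivative of g; by equation 3 this is the functional derivative of F.
    return 2.0 * p

def integrate(f, lo=-8.0, hi=8.0, n=4000):
    # Midpoint rule; the Gaussian is negligible outside [-8, 8].
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx

def F(t):
    # The functional F(t) = integral of g(phi(x, t)) over x (equation 1).
    return integrate(lambda x: g(phi(x, t)))

def F_dot_from_formula(t):
    # Equation 4: integrate (functional derivative) * (time derivative of phi).
    return integrate(lambda x: dg_dphi(phi(x, t)) * phi_dot(x, t))

def F_dot_numeric(t, h=1e-5):
    # Direct central-difference derivative of F(t), for comparison.
    return (F(t + h) - F(t - h)) / (2.0 * h)

if __name__ == "__main__":
    t = 0.3
    print(F_dot_from_formula(t), F_dot_numeric(t))
```

For this particular field, {F\left(t\right)=e^{-2t}\sqrt{\pi/2}}, so both numbers should agree with {-2e^{-2t}\sqrt{\pi/2}} to within the discretization error.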

We can generalize this to 4-d space time, so that {x} now indicates the four-vector {x=\left(\mathbf{x},t\right)}, and the integral is over 3-d space:

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\dot{\phi}\left(x\right) \ \ \ \ \ (5)

Generalizing even further, we can make {F} a functional of two fields, {\phi} and {\pi}, so we get

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\dot{\phi}\left(x\right)+\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\dot{\pi}\left(x\right)\right] \ \ \ \ \ (6)

Interpreting {\phi} as the field and {\pi} as its conjugate momentum, we can now use Hamilton’s equations of motion

\displaystyle \dot{\phi} \displaystyle = \displaystyle \frac{\delta H}{\delta\pi}\ \ \ \ \ (7)
\displaystyle \dot{\pi} \displaystyle = \displaystyle -\frac{\delta H}{\delta\phi} \ \ \ \ \ (8)

We get

\displaystyle \dot{F}\left(t\right)=\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\frac{\delta H}{\delta\pi}-\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\frac{\delta H}{\delta\phi}\right] \ \ \ \ \ (9)

The quantity on the RHS is defined to be the Poisson bracket:

\displaystyle \left\{ F,H\right\} _{PB}\equiv\int d^{3}\mathbf{x}\left[\frac{\delta F\left(t\right)}{\delta\phi\left(x\right)}\frac{\delta H}{\delta\pi}-\frac{\delta F\left(t\right)}{\delta\pi\left(x\right)}\frac{\delta H}{\delta\phi}\right] \ \ \ \ \ (10)


We thus have the general result that the time derivative of a functional is equal to its Poisson bracket with the Hamiltonian:

\displaystyle \boxed{\dot{F}\left(t\right)=\left\{ F,H\right\} _{PB}} \ \ \ \ \ (11)
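To see 11 at work, here is a minimal lattice sketch. Everything specific in it is an assumption made for illustration: space is replaced by a handful of lattice sites with spacing `A`, the Hamiltonian is taken to be the ultra-local {H=a\sum_{j}\left(\frac{1}{2}\pi_{j}^{2}+\frac{1}{2}\omega^{2}\phi_{j}^{2}\right)} (no gradient term, so each site is an independent oscillator with an exact solution), and the functional is {F=a\sum_{j}\phi_{j}^{2}}. The code compares the Poisson bracket with a finite-difference time derivative of {F}.

```python
import math

A = 0.1                       # lattice spacing (assumed)
OMEGA = 1.3                   # frequency in the assumed ultra-local Hamiltonian
PHI0 = [0.4, -1.0, 0.7, 0.2]  # initial field values on the lattice sites
PI0 = [0.5, 0.3, -0.8, 1.1]   # initial conjugate momenta on the lattice sites

def evolve(t):
    # Exact solution of phi_dot = pi, pi_dot = -omega^2 phi at every site.
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    phi = [p * c + (q / OMEGA) * s for p, q in zip(PHI0, PI0)]
    pi = [q * c - OMEGA * p * s for p, q in zip(PHI0, PI0)]
    return phi, pi

def F(phi):
    # Lattice version of F = integral of phi^2 over space.
    return A * sum(p * p for p in phi)

def poisson_bracket_F_H(phi, pi):
    # Lattice version of equation 10 with
    # dF/dphi_j = 2 phi_j, dF/dpi_j = 0 and dH/dpi_j = pi_j (functional derivatives).
    return A * sum(2.0 * p * q for p, q in zip(phi, pi))

def F_dot_numeric(t, h=1e-6):
    # Central-difference time derivative of F along the exact solution.
    return (F(evolve(t + h)[0]) - F(evolve(t - h)[0])) / (2.0 * h)

if __name__ == "__main__":
    t = 0.37
    phi, pi = evolve(t)
    print(poisson_bracket_F_H(phi, pi), F_dot_numeric(t))
```

The two printed numbers should agree up to the finite-difference error, which is what 11 predicts.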


We can use this result in a rather curious way to re-derive Hamilton’s equations of motion. We first observe that we can write the field {\phi} as an integral:

\displaystyle \phi\left(\mathbf{x},t\right)=\int d^{3}\mathbf{x}^{\prime}\;\phi\left(\mathbf{x}^{\prime},t\right)\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (12)

This effectively defines an ordinary function {\phi} as a functional depending on itself. In this case, both {\mathbf{x}} and {t} are parameters that are present on both sides of the equation; it is the dummy variable {\mathbf{x}^{\prime}} that is the variable of integration.

Taking the variation on both sides, we get

\displaystyle \delta\phi\left(\mathbf{x},t\right)=\int d^{3}\mathbf{x}^{\prime}\;\delta\phi\left(\mathbf{x}^{\prime},t\right)\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (13)

[Be careful not to get the {\delta}s confused here: {\delta\phi} is a variation of the function {\phi} while {\delta^{3}} is the 3-d delta function.] Comparing this to the definition of the functional derivative

\displaystyle \delta F\left[\phi\right]\equiv\int d^{3}\mathbf{x}\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}\delta\phi\left(x\right) \ \ \ \ \ (14)


we see that we have the functional derivative of {\phi} with respect to itself:

\displaystyle \frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}=\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (15)

We could use the same argument on the conjugate momentum, so we also have

\displaystyle \frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}=\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (16)

Since {\phi} and {\pi} are independent fields

\displaystyle \frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}=\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}=0 \ \ \ \ \ (17)


We can now use 11 to find the time derivatives of {\phi} and {\pi} by treating them as functionals:

\displaystyle \dot{\phi}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \left\{ \phi\left(\mathbf{x},t\right),H\right\} _{PB}\ \ \ \ \ (18)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (19)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-0\right]\ \ \ \ \ (20)
\displaystyle \displaystyle = \displaystyle \frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x},t\right)} \ \ \ \ \ (21)

This gives the first Hamilton equation of motion 7. We can work out the second equation similarly:

\displaystyle \dot{\pi}\left(\mathbf{x},t\right) \displaystyle = \displaystyle \left\{ \pi\left(\mathbf{x},t\right),H\right\} _{PB}\ \ \ \ \ (22)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}-\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime},t\right)}\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (23)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime}\left[0-\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x}^{\prime},t\right)}\right]\ \ \ \ \ (24)
\displaystyle \displaystyle = \displaystyle -\frac{\delta H\left(t\right)}{\delta\phi\left(\mathbf{x},t\right)} \ \ \ \ \ (25)

Finally, we can work out the Poisson brackets of the fields with each other, using the definition 10 and the results above.

\displaystyle \left\{ \phi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (26)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime\prime}\right)\delta^{3}\left(\mathbf{x}^{\prime}-\mathbf{x}^{\prime\prime}\right)-0\right]\ \ \ \ \ (27)
\displaystyle \displaystyle = \displaystyle \delta^{3}\left(\mathbf{x}-\mathbf{x}^{\prime}\right) \ \ \ \ \ (28)

The other two Poisson brackets are zero because of 17:

\displaystyle \left\{ \phi\left(\mathbf{x},t\right),\phi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\phi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\phi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\phi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[0-0\right]\ \ \ \ \ (30)
\displaystyle \displaystyle = \displaystyle 0\ \ \ \ \ (31)
\displaystyle \left\{ \pi\left(\mathbf{x},t\right),\pi\left(\mathbf{x}^{\prime},t\right)\right\} _{PB} \displaystyle \equiv \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}-\frac{\delta\pi\left(\mathbf{x},t\right)}{\delta\pi\left(\mathbf{x}^{\prime\prime},t\right)}\frac{\delta\pi\left(\mathbf{x}^{\prime},t\right)}{\delta\phi\left(\mathbf{x}^{\prime\prime},t\right)}\right]\ \ \ \ \ (32)
\displaystyle \displaystyle = \displaystyle \int d^{3}\mathbf{x}^{\prime\prime}\left[0-0\right]\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle 0 \ \ \ \ \ (34)
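The three brackets above have a simple lattice analogue, which the following sketch checks numerically. It is a hedged illustration rather than anything from G & R: the helper names (`gradient`, `poisson_bracket`, `field_at`, `momentum_at`) and the sample values are hypothetical, and I'm assuming the usual discretization in which a functional derivative becomes an ordinary partial derivative divided by the lattice spacing {a} (the one-dimensional stand-in for a volume element), so that the delta function becomes {\delta_{ik}/a}.

```python
def gradient(func, values, eps=1e-6):
    """Central-difference partial derivatives of func with respect to each lattice value."""
    grads = []
    for j in range(len(values)):
        up = list(values)
        down = list(values)
        up[j] += eps
        down[j] -= eps
        grads.append((func(up) - func(down)) / (2.0 * eps))
    return grads

def poisson_bracket(F, G, phi, pi, a):
    """Lattice Poisson bracket; F and G take (phi, pi) lists and return a number."""
    # Functional derivative ~ (partial derivative) / (lattice spacing).
    dF_dphi = [d / a for d in gradient(lambda u: F(u, pi), phi)]
    dF_dpi = [d / a for d in gradient(lambda u: F(phi, u), pi)]
    dG_dphi = [d / a for d in gradient(lambda u: G(u, pi), phi)]
    dG_dpi = [d / a for d in gradient(lambda u: G(phi, u), pi)]
    # The integral over space becomes a sum over sites times the spacing.
    return a * sum(fp * gq - fq * gp
                   for fp, fq, gp, gq in zip(dF_dphi, dF_dpi, dG_dphi, dG_dpi))

def field_at(i):
    # The field at site i, viewed as a functional of (phi, pi).
    return lambda phi, pi: phi[i]

def momentum_at(i):
    # The conjugate momentum at site i, viewed as a functional of (phi, pi).
    return lambda phi, pi: pi[i]

a = 0.25
phi = [0.1, -0.4, 0.3, 0.9, -0.2]
pi = [0.5, 0.2, -0.7, 0.0, 0.6]

same_site = poisson_bracket(field_at(2), momentum_at(2), phi, pi, a)       # lattice delta: close to 1/a
different_site = poisson_bracket(field_at(2), momentum_at(3), phi, pi, a)  # vanishes
field_field = poisson_bracket(field_at(2), field_at(2), phi, pi, a)        # vanishes
```

As the spacing {a} shrinks, the same-site value {1/a} grows without bound while the bracket stays zero everywhere else, which is the lattice picture of the delta function in 28.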

Hamilton’s equations of motion in classical field theory

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2, Section 2.2.

We’ve looked at the derivation of Hamilton’s equations of motion for fields before, but it’s worth running through G & R’s discussion, and filling in a few gaps.

We start with the Lagrangian as a functional of the field {\phi} and its time derivative {\dot{\phi}}:

\displaystyle  L\left(t\right)=L\left[\phi\left(\mathbf{x},t\right),\dot{\phi}\left(\mathbf{x},t\right)\right] \ \ \ \ \ (1)

Recall that {L} does not depend on the spatial position {\mathbf{x}} as it is actually an integral of the Lagrange density {\mathcal{L}} (which is an ordinary function, not a functional) over space:

\displaystyle  L\left(t\right)=\int d^{3}x\;\mathcal{L}\left(\phi\left(\mathbf{x},t\right),\nabla\phi\left(\mathbf{x},t\right),\dot{\phi}\left(\mathbf{x},t\right)\right) \ \ \ \ \ (2)

To get Hamilton’s formulation in classical field theory, we want an analogue of Hamilton’s equations in classical particle theory, where we define a conjugate momentum {p_{k}} for each position coordinate {q_{k}} as

\displaystyle  p_{k}\equiv\frac{\partial L}{\partial\dot{q}_{k}} \ \ \ \ \ (3)

[Remember that in particle theory, {L} is an ordinary function, not a functional.]

We then define the Hamiltonian by means of the Legendre transformation

\displaystyle  H\equiv\sum_{k}p_{k}\dot{q}_{k}-L \ \ \ \ \ (4)

To do the same thing for fields, we replace the position coordinates {q_{k}} by the field functions {\phi\left(\mathbf{x},t\right)}, but we then need an analogue for the conjugate momenta {p_{k}}. We use 3 as inspiration, remembering that {L} in field theory is a functional, not a function, so we need to use a functional derivative instead of the partial derivative in 3. We get the conjugate momentum {\pi\left(\mathbf{x},t\right)}:

\displaystyle  \pi\left(\mathbf{x},t\right)\equiv\frac{\delta L}{\delta\dot{\phi}\left(\mathbf{x},t\right)} \ \ \ \ \ (5)

Note that both {\pi\left(\mathbf{x},t\right)} and {\phi\left(\mathbf{x},t\right)} are functions of space and time. Remember also that functional derivatives are actually densities, in that they are given per unit volume.

Now we can define a Legendre transformation for field theory as an analogue of 4 to get the Hamiltonian {H}, which is a functional:

\displaystyle   H\left(t\right) \displaystyle  \equiv \displaystyle  \int d^{3}x\;\pi\left(\mathbf{x},t\right)\dot{\phi}\left(\mathbf{x},t\right)-L\left(t\right)\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \int d^{3}x\left[\pi\left(x\right)\dot{\phi}\left(x\right)-\mathcal{L}\left(x\right)\right] \ \ \ \ \ (7)

where in the last line we used the Lagrangian density 2, and used the single symbol {x} to represent both space {\mathbf{x}} and time {t}. In classical particle theory, the Hamiltonian is defined to depend on {p_{k}} and {q_{k}} as the independent variables, rather than {q_{k}} and {\dot{q}_{k}} on which the Lagrangian depends. For simple particle systems where {p_{k}=m\dot{q}_{k}} (with {m} being the mass of the particle), this change of variable is fairly trivial, although I imagine there are more complex cases where the correspondence isn’t quite so simple.

We can now derive Hamilton’s equations of motion by taking the variation of {H} and using the definition of a functional derivative, and taking the field Hamiltonian to be a functional depending on {\pi} and {\phi}:

\displaystyle  \delta H=\int d^{3}x\left[\frac{\delta H}{\delta\pi}\delta\pi+\frac{\delta H}{\delta\phi}\delta\phi\right] \ \ \ \ \ (8)

We can find expressions for the two functional derivatives in this equation by finding the variation of 6.

\displaystyle  \delta H=\int d^{3}x\left(\dot{\phi}\delta\pi+\pi\delta\dot{\phi}\right)-\delta L \ \ \ \ \ (9)

The variation of the Lagrangian is

\displaystyle  \delta L=\int d^{3}x\left[\frac{\delta L}{\delta\dot{\phi}}\delta\dot{\phi}+\frac{\delta L}{\delta\phi}\delta\phi\right] \ \ \ \ \ (10)

We can get rid of the two functional derivatives as follows. For the first one, we use 5, while for the second one we can use the Euler-Lagrange equation

\displaystyle   \frac{\delta L}{\delta\phi}-\frac{\partial}{\partial t}\frac{\delta L}{\delta\dot{\phi}} \displaystyle  = \displaystyle  0\ \ \ \ \ (11)
\displaystyle  \frac{\delta L}{\delta\phi} \displaystyle  = \displaystyle  \dot{\pi} \ \ \ \ \ (12)

Therefore we have

\displaystyle  \delta L=\int d^{3}x\left(\pi\delta\dot{\phi}+\dot{\pi}\delta\phi\right) \ \ \ \ \ (13)

Inserting this into 9 gives

\displaystyle  \delta H=\int d^{3}x\left(\dot{\phi}\delta\pi-\dot{\pi}\delta\phi\right) \ \ \ \ \ (14)

Comparing with 8 we get

\displaystyle   \dot{\phi} \displaystyle  = \displaystyle  \frac{\delta H}{\delta\pi}\ \ \ \ \ (15)
\displaystyle  \dot{\pi} \displaystyle  = \displaystyle  -\frac{\delta H}{\delta\phi} \ \ \ \ \ (16)

These are Hamilton’s equations of motion, although not in a terribly useful form, since we need to know the functional derivatives in order to get a pair of differential equations that we can try to solve. To get an expression in terms of ordinary functions, we can define the Hamiltonian density {\mathcal{H}} from 7:

\displaystyle  \mathcal{H}\equiv\pi\left(x\right)\dot{\phi}\left(x\right)-\mathcal{L}\left(x\right) \ \ \ \ \ (17)

As with the Lagrangian density, we now assume that {\mathcal{H}} depends on {\pi}, {\phi} and their first spatial derivatives {\partial_{i}\pi} and {\partial_{i}\phi}. We can then follow the same procedure we used earlier to get expressions for the functional derivatives of the Lagrangian in terms of ordinary derivatives of the Lagrangian density. We write the variation of the Hamiltonian as

\displaystyle  \delta H=\int d^{3}x\left[\frac{\partial\mathcal{H}}{\partial\pi}\delta\pi+\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\delta\pi_{,i}+\frac{\partial\mathcal{H}}{\partial\phi}\delta\phi+\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\delta\phi_{,i}\right] \ \ \ \ \ (18)

The second and fourth terms in the integrand can be converted using integration by parts to give (assuming that {\mathcal{H}} and its derivatives go to zero sufficiently quickly at infinity):

\displaystyle   \int d^{3}x\;\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\delta\pi_{,i} \displaystyle  = \displaystyle  -\int d^{3}x\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\right)\delta\pi\ \ \ \ \ (19)
\displaystyle  \int d^{3}x\;\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\delta\phi_{,i} \displaystyle  = \displaystyle  -\int d^{3}x\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\right)\delta\phi \ \ \ \ \ (20)

We therefore have

\displaystyle  \delta H=\int d^{3}x\left[\left(\frac{\partial\mathcal{H}}{\partial\pi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\right)\right)\delta\pi+\left(\frac{\partial\mathcal{H}}{\partial\phi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\right)\right)\delta\phi\right] \ \ \ \ \ (21)

Comparing with 8 we have

\displaystyle   \frac{\delta H}{\delta\pi} \displaystyle  = \displaystyle  \frac{\partial\mathcal{H}}{\partial\pi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\right)\ \ \ \ \ (22)
\displaystyle  \frac{\delta H}{\delta\phi} \displaystyle  = \displaystyle  \frac{\partial\mathcal{H}}{\partial\phi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\right) \ \ \ \ \ (23)

Inserting into 15 and 16 we get Hamilton’s equations of motion in a form free of functional derivatives:

\displaystyle   \dot{\phi} \displaystyle  = \displaystyle  \frac{\partial\mathcal{H}}{\partial\pi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\right)\ \ \ \ \ (24)
\displaystyle  \dot{\pi} \displaystyle  = \displaystyle  -\frac{\partial\mathcal{H}}{\partial\phi}+\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\right) \ \ \ \ \ (25)

These results differ from the ones we got earlier in that the equation for {\dot{\phi}} has an extra term {-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\pi_{,i}}\right)} that wasn’t present earlier. This is because in our earlier treatment, we assumed that {\mathcal{H}} depended only on {\phi}, {\phi_{,i}} and {\pi} but not on {\pi_{,i}}.

If we can get an expression for {\mathcal{H}} in terms of the field and momentum, and their spatial derivatives, we have a set of differential equations that we can solve (or at least try to).
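As a concrete illustration (my own choice of example, not part of the derivation above), take the free real scalar field, assuming the Lagrangian density {\mathcal{L}=\frac{1}{2}\dot{\phi}^{2}-\frac{1}{2}\left(\nabla\phi\right)^{2}-\frac{1}{2}m^{2}\phi^{2}} in units with {\hbar=c=1}. Since {\mathcal{L}} depends on {\dot{\phi}} only through the first term, 5 gives {\pi=\dot{\phi}}, and 17, 24 and 25 become

```latex
\begin{aligned}
\mathcal{H} &= \pi\dot{\phi}-\mathcal{L}
  = \frac{1}{2}\pi^{2}+\frac{1}{2}\left(\nabla\phi\right)^{2}+\frac{1}{2}m^{2}\phi^{2}\\
\dot{\phi} &= \frac{\partial\mathcal{H}}{\partial\pi} = \pi\\
\dot{\pi} &= -\frac{\partial\mathcal{H}}{\partial\phi}
  +\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{H}}{\partial\phi_{,i}}\right)
  = -m^{2}\phi+\nabla^{2}\phi
\end{aligned}
```

In this example {\mathcal{H}} has no dependence on {\pi_{,i}}, so the extra term in 24 drops out. Eliminating {\pi} gives {\ddot{\phi}-\nabla^{2}\phi+m^{2}\phi=0}, which is the Klein-Gordon equation.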

Functional derivatives and the Lagrangian

Reference: W. Greiner & J. Reinhardt, Field Quantization, Springer-Verlag (1996), Chapter 2.

Although I’ve looked at functionals and functional derivatives before, I’ve been reading yet another book on quantum field theory (Greiner & Reinhardt, referenced above) so I think it’s worth re-examining them and their use in deriving the Euler-Lagrange equations in classical field theory.

G & R’s starting point is their definition of a functional derivative (in one dimension) as follows:

\displaystyle  \delta F\left[\phi\right]\equiv\int dx\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}\delta\phi\left(x\right) \ \ \ \ \ (1)

I find this notation a bit confusing, but I think we can interpret it like this. {F\left[\phi\right]} is a functional of the function {\phi\left(x\right)}, which means that {F} depends on the function {\phi}, but not (at least explicitly) on {x}. One way this might happen is if {F} is defined as an integral of {\phi} over some range of {x}:

\displaystyle  F\left[\phi\right]=\int_{x_{1}}^{x_{2}}\phi\left(x\right)dx \ \ \ \ \ (2)

Clearly if we change {\phi}, the value of {F\left[\phi\right]} will (usually) change as well, so we can see that a functional is a mapping from a set of functions to the set of ordinary (real or complex) numbers. In this way, it differs from a regular function {f\left(x\right)} which is a mapping from one set of numbers (the {x} values) to another (usually the same) set of numbers.

To understand what the definition 1 is saying, we need to picture what happens if we perturb the function {\phi\left(x\right)} by an amount {\delta\phi\left(x\right)}. At each point {x}, the perturbation {\delta\phi\left(x\right)} will cause a change in the resulting functional {F}. This change can be written as {\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}}, which can be interpreted as the change in {F} per unit change in {\phi} and per unit of {x}. Since the actual change in {\phi} at point {x} is {\delta\phi\left(x\right)}, then the change in {F\left[\phi\right]} over a distance {dx} is {dx\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}\delta\phi\left(x\right)} and the total change in {F\left[\phi\right]} is the integral over all {x}, as given by 1.

The reason the notation is confusing is that the functional derivative {\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}} makes no mention of the ‘per unit of {x}’ part, and as a result, the units in 1 don’t appear to balance on each side of the equation. Once we include this, we see that the units of {\frac{\delta F\left[\phi\right]}{\delta\phi\left(x\right)}} are {\left(\mbox{units of }F\right)\left(\mbox{units of }\phi\right)^{-1}\left(\mbox{length}\right)^{-1}} and then the units on the RHS of 1 come out to just the units of {F}, thus agreeing with the LHS.

Another point worth making is that, because the functional derivative does depend explicitly on {x}, it’s just an ordinary function, not a functional.
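To make the definition 1 concrete, here is a small numerical sketch. The functional {F\left[\phi\right]=\int_{0}^{1}\phi^{3}dx}, the sample function {\phi=\sin\pi x} and the perturbation {\delta\phi=\epsilon x\left(1-x\right)} are all assumptions chosen for illustration (they don't come from G & R). For this {F} the functional derivative is {3\phi^{2}\left(x\right)}, and the code checks that the actual change in {F} matches the integral in 1 up to terms of second order in {\delta\phi}.

```python
import math

def integrate(f, lo=0.0, hi=1.0, n=2000):
    # Midpoint rule on [lo, hi].
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx

def phi(x):
    # Assumed sample function.
    return math.sin(math.pi * x)

def delta_phi(x, eps=1e-4):
    # Assumed small perturbation of phi.
    return eps * x * (1.0 - x)

def F(field):
    # Assumed functional: integral of field(x)^3 over [0, 1].
    return integrate(lambda x: field(x) ** 3)

def functional_derivative(x):
    # For F = integral of phi^3, the functional derivative at x is 3 phi(x)^2.
    return 3.0 * phi(x) ** 2

# Actual change in F when phi is perturbed.
actual_change = F(lambda x: phi(x) + delta_phi(x)) - F(phi)

# First-order change predicted by the definition of the functional derivative.
first_order_change = integrate(lambda x: functional_derivative(x) * delta_phi(x))

if __name__ == "__main__":
    print(actual_change, first_order_change)
```

The two numbers differ only by terms quadratic in {\epsilon}, so shrinking {\epsilon} by a factor of 10 should shrink the discrepancy by roughly a factor of 100.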

Once we understand this, the derivation of the Euler-Lagrange equations given in G & R’s section 2.1 is somewhat easier to follow. Although I’ve run through this derivation previously, it’s worth following through G & R’s derivation, as it’s a bit different and illustrates the points above.

In classical field theory, the Lagrangian is taken to be a functional of a field function {\phi\left(\mathbf{x},t\right)} and its time derivative {\dot{\phi}\left(\mathbf{x},t\right)}:

\displaystyle  L\left(t\right)=L\left[\phi\left(\mathbf{x},t\right),\dot{\phi}\left(\mathbf{x},t\right)\right] \ \ \ \ \ (3)

Note a couple of things: first, we’re now dealing with 3 space dimensions and second, {L} has no explicit dependence on the spatial position {\mathbf{x}}. We can generalize 1 to the 3-d case, where {L} depends on two fields ({\phi} and {\dot{\phi}}) by writing

\displaystyle  \delta L\left[\phi,\dot{\phi}\right]=\int d^{3}x\left[\frac{\delta L}{\delta\phi\left(\mathbf{x},t\right)}\delta\phi\left(\mathbf{x},t\right)+\frac{\delta L}{\delta\dot{\phi}\left(\mathbf{x},t\right)}\delta\dot{\phi}\left(\mathbf{x},t\right)\right] \ \ \ \ \ (4)

The two functional derivatives on the RHS have the units of {\left(\mbox{energy}\right)\left(\mbox{volume}\right)^{-1}\left(\mbox{units of }\phi\right)^{-1}} and {\left(\mbox{energy}\right)\left(\mbox{volume}\right)^{-1}\left(\mbox{units of }\dot{\phi}\right)^{-1}} respectively, which again isn’t entirely obvious from the notation. At this point, I think it makes a bit more sense to acknowledge the fact that the functional derivatives have the dimensions of a density (that is, the units of something per unit volume), and introduce now the Lagrangian density {\mathcal{L}}, which is the Lagrangian per unit volume, so that the total Lagrangian is defined as

\displaystyle  L\left(t\right)=\int d^{3}x\;\mathcal{L}\left(\phi\left(\mathbf{x},t\right),\nabla\phi\left(\mathbf{x},t\right),\dot{\phi}\left(\mathbf{x},t\right)\right) \ \ \ \ \ (5)

Note that {\mathcal{L}} is actually a function rather than a functional, in that it depends explicitly on the position vector {\mathbf{x}} via the field function {\phi}. That is, the total Lagrangian {L} does not depend on {\mathbf{x}}, since it is an integral over all space, while the Lagrangian density {\mathcal{L}} is a local function that varies as we move around in space. The list of arguments of {\mathcal{L}\left(\phi\left(\mathbf{x},t\right),\nabla\phi\left(\mathbf{x},t\right),\dot{\phi}\left(\mathbf{x},t\right)\right)} is an assumption of the theory; we’re explicitly assuming that {\mathcal{L}} doesn’t depend on derivatives of any higher order than the first derivative.

We can now write the variation in {L} using ordinary derivatives:

\displaystyle  \delta L\left(t\right)=\int d^{3}x\;\left[\frac{\partial\mathcal{L}}{\partial\phi}\delta\phi+\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\delta\phi_{,i}+\frac{\partial\mathcal{L}}{\partial\dot{\phi}}\delta\dot{\phi}\right] \ \ \ \ \ (6)

where the notation {\phi_{,i}} is defined as

\displaystyle  \phi_{,i}\equiv\frac{\partial\phi}{\partial x^{i}} \ \ \ \ \ (7)

and {i=1,2,3} with the summation convention being used. We can integrate the middle term in 6 by parts, using

\displaystyle  \delta\phi_{,i}=\frac{\partial}{\partial x^{i}}\left(\delta\phi\right) \ \ \ \ \ (8)

\displaystyle  \int d^{3}x\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\delta\phi_{,i}=\left.\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\delta\phi\right|_{boundary}-\int d^{3}x\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\right)\delta\phi \ \ \ \ \ (9)

As usual, we assume the boundary term goes to zero at infinity, so we’re left with

\displaystyle  \delta L\left(t\right)=\int d^{3}x\;\left[\left(\frac{\partial\mathcal{L}}{\partial\phi}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\right)\right)\delta\phi+\frac{\partial\mathcal{L}}{\partial\dot{\phi}}\delta\dot{\phi}\right] \ \ \ \ \ (10)

Comparing with 4, this allows us to write explicit expressions for the functional derivatives in 4, which makes the volume dependence a bit more obvious:

\displaystyle   \frac{\delta L}{\delta\phi\left(\mathbf{x},t\right)} \displaystyle  = \displaystyle  \frac{\partial\mathcal{L}}{\partial\phi\left(\mathbf{x},t\right)}-\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{L}}{\partial\phi\left(\mathbf{x},t\right)_{,i}}\right)\ \ \ \ \ (11)
\displaystyle  \frac{\delta L}{\delta\dot{\phi}\left(\mathbf{x},t\right)} \displaystyle  = \displaystyle  \frac{\partial\mathcal{L}}{\partial\dot{\phi}\left(\mathbf{x},t\right)} \ \ \ \ \ (12)

From here, the derivation of the Euler-Lagrange equation proceeds as we showed earlier with the result

\displaystyle  \frac{\partial\mathcal{L}}{\partial\phi^{r}}-\frac{\partial}{\partial x^{\mu}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}^{r}}\right)=0 \ \ \ \ \ (13)

Here the superscript {r} labels the field (if we have more than one independent field) and we’ve used four-vector notation {x^{\mu}=\left(t,\mathbf{x}\right)}, and the index {\mu} now extends over 0, 1, 2, 3, with {x^{0}=t}.
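As a worked example of 11, 12 and 13, we can try a single real scalar field, assuming the Lagrangian density {\mathcal{L}=\frac{1}{2}\dot{\phi}^{2}-\frac{1}{2}\phi_{,i}\phi_{,i}-\frac{1}{2}m^{2}\phi^{2}} in units with {\hbar=c=1}. This particular {\mathcal{L}} is my own choice of illustration and isn't needed for the general argument. The functional derivatives and the Euler-Lagrange equation then come out to

```latex
\begin{aligned}
\frac{\delta L}{\delta\phi} &= \frac{\partial\mathcal{L}}{\partial\phi}
  -\frac{\partial}{\partial x^{i}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\right)
  = -m^{2}\phi+\nabla^{2}\phi\\
\frac{\delta L}{\delta\dot{\phi}} &= \frac{\partial\mathcal{L}}{\partial\dot{\phi}} = \dot{\phi}\\
0 &= \frac{\partial\mathcal{L}}{\partial\phi}
  -\frac{\partial}{\partial x^{\mu}}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}}\right)
  = -m^{2}\phi-\ddot{\phi}+\nabla^{2}\phi
\end{aligned}
```

The last line is the Klein-Gordon equation {\ddot{\phi}-\nabla^{2}\phi+m^{2}\phi=0}.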

Klein-Gordon Feynman propagator

Reference: Michael E. Peskin & Daniel V. Schroeder, An Introduction to Quantum Field Theory, (Perseus Books, 1995) – Chapter 2.

Although we’ve already gone through a derivation of the Feynman propagator for the Klein-Gordon field in earlier posts, it’s worth revisiting it here to review the derivation in P&S, which is quite a bit shorter now that we have a few tools ready to deal with it.

In our original derivation of the Green’s function for the Klein-Gordon equation we defined the function

\displaystyle  D_{R}\left(x-y\right)=\frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\int dp^{0}\left(\frac{-1}{2\pi i}\right)\frac{e^{-ip\left(x-y\right)}}{p^{2}-m^{2}} \ \ \ \ \ (1)

The {p^{0}} integral is done in the complex {p^{0}} plane as a contour integral where the contour runs along the real axis from {-\infty} to {\infty}, skirting the poles at {p^{0}=\pm\sqrt{\mathbf{p}^{2}+m^{2}}=\pm E_{\mathbf{p}}} with little semicircular arcs that go above the poles. For {x^{0}>y^{0}}, the contour is closed with a large semicircle in the lower half plane so that the contour encloses both poles, with the result that {D_{R}\left(x-y\right)} is non-zero (and the contour is clockwise, which cancels the {-1} in the integrand). For {x^{0}<y^{0}}, the contour is closed with a large semicircle in the upper half plane so that the contour excludes both poles, with the result that {D_{R}\left(x-y\right)} is zero.

We can choose 3 other ways of skirting the two poles: we could use semicircles that go under both poles, or over the pole at {-E_{\mathbf{p}}} and under the pole at {+E_{\mathbf{p}}}, or vice versa. The last choice (under {-E_{\mathbf{p}}} and over {+E_{\mathbf{p}}}) gives the Feynman propagator. In this case, if {x^{0}>y^{0}} we close the contour using a large semicircle in the lower half plane, which excludes the {-E_{\mathbf{p}}} pole and includes the {+E_{\mathbf{p}}} pole. If {x^{0}<y^{0}}, we close the contour in the upper half plane, which includes the {-E_{\mathbf{p}}} pole and excludes the {+E_{\mathbf{p}}} pole. In either case, the integral includes only one pole, and the result is a propagator that we met when discussing causality.

\displaystyle  D\left(x-y\right)=\left\langle 0\left|\phi\left(x\right)\phi\left(y\right)\right|0\right\rangle =\frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\;\left.\frac{e^{-ip\left(x-y\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=E_{\mathbf{p}}} \ \ \ \ \ (2)

Take {x^{0}>y^{0}} first. Then the contour is in the lower half plane, and is clockwise. The residue at {p^{0}=+E_{\mathbf{p}}} is

\displaystyle  \mbox{Res}\left(E_{\mathbf{p}}\right)=\left.\frac{e^{-ip\left(x-y\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=E_{\mathbf{p}}} \ \ \ \ \ (3)

The integral is the same as 1, but with a different contour, so we’ll call it {D_{F}}. Doing the {p^{0}} integral around this contour gives

\displaystyle   D_{F}\left(x-y\right) \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\int dp^{0}\left(\frac{-1}{2\pi i}\right)\frac{e^{-ip\left(x-y\right)}}{p^{2}-m^{2}}\ \ \ \ \ (4)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\left(\frac{-1}{2\pi i}\right)\left(-2\pi i\right)\mbox{Res}\left(E_{\mathbf{p}}\right)\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\;\left.\frac{e^{-ip\left(x-y\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=E_{\mathbf{p}}}\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  D\left(x-y\right) \ \ \ \ \ (7)

For {y^{0}>x^{0}}, the contour is in the upper half plane and is now counterclockwise and the residue is at {p^{0}=-E_{\mathbf{p}}}:

\displaystyle  \mbox{Res}\left(-E_{\mathbf{p}}\right)=\left.-\frac{e^{-ip\left(x-y\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=-E_{\mathbf{p}}} \ \ \ \ \ (8)

The integral to be done is now

\displaystyle   D_{F}\left(x-y\right) \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\int dp^{0}\left(\frac{-1}{2\pi i}\right)\frac{e^{-ip\left(x-y\right)}}{p^{2}-m^{2}}\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\left(\frac{-1}{2\pi i}\right)\left(2\pi i\right)\mbox{Res}\left(-E_{\mathbf{p}}\right)\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\;\left.\frac{e^{-ip\left(x-y\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=-E_{\mathbf{p}}} \ \ \ \ \ (11)

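Note that we now multiply the residue by {+2\pi i} because the contour is now counterclockwise; the resulting overall factor of {-1} cancels the minus sign in the residue 8. With {p^{0}=-E_{\mathbf{p}}}, the time part of the exponent becomes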

\displaystyle  -ip^{0}\left(x^{0}-y^{0}\right)=iE_{\mathbf{p}}\left(x^{0}-y^{0}\right)=-iE_{\mathbf{p}}\left(y^{0}-x^{0}\right) \ \ \ \ \ (12)

Since the spatial part of the exponent is integrated over all {\mathbf{p}}, we can replace {\mathbf{p}} by {\mathbf{-p}}, or equivalently, {\mathbf{x-y}} by {\mathbf{y-x}}, to get

\displaystyle   D_{F}\left(x-y\right) \displaystyle  = \displaystyle  \frac{1}{\left(2\pi\right)^{3}}\int d^{3}p\;\left.\frac{e^{-ip\left(y-x\right)}}{2E_{\mathbf{p}}}\right|_{p^{0}=E_{\mathbf{p}}}\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  D\left(y-x\right) \ \ \ \ \ (14)
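Written out in full, the change of variable in the exponent looks like this. This is a sketch that assumes the metric convention {p\cdot x=p^{0}x^{0}-\mathbf{p}\cdot\mathbf{x}}; note that {E_{\mathbf{p}}} and the measure {d^{3}p} are unchanged under {\mathbf{p}\rightarrow-\mathbf{p}}.

```latex
% Assumed convention: p \cdot x = p^0 x^0 - \mathbf{p}\cdot\mathbf{x}
\begin{align*}
-ip\cdot(x-y)\Big|_{p^{0}=-E_{\mathbf{p}}}
  &= -iE_{\mathbf{p}}\,(y^{0}-x^{0}) + i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}) \\
  &\xrightarrow{\;\mathbf{p}\to-\mathbf{p}\;}
     -iE_{\mathbf{p}}\,(y^{0}-x^{0}) + i\mathbf{p}\cdot(\mathbf{y}-\mathbf{x}) \\
  &= -ip\cdot(y-x)\Big|_{p^{0}=+E_{\mathbf{p}}}
\end{align*}
```

The last line is the exponent that appears in the definition of {D}, evaluated at {y-x} instead of {x-y}.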

Combining the two results 7 and 14 with step functions, we get

\displaystyle   D_{F}\left(x-y\right) \displaystyle  = \displaystyle  \theta\left(x^{0}-y^{0}\right)\left\langle 0\left|\phi\left(x\right)\phi\left(y\right)\right|0\right\rangle +\theta\left(y^{0}-x^{0}\right)\left\langle 0\left|\phi\left(y\right)\phi\left(x\right)\right|0\right\rangle \ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \left\langle 0\left|T\phi\left(x\right)\phi\left(y\right)\right|0\right\rangle \ \ \ \ \ (16)

where {T} is the time-ordering symbol, which places the field with the later time to the left.
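As a numerical sanity check of the pole prescription, the {p^{0}} integral can be done directly along the real axis. This is a minimal sketch under one assumption that is not made in the text above: the poles are displaced off the axis by replacing {p^{2}-m^{2}} with {p^{2}-m^{2}+i\epsilon}, which pushes the {+E_{\mathbf{p}}} pole below the axis and the {-E_{\mathbf{p}}} pole above it, mimicking the contour that passes over {+E_{\mathbf{p}}} and under {-E_{\mathbf{p}}}. The function names, the cutoff and the value of `eps` are illustrative choices, and the spatial integral is left out.

```python
import numpy as np

def feynman_p0_integral(t, E, eps=0.1, cutoff=400.0, dp=0.005):
    """Trapezoid estimate of (i / 2 pi) * integral dp0 exp(-i p0 t) / (p0^2 - E^2 + i eps).

    t stands for x0 - y0 and E for E_p; the integral is truncated at |p0| = cutoff.
    """
    n = int(round(2.0 * cutoff / dp)) + 1
    p0, step = np.linspace(-cutoff, cutoff, n, retstep=True)
    f = 1j / (2.0 * np.pi) * np.exp(-1j * p0 * t) / (p0**2 - E**2 + 1j * eps)
    return step * (np.sum(f) - 0.5 * (f[0] + f[-1]))

def residue_prediction(t, E, eps=0.1):
    """Residue-theorem result at finite eps: exp(-i W |t|) / (2 W), W = sqrt(E^2 - i eps)."""
    W = np.sqrt(E**2 - 1j * eps)  # pole in the lower half plane, with Re W > 0
    return np.exp(-1j * W * abs(t)) / (2.0 * W)

if __name__ == "__main__":
    E = 1.0  # hypothetical sample energy
    for t in (1.5, -1.5):
        print(t, feynman_p0_integral(t, E), residue_prediction(t, E))
```

The numerical value depends only on {\left|x^{0}-y^{0}\right|}, which is the content of the two step functions in 15: for either time ordering, only the pole on the damped side of the axis contributes. As `eps` is taken to zero, the prediction tends to {e^{-iE_{\mathbf{p}}\left|x^{0}-y^{0}\right|}/2E_{\mathbf{p}}}.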

The Feynman propagator 15 is still a Green’s function for the Klein-Gordon operator, as we can show by following through the same steps we did earlier for {D_{R}}. Applying the Klein-Gordon operator to the first term in 15 we get (remember that all derivatives are with respect to {x}, not {y}):

\displaystyle   \left(\partial^{2}+m^{2}\right)\theta\left(x^{0}-y^{0}\right)\left\langle 0\left|\phi\left(x\right)\phi\left(y\right)\right|0\right\rangle \displaystyle  = \displaystyle  -\delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\pi\left(x\right)\phi\left(y\right)\right|0\right\rangle +\ \ \ \ \ (17)
\displaystyle  \displaystyle  \displaystyle  2\delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\pi\left(x\right)\phi\left(y\right)\right|0\right\rangle +0\ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\pi\left(x\right)\phi\left(y\right)\right|0\right\rangle \ \ \ \ \ (19)
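For reference, the three terms on the right of 17 and 18 come from the product rule. The following is a sketch of that step, writing {f\left(x\right)} for {\left\langle 0\left|\phi\left(x\right)\phi\left(y\right)\right|0\right\rangle } and using the fact that the step function depends only on {x^{0}}. It assumes the free-field relations {\pi=\partial_{0}\phi} and {\left(\partial^{2}+m^{2}\right)\phi=0}, and the replacement of {\delta^{\prime}} is to be understood under an integral, as in the earlier {D_{R}} derivation.

```latex
% f(x) \equiv \langle 0|\phi(x)\phi(y)|0\rangle, and \theta depends only on x^0
\begin{align*}
(\partial^{2}+m^{2})\big[\theta(x^{0}-y^{0})\,f\big]
  &= (\partial_{0}^{2}\theta)\,f
     + 2\,(\partial_{0}\theta)(\partial_{0}f)
     + \theta\,(\partial^{2}+m^{2})f \\
% \partial_0\theta = \delta(x^0-y^0); \partial_0 f = \langle 0|\pi(x)\phi(y)|0\rangle
% (\partial_0^2\theta) f -> -\delta(x^0-y^0)\,\partial_0 f after integrating by parts
% (\partial^2+m^2) f = 0 because \phi(x) obeys the Klein-Gordon equation
  &= -\delta(x^{0}-y^{0})\,\langle 0|\pi(x)\phi(y)|0\rangle
     + 2\,\delta(x^{0}-y^{0})\,\langle 0|\pi(x)\phi(y)|0\rangle
     + 0
\end{align*}
```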

Doing the same to the second term gives the same result with opposite signs on the delta functions because

\displaystyle  \frac{d\theta\left(y^{0}-x^{0}\right)}{dx^{0}}=-\frac{d\theta\left(x^{0}-y^{0}\right)}{dx^{0}}=-\delta\left(x^{0}-y^{0}\right) \ \ \ \ \ (20)

Thus we get

\displaystyle   \left(\partial^{2}+m^{2}\right)\theta\left(y^{0}-x^{0}\right)\left\langle 0\left|\phi\left(y\right)\phi\left(x\right)\right|0\right\rangle \displaystyle  = \displaystyle  \delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\phi\left(y\right)\pi\left(x\right)\right|0\right\rangle +\ \ \ \ \ (21)
\displaystyle  \displaystyle  \displaystyle  -2\delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\phi\left(y\right)\pi\left(x\right)\right|0\right\rangle +0\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  -\delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\phi\left(y\right)\pi\left(x\right)\right|0\right\rangle \ \ \ \ \ (23)

Combining the two gives

\displaystyle   \left(\partial^{2}+m^{2}\right)D_{F}\left(x-y\right) \displaystyle  = \displaystyle  \delta\left(x^{0}-y^{0}\right)\left\langle 0\left|\left[\pi\left(x\right),\phi\left(y\right)\right]\right|0\right\rangle \ \ \ \ \ (24)
\displaystyle  \displaystyle  = \displaystyle  -i\delta^{\left(4\right)}\left(x-y\right) \ \ \ \ \ (25)

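where the last line uses the equal-time commutator {\left[\pi\left(x\right),\phi\left(y\right)\right]=-i\delta^{\left(3\right)}\left(\mathbf{x}-\mathbf{y}\right)}. This is the same as the result we got for {D_{R}}.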