# Monty Hall problem

As it’s the last day of 2016 and my mind isn’t quite into physics mode today I thought I’d post something frivolous. Here are a few thoughts on the Monty Hall problem, which has reputedly puzzled even some Nobel prize winners. (The problem also featured in episode 4×08, Skyfire Cycle, of the popular TV comedy Brooklyn Nine-Nine.) The problem is named after Monty Hall, who was the host of an American game show called Let’s Make a Deal. In the climax of each show, the winning contestant was faced with three doors. Behind two of the doors was a worthless prize (symbolized in the usual statement of the problem by a goat) while behind the third door was a major prize such as a car. The contestant initially chooses one of the three doors. At this point, Monty Hall opens one of the other two doors to reveal one of the worthless prizes (one of the goats). The contestant is then offered the chance to switch his/her choice to the other unopened door. The question is: should the contestant stay with their first choice or switch to the other door?

At first glance, the answer appears obvious. After Monty opens one of the two remaining doors to reveal the location of one of the two goats, the contestant has a choice between two doors, so the odds appear to be 50-50 as to which door hides the car. Thus it would appear that there is no advantage to be gained by switching, so it doesn’t matter what the contestant does.

However, this isn’t the correct way of looking at the problem. The key is that Monty knows where the car is and always opens a door hiding a goat. Thus if the contestant’s initial choice is wrong, the one remaining unopened door must hide the car, so they are guaranteed to win if they switch doors. The problem then reduces to: what is the probability that the contestant’s initial choice was wrong? Since the initial choice is a blind one-in-three choice, the chance that it is wrong is ${\frac{2}{3}}$. When offered the chance to switch doors, they should therefore do it, as they are twice as likely to win the car as they would be if they stayed with their original choice.
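For anyone who still doubts the ${\frac{2}{3}}$ figure, the argument can be checked with a quick simulation. This is only a sketch: the function names `play` and `win_rate`, the number of trials and the fixed seed are my own choices for illustration, and it assumes Monty picks at random when he has two goat doors to choose from.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

The two win rates should come out close to ${\frac{1}{3}}$ and ${\frac{2}{3}}$ respectively, up to statistical noise.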

It’s surprising that this relatively simple problem has given rise to such controversy (see the Wikipedia article for details).

Anyway, hopefully readers have found my posts from 2016 to be interesting and/or helpful, so here’s looking forward to a few more posts in 2017.

# Postulates of quantum mechanics: Schrödinger equation and propagators

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 4.3.

The first three postulates of quantum mechanics concern the properties of a quantum state. The fourth postulate concerns how states evolve with time. The postulate simply states that in non-relativistic quantum mechanics, a state satisfies the Schrödinger equation:

$\displaystyle i\hbar\frac{\partial}{\partial t}\left|\psi\right\rangle =H\left|\psi\right\rangle \ \ \ \ \ (1)$

where ${H}$ is the Hamiltonian, which is obtained from the classical Hamiltonian by means of the other postulates of quantum mechanics, namely that we replace all references to the position ${x}$ by the quantum position operator ${X}$ with matrix elements (in the ${x}$ basis) of

$\displaystyle \left\langle x^{\prime}\left|X\right|x\right\rangle =x\delta\left(x-x^{\prime}\right) \ \ \ \ \ (2)$

and all references to classical momentum ${p}$ by the momentum operator ${P}$ with matrix elements

$\displaystyle \left\langle x^{\prime}\left|P\right|x\right\rangle =-i\hbar\delta^{\prime}\left(x^{\prime}-x\right) \ \ \ \ \ (3)$

Although we’ve posted many articles based on Griffiths’s book in which we solved the Schrödinger equation, the approach taken by Shankar is a bit different and, in some ways, a lot more elegant. We begin with a Hamiltonian that does not depend explicitly on time. Since the Schrödinger equation contains only the first derivative with respect to time, the time evolution of a state is uniquely determined once we specify the initial state ${\left|\psi\left(0\right)\right\rangle }$. [A differential equation that is second order in time, such as the wave equation, requires both the initial position and initial velocity to be specified.]

The solution of the Schrödinger equation is then found in analogy to the approach we used in solving the coupled masses problem earlier. We find the eigenvalues and eigenvectors of the Hamiltonian in some basis and use these to construct the propagator ${U\left(t\right)}$. We can then write the solution as

$\displaystyle \left|\psi\left(t\right)\right\rangle =U\left(t\right)\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (4)$

For the case of a time-independent Hamiltonian, we can actually construct ${U\left(t\right)}$ in terms of the eigenvectors of ${H}$. The eigenvalue equation is

$\displaystyle H\left|E\right\rangle =E\left|E\right\rangle \ \ \ \ \ (5)$

where ${E}$ is an eigenvalue of ${H}$ and ${\left|E\right\rangle }$ is its corresponding eigenvector. Since the eigenvectors of the Hermitian operator ${H}$ form a basis for the vector space, we can expand the state vector in terms of them in the usual way

$\displaystyle \left|\psi\left(t\right)\right\rangle =\sum\left|E\right\rangle \left\langle E\left|\psi\left(t\right)\right.\right\rangle \ \ \ \ \ (6)$

$\displaystyle \equiv\sum a_{E}\left(t\right)\left|E\right\rangle \ \ \ \ \ (7)$

The coefficient ${a_{E}\left(t\right)}$ is the component of ${\left|\psi\left(t\right)\right\rangle }$ along the ${\left|E\right\rangle }$ vector as a function of time. We can now apply the Schrödinger equation 1 to get (a dot over a symbol indicates a time derivative):

 $\displaystyle i\hbar\frac{\partial}{\partial t}\left|\psi\left(t\right)\right\rangle$ $\displaystyle =$ $\displaystyle i\hbar\sum\dot{a}_{E}\left(t\right)\left|E\right\rangle \ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle H\left|\psi\left(t\right)\right\rangle \ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum a_{E}\left(t\right)H\left|E\right\rangle \ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum a_{E}\left(t\right)E\left|E\right\rangle \ \ \ \ \ (11)$

Since the eigenvectors ${\left|E\right\rangle }$ are linearly independent (as they form a basis for the vector space), each term in the sum in the first line must be equal to the corresponding term in the sum in the last line, so we have

$\displaystyle i\hbar\dot{a}_{E}\left(t\right)=a_{E}\left(t\right)E \ \ \ \ \ (12)$

The solution is

$\displaystyle a_{E}\left(t\right)=a_{E}\left(0\right)e^{-iEt/\hbar}\ \ \ \ \ (13)$

$\displaystyle =e^{-iEt/\hbar}\left\langle E\left|\psi\left(0\right)\right.\right\rangle \ \ \ \ \ (14)$

The general solution 7 is therefore

$\displaystyle \left|\psi\left(t\right)\right\rangle =\sum e^{-iEt/\hbar}\left|E\right\rangle \left\langle E\left|\psi\left(0\right)\right.\right\rangle \ \ \ \ \ (15)$

from which we can read off the propagator:

$\displaystyle U\left(t\right)=\sum e^{-iEt/\hbar}\left|E\right\rangle \left\langle E\right| \ \ \ \ \ (16)$

Thus if we can determine the eigenvalues and eigenvectors of ${H}$, we can write the propagator in terms of them and get the general solution. We can see from this that ${U\left(t\right)}$ is unitary:

$\displaystyle U^{\dagger}U=\sum_{E^{\prime}}\sum_{E}e^{i\left(E-E^{\prime}\right)t/\hbar}\left|E\right\rangle \left\langle E\left|E^{\prime}\right.\right\rangle \left\langle E^{\prime}\right|\ \ \ \ \ (17)$

$\displaystyle =\sum_{E^{\prime}}\sum_{E}e^{i\left(E-E^{\prime}\right)t/\hbar}\left|E\right\rangle \delta_{EE^{\prime}}\left\langle E^{\prime}\right|\ \ \ \ \ (18)$

$\displaystyle =\sum_{E}\left|E\right\rangle \left\langle E\right|\ \ \ \ \ (19)$

$\displaystyle =1 \ \ \ \ \ (20)$

This derivation uses the fact that the eigenvectors are orthonormal and form a complete set, so that ${\left\langle E\left|E^{\prime}\right.\right\rangle =\delta_{EE^{\prime}}}$ and ${\sum_{E}\left|E\right\rangle \left\langle E\right|=1}$. Since a unitary operator doesn’t change the norm of a vector, we see from 4 that if ${\left|\psi\left(0\right)\right\rangle }$ is normalized, then so is ${\left|\psi\left(t\right)\right\rangle }$ for all times ${t}$. Further, the probability that the state will be measured to be in eigenstate ${\left|E\right\rangle }$ is constant over time, since this probability is given by

$\displaystyle \left|a_{E}\left(t\right)\right|^{2}=\left|e^{-iEt/\hbar}\left\langle E\left|\psi\left(0\right)\right.\right\rangle \right|^{2}=\left|\left\langle E\left|\psi\left(0\right)\right.\right\rangle \right|^{2} \ \ \ \ \ (21)$
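To make the construction concrete, here is a small sketch that builds ${U\left(t\right)}$ from eigenvalues and eigenvectors for a two-level system. The Hamiltonian (with rows (0, 1) and (1, 0)), the choice of units with ${\hbar=1}$ and all the function names are assumptions made purely for illustration; they are not taken from Shankar.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1

# Assumed example: H = [[0, 1], [1, 0]], with eigenvalues +1 and -1
# and normalized eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
S = 1 / math.sqrt(2)
EIGENPAIRS = [(+1.0, [S, S]), (-1.0, [S, -S])]


def propagator(t):
    """U(t) = sum_E exp(-i E t / hbar) |E><E| as a 2x2 nested list."""
    U = [[0j, 0j], [0j, 0j]]
    for E, v in EIGENPAIRS:
        phase = cmath.exp(-1j * E * t / HBAR)
        for i in range(2):
            for j in range(2):
                U[i][j] += phase * v[i] * complex(v[j]).conjugate()
    return U


def apply(U, psi):
    """Matrix-vector product U|psi>."""
    return [sum(U[i][j] * psi[j] for j in range(2)) for i in range(2)]


def dagger_times(U):
    """Return the matrix U^dagger U."""
    return [[sum(U[k][i].conjugate() * U[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def energy_probabilities(psi):
    """|<E|psi>|^2 for each eigenvector, in the order of EIGENPAIRS."""
    return [abs(sum(complex(v[i]).conjugate() * psi[i] for i in range(2))) ** 2
            for _, v in EIGENPAIRS]


if __name__ == "__main__":
    psi0 = [0.6, 0.8]  # a normalized initial state
    U = propagator(0.7)
    print(dagger_times(U))                            # close to the identity
    print(energy_probabilities(psi0))
    print(energy_probabilities(apply(U, psi0)))       # unchanged by evolution
```

For this particular Hamiltonian the sum can be done by hand, giving diagonal elements ${\cos t}$ and off-diagonal elements ${-i\sin t}$, which is a useful check on the output.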

This derivation assumed that the spectrum of ${H}$ was discrete and non-degenerate. If the possible eigenvalues ${E}$ are continuous, then the sum is replaced by an integral

$\displaystyle U\left(t\right)=\int e^{-iEt/\hbar}\left|E\right\rangle \left\langle E\right|dE \ \ \ \ \ (22)$

If the spectrum is discrete and degenerate, then we need to find an orthonormal set of eigenvectors that spans each degenerate subspace, and sum over these sets. For example, if ${E_{1}}$ is degenerate, then we find a set of eigenvectors ${\left|E_{1},\alpha\right\rangle }$ that spans the subspace for which ${E_{1}}$ is the eigenvalue. The index ${\alpha}$ runs from 1 up to the degree of degeneracy of ${E_{1}}$, and the propagator is then

$\displaystyle U\left(t\right)=\sum_{E_{i}}\sum_{\alpha}e^{-iE_{i}t/\hbar}\left|E_{i},\alpha\right\rangle \left\langle E_{i},\alpha\right| \ \ \ \ \ (23)$

The sum over ${E_{i}}$ runs over all the distinct eigenvalues, and the sum over ${\alpha}$ runs over the eigenvectors for each different ${E_{i}}$.

Another form of the propagator can be written directly in terms of the time-independent Hamiltonian as

$\displaystyle U\left(t\right)=e^{-iHt/\hbar} \ \ \ \ \ (24)$

This relies on the concept of a function of an operator: ${e^{-iHt/\hbar}}$ is defined by the power series ${\sum_{n=0}^{\infty}\frac{1}{n!}\left(-\frac{iHt}{\hbar}\right)^{n}}$, so each of its matrix elements is itself a power series. The power series must, of course, converge for this solution to be valid. Since ${H}$ is Hermitian, ${U\left(t\right)}$ is unitary. We can verify that the solution using this form of ${U\left(t\right)}$ satisfies the Schrödinger equation:

 $\displaystyle \left|\psi\left(t\right)\right\rangle$ $\displaystyle =$ $\displaystyle U\left(t\right)\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle e^{-iHt/\hbar}\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (26)$ $\displaystyle i\hbar\left|\dot{\psi}\left(t\right)\right\rangle$ $\displaystyle =$ $\displaystyle i\hbar\frac{d}{dt}\left(e^{-iHt/\hbar}\right)\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle i\hbar\left(-\frac{i}{\hbar}\right)He^{-iHt/\hbar}\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle He^{-iHt/\hbar}\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (29)$ $\displaystyle$ $\displaystyle =$ $\displaystyle H\left|\psi\left(t\right)\right\rangle \ \ \ \ \ (30)$

The derivative of ${U\left(t\right)}$ can be calculated from the derivatives of its matrix elements, which are all power series.
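As a rough numerical illustration of the power-series form, the sketch below sums the first few terms of the series for ${e^{-iHt/\hbar}}$. The two-level Hamiltonian, the units with ${\hbar=1}$, the number of terms and the helper names are all assumptions chosen for illustration. For this Hamiltonian the exact answer has diagonal elements ${\cos t}$ and off-diagonal elements ${-i\sin t}$, so the partial sum can be compared against it.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1
H = [[0.0, 1.0], [1.0, 0.0]]  # assumed two-level Hamiltonian


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_series(H, t, terms=30):
    """Partial sum of sum_n (-i H t / hbar)^n / n! with the given number of terms."""
    n = len(H)
    A = [[-1j * H[i][j] * t / HBAR for j in range(n)] for i in range(n)]
    # Start from the n = 0 term, the identity matrix.
    term = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = matmul(term, A)
        term = [[x / k for x in row] for row in term]  # term is now A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


if __name__ == "__main__":
    t = 0.7
    print(exp_series(H, t))
    print(math.cos(t), -math.sin(t))  # expected diagonal and Im(off-diagonal)
```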

# Postulates of quantum mechanics: momentum

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Sections 4.1 – 4.2; Exercises 4.2.2 – 4.2.3.

One of the postulates of quantum mechanics is that the momentum operator ${P}$ in position space is given by

$\displaystyle \left\langle x\left|P\right|x^{\prime}\right\rangle =-i\hbar\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (1)$

By using the properties of the derivative of the delta function, we can find the eigenfunctions of ${P}$. We have

 $\displaystyle \left\langle x\left|P\right|\psi\right\rangle$ $\displaystyle =$ $\displaystyle \int\left\langle x\left|P\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|\psi\right.\right\rangle dx^{\prime}\ \ \ \ \ (2)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -i\hbar\int\delta^{\prime}\left(x-x^{\prime}\right)\left\langle x^{\prime}\left|\psi\right.\right\rangle dx^{\prime}\ \ \ \ \ (3)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -i\hbar\frac{d}{dx}\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (4)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -i\hbar\frac{d\psi\left(x\right)}{dx} \ \ \ \ \ (5)$

The eigenvector of ${P}$ is ${\left|p\right\rangle }$ and has the property that

$\displaystyle P\left|p\right\rangle =p\left|p\right\rangle \ \ \ \ \ (6)$

If we project this onto position space and use 5 we get

$\displaystyle \left\langle x\left|P\right|p\right\rangle =p\left\langle x\left|p\right.\right\rangle \ \ \ \ \ (7)$

$\displaystyle -i\hbar\frac{d\psi_{p}\left(x\right)}{dx}=p\psi_{p}\left(x\right) \ \ \ \ \ (8)$

where

$\displaystyle \psi_{p}\left(x\right)\equiv\left\langle x\left|p\right.\right\rangle \ \ \ \ \ (9)$

Solving this differential equation and normalizing so that ${\left\langle p^{\prime}\left|p\right.\right\rangle =\delta\left(p-p^{\prime}\right)}$ we get

$\displaystyle \psi_{p}\left(x\right)=\frac{1}{\sqrt{2\pi\hbar}}e^{ipx/\hbar} \ \ \ \ \ (10)$

For an arbitrary wave function ${\left|\psi\right\rangle }$, if we know its position-space form, we can find its momentum-space version as follows:

 $\displaystyle \left\langle p\left|\psi\right.\right\rangle$ $\displaystyle =$ $\displaystyle \int\left\langle p\left|x\right.\right\rangle \left\langle x\left|\psi\right.\right\rangle dx\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int\psi_{p}^*\left(x\right)\left\langle x\left|\psi\right.\right\rangle dx\ \ \ \ \ (12)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2\pi\hbar}}\int e^{-ipx/\hbar}\psi\left(x\right)dx \ \ \ \ \ (13)$

This has an interesting consequence if the position-space function ${\psi\left(x\right)}$ is real. The probability density for finding a particle in a state with momentum ${p}$ is ${\left|\left\langle p\left|\psi\right.\right\rangle \right|^{2}}$, which we can write as

 $\displaystyle \left|\left\langle p\left|\psi\right.\right\rangle \right|^{2}$ $\displaystyle =$ $\displaystyle \left\langle p\left|\psi\right.\right\rangle ^*\left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2\pi\hbar}\int\int e^{ip\left(x-x^{\prime}\right)/\hbar}\psi\left(x\right)\psi\left(x^{\prime}\right)dx\;dx^{\prime}\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2\pi\hbar}\int\int e^{-ip\left(x^{\prime}-x\right)/\hbar}\psi\left(x\right)\psi\left(x^{\prime}\right)dx\;dx^{\prime}\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2\pi\hbar}\int\int e^{-ip\left(x-x^{\prime}\right)/\hbar}\psi\left(x^{\prime}\right)\psi\left(x\right)dx\;dx^{\prime}\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left|\left\langle -p\left|\psi\right.\right\rangle \right|^{2} \ \ \ \ \ (18)$

In the fourth line, since ${x}$ and ${x^{\prime}}$ are dummy integration variables, both of which are integrated over the same range, we can simply swap them without changing anything. Note that the derivation relies on ${\psi\left(x\right)}$ being real, since if it were complex we would have

$\displaystyle \left|\left\langle p\left|\psi\right.\right\rangle \right|^{2}=\left\langle p\left|\psi\right.\right\rangle ^*\left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (19)$

$\displaystyle =\frac{1}{2\pi\hbar}\int\int e^{ip\left(x-x^{\prime}\right)/\hbar}\psi^*\left(x\right)\psi\left(x^{\prime}\right)dx\;dx^{\prime}\ \ \ \ \ (20)$

$\displaystyle =\frac{1}{2\pi\hbar}\int\int e^{-ip\left(x^{\prime}-x\right)/\hbar}\psi^*\left(x\right)\psi\left(x^{\prime}\right)dx\;dx^{\prime}\ \ \ \ \ (21)$

$\displaystyle =\frac{1}{2\pi\hbar}\int\int e^{-ip\left(x-x^{\prime}\right)/\hbar}\psi^*\left(x^{\prime}\right)\psi\left(x\right)dx\;dx^{\prime}\ \ \ \ \ (22)$

$\displaystyle \ne\left|\left\langle -p\left|\psi\right.\right\rangle \right|^{2} \ \ \ \ \ (23)$

Here the conjugated factor comes from ${\left\langle p\left|\psi\right.\right\rangle ^*}$, which by 13 carries ${e^{+ipx/\hbar}\psi^*\left(x\right)}$. The inequality holds in general since

$\displaystyle \left|\left\langle -p\left|\psi\right.\right\rangle \right|^{2}=\frac{1}{2\pi\hbar}\int\int e^{-ip\left(x-x^{\prime}\right)/\hbar}\psi^*\left(x\right)\psi\left(x^{\prime}\right)dx\;dx^{\prime} \ \ \ \ \ (24)$

That is, for ${\left|\left\langle -p\left|\psi\right.\right\rangle \right|^{2}}$ the position ${x}$ that is the argument of the ${\psi^*\left(x\right)}$ factor appears as the negative term ${-ipx}$ in the exponential, but in 22 the argument of the complex conjugate wave function is ${x^{\prime}}$, which appears as the positive term ${+ipx^{\prime}}$ in the exponential.

Thus for any real wave function, the probability of the particle having momentum ${+p}$ is equal to the probability of it having ${-p}$, so for such wave functions, the mean momentum is always ${\left\langle P\right\rangle =0}$.
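The symmetry can be checked numerically. The sketch below approximates the integral in 13 by a simple Riemann sum and compares the densities at ${+p}$ and ${-p}$. The grid, the two test wave functions and the units with ${\hbar=1}$ are arbitrary choices for illustration, and the wave functions are not normalized since only the comparison matters.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1
N = 2000
X_MIN, X_MAX = -10.0, 10.0
DX = (X_MAX - X_MIN) / N
XS = [X_MIN + (k + 0.5) * DX for k in range(N)]  # midpoint grid


def momentum_density(psi, p):
    """|<p|psi>|^2 with <p|psi> = (2 pi hbar)^(-1/2) * integral e^{-ipx/hbar} psi(x) dx."""
    amp = sum(cmath.exp(-1j * p * x / HBAR) * psi(x) for x in XS) * DX
    return abs(amp) ** 2 / (2 * math.pi * HBAR)


def real_psi(x):
    # Real but deliberately lopsided in x.
    return (1.0 + x) * math.exp(-(x - 0.5) ** 2)


def complex_psi(x):
    # A Gaussian multiplied by a plane-wave factor, so it is genuinely complex.
    return cmath.exp(2.0j * x / HBAR) * math.exp(-x ** 2)


if __name__ == "__main__":
    for p in (0.5, 1.3, 2.0):
        print(p, momentum_density(real_psi, p), momentum_density(real_psi, -p))
    print(momentum_density(complex_psi, 1.0), momentum_density(complex_psi, -1.0))
```

The real wave function gives matching values at ${\pm p}$ even though it is not symmetric in ${x}$, while the complex one is strongly biased towards positive momentum.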

As another example, suppose we have a wave function ${\psi\left(x\right)}$ with a mean momentum ${\bar{p}}$, so that

$\displaystyle \left\langle \psi\left|P\right|\psi\right\rangle =\bar{p} \ \ \ \ \ (25)$

If we now multiply ${\psi}$ by ${e^{ip_{0}x/\hbar}}$ where ${p_{0}}$ is a constant momentum, we can calculate the new mean momentum using 5:

 $\displaystyle \left\langle P\right\rangle$ $\displaystyle =$ $\displaystyle \left\langle e^{ip_{0}x/\hbar}\psi\left|P\right|e^{ip_{0}x/\hbar}\psi\right\rangle \ \ \ \ \ (26)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -i\hbar\int e^{-ip_{0}x/\hbar}\psi^*\left(x\right)\frac{d}{dx}\left(e^{ip_{0}x/\hbar}\psi\left(x\right)\right)dx\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -i\hbar\int e^{-ip_{0}x/\hbar}\psi^*\left[\frac{ip_{0}}{\hbar}e^{ip_{0}x/\hbar}\psi\left(x\right)+e^{ip_{0}x/\hbar}\frac{d}{dx}\psi\left(x\right)\right]dx\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int p_{0}\psi^*\psi dx-i\hbar\int\psi^*\left(x\right)\frac{d}{dx}\psi\left(x\right)dx\ \ \ \ \ (29)$ $\displaystyle$ $\displaystyle =$ $\displaystyle p_{0}+\bar{p} \ \ \ \ \ (30)$

The first integral in the fourth line uses the fact that ${p_{0}}$ is constant and ${\psi}$ is normalized so that

$\displaystyle \int\psi^*\psi dx=1 \ \ \ \ \ (31)$
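The momentum shift can also be checked numerically. This sketch evaluates ${-i\hbar\int\psi^*\psi^{\prime}dx}$ with a central-difference derivative. The Gaussian test function, the grid, the units with ${\hbar=1}$ and the helper names `mean_momentum` and `boost` are assumptions introduced only for this illustration.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1
N = 4000
X_MIN, X_MAX = -10.0, 10.0
DX = (X_MAX - X_MIN) / N
XS = [X_MIN + k * DX for k in range(N + 1)]


def mean_momentum(psi):
    """<P> = -i hbar * integral psi^*(x) psi'(x) dx, using central differences."""
    total = 0j
    for x in XS:
        dpsi = (psi(x + DX) - psi(x - DX)) / (2 * DX)
        total += complex(psi(x)).conjugate() * dpsi * DX
    return (-1j * HBAR * total).real


def gaussian(x):
    # Normalized real Gaussian, so its mean momentum is zero.
    return math.pi ** -0.25 * math.exp(-x ** 2 / 2)


def boost(psi, p0):
    """Return the wave function exp(i p0 x / hbar) * psi(x)."""
    return lambda x: cmath.exp(1j * p0 * x / HBAR) * psi(x)


if __name__ == "__main__":
    start = boost(gaussian, 0.5)         # mean momentum close to 0.5
    shifted = boost(start, 1.0)          # should be close to 0.5 + 1.0
    print(mean_momentum(gaussian), mean_momentum(start), mean_momentum(shifted))
```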

# Postulates of quantum mechanics: states and measurements

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Sections 4.1 – 4.2; Exercise 4.2.1.

Although we’ve covered the basics of nonrelativistic quantum mechanics before, the approach taken by Shankar in his Chapter 4 provides a new way of looking at it, so it’s worth a summary.

Quantum mechanics is based on four postulates, the first three of which describe the quantum state at a fixed instant in time, and the fourth of which describes its time evolution via the Schrödinger equation. We’ll summarize the first three postulates here, and compare each with its classical analogue.

First, in classical mechanics, the path of a particle is, in the Hamiltonian formalism, described by specifying its position ${x\left(t\right)}$ and momentum ${p\left(t\right)}$ as functions of time. Both the position and momentum are specified precisely at all times. In quantum mechanics, the state of a particle is specified by a vector (ket) ${\left|\psi\left(t\right)\right\rangle }$ in a Hilbert space.

Second, in classical mechanics, any dynamical variable ${\omega}$ is a function of the two phase-space coordinates ${x}$ and ${p}$: ${\omega=\omega\left(x,p\right)}$. In quantum mechanics, the spatial coordinate ${x}$ is replaced by the Hermitian operator ${X}$ and the momentum ${p}$ is replaced by the differential operator ${P=\hbar K}$ which we discussed earlier. The matrix elements of ${X}$ and ${P}$ in position space are

$\displaystyle \left\langle x\left|X\right|x^{\prime}\right\rangle =x\delta\left(x-x^{\prime}\right)\ \ \ \ \ (1)$

$\displaystyle \left\langle x\left|P\right|x^{\prime}\right\rangle =-i\hbar\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (2)$

The classical dynamical variable ${\omega\left(x,p\right)}$ becomes a Hermitian operator ${\Omega\left(X,P\right)}$, where ${x}$ and ${p}$ in ${\omega\left(x,p\right)}$ are replaced by their corresponding operators ${X}$ and ${P}$.

The third postulate states how measurements work in quantum mechanics. In classical mechanics, it is assumed that (in principle) any dynamical variable ${\omega}$ may be measured with arbitrary precision without changing the state of the particle. In quantum mechanics, if we wish to measure the value of a variable represented by the operator ${\Omega}$, we must determine the eigenvalues ${\omega_{i}}$ and corresponding eigenvectors ${\left|\omega_{i}\right\rangle }$ of ${\Omega}$, then express the state ${\left|\psi\right\rangle }$ as a linear combination of the ${\left|\omega_{i}\right\rangle }$. Then the best we can do is to state that the particular eigenvalue ${\omega_{i}}$ will be measured with probability ${\left|\left\langle \omega_{i}\left|\psi\right.\right\rangle \right|^{2}}$. After the measurement, the state ${\left|\psi\right\rangle }$ ‘collapses’ to become the state ${\left|\omega_{i}\right\rangle }$. The only possible outcomes of a measurement of ${\Omega}$ are its eigenvalues; no intermediate values are possible.

To illustrate these postulates, suppose we have the following three operators on a complex 3-d Hilbert space (essentially these are the spin-1 operators without the ${\hbar}$)

 $\displaystyle L_{x}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{array}\right]\ \ \ \ \ (3)$ $\displaystyle L_{y}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 0 & -i & 0\\ i & 0 & -i\\ 0 & i & 0 \end{array}\right]\ \ \ \ \ (4)$ $\displaystyle L_{z}$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & -1 \end{array}\right] \ \ \ \ \ (5)$

Since ${L_{z}}$ is diagonal, its eigenvalues can be read off from the diagonal elements as ${0,\pm1}$, so these are the possible values of ${L_{z}}$ that could be obtained in a measurement. Also because ${L_{z}}$ is diagonal, its eigenvectors are

 $\displaystyle \left|L_{z}=+1\right\rangle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (6)$ $\displaystyle \left|L_{z}=0\right\rangle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{c} 0\\ 1\\ 0 \end{array}\right]\ \ \ \ \ (7)$ $\displaystyle \left|L_{z}=-1\right\rangle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right] \ \ \ \ \ (8)$

Suppose we start with the state ${\left|L_{z}=+1\right\rangle }$ in which ${L_{z}=+1}$, and we want to measure ${L_{x}}$ in this state. To find the expectation values ${\left\langle L_{x}\right\rangle }$ and ${\left\langle L_{x}^{2}\right\rangle }$ in this state, we calculate

 $\displaystyle \left\langle L_{x}\right\rangle$ $\displaystyle =$ $\displaystyle \left\langle L_{z}=+1\left|L_{x}\right|L_{z}=+1\right\rangle \ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\end{array}\right]\frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{array}\right]\left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 1 & 0 & 0\end{array}\right]\left[\begin{array}{c} 0\\ 1\\ 0 \end{array}\right]\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (12)$

To get ${\left\langle L_{x}^{2}\right\rangle }$ we first find the operator

$\displaystyle L_{x}^{2}=\frac{1}{2}\left[\begin{array}{ccc} 1 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 1 \end{array}\right] \ \ \ \ \ (13)$

Now we have

 $\displaystyle \left\langle L_{x}^{2}\right\rangle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{ccc} 1 & 0 & 0\end{array}\right]\frac{1}{2}\left[\begin{array}{ccc} 1 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 1 \end{array}\right]\left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left[\begin{array}{ccc} 1 & 0 & 0\end{array}\right]\left[\begin{array}{c} 1\\ 0\\ 1 \end{array}\right]\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2} \ \ \ \ \ (16)$

The uncertainty (the standard deviation, that is, the square root of the variance) is

$\displaystyle \Delta L_{x}=\sqrt{\left\langle L_{x}^{2}\right\rangle -\left\langle L_{x}\right\rangle ^{2}}=\frac{1}{\sqrt{2}} \ \ \ \ \ (17)$
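These matrix products are easy to check by machine. The following sketch repeats the calculation of ${\left\langle L_{x}\right\rangle }$, ${\left\langle L_{x}^{2}\right\rangle }$ and ${\Delta L_{x}}$ for the state ${\left|L_{z}=+1\right\rangle }$; the helper names are my own and plain nested lists are used so that nothing beyond the standard library is needed.

```python
import math

R = 1 / math.sqrt(2)
LX = [[0, R, 0], [R, 0, R], [0, R, 0]]  # L_x in the L_z basis


def matvec(A, v):
    """Matrix-vector product for 3x3 nested lists."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]


def inner(u, v):
    """<u|v>, conjugating the components of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


psi = [1, 0, 0]  # |L_z = +1>
lx_psi = matvec(LX, psi)

mean = inner(psi, lx_psi).real
# <psi|Lx Lx|psi> equals the squared norm of Lx|psi> because Lx is Hermitian.
mean_sq = inner(lx_psi, lx_psi).real
delta = math.sqrt(mean_sq - mean ** 2)

if __name__ == "__main__":
    print(mean, mean_sq, delta)
```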

To find the possible values of ${L_{x}}$ and their probabilities, we need to find the eigenvalues and eigenvectors of ${L_{x}}$, which we can do in the ${L_{z}}$ basis, since this basis is given by the three vectors in 6 to 8. The eigenvalues are found in the usual way from the determinant:

 $\displaystyle \left|\begin{array}{ccc} -\lambda & \frac{1}{\sqrt{2}} & 0\\ \frac{1}{\sqrt{2}} & -\lambda & \frac{1}{\sqrt{2}}\\ 0 & \frac{1}{\sqrt{2}} & -\lambda \end{array}\right|$ $\displaystyle =$ $\displaystyle -\lambda\left(\lambda^{2}-\frac{1}{2}\right)-\frac{1}{\sqrt{2}}\left(\frac{-\lambda}{\sqrt{2}}\right)\ \ \ \ \ (18)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\lambda^{3}+\lambda=0\ \ \ \ \ (19)$ $\displaystyle \lambda$ $\displaystyle =$ $\displaystyle 0,\pm1 \ \ \ \ \ (20)$

The eigenvectors can be found in the usual way, by solving

$\displaystyle \left(L_{x}-\lambda I\right)\left|L_{x}=\lambda\right\rangle =0 \ \ \ \ \ (21)$

where the ket takes on the three possible values of ${\lambda}$ successively. We let

$\displaystyle \left|L_{x}=\lambda\right\rangle =\left[\begin{array}{c} a\\ b\\ c \end{array}\right] \ \ \ \ \ (22)$

For ${\lambda=+1}$ we have

 $\displaystyle -a+\frac{b}{\sqrt{2}}$ $\displaystyle =$ $\displaystyle 0\ \ \ \ \ (23)$ $\displaystyle \frac{1}{\sqrt{2}}\left(a-\sqrt{2}b+c\right)$ $\displaystyle =$ $\displaystyle 0\ \ \ \ \ (24)$ $\displaystyle \frac{b}{\sqrt{2}}-c$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (25)$

Only two of these three equations are independent, so we can set ${a=1}$ and solve for ${b}$ and ${c}$ to get

 $\displaystyle a$ $\displaystyle =$ $\displaystyle 1\ \ \ \ \ (26)$ $\displaystyle b$ $\displaystyle =$ $\displaystyle \sqrt{2}\ \ \ \ \ (27)$ $\displaystyle c$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (28)$

Normalizing the eigenvector gives

$\displaystyle \left|L_{x}=+1\right\rangle =\frac{1}{2}\left[\begin{array}{c} 1\\ \sqrt{2}\\ 1 \end{array}\right] \ \ \ \ \ (29)$

The other two eigenvectors can be found the same way, with the result

 $\displaystyle \left|L_{x}=0\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2}}\left[\begin{array}{c} 1\\ 0\\ -1 \end{array}\right]\ \ \ \ \ (30)$ $\displaystyle \left|L_{x}=-1\right\rangle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left[\begin{array}{c} 1\\ -\sqrt{2}\\ 1 \end{array}\right] \ \ \ \ \ (31)$

Note that these eigenvectors are orthonormal.
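A short check of both claims, as a sketch: the code below verifies that each quoted vector satisfies the eigenvalue equation for ${L_{x}}$ and that the three vectors are orthonormal. The dictionary layout and function names are my own choices for illustration.

```python
import math

R = 1 / math.sqrt(2)
LX = [[0, R, 0], [R, 0, R], [0, R, 0]]  # L_x in the L_z basis

# Eigenvectors quoted in the text, keyed by eigenvalue (L_z basis).
EIGENVECTORS = {+1: [0.5, R, 0.5], 0: [R, 0.0, -R], -1: [0.5, -R, 0.5]}


def matvec(A, v):
    """Matrix-vector product for 3x3 nested lists."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(u, v):
    # All components are real here, so no complex conjugation is needed.
    return sum(a * b for a, b in zip(u, v))


def residual(lam):
    """Largest component of L_x v - lam v; zero for a true eigenpair."""
    v = EIGENVECTORS[lam]
    return max(abs(w - lam * c) for w, c in zip(matvec(LX, v), v))


def gram_matrix():
    """Matrix of inner products between the three eigenvectors."""
    keys = [+1, 0, -1]
    return [[dot(EIGENVECTORS[a], EIGENVECTORS[b]) for b in keys] for a in keys]


if __name__ == "__main__":
    print([residual(lam) for lam in (+1, 0, -1)])  # all close to zero
    print(gram_matrix())                           # close to the identity
```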

Now that we have the eigenvectors of ${L_{x}}$ we can answer the following question. If we start with the state ${\left|L_{z}=-1\right\rangle }$ and measure ${L_{x}}$, what are the possible outcomes and the probability of each?

First, we need to express ${\left|L_{z}=-1\right\rangle }$ in terms of the eigenvectors of ${L_{x}}$ which we can do by solving three simultaneous linear equations, and we find

$\displaystyle \left|L_{z}=-1\right\rangle =\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]=\frac{1}{2}\left(\left|L_{x}=+1\right\rangle +\left|L_{x}=-1\right\rangle \right)-\frac{1}{\sqrt{2}}\left|L_{x}=0\right\rangle \ \ \ \ \ (32)$

(You can verify this by direct substitution.) Thus all 3 possible values of ${L_{x}}$ can result from a measurement, and the probability of each is

 $\displaystyle P\left(L_{x}=+1\right)$ $\displaystyle =$ $\displaystyle \left|\left\langle L_{x}=+1\left|L_{z}=-1\right.\right\rangle \right|^{2}\ \ \ \ \ (33)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\frac{1}{2}\left[\begin{array}{ccc} 1 & \sqrt{2} & 1\end{array}\right]\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]\right)^{2}\ \ \ \ \ (34)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{4}\ \ \ \ \ (35)$ $\displaystyle P\left(L_{x}=0\right)$ $\displaystyle =$ $\displaystyle \left|\left\langle L_{x}=0\left|L_{z}=-1\right.\right\rangle \right|^{2}\ \ \ \ \ (36)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 1 & 0 & -1\end{array}\right]\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]\right)^{2}\ \ \ \ \ (37)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}\ \ \ \ \ (38)$ $\displaystyle P\left(L_{x}=-1\right)$ $\displaystyle =$ $\displaystyle \left|\left\langle L_{x}=-1\left|L_{z}=-1\right.\right\rangle \right|^{2}\ \ \ \ \ (39)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\frac{1}{2}\left[\begin{array}{ccc} 1 & -\sqrt{2} & 1\end{array}\right]\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]\right)^{2}\ \ \ \ \ (40)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{4} \ \ \ \ \ (41)$
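The same probabilities follow from a few lines of code. This is a minimal sketch that takes the eigenvectors of ${L_{x}}$ quoted above as given; the name `outcome_probabilities` is hypothetical.

```python
import math

R = 1 / math.sqrt(2)

# Eigenvectors of L_x in the L_z basis, keyed by eigenvalue.
LX_EIGENVECTORS = {+1: [0.5, R, 0.5], 0: [R, 0.0, -R], -1: [0.5, -R, 0.5]}


def outcome_probabilities(psi):
    """Return {eigenvalue: |<L_x = eigenvalue|psi>|^2} for a normalized psi."""
    probs = {}
    for lam, v in LX_EIGENVECTORS.items():
        amp = sum(complex(a).conjugate() * b for a, b in zip(v, psi))
        probs[lam] = abs(amp) ** 2
    return probs


probs = outcome_probabilities([0, 0, 1])  # the state |L_z = -1>

if __name__ == "__main__":
    print(probs)
    print(sum(probs.values()))  # probabilities should add up to 1
```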

Now suppose we start with the state, written in the ${L_{z}}$ basis:

$\displaystyle \left|\psi\right\rangle =\left[\begin{array}{c} \frac{1}{2}\\ \frac{1}{2}\\ \frac{1}{\sqrt{2}} \end{array}\right] \ \ \ \ \ (42)$

We take a measurement of ${L_{z}^{2}}$ and obtain ${+1}$. The operator ${L_{z}^{2}}$ is given by squaring 5:

$\displaystyle L_{z}^{2}=\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{array}\right] \ \ \ \ \ (43)$

This has a degenerate eigenvalue ${\lambda=+1}$, so the most we can say about the state ${\left|\psi\right\rangle }$ after the measurement is that it is projected onto the subspace ${a\left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]+b\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]}$. That is, the state after the measurement is given by

 $\displaystyle \left|\psi\right\rangle _{after}$ $\displaystyle =$ $\displaystyle \mathbb{P}_{L_{z}=\pm1}\left|\psi\right\rangle _{before}\ \ \ \ \ (44)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\left|L_{z}=+1\right\rangle \left\langle L_{z}=+1\right|+\left|L_{z}=-1\right\rangle \left\langle L_{z}=-1\right|\right]\left[\begin{array}{c} \frac{1}{2}\\ \frac{1}{2}\\ \frac{1}{\sqrt{2}} \end{array}\right]\ \ \ \ \ (45)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]\left[\begin{array}{ccc} 1 & 0 & 0\end{array}\right]+\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]\left[\begin{array}{ccc} 0 & 0 & 1\end{array}\right]\right)\left[\begin{array}{c} \frac{1}{2}\\ \frac{1}{2}\\ \frac{1}{\sqrt{2}} \end{array}\right]\ \ \ \ \ (46)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\begin{array}{c} \frac{1}{2}\\ 0\\ \frac{1}{\sqrt{2}} \end{array}\right] \ \ \ \ \ (47)$

We can normalize this state to get

$\displaystyle \left|\psi\right\rangle _{after}=\frac{2}{\sqrt{3}}\left[\begin{array}{c} \frac{1}{2}\\ 0\\ \frac{1}{\sqrt{2}} \end{array}\right] \ \ \ \ \ (48)$

Thus if we measure ${L_{z}}$ immediately after the measurement of ${L_{z}^{2}}$ above, we get ${L_{z}=+1}$ with probability ${\frac{1}{3}}$ and ${L_{z}=-1}$ with probability ${\frac{2}{3}}$.
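The projection and renormalization can be sketched in code as follows. The function names are hypothetical, and the projector is written out by hand for this specific case, where the ${L_{z}^{2}=1}$ subspace consists of the first and third components in the ${L_{z}}$ basis.

```python
import math


def project_onto_lz_squared_one(psi):
    """Keep the L_z = +1 and L_z = -1 components (first and third entries)."""
    return [psi[0], 0.0, psi[2]]


def normalize(psi):
    """Rescale psi to unit norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in psi))
    return [c / norm for c in psi]


psi_before = [0.5, 0.5, 1 / math.sqrt(2)]
psi_after = normalize(project_onto_lz_squared_one(psi_before))

# Probabilities for a subsequent L_z measurement, in the order +1, 0, -1.
p_plus, p_zero, p_minus = (abs(c) ** 2 for c in psi_after)

if __name__ == "__main__":
    print(psi_after)
    print(p_plus, p_zero, p_minus)
```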

Finally, suppose we have a state ${\left|\psi\right\rangle }$ with the probabilities of measurements of ${L_{z}}$ given as ${P\left(L_{z}=1\right)=\frac{1}{4}}$, ${P\left(L_{z}=0\right)=\frac{1}{2}}$ and ${P\left(L_{z}=-1\right)=\frac{1}{4}}$. Since these probabilities are given by ${\left|\left\langle L_{z}=\lambda\left|\psi\right.\right\rangle \right|^{2}}$ for each of the three possible values of ${\lambda}$, and the vectors ${\left|L_{z}=\lambda\right\rangle }$ are orthonormal, the most general form for ${\left|\psi\right\rangle }$ is

$\displaystyle \left|\psi\right\rangle =\frac{e^{i\delta_{1}}}{2}\left|L_{z}=1\right\rangle +\frac{e^{i\delta_{2}}}{\sqrt{2}}\left|L_{z}=0\right\rangle +\frac{e^{i\delta_{3}}}{2}\left|L_{z}=-1\right\rangle \ \ \ \ \ (49)$

where the ${\delta_{i}}$ are real numbers. For example

$\displaystyle \left|\left\langle L_{z}=1\left|\psi\right.\right\rangle \right|^{2}=\left|\frac{e^{i\delta_{1}}}{2}\right|^{2}=\frac{1}{4} \ \ \ \ \ (50)$

While the presence of a phase factor in a solitary state doesn’t affect the physics of that state, if we have a sum of states, each with its own (different) phase factor, we can’t ignore these phase factors. For example, if we measure ${L_{x}}$ in this state and want the probability that ${L_{x}=0}$, we have, using 30

 $\displaystyle P\left(L_{x}=0\right)$ $\displaystyle =$ $\displaystyle \left|\left\langle L_{x}=0\left|\psi\right.\right\rangle \right|^{2}\ \ \ \ \ (51)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left|\frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 1 & 0 & -1\end{array}\right]\left(\frac{e^{i\delta_{1}}}{2}\left|L_{z}=1\right\rangle +\frac{e^{i\delta_{2}}}{\sqrt{2}}\left|L_{z}=0\right\rangle +\frac{e^{i\delta_{3}}}{2}\left|L_{z}=-1\right\rangle \right)\right|^{2}\ \ \ \ \ (52)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left|\frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 1 & 0 & -1\end{array}\right]\left(\frac{e^{i\delta_{1}}}{2}\left[\begin{array}{c} 1\\ 0\\ 0 \end{array}\right]+\frac{e^{i\delta_{2}}}{\sqrt{2}}\left[\begin{array}{c} 0\\ 1\\ 0 \end{array}\right]+\frac{e^{i\delta_{3}}}{2}\left[\begin{array}{c} 0\\ 0\\ 1 \end{array}\right]\right)\right|^{2}\ \ \ \ \ (53)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{8}\left|e^{i\delta_{1}}-e^{i\delta_{3}}\right|^{2}\ \ \ \ \ (54)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{8}\left|1-e^{i\left(\delta_{3}-\delta_{1}\right)}\right|^{2} \ \ \ \ \ (55)$

The last line gives a different result for different values of the relative phase ${\delta_{3}-\delta_{1}}$, so the phases can’t be ignored.
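To see the phase dependence concretely, we can evaluate the inner product in 53 directly. The sketch below uses only Python’s standard library; the function name `prob_lx_zero` is just an illustrative choice. Multiplying out the row vector ${\frac{1}{\sqrt{2}}\left[\begin{array}{ccc} 1 & 0 & -1\end{array}\right]}$ against the state gives a probability of ${\frac{1}{4}\left(1-\cos\left(\delta_{3}-\delta_{1}\right)\right)}$, which the code compares against.

```python
import cmath
import math

def prob_lx_zero(delta1, delta2, delta3):
    """P(L_x = 0) for the state in 49, using the row vector from 53."""
    psi = [cmath.exp(1j * delta1) / 2,
           cmath.exp(1j * delta2) / math.sqrt(2),
           cmath.exp(1j * delta3) / 2]
    # The bra <L_x = 0| is real, so no complex conjugation is needed.
    bra = [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]
    amplitude = sum(b * c for b, c in zip(bra, psi))
    return abs(amplitude) ** 2

# Equal phases on the L_z = +1 and L_z = -1 components: the amplitudes cancel.
print(prob_lx_zero(0.3, 1.7, 0.3))
# A relative phase of pi: the amplitudes add.
print(prob_lx_zero(0.0, 0.0, math.pi))
# A generic case compared with (1/4)(1 - cos(delta3 - delta1)).
print(prob_lx_zero(0.2, 0.9, 1.1), 0.25 * (1 - math.cos(1.1 - 0.2)))
```

Note that ${\delta_{2}}$ drops out entirely, since the ${L_{z}=0}$ component has no overlap with ${\left|L_{x}=0\right\rangle }$.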

# Relation between action and energy

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.8; Exercises 2.8.6 – 2.8.7.

Here we’ll examine an interesting relation between the action ${S}$ and the total energy of a system, as given by the Hamiltonian ${H}$. Suppose a single particle moving in one dimension follows a classical path given by ${x_{cl}\left(t\right)}$, and moves from an initial position at time ${t_{i}}$ of ${x_{cl}\left(t_{i}\right)=x_{i}}$ to a final position at time ${t_{f}}$ of ${x_{cl}\left(t_{f}\right)=x_{f}}$. The action ${S_{cl}}$ of this classical path is given by the integral of the Lagrangian

$\displaystyle S_{cl}=\int_{t_{i}}^{t_{f}}L\left(x,\dot{x}\right)dt \ \ \ \ \ (1)$

What can we say about the rate of change of the action with respect to the final time ${t_{f}}$? That is, we want to calculate ${\partial S_{cl}/\partial t_{f}}$, where all other parameters ${t_{i},x_{i}}$ and ${x_{f}}$ are held constant. The situation can be illustrated as shown:

Since the only thing that is changing is ${t_{f}}$, the particle starts at the same initial time (which we’ve taken to be ${t_{i}=0}$ in the diagram) and moves to the same location ${x_{f}}$, but arrives at a different time (in the diagram, a later time). This means that the particle must follow a different path, possibly over its entire trajectory. This path, which we’ll call ${x\left(t\right)}$, is related to the original path ${x_{cl}\left(t\right)}$ by perturbing the original path by an amount ${\eta\left(t\right)}$:

$\displaystyle x\left(t\right)=x_{cl}\left(t\right)+\eta\left(t\right) \ \ \ \ \ (2)$

In the diagram, the original path ${x_{cl}}$ is shown in red and the perturbed path ${x}$ in blue. The amount ${\eta}$ is seen to be the vertical distance between these two curves at each time, and in the case of the paths shown in the diagram, ${\eta\left(t\right)<0}$.

The difference in the action between the two paths is due to two contributions: first, there is the contribution due to the extra time, from ${t_{f}}$ to ${t_{f}+\Delta t}$, that the particle takes to complete its path. Second, there is the difference in the two actions over the path from ${t_{i}}$ to ${t_{f}}$. The first contribution is entirely new and, for an infinitesimal extra time ${\Delta t}$, it is given by

$\displaystyle \delta S_{1}=L\left(t_{f}\right)\Delta t \ \ \ \ \ (3)$

where ${L\left(t_{f}\right)}$ is the Lagrangian evaluated at time ${t_{f}}$. The other contribution can be obtained by varying the action over the path from ${t_{i}=0}$ to ${t_{f}}$:

$\displaystyle \delta S_{2}=\int_{0}^{t_{f}}\delta L\;dt \ \ \ \ \ (4)$

Since ${L}$ depends on ${x}$ and ${\dot{x}}$, we have

$\displaystyle \delta L=\frac{\partial L}{\partial x}\delta x+\frac{\partial L}{\partial\dot{x}}\delta\dot{x} \ \ \ \ \ (5)$

For infinitesimally different trajectories, we can see from the diagram above that ${\delta x=\eta\left(t\right)}$ at each point on the curve, so ${\delta\dot{x}=\dot{\eta}\left(t\right)}$, so we get

 $\displaystyle \delta S_{2}$ $\displaystyle =$ $\displaystyle \int_{0}^{t_{f}}\left[\frac{\partial L}{\partial x}\eta\left(t\right)+\frac{\partial L}{\partial\dot{x}}\dot{\eta}\left(t\right)\right]\;dt\ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \int_{0}^{t_{f}}\left[-\frac{d}{dt}\frac{\partial L}{\partial\dot{x}}+\frac{\partial L}{\partial x}\right]\eta\left(t\right)dt+\int_{0}^{t_{f}}\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{x}}\eta\left(t\right)\right)dt\ \ \ \ \ (7)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0+\left.\frac{\partial L}{\partial\dot{x}}\eta\left(t\right)\right|_{t_{f}} \ \ \ \ \ (8)$

In these equations, the derivatives of ${L}$ are evaluated on the original curve ${x_{cl}}$. To verify the second line, use the product rule on the second integrand and cancel terms to recover the first line. The boundary term in the last line is evaluated at ${t=t_{f}}$ only, since we’re assuming that ${\eta\left(0\right)=0}$.

The quantity in brackets in the first integral is zero, because of the Euler-Lagrange equations which are valid on the original curve ${x_{cl}}$:

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{x}}-\frac{\partial L}{\partial x}=0 \ \ \ \ \ (9)$

Putting everything together, we get for the total variation in the action:

 $\displaystyle \delta S_{cl}$ $\displaystyle =$ $\displaystyle \delta S_{1}+\delta S_{2}\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left[\frac{\partial L}{\partial\dot{x}}\eta\left(t\right)+L\Delta t\right]_{t_{f}} \ \ \ \ \ (11)$

Looking at the diagram above, the slope of the blue curve ${x\left(t_{f}\right)}$ at the time ${t_{f}}$ is given by

$\displaystyle \dot{x}\left(t_{f}\right)=\frac{\left|\eta\left(t_{f}\right)\right|}{\Delta t} \ \ \ \ \ (12)$

From the definition 2 of ${\eta}$ we see that ${\eta\left(t_{f}\right)<0}$, so

$\displaystyle \eta\left(t_{f}\right)=-\dot{x}\left(t_{f}\right)\Delta t \ \ \ \ \ (13)$

This gives the final equation for the variation of the action:

 $\displaystyle \delta S_{cl}$ $\displaystyle =$ $\displaystyle \left[-\frac{\partial L}{\partial\dot{x}}\dot{x}+L\right]_{t_{f}}\Delta t\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(-p\dot{x}+L\right)\Delta t\ \ \ \ \ (15)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -H\Delta t \ \ \ \ \ (16)$

where the second line follows from the definition of the canonical momentum ${p=\partial L/\partial\dot{x}}$.

The required derivative is

$\displaystyle \boxed{\frac{\partial S_{cl}}{\partial t_{f}}=-H\left(t_{f}\right)} \ \ \ \ \ (17)$

Using a similar technique, we can work out ${\partial S_{cl}/\partial x_{f}}$. In this case, the situation is as shown in this diagram:

The two trajectories now take the same time, but in the modified trajectory, the particle moves a distance ${\Delta x}$ further. Since both paths take the same time, there is no extra contribution ${L\Delta t}$. In this case ${\eta\left(t\right)>0}$, since the new (blue) curve ${x\left(t\right)}$ is above the old (red) one ${x_{cl}\left(t\right)}$. The derivation is the same as above up to 8, and the total variation in the action is now

$\displaystyle \delta S_{cl}=\left.\frac{\partial L}{\partial\dot{x}}\eta\left(t\right)\right|_{t_{f}} \ \ \ \ \ (18)$

At ${t=t_{f}}$, ${\eta\left(t_{f}\right)=\Delta x}$, so we get

 $\displaystyle \delta S_{cl}$ $\displaystyle =$ $\displaystyle \left.\frac{\partial L}{\partial\dot{x}}\right|_{t_{f}}\Delta x\ \ \ \ \ (19)$ $\displaystyle \frac{\partial S_{cl}}{\partial x_{f}}$ $\displaystyle =$ $\displaystyle \left.\frac{\partial L}{\partial\dot{x}}\right|_{t_{f}}=p\left(t_{f}\right) \ \ \ \ \ (20)$

Example. We can verify 17 for the case of the one-dimensional harmonic oscillator. The general solution for the position is given by

 $\displaystyle x\left(t\right)$ $\displaystyle =$ $\displaystyle A\cos\omega t+B\sin\omega t\ \ \ \ \ (21)$ $\displaystyle \dot{x}\left(t\right)$ $\displaystyle =$ $\displaystyle -A\omega\sin\omega t+B\omega\cos\omega t \ \ \ \ \ (22)$

The total energy is given by

 $\displaystyle E$ $\displaystyle =$ $\displaystyle \frac{1}{2}m\dot{x}^{2}+\frac{1}{2}m\omega^{2}x^{2}\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m}{2}\left(\left(-A\omega\sin\omega t+B\omega\cos\omega t\right)^{2}+\omega^{2}\left(A\cos\omega t+B\sin\omega t\right)^{2}\right)\ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2}\left(A^{2}+B^{2}\right) \ \ \ \ \ (25)$

where we just multiplied out the second line, cancelled terms and used ${\cos^{2}\omega t+\sin^{2}\omega t=1}$.

To get the action, we need the Lagrangian:

 $\displaystyle L$ $\displaystyle =$ $\displaystyle T-V\ \ \ \ \ (26)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{1}{2}m\dot{x}^{2}-\frac{1}{2}m\omega^{2}x^{2}\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m}{2}\left(\left(-A\omega\sin\omega t+B\omega\cos\omega t\right)^{2}-\omega^{2}\left(A\cos\omega t+B\sin\omega t\right)^{2}\right)\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2}\left[A^{2}\left(\sin^{2}\omega t-\cos^{2}\omega t\right)+B^{2}\left(\cos^{2}\omega t-\sin^{2}\omega t\right)-4AB\sin\omega t\cos\omega t\right]\ \ \ \ \ (29)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2}\left(\left(B^{2}-A^{2}\right)\cos2\omega t-2AB\sin2\omega t\right) \ \ \ \ \ (30)$

The action for a trajectory from ${t=0}$ to ${t=T}$ is then

 $\displaystyle S$ $\displaystyle =$ $\displaystyle \int_{0}^{T}Ldt\ \ \ \ \ (31)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{4}\left[\left(B^{2}-A^{2}\right)\sin2\omega t+2AB\cos2\omega t\right]_{0}^{T}\ \ \ \ \ (32)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{4}\left[\left(B^{2}-A^{2}\right)\sin2\omega T+2AB\left(\cos2\omega T-1\right)\right]\ \ \ \ \ (33)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2}\left[\left(B^{2}-A^{2}\right)\sin\omega T\cos\omega T+AB\left(\cos^{2}\omega T-\sin^{2}\omega T-1\right)\right]\ \ \ \ \ (34)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2}\left[\left(B^{2}-A^{2}\right)\sin\omega T\cos\omega T-2AB\sin^{2}\omega T\right] \ \ \ \ \ (35)$
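Since the algebra leading to 35 involves several trigonometric identities, it’s worth checking numerically. The following sketch integrates the Lagrangian 27 along the path 21 with Simpson’s rule and compares the result with the closed form 35. The parameter values and function names are arbitrary illustrative choices.

```python
import math

def lagrangian(t, A, B, m, omega):
    """L = T - V evaluated on x(t) = A cos(wt) + B sin(wt)."""
    x = A * math.cos(omega * t) + B * math.sin(omega * t)
    v = -A * omega * math.sin(omega * t) + B * omega * math.cos(omega * t)
    return 0.5 * m * v * v - 0.5 * m * omega ** 2 * x * x

def action_numeric(T, A, B, m, omega, n=2000):
    """Composite Simpson's rule for the integral of L from 0 to T (n even)."""
    h = T / n
    total = lagrangian(0.0, A, B, m, omega) + lagrangian(T, A, B, m, omega)
    for k in range(1, n):
        weight = 4 if k % 2 else 2
        total += weight * lagrangian(k * h, A, B, m, omega)
    return total * h / 3

def action_closed(T, A, B, m, omega):
    """Closed form, equation 35."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    return 0.5 * m * omega * ((B * B - A * A) * s * c - 2 * A * B * s * s)

# Arbitrary test values.
m, omega, A, B, T = 1.3, 0.7, 0.4, -1.1, 2.5
print(action_numeric(T, A, B, m, omega), action_closed(T, A, B, m, omega))
```

The two printed numbers should agree to many decimal places.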

To proceed further, we need to specify ${A}$ and ${B}$, since these depend on the boundary conditions (that is, on where we require the mass to be at ${t=0}$ and ${t=T}$). If we require ${x\left(0\right)=x_{1}}$ and ${x\left(T\right)=x_{2}}$, then

 $\displaystyle A$ $\displaystyle =$ $\displaystyle x_{1}\ \ \ \ \ (36)$ $\displaystyle x_{1}\cos\omega T+B\sin\omega T$ $\displaystyle =$ $\displaystyle x_{2}\ \ \ \ \ (37)$ $\displaystyle B$ $\displaystyle =$ $\displaystyle \frac{x_{2}-x_{1}\cos\omega T}{\sin\omega T} \ \ \ \ \ (38)$

Plugging these into 25 gives the energy as

 $\displaystyle E$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2}\left(x_{1}^{2}+\left(\frac{x_{2}-x_{1}\cos\omega T}{\sin\omega T}\right)^{2}\right)\ \ \ \ \ (39)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2\sin^{2}\omega T}\left(x_{1}^{2}+x_{2}^{2}-2x_{1}x_{2}\cos\omega T\right) \ \ \ \ \ (40)$

Plugging ${A}$ and ${B}$ into 35, we get (using ${c\equiv\cos\omega T}$ and ${s\equiv\sin\omega T}$, so that ${s^{2}+c^{2}=1}$):

 $\displaystyle S$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2s}\left[\left(x_{2}-x_{1}c\right)^{2}c-x_{1}^{2}s^{2}c-2x_{1}s^{2}\left(x_{2}-x_{1}c\right)\right]\ \ \ \ \ (41)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2s}\left[\left(x_{2}^{2}-2x_{1}x_{2}c+x_{1}^{2}c^{2}\right)c-x_{1}^{2}s^{2}c-2x_{1}x_{2}s^{2}+2x_{1}^{2}s^{2}c\right]\ \ \ \ \ (42)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2s}\left[\left(x_{1}^{2}+x_{2}^{2}\right)c-2x_{1}x_{2}\right]\ \ \ \ \ (43)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2\sin\omega T}\left[\left(x_{1}^{2}+x_{2}^{2}\right)\cos\omega T-2x_{1}x_{2}\right] \ \ \ \ \ (44)$

Taking the derivative, we get

 $\displaystyle \frac{\partial S}{\partial T}$ $\displaystyle =$ $\displaystyle \frac{m\omega}{2s^{2}}\left[-\omega\left(x_{1}^{2}+x_{2}^{2}\right)s^{2}-\left(\left(x_{1}^{2}+x_{2}^{2}\right)c-2x_{1}x_{2}\right)\omega c\right]\ \ \ \ \ (45)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m\omega^{2}}{2s^{2}}\left[-\left(x_{1}^{2}+x_{2}^{2}\right)+2x_{1}x_{2}c\right]\ \ \ \ \ (46)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\frac{m\omega^{2}}{2\sin^{2}\omega T}\left(x_{1}^{2}+x_{2}^{2}-2x_{1}x_{2}\cos\omega T\right)\ \ \ \ \ (47)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -E \ \ \ \ \ (48)$

Thus the result is verified for the harmonic oscillator.
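The same check can be done numerically, and it can be extended to the second result 20 as well. The sketch below differentiates the action 44 by central differences and compares ${\partial S/\partial T}$ with ${-E}$ from 40, and ${\partial S/\partial x_{2}}$ with the final momentum ${p\left(T\right)=m\dot{x}\left(T\right)}$ obtained from 22, 36 and 38. The parameter values, the step size and the function names are illustrative assumptions.

```python
import math

def action(x1, x2, T, m=1.0, omega=1.0):
    """Classical action, equation 44."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    return m * omega / (2 * s) * ((x1 ** 2 + x2 ** 2) * c - 2 * x1 * x2)

def energy(x1, x2, T, m=1.0, omega=1.0):
    """Total energy, equation 40."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    return m * omega ** 2 / (2 * s ** 2) * (x1 ** 2 + x2 ** 2 - 2 * x1 * x2 * c)

def final_momentum(x1, x2, T, m=1.0, omega=1.0):
    """p(T) = m * xdot(T), using 22 with A and B from 36 and 38."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    A, B = x1, (x2 - x1 * c) / s
    return m * (-A * omega * s + B * omega * c)

# Arbitrary end points and travel time (sin(omega*T) must be nonzero).
x1, x2, T, h = 0.3, 1.2, 0.9, 1e-6

# Central differences in the final time and the final position.
dS_dT = (action(x1, x2, T + h) - action(x1, x2, T - h)) / (2 * h)
dS_dx2 = (action(x1, x2 + h, T) - action(x1, x2 - h, T)) / (2 * h)

print(dS_dT, -energy(x1, x2, T))
print(dS_dx2, final_momentum(x1, x2, T))
```

Each printed pair should agree up to the finite-difference error.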

# Hamilton’s equations of motion under a regular canonical transformation

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.8; Exercise 2.8.5.

If the Hamiltonian is invariant under a regular canonical transformation and we can find a generator ${g}$ such that an infinitesimal version of this transformation is given by

 $\displaystyle \bar{q}_{i}$ $\displaystyle =$ $\displaystyle q_{i}+\varepsilon\frac{\partial g}{\partial p_{i}}\equiv q_{i}+\delta q_{i}\ \ \ \ \ (1)$ $\displaystyle \bar{p}_{i}$ $\displaystyle =$ $\displaystyle p_{i}-\varepsilon\frac{\partial g}{\partial q_{i}}\equiv p_{i}+\delta p_{i} \ \ \ \ \ (2)$

then ${g}$ is conserved.

If we are dealing with a finite regular canonical transformation where we go from ${\left(q,p\right)\rightarrow\left(\bar{q},\bar{p}\right)}$, and the Hamiltonian is invariant under this transformation, then it turns out that if a trajectory ${\left(q\left(t\right),p\left(t\right)\right)}$ satisfies Hamilton’s equations of motion:

 $\displaystyle \frac{\partial H}{\partial p_{i}}$ $\displaystyle =$ $\displaystyle \dot{q}_{i}\ \ \ \ \ (3)$ $\displaystyle -\frac{\partial H}{\partial q_{i}}$ $\displaystyle =$ $\displaystyle \dot{p}_{i} \ \ \ \ \ (4)$

then the trajectory obtained by transforming every point in the original trajectory ${\left(q\left(t\right),p\left(t\right)\right)}$ to the barred system ${\left(\bar{q}\left(t\right),\bar{p}\left(t\right)\right)}$ is also a solution of Hamilton’s equations in the sense that

 $\displaystyle \frac{\partial H}{\partial\bar{p}_{i}}$ $\displaystyle =$ $\displaystyle \dot{\bar{q}}_{i}\ \ \ \ \ (5)$ $\displaystyle -\frac{\partial H}{\partial\bar{q}_{i}}$ $\displaystyle =$ $\displaystyle \dot{\bar{p}}_{i} \ \ \ \ \ (6)$

The proof of this is a bit subtle, but goes as follows. To begin, review the derivation of the conditions for a transformation to be canonical. This derivation applied to a passive transformation, in which the two sets of parameters ${\left(q,p\right)\rightarrow\left(\bar{q},\bar{p}\right)}$ refer to the same point in phase space. The transformation we’re considering here is an active transformation, in which ${\left(q,p\right)\rightarrow\left(\bar{q},\bar{p}\right)}$ actually moves the point in phase space. The original derivation (for passive transformations) relied on the fact that the numerical value of the Hamiltonian is the same in both coordinate systems, since both ${\left(q,p\right)}$ and ${\left(\bar{q},\bar{p}\right)}$ refer to the same point in phase space. However, for our active transformation, we’re assuming that the Hamiltonian is invariant under the transformation, that is ${H\left(\bar{q},\bar{p}\right)=H\left(q,p\right)}$, where ${\left(q,p\right)}$ and ${\left(\bar{q},\bar{p}\right)}$ now refer to different points in phase space. Since the assumption that the Hamiltonian satisfies ${H\left(\bar{q},\bar{p}\right)=H\left(q,p\right)}$ was all that we used in the original derivation, the same derivation works both for passive transformations (always) and for active transformations (if the Hamiltonian is invariant under the active transformation). We therefore end up with the equations

 $\displaystyle \dot{\overline{q}}_{j}$ $\displaystyle =$ $\displaystyle \sum_{k}\frac{\partial H}{\partial\overline{q}_{k}}\left\{ \overline{q}_{j},\overline{q}_{k}\right\} +\sum_{k}\frac{\partial H}{\partial\overline{p}_{k}}\left\{ \overline{q}_{j},\overline{p}_{k}\right\} \ \ \ \ \ (7)$ $\displaystyle \dot{\overline{p}}_{j}$ $\displaystyle =$ $\displaystyle \sum_{k}\frac{\partial H}{\partial\overline{q}_{k}}\left\{ \overline{p}_{j},\overline{q}_{k}\right\} +\sum_{k}\frac{\partial H}{\partial\overline{p}_{k}}\left\{ \overline{p}_{j},\overline{p}_{k}\right\} \ \ \ \ \ (8)$

Since the transformation is specified to be canonical, the conditions on the Poisson brackets apply here:

 $\displaystyle \left\{ \overline{q}_{j},\overline{q}_{k}\right\}$ $\displaystyle =$ $\displaystyle \left\{ \overline{p}_{j},\overline{p}_{k}\right\} =0\ \ \ \ \ (9)$ $\displaystyle \left\{ \overline{q}_{j},\overline{p}_{k}\right\}$ $\displaystyle =$ $\displaystyle \delta_{jk} \ \ \ \ \ (10)$

The result is that the transformed trajectory also satisfies Hamilton’s equations 5 and 6.

We can now revisit the 2-d harmonic oscillator to show that a noncanonical transformation violates these results. The Hamiltonian is

$\displaystyle H=\frac{1}{2m}\left(p_{x}^{2}+p_{y}^{2}\right)+\frac{1}{2}m\omega^{2}\left(x^{2}+y^{2}\right) \ \ \ \ \ (11)$

and we consider the transformation where we rotate the coordinates but not the momenta. The transformation is

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x\cos\theta-y\sin\theta\ \ \ \ \ (12)$ $\displaystyle \bar{y}$ $\displaystyle =$ $\displaystyle x\sin\theta+y\cos\theta\ \ \ \ \ (13)$ $\displaystyle \bar{p}_{x}$ $\displaystyle =$ $\displaystyle p_{x}\ \ \ \ \ (14)$ $\displaystyle \bar{p}_{y}$ $\displaystyle =$ $\displaystyle p_{y} \ \ \ \ \ (15)$

As we’ve seen, this is a noncanonical transformation. To see what happens, we’ll consider the initial conditions

 $\displaystyle x\left(0\right)$ $\displaystyle =$ $\displaystyle a\ \ \ \ \ (16)$ $\displaystyle p_{x}\left(0\right)$ $\displaystyle =$ $\displaystyle b\ \ \ \ \ (17)$ $\displaystyle y\left(0\right)$ $\displaystyle =$ $\displaystyle p_{y}\left(0\right)=0 \ \ \ \ \ (18)$

The mass is started off at a point on the ${x}$ axis with a momentum only in the ${x}$ direction. In this case, the mass behaves like a one-dimensional harmonic oscillator, moving along the ${x}$ axis only. To be precise, we can work out Hamilton’s equations of motion:

 $\displaystyle \dot{p}_{x}$ $\displaystyle =$ $\displaystyle -\frac{\partial H}{\partial x}=-m\omega^{2}x\ \ \ \ \ (19)$ $\displaystyle \dot{x}$ $\displaystyle =$ $\displaystyle \frac{\partial H}{\partial p_{x}}=\frac{p_{x}}{m} \ \ \ \ \ (20)$

The equations for ${y}$ and ${p_{y}}$ are the same, with ${x}$ replaced by ${y}$ everywhere. We can solve these ODEs in the usual way, by differentiating the first one and substituting the second one into the first to get

$\displaystyle \ddot{p}_{x}=-m\omega^{2}\dot{x}=-\omega^{2}p_{x} \ \ \ \ \ (21)$

This has the general solution

$\displaystyle p_{x}\left(t\right)=A\cos\omega t+B\sin\omega t \ \ \ \ \ (22)$

We can do the same for ${x}$ and get

$\displaystyle x\left(t\right)=C\cos\omega t+D\sin\omega t \ \ \ \ \ (23)$

Applying the initial conditions, we get

 $\displaystyle p_{x}\left(0\right)$ $\displaystyle =$ $\displaystyle A=b\ \ \ \ \ (24)$ $\displaystyle x\left(0\right)$ $\displaystyle =$ $\displaystyle C=a \ \ \ \ \ (25)$

Plugging these into the equations of motion 19 and 20 and solving for ${B}$ and ${D}$ we get the final solution

 $\displaystyle p_{x}\left(t\right)$ $\displaystyle =$ $\displaystyle b\cos\omega t-m\omega a\sin\omega t\ \ \ \ \ (26)$ $\displaystyle x\left(t\right)$ $\displaystyle =$ $\displaystyle a\cos\omega t+\frac{b}{m\omega}\sin\omega t\ \ \ \ \ (27)$ $\displaystyle y\left(t\right)$ $\displaystyle =$ $\displaystyle p_{y}\left(t\right)=0 \ \ \ \ \ (28)$

Now suppose we start off with ${x\left(0\right)=0}$, ${y\left(0\right)=a}$, ${p_{x}\left(0\right)=b}$ and ${p_{y}\left(0\right)=0}$. That is, we have rotated the coordinates through ${\frac{\pi}{2}}$, but not the momenta. We now begin with the mass on the ${y}$ axis, but moving in the ${x}$ direction, so as time progresses, it will have components of momentum in both the ${x}$ and ${y}$ directions. Although it’s fairly obvious that this motion will not be simply the motion in the first case rotated through ${\frac{\pi}{2}}$, let’s go through the equations. By the same technique as above, we can solve the equations to get

 $\displaystyle p_{x}\left(t\right)$ $\displaystyle =$ $\displaystyle b\cos\omega t\ \ \ \ \ (29)$ $\displaystyle p_{y}\left(t\right)$ $\displaystyle =$ $\displaystyle -m\omega a\sin\omega t\ \ \ \ \ (30)$ $\displaystyle x\left(t\right)$ $\displaystyle =$ $\displaystyle \frac{b}{m\omega}\sin\omega t\ \ \ \ \ (31)$ $\displaystyle y\left(t\right)$ $\displaystyle =$ $\displaystyle a\cos\omega t \ \ \ \ \ (32)$

If we look at the system at, say, ${t=\frac{\pi}{2\omega}}$, then ${\cos\omega t=0}$ and ${\sin\omega t=1}$. The mass that started off on the ${x}$ axis will be at position ${\left(x,y\right)=\left(\frac{b}{m\omega},0\right)}$ and so will the mass that started off on the ${y}$ axis. Since the two masses are in the same place, obviously one is not the rotated version of the other.

Another, probably easier, way to see this is that since the first mass moves only along the ${x}$ axis, if the rotated version of the trajectory was also to be a solution, the rotated trajectory would have to lie entirely along the ${y}$ axis, which is certainly not true for the mass that starts off on the ${y}$ axis, but with a momentum ${p_{x}\ne0}$.
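The comparison can also be made numerically for any rotation angle. In the sketch below, `evolve` is the exact solution of Hamilton’s equations 19 and 20 (and their ${y}$ counterparts) for arbitrary initial conditions, and `rotate` applies either the full rotation of coordinates and momenta or the coordinates-only rotation 12 to 15. If transformed trajectories are again solutions, then rotating the initial point and evolving must give the same result as evolving and then rotating. The function names and the sample values are illustrative assumptions.

```python
import math

def evolve(state, t, m=1.0, omega=1.0):
    """Exact solution for the 2-d oscillator; state = (x, y, px, py) at t = 0."""
    x, y, px, py = state
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (x * c + px / (m * omega) * s,
            y * c + py / (m * omega) * s,
            px * c - m * omega * x * s,
            py * c - m * omega * y * s)

def rotate(state, theta, rotate_momenta):
    """Rotate the coordinates, and optionally the momenta, through theta."""
    x, y, px, py = state
    c, s = math.cos(theta), math.sin(theta)
    new_x, new_y = x * c - y * s, x * s + y * c
    if rotate_momenta:
        px, py = px * c - py * s, px * s + py * c
    return (new_x, new_y, px, py)

def max_mismatch(theta, rotate_momenta, start=(1.0, 0.0, 0.5, 0.0)):
    """Largest difference between 'rotate then evolve' and 'evolve then rotate'."""
    worst = 0.0
    for k in range(50):
        t = 0.1 * k
        a = evolve(rotate(start, theta, rotate_momenta), t)
        b = rotate(evolve(start, t), theta, rotate_momenta)
        worst = max(worst, max(abs(u - v) for u, v in zip(a, b)))
    return worst

print(max_mismatch(math.pi / 2, rotate_momenta=True))   # canonical rotation
print(max_mismatch(math.pi / 2, rotate_momenta=False))  # coordinates only
```

For the canonical rotation the mismatch should vanish up to rounding error, while for the coordinates-only rotation it is of order one.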

In the general case, if the transformation is noncanonical, then the Poisson brackets in 7 and 8 don’t satisfy the conditions 9 and 10, with the result that Hamilton’s equations aren’t satisfied in the ${\left(\bar{q},\bar{p}\right)}$ coordinates. (There may be a deeper, physical interpretation that I’ve missed, but from a mathematical point of view, that’s what goes wrong.)

# Infinitesimal rotations in canonical and noncanonical transformations

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.8; Exercises 2.8.3 – 2.8.4.

Here are a couple of examples of transformations of variables and their consequences with regard to conservation laws.

First, we look at the 2-d harmonic oscillator where the Hamiltonian is

$\displaystyle H=\frac{1}{2m}\left(p_{x}^{2}+p_{y}^{2}\right)+\frac{1}{2}m\omega^{2}\left(x^{2}+y^{2}\right) \ \ \ \ \ (1)$

If we rotate the system so that both the coordinates and momenta get rotated, then

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x\cos\theta-y\sin\theta\ \ \ \ \ (2)$ $\displaystyle \bar{y}$ $\displaystyle =$ $\displaystyle x\sin\theta+y\cos\theta\ \ \ \ \ (3)$ $\displaystyle \bar{p}_{x}$ $\displaystyle =$ $\displaystyle p_{x}\cos\theta-p_{y}\sin\theta\ \ \ \ \ (4)$ $\displaystyle \bar{p}_{y}$ $\displaystyle =$ $\displaystyle p_{x}\sin\theta+p_{y}\cos\theta \ \ \ \ \ (5)$

We can show by direct calculation that ${H}$ is invariant under this transformation, and we can verify that this is a canonical transformation. Shankar shows in his equation 2.8.8 that the generator of this transformation is the angular momentum ${\ell_{z}=xp_{y}-yp_{x}}$.

However, if we rotate only the coordinates and not the momenta, we get the transformation:

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x\cos\theta-y\sin\theta\ \ \ \ \ (6)$ $\displaystyle \bar{y}$ $\displaystyle =$ $\displaystyle x\sin\theta+y\cos\theta\ \ \ \ \ (7)$ $\displaystyle \bar{p}_{x}$ $\displaystyle =$ $\displaystyle p_{x}\ \ \ \ \ (8)$ $\displaystyle \bar{p}_{y}$ $\displaystyle =$ $\displaystyle p_{y} \ \ \ \ \ (9)$

Again, we can show by direct calculation that

$\displaystyle \bar{x}^{2}+\bar{y}^{2}=x^{2}+y^{2} \ \ \ \ \ (10)$

so ${H}$ is also invariant under this transformation. However, this transformation is noncanonical, as we can see by calculating one of the Poisson brackets:

 $\displaystyle \left\{ \bar{x},\bar{p}_{x}\right\}$ $\displaystyle =$ $\displaystyle \sum_{i}\left(\frac{\partial\overline{x}}{\partial q_{i}}\frac{\partial\bar{p}_{x}}{\partial p_{i}}-\frac{\partial\overline{x}}{\partial p_{i}}\frac{\partial\bar{p}_{x}}{\partial q_{i}}\right)\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \cos\theta\ne1 \ \ \ \ \ (12)$

The other mixed brackets (with a coordinate and a momentum) likewise fail to equal 0 or 1, as would be required if the transformation were canonical.
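The brackets can also be evaluated numerically. The sketch below computes Poisson brackets by central differences for both versions of the rotation: the coordinates-only transformation 6 to 9, and the full rotation 2 to 5. The angle, the sample point and all function names are illustrative assumptions.

```python
import math

def poisson_bracket(f, g, point, h=1e-6):
    """{f, g} at point = (x, y, px, py), using central differences."""
    def partial(func, index):
        up, down = list(point), list(point)
        up[index] += h
        down[index] -= h
        return (func(*up) - func(*down)) / (2 * h)

    total = 0.0
    for q_index, p_index in ((0, 2), (1, 3)):  # the pairs (x, px) and (y, py)
        total += (partial(f, q_index) * partial(g, p_index)
                  - partial(f, p_index) * partial(g, q_index))
    return total

THETA = 0.4
C, S = math.cos(THETA), math.sin(THETA)

def x_bar(x, y, px, py):
    return x * C - y * S

def px_bar_unrotated(x, y, px, py):
    return px

def py_bar_unrotated(x, y, px, py):
    return py

def px_bar_rotated(x, y, px, py):
    return px * C - py * S

def py_bar_rotated(x, y, px, py):
    return px * S + py * C

point = (0.7, -0.2, 1.1, 0.5)
# Coordinates-only rotation: cos(theta) and -sin(theta) instead of 1 and 0.
print(poisson_bracket(x_bar, px_bar_unrotated, point), C)
print(poisson_bracket(x_bar, py_bar_unrotated, point), -S)
# Full rotation of coordinates and momenta: 1 and 0, as required.
print(poisson_bracket(x_bar, px_bar_rotated, point))
print(poisson_bracket(x_bar, py_bar_rotated, point))
```

Because the transformations are linear, the finite differences are exact up to rounding error.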

In order for this transformation to give rise to a conservation law, we would need to find a generator ${g}$ that satisfied, for an infinitesimal rotation ${\varepsilon}$:

 $\displaystyle \bar{q}_{i}$ $\displaystyle =$ $\displaystyle q_{i}+\varepsilon\frac{\partial g}{\partial p_{i}}\equiv q_{i}+\delta q_{i}\ \ \ \ \ (13)$ $\displaystyle \bar{p}_{i}$ $\displaystyle =$ $\displaystyle p_{i}-\varepsilon\frac{\partial g}{\partial q_{i}}\equiv p_{i}+\delta p_{i} \ \ \ \ \ (14)$

For an infinitesimal rotation, the transformation 6 becomes

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x-\varepsilon y\ \ \ \ \ (15)$ $\displaystyle \bar{y}$ $\displaystyle =$ $\displaystyle y+\varepsilon x\ \ \ \ \ (16)$ $\displaystyle \bar{p}_{x}$ $\displaystyle =$ $\displaystyle p_{x}\ \ \ \ \ (17)$ $\displaystyle \bar{p}_{y}$ $\displaystyle =$ $\displaystyle p_{y} \ \ \ \ \ (18)$

Therefore, the generator would have to satisfy

 $\displaystyle \frac{\partial g}{\partial p_{x}}$ $\displaystyle =$ $\displaystyle -y\ \ \ \ \ (19)$ $\displaystyle \frac{\partial g}{\partial p_{y}}$ $\displaystyle =$ $\displaystyle x\ \ \ \ \ (20)$ $\displaystyle \frac{\partial g}{\partial x}$ $\displaystyle =$ $\displaystyle 0\ \ \ \ \ (21)$ $\displaystyle \frac{\partial g}{\partial y}$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (22)$

The last two conditions state that ${g}$ cannot depend on ${x}$ or ${y}$, but integrating the first two conditions, we get

$\displaystyle g=-yp_{x}+xp_{y}+f\left(x,y\right) \ \ \ \ \ (23)$

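where ${f}$ is a function that depends only on ${x}$ and/or ${y}$. The first two terms depend on ${x}$ and ${y}$, and since ${f}$ cannot depend on the momenta it cannot cancel them: ${\partial g/\partial x=p_{y}+\partial f/\partial x}$ and ${\partial g/\partial y=-p_{x}+\partial f/\partial y}$ cannot vanish for all values of ${p_{x}}$ and ${p_{y}}$. Thus there is no ${g}$ that satisfies all four conditions, so there is no conservation law associated with a rotation of the coordinates only, even though the Hamiltonian is invariant under this transformation. Only canonical transformations that leave ${H}$ invariant give rise to conservation laws.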

As another example, suppose we have the one-dimensional system with

$\displaystyle H=\frac{1}{2}\left(p^{2}+x^{2}\right) \ \ \ \ \ (24)$

and perform a rotation in phase space, that is, in the ${x-p}$ plane:

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x\cos\theta-p\sin\theta\ \ \ \ \ (25)$ $\displaystyle \bar{p}$ $\displaystyle =$ $\displaystyle x\sin\theta+p\cos\theta \ \ \ \ \ (26)$

The Hamiltonian is invariant:

 $\displaystyle \bar{p}^{2}+\bar{x}^{2}$ $\displaystyle =$ $\displaystyle x^{2}\sin^{2}\theta+2xp\sin\theta\cos\theta+p^{2}\cos^{2}\theta+\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle$ $\displaystyle x^{2}\cos^{2}\theta-2xp\sin\theta\cos\theta+p^{2}\sin^{2}\theta\ \ \ \ \ (28)$ $\displaystyle$ $\displaystyle =$ $\displaystyle x^{2}+p^{2} \ \ \ \ \ (29)$

The transformation is canonical as we can verify by calculating the Poisson bracket

 $\displaystyle \left\{ \bar{x},\bar{p}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial\overline{x}}{\partial x}\frac{\partial\bar{p}}{\partial p}-\frac{\partial\overline{x}}{\partial p}\frac{\partial\bar{p}}{\partial x}\ \ \ \ \ (30)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \cos^{2}\theta-\left(-\sin^{2}\theta\right)\ \ \ \ \ (31)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (32)$

An infinitesimal rotation gives the transformation

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x-\varepsilon p\ \ \ \ \ (33)$ $\displaystyle \bar{p}$ $\displaystyle =$ $\displaystyle p+\varepsilon x \ \ \ \ \ (34)$

To find the generator, we need to solve 13 and 14:

 $\displaystyle \frac{\partial g}{\partial p}$ $\displaystyle =$ $\displaystyle -p\ \ \ \ \ (35)$ $\displaystyle \frac{\partial g}{\partial x}$ $\displaystyle =$ $\displaystyle -x \ \ \ \ \ (36)$

These can be integrated to give

$\displaystyle g\left(x,p\right)=-\frac{1}{2}\left(p^{2}+x^{2}\right)+C \ \ \ \ \ (37)$

where ${C}$ is a constant of integration. Thus the conserved quantity (apart from the minus sign, which we could eliminate by rotating through ${-\theta}$ instead of ${\theta}$) is just the original Hamiltonian, or total energy.
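As a sanity check on the generator, the sketch below applies 13 and 14 with ${g}$ from 37 (taking ${C=0}$) and compares the result with the exact rotation 25 and 26 through a small angle ${\varepsilon}$. The two should differ only at order ${\varepsilon^{2}}$. The derivatives of ${g}$ are taken by central differences, and the function names and sample values are illustrative assumptions.

```python
import math

def generator(x, p):
    """g(x, p) from 37 with the constant C set to zero."""
    return -0.5 * (p * p + x * x)

def infinitesimal_step(x, p, eps, h=1e-6):
    """Apply 13 and 14, with derivatives of g taken by central differences."""
    dg_dp = (generator(x, p + h) - generator(x, p - h)) / (2 * h)
    dg_dx = (generator(x + h, p) - generator(x - h, p)) / (2 * h)
    return x + eps * dg_dp, p - eps * dg_dx

def rotation(x, p, theta):
    """Finite rotation in the x-p plane, equations 25 and 26."""
    return (x * math.cos(theta) - p * math.sin(theta),
            x * math.sin(theta) + p * math.cos(theta))

x0, p0, eps = 0.8, -0.3, 1e-3
step = infinitesimal_step(x0, p0, eps)
exact = rotation(x0, p0, eps)
print(step, exact)

# The Hamiltonian 24 is unchanged by a finite rotation.
xr, pr = rotation(x0, p0, 0.7)
print(0.5 * (p0 ** 2 + x0 ** 2), 0.5 * (pr ** 2 + xr ** 2))
```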

# Passive, regular and active transformations. Invariance of the Hamiltonian and generators of transformations

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Sections 2.7 & 2.8; Exercises 2.8.1 – 2.8.2.

The canonical transformations we’ve considered so far are of the form

 $\displaystyle \overline{q}_{i}$ $\displaystyle =$ $\displaystyle \overline{q}_{i}\left(q,p\right)\ \ \ \ \ (1)$ $\displaystyle \overline{p}_{i}$ $\displaystyle =$ $\displaystyle \overline{p}_{i}\left(q,p\right) \ \ \ \ \ (2)$

The interpretation of these transformations is that we are using a new set of coordinates and momenta to describe the same point in phase space. For example, in 2-d we can describe the point one unit along the ${y}$ axis by the coordinates ${x=0,y=1}$ if we use rectangular coordinates, or by ${r=1,\theta=\frac{\pi}{2}}$ if we use polar coordinates. The numerical values of the coordinates are different in the two systems, but the geometric point being described is the same. Such a transformation is called a passive transformation. In a passive transformation, any function ${\omega}$ always has the same value at a given point in phase space no matter which coordinate system we’re using, so we can say that

$\displaystyle \omega\left(q,p\right)=\omega\left(\bar{q},\bar{p}\right) \ \ \ \ \ (3)$

where it is understood that ${\left(q,p\right)}$ and ${\left(\bar{q},\bar{p}\right)}$ both refer to the same point, but in different representations.

One characteristic of a passive transformation is that the ranges of the variables used to represent a point in phase space need not be the same in the two systems. For example, in 2-d rectangular coordinates, both ${x}$ and ${y}$ can range from ${-\infty}$ to ${+\infty}$, while in polar coordinates ${r}$ ranges between 0 and ${+\infty}$ while the angle ${\theta}$ runs between 0 and ${2\pi}$.

A special type of transformation is a regular transformation, in which the variables in the two systems have the same ranges. For example, if we translate a 2-d system by 1 unit along the ${x}$ axis, the new coordinates are related to the old ones by

 $\displaystyle \bar{x}$ $\displaystyle =$ $\displaystyle x-1\ \ \ \ \ (4)$ $\displaystyle \bar{y}$ $\displaystyle =$ $\displaystyle y \ \ \ \ \ (5)$

Both the original and barred systems have the same range (${-\infty}$ to ${+\infty}$).

Although we can interpret a regular transformation as a passive transformation, we can also think of it in a different way. We can imagine that, instead of just providing a different label for the same point, the transformation has actually shifted the system to a new location in phase space. In the above example, this would mean that we have physically moved the system by 1 unit along the ${x}$ axis. This interpretation is known as an active transformation.

If a function ${\omega}$ is invariant under an active transformation, then it satisfies the condition

$\displaystyle \omega\left(q,p\right)=\omega\left(\bar{q},\bar{p}\right) \ \ \ \ \ (6)$

Although mathematically this is the same as 3, physically it means something quite different, since now the points ${\left(q,p\right)}$ and ${\left(\bar{q},\bar{p}\right)}$ refer to different points in phase space, so we’re saying that the function ${\omega}$ does not change when we move the physical system in the way specified by the active transformation.

We now restrict ourselves to talking about regular canonical transformations. Consider some dynamical variable (it could be momentum or angular momentum, for example) ${g\left(q,p\right)}$ and suppose we define the transformations

 $\displaystyle \bar{q}_{i}$ $\displaystyle =$ $\displaystyle q_{i}+\epsilon\frac{\partial g}{\partial p_{i}}\equiv q_{i}+\delta q_{i}\ \ \ \ \ (7)$ $\displaystyle \bar{p}_{i}$ $\displaystyle =$ $\displaystyle p_{i}-\epsilon\frac{\partial g}{\partial q_{i}}\equiv p_{i}+\delta p_{i} \ \ \ \ \ (8)$

where ${\epsilon}$ is some infinitesimal quantity.

First, we need to show that, to first order in ${\epsilon}$, this is a canonical transformation. The required conditions for this are

 $\displaystyle \left\{ \overline{q}_{i},\overline{q}_{j}\right\}$ $\displaystyle =$ $\displaystyle \left\{ \overline{p}_{i},\overline{p}_{j}\right\} =0\ \ \ \ \ (9)$ $\displaystyle \left\{ \overline{q}_{i},\overline{p}_{j}\right\}$ $\displaystyle =$ $\displaystyle \delta_{ij} \ \ \ \ \ (10)$

Consider first (we’ll use the summation convention, so the index ${k}$ is summed in what follows):

 $\displaystyle \left\{ \overline{q}_{i},\overline{p}_{j}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial}{\partial q_{k}}\left(q_{i}+\epsilon\frac{\partial g}{\partial p_{i}}\right)\frac{\partial}{\partial p_{k}}\left(p_{j}-\epsilon\frac{\partial g}{\partial q_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial}{\partial p_{k}}\left(q_{i}+\epsilon\frac{\partial g}{\partial p_{i}}\right)\frac{\partial}{\partial q_{k}}\left(p_{j}-\epsilon\frac{\partial g}{\partial q_{j}}\right)\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\delta_{ik}+\epsilon\frac{\partial^{2}g}{\partial p_{i}\partial q_{k}}\right)\left(\delta_{jk}-\epsilon\frac{\partial^{2}g}{\partial p_{k}\partial q_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \left(0+\epsilon\frac{\partial^{2}g}{\partial p_{i}\partial p_{k}}\right)\left(0-\epsilon\frac{\partial^{2}g}{\partial q_{j}\partial q_{k}}\right) \ \ \ \ \ (12)$

The zeroes in the last line follow from the fact that ${q_{k}}$ and ${p_{k}}$ are independent variables. We can now keep terms only up to first order in ${\epsilon}$ to get

 $\displaystyle \left\{ \overline{q}_{i},\overline{p}_{j}\right\}$ $\displaystyle =$ $\displaystyle \delta_{ik}\delta_{jk}+\epsilon\left(\frac{\partial^{2}g}{\partial p_{i}\partial q_{k}}\delta_{jk}-\frac{\partial^{2}g}{\partial p_{k}\partial q_{j}}\delta_{ik}\right)\ \ \ \ \ (13)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \delta_{ij}+\epsilon\left(\frac{\partial^{2}g}{\partial p_{i}\partial q_{j}}-\frac{\partial^{2}g}{\partial p_{i}\partial q_{j}}\right)\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \delta_{ij} \ \ \ \ \ (15)$

The other two brackets work out similarly:

 $\displaystyle \left\{ \overline{q}_{i},\overline{q}_{j}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial}{\partial q_{k}}\left(q_{i}+\epsilon\frac{\partial g}{\partial p_{i}}\right)\frac{\partial}{\partial p_{k}}\left(q_{j}+\epsilon\frac{\partial g}{\partial p_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial}{\partial p_{k}}\left(q_{i}+\epsilon\frac{\partial g}{\partial p_{i}}\right)\frac{\partial}{\partial q_{k}}\left(q_{j}+\epsilon\frac{\partial g}{\partial p_{j}}\right)\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(\delta_{ik}+\epsilon\frac{\partial^{2}g}{\partial p_{i}\partial q_{k}}\right)\left(0+\epsilon\frac{\partial^{2}g}{\partial p_{k}\partial p_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \left(0+\epsilon\frac{\partial^{2}g}{\partial p_{i}\partial p_{k}}\right)\left(\delta_{jk}+\epsilon\frac{\partial^{2}g}{\partial p_{j}\partial q_{k}}\right)\ \ \ \ \ (17)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \delta_{ik}\epsilon\frac{\partial^{2}g}{\partial p_{k}\partial p_{j}}-\delta_{jk}\epsilon\frac{\partial^{2}g}{\partial p_{i}\partial p_{k}}\ \ \ \ \ (18)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \epsilon\left(\frac{\partial^{2}g}{\partial p_{i}\partial p_{j}}-\frac{\partial^{2}g}{\partial p_{i}\partial p_{j}}\right)\ \ \ \ \ (19)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (20)$
 $\displaystyle \left\{ \overline{p}_{i},\overline{p}_{j}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial}{\partial q_{k}}\left(p_{i}-\epsilon\frac{\partial g}{\partial q_{i}}\right)\frac{\partial}{\partial p_{k}}\left(p_{j}-\epsilon\frac{\partial g}{\partial q_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial}{\partial p_{k}}\left(p_{i}-\epsilon\frac{\partial g}{\partial q_{i}}\right)\frac{\partial}{\partial q_{k}}\left(p_{j}-\epsilon\frac{\partial g}{\partial q_{j}}\right)\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(0-\epsilon\frac{\partial^{2}g}{\partial q_{i}\partial q_{k}}\right)\left(\delta_{jk}-\epsilon\frac{\partial^{2}g}{\partial p_{k}\partial q_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \left(\delta_{ik}-\epsilon\frac{\partial^{2}g}{\partial q_{i}\partial p_{k}}\right)\left(0-\epsilon\frac{\partial^{2}g}{\partial q_{j}\partial q_{k}}\right)\ \ \ \ \ (22)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\delta_{jk}\epsilon\frac{\partial^{2}g}{\partial q_{i}\partial q_{k}}+\delta_{ik}\epsilon\frac{\partial^{2}g}{\partial q_{k}\partial q_{j}}\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -\epsilon\left(\frac{\partial^{2}g}{\partial q_{i}\partial q_{j}}-\frac{\partial^{2}g}{\partial q_{i}\partial q_{j}}\right)\ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (25)$

Thus all the brackets check out, so the transformation is canonical.
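A quick numerical sanity check of this result, under assumptions that aren’t in the original text: one degree of freedom, the made-up generator ${g=\frac{1}{2}q^{2}p^{2}}$, and a central-difference estimate of the Poisson bracket (all helper names are hypothetical). For this ${g}$ the exact bracket works out to ${1-3\epsilon^{2}q^{2}p^{2}}$, so the deviation from 1 should be second order in ${\epsilon}$.

```python
def poisson_bracket(f, g, q, p, h=1e-5):
    """Central-difference estimate of {f, g} at (q, p), one degree of freedom."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return df_dq * dg_dp - df_dp * dg_dq

# Partial derivatives of the assumed generator g(q, p) = q**2 * p**2 / 2.
def dg_dq(q, p):
    return q * p**2

def dg_dp(q, p):
    return q**2 * p

def bracket_deviation(eps, q=1.0, p=1.0):
    """Return {q_bar, p_bar} - 1 for the transformation in equations 7 and 8."""
    q_bar = lambda q_, p_: q_ + eps * dg_dp(q_, p_)
    p_bar = lambda q_, p_: p_ - eps * dg_dq(q_, p_)
    return poisson_bracket(q_bar, p_bar, q, p) - 1.0

for eps in (1e-2, 1e-3):
    print(eps, bracket_deviation(eps))
```

Reducing ${\epsilon}$ by a factor of 10 should reduce the deviation by a factor of about 100, which is what "canonical to first order" means here.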

The point of all this is that, if the Hamiltonian is invariant under the transformations 7 and 8 then the variable ${g}$ is conserved (that is, doesn’t change with time). ${g}$ is called the generator of the transformation. We can verify this by using the chain rule to calculate the variation in ${H}$:

 $\displaystyle \delta H$ $\displaystyle =$ $\displaystyle \frac{\partial H}{\partial q_{i}}\delta q_{i}+\frac{\partial H}{\partial p_{i}}\delta p_{i}\ \ \ \ \ (26)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \epsilon\left[\frac{\partial H}{\partial q_{i}}\frac{\partial g}{\partial p_{i}}-\frac{\partial H}{\partial p_{i}}\frac{\partial g}{\partial q_{i}}\right]\ \ \ \ \ (27)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \epsilon\left\{ H,g\right\} \ \ \ \ \ (28)$

Since ${H}$ is invariant, we must have ${\delta H=0}$, so

$\displaystyle \left\{ H,g\right\} =0 \ \ \ \ \ (29)$

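But this is exactly the condition for ${g}$ to be conserved, since the time derivative of any function is its Poisson bracket with the Hamiltonian: ${\dot{g}=\left\{ g,H\right\} =-\left\{ H,g\right\} =0}$. QED.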

Example Suppose we have a two particle system moving in one dimension, with positions ${q_{1},q_{2}}$ and momenta ${p_{1},p_{2}}$. If we take

$\displaystyle g=p_{1}+p_{2} \ \ \ \ \ (30)$

we get

 $\displaystyle \delta q_{i}$ $\displaystyle =$ $\displaystyle \epsilon\frac{\partial g}{\partial p_{i}}=\epsilon\ \ \ \ \ (31)$ $\displaystyle \delta p_{i}$ $\displaystyle =$ $\displaystyle -\epsilon\frac{\partial g}{\partial q_{i}}=0 \ \ \ \ \ (32)$

That is, each particle gets shifted by the same amount ${\epsilon}$ but the momentum of each particle remains unchanged. Thus the total momentum is the generator of infinitesimal translations. The physical interpretation of this is that, since the momentum of each particle is unchanged by the translation, the total kinetic energy

$\displaystyle T=\frac{p_{1}^{2}}{2m_{1}}+\frac{p_{2}^{2}}{2m_{2}} \ \ \ \ \ (33)$

remains unchanged. If the total energy is invariant, the total potential energy of the system must therefore also be unaffected by a translation, which means that there is no external force on the system.
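To make this concrete, here is a minimal sketch comparing a Hamiltonian whose potential depends only on the separation ${q_{1}-q_{2}}$ with one that also contains an external potential. The spring potentials, masses and function names are illustrative assumptions, not taken from Shankar.

```python
def kinetic(p1, p2, m1=1.0, m2=2.0):
    return p1**2 / (2 * m1) + p2**2 / (2 * m2)

def h_internal(q1, q2, p1, p2, k=3.0):
    # Assumed potential: a spring between the particles, depends only on q1 - q2.
    return kinetic(p1, p2) + 0.5 * k * (q1 - q2)**2

def h_external(q1, q2, p1, p2, k=3.0):
    # Same system plus an assumed external spring tying particle 1 to the origin.
    return h_internal(q1, q2, p1, p2, k) + 0.5 * k * q1**2

def translate(state, eps):
    # Transformation generated by g = p1 + p2: shift both positions, leave momenta alone.
    q1, q2, p1, p2 = state
    return (q1 + eps, q2 + eps, p1, p2)

state = (0.3, -1.1, 0.7, -0.2)
eps = 1e-3
shifted = translate(state, eps)

delta_internal = h_internal(*shifted) - h_internal(*state)
delta_external = h_external(*shifted) - h_external(*state)
print(delta_internal, delta_external)
```

The first difference vanishes up to rounding, while the second is of order ${\epsilon}$: with an external force present, the Hamiltonian is not invariant and total momentum is not conserved.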

# Poisson brackets are invariant under a canonical transformation

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.7; Exercise 2.7.9.

The Poisson bracket of two functions is defined as

$\displaystyle \left\{ \omega,\sigma\right\} =\sum_{i}\left(\frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial p_{i}}-\frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial q_{i}}\right) \ \ \ \ \ (1)$

Calculating the Poisson bracket requires knowing ${\omega}$ and ${\sigma}$ as functions of the coordinates ${q_{i}}$ and momenta ${p_{i}}$ in the particular coordinate system we’re using. However, we’ve seen that the Euler-Lagrange and Hamilton’s equations are invariant under a canonical transformation. The Poisson bracket is a fundamental quantity in classical mechanics, in particular because the time derivative of a function ${\omega}$ is its Poisson bracket ${\left\{ \omega,H\right\} }$ with the Hamiltonian, so it’s natural to ask how the Poisson bracket of two functions transforms under a canonical transformation.

The simplest way of finding out (although not the most elegant) is to write the canonical transformation as

 $\displaystyle \bar{q}_{i}$ $\displaystyle =$ $\displaystyle \bar{q}_{i}\left(q,p\right)\ \ \ \ \ (2)$ $\displaystyle \bar{p}_{i}$ $\displaystyle =$ $\displaystyle \bar{p}_{i}\left(q,p\right) \ \ \ \ \ (3)$

We can then write the Poisson bracket in the new coordinates as

$\displaystyle \left\{ \omega,\sigma\right\} _{\bar{q},\bar{p}}=\sum_{j}\left(\frac{\partial\omega}{\partial\bar{q}_{j}}\frac{\partial\sigma}{\partial\bar{p}_{j}}-\frac{\partial\omega}{\partial\bar{p}_{j}}\frac{\partial\sigma}{\partial\bar{q}_{j}}\right) \ \ \ \ \ (4)$

Assuming the transformation is invertible, we can use the chain rule to calculate the derivatives with respect to the barred coordinates. This gives the following (we’ve used the summation convention in which any index repeated twice in a product is summed; thus in the following, there are implied sums over ${i,j}$ and ${k}$):

 $\displaystyle \left\{ \omega,\sigma\right\} _{\bar{q},\bar{p}}$ $\displaystyle =$ $\displaystyle \left(\frac{\partial\omega}{\partial q_{i}}\frac{\partial q_{i}}{\partial\bar{q}_{j}}+\frac{\partial\omega}{\partial p_{i}}\frac{\partial p_{i}}{\partial\bar{q}_{j}}\right)\left(\frac{\partial\sigma}{\partial q_{k}}\frac{\partial q_{k}}{\partial\bar{p}_{j}}+\frac{\partial\sigma}{\partial p_{k}}\frac{\partial p_{k}}{\partial\bar{p}_{j}}\right)-$ $\displaystyle$ $\displaystyle$ $\displaystyle \left(\frac{\partial\omega}{\partial q_{i}}\frac{\partial q_{i}}{\partial\bar{p}_{j}}+\frac{\partial\omega}{\partial p_{i}}\frac{\partial p_{i}}{\partial\bar{p}_{j}}\right)\left(\frac{\partial\sigma}{\partial q_{k}}\frac{\partial q_{k}}{\partial\bar{q}_{j}}+\frac{\partial\sigma}{\partial p_{k}}\frac{\partial p_{k}}{\partial\bar{q}_{j}}\right)\ \ \ \ \ (5)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial p_{k}}\left(\frac{\partial q_{i}}{\partial\bar{q}_{j}}\frac{\partial p_{k}}{\partial\bar{p}_{j}}-\frac{\partial q_{i}}{\partial\bar{p}_{j}}\frac{\partial p_{k}}{\partial\bar{q}_{j}}\right)+$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial q_{k}}\left(\frac{\partial p_{i}}{\partial\bar{q}_{j}}\frac{\partial q_{k}}{\partial\bar{p}_{j}}-\frac{\partial p_{i}}{\partial\bar{p}_{j}}\frac{\partial q_{k}}{\partial\bar{q}_{j}}\right)+$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial q_{k}}\left(\frac{\partial q_{i}}{\partial\bar{q}_{j}}\frac{\partial q_{k}}{\partial\bar{p}_{j}}-\frac{\partial q_{i}}{\partial\bar{p}_{j}}\frac{\partial q_{k}}{\partial\bar{q}_{j}}\right)+$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial p_{k}}\left(\frac{\partial p_{i}}{\partial\bar{q}_{j}}\frac{\partial p_{k}}{\partial\bar{p}_{j}}-\frac{\partial p_{i}}{\partial\bar{p}_{j}}\frac{\partial p_{k}}{\partial\bar{q}_{j}}\right)\ \ \ \ \ (6)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial p_{k}}\left\{ q_{i},p_{k}\right\} +\frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial q_{k}}\left\{ p_{i},q_{k}\right\} +$ $\displaystyle$ $\displaystyle$ $\displaystyle \frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial q_{k}}\left\{ q_{i},q_{k}\right\} +\frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial p_{k}}\left\{ p_{i},p_{k}\right\} \ \ \ \ \ (7)$

For a canonical transformation, the Poisson brackets in the last equation satisfy

 $\displaystyle \left\{ q_{i},p_{k}\right\}$ $\displaystyle =$ $\displaystyle -\left\{ p_{i},q_{k}\right\} =\delta_{ik}\ \ \ \ \ (8)$ $\displaystyle \left\{ q_{i},q_{k}\right\}$ $\displaystyle =$ $\displaystyle \left\{ p_{i},p_{k}\right\} =0 \ \ \ \ \ (9)$

[Actually, we had worked out these conditions for the barred coordinates in terms of the original coordinates, but since the transformation is invertible and both sets of coordinates are canonical, the Poisson brackets work either way.] Applying these conditions to the above, we find

 $\displaystyle \left\{ \omega,\sigma\right\} _{\bar{q},\bar{p}}$ $\displaystyle =$ $\displaystyle \left(\frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial p_{k}}-\frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial q_{k}}\right)\delta_{ik}\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\partial\omega}{\partial q_{i}}\frac{\partial\sigma}{\partial p_{i}}-\frac{\partial\omega}{\partial p_{i}}\frac{\partial\sigma}{\partial q_{i}}\ \ \ \ \ (11)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left\{ \omega,\sigma\right\} _{q,p} \ \ \ \ \ (12)$

Thus the Poisson bracket is invariant under a canonical transformation.
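The invariance can also be checked numerically. The sketch below is only an illustration under assumptions of my own choosing: one degree of freedom, a rotation in phase space as the canonical transformation (for which ${\left\{ \bar{q},\bar{p}\right\} =\cos^{2}\theta+\sin^{2}\theta=1}$), two arbitrary test functions, and central-difference derivatives.

```python
import math

THETA = 0.6  # arbitrary rotation angle in phase space (assumed for illustration)

def to_barred(q, p):
    # q_bar = q cos(theta) + p sin(theta), p_bar = -q sin(theta) + p cos(theta)
    c, s = math.cos(THETA), math.sin(THETA)
    return q * c + p * s, -q * s + p * c

def from_barred(q_bar, p_bar):
    # Inverse rotation, giving the original variables in terms of the barred ones.
    c, s = math.cos(THETA), math.sin(THETA)
    return q_bar * c - p_bar * s, q_bar * s + p_bar * c

# Arbitrary test functions, defined in the original variables.
def omega(q, p):
    return q**2 * p

def sigma(q, p):
    return q + p**3

# The same functions expressed in the barred variables.
def omega_bar(q_bar, p_bar):
    return omega(*from_barred(q_bar, p_bar))

def sigma_bar(q_bar, p_bar):
    return sigma(*from_barred(q_bar, p_bar))

def poisson_bracket(f, g, a, b, h=1e-5):
    """Central-difference {f, g} with respect to the variable pair (a, b)."""
    df_da = (f(a + h, b) - f(a - h, b)) / (2 * h)
    df_db = (f(a, b + h) - f(a, b - h)) / (2 * h)
    dg_da = (g(a + h, b) - g(a - h, b)) / (2 * h)
    dg_db = (g(a, b + h) - g(a, b - h)) / (2 * h)
    return df_da * dg_db - df_db * dg_da

q, p = 0.7, -0.4
bracket_old = poisson_bracket(omega, sigma, q, p)
bracket_new = poisson_bracket(omega_bar, sigma_bar, *to_barred(q, p))
print(bracket_old, bracket_new)
```

Both values should agree with the analytic result ${6qp^{3}-q^{2}}$ evaluated at the chosen point, to within the finite-difference error.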

# Canonical transformations: a few more examples

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.7; Exercises 02.07.06 – 02.07.07, 02.07.08(4).

Here are a few more examples of canonical variable transformations.

Example 1 First, we revisit the two-body problem, in which we simplified the problem by transforming from the coordinates ${\mathbf{r}_{1}}$ and ${\mathbf{r}_{2}}$ of the masses ${m_{1}}$ and ${m_{2}}$ to two new position vectors:

 $\displaystyle \mathbf{r}$ $\displaystyle \equiv$ $\displaystyle \mathbf{r}_{1}-\mathbf{r}_{2}\ \ \ \ \ (1)$ $\displaystyle \mathbf{r}_{CM}$ $\displaystyle \equiv$ $\displaystyle \frac{m_{1}\mathbf{r}_{1}+m_{2}\mathbf{r}_{2}}{M} \ \ \ \ \ (2)$

Here ${M\equiv m_{1}+m_{2}}$ is the total mass, ${\mathbf{r}}$ is the relative position, and ${\mathbf{r}_{CM}}$ is the position of the centre of mass. The conjugate momenta in the original system are

$\displaystyle \mathbf{p}_{i}=m_{i}\dot{\mathbf{r}}_{i} \ \ \ \ \ (3)$

The conjugate momenta transform according to

 $\displaystyle \mathbf{p}_{CM}$ $\displaystyle =$ $\displaystyle M\dot{\mathbf{r}}_{CM}=\mathbf{p}_{1}+\mathbf{p}_{2}\ \ \ \ \ (4)$ $\displaystyle \mathbf{p}$ $\displaystyle =$ $\displaystyle \mu\dot{\mathbf{r}}\ \ \ \ \ (5)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m_{2}\mathbf{p}_{1}-m_{1}\mathbf{p}_{2}}{M} \ \ \ \ \ (6)$

where ${\mu=m_{1}m_{2}/M}$ is the reduced mass.

To check that this is a canonical transformation, we need to calculate the Poisson brackets. To make things easier, note that the new coordinates depend only on the old coordinates (and not on the momenta), and conversely, the new momenta depend only on the old momenta (and not on the coordinates). Since the Poisson brackets ${\left\{ \overline{q}_{i},\overline{q}_{j}\right\} }$ and ${\left\{ \overline{p}_{i},\overline{p}_{j}\right\} }$ all involve taking derivatives of coordinates with respect to momenta (in the first case) or momenta with respect to coordinates (in the second case), all these brackets are zero. We need, therefore, to check only the mixed brackets between coordinates and momenta.

Because we’re dealing with 3-d vector equations, there are 3 components to each vector and to be thorough, we need to calculate all possible brackets between all pairs of components. However, if we do the ${x}$ component of each, it should be obvious that the ${y}$ and ${z}$ components behave in the same way.

First, consider

$\displaystyle \left\{ r_{x},p_{x}\right\} =\sum_{i}\left(\frac{\partial r_{x}}{\partial q_{i}}\frac{\partial p_{x}}{\partial p_{i}}-\frac{\partial r_{x}}{\partial p_{i}}\frac{\partial p_{x}}{\partial q_{i}}\right) \ \ \ \ \ (7)$

In the RHS, the term ${q_{i}}$ stands for all 6 components of the original position vectors, that is ${q_{i}=\left\{ r_{1x},r_{1y},\ldots,r_{2z}\right\} }$ and the term ${p_{i}}$ in the denominators refers to all 6 components of the original momentum vectors. The ${p_{x}}$ in the numerators refers to the ${x}$ component of ${\mathbf{p}}$ in 6. Hopefully this won’t cause too much confusion.

The second term on the RHS is zero because it involves derivatives of coordinates with respect to momenta (and vice versa). In the first term, ${r_{x}}$ depends only on the ${x}$ components of ${\mathbf{r}_{1}}$ and ${\mathbf{r}_{2}}$, and ${p_{x}}$ depends only on the ${x}$ components of ${\mathbf{p}_{1}}$ and ${\mathbf{p}_{2}}$, so we have

 $\displaystyle \left\{ r_{x},p_{x}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial r_{x}}{\partial r_{1x}}\frac{\partial p_{x}}{\partial p_{1x}}+\frac{\partial r_{x}}{\partial r_{2x}}\frac{\partial p_{x}}{\partial p_{2x}}\ \ \ \ \ (8)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(1\right)\frac{m_{2}}{M}+\left(-1\right)\left(-\frac{m_{1}}{M}\right)\ \ \ \ \ (9)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m_{1}+m_{2}}{M}\ \ \ \ \ (10)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (11)$

The same result is obtained for the ${y}$ and ${z}$ components. If we look at mixing two different components, we have, for example

$\displaystyle \left\{ r_{x},p_{y}\right\} =\frac{\partial r_{x}}{\partial r_{1x}}\frac{\partial p_{y}}{\partial p_{1x}}+\frac{\partial r_{x}}{\partial r_{2x}}\frac{\partial p_{y}}{\partial p_{2x}}+\frac{\partial r_{x}}{\partial r_{1y}}\frac{\partial p_{y}}{\partial p_{1y}}+\frac{\partial r_{x}}{\partial r_{2y}}\frac{\partial p_{y}}{\partial p_{2y}}=0 \ \ \ \ \ (12)$

This is zero because each term in the sum contains a derivative of an ${x}$ component with respect to a ${y}$ component (or vice versa), all of which are zero.

For the centre of mass components, we have

 $\displaystyle \left\{ r_{CMx},p_{CMx}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial r_{CMx}}{\partial r_{1x}}\frac{\partial p_{CMx}}{\partial p_{1x}}+\frac{\partial r_{CMx}}{\partial r_{2x}}\frac{\partial p_{CMx}}{\partial p_{2x}}\ \ \ \ \ (13)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{m_{1}}{M}\left(1\right)+\frac{m_{2}}{M}\left(1\right)\ \ \ \ \ (14)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1\ \ \ \ \ (15)$ $\displaystyle \left\{ r_{CMx},p_{CMy}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial r_{CMx}}{\partial r_{1x}}\frac{\partial p_{CMy}}{\partial p_{1x}}+\frac{\partial r_{CMx}}{\partial r_{2x}}\frac{\partial p_{CMy}}{\partial p_{2x}}+\frac{\partial r_{CMx}}{\partial r_{1y}}\frac{\partial p_{CMy}}{\partial p_{1y}}+\frac{\partial r_{CMx}}{\partial r_{2y}}\frac{\partial p_{CMy}}{\partial p_{2y}}\ \ \ \ \ (16)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 0 \ \ \ \ \ (17)$

where the last bracket is zero for the same reason as ${\left\{ r_{x},p_{y}\right\} }$: we’re mixing ${x}$ and ${y}$ in the derivatives. Again, it should be obvious that the brackets for the other combinations of ${x}$, ${y}$ and ${z}$ components work out the same way.
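For completeness, the mixed brackets between the relative and centre of mass variables should also vanish. A short check for the ${x}$ components, using the derivatives that follow from 1, 2, 4 and 6:

```latex
\begin{aligned}
\left\{ r_{x},p_{CMx}\right\}
  &= \frac{\partial r_{x}}{\partial r_{1x}}\frac{\partial p_{CMx}}{\partial p_{1x}}
   + \frac{\partial r_{x}}{\partial r_{2x}}\frac{\partial p_{CMx}}{\partial p_{2x}}
   = \left(1\right)\left(1\right)+\left(-1\right)\left(1\right) = 0 \\
\left\{ r_{CMx},p_{x}\right\}
  &= \frac{\partial r_{CMx}}{\partial r_{1x}}\frac{\partial p_{x}}{\partial p_{1x}}
   + \frac{\partial r_{CMx}}{\partial r_{2x}}\frac{\partial p_{x}}{\partial p_{2x}}
   = \frac{m_{1}}{M}\frac{m_{2}}{M}+\frac{m_{2}}{M}\left(-\frac{m_{1}}{M}\right) = 0
\end{aligned}
```

So the relative and centre of mass pairs don’t mix, as required for the full set of new variables to be canonical.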

Example 2 A bizarre transformation of variables in one dimension is given by

 $\displaystyle \overline{q}$ $\displaystyle =$ $\displaystyle \ln\frac{\sin p}{q}=\ln\sin p-\ln q\ \ \ \ \ (18)$ $\displaystyle \overline{p}$ $\displaystyle =$ $\displaystyle q\cot p \ \ \ \ \ (19)$

To show this is canonical, we need calculate only ${\left\{ \overline{q},\overline{p}\right\} }$ (since the Poisson bracket of a function with itself is always zero, we have ${\left\{ \overline{q},\overline{q}\right\} =\left\{ \overline{p},\overline{p}\right\} =0}$). We need one rather obscure derivative of a trig function.

 $\displaystyle \frac{d}{dp}\cot p$ $\displaystyle =$ $\displaystyle \frac{d}{dp}\left(\frac{\cos p}{\sin p}\right)\ \ \ \ \ (20)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{-\sin^{2}p-\cos^{2}p}{\sin^{2}p}\ \ \ \ \ (21)$ $\displaystyle$ $\displaystyle =$ $\displaystyle -1-\cot^{2}p \ \ \ \ \ (22)$

We get

 $\displaystyle \left\{ \overline{q},\overline{p}\right\}$ $\displaystyle =$ $\displaystyle \frac{\partial\overline{q}}{\partial q}\frac{\partial\overline{p}}{\partial p}-\frac{\partial\overline{q}}{\partial p}\frac{\partial\overline{p}}{\partial q}\ \ \ \ \ (23)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \left(-\frac{1}{q}\right)\left(q\left(-1-\cot^{2}p\right)\right)-\frac{\cos p}{\sin p}\cot p\ \ \ \ \ (24)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1+\cot^{2}p-\cot^{2}p\ \ \ \ \ (25)$ $\displaystyle$ $\displaystyle =$ $\displaystyle 1 \ \ \ \ \ (26)$

Thus the transformation is canonical.
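Since the algebra relies on a cancellation, a numerical spot check is reassuring. This is a sketch only: the helper names and the sample point are arbitrary choices of mine, and the point must satisfy ${\frac{\sin p}{q}>0}$ so that the logarithm is defined.

```python
import math

def q_bar(q, p):
    # Equation 18: q_bar = ln(sin p / q)
    return math.log(math.sin(p) / q)

def p_bar(q, p):
    # Equation 19: p_bar = q cot p
    return q / math.tan(p)

def poisson_bracket(f, g, q, p, h=1e-5):
    """Central-difference estimate of {f, g} at (q, p)."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return df_dq * dg_dp - df_dp * dg_dq

# Arbitrary sample point with sin(p)/q > 0.
bracket = poisson_bracket(q_bar, p_bar, 2.0, 1.0)
print(bracket)
```

The printed value should be 1 to within the finite-difference error, whichever admissible point is chosen.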

Example 3 Finally, we return to the point transformation, which is given in general by

 $\displaystyle \overline{q}_{i}$ $\displaystyle =$ $\displaystyle \overline{q}_{i}\left(q_{1},\ldots,q_{n}\right)\ \ \ \ \ (27)$ $\displaystyle \overline{p}_{i}$ $\displaystyle =$ $\displaystyle \sum_{j}\frac{\partial q_{j}}{\partial\overline{q}_{i}}p_{j} \ \ \ \ \ (28)$

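In this case, the coordinate transformation to ${\overline{q}}$ is completely arbitrary, but the momentum transformation must follow the formula given. Each derivative ${\frac{\partial q_{j}}{\partial\overline{q}_{i}}}$ in the formula for ${\overline{p}_{i}}$ is taken with the other ${\overline{q}}$s held constant. As in the earlier examples, the new coordinates depend only on the old coordinates, so ${\left\{ \overline{q}_{i},\overline{q}_{j}\right\} =0}$. The new momenta are linear in the old momenta, but their coefficients ${\frac{\partial q_{j}}{\partial\overline{q}_{i}}}$ depend on the old coordinates, so the momentum bracket isn’t automatically zero; it still vanishes because the second derivatives of ${q}$ with respect to the ${\overline{q}}$s commute. Thus the Poisson brackets satisfy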

$\displaystyle \left\{ \overline{q}_{i},\overline{q}_{j}\right\} =\left\{ \overline{p}_{i},\overline{p}_{j}\right\} =0 \ \ \ \ \ (29)$
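The momentum bracket can be checked explicitly. Since the coefficients ${\frac{\partial q_{l}}{\partial\overline{q}_{i}}}$ in 28 are functions of the coordinates, a sketch of the calculation (using the summation convention and the chain rule in the form ${\frac{\partial q_{k}}{\partial\overline{q}_{j}}\frac{\partial}{\partial q_{k}}=\frac{\partial}{\partial\overline{q}_{j}}}$) is:

```latex
\begin{aligned}
\left\{ \overline{p}_{i},\overline{p}_{j}\right\}
  &= \frac{\partial\overline{p}_{i}}{\partial q_{k}}\frac{\partial\overline{p}_{j}}{\partial p_{k}}
   - \frac{\partial\overline{p}_{i}}{\partial p_{k}}\frac{\partial\overline{p}_{j}}{\partial q_{k}} \\
  &= p_{l}\left[\frac{\partial q_{k}}{\partial\overline{q}_{j}}\frac{\partial}{\partial q_{k}}
       \left(\frac{\partial q_{l}}{\partial\overline{q}_{i}}\right)
     - \frac{\partial q_{k}}{\partial\overline{q}_{i}}\frac{\partial}{\partial q_{k}}
       \left(\frac{\partial q_{l}}{\partial\overline{q}_{j}}\right)\right] \\
  &= p_{l}\left(\frac{\partial^{2}q_{l}}{\partial\overline{q}_{j}\partial\overline{q}_{i}}
     - \frac{\partial^{2}q_{l}}{\partial\overline{q}_{i}\partial\overline{q}_{j}}\right) \\
  &= 0
\end{aligned}
```

The last step uses only the equality of mixed second derivatives.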

For the mixed brackets, we have

 $\displaystyle \left\{ \overline{q}_{i},\overline{p}_{j}\right\}$ $\displaystyle =$ $\displaystyle \sum_{k}\left(\frac{\partial\overline{q}_{i}}{\partial q_{k}}\frac{\partial\overline{p}_{j}}{\partial p_{k}}-\frac{\partial\overline{q}_{i}}{\partial p_{k}}\frac{\partial\overline{p}_{j}}{\partial q_{k}}\right)\ \ \ \ \ (30)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \sum_{k}\frac{\partial\overline{q}_{i}}{\partial q_{k}}\frac{\partial q_{k}}{\partial\overline{q}_{j}}\ \ \ \ \ (31)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \frac{\partial\overline{q}_{i}}{\partial\overline{q}_{j}}\ \ \ \ \ (32)$ $\displaystyle$ $\displaystyle =$ $\displaystyle \delta_{ij} \ \ \ \ \ (33)$

The second term in the first line is zero because ${\overline{q}_{i}}$ doesn’t depend on the momenta. We used 28 to calculate the derivative ${\frac{\partial\overline{p}_{j}}{\partial p_{k}}}$ and get the second line, and then noticed that the sum is the chain rule expansion of the derivative in the third line. Since ${\overline{q}_{i}}$ and ${\overline{q}_{j}}$ are independent variables, the result is that given in the last line. Thus a point transformation is a canonical transformation.
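As a concrete instance, take the change from rectangular to polar coordinates in two dimensions. Applying 28 with ${x=r\cos\theta}$, ${y=r\sin\theta}$ gives ${p_{r}=\left(xp_{x}+yp_{y}\right)/r}$ and ${p_{\theta}=xp_{y}-yp_{x}}$. The sketch below checks all six brackets by central differences; the helper names and the sample point are my own choices, not part of the exercise.

```python
import math

def poisson_bracket(f, g, state, h=1e-5):
    """Central-difference {f, g} for state = (x, y, px, py)."""
    def partial(func, idx):
        up, down = list(state), list(state)
        up[idx] += h
        down[idx] -= h
        return (func(*up) - func(*down)) / (2 * h)
    # Indices 0, 1 are the coordinates; 2, 3 are their conjugate momenta.
    return sum(
        partial(f, k) * partial(g, k + 2) - partial(f, k + 2) * partial(g, k)
        for k in range(2)
    )

# New coordinates (functions of the old coordinates only).
def r(x, y, px, py):
    return math.hypot(x, y)

def theta(x, y, px, py):
    return math.atan2(y, x)

# New momenta from equation 28.
def p_r(x, y, px, py):
    return (x * px + y * py) / math.hypot(x, y)

def p_theta(x, y, px, py):
    return x * py - y * px

state = (1.2, 0.5, -0.3, 0.8)  # arbitrary point away from the origin
brackets = {
    "{r, p_r}": poisson_bracket(r, p_r, state),
    "{theta, p_theta}": poisson_bracket(theta, p_theta, state),
    "{r, p_theta}": poisson_bracket(r, p_theta, state),
    "{theta, p_r}": poisson_bracket(theta, p_r, state),
    "{r, theta}": poisson_bracket(r, theta, state),
    "{p_r, p_theta}": poisson_bracket(p_r, p_theta, state),
}
for name, value in brackets.items():
    print(name, value)
```

The first two brackets should come out as 1 and the remaining four as 0, up to finite-difference error.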