# Hamiltonian formalism and Legendre transformations

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.5; Exercise 2.5.1.

The Lagrangian formulation of classical mechanics is one of two principal formalisms used to obtain equations of motion for a system. The other method is the Hamiltonian formalism. The main difference between the two methods is that the Lagrangian treats the generalized coordinates ${q_{i}}$ and their respective velocities ${\dot{q}_{i}}$ as the independent variables, while in the Hamiltonian formalism, the coordinates and their associated momenta are the independent variables. The momentum ${p_{i}}$ corresponding to a coordinate ${q_{i}}$ is defined by

$\displaystyle p_{i}\equiv\frac{\partial L}{\partial\dot{q}_{i}} \ \ \ \ \ (1)$

The Lagrangian is replaced by a function ${H\left(q,p\right)}$ (where we’re using unsubscripted variables ${q}$ and ${p}$ to represent the sets of coordinates and momenta) with the property that

$\displaystyle \dot{q}_{i}=\frac{\partial H}{\partial p_{i}} \ \ \ \ \ (2)$

The method for transforming from the Lagrangian picture to the Hamiltonian picture is known as a Legendre transformation and works as follows. Suppose we start with a function ${f\left(x_{1},x_{2},\ldots,x_{n}\right)}$ (here, the ${x_{i}}$ can be any independent variables; we’re not considering coordinates explicitly yet) and we want to replace a subset ${\left\{ x_{i},i=1\ldots,j\right\} }$ with different variables ${u_{i}}$, where

$\displaystyle u_{i}\equiv\frac{\partial f}{\partial x_{i}} \ \ \ \ \ (3)$

We now construct the function

$\displaystyle g\left(u_{1},\ldots,u_{j},x_{j+1},\ldots,x_{n}\right)\equiv\sum_{i=1}^{j}u_{i}x_{i}-f\left(x_{1},\ldots,x_{n}\right) \ \ \ \ \ (4)$

We’re assuming that all the ${x_{i}}$ in the set ${\left\{ x_{i},i=1\ldots,j\right\} }$ can be written as functions of ${\left\{ u_{1},\ldots,u_{j},x_{j+1},\ldots,x_{n}\right\} }$. In other words, when written out in full, 4 contains only the variables ${\left\{ u_{1},\ldots,u_{j},x_{j+1},\ldots,x_{n}\right\} }$. We can now take the derivative:

$\displaystyle \frac{\partial g}{\partial u_{i}}=x_{i}+\sum_{k=1}^{j}\left[u_{k}\frac{\partial x_{k}}{\partial u_{i}}-\frac{\partial f}{\partial x_{k}}\frac{\partial x_{k}}{\partial u_{i}}\right] \ \ \ \ \ (5)$

$\displaystyle \quad=x_{i}+\sum_{k=1}^{j}\left[u_{k}\frac{\partial x_{k}}{\partial u_{i}}-u_{k}\frac{\partial x_{k}}{\partial u_{i}}\right] \ \ \ \ \ (6)$

$\displaystyle \quad=x_{i} \ \ \ \ \ (7)$

where the second line follows from the definition 3.
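As a quick numerical sanity check of the Legendre transformation (not something from Shankar), here is a minimal Python sketch. The sample function `f(x, y) = y*x**2/2`, the helper names and the test point are all assumptions chosen for illustration; `x` is the variable being replaced by `u`, and `y` is a variable that is kept.

```python
def f(x, y):
    # Hypothetical sample function; x is replaced by u, y is kept.
    return 0.5 * y * x**2

def x_of_u(u, y):
    # Invert u = df/dx = y*x (requires y != 0).
    return u / y

def g(u, y):
    # Legendre transform g(u, y) = u*x - f(x, y), with x expressed via (u, y).
    x = x_of_u(u, y)
    return u * x - f(x, y)

def central_diff(func, t, h=1e-6):
    # Simple central finite-difference approximation to a derivative.
    return (func(t + h) - func(t - h)) / (2 * h)

x0, y0 = 1.3, 2.0
u0 = central_diff(lambda x: f(x, y0), x0)      # u = df/dx at (x0, y0)
dg_du = central_diff(lambda u: g(u, y0), u0)   # should reproduce x0
dg_dy = central_diff(lambda y: g(u0, y), y0)   # derivative w.r.t. the kept variable
df_dy = central_diff(lambda y: f(x0, y), y0)

print(dg_du, x0)
print(dg_dy, -df_dy)
```

The second printed pair illustrates that the derivative with respect to a variable that is not replaced simply changes sign, which is the same pattern that shows up for ${\partial H/\partial q_{i}}$ and ${\partial L/\partial q_{i}}$.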

To move from the Lagrangian formalism to the Hamiltonian formalism, the Lagrangian plays the role of ${f}$, the generalized velocities ${\dot{q}_{i}}$ are the variables ${\left\{ x_{i},i=1\ldots,j\right\} }$ to be replaced, and the Hamiltonian is the new function ${g}$. That is, we have

$\displaystyle H\left(q,p\right)=\sum_{i=1}^{n}p_{i}\dot{q}_{i}-L\left(q,\dot{q}\right) \ \ \ \ \ (8)$

There are a total of ${n}$ momenta ${p_{i}}$ and ${n}$ coordinates ${q_{i}}$, for a total of ${2n}$ independent variables. In 8, it is assumed that we can express all the velocities ${\dot{q}_{i}}$ as functions of the ${q_{i}}$ and ${p_{i}}$. With these definitions, we can see by following through the derivation of 7 that 2 is satisfied.

We can get another equation by considering the derivative

$\displaystyle \frac{\partial H}{\partial q_{i}}=\sum_{j=1}^{n}p_{j}\frac{\partial\dot{q}_{j}}{\partial q_{i}}-\frac{\partial L}{\partial q_{i}}-\sum_{j=1}^{n}\frac{\partial L}{\partial\dot{q}_{j}}\frac{\partial\dot{q}_{j}}{\partial q_{i}} \ \ \ \ \ (9)$

$\displaystyle \quad=\sum_{j=1}^{n}\left[p_{j}\frac{\partial\dot{q}_{j}}{\partial q_{i}}-\frac{\partial L}{\partial\dot{q}_{j}}\frac{\partial\dot{q}_{j}}{\partial q_{i}}\right]-\frac{\partial L}{\partial q_{i}} \ \ \ \ \ (10)$

$\displaystyle \quad=\sum_{j=1}^{n}\left[p_{j}\frac{\partial\dot{q}_{j}}{\partial q_{i}}-p_{j}\frac{\partial\dot{q}_{j}}{\partial q_{i}}\right]-\frac{\partial L}{\partial q_{i}} \ \ \ \ \ (11)$

$\displaystyle \quad=-\frac{\partial L}{\partial q_{i}} \ \ \ \ \ (12)$

$\displaystyle \quad=-\frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}} \ \ \ \ \ (13)$

$\displaystyle \quad=-\dot{p}_{i} \ \ \ \ \ (14)$

In the third line, we used 1, in the fifth line we used the Euler-Lagrange equation

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}}-\frac{\partial L}{\partial q_{i}}=0 \ \ \ \ \ (15)$

and in the last line, we used 1 again. We thus get Hamilton’s canonical equations:

$\displaystyle \frac{\partial H}{\partial p_{i}}=\dot{q}_{i} \ \ \ \ \ (16)$

$\displaystyle -\frac{\partial H}{\partial q_{i}}=\dot{p}_{i} \ \ \ \ \ (17)$
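To see the canonical equations in action, here is a minimal Python sketch that integrates them for a one-dimensional harmonic oscillator with ${H=p^{2}/2m+kx^{2}/2}$. The parameter values, the function names and the choice of a leapfrog (kick-drift-kick) integrator are my own assumptions for illustration; any reasonable integrator would do.

```python
import math

m, k = 1.0, 4.0                 # assumed mass and spring constant
omega = math.sqrt(k / m)

def H(x, p):
    # Hamiltonian of the oscillator: kinetic plus potential energy.
    return p * p / (2 * m) + 0.5 * k * x * x

def step(x, p, dt):
    # One leapfrog step of Hamilton's equations:
    #   dx/dt =  dH/dp = p/m
    #   dp/dt = -dH/dx = -k x
    p -= 0.5 * dt * k * x
    x += dt * p / m
    p -= 0.5 * dt * k * x
    return x, p

x, p = 1.0, 0.0                 # released from rest at x = 1
dt, n_steps = 1e-3, 5000
E0 = H(x, p)
for _ in range(n_steps):
    x, p = step(x, p, dt)

t = dt * n_steps
x_exact = math.cos(omega * t)   # analytic solution for these initial conditions
print(x, x_exact, H(x, p) - E0)
```

The numerical position should track ${\cos\omega t}$ closely, and the value of ${H}$ should stay very nearly constant along the trajectory.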

[As an aside at this point, I was (and still am) unsure exactly what the term ‘canonical’ means in this, or in almost any other, context. Google is not very helpful in this respect, as it appears that nobody else really knows where the term came from. According to Wikipedia, the term ‘canonical’ is used to describe equations in several areas of mathematics, physics and even computer science, but ultimately the term appears to originate in religion, as in ‘canon law’, which is a system of laws created by the Catholic church. Presumably the term in physics is used to describe some equation or principle which is widely applicable and general. Any other thoughts are welcome in the comments.]

In cases where the potential energy doesn’t depend on velocity, the Lagrangian is ${T-V}$, where ${T}$ is the kinetic energy. The Hamiltonian (as you’ve probably guessed) can be interpreted as the total energy of such a system, as we can see as follows.

In rectangular coordinates, each mass ${m_{i}}$ has a kinetic energy ${T_{i}=\frac{1}{2}m_{i}\dot{x}_{i}^{2}}$ (this is the one-dimensional form; to extend to 3 dimensions, we write ${T_{i}=\frac{1}{2}m_{i}\left(\dot{x}_{i}^{2}+\dot{y}_{i}^{2}+\dot{z}_{i}^{2}\right)}$ and the same argument follows). Since ${V}$ doesn't depend on the velocities, the momentum is

$\displaystyle p_{i}=\frac{\partial L}{\partial\dot{x}_{i}}=\frac{\partial T}{\partial\dot{x}_{i}}=m_{i}\dot{x}_{i} \ \ \ \ \ (18)$

Thus the first term in 8 is

$\displaystyle \sum_{i=1}^{n}p_{i}\dot{q}_{i}=\sum_{i=1}^{n}m_{i}\dot{x}_{i}^{2}=2T \ \ \ \ \ (19)$

and the Hamiltonian is

$\displaystyle H=2T-L=2T-T+V=T+V \ \ \ \ \ (20)$

Now consider a more general kinetic energy defined as

$\displaystyle T=\sum_{i}\sum_{j}T_{ij}\left(q\right)\dot{q}_{i}\dot{q}_{j} \ \ \ \ \ (21)$

That is, the coefficients ${T_{ij}}$ form a matrix that depends on the positions of the various masses. We have

$\displaystyle p_{k}=\frac{\partial L}{\partial\dot{q}_{k}}=\frac{\partial T}{\partial\dot{q}_{k}} \ \ \ \ \ (22)$

$\displaystyle \quad=\sum_{i}\sum_{j}T_{ij}\left(q\right)\frac{\partial\dot{q}_{i}}{\partial\dot{q}_{k}}\dot{q}_{j}+\sum_{i}\sum_{j}T_{ij}\left(q\right)\dot{q}_{i}\frac{\partial\dot{q}_{j}}{\partial\dot{q}_{k}} \ \ \ \ \ (23)$

$\displaystyle \quad=\sum_{i}\sum_{j}T_{ij}\left(q\right)\delta_{ik}\dot{q}_{j}+\sum_{i}\sum_{j}T_{ij}\left(q\right)\dot{q}_{i}\delta_{jk} \ \ \ \ \ (24)$

$\displaystyle \quad=\sum_{j}T_{kj}\left(q\right)\dot{q}_{j}+\sum_{i}T_{ik}\left(q\right)\dot{q}_{i} \ \ \ \ \ (25)$

$\displaystyle \quad=\sum_{j}\left(T_{kj}+T_{jk}\right)\dot{q}_{j} \ \ \ \ \ (26)$

The first term in 8 now becomes

$\displaystyle \sum_{k}p_{k}\dot{q}_{k}=\sum_{k}\sum_{j}\left(T_{kj}+T_{jk}\right)\dot{q}_{j}\dot{q}_{k} \ \ \ \ \ (27)$

$\displaystyle \quad=2T \ \ \ \ \ (28)$

where the last line follows because ${\dot{q}_{j}\dot{q}_{k}}$ is symmetric under the exchange of ${j}$ and ${k}$, so relabelling the summation indices shows that the ${T_{kj}}$ and ${T_{jk}}$ terms each sum to ${T}$.
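The result ${\sum_{k}p_{k}\dot{q}_{k}=2T}$ doesn't require ${T_{ij}}$ to be symmetric, which is easy to confirm numerically. The following is a small Python sketch with a randomly generated (hypothetical) coefficient matrix and velocities; the names `T`, `qdot` and `kinetic` are just illustrative choices.

```python
import random

random.seed(0)
n = 3

# Hypothetical, deliberately non-symmetric coefficients T_ij and velocities.
T = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
qdot = [random.uniform(-1, 1) for _ in range(n)]

def kinetic(T, qdot):
    # T = sum_ij T_ij qdot_i qdot_j, as in (21).
    return sum(T[i][j] * qdot[i] * qdot[j]
               for i in range(len(qdot)) for j in range(len(qdot)))

# Momenta from (26): p_k = sum_j (T_kj + T_jk) qdot_j.
p = [sum((T[k][j] + T[j][k]) * qdot[j] for j in range(n)) for k in range(n)]

p_dot_qdot = sum(p[k] * qdot[k] for k in range(n))
print(p_dot_qdot, 2 * kinetic(T, qdot))
```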

# Lagrangian for the two-body problem

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.3; Exercise 2.3.1.

A fundamental problem in classical physics is the two-body problem, in which two masses interact via a potential ${V\left(\mathbf{r}_{1}-\mathbf{r}_{2}\right)}$ that depends only on the relative positions of the two masses. In such a case, the Lagrangian can be decoupled so that the problem gets reduced to a one-body problem.

The Euler-Lagrange equations are

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}}-\frac{\partial L}{\partial q_{i}}=0 \ \ \ \ \ (1)$

where ${q_{i}}$ and ${\dot{q_{i}}}$ are the generalized coordinates and velocities, respectively. For systems where the potential energy ${V\left(q_{i}\right)}$ is independent of the velocities ${\dot{q}_{i}}$, the Lagrangian can be written as

$\displaystyle L=T-V \ \ \ \ \ (2)$

where ${T}$ is the kinetic energy. In terms of the absolute positions and velocities, we have

$\displaystyle L=\frac{1}{2}m_{1}\left|\dot{\mathbf{r}}_{1}\right|^{2}+\frac{1}{2}m_{2}\left|\dot{\mathbf{r}}_{2}\right|^{2}-V\left(\mathbf{r}_{1}-\mathbf{r}_{2}\right) \ \ \ \ \ (3)$

To decouple this equation, we define two new position vectors:

$\displaystyle \mathbf{r}\equiv\mathbf{r}_{1}-\mathbf{r}_{2} \ \ \ \ \ (4)$

$\displaystyle \mathbf{r}_{CM}\equiv\frac{m_{1}\mathbf{r}_{1}+m_{2}\mathbf{r}_{2}}{m_{1}+m_{2}} \ \ \ \ \ (5)$

Here ${\mathbf{r}}$ is the relative position, and ${\mathbf{r}_{CM}}$ is the position of the centre of mass.

We can invert these equations to get

$\displaystyle \mathbf{r}_{1}=\mathbf{r}+\mathbf{r}_{2} \ \ \ \ \ (6)$

$\displaystyle \left(m_{1}+m_{2}\right)\mathbf{r}_{CM}=m_{1}\mathbf{r}+\left(m_{1}+m_{2}\right)\mathbf{r}_{2} \ \ \ \ \ (7)$

$\displaystyle \mathbf{r}_{2}=\mathbf{r}_{CM}-\frac{m_{1}}{m_{1}+m_{2}}\mathbf{r} \ \ \ \ \ (8)$

$\displaystyle \mathbf{r}_{1}=\mathbf{r}_{CM}+\frac{m_{2}}{m_{1}+m_{2}}\mathbf{r} \ \ \ \ \ (9)$

To decouple the Lagrangian, we insert these last two equations into 3.

$\displaystyle m_{1}\left|\dot{\mathbf{r}}_{1}\right|^{2}=m_{1}\left[\dot{\mathbf{r}}_{CM}+\frac{m_{2}}{m_{1}+m_{2}}\dot{\mathbf{r}}\right]\cdot\left[\dot{\mathbf{r}}_{CM}+\frac{m_{2}}{m_{1}+m_{2}}\dot{\mathbf{r}}\right] \ \ \ \ \ (10)$

$\displaystyle \quad=m_{1}\left|\dot{\mathbf{r}}_{CM}\right|^{2}+2\frac{m_{1}m_{2}}{m_{1}+m_{2}}\dot{\mathbf{r}}_{CM}\cdot\dot{\mathbf{r}}+m_{1}\left(\frac{m_{2}}{m_{1}+m_{2}}\right)^{2}\left|\dot{\mathbf{r}}\right|^{2} \ \ \ \ \ (11)$

$\displaystyle m_{2}\left|\dot{\mathbf{r}}_{2}\right|^{2}=m_{2}\left[\dot{\mathbf{r}}_{CM}-\frac{m_{1}}{m_{1}+m_{2}}\dot{\mathbf{r}}\right]\cdot\left[\dot{\mathbf{r}}_{CM}-\frac{m_{1}}{m_{1}+m_{2}}\dot{\mathbf{r}}\right] \ \ \ \ \ (12)$

$\displaystyle \quad=m_{2}\left|\dot{\mathbf{r}}_{CM}\right|^{2}-2\frac{m_{1}m_{2}}{m_{1}+m_{2}}\dot{\mathbf{r}}_{CM}\cdot\dot{\mathbf{r}}+m_{2}\left(\frac{m_{1}}{m_{1}+m_{2}}\right)^{2}\left|\dot{\mathbf{r}}\right|^{2} \ \ \ \ \ (13)$

$\displaystyle \frac{1}{2}m_{1}\left|\dot{\mathbf{r}}_{1}\right|^{2}+\frac{1}{2}m_{2}\left|\dot{\mathbf{r}}_{2}\right|^{2}=\frac{1}{2}\left(m_{1}+m_{2}\right)\left|\dot{\mathbf{r}}_{CM}\right|^{2}+\frac{1}{2}\frac{m_{1}m_{2}^{2}+m_{2}m_{1}^{2}}{\left(m_{1}+m_{2}\right)^{2}}\left|\dot{\mathbf{r}}\right|^{2} \ \ \ \ \ (14)$

$\displaystyle \quad=\frac{1}{2}\left(m_{1}+m_{2}\right)\left|\dot{\mathbf{r}}_{CM}\right|^{2}+\frac{1}{2}\frac{m_{1}m_{2}}{m_{1}+m_{2}}\left|\dot{\mathbf{r}}\right|^{2} \ \ \ \ \ (15)$

The Lagrangian 3 thus becomes

$\displaystyle L=\frac{1}{2}\left(m_{1}+m_{2}\right)\left|\dot{\mathbf{r}}_{CM}\right|^{2}+\frac{1}{2}\frac{m_{1}m_{2}}{m_{1}+m_{2}}\left|\dot{\mathbf{r}}\right|^{2}-V\left(\mathbf{r}\right) \ \ \ \ \ (16)$

$\displaystyle \quad\equiv L_{CM}+L_{r} \ \ \ \ \ (17)$

with

$\displaystyle L_{CM}\equiv\frac{1}{2}\left(m_{1}+m_{2}\right)\left|\dot{\mathbf{r}}_{CM}\right|^{2} \ \ \ \ \ (18)$

$\displaystyle L_{r}\equiv\frac{1}{2}\frac{m_{1}m_{2}}{m_{1}+m_{2}}\left|\dot{\mathbf{r}}\right|^{2}-V\left(\mathbf{r}\right) \ \ \ \ \ (19)$

Thus ${L}$ decouples into two Lagrangians, one of which depends only on ${\dot{\mathbf{r}}_{CM}}$ and the other of which depends only on ${\mathbf{r}}$ and ${\dot{\mathbf{r}}}$. The absence of ${\mathbf{r}_{CM}}$ means that, from 1

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{r}_{i,CM}}=\frac{d}{dt}\frac{\partial L_{CM}}{\partial\dot{r}_{i,CM}}=\left(m_{1}+m_{2}\right)\frac{d\dot{r}_{i,CM}}{dt}=0 \ \ \ \ \ (20)$

$\displaystyle \dot{r}_{i,CM}=\mbox{constant} \ \ \ \ \ (21)$

which is separately true for each component of ${\dot{\mathbf{r}}_{CM}}$, which shows that the velocity of the centre of mass is a constant, as we’d expect for an isolated two-body system with no external force.

From the other Lagrangian, we get

$\displaystyle \frac{m_{1}m_{2}}{m_{1}+m_{2}}\ddot{\mathbf{r}}=-\nabla V\left(\mathbf{r}\right) \ \ \ \ \ (22)$

which is the equation of motion of a single particle of mass ${\frac{m_{1}m_{2}}{m_{1}+m_{2}}}$, called the reduced mass. Viewed from the centre of mass frame, where ${\dot{\mathbf{r}}_{CM}=0}$, ${\mathbf{r}}$ becomes the absolute position of the reduced mass. We can transform the result back to the ‘absolute’ frame by using 8 and 9.
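The splitting of the kinetic energy into a centre of mass part and a reduced mass part can be checked numerically for arbitrary velocities. Here is a minimal Python sketch; the masses and the random velocities are made-up test values, and the variable names are my own.

```python
import random

random.seed(1)
m1, m2 = 2.0, 3.0                                  # assumed masses
v1 = [random.uniform(-1, 1) for _ in range(3)]     # velocity of mass 1
v2 = [random.uniform(-1, 1) for _ in range(3)]     # velocity of mass 2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

M = m1 + m2
mu = m1 * m2 / M                                   # reduced mass

# Time derivatives of the definitions r = r1 - r2 and r_CM = (m1 r1 + m2 r2)/M.
v_rel = [a - b for a, b in zip(v1, v2)]
v_cm = [(m1 * a + m2 * b) / M for a, b in zip(v1, v2)]

T_abs = 0.5 * m1 * dot(v1, v1) + 0.5 * m2 * dot(v2, v2)
T_split = 0.5 * M * dot(v_cm, v_cm) + 0.5 * mu * dot(v_rel, v_rel)

# Recover the individual velocities from the centre of mass and relative ones.
v1_back = [c + (m2 / M) * r for c, r in zip(v_cm, v_rel)]
v2_back = [c - (m1 / M) * r for c, r in zip(v_cm, v_rel)]

print(T_abs, T_split)
```

The two kinetic energies should agree to rounding error, and `v1_back`, `v2_back` should reproduce the original velocities.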

# Electromagnetic Lagrangian

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.2.

The Euler-Lagrange equations are

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}}-\frac{\partial L}{\partial q_{i}}=0 \ \ \ \ \ (1)$

where ${q_{i}}$ and ${\dot{q_{i}}}$ are the generalized coordinates and velocities, respectively. For systems where the potential energy ${V\left(q_{i}\right)}$ is independent of the velocities ${\dot{q}_{i}}$, the Lagrangian can be written as

$\displaystyle L=T-V \ \ \ \ \ (2)$

where ${T}$ is the kinetic energy. However, there is one important area in classical physics where the potential does depend on velocity, and that is electromagnetism.

The relation between the electric scalar potential ${\phi}$, the magnetic vector potential ${\mathbf{A}}$ and the electric and magnetic fields ${\mathbf{E}}$ and ${\mathbf{B}}$ is given by Maxwell’s equations in terms of potentials:

$\displaystyle \mathbf{E}=-\nabla\phi-\frac{1}{c}\frac{\partial\mathbf{A}}{\partial t} \ \ \ \ \ (3)$

$\displaystyle \mathbf{B}=\nabla\times\mathbf{A} \ \ \ \ \ (4)$

[These are the forms used by Shankar, which are in Gaussian units. All my earlier posts on electromagnetism are taken from Griffiths’s book, which uses the MKS system of units, so various constants will be different in the two systems.]

The force on a charge ${q}$ due to electric and magnetic fields ${\mathbf{E}}$ and ${\mathbf{B}}$ is given by

$\displaystyle \mathbf{F}=q\left(\mathbf{E}+\frac{\mathbf{v}}{c}\times\mathbf{B}\right) \ \ \ \ \ (5)$

Shankar merely states that the correct force can be derived from 1 if we use the Lagrangian

$\displaystyle L=\frac{1}{2}m\mathbf{v}\cdot\mathbf{v}-q\phi+\frac{q}{c}\mathbf{v}\cdot\mathbf{A} \ \ \ \ \ (6)$

It appears from a bit of googling that this Lagrangian is obtained more or less by trial and error, rather than by some rigorous derivation, so it seems we just need to accept it “because it works”. The velocity ${\mathbf{v}}$ in rectangular coordinates is

$\displaystyle \mathbf{v}=\left[\dot{x}_{1},\dot{x}_{2},\dot{x}_{3}\right] \ \ \ \ \ (7)$

$\displaystyle \mathbf{v}\cdot\mathbf{v}=\sum_{i=1}^{3}\dot{x}_{i}^{2} \ \ \ \ \ (8)$

$\displaystyle \mathbf{v}\cdot\mathbf{A}=\sum_{i=1}^{3}\dot{x}_{i}A_{i} \ \ \ \ \ (9)$

Both ${\phi}$ and ${\mathbf{A}}$ are functions of position (and, in general, of time), so they depend on the ${x_{i}}$ but not on the velocities ${\dot{x}_{i}}$.

Thus from 1, we have

$\displaystyle \frac{d}{dt}\left(m\dot{x}_{i}+\frac{q}{c}A_{i}\right)=-q\frac{\partial\phi}{\partial x_{i}}+\frac{q}{c}\frac{\partial\left(\mathbf{v}\cdot\mathbf{A}\right)}{\partial x_{i}} \ \ \ \ \ (10)$

The three equations represented here can be combined into a single vector equation by noticing that ${\frac{\partial}{\partial x_{i}}}$ are the components of the gradient.

$\displaystyle \frac{d}{dt}\left(m\mathbf{v}+\frac{q}{c}\mathbf{A}\right)=-q\nabla\phi+\frac{q}{c}\nabla\left(\mathbf{v}\cdot\mathbf{A}\right) \ \ \ \ \ (11)$

The LHS contains the total time derivative ${\frac{d\mathbf{A}}{dt}}$ which is composed of two contributions. First, ${\mathbf{A}}$ itself can be time varying, in the sense that if we stayed at the same location, the value of ${\mathbf{A}}$ at that location can vary in time. The second contribution comes from the motion of the charge so that, even if ${\mathbf{A}}$ is constant in time, the charge will perceive a change in ${\mathbf{A}}$ as it moves because ${\mathbf{A}}$ can vary over space. That is, the total derivative of the first component ${A_{1}}$ is

$\displaystyle \frac{dA_{1}}{dt}=\frac{\partial A_{1}}{\partial t}+\sum_{i=1}^{3}\frac{\partial A_{1}}{\partial x_{i}}\frac{dx_{i}}{dt} \ \ \ \ \ (12)$

$\displaystyle \quad=\frac{\partial A_{1}}{\partial t}+\left(\mathbf{v}\cdot\nabla\right)A_{1} \ \ \ \ \ (13)$

The derivative of ${\mathbf{A}}$ can thus be written as

$\displaystyle \frac{d\mathbf{A}}{dt}=\frac{\partial\mathbf{A}}{\partial t}+\left(\mathbf{v}\cdot\nabla\right)\mathbf{A} \ \ \ \ \ (14)$

Plugging this into 11 and rearranging, we get

$\displaystyle \frac{d}{dt}\left(m\mathbf{v}\right)=-q\nabla\phi-\frac{q}{c}\frac{\partial\mathbf{A}}{\partial t}+\frac{q}{c}\left[\nabla\left(\mathbf{v}\cdot\mathbf{A}\right)-\left(\mathbf{v}\cdot\nabla\right)\mathbf{A}\right] \ \ \ \ \ (15)$

$\displaystyle \mathbf{F}=-q\nabla\phi-\frac{q}{c}\frac{\partial\mathbf{A}}{\partial t}+\frac{q}{c}\left(\mathbf{v}\times\left(\nabla\times\mathbf{A}\right)\right) \ \ \ \ \ (16)$

$\displaystyle \mathbf{F}=q\left(\mathbf{E}+\frac{1}{c}\mathbf{v}\times\mathbf{B}\right) \ \ \ \ \ (17)$

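In the second line, we used a standard vector identity, which holds here because ${\mathbf{v}}$ is independent of position, so ${\nabla}$ acts only on ${\mathbf{A}}$: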

$\displaystyle \mathbf{v}\times\left(\nabla\times\mathbf{A}\right)=\nabla\left(\mathbf{v}\cdot\mathbf{A}\right)-\left(\mathbf{v}\cdot\nabla\right)\mathbf{A} \ \ \ \ \ (18)$

Thus the Lagrangian 6 does indeed give the correct force law. The Lagrangian is not of the form ${T-V}$ because the term ${q\phi-\frac{q}{c}\mathbf{v}\cdot\mathbf{A}}$ isn’t a potential energy. In electrostatics, ${q\phi}$ is indeed potential energy, but because the magnetic force always acts perpendicular to the velocity, it does no work, so we can’t interpret ${-\frac{q}{c}\mathbf{v}\cdot\mathbf{A}}$ as some form of ‘magnetic potential energy’. The work done when moving a charge through an electromagnetic field in general depends on the path taken, so is not conservative, and we can’t write the force as the gradient of some potential.
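The vector identity 18 is the one step in this derivation that is easy to get wrong by a sign, so here is a small Python sketch that checks it component by component. The polynomial vector potential `A`, the test point and the velocity are arbitrary choices of mine, the derivatives are approximated by central finite differences, and ${\mathbf{v}}$ is treated as a constant vector, as it is in the derivation above.

```python
def A(x):
    # Hypothetical smooth vector potential, chosen only for testing.
    x1, x2, x3 = x
    return [x2 * x3, x1**2 - x3, x1 * x2 * x3]

def partial(func, x, i, h=1e-5):
    # Central-difference derivative of a vector-valued function w.r.t. x_i.
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(func(xp), func(xm))]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

x0 = [0.3, -1.1, 0.7]           # arbitrary test point
v = [1.0, 2.0, -0.5]            # constant velocity vector

# J[i][j] approximates dA_j/dx_i at x0.
J = [partial(A, x0, i) for i in range(3)]

curl_A = [J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0]]
lhs = cross(v, curl_A)                                   # v x (curl A)

grad_vA = [sum(v[j] * J[i][j] for j in range(3)) for i in range(3)]   # grad(v.A)
v_grad_A = [sum(v[i] * J[i][j] for i in range(3)) for j in range(3)]  # (v.grad)A
rhs = [g - d for g, d in zip(grad_vA, v_grad_A)]

print(lhs)
print(rhs)
```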

# Lagrangian for a spherically symmetric potential energy function

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.1; Exercise 2.1.3.

We now consider a more general example of the Euler-Lagrange equations of motion

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}}-\frac{\partial L}{\partial q_{i}}=0 \ \ \ \ \ (1)$

where ${q_{i}}$ and ${\dot{q_{i}}}$ are the generalized coordinates and velocities, respectively. For systems where the potential energy ${V\left(q_{i}\right)}$ is independent of the velocities ${\dot{q}_{i}}$, the Lagrangian can be written as

$\displaystyle L=T-V \ \ \ \ \ (2)$

where ${T}$ is the kinetic energy.

Suppose we consider a system in three dimensions and use spherical coordinates to represent the position of a particle of mass ${m}$. We’ll restrict ourselves to potential energy functions that depend only on the radial distance ${r}$ from the origin, so that ${V\left(r,\theta,\phi\right)=V\left(r\right)}$. To write down the Lagrangian, we need an expression for the kinetic energy ${T}$.

An infinitesimal line element in spherical coordinates has a length ${ds}$ given by

$\displaystyle ds^{2}=dr^{2}+r^{2}\;d\theta^{2}+r^{2}\sin^{2}\theta\;d\phi^{2} \ \ \ \ \ (3)$

The square of the velocity is then given by dividing this expression through by ${dt^{2}}$, and using a dot above a symbol to indicate the derivative with respect to time ${t}$. We have

$\displaystyle v^{2}=\left(\frac{ds}{dt}\right)^{2}=\dot{r}^{2}+r^{2}\dot{\theta}^{2}+r^{2}\sin^{2}\theta\dot{\phi}^{2} \ \ \ \ \ (4)$

The Lagrangian is then

$\displaystyle L=T-V \ \ \ \ \ (5)$

$\displaystyle \quad=\frac{1}{2}m\left[\dot{r}^{2}+r^{2}\dot{\theta}^{2}+r^{2}\sin^{2}\theta\dot{\phi}^{2}\right]-V\left(r\right) \ \ \ \ \ (6)$

We now get three equations of motion by applying 1 to each coordinate in turn. For ${r}$:

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{r}}=\frac{\partial L}{\partial r} \ \ \ \ \ (7)$

$\displaystyle \ddot{r}=r\dot{\theta}^{2}+r\sin^{2}\theta\dot{\phi}^{2}-\frac{1}{m}\frac{dV}{dr} \ \ \ \ \ (8)$

For ${\theta}$:

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{\theta}}=\frac{\partial L}{\partial\theta} \ \ \ \ \ (9)$

$\displaystyle \frac{d}{dt}\left(mr^{2}\dot{\theta}\right)=mr^{2}\sin\theta\cos\theta\dot{\phi}^{2} \ \ \ \ \ (10)$

$\displaystyle 2r\dot{r}\dot{\theta}+r^{2}\ddot{\theta}=r^{2}\sin\theta\cos\theta\dot{\phi}^{2} \ \ \ \ \ (11)$

$\displaystyle \ddot{\theta}=-\frac{2}{r}\dot{r}\dot{\theta}+\sin\theta\cos\theta\dot{\phi}^{2} \ \ \ \ \ (12)$

For ${\phi}$:

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{\phi}}=\frac{\partial L}{\partial\phi} \ \ \ \ \ (13)$

$\displaystyle \frac{d}{dt}\left(mr^{2}\sin^{2}\theta\;\dot{\phi}\right)=0 \ \ \ \ \ (14)$

$\displaystyle 2r\dot{r}\sin^{2}\theta\dot{\phi}+2r^{2}\sin\theta\cos\theta\dot{\theta}\dot{\phi}+r^{2}\sin^{2}\theta\ddot{\phi}=0 \ \ \ \ \ (15)$

$\displaystyle \ddot{\phi}=-\frac{2}{r}\dot{r}\dot{\phi}-2\cot\theta\dot{\theta}\dot{\phi} \ \ \ \ \ (16)$

Although the only equation in which the potential energy ${V}$ has a direct effect is the one for ${r}$, these three equations constitute a system of non-linear coupled differential equations so in the general case, they can be difficult to solve.

One important special case is that of a path that lies in the plane ${\theta=\frac{\pi}{2}}$, such as the orbit of a planet around the sun. In that case ${\dot{\theta}=0}$, ${\sin\theta=1}$ and ${\cos\theta=0}$, so the equations simplify to

$\displaystyle \ddot{r}=r\dot{\phi}^{2}-\frac{1}{m}\frac{dV}{dr} \ \ \ \ \ (17)$

$\displaystyle \ddot{\theta}=0 \ \ \ \ \ (18)$

$\displaystyle \ddot{\phi}=-\frac{2}{r}\dot{r}\dot{\phi} \ \ \ \ \ (19)$
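The planar equations 17 and 19 are simple enough to integrate numerically. Below is a minimal Python sketch that does so with a hand-written fourth-order Runge-Kutta step, assuming (purely for illustration) an attractive potential ${V\left(r\right)=-k/r}$ with ${m=k=1}$ and made-up initial conditions. It then checks that the angular momentum ${mr^{2}\dot{\phi}}$ and the energy ${T+V}$ are conserved along the orbit.

```python
m, k = 1.0, 1.0                      # assumed mass and potential strength, V(r) = -k/r

def deriv(s):
    # s = [r, rdot, phi, phidot]; right-hand sides from (17) and (19).
    r, rdot, phi, phidot = s
    dVdr = k / r**2
    return [rdot, r * phidot**2 - dVdr / m, phidot, -2.0 * rdot * phidot / r]

def rk4_step(s, dt):
    # Classical fourth-order Runge-Kutta step.
    def shifted(base, slope, c):
        return [x + c * y for x, y in zip(base, slope)]
    k1 = deriv(s)
    k2 = deriv(shifted(s, k1, dt / 2))
    k3 = deriv(shifted(s, k2, dt / 2))
    k4 = deriv(shifted(s, k3, dt))
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(s, k1, k2, k3, k4)]

def ang_mom(s):
    return m * s[0]**2 * s[3]

def energy(s):
    return 0.5 * m * (s[1]**2 + s[0]**2 * s[3]**2) - k / s[0]

s = [1.0, 0.0, 0.0, 1.2]             # made-up initial state giving a bound orbit
L0, E0 = ang_mom(s), energy(s)
for _ in range(20000):
    s = rk4_step(s, 1e-3)

print(ang_mom(s) - L0, energy(s) - E0)
```

Conservation of ${mr^{2}\dot{\phi}}$ is just equation 14 with ${\sin\theta=1}$, so any drift in the first printed number is purely numerical error.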

# Lagrangians for harmonic oscillators

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 2.1; Exercises 2.1.1 – 2.1.2.

The Euler-Lagrange equations of motion, derived from the principle of least action, are

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_{i}}-\frac{\partial L}{\partial q_{i}}=0 \ \ \ \ \ (1)$

where ${q_{i}}$ and ${\dot{q_{i}}}$ are the generalized coordinates and velocities, respectively. Here are a couple of simple examples of how these equations can be used to derive equations of motion.

Example 1 The harmonic oscillator. We have a mass ${m}$ sliding on a frictionless horizontal surface with a spring of spring constant ${k}$ connected between one end of the mass and a fixed support. The horizontal displacement of the mass from its equilibrium position is given by ${x}$, with ${x<0}$ when the mass moves to the left, compressing the spring, and ${x>0}$ when it moves to the right, stretching the spring.

For systems where the potential energy ${V\left(q_{i}\right)}$ is independent of the velocities ${\dot{q}_{i}}$, the Lagrangian can be written as

$\displaystyle L=T-V \ \ \ \ \ (2)$

where ${T}$ is the kinetic energy. In the case of the mass

$\displaystyle T=\frac{1}{2}m\dot{x}^{2} \ \ \ \ \ (3)$

$\displaystyle V=\frac{1}{2}kx^{2} \ \ \ \ \ (4)$

$\displaystyle L=\frac{1}{2}m\dot{x}^{2}-\frac{1}{2}kx^{2} \ \ \ \ \ (5)$

The equation of motion is

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{x}}-\frac{\partial L}{\partial x}=m\ddot{x}+kx=0 \ \ \ \ \ (6)$

$\displaystyle m\ddot{x}=-kx \ \ \ \ \ (7)$

which is the familiar equation of motion for a mass acted on by the restoring force ${-kx}$.

Example 2 We can revisit the problem of two masses coupled by three springs, as described earlier. In this case, we have two coordinates ${x_{1}}$ and ${x_{2}}$. The total kinetic energy is

$\displaystyle T=\frac{1}{2}m\left(\dot{x}_{1}^{2}+\dot{x}_{2}^{2}\right) \ \ \ \ \ (8)$

The total potential energy is

$\displaystyle V=\frac{1}{2}kx_{1}^{2}+\frac{1}{2}k\left(x_{2}-x_{1}\right)^{2}+\frac{1}{2}kx_{2}^{2} \ \ \ \ \ (9)$

$\displaystyle \quad=k\left(x_{1}^{2}+x_{2}^{2}-x_{1}x_{2}\right) \ \ \ \ \ (10)$

The Lagrangian and equations of motion are then

$\displaystyle L=\frac{1}{2}m\left(\dot{x}_{1}^{2}+\dot{x}_{2}^{2}\right)-k\left(x_{1}^{2}+x_{2}^{2}-x_{1}x_{2}\right) \ \ \ \ \ (11)$

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{x}_{1}}-\frac{\partial L}{\partial x_{1}}=m\ddot{x}_{1}+2kx_{1}-kx_{2}=0 \ \ \ \ \ (12)$

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{x}_{2}}-\frac{\partial L}{\partial x_{2}}=m\ddot{x}_{2}+2kx_{2}-kx_{1}=0 \ \ \ \ \ (13)$

This gives the same equations of motion we had earlier.

$\displaystyle \ddot{x}_{1}=-2\frac{k}{m}x_{1}+\frac{k}{m}x_{2} \ \ \ \ \ (14)$

$\displaystyle \ddot{x}_{2}=\frac{k}{m}x_{1}-2\frac{k}{m}x_{2} \ \ \ \ \ (15)$
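As a check on equations 14 and 15, the following Python sketch computes the two normal mode frequencies from the coefficient matrix and verifies that the in-phase and out-of-phase motions satisfy the equations of motion 12 and 13. The numerical values of `k` and `m`, the sample time `t` and the helper `residuals` are assumptions made for the example.

```python
import math

k, m = 2.0, 0.5                      # assumed spring constant and mass

# Writing (14)-(15) as x'' = -W x, the symmetric matrix W is [[a, b], [b, a]].
a, b = 2 * k / m, -k / m
# Eigenvalues of [[a, b], [b, a]] are a - |b| and a + |b|.
w2_low, w2_high = a - abs(b), a + abs(b)

def residuals(x1, x2, acc1, acc2):
    # Left-hand sides of (12) and (13); both vanish for a true solution.
    return (m * acc1 + 2 * k * x1 - k * x2,
            m * acc2 + 2 * k * x2 - k * x1)

t = 0.37                             # arbitrary sample time

# In-phase mode: x1 = x2 = cos(w t) with w^2 = w2_low.
w = math.sqrt(w2_low)
x = math.cos(w * t)
res_sym = residuals(x, x, -w * w * x, -w * w * x)

# Out-of-phase mode: x1 = -x2 = cos(w t) with w^2 = w2_high.
w = math.sqrt(w2_high)
x = math.cos(w * t)
res_anti = residuals(x, -x, -w * w * x, w * w * x)

print(w2_low, w2_high, res_sym, res_anti)
```

The squared frequencies come out as ${k/m}$ and ${3k/m}$, with the lower one belonging to the mode in which the two masses move together and the middle spring is never stretched.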

# Vibrating string – normal mode analysis

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10. Exercise 1.10.4.

I’ll run through Shankar’s example 1.10.1 on a vibrating string, so we can see an application of the theory of infinite dimensional spaces. Suppose we have a string (for example, a violin string) that is anchored at ${x=0}$ and ${x=L}$. If we pluck the string at ${t=0}$, its future position is governed by the wave equation:

$\displaystyle \frac{\partial^{2}\psi}{\partial t^{2}}=\frac{\partial^{2}\psi}{\partial x^{2}} \ \ \ \ \ (1)$

[For simplicity, we’re taking the wave speed to be 1, which is why there’s no constant in this equation.] We can write this as an operator equation using the ${K=-i\frac{\partial}{\partial x}}$ operator we introduced last time. Viewing the wave as a vector in the ${\left|x\right\rangle }$ basis, we then have

$\displaystyle \left|\ddot{\psi}\left(t\right)\right\rangle =-K^{2}\left|\psi\left(t\right)\right\rangle \ \ \ \ \ (2)$

The idea is now to look at the RHS of this equation and diagonalize the ${K^{2}}$ operator by finding its eigenvalues and eigenvectors. Working in the ${\left|x\right\rangle }$ basis, we can write the eigenvalue problem as

$\displaystyle \left\langle x\left|K^{2}\right|\psi\right\rangle =k^{2}\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (3)$

$\displaystyle -\frac{d^{2}\psi\left(x\right)}{dx^{2}}=k^{2}\psi\left(x\right) \ \ \ \ \ (4)$

This has the general solution

$\displaystyle \psi\left(x\right)=A\cos kx+B\sin kx \ \ \ \ \ (5)$

where ${A}$ and ${B}$ are constants of integration, to be determined by the boundary conditions. Since the ends of the string are fixed, ${\psi\left(0\right)=\psi\left(L\right)=0}$. The condition at ${x=0}$ requires ${A=0}$, and the condition at ${x=L}$ then gives

$\displaystyle B\sin kL=0 \ \ \ \ \ (6)$

In order to avoid a trivial solution where ${\psi\left(x\right)=0}$ everywhere, we have ${B\ne0}$, so

$\displaystyle kL=m\pi \ \ \ \ \ (7)$

for ${m=1,2,3,\ldots}$. We therefore have the discrete set of solutions

$\displaystyle \psi_{m}\left(x\right)=B\sin\frac{m\pi x}{L} \ \ \ \ \ (8)$

We can choose ${B=\sqrt{\frac{2}{L}}}$ to normalize the solution so that

$\displaystyle \int_{0}^{L}\psi_{m}\left(x\right)\psi_{m^{\prime}}\left(x\right)dx=\delta_{mm^{\prime}} \ \ \ \ \ (9)$

So far, this is the same as the solution to the infinite square well in quantum mechanics, but now we follow a different path, since we need to satisfy the wave equation 1, and not Schrödinger’s equation, which is first order in time.

We now have two different orthonormal bases that can be used to represent the states of the string. The ${\left|x\right\rangle }$ basis is continuous, consisting of all real values of ${x}$ in the interval ${\left[0,L\right]}$. The other basis is also infinite, but it is discrete, as it consists of the possible values of ${k}$ as given by 7. Since ${k}$ is determined by the integer ${m}$, we’ll call this the ${\left|m\right\rangle }$ basis. In the ${\left|x\right\rangle }$ basis, the state ${\left|m\right\rangle }$ is given by 8:

$\displaystyle \left\langle x\left|m\right.\right\rangle =\sqrt{\frac{2}{L}}\sin\frac{m\pi x}{L} \ \ \ \ \ (10)$
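The orthonormality condition 9 for these basis functions can be confirmed with a simple numerical integration. This is a minimal Python sketch using the midpoint rule; the string length `L`, the number of quadrature points `N` and the function names are illustrative assumptions.

```python
import math

L = 2.0          # assumed string length
N = 2000         # number of midpoint-rule cells

def mode(m, x):
    # <x|m> from (10).
    return math.sqrt(2.0 / L) * math.sin(m * math.pi * x / L)

def inner(m, mp):
    # Midpoint-rule approximation to the integral of psi_m * psi_m' over [0, L].
    dx = L / N
    return sum(mode(m, (i + 0.5) * dx) * mode(mp, (i + 0.5) * dx)
               for i in range(N)) * dx

print(inner(1, 1), inner(3, 3), inner(2, 3))
```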

The general solution as a function of time is an abstract vector ${\left|\psi\left(t\right)\right\rangle }$. We can project this onto the ${\left|x\right\rangle }$ basis, which gives

$\displaystyle \left\langle x\left|\psi\left(t\right)\right.\right\rangle =\psi\left(x,t\right) \ \ \ \ \ (11)$

Or we can project it onto the ${\left|m\right\rangle }$ basis, which gives that component of ${\left|\psi\left(t\right)\right\rangle }$ that is composed of a wave with index ${m}$. In the ${\left|m\right\rangle }$ basis, the operator ${K^{2}}$ is diagonal, since

$\displaystyle K^{2}\psi_{m}\left(x\right)=-\sqrt{\frac{2}{L}}\frac{d^{2}}{dx^{2}}\sin\frac{m\pi x}{L}=\left(\frac{m\pi}{L}\right)^{2}\sqrt{\frac{2}{L}}\sin\frac{m\pi x}{L}=\left(\frac{m\pi}{L}\right)^{2}\psi_{m}\left(x\right)=k^{2}\psi_{m}\left(x\right) \ \ \ \ \ (12)$

We can write the projection of ${\left|\psi\left(t\right)\right\rangle }$ onto the ${\left|m\right\rangle }$ basis as ${\left\langle m\left|\psi\left(t\right)\right.\right\rangle }$. Going back to 2, we see that each component ${\left\langle m\left|\psi\left(t\right)\right.\right\rangle }$ individually satisfies the differential equation

$\displaystyle \frac{d^{2}}{dt^{2}}\left\langle m\left|\psi\left(t\right)\right.\right\rangle =-\left(\frac{m\pi}{L}\right)^{2}\left\langle m\left|\psi\left(t\right)\right.\right\rangle \ \ \ \ \ (13)$

This is the same equation as 4, except now we’re dealing with a time derivative instead of a space derivative. The solution is therefore of the same form:

$\displaystyle \left\langle m\left|\psi\left(t\right)\right.\right\rangle =C\cos kt+D\sin kt \ \ \ \ \ (14)$

To find ${C}$ and ${D}$, we now use the initial conditions at ${t=0}$. We’ll assume that the string is held in some fixed shape and then released at ${t=0}$, which means that we need to specify this initial shape as ${\left\langle m\left|\psi\left(0\right)\right.\right\rangle }$, and that the initial velocity is zero. The latter condition means that

$\displaystyle \frac{d}{dt}\left\langle m\left|\psi\left(0\right)\right.\right\rangle =-kC\sin k0+kD\cos k0=0 \ \ \ \ \ (15)$

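which gives us ${D=0}$. Setting ${t=0}$ in 14 then gives ${C=\left\langle m\left|\psi\left(0\right)\right.\right\rangle }$, so we have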

$\displaystyle \left\langle m\left|\psi\left(t\right)\right.\right\rangle =\left\langle m\left|\psi\left(0\right)\right.\right\rangle \cos kt=\left\langle m\left|\psi\left(0\right)\right.\right\rangle \cos\frac{m\pi t}{L} \ \ \ \ \ (16)$

The general solution is therefore found by inserting the unit operator in the form ${1=\sum_{m}\left|m\right\rangle \left\langle m\right|}$:

$\displaystyle \begin{aligned} \left|\psi\left(t\right)\right\rangle &= \sum_{m}\left|m\right\rangle \left\langle m\left|\psi\left(t\right)\right.\right\rangle \ \ \ \ \ (17)\\ &= \sum_{m}\left|m\right\rangle \cos\frac{m\pi t}{L}\left\langle m\left|\psi\left(0\right)\right.\right\rangle \ \ \ \ \ (18) \end{aligned}$

This can be written as a propagator ${U\left(t\right)}$ acting on the initial state:

$\displaystyle \begin{aligned} \left|\psi\left(t\right)\right\rangle &= U\left(t\right)\left|\psi\left(0\right)\right\rangle \ \ \ \ \ (19)\\ U\left(t\right) &\equiv \sum_{m}\left|m\right\rangle \left\langle m\right|\cos\frac{m\pi t}{L} \ \ \ \ \ (20) \end{aligned}$

Just as with our earlier example of two masses coupled by springs, all the time dependence has been incorporated into the propagator, so all we need to do is specify the initial shape of the string to get the general solution. This solution can be restored to the ${\left|x\right\rangle }$ basis by applying the bra ${\left\langle x\right|}$ to 18 and using 10:

$\displaystyle \begin{aligned} \psi\left(x,t\right) &= \left\langle x\left|\psi\left(t\right)\right.\right\rangle \ \ \ \ \ (21)\\ &= \sum_{m}\left\langle x\left|m\right.\right\rangle \cos\frac{m\pi t}{L}\left\langle m\left|\psi\left(0\right)\right.\right\rangle \ \ \ \ \ (22)\\ &= \sqrt{\frac{2}{L}}\sum_{m}\sin\frac{m\pi x}{L}\cos\frac{m\pi t}{L}\left\langle m\left|\psi\left(0\right)\right.\right\rangle \ \ \ \ \ (23) \end{aligned}$

We still need to get rid of the final ${\left\langle m\right|}$ bra in the last term, which we can do by inserting a unit operator using the ${\left|x\right\rangle }$ basis:

$\displaystyle \begin{aligned} \psi\left(x,t\right) &= \sqrt{\frac{2}{L}}\sum_{m}\sin\frac{m\pi x}{L}\cos\frac{m\pi t}{L}\int_{0}^{L}\left\langle m\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|\psi\left(0\right)\right.\right\rangle dx^{\prime}\ \ \ \ \ (24)\\ &= \frac{2}{L}\sum_{m}\sin\frac{m\pi x}{L}\cos\frac{m\pi t}{L}\int_{0}^{L}\sin\frac{m\pi x^{\prime}}{L}\psi\left(x^{\prime},0\right)dx^{\prime} \ \ \ \ \ (25) \end{aligned}$

The last line follows from 10 because ${\left\langle m\left|x^{\prime}\right.\right\rangle =\left\langle x^{\prime}\left|m\right.\right\rangle ^*}$ and ${\left\langle m\left|x^{\prime}\right.\right\rangle }$ is real. Thus to get the final solution, we need to do the integral in the last line, which depends on the initial shape of the string.

For example, suppose the string is held at its midpoint a distance ${h}$ away from the ${x}$ axis, and follows a straight line on either side of the midpoint. Then the initial state is given by

$\displaystyle \psi\left(x,0\right)=\begin{cases} \frac{2xh}{L} & 0\le x\le\frac{L}{2}\\ \frac{2h}{L}\left(L-x\right) & \frac{L}{2}\le x\le L \end{cases} \ \ \ \ \ (26)$

We then need to do the integral

$\displaystyle \int_{0}^{L}\sin\frac{m\pi x}{L}\psi\left(x,0\right)dx=\frac{2h}{L}\left[\int_{0}^{L/2}x\sin\frac{m\pi x}{L}dx+\int_{L/2}^{L}\left(L-x\right)\sin\frac{m\pi x}{L}dx\right] \ \ \ \ \ (27)$

The integrals can be done by parts although it’s a bit tedious, so I used Maple to get

$\displaystyle \frac{hL}{\pi^{2}m^{2}}\left[-\left(m\pi\cos\frac{\pi m}{2}-2\sin\frac{\pi m}{2}\right)+\left(m\pi\cos\frac{\pi m}{2}+2\sin\frac{\pi m}{2}\right)\right]=\frac{4hL}{\pi^{2}m^{2}}\sin\frac{\pi m}{2} \ \ \ \ \ (28)$
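
As a sanity check on the Maple result, the integral in 27 can be evaluated numerically and compared with the closed form in 28. This is only a sketch: the function names `initial_shape`, `overlap_numeric` and `overlap_closed`, the midpoint rule, and the values ${h=1}$, ${L=1}$ are illustrative assumptions, not part of Shankar's treatment.

```python
import math

def initial_shape(x, h=1.0, L=1.0):
    """Triangular initial shape of eq. 26 (string held at its midpoint)."""
    return 2 * h * x / L if x <= L / 2 else 2 * h * (L - x) / L

def overlap_numeric(m, h=1.0, L=1.0, steps=20000):
    """Midpoint-rule estimate of the integral on the left of eq. 27."""
    dx = L / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * dx
        total += math.sin(m * math.pi * x / L) * initial_shape(x, h, L)
    return total * dx

def overlap_closed(m, h=1.0, L=1.0):
    """Closed form on the right of eq. 28."""
    return 4 * h * L / (math.pi ** 2 * m ** 2) * math.sin(math.pi * m / 2)

# Print the mode number, the numerical estimate and the closed form side by side.
for m in range(1, 6):
    print(m, overlap_numeric(m), overlap_closed(m))
```

The two columns should agree to many decimal places, with the even modes giving essentially zero.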

Plugging this back into 25 we get the final answer:

$\displaystyle \psi\left(x,t\right)=\frac{8h}{\pi^{2}}\sum_{m}\frac{1}{m^{2}}\sin\frac{m\pi x}{L}\cos\frac{m\pi t}{L}\sin\frac{\pi m}{2} \ \ \ \ \ (29)$

Each term in the sum is a normal mode. The factor ${\sin\frac{\pi m}{2}}$ vanishes for even ${m}$, so only the odd modes contribute, and their amplitude drops off as ${1/m^{2}}$, so higher frequencies are less prevalent in the overall motion of the string.
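
To see the series 29 at work, we can sum a finite number of modes and compare with the initial shape 26. The following is a minimal sketch, assuming ${h=1}$ and ${L=1}$; the function name `string_displacement` and the cutoff `max_mode` are hypothetical choices made for illustration.

```python
import math

def string_displacement(x, t, h=1.0, L=1.0, max_mode=2001):
    """Partial sum of eq. 29, keeping modes m = 1 .. max_mode."""
    total = 0.0
    for m in range(1, max_mode + 1):
        total += (math.sin(m * math.pi * x / L)
                  * math.cos(m * math.pi * t / L)
                  * math.sin(math.pi * m / 2) / m ** 2)
    return 8 * h / math.pi ** 2 * total

print(string_displacement(0.5, 0.0))   # close to h at the midpoint
print(string_displacement(0.25, 0.0))  # close to h/2 halfway along the first segment
print(string_displacement(0.5, 1.0))   # close to -h at t = L
```

At ${t=L}$ every odd mode has ${\cos m\pi=-1}$, so the string is the mirror image of its initial shape.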

Notice that if we start the string off in a pure sine wave shape, this is the only mode that is ever present. That is, if, for some fixed integer ${n}$ and amplitude of initial displacement ${h}$:

$\displaystyle \psi\left(x,0\right)=h\sin\frac{n\pi x}{L} \ \ \ \ \ (30)$

then

$\displaystyle \int_{0}^{L}\sin\frac{m\pi x}{L}\psi\left(x,0\right)dx=\frac{hL}{2}\delta_{mn} \ \ \ \ \ (31)$

Thus the only mode present is ${m=n}$, and the string’s motion is

$\displaystyle \psi\left(x,t\right)=h\sin\frac{n\pi x}{L}\cos\frac{n\pi t}{L} \ \ \ \ \ (32)$
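
The orthogonality integral used in 31 follows from the product-to-sum identity for sines. Spelled out for positive integers ${m}$ and ${n}$ (a standard result, written here in the notation of this post):

$\displaystyle \int_{0}^{L}\sin\frac{m\pi x}{L}\sin\frac{n\pi x}{L}dx=\frac{1}{2}\int_{0}^{L}\left[\cos\frac{\left(m-n\right)\pi x}{L}-\cos\frac{\left(m+n\right)\pi x}{L}\right]dx=\frac{L}{2}\delta_{mn}$

For ${m\ne n}$ both cosines integrate to zero over ${\left[0,L\right]}$, since ${\sin\left(m\pm n\right)\pi=0}$. For ${m=n}$ the first cosine is 1 and contributes ${L}$, while the second still integrates to zero.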

# Differential operator – eigenvalues and eigenstates

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10.

Continuing with our study of differential operators, we’ll look now at their eigenvalues and eigenstates. The operator we’re studying is

$\displaystyle K=-i\frac{d}{dx} \ \ \ \ \ (1)$

The eigenvalue equation is as usual:

$\displaystyle K\left|k\right\rangle =k\left|k\right\rangle \ \ \ \ \ (2)$

where ${\left|k\right\rangle }$ is an eigenstate and ${k}$ (outside the ket) is a (possibly complex) scalar. To find ${\left|k\right\rangle }$, we form the matrix element with ${\left\langle x\right|}$ and insert the unit operator:

$\displaystyle \begin{aligned} \left\langle x\left|K\right|k\right\rangle &= k\left\langle x\left|k\right.\right\rangle \ \ \ \ \ (3)\\ \left\langle x\left|K\right|k\right\rangle &= \int\left\langle x\left|K\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|k\right.\right\rangle dx^{\prime}\ \ \ \ \ (4)\\ &= -i\int\delta^{\prime}\left(x-x^{\prime}\right)\psi_{k}\left(x^{\prime}\right)dx^{\prime}\ \ \ \ \ (5)\\ &= -i\frac{d}{dx}\psi_{k}\left(x\right) \ \ \ \ \ (6) \end{aligned}$

In the third line we used the matrix element

$\displaystyle \left\langle x\left|K\right|x^{\prime}\right\rangle =-i\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (7)$

Equating the RHS on the first and last lines gives the differential equation

$\displaystyle -i\frac{d}{dx}\psi_{k}\left(x\right)=k\psi_{k}\left(x\right) \ \ \ \ \ (8)$

which has the solution

$\displaystyle \psi_{k}\left(x\right)=Ae^{ikx} \ \ \ \ \ (9)$

where ${A}$ is a constant of integration. In order for ${\psi_{k}\left(x\right)}$ to be bounded as ${x\rightarrow\pm\infty}$, ${k}$ must be real, so we’ll restrict our attention to that case. The usual choice for ${A}$ is ${1/\sqrt{2\pi}}$ so that

$\displaystyle \psi_{k}\left(x\right)=\frac{e^{ikx}}{\sqrt{2\pi}} \ \ \ \ \ (10)$

This leads to the normalization condition

$\displaystyle \begin{aligned} \left\langle k\left|k^{\prime}\right.\right\rangle &= \int_{-\infty}^{\infty}\left\langle k\left|x\right.\right\rangle \left\langle x\left|k^{\prime}\right.\right\rangle dx\ \ \ \ \ (11)\\ &= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-i\left(k-k^{\prime}\right)x}dx\ \ \ \ \ (12)\\ &= \delta\left(k-k^{\prime}\right) \ \ \ \ \ (13) \end{aligned}$

where in the last line we used the standard integral representation of the delta function. Thus the ${\left|k\right\rangle }$ basis is orthogonal, and normalized the same way as the ${\left|x\right\rangle }$ basis.

To convert between the ${\left|k\right\rangle }$ and ${\left|x\right\rangle }$ bases, we can use the unit operator in the two bases. Thus for some vector (function) ${\left|f\right\rangle }$ we have

$\displaystyle f\left(k\right)=\left\langle k\left|f\right.\right\rangle =\int\left\langle k\left|x\right.\right\rangle \left\langle x\left|f\right.\right\rangle dx=\int\psi_{k}^*\left(x\right)f\left(x\right)dx=\frac{1}{\sqrt{2\pi}}\int e^{-ikx}f\left(x\right)dx \ \ \ \ \ (14)$

Thus ${f\left(k\right)}$ is the Fourier transform of ${f\left(x\right)}$. We can use the same procedure to go in the reverse direction:

$\displaystyle f\left(x\right)=\left\langle x\left|f\right.\right\rangle =\int\left\langle x\left|k\right.\right\rangle \left\langle k\left|f\right.\right\rangle dk=\int\psi_{k}\left(x\right)f\left(k\right)dk=\frac{1}{\sqrt{2\pi}}\int e^{ikx}f\left(k\right)dk \ \ \ \ \ (15)$
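
The change of basis in 14 can be tried numerically on a function whose transform is known. With the ${1/\sqrt{2\pi}}$ normalization used here, the Gaussian ${e^{-x^{2}/2}}$ transforms into ${e^{-k^{2}/2}}$. The sketch below assumes a simple midpoint rule on a truncated interval; the names `fourier_transform` and `gaussian` and the cutoff `x_max` are illustrative only.

```python
import cmath
import math

def fourier_transform(f, k, x_max=12.0, steps=4000):
    """Midpoint-rule estimate of eq. 14: (2 pi)^(-1/2) * integral of e^{-ikx} f(x) dx."""
    dx = 2 * x_max / steps
    total = 0j
    for j in range(steps):
        x = -x_max + (j + 0.5) * dx
        total += cmath.exp(-1j * k * x) * f(x)
    return total * dx / math.sqrt(2 * math.pi)

def gaussian(x):
    """Test function f(x) = exp(-x^2 / 2)."""
    return math.exp(-x * x / 2)

# Compare the numerical transform with the known result exp(-k^2 / 2).
for k in (0.0, 1.0, 2.0):
    print(k, fourier_transform(gaussian, k), math.exp(-k * k / 2))
```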

The effect of the position operator ${X}$ on a vector ${\left|f\left(x\right)\right\rangle }$ can be found by inserting the unit operator:

$\displaystyle \begin{aligned} \left\langle x\left|X\right|f\right\rangle &= \int\left\langle x\left|X\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime}\ \ \ \ \ (16)\\ &= \int x^{\prime}\left\langle x\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime}\ \ \ \ \ (17)\\ &= \int x^{\prime}\delta\left(x-x^{\prime}\right)\left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime}\ \ \ \ \ (18)\\ &= x\left\langle x\left|f\right.\right\rangle \ \ \ \ \ (19) \end{aligned}$

Thus ${X}$ just multiplies any function of ${x}$ by ${x}$ itself. A similar argument in the ${\left|k\right\rangle }$ basis shows that

$\displaystyle \left\langle k\left|K\right|f\left(k\right)\right\rangle =k\left\langle k\left|f\left(k\right)\right.\right\rangle \ \ \ \ \ (20)$

We can use similar calculations to find the matrix elements of ${K}$ in the ${\left|x\right\rangle }$ basis and of ${X}$ (the position operator) in the ${\left|k\right\rangle }$ basis. We get

$\displaystyle \begin{aligned} \left\langle k\left|X\right|k^{\prime}\right\rangle &= \int\int\left\langle k\left|x\right.\right\rangle \left\langle x\left|X\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|k^{\prime}\right.\right\rangle dx\;dx^{\prime}\ \ \ \ \ (21)\\ &= \frac{1}{2\pi}\int\int e^{-ikx}x^{\prime}\left\langle x\left|x^{\prime}\right.\right\rangle e^{ik^{\prime}x^{\prime}}dx\;dx^{\prime}\ \ \ \ \ (22)\\ &= \frac{1}{2\pi}\int\int e^{-ikx}x^{\prime}\delta\left(x-x^{\prime}\right)e^{ik^{\prime}x^{\prime}}dx\;dx^{\prime}\ \ \ \ \ (23)\\ &= \frac{1}{2\pi}\int xe^{i\left(k^{\prime}-k\right)x}dx\ \ \ \ \ (24)\\ &= i\frac{d}{dk}\left[\frac{1}{2\pi}\int e^{i\left(k^{\prime}-k\right)x}dx\right]\ \ \ \ \ (25)\\ &= i\delta^{\prime}\left(k-k^{\prime}\right) \ \ \ \ \ (26) \end{aligned}$

The action of ${X}$ on an arbitrary vector ${\left|g\right\rangle }$ in the ${k}$ basis can be found from this:

$\displaystyle \begin{aligned} \left\langle k\left|X\right|g\left(k\right)\right\rangle &= \int\left\langle k\left|X\right|k^{\prime}\right\rangle \left\langle k^{\prime}\left|g\right.\right\rangle dk^{\prime}\ \ \ \ \ (27)\\ &= i\int\delta^{\prime}\left(k-k^{\prime}\right)g\left(k^{\prime}\right)dk^{\prime}\ \ \ \ \ (28)\\ &= i\frac{dg\left(k\right)}{dk}\ \ \ \ \ (29)\\ &= i\left\langle k\left|\frac{dg\left(k\right)}{dk}\right.\right\rangle \ \ \ \ \ (30) \end{aligned}$

where in the third line we’ve used the property ${\int\delta^{\prime}\left(k-k^{\prime}\right)g\left(k^{\prime}\right)dk^{\prime}=dg/dk}$ of the derivative of the delta function.

By a similar calculation, we can find the matrix elements of ${K}$ in the ${\left|x\right\rangle }$ basis:

$\displaystyle \begin{aligned} \left\langle x\left|K\right|x^{\prime}\right\rangle &= \int\int\left\langle x\left|k\right.\right\rangle \left\langle k\left|K\right|k^{\prime}\right\rangle \left\langle k^{\prime}\left|x^{\prime}\right.\right\rangle dk\;dk^{\prime}\ \ \ \ \ (31)\\ &= \frac{1}{2\pi}\int\int e^{ikx}k^{\prime}\left\langle k\left|k^{\prime}\right.\right\rangle e^{-ik^{\prime}x^{\prime}}dk\;dk^{\prime}\ \ \ \ \ (32)\\ &= \frac{1}{2\pi}\int\int e^{ikx}k^{\prime}\delta\left(k-k^{\prime}\right)e^{-ik^{\prime}x^{\prime}}dk\;dk^{\prime}\ \ \ \ \ (33)\\ &= \frac{1}{2\pi}\int ke^{i\left(x-x^{\prime}\right)k}dk\ \ \ \ \ (34)\\ &= -i\frac{d}{dx}\left[\frac{1}{2\pi}\int e^{i\left(x-x^{\prime}\right)k}dk\right]\ \ \ \ \ (35)\\ &= -i\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (36) \end{aligned}$

Similarly, we have

$\displaystyle \begin{aligned} \left\langle x\left|K\right|g\left(x\right)\right\rangle &= \int\left\langle x\left|K\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|g\right.\right\rangle dx^{\prime}\ \ \ \ \ (37)\\ &= -i\int\delta^{\prime}\left(x-x^{\prime}\right)g\left(x^{\prime}\right)dx^{\prime}\ \ \ \ \ (38)\\ &= -i\frac{dg\left(x\right)}{dx}\ \ \ \ \ (39)\\ &= -i\left\langle x\left|\frac{dg\left(x\right)}{dx}\right.\right\rangle \ \ \ \ \ (40) \end{aligned}$

From 20 and 30 (or equivalently 19 and 40 in the ${\left|x\right\rangle }$ basis) we can work out the familiar commutator. Just for variety, we’ll do this in the ${\left|k\right\rangle }$ basis:

$\displaystyle \begin{aligned} XK\left|f\left(k\right)\right\rangle &= X\left[k\left|f\left(k\right)\right\rangle \right]\ \ \ \ \ (41)\\ &= i\frac{d}{dk}\left[k\left|f\left(k\right)\right\rangle \right]\ \ \ \ \ (42)\\ &= i\left[\left|f\left(k\right)\right\rangle +k\left|\frac{df}{dk}\right\rangle \right]\ \ \ \ \ (43)\\ KX\left|f\left(k\right)\right\rangle &= iK\left|\frac{df}{dk}\right\rangle \ \ \ \ \ (44)\\ &= ik\left|\frac{df}{dk}\right\rangle \ \ \ \ \ (45) \end{aligned}$

Therefore

$\displaystyle \left[X,K\right]\left|f\left(k\right)\right\rangle =i\left|f\left(k\right)\right\rangle \ \ \ \ \ (46)$

or, looking just at the operators

$\displaystyle \left[X,K\right]=iI \ \ \ \ \ (47)$
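
The commutator can also be checked mechanically on polynomials in ${k}$, where ${K}$ multiplies by ${k}$ and ${X}$ acts as ${i\,d/dk}$. The sketch below represents ${f\left(k\right)=\sum_{n}c_{n}k^{n}}$ by its coefficient list; the helper names `apply_K`, `apply_X` and `commutator` are hypothetical and chosen only for this illustration.

```python
def apply_K(coeffs):
    """K multiplies f(k) by k: every coefficient moves up one power."""
    return [0] + list(coeffs)

def apply_X(coeffs):
    """X acts as i d/dk on f(k) = sum_n coeffs[n] * k**n."""
    return [1j * n * c for n, c in enumerate(coeffs)][1:] or [0]

def commutator(coeffs):
    """Coefficients of (XK - KX) f."""
    xk = apply_X(apply_K(coeffs))
    kx = apply_K(apply_X(coeffs))
    size = max(len(xk), len(kx))
    xk += [0] * (size - len(xk))
    kx += [0] * (size - len(kx))
    return [a - b for a, b in zip(xk, kx)]

# f(k) = 1 + 2k + 3k^2; the result is i times the input coefficients, as in eq. 46.
print(commutator([1, 2, 3]))
```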

# Differential operators – matrix elements and hermiticity

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10.

Here, we’ll revisit the differential operator on a continuous vector space which we looked at earlier in its role as the momentum operator. This time around, we’ll use the bra-ket notation and vector space results to analyze it, hopefully putting it on a slightly more mathematical foundation.

We define the differential operator ${D}$ acting on a vector ${\left|f\right\rangle }$ in a continuous vector space as having the action

$\displaystyle D\left|f\right\rangle =\left|\frac{df}{dx}\right\rangle \ \ \ \ \ (1)$

This notation means that ${D}$ operating on ${\left|f\right\rangle }$ produces the vector (ket) ${\left|\frac{df}{dx}\right\rangle }$ corresponding to the function whose form in the ${\left|x\right\rangle }$ basis is ${\frac{df\left(x\right)}{dx}}$. That is, the projection of ${\left|\frac{df}{dx}\right\rangle }$ onto the basis vector ${\left|x\right\rangle }$ is

$\displaystyle \frac{df\left(x\right)}{dx}=\left\langle x\left|\frac{df}{dx}\right.\right\rangle =\left\langle x\left|D\right|f\right\rangle \ \ \ \ \ (2)$

By a similar argument to that which we used to deduce the matrix element ${\left\langle x\left|x^{\prime}\right.\right\rangle }$, we can work out the matrix elements of ${D}$ in the ${\left|x\right\rangle }$ basis. Inserting the unit operator, we have

$\displaystyle \begin{aligned} \left\langle x\left|D\right|f\right\rangle &= \int dx^{\prime}\left\langle x\left|D\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle \ \ \ \ \ (3)\\ &= \int dx^{\prime}\left\langle x\left|D\right|x^{\prime}\right\rangle f\left(x^{\prime}\right) \ \ \ \ \ (4) \end{aligned}$

We need this to be equal to ${\frac{df}{dx}}$. To get this, we can introduce the derivative of the delta function, except this time the delta function is a function of ${x-x^{\prime}}$ rather than just ${x}$ on its own. To see the effect of this derivative, consider the integral

$\displaystyle \int dx^{\prime}\frac{d\delta\left(x-x^{\prime}\right)}{dx}f\left(x^{\prime}\right)=\frac{d}{dx}\int dx^{\prime}\delta\left(x-x^{\prime}\right)f\left(x^{\prime}\right)=\frac{df\left(x\right)}{dx} \ \ \ \ \ (5)$

In the second step, we could take the derivative outside the integral since ${x}$ is a constant with respect to the integration. Comparing this with 4 we see that

$\displaystyle \left\langle x\left|D\right|x^{\prime}\right\rangle \equiv D_{xx^{\prime}}=\frac{d\delta\left(x-x^{\prime}\right)}{dx}=\delta^{\prime}\left(x-x^{\prime}\right) \ \ \ \ \ (6)$

Here the prime in ${\delta^{\prime}}$ means derivative with respect to ${x}$, not ${x^{\prime}}$. [Note that this is not the same formula as that quoted in the earlier post, where we had ${f\left(x\right)\delta^{\prime}\left(x\right)=-f^{\prime}\left(x\right)\delta\left(x\right)}$ because in that formula it was the same variable ${x}$ that was involved in the derivative of the delta function and in the integral.]

The operator ${D}$ is not hermitian as it stands. Since the delta function is real, the matrix elements of the adjoint, ${D_{xx^{\prime}}^{\dagger}=D_{x^{\prime}x}^*}$, are, in bra-ket notation,

$\displaystyle D_{xx^{\prime}}^{\dagger}=\left\langle x^{\prime}\left|D\right|x\right\rangle ^*=\delta^{\prime}\left(x^{\prime}-x\right)=-\delta^{\prime}\left(x-x^{\prime}\right)=-D_{xx^{\prime}} \ \ \ \ \ (7)$

Thus ${D}$ is anti-hermitian. It is easy to fix this and create a hermitian operator by multiplying by an imaginary number, such as ${-i}$ (this choice is, of course, to make the new operator consistent with the momentum operator). Calling this new operator ${K\equiv-iD}$ we have

$\displaystyle K_{xx^{\prime}}^{\dagger}=\left\langle x^{\prime}\left|K\right|x\right\rangle ^*=i\delta^{\prime}\left(x^{\prime}-x\right)=-i\delta^{\prime}\left(x-x^{\prime}\right)=K_{xx^{\prime}} \ \ \ \ \ (8)$
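
A finite-dimensional analogue makes these two statements concrete. The sketch below replaces ${D}$ by a central-difference matrix on a small grid; the periodic wrap-around at the ends of the grid is an assumption made here so that no boundary rows need special treatment, and the helper names are illustrative only.

```python
def derivative_matrix(n, spacing):
    """Central-difference stand-in for D on n periodic grid points."""
    D = [[0.0] * n for _ in range(n)]
    for j in range(n):
        D[j][(j + 1) % n] = 1 / (2 * spacing)
        D[j][(j - 1) % n] = -1 / (2 * spacing)
    return D

def dagger(M):
    """Conjugate transpose of a square matrix stored as nested lists."""
    size = len(M)
    return [[complex(M[c][r]).conjugate() for c in range(size)] for r in range(size)]

def equal(A, B, tol=1e-12):
    """Entry-by-entry comparison of two matrices."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

D = derivative_matrix(6, 0.1)
minus_D = [[-entry for entry in row] for row in D]
K = [[-1j * entry for entry in row] for row in D]

print(equal(dagger(D), minus_D))  # D is anti-hermitian
print(equal(dagger(K), K))        # K = -iD is hermitian
```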

A curious fact about ${K}$ (and thus about the momentum operator as well) is that it is not automatically hermitian even with this correction. We’ve seen that it satisfies the hermiticity property with respect to its matrix elements in the position basis, but to be fully hermitian, it must satisfy

$\displaystyle \left\langle g\left|K\right|f\right\rangle =\left\langle f\left|K\right|g\right\rangle ^* \ \ \ \ \ (9)$

for any two vectors ${\left|f\right\rangle }$ and ${\left|g\right\rangle }$. Suppose we are interested in ${x}$ over some range ${\left[a,b\right]}$. Then by inserting a couple of identity operators, we have

$\displaystyle \begin{aligned} \left\langle g\left|K\right|f\right\rangle &= \int_{a}^{b}\int_{a}^{b}\left\langle g\left|x\right.\right\rangle \left\langle x\left|K\right|x^{\prime}\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx\;dx^{\prime}\ \ \ \ \ (10)\\ &= -i\int_{a}^{b}g^*\left(x\right)\frac{df}{dx}dx\ \ \ \ \ (11)\\ &= -i\left.g^*\left(x\right)f\left(x\right)\right|_{a}^{b}+i\int_{a}^{b}f\left(x\right)\frac{dg^*}{dx}dx\ \ \ \ \ (12)\\ &= -i\left.g^*\left(x\right)f\left(x\right)\right|_{a}^{b}+\left\langle f\left|K\right|g\right\rangle ^* \ \ \ \ \ (13) \end{aligned}$

The operator ${K}$ is hermitian only if the boundary term (the first term in the last line) is zero, which happens only for certain choices of ${f}$ and ${g}$. If the limits are infinite, so we’re integrating over all space, and the system is bounded so that both ${f}$ and ${g}$ go to zero at infinity, then we’re OK, and ${K}$ is hermitian. Another option is for ${g}$ and ${f}$ to be periodic, with the range of integration equal to an integer multiple of the period; then ${g^*f}$ has the same value at each end and the term becomes zero.
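
As a concrete illustration of a non-vanishing boundary term (the functions here are chosen purely for illustration and are not from Shankar), take ${f\left(x\right)=x}$ and ${g\left(x\right)=1}$ on ${\left[a,b\right]=\left[0,1\right]}$. Then

$\displaystyle \left\langle g\left|K\right|f\right\rangle =-i\int_{0}^{1}1\cdot\frac{d\left(x\right)}{dx}dx=-i,\qquad\left\langle f\left|K\right|g\right\rangle ^*=\left[-i\int_{0}^{1}x\frac{d\left(1\right)}{dx}dx\right]^*=0,\qquad-i\left.g^*\left(x\right)f\left(x\right)\right|_{0}^{1}=-i$

so 13 reads ${-i=-i+0}$, and ${K}$ is not hermitian on this pair of functions. Replacing ${f}$ by ${x\left(1-x\right)}$, which vanishes at both ends, removes the boundary term and both matrix elements are then zero.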

However, as we’ve seen, in quantum mechanics there are cases where we deal with functions such as ${e^{ikx}}$ (for ${k}$ real) that oscillate indefinitely, no matter how large ${x}$ is (see the free particle, for example). There isn’t any mathematically airtight way around such cases (as far as I know), but a hand-wavy way of defining a limit for such oscillating functions is to consider their average behaviour as ${x\rightarrow\pm\infty}$. The average defined by Shankar is given as

$\displaystyle \lim_{x\rightarrow\infty}e^{ikx}e^{-ik^{\prime}x}=\lim_{\substack{L\rightarrow\infty\\ \Delta\rightarrow\infty } }\frac{1}{\Delta}\int_{L}^{L+\Delta}e^{i\left(k-k^{\prime}\right)x}dx \ \ \ \ \ (14)$

This is interpreted as looking at the function very far out on the ${x}$ axis (at position ${L}$), and then considering a very long interval ${\Delta}$ starting at point ${L}$. For ${k\ne k^{\prime}}$, the integral of ${e^{i\left(k-k^{\prime}\right)x}}$ over one period is zero (it’s just a combination of sine and cosine functions), so successive cycles cancel and the magnitude of the integral never exceeds ${2/\left|k-k^{\prime}\right|}$, however large ${\Delta}$ is. Dividing by ${\Delta}$, which grows without bound, ensures that the limit is zero.
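
The averaging prescription in 14 is easy to try numerically. The sketch below uses the exact antiderivative of ${e^{iqx}}$ with ${q=k-k^{\prime}\ne0}$; the function name `window_average` and the sample values of ${q}$, ${L}$ and ${\Delta}$ are arbitrary choices for illustration.

```python
import cmath

def window_average(q, start, width):
    """Average of e^{iqx} over [start, start + width], for q = k - k' != 0."""
    upper = cmath.exp(1j * q * (start + width))
    lower = cmath.exp(1j * q * start)
    return (upper - lower) / (1j * q * width)

# The magnitude is at most 2 / (|q| * width), so it shrinks as the window grows.
for width in (10.0, 1e3, 1e5):
    print(width, abs(window_average(0.7, 50.0, width)))
```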

This isn’t an ideal solution, but it’s just one of many cases where an infinitely oscillating function is called upon to do seemingly impossible things. The theory seems to hang together fairly well in any case.

# Non-denumerable basis: position and momentum states

References: edX online course MIT 8.05 Section 5.6.

Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.10; Exercises 1.10.1 – 1.10.3.

Although we’ve looked at position and momentum operators in quantum mechanics before, it’s worth another look at the ways that Zwiebach and Shankar introduce them.

First, we’ll have a look at Shankar’s treatment. He begins by considering a string fixed at each end, at positions ${x=0}$ and ${x=L}$, then asks how we could convey the shape of the string to an observer who cannot see the string directly. We could note the position at some fixed finite number of points between 0 and ${L}$, but then the remote observer would have only a partial knowledge of the string’s shape; the locations of those portions of the string between the points at which it was measured are still unknown, although the observer could probably get a reasonable picture by interpolating between these points.

We can increase the number of points at which the position is measured to get a better picture, but to convey the exact shape of the string, we need to measure its position at an infinite number of points. This is possible (in principle) but leads to a problem with the definition of the inner product. For two vectors defined on a finite vector space with an orthonormal basis, the inner product is given by the usual formula for the dot product:

$\displaystyle \begin{aligned} \left\langle f\left|g\right.\right\rangle &= \sum_{i=1}^{n}f_{i}g_{i}\ \ \ \ \ (1)\\ \left\langle f\left|f\right.\right\rangle &= \sum_{i=1}^{n}f_{i}^{2} \ \ \ \ \ (2) \end{aligned}$

where ${f_{i}}$ and ${g_{i}}$ are the components of ${f}$ and ${g}$ in the orthonormal basis. If we’re taking ${f}$ to be the displacement of a string and we try to increase the accuracy of the picture by increasing the number ${n}$ of points at which measurements are taken, then the value of ${\left\langle f\left|f\right.\right\rangle }$ continues to increase as ${n}$ increases (provided that ${f}$ is not zero everywhere). As ${n\rightarrow\infty}$, ${\left\langle f\left|f\right.\right\rangle \rightarrow\infty}$ as well, even though the system we’re measuring (a string of finite length with finite displacement) is certainly not infinite in any practical sense.

Shankar proposes getting around this problem by simply redefining the inner product for a finite vector space to be

$\displaystyle \left\langle f\left|g\right.\right\rangle =\sum_{i=1}^{n}f\left(x_{i}\right)g\left(x_{i}\right)\Delta \ \ \ \ \ (3)$

where ${\Delta\equiv L/\left(n+1\right)}$. That is, ${\Delta}$ now becomes the distance between adjacent points at which measurements are taken. If we let ${n\rightarrow\infty}$ this leads to the definition of the inner product as an integral

$\displaystyle \begin{aligned} \left\langle f\left|g\right.\right\rangle &= \int_{0}^{L}f\left(x\right)g\left(x\right)\;dx\ \ \ \ \ (4)\\ \left\langle f\left|f\right.\right\rangle &= \int_{0}^{L}f^{2}\left(x\right)\;dx \ \ \ \ \ (5) \end{aligned}$
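
The difference between the sums 2 and 3 shows up clearly in a small numerical experiment. The sketch below assumes the sample shape ${f\left(x\right)=\sin\left(\pi x/L\right)}$ measured at the ${n}$ interior points ${x_{i}=i\Delta}$; the function names are illustrative only.

```python
import math

def sample(n):
    """Values of f(x) = sin(pi x / L) at the n interior points x_i = i L / (n + 1)."""
    return [math.sin(math.pi * i / (n + 1)) for i in range(1, n + 1)]

def plain_norm(n):
    """Unweighted sum of squares, as in eq. 2."""
    return sum(v * v for v in sample(n))

def weighted_norm(n, L=1.0):
    """Sum of squares weighted by the spacing L / (n + 1), as in eq. 3."""
    return plain_norm(n) * L / (n + 1)

# The plain sum grows with n, while the weighted sum stays at the integral L/2.
for n in (10, 100, 1000):
    print(n, plain_norm(n), weighted_norm(n))
```

For this particular ${f}$ the plain sum equals ${\left(n+1\right)/2}$, so it grows without limit, while the weighted sum reproduces ${\int_{0}^{L}\sin^{2}\left(\pi x/L\right)dx=L/2}$ for every ${n}$.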

This looks familiar enough, if you’ve done any work with inner products in quantum mechanics, but there is a subtle point which Shankar overlooks. In going from 1 to 3, we have introduced a factor ${\Delta}$ which, in the string example at least, has the dimensions of length, so the physical interpretation of these two equations is different. The units of ${\left\langle f\left|g\right.\right\rangle }$ appear to be different in the two cases. Now in quantum theory, inner products of the continuous type usually involve the wave function multiplied by its complex conjugate, with possibly another operator thrown in if we’re trying to find the expectation value of some observable. The square modulus of the wave function, ${\left|\Psi\right|^{2}}$, is taken to be a probability density, so it has units of inverse length (in one dimension) or inverse volume (in three dimensions), which makes the integral work out properly.

Admittedly, when we’re using ${f}$ to represent the displacement of a string, it’s not obvious what meaning the inner product of ${f}$ with anything else would actually have, so maybe the point isn’t worth worrying about. However, it does seem to be something that would be worth a comment from Shankar.

From this point, Shankar continues by saying that this infinite dimensional vector space is spanned by basis vectors ${\left|x\right\rangle }$, with one basis vector for each value of ${x}$. We require this basis to be orthogonal, which means that we must have, if ${x\ne x^{\prime}}$

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =0 \ \ \ \ \ (6)$

We then generalize the identity operator to be

$\displaystyle I=\int\left|x\right\rangle \left\langle x\right|dx \ \ \ \ \ (7)$

which leads to

$\displaystyle \left\langle x\left|f\right.\right\rangle =\int\left\langle x\left|x^{\prime}\right.\right\rangle \left\langle x^{\prime}\left|f\right.\right\rangle dx^{\prime} \ \ \ \ \ (8)$

The bra-ket ${\left\langle x\left|f\right.\right\rangle }$ is the projection of the vector ${\left|f\right\rangle }$ onto the ${\left|x\right\rangle }$ basis vector, so it is just ${f\left(x\right)}$. This means

$\displaystyle f\left(x\right)=\int\left\langle x\left|x^{\prime}\right.\right\rangle f\left(x^{\prime}\right)dx^{\prime} \ \ \ \ \ (9)$

which leads to the definition of the Dirac delta function as the normalization of ${\left\langle x\left|x^{\prime}\right.\right\rangle }$:

$\displaystyle \left\langle x\left|x^{\prime}\right.\right\rangle =\delta\left(x-x^{\prime}\right) \ \ \ \ \ (10)$

Shankar then describes some properties of the delta function and its derivative, most of which we’ve already covered. For example, we’ve seen these two results for the delta function:

$\displaystyle \begin{aligned} \delta\left(ax\right) &= \frac{\delta\left(x\right)}{\left|a\right|}\ \ \ \ \ (11)\\ \frac{d\theta\left(x-x^{\prime}\right)}{dx} &= \delta\left(x-x^{\prime}\right) \ \ \ \ \ (12) \end{aligned}$

where ${\theta}$ is the step function

$\displaystyle \theta\left(x-x^{\prime}\right)\equiv\begin{cases} 0 & x\le x^{\prime}\\ 1 & x>x^{\prime} \end{cases} \ \ \ \ \ (13)$

One other result is that for a function ${f\left(x\right)}$ with zeroes at a number of points ${x_{i}}$, we have

$\displaystyle \delta\left(f\left(x\right)\right)=\sum_{i}\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (14)$

To see this, consider one of the ${x_{i}}$ where ${f\left(x_{i}\right)=0}$. Expanding in a Taylor series about this point, we have

$\displaystyle \begin{aligned} f\left(x_{i}+\left(x-x_{i}\right)\right) &= f\left(x_{i}\right)+\left(x-x_{i}\right)\frac{df}{dx_{i}}+\ldots\ \ \ \ \ (15)\\ &= 0+\left(x-x_{i}\right)\frac{df}{dx_{i}}+\ldots \ \ \ \ \ (16) \end{aligned}$

From 11 we have

$\displaystyle \delta\left(\left(x-x_{i}\right)\frac{df}{dx_{i}}\right)=\frac{\delta\left(x_{i}-x\right)}{\left|df/dx_{i}\right|} \ \ \ \ \ (17)$

The behaviour is the same at all points ${x_{i}}$ and since ${\delta\left(x_{i}-x\right)=0}$ at all other ${x_{j}\ne x_{i}}$ where ${f\left(x_{j}\right)=0}$, we can just add the delta functions for each zero of ${f}$.
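
A standard example of 14, assuming ${a\ne0}$: for ${f\left(x\right)=x^{2}-a^{2}}$ the zeroes are at ${x=\pm a}$, where ${df/dx=\pm2a}$, so

$\displaystyle \delta\left(x^{2}-a^{2}\right)=\frac{\delta\left(a-x\right)+\delta\left(-a-x\right)}{2\left|a\right|}=\frac{\delta\left(x-a\right)+\delta\left(x+a\right)}{2\left|a\right|}$

where the second form uses the fact that the delta function is even.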

Turning now to Zwiebach’s treatment, he begins with the basis states ${\left|x\right\rangle }$ and position operator ${\hat{x}}$ with the eigenvalue equation

$\displaystyle \hat{x}\left|x\right\rangle =x\left|x\right\rangle \ \ \ \ \ (18)$

and simply defines the inner product between two position states to be

$\displaystyle \left\langle x\left|y\right.\right\rangle =\delta\left(x-y\right) \ \ \ \ \ (19)$

With this definition, 9 follows immediately. We can therefore write a quantum state ${\left|\psi\right\rangle }$ as

$\displaystyle \left|\psi\right\rangle =I\left|\psi\right\rangle =\int\left|x\right\rangle \left\langle x\left|\psi\right.\right\rangle dx=\int\left|x\right\rangle \psi\left(x\right)dx \ \ \ \ \ (20)$

That is, the vector ${\left|\psi\right\rangle }$ is the integral of its projections ${\psi\left(x\right)}$ onto the basis vectors ${\left|x\right\rangle }$.

The position operator ${\hat{x}}$ is hermitian as can be seen from

$\displaystyle \begin{aligned} \left\langle x_{1}\left|\hat{x}^{\dagger}\right|x_{2}\right\rangle &= \left\langle x_{2}\left|\hat{x}\right|x_{1}\right\rangle ^*\ \ \ \ \ (21)\\ &= x_{1}\left\langle x_{2}\left|x_{1}\right.\right\rangle ^*\ \ \ \ \ (22)\\ &= x_{1}\delta\left(x_{2}-x_{1}\right)^*\ \ \ \ \ (23)\\ &= x_{1}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (24)\\ &= x_{2}\delta\left(x_{2}-x_{1}\right)\ \ \ \ \ (25)\\ &= \left\langle x_{1}\left|\hat{x}\right|x_{2}\right\rangle \ \ \ \ \ (26) \end{aligned}$

The fourth line follows because the delta function is real, and the fifth follows because ${\delta\left(x_{2}-x_{1}\right)}$ is non-zero only when ${x_{1}=x_{2}}$.

Zwiebach then introduces the momentum eigenstates ${\left|p\right\rangle }$ which are analogous to the position states ${\left|x\right\rangle }$, in that

$\displaystyle \begin{aligned}
\left\langle p^{\prime}\left|p\right.\right\rangle &= \delta\left(p^{\prime}-p\right) \ \ \ \ \ (27)\\
I &= \int dp\left|p\right\rangle \left\langle p\right| \ \ \ \ \ (28)\\
\hat{p}\left|p\right\rangle &= p\left|p\right\rangle \ \ \ \ \ (29)\\
\tilde{\psi}\left(p\right) &= \left\langle p\left|\psi\right.\right\rangle \ \ \ \ \ (30)
\end{aligned}$

By the same calculation as for ${\hat{x}}$, with ${\left|p\right\rangle }$ in place of ${\left|x\right\rangle }$, we see that ${\hat{p}}$ is hermitian.

To get a relation between the ${\left|x\right\rangle }$ and ${\left|p\right\rangle }$ bases, we require that ${\left\langle x\left|p\right.\right\rangle }$ is the wave function for a particle with momentum ${p}$ in the ${x}$ basis, which we’ve seen is

$\displaystyle \psi\left(x\right)=\frac{1}{\sqrt{2\pi\hbar}}e^{ipx/\hbar} \ \ \ \ \ (31)$

Zwiebach then shows that this is consistent with the equation

$\displaystyle \left\langle x\left|\hat{p}\right|\psi\right\rangle =\frac{\hbar}{i}\frac{d}{dx}\left\langle x\left|\psi\right.\right\rangle =\frac{\hbar}{i}\frac{d\psi\left(x\right)}{dx} \ \ \ \ \ (32)$

We can get a similar relation by switching ${x}$ and ${p}$:

$\displaystyle \begin{aligned}
\left\langle p\left|\hat{x}\right|\psi\right\rangle &= \int dx\left\langle p\left|x\right.\right\rangle \left\langle x\left|\hat{x}\right|\psi\right\rangle \ \ \ \ \ (33)\\
&= \int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (34)
\end{aligned}$

From 31 we see

$\displaystyle \begin{aligned}
\left\langle x\left|p\right.\right\rangle ^* &= \frac{1}{\sqrt{2\pi\hbar}}e^{-ipx/\hbar} \ \ \ \ \ (35)\\
\left\langle x\left|p\right.\right\rangle ^*x &= i\hbar\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^* \ \ \ \ \ (36)\\
\int dx\left\langle x\left|p\right.\right\rangle ^*x\left\langle x\left|\psi\right.\right\rangle &= i\hbar\int dx\;\frac{d}{dp}\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (37)\\
&= i\hbar\frac{d}{dp}\int dx\;\left\langle x\left|p\right.\right\rangle ^*\left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (38)\\
&= i\hbar\frac{d}{dp}\int dx\;\left\langle p\left|x\right.\right\rangle \left\langle x\left|\psi\right.\right\rangle \ \ \ \ \ (39)\\
&= i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (40)
\end{aligned}$

In the fourth line, we took the ${\frac{d}{dp}}$ outside the integral since ${p}$ occurs in only one term, and in the last line we used 7. Thus we have

$\displaystyle \left\langle p\left|\hat{x}\right|\psi\right\rangle =i\hbar\frac{d\tilde{\psi}\left(p\right)}{dp} \ \ \ \ \ (41)$
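As a quick numerical sanity check of 41, here is a minimal Python sketch. The Gaussian test state, the choice of units with ${\hbar=1}$, and the function names (`psi`, `psi_tilde`, `p_x_psi`, `rhs`) are all our own assumptions for illustration; they don't come from Zwiebach or Shankar. The left side is computed directly from the integral in 34 and the right side from a finite-difference derivative of ${\tilde{\psi}\left(p\right)}$.

```python
import cmath
import math

# Assumption: units with hbar = 1 throughout this sketch.

def psi(x):
    """Normalized Gaussian wave function (an illustrative choice of state)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2.0)

def psi_tilde(p):
    """Momentum-space wave function <p|psi> of the Gaussian above (hbar = 1)."""
    return math.pi ** -0.25 * math.exp(-p * p / 2.0)

def p_x_psi(p, half_width=12.0, n=4800):
    """<p|x|psi> = integral dx <x|p>^* x psi(x), using the midpoint rule."""
    dx = 2.0 * half_width / n
    total = 0.0 + 0.0j
    for k in range(n):
        x = -half_width + (k + 0.5) * dx
        bra_p_x = cmath.exp(-1j * p * x) / math.sqrt(2.0 * math.pi)  # <x|p>^*
        total += bra_p_x * x * psi(x) * dx
    return total

def rhs(p, h=1e-5):
    """i * d(psi_tilde)/dp, using a central difference."""
    return 1j * (psi_tilde(p + h) - psi_tilde(p - h)) / (2.0 * h)

# The two columns should agree to within the numerical error.
for p in (-1.5, 0.3, 2.0):
    print(p, p_x_psi(p), rhs(p))
```

For this state both sides are purely imaginary and equal to ${-ip\tilde{\psi}\left(p\right)}$, which is what differentiating the Gaussian gives.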

# Exponentials of operators – Baker-Campbell-Hausdorff formula

References: Shankar, R. (1994), Principles of Quantum Mechanics, Plenum Press. Section 1.9.

Although the result in this post isn’t covered in Shankar’s book, it’s a result that is frequently used in quantum theory, so it’s worth including at this point.

We’ve seen how to define a function of an operator if that function can be expanded in a power series. A common operator function is the exponential:

$\displaystyle f\left(\Omega\right)=e^{i\Omega} \ \ \ \ \ (1)$

If ${\Omega}$ is hermitian, the exponential ${e^{i\Omega}}$ is unitary. If we try to calculate the exponential of a sum of two operators, ${e^{A+B}}$, the result isn’t as simple as we might hope if ${A}$ and ${B}$ don’t commute. To see the problem, we can write this out as a power series

$\displaystyle \begin{aligned}
e^{A+B} &= \sum_{n=0}^{\infty}\frac{\left(A+B\right)^{n}}{n!} \ \ \ \ \ (2)\\
&= I+A+B+\frac{1}{2}\left(A+B\right)\left(A+B\right)+\ldots \ \ \ \ \ (3)\\
&= I+A+B+\frac{1}{2}\left(A^{2}+AB+BA+B^{2}\right)+\ldots \ \ \ \ \ (4)
\end{aligned}$

The problem appears first in the fourth term in the series, since we can’t condense the ${AB+BA}$ sum into ${2AB}$ if ${\left[A,B\right]\ne0}$, so ${e^{A+B}}$ doesn’t factor into ${e^{A}e^{B}}$. In fact, the product ${e^{A}e^{B}}$ can be written as a single exponential whose exponent is ${A+B}$ plus a series of commutators of ${A}$ and ${B}$ with each other, nested to increasingly higher levels. This formula is known as the Baker-Campbell-Hausdorff (BCH) formula. Up to fourth order in ${A}$ and ${B}$, the BCH formula gives

$\displaystyle e^{A}e^{B}=\exp\left[A+B+\frac{1}{2}\left[A,B\right]+\frac{1}{12}\left(\left[A,\left[A,B\right]\right]+\left[B,\left[B,A\right]\right]\right)-\frac{1}{24}\left[B,\left[A,\left[A,B\right]\right]\right]+\ldots\right] \ \ \ \ \ (5)$
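To get a feel for how the successive terms in 5 improve the approximation, here is a small Python sketch using only the standard library. The ${2\times2}$ test matrices, the value `eps = 0.1`, the truncated Taylor series used for the matrix exponential, and all helper names are our own choices for illustration. We compare ${e^{A}e^{B}}$ with the exponential of the BCH series cut off after the first, second and third order terms.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(*terms):
    """Linear combination of matrices: lin((c1, M1), (c2, M2), ...)."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return lin((1, mat_mul(X, Y)), (-1, mat_mul(Y, X)))

def expm(X, terms=30):
    """Matrix exponential from a truncated Taylor series (fine for small X)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]          # holds X^k / k!
    for k in range(1, terms):
        power = lin((1.0 / k, mat_mul(power, X)))
        result = lin((1, result), (1, power))
    return result

def max_abs_diff(X, Y):
    """Largest absolute difference between corresponding entries."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

eps = 0.1                                       # illustrative small parameter
A = [[0.0, eps], [0.0, 0.0]]
B = [[0.0, 0.0], [eps, 0.0]]

exact = mat_mul(expm(A), expm(B))               # e^A e^B
AB = comm(A, B)
Z1 = lin((1, A), (1, B))                        # first order only
Z2 = lin((1, Z1), (0.5, AB))                    # add (1/2)[A,B]
Z3 = lin((1, Z2), (1 / 12, comm(A, AB)),        # add the third-order terms
         (1 / 12, comm(B, comm(B, A))))

err1 = max_abs_diff(exact, expm(Z1))
err2 = max_abs_diff(exact, expm(Z2))
err3 = max_abs_diff(exact, expm(Z3))
print(err1, err2, err3)
```

Each additional order should shrink the error by roughly a factor of `eps`, since the first neglected term is one order higher in ${A}$ and ${B}$.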

There is no known simple closed form expression for the general series. However, an important special case that occurs frequently in quantum theory is the case where ${\left[A,B\right]=cI}$, where ${c}$ is a complex scalar and ${I}$ is the identity operator. Since ${cI}$ commutes with all operators, every nested commutator vanishes, so all terms from the third order upwards are zero, and we have

$\displaystyle e^{A}e^{B}=e^{A+B+\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (6)$
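We can check 6 exactly on a small example. One caveat: for finite matrices, ${\left[A,B\right]=cI}$ with ${c\ne0}$ is impossible (the trace of a commutator is zero), so the sketch below assumes the slightly weaker condition that ${\left[A,B\right]}$ commutes with both ${A}$ and ${B}$, which is all that is needed for the third and higher order terms in 5 to vanish. The ${3\times3}$ strictly upper triangular matrices and the helper names are our own illustrative choices. Because such matrices satisfy ${X^{3}=0}$, the exponential series terminates and we can use exact rational arithmetic.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return mat_add(mat_mul(X, Y), mat_scale(-1, mat_mul(Y, X)))

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def exp_nilpotent(X):
    """exp(X) = I + X + X^2/2, exact when X is strictly upper triangular 3x3."""
    X2 = mat_mul(X, X)
    return mat_add(mat_add(IDENTITY, X), mat_scale(Fraction(1, 2), X2))

# Illustrative matrices: [A, B] is nonzero but commutes with both A and B.
A = [[0, 2, 0], [0, 0, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 0, 3], [0, 0, 0]]
C = comm(A, B)

lhs = mat_mul(exp_nilpotent(A), exp_nilpotent(B))                  # e^A e^B
rhs = exp_nilpotent(mat_add(mat_add(A, B), mat_scale(Fraction(1, 2), C)))
naive = exp_nilpotent(mat_add(A, B))                               # e^{A+B}

print(lhs == rhs)     # True: equation 6 holds exactly here
print(lhs == naive)   # False: dropping the commutator term fails
```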

We can prove this result as follows. Start with the operator function

$\displaystyle G\left(t\right)\equiv e^{t\left(A+B\right)}e^{-tA} \ \ \ \ \ (7)$

where ${t}$ is a scalar parameter (not necessarily time!).

From its definition,

$\displaystyle G\left(0\right)=I \ \ \ \ \ (8)$

The inverse is

$\displaystyle G^{-1}\left(t\right)=e^{tA}e^{-t\left(A+B\right)} \ \ \ \ \ (9)$

and the derivative is

$\displaystyle \frac{dG\left(t\right)}{dt}=\left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A \ \ \ \ \ (10)$

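Note that we have to keep the ${\left(A+B\right)}$ factor on the left and the ${A}$ factor on the right: each commutes with its own exponential, but because ${\left[A,B\right]\ne0}$ neither can be moved through the other exponential. Now we multiply: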

$\displaystyle \begin{aligned}
G^{-1}\frac{dG}{dt} &= e^{tA}e^{-t\left(A+B\right)}\left[\left(A+B\right)e^{t\left(A+B\right)}e^{-tA}-e^{t\left(A+B\right)}e^{-tA}A\right] \ \ \ \ \ (11)\\
&= e^{tA}\left(A+B\right)e^{-tA}-A \ \ \ \ \ (12)\\
&= e^{tA}Ae^{-tA}+e^{tA}Be^{-tA}-A \ \ \ \ \ (13)\\
&= e^{tA}Be^{-tA} \ \ \ \ \ (14)\\
&= B+t\left[A,B\right] \ \ \ \ \ (15)\\
&= B+ctI \ \ \ \ \ (16)
\end{aligned}$

We used Hadamard’s lemma in the penultimate line, which in this case reduces to

$\displaystyle e^{tA}Be^{-tA}=B+t\left[A,B\right] \ \ \ \ \ (17)$

because ${\left[A,B\right]=cI}$ so all higher order commutators are zero.

We end up with an expression in which ${A}$ has disappeared. This gives the differential equation for ${G}$:

$\displaystyle G^{-1}\frac{dG}{dt}=B+ctI \ \ \ \ \ (18)$

We try a solution of the form (this apparently appears from divine inspiration, although it’s what we’d get by integrating 18 if ${B}$ were an ordinary number):

$\displaystyle G\left(t\right)=e^{\alpha tB}e^{\beta ct^{2}} \ \ \ \ \ (19)$

From which we get

$\displaystyle \begin{aligned}
G^{-1} &= e^{-\alpha tB}e^{-\beta ct^{2}} \ \ \ \ \ (20)\\
\frac{dG}{dt} &= \left(\alpha B+2\beta ct\right)e^{\alpha tB}e^{\beta ct^{2}} \ \ \ \ \ (21)\\
G^{-1}\frac{dG}{dt} &= \alpha B+2\beta ct \ \ \ \ \ (22)
\end{aligned}$

Comparing this to 18, we have

$\displaystyle \begin{aligned}
\alpha &= 1 \ \ \ \ \ (23)\\
\beta &= \frac{1}{2} \ \ \ \ \ (24)\\
G\left(t\right) &= e^{tB}e^{\frac{1}{2}ct^{2}} \ \ \ \ \ (25)
\end{aligned}$
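The two key steps of the proof, Hadamard’s lemma in the form 17 and the solution 25, can also be checked exactly for several values of ${t}$. As a hedged sketch, we again assume ${3\times3}$ strictly upper triangular test matrices, for which ${\left[A,B\right]}$ commutes with ${A}$ and ${B}$ but isn’t a multiple of ${I}$; the scalar factor ${e^{\frac{1}{2}ct^{2}}}$ in 25 is therefore replaced by the matrix ${e^{\frac{1}{2}t^{2}\left[A,B\right]}}$. All names in the code are hypothetical helpers, not part of any library.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def exp_nilpotent(X):
    """exp(X) = I + X + X^2/2, exact when X is strictly upper triangular 3x3."""
    return mat_add(mat_add(IDENTITY, X),
                   mat_scale(Fraction(1, 2), mat_mul(X, X)))

A = [[0, 2, 0], [0, 0, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 0, 3], [0, 0, 0]]
C = mat_add(mat_mul(A, B), mat_scale(-1, mat_mul(B, A)))   # [A, B]

def hadamard_lhs(t):
    """e^{tA} B e^{-tA}, the left side of equation 17."""
    return mat_mul(mat_mul(exp_nilpotent(mat_scale(t, A)), B),
                   exp_nilpotent(mat_scale(-t, A)))

def hadamard_rhs(t):
    """B + t[A,B], the right side of equation 17."""
    return mat_add(B, mat_scale(t, C))

def G(t):
    """G(t) = e^{t(A+B)} e^{-tA}, the definition in equation 7."""
    return mat_mul(exp_nilpotent(mat_scale(t, mat_add(A, B))),
                   exp_nilpotent(mat_scale(-t, A)))

def G_solution(t):
    """e^{tB} e^{t^2 [A,B] / 2}, the matrix analogue of equation 25."""
    return mat_mul(exp_nilpotent(mat_scale(t, B)),
                   exp_nilpotent(mat_scale(t * t / 2, C)))

ts = [Fraction(1), Fraction(3, 7), Fraction(-5, 2)]
checks = [(hadamard_lhs(t) == hadamard_rhs(t), G(t) == G_solution(t))
          for t in ts]
print(checks)
```

Using `Fraction` values for ${t}$ keeps every comparison exact, so an equality failure would indicate a genuine algebra error rather than rounding.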

Setting this equal to the original definition of ${G}$ in 7 and then taking ${t=1}$ we have

$\displaystyle \begin{aligned}
e^{A+B}e^{-A} &= e^{B}e^{c/2} \ \ \ \ \ (26)\\
e^{A+B} &= e^{B}e^{A}e^{\frac{1}{2}c} \ \ \ \ \ (27)\\
&= e^{B}e^{A}e^{\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (28)
\end{aligned}$

If we swap ${A}$ with ${B}$ and use the fact that ${A+B=B+A}$, and also ${\left[A,B\right]=-\left[B,A\right]}$, we have

$\displaystyle e^{A+B}=e^{A}e^{B}e^{-\frac{1}{2}\left[A,B\right]} \ \ \ \ \ (29)$

This is the restricted form of the BCH formula for the case where ${\left[A,B\right]}$ is a scalar multiple of the identity.
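As a worked example of where this special case shows up, consider the position and momentum operators, which satisfy ${\left[\hat{x},\hat{p}\right]=i\hbar I}$. The real constants ${a}$ and ${b}$ below are introduced purely for illustration and don't appear elsewhere in this post. Applying 29 gives the following sketch:

```latex
% Assumption: a and b are arbitrary real constants (illustrative only).
A = i a \hat{x}, \qquad B = i b \hat{p}

% The commutator is a multiple of the identity, as the restricted formula requires:
\left[A,B\right] = (ia)(ib)\left[\hat{x},\hat{p}\right] = -ab\,(i\hbar I) = -i\hbar ab\, I

% Equation 29, e^{A+B} = e^{A} e^{B} e^{-\frac{1}{2}[A,B]}, then reads:
e^{i\left(a\hat{x}+b\hat{p}\right)} = e^{ia\hat{x}}\, e^{ib\hat{p}}\, e^{\frac{i}{2}\hbar ab}
```

So splitting the exponential of a linear combination of ${\hat{x}}$ and ${\hat{p}}$ into separate factors costs only a phase factor.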