# Covariant derivative: commutativity

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 18; 8.

The second absolute gradient (or covariant derivative) of a four-vector is not commutative, as we can show by a direct derivation. Starting with the formula for the absolute gradient of a four-vector:

$\displaystyle \nabla_{j}A^{k}\equiv\frac{\partial A^{k}}{\partial x^{j}}+A^{i}\Gamma_{\; ij}^{k} \ \ \ \ \ (1)$

and the formula for the absolute gradient of a mixed tensor:

$\displaystyle \nabla_{l}C_{j}^{i}=\partial_{l}C_{j}^{i}+\Gamma_{lm}^{i}C_{j}^{m}-\Gamma_{lj}^{m}C_{m}^{i} \ \ \ \ \ (2)$

we can write out the second absolute gradient of a four-vector:

$\displaystyle \nabla_{i}\left(\nabla_{j}A^{k}\right)=\partial_{i}\partial_{j}A^{k}+\Gamma_{j\ell}^{k}\partial_{i}A^{\ell}+A^{\ell}\partial_{i}\Gamma_{j\ell}^{k}-\Gamma_{ji}^{m}\left(\partial_{m}A^{k}+A^{\ell}\Gamma_{m\ell}^{k}\right)+\Gamma_{im}^{k}\left(\partial_{j}A^{m}+A^{\ell}\Gamma_{j\ell}^{m}\right) \ \ \ \ \ (3)$

If we now swap ${i}$ and ${j}$, we get, using the commutativity of ordinary derivatives and the symmetry of ${\Gamma_{ji}^{m}}$:

$\displaystyle \nabla_{j}\left(\nabla_{i}A^{k}\right)=\partial_{i}\partial_{j}A^{k}+\Gamma_{i\ell}^{k}\partial_{j}A^{\ell}+A^{\ell}\partial_{j}\Gamma_{i\ell}^{k}-\Gamma_{ji}^{m}\left(\partial_{m}A^{k}+A^{\ell}\Gamma_{m\ell}^{k}\right)+\Gamma_{jm}^{k}\left(\partial_{i}A^{m}+A^{\ell}\Gamma_{i\ell}^{m}\right) \ \ \ \ \ (4)$

Subtracting these two equations gives

$\displaystyle \left(\nabla_{i}\nabla_{j}-\nabla_{j}\nabla_{i}\right)A^{k}=\left(\partial_{i}\Gamma_{j\ell}^{k}-\partial_{j}\Gamma_{i\ell}^{k}+\Gamma_{im}^{k}\Gamma_{j\ell}^{m}-\Gamma_{jm}^{k}\Gamma_{i\ell}^{m}\right)A^{\ell} \ \ \ \ \ (5)$

Using the definition of the Riemann tensor:

$\displaystyle R_{\; j\ell m}^{i}\equiv\partial_{\ell}\Gamma_{\; mj}^{i}-\partial_{m}\Gamma_{\;\ell j}^{i}+\Gamma_{\; mj}^{k}\Gamma_{\;\ell k}^{i}-\Gamma_{\;\ell j}^{k}\Gamma_{\; km}^{i} \ \ \ \ \ (6)$

we have

$\displaystyle \left(\nabla_{i}\nabla_{j}-\nabla_{j}\nabla_{i}\right)A^{k}=R_{\;\ell ij}^{k}A^{\ell} \ \ \ \ \ (7)$

Thus covariant derivatives commute for every vector field ${A^{k}}$ only if the Riemann tensor is zero, which occurs only in flat spacetime.
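As a rough numerical illustration of 6 and 7, here is a minimal sketch that evaluates the Riemann tensor on a unit 2-sphere. The sphere, its standard Christoffel symbols, the function names and the finite-difference step are all assumptions chosen for the example; they are not taken from Moore's text.

```python
import math

# Coordinates x = (theta, phi) on a unit 2-sphere; index 0 = theta, 1 = phi.
def christoffel(x):
    """Return G[k][i][j] = Gamma^k_{ij} for the unit 2-sphere (assumed example)."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)               # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)   # Gamma^phi_{theta phi}
    return G

def d_christoffel(x, l, h=1e-5):
    """Central-difference estimate of partial_l Gamma^k_{ij}, indexed [k][i][j]."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    Gp, Gm = christoffel(xp), christoffel(xm)
    return [[[(Gp[k][i][j] - Gm[k][i][j]) / (2 * h) for j in range(2)]
             for i in range(2)] for k in range(2)]

def riemann(x):
    """R[i][j][l][m] = R^i_{jlm}, following the index placement of equation 6."""
    n = 2
    G = christoffel(x)
    dG = [d_christoffel(x, l) for l in range(n)]   # dG[l][k][i][j] = partial_l Gamma^k_{ij}
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for l in range(n):
                for m in range(n):
                    val = dG[l][i][m][j] - dG[m][i][l][j]
                    for k in range(n):
                        val += G[k][m][j] * G[i][l][k] - G[k][l][j] * G[i][k][m]
                    R[i][j][l][m] = val
    return R

R = riemann([1.0, 0.3])
# By equation 7, the theta component of the commutator acting on A is R[0][l][0][1] * A^l.
print(R[0][1][0][1], math.sin(1.0) ** 2)
```

On the sphere the component ${R_{\;\phi\theta\phi}^{\theta}}$ should come out as ${\sin^{2}\theta}$, so the commutator in 7 is non-zero there, as expected for a curved surface.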

# Covariant derivative of the metric tensor: application to a coordinate transformation

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Problem P17.10.

Here’s an application of the fact that the covariant derivative of any metric tensor is always zero. Suppose we define a coordinate transformation in which:

$\displaystyle \frac{\partial x^{a}}{\partial x^{\prime m}}=\delta_{\; m}^{a}-\left[\Gamma_{\; mn}^{a}\right]_{P}\Delta x_{P}^{\prime n} \ \ \ \ \ (1)$

where the ${\left[\Gamma_{\; mn}^{a}\right]_{P}}$ are the Christoffel symbols in the primed system evaluated at a particular point ${P}$ (and are therefore constants). (In Moore’s problem P17.10, he states that this is in the unprimed system, but the problem makes no sense in that case, since we’re summing over an index ${n}$ which refers to the unprimed coordinate system in one term and the primed system in the other.) The quantity ${\Delta x_{P}^{\prime n}\equiv x^{\prime n}-x_{P}^{\prime n}}$ represents a displacement from the point ${P}$, as measured in the primed system.

Using the usual transformation equation for a tensor, we get for the metric tensor:

$\displaystyle g_{ij}^{\prime}=\frac{\partial x^{a}}{\partial x^{\prime i}}\frac{\partial x^{b}}{\partial x^{\prime j}}g_{ab} \ \ \ \ \ (2)$

$\displaystyle \qquad=\left(\delta_{\; i}^{a}-\left[\Gamma_{\; is}^{a}\right]_{P}\Delta x_{P}^{\prime s}\right)\left(\delta_{\; j}^{b}-\left[\Gamma_{\; jn}^{b}\right]_{P}\Delta x_{P}^{\prime n}\right)g_{ab} \ \ \ \ \ (3)$

$\displaystyle \qquad=g_{ij}-g_{ib}\left[\Gamma_{\; jn}^{b}\right]_{P}\Delta x_{P}^{\prime n}-g_{aj}\left[\Gamma_{\; in}^{a}\right]_{P}\Delta x_{P}^{\prime n}+\left[\Gamma_{\; is}^{a}\right]_{P}\Delta x_{P}^{\prime s}\left[\Gamma_{\; jn}^{b}\right]_{P}\Delta x_{P}^{\prime n}g_{ab} \ \ \ \ \ (4)$

$\displaystyle \qquad=g_{ij}-\Delta x_{P}^{\prime n}\left(g_{ib}\left[\Gamma_{\; jn}^{b}\right]_{P}+g_{aj}\left[\Gamma_{\; in}^{a}\right]_{P}\right)+\left[\Gamma_{\; in}^{a}\right]_{P}\left[\Gamma_{\; js}^{b}\right]_{P}\Delta x_{P}^{\prime n}\Delta x_{P}^{\prime s}g_{ab} \ \ \ \ \ (5)$

When ${x^{\prime}=x_{P}^{\prime}}$, ${\Delta x_{P}^{\prime n}=0}$, so ${g_{ij}^{\prime}=g_{ij}}$ at point ${P}$.

Now consider the second term. By renaming the dummy indices ${b\rightarrow m}$ and ${a\rightarrow m}$, we have

$\displaystyle g_{ib}\left[\Gamma_{\; jn}^{b}\right]_{P}+g_{aj}\left[\Gamma_{\; in}^{a}\right]_{P}=g_{im}\left[\Gamma_{\; jn}^{m}\right]_{P}+g_{mj}\left[\Gamma_{\; in}^{m}\right]_{P} \ \ \ \ \ (6)$

Now because the covariant derivative (with respect to the primed coordinates) of the metric is zero, we have

$\displaystyle \nabla_{n}^{\prime}g_{ij}=\partial_{n}^{\prime}g_{ij}-\Gamma_{\; in}^{m}g_{mj}-\Gamma_{\; jn}^{m}g_{im}=0 \ \ \ \ \ (7)$

Therefore, we can write

$\displaystyle g_{ij}^{\prime}=g_{ij}-\partial_{n}^{\prime}g_{ij}\Delta x_{P}^{\prime n}+\left[\Gamma_{\; in}^{a}\right]_{P}\left[\Gamma_{\; js}^{b}\right]_{P}\Delta x_{P}^{\prime n}\Delta x_{P}^{\prime s}g_{ab} \ \ \ \ \ (8)$

If we take the derivative of this with respect to a particular primed coordinate ${x^{\prime k}}$ we can use

$\displaystyle \frac{\partial\Delta x_{P}^{\prime n}}{\partial x^{\prime k}}=\delta_{\; k}^{n} \ \ \ \ \ (9)$

and then evaluate the result at ${x^{\prime}=x_{P}^{\prime}}$ so that all terms containing ${\Delta x_{P}^{\prime n}}$ vanish. We get

$\displaystyle \frac{\partial g_{ij}^{\prime}}{\partial x^{\prime k}}=\frac{\partial g_{ij}}{\partial x^{\prime k}}-\frac{\partial g_{ij}}{\partial x^{\prime n}}\delta_{\; k}^{n} \ \ \ \ \ (10)$

$\displaystyle \qquad=0 \ \ \ \ \ (11)$

Thus all the first derivatives of ${g_{ij}^{\prime}}$ are zero in the primed coordinate system.

# Covariant derivative of the metric tensor

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; 9.

One interesting and useful theorem is that the covariant derivative of any metric tensor is always zero. We can show this by using the expression for the covariant derivative of a general tensor to say:

$\displaystyle \nabla_{j}g_{kl}=\partial_{j}g_{kl}-\Gamma_{\; jk}^{m}g_{ml}-\Gamma_{\; jl}^{m}g_{km} \ \ \ \ \ (1)$

We can combine this with the explicit expression for the Christoffel symbols:

$\displaystyle \Gamma_{\; ij}^{m}=\frac{1}{2}g^{ml}\left(\partial_{j}g_{il}+\partial_{i}g_{lj}-\partial_{l}g_{ji}\right) \ \ \ \ \ (2)$

Substituting, we get

$\displaystyle \nabla_{j}g_{kl}=\partial_{j}g_{kl}-\frac{1}{2}g^{mn}\left(\partial_{k}g_{jn}+\partial_{j}g_{nk}-\partial_{n}g_{kj}\right)g_{ml}-\frac{1}{2}g^{mn}\left(\partial_{l}g_{jn}+\partial_{j}g_{nl}-\partial_{n}g_{lj}\right)g_{km} \ \ \ \ \ (3)$

Since ${g^{mn}g_{ml}=\delta_{\; l}^{n}}$, we get

$\displaystyle \nabla_{j}g_{kl}=\partial_{j}g_{kl}-\frac{1}{2}\left(\partial_{k}g_{jl}+\partial_{j}g_{lk}-\partial_{l}g_{kj}\right)-\frac{1}{2}\left(\partial_{l}g_{jk}+\partial_{j}g_{kl}-\partial_{k}g_{lj}\right) \ \ \ \ \ (4)$

Now we use the symmetry of the metric tensor: ${g_{kl}=g_{lk}}$:

$\displaystyle \nabla_{j}g_{kl}=\partial_{j}g_{lk}-\frac{1}{2}\left(\partial_{k}g_{jl}+\partial_{j}g_{lk}-\partial_{l}g_{kj}\right)-\frac{1}{2}\left(\partial_{l}g_{kj}+\partial_{j}g_{lk}-\partial_{k}g_{jl}\right) \ \ \ \ \ (5)$

$\displaystyle \qquad=0 \ \ \ \ \ (6)$
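The theorem can also be checked numerically. The sketch below builds the Christoffel symbols from 2 and then evaluates 1. The plane polar metric ${ds^{2}=dr^{2}+r^{2}d\theta^{2}}$ is an assumed test case, and the function names and step size are hypothetical choices for illustration.

```python
def metric(x):
    """Assumed test metric: plane polar coordinates x = (r, theta)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def d_metric(x, l, h=1e-5):
    """Central-difference estimate of partial_l g_ij, indexed [i][j]."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def inverse2(g):
    """Inverse of a 2x2 matrix."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel(x):
    """G[m][i][j] = Gamma^m_{ij} from equation 2."""
    g_inv = inverse2(metric(x))
    dg = [d_metric(x, l) for l in range(2)]   # dg[l][i][j] = partial_l g_ij
    return [[[0.5 * sum(g_inv[m][l] * (dg[j][i][l] + dg[i][l][j] - dg[l][j][i])
                        for l in range(2))
              for j in range(2)] for i in range(2)] for m in range(2)]

def cov_deriv_metric(x):
    """D[j][k][l] = nabla_j g_kl from equation 1."""
    g = metric(x)
    G = christoffel(x)
    dg = [d_metric(x, l) for l in range(2)]
    return [[[dg[j][k][l] - sum(G[m][j][k] * g[m][l] + G[m][j][l] * g[k][m]
                                for m in range(2))
              for l in range(2)] for k in range(2)] for j in range(2)]

D = cov_deriv_metric([2.0, 0.5])
print(max(abs(D[j][k][l]) for j in range(2) for k in range(2) for l in range(2)))
```

Because the Christoffel symbols are built from the same derivative estimates that appear in 1, the cancellation is algebraic, so the residual should sit at rounding level whatever step size is used.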

# Covariant derivative of a vector in the Schwarzschild metric

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; 8.

Here’s another example of calculating the covariant derivative in the Schwarzschild (S) metric. We’re given a vector with coordinates in the S metric of:

$\displaystyle \mathbf{v}=\left[1-\frac{2GM}{r},0,0,0\right] \ \ \ \ \ (1)$

The covariant derivative is given by

$\displaystyle \nabla_{j}v^{k}\equiv\frac{\partial v^{k}}{\partial x^{j}}+v^{i}\Gamma_{\; ij}^{k} \ \ \ \ \ (2)$

Since the only non-zero component of ${\mathbf{v}}$ is ${v^{t}}$ and it depends only on ${r}$, most of the terms are zero.

$\displaystyle \Gamma_{\; ij}^{t}=\left[\begin{array}{cccc} 0 & \frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right)^{-1} & 0 & 0\\ \frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right)^{-1} & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (3)$

$\displaystyle \Gamma_{\; ij}^{r}=\left[\begin{array}{cccc} \frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right) & 0 & 0 & 0\\ 0 & -\frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right)^{-1} & 0 & 0\\ 0 & 0 & -r\left(1-\frac{2GM}{r}\right) & 0\\ 0 & 0 & 0 & -r\sin^{2}\theta\left(1-\frac{2GM}{r}\right) \end{array}\right] \ \ \ \ \ (4)$

$\displaystyle \Gamma_{\; ij}^{\theta}=\left[\begin{array}{cccc} 0 & 0 & 0 & 0\\ 0 & 0 & \frac{1}{r} & 0\\ 0 & \frac{1}{r} & 0 & 0\\ 0 & 0 & 0 & -\sin\theta\cos\theta \end{array}\right] \ \ \ \ \ (5)$

$\displaystyle \Gamma_{\; ij}^{\phi}=\left[\begin{array}{cccc} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \frac{1}{r}\\ 0 & 0 & 0 & \cot\theta\\ 0 & \frac{1}{r} & \cot\theta & 0 \end{array}\right] \ \ \ \ \ (6)$

The one non-zero derivative is

$\displaystyle \frac{\partial v^{t}}{\partial r}=\frac{2GM}{r^{2}} \ \ \ \ \ (7)$

and the values of the second term in 2 are

$\displaystyle v^{i}\Gamma_{\; ir}^{t}=\frac{GM}{r^{2}} \ \ \ \ \ (8)$

$\displaystyle v^{i}\Gamma_{\; it}^{r}=\frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right)^{2} \ \ \ \ \ (9)$

with all other terms being zero.

The covariant derivative is then (with ${j}$ the row index and ${k}$ the column index):

$\displaystyle \nabla_{j}v^{k}=\left[\begin{array}{cccc} 0 & \frac{GM}{r^{2}}\left(1-\frac{2GM}{r}\right)^{2} & 0 & 0\\ \frac{3GM}{r^{2}} & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right] \ \ \ \ \ (10)$
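A quick numerical cross-check of 10 is sketched below. It assumes units in which ${GM=1}$ and uses only the two Christoffel symbols that multiply ${v^{t}}$; the function names and the sample radius are hypothetical.

```python
GM = 1.0  # assumed units with G*M = 1, for illustration only

def v_t(r):
    """The single non-zero component v^t = 1 - 2GM/r."""
    return 1.0 - 2.0 * GM / r

def cov_deriv(r, h=1e-6):
    """Return nabla[j][k] = nabla_j v^k with coordinate order (t, r, theta, phi)."""
    f = 1.0 - 2.0 * GM / r
    gamma_t_tr = GM / (r * r * f)    # Gamma^t_{tr}, from equation 3
    gamma_r_tt = GM * f / (r * r)    # Gamma^r_{tt}, from equation 4
    dvt_dr = (v_t(r + h) - v_t(r - h)) / (2 * h)   # estimate of equation 7
    nabla = [[0.0] * 4 for _ in range(4)]
    nabla[1][0] = dvt_dr + v_t(r) * gamma_t_tr     # nabla_r v^t
    nabla[0][1] = v_t(r) * gamma_r_tt              # nabla_t v^r
    return nabla

r = 10.0
nabla = cov_deriv(r)
print(nabla[1][0], 3.0 * GM / r**2)
print(nabla[0][1], GM / r**2 * (1.0 - 2.0 * GM / r) ** 2)
```

Each printed pair compares the numerical value with the corresponding entry of the matrix in 10.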

# Christoffel symbols in sinusoidal coordinates

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Problem P17.6.

We are now in a position to revisit the system with sinusoidal coordinates. To review, we had a 2-d system with coordinates ${u}$ and ${w}$ defined in terms of the usual rectangular coordinates ${x}$ and ${y}$ by

$\displaystyle u=x \ \ \ \ \ (1)$

$\displaystyle w=y-A\sin\left(bx\right) \ \ \ \ \ (2)$

The metric for this system is

$\displaystyle g_{ij}=\left[\begin{array}{cc} 1+\left[Ab\cos\left(bu\right)\right]^{2} & Ab\cos\left(bu\right)\\ Ab\cos\left(bu\right) & 1 \end{array}\right] \ \ \ \ \ (3)$

We looked at an object with a velocity given by ${\mathbf{v}=\left[v,0\right]}$ where ${v}$ is a constant. Clearly the acceleration of the object is zero, but if we calculate the velocity components in the ${uw}$ system, we get

$\displaystyle v^{u}=v \ \ \ \ \ (4)$

$\displaystyle v^{w}=-Ab\cos\left(bu\right)v \ \ \ \ \ (5)$

so although ${dv^{u}/dt=0}$, ${dv^{w}/dt\ne0}$ since ${u=x=vt}$ varies with time.

To get the ‘true’ acceleration, we need to find the actual differential ${d\mathbf{v}}$ and divide this by ${dt}$. We’ve seen how to do this when we defined the Christoffel symbols:

$\displaystyle d\mathbf{v}=\left[\frac{\partial v^{k}}{\partial x^{j}}+v^{i}\Gamma_{\; ij}^{k}\right]\mathbf{e}_{k}dx^{j} \ \ \ \ \ (6)$

To calculate ${d\mathbf{v}}$, we need the ${\Gamma_{\; ij}^{k}}$ for the ${uw}$ system, which we can calculate in the usual way using the geodesic equation. We start with

$\displaystyle g_{aj}\ddot{x}^{j}+\left(\partial_{i}g_{aj}-\frac{1}{2}\partial_{a}g_{ij}\right)\dot{x}^{j}\dot{x}^{i}=0 \ \ \ \ \ (7)$

Since ${g_{ij}}$ is independent of ${w}$ only derivatives with respect to ${u}$ are non-zero. Consider first ${a=u}$; then we get

$\displaystyle \left[1+\left[Ab\cos\left(bu\right)\right]^{2}\right]\ddot{u}+\left(Ab\cos bu\right)\ddot{w}-A^{2}b^{3}\cos bu\sin bu\,\dot{u}^{2}+\left[-Ab^{2}\sin bu-\left(-Ab^{2}\sin bu\right)\right]\dot{u}\dot{w}=0 \ \ \ \ \ (8)$

$\displaystyle \left[1+\left[Ab\cos\left(bu\right)\right]^{2}\right]\ddot{u}+\left(Ab\cos bu\right)\ddot{w}-A^{2}b^{3}\cos bu\sin bu\,\dot{u}^{2}=0 \ \ \ \ \ (9)$

Now for ${a=w}$:

$\displaystyle Ab\cos bu\ddot{u}+\ddot{w}-Ab^{2}\sin bu\dot{u}^{2}=0 \ \ \ \ \ (10)$

We would like to compare these equations with the equation involving the Christoffel symbols:

$\displaystyle \ddot{x}^{m}+\Gamma_{\; ij}^{m}\dot{x}^{j}\dot{x}^{i}=0 \ \ \ \ \ (11)$

However, because the metric here is not diagonal, we get second derivatives of more than one coordinate in each equation. We can eliminate ${\ddot{w}}$ by multiplying 10 by ${Ab\cos bu}$ and subtracting it from 9. This gives the convenient result:

$\displaystyle \ddot{u}=0 \ \ \ \ \ (12)$

From this we conclude that

$\displaystyle \Gamma_{\; ij}^{u}=\left[\begin{array}{cc} 0 & 0\\ 0 & 0 \end{array}\right] \ \ \ \ \ (13)$

We can substitute 12 into 10 to get

$\displaystyle \ddot{w}-Ab^{2}\sin bu\dot{u}^{2}=0 \ \ \ \ \ (14)$

from which we conclude

$\displaystyle \Gamma_{\; ij}^{w}=\left[\begin{array}{cc} -Ab^{2}\sin bu & 0\\ 0 & 0 \end{array}\right] \ \ \ \ \ (15)$

We can now evaluate 6 to find:

$\displaystyle dv^{u}=\left[\frac{\partial v^{u}}{\partial x^{j}}+v^{i}\Gamma_{\; ij}^{u}\right]dx^{j} \ \ \ \ \ (16)$

$\displaystyle \qquad=0 \ \ \ \ \ (17)$

$\displaystyle dv^{w}=\left[\frac{\partial v^{w}}{\partial x^{j}}+v^{i}\Gamma_{\; ij}^{w}\right]dx^{j} \ \ \ \ \ (18)$

$\displaystyle \qquad=\left[Ab^{2}v\sin bu-vAb^{2}\sin bu\right]du \ \ \ \ \ (19)$

$\displaystyle \qquad=0 \ \ \ \ \ (20)$

Thus by taking the proper derivative using the Christoffel symbols, the differentials of both components of velocity are zero, so the acceleration is zero in the ${uw}$ system as well.
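The cancellation in 19 can be checked numerically with a short sketch. The values of ${A}$, ${b}$ and ${v}$ below are arbitrary assumptions, and the function names are hypothetical.

```python
import math

A, b, v = 0.5, 2.0, 3.0   # arbitrary sample values, assumed for illustration

def v_w(u):
    """The w component of the velocity, equation 5."""
    return -A * b * math.cos(b * u) * v

def gamma_w_uu(u):
    """The only non-zero Christoffel symbol, Gamma^w_{uu} from equation 15."""
    return -A * b * b * math.sin(b * u)

def dvw_terms(u, h=1e-6):
    """Return (ordinary derivative of v^w, full coefficient of du in equation 19)."""
    partial = (v_w(u + h) - v_w(u - h)) / (2 * h)
    return partial, partial + v * gamma_w_uu(u)   # v^u = v multiplies Gamma^w_{uu}

partial, total = dvw_terms(0.4)
print(partial, total)
```

The ordinary derivative on its own is non-zero, while the sum with the Christoffel term should vanish to within the finite-difference error.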

# Covariant derivative in semi-log coordinates

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Problem P17.5.

As another example of using the geodesic equation to calculate Christoffel symbols, we’ll consider the semi-log coordinates introduced earlier:

$\displaystyle p=x \ \ \ \ \ (1)$

$\displaystyle q=e^{by} \ \ \ \ \ (2)$

The invariant interval is

$\displaystyle ds^{2}=dx^{2}+dy^{2} \ \ \ \ \ (3)$

$\displaystyle \qquad=dp^{2}+\frac{1}{\left(bq\right)^{2}}dq^{2} \ \ \ \ \ (4)$

so the metric is

$\displaystyle g_{ij}=\left[\begin{array}{cc} 1 & 0\\ 0 & \frac{1}{\left(bq\right)^{2}} \end{array}\right] \ \ \ \ \ (5)$

We compare the geodesic equation:

$\displaystyle g_{aj}\ddot{x}^{j}+\left(\partial_{i}g_{aj}-\frac{1}{2}\partial_{a}g_{ij}\right)\dot{x}^{j}\dot{x}^{i}=0 \ \ \ \ \ (6)$

with the expression for the Christoffel symbols:

$\displaystyle \ddot{x}^{m}+\Gamma_{\; ij}^{m}\dot{x}^{j}\dot{x}^{i}=0 \ \ \ \ \ (7)$

We get

$\displaystyle \ddot{p}=0 \ \ \ \ \ (8)$

$\displaystyle \frac{1}{\left(bq\right)^{2}}\ddot{q}-\frac{2}{b^{2}q^{3}}\dot{q}^{2}-\frac{1}{2}\left(-\frac{2}{b^{2}q^{3}}\dot{q}^{2}\right)=0 \ \ \ \ \ (9)$

$\displaystyle \ddot{q}-\frac{1}{q}\dot{q}^{2}=0 \ \ \ \ \ (10)$

The Christoffel symbols are thus

$\displaystyle \Gamma_{\; ij}^{p}=\left[\begin{array}{cc} 0 & 0\\ 0 & 0 \end{array}\right] \ \ \ \ \ (11)$

$\displaystyle \Gamma_{\; ij}^{q}=\left[\begin{array}{cc} 0 & 0\\ 0 & -\frac{1}{q} \end{array}\right] \ \ \ \ \ (12)$

Now consider a vector field ${A^{i}=\left[0,Cx\right]}$ (where ${C}$ is a constant) in rectangular coordinates. In these coordinates, its covariant derivative is just the normal derivative, so

$\displaystyle \nabla_{i}A^{j}=\partial_{i}A^{j} \ \ \ \ \ (13)$

$\displaystyle \qquad=\left[\begin{array}{cc} 0 & C\\ 0 & 0 \end{array}\right] \ \ \ \ \ (14)$

In the semi-log system, we have

$\displaystyle \nabla_{i}A^{j}=\partial_{i}A^{j}+\Gamma_{\; ik}^{j}A^{k} \ \ \ \ \ (15)$

To use this formula we need ${A^{i}}$ in the ${pq}$ system, which we can get by the usual transformation, using:

$\displaystyle A^{p}=A^{i}\partial_{i}p=A^{x}=0 \ \ \ \ \ (16)$

$\displaystyle A^{q}=A^{i}\partial_{i}q=A^{y}\left(be^{by}\right)=Cbpq \ \ \ \ \ (17)$

With these transformations, we get

$\displaystyle \nabla_{i}A^{j}=\left[\begin{array}{cc} 0 & Cbq\\ 0 & Cbp-Cbp \end{array}\right] \ \ \ \ \ (18)$

$\displaystyle \qquad=\left[\begin{array}{cc} 0 & Cbq\\ 0 & 0 \end{array}\right] \ \ \ \ \ (19)$

The difference in gradients is because of the different scales used in the vertical direction. Consider first the vertical component of ${A}$. In rectangular coordinates this has a constant value for a given value of ${x}$ namely ${A^{y}=Cx}$. However, if we use ${q}$ to measure vertical distance, a unit change in ${y}$ results in a larger and larger change in ${q}$ the higher up the vertical axis we go, so ${A^{q}}$ must increase as ${q}$ increases, even if ${x=p}$ is held constant. This is reflected by 17, where ${A^{q}}$ is proportional to ${q}$ as well as ${p}$.

If we used ordinary derivatives to calculate the gradient of ${A}$ in the ${pq}$ system, we would get ${\partial_{q}A^{q}=Cbp}$. However, the ‘true’ value of the vertical component of ${A}$ doesn’t change as we move up or down a vertical line, and this is reflected by the ${\Gamma_{\; qk}^{q}A^{k}=-Cbp}$ correction term that is present in the covariant derivative 15, with the result that ${\nabla_{q}A^{q}=\partial_{q}A^{q}+\Gamma_{\; qk}^{q}A^{k}=Cbp-Cbp=0}$.

The value of ${\nabla_{p}A^{q}=Cbq}$ again reflects the fact that a vertical component that is constant in rectangular coordinates must get numerically larger with increasing height in the ${pq}$ system.

As a final test that all is well, we can use the standard tensor transformation to transform 19 back to rectangular coordinates, where we denote rectangular coordinates by ${r^{i}}$ and the semi-log ${pq}$ system by ${s}$:

$\displaystyle \left[\nabla_{i}A^{j}\right]_{r}=\left[\nabla_{a}A^{b}\right]_{s}\frac{\partial s^{a}}{\partial r^{i}}\frac{\partial r^{j}}{\partial s^{b}} \ \ \ \ \ (20)$

The only non-zero component of ${\left[\nabla_{a}A^{b}\right]_{s}}$ is ${\left[\nabla_{p}A^{q}\right]_{s}=Cbq}$, so the RHS has only one non-zero term, which occurs when ${a=p}$, ${i=x}$, ${b=q}$ and ${j=y}$. For this term we have

$\displaystyle \left[\nabla_{x}A^{y}\right]_{r}=Cbq\frac{\partial p}{\partial x}\frac{\partial y}{\partial q} \ \ \ \ \ (21)$

$\displaystyle \qquad=Cbq\left(1\right)\frac{1}{bq} \ \ \ \ \ (22)$

$\displaystyle \qquad=C \ \ \ \ \ (23)$

The overall transformation thus gives 14 back again.
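The whole round trip can be sketched numerically: evaluate 15 in the ${pq}$ system, then apply 20 to return to rectangular coordinates. The values of ${C}$ and ${b}$, the sample point and the function names are assumptions made for this example.

```python
C, b = 2.0, 0.5   # arbitrary sample constants, assumed for illustration

def A_q(p, q):
    """The q component of the field, equation 17 (A^p is zero)."""
    return C * b * p * q

def cov_deriv_pq(p, q, h=1e-6):
    """nabla[i][j] = nabla_i A^j in the (p, q) system, using equation 15."""
    gamma_q_qq = -1.0 / q                               # only non-zero Christoffel symbol
    d_p = (A_q(p + h, q) - A_q(p - h, q)) / (2 * h)     # partial_p A^q
    d_q = (A_q(p, q + h) - A_q(p, q - h)) / (2 * h)     # partial_q A^q
    return [[0.0, d_p], [0.0, d_q + gamma_q_qq * A_q(p, q)]]

def to_rectangular(nabla_s, q):
    """Apply equation 20 with s = (p, q) and r = (x, y)."""
    ds_dr = [[1.0, 0.0], [0.0, b * q]]           # ds_dr[a][i] = partial s^a / partial r^i
    dr_ds = [[1.0, 0.0], [0.0, 1.0 / (b * q)]]   # dr_ds[j][c] = partial r^j / partial s^c
    return [[sum(nabla_s[a][c] * ds_dr[a][i] * dr_ds[j][c]
                 for a in range(2) for c in range(2))
             for j in range(2)] for i in range(2)]

p, q = 1.5, 2.0
nabla_s = cov_deriv_pq(p, q)
print(nabla_s)                      # compare with equation 19
print(to_rectangular(nabla_s, q))   # compare with equation 14
```

Summing over all four index pairs, rather than picking out the single non-zero term by hand, confirms that the other components stay zero after the transformation.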

# Christoffel symbols: symmetry

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Box 17.3.

The Christoffel symbols are defined in terms of the basis vectors in a given coordinate system as:

$\displaystyle \boxed{\frac{\partial\mathbf{e}_{i}}{\partial x^{j}}=\Gamma_{\; ij}^{k}\mathbf{e}_{k}} \ \ \ \ \ (1)$

Remember that the basis vectors ${\mathbf{e}_{i}}$ are defined so that

$\displaystyle ds^{2}=d\mathbf{s}\cdot d\mathbf{s} \ \ \ \ \ (2)$

$\displaystyle \qquad=\left(dx^{i}\mathbf{e}_{i}\right)\cdot\left(dx^{j}\mathbf{e}_{j}\right) \ \ \ \ \ (3)$

$\displaystyle \qquad=\mathbf{e}_{i}\cdot\mathbf{e}_{j}dx^{i}dx^{j} \ \ \ \ \ (4)$

$\displaystyle \qquad\equiv g_{ij}dx^{i}dx^{j} \ \ \ \ \ (5)$

In a locally flat frame using rectangular spatial coordinates, the basis vectors ${\mathbf{e}_{i}}$ are all constants, so from 1, all the Christoffel symbols must be zero: ${\Gamma_{\; ij}^{k}=0}$.

Now let’s look at the second covariant derivative of a scalar field ${\Phi}$:

$\displaystyle \nabla_{i}\nabla_{j}\Phi=\nabla_{i}\left(\partial_{j}\Phi\right) \ \ \ \ \ (6)$

$\displaystyle \qquad=\partial_{i}\partial_{j}\Phi-\Gamma_{ij}^{k}\partial_{k}\Phi \ \ \ \ \ (7)$

where in 6 we used rule 1 for the covariant derivative: the covariant derivative of a scalar is the same as the ordinary derivative.

In the locally flat frame, this equation reduces to

$\displaystyle \nabla_{i}\nabla_{j}\Phi=\partial_{i}\partial_{j}\Phi \ \ \ \ \ (8)$

Since the covariant derivative is a tensor, this is a tensor equation, and since ordinary partial derivatives commute, this equation is the same if we swap the indices ${i}$ and ${j}$. Tensor equations must have the same form in all coordinate systems, so this implies that 7 must also be invariant if we swap ${i}$ and ${j}$. This means that the Christoffel symbols are symmetric under exchange of their two lower indices:

$\displaystyle \boxed{\Gamma_{ij}^{k}=\Gamma_{ji}^{k}} \ \ \ \ \ (9)$

At first glance, this seems wrong, since from the definition 1 this symmetry implies that

$\displaystyle \frac{\partial\mathbf{e}_{i}}{\partial x^{j}}=\frac{\partial\mathbf{e}_{j}}{\partial x^{i}} \ \ \ \ \ (10)$

In 2-D polar coordinates, if we take the usual unit vectors ${\hat{\mathbf{r}}}$ and ${\hat{\boldsymbol{\theta}}}$ then both these vectors are constants as we change ${r}$ and both of them change when we change ${\theta}$, so it’s certainly not true that ${\partial\hat{\mathbf{r}}/\partial\theta=\partial\hat{\boldsymbol{\theta}}/\partial r}$, for example. However, remember that the basis vectors we’re using are not the usual unit vectors; rather they are defined so that condition 4 is true. In polar coordinates, we have

$\displaystyle ds^{2}=dr^{2}+r^{2}d\theta^{2} \ \ \ \ \ (11)$

so

$\displaystyle \mathbf{e}_{r}=\hat{\mathbf{r}}=\cos\theta\hat{\mathbf{x}}+\sin\theta\hat{\mathbf{y}} \ \ \ \ \ (12)$

$\displaystyle \mathbf{e}_{\theta}=r\hat{\boldsymbol{\theta}}=-r\sin\theta\hat{\mathbf{x}}+r\cos\theta\hat{\mathbf{y}} \ \ \ \ \ (13)$

For the derivatives, we have

$\displaystyle \frac{\partial\mathbf{e}_{r}}{\partial\theta}=-\sin\theta\hat{\mathbf{x}}+\cos\theta\hat{\mathbf{y}}=\hat{\boldsymbol{\theta}} \ \ \ \ \ (14)$

$\displaystyle \frac{\partial\mathbf{e}_{\theta}}{\partial r}=-\sin\theta\hat{\mathbf{x}}+\cos\theta\hat{\mathbf{y}}=\hat{\boldsymbol{\theta}} \ \ \ \ \ (15)$

Thus the condition 10 is actually satisfied here.
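A small numerical sketch makes the contrast between the coordinate basis and the unit vectors concrete. The helper names, the sample point and the finite-difference step are assumptions for illustration only.

```python
import math

def e_r(r, th):
    """Coordinate basis vector e_r in Cartesian components, equation 12."""
    return (math.cos(th), math.sin(th))

def e_theta(r, th):
    """Coordinate basis vector e_theta = r * theta-hat, equation 13."""
    return (-r * math.sin(th), r * math.cos(th))

def theta_hat(r, th):
    """The usual unit vector in the theta direction."""
    return (-math.sin(th), math.cos(th))

def d_dr(f, r, th, h=1e-6):
    """Central-difference derivative of a vector-valued function with respect to r."""
    fp, fm = f(r + h, th), f(r - h, th)
    return tuple((a - c) / (2 * h) for a, c in zip(fp, fm))

def d_dtheta(f, r, th, h=1e-6):
    """Central-difference derivative of a vector-valued function with respect to theta."""
    fp, fm = f(r, th + h), f(r, th - h)
    return tuple((a - c) / (2 * h) for a, c in zip(fp, fm))

r, th = 2.0, 0.6
print(d_dtheta(e_r, r, th), d_dr(e_theta, r, th))   # both should approximate theta-hat
print(d_dr(theta_hat, r, th))                       # unit vector: no dependence on r
```

The first two derivatives agree, which is condition 10, while the unit vector ${\hat{\boldsymbol{\theta}}}$ has zero derivative with respect to ${r}$ and so cannot satisfy it.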

# Covariant derivative of a general tensor

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Box 17.2.

We’ve seen how to define the absolute gradient or covariant derivative of a contravariant vector, giving the formula:

$\displaystyle \boxed{\nabla_{j}A^{k}\equiv\frac{\partial A^{k}}{\partial x^{j}}+A^{i}\Gamma_{\; ij}^{k}} \ \ \ \ \ (1)$

The covariant derivative of this vector is a tensor, unlike the ordinary derivative. Here we see how to generalize this to get the absolute gradient of tensors of any rank.

First, let’s find the covariant derivative of a covariant vector ${B_{i}}$. The starting point is to consider ${\nabla_{j}\left(A^{i}B_{i}\right)}$. The quantity ${A^{i}B_{i}}$ is a scalar, and to proceed we require two conditions:

1. The covariant derivative of a scalar is the same as the ordinary derivative.
2. The covariant derivative obeys the product rule.

These two conditions aren’t derived; they are just required as part of the definition of the covariant derivative.

Using rule 2, we have

$\displaystyle \nabla_{j}\left(A^{i}B_{i}\right)=\left(\nabla_{j}A^{i}\right)B_{i}+A^{i}\left(\nabla_{j}B_{i}\right) \ \ \ \ \ (2)$

$\displaystyle \qquad=\left(\partial_{j}A^{i}+A^{k}\Gamma_{\; kj}^{i}\right)B_{i}+A^{i}\left(\nabla_{j}B_{i}\right) \ \ \ \ \ (3)$

We now apply rule 1 to the LHS:

$\displaystyle \nabla_{j}\left(A^{i}B_{i}\right)=\partial_{j}\left(A^{i}B_{i}\right) \ \ \ \ \ (4)$

$\displaystyle \qquad=\left(\partial_{j}A^{i}\right)B_{i}+A^{i}\left(\partial_{j}B_{i}\right) \ \ \ \ \ (5)$

Equating 3 and 5 we get

$\displaystyle \left(\partial_{j}A^{i}+A^{k}\Gamma_{\; kj}^{i}\right)B_{i}+A^{i}\left(\nabla_{j}B_{i}\right)=\left(\partial_{j}A^{i}\right)B_{i}+A^{i}\left(\partial_{j}B_{i}\right) \ \ \ \ \ (6)$

$\displaystyle B_{i}A^{k}\Gamma_{\; kj}^{i}+A^{i}\left(\nabla_{j}B_{i}\right)=A^{i}\left(\partial_{j}B_{i}\right) \ \ \ \ \ (7)$

$\displaystyle B_{i}A^{k}\Gamma_{\; kj}^{i}+A^{k}\left(\nabla_{j}B_{k}\right)=A^{k}\left(\partial_{j}B_{k}\right) \ \ \ \ \ (8)$

$\displaystyle \left[B_{i}\Gamma_{\; kj}^{i}+\left(\nabla_{j}B_{k}\right)\right]A^{k}=A^{k}\left(\partial_{j}B_{k}\right) \ \ \ \ \ (9)$

In 8 we’ve relabelled the dummy index ${i}$ to ${k}$ in the second and third terms so we could factor out ${A^{k}}$ in the last line. Since the vector ${A^{k}}$ is arbitrary, the factors multiplying it on each side must be equal, so we get

$\displaystyle B_{i}\Gamma_{\; kj}^{i}+\left(\nabla_{j}B_{k}\right)=\partial_{j}B_{k} \ \ \ \ \ (10)$

from which we can get the covariant derivative of ${B_{k}}$:

$\displaystyle \boxed{\nabla_{j}B_{k}=\partial_{j}B_{k}-B_{i}\Gamma_{\; kj}^{i}} \ \ \ \ \ (11)$

The same argument extends to a tensor of higher rank with mixed indices. First, we contract all the indices of the tensor with covariant or contravariant vectors, as appropriate, and then apply the product rule. So for example

$\displaystyle \nabla_{l}\left(C_{jk}^{i}A_{i}B^{j}D^{k}\right)=\nabla_{l}\left(C_{jk}^{i}\right)A_{i}B^{j}D^{k}+C_{jk}^{i}\nabla_{l}\left(A_{i}\right)B^{j}D^{k}+C_{jk}^{i}A_{i}\nabla_{l}\left(B^{j}\right)D^{k}+C_{jk}^{i}A_{i}B^{j}\nabla_{l}\left(D^{k}\right) \ \ \ \ \ (12)$

$\displaystyle \qquad=\partial_{l}\left(C_{jk}^{i}\right)A_{i}B^{j}D^{k}+C_{jk}^{i}\partial_{l}\left(A_{i}\right)B^{j}D^{k}+C_{jk}^{i}A_{i}\partial_{l}\left(B^{j}\right)D^{k}+C_{jk}^{i}A_{i}B^{j}\partial_{l}\left(D^{k}\right) \ \ \ \ \ (13)$

We now substitute for the covariant derivatives of the vectors in 12 and set the result equal to 13, and cancel terms. We then use the fact that ${A_{i}}$, ${B^{j}}$ and ${D^{k}}$ are all arbitrary so the factors on each side of the equation multiplying them must be equal. The result is

$\displaystyle \nabla_{l}C_{jk}^{i}=\partial_{l}C_{jk}^{i}+\Gamma_{lm}^{i}C_{jk}^{m}-\Gamma_{lj}^{m}C_{mk}^{i}-\Gamma_{lk}^{m}C_{jm}^{i} \ \ \ \ \ (14)$

In general, the rule is that for each contravariant (upper) index in the tensor, there is a positive term with a Christoffel symbol, and for each covariant (lower) index, there is a negative term.

# Christoffel symbols

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 17; Box 17.1.

It’s time to return to the study of tensors so we can pave the way for the Einstein equation which is the basis of general relativity. Up to now, we’ve assumed that the Schwarzschild metric described spacetime outside a spherical mass, without any arguments to back up this assumption. Developing these arguments requires some more tools from the tensor toolbox.

One problem with tensors is that their straightforward derivatives are not, in general, tensors themselves. For example, the first derivative of a scalar function ${f}$ is a covariant tensor, as it transforms according to

$\displaystyle \frac{\partial f}{\partial x'^{a}}=\frac{\partial x^{i}}{\partial x'^{a}}\frac{\partial f}{\partial x^{i}} \ \ \ \ \ (1)$

If we take the derivative of this equation with respect to another of the primed coordinates ${x^{\prime c}}$, we get

$\displaystyle \frac{\partial^{2}f}{\partial x^{\prime a}\partial x^{\prime c}}=\frac{\partial^{2}x^{i}}{\partial x'^{a}\partial x'^{c}}\frac{\partial f}{\partial x^{i}}+\frac{\partial x^{i}}{\partial x'^{a}}\frac{\partial^{2}f}{\partial x^{i}\partial x^{\prime c}} \ \ \ \ \ (2)$

$\displaystyle \qquad=\frac{\partial^{2}x^{i}}{\partial x'^{a}\partial x'^{c}}\frac{\partial f}{\partial x^{i}}+\frac{\partial x^{i}}{\partial x'^{a}}\frac{\partial x^{j}}{\partial x'^{c}}\frac{\partial^{2}f}{\partial x^{i}\partial x^{j}} \ \ \ \ \ (3)$

where we used the chain rule on the second term. In order for ${\frac{\partial^{2}f}{\partial x^{i}\partial x^{j}}}$ to transform like a tensor, the first term in the last line would have to be zero, but it’s clearly not, in general.

To remedy this problem, first cast your mind back to the definition of basis vectors in some coordinate system. The basis vectors ${\mathbf{e}_{i}}$ are linearly independent vectors that are tangent to the coordinate curves (${\mathbf{e}_{i}}$ points along the curve on which only ${x^{i}}$ varies and all other coordinates are constant) and that span the space. They must satisfy

$\displaystyle ds^{2}=d\mathbf{s}\cdot d\mathbf{s} \ \ \ \ \ (4)$

$\displaystyle \qquad=\left(dx^{i}\mathbf{e}_{i}\right)\cdot\left(dx^{j}\mathbf{e}_{j}\right) \ \ \ \ \ (5)$

$\displaystyle \qquad=\mathbf{e}_{i}\cdot\mathbf{e}_{j}dx^{i}dx^{j} \ \ \ \ \ (6)$

$\displaystyle \qquad\equiv g_{ij}dx^{i}dx^{j} \ \ \ \ \ (7)$

where ${g_{ij}}$ is the metric tensor. Keep in mind that, for a general coordinate system, these basis vectors need not be either orthogonal or unit vectors, and that they can change as we move around. As such, we can consider the derivative of basis vector ${\mathbf{e}_{i}}$ with respect to coordinate ${x^{j}}$ with all other coordinates held constant. Since the derivative of a vector is another vector, and the basis vectors span the space, we can express this derivative as a linear combination of the basis vectors at the point at which the derivative is taken. That is

$\displaystyle \boxed{\frac{\partial\mathbf{e}_{i}}{\partial x^{j}}=\Gamma_{\; ij}^{k}\mathbf{e}_{k}} \ \ \ \ \ (8)$

The quantities ${\Gamma_{\; ij}^{k}}$ are called Christoffel symbols, named after Elwin Bruno Christoffel, a 19th century German mathematician and physicist. (Students of GR often refer to them as the ‘Christ-awful’ symbols, since formulas involving them can be tricky to use and remember due to the number of indices involved.) It’s important to note that although ${\Gamma_{\; ij}^{k}}$ is written with indices that make it look like a tensor, it is not a tensor on its own. The transformation equation for ${\Gamma_{\; ij}^{k}}$ can be derived from its explicit form (which we haven’t got to yet), but for reference here it is:

$\displaystyle \left(\Gamma'\right)_{\; mn}^{l}=\Gamma_{\; ij}^{k}\frac{\partial x'^{l}}{\partial x^{k}}\frac{\partial x^{i}}{\partial x'^{m}}\frac{\partial x^{j}}{\partial x'^{n}}+\frac{\partial x'^{l}}{\partial x^{k}}\frac{\partial^{2}x^{k}}{\partial x'^{m}\partial x'^{n}} \ \ \ \ \ (9)$

The first term is the normal transformation for a rank-3 tensor, but the second term spoils the transformation. However, notice that if we use the special coordinate transformation where ${x'^{m}=x^{i}}$ and ${x'^{n}=x^{j}}$, then the second derivative in the second term vanishes, since ${x^{k}}$, ${x^{i}}$ and ${x^{j}}$ are independent variables in the same coordinate system, so none of them depends on the others. Furthermore, in this special case ${\frac{\partial x^{i}}{\partial x'^{m}}=\frac{\partial x^{j}}{\partial x'^{n}}=1}$, so the transformation becomes

$\displaystyle \left(\Gamma'\right)_{\; mn}^{l}=\Gamma_{\; ij}^{k}\frac{\partial x'^{l}}{\partial x^{k}} \ \ \ \ \ (10)$

which is a valid tensor transformation. That is, the Christoffel symbol’s upper index does transform as a tensor index, which makes sense, since equation 8 defines a four-vector with components ${\Gamma_{\; ij}^{k}}$ when ${i}$ and ${j}$ are held fixed.
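As a quick illustration of definition 8, here is a minimal numerical sketch for plane polar coordinates ${\left(r,\theta\right)}$. The choice of polar coordinates, the basis vectors ${\mathbf{e}_{r}=\left(\cos\theta,\sin\theta\right)}$ and ${\mathbf{e}_{\theta}=\left(-r\sin\theta,r\cos\theta\right)}$, and the function names are all assumptions made for this example rather than anything taken from Moore. The code differentiates each basis vector by central differences and then expands the result in the basis at that point, which is exactly what equation 8 asks for.

```python
import math

def basis(r, theta):
    """Coordinate basis vectors of plane polar coordinates, in Cartesian components."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return [e_r, e_theta]

def christoffel(r, theta, h=1e-5):
    """Return gamma[k][i][j] defined by d e_i / d x^j = gamma^k_ij e_k.

    Index 0 stands for r and index 1 for theta.
    """
    x = [r, theta]
    e = basis(r, theta)
    # Determinant of the matrix whose columns are e_r and e_theta.
    det = e[0][0] * e[1][1] - e[1][0] * e[0][1]
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for i in range(2):
        for j in range(2):
            plus, minus = list(x), list(x)
            plus[j] += h
            minus[j] -= h
            # Central-difference estimate of d e_i / d x^j (Cartesian components).
            d = [(basis(*plus)[i][c] - basis(*minus)[i][c]) / (2 * h)
                 for c in range(2)]
            # Solve d = a e_r + b e_theta for (a, b) by Cramer's rule.
            a = (d[0] * e[1][1] - e[1][0] * d[1]) / det
            b = (e[0][0] * d[1] - d[0] * e[0][1]) / det
            gamma[0][i][j] = a
            gamma[1][i][j] = b
    return gamma

g = christoffel(2.0, 0.7)
print("Gamma^r_theta,theta =", g[0][1][1])
print("Gamma^theta_r,theta =", g[1][0][1])
print("Gamma^theta_theta,r =", g[1][1][0])
```

At ${r=2}$ the three printed values should be close to ${-r=-2}$, ${1/r=0.5}$ and ${1/r=0.5}$, with the remaining components close to zero. This is only a finite-difference check of the definition, not the systematic method based on the metric.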

We’ll get to methods for calculating Christoffel symbols in a later post. For now, we’ll see how we can define a derivative of a tensor that is itself always another tensor. Suppose we have a vector field ${\mathbf{A}\left(x^{i}\right)}$. In a given coordinate system, we can write this in terms of the basis vectors:

$\displaystyle \mathbf{A}=A^{i}\mathbf{e}_{i} \ \ \ \ \ (11)$

If we calculate its differential we get

$\displaystyle \begin{aligned} d\mathbf{A} & =d\left(A^{i}\mathbf{e}_{i}\right)\ \ \ \ \ (12)\\
 & =\left(dA^{i}\right)\mathbf{e}_{i}+A^{i}\left(d\mathbf{e}_{i}\right)\ \ \ \ \ (13)\\
 & =\left(\frac{\partial A^{i}}{\partial x^{j}}dx^{j}\right)\mathbf{e}_{i}+A^{i}\left(\frac{\partial\mathbf{e}_{i}}{\partial x^{j}}dx^{j}\right)\ \ \ \ \ (14)\\
 & =\left(\frac{\partial A^{i}}{\partial x^{j}}dx^{j}\right)\mathbf{e}_{i}+A^{i}\Gamma_{\; ij}^{k}\mathbf{e}_{k}dx^{j}\ \ \ \ \ (15)\\
 & =\left[\frac{\partial A^{k}}{\partial x^{j}}+A^{i}\Gamma_{\; ij}^{k}\right]\mathbf{e}_{k}dx^{j}\ \ \ \ \ (16)\\
 & =\nabla_{j}A^{k}\mathbf{e}_{k}dx^{j} \ \ \ \ \ (17) \end{aligned}$

where in line 16 we relabelled the index ${i}$ to ${k}$ in the first term, and in line 17 we defined the absolute gradient:

$\displaystyle \boxed{\nabla_{j}A^{k}\equiv\frac{\partial A^{k}}{\partial x^{j}}+A^{i}\Gamma_{\; ij}^{k}} \ \ \ \ \ (18)$

The combination ${\nabla_{j}A^{k}dx^{j}}$ has only one free index ${k}$, and it appears in the form

$\displaystyle \left(\nabla_{j}A^{k}dx^{j}\right)\mathbf{e}_{k} \ \ \ \ \ (19)$

so that ${\nabla_{j}A^{k}dx^{j}}$ is component ${k}$ of a four-vector. Since the differential ${dx^{j}}$ is a tensor, the absolute gradient ${\nabla_{j}A^{k}}$ must also be a tensor of rank 2. We’ve thus found a derivative of a tensor (well, just a four-vector so far) that is itself a tensor. The term ‘absolute gradient’ seems to be peculiar to Moore’s book; it’s more commonly known as the covariant derivative.
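To see equation 18 at work, here is a small sketch in plane polar coordinates. It assumes the standard polar Christoffel symbols ${\Gamma_{\;\theta\theta}^{r}=-r}$ and ${\Gamma_{\; r\theta}^{\theta}=\Gamma_{\;\theta r}^{\theta}=1/r}$ (not derived in this post), and it takes as its test field the constant Cartesian vector ${\hat{\mathbf{x}}}$, whose polar components are ${A^{r}=\cos\theta}$ and ${A^{\theta}=-\sin\theta/r}$. The function names are made up for the example. Since the field is constant, its absolute gradient should vanish even though the partial derivatives of its components do not.

```python
import math

def gamma(k, i, j, r):
    """Christoffel symbols of plane polar coordinates (index 0 = r, 1 = theta)."""
    if k == 0 and i == 1 and j == 1:
        return -r
    if k == 1 and {i, j} == {0, 1}:
        return 1.0 / r
    return 0.0

def A(x):
    """Polar components (A^r, A^theta) of the constant Cartesian field x-hat."""
    r, theta = x
    return [math.cos(theta), -math.sin(theta) / r]

def partial(field, x, j, h=1e-5):
    """Central-difference estimate of d A^k / d x^j for every k."""
    plus, minus = list(x), list(x)
    plus[j] += h
    minus[j] -= h
    return [(p - m) / (2 * h) for p, m in zip(field(plus), field(minus))]

def covariant_derivative(field, x):
    """Return nabla[j][k] = d_j A^k + A^i Gamma^k_ij, as in equation 18."""
    r = x[0]
    a = field(x)
    result = []
    for j in range(2):
        dA = partial(field, x, j)
        result.append([dA[k] + sum(a[i] * gamma(k, i, j, r) for i in range(2))
                       for k in range(2)])
    return result

x0 = [1.5, 0.4]
dA_dtheta = partial(A, x0, 1)          # ordinary derivatives: not zero
nabla = covariant_derivative(A, x0)    # absolute gradient: zero up to rounding
print("partial_theta A =", dA_dtheta)
print("nabla_j A^k     =", nabla)
```

The ordinary derivative ${\partial_{\theta}A^{r}=-\sin\theta}$ is clearly non-zero, but the Christoffel terms cancel it, so all four components of ${\nabla_{j}A^{k}}$ come out at the level of the finite-difference error.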

# Parallel transport of tensors

Required math: algebra, calculus

Required physics: none

Reference: d’Inverno, Ray, Introducing Einstein’s Relativity (1992), Oxford University Press. – Section 6.4; Problem 6.7.

From basic linear algebra, we’re familiar with the idea of moving a vector parallel to itself. To add two vectors A and B in 3-d Euclidean space, for example, we’re usually told to move the vector B parallel to itself so that its tail coincides with the head of A and then draw the sum as the vector from the tail of A to the head of B.

Moving a vector in this way always works in ‘flat’ space such as 3-d space spanned by the three rectangular basis vectors because this space is uniform everywhere. To see that this isn’t always the case for different types of space, consider the example of a sphere.

Here, the notion of a vector has to be defined a bit more carefully. In 3-d flat space, a vector belongs to the same space as the space used to define the points between which we can move. Said another way, the tangent space to a point in 3-d flat space is just that very 3-d flat space itself. On a sphere, though, the tangent space to a point on the sphere is a plane which intersects the sphere at that single point. The tangent space (the plane) is no longer the same space as the surface of the sphere. The vector tangent to the sphere at a point exists in a different space from the sphere itself.

To make matters worse, every tangent vector on the sphere exists in a different tangent space, since the plane tangent to the sphere varies as we move around the sphere. Thus if we try to move the tangent vector and, in some sense, keep it parallel to itself, we are faced with the problem of comparing vectors from different tangent spaces.

A common but illustrative example is to start with a vector tangent to the Earth (assuming the Earth is a sphere, which isn’t quite right, but it will do) at the intersection of the Greenwich meridian (${0^{\circ}}$ longitude) and the equator (${0^{\circ}}$ latitude), with the vector pointing north. If we now walk due north along this line until we reach the north pole and keep the vector parallel to itself (in the sense of keeping it pointing due north and tangent to the sphere at each point), then when we reach the north pole, we’ll have a vector that is horizontal and which points down the ${180^{\circ}}$ meridian.

However, if we return to our starting point on the equator and now walk due east until we reach a longitude of ${90^{\circ}}$E, still keeping our vector pointing due north, then walk north along this meridian until we again reach the north pole, we will now end up with a vector that is horizontal and pointing down the ${90^{\circ}}$W meridian. The angle between the two north pole vectors is thus ${90^{\circ}}$, even though we took great care to keep the vector ‘constant’ as we moved it through the two paths.

This example serves to show that if we are in a curved space, the result of parallel transporting a vector (and by extension, a tensor) depends on the path we take. This path dependence is a general feature of curved spaces, and there is no clever redefinition of ‘parallel’ that eliminates it.
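The two walks to the north pole can be checked numerically. The sketch below does not use the covariant derivative; instead it treats the unit sphere as a surface in ordinary 3-d space and, at each small step, projects the vector back onto the local tangent plane while keeping its length fixed. This projection scheme is an assumption brought in for the example (it is a common discrete stand-in for parallel transport on an embedded surface), and the function names are invented here.

```python
import math

def point(lat, lon):
    """Unit vector from the centre of the sphere to (latitude, longitude), in radians."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport(v, path):
    """Carry v along path, projecting onto the tangent plane at each new point."""
    for p in path:
        length = math.sqrt(dot(v, v))
        normal_part = dot(v, p)
        w = [vc - normal_part * pc for vc, pc in zip(v, p)]  # drop the normal part
        scale = length / math.sqrt(dot(w, w))                # keep the length fixed
        v = [scale * wc for wc in w]
    return v

n = 1000
quarter = math.pi / 2
north = [0.0, 0.0, 1.0]  # tangent vector pointing north at latitude 0, longitude 0

# Walk 1: straight up the Greenwich meridian to the pole.
walk1 = [point(quarter * k / n, 0.0) for k in range(1, n + 1)]
# Walk 2: east along the equator to 90 E, then up that meridian to the pole.
walk2 = ([point(0.0, quarter * k / n) for k in range(1, n + 1)]
         + [point(quarter * k / n, quarter) for k in range(1, n + 1)])

v1 = transport(north, walk1)
v2 = transport(north, walk2)
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, dot(v1, v2)))))
print("walk 1:", v1)
print("walk 2:", v2)
print("angle between them (degrees):", angle_deg)
```

The first vector ends up along ${-x}$ (down the ${180^{\circ}}$ meridian), the second along ${-y}$ (down the ${90^{\circ}}$W meridian), and the angle between them is ${90^{\circ}}$ to within rounding error.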

How can we describe this situation mathematically? Returning to our 3-d flat-space situation for a moment, suppose we define some curve by means of a parameter ${u}$, so that

$\displaystyle x^{a}=x^{a}\left(u\right) \ \ \ \ \ (1)$

Now suppose we have a vector v and we want to transport this vector along this curve in such a way that the vector doesn’t change. Mathematically, what we’re saying is that

$\displaystyle \frac{dv^{a}}{du}=0 \ \ \ \ \ (2)$

for each vector component ${v^{a}}$. Introducing the coordinates, we can expand this using the chain rule to get

$\displaystyle \frac{dv^{a}}{du}=\frac{\partial v^{a}}{\partial x^{b}}\frac{dx^{b}}{du}=0 \ \ \ \ \ (3)$

Since the tangent vector to the curve has components ${dx^{b}/du}$, this equation contracts the tangent vector with the partial derivatives of ${v^{a}}$.

Now we’ve seen that in general, the partial derivative of a vector is not a tensor, and that we got round this problem by introducing the covariant derivative, which is a tensor. Since the tangent vector ${dx^{b}/du}$ is a tensor we don’t need to modify it, but to make the derivative of ${v^{a}}$ along the curve a tensor, we need to generalize the ordinary derivative above to a covariant derivative. That is, we define the condition for parallel transport of a vector in a general curved coordinate system to be

$\displaystyle \frac{dv^{a}}{du}=\frac{dx^{b}}{du}v_{;b}^{a}=0 \ \ \ \ \ (4)$

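Here ${dv^{a}/du}$ no longer means the ordinary derivative of equation 3; it now stands for the derivative along the curve built from the covariant derivative. Expanding the covariant derivative, we get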

$\displaystyle \frac{dv^{a}}{du}=\frac{dx^{b}}{du}\left(\frac{\partial v^{a}}{\partial x^{b}}+v^{c}\Gamma_{cb}^{a}\right) \ \ \ \ \ (5)$

We see that if the connections ${\Gamma_{cb}^{a}}$ are all zero, this reduces to the flat-space case. Further, the condition for parallel transport becomes

$\displaystyle \frac{dx^{b}}{du}\left(\frac{\partial v^{a}}{\partial x^{b}}+v^{c}\Gamma_{cb}^{a}\right)=0 \ \ \ \ \ (6)$
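As a sketch of how equation 6 is used in practice, we can integrate it around a circle of constant colatitude ${\theta_{0}}$ on the unit sphere, taking ${u=\phi}$. The example assumes the standard Christoffel symbols for the sphere, ${\Gamma_{\phi\phi}^{\theta}=-\sin\theta\cos\theta}$ and ${\Gamma_{\theta\phi}^{\phi}=\Gamma_{\phi\theta}^{\phi}=\cot\theta}$, which are not derived in this post. Along the curve, ${\frac{\partial v^{a}}{\partial x^{b}}\frac{dx^{b}}{du}}$ is just the ordinary rate of change of the component, so equation 6 becomes a pair of ordinary differential equations, solved here with a hand-written fourth-order Runge-Kutta step. The function name and step count are choices made for the example.

```python
import math

def transport_around_latitude(theta0, steps=2000):
    """Parallel transport (v^theta, v^phi) once around the circle theta = theta0."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        v_theta, v_phi = v
        # Equation 6 with u = phi: dv^a/dphi = -Gamma^a_{c phi} v^c
        return (s * c * v_phi, -(c / s) * v_theta)

    v = (1.0, 0.0)              # start with the vector pointing along e_theta
    h = 2 * math.pi / steps
    for _ in range(steps):      # classical fourth-order Runge-Kutta
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

theta0 = math.pi / 3
v_final = transport_around_latitude(theta0)
# Squared length with the sphere's metric: (v^theta)^2 + sin^2(theta) (v^phi)^2
norm_sq = v_final[0] ** 2 + (math.sin(theta0) * v_final[1]) ** 2
print("components after one circuit:", v_final)
print("squared length:", norm_sq)
```

The equations can also be solved by hand, giving ${v^{\theta}=\cos\left(\phi\cos\theta_{0}\right)}$ and ${v^{\phi}=-\sin\left(\phi\cos\theta_{0}\right)/\sin\theta_{0}}$. For ${\theta_{0}=60^{\circ}}$ the vector therefore returns to its starting point reversed, with components close to ${\left(-1,0\right)}$, while its length is unchanged.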

For a general tensor, the covariant derivative is

$\displaystyle T_{cd\ldots;e}^{ab\ldots}=\partial_{e}T_{cd\ldots}^{ab\ldots}+T_{cd\ldots}^{fb\ldots}\Gamma_{fe}^{a}+T_{cd\ldots}^{af\ldots}\Gamma_{fe}^{b}+\ldots-T_{fd\ldots}^{ab\ldots}\Gamma_{ce}^{f}-T_{cf\ldots}^{ab\ldots}\Gamma_{de}^{f}-\ldots \ \ \ \ \ (7)$

so we’d need to plug this into the equation above to get

$\displaystyle \begin{aligned} \frac{dT_{cd\ldots}^{ab\ldots}}{du} & =\frac{dx^{e}}{du}T_{cd\ldots;e}^{ab\ldots}\ \ \ \ \ (8)\\
 & =\frac{dx^{e}}{du}\left[\partial_{e}T_{cd\ldots}^{ab\ldots}+T_{cd\ldots}^{fb\ldots}\Gamma_{fe}^{a}+T_{cd\ldots}^{af\ldots}\Gamma_{fe}^{b}+\ldots-T_{fd\ldots}^{ab\ldots}\Gamma_{ce}^{f}-T_{cf\ldots}^{ab\ldots}\Gamma_{de}^{f}-\ldots\right] \ \ \ \ \ (9) \end{aligned}$

The tangent vector can be written as

$\displaystyle \frac{dx^{a}}{du}\equiv X^{a} \ \ \ \ \ (10)$

and we get what is called the absolute derivative of a tensor:

$\displaystyle \frac{dT_{cd\ldots}^{ab\ldots}}{du}=\frac{dx^{e}}{du}T_{cd\ldots;e}^{ab\ldots} \ \ \ \ \ (11)$

We can generalize this idea to get the contraction of any vector ${X}$ with the covariant derivative of a tensor ${T}$. The notation used for this is:

$\displaystyle \nabla_{X}T_{cd\ldots}^{ab\ldots}\equiv X^{e}T_{cd\ldots;e}^{ab\ldots} \ \ \ \ \ (12)$

This quantity satisfies some standard properties, which we can derive for the case of vector fields. Suppose we have three vector fields ${W,\; Y}$ and ${Z}$, two scalar functions ${f}$ and ${g}$, and two constants ${\lambda}$ and ${\mu}$. First:

$\displaystyle \begin{aligned} \nabla_{W}\left(\lambda Y+\mu Z\right)^{a} & =W^{b}\left(\lambda\frac{\partial Y^{a}}{\partial x^{b}}+\lambda Y^{c}\Gamma_{cb}^{a}+\mu\frac{\partial Z^{a}}{\partial x^{b}}+\mu Z^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (13)\\
 & =\lambda W^{b}\left(\frac{\partial Y^{a}}{\partial x^{b}}+Y^{c}\Gamma_{cb}^{a}\right)+\mu W^{b}\left(\frac{\partial Z^{a}}{\partial x^{b}}+Z^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (14)\\
 & =\lambda\nabla_{W}Y^{a}+\mu\nabla_{W}Z^{a} \ \ \ \ \ (15) \end{aligned}$

Second:

$\displaystyle \begin{aligned} \nabla_{fW+gY}Z^{a} & =\left(fW^{b}+gY^{b}\right)\left(\frac{\partial Z^{a}}{\partial x^{b}}+Z^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (16)\\
 & =fW^{b}\left(\frac{\partial Z^{a}}{\partial x^{b}}+Z^{c}\Gamma_{cb}^{a}\right)+gY^{b}\left(\frac{\partial Z^{a}}{\partial x^{b}}+Z^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (17)\\
 & =f\nabla_{W}Z^{a}+g\nabla_{Y}Z^{a} \ \ \ \ \ (18) \end{aligned}$

Finally:

$\displaystyle \begin{aligned} \nabla_{W}\left(fY\right)^{a} & =W^{b}\left(\frac{\partial\left(fY\right)^{a}}{\partial x^{b}}+\left(fY\right)^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (19)\\
 & =W^{b}\left(Y^{a}\partial_{b}f+f\partial_{b}Y^{a}+fY^{c}\Gamma_{cb}^{a}\right)\ \ \ \ \ (20)\\
 & =Y^{a}W^{b}\partial_{b}f+f\nabla_{W}Y^{a}\ \ \ \ \ (21)\\
 & =Y^{a}\left(W\cdot\nabla f\right)+f\nabla_{W}Y^{a} \ \ \ \ \ (22) \end{aligned}$

where ${\nabla f}$ is the ordinary gradient of ${f}$, so that ${W\cdot\nabla f=W^{b}\partial_{b}f}$. (I believe the answer given in d’Inverno’s problem 6.7 is wrong here.)
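The last property is easy to test numerically. The sketch below works in plane polar coordinates, assuming the standard symbols ${\Gamma_{\theta\theta}^{r}=-r}$ and ${\Gamma_{r\theta}^{\theta}=\Gamma_{\theta r}^{\theta}=1/r}$, and uses arbitrary made-up fields ${W}$, ${Y}$ and ${f}$ chosen only for illustration. Derivatives are estimated by central differences, so the two sides agree only up to a small numerical error.

```python
import math

def gamma(a, c, b, r):
    """Christoffel symbols of plane polar coordinates (index 0 = r, 1 = theta)."""
    if a == 0 and c == 1 and b == 1:
        return -r
    if a == 1 and {c, b} == {0, 1}:
        return 1.0 / r
    return 0.0

def d(func, x, b, h=1e-5):
    """Central-difference estimate of the partial derivative of a scalar function."""
    plus, minus = list(x), list(x)
    plus[b] += h
    minus[b] -= h
    return (func(plus) - func(minus)) / (2 * h)

def nabla(W, Y, x):
    """Components of nabla_W Y^a = W^b (d_b Y^a + Y^c Gamma^a_cb) at the point x."""
    w, y, r = W(x), Y(x), x[0]
    out = []
    for a in range(2):
        total = 0.0
        for b in range(2):
            dY = d(lambda p: Y(p)[a], x, b)
            conn = sum(y[c] * gamma(a, c, b, r) for c in range(2))
            total += w[b] * (dY + conn)
        out.append(total)
    return out

# Arbitrary illustrative fields in polar components (r, theta).
def W(x):
    return [x[0] * math.cos(x[1]), 1.0 + x[0]]

def Y(x):
    return [math.sin(x[1]), x[0] ** 2]

def f(x):
    return x[0] ** 2 * math.cos(x[1])

def fY(x):
    return [f(x) * comp for comp in Y(x)]

x0 = [1.3, 0.8]
lhs = nabla(W, fY, x0)                                  # nabla_W (fY)^a
w_grad_f = sum(W(x0)[b] * d(f, x0, b) for b in range(2))  # W . grad f
nabla_WY = nabla(W, Y, x0)
rhs = [Y(x0)[a] * w_grad_f + f(x0) * nabla_WY[a] for a in range(2)]
print("left side: ", lhs)
print("right side:", rhs)
```

The two printed lists should match to many decimal places, consistent with equation 22. The same `nabla` helper can be reused to check the two linearity properties in the same way.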