
Tensor index notation

Required math: algebra

Required physics: none

Reference: Moore, Thomas A., A General Relativity Workbook, University Science Books (2013) – Chapter 4; 1,2.

Here are a few examples of the index notation and summation convention as used in tensor algebra. First, a summary of the rules for correct use of index notation:

  1. An index that appears twice within the same term, once as a superscript (upper) and once as a subscript (lower), is to be summed over.
  2. A repeated index must not occur more than twice within a single term.
  3. Any index that is not repeated must occur exactly once in every term of an equation, and in the same position (up or down) in each term. (Exception: a tensor expression may be set equal to zero.)
  4. Any repeated (otherwise known as ‘dummy’ or ‘bound’) index may be renamed to any other symbol, provided it doesn’t violate any of the other rules.
  5. Any single (otherwise known as ‘free’) index may be renamed to any other symbol, provided it is renamed in every term and the new symbol still occurs only once in each term.

Before proceeding, I should note that in relativity, some books use the convention that a Greek index can take on all values from 0 to 3 (that is, the time coordinate and all three space coordinates), while a Latin index takes on only the values 1, 2, 3 (space coordinates only). Other books reverse this convention. Sadly, the two books I’ve chosen to study (D’Inverno’s and Moore’s) use opposite conventions. I’ll use Latin indexes for all four coordinates (as D’Inverno does), since they’re easier to type, but be aware that other books use other conventions, so check which one your favourite book uses before applying what you see here.

Some examples:

  1. {0=m^{2}+\left(p^{i}\right)^{2}}. This is invalid, since the {m^{2}} term doesn’t have any index, so it violates rule 3.
  2. {dF^{ij}/d\tau=0}. This is OK, since it is an example of the exception to rule 3.
  3. {dp^{i}/d\tau=g}, where {g} is a constant. Invalid; violates rule 3, since {i} is a free index on the LHS but {g} has no index.
  4. {F_{ab}=\eta_{ai}\eta_{bj}F^{ik}}. Invalid. Indexes {j} and {k} occur on RHS only; violates rule 3.
  5. {A^{ab}=\eta_{ai}\eta_{bj}F^{ij}}. Invalid, since {a} and {b} occur below on the RHS and above on the LHS.
  6. {A^{i}=\delta_{\;\; a}^{i}A^{a}}. OK.
  7. {0=A^{i}+B^{j}}. Invalid, since the terms on the RHS have different free indexes.
  8. {qF^{ij}=\frac{dp^{i}}{d\tau}}. Invalid, since RHS has no index {j}.
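One way to make rules 1 to 3 concrete is to check them mechanically. The Python sketch below is only an illustration built on my own assumptions: each term is written as a hypothetical list of `(symbol, position)` pairs, with position `"up"` or `"down"`; a term with no indexes (like {m^{2}} or {g}) is an empty list; and a side of the equation that is just 0 is an empty list of terms. The function names are likewise my own. It reports an equation as invalid if any term breaks rule 1 or 2, or if the terms don’t all share the same free indexes in the same positions (rule 3).

```python
from collections import Counter

# Hypothetical encoding: a term is a list of (symbol, position) pairs,
# where position is "up" or "down". A term with no indexes is [].


def free_indexes(term):
    """Return {symbol: position} for the free indexes of one term.

    Raises ValueError if an index occurs more than twice (rule 2) or if a
    repeated index is not one up and one down (rule 1).
    """
    counts = Counter(symbol for symbol, _ in term)
    free = {}
    for symbol, n in counts.items():
        positions = [pos for s, pos in term if s == symbol]
        if n > 2:
            raise ValueError(f"{symbol} occurs {n} times (rule 2)")
        if n == 2 and positions[0] == positions[1]:
            raise ValueError(f"{symbol} is repeated but both are {positions[0]} (rule 1)")
        if n == 1:
            free[symbol] = positions[0]
    return free


def check_equation(lhs, rhs):
    """Check rules 1 to 3 for an equation given as two lists of terms.

    A side that is just 0 is an empty list of terms, so it imposes no free
    indexes (the exception to rule 3).
    """
    try:
        frees = [free_indexes(term) for term in lhs + rhs]
    except ValueError:
        return False
    # Rule 3: every term must have the same free indexes in the same positions.
    return all(f == frees[0] for f in frees)


up, down = "up", "down"
# Example 6: A^i = delta^i_a A^a
print(check_equation([[("i", up)]], [[("i", up), ("a", down), ("a", up)]]))  # True
# Example 7: 0 = A^i + B^j
print(check_equation([], [[("i", up)], [("j", up)]]))  # False
```

Note that the encoding keeps 0 (no terms at all) separate from a constant like {g} (one term with no indexes), which is exactly the distinction between examples 2 and 3.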

Now some examples of renaming indexes:

  1. {A^{2}=\eta_{ab}A^{a}A^{b}\implies A^{2}=\eta_{ij}A^{a}A^{b}}. Wrong, since the indexes on the RHS are no longer repeated.
  2. {0=\eta_{ab}A^{b}+\eta_{ai}B^{i}\implies0=\eta_{ab}\left(A^{b}+B^{b}\right)}. OK.
  3. {\eta_{ij}=\eta_{ab}\Lambda_{\;\; i}^{a}\Lambda_{\;\; j}^{b}\implies\eta_{ij}=\eta_{aa}\Lambda_{\;\; i}^{a}\Lambda_{\;\; j}^{a}}. Wrong, since {a} is repeated 4 times on RHS, so violates rule 2.
  4. {\frac{dp^{i}}{d\tau}=qF^{ij}\eta_{ja}u^{a}\implies\frac{dp^{i}}{d\tau}=qF^{ij}\eta_{ji}u^{i}}. Wrong, since {i} is a free index on LHS and is repeated 3 times on RHS.
  5. {\left(\Lambda^{-1}\right)_{\;\; i}^{a}\eta_{aj}=\eta_{ib}\Lambda_{\;\; j}^{b}\implies\left(\Lambda^{-1}\right)_{\;\; i}^{b}\eta_{bj}=\eta_{ia}\Lambda_{\;\; j}^{a}}. OK, as only dummy indexes have been relabelled and they still occur twice each after relabelling.
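Renaming a dummy index is safe because a dummy index is really just the loop variable of a sum. As a quick numerical check of example 2, the sketch below evaluates both sides for every value of the free index {a}. I’ve assumed {\eta_{ab}=\mathrm{diag}(-1,1,1,1)} and made up the components of {A} and {B}; neither choice matters, since relabelling works for any values. The function names are my own.

```python
# Assumed metric; any fixed 4x4 matrix would do for this check.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
INDEX_RANGE = range(4)  # Latin indexes run over 0, 1, 2, 3 here

A = [2.0, 1.0, -3.0, 0.5]   # made-up components A^b
B = [1.0, 4.0, 0.0, -2.0]   # made-up components B^i


def separate_sums(a):
    """eta_{ab} A^b + eta_{ai} B^i for a fixed value of the free index a."""
    return (sum(ETA[a][b] * A[b] for b in INDEX_RANGE)
            + sum(ETA[a][i] * B[i] for i in INDEX_RANGE))


def factored_sum(a):
    """eta_{ab} (A^b + B^b): the dummy index i has been renamed to b."""
    return sum(ETA[a][b] * (A[b] + B[b]) for b in INDEX_RANGE)


# The free index a is not summed; the equation must hold for each value of a.
for a in INDEX_RANGE:
    print(a, separate_sums(a), factored_sum(a))
```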


Four-vectors: basics

Required math: algebra, vectors

Required physics: basics of relativity

In relativity, a four-vector is a vector with four components. A general four-vector {\vec{A}} is denoted by a letter with an arrow on top, and its components in the coordinate system of an observer {\mathcal{O}} are written as

\displaystyle  \vec{A}\stackrel{\mathcal{O}}{\rightarrow}(A^{0},A^{1},A^{2},A^{3}) \ \ \ \ \ (1)

This definition contains an important point. The symbol {\stackrel{\mathcal{O}}{\rightarrow}} is used instead of an equals sign, and is to be read ‘the vector {\vec{A}}, in the coordinate system used by observer {\mathcal{O}}, has components {(A^{0},A^{1},A^{2},A^{3})}’.

A curious point, if you haven’t seen it before, is that the components of a vector are labelled with superscript indexes, rather than the subscripts more usual in linear algebra. The reason for this becomes apparent when we get a bit deeper into the theory; for now, just accept it. These superscripts are not exponents, but merely labels.

In Euclidean three-dimensional space, a three-vector (one with three components) has different components depending on which coordinate system we use to describe it. The important point is that the vector exists independently of any coordinate system: its components depend on the coordinate system, but the vector itself does not.

In two-dimensional space, for example, we might define a vector {\vec{x}} in one coordinate system by the coordinates {(1,0)}, which means that it extends one unit along (and is parallel to) the {x} axis. If we rotate the coordinate system by 90 degrees so that the {x} axis rotates into the {y} axis, the vector itself does not move, and it now extends one unit in the {-y} direction, so its new coordinates are {(0,-1)}. It is not correct to say that {\vec{x}} equals either of these coordinate descriptions; it is correct only to say that it has these numerical coordinates in two particular systems. In other systems, it will have other coordinates.
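The rotation example can be checked numerically. This sketch is my own and uses the standard formula for the components of a fixed vector when the axes are rotated counterclockwise by an angle {\theta}; the function name is hypothetical. The components change, but the length does not.

```python
import math


def components_after_axis_rotation(x, y, theta):
    """Components of a fixed 2D vector after the axes rotate counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y)


old = (1.0, 0.0)
new = components_after_axis_rotation(*old, math.pi / 2)  # x axis rotated into the y axis
print(new)                                  # approximately (0, -1), up to rounding
print(math.hypot(*old), math.hypot(*new))   # the length is the same in both systems
```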

In Euclidean space, a common vector is the displacement vector, which gives the distance and direction from one point to another. Again, the points and the vector connecting them do not depend on the coordinate system, but the vector’s components will differ from one system to another. However, the length of the vector (the distance between the points) is invariant under a change of coordinates and is given by the standard Euclidean distance formula

\displaystyle  d=\sqrt{(\Delta x)^{2}+(\Delta y)^{2}+(\Delta z)^{2}} \ \ \ \ \ (2)

assuming we are using Cartesian coordinates.

In relativity, the analog of the displacement vector is the interval between two events. Remember that an event is something that occurs at a specific time and place, although the actual numerical values of this time and place depend on the observer who measures them. We saw in an earlier post that the square of the interval between two events is an invariant and, in units where {c=1}, is given by

\displaystyle  \Delta s^{2}=-(\Delta t)^{2}+(\Delta x)^{2}+(\Delta y)^{2}+(\Delta z)^{2} \ \ \ \ \ (3)

We can therefore define the displacement vector in relativity as

\displaystyle  \Delta\vec{x}\stackrel{\mathcal{O}}{\rightarrow}(\Delta t,\Delta x,\Delta y,\Delta z) \ \ \ \ \ (4)

The magnitude of this vector is usually defined as the square of the interval, so we have

\displaystyle  \Delta\vec{x}^{2}=-(\Delta t)^{2}+(\Delta x)^{2}+(\Delta y)^{2}+(\Delta z)^{2} \ \ \ \ \ (5)

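As usual, the squared interval can be positive, zero or negative, depending on the separation of the events: with this sign convention, it is negative for a timelike separation, zero for a lightlike (null) separation and positive for a spacelike separation.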
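A small helper makes the sign convention of equation 5 concrete. This is a sketch assuming units where {c=1}; the function names are my own, and the labels are the standard names for negative, zero and positive squared intervals with this sign convention.

```python
def interval_squared(dt, dx, dy, dz):
    """Squared interval of equation 5, in units where c = 1."""
    return -dt ** 2 + dx ** 2 + dy ** 2 + dz ** 2


def classify(ds2):
    """Standard name for the separation, given the sign of the squared interval."""
    if ds2 < 0:
        return "timelike"
    if ds2 > 0:
        return "spacelike"
    return "lightlike (null)"


print(classify(interval_squared(2, 1, 0, 0)))  # -4 + 1 = -3: timelike
print(classify(interval_squared(1, 3, 0, 0)))  # -1 + 9 = 8: spacelike
print(classify(interval_squared(5, 3, 4, 0)))  # -25 + 9 + 16 = 0: lightlike (null)
```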

Four-vectors obey the usual rules for addition and scalar multiplication, acting component by component, so for a scalar {k} we get

\displaystyle   \vec{A}+\vec{B} \displaystyle  \stackrel{\mathcal{O}}{\rightarrow} \displaystyle  (A^{0}+B^{0},A^{1}+B^{1},A^{2}+B^{2},A^{3}+B^{3})\ \ \ \ \ (6)
\displaystyle  k\vec{A} \displaystyle  \stackrel{\mathcal{O}}{\rightarrow} \displaystyle  (kA^{0},kA^{1},kA^{2},kA^{3}) \ \ \ \ \ (7)

We’ve seen that the space and time coordinates of an event transform between observers according to the Lorentz transformations. To write these transformations compactly, we need to introduce a bit more notation.

First, equation 1 shows how a vector is represented in one coordinate system. In the coordinate system of another observer {\bar{\mathcal{O}}}, we write the same vector as

\displaystyle  \vec{A}\stackrel{\bar{\mathcal{O}}}{\rightarrow}(A^{\bar{0}},A^{\bar{1}},A^{\bar{2}},A^{\bar{3}}) \ \ \ \ \ (8)

Notice that the bars go on the indexes of the vector components and not on the symbol {A} that we are using to represent the vector itself. This emphasizes the fact that the vector doesn’t change when we change coordinate systems; only the components used to describe it change. Thus any time we see a bar over a component’s index, that component is measured by observer {\bar{\mathcal{O}}}.

Back to the Lorentz transformations. For a frame 2 moving with speed {v} along the {x} axis of frame 1, and in units where {c=1}, we’ve seen them written as four equations:

\displaystyle   t_{2} \displaystyle  = \displaystyle  \frac{1}{\sqrt{1-v^{2}}}(t_{1}-vx_{1})\ \ \ \ \ (9)
\displaystyle  x_{2} \displaystyle  = \displaystyle  \frac{1}{\sqrt{1-v^{2}}}(x_{1}-vt_{1})\ \ \ \ \ (10)
\displaystyle  y_{2} \displaystyle  = \displaystyle  y_{1}\ \ \ \ \ (11)
\displaystyle  z_{2} \displaystyle  = \displaystyle  z_{1} \ \ \ \ \ (12)

These transformations were derived for displacements between events, but we can extend them to all four-vectors. That is, we require that the components of any four-vector transform from the unbarred system (subscript 1 in equations 9 to 12) to the barred system (subscript 2) by the same Lorentz transformation:

\displaystyle   A^{\bar{0}} \displaystyle  = \displaystyle  \gamma(A^{0}-vA^{1})\ \ \ \ \ (13)
\displaystyle  A^{\bar{1}} \displaystyle  = \displaystyle  \gamma(A^{1}-vA^{0})\ \ \ \ \ (14)
\displaystyle  A^{\bar{2}} \displaystyle  = \displaystyle  A^{2}\ \ \ \ \ (15)
\displaystyle  A^{\bar{3}} \displaystyle  = \displaystyle  A^{3} \ \ \ \ \ (16)

where we’ve used the shorthand symbol {\gamma=1/\sqrt{1-v^{2}}}. The index 0 corresponds to the time coordinate, and the indexes 1, 2 and 3 to the spatial coordinates.
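Equations 13 to 16 translate directly into a function. This is a minimal sketch assuming units where {c=1}, a boost along the {x} axis, and components stored as a Python tuple in the order {(A^{0},A^{1},A^{2},A^{3})}; the name `lorentz_boost_x` is my own.

```python
import math


def lorentz_boost_x(A, v):
    """Barred components of the four-vector A, using equations 13 to 16.

    A = (A^0, A^1, A^2, A^3) in the unbarred system; v is the relative speed
    along the x axis in units where c = 1, so it must satisfy |v| < 1.
    """
    if not -1.0 < v < 1.0:
        raise ValueError("need |v| < 1 in units where c = 1")
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    a0, a1, a2, a3 = A
    return (gamma * (a0 - v * a1),  # equation 13
            gamma * (a1 - v * a0),  # equation 14
            a2,                     # equation 15
            a3)                     # equation 16


print(lorentz_boost_x((1.0, 0.0, 0.0, 0.0), 0.6))  # gamma = 1.25, so about (1.25, -0.75, 0.0, 0.0)
```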

This set of four equations can be written as a matrix equation, if we define the matrix {\Lambda} as

\displaystyle  \Lambda=\left(\begin{array}{cccc} \gamma & -v\gamma & 0 & 0\\ -v\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right) \ \ \ \ \ (17)

We can now write the transformations as

\displaystyle  A^{\bar{\alpha}}=\sum_{\beta=0}^{3}\Lambda_{\;\beta}^{\bar{\alpha}}A^{\beta} \ \ \ \ \ (18)

Here {\Lambda_{\;\beta}^{\bar{\alpha}}} is the entry in row {\bar{\alpha}} and column {\beta} of the matrix, with rows and columns numbered from 0 to 3. Again, the convention of writing the row index as a superscript comes in handy in what follows, so don’t mistake this superscript for an exponent.
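Equation 18 is an ordinary matrix-vector product, and the sketch below writes it with the sum over {\beta} spelled out. The function names are my own, the matrix is stored as a list of rows so that `lam[alpha][beta]` plays the role of {\Lambda_{\;\beta}^{\bar{\alpha}}}, and units with {c=1} are assumed.

```python
import math


def lorentz_matrix(v):
    """The matrix Lambda of equation 17 (boost along x, units where c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -v * gamma, 0.0, 0.0],
            [-v * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform(lam, A):
    """Equation 18: barred component alpha = sum over beta of lam[alpha][beta] * A[beta].

    alpha labels the row and is a free index, so there is one result per value;
    beta labels the column and is summed from 0 to 3.
    """
    return [sum(lam[alpha][beta] * A[beta] for beta in range(4))
            for alpha in range(4)]


lam = lorentz_matrix(0.6)
print(transform(lam, [2.0, 1.0, 0.0, 3.0]))  # about [1.75, -0.25, 0.0, 3.0]
```

The result agrees with applying equations 13 to 16 directly, which is the point of the matrix form: one compact rule replaces four separate equations.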

We’ll close this post with one final notational simplification that is used throughout special and general relativity. Sums over indexes, such as the sum over {\beta} above, are very common, and writing out a summation sign over and over again gets tedious. Einstein therefore introduced a summation convention: whenever a product of two or more factors contains a pair of identical indexes, one a superscript and the other a subscript, a summation over that index is implied. If the index is a Greek letter, the summation runs over all four components (from 0 to 3); if it is a Latin letter, it runs over only the three spatial coordinates (1 to 3). Thus we can write the Lorentz transformation for a general four-vector in the simplified form

\displaystyle  A^{\bar{\alpha}}=\Lambda_{\;\beta}^{\bar{\alpha}}A^{\beta} \ \ \ \ \ (19)

Whenever the same index appears once in every term on both sides of an equation (such as {\bar{\alpha}} here), it can take any value within its range (again, Greek = 0, 1, 2, 3; Latin = 1, 2, 3), so this one equation actually stands for four equations, one for each value of {\bar{\alpha}}.

In both these cases (the index pair used in the summation, and the index appearing on both sides of the equation), the index can be renamed to any other symbol without changing the meaning of the equation, as long as Greek is replaced by Greek, Latin by Latin, and the renaming is applied everywhere the index appears. The summed index is called a dummy index, and the other a free index. Thus we could just as well write

\displaystyle  A^{\bar{\xi}}=\Lambda_{\;\tau}^{\bar{\xi}}A^{\tau} \ \ \ \ \ (20)

Here, the summation is over {\tau=0..3} and the index {\bar{\xi}} can stand for any of 0, 1, 2 or 3.

However, you cannot replace a barred index by an unbarred one (or vice versa), since each type of index refers to a particular coordinate system, and the components in one system will usually be completely different from those in the other. What is the same in both systems is the magnitude of the vector, so we will always have

\displaystyle  -(A^{\bar{0}})^{2}+(A^{\bar{1}})^{2}+(A^{\bar{2}})^{2}+(A^{\bar{3}})^{2}=-(A^{0})^{2}+(A^{1})^{2}+(A^{2})^{2}+(A^{3})^{2} \ \ \ \ \ (21)

This is guaranteed for all four-vectors, since we defined their transformation by using the Lorentz transformations, and these transformations were derived by assuming the invariance of the interval. Note that we can’t use the summation convention to write out the invariant interval, since although the sum

\displaystyle  \sum_{\alpha=0}^{3}(A^{\alpha})^{2}=\sum_{\alpha=0}^{3}A^{\alpha}A^{\alpha} \ \ \ \ \ (22)

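contains a pair of repeated indexes, they are both superscripts, so they don’t satisfy the condition for summation. (Even written out explicitly, this sum lacks the minus sign on the time component, so it is not the invariant magnitude of equation 21.)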
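As a numerical check of equations 21 and 22, the sketch below boosts a four-vector and compares the two sums before and after. It assumes units where {c=1} and a boost along {x}, and repeats the transformation of equations 13 to 16 so that it stands alone; the function names are my own. The magnitude with the minus sign comes out the same in both frames (up to rounding), while the plain sum of squares does not.

```python
import math


def boost(A, v):
    """Equations 13 to 16: boost with speed v along x, units where c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (A[0] - v * A[1]), gamma * (A[1] - v * A[0]), A[2], A[3])


def magnitude(A):
    """One side of equation 21: the invariant magnitude, with a minus sign on A^0."""
    return -A[0] ** 2 + A[1] ** 2 + A[2] ** 2 + A[3] ** 2


def plain_sum_of_squares(A):
    """Equation 22: the sum of (A^alpha)^2 with no minus sign; not invariant."""
    return sum(a * a for a in A)


A = (1.0, 0.0, 0.0, 0.0)
A_bar = boost(A, 0.6)
print(magnitude(A), magnitude(A_bar))                        # both about -1
print(plain_sum_of_squares(A), plain_sum_of_squares(A_bar))  # 1.0 versus about 2.125
```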