Tag Archives: probability

Dirac equation: conserved probability current

Reference: Robert D. Klauber, Student Friendly Quantum Field Theory, (Sandtrove Press, 2013) – Chapter 4, Problem 4.13.

The Dirac equation is

\displaystyle  \left(i\gamma^{\mu}\partial_{\mu}-m\right)\left|\psi\right\rangle =0 \ \ \ \ \ (1)

and the adjoint Dirac equation is

\displaystyle  i\partial_{\mu}\left\langle \bar{\psi}\right|\gamma^{\mu}+m\left\langle \bar{\psi}\right|=0 \ \ \ \ \ (2)

where there are four solution vectors, labelled by {n=1,\ldots,4}, and the adjoint solutions are given by

\displaystyle  \left\langle \bar{\psi}\right|=\left\langle \psi\right|\gamma^{0} \ \ \ \ \ (3)

We’d like to find a conserved quantity analogous to that for the Klein-Gordon equation. We can do this as follows. First we multiply 1 on the left by the adjoint solutions:

\displaystyle  i\left\langle \bar{\psi}\left|\gamma^{\mu}\partial_{\mu}\right|\psi\right\rangle =m\left\langle \bar{\psi}\left|\psi\right.\right\rangle \ \ \ \ \ (4)

Then, multiply 2 on the right by the original solutions:

\displaystyle  i\left(\partial_{\mu}\left\langle \bar{\psi}\right|\right)\gamma^{\mu}\left|\psi\right\rangle =-m\left\langle \bar{\psi}\left|\psi\right.\right\rangle \ \ \ \ \ (5)

We’ve kept the bra in parentheses since the derivative applies only to it and not the ket portion. Adding these two equations gives

\displaystyle   i\left\langle \bar{\psi}\left|\gamma^{\mu}\partial_{\mu}\right|\psi\right\rangle +i\left(\partial_{\mu}\left\langle \bar{\psi}\right|\right)\gamma^{\mu}\left|\psi\right\rangle \displaystyle  = \displaystyle  0\ \ \ \ \ (6)
\displaystyle  i\partial_{\mu}\left\langle \bar{\psi}\left|\gamma^{\mu}\right|\psi\right\rangle \displaystyle  = \displaystyle  0 \ \ \ \ \ (7)

where we’ve used the product rule to combine the two derivatives, so that the {\partial_{\mu}} in the last line does apply to the full bracket. Note also that in the bracket in the last line, there is no integration over space, since all we’ve done is multiply the adjoint solution into the regular solution.

By the way, you might think that this result is trivial, since spacetime enters into {\left|\psi\right\rangle } only in the form {e^{\pm ipx}} and therefore into {\left\langle \bar{\psi}\right|} in the form {e^{\mp ipx}}, so it would seem that the bracket {\left\langle \bar{\psi}\left|\gamma^{\mu}\right|\psi\right\rangle } has no dependence on {x} so obviously its derivative must be zero. However, the {\left|\psi\right\rangle } here can refer to a sum of states with different momenta {\mathbf{p}}, so the result isn’t quite as trivial as it looks.

We can therefore define a conserved current {j^{\mu}} as

\displaystyle  j^{\mu}\equiv\left\langle \bar{\psi}\left|\gamma^{\mu}\right|\psi\right\rangle \ \ \ \ \ (8)

so that

\displaystyle  \partial_{\mu}j^{\mu}=0 \ \ \ \ \ (9)

Again, remember that there is no integration over space in the definition of {j^{\mu}}.
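As a numerical sanity check (not something Klauber does; the momenta, mass and sample point below are arbitrary choices), we can build {\left|\psi\right\rangle } as a superposition of two positive-energy plane waves in the Dirac representation, treat it as a c-number spinor wavefunction, and verify by finite differences that {j^{\mu}=\left\langle \bar{\psi}\left|\gamma^{\mu}\right|\psi\right\rangle } has vanishing four-divergence:

```python
# Check ∂_μ j^μ = 0 for the Dirac current (a sketch, in units ħ = c = 1):
# superpose two positive-energy plane waves and finite-difference j^μ = ψ̄ γ^μ ψ.
# The mass m = 1 and the two momenta are arbitrary illustrative choices.
import numpy as np

m = 1.0
I2, Z2 = np.eye(2), np.zeros((2, 2))
sig = [np.array([[0, 1], [1, 0]], complex),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]], complex)]
g0 = np.block([[I2, Z2], [Z2, -I2]])                        # γ^0
gam = [g0] + [np.block([[Z2, s], [-s, Z2]]) for s in sig]   # γ^0 .. γ^3

def plane_wave(p):
    """Positive-energy spinor u(p) (unnormalized) and its energy E."""
    E = np.sqrt(p @ p + m**2)
    px, py, pz = p
    u = np.array([1, 0, pz/(E + m), (px + 1j*py)/(E + m)])
    return u, E

momenta = [np.array([0.3, 0.0, 0.5]), np.array([-0.2, 0.4, 0.1])]
waves = [(*plane_wave(p), p) for p in momenta]

def psi(t, x):
    # each plane wave solves the Dirac equation, so the sum does too
    return sum(u * np.exp(-1j*(E*t - p @ x)) for u, E, p in waves)

def j(mu, t, x):
    ps = psi(t, x)
    return (ps.conj() @ g0 @ gam[mu] @ ps).real             # ψ̄ γ^μ ψ

# central-difference divergence ∂_t j^0 + ∇·j at an arbitrary spacetime point
h, t0, x0 = 1e-5, 0.7, np.array([0.1, 0.2, 0.3])
div = (j(0, t0 + h, x0) - j(0, t0 - h, x0)) / (2*h)
for i in range(3):
    dx = np.zeros(3); dx[i] = h
    div += (j(i + 1, t0, x0 + dx) - j(i + 1, t0, x0 - dx)) / (2*h)
print(abs(div))   # ≈ 0, limited only by finite-difference error
```

The cross terms between the two momenta give {j^{\mu}} a genuine {x}-dependence, so the vanishing divergence is not trivial, as noted above.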

In Klauber’s equations 4-36 and 4-37, he shows that if we consider a single particle in the state

\displaystyle  \left|\psi\right\rangle =\sum_{r,\mathbf{p}}\sqrt{\frac{m}{VE_{\mathbf{p}}}}C_{r}\left(\mathbf{p}\right)u_{r}e^{-ipx} \ \ \ \ \ (10)

(remember that {r} ranges over the four solutions and {\mathbf{p}} over all possible discrete momenta) and integrate {\rho\equiv j^{0}} over space, we get the condition

\displaystyle  \sum_{r,\mathbf{p}}\left|C_{r}\left(\mathbf{p}\right)\right|^{2}=1 \ \ \ \ \ (11)

Thus we can interpret {\left|C_{r}\left(\mathbf{p}\right)\right|^{2}} as the probability of finding the particle in state {r} with momentum {\mathbf{p}}.

Probability density in a Klein-Gordon field

Reference: Robert D. Klauber, Student Friendly Quantum Field Theory, (Sandtrove Press, 2013) – Chapter 3, Problem 3.12.

The probability density {\rho} and current {\mathbf{j}} for the solutions {\phi} to the Klein-Gordon equation in relativistic quantum mechanics (where {\phi} represents a state, not a field) are

\displaystyle   \rho \displaystyle  \equiv \displaystyle  i\left(\phi^{\dagger}\frac{\partial\phi}{\partial t}-\phi\frac{\partial\phi^{\dagger}}{\partial t}\right)\ \ \ \ \ (1)
\displaystyle  \mathbf{j} \displaystyle  = \displaystyle  -i\left(\phi^{\dagger}\nabla\phi-\phi\nabla\phi^{\dagger}\right) \ \ \ \ \ (2)

In field theory, the solutions to the Klein-Gordon equation are mathematically the same as in relativistic quantum mechanics; the only difference is that the coefficients are creation and annihilation operators rather than just numbers. The field solutions are

\displaystyle   \phi\left(x\right) \displaystyle  = \displaystyle  \sum_{\mathbf{k}}\frac{1}{\sqrt{2V\omega_{\mathbf{k}}}}a\left(\mathbf{k}\right)e^{-ikx}+\sum_{\mathbf{k}}\frac{1}{\sqrt{2V\omega_{\mathbf{k}}}}b^{\dagger}\left(\mathbf{k}\right)e^{ikx}\ \ \ \ \ (3)
\displaystyle  \displaystyle  \equiv \displaystyle  \phi^{+}+\phi^{-}\ \ \ \ \ (4)
\displaystyle  \phi^{\dagger}\left(x\right) \displaystyle  = \displaystyle  \sum_{\mathbf{k}}\frac{1}{\sqrt{2V\omega_{\mathbf{k}}}}a^{\dagger}\left(\mathbf{k}\right)e^{ikx}+\sum_{\mathbf{k}}\frac{1}{\sqrt{2V\omega_{\mathbf{k}}}}b\left(\mathbf{k}\right)e^{-ikx}\ \ \ \ \ (5)
\displaystyle  \displaystyle  \equiv \displaystyle  \phi^{\dagger+}+\phi^{\dagger-} \ \ \ \ \ (6)

Since the solutions are mathematically the same in the two cases, the derivation of 1 and 2 is the same. To work out the densities, we must use the field solutions instead of the state solutions.

To work out the probability density {\rho} for a given state {\left|\phi_{1},\phi_{2},\phi_{3},\ldots\right\rangle } we need to evaluate the expression

\displaystyle  \left\langle \phi_{1},\phi_{2},\phi_{3},\ldots\left|\rho\right|\phi_{1},\phi_{2},\phi_{3},\ldots\right\rangle  \ \ \ \ \ (7)

Looking at the fields 3 and 5, we see that their time derivatives will bring down a factor of {\pm i\omega_{\mathbf{k}}} but otherwise leave the expressions unchanged, so the terms on the RHS of 1 will involve products of two of the operators {a^{\dagger}}, {a}, {b^{\dagger}} and {b}. Since the state in the bra of 7 is the same as the state in the ket, only combinations of these operators that leave the ket state unchanged will survive the calculation (due to the orthogonality of states with different numbers of particles or different energies {\omega_{\mathbf{k}}} in them). This means that all terms containing a product of {a^{\dagger}} or {a} with {b^{\dagger}} or {b} will contribute nothing, since they don’t leave the ket unchanged. We can therefore look at {a} and {b} separately and then combine the result.

First, we’ll look at the {b} terms. We can use the earlier result with the original {A} and {B} coefficients replaced by {a} and {b} operators. This gives (I’ve left off the bra and ket from 7 to make the typesetting easier, but you should imagine both sides enclosed within this bra and ket):

\displaystyle   \rho_{b} \displaystyle  = \displaystyle  -\left[\sum_{\mathbf{k}}\frac{b_{\mathbf{k}}}{\sqrt{2\omega_{\mathbf{k}}V}}e^{-ikx}\right]\left[\sum_{\mathbf{k}^{\prime}}\frac{\omega_{\mathbf{k}^{\prime}}b_{\mathbf{k}^{\prime}}^{\dagger}}{\sqrt{2\omega_{\mathbf{k}^{\prime}}V}}e^{ik^{\prime}x}\right]-
\displaystyle  \displaystyle  \displaystyle  \left[\sum_{\mathbf{k}^{\prime}}\frac{b_{\mathbf{k}^{\prime}}^{\dagger}}{\sqrt{2\omega_{\mathbf{k}^{\prime}}V}}e^{ik^{\prime}x}\right]\left[\sum_{\mathbf{k}}\frac{\omega_{\mathbf{k}}b_{\mathbf{k}}}{\sqrt{2\omega_{\mathbf{k}}V}}e^{-ikx}\right] \ \ \ \ \ (8)

Only terms in the double sums where {\mathbf{k}=\mathbf{k}^{\prime}} will survive, again because of the orthogonality of states with different {\mathbf{k}} values. Therefore we have

\displaystyle   \rho_{b} \displaystyle  = \displaystyle  -\frac{1}{2V}\sum_{\mathbf{k}}\left(b_{\mathbf{k}}b_{\mathbf{k}}^{\dagger}+b_{\mathbf{k}}^{\dagger}b_{\mathbf{k}}\right)\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{2V}\sum_{\mathbf{k}}\left(1+b_{\mathbf{k}}^{\dagger}b_{\mathbf{k}}+b_{\mathbf{k}}^{\dagger}b_{\mathbf{k}}\right)\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{V}\sum_{\mathbf{k}}\left(\frac{1}{2}+b_{\mathbf{k}}^{\dagger}b_{\mathbf{k}}\right)\ \ \ \ \ (11)
\displaystyle  \displaystyle  = \displaystyle  -\frac{1}{V}\sum_{\mathbf{k}}\left(\frac{1}{2}+N_{b}\left(\mathbf{k}\right)\right) \ \ \ \ \ (12)

where we used the commutator {\left[b_{\mathbf{k}},b_{\mathbf{k}}^{\dagger}\right]=1} in the second line.

For the {a} operators we have

\displaystyle   \rho_{a} \displaystyle  = \displaystyle  \left[\sum_{\mathbf{k}}\frac{a_{\mathbf{k}}^{\dagger}}{\sqrt{2\omega_{\mathbf{k}}V}}e^{ikx}\right]\left[\sum_{\mathbf{k}^{\prime}}\frac{\omega_{\mathbf{k}^{\prime}}a_{\mathbf{k}^{\prime}}}{\sqrt{2\omega_{\mathbf{k}^{\prime}}V}}e^{-ik^{\prime}x}\right]+
\displaystyle  \displaystyle  \displaystyle  \left[\sum_{\mathbf{k}^{\prime}}\frac{a_{\mathbf{k}^{\prime}}}{\sqrt{2\omega_{\mathbf{k}^{\prime}}V}}e^{-ik^{\prime}x}\right]\left[\sum_{\mathbf{k}}\frac{\omega_{\mathbf{k}}a_{\mathbf{k}}^{\dagger}}{\sqrt{2\omega_{\mathbf{k}}V}}e^{ikx}\right]\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2V}\sum_{\mathbf{k}}\left(a_{\mathbf{k}}^{\dagger}a_{\mathbf{k}}+a_{\mathbf{k}}a_{\mathbf{k}}^{\dagger}\right)\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{2V}\sum_{\mathbf{k}}\left(a_{\mathbf{k}}^{\dagger}a_{\mathbf{k}}+1+a_{\mathbf{k}}^{\dagger}a_{\mathbf{k}}\right)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{V}\sum_{\mathbf{k}}\left(\frac{1}{2}+a_{\mathbf{k}}^{\dagger}a_{\mathbf{k}}\right)\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{V}\sum_{\mathbf{k}}\left(\frac{1}{2}+N_{a}\left(\mathbf{k}\right)\right) \ \ \ \ \ (17)

The total probability density for the state {\left|\phi_{1},\phi_{2},\phi_{3},\ldots\right\rangle } is therefore

\displaystyle   \bar{\rho} \displaystyle  = \displaystyle  \left\langle \phi_{1},\phi_{2},\phi_{3},\ldots\left|\rho\right|\phi_{1},\phi_{2},\phi_{3},\ldots\right\rangle \ \ \ \ \ (18)
\displaystyle  \displaystyle  = \displaystyle  \rho_{a}+\rho_{b}\ \ \ \ \ (19)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{V}\sum_{\mathbf{k}}\left(N_{a}\left(\mathbf{k}\right)-N_{b}\left(\mathbf{k}\right)\right) \ \ \ \ \ (20)

The probability density is seen to be the number density (number of particles per unit volume), except that {b}-particles (antiparticles) count as negative particles.

It should be noted that we’ve implicitly assumed that the state {\left|\phi_{1},\phi_{2},\phi_{3},\ldots\right\rangle } consists entirely of particles that are in energy eigenstates; that is, the energy of each particle is precisely {\omega_{\mathbf{k}}} for some {\mathbf{k}} (though not all particles need be in the same eigenstate). If this weren’t the case, the actions of the operators {a^{\dagger}}, {a}, {b^{\dagger}} and {b} wouldn’t be quite so straightforward and we couldn’t get the simple result we did.
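The operator algebra behind equations 9 to 17 is easy to check numerically with truncated ladder-operator matrices. A sketch (the Fock-space dimension is an arbitrary choice, and the truncation is only valid below the top state): on a number state, the symmetrized product {\frac{1}{2}\left(a^{\dagger}a+aa^{\dagger}\right)} gives {n+\frac{1}{2}}, exactly the zero-point term that appears in each mode above.

```python
# Verify ½(a†a + a a†)|n⟩ = (n + ½)|n⟩ using truncated matrix ladder operators.
import numpy as np

D = 12                                      # truncated Fock-space dimension
a = np.diag(np.sqrt(np.arange(1, D)), 1)    # annihilation: a|n⟩ = √n |n-1⟩
adag = a.conj().T                           # creation: a†|n⟩ = √(n+1) |n+1⟩

sym = 0.5 * (adag @ a + a @ adag)           # ½(a†a + a a†)
for n in range(D - 1):                      # skip the top (truncated) state
    ket = np.zeros(D); ket[n] = 1.0
    print(n, ket @ sym @ ket)               # expect n + 0.5
```

The same algebra, with the commutator {\left[b_{\mathbf{k}},b_{\mathbf{k}}^{\dagger}\right]=1} for each mode, is what turns {b_{\mathbf{k}}b_{\mathbf{k}}^{\dagger}+b_{\mathbf{k}}^{\dagger}b_{\mathbf{k}}} into {1+2N_{b}\left(\mathbf{k}\right)} above.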

Buffon’s needle: estimating pi

Required math: (very simple) probability & calculus

Required physics: none

This interesting little problem serves well to illustrate the notion of a probability density and its application to an experiment which can be done at home. It is known as Buffon’s needle, since it is believed that Georges-Louis Leclerc, Comte de Buffon, first posed the problem in the 18th century.

Suppose we have a needle of length {l} and we drop this needle onto a sheet of paper on which there are a number of parallel lines spaced a distance {l} apart. What is the probability that the dropped needle will cross a line?

One way of analyzing this problem is to begin by considering the needle as the hypotenuse of a right triangle, with sides parallel and perpendicular to the parallel lines. Then if the needle makes an angle {\theta} with the lines, the sides of this triangle are {l\cos\theta} for the parallel side and {l\sin\theta} for the perpendicular side. The side of the triangle parallel to the lines is of no interest here; what we are interested in is the perpendicular component.

To see this, suppose we placed a needle of length {x<l} on the paper in such a way that it was perpendicular to the lines. Such a needle would cover a fraction {x/l} of the distance between two adjacent lines. Thus the probability that a line would cross this specially dropped needle is just {x/l}. (Note that if {x=l} the probability is 1, since such a needle covers the entire distance between the lines.)

Therefore, the probability that a needle that makes an angle {\theta} with the lines crosses a line is {(l\sin\theta)/l=\sin\theta}.

How likely is the needle to drop at an angle {\theta}? Since {\theta} is a continuous variable, we need a probability density, rather than just a simple probability. Assuming that the needle is equally likely to drop at any angle between 0 and {\pi\;(180^{\circ})}, the density must be a constant, and must integrate to 1 over the range of valid {\theta}, so it must be {\rho(\theta)=\frac{1}{\pi}}.

The probability {P(\mathrm{crosses\; line})} that a randomly dropped needle crosses a line is therefore

\displaystyle   P(\mathrm{crosses\; line}) \displaystyle  = \displaystyle  \int_{0}^{\pi}P(\mathrm{angle}\;\theta\;\mathrm{crosses\; line})\rho(\theta)d\theta\ \ \ \ \ (1)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{\pi}\int_{0}^{\pi}\sin\theta\; d\theta\ \ \ \ \ (2)
\displaystyle  \displaystyle  = \displaystyle  \frac{2}{\pi}\ \ \ \ \ (3)
\displaystyle  \displaystyle  \approx \displaystyle  0.6366 \ \ \ \ \ (4)

Besides being another of those curious situations where {\pi} pops up unexpectedly, this result offers the possibility of an interesting way of whiling away a rainy afternoon. Simply by dropping a needle repeatedly onto lined paper, you can do an experiment that will determine the value of {\pi}, since

\displaystyle   \pi \displaystyle  = \displaystyle  \frac{2}{P(\mathrm{crosses\; line})}\ \ \ \ \ (5)
\displaystyle  \displaystyle  = \displaystyle  \frac{2}{(\mathrm{Fraction\; of\; needles\; crossing\; lines})}\ \ \ \ \ (6)
\displaystyle  \displaystyle  = \displaystyle  \frac{2(\mathrm{Number\; of\; needles\; dropped})}{\mathrm{Number\; of\; needles\; crossing\; lines}} \ \ \ \ \ (7)

OK, you would need to drop a lot of needles to get a decent value of {\pi}, but it’s interesting that there is such a simple method for getting even an approximate value of {\pi} without using circles, triangles, angles, measurement or anything more than just counting.
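Rather than dropping needles by hand, the experiment is easy to simulate. A minimal sketch (the function name, drop count and seed are illustrative choices): drop virtual needles with a uniformly random centre and angle, count crossings, and invert as in 7:

```python
# Monte Carlo Buffon's needle: needle length equals the line spacing,
# so the crossing probability is 2/π and π ≈ 2·(drops)/(crossings).
import math
import random

def buffon_pi(drops, seed=42):
    rng = random.Random(seed)
    crossings = 0
    for _ in range(drops):
        y = rng.random()                   # needle centre, in units of the line spacing
        theta = rng.random() * math.pi     # angle with the lines, uniform on [0, π)
        half_span = 0.5 * math.sin(theta)  # half the needle's perpendicular extent
        if y < half_span or y > 1 - half_span:
            crossings += 1
    return 2 * drops / crossings

print(buffon_pi(200_000))   # close to π, typically within a few hundredths
```

With 200,000 drops the estimate is usually good to about two decimal places, which makes concrete just how many needles the rainy afternoon would require.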


Probability: basic concepts

Required math: arithmetic

Required physics: none

Probability theory is central to several areas of physics, most notably quantum mechanics and statistical mechanics, so this post will give some of the basics of the theory for those who haven’t encountered it before.

Central to the theory of probability is the concept of randomness, and this is a notoriously difficult notion to define. How can we tell if a number or other event is in some sense ‘random’? Is 64 a random number? Is it truly a random event whether toast falls buttered side down or is Murphy’s law true?

The only realistic way to cope with the definition of randomness is to define a set of probabilities for the occurrence of the events in question. We can illustrate this by considering the question above of whether 64 is a random number. Suppose we are choosing integers from the inclusive set [1…100] and we would like all numbers in this set to occur equally often. If the first number we pick turns out to be 64, this tells us nothing. If we pick 100 numbers (perhaps by picking one of a group of tiles, each of which has one of the numbers from 1 to 100 written on it, out of a bag, noting the number so picked and then replacing the tile) and 64 turns up exactly once, we might say that’s what we expect, but suppose we get no 64s, or two or three of them. Does that mean that 64 isn’t being selected randomly? Not necessarily. The only way we can demonstrate experimentally that 64 is occurring randomly in a way that is consistent with our hypothesis that all numbers between 1 and 100 are equally probable is to pick a very large (ideally infinite, but if we’re doing the simulation on a computer, we would probably stop after several million) number of numbers and then check to see if 64 occurred around 1/100th of the time. If it did, then we would be fairly confident in stating that, yes, 64 is indeed a random number in this context.

But suppose we were dealing with a different situation in which powers of 2 were meant to be twice as likely to occur as other numbers (but we’re still restricting our attention to numbers between 1 and 100). Thus the seven numbers 1, 2, 4, 8, 16, 32 and 64 should show up twice as often as the other numbers. In this case (as we’ll see in a moment) we would expect 64 to occur a fraction 2/107 of the time, as opposed to a non-power of 2, which would occur 1/107 of the time. So if we got the same results as in the first experiment, we would say that, no, 64 is not occurring randomly, or at least that this is evidence that the probability of its occurrence is not what we predicted.

In other words, whether or not an event is ‘random’ depends on the context of the problem. Before we can say whether something is random, we must first specify its probability.

So what exactly is probability? Mathematically, to define probability we must first define a set of possible events. Thus in the case of selecting a number out of a bag, we must specify which numbers are possible outcomes of each selection. Suppose we define this set to be the numbers from 1 to 100.

Having specified which events can happen, we need to then define a probability for each event in this set, but we need to do this so that it satisfies an important constraint: the sum of probabilities for all possible mutually exclusive events in the set must add up to 1. Since a probability of 1 indicates certainty, this condition is the mathematical way of saying that the event must be one of those specified in the original set. This is a valuable check when you are doing calculations involving probability: if the sum of probabilities for all the events in a set is not 1, you’ve done something wrong, so go back and check the definition of the set and the probability assigned to each event in the set.

Thus in the case of choosing numbers from the set [1..100] there are 100 possible events. In our first example above, where selection of all numbers is equally likely, the probabilities of choosing each number must be the same, and they must add up to 1. If we denote the probability of choosing number {n} as {P(n)}, then we have the conditions

\displaystyle   P(n) \displaystyle  = \displaystyle  k\ \ \ \ \ (1)
\displaystyle  \sum_{n=1}^{100}P(n) \displaystyle  = \displaystyle  1 \ \ \ \ \ (2)

where {k} is a constant, independent of {n}. From the second equation, since all probabilities are equal, we get {100k=1} or {k=0.01}.

In the second example, the powers of two are to be twice as likely as the other numbers, so we get

\displaystyle  P(n)=\begin{cases} 2k & \mathrm{if}\; n=\mathrm{power\; of\;2}\\ k & \mathrm{otherwise} \end{cases} \ \ \ \ \ (3)

\displaystyle   \sum_{n\ne2^{m}}k+\sum_{n=2^{m}}2k \displaystyle  = \displaystyle  1\ \ \ \ \ (4)
\displaystyle  93k+7(2k) \displaystyle  = \displaystyle  1\ \ \ \ \ (5)
\displaystyle  k \displaystyle  = \displaystyle  \frac{1}{107} \ \ \ \ \ (6)

In the second line, there are 93 non-powers of 2 and 7 powers of 2. So the probability of getting a power of 2 is 2/107 as stated above.
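We can check this weighting scheme by simulation. A sketch (the names, seed and draw count are arbitrary choices): draw from 1 to 100 with powers of 2 given double weight, then confirm that the weights sum to 107 and that 64 turns up about 2/107 of the time:

```python
# Weighted sampling from 1..100: powers of 2 get twice the weight of the rest,
# so each draw of 64 has probability 2k = 2/107.
import random

population = list(range(1, 101))
powers = {1, 2, 4, 8, 16, 32, 64}
weights = [2 if n in powers else 1 for n in population]
assert sum(weights) == 107          # so k = 1/107, as derived above

rng = random.Random(0)
N = 200_000
draws = rng.choices(population, weights=weights, k=N)
freq_64 = draws.count(64) / N
print(freq_64, 2 / 107)             # the two values should agree closely
```

This is exactly the "very large number of picks" test described earlier: only in the long run does the observed frequency settle near the assigned probability.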

Now we have a look at a few of the basic terms in probability.

The mean or average of a set of numbers is defined as

\displaystyle   \langle n\rangle \displaystyle  \equiv \displaystyle  \sum nP(n) \ \ \ \ \ (7)

where the sum is taken over all values in the set. That is, the average value is a weighted average, where each value {n} is multiplied by its probability and then the whole lot added up. In the simple case where all values have the same probability, this is the same as adding up all the numbers and dividing by the size of the set. For example, in the case of the numbers from 1 to 100, if all numbers are equally likely:

\displaystyle   \langle n\rangle \displaystyle  = \displaystyle  \sum_{n=1}^{100}nP(n)\ \ \ \ \ (8)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{100}\sum_{n=1}^{100}n\ \ \ \ \ (9)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{100}\frac{1}{2}(100)(101)\ \ \ \ \ (10)
\displaystyle  \displaystyle  = \displaystyle  50.5 \ \ \ \ \ (11)

where in the third line we’ve used the formula for summing up the numbers from 1 to {N}:

\displaystyle   \sum_{n=1}^{N}n \displaystyle  = \displaystyle  \frac{1}{2}N(N+1) \ \ \ \ \ (12)

In the second case, where powers of 2 are twice as likely, we get

\displaystyle   \langle n\rangle \displaystyle  = \displaystyle  \sum_{n=1}^{100}nP(n)\ \ \ \ \ (13)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{107}\sum_{n\ne2^{m}}n+\frac{2}{107}\sum_{n=2^{m}}n\ \ \ \ \ (14)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{107}\left(\frac{1}{2}(100)(101)-(2^{7}-1)\right)+\frac{2}{107}(2^{7}-1)\ \ \ \ \ (15)
\displaystyle  \displaystyle  = \displaystyle  \frac{1}{107}(5050-127+2\times127)\ \ \ \ \ (16)
\displaystyle  \displaystyle  = \displaystyle  \frac{5177}{107}\ \ \ \ \ (17)
\displaystyle  \displaystyle  \approx \displaystyle  48.38 \ \ \ \ \ (18)

where in the third line we used the formula for the sum of the powers of 2 from {2^{0}} up to {2^{N}} (a special case of the formula for a geometric series):

\displaystyle   \sum_{m=0}^{N}2^{m} \displaystyle  = \displaystyle  2^{N+1}-1 \ \ \ \ \ (19)

The mean in this case is lower than in the first case, since there are more powers of 2 less than 50 than greater than 50, thus the lower half of the number set is weighted more.
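Both averages are easy to recompute exactly. A quick sketch (variable names are just illustrative):

```python
# Exact recomputation of the two means from eqs. 8-11 and 13-18.
numbers = range(1, 101)
powers = {2**m for m in range(7)}        # 1, 2, 4, ..., 64

mean_uniform = sum(numbers) / 100        # every P(n) = 1/100
mean_weighted = sum(n * (2 if n in powers else 1) for n in numbers) / 107

print(mean_uniform)    # 50.5
print(mean_weighted)   # 5177/107 ≈ 48.38
```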

Using similar logic, we can calculate the mean of any function of {n}. For example, the mean square is the mean of the squares of {n}:

\displaystyle   \langle n^{2}\rangle \displaystyle  = \displaystyle  \sum n^{2}P(n) \ \ \ \ \ (20)

A common measure of the spread of values around the mean is given by the variance, defined as follows:

\displaystyle   var(n) \displaystyle  \equiv \displaystyle  \left\langle \left(n-\langle n\rangle\right)^{2}\right\rangle \ \ \ \ \ (21)
\displaystyle  \displaystyle  = \displaystyle  \sum(n-\langle n\rangle)^{2}P(n)\ \ \ \ \ (22)
\displaystyle  \displaystyle  = \displaystyle  \sum n^{2}P(n)-2\langle n\rangle\sum nP(n)+\langle n\rangle^{2}\sum P(n)\ \ \ \ \ (23)
\displaystyle  \displaystyle  = \displaystyle  \langle n^{2}\rangle-\langle n\rangle^{2} \ \ \ \ \ (24)

where we have used {\sum P(n)=1} and {\sum nP(n)=\langle n\rangle} in the derivation.

The standard deviation is the (positive) square root of the variance, and is commonly denoted with a lower-case sigma {\sigma}:

\displaystyle   \sigma \displaystyle  \equiv \displaystyle  \sqrt{\langle n^{2}\rangle-\langle n\rangle^{2}} \ \ \ \ \ (25)

For the examples above, we have, using the formula for the sum of squares: {\sum_{n=1}^{N}n^{2}=\frac{N(N+1)(2N+1)}{6}}

\displaystyle   var(n) \displaystyle  = \displaystyle  \frac{1}{100}338350-50.5^{2}\ \ \ \ \ (26)
\displaystyle  \displaystyle  = \displaystyle  833.25\ \ \ \ \ (27)
\displaystyle  \sigma \displaystyle  = \displaystyle  28.87 \ \ \ \ \ (28)

for the equi-probable set, and, for the set with powers of 2 weighted at twice the probability:

\displaystyle   var(n) \displaystyle  = \displaystyle  \frac{1}{107}(338350-5461)+\frac{2}{107}5461-\left(\frac{5177}{107}\right)^{2}\ \ \ \ \ (29)
\displaystyle  \displaystyle  = \displaystyle  872.26\ \ \ \ \ (30)
\displaystyle  \sigma \displaystyle  = \displaystyle  29.53 \ \ \ \ \ (31)

The larger variance in the second case is due to the fact that some of the numbers far from the mean (the lower powers of 2) have a higher weight, so the distribution is spread out more than in the equiprobable case.
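Both variances and standard deviations can be confirmed directly from {var(n)=\langle n^{2}\rangle-\langle n\rangle^{2}}. A sketch (the helper function is an illustrative choice):

```python
# Compute variance and standard deviation for both distributions on 1..100.
import math

numbers = range(1, 101)
powers = {2**m for m in range(7)}        # 1, 2, 4, ..., 64

def stats(prob):
    """Return (variance, std dev) for a probability function prob(n)."""
    mean = sum(n * prob(n) for n in numbers)
    mean_sq = sum(n * n * prob(n) for n in numbers)
    var = mean_sq - mean**2
    return var, math.sqrt(var)

print(stats(lambda n: 1/100))                          # (833.25, 28.87...)
print(stats(lambda n: (2 if n in powers else 1)/107))  # (~872.26, ~29.53)
```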

This should give you the main ideas that you need to use probabilities of discrete events in physics. Frequently, however, we need to consider probabilities of continuous variables, such as in quantum mechanics where we consider the probability of finding a particle at a given point in space. This requires the concept of a probability density function, and is discussed in the context of the wave function in quantum mechanics here.