Stirling’s approximation for large factorials

Reference: Daniel V. Schroeder, An Introduction to Thermal Physics, (Addison-Wesley, 2000) – Problems 2.15-2.16.

The methods for counting the number of microstates in interacting Einstein solids involve calculating binomial coefficients which in turn require factorials. For any but the smallest systems, the factorials rapidly become so large that computers cannot calculate them exactly. In such cases, we can resort to Stirling’s approximation for factorials of large numbers.

Although Stirling’s approximation is quoted in many textbooks, a derivation is not usually given. Schroeder provides a nice derivation in his appendix B.2 and B.3 so we’ll run through that here and then give a couple of examples of its use.

We start with the elementary integral

\displaystyle \int_{0}^{\infty}e^{-ax}dx=\left.-\frac{e^{-ax}}{a}\right|_{0}^{\infty}=a^{-1} \ \ \ \ \ (1)

for some parameter {a}. If we treat {a} as a variable and take the derivative with respect to it, we get

\displaystyle \frac{d}{da}\int_{0}^{\infty}e^{-ax}dx=-\int_{0}^{\infty}xe^{-ax}dx=-a^{-2} \ \ \ \ \ (2)

Repeating the process, we get

\displaystyle \frac{d}{da}\int_{0}^{\infty}xe^{-ax}dx \displaystyle = \displaystyle \int_{0}^{\infty}x^{2}e^{-ax}dx=2a^{-3}\ \ \ \ \ (3)
\displaystyle \frac{d}{da}\int_{0}^{\infty}x^{2}e^{-ax}dx \displaystyle = \displaystyle -\int_{0}^{\infty}x^{3}e^{-ax}dx=-3\times2a^{-4}\ \ \ \ \ (4)
\displaystyle \displaystyle \vdots
\displaystyle \frac{d^{n}}{da^{n}}\int_{0}^{\infty}e^{-ax}dx \ \ \ \ \ (5) \displaystyle = \displaystyle \left(-1\right)^{n}\int_{0}^{\infty}x^{n}e^{-ax}dx
\displaystyle \ \ \ \ \ (6) \displaystyle = \displaystyle \left(-1\right)^{n}n!a^{-\left(n+1\right)}

Setting {a=1} we get

\displaystyle n!=\int_{0}^{\infty}x^{n}e^{-x}dx \ \ \ \ \ (7)

 

This integral is the starting point for Stirling’s approximation. The integrand is a bell-shaped curve which a precise shape that depends on {n}. The maximum value of the integrand is found from

\displaystyle \frac{d}{dx}\left(x^{n}e^{-x}\right) \displaystyle = \displaystyle nx^{n-1}e^{-x}-x^{n}e^{-x}=0\ \ \ \ \ (8)
\displaystyle x_{max} \displaystyle = \displaystyle n\ \ \ \ \ (9)
\displaystyle \left(x^{n}e^{-x}\right)_{max} \displaystyle = \displaystyle n^{n}e^{-n} \ \ \ \ \ (10)

The idea is to approximate the integrand by a Gaussian because then we can easily evaluate the Gaussian integral. We have

\displaystyle x^{n}e^{-x}=e^{n\ln x-x} \ \ \ \ \ (11)

We now substitute {y\equiv x-n}:

\displaystyle n\ln x-x \displaystyle = \displaystyle n\ln\left(n+y\right)-n-y\ \ \ \ \ (12)
\displaystyle \displaystyle = \displaystyle n\ln\left[n\left(1+\frac{y}{n}\right)\right]-n-y \ \ \ \ \ (13)

Near the peak of the curve {x\approx n} so {y\ll n} and we can approximate the logarithm by expanding it in a Taylor series up to the second order term:

\displaystyle \ln\left[n\left(1+\frac{y}{n}\right)\right] \displaystyle = \displaystyle \ln n+\ln\left(1+\frac{y}{n}\right)\ \ \ \ \ (14)
\displaystyle \displaystyle \approx \displaystyle \ln n+\frac{y}{n}-\frac{1}{2}\frac{y^{2}}{n^{2}} \ \ \ \ \ (15)

Plugging this back into 13 we get

\displaystyle n\ln x-x \displaystyle \approx \displaystyle n\ln n-n-\frac{y^{2}}{2n}\ \ \ \ \ (16)
\displaystyle e^{n\ln x-x} \displaystyle \approx \displaystyle n^{n}e^{-n}e^{-y^{2}/2n} \ \ \ \ \ (17)

so from 7 we get the approximation

\displaystyle n!\approx n^{n}e^{-n}\int_{-n}^{\infty}e^{-y^{2}/2n}dy \ \ \ \ \ (18)

[Note that the lower limit on the integral is now {y=-n} due to the substitution above.]

Since the peak of the Gaussian curve occurs at {y=0} and {n} is assumed to be large, we can extend the lower limit on the integral to {-\infty} without affecting the value much, so we get, doing the Gaussian integral:

\displaystyle n!\approx n^{n}e^{-n}\int_{-\infty}^{\infty}e^{-y^{2}/2n}dy=\sqrt{2\pi n}n^{n}e^{-n} \ \ \ \ \ (19)

which is Stirling’s approximation.

It’s common when doing approximations to sums to neglect a small term added to a much larger term, as in {10^{23}+10\approx10^{23}}. What is at first glance harder to believe is that if we have a very large number and multiply it by a much smaller number, the result is essentially the same. For example

\displaystyle 10^{23}\times10^{10^{23}}=10^{10^{23}+23}\approx10^{10^{23}} \ \ \ \ \ (20)

If we’re interested in {\ln n!} and use Stirling’s approximation, we have

\displaystyle \ln n! \displaystyle \approx \displaystyle \frac{1}{2}\ln\left(2\pi n\right)+n\ln n-n\ \ \ \ \ (21)
\displaystyle \displaystyle = \displaystyle \frac{1}{2}\ln\left(2\pi n\right)+n\left(\ln n-1\right) \ \ \ \ \ (22)

For large {n}, the first term is much smaller than the last term and can often be neglected, so the logarithmic form of Stirling’s approximation is sometimes given as

\displaystyle \ln n!\approx n\ln n-n \ \ \ \ \ (23)

Example 1 For {n=50}, the exact and approximate values are

\displaystyle 50! \displaystyle = \displaystyle 3.0414\times10^{64}\ \ \ \ \ (24)
\displaystyle \sqrt{2\pi50}50^{50}e^{-50} \displaystyle = \displaystyle 3.0363\times10^{64}\ \ \ \ \ (25)
\displaystyle \ln50! \displaystyle = \displaystyle 148.477767\ \ \ \ \ (26)
\displaystyle 50\left(\ln50-1\right) \displaystyle = \displaystyle 145.6011502 \ \ \ \ \ (27)

Thus even for {n=50} (which can be handled exactly by most pocket calculators) Stirling’s approximation is reasonable.

Example 2 A larger coin flipping experiment. We flip a coin 1000 times. The probability of getting exactly 500 heads and 500 tails can be worked out using Stirling’s approximation. The total number of outcomes is {2^{1000}}, and the number of ways of getting 500 heads is

\displaystyle \binom{1000}{500} \displaystyle = \displaystyle \frac{1000!}{\left(500!\right)^{2}}\ \ \ \ \ (28)
\displaystyle \displaystyle \approx \displaystyle \frac{1}{\sqrt{2\pi}}\frac{\sqrt{1000}1000^{1000}e^{-1000}}{500\times500^{1000}e^{-1000}}\ \ \ \ \ (29)
\displaystyle \displaystyle = \displaystyle \frac{1}{\sqrt{2\pi}}\frac{\sqrt{1000}}{500}2^{1000} \ \ \ \ \ (30)

Thus the probability of getting exactly 500 heads is approximately

\displaystyle \frac{1}{2^{1000}}\binom{1000}{500}\approx\frac{1}{\sqrt{2\pi}}\frac{\sqrt{1000}}{500}=0.0252 \ \ \ \ \ (31)

To get exactly 600 heads, we need

\displaystyle \binom{1000}{600} \displaystyle = \displaystyle \frac{1000!}{600!400!}\ \ \ \ \ (32)
\displaystyle \displaystyle \approx \displaystyle \frac{1}{\sqrt{2\pi}}\frac{\sqrt{1000}1000^{1000}e^{-1000}}{\sqrt{600\times400}\times600^{600}\times400^{400}e^{-1000}}\ \ \ \ \ (33)
\displaystyle \displaystyle = \displaystyle \frac{1}{\sqrt{480\pi}}\frac{2^{1000}500^{1000}}{600^{600}\times400^{400}} \ \ \ \ \ (34)

The probability is therefore

\displaystyle \frac{1}{2^{1000}}\binom{1000}{500} \displaystyle \approx \displaystyle \frac{1}{\sqrt{480\pi}}\frac{500^{1000}}{600^{600}\times400^{400}}\ \ \ \ \ (35)
\displaystyle \displaystyle = \displaystyle 4.635\times10^{-11} \ \ \ \ \ (36)

[Although Maple is capable of working out the large exponents directly, if you want to do this on a calculator, it would be better to work with logs.]

A general formula for the probability of getting {n} heads in 1000 flips is

\displaystyle Prob\left(n\right)\approx\sqrt{\frac{1000}{2\pi n\left(1000-n\right)}}\frac{500^{1000}}{n^{n}\left(1000-n\right)^{1000-n}} \ \ \ \ \ (37)

A plot of this is as follows:

It’s overwhelmingly probable that you’ll get something close to 500 heads for 1000 coin flips.

6 thoughts on “Stirling’s approximation for large factorials

  1. Pingback: Einstein solids: multiplicity of large systems | Physics pages

  2. Pingback: Paramagnets & coin flips: peak and width of multiplicity function | Physics pages

  3. Pingback: Entropy of an ideal gas; Sackur-Tetrode equation | Physics pages

  4. Pingback: Entropy of mixing | Physics pages

  5. Pingback: Two-state paramagnet: analytic solution | Physics pages

  6. Pingback: Rubber bands and entropy | Physics pages

Leave a Reply

Your email address will not be published. Required fields are marked *