A simple algorithm for the cumulative distribution function of the Gaussian distribution

(1)

deposit_hagen

Publikationsserver der Universitätsbibliothek

Mathematik und

Informatik

Lehrgebiet Stochastik Forschungsbericht

Frank Recker

A simple algorithm for the

cumulative distribution function

of the Gaussian distribution

(2)

A simple algorithm for the cumulative distribution function of the Gaussian distribution

Frank Recker

^∗

Abstract

We give a simple algorithm for the value of the cumulative distribution function of the Gaussian distribution.

Key Words:Gaussian distribution, cumulative distribution function, CDF MSC2010 classification:60-04, 68-04

1 Introduction

The Gaussian Integral

Φ(x) = 1

√2π

x

Z

−∞

e⁻^t

2

2 dt (1)

is ubiquitous in statistical applications. There are libraries such as the GNU scientific library [1] which offer this calculation as a function. However, sometimes one cannot use these libraries due to technical or legal constraints. In this paper, we describe a simple algorithm for this calculation. Example implementations can be found at the github site [2]

of the author.

2 The Algorithm

The algorithm for computingφ(x, n)forx ∈ Randn ∈ N = {1,2, . . .}is defined as follows:

Algorithm 1. Letdn= 0and

dj =−x² 2j

dj+1+ 1 2j+ 1

for allj=n−1, . . . ,1. Finally let

φ(x, n) =1 2 + x

√2π(d1+ 1).

∗Frank Recker, Technische Hochschule Mittelhessen, Mathematik, Fachbereich MNI, Wiesenstraße 14, 35390 Gießen, E-Mail:frank.recker@mni.thm.de

(3)

The algorithm computes an approximation of the Gaussian cumulative distribution function as defined in Equation (1). Theorem 2 formalizes this.

Theorem 2. Letx∈R. DefineΦ(x)as in Equation (1) andφ(x, n)for alln ∈Nas in Algorithm 1. Then we have

n→∞lim φ(x, n) = Φ(x).

Furthermore ifn≥ ^x₂² then

|φ(x, n)−Φ(x)|< 1

√2π

|x|²ⁿ⁺¹ (2n+ 1)2ⁿn!.

Table 1 shows the values of φ(x, n)and the maximum error due to Theorem 2 for somexandn. The values were calculated with the code taken from [2]. Note that all calculations are done with floating point numbers and hence rounding errors will occur.

x n φ(x, n) error bound

1.96 1 1.2819268695868082 no bound

1.96 2 0.7812851592193613 0.28848977918213764 1.96 10 0.9749960638553972 7.014638266104427e-6 1.96 200 0.9750021048517796 1.23e-321

5 1 2.4947114020071637 no bound 5 10 -1169.2649270406318 no bound

5 30 0.9285538915764981 9.958422559186228e-2 5 50 0.9999997133453642 4.5497179496632544e-12 5 200 0.9999997133486902 1.5200212487901728e-158

Table 1: Some values and error bounds forφ(x, n)

3 The Proof

First, we derive a power series forΦ(x).

Lemma 3. LetΦ(x)as in Equation (1). Then

Φ(x) =1 2 + 1

√ 2π

∞

X

k=0

(−1)^kx^2k+1

(2k+ 1)2^kk!. (2)

Proof. We haveΦ(0) = ¹₂ and hence we can rewrite Equation (1) as

Φ(x) = 1 2+ 1

√2π

x

Z

0

e⁻^t

2

2 dt. (3)

Next we use the power series for theexp-function:

e^x=

∞

X

k=0

x^k k!.

This series converges absolutely for allx∈R. We insert this into Equation (3) and get

Φ(x) = 1 2+ 1

√2π

x

Z

0

∞

X

k=0

−^t₂²^k k! dt= 1

2+ 1

√2π

x

Z

0

∞

X

k=0

(−1)^kt^2k 2^kk! dt.

(4)

For convergent power series we can swap integration and summation. Hence we get

Φ(x) =1 2 + 1

√2π

∞

X

k=0 x

Z

0

(−1)^kt^2k 2^kk! dt= 1

2+ 1

√2π

∞

X

k=0

(−1)^kt^2k+1 (2k+ 1)2^kk!

^t=x

t=0

and therefore

Φ(x) =1 2 + 1

√2π

∞

X

k=0

(−1)^kx^2k+1 (2k+ 1)2^kk!

as stated in the lemma.

Next, we will show that φ(x, n) is in fact just a partial sum of the series given in Equation (2).

Lemma 4. Letx∈R,n∈Nandφ(x, n)as defined in Algorithm 1. Then we have

φ(x, n) = 1 2+ 1

√ 2π

n−1

X

k=0

(−1)^kx^2k+1

(2k+ 1)2^kk!. (4)

Proof. For brevity define

φ⁰(x, n) = 1 2+ 1

√2π

n−1

X

k=0

(−1)^kx^2k+1 (2k+ 1)2^kk!.

For allk∈Nletak = _2k+1¹ andbk =−^x_2k². Furthermore leta0 = 1andb0 = ^√^x_2π. Its easy to show by complete induction that

φ⁰(x, n) =1 2 +

n−1

X

k=0



ak k

Y

j=0

bj



.

Writing this in a Horner-scheme style gives

φ⁰(x, n) =1

2 +b0(a0+b1(a1+b2(· · ·+b_n−2(a_n−2+b_n−1a_n−1). . .))). Algorithm 1 just evaluates this expression from the inner brackets to the outer one (d_j is the value of the termb_j(a_j+b_j+1(. . .))for allj= 1, . . . , n−2).

Now we can prove Theorem 2.

Proof of Theorem 2. From Equations (2) and (4) we get

n→∞lim φ(x, n) = Φ(x).

The seriesφ(x, n)is alternating. Letn≥ ^x₂². We compare the terms of the series fork=n andk=n−1:

(−1)ⁿx²ⁿ⁺¹ (2n+1)2ⁿn!

(−1)ⁿ⁻¹x²ⁿ⁻¹ (2n−1)2ⁿ⁻¹(n−1)!

=

(2n−1)x² (2n+ 1)2n

≤

2n−1 2n+ 1

<1.

Hence, the series forφ(x, n)is alternating and the absolute values of the terms are mono- tone decreasing. Calculus shows in this case that|φ(x, n)−Φ(x)|is bound by the absolute value of then-th term of the series which is

√1 2π

|x|²ⁿ⁺¹ (2n+ 1)2ⁿn!.

(5)

References

[1] The Gaussian Distribution in the GNU Scientific Library,http://www.gnu.org/

software/gsl/manual/html_node/The-Gaussian-Distribution.

html

[2] Implementations of Algorithm 1 on github, https://github.com/frecker/

gaussian-distribution

A simple algorithm for the cumulative distribution function of the Gaussian distribution

deposit_hagen

Publikationsserver der Universitätsbibliothek

Mathematik und

Informatik

Lehrgebiet Stochastik Forschungsbericht

Frank Recker

A simple algorithm for the

cumulative distribution function

of the Gaussian distribution

A simple algorithm for the cumulative distribution function of the Gaussian distribution

Frank Recker

Contents

1 Introduction

2 The Algorithm

3 The Proof

References