• Keine Ergebnisse gefunden

A simple algorithm for the cumulative distribution function of the Gaussian distribution

N/A
N/A
Protected

Academic year: 2022

Aktie "A simple algorithm for the cumulative distribution function of the Gaussian distribution"

Copied!
5
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

deposit_hagen

Publikationsserver der Universitätsbibliothek

Mathematik und

Informatik

Lehrgebiet Stochastik Forschungsbericht

Frank Recker

A simple algorithm for the

cumulative distribution function

of the Gaussian distribution

(2)

A simple algorithm for the cumulative distribution function of the Gaussian distribution

Frank Recker

Abstract

We give a simple algorithm for the value of the cumulative distribution function of the Gaussian distribution.

Key Words:Gaussian distribution, cumulative distribution function, CDF MSC2010 classification:60-04, 68-04

Contents

1 Introduction 1

2 The Algorithm 1

3 The Proof 2

1 Introduction

The Gaussian Integral

Φ(x) = 1

√2π

x

Z

−∞

et

2

2 dt (1)

is ubiquitous in statistical applications. There are libraries such as the GNU scientific library [1] which offer this calculation as a function. However, sometimes one cannot use these libraries due to technical or legal constraints. In this paper, we describe a simple algorithm for this calculation. Example implementations can be found at the github site [2]

of the author.

2 The Algorithm

The algorithm for computingφ(x, n)forx ∈ Randn ∈ N = {1,2, . . .}is defined as follows:

Algorithm 1. Letdn= 0and

dj =−x2 2j

dj+1+ 1 2j+ 1

for allj=n−1, . . . ,1. Finally let

φ(x, n) =1 2 + x

√2π(d1+ 1).

Frank Recker, Technische Hochschule Mittelhessen, Mathematik, Fachbereich MNI, Wiesenstraße 14, 35390 Gießen, E-Mail:frank.recker@mni.thm.de

(3)

The algorithm computes an approximation of the Gaussian cumulative distribution function as defined in Equation (1). Theorem 2 formalizes this.

Theorem 2. Letx∈R. DefineΦ(x)as in Equation (1) andφ(x, n)for alln ∈Nas in Algorithm 1. Then we have

n→∞lim φ(x, n) = Φ(x).

Furthermore ifn≥ x22 then

|φ(x, n)−Φ(x)|< 1

√2π

|x|2n+1 (2n+ 1)2nn!.

Table 1 shows the values of φ(x, n)and the maximum error due to Theorem 2 for somexandn. The values were calculated with the code taken from [2]. Note that all calculations are done with floating point numbers and hence rounding errors will occur.

x n φ(x, n) error bound

1.96 1 1.2819268695868082 no bound

1.96 2 0.7812851592193613 0.28848977918213764 1.96 10 0.9749960638553972 7.014638266104427e-6 1.96 200 0.9750021048517796 1.23e-321

5 1 2.4947114020071637 no bound 5 10 -1169.2649270406318 no bound

5 30 0.9285538915764981 9.958422559186228e-2 5 50 0.9999997133453642 4.5497179496632544e-12 5 200 0.9999997133486902 1.5200212487901728e-158

Table 1: Some values and error bounds forφ(x, n)

3 The Proof

First, we derive a power series forΦ(x).

Lemma 3. LetΦ(x)as in Equation (1). Then

Φ(x) =1 2 + 1

√ 2π

X

k=0

(−1)kx2k+1

(2k+ 1)2kk!. (2)

Proof. We haveΦ(0) = 12 and hence we can rewrite Equation (1) as

Φ(x) = 1 2+ 1

√2π

x

Z

0

et

2

2 dt. (3)

Next we use the power series for theexp-function:

ex=

X

k=0

xk k!.

This series converges absolutely for allx∈R. We insert this into Equation (3) and get

Φ(x) = 1 2+ 1

√2π

x

Z

0

X

k=0

t22k k! dt= 1

2+ 1

√2π

x

Z

0

X

k=0

(−1)kt2k 2kk! dt.

(4)

For convergent power series we can swap integration and summation. Hence we get

Φ(x) =1 2 + 1

√2π

X

k=0 x

Z

0

(−1)kt2k 2kk! dt= 1

2+ 1

√2π

X

k=0

(−1)kt2k+1 (2k+ 1)2kk!

t=x

t=0

and therefore

Φ(x) =1 2 + 1

√2π

X

k=0

(−1)kx2k+1 (2k+ 1)2kk!

as stated in the lemma.

Next, we will show that φ(x, n) is in fact just a partial sum of the series given in Equation (2).

Lemma 4. Letx∈R,n∈Nandφ(x, n)as defined in Algorithm 1. Then we have

φ(x, n) = 1 2+ 1

√ 2π

n−1

X

k=0

(−1)kx2k+1

(2k+ 1)2kk!. (4)

Proof. For brevity define

φ0(x, n) = 1 2+ 1

√2π

n−1

X

k=0

(−1)kx2k+1 (2k+ 1)2kk!.

For allk∈Nletak = 2k+11 andbk =−x2k2. Furthermore leta0 = 1andb0 = x. Its easy to show by complete induction that

φ0(x, n) =1 2 +

n−1

X

k=0

ak k

Y

j=0

bj

.

Writing this in a Horner-scheme style gives

φ0(x, n) =1

2 +b0(a0+b1(a1+b2(· · ·+bn−2(an−2+bn−1an−1). . .))). Algorithm 1 just evaluates this expression from the inner brackets to the outer one (dj is the value of the termbj(aj+bj+1(. . .))for allj= 1, . . . , n−2).

Now we can prove Theorem 2.

Proof of Theorem 2. From Equations (2) and (4) we get

n→∞lim φ(x, n) = Φ(x).

The seriesφ(x, n)is alternating. Letn≥ x22. We compare the terms of the series fork=n andk=n−1:

(−1)nx2n+1 (2n+1)2nn!

(−1)n−1x2n−1 (2n−1)2n−1(n−1)!

=

(2n−1)x2 (2n+ 1)2n

2n−1 2n+ 1

<1.

Hence, the series forφ(x, n)is alternating and the absolute values of the terms are mono- tone decreasing. Calculus shows in this case that|φ(x, n)−Φ(x)|is bound by the absolute value of then-th term of the series which is

√1 2π

|x|2n+1 (2n+ 1)2nn!.

(5)

References

[1] The Gaussian Distribution in the GNU Scientific Library,http://www.gnu.org/

software/gsl/manual/html_node/The-Gaussian-Distribution.

html

[2] Implementations of Algorithm 1 on github, https://github.com/frecker/

gaussian-distribution

Referenzen

ÄHNLICHE DOKUMENTE

Szintai [5]; So for the calculation of the gradient vector components we can use the same Monte Carlo simulation procedure that has been developed for

В данной статье рассмотрены проблемы России и стран с трансформируе- мой экономикой, связанные с обеспечением продовольственной безопасности,

From the results, the performance of the estimators in terms of the MDE is not affected by estimating the parameter values of the GLD using the two-step procedure: the location

• If the whole story keeps on confusing you all the time, we will try to clear up some things on

No caso da estimação do Agregado III (Indústrias de Base Agrícola) adota-se o somatório dos valores adicionados pelos setores agroindustriais subtraídos dos valores adicionados

The group settlement systems as well as the town, urban and rural communities will become an organic part of the regional system of population distribution formed within the

We have applied the described algorithms for the four weighted voting games arising from the two different sets of voting weights of the IMF in 2015 and 2016, see tables 2-5, and

The cells (5 x 106 cells per ml) were layered on top of the gradient as a 2-mm thick starting band and the gra- dient was fractionated after 10 hr. Cell viability was routinely