
Munich Personal RePEc Archive

Building Loss Models

Burnecki, Krzysztof and Janczura, Joanna and Weron, Rafal

Wrocław University of Technology, Poland

September 2010

Online at https://mpra.ub.uni-muenchen.de/25492/

MPRA Paper No. 25492, posted 28 Sep 2010 20:42 UTC


Building Loss Models¹

Krzysztof Burnecki², Joanna Janczura², and Rafał Weron³

Abstract: This paper is intended as a guide to building insurance risk (loss) models. A typical model for insurance risk, the so-called collective risk model, treats the aggregate loss as having a compound distribution with two main components: one characterizing the arrival of claims and another describing the severity (or size) of loss resulting from the occurrence of a claim. In this paper we first present efficient simulation algorithms for several classes of claim arrival processes. Then we review a collection of loss distributions and present methods that can be used to assess the goodness-of-fit of the claim size distribution. The collective risk model is often used in health insurance and in general insurance, whenever the main risk components are the number of insurance claims and the amount of the claims. It can also be used for modeling other non-insurance product risks, such as credit and operational risk.

Keywords: Insurance risk model; Loss distribution; Claim arrival process; Poisson process; Renewal process; Random variable generation; Goodness-of-fit testing

JEL Classification: C15, C46, C63, G22, G32

1 Chapter prepared for the 2nd edition of Statistical Tools for Finance and Insurance, P. Cizek, W. Härdle, R. Weron (eds.), Springer-Verlag, forthcoming in 2011.

2 Hugo Steinhaus Center, Wrocław University of Technology, Poland

3 Institute of Organization and Management, Wrocław University of Technology, Poland


1 Building Loss Models

Krzysztof Burnecki, Joanna Janczura, and Rafał Weron

1.1 Introduction

A loss model or actuarial risk model is a parsimonious mathematical description of the behavior of a collection of risks constituting an insurance portfolio. It is not intended to replace sound actuarial judgment. In fact, according to Willmot (2001), a well formulated model is consistent with and adds to intuition, but cannot and should not replace experience and insight. Moreover, a properly constructed loss model should reflect a balance between simplicity and conformity to the data, since overly complex models may be too complicated to be useful.

A typical model for insurance risk, the so-called collective risk model, treats the aggregate loss as having a compound distribution with two main components: one characterizing the frequency (or incidence) of events and another describing the severity (or size or amount) of gain or loss resulting from the occurrence of an event (Kaas et al., 2008; Klugman, Panjer, and Willmot, 2008; Tse, 2009).

The stochastic nature of both components is a fundamental assumption of a realistic risk model. In classical form it is defined as follows. If $\{N_t\}_{t\ge 0}$ is a process counting claim occurrences and $\{X_k\}_{k=1}^{\infty}$ is an independent sequence of positive independent and identically distributed (i.i.d.) random variables representing claim sizes, then the risk process $\{R_t\}_{t\ge 0}$ is given by

$$R_t = u + c(t) - \sum_{i=1}^{N_t} X_i. \qquad (1.1)$$

The non-negative constant $u$ stands for the initial capital of the insurance company and the deterministic or stochastic function of time $c(t)$ for the premium from sold insurance policies. The sum $\sum_{i=1}^{N_t} X_i$ is the so-called aggregate claim process, with the number of claims in the interval $(0, t]$ being modeled by the counting process $N_t$. Recall that the latter is defined as $N_t = \max\{n : \sum_{i=1}^{n} W_i \le t\}$, where $\{W_i\}$ is a sequence of positive random variables and $\sum_{i=1}^{0} W_i \equiv 0$. In the insurance risk context $N_t$ is also referred to as the claim arrival process.

The collective risk model is often used in health insurance and in general insurance, whenever the main risk components are the number of insurance claims and the amount of the claims. It can also be used for modeling other non-insurance product risks, such as credit and operational risk (Chernobai, Rachev, and Fabozzi, 2007; Panjer, 2006). In the former, for example, the main risk components are the number of credit events (either defaults or downgrades), and the amount lost as a result of the credit event.

The simplicity of the risk process defined in eqn. (1.1) is only illusory. In most cases no analytical conclusions regarding the time evolution of the process can be drawn. However, it is this evolution that is important for practitioners, who have to calculate functionals of the risk process like the expected time to ruin and the ruin probability, see Chapter ??. The modeling of the aggregate claim process consists of modeling the counting process $\{N_t\}$ and the claim size sequence $\{X_k\}$. Both processes are usually assumed to be independent, hence can be treated independently of each other (Burnecki, Härdle, and Weron, 2004). Modeling of the claim arrival process $\{N_t\}$ is treated in Section 1.2, where we present efficient algorithms for four classes of processes. Modeling of claim severities and testing the goodness-of-fit is covered in Sections 1.3 and 1.4, respectively. Finally, in Section 1.5 we build a model for the Danish fire losses dataset, which concerns major fire losses in profits that occurred between 1980 and 2002 and were recorded by Copenhagen Re.

1.2 Claim Arrival Processes

In this section we focus on efficient simulation of the claim arrival process $\{N_t\}$. This process can be simulated either via the arrival times $\{T_i\}$, i.e. moments when the $i$-th claim occurs, or via the inter-arrival times (or waiting times) $W_i = T_i - T_{i-1}$, i.e. the time periods between successive claims (Burnecki and Weron, 2005). Note that in terms of the $W_i$'s the claim arrival process is given by $N_t = \sum_{n=1}^{\infty} I(T_n \le t)$. In what follows we discuss four examples of $\{N_t\}$: the classical (homogeneous) Poisson process, the non-homogeneous Poisson process, the mixed Poisson process and the renewal process.


1.2.1 Homogeneous Poisson Process (HPP)

The most common and best known claim arrival process is the homogeneous Poisson process (HPP). It has stationary and independent increments and the number of claims in a given time interval is governed by the Poisson law. While this process is normally appropriate in connection with life insurance modeling, it often suffers from the disadvantage of providing an inadequate fit to insurance data in other coverages with substantial temporal variability.

Formally, a continuous-time stochastic process $\{N_t : t \ge 0\}$ is a (homogeneous) Poisson process with intensity (or rate) $\lambda > 0$ if (i) $\{N_t\}$ is a counting process, and (ii) the waiting times $W_i$ are independent and identically distributed and follow an exponential law with intensity $\lambda$, i.e. with mean $1/\lambda$ (see Section 1.3.2). This definition naturally leads to a simulation scheme for the successive arrival times $T_i$ of the Poisson process on the interval $(0, t]$:

Algorithm HPP1 (Waiting times)
Step 1: set $T_0 = 0$
Step 2: generate an exponential random variable $E$ with intensity $\lambda$
Step 3: if $T_{i-1} + E < t$ then set $T_i = T_{i-1} + E$ and return to step 2, else stop

Sample trajectories of homogeneous (and non-homogeneous) Poisson processes are plotted in the top panels of Figure 1.1. The thin solid line is a HPP with intensity $\lambda = 1$ (left) and $\lambda = 10$ (right). Clearly the latter jumps more often.
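A direct transcription of Algorithm HPP1 may look as follows (a minimal Python sketch; the chapter's companion implementations are the Matlab routines STF2loss*.m, so the function name and use of NumPy's random generator here are our own illustration):

```python
import numpy as np

def simulate_hpp_waiting_times(lam, t, rng=None):
    """Algorithm HPP1: HPP arrival times on (0, t] via i.i.d.
    exponential waiting times with intensity lam (mean 1/lam)."""
    rng = np.random.default_rng() if rng is None else rng
    arrivals, T = [], 0.0
    while True:
        E = rng.exponential(scale=1.0 / lam)  # Step 2
        if T + E >= t:                        # Step 3: stop
            break
        T += E
        arrivals.append(T)
    return np.array(arrivals)

# One trajectory with lambda = 10 on (0, 5]; about 50 arrivals expected
print(simulate_hpp_waiting_times(10.0, 5.0).size)
```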

Alternatively, the homogeneous Poisson process can be simulated by applying the following property (Rolski et al., 1999). Given that $N_t = n$, the $n$ occurrence times $T_1, T_2, \ldots, T_n$ have the same distribution as the order statistics corresponding to $n$ i.i.d. random variables uniformly distributed on the interval $(0, t]$. Hence, the arrival times of the HPP on the interval $(0, t]$ can be generated as follows:

Algorithm HPP2 (Conditional theorem)
Step 1: generate a Poisson random variable $N$ with intensity $\lambda t$
Step 2: generate $N$ random variables $U_i$ distributed uniformly on $(0,1)$, i.e. $U_i \sim U(0,1)$, $i = 1, 2, \ldots, N$
Step 3: set $(T_1, T_2, \ldots, T_N) = t \cdot \mathrm{sort}\{U_1, U_2, \ldots, U_N\}$
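The conditional scheme avoids the loop entirely (a sketch under the same conventions as above):

```python
import numpy as np

def simulate_hpp_conditional(lam, t, rng=None):
    """Algorithm HPP2: given N ~ Poisson(lam * t), the arrival times are
    t times the order statistics of N uniform variates on (0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    N = rng.poisson(lam * t)                  # Step 1
    return t * np.sort(rng.uniform(size=N))   # Steps 2-3

print(simulate_hpp_conditional(10.0, 5.0).size)
```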


[Figure 1.1 appears here: top panels plot $N(t)$ against $t$ with legends $\lambda(t) = 1 + 0\cdot t$, $\lambda(t) = 1 + 0.1\cdot t$, $\lambda(t) = 1 + 1\cdot t$ (left) and $\lambda(t) = 10 + 0\cdot\cos(2\pi t)$, $\lambda(t) = 10 + 1\cdot\cos(2\pi t)$, $\lambda(t) = 10 + 10\cdot\cos(2\pi t)$ (right); the bottom panel, titled "Claim intensity November-December", plots the number of events per day against time (years).]

Figure 1.1: Top left panel: Sample trajectories of a NHPP with linear intensity $\lambda(t) = a + b\cdot t$. Note that the first process (with $b = 0$) is in fact a HPP. Top right panel: Sample trajectories of a NHPP with periodic intensity $\lambda(t) = a + b\cdot\cos(2\pi t)$. Again, the first process is a HPP. Bottom panel: Intensity of car accident claims in the Greater Wrocław area, Poland, in the period 1998-2000 (data from one of the major insurers in the region). Note the larger number of accidents in late Fall/early Winter due to worse weather conditions.

STF2loss01.m

In general, this algorithm will run faster than HPP1 as it does not involve a loop. The only two inherent numerical difficulties involve generating a Poisson random variable and sorting a vector of occurrence times. Whereas the latter problem can be solved, for instance, via the standard quicksort algorithm implemented in most statistical software packages (like sortrows.m in Matlab), the former requires more attention.

A straightforward algorithm to generate a Poisson random variable would take

$$N = \min\{n : U_1 \cdot \ldots \cdot U_n < \exp(-\lambda)\} - 1, \qquad (1.2)$$

which is a consequence of the properties of the HPP (see above). However, for large $\lambda$, this method can become slow as the expected run time is proportional to $\lambda$. Faster, but more complicated methods are available. Ahrens and Dieter (1982) suggested a generator which utilizes acceptance-complement with truncated normal variates for $\lambda > 10$ and reverts to table-aided inversion otherwise. Stadlober (1989) adapted the ratio of uniforms method for $\lambda > 5$ and classical inversion for small $\lambda$'s. Hörmann (1993) advocated the transformed rejection method, which is a combination of the inversion and rejection algorithms. Statistical software packages often use variants of these methods. For instance, Matlab's poissrnd.m function uses the waiting time method (1.2) for $\lambda < 15$ and Ahrens' and Dieter's method for larger values of $\lambda$.
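The waiting time method (1.2) itself takes only a few lines (an illustrative sketch; in practice one would simply call the library generator, e.g. NumPy's Generator.poisson):

```python
import numpy as np

def poisson_waiting_time_method(lam, rng=None):
    """Generate a Poisson(lam) variate via (1.2): multiply uniforms until
    the product drops below exp(-lam); expected run time is O(lam)."""
    rng = np.random.default_rng() if rng is None else rng
    threshold = np.exp(-lam)
    n, prod = 0, rng.uniform()
    while prod >= threshold:
        n += 1
        prod *= rng.uniform()
    return n

rng = np.random.default_rng(42)
sample = [poisson_waiting_time_method(4.0, rng) for _ in range(10_000)]
print(np.mean(sample))  # should be close to lam = 4
```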

Finally, since for the HPP the expected value of the process is $\mathrm{E}(N_t) = \lambda t$, it is natural to define the premium function as $c(t) = ct$, where $c = (1+\theta)\mu\lambda$ and $\mu = \mathrm{E}(X_k)$. The parameter $\theta > 0$ is the relative safety loading which "guarantees" survival of the insurance company. With such a choice of the premium function we obtain the classical form of the risk process:

$$R_t = u + (1+\theta)\mu\lambda t - \sum_{i=1}^{N_t} X_i. \qquad (1.3)$$
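Combining the HPP2 algorithm with i.i.d. claim sizes yields a trajectory of (1.3); the sketch below uses exponential claims purely for illustration (all names are ours):

```python
import numpy as np

def risk_process_hpp(u, theta, lam, mu, t, n_grid=1000, rng=None):
    """Simulate R_t = u + (1 + theta)*mu*lam*t - aggregate claims, with
    HPP claim arrivals and exponential claim sizes of mean mu."""
    rng = np.random.default_rng() if rng is None else rng
    N = rng.poisson(lam * t)                      # Algorithm HPP2
    arrivals = t * np.sort(rng.uniform(size=N))
    claims = rng.exponential(scale=mu, size=N)
    grid = np.linspace(0.0, t, n_grid)
    paid = np.array([claims[arrivals <= s].sum() for s in grid])
    return grid, u + (1.0 + theta) * mu * lam * grid - paid

grid, R = risk_process_hpp(u=10.0, theta=0.3, lam=5.0, mu=1.0, t=10.0)
print(R.min())  # minimal capital along one simulated trajectory
```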

1.2.2 Non-Homogeneous Poisson Process (NHPP)

The choice of a homogeneous Poisson process implies that the size of the portfolio cannot increase or decrease. In addition, it cannot describe situations, like in motor insurance, where claim occurrence epochs are likely to depend on the time of the year (worse weather conditions in Central Europe in late Fall/early Winter lead to more accidents, see the bottom panel in Figure 1.1) or of the week (heavier traffic occurs on Friday afternoons and before holidays).

For modeling such phenomena the non-homogeneous Poisson process (NHPP) is much better. The NHPP can be thought of as a Poisson process with a variable (but predictable) intensity defined by the deterministic intensity (or rate) function $\lambda(t)$. Note that the increments of a NHPP do not have to be stationary. In the special case when the intensity takes a constant value $\lambda(t) = \lambda$, the NHPP reduces to the homogeneous Poisson process with intensity $\lambda$.


The simulation of the process in the non-homogeneous case is slightly more complicated than for the HPP. The first approach, known as the thinning or rejection method, is based on the following fact (Bratley, Fox, and Schrage, 1987; Ross, 2002). Suppose that there exists a constant $\lambda$ such that $\lambda(t) \le \lambda$ for all $t$. Let $T_1, T_2, T_3, \ldots$ be the successive arrival times of a homogeneous Poisson process with intensity $\lambda$. If we accept the $i$-th arrival time $T_i$ with probability $\lambda(T_i)/\lambda$, independently of all other arrivals, then the sequence of the accepted arrival times (in ascending order) forms a sequence of the arrival times of a non-homogeneous Poisson process with the rate function $\lambda(t)$. The resulting simulation algorithm on the interval $(0, t]$ reads as follows:

Algorithm NHPP1 (Thinning)
Step 1: set $T_0 = 0$ and $T = 0$
Step 2: generate an exponential random variable $E$ with intensity $\lambda$
Step 3: if $T + E < t$ then set $T = T + E$, else stop
Step 4: generate a random variable $U$ distributed uniformly on $(0,1)$
Step 5: if $U < \lambda(T)/\lambda$ then set $T_i = T$ ($\rightarrow$ accept the arrival time)
Step 6: return to step 2

As mentioned in the previous section, the inter-arrival times of a homogeneous Poisson process have an exponential distribution. Therefore steps 2-3 generate the next arrival time of a homogeneous Poisson process with intensity $\lambda$. Steps 4-5 amount to rejecting (hence the name of the method) or accepting a particular arrival as part of the thinned process (hence the alternative name). Note that in this algorithm we generate a HPP with intensity $\lambda$ employing the HPP1 algorithm. We can also generate it using the HPP2 algorithm, which in general is much faster.
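A transcription of the thinning scheme (a sketch; the periodic intensity below is the one from Figure 1.1, and the dominating constant must satisfy $\lambda(t) \le \lambda$ on the whole interval):

```python
import numpy as np

def simulate_nhpp_thinning(intensity, lam, t, rng=None):
    """Algorithm NHPP1: simulate a HPP with intensity lam and accept an
    arrival at time T with probability intensity(T) / lam."""
    rng = np.random.default_rng() if rng is None else rng
    arrivals, T = [], 0.0
    while True:
        T += rng.exponential(scale=1.0 / lam)   # Steps 2-3
        if T >= t:
            break
        if rng.uniform() < intensity(T) / lam:  # Steps 4-5
            arrivals.append(T)
    return np.array(arrivals)

arr = simulate_nhpp_thinning(lambda s: 10 + 10 * np.cos(2 * np.pi * s), 20.0, 3.0)
print(arr.size)
```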

The second approach is based on the observation that for a NHPP with rate function $\lambda(t)$ the increment $N_t - N_s$, $0 < s < t$, is distributed as a Poisson random variable with intensity $\tilde{\lambda} = \int_s^t \lambda(u)\,du$ (Grandell, 1991). Hence, the cumulative distribution function $F_s$ of the waiting time $W_s$ is given by

$$F_s(t) = P(W_s \le t) = 1 - P(W_s > t) = 1 - P(N_{s+t} - N_s = 0) = 1 - \exp\left\{-\int_s^{s+t} \lambda(u)\,du\right\} = 1 - \exp\left\{-\int_0^t \lambda(s+v)\,dv\right\}.$$

If the function $\lambda(t)$ is such that we can find a formula for the inverse $F_s^{-1}$ for each $s$, we can generate a random quantity $X$ with the distribution $F_s$ by using the inverse transform method. The simulation algorithm on the interval $(0, t]$, often called the integration method, can be summarized as follows:

Algorithm NHPP2 (Integration)
Step 1: set $T_0 = 0$
Step 2: generate a random variable $U$ distributed uniformly on $(0,1)$
Step 3: if $T_{i-1} + F_s^{-1}(U) < t$ then set $T_i = T_{i-1} + F_s^{-1}(U)$ and return to step 2, else stop
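For the linear intensity $\lambda(s) = a + b\cdot s$ of Figure 1.1 the inverse is available in closed form: the waiting time $W$ at current time $s$ solves $(a + bs)W + \frac{b}{2}W^2 = E$ with $E = -\log(1-U) \sim \mathrm{Exp}(1)$. A sketch under this assumption (names ours):

```python
import numpy as np

def simulate_nhpp_integration(a, b, t, rng=None):
    """Algorithm NHPP2 for lambda(s) = a + b*s with b > 0: solve the
    quadratic (a + b*s)*W + (b/2)*W**2 = E for the next waiting time."""
    rng = np.random.default_rng() if rng is None else rng
    arrivals, T = [], 0.0
    while True:
        E = rng.exponential()                   # E = -log(1 - U)
        c = a + b * T
        W = (-c + np.sqrt(c * c + 2.0 * b * E)) / b
        if T + W >= t:
            break
        T += W
        arrivals.append(T)
    return np.array(arrivals)

print(simulate_nhpp_integration(1.0, 1.0, 5.0).size)
```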

The third approach utilizes a generalization of the property used in the HPP2 algorithm. Given that $N_t = n$, the $n$ occurrence times $T_1, T_2, \ldots, T_n$ of the non-homogeneous Poisson process have the same distribution as the order statistics corresponding to $n$ independent random variables distributed on the interval $(0, t]$, each with the common density function $f(v) = \lambda(v)/\int_0^t \lambda(u)\,du$, where $v \in (0, t]$. Hence, the arrival times of the NHPP on the interval $(0, t]$ can be generated as follows:

Algorithm NHPP3 (Conditional theorem)
Step 1: generate a Poisson random variable $N$ with intensity $\int_0^t \lambda(u)\,du$
Step 2: generate $N$ random variables $V_i$, $i = 1, 2, \ldots, N$, with density $f(v) = \lambda(v)/\int_0^t \lambda(u)\,du$
Step 3: set $(T_1, T_2, \ldots, T_N) = \mathrm{sort}\{V_1, V_2, \ldots, V_N\}$

The performance of the algorithm is highly dependent on the efficiency of the computer generator of the random variables $V_i$. Simulation of the $V_i$'s can be done either via the inverse transform method by integrating the density $f(v)$ or via the acceptance-rejection technique using the uniform distribution on the interval $(0, t)$ as the reference distribution. In a sense, the former approach leads to Algorithm NHPP2, whereas the latter one to Algorithm NHPP1.

Sample trajectories of non-homogeneous Poisson processes are plotted in the top panels of Figure 1.1. In the top left panel realizations of a NHPP with linear intensity $\lambda(t) = a + b\cdot t$ are presented for the same value of parameter $a$. Note that the higher the value of parameter $b$, the more pronounced is the increase in the intensity of the process. In the top right panel realizations of a NHPP with periodic intensity $\lambda(t) = a + b\cdot\cos(2\pi t)$ are illustrated, again for the same value of parameter $a$. This time, for high values of parameter $b$ the events exhibit a seasonal behavior. The process has periods of high activity (grouped around natural values of $t$) and periods of low activity, where almost no jumps take place. Such a process is much better suited to model the seasonal intensity of car accident claims (see the bottom panel in Figure 1.1) than the HPP.

Finally, we note that since in the non-homogeneous case the expected value of the process at time $t$ is $\mathrm{E}(N_t) = \int_0^t \lambda(s)\,ds$, it is natural to define the premium function as $c(t) = (1+\theta)\mu\int_0^t \lambda(s)\,ds$. Then the risk process takes the form:

$$R_t = u + (1+\theta)\mu\int_0^t \lambda(s)\,ds - \sum_{i=1}^{N_t} X_i. \qquad (1.4)$$

1.2.3 Mixed Poisson Process

In many situations the portfolio of an insurance company is diversified in the sense that the risks associated with different groups of policy holders are significantly different. For example, in motor insurance we might want to distinguish between male and female drivers or between drivers of different age. We would then assume that the claims come from a heterogeneous group of clients, each one of them generating claims according to a Poisson distribution with the intensity varying from one group to another.

Another practical reason for considering yet another generalization of the classical Poisson process is the following. If we measure the volatility of risk processes, expressed in terms of the index of dispersion $\mathrm{Var}(N_t)/\mathrm{E}(N_t)$, then often we obtain estimates in excess of one – a value obtained for the homogeneous and the non-homogeneous cases. These empirical observations led to the introduction of the mixed Poisson process (MPP), see Rolski et al. (1999).

In the mixed Poisson process the distribution of $\{N_t\}$ is given by a mixed Poisson distribution (Rolski et al., 1999). This means that, conditioning on an extrinsic random variable $\Lambda$ (called a structure variable), the random variable $\{N_t\}$ has a Poisson distribution. Typical examples for $\Lambda$ are two-point, gamma and general inverse Gaussian distributions (Teugels and Vynckier, 1996). Since for each $t$ the claim numbers $\{N_t\}$ up to time $t$ are Poisson variates with intensity $\Lambda t$, it is now reasonable to consider the premium function of the form $c(t) = (1+\theta)\mu\Lambda t$. This leads to the following representation of the risk process:

$$R_t = u + (1+\theta)\mu\Lambda t - \sum_{i=1}^{N_t} X_i. \qquad (1.5)$$

The MPP can be generated using the uniformity property: given that $N_t = n$, the $n$ occurrence times $T_1, T_2, \ldots, T_n$ have the same joint distribution as the order statistics corresponding to $n$ i.i.d. random variables uniformly distributed on the interval $(0, t]$ (Albrecht, 1982). The procedure starts with the simulation of $n$ as a realization of $N_t$ for a given value of $t$. This can be done in the following way: first a realization of a non-negative random variable $\Lambda$ is generated and, conditioned upon its realization, $N_t$ is simulated according to the Poisson law with parameter $\Lambda t$. Then we simulate $n$ uniform random numbers in $(0, t)$. After rearrangement, these values yield the sample $T_1 \le \ldots \le T_n$ of occurrence times. The algorithm is summarized below.

Algorithm MPP1 (Conditional theorem)
Step 1: generate a mixed Poisson random variable $N$ with intensity $\Lambda t$
Step 2: generate $N$ random variables $U_i$ distributed uniformly on $(0,1)$, i.e. $U_i \sim U(0,1)$, $i = 1, 2, \ldots, N$
Step 3: set $(T_1, T_2, \ldots, T_N) = t \cdot \mathrm{sort}\{U_1, U_2, \ldots, U_N\}$
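For a gamma structure variable, so that $N_t$ is negative binomial as noted above, the algorithm may be sketched as follows (the gamma parameters are arbitrary illustrations):

```python
import numpy as np

def simulate_mpp_gamma(a, b, t, rng=None):
    """Algorithm MPP1 with Lambda ~ Gamma(a, b): draw Lambda, then
    N ~ Poisson(Lambda * t), then the order statistics of N uniforms."""
    rng = np.random.default_rng() if rng is None else rng
    Lam = rng.gamma(shape=a, scale=1.0 / b)      # structure variable
    N = rng.poisson(Lam * t)                     # Step 1
    return t * np.sort(rng.uniform(size=N))      # Steps 2-3

print(simulate_mpp_gamma(2.0, 0.5, 10.0).size)
```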

1.2.4 Renewal Process

Generalizing the homogeneous Poisson process we come to the point where instead of making $\lambda$ non-constant, we can make a variety of different distributional assumptions on the sequence of waiting times $\{W_1, W_2, \ldots\}$ of the claim arrival process $\{N_t\}$. In some particular cases it might be useful to assume that the sequence is generated by a renewal process, i.e. the random variables $W_i$ are i.i.d., positive with a distribution function $F$. Note that the homogeneous Poisson process is a renewal process with exponentially distributed inter-arrival times. This observation lets us write the following algorithm for the generation of the arrival times of a renewal process on the interval $(0, t]$:


Algorithm RP1 (Waiting times)
Step 1: set $T_0 = 0$
Step 2: generate an $F$-distributed random variable $X$
Step 3: if $T_{i-1} + X < t$ then set $T_i = T_{i-1} + X$ and return to step 2, else stop

An important point in the previous generalizations of the Poisson process was the possibility to compensate risk and size fluctuations by the premiums. Thus, the premium rate had to be constantly adapted to the development of the claims. For renewal claim arrival processes, a constant premium rate allows for a constant safety loading (Embrechts and Klüppelberg, 1993). Let $\{N_t\}$ be a renewal process and assume that $W_1$ has finite mean $1/\lambda$. Then the premium function is defined in a natural way as $c(t) = (1+\theta)\mu\lambda t$, like for the homogeneous Poisson process, which leads to the risk process of the form (1.3).

1.3 Loss Distributions

There are three basic approaches to deriving the loss distribution: empirical, analytical, and moment based. The empirical method, presented in Section 1.3.1, can be used only when large data sets are available. In such cases a sufficiently smooth and accurate estimate of the cumulative distribution function (cdf) is obtained. Sometimes the application of curve fitting techniques – used to smooth the empirical distribution function – can be beneficial. If the curve can be described by a function with a tractable analytical form, then this approach becomes computationally efficient and similar to the second method.

The analytical approach is probably the most often used in practice and certainly the most frequently adopted in the actuarial literature. It reduces to finding a suitable analytical expression which fits the observed data well and which is easy to handle. Basic characteristics and estimation issues for the most popular and useful loss distributions are discussed in Sections 1.3.2-1.3.8.

Note that sometimes it may be helpful to subdivide the range of the claim size distribution into intervals for which different methods are employed. For example, the small and medium size claims could be described by the empirical claim size distribution, while the large claims – for which the scarcity of data eliminates the use of the empirical approach – by an analytical loss distribution.

In some applications the exact shape of the loss distribution is not required. We may then use the moment based approach, which consists of estimating only the lowest characteristics (moments) of the distribution, like the mean and variance. However, it should be kept in mind that even the lowest three or four moments do not fully define the shape of a distribution, and therefore the fit to the observed data may be poor. Further details on the moment based approach can be found e.g. in Daykin, Pentikainen, and Pesonen (1994).

Having a large collection of distributions to choose from, we need to narrow our selection to a single model and a unique parameter estimate. The type of the objective loss distribution can be easily selected by comparing the shapes of the empirical and theoretical mean excess functions. Goodness-of-fit can be measured by tests based on the empirical distribution function. Finally, the hypothesis that the modeled random event is governed by a certain loss distribution can be statistically tested. In Section 1.4 these statistical issues are thoroughly discussed.

1.3.1 Empirical Distribution Function

A natural estimate for the loss distribution is the observed (empirical) claim size distribution. However, if there have been changes in monetary values during the observation period, inflation corrected data should be used. For a sample of observations $\{x_1, \ldots, x_n\}$ the empirical distribution function (edf) is defined as:

$$F_n(x) = \frac{1}{n}\,\#\{i : x_i \le x\}, \qquad (1.6)$$

i.e. it is a piecewise constant function with jumps of size $1/n$ at points $x_i$. Very often, especially if the sample is large, the edf is approximated by a continuous, piecewise linear function with the "jump points" connected by linear functions, see Figure 1.2.
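The edf (1.6) is a one-liner in vectorized code (a sketch):

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function (1.6): the fraction of
    observations less than or equal to x (scalar or array x)."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, x, side="right") / sample.size

losses = np.array([0.8, 1.2, 1.9, 2.3, 5.1])
print(edf(losses, np.array([1.0, 2.0, 6.0])))  # [0.2 0.6 1.0]
```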

The empirical distribution function approach is appropriate only when there is a sufficiently large volume of claim data. This is rarely the case for the tail of the distribution, especially in situations where exceptionally large claims are possible. It is often advisable to divide the range of relevant values of claims into two parts, treating the claim sizes up to some limit on a discrete basis, while the tail is replaced by an analytical cdf.

If the claim statistics are too sparse to use the empirical approach it is desirable to find an explicit analytical expression for a loss distribution. It should be stressed, however, that many standard models in statistics – like the Gaussian distribution – are unsuitable for fitting the claim size distribution.

[Figure 1.2 appears here: two panels plotting cdf(x) against x; legend: Empirical df (edf), Lognormal cdf, Approx. of the edf.]

Figure 1.2: Left panel: Empirical distribution function (edf) of a 10-element log-normally distributed sample with parameters $\mu = 0.5$ and $\sigma = 0.5$, see Section 1.3.5. Right panel: Approximation of the edf by a continuous, piecewise linear function superimposed on the theoretical distribution function.

STF2loss02.m

The main reason for this is the strongly skewed nature of loss distributions. The log-normal, Pareto, Burr, and Weibull distributions are typical candidates to be considered in applications. However, before we review these probability laws we introduce two very versatile distributions – the exponential and gamma.

1.3.2 Exponential Distribution

Consider the random variable with the following density and distribution functions, respectively:

$$f(x) = \beta e^{-\beta x}, \quad x > 0, \qquad (1.7)$$

$$F(x) = 1 - e^{-\beta x}, \quad x > 0. \qquad (1.8)$$

This distribution is called the exponential distribution with parameter (or intensity) $\beta > 0$. The Laplace transform of (1.7) is

$$L(t) \stackrel{\mathrm{def}}{=} \int_0^\infty e^{-tx} f(x)\,dx = \frac{\beta}{\beta + t}, \quad t > -\beta, \qquad (1.9)$$


yielding the general formula for the $k$-th raw moment

$$m_k \stackrel{\mathrm{def}}{=} (-1)^k \left.\frac{\partial^k L(t)}{\partial t^k}\right|_{t=0} = \frac{k!}{\beta^k}. \qquad (1.10)$$

The mean and variance are thus $\beta^{-1}$ and $\beta^{-2}$, respectively. The maximum likelihood estimator (equal to the method of moments estimator) for $\beta$ is given by:

$$\hat{\beta} = \frac{1}{\hat{m}_1}, \qquad (1.11)$$

where

$$\hat{m}_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k \qquad (1.12)$$

is the sample $k$-th raw moment.

To generate an exponential random variable $X$ with intensity $\beta$ we can use the inverse transform (or inversion) method (Devroye, 1986; Ross, 2002). The method consists of taking a random number $U$ distributed uniformly on the interval $(0,1)$ and setting $X = F^{-1}(U)$, where $F^{-1}(x) = -\frac{1}{\beta}\log(1-x)$ is the inverse of the exponential cdf (1.8). In fact we can set $X = -\frac{1}{\beta}\log U$, since $(1-U)$ has the same distribution as $U$.
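In code the inversion is immediate (a sketch; a library routine such as NumPy's rng.exponential would normally be used directly):

```python
import numpy as np

def exponential_inverse_transform(beta, size, rng=None):
    """Inverse transform method: X = -log(U) / beta has cdf (1.8)."""
    rng = np.random.default_rng() if rng is None else rng
    return -np.log(rng.uniform(size=size)) / beta

x = exponential_inverse_transform(0.5, 100_000)
print(x.mean())  # approximately 1 / beta = 2
```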

The exponential distribution has many interesting features. As we have seen in Section 1.2.1, it arises as the inter-occurrence time of the events in a HPP. It has the memoryless property, i.e. $P(X > x + y \mid X > y) = P(X > x)$. Further, the $n$-th root of the Laplace transform (1.9) is

$$L(t) = \left(\frac{\beta}{\beta + t}\right)^{\frac{1}{n}}, \qquad (1.13)$$

which is the Laplace transform of a gamma variate (see Section 1.3.4). Thus the exponential distribution is infinitely divisible.

The exponential distribution is often used in developing models of insurance risks. This usefulness stems in a large part from its many and varied tractable mathematical properties. However, a disadvantage of the exponential distribution is that its density is monotone decreasing (see Figure 1.3), a situation which may not be appropriate in some practical situations.

[Figure 1.3 appears here: two panels plotting pdf(x) against x; legend: Exp(0.3), Exp(1), MixExp(0.5,0.3,1); the right panel uses a semi-logarithmic scale.]

Figure 1.3: Left panel: A probability density function (pdf) of a mixture of two exponential distributions with mixing parameter $a = 0.5$ superimposed on the pdfs of the component distributions. Right panel: Semi-logarithmic plots of the pdfs displayed in the left panel. The exponential pdfs are now straight lines with slopes $-\beta$. Note the curvature of the pdf of the mixture of two exponentials.

STF2loss03.m

1.3.3 Mixture of Exponential Distributions

Using the technique of mixing one can construct a wide class of distributions. The most commonly used in applications is a mixture of two exponentials, see Chapter ??. In Figure 1.3 a pdf of a mixture of two exponentials is plotted together with the pdfs of the mixing laws. Note that the mixing procedure can be applied to arbitrary distributions.

The construction goes as follows. Let $a_1, a_2, \ldots, a_n$ denote a series of non-negative weights satisfying $\sum_{i=1}^{n} a_i = 1$. Let $F_1(x), F_2(x), \ldots, F_n(x)$ denote an arbitrary sequence of exponential distribution functions given by the parameters $\beta_1, \beta_2, \ldots, \beta_n$, respectively. Then, the distribution function:

$$F(x) = \sum_{i=1}^{n} a_i F_i(x) = \sum_{i=1}^{n} a_i\{1 - \exp(-\beta_i x)\}, \qquad (1.14)$$

is called a mixture of $n$ exponential distributions (exponentials).

The density function of the constructed distribution is

$$f(x) = \sum_{i=1}^{n} a_i f_i(x) = \sum_{i=1}^{n} a_i \beta_i \exp(-\beta_i x), \qquad (1.15)$$

where $f_1(x), f_2(x), \ldots, f_n(x)$ denote the density functions of the input exponential distributions. The Laplace transform of (1.15) is

$$L(t) = \sum_{i=1}^{n} a_i \frac{\beta_i}{\beta_i + t}, \quad t > -\min_{i=1,\ldots,n}\{\beta_i\}, \qquad (1.16)$$

yielding the general formula for the $k$-th raw moment

$$m_k = \sum_{i=1}^{n} a_i \frac{k!}{\beta_i^k}. \qquad (1.17)$$

The mean is thus $\sum_{i=1}^{n} a_i \beta_i^{-1}$. The maximum likelihood and method of moments estimators for the mixture of $n$ ($n \ge 2$) exponential distributions can only be evaluated numerically.

Simulation of variates defined by (1.14) can be performed using the composition approach (Ross, 2002). First generate a random variable $I$, equal to $i$ with probability $a_i$, $i = 1, \ldots, n$. Then simulate an exponential variate with intensity $\beta_I$. Note that the method is general in the sense that it can be used for any set of distributions $F_i$.
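A sketch of the composition approach for a mixture of exponentials (names ours):

```python
import numpy as np

def mixexp_composition(weights, betas, size, rng=None):
    """Composition method for (1.14): pick component I with probability
    a_I, then draw an exponential variate with intensity beta_I."""
    rng = np.random.default_rng() if rng is None else rng
    weights, betas = np.asarray(weights), np.asarray(betas)
    I = rng.choice(weights.size, size=size, p=weights)
    return rng.exponential(scale=1.0 / betas[I])

# MixExp(0.5, 0.3, 1) from Figure 1.3; mean = 0.5/0.3 + 0.5/1 = 2.1667
x = mixexp_composition([0.5, 0.5], [0.3, 1.0], 100_000)
print(x.mean())
```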

1.3.4 Gamma Distribution

The probability law with density and distribution functions given by:

$$f(x) = \frac{\beta(\beta x)^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x > 0, \qquad (1.18)$$

$$F(x) = \int_0^x \frac{\beta(\beta s)^{\alpha-1} e^{-\beta s}}{\Gamma(\alpha)}\,ds, \quad x > 0, \qquad (1.19)$$

where $\alpha$ and $\beta$ are non-negative, is known as the gamma (or Pearson's Type III) distribution. In the above formulas

$$\Gamma(a) \stackrel{\mathrm{def}}{=} \int_0^\infty y^{a-1} e^{-y}\,dy \qquad (1.20)$$

is the standard gamma function. Moreover, for $\beta = 1$ the integral in (1.19):

$$\Gamma(\alpha, x) \stackrel{\mathrm{def}}{=} \frac{1}{\Gamma(\alpha)}\int_0^x s^{\alpha-1} e^{-s}\,ds \qquad (1.21)$$

is called the incomplete gamma function. If the shape parameter $\alpha = 1$, the exponential distribution results. If $\alpha$ is a positive integer, the distribution is called an Erlang law. If $\beta = \frac{1}{2}$ and $\alpha = \frac{\nu}{2}$ then it is called a chi-squared ($\chi^2$) distribution with $\nu$ degrees of freedom, see the top panels in Figure 1.4. Moreover, a mixed Poisson distribution with gamma mixing distribution is negative binomial.

The gamma distribution is closed under convolution, i.e. a sum of independent gamma variates with the same parameter $\beta$ is again gamma distributed with this $\beta$. Hence, it is infinitely divisible. Moreover, it is right-skewed and approaches a normal distribution in the limit as $\alpha$ goes to infinity.

The Laplace transform of the gamma distribution is given by:

$$L(t) = \left(\frac{\beta}{\beta + t}\right)^{\alpha}, \quad t > -\beta. \qquad (1.22)$$

The $k$-th raw moment can be easily derived from the Laplace transform:

$$m_k = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\beta^k}. \qquad (1.23)$$

Hence, the mean and variance are

$$\mathrm{E}(X) = \frac{\alpha}{\beta}, \qquad (1.24)$$

$$\mathrm{Var}(X) = \frac{\alpha}{\beta^2}. \qquad (1.25)$$

Finally, the method of moments estimators for the gamma distribution parameters have closed form expressions:

$$\hat{\alpha} = \frac{\hat{m}_1^2}{\hat{m}_2 - \hat{m}_1^2}, \qquad (1.26)$$

$$\hat{\beta} = \frac{\hat{m}_1}{\hat{m}_2 - \hat{m}_1^2}, \qquad (1.27)$$

but maximum likelihood estimators can only be evaluated numerically.
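The estimators (1.26)-(1.27) translate directly into code (a sketch):

```python
import numpy as np

def gamma_method_of_moments(x):
    """Estimators (1.26)-(1.27): alpha = m1^2/(m2 - m1^2),
    beta = m1/(m2 - m1^2), with m_k the sample raw moments (1.12)."""
    x = np.asarray(x)
    m1, m2 = x.mean(), (x ** 2).mean()
    denom = m2 - m1 ** 2
    return m1 ** 2 / denom, m1 / denom

rng = np.random.default_rng(0)
sample = rng.gamma(shape=3.0, scale=2.0, size=100_000)  # alpha=3, beta=0.5
print(gamma_method_of_moments(sample))  # approximately (3.0, 0.5)
```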

[Figure 1.4 appears here: four panels plotting pdf(x) against x; top legend: Gamma(1,2), Gamma(2,1), Gamma(3,0.5); bottom legend: LogN(2,1), LogN(2,0.1), LogN(0.5,2); the right-hand panels use a semi-logarithmic scale.]

Figure 1.4: Top panels: Three sample gamma pdfs, Gamma($\alpha, \beta$), on linear and semi-logarithmic plots. Note that the first one (black solid line) is an exponential law, while the last one (dashed blue line) is a $\chi^2$ distribution with $\nu = 6$ degrees of freedom. Bottom panels: Three sample log-normal pdfs, LogN($\mu, \sigma$), on linear and semi-logarithmic plots. For small $\sigma$ the log-normal distribution resembles the Gaussian.

STF2loss04.m

Unfortunately, simulation of gamma variates is not straightforward. For $\alpha < 1$ a simple but slow algorithm due to Jöhnk (1964) can be used, while for $\alpha > 1$ the rejection method is more efficient (Bratley, Fox, and Schrage, 1987; Devroye, 1986). The gamma law is one of the most important distributions for modeling because it has very tractable mathematical properties. As we have seen above it is also very useful in creating other distributions, but by itself is rarely a reasonable model for insurance claim sizes.


1.3.5 Log-Normal Distribution

Consider a random variable $X$ which has the normal distribution with density

$$f_N(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right\}, \quad -\infty < x < \infty. \qquad (1.28)$$

Let $Y = e^X$ so that $X = \log Y$. Then the probability density function of $Y$ is given by:

$$f(y) = f_N(\log y)\,\frac{1}{y} = \frac{1}{\sqrt{2\pi}\sigma y}\exp\left\{-\frac{1}{2}\frac{(\log y - \mu)^2}{\sigma^2}\right\}, \quad y > 0, \qquad (1.29)$$

where $\sigma > 0$ is the scale and $-\infty < \mu < \infty$ is the location parameter. The distribution of $Y$ is called log-normal; in econometrics it is also known as the Cobb-Douglas law. The log-normal cdf is given by:

$$F(y) = \Phi\left(\frac{\log y - \mu}{\sigma}\right), \quad y > 0, \qquad (1.30)$$

where $\Phi(\cdot)$ is the standard normal (with mean 0 and variance 1) distribution function. The $k$-th raw moment $m_k$ of the log-normal variate can be easily derived using results for normal random variables:

$$m_k = \mathrm{E}(Y^k) = \mathrm{E}(e^{kX}) = M_X(k) = \exp\left(\mu k + \frac{\sigma^2 k^2}{2}\right), \qquad (1.31)$$

where $M_X(z)$ is the moment generating function of the normal distribution. In particular, the mean and variance are

$$\mathrm{E}(Y) = \exp\left(\mu + \frac{\sigma^2}{2}\right), \qquad (1.32)$$

$$\mathrm{Var}(Y) = \left\{\exp(\sigma^2) - 1\right\}\exp(2\mu + \sigma^2), \qquad (1.33)$$

respectively. For both standard parameter estimation techniques the estimators are known in closed form. The method of moments estimators are given by:

$$\hat{\mu} = 2\log\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) - \frac{1}{2}\log\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right), \qquad (1.34)$$

$$\hat{\sigma}^2 = \log\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right) - 2\log\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right), \qquad (1.35)$$

while the maximum likelihood estimators by:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\log(x_i), \qquad (1.36)$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\{\log(x_i) - \hat{\mu}\}^2. \qquad (1.37)$$

Finally, the generation of a log-normal variate is straightforward. We simply have to take the exponent of a normal variate.

The log-normal distribution is very useful in modeling of claim sizes. For large $\sigma$ its tail is (semi-)heavy – heavier than the exponential but lighter than power-law, see the bottom panels in Figure 1.4. For small $\sigma$ the log-normal resembles a normal distribution, although this is not always desirable. It is infinitely divisible and closed under scale and power transformations. However, it also suffers from some drawbacks. Most notably, the Laplace transform does not have a closed form representation and the moment generating function does not exist.

1.3.6 Pareto Distribution

Suppose that a variate $X$ has (conditional on $\beta$) an exponential distribution with intensity $\beta$ (i.e. with mean $\beta^{-1}$, see Section 1.3.2). Further, suppose that $\beta$ itself has a gamma distribution (see Section 1.3.4). The unconditional distribution of $X$ is a mixture and is called the Pareto distribution. Moreover, it can be shown that if $X$ is an exponential random variable and $Y$ is a gamma random variable, then $X/Y$ is a Pareto random variable.

The density and distribution functions of a Pareto variate are given by:

$$f(x) = \frac{\alpha\lambda^\alpha}{(\lambda + x)^{\alpha+1}}, \quad x > 0, \qquad (1.38)$$

$$F(x) = 1 - \left(\frac{\lambda}{\lambda + x}\right)^{\alpha}, \quad x > 0, \qquad (1.39)$$

respectively. Clearly, the shape parameter $\alpha$ and the scale parameter $\lambda$ are both positive. The $k$-th raw moment:

$$m_k = \frac{\lambda^k\, k!\,\Gamma(\alpha - k)}{\Gamma(\alpha)}, \qquad (1.40)$$

[Figure 1.5 appears here: two panels plotting pdf(x) against x; legend: Par(0.5,2), Par(2,0.5), Par(2,1); the right panel uses a double-logarithmic scale.]

Figure 1.5: Three sample Pareto pdfs, Par($\alpha, \lambda$), on linear and double-logarithmic plots. The thick power-law tails of the Pareto distribution (asymptotically linear in the log-log scale) are clearly visible.

STF2loss05.m

exists only for $k < \alpha$. The mean and variance are thus:

$$\mathrm{E}(X) = \frac{\lambda}{\alpha - 1}, \qquad (1.41)$$

$$\mathrm{Var}(X) = \frac{\alpha\lambda^2}{(\alpha-1)^2(\alpha-2)}, \qquad (1.42)$$

respectively. Note that the mean exists only for $\alpha > 1$ and the variance only for $\alpha > 2$. Hence, the Pareto distribution has very thick (or heavy) tails, see Figure 1.5. The method of moments estimators are given by:

$$\hat{\alpha} = \frac{2(\hat{m}_2 - \hat{m}_1^2)}{\hat{m}_2 - 2\hat{m}_1^2}, \qquad (1.43)$$

$$\hat{\lambda} = \frac{\hat{m}_1\hat{m}_2}{\hat{m}_2 - 2\hat{m}_1^2}, \qquad (1.44)$$

where, as before, $\hat{m}_k$ is the sample $k$-th raw moment (1.12). Note that the estimators are well defined only when $\hat{m}_2 - 2\hat{m}_1^2 > 0$. Unfortunately, there are no closed form expressions for the maximum likelihood estimators and they can only be evaluated numerically.

Like for many other distributions the simulation of a Pareto variate $X$ can be conducted via the inverse transform method. The inverse of the cdf (1.39) has a simple analytical form $F^{-1}(x) = \lambda\{(1-x)^{-1/\alpha} - 1\}$. Hence, we can set $X = \lambda(U^{-1/\alpha} - 1)$, where $U$ is distributed uniformly on the unit interval.

We have to be cautious, however, when $\alpha$ is larger but very close to one. The theoretical mean exists, but the right tail is very heavy. The sample mean will, in general, be significantly lower than $\mathrm{E}(X)$.
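A sketch of the sampler together with the heavy-tail caveat:

```python
import numpy as np

def pareto_inverse_transform(alpha, lam, size, rng=None):
    """Inverse transform for cdf (1.39): X = lam * (U**(-1/alpha) - 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return lam * (rng.uniform(size=size) ** (-1.0 / alpha) - 1.0)

rng = np.random.default_rng(1)
x = pareto_inverse_transform(alpha=1.1, lam=1.0, size=100_000, rng=rng)
# Theoretical mean lam/(alpha - 1) = 10; the sample mean typically undershoots
print(x.mean())
```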

The Pareto law is very useful in modeling claim sizes in insurance, due in large part to its extremely thick tail. Its main drawback lies in its lack of mathematical tractability in some situations. Like for the log-normal distribution, the Laplace transform does not have a closed form representation and the moment generating function does not exist. Moreover, like the exponential pdf the Pareto density (1.38) is monotone decreasing, which may not be adequate in some practical situations.

1.3.7 Burr Distribution

Experience has shown that the Pareto formula is often an appropriate model for the claim size distribution, particularly where exceptionally large claims may occur. However, there is sometimes a need to find heavy-tailed distributions which offer greater flexibility than the Pareto law, including a non-monotone pdf. Such flexibility is provided by the Burr distribution and its additional shape parameter $\tau > 0$. If $Y$ has the Pareto distribution, then the distribution of $X = Y^{1/\tau}$ is known as the Burr distribution, see the top panels in Figure 1.6. Its density and distribution functions are given by:

$$f(x) = \tau\alpha\lambda^\alpha\frac{x^{\tau-1}}{(\lambda + x^\tau)^{\alpha+1}}, \quad x > 0, \qquad (1.45)$$

$$F(x) = 1 - \left(\frac{\lambda}{\lambda + x^\tau}\right)^{\alpha}, \quad x > 0, \qquad (1.46)$$

respectively. The $k$-th raw moment

$$m_k = \frac{1}{\Gamma(\alpha)}\lambda^{k/\tau}\,\Gamma\left(1 + \frac{k}{\tau}\right)\Gamma\left(\alpha - \frac{k}{\tau}\right) \qquad (1.47)$$

exists only for $k < \tau\alpha$. Naturally, the Laplace transform does not exist in a closed form and the distribution has no moment generating function, as was the case with the Pareto distribution.

[Figure 1.6 appears here: four panels plotting pdf(x) against x; top legend: Burr(0.5,2,1.5), Burr(0.5,0.5,5), Burr(2,1,0.5); bottom legend: Weib(1,0.5), Weib(1,2), Weib(0.01,6); the right-hand panels use double-logarithmic (top) and semi-logarithmic (bottom) scales.]

Figure 1.6: Top panels: Three sample Burr pdfs, Burr($\alpha, \lambda, \tau$), on linear and double-logarithmic plots. Note the heavy, power-law tails. Bottom panels: Three sample Weibull pdfs, Weib($\beta, \tau$), on linear and semi-logarithmic plots. We can see that for $\tau < 1$ the tails are much heavier and they look like power-law.

STF2loss06.m

The maximum likelihood and method of moments estimators for the Burr distribution can only be evaluated numerically. A Burr variate $X$ can be generated using the inverse transform method. The inverse of the cdf (1.46) has a simple analytical form $F^{-1}(x) = \left[\lambda\{(1-x)^{-1/\alpha} - 1\}\right]^{1/\tau}$. Hence, we can set $X = \left\{\lambda(U^{-1/\alpha} - 1)\right\}^{1/\tau}$, where $U$ is distributed uniformly on the unit interval. Like in the Pareto case, we have to be cautious when $\tau\alpha$ is larger but very close to one. The sample mean will generally be significantly lower than $\mathrm{E}(X)$.


1.3.8 Weibull Distribution

If $V$ is an exponential variate, then the distribution of $X = V^{1/\tau}$, $\tau > 0$, is called the Weibull (or Frechet) distribution. Its density and distribution functions are given by:

$$f(x) = \tau\beta x^{\tau-1} e^{-\beta x^\tau}, \quad x > 0, \qquad (1.48)$$

$$F(x) = 1 - e^{-\beta x^\tau}, \quad x > 0, \qquad (1.49)$$

respectively. For $\tau = 2$ it is known as the Rayleigh distribution. The Weibull law is roughly symmetrical for the shape parameter $\tau \approx 3.6$. When $\tau$ is smaller the distribution is right-skewed, when $\tau$ is larger it is left-skewed, see the bottom panels in Figure 1.6. We can also observe that for $\tau < 1$ the distribution becomes heavy-tailed. In this case, like for the Pareto and Burr distributions, the moment generating function is infinite. The $k$-th raw moment can be shown to be

$$m_k = \beta^{-k/\tau}\,\Gamma\left(1 + \frac{k}{\tau}\right). \qquad (1.50)$$

Like for the Burr distribution, the maximum likelihood and method of moments estimators can only be evaluated numerically. Similarly, Weibull variates can be generated using the inverse transform method.
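Both samplers follow the same inverse transform pattern (a sketch; parameterizations as in (1.46) and (1.49)):

```python
import numpy as np

def burr_inverse_transform(alpha, lam, tau, size, rng=None):
    """X = {lam * (U**(-1/alpha) - 1)}**(1/tau), inverting cdf (1.46)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=size)
    return (lam * (u ** (-1.0 / alpha) - 1.0)) ** (1.0 / tau)

def weibull_inverse_transform(beta, tau, size, rng=None):
    """X = (-log(U) / beta)**(1/tau), inverting cdf (1.49)."""
    rng = np.random.default_rng() if rng is None else rng
    return (-np.log(rng.uniform(size=size)) / beta) ** (1.0 / tau)

print(burr_inverse_transform(2.0, 1.0, 0.5, 5))
print(weibull_inverse_transform(1.0, 2.0, 5))
```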

1.4 Statistical Validation Techniques

Having a large collection of distributions to choose from we need to narrow our selection to a single model and a unique parameter estimate. The type of the objective loss distribution can be easily selected by comparing the shapes of the empirical and theoretical mean excess functions. The mean excess function, presented in Section 1.4.1, is based on the idea of conditioning a random variable given that it exceeds a certain level.

Once the distribution class is selected and the parameters are estimated using one of the available methods the goodness-of-fit has to be tested. A standard approach consists of measuring the distance between the empirical and the fitted analytical distribution function. A group of statistics and tests based on this idea is discussed in Section 1.4.2.


1.4.1 Mean Excess Function

For a claim amount random variable $X$, the mean excess function or mean residual life function is the expected payment per claim on a policy with a fixed amount deductible of $x$, where claims with amounts less than or equal to $x$ are completely ignored:

$$e(x) = \mathrm{E}(X - x \mid X > x) = \frac{\int_x^\infty\{1 - F(u)\}\,du}{1 - F(x)}. \qquad (1.51)$$

In practice, the mean excess function $e$ is estimated by $\hat{e}_n$ based on a representative sample $x_1, \ldots, x_n$:

$$\hat{e}_n(x) = \frac{\sum_{x_i > x} x_i}{\#\{i : x_i > x\}} - x. \qquad (1.52)$$

Note that in a financial risk management context, switching from the right tail to the left tail, $e(x)$ is referred to as the expected shortfall (Weron, 2004).

When considering the shapes of mean excess functions, the exponential distribution plays a central role. It has the memoryless property, meaning that whether the information $X > x$ is given or not, the expected value of $X - x$ is the same as if one started at $x = 0$ and calculated $\mathrm{E}(X)$. The mean excess function for the exponential distribution is therefore constant. One in fact easily calculates that for this case $e(x) = 1/\beta$ for all $x > 0$.

If the distribution of $X$ is heavier-tailed than the exponential distribution we find that the mean excess function ultimately increases; when it is lighter-tailed, $e(x)$ ultimately decreases. Hence, the shape of $e(x)$ provides important information on the sub-exponential or super-exponential nature of the tail of the distribution at hand.

Mean excess functions for the distributions discussed in Section 1.3 are given by the following formulas (and plotted in Figure 1.7):

• exponential:
$$e(x) = \frac{1}{\beta};$$

• mixture of two exponentials:
$$e(x) = \frac{\frac{a}{\beta_1}\exp(-\beta_1 x) + \frac{1-a}{\beta_2}\exp(-\beta_2 x)}{a\exp(-\beta_1 x) + (1-a)\exp(-\beta_2 x)};$$

[Figure 1.7 appears here: two panels plotting e(x) against x; left legend: Log-normal, Gamma (α<1), Gamma (α>1), Mix-Exp; right legend: Pareto, Weibull (τ<1), Weibull (τ>1), Burr.]

Figure 1.7: Left panel: Shapes of the mean excess function $e(x)$ for the log-normal, gamma (with $\alpha < 1$ and $\alpha > 1$) and a mixture of two exponential distributions. Right panel: Shapes of the mean excess function $e(x)$ for the Pareto, Weibull (with $\tau < 1$ and $\tau > 1$) and Burr distributions.

STF2loss07.m

• gamma:
$$e(x) = \frac{\alpha}{\beta}\cdot\frac{1 - F(x, \alpha+1, \beta)}{1 - F(x, \alpha, \beta)} - x = \beta^{-1}\{1 + o(1)\},$$
where $F(x, \alpha, \beta)$ is the gamma distribution function (1.19);

• log-normal:
$$e(x) = \frac{\exp\left(\mu + \frac{\sigma^2}{2}\right)\left\{1 - \Phi\left(\frac{\ln x - \mu - \sigma^2}{\sigma}\right)\right\}}{1 - \Phi\left(\frac{\ln x - \mu}{\sigma}\right)} - x = \frac{\sigma^2 x}{\ln x - \mu}\{1 + o(1)\},$$
where $o(1)$ stands for a term which tends to zero as $x \to \infty$;

• Pareto:
$$e(x) = \frac{\lambda + x}{\alpha - 1}, \quad \alpha > 1;$$


• Burr:
$$e(x) = \frac{\lambda^{1/\tau}\,\Gamma\left(\alpha - \frac{1}{\tau}\right)\Gamma\left(1 + \frac{1}{\tau}\right)}{\Gamma(\alpha)}\left(\frac{\lambda}{\lambda + x^\tau}\right)^{-\alpha}\left\{1 - B\left(1 + \frac{1}{\tau},\ \alpha - \frac{1}{\tau},\ \frac{x^\tau}{\lambda + x^\tau}\right)\right\} - x = \frac{x}{\alpha\tau - 1}\{1 + o(1)\}, \quad \alpha\tau > 1,$$
where $\Gamma(\cdot)$ is the standard gamma function (1.20) and
$$B(a, b, x) \stackrel{\mathrm{def}}{=} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^x y^{a-1}(1-y)^{b-1}\,dy \qquad (1.53)$$
is the beta function;

• Weibull:
$$e(x) = \frac{\Gamma\left(1 + \frac{1}{\tau}\right)}{\beta^{1/\tau}}\left\{1 - \Gamma\left(1 + \frac{1}{\tau},\ \beta x^\tau\right)\right\}\exp(\beta x^\tau) - x = \frac{x^{1-\tau}}{\beta\tau}\{1 + o(1)\},$$
where $\Gamma(\cdot, \cdot)$ is the incomplete gamma function (1.21).

1.4.2 Tests Based on the Empirical Distribution Function

A statistic measuring the difference between the empirical $F_n(x)$ and the fitted $F(x)$ distribution function, called an edf statistic, is based on the vertical difference between the distributions. This distance is usually measured either by a supremum or a quadratic norm (D'Agostino and Stephens, 1986).

The most popular supremum statistic:

$$D = \sup_x |F_n(x) - F(x)|, \qquad (1.54)$$

is known as the Kolmogorov or Kolmogorov-Smirnov statistic. It can also be written in terms of two supremum statistics:

$$D^+ = \sup_x\{F_n(x) - F(x)\} \quad \text{and} \quad D^- = \sup_x\{F(x) - F_n(x)\},$$

where the former is the largest vertical difference when $F_n(x)$ is larger than $F(x)$ and the latter is the largest vertical difference when it is smaller. The Kolmogorov statistic is then given by $D = \max(D^+, D^-)$. A closely related statistic proposed by Kuiper is simply a sum of the two differences, i.e. $V = D^+ + D^-$.

The second class of measures of discrepancy is given by the Cramer-von Mises family

$$Q = n\int_{-\infty}^{\infty}\{F_n(x) - F(x)\}^2\,\psi(x)\,dF(x), \qquad (1.55)$$

where $\psi(x)$ is a suitable function which gives weights to the squared difference $\{F_n(x) - F(x)\}^2$. When $\psi(x) = 1$ we obtain the $W^2$ statistic of Cramer-von Mises. When $\psi(x) = [F(x)\{1 - F(x)\}]^{-1}$ formula (1.55) yields the $A^2$ statistic of Anderson and Darling. From the definitions of the statistics given above, suitable computing formulas must be found. This can be done by utilizing the transformation $Z = F(X)$. When $F(x)$ is the true distribution function of $X$, the random variable $Z$ is uniformly distributed on the unit interval.

Suppose that a sample $x_1, \ldots, x_n$ gives values $z_i = F(x_i)$, $i = 1, \ldots, n$. It can be easily shown that, for values $z$ and $x$ related by $z = F(x)$, the corresponding vertical differences in the edf diagrams for $X$ and for $Z$ are equal. Consequently, edf statistics calculated from the empirical distribution function of the $z_i$'s compared with the uniform distribution will take the same values as if they were calculated from the empirical distribution function of the $x_i$'s, compared with $F(x)$. This leads to the following formulas given in terms of the order statistics $z_{(1)} < z_{(2)} < \cdots < z_{(n)}$:

$$D^+ = \max_{1\le i\le n}\left\{\frac{i}{n} - z_{(i)}\right\}, \qquad (1.56)$$

$$D^- = \max_{1\le i\le n}\left\{z_{(i)} - \frac{i-1}{n}\right\}, \qquad (1.57)$$

$$D = \max(D^+, D^-), \qquad (1.58)$$

$$V = D^+ + D^-, \qquad (1.59)$$

$$W^2 = \sum_{i=1}^{n}\left\{z_{(i)} - \frac{2i-1}{2n}\right\}^2 + \frac{1}{12n}, \qquad (1.60)$$

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left\{\log z_{(i)} + \log(1 - z_{(n+1-i)})\right\} \qquad (1.61)$$

$$= -n - \frac{1}{n}\sum_{i=1}^{n}\left\{(2i-1)\log z_{(i)} + (2n+1-2i)\log(1 - z_{(i)})\right\}. \qquad (1.62)$$

The general test of fit is structured as follows. The null hypothesis is that a specific distribution is acceptable, whereas the alternative is that it is not:

$$H_0: F_n(x) = F(x;\theta), \qquad H_1: F_n(x) \ne F(x;\theta),$$

where $\theta$ is a vector of known parameters. Small values of the test statistic $T$ are evidence in favor of the null hypothesis, large ones indicate its falsity. To see how unlikely such a large outcome would be if the null hypothesis was true, we calculate the $p$-value by:

$$p\text{-value} = P(T \ge t), \qquad (1.63)$$

where $t$ is the test value for a given sample. It is typical to reject the null hypothesis when a small $p$-value is obtained.
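The computing formulas (1.56)-(1.62) are a direct transcription (a sketch; the fitted cdf is passed in as a callable):

```python
import numpy as np

def edf_statistics(sample, cdf):
    """D+, D-, D, V, W2 and A2 from (1.56)-(1.62), computed via the
    transformation z = F(x) and the order statistics z_(1) <= ... <= z_(n)."""
    z = np.sort(cdf(np.asarray(sample)))
    n = z.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)                                     # (1.56)
    d_minus = np.max(z - (i - 1) / n)                              # (1.57)
    w2 = np.sum((z - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)   # (1.60)
    a2 = -n - np.sum((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))) / n  # (1.61)
    return {"D+": d_plus, "D-": d_minus, "D": max(d_plus, d_minus),
            "V": d_plus + d_minus, "W2": w2, "A2": a2}

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=500)
print(edf_statistics(x, lambda s: 1.0 - np.exp(-0.5 * s)))
```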

However, we are in a situation where we want to test the hypothesis that the sample has a common distribution function $F(x;\theta)$ with unknown $\theta$. To employ any of the edf tests we first need to estimate the parameters. It is important to recognize that when the parameters are estimated from the data, the critical values for the tests of the uniform distribution (or equivalently of a fully specified distribution) must be reduced. In other words, if the value of the test statistic $T$ is $d$, then the $p$-value is overestimated by $P_U(T \ge d)$. Here $P_U$ indicates that the probability is computed under the assumption of a uniformly distributed sample. Hence, if $P_U(T \ge d)$ is small, then the $p$-value will be even smaller and the hypothesis will be rejected. However, if it is large then we have to obtain a more accurate estimate of the $p$-value.

Ross (2002) advocates the use of Monte Carlo simulations in this context.

First the parameter vector is estimated for a given sample of size $n$, yielding $\hat{\theta}$, and the edf test statistic is calculated assuming that the sample is distributed according to $F(x;\hat{\theta})$, returning a value of $d$. Next, a sample of size $n$ of $F(x;\hat{\theta})$-distributed variates is generated. The parameter vector is estimated for this simulated sample, yielding $\hat{\theta}_1$, and the edf test statistic is calculated assuming that the sample is distributed according to $F(x;\hat{\theta}_1)$. The simulation is repeated as many times as required to achieve a certain level of accuracy. The estimate of the $p$-value is obtained as the proportion of times that the test quantity is at least as large as $d$.
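A sketch of this procedure for an exponential null hypothesis and the $A^2$ statistic (the sampler, the estimator and the statistic are stand-ins for whatever model is actually being tested):

```python
import numpy as np

def a2_statistic(sample, cdf):
    """Anderson-Darling statistic, formula (1.61)."""
    z = np.sort(cdf(np.asarray(sample)))
    n, i = z.size, np.arange(1, z.size + 1)
    return -n - np.sum((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))) / n

def mc_pvalue_exponential(sample, n_sim=999, rng=None):
    """Monte Carlo p-value for H0: exponential, re-estimating beta by
    maximum likelihood (1.11) on every simulated sample."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample)
    beta = 1.0 / sample.mean()
    d = a2_statistic(sample, lambda s: 1.0 - np.exp(-beta * s))
    exceed = 0
    for _ in range(n_sim):
        sim = rng.exponential(scale=1.0 / beta, size=sample.size)
        b1 = 1.0 / sim.mean()   # parameters re-estimated on simulated data
        if a2_statistic(sim, lambda s: 1.0 - np.exp(-b1 * s)) >= d:
            exceed += 1
    return exceed / n_sim

rng = np.random.default_rng(4)
print(mc_pvalue_exponential(rng.exponential(2.0, size=200), rng=rng))
```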

An alternative solution to the problem of unknown parameters was proposed by Stephens (1978). The half-sample approach consists of using only half the data to estimate the parameters, but then using the entire data set to conduct the test. In this case, the critical values for the uniform distribution can be applied, at least asymptotically. The quadratic edf tests seem to converge fairly rapidly to their asymptotic distributions (D'Agostino and Stephens, 1986). Although the method is much faster than the Monte Carlo approach, it is not invariant – depending on the choice of the half-samples different test values will be obtained and there is no way of increasing the accuracy.

As a side product, the edf tests supply us with a natural technique of estimating the parameter vector $\theta$. We can simply find such $\hat{\theta}$ that minimizes a selected edf statistic. Out of the four presented statistics $A^2$ is the most powerful when the fitted distribution departs from the true distribution in the tails (D'Agostino and Stephens, 1986). Since the fit in the tails is of crucial importance in most actuarial applications, $A^2$ is the recommended statistic for the estimation scheme.

1.5 Applications

In this section we illustrate some of the methods described earlier in the chapter. We conduct the analysis for the Danish fire losses dataset, which concerns major fire losses in Danish Krone (DKK) that occurred between 1980 and 2002 and were recorded by Copenhagen Re. Here we consider only losses in profits. The Danish fire losses dataset has been adjusted for inflation using the Danish consumer price index.

1.5.1 Calibration of Loss Distributions

We first look for the appropriate shape of the distribution. To this end we plot the empirical mean excess function for the analyzed data set, see Figure 1.8. Since the function represents an empirical mean above some threshold level, its values for high $x$'s are not credible, and we do not include them in the plot.

[Figure 1.8 appears here: a plot of the empirical mean excess function $\hat{e}_n(x)$ (DKK million) against $x$ (DKK million).]

Figure 1.8: The empirical mean excess function $\hat{e}_n(x)$ for the Danish fire data.

STF2loss08.m

Before we continue with calibration let us note that in recent years outlier-resistant or so-called robust estimates of parameters are becoming more widespread in risk modeling. Such models – called robust (statistics) models – were introduced by P.J. Huber in 1981 and applied to robust regression analysis (Huber, 2004). Under the robust approach the extreme data points are eliminated to avoid a situation when outliers drive future forecasts in an unwanted (such as worst-case scenario) direction. One of the first applications of robust analysis to insurance claim data can be found in Chernobai et al. (2006). In that paper the top 1% of the catastrophic losses were treated as outliers and excluded from the analysis. This procedure led to an improvement of the forecasting power of the considered models. Also the resulting ruin probabilities were more optimistic than those predicted by the classical model. It is important to note, however, that neither of the two approaches – classical or robust – is preferred over the other. Rather, in the presence of outliers, the robust model can be used to complement the classical one. Due to space limits, in this chapter we will only present the results of the classical approach. The robust approach can be easily conducted following the steps detailed in this section.

The Danish fire losses show a super-exponential pattern suggesting a log-normal, Pareto or Burr distribution as the most adequate for modeling. Hence, in what follows we fit only these three distributions. We apply two estimation schemes: maximum likelihood and $A^2$ statistic minimization. Out of the three fitted distributions only the log-normal has closed form expressions for
