 Supplementary material for the calculation process of AIC


In the following, we describe the calculation process of AIC in detail. Suppose that a random variable $Y$ has a probability density function $f(y \mid \theta)$, where $\theta$ is the parameter vector. Given a set of independent realizations $y_1, y_2, \ldots, y_N$, the likelihood function of $\theta$ is defined as $L(\theta) = f(y_1 \mid \theta)\, f(y_2 \mid \theta) \cdots f(y_N \mid \theta)$. Let $g(y)$ be the probability density function that describes the true distribution of $Y$, and let $\hat{\theta}$ be the estimate of $\theta$ that maximizes the log-likelihood function $l(\theta) = \ln L(\theta)$. Because $l(\theta) = \sum_{i=1}^{N} \ln f(y_i \mid \theta)$, we get

$$\frac{1}{N}\, l(\theta) \to E \ln f(Y \mid \theta) = \int g(y) \ln f(y \mid \theta)\, dy, \quad N \to \infty. \tag{1}$$
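The convergence in formula (1) can be illustrated numerically. The following is a minimal sketch, assuming (purely for illustration) that the true density $g$ is the standard normal and that $f(y \mid \theta)$ is a Gaussian with $\theta = (\mu, \sigma)$; the function names and the chosen parameter value are hypothetical and not part of the original material.

```python
import math
import random

def log_f(y, mu, sigma):
    """Log-density ln f(y | theta) of a Gaussian model with theta = (mu, sigma)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def mean_log_likelihood(ys, mu, sigma):
    """(1/N) * l(theta) for the sample ys."""
    return sum(log_f(y, mu, sigma) for y in ys) / len(ys)

def expected_log_f(mu, sigma):
    """E ln f(Y | theta) in closed form, assuming the true density g is N(0, 1)."""
    # Under g = N(0, 1), E (Y - mu)^2 = 1 + mu^2.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (1 + mu**2) / (2 * sigma**2)

rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(200_000)]  # draws from the assumed g
theta = (0.5, 1.5)  # an arbitrary, deliberately misspecified parameter value

sample_value = mean_log_likelihood(ys, *theta)
limit_value = expected_log_f(*theta)
print(sample_value, limit_value)
```

For a large sample the two printed values should be close, which is what formula (1) asserts for $N \to \infty$.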

So ^ is the largest  estimate of ^E^{ln ( | )}^{f Y}  . According to Kullback's definition of

(2)

relative entropy

( / / ) ( ) log ( ) ( ) KL P Q P x P x dx





Q x [32], we plug in the formula to get

( ; ( | )) ( ) ln ( ) ln ( ) ln ( | )

( | )

S g f Y g y g y dy E g Y E f Y

 f y 





   _.₍₂₎

By the non-negativity of the Kullback relative entropy, $\int g(y) \ln \frac{g(y)}{f(y \mid \theta)}\, dy \ge 0$, with equality if and only if the distributions $g(y)$ and $f(y \mid \theta)$ are the same. Since $E \ln g(Y)$ does not depend on $\theta$, $E \ln f(Y \mid \theta)$ attains its maximum when $E \ln g(Y) - E \ln f(Y \mid \theta)$ is 0. In terms of the Kullback principle, this means finding the $f(y \mid \theta)$ closest to $g(y)$, which is essentially the same as maximum likelihood.
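The non-negativity property and formula (2) can be checked on a simple case. The sketch below assumes, for illustration only, that $g$ is the standard normal and $f(y \mid \theta)$ is a Gaussian with $\theta = (\mu, \sigma)$. It compares a trapezoidal approximation of the integral with the closed form of $E \ln g(Y) - E \ln f(Y \mid \theta)$ for this pair; all names are hypothetical.

```python
import math

def log_normal_pdf(y, mu, sigma):
    """ln of the N(mu, sigma^2) density at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (y - mu) ** 2 / (2 * sigma**2)

def s_numeric(mu, sigma, lo=-12.0, hi=12.0, steps=48_000):
    """Trapezoidal approximation of the integral of g(y) ln(g(y) / f(y | theta)) dy."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        log_g = log_normal_pdf(y, 0.0, 1.0)       # assumed true density g = N(0, 1)
        log_f = log_normal_pdf(y, mu, sigma)      # model density f(y | theta)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(log_g) * (log_g - log_f)
    return total * h

def s_closed_form(mu, sigma):
    """E ln g(Y) - E ln f(Y | theta) for g = N(0, 1) and f = N(mu, sigma^2)."""
    return math.log(sigma) + (1 + mu**2) / (2 * sigma**2) - 0.5

s_misspecified = s_numeric(0.5, 1.5)  # f differs from g, so S should be positive
s_matched = s_numeric(0.0, 1.0)       # f equals g, so S should vanish
print(s_misspecified, s_closed_form(0.5, 1.5), s_matched)
```

In this setting the divergence is strictly positive for the mismatched model and zero when the model coincides with $g$, in line with the equality condition stated above.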

We use $E^* S(g; f(Y \mid \hat{\theta}))$ as the criterion to evaluate $\hat{\theta}$. Here $\hat{\theta}$ is a function of the observations $x_1, x_2, \ldots, x_N$. The samples $x_1, x_2, \ldots, x_N$ and $y_1, y_2, \ldots, y_N$ are independent and identically distributed, and $E^*$ denotes the mathematical expectation with respect to the distribution of $x_1, x_2, \ldots, x_N$. When multiple models are compared, the term $E^* E \ln g(Y)$ in $E^* S(g; f(Y \mid \hat{\theta})) = E^*[E \ln g(Y) - E \ln f(Y \mid \hat{\theta})]$ is common to all models and can be omitted. So we just need a good estimate of $E^* E \ln f(Y \mid \hat{\theta})$.
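To see why $E^* E \ln f(Y \mid \hat{\theta})$ needs its own estimate, it may help to compare it with the in-sample value $\frac{1}{N} l(\hat{\theta})$ in a simulation. The sketch below assumes a Gaussian model with $\theta = (\mu, \sigma^2)$ and a standard normal true density; $E^*$ is approximated by averaging over repeated data sets. The sample size, number of repetitions and function names are illustrative assumptions.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood estimate (mu_hat, sigma2_hat) of a Gaussian model."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

def in_sample_mean_loglik(sigma2_hat):
    """(1/N) * l(theta_hat), evaluated on the same data used for fitting."""
    return -0.5 * math.log(2 * math.pi * sigma2_hat) - 0.5

def expected_loglik(mu_hat, sigma2_hat):
    """E ln f(Y | theta_hat) for a fresh Y ~ N(0, 1), in closed form."""
    return -0.5 * math.log(2 * math.pi * sigma2_hat) - (1 + mu_hat**2) / (2 * sigma2_hat)

rng = random.Random(1)
n, reps = 50, 5000
in_sample = 0.0
out_of_sample = 0.0
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one simulated data set x_1..x_N
    mu_hat, sigma2_hat = fit_gaussian(xs)
    in_sample += in_sample_mean_loglik(sigma2_hat) / reps
    out_of_sample += expected_loglik(mu_hat, sigma2_hat) / reps  # averages over x: E*

optimism = in_sample - out_of_sample
print(in_sample, out_of_sample, optimism)
```

Under these assumptions the in-sample value is systematically larger than the expected log-likelihood on new data, so it cannot be used as the estimate without a correction.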

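Following the method in [14], we introduce the likelihood ratio $\lambda_m = \dfrac{\max L(\theta_0)}{\max L(\hat{\theta})}$. Then we get

$$\begin{aligned}
-2 \ln \lambda_m &= -2 \ln \frac{\max L(\theta_0)}{\max L(\hat{\theta})} = 2[l(\hat{\theta}) - l(\theta_0)] \\
&= 2 \sum_{i=1}^{N} [\ln f(y_i \mid \hat{\theta}) - \ln f(y_i \mid \theta_0)] = 2 \sum_{i=1}^{N} \left[\ln \frac{f(y_i \mid \hat{\theta})}{f(y_i \mid \theta_0)}\right].
\end{aligned} \tag{3}$$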

When $N \to \infty$, $-2 \ln \lambda_m$ asymptotically follows the chi-square distribution with $t$ degrees of freedom, where $t$ is the dimension of the parameter vector $\theta$. In other words, $E\{2[l(\hat{\theta}) - l(\theta_0)]\} = t$ asymptotically. The formulas are as follows

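$$\begin{aligned}
2\, l(\theta) &= 2 \sum_{i=1}^{N} \ln f(y_i \mid \theta) \approx 2N \int f(y \mid \theta_0) \ln f(y \mid \theta)\, dy, \\
2\, E^* l(\theta) &= 2 \sum_{i=1}^{N} \int f(x_i \mid \theta_0) \ln f(x_i \mid \theta)\, dx_i = 2N \int f(x \mid \theta_0) \ln f(x \mid \theta)\, dx.
\end{aligned} \tag{4}$$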
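The statement that $2[l(\hat{\theta}) - l(\theta_0)]$ has expectation $t$ for large $N$ can be checked by simulation. The sketch below assumes a Gaussian model with $\theta = (\mu, \sigma^2)$, so $t = 2$, and data generated at $\theta_0 = (0, 1)$. The sample size, number of repetitions and function names are illustrative assumptions rather than part of the original derivation.

```python
import math
import random

def gaussian_loglik(ys, mu, sigma2):
    """Log-likelihood l(theta) of a Gaussian model with theta = (mu, sigma2)."""
    n = len(ys)
    rss = sum((y - mu) ** 2 for y in ys)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def lr_statistic(ys, mu0=0.0, sigma2_0=1.0):
    """2 * [l(theta_hat) - l(theta_0)], the likelihood-ratio statistic."""
    n = len(ys)
    mu_hat = sum(ys) / n                                   # ML estimate of mu
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n    # ML estimate of sigma^2
    return 2.0 * (gaussian_loglik(ys, mu_hat, sigma2_hat)
                  - gaussian_loglik(ys, mu0, sigma2_0))

rng = random.Random(2)
n, reps, t = 200, 2000, 2  # t = dimension of theta = (mu, sigma2)
stats = [lr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
mean_stat = sum(stats) / reps
print(mean_stat)  # should be near t under these assumptions
```

Each simulated statistic is non-negative because $\hat{\theta}$ maximizes the log-likelihood, and the average over repetitions should be close to $t$.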

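From formula (4), we know that the shape of $2l(\theta)$ near $\theta = \hat{\theta}$ can be approximated by the shape of $2E^* l(\theta)$ near $\theta = \theta_0$. Both $2l(\theta)$ and $2E^* l(\theta)$ are approximated by quadric surfaces, with vertices $\hat{\theta}$ and $\theta_0$ respectively. This means that $2E^* l(\theta_0)$ is on average $t$ higher than $2E^* l(\hat{\theta})$. Likewise, $2l(\hat{\theta})$ is on average $t$ higher than $2l(\theta_0)$, whose expectation is $2E^* l(\theta_0)$. So the estimate of $E\{2E^* l(\hat{\theta})\} = 2N\, E\, E^* \ln f(Y \mid \hat{\theta})$ is $2l(\hat{\theta}) - 2t$. Then we get

$$AIC = -2\, l(\hat{\theta}) + 2t. \tag{5}$$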
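As a usage illustration of formula (5), the sketch below computes AIC for two candidate Gaussian models fitted to the same simulated data: one that estimates only the mean with the variance fixed at 1 ($t = 1$), and one that estimates both mean and variance ($t = 2$). The data-generating distribution, the candidate models and all function names are assumptions chosen for the example.

```python
import math
import random

def aic(max_loglik, t):
    """AIC = -2 * l(theta_hat) + 2 * t, with t free parameters."""
    return -2.0 * max_loglik + 2.0 * t

def loglik_fixed_variance(ys):
    """Maximized log-likelihood of N(mu, 1): only mu is estimated (t = 1)."""
    n = len(ys)
    mu_hat = sum(ys) / n
    rss = sum((y - mu_hat) ** 2 for y in ys)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * rss

def loglik_free_variance(ys):
    """Maximized log-likelihood of N(mu, sigma^2): mu and sigma^2 estimated (t = 2)."""
    n = len(ys)
    mu_hat = sum(ys) / n
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n
    # At the ML estimate the residual term reduces to n / 2.
    return -0.5 * n * math.log(2 * math.pi * sigma2_hat) - 0.5 * n

rng = random.Random(3)
ys = [rng.gauss(0.0, 3.0) for _ in range(500)]  # simulated data with standard deviation 3

aic_fixed = aic(loglik_fixed_variance(ys), t=1)
aic_free = aic(loglik_free_variance(ys), t=2)
best = "free variance" if aic_free < aic_fixed else "fixed variance"
print(aic_fixed, aic_free, best)
```

The model with the smaller AIC is preferred. Here the fixed-variance model fits the data badly, so the extra parameter of the second model is worth its penalty of 2.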