Supplementary material for the calculation process of AIC
In the following, we describe the calculation process of AIC in detail. Suppose that a random variable $Y$ has a probability density function $f(y|\theta)$, where $\theta$ is the parameter vector. Given a set of independent realizations $y_1, y_2, \ldots, y_N$, the likelihood function of $\theta$ is defined as $L(\theta) = f(y_1|\theta) f(y_2|\theta) \cdots f(y_N|\theta)$. Let $g(y)$ be the probability density function that describes the true distribution of $Y$, and let $\hat{\theta}$ be the estimate of $\theta$ that maximizes the log-likelihood function $l(\theta) = \ln L(\theta)$. Because $l(\theta) = \sum_{i=1}^{N} \ln f(y_i|\theta)$, we get

$$\frac{1}{N} l(\theta) \to E \ln f(Y|\theta) = \int g(y) \ln f(y|\theta)\, dy, \quad N \to \infty. \tag{1}$$

So $\hat{\theta}$ is an estimate of the value of $\theta$ that maximizes $E \ln f(Y|\theta)$. According to Kullback's definition of relative entropy, $KL(P \| Q) = \int P(x) \log \frac{P(x)}{Q(x)}\, dx$ [32], we plug in the formula to get

$$S(g; f(Y|\theta)) = \int g(y) \ln \frac{g(y)}{f(y|\theta)}\, dy = E \ln g(Y) - E \ln f(Y|\theta). \tag{2}$$

By the non-negativity of the Kullback relative entropy,

$$\int g(y) \ln \frac{g(y)}{f(y|\theta)}\, dy \ge 0,$$

with equality if and only if $g(y)$ and $f(y|\theta)$ are the same distribution. Because $E \ln g(Y)$ does not depend on $\theta$, $E \ln f(Y|\theta)$ reaches its maximum when $E \ln g(Y) - E \ln f(Y|\theta)$ is 0. In terms of the Kullback principle, this amounts to finding the $f(y|\theta)$ closest to $g(y)$, which is essentially the same as maximum likelihood.
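As a rough numerical illustration of Eqs. (1) and (2), the following Python sketch assumes, purely for the example, that the true density g is standard normal and that the model f(y|theta) is normal with theta = (mu, sigma). These distributional choices and all function names are hypothetical and are not part of the original derivation. The sketch compares the sample average (1/N) l(theta) with the closed-form E ln f(Y|theta) and evaluates the relative entropy S for a few parameter values.

```python
import math
import random

# Assumption for illustration only: g is N(0, 1) and f(y | theta) is N(mu, sigma^2).
TRUE_MU, TRUE_SIGMA = 0.0, 1.0

def log_f(y, mu, sigma):
    """ln f(y | theta) for a normal model with theta = (mu, sigma)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)

def mean_log_likelihood(ys, mu, sigma):
    """(1/N) l(theta), the left-hand side of Eq. (1)."""
    return sum(log_f(y, mu, sigma) for y in ys) / len(ys)

def expected_log_f(mu, sigma):
    """E ln f(Y | theta) under g, in closed form for normal g and f."""
    second_moment = TRUE_SIGMA**2 + (TRUE_MU - mu)**2  # E (Y - mu)^2 under g
    return -0.5 * math.log(2 * math.pi * sigma**2) - second_moment / (2 * sigma**2)

def relative_entropy(mu, sigma):
    """S(g; f) = E ln g(Y) - E ln f(Y | theta), as in Eq. (2)."""
    return expected_log_f(TRUE_MU, TRUE_SIGMA) - expected_log_f(mu, sigma)

def draw_sample(n, seed=0):
    """Draw n independent realizations from the assumed true distribution g."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(n)]

if __name__ == "__main__":
    ys = draw_sample(20000)
    for mu, sigma in [(0.0, 1.0), (0.5, 1.0), (0.0, 2.0)]:
        print(mu, sigma,
              mean_log_likelihood(ys, mu, sigma),  # sample average
              expected_log_f(mu, sigma),           # its large-N limit
              relative_entropy(mu, sigma))         # non-negative, 0 only at the truth
```

Under these assumptions the relative entropy is zero only at (mu, sigma) = (0, 1), which is also where E ln f(Y|theta) is largest.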
We use $E^* S(g; f(Y|\hat{\theta}))$ as the criterion for evaluating $\hat{\theta}$. Here $\hat{\theta}$ is a function of the observations $x_1, x_2, \ldots, x_N$. The samples $x_1, x_2, \ldots, x_N$ and $y_1, y_2, \ldots, y_N$ are independent and identically distributed, and $E^*$ denotes the mathematical expectation with respect to the distribution of $x_1, x_2, \ldots, x_N$. When multiple models are compared, the term $E^* E \ln g(Y)$ in $E^* S(g; f(Y|\hat{\theta})) = E^*[E \ln g(Y) - E \ln f(Y|\hat{\theta})]$ is common to all models and can be omitted. So we just need a good estimate of $E^* E \ln f(Y|\hat{\theta})$.
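To make the target quantity $E^* E \ln f(Y|\hat{\theta})$ concrete, here is a minimal Monte Carlo sketch. It assumes, only for illustration, that the true distribution is standard normal and that the fitted model is normal with unknown mean and standard deviation; the function names, sample size, and number of replications are hypothetical choices. The inner expectation over Y is computed in closed form, and the outer expectation $E^*$ is approximated by averaging over repeated samples $x_1, \ldots, x_N$.

```python
import math
import random

def fit_normal(xs):
    """Maximum likelihood estimates (mu_hat, sigma_hat) of a normal model."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat)**2 for x in xs) / n
    return mu_hat, math.sqrt(var_hat)

def expected_log_f(mu, sigma):
    """E ln f(Y | mu, sigma) when Y ~ N(0, 1), the assumed true distribution."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (1.0 + mu**2) / (2 * sigma**2)

def estimate_target(n=50, replications=2000, seed=1):
    """Monte Carlo approximation of E* E ln f(Y | theta_hat)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # one observed sample
        mu_hat, sigma_hat = fit_normal(xs)             # theta_hat = theta_hat(x)
        total += expected_log_f(mu_hat, sigma_hat)     # inner expectation over Y
    return total / replications

# E ln g(Y): the term that is common to all candidate models.
E_LOG_G = expected_log_f(0.0, 1.0)

if __name__ == "__main__":
    target = estimate_target()
    print("E* E ln f(Y | theta_hat) ~", target)
    print("E ln g(Y)                =", E_LOG_G)
```

Because the relative entropy is non-negative, the estimated target should lie below E ln g(Y); the gap is the expected relative entropy of the fitted model under these assumptions.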
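Following the method in the literature [14], we introduce the likelihood ratio $m = \frac{\max L(\theta_0)}{\max L(\theta)}$. Then we get

$$\begin{aligned}
-2 \ln m &= -2 \ln \frac{\max L(\theta_0)}{\max L(\theta)} = 2[l(\hat{\theta}) - l(\theta_0)] \\
&= 2[\ln f(y|\hat{\theta}) - \ln f(y|\theta_0)] = 2\left[\ln \frac{f(y|\hat{\theta})}{f(y|\theta_0)}\right].
\end{aligned} \tag{3}$$

As $N \to \infty$, $-2 \ln m$ asymptotically follows the chi-square distribution with $t$ degrees of freedom, where $t$ is the dimension of the parameter vector $\theta$. In other words, $E\{2[l(\hat{\theta}) - l(\theta_0)]\} = t$ asymptotically. The formulas are as follows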
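The statement that $2[l(\hat{\theta}) - l(\theta_0)]$ has expectation close to $t$ can be checked by simulation. The sketch below is an illustration under assumptions that are not in the original text: the data are standard normal, $\theta_0 = (0, 1)$ in terms of mean and variance, and the model has $t = 2$ free parameters. The function names, sample size, and replication count are hypothetical.

```python
import math
import random

def log_likelihood(ys, mu, var):
    """l(theta) = sum_i ln f(y_i | theta) for a normal model, theta = (mu, var)."""
    n = len(ys)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((y - mu)**2 for y in ys) / (2 * var))

def lr_statistic(ys, mu0=0.0, var0=1.0):
    """-2 ln m = 2 [ l(theta_hat) - l(theta_0) ], as in Eq. (3)."""
    n = len(ys)
    mu_hat = sum(ys) / n                               # MLE of the mean
    var_hat = sum((y - mu_hat)**2 for y in ys) / n     # MLE of the variance
    return 2.0 * (log_likelihood(ys, mu_hat, var_hat)
                  - log_likelihood(ys, mu0, var0))

def simulate(n=100, replications=2000, seed=2):
    """Draw repeated samples from N(0, 1) and return the statistic for each."""
    rng = random.Random(seed)
    return [lr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(replications)]

if __name__ == "__main__":
    stats = simulate()
    print("average of 2[l(theta_hat) - l(theta_0)]:", sum(stats) / len(stats))
    print("number of parameters t:", 2)
```

For a finite sample size the average is only approximately equal to t; the agreement is expected to improve as N grows.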
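$$\begin{aligned}
2 l(\theta) &= 2 \sum_i \ln f(y_i|\theta) \approx 2N \int f(y|\theta_0) \ln f(y|\theta)\, dy, \\
2 E^* l(\theta) &= 2 \sum_i \int f(x_i|\theta_0) \ln f(x_i|\theta)\, dx_i = 2N \int f(x|\theta_0) \ln f(x|\theta)\, dx.
\end{aligned} \tag{4}$$

From formula (4), we know that the shape of $2 l(\theta)$ in the neighborhood of $\hat{\theta}$ can be approximated by the shape of $2 E^* l(\theta)$ in the neighborhood of $\theta_0$. Both $2 l(\theta)$ and $2 E^* l(\theta)$ are approximated by quadric surfaces, with vertices at $\hat{\theta}$ and $\theta_0$ respectively. This means that $2 E^* l(\theta_0)$ is, on average, $t$ higher than $2 E^* l(\hat{\theta})$. Since $2 l(\hat{\theta})$ is in turn $t$ higher than $2 l(\theta_0)$ on average, the estimate of $E\{2 E^* l(\hat{\theta})\} = 2N E E^* \ln f(Y|\hat{\theta})$ is $2 l(\hat{\theta}) - 2t$. Then we can get

$$AIC = -2 l(\hat{\theta}) + 2t. \tag{5}$$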
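Formula (5) can be applied directly once the maximized log-likelihood and the number of parameters are known. The following sketch compares two candidate normal models on a small data set; the two models, the data, and the function names are hypothetical and serve only to show how the criterion is evaluated.

```python
import math

def normal_log_likelihood(ys, mu, var):
    """l(theta) for a normal model with mean mu and variance var."""
    n = len(ys)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((y - mu)**2 for y in ys) / (2 * var))

def aic(max_log_likelihood, t):
    """AIC = -2 l(theta_hat) + 2 t, as in formula (5)."""
    return -2.0 * max_log_likelihood + 2.0 * t

def aic_fixed_variance(ys):
    """Hypothetical model A: free mean, variance fixed at 1, so t = 1."""
    mu_hat = sum(ys) / len(ys)
    return aic(normal_log_likelihood(ys, mu_hat, 1.0), 1)

def aic_free_variance(ys):
    """Hypothetical model B: free mean and variance, so t = 2."""
    n = len(ys)
    mu_hat = sum(ys) / n
    var_hat = sum((y - mu_hat)**2 for y in ys) / n
    return aic(normal_log_likelihood(ys, mu_hat, var_hat), 2)

if __name__ == "__main__":
    ys = [-1.0, 0.0, 1.0]  # toy data, chosen only for illustration
    print("AIC, model A (t = 1):", aic_fixed_variance(ys))
    print("AIC, model B (t = 2):", aic_free_variance(ys))
```

The model with the smaller AIC is preferred. In this toy example the extra parameter of model B raises the log-likelihood by less than the penalty of 2 per parameter, so model A has the lower AIC.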