
3. UNDERSHOOTING

Our experiments with the penalty term of Eq. 4.12 indicate that the additional error flow is able to prevent the learning of trend-following models. In Fig. 4.7 we combine the penalty term of Eq. 4.12 with the basic ECNN of Fig. 4.5.

Figure 4.7 Combining Alternating Errors and ECNN. The additional output layers (y_τ − y_τ^d)^2 are used to compute the penalty term of Eq. 4.12 during the training of the network. We provide the output layers (y_τ − y_τ^d)^2 with task-invariant target values of 0 and apply the mean square error function (Eq. 5.1, [PDP00, p. 413-4]), because we want to minimize the autocovariance of the residuals. Note that we can omit the calculation of the alternating errors in the recall (testing) phase of the network.

The integration of alternating errors into the ECNN is natural: The error correction mechanism of the ECNN provides the model error z_τ = (y_τ − y_τ^d) at each time step of the unfolding, since the model error is required by the ECNN as an additional input. Thus, we connect the output clusters z_τ in pairs to another output cluster, which uses a squared error function. This is done by using fixed identity matrices Id. Note that the additional output layers (y_τ − y_τ^d)^2 are only required during the training of the ECNN.

Due to the initialization shock, we do not calculate a penalty term for the first pair of model errors. Note that the proposed ECNN of Fig. 4.7 has no additional weights, because we only reuse the model errors that already exist at the different time steps of the unfolding.
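Eq. 4.12 itself is not reproduced in this section. As a rough illustration only, the following sketch assumes that the penalty sums squared pairs of consecutive residuals (z_{τ-1} + z_τ)^2, as the paired squared output layers of Fig. 4.7 suggest; the function and parameter names (alternating_error_penalty, lam) are illustrative, not from the text.

```python
# Sketch (not the book's code): alternating-errors penalty on ECNN residuals.
# Assumption: the penalty of Eq. 4.12 averages (z_{tau-1} + z_tau)^2 over the
# unfolding, skipping the first pair because of the initialization shock.
import torch

def alternating_error_penalty(residuals: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """residuals: tensor of shape (T,) holding z_tau = y_tau - y_tau^d along the unfolding."""
    pairs = residuals[1:-1] + residuals[2:]   # (z_1 + z_2), ..., skipping (z_0 + z_1)
    return lam * torch.mean(pairs ** 2)       # driven towards 0, i.e. alternating signs

# Usage: total_loss = forecast_mse + alternating_error_penalty(z, lam=0.1)
# The penalty is dropped again in the recall (testing) phase.
```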

3.1 UNIFORM CAUSALITY

Uniform causality provides the basis for the embedding of time-discrete systems. Using this principle, we are able to forecast on a finer time grid than that of the data [ZN01, p. 328].

First, let us assume that we have to identify an autonomous model for a given time series s_0, s_1, s_2, ..., s_T. The most obvious approach is to build a discrete-time model whose time grid is equal to the grid of the data [Hay94, p. 666-7]:

s_{n+1} = f(s_n) .        (4.13)

By iteration of f we obtain the flow F:

s_n = F(s_0, n) := \underbrace{f \circ \cdots \circ f}_{n}(s_0)   for n \in \mathbb{N} .        (4.14)

It is a trivial exercise to work with a wider-meshed time grid for the model, e. g. to use daily data in order to develop a weekly forecast model. Here, we reflect on the consequences of a refinement of the model time grid relative to the time grid of the data, e. g. to build a weekly model from monthly data. At first glance this may be understood as an interpolation between the discretely measured values s_n [ZN01, p. 328].
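Before refining the grid, note that the flow of Eq. 4.14 is nothing more than the n-fold composition of the one-step map. A minimal sketch with a toy one-step function (illustrative names and dynamics, not from the text):

```python
# Sketch: discrete flow F(s0, n) = f composed n times (Eq. 4.14).
def flow(f, s0, n: int):
    s = s0
    for _ in range(n):      # F(s0, n) = f(...f(f(s0))...), n applications of f
        s = f(s)
    return s

f = lambda s: 0.9 * s + 0.1     # toy one-step dynamics
print(flow(f, 2.0, 4))          # s_4 obtained by iterating f four times
```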

Figure 4.8 Search for a continuous embedding of a discretely measured dynamic system.

There are many possible interpolation schemes. Typically, one uses smooth interpolation techniques, e. g. splines, to find a smooth trajectory along the data points [ZN01, p. 328]. Since we want to analyze dynamic systems we now introduce the principle of uniform causality (Eq. 4.15) in order to further constrain the set of possible interpolations (Fig. 4.8) [ZN01, p. 328-9].

For each m-dimensional real vector s ∈ R^m and t, t_1, t_2 ∈ R_+, the principle of uniform causality is given by

embedding:   s_t = f^t(s) ,
additivity:  f^{t_1 + t_2}(s) = f^{t_2}\left( f^{t_1}(s) \right) .        (4.15)

The embedding of Eq. 4.15 can be satisfied by any continuous interpolation. It is formulated as a continuous iteration [KCG90]. The additivity (Eq. 4.15) is a description of causality: Intending to follow a dynamics over t_1 + t_2, we can start at s and track the system to s_{t_1} = f^{t_1}(s). Then, we begin at s_{t_1} for the rest of the trajectory, s_{t_1 + t_2} = f^{t_2}(s_{t_1}) [ZN01, p. 329].

For the case of t ∈ N, uniform causality (Eq. 4.15) is self-evident [ZN01, p. 329]: the embedding describes the data and the additivity is a simple iteration of functions. The necessity of t ∈ R_+ is a strong additional constraint. Let us assume that we choose f^t as

s_t = f^t(s) ,  with t ∈ [0, ε] and 0 < ε < 1 .        (4.16)

No matter how small ε is, due to the initialization of the first part of the trajectory and due to the additivity condition (Eq. 4.15), there is only one way to follow the path in a causal way.

Modeling a dynamical system by an ordinary differential equation ds/dt = f(s), the principle of uniform causality is always guaranteed [ZN01, p. 329]:

embedding:   s_t = s_0 + \int_0^{t} f(s_\tau)\, d\tau ,
additivity:  s_0 + \int_0^{t_1+t_2} f(s_\tau)\, d\tau = \left( s_0 + \int_0^{t_1} f(s_\tau)\, d\tau \right) + \int_{t_1}^{t_1+t_2} f(s_\tau)\, d\tau .        (4.17)

The embedding is given by the integral of the differential equation and the causal additivity is the additivity of the integral. For details of the continuous iteration or continuous embedding of a given time discrete dynamic system see e. g. Kuczma et al. (1990) [KCG90].
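As a small numerical illustration of the additivity in Eq. 4.17 (the toy dynamics, step size, and split point are assumptions, not from the text): with a fixed-step Euler scheme, integrating over [0, t_1 + t_2] in one run gives the same state as integrating to t_1 first and then continuing to t_1 + t_2, provided t_1 lies on the step grid.

```python
# Sketch: causal additivity of the integral (Eq. 4.17) under Euler integration.
def euler(f, s0: float, t0: float, t1: float, dt: float = 1e-3) -> float:
    s, n = s0, round((t1 - t0) / dt)
    for _ in range(n):               # s <- s + f(s) * dt
        s += f(s) * dt
    return s

f = lambda s: -0.5 * s               # toy dynamics ds/dt = -0.5 s
s0, t1, t2 = 1.0, 0.4, 0.6

direct = euler(f, s0, 0.0, t1 + t2)                      # integrate 0 -> t1+t2 in one go
split  = euler(f, euler(f, s0, 0.0, t1), t1, t1 + t2)    # 0 -> t1, then t1 -> t1+t2
print(direct, split)                                     # identical results
```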

We are interested in finding a dynamic law of the discrete dynamics and thus look for a uniform causal model fitting the data [ZN01, p. 329]. In the standard approach we identify f(s) by e. g. using training data such that

s_{n+1} = f(s_n) ,  n \in \mathbb{N} .        (4.18)

Thus, we can compute the discrete flow (s_0, s_1, ...) by iterations of f:

s_n = F(s_0, n) = f^n(s_0) .        (4.19)

Note that the standard approach only uses natural numbers as iteration indices. To introduce rational iterations, we need the following property of f:

f^{tn}(s) = \left( f^t \right)^n (s) ,  n \in \mathbb{N} ,  t \in \mathbb{R}_+ ,        (4.20)

which is a direct consequence of iterating the additivity condition described in Eq. 4.15.

Let us assume that we are able to identify a function f^{1/q}(s) by

s_{n+1} = \underbrace{f^{1/q} \circ \cdots \circ f^{1/q}}_{q}(s_n) .        (4.21)

This is an iterated system where the parameters of f^{1/q} can be estimated by e. g. backpropagation [Hay94, p. 161-75]. We call f^{1/q} a q-root of f [ZN01, p. 329-30]. Referring to f^{1/q}, we can identify the trajectory for every rational number to a basis of q. The embedding and additivity conditions of Eq. 4.15 are fulfilled, since operations with rational exponents can be reduced to natural exponents using property 4.20 [ZN01, p. 330]:

embedding:   s_{p/q} = f^{p/q}(s_0) = \left( f^{1/q} \right)^{p}(s_0) ,
additivity:  f^{p_1/q + p_2/q}(s_0) = \left( f^{1/q} \right)^{p_1 + p_2}(s_0) = f^{p_1/q}\left( f^{p_2/q}(s_0) \right) .        (4.22)

By increasing q, our approach would lead to more and more uniform causal solutions. Unfortunately, the identification task of Eq. 4.21 also becomes more and more difficult [ZN01, p. 330].
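A minimal sketch of the identification task in Eq. 4.21: a small feedforward network g serves as candidate q-root and is fitted by backpropagating through its q-fold composition against one-step transitions. The model, synthetic data, and hyperparameters below are illustrative assumptions, not the book's setup.

```python
# Sketch of Eq. 4.21: estimate a q-root g ~ f^{1/q} so that g(g(...g(s_n)...)) = s_{n+1}.
import torch
import torch.nn as nn

q = 4                                                              # number of sub-steps
g = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))     # candidate q-root

# toy one-step data s_{n+1} = f(s_n); here f is a simple contraction standing in for data
s_n   = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
s_np1 = 0.8 * s_n + 0.1

opt = torch.optim.Adam(g.parameters(), lr=1e-2)
for _ in range(2000):
    x = s_n
    for _ in range(q):                            # compose g with itself q times
        x = g(x)
    loss = nn.functional.mse_loss(x, s_np1)       # fit the composed map to the observed step
    opt.zero_grad(); loss.backward(); opt.step()

# g now approximates a q-root of f: a single application of g is a forecast
# on the refined (quarter-step) time grid.
```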

3.2 APPROACHING THE CONCEPT OF UNDERSHOOTING

Undershooting is the refinement of the model time grid by recurrent neural networks using unfolding in time and shared weights [RHW86, p. 354-7]. The concept of undershooting is directly related to the preceding considerations on uniform causality [ZN01, p. 330].

Let us consider the undershooting neural networks depicted in Fig. 4.9 [ZN01, p. 331]. Suppose that the task of the networks is to forecast a price shift ln(p_{t+1}/p_t) one step ahead.

For simplicity, delayed inputs are ignored, i. e. the input is only given by u_t. The input u_t is transferred to the recurrent network by matrix B. The network's output is computed by matrix C and compared to the targets ln(p_{t+(k+1)/4}/p_{t+k/4}), k = 0, ..., 3, in order to generate an error flow. The recursive structure of the dynamics is coded in matrix A.

If we introduce three intermediate time steps (q = 4) in the description of the dynamics, we get the architecture of Fig. 4.9, left. This network cannot be trained, because none of the intermediate targets ln(p_{t+(k+1)/4}/p_{t+k/4}), k = 0, ..., 3, is available.

Figure 4.9 Since the intermediate targets ln(p_{t+(k+1)/4}/p_{t+k/4}), k = 0, ..., 3, are not available, the left-hand network cannot be trained. The transformation on the right allows the network to be trained even if the intermediate targets are not available: we use the target ln(p_{t+1}/p_t) for all intermediate states [ZN01, p. 331].

We solve this problem by exploiting the identity

\ln \frac{p_{t+1}}{p_t} = \sum_{k=0}^{3} \ln \frac{p_{t+(k+1)/4}}{p_{t+k/4}} .        (4.23)

This leads directly to the redesigned architecture of Fig. 4.9, right.

In such a recurrent interpolation scheme (Fig. 4.9), there is in principle no need to use shared weights A, C. However, by the application of shared weights, we introduce an important regularization property: for all sub-intervals we assume the same underlying dynamics. Another advantage is that multiple gradient contributions are generated for each shared weight (see chp. 5). This enables us to estimate our models on the basis of very small data sets [ZN01, p. 331-2].
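A minimal sketch of the right-hand architecture of Fig. 4.9, assuming a tanh state transition and the dimensions shown below (both are assumptions, not from the text): the matrices A, B, C are shared across the q = 4 sub-steps, every intermediate state emits an output through C, and, following Eq. 4.23, the sum of these intermediate outputs is trained against the single available target ln(p_{t+1}/p_t).

```python
# Sketch of the undershooting forecast of Fig. 4.9 (right): shared weights A, B, C
# and q = 4 intermediate states on the refined time grid.
import torch
import torch.nn as nn

class Undershoot(nn.Module):
    def __init__(self, state_dim=8, input_dim=3, q=4):
        super().__init__()
        self.A = nn.Linear(state_dim, state_dim, bias=False)   # shared transition matrix
        self.B = nn.Linear(input_dim, state_dim, bias=False)   # input matrix
        self.C = nn.Linear(state_dim, 1, bias=False)           # shared output matrix
        self.q = q

    def forward(self, s_t, u_t):
        s = torch.tanh(self.A(s_t) + self.B(u_t))   # first intermediate state s_{t+1/4}
        outputs = [self.C(s)]
        for _ in range(self.q - 1):                 # remaining intermediate states up to s_{t+1}
            s = torch.tanh(self.A(s))
            outputs.append(self.C(s))
        # Eq. 4.23: the sum of the intermediate log returns is the one-step return
        return torch.stack(outputs).sum(dim=0), s

# Usage: y_hat, s_next = Undershoot()(s_t, u_t)
#        loss = torch.nn.functional.mse_loss(y_hat, log_return_target)
```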

A successful application of the undershooting concept is reported in Zimmermann et al. (2002) [ZNG02, p. 397-8]. The authors forecast the annual development of business rentals in Munich from 1982 to 1996. The applied undershooting neural network is able to fit the rental dynamics more accurately than a preset benchmark (a recurrent network without undershooting).

3.3 COMBINING ECNN AND UNDERSHOOTING

In this subsection we combine the basic ECNN depicted in Fig. 4.5 with the principle of undershooting. According to our experiments, the ECNN is appropriate for the modeling of dynamical systems in the presence of external shocks or noise. The additional regularization of the ECNN by undershooting should enable us to improve the forecast performance (see chp. 6). The combined neural network architecture is depicted in Fig. 4.10.


Figure 4.10 Combining Error Correction Neural Networks and Undershooting.

As depicted in Fig. 4.6, undershooting is integrated into the ECNN by a redesign of the autonomous substructure. Assuming a uniform structure in time, the underlying system dynamics of the ECNN is divided into autonomous sub-dynamics. For example, in Fig. 4.10, we added two intermediate states s_{τ+1/3}, s_{τ+2/3}. The output of the undershooting ECNN is computed by gathering the information of the intermediate states.
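The sketch below is one possible reading of Fig. 4.10, not the book's exact equations: it assumes the ECNN transition tanh(A s_τ + B u_τ + D z_τ) with model error z_τ = C s_τ − y_τ^d, splits the autonomous part into two additional tanh(A·) sub-steps (the intermediate states s_{τ+1/3}, s_{τ+2/3}), and gathers the intermediate C outputs into the forecast. All dimensions and names are illustrative assumptions.

```python
# Sketch of one step of an ECNN with undershooting (illustrative reading of Fig. 4.10).
import torch
import torch.nn as nn

class UndershootingECNN(nn.Module):
    def __init__(self, state_dim=8, input_dim=3, q=3):
        super().__init__()
        self.A = nn.Linear(state_dim, state_dim, bias=False)  # autonomous sub-dynamics
        self.B = nn.Linear(input_dim, state_dim, bias=False)  # external inputs
        self.C = nn.Linear(state_dim, 1, bias=False)          # output matrix
        self.D = nn.Linear(1, state_dim, bias=False)          # error-correction feedback
        self.q = q

    def step(self, s, u, y_target):
        z = self.C(s) - y_target                              # model error z_tau = y_tau - y_tau^d
        outputs = []
        s = torch.tanh(self.A(s) + self.B(u) + self.D(z))     # first sub-step: input + correction
        outputs.append(self.C(s))
        for _ in range(self.q - 1):                           # intermediate states s_{tau+1/3}, s_{tau+2/3}
            s = torch.tanh(self.A(s))
            outputs.append(self.C(s))
        y_next = torch.stack(outputs).sum(dim=0)              # gather the intermediate outputs
        return s, z, y_next                                   # new state, residual, forecast
```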