$$
\Psi(z_k) = \begin{bmatrix} x_k^\top & u_k^\top & \psi_{n+m+1}(x_k, u_k) & \cdots & \psi_{N_\psi}(x_k, u_k) \end{bmatrix}^\top \tag{6.7}
$$

an approximation of the model (6.4) is given by

$$
x_{k+1} \approx \hat{K} \, \Psi(x_k, u_k) \tag{6.8}
$$

where $\hat{K} = \begin{bmatrix} I_n & 0 \end{bmatrix} K$.
Note that, as expected, the system dynamics are linear in a lifted space spanned by the observables, which are now treated as basis functions. In this way, nonlinearities can be captured by an appropriate choice of $\Psi(\cdot)$. These functions therefore play a crucial role in the accuracy of the approximated model, so they must be chosen with care. A common choice applies to systems for which the structure of the dynamic equations is partially known, so that basis functions can be selected by picking some or all of the nonlinear functions of the underlying dynamical model, i.e. the R.H.S. of (6.4). Alternatively, polynomial, Fourier, radial or any other standard basis functions can be used instead.
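To make the role of the dictionary concrete, the following sketch builds a lifting map of the form (6.7) for a hypothetical 2-state, 1-input system; the particular nonlinear observables (a polynomial cross-term, a Fourier-type term and a radial-type term) are illustrative choices, not ones prescribed in this thesis.

```python
import numpy as np

def lift(x, u):
    """Example dictionary Psi(z) for a hypothetical 2-state, 1-input system.

    The first n + m entries are the state and input themselves, so that the
    state can be recovered linearly from the lifted vector (K_hat = [I_n 0] K);
    the remaining observables are illustrative nonlinear basis functions.
    """
    return np.concatenate([
        x,                      # identity observables for the state
        u,                      # identity observables for the input
        [x[0] * x[1],           # polynomial cross-term
         np.sin(x[0]),          # Fourier-type observable
         np.exp(-x[1] ** 2)],   # radial-type observable
    ])

z = lift(np.array([0.5, -1.0]), np.array([0.2]))
# lifted dimension N_psi = n + m + 3 = 6
```

Because the state and input appear verbatim at the top of the lifted vector, the selector $[I_n \; 0]$ in (6.8) simply reads the first $n$ entries back out.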
6.2 Computation of the Approximate Koopman Operator
There is only a limited number of applications for which an analytic Koopman operator can be computed. The interest in Koopman operator theory, however, lies in its capability to discover nonlinear system dynamics using data-driven techniques. In this sense, the Koopman operator extends the Dynamic Mode Decomposition ([104]) approach to nonlinear systems. In this section, an algorithm to compute the Koopman operator is presented which is tailored for online learning, as opposed to system identification-like algorithms in which computational load is not critical.
Assume that a data set $\mathcal{Z} = \{z_i, z_i^+\}_{i=0,1,\cdots,P}$ is collected, where $z_i$ and $z_i^+$ are consecutive data points. It is not necessary (indeed not even desired) for the data collected in $\mathcal{Z}$ to correspond to a single trajectory, i.e. any two data pairs $\{\{z_i, z_i^+\}, \{z_{i+1}, z_{i+1}^+\}\}$ need not be consecutive. In order to find the Koopman operator which best approximates (6.3) given a set of basis functions, an optimization problem is solved. The cost function can be chosen in several ways; in the present context a quadratic function of the approximation residual is used, i.e.
$$
J = \frac{1}{2} \sum_{i=0}^{P} \| \Psi(z_i^+) - K \Psi(z_i) \|^2.
$$
Depending on the application, other choices might be more meaningful; for instance, a sparsity-promoting $L_1$-regularization term [22] can be added in cases in which there is no prior knowledge about the system and a large number of basis functions are attempted, in order to prioritize the most important ones. On the other hand, choosing a more complex cost function precludes to some extent an online-adaptive implementation and is, for this reason, not pursued in this thesis. A quadratic cost function is an attractive choice as it enables a closed-form solution and can therefore be efficiently implemented. In light of this, an analytic solution to the optimization problem can be obtained as follows: define the data matrices $\mathcal{D}, \mathcal{D}^+ \in \mathbb{R}^{N_\psi \times (P+1)}$ as
$$
\mathcal{D} = \begin{bmatrix} \Psi(z_0) & \Psi(z_1) & \cdots & \Psi(z_P) \end{bmatrix}, \qquad \mathcal{D}^+ = \begin{bmatrix} \Psi(z_0^+) & \Psi(z_1^+) & \cdots & \Psi(z_P^+) \end{bmatrix}
$$
obtained by stacking the basis functions corresponding to the collected data column-wise. Note that, according to the definition of $z$ and $z^+$, $\mathcal{D}^+$ is just the time-shifted version of $\mathcal{D}$.
It can be shown that the cost function previously defined is equivalent to $J = \frac{1}{2}\|\mathcal{D}^+ - K\mathcal{D}\|_F^2$, where $\|M\|_F = \sqrt{\operatorname{Trace}(M M^\top)}$ is the Frobenius norm. The optimization problem is therefore

$$
\min_{K} \; \frac{1}{2} \| \mathcal{D}^+ - K \mathcal{D} \|_F^2
$$

which is unconstrained and has a differentiable cost function, so an analytic solution can be found by solving

$$
\frac{\partial J}{\partial K} = K \mathcal{D}\mathcal{D}^\top - \mathcal{D}^+ \mathcal{D}^\top \overset{!}{=} 0_{N_\psi \times N_\psi} \tag{6.9}
$$

for the minimizing $K$, thus given by

$$
K \mathcal{D}\mathcal{D}^\top = \mathcal{D}^+ \mathcal{D}^\top \;\Longrightarrow\; K = A G^{\dagger}
$$
where the following matrices have been defined for convenience [124]¹

$$
G = \frac{1}{P} \mathcal{D}\mathcal{D}^\top = \frac{1}{P} \sum_{i=0}^{P} \Psi(z_i)\Psi(z_i)^\top, \qquad A = \frac{1}{P} \mathcal{D}^+\mathcal{D}^\top = \frac{1}{P} \sum_{i=0}^{P} \Psi(z_i^+)\Psi(z_i)^\top. \tag{6.10}
$$
¹The $G$ and $A$ matrices are normalized by the total number of collected data points $P$. This is done to avoid large entries in the matrices, which could result in numerical problems at implementation time.
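The batch computation $K = A G^{\dagger}$ can be sketched numerically as follows; the toy dynamics, the dictionary, and the synthetic data are invented purely for illustration and are not part of the systems studied in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def lift(x):
    """Illustrative dictionary: the state itself plus simple nonlinear terms."""
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2])

def step(x):
    """Toy nonlinear map used to generate synthetic data pairs {z_i, z_i^+}."""
    return np.array([0.9 * x[0], 0.5 * x[1] + 0.1 * x[0] ** 2])

Z  = [rng.standard_normal(2) for _ in range(200)]
Zp = [step(z) for z in Z]

# Data matrices D, D+ (observables stacked column-wise)
D  = np.column_stack([lift(z) for z in Z])
Dp = np.column_stack([lift(z) for z in Zp])

P = len(Z)
G = (D  @ D.T) / P         # G = (1/P) sum Psi(z_i) Psi(z_i)^T
A = (Dp @ D.T) / P         # A = (1/P) sum Psi(z_i^+) Psi(z_i)^T

K = A @ np.linalg.pinv(G)  # K = A G^dagger, the least-squares minimizer
```

Since the first, second and fourth observables of the lifted successor state lie exactly in the span of the dictionary for this toy map, the corresponding rows of $K$ recover the true coefficients; the row for the cross-term $x_1 x_2$ is only approximated, which illustrates the impact of the dictionary choice discussed above.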
6.2.1 Recursive Computation
An online solution of (6.9) using the definitions (6.10) would require the storage and update of the data matrices $\mathcal{D}^+, \mathcal{D}$. While this is often done in many data-driven techniques, particularly those based on kernel methods [125], it has the clear disadvantage that memory allocation and limitation could become a problem. Moreover, the computation would become increasingly demanding with the number of stored data points $P$. Alternatively, (6.10) can be expressed in such a way that incoming data is added to the model without the need to explicitly recompute the previous outer products; this can be done by storing $G, A \in \mathbb{R}^{N_\psi \times N_\psi}$ and $P \in \mathbb{Z}_{\geq 0}$ and, at time step $k$, computing
$$
\begin{aligned}
P^+ &= P + 1 \\
G^+ &= \frac{1}{P^+} \left( (P^+ - 1)\, G + \Psi(z_{k-1}) \Psi(z_{k-1})^\top \right) \\
A^+ &= \frac{1}{P^+} \left( (P^+ - 1)\, A + \Psi(z_k) \Psi(z_{k-1})^\top \right)
\end{aligned} \tag{6.11}
$$
where $^+$ is used to denote the updated variables. Using these definitions, the Koopman operator can be updated online with constant and relatively low computational complexity, regardless of the number of data points that have been added to the, now fictitious, data set $\mathcal{Z}$. Furthermore, memory allocation poses no limitation as the size of the stored variables ($G, A, P$) is constant.
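The rank-one recursion (6.11) can be sketched as below; the stream of lifted data pairs is randomly generated for illustration, and the final check confirms that the recursion reproduces the batch averages of (6.10) without storing past data.

```python
import numpy as np

rng = np.random.default_rng(2)
N_psi = 3  # lifted-space dimension (illustrative)

def update(G, A, P, psi_prev, psi_curr):
    """One step of (6.11): fold the lifted pair {Psi(z_{k-1}), Psi(z_k)}
    into the running averages G and A without storing past data."""
    P_new = P + 1
    G_new = ((P_new - 1) * G + np.outer(psi_prev, psi_prev)) / P_new
    A_new = ((P_new - 1) * A + np.outer(psi_curr, psi_prev)) / P_new
    return G_new, A_new, P_new

# Feed a stream of lifted data pairs through the recursive update
pairs = [(rng.standard_normal(N_psi), rng.standard_normal(N_psi))
         for _ in range(50)]
G = np.zeros((N_psi, N_psi))
A = np.zeros((N_psi, N_psi))
P = 0
for psi_prev, psi_curr in pairs:
    G, A, P = update(G, A, P, psi_prev, psi_curr)

# For comparison only: the batch averages of (6.10)
G_batch = sum(np.outer(p, p) for p, _ in pairs) / len(pairs)
A_batch = sum(np.outer(q, p) for p, q in pairs) / len(pairs)
```

Only $G$, $A$ and the counter $P$ persist between time steps, which is what makes the memory footprint independent of the amount of data seen.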
Koopman Operator Update
Although, as discussed above, adding new information to the model (in the form of data pairs $\{z_{k-1}, z_k\}$) does not result in considerable computational burden, doing so at every time step is not advisable: adding e.g. steady-state data points, which do not enrich the model, would result in actual new information having a diminished influence when updating the model (the weight $1/P$ of any new data point makes its effect negligible for large values of $P$). For this reason, incoming state and input data need to be analyzed in some way to determine whether they represent new information that enriches the model. There are several ways to do this; the one presented here is intuitive and easy to implement. The principle at its core is to "learn" only if the current model disagrees (by more than a certain threshold) with the observation.
To this end, assume the state and input at the previous $h+1$ time instants are stored in a vector $\theta = \begin{bmatrix} x_{k-1}^\top & u_{k-1}^\top & \cdots & x_{k-h-1}^\top & u_{k-h-1}^\top \end{bmatrix}^\top$; a decision whether to add the data pair $\{z_{k-1}, z_k\}$ to the model is made by comparing the maximum prediction error of the approximated system (6.8) within this time window against a given threshold. If this error is greater than the threshold, the data point is added, i.e. add the data point if

$$
\max_{j \in \{k-h, \ldots, k-1\}} \| x_j - \hat{x}_j \|_\infty > \epsilon \tag{6.12}
$$

where $\hat{x}_j$ denotes the prediction obtained by iterating (6.8) from $x_{k-h-1}$ with the stored inputs, and $\|\cdot\|_\infty$ is the vector infinity norm². Note that any other norm may be used instead; however, this choice offers the advantage that it is independent of the window length $h$ and it enables the use of physical insight to determine a suitable $\epsilon$ (i.e. an admissible error tolerance in physical units). Note that in equation (6.12) the state is computed recursively rather than by the linearity-exploiting form $x_k = [I_n \; 0] K^h \Psi(z_{k-h-1})$; the reason for this is that the dynamic behavior of the input is not being discovered by the Koopman operator, so the input trajectory is virtually exogenous.
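The decision rule can be sketched as follows; the helper name `should_add`, the identity lifting and the scalar toy system are hypothetical choices made for illustration only.

```python
import numpy as np

def should_add(K_hat, lift, window, eps):
    """Event-triggered update rule: return True when the worst recursive
    prediction error of (6.8) over the stored window exceeds eps.

    window: list of (x_j, u_j) pairs, oldest first; the measured inputs are
    treated as exogenous, while the state prediction is fed back recursively.
    """
    x_hat = window[0][0]  # initialize the prediction at the oldest stored state
    worst = 0.0
    for (_, u_j), (x_next, _) in zip(window[:-1], window[1:]):
        x_hat = K_hat @ lift(x_hat, u_j)  # recursive one-step prediction
        worst = max(worst, np.linalg.norm(x_next - x_hat, ord=np.inf))
    return worst > eps

# Toy example: scalar linear system x+ = 0.9 x + 0.1 u with identity lifting,
# for which the model (6.8) is exact, so no data point should be added
lift = lambda x, u: np.concatenate([x, u])
K_hat = np.array([[0.9, 0.1]])

xs, us, x = [], [0.5, -0.2, 0.3, 0.0], np.array([1.0])
for u in us:
    xs.append(x)
    x = K_hat @ lift(x, np.array([u]))
window = [(x_j, np.array([u_j])) for x_j, u_j in zip(xs, us)]

exact = should_add(K_hat, lift, window, eps=1e-6)  # model agrees: no update

# Perturb one stored state: the model now disagrees beyond the threshold
window_bad = list(window)
window_bad[2] = (window[2][0] + 0.1, window[2][1])
noisy = should_add(K_hat, lift, window_bad, eps=1e-2)
```

Note how the measured inputs are always applied while the predicted state is fed back, mirroring the recursive evaluation of (6.12) with an exogenous input trajectory.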