
6. Results and Discussion

6.8. Performance Considerations

A discussion of the results would not be complete without a detailed look at how much overhead the ODE framework introduces into the computation. Such an abstraction penalty may be acceptable if it is small compared to the total time needed.


Listing 6.5: Spatial discretization of a tree structure using XL rules.

const int E = EDGE_0;

module A(double u);

protected void init()
[
    Axiom ==> F f:F F, f F;
]

public void discretize()
[
    // discretize each element individually
    (f:F) ==>> f -E> A A A -E> f;
    { derive(); }
    // connect segments according to topology
    (a1:A -E> f1:F f2:F -E> a2:A) ==>> a1 a2;
]

Listing 6.6: Code to measure accuracy of System.nanoTime().

int a = Integer.MAX_VALUE;
int b = Integer.MIN_VALUE;
for (int i = 0; i < 100000000; i++) {
    long t0 = System.nanoTime();
    long t1 = System.nanoTime();
    int dt = (int) (t1 - t0);
    a = Math.min(a, dt);
    b = Math.max(b, dt);
}

For measuring execution time the Java standard library provides in the class System the method currentTimeMillis, and since version 1.5 of the language also the method nanoTime. The latter returns the difference in nanoseconds between the current time and some arbitrary reference time, which may even lie in the future. Although nanosecond precision is provided, this does not necessarily imply nanosecond accuracy: the same value might be returned by multiple successive calls until the returned value jumps to some higher value.

To get an idea of the time it takes to execute nanoTime, and of the accuracy obtained for the timing measurements, the code in listing 6.6 was executed. In a loop, many calls to the function are performed in succession. The variable a captures the smallest time delta, the variable b the greatest one. On the system used to perform the timing measurements both values were 26, indicating that a call to nanoTime takes 26 ns to execute.

To measure the impact of the overhead on a concrete example, the model of Anabaena catenula from section 6.2 was used; its complete code is given in appendix A.2.2.

Listing 6.7: Code to measure execution time for a piece of code.

long s = 0;
...
// start measurement
long t = System.nanoTime();
// execute some code here
...
// stop measurement
s += System.nanoTime() - t;

Counter  Value        Description
time     600.069      Simulated time (variable time)
s0       3645.439 ms  total time spent in integrate
s1       1697.772 ms  total time spent in getRate
s2       1406.324 ms  total time for first rule (outer)
s3       61.553 ms    total time for first rule (inner)
s4       285.431 ms   total time for second rule (outer)
s5       19.926 ms    total time for second rule (inner)

Table 6.2.: Final simulated time and total execution times for different parts of the Anabaena catenula model. The rate function was called 10854 times, and run was called 663 times.

Timing instructions (see listing 6.7) bracket the code in question to find out how long its execution takes. The variable t stores the starting time, which is needed to compute a time delta later on. All time deltas are summed over multiple iterations to get a more accurate estimate of the time needed.
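For repeated measurements this pattern can be wrapped in a small helper. The following sketch (class and method names are hypothetical, not part of the framework) accumulates the deltas of listing 6.7 behind two calls:

/** Accumulates time deltas over many measurements, as in listing 6.7. */
final class TimeCounter {
    private long sum;   // accumulated nanoseconds
    private long start; // start of the current measurement

    void begin() { start = System.nanoTime(); }
    void end()   { sum += System.nanoTime() - start; }

    double millis() { return sum / 1e6; }
}

A counter like s1 from table 6.2 would then call begin() before and end() after each invocation of the rate function.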

In the example, timings were taken for the call to integrate(1), for the rate function, and for each of the two rules. To distinguish between the time needed to find matches for the given patterns and the time needed to actually compute and assign the rates, timings were taken both outside each rule (outer) and inside the code block on its right-hand side (inner).

The obtained measurements are shown as a time series plot (over developmental age) in figure 6.16, and the final values of the measurements are listed in table 6.2.

As can be seen in figure 6.16, the accumulated times increase exponentially, as is expected: the number of simulated cells increases exponentially, and execution time directly depends on it. It can also be seen that for the first rule (curves s2/s3) most time is consumed by the search for the queried patterns, while only a negligible time is needed to actually compute and apply the rates. The situation is different for the second rule (curves s4/s5), as here the search is optimized by iterating over the corresponding extent. Still, for both rules the time needed to find a match is much higher than the time needed to compute and apply the rate.

A similar conclusion can be drawn from the timings presented in table 6.2.


Figure 6.16.: Execution times (in ms) measured for the Anabaena catenula model, plotted against model time. Curve s0 is the total time spent in integrate, s1 the total time spent in getRate, s2/s3 the total times for the first rule (outer/inner), and analogously s4/s5 for the second rule.

Still, there is a big discrepancy between the total time spent in the integrate function and in the rate function, so an explanation is needed for where the other half of the time went, and especially for how much of it can be attributed to the handling of monitor functions.

The JDK ships with a useful tool called Java VisualVM, which makes it possible to monitor the execution of a running Java application. There are two modes of operation, a sampler and a profiler. The sampler checks at regular intervals which method is currently being executed; long-running methods, or those that are called frequently, will therefore be sampled often, so the sample count is proportional to the execution time of a method. Fortunately, the running time of the application is almost unaffected, but it may take many samples to get accurate results. The profiler instead inserts timing instructions into the application code for every method, so execution of the application is slowed down drastically, but more accurate measurements are obtained in exchange.
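To illustrate the sampling principle (a toy sketch, not VisualVM's actual implementation), a naive sampler can be built on the standard Thread.getAllStackTraces() API: it periodically records the topmost stack frame of every live thread, so methods that run long or often accumulate high counts.

import java.util.HashMap;
import java.util.Map;

public class MiniSampler {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 500; i++) {                       // 500 samples
            for (StackTraceElement[] st : Thread.getAllStackTraces().values()) {
                if (st.length > 0) {
                    counts.merge(st[0].toString(), 1, Integer::sum);
                }
            }
            Thread.sleep(20);                                 // 20 ms period
        }
        counts.forEach((method, c) -> System.out.println(c + "  " + method));
    }
}

Unlike VisualVM, this toy sampler runs inside the observed JVM and therefore also samples itself, but it shows why the sample count is roughly proportional to execution time.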

Application of the sampler to the original model code (without nanoTime instructions and visualization) over a model time of 600 time units yields the measurements shown in figure 6.17. The sampling period was set to the lowest selectable value of 20 ms (corresponding to 50 Hz). Since a wall-clock time of 3.6 s yields just 180 samples, a second measurement over a model time of 800 time units (corresponding to 72 s wall-clock time, or 3600 samples) was taken (figure 6.18).

In both cases, the time that was not spent in the evaluation of rate assignments can be attributed to step handling, and within that to step interpolation (getInterpolatedState). This is more pronounced for the second measurement over 800 time units of model time. The time needed to evaluate the monitor functions (method g, and also part of the time needed to execute BrentSolver.solve), as well as the time needed for event handling (method handleEvent), is only a fraction of the time the integrator needs for step interpolation.

All in all, this is a good result, since it indicates that the additional overhead introduced by the ODE framework is very small. The main contribution to the execution time of the rate function is the time needed to search for patterns in the graph, which is needed anyway and thus cannot be counted as overhead. Moreover, by using the ODE framework it is possible to select more advanced integration methods that obtain the same accuracy with much fewer calls to the rate function, thus amortizing the overhead of copying the state to the graph and the indirection of the rate assignment operator. Similarly, the overhead introduced by monitor functions is small compared to the remaining computations the integration method performs.
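As an illustration of such an exchangeable, higher-order method, the following sketch uses the adaptive Dormand-Prince integrator of Apache Commons Math 3, the library that also provides the BrentSolver and getInterpolatedState mentioned above; the tolerances and the test equation are arbitrary:

import org.apache.commons.math3.ode.FirstOrderDifferentialEquations;
import org.apache.commons.math3.ode.nonstiff.DormandPrince54Integrator;

public class DecayExample {
    public static void main(String[] args) {
        // du/dt = -u with exact solution u(t) = exp(-t)
        FirstOrderDifferentialEquations ode = new FirstOrderDifferentialEquations() {
            public int getDimension() { return 1; }
            public void computeDerivatives(double t, double[] y, double[] yDot) {
                yDot[0] = -y[0];
            }
        };
        // embedded Runge-Kutta pair with adaptive step size control:
        // min step, max step, absolute and relative tolerance
        DormandPrince54Integrator integrator =
            new DormandPrince54Integrator(1e-8, 1.0, 1e-10, 1e-10);
        double[] y0 = { 1.0 };
        double[] y = new double[1];
        integrator.integrate(ode, 0.0, y0, 10.0, y);
        System.out.println(y[0] + " vs " + Math.exp(-10.0));
    }
}

The adaptive step size control is what keeps the number of rate evaluations low for a given accuracy.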

Thus the versatility of working on a graph structure instead of an array more than outweighs the additional overhead introduced by the framework.


Figure 6.17.: Sampling results of the Anabaena catenula model for a model time of 600 time units.

Figure 6.18.: Sampling results of the Anabaena catenula model for a model time of 800 time units.

7. Summary

The software GroIMP is an interesting tool for the creation of functional-structural models, especially of plants, and its programming language XL provides a powerful combination of the imperative and the rule-based paradigm. However, rule application is discrete by definition, while the differential equations used to describe the functional aspects of a model are continuous.

This thesis proposes a way to bridge this gap between the discrete and the continuous world. A first solution to overcome this discrepancy was the extension of the language by operator overloading in combination with user-defined implicit conversion functions (autoconversions, which naturally extend Java's autoboxing). As was shown in chapter 4, this provides for some useful applications. Based on ideas learned from expression templates, properly overloaded operators allow chemical reactions to be entered as ordinary XL code, which can then be analyzed to derive differential equations for numerical integration. The operator overloading approach is so flexible that even the productions of a rule can be explained in terms of it.
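The underlying idea can be sketched in a few lines of plain Java (the type and class names are hypothetical): an overloaded operator returns an expression node instead of a computed value, so the right-hand side of an equation remains available as a tree for later analysis.

// Expression nodes built by operators instead of eager evaluation
interface Expr { double eval(); }

final class Const implements Expr {
    final double v;
    Const(double v) { this.v = v; }
    public double eval() { return v; }
}

final class Sum implements Expr {
    final Expr a, b;
    Sum(Expr a, Expr b) { this.a = a; this.b = b; }
    public double eval() { return a.eval() + b.eval(); }
}

// In XL, an overloaded operator method such as
//   public static Expr operator+(Expr a, Expr b) { return new Sum(a, b); }
// makes "x + y" on Expr operands build such a tree, which can then be
// traversed to derive differential equations.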

For dynamically growing structures, memory management becomes an important issue. As existing numerical libraries work on arrays, a mapping between the node attributes in the graph and the elements of these arrays must be created. Doing this manually is tedious and error-prone, so a second solution introduces a new operator symbol that allows the user to indicate which attributes should be integrated and how to compute their slope. This simplifies the specification of differential equations for the user and is even close to what users unintentionally wrote before, but with the added benefit that better numerical solvers can be used and easily exchanged.
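A minimal sketch of how such a rate specification might look (assuming, as the :''= notation in the outlook below suggests, that :'= is the rate assignment operator; the module name Cell, attribute u, and constant k are hypothetical, not verbatim framework code):

module Cell(double u);

// inside a rule block: assign the rate du/dt = -k*u for every cell; the
// framework maps the attribute u of each Cell node to an entry of the
// solver's state array
c:Cell ::> { c[u] :'= -k * c[u]; }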

Some examples demonstrate the usage of the ODE framework and how the numerical solution improves. It was also shown how dL-systems can be directly transformed into equivalent XL code and then solved on a graph. Turing patterns, which frequently occur in nature in many animals and plants, can also easily be described in XL and solved with the help of the new framework. Examples of integration on a one-dimensional and a two-dimensional lattice are provided.

PDEs frequently occur when simulating natural phenomena. The method of lines is a common approach to solving such PDEs numerically, but it requires a spatial discretization of the geometry. It has been shown how this can be done easily with the help of rules in XL (see listing 6.5); an array-based sketch of the same idea follows below.
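As a minimal plain-Java sketch of the method of lines (hypothetical names; in XL the same rates would be assigned per graph node, with neighbours found by graph queries as in listing 6.5), consider the one-dimensional diffusion equation u_t = D u_xx:

// Replace the spatial second derivative by a central difference,
// leaving one ODE per lattice node.
static void diffusionRates(double[] u, double[] uDot, double D, double dx) {
    int n = u.length;
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? u[i - 1] : u[i]; // zero-flux boundary
        double right = (i < n - 1) ? u[i + 1] : u[i];
        uDot[i] = D * (left - 2 * u[i] + right) / (dx * dx);
    }
}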

7.1. Outlook

There are still many things that could be done to further enhance the framework. For instance, in physics, second-order ODEs often arise in the description of motion. Taking into account the second derivative along with the first one allows the development of methods with reduced computational effort while maintaining accuracy, as was already observed by Nyström in [Nys25]. In principle, a new operator :''= could be introduced that allows specification of the second derivative, but it must be investigated how this fits into the existing framework.
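Until such an operator exists, a second-order ODE has to be reduced to a first-order system of twice the dimension. The following sketch (assuming Apache Commons Math 3) does this for the harmonic oscillator x'' = -x via the library's FirstOrderConverter; a dedicated Nyström method would instead exploit the second-order structure directly:

import org.apache.commons.math3.ode.FirstOrderConverter;
import org.apache.commons.math3.ode.SecondOrderDifferentialEquations;
import org.apache.commons.math3.ode.nonstiff.DormandPrince54Integrator;

public class Oscillator {
    public static void main(String[] args) {
        SecondOrderDifferentialEquations ode = new SecondOrderDifferentialEquations() {
            public int getDimension() { return 1; }
            public void computeSecondDerivatives(double t, double[] x,
                    double[] xDot, double[] xDDot) {
                xDDot[0] = -x[0];           // x'' = -x
            }
        };
        DormandPrince54Integrator integrator =
            new DormandPrince54Integrator(1e-8, 1.0, 1e-10, 1e-10);
        double[] y0 = { 1.0, 0.0 };         // state (x, x'): x(0) = 1, x'(0) = 0
        double[] y = new double[2];
        integrator.integrate(new FirstOrderConverter(ode), 0.0, y0, Math.PI, y);
        System.out.println(y[0] + " vs " + Math.cos(Math.PI)); // expect -1
    }
}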

Implicit methods that perform Newton iterations for root finding need the Jacobian, which describes how changes in the input parameters affect the output parameters. Its coefficients can be derived automatically by a finite difference approximation, but it can be much more efficient if the user provides this matrix, so adding such a feature to the framework would be beneficial.
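A minimal sketch of such a forward-difference approximation (plain Java; Rate is a hypothetical stand-in for the framework's rate function):

// Hypothetical stand-in for the rate function f(t, y)
interface Rate {
    void apply(double t, double[] y, double[] yDot);
}

// Forward-difference approximation of the Jacobian J[i][j] = df_i / dy_j
static double[][] jacobian(Rate f, double t, double[] y, double eps) {
    int n = y.length;
    double[] f0 = new double[n], f1 = new double[n];
    double[][] jac = new double[n][n];
    f.apply(t, y, f0);
    for (int j = 0; j < n; j++) {
        double h = eps * Math.max(1.0, Math.abs(y[j])); // scaled perturbation
        double saved = y[j];
        y[j] = saved + h;
        f.apply(t, y, f1);
        y[j] = saved;
        for (int i = 0; i < n; i++) {
            jac[i][j] = (f1[i] - f0[i]) / h;
        }
    }
    return jac;
}

One such approximation costs n + 1 evaluations of the rate function for n state variables, which is why a user-supplied analytic Jacobian can pay off.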

Often, biological models combine processes that occur on different spatial and temporal scales. For instance, leaf formation is a matter of days for some trees, while growth of the same trees is modelled on a per-year basis instead. Numerical integration of such combined models can be problematic: the fast processes need a sufficiently small time step for the numerical algorithm to remain appropriately stable and accurate, but for the slow processes the same small step wastes computation. It should be researched how this issue can be solved robustly, for instance by integrating slow and fast processes separately. This idea has been addressed previously in the DEVS framework [CJC10], but it would be interesting to investigate how it can be combined with the ODE framework presented in this thesis.

With the increasing availability of multi-core CPUs, a natural question is how the framework can make use of them. Splitting a model into independent submodels, as was already mentioned, would be one way to do this. Another way to increase parallelism would be to distribute (parallel) rule applications to multiple cores, or even computers, as was already suggested in [Kni08]. Also, implicit integration methods that need to solve a system of linear equations can benefit when this task is performed in a distributed fashion (e.g., SUNDIALS allows for this).

Going one step further, one might ask if GPUs can accelerate the computation as well. Since GroIMP is inherently dependent on Java, large parts of the runtime system would probably need to be reimplemented in a GPU language (CUDA or OpenCL), or a transformation of the Java bytecode to GPU instructions would have to be performed. A further challenge is how to map the graph structure to a data structure appropriate for the GPU, since this is critical for an efficient GPU implementation.