Multi-run simulation data is produced when a simulation is repeated with perturbed parameter settings to represent different design choices or boundary conditions. The idea is rooted in Monte Carlo simulations [26, 177], dating back to the 1940s. With the recent increase in computational power, accurate multi-run simulations of complex systems can be computed in reasonable time. Multi-run simulations are commonly used in many different application domains, for example, in the automotive industry [163, 194], climate research [181], biology [227], and epidemiology [1].

The analysis of multi-run simulations is of interest to engineers because they can study how parameters influence the behavior of the simulated system and choose optimal parameter combinations. In parameter sensitivity analysis [85, 91, 92], the significance of the parameters is determined and their correlations with the output are studied. When the model’s sensitivity to its parameters is understood, engineers can optimize the design by choosing the parameters appropriately.
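To make the correlation-based view of sensitivity concrete, the following minimal Python sketch (synthetic data; not the specific methods of [85, 91, 92]) ranks hypothetical control parameters by their Pearson correlation with a scalar simulation output:

```python
# Minimal sketch of correlation-based sensitivity screening for multi-run data.
# X holds one row per simulation run (columns = control parameters), y the
# scalar output of interest; both are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                      # 200 runs, 3 parameters
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], y)[0, 1]               # Pearson correlation
    print(f"parameter {j}: correlation with output = {r:+.2f}")
```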

In this section, we discuss possible approaches to the visualization and visual analysis of multi-run simulation data. An alternative to multi-run visualization is the very actively researched field of uncertainty visualization. There are several surveys and state-of-the-art reports on uncertainty visualization by Brodlie et al. [31], Griethe and Schumann [84], Johnson and Sanderson [110], and Pang et al. [189].

2.4.1 Visualization of Multi-run Data

Visualization of multi-run data sets is especially challenging because they are typically high-dimensional, contain several different time-dependent result attributes (dependent variables), and are also multivariate in the sense that multiple data values are given for each position in space and time, which complicates direct spatio-temporal visualization.

Data pertaining to different runs can be directly displayed in small multiples [203] or in a coordinated multiple views framework. Abstract information visualization views as well as detailed spatio-temporal views can be incorporated. However, this approach is limited to displaying only a few runs at a time. If the data set has no relevant spatial aspect, the simulation control parameters can be treated as independent variables and the multi-run data can be compactly visualized using techniques for high-dimensional data [166, 194]. Alternatively, one can visualize statistical measures of the individual distributions of the time-dependent attributes from multiple simulation runs [181, 202].
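As a minimal illustration of such statistical aggregation, the following sketch (synthetic ensemble, assuming all runs share one common time axis) reduces the runs to per-time-step medians and quartiles:

```python
# Sketch: collapse a multi-run ensemble of time series into per-time-step
# statistics (median and quartiles), assuming all runs share one time axis.
# `runs` is a hypothetical array of shape (n_runs, n_timesteps).
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 101)
runs = np.sin(t) + 0.2 * rng.normal(size=(50, t.size))   # 50 perturbed runs

median = np.median(runs, axis=0)                         # one value per time step
q25, q75 = np.percentile(runs, [25, 75], axis=0)         # interquartile band
print(median[:3], q25[:3], q75[:3])
```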

The standard representation of distributions in statistics is the box plot [170, 253]. Box plots display the minimum and maximum data values, the median, and the lower and upper quartiles.

Unlike histograms and probability density functions, box plots do not require assumptions about the underlying statistical distribution. Potter [201] surveys several modifications of box plots that provide additional information, including the sample size, density information, and further statistical properties such as skewness and kurtosis. Hintze and Nelson [94] propose a combination of box plots and density shapes called violin plots. Potter et al. [202] extend violin plots with a color-mapped histogram and glyphs representing additional descriptive statistics such as the mean, the standard deviation, and higher-order moments.
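The statistics behind such box- and violin-style summaries can be computed directly; the following sketch (synthetic sample; SciPy assumed available) gathers the five-number summary together with skewness and kurtosis:

```python
# Sketch of the statistics behind box/violin-style summaries: the five-number
# summary plus skewness and kurtosis for one sample of run results.
import numpy as np
from scipy import stats              # assumed available; used for skew/kurtosis

sample = np.random.default_rng(2).gamma(shape=2.0, size=1000)   # synthetic results

summary = {
    "min": sample.min(),
    "q1": np.percentile(sample, 25),
    "median": np.median(sample),
    "q3": np.percentile(sample, 75),
    "max": sample.max(),
    "skewness": stats.skew(sample),
    "kurtosis": stats.kurtosis(sample),   # excess kurtosis
}
print(summary)
```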

The box plot can also be used for higher-dimensional data. The main problem is how to represent the summary values using simple visual metaphors with meaningful spatial positions.

Proposed extensions of box plots for bivariate distributions include the rangefinder box plot, the two-dimensional box plot, the bagplot, the relplot, and the quelplot [201]. Potter et al. [202] present an extension of their summary plot to 2D distributions.

Box plots cannot be placed in a dense spatial context. Kao et al. [113, 114] display summary statistics of 2D distributions on surfaces using color mapping and line glyphs. This is sufficient when the distribution can be adequately characterized by its statistical parameters. When that is not possible, the authors create a 3D volume from the 2D distribution by adding the data range as a third dimension. The voxel values represent the probability density function. Standard volume visualization techniques can be applied to this density estimate volume. Moreover, shape descriptors (number, location, height, and width of peaks) are computed from the density estimate volumes and can also be visualized by colored slices and line glyphs.
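A rough sketch of this volume construction (not Kao et al.'s implementation) is given below: at each 2D grid position, a kernel density estimate over the per-run values fills the added data-range axis, yielding a volume that standard volume rendering could display:

```python
# Sketch: build a density-estimate volume from 2D multi-run data by adding the
# data range as a third axis; voxel values approximate the per-position PDF.
# `ensemble` (n_runs, ny, nx) is a hypothetical stand-in for simulation output.
import numpy as np
from scipy.stats import gaussian_kde       # assumed available

rng = np.random.default_rng(3)
n_runs, ny, nx = 30, 16, 16
ensemble = rng.normal(loc=1.0, scale=0.5, size=(n_runs, ny, nx))

bins = np.linspace(ensemble.min(), ensemble.max(), 32)   # data-range axis
volume = np.empty((ny, nx, bins.size))
for iy in range(ny):
    for ix in range(nx):
        kde = gaussian_kde(ensemble[:, iy, ix])          # 1D KDE over run values
        volume[iy, ix, :] = kde(bins)

print(volume.shape)   # (16, 16, 32): explore with standard volume techniques
```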

Kehrer et al. [120] represent statistical properties computed from the multi-run data using billboard glyphs placed in a 3D context. Sanyal et al. [222] combine glyph-based uncertainty visualization with spaghetti plots to display meteorological simulation ensembles. Spaghetti plots [57], which are commonly used in meteorology, overlay contours of an attribute at a selected time step over all simulation runs.
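A minimal spaghetti-plot sketch (synthetic ensemble; matplotlib assumed available) overlays one contour of the same attribute level per run at a fixed time step:

```python
# Sketch of a spaghetti plot: one contour of the same attribute level per run
# at a selected time step, overlaid in a single 2D view.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
y, x = np.mgrid[-2:2:64j, -2:2:64j]
fig, ax = plt.subplots()
for run in range(20):                                   # 20 perturbed runs
    field = np.exp(-(x**2 + y**2)) + 0.05 * rng.normal(size=x.shape)
    ax.contour(x, y, field, levels=[0.5], colors="steelblue", alpha=0.5)
ax.set_title("Spaghetti plot: 0.5-level contour across runs")
plt.show()
```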

Chan et al. [45] represent sensitivity parameters as tangent lines in 2D scatter plots. This helps analysts discover local and global trends in a 2D projection, and changes in one variable can be correlated with changes in another. The sensitivity information can be understood as velocity, so the resulting visualization resembles a flow field. This artificial flow field can also be explored using flow-field analysis techniques; for example, data points can be selected and clustered by streamlines, which groups points with similar local trends in a non-linear manner.
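The flow-field metaphor can be sketched as follows (hypothetical model f; finite-difference sensitivities drawn as a vector field that streamline tracing or clustering could then operate on):

```python
# Sketch of the flow-field metaphor for sensitivity: finite-difference
# sensitivities define a vector at each scatter-plot point (quiver display).
# `f` is a hypothetical model output over two parameters.
import numpy as np
import matplotlib.pyplot as plt

def f(a, b):
    return np.sin(a) * b + 0.5 * a * b**2

a, b = np.meshgrid(np.linspace(0, 3, 15), np.linspace(0, 3, 15))
eps = 1e-3
df_da = (f(a + eps, b) - f(a - eps, b)) / (2 * eps)    # sensitivity w.r.t. a
df_db = (f(a, b + eps) - f(a, b - eps)) / (2 * eps)    # sensitivity w.r.t. b

plt.quiver(a, b, df_da, df_db, color="gray")
plt.xlabel("parameter a"); plt.ylabel("parameter b")
plt.title("Sensitivity vectors viewed as an artificial flow field")
plt.show()
```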

2.4.2 Visual Analysis of Multi-run Data

In some cases, multi-run data can be analyzed using methods that were originally designed for high-dimensional data. The HyperSlice [262] displays a multi-dimensional function as a matrix of orthogonal two-dimensional slices around a focus point in the multi-dimensional space. The Influence Explorer [257] allows the exploration of data computed from a model using given sets of parameter values as input. The user can select a set of points in either the parameter space or the result space and see how that set corresponds to points in other dimensions of both spaces. The Prosection Matrix [80, 257] allows the user to define a slice of adjustable thickness (representing tolerance) and projects the data points in the slice to scatter plots. Matković et al. [166] describe a system based on coordinated multiple attribute views and interactive brushing that can be used for the analysis of multi-run simulations. The work presented in Chapters 3, 4, and 5 of this thesis extends this framework. Nocke et al. [181] analyze statistical aggregates of multi-run climate simulations in multiple linked attribute views.
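The prosection idea mentioned above can be sketched in a few lines (hypothetical run data): only runs whose value in one dimension falls inside a slice of adjustable thickness are projected to a scatter plot of two other dimensions:

```python
# Sketch of a prosection: keep only runs whose value in one parameter lies
# inside a slice of adjustable thickness, then scatter-plot two other columns.
import numpy as np

rng = np.random.default_rng(5)
data = rng.uniform(size=(500, 4))          # 500 runs, 4 parameters/results

center, thickness = 0.5, 0.1               # slice through parameter 2
mask = np.abs(data[:, 2] - center) <= thickness / 2
projected = data[mask][:, [0, 1]]          # points for the 2D scatter plot
print(f"{mask.sum()} of {len(data)} runs fall inside the slice")
```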

Multi-run data can be analyzed based on statistical moments. The system proposed by Potter et al. [203] for spatio-temporal multi-run data links overview and statistical displays. Kehrer et al. [117] integrate the computation of higher-order statistical moments, such as skewness and kurtosis, as well as robust estimates into the visual analysis process and enable the analyst to brush particular statistics. In more recent work by Kehrer et al. [120], multi-run data and aggregated properties are related via an interface so that selection information can be communicated between the two data parts. The statistical properties are represented using glyphs, which can be explored using focus+context visualization and brushing of statistical aggregates.
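A minimal sketch of such moment-based aggregation and brushing (synthetic ensemble; not the cited systems themselves) computes per-cell moments across runs and selects cells by a condition on one statistic:

```python
# Sketch: aggregate multi-run values per spatial cell into statistical moments
# and "brush" cells by a condition on one statistic (here: high skewness).
import numpy as np
from scipy import stats                    # assumed available

rng = np.random.default_rng(6)
ensemble = rng.gamma(shape=2.0, size=(40, 32, 32))   # (runs, ny, nx), hypothetical

mean = ensemble.mean(axis=0)
std = ensemble.std(axis=0)
skew = stats.skew(ensemble, axis=0)
kurt = stats.kurtosis(ensemble, axis=0)

brushed = skew > 1.0                 # selection that linked views would highlight
print(f"{brushed.sum()} of {brushed.size} cells selected by the skewness brush")
```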

The results of time-consuming multi-run simulations can be approximated by simpler surrogate models. The HyperMoVal [194] enables the user to validate such models visually. The system shows multiple 2D and 3D projections of the n-dimensional function space around a focal point. The predictions can be compared to known simulation results, and the validity of the model at different points can be analyzed, so that regions of bad fit can be identified. Unger et al. [258] discuss a framework for the validation of geoscientific simulation ensembles.
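The basic validation idea (not HyperMoVal itself) can be sketched as follows: fit a cheap surrogate to sampled simulation results and inspect residuals on held-out runs to locate regions of bad fit:

```python
# Sketch of surrogate validation: fit a cheap surrogate to simulation samples,
# then inspect residuals against held-out runs to locate regions of bad fit.
import numpy as np

def simulate(x):                       # hypothetical "expensive" simulation
    return np.sin(3 * x) + 0.3 * x**2

rng = np.random.default_rng(7)
x_train = rng.uniform(0, 2, 30)
x_test = rng.uniform(0, 2, 10)

coeffs = np.polyfit(x_train, simulate(x_train), deg=3)   # polynomial surrogate
residuals = simulate(x_test) - np.polyval(coeffs, x_test)
worst = x_test[np.argmax(np.abs(residuals))]
print(f"largest residual {np.abs(residuals).max():.3f} near x = {worst:.2f}")
```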

Berger et al. [22] propose a tool, based on HyperMoVal, for the continuous exploration of a sampled space of simulation input parameters and results. The neighborhood of a selected point in the input parameter space can be projected to the result space using nearest-neighbor or model-based predictors. The inverse mapping, from results to parameters, is generally not possible. The authors visualize the neighborhood around the selected point in which changes in the predicted output caused by variations of the input parameters remain below a threshold. The uncertainty of the predictions is visualized using box plots.
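A rough sketch of the neighborhood computation (hypothetical surrogate predictor) samples around the selected point and keeps those samples whose predicted output deviates by less than a threshold:

```python
# Sketch: estimate the neighborhood of a selected input point within which the
# surrogate's predicted output changes by less than a threshold.
import numpy as np

def surrogate(p):                          # hypothetical fitted predictor
    return np.sin(p[..., 0]) + 0.5 * p[..., 1] ** 2

rng = np.random.default_rng(8)
p0 = np.array([0.8, 0.3])                  # selected point in parameter space
samples = p0 + rng.normal(scale=0.2, size=(2000, 2))

delta = np.abs(surrogate(samples) - surrogate(p0))
inside = samples[delta < 0.05]             # threshold on predicted-output change
print(f"{len(inside)} of {len(samples)} samples stay below the threshold")
```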

Matković et al. [164] visualize families of data surfaces, i.e., data sets defined with respect to pairs of independent dimensions. Different levels of abstraction (scalar aggregates, projections with respect to one variable) can be analyzed using multiple views and brushing. Direct visualization of the surfaces is used only when the user drills down to one (or a few) interesting surface(s), in order to avoid problems related to occlusion. Piringer et al. [196] analyze and compare surfaces.
