
Figure 3.1: Left: the map of all sensor locations in the Minneapolis freeway system traffic data.

A schematic map is shown underneath to provide context. Each red dot marks the location of one station (which usually encompasses several sensors, or detectors, one per lane). Right: a family of function graphs of road traffic occupancy. Data from some sensors (marked with the black rectangle on road 35W in the map) are highlighted in red. Occupancy is defined as the percentage of time during which a detector detects vehicles. It is measured in ten-minute intervals. An occupancy value of 0.7 means that a sensor detected vehicles for seven out of ten minutes.

an illustrative example analysis of road traffic measurement data. The case study of a Diesel fuel injection system optimization in Section 7.1 demonstrates the usability of our ideas in a real-world engineering context.

3.2 Data Model

Generally speaking, a data model consists of a data definition and a manipulation language (structuring and operational definitions) [248]. Data definitions that result from engineering simulations, real-world sensor data sets, or intelligence data may be very similar. Consequently, the data sets under consideration share some common characteristics.

3.2.1 Data Definition

The data sets contain values for $m$ independent variables and $n$ dependent variables. The independent variables $x = [x_1, \ldots, x_m]$ are real-valued and their values define a subset $I$ of the data set. A member of $I \subseteq \mathbb{R}^m$ represents a specific set of values $x_i$ of independent variables.

For each $x_i$, the corresponding set of values of dependent variables is provided; this data model does not handle missing values. There are two types of dependent variables: regular variables and function graphs. While regular variables have a single value for each $x_i$, function graph variables use an additional independent variable (often time) to provide a set of values for each $x_i$. A

28 CHAPTER 3. VISUAL ANALYSIS OF FAMILIES OF FUNCTION GRAPHS

Figure 3.2: Two approaches to the management of function graph data. (a) The independent variable of the function graphs (e.g. time) is represented by adding an additional dimension.

Records 1 to 144 represent one function graph. The scalar dimensions of the records need to be duplicated for each value of the independent variable. (b) A function graph is represented as an atomic type in the data. Duplication of scalar dimensions is not necessary. Columns of this table are families of function graphs.

function graph can be visualized as a 2D plot that shows how the value of a dependent variable changes over time. In other words, the regular variables $r = [r_1, \ldots, r_{n_r}]$ depend only on $x$, while the function graph variables $f = [f_1, \ldots, f_{n_f}]$ depend on $x$ and time $t \in \mathbb{R}$. For a specific set of values $x_i$ of independent variables and a fixed time $t_j$ we can define the set of values of dependent variables as $d = [r_1(x_i), \ldots, r_{n_r}(x_i), f_1(x_i, t_j), \ldots, f_{n_f}(x_i, t_j)]$, with $n_r + n_f = n$. The dependent variables and their values (possibly, over time) define a subset $D$ of the data set. For a given function graph variable $f_j(x, t)$, we define a family of function graphs as the set of function graphs for each possible value of $x$, $\{f_j(x_i, t) \mid \forall x_i \in I\}$.
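
The definitions above can be sketched in code. The following is a minimal illustration, not the implementation used in this work; the names `Record`, `dependent_values`, and `family` are hypothetical, and function graphs are assumed to be sampled at a shared, fixed set of time steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One member x_i of I together with its dependent values (hypothetical type)."""
    x: Tuple[float, ...]                    # independent variables x_1..x_m
    regular: Tuple[float, ...]              # r_1(x_i)..r_nr(x_i), one scalar each
    graphs: Tuple[Tuple[float, ...], ...]   # f_1(x_i, t)..f_nf(x_i, t), sampled over t


def dependent_values(rec: Record, j: int) -> List[float]:
    """Return d = [r_1(x_i), ..., r_nr(x_i), f_1(x_i, t_j), ..., f_nf(x_i, t_j)]."""
    return list(rec.regular) + [graph[j] for graph in rec.graphs]


def family(records: List[Record], k: int) -> Dict[Tuple[float, ...], Tuple[float, ...]]:
    """Return the family of function graphs of variable f_k: one graph per x_i in I."""
    return {rec.x: rec.graphs[k] for rec in records}
```

With two records and two function graph variables, `family(records, 0)` returns a dictionary with two entries, one graph per value of $x_i$.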

We use a road traffic measurement data set to illustrate the concepts described in Sections 3.3 and 3.4. The data set is provided by the Traffic Management Center of the Minnesota Department of Transportation (footnote 2), which maintains an archive database of road traffic measurements from the freeway system in the Twin Cities metropolitan area (Figure 3.1). The data set contains 28 days of measurements from approximately 4,000 sensors grouped into about 1,000 stations covering ten main roads in the Twin Cities metropolitan area. Opposite directions on a road (e.g., northbound vs. southbound) are treated separately, thus effectively creating 20 one-way roads. $I$ consists of the positions of the sensors, road numbers, and weekdays. The sensors report traffic volume and occupancy, thus $D$ consists of two families of function graphs in this data set. Each function graph in each family represents one day's worth of measurement data, so there are in total about 112,000 function graphs in the data set (4,000 sensors times 28 days).

The sensor data are aggregated into 10-minute intervals; each function graph therefore contains 144 points to represent the 24 hours of a day.
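
The counts quoted above follow from simple arithmetic, sketched here for reference. The sensor count is the approximate figure from the text, and `time_step_index` is a hypothetical helper that assumes intervals are indexed from midnight.

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 10

# Number of samples in one function graph (one day of 10-minute intervals).
points_per_graph = MINUTES_PER_DAY // INTERVAL_MINUTES   # 144

APPROX_SENSORS = 4_000   # approximate figure quoted in the text
DAYS = 28
sensor_days = APPROX_SENSORS * DAYS                      # 112,000


def time_step_index(hour: int, minute: int) -> int:
    """Map a time of day to the index of its 10-minute interval (0..143)."""
    return (hour * 60 + minute) // INTERVAL_MINUTES
```

For example, 08:30 falls into interval 51 and 23:59 into the last interval, 143.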

Such data in databases and also in IVA applications are generally stored as records that consist of attributes. This concept is well known: records can be considered as points in a space of $m + n$ dimensions, representing all (independent and dependent) attributes. All attributes (dimensions) are scalars, either numeric or categorical. We omit a more detailed discussion including nominal and ordinal types here. Time-dependent data like function graphs can be represented by adding time as an additional dimension.

2 Twin Cities Traffic Data Archive, http://www.d.umn.edu/~tkwon/TMCdata/TMCarchive.html

The table in Figure 3.2(a) illustrates a conventional way of storing such data. Time appears as an additional column in the table. For each combination of the independent variables (day and sensor location), there are 144 records to represent a function graph variable, one record for each time step. The scalar dimensions of the records need to be duplicated for each time step. If, however, we take into account the nature of the data and allow some columns to contain function graphs, we get a different model, shown in Figure 3.2(b). We now have one record that contains the time series Volume(t) as a function graph, as opposed to having 144 records with TimeStep and Volume attributes, as in Figure 3.2(a). Duplication of scalar dimensions is not necessary. Note that we have substantially reduced the number of data records and, simultaneously, increased the complexity of the data model. We did not lose any of the data; on the contrary, we have gained additional information. Now values pertaining to a given day and sensor location are grouped into a single record. All function graphs populating one column in Figure 3.2(b) constitute a family of function graphs.
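
The step from the layout in Figure 3.2(a) to the one in Figure 3.2(b) can be sketched as a grouping operation. This is an illustrative sketch only; the function name `to_function_graph_records`, the row layout `(day, sensor, time_step, volume)`, and the dictionary keys are assumptions and do not reflect the actual storage format.

```python
from collections import defaultdict


def to_function_graph_records(long_rows, steps_per_graph):
    """Group (day, sensor, time_step, volume) rows into one record per (day, sensor).

    Each output record holds the whole time series as a single attribute,
    so the scalar attributes are stored once instead of once per time step.
    """
    grouped = defaultdict(dict)
    for day, sensor, step, volume in long_rows:
        grouped[(day, sensor)][step] = volume

    records = []
    for key in sorted(grouped):
        by_step = grouped[key]
        # The data model does not handle missing values, so every step must be present.
        if sorted(by_step) != list(range(steps_per_graph)):
            raise ValueError(f"incomplete function graph for {key}")
        graph = tuple(by_step[s] for s in range(steps_per_graph))
        records.append({"day": key[0], "sensor": key[1], "volume": graph})
    return records
```

For the traffic data, `steps_per_graph` would be 144, so each group of 144 rows in the conventional layout collapses into a single record.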

If dimensions can be function graphs as well as scalars, the analysis can be improved significantly. This data model has proven itself in several case studies we have conducted in different fields, for example optimization of Diesel fuel injection (Section 7.1), timing chain drive design (Section 7.2), medical data [160], and ethology [168]. Fang et al. [67] have proposed a similar data model for medical image data sets.

3.2.2 Manipulation Language

Once the data set is defined, the question is how to analyze the data. In our data model, the manipulation language is an exploration language that enables search and pattern discovery without modifying the data set. From the visual analytics point of view, the goal is to discover, in an iterative manner, trends, tendencies, and outliers in the data and to see how patterns in $D$ map to the corresponding subsets in $I$ and vice versa. In order to achieve that, data exploration techniques must be conceptually simple, easily combined, and visually intuitive.

The visualization framework is based on the described data model and a set of visual operators (brushing techniques) and views (histograms, scatter plots, parallel coordinates, etc.) that are linked together. The design of interactive visual analysis within this framework is based on the following principles. The analyst can select a varying number of views. Within each view, the variables of interest can be selected and the corresponding values displayed. The visual operators are used to select a subset of “interesting” values for the specific variables in the view.

The selection is immediately displayed in all other views. Families of function graphs are of special importance in providing a visual space for patterns. Within a family of function graphs, we would like to select function graphs based on their shapes. It is possible to use a combination of function graph values to specify the desired shape of a function graph, i.e., the pattern.
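
The idea of selecting function graphs by their values and propagating the selection to linked views can be sketched as follows. This is a deliberately simplified stand-in, not the brushing technique of this framework: `brush_range` is a hypothetical rectangular brush that keeps graphs whose values stay inside a value range over a range of time steps, and the family is assumed to be a dictionary keyed by `(sensor, day)`.

```python
def brush_range(family, first_step, last_step, lo, hi):
    """Return keys of graphs whose values lie in [lo, hi] at every step in the range."""
    return {
        key
        for key, graph in family.items()
        if all(lo <= v <= hi for v in graph[first_step:last_step + 1])
    }


def sensors_in(selection):
    """Linked view: the sensors (part of I) that own the selected graphs."""
    return {sensor for sensor, _day in selection}


# Toy occupancy family with four time steps per graph (made-up values).
occupancy = {
    ("A", 1): (0.10, 0.20, 0.30, 0.10),
    ("B", 1): (0.90, 1.00, 1.00, 0.95),
    ("B", 2): (0.10, 0.30, 0.20, 0.10),
    ("C", 1): (0.95, 0.90, 1.00, 1.00),
}

# One brush describes a shape: occupancy stays very high all day.
always_high = brush_range(occupancy, 0, 3, 0.9, 1.0)

# Brushes combine by set intersection to refine the pattern.
starts_below = brush_range(occupancy, 0, 0, 0.0, 0.92)
refined = always_high & starts_below
```

Here `always_high` contains the graphs of sensors B and C on day 1, and `sensors_in(always_high)` gives the subset of sensors that a linked map view would highlight. Using plain sets keeps the brushes easy to combine with intersection, union, and difference.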


Figure 3.3: Several occupancy function graphs of atypical shape have been selected by the red line brush. We conclude from very high occupancy values that those function graphs indicate malfunctioning sensors. In the linked map view (scatter plot view of sensor coordinates) we can see that there are two malfunctioning sensors next to each other. In another linked scatter plot view weekdays and road numbers are displayed. Each column represents one direction (for instance, southbound) of a road. We can see that those sensors are on road 35E and that they did not work for three days.