
4.6.2. Similarity Measures and Transformation for Two-Dimensional Time-Dependent Data


Several approaches exist for assessing the similarity between trajectories:

1. Using the direct trajectory geometry in combination with a specific distance measure (see below for more details).

2. Mapping each trajectory to a feature vector and using a distance between feature vectors to assess the similarity between trajectories (see below for more details).

3. Double-cross matrix-based similarity [KM08]: each trajectory (i.e., polyline) is represented by a double-cross matrix, which encodes the relative position of each line segment with respect to its starting point. It is mainly used in grid-based applications. This method disregards the absolute spatial positions of the trajectory segments.

4. Edit distance on real sequence (EDR) was introduced by Chen et al. [COO05]. It is based on the string edit distance and can handle trajectories with different numbers of steps. It is robust to data imperfections because the distance between two points is quantized to 0 (match) or 1 (no match). However, it requires a tolerance threshold ε to be defined in advance, and the choice of ε influences the resulting similarity value.

5. Dynamic time warping (DTW) (e.g., [Keo02]) allows for unequal numbers of time steps and possible phase shifts.

6. Edit distance with real penalty (ERP), introduced by Chen and Ng [CN04], combines the L₁ norm and the edit distance. It supports local time shifting and, unlike EDR, does not require presetting a threshold ε.

7. The distance based on the longest common subsequence (LCSS) was proposed by Vlachos et al. [VGK02]. It gives more weight to the similar portions of the trajectories and allows for stretching in time and global translation. Like EDR, however, it requires a threshold ε to be defined in advance.
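The elastic measures in items 4 to 7 can all be computed as dynamic programs over two point sequences. The following Python sketch is our own illustration, not the implementation used in this work or in the cited papers. It assumes trajectories given as lists of (x, y) tuples, uses the Euclidean point distance for DTW and the L₁ (Manhattan) point distance for ERP, places the ERP gap point at the origin, and omits the time-window constraint that LCSS is often combined with. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
INF = float("inf")


def euclid(p: Point, q: Point) -> float:
    """Euclidean distance between two 2-D points (used by DTW)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p: Point, q: Point) -> float:
    """L1 (Manhattan) distance between two 2-D points (used by ERP here)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def matches(p: Point, q: Point, eps: float) -> bool:
    """EDR/LCSS match rule: both coordinates differ by at most eps."""
    return abs(p[0] - q[0]) <= eps and abs(p[1] - q[1]) <= eps


def table(n: int, m: int, fill: float) -> List[List[float]]:
    """(n + 1) x (m + 1) dynamic-programming table."""
    return [[fill] * (m + 1) for _ in range(n + 1)]


def dtw(a: Sequence[Point], b: Sequence[Point]) -> float:
    """DTW: cost of the cheapest monotone alignment; one point may match many."""
    D = table(len(a), len(b), INF)
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = euclid(a[i - 1], b[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[len(a)][len(b)]


def edr(a: Sequence[Point], b: Sequence[Point], eps: float) -> float:
    """EDR: edit distance whose substitution cost is quantized to 0 or 1."""
    D = table(len(a), len(b), 0.0)
    for i in range(len(a) + 1):
        D[i][0] = i  # delete all i points of a
    for j in range(len(b) + 1):
        D[0][j] = j  # insert all j points of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if matches(a[i - 1], b[j - 1], eps) else 1
            D[i][j] = min(D[i - 1][j - 1] + sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[len(a)][len(b)]


def erp(a: Sequence[Point], b: Sequence[Point], gap: Point = (0.0, 0.0)) -> float:
    """ERP: gaps cost the real (L1) distance to a constant gap point, so no eps is needed."""
    D = table(len(a), len(b), 0.0)
    for i in range(1, len(a) + 1):
        D[i][0] = D[i - 1][0] + manhattan(a[i - 1], gap)
    for j in range(1, len(b) + 1):
        D[0][j] = D[0][j - 1] + manhattan(b[j - 1], gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = min(D[i - 1][j - 1] + manhattan(a[i - 1], b[j - 1]),
                          D[i - 1][j] + manhattan(a[i - 1], gap),
                          D[i][j - 1] + manhattan(b[j - 1], gap))
    return D[len(a)][len(b)]


def lcss_distance(a: Sequence[Point], b: Sequence[Point], eps: float) -> float:
    """LCSS distance 1 - LCSS / min(n, m); only matching portions contribute."""
    L = table(len(a), len(b), 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if matches(a[i - 1], b[j - 1], eps):
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return 1.0 - L[len(a)][len(b)] / min(len(a), len(b))
```

For example, a trajectory that repeats its first point once has DTW and LCSS distance 0 to the original, while EDR charges one insertion for the extra step.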

In our work, we focus on the first two approaches, which we explain in more detail below. Note that our approach can also support other similarity measures.

Geometry-based Distance Measures include several methods based on the distance of points along trajectories in two-dimensional space. The simple Euclidean distance of trajectory points in each step in ℝ² was applied in [NP06]. This measure requires an equal number of points (time steps) in the trajectories and is mainly suitable for equally spaced time steps. Pelekis et al. [PKM*07] introduce new measures for "time-relaxed" similarity, which are also applicable to trajectories with non-equally spaced time steps. They propose the so-called locality in-between polylines (LIP) measure, defined as the area between two trajectories, and several variations of it: the spatio-temporal LIP distance, the directional distance, the temporal directional distance, and the speed-pattern spatio-temporal LIP distance. These measures take into account factors such as locality, temporality, and directionality.
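The lock-step Euclidean measure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [NP06]: it assumes trajectories given as lists of (x, y) tuples with the same number of time steps, and it sums the per-step distances (averaging them, or taking the root of the summed squared distances, would be equally plausible variants). The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def lockstep_euclidean(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of Euclidean distances between the points of a and b at the same time step."""
    if len(a) != len(b):
        # The measure is only defined for equal numbers of time steps.
        raise ValueError("trajectories must have the same number of time steps")
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))


# Two parallel trajectories, one unit apart at every step.
print(lockstep_euclidean([(0, 0), (1, 1)], [(0, 1), (1, 2)]))  # 2.0
```

Because points are compared strictly step by step, a phase shift between two otherwise identical movements increases this distance, which is what the elastic measures listed above are designed to tolerate.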

Trajectory Features. Trajectory features characterize trajectories by a small number of abstract properties. The similarity between trajectories is then given by the distance between their feature vectors. Andrienko et al. [AAPS08] presented a set of characteristics for movement data in a geographic context, which could also be applied to abstract trajectories, depending on the use case. These characteristics include:

• Length of trajectory: measures the distance of the movement. It is of two types:

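– total: $L(T) = \sum_{i=1}^{n} d(t_i, t_{i-1})$, where $t_0, \dots, t_n$ are the consecutive trajectory points and $d$ is the distance between them, measures the total length of the whole movement path (i.e., the sum of the lengths of all movement steps),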

– changes: the lengths of the individual trajectory segments (i.e., the distance between each pair of consecutive steps),

• Duration of trajectory: measures the time duration of the movement. Note that for equally spaced time intervals, the duration of each step is constant and the total duration is linearly proportional to the number of steps. Therefore, it can be disregarded in this case.


Figure 4.25.: Illustration of the three types of data-invariant transformations (panels: invariance in size, invariance in position, invariance in rotation).

• Speed: describes how fast an element moves (speed = length/duration). Note that with equidistant time steps, this measure is a linear transformation of the length measure and can therefore be disregarded. It is of two types:

– total speed: analogous to the total length for equally spaced time intervals,

– speed changes: analogous to the length changes for equally spaced time intervals,

• Direction: reflects the direction of motion,

– major direction: reflects the general direction of the movement, measured as the direction from the start point to the end point of the movement,

– dynamics of direction: analogous to the above, but measures the direction between each pair of consecutive steps. It can indicate major turns or straight movements.

These features are applicable to any two-dimensional time-dependent data, although they were developed for geographic movements. In our work, we concentrate on the specific type of two-dimensional time series with equidistant time steps and an equal number of time steps. Therefore, some of the features are dispensable: the duration is constant, and the speed characteristics can be represented by the length characteristics when time intervals are equidistant.
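The length and direction features listed above can be computed directly from the point sequence; duration and speed are omitted because, as noted, they are redundant for equidistant time steps. The following Python sketch is an illustration under our own assumptions, not the feature set or code used in this work: trajectories are lists of (x, y) tuples, directions are angles in radians, and the combined feature vector (total length plus the major direction encoded as cosine and sine, which avoids the jump between -π and π) and its Euclidean feature distance are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def step_lengths(traj: Sequence[Point]) -> List[float]:
    """Length changes: distance between each pair of consecutive steps."""
    return [math.dist(p, q) for p, q in zip(traj, traj[1:])]


def total_length(traj: Sequence[Point]) -> float:
    """Total length L(T): sum of the lengths of all movement steps."""
    return sum(step_lengths(traj))


def major_direction(traj: Sequence[Point]) -> float:
    """Angle (radians) of the vector from the start point to the end point."""
    (x0, y0), (xn, yn) = traj[0], traj[-1]
    return math.atan2(yn - y0, xn - x0)


def direction_dynamics(traj: Sequence[Point]) -> List[float]:
    """Heading (radians) of each movement step; large jumps indicate turns."""
    return [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(traj, traj[1:])]


def feature_vector(traj: Sequence[Point]) -> List[float]:
    """Hypothetical compact feature vector: total length, cos and sin of the major direction."""
    angle = major_direction(traj)
    return [total_length(traj), math.cos(angle), math.sin(angle)]


def feature_distance(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """Similarity as the Euclidean distance between the two feature vectors."""
    return math.dist(feature_vector(t1), feature_vector(t2))
```

In practice the features would usually be scaled to comparable ranges before computing the distance, since otherwise the length term dominates the direction terms.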

4.6.2.2. Transformation

When dealing with abstract two-dimensional time-dependent data, scaling, rotation, and translation can be applied to the data as a pre-processing step. The type of transformation depends on the notion of similarity in a particular application and task context, and the chosen transformation affects the results of the similarity measures presented above. Figure 4.25 presents three types of invariance that could be of interest.

Note that these transformations are usually not applied to geographic movement data, since fixed point locations play a significant role in the notion of similarity there.

Figure 4.25 shows three possible invariances that can be used for data transformations; they can also be combined in particular use cases:

• Invariance in size: used when the trajectory length is not important, for example, when the magnitude of the movements does not matter and the focus is on the shape of the movement.

• Invariance in position: applies when the exact location is not important and translations in space are allowed. In this case, only the relative movement is relevant. For example, it does not matter where exactly a movement takes place; the movements are compared relative to each other.

• Invariance in rotation: the overall direction of movement is not relevant. This is used, for example, in handwriting recognition.
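Each invariance corresponds to an elementary transformation of the point sequence. The following Python sketch shows one possible realization under our own assumptions, not necessarily the transformations used in this work: position invariance moves the start point to the origin, size invariance divides by a scale factor that defaults to the trajectory's own total length, and rotation invariance rotates the trajectory about its start point so that the start-to-end direction points along the positive x-axis. Function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def translate(traj: Sequence[Point]) -> List[Point]:
    """Invariance in position: shift the trajectory so that it starts at the origin."""
    x0, y0 = traj[0]
    return [(x - x0, y - y0) for x, y in traj]


def normalize_size(traj: Sequence[Point], scale: Optional[float] = None) -> List[Point]:
    """Invariance in size: divide all coordinates by a scale factor
    (by default the trajectory's own total length)."""
    if scale is None:
        scale = sum(math.dist(p, q) for p, q in zip(traj, traj[1:]))
    if scale == 0:
        return list(traj)  # a stationary trajectory cannot be rescaled
    return [(x / scale, y / scale) for x, y in traj]


def rotate(traj: Sequence[Point]) -> List[Point]:
    """Invariance in rotation: rotate about the start point so that the
    start-to-end direction points along the positive x-axis."""
    (x0, y0), (xn, yn) = traj[0], traj[-1]
    a = -math.atan2(yn - y0, xn - x0)
    c, s = math.cos(a), math.sin(a)
    return [(x0 + c * (x - x0) - s * (y - y0), y0 + s * (x - x0) + c * (y - y0))
            for x, y in traj]


# Combining all three invariances: an upward movement of length 2
# becomes a movement from (0, 0) to approximately (1, 0).
print(normalize_size(rotate(translate([(1, 1), (1, 3)]))))
```

Translating before scaling matters here: dividing untranslated coordinates by the scale factor would also shrink the absolute position, which size invariance alone is not meant to change.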

Based on the above-mentioned invariances, the trajectories are transformed (i.e., translated, normalized, and rotated). These transformations can be performed on the global, local, entity, or time level.

For example, in size normalization, the global level means normalizing over all trajectories of all entities over the whole time period; the local level means normalizing each trajectory separately; the entity level means normalizing all trajectories of each entity together; and the time level means normalizing all entities together within each time period. For risk-return data, for example, we can normalize each asset separately or over all assets on the market, either over the whole time period or for each week separately.
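The four levels differ only in which trajectories share a common scale factor. The following Python sketch illustrates this for size normalization under hypothetical assumptions: trajectories are stored in a dictionary keyed by (entity, period), for example (asset, week) for risk-return data, and each trajectory is divided by the largest path length within its group, so that relative sizes are preserved inside a group. At the local level the group is the trajectory itself, so every non-stationary trajectory ends up with length 1.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Point = Tuple[float, float]
Key = Tuple[str, int]  # hypothetical key: (entity id, time period index)


def path_length(traj: List[Point]) -> float:
    """Total length of a trajectory: sum of its step lengths."""
    return sum(math.dist(p, q) for p, q in zip(traj, traj[1:]))


def group_of(key: Key, level: str) -> Hashable:
    """Group label: trajectories with the same label share one scale factor."""
    entity, period = key
    return {"global": None, "local": key, "entity": entity, "time": period}[level]


def normalize_sizes(data: Dict[Key, List[Point]], level: str) -> Dict[Key, List[Point]]:
    """Divide each trajectory by the largest path length in its group."""
    scale: Dict[Hashable, float] = defaultdict(float)
    for key, traj in data.items():
        g = group_of(key, level)
        scale[g] = max(scale[g], path_length(traj))
    out = {}
    for key, traj in data.items():
        s = scale[group_of(key, level)] or 1.0  # guard against all-stationary groups
        out[key] = [(x / s, y / s) for x, y in traj]
    return out
```

Choosing the maximum rather than, say, the mean path length keeps all normalized trajectories within a group at most of length 1; either choice is a design decision that this sketch does not attribute to the original work.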

From the document Visual Analytics of Large Weighted Directed Graphs and Two-Dimensional Time-Dependent Data (pages 185-188)