Alignment of Labeled Point Clouds

4.2 Alignment of Labeled Point Clouds

With labeled point cloud superposition, a novel method was developed to measure the similarity between geometric data. For CavBase data, this has the considerable advantage that a further transformation of protein binding sites, a process often connected with a loss of information, becomes unnecessary. As already mentioned, however, a raw similarity value is often not sufficient, especially if one wants to explain the obtained similarity or to analyze a whole set of protein binding sites. In such cases, the detection of common substructures or conserved patterns is required. So far, this problem has only been tackled for objects represented by graphs, where, e.g., the concept of multiple graph alignment (Weskamp et al., 2007) was introduced for this purpose. Here, a similar concept is developed which, however, can be applied directly to geometric data, namely labeled point clouds. As in the case of graph alignment, the goal of labeled point cloud alignment is to establish a one-to-one correspondence between the basic constituents of the structures, namely labeled points.

4.2.1 Multiple Point Cloud Alignment

Multiple point cloud alignment (MPCA) is defined according to Definition 2.17, where the set $\{X_i \mid i = 1, \ldots, m\} = \mathcal{X}$ becomes $\{P_i \mid i = 1, \ldots, m\} = \mathcal{P}$. Hence, one is looking for a one-to-one correspondence between the points of different point clouds. The number of valid alignments is enormous, as already shown. This makes it necessary to search for the best alignment out of the set of valid alignments, that is, an alignment which reflects structural correspondence in an optimal way. In the case of graph alignment, Weskamp et al. (2007) solved this problem by defining a scoring function on the combinatorial search space, which was optimized by a greedy algorithm. This simple idea could of course also be applied to the multiple point cloud alignment problem. Unfortunately, combinatorial optimization is in this case NP-hard, leading to the well-known problem that a trade-off must be found between the quality of the solution and the runtime. Therefore, another approach is developed here.

The idea is to derive an optimal MPCA from an optimal superposition of the labeled point clouds. More specifically, the pairwise alignment is defined on the basis of a given superposition. As will be seen below, the problem of finding an optimal (pairwise) alignment thus comes down to solving several linear assignment problems. For solving the multiple point cloud alignment problem, a two-step approach can be employed: first solving the problem of aligning two structures, and second merging the pairwise alignments into a multiple alignment. Techniques introduced in Section 2.5 can be used for this purpose. Alternatively, Shatsky et al. (2006) make use of m-partite pivot graph matching, which solves the multiple point cloud alignment directly without decomposing it into a set of pairwise problems.

4.2.2 Construction of Pairwise Alignments

The construction of a pairwise alignment of two point clouds $P$ and $P'$ can be reduced to an optimal assignment problem. To this end, a square matrix $M = (m_{i,j})$ is needed, where $m_{i,j} \in \mathbb{R}$ defines the cost of assigning point $p_i \in P$ to point $p'_j \in P'$. According to Definition 2.17, the maximal length of a pairwise alignment is $n = |P| + |P'|$. Therefore, to consider all possible alignments, the matrix $M$ has size $n \times n$.

The entries $m_{i,j}$ are derived from the optimal superposition of the point clouds $P$ and $P'$ as produced by a modification of the LPCS method. This modification concerns the similarity measure to be maximized. Since a mutually optimal alignment is sought, the similarity is not split into two optimal degrees of inclusion, as in measure (4.10). Instead, similarity is defined in terms of a compromise measure as follows:

\[
\mathrm{SIM}_{\mathrm{PCA}}(P, P') = \max_{t \in [0, 2\pi]^3 \times \mathbb{R}^3} F(P, P', t), \tag{4.11}
\]

where

\[
F(P, P', t) = \frac{1}{2} \cdot \Big( \mathrm{inc}\big(\mathrm{TF}(P, t), P'\big) + \mathrm{inc}\big(P', \mathrm{TF}(P, t)\big) \Big).
\]
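The evaluation of the compromise measure for a single candidate transformation can be sketched in Python as follows. This is only an illustration under stated assumptions: the rotation order (x, then y, then z) is chosen arbitrarily, and `toy_inc` is a hypothetical stand-in for the degree of inclusion of measure (4.10), which is not reproduced in this section and which, unlike this placeholder, takes point labels into account. The maximization over $t$ in (4.11) is not shown.

```python
import math

def transform(points, t):
    """Apply a rigid transformation TF(P, t) with t = (ax, ay, az, dx, dy, dz).

    Assumption for this sketch: rotate about x, then y, then z, then translate.
    """
    ax, ay, az, dx, dy, dz = t
    moved = []
    for x, y, z in points:
        y, z = (y * math.cos(ax) - z * math.sin(ax),
                y * math.sin(ax) + z * math.cos(ax))
        x, z = (x * math.cos(ay) + z * math.sin(ay),
                -x * math.sin(ay) + z * math.cos(ay))
        x, y = (x * math.cos(az) - y * math.sin(az),
                x * math.sin(az) + y * math.cos(az))
        moved.append((x + dx, y + dy, z + dz))
    return moved

def toy_inc(a, b, eps=0.5):
    """Hypothetical inclusion degree (NOT measure (4.10)): the share of
    points in a that have some point of b within L1 distance eps."""
    if not a:
        return 1.0
    def close(p):
        return any(sum(abs(u - v) for u, v in zip(p, q)) <= eps for q in b)
    return sum(1 for p in a if close(p)) / len(a)

def compromise(p, q, t, inc=toy_inc):
    """F(P, P', t): mean of the two directed inclusion degrees."""
    moved = transform(p, t)
    return 0.5 * (inc(moved, q) + inc(q, moved))

# Hypothetical coordinates: a quarter turn about z maps P onto two points of Q.
P = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
Q = [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
score = compromise(P, Q, (0.0, 0.0, math.pi / 2, 0.0, 0.0, 0.0))
```

In this toy setting the transformed cloud is fully included in the second cloud, while only two of the three points of the second cloud are covered, so the compromise value lies between the two directed degrees.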

Given a spatial superposition that is optimal in the sense of (4.11), it makes sense to define the cost $m_{i,j}$ in terms of the associated $L_1$ distance $d_M$ between the points $p_i \in P$ and $p'_j \in P'$. To account for point-to-dummy mappings, the distance between a point and a dummy is specified by a parameter $k$. Finally, dummy-to-dummy assignments are scored by zero, so that these mappings do not influence the construction of the alignment. As an illustration, Table 4.1 shows a matrix $M$ for two point clouds $P = \{a, b, c, d\}$ and $P' = \{a', b', c'\}$, and Figure 4.2 shows the resulting bi-partite graph.

Table 4.1: Matrix representation of the optimal assignment problem.

        a′           b′           c′           ⊥    ⊥    ⊥    ⊥
a       d_M(a, a′)   d_M(a, b′)   d_M(a, c′)   k    k    k    k
b       d_M(b, a′)   d_M(b, b′)   d_M(b, c′)   k    k    k    k
c       d_M(c, a′)   d_M(c, b′)   d_M(c, c′)   k    k    k    k
d       d_M(d, a′)   d_M(d, b′)   d_M(d, c′)   k    k    k    k
⊥       k            k            k            0    0    0    0
⊥       k            k            k            0    0    0    0
⊥       k            k            k            0    0    0    0
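The layout of Table 4.1 can be reproduced with a few lines of Python. The sketch below assumes that both point clouds are already given in their optimal superposition; the function names and the example coordinates are hypothetical and serve only as an illustration.

```python
def l1_distance(p, q):
    """L1 (Manhattan) distance between two 3D points."""
    return sum(abs(u - v) for u, v in zip(p, q))

def build_cost_matrix(p_points, q_points, k):
    """Square cost matrix of size n = |P| + |P'|.

    Rows: points of P followed by |P'| dummies.
    Columns: points of P' followed by |P| dummies.
    """
    n_p, n_q = len(p_points), len(q_points)
    n = n_p + n_q
    matrix = [[0.0] * n for _ in range(n)]  # dummy-to-dummy entries stay 0
    for i in range(n):
        for j in range(n):
            row_is_point = i < n_p
            col_is_point = j < n_q
            if row_is_point and col_is_point:
                matrix[i][j] = l1_distance(p_points[i], q_points[j])
            elif row_is_point or col_is_point:
                matrix[i][j] = float(k)  # point-to-dummy cost
    return matrix

# Hypothetical superposed coordinates for P = {a, b, c, d} and P' = {a', b', c'}.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 3.0, 3.0)]
P_prime = [(0.1, 0.0, 0.0), (1.0, 0.2, 0.0), (0.0, 1.0, 0.3)]
M = build_cost_matrix(P, P_prime, k=10.0)
```

The parameter `k` acts as a threshold: a point is mapped to a dummy rather than to a real point once no remaining partner is cheaper than `k`.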

Formally, the assignment problem, also known as the weighted bi-partite matching problem, is specified by a graph $G = (V, E, \mathcal{E})$ with $V = V_1 \cup V_2$, $V_1 \cap V_2 = \emptyset$ and $E = \{(v_1, v_2) \mid v_1 \in V_1, v_2 \in V_2\}$. Moreover, each edge $e \in E$ has an associated cost value $\mathcal{E}(e)$. The goal is to find a subset of edges $M \subseteq E$ solving the following constrained optimization problem:

\[
\text{minimize} \quad \sum_{e \in M} \mathcal{E}(e) \tag{4.12}
\]

subject to

\[
\bigcup_{(v_1, v_2) \in M} \{v_1\} = V_1, \qquad \bigcup_{(v_1, v_2) \in M} \{v_2\} = V_2, \tag{4.13}
\]

and such that $(v_1, v_2) \in M$ and $(v_1', v_2') \in M$ with $(v_1, v_2) \neq (v_1', v_2')$ implies $v_1 \neq v_1'$ and $v_2 \neq v_2'$. In other words, $M$ defines a bijection between $V_1$ and $V_2$. In this case, the sets $V_1$ and $V_2$ represent, respectively, the points in point cloud $P$ (supplemented with $|P'|$ dummy points) and $P'$ (supplemented with $|P|$ dummy points). Moreover, the cost $\mathcal{E}(e)$ of an edge $e = (v_i, v_j)$ is given by the corresponding matrix entry $m_{i,j}$, representing the distance between both points in the optimal superposition according to (4.11). See Figure 4.2 for an illustration.

To solve the weighted bi-partite graph matching problem, the Hungarian algorithm (Kuhn, 2005) is used. Once a cost-minimal assignment has been found, the point cloud alignment is defined by the corresponding point-to-point and point-to-dummy assignments, while all dummy-to-dummy assignments are ignored.

Figure 4.2: Illustration of the weighted bi-partite graph matching problem.
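The step from a cost matrix to a pairwise alignment can be sketched as follows. The code uses a compact, widely used cubic-time formulation of the Hungarian method based on row and column potentials; it is not claimed to be the implementation used in this work. The function names and the small example matrix (two points in $P$, one point in $P'$, $k = 2$) are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square matrix.

    Returns a list a with a[i] = column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (n + 1)   # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:        # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def alignment_from_assignment(assignment, n_p, n_q):
    """Keep point-to-point and point-to-dummy pairs; None marks a dummy."""
    pairs = []
    for i, j in enumerate(assignment):
        left = i if i < n_p else None
        right = j if j < n_q else None
        if left is not None or right is not None:
            pairs.append((left, right))
    return pairs

# Rows: a, b, dummy. Columns: a', dummy, dummy. Point-to-dummy cost k = 2.
M = [[0.5, 2.0, 2.0],
     [3.0, 2.0, 2.0],
     [2.0, 0.0, 0.0]]
alignment = alignment_from_assignment(hungarian(M), n_p=2, n_q=1)
```

Here point a is aligned with a′, while b is mapped to a dummy because its distance to a′ exceeds $k$. Where SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and could replace the hand-written routine.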

Complexity

The space complexity of this approach is given by the size of the matrix representing the bi-partite graph, since the Hungarian algorithm works directly on this matrix. The size of this matrix is $(|P| + |P'|)^2$. The time complexity of this approach was given by Kuhn (2005), who reported a cubic runtime, hence $O((|P| + |P'|)^3)$ in the case considered here.
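As a small worked example, consider the point clouds of Table 4.1 with $|P| = 4$ and $|P'| = 3$. The figures below only illustrate the orders of magnitude; constant factors are ignored.

\[
n = |P| + |P'| = 4 + 3 = 7, \qquad n^2 = 49 \ \text{matrix entries}, \qquad n^3 = 343 .
\]

Doubling both point clouds to $|P| = 8$ and $|P'| = 6$ gives $n = 14$, hence $n^2 = 196$ entries and $n^3 = 2744$, that is, four times the memory and eight times the runtime bound.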

4.2.3 Construction of Multiple Alignments

The ability to construct pairwise alignments directly allows the calculation of multiple alignments, since techniques such as star or tree alignment (cf. Section 2.5) can merge these pairwise alignments into a multiple alignment. Moreover, the concept of m-partite graph matching can be applied in a straightforward way by deriving the costs for assignments from the optimal superpositions, as done in the pairwise case.
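A star-like merge of pairwise alignments can be sketched as follows. Since Section 2.5 is not reproduced here, this is only one plausible reading of the star scheme: a center point cloud is assumed to be given, and each other cloud is assumed to have been aligned to it pairwise. The function name `star_merge` and the data layout are hypothetical.

```python
def star_merge(n_center, pairwise):
    """Merge pairwise alignments against a common center cloud.

    pairwise[c] is a list of (i, j) pairs aligning center point i with
    point j of cloud c + 1; None marks a dummy on either side.
    Returns one tuple per alignment column, with None for gaps.
    """
    m = len(pairwise) + 1  # number of clouds including the center
    # One column per center point, initially matched to dummies everywhere.
    columns = [[i] + [None] * (m - 1) for i in range(n_center)]
    extra = []
    for c, pairs in enumerate(pairwise, start=1):
        for i, j in pairs:
            if i is not None and j is not None:
                columns[i][c] = j          # point-to-point mapping
            elif j is not None:
                col = [None] * m           # point without partner in the center
                col[c] = j
                extra.append(col)
    return [tuple(col) for col in columns + extra]

# Center with 3 points, aligned to a cloud with 3 points and one with 2 points.
alignments = [
    [(0, 0), (1, 2), (2, None), (None, 1)],
    [(0, 1), (1, None), (2, 0)],
]
multiple = star_merge(3, alignments)
```

Points of a non-center cloud that are mapped to a dummy of the center end up in columns of their own, so correspondences between two non-center clouds are only recovered through the center. This is a known limitation of star-like merging in general.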

From the document: Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites: Novel Methods and Computational Analysis (pages 100-104)