Proof of Theorem 2.5 - Metric Learning for Structured Data

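an edit script over $\Delta_A$ such that $\bar\delta''(\tilde x) = \tilde z$, and we obtain $d_c(\tilde x, \tilde z) \le c(\bar\delta'', \tilde x) = c(\bar\delta, \tilde x) + c(\bar\delta', \tilde y) = d_c(\tilde x, \tilde y) + d_c(\tilde y, \tilde z)$.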

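Lemma A.2. Let $A$ be an alphabet, let $X$ be a forest over $A$, and let $k \in \{1, \ldots, |X|\}$. Then, it holds:

$$\forall i \in \{1, \ldots, |X|\}: \quad k \in \mathrm{anc}_X(i) \Rightarrow k < i \le \mathrm{rl}_X(i) \le \mathrm{rl}_X(k) \tag{A.6}$$

$$\forall i \in \{1, \ldots, |X|\}: \quad k < i \le \mathrm{rl}_X(k) \Rightarrow k \in \mathrm{anc}_X(i) \tag{A.7}$$

$$\forall i, j \in \mathrm{anc}_X(k): \quad i < j \iff i \in \mathrm{anc}_X(j) \tag{A.8}$$

Proof. We first provide a proof for Equations A.6 and A.7, and then go on to prove Equation A.8. Our first proof works via induction over the size of the subtree $\tilde x^k$.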

If $|\tilde x^k| = 1$, then $k$ cannot be an ancestor of any element and, likewise, there exists no $i$ such that $k < i \le \mathrm{rl}_X(k) = k$ or $k < \mathrm{rl}_X(i) \le \mathrm{rl}_X(k) = k$. Therefore, the base case holds for Equations A.6 and A.7.

Now, assume that $|\tilde x^k| > 1$. Let $\tilde x^k = x_k(\tilde x^k_1, \ldots, \tilde x^k_{R_k})$. Further, for all $r \in \{1, \ldots, R_k + 1\}$, let $k_r := k + \sum_{l=1}^{r-1} |\tilde x^k_l| + 1$. Recall that, according to Lemma A.1, for all $r \in \{1, \ldots, R_k\}$ it holds: $\tilde x^{k_r} = \tilde x^k_r$. Further note that $\mathrm{rl}_X(k_r) = k_r + |\tilde x^k_r| - 1 = k + \sum_{l=1}^{r} |\tilde x^k_l| = k_{r+1} - 1$.

Finally, it holds: $k_{R_k+1} = k + \sum_{l=1}^{R_k} |\tilde x^k_l| + 1 = k + |\tilde x^k| = \mathrm{rl}_X(k) + 1$.

Regarding Equation A.6, we consider $k \in \mathrm{anc}_X(i)$. Then, per definition of ancestors, one of the following two cases applies.

$\mathrm{par}_X(i) = k$: In that case, let $r$ be the index such that $i = k_r$. Then, it holds:

$$k < k + \sum_{l=1}^{r-1} |\tilde x^k_l| + 1 = k_r = i \le \mathrm{rl}_X(i) = \mathrm{rl}_X(k_r) = k_{r+1} - 1 \le k_{R_k+1} - 1 = \mathrm{rl}_X(k)$$

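$\mathrm{par}_X(i) \neq k$: In that case, there is some $j \in \mathrm{anc}_X(i)$ such that $\mathrm{par}_X(j) = k$; otherwise, $k$ would not be in $\mathrm{anc}_X(i)$. Let $r$ be the index such that $k_r = j$. Then, per induction, we know that

$$k < k + \sum_{l=1}^{r-1} |\tilde x^k_l| + 1 = k_r = j \overset{\text{I.H.}}{<} i \le \mathrm{rl}_X(i) \overset{\text{I.H.}}{\le} \mathrm{rl}_X(j) = \mathrm{rl}_X(k_r) = k_{r+1} - 1 \le k_{R_k+1} - 1 = \mathrm{rl}_X(k),$$

which concludes the proof.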

Regarding Equation A.7, we consider $k < i \le \mathrm{rl}_X(k)$. Because $k_1 = k + 1$ and $k_{R_k+1} = \mathrm{rl}_X(k) + 1$, there exists exactly one $r$ such that $k_r \le i < k_{r+1}$. Now, if $k_r = i$, we obtain $\mathrm{par}_X(i) = k$, which in turn implies $k \in \mathrm{anc}_X(i)$. If $k_r < i$, then $k_r < i \le k_{r+1} - 1 = \mathrm{rl}_X(k_r)$. Therefore, per induction, it holds: $k_r \in \mathrm{anc}_X(i)$. Due to the definition of ancestors, we also know that $k \in \mathrm{anc}_X(k_r)$. Therefore, $k \in \mathrm{anc}_X(i)$, which concludes the proof.

Now, consider Equation A.8. We perform an inductive proof over the size of the ancestor set $|\mathrm{anc}_X(k)|$.

If $\mathrm{anc}_X(k)$ is empty or contains only a single element, then the claim holds trivially.

If $|\mathrm{anc}_X(k)| > 1$, consider $i := \mathrm{par}_X(k)$. Then, per definition of ancestors, we have $\mathrm{anc}_X(k) = \{i\} \cup \mathrm{anc}_X(i)$. Because $|\mathrm{anc}_X(i)| < |\mathrm{anc}_X(k)|$, our induction hypothesis applies and the claim holds for all pairwise comparisons within $\mathrm{anc}_X(i)$. It remains to show that the claim holds for all pairwise comparisons between $i$ and some $j \in \mathrm{anc}_X(k)$. There are only two possible cases for $j$. Either $i = j$, in which case the claim holds trivially, or $j \in \mathrm{anc}_X(i)$. In that case, Equation A.6 implies $j < i$; moreover, $i \notin \mathrm{anc}_X(j)$, because otherwise Equation A.6 would also imply $i < j$. Hence, the claim holds as well. This concludes the proof.
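As a sanity check of Lemma A.2, the following Python sketch (an illustration, not part of the proof) verifies Equations A.6 to A.8 by brute force on small forests. It assumes a hypothetical encoding of a forest as a list of `(label, children)` tuples; the helper names `preorder`, `ancestors`, and `check_lemma_a2` are illustrative. Nodes are numbered 1, 2, ... in pre-order, roots have parent 0 as in the text, and `rl[i]` is the outermost right leaf of node $i$, i.e. the last pre-order index inside the subtree of $i$.

```python
from itertools import product


def preorder(forest):
    """Return 1-based pre-order parents (0 = root) and outermost right leaves rl.

    Index 0 of both lists is an unused placeholder.
    """
    parents, rl = [None], [None]

    def visit(tree, parent):
        _, children = tree
        idx = len(parents)
        parents.append(parent)
        rl.append(None)
        for child in children:
            visit(child, idx)
        rl[idx] = len(parents) - 1  # last pre-order index inside the subtree of idx

    for tree in forest:
        visit(tree, 0)
    return parents, rl


def ancestors(parents, i):
    """anc_X(i): all proper ancestors of node i."""
    result, p = set(), parents[i]
    while p != 0:
        result.add(p)
        p = parents[p]
    return result


def check_lemma_a2(forest):
    parents, rl = preorder(forest)
    n = len(parents) - 1
    anc = {i: ancestors(parents, i) for i in range(1, n + 1)}
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            if k in anc[i]:                      # Equation A.6
                assert k < i <= rl[i] <= rl[k]
            if k < i <= rl[k]:                   # Equation A.7
                assert k in anc[i]
        for i, j in product(anc[k], repeat=2):  # Equation A.8
            assert (i < j) == (i in anc[j])
    return True


x = [("a", [("b", [("c", []), ("d", [])]), ("e", [])])]  # a(b(c, d), e)
print(preorder(x))        # ([None, 0, 1, 2, 2, 1], [None, 5, 4, 3, 4, 5])
print(check_lemma_a2(x))  # True
```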

Next, we generalize the notion of tree mappings between trees (refer to Definition 2.14) to tree mappings between forests.

Definition A.3 (Mappings). Let $A$ be an alphabet and let $X, Y$ be forests over $A$. Then, we define a tree mapping $M$ between $X$ and $Y$ as a subset $M \subseteq \{1, \ldots, |X|\} \times \{1, \ldots, |Y|\}$ such that the following conditions hold for all entries $(i, j), (i', j') \in M$:

$$i \ge i' \iff j \ge j' \qquad \text{(pre-order preservation)} \tag{A.9}$$

$$i \in \mathrm{anc}_X(i') \iff j \in \mathrm{anc}_Y(j') \qquad \text{(ancestral preservation)} \tag{A.10}$$

We define the left-complement of $M$ as $I(M, X, Y) := \{i \in \{1, \ldots, |X|\} \mid \nexists j \in \{1, \ldots, |Y|\}: (i, j) \in M\}$, and we define the right-complement of $M$ as $J(M, X, Y) := \{j \in \{1, \ldots, |Y|\} \mid \nexists i \in \{1, \ldots, |X|\}: (i, j) \in M\}$. Finally, we define the cost of $M$ according to some cost function $c$ over $A$ as follows.

$$c(M, X, Y) := \sum_{(i, j) \in M} c(x_i, y_j) + \sum_{i \in I(M, X, Y)} c(x_i, -) + \sum_{j \in J(M, X, Y)} c(-, y_j)$$
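To illustrate Definition A.3, here is a minimal Python sketch that checks Equations A.9 and A.10 for a candidate set $M$ and evaluates $c(M, X, Y)$. The forest encoding as lists of `(label, children)` tuples, the use of `None` for the gap symbol $-$, and the unit cost (0 for equal symbols, 1 otherwise) are assumptions made for this example only; all function names are hypothetical.

```python
from itertools import product


def preorder(forest):
    """Return 1-based pre-order labels and parents (0 = root); index 0 is unused."""
    labels, parents = [None], [None]

    def visit(tree, parent):
        label, children = tree
        idx = len(labels)
        labels.append(label)
        parents.append(parent)
        for child in children:
            visit(child, idx)

    for tree in forest:
        visit(tree, 0)
    return labels, parents


def ancestors(parents, i):
    result, p = set(), parents[i]
    while p != 0:
        result.add(p)
        p = parents[p]
    return result


def is_tree_mapping(M, X, Y):
    """Check Equations A.9 and A.10 for all pairs of entries of M."""
    lx, px = preorder(X)
    ly, py = preorder(Y)
    if any(not (1 <= i < len(lx) and 1 <= j < len(ly)) for i, j in M):
        return False
    for (i, j), (i2, j2) in product(M, repeat=2):
        if (i >= i2) != (j >= j2):                                # A.9
            return False
        if (i in ancestors(px, i2)) != (j in ancestors(py, j2)):  # A.10
            return False
    return True


def mapping_cost(M, X, Y, c):
    """c(M, X, Y); None plays the role of the gap symbol '-'."""
    lx, _ = preorder(X)
    ly, _ = preorder(Y)
    left = set(range(1, len(lx))) - {i for i, _ in M}   # I(M, X, Y)
    right = set(range(1, len(ly))) - {j for _, j in M}  # J(M, X, Y)
    return (sum(c(lx[i], ly[j]) for i, j in M)
            + sum(c(lx[i], None) for i in left)
            + sum(c(None, ly[j]) for j in right))


def unit_cost(a, b):
    """Hypothetical example cost: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


x = [("a", [("b", [])])]  # a(b)
y = [("c", [("d", [])])]  # c(d)
print(is_tree_mapping({(1, 2)}, x, y), mapping_cost({(1, 2)}, x, y, unit_cost))  # True 3
```

By contrast, $\{(1, 2), (2, 1)\}$ is rejected because it violates pre-order preservation.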

Next, we show that we can always find an edit script which is exactly as expensive as the tree mapping in question. Conversely, provided that the cost function fulfills the triangle inequality and is self-equal, we can always find a tree mapping which is at most as expensive as the edit script in question. This fact permits us to search for cheapest tree mappings instead of cheapest edit scripts, as we show in the next lemma. First, however, we define an alternative distance based on tree mappings, which we will then show to be equivalent.

Definition A.4 (Forest Edit Distance, Forest Mapping Distance). Let $A$ be an alphabet and let $X, Y$ be forests over $A$. Further, let $c$ be a cost function over $A$. Then, we define the forest edit distance $d_c(X, Y)$ between $X$ and $Y$ as

$$d_c(X, Y) := \min_{\bar\delta \in \Delta_A^*} \{ c(\bar\delta, X) \mid \bar\delta(X) = Y \} \tag{A.11}$$

Further, we define the forest tree mapping distance $D_c(X, Y)$ between $X$ and $Y$ as

$$D_c(X, Y) := \min_{M \subseteq \{1, \ldots, |X|\} \times \{1, \ldots, |Y|\}} \{ c(M, X, Y) \mid M \text{ is a tree mapping between } X \text{ and } Y \} \tag{A.12}$$

In the next lemma, we demonstrate that, under some conditions on the cost function, $d_c$ and $D_c$ are equivalent.

Lemma A.3. Let $A$ be an alphabet and let $X, Y$ be forests over $A$. Further, let $c$ be a cost function over $A$. Then, it holds:

1. For any tree mapping $M$ between $X$ and $Y$ there exists an edit script $\bar\delta_M \in \Delta_A^*$ such that $\bar\delta_M(X) = Y$ and $c(\bar\delta_M, X) = c(M, X, Y)$.

2. If $c$ fulfills the triangle inequality and is self-equal, then for any edit script $\bar\delta \in \Delta_A^*$ with $\bar\delta(X) = Y$ there exists a tree mapping $M_{\bar\delta}$ between $X$ and $Y$ such that $c(M_{\bar\delta}, X, Y) \le c(\bar\delta, X)$.

3. If $c$ fulfills the triangle inequality and is self-equal, then $d_c(X, Y) = D_c(X, Y)$.

Proof. We will consider each claim in turn.

Regarding the first claim, we define two more auxiliary sets, namely $I^C(M, X, Y) := \{i \in \{1, \ldots, |X|\} \mid \exists j \in \{1, \ldots, |Y|\}: (i, j) \in M\}$ and $J^C(M, X, Y) := \{j \in \{1, \ldots, |Y|\} \mid \exists i \in \{1, \ldots, |X|\}: (i, j) \in M\}$. Then, we can construct $\bar\delta_M$ as the concatenation of three edit scripts $\bar\delta^{\mathrm{rep}}_M$, $\bar\delta^{\mathrm{del}}_M$, and $\bar\delta^{\mathrm{ins}}_M$ as follows. We define $\bar\delta^{\mathrm{rep}}_M$ as the list of $\mathrm{rep}_{i, y_j}$ for all $(i, j) \in M$ in lexically ascending order, first sorted according to $i$ and then according to $j$. Per construction, this edit script replaces all $x_i$ with the mapped label $y_j$ according to the tree mapping $M$.

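Next, we define $\bar\delta^{\mathrm{del}}_M$ as the list of $\mathrm{del}_i$ for all $i \in I(M, X, Y)$ in descending order, such that earlier deletions do not shift the indices of later ones. Per construction, $\bar\delta^{\mathrm{rep}}_M \bar\delta^{\mathrm{del}}_M(X)$ contains exactly those (relabeled) nodes $i$ with $i \in I^C(M, X, Y)$.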

Finally, we define $\bar\delta^{\mathrm{ins}}_M$ as the list of $\mathrm{ins}_{\mathrm{par}_Y(j), y_j, r_Y(j), r_Y(j) + R_{M,X,Y}(j)}$ for all $j \in J(M, X, Y)$ in ascending order, where we define $R_{M,X,Y}(j)$ recursively as

$$R_{M,X,Y}(j) := |\mathrm{adj}_Y(j) \cap J^C(M, X, Y)| + \sum_{j' \in \mathrm{adj}_Y(j) \cap J(M, X, Y)} R_{M,X,Y}(j')$$

and where $\mathrm{adj}_Y(j) = \{j' \mid \mathrm{par}_Y(j') = j\}$. Per construction, $\bar\delta^{\mathrm{ins}}_M$ inserts all labels of $Y$ which are missing in $\bar\delta^{\mathrm{rep}}_M \bar\delta^{\mathrm{del}}_M(X)$. The definition of $r_Y(j)$ and $R_{M,X,Y}(j)$ ensures that label $y_j$ is inserted at the correct position and adopts all children which are mapped to labels in $X$ and are descendants of $\tilde y^j$ in $Y$.

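For $\bar\delta_M := \bar\delta^{\mathrm{rep}}_M \bar\delta^{\mathrm{del}}_M \bar\delta^{\mathrm{ins}}_M$ we thus obtain $\bar\delta_M(X) = Y$ and

$$\begin{aligned} c(\bar\delta_M, X) &= c(\bar\delta^{\mathrm{rep}}_M, X) + c(\bar\delta^{\mathrm{del}}_M, \bar\delta^{\mathrm{rep}}_M(X)) + c(\bar\delta^{\mathrm{ins}}_M, \bar\delta^{\mathrm{rep}}_M \bar\delta^{\mathrm{del}}_M(X)) \\ &= \sum_{(i, j) \in M} c(x_i, y_j) + \sum_{i \in I(M, X, Y)} c(x_i, -) + \sum_{j \in J(M, X, Y)} c(-, y_j) = c(M, X, Y). \end{aligned}$$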

Regarding the second claim, we perform an inductive proof. As base case, consider the empty edit script $\bar\delta = \epsilon$, which implies that $\bar\delta(X) = Y = X$. In that case, we define $M_{\bar\delta} := \{(i, i) \mid i \in \{1, \ldots, |X|\}\}$. Accordingly, we obtain $c(M_{\bar\delta}, X, Y) = c(M_{\bar\delta}, X, X) = \sum_{i=1}^{|X|} c(x_i, x_i)$. Because $c$ is self-equal, $c(x_i, x_i)$ is zero for all $i$, which in turn implies that $c(M_{\bar\delta}, X, Y) = 0 = c(\epsilon, X)$ as desired.

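Now, consider a non-empty edit script $\bar\delta = \delta_1 \ldots \delta_{T+1}$ over $\Delta_A$ such that $\bar\delta(X) = Y$, and let $\bar\delta' := \delta_1 \ldots \delta_T$ as well as $Y' := \bar\delta'(X)$. By induction, we know that there exists a tree mapping $M_{\bar\delta'}$ between $X$ and $Y'$ such that $c(M_{\bar\delta'}, X, Y') \le c(\bar\delta', X)$. Now, consider the final edit $\delta_{T+1}$. If $\delta_{T+1}(Y') = Y' = Y$, we define $M_{\bar\delta} := M_{\bar\delta'}$. Because $M_{\bar\delta'}$ is a valid tree mapping between $X$ and $Y'$, it is also a valid tree mapping between $X$ and $Y = Y'$. Further, for the cost we obtain $c(\bar\delta, X) = c(\bar\delta', X) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') = c(M_{\bar\delta}, X, Y)$.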

It remains to consider all cases in which $Y = \delta_{T+1}(Y') \neq Y'$. We distinguish the following cases.

$\delta_{T+1} = \mathrm{rep}_{j, y_j}$ for some $j \in \{1, \ldots, |Y|\}$: Then, we define $M_{\bar\delta} := M_{\bar\delta'}$. $M_{\bar\delta}$ is a tree mapping between $X$ and $Y$ because the ancestral structure of $Y'$ and $Y$ is exactly the same and $M_{\bar\delta'}$ was, per induction, a valid tree mapping between $X$ and $Y'$.

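Further, if there exists an $i$ such that $(i, j) \in M_{\bar\delta'}$, we obtain:

$$\begin{aligned} c(\bar\delta, X) &= c(\bar\delta', X) + c(y'_j, y_j) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') + c(y'_j, y_j) \\ &= c(M_{\bar\delta}, X, Y) - c(x_i, y_j) + c(x_i, y'_j) + c(y'_j, y_j) \\ &\overset{\text{triang.}}{\ge} c(M_{\bar\delta}, X, Y) - c(x_i, y_j) + c(x_i, y_j) = c(M_{\bar\delta}, X, Y) \end{aligned}$$

Conversely, if there is no $i$ such that $(i, j) \in M_{\bar\delta'}$, we obtain:

$$\begin{aligned} c(\bar\delta, X) &= c(\bar\delta', X) + c(y'_j, y_j) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') + c(y'_j, y_j) \\ &= c(M_{\bar\delta}, X, Y) - c(-, y_j) + c(-, y'_j) + c(y'_j, y_j) \\ &\overset{\text{triang.}}{\ge} c(M_{\bar\delta}, X, Y) - c(-, y_j) + c(-, y_j) = c(M_{\bar\delta}, X, Y) \end{aligned}$$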

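$\delta_{T+1} = \mathrm{del}_j$ for some $j \in \{1, \ldots, |Y'|\}$: Then, for all $j' \in \{1, \ldots, j-1\}$ it holds $\mathrm{anc}_Y(j') = \mathrm{anc}_{Y'}(j')$, and for all $j' \in \{j+1, \ldots, |Y'|\}$, which becomes node $j' - 1$ in $Y$, it holds $\mathrm{anc}_Y(j' - 1) = \{j'' \mid j'' \in \mathrm{anc}_{Y'}(j'), j'' < j\} \cup \{j'' - 1 \mid j'' \in \mathrm{anc}_{Y'}(j'), j'' > j\}$. Accordingly, we define $M_{\bar\delta} := \{(i, j') \in M_{\bar\delta'} \mid j' < j\} \cup \{(i, j' - 1) \mid (i, j') \in M_{\bar\delta'}, j' > j\}$ such that $M_{\bar\delta}$ is a valid tree mapping between $X$ and $Y$.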

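Further, if there exists an $i$ such that $(i, j) \in M_{\bar\delta'}$, we obtain:

$$\begin{aligned} c(\bar\delta, X) &= c(\bar\delta', X) + c(y'_j, -) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') + c(y'_j, -) \\ &= c(M_{\bar\delta}, X, Y) - c(x_i, -) + c(x_i, y'_j) + c(y'_j, -) \\ &\overset{\text{triang.}}{\ge} c(M_{\bar\delta}, X, Y) - c(x_i, -) + c(x_i, -) = c(M_{\bar\delta}, X, Y) \end{aligned}$$

Conversely, if there is no $i$ such that $(i, j) \in M_{\bar\delta'}$, we obtain:

$$\begin{aligned} c(\bar\delta, X) &= c(\bar\delta', X) + c(y'_j, -) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') + c(y'_j, -) \\ &= c(M_{\bar\delta}, X, Y) + c(-, y'_j) + c(y'_j, -) \\ &\overset{\text{triang.}}{\ge} c(M_{\bar\delta}, X, Y) + c(-, -) \overset{\text{self-eq.}}{=} c(M_{\bar\delta}, X, Y) \end{aligned}$$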

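$\delta_{T+1} = \mathrm{ins}_{\mathrm{par}_Y(j), y_j, l, r}$ for some $j \in \{1, \ldots, |Y|\}$ and some $l \le r$ within the admissible range of child positions: Then, for all $j' < j$ it holds: $\mathrm{anc}_Y(j') = \mathrm{anc}_{Y'}(j')$. For all $j'$ with $j \in \mathrm{anc}_Y(j')$ it holds: $\mathrm{anc}_Y(j') = \{j'' \in \mathrm{anc}_{Y'}(j'-1) \mid j'' < j\} \cup \{j\} \cup \{j''+1 \mid j'' \in \mathrm{anc}_{Y'}(j'-1), j'' \ge j\}$. Finally, for all $j' > j$ with $j \notin \mathrm{anc}_Y(j')$ it holds: $\mathrm{anc}_Y(j') = \{j'' \in \mathrm{anc}_{Y'}(j'-1) \mid j'' < j\} \cup \{j''+1 \mid j'' \in \mathrm{anc}_{Y'}(j'-1), j'' \ge j\}$. In other words, the ancestors of all $j' < j$ are maintained, while the ancestors of $j' > j$ in $Y$ are the (shifted) ancestors of $j' - 1$ in $Y'$, except for $j$, which may be added as an ancestor. Accordingly, we define $M_{\bar\delta} := \{(i, j') \in M_{\bar\delta'} \mid j' < j\} \cup \{(i, j'+1) \mid (i, j') \in M_{\bar\delta'}, j' \ge j\}$ such that $M_{\bar\delta}$ is a valid tree mapping between $X$ and $Y$; because the newly inserted node $j$ remains unmapped, the additional ancestor relations do not affect Equation A.10.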

Further, for the cost we obtain:

$$c(\bar\delta, X) = c(\bar\delta', X) + c(-, y_j) \overset{\text{I.H.}}{\ge} c(M_{\bar\delta'}, X, Y') + c(-, y_j) = c(M_{\bar\delta}, X, Y)$$

Therefore, in all cases, we obtain $c(M_{\bar\delta}, X, Y) \le c(\bar\delta, X)$, which concludes the proof by induction.

Finally, the third claim follows from the previous two. In particular, consider the following proof by contradiction. If $D_c(X, Y) < d_c(X, Y)$, then there exists a tree mapping $M$ between $X$ and $Y$ such that $c(M, X, Y) < d_c(X, Y)$. However, we have shown that we can construct an edit script $\bar\delta_M$ such that $\bar\delta_M(X) = Y$ and $c(\bar\delta_M, X) = c(M, X, Y)$. Therefore, $d_c(X, Y) \le c(\bar\delta_M, X) = c(M, X, Y) < d_c(X, Y)$, which is a contradiction. Conversely, if $d_c(X, Y) < D_c(X, Y)$, then there exists an edit script $\bar\delta$ such that $\bar\delta(X) = Y$ and $c(\bar\delta, X) < D_c(X, Y)$. However, we have shown that we can construct a tree mapping $M_{\bar\delta}$ between $X$ and $Y$ such that $c(M_{\bar\delta}, X, Y) \le c(\bar\delta, X)$. Therefore, $D_c(X, Y) \le c(M_{\bar\delta}, X, Y) \le c(\bar\delta, X) < D_c(X, Y)$, which is also a contradiction. This only leaves the option $D_c(X, Y) = d_c(X, Y)$, which concludes the proof.

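As an example for the first construction in Lemma A.3, consider the trees $\tilde x = a(b)$ and $\tilde y = c(d)$, as well as the tree mapping $M = \{(1, 2)\}$. $M$ would be translated into the following three edit scripts. First, $\bar\delta^{\mathrm{rep}}_M = \mathrm{rep}_{1, y_2} = \mathrm{rep}_{1, d}$; second, $\bar\delta^{\mathrm{del}}_M = \mathrm{del}_2$; and third, $\bar\delta^{\mathrm{ins}}_M = \mathrm{ins}_{\mathrm{par}_{\tilde y}(1), y_1, r_{\tilde y}(1), r_{\tilde y}(1) + R_{M, \tilde x, \tilde y}(1)} = \mathrm{ins}_{0, c, 1, 2}$. Note that the third construction works because

$$R_{M, \tilde x, \tilde y}(1) = |\mathrm{adj}_{\tilde y}(1) \cap J^C(M, \tilde x, \tilde y)| + \sum_{j' \in \mathrm{adj}_{\tilde y}(1) \cap J(M, \tilde x, \tilde y)} R_{M, \tilde x, \tilde y}(j') = |\{2\} \cap \{2\}| + 0 = 1$$

Accordingly, the tree mapping $M = \{(1, 2)\}$ would be translated into the edit script $\bar\delta_M = \mathrm{rep}_{1, d}\, \mathrm{del}_2\, \mathrm{ins}_{0, c, 1, 2}$, which does indeed result in $\bar\delta_M(\tilde x) = \mathrm{del}_2\, \mathrm{ins}_{0, c, 1, 2}(d(b)) = \mathrm{ins}_{0, c, 1, 2}(d) = c(d) = \tilde y$. The costs are $c(\bar\delta_M, \tilde x) = c(a, d) + c(b, -) + c(-, c) = c(M, \tilde x, \tilde y)$.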
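Both examples of Lemma A.3 can be replayed with a small edit-script interpreter. The following Python sketch stores forests as nested `[label, children]` lists and addresses nodes by 1-based pre-order index. The semantics of $\mathrm{ins}_{p,a,l,r}$ used here (the new node becomes a child of node $p$, with $p = 0$ for the root level, and adopts the former children at positions $l$ through $r - 1$) is our reading of the two examples, not a definition taken from the text; the unit cost and all function names are likewise hypothetical.

```python
import copy


def unit_cost(a, b):
    """Hypothetical example cost; None plays the role of the gap symbol '-'."""
    return 0 if a == b else 1


def locate(forest, index):
    """Return (sibling_list, position) of the node with 1-based pre-order index."""
    counter = 0

    def search(siblings):
        nonlocal counter
        for pos, node in enumerate(siblings):
            counter += 1
            if counter == index:
                return siblings, pos
            found = search(node[1])
            if found is not None:
                return found
        return None

    return search(forest)


def apply_edit(forest, edit, c):
    """Apply one edit in place and return its cost."""
    if edit[0] == "rep":                      # rep_{i,a}: relabel node i
        _, i, label = edit
        siblings, pos = locate(forest, i)
        old, siblings[pos][0] = siblings[pos][0], label
        return c(old, label)
    if edit[0] == "del":                      # del_i: children take node i's place
        _, i = edit
        siblings, pos = locate(forest, i)
        node = siblings[pos]
        siblings[pos:pos + 1] = node[1]
        return c(node[0], None)
    _, p, label, l, r = edit                  # ins_{p,a,l,r}
    if p == 0:
        children = forest
    else:
        siblings, pos = locate(forest, p)
        children = siblings[pos][1]
    # Assumed semantics: the new node adopts the children at positions l, ..., r-1.
    children[l - 1:r - 1] = [[label, children[l - 1:r - 1]]]
    return c(None, label)


def run_script(forest, script, c=unit_cost):
    """Apply an edit script to a copy of the forest; return (result, total cost)."""
    forest = copy.deepcopy(forest)
    total = sum(apply_edit(forest, edit, c) for edit in script)
    return forest, total


def show(forest):
    return ",".join(n[0] + (f"({show(n[1])})" if n[1] else "") for n in forest)


result, cost = run_script([["a", [["b", []]]]], [("rep", 1, "d"), ("del", 2), ("ins", 0, "c", 1, 2)])
print(show(result), cost)  # c(d) 3
```

For the second example, `run_script([["a", []]], [("rep", 1, "c"), ("ins", 0, "b", 1, 2), ("del", 2)])` yields `b` at unit cost 3, whereas the derived tree mapping $\emptyset$ costs $c(a, -) + c(-, b) = 2$, in line with the inequality of Lemma A.3.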

As an example for the second construction in Lemma A.3, consider the trees $\tilde x = a$ and $\tilde y = b$, as well as the edit script $\bar\delta = \mathrm{rep}_{1, c}\, \mathrm{ins}_{0, b, 1, 2}\, \mathrm{del}_2$. This edit script would be translated into a tree mapping as follows. First, we initialize our tree mapping as $M_\epsilon = \{(1, 1)\}$. Next, consider the first edit, $\delta_1 = \mathrm{rep}_{1, c}$, which transforms $\tilde x$ into $\mathrm{rep}_{1, c}(a) = c$. The corresponding tree mapping remains $M_{\mathrm{rep}_{1, c}} = \{(1, 1)\}$. Next, consider the second edit, $\delta_2 = \mathrm{ins}_{0, b, 1, 2}$, which transforms $c$ into $\mathrm{ins}_{0, b, 1, 2}(c) = b(c)$. The corresponding tree mapping would thus be $M_{\mathrm{rep}_{1, c}\, \mathrm{ins}_{0, b, 1, 2}} = \{(1, 2)\}$. Finally, consider the third edit, $\delta_3 = \mathrm{del}_2$, which transforms $b(c)$ into $\mathrm{del}_2(b(c)) = b = \tilde y$. The corresponding tree mapping would thus become $M_{\bar\delta} = \emptyset$. For the costs we obtain

$$c(M_{\bar\delta}, \tilde x, \tilde y) = c(a, -) + c(-, b) \overset{\text{triang.}}{\le} c(a, c) + c(-, b) + c(c, -) = c(\bar\delta, \tilde x)$$

By virtue of Lemma A.3, we can compute the cheapest tree mapping between two forests instead of the cheapest edit script which transforms one forest into the other, as long as our cost function fulfills the triangle inequality and is self-equal. This already simplifies our problem significantly because there is only a finite number of possible valid tree mappings between two input forests, while there is an infinite number of edit scripts. However, the number of tree mappings is in $O(2^{|\tilde x| \cdot |\tilde y|})$, such that an exhaustive enumeration is infeasible. Instead, Zhang and Shasha (1989) propose a dynamic programming scheme which relies on decomposing the edit distance between two input forests into edit distances between subforests. In particular, we define subforests as follows.
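The exponential cost of exhaustive enumeration is easy to see in a naive implementation. The following Python sketch computes $D_c$ directly from Equation A.12 by enumerating every subset of $\{1, \ldots, |X|\} \times \{1, \ldots, |Y|\}$, i.e. $2^{|X| \cdot |Y|}$ candidates, and keeping the cheapest one that satisfies Equations A.9 and A.10. It is only usable for a handful of nodes; the forest encoding, the unit cost, and the name `brute_force_distance` are hypothetical.

```python
from itertools import combinations, product


def preorder(forest):
    """1-based pre-order labels and parents (0 = root); index 0 is unused."""
    labels, parents = [None], [None]

    def visit(tree, parent):
        label, children = tree
        idx = len(labels)
        labels.append(label)
        parents.append(parent)
        for child in children:
            visit(child, idx)

    for tree in forest:
        visit(tree, 0)
    return labels, parents


def ancestors(parents, i):
    result, p = set(), parents[i]
    while p != 0:
        result.add(p)
        p = parents[p]
    return result


def unit_cost(a, b):
    return 0 if a == b else 1


def brute_force_distance(X, Y, c=unit_cost):
    """D_c(X, Y) by enumerating all subsets of {1..|X|} x {1..|Y|} (exponential!)."""
    lx, px = preorder(X)
    ly, py = preorder(Y)
    ax = {i: ancestors(px, i) for i in range(1, len(lx))}
    ay = {j: ancestors(py, j) for j in range(1, len(ly))}
    pairs = list(product(range(1, len(lx)), range(1, len(ly))))
    best = float("inf")
    for size in range(len(pairs) + 1):
        for M in combinations(pairs, size):
            valid = all(((i >= i2) == (j >= j2)) and ((i in ax[i2]) == (j in ay[j2]))
                        for (i, j), (i2, j2) in product(M, repeat=2))
            if not valid:
                continue
            left = set(range(1, len(lx))) - {i for i, _ in M}
            right = set(range(1, len(ly))) - {j for _, j in M}
            cost = (sum(c(lx[i], ly[j]) for i, j in M)
                    + sum(c(lx[i], None) for i in left)
                    + sum(c(None, ly[j]) for j in right))
            best = min(best, cost)
    return best


print(brute_force_distance([("a", [("b", [])])], [("c", [("d", [])])]))  # 2
```

Already for two trees with 5 and 4 nodes, this loop visits $2^{20}$ candidate sets, which motivates the subforest decomposition that follows.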

Definition A.5 (Subforest). Let $A$ be an alphabet, let $X$ be a forest over $A$, and let $k \in \mathbb{N}$, $i \in \mathbb{Z}$. Then, we define the subforest $X[k, i]$ from $k$ to $i$ recursively as follows. If $X = \epsilon$, then $X[k, i] := \epsilon$. Otherwise, let $X = x(X_1), X'$ for some $x \in A$ and some forests $X_1, X' \in T(A)^*$. In that case, we define:

$$X[k, i] := \begin{cases} \epsilon & \text{if } k > i \lor k > |X| \\ (X_1, X')[k-1, i-1] & \text{if } 1 < k \le i \\ x(X_1[1, i-1]), X'[1, i - |X_1| - 1] & \text{if } 1 = k \le i \end{cases} \tag{A.13}$$

For example, the subforest $(a, b, c)[2, 3]$ would be $b, c$. The subforest $\tilde x[2, 4]$ for $\tilde x = a(b(c, d), e)$ would be $b(c, d)$. In general, subforests maintain the structure of the input forest, as the following lemma demonstrates.
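Equation A.13 translates almost literally into a recursive function. The Python sketch below assumes a hypothetical encoding of forests as tuples of `(label, children)` pairs and $k \ge 1$; the helper names are illustrative. It reproduces the two examples above.

```python
def size(forest):
    """Number of nodes |X| of a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


def subforest(X, k, i):
    """X[k, i] following the three cases of Equation A.13 (assumes k >= 1)."""
    if not X:
        return ()
    (x, X1), X_rest = X[0], X[1:]
    if k > i or k > size(X):                       # first case
        return ()
    if 1 < k <= i:                                 # second case: drop the first root
        return subforest(tuple(X1) + tuple(X_rest), k - 1, i - 1)
    # third case, 1 = k <= i: keep the first root and cut its children and siblings
    return ((x, subforest(X1, 1, i - 1)),) + subforest(X_rest, 1, i - size(X1) - 1)


def show(forest):
    return ",".join(x + (f"({show(ch)})" if ch else "") for x, ch in forest)


def leaf(label):
    return (label, ())


x = (("a", (("b", (leaf("c"), leaf("d"))), leaf("e"))),)     # a(b(c, d), e)
print(show(subforest((leaf("a"), leaf("b"), leaf("c")), 2, 3)))  # b,c
print(show(subforest(x, 2, 4)))                                  # b(c,d)
```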

Lemma A.4. Let $A$ be an alphabet, and let $X \neq \epsilon$ be a forest over $A$. Then, for any $i \in \{1, \ldots, |X|\}$ it holds: $X[i, \mathrm{rl}_X(i)] = \tilde x^i$, that is, the subforest from $i$ to $\mathrm{rl}_X(i)$ is the $i$th subtree according to pre-order.

Proof. Note that $X \neq \epsilon$ and $i \le \mathrm{rl}_X(i) \le |X|$, such that the first case of Equation A.13 does not apply. Now, let $X = x(X_1), X'$ for some $x \in A$ and some forests $X_1, X' \in T(A)^*$, and consider the third case of Equation A.13, that is, $i = 1$. In that case, we obtain $X[1, \mathrm{rl}_X(1)] = X[1, |X_1| + 1] = x(X_1[1, |X_1|]), X'[1, 0] = x(X_1[1, |X_1|])$. Recursive application of Equation A.13 to $X_1$ yields $X_1[1, |X_1|] = X_1$, such that $X[1, \mathrm{rl}_X(1)] = x(X_1) = \tilde x^1$.

Now, consider case 2 of Equation A.13, that is, $i > 1$, and distinguish the following subcases.

If $\mathrm{par}_X(i) = 0$, let $X = \tilde x_1, \ldots, \tilde x_R$ and let $r \in \{1, \ldots, R\}$ be the index such that $\tilde x^i = \tilde x_r$. Accordingly, $i = \sum_{l=1}^{r-1} |\tilde x_l| + 1$. Further, let $\tilde x^i = \tilde x_r = x_i(X^i)$ for some forest $X^i$. Now, recursive application of case 2 of Equation A.13 yields

$$X[i, \mathrm{rl}_X(i)] = (X_1, X')[i-1, \mathrm{rl}_X(i) - 1] = \ldots = (\tilde x_2, \ldots, \tilde x_R)[i - |\tilde x_1|, \mathrm{rl}_X(i) - |\tilde x_1|] = \ldots = (\tilde x_r, \ldots, \tilde x_R)[1, |\tilde x_r|].$$

At this point, case 3 of Equation A.13 applies and yields $(\tilde x_r, \ldots, \tilde x_R)[1, |\tilde x_r|] = x_i(X^i[1, |X^i|]), (\tilde x_{r+1}, \ldots, \tilde x_R)[1, 0] = x_i(X^i[1, |X^i|]) = \ldots = \tilde x^i$, which concludes the proof.

Using the concept of subforests, we can now go on to establish the Bellman equations which will form the basis for the dynamic programming Algorithm 2.1.

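Lemma A.5. Let $A$ be an alphabet and let $X, Y$ be non-empty forests over $A$. Further, let $c$ be a cost function over $A$.

Then, for any $i \in \{1, \ldots, |X|+1\}$, $j \in \{1, \ldots, |Y|+1\}$, $k \in \mathrm{anc}_X(i) \cup \{i\}$, and $l \in \mathrm{anc}_Y(j) \cup \{j\}$ it holds:

$$D_c(\epsilon, \epsilon) = 0 \tag{A.14}$$

$$D_c(X[i, \mathrm{rl}_X(k)], \epsilon) = c(x_i, -) + D_c(X[i+1, \mathrm{rl}_X(k)], \epsilon) \tag{A.15}$$

$$D_c(\epsilon, Y[j, \mathrm{rl}_Y(l)]) = c(-, y_j) + D_c(\epsilon, Y[j+1, \mathrm{rl}_Y(l)]) \tag{A.16}$$

$$D_c(X[i, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)]) = \min \left\{ \begin{array}{l} c(x_i, -) + D_c(X[i+1, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)]), \\ c(-, y_j) + D_c(X[i, \mathrm{rl}_X(k)], Y[j+1, \mathrm{rl}_Y(l)]), \\ c(x_i, y_j) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]) \\ \quad + D_c(X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(k)], Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(l)]) \end{array} \right\} \tag{A.17}$$

$$D_c(X[i, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)]) = \min \left\{ \begin{array}{l} c(x_i, -) + D_c(X[i+1, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)]), \\ c(-, y_j) + D_c(X[i, \mathrm{rl}_X(k)], Y[j+1, \mathrm{rl}_Y(l)]), \\ D_c(\tilde x^i, \tilde y^j) + D_c(X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(k)], Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(l)]) \end{array} \right\} \tag{A.18}$$

$$D_c(\tilde x^i, \tilde y^j) = \min \left\{ \begin{array}{l} c(x_i, -) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j, \mathrm{rl}_Y(j)]), \\ c(-, y_j) + D_c(X[i, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]), \\ c(x_i, y_j) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]) \end{array} \right\} \tag{A.19}$$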

Proof. First, consider Equations A.14, A.15, and A.16. In all these cases, only the empty tree mapping $M = \emptyset$ is possible because at least one input forest is empty. The cost of the empty tree mapping for any two forests $X$ and $Y$ is

$$c(\emptyset, X, Y) = \sum_{i=1}^{|X|} c(x_i, -) + \sum_{j=1}^{|Y|} c(-, y_j)$$

This cost decomposes as desired; in particular:

$$c(\emptyset, \epsilon, \epsilon) = 0,$$

$$c(\emptyset, X[i, \mathrm{rl}_X(k)], \epsilon) = c(x_i, -) + c(\emptyset, X[i+1, \mathrm{rl}_X(k)], \epsilon), \text{ and}$$

$$c(\emptyset, \epsilon, Y[j, \mathrm{rl}_Y(l)]) = c(-, y_j) + c(\emptyset, \epsilon, Y[j+1, \mathrm{rl}_Y(l)])$$

Next, consider Equations A.17 and A.18. In particular, let $M$ be a tree mapping between the subforests $X[i, \mathrm{rl}_X(k)]$ and $Y[j, \mathrm{rl}_Y(l)]$ such that $c(M, X[i, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)]) = D_c(X[i, \mathrm{rl}_X(k)], Y[j, \mathrm{rl}_Y(l)])$. To avoid symbol clutter, we will use the shorthands $X_i := X[i, \mathrm{rl}_X(k)]$, $X_{i+1} := X[i+1, \mathrm{rl}_X(k)]$, $Y_j := Y[j, \mathrm{rl}_Y(l)]$, and $Y_{j+1} := Y[j+1, \mathrm{rl}_Y(l)]$. Note that index 1 within $X_i$ and $Y_j$ refers to the nodes $i$ and $j$, respectively. Now, one of the following three cases has to apply:

$1 \in I(M, X_i, Y_j)$: In this case, $M' := \{(i'-1, j') \mid (i', j') \in M\}$ is a tree mapping between $X_{i+1}$ and $Y_j$. Further, it holds $c(M', X_{i+1}, Y_j) = D_c(X_{i+1}, Y_j)$. Otherwise, there would exist a tree mapping $\tilde M'$ between $X_{i+1}$ and $Y_j$ such that $c(\tilde M', X_{i+1}, Y_j) < c(M', X_{i+1}, Y_j)$. In that case, consider $\tilde M := \{(i'+1, j') \mid (i', j') \in \tilde M'\}$, which is a tree mapping between $X_i$ and $Y_j$ such that:

$$\begin{aligned} D_c(X_i, Y_j) &\le c(\tilde M, X_i, Y_j) = c(x_i, -) + c(\tilde M', X_{i+1}, Y_j) \\ &< c(x_i, -) + c(M', X_{i+1}, Y_j) = c(M, X_i, Y_j) = D_c(X_i, Y_j), \end{aligned}$$

which is a contradiction. Therefore, it holds:

$$D_c(X_i, Y_j) = c(M, X_i, Y_j) = c(x_i, -) + c(M', X_{i+1}, Y_j) = c(x_i, -) + D_c(X_{i+1}, Y_j) \tag{A.20}$$

$1 \in J(M, X_i, Y_j)$: In this case, $M' := \{(i', j'-1) \mid (i', j') \in M\}$ is a tree mapping between $X_i$ and $Y_{j+1}$. Further, it holds $c(M', X_i, Y_{j+1}) = D_c(X_i, Y_{j+1})$. Otherwise, there would exist a tree mapping $\tilde M'$ between $X_i$ and $Y_{j+1}$ such that $c(\tilde M', X_i, Y_{j+1}) < c(M', X_i, Y_{j+1})$. In that case, consider $\tilde M := \{(i', j'+1) \mid (i', j') \in \tilde M'\}$, which is a tree mapping between $X_i$ and $Y_j$ such that:

$$\begin{aligned} D_c(X_i, Y_j) &\le c(\tilde M, X_i, Y_j) = c(-, y_j) + c(\tilde M', X_i, Y_{j+1}) \\ &< c(-, y_j) + c(M', X_i, Y_{j+1}) = c(M, X_i, Y_j) = D_c(X_i, Y_j), \end{aligned}$$

which is a contradiction. Therefore, it holds:

$$D_c(X_i, Y_j) = c(M, X_i, Y_j) = c(-, y_j) + c(M', X_i, Y_{j+1}) = c(-, y_j) + D_c(X_i, Y_{j+1}) \tag{A.21}$$

$1 \in I^C(M, X_i, Y_j) \land 1 \in J^C(M, X_i, Y_j)$: In this case, we first show that $(1, 1) \in M$. If that were not the case, there would exist an $i' \in \{1, \ldots, |X_i|\}$ and a $j' \in \{1, \ldots, |Y_j|\}$ such that $(1, j') \in M$, $(i', 1) \in M$, and $i' \neq 1$ or $j' \neq 1$. If $i' > 1$, Equation 2.21 implies that $j' < 1$, which is a contradiction. Conversely, if $j' > 1$, Equation 2.21 implies that $i' < 1$, which is a contradiction. Therefore, $i' = j' = 1$ and, thus, $(1, 1) \in M$.

Now, Equation 2.22 implies that for all $(i', j') \in M$ it must hold: $1 \in \mathrm{anc}_{X_i}(i') \iff 1 \in \mathrm{anc}_{Y_j}(j')$. In conjunction with Equations A.6 and A.7, we obtain $1 \le i' \le |\tilde x^i| \iff 1 \le j' \le |\tilde y^j|$. Accordingly, $M$ must be decomposable as $M = M_1 \cup M_2$, where for all $(i', j') \in M_1$ it holds $i' \le |\tilde x^i|$ and $j' \le |\tilde y^j|$, and for all $(i', j') \in M_2$ it holds $i' > |\tilde x^i|$ and $j' > |\tilde y^j|$. This, in turn, implies that $M_1$ is a tree mapping between $X[i, \mathrm{rl}_X(i)] \overset{\text{Lemma A.4}}{=} \tilde x^i$ and $Y[j, \mathrm{rl}_Y(j)] \overset{\text{Lemma A.4}}{=} \tilde y^j$, and that $M_2' := \{(i' - |\tilde x^i|, j' - |\tilde y^j|) \mid (i', j') \in M_2\}$ is a tree mapping between $X' := X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(k)]$ and $Y' := Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(l)]$.

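Further, it holds $c(M_1, \tilde x^i, \tilde y^j) = D_c(\tilde x^i, \tilde y^j)$. Otherwise, there would exist a tree mapping $\tilde M_1$ between $\tilde x^i$ and $\tilde y^j$ such that $c(\tilde M_1, \tilde x^i, \tilde y^j) < c(M_1, \tilde x^i, \tilde y^j)$. In that case, consider $\tilde M := \tilde M_1 \cup M_2$, which is a tree mapping between $X_i$ and $Y_j$ such that:

$$\begin{aligned} D_c(X_i, Y_j) &\le c(\tilde M, X_i, Y_j) = c(\tilde M_1, \tilde x^i, \tilde y^j) + c(M_2', X', Y') \\ &< c(M_1, \tilde x^i, \tilde y^j) + c(M_2', X', Y') = c(M, X_i, Y_j) = D_c(X_i, Y_j), \end{aligned}$$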

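which is a contradiction. Also, it holds $c(M_2', X', Y') = D_c(X', Y')$. Otherwise, there would exist a tree mapping $\tilde M_2'$ between $X'$ and $Y'$ such that $c(\tilde M_2', X', Y') < c(M_2', X', Y')$. In that case, consider $\tilde M := M_1 \cup \{(i' + |\tilde x^i|, j' + |\tilde y^j|) \mid (i', j') \in \tilde M_2'\}$, which is a tree mapping between $X_i$ and $Y_j$ such that:

$$\begin{aligned} D_c(X_i, Y_j) &\le c(\tilde M, X_i, Y_j) = c(M_1, \tilde x^i, \tilde y^j) + c(\tilde M_2', X', Y') \\ &< c(M_1, \tilde x^i, \tilde y^j) + c(M_2', X', Y') = c(M, X_i, Y_j) = D_c(X_i, Y_j), \end{aligned}$$

which is a contradiction. Therefore, we obtain:

$$\begin{aligned} D_c(X_i, Y_j) &= c(M, X_i, Y_j) = c(M_1, \tilde x^i, \tilde y^j) + c(M_2', X', Y') \\ &= D_c(\tilde x^i, \tilde y^j) + D_c(X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(k)], Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(l)]) \end{aligned} \tag{A.22}$$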

Finally, consider the term $D_c(\tilde x^i, \tilde y^j)$. Because $(1, 1) \in M_1$, it follows that $M_1' := \{(i'-1, j'-1) \mid (i', j') \in M_1 \setminus \{(1, 1)\}\}$ is a tree mapping between $X'_{i+1} := X[i+1, \mathrm{rl}_X(i)]$ and $Y'_{j+1} := Y[j+1, \mathrm{rl}_Y(j)]$. Further, it holds $c(M_1', X'_{i+1}, Y'_{j+1}) = D_c(X'_{i+1}, Y'_{j+1})$. If that were not the case, there would exist a tree mapping $\tilde M_1'$ between $X'_{i+1}$ and $Y'_{j+1}$ such that $c(\tilde M_1', X'_{i+1}, Y'_{j+1}) < c(M_1', X'_{i+1}, Y'_{j+1})$. In that case, consider $\tilde M_1 := \{(1, 1)\} \cup \{(i'+1, j'+1) \mid (i', j') \in \tilde M_1'\}$, which is a tree mapping between $\tilde x^i$ and $\tilde y^j$ such that:

$$\begin{aligned} D_c(\tilde x^i, \tilde y^j) &\le c(\tilde M_1, \tilde x^i, \tilde y^j) = c(x_i, y_j) + c(\tilde M_1', X'_{i+1}, Y'_{j+1}) \\ &< c(x_i, y_j) + c(M_1', X'_{i+1}, Y'_{j+1}) = c(M_1, \tilde x^i, \tilde y^j) = D_c(\tilde x^i, \tilde y^j), \end{aligned}$$

which is a contradiction. Therefore, it holds:

$$\begin{aligned} D_c(\tilde x^i, \tilde y^j) &= c(M_1, \tilde x^i, \tilde y^j) = c(x_i, y_j) + c(M_1', X'_{i+1}, Y'_{j+1}) \\ &= c(x_i, y_j) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]) \end{aligned} \tag{A.23}$$

Note that these three cases are exhaustive, that is, one of Equations A.20, A.21, or A.22 has to apply. Further, the cheapest of these three options has to apply; otherwise, $c(M, X_i, Y_j) > D_c(X_i, Y_j)$, which would be a contradiction. The minimum of Equations A.20, A.21, and A.22 yields Equation A.18. If we then plug Equation A.23 into Equation A.18, we obtain Equation A.17.

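Finally, consider Equation A.19. We obtain this equation by setting $k = i$ and $l = j$ in Equation A.17, thus yielding:

$$D_c(\tilde x^i, \tilde y^j) \overset{\text{Lemma A.4}}{=} D_c(X[i, \mathrm{rl}_X(i)], Y[j, \mathrm{rl}_Y(j)]) = \min \left\{ \begin{array}{l} c(x_i, -) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j, \mathrm{rl}_Y(j)]), \\ c(-, y_j) + D_c(X[i, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]), \\ c(x_i, y_j) + D_c(X[i+1, \mathrm{rl}_X(i)], Y[j+1, \mathrm{rl}_Y(j)]) \\ \quad + D_c(X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(i)], Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(j)]) \end{array} \right\}$$

Note that $X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(i)] = \epsilon$ and $Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(j)] = \epsilon$. Therefore, $D_c(X[\mathrm{rl}_X(i)+1, \mathrm{rl}_X(i)], Y[\mathrm{rl}_Y(j)+1, \mathrm{rl}_Y(j)]) = D_c(\epsilon, \epsilon) \overset{\text{Eq. A.14}}{=} 0$, which in turn yields Equation A.19.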

An example for the decompositions in Equations A.18 and A.19 is shown in Figure A.1.
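The Bellman equations of Lemma A.5 can be transcribed into a memoized recursion over index ranges. In the following Python sketch (an illustration under stated assumptions, not Algorithm 2.1 itself), `D(i, ex, j, ey)` stands for $D_c(X[i, ex], Y[j, ey])$, where `ex` and `ey` play the roles of $\mathrm{rl}_X(k)$ and $\mathrm{rl}_Y(l)$, and a range with `i > ex` denotes the empty forest. The top-level call covers a whole forest $X[1, |X|]$, which is slightly more general than the index ranges in the lemma's statement. The forest encoding, the unit cost, and all names are hypothetical.

```python
from functools import lru_cache


def preorder(forest):
    """1-based pre-order labels and outermost right leaves rl; index 0 is unused."""
    labels, rl = [None], [None]

    def visit(tree):
        label, children = tree
        idx = len(labels)
        labels.append(label)
        rl.append(None)
        for child in children:
            visit(child)
        rl[idx] = len(labels) - 1

    for tree in forest:
        visit(tree)
    return labels, rl


def unit_cost(a, b):
    """Hypothetical cost: 0 for equal symbols, 1 otherwise; None is the gap '-'."""
    return 0 if a == b else 1


def forest_distance(X, Y, c=unit_cost):
    lx, rlx = preorder(X)
    ly, rly = preorder(Y)

    @lru_cache(maxsize=None)
    def D(i, ex, j, ey):
        """D_c(X[i, ex], Y[j, ey]); a range with i > ex is the empty forest."""
        if i > ex and j > ey:
            return 0                                          # A.14
        if j > ey:
            return c(lx[i], None) + D(i + 1, ex, j, ey)       # A.15
        if i > ex:
            return c(None, ly[j]) + D(i, ex, j + 1, ey)       # A.16
        return min(                                           # A.17
            c(lx[i], None) + D(i + 1, ex, j, ey),
            c(None, ly[j]) + D(i, ex, j + 1, ey),
            c(lx[i], ly[j]) + D(i + 1, rlx[i], j + 1, rly[j])
            + D(rlx[i] + 1, ex, rly[j] + 1, ey),
        )

    return D(1, len(lx) - 1, 1, len(ly) - 1)


print(forest_distance([("a", [("b", [])])], [("c", [("d", [])])]))  # 2
# example subforests of Figure A.1: X_i = b(c, d), e and Y_j = f(g)
print(forest_distance([("b", [("c", []), ("d", [])]), ("e", [])], [("f", [("g", [])])]))  # 4
```

Without memoization, the same recursion would revisit identical subproblems exponentially often; caching on the four indices bounds the number of distinct calls by $O(|X|^2 |Y|^2)$.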

Using these decompositions, we can finally prove the invariants of Algorithm 2.1, which then imply the correctness of the algorithm.

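Lemma A.6. Let $A$ be an alphabet, let $\tilde x$ and $\tilde y$ be trees over $A$, and let $c$ be a cost function over $A$. Then, after each completion of lines 6-26 in Algorithm 2.1 for the input $\tilde x$, $\tilde y$, and $c$, it holds for all $i \in \{k, \ldots, \mathrm{rl}_{\tilde x}(k)\}$ and all $j \in \{l, \ldots, \mathrm{rl}_{\tilde y}(l)\}$:

$$D_{i,j} = D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)]) \quad \text{and} \tag{A.24}$$

$$d_{i,j} = D_c(\tilde x^i, \tilde y^j) \tag{A.25}$$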

[Figure A.1: An illustration of the decompositions in Equations A.18 (top) and A.19 (bottom) for the example subforests $X_i = b(c, d), e$ and $Y_j = f(g)$; the branches are labeled "delete b", "insert f", and "replace b with f".]

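Proof. We perform an inductive argument over $i$ and $j$ in descending order. First, consider the base cases. If $i = \mathrm{rl}_{\tilde x}(k) + 1$ and $j = \mathrm{rl}_{\tilde y}(l) + 1$, we obtain $D_c(\tilde x[\mathrm{rl}_{\tilde x}(k)+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[\mathrm{rl}_{\tilde y}(l)+1, \mathrm{rl}_{\tilde y}(l)]) = D_c(\epsilon, \epsilon) \overset{\text{Eq. A.14}}{=} 0$, which is correctly computed in line 6.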

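Further, if $i \le \mathrm{rl}_{\tilde x}(k)$ and $j = \mathrm{rl}_{\tilde y}(l) + 1$, we obtain $D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[\mathrm{rl}_{\tilde y}(l)+1, \mathrm{rl}_{\tilde y}(l)]) = D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \epsilon) \overset{\text{Eq. A.15}}{=} c(x_i, -) + D_c(\tilde x[i+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[\mathrm{rl}_{\tilde y}(l)+1, \mathrm{rl}_{\tilde y}(l)])$, which is correctly computed in lines 7-9 for all $i \in \{k, \ldots, \mathrm{rl}_{\tilde x}(k)\}$.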

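Similarly, if $j \le \mathrm{rl}_{\tilde y}(l)$ and $i = \mathrm{rl}_{\tilde x}(k) + 1$, we obtain $D_c(\tilde x[\mathrm{rl}_{\tilde x}(k)+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)]) = D_c(\epsilon, \tilde y[j, \mathrm{rl}_{\tilde y}(l)]) \overset{\text{Eq. A.16}}{=} c(-, y_j) + D_c(\tilde x[\mathrm{rl}_{\tilde x}(k)+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[j+1, \mathrm{rl}_{\tilde y}(l)])$, which is correctly computed in lines 10-12 for all $j \in \{l, \ldots, \mathrm{rl}_{\tilde y}(l)\}$.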

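Now, consider the case $i \le \mathrm{rl}_{\tilde x}(k)$ and $j \le \mathrm{rl}_{\tilde y}(l)$. Per induction, we already know that

$$\begin{aligned} D_{i+1,j} &= D_c(\tilde x[i+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)]), \\ D_{i,j+1} &= D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j+1, \mathrm{rl}_{\tilde y}(l)]), \\ D_{i+1,j+1} &= D_c(\tilde x[i+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[j+1, \mathrm{rl}_{\tilde y}(l)]), \text{ and} \\ D_{\mathrm{rl}_{\tilde x}(i)+1, \mathrm{rl}_{\tilde y}(j)+1} &= D_c(\tilde x[\mathrm{rl}_{\tilde x}(i)+1, \mathrm{rl}_{\tilde x}(k)], \tilde y[\mathrm{rl}_{\tilde y}(j)+1, \mathrm{rl}_{\tilde y}(l)]). \end{aligned}$$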

Now, distinguish the following cases.

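If $\mathrm{rl}_{\tilde x}(i) = \mathrm{rl}_{\tilde x}(k)$ and $\mathrm{rl}_{\tilde y}(j) = \mathrm{rl}_{\tilde y}(l)$, we obtain $\tilde x[i, \mathrm{rl}_{\tilde x}(k)] = \tilde x[i, \mathrm{rl}_{\tilde x}(i)] \overset{\text{Lemma A.4}}{=} \tilde x^i$ and $\tilde y[j, \mathrm{rl}_{\tilde y}(l)] = \tilde y[j, \mathrm{rl}_{\tilde y}(j)] \overset{\text{Lemma A.4}}{=} \tilde y^j$, such that $D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)]) = D_c(\tilde x^i, \tilde y^j)$, which can be computed according to Equation A.19. Therefore, lines 16-18 of Algorithm 2.1 ensure that $D_{i,j} = D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)])$. Further, because $D_{i,j}$ is now equal to $D_c(\tilde x^i, \tilde y^j)$, line 19 is correct as well.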

If $\mathrm{rl}_{\tilde x}(i) \neq \mathrm{rl}_{\tilde x}(k)$ or $\mathrm{rl}_{\tilde y}(j) \neq \mathrm{rl}_{\tilde y}(l)$, the decomposition of $D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)])$ according to Equation A.18 applies. Accordingly, lines 21-23 of Algorithm 2.1 ensure $D_{i,j} = D_c(\tilde x[i, \mathrm{rl}_{\tilde x}(k)], \tilde y[j, \mathrm{rl}_{\tilde y}(l)])$, under the condition that $d_{i,j} = D_c(\tilde x^i, \tilde y^j)$. We know that this condition holds if we have executed lines 6-26 before with the keyroots $k_{\tilde x}(i)$ and $k_{\tilde y}(j)$. Because the loops in lines 4-5 iterate the keyroots in descending order, it remains to show that $k < k_{\tilde x}(i)$ and $l \le k_{\tilde y}(j)$, or $k \le k_{\tilde x}(i)$ and $l < k_{\tilde y}(j)$.

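First, consider the case $\mathrm{rl}_{\tilde x}(i) \neq \mathrm{rl}_{\tilde x}(k)$. In that case, $k \neq k_{\tilde x}(i)$ and $k \neq i$; otherwise, $\mathrm{rl}_{\tilde x}(k) = \mathrm{rl}_{\tilde x}(i) = \mathrm{rl}_{\tilde x}(k_{\tilde x}(i))$, which is a contradiction. Further, it must hold $k < i \le \mathrm{rl}_{\tilde x}(k)$; otherwise, $i$ would not be accessed in the loop in line 13. In turn, Equation A.7 implies that $k \in \mathrm{anc}_{\tilde x}(i)$. Further, due to the definition of keyroots, $k_{\tilde x}(i) \le i \le \mathrm{rl}_{\tilde x}(i) = \mathrm{rl}_{\tilde x}(k_{\tilde x}(i))$. Now, if $k_{\tilde x}(i) = i$, we obtain $k < i = k_{\tilde x}(i)$ as desired. Otherwise, we obtain $k_{\tilde x}(i) < i \le \mathrm{rl}_{\tilde x}(k_{\tilde x}(i))$, such that Equation A.7 implies that $k_{\tilde x}(i) \in \mathrm{anc}_{\tilde x}(i)$. Now, assume that $k_{\tilde x}(i) < k$. In that case, Equation A.8 implies $k_{\tilde x}(i) \in \mathrm{anc}_{\tilde x}(k)$. Consequently, Equation A.6 tells us that $\mathrm{rl}_{\tilde x}(k) \le \mathrm{rl}_{\tilde x}(k_{\tilde x}(i))$. However, due to $k \in \mathrm{anc}_{\tilde x}(i)$, Equation A.6 also tells us that $\mathrm{rl}_{\tilde x}(k) \ge \mathrm{rl}_{\tilde x}(i) = \mathrm{rl}_{\tilde x}(k_{\tilde x}(i))$, such that $\mathrm{rl}_{\tilde x}(k) = \mathrm{rl}_{\tilde x}(k_{\tilde x}(i)) = \mathrm{rl}_{\tilde x}(i)$, which is a contradiction. Therefore, we can conclude that $k < k_{\tilde x}(i)$. It remains to show that $l \le k_{\tilde y}(j)$. If $\mathrm{rl}_{\tilde y}(l) = \mathrm{rl}_{\tilde y}(j)$, then $l = k_{\tilde y}(j)$, because the minimum is unique. Otherwise, $\mathrm{rl}_{\tilde y}(l) \neq \mathrm{rl}_{\tilde y}(j)$, which is the case we consider next.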

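If $\mathrm{rl}_{\tilde y}(j) \neq \mathrm{rl}_{\tilde y}(l)$, we know that $l \neq k_{\tilde y}(j)$ and $l \neq j$; otherwise, $\mathrm{rl}_{\tilde y}(l) = \mathrm{rl}_{\tilde y}(j) = \mathrm{rl}_{\tilde y}(k_{\tilde y}(j))$, which is a contradiction. Further, it must hold $l < j \le \mathrm{rl}_{\tilde y}(l)$; otherwise, $j$ would not be accessed in the loop in line 14. In turn, Equation A.7 implies that $l \in \mathrm{anc}_{\tilde y}(j)$. Further, due to the definition of keyroots, $k_{\tilde y}(j) \le j \le \mathrm{rl}_{\tilde y}(j) = \mathrm{rl}_{\tilde y}(k_{\tilde y}(j))$. Now, if $k_{\tilde y}(j) = j$, we obtain $l < j = k_{\tilde y}(j)$ as desired. Otherwise, we obtain $k_{\tilde y}(j) < j \le \mathrm{rl}_{\tilde y}(k_{\tilde y}(j))$, such that Equation A.7 implies that $k_{\tilde y}(j) \in \mathrm{anc}_{\tilde y}(j)$. Now, assume that $k_{\tilde y}(j) < l$. In that case, Equation A.8 implies $k_{\tilde y}(j) \in \mathrm{anc}_{\tilde y}(l)$. Consequently, Equation A.6 tells us that $\mathrm{rl}_{\tilde y}(l) \le \mathrm{rl}_{\tilde y}(k_{\tilde y}(j))$. However, due to $l \in \mathrm{anc}_{\tilde y}(j)$, Equation A.6 also tells us that $\mathrm{rl}_{\tilde y}(l) \ge \mathrm{rl}_{\tilde y}(j) = \mathrm{rl}_{\tilde y}(k_{\tilde y}(j))$, such that $\mathrm{rl}_{\tilde y}(l) = \mathrm{rl}_{\tilde y}(k_{\tilde y}(j)) = \mathrm{rl}_{\tilde y}(j)$, which is a contradiction.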

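Therefore, we can conclude that $l < k_{\tilde y}(j)$. It remains to show that $k \le k_{\tilde x}(i)$. If $\mathrm{rl}_{\tilde x}(k) = \mathrm{rl}_{\tilde x}(i)$, then $k = k_{\tilde x}(i)$, because the minimum is unique. Otherwise, $\mathrm{rl}_{\tilde x}(k) \neq \mathrm{rl}_{\tilde x}(i)$, which implies $k < k_{\tilde x}(i)$, as we have shown above.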

Now, we can finally complete the proof. First, note that Lemma A.3 implies that $d_c(\tilde x, \tilde y)$ equals $D_c(\tilde x^1, \tilde y^1)$ if $c$ fulfills the triangle inequality and is self-equal.

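Further, Lemma A.6 tells us that the output of Algorithm 2.1, $d_{1,1}$, is equal to $D_c(\tilde x^1, \tilde y^1)$ if keyroots $k \in K(\tilde x)$ and $l \in K(\tilde y)$ exist such that $k \le 1 \le \mathrm{rl}_{\tilde x}(k)$ and $l \le 1 \le \mathrm{rl}_{\tilde y}(l)$. Per definition of outermost right leaves, we know that $\mathrm{rl}_{\tilde x}(1) = |\tilde x|$ and $\mathrm{rl}_{\tilde y}(1) = |\tilde y|$. Further, per definition of keyroots, we know that $k_{\tilde x}(|\tilde x|) = \min\{k \mid \mathrm{rl}_{\tilde x}(k) = \mathrm{rl}_{\tilde x}(|\tilde x|)\} = 1$ and $k_{\tilde y}(|\tilde y|) = \min\{l \mid \mathrm{rl}_{\tilde y}(l) = \mathrm{rl}_{\tilde y}(|\tilde y|)\} = 1$, because 1 is the lowest possible index. Therefore, $1 \in K(\tilde x)$ and $1 \in K(\tilde y)$, which concludes the overall proof.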
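The keyroot-based evaluation order can be sketched in Python as well. This is a minimal reconstruction guided by the proof of Lemma A.6, not a line-by-line transcription of Algorithm 2.1 (whose line numbers are cited above): keyroots are computed as $k(i) = \min\{k \mid \mathrm{rl}(k) = \mathrm{rl}(i)\}$ and processed in descending order, and each pass fills the table `D` from the bottom-right corner according to Equations A.15, A.16, A.18, and A.19, storing subtree distances in `d`. The tree encoding as a `(label, children)` tuple, the unit cost, and all names are assumptions.

```python
def preorder(tree):
    """1-based pre-order labels and outermost right leaves rl of a single tree."""
    labels, rl = [None], [None]

    def visit(node):
        label, children = node
        idx = len(labels)
        labels.append(label)
        rl.append(None)
        for child in children:
            visit(child)
        rl[idx] = len(labels) - 1

    visit(tree)
    return labels, rl


def keyroots(rl):
    """K = {k(i) | i} with k(i) = min{k | rl(k) = rl(i)}, in descending order."""
    first = {}
    for k in range(1, len(rl)):
        first.setdefault(rl[k], k)  # smallest index per outermost right leaf
    return sorted(first.values(), reverse=True)


def unit_cost(a, b):
    return 0 if a == b else 1


def tree_distance(x, y, c=unit_cost):
    lx, rlx = preorder(x)
    ly, rly = preorder(y)
    n, m = len(lx) - 1, len(ly) - 1
    d = [[0] * (m + 2) for _ in range(n + 2)]          # d[i][j] = D_c(x^i, y^j)
    for k in keyroots(rlx):
        for l in keyroots(rly):
            ek, el = rlx[k], rly[l]
            D = [[0] * (m + 2) for _ in range(n + 2)]  # D[i][j] = D_c(x[i, ek], y[j, el])
            for i in range(ek, k - 1, -1):             # delete-only base case (A.15)
                D[i][el + 1] = c(lx[i], None) + D[i + 1][el + 1]
            for j in range(el, l - 1, -1):             # insert-only base case (A.16)
                D[ek + 1][j] = c(None, ly[j]) + D[ek + 1][j + 1]
            for i in range(ek, k - 1, -1):
                for j in range(el, l - 1, -1):
                    delete = c(lx[i], None) + D[i + 1][j]
                    insert = c(None, ly[j]) + D[i][j + 1]
                    if rlx[i] == ek and rly[j] == el:  # both are whole subtrees: A.19
                        D[i][j] = min(delete, insert, c(lx[i], ly[j]) + D[i + 1][j + 1])
                        d[i][j] = D[i][j]
                    else:                              # A.18, reusing a stored subtree distance
                        D[i][j] = min(delete, insert, d[i][j] + D[rlx[i] + 1][rly[j] + 1])
    return d[1][1]


print(tree_distance(("a", [("b", []), ("c", [])]), ("a", [("c", [])])))  # 1
```

The `else` branch reads `d[i][j]` from an earlier pass with keyroots $k_{\tilde x}(i)$ and $k_{\tilde y}(j)$, which is exactly the ordering argument made in the proof of Lemma A.6.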

In the document Metric Learning for Structured Data (pages 184-195)