
Approximation via data sparsification

From the document Computational Topology for Data Analysis (pages 165-173)

Topological Analysis of Point Clouds

6.2 Approximation via data sparsification

Figure 6.2: Vietoris-Rips complex: (b) at small scale, the Rips complex of the points shown in (a) requires the two white points; (c) the two white points become redundant at larger scale.

One issue with using the Vietoris-Rips or Čech filtrations in practice is that their sizes can become huge, even for a moderate number of points. For example, when the scale $r$ is larger than the diameter of a point set $P$, the Čech and the Vietoris-Rips complexes of $P$ contain every simplex spanned by points in $P$, in which case the size of the $d$-skeleton of $C^r(P)$ or $\mathrm{VR}^r(P)$ is $\Theta(n^{d+1})$ for $n = |P|$.
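To get a feel for this growth, the count can be evaluated directly: when every simplex is present, the $d$-skeleton on $n$ vertices has $\sum_{k=0}^{d} \binom{n}{k+1}$ simplices. The following is a minimal sketch; the function name `full_skeleton_size` is hypothetical and not from the text.

```python
from math import comb

def full_skeleton_size(n, d):
    """Number of simplices of dimension <= d on n vertices
    when every possible simplex is present."""
    # A k-simplex is a choice of k + 1 vertices.
    return sum(comb(n, k + 1) for k in range(d + 1))

# 100 points, simplices up to dimension 2 (vertices, edges, triangles).
print(full_skeleton_size(100, 2))
```

Already for 100 points the 2-skeleton has more than 160,000 simplices, which is the growth the sparsification below is meant to avoid.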

On the other hand, as shown in Figure 6.2, as the scale $r$ increases, certain points could become "redundant", e.g., having little or no contribution to the underlying space of the union of all $r$-radius balls. Based on this observation, one can approximate these filtrations with sparsified filtrations of much smaller size. In particular, as the scale $r$ increases, the point set $P$ with which one constructs a complex is gradually sparsified, keeping the total number of simplices in the complex linear in the input size of $P$, where the dimension of the embedding space is assumed to be fixed.

We describe two data sparsification schemes in Sections 6.2.1 and 6.2.2, respectively. We focus on the Vietoris-Rips filtration for points in a Euclidean space $\mathbb{R}^d$ equipped with the standard Euclidean distance $\mathsf{d}$.

6.2.1 Data sparsification for Rips filtration via reweighting

Most of the concepts presented in this section apply to general finite metric spaces, though we describe them for finite point sets equipped with a Euclidean metric. The reason for this choice is that the complexity analysis draws upon specific properties of Euclidean space. The reader is encouraged to think about generalizing the definitions and the technique to other metric spaces.

First we restate the definition of δ-sample and δ-sparse sample in Definition 2.17 slightly differently.

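Definition 6.4 (Nets and net-tower). Given a finite set of points $P \subset (\mathbb{R}^d, \mathsf{d})$ and $\gamma, \gamma' \ge 0$, a subset $Q \subseteq P$ is a $(\gamma, \gamma')$-net of $P$ if the following two conditions hold:

Covering condition: $Q$ is a $\gamma$-sample for $(P, \mathsf{d})$, i.e., for every $p \in P$, $\mathsf{d}(p, Q) \le \gamma$.

Packing condition: $Q$ is also $\gamma'$-sparse, i.e., for every pair of distinct points $q, q' \in Q$, $\mathsf{d}(q, q') \ge \gamma'$.

If $\gamma = \gamma'$, we also refer to $Q$ as a $\gamma$-net of $P$.

A single-parameter family of nets $\{N_\gamma\}_\gamma$ is called a net-tower of $P$ if (i) there is a constant $c > 0$ so that for all $\gamma \in \mathbb{R}$, $N_\gamma$ is a $(\gamma, \gamma/c)$-net for $P$, and (ii) $N_\gamma \supseteq N_{\gamma'}$ for any $\gamma \le \gamma'$.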

Intuitively, a $\gamma$-net approximates a point cloud data (PCD) set $P$ at resolution $\gamma$ (covering condition), while also being sparse (packing condition). A net-tower provides a sequence of increasingly sparsified approximations of $P$.
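The two conditions of a net can be checked by brute force. The sketch below assumes points are given as coordinate tuples and the candidate net as a list of indices into the point list; the function name `is_net` is hypothetical.

```python
import math

def is_net(points, subset, gamma, gamma_prime):
    """Check whether the points indexed by `subset` form a
    (gamma, gamma')-net of `points` under the Euclidean metric."""
    q = [points[i] for i in subset]
    if not q:
        # An empty subset can only cover an empty point set.
        return not points
    # Covering: every input point is within gamma of some net point.
    covering = all(min(math.dist(p, s) for s in q) <= gamma for p in points)
    # Packing: distinct net points are at least gamma' apart.
    packing = all(math.dist(q[a], q[b]) >= gamma_prime
                  for a in range(len(q)) for b in range(a + 1, len(q)))
    return covering and packing

# Four points on a line; {0, 8} is a 4-net, but not a 3-net (4 is uncovered).
pts = [(0.0,), (8.0,), (4.0,), (1.0,)]
print(is_net(pts, [0, 1], 4.0, 4.0), is_net(pts, [0, 1], 3.0, 3.0))
```

The check is quadratic in the number of points, which is fine for illustration but not how nets would be verified at scale.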

Net-tower via farthest point sampling. We now introduce a specific net-tower constructed via the classical strategy of farthest point sampling, also called greedy permutation, e.g., in [56, 70].

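Given a point set $P \subset (\mathbb{R}^d, \mathsf{d})$, choose an arbitrary point $p_1$ from $P$ and set $P_1 = \{p_1\}$. Pick $p_i$ recursively as $p_i \in \operatorname{argmax}_{p \in P \setminus P_{i-1}} \mathsf{d}(p, P_{i-1})$¹, and set $P_i = P_{i-1} \cup \{p_i\}$. Now set $t_{p_i} = \mathsf{d}(p_i, P_{i-1})$, which we refer to as the exit-time of $p_i$. Based on these exit-times, we construct the following two families of sets:

Open net-tower $\{N_\gamma\}_{\gamma \in \mathbb{R}}$, where $N_\gamma = \{p \in P \mid t_p > \gamma\}$.

Closed net-tower $\{\overline{N}_\gamma\}_{\gamma \in \mathbb{R}}$, where $\overline{N}_\gamma = \{p \in P \mid t_p \ge \gamma\}$.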

In what follows, we discuss a sparsification strategy for the Rips filtration of $P$ using the above net-towers. The approach can be extended to other net-towers, such as the net-tower constructed using the net-tree data structure of [182].
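The farthest point sampling procedure and the two net families derived from exit-times can be sketched as follows. This is a quadratic-time illustration, not an optimized implementation. Two things are assumptions rather than statements from the text: the first point is given exit-time infinity (it has no predecessor set to measure against), and the open and closed nets are taken to be the points with exit-time strictly greater than, respectively at least, $\gamma$. All function names are hypothetical.

```python
import math

def greedy_permutation(points, start=0):
    """Farthest point sampling. Returns (order, exit_times), where
    exit_times[i] is the distance from point i to the points chosen before it."""
    n = len(points)
    order = [start]
    chosen = {start}
    exit_times = [0.0] * n
    exit_times[start] = math.inf  # assumption: the first point never exits
    # dist_to_chosen[i] = distance from point i to the current prefix
    dist_to_chosen = [math.dist(points[i], points[start]) for i in range(n)]
    while len(order) < n:
        # Any maximizer may be picked; max() takes the first one it sees.
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: dist_to_chosen[i])
        exit_times[nxt] = dist_to_chosen[nxt]
        order.append(nxt)
        chosen.add(nxt)
        for i in range(n):
            dist_to_chosen[i] = min(dist_to_chosen[i],
                                    math.dist(points[i], points[nxt]))
    return order, exit_times

def open_net(exit_times, gamma):
    """Indices with exit-time strictly greater than gamma (assumed form)."""
    return {i for i, t in enumerate(exit_times) if t > gamma}

def closed_net(exit_times, gamma):
    """Indices with exit-time at least gamma (assumed form)."""
    return {i for i, t in enumerate(exit_times) if t >= gamma}

pts = [(0.0,), (8.0,), (4.0,), (1.0,)]
order, t = greedy_permutation(pts)
print(order, t, open_net(t, 4.0), closed_net(t, 4.0))
```

On this four-point example the point at 4 has exit-time exactly 4, so it belongs to the closed net at $\gamma = 4$ but not to the open one, which is the only way the two families differ.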

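Weights, weighted distance, and sparse Rips filtration. Given the exit-times $t_p$ for all points $p \in P$, we now associate a weight $w_p(\alpha)$ with each point $p$ at a scale $\alpha$ as follows (the graph of this weight function is shown on the right): for some constant $0 < \varepsilon < 1$,

$w_p(\alpha) = \begin{cases} 0 & \text{if } \alpha \le t_p/\varepsilon, \\ \alpha - t_p/\varepsilon & \text{if } t_p/\varepsilon < \alpha < t_p/(\varepsilon(1-\varepsilon)), \\ \varepsilon\alpha & \text{if } \alpha \ge t_p/(\varepsilon(1-\varepsilon)). \end{cases}$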

Claim 6.1. The weight function $w_p$ is a continuous, 1-Lipschitz, and non-decreasing function.
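A small numerical check can make Claim 6.1 concrete. The sketch below assumes the piecewise form of the weight: zero up to $t_p/\varepsilon$, then growing with slope 1 until $t_p/(\varepsilon(1-\varepsilon))$, and equal to $\varepsilon\alpha$ afterwards. That form is an assumption of this example, chosen to be consistent with the claim and with the bound $w_p(\alpha) \le \varepsilon\alpha$ used later. The function name `weight` is hypothetical.

```python
import math

def weight(t_p, alpha, eps):
    """Weight of a point with exit-time t_p at scale alpha (assumed piecewise form)."""
    lo = t_p / eps                  # weight is zero up to this scale
    hi = t_p / (eps * (1.0 - eps))  # from here on the weight equals eps * alpha
    if alpha <= lo:
        return 0.0
    if alpha < hi:
        return alpha - lo
    return eps * alpha

# Sample the weight for t_p = 1, eps = 1/4 and check monotonicity and slope <= 1.
grid = [0.01 * i for i in range(1000)]
ok = all(-1e-12 <= weight(1.0, b, 0.25) - weight(1.0, a, 0.25) <= (b - a) + 1e-12
         for a, b in zip(grid, grid[1:]))
print(ok, weight(math.inf, 100.0, 0.25))
```

A point with infinite exit-time keeps weight zero at every scale, which matches the intuition that the first point of the greedy permutation is never sparsified away.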

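The parameter $\varepsilon$ controls the resolution of the sparsification. The net-induced distance at scale $\alpha$ between input points is defined as:

$\hat{\mathsf{d}}_\alpha(p, q) := \mathsf{d}(p, q) + w_p(\alpha) + w_q(\alpha)$. (6.8)

Definition 6.5 (Sparse (Vietoris-)Rips). Given a set of points $P \subset \mathbb{R}^d$, a constant $0 < \varepsilon < 1$, and the open net-tower $\{N_\gamma\}$ as well as the closed net-tower $\{\overline{N}_\gamma\}$ for $P$ as introduced above, the open sparse-Rips complex at scale $\alpha$ is defined as

$Q_\alpha := \{\sigma \subseteq N_{\varepsilon(1-\varepsilon)\alpha} \mid \forall p, q \in \sigma,\ \hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha\}$; (6.9)

while the closed sparse-Rips complex at scale $\alpha$ is defined as

$\overline{Q}_\alpha := \{\sigma \subseteq \overline{N}_{\varepsilon(1-\varepsilon)\alpha} \mid \forall p, q \in \sigma,\ \hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha\}$. (6.10)

Set $S_\alpha := \bigcup_{\beta \le \alpha} \overline{Q}_\beta$, which we call the cumulative complex at scale $\alpha$. The ($\varepsilon$-)sparse Rips filtration then refers to the $\mathbb{R}$-indexed filtration $\mathcal{S} = \{S_\alpha \hookrightarrow S_\beta\}_{\alpha \le \beta}$.

¹ Note that there may be multiple points that maximize $\mathsf{d}(p, P_{i-1})$, making $\operatorname{argmax}_{p \in P \setminus P_{i-1}} \mathsf{d}(p, P_{i-1})$ a set. We can choose $p_i$ to be any point in this set.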

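Obviously, $Q_\alpha \subseteq \overline{Q}_\alpha$. Note that for $\alpha < \beta$, $Q_\alpha$ is not necessarily included in $Q_\beta$ (neither is $\overline{Q}_\alpha$ in $\overline{Q}_\beta$), while the inclusion $S_\alpha \subseteq S_\beta$ always holds.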
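Since the sparse-Rips complex is determined by its vertices and edges, a single scale can be illustrated by listing those. The sketch below makes several assumptions: the piecewise weight (zero, then slope 1, then $\varepsilon\alpha$), open and closed nets given by exit-time strictly greater than, respectively at least, the threshold, and exit-times supplied by the caller (here hand-computed for four points on a line, with the first point assigned exit-time infinity). All names are hypothetical.

```python
import math
from itertools import combinations

def weight(t_p, alpha, eps):
    """Assumed piecewise weight of a point with exit-time t_p at scale alpha."""
    lo, hi = t_p / eps, t_p / (eps * (1.0 - eps))
    if alpha <= lo:
        return 0.0
    return alpha - lo if alpha < hi else eps * alpha

def sparse_rips_edges(points, exit_times, eps, alpha, closed=False):
    """Vertices and edges of the open (or closed) sparse-Rips complex at scale alpha."""
    thr = eps * (1.0 - eps) * alpha
    verts = [i for i, t in enumerate(exit_times)
             if (t >= thr if closed else t > thr)]
    edges = set()
    for p, q in combinations(verts, 2):
        # Net-induced distance: Euclidean distance plus both weights.
        d_hat = (math.dist(points[p], points[q])
                 + weight(exit_times[p], alpha, eps)
                 + weight(exit_times[q], alpha, eps))
        if d_hat <= 2.0 * alpha:
            edges.add((p, q))
    return verts, edges

pts = [(0.0,), (8.0,), (4.0,), (1.0,)]
t = [math.inf, 8.0, 4.0, 1.0]  # exit-times from a greedy permutation starting at 0
print(sparse_rips_edges(pts, t, 0.25, 2.0))
print(sparse_rips_edges(pts, t, 0.25, 6.0))
```

At scale 6 the point with exit-time 1 has already left the vertex set, so the complex has three vertices, whereas the standard Rips complex at the same scale would still use all four.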

In what follows, we show that the sparse Rips filtration approximates the standard Vietoris-Rips filtration $\{\mathrm{VR}^r(P)\}$ defined over $P$, and that the size of the sparse Rips filtration is only linear in $n$ for any fixed dimension $d$, which is assumed to be constant. The main results are summarized in the following theorem.

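Theorem 6.4. Let $P \subset \mathbb{R}^d$ be a set of $n$ points where $d$ is a constant, and let $\mathcal{R}(P) = \{\mathrm{VR}^r(P)\}$ be the Vietoris-Rips filtration over $P$. Given net-towers $\{N_\gamma\}$ and $\{\overline{N}_\gamma\}$ induced by exit-times $\{t_p\}_{p \in P}$, let $\mathcal{S}(P) = \{S_\alpha\}$ be the corresponding $\varepsilon$-sparse Rips filtration as defined in Definition 6.5. Then, for a fixed $0 < \varepsilon < \frac{1}{3}$,

(i) $\mathcal{S}(P)$ and $\mathcal{R}(P)$ are multiplicatively $\frac{1}{1-\varepsilon}$-interleaved at the homology level. Thus, for any $k \ge 0$, the persistence diagram $\mathrm{Dgm}_k \mathcal{S}(P)$ is a $\log\frac{1}{1-\varepsilon}$-approximation of $\mathrm{Dgm}_k \mathcal{R}(P)$ at the log-scale.

(ii) For any fixed dimension $k \ge 0$, the total number of $k$-simplices that ever appear in $\mathcal{S}(P)$ is $\Theta((\frac{1}{\varepsilon})^{kd} n)$.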

In the remainder of this section, we sketch the proof of the above theorem.

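Proof of part (i) of Theorem 6.4. To relate $\mathcal{S}(P)$ to $\mathcal{R}(P)$, we need to go through a sequence of intermediate steps. First, we define the relaxed Rips complex at scale $\alpha$ as

$\widehat{\mathrm{VR}}^\alpha(P) := \{\sigma \subset P \mid \forall p, q \in \sigma,\ \hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha\}$.

The following claim ensures that the relaxed Rips complexes form a valid filtration connected by inclusions, $\widehat{\mathcal{R}}(P) = \{\widehat{\mathrm{VR}}^\alpha(P) \hookrightarrow \widehat{\mathrm{VR}}^\beta(P)\}_{\alpha \le \beta}$, which we call the relaxed Rips filtration.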

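Claim 6.2. If $\hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha \le 2\beta$, then $\hat{\mathsf{d}}_\beta(p, q) \le 2\beta$.

Proof. The weight function $w_p$ is 1-Lipschitz for any $p \in P$ (Claim 6.1). Thus we have that

$\hat{\mathsf{d}}_\beta(p, q) = \mathsf{d}(p, q) + w_p(\beta) + w_q(\beta)$

$\le \mathsf{d}(p, q) + w_p(\alpha) + (\beta - \alpha) + w_q(\alpha) + (\beta - \alpha)$

$= \mathsf{d}(p, q) + w_p(\alpha) + w_q(\alpha) - 2\alpha + 2\beta \le 2\beta$.

The last inequality follows from $\mathsf{d}(p, q) + w_p(\alpha) + w_q(\alpha) = \hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha$.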

In what follows, we drop the argument $P$ from notations such as the complexes $\mathrm{VR}^\alpha(P)$ or the sparse Rips filtration $\mathcal{S}(P)$ when the point set in question is understood.

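Proposition 6.5. Let $C = \frac{1}{1-\varepsilon}$. Then for any $\alpha \ge 0$ we have that $\mathrm{VR}^{\alpha/C} \subseteq \widehat{\mathrm{VR}}^\alpha \subseteq \mathrm{VR}^\alpha$.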
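Both inclusions follow from the definition of the net-induced distance in (6.8) together with the bounds $0 \le w_p(\alpha) \le \varepsilon\alpha$. A short derivation, written as a sketch and assuming the convention that an edge $(p, q)$ belongs to $\mathrm{VR}^r$ when $\mathsf{d}(p, q) \le 2r$:

```latex
% Right inclusion: weights are non-negative, so for any edge (p,q) of the relaxed complex
\mathsf{d}(p,q) \;\le\; \mathsf{d}(p,q) + w_p(\alpha) + w_q(\alpha)
              \;=\; \hat{\mathsf{d}}_\alpha(p,q) \;\le\; 2\alpha .

% Left inclusion: if (p,q) is an edge of VR^{\alpha/C}, then
% d(p,q) \le 2\alpha/C = 2(1-\varepsilon)\alpha, and with w_p(\alpha), w_q(\alpha) \le \varepsilon\alpha
\hat{\mathsf{d}}_\alpha(p,q) \;=\; \mathsf{d}(p,q) + w_p(\alpha) + w_q(\alpha)
              \;\le\; 2(1-\varepsilon)\alpha + 2\varepsilon\alpha \;=\; 2\alpha .
```

Since all three complexes are clique complexes, checking the inclusions on edges is enough.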

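Next, we relate the filtrations $\mathcal{S}$ and $\mathcal{R}$ via the relaxed Rips filtration $\widehat{\mathcal{R}}$ by connecting the sparse Rips complexes $Q_\alpha$ and $\overline{Q}_\alpha$. Consider the following projection of $P$ to points in the net $N_{\varepsilon(1-\varepsilon)\alpha}$, which are also the vertices of $Q_\alpha$:

$\pi_\alpha(p) = \begin{cases} p & \text{if } p \in N_{\varepsilon(1-\varepsilon)\alpha}, \\ \operatorname{argmin}_{q \in N_{\varepsilon\alpha}} \mathsf{d}(p, q) & \text{otherwise.} \end{cases}$

Again, if $\operatorname{argmin}_{q \in N_{\varepsilon\alpha}} \mathsf{d}(p, q)$ contains more than one point, we set $\pi_\alpha(p)$ to be an arbitrary one.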

This projection is well-defined as $N_{\varepsilon\alpha} \subseteq N_{\varepsilon(1-\varepsilon)\alpha}$, given that $0 < \varepsilon < 1/3 < 1$. We need several technical results on this projection map, which we rely on later to construct maps between appropriate versions of Rips complexes. First, the following two results are easy to show.

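Fact 6.1. For every $p \in P$, $\mathsf{d}(p, \pi_\alpha(p)) \le w_p(\alpha) - w_{\pi_\alpha(p)}(\alpha) \le \varepsilon\alpha$.

Fact 6.2. For every pair $p, q \in P$, we have that $\hat{\mathsf{d}}_\alpha(p, \pi_\alpha(q)) \le \hat{\mathsf{d}}_\alpha(p, q)$.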
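The projection can be sketched directly from its two cases. The example below assumes that the open net at a threshold consists of the points with exit-time strictly greater than that threshold, and that the first point of the greedy permutation has exit-time infinity, so the net being projected onto is never empty. The function name `project` and the hand-computed exit-times are hypothetical.

```python
import math

def project(p, points, exit_times, eps, alpha):
    """Projection of point index p at scale alpha: p itself if it is still a
    vertex of the open sparse-Rips complex, else its nearest point in the
    coarser open net at threshold eps * alpha."""
    if exit_times[p] > eps * (1.0 - eps) * alpha:
        return p
    net = [q for q, t in enumerate(exit_times) if t > eps * alpha]
    # Ties are broken arbitrarily; min() returns the first minimizer.
    return min(net, key=lambda q: math.dist(points[p], points[q]))

pts = [(0.0,), (8.0,), (4.0,), (1.0,)]
t = [math.inf, 8.0, 4.0, 1.0]
# At scale 6 the point at 1 (exit-time 1) is projected to the point at 0.
print(project(3, pts, t, 0.25, 6.0), project(2, pts, t, 0.25, 6.0))
```

In this example the projected point moves a distance of 1, which is within the bound $\varepsilon\alpha = 1.5$ stated in Fact 6.1.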

We are now ready to show that inclusion induces an isomorphism between the homology groups of the sparse Rips complex and the relaxed Rips complex.

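Proposition 6.6. For any $\alpha \ge 0$, the inclusion $i: Q_\alpha \hookrightarrow \widehat{\mathrm{VR}}^\alpha$ induces an isomorphism at the homology level; that is, $H_*(Q_\alpha) \cong H_*(\widehat{\mathrm{VR}}^\alpha)$ under the homomorphism $i_*$ induced by $i$.

Proof. First, we consider the projection map $\pi_\alpha$ and argue that it induces a simplicial map $\pi_\alpha: \widehat{\mathrm{VR}}^\alpha \to Q_\alpha$, which is in fact a simplicial retraction². Next, we show that the map $i \circ \pi_\alpha: \widehat{\mathrm{VR}}^\alpha \to \widehat{\mathrm{VR}}^\alpha$ is contiguous to the identity map $\mathrm{id}: \widehat{\mathrm{VR}}^\alpha \to \widehat{\mathrm{VR}}^\alpha$. As $\pi_\alpha$ is a simplicial retraction, it follows that $i_*$ is an isomorphism (Lemma 2 of [275]).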

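To see that $\pi_\alpha$ is a simplicial map, apply Fact 6.2 twice to obtain

$\hat{\mathsf{d}}_\alpha(\pi_\alpha(p), \pi_\alpha(q)) \le \hat{\mathsf{d}}_\alpha(p, \pi_\alpha(q)) \le \hat{\mathsf{d}}_\alpha(p, q)$. (6.11)

Since both $Q_\alpha$ and $\widehat{\mathrm{VR}}^\alpha$ are clique complexes, this implies that $\pi_\alpha$ is a simplicial map. Furthermore, it is easy to see that it is a retraction, as $\pi_\alpha(q) = q$ for any $q$ in the vertex set of $Q_\alpha$ (which is $N_{\varepsilon(1-\varepsilon)\alpha}$).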

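Now, to show that $i \circ \pi_\alpha$ is contiguous to $\mathrm{id}$, we observe that for any $p, q \in P$ with $\hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha$, all edges among $\{p, q, \pi_\alpha(p), \pi_\alpha(q)\}$ exist, and thus all simplices spanned by them exist in $\widehat{\mathrm{VR}}^\alpha$. Indeed, that $\hat{\mathsf{d}}_\alpha(\pi_\alpha(p), \pi_\alpha(q)) \le 2\alpha$ is already shown above in Eqn (6.11). Combining Fact 6.1 with the fact that $w_p(\alpha) \le \varepsilon\alpha$, we have that

$\hat{\mathsf{d}}_\alpha(p, \pi_\alpha(p)) = \mathsf{d}(p, \pi_\alpha(p)) + w_p(\alpha) + w_{\pi_\alpha(p)}(\alpha) \le 2 w_p(\alpha) \le 2\varepsilon\alpha < 2\alpha$.

Furthermore, by Fact 6.2, $\hat{\mathsf{d}}_\alpha(p, \pi_\alpha(q)) \le \hat{\mathsf{d}}_\alpha(p, q) \le 2\alpha$. Symmetric arguments can be applied to show that $\hat{\mathsf{d}}_\alpha(q, \pi_\alpha(q)), \hat{\mathsf{d}}_\alpha(q, \pi_\alpha(p)) \le 2\alpha$. This establishes that $i \circ \pi_\alpha$ is contiguous to $\mathrm{id}$, which proves the proposition.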

² A simplicial retraction $f: K \to L$ is a simplicial map from $K \supseteq L$ to $L$ so that $f(\sigma) = \sigma$ for any $\sigma \in L$.

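The closed sparse-Rips complex $\overline{Q}_\alpha$ is the relaxed Rips complex over the vertex set $\overline{N}_{\varepsilon(1-\varepsilon)\alpha}$, which is a superset of the vertex set of $Q_\alpha$. Hence the above proposition also holds for the inclusion $Q_\alpha \hookrightarrow \overline{Q}_\alpha$. It then follows that $H_*(Q_\alpha) \cong H_*(\overline{Q}_\alpha)$. Finally, we show that the inclusion also induces an isomorphism between $H_*(\overline{Q}_\alpha)$ and $H_*(S_\alpha)$, which, when combined with the above results, connects $S_\alpha$ and $\widehat{\mathrm{VR}}^\alpha$.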

Proposition 6.7. For any $\alpha \ge 0$, the inclusion $h: \overline{Q}_\alpha \hookrightarrow S_\alpha$ induces an isomorphism at the homology level, that is, $H_*(\overline{Q}_\alpha) \cong H_*(S_\alpha)$ under $h_*$.

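Proof. Consider the sequence $\{S_\alpha\}_{\alpha \in \mathbb{R}}$. First, we discretize $\alpha$ to have distinct values $\alpha_0 < \alpha_1 < \alpha_2 < \cdots < \alpha_m$ so that $S_{\alpha_0} = \emptyset$, and the $\alpha_i$ are exactly the times when the combinatorial structure of $S_\alpha$ changes. As $S_\alpha = \bigcup_{\beta \le \alpha} \overline{Q}_\beta$, these are also exactly the moments when the combinatorial structure of $\overline{Q}_\alpha$ changes. Hence we only need to prove the statement for such $\alpha_i$, and it will then hold for all $\alpha$. Set $\lambda_i := \varepsilon(1-\varepsilon)\alpha_i$. Note that the vertex set of $\overline{Q}_{\alpha_i}$ is $\overline{N}_{\lambda_i}$ by the definition of $\overline{Q}_\alpha$ in Eqn (6.10).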

Now fix a $k \ge 0$. We will show that $h: \overline{Q}_{\alpha_k} \hookrightarrow S_{\alpha_k}$ induces an isomorphism at the homology level. We use some intermediate complexes

$T_{i,k} := \bigcup_{j=i}^{k} \overline{Q}_{\alpha_j}$, for $i \in [1, k]$.

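Obviously, $T_{1,k} = S_{\alpha_k}$ and $T_{k,k} = \overline{Q}_{\alpha_k}$. Set $h_i: T_{i+1,k} \hookrightarrow T_{i,k}$. The inclusion $h: \overline{Q}_{\alpha_k} \hookrightarrow S_{\alpha_k}$ can then be written as $h = h_1 \circ h_2 \circ \cdots \circ h_{k-1}$. In what follows, we prove that $h_i: T_{i+1,k} \hookrightarrow T_{i,k}$ induces an isomorphism at the homology level for each $i \in [1, k-1]$, which then proves the proposition.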

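First, note that while $T_{i,k}$ is not necessarily the same as $\overline{Q}_{\alpha_i}$, they share the same vertex set.

Now, because of our choices of the $\alpha_i$ and $\lambda_i$, the vertex set of $T_{i+1,k}$, which is the vertex set of $\overline{Q}_{\alpha_{i+1}}$, namely $\overline{N}_{\lambda_{i+1}}$, equals $N_{\lambda_i}$. Hence we can consider the projection $\pi_{\alpha_i}: T_{i,k} \to T_{i+1,k}$ given by the projection of the vertex set $N_{\lambda_{i-1}} = \overline{N}_{\lambda_i}$ of $T_{i,k}$ to the vertex set $N_{\lambda_i} = \overline{N}_{\lambda_{i+1}}$ of $T_{i+1,k}$. To prove that $h_i$ induces an isomorphism at the homology level, by Lemma 2 of [275], it suffices to show that (i) $\pi_{\alpha_i}$ is a simplicial retraction, and (ii) $h_i \circ \pi_{\alpha_i}$ is contiguous to the identity map $\mathrm{id}: T_{i,k} \to T_{i,k}$.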

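To prove (i), it is easy to verify that $\pi_{\alpha_i}$ is a retraction. To see that $\pi_{\alpha_i}$ induces a simplicial map, we need to show that for every $\sigma \in T_{i,k}$, $\pi_{\alpha_i}(\sigma) \in T_{i+1,k}$. As $\pi_{\alpha_i}$ is a retraction, we only need to prove this for every $\sigma \in T_{i,k} \setminus T_{i+1,k}$. On the other hand, note that by definition, $T_{i,k} \setminus T_{i+1,k} \subseteq \overline{Q}_{\alpha_i}$. To this end, the argument in Proposition 6.6 also shows that $\pi_{\alpha_i}: \overline{Q}_{\alpha_i} \to Q_{\alpha_i}$ is a simplicial map, and furthermore, $h' \circ \pi_{\alpha_i}$ is contiguous to $\mathrm{id}': \overline{Q}_{\alpha_i} \to \overline{Q}_{\alpha_i}$, where $h': Q_{\alpha_i} \hookrightarrow \overline{Q}_{\alpha_i}$. Because of our choice of the $\alpha_i$, $Q_{\alpha_i}$ and $\overline{Q}_{\alpha_{i+1}}$ have the same vertex set, which is $N_{\lambda_i}$. Furthermore, for every edge $(p, q) \in Q_{\alpha_i}$, we have that $\hat{\mathsf{d}}_{\alpha_i}(p, q) \le 2\alpha_i$. As $\alpha_i < \alpha_{i+1}$, it follows from Claim 6.2 that $\hat{\mathsf{d}}_{\alpha_{i+1}}(p, q) \le 2\alpha_{i+1}$. Hence, the edge $(p, q)$ is in $\overline{Q}_{\alpha_{i+1}}$. This implies that $Q_{\alpha_i} \subseteq \overline{Q}_{\alpha_{i+1}}$. Putting everything together, it follows that, for every $\sigma \in T_{i,k} \setminus T_{i+1,k} \subseteq \overline{Q}_{\alpha_i}$, we have

$\pi_{\alpha_i}(\sigma) \in Q_{\alpha_i} \subseteq \overline{Q}_{\alpha_{i+1}} \subseteq T_{i+1,k}$.

Therefore, $\pi_{\alpha_i}$ is a simplicial map. This finishes the proof of (i).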

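Now we prove (ii), that is, $h_i \circ \pi_{\alpha_i}$ is contiguous to the identity map $\mathrm{id}: T_{i,k} \to T_{i,k}$. This means that we need to show that for every $\sigma \in T_{i,k}$, $\sigma \cup \pi_{\alpha_i}(\sigma) \in T_{i,k}$. Again, as $\pi_{\alpha_i}$ is a simplicial retraction, we only need to show this for $\sigma \in T_{i,k} \setminus T_{i+1,k} \subseteq \overline{Q}_{\alpha_i}$. As mentioned above, using the same argument as in Proposition 6.6, we know that $h' \circ \pi_{\alpha_i}$ is contiguous to the identity $\mathrm{id}': \overline{Q}_{\alpha_i} \to \overline{Q}_{\alpha_i}$. Hence we have that for every $\sigma \in \overline{Q}_{\alpha_i}$, $\sigma \cup \pi_{\alpha_i}(\sigma) \in \overline{Q}_{\alpha_i}$. It follows that $\sigma \cup \pi_{\alpha_i}(\sigma) \in T_{i,k}$, as $\overline{Q}_{\alpha_i} \subseteq T_{i,k}$. This proves (ii), completing the proof of the proposition.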

Combining Propositions 6.6 (as well as the discussion after this proposition) and 6.7, we have that $\{S_\alpha\}$ and $\{\widehat{\mathrm{VR}}^\alpha\}$ induce isomorphic persistence modules. This, together with Proposition 6.5, implies part (i) of Theorem 6.4.

Proof of part (ii) of Theorem 6.4. Let $S^{(k)}$ denote the set of $k$-simplices that ever appear in $\mathcal{S}(P)$, which is also the set of $k$-simplices in the last complex of $\mathcal{S}(P)$. To bound the size of $S^{(k)}$, we charge each simplex in $S^{(k)}$ to its vertex with the smallest exit-time. Observe that a point $p \in P$ does not contribute any new edge to the sparse Rips complex $\overline{Q}_\beta$ for $\beta > \frac{t_p}{\varepsilon(1-\varepsilon)}$. This means that to bound the number of simplices charged to $p$, we only need to bound such simplices in $\overline{Q}_{\alpha_p}$ with $\alpha_p = \frac{t_p}{\varepsilon(1-\varepsilon)}$.

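Set $E(p) = \{q \in P \mid (p, q) \in \overline{Q}_{\alpha_p} \text{ and } t_p \le t_q\}$. We add $p$ to $E(p)$ too. We claim that $|E(p)| = O((\frac{1}{\varepsilon})^d)$. In particular, consider the closed net-tower $\{\overline{N}_\gamma\}$; recall that $\overline{N}_\gamma$ is a $\gamma$-net. As $E(p) \subseteq \overline{N}_{t_p}$, the packing condition of the net implies that the closest pair in $E(p)$ has distance at least $t_p$ between them. On the other hand, for each $(p, q) \in \overline{Q}_{\alpha_p}$, we have $\hat{\mathsf{d}}_{\alpha_p}(p, q) \le 2\alpha_p$, implying that $E(p) \subseteq B(p, 2\alpha_p)$. A simple packing argument then implies that the number of points in $E(p)$ is

$O\left(\left(\frac{2\alpha_p}{t_p}\right)^d\right) = O\left(\left(\frac{2}{\varepsilon(1-\varepsilon)}\right)^d\right) = O\left(\left(\frac{1}{\varepsilon}\right)^d\right)$.

The last equality follows because $\varepsilon < 1/3$ and thus $1 - \varepsilon \ge 2/3$. The total number of $k$-simplices charged to $p$ is bounded by $O((\frac{1}{\varepsilon})^{kd})$, and the total number of $k$-simplices in $\mathcal{S}(P)$ is $O((\frac{1}{\varepsilon})^{kd} n)$, proving part (ii) of Theorem 6.4.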

6.2.2 Approximation via simplicial tower

We now describe a different sparsification strategy by directly building a simplicial tower of Rips complexes connected by simplicial maps (Definition 4.1 and the discussion below it) whose persistent homology also approximates that of the standard Rips filtration. This sparsification is conceptually simpler, but its approximation quality is worse than the one introduced in the previous section.

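Given a set of points $P \subset \mathbb{R}^d$, $\alpha > 0$, and some $0 < \varepsilon < 1$, we are interested in the following filtration (which is a subsequence of the standard Rips filtration):

$\mathrm{VR}^{\alpha}(P) \hookrightarrow \mathrm{VR}^{\alpha(1+\varepsilon)}(P) \hookrightarrow \mathrm{VR}^{\alpha(1+\varepsilon)^2}(P) \hookrightarrow \cdots \hookrightarrow \mathrm{VR}^{\alpha(1+\varepsilon)^m}(P)$. (6.12)

We now construct a sparsified sequence by setting $P_0 := P$, building a sequence of point sets $P_k$, $k = 0, 1, \ldots, m$, where $P_{k+1}$ is an $\frac{\alpha\varepsilon}{2}(1+\varepsilon)^{k-1}$-net of $P_k$, and terminating the process when $P_m$ is of constant size.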
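The construction of the nested point sets can be sketched as follows. Two assumptions are made here that the text does not specify: the net radius at step $k$ is read as $\frac{\alpha\varepsilon}{2}(1+\varepsilon)^{k-1}$, and each net is built by a simple greedy scan (any net construction would do). The function names and the `min_size` stopping parameter are hypothetical.

```python
import math

def greedy_net(points, gamma):
    """A gamma-net by greedy scan: keep a point only if it is farther
    than gamma from every point kept so far."""
    net = []
    for p in points:
        if all(math.dist(p, q) > gamma for q in net):
            net.append(p)
    return net

def net_sequence(points, alpha, eps, min_size=1):
    """Nested point sets P_0, P_1, ..., stopping once the current set has
    at most max(min_size, 1) points."""
    seq = [list(points)]
    k = 0
    while len(seq[-1]) > max(min_size, 1):
        # Assumed net radius at step k; it grows geometrically with k.
        gamma = 0.5 * alpha * eps * (1.0 + eps) ** (k - 1)
        seq.append(greedy_net(seq[-1], gamma))
        k += 1
    return seq

pts = [(0.0,), (8.0,), (4.0,), (1.0,)]
seq = net_sequence(pts, alpha=1.0, eps=0.5)
print([len(s) for s in seq])
```

Because the radius grows geometrically, it eventually exceeds the diameter of the point set, so the loop always terminates with a single point when `min_size` is 1.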

Consider the following vertex map $\pi_k: P_k \to P_{k+1}$, for any $k \in [0, m-1]$, where $\pi_k(v)$ is the

Here, the maps $i_k$ and $j_k$ are canonical inclusions.

The above result implies that at the homology level, the sequence in Eqn. (6.13) and the sequence in Eqn. (6.12) are weakly $(1+\varepsilon)$-interleaved in a multiplicative manner. In particular, unlike the interleaving introduced by Definition 4.4 in Chapter 4.1, here the interleaving relations only hold at discrete index values of the filtrations.

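Definition 6.6 (Weak interleaving of vector space towers). Let $\mathsf{U} = \{U_a\}_{a \ge a_0}$ and $\mathsf{V} = \{V_a\}_{a \ge a_0}$ be two vector space towers with resolution $a_0 \ge 0$. For some real number $\varepsilon \ge 0$, we say that they are weakly $\varepsilon$-interleaved if there are two families of linear maps $\varphi_i: U_{a_0 + i\varepsilon} \to V_{a_0 + (i+1)\varepsilon}$ and $\psi_i: V_{a_0 + i\varepsilon} \to U_{a_0 + (i+1)\varepsilon}$, for any integer $i \ge 0$, such that any subdiagram of the following diagram commutes:

$\mathsf{U}: \quad U_{a_0} \to U_{a_0+\varepsilon} \to U_{a_0+2\varepsilon} \to \cdots$

$\mathsf{V}: \quad V_{a_0} \to V_{a_0+\varepsilon} \to V_{a_0+2\varepsilon} \to \cdots$ (6.14)

where the diagonal maps between the two rows are the maps $\varphi_i$ and $\psi_i$.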

It turns out that to verify the commutativity of the diagram in Eqn. (6.14), it is sufficient to verify it for all subdiagrams of the form in Eqn. (4.3). Furthermore, weakly $\varepsilon$-interleaved persistence modules also have bounded bottleneck distance between their persistence diagrams [77], though the distance bound is relaxed to $3\varepsilon$; that is, if $\mathsf{U}$ and $\mathsf{V}$ are weakly $\varepsilon$-interleaved, then $d_b(\mathrm{Dgm}\,\mathsf{U}, \mathrm{Dgm}\,\mathsf{V}) \le 3\varepsilon$. Analogous results hold for the multiplicative setting. Finally, using a similar packing argument as before, one can also show that the total number of $k$-simplices that ever appear in the simplicial-map based sparsification $\widehat{\mathcal{S}}$ is linear in $n$ (assuming that $k$ and the dimension $d$ are both constant). To summarize:

Theorem 6.8. Given a set of $n$ points $P \subset \mathbb{R}^d$, we can $3\log(1+\varepsilon)$-approximate the persistence diagram of the discrete Rips filtration in Eqn. (6.12) by that of the filtration in Eqn. (6.13) at the log-scale. The number of $k$-simplices that ever appear in the filtration in Eqn. (6.13) is $O((\frac{1}{\varepsilon})^{O(kd)} n)$.
