Topological Analysis of Point Clouds

6.2 Approximation via data sparsification

One issue with using the Vietoris-Rips or Čech filtrations in practice is that their sizes can become huge, even for a moderate number of points. For example, when the scale r is larger than the diameter of a point set P, the Čech and the Vietoris-Rips complexes of P contain every simplex spanned by points in P, in which case the size of the d-skeleton of C^r(P) or VR^r(P) is Θ(n^{d+1}) for n = |P|.

Figure 6.2: Vietoris-Rips complex: (b) at small scale, the Rips complex of points shown in (a) requires the two white points; (c) the two white points become redundant at larger scale.
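To get a feel for the Θ(n^{d+1}) growth mentioned above, the following minimal sketch counts the simplices of dimension at most d in the complete complex on n vertices, which is what the Čech and Vietoris-Rips complexes become once the scale exceeds the diameter. The function name full_skeleton_size is our own, chosen for illustration.

```python
from math import comb


def full_skeleton_size(n: int, d: int) -> int:
    """Number of simplices of dimension 0..d in the complete complex on n vertices.

    A k-simplex is a choice of k + 1 vertices, so there are C(n, k + 1) of them.
    """
    return sum(comb(n, k + 1) for k in range(d + 1))


# Size of the 2-skeleton (vertices, edges, triangles) for growing n.
for n in (10, 100, 1000):
    print(n, full_skeleton_size(n, 2))
```

Already for the 2-skeleton, the count grows cubically in n, which motivates the sparsification schemes described in this section.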

On the other hand, as shown in Figure 6.2, as the scale r increases, certain points could become "redundant", e.g., by contributing little or nothing to the underlying space of the union of all r-radius balls. Based on this observation, one can approximate these filtrations with sparsified filtrations of much smaller size. In particular, as the scale r increases, the point set P with which one constructs a complex is gradually sparsified, keeping the total number of simplices in the complex linear in the input size of P, where the dimension of the embedding space is assumed to be fixed.

We describe two data sparsification schemes in Sections 6.2.1 and 6.2.2, respectively. We focus on the Vietoris-Rips filtration for points in a Euclidean space R^d equipped with the standard Euclidean distance d.

6.2.1 Data sparsification for Rips filtration via reweighting

Most of the concepts presented in this section apply to general finite metric spaces, though we describe them for finite point sets equipped with a Euclidean metric. The reason for this choice is that the complexity analysis draws upon specific properties of Euclidean space. The reader is encouraged to think about generalizing the definitions and the technique to other metric spaces.

First we restate the definition of δ-sample and δ-sparse sample in Definition 2.17 slightly differently.

Definition 6.4 (Nets and net-tower). Given a finite set of points P ⊂ (R^d, d) and γ, γ′ ≥ 0, a subset Q ⊆ P is a (γ, γ′)-net of P if the following two conditions hold:

Covering condition: Q is a γ-sample for (P, d), i.e., for every p ∈ P, d(p, Q) ≤ γ.

Packing condition: Q is also γ′-sparse, i.e., for every pair of distinct points q, q′ ∈ Q, d(q, q′) ≥ γ′. If γ = γ′, we also refer to Q as a γ-net of P.

A single-parameter family of nets {N_γ}_γ is called a net-tower of P if (i) there is a constant c > 0 so that for all γ ∈ R, N_γ is a (γ, γ/c)-net for P, and (ii) N_γ ⊇ N_γ′ for any γ ≤ γ′.

Intuitively, a γ-net approximates a PCD P at resolution γ (Covering condition), while also being sparse (Packing condition). A net-tower provides a sequence of increasingly sparsified approximations of P.
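The two conditions of Definition 6.4 can be checked directly by brute force. The sketch below is only an illustration: the function name is_net and the representation of points as coordinate tuples are our own choices, and Q is assumed to be non-empty.

```python
from math import dist


def is_net(P, Q, gamma, gamma_prime):
    """Check whether Q (a non-empty subset of P) is a (gamma, gamma_prime)-net of P."""
    # Covering condition: every point of P is within gamma of some point of Q.
    covering = all(min(dist(p, q) for q in Q) <= gamma for p in P)
    # Packing condition: distinct points of Q are at least gamma_prime apart.
    packing = all(
        dist(Q[i], Q[j]) >= gamma_prime
        for i in range(len(Q))
        for j in range(i + 1, len(Q))
    )
    return covering and packing


# Five collinear points; keeping every other point gives a (1, 2)-net.
P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
Q = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(is_net(P, Q, 1.0, 2.0))
print(is_net(P, Q, 0.5, 2.0))
```

The check takes O(|P||Q| + |Q|²) distance evaluations, which is fine for small examples but not intended as an efficient implementation.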

Net-tower via farthest point sampling. We now introduce a specific net-tower constructed via the classical strategy of farthest point sampling, also called greedy permutation, e.g., in [56, 70].

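Given a point set P ⊂ (R^d, d), choose an arbitrary point p_1 from P and set P_1 = {p_1}. Pick p_i recursively as p_i ∈ argmax_{p ∈ P \ P_{i−1}} d(p, P_{i−1})¹, and set P_i = P_{i−1} ∪ {p_i}. Now set t_{p_i} = d(p_i, P_{i−1}), which we refer to as the exit-time of p_i. Based on these exit-times, we construct the following two families of sets:

Open net-tower {N_γ}_{γ∈R}, where N_γ := {p ∈ P | t_p > γ}.

Closed net-tower {N̄_γ}_{γ∈R}, where N̄_γ := {p ∈ P | t_p ≥ γ}.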

In what follows, we discuss a sparsification strategy for the Rips filtration of P using the above net-towers. The approach can be extended to other net-towers, such as the net-tower constructed using the net-tree data structure of [182].
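The farthest point sampling procedure and the resulting exit-times can be sketched as follows. This is a quadratic-time illustration rather than an efficient implementation. Two choices here are our own assumptions: the first point is given exit-time +∞ so that it belongs to every net, and the open (respectively closed) net at scale γ is taken to be the set of points with exit-time greater than (respectively at least) γ. All function names are hypothetical.

```python
from math import dist, inf


def greedy_permutation(P, start=0):
    """Farthest point sampling on a list of coordinate tuples P.

    Returns (order, exit_time): `order` lists indices into P in the order they
    are picked, and exit_time[i] is the distance from P[i] to the points picked
    before it. The first point gets exit-time +inf (an assumption made here).
    """
    n = len(P)
    order = [start]
    exit_time = [0.0] * n
    exit_time[start] = inf
    picked = [False] * n
    picked[start] = True
    # dist_to_picked[i] = distance from P[i] to the set of points picked so far
    dist_to_picked = [dist(P[i], P[start]) for i in range(n)]
    for _ in range(n - 1):
        # Any maximizer may be chosen; max() returns the first one it meets.
        nxt = max((i for i in range(n) if not picked[i]),
                  key=lambda i: dist_to_picked[i])
        exit_time[nxt] = dist_to_picked[nxt]
        picked[nxt] = True
        order.append(nxt)
        for i in range(n):
            dist_to_picked[i] = min(dist_to_picked[i], dist(P[i], P[nxt]))
    return order, exit_time


def open_net(exit_time, gamma):
    """Indices of points with exit-time > gamma (assumed open net)."""
    return [i for i, t in enumerate(exit_time) if t > gamma]


def closed_net(exit_time, gamma):
    """Indices of points with exit-time >= gamma (assumed closed net)."""
    return [i for i, t in enumerate(exit_time) if t >= gamma]


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
order, t = greedy_permutation(P)
print(order, t)
print(open_net(t, 1.0), closed_net(t, 1.0))
```

Since exit-times are non-increasing along the greedy order, each net is a prefix of the permutation, which is why the nets are nested as γ grows.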

Weights, weighted distance, and sparse Rips filtration. Given the exit-times t_p for all points p ∈ P, we now associate a weight w_p(α) with each point p at a scale α as follows (the graph of this weight function is shown on the right): for some constant 0 < ε < 1,

w_p(α) = 0            if α ≤ t_p/ε,
         α − t_p/ε    if t_p/ε < α < t_p/(ε(1−ε)),
         εα           if t_p/(ε(1−ε)) ≤ α.

Claim 6.1. The weight function w_p is a continuous, 1-Lipschitz, and non-decreasing function.
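As a sanity check on Claim 6.1, the sketch below evaluates a weight function on a grid. It assumes the usual three-piece form of the weight: w_p(α) = 0 for α ≤ t_p/ε, w_p(α) = α − t_p/ε for t_p/ε < α < t_p/(ε(1−ε)), and w_p(α) = εα beyond that. The function name weight and the sample values are ours.

```python
def weight(t_p, alpha, eps):
    """Piecewise-linear weight w_p(alpha) of a point with exit-time t_p.

    Assumed form: zero up to t_p/eps, slope 1 up to t_p/(eps*(1-eps)),
    and eps*alpha afterwards.
    """
    if alpha <= t_p / eps:
        return 0.0
    if alpha < t_p / (eps * (1.0 - eps)):
        return alpha - t_p / eps
    return eps * alpha


# Sample the weight of a point with exit-time 1 for eps = 1/4.
for alpha in (2.0, 4.0, 5.0, 8.0):
    print(alpha, weight(1.0, alpha, 0.25))
```

The two breakpoints match up: at α = t_p/(ε(1−ε)) both the middle and the last piece evaluate to t_p/(1−ε), which is what makes the function continuous.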

The parameter ε controls the resolution of the sparsification. The net-induced distance at scale α between input points is defined as:

d̂_α(p,q) := d(p,q) + w_p(α) + w_q(α). (6.8)

Definition 6.5 (Sparse (Vietoris-)Rips). Given a set of points P ⊂ R^d, a constant 0 < ε < 1, and the open net-tower {N_γ} as well as the closed net-tower {N̄_γ} for P as introduced above, the open sparse-Rips complex at scale α is defined as

Q^α := {σ ⊆ N_{ε(1−ε)α} | ∀p,q ∈ σ, d̂_α(p,q) ≤ 2α}; (6.9)

while the closed sparse-Rips complex at scale α is defined as

Q̄^α := {σ ⊆ N̄_{ε(1−ε)α} | ∀p,q ∈ σ, d̂_α(p,q) ≤ 2α}. (6.10)

Set S^α := ∪_{β≤α} Q̄^β, which we call the cumulative complex at scale α. The (ε-)sparse Rips filtration then refers to the R-indexed filtration S = {S^α ↪ S^β}_{α≤β}.

¹ Note that there may be multiple points that maximize d(p, P_{i−1}), making argmax_{p ∈ P \ P_{i−1}} d(p, P_{i−1}) a set. We can choose p_i to be any point in this set.

Obviously, Q^α ⊆ Q̄^α. Note that for α < β, Q^α is not necessarily included in Q^β (neither is Q̄^α in Q̄^β), while the inclusion S^α ⊆ S^β always holds.
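Since the sparse-Rips complexes are clique complexes, they are determined by their vertices and edges. The following sketch computes both at a single scale α. It rests on assumptions of ours: the weight has the three-piece form (0, then α − t_p/ε, then εα), the open net at scale γ consists of points with exit-time greater than γ and the closed net of points with exit-time at least γ, and the exit-times in the example were obtained from farthest point sampling on five collinear points with the first point assigned exit-time +∞. All names are hypothetical.

```python
from itertools import combinations
from math import dist, inf


def weight(t_p, alpha, eps):
    """Assumed three-piece weight w_p(alpha) of a point with exit-time t_p."""
    if alpha <= t_p / eps:
        return 0.0
    if alpha < t_p / (eps * (1.0 - eps)):
        return alpha - t_p / eps
    return eps * alpha


def net_distance(P, t, i, j, alpha, eps):
    """Net-induced distance: d(p, q) + w_p(alpha) + w_q(alpha)."""
    return dist(P[i], P[j]) + weight(t[i], alpha, eps) + weight(t[j], alpha, eps)


def sparse_rips_edges(P, t, alpha, eps, closed=False):
    """Vertices and edges of the open (or closed) sparse-Rips complex at scale alpha."""
    lam = eps * (1.0 - eps) * alpha
    # Vertex set: the open net (t > lam) or the closed net (t >= lam).
    verts = [i for i in range(len(P)) if (t[i] >= lam if closed else t[i] > lam)]
    edges = [(i, j) for i, j in combinations(verts, 2)
             if net_distance(P, t, i, j, alpha, eps) <= 2.0 * alpha]
    return verts, edges


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
t = [inf, 1.0, 2.0, 1.0, 4.0]  # exit-times of P under the assumptions above
for alpha in (1.0, 8.0):
    verts, edges = sparse_rips_edges(P, t, alpha, 0.25)
    print(alpha, verts, edges)
```

In this example the complex at the larger scale lives on fewer vertices than the one at the smaller scale, which illustrates why the complexes Q^α alone do not form a filtration and the cumulative complexes S^α are needed.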

In what follows, we show that the sparse Rips filtration approximates the standard Vietoris-Rips filtration {VR^r(P)} defined over P, and that the size of the sparse Rips filtration is only linear in n for any fixed dimension d, which is assumed to be constant. The main results are summarized in the following theorem.

Theorem 6.4. Let P ⊂ R^d be a set of n points where d is a constant, and R(P) = {VR^r(P)} be the Vietoris-Rips filtration over P. Given net-towers {N_γ} and {N̄_γ} induced by exit-times {t_p}_{p∈P}, let S(P) = {S^α} be its corresponding ε-sparse Rips filtration as defined in Definition 6.5. Then, for a fixed 0 < ε < 1/3,

(i) S(P) and R(P) are multiplicatively 1/(1−ε)-interleaved at the homology level. Thus, for any k ≥ 0, the persistence diagram Dgm_k S(P) is a log(1/(1−ε))-approximation of Dgm_k R(P) at the log-scale.

(ii) For any fixed dimension k ≥ 0, the total number of k-simplices that ever appear in S(P) is Θ((1/ε)^{kd} n).

In the remainder of this section, we sketch the proof of the above theorem.

Proof of part (i) of Theorem 6.4. To relate S(P) to R(P), we need to go through a sequence of intermediate steps. First, we define the relaxed Rips complex at scale α as

V̂R^α(P) := {σ ⊂ P | ∀p,q ∈ σ, d̂_α(p,q) ≤ 2α}.

The following claim ensures that the relaxed Rips complexes form a valid filtration connected by inclusions R̂(P) = {V̂R^α(P) ↪ V̂R^β(P)}_{α≤β}, which we call the relaxed Rips filtration.

Claim 6.2. If d̂_α(p,q) ≤ 2α ≤ 2β, then d̂_β(p,q) ≤ 2β.

Proof. The weight function w_p is 1-Lipschitz for any p ∈ P (Claim 6.1). Thus we have that

d̂_β(p,q) = d(p,q) + w_p(β) + w_q(β)
         ≤ d(p,q) + w_p(α) + (β−α) + w_q(α) + (β−α)
         ≤ d(p,q) + w_p(α) + w_q(α) − 2α + 2β ≤ 2β.

The last inequality follows from d(p,q) + w_p(α) + w_q(α) = d̂_α(p,q) ≤ 2α.

In what follows, we drop the argument P from notations such as in complexes VR^α(P) or in the sparse Rips filtration S(P) when the point set in question is understood.

Proposition 6.5. Let C = 1/(1−ε). Then for any α ≥ 0 we have that VR^{α/C} ⊆ V̂R^α ⊆ VR^α.
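Proposition 6.5 can be verified edge by edge, since all complexes involved are clique complexes. The short derivation below is a sketch under two assumptions: the weights satisfy 0 ≤ w_p(α) ≤ εα, and an edge pq belongs to VR^α exactly when d(p,q) ≤ 2α, mirroring the condition used for the relaxed complex.

```latex
% Right inclusion: the weights are non-negative, so
\hat{d}_\alpha(p,q) \le 2\alpha
  \;\Longrightarrow\;
  d(p,q) \le d(p,q) + w_p(\alpha) + w_q(\alpha) = \hat{d}_\alpha(p,q) \le 2\alpha .

% Left inclusion: with C = 1/(1-\varepsilon), an edge of VR^{\alpha/C} has
% d(p,q) \le 2\alpha/C = 2(1-\varepsilon)\alpha, and each weight is at most
% \varepsilon\alpha, so
\hat{d}_\alpha(p,q) = d(p,q) + w_p(\alpha) + w_q(\alpha)
  \le 2(1-\varepsilon)\alpha + \varepsilon\alpha + \varepsilon\alpha = 2\alpha .
```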

Next, we relate the filtrations S and R via the relaxed Rips filtration R̂ by connecting the sparse Rips complexes Q^α's and Q̄^α's. Consider the following projection of P to points in the net N_{ε(1−ε)α}, which are also the vertices of Q^α:

π_α(p) = p                             if p ∈ N_{ε(1−ε)α},
         argmin_{q ∈ N_{εα}} d(p,q)    otherwise.

Again, if argmin_{q ∈ N_{εα}} d(p,q) contains more than one point, we set π_α(p) to be an arbitrary one.

This projection is well-defined as N_{εα} ⊆ N_{ε(1−ε)α} given that 0 < ε < 1/3 < 1. We need several technical results on this projection map, which we rely on later to construct maps between appropriate versions of Rips complexes. First, the following two results are easy to show.

Fact 6.1. For every p ∈ P, d(p, π_α(p)) ≤ w_p(α) − w_{π_α(p)}(α) ≤ εα.

Fact 6.2. For every pair p,q ∈ P, we have that d̂_α(p, π_α(q)) ≤ d̂_α(p,q).
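Facts 6.1 and 6.2 can be checked numerically on a small example. The sketch below is not a proof, only a brute-force test under assumptions of ours: the weight has the three-piece form (0, then α − t_p/ε, then εα), the net at scale γ consists of points with exit-time greater than γ, and the exit-times come from farthest point sampling with the first point assigned exit-time +∞ (so that every net is non-empty). The function names are hypothetical.

```python
from math import dist, inf


def weight(t_p, alpha, eps):
    """Assumed three-piece weight w_p(alpha) of a point with exit-time t_p."""
    if alpha <= t_p / eps:
        return 0.0
    if alpha < t_p / (eps * (1.0 - eps)):
        return alpha - t_p / eps
    return eps * alpha


def net_distance(P, t, i, j, alpha, eps):
    """Net-induced distance: d(p, q) + w_p(alpha) + w_q(alpha)."""
    return dist(P[i], P[j]) + weight(t[i], alpha, eps) + weight(t[j], alpha, eps)


def project(P, t, i, alpha, eps):
    """Projection pi_alpha: keep net points, send others to the nearest point of N_{eps*alpha}."""
    if t[i] > eps * (1.0 - eps) * alpha:
        return i
    net = [j for j in range(len(P)) if t[j] > eps * alpha]
    return min(net, key=lambda j: dist(P[i], P[j]))


def check_facts(P, t, alpha, eps, tol=1e-9):
    """Return (fact_6_1_holds, fact_6_2_holds) for all points at this scale."""
    n = len(P)
    pi = [project(P, t, i, alpha, eps) for i in range(n)]
    gap = [weight(t[i], alpha, eps) - weight(t[pi[i]], alpha, eps) for i in range(n)]
    fact1 = all(dist(P[i], P[pi[i]]) <= gap[i] + tol and gap[i] <= eps * alpha + tol
                for i in range(n))
    fact2 = all(net_distance(P, t, i, pi[j], alpha, eps)
                <= net_distance(P, t, i, j, alpha, eps) + tol
                for i in range(n) for j in range(n))
    return fact1, fact2


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
t = [inf, 1.0, 2.0, 1.0, 4.0]  # exit-times of P under the assumptions above
print([check_facts(P, t, 0.5 * k, 0.25) for k in (2, 12, 16, 40)])
```

Fact 6.2 follows from Fact 6.1 and the triangle inequality, which is why the two checks tend to succeed or fail together.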

We are now ready to show that inclusion induces an isomorphism between the homology groups of the sparse Rips complex and the relaxed Rips complex.

Proposition 6.6. For any α ≥ 0, the inclusion i : Q^α ↪ V̂R^α induces an isomorphism at the homology level; that is, H_*(Q^α) ≅ H_*(V̂R^α) under the homomorphism i_* induced by i.

Proof. First, we consider the projection map π_α and argue that it induces a simplicial map π_α : V̂R^α → Q^α which is in fact a simplicial retraction². Next, we show that the map i∘π_α : V̂R^α → V̂R^α is contiguous to the identity map id : V̂R^α → V̂R^α. As π_α is a simplicial retraction, it follows that i_* is an isomorphism (Lemma 2 of [275]).

To see that π_α is a simplicial map, apply Fact 6.2 twice to have that

d̂_α(π_α(p), π_α(q)) ≤ d̂_α(p, π_α(q)) ≤ d̂_α(p,q). (6.11)

Since both Q^α and V̂R^α are clique complexes, this then implies that π_α is a simplicial map. Furthermore, it is easy to see that it is a retraction as π_α(q) = q for any q in the vertex set of Q^α (which is N_{ε(1−ε)α}).

Now to show that i∘π_α is contiguous to id, we observe that for any p,q ∈ P with d̂_α(p,q) ≤ 2α, all edges among {p, q, π_α(p), π_α(q)} exist and thus all simplices spanned by them exist in V̂R^α. Indeed, that d̂_α(π_α(p), π_α(q)) ≤ 2α is already shown above in Eqn (6.11). Combining Fact 6.1 with the fact that w_p(α) ≤ εα, we have that

d̂_α(p, π_α(p)) = d(p, π_α(p)) + w_p(α) + w_{π_α(p)}(α) ≤ 2w_p(α) ≤ 2εα < 2α.

Furthermore, by Fact 6.2, d̂_α(p, π_α(q)) ≤ d̂_α(p,q) ≤ 2α. Symmetric arguments can be applied to show that d̂_α(q, π_α(q)), d̂_α(q, π_α(p)) ≤ 2α. This establishes that i∘π_α is contiguous to id. This proves the proposition.

² A simplicial retraction f : K → L is a simplicial map from K onto a subcomplex L ⊆ K so that f(σ) = σ for any σ ∈ L.

The closed sparse-Rips complex Q̄^α is the relaxed Rips complex over the vertex set N̄_{ε(1−ε)α}, which is a superset of the vertex set of Q^α. Hence the above proposition also holds for the inclusion Q^α ↪ Q̄^α. It then follows that H_*(Q^α) ≅ H_*(Q̄^α). Finally, we show that the inclusion also induces an isomorphism between H_*(Q̄^α) and H_*(S^α), which when combined with the above results connects S^α and V̂R^α.

Proposition 6.7. For any α ≥ 0, the inclusion h : Q̄^α ↪ S^α induces an isomorphism at the homology level, that is, H_*(Q̄^α) ≅ H_*(S^α) under h_*.

Proof. Consider the sequence {S^α}_{α∈R}. First, we discretize α to have distinct values α_0 < α_1 < α_2 < ··· < α_m so that S^{α_0} = ∅, and the α_i's are exactly the times when the combinatorial structure of S^α changes. As S^α = ∪_{β≤α} Q̄^β, these are also exactly the moments when the combinatorial structure of Q̄^α changes. Hence we only need to prove the statement for such α_i's, and it will then work for all α's. Set λ_i := ε(1−ε)α_i. Note that the vertex set for Q̄^{α_i} is N̄_{λ_i} by the definition of Q̄^α in Eqn (6.10).

Now fix a k ≥ 0. We will show that h : Q̄^{α_k} ↪ S^{α_k} induces an isomorphism at the homology level. We use some intermediate complexes

T_{i,k} := ∪_{j=i}^{k} Q̄^{α_j}, for i ∈ [1,k].

Obviously, T_{1,k} = S^{α_k}, whereas T_{k,k} = Q̄^{α_k}. Set h_i : T_{i+1,k} ↪ T_{i,k}. The inclusion h : Q̄^{α_k} ↪ S^{α_k} can then be written as h = h_1 ∘ h_2 ∘ ··· ∘ h_{k−1}. In what follows, we prove that h_i : T_{i+1,k} ↪ T_{i,k} induces an isomorphism at the homology level for each i ∈ [1,k−1], which then proves the proposition.

First, note that while T_{i,k} is not necessarily the same as Q̄^{α_i}, they share the same vertex set.

Now, because of our choices of the α_i's and λ_i's, the vertex set of T_{i+1,k}, which is the vertex set of Q̄^{α_{i+1}}, namely N̄_{λ_{i+1}}, equals N_{λ_i}. Hence we can consider the projection π_{α_i} : T_{i,k} → T_{i+1,k} given by the projection of the vertex set N_{λ_{i−1}} = N̄_{λ_i} of T_{i,k} to the vertex set N_{λ_i} = N̄_{λ_{i+1}} of T_{i+1,k}. To prove that h_i induces an isomorphism at the homology level, by Lemma 2 of [275], it suffices to show that (i) π_{α_i} is a simplicial retraction, and (ii) h_i∘π_{α_i} is contiguous to the identity map id : T_{i,k} → T_{i,k}.

To prove (i), it is easy to verify that π_{α_i} is a retraction. To see that π_{α_i} induces a simplicial map, we need to show that for every σ ∈ T_{i,k}, π_{α_i}(σ) ∈ T_{i+1,k}. As π_{α_i} is a retraction, we only need to prove this for every σ ∈ T_{i,k} \ T_{i+1,k}. On the other hand, note that by definition, T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}. To this end, the argument in Proposition 6.6 also shows that π_{α_i} : Q̄^{α_i} → Q^{α_i} is a simplicial map, and furthermore, h′∘π_{α_i} is contiguous to id′ : Q̄^{α_i} → Q̄^{α_i}, where h′ : Q^{α_i} ↪ Q̄^{α_i}. Because of our choice of the α_i's, Q^{α_i} and Q̄^{α_{i+1}} have the same vertex set, which is N_{λ_i}. Furthermore, for every edge (p,q) ∈ Q^{α_i}, we have that d̂_{α_i}(p,q) ≤ 2α_i. As α_i < α_{i+1}, it follows from Claim 6.2 that d̂_{α_{i+1}}(p,q) ≤ 2α_{i+1}. Hence, the edge (p,q) is in Q̄^{α_{i+1}}. This implies that Q^{α_i} ⊆ Q̄^{α_{i+1}}. Putting everything together, it follows that, for every σ ∈ T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}, we have

π_{α_i}(σ) ∈ Q^{α_i} ⊆ Q̄^{α_{i+1}} ⊆ T_{i+1,k}.

Therefore, π_{α_i} is a simplicial map. This finishes the proof of (i).

Now we prove (ii), that is, h_i∘π_{α_i} is contiguous to the identity map id : T_{i,k} → T_{i,k}. This means that we need to show that for every σ ∈ T_{i,k}, σ ∪ π_{α_i}(σ) ∈ T_{i,k}. Again, as π_{α_i} is a simplicial retraction, we only need to show this for σ ∈ T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}. As mentioned above, using the same argument as in Proposition 6.6, we know that h′∘π_{α_i} is contiguous to the identity id′ : Q̄^{α_i} → Q̄^{α_i}. Hence we have that for every σ ∈ Q̄^{α_i}, σ ∪ π_{α_i}(σ) ∈ Q̄^{α_i}. It follows that σ ∪ π_{α_i}(σ) ∈ T_{i,k} as Q̄^{α_i} ⊆ T_{i,k}. This proves (ii), completing the proof of the proposition.

Combining Propositions 6.6 (as well as the discussion after this proposition) and 6.7, we have that {S^α} and {V̂R^α} induce isomorphic persistence modules. This, together with Proposition 6.5, implies part (i) of Theorem 6.4.

Proof of part (ii) of Theorem 6.4. Let S^(k) denote the set of k-simplices that ever appear in S(P), which is also the set of k-simplices in the last complex S^∞ of S(P). To bound the size of S^(k), we charge each simplex in S^(k) to its vertex with the smallest exit-time. Observe that a point p ∈ P does not contribute to any new edge in the sparse Rips complex Q̄^β for β > t_p/(ε(1−ε)). This means that to bound the number of simplices charged to p, we only need to bound such simplices in Q̄^{α_p} with α_p = t_p/(ε(1−ε)).

Set E(p) = {q ∈ P | (p,q) ∈ Q̄^{α_p} and t_p ≤ t_q}. We add p to E(p) too. We claim that |E(p)| = O((1/ε)^d). In particular, consider the closed net-tower {N̄_γ}; recall that N̄_γ is a γ-net. As E(p) ⊆ N̄_{t_p}, the packing condition of the net implies that the closest pair in E(p) has distance at least t_p between them. On the other hand, for each (p,q) ∈ Q̄^{α_p}, we have d̂_{α_p}(p,q) ≤ 2α_p, implying that E(p) ⊆ B(p, 2α_p). A simple packing argument then implies that the number of points in E(p) is

O((α_p/t_p)^d) = O((1/(ε(1−ε)))^d) = O((1/ε)^d).

The last equality follows because ε < 1/3 and thus 1−ε ≥ 2/3. The total number of k-simplices charged to p is bounded by O((1/ε)^{kd}), and the total number of k-simplices in S(P) is O((1/ε)^{kd} n), proving part (ii) of Theorem 6.4.

6.2.2 Approximation via simplicial tower

We now describe a different sparsification strategy by directly building a simplicial tower of Rips complexes connected by simplicial maps (Definition 4.1 and the discussion below it) whose persistent homology also approximates that of the standard Rips filtration. This sparsification is conceptually simpler, but its approximation quality is worse than the one introduced in the previous section.

Given a set of points P ⊂ R^d, α > 0, and some 0 < ε < 1, we are interested in the following filtration (which is a subsequence of the standard Rips filtration)

VR^α(P) ↪ VR^{α(1+ε)}(P) ↪ VR^{α(1+ε)^2}(P) ↪ ··· ↪ VR^{α(1+ε)^m}(P). (6.12)

We now construct a sparsified sequence by setting P_0 := P, building a sequence of point sets P_k, k = 0, 1, ..., m, where P_{k+1} is an (αε/2)(1+ε)^{k−1}-net of P_k, and terminating the process when P_m is of constant size.
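The sequence of point sets P_0 ⊇ P_1 ⊇ ··· ⊇ P_m can be sketched as follows. The nets here are built by a simple greedy scan, which is one possible way to obtain a γ-net and is our own choice, not necessarily the construction intended in the text. The stopping threshold stop_size stands in for "constant size" and, like the function names, is hypothetical.

```python
from math import dist


def greedy_net(points, radius):
    """Greedy scan: keep a point only if it is farther than `radius` from all kept points.

    The result covers `points` within `radius` and its points are pairwise
    more than `radius` apart, so it is a radius-net of `points`.
    """
    net = []
    for p in points:
        if all(dist(p, q) > radius for q in net):
            net.append(p)
    return net


def net_sequence(P, alpha, eps, stop_size=1):
    """Build P_0 = P, P_1, ..., P_m where P_{k+1} is an (alpha*eps/2)(1+eps)^(k-1)-net of P_k."""
    seq = [list(P)]
    k = 0
    while len(seq[-1]) > stop_size:
        radius = 0.5 * alpha * eps * (1.0 + eps) ** (k - 1)
        seq.append(greedy_net(seq[-1], radius))
        k += 1
    return seq


# Sixteen equally spaced points on a line.
P = [(float(x), 0.0) for x in range(16)]
seq = net_sequence(P, alpha=1.0, eps=0.5)
print([len(Pk) for Pk in seq])
```

Because the net radius grows geometrically in k, the loop ends after a number of rounds that is logarithmic in the ratio between the diameter of P and αε.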

Consider the following vertex map π_k : P_k → P_{k+1}, for any k ∈ [0, m−1], where π_k(v) is the

Here, the maps i_k's and j_k's are canonical inclusions.

The above result implies that at the homology level, the sequence in Eqn. (6.13) and the sequence in Eqn. (6.12) are weakly (1+ε)-interleaved in a multiplicative manner. In particular, different from the interleaving introduced by Definition 4.4 in Chapter 4.1, here the interleaving relations only hold at discrete index values of the filtrations.

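Definition 6.6 (Weak interleaving of vector space towers). Let U = {U_a}_{a ≥ a_0} and V = {V_a}_{a ≥ a_0} be two vector space towers with resolution a_0 ≥ 0. For some real number ε ≥ 0, we say that they are weakly ε-interleaved if there are two families of linear maps φ_i : U_{a_0+iε} → V_{a_0+(i+1)ε} and ψ_i : V_{a_0+iε} → U_{a_0+(i+1)ε}, for any integer i ≥ 0, such that any subdiagram of the following diagram commutes:

U: U_{a_0} → U_{a_0+ε} → U_{a_0+2ε} → ···
V: V_{a_0} → V_{a_0+ε} → V_{a_0+2ε} → ···   (6.14)

Here the horizontal maps are those of the two towers, and the diagonal maps between the two rows are the φ_i's and ψ_i's.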

It turns out that to verify the commutativity of the diagram in Eqn. (6.14), it is sufficient to verify it for all subdiagrams of the form as in Eqn. (4.3). Furthermore, weakly ε-interleaved persistence modules also have bounded bottleneck distances between their persistence diagrams [77], though the distance bound is relaxed to 3ε; that is, if U and V are weakly ε-interleaved, then d_b(Dgm U, Dgm V) ≤ 3ε. Analogous results hold for the multiplicative setting. Finally, using a similar packing argument as before, one can also show that the total number of k-simplices that ever appear in the simplicial-map based sparsification Ŝ is linear in n (assuming that k and the dimension d are both constant). To summarize:

Theorem 6.8. Given a set of n points P ⊂ R^d, we can 3 log(1+ε)-approximate the persistence diagram of the discrete Rips filtration in Eqn. (6.12) by that of the filtration in Eqn. (6.13) at the log-scale. The number of k-simplices that ever appear in the filtration in Eqn. (6.13) is O((1/ε)^{O(kd)} n).
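To compare the log-scale approximation guarantees of the two schemes, one can simply evaluate the bounds log(1/(1−ε)) from Theorem 6.4 and 3 log(1+ε) from Theorem 6.8 for a few values of ε. The sketch below does only this arithmetic; the function names are ours, and the two theorems concern slightly different reference filtrations (the full Rips filtration versus its discrete subsequence), so the comparison is indicative rather than exact.

```python
from math import log


def reweighting_bound(eps):
    """Log-scale bound for the sparse Rips filtration: log(1 / (1 - eps))."""
    return log(1.0 / (1.0 - eps))


def tower_bound(eps):
    """Log-scale bound for the simplicial-tower scheme: 3 * log(1 + eps)."""
    return 3.0 * log(1.0 + eps)


for eps in (0.05, 0.1, 0.2, 0.3):
    print(eps, round(reweighting_bound(eps), 4), round(tower_bound(eps), 4))
```

For every ε in the range 0 < ε < 1/3 covered by Theorem 6.4, the first bound is smaller, in line with the remark that the tower-based scheme trades approximation quality for conceptual simplicity.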

From the document Computational Topology for Data Analysis (pages 165-173)