
Proof. i) It is a consequence of the same inequality for $\tilde W_p$, which in turn holds because $\tilde W_p$ is (by our identification with $\hat W_p$) a Kantorovich–Wasserstein metric, for which this inequality is but an application of Hölder's inequality. Indeed, for $1 \le p \le q$, Hölder's inequality gives both $\tilde{\mathcal P}_q(Y|X) \subset \tilde{\mathcal P}_p(Y|X)$ and
\[
\tilde W_p(\sigma,\tau) \le \tilde W_q(\sigma,\tau)
\]
for every $\sigma,\tau \in \tilde{\mathcal P}_q(Y|X)$. Hence, it follows that the same is true when taking infima, so that
\[
W_p^0(\mu,\nu) = \inf_{\substack{\sigma,\tau\in\tilde{\mathcal P}_p(Y|X)\\ \sigma_0=\mu,\ \tau_0=\nu}} \tilde W_p(\sigma,\tau)
\le \inf_{\substack{\sigma,\tau\in\tilde{\mathcal P}_q(Y|X)\\ \sigma_0=\mu,\ \tau_0=\nu}} \tilde W_q(\sigma,\tau) = W_q^0(\mu,\nu).
\]

Now by the same reasoning, we see that also $W_p^\flat(\mu,\nu) \le W_q^\flat(\mu,\nu)$, because
\[
\sum_{i=1}^n W_p^0(\eta_{i-1},\eta_i) \le \sum_{i=1}^n W_q^0(\eta_{i-1},\eta_i)
\]
and $\mathcal P_q^{\mathrm{sub}}(Y) \subset \mathcal P_p^{\mathrm{sub}}(Y)$. Finally, the case of $W_p^\sharp$ follows again in the same way.

ii) This is again true due to the same result for Kantorovich-Wasserstein metrics.

In the case of bounded spaces, all the spaces $\tilde{\mathcal P}_p(Y|X)$ for every possible $p$ coincide, and so do the $\mathcal P_p^{\mathrm{sub}}(Y)$'s, so that we can argue in the same way as in part i), given the starting point
\[
\tilde W_q \le \tilde W_p^{\,p/q}\, \operatorname{diam}(X)^{(q-p)/q}
\]
for $1 \le p \le q$.
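For completeness, the starting-point inequality $\tilde W_q \le \tilde W_p^{\,p/q}\operatorname{diam}(X)^{(q-p)/q}$ follows from the standard Hölder-type splitting for Kantorovich–Wasserstein distances over a bounded space; here is a sketch for a generic bounded metric space $(Z,d_Z)$, to which our setting reduces via the identification of $\tilde W_p$ with $\hat W_p$:

```latex
% For any coupling \pi of two measures on a bounded space (Z, d_Z) and
% 1 <= p <= q, split d_Z^q = d_Z^p \cdot d_Z^{q-p} and bound the second
% factor by the diameter:
\int_{Z\times Z} d_Z^{\,q}\,\mathrm{d}\pi
  = \int_{Z\times Z} d_Z^{\,p}\, d_Z^{\,q-p}\,\mathrm{d}\pi
  \le \operatorname{diam}(Z)^{\,q-p} \int_{Z\times Z} d_Z^{\,p}\,\mathrm{d}\pi.
% Evaluating at a W_p-optimal coupling and taking q-th roots yields
W_q \le \big(W_p^{\,p}\,\operatorname{diam}(Z)^{\,q-p}\big)^{1/q}
    = W_p^{\,p/q}\,\operatorname{diam}(Z)^{(q-p)/q}.
```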

4.4 Induced Length Metric: Comparison

Also here we continue to assume that $X$ is a length space.

Just like the identification of $\tilde W_p$ with the Kantorovich–Wasserstein metric $\hat W_p$ on the doubled space immediately gave us some properties of the space of charged probabilities, it will now be convenient to compare the previously defined metrics $W_p^\flat$ and $W_p^\sharp$ to a Kantorovich–Wasserstein metric on the one-point completion we introduced in Subsection 2.2. We will do so by providing representation formulas similar to the one in Lemma 4.2.6 for $W_p^0$. To state them, let us introduce some more auxiliary transport cost functions.

Definition 4.4.1. i) $W_p'$ will denote the $L^p$-Kantorovich–Wasserstein distance on $\mathcal P_p(Y')$ induced by the distance $d'$.

ii) Extending each subprobability measure $\mu \in \mathcal P_p^{\mathrm{sub}}(Y)$ to a probability measure $\mu' \in \mathcal P_p(Y')$ by $\mu' := \mu + (1-\mu(Y))\,\delta_\partial$ induces a bijection between $\mathcal P_p^{\mathrm{sub}}(Y)$ and $\mathcal P_p(Y')$ (see the lemma below). The induced distance on $\mathcal P_p^{\mathrm{sub}}(Y)$ will again be denoted by $W_p'$, i.e.
\[
W_p'(\mu,\nu) := W_p'(\mu',\nu').
\]

iii) For subprobability measures $\mu,\nu$ with equal mass $\mu(Y)=\nu(Y)$ we will also make use of the transportation cost
\[
\bar W_p(\mu,\nu)^p := \inf_{q\in\mathrm{Cpl}(\mu,\nu)} \int_{Y\times Y} \bar d(x,y)^p \,\mathrm dq(x,y) \tag{4.4.1}
\]
induced by $\bar d$ (which was defined in (2.2.1)).

iv) For a subprobability $\mu \in \mathcal P^{\mathrm{sub}}(Y)$ define
\[
W_p'(\mu,0)^p := W_p'(\mu',\delta_\partial)^p = \int_Y d'(x,\partial)^p \,\mathrm d\mu(x), \tag{4.4.2}
\]
with $0$ denoting the subprobability measure with vanishing total mass.

Lemma 4.4.2. i) The map
\[
\mathcal P_p^{\mathrm{sub}}(Y) \to \mathcal P_p(Y'), \qquad \mu \mapsto \mu' := \mu + (1-\mu(Y))\,\delta_\partial
\]
is a bijection, and it is an isometry when we equip both spaces with $W_p'$.

ii) If $Y$ is additionally totally bounded, $W_p'$ metrizes the vague convergence in $\mathcal P_p^{\mathrm{sub}}(Y)$.

Proof. i) The map is clearly a bijection with inverse $\mathcal P_p(Y') \ni \mu' \mapsto \mu := \mu'|_Y \in \mathcal P_p^{\mathrm{sub}}(Y)$. By the definition of $W_p'$ on $\mathcal P_p^{\mathrm{sub}}(Y)$ it is an isometry.

ii) Let us show that $W_p'$ metrizes the vague convergence in $\mathcal P_p^{\mathrm{sub}}(Y)$. Given a vaguely converging sequence $\mu_n \to \mu$ in $\mathcal P_p^{\mathrm{sub}}(Y)$, define $\mu_n' := \mu_n + (1-\mu_n(Y))\,\delta_\partial \in \mathcal P_p(Y')$. This is a sequence of probability measures on a compact space; hence, for every subsequence, Prokhorov's theorem provides a weakly converging further subsequence.

Since the restriction of all these limits to $Y$ has to coincide with $\mu$, the whole sequence $\mu_n'$ converges weakly to $\mu' := \mu + (1-\mu(Y))\,\delta_\partial$, so that $W_p'(\mu_n',\mu') \to 0$. Then also $W_p'(\mu_n,\mu) \to 0$.

Assume conversely that $W_p'(\mu_n,\mu) \to 0$. By definition this means that we have convergence $W_p'(\mu_n',\mu') \to 0$, which in turn assures that $\mu_n' \to \mu'$ weakly in $Y'$. Then the restrictions to $Y$ converge vaguely.

Remark 4.4.3. Note that without the assumption of total boundedness the vague convergence in $Y$ would not imply the weak convergence of the corresponding probability measures on $Y'$, since they could lose mass at infinity instead of at the boundary.

Remark 4.4.4. One could equally well define
\[
W_p''(\mu,\nu) := \inf\big\{ W_p'(\check\mu,\check\nu) \,\big|\, \check\mu,\check\nu \in \mathcal M(Y'),\ \check\mu|_Y = \mu,\ \check\nu|_Y = \nu \big\}.
\]
For $p=1$ the metrics $W_1'$ and $W_1''$ coincide, but for $p>1$ this is no longer true. This is due to the fact that $(d'(x,\partial)+d'(y,\partial))^p = d'(x,\partial)^p + d'(y,\partial)^p$ only for $p=1$. Intuitively speaking, for $p>1$ it makes a difference whether we transport mass through the boundary point or to it – however, for the latter we need to allow for masses bigger than 1. Take for instance $X = \mathbb R$, $Y = (-3,3)$ and $\mu = \delta_{-2}$, $\nu = \delta_2$. Then $\mu' = \mu$ and $\nu' = \nu$, so that $W_p'(\mu,\nu)^p = d'(-2,2)^p = 2^p$, whereas $W_p''(\mu,\nu)^p \le W_p'(\mu+\delta_\partial,\nu+\delta_\partial)^p = d'(-2,\partial)^p + d'(2,\partial)^p = 2$.

The metric $W_2''$ coincides with Figalli & Gigli's metric $Wb_2$ [FG10].
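The arithmetic in the example above can be checked numerically. The following is a small sketch (the helper functions and names are ours, not part of the text), comparing the cost of transporting $\mu = \delta_{-2}$ to $\nu = \delta_2$ through the collapsed boundary point of $Y = (-3,3)$ with the cost of transporting both Diracs to it:

```python
# Numerical sanity check for the example in Remark 4.4.4 (a sketch; the
# helper names are ours). On Y = (-3,3), transporting µ = δ_{-2} to ν = δ_2
# *through* the collapsed boundary point costs 2^p, while transporting both
# Diracs *to* the boundary (allowing mass 2 at ∂) costs only 1^p + 1^p = 2.

def d_boundary(x, a=-3.0, b=3.0):
    """Distance from x in Y = (a, b) to the collapsed boundary point."""
    return min(x - a, b - x)

def d_prime(x, y, a=-3.0, b=3.0):
    """Metric on the one-point completion Y': go through Y or through the boundary."""
    return min(abs(x - y), d_boundary(x, a, b) + d_boundary(y, a, b))

p = 2
cost_through = d_prime(-2.0, 2.0) ** p                    # = 2^p = 4
cost_to = d_boundary(-2.0) ** p + d_boundary(2.0) ** p    # = 1 + 1 = 2
print(cost_through, cost_to)   # 4.0 2.0
```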


We start by characterizing the metric $W_p'$ in terms of $L^p$-transportation and -annihilation costs.

Lemma 4.4.5. For all $\mu,\nu \in \mathcal P_p^{\mathrm{sub}}(Y)$
\[
W_p'(\mu,\nu)^p = \inf\Big\{ W_p(\mu_1,\nu_1)^p + \bar W_p(\mu_2,\nu_2)^p + W_p'(\mu_0,0)^p + W_p'(\nu_0,0)^p \,\Big|\,
\mu = \mu_1+\mu_2+\mu_0,\ \nu = \nu_1+\nu_2+\nu_0,\ (\mu+\nu_0)(Y) \le 1,\ (\nu+\mu_0)(Y) \le 1 \Big\}. \tag{4.4.3}
\]
In the case $p=1$, contributions from the term $\bar W_p(\mu_2,\nu_2)^p$ can be avoided; in other words, one can always choose $\mu_2 = \nu_2 = 0$.

Proof. The derivation of this formula is straightforward. The transport decomposes into trivial transports within $\partial$ (which do not appear in the formula), transports between $Y$ and $\partial$ (given by $W_p'(\mu_0,0)^p + W_p'(\nu_0,0)^p$), and transports within $Y$; the latter split further into transports using $d$ and $\bar d$ (given by $W_p(\mu_1,\nu_1)^p + \bar W_p(\mu_2,\nu_2)^p$). One can construct these decompositions more explicitly as in the proof of Lemma 4.2.5.

The resulting couplings are still optimal between their marginals. The inequalities in the constraints are due to the fact that we decompose the probability measures $\mu',\nu'$ instead of the subprobabilities $\mu,\nu$, and the trivial transport within $\partial$ can be omitted.

For the vanishing of the $\bar W_p$-term note that in the case $p=1$ one has $[d'(x,\partial)+d'(y,\partial)]^p = d'(x,\partial)^p + d'(y,\partial)^p$, meaning that the term can be absorbed into the annihilation terms $W_p'(\mu_0,0)^p + W_p'(\nu_0,0)^p$.

The following lemma discusses the connection between our two annihilation costs $W_p'(\,\cdot\,,0)$ and $\bar W_p$.

Lemma 4.4.6. i) For all $\mu,\nu \in \mathcal P_1(Y)$
\[
\bar W_1(\mu,\nu) = \inf\big\{ W_1(\mu,\xi) + W_1(\xi,\nu) \,\big|\, \xi \in \mathcal P(\partial Y) \big\}.
\]
ii) For all $p \ge 1$ and all $\mu \in \mathcal P_p(Y)$
\[
W_p'(\mu,0) = \inf\big\{ W_p(\mu,\xi) \,\big|\, \xi \in \mathcal P(\partial Y) \big\}.
\]
iii) For all $p \ge 1$ and all $\mu \in \mathcal P_p(Y)$
\[
2^{-1+1/p}\, W_p'(\mu,0) \le \bar W_p(\mu) \le W_p'(\mu,0).
\]
In particular, $\bar W_1(\mu) = W_1'(\mu,0)$.

Proof. i) By the triangle inequality, we have for every $\xi \in \mathcal P(\partial Y)$
\[
\bar W_1(\mu,\nu) \le W_1(\mu,\xi) + W_1(\xi,\nu).
\]
Making use of Lemma 4.1.5, we consider the measures as given on the different copies, $\mu \in \mathcal P(Y^+)$ and $\nu \in \mathcal P(Y^-)$. Take now a $\hat W_1$-optimal coupling $q \in \mathrm{Cpl}(\mu,\nu) \subset \mathcal P(Y^+ \times Y^-)$. Let $\varepsilon > 0$ and
\[
G_\varepsilon(x,y) := \big\{ \gamma \in C^0([0,1],\hat X) \,\big|\, \gamma_0 = x,\ \gamma_1 = y,\ |L(\gamma) - \hat d(x,y)| \le \varepsilon \big\}
\]
be the set of $\varepsilon$-geodesics in $\hat X$ connecting $x$ and $y$. Given a curve $\gamma$ in $\hat X$ with $\gamma_0 \in Y^+$ and $\gamma_1 \in Y^-$, define $\alpha(\gamma) := \inf\{ s > 0 \mid \gamma_s \notin Y^+ \}$ and $z(\gamma) := \gamma_{\alpha(\gamma)}$. Then $z(\gamma) \in \partial Y$, and given a measurable selection $\Gamma_\varepsilon \colon \hat X \times \hat X \to C^0([0,1],\hat X)$ with $\Gamma_\varepsilon(x,y) \in G_\varepsilon(x,y)$ (which exists by our measurable selection Lemma 2.5.6), we define the "boundary crossing points" $Z := z \circ \Gamma_\varepsilon \colon Y^+ \times Y^- \to \partial Y$. Using the projection $\mathrm{pr}_1 \colon \hat X \times \hat X \to \hat X$, $(x,y) \mapsto x$, we get a map
\[
(\mathrm{pr}_1, Z) \colon Y^+ \times Y^- \to Y^+ \times \partial Y
\]
and define the push-forward measure $Q_1 := (\mathrm{pr}_1,Z)_\# q \in \mathcal P(Y^+ \times \partial Y)$.

Let us check that this is a coupling between $\mu$ and $\xi := Z_\# q \in \mathcal P(\partial Y)$: Given a measurable set $A \subset Y^+$,
\begin{align*}
(\mathrm{pr}_1,Z)^{-1}(A \times \partial Y) &= \{ (x,y) \in Y^+ \times Y^- \mid (\mathrm{pr}_1(x,y), Z(x,y)) \in A \times \partial Y \} \\
&= \{ (x,y) \in Y^+ \times Y^- \mid (x, Z(x,y)) \in A \times \partial Y \} \\
&= A \times Y^-,
\end{align*}
which yields that $Q_1(A \times \partial Y) = q(A \times Y^-) = \mu(A)$. On the other hand, given $B \subset \partial Y$ measurable,
\begin{align*}
(\mathrm{pr}_1,Z)^{-1}(Y^+ \times B) &= \{ (x,y) \in Y^+ \times Y^- \mid (\mathrm{pr}_1(x,y), Z(x,y)) \in Y^+ \times B \} \\
&= \{ (x,y) \in Y^+ \times Y^- \mid (x, Z(x,y)) \in Y^+ \times B \} \\
&= (Y^+ \times Y^-) \cap Z^{-1}(B) \\
&= Z^{-1}(B),
\end{align*}
hence in this case we have $Q_1(Y^+ \times B) = q(Z^{-1}(B)) = \xi(B)$. Analogously one sees that $Q_2 := (Z,\mathrm{pr}_2)_\# q$ is a coupling between $\xi$ and $\nu$.

Now what is left to prove is that $\xi$ is an "almost-midpoint". Since for $y \in \partial Y$ we have $\hat d(x,y) = d(x,y)$, together with Lemma 2.1.7 we get that
\begin{align*}
W_1(\mu,\xi) &\le \int_{Y^+\times\partial Y} d(x,y)\,\mathrm dQ_1(x,y) \\
&= \int_{Y^+\times Y^-} d\big(\mathrm{pr}_1(x,y), Z(x,y)\big)\,\mathrm dq(x,y) \\
&= \int_{Y^+\times Y^-} d\big(x, \Gamma_\varepsilon(x,y)_t\big|_{t=\alpha(\Gamma_\varepsilon(x,y))}\big)\,\mathrm dq(x,y) \\
&\le \int_{Y^+\times Y^-} \Big( \alpha(\Gamma_\varepsilon(x,y))\,\hat d(x,y) + \varepsilon \Big)\,\mathrm dq(x,y).
\end{align*}


Using in the same way $Q_2$ as a coupling between $\xi$ and $\nu$, we finally get that
\begin{align*}
W_1(\mu,\xi) + W_1(\xi,\nu) &\le \int_{Y^+\times Y^-} \Big( \alpha(\Gamma_\varepsilon(x,y))\,\hat d(x,y) + \varepsilon \Big)\,\mathrm dq(x,y) + \int_{Y^+\times Y^-} \Big( \big(1-\alpha(\Gamma_\varepsilon(x,y))\big)\,\hat d(x,y) + \varepsilon \Big)\,\mathrm dq(x,y) \\
&= \int_{Y^+\times Y^-} \Big( \hat d(x,y) + 2\varepsilon \Big)\,\mathrm dq(x,y) \\
&= \bar W_1(\mu,\nu) + 2\varepsilon.
\end{align*}

ii) Given an arbitrary $\xi \in \mathcal P(\partial Y)$ and a $W_p$-optimal coupling $q \in \mathrm{Cpl}(\mu,\xi)$, we get
\[
W_p(\mu,\xi)^p = \int_{X\times X} d(x,y)^p\,\mathrm dq(x,y) \ge \int_{X\times X} d'(x,\partial)^p\,\mathrm dq(x,y) = W_p'(\mu,0)^p.
\]
For the other inequality, similarly as in part i), we will define a map and use its push-forward measure. Given $x \in Y$, let
\[
G_\varepsilon(x) := \big\{ z \in \partial Y \,\big|\, |d(x,z) - d'(x,\partial)| \le \varepsilon \big\}
\]

be the set of boundary points closest to $x$ (up to $\varepsilon$). Let us show that the graph of the multivalued map $G_\varepsilon$ is closed: Take a sequence $(x_n,z_n)$ with $z_n \in G_\varepsilon(x_n)$ that converges to $(x,z)$ in $X \times X$. Then $z \in \partial Y$ by the closedness of the boundary, and
\[
|d(x,z) - d'(x,\partial)| = \lim_{n\to\infty} |d(x_n,z_n) - d'(x_n,\partial)| \le \varepsilon,
\]
hence $z \in G_\varepsilon(x)$. Thus we can apply the measurable selection Theorem 2.5.4 and get a measurable function $\Phi_\varepsilon \colon Y \to \partial Y$ such that $\Phi_\varepsilon(x) \in G_\varepsilon(x)$. Then for the measure $\xi_\varepsilon := (\Phi_\varepsilon)_\# \mu$ we see that
\[
W_p(\mu,\xi_\varepsilon)^p \le \int_X d(x,\Phi_\varepsilon(x))^p\,\mathrm d\mu(x) \le \int_X \big( d'(x,\partial) + \varepsilon \big)^p\,\mathrm d\mu(x).
\]

First of all observe that the moment bound of $\mu$ implies that also the $d'$-moment $\int d'(\,\cdot\,,\partial)^p\,\mathrm d\mu$ is finite. Since $\mu$ is a probability measure, constant functions are integrable, thus also the sum $d'(\,\cdot\,,\partial) + \varepsilon$ is in $L^p(\mu)$. This sum converges pointwise to $d'(\,\cdot\,,\partial)$ as $\varepsilon \to 0$, and it is dominated by $d'(\,\cdot\,,\partial) + 1 \in L^p(\mu)$. By the dominated convergence theorem we get convergence in $L^p(\mu)$ as $\varepsilon \to 0$, i.e.
\[
\lim_{\varepsilon\to 0} W_p(\mu,\xi_\varepsilon)^p \le \int_X d'(x,\partial)^p\,\mathrm d\mu(x) = W_p'(\mu,0)^p.
\]

iii) The triangle inequality for $d$ implies that $W_p(\mu,\xi) + W_p(\xi,\mu) \ge \bar W_p(\mu,\mu)$ for all $\xi \in \mathcal P(\partial Y)$. Thus $W_p'(\mu,0) \ge \frac12 \bar W_p(\mu,\mu) = \bar W_p(\mu)$. An estimate in the other direction is obtained as follows:
\begin{align*}
\bar W_p(\mu)^p = 2^{-p}\,\bar W_p(\mu,\mu)^p &= 2^{-p} \int_{X\times X} \Big( \inf_{z\in X\setminus Y} \big( d(x,z) + d(z,y) \big) \Big)^p\,\mathrm dq(x,y) \\
&\ge 2^{-p} \int_{X\times X} \Big( \inf_{z\in X\setminus Y} d(x,z) + \inf_{w\in X\setminus Y} d(w,y) \Big)^p\,\mathrm dq(x,y) \\
&\ge 2^{1-p} \int_{X\times X} \Big( \inf_{z\in X\setminus Y} d(x,z) \Big)^p\,\mathrm dq(x,y) = 2^{1-p}\, W_p'(\mu,0)^p,
\end{align*}
where $q$ denotes any $\bar W_p$-optimal coupling of $\mu$ and $\mu$; in the last inequality we used the superadditivity $(a+b)^p \ge a^p + b^p$ for $p \ge 1$ together with the fact that both marginals of $q$ equal $\mu$.

Remark 4.4.7. In general, $\bar W_p(\mu)$ and $W_p'(\mu,0)$ will not coincide. Our lower bound for $\bar W_p(\mu)/W_p'(\mu,0)$ is sharp. For instance, let $Y = (0,2) \subset X = \mathbb R$ and $\mu = \frac12(\delta_1 + \delta_\varepsilon)$ for some $\varepsilon \in (0,1)$. Then $W_p'(\mu,0)^p = \frac12\big( d'(1,\partial)^p + d'(\varepsilon,\partial)^p \big) = \frac12(1+\varepsilon^p)$, whereas $\bar W_p(\mu)^p = \big(\frac{1+\varepsilon}{2}\big)^p$. Thus
\[
\frac{\bar W_p(\mu)}{W_p'(\mu,0)} = 2^{-1+\frac1p}\, \frac{1+\varepsilon}{(1+\varepsilon^p)^{1/p}} \longrightarrow 2^{-1+\frac1p} \qquad \text{as } \varepsilon \to 0.
\]
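The convergence of this ratio can be checked numerically. A small sketch (the function name and the chosen values of $p$ and $\varepsilon$ are ours), plugging the two closed-form costs computed in the remark into the quotient:

```python
# Numerical check of the limit in Remark 4.4.7 (a sketch): Y = (0,2),
# µ = (δ_1 + δ_ε)/2.

def ratio(p, eps):
    # \bar W_p(µ) = (1+ε)/2: for small ε it is optimal to swap the two atoms
    # through the boundary point 0, each trip costing 1+ε.
    wbar = (1.0 + eps) / 2.0
    # W_p'(µ, 0) = ((1 + ε^p)/2)^(1/p): annihilate each atom at its nearest
    # boundary point (cost 1 for δ_1, cost ε for δ_ε).
    wprime = ((1.0 + eps ** p) / 2.0) ** (1.0 / p)
    return wbar / wprime

p = 3
for eps in (0.1, 0.01, 0.001):
    print(ratio(p, eps))
# The printed values approach 2**(-1 + 1/p) ≈ 0.63 as ε → 0.
```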

Now we are in a position to compare $W_p^\flat$, $W_p^\sharp$ and $W_p'$. It turns out that for $p=1$ they all coincide.

Theorem 4.4.8. i) For all $\mu,\nu \in \mathcal P_1^{\mathrm{sub}}(Y)$
\[
W_1^\flat(\mu,\nu) = W_1^\sharp(\mu,\nu) = W_1'(\mu,\nu).
\]
ii) More generally, for all $p \ge 1$ and all $\mu,\nu \in \mathcal P_p^{\mathrm{sub}}(Y)$
\[
W_1'(\mu,\nu) \le W_p^\flat(\mu,\nu) \le W_p^\sharp(\mu,\nu) \le W_p'(\mu,\nu).
\]
In particular, $W_p^\flat$, $W_p^\sharp$ and $W_p^0$ do not vanish outside the diagonal.

Proof. i) According to Lemma 4.4.5 and Lemma 4.4.6,
\[
W_1'(\mu,\nu) = \inf\big\{ W_1(\mu_1,\nu_1) + \bar W_1(\mu_0) + \bar W_1(\nu_0) \,\big|\, \mu = \mu_1 + \mu_0,\ \nu = \nu_1 + \nu_0 \big\} \tag{4.4.4}
\]
for all subprobability measures $\mu,\nu \in \mathcal P_1^{\mathrm{sub}}(Y)$. Together with Lemma 4.2.6 i) this implies $W_1'(\mu,\nu) \le W_1^0(\mu,\nu)$. As $W_1^\flat$ is the biggest metric below $W_1^0$, we have $W_1' \le W_1^\flat$. Using the fact that $W_1'$ is a length metric, this yields

\begin{align*}
W_1^\sharp(\mu,\nu) &= \inf_{\substack{\eta\colon \mu\rightsquigarrow\nu\\ W_1^\flat\text{-cont.}}} \ \sup_{0=s_0<\dots<s_n=1} \ \sum_{i=1}^n W_1^\flat(\eta_{s_{i-1}},\eta_{s_i}) \\
&\ge \inf_{\substack{\eta\colon \mu\rightsquigarrow\nu\\ W_1^\flat\text{-cont.}}} \ \sup_{0=s_0<\dots<s_n=1} \ \sum_{i=1}^n W_1'(\eta_{s_{i-1}},\eta_{s_i}) \\
&\ge \inf_{\substack{\eta\colon \mu\rightsquigarrow\nu\\ W_1'\text{-cont.}}} \ \sup_{0=s_0<\dots<s_n=1} \ \sum_{i=1}^n W_1'(\eta_{s_{i-1}},\eta_{s_i}) = W_1'(\mu,\nu).
\end{align*}


Since $W_1^\sharp$ is the length metric induced by $W_1^\flat$, one gets $W_1^\flat \le W_1^\sharp$.

Now we are going to show that $W_1^\sharp \le W_1'$. To do so, starting from an almost-geodesic in the representation of $W_1'$ given by (4.4.4), we define a new curve connecting $\mu$ and $\nu$ and estimate its $W_1^\flat$-length by using a clever decomposition in the representation formula for $W_1^0$ given by Lemma 4.2.6 ii).

Let $\varepsilon > 0$ and take a decomposition $\mu = \mu_1 + \mu_0$, $\nu = \nu_1 + \nu_0$ in (4.4.4) such that
\[
W_1'(\mu,\nu) + \varepsilon \ge W_1(\mu_1,\nu_1) + \bar W_1(\mu_0) + \bar W_1(\nu_0).
\]

Then we take an $\varepsilon$-$W_1$-geodesic $(\eta_{s,1})_{s\in[0,1]}$ connecting $\mu_1$ and $\nu_1$ that is supported on $\varepsilon$-geodesics in $Y$. Define
\[
\tilde\eta'_{s,0} := \begin{cases} (1-2s)\,\mu_0 + 2s\,\mu_0(Y)\,\delta_\partial, & s\in[0,\tfrac12], \\[2pt] (2s-1)\,\nu_0 + 2(1-s)\,\mu_0(Y)\,\delta_\partial, & s\in(\tfrac12,1]. \end{cases}
\]
This is a curve connecting $\mu_0$ and $\nu_0$. Take the restriction $\tilde\eta_{s,0} := \tilde\eta'_{s,0}|_Y$ and define
\[
\tilde\eta_s := \eta_{s,1} + \tilde\eta_{s,0},
\]

which is a curve connecting $\mu$ and $\nu$. To estimate the $W_1^\flat$-length of the restricted curve $\tilde\eta_s$ it is useful to get a bound on $W_1^0(\tilde\eta_s,\tilde\eta_t)$. Consider the case $0 \le s \le t \le \frac12$: We can rewrite
\[
\tilde\eta_s = \underbrace{(\eta_{s,1} + \tilde\eta_{t,0})}_{\text{``}\mu_1\text{'' in Lemma 4.2.6}} + \underbrace{2(t-s)\,\mu_0}_{\text{``}\mu_0\text{''}}
\qquad\text{and}\qquad
\tilde\eta_t = \underbrace{(\eta_{t,1} + \tilde\eta_{t,0})}_{\text{``}\nu_1\text{''}} + \underbrace{0}_{\text{``}\nu_0\text{''}}.
\]
This is an admissible decomposition in Lemma 4.2.6 ii) as for instance
\[
\text{``}(\mu+\nu_0)(X)\text{''} = (\eta_{s,1} + \tilde\eta_{s,0} + 0)(X) = \big(\eta_{s,1} + (1-2s)\,\mu_0\big)(X) \le (\eta_{0,1} + \mu_0)(X) \le \mu(X) \le 1,
\]

and similarly for "$(\nu+\mu_0)(X)$". Thus, in this case the representation given by Lemma 4.2.6 ii) yields
\begin{align*}
W_1^0(\tilde\eta_s,\tilde\eta_t) &\le W_1(\eta_{s,1} + \tilde\eta_{t,0},\ \eta_{t,1} + \tilde\eta_{t,0}) + \bar W_1\big(2(t-s)\,\mu_0\big) \\
&= W_1(\eta_{s,1},\eta_{t,1}) + \bar W_1\big(2(t-s)\,\mu_0\big) \\
&\le |t-s|\,W_1(\eta_{0,1},\eta_{1,1}) + |t-s|\,\varepsilon + 2\,|t-s|\,\bar W_1(\mu_0),
\end{align*}
where we made use of the translation invariance of the Kantorovich–Wasserstein metric for $p=1$, see (2.5.3).

In the case $\frac12 \le s \le t \le 1$ we analogously rewrite
\[
\tilde\eta_s = \underbrace{(\eta_{s,1} + \tilde\eta_{s,0})}_{\text{``}\mu_1\text{''}} + \underbrace{0}_{\text{``}\mu_0\text{''}}
\qquad\text{and}\qquad
\tilde\eta_t = \underbrace{(\eta_{t,1} + \tilde\eta_{s,0})}_{\text{``}\nu_1\text{''}} + \underbrace{2(t-s)\,\nu_0}_{\text{``}\nu_0\text{''}},
\]
and end up with
\[
W_1^0(\tilde\eta_s,\tilde\eta_t) \le |t-s|\,W_1(\eta_{0,1},\eta_{1,1}) + |t-s|\,\varepsilon + 2\,|t-s|\,\bar W_1(\nu_0).
\]

To compute the length of the curve $\tilde\eta_s$, we enforce the partitions to visit the time step $\frac12$, and then use the above estimates for $W_1^0$:
\begin{align*}
L_1^\flat(\tilde\eta) &= \sup\Big\{ \sum_{i=1}^n W_1^\flat(\tilde\eta_{s_{i-1}},\tilde\eta_{s_i}) \,\Big|\, n\in\mathbb N,\ 0 = s_0 < \dots < s_n = 1 \Big\} \\
&= \sup\Big\{ \sum_{i=1}^n W_1^\flat(\tilde\eta_{s_{i-1}},\tilde\eta_{s_i}) \,\Big|\, n\in\mathbb N,\ 0 = s_0 < \dots < s_j = \tfrac12 < \dots < s_n = 1 \Big\} \\
&\le \sup\Big\{ \sum_{i=1}^n W_1^0(\tilde\eta_{s_{i-1}},\tilde\eta_{s_i}) \,\Big|\, n\in\mathbb N,\ 0 = s_0 < \dots < s_j = \tfrac12 < \dots < s_n = 1 \Big\} \\
&\le \sup\Big\{ \sum_{s_i \le \frac12} \Big( |s_i - s_{i-1}|\,W_1(\eta_{0,1},\eta_{1,1}) + |s_i - s_{i-1}|\,\varepsilon + 2\,|s_i - s_{i-1}|\,\bar W_1(\mu_0) \Big) \\
&\qquad\quad + \sum_{s_i > \frac12} \Big( |s_i - s_{i-1}|\,W_1(\eta_{0,1},\eta_{1,1}) + |s_i - s_{i-1}|\,\varepsilon + 2\,|s_i - s_{i-1}|\,\bar W_1(\nu_0) \Big) \Big\} \\
&= \tfrac12\, W_1(\eta_{0,1},\eta_{1,1}) + \tfrac12\,\varepsilon + \bar W_1(\mu_0) + \tfrac12\, W_1(\eta_{0,1},\eta_{1,1}) + \tfrac12\,\varepsilon + \bar W_1(\nu_0) \\
&= W_1(\mu_1,\nu_1) + \varepsilon + \bar W_1(\mu_0) + \bar W_1(\nu_0) \\
&\le W_1'(\mu,\nu) + 2\varepsilon.
\end{align*}

This finally yields $W_1^\sharp(\mu,\nu) \le L_1^\flat(\tilde\eta) \le W_1'(\mu,\nu) + 2\varepsilon$. Since $\varepsilon$ was arbitrary, this proves $W_1^\sharp \le W_1'$.

By the fact that $W_1^\flat$ is the biggest metric below $W_1^0$ and we now know that the metric $W_1^\sharp = W_1' \le W_1^0$, we also get $W_1^\flat \ge W_1^\sharp$.

ii) Thanks to i) and Lemma 4.3.4 we know that $W_1' = W_1^\flat \le W_p^\flat$. Further, since $W_p^\sharp$ is the length metric induced by $W_p^\flat$, we also have $W_p^\flat \le W_p^\sharp$. Hence the only thing left to show is that $W_p^\sharp \le W_p'$.

The idea is that locally (along a geodesic) the contribution of $\bar W_p$ is negligible, so that we can compare $W_p'$ and $W_p^\flat$ on a small scale and then carry the comparison over to the induced length metrics.

Let subprobabilities $\mu,\nu$ be given as well as an $\varepsilon$-$W_p'$-geodesic $(\eta'_t)_{t\in[0,1]}$ connecting the measures $\mu' := \mu + (1-\mu(Y))\,\delta_\partial$ and $\nu' := \nu + (1-\nu(Y))\,\delta_\partial$. By the continuity of $W_p'$ and $W_p^\sharp$ with respect to weak convergence we can assume without loss of generality that $\mu$ and $\nu$ have compact supports and, for $\alpha > 0$ small,
\[
\eta_t(Y) \le 1 - \alpha
\]
for all $t \in (0,1)$. Again we use the notation that measures without primes are the restrictions to $Y$. We thus have $\eta_t(\partial) = 0$, whereas $\eta'_t(\partial) \ge \alpha$. Choose $\delta > 0$ such that $\eta_t(B'_\delta(\partial)) \le \frac\alpha2$. Let $\Pi$ be the probability measure on $C^0([0,1],Y')$ supported on $\varepsilon$-geodesics such that $\eta'_t = (e_t)_\# \Pi$, and denote by $L$ the essential supremum of $d'(\gamma_0,\gamma_1)$ under $\Pi$ (which is finite thanks to the compact supports of $\mu$ and $\nu$). Let $\delta' := \frac{\delta}{L}$.


We consider $\eta_s$ and $\eta_t$ for $|s-t| \le \delta'$. Using that $\bar d(x,y)^p \ge d'(x,\partial)^p + d'(y,\partial)^p$, we see that in the decomposition (4.4.3) it is actually cheaper to annihilate mass at the boundary:
\begin{align*}
W_p'(\eta_s,\eta_t)^p &= \inf\Big\{ W_p(\eta_{s,1},\eta_{t,1})^p + \bar W_p(\eta_{s,2},\eta_{t,2})^p + W_p'(\eta_{s,0},0)^p + W_p'(\eta_{t,0},0)^p \,\Big|\, \\
&\qquad\qquad \eta_s = \eta_{s,1} + \eta_{s,2} + \eta_{s,0},\ \eta_t = \eta_{t,1} + \eta_{t,2} + \eta_{t,0},\ (\eta_s + \eta_{t,0})(Y) \le 1,\ (\eta_t + \eta_{s,0})(Y) \le 1 \Big\} \\
&\ge \inf\Big\{ W_p(\eta_{s,1},\eta_{t,1})^p + W_p'(\eta_{s,0} + \eta_{s,2},\,0)^p + W_p'(\eta_{t,0} + \eta_{t,2},\,0)^p \,\Big|\, \text{same constraints} \Big\}.
\end{align*}
Since $\bar W_p$ only occurs where $\bar d$ is smaller than $d$, its contribution comes from $\varepsilon$-geodesics in $B'_\delta(\partial)$, so that by our choice of $\delta$ we know that $\eta_{s,2}(Y) = \eta_{s,2}(B'_\delta(\partial)) \le \frac\alpha2$, and the same for $\eta_{t,2}$. Hence for $\alpha$ small enough we have $(\eta_s + (\eta_{t,2} + \eta_{t,0}))(Y) \le 1$, so that $\eta_s = \eta_{s,1} + \tilde\eta_{s,0}$ with $\tilde\eta_{s,0} := \eta_{s,0} + \eta_{s,2}$ is an admissible decomposition in (4.4.3). In particular, the above inequality is an equality. Note that we cannot use this trick for $s=0$, $t=1$ because then the constraint might not be satisfied. Thanks to Lemma 4.4.6 we thus have
\begin{align*}
W_p'(\eta_s,\eta_t)^p &\ge \inf\Big\{ W_p(\eta_{s,1},\eta_{t,1})^p + \bar W_p(\tilde\eta_{s,0})^p + \bar W_p(\tilde\eta_{t,0})^p \,\Big|\, \eta_s = \eta_{s,1} + \tilde\eta_{s,0},\ \eta_t = \eta_{t,1} + \tilde\eta_{t,0}, \\
&\qquad\qquad (\eta_s + \tilde\eta_{t,0})(Y) \le 1,\ (\eta_t + \tilde\eta_{s,0})(Y) \le 1 \Big\} \\
&\ge W_p^0(\eta_s,\eta_t)^p \ge W_p^\flat(\eta_s,\eta_t)^p.
\end{align*}
Hence, the $W_p'$-length of the curve $(\eta_t)_{t\in[0,1]}$ dominates its $W_p^\flat$-length. As this curve is an almost-geodesic for $W_p'$, going over to the induced length metrics this finally proves
\[
W_p'(\mu,\nu) + \varepsilon \ge W_p^\sharp(\mu,\nu).
\]
Since $\varepsilon$ was arbitrary, the proof is finished.

Remark 4.4.9. As we have seen in the proof of part i), $W_1' \le W_1^0$, and in particular $W_1^0$ does not vanish outside the diagonal.

Let us give some simple examples illustrating Theorem 4.4.8.

Example 4.4.10. Let $X = \mathbb R$, $Y = (-1,1)$, $\mu = \delta_x$, $\nu = \delta_y$ for $x,y \in Y$. Then
\[
W_p^0(\mu,\nu) = W_p(\mu,\nu) = |x-y|,
\]
and for every $p \ge 1$
\[
W_p'(\mu,\nu) = d'(x,y) = \min\{|x-y|,\ 2 - |x-y|\}. \tag{4.4.5}
\]
Hence, since the right-hand side of (4.4.5) does not depend on $p$, Theorem 4.4.8 yields
\[
W_p^\flat(\mu,\nu) = W_p^\sharp(\mu,\nu) = \min\{|x-y|,\ 2 - |x-y|\}.
\]
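Formula (4.4.5) can be sanity-checked numerically: the completion distance, computed as the minimum of the inner distance and the sum of the two distances to the collapsed boundary, agrees with $\min\{|x-y|,\ 2-|x-y|\}$ on $Y=(-1,1)$. A sketch (the helper names are ours):

```python
# A sketch checking formula (4.4.5) on Y = (-1,1) (helper names ours): the
# completion distance min{|x-y|, (1-|x|) + (1-|y|)} agrees with
# min{|x-y|, 2-|x-y|} for all x, y in (-1,1).
import random

def d_prime(x, y):
    to_bdry = lambda t: 1.0 - abs(t)   # distance to the nearest boundary point of (-1,1)
    return min(abs(x - y), to_bdry(x) + to_bdry(y))

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    assert abs(d_prime(x, y) - min(abs(x - y), 2.0 - abs(x - y))) < 1e-12
print("ok")
```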

Example 4.4.11. Let $X = \mathbb R$, $Y = (-2,2)$, $\mu = \frac{1}{2n+1}\,\delta_{-1/2}$, $\nu = \frac{1}{2n+1}\,\delta_{+1/2}$ for $n \in \mathbb N$. Then
\[
W_p'(\mu,\nu)^p = W_p(\mu,\nu)^p = \frac{1}{2n+1}.
\]
Taking
\[
\sigma := \left( \frac{1}{2n+1} \sum_{k=0}^{n} \delta_{\frac{2k}{2n+1} - \frac12},\ \frac{1}{2n+1} \sum_{k=1}^{n} \delta_{\frac{2k}{2n+1} - \frac12} \right)
\]
and
\[
\tau := \left( \frac{1}{2n+1} \sum_{k=0}^{n} \delta_{\frac{2k+1}{2n+1} - \frac12},\ \frac{1}{2n+1} \sum_{k=0}^{n-1} \delta_{\frac{2k+1}{2n+1} - \frac12} \right),
\]
we see that
\[
W_p^0(\mu,\nu)^p \le \tilde W_p(\sigma,\tau)^p = \left( \frac{1}{2n+1} \right)^p,
\]
so that
\[
W_p^\flat(\mu,\nu) \le W_p^0(\mu,\nu) \le \frac{1}{2n+1} < \left( \frac{1}{2n+1} \right)^{\frac1p} = W_p'(\mu,\nu)
\]
for $p > 1$, $n \ge 1$. In particular, the lower estimate for $W_p^\flat$ in assertion ii) of the previous theorem is sharp.
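The cost of the staircase construction for $\sigma$ and $\tau$ can be verified by direct computation: every atom of mass $\frac{1}{2n+1}$ is matched with a neighbouring atom at distance $\frac{1}{2n+1}$, and the total transported mass is 1. A sketch with exact rational arithmetic (the variable names are ours):

```python
# A sketch of the staircase construction in Example 4.4.11 (variable names
# ours): in the pairing below every atom of mass 1/(2n+1) moves by exactly
# 1/(2n+1), and the total transported mass is 1, so the p-th-power cost is
# (1/(2n+1))^p -- strictly cheaper than W_p'(µ,ν)^p = 1/(2n+1) for p > 1.
from fractions import Fraction

def staircase_cost(n, p):
    m = Fraction(1, 2 * n + 1)                               # mass of each atom
    pos = lambda j: Fraction(j, 2 * n + 1) - Fraction(1, 2)  # atom positions
    cost = Fraction(0)
    for k in range(n + 1):   # positive part of σ at index 2k -> positive part of τ at 2k+1
        cost += m * abs(pos(2 * k) - pos(2 * k + 1)) ** p
    for k in range(n):       # negative part of τ at index 2k+1 -> negative part of σ at 2k+2
        cost += m * abs(pos(2 * k + 1) - pos(2 * k + 2)) ** p
    return cost

n, p = 3, 2
print(staircase_cost(n, p), Fraction(1, 2 * n + 1) ** p)   # both 1/49
```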

Lemma 4.4.12. For all $\mu,\nu \in \mathcal P_1^{\mathrm{sub}}(Y)$
\[
W_1^\sharp(\mu,\nu) = \inf\big\{ W_1(\mu_1,\nu_1) + \bar W_1(\mu_0) + \bar W_1(\nu_0) \,\big|\, \mu = \mu_1 + \mu_0,\ \nu = \nu_1 + \nu_0 \big\}.
\]
Proof. This is a result of the identification of $W_1^\sharp$ with $W_1'$ done in Theorem 4.4.8, together with the characterization of $W_1'$ shown in Lemma 4.4.5 and the identification of the annihilation costs in Lemma 4.4.6.