
3.6 Proof of Theorems 3.1 and 3.2

Recall that we have a time series $Y_t=\sum_{k\ge 0}\Theta^k W_{t-k}$, $t=1,\dots,T$.

We have the observations
\[
Z_t=(\delta_{1t}Y_{1t},\dots,\delta_{Nt}Y_{Nt})^\top,\qquad t=1,\dots,T,\tag{3.20}
\]
where the $\delta_{it}\sim\mathrm{Be}(p_i)$ are independent Bernoulli random variables for each $i=1,\dots,N$ and $t=1,\dots,T$, and some $p_i\in(0,1]$.
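As an aside, the following minimal numerical sketch illustrates the observation scheme (3.20) together with the inverse-probability correction that reappears in the proofs below (diagonal entries rescaled by $1/p_i$, off-diagonal entries by $1/(p_ip_j)$). The linear recursion used to generate $Y_t$, the choice of $\Theta$, and all variable names are illustrative assumptions, not the exact model or estimator of Theorems 3.1 and 3.2.

import numpy as np

rng = np.random.default_rng(0)

N, T = 10, 2000
gamma = 0.5                      # bound on the operator norm of the transition matrix
Theta = gamma * np.eye(N)        # simple illustrative choice with ||Theta||_op <= gamma
p = rng.uniform(0.4, 0.9, N)     # observation probabilities p_i

# Simulate Y_t = sum_{k>=0} Theta^k W_{t-k} via the recursion Y_t = Theta Y_{t-1} + W_t
Y = np.zeros((T, N))
for t in range(1, T):
    Y[t] = Theta @ Y[t - 1] + rng.standard_normal(N)

# Masked observations Z_t = (delta_{1t} Y_{1t}, ..., delta_{Nt} Y_{Nt})
delta = rng.binomial(1, p, size=(T, N))
Z = delta * Y

# Raw second-moment matrix and inverse-probability bias correction:
# diagonal entries are rescaled by 1/p_i, off-diagonal ones by 1/(p_i p_j).
S = Z.T @ Z / T
Sigma_hat = S / np.outer(p, p)
np.fill_diagonal(Sigma_hat, np.diag(S) / p)

Sigma_emp = Y.T @ Y / T          # oracle estimate from the unmasked series
print(np.linalg.norm(Sigma_hat - Sigma_emp, 2))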

The proofs of both statements are based on a version of Bernstein matrix inequality presented in Chapter 4, Proposition 4.3.

Theorem 3.4 (Klochkov and Zhivotovskiy (2018), Proposition 4.1). Suppose the matrices $A_t$, $t=1,\dots,T$, are independent and let $M=\max_t\bigl\|\|A_t\|_{\mathrm{op}}\bigr\|_{\psi_1}$ be finite. Then $S_T=\sum_{t=1}^T A_t$ satisfies, for any $u\ge 1$,
\[
\mathbb{P}\Bigl(\|S_T-\mathbb{E}S_T\|_{\mathrm{op}}>C\bigl(\sqrt{\sigma^2(\log N+u)}+M\log T\,(\log N+u)\bigr)\Bigr)\le e^{-u},
\]
where $\sigma^2=\bigl\|\sum_{t=1}^T\mathbb{E}A_t^\top A_t\bigr\|_{\mathrm{op}}\vee\bigl\|\sum_{t=1}^T\mathbb{E}A_tA_t^\top\bigr\|_{\mathrm{op}}$ and $C$ is an absolute constant.
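To get a feeling for the two terms in Theorem 3.4, the following Monte Carlo sketch compares the operator-norm deviation of $S_T$ with $\sqrt{\sigma^2(\log N+u)}$ and $M\log T(\log N+u)$ for a simple rank-one Gaussian ensemble; the ensemble, the rough $\psi_1$ proxy for $M$, and the omitted constant $C$ are all illustrative choices.

import numpy as np

rng = np.random.default_rng(1)
N, T, u = 20, 500, 3.0

def sample_sum():
    # independent rank-one matrices A_t = g_t g_t^T with standard Gaussian g_t
    G = rng.standard_normal((T, N))
    return sum(np.outer(g, g) for g in G)

ES = T * np.eye(N)                        # E A_t = I_N for this ensemble
dev = [np.linalg.norm(sample_sum() - ES, 2) for _ in range(50)]

# sigma^2 = ||sum_t E A_t^T A_t||_op; for A_t = g g^T one has E A^T A = (N + 2) I_N
sigma2 = T * (N + 2)
M = np.quantile([np.linalg.norm(np.outer(g, g), 2)
                 for g in rng.standard_normal((1000, N))], 0.9)  # rough psi_1 proxy only

bound = np.sqrt(sigma2 * (np.log(N) + u)) + M * np.log(T) * (np.log(N) + u)
print(max(dev), bound)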

Let $\delta_t=(\delta_{t1},\dots,\delta_{tN})^\top$ denote the vector of Bernoulli variables from above corresponding to the time point $t$. In what follows we consider the matrices
\[
A^{k,j}_{t,t'}=\mathrm{diag}\{\delta_t\}\,\Theta^kW_{t-k}\bigl[\Theta^jW_{t'-j}\bigr]^\top\mathrm{diag}\{\delta_{t'}\},
\]
so that, since $Z_t=\sum_{k\ge 0}\mathrm{diag}\{\delta_t\}\Theta^kW_{t-k}$, we have
\[
Z_tZ_t^\top=\sum_{k,j\ge 0}\mathrm{diag}\{\delta_t\}\,\Theta^kW_{t-k}\bigl[\Theta^jW_{t-j}\bigr]^\top\mathrm{diag}\{\delta_t\}=\sum_{k,j\ge 0}A^{k,j}_{t,t}.
\]
Therefore, the decomposition
\[
\frac1T\sum_{t=1}^TZ_tZ_t^\top=\sum_{k,j\ge 0}S_{k,j},\qquad S_{k,j}=\frac1T\sum_{t=1}^TA^{k,j}_{t,t},\tag{3.21}
\]
takes place, and we shall analyze the sum for each pair $k,j\ge 0$ separately. We first introduce two technical lemmas. In what follows we assume w.l.o.g. that $\|\Sigma\|_{\mathrm{op}}=1$, since if we scale it, all the covariances and estimators scale correspondingly.

Lemma 3.11. Under the assumptions of Proposition 3.1 it holds that
\[
\bigl\|\,\|P\,\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t'})Q\|_{\mathrm{op}}\bigr\|_{\psi_1}\le C\,p_{\min}^{-1}\sqrt{M_1M_2}\,\gamma^{k+j},
\]
\[
\bigl\|\,\|P\,\mathrm{diag}\{p\}^{-1}\mathrm{Off}(A^{k,j}_{t,t'})\,\mathrm{diag}\{p\}^{-1}Q\|_{\mathrm{op}}\bigr\|_{\psi_1}\le C\,p_{\min}^{-2}\sqrt{M_1M_2}\,\gamma^{k+j},
\]
with some $C=C(L)>0$.

Proof. Denote for simplicity $x=\Theta^kW_{t-k}$, $y=\Theta^jW_{t'-j}$, as well as $x^\delta=\mathrm{diag}\{\delta_t\}x$, $y^\delta=\mathrm{diag}\{\delta_{t'}\}y$, so that $A^{k,j}_{t,t'}=x^\delta[y^\delta]^\top$. Since the $W_t$ are subgaussian and $\|\Theta^k[\Theta^k]^\top\|_{\mathrm{op}}\le\gamma^{2k}$, we have for each $u\in\mathbb{R}^N$ that
\[
\log\mathbb{E}\exp(u^\top x)\le C_0\gamma^{2k}\|u\|^2,\tag{3.22}
\]
and since $\delta_t$ takes values in $[0,1]^N$, the same holds for $x^\delta$. By Theorem 2.1 in Hsu et al. (2012) it holds for any matrix $A$ and vector $u\in\mathbb{R}^N$ that
\[
\bigl\|\|Ax^\delta\|\bigr\|_{\psi_2}\le C''\gamma^k\|A\|_F,\qquad\|u^\top x^\delta\|_{\psi_2}\le C''\gamma^k\|u\|,\tag{3.23}
\]
and, similarly,
\[
\bigl\|\|Ay^\delta\|\bigr\|_{\psi_2}\le C''\gamma^j\|A\|_F,\qquad\|u^\top y^\delta\|_{\psi_2}\le C''\gamma^j\|u\|.
\]

We first deal with the diagonal term. Let $P=\sum_{i=1}^{M_1}u_iu_i^\top$ be its eigendecomposition, with $\|u_i\|=1$; then
\[
\bigl\|\|P\,\mathrm{diag}(x^\delta)\|_{\mathrm{op}}\bigr\|^2_{\psi_2}=\bigl\|\|\mathrm{diag}(x^\delta)P\,\mathrm{diag}(x^\delta)\|_{\mathrm{op}}\bigr\|_{\psi_1}\le\sum_{i=1}^{M_1}\bigl\|\|\mathrm{diag}(x^\delta)u_iu_i^\top\mathrm{diag}(x^\delta)\|_{\mathrm{op}}\bigr\|_{\psi_1}=\sum_{i=1}^{M_1}\bigl\|\|\mathrm{diag}(u_i)x^\delta\|\bigr\|^2_{\psi_2},
\]
where each term in the latter sum is bounded by $C''\gamma^{2k}$ due to the fact that $\|\mathrm{diag}(u_i)\|_F=1$. Summing up and taking the square root we arrive at
\[
\bigl\|\|P\,\mathrm{diag}(x^\delta)\|_{\mathrm{op}}\bigr\|_{\psi_2}\le\sqrt{C''M_1}\,\gamma^k.
\]
Taking into account the similar bound for $Q\,\mathrm{diag}(y^\delta)$, and noting that $\mathrm{Diag}(A^{k,j}_{t,t'})=\mathrm{diag}(x^\delta)\,\mathrm{diag}(y^\delta)$ (entrywise, $[\mathrm{Diag}(A^{k,j}_{t,t'})]_{ii}=x^\delta_iy^\delta_i$), we have by Hölder's inequality
\[
\bigl\|\|P\,\mathrm{diag}\{p\}^{-1}\mathrm{diag}(x^\delta)\,\mathrm{diag}(y^\delta)Q\|_{\mathrm{op}}\bigr\|_{\psi_1}\le p_{\min}^{-1}\bigl\|\|P\,\mathrm{diag}(x^\delta)\|_{\mathrm{op}}\bigr\|_{\psi_2}\bigl\|\|Q\,\mathrm{diag}(y^\delta)\|_{\mathrm{op}}\bigr\|_{\psi_2}\le C''\sqrt{M_1M_2}\,\gamma^{k+j},
\]
which yields the bound for the diagonal part. As for the off-diagonal part, consider first the whole matrix:
\[
\bigl\|\|Px^\delta[y^\delta]^\top Q\|_{\mathrm{op}}\bigr\|_{\psi_1}\le\bigl\|\|Px^\delta\|\bigr\|_{\psi_2}\bigl\|\|Qy^\delta\|\bigr\|_{\psi_2}\le(C'')^2\sqrt{M_1M_2}\,\gamma^{j+k},
\]
and since $\mathrm{Off}(A^{k,j}_{t,t'})=A^{k,j}_{t,t'}-\mathrm{Diag}(A^{k,j}_{t,t'})$, the bound follows from the triangle inequality.

The following technical lemma will help us to upper-bound $\sigma^2$ in Theorem 3.4.

Lemma 3.12. Let $\delta_1,\dots,\delta_N$ be independent Bernoulli random variables with probabilities of success $p_1,\dots,p_N$, and set $p_{\min}=\min_{i\le N}p_i$. Let $a,b\in\mathbb{R}^N$ be two arbitrary vectors. It holds that
\[
\mathbb{E}\Bigl(\sum_{i=1}^N\frac{\delta_i}{p_i}\,a_ib_i\Bigr)^2\le p_{\min}^{-1}\|a\|^2\|b\|^2.
\]

To show the second inequality we use decoupling (Theorem 6.1.1 in Vershynin (2018)) and the trivial inequality $(x+y)^2\le 2x^2+2y^2$. Note that the expectation $\mathbb{E}\bar\delta_i\bar\delta_k$ (with $\bar\delta_i=\delta_i-p_i$) is non-vanishing only when $i=k$, in which case it equals $\mathbb{E}\bar\delta_i^2=p_i-p_i^2$. Taking into account the similar property of the second, independent Bernoulli vector, we have that the sum above is bounded accordingly; it is left to notice that the remaining terms run over $i\ne j$. Similarly to (3.25) we can show the third inequality.

Now we apply the matrix Bernstein inequality to the sum $S_{k,j}$ defined in (3.21), dealing separately with the diagonal and off-diagonal parts. After that we present the proof of Proposition 3.1.

Lemma 3.13. Under the assumptions of Proposition 3.1, for each $u\ge 1$ it holds with probability at least $1-e^{-u}$ that
\[
\bigl\|P\,\mathrm{diag}\{p\}^{-1}\bigl(\mathrm{Diag}(S_{k,j})-\mathbb{E}\,\mathrm{Diag}(S_{k,j})\bigr)Q\bigr\|_{\mathrm{op}}\le C\gamma^{k+j}\left(\sqrt{\frac{(M_1\vee M_2)(\log N+u)}{Tp_{\min}}}\ \vee\ \frac{\sqrt{M_1M_2}\,(\log N+u)\log T}{Tp_{\min}}\right),
\]
where $C=C(K)$ only depends on $K$.

Proof. Note that
\[
P\,\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(S_{k,j})Q=T^{-1}\sum_{t=1}^TA_t,\qquad A_t=P\,\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t})Q.
\]
By Lemma 3.11 we have $\bigl\|\|A_t\|_{\mathrm{op}}\bigr\|_{\psi_1}\le Cp_{\min}^{-1}\sqrt{M_1M_2}\,\gamma^{k+j}$. Moreover, using the decomposition $Q=\sum_{j=1}^{M_2}u_ju_j^\top$, we have

\[
\|\mathbb{E}A_tA_t^\top\|_{\mathrm{op}}\le\bigl\|\mathbb{E}\,\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t})\,Q\,\mathrm{Diag}(A^{k,j}_{t,t})\,\mathrm{diag}\{p\}^{-1}\bigr\|_{\mathrm{op}}\le\sum_{j=1}^{M_2}\bigl\|\mathbb{E}\,\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t})u_ju_j^\top\mathrm{Diag}(A^{k,j}_{t,t})\,\mathrm{diag}\{p\}^{-1}\bigr\|_{\mathrm{op}}\le\sum_{j=1}^{M_2}\sup_{\|\gamma\|=1}\mathbb{E}\bigl(\gamma^\top\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t})u_j\bigr)^2.
\]

By definition, $\mathrm{Diag}(A^{k,j}_{t,t})=\mathrm{diag}\{\delta_{ti}x_iy_i\}_{i=1}^N$ for $x=\Theta^kW_{t-k}$, $y=\Theta^jW_{t-j}$. Let $\mathbb{E}_\delta$ denote the expectation with respect to the Bernoulli variables, conditioned on everything else. Setting $a=(x_1\gamma_1,\dots,x_N\gamma_N)^\top$ and $b=(y_1u_1,\dots,y_Nu_N)^\top$ (with $u=u_j$), we have by the first inequality of Lemma 3.12,
\[
\mathbb{E}\bigl(\gamma^\top\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(A^{k,j}_{t,t})u_j\bigr)^2=\mathbb{E}\,\mathbb{E}_\delta\Bigl(\sum_i\gamma_ix_i\frac{\delta_{ti}}{p_i}y_iu_i\Bigr)^2\le p_{\min}^{-1}\mathbb{E}\|a\|^2\|b\|^2\le p_{\min}^{-1}\mathbb{E}^{1/2}\|a\|^4\,\mathbb{E}^{1/2}\|b\|^4.
\]
Observe that
\[
\|a\|^2=\sum_i\gamma_i^2x_i^2=x^\top\mathrm{diag}\{\gamma\}^2x,
\]
so since $\mathrm{tr}(\mathrm{diag}\{\gamma\}^2)=1$ and due to (3.22) and Theorem 2.1 in Hsu et al. (2012) it holds that $\mathbb{E}^{1/2}\|a\|^4\le\bigl\|\|a\|^2\bigr\|_{\psi_1}\le C_0\gamma^{2k}$. Similarly, $\mathbb{E}^{1/2}\|b\|^4\le C_0\gamma^{2j}$, which together implies
\[
\|\mathbb{E}A_tA_t^\top\|_{\mathrm{op}}\vee\|\mathbb{E}A_t^\top A_t\|_{\mathrm{op}}\le C''p_{\min}^{-1}(M_1\vee M_2)\gamma^{2k+2j}.
\]

Now notice that $(A_t)$ is not necessarily an independent sequence, as $A_t$ depends directly on $(W_{t-k},W_{t-j},\delta_t)$, which might intersect with, e.g., $t'=t+|j-k|$. However, if we take a set $I\subset[1,T]$ such that any two $t,t'\in I$ satisfy $|t'-t|\ne|j-k|$, then the sequence $(A_t)_{t\in I}$ is obviously independent. We separate the whole interval $[1,T]$ into two such independent sets,
\[
I_1=\{t\in[1,T]:\ \lceil t/|j-k|\rceil\ \text{is odd}\},\qquad I_2=\{t\in[1,T]:\ \lceil t/|j-k|\rceil\ \text{is even}\}=[1,T]\setminus I_1.\tag{3.26}
\]
Indeed, if $t,t'\in I_1$ then $\lceil t/|j-k|\rceil$ and $\lceil t'/|j-k|\rceil$ are either equal or differ by at least two, so that in the first case we have $|t-t'|<|j-k|$ and in the second $|t-t'|>|j-k|$. Since both sets have, very roughly, at most $T$ elements, it holds by Theorem 3.4, with probability at least $1-e^{-u}$, for both $I_i$, $i=1,2$,

\[
\Bigl\|\sum_{t\in I_i}(A_t-\mathbb{E}A_t)\Bigr\|_{\mathrm{op}}\le C\gamma^{j+k}\Bigl(\sqrt{p_{\min}^{-1}(M_1\vee M_2)T(\log N+u)}\ \vee\ p_{\min}^{-1}\sqrt{M_1M_2}\,(\log N+u)\log T\Bigr),
\]
so summing up the two and dividing by $T$ we get the result.
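The splitting (3.26) is easy to make concrete; the following sketch (with arbitrary $T$, $k$, $j$) constructs $I_1$ and $I_2$ and verifies that no two indices within the same set differ by exactly $|j-k|$.

import math

def split_indices(T, k, j):
    """Split {1,...,T} into I1, I2 according to (3.26): parity of ceil(t/|j-k|)."""
    d = abs(j - k)
    if d == 0:                      # k == j: the A_t already form an independent sequence
        return list(range(1, T + 1)), []
    I1 = [t for t in range(1, T + 1) if math.ceil(t / d) % 2 == 1]
    I2 = [t for t in range(1, T + 1) if math.ceil(t / d) % 2 == 0]
    return I1, I2

T, k, j = 100, 2, 5
I1, I2 = split_indices(T, k, j)
for I in (I1, I2):
    # within each set, no pair of indices is at distance exactly |j - k|
    assert all(abs(s - t) != abs(j - k) for s in I for t in I if s != t)
print(len(I1), len(I2))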

Lemma 3.14. Under the assumptions of Proposition 3.1, for each $u\ge 1$ it holds with probability at least $1-e^{-u}$ that
\[
\bigl\|P\,\mathrm{diag}\{p\}^{-1}\bigl(\mathrm{Off}(S_{k,j})-\mathbb{E}\,\mathrm{Off}(S_{k,j})\bigr)\mathrm{diag}\{p\}^{-1}Q\bigr\|_{\mathrm{op}}\le C\gamma^{k+j}\left(\sqrt{\frac{(M_1\vee M_2)(\log N+u)}{Tp_{\min}^2}}\ \vee\ \frac{\sqrt{M_1M_2}\,(\log N+u)\log T}{Tp_{\min}^2}\right),
\]
where $C=C(K)$ only depends on $K$.

Proof. Set $B_t=P\,\mathrm{diag}\{p\}^{-1}\mathrm{Off}(A^{k,j}_{t,t})\,\mathrm{diag}\{p\}^{-1}Q$. We have $\mathrm{Off}(A^{k,j}_{t,t})=\mathrm{diag}\{\delta_t\}\,\mathrm{Off}(xy^\top)\,\mathrm{diag}\{\delta_t\}$, therefore by Lemma 3.12,
\[
\mathbb{E}\bigl(\gamma^\top\mathrm{diag}\{p\}^{-1}\mathrm{Off}(A^{k,j}_{t,t})\,\mathrm{diag}\{p\}^{-1}u_j\bigr)^2=\mathbb{E}\,\mathbb{E}_\delta\Bigl(\sum_{i\ne l}\gamma_ix_i\frac{\delta_{ti}}{p_i}\frac{\delta_{tl}}{p_l}y_lu_l\Bigr)^2\le\cdots,
\]
and, similarly, $\mathbb{E}^{1/4}|u^\top y|^4\le C_0\gamma^{k}$. Putting these bounds together and applying the Cauchy–Schwarz inequality, we have
\[
\|\mathbb{E}B_tB_t^\top\|_{\mathrm{op}}\le C''p_{\min}^{-2}M_2\gamma^{2k+2j}.
\]
By analogy, we have
\[
\|\mathbb{E}B_tB_t^\top\|_{\mathrm{op}}\vee\|\mathbb{E}B_t^\top B_t\|_{\mathrm{op}}\le C''p_{\min}^{-2}(M_1\vee M_2)\gamma^{2k+2j}.
\]
Applying the same sample splitting (3.26), we obtain the corresponding bound on $\bigl\|\sum_t(B_t-\mathbb{E}B_t)\bigr\|_{\mathrm{op}}$, which divided by $T$ provides the result.

Proof of Theorem 3.1. Set
\[
S^\delta_{k,j}=\mathrm{diag}\{p\}^{-1}\mathrm{Diag}(S_{k,j})+\mathrm{diag}\{p\}^{-1}\mathrm{Off}(S_{k,j})\,\mathrm{diag}\{p\}^{-1},
\]
so that $\hat\Sigma=\sum_{k,j\ge 0}S^\delta_{k,j}$, and for each pair $k,j$ the bound on $\|P(S^\delta_{k,j}-\mathbb{E}S^\delta_{k,j})Q\|_{\mathrm{op}}$ obtained by combining Lemmas 3.13 and 3.14

holds with probability at least $1-e^{-u}$. Take a union of these bounds for each $k,j$ with $u=u_{k,j}=k+j+1+u_0$. The total probability of the complementary event is at most $\sum_{k,j\ge 0}e^{-k-j-1-u_0}\le e^{-u_0}$. On this event it holds that
\[
\|P(\hat\Sigma-\mathbb{E}\hat\Sigma)Q\|_{\mathrm{op}}\le\sum_{k,j\ge 0}\bigl\|P\bigl(S^\delta_{k,j}-\mathbb{E}S^\delta_{k,j}\bigr)Q\bigr\|_{\mathrm{op}},
\]
which completes the proof due to the equalities
\[
\sum_{k,j\ge 0}\gamma^{k+j}=\Bigl(\sum_{k\ge 0}\gamma^k\Bigr)^2=\frac{1}{(1-\gamma)^2},\qquad
\sum_{k,j\ge 0}(k+j)\gamma^{k+j}=2\sum_{k,j\ge 0}k\,\gamma^{k+j}=\frac{2}{1-\gamma}\sum_{k\ge 0}k\,\gamma^k=\frac{2\gamma}{(1-\gamma)^3}\le\frac{2}{(1-\gamma)^3}.
\]

Proof of Theorem 3.2. Recall the definition
\[
A^{k,j}_{t,t'}=\mathrm{diag}\{\delta_t\}\,\Theta^kW_{t-k}\bigl[\Theta^jW_{t'-j}\bigr]^\top\mathrm{diag}\{\delta_{t'}\}.
\]
Then it holds that
\[
Z_tZ_{t+1}^\top=\sum_{k,j\ge 0}\mathrm{diag}\{\delta_t\}\,\Theta^kW_{t-k}\bigl[\Theta^jW_{t+1-j}\bigr]^\top\mathrm{diag}\{\delta_{t+1}\}=\sum_{k,j\ge 0}A^{k,j}_{t,t+1},
\]
and the decomposition
\[
A=\sum_{k,j\ge 0}S_{k,j},\qquad S_{k,j}=\frac{1}{T-1}\sum_{t=1}^{T-1}A^{k,j}_{t,t+1}
\]
takes place. We first apply the matrix Bernstein inequality to each $S_{k,j}$ separately. Observe that
\[
P\,\mathrm{diag}\{p\}^{-1}S_{k,j}\,\mathrm{diag}\{p\}^{-1}Q=\frac{1}{T-1}\sum_{t=1}^{T-1}B_t,\qquad B_t=P\,\mathrm{diag}\{p\}^{-1}A^{k,j}_{t,t+1}\,\mathrm{diag}\{p\}^{-1}Q.
\]

By Lemma 3.11 each term satisfies
\[
\max_t\bigl\|\|B_t\|_{\mathrm{op}}\bigr\|_{\psi_1}\le C\sqrt{M_1M_2}\,\gamma^{k+j}.
\]

Furthermore, let $Q=\sum_{j=1}^{M_2}u_ju_j^\top$ with unit vectors $u_j$. Also, denoting $x=\Theta^kW_{t-k}$ and $y=\Theta^jW_{t+1-j}$, it holds that $A^{k,j}_{t,t+1}=\mathrm{diag}\{\delta_t\}xy^\top\mathrm{diag}\{\delta_{t+1}\}$. Then we have for each unit vector $\gamma\in\mathbb{R}^N$, using Lemma 3.12,
\[
\mathbb{E}\bigl(\gamma^\top\mathrm{diag}\{p\}^{-1}A^{k,j}_{t,t+1}\,\mathrm{diag}\{p\}^{-1}u_j\bigr)^2=\mathbb{E}\,\mathbb{E}_\delta\Bigl(\sum_{i,l}\gamma_ix_i\frac{\delta_{ti}}{p_i}\frac{\delta_{t+1,l}}{p_l}y_lu_l\Bigr)^2\le p_{\min}^{-2}\mathbb{E}\|\mathrm{diag}\{\gamma\}x\|^2\|\mathrm{diag}\{u\}y\|^2+\mathbb{E}(\gamma^\top x)^2(u^\top y)^2,
\]
which due to the subgaussianity of $x$ and $y$ yields
\[
\mathbb{E}\|\mathrm{diag}\{\gamma\}x\|^2\|\mathrm{diag}\{u\}y\|^2\le\mathbb{E}^{1/2}\|\mathrm{diag}\{\gamma\}x\|^4\,\mathbb{E}^{1/2}\|\mathrm{diag}\{u\}y\|^4\le C_0\gamma^{2k+2j},
\]
\[
\mathbb{E}(\gamma^\top x)^2(u^\top y)^2\le\mathbb{E}^{1/2}(\gamma^\top x)^4\,\mathbb{E}^{1/2}(u^\top y)^4\le C_0\gamma^{2k+2j}.
\]
Therefore, we get that
\[
\|\mathbb{E}B_tB_t^\top\|_{\mathrm{op}}\le\sup_{\|\gamma\|=1}\sum_{j=1}^{M_2}\mathbb{E}\bigl(\gamma^\top\mathrm{diag}\{p\}^{-1}A^{k,j}_{t,t+1}\,\mathrm{diag}\{p\}^{-1}u_j\bigr)^2\le C''p_{\min}^{-2}M_2\gamma^{2k+2j}.
\]
Taking similar derivations we arrive at
\[
\sigma^2=\|\mathbb{E}B_tB_t^\top\|_{\mathrm{op}}\vee\|\mathbb{E}B_t^\top B_t\|_{\mathrm{op}}\le C''p_{\min}^{-2}(M_1\vee M_2)\gamma^{2k+2j}.
\]

Now we separate the indices $t=1,\dots,T$ into four subsets, such that each corresponds to a set of independent matrices $B_t$. Since each $B_t$ is generated by $(W_{t-k},W_{t+1-j},\delta_t,\delta_{t+1})$, we simply need to ensure that no pair of indices $t,t'$ from the same subset satisfies $|t-t'|=|k-j+1|$ or $|t-t'|=1$. This can be achieved by the following separation. First, we split the indices into the subsets of odd and even indices, so that neither subset contains two indices with $|t-t'|=1$. Then each of these two subsets is further split according to the scheme (3.26), so that $|t-t'|=|k-j+1|$ is avoided within each subset (see the sketch following this proof). Therefore, applying the Bernstein inequality, Theorem 3.4, to each of the four sums separately and then summing up, we get that for each $u\ge 1$, with probability at least $1-e^{-u}$,

\[
\bigl\|P\,\mathrm{diag}\{p\}^{-1}\bigl(S_{k,j}-\mathbb{E}S_{k,j}\bigr)\mathrm{diag}\{p\}^{-1}Q\bigr\|_{\mathrm{op}}\le C\Bigl(\sqrt{p_{\min}^{-2}(M_1\vee M_2)T(\log N+u)}\ \vee\ \sqrt{M_1M_2}\,(\log N+u)\log T\Bigr).
\]

Similarly to the proof of Proposition 3.1, we take the union of these bounds for each $k,j$ with $u=j+k+u_0$, and then the result follows.
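As referenced in the proof above, the four-way splitting can be made explicit; the sketch below is only an illustration of the construction (odd/even indices first, then the scheme (3.26) within each parity class) and checks the two exclusion constraints for arbitrarily chosen $T$, $k$, $j$.

import math

def split_indices_lag1(T, k, j):
    """Four-way split of {1,...,T-1}: within each subset no pair t, t' satisfies
    |t - t'| == 1 or |t - t'| == abs(k - j + 1)."""
    d = abs(k - j + 1)
    groups = [[], [], [], []]
    for t in range(1, T):
        parity = t % 2                                   # rules out |t - t'| == 1
        block = 0 if d == 0 else (math.ceil(t / d) % 2)  # rules out |t - t'| == d, as in (3.26)
        groups[2 * parity + block].append(t)
    return groups

T, k, j = 60, 1, 4
d = abs(k - j + 1)
groups = split_indices_lag1(T, k, j)
for g in groups:
    assert all(abs(s - t) not in (1, d) for s in g for t in g if s != t)
print([len(g) for g in groups])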

Chapter 4

Uniform Hanson-Wright inequality with subgaussian entries

The concentration properties of quadratic forms of random variables are a classic topic in probability. The well-known result is due to Hanson and Wright (we refer to the form of this inequality presented in Rudelson and Vershynin (2013)), which states that if $A$ is an $n\times n$ real matrix and $X=(X_1,\dots,X_n)$ is a random vector in $\mathbb{R}^n$ with independent centered coordinates satisfying $\max_i\|X_i\|_{\psi_2}\le K$ (we will recall the definition of $\|\cdot\|_{\psi_2}$ below), then for all $t\ge 0$
\[
\mathbb{P}\bigl(|X^\top AX-\mathbb{E}X^\top AX|\ge t\bigr)\le 2\exp\Bigl(-c\min\Bigl(\frac{t^2}{K^4\|A\|_{\mathrm{HS}}^2},\ \frac{t}{K^2\|A\|}\Bigr)\Bigr),\tag{4.1}
\]
for some absolute $c>0$, where $\|A\|_{\mathrm{HS}}=\sqrt{\sum_{i,j}A_{i,j}^2}$ denotes the Hilbert–Schmidt norm and $\|A\|$ is the operator norm of $A$. An important extension of these results arises when, instead of just one matrix $A$, we have a family of matrices $\mathcal{A}$ and want to understand the behaviour of the random quadratic forms simultaneously for all matrices in the family. As a concrete example we consider an order-2 Rademacher chaos: given a family $\mathcal{A}\subset\mathbb{R}^{n\times n}$ of $n\times n$ real symmetric matrices with zero diagonal, that is, $A_{ii}=0$ for all $A\in\mathcal{A}$ and all $i=1,\dots,n$, one wants to study the random variable
\[
Z=\sup_{A\in\mathcal{A}}\sum_{i,j=1}^nA_{ij}\varepsilon_i\varepsilon_j=\sup_{A\in\mathcal{A}}\varepsilon^\top A\varepsilon,
\]
where $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)^\top$ is a sequence of independent Rademacher signs, taking values $\pm1$ with equal probabilities.
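To make the object $Z$ tangible, one can estimate its distribution by simulation for a small, arbitrarily chosen family $\mathcal{A}$ of symmetric zero-diagonal matrices; the sketch below is purely illustrative and not part of the argument.

import numpy as np

rng = np.random.default_rng(2)
n, n_matrices, n_samples = 30, 5, 5000

# Random family of symmetric matrices with zero diagonal
family = []
for _ in range(n_matrices):
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    np.fill_diagonal(A, 0.0)
    family.append(A)

eps = rng.choice([-1.0, 1.0], size=(n_samples, n))
# Z = sup_A eps^T A eps, evaluated for each Rademacher sample
Z = np.max([np.einsum('ti,ij,tj->t', eps, A, eps) for A in family], axis=0)

print(Z.mean(), Z.std())   # note: E eps^T A eps = 0 for each fixed A, but E Z > 0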

In the celebrated paper Talagrand (1996) it was shown, in particular, that there is an absolute constant $c>0$ such that for any $t\ge 0$
\[
\mathbb{P}\bigl(|Z-\mathbb{E}Z|\ge t\bigr)\le 2\exp\bigl(\cdots\bigr).\tag{4.2}
\]
Apart from the new techniques, the significance of this result is that previously (see, for example, Ledoux and Talagrand (2013)) similar bounds were one-sided and had a multiplicative constant greater than $1$ in front of $\mathbb{E}Z$. These results are sometimes called deviation inequalities, in contrast to the concentration bounds of the form (4.2) that will be studied below. A simplified proof of the upper tail of (4.2) appeared later in Boucheron et al. (2003). Similar inequalities in the Gaussian case follow from the results in Borell (1984) and Arcones and Giné (1993).

Observe that when the diagonal elements are zero, for each $A\in\mathcal{A}$ the corresponding quadratic form is centered, $\mathbb{E}\varepsilon^\top A\varepsilon=0$. In a general situation we will be interested in the analysis of
\[
Z=\sup_{A\in\mathcal{A}}\bigl(X^\top AX-\mathbb{E}X^\top AX\bigr),\tag{4.3}
\]
for a random vector $X$ taking its values in $\mathbb{R}^n$. As before, the analysis of both the expectation and the concentration properties of this random variable has appeared frequently in the recent literature. Just to name a few examples: Kramer et al. (2014) study $\mathbb{E}Z$ and deviations of $Z$ for classes of positive semidefinite matrices, with applications to compressive sensing; Dicker and Erdogdu (2017) prove deviation inequalities for $\sup_{A\in\mathcal{A}}(X^\top AX-\mathbb{E}X^\top AX)$ and subgaussian vectors $X$ under some extra assumptions. Additionally, the recent paper Adamczak et al. (2018b) studies deviation bounds for $Z=\|X^\top AX-\mathbb{E}X^\top AX\|$ with Banach space-valued matrices $A$ and Gaussian variables, providing upper and lower bounds for the moments. Finally, it was shown in Adamczak (2015) that if $X$ satisfies the so-called concentration property with constant $K$, that is, for every 1-Lipschitz function $\varphi:\mathbb{R}^n\to\mathbb{R}$ and any $t\ge 0$ it holds that $\mathbb{E}|\varphi(X)|<\infty$ and
\[
\mathbb{P}\bigl(|\varphi(X)-\mathbb{E}\varphi(X)|\ge t\bigr)\le 2\exp\bigl(-t^2/2K^2\bigr),\tag{4.4}
\]
then the following bound (similar to (4.2)) holds for every $t\ge 0$:
\[
\mathbb{P}\bigl(|Z-\mathbb{E}Z|\ge t\bigr)\le 2\exp\bigl(\cdots\bigr).\tag{4.5}
\]

This result has an application in covariance estimation and recovers another recent concentration result of Koltchinskii and Lounici (2017); we will discuss this in what follows. The drawback of (4.5) is that the concentration property is quite restrictive: it holds when $X$ has the standard Gaussian distribution and for some log-concave distributions (see Ledoux (2001)), but it does not hold for general subgaussian entries, not even in the simplest case of a Rademacher random vector $\varepsilon$.

We extend the mentioned results in two directions. On the one hand, we revisit the result of Boucheron et al. (2003) for bounded variables, allowing non-zero diagonal values of the matrices; on the other hand, we allow unbounded subgaussian variables $X_i$. First, let us recall the following definition. For $\alpha>0$ denote the $\psi_\alpha$-norm of a random variable $Y$ by
\[
\|Y\|_{\psi_\alpha}=\inf\bigl\{u>0:\ \mathbb{E}\exp\bigl(|Y|^\alpha/u^\alpha\bigr)\le 2\bigr\}.
\]
Random variables with $\|Y\|_{\psi_1}<\infty$ will be referred to as subexponential, and those with $\|Y\|_{\psi_2}<\infty$ will be referred to as subgaussian; the corresponding norm is usually named the subgaussian norm. We also use the $L_p(\mathbb{P})$ norm: for $p\ge 1$ we set $\|Y\|_{L_p}=(\mathbb{E}|Y|^p)^{1/p}$. One of our main contributions is the following upper-tail bound.
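As a side remark (standard facts, up to absolute constants; see, e.g., Vershynin (2018)), the $\psi_2$-norm can equivalently be described through moments and tails, which is how it is typically used below:
\[
\|Y\|_{\psi_2}\;\asymp\;\sup_{p\ge 1}\frac{\|Y\|_{L_p}}{\sqrt{p}},
\qquad\text{and}\qquad
\|Y\|_{\psi_2}\le K\ \Longrightarrow\ \mathbb{P}\bigl(|Y|\ge t\bigr)\le 2\exp\bigl(-t^2/K^2\bigr)\ \ \text{for all }t\ge 0.
\]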

Theorem 4.1. Suppose that the components of $X=(X_1,\dots,X_n)$ are independent centered random variables and $\mathcal{A}$ is a finite family of $n\times n$ real symmetric matrices. Denote $M=\bigl\|\max_i|X_i|\bigr\|_{\psi_2}$. Then for any $t\ge 0$,
\[
\mathbb{P}\bigl(Z-\mathbb{E}Z\ge t\bigr)\le\exp\bigl(\cdots\bigr),
\]
where $c>0$ is an absolute constant and $Z$ is defined by (4.3).

Remark 4.1. In Theorem 4.1 and below we assume that all $A\in\mathcal{A}$ are symmetric. This is done only for convenience of presentation, and in fact the analysis may be performed for general square matrices; the only difference is that in many places $A$ should be replaced by $\frac12(A+A^\top)$.

In particular, Theorem 4.1 recovers the right tail of the result of Talagrand (4.2) up to absolute constants, since in this case we obviously have $\bigl\|\max_i|\varepsilon_i|\bigr\|_{\psi_2}\lesssim 1$. Furthermore, the result of Theorem 4.1 works without the assumption, used in Talagrand (1996) and Boucheron et al. (2003), that the diagonals of all matrices in $\mathcal{A}$ are zero. Moreover, it is also applicable in some situations when the concentration property (4.4) holds: indeed, if $X$ is a standard normal vector in $\mathbb{R}^n$ then it is well known (see Ledoux and Talagrand (2013)) that $M=\bigl\|\max_i|X_i|\bigr\|_{\psi_2}\sim\sqrt{\log n}$, and at the same time, if the identity matrix $I_n\in\mathcal{A}$, then $\mathbb{E}\sup_{A\in\mathcal{A}}\|AX\|\ge\mathbb{E}\|X\|\gtrsim\sqrt{n}$. Therefore, in this case the factor $M$ is only of at most logarithmic order when compared to $\mathbb{E}\sup_{A\in\mathcal{A}}\|AX\|$.

In the special case when $\mathcal{A}$ consists of just one matrix, our bound recovers a bound similar to the original Hanson–Wright inequality. On the one hand, our bound may have an extra logarithmic factor depending on the dimension $n$. On the other hand, the original term $\max_i\|X_i\|_{\psi_2}\|A\|_{\mathrm{HS}}$ is replaced by the better term $\mathbb{E}\|AX\|$. We will discuss this phenomenon below. The core of the proof of the Hanson–Wright inequality in Rudelson and Vershynin (2013) is based on the decoupling technique, which may be used (at least in a straightforward way) to prove the deviation inequality, but not the concentration inequality, for $\sup_{A\in\mathcal{A}}(X^\top AX-\mathbb{E}X^\top AX)$ when $\mathcal{A}$ consists of more than one matrix.

A natural question to ask is whether one may improve Theorem 4.1 by replacing $M=\bigl\|\max_i|X_i|\bigr\|_{\psi_2}$ with $K=\max_i\|X_i\|_{\psi_2}$. In what follows we discuss why, in the deviation version of Theorem 4.1, this replacement is not possible in some cases. This is quite unexpected in light of the fact that $\bigl\|\max_i|X_i|\bigr\|_{\psi_2}$ does not appear in the original Hanson–Wright inequality. Therefore, we believe that the form of our result is close to optimal. We also provide the following extension of Theorem 4.1, which may be better in some cases.

Proposition 4.1. Suppose that the components of $X=(X_1,\dots,X_n)$ are independent centered random variables. Suppose also that the variables $X_i$ have symmetric distributions ($X_i$ has the same distribution as $-X_i$). Let $\mathcal{A}$ be a finite family of $n\times n$ real symmetric matrices. Denote $M=\bigl\|\max_i|X_i|\bigr\|_{\psi_2}$. Then for any $t\ge 0$,
\[
\mathbb{P}\bigl(Z-\mathbb{E}Z\ge t\bigr)\le\exp\bigl(\cdots\bigr),
\]
where $c>0$ is an absolute constant and $Z$ is defined by (4.3).

Remark 4.2. Proposition 4.1 is closer to the standard Hanson–Wright inequality (4.1). Indeed, in the case when $\mathcal{A}=\{A\}$ we have $\mathbb{E}\|AG\|\sim\|A\|_{\mathrm{HS}}$ for a standard Gaussian vector $G$ in $\mathbb{R}^n$. The difference is that $K^4$ and $K^2$ are replaced by $M^2K^2$ and $MK$, respectively.

We proceed with some notation that will be used below. For a non-negative random variable $Y$, define its entropy as
\[
\mathrm{Ent}(Y)=\mathbb{E}Y\log Y-\mathbb{E}Y\log\mathbb{E}Y.
\]
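As a quick worked example (not taken from the original argument): for a standard Gaussian $g$ and $Y=e^{\lambda g}$ the entropy can be computed explicitly,
\[
\mathbb{E}Y=e^{\lambda^2/2},\qquad\mathbb{E}Y\log Y=\lambda\,\mathbb{E}\bigl[g\,e^{\lambda g}\bigr]=\lambda^2e^{\lambda^2/2},\qquad
\mathrm{Ent}\bigl(e^{\lambda g}\bigr)=\lambda^2e^{\lambda^2/2}-\frac{\lambda^2}{2}e^{\lambda^2/2}=\frac{\lambda^2}{2}\,\mathbb{E}e^{\lambda g},
\]
which saturates (with $K=1$) the Herbst-type entropy bound displayed further below.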

Instead of the concentration property (4.4) we also discuss the following property.

Assumption 4.1. We say that a random vector $X$ taking its values in $\mathbb{R}^n$ satisfies the logarithmic Sobolev inequality with constant $K>0$ if for any continuously differentiable function $f:\mathbb{R}^n\to\mathbb{R}$ it holds that
\[
\mathrm{Ent}\bigl(f^2(X)\bigr)\le 2K^2\,\mathbb{E}\|\nabla f(X)\|^2,\tag{4.6}
\]
whenever both sides of the inequality are finite.

To show that the logarithmic Sobolev property is closely related to the concentration property, we recall (Theorem 5.3 in Ledoux (2001)) that Assumption 4.1 implies the concentration property (4.4); the proof of this fact is based essentially on taking $f(X)=\exp(\lambda(\varphi(X)-\mathbb{E}\varphi(X))/2)$ for $\lambda>0$, which implies
\[
\mathrm{Ent}\bigl(\exp(\lambda(\varphi(X)-\mathbb{E}\varphi(X)))\bigr)\le\frac{K^2\lambda^2}{2}\,\mathbb{E}\exp\bigl(\lambda(\varphi(X)-\mathbb{E}\varphi(X))\bigr).
\]
This is known to imply (4.4) through the Herbst argument; see Boucheron et al. (2013). Moreover, the last inequality is equivalent to the concentration property. Indeed, from the concentration property we know that $\|\varphi(X)-\mathbb{E}\varphi(X)\|_{\psi_2}\lesssim K$, and this implies (see van Handel (2016)) that for all $\lambda\in\mathbb{R}$
\[
\mathrm{Ent}\bigl(\exp(\lambda(\varphi(X)-\mathbb{E}\varphi(X)))\bigr)\lesssim K^2\lambda^2\,\mathbb{E}\exp\bigl(\lambda(\varphi(X)-\mathbb{E}\varphi(X))\bigr).
\]

One of our technical contributions is that we use a similar scheme to prove Theorem 4.1 and to recover (4.5) under the logarithmic Sobolev Assumption 4.1. The application of logarithmic Sobolev inequalities requires computation of the gradient of the function of interest, which in our case is the gradient of $f(X)=\sup_{A\in\mathcal{A}}(X^\top AX-\mathbb{E}X^\top AX)$. It appears that in the analysis we need to control the behaviour of $\nabla f(X)$ (or its analogues) and, as in Boucheron et al. (2003) and Adamczak (2015), we will use a truncation argument to do so. However, in both cases our proofs will pass through the entropy variational formula of Boucheron et al. (2013), which states that for random variables $Y,W$ with $\mathbb{E}\exp(W)<\infty$ it holds that
\[
\mathbb{E}\bigl(W\exp(\lambda Y)\bigr)\le\mathbb{E}\exp(\lambda Y)\log\bigl(\mathbb{E}\exp(W)\bigr)+\mathrm{Ent}\bigl(\exp(\lambda Y)\bigr).\tag{4.7}
\]
This will allow us to shorten the proofs and avoid some technicalities appearing in previous papers. Finally, to prove Theorem 4.1 we use a second truncation argument, based on the Hoffmann–Jørgensen inequality (see Ledoux and Talagrand (2013)). We also present two lemmas which will be used several times in the text. Both results have short proofs and may be of independent interest.
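For convenience, here is a short derivation sketch of (4.7) from the duality formula for entropy, $\mathrm{Ent}(U)=\sup\{\mathbb{E}[UV]:\ \mathbb{E}e^{V}\le 1\}$ for $U\ge 0$ (see Boucheron et al. (2013)); the two-line reconstruction below is our own gloss rather than a quotation:
\[
\text{taking }U=e^{\lambda Y},\ V=W-\log\mathbb{E}e^{W}\ (\text{so that }\mathbb{E}e^{V}=1)\ \text{gives}\quad
\mathbb{E}\bigl[e^{\lambda Y}\bigl(W-\log\mathbb{E}e^{W}\bigr)\bigr]\le\mathrm{Ent}\bigl(e^{\lambda Y}\bigr),
\]
which rearranges to (4.7).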

Lemma 4.1. Suppose that for random variables $Z,W$ and any $\lambda>0$ it holds that
\[
\mathrm{Ent}(e^{\lambda Z})\le\lambda^2\,\mathbb{E}We^{\lambda Z}\qquad\text{and}\qquad\mathbb{P}(W>L+\theta t)\le e^{-t},\tag{4.8}
\]
where $\theta,L$ are positive constants. Then the following concentration result holds:
\[
\mathbb{P}(Z-\mathbb{E}Z>t)\le\exp\Bigl(-c\min\Bigl(\frac{t^2}{L+\theta},\ \sqrt{\frac{t}{\theta}}\Bigr)\Bigr),\tag{4.9}
\]
where $c>0$ is an absolute constant. Moreover, if (4.8) holds for $\lambda\le 0$ as well, we have
\[
\mathbb{P}(|Z-\mathbb{E}Z|>t)\le 2\exp\Bigl(-c\min\Bigl(\frac{t^2}{L+\theta},\ \sqrt{\frac{t}{\theta}}\Bigr)\Bigr).
\]

The second technical result is a version of the convex concentration inequality of Talagrand (1996) which does not require boundedness of the components of $X$.

Lemma 4.2. Let $f:\mathbb{R}^n\to\mathbb{R}$ be a convex, $L$-Lipschitz function with respect to the Euclidean norm in $\mathbb{R}^n$, and let $X=(X_1,\dots,X_n)$ be a random vector with independent components. Then for any $t\ge CL\bigl\|\max_i|X_i|\bigr\|_{\psi_2}$ it holds that
\[
\mathbb{P}\bigl(|f(X)-\mathbb{E}f(X)|>t\bigr)\le\exp\Bigl(-\frac{c\,t^2}{L^2\bigl\|\max_i|X_i|\bigr\|^2_{\psi_2}}\Bigr),
\]
where $c,C>0$ are absolute constants.
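As a small numerical illustration of Lemma 4.2 (with arbitrary parameters, and only a qualitative comparison since the constants $c,C$ are unspecified), one can take the convex $1$-Lipschitz function $f(X)=\|X\|_2$ and bounded, hence subgaussian, coordinates:

import numpy as np

rng = np.random.default_rng(3)
n, n_samples = 200, 20000

# Independent symmetric +/-1 coordinates (bounded, hence subgaussian)
X = rng.choice([-1.0, 1.0], size=(n_samples, n))

f = np.linalg.norm(X, axis=1)            # f(X) = ||X||_2 is convex and 1-Lipschitz
dev = f - f.mean()

for t in [1.0, 2.0, 3.0, 4.0]:
    # empirical tail vs. a Gaussian-type reference exp(-t^2/2)
    print(t, np.mean(np.abs(dev) > t), np.exp(-t**2 / 2))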

We discuss the optimality of this result in what follows. Finally, we summarize the structure of the rest of this chapter and outline the main contributions:

• Section 4.1 is devoted to applications and discussions and consists of several parts. First, we give a simple proof of the uniform bound of Adamczak (2015) under the logarithmic Sobolev assumption. The second paragraph is devoted to improvements of the non-uniform Hanson–Wright inequality (4.1) in the subgaussian regime. Furthermore, we apply our techniques to obtain a uniform concentration result similar to Theorem 4.1 in a particular case of non-independent components: we consider the Ising model under Dobrushin's condition, which has attracted some attention recently (see Adamczak et al. (2018a) and Götze et al. (2018)); the question we study was raised by Marton (2003) in a closely related scenario. Finally, we show that it is not possible in general to replace $\|\max_i|X_i|\|_{\psi_2}$ with $\max_i\|X_i\|_{\psi_2}$ in Theorem 4.1, by providing an appropriate counterexample.

• In Section 4.2 we present the proof of Theorem 4.1. Along the way, we prove Lemma 4.8 and Lemma 4.2. Finally, we give a proof of Proposition 4.1.

• In Section 4.3 we prove a dimension-free matrix Bernstein inequality that holds for random matrices with subexponential spectral norm. The proof is based on the same truncation approach as in the proof of Theorem 4.1. We demonstrate how our Bernstein inequality can be used in the context of covariance estimation for subgaussian observations, improving the state-of-the-art result of Lounici (2014) for covariance estimation with missing observations.

4.1 Some applications and discussions

We begin with some notation that will be used below. For a random vector $X$ taking its values in $\mathbb{R}^n$, let $X_1,\dots,X_n$ denote its components. When all the components of $X$ are independent, let $X_i'$ denote an independent copy of the component $X_i$. The symbol $\sim$ denotes equivalence up to absolute constants and $\lesssim$ denotes an inequality up to an absolute constant. The letters $C,c>0$ denote absolute constants, which may change from line to line.

A uniform Hanson-Wright inequality under the logarithmic Sobolev condition

In this paragraph we recover the result of Adamczak (2015) under Assumption 4.1. Consider the random variable $Z$ defined by (4.3) as a function of $X$, where $X$ satisfies the logarithmic Sobolev assumption (4.6).

Following Adamczak (2015) we assume, without loss of generality, that $\mathcal{A}$ is a finite set of matrices; then $Z$ is Lebesgue-a.e. differentiable and
\[
\|\nabla Z(X)\|\le 2\sup_{A\in\mathcal{A}}\|AX\|,
\]
i.e. the gradient is bounded by a Lipschitz function of $X$ with good concentration properties.
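The displayed gradient bound can be verified by a standard envelope-type argument (spelled out here for convenience; it is not part of the quoted derivation): at a point where $Z$ is differentiable and the supremum is attained at some $A^\ast\in\mathcal{A}$,
\[
\nabla Z(X)=\nabla\bigl(X^\top A^\ast X\bigr)=\bigl(A^\ast+{A^\ast}^\top\bigr)X=2A^\ast X,
\qquad\text{hence}\qquad
\|\nabla Z(X)\|\le 2\sup_{A\in\mathcal{A}}\|AX\|,
\]
using that the matrices in $\mathcal{A}$ are symmetric.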

Remark 4.3. Note that Assumption 4.1 applies only to smooth functions, so that a standard smoothing argument should be used (see e.g. Ledoux (2001)). For the sake of completeness we recover this argument in Section 4.4. In what follows in this section we assume that none of these potential technical problems appear.

In particular, since $X$ satisfies the log-Sobolev condition with constant $K$, we have (Theorem 5.3 in Ledoux (2001)) that $\sup_{A\in\mathcal{A}}\|AX\|$, being a Lipschitz function of $X$, enjoys the concentration property (4.4). Furthermore, the logarithmic Sobolev condition implies, for any $\lambda\in\mathbb{R}$,
\[
\mathrm{Ent}(e^{\lambda Z})\le 4K^2\lambda^2\,\mathbb{E}\sup_{A\in\mathcal{A}}\|AX\|^2e^{\lambda Z}.
\]
Therefore, by Lemma 4.1 it holds for any $t\ge 1$ that
\[
\mathbb{P}\bigl(|Z-\mathbb{E}Z|>t\bigr)\le 2\exp\bigl(\cdots\bigr),
\]
which coincides with (4.5) for $K$-concentrated vectors up to absolute constant factors.

Remark 4.4. This result may be used directly to prove the concentration of $\|\hat\Sigma-\Sigma\|$, where
