4.3 Matrix Bernstein inequality in the subexponential case

As we mentioned above, one of the prominent applications of the uniform Hanson-Wright inequalities is the recent concentration result for the Gaussian covariance estimation problem. It is known that covariance estimation problems may alternatively be approached via the matrix Bernstein inequality. Following the truncation approach taken above, we provide a version of the matrix Bernstein inequality that does not require uniformly bounded matrices. The standard version of the inequality (see Tropp (2012) and the references therein) may be formulated as follows: consider independent random matrices X_1, …, X_N ∈ R^{n×n}, symmetric, centered and uniformly bounded, i.e. ‖X_i‖ ≤ U almost surely, and set σ² = ‖∑_{i=1}^N E X_i²‖. Then for all t ≥ 0,

P( ‖∑_{i=1}^N X_i‖ ≥ t ) ≤ 2n exp( −(t²/2) / (σ² + Ut/3) ).

The first problem with this result is that it does not cover the general case where only max_i ‖X_i‖_{ψ₁} or max_i ‖X_i‖_{ψ₂} is bounded. The second problem is the dependence on the dimension n, which does not allow applying it to operators in Hilbert spaces. For a positive-definite real square matrix A we define the effective rank as r̃(A) = tr(A)/‖A‖. We show the following bound.
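Before stating it, note that the effective rank just defined is cheap to compute numerically; a minimal NumPy sketch (the spiked spectrum below is our own illustrative choice, not from the text) shows that r̃(A) can stay bounded while the dimension n grows:

```python
import numpy as np

def effective_rank(A):
    """Effective rank r~(A) = tr(A) / ||A||, with ||.|| the operator norm."""
    return np.trace(A) / np.linalg.norm(A, 2)

n = 100
# Illustrative spiked spectrum: eigenvalues 1/k^2 are summable,
# so the trace stays bounded while the dimension grows.
A = np.diag(1.0 / (1.0 + np.arange(n)) ** 2)

print(effective_rank(A))          # stays below 2 for any n
print(effective_rank(np.eye(n)))  # for the identity it equals n
```

For the identity matrix the effective rank coincides with the dimension, which is exactly the regime where bounds in terms of n and of r̃ agree.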
Proposition 4.3. Suppose we have independent symmetric random matrices X_1, …, X_N ∈ R^{n×n}, each satisfying ‖X_i‖_{ψ₁} < ∞. Set M = max_{i≤N} ‖X_i‖_{ψ₁} and let a positive-definite matrix R be such that E ∑_{i=1}^N X_i² ⪯ R. Finally, set σ² = ‖R‖. There are absolute constants
Remark 4.8. Using the well-known bound for the maximum of subexponential random variables (see Ledoux and Talagrand (2013)) we have

‖ max_{i≤N} ‖X_i‖ ‖_{ψ₁} ≲ M log N.

When n = 1 the effective rank plays no role and our bound recovers the version of the classical Bernstein inequality due to Adamczak (2008). In that paper it is also shown that the log N factor cannot be removed in general, meaning that the factor M log N cannot be replaced by M = max_{i≤N} ‖X_i‖_{ψ₁} in general.
Proof. Fix U > 0 and consider the decomposition

X_i = Y_i + Z_i,  Y_i = X_i I(‖X_i‖ ≤ U),  Z_i = X_i I(‖X_i‖ > U),

so that the matrices Y_i are uniformly bounded by U in operator norm. By the triangle inequality and the union bound,

P( ‖∑_i (X_i − EX_i)‖ ≥ t₁ + t₂ ) ≤ P( ‖∑_i (Y_i − EY_i)‖ ≥ t₁ ) + P( ‖∑_i (Z_i − EZ_i)‖ ≥ t₂ ),

so the two parts can be treated separately. Throughout the proof c > 0 is an absolute constant which may change from line to line. It is known that uniformly bounded random matrices satisfy a Bernstein-type inequality (see Theorem 3.1 in Minsker (2017)) for u ≥ 1. Since we apply it to Y_i − EY_i rather than to Y_i, we need the following modification of the proof of Minsker's theorem. Using the notation of his proof, it follows from Lemma 3.1 in Minsker (2017) that
log E exp(θ(Y_i − EY_i)) ⪯ (φ(θU)/U²) E(Y_i − EY_i)² ⪯ (φ(θU)/U²) 2EY_i² ⪯ (φ(θU)/U²) 2EX_i².

Now, using the same lines of the proof, instead of formula (3.4) we have

E tr φ(θ …),

where σ² = ‖R‖. Following the last lines of the proof of Theorem 3.1 we finally have

P
Thus, we can apply Proposition 6.8 from Ledoux and Talagrand (2013) to the Z_i, taking values in the Banach space (R^{n×n}, ‖·‖) equipped with the spectral norm. We have

E

which implies, with some constant K > 0,

E

Using Theorem 6.21 from Ledoux and Talagrand (2013) in (R^{n×n}, ‖·‖) we have,

where c > 0 is an absolute constant. Combining this with (4.35), and using that for some absolute C > 0 we have U ≤ C
To the best of our knowledge, Proposition 4.3 is the first to combine two important properties: it simultaneously captures the effective rank instead of the dimension n and is valid for matrices with subexponential operator norm (previously, the matrix Bernstein inequality in the unbounded case was available under the so-called Bernstein moment condition; we refer to Tropp (2012) and the references therein). We should also compare our result with Proposition 2 of Koltchinskii (2011), which has the same form as our bound, but with the original dimension n in place of the effective rank and with M = max_{i≤N} ‖X_i‖_{ψ₁} replaced by

max_{i≤N} ‖X_i‖_{ψ₁} log( N max_{i≤N} ‖X_i‖²_{ψ₁} / σ² ).
Application to covariance estimation with missing observations
Now we turn to the problem studied in Koltchinskii and Lounici (2017) and Lounici (2014).
Suppose we want to estimate the covariance structure of a centered subgaussian random vector X ∈ R^n based on N i.i.d. observations X_1, …, X_N. For the sake of brevity we work in the finite-dimensional case, although, as in Koltchinskii and Lounici (2017), our results will not depend explicitly on the dimension n. Recall that a centered random vector X ∈ R^n is subgaussian if for all u ∈ R^n it holds that

‖⟨X, u⟩‖_{ψ₂} ≲ (E⟨X, u⟩²)^{1/2},  (4.36)

which does not require any independence of the components of X.
In what follows we discuss a more general framework suggested by Lounici (2014). Let δ_{i,j}, i ≤ N, j ≤ n, be independent Bernoulli random variables with mean δ. We assume that instead of observing X_1, …, X_N we observe vectors Y_1, …, Y_N defined by Y_i^j = δ_{i,j} X_i^j. This means that each component of the vectors X_1, …, X_N is missing (replaced by zero) independently with probability 1 − δ. Since δ can easily be estimated, we assume that it is known. Following Lounici (2014), denote
Σ̂^{(δ)} = (1/N) ∑_{i=1}^N Y_i Y_i^⊤.

It can easily be shown that the estimator

Σ̂ = (δ⁻¹ − δ⁻²) Diag(Σ̂^{(δ)}) + δ⁻² Σ̂^{(δ)}

is an unbiased estimator of Σ = E X_i X_i^⊤. In particular,

Σ = (δ⁻¹ − δ⁻²) Diag(E Y_i Y_i^⊤) + δ⁻² E Y_i Y_i^⊤.  (4.37)
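The unbiasedness of the debiased estimator can be checked by simulation; a minimal Monte Carlo sketch (NumPy; the covariance Σ and the value of δ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, delta = 5, 200_000, 0.7

# Arbitrary SPD ground-truth covariance Sigma.
B = rng.standard_normal((n, n))
Sigma = B @ B.T / n

# Observations with entries missing independently with probability 1 - delta.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Y = X * (rng.random((N, n)) < delta)

# Sigma_hat_delta = N^{-1} sum_i Y_i Y_i^T, then the debiased estimator.
S_delta = Y.T @ Y / N
Sigma_hat = (1/delta - 1/delta**2) * np.diag(np.diag(S_delta)) + S_delta / delta**2

print(np.max(np.abs(Sigma_hat - Sigma)))  # small Monte Carlo error
```

The diagonal entries of Σ̂^{(δ)} are biased by a factor δ while the off-diagonal ones are biased by δ², which is exactly what the two coefficients above undo.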
Theorem 4.3. Under the assumptions above, it holds with probability at least 1 − e^{−t} for t ≥ 1
Remark 4.9. The upper bound above provides an important improvement over Proposition 3 in Lounici (2014), which reads

‖Σ̂ − Σ‖ ≲ ‖Σ‖ max

The bound (4.38) depends on n and therefore is not applicable in infinite-dimensional scenarios. It also contains a term proportional to t², which appears due to a straightforward truncation of each observation. Moreover, this result has an unnecessary factor r̃(Σ) in the term √( r̃(Σ) t / (Nδ²) ). Finally, when δ = 1, tighter results may be obtained using high-probability generic chaining bounds for quadratic processes. In particular, Theorem 9 in Koltchinskii and Lounici (2017) implies

Unfortunately, this analysis cannot be applied for δ < 1 in general, since assumption (4.36) will not hold for the vector Y defined by Y^j = δ_j X^j. Therefore, our technique is a reasonable alternative, which works for general δ and is almost as tight as (4.39) when δ = 1.
To prove Theorem 4.3 we need the following technical lemma, parts of which may also be found in Lounici (2014). For a matrix A let Diag(A) denote its diagonal part and define Off(A) = A − Diag(A).

Lemma 4.9. Let X ∈ R^n satisfy (4.36) with covariance matrix Σ, and let Y = (δ₁X₁, …, δₙXₙ), where δ_i, i ≤ n, are independent Bernoulli random variables with mean δ. Then it holds that

‖ ‖Diag(YY^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ)‖Σ‖,  ‖ ‖Off(YY^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ)‖Σ‖.

Additionally, it holds for some absolute constant C > 0 that

E Off(YY^⊤)² ⪯ C δ² tr(Σ)(Σ + Diag(Σ)) and E Diag(YY^⊤)² ⪯ C δ tr(Σ) Diag(Σ).  (4.40)
Proof. Observe that ‖Diag(YY^⊤)‖ ≤ ‖Y‖² and ‖Off(YY^⊤)‖ ≤ ‖YY^⊤‖ + ‖Diag(YY^⊤)‖ ≤ 2‖Y‖². Therefore,

‖ ‖Off(YY^⊤)‖ ‖_{ψ₁} ≤ 2 ‖ ‖Y‖ ‖²_{ψ₂} ≤ 2 ‖ ‖X‖ ‖²_{ψ₂} ≲ tr(Σ),

and the same bound holds for ‖ ‖Diag(YY^⊤)‖ ‖_{ψ₁}.
Let A be an arbitrary symmetric matrix and let us calculate E(A ∘ δδ^⊤)², where ∘ denotes the Hadamard product and δ = (δ₁, …, δₙ) is a vector with independent components having Bernoulli distribution with mean δ. For the diagonal entries we have

[ E(A ∘ δδ^⊤)² ]_{ii} = E ∑_k A_{ik} δ_i δ_k A_{ki} δ_i δ_k = ∑_k A_{ik}² E δ_i² δ_k² = δ² [A²]_{ii} + (δ − δ²) A_{ii}².

For the element at position ij with i ≠ j we have

[ E(A ∘ δδ^⊤)² ]_{ij} = E ∑_k A_{ik} δ_i δ_k A_{kj} δ_j δ_k = ∑_k A_{ik} A_{kj} E δ_i δ_j δ_k² = δ³ [A²]_{ij} + (δ² − δ³)( A_{ii} A_{ij} + A_{ij} A_{jj} ).

This can be put together in the following expression:

E(δδ^⊤ ∘ A)² = δ³ A² + (δ² − δ³)( Diag(A²) + Off(A) Diag(A) + Diag(A) Off(A) ) + (δ − δ²) Diag(A)².

Note that all of these matrices are positive semidefinite, apart from the term Off(A) Diag(A) + Diag(A) Off(A), which we can bound by (1/2)(Off(A) + Diag(A))² = A²/2. Taking into account δ ≤ 1, we have the simple bound

E(δδ^⊤ ∘ A)² ⪯ (1/2)(δ³ + δ²) A² + (δ² − δ³) Diag(A²) + (δ − δ²) Diag(A)² ⪯ δ²( A² + Diag(A²) ) + δ Diag(A)².
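The closed-form expression for E(A ∘ δδ^⊤)² can be verified exactly by enumerating all realizations of δ; a small self-check (the matrix A and the Bernoulli mean p below are arbitrary illustrative choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 0.3                  # p plays the role of the Bernoulli mean delta

M = rng.standard_normal((n, n))
A = (M + M.T) / 2              # arbitrary symmetric matrix
D = np.diag(np.diag(A))        # Diag(A)
O = A - D                      # Off(A)

# Exact expectation of (A o dd^T)^2 over all d in {0,1}^n.
E = np.zeros((n, n))
for d in itertools.product([0, 1], repeat=n):
    d = np.asarray(d, dtype=float)
    weight = np.prod(np.where(d == 1, p, 1 - p))   # P(delta = d)
    H = A * np.outer(d, d)                         # Hadamard product A o dd^T
    E += weight * (H @ H)

# Closed form derived above.
F = (p**3 * A @ A
     + (p**2 - p**3) * (np.diag(np.diag(A @ A)) + O @ D + D @ O)
     + (p - p**2) * D @ D)

print(np.max(np.abs(E - F)))  # numerically zero
```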
Now recall that Y = diag(δ)X, and therefore Off(YY^⊤) = δδ^⊤ ∘ Off(XX^⊤). Since the latter matrix has zero diagonal, the term with δ in the formula above disappears. Therefore,

E Off(YY^⊤)² ⪯ δ² [ E Off(XX^⊤)² + Diag( E Off(XX^⊤)² ) ].  (4.41)

It holds that E Off(XX^⊤)² ⪯ 2 E(XX^⊤)² + 2 E Diag(XX^⊤)², and we also have from Lounici (2014) that E(XX^⊤)² ⪯ C tr(Σ) Σ. Additionally, due to subgaussianity (4.36) we have E X_j⁴ ≲ Σ_{jj}². Finally, the following bound holds:

E Diag(XX^⊤)² ⪯ C Diag(Σ)² ⪯ C tr(Σ) Diag(Σ).

Plugging these bounds into (4.41) we get the bound on E Off(YY^⊤)² in (4.40).

As for the diagonal, we have for A = Diag(XX^⊤),

E Diag(YY^⊤)² ⪯ 3δ E Diag(XX^⊤)² ⪯ C δ tr(Σ) Diag(Σ).
Lemma 4.10. For Y as in Lemma 4.9 and any unit vector u ∈ R^n it holds that

‖u^⊤ Off(YY^⊤) u‖²_{L₂} ≲ δ² ‖Σ‖²,  ‖u^⊤ Diag(YY^⊤) u‖²_{L₂} ≲ δ ‖Σ‖².
Proof. Let v ∈ R^n be another arbitrary unit vector. First we want to check that

‖u^⊤ Diag(XX^⊤) v‖_{L₄} ≲ ‖Σ‖,  ‖u^⊤ Off(XX^⊤) v‖_{L₄} ≲ ‖Σ‖.  (4.42)

Obviously, ‖u^⊤ XX^⊤ v‖_{L₄} ≤ ‖u^⊤X‖_{L₈} ‖v^⊤X‖_{L₈} ≲ ‖Σ‖, so it is enough to check only the diagonal part. Let us apply a symmetrization argument. Suppose ε = (ε₁, …, ε_d)^⊤ are independent Rademacher variables; then

u^⊤ Diag(XX^⊤) v = E_ε ε^⊤ diag(u) XX^⊤ diag(v) ε = E_ε u_ε^⊤ XX^⊤ v_ε,

where u_ε = (u₁ε₁, …, u_d ε_d)^⊤, v_ε is defined analogously, and E_ε denotes expectation conditioned on X. Then, by the Jensen and Hölder inequalities,

E( u^⊤ Diag(XX^⊤) v )⁴ ≤ E( u_ε^⊤ XX^⊤ v_ε )⁴ ≤ E_ε E^{1/2}[ (u_ε^⊤X)⁸ | ε ] E^{1/2}[ (v_ε^⊤X)⁸ | ε ] ≲ ‖Σ‖⁴,

thus implying (4.42).
Next, let us consider a symmetric matrix B with zero diagonal. We have

Therefore, due to the fact that B is symmetric, we have

E(δ^⊤ B δ)² = δ⁴ ∑

As for the diagonal, we have

E
Before we proceed with the proof of the deviation bound, let us present the following version of Talagrand's concentration inequality for empirical processes, which will help us capture the tail behavior in the subgaussian regime. Remarkably, this result can be proven using very similar techniques: one may first use the modified logarithmic Sobolev inequality to prove a version of Talagrand's concentration inequality in the bounded case, and then use truncation as in the proof of Theorem 4.1 to obtain the result in the unbounded case.

Theorem 4.4 (Theorem 4 in Adamczak (2008)). Let X₁, …, X_N ∈ X be independent sample
Proof of Theorem 4.3. First, using (4.37) we have

‖Σ̂ − Σ‖ ≲ δ⁻¹

We have r̃(R) ≤ 2 r̃(Σ) and ‖R‖ ≲ N δ² tr(Σ) ‖Σ‖. Therefore, with probability at least 1 − e^{−t},

Integrating this bound (see, e.g., Theorem 2.3 in Boucheron et al. (2013)) we easily get

E‖Off(Σ̂^{(δ)}) − E Off(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max

We proceed with the diagonal term. Applying Proposition 4.3 to the sum N Diag(Σ̂^{(δ)}) = ∑_{i=1}^N Diag(Y_i Y_i^⊤) with R = C N δ tr(Σ) Diag(Σ), we have r̃(R) ≲ r̃(Σ) and ‖R‖ ≲ N δ tr(Σ) ‖Σ‖. Thus, with probability at least 1 − e^{−t} we get

‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max

Again, integrating this inequality we get a bound for the expectation,

E‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max
We have ‖u^⊤ Diag(Y_i Y_i^⊤) u‖²_{L₂} ≲ δ ‖Σ‖² and ‖ max_i ‖Diag(Y_i Y_i^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ) ‖Σ‖ log N by Lemma 4.10 and Lemma 4.9. By Theorem 4.4 we have, with probability at least 1 − e^{−t},

‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≤ 2 E‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ + ‖Σ‖ √(δt/N) + ‖Σ‖ r̃(Σ) t log N / N
≲ ‖Σ‖ max( √( δ r̃(Σ) log r̃(Σ) / N ), √( δt/N ), r̃(Σ)( log r̃(Σ) + t ) log N / N ).
It is left to combine the off-diagonal and diagonal bounds:

‖Σ̂ − Σ‖ ≤ δ⁻² ‖Off(Σ̂^{(δ)}) − E Off(Σ̂^{(δ)})‖ + δ⁻¹ ‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖.
4.4 Approximation argument for non-smooth functions
In this section we explain how one can rigorously apply the Sobolev inequality to functions that are not everywhere differentiable. In order to use Assumption (4.6), we need to take smooth approximations of the function
Z(X) = sup_A ( X^⊤ A X − E X^⊤ A X ).

Notice that we have

|Z(X) − Z(Y)| ≤ ‖X − Y‖ ( sup_A ‖AX‖ + sup_A ‖AY‖ ).
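This Lipschitz-type bound can be sanity-checked numerically for a finite family of symmetric matrices A; the random family and sample pairs below are purely illustrative, and the centering term E X^⊤AX is dropped since, being constant in X for each A, it cancels inside each difference:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Finite illustrative family of symmetric matrices over which the sup is taken.
mats = [(M + M.T) / 2 for M in rng.standard_normal((20, n, n))]

def Z(x):
    # sup_A x^T A x (per-A centering constants cancel in Z(X) - Z(Y))
    return max(x @ A @ x for A in mats)

ok = True
for _ in range(100):
    X, Y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = abs(Z(X) - Z(Y))
    rhs = np.linalg.norm(X - Y) * (max(np.linalg.norm(A @ X) for A in mats)
                                   + max(np.linalg.norm(A @ Y) for A in mats))
    ok = ok and lhs <= rhs + 1e-9
print(ok)  # True: the inequality holds on all sampled pairs
```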
The following simple lemma shows how to apply the logarithmic Sobolev inequality to non-differentiable functions satisfying such a bound.
Lemma 4.11. Suppose a random vector X satisfies Assumption 4.1. Let f : R^n → R be such that

|f(x) − f(y)| ≤ |x − y| max( L(x), L(y) )

for some continuous L(x) ≥ 0. Then, for some absolute constant C > 0 and any λ ∈ R it holds that

Ent( e^{λf} ) ≤ C K² λ² E L(X)² e^{λ f(X)}.
Proof. Set h(x) = x²(1 − x)²₊ and consider a smoothing kernel supported on the unit ball,
Moreover, due to the symmetry we have

∇F_m(x) =

By Assumption 4.1,

Ent(F_m²) ≤ K² E‖∇F_m(x)‖² ≤ 2 C_g K² E L_m(x)² F̃_m(x)²,

and taking the limit m → ∞ gives the required inequality.
Appendix A Technical tools
A.1 Lasso and missing observations
Suppose we observe a signal y ∈ R^n of the form

y = Φ b* + ε,

where Φ = [φ₁, …, φ_p] ∈ R^{n×p} is a dictionary of words φ_j ∈ R^n and b* is a sparse parameter with support Λ ⊂ {1, …, p}. We want to recover the exact sparse representation by solving the quadratic program

(1/2) ‖y − Φb‖² + γ ‖b‖₁ → min_{b ∈ R^p}.  (A.1)

Denote by R^Λ the set of vectors with elements indexed by Λ; for x ∈ R^p let x_Λ ∈ R^Λ be the result of taking only the elements indexed by Λ. With some abuse of notation we will also associate each vector x_Λ ∈ R^Λ with the vector x ∈ R^p that has the same coefficients on Λ and zeros elsewhere. Let also Φ_Λ = [φ_j]_{j∈Λ} be the subdictionary composed of the words indexed by Λ, and let P_Λ be the projector onto the corresponding subspace.
The following sufficient conditions for the global minimizer of (A.1) to be supported on Λ are due to Tropp (2006), who uses the notion of the exact recovery coefficient,

ERC_Φ(Λ) = 1 − max_{j ∉ Λ} ‖Φ_Λ⁺ φ_j‖₁.

The results are summarized in the next theorem.

Theorem A.1 (Tropp (2006)). Let b̃ be a solution to (A.1). Suppose that ‖Φ^⊤ ε‖_∞ ≤ γ ERC(Λ). Then,

• the support of b̃ is contained in Λ;
• the distance between b̃ and the optimal (non-penalized) parameter satisfies

‖b̃ − b*‖_∞ ≤ ‖Φ_Λ⁺ ε‖_∞ + γ ‖(Φ_Λ^⊤ Φ_Λ)⁻¹‖_{1,∞},  ‖Φ_Λ(b̃ − b*) − P_Λ ε‖₂ ≤ γ ‖(Φ_Λ⁺)^⊤‖_{2,∞}.
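The exact recovery coefficient is directly computable from the dictionary; a minimal sketch (the random unit-norm dictionary and the chosen support are illustrative):

```python
import numpy as np

def erc(Phi, Lam):
    """ERC_Phi(Lambda) = 1 - max_{j not in Lambda} ||Phi_Lambda^+ phi_j||_1."""
    Lam = list(Lam)
    pinv = np.linalg.pinv(Phi[:, Lam])        # Phi_Lambda^+
    outside = [j for j in range(Phi.shape[1]) if j not in Lam]
    return 1.0 - max(np.linalg.norm(pinv @ Phi[:, j], 1) for j in outside)

rng = np.random.default_rng(3)
n, p = 50, 20
Phi = rng.standard_normal((n, p))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm words

print(erc(Phi, [0, 1, 2]))  # positive values make exact recovery possible
```

For an orthonormal dictionary the coefficient equals 1, the most favorable case.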
In what follows we want to extend this result to allow for missing observations. Observe that the program (A.1) is equivalent to

(1/2) b^⊤ [Φ^⊤Φ] b − b^⊤ [Φ^⊤ y] + γ ‖b‖₁ → min_{b ∈ R^p},

so that for the minimization procedure only the knowledge of D = Φ^⊤Φ and c = Φ^⊤ y is required.
Suppose that instead we only have access to estimators D̂ ⪰ 0 and ĉ that are close enough to the original matrix and vector; these may come, e.g., from a missing observations model. Then we can solve the following problem instead:

(1/2) b^⊤ D̂ b − b^⊤ ĉ + γ ‖b‖₁ → min_{b ∈ R^p}.  (A.2)

In what follows we provide a slight extension of Tropp's result towards missing observations; the proof mainly follows the same steps.
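Since (A.2) depends on the data only through D̂ and ĉ, it can be solved by standard proximal gradient (ISTA) on these surrogates directly; a minimal sketch (the step size, iteration count, and the fully observed test data are our own illustrative choices, not prescribed by the text):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_from_moments(D_hat, c_hat, gamma, n_iter=5000):
    """Minimize (1/2) b^T D_hat b - b^T c_hat + gamma ||b||_1 via ISTA."""
    b = np.zeros(len(c_hat))
    step = 1.0 / np.linalg.norm(D_hat, 2)     # 1/L, L = largest eigenvalue
    for _ in range(n_iter):
        b = soft_threshold(b - step * (D_hat @ b - c_hat), step * gamma)
    return b

# Sanity check in the fully observed case: D_hat = Phi^T Phi, c_hat = Phi^T y.
rng = np.random.default_rng(4)
n, p = 100, 30
Phi = rng.standard_normal((n, p))
b_star = np.zeros(p)
b_star[:3] = [2.0, -1.5, 1.0]
y = Phi @ b_star + 0.1 * rng.standard_normal(n)

b_tilde = lasso_from_moments(Phi.T @ Phi, Phi.T @ y, gamma=2.0)
print(np.nonzero(b_tilde)[0])  # compare with the true support {0, 1, 2}
```

With missing observations one would plug in debiased surrogates D̂ and ĉ in place of the exact moments, which is precisely the setting of the results below.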
Further, for a matrix D and two sets of indices A, B we denote the corresponding submatrix by D_{A,B}, and for a vector c the corresponding subvector by c_A.
Lemma A.1. Suppose that

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ).

Then the solution b̃ to (A.2) is supported on Λ.
Proof. Let b̃ be the solution to (A.2) under the restriction supp(b) ⊂ Λ. Since D̂ ⪰ 0 this is a convex problem, and therefore the solution is unique and satisfies

D̂_{Λ,Λ} b̃ − ĉ_Λ + γ g = 0,  g ∈ ∂‖b̃‖₁,

where ∂f(b) denotes the subdifferential of a convex function f at the point b; in the case of the ℓ₁ norm we have ‖g‖_∞ ≤ 1. Thus,

b̃ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} g.  (A.3)
Next, we want to check that b̃ is a global minimizer. To do so, let us compare the objective function at the point b = b̃ + δ e_j for an arbitrary index j ∉ Λ. Since ‖b‖₁ = ‖b̃‖₁ + |δ|, we have

L(b) − L(b̃) = (1/2) b^⊤ D̂ b − (1/2) b̃^⊤ D̂ b̃ − ĉ^⊤(b − b̃) + γ|δ|
= (δ²/2) e_j^⊤ D̂ e_j + |δ|γ + δ e_j^⊤ D̂ b̃ − δ ĉ_j
≥ |δ|γ + δ e_j^⊤ D̂ b̃ − δ ĉ_j,

where the latter comes from the fact that D̂ is positive semidefinite. Applying the equality (A.3) yields

e_j^⊤ D̂ b̃ = D̂_{j,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂_{j,Λ} D̂⁻¹_{Λ,Λ} g;

therefore, taking into account ‖g‖_∞ ≤ 1, we have

L(b) − L(b̃) ≥ |δ| [ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ) − | D̂_{j,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_j | ],

where the right-hand side is nonnegative by the condition of the lemma. Since j ∉ Λ is arbitrary, b̃ is also a global solution.
Remark A.1. It is not hard to see that in the exact case D̂ = Φ^⊤Φ and ĉ = Φ^⊤ y, the condition of the lemma above turns into the condition ‖Φ_{Λᶜ}^⊤ (P_Λ − I) ε‖_∞ ≤ γ ERC(Λ) of Theorem A.1.
Since we are particularly interested in applications to time series, the feature matrix Φ should in fact be random; thus imposing an ERC-like condition on it directly might result in additional unnecessary technical difficulties. Instead, let us assume that there is some other matrix D̄, potentially the expectation of Φ^⊤Φ, which is close enough to D̂ (with some probability, although we state all the results in this section deterministically), and the value that controls the exact recovery becomes

ERC(Λ; D̄) = 1 − ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞}.

Additionally, we set c̄ = D̄ b* = D̄_{·,Λ} b*_Λ, the vector that ĉ is intended to approximate. Note that in this case we have D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ − c̄_{Λᶜ} = D̄_{Λᶜ,Λ} b*_Λ − c̄_{Λᶜ} = 0; thus the conditions of Lemma A.1 hold for D̄, c̄ once ERC(Λ; D̄) and γ are nonnegative. In what follows we control the quantities appearing in the lemma for D̂ and ĉ through the differences between c̄, D̄ and ĉ, D̂, respectively, thus allowing exact recovery of the sparsity pattern.
Corollary A.1. Let D̄ and c̄ be such that c̄ = D̄ b*. Assume that

‖ĉ − c̄‖_∞ ≤ δ_c,  ‖D̄⁻¹_{Λ,Λ}(ĉ_Λ − c̄_Λ)‖_∞ ≤ δ'_c,  ‖D̄⁻¹_{Λ,Λ}(D̂_{Λ,·} − D̄_{Λ,·})‖_{∞,∞} ≤ δ_D,
‖(D̂_{·,Λ} − D̄_{·,Λ}) b*_Λ‖_∞ ≤ δ'_D,  ‖D̄⁻¹_{Λ,Λ}(D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ ≤ δ''_D.

Suppose that ERC(Λ) ≥ 3/4 and

3δ_c + 3δ'_D ≤ γ,  s δ_D ≤ 1/16,

where |Λ| = s. Then the solution to (A.2) is supported on Λ and satisfies

b̃_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} g,  (A.4)

with some g ∈ R^s satisfying ‖g‖_∞ ≤ 1. The max-norm error satisfies

‖b̃ − b*‖_∞ ≤ 2( δ''_D + δ'_c + γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞} ),

while the ℓ₂-norm error satisfies

‖b̃ − b*‖ ≤ 2√s ( δ''_D + δ'_c + γ σ_min⁻¹ ).

If additionally 2( δ''_D + δ'_c + γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞} ) ≤ min_{j∈Λ} |b*_j|, then we have exact recovery, so that the following equality takes place:

b̃_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} s_Λ,  where s = sign(b*).
Proof. First observe that D_{Λᶜ,Λ} D⁻¹_{Λ,Λ} c_Λ − c_{Λᶜ} = Φ_{Λᶜ}^⊤ (Φ_Λ⁺ y − y) = Φ_{Λᶜ}^⊤ (P_Λ − I) ε. By Lemma A.2 we have

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} + 4 s δ_D ≤ 1/2,

while, since c̄_{Λᶜ} = D̄_{Λᶜ,Λ} b*_Λ = D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ,

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞ + ‖ĉ_{Λᶜ} − c̄_{Λᶜ}‖_∞
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + ‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ + ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞ + δ_c
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + ‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ + δ'_D + δ_c.

Here ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ ≤ δ_c/2 due to ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ 1/2. Moreover, we have

‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ = ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ‖(D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞
≤ δ'_D / 2.

Using the condition on γ, we get that

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ (3/2)(δ'_D + δ_c) ≤ γ/2 ≤ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ),

so that the conditions of Lemma A.1 are satisfied and (A.4) takes place. This allows us to write

b̃_Λ − b*_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − D̄⁻¹_{Λ,Λ} c̄_Λ − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ + D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ + D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ} [ D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ + D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̄⁻¹_{Λ,Λ} g ].

By Lemma A.2 we have ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{∞→∞} ≤ 2, so that

‖b̃_Λ − b*_Λ‖_∞ ≤ 2 ‖D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ + 2 ‖D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + 2γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞},

and since we also have |||D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}|||_op ≤ 2 and ‖g‖ ≤ √s, it holds that

‖b̃_Λ − b*_Λ‖ ≤ 2√s ( ‖D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ + ‖D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + γ |||D̄⁻¹_{Λ,Λ}|||_op ).
We now present a technical lemma, used in the proof above, which collects some elementary inequalities.
Lemma A.2. Set δ_c = ‖ĉ − c̄‖_∞ and δ_D = ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ}‖_{∞,∞}. Suppose that ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ≤ 1 and s δ_D ≤ 1/2. Then it holds that

• for each q ≥ 1,

‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} ≤ 2,  ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{q→q} ≤ 2;

•

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} − D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ≤ 4 s δ_D.

Proof. First, we have
‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} = ‖I + (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̂⁻¹_{Λ,Λ}‖_{q→q}
≤ 1 + ‖(D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ}‖_{q→q} ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q}
≤ 1 + s δ_D ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q},

which, after solving the inequality and using s δ_D ≤ 1/2, turns into

‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} ≤ 1/(1 − s δ_D) ≤ 2.

Similarly, ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{q→q} ≤ 2.

Furthermore,

‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{1→1} ≤ 2 s δ_D,

and

‖D̄_{Λᶜ,Λ} (D̄⁻¹_{Λ,Λ} − D̂⁻¹_{Λ,Λ})‖_{1,∞} ≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̂⁻¹_{Λ,Λ} (D̂_{Λ,Λ} − D̄_{Λ,Λ})‖_{1→1}
≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{1→1} ‖D̄⁻¹_{Λ,Λ} (D̂_{Λ,Λ} − D̄_{Λ,Λ})‖_{1→1}
≤ 2 ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} s δ_D,

which together give the second inequality.
A.2 Gaussian approximation for change point statistic
Let X₁, …, X_n ∈ R^d be a martingale difference sequence (MDS) with β-mixing coefficients b_k. Assume additionally that, with probability one,

|X_{ij}| ≤ D_n,  1 ≤ i ≤ n, 1 ≤ j ≤ p.

Theorem A.2 (Chernozhukov et al. (2013), Theorem B.1). Suppose positive r, q are such that r + q ≤ n/2 and for some c₁, C₁ > 0 and 0 < c₂ < 1/4,

c₁ ≤ σ(q) ≤ σ(q) ∨ σ(r) ≤ C₁
Suppose we have another MDS X′₁, …, X′_n, from which we construct a statistic Ť′ analogous to (A.5). Suppose this sequence has β-mixing coefficients bounded by the same values b_k and the entries of the vectors bounded a.s. by the same D_n. Finally, let us set Σ′ = (1/n) ∑_{i=1}^n E X′_i X′_i^⊤. Combining the result above with Gaussian comparison and anti-concentration bounds, we get the following corollary.
Lemma A.3. Suppose there are positive q, r such that q + r < n/2 and there are c₁, C₁ > 0

Proof. Simply apply Theorem A.2 together with Theorem 2 of Chernozhukov et al. (2015) and Theorem 1 of Chernozhukov et al. (2017).
Let now X₁, …, X_n ∈ R^p be a martingale difference sequence with β-mixing coefficients

into the above form. Following Zhilova (2015) we consider the following approximation. Let G_ε be an ε-net of the unit sphere in R^p such that for each a ∈ R^p it holds

and assume that for each such I it holds that

‖V′_I − V‖ ≤ Δ_I,  Δ_q = max_{|I|=q} Δ_I.

Denote by analogy the test statistic T̂′ and the vectors X̃′_i. In what follows we assume that the dimension p is constant and that the size of S grows with n. Moreover, assume that |X_{ij}|, |X′_{ij}| ≤ D_n for each i, j and that T̂, T̂′ ≤ A_n, all with probability at least 1 − 1/n. Moreover, assume Δ_r, Δ_q ≤ c₁/2. Then, for any C₂ > 0 there are c, C > 0 that only depend

covariance difference Δ. We have (assuming s₁ ≤ s₂)

between the two is bounded by

|Σ_{jk} − Σ′_{jk}| ≤ a² s₁
Bibliography
Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability.
Adamczak, R. (2015). A note on the Hanson-Wright inequality for random vectors with dependencies. Electronic Communications in Probability.
Adamczak, R., Kotowski, M., Polaczyk, B., and Strzelecki, M. (2018a). A note on concentration for polynomials in the Ising model. arxiv.org/abs/1809.03187.
Adamczak, R., Latała, R., and Meller, R. (2018b). Hanson-Wright inequality in Banach spaces. arXiv preprint arXiv:1811.00353.
Adamczak, R. and Wolff, P. (2015). Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probab. Theory Relat. Fields.
Adams, Z., Füss, R., and Gropp, R. (2014). Spillover effects among financial institutions: A state-dependent sensitivity value-at-risk approach. Journal of Financial and Quantitative Analysis, 49(3):575–598.
Arcones, M. and Gine, E. (1993). On decoupling, series expansions, and tail behavior of chaos processes. Journal of Theoretical Probability.
Avanesov, V. and Buzun, N. (2016). Change-point detection in high-dimensional covariance structure. arXiv preprint arXiv:1610.03783.
Avery, C. N., Chevalier, J. A., and Zeckhauser, R. J. (2016). The “CAPS” Prediction System and Stock Market Returns. Review of Finance, 20(4):1363–1381.
Baele, L. and Inghelbrecht, K. (2010). Time-varying integration, interdependence and contagion. Journal of International Money and Finance, 29(5):791–818.
Bauwens, L., Laurent, S., and Rombouts, J. V. (2006). Multivariate GARCH models: a survey. Journal of Applied Econometrics, 21(1):79–109.
Borell, C. (1984). On the Taylor series of a Wiener polynomial. Seminar Notes on multiple stochastic integration, polynomial chaos and their integration. Case Western Reserve Univ., Cleveland.
Boucheron, S., Bousquet, O., and Lugosi, G. (2005a). Theory of classification: A survey of some recent advances. ESAIM: probability and statistics, 9:323–375.
Boucheron, S., Bousquet, O., Lugosi, G., and Massart, P. (2005b). Moment inequalities for functions of independent random variables. The Annals of Probability.
Boucheron, S., Lugosi, G., and Massart, P. (2003). Concentration inequalities using the entropy method. The Annals of Probability.
Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press.
Brody, S. and Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 562–570, Stroudsburg, PA, USA. Association for Computational Linguistics.
Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. P. (2010). Measuring user influence in twitter: The million follower fallacy. In fourth international AAAI conference on weblogs and social media.
Chen, C. Y., Després, R., Guo, L., and Renault, T. (2019a). What makes cryptocurrencies special? Investor sentiment and price predictability during the bubble. working paper.
Chen, C. Y.-H., Härdle, W. K., and Okhrin, Y. (2019b). Tail event driven networks of SIFIs.
Journal of Econometrics, 208(1):282–298.
Chen, S. and Schienle, M. (2019). Pre-screening and reduced rank regression for high-dimensional cointegration. KIT working paper.
Chen, X. and Fan, Y. (2006a). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. Journal of Econometrics, 135(1-2):125–154.
Chen, X. and Fan, Y. (2006b). Estimation of copula-based semiparametric time series models.
Journal of Econometrics, 130(2):307–335.
Chen, Y., Härdle, W. K., and Pigorsch, U. (2010). Localized realized volatility modeling.
Journal of the American Statistical Association, 105(492):1376–1393.
Chen, Y. and Niu, L. (2014). Adaptive dynamic Nelson–Siegel term structure model with applications. Journal of Econometrics, 180(1):98–115.
Chen, Y., Trimborn, S., and Zhang, J. (2018). Discover regional and size effects in global bitcoin blockchain via sparse-group network autoregressive modeling. Available at SSRN 3245031.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Testing many moment inequalities.
arXiv preprint arXiv:1312.7614.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162(1-2):47–70.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2017). Detailed proof of Nazarov’s inequality. arXiv preprint arXiv:1711.10696.
Chernozhukov, V., Härdle, W. K., Huang, C., and Wang, W. (2018). Lasso-driven inference
in time and space. arXiv preprint arXiv:1806.05081.