4.3 Matrix Bernstein inequality in the subexponential case

As we mentioned above, one of the prominent applications of the uniform Hanson-Wright inequalities is the recent concentration result for the Gaussian covariance estimation problem. It is known that covariance estimation problems may alternatively be approached via the matrix Bernstein inequality. Following the truncation approach taken above, we provide a version of the matrix Bernstein inequality that does not require uniformly bounded matrices. The standard version of the inequality (see Tropp (2012) and the references therein) may be formulated as follows: consider independent random matrices X_1, …, X_N ∈ R^{n×n}, symmetric, centered and uniformly bounded, i.e. ‖X_i‖ ≤ U almost surely, and set σ² = ‖∑_{i=1}^N E X_i²‖. Then for all t ≥ 0,

P( ‖∑_{i=1}^N X_i‖ ≥ t ) ≤ 2n exp( −(t²/2) / (σ² + Ut/3) ).

The first problem with this result is that it does not cover the general case where only max_i ‖X_i‖_{ψ₁} or max_i ‖X_i‖_{ψ₂} is bounded. The second problem is the dependence on the dimension n, which does not allow applying it to operators in Hilbert spaces. For a positive-definite real square matrix A we define the effective rank as r̃(A) = tr(A)/‖A‖. We show the following bound.
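Before stating it, note that the effective rank just defined is cheap to compute numerically; a minimal NumPy sketch (the spiked spectrum below is our own illustrative choice, not from the text) shows that r̃(A) can stay bounded while the dimension n grows:

```python
import numpy as np

def effective_rank(A):
    """Effective rank r~(A) = tr(A) / ||A||, with ||.|| the operator norm."""
    return np.trace(A) / np.linalg.norm(A, 2)

n = 100
# Illustrative spiked spectrum: eigenvalues 1/k^2 are summable,
# so the trace stays bounded while the dimension grows.
A = np.diag(1.0 / (1.0 + np.arange(n)) ** 2)

print(effective_rank(A))          # stays below 2 for any n
print(effective_rank(np.eye(n)))  # for the identity it equals n
```

For the identity matrix the effective rank coincides with the dimension, which is exactly the regime where bounds in terms of n and of r̃ agree.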
Proposition 4.3. Suppose we have independent symmetric random matrices X_1, …, X_N ∈ R^{n×n}, each satisfying ‖X_i‖_{ψ₁} < ∞. Set M = max_{i≤N} ‖X_i‖_{ψ₁} and let a positive-definite matrix R be such that E ∑_{i=1}^N X_i² ⪯ R. Finally, set σ² = ‖R‖. There are absolute constants
Remark 4.8. Using the well-known bound for the maximum of subexponential random variables (see Ledoux and Talagrand (2013)) we have

‖ max_{i≤N} ‖X_i‖ ‖_{ψ₁} ≲ M log N.

When n = 1 the effective rank plays no role and our bound recovers the version of the classical Bernstein inequality due to Adamczak (2008). In that paper it is also shown that the log N factor cannot be removed in general, meaning that the factor M log N cannot be replaced by M = max_{i≤N} ‖X_i‖_{ψ₁} in general.
Proof. Fix U > 0 and consider the decomposition

X_i = Y_i + Z_i,  Y_i = X_i I(‖X_i‖ ≤ U),  Z_i = X_i I(‖X_i‖ > U),

so that the matrices Y_i are uniformly bounded by U in operator norm. By the triangle inequality and the union bound,

P( ‖∑_i (X_i − EX_i)‖ ≥ t₁ + t₂ ) ≤ P( ‖∑_i (Y_i − EY_i)‖ ≥ t₁ ) + P( ‖∑_i (Z_i − EZ_i)‖ ≥ t₂ ),

so the two parts can be treated separately. Throughout the proof c > 0 is an absolute constant which may change from line to line. It is known that uniformly bounded random matrices satisfy a Bernstein-type inequality (see Theorem 3.1 in Minsker (2017)) for u ≥ 1. Since we apply it to Y_i − EY_i rather than to Y_i, we need the following modification of the proof of Minsker's theorem. Using the notation of his proof, it follows from Lemma 3.1 in Minsker (2017) that
log E exp(θ(Y_i − EY_i)) ⪯ (φ(θU)/U²) E(Y_i − EY_i)² ⪯ (φ(θU)/U²) 2EY_i² ⪯ (φ(θU)/U²) 2EX_i².

Now, using the same lines of the proof, instead of formula (3.4) we have

E tr φ(θ …),

where σ² = ‖R‖. Following the last lines of the proof of Theorem 3.1 we finally have

P
Thus, we can apply Proposition 6.8 from Ledoux and Talagrand (2013) to the Z_i, taking values in the Banach space (R^{n×n}, ‖·‖) equipped with the spectral norm. We have

E

which implies, with some constant K > 0,

E

Using Theorem 6.21 from Ledoux and Talagrand (2013) in (R^{n×n}, ‖·‖) we have,

where c > 0 is an absolute constant. Combining this with (4.35), and using that for some absolute C > 0 we have U ≤ C
To the best of our knowledge, Proposition 4.3 is the first to combine two important properties: it simultaneously captures the effective rank instead of the dimension n and is valid for matrices with subexponential operator norm (previously, the matrix Bernstein inequality in the unbounded case was available under the so-called Bernstein moment condition; we refer to Tropp (2012) and the references therein). We should also compare our result with Proposition 2 of Koltchinskii (2011), which has the same form as our bound, but with the original dimension n in place of the effective rank and with M = max_{i≤N} ‖X_i‖_{ψ₁} replaced by

max_{i≤N} ‖X_i‖_{ψ₁} log( N max_{i≤N} ‖X_i‖²_{ψ₁} / σ² ).
Application to covariance estimation with missing observations
Now we turn to the problem studied in Koltchinskii and Lounici (2017) and Lounici (2014).
Suppose we want to estimate the covariance structure of a centered subgaussian random vector X ∈ R^n based on N i.i.d. observations X_1, …, X_N. For the sake of brevity we work in the finite-dimensional case, although, as in Koltchinskii and Lounici (2017), our results will not depend explicitly on the dimension n. Recall that a centered random vector X ∈ R^n is subgaussian if for all u ∈ R^n it holds that

‖⟨X, u⟩‖_{ψ₂} ≲ (E⟨X, u⟩²)^{1/2},  (4.36)

which does not require any independence of the components of X.
In what follows we discuss a more general framework suggested by Lounici (2014). Let δ_{i,j}, i ≤ N, j ≤ n, be independent Bernoulli random variables with mean δ. We assume that instead of observing X_1, …, X_N we observe vectors Y_1, …, Y_N defined by Y_i^j = δ_{i,j} X_i^j. This means that each component of the vectors X_1, …, X_N is missing (replaced by zero) independently with probability 1 − δ. Since δ can easily be estimated, we assume that it is known. Following Lounici (2014), denote
Σ̂^{(δ)} = (1/N) ∑_{i=1}^N Y_i Y_i^⊤.

It can easily be shown that the estimator

Σ̂ = (δ⁻¹ − δ⁻²) Diag(Σ̂^{(δ)}) + δ⁻² Σ̂^{(δ)}

is an unbiased estimator of Σ = E X_i X_i^⊤. In particular,

Σ = (δ⁻¹ − δ⁻²) Diag(E Y_i Y_i^⊤) + δ⁻² E Y_i Y_i^⊤.  (4.37)
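The unbiasedness of the debiased estimator can be checked by simulation; a minimal Monte Carlo sketch (NumPy; the covariance Σ and the value of δ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, delta = 5, 200_000, 0.7

# Arbitrary SPD ground-truth covariance Sigma.
B = rng.standard_normal((n, n))
Sigma = B @ B.T / n

# Observations with entries missing independently with probability 1 - delta.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Y = X * (rng.random((N, n)) < delta)

# Sigma_hat_delta = N^{-1} sum_i Y_i Y_i^T, then the debiased estimator.
S_delta = Y.T @ Y / N
Sigma_hat = (1/delta - 1/delta**2) * np.diag(np.diag(S_delta)) + S_delta / delta**2

print(np.max(np.abs(Sigma_hat - Sigma)))  # small Monte Carlo error
```

The diagonal entries of Σ̂^{(δ)} are biased by a factor δ while the off-diagonal ones are biased by δ², which is exactly what the two coefficients above undo.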
Theorem 4.3. Under the assumptions above, it holds with probability at least 1 − e^{−t} for t ≥ 1
Remark 4.9. The upper bound above provides an important improvement over Proposition 3 in Lounici (2014), which reads

‖Σ̂ − Σ‖ ≲ ‖Σ‖ max

The bound (4.38) depends on n and therefore is not applicable in infinite-dimensional scenarios. It also contains a term proportional to t², which appears due to a straightforward truncation of each observation. Moreover, this result has an unnecessary factor r̃(Σ) in the term √( r̃(Σ) t / (Nδ²) ). Finally, when δ = 1, tighter results may be obtained using high-probability generic chaining bounds for quadratic processes. In particular, Theorem 9 in Koltchinskii and Lounici (2017) implies

Unfortunately, this analysis cannot be applied for δ < 1 in general, since assumption (4.36) will not hold for the vector Y defined by Y^j = δ_j X^j. Therefore, our technique is a reasonable alternative, which works for general δ and is almost as tight as (4.39) when δ = 1.
To prove Theorem 4.3 we need the following technical lemma, parts of which may also be found in Lounici (2014). For a matrix A let Diag(A) denote its diagonal part and define Off(A) = A − Diag(A).

Lemma 4.9. Let X ∈ R^n satisfy (4.36) with covariance matrix Σ, and let Y = (δ₁X₁, …, δₙXₙ), where δ_i, i ≤ n, are independent Bernoulli random variables with mean δ. Then it holds that

‖ ‖Diag(YY^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ)‖Σ‖,  ‖ ‖Off(YY^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ)‖Σ‖.

Additionally, it holds for some absolute constant C > 0 that

E Off(YY^⊤)² ⪯ C δ² tr(Σ)(Σ + Diag(Σ)) and E Diag(YY^⊤)² ⪯ C δ tr(Σ) Diag(Σ).  (4.40)
Proof. Observe that ‖Diag(YY^⊤)‖ ≤ ‖Y‖² and ‖Off(YY^⊤)‖ ≤ ‖YY^⊤‖ + ‖Diag(YY^⊤)‖ ≤ 2‖Y‖². Therefore,

‖ ‖Off(YY^⊤)‖ ‖_{ψ₁} ≤ 2 ‖ ‖Y‖ ‖²_{ψ₂} ≤ 2 ‖ ‖X‖ ‖²_{ψ₂} ≲ tr(Σ),

and the same bound holds for ‖ ‖Diag(YY^⊤)‖ ‖_{ψ₁}.
Let A be an arbitrary symmetric matrix and let us calculate E(A ∘ δδ^⊤)², where ∘ denotes the Hadamard product and δ = (δ₁, …, δₙ) is a vector with independent components having Bernoulli distribution with mean δ. For the diagonal entries we have

[ E(A ∘ δδ^⊤)² ]_{ii} = E ∑_k A_{ik} δ_i δ_k A_{ki} δ_i δ_k = ∑_k A_{ik}² E δ_i² δ_k² = δ² [A²]_{ii} + (δ − δ²) A_{ii}².

For the element at position ij with i ≠ j we have

[ E(A ∘ δδ^⊤)² ]_{ij} = E ∑_k A_{ik} δ_i δ_k A_{kj} δ_j δ_k = ∑_k A_{ik} A_{kj} E δ_i δ_j δ_k² = δ³ [A²]_{ij} + (δ² − δ³)( A_{ii} A_{ij} + A_{ij} A_{jj} ).

This can be put together in the following expression:

E(δδ^⊤ ∘ A)² = δ³ A² + (δ² − δ³)( Diag(A²) + Off(A) Diag(A) + Diag(A) Off(A) ) + (δ − δ²) Diag(A)².

Note that all of these matrices are positive semidefinite, apart from the term Off(A) Diag(A) + Diag(A) Off(A), which we can bound by (1/2)(Off(A) + Diag(A))² = A²/2. Taking into account δ ≤ 1, we have the simple bound

E(δδ^⊤ ∘ A)² ⪯ (1/2)(δ³ + δ²) A² + (δ² − δ³) Diag(A²) + (δ − δ²) Diag(A)² ⪯ δ²( A² + Diag(A²) ) + δ Diag(A)².
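The closed-form expression for E(A ∘ δδ^⊤)² can be verified exactly by enumerating all realizations of δ; a small self-check (the matrix A and the Bernoulli mean p below are arbitrary illustrative choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 0.3                  # p plays the role of the Bernoulli mean delta

M = rng.standard_normal((n, n))
A = (M + M.T) / 2              # arbitrary symmetric matrix
D = np.diag(np.diag(A))        # Diag(A)
O = A - D                      # Off(A)

# Exact expectation of (A o dd^T)^2 over all d in {0,1}^n.
E = np.zeros((n, n))
for d in itertools.product([0, 1], repeat=n):
    d = np.asarray(d, dtype=float)
    weight = np.prod(np.where(d == 1, p, 1 - p))   # P(delta = d)
    H = A * np.outer(d, d)                         # Hadamard product A o dd^T
    E += weight * (H @ H)

# Closed form derived above.
F = (p**3 * A @ A
     + (p**2 - p**3) * (np.diag(np.diag(A @ A)) + O @ D + D @ O)
     + (p - p**2) * D @ D)

print(np.max(np.abs(E - F)))  # numerically zero
```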
Now recall that Y = diag(δ)X, and therefore Off(YY^⊤) = δδ^⊤ ∘ Off(XX^⊤). Since the latter matrix has zero diagonal, the term with δ in the formula above disappears. Therefore,

E Off(YY^⊤)² ⪯ δ² [ E Off(XX^⊤)² + Diag( E Off(XX^⊤)² ) ].  (4.41)

It holds that E Off(XX^⊤)² ⪯ 2 E(XX^⊤)² + 2 E Diag(XX^⊤)², and we also have from Lounici (2014) that E(XX^⊤)² ⪯ C tr(Σ) Σ. Additionally, due to subgaussianity (4.36) we have E X_j⁴ ≲ Σ_{jj}². Finally, the following bound holds:

E Diag(XX^⊤)² ⪯ C Diag(Σ)² ⪯ C tr(Σ) Diag(Σ).

Plugging these bounds into (4.41) we get the bound on E Off(YY^⊤)² in (4.40).

As for the diagonal, we have for A = Diag(XX^⊤),

E Diag(YY^⊤)² ⪯ 3δ E Diag(XX^⊤)² ⪯ C δ tr(Σ) Diag(Σ).
Lemma 4.10. For Y as in Lemma 4.9 and any unit vector u ∈ R^n it holds that

‖u^⊤ Off(YY^⊤) u‖²_{L₂} ≲ δ² ‖Σ‖²,  ‖u^⊤ Diag(YY^⊤) u‖²_{L₂} ≲ δ ‖Σ‖².
Proof. Let v ∈ R^n be another arbitrary unit vector. First we want to check that

‖u^⊤ Diag(XX^⊤) v‖_{L₄} ≲ ‖Σ‖,  ‖u^⊤ Off(XX^⊤) v‖_{L₄} ≲ ‖Σ‖.  (4.42)

Obviously, ‖u^⊤ XX^⊤ v‖_{L₄} ≤ ‖u^⊤X‖_{L₈} ‖v^⊤X‖_{L₈} ≲ ‖Σ‖, so it is enough to check only the diagonal part. Let us apply a symmetrization argument. Suppose ε = (ε₁, …, ε_d)^⊤ are independent Rademacher variables; then

u^⊤ Diag(XX^⊤) v = E_ε ε^⊤ diag(u) XX^⊤ diag(v) ε = E_ε u_ε^⊤ XX^⊤ v_ε,

where u_ε = (u₁ε₁, …, u_d ε_d)^⊤, v_ε is defined analogously, and E_ε denotes expectation conditioned on X. Then, by the Jensen and Hölder inequalities,

E( u^⊤ Diag(XX^⊤) v )⁴ ≤ E( u_ε^⊤ XX^⊤ v_ε )⁴ ≤ E_ε E^{1/2}[ (u_ε^⊤X)⁸ | ε ] E^{1/2}[ (v_ε^⊤X)⁸ | ε ] ≲ ‖Σ‖⁴,

thus implying (4.42).
Next, let us consider a symmetric matrix B with zero diagonal. We have

Therefore, due to the fact that B is symmetric, we have

E(δ^⊤ B δ)² = δ⁴ ∑

As for the diagonal, we have

E
Before we proceed with the proof of the deviation bound, let us present the following version of Talagrand's concentration inequality for empirical processes, which will help us capture the tail behavior in the subgaussian regime. Remarkably, this result can be proven using very similar techniques: one may first use the modified logarithmic Sobolev inequality to prove a version of Talagrand's concentration inequality in the bounded case, and then use truncation as in the proof of Theorem 4.1 to obtain the result in the unbounded case.

Theorem 4.4 (Theorem 4 in Adamczak (2008)). Let X₁, …, X_N ∈ X be independent sample
Proof of Theorem 4.3. First, using (4.37) we have

‖Σ̂ − Σ‖ ≲ δ⁻¹

We have r̃(R) ≤ 2 r̃(Σ) and ‖R‖ ≲ N δ² tr(Σ) ‖Σ‖. Therefore, with probability at least 1 − e^{−t},

Integrating this bound (see, e.g., Theorem 2.3 in Boucheron et al. (2013)) we easily get

E‖Off(Σ̂^{(δ)}) − E Off(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max

We proceed with the diagonal term. Applying Proposition 4.3 to the sum N Diag(Σ̂^{(δ)}) = ∑_{i=1}^N Diag(Y_i Y_i^⊤) with R = C N δ tr(Σ) Diag(Σ), we have r̃(R) ≲ r̃(Σ) and ‖R‖ ≲ N δ tr(Σ) ‖Σ‖. Thus, with probability at least 1 − e^{−t} we get

‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max

Again, integrating this inequality we get a bound for the expectation,

E‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≲ ‖Σ‖ max
We have ‖u^⊤ Diag(Y_i Y_i^⊤) u‖²_{L₂} ≲ δ ‖Σ‖² and ‖ max_i ‖Diag(Y_i Y_i^⊤)‖ ‖_{ψ₁} ≲ r̃(Σ) ‖Σ‖ log N by Lemma 4.10 and Lemma 4.9. By Theorem 4.4 we have, with probability at least 1 − e^{−t},

‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ ≤ 2 E‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖ + ‖Σ‖ √(δt/N) + ‖Σ‖ r̃(Σ) t log N / N
≲ ‖Σ‖ max( √( δ r̃(Σ) log r̃(Σ) / N ), √( δt/N ), r̃(Σ)( log r̃(Σ) + t ) log N / N ).
It is left to combine the off-diagonal and diagonal bounds:

‖Σ̂ − Σ‖ ≤ δ⁻² ‖Off(Σ̂^{(δ)}) − E Off(Σ̂^{(δ)})‖ + δ⁻¹ ‖Diag(Σ̂^{(δ)}) − E Diag(Σ̂^{(δ)})‖.
4.4 Approximation argument for non-smooth functions
In this section we explain how one can rigorously apply the Sobolev inequality to functions that are not everywhere differentiable. In order to use Assumption (4.6), we need to take smooth approximations of the function
Z(X) = sup_A ( X^⊤ A X − E X^⊤ A X ).

Notice that we have

|Z(X) − Z(Y)| ≤ ‖X − Y‖ ( sup_A ‖AX‖ + sup_A ‖AY‖ ).
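This Lipschitz-type bound can be sanity-checked numerically for a finite family of symmetric matrices A; the random family and sample pairs below are purely illustrative, and the centering term E X^⊤AX is dropped since, being constant in X for each A, it cancels inside each difference:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Finite illustrative family of symmetric matrices over which the sup is taken.
mats = [(M + M.T) / 2 for M in rng.standard_normal((20, n, n))]

def Z(x):
    # sup_A x^T A x (per-A centering constants cancel in Z(X) - Z(Y))
    return max(x @ A @ x for A in mats)

ok = True
for _ in range(100):
    X, Y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = abs(Z(X) - Z(Y))
    rhs = np.linalg.norm(X - Y) * (max(np.linalg.norm(A @ X) for A in mats)
                                   + max(np.linalg.norm(A @ Y) for A in mats))
    ok = ok and lhs <= rhs + 1e-9
print(ok)  # True: the inequality holds on all sampled pairs
```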
The following simple lemma shows how to apply the logarithmic Sobolev inequality to non-differentiable functions satisfying such a bound.
Lemma 4.11. Suppose a random vector X satisfies Assumption 4.1. Let f : R^n → R be such that

|f(x) − f(y)| ≤ |x − y| max( L(x), L(y) )

for some continuous L(x) ≥ 0. Then, for some absolute constant C > 0 and any λ ∈ R it holds that

Ent( e^{λf} ) ≤ C K² λ² E L(X)² e^{λ f(X)}.
Proof. Set h(x) = x²(1 − x)²₊ and consider a smoothing kernel supported on the unit ball,
Moreover, due to the symmetry we have

∇F_m(x) =

By Assumption 4.1,

Ent(F_m²) ≤ K² E‖∇F_m(x)‖² ≤ 2 C_g K² E L_m(x)² F̃_m(x)²,

and taking the limit m → ∞ gives the required inequality.
Appendix A Technical tools
A.1 Lasso and missing observations
Suppose we observe a signal y ∈ R^n of the form

y = Φ b* + ε,

where Φ = [φ₁, …, φ_p] ∈ R^{n×p} is a dictionary of words φ_j ∈ R^n and b* is a sparse parameter with support Λ ⊂ {1, …, p}. We want to recover the exact sparse representation by solving the quadratic program

(1/2) ‖y − Φb‖² + γ ‖b‖₁ → min_{b ∈ R^p}.  (A.1)

Denote by R^Λ the set of vectors with elements indexed by Λ; for x ∈ R^p let x_Λ ∈ R^Λ be the result of taking only the elements indexed by Λ. With some abuse of notation we will also associate each vector x_Λ ∈ R^Λ with the vector x ∈ R^p that has the same coefficients on Λ and zeros elsewhere. Let also Φ_Λ = [φ_j]_{j∈Λ} be the subdictionary composed of the words indexed by Λ, and let P_Λ be the projector onto the corresponding subspace.
The following sufficient conditions for the global minimizer of (A.1) to be supported on Λ are due to Tropp (2006), who uses the notion of the exact recovery coefficient,

ERC_Φ(Λ) = 1 − max_{j ∉ Λ} ‖Φ_Λ⁺ φ_j‖₁.

The results are summarized in the next theorem.

Theorem A.1 (Tropp (2006)). Let b̃ be a solution to (A.1). Suppose that ‖Φ^⊤ ε‖_∞ ≤ γ ERC(Λ). Then,

• the support of b̃ is contained in Λ;
• the distance between b̃ and the optimal (non-penalized) parameter satisfies

‖b̃ − b*‖_∞ ≤ ‖Φ_Λ⁺ ε‖_∞ + γ ‖(Φ_Λ^⊤ Φ_Λ)⁻¹‖_{1,∞},  ‖Φ_Λ(b̃ − b*) − P_Λ ε‖₂ ≤ γ ‖(Φ_Λ⁺)^⊤‖_{2,∞}.
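The exact recovery coefficient is directly computable from the dictionary; a minimal sketch (the random unit-norm dictionary and the chosen support are illustrative):

```python
import numpy as np

def erc(Phi, Lam):
    """ERC_Phi(Lambda) = 1 - max_{j not in Lambda} ||Phi_Lambda^+ phi_j||_1."""
    Lam = list(Lam)
    pinv = np.linalg.pinv(Phi[:, Lam])        # Phi_Lambda^+
    outside = [j for j in range(Phi.shape[1]) if j not in Lam]
    return 1.0 - max(np.linalg.norm(pinv @ Phi[:, j], 1) for j in outside)

rng = np.random.default_rng(3)
n, p = 50, 20
Phi = rng.standard_normal((n, p))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm words

print(erc(Phi, [0, 1, 2]))  # positive values make exact recovery possible
```

For an orthonormal dictionary the coefficient equals 1, the most favorable case.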
In what follows we want to extend this result to allow for missing observations. Observe that the program (A.1) is equivalent to

(1/2) b^⊤ [Φ^⊤Φ] b − b^⊤ [Φ^⊤ y] + γ ‖b‖₁ → min_{b ∈ R^p},

so that for the minimization procedure only the knowledge of D = Φ^⊤Φ and c = Φ^⊤ y is required.
Suppose that instead we only have access to estimators D̂ ⪰ 0 and ĉ that are close enough to the original matrix and vector; these may come, e.g., from a missing observations model. Then we can solve the following problem instead:

(1/2) b^⊤ D̂ b − b^⊤ ĉ + γ ‖b‖₁ → min_{b ∈ R^p}.  (A.2)

In what follows we provide a slight extension of Tropp's result towards missing observations; the proof mainly follows the same steps.
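Since (A.2) depends on the data only through D̂ and ĉ, it can be solved by standard proximal gradient (ISTA) on these surrogates directly; a minimal sketch (the step size, iteration count, and the fully observed test data are our own illustrative choices, not prescribed by the text):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_from_moments(D_hat, c_hat, gamma, n_iter=5000):
    """Minimize (1/2) b^T D_hat b - b^T c_hat + gamma ||b||_1 via ISTA."""
    b = np.zeros(len(c_hat))
    step = 1.0 / np.linalg.norm(D_hat, 2)     # 1/L, L = largest eigenvalue
    for _ in range(n_iter):
        b = soft_threshold(b - step * (D_hat @ b - c_hat), step * gamma)
    return b

# Sanity check in the fully observed case: D_hat = Phi^T Phi, c_hat = Phi^T y.
rng = np.random.default_rng(4)
n, p = 100, 30
Phi = rng.standard_normal((n, p))
b_star = np.zeros(p)
b_star[:3] = [2.0, -1.5, 1.0]
y = Phi @ b_star + 0.1 * rng.standard_normal(n)

b_tilde = lasso_from_moments(Phi.T @ Phi, Phi.T @ y, gamma=2.0)
print(np.nonzero(b_tilde)[0])  # compare with the true support {0, 1, 2}
```

With missing observations one would plug in debiased surrogates D̂ and ĉ in place of the exact moments, which is precisely the setting of the results below.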
Further, for a matrix D and two sets of indices A, B we denote the corresponding submatrix by D_{A,B}, and for a vector c the corresponding subvector by c_A.
Lemma A.1. Suppose that

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ).

Then the solution b̃ to (A.2) is supported on Λ.
Proof. Let b̃ be the solution to (A.2) under the restriction supp(b) ⊂ Λ. Since D̂ ⪰ 0 this is a convex problem, and therefore the solution is unique and satisfies

D̂_{Λ,Λ} b̃ − ĉ_Λ + γ g = 0,  g ∈ ∂‖b̃‖₁,

where ∂f(b) denotes the subdifferential of a convex function f at the point b; in the case of the ℓ₁ norm we have ‖g‖_∞ ≤ 1. Thus,

b̃ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} g.  (A.3)
Next, we want to check that b̃ is a global minimizer. To do so, let us compare the objective function at the point b = b̃ + δ e_j for an arbitrary index j ∉ Λ. Since ‖b‖₁ = ‖b̃‖₁ + |δ|, we have

L(b) − L(b̃) = (1/2) b^⊤ D̂ b − (1/2) b̃^⊤ D̂ b̃ − ĉ^⊤(b − b̃) + γ|δ|
= (δ²/2) e_j^⊤ D̂ e_j + |δ|γ + δ e_j^⊤ D̂ b̃ − δ ĉ_j
≥ |δ|γ + δ e_j^⊤ D̂ b̃ − δ ĉ_j,

where the latter comes from the fact that D̂ is positive semidefinite. Applying the equality (A.3) yields

e_j^⊤ D̂ b̃ = D̂_{j,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂_{j,Λ} D̂⁻¹_{Λ,Λ} g;

therefore, taking into account ‖g‖_∞ ≤ 1, we have

L(b) − L(b̃) ≥ |δ| [ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ) − | D̂_{j,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_j | ],

where the right-hand side is nonnegative by the condition of the lemma. Since j ∉ Λ is arbitrary, b̃ is also a global solution.
Remark A.1. It is not hard to see that in the exact case D̂ = Φ^⊤Φ and ĉ = Φ^⊤ y, the condition of the lemma above turns into the condition ‖Φ_{Λᶜ}^⊤ (P_Λ − I) ε‖_∞ ≤ γ ERC(Λ) of Theorem A.1.
Since we are particularly interested in applications to time series, the feature matrix Φ should in fact be random; thus imposing an ERC-like condition on it directly might result in additional unnecessary technical difficulties. Instead, let us assume that there is some other matrix D̄, potentially the expectation of Φ^⊤Φ, which is close enough to D̂ (with some probability, although we state all the results in this section deterministically), and the value that controls the exact recovery becomes

ERC(Λ; D̄) = 1 − ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞}.

Additionally, we set c̄ = D̄ b* = D̄_{·,Λ} b*_Λ, the vector that ĉ is intended to approximate. Note that in this case we have D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ − c̄_{Λᶜ} = D̄_{Λᶜ,Λ} b*_Λ − c̄_{Λᶜ} = 0; thus the conditions of Lemma A.1 hold for D̄, c̄ once ERC(Λ; D̄) and γ are nonnegative. In what follows we control the quantities appearing in the lemma for D̂ and ĉ through the differences between c̄, D̄ and ĉ, D̂, respectively, thus allowing exact recovery of the sparsity pattern.
Corollary A.1. Let D̄ and c̄ be such that c̄ = D̄ b*. Assume that

‖ĉ − c̄‖_∞ ≤ δ_c,  ‖D̄⁻¹_{Λ,Λ}(ĉ_Λ − c̄_Λ)‖_∞ ≤ δ'_c,  ‖D̄⁻¹_{Λ,Λ}(D̂_{Λ,·} − D̄_{Λ,·})‖_{∞,∞} ≤ δ_D,
‖(D̂_{·,Λ} − D̄_{·,Λ}) b*_Λ‖_∞ ≤ δ'_D,  ‖D̄⁻¹_{Λ,Λ}(D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ ≤ δ''_D.

Suppose that ERC(Λ) ≥ 3/4 and

3δ_c + 3δ'_D ≤ γ,  s δ_D ≤ 1/16,

where |Λ| = s. Then the solution to (A.2) is supported on Λ and satisfies

b̃_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} g,  (A.4)

with some g ∈ R^s satisfying ‖g‖_∞ ≤ 1. The max-norm error satisfies

‖b̃ − b*‖_∞ ≤ 2( δ''_D + δ'_c + γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞} ),

while the ℓ₂-norm error satisfies

‖b̃ − b*‖ ≤ 2√s ( δ''_D + δ'_c + γ σ_min⁻¹ ).

If additionally 2( δ''_D + δ'_c + γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞} ) ≤ min_{j∈Λ} |b*_j|, then we have exact recovery, so that the following equality takes place:

b̃_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − γ D̂⁻¹_{Λ,Λ} s_Λ,  where s = sign(b*).
Proof. First observe that D_{Λᶜ,Λ} D⁻¹_{Λ,Λ} c_Λ − c_{Λᶜ} = Φ_{Λᶜ}^⊤ (Φ_Λ⁺ y − y) = Φ_{Λᶜ}^⊤ (P_Λ − I) ε. By Lemma A.2 we have

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} + 4 s δ_D ≤ 1/2,

while, since c̄_{Λᶜ} = D̄_{Λᶜ,Λ} b*_Λ = D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ,

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞ + ‖ĉ_{Λᶜ} − c̄_{Λᶜ}‖_∞
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + ‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ + ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞ + δ_c
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + ‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ + δ'_D + δ_c.

Here ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ ≤ δ_c/2 due to ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ 1/2. Moreover, we have

‖D̂_{Λᶜ,Λ} (D̂⁻¹_{Λ,Λ} − D̄⁻¹_{Λ,Λ}) c̄_Λ‖_∞ = ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞
≤ ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ‖(D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ‖_∞
≤ δ'_D / 2.

Using the condition on γ, we get that

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} ĉ_Λ − ĉ_{Λᶜ}‖_∞ ≤ (3/2)(δ'_D + δ_c) ≤ γ/2 ≤ γ ( 1 − ‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ}‖_{1,∞} ),

so that the conditions of Lemma A.1 are satisfied and (A.4) takes place. This allows us to write

b̃_Λ − b*_Λ = D̂⁻¹_{Λ,Λ} ĉ_Λ − D̄⁻¹_{Λ,Λ} c̄_Λ − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ} c̄_Λ + D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ + D̂⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̂⁻¹_{Λ,Λ} g
= D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ} [ D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ + D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ) − γ D̄⁻¹_{Λ,Λ} g ].

By Lemma A.2 we have ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{∞→∞} ≤ 2, so that

‖b̃_Λ − b*_Λ‖_∞ ≤ 2 ‖D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ + 2 ‖D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + 2γ ‖D̄⁻¹_{Λ,Λ}‖_{1,∞},

and since we also have |||D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}|||_op ≤ 2 and ‖g‖ ≤ √s, it holds that

‖b̃_Λ − b*_Λ‖ ≤ 2√s ( ‖D̄⁻¹_{Λ,Λ} (D̄_{Λ,Λ} − D̂_{Λ,Λ}) b*_Λ‖_∞ + ‖D̄⁻¹_{Λ,Λ} (ĉ_Λ − c̄_Λ)‖_∞ + γ |||D̄⁻¹_{Λ,Λ}|||_op ).
We now present a technical lemma, used in the proof above, which collects some elementary inequalities.
Lemma A.2. Set δ_c = ‖ĉ − c̄‖_∞ and δ_D = ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ}‖_{∞,∞}. Suppose that ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ≤ 1 and s δ_D ≤ 1/2. Then it holds that

• for each q ≥ 1,

‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} ≤ 2,  ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{q→q} ≤ 2;

•

‖D̂_{Λᶜ,Λ} D̂⁻¹_{Λ,Λ} − D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ≤ 4 s δ_D.

Proof. First, we have
‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} = ‖I + (D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̂⁻¹_{Λ,Λ}‖_{q→q}
≤ 1 + ‖(D̄_{Λ,Λ} − D̂_{Λ,Λ}) D̄⁻¹_{Λ,Λ}‖_{q→q} ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q}
≤ 1 + s δ_D ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q},

which, after solving the inequality and using s δ_D ≤ 1/2, turns into

‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{q→q} ≤ 1/(1 − s δ_D) ≤ 2.

Similarly, ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{q→q} ≤ 2.

Furthermore,

‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̂⁻¹_{Λ,Λ}‖_{1,∞} ≤ ‖(D̂_{Λᶜ,Λ} − D̄_{Λᶜ,Λ}) D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̄_{Λ,Λ} D̂⁻¹_{Λ,Λ}‖_{1→1} ≤ 2 s δ_D,

and

‖D̄_{Λᶜ,Λ} (D̄⁻¹_{Λ,Λ} − D̂⁻¹_{Λ,Λ})‖_{1,∞} ≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̂⁻¹_{Λ,Λ} (D̂_{Λ,Λ} − D̄_{Λ,Λ})‖_{1→1}
≤ ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} ‖D̂⁻¹_{Λ,Λ} D̄_{Λ,Λ}‖_{1→1} ‖D̄⁻¹_{Λ,Λ} (D̂_{Λ,Λ} − D̄_{Λ,Λ})‖_{1→1}
≤ 2 ‖D̄_{Λᶜ,Λ} D̄⁻¹_{Λ,Λ}‖_{1,∞} s δ_D,

which together give the second inequality.
A.2 Gaussian approximation for change point statistic
Let X₁, …, X_n ∈ R^d be a martingale difference sequence (MDS) with β-mixing coefficients b_k. Assume additionally that, with probability one,

|X_{ij}| ≤ D_n,  1 ≤ i ≤ n, 1 ≤ j ≤ p.

Theorem A.2 (Chernozhukov et al. (2013), Theorem B.1). Suppose positive r, q are such that r + q ≤ n/2 and for some c₁, C₁ > 0 and 0 < c₂ < 1/4,

c₁ ≤ σ(q) ≤ σ(q) ∨ σ(r) ≤ C₁
Suppose we have another MDS X′₁, …, X′_n, from which we construct a statistic Ť′ analogous to (A.5). Suppose this sequence has β-mixing coefficients bounded by the same values b_k and the entries of the vectors bounded a.s. by the same D_n. Finally, let us set Σ′ = (1/n) ∑_{i=1}^n E X′_i X′_i^⊤. Combining the result above with Gaussian comparison and anti-concentration bounds, we get the following corollary.
Lemma A.3. Suppose there are positive q, r such that q + r < n/2 and there are c₁, C₁ > 0

Proof. Simply apply Theorem A.2 together with Theorem 2 of Chernozhukov et al. (2015) and Theorem 1 of Chernozhukov et al. (2017).
Let now X₁, …, X_n ∈ R^p be a martingale difference sequence with β-mixing coefficients

into the above form. Following Zhilova (2015) we consider the following approximation. Let G_ε be an ε-net of the unit sphere in R^p such that for each a ∈ R^p it holds

and assume that for each such I it holds that

‖V′_I − V‖ ≤ Δ_I,  Δ_q = max_{|I|=q} Δ_I.

Denote by analogy the test statistic T̂′ and the vectors X̃′_i. In what follows we assume that the dimension p is constant and that the size of S grows with n. Moreover, assume that |X_{ij}|, |X′_{ij}| ≤ D_n for each i, j and that T̂, T̂′ ≤ A_n, all with probability at least 1 − 1/n. Moreover, assume Δ_r, Δ_q ≤ c₁/2. Then, for any C₂ > 0 there are c, C > 0 that only depend

covariance difference Δ. We have (assuming s₁ ≤ s₂)

between the two is bounded by

|Σ_{jk} − Σ′_{jk}| ≤ a² s₁
Bibliography
Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability.
Adamczak, R. (2015). A note on the Hanson-Wright inequality for random vectors with dependencies. Electronic Communications in Probability.
Adamczak, R., Kotowski, M., Polaczyk, B., and Strzelecki, M. (2018a). A note on concentration for polynomials in the Ising model. arxiv.org/abs/1809.03187.
Adamczak, R., Latała, R., and Meller, R. (2018b). Hanson-Wright inequality in Banach spaces. arXiv preprint arXiv:1811.00353.
Adamczak, R. and Wolff, P. (2015). Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probab. Theory Relat. Fields.
Adams, Z., Füss, R., and Gropp, R. (2014). Spillover effects among financial institutions: A state-dependent sensitivity value-at-risk approach. Journal of Financial and Quantitative Analysis, 49(3):575–598.
Arcones, M. and Gine, E. (1993). On decoupling, series expansions, and tail behavior of chaos processes. Journal of Theoretical Probability.
Avanesov, V. and Buzun, N. (2016). Change-point detection in high-dimensional covariance structure. arXiv preprint arXiv:1610.03783.
Avery, C. N., Chevalier, J. A., and Zeckhauser, R. J. (2016). The “CAPS” Prediction System and Stock Market Returns. Review of Finance, 20(4):1363–1381.
Baele, L. and Inghelbrecht, K. (2010). Time-varying integration, interdependence and contagion. Journal of International Money and Finance, 29(5):791–818.
Bauwens, L., Laurent, S., and Rombouts, J. V. (2006). Multivariate GARCH models: a survey. Journal of Applied Econometrics, 21(1):79–109.
Borell, C. (1984). On the Taylor series of a Wiener polynomial. Seminar Notes on multiple stochastic integration, polynomial chaos and their integration. Case Western Reserve Univ., Cleveland.
Boucheron, S., Bousquet, O., and Lugosi, G. (2005a). Theory of classification: A survey of some recent advances. ESAIM: probability and statistics, 9:323–375.
Boucheron, S., Bousquet, O., Lugosi, G., and Massart, P. (2005b). Moment inequalities for functions of independent random variables. The Annals of Probability.
Boucheron, S., Lugosi, G., and Massart, P. (2003). Concentration inequalities using the entropy method. The Annals of Probability.
Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press.
Brody, S. and Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 562–570, Stroudsburg, PA, USA. Association for Computational Linguistics.
Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. P. (2010). Measuring user influence in twitter: The million follower fallacy. In fourth international AAAI conference on weblogs and social media.
Chen, C. Y., Després, R., Guo, L., and Renault, T. (2019a). What makes cryptocurrencies special? Investor sentiment and price predictability during the bubble. working paper.
Chen, C. Y.-H., Härdle, W. K., and Okhrin, Y. (2019b). Tail event driven networks of SIFIs.
Journal of Econometrics, 208(1):282–298.
Chen, S. and Schienle, M. (2019). Pre-screening and reduced rank regression for high-dimensional cointegration. KIT working paper.
Chen, X. and Fan, Y. (2006a). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. Journal of Econometrics, 135(1-2):125–154.
Chen, X. and Fan, Y. (2006b). Estimation of copula-based semiparametric time series models.
Journal of Econometrics, 130(2):307–335.
Chen, Y., Härdle, W. K., and Pigorsch, U. (2010). Localized realized volatility modeling.
Journal of the American Statistical Association, 105(492):1376–1393.
Chen, Y. and Niu, L. (2014). Adaptive dynamic Nelson–Siegel term structure model with applications. Journal of Econometrics, 180(1):98–115.
Chen, Y., Trimborn, S., and Zhang, J. (2018). Discover regional and size effects in global bitcoin blockchain via sparse-group network autoregressive modeling. Available at SSRN 3245031.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Testing many moment inequalities.
arXiv preprint arXiv:1312.7614.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162(1-2):47–70.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2017). Detailed proof of Nazarov’s inequality. arXiv preprint arXiv:1711.10696.
Chernozhukov, V., Härdle, W. K., Huang, C., and Wang, W. (2018). Lasso-driven inference
in time and space. arXiv preprint arXiv:1806.05081.