A test for separability in covariance operators of random surfaces


SFB 823


Discussion Paper

Pramita Bagchi, Holger Dette

Nr. 19/2017

Pramita Bagchi, Holger Dette
Ruhr-Universität Bochum
Fakultät für Mathematik
44780 Bochum, Germany

October 21, 2017

Abstract

The assumption of separability is a simplifying and very popular assumption in the analysis of spatio-temporal or hypersurface data structures. It is often made in situations where the covariance structure cannot be easily estimated, for example because of a small sample size or because of computational storage problems. In this paper we propose a new and very simple test to validate this assumption. Our approach is based on a measure of separability which is zero in the case of separability and positive otherwise. The measure can be estimated without calculating the full non-separable covariance operator. We prove asymptotic normality of the corresponding statistic with a limiting variance, which can easily be estimated from the available data. As a consequence, quantiles of the standard normal distribution can be used to obtain critical values, and the new test of separability is very easy to implement. In particular, our approach requires neither projections onto subspaces generated by the eigenfunctions of the covariance operator, nor resampling procedures to obtain critical values, nor distributional assumptions, as recently used by Aston et al. (2017) and Constantinou et al. (2017) to construct tests for separability. We investigate the finite sample performance by means of a simulation study and also provide a comparison with the currently available methodology. Finally, the new procedure is illustrated by analyzing wind speed and temperature data.

Keywords: functional data, minimum distance, separability, space-time processes, surface data structures

AMS Subject classification: 62G10, 62G20


1 Introduction

Data that are both functional and multidimensional are usually called surface data and arise in areas such as medical imaging [see Skup (2010); Worsley et al. (1996)], spectrograms derived from audio signals or geolocalized data [see Bar-Hen et al. (2008); Rabiner and Schafer (1978)]. In many of these ultra high-dimensional problems a completely nonparametric estimation of the covariance operator is not possible because the number of available observations is small compared to the dimension. A common approach to obtaining reasonable estimates in this context is to impose structural assumptions on the covariance of the underlying process, and in recent years the assumption of separability has become very popular, for example in the analysis of geostatistical space-time models [see Genton (2007); Gneiting et al. (2007), among others]. Roughly speaking, this assumption allows one to write the covariance

$$c(s,t,s',t') = E[X(s,t)X(s',t')]$$

of a (real valued) space-time process $\{X(s,t)\}_{(s,t)\in S\times T}$ as a product of the space and time covariance functions, that is

$$c(s,t,s',t') = c_1(s,s')\,c_2(t,t'). \tag{1.1}$$

It was pointed out by many authors that the assumption of separability yields a substantial simplification of the estimation problem and thus reduces computational costs in the estimation of the covariance in high dimensional problems [see for example Huizenga et al. (2002); Rougier (2017)]. Despite its importance, there exist only a few tools to validate the assumption of separability for surface data.
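To make the storage argument concrete, the following minimal sketch discretizes a separable covariance of the form (1.1) on a small grid. The grid sizes and the two covariance functions are illustrative assumptions and are not taken from the paper; the point is only that the full covariance is the Kronecker product of the two factors, so that far fewer numbers need to be stored.

```python
import numpy as np

# Hypothetical grids: 4 spatial locations and 5 time points (illustrative sizes).
s = np.linspace(0.0, 1.0, 4)
t = np.linspace(0.0, 1.0, 5)

# Illustrative covariance functions, not taken from the paper.
c1 = np.exp(-np.abs(s[:, None] - s[None, :]))   # c1(s, s')
c2 = np.exp(-(t[:, None] - t[None, :]) ** 2)    # c2(t, t')

# Under (1.1) the full covariance is the Kronecker product of the two factors.
full = np.kron(c1, c2)                  # shape (20, 20)
c = full.reshape(4, 5, 4, 5)            # c[i, j, k, l] = c(s_i, t_j, s_k, t_l)

# Numbers that must be stored: full covariance versus its two factors.
n_full = full.size                      # 400
n_separable = c1.size + c2.size         # 16 + 25 = 41
print(n_full, n_separable)
```

On this toy grid the separable representation needs 41 numbers instead of 400; the gap grows quickly with the grid resolution.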

Many authors developed tests for spatio-temporal data. For example, Fuentes (2006) proposed a test based on the spectral representation, and Lu and Zimmerman (2005) and Mitchell et al. (2005, 2006) investigated likelihood ratio tests under the assumption of a normal distribution. Recently, Constantinou et al. (2017) derived the joint distribution of the three statistics appearing in the likelihood ratio test and used this result to derive the asymptotic distribution of the (log) likelihood ratio. These authors also proposed alternative tests which are based on distances between an estimator of the covariance under the assumption of separability and an estimator which does not use this assumption.

Aston et al. (2017) considered the problem of testing for separability in the context of hypersurface data. These authors pointed out that many available methods require the estimation of the full multidimensional covariance structure, which can become infeasible for high dimensional data. In order to address this issue they developed a bootstrap test for applications where replicates from the underlying random process are available. To avoid estimation and storage of the full covariance, finite-dimensional projections of the difference between the covariance operator and a nonparametric separable approximation are proposed. In particular, they suggest projecting onto subspaces generated by the eigenfunctions of the covariance operator estimated under the assumption of separability. However, as pointed out in the same reference, the choice of the number of eigenfunctions onto which one should project is not trivial, and the test might be sensitive with respect to this choice.

In this paper we present an alternative and very simple test for the hypothesis of separability in hypersurface data. We consider a similar setup as in Aston et al. (2017) and proceed in two steps. First, we derive an explicit expression for the minimal distance between the covariance operator and its approximation by a separable covariance operator, where the minimum is taken with respect to the second factor of the tensor product. It turns out that this minimum vanishes if and only if the covariance operator is separable. Secondly, we directly estimate the minimal distance (and not the covariance operator itself) from the available data. As a consequence, the calculation of the test statistic neither uses an estimate of the full non-separable covariance operator nor requires the specification of subspaces used for a projection. In the main result of this paper we derive the asymptotic distribution of the test statistic, which is normal (after appropriate standardization) under the null hypothesis and under the alternative. The limiting variance under the null hypothesis can easily be estimated, and as a consequence we obtain a very simple test for the hypothesis of separability, which only requires the quantiles of the normal distribution. Moreover, in contrast to the work of Aston et al. (2017), the test proposed here does not require resampling and, from a theoretical perspective, the limit theorems are valid under more general and easier to verify moment assumptions.

In Section 2 we review some basic terminology and minimize the distance between the covariance operator and separable covariance operators with respect to the second factor of the tensor product. This minimum distance can also be interpreted as a measure of deviation from separability (it is zero in the case of separability and positive otherwise). In Section 3 we propose an estimator of the minimum distance and prove asymptotic normality of a standardized version of this statistic. We also provide a simple estimate of the limiting variance under the null hypothesis and prove its consistency. Section 4 is devoted to an investigation of the finite sample properties of the new test and a comparison with two alternative tests for this problem, which have recently been proposed by Aston et al. (2017) and Constantinou et al. (2017). In particular, we demonstrate that, despite its simplicity, the new procedure has very competitive properties compared to the currently available methodology. Finally, some technical details are deferred to Appendix A.


2 Hilbert spaces and a measure of separability

We begin by introducing some basic facts about Hilbert spaces, Hilbert-Schmidt operators and tensor products. For more details we refer to the monographs of Weidmann (1980), Dunford and Schwartz (1988) or Gohberg et al. (1990). Let $H$ be a real separable Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. The space of bounded linear operators on $H$ is denoted by $S_\infty(H)$ with operator norm

$$|||T|||_\infty := \sup_{\|f\|\le 1} \|Tf\|.$$

A bounded linear operator $T$ is said to be compact if it can be written as

$$T = \sum_{j\ge 1} s_j(T)\,\langle e_j,\cdot\rangle f_j,$$

where $\{e_j : j\ge 1\}$ and $\{f_j : j\ge 1\}$ are orthonormal sets of $H$, $\{s_j(T) : j\ge 1\}$ are the singular values of $T$ and the series converges in the operator norm. We say that a compact operator $T$ belongs to the Schatten class of order $p\ge 1$ and write $T\in S_p(H)$ if

$$|||T|||_p = \Big(\sum_{j\ge 1} s_j(T)^p\Big)^{1/p} < \infty.$$

The Schatten class of order $p\ge 1$ is a Banach space with norm $|||\cdot|||_p$ and with the property that $S_p(H)\subset S_q(H)$ for $p<q$. In particular we are interested in Schatten classes of order $p=1$ and $p=2$. A compact operator $T$ is called a Hilbert-Schmidt operator if $T\in S_2(H)$ and trace class if $T\in S_1(H)$. The space of Hilbert-Schmidt operators $S_2(H)$ is also a Hilbert space with the Hilbert-Schmidt inner product given by
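In finite dimensions the Schatten norms reduce to $\ell^p$ norms of the singular values, which gives a quick numerical illustration of the definitions above. The helper name `schatten_norm` and the random test matrix are assumptions made for this sketch only.

```python
import numpy as np

def schatten_norm(T, p):
    """Schatten p-norm of a matrix: the l^p norm of its singular values."""
    s = np.linalg.svd(T, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 6))      # illustrative operator on R^6

trace_norm = schatten_norm(T, 1)                            # order p = 1
hs_norm = schatten_norm(T, 2)                               # order p = 2
op_norm = float(np.linalg.svd(T, compute_uv=False).max())   # operator norm

# The norms decrease in p, mirroring the inclusion S_p(H) in S_q(H) for p < q.
print(op_norm <= hs_norm <= trace_norm)
```

For matrices the order-2 norm coincides with the Frobenius norm, which is why Hilbert-Schmidt quantities are cheap to compute after discretization.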

$$\langle A,B\rangle_{HS} = \sum_{j\ge 1} \langle Ae_j, Be_j\rangle$$

for each $A,B\in S_2(H)$, where $\{e_j : j\ge 1\}$ is an orthonormal basis (the inner product does not depend on the choice of the basis).

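For two real separable Hilbert spaces $H_1$ and $H_2$, the tensor product of $H_1$ and $H_2$, denoted as $H := H_1\otimes H_2$, is the Hilbert space obtained by the completion of all finite sums

$$\sum_{i,j=1}^{N} u_i\otimes v_j, \qquad u_i\in H_1,\ v_j\in H_2,$$

under the inner product $\langle u\otimes v, w\otimes z\rangle = \langle u,w\rangle\langle v,z\rangle$, for $u,w\in H_1$ and $v,z\in H_2$. For $C_1\in S_\infty(H_1)$ and $C_2\in S_\infty(H_2)$, the tensor product $C_1\tilde\otimes C_2$ is defined as the unique linear operator on $H_1\otimes H_2$ satisfying

$$(C_1\tilde\otimes C_2)(u\otimes v) = C_1u\otimes C_2v, \qquad u\in H_1,\ v\in H_2.$$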


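In fact $C_1\tilde\otimes C_2\in S_\infty(H)$ with $|||C_1\tilde\otimes C_2|||_\infty = |||C_1|||_\infty\,|||C_2|||_\infty$. Moreover, if $C_1\in S_p(H_1)$ and $C_2\in S_p(H_2)$, for $p\ge 1$, then $C_1\tilde\otimes C_2\in S_p(H)$ with $|||C_1\tilde\otimes C_2|||_p = |||C_1|||_p\,|||C_2|||_p$. For more details we refer to Chapter 8 of Weidmann (1980). In the sequel we denote the Hilbert-Schmidt inner product on $S_2(H)$ with $H = H_1\otimes H_2$ as $\langle\cdot,\cdot\rangle_{HS}$ and that of $S_2(H_1)$ and $S_2(H_2)$ as $\langle\cdot,\cdot\rangle_{S_2(H_1)}$ and $\langle\cdot,\cdot\rangle_{S_2(H_2)}$, respectively.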
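For finite-dimensional spaces the tensor product of two operators is represented by the Kronecker product of their matrices, so the defining property and the norm identities can be checked numerically. The sketch below assumes $H_1=\mathbb{R}^3$ and $H_2=\mathbb{R}^4$ with random matrices; these choices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
C1 = rng.standard_normal((3, 3))   # operator on H1 = R^3 (illustrative)
C2 = rng.standard_normal((4, 4))   # operator on H2 = R^4 (illustrative)
K = np.kron(C1, C2)                # matrix of the tensor product operator

# Defining property on an elementary tensor u (x) v.
u, v = rng.standard_normal(3), rng.standard_normal(4)
lhs = K @ np.kron(u, v)
rhs = np.kron(C1 @ u, C2 @ v)

def hs(A):
    """Hilbert-Schmidt (Frobenius) norm."""
    return np.linalg.norm(A, "fro")

def op(A):
    """Operator (spectral) norm."""
    return np.linalg.norm(A, 2)

print(np.allclose(lhs, rhs),
      np.isclose(hs(K), hs(C1) * hs(C2)),
      np.isclose(op(K), op(C1) * op(C2)))
```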

2.1 Measuring separability

We consider random elements $X$ in the Hilbert space $H$ with $E\|X\|^4<\infty$ (see Chapter 7 of Hsing and Eubank (2015)). Then the covariance operator of $X$ is defined as $C := E[(X-EX)\otimes_o(X-EX)]$, where for $f,g\in H$ the operator $f\otimes_o g : H\to H$ is defined by

$$(f\otimes_o g)h = \langle h,g\rangle f \quad\text{for all } h\in H.$$

Under the assumption $E\|X\|^4<\infty$ we have $C\in S_2(H)$. We also assume $|||C|||_2\neq 0$, which essentially means that the random variable $X$ is not degenerate. To test separability we consider the hypothesis

$$H_0 : C = C_1\tilde\otimes C_2 \ \text{ for some } C_1\in S_2(H_1) \text{ and } C_2\in S_2(H_2). \tag{2.1}$$

Our approach is based on an approximation of the operator $C$ by a separable operator $C_1\tilde\otimes C_2$ with respect to the norm $|||\cdot|||_2$. Ideally, we are looking for

$$D := \inf\Big\{ |||C - C_1\tilde\otimes C_2|||_2^2 \ \Big|\ C_1\in S_2(H_1),\ C_2\in S_2(H_2)\Big\}, \tag{2.2}$$

such that the hypothesis of separability in (2.1) can be rewritten in terms of the distance $D$, that is

$$H_0 : D = 0 \quad\text{versus}\quad H_1 : D > 0. \tag{2.3}$$

However, it turns out that it is difficult to express $D$ explicitly in terms of the covariance operator $C$. For this reason we proceed in a slightly different way in two steps. First we fix $C_1$ and determine

$$D(C_1) := \inf\Big\{ |||C - C_1\tilde\otimes C_2|||_2^2 \ \Big|\ C_2\in S_2(H_2)\Big\}, \tag{2.4}$$

that is, we minimize $|||C - C_1\tilde\otimes C_2|||_2^2$ with respect to the second factor $C_2$ of the tensor product. In particular we will show that the infimum is in fact a minimum and derive an explicit expression for $D(C_1)$ and its minimizer. Next we show that the resulting minimum $D(C_1)$ vanishes if and only if the hypothesis of separability holds.

For this purpose we have to introduce additional notation and have to prove several auxiliary results. The main statement is given in Theorem 2.1 (whose formulation also requires the new notation). First, consider the bounded linear operator $T_1 : S_2(H)\times S_2(H_1)\to S_2(H_2)$ defined by

$$T_1(A\tilde\otimes B, C_1) = \langle A, C_1\rangle_{S_2(H_1)}\, B \tag{2.5}$$

for all $C_1\in S_2(H_1)$. Similarly, let $T_2 : S_2(H)\times S_2(H_2)\to S_2(H_1)$ be the bounded linear operator defined by

$$T_2(A\tilde\otimes B, C_2) = \langle B, C_2\rangle_{S_2(H_2)}\, A \tag{2.6}$$

for all $C_2\in S_2(H_2)$.
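In a finite-dimensional discretization the map $T_1$ becomes a contraction of a four-way array against a matrix. The following sketch assumes that an operator on $\mathbb{R}^p\otimes\mathbb{R}^q$ is stored as an array `C[i, j, k, l]` with `(i, k)` the $H_1$ indices and `(j, l)` the $H_2$ indices; this storage convention and the helper names `tensor` and `T1` are assumptions of the example, not notation from the paper. It checks definition (2.5) on an elementary tensor and the adjoint-type identity $\langle B, T_1(C,C_1)\rangle_{S_2(H_2)} = \langle C, C_1\tilde\otimes B\rangle_{HS}$ for a general array.

```python
import numpy as np

def tensor(A, B):
    """Four-way array of the tensor product: (A (x) B)[i, j, k, l] = A[i, k] * B[j, l]."""
    return np.einsum("ik,jl->ijkl", A, B)

def T1(C, C1):
    """Discretized T1: contract the H1 indices (i, k) of C against C1."""
    return np.einsum("ijkl,ik->jl", C, C1)

rng = np.random.default_rng(2)
A, C1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((3, 4, 3, 4))      # a generic, non-separable array

# Definition (2.5): T1(A (x) B, C1) = <A, C1> B with the Frobenius inner product.
check_def = np.allclose(T1(tensor(A, B), C1), np.sum(A * C1) * B)

# Adjoint-type identity: <B, T1(C, C1)> = <C, C1 (x) B>.
check_adj = np.isclose(np.sum(B * T1(C, C1)), np.sum(C * tensor(C1, B)))
print(check_def, check_adj)
```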

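Proposition 2.1. The operators $T_1$ and $T_2$ are well-defined, bilinear and continuous with

$$\langle B, T_1(C,C_1)\rangle_{S_2(H_2)} = \langle C, C_1\tilde\otimes B\rangle_{HS}, \tag{2.7}$$

$$\langle A, T_2(C,C_2)\rangle_{S_2(H_1)} = \langle C, A\tilde\otimes C_2\rangle_{HS} \tag{2.8}$$

for all $A, C_1\in S_2(H_1)$, $B, C_2\in S_2(H_2)$ and $C\in S_2(H)$.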

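Proof. By Lemma A.1 in Appendix A the space

$$D_0 := \Big\{ \sum_{i=1}^{n} A_i\tilde\otimes B_i : A_i\in S_2(H_1),\ B_i\in S_2(H_2),\ n\in\mathbb{N} \Big\} \tag{2.9}$$

is a dense subset of $S_2(H_1\otimes H_2)$ (note that a similar result for the space $S_1(H_1\otimes H_2)$ has been established in Lemma 1.6 of the supplementary material in Aston et al. (2017)). For all $B\in S_2(H_2)$, $E=\sum_{i=1}^{n} A_i\tilde\otimes B_i\in D_0$ and $C_1\in S_2(H_1)$, we have

$$\begin{aligned}
\langle B, T_1(E,C_1)\rangle_{S_2(H_2)} &= \Big\langle B, \sum_{i=1}^{n} \langle A_i, C_1\rangle_{S_2(H_1)} B_i \Big\rangle_{S_2(H_2)} = \sum_{i=1}^{n} \langle A_i, C_1\rangle_{S_2(H_1)} \langle B, B_i\rangle_{S_2(H_2)} \\
&= \Big\langle \sum_{i=1}^{n} A_i\tilde\otimes B_i,\ C_1\tilde\otimes B \Big\rangle_{HS} = \langle E, C_1\tilde\otimes B\rangle_{HS}.
\end{aligned} \tag{2.10}$$

Using the fact that

$$|||T_1(E,C_1)|||_2 \le |||T_1(E,C_1)|||_1 = \sup\big\{ \langle B, T_1(E,C_1)\rangle_{S_2(H_2)} : B\in S_2(H_2),\ |||B|||_\infty\le 1 \big\}, \tag{2.11}$$

(2.10) and the Cauchy-Schwarz inequality, it follows that

$$|||T_1(E,C_1)|||_2 \le |||C_1|||_2\,|||E|||_2. \tag{2.12}$$

Therefore, for each $C_1\in S_2(H_1)$, we can extend $T_1(\cdot,C_1)$ continuously to $S_2(H)$.

Furthermore, as (2.7) holds for all $C\in D_0$ and the maps $C\mapsto \langle B, T_1(C,C_1)\rangle_{S_2(H_2)}$ and $C\mapsto \langle C, C_1\tilde\otimes B\rangle_{HS}$ are continuous for all $B\in S_2(H_2)$ and $C_1\in S_2(H_1)$, we can conclude that (2.7) holds for all $C\in S_2(H)$.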


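Corollary 2.1. For all $C\in S_2(H)$, $C_1\in S_2(H_1)$ and $C_2\in S_2(H_2)$ we have

$$|||T_1(C,C_1)|||_2 \le |||C|||_2\,|||C_1|||_2 \quad\text{and}\quad |||T_2(C,C_2)|||_2 \le |||C|||_2\,|||C_2|||_2.$$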

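Proposition 2.2. For any $C\in S_2(H)$ and $C_1\in S_2(H_1)$, we have

$$\langle C, C_1\tilde\otimes T_1(C,C_1)\rangle_{HS} = |||T_1(C,C_1)|||_2^2.$$

Proof. Recall the definition of the set $D_0$ in (2.9) and let $C = \sum_{n=1}^{N} A_n\tilde\otimes B_n\in D_0$, where $A_n\in S_2(H_1)$, $B_n\in S_2(H_2)$. We write

$$\begin{aligned}
\langle C, C_1\tilde\otimes T_1(C,C_1)\rangle_{HS} &= \Big\langle C,\ C_1\tilde\otimes \sum_{n=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)} B_n \Big\rangle_{HS} \\
&= \sum_{n=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)} \langle C, C_1\tilde\otimes B_n\rangle_{HS} \\
&= \sum_{n=1}^{N}\sum_{m=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)} \langle A_m, C_1\rangle_{S_2(H_1)} \langle B_m, B_n\rangle_{S_2(H_2)}.
\end{aligned}$$

On the other hand,

$$\begin{aligned}
\langle T_1(C,C_1), T_1(C,C_1)\rangle_{S_2(H_2)} &= \Big\langle \sum_{n=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)} B_n,\ \sum_{m=1}^{N} \langle A_m, C_1\rangle_{S_2(H_1)} B_m \Big\rangle_{S_2(H_2)} \\
&= \sum_{n=1}^{N}\sum_{m=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)} \langle A_m, C_1\rangle_{S_2(H_1)} \langle B_m, B_n\rangle_{S_2(H_2)}.
\end{aligned}$$

Therefore, for all $C_1\in S_2(H_1)$, the functions $f,g : S_2(H)\to\mathbb{R}$ defined by

$$f(C) := \langle C, C_1\tilde\otimes T_1(C,C_1)\rangle_{HS} \quad\text{and}\quad g(C) := |||T_1(C,C_1)|||_2^2$$

are continuous and coincide on the dense subset $D_0$ of $S_2(H)$. So $f(C)=g(C)$ for all $C\in S_2(H)$ and hence the result follows.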

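Theorem 2.1. For each $C\in S_2(H)$ and $C_1\in S_2(H_1)$ the distance

$$D(C_1,C_2) = |||C - C_1\tilde\otimes C_2|||_2^2 \tag{2.13}$$

is minimized at

$$\tilde C_2 = \frac{T_1(C,C_1)}{|||C_1|||_2^2}. \tag{2.14}$$

Moreover, for $C_1\in S_2(H_1)$ the minimum distance in (2.13) is given by

$$D(C_1) = |||C|||_2^2 - \frac{|||T_1(C,C_1)|||_2^2}{|||C_1|||_2^2}. \tag{2.15}$$

In particular $D(C_1)$ is zero if and only if $C = C_1\tilde\otimes C_2$ for some $C_2\in S_2(H_2)$.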
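The statement of Theorem 2.1 can be illustrated numerically in a finite-dimensional setting. The sketch below is not the estimator studied in the paper; it only evaluates (2.14) and (2.15) for given arrays, under the assumption that an operator on $\mathbb{R}^3\otimes\mathbb{R}^4$ is stored as a four-way array `C[i, j, k, l]`. The helper names `tensor` and `min_distance` are hypothetical.

```python
import numpy as np

def tensor(A, B):
    """(A (x) B)[i, j, k, l] = A[i, k] * B[j, l]."""
    return np.einsum("ik,jl->ijkl", A, B)

def min_distance(C, C1):
    """Return D(C1) from (2.15) and the minimizer from (2.14)."""
    t1 = np.einsum("ijkl,ik->jl", C, C1)        # discretized T1(C, C1)
    n1 = np.sum(C1 ** 2)                        # squared HS norm of C1
    C2_opt = t1 / n1                            # minimizer (2.14)
    D = np.sum(C ** 2) - np.sum(t1 ** 2) / n1   # minimum distance (2.15)
    return D, C2_opt

rng = np.random.default_rng(3)
C1 = rng.standard_normal((3, 3))
C2 = rng.standard_normal((4, 4))

# Separable case: the distance vanishes and the minimizer recovers C2.
D_sep, C2_hat = min_distance(tensor(C1, C2), C1)

# Perturbed (non-separable) case: the distance is strictly positive and
# agrees with the directly computed squared distance at the minimizer.
C_pert = tensor(C1, C2) + 0.5 * rng.standard_normal((3, 4, 3, 4))
D_pert, C2_pert = min_distance(C_pert, C1)
direct = np.sum((C_pert - tensor(C1, C2_pert)) ** 2)
```

Note that (2.15) never forms the difference $C - C_1\tilde\otimes\tilde C_2$ explicitly; the direct computation is included here only as a check.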


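Proof. We write

$$\begin{aligned}
|||C - C_1\tilde\otimes C_2|||_2^2 ={}& |||C - C_1\tilde\otimes\tilde C_2|||_2^2 + |||C_1\tilde\otimes\tilde C_2 - C_1\tilde\otimes C_2|||_2^2 \\
&+ 2\,\langle C - C_1\tilde\otimes\tilde C_2,\ C_1\tilde\otimes\tilde C_2 - C_1\tilde\otimes C_2\rangle_{HS}.
\end{aligned}$$

For the last term we obtain from (2.14)

$$\begin{aligned}
\langle C - C_1\tilde\otimes\tilde C_2,\ C_1\tilde\otimes\tilde C_2 - C_1\tilde\otimes C_2\rangle_{HS} ={}& \langle C, C_1\tilde\otimes\tilde C_2\rangle_{HS} - |||C_1\tilde\otimes\tilde C_2|||_2^2 \\
&- \langle C, C_1\tilde\otimes C_2\rangle_{HS} + \langle C_1\tilde\otimes\tilde C_2, C_1\tilde\otimes C_2\rangle_{HS} \\
={}& \frac{1}{|||C_1|||_2^2}\langle C, C_1\tilde\otimes T_1(C,C_1)\rangle_{HS} - \frac{|||T_1(C,C_1)|||_2^2}{|||C_1|||_2^2} \\
&- \langle C, C_1\tilde\otimes C_2\rangle_{HS} + \langle C_2, T_1(C,C_1)\rangle_{S_2(H_2)},
\end{aligned}$$

which is zero by (2.7) and Proposition 2.2. Therefore for all $C_2\in S_2(H_2)$ we have

$$|||C - C_1\tilde\otimes C_2|||_2^2 \ge |||C - C_1\tilde\otimes\tilde C_2|||_2^2,$$

which proves the first assertion of Theorem 2.1.

For a proof of the representation (2.15) we substitute the operator $\tilde C_2$ defined in (2.14) for $C_2$ in the expression of $D(C_1,C_2)$ and obtain

$$\begin{aligned}
D(C_1) = D(C_1,\tilde C_2) &= |||C - C_1\tilde\otimes\tilde C_2|||_2^2 = \langle C - C_1\tilde\otimes\tilde C_2,\ C - C_1\tilde\otimes\tilde C_2\rangle_{HS} \\
&= |||C|||_2^2 + |||C_1\tilde\otimes\tilde C_2|||_2^2 - 2\,\langle C, C_1\tilde\otimes\tilde C_2\rangle_{HS} \\
&= |||C|||_2^2 + \frac{|||T_1(C,C_1)|||_2^2}{|||C_1|||_2^2} - \frac{2}{|||C_1|||_2^2}\langle C, C_1\tilde\otimes T_1(C,C_1)\rangle_{HS}.
\end{aligned}$$

Then the second assertion follows from Proposition 2.2.

Now assume that $C = C_1\tilde\otimes C_2$ for some $C_2\in S_2(H_2)$. Then (2.5) implies

$$|||T_1(C,C_1)|||_2^2 = \big(\langle C_1,C_1\rangle_{S_2(H_1)}\big)^2\,|||C_2|||_2^2 = |||C_1|||_2^4\,|||C_2|||_2^2$$

and therefore $D(C_1)=0$. Conversely, if $D(C_1)=0$, we have $C = C_1\tilde\otimes\tilde C_2$, where $\tilde C_2\in S_2(H_2)$ because $|||\tilde C_2|||_2 \le |||C|||_2 / |||C_1|||_2$ by (2.14) and Corollary 2.1.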

2.2 Hilbert-Schmidt integral operators

An important case for applications is the space $H := L^2(S\times T,\mathbb{R})$ of all square integrable functions defined on $S\times T$, where $S\subset\mathbb{R}^p$, $T\subset\mathbb{R}^q$ are bounded measurable sets. If the covariance operator $C$ of the random variable $X$ is a Hilbert-Schmidt operator, it follows from Theorem 6.11 in Weidmann (1980) that there exists a kernel $c\in L^2\big((S\times T)\times(S\times T),\mathbb{R}\big)$ such that $C$ can be characterized as an integral operator, i.e.

$$Cf(s,t) = \int_S\int_T c(s,t,s',t')\,f(s',t')\,ds'\,dt', \qquad f\in L^2(S\times T,\mathbb{R}),$$

almost everywhere on $S\times T$. Moreover, the kernel is given by the covariance kernel of $X$, that is $c(s,t,s',t') = \mathrm{Cov}\big[X(s,t), X(s',t')\big]$, and the space $H$ can be identified with the tensor product of $H_1 = L^2(S,\mathbb{R})$ and $H_2 = L^2(T,\mathbb{R})$.

Similarly, if $C_1$ and $C_2$ are Hilbert-Schmidt operators on $L^2(S,\mathbb{R})$ and $L^2(T,\mathbb{R})$, respectively, there exist symmetric kernels $c_1\in L^2(S\times S,\mathbb{R})$ and $c_2\in L^2(T\times T,\mathbb{R})$ such that

$$C_1f(s) = \int_S c_1(s,s')\,f(s')\,ds', \quad f\in H_1,$$

$$C_2g(t) = \int_T c_2(t,t')\,g(t')\,dt', \quad g\in H_2,$$

almost everywhere on $S$ and $T$, respectively. The following result shows that in this case the operators $T_1$ and $T_2$ defined by (2.5) and (2.6), respectively, are also Hilbert-Schmidt (or equivalently integral) operators.

Proposition 2.3. If $C$ and $C_1$ are integral operators with kernels $c\in L^2\big((S\times T)\times(S\times T),\mathbb{R}\big)$ and $c_1\in L^2(S\times S,\mathbb{R})$, then $T_1(C,C_1)$ is an integral operator with kernel given by

$$k(t,t') = \int_S\int_S c(s,t,s',t')\,c_1(s,s')\,ds\,ds'. \tag{2.16}$$

An analogous result is true for the operator $T_2$.
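The kernel formula (2.16) can be approximated by Riemann sums once the kernels are evaluated on a grid. The sketch below assumes $S=T=[0,1]$, a midpoint rule in $s$, and the illustrative kernels $c_1(s,s')=ss'$ and $c_2(t,t')=\min(t,t')$ with separable $c=c_1c_2$; none of these choices come from the paper. In this case the exact kernel is $k(t,t')=c_2(t,t')\int_0^1\!\int_0^1 (ss')^2\,ds\,ds' = c_2(t,t')/9$, which the quadrature should reproduce up to discretization error.

```python
import numpy as np

# Midpoint grid on S = [0, 1] (assumption: 200 cells) and a few points in T.
n = 200
h = 1.0 / n
s = (np.arange(n) + 0.5) * h
t = np.linspace(0.0, 1.0, 5)

c1 = np.outer(s, s)                      # c1(s, s') = s * s'
c2 = np.minimum.outer(t, t)              # c2(t, t') = min(t, t')
c = np.einsum("ik,jl->ijkl", c1, c2)     # c(s_i, t_j, s_k, t_l), separable here

# Riemann-sum version of (2.16): integrate c * c1 over s and s'.
k = np.einsum("ijkl,ik->jl", c, c1) * h * h

# Largest deviation from the exact kernel c2 / 9 on the t-grid.
max_err = np.max(np.abs(k - c2 / 9.0))
```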

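Proof. By Lemma A.2 in Appendix A, for a given $\varepsilon>0$ there exists an integral operator $C'$ with kernel $c'$ such that

$$|||C - C'|||_2 < \frac{\varepsilon}{2\,|||C_1|||_2} \quad\text{and}\quad \|c - c'\|_\infty < \varepsilon/2,$$

where $C' = \sum_{n=1}^{N} A_n\tilde\otimes B_n$, and $A_n$ and $B_n$ are finite rank operators with continuous kernels $a_n$ and $b_n$. Note that $\sum_{n=1}^{N} a_nb_n$ is the kernel of the operator $C'$. Let $K$ be the integral operator with the kernel defined in (2.16) and $K'$ be the integral operator with kernel

$$k'(t,t') := \int_S\int_S c'(s,t,s',t')\,c_1(s,s')\,ds\,ds',$$

then (note that $K'$ is a Hilbert-Schmidt operator)

$$|||T_1(C,C_1) - K|||_2 \le |||T_1(C,C_1) - T_1(C',C_1)|||_2 + |||T_1(C',C_1) - K'|||_2 + |||K' - K|||_2. \tag{2.17}$$

By (2.12) the first term is bounded by $|||C_1|||_2\,|||C - C'|||_2 < \varepsilon/2$. To handle the second term note that for any $f\in H_2$,

$$\begin{aligned}
T_1(C',C_1)f(t) &= T_1\Big(\sum_{n=1}^{N} A_n\tilde\otimes B_n,\ C_1\Big)f(t) = \sum_{n=1}^{N} \langle A_n, C_1\rangle_{S_2(H_1)}\, B_nf(t) \\
&= \sum_{n=1}^{N} \int_T\int_S\int_S a_n(s,s')\,c_1(s,s')\,b_n(t,t')\,f(t')\,ds\,ds'\,dt'
\end{aligned}$$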