

3.5.5 Some theoretical results

In this section we present some theoretical results that explain why the TG can be regarded as lying, in a sense, between the CG and the FCIG.

Proposition 3.16 [76] If the distribution on X = (X_1, ..., X_p) is Gaussian and faithful to the Full Conditional Independence Graph, then every edge in the Full Conditional Independence Graph is also an edge of the Tri-graph, FCIG ⊆ TG.


Figure 3.9: Tri-graph coincides with CG

Proof:

We prove the statement by contradiction. Assume that an edge (X_i, X_j) of the FCIG is not in the TG. Then either σ_ij = 0 or ω_ij|k = 0 for some k ∈ {1, ..., p} \ {i, j}.

In the first case (σ_ij = 0), X_i and X_j are (marginally) independent. Thus there cannot exist a path P from X_i to X_j (or vice versa): otherwise the variables on the path would explain the dependence of X_i and X_j, i.e. X_i ⊥ X_j | {variables on the path P}, and the conditioning set could be enlarged to X_i ⊥ X_j | {X_k : k ∈ {1, ..., p} \ {i, j}}, whereas the edge (X_i, X_j) in the FCIG means that X_i and X_j are dependent given all the other variables. Therefore X_i and X_j must lie in different connectivity components of the FCIG, since X is faithful to the FCIG; this contradicts the assumption that (X_i, X_j) is an edge of the FCIG.

In the second case (ω_ij|k = 0 for some k ∈ {1, ..., p} \ {i, j}), it holds that X_i ⊥ X_j | X_k. This can be generalized to X_i ⊥ X_j | {X_k : k ∈ {1, ..., p} \ {i, j}}, and since X is faithful to the FCIG, it follows that X_i and X_j are separated in the FCIG, i.e. there is no edge (X_i, X_j) in the FCIG, which is again a contradiction.

Proposition 3.17 [76] Assume that the distribution of X is Gaussian. Moreover, assume that if X_i and X_j are not adjacent in the Full Conditional Independence Graph, then X_i and X_j are either in different connectivity components of the Full Conditional Independence Graph or there exists a vertex X_k that separates X_i and X_j in the Full Conditional Independence Graph. Then every edge in the Tri-graph is also an edge in the Full Conditional Independence Graph.

Proof:

Assume that X_i and X_j are not adjacent in the Full Conditional Independence Graph.

Then we either have:

• X_i and X_j are in different connectivity components. X_i and X_j are therefore (marginally) independent, which implies ρ_ij = 0, and there is no edge between them in the Tri-graph, or

• there exists some X_k with k ∈ {1, ..., p} \ {i, j} that separates X_i and X_j. Due to the Markov property, we have X_i ⊥ X_j | X_k and therefore ω_ij|k = 0, which further implies that X_i and X_j are not adjacent in the Tri-graph.

Due to Propositions 3.16 and 3.17, the TG and the FCIG coincide whenever the assumptions of both propositions hold.

In particular, all Gaussian distributions corresponding to trees are faithful (see [6]), so that the following holds (see [76]):

Theorem 3.18 [76] If the Full Conditional Independence Graph of a Gaussian distribution is a forest of trees (i.e. the graph does not contain any cycle), then the Tri-graph and the Full Conditional Independence Graph coincide.
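As a population-level illustration of Theorem 3.18, consider a Gaussian Markov chain, whose Full Conditional Independence Graph is a path (hence a tree). The following sketch is not taken from the text; the AR(1) covariance Σ_st = φ^|s−t|, the numerical tolerance, and all names are illustrative assumptions. It checks, directly from the covariance matrix, that the Tri-graph edges and the FCIG edges (nonzero entries of the precision matrix) agree.

```python
import numpy as np

# Illustrative setup: stationary Gaussian AR(1) chain with covariance
# Sigma[s, t] = phi**|s - t|; its FCIG is the path 0 - 1 - 2 - 3 (a tree).
phi, p, tol = 0.6, 4, 1e-10
idx = np.arange(p)
Sigma = phi ** np.abs(np.subtract.outer(idx, idx))
K = np.linalg.inv(Sigma)                      # precision matrix (FCIG edges)

def omega(S, i, j, k):
    # pairwise partial correlation omega_{ij|k} computed from the covariance matrix S
    num = S[i, j] * S[k, k] - S[i, k] * S[j, k]
    den = np.sqrt((S[i, i] * S[k, k] - S[i, k] ** 2) *
                  (S[j, j] * S[k, k] - S[j, k] ** 2))
    return num / den

for i in range(p):
    for j in range(i + 1, p):
        rho = Sigma[i, j] / np.sqrt(Sigma[i, i] * Sigma[j, j])
        tg_edge = abs(rho) > tol and all(
            abs(omega(Sigma, i, j, k)) > tol for k in range(p) if k not in (i, j))
        fcig_edge = abs(K[i, j]) > tol
        print((i, j), "TG:", tg_edge, "FCIG:", fcig_edge)   # the two graphs agree on every pair
```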

The Tri-graph and the Full Conditional Independence Graph also coincide in more complicated scenarios, for example if the distribution is Gaussian and faithful and the corresponding Full Conditional Independence Graph consists of cliques that pairwise share at most one common vertex.

From Proposition 3.16 we expect sparse Full Conditional Independence Graphs to have fewer edges than the corresponding Tri-graphs. From Theorem 3.18 we also expect the number of cycles to be an indicator of the difference between the number of edges in the FCIG and in the TG: the larger the number of cycles, the larger the difference in the number of edges.

As distributions are not always faithful, some FCIGs may also contain more edges than the corresponding TGs.

Furthermore we can prove some results concerning the relation between the Tri-graph and the Covariance Graph.

Proposition 3.19 The Tri-graph is a subgraph of the Covariance Graph, TG ⊆ CG.

Proof:

For every edge (X_i, X_j) in the TG we have ρ_ij ≠ 0, hence the edge also belongs to the CG.

Theorem 3.20 If the Covariance Graph of a Gaussian distribution does not contain any cycle, then the Tri-graph coincides with the Covariance Graph.

Proof:

The CG has no cycle if there does not exist any triple of variables (i, j, k) such that σ_ij ≠ 0, σ_ik ≠ 0 and σ_jk ≠ 0 (due to the transitivity of the covariance).

Proposition 3.19 already states that TG ⊆ CG. By contradiction, suppose that TG ⊂ CG, i.e. there exists an edge (i, j) in the CG but not in the TG. Hence σ_ij ≠ 0, but there exists an index k such that ω_ij|k = 0, where the pairwise partial correlation coefficient is defined as in (3.3). Writing this coefficient explicitly as a function of the variances and covariances of the variables, we get:

ω_ij|k = (σ_ij σ_kk − σ_ik σ_jk) / sqrt((σ_ii σ_kk − σ_ik²)(σ_jj σ_kk − σ_jk²)).

The denominator is strictly positive, since the variance is a nonnegative quantity. Thus

ω_ij|k = 0 ⟺ σ_ij σ_kk = σ_ik σ_jk.

Since σ_ij ≠ 0 (the edge exists in the CG) and σ_kk ≠ 0 (otherwise the variable X_k would be constant), it must hold that σ_ik σ_jk ≠ 0, i.e. σ_ik ≠ 0 and σ_jk ≠ 0. Consequently the CG contains a cycle, which contradicts the assumption.
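The key equivalence in this proof, ω_ij|k = 0 ⟺ σ_ij σ_kk = σ_ik σ_jk, can be checked numerically. The following sketch is purely illustrative and not taken from the text (the coefficients a, b and the variable roles are assumptions): when X_i and X_j are driven only by a common variable X_k, σ_ij is nonzero, ω_ij|k vanishes, and the CG indeed contains the triangle (i, j, k).

```python
import numpy as np

# X_k ~ N(0,1), X_i = a*X_k + noise, X_j = b*X_k + noise (independent unit-variance noise).
a, b = 0.7, 0.5
Sigma = np.array([[a**2 + 1.0, a * b,      a  ],   # variable order: X_i, X_j, X_k
                  [a * b,      b**2 + 1.0, b  ],
                  [a,          b,          1.0]])

i, j, k = 0, 1, 2
num = Sigma[i, j] * Sigma[k, k] - Sigma[i, k] * Sigma[j, k]   # = a*b - a*b = 0
den = np.sqrt((Sigma[i, i] * Sigma[k, k] - Sigma[i, k] ** 2) *
              (Sigma[j, j] * Sigma[k, k] - Sigma[j, k] ** 2))
print("omega_ij|k =", num / den)                              # 0: no edge (i, j) in the TG
print("sigma_ij, sigma_ik, sigma_jk =", Sigma[i, j], Sigma[i, k], Sigma[j, k])
# All three covariances are nonzero, so the CG contains the cycle (i, j, k),
# consistent with Theorem 3.20: if the CG has no cycle, TG and CG coincide.
```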

In general it is not possible to say which of the two methods, the CG or the FCIG, will point out fewer edges, since the presence of zero off-diagonal elements in the covariance matrix does not imply the presence of corresponding zero elements in the precision matrix.

We still need to decide how to check whether the pairwise partial correlation coefficients corresponding to the different triples (X_i, X_j, X_k) are different from zero, i.e. how to test ω_ij|k = 0 for all k ∈ {1, ..., p} \ {i, j}.

Under the Gaussian assumption of Section 3.5, this can be done with the Likelihood Ratio test (see Appendix A), based on the hypotheses:

• null hypothesis, H0(i, j | k): ω_ij|k = 0 for k ∈ {1, ..., p} \ {i, j};

• alternative hypothesis, H1(i, j | k): ω_ij|k ≠ 0 for k ∈ {1, ..., p} \ {i, j}.

Under the null hypothesis, and assuming that the data are independent and identically distributed (i.i.d.), the log-likelihood ratio test statistic has an asymptotic Chi-squared distribution. We denote by P(i, j | k) the P-value of the Log-Likelihood Ratio test of the null hypothesis H0(i, j | k) against the alternative hypothesis H1(i, j | k).

An analogous procedure is defined for the marginal correlation of the pair of variables (X_i, X_j), so that we also have (a code sketch of both tests follows the list below):

• null hypothesis, H0(i, j | 0): ρ_ij = 0;

• alternative hypothesis, H1(i, j | 0): ρ_ij ≠ 0;

• P-value, P(i, j | 0).
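Both P(i, j | k) and P(i, j | 0) can be computed from the sample covariance matrix. The sketch below is an assumed implementation, not taken from the text: it uses the standard deviance form −n log(1 − r²) of the log-likelihood ratio statistic, which under the respective null hypothesis is asymptotically Chi-squared with one degree of freedom; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2

def lr_p_value(X, i, j, k=None):
    """P(i, j | k) for a conditioning index k, or P(i, j | 0) for k=None.

    X is an (n, p) data matrix with i.i.d. Gaussian rows. The statistic is the
    deviance -n*log(1 - r^2), where r is the sample (partial) correlation; it is
    asymptotically chi-squared with 1 degree of freedom under the null hypothesis.
    """
    n = X.shape[0]
    S = np.cov(X, rowvar=False)
    if k is None:                                    # marginal correlation rho_ij
        r = S[i, j] / np.sqrt(S[i, i] * S[j, j])
    else:                                            # pairwise partial correlation omega_ij|k
        num = S[i, j] * S[k, k] - S[i, k] * S[j, k]
        den = np.sqrt((S[i, i] * S[k, k] - S[i, k] ** 2) *
                      (S[j, j] * S[k, k] - S[j, k] ** 2))
        r = num / den
    stat = -n * np.log(1.0 - r ** 2)
    return chi2.sf(stat, df=1)
```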

We can now reformulate Definition 3.9.

Definition 3.21 We draw an edge X_i → X_j if and only if all the null hypotheses, H0(i, j | 0) and H0(i, j | k) for all k, are rejected in all the Log-Likelihood Ratio tests.

Thus there is evidence for an edge X_i → X_j if

    max_{k ∈ {0} ∪ {1, ..., p} \ {i, j}} P(i, j | k) < α,

where α is the significance level of the test.
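Definition 3.21 then amounts to comparing the largest of these p − 1 P-values with α. A minimal sketch, reusing the illustrative lr_p_value helper from the previous block:

```python
def has_edge(X, i, j, alpha=0.05):
    """Decide the edge X_i - X_j as in Definition 3.21: draw it iff every null
    hypothesis H0(i,j|k), k in {0} u {1,...,p}\{i,j}, is rejected, i.e. iff
    max_k P(i,j|k) < alpha. No correction of alpha over the conditioning
    vertices is needed (see Proposition 3.22 below)."""
    p = X.shape[1]
    p_values = [lr_p_value(X, i, j)]                    # P(i, j | 0)
    p_values += [lr_p_value(X, i, j, k)
                 for k in range(p) if k not in (i, j)]  # P(i, j | k)
    return max(p_values) < alpha
```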

For deciding about a single edge between the vertices X_i and X_j it is not necessary to correct α for the p − 1 multiple tests over all conditioning vertices k, since the following proposition can be proved.

Proposition 3.22 [76] Consider the single hypothesis (for some fixed pair (X_i, X_j)):

H0(i, j): at least one H0(i, j | k) is true for some k ∈ {0} ∪ {1, ..., p} \ {i, j}.

Assume that for all k ∈ {0} ∪ {1, ..., p} \ {i, j} the individual test satisfies

P_H̃0(i,j|k)[H0(i, j | k) rejected] ≤ α,

where H̃0(i, j | k) = {H0(i, j | k) true} ∪ {H0(i, j | k′) true or false (and compatible with H0(i, j | k)) for all k′ ≠ k}. Then the error of the first type satisfies

P_H0(i,j)[H0(i, j | k) rejected for all k ∈ {0} ∪ {1, ..., p} \ {i, j}] ≤ α.

Proof: Consider the hypothesis:

H0 = H0(i, j): at least one H0(i, j | k) is true for some k.

The probability of an error of the first type is:

P_H0[H0(i, j | k) rejected for all k] = P_H0[ ∩_k {H0(i, j | k) rejected} ]
                                      ≤ min_k P_H0[ H0(i, j | k) rejected ]
                                      ≤ P_H0[ H0(i, j | k*) rejected ]
                                      ≤ α,

where k* denotes an index for which H0(i, j | k*) is true (at least one such index exists under H0), so that the last inequality follows from the level condition of the individual test.

Remark 3.23 Due to this proposition many calculations are saved. It is also important to point out that the estimation of a Tri-graph is carried out in an exhaustive manner. This is a very important difference from the Full Conditional Independence Graph, where it is often necessary to apply a non-exhaustive procedure in case of a huge graph space.
