
We are now going to couple the family $(\mathcal{B}_M)_{M \subseteq N}$ to the channel $\nu$ so that we can interpret the corresponding marginals $(\hat\nu_M)_{M \subseteq N}$ causally. In order to simplify the presentation, we first consider an arbitrary σ-subalgebra $\mathcal{A}$ of $\mathcal{X}_N$. (Below, $\mathcal{A}$ will be chosen to be the σ-algebra generated by $\nu$.) We begin with information in $M$ in the context of a configuration $\bar{x}$ outside of $M$, that is $\bar{x} \in X_{N \setminus M}$. Given such an $\bar{x}$, we define the $(M,\bar{x})$-trace of $\mathcal{A}$ as follows: For each $A \in \mathcal{A}$, we consider the $(M,\bar{x})$-section of $A$,

$$\mathrm{sec}_{M,\bar{x}}(A) := A_{M,\bar{x}} := \{x \in X_M : (x,\bar{x}) \in A\}.$$

These sections then form the $(M,\bar{x})$-trace of $\mathcal{A}$, that is

$$\mathrm{tr}_{M,\bar{x}}(\mathcal{A}) := \mathcal{A}_{M,\bar{x}} := \left\{A_{M,\bar{x}} : A \in \mathcal{A}\right\}.$$

Considering all possible contexts $\bar{x} \in X_{N \setminus M}$, we finally define the $M$-trace of $\mathcal{A}$ as

$$\mathrm{tr}_M(\mathcal{A}) := \mathcal{A}_M := \bigvee_{\bar{x} \in X_{N \setminus M}} \mathcal{A}_{M,\bar{x}}.$$

The $(M,\bar{x})$-trace as well as the $M$-trace of $\mathcal{A}$ are σ-subalgebras of $\mathcal{X}_M$. Note that in the extreme cases $M = \emptyset$ and $M = N$, we recover $\mathcal{A}_\emptyset = \{\emptyset, \{\epsilon\} = X_\emptyset\}$ (where $\epsilon$ denotes the empty sequence) and $\mathcal{A}_N = \mathcal{A}$, respectively.
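For finitely many inputs with finite state spaces, these constructions can be carried out explicitly. The following sketch is a toy illustration of the definitions above, not code from the article; all names are my own, and σ-algebras on finite spaces are represented by their atoms (a device that only works in the finite case):

```python
from itertools import product

# Two binary input sites, N = {0, 1}.
N = (0, 1)
X = {0: (0, 1), 1: (0, 1)}                     # state spaces X_i
X_N = list(product(X[0], X[1]))                # X_N = X_0 x X_1

def glue(M, x, xbar_sites, xbar):
    """Assemble a full configuration in X_N from x on M and xbar on N\\M."""
    vals = dict(zip(M, x))
    vals.update(dict(zip(xbar_sites, xbar)))
    return tuple(vals[i] for i in N)

def section(A, M, xbar):
    """(M, xbar)-section of A: all x in X_M with (x, xbar) in A."""
    xbar_sites = tuple(i for i in N if i not in M)
    X_M = product(*(X[i] for i in M))
    return frozenset(x for x in X_M if glue(M, x, xbar_sites, xbar) in A)

def trace_atoms(generators, M):
    """Atoms of the M-trace: the sigma-algebra on X_M generated by the
    (M, xbar)-sections of the generators, over all contexts xbar."""
    secs = [section(A, M, xbar)
            for xbar in product(*(X[i] for i in N if i not in M))
            for A in generators]
    atoms = {}
    for x in product(*(X[i] for i in M)):      # points indistinguishable by
        atoms.setdefault(tuple(x in s for s in secs), set()).add(x)  # all sections
    return [frozenset(b) for b in atoms.values()]

# A generated by the XOR event {x_0 != x_1}:
A_xor = frozenset(p for p in X_N if p[0] != p[1])
print(trace_atoms([A_xor], M=(0,)))   # singleton atoms: the trace is the full power set of X_0
```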

The family of all $M$-traces of $\mathcal{A}$ describes how $\mathcal{A}$ is "distributed" over the subsets $M$ of $N$. However, there is a problem here: the canonical projections $\pi^M_L$ are not necessarily $\mathcal{A}_M$-$\mathcal{A}_L$-measurable. This projectivity property is required for the definition of a measure of causal information flow that satisfies the general chain rule of Theorem 6. We highlighted this problem for the three-input case in Section 4.2. There are two ways to recover projectivity, first by extending and second by reducing $\mathcal{A}_M$ appropriately. Let us begin with the extension:

$$\overline{\mathcal{A}}_M := \bigvee_{L \subseteq M} \left(\pi^M_L\right)^{-1}\!\left(\mathcal{A}_L\right). \qquad (51)$$

We have the following characterisation of the family $\overline{\mathcal{A}}_M$, $M \subseteq N$, as the smallest projective extension of the family $\mathcal{A}_M$, $M \subseteq N$.

Proposition 8 (Extension of $\mathcal{A}_M$, $M \subseteq N$) The family $\overline{\mathcal{A}}_M$, $M \subseteq N$, satisfies the following two conditions:

1. For all $M \subseteq N$, $\mathcal{A}_M$ is contained in $\overline{\mathcal{A}}_M$.

2. For all $L \subseteq M \subseteq N$, the canonical projection $\pi^M_L$ is $\overline{\mathcal{A}}_M$-$\overline{\mathcal{A}}_L$-measurable.

Furthermore, for every family $\mathcal{B}_M$, $M \subseteq N$, that satisfies these two conditions (where $\overline{\mathcal{A}}_M$ is replaced by $\mathcal{B}_M$), we have

$$\overline{\mathcal{A}}_M \subseteq \mathcal{B}_M \quad \text{for all } M \subseteq N. \qquad (52)$$

Proof The first statement is clear (simply choose $L = M$ on the RHS of (51)). For the second statement, we have to show $(\pi^M_L)^{-1}(\overline{\mathcal{A}}_L) \subseteq \overline{\mathcal{A}}_M$. For every $K \subseteq L$, the compatibility $\pi^L_K \circ \pi^M_L = \pi^M_K$ of the canonical projections yields

$$\left(\pi^M_L\right)^{-1}\!\left(\left(\pi^L_K\right)^{-1}(\mathcal{A}_K)\right) = \left(\pi^M_K\right)^{-1}(\mathcal{A}_K) \subseteq \overline{\mathcal{A}}_M,$$

and these σ-subalgebras generate $(\pi^M_L)^{-1}(\overline{\mathcal{A}}_L)$. Finally, let $\mathcal{B}_M$, $M \subseteq N$, be a family that satisfies the two conditions. For all $L \subseteq M$ we then have $(\pi^M_L)^{-1}(\mathcal{A}_L) \subseteq (\pi^M_L)^{-1}(\mathcal{B}_L) \subseteq \mathcal{B}_M$, and $\overline{\mathcal{A}}_M$ is the smallest σ-algebra that contains these σ-subalgebras (see (51)). This implies (52).
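Continuing the finite toy model of the previous sketch (again my own illustration, reusing `X`, `N` and `trace_atoms` from above), the extension (51) can be computed by pulling the $L$-traces back under the canonical projections and forming the generated σ-algebra:

```python
from itertools import combinations, product

def pullback(B, L, M):
    """Preimage of B ⊆ X_L under the canonical projection pi^M_L : X_M -> X_L."""
    pos = [M.index(i) for i in L]              # positions of the L-sites within M
    return frozenset(x for x in product(*(X[i] for i in M))
                     if tuple(x[p] for p in pos) in B)

def subsets(M):
    return [c for r in range(len(M) + 1) for c in combinations(M, r)]

def extension_atoms(generators, M):
    """Atoms of the smallest projective extension (51): the join, over L ⊆ M,
    of the pullbacks of the L-traces under pi^M_L."""
    gens = [pullback(s, L, M) for L in subsets(M)
            for s in trace_atoms(generators, L)]   # the atoms generate each L-trace
    atoms = {}
    for x in product(*(X[i] for i in M)):
        atoms.setdefault(tuple(x in g for g in gens), set()).add(x)
    return [frozenset(b) for b in atoms.values()]

# For the XOR generator, the join separates every input configuration:
print(len(extension_atoms([A_xor], M=(0, 1))))     # 4 atoms: the full power set
```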

After having defined the smallest extension of the family $\mathcal{A}_M$, $M \subseteq N$, as one way to recover projectivity, we now come to the alternative way, which is by reduction of that family. More precisely, we define

$$\underline{\mathcal{A}}_M := \left\{ A \in \mathcal{X}_M : \left(\pi^N_M\right)^{-1}(A) \in \mathcal{A} \right\}.$$

We have the following characterisation of this family as the largest projective reduction of $\mathcal{A}_M$, $M \subseteq N$.

Proposition 9 (Reduction of $\mathcal{A}_M$, $M \subseteq N$) The family $\underline{\mathcal{A}}_M$, $M \subseteq N$, satisfies the following two conditions:

1. For all $M \subseteq N$, $\underline{\mathcal{A}}_M$ is contained in $\mathcal{A}_M$.

2. For all $L \subseteq M \subseteq N$, the canonical projection $\pi^M_L$ is $\underline{\mathcal{A}}_M$-$\underline{\mathcal{A}}_L$-measurable.

Furthermore, for every family $\mathcal{B}_M$, $M \subseteq N$, that satisfies these two conditions (where $\underline{\mathcal{A}}_M$ is replaced by $\mathcal{B}_M$), we have

$$\mathcal{B}_M \subseteq \underline{\mathcal{A}}_M \quad \text{for all } M \subseteq N.$$

Proof In order to prove the first statement, let $A \in \underline{\mathcal{A}}_M$. This means that $A' := (\pi^N_M)^{-1}(A) \in \mathcal{A}$. For every context $\bar{x} \in X_{N \setminus M}$ we have $\mathrm{sec}_{M,\bar{x}}(A') = A$, so that $A \in \mathcal{A}_{M,\bar{x}} \subseteq \mathcal{A}_M$. Next, we come to the measurability of the canonical projection $\pi^M_L$. For this, we choose $A \in \underline{\mathcal{A}}_L$ and have to show $(\pi^M_L)^{-1}(A) \in \underline{\mathcal{A}}_M$:

$$\left(\pi^N_M\right)^{-1}\!\left(\left(\pi^M_L\right)^{-1}(A)\right) = \left(\pi^N_L\right)^{-1}(A) \in \mathcal{A}.$$

Finally, we have to prove the maximality. Let $\mathcal{B}_M$, $M \subseteq N$, be a family that satisfies the two conditions. Then

$$\left(\pi^N_M\right)^{-1}\!\left(\mathcal{B}_M\right) \subseteq \mathcal{B}_N \subseteq \mathcal{A}_N = \mathcal{A}.$$

This means that $\mathcal{B}_M \subseteq \underline{\mathcal{A}}_M$.

By the definitions, we have

$$\underline{\mathcal{A}}_M \subseteq \mathcal{A}_M \subseteq \overline{\mathcal{A}}_M, \quad M \subseteq N,$$

where equalities hold for $M = N$ and $M = \emptyset$. More precisely,

$$\underline{\mathcal{A}}_N = \mathcal{A} = \overline{\mathcal{A}}_N, \qquad \underline{\mathcal{A}}_\emptyset = \{\emptyset, \{\epsilon\}\} = \overline{\mathcal{A}}_\emptyset.$$
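The reduction admits an equally direct computation in the finite toy model; a minimal sketch, assuming the helpers `trace_atoms` and `pullback` defined in the earlier snippets:

```python
from itertools import combinations, product

def powerset(points):
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1)
            for c in combinations(pts, r)]

def in_sigma(A, atoms):
    """A lies in the sigma-algebra with the given atoms iff A is a union of atoms."""
    return all(a <= A or a.isdisjoint(A) for a in atoms)

def reduction(generators, M):
    """All A ⊆ X_M whose cylinder (pi^N_M)^{-1}(A) belongs to sigma(generators)."""
    atoms_A = trace_atoms(generators, N)        # atoms of the sigma-algebra A itself
    X_M = product(*(X[i] for i in M))
    return [A for A in powerset(X_M) if in_sigma(pullback(A, M, N), atoms_A)]

print(reduction([A_xor], M=(0,)))   # only the trivial sets survive: {} and X_0
```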

This concludes the constructions for a given σ-algebra $\mathcal{A}$, without explicit reference to the channel $\nu : X_N \times \mathcal{Z} \to [0,1]$. We now couple the studied σ-algebras with the channel $\nu$ and therefore choose $\mathcal{A}$ to be the σ-algebra generated by the channel $\nu$, that is $\sigma(\nu)$. We highlight this coupling by writing $\mathcal{A}^\nu$, as a particular choice of $\mathcal{A}$, and consider the family $(\mathcal{A}^\nu_M)_{M \subseteq N}$ of its traces, together with the corresponding smallest projective extension $(\overline{\mathcal{A}}^\nu_M)_{M \subseteq N}$ and the largest projective reduction $(\underline{\mathcal{A}}^\nu_M)_{M \subseteq N}$. In the context of a channel, the traces of $\mathcal{A}^\nu$ have a natural interpretation. In order to see this, we first consider a configuration $\bar{x} \in X_{N \setminus M}$ and define the "constrained" Markov kernel

$$\nu_{M,\bar{x}} : X_M \times \mathcal{Z} \to [0,1], \qquad \nu_{M,\bar{x}}(x; C) := \nu(x, \bar{x}; C).$$

We denote the σ-algebra generated by $\nu_{M,\bar{x}}$ by $\sigma_{M,\bar{x}}(\nu)$. Taking all "constraints" $\bar{x}$ into account, we then define

$$\sigma_M(\nu) := \bigvee_{\bar{x} \in X_{N \setminus M}} \sigma_{M,\bar{x}}(\nu).$$

Proposition 10 Let $\mathcal{A}^\nu \subseteq \mathcal{X}_N$ be the σ-algebra generated by the Markov kernel $\nu : X_N \times \mathcal{Z} \to [0,1]$. Then for all $M \subseteq N$ and all $\bar{x} \in X_{N \setminus M}$,

$$\sigma_{M,\bar{x}}(\nu) = \mathrm{tr}_{M,\bar{x}}(\mathcal{A}^\nu) \quad \text{and} \quad \sigma_M(\nu) = \mathrm{tr}_M(\mathcal{A}^\nu).$$

Proof The σ-algebra $\mathcal{A}^\nu$ is the smallest σ-algebra that contains all measurable sets of the form

$$A = \{x \in X_N : \nu(x; C) \in B\}, \qquad (55)$$

with some $C \in \mathcal{Z}$ and a Borel set $B \in \mathcal{B}([0,1])$. Now consider the $(M,\bar{x})$-section of such a set $A$:

$$\mathrm{sec}_{M,\bar{x}}(A) = \{x \in X_M : (x, \bar{x}) \in A\} = \{x \in X_M : \nu(x, \bar{x}; C) \in B\} = \{x \in X_M : \nu_{M,\bar{x}}(x; C) \in B\}.$$

This shows that the sections $\mathrm{sec}_{M,\bar{x}}(A)$ of measurable sets $A$ of the form (55) generate $\sigma_{M,\bar{x}}(\nu)$, which proves the first equality. The second equality is a direct implication of the first one.
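For a channel with finitely many inputs and outputs, both $\sigma(\nu)$ and the identity of Proposition 10 can be checked mechanically. The sketch below again extends the toy model (the deterministic `xor_kernel` is my own choice of example, not the article's): $\sigma(\nu)$ is represented by its atoms, the classes of inputs inducing the same output measure:

```python
# All events C in the power set of the output space Z = {0, 1}:
Z_events = [frozenset(s) for s in [(), (0,), (1,), (0, 1)]]

def xor_kernel(p, C):
    """Deterministic sum-mod-2 channel: point mass at x_0 XOR x_1."""
    return 1.0 if (p[0] ^ p[1]) in C else 0.0

def sigma_atoms(kernel, points):
    """Atoms of sigma(kernel): inputs are identified iff their kernels agree."""
    atoms = {}
    for p in points:
        atoms.setdefault(tuple(kernel(p, C) for C in Z_events), set()).add(p)
    return [frozenset(b) for b in atoms.values()]

print(sigma_atoms(xor_kernel, X_N))
# -> two atoms {x_0 == x_1}, {x_0 != x_1}: sigma(nu) is generated by the XOR event

# Proposition 10 in this example: the constrained kernel nu_{M, xbar} generates
# the same sigma-algebra on X_M as the (M, xbar)-sections of A^nu.
def constrained(kernel, M, xbar):
    xbar_sites = tuple(i for i in N if i not in M)
    return lambda x, C: kernel(glue(M, x, xbar_sites, xbar), C)

print(sigma_atoms(constrained(xor_kernel, (0,), (1,)), [(0,), (1,)]))
print([section(A, (0,), (1,)) for A in sigma_atoms(xor_kernel, X_N)])
# both separate x_0 = 0 from x_0 = 1, as Proposition 10 predicts
```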

The results of the previous section, Theorem 6 and Proposition 7, apply to the information flows defined for the projective families $(\overline{\mathcal{A}}^\nu_M)_{M \subseteq N}$ and $(\underline{\mathcal{A}}^\nu_M)_{M \subseteq N}$. These families take into account the information that is actually used by the channel $\nu$. Therefore, we can interpret the corresponding marginal channels $\hat\nu_M$ causally, where we have to distinguish two kinds of causality. For the projective family $(\overline{\mathcal{A}}^\nu_M)_{M \subseteq N}$, the channel $\hat\nu_M$ incorporates the information in any input configuration $x_K$, $K \subseteq M$, that is used by $\nu$ in conjunction with a context configuration $\bar{x}_{N \setminus K} = \bar{x}_{N \setminus M}\,\bar{x}_{M \setminus K}$ outside of $K$. For the projective family $(\underline{\mathcal{A}}^\nu_M)_{M \subseteq N}$, on the other hand, the channel $\hat\nu_M$ incorporates the information used by $\nu$ that is solely contained in $x_M$, independent of any context. When comparing a marginal channel $\hat\nu_M$ with another marginal channel $\hat\nu_L$, where $L \subseteq M$, the corresponding information flows $\overline{I}_\xi(X_{M \setminus L} \to Z \mid X_L)$ and $\underline{I}_\xi(X_{M \setminus L} \to Z \mid X_L)$, respectively, quantify the causal effects in $\hat\nu_M$ that exceed those in $\hat\nu_L$. These measures will capture different causal aspects, where the difference can be large. This is illustrated by the following extension of Example 3.

Example 11 Let

$$(X_i, \mathcal{X}_i) = (\mathbb{R}, \mathcal{B}(\mathbb{R})), \qquad i \in \{1, \ldots, n\} = N,$$

where $\mathcal{B}(\mathbb{R})$ denotes the Borel σ-algebra of $\mathbb{R}$. We define the channel simply by the sum of the input states, interpreted as a Markov kernel,

$$\nu(x_1, \ldots, x_n; C) := \delta_{x_1 + \cdots + x_n}(C), \qquad C \in \mathcal{B}(\mathbb{R}).$$

As $\mathcal{B}(\mathbb{R})$ is generated by the intervals $[r - \varepsilon, r + \varepsilon] \subseteq \mathbb{R}$, the smallest σ-algebra $\mathcal{A}^\nu$ for which all functions $\nu(\cdot\,; C)$ are measurable is generated by the following sets:

$$A(r, \varepsilon) := \{(x_1, \ldots, x_n) \in \mathbb{R}^n : r - \varepsilon \le x_1 + \cdots + x_n \le r + \varepsilon\}, \qquad r \in \mathbb{R},\ \varepsilon \in \mathbb{R}_+.$$

Therefore, the $M$-trace of $\mathcal{A}^\nu$, $\mathcal{A}^\nu_M$, is generated by the half-spaces

$$H_\vartheta := \Big\{x \in \mathbb{R}^M : \sum_{i \in M} x_i \le \vartheta\Big\}, \qquad \vartheta \in \mathbb{R}.$$

Forming the join (51) of the pullbacks of these σ-algebras over all subsets $L \subseteq M$ leads to the largest σ-algebra, the Borel algebra of $\mathbb{R}^M$ (indeed, the sums over all subsets $L \subseteq M$ together determine every individual coordinate $x_i$):

$$\overline{\mathcal{A}}^\nu_M = \mathcal{B}(\mathbb{R}^M).$$

Therefore, the marginal channel $\hat\nu_M(x; C)$ equals the usual marginal $\nu_M(x; C)$ for the projective extension. For the projective reduction, on the other hand, we obtain the trivial σ-algebra except for $M = N$:

$$\underline{\mathcal{A}}^\nu_M = \begin{cases} \{\emptyset, \mathbb{R}^M\} & \text{if } M \ne N, \\ \mathcal{A}^\nu & \text{if } M = N. \end{cases}$$

In this case we have $\hat\nu_M(x; C) = \nu(\mu)(C)$ for $M \ne N$ and $\hat\nu_N(x; C) = \nu(x; C)$, where $\mu$ is the joint distribution of the input variables.

We now consider the information flows associated with $L \subseteq M \subseteq N$, for the projective extension as well as for the projective reduction. In both cases these flows coincide with usual (conditional) mutual informations, in an instructive way. More precisely, for the extension we have

$$\overline{I}_\xi(X_{M \setminus L} \to Z \mid X_L) = I_\xi(X_{M \setminus L}; Z \mid X_L).$$

For the reduction, we obtain

$$\underline{I}_\xi(X_{M \setminus L} \to Z \mid X_L) = \begin{cases} 0 & \text{if } M \ne N, \\ I_\xi(X_N; Z) & \text{if } M = N. \end{cases} \qquad (56)$$

Interestingly, (56) does not depend on $L$. The vanishing of the information flow for $M \ne N$ is due to the fact that the output of the channel, the sum $x_1 + \cdots + x_n$, cannot be computed from a proper subset of the inputs. The flow of information only takes place if all inputs are given.
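A discrete analogue makes the two flows of this example tangible: replace the real-valued sum by the sum mod 2 of $n$ independent, uniformly distributed bits (both the input distribution and the mod-2 reduction are assumptions of mine, chosen so that the conditional mutual informations of the extension can be evaluated exactly). In line with the discussion above, no proper subset of the inputs carries any information about the output:

```python
import math
from itertools import product

n = 3                                     # three binary inputs, uniform and independent
joint = {}                                # joint law of (x_1, ..., x_n, z) with z = sum mod 2
for x in product((0, 1), repeat=n):
    joint[x + (sum(x) % 2,)] = 1.0 / 2 ** n

def marginal(dist, keep):
    out = {}
    for point, p in dist.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_mi(dist, A, B, C):
    """I(A; B | C) in bits; A, B, C are tuples of variable indices."""
    pABC = marginal(dist, A + B + C)
    pAC, pBC, pC = marginal(dist, A + C), marginal(dist, B + C), marginal(dist, C)
    mi = 0.0
    for point, p in pABC.items():
        a = point[:len(A)]
        b = point[len(A):len(A) + len(B)]
        c = point[len(A) + len(B):]
        mi += p * math.log2(p * pC.get(c, 1.0) / (pAC[a + c] * pBC[b + c]))
    return mi

z = (n,)                                  # index of the output variable
print(cond_mi(joint, (0,), z, ()))        # I(X_1; Z)            = 0 bits
print(cond_mi(joint, (0,), z, (1,)))      # I(X_1; Z | X_2)      = 0 bits
print(cond_mi(joint, (0,), z, (1, 2)))    # I(X_1; Z | X_2, X_3) = 1 bit
```

Only when conditioning on all remaining inputs does the flow become positive, mirroring the fact that the parity cannot be computed from a proper subset of the inputs.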

The following example, which is closely related to Example 1, highlights continuity issues of the introduced measures of information flow. These result from the fact that small changes in the mechanisms can lead to large differences in the involved σ-algebras.

Example 12 Consider two input variables, $X$ and $Y$, and one output variable $Z$, with corresponding state spaces

$$(X, \mathcal{X}) = (Y, \mathcal{Y}) = (Z, \mathcal{Z}) = (\mathbb{R}, \mathcal{B}(\mathbb{R})).$$

Furthermore, consider two channels, $\nu$ and $\nu'$, where $\nu$ simply copies the second input, $y$, and $\nu'$ the first one, $x$. More precisely,

$$\nu(x, y; C) := \delta_y(C), \qquad \nu'(x, y; C) := \delta_x(C).$$

With $0 \le \varepsilon \le 1$, we define the convex combination

$$\nu^{(\varepsilon)} := (1 - \varepsilon)\,\nu + \varepsilon\,\nu'.$$

For $\varepsilon = 0$, only the channel $\nu$ is acting. Obviously, we have $\overline{I}_\xi(X \to Z) = \underline{I}_\xi(X \to Z) = 0$, as expected in this case, because $\nu$ simply copies $y$ and is not sensitive to $x$ at all. One might intuitively expect that the causal information flow from $X$ to $Z$ stays close to $0$ if $\varepsilon$ is small but greater than $0$. However, this intuition is not reflected by the actual quantities, as defined in this article. It is easy to see that for $\varepsilon \ne 0$, $\overline{I}_\xi(X \to Z) = \underline{I}_\xi(X \to Z) = I_\xi(X; Z)$. Thus, these quantities are not at all sensitive to the parameter $\varepsilon$ and behave discontinuously in the limit $\varepsilon \to 0$.
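The σ-algebra jump behind this discontinuity can be made visible in a discretised, self-contained version of the example (the grid size and the value of ε are arbitrary choices of mine): for $\varepsilon = 0$ the kernel identifies all inputs with equal $y$, while for every $\varepsilon > 0$ it separates all input pairs:

```python
from itertools import product

grid = list(product(range(3), repeat=2))        # coarse grid of inputs (x, y)
events = [frozenset([z]) for z in range(3)]     # singleton output events

def mix_kernel(eps):
    """nu^(eps)(x, y; C) = (1 - eps) * delta_y(C) + eps * delta_x(C)."""
    return lambda p, C: (1 - eps) * (p[1] in C) + eps * (p[0] in C)

def n_atoms(kernel):
    """Number of atoms of sigma(kernel) on the grid."""
    return len({tuple(kernel(p, C) for C in events) for p in grid})

print(n_atoms(mix_kernel(0.0)))    # 3 atoms: only y is visible
print(n_atoms(mix_kernel(1e-9)))   # 9 atoms: every (x, y) is separated
```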

6 Conclusions

Conditioning is an important operation within the study of causality. The theory of causal networks, pioneered by Pearl [17], introduces interventional conditioning as an operation, the so-called do-operation, that is fundamentally different from the classical conditioning based on the general rule $P(B \mid A) = P(A \cap B)/P(A)$. It models experimental setups more appropriately and avoids confusion with purely associational dependencies. Information theory has been classically used for the quantification of such dependencies, in terms of mutual information and conditional mutual information [20]. Within the original setting of information theory, the mutual information between the input and the output of a channel can be interpreted causally. In the more general context of causal networks, however, confounding effects make a distinction between associations and causal effects more difficult. In such cases, information-theoretic quantities can be misleading as measures of causal effects. In order to overcome this problem, information theory has been coupled with the interventional calculus of causal networks, and corresponding measures of causal information flow have been proposed [5, 6]. Given that such measures are based on the notion of an experimental intervention, which represents a perturbation of the system, it remains unclear to what extent they quantify causal information flows in the unperturbed system. As another consequence of the interventional conditioning, one cannot expect that causal information flow, as defined in [6], decomposes according to a chain rule. The current article is based on an idea of the author from 2003 which precedes the above-mentioned works on combining the theory of causal networks with information theory. It proposes a way to quantify causal information flows without perturbing the system through intervention. Instead, it is based on classical conditioning in terms of the conditional distribution $P(B \mid \mathcal{A})$, where the σ-algebra $\mathcal{A}$ is adjusted to the intrinsic mechanisms of the system. The derived information flow measure satisfies the chain rule and the natural properties of a general measure of causal strength postulated in [14]. The chain rule, together with the generalised Pythagoras relation from information geometry, provides powerful tools within the study of the problem of partial information decomposition [7, 8, 16].

Even though the introduced information flows satisfy natural properties, the aim of the present article is relatively moderate. For instance, the analysis is focussed on a simple network consisting of a number of inputs and one output, which is a strong restriction compared to the setting of [6]. The extension of the present work to more general causal networks remains to be worked out. Furthermore, this article does not address the important problem of causal inference [18]. In addition to these general directions of research, there are various ways to modify and extend the constructions of the present work and thereby potentially highlight further causal aspects of a given channel. The following perspectives are particularly important:

1. In the present article, the information flow has been defined for a fixed finite measurable partition $\xi$ of the state space $(Z, \mathcal{Z})$ of the output variable $Z$. A natural further step would be to consider the limit of information flows with respect to an increasing sequence $\xi_n$, $n = 1, 2, \ldots$, so that

$$\bigvee_{n=1}^{\infty} \sigma(\xi_n) = \mathcal{Z}.$$

This limit will be an information flow measure that is independent of a particular partition.

2. Throughout this article, the partition $\xi$ has not been coupled with the σ-algebra of the channel $\nu$. This is the smallest σ-algebra for which all functions $\nu(x; C)$, $C \in \mathcal{Z}$, are measurable. Given that the channel is analysed with respect to the partition $\xi$, one can restrict attention to the smallest σ-algebra for which the functions $\nu(x; C)$, $C \in \xi$, are measurable. This is a potentially smaller σ-subalgebra of the one generated by the channel. We would then have a natural coupling of the partition $\xi$ with the information used by the channel.

3. We started with the family $\mathcal{A}^\nu_M$ of $M$-traces of $\mathcal{A}^\nu$, the σ-algebra generated by $\nu$, as the natural family associated with the channel. However, these traces do not form a projective family of σ-algebras. Such a projectivity is required for the chain rule for corresponding information flows. One can recover projectivity by extension and by reduction, leading to $\overline{\mathcal{A}}^\nu_M$ and $\underline{\mathcal{A}}^\nu_M$, respectively. Example 11 shows that the extension can lead to the largest σ-algebra and the reduction to the trivial one. Given this fact, one might ask whether the extension is too large and the reduction is too small to capture the causal aspects of $\nu$. Even though we argued above that these two projective families associated with $\nu$ capture two different kinds of causal aspects, this question remains to be further pursued. One possible direction would be the analysis of the context-dependent traces of $\mathcal{A}^\nu$, that is the family of $\mathrm{tr}_{M,\bar{x}}(\mathcal{A}^\nu)$, $\bar{x} \in X_{N \setminus M}$. Instead of conditioning with respect to the join

$$\mathrm{tr}_M(\mathcal{A}^\nu) = \bigvee_{\bar{x} \in X_{N \setminus M}} \mathrm{tr}_{M,\bar{x}}(\mathcal{A}^\nu),$$

one could adjust the conditioning to the individual σ-algebras $\mathrm{tr}_{M,\bar{x}}(\mathcal{A}^\nu)$. This would represent an important refinement of the presented theory.

Acknowledgements The author is grateful for valuable comments of two anonymous reviewers. He acknowledges the support of the Deutsche Forschungsgemeinschaft Priority Programme "The Active Self" (SPP 2134).

Funding Open Access funding enabled and organized by Projekt DEAL.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Amari, S.-i.: Information Geometry and its Applications. Applied Mathematical Sciences, vol. 194. Springer, Tokyo (2016)

2. Amari, S.-i., Nagaoka, H.: Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. Oxford University Press, Oxford (2000)

3. Ay, N., Amari, S.-i.: A novel approach to canonical divergences within information geometry. Entropy 17, 8111–8129 (2015)

4. Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information Geometry. A Series of Modern Surveys in Mathematics, vol. 64. Springer International Publishing, New York (2017)

5. Ay, N., Krakauer, D.C.: Geometric robustness theory and biological networks. Theory Biosci. 125, 93–121 (2007)

6. Ay, N., Polani, D.: Information flows in causal networks. Adv. Complex Syst. 11, 17–41 (2008)

7. Ay, N., Polani, D., Virgo, N.: Information decomposition based on cooperative game theory. Kybernetika 56, 979–1014 (2020)

8. Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., Ay, N.: Quantifying unique information. Entropy 16, 2161–2183 (2014)

9. Bossomaier, T., Barnett, L., Harré, M., Lizier, J.T.: An Introduction to Transfer Entropy. Springer International Publishing (2016)

10. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley-Interscience, Hoboken (2006)

11. Dudley, R.M.: Real Analysis and Probability, 2nd edn. Cambridge Studies in Advanced Mathematics, vol. 74. Cambridge University Press, Cambridge (2002)

12. Granger, C.W.J.: Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969)

13. Granger, C.W.J.: Testing for causality: A personal viewpoint. J. Econ. Dyn. Control 2, 329–352 (1980)

14. Janzing, D., Balduzzi, D., Grosse-Wentrup, M., Schölkopf, B.: Quantifying causal influences. Ann. Stat. 41, 2324–2358 (2013)

15. Kakihara, Y.: Abstract Methods in Information Theory. Multivariate Analysis, vol. 4. World Scientific, New Jersey (1999)

16. Lizier, J., Bertschinger, N., Jost, J., Wibral, M.: Information decomposition of target effects from multi-source interactions: Perspectives on previous, current and future work. Entropy 20, 307 (2018)

17. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2000)

18. Peters, J., Janzing, D., Schölkopf, B.: Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge (2017)

19. Schreiber, T.: Measuring information transfer. Phys. Rev. Lett. 85, 461–464 (2000)

20. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656 (1948)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.