Methods for Diagnosis and Interpretation of Stochastic Actor-oriented Models for Dynamic Networks

Methods for Diagnosis and Interpretation of Stochastic Actor-oriented Models for Dynamic Networks

Dissertation submitted for the academic degree of Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by Natalie Indlekofer to the Mathematisch-Naturwissenschaftliche Sektion, Fachbereich Informatik & Informationswissenschaft

Date of oral examination: 4 February 2014
First referee: Ulrik Brandes
Second referee: Tom A.B. Snijders

Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-274691


German Summary (Deutsche Zusammenfassung)

The term social network is ubiquitous today. In this dissertation, it refers to a system consisting of a group of social actors and relations between them. Examples are friendships within a school class or trade agreements between a group of nations. Intuitively, one may suspect that relations in social networks depend on one another. Social network analysis studies this dependence structure between relations in social networks. This thesis considers social networks that evolve over time; the interest is in how future changes in the network depend on the current network structure. The central element is a statistical model for actor-based modeling of dynamic network data. It was introduced by Snijders (1996) and is commonly referred to as the stochastic actor-oriented model (SAOM). The SAOM was developed for the statistical analysis of network panel data and is the most widely used analysis method for this data format. The aim of the modeling is to identify local network-specific mechanisms, so-called network effects, that govern the evolution of the network between consecutive observation moments. The evolution of the whole network is described as a sequence of changes of single relations and modeled as a parameterized continuous-time Markov chain. The SAOM represents an actor-based perspective by modeling relational changes as the outcomes of individual actor decisions. The goal of this dissertation is to provide methods supporting the actor-based modeling of dynamic networks, with two thematic foci:

1. The SAOM is one of the few network models suitable for statistical inference. However, the complexity of the model makes the inferred results difficult to interpret. Current analyses are limited to hypothesis tests establishing the statistical significance of the effects under study. The strength of individual effects, in contrast, remains largely unconsidered, since no established measure of relative effect strength exists for SAOMs. Such a measure is introduced in this thesis. It is based on the relative influence of effects on actor decisions and takes into account the values of the model parameters as well as the analyzed data and the complete model specification. It therefore allows comparisons of relative effect strengths within a model as well as across different models and different data sets. This comparability facilitates the interpretation and communication of inferential results.

2. To ensure statistical tractability of the model, several simplifying assumptions are made, among them a homogeneity assumption on actor behavior which implies that local network mechanisms apply to all actors in the same way. In many cases this generalization appears unjustified and is a potential source of distorted results. Diagnostic methods for validating this homogeneity assumption are introduced. If the validation fails, diagnostic measures support the evaluation of existing inhomogeneities and the identification of individual outliers or groups of actors whose behavior deviates from the behavior expected by the model. A first method finds deviating actors by analyzing the agreement between observed changes in the local network configurations of individual actors and the changes captured by the model. A second method serves specifically to identify actors whose deviating behavior strongly influences the parameter estimation. Finding such influential actors is particularly important because their influence can lead to distorted results. The proposed diagnostic measures approximate the sensitivity of the estimation results to local perturbations of the network data used for estimation.

The thesis is divided into four chapters: Chapter 1 gives an overview of analytic concepts and models of statistical network analysis. Chapter 2 introduces the central model, the stochastic actor-oriented model (SAOM). Chapter 3 contains the derivation of a measure of relative effect strength in SAOMs, and Chapter 4 is concerned with the development of diagnostic methods for evaluating homogeneity assumptions on actor behavior in the network. The thesis closes with a discussion of the results and possible future research questions.

Acknowledgements

First and foremost, I would like to thank my advisor Ulrik Brandes for his guidance, his constant support, and for everything I have learned from him. I have benefited greatly from his wide knowledge and his inspiring enthusiasm for research, and I am very grateful for the many opportunities he opened up to me. The attendance at several scientific meetings all over the world, a research visit lasting several months, and the flexibility to work partly from Stuttgart are only some examples. I am extremely grateful to Tom Snijders for being my second referee and for inviting me as a research visitor. The months in Oxford were a great experience and crucial for large parts of my research. Thank you very much for the many motivating and fruitful discussions and ideas. I also want to thank Viviana Amati and Christian Steglich for helpful discussions and important comments on my work, as well as Sven Kosub and Marc Scholl for joining my examination committee. Many thanks go to all my colleagues from the algorithmics group and all the people from E2 for their support, collaboration, and friendship, and for making my years in Konstanz very special. I am deeply grateful to my family, most of all my parents Katja and Markus Indlekofer and my brother Julian, for their stable and absolute support through all these years. Finally, I want to thank my wonderful partner Thomas Mundinger for all the love and strength he gave me. Without his encouragement, his endless patience, and his unfailing support, this dissertation would not have been possible. Thank you!

Natalie Indlekofer
February 2014


Contents

Introduction
1  Statistical Analysis of Network Data
   1.1  Basic Concepts and Terminology
   1.2  Typical Dependencies in Networks
   1.3  Statistical Modeling of Social Networks
        1.3.1  Statistical Inference in Empirical Research
        1.3.2  Models for Cross-sectional Network Data
   1.4  Statistical Models for Network Panel Data
        1.4.1  Continuous-time Markov Processes
        1.4.2  Dyad-Independence Models
2  Stochastic Actor-oriented Models
   2.1  Formulation of the Model
        2.1.1  Model Assumptions
        2.1.2  Actor Decisions
        2.1.3  Timing of Network Changes
        2.1.4  Model Specification
   2.2  Parameter Estimation
        2.2.1  Simulation
        2.2.2  Method of Moments
   2.3  Model Selection
        2.3.1  Significance Tests

3  Relative Importance of Effects in SAOMs
   3.1  Relative Importance in Regression Models
   3.2  How to Define Relative Importance in a SAOM?
        3.2.1  How to Assess Relative Importance in a Micro-step?
        3.2.2  Main Difficulties
   3.3  A Measure of Relative Importance
        3.3.1  Relative Impact on Actor Decisions
        3.3.2  Relative Impact on Network Evolution
        3.3.3  Computational Costs
   3.4  Application to a Network of University Freshmen
        3.4.1  Model Specification and Estimation Results
        3.4.2  Relative Importance at Observation Moments
        3.4.3  Dynamics of Relative Importance
   3.5  Discussion
4  Discovering Actor Inhomogeneity in SAOMs
   4.1  Example Data: Friendships in a School Class
   4.2  Detecting Outliers and Deviating Groups
        4.2.1  Local Fit of Individual Actors
        4.2.2  Visual Exploration of Local Fit
   4.3  Influence of Individuals on Estimates
        4.3.1  Deletion of Single Cases to Assess their Influence
        4.3.2  What is a Single Case in a SAOM?
   4.4  Incorporate Perturbations into Estimating Functions
        4.4.1  Discrete Perturbations for Single Influential Actors
        4.4.2  Continuous Perturbations for Groups of Influential Actors
   4.5  Sensitivity of Estimates towards Local Perturbations
   4.6  Directions of Maximum Sensitivity
        4.6.1  Sensitivity of Single Parameters
        4.6.2  Aggregated Sensitivity of All Parameters
   4.7  Discussion

Conclusion
Bibliography


Introduction

The notion of social network is omnipresent in daily life. In this dissertation, it describes a system consisting of a set of social actors and relations between them. Examples are collaborations between scientists, friendships within a school class, or trade agreements between nations. Social network analysis studies the joint structure of relations in a social network. Intuitively, those relations depend on each other, and indeed, understanding the dependence structure of relations in network data is one of the essential research interests in social network studies. The networks considered in this dissertation are, like most social networks, evolving over time. Relations emerge, disappear, or change. The interest is in the specific dependencies of future changes on the present network structure. However, traditional methods of statistical data analysis rely on independent units, so the statistical analysis of social network data requires specially designed statistical concepts and models.

The central element in this book is the stochastic actor-oriented model (SAOM) for dynamic social networks introduced by Snijders (1996). It is designed for the statistical analysis of network panel data, consisting of two or more observations of a network at discrete moments in time, and is the most prominent model for data of this type. The aim is to detect local formation rules, called network effects, that govern the unobserved evolution process between consecutive network observations. Thereby, the evolution is assumed to be a sequence of single relational changes and is modeled as a parameterized continuous-time Markov process. The SAOM postulates an actor-oriented perspective by modeling relational changes as consequences of decisions taken by individual actors. This dissertation aims at providing methods to support the analysis of dynamic networks with stochastic actor-oriented models. The focus is on two aspects:

1. The SAOM is one of the few network models that allow for statistical inference. However, its complexity makes the interpretation of inferred results difficult. Existing analyses focus on statistical significance tests, indicating whether an effect exists or not, while the importance of effects is usually ignored. Indeed, there is no

established approach or measure to assess the relative importance of effects in a SAOM. We introduce such a measure. It is based on the influence of effects on decisions of individual actors and takes parameter sizes as well as the analyzed data and the complete model specification into account. Therefore, it can be used to compare the relative importance of effects within a model, among different models, and for different data sets. The comparability of values of the proposed measure facilitates interpretation and communication of inferred results and enriches them by an additional dimension.

2. In order to keep the model statistically tractable, several assumptions are made. One is the assumption of actor homogeneity, implying that tie formation rules apply to all actors identically, except for differences that can be represented by observed attributes. This assumption seems to be unjustified in many cases and may lead to unreliable results. However, allowing individual rules for all actors would inflate the model and obstruct the purpose of finding general regularities of the network evolution. We introduce diagnostic methods for confirming the assumption of actor homogeneity or, failing that, for evaluating the severity of existing inhomogeneity and identifying those actors to which the generalized rules do not apply. The first approach discovers deviating actors by analyzing how well the model captures observed changes in their local configurations. Therefore, it is also suitable for evaluating the quality of the model in terms of local fit. The second approach discovers deviating actors with strong influence on the estimation results. Finding influential actors is especially important as their impact could distort the model results. The proposed diagnostics approximate the sensitivity of estimates towards local perturbations of the estimation procedure.
In addition to quantitative diagnostics, tailored visualizations are proposed to facilitate the exploration of potential inhomogeneities across actors.

The book is structured into four chapters: Chapter 1 provides an overview of analytic concepts related to the statistical analysis of network data. After introducing basic definitions and terminology, we give an overview of typical structural configurations and dependencies in social networks, with a focus on those that are relevant for later chapters. As readers are not expected to have a background in statistics or any empirical science, a brief outline of the general ideas underlying statistical inference and modeling is provided. On this basis, examples of statistical models for cross-sectional social network data as well as basic models for longitudinal network panel data are presented. The latter are, like the SAOM, continuous-time Markov processes.

Chapter 2 introduces the stochastic actor-oriented model and related concepts that will be relevant in later chapters. The model is formulated, and its main assumptions are stated and discussed. Moreover, the most frequently used method for estimating the parameters of a SAOM, the method of moments (Snijders, 1996), is detailed, and available tests for the statistical significance of estimates are presented.

Chapter 3 develops a measure of relative importance of effects in stochastic actor-oriented models.¹ After a brief overview of corresponding approaches in regression models, we discuss the difficulties encountered when trying to interpret model parameters or to assess the relative importance of effects in a SAOM. We argue that a measure of relative importance has to be based on the influence of effects on actor decisions, propose such a measure, and demonstrate its utility on empirical data. The chapter is concluded by a discussion of the properties, benefits, and limitations of the proposed measure as well as possible modifications.

Chapter 4 develops diagnostic methods for the exploration of inhomogeneities across actors and, in particular, for the detection of influential actors, i.e., outliers with a special impact on estimation results. The first part of the chapter proposes methods for the detection of deviating actors or groups of actors that are not well represented by the model² and presents a possible way to improve the local fit by defining dummy variables for a distinguished treatment of distinct actors. The more problematic detection of influential actors is treated in the second part of the chapter, where special diagnostics based on the sensitivity of estimates with respect to small perturbations are derived.

The book is concluded by a summary of the findings and a discussion of necessary and possible future work.

¹ Parts of this chapter are published in Indlekofer and Brandes (2013).
² Parts of this chapter are published in Brandes, Indlekofer, and Mader (2012).


Chapter 1

Statistical Analysis of Network Data

Social networks describe systems of relations between social actors. Examples are relations of friendship among students in a class, collaborations between scientists, business transactions among companies of a certain economic sector, or trade agreements between nations. When analyzing social network data, the units of analysis are the relations, particularly the joint structure they constitute, rather than the individual actors. This system-level perspective is a distinctive characteristic delimiting network analysis from standard social science studies, which are individual-level analyses studying mutually independent social entities. In such studies, variables of interest are typically attributes of the studied entities, and the objective is to investigate associations or causal dependencies between these attributes. Pertinent data comprises a random sample of the studied entities with their values of the investigated attribute variables. For instance, a researcher might be interested in the determinants leading to differences in the incomes of employees. The entities under study are then persons in employment, and considered variables might be annual income, age, gender, and years of education. Corresponding data might reveal associations between these variables: annual incomes of females in the given sample may tend to be lower than those of males, but relative differences may decrease with years of education. It is thereby characteristic that the analyzed data is monadic in the sense that the studied entities are regarded as independent autonomous units.

In contrast to monadic data, dyadic data consists of pairs of entities. A dyad is made up of its two members and the relations between them. An example is married couples. Variables of interest can be monadic attributes of the members, such as gender or age, dyadic attributes of the pair, such as the number of joint children or the age difference between spouses, and attributes of the relations, such as the duration of the marriage. Although consisting of two related entities, the dyad as a whole can still be regarded as an

independent autonomous unit. However, when analyzing dyadic data, a particular interest is in dependencies between attributes of the members or the pair and attributes of the relations between them. This additional level of analysis distinguishes dyadic data from monadic data.

There is, however, another more substantive difference. With dyads it is a natural extension to regard social entities as members of more than one dyad. An example is friendship, where a person is typically a member of several dyads. We say that two dyads are incident if they share a member. From this perspective, network data consists of incident dyads among a set of social actors, possibly enriched by monadic or dyadic attributes. In most applications it is improper to regard incident dyads as mutually independent, so the incidence structure of dyads implies a dependence structure between potential relations in a network. The assumption of independent autonomous units is, therefore, not maintained for network data. On the contrary, existing dependencies are not a nuisance; more often than not they constitute the actual research interest (Brandes, Robins, McCranie, and Wasserman, 2013, p. 8). Indeed, exactly those dependencies and the consequent patterned structure are the essence of network data, and taking them into account is the defining aspect distinguishing network analysis from other approaches.

Figure 1.1: Monadic data refers to mutually independent entities; dyadic data refers to pairs of entities, the pairs being mutually independent; network data refers to a system of incident pairs that may depend on each other. (Panels: (a) monadic data, (b) dyadic data, (c) network data.)

However, most methods of traditional statistics and data analysis rely on independent units. As a consequence, network analysis requires specially designed methods and analytic concepts, some of which will be presented in the following.
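The distinction between monadic, dyadic, and network data can be made concrete with a small sketch. This is an illustration only (plain Python, invented toy values; the thesis contains no code):

```python
# Monadic data: independent entities, each carrying attribute values.
monadic = [
    {"id": 1, "income": 42000, "gender": "f", "education_years": 16},
    {"id": 2, "income": 39000, "gender": "m", "education_years": 12},
]

# Dyadic data: mutually independent pairs; attributes may belong to the
# members, to the pair, or to the relation between the members.
dyadic = [
    {"members": (1, 2), "age_difference": 3, "marriage_years": 10},
    {"members": (3, 4), "age_difference": 1, "marriage_years": 2},
]

# Network data: dyads over a common actor set; two dyads are incident
# if they share a member, which induces dependence between relations.
actors = {1, 2, 3}
ties = {(1, 2), (2, 3), (3, 1)}  # directed ties among the actors


def incident(d1, d2):
    """Two dyads are incident if they share an actor."""
    return bool(set(d1) & set(d2))


print(incident((1, 2), (2, 3)))  # True: the dyads share actor 2
print(incident((1, 2), (3, 4)))  # False: disjoint actor sets
```

The helper `incident` and the toy records are hypothetical names introduced here for illustration; the point is only that network data adds an incidence structure that monadic and dyadic data lack.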
An extensive introduction to basic methodological concepts of social network analysis is provided by Wasserman and Faust (1994), which can be regarded as a standard reference in this field. More recent collections of articles by established social network scientists are given by Carrington, Scott, and Wasserman (2005), Freeman (2008), and Scott and Carrington (2011). A rather practical guide to the application of social network analysis methods is Henning, Brandes, Pfeffer, and Mergel (2012), providing additional chapters on survey design and visualization, whereas Brandes and Erlebach (2005) approach the topic from an algorithmic and graph-theoretic perspective.

The following chapter begins with an introduction of basic concepts and notation in Section 1.1, followed in Section 1.2 by an overview of dependencies and resulting structural configurations that occur frequently in empirical network data. Section 1.3 introduces statistical models for social networks, starting with a brief outline of the general ideas underlying statistical modeling in Section 1.3.1, followed by examples of models for cross-sectional social network data in Section 1.3.2. Section 1.4 is about models for network panel data based on continuous-time Markov processes. Necessary mathematical elements are reviewed in Section 1.4.1 before examples of basic continuous-time Markov models for longitudinal network data are presented in Section 1.4.2.

1.1 Basic Concepts and Terminology

An intuitively accessible data structure that is traditionally used for the representation of dyadic relations is a graph.

Definition 1.1 (Graph). A (directed) graph is a pair G = (V, E) consisting of a finite set V = {1, ..., N}, the vertex set, and a binary relation E ⊆ V × V on V, the edge set. G is a symmetric graph if E is symmetric, i.e., if for all i, j ∈ V: (i, j) ∈ E ⇐⇒ (j, i) ∈ E.

According to Definition 1.1, a directed graph G = (V, E) consists of a vertex set V and a set E of ordered pairs of vertices. Analogously, an undirected graph is defined as a pair Ḡ = (V, Ē) that comprises a vertex set V and a set Ē of unordered pairs of vertices, i.e., of two-element subsets of V. Note that each undirected graph Ḡ = (V, Ē) implies a symmetric directed graph G⃗ = (V, E⃗) (and vice versa) by

    {i, j} ∈ Ē ⇐⇒ (i, j) ∈ E⃗ ∧ (j, i) ∈ E⃗.

A graph representation of a network structure identifies each actor with a vertex and each relation with an edge between corresponding vertices. Some relations, like collaboration or co-authorship, are symmetric by nature, so an undirected graph is suitable to represent such networks. If relations are not necessarily symmetric, like friendship, trust, or advice seeking, corresponding networks are represented by

a directed graph. Due to its origins in mathematical graph theory, the term graph usually refers to undirected graphs. However, as each undirected graph implies a symmetric directed graph, we will use the term graph in this book for the more general class of directed graphs.

A graph G = (V, E) can be stored as a list of the elements in V and E. An often more convenient representation is the adjacency matrix of G.

Definition 1.2 (Adjacency matrix). Let G = (V, E) be a graph. The adjacency matrix (x_ij)_{1 ≤ i,j ≤ |V|} of G is defined by

    x_ij = 1 if (i, j) ∈ E,   and   x_ij = 0 if (i, j) ∉ E.

A convenient graphical representation of a graph is a node-link diagram as illustrated in Figure 1.2a; Figure 1.2b shows the corresponding adjacency matrix.

Figure 1.2: Graph representations. (a) Node-link diagram; (b) Adjacency matrix.

The expression adjacency matrix of a network refers to the adjacency matrix of a graph that represents the structure of the network. If there is no determined order of actors in the network, the adjacency matrix is not unique, as permutations of rows and columns are possible. For convenience, we assume throughout this book that the actors in a network are labeled by consecutive numbers, thus represented by a set of integers V = {1, ..., N}, and we identify the network structure with the corresponding graph on vertex set V or with the associated sociomatrix x. Accordingly, depending on the context, terms like network structure, network, and graph, as well as actor, vertex, or node, and relation, tie, edge, or link, will be used interchangeably.
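Definition 1.2 translates directly into code. The following is a minimal sketch (plain Python; the edge list is invented for illustration, and the thesis prescribes no implementation):

```python
def adjacency_matrix(n, edges):
    """Build the adjacency matrix of a directed graph G = (V, E) with
    V = {1, ..., n}: x[i-1][j-1] = 1 iff (i, j) is in E, else 0."""
    x = [[0] * n for _ in range(n)]
    for i, j in edges:
        x[i - 1][j - 1] = 1
    return x


def is_symmetric(x):
    """G is symmetric iff (i, j) in E <=> (j, i) in E for all i, j."""
    n = len(x)
    return all(x[i][j] == x[j][i] for i in range(n) for j in range(n))


edges = [(1, 2), (2, 1), (2, 3)]  # a small directed graph on 3 vertices
x = adjacency_matrix(3, edges)
print(x)                # [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
print(is_symmetric(x))  # False: (2, 3) is not reciprocated by (3, 1)... i.e. (3, 2) is absent
```

Storing the graph as a matrix rather than an edge list makes tie lookups constant-time, which is why the matrix form is often the more convenient representation mentioned in the text.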

We further postulate that relations in a social network exist only between distinct actors. Thus, self-ties are excluded, and in a graph G = (V, E) of a social network, (i, i) ∉ E for each i ∈ V, i.e., E is an irreflexive binary relation. Such graphs are called loop-free, and the diagonal values of their adjacency matrices satisfy x_ii = 0 for each i ∈ V. In the context of social networks, adjacency matrices with this property are referred to as sociomatrices.

Definition 1.3 (Sociomatrix). A sociomatrix is an adjacency matrix x of a social network. In particular, there is an N ≥ 1 with

    x ∈ X_N := {a ∈ {0, 1}^{N×N} | a_ii = 0, 1 ≤ i ≤ N}.

Obviously, X_N is isomorphic to the set of all directed loop-free labeled graphs with N vertices. The following elements and subgraph configurations are of special importance in social network analysis.

Definition 1.4 (Tie variable, dyad, triad). Let x ∈ X_N be a sociomatrix and let 1 ≤ i ≠ j ≠ k ≤ N denote actors of the associated network. The entry x_ij of x is referred to as a tie variable; its possible states are {0, 1}. The pair of actors (i, j) together with the ordered pair of tie variables (x_ij, x_ji) is referred to as a dyad; possible states of a dyad are {(0,0), (0,1), (1,0), (1,1)}. The triple of actors (i, j, k) together with the ordered set of ordered pairs of tie variables ((x_ij, x_ji), (x_jk, x_kj), (x_ki, x_ik)) is referred to as a triad; possible states of a triad are {(d1, d2, d3) | d1, d2, d3 ∈ {(0,0), (0,1), (1,0), (1,1)}}.

The concepts introduced so far are suitable to represent the relational structure of a network. In addition to the structural information, social networks may also comprise information on monadic or dyadic attributes.
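Definitions 1.3 and 1.4 can be checked and read off mechanically. A short illustrative sketch (plain Python, invented example matrix; function names are hypothetical):

```python
def is_sociomatrix(x):
    """x is in X_N: a binary N x N matrix with zero diagonal (loop-free)."""
    n = len(x)
    return all(len(row) == n for row in x) and all(
        x[i][j] in (0, 1) and (i != j or x[i][j] == 0)
        for i in range(n)
        for j in range(n)
    )


def dyad_state(x, i, j):
    """State (x_ij, x_ji) of the dyad (i, j); actors are labeled 1..N
    as in the text, so indices are shifted by one."""
    return (x[i - 1][j - 1], x[j - 1][i - 1])


x = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
print(is_sociomatrix(x))    # True: binary entries, zero diagonal
print(dyad_state(x, 1, 2))  # (1, 1): a mutual (reciprocated) dyad
print(dyad_state(x, 2, 3))  # (1, 0): an asymmetric dyad
print(dyad_state(x, 1, 3))  # (0, 0): a null dyad
```

A triad state would analogously be the triple of the three dyad states for (i, j), (j, k), and (k, i), which gives the 4³ = 64 labeled states implied by Definition 1.4.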

Definition 1.5 (Actor covariate, dyad covariate). Given a network on a set of actors {1, ..., N} with sociomatrix x ∈ X_N, an actor covariate is a vector-valued variable v ∈ R^N assigning to each actor i an attribute value v_i. A dyad covariate is a matrix-valued variable w ∈ R^{N×N} assigning to each tie variable x_ij an attribute value w_ij. (Note that the term dyad covariate may be misleading, as a dyad covariate assigns values to directed pairs of actors rather than to dyads in the sense of Definition 1.4.)

To allow for concise notation, explicit references to potential covariates will usually be omitted. Instead, x shall refer implicitly to the complete network data, encompassing the underlying graph together with potential covariates.

Definitions 1.1 and 1.3 of graphs and sociomatrices allow only dichotomous tie variables, but situations where valued relations might be more appropriate are clearly conceivable. Two persons may know each other, like each other, be friends, or be best friends, and distinguishing between these categories may certainly be informative. Differences between types of network relations may even be essential, as, e.g., in the case of diplomatic relationships between countries, which may be alliances or hostilities. In principle, such differences can be represented directly by extending the range of tie variables accordingly. However, throughout this book tie variables will be solely dichotomous, indicating whether a relation exists or not, and corresponding networks will be represented by sociomatrices x ∈ X_N.

Further note that in most networks, as well as in the above definitions, actors are regarded as entities of the same type, and each actor can be linked with each other actor. Such networks are denoted as one-mode networks. If the set of actors decomposes into two distinct groups and relations run only between actors of different groups, not within groups, the network is denoted as a two-mode network. Being an author of a scientific article is an example of a relation that defines a two-mode network between two actor groups, authors and articles. Since the following chapters treat only one-mode networks, the term network will always refer to one-mode networks throughout this book.

1.2 Typical Dependencies in Networks

The heart of network science is dependence, both between and within variables (Brandes et al., 2013, p. 6). Several characteristic dependencies are theoretically

substantiated and have been found in empirical studies, revealing that empirical social networks exhibit remarkable deviations from randomness. In particular, the high frequency of certain structural patterns is not explainable under the assumption of mutually independent and randomly evolving ties. However, many of those unexpected patterns can be rather well explained by specific structural mechanisms that have been studied for years by social network theorists.

Sparseness: A basic characteristic of social networks is the tendency to be sparse. Accordingly, they contain considerably fewer relations than one would expect in a network where the probability of an existing relation between two arbitrary actors is 50%, and the outdegrees, i.e., the numbers of outgoing ties, of individual actors tend to be much lower than the possible maximum of N − 1.

Reciprocity: A key feature of many empirical social networks is that relations tend to be reciprocated. A relation from i to j is more likely if there is also a relation from j to i; thus, tie variables x_ij and x_ji depend on each other. While asymmetric relations are also frequently encountered, the number of symmetric relations noticeably exceeds, in most social networks, the expectation under the assumption of mutually independent ties. The occurrence of reciprocity has been studied from the beginnings of social network analysis, evaluated already in Moreno (1934) and Moreno and Jennings (1938), and is well founded in sociological theory (cf. Heider, 1958). The degree of reciprocity depends, of course, on the type of relation. While assuming a high degree of reciprocation is most appropriate when actors are individuals and relations are positive affect, it is pointless for networks with relations that are symmetric or asymmetric by definition, particularly for undirected networks. An overview of the concept and related methods is given in Wasserman and Faust (1994, Chapter 13).
Measures of reciprocity in network data have been proposed, for instance, by Katz and Powell (1955), and a review of such measures is given in Rao and Bandyopadhyay (1987). Holland and Leinhardt (1975) were among the first to propose statistical models for networks that take reciprocity into account.

Homophily: According to the saying birds of a feather flock together, relations are often encountered between similar actors (McPherson, Smith-Lovin, and Cook, 2001; Lazarsfeld and Merton, 1954). This may refer to similarity in terms of actor covariates but also in terms of structural positions in the network and relations to other actors. For instance, Mazur (1971, p. 308) advanced the theoretical statement "Friends are likely to agree, and unlikely to disagree; close friends are very likely to agree, and very unlikely to disagree", which has been, with respect to agreement about third parties, empirically supported by Davis, Holland, and Leinhardt (1971) and Holland and Leinhardt (1975). Note that observed similarity of related actors may indeed be explained by homophily, i.e., by the idea that actors prefer relations to similar actors. However, it may also be explained by social influence, i.e., by the idea that related actors influence each other in such a way that they gradually assimilate (Steglich, Snijders, and Pearson, 2010; Knecht, Snijders, Baerveldt, Steglich, and Raub, 2010).

Transitivity: Interpersonal choices tend to be transitive: if actor i chooses actor j, and actor j chooses actor k, then i is likely to choose k (Davis et al., 1971). According to the expression a friend of my friend is my friend, a tendency towards transitive relations is characteristic of most social networks. A triad involving actors i, j, and k is referred to as a transitive triad if it contains a two-path, say i → j → k, that is abridged by the direct relation i → k. In this case, i → k is a transitive tie. Regarding the indirect relation as being closed by the transitive tie, this phenomenon is also referred to as triadic closure. In most empirical social networks the number of transitive triads is indeed higher than expected under the assumption that tie variables in a triad are mutually independent. Empirical support for transitivity as an important structural tendency in social networks has been found, for instance, by Davis (1970) and Holland and Leinhardt (1972). The concept of transitivity dates back to Georg Simmel and is firmly established in sociology (Coleman, 1990; Simmel, 1950 [1917]). In the context of networks it was studied already by Rapoport (1953a,b). Holland and Leinhardt (1971) proposed transitivity in networks as a generalization of the concept of structural balance (Heider, 1946; Cartwright and Harary, 1956).
Clustering: In a network with tendencies towards reciprocity and transitive closure, these local-level mechanisms can lead to the global-level emergence of cohesive subgroups of actors with a high relative frequency of relations within subgroups compared to relations between subgroups. Such cohesive subgroups are referred to as clusters and the segmentation into clusters as clustering.[2] For most empirical networks, visualizations reveal a tendency towards a clustered structure. However, the segmentation is usually not perfect, resulting in various similarly plausible clusterings. It is an active area of research to develop formal mathematical definitions that are able to capture intuitive and theoretical properties of cohesive subgroups in empirical networks.

Besides these examples, there are numerous other local and global structural properties, like hierarchy, assortativity, scale-free degree distributions, etc., that result from characteristic dependencies in social networks. However, as they are less important for the concepts in later chapters, they are not discussed further.

[2] Note that observed transitivity may indeed indicate a tendency towards clustering. However, it may also point to a hierarchical structure.

1.3. Statistical Modeling of Social Networks

The complicated dependence structure between network relations makes the statistical modeling of social networks difficult. Plausible models require the incorporation of typical network dependencies such as reciprocity, transitivity, or homophily, which precludes the validity and direct applicability of most traditional statistical methods for inference, as these are usually built on the assumption of independent units of observation. Therefore, tailored statistical methods and models for social networks have been proposed.

Before concrete network models are presented in Section 1.3.2, Section 1.3.1 briefly sketches the typical way of proceeding and the underlying reasoning when applying methods of statistical inference, particularly statistical models, to investigate empirical research questions. This is provided to enable readers not familiar with this scientific area to follow the reasoning of later chapters. It is by no means a thorough introduction. Instead, the interested reader is referred to the wide range of textbooks on this subject; Mood, Graybill, and Boes (1974), Montgomery and Runger (2002), Casella and Berger (2002), Lehmann and Romano (2005), and Young and Smith (2010) are exemplary references with varying emphases and levels of detail.

1.3.1. Statistical Inference in Empirical Research

Empirical research relies on data. Instead of reasoning from a mathematical abstraction of a general law to specific cases, as for instance in theoretical physics, empirical research generalizes from a specific set of measurements, the data, to general cases. Thereby, the conceptual set of all cases is denoted as the population and a subset of concrete cases for which data is available as a sample. The process of systematically reasoning from a sample to a population is referred to as statistical inference.
The aim is to learn from the data, in particular, to infer conclusions that lead to general statements beyond the available data sample. Such conclusions might refer to cases similar to those in the observed sample or to future properties of the observed cases. In either instance, drawing conclusions about all cases from observations of some cases is likely to result in errors. Methods of statistical inference try to identify generalizable regularities in the observed data but also to quantify the risk of errors. In empirical social research the interest is often in explaining the causal dependencies between attributes of social actors and observed processes or structures these actors are involved in. Accordingly, a typical sample consists of data available for several, in some sense representative, social actors, which is used to support or refute hypotheses derived from sociological theories regarding the corresponding population. A typical quantitative social research study is structured as follows:

1. Formulate the research question.
2. Construct a theory that provides answers to the research question by explaining causal dependencies between variables.
3. Derive concrete hypotheses from the theory and formulate consequences that are testable by means of data.
4. Design and perform the data collection.
5. Apply statistical methods to test hypotheses with the data.
6. Use diagnostics to validate and interpret results.

To conduct Step 5 and Step 6, methods of statistical inference are employed. Particularly important means are statistical models.

1.3.1.1. Statistical Models

A statistical model is a mathematical formulation that describes a phenomenon by describing the relation between variables associated with this phenomenon. Thereby, relations are allowed to be partly stochastic in the sense of being composed of a deterministic part, the regularity, and a random part, the variation. The aim is to explain the regularity while being able to quantify the variation. The specification of a model is usually a simplification of the problem based on contextual assumptions on the deterministic as well as the random part. To allow for degrees of freedom, parametrized formulations are common. In mathematical terms, a statistical model can be regarded as a probability space (Ω, P) assigning to each possible outcome ω, i.e., to each possible instance of the investigated phenomenon, a probability P(ω). Accordingly, a parametrized model defines a family of probability distributions (Pθ)θ∈Θ on Ω.

If data on the studied phenomenon is available, the purpose of a statistical model may be described as explaining the data-generating process such that, in a reasonable model, observing the given data should be quite probable. Therefore, from a family of models the element that fits the given data best will be chosen.
This means, in practice, that model parameters θ̂ are determined such that the probability of the observed data is as high as possible in the associated model. Methods of statistical inference then enable us to make quantitative statements about the plausibility of the chosen model. If a model is validated by the observed data, it can be used to understand, describe, and quantify important aspects of the studied phenomenon and, in some cases, even to predict outcomes in slightly modified situations.
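The idea of picking, from a family (Pθ)θ∈Θ, the member under which the observed data is most probable can be illustrated with a deliberately simple, hypothetical family: a sample of independent binary observations modeled as Bernoulli(θ) draws, with θ̂ found by a crude grid search (the data and grid are made up for illustration):

```python
import numpy as np

# Hypothetical sample: 10 binary observations (e.g., coin flips).
z = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])

# Family of models (P_theta): independent Bernoulli(theta) draws, so the
# probability of the whole sample factorizes over the observations.
def likelihood(theta, z):
    return theta ** z.sum() * (1 - theta) ** (len(z) - z.sum())

# Choose the family member under which the observed sample is most probable.
grid = np.linspace(0.01, 0.99, 99)
theta_hat = grid[np.argmax([likelihood(t, z) for t in grid])]
print(theta_hat)  # close to the sample mean 0.7
```

In this toy case the grid search simply recovers the sample mean, which is the known closed-form maximizer for the Bernoulli family.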

1.3.1.2. Example: Linear Regression

Regression models are in general designed for exploring the relation between two or more variables. Linear regression models are used if the relations are supposed to be linear. For simplicity, we consider only two variables, in which case the model is denoted as a simple linear regression model. To express the relation between the two variables, the model assumes that one variable, the dependent variable, usually denoted as Y, is determined by the other variable, the independent or explanatory variable, denoted as X.

Suppose we have a data sample of n observations revealing X-values x1, ..., xn and Y-values y1, ..., yn. In order to learn how the X-value of a case i determines its Y-value, the informative features of the data are not the separate values xi and yi but the pairs of variates (xi, yi). From this perspective, the process of interest is the mapping of X onto Y, which is assumed to be composed of a deterministic component in form of a linear function x ↦ β0 + β1 x and a random component in form of a stochastic error term ε such that

    yi = β0 + β1 xi + ε .

The error term ε accounts for unexplained variability and is supposed to be a random variable with expected value 0 and variance σ² that is independent of X. Therefore, given a value xi, the dependent variable Y is a random variable and the observed value yi is regarded as its outcome. Given X = x, the conditional expectation E(Y | X = x) of Y is β0 + β1 x and the actual value of Y is made up of this expected value plus the attained value of ε, thus,

    Y = β0 + β1 X + ε .    (1.1)

(1.1) is called the simple linear regression model and might be chosen based on theoretical knowledge of the relationship between X and Y or based on guesses resulting from visual exploration of the data.

1.3.1.3. Parameter Estimation
The unknown model parameters of a simple linear regression model are the intercept β0, the slope β1, and the error variance σ². To determine them such that the given sample is likely to be a realization of (1.1), a traditional approach is the method of least squares. It estimates parameters so as to minimize the sum of squared differences between the observed outcomes yi and the corresponding expected values E(Y | X = xi). Hence, the least squares estimators (β̂0, β̂1) of (β0, β1) minimize

    L(β0, β1) := Σ_{i=1}^{n} (β0 + β1 xi − yi)² .    (1.2)

They can be determined by finding the roots of the partial derivatives ∂L/∂β0 and ∂L/∂β1, which leads to

    β̂1 = ( (1/n) Σ_{i=1}^{n} xi yi − x̄ ȳ ) / ( (1/n) Σ_{i=1}^{n} xi² − x̄² )  and  β̂0 = ȳ − β̂1 x̄ ,    (1.3)

where x̄ and ȳ denote the means over all observed x- and y-values. Hence, the fitted regression line is Ŷ(x) = β̂0 + β̂1 x. The differences

    ei := yi − Ŷ(xi) ,    (1.4)

for 1 ≤ i ≤ n, describing the error of model predictions with respect to the i-th case, are denoted as residuals. It can be shown that the expected sum of squared residuals is E(Σ_{i=1}^{n} ei²) = (n − 2)σ², such that an estimator of σ² is given by

    σ̂² = Σ_{i=1}^{n} ei² / (n − 2) .

Regarding the current data sample as outcomes of the random variables X and Y, the estimates β̂0, β̂1, and σ̂ are themselves random variables. Therefore, it is reasonable to consider expected values and variances of estimates. For instance, based on the above results and some algebra, it can be shown that

    E[β̂1] = β1  and  Var[β̂1] = σ² / Σ_{i=1}^{n} (xi − x̄)² ,    (1.5)

which allows us to estimate a standard error s.e.(β̂1) as a measure of the accuracy of β̂1, taking the randomness of the underlying sample into account.

Note that the method of least squares is just one possible approach to estimating the parameters of a linear regression model. A further approach, maximum likelihood estimation, will be sketched briefly because it is a very general and widely used principle (see, e.g., Casella and Berger, 2002, 6.2). It is a direct formalization of the idea that the observed data should be likely in the model. Let (Ω, Pθ) with θ ∈ Θ ⊂ R^K be a family of models and let z ∈ Ω be a given sample. The function

    L : Θ → R ;  θ ↦ Pθ(z) ,

mapping a parameter vector θ to the probability of the sample z in the model defined by Pθ, is called the likelihood. A parameter θ̂ maximizing L, i.e., θ̂ = arg max_θ L(θ), is called a maximum likelihood estimator of θ.
Therefore, the regression line with the best fit is the one under which the given sample data is as probable as possible. To determine the likelihood of a set of pairs of variates according to a given regression model, some additional information on the error term ε is necessary, though. Usually, the assumption is made that ε is normally distributed with mean 0 and variance σ².
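To make the estimation formulas concrete, the following sketch fits a simple linear regression on synthetic data (all values are made up for illustration), computing β̂0 and β̂1 via the closed forms of (1.3), the residuals of (1.4), σ̂², and the standard error of the slope via (1.5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample generated from y = 2 + 3x + noise.
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, n)

# Least squares estimates, following the closed forms in (1.3).
beta1 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x)**2)
beta0 = np.mean(y) - beta1 * np.mean(x)

# Residuals (1.4) and the estimator of the error variance sigma^2.
e = y - (beta0 + beta1 * x)
sigma2 = np.sum(e**2) / (n - 2)

# Standard error of the slope, following (1.5).
se_beta1 = np.sqrt(sigma2 / np.sum((x - np.mean(x))**2))

print(beta0, beta1, se_beta1)  # estimates should be close to the true 2 and 3
```

Because the noise is small relative to the spread of x, the estimates land near the generating values, and se_beta1 quantifies how far repeated samples would typically scatter around β1.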

1.3.1.4. Hypothesis Testing

A statistical hypothesis is a statement about parameters of a statistical model. The process of deciding on the basis of empirical data whether a formulated hypothesis is true or not is a hypothesis test. Formally, a statistical hypothesis is expressed as a null hypothesis H0 together with a corresponding alternative hypothesis H1. H0 is typically an explicit statement on the true value of a specific model parameter θ, such as θ = θ0 for a given value θ0, and H1 comprises the remaining cases, thus,

    H0 : θ = θ0 ,
    H1 : θ ≠ θ0 .

Moreover, H0 and H1 are specified as exhaustive and mutually exclusive statements such that H1 is true if H0 is false and vice versa. This strategy arises from the fact that a given data sample can only support an assumption but cannot definitely prove its universal validity. However, it can serve as a sort of counterexample to reject a stated assumption. Therefore, H0 is typically formulated so as to express the opposite of what is intended to be shown, with the aim of rejecting it by means of the empirical data. From this perspective, a hypothesis test can be regarded as a stochastic proof by contradiction made up of the following steps:

1. Formulate H0 as the statement intended to be disproved and H1 as the negation of H0.
2. Choose a test statistic[3] Z for which, under the assumption that H0 is true, a probability distribution can be derived.
3. Compare the value of Z calculated from the given data with its reference distribution.
4. Based on this comparison, reject H0 or retain H0.

Note that if a test results in retaining H0, this does not imply that the test accepts H0 but only that it failed to reject it. The probability that a test indeed rejects H0 if it is false is denoted as statistical power. Due to the stochastic components, the probability of a wrong decision is always positive. Formally, two types of errors are possible:

1. Type I error: H0 is rejected although it is true.
2. Type II error: H0 is retained although it is false.

[3] A statistic is a function of the data. Note that if the data is considered as the outcome of a random variable, the value of the statistic is a random variable as well.

Hence, the statistical power of a test is the probability of not committing a Type II error. The probabilities of either type of error should of course be as low as possible, which can be achieved to some extent by increasing the sample size. For a fixed sample size, however, it is not possible to control both error probabilities simultaneously. Therefore, it is customary[4] to appoint an upper bound α on acceptable probabilities of a Type I error, called the level of significance, and to construct the test so that it holds the given level of significance. Typical values of α are 0.05, 0.01, or 0.001, but these choices are more or less arbitrary. The smallest level of significance that would lead to the rejection of H0 is referred to as the p-value. Roughly speaking, the p-value gives the probability that under H0 we observe values of the test statistic Z that are at least as extreme as the value attained for the given data sample. Accordingly, H0 is rejected if and only if the p-value is smaller than α. In this case, the test indicates the statistical significance of θ ≠ θ0 at the significance level α. In general, the smaller the p-value, the stronger the evidence against H0. Note that in this form, hypothesis tests control the probability of a Type I error; they do not control the probability of a Type II error. Hence, a large p-value can occur either because H0 is true or because H0 is false but the test does not have enough power.

As an example, consider the parameter β1 of the simple linear regression model (1.1). A slope β1 ≠ 0 suggests a linear relation between Y and X, while for β1 = 0 a linear dependence is doubtful. In this context, a common test statistic for testing the null hypothesis H0 : β1 = 0 against the alternative hypothesis H1 : β1 ≠ 0 is the t-statistic

    t = (β̂1 − 0) / s.e.(β̂1) .

Under the assumption that H0 is true and the error terms ε are independent and normally distributed with expected value 0 and variance σ², t follows a t-distribution with n − 2 degrees of freedom, where n is the number of cases in the sample. For a given sample, the test statistic t can be calculated from (1.3) and (1.5). The corresponding p-value is the probability of drawing a value x with |x| ≥ |t| from a t-distribution with n − 2 degrees of freedom and can be looked up in corresponding tables or calculated by common statistical software packages. If the statistical significance of β1 ≠ 0 at a suitable significance level can be shown, a linear relation between X and Y is suggested. The slope β1 can be interpreted as the change in the expectation of Y caused by a one-unit change in x. Note, however, that statistical significance does not necessarily imply scientific significance in the sense that |β̂1| > 0 might still be negligibly small.

The concept of testing statistical hypotheses on empirical data is one of the most useful aspects of statistical modeling. Hypothesis tests may indicate the strength of evidence for or against the researcher's theory. Therefore, statistical models are typically designed in such a way that they allow statistical tests of hypotheses related to the research question.

[4] Although sometimes criticized, cf., e.g., Cohen (1992, 1994) or Lehmann and Romano (2005).
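A numerical sketch of this t-test (the data are hypothetical and the tabulated critical value is only quoted approximately; in practice the exact p-value would come from statistical software):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample with a genuine linear relation y = 1 + 0.8x + noise.
n = 30
x = rng.uniform(0, 5, n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)

# Least squares slope and its standard error, via (1.3) and (1.5).
beta1 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x)**2)
beta0 = np.mean(y) - beta1 * np.mean(x)
e = y - (beta0 + beta1 * x)
se_beta1 = np.sqrt(np.sum(e**2) / (n - 2) / np.sum((x - np.mean(x))**2))

# t-statistic for H0: beta1 = 0, compared against the approximate two-sided
# critical value of the t-distribution with n - 2 = 28 degrees of freedom
# at alpha = 0.05 (about 2.048, from standard tables).
t = (beta1 - 0.0) / se_beta1
reject_h0 = abs(t) > 2.048
print(t, reject_h0)  # the true slope is nonzero, so H0 should be rejected
```

Since the generating slope 0.8 is several standard errors away from zero, |t| comfortably exceeds the critical value here.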

1.3.2. Models for Cross-sectional Network Data

In social network analysis most research questions refer to dependencies between tie variables, thus to dependencies within the values of one type of variable rather than between different types of variables. Therefore, traditional models that rely on the assumption of independent and randomly sampled units of observation are, for one thing, not applicable and, for another, not suitable for answering the research questions that usually arise. Statistical models for networks are primarily designed to test theories about relational structures and mechanisms, possibly enriched by influences on or from exogenous variables such as actor or dyad covariates. A common procedure is to derive, by means of a statistical model, structural properties that a network governed by the processes of a specific theory should reveal and to evaluate them against empirical data. Thereby, the evaluation is typically done by estimating model parameters, with the objective that the predicted structural properties fit the properties of the data, and testing them for statistical significance. Note that in many studies only one network observation is available and the focus of the analysis is on characteristics of that specific network. General conclusions about further networks are usually not justifiable and not the aim of the analysis.

A compact review of statistical models for social networks is given in Snijders (2011), and a collection of relevant articles is Carrington et al. (2005). Extensive reviews of models for network data are Goldenberg, Zheng, Fienberg, and Airoldi (2009) and Kolaczyk (2009).

The earliest statistical models suitable for the analysis of networks have their origins in the discipline of mathematical graph theory (Harary, 1969). They are referred to as random graph models (Bollobás, 1985) and define probability distributions on sets of graphs.
Thus, a random graph model is a probability space (Ω, P) with a sample space Ω that consists of graphs. In the following, let Ω denote the set of undirected loop-free labeled graphs and ΩN the subset of graphs in Ω that contain exactly N vertices.

1.3.2.1. G(N, p) Models

One of the earliest random graph models was proposed by Gilbert (1959). It is often referred to as the Bernoulli graph but, depending on the discipline, it may occur under different names. A common notation is G(N, p), where N is a positive integer and p ∈ [0, 1] a real number. It is originally defined by the description of a generation process (Gilbert, 1959, p. 1141): "For all pairs of points make random choices, independent of each other, whether or not to join the points of the pair by a line. Let the common probability of joining be p."

Given N such points, the probability that a graph G ∈ ΩN is the outcome of this process depends on the number of edges in G. Since edges emerge independently of each other, it follows directly that a graph G ∈ ΩN with M edges has in G(N, p) the probability

    Pp(G) = p^M · (1 − p)^(N(N−1)/2 − M) .    (1.6)

Obviously, G(N, 1/2) defines the uniform distribution on ΩN. Moreover, the plain structure of G(N, p) allows several precise characterizations, for instance, that the expected edge density of a graph in G(N, p) is equal to p and, consequently, that the expected degree of any vertex is p(N − 1). In principle, the model could also be used for statistical inference. Given an observed graph Gobs ∈ ΩN, the parameter p can be estimated to fit Gobs based on the likelihood L(p) = Pp(Gobs). However, it has not been designed for the modeling of empirical data. Instead, Gilbert (1959) was mainly interested in statements about the behavior of Pp for N → ∞, particularly in bounds for the probabilities of graphs in which all pairs of vertices or two specific vertices are connected by a path of edges.[5]

Indeed, graphs generated from a G(N, p) lack most of the characteristic features of empirical networks (cf. Section 1.2). For instance, it can be shown that if p grows inversely proportional to N, the degree distributions of graphs drawn from G(N, p) follow for large N approximately a Poisson distribution. This implies, in particular, that vertices with a notably higher degree than the average are extremely unlikely. However, many empirical social networks do contain some hubs, i.e., actors with an especially high degree. More specifically, it has been found that empirical degree distributions often resemble a power law, thus,

    P(d(i) = k) ≈ c / k^γ ,

where c, γ > 0 are constants and d(i) denotes the degree of actor i.
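Gilbert's generation process is straightforward to simulate. The following sketch (with arbitrarily chosen N and p) draws one graph from G(N, p) and checks the expected degree p(N − 1) empirically:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_gnp(N, p):
    """Draw an undirected loop-free graph from G(N, p): each of the
    N(N - 1)/2 vertex pairs is joined independently with probability p."""
    upper = np.triu(rng.random((N, N)) < p, k=1)
    return (upper | upper.T).astype(int)  # symmetric adjacency, empty diagonal

N, p = 200, 0.05
g = sample_gnp(N, p)
degrees = g.sum(axis=1)
print(degrees.mean())  # close to the expected degree p * (N - 1) = 9.95
```

Plotting the degrees of such a draw would show the concentration around the mean predicted by the Poisson approximation, with essentially no hubs.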
[5] Motivated by similar interests, the closely related Erdős–Rényi model was proposed by Erdős and Rényi (1959) in the same year and further investigated in Erdős and Rényi (1960). For positive integers N and M, this model assigns the same probability to each graph in Ω with N vertices and M edges and no probability to all other graphs.

Therefore, efforts have been made to design random graph models that generate graphs with power-law degree distributions. In a process defined by Barabási and Albert (1999), new vertices are successively added to the current graph and create a fixed number of edges to already existing vertices, whereby the probability of an edge to vertex v is proportional to its degree d(v). Hence, vertices are assumed to have a preference towards creating edges to more popular vertices. A more formal description of this preferential attachment model has been proposed by Bollobás, Riordan, Spencer, and Tusnády (2001), and indeed it generates graphs with degree distributions that resemble the degree distributions of empirical networks. However, it fails to reproduce further typical properties such as transitivity, homophily, or clustering. Similarly, another well-known model introduced by Watts and Strogatz (1998) is especially designed for generating small-world graphs, i.e., graphs that exhibit a highly clustered structure and in which the average length of a shortest path between two vertices is rather short, but it fails to reproduce at the same time realistic degree distributions and other properties of empirical social networks.

1.3.2.2. Exponential Random Graph Models

The above-mentioned models either assume mutually independent ties or are especially tailored to reproduce a specific structural property. A more flexible and sophisticated class of models that enables the inclusion of various different relational dependencies is the class of exponential random graph models (ERGM), also referred to as p* models. They were proposed for the modeling of networks by Frank (1991) and Wasserman and Pattison (1996), building on the pioneering work of Frank and Strauss (1986), who introduced Markov random graphs. Markov random graph models are defined on ΩN and assume that non-incident tie variables are independent conditional on the rest of the graph. By applying the Hammersley–Clifford theorem (Besag, 1974), Frank and Strauss (1986) could show that each Markov random graph model in which isomorphic graphs have equal probabilities can be expressed as an exponential family that depends on the numbers of specific elementary subgraph configurations, in particular, on the number of edges, the number of triangles, and the numbers of k-stars with k ≥ 2. This characterization gave rise to the class of exponential random graph models. ERGMs are generalizations of Markov graph models allowing for the dependence on arbitrary subgraph configurations and abandoning the restriction to undirected graphs.
A random graph model (XN, Pθ) belongs to the class of ERGMs if the probability of a network x ∈ XN is given by

    Pθ(x) = (1/κθ) · exp( Σ_{k=1}^{K} θk · sk(x) ) ,    (1.7)

where κθ = Σ_{x∈XN} exp( Σ_{k=1}^{K} θk · sk(x) ) is a normalization constant ensuring that the probabilities of all x ∈ XN sum up to one[6] and θ = (θ1, ..., θK) ∈ R^K are model parameters. The terms sk define statistics of networks, i.e., functions mapping a network x ∈ XN to a real number,

    sk : XN → R ,  1 ≤ k ≤ K .

[6] XN contains 2^(N(N−1)) networks, so that κθ and, therefore, Pθ(x) for a given network x are practically not computable.

They can be specified so as to describe structural features of the network like density, the occurrence of high-degree vertices, or transitivity, and may also involve covariates. In practice, they count the numbers of specific subgraph configurations, such as the number of ties, the number of vertices with a degree greater than d, 1 ≤ d ≤ N − 2, or the number of transitive triplets. By means of appropriately defined statistics sk, investigated dependencies between tie variables can be incorporated flexibly into the model and, depending on the associated parameters, their values determine the probability of a network in the model. Since the collection of statistics s1, ..., sK is alterable, (1.7) describes a class of model families. A concrete model is selected by specifying the collection of included statistics and determining the values of the associated model parameters. Parameters can either be fixed in order to specify a model in which probable networks have certain structural features, or estimated from empirical data, which allows testing hypotheses on structural properties and tendencies such as reciprocity, transitivity, or homophily.

For estimating the parameters of an ERGM, Snijders (2002) proposes an algorithm that approximates maximum likelihood estimates and can be described roughly as a stochastic variant of the Newton–Raphson algorithm. For the iterative approximation of the true parameters, random draws of networks generated by ERGMs based on the current parameter values are required. However, in contrast to the G(N, p) model, which is for p ∈ (0, 1) also a member of the class of ERGMs[7], the generation of graphs from general ERGMs is complex. Since tie variables may mutually depend on each other, the probability of one tie may influence the probability of other ties, so that they cannot be determined independently as in the generation process of Gilbert (1959). Instead, Markov chain Monte Carlo methods are applied.
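The reason MCMC works here even though κθ is intractable is that, when a single tie variable is toggled, the acceptance ratio Pθ(x′)/Pθ(x) depends only on the change in the statistics, so κθ cancels. The following toy sampler is only an illustration of that idea, not a production implementation; the statistics (tie count and number of mutual dyads of a directed network), the parameter values, and the sizes are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def stats_vec(x):
    """Statistics s(x): number of ties and number of mutual dyads."""
    return np.array([x.sum(), np.sum(x * x.T) / 2.0])

def metropolis_ergm(N, theta, steps):
    """Draw one network from an ERGM of form (1.7) by repeatedly proposing
    to toggle a single tie variable; the normalization constant kappa
    cancels in the acceptance ratio, so it never has to be computed."""
    x = np.zeros((N, N), dtype=int)
    s = stats_vec(x)
    for _ in range(steps):
        i, j = rng.integers(N, size=2)
        if i == j:
            continue                       # no self-ties in loop-free networks
        x[i, j] ^= 1                       # propose toggling tie i -> j
        s_new = stats_vec(x)
        log_ratio = theta @ (s_new - s)    # log P(x') - log P(x)
        if np.log(rng.random()) < log_ratio:
            s = s_new                      # accept the toggle
        else:
            x[i, j] ^= 1                   # reject: undo the toggle
    return x

theta = np.array([-2.0, 1.5])  # sparse ties, with a bonus for reciprocation
x = metropolis_ergm(N=20, theta=theta, steps=20000)
print(x.sum() / (20 * 19))     # density stays well below 0.5
```

With a negative tie parameter the sampled networks remain sparse, while the positive mutuality parameter pushes the reciprocation above what the density alone would predict; richer statistics such as triangle counts are where the near-degeneracy problems discussed below arise.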
Specific realizations based, e. g., on Gibbs sampling or on the Metropolis-Hastings algorithm have been elaborated and discussed in Snijders (2002). The choice of statistics included in an ERGM is made by the researcher based on knowledge from theory or earlier studies and may be motivated by hypotheses he/she intends to test. In principle, a great variety of statistics can be defined and incorporated into (1.7). In practice, however, many specifications lead to degenerated models in which only a few small sets of extreme graphs, e. g., almost complete graphs or extremely sparse graphs, are probable while intermediate graphs are very unlikely. This phenomenon is referred to as near-degeneracy and imposes restrictions on reasonable model specifications (cf. Handcock, 2003; Hunter and Handcock, 2006; Snijders, Pattison, Robins, and Handcock, 2006). An extensive introduction to exponential random graph models is given by the textbook of Lusher, Koskinen, and Robins (2012). Overview articles are Snijders et al. (2006) and Robins, Pattison, Kalish, and Lusher (2007a); Robins, Snijders, 7. 22. (1.6) can be rearranged to Pθ (x) =. 1 κθ. p exp(θ · M ) with θ = log( 1−p ) to have form (1.7)..

Wang, Handcock, and Pattison (2007b). An implementation is available in the standalone software package PNet (Wang, Robins, and Pattison, 2009) from http://sna.unimelb.edu.au/PNet and in the R-package ergm (Handcock, Hunter, Butts, Goodreau, Krivitsky, and Morris, 2011) as part of statnet (Handcock, Hunter, Butts, Goodreau, and Morris, 2003), an extensive package comprising various statistical methods for the analysis of network data.

1.4. Statistical Models for Network Panel Data

Most social networks are dynamic by nature; new ties can be established and old ones can be terminated (Snijders and Steglich, 2013, p. 7). Therefore, a single network observation is usually the result of an untraceable history of tie changes. While the single tie changes might be explainable by relatively simple rules, the resulting network constitutes a complex dependence structure that seems to be nearly impossible to disentangle. In cases where complete and fine-grained information about the timing and ordering of tie changes is available, the complex evolution history can be split into its elementary changes; thus, dependencies are split into endogenous feedback processes.8 Accordingly, recent models for such fine-grained longitudinal network data (Butts, 2008; Brandes, Lerner, and Snijders, 2009) assume that tie changes are conditionally independent given the network structure immediately before the changes and can be estimated by computationally more efficient algorithms than, e. g., the algorithms for estimating ERGMs described in Section 1.3.2.2. For many empirical networks, however, detailed temporal information is not available. Instead, data often falls between these two extremes of either complete or no information on the order and times of tie changes. Throughout this book, the interest will be in such types of longitudinal data. Formally, we assume network panel data, i. e., two or more observations of the same network at discrete, consecutive points in time, thus a sequence of M > 1 networks x(1), . . . , x(M) ∈ X_N observed at times t1, . . . , tM. The network ties are assumed to encode relational states, such as friendship, trust, or cooperation, that may change but feature a tendency to endure over time, and the network state x(h) at a particular observation moment th is assumed to have evolved dynamically from the preceding observation x(h−1), albeit lacking timestamps that indicate the times and ordering of changes between th−1 and th.

8 Note that fine-grained temporal information is typically available if tie variables encode relational events, such as email correspondence or telephone calls, rather than relational states, such as friendship, trust, or collaboration. Therefore, such networks are often referred to as event networks and the corresponding tie changes are called events.
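The panel-data setting can be made concrete with a minimal sketch (the function name and toy adjacency matrices below are my own illustrative assumptions, not data from the text): comparing two consecutive waves reveals only which tie variables changed, not when or in which order the changes occurred.

```python
# Illustrative sketch of network panel data: two waves of a loop-free
# directed network on three actors, stored as adjacency matrices.

def tie_changes(x_prev, x_next):
    """Count ties created and terminated between two consecutive waves."""
    n = len(x_prev)
    created = sum(1 for i in range(n) for j in range(n)
                  if i != j and x_prev[i][j] == 0 and x_next[i][j] == 1)
    terminated = sum(1 for i in range(n) for j in range(n)
                     if i != j and x_prev[i][j] == 1 and x_next[i][j] == 0)
    return created, terminated

# toy adjacency matrices for a panel with M = 2 observations
x1 = [[0, 1, 0],
      [0, 0, 1],
      [0, 0, 0]]
x2 = [[0, 1, 1],
      [0, 0, 0],
      [1, 0, 0]]

print(tie_changes(x1, x2))  # (2, 1): two ties created, one terminated
```

The aggregate counts are all the panel reveals; any ordering of the three elementary changes between the two waves is compatible with the data.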

Chapter 1. Statistical Analysis of Network Data

A possible framework for the modeling of network panel data is a temporal extension of the ERGM (cf. Robins and Pattison, 2001; Hanneke, Fu, and Xing, 2010; Cranmer and Desmarais, 2011; Krackhardt and Handcock, 2007). Introducing a temporal component into the class of ERGMs is straightforward. The conditional probabilities Pθ(h)(X(th) = x(h) | x(h−1)), for 1 < h ≤ M, are modeled similarly as in (1.7), with the difference that the statistics sk are allowed to be functions of both networks x(h) and x(h−1). In formulae,

\[
P_{\theta^{(h)}}\big(X(t_h) = x^{(h)} \mid x^{(h-1)}\big) = \frac{1}{\kappa_{\theta^{(h)}}} \exp\left( \sum_{k=1}^{K} \theta_k^{(h)} \, s_k\big(x^{(h)}, x^{(h-1)}\big) \right).
\]

Statistics might depend purely on x(h), expressing only structural dependencies at time th, or on both x(h) and x(h−1), expressing additional dependencies on the preceding observation. The lengths of the time spans th − th−1 as well as the potential orderings of tie changes within [th−1, th] are ignored. In contrast, an alternative class of models for network panel data, which will be the focus of the following chapters, adopts the approach of extrapolating the unobserved evolution between observation moments from the given data. Instead of representing the dynamics leading from x(h−1) to x(h) by one jump, these models represent them as the result of a series of elementary changes, thus as a more or less continuous process independent of the concrete observation moments. This principle goes back to Holland and Leinhardt (1977), who proposed to model the unobserved evolution as a continuous-time Markov process and concretized the proposal by postulating that changes of single tie variables take place only successively, not concurrently. This implies that at each moment the current network structure determines which change might occur next. Hence, the process feeds back upon itself, whereby the network structure is regarded as the dependent and the explanatory variable at the same time.
This design makes it possible to analyze, and perhaps to find empirical evidence for, dependencies as described in Section 1.2, or more general causal relations where the interest is in the endogenous effect of the network on itself.

1.4.1. Continuous-time Markov Processes

The models for network evolution considered in the following are based on the assumption that the observed networks are outcomes of a Markov process evolving in continuous time, thus are specifications of continuous-time Markov processes.9 A continuous-time Markov process is a special stochastic process.

9 Textbooks on these processes are, e. g., Karlin and Taylor (1975, 1981) and Norris (1997).
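What such a continuous-time process looks like can be illustrated with a minimal sketch (the rate matrix Q, the state labels, and all names below are invented for illustration; this is not the network model itself): the process stays in each state for an exponentially distributed holding time and then jumps to a successor state chosen proportionally to the transition rates, so single changes occur successively, never concurrently.

```python
import random

# Illustrative sketch: simulate one trajectory of a continuous-time Markov
# chain on a toy 3-state space. Off-diagonal entries Q[x][y] are transition
# rates; the diagonal makes each row sum to zero.

def simulate(Q, x0, t_max, rng):
    """Simulate a trajectory up to time t_max, returning (time, state) pairs."""
    states = list(range(len(Q)))
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        rate = -Q[x][x]                # total rate of leaving state x
        if rate <= 0:                  # absorbing state: nothing more happens
            break
        t += rng.expovariate(rate)     # exponential (memoryless) holding time
        if t >= t_max:
            break
        # choose the successor state proportionally to the off-diagonal rates
        r = rng.uniform(0.0, rate)
        candidates = [y for y in states if y != x]
        acc, nxt = 0.0, candidates[-1]
        for y in candidates:
            acc += Q[x][y]
            if r <= acc:
                nxt = y
                break
        x = nxt
        path.append((t, x))
    return path

# toy rate matrix: rows sum to zero
Q = [[-2.0,  1.0,  1.0],
     [ 0.5, -1.0,  0.5],
     [ 1.0,  1.0, -2.0]]

rng = random.Random(1)
path = simulate(Q, 0, 5.0, rng)
print(path[0])  # (0.0, 0)
```

Each simulated trajectory is a sequence of single, successive state changes whose next step depends only on the current state, mirroring the feedback design described above.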

Definition 4.6 (Stochastic process) Let (Ω, P) be a probability space and (X, Σ) be a measurable space. A stochastic process with state space X is a family {X(t), t ∈ T} of random variables X(t) : Ω → X indexed by a totally ordered set T.

In later applications, stochastic processes will proceed over time, taking place on the set of loop-free directed graphs over a constant vertex set. By assuming that changes of the graph may happen at every moment in time, the considered models are stochastic processes on state space X_N indexed by time t ∈ T := R⁺₀. A main question is how a process {X(t), t ∈ T} may be specified in order to stay statistically manageable while, at the same time, being suitable as a realistic model of social network dynamics that allows for the study of the research questions that arise. A particular interest is in statements about associated probabilities, such as P(X(t) = x) for a given x ∈ X_N at a specific time t ∈ T. However, even if we assume that, at an arbitrary moment t0 ∈ T, statements about the distribution of future states {X(t), t > t0} are possible based on knowledge about the past {X(s), s ≤ t0}, the specification of such relations would be, in general, a family of functions (P_{t0})_{t0 ∈ T} mapping the time onto a set of probability mass functions depending on a potentially uncountable number of state histories, thus still a system that is mathematically hardly tractable. A characterization of such a system is facilitated by the following requirement on the stochastic process.

Definition 4.7 (Continuous-time Markov process) A stochastic process {X(t), t ∈ T} on state space X_N is a continuous-time Markov process if for each x ∈ X_N and any 0 ≤ t0 < t

\[
P\big(X(t) = x \mid X(s) = x(s),\ s \le t_0\big) = P\big(X(t) = x \mid X(t_0) = x(t_0)\big). \tag{1.8}
\]

(1.8) is referred to as the Markov property.
In other words, the Markov property requires that, given the present X(t0), the conditional distribution of the future {X(t), t > t0} is independent of the past {X(t), t < t0}; thus, a continuous-time Markov process is memoryless and, at each moment, the distribution of the process at any future moment depends only on the current state. Accordingly, for a Markov process, the conditional distributions of future states {X(t), t > t0} are determined by its state x̃ ∈ X_N at time t0 ∈ T. The number of possible state histories to condition on is thereby reduced to |X_N|, so that

distributions (P_{t0})_{t0 ∈ T} of future states can be expressed as a family of matrix-valued functions

\[
P_{t_0} : [t_0, \infty) \longrightarrow [0,1]^{|\mathcal{X}_N| \times |\mathcal{X}_N|}, \qquad t \longmapsto \big( P(X(t) = x \mid X(t_0) = \tilde{x}) \big)_{\tilde{x}, x \in \mathcal{X}_N}. \tag{1.9}
\]

A further simplification is achieved by requiring that the right side of (1.9) depends only on the time span t − t0 elapsed between t0 and t and is independent of the specific moments t0 and t.

Definition 4.8 (Stationary transition distribution) A continuous-time Markov process {X(t), t ∈ T} has a stationary transition distribution if for all 0 ≤ t0 ≤ t

\[
P\big(X(t) = x \mid X(t_0) = \tilde{x}\big) = P\big(X(t - t_0) = x \mid X(0) = \tilde{x}\big). \tag{1.10}
\]

For a continuous-time Markov process {X(t), t ∈ T} with stationary transition distribution, (1.9) is independent of t0 and thus can be expressed as a single matrix-valued function rather than a family of such functions,

\[
P : [0, \infty) \longrightarrow [0,1]^{|\mathcal{X}_N| \times |\mathcal{X}_N|}, \qquad t \longmapsto \big( P(X(t) = x \mid X(0) = \tilde{x}) \big)_{\tilde{x}, x \in \mathcal{X}_N}. \tag{1.11}
\]

(1.11) is referred to as the transition matrix of {X(t), t ∈ T}. Let {X(t), t ∈ T} be a continuous-time Markov process with stationary transition distribution. Obviously, the Markov property guarantees that for all x̃, x ∈ X_N and t, h ∈ T,

\[
P\big(X(t+h) = x \mid X(0) = \tilde{x}\big) \overset{(1.8)}{=} \sum_{x' \in \mathcal{X}_N} P\big(X(t+h) = x \mid X(h) = x'\big) \cdot P\big(X(h) = x' \mid X(0) = \tilde{x}\big)
\]

holds. Due to (1.10), we can further write P_{x̃x}(t + h) = Σ_{x' ∈ X_N} P_{x'x}(t) · P_{x̃x'}(h), and, using matrix notation,

\[
P(t + h) = P(h) \cdot P(t). \tag{1.12}
\]

In order to characterize how the transition matrix changes with the elapsed time t, we consider the limit of the difference quotient

\[
\lim_{h \downarrow 0} \frac{P(t + h) - P(t)}{h} \overset{(1.12)}{=} \underbrace{\lim_{h \downarrow 0} \frac{P(h) - I}{h}}_{=:Q} \cdot P(t),
\]
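These relations can be checked numerically in a small sketch (pure Python, with an invented 3-state rate matrix; all names are my own and this is not part of the original derivation): for a rate matrix Q with rows summing to zero, the matrix P(t) = exp(tQ) satisfies the semigroup property (1.12), and the difference quotient (P(ε) − I)/ε approaches Q as ε tends to zero.

```python
# Numerical sanity check of the semigroup property and the generator limit
# using a toy 3-state rate matrix Q and a truncated Taylor series for exp.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Matrix exponential via truncated Taylor series (fine for small matrices)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]   # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def scaled(A, c):
    return [[c * v for v in row] for row in A]

# toy rate matrix: rows sum to zero
Q = [[-2.0,  1.0,  1.0],
     [ 0.5, -1.0,  0.5],
     [ 1.0,  1.0, -2.0]]

t, h = 0.7, 0.3
P_t = mat_exp(scaled(Q, t))
P_h = mat_exp(scaled(Q, h))
P_th = mat_exp(scaled(Q, t + h))
prod = mat_mul(P_h, P_t)

# semigroup property (1.12): P(t + h) = P(h) * P(t)
err = max(abs(P_th[i][j] - prod[i][j]) for i in range(3) for j in range(3))
print(err < 1e-9)   # True

# (P(eps) - I) / eps approximates the generator Q for small eps
eps = 1e-5
P_eps = mat_exp(scaled(Q, eps))
approx_Q = [[(P_eps[i][j] - float(i == j)) / eps for j in range(3)]
            for i in range(3)]
err_Q = max(abs(approx_Q[i][j] - Q[i][j]) for i in range(3) for j in range(3))
print(err_Q < 1e-3)  # True
```

The check also confirms that the rows of P(t) sum to one, i. e., that each row of the transition matrix is a probability distribution over the states.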
