Correlated Voting in Multipopulation Models, Two-Tier Voting Systems, and the Democracy Deficit

Gabor Toth

Dissertation

Fakultät für Mathematik und Informatik

by

Gabor Toth

DISSERTATION

submitted for the degree of

Doctor of Natural Sciences (Dr. rer. nat.) in Mathematics at the Faculty of Mathematics and Computer Science

of the FernUniversität in Hagen

November 2019

Introduction

To paraphrase Winston Churchill, democracy seems to be the least bad form of government humanity has tried so far, but it is far from perfect.¹ In a direct democracy, where people vote on every single issue to make a decision, we had better hope the choice is between only two alternatives, because that is the only case where we have a universally accepted way of arriving at a group decision: Kenneth May [29] showed that the majority rule is optimal in this scenario. If there are more than two alternatives, it is impossible to aggregate the preferences or votes of the voters in such a way that the procedure satisfies reasonable conditions ensuring that it is neither manipulable via strategic voting, in which voters misrepresent their true preferences to obtain a better outcome, nor a dictatorship, in which a single voter decides the outcome all by herself. This is known as Arrow’s Impossibility Theorem, first published by Kenneth Arrow [2].

Even if there are only two alternatives, in the real world the predominant system of democratic government is representative democracy. There are several reasons why representative democracy is preferred over direct democracy. Historically, the slowness of and the cost associated with communications ruled out a direct democracy involving a significant proportion of the population of any political entity larger than a medium-sized town. Nowadays with widespread internet access in developed countries, this issue seems to have become secondary. But there are other reasons for choosing a representative democracy over a direct one: many of the issues we face today are highly complex. The barrier to entry to even understand these issues, much less be able to solve them, is so high that it is unrealistic or even impossible for everybody to make an informed decision on everything. Also, if a proposed programme to address some issue can be broken up into several parts, each of which is voted on separately, it is possible that a series of close votes will lead to incompatible parts being approved. To avoid situations like this, it seems sensible to have a group of recognised experts tasked with developing a programme to be voted on and implemented by a group of representatives of the people. This way it is more likely that the outcome will be reasonable. This is not to say that representative democracy does not present its own set of disadvantages compared to a direct democracy, the main one probably being lobbyists and regulatory capture. It is not our intention to argue the pros and cons of both systems, but rather to take as given the prevalence of representative democracy and analyse issues related to it.

¹ Churchill said ‘Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time...’ However, he did not claim to be the originator of this idea.

Once we settle on a representative form of government, the next decision we have to make is how to elect those representatives and the system by which they vote on issues. This is what we shall refer to as a ‘two-tier voting system’: the people vote on the representatives and those representatives vote on issues in the name of their constituency. If it is feasible to divide the entire electorate into equal-sized groups, then the first of these problems becomes much simpler.

There are, however, many situations in which this is not possible. The electorate may already be divided naturally into groups of unequal size, be it for historical, cultural, or economic reasons. Examples of this are states in the United States of America, Bundesländer in Germany, or member countries in the European Union. In all these cases, it has to be decided how much weight the representatives of each group represented in some sort of council should receive. It stands to reason that larger groups should receive more weight. It can also be argued that the difference in weight should not be allowed to become so large that the smallest groups have virtually no weight at all.² It is our intention to contribute some proposals to the determination of the optimal weights in a two-tier voting system.

In order to address the assignation of weights to different groups, we first have to specify what qualities or properties of a voting system we value. There are many options to choose from, as we will describe in Chapter 2. We briefly mention some of the previous works on this subject: Lionel Penrose [30], John Banzhaf [3], Wojciech Slomczynski, Tomasz Zastawniak, and Karol Zyczkowski [34, 35, 33, 36, 37], and Werner Kirsch and Jessica Langner [20, 21, 27].

We choose to minimise the so-called ‘democracy deficit’, defined as the expected quadratic deviation of the council vote from a hypothetical popular vote over all possible issues put before them. This means we value the property that the council vote accurately represents the opinions held by the public at large, not only in the sense that if the majority of the population is in favour of a proposal, the council should be in favour as well, but also in the sense that the percentage of council representatives in favour, taking into account the weights assigned to them, should be as close as possible to the percentage of the population in favour. This relates to the ideal that all voters should feel they are represented, or their voices are heard, in the council, which may not be the case if the council votes differ drastically from the popular vote even if the outcome as far as acceptance or rejection is the same. In particular, if there is a minority opinion in the population, the perception that nobody in the council represents that opinion can be a fatal flaw of a representative democracy, as it can foment the radicalisation of said group. As we shall see in Section 6.3, this seemingly innocuous requirement can lead to some unexpected and rather troubling results.

When we say ‘expected deviation’, we are referring to a mathematical expectation. Therefore, we need to specify a probability measure on the space of all voting outcomes or configurations. We will call these ‘voting measures’, and their purpose is to model the voting behaviour of people in a probabilistic sense. They describe how voters respond to the preferences and opinions of their fellow voters, the correlation structure between the votes. We define and investigate the properties of several voting measures in this thesis, and how the democracy deficit can be minimised for each.

The remainder of this thesis is organised in six chapters:

² This has in fact happened. See the case of Luxembourg in the European Economic Community established in 1958 presented in Chapter 2.

In Chapter 2, we define the concepts of voting system, voting measure, two-tier voting system, and democracy deficit, concepts that have been used in the field of social choice or voting theory, and which are all fundamental to the entire thesis. This is also the chapter where we present some of the numerous previous results in this field.

In Chapter 3, we discuss weak convergence and the method of moments, one of the main tools we will use to prove our results in Chapter 5, as well as some combinatorial concepts that are useful for the application of the method of moments.

Afterwards, we define and analyse the first of two families of voting measures, the collective bias models, in Chapter 4. In these models, the voters are all subject to some central influence by some religious or cultural institution, which can act across group boundaries. Beyond this influence that affects all voters, there is no interaction between voters. In particular, we analyse the asymptotic behaviour of the so-called voting margins, which consist of the difference between the respective numbers of yes and no votes, and which, suitably normalised, converge in distribution as the number of voters approaches infinity. Aside from the voting margins, another key aspect is the behaviour and correlation of the votes cast by the representatives in the council. A particular point of interest is how strongly these votes are correlated with each other. As we will see, in some variants of the collective bias model, this correlation tends to be very strong, which has consequences for the assignation of optimal weights.

Then, in Chapter 5, we define the multigroup Curie-Weiss models. In a sense, these are opposite to collective bias models from the previous chapter in that here all influence on a given voter originates from the other voters. There is no central influence that affects everybody and causes a positive correlation between voters. Instead, there is a tendency for voters to align, at least within each group, possibly across group boundaries as well. The resulting patterns of asymptotic behaviour of normalised voting margins are more varied and perhaps more interesting in Curie-Weiss models compared to collective bias models in the previous chapter. We will prove that, depending on the parameter values, which define the different ‘regimes’, as they are called, multivariate central limit theorems and laws of large numbers hold for the vector of normalised voting margins, with different target distributions.

Subsequently, in Chapter 6, we use the results on the voting measures analysed to determine how the weights should be assigned to the representatives of the groups for the council vote in order to minimise the democracy deficit. It is in this chapter that we will study how the distributions of the bias variables in collective bias models and the parameter values in Curie-Weiss models can drastically affect the choice of optimal weights in the council. We will cover examples of scenarios ranging from having only a single set of optimal weights to having any set of non-negative weights satisfy the optimality condition.

Finally, in Chapter 7, we summarise the results obtained and provide a glimpse at possible future research on the assignation of weights in two-tier voting systems.

Chapter 2

Voting Systems: Definitions and Previous Results

2.1 Voting Systems

We will be analysing situations in which there is a proposal put before a population of individuals called voters who decide ‘yes’ or ‘no’. Due to this binary nature of the decision, we can encode a single vote as a value in $\{-1,1\}$. If we have $N$ voters, then the outcome of a yes/no vote on an issue can be thought of as an element of the set $\{-1,1\}^N$, where the generic element $(x_1, \dots, x_N)$ is interpreted as an outcome where the $i$-th voter agrees if $x_i = 1$ and disagrees if $x_i = -1$. A unanimous affirmative vote is the $N$-tuple $(1,1,\dots,1)$.

Definition 2.1.1. Let V = {1,2, . . . , N} be a set of voters. We call {−1,1}^N the space of voting configurations and each element a voting configuration.

We shall refer to subsets of voters as ‘coalitions’, as is common in both politics and the social choice literature. Mathematically, coalitions are elements of the power set of the set of all voters, $\mathcal{P}(V)$. The question voting theory seeks to answer is how a group should make a decision, given the individual preferences held by each member. We call the procedure by which a group decision is made a ‘voting rule’. There are many possibilities to define a voting rule that decides the outcome of an election for any voting configuration. At the most fundamental level, a voting rule is defined by the coalitions that can decide the vote on any proposal in their favour. We call these coalitions ‘winning’. In order to avoid contradictions, the complement of a winning coalition cannot be winning, too. Additionally, it is customary to assume that the ‘grand coalition’, i.e. the full set $V$, is winning, and that the voting rule is monotonic, so that adding voters to an already winning coalition preserves its winning quality. This leads us to the following definition.

Definition 2.1.2. A collection $\mathcal{W}$ of subsets of $V$ is called the set of winning coalitions if

1. $V \in \mathcal{W}$.

2. If $A \subset B$ and $A \in \mathcal{W}$, then $B \in \mathcal{W}$.

We can then define the set of losing coalitions $\mathcal{L}$ as the complement in $\mathcal{P}(V)$ of $\mathcal{W}$.

If we want to avoid contradictions due to having a particular coalition that is winning while its complement is winning, too, we specify a ‘proper voting system’:

Definition 2.1.3. A voting system is called proper if its set of winning coalitions $\mathcal{W}$ satisfies: if $A \in \mathcal{W}$, then $A^c \notin \mathcal{W}$ for all $A \in \mathcal{P}(V)$.

Once we have the set of winning coalitions, we can determine whether a certain voter $v \in W \in \mathcal{W}$ is decisive in the sense that without $v$ the coalition turns from winning to losing.

Definition 2.1.4. Given a winning coalition $W$ and a member of this coalition $v$, we say that $v$ is decisive in $W$ if $W \setminus \{v\} \in \mathcal{L}$. We will also say that $v$ decides the vote.

Example 2.1.5. The European Economic Community (EEC) established in 1958 by the Treaty of Rome had six member states: West Germany (G), France (F), Italy (I), the Netherlands (NL), Belgium (B), and Luxembourg (L). The voting rule in the Council of Ministers was defined by the set of winning coalitions

$$
\begin{aligned}
\mathcal{W} = \{ &\{G, F, I\}, \{G, F, I, NL\}, \{G, F, I, B\}, \{G, F, I, L\}, \\
&\{G, F, I, NL, B\}, \{G, F, I, NL, L\}, \{G, F, I, B, L\}, \{G, F, I, NL, B, L\}, \\
&\{G, F, NL, B\}, \{G, I, NL, B\}, \{F, I, NL, B\}, \\
&\{G, F, NL, B, L\}, \{G, I, NL, B, L\}, \{F, I, NL, B, L\} \}.
\end{aligned}
$$

As we can appreciate, even for a set of only six voters, this type of definition is fairly cumbersome. It is desirable to define the set of winning coalitions in a more parsimonious way. One such possibility is a weighted voting system. Each voter $v$ receives a weight $w_v$ that represents how many votes she casts in the vote. In order for a proposal to be accepted, it needs to receive a certain minimum proportion of all votes called a ‘quota’.

Definition 2.1.6. A weighted voting system with voters $V = \{1, 2, \dots, N\}$ is an $(N+1)$-tuple $[q; w_1, w_2, \dots, w_N]$, where $q$ is called the quota and the $w_v$ are called the weights. A coalition $C$ is defined as winning in this system if $\sum_{v \in C} w_v \geq q \sum_{v \in V} w_v$.

Remark 2.1.7. The total sum of weights $\sum_{v \in V} w_v$ does not carry any meaningful information. If it is convenient, we can assume without loss of generality that the weights are normalised such that this sum equals, say, 1. Then the condition under which a coalition $C$ is winning reduces to $\sum_{v \in C} w_v \geq q$.

Proposition 2.1.8. The EEC’s Council of Ministers described above is a weighted voting system: Let the weights be $w_G = w_F = w_I = 4$, $w_{NL} = w_B = 2$, and $w_L = 1$, and let $q = 12/17$. Then any given coalition is winning if and only if it has at least 12 votes. This agrees with the above set $\mathcal{W}$ of winning coalitions. In fact, this is the usual definition given for the EEC of 1958.
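The claim of Proposition 2.1.8 can be checked by brute force. The following Python sketch enumerates all coalitions of the six members and keeps those with at least 12 votes; the function names are illustrative choices, not notation from the text.

```python
from itertools import combinations

# Weights from Proposition 2.1.8; a coalition wins with at least 12 of 17 votes.
WEIGHTS = {"G": 4, "F": 4, "I": 4, "NL": 2, "B": 2, "L": 1}
THRESHOLD = 12

def is_winning(coalition):
    """Return True if the coalition's total weight reaches the threshold."""
    return sum(WEIGHTS[m] for m in coalition) >= THRESHOLD

def winning_coalitions():
    """Enumerate all subsets of the members and keep the winning ones."""
    members = list(WEIGHTS)
    return [frozenset(c)
            for r in range(len(members) + 1)
            for c in combinations(members, r)
            if is_winning(c)]

def ever_decisive(member):
    """True if removing the member turns some winning coalition into a losing one
    (decisiveness in the sense of Definition 2.1.4)."""
    return any(member in w and not is_winning(w - {member})
               for w in winning_coalitions())
```

Calling `winning_coalitions()` yields the 14 coalitions listed in Example 2.1.5. `ever_decisive("L")` returns `False`: every coalition without Luxembourg has an even total weight, so Luxembourg's single vote can never lift a coalition from 11 to 12.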

It is however not always the case that a description in terms of weights and quotas is the most ‘natural’ way of describing a voting system. One counterexample is the United Nations Security Council:

Example 2.1.9. The United Nations Security Council (UNSC) has a total of fifteen members, five of them permanent and ten temporary, selected for two years at a time. The five permanent members, China, France, Russia, the United Kingdom, and the United States can veto any proposal, whereas the ten temporary members cannot. For a proposal to pass, it is thus necessary that all five permanent members and at least four of the temporary members vote in favour.

Although at first glance this asymmetric system with veto power wielded by some but not all members does not look like a weighted voting system, it is one in fact: Let the quota be $39/45$, the weight of each permanent member 7 and each temporary member 1. Then the set of winning coalitions is exactly as described verbally.
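A minimal Python sketch can confirm that the weighted description and the verbal rule of Example 2.1.9 select the same coalitions. The member labels `P1`, ..., `T10` and the function names are placeholders introduced only for this illustration.

```python
from itertools import combinations

PERMANENT = [f"P{i}" for i in range(1, 6)]    # five permanent members
TEMPORARY = [f"T{i}" for i in range(1, 11)]   # ten temporary members
WEIGHTS = {**{m: 7 for m in PERMANENT}, **{m: 1 for m in TEMPORARY}}
THRESHOLD = 39  # quota 39/45 applied to the total weight 5*7 + 10*1 = 45

def wins_by_weight(coalition):
    """Weighted rule: total weight of the coalition reaches the threshold."""
    return sum(WEIGHTS[m] for m in coalition) >= THRESHOLD

def wins_by_rule(coalition):
    """Verbal rule: all five permanent members plus at least four temporary ones."""
    c = set(coalition)
    return set(PERMANENT) <= c and len(c & set(TEMPORARY)) >= 4

def rules_agree():
    """Compare both rules on all 2**15 coalitions."""
    members = PERMANENT + TEMPORARY
    return all(wins_by_weight(c) == wins_by_rule(c)
               for r in range(len(members) + 1)
               for c in combinations(members, r))
```

The two rules agree because a coalition missing one permanent member has weight at most 4·7 + 10 = 38, while all five permanent members together with k temporary members have weight 35 + k.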

Definition 2.1.10. Suppose that the weight of each voter is 1. Then we define for each voting configuration the voting margin to be the difference $S := \sum_{i=1}^{N} X_i - (2q - 1)N$.

The voting margin measures the difference between the number of ‘yes’ votes in a voting configuration and the number of ‘yes’ votes required to clear the quota.
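To see how the margin relates to the quota, a short derivation may help. Here $Y$ is a symbol introduced only for this illustration; it denotes the number of ‘yes’ votes, so that $N - Y$ voters vote ‘no’:

```latex
\sum_{i=1}^{N} X_i = Y - (N - Y) = 2Y - N,
\qquad
S = (2Y - N) - (2q - 1)N = 2\,(Y - qN).
```

Hence $S \geq 0$ exactly when $Y \geq qN$, and with this definition $S$ equals twice the surplus of ‘yes’ votes over the number required by the quota.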

Remark 2.1.11. If the quota is $1/2$, then the voting margin is simply the sum of all votes $S = \sum_{i=1}^{N} X_i$. We will be dealing with weighted voting systems with quotas of $1/2$ exclusively throughout this thesis.

There are different possibilities of analysing the behaviour of voters. One is to assume each of them has a set of preferences over the alternatives. Another is to treat the decision process of each voter as a black box and instead look at the patterns emerging in the collective behaviour of the population. The main topic of this thesis is the probabilistic analysis of voting behaviour.

We express the typical patterns of behaviour by specifying how likely each voting configuration is. The object that describes these probabilities is called a ‘voting measure’:

Definition 2.1.12. A voting measure $P$ is a probability measure on the space of voting configurations $\{-1,1\}^N$ with the symmetry property

$$P(X_1 = x_1, \dots, X_N = x_N) = P(X_1 = -x_1, \dots, X_N = -x_N) \tag{2.1.1}$$

for all voting configurations $(x_1, \dots, x_N)$.

Remark 2.1.13. Since we will be analysing asymptotic properties of voting measures, it would be correct to make explicit the dependency on the size of the population by writing $P_N$. However, we shall not do so in order to simplify the notation somewhat.

The rationale behind the definition of voting measure is the following:

While the votes cast are assumed to be deterministic, obeying the voters’ preferences which we do not model explicitly, the proposal put before them is assumed to be randomly selected. Since each yes/no question can be posed in two opposite ways, one to which a given voter would respond ‘yes’ and one to which she would respond ‘no’, it is reasonable to assume that each voter votes ‘yes’ with the same probability she votes ‘no’. We also assume that there is a sufficient range of proposals that elicits all $2^N$ possible responses from the voting population.

Thus a voting measure indicates the probability of each possible outcome of an election. It represents a model in the sense that it describes the relative frequencies of different patterns of behaviour exhibited by the voters. In particular, it establishes whether voters care about the opinions of others, or whether they instead tend to decide on their own. If they do take into consideration the opinions of their fellow voters, the voting measure gives us a measure of the degree of this influence or interaction. As we will see in Chapters 4 and 5, the spectrum of possibilities is quite ample.

As it turns out, the expected voting margin plays an important role in our analysis.

Definition 2.1.14. If $S$ is the voting margin defined in 2.1.10, then under the voting measure $P$ we call the expression $E(|S|)$ the expected absolute voting margin.

A property of voting measures that comes in handy in Chapter 6 is

Definition 2.1.15. Let $(P_N)$ be a sequence of voting measures. We say that $(P_N)$ is asymptotically resolute if the probability under $P_N$ that the vote is drawn goes to 0 as $N \to \infty$. In keeping with Remark 2.1.13, we shall also say that $P$ is asymptotically resolute.

Remark 2.1.16. As we shall see, every single voting measure studied in this thesis is asymptotically resolute. From an empirical standpoint this property is reasonable, too.

2.2 Two-Tier Voting Systems and the Democracy Deficit

Suppose we have a population $V$ of $N$ voters which is subdivided into $M$ groups, and the sizes of the groups are $N_1, N_2, \dots, N_M$. Each group $\nu$ votes on a given proposal and depending on the outcome has their representative cast a vote $\chi_\nu \in \{-1,1\}$ in the council. This is called a ‘two-tier voting system’.

The votes of the representatives in the council are called social choice functions of the groups. They select the preferred alternative ‘yes’ or ‘no’.

Definition 2.2.1. Let $V$ be a set of voters of size $N$. We define the social choice function $\chi : \{-1,1\}^N \to \{-1,1\}$ by setting

$$
\chi(x_1, \dots, x_N) :=
\begin{cases}
1, & \text{if } S > 0, \\
-1, & \text{otherwise.}
\end{cases}
$$

In the definition above, $S$ stands once again for the voting margin.

Remark 2.2.2. We resolved draws in the votes by defaulting to ‘no’. If the voting measure describing the voting behaviour is asymptotically resolute, this choice does not matter for large populations.

In order to distinguish the voters belonging to each of the $M$ different groups, we label the random vector $(X_1, X_2, \dots, X_N)$ as follows:

$$(X_{11}, X_{12}, \dots, X_{1N_1}, \dots, X_{M1}, X_{M2}, \dots, X_{MN_M}).$$

We mention here that in all situations, whenever we analyse the asymptotics of a model, we always assume that for each group $\nu$, $N_\nu \to \infty$ as $N \to \infty$.

Definition 2.2.3. We define the group size parameters for each group $\nu$

$$\alpha_\nu := \lim_{N \to \infty} \frac{N_\nu}{N}$$

and assume these limits exist.

We will use these group size parameters in all models in this thesis.

We adapt Definition 2.1.15 to the situation where there are several groups:

Definition 2.2.4. If there are $M \in \mathbb{N}$ groups, we say that $P$ is asymptotically resolute if the voting measure is asymptotically resolute for each group.

Within each group, each voter has a single vote and the quota is set to $1/2$. This is called majority voting and it is the only voting rule in yes/no voting that has a number of desirable properties as established by May’s Theorem in 1952 [29]. It is the natural choice in this context. In the council, each group has a certain weight assigned to it. This weight will first and foremost depend on the size of the group; however, as we shall later see, the underlying structure given by the voting measure plays a crucial role as well. The representatives cast their votes according to the will of the people in their respective groups and the proposal is accepted if the quota of $1/2$ is met in the council vote.

One could think of a lot of different criteria by which to assign the weights. It is crucial that these criteria be informed by some objective to be achieved, rather than presupposing the end result. A common opinion outside the field of social choice theory is that the one obviously just assignment of weights is proportionality, where each group receives a weight proportional to its size. However, this choice has unintended consequences, such as possibly making the influence each voter has on the outcome of a vote different, depending on which group said voter is a member of.

Given a voting system and a voting measure, we can calculate the probability that a given voter $v$ in one of the groups decides the council vote. According to Definition 2.1.4, we need to calculate the probability of a winning coalition with member $v$. In a two-tier voting system, for $v$ to be decisive, $v$ has to decide her group’s vote in her favour, and additionally her group’s representative must decide the council vote.

Consider the case that all voters act independently of each other. Penrose showed in 1946 [30] that in order to make every voter across all groups equally likely to be decisive, i.e. to equalise the ‘influence’ of all voters, the weights should be assigned proportionally to the square root of each group’s size. The probability of $v$’s being decisive is the product of the probabilities of the voter’s being decisive in her group and the representative’s vote’s being decisive in the council vote. The probability of the representative’s vote deciding the council vote is approximately proportional to the weight assigned to that group. The probability of $v$ deciding her group’s vote depends on the voting measure $P$. Under the independence assumption, it is easy to show using Stirling’s formula that this is asymptotically proportional to $1/\sqrt{N}$. In order to make the probability of decisiveness equal across groups, the weight of each group should therefore be proportional to the square root of each group’s size.
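The $1/\sqrt{N}$ behaviour can be illustrated numerically. The sketch below assumes an odd group size $N$ and independent fair votes, so that a voter decides her group's vote exactly when the other $N - 1$ voters are evenly split; the function names and the constant $\sqrt{2/(\pi N)}$ obtained from Stirling's formula are part of this illustration rather than of the text above.

```python
from math import comb, pi, sqrt

def prob_decisive(n_voters):
    """Exact probability that the other n_voters - 1 independent fair votes
    split evenly (n_voters is assumed to be odd)."""
    others = n_voters - 1
    return comb(others, others // 2) / 2 ** others

def stirling_approx(n_voters):
    """Leading-order approximation sqrt(2 / (pi * N)) from Stirling's formula."""
    return sqrt(2 / (pi * n_voters))

def relative_error(n_voters):
    """Relative gap between the exact value and the approximation."""
    return abs(prob_decisive(n_voters) / stirling_approx(n_voters) - 1)
```

If the probability that a group's representative decides the council vote is taken to be proportional to the weight $w_\nu$, then the product $w_\nu / \sqrt{N_\nu}$ is the same for all groups precisely when $w_\nu$ is proportional to $\sqrt{N_\nu}$.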

Later on, Banzhaf defined the power index named after him in [3]. Equalising the Banzhaf index of all voters is equivalent to equalising the influence of a voter in [30].

Whenever we talk about the asymptotic behaviour of some expression, we will be referring to the following definition:

Definition 2.2.5. Let $f$ and $g$ be functions of $N$. We write $f \approx g$ if $\lim_{N \to \infty} \frac{f(N)}{g(N)} = 1$.

Penrose’s square root law has been the gold standard in the field of voting theory. However, intuitively speaking, the assumption that all voters are independent seems unrealistic. Gelman, Katz, and Tuerlinckx [14] and Gelman, Katz, and Bafumi [13] criticised the square root law based on statistical evidence: as we will discuss in Section 2.3, under independence of voters in different groups, the optimal weights are proportional to the expected absolute value of the voting margin within each group. The proportional or per capita absolute voting margin $E(|S|)/N$ is a characteristic of the distribution of voting configurations under a voting measure which can be statistically estimated. Gelman et al. pointed out that the prediction made under the assumption of independence of all voters, which leads to a binomial distribution of the sum of all votes, is that, as $N$ goes to infinity, $E(|S|)/N$ should behave like $1/\sqrt{N}$. This would imply that larger countries or constituencies should on average have far closer elections, i.e. smaller voting margins, than smaller countries. Gelman et al. argued that although this can be observed in the data on U.S. and European elections, the voting margins decrease with a far lower power of $N$ than the $1/2$ that leads to the square root law. As a result, since the expected voting margin goes to zero more slowly, the probability that a specific voter is decisive within her group decreases faster with group size, and the weight of her group should be higher to compensate. This led Gelman to suggest a power rule in which the weight $w_\nu$ should be proportional to something like $N_\nu^{0.9}$ rather than the square root.

Wojciech Slomczynski, Tomasz Zastawniak, and Karol Zyczkowski [34, 35, 33, 36, 37] have extensively analysed voting in the European Union, placing special emphasis not only on the weights assigned to each member but also the optimal quota in the council. Their research was motivated in part by the suboptimality of the previous systems established in the Treaties of Nice and Lisbon as far as the equalisation of the voting power of voters living in different countries is concerned.

In this thesis, we shall follow the lead of Kirsch and Langner [20, 21], and specify as the objective making the probability distribution of the council vote as close to the distribution of the popular vote as possible. The popular vote is given by

$$P := \sum_{i=1}^{N} X_i.$$

The council vote is given by

$$C := \sum_{\lambda=1}^{M} w_\lambda \chi_\lambda.$$

Then we define

Definition 2.2.6. The democracy deficit given a voting measure $P$ and a set of weights $w_1, \dots, w_M$ is defined by

$$\Delta := E\left[ (C - P)^2 \right],$$

where the expectation is taken with respect to $P$ over all voting configurations and corresponding popular and council votes as defined above.

In this thesis we shall define and analyse different voting measures and determine which assignment of weights in the council minimises the democracy deficit in each case. Whenever we say ‘optimal weights’, we will be referring to the weights that minimise said deficit.
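For very small populations the democracy deficit can be computed exactly by enumerating all voting configurations. The following sketch does this under the assumption that all voters are independent and every configuration has probability $2^{-N}$; the function names are illustrative. It can be used to check, for odd group sizes, the statement quoted above that under independence the optimal weights are given by the expected absolute voting margins of the groups.

```python
from fractions import Fraction
from itertools import product

def chi(votes):
    """Majority vote of one group; draws default to 'no' as in Definition 2.2.1."""
    return 1 if sum(votes) > 0 else -1

def democracy_deficit(group_sizes, weights):
    """Exact E[(C - P)^2] when every configuration in {-1,1}^N is equally likely."""
    n_total = sum(group_sizes)
    total = Fraction(0)
    for config in product((-1, 1), repeat=n_total):
        popular = sum(config)                      # popular vote P
        council, start = Fraction(0), 0
        for size, w in zip(group_sizes, weights):  # council vote C
            council += w * chi(config[start:start + size])
            start += size
        total += (council - popular) ** 2
    return total / 2 ** n_total

def expected_abs_margin(size):
    """E|S| for a single group of independent fair voters."""
    margins = (abs(sum(c)) for c in product((-1, 1), repeat=size))
    return Fraction(sum(margins), 2 ** size)
```

For two groups of sizes 3 and 5, the expected absolute margins are 3/2 and 15/8, and `democracy_deficit((3, 5), (Fraction(3, 2), Fraction(15, 8)))` evaluates to 143/64, which is smaller than the value obtained with, for instance, weights proportional to the group sizes. The enumeration visits $2^N$ configurations, so it is only practical for a handful of voters.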

2.3 Previous Results

Aside from the aforementioned square root law due to Lionel Penrose, the question of how to assign weights in a council in a two-tier voting system has been analysed by Kirsch and Langner [20, 21]. Kirsch [20] determined that in a situation where all voters in different groups are independent of each other, the optimal weights are obtained by calculating the expectations of the absolute value of the voting margins $E(|S_\nu|)$ for each group $\nu = 1, 2, \dots, M$. If the voters within the groups are independent as well, we recover the square root law, as the $E(|S_\nu|)$ are proportional to the square root of the group sizes. Thus minimising the democracy deficit in this case leads to optimal weights proportional to the square root of each group’s size.

The independence model was also described in Jessica Langner’s PhD thesis [27]. It is defined by the voting measure $P_0$ that assigns each voting configuration the probability $\frac{1}{2^N}$.
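Under the independence model the sum of all votes follows a shifted binomial distribution, so the expected absolute voting margin can be computed exactly. The sketch below compares it with $\sqrt{2N/\pi}$, the leading-order value that follows from the central limit theorem; this constant and the function names are additions for illustration.

```python
from math import comb, pi, sqrt

def expected_abs_margin(n):
    """Exact E|S| when S is the sum of n independent fair +/-1 votes.
    With k 'yes' votes the margin is 2k - n, which has probability C(n, k) / 2^n."""
    return sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1)) / 2 ** n

def clt_approx(n):
    """Leading-order approximation sqrt(2n / pi)."""
    return sqrt(2 * n / pi)

def per_capita_margin(n):
    """E|S| / n, the per capita absolute voting margin."""
    return expected_abs_margin(n) / n
```

Increasing the population by a factor of 100, for example from 100 to 10000 voters, reduces `per_capita_margin` by a factor close to 10, in line with the $1/\sqrt{N}$ behaviour of $E(|S|)/N$ under independence.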

Additional models studied by Kirsch and Langner were the collective bias model (CBM) and the cooperation or Curie-Weiss model (CWM). The collective bias model describing a single group is defined as follows:

Definition 2.3.1. Let $\mu$ be a probability measure on $[-1,1]$ that has the symmetry property

$$\mu([a,b]) = \mu([-b,-a])$$

for all intervals $[a,b] \subset [-1,1]$. Let $Z$ be a random variable on $[-1,1]$ with distribution $\mu$. For each voter $i$ and realisation $\zeta$ of $Z$, let the conditional distribution of their vote $X_i$ be given by

$$P_\zeta(X_i = 1) := \frac{1}{2}(1 + \zeta).$$

Let us assume that conditionally on $\zeta$ each voter casts her vote independently of the others. Then the conditional probability of each voting outcome $(x_1, \dots, x_N)$ is given by

$$\prod_{i=1}^{N} P_\zeta(X_i = x_i).$$

We then define the voting measure $P$ of the CBM by setting for each $(x_1, \dots, x_N)$

$$P(X_1 = x_1, \dots, X_N = x_N) := \int_{[-1,1]} \prod_{i=1}^{N} P_\zeta(X_i = x_i) \, d\mu(\zeta).$$

Under the premise that voters of different groups do not influence each other, we can assume that each group is separately described by a CBM. Kirsch showed that the optimal weights are then proportional to the group sizes. This happens because in each group the common bias sets the political agenda for the voters in such a way that a majority vote in favour of the alternative preferred by the bias variable is asymptotically almost sure to occur. Therefore, the expected voting margin increases linearly with the group’s size. Langner also analysed a CBM where there is a single global bias variable that affects voters in all groups equally. She found the optimal weights to be indeterminate due to the perfect positive asymptotic correlation between the different groups’ votes under this setup. The optimal weights given by the first order condition describing the weights that minimise the democracy deficit only have to sum to a certain constant. Due to Remark 2.1.7, this means any assignation of weights is optimal. Intuitively speaking, it does not matter how the weights are assigned, since everybody votes the same way.

It is important to mention that Philip Straffin [38] already considered in 1982 a special case of the CBM, where the bias variable $Z$ follows a uniform distribution on $[-1,1]$. We shall analyse a similar model in Chapter 4, as the simplicity of the uniform distribution allows us to calculate the solution of the model explicitly.
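For the uniform bias the mixture integral in Definition 2.3.1 can be evaluated in closed form, which gives a quick illustration of the linear growth of the expected voting margin. In the sketch below the substitution $p = (1 + \zeta)/2$ turns the integral into a Beta integral; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def prob_yes_count(n, k):
    """P(exactly k 'yes' votes among n) in a CBM with uniform bias on [-1, 1].

    With p = (1 + zeta) / 2 uniform on [0, 1], the probability equals
    C(n, k) * integral_0^1 p^k (1 - p)^(n - k) dp = C(n, k) * k! (n - k)! / (n + 1)!.
    """
    beta_integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta_integral

def expected_abs_margin(n):
    """Exact E|S| for the sum of votes S = 2k - n under the uniform-bias CBM."""
    return sum(prob_yes_count(n, k) * abs(2 * k - n) for k in range(n + 1))
```

Every count of ‘yes’ votes from 0 to $n$ turns out to be equally likely, with probability $1/(n+1)$. Consequently `expected_abs_margin(n) / n` stays close to 1/2 for large `n` instead of decaying like $1/\sqrt{n}$ as in the independence model.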

The CWM originated in the field of statistical mechanics and is named after Pierre Curie and Pierre Weiss, even though it was first introduced by Husimi [15] and Temperley [39]. Subsequently it was discussed by Kac [17], Thompson [40], and Ellis [9]. It models a magnetic material composed of elementary particles called spins that can either point up or down. The orientation of each spin depends on the orientation of all the other spins. The CWM is an example of a mean field model, as the interaction between particles can be expressed as the interaction between each particle and the mean of all other particles.

The CWM can be naturally interpreted as a model of social voting behaviour, in which voters exhibit the tendency to vote alike.

For each voting configuration, we define an ‘energy level’ given by the Hamiltonian H : {−1,1}^N → R,

\[ H(x_1, \ldots, x_N) := -\frac{1}{2N} \left( \sum_{i=1}^{N} x_i \right)^2 . \]

This energy is lowest whenever all voters agree and highest when the voters are evenly split (or close to it). The model has a single parameter β, the inverse temperature, which regulates how strongly the voters influence each other’s decisions. The voting measure that defines the CWM is then given as follows:

Definition 2.3.2. Let H be defined as above and β ≥ 0. Then we call P a CWM if for each voting configuration (x₁, . . . , x_N)

\[ P(X_1 = x_1, \ldots, X_N = x_N) = Z^{-1} \exp(-\beta H(x_1, \ldots, x_N)), \]

where the normalising constant Z (the partition function, or ‘Zustandssumme’) ensures that P is indeed a probability measure.

Remark 2.3.3. The voting measure defined above is referred to in physics as a Gibbs measure.
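For a small number of voters, the measure in Definition 2.3.2 can be tabulated by brute force. The following is a minimal sketch under that assumption; the function names `hamiltonian`, `cwm_measure` and `margin_distribution` are illustrative and do not appear in the thesis. It enumerates all configurations in {−1,1}^N, weights each one by exp(−βH), and normalises by the sum of the weights.

```python
import itertools
import math


def hamiltonian(config):
    """Energy H(x) = -(x_1 + ... + x_N)^2 / (2N) of a voting configuration."""
    n = len(config)
    return -sum(config) ** 2 / (2 * n)


def cwm_measure(n, beta):
    """Return {configuration: probability} for a CWM with n voters (small n only)."""
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * hamiltonian(x)) for x in configs]
    partition_sum = sum(weights)  # the normalising constant Z
    return {x: w / partition_sum for x, w in zip(configs, weights)}


def margin_distribution(n, beta):
    """Distribution of the voting margin S = X_1 + ... + X_n under the CWM."""
    dist = {}
    for config, prob in cwm_measure(n, beta).items():
        margin = sum(config)
        dist[margin] = dist.get(margin, 0.0) + prob
    return dist


if __name__ == "__main__":
    for beta in (0.0, 0.5, 2.0):
        dist = margin_distribution(10, beta)
        print(beta, {s: round(p, 4) for s, p in sorted(dist.items())})
```

For β = 0 the votes are independent fair coin flips; as β grows, the probability mass of the margin moves towards the unanimous outcomes ±N. The enumeration has 2^N terms, so this sketch is only meant for small N.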

The CWM is in a sense the opposite of a CBM: whereas in the latter the correlation between voters comes from an external bias that affects everybody’s decision and makes a vote for one of the alternatives more likely, in a CWM there is no external bias. Instead, all correlation between the votes comes from a tendency of the voters to align their votes with the votes of the majority. If you remove the effect of the common bias in a CBM, the voters are conditionally independent. Similarly, if you remove the effect the other voters have on an individual, you also obtain conditional independence. De Finetti’s Theorem (first published in [12]; for a discussion see e.g. [1] or [19]) states that an infinite sequence of exchangeable random variables can be expressed as a mixture of conditionally i.i.d. random variables. We say that a sequence of random variables is of de Finetti type if this statement holds. The CWM is of de Finetti type as shown e.g. in Section 5.2 of [19]. Theorem 5.6 there states that


\[ P(x_1, \ldots, x_N) = Z^{-1} \int_{-1}^{1} P_t(x_1, \ldots, x_N) \, \frac{e^{-\frac{N}{2} F(t)}}{1 - t^2} \, \mathrm{d}t. \tag{2.3.1} \]

P_t in the above equation is the N-dimensional product measure with each factor identical to the Rademacher measure with parameter t ∈ [−1,1]. These are some of the distributions that we will encounter in this thesis:

Definition 2.3.4. The Dirac measure or point mass δ_x in x ∈ Rⁿ is defined on the Borel σ-algebra ℬⁿ by

\[ \delta_x(A) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{otherwise,} \end{cases} \]

for each A ∈ ℬⁿ.

The Rademacher measure on ℬ with parameter t ∈ [−1,1] is defined as

\[ \mathrm{Rad}_t = \frac{1}{2} \left( (1 - t)\,\delta_{-1} + (1 + t)\,\delta_{1} \right). \]

The uniform distribution on any bounded B ∈ ℬ will be referred to as U_B.

The (multivariate) normal distribution of dimension n ∈ N with mean µ ∈ Rⁿ and covariance matrix C ∈ R^{n×n} will be referred to as N(µ, C).

The mixing variable t with range [−1,1] in (2.3.1) sums up the interaction between voters in a CWM. If it is positive, the conditionally independent voters will be more likely to vote ‘yes’.

The CWM is on the one hand a relatively simple model, for which solutions have been calculated. On the other hand, it is complex enough to incorporate a phase transition. At the critical value β = 1, the asymptotic behaviour of the mean voting margin S/N switches. The following description of the model has been taken from the articles [22, 23, 24]; a more in-depth explanation can be found in [9, 19]. Different versions of the CWM have been studied with applications to the social sciences in mind; see e.g. [7].

The Curie-Weiss model has a phase transition at β = 1 in the following sense:

\[ \frac{1}{N} S \Longrightarrow \frac{1}{2} \left( \delta_{-m(\beta)} + \delta_{m(\beta)} \right), \tag{2.3.2} \]

where ⇒ denotes convergence in distribution.

For β ≤ 1 we have m(β) = 0, which is the unique solution of the so-called Curie-Weiss equation

\[ \tanh(\beta x) = x. \tag{2.3.3} \]

If β > 1, equation (2.3.3) has exactly three solutions and m(β) is defined to be the unique positive one.
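The value m(β) has no closed form for β > 1, but it is easy to approximate numerically. The sketch below uses plain bisection on g(x) = tanh(βx) − x, which is positive just to the right of 0 and negative at x = 1 when β > 1. The function name `magnetisation`, the bracket [10⁻⁹, 1] and the tolerance are assumptions made for illustration; for β extremely close to 1 the root is tiny and floating-point noise limits the accuracy.

```python
import math


def magnetisation(beta, tol=1e-12):
    """Return m(beta): 0 for beta <= 1, else the positive root of tanh(beta * x) = x."""
    if beta <= 1.0:
        return 0.0
    # For beta > 1, g(x) = tanh(beta * x) - x changes sign exactly once on (0, 1].
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(beta * mid) - mid > 0.0:
            lo = mid  # still to the left of the positive root
        else:
            hi = mid  # at or to the right of the positive root
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for beta in (0.5, 1.0, 1.1, 2.0, 5.0):
        print(beta, magnetisation(beta))
```

By (2.3.2), the two points ±m(β) returned here are where the mean voting margin concentrates for large N.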

Equation (2.3.2) is a substitute for the law of large numbers for i.i.d. random variables.


Moreover, for β < 1, there is a central limit theorem, i.e.

\[ \frac{1}{\sqrt{N}} S_N \Longrightarrow \mathcal{N}\left( 0, \frac{1}{1 - \beta} \right). \tag{2.3.4} \]

For β = 1 there is no such central limit theorem. In fact, the random variables

\[ \frac{1}{N^{3/4}} S_N \tag{2.3.5} \]

converge in distribution to a limit which is not a normal distribution.

Using these results, Kirsch showed that the optimal weights when each group is described by a CWM and voters in different groups do not influence each other are:

1. For β < 1, the optimal weights w_ν are proportional to √N_ν.
2. For β = 1, the optimal weights w_ν are proportional to N_ν^{3/4}.
3. For β > 1, the optimal weights w_ν are proportional to N_ν.
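The three regimes above can be summarised as weights proportional to N_ν^γ with γ ∈ {1/2, 3/4, 1}. The following sketch makes this concrete; it assumes that all groups share the same β and normalises the weights to sum to 1, which is only one possible convention since the result fixes the weights up to a common factor. The function names are illustrative.

```python
def weight_exponent(beta):
    """Exponent gamma such that the optimal weights are proportional to N_nu ** gamma."""
    if beta < 1.0:
        return 0.5
    if beta == 1.0:
        return 0.75
    return 1.0


def optimal_weights(group_sizes, beta):
    """Weights proportional to N_nu ** gamma, normalised to sum to 1 (a convention)."""
    gamma = weight_exponent(beta)
    raw = [size ** gamma for size in group_sizes]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    sizes = [100, 400, 900]
    for beta in (0.5, 1.0, 2.0):
        print(beta, [round(w, 4) for w in optimal_weights(sizes, beta)])
```

With group sizes 100 and 400, the larger group receives twice the weight of the smaller one for β < 1 and four times the weight for β > 1.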

This thesis analyses the situation in which there is correlation between voters not only within each group but across group boundaries as well. We generalise the results for CBMs in Chapter 4 and for CWMs in Chapter 5.

Some of the results presented in Chapter 5 were previously shown in the papers [22, 23, 24] for two groups. Many of these results are generalised to CWMs with any number of groups in this thesis.


Chapter 3

Method of Moments and some Combinatorial Concepts

3.1 Method of Moments

The technique used in this thesis to determine the asymptotic distributions of the normalised voting margins S_ν / N_ν^γ is the method of moments. The power γ > 0 is chosen depending on the model in question, in such a way that the normalised sum converges in distribution. The manuscript [19] provides a detailed exposition of this technique for univariate distributions. We only mention the most important concepts and results.

For measures, the key concept is ‘weak convergence’:

Definition 3.1.1. Let (µ_n) be a sequence of finite measures on R. We say that (µ_n) converges weakly to a finite measure µ, in symbols µ_n ⇒ µ, if for all continuous and bounded functions f on R

\[ \int f(x) \, \mathrm{d}\mu_n(x) \to \int f(x) \, \mathrm{d}\mu(x) \]

holds.

For random variables, we have the concept of convergence in distribution:

Definition 3.1.2. Let (X_n) be a sequence of real random variables. We say that (X_n) converges in distribution to a real random variable X, if the sequence of distributions (P_{X_n}) of (X_n) converges weakly to the distribution P_X of X. We will also write X_n ⇒ X for convergence in distribution.

In order to show weak convergence for probability measures, or convergence in distribution of random variables, we study the asymptotic behaviour of the moments of µ_n. We distinguish between moments and absolute moments:


Definition 3.1.3. For a probability measure µ on R and k ∈ N, we define the k-th absolute moment m̄_k(µ) by

\[ \bar{m}_k(\mu) := \int |x|^k \, \mathrm{d}\mu(x). \]

If this expression is finite, we define the k-th moment m_k(µ) by

\[ m_k(\mu) := \int x^k \, \mathrm{d}\mu(x). \]

The absolute moments of probability measures may not be finite. If any moment is infinite, we cannot use the method of moments to study their convergence. We only treat the case where all moments are finite:

Definition 3.1.4. We say that a probability measure µ is a measure with existing moments if all its absolute moments are finite, i.e. if m̄_k(µ) < ∞ holds for all k ∈ N.

What we would like is a theorem that states ‘convergence of the moments m_k(µ_n) → m_k(µ) implies weak convergence µ_n ⇒ µ’. Unfortunately, this statement is false in general. The most common counterexample given (see e.g. p. 106 in [8]) is the lognormal density

\[ f_0(x) = \frac{1}{\sqrt{2\pi}\, x} \exp\left( -\frac{(\ln x)^2}{2} \right) \]

for all x > 0. If we now define for all a ∈ [−1,1] the function

\[ f_a(x) = f_0(x) \left[ 1 + a \sin(2\pi \ln x) \right] \]

for all x > 0, then the distributions with densities f_a have the same moments as f_0, despite the fact that they are not the same probability distribution.
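The claim that all f_a share the moments of f_0 can be checked numerically. The sketch below substitutes y = ln x, so that the k-th moment becomes the integral of e^{ky} φ(y) [1 + a sin(2πy)] over the real line, with φ the standard normal density, and approximates it by the trapezoid rule. The function names, the integration window and the step count are assumptions chosen for illustration; the reference value e^{k²/2} is the well-known k-th moment of the standard lognormal distribution.

```python
import math


def density(x, a):
    """Perturbed lognormal density f_a(x) for x > 0."""
    log_x = math.log(x)
    f0 = math.exp(-0.5 * log_x * log_x) / (math.sqrt(2.0 * math.pi) * x)
    return f0 * (1.0 + a * math.sin(2.0 * math.pi * log_x))


def lognormal_moment(k, a, half_width=12.0, steps=24000):
    """k-th moment of f_a, computed after the substitution y = ln x (trapezoid rule)."""
    # The integrand e^{ky} * phi(y) is a Gaussian bump centred at y = k.
    lo, hi = k - half_width, k + half_width
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        y = lo + j * h
        end_weight = 0.5 if j in (0, steps) else 1.0
        gaussian = math.exp(k * y - 0.5 * y * y) / math.sqrt(2.0 * math.pi)
        total += end_weight * gaussian * (1.0 + a * math.sin(2.0 * math.pi * y))
    return total * h


if __name__ == "__main__":
    for k in range(5):
        print(k, [lognormal_moment(k, a) for a in (-1.0, 0.0, 1.0)], math.exp(k * k / 2))
```

The densities themselves differ, for example at x = e^{1/4}, where sin(2π ln x) = 1, yet the computed moments agree for every integer k up to numerical error.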

However, it is possible to constrain the set of probability measures in such a way that, within this set, the convergence of moments implies weak convergence:

Definition 3.1.5. Let µ be a probability measure on R. We say that µ has ‘moderately growing moments’ if it has existing moments and there are constants A and C such that

\[ m_k(\mu) \le A C^k k! \]

holds for all k ∈ N.

This property is not too restrictive for our purposes, since it applies to the two most important types of distributions we will concern ourselves with:

Proposition 3.1.6. Let X be a bounded real random variable and let Y be normally distributed with mean 0 and variance σ². Then X and Y have moderately growing moments.

Proof. Since X is bounded, there is an upper bound C ∈ R such that |X| ≤ C. Then for all k ∈ N the k-th absolute moment satisfies

\[ \bar{m}_k(P_X) = \int |x|^k \, \mathrm{d}P_X \le C^k, \]

and so the moments of X are moderately growing.

For the normal distribution of Y it is well-known (see e.g. Proposition 2.44 in [19]) that the centred moments of order k are 0 if k is odd, and if k is even,

\[ m_k(P_Y) = \bar{m}_k(P_Y) = (k-1)!! \, (\sigma^2)^{k/2} \le k! \, (\sigma^2)^{k/2}, \]

where the double factorial is defined recursively by

\[ 1!! := 0!! := 1, \qquad (k+2)!! := (k+2)\, k!! \]

for all k ∈ N_0.
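The bound used for the normal distribution can be checked directly. This is a small sketch with illustrative function names; it evaluates the double factorial from its recursive definition and confirms that the even moments (k−1)!! (σ²)^{k/2} stay below k! (σ²)^{k/2}, i.e. that the constants A = 1 and C = σ work in the definition of moderately growing moments.

```python
import math


def double_factorial(k):
    """k!! with 0!! = 1!! = 1 and (k + 2)!! = (k + 2) * k!!."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def normal_moment(k, sigma2):
    """k-th moment of N(0, sigma2): 0 for odd k, (k - 1)!! * sigma2^(k/2) for even k."""
    if k % 2 == 1:
        return 0.0
    return double_factorial(k - 1) * sigma2 ** (k // 2)


if __name__ == "__main__":
    sigma2 = 1.5
    for k in range(1, 11):
        bound = math.factorial(k) * sigma2 ** (k / 2)
        print(k, normal_moment(k, sigma2), bound)
```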

Having moderately growing moments is sufficient to ensure that the moments determine the distribution uniquely:

Theorem 3.1.7. Suppose µ is a probability measure with moderately growing moments and let ν be a finite measure such that

\[ m_k(\mu) = m_k(\nu) \]

for all k ∈ N. Then the two measures are equal, µ = ν.

This is Theorem 2.55 in [19] and is proved there. If we have calculated the limits of the moment sequences

\[ m_k := \lim_{n \to \infty} m_k(\mu_n), \]

then we need to know whether there is a probability measure µ that has precisely those moments. The next theorem assures us that this is indeed the case.

Theorem 3.1.8. Suppose (µ_n) is a sequence of probability measures and (m_k) is a real sequence that is moderately growing, i.e.

\[ m_k \le A C^k k! \]

holds for all k ∈ N. If

\[ m_k(\mu_n) \xrightarrow{n \to \infty} m_k \]

holds for all k ∈ N, then there is a unique probability measure µ with m_k(µ) = m_k for all k ∈ N and

\[ \mu_n \Longrightarrow \mu. \]

See Theorem 2.56 in [19] for a proof.

In summary, the method of moments refers to the calculation of the limiting moments of a sequence of random variables to determine the limiting distribution.
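As a simple illustration of the method, consider n independent fair ±1 votes, which is an assumption made here for the example and not one of the models of this thesis. The moments of S_n/√n can be computed exactly from the binomial distribution and compared with the moments of the standard normal distribution, which are 0 for odd k and (k−1)!! for even k. The function name `margin_moment` is illustrative.

```python
import math


def margin_moment(n, k):
    """Exact k-th moment of S_n / sqrt(n) for n independent fair +/-1 votes."""
    total = 0.0
    for ups in range(n + 1):
        prob = math.comb(n, ups) / 2 ** n  # P(exactly `ups` votes equal +1)
        margin = 2 * ups - n               # value of S_n in that case
        total += prob * (margin / math.sqrt(n)) ** k
    return total


if __name__ == "__main__":
    # Standard normal moments of order 1..6 are 0, 1, 0, 3, 0, 15.
    for n in (10, 100, 400):
        print(n, [round(margin_moment(n, k), 4) for k in range(1, 7)])
```

The fourth moment equals 3 − 2/n exactly, so it approaches the normal value 3 as n grows. Since the normal distribution has moderately growing moments, convergence of all moments yields convergence in distribution by Theorem 3.1.8.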

In this thesis we deal with sequences of random vectors and their asymptotic behaviour. We therefore need to generalise the method of moments to the study of random vectors.

For each n ∈ N, let X_n be an m-dimensional real random vector. What are the moments of a random vector X_n? Following the presentation by Kleiber and Stoyanov in [25], we define multivariate moments.

Definition 3.1.9. Let X be an m-dimensional real random vector and let k = (k₁, . . . , k_m) ∈ N_0^m. Then we define the absolute moment of order k of X

\[ \bar{m}_k(P_X) := \int_{\mathbb{R}^m} \left| x_1^{k_1} \cdots x_m^{k_m} \right| \, \mathrm{d}P_X. \]

If this expression is finite, then we define the k-th moment of X

\[ m_k(P_X) := \int_{\mathbb{R}^m} x_1^{k_1} \cdots x_m^{k_m} \, \mathrm{d}P_X. \]

Remark 3.1.10. Let 0 := (0, 0, . . . , 0) ∈ N_0^m stand for the zero vector. The dimension m should be obvious from the context.

As we will see shortly, the one-dimensional marginal moments are key to our analysis of the convergence of random vectors.

Definition 3.1.11. We call the moments of order (0, . . . , 0, k_i, 0, . . . , 0) the i-th one-dimensional marginal moments of order k_i.

The question is: do random vectors converge in distribution if all their moments converge, just as in the univariate case? The answer is: it depends! Fortunately, there is a handy sufficient condition that ensures this is the case. Petersen showed in [31] that if all one-dimensional marginal distributions are uniquely determined by their one-dimensional marginal moments, then the joint m-dimensional distribution is uniquely determined by its multivariate moments.

This is Theorem 3 in [31]:

Theorem 3.1.12. Let µ be a probability measure on R^m with the property that each one-dimensional marginal distribution µ_i, i = 1, . . . , m, is uniquely determined by its moments. Then µ itself is uniquely determined by its multivariate moments.

This is sufficient for our purposes because it allows us to show that

Proposition 3.1.13. Let X be a bounded m-dimensional real random vector and Y be a random vector following a multivariate normal distribution with mean 0 and covariance matrix C. Then these distributions are uniquely determined by their multivariate moments.


Proof. Since X is bounded, each component X_i is a bounded real random variable. Hence its moments, which are the one-dimensional marginal moments of X, are moderately growing by Proposition 3.1.6. Therefore, the marginal distributions are uniquely determined by their moments, according to Theorem 3.1.7. We can thus conclude by Theorem 3.1.12 that the distribution of X is uniquely determined by its multivariate moments.

Similarly, for Y we have one-dimensional marginal distributions which are univariate normal. By the same reasoning, the distribution of Y is uniquely determined by its moments.

We will be dealing with centred multivariate normal distributions (recall Definition 2.3.4) a lot, so we define

Definition 3.1.14. Let C be a positive semi-definite n×n matrix, and k_1, . . . , k_n ∈ N_0. Then we set

\[ m_{k_1, \ldots, k_n}(C) := m_{k_1, \ldots, k_n}(\mathcal{N}(0, C)). \]

3.2 Combinatorial Concepts

Whenever we use the method of moments, we will have to evaluate sums of the form

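\[ E\left[ \left( \sum_{i=1}^{N_1} X_i \right)^{K} \left( \sum_{j=1}^{N_2} Y_j \right)^{L} \right] = \sum_{i_1, \ldots, i_K} \sum_{j_1, \ldots, j_L} E\left( X_{i_1} X_{i_2} \cdots X_{i_K} \, Y_{j_1} Y_{j_2} \cdots Y_{j_L} \right). \tag{3.2.1} \]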

To do the book-keeping for these huge sums we introduce a few combinatorial concepts taken from [19].

Let |A| stand for the cardinality of the set A.

Definition 3.2.1. We define a multiindex i = (i_1, i_2, . . . , i_L) ∈ {1, 2, . . . , N}^L.

1. For j ∈ {1, 2, . . . , N} we set

\[ \nu_j(i) := \left| \{ k \in \{1, 2, \ldots, L\} \mid i_k = j \} \right|. \]

2. For ℓ = 0, 1, . . . , L we define

\[ \rho_\ell(i) := \left| \{ j \mid \nu_j(i) = \ell \} \right| \]

and

\[ \rho(i) := (\rho_1(i), \ldots, \rho_L(i)). \]


The numbers ν_j(i) represent the multiplicity of each index j ∈ {1, 2, . . . , N} in the multiindex i, and ρ_ℓ(i) represents the number of indices in i that occur exactly ℓ times. We shall call ρ(i) the profile of the multiindex i.

Lemma 3.2.2. For all i = (i₁, i₂, . . . , i_L) ∈ {1, 2, . . . , N}^L we have

\[ \sum_{\ell=1}^{L} \ell \, \rho_\ell(i) = L. \]
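The multiplicities and the profile of a multiindex are straightforward to compute. The following sketch, with illustrative function names, represents a multiindex as a tuple of integers in {1, . . . , N} and also checks the identity of Lemma 3.2.2 on an example.

```python
from collections import Counter


def multiplicities(index, n):
    """nu_j(i) for j = 1, ..., n: how often the index j occurs in the multiindex."""
    counts = Counter(index)
    return [counts.get(j, 0) for j in range(1, n + 1)]


def profile(index, n):
    """rho(i) = (rho_1(i), ..., rho_L(i)); rho_l counts the indices occurring exactly l times."""
    length = len(index)
    by_multiplicity = Counter(multiplicities(index, n))
    return tuple(by_multiplicity.get(l, 0) for l in range(1, length + 1))


if __name__ == "__main__":
    i = (3, 1, 3, 5, 3, 1)            # a multiindex with L = 6 and N = 6
    rho = profile(i, 6)
    print(multiplicities(i, 6), rho)
    # Lemma 3.2.2: the sum of l * rho_l over l = 1, ..., L equals L.
    print(sum(l * r for l, r in enumerate(rho, start=1)) == len(i))
```

In the example, the index 5 occurs once, the index 1 twice and the index 3 three times, so the profile is (1, 1, 1, 0, 0, 0).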

We use this basic property of profiles to define

Definition 3.2.3. Let r = (r_1, . . . , r_L) be such that Σ_{ℓ=1}^{L} ℓ r_ℓ = L holds. We call r a profile vector. We define

\[ w_L(r) = \left| \{ i \in \{1, \ldots, N\}^L \mid \rho(i) = r \} \right| \]

to represent the number of multiindices i that have a given profile vector r.

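We now define the set of all profile vectors for a given L ∈ N.

Definition 3.2.4. Let

\[ \Pi^{(L)} = \left\{ r \in \{0, 1, \ldots, L\}^L \;\middle|\; \sum_{\ell=1}^{L} \ell \, r_\ell = L \right\}. \]

Some important subsets of Π^{(L)} are

\[ \Pi^{(L)}_k = \{ r \in \Pi^{(L)} \mid r_1 = k \}, \]
\[ \Pi^{0(L)} = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \ge 3 \}, \]
\[ \Pi^{+(L)} = \{ r \in \Pi^{(L)} \mid r_\ell > 0 \text{ for some } \ell \ge 3 \}. \]

We can also combine superscripts and subscripts. Then we have, e.g.,

\[ \Pi^{0(L)}_0 = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \neq 2 \}. \]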

We shall write X_i = X_{i_1} · · · X_{i_L} for any i ∈ {1, 2, . . . , N}^L. For any r ∈ Π^{(L)} let j ∈ {1, 2, . . . , N}^L be such that ρ(j) = r. Then we let X_r stand for X_j. This definition is not problematic if we are only interested in the expectation

\[ E(X_r) = E(X_j), \]

and the random variables X₁, . . . , X_N are exchangeable.

If there are M sets {1, 2, . . . , N_ν}^{L_ν}, and for each ν we have i_ν ∈ {1, 2, . . . , N_ν}^{L_ν}, then we set i := (i_ν) and write X_i for X_{i_1} · · · X_{i_M}. Similarly, if we have profile vectors r_ν ∈ Π^{(L_ν)}, and j_ν ∈ {1, 2, . . . , N_ν}^{L_ν} such that ρ(j_ν) = r_ν, then we write X_r for X_{j_1} · · · X_{j_M}.

Proposition 3.2.5. For r ∈ Π^{(L)} set r_0 := N − Σ_{ℓ=1}^{L} r_ℓ. Then

\[ w_L(r) = \frac{N!}{r_1! \, r_2! \cdots r_L! \, r_0!} \, \frac{L!}{1!^{r_1} \, 2!^{r_2} \cdots L!^{r_L}}. \]

If we let N go to infinity, then we have

\[ w_L(r) \approx \frac{N^{\sum_{\ell=1}^{L} r_\ell}}{r_1! \, r_2! \cdots r_L!} \, \frac{L!}{1!^{r_1} \, 2!^{r_2} \cdots L!^{r_L}}. \]

This proposition is based on Theorem 3.14 and Corollary 3.18 in [19].
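The counting formula of Proposition 3.2.5 can be compared with a brute-force enumeration for small N and L. The sketch below uses illustrative function names. It evaluates N!/r_0! as the falling factorial N(N−1)···(N−Σ r_ℓ+1) via `math.perm` (available from Python 3.8), which avoids computing N! for large N, and it also includes the asymptotic expression for comparison.

```python
import itertools
import math
from collections import Counter


def profile(index):
    """Profile rho(i) of a multiindex given as a tuple of indices."""
    per_index = Counter(index)                      # multiplicity of each occurring index
    per_multiplicity = Counter(per_index.values())  # number of indices per multiplicity
    return tuple(per_multiplicity.get(l, 0) for l in range(1, len(index) + 1))


def arrangements(r):
    """L! / (1!^r_1 * 2!^r_2 * ... * L!^r_L) with L = sum of l * r_l."""
    length = sum(l * r_l for l, r_l in enumerate(r, start=1))
    result = math.factorial(length)
    for l, r_l in enumerate(r, start=1):
        result //= math.factorial(l) ** r_l
    return result


def count_formula(n, r):
    """w_L(r) according to Proposition 3.2.5."""
    distinct = sum(r)                   # number of distinct indices, N - r_0
    if distinct > n:
        return 0
    choices = math.perm(n, distinct)    # equals N! / r_0!
    for r_l in r:
        choices //= math.factorial(r_l)
    return choices * arrangements(r)


def count_asymptotic(n, r):
    """Leading-order approximation of w_L(r) for large N."""
    denominator = math.prod(math.factorial(r_l) for r_l in r)
    return n ** sum(r) / denominator * arrangements(r)


def count_brute_force(n, length):
    """Number of multiindices in {1, ..., n}^length for each occurring profile."""
    return Counter(profile(i) for i in itertools.product(range(1, n + 1), repeat=length))


if __name__ == "__main__":
    for r, count in sorted(count_brute_force(4, 3).items()):
        print(r, count, count_formula(4, r))
```

For N = 4 and L = 3, the profiles (3,0,0), (1,1,0) and (0,0,1) are attained by 24, 36 and 4 multiindices respectively, which add up to 4³ = 64.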


Chapter 4

Collective Bias Models

In this chapter we shall present two different definitions of a collective bias model with idiosyncratic or group biases that exhibit a correlation between votes in different groups. Model A is a multiplicative model, in the sense that the global bias variable and each group’s bias scale are multiplied to determine the bias prevailing among voters of each group. Model B is an additive model where the global bias variable and a bias variable for each group are added to determine that group’s bias. The focus of attention is the asymptotic behaviour of the mean voting margins S_λ/N_λ, so this is an example of a model where the correct normalising power γ mentioned at the beginning of Chapter 3 happens to be 1. As we will see, the resulting voting behaviour is different: in model A, there is a strong tendency for most voters in all groups to vote alike; in model B, this tendency can be somewhat weaker.

4.1 Model A

We define a model based on the collective bias model presented in chapter 10 of [27]. Let Z be the random variable that represents a collective bias across all M groups and µ its probability distribution on [−1,1] that satisfies the symmetry condition µ([a, b]) = µ([−b, −a]) for all −1 ≤ a ≤ b ≤ 1. Let Y_1, . . . , Y_M be random variables with range (0,1] and probability distributions µ_1, . . . , µ_M, respectively. Assume that the Y_ν are independent both of each other and of Z. We now define for each group ν a random variable Z_ν := Y_ν Z that represents the collective bias within that group. As we can see, biases will not be identical across groups but they will be correlated with each other. Hence voters will be more strongly correlated within each group than across group borders. The motivation behind this definition is to have this dependency across borders while maintaining a simple structure to the model (hence the independence of all variables Z, Y_1, . . . , Y_M). Note that since all Y_ν are positive, the bias in all groups will be of the same sign, thereby giving rise to positive correlations of votes across all member states.

In general, we shall use the letter ζ to denote a realisation of the random variable Z and υ_ν for a realisation of Y_ν.

Definition 4.1.1. (Collective Bias Measure A) Let (X_{λi})_{λ=1,...,M, i=1,...,N_λ} be random variables such that for all ν ∈ {1, . . . , M} and all i ∈ {1, . . . , N_ν}, X_{νi} has the conditional distribution

\[ P_{\zeta \upsilon_\nu} = \mathrm{Rad}_p, \qquad p = \frac{1}{2} (1 + \zeta \upsilon_\nu). \]

Then the Collective Bias Measure A (CBM-A) P, which depends on the distributions µ, µ_1, . . . , µ_M, is defined as

\[ P(X) := \int_{[-1,1]} \int_{[0,1]} \cdots \int_{[0,1]} \prod_{i=1}^{N} \left( (1-p)\, \delta_{-1}(\{X_i\}) + p\, \delta_{1}(\{X_i\}) \right) \mathrm{d}\mu \, \mathrm{d}\mu_1 \cdots \mathrm{d}\mu_M \]

for all X ∈ {−1,1}^N. The measures µ, µ_1, . . . , µ_M are the distributions of Z, Y_1, . . . , Y_M. Note that in the above definition p depends on the values ζ, υ_1, . . . , υ_M.

Remark 4.1.2. A CBM-A satisfies the symmetry condition (2.1.1). This follows directly from the assumption that the measure µ is symmetric on [−1,1].

We now present some results for this model that we will use in Chapter 6. We will be using the symbols E, E_µ, E_{µ_ν} for the expectations under P, µ, µ_ν, respectively. The symbols E_{ζ,υ_1,...,υ_M} and P_{ζ,υ_1,...,υ_M} are conditional expectations and probabilities, respectively, given the values Z = ζ, Y_ν = υ_ν.

Theorem 4.1.3. For all ν, ν' ∈ {1, . . . , M}, ν ≠ ν':

1. E(X_{νi} X_{νj}) = E_µ(Z²) E_{µ_ν}(Y_ν²) for all i, j ∈ {1, . . . , N_ν}, i ≠ j.

2. E(X_{νi} X_{ν'j}) = E_µ(Z²) E_{µ_ν}(Y_ν) E_{µ_ν'}(Y_ν') for all i ∈ {1, . . . , N_ν}, j ∈ {1, . . . , N_ν'}.

3. E(X_{i_1} · · · X_{i_k}) = E_µ(Z^k) E_{µ_1}(Y_1^{k_1}) · · · E_{µ_M}(Y_M^{k_M}) for all 1 ≤ k ≤ N and all {i_1, . . . , i_k} ⊂ {1, . . . , N}, where k_ν represents the number of voters among i_1, . . . , i_k that belong to group ν, Σ_{ν=1}^{M} k_ν = k.

4. E(S_ν χ_ν') ≈ N_ν E_µ(|Z|) E_{µ_ν}(Y_ν) ≈ E(|S_ν|) = E(S_ν χ_ν).

5. E(χ_ν χ_ν') ≈ µ(Z ≠ 0).

Note that if the distributions of Y_ν and Y_ν' are identical, then 1. and 2. in the preceding theorem imply

\[ E(X_{\nu i} X_{\nu' j}) = E_\mu(Z^2) \, E_{\mu_\nu}(Y_\nu) \, E_{\mu_{\nu'}}(Y_{\nu'}) \le E_\mu(Z^2) \, E_{\mu_\nu}(Y_\nu^2) = E(X_{\nu i} X_{\nu j}), \]

where the inequality is due to the Cauchy-Schwarz inequality. As expected, voters within groups are more strongly correlated than across group borders.
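Statements 1. and 2. of the theorem can be verified exactly when Z and the Y_ν take finitely many values. The sketch below does this for two groups; the discrete distributions `Z_DIST` and `Y_DISTS` and all function names are hypothetical choices made for illustration. It conditions on every combination (ζ, υ_1, υ_2), uses the conditional ‘yes’ probability p = (1 + ζυ_ν)/2 from Definition 4.1.1, and sums over the four possible vote pairs of two distinct voters.

```python
import itertools

# Hypothetical discrete distributions {value: probability}, for illustration only.
Z_DIST = {-0.8: 0.25, -0.2: 0.25, 0.2: 0.25, 0.8: 0.25}  # symmetric global bias Z
Y_DISTS = [{0.5: 0.5, 1.0: 0.5}, {0.25: 0.3, 0.75: 0.7}]  # bias scales Y_1, Y_2 in (0, 1]


def expectation(dist, f):
    """Expectation of f under a discrete distribution."""
    return sum(prob * f(value) for value, prob in dist.items())


def pair_expectation(group_a, group_b):
    """E(X_a X_b) for two distinct voters, one from group_a and one from group_b."""
    total = 0.0
    for zeta, p_zeta in Z_DIST.items():
        for ys in itertools.product(*(d.items() for d in Y_DISTS)):
            weight = p_zeta
            for _, p_y in ys:
                weight *= p_y
            # Conditional 'yes' probability p = (1 + zeta * upsilon) / 2 in each group.
            p_yes = [0.5 * (1.0 + zeta * y) for y, _ in ys]
            # Given (zeta, upsilon_1, upsilon_2), the two votes are independent.
            for x_a, x_b in itertools.product((-1, 1), repeat=2):
                prob_a = p_yes[group_a] if x_a == 1 else 1.0 - p_yes[group_a]
                prob_b = p_yes[group_b] if x_b == 1 else 1.0 - p_yes[group_b]
                total += weight * prob_a * prob_b * x_a * x_b
    return total


if __name__ == "__main__":
    ez2 = expectation(Z_DIST, lambda z: z * z)
    print("within group 1:", pair_expectation(0, 0),
          ez2 * expectation(Y_DISTS[0], lambda y: y * y))
    print("across groups: ", pair_expectation(0, 1),
          ez2 * expectation(Y_DISTS[0], lambda y: y) * expectation(Y_DISTS[1], lambda y: y))
```

Each printed pair should agree up to rounding, since the conditional mean of a vote in group ν is 2p − 1 = ζυ_ν and the independent factors Z, Y_1, Y_2 separate in the expectation.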

Proof. The key to the proof of this theorem is the feature of this model that conditionally on the values ζ, υ_1, . . . , υ_M all random variables X_{νi} are independent of each other. In the following calculations we use the definitions of P and E as given above.


Contents

Chapter 1

Introduction

Chapter 2

Voting Systems: Definitions and Previous Results

2.1 Voting Systems

2.2 Two-Tier Voting Systems and the Democracy Deficit

2.3 Previous Results

Chapter 3

Method of Moments and some Combinatorial Concepts

3.1 Method of Moments

3.2 Combinatorial Concepts

Chapter 4

Collective Bias Models

4.1 Model A