

In the document Tartu 2020 (pages 51-54)

2. Research Design and Data

2.4. Operationalization and model specification

To proceed with data collection and analysis, the variables must be operationalized and the model for estimating the covariate effects must be specified. Legislative speeches are the units of analysis, and each has its own characteristics. As mentioned above, the topical prevalence model has been chosen to reveal the topic structure of the corpus of legislative speeches made in the House of Commons and its dependence on metadata. In line with the formulated hypotheses, the following metadata are included as ‘prevalence covariates’ in the model:

• Whether the speaker is a cabinet minister (cabinet membership status)
• The party affiliation of the speaker
• The degree of the speaker’s (MP’s) doubt about the Brexit deal
• The time when the speech took place

Moreover, the model uses spectral initialization, which tends to improve convergence and is recommended by the developers of the STM method (Roberts et al., 2016b), and the maximum number of iterations for model convergence is set to 500. In addition, machine learning techniques involve random subsetting of the data, which complicates replication; the seed is therefore set to 2019 so that the results can be replicated.
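The effect of fixing the seed can be illustrated with a minimal Python sketch. It is not the code used in this thesis: the helper `random_subset` and the placeholder speech IDs are hypothetical, and the sketch only shows that a pseudo-random draw made with a fixed seed (2019, as above) is repeated exactly. Assuming the R `stm` package was used, the settings described above would correspond to its `init.type = "Spectral"`, `max.em.its = 500` and `seed = 2019` arguments.

```python
import random

SEED = 2019  # seed value reported in the text


def random_subset(items, k, seed=SEED):
    """Hypothetical helper: draw k items at random with a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return rng.sample(items, k)


speeches = [f"speech_{n}" for n in range(1, 101)]  # placeholder speech IDs
first = random_subset(speeches, 5)
second = random_subset(speeches, 5)
print(first == second)  # True: same seed, same subset
```

Using a local `random.Random` instance rather than the module-level generator keeps the draw reproducible even if other code consumes random numbers in between.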

To make model calculations and the subsequent effect estimations possible, the independent variables must be operationalized. Cabinet membership status is coded as a numeric binary (dummy) variable, where ‘0’ refers to MPs without executive power and ‘1’ refers to government MPs. Party affiliation is coded as a factor variable, where each level corresponds to the party with which the speaker is affiliated.
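A minimal Python sketch of this coding could look as follows, assuming a hypothetical `Speech` record with `is_minister` and `party` fields (the actual data structure of the thesis is not shown). Party levels are sorted alphabetically here, similar to the default level order of R's `factor()`; this ordering is an assumption, and it determines which party becomes the first (reference) level.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    """Hypothetical speech record; field names are illustrative only."""
    speaker: str
    is_minister: bool
    party: str


def code_cabinet(speech):
    """Cabinet membership status: 1 for government (cabinet) MPs, 0 otherwise."""
    return 1 if speech.is_minister else 0


def party_levels(speeches):
    """Factor levels for party affiliation, sorted alphabetically (assumed order)."""
    return sorted({s.party for s in speeches})


speeches = [
    Speech("MP A", True, "Conservative"),
    Speech("MP B", False, "Labour"),
    Speech("MP C", False, "Conservative"),
    Speech("MP D", False, "SNP"),
]
cabinet = [code_cabinet(s) for s in speeches]
levels = party_levels(speeches)
party_codes = [levels.index(s.party) for s in speeches]  # integer code of each level
print(cabinet, levels, party_codes)
```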

The operationalization of MPs in doubt needs to be outlined in more detail. In this research, MPs in doubt are identified from their voting profiles, which are constructed from the combination of votes they cast on the government’s policy.

This approach is inspired by the spatial model of roll-call voting, which models a legislator’s vote as a function of the distance between the legislator’s ideal point and the proposed policy (Carroll and Poole, 2014; Poole, 2005). In the context of this thesis, the voting profile is a factor variable whose levels are the observed combinations of ‘Ayes’ and ‘Noes’.
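The core logic of the spatial model can be sketched in its simplest deterministic, one-dimensional form: a legislator votes ‘Aye’ when the proposal lies closer to their ideal point than the status quo does. The function `spatial_vote` and all numeric positions below are hypothetical; the models cited above add random error to this rule and estimate ideal points from roll-call data rather than assuming them.

```python
def spatial_vote(ideal_point, proposal, status_quo):
    """Proximity voting in one dimension: 'Aye' if the proposal is strictly closer than the status quo."""
    if abs(ideal_point - proposal) < abs(ideal_point - status_quo):
        return "Aye"
    return "No"  # ties are resolved as 'No' (an assumption of this sketch)


# Hypothetical positions: one legislator, a fixed status quo, and a proposal
# that moves towards the legislator over three successive votes.
ideal = 0.2
status_quo = 0.0
proposals = [0.8, 0.6, 0.3]
votes = [spatial_vote(ideal, p, status_quo) for p in proposals]
print(votes)  # ['No', 'No', 'Aye']
```

Under these assumed positions, a changing distance between ideal point and proposal is one way a mixed sequence such as ‘No, No, Aye’ can arise.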

For instance, there can be three consecutive meaningful votes on the same policy within a given period. Suppose that only three combinations of these sequential votes are observed; they can be represented as follows:

• A legislator who voted in favor of the policy all three times: ‘Aye, Aye, Aye’;
• A legislator who voted against the policy proposal all three times: ‘No, No, No’;
• A legislator who voted against the policy in the first two votes and changed their mind in the last one: ‘No, No, Aye’.
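Encoding these sequences as factor levels can be sketched in Python as follows. The helper `voting_profile` and the MP labels are hypothetical; the sketch assumes that every recorded vote is either ‘Aye’ or ‘No’ (abstentions or absences would need an additional code) and keeps the votes in chronological order, so ‘No, No, Aye’ and ‘Aye, No, No’ are different levels.

```python
def voting_profile(votes):
    """Join chronologically ordered votes into a single factor level such as 'No, No, Aye'."""
    for vote in votes:
        if vote not in ("Aye", "No"):
            raise ValueError(f"unexpected vote: {vote!r}")
    return ", ".join(votes)


# Hypothetical MPs matching the three combinations listed above.
mps = {
    "MP A": ["Aye", "Aye", "Aye"],
    "MP B": ["No", "No", "No"],
    "MP C": ["No", "No", "Aye"],
}
profiles = {mp: voting_profile(votes) for mp, votes in mps.items()}
levels = sorted(set(profiles.values()))
print(levels)  # ['Aye, Aye, Aye', 'No, No, Aye', 'No, No, No']
```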

In this example, the last legislator is classified as an ‘MP in doubt’ in the analysis, since their voting profile shows inconsistent legislative voting behavior. Conversely, consistent voting on a policy may indicate adherence to the party or policy position. The factor variable also preserves the sequence of votes in the order in which they occurred. Regarding validity, this measure corresponds closely to the theoretical explanations and research goals of this thesis, although its factor (non-numeric) nature may limit its usefulness for other research. Regarding reliability, the measure is derived directly from recorded votes, so it can be reconstructed exactly to replicate the results of this thesis.
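The classification of an ‘MP in doubt’ can be sketched as a simple rule on the ordered votes. The function names are hypothetical, and treating any change of vote as doubt is an assumption that generalizes the three-vote example above; the thesis itself works with the full voting profile as a factor variable.

```python
def vote_switches(votes):
    """Number of times an MP's vote differs from their previous vote."""
    return sum(1 for earlier, later in zip(votes, votes[1:]) if earlier != later)


def is_in_doubt(votes):
    """Assumed rule: an MP is 'in doubt' if their votes are not all the same."""
    if not votes:
        raise ValueError("no recorded votes")
    return vote_switches(votes) > 0


print(is_in_doubt(["Aye", "Aye", "Aye"]))  # False
print(is_in_doubt(["No", "No", "No"]))     # False
print(is_in_doubt(["No", "No", "Aye"]))    # True
```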

Time is coded as a factor variable whose levels denote the important periods in which debates took place, labelled ‘Period_#’. This coding assumes that several events divide the whole timeframe into meaningful periods.
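Assigning speeches to periods amounts to locating each speech date between event dates. In the Python sketch below, the dates in `CUTOFFS` are hypothetical placeholders, not the events used in this thesis; each cut-off is treated as the first day of a new period.

```python
import bisect
from datetime import date

# Hypothetical cut-off dates; each one marks the first day of a new period.
CUTOFFS = [date(2018, 7, 1), date(2019, 1, 1), date(2019, 7, 1)]


def period_label(speech_date, cutoffs=CUTOFFS):
    """Return 'Period_#' for a speech date, numbering periods from 1."""
    index = bisect.bisect_right(cutoffs, speech_date)  # cut-offs on or before the date
    return f"Period_{index + 1}"


print(period_label(date(2018, 3, 1)))    # Period_1
print(period_label(date(2018, 7, 1)))    # Period_2
print(period_label(date(2019, 12, 31)))  # Period_4
```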

In addition, time is operationalized through the number of the month in which the speech was delivered. Month numbers range from 1 to 24, where ‘1’ refers to January 2018 and ‘24’ refers to December 2019. This measure is transformed into a spline with 10 degrees of freedom, as suggested by the authors of the STM (Roberts et al., 2016a), to allow for a non-linear relationship between time and the topic proportions.
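The month index and a spline with 10 degrees of freedom can be sketched in Python. The sketch builds a cubic B-spline basis with the Cox-de Boor recursion and drops the first basis function, leaving 10 columns. The evenly spaced interior knots are a simplifying assumption: R's `splines::bs()`, for comparison, places interior knots at quantiles of the data, so a library implementation will produce different columns. All function names are hypothetical.

```python
from datetime import date


def month_number(d, start_year=2018):
    """Month index: 1 for January 2018, ..., 24 for December 2019."""
    return (d.year - start_year) * 12 + d.month


def make_knots(lo=1, hi=24, df=10, degree=3):
    """Clamped knot vector with df - degree evenly spaced interior knots (simplifying assumption)."""
    n_interior = df - degree
    interior = [lo + (hi - lo) * j / (n_interior + 1) for j in range(1, n_interior + 1)]
    return [lo] * (degree + 1) + interior + [hi] * (degree + 1)


def bspline_basis(x, knots, degree=3):
    """Values of all B-spline basis functions at x, via the Cox-de Boor recursion."""
    # Degree 0: indicator of the knot interval that contains x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    if x == knots[-1]:
        # Close the last non-empty interval so the right boundary is covered too.
        last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
        b[last] = 1.0
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
            nxt.append(left + right)
        b = nxt
    return b


def month_spline(month, df=10, degree=3, lo=1, hi=24):
    """df spline columns for a month number; the first basis function is dropped (no intercept)."""
    return bspline_basis(month, make_knots(lo, hi, df, degree), degree)[1:]


m = month_number(date(2019, 3, 29))
print(m, len(month_spline(m)))  # 15 10
```

Because the full basis sums to one at every month, dropping one column avoids collinearity with the model's intercept.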

Once the STM is estimated, the effects of the covariates on the topic proportions can be extracted and evaluated to obtain the expected topic probabilities for each factor level, on which inferences are then based. In the following model, all factor variables enter as sets of dummy variables, and the first category of each variable is the reference category against which the effects of the other categories are compared (Hardy, 1993). The model also uses the method of composition, which incorporates the estimation uncertainty of the dependent variable into the effect estimates. Following the STM default, the ‘Global’ method is used, which approximates this uncertainty with the average covariance matrix formed from the global parameters (Roberts et al., 2017). As per the model default, one hundred simulations are conducted to estimate the results.
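The idea behind the method of composition can be sketched for a single binary covariate. In each simulation, topic proportions are drawn for every speech, the regression is re-estimated, and one coefficient is drawn from its estimated sampling distribution; the spread of these draws then reflects uncertainty in both θ and the regression. This is a simplified illustration: the Beta-distributed draws in `draw_theta` are a hypothetical stand-in for draws from the fitted STM's approximate posterior, the covariate values are invented, and with one binary regressor the OLS slope reduces to a difference in means.

```python
import random
import statistics


def ols_binary(theta, group):
    """OLS of theta on an intercept and one 0/1 regressor: slope and its standard error."""
    g1 = [t for t, g in zip(theta, group) if g == 1]
    g0 = [t for t, g in zip(theta, group) if g == 0]
    m1, m0 = statistics.fmean(g1), statistics.fmean(g0)
    # Pooled residual variance with n - 2 degrees of freedom, as in OLS.
    ss = sum((t - m1) ** 2 for t in g1) + sum((t - m0) ** 2 for t in g0)
    s2 = ss / (len(theta) - 2)
    se = (s2 * (1 / len(g1) + 1 / len(g0))) ** 0.5
    return m1 - m0, se


def method_of_composition(draw_theta, group, nsims=100, seed=2019):
    """Collect one coefficient draw per simulated set of topic proportions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(nsims):
        theta = draw_theta(rng)             # one draw of theta for every speech
        est, se = ols_binary(theta, group)  # re-estimate the regression
        draws.append(rng.gauss(est, se))    # sample from its sampling distribution
    return draws


# Hypothetical data: 20 cabinet speeches (1) and 30 other speeches (0).
group = [1] * 20 + [0] * 30


def draw_theta(rng):
    """Stand-in posterior draw: Beta(4, 6) (mean 0.4) for cabinet, Beta(2, 8) (mean 0.2) otherwise."""
    return [rng.betavariate(4, 6) if g == 1 else rng.betavariate(2, 8) for g in group]


draws = method_of_composition(draw_theta, group)
cuts = statistics.quantiles(draws, n=40)  # first and last cut points approximate 2.5% and 97.5%
print(round(statistics.fmean(draws), 3), round(cuts[0], 3), round(cuts[-1], 3))
```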

Theta (θ) is the dependent variable and represents the topic proportions of a speech extracted from the STM results, while the other variables are independent variables. The following model is therefore estimated to obtain the covariate effects:

$$\theta_{sk} = \alpha + \gamma_{1k} C_{s(i)} + \gamma_{2k} P_{s(i)} + \gamma_{3k} V_{s(i)} + \gamma_{4k} T_{s(i)} + \gamma_{5k} M_{s(i)}$$

where $\theta_{sk}$ is the proportion of topic k in speech s delivered by MP i (the dependent variable),
α is the intercept,
$\gamma_{nk}$ are the coefficients for the estimated differences in topic proportions associated with the legislators’ roles and time effects,
k indexes the topics,
C is the cabinet membership status of MP i,
P is the party affiliation of MP i,
V is the voting profile of MP i,
T is the period in which the speech of MP i was delivered,
M is the month in which the speech of MP i was delivered (splined).
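How a single speech enters the right-hand side of this equation can be sketched in Python. The sketch assumes treatment (dummy) coding with the first level of each factor as the reference, so the reference categories contribute no columns. The level lists, the placeholder spline columns and the coefficient values are hypothetical; because the month enters as a spline, its coefficient is in practice a vector with one entry per spline column.

```python
def dummies(value, levels):
    """Treatment coding: a 0/1 column for each non-reference level; levels[0] is the reference."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def design_row(cabinet, party, profile, period, month_cols, levels):
    """Regressors for one speech in the order of the equation: intercept, C, P, V, T, M."""
    return ([1.0, float(cabinet)]
            + dummies(party, levels["party"])
            + dummies(profile, levels["profile"])
            + dummies(period, levels["period"])
            + list(month_cols))


def expected_theta(row, coefs):
    """Linear predictor for one topic k: sum of regressors times coefficients."""
    if len(row) != len(coefs):
        raise ValueError("row and coefficients differ in length")
    return sum(x * b for x, b in zip(row, coefs))


# Hypothetical factor levels (first entry = reference category).
levels = {
    "party": ["Conservative", "Labour", "SNP"],
    "profile": ["Aye, Aye, Aye", "No, No, Aye", "No, No, No"],
    "period": ["Period_1", "Period_2"],
}
month_cols = [0.0] * 10  # placeholder for the 10 spline columns of M
row = design_row(0, "Labour", "No, No, Aye", "Period_2", month_cols, levels)

# Hypothetical coefficients: alpha, C, Labour, SNP, two profile levels, Period_2, 10 spline terms.
coefs = [0.10, 0.05, 0.02, -0.01, 0.03, 0.00, 0.04] + [0.0] * 10
print(len(row), round(expected_theta(row, coefs), 2))  # 17 0.19
```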

The month number is not used instead of the ‘period’ variable as the key independent variable for evaluating effects, because it can bias the estimated topic prevalence. Since the topic proportions of a speech sum to one, the expected probability of some topics can rise sharply in continuous time simply because the probability of other topics declines sharply. For instance, politicians may resolve an issue and stop debating it. This leads to a probability of 0 for that topic in the given period, which increases the probability of the other topics and of the residuals at the same time.
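The compositional constraint behind this argument can be shown with a toy example: the topic proportions of a speech sum to one, so when one topic's share falls to zero, the remaining shares must rise even if nothing else about them has changed. The function `drop_topic` is hypothetical, and proportional renormalization is a simplifying assumption about how the freed probability mass is redistributed.

```python
def drop_topic(proportions, k):
    """Set topic k to zero and rescale the other topics so the proportions still sum to one."""
    remaining = sum(p for j, p in enumerate(proportions) if j != k)
    if remaining <= 0:
        raise ValueError("no probability mass left on the other topics")
    return [0.0 if j == k else p / remaining for j, p in enumerate(proportions)]


before = [0.5, 0.3, 0.2]  # hypothetical proportions for three topics in one period
after = drop_topic(before, 0)
print([round(p, 2) for p in after])  # [0.0, 0.6, 0.4]
```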
