
2.4. Model Selection

2.4.2. Model Reliability and Validity

After the text preprocessing, the next step is model selection. Although model reliability is often not a major concern for researchers, it was decided to describe the model evaluation procedure here. To obtain reliable results, researchers usually examine which number of topics will accurately reflect the main theme categories in the text corpora (Maier et al., 2018). This requires examining the held-out log-likelihood, which measures the quality of each topic model (Fukumasu et al., 2012), the residuals (Taddy, 2012), the lower bound (Cheng et al., 2015), and the semantic coherence in relation to the proposed number of topics. Here the lower bound measures model convergence, while semantic coherence captures how words co-occur together within topics (Roberts et al., 2014).

The tricky part at this stage is that the choice of the number of topics often depends on the researcher's judgment (Banks et al., 2018; Genovese, 2015; Isoaho, Moilanen, et al., 2019).

In this paper, an automated estimation was run to increase the model's reliability. However, choosing the number of topics is always a trade-off between the value of K and the other evaluation metrics.

The estimation of the number of topics on the given sample of documents was conducted in two steps. First, model fit was calculated for topic numbers between 10 and 50.
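
The diagnostics listed above correspond to those reported by the searchK() function of the R stm package, on which the cited methodology builds (Roberts et al., 2014). The following is a minimal sketch of this first step under that assumption; the object name out stands for the output of prepDocuments() from the preprocessing stage, and the grid granularity is illustrative.

    library(stm)

    # Step 1: compute the diagnostics over a coarse grid of candidate K.
    # `out` is assumed to hold the documents, vocabulary and metadata
    # produced by prepDocuments() during preprocessing.
    k_coarse <- searchK(out$documents, out$vocab,
                        K = seq(10, 50, by = 5),  # grid step assumed
                        data = out$meta)

    # Plots held-out likelihood, residuals, semantic coherence and the
    # lower bound against K (cf. Figure 2).
    plot(k_coarse)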

Figure 2. Diagnostic Values for Finding the Appropriate Number of Topics – from 10 to 50.

(Figure is based on the present analysis)

The diagnostic values indicate that the appropriate number of topics lies around 30.

The interval between 10 and 20 can be ruled out, as all indicators show less reliable values there: the residual score is too high and the held-out likelihood too low, which implies worse model quality. The range between 40 and 50 must likewise be excluded, as the semantic coherence score drops to values so low that the resulting topics would be hard to identify and analyze. In contrast, values between 27 and 33 promise more prominent results, being coherent and well enough clustered to proceed with the interpretation of the energy policy agenda topics to be revealed.
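
Under the same assumptions as above, the second step simply re-runs the diagnostic on the narrowed interval:

    # Step 2: repeat the search on the narrowed interval around K = 30.
    k_fine <- searchK(out$documents, out$vocab,
                      K = 27:33,
                      data = out$meta)
    plot(k_fine)  # cf. Figure 3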

Figure 3. Diagnostic Values for Finding the Appropriate Number of Topics – from 27 to 33.

(Figure is based on the present analysis)

The present graph shows somewhat unusual but easily interpretable results. Recalling that the choice of the number of topics is always a trade-off, K = 30 was selected for further model calculation, as it shows appropriate results across the board. Indeed, K = 30 has a semantic coherence score significantly higher than the other values, while its held-out likelihood reflects appropriate model quality and its lower bound an appropriate level of model convergence. It could be argued that the model with K = 31 shows better model quality, having a higher held-out likelihood and similar values on the other indicators. However, more topics do not imply a better model fit: an increased number of topics raises the chance of a lower-quality word-per-topic distribution and, consequently, the possibility of nonsensical and uninterpretable topics (Mimno et al., 2011). Moreover, the K = 31 model has a lower semantic coherence value, meaning less coherent word co-occurrence within topics. For these reasons, a model consisting of 30 topics was calculated.
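
The K = 30 versus K = 31 comparison can be read directly from the stored diagnostics. A sketch, assuming the k_fine object from the second searchK() run above; the column names follow the stm package, and unlist() guards against the list-columns some package versions return:

    # Extract the diagnostic rows for K = 30 and K = 31.
    res <- as.data.frame(lapply(k_fine$results, unlist))
    res[res$K %in% c(30, 31),
        c("K", "heldout", "residual", "semcoh", "bound")]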

The next step is to calculate and select the best model based on topic exclusivity and semantic coherence. Semantic coherence estimates how words co-occur together in the generated topics (Mimno et al., 2011; Roberts et al., 2014), whereas topic exclusivity captures whether the most frequent words of one topic appear with similar frequency in other topics (Reisenbichler & Reutterer, 2019).

Even though the evaluation of topic models is still an underdeveloped area of research in practice, owing to the complexity of topic models and their sensitivity to the observed data, it is necessary to select the best-fitting model with the desirable properties and thereby increase the model's validity (Maier et al., 2018; Roberts et al., 2014). Moreover, the relatively small number of topics in this paper requires careful consideration, as the default spectral initialization, designed for huge amounts of data with vocabularies of more than 10 000 words, tends to overgeneralize the word-per-topic allocation (Roberts et al., 2014).
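
This selection step matches the stm package's selectModel(), which runs a number of differently initialized models at a fixed K and keeps roughly the top 20% by likelihood, so 20 runs leave about four candidates, consistent with the four models in Figure 4. A sketch under that assumption; the prevalence covariate term (the Commission term) is an illustrative guess at the model specification:

    # Run multiple initializations at K = 30 and keep the best
    # candidates by likelihood. The covariate `term` is an assumption.
    set.seed(1234)
    model_selection <- selectModel(out$documents, out$vocab, K = 30,
                                   prevalence = ~ term,
                                   data = out$meta,
                                   runs = 20)

    # Exclusivity vs. semantic coherence of the retained models
    # (cf. Figure 4).
    plotModels(model_selection)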

Figure 4. Calculated STM Models with the Highest Held-Out Likelihood

(Figure is based on the present analysis)

Four models are pictured on the graph according to their exclusivity and semantic coherence. Model 1 (red dots) does not cluster well, having too wide a range on both the exclusivity and coherence scales. This means that the topics generated within the first model are not coherent enough and may overlap with each other. While the second (green dots), third (light-blue dots) and fourth (deep-blue dots) models have the same values on the semantic coherence scale, the fourth model has better-defined clusters and higher exclusivity, which is why it was selected for further analysis.
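
In the assumed stm workflow, committing to the fourth model amounts to extracting it from the selection object and inspecting its topic words:

    # Keep the fourth candidate (deep-blue cluster in Figure 4) as the
    # final model and list its top words per topic.
    final_model <- model_selection$runout[[4]]
    labelTopics(final_model, n = 7)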

3. Analysis: Topic Modeling

The next step after completing the initial analysis is to build the selected model, label the topics and interpret the results. First, the topic model with labeled topics is presented and interpreted in this chapter to characterize the EU energy policy image. Second, the results of the estimated effects regression are described, which provide insights into which of the revealed latent topics tend to be associated with the 2009-2014 or the 2014-2019 Commission term.
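
A sketch of the regression step announced here, again assuming the stm workflow and the hypothetical term covariate introduced above:

    # Regress the prevalence of all 30 topics on the Commission term;
    # uncertainty = "Global" propagates topic-proportion uncertainty
    # into the effect estimates.
    effects <- estimateEffect(1:30 ~ term, final_model,
                              metadata = out$meta,
                              uncertainty = "Global")
    summary(effects)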