
I. Statistical Analysis and Content Analysis

The quantitative research presented in this dissertation applies methods of statistical analysis and content analysis. In Papers 1 and 3, research hypotheses are tested by examining the difference in means between two samples. Given the numerical character of the variables in each sample, unequal-variances t-tests are applied in the analyses of Papers 1 and 3 (Weiers 2005). The unequal-variances t-test compares the means of two independent samples and is appropriate when the F-test on the equality of the two sample variances has been rejected (Weiers 2005). In the analyses of Paper 2, hypotheses are tested by examining the difference between two samples as well as between three samples. Given the categorical character of the variables, Mann-Whitney U-tests are applied in the analyses of Paper 2 (Weiers 2005). The Mann-Whitney U-test is a nonparametric test for the difference between two samples; it is similar to the t-test on independent samples but can be performed on ranked or ordinal data (Weiers 2005). In addition, Kruskal-Wallis H-tests are applied in the analyses of Paper 2. The Kruskal-Wallis H-test is likewise a nonparametric test that can be performed on categorical data; it examines the difference between more than two samples and represents a between-groups analysis (Weiers 2005).
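The three tests described above can be illustrated in a brief sketch using SciPy on synthetic data; the sample names and parameters below are purely hypothetical and not taken from the papers.

```python
# Sketch of the hypothesis tests named above, on synthetic samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample_a = rng.normal(loc=0.0, scale=1.0, size=50)
sample_b = rng.normal(loc=0.5, scale=2.0, size=60)   # different variance
sample_c = rng.normal(loc=1.0, scale=1.5, size=40)

# Welch's (unequal-variances) t-test on two independent samples
t_stat, t_p = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Mann-Whitney U-test: nonparametric comparison of two samples
u_stat, u_p = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# Kruskal-Wallis H-test: nonparametric comparison of more than two samples
h_stat, h_p = stats.kruskal(sample_a, sample_b, sample_c)

print(t_p, u_p, h_p)
```

Each call returns a test statistic and a p-value, which is then compared against the chosen significance level.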

Papers 4-5 apply multiple linear regression analysis in order to investigate the relationship between a dependent variable and multiple independent variables (predictors) (Weiers 2005).
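A multiple linear regression of the kind applied in Papers 4-5 can be sketched with an ordinary least squares fit in NumPy; the predictors and coefficients below are synthetic and purely illustrative.

```python
# Minimal sketch of multiple linear regression via ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                       # first predictor
x2 = rng.normal(size=n)                       # second predictor
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # estimated coefficients

# R^2: share of the variance in the dependent variable explained by the model
residuals = y - X @ beta
r_squared = 1.0 - residuals.var() / y.var()
```

The estimated coefficient vector `beta` describes the nature of the relationship, while `r_squared` quantifies the explained variance.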

The regression analysis determines the nature of this relationship and how much of the variance in the dependent variable can be explained by the resulting multiple regression equation (Weiers 2005). For each analysis in Papers 4-5, no evidence of multicollinearity between the independent variables was detected (Weiers 2005). In addition, the assumptions of normally distributed errors and homoscedasticity are met for each analysis (Weiers 2005). In the analyses of Paper 6 a hazard function regression model is applied (Greene 1997). This type of regression is used in survival analysis to estimate how long an entity will remain in a certain state (e.g. until death) (Greene 1997). In this analysis the hazard rate λ is the instantaneous rate at which an entity leaves its current state within a given time period, conditional on having remained in that state so far (Greene 1997). In contrast to a linear regression, this type of analysis accounts for the strictly positive values and the non-linear behavior of the dependent variable (Greene 1997).
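The idea of the hazard rate can be illustrated with the simplest survival model, a constant-hazard (exponential) specification; the actual model in Paper 6 may be more elaborate, and the durations below are invented for illustration.

```python
# Hedged sketch: under a constant-hazard (exponential) survival model, the
# maximum-likelihood estimate of the hazard rate lambda is the number of
# observed events divided by the total time at risk. Synthetic data only.
import math

durations = [12.0, 8.5, 30.0, 4.2, 15.0]  # time each entity spent in the state
observed  = [1, 1, 0, 1, 0]               # 1 = state change observed, 0 = censored

events = sum(observed)
time_at_risk = sum(durations)
hazard_rate = events / time_at_risk       # lambda-hat

# Implied survival function S(t) = exp(-lambda * t): probability of still
# being in the initial state after time t
def survival(t, lam=hazard_rate):
    return math.exp(-lam * t)
```

Because the survival time is strictly positive and S(t) decays non-linearly, such a model captures behavior a linear regression cannot.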

Content analysis is defined as “any technique for making inferences by objectively and systematically identifying specified characteristics of messages” (Holsti 1969, p. 14). In the analyses of Papers 1-3 and Paper 5, computer-assisted dictionary-based approaches are applied (Rosenberg et al. 1990). In this regard, a dictionary provides a list of terms that are categorized according to their psychological or contextual meaning (Weber 1984). This approach is often applied in sentiment analysis, where the tone (e.g. positive or negative) of source materials such as UGC or news is determined (Pang & Lee 2008; Pang et al. 2002; Das & Chen 2007).
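The mechanics of a dictionary-based tone count can be sketched in a few lines; the two word lists below are tiny stand-ins for real dictionaries such as the General Inquirer, not excerpts from them.

```python
# Toy illustration of a dictionary-based tone count.
import re

POSITIVE = {"gain", "growth", "strong", "profit"}   # illustrative word list
NEGATIVE = {"loss", "decline", "weak", "risk"}      # illustrative word list

def tone_counts(text: str) -> dict:
    """Count positive and negative dictionary terms in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"positive": pos, "negative": neg, "total": len(tokens)}

counts = tone_counts("Strong growth despite currency risk; no loss expected.")
```

The resulting term frequencies per category are what the analyses then use as sentiment variables.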

Firstly, Papers 1-3 and Paper 5 apply a dictionary provided by the General Inquirer, which consists of word lists that can be used to categorize words according to their psychosocial meaning (Stone & Hunt 1963; Stone et al. 1966). Secondly, Paper 1 applies a dictionary that categorizes words according to their meaning in a financial context (Loughran & McDonald 2011). Thirdly, Paper 5 applies the Thomson Reuters Financial Glossary (2013), which contains key terms used in the financial industry. These word lists are used to obtain the frequency of terms in certain categories of meaning (e.g. positive words and negative words) contained in textual data such as UGC (Papers 2-3 and Paper 5) and corporate disclosures (Paper 1). Furthermore, in the analyses of Paper 5 the Gunning-Fog Index is applied in order to measure the readability of writing in text (i.e. UGC) (Gunning 1952; Loughran & McDonald 2014). Readability is defined as “the ease of understanding or comprehension due to the style of writing” (Klare 1963, p. 1). In this regard, the Gunning-Fog Index combines the average sentence length of a text with the proportion of complex words in the text (Gunning 1952).
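The index can be sketched as 0.4 times the sum of the average sentence length and 100 times the proportion of complex words; the syllable counter below is a rough vowel-group heuristic for illustration, not the exact procedure of Gunning (1952).

```python
# Sketch of the Gunning-Fog Index:
#   0.4 * (avg. sentence length + 100 * share of complex words),
# where a complex word has three or more syllables.
import re

def syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    return 0.4 * (avg_sentence_len + 100 * len(complex_words) / len(words))

score = gunning_fog("The merger was announced today. Analysts expect regulatory scrutiny.")
```

A higher score indicates text that is harder to read, i.e. longer sentences and a larger share of complex words.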

II. Datasets

In the following, the archival data sources used in the papers of this dissertation are described. Overall, the datasets consist of structured as well as semi-structured data.

With regard to Paper 1, a sample of 4,360 FTSE-100 corporate disclosures was collected that had been published via Regulatory News Services during trading hours of the London Stock Exchange between November 2007 and November 2009. For the collected corporate disclosures, Thomson Reuters Tick History was used to collect intraday price series of the corresponding stocks.

Thomson Reuters SDC Platinum was used to identify merger events and to collect merger-specific data. In total, 28,933 US mergers & acquisitions transactions were identified that had been announced between January 1st 2008 and December 31st 2011. This provided the basis for the sample selection processes of merger events in Papers 2-6. For each sample of merger events, Thomson Reuters SDC Platinum and Thomson Reuters Datastream were used to collect company-specific data on the companies involved in a merger attempt. With regard to the selected samples of merger events, LexisNexis was used to collect news articles published in The Wall Street Journal and in The New York Times (Papers 4-6). Only news articles were collected that cited the name of a company involved in a merger attempt and that were published during the year prior to the announcement of the respective merger attempt.

In the sample selection processes of merger events in Papers 2-6, only merger events were selected that had officially been either completed or withdrawn as of July 2012. In this regard, only merger-related social media posts (i.e. UGC) were collected that were posted in the course of a merger attempt, i.e. between the merger announcement and the date when the final outcome of the merger attempt (either completed or withdrawn) became known. In Papers 2-3, Newstex Blogs on Demand was used to collect merger-related blog posts that contain the company name of the acquirer, the company name of the target and the word “merger”. In Papers 4-6, SDL’s SM2 Social Media Monitoring was used to collect merger-related social media posts that contain the company name of the acquirer and the company name of the target. SDL-SM2 is a database of historical social media content and provides access to social media data across various social media types. All social media posts in the database are assigned to a specific social media type (e.g. blog, message board, microblog and social networking site) and to a specific social media platform (e.g. Facebook or Twitter). Both social media data sources provide the full content (written text), the author, the source and the time of publication of all collected social media posts.