
How Technological Advancements May Enable More Secondary Data Analyses

Secondary data analyses as described here involve considerable effort on the part of the analyst. Relevant data need to be identified, acquired, validated, preprocessed, and finally jointly analyzed. In this section, I will briefly reflect on ideas from statistics and computer science that may facilitate such analyses in the future.

Starting with the statistical aspect, it is worth noting that the statistical models we used were strictly rooted in the traditional meta-analysis perspective (e.g., Borenstein et al., 2009). That is, we computed effect sizes for the original outcomes and modelled these in meta-analytic models that are essentially multi-level models with a hidden level (i.e., the level of raw data is assumed by the model). This conventional way of doing meta-analysis has some advantages. For one, casting studies or outcomes into effect sizes is quite descriptive and allows for some useful visualizations, such as funnel plots (Sterne & Egger, 2005). It can also be easier to ask original authors for effect sizes rather than for a full data set. However, using effect sizes entails a loss of information when they are computed from raw data: summary estimates are more precise when the raw data from multiple studies are modelled directly in a multi-level model (Riley et al., 2010). Going forward, this should become standard practice for meta-analyses, since there is little reason not to do it. In many ways, the effect-size-based approach is a workaround stemming from times when sharing data was cumbersome, which has become much easier with the internet and open repositories.
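The information loss can be made concrete with a minimal sketch of the conventional fixed-effect (inverse-variance) pooling step, in which each study enters the model only as an (effect size, variance) pair; the numbers below are made up for illustration.

```python
import numpy as np

def pool_fixed_effect(effects, variances):
    """Fixed-effect meta-analysis: inverse-variance weighted pooling.

    effects:   per-study effect sizes (e.g., standardized mean differences)
    variances: their sampling variances
    """
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Three hypothetical studies: only the (effect, variance) summaries enter
# the model -- whatever structure the raw data had is no longer visible.
est, se = pool_fixed_effect([0.30, 0.50, 0.10], [0.04, 0.09, 0.01])
```

Modelling the raw data directly in a multi-level model, as recommended by Riley et al. (2010), avoids this reduction step entirely.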

Accessing and reanalyzing existing data is indeed much easier than it probably was 20 years ago but, as I learned, still quite effortful. Datasets shared by the original authors are often sparsely documented, leading to a back-and-forth about what certain variables mean. Sometimes the data are incomplete or even faulty. We discovered quite a few errors in the original data files.

GENERAL DISCUSSION 26

For example, in one case an author misinterpreted their own gender coding in the analyses.

Beyond these properties of the data themselves, the form in which they are transmitted can also be a challenge. Many authors rely on various proprietary file formats, and versioning and country locales can lead to problems. Authors often adjust psychological measures to suit their purposes, for example by changing the response scale or by removing, rewording, or adding scale items. I will discuss potential solutions to these issues in turn.

Proprietary data file formats are becoming less of an issue with the advent of open-source statistical software such as R (R Core Team, 2022). With R, most proprietary data files can be read without issue, although problems may persist for older format versions.
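As a parallel outside R: the open-source pandas library in Python can likewise round-trip several proprietary formats. A minimal sketch (the variables are invented) that writes and re-reads a Stata .dta file:

```python
import os
import tempfile

import pandas as pd

# Hypothetical mini-dataset as it might arrive from a collaborator
df = pd.DataFrame({"participant": [1, 2, 3],
                   "gender": [0, 1, 0],
                   "desire": [3.5, 4.0, 2.5]})

# Round-trip through Stata's proprietary .dta format with open tooling only
path = os.path.join(tempfile.mkdtemp(), "example.dta")
df.to_stata(path, write_index=False)
reloaded = pd.read_stata(path)
```

Older or unusual format versions can still require extra handling, which is why documentation of the export settings remains valuable.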

It is a known problem that authors change psychological scales, sometimes without explicitly reporting this in the manuscript, thus potentially distorting the measurement. This sometimes has to do with the fact that there are multiple references for a psychometric scale that report different versions. Sometimes there is only one reference, but it does not include the full-text items. A solution could be to create a central registry for psychological measurements that supports versioning and forking (as in common version control systems such as git; Junio Hamano & others, 2022). This registry would assign stable links to psychological measures (including all full-text items and the measurement scales). Reviewers could then ask authors to link to the specific version they used and to confirm that they did not alter the scale. If they did want to alter the scale, they would need to create a new fork in the registry and link to that. Recently, a proposal has been made for a central register for studies to make negative findings more discoverable and thus curb publication bias (Laitin et al., 2021). Similarly, widespread use of a registry for psychometric scales would facilitate the discoverability of data for secondary analyses.
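To make the registry idea more tangible, here is a minimal sketch of what a versioned, forkable registry entry could look like. The class, its fields, and the content-addressed IDs are hypothetical design choices, not an existing system; the scale names and items are invented.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ScaleVersion:
    """One immutable version of a psychometric scale in a hypothetical registry."""
    name: str
    items: Tuple[str, ...]           # the full-text items
    response_scale: str              # e.g., "1 = never to 8 = many times a day"
    parent_id: Optional[str] = None  # the version this one was forked from

    @property
    def stable_id(self) -> str:
        # Content-addressed ID, as in git: identical content yields an
        # identical ID, so any alteration produces a new, citable version.
        payload = repr((self.name, self.items, self.response_scale, self.parent_id))
        return hashlib.sha1(payload.encode()).hexdigest()[:12]

# An original scale and a fork that rewords one item; authors would cite
# the stable_id of the exact version they administered.
original = ScaleVersion("Example Desire Scale",
                        ("I frequently experience desire.",
                         "Desire is important to me."),
                        "1 = never to 8 = many times a day")
fork = ScaleVersion("Example Desire Scale (adapted)",
                    ("I often experience desire.",
                     "Desire is important to me."),
                    "1 = never to 8 = many times a day",
                    parent_id=original.stable_id)
```

The fork's `parent_id` preserves the lineage, so reviewers and secondary analysts can see exactly how an administered scale deviates from its source.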


Unclear documentation of data and code is also a common problem, and it may even worsen as psychological researchers perform more and more coding and data processing without formal training for such tasks. Beyond better training, one solution could be to incorporate datasets and code more explicitly into the review process, but this would place even higher burdens on voluntary reviewers. Alternatively, there could be a solution similar to the registry for psychometric scales proposed above. Repositories for open data such as the OSF could offer upload features that enforce certain quality standards for datasets. Such a system could implement concepts from database theory (Codd, 1970), such as data consistency tests, check the data for missing values, and require users to label and explain variables and codings. Of course, such a system could only work on a voluntary basis, as there will always be exotic data structures or types that cannot be foreseen. However, many psychological studies have relatively standard formats, such as experiments or cross-sectional questionnaire studies, that could be implemented as templates in the system. If it were clear that a dataset conforms to a standard format, it would be much easier to process for secondary users, perhaps even fully programmatically.
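A sketch of what such automated upload checks could look like; the function and its rules are invented for illustration and cover only two of the checks mentioned above (unlabeled variables and missing values).

```python
def validate_dataset(rows, codebook):
    """Minimal sketch of automated upload checks a repository could run.

    rows:     list of dicts, one per observation
    codebook: dict mapping each variable name to a human-readable label
    Returns a list of problems; an empty list means the checks passed.
    """
    problems = []
    variables = set().union(*(row.keys() for row in rows)) if rows else set()
    # Every variable must be labeled and explained in the codebook
    for var in sorted(variables - codebook.keys()):
        problems.append(f"variable '{var}' has no label in the codebook")
    # Flag missing values so secondary users are not surprised by them
    for i, row in enumerate(rows):
        for var in variables:
            if row.get(var) is None:
                problems.append(f"row {i}: missing value for '{var}'")
    return problems

issues = validate_dataset(
    rows=[{"id": 1, "gender": 0}, {"id": 2, "gender": None}],
    codebook={"id": "participant identifier"},
)
```

A real system would add consistency tests (e.g., admissible value ranges per variable) and the study-format templates described above.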

Taking these thoughts on standardizing data and study formats as well as psychometric measurement further, one could envision a future of machine-readable psychological research. Currently, to understand a psychological study, one needs to retrieve the manuscript and read the plain-text methods section. These sections are far from standardized, and the information they convey varies considerably between manuscripts. In many cases, one even needs insider knowledge of the respective field of research to comprehend the study. Imagine that, instead, psychological studies were sufficiently standardized to be processed computationally. One data file would include full information on the study design as well as any measurements, manipulations, and so forth. If these data were freely discoverable and accessible, secondary data analyses could become increasingly automated by drawing on innovations in big-data processing and cloud computing, such that analysts could search for and retrieve data on specific manipulations or measures. Considering how diverse psychological studies are, one may find this idea far-fetched. But consider how far computational communication through standardized protocols has progressed: servers all over the world exchange all kinds of data in real time over the internet, connecting warehouses, financial markets, factory components, and even cars. Finding exchange formats and protocols for psychological research will be challenging but could ultimately pave the way to truly connected research with a cumulative, accessible, and comprehensible evidence base.
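As a thought experiment, a machine-readable study description could be as simple as a structured document. Every field name below is an invented assumption; no such exchange format exists yet.

```python
import json

# Hypothetical machine-readable description of a study; all field names
# are illustrative assumptions, not an existing standard.
study = {
    "design": "between-subjects experiment",
    "sample": {"n": 120, "population": "university students"},
    "manipulations": [
        {"name": "ego_depletion", "levels": ["depletion", "control"]},
    ],
    "measures": [
        {"scale": "Example Desire Scale",
         "registry_version_id": "a1b2c3d4e5f6",  # hypothetical stable link
         "response_scale": "1-8"},
    ],
}

# Round-trip: what a search service could index and an analyst could query
serialized = json.dumps(study, indent=2)
parsed = json.loads(serialized)
measured_constructs = [m["scale"] for m in parsed["measures"]]
```

An analyst looking for, say, all studies manipulating ego depletion could then filter such documents programmatically instead of reading methods sections.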

Limitations

The vision for psychological research that I have tried to develop in this dissertation perhaps deviates from the mainstream view. Instead of expanding the concept space in (social and personality) psychology at an ever-faster pace and aiming for breakthrough findings, I argue for a slower, perhaps more boring and bureaucratic way of doing science. In this boring science, there would be higher thresholds for introducing new ideas. More time would be spent on integrative work that standardizes and connects theories and data. The work of organizing and curating previous evidence would be cherished rather than dismissed as grunt work for the big storytellers. Progress would seem slower but would be more stable and incremental. However, I am uncertain how compatible such a view of psychological science is with the current social architecture of academia, where jobs are scarce and visibility is key.

A related but unresolved issue is how to give credit to the original authors when doing extensive secondary data analyses. The analyses in this dissertation depended heavily on others’ previous work and often on ad hoc support with making data accessible and understandable, yet the only credit these authors received was a citation, often for unpublished work. An alternative could be to involve authors as co-authors with pre-determined responsibilities limited to support with curating the data. The final article would then include a note on the contributions of all authors. Such article types with a large number of authors are becoming more common (for example, the multilab replications of ego depletion; Vohs et al., 2021).

Another potential limitation is that the proposed approach of re-using existing research data may be more feasible for some types of research than for others. The approach seems especially suited for correlational research, which was the primary approach of paper 2 of part II (Frankenbach et al., 2022). In paper 1 of part II, we relied on experimental data, but the approach was in essence also correlational, since we “correlated” the experimental effect with a measurement of personality. In a sense, this point is self-evident: secondary research can only use measures and manipulations that were part of the original research, and incidental measurements are much more common than incidental manipulations (if the latter exist at all).

One issue that I have not definitively discussed is whether it is valid under all circumstances to detach measurements from the theoretical context in which they were conceived. For example, in paper 2 of part II, we retrieved the item “During the last month, how often have you had sexual thoughts involving a partner?” from the Sexual Desire Inventory (Spector et al., 1996) and classified it according to its literal, “atomic” meaning as a measure of the frequency of sexual thoughts. In the inventory, however, the item is thought to reflect a construct called “dyadic sexual desire”. This also raises the question of whether questionnaire items reflect only their verbatim meaning, or whether the item context, such as instructions, preceding items, or the overall study context, is also reflected in the item response.


Conclusion

Part I of the present dissertation project illustrated how and why dysfunctional research