How Reviewers Think About Internal and External Validity in Empirical Software Engineering

Janet Siegmund, Norbert Siegmund, Sven Apel

Abstract: Empirical methods have grown common in software engineering, but there is no consensus on how to apply them properly. Is practical relevance key? Do internally valid studies have any value? Should we replicate more to address the trade-off between internal and external validity? We asked the key players of software-engineering research, but they do not agree on answers to these questions.

The original paper was published at the International Conference on Software Engineering 2015 [SSA15]. Empirical research in software engineering has come a long way. Once perceived as a niche science, it is now widely recognized as important. In 2005, empirical studies were found in about 2% of papers at major venues and conferences, while in recent years, almost all papers at ICSE, ESEC/FSE, and EMSE reported some kind of empirical evaluation, as we found in a literature review. Thus, the number of empirically investigated claims has increased considerably.

With the rising awareness and use of empirical studies, the question of where to go with empirical software-engineering research is also emerging. New programming languages, techniques, and paradigms, new tool support to improve debugging and testing, and new visualizations to present information emerge almost daily, and claims regarding their merits need to be evaluated; otherwise, they remain claims. But how should new approaches be evaluated? Do we want observations that we can fully explain, but with limited generalizability, or do we want results that are applicable to a variety of circumstances, but where we cannot reliably explain underlying factors and relationships? In other words, do researchers focus on internal validity and control every aspect of the experiment setting, so that differences in the outcome can only be caused by the newly introduced technique? Or do they focus on external validity and observe their technique in the wild, showing a real-world effect, but without knowing which factors actually caused the observed difference?

This tradeoff between internal and external validity is inherent in empirical research. Because the two options pursue different objectives, we cannot choose both. Choosing one of them is not easy, and existing guidelines are too general to assist in making this decision.

With our work, we want to raise awareness of this problem: How should we address the tradeoff between internal and external validity? In the end, every time we plan an experiment, we must ask ourselves: Do we ask the right questions? Do we want pure, basic research, or applied research with immediate practical relevance? Is there even a way to design studies such that we can answer both kinds of questions at the same time, or is there no way around replications (i.e., exactly repeated studies or studies that deviate from the original study design only in a few, well-selected factors) in software-engineering research?

University of Passau, siegmunj@fim.uni-passau.de

University of Passau, siegmunn@fim.uni-passau.de

University of Passau, apel@uni-passau.de

[Figure 1. Shares of responses: External – 51%, Balance – 29%, Internal – 20%. Quoted responses shown in the figure: "[internal] would show no value to [the] SE community"; "Without internal validity, the results cannot be trusted"; "[With internal validity] you might get a more 'reliable' result, but the result could not be used to explain anything about the real world"; "[…] we first need to clearly control [confounding factors] before eventually being able to generalise"; "include two studies [in a paper], one maximizing internal validity and the other maximizing external"]

Fig. 1: Preferences for internal vs. external validity among program-committee and editorial-board members.

To understand how the key players of software-engineering research would address this problem, we conducted a survey among the program-committee members of the major software-engineering venues of recent years [SSA15]. In essence, we found that there is no agreement and that the opinions of the key players differ considerably (illustrated in Fig. 1). Even worse, we also found a profound lack of awareness regarding the tradeoff between internal and external validity: one reviewer would reject a paper that maximizes internal validity because it “[w]ould show no value at all to SE community”.

When we asked about replication, many program-committee members admitted that we need more replication in software-engineering research, but also indicated that replications have a hard time being accepted. One reviewer even stated that replications are “a good example of hunting for publications just for the sake of publishing. Come on.”

If the key players cannot agree on how to address the tradeoff between internal and external validity (or do not even see this tradeoff), and admit that replication, a well-established technique in other disciplines, would have almost no success in software-engineering research, how should we move forward? In the original paper, we shed light on this question, give insights into the participants’ responses, and make suggestions on how we can address the tradeoff between internal and external validity.

References

[SSA15] Janet Siegmund, Norbert Siegmund, and Sven Apel. Views on Internal and External Validity in Empirical Software Engineering. In Proc. Int’l Conf. Software Engineering (ICSE), pages 9–19. IEEE CS, 2015.
