7.2. Evaluation of RQ 2.5: Test Objective

7.2.2. Maintainability Testing

Maintainability testing is used to test all characteristics that influence how difficult it is to change a program. These characteristics include code structure, modularity, code comment quality, and so on [9]. However, most of these characteristics can only be evaluated by static tests (i.e., the source code is not executed). One technique that is especially useful for testing maintainability is the review [9].

Maintainability is a rather broad concept. The ISO 25000 standard characterizes maintainability by dividing it into five sub-characteristics [342]:

• Modularity: degree to which a software system is separated into different components such that a change to one component does not (or only to a minimal degree) affect other components.

• Reusability: degree to which code parts can be reused in other components.

• Analysability: degree to which it is possible to trace the impact of a change to parts of the system, to detect the causes of software defects, or to find parts that should be modified.

• Modifiability: degree to which a system can be modified without introducing defects or reducing the system's quality.

• Testability: degree to which test criteria can be established for a system and to which tests can be executed to measure whether those criteria are met.

Within this section, we present the results of our analysis of the scientific (Section 7.2.2.1) and practical (Section 7.2.2.2) view on the topic of maintainability testing.

7.2.2.1. Scientific View

One of the earliest publications on maintainability testing was written by Oman et al. [343]. They tried to determine different factors that influence the maintainability of software and present different metrics that can be used to measure these factors. Oman et al. [343] discuss the reasoning behind these metrics and how they fit the determined factors. Furthermore, they present an approach to compile these metrics into a single metric that is intended to measure the maintainability of a software system. This metric is called the Maintainability Index (MI).

Within their follow-up publication [344], the authors make use of the MI. They performed a case study on eight software systems to calibrate the MI (i.e., to define the coefficients of the chosen metrics). Afterwards, they evaluated their MI formula on 14 industrial projects by comparing the output of the formula with expert opinions. The evaluation highlighted that there is a correlation between the MI and the expert opinions. Hence, they concluded that the MI is a useful measure for the maintainability of software.

Other publications also highlight that the MI is a useful metric for testing the maintainability of software (e.g., [345, 346, 347, 348]). Building on these publications, several others try, e.g., to improve or extend the maintainability index (as a recent review highlights [349]) or develop new models to grasp the concept of maintainability and make it testable (e.g., [350, 351, 352]). There even exist different variants of the MI, e.g., [353, 354]. Additionally, the research community has defined several other metrics that are connected to the maintainability of software, such as readability [355].

The testing of maintainability itself (e.g., through the calculation of the MI) is mostly done by calculating source code metrics (e.g., Cyclomatic Complexity (CC)) and evaluating whether a certain threshold is exceeded. These metrics are calculated on the unit level and aggregated later on. However, these metrics can also be used to test the maintainability of single units (e.g., by defining a threshold on the unit level).
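To make this threshold-based form of testing concrete, the following minimal sketch applies the commonly cited three-metric MI formula by Oman et al. on the unit level. The coefficients are the ones typically reported in the literature and may differ from the calibration used by a concrete tool; the threshold of 65 and the metric values in the example are purely illustrative assumptions.

```python
import math

def maintainability_index(halstead_volume: float,
                          cyclomatic_complexity: float,
                          lines_of_code: int) -> float:
    """Classic three-metric Maintainability Index; the coefficients are
    the commonly cited ones and may deviate from a tool's calibration."""
    return (171
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(lines_of_code))

def unit_is_maintainable(halstead_volume: float,
                         cyclomatic_complexity: float,
                         lines_of_code: int,
                         threshold: float = 65.0) -> bool:
    """Threshold-based maintainability test for a single unit.
    The threshold of 65 is only an illustrative choice."""
    return maintainability_index(halstead_volume,
                                 cyclomatic_complexity,
                                 lines_of_code) >= threshold

# Hypothetical unit with Halstead volume 1200, CC 12, and 150 LOC.
print(unit_is_maintainable(1200, 12, 150))
```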

Spillner et al. [9] describe that maintainability should be tested via reviews on the unit level. Extensive research is also done in this direction to improve the quality of reviews or to make the reviewing process more efficient. There exist approaches to recommend reviewers for code parts (e.g., [356, 357]) or to improve the overall review process (e.g., [358]). It is possible to perform code reviews on the unit level. On which level reviews are performed (i.e., unit, integration, or system level) depends strongly on the employed reviewing process and the system under review. However, most code reviews are done after a change by a developer, i.e., one developer reviews the change of another one and, therefore, several code units may be affected by the review.

7.2.2.2. Practical View

The interest within the developer community in maintainability testing seems to be limited. We found more resources discussing the topic of maintainable tests (e.g., [359, 360]) than maintainability testing. However, there exist several resources that focus on this topic.

One resource is a blog post by Nupul Kukreja [361]. Within his post, he defines the term maintainability and its sub-characteristics. In addition, Kukreja provides a table for each of these sub-characteristics, listing different code quality/maintainability metrics together with their correlation to quality, importance, feasibility of automated evaluation, ease of automated evaluation, completeness of automated evaluation, and units. He makes clear that the maintainability of software code is very complex and therefore hard to assess via different metrics. However, there exist metrics with which it is possible to assess several aspects of maintainability. These metrics are not only unit-level metrics (e.g., CC or unit length), but also metrics that are calculated across all units (e.g., coupling, cohesion).

Furthermore, Kukreja highlights the importance of the MI, which we already described within the scientific view (Section 7.2.2.1).
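As a simple illustration of such a cross-unit metric, the following sketch counts the number of distinct modules that a Python source file imports, which can serve as a rough proxy for efferent coupling. The directory name src and the use of import counts as a coupling proxy are illustrative assumptions, not a prescribed measurement procedure.

```python
import ast
from pathlib import Path

def efferent_coupling(source: str) -> int:
    """Number of distinct top-level modules a source file depends on
    (a rough proxy for efferent coupling)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
    return len(imported)

# Report the coupling proxy for every Python file below a (hypothetical) src directory.
for path in Path("src").rglob("*.py"):
    print(path, efferent_coupling(path.read_text(encoding="utf-8")))
```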

During our analysis we found several static source code analysis tools that are able to calculate (parts of) the metrics that are mentioned by Kukreja and that are used to calculate the MI (e.g., [354, 362, 363]). Two of these tools are SonarQube [362] and Checkstyle [363].

SonarQube [362] is a platform that is able to statically analyze source code to assess its quality. The results of this analysis are then presented on a website that can be browsed by developers. It calculates metrics like the number of duplicated lines of code, the complexity of code, or the comment lines of code. Furthermore, SonarQube can calculate metrics that are connected to maintainability, e.g., it is able to calculate a "Maintainability Rating", which is a rating based on the ratio between the estimated time to fix all maintainability issues of the code and the code size. All these maintainability-related metrics that are calculated by SonarQube are calculated for the whole system and not for the units themselves.
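The computation behind such a rating can be sketched as follows. The letter-grade thresholds and the cost-per-line constant below mirror SonarQube's documented defaults as far as we are aware, but they are configurable and may differ between versions, so they should be treated as assumptions rather than fixed values.

```python
def maintainability_rating(remediation_minutes: float,
                           lines_of_code: int,
                           minutes_per_line: float = 30.0) -> str:
    """Letter rating derived from the technical-debt ratio, i.e. the
    estimated remediation effort divided by the estimated development
    cost (code size times a cost-per-line constant). Thresholds and the
    default of 30 minutes per line follow SonarQube's documented
    defaults, which are configurable."""
    development_cost = lines_of_code * minutes_per_line
    debt_ratio = remediation_minutes / development_cost
    if debt_ratio <= 0.05:
        return "A"
    if debt_ratio <= 0.10:
        return "B"
    if debt_ratio <= 0.20:
        return "C"
    if debt_ratio <= 0.50:
        return "D"
    return "E"

# Hypothetical system: 10,000 LOC with an estimated 18,000 minutes of
# remediation effort yields a debt ratio of 0.06 and thus rating "B".
print(maintainability_rating(18_000, 10_000))
```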

Checkstyle [363] is a static source code analysis tool that is able to check the adherence of code to given style guidelines. For example, it can determine whether the braces of loops are placed correctly (i.e., in accordance with the style guideline given to Checkstyle). Through this, Checkstyle can contribute to assessing the maintainability of source code, but it cannot assess all of its facets.

Another technique that is often discussed for maintainability testing is code review. There exist several platforms that can assist developers with performing reviews, e.g., CodeFlow [364], GitHub [180], or Gerrit [365]. All of these tools and platforms have in common that changes that are committed to the VCS are reviewed. While it is possible to review only changes for one unit, this is often not the case as several units are changed within one commit. Therefore, most of these reviews cannot be assigned to the unit level alone, but rather affect several units at once.