
3.1.2 Metric-Based Detection of Variability-Aware Code Smells

3. Analyzing the Impact of Preprocessor Directives on Source Code Quality

by almost all participants. This result is consistent with the nature of this smell, as usually a coherent piece of code is duplicated, which does not hinder understandability but increases maintenance effort. In contrast, the smell AnnotationBundle is perceived as (very) problematic with respect to program comprehension by almost all participants. Overall, with some minor exceptions, all variability smells are considered at least problematic by the majority of participants for all three of the considered quality aspects. While the clarity of this result was surprising, it indicates that there is a need to identify such variability-aware code smells and, if possible, to remove them.

3.1. Investigating Variability-Aware Code Smells in-the-wild 27

Table 3.2: Metrics capturing basic characteristics of the Annotation Bundle code smell (adapted from [18])

| Abbrev. | Full Name | Description |
|---|---|---|
| LOC | Lines of code | Source lines of code of the function, ignoring blank lines and comments. |
| LOAC | Lines of annotated code | Source lines of code in all feature locations within the function. Lines that occur in a nested feature location are counted only once. Again, blank lines and comments are ignored. |
| CND | Cumulative nesting depth | Nesting depth of annotations, accumulated over all feature locations within the scope. An #ifdef that is not enclosed by another #ifdef is called a top-level #ifdef and has a nesting depth of zero; an #ifdef within a top-level #ifdef has a nesting depth of one, and so on. Nesting values are accumulated, which means that a function containing two feature locations with a nesting depth of one is assigned a CND value of 2. |
| FCdup | Number of feature constants | Number of feature constants, accumulated over all feature locations within the scope. Feature constants that occur in multiple feature locations are counted multiple times. |
| FL | Number of feature locations | Number of blocks annotated with an #ifdef. An #ifdef containing a complex expression (e. g., #ifdef A && B) counts as a single feature location. An #ifdef with an #else/#elif branch counts as two locations. |
| NEG | Negation | The number of negations in the #ifdef directives in a function. Both #ifndef X and #if !defined(X) increase NEG by 1. #else branches also increase NEG because #if <expr> ... #else ... is treated as #if <expr> ... #endif #if !<expr> ... |
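The counting rules for FL, CND and NEG in Table 3.2 can be illustrated with a small sketch. This is not how cppstats or Skunk compute the metrics; it is a simplified, line-based reading of the table that assumes well-formed, balanced directives. The function name `directive_metrics` is hypothetical, and the decision to count each `!` operator in an `#if`/`#elif` expression as one negation is an assumption.

```python
import re

# Conditional-compilation directives; whitespace after '#' is tolerated.
DIRECTIVE = re.compile(r"#\s*(ifdef|ifndef|if|elif|else|endif)\b(.*)")


def directive_metrics(lines):
    """Return FL, CND and NEG for the source lines of one function.

    Simplified sketch: assumes balanced directives and ignores comments.
    """
    fl = cnd = neg = 0
    depth = -1  # depth of the innermost open block; -1 means outside any #if
    for raw in lines:
        m = DIRECTIVE.match(raw.strip())
        if m is None:
            continue
        kind, expr = m.groups()
        if kind == "endif":
            depth -= 1
            continue
        if kind in ("if", "ifdef", "ifndef"):
            depth += 1  # a top-level block has nesting depth 0
        fl += 1         # every #if/#ifdef/#ifndef/#elif/#else branch is one location
        cnd += depth    # accumulate the nesting depth of this location
        if kind in ("ifndef", "else"):
            neg += 1
        elif kind in ("if", "elif"):
            # Assumption: each '!' operator (but not '!=') is one negation.
            neg += len(re.findall(r"!(?!=)", expr))
    return {"FL": fl, "CND": cnd, "NEG": neg}


example = """
#ifdef A
  a();
#ifdef B
  b();
#endif
#ifndef C
  c();
#endif
#endif
""".splitlines()

print(directive_metrics(example))  # {'FL': 3, 'CND': 2, 'NEG': 1}
```

The example mirrors the CND case from the table: two feature locations at nesting depth one inside a top-level #ifdef yield a CND value of 2.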

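\[
\mathit{AB}_{\mathit{smell}} = w_1 \cdot \frac{\mathit{LOAC}}{\mathit{LOC}} \cdot \mathit{FL} + w_2 \cdot \frac{\mathit{FC}_{\mathit{dup}}}{\mathit{FL}} + w_3 \cdot \frac{\mathit{CND}}{\mathit{FL}} \tag{3.1}
\]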

The equation consists of three terms, which capture the following characteristics. The first term mainly captures the amount of variable code, also taking into account how many variable parts exist in a function, and thus accounts for scattering. The second term addresses the number of preprocessor variables and how they are distributed over annotated code fragments. As a result, this term provides a way to relate scattering and tangling and to integrate both of them into our smell definition as a proxy for complexity. The third term takes the nesting depth into account, as this has been shown to affect the comprehension of the nested code fragments.
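As a worked example of the three terms, the following sketch evaluates the metric for one function. It reads Equation 3.1 as w1 · (LOAC/LOC) · FL + w2 · (FCdup/FL) + w3 · (CND/FL). The function name `ab_smell`, the default weights of 1, and the choice to return 0 for functions without feature locations are assumptions made for illustration only.

```python
def ab_smell(loc, loac, fl, fcdup, cnd, w1=1.0, w2=1.0, w3=1.0):
    """Evaluate the three-term smell metric for a single function.

    Assumption: a function without code or without feature locations scores 0.
    """
    if loc == 0 or fl == 0:
        return 0.0
    variable_code = w1 * (loac / loc) * fl  # amount and scattering of variable code
    constants = w2 * (fcdup / fl)           # feature constants per feature location
    nesting = w3 * (cnd / fl)               # average nesting depth per location
    return variable_code + constants + nesting


# A hypothetical function: 100 lines, half of them annotated, spread over
# 4 feature locations that mention 6 feature constants, with CND = 2.
score = ab_smell(loc=100, loac=50, fl=4, fcdup=6, cnd=2)
print(score)  # 4.0  (0.5 * 4 + 6 / 4 + 2 / 4)
```

Doubling `w3` in this example raises the score from 4.0 to 4.5, which shows how a weight shifts the influence of a single term without changing the others.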

Finally, each term is complemented by a weight, which makes it possible to control the influence that a particular term has on the overall metric. The reason is that different developers, or different projects, perceive differently what makes a function an AnnotationBundle. Consequently, introducing weights allows for parameterization of the actual metric-based definition of a smell. Besides the weights, we also provide customizable thresholds for each of the atomic metrics in Table 3.2. These thresholds constitute lower boundaries, that is, if the metric is below the threshold, it is not considered for the computation of the overall metric ABsmell. This way, we aim at reducing false positives in the result set.

Figure 3.3: Overview of the (pipe&filter) architecture of Skunk, implementing the metric-based code smell detection.

Implementation. We implemented our technique in the tool Skunk² with a pipe&filter architecture so that any intermediate results are available for further usage. As shown in Figure 3.3, the tool consists of two stages: preprocessing and smell detection. In the preprocessing stage, the source code of the system of interest is analyzed. To this end, we employ two existing tools: cppstats³ and srcML.⁴ With cppstats, we extract most of the variability-related information that we need for our metrics, such as the location of preprocessor directives or the preprocessor variables involved. However, we need additional information, such as the location of function definitions or information about function calls. We obtain this information from srcML, which provides a bootstrapped Abstract Syntax Tree (AST) in XML format. Most notably, srcML provides this AST with all preprocessor directives included, which is pivotal for any variability-aware analysis.

² https://github.com/wfenske/Skunk

³ http://www.fosd.net/cppstats/

⁴ http://srcml.org/

Based on the results of this preprocessing stage, Skunk performs the actual code smell detection. To this end, we first extract all relevant information and compute the metrics that are relevant for detecting a particular code smell (the Feature Syntax + Metric part in Figure 3.3). Then, the actual smell detection process starts. For this step, we provide a code smell template for each of our proposed variability smells. Such a template not only specifies the metrics that are relevant for the smell to be detected; it also allows the user to specify the aforementioned thresholds for each of these metrics. For instance, a user can define the threshold for the LOAC/LOC ratio to be at least 50%.

As a result, any function that does not reach this threshold is discarded from the further smell detection. Moreover, in this step the smell metric can be parameterized, that is, the weights can be adjusted. Finally, the code smell template and the variability-related information are used to compute the smell metric (here: ABsmell), and the value is stored together with the function name and location. These results can then be investigated by the user to identify smelly functions and to initiate possible countermeasures.
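The detection step described above (apply thresholds, compute the weighted metric, store the value with the function name) can be sketched as follows. This is not Skunk's actual template format or API: the classes `FunctionMetrics` and `SmellTemplate`, their field names, and the default threshold values are all hypothetical. The metric is read as w1 · (LOAC/LOC) · FL + w2 · (FCdup/FL) + w3 · (CND/FL).

```python
from dataclasses import dataclass


@dataclass
class FunctionMetrics:
    """Per-function metrics from the preprocessing stage (hypothetical record)."""
    name: str
    loc: int
    loac: int
    fl: int
    fcdup: int
    cnd: int


@dataclass
class SmellTemplate:
    """Thresholds (lower bounds) and weights; names and defaults are assumptions."""
    min_loac_ratio: float = 0.5
    min_fl: int = 1
    w1: float = 1.0
    w2: float = 1.0
    w3: float = 1.0


def detect(functions, template):
    """Return (name, score) pairs, highest score first."""
    results = []
    for f in functions:
        if f.loc == 0 or f.fl == 0:
            continue  # nothing annotated, and avoids division by zero
        if f.fl < template.min_fl:
            continue  # below the FL threshold: discarded
        if f.loac / f.loc < template.min_loac_ratio:
            continue  # below the LOAC/LOC threshold: discarded
        score = (template.w1 * f.loac / f.loc * f.fl
                 + template.w2 * f.fcdup / f.fl
                 + template.w3 * f.cnd / f.fl)
        results.append((f.name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


functions = [
    FunctionMetrics("parse_opts", loc=40, loac=30, fl=2, fcdup=2, cnd=0),
    FunctionMetrics("main_loop", loc=100, loac=50, fl=4, fcdup=6, cnd=2),
    FunctionMetrics("init", loc=100, loac=10, fl=2, fcdup=2, cnd=0),  # ratio 10%
]
ranked = detect(functions, SmellTemplate())
print(ranked)  # [('main_loop', 4.0), ('parse_opts', 2.5)]
```

In this sketch, `init` is dropped because only 10% of its lines are annotated, which mirrors the 50% LOAC/LOC example in the text.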

Evaluation. To evaluate our technique, we conducted a case study on five open-source systems, among them long-living and popular systems such as vim or emacs.

With this evaluation, we aim at answering the following research questions:

RQ 1 Does our algorithm detect meaningful instances of the AnnotationBundle smell?

RQ 2 Does the AnnotationBundle smell exhibit recurring, higher-level patterns of usage?

While RQ 1 is focussed more on the accuracy of our detection technique (quantitative analysis), RQ 2 is more about certain characteristics that contribute to the overall pattern (qualitative analysis). For the evaluation, we set up Skunk with a parameterized code smell template, based on our experiences with different variant-rich systems (cf. [18]). Then, we ran the tool on the five subject systems, all of them with a size of more than 100 KLOC and an amount of variable code between 20% and 70%. As a result, we obtain a list for each system that contains the function, its location, and the computed ABsmell metric value (in descending order). Since there is no experience or baseline with variability-aware code smells, and because our metric is not normalized to a fixed range (e. g., between 0 and 1), it is difficult to determine which metric values indicate a real smell (true positive) and which values can be neglected (false positive).

To overcome this uncertainty, we performed a manual analysis by sampling the results in the following way: First, for each system, we selected the 10 entries with the highest ABsmell value. Second, we divided the remaining entries of the result list into 10 equally sized segments and then randomly selected one entry per segment. As a result, we obtain 20 smells for each system, which have been manually analyzed by me and another author (cf. [18]). To this end, we independently evaluated each smell regarding its impact on understandability and maintainability on a three-point scale ({−1, 0, 1}). Eventually, we compared our results and, in case of disagreement on a particular smell, discussed our ratings to find a consensus.
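The sampling scheme can be sketched as follows, assuming the result list is already sorted by descending ABsmell value. The function name `sample_for_review`, the fixed random seed, and the way segment boundaries are rounded when the list does not divide evenly are assumptions; the source only states that the segments are equally distributed.

```python
import random


def sample_for_review(ranked, top_n=10, segments=10, seed=0):
    """Pick the top entries plus one random entry from each segment of the rest.

    `ranked` must be sorted by descending metric value.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    top = ranked[:top_n]
    rest = ranked[top_n:]
    picks = []
    for i in range(segments):
        # Integer arithmetic gives near-equal segments for any list length.
        start = i * len(rest) // segments
        end = (i + 1) * len(rest) // segments
        if start < end:  # skip empty segments in very short lists
            picks.append(rng.choice(rest[start:end]))
    return top + picks


# 110 hypothetical result entries, identified by their rank.
sample = sample_for_review(list(range(110)))
print(len(sample))  # 20
```

Sampling one entry per segment spreads the manual review over the whole range of metric values instead of concentrating it on the highest-ranked functions.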

In Figure 3.4, we show a summary of the quantitative results. Except for the subject system PHP, our results indicate that our definition of the AnnotationBundle smell can be considered appropriate and that our detection technique captures the characteristics of this smell with high accuracy. In particular, the top-10 entries of the result list have been shown to be very good indicators, with an accuracy of more than 70%. Hence, we argue for RQ 1 that our technique is capable of detecting meaningful instances of the smell AnnotationBundle.

Figure 3.4: Overview of the quantitative results for the evaluation. The overall number of smells detected for each system is indicated by the value ABpot.

For our qualitative analysis, which relies on the manual inspection of the selected code smells (see above), we made several observations about the characteristics of the smell as well as about possible reasons why it has been introduced. Among others, we observed the following (more details about the observations can be found in [18]).

• There is no single reason, and thus no single metric, that causes the AnnotationBundle smell. Rather, it is usually a combination of several characteristics that contributes to a smell.

• One particular aspect that fosters the occurrence of our smell is the interaction of preprocessor variability with runtime variability (i. e., conditional statements of the host language). The reason is that already complex control flow is obfuscated even more when annotated with preprocessor directives.

• Long functions are more prone to constitute a smell than short functions.

• A form of the Adapter pattern [21] is a recurring pattern that is likely to introduce the AnnotationBundle smell.

• Some functions constitute Featurized God Functions, a form of the God Class smell [20]. They are characterized not only by their length, but also by comprising many diverse features tangled up with each other and scattered across

3.2. How Preprocessor-Based Variability Affects Maintenance 31