
6. Case Studies 67

6.1.3. A Network-Based Model

In this experiment, the network-based model described in Section 4.2.3 is evaluated.

Because this model builds on the model for simulating the lifetime of bugs presented in Section 4.2.2, the latter is included in the evaluation as well.

This experiment is based on [5], which analyzes the following research question: "Can we simulate effects like the loss of a core developer realistically?" To assess the state of the entire software under simulation, we consider the number of open and fixed bugs. Thus, the lifetime of bugs is an important factor for the evaluation of the entire software system under simulation.

Setup: To initialize the simulation model, parameters for the contribution behavior and the bugfix frequency of the different developer roles are required. Furthermore, the bug-introducing probability for each modeled bug type and the number of initial categories must be determined. The mining process for this experiment is based on the mining framework described in [115]. To adapt the simulation, only the parameters based on mining are changed between runs.

For this experiment, three projects of similar size and duration are examined. However, these projects differ in the effort spent by their developers. A complete data set to instantiate the model is available for K3b¹ from previous studies. For the other two projects, Log4j² and Kate³, only the parameters for project size, number of developers of each type, project duration, and number of initial clusters are determined. All other parameters are fixed and based on K3b. An overview of several parameters is presented in Table 6.1.

Project   Commits   Files   Developers     Duration
K3b       6142      1046    (1|1|6|116)    12
Log4j     8082      620     (1|1|6|13)     13
Kate      14282     681     (1|1|10|328)   11

Table 6.1.: Overview of project parameters. Developers are given as (core|maintainer|major|minor); duration is in years. Adapted from [5].

Results: From the mining perspective, the essential parameters for simulating the loss of a core developer are information about the team constellation and the bug-introducing and bug-fixing rates. The latter are used to monitor the impact on software quality. According to [111], changes in the team constellation also influence the quality of the software.

The commit behavior of the developer types is shown in Figure 6.4 for K3b, in Figure 6.5 for Kate, and in Figure 6.6 for Log4j. The following heuristics are based on these results: core developers perform more than 20% of all commits; more than 25% of the commits of a maintainer are bugfixes; major developers contribute more than 2% of all commits; minor developers perform fewer commits. The number of developers of each type shown in Table 6.1 is based on these heuristics.
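The heuristics above can be sketched as a small classification function. This is only an illustration: the order in which the rules are checked, the requirement that a maintainer also exceeds the 2% commit share, and all names (`CommitStats`, `classify_developer`) are assumptions that are not taken from [5].

```python
from dataclasses import dataclass


@dataclass
class CommitStats:
    commits: int   # commits authored by this developer
    bugfixes: int  # how many of these commits are bugfixes


def classify_developer(stats: CommitStats, total_commits: int) -> str:
    """Assign a developer role from the commit share and the bugfix ratio."""
    share = stats.commits / total_commits if total_commits else 0.0
    bugfix_ratio = stats.bugfixes / stats.commits if stats.commits else 0.0

    if share > 0.20:  # core: more than 20% of all commits
        return "core"
    # Assumption: a maintainer must also exceed the 2% share, otherwise a
    # developer with a single bugfix commit would count as maintainer.
    if bugfix_ratio > 0.25 and share > 0.02:
        return "maintainer"
    if share > 0.02:  # major: more than 2% of all commits
        return "major"
    return "minor"    # everyone else
```

For a hypothetical project with 1000 commits, `classify_developer(CommitStats(300, 30), 1000)` yields `"core"`, while a developer with five commits is classified as `"minor"` regardless of the bugfix ratio.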

Since the change coupling graph represents files that are repeatedly changed together, and semantically related files form clusters in this graph [2], we add the number of clusters to the parameter set of the simulation. We are more interested in clusters of higher dependency. Thus, we omit clusters with fewer than 5% of the nodes of the graph. The evolution of the number of clusters of the three projects is depicted in Figure 6.7. After strong growth in the beginning, the number of larger clusters settles between three and six for the analyzed projects.
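A minimal sketch of this step is given below. It assumes that a commit is available as a list of file names, that two files are coupled once they were changed together at least `min_cochanges` times, and that clusters are approximated by connected components. The actual clustering method and threshold used in the study are not stated here, so these choices and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coupling_graph(commits, min_cochanges=2):
    """Return an adjacency dict of files that were changed together repeatedly."""
    counts = defaultdict(int)
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_cochanges:  # keep only repeated co-changes
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_clusters(graph):
    """Assumption: a cluster is a connected component of the coupling graph."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def large_clusters(clusters, min_share=0.05):
    """Omit clusters that contain fewer than min_share of all nodes."""
    total = sum(len(c) for c in clusters)
    return [c for c in clusters if len(c) >= min_share * total]
```

The number of initial clusters for the simulation would then be `len(large_clusters(connected_clusters(graph)))`, evaluated on the graph mined at the chosen point in time.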

² http://logging.apache.org/log4j

³ https://www.kde.org/applications/utilities/kate

73 6.1. Simulating Software Evolution using an Agent-Based Model

Figure 6.4.: Commits by developer type of K3b per month adapted from [5].

Figure 6.5.: Commits by developer type of Kate per month adapted from [5].

To model bugs, we use average values for the overall numbers of reported and closed bugs managed in ITSs. We consider the bug types major, normal, and minor. Other types occurring in the ITS are assigned to one of these three. For example, the reported and closed bugs of Kate are depicted in Figure 6.8. For the instantiation of the model, we use average values of all three projects. These are 0.87 reported and 0.81 fixed bugs per day, where one day corresponds to one round in the simulation.
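To illustrate how such per-round averages can drive a simulation, the following sketch treats each rate as the probability of one event per round. This Bernoulli approximation, the seed handling, and the function name are assumptions made for illustration only; the simulation model itself may draw bug reports and fixes differently.

```python
import random

REPORT_RATE = 0.87  # average reported bugs per round (one round = one day)
FIX_RATE = 0.81     # average fixed bugs per round


def simulate_bug_counts(rounds, seed=0):
    """Return (reported, fixed, open) bug counts after the given number of rounds."""
    rng = random.Random(seed)
    reported = fixed = 0
    for _ in range(rounds):
        # Assumption: at most one bug is reported per round.
        if rng.random() < REPORT_RATE:
            reported += 1
        # A fix is only possible while at least one bug is open.
        if reported > fixed and rng.random() < FIX_RATE:
            fixed += 1
    return reported, fixed, reported - fixed
```

Because a fix requires an open bug, the effective fix rate in this sketch can fall slightly below 0.81. With both rates below one, the number of open bugs grows slowly over the simulated project duration.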

With these parameters, we can instantiate the simulation models. First, we simulate the reference project K3b. Afterwards, we change particular parameters according to the mined data for each of the other projects and simulate them. The parameters to be changed are the project size and duration as well as the number of developers per type. To answer the question whether the loss of a core developer can be simulated, we perform two simulation runs for each project: first, a run without changes to the mined parameters; second, a run in which the core developer leaves the project after half of the duration. In the second run, 12% to 20% fewer bugs are fixed because of the reduced effort.
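The comparison of the two runs can be sketched with expected values. The sketch assumes that the fix rate drops linearly by the core developer's share of the bugfixing effort once the developer has left. The share of 0.3 is a purely illustrative value, not a mined parameter, and the function name is hypothetical.

```python
def expected_fixed_bugs(rounds, fix_rate, core_fix_share=0.0, leave_round=None):
    """Expected number of fixed bugs; the core developer leaves at leave_round."""
    total = 0.0
    for r in range(rounds):
        core_gone = leave_round is not None and r >= leave_round
        # Assumption: the fix rate shrinks by the core developer's share.
        total += fix_rate * (1.0 - core_fix_share) if core_gone else fix_rate
    return total


rounds = 12 * 365  # 12 years of daily rounds, as for K3b in Table 6.1
baseline = expected_fixed_bugs(rounds, 0.81)
with_loss = expected_fixed_bugs(
    rounds, 0.81, core_fix_share=0.3, leave_round=rounds // 2
)
reduction = 1.0 - with_loss / baseline  # relative loss of fixed bugs
```

Under this linear assumption, the relative reduction equals the core developer's share multiplied by the fraction of the project duration that remains after the departure.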

The simulated growth, measured in the number of files, fits well for the project K3b, as depicted in Figure 6.9. The other analyzed projects have a similar growth trend that can be reproduced as well. Projects with other growth trends cannot be represented with this setup because the remaining parameters are fixed to the values mined from K3b.

Figure 6.6.: Commits by developer type of Log4j per month.

Figure 6.7.: Number of clusters over time.

Furthermore, we analyzed in [8] how change coupling networks generated by an agent-based simulation evolve in comparison to the ones originating from the project history. This study is part of the work presented in [142]. The setup is similar to this experiment, and K3b is used as the reference project. The simulation results are validated with results of Log4j. We compared several graph metrics such as the coupling degree and the modularity. As a result, we found that the general evolution of file dependencies can be represented by a network-based simulation model. However, the modularity deviates in part from that of the real project.
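For readers unfamiliar with these metrics, the following sketch computes node degrees and the standard Newman modularity for an unweighted, undirected graph given as an edge list. Interpreting the coupling degree as the plain node degree is an assumption, and the exact metric definitions used in [8] may differ.

```python
def degrees(edges):
    """Number of coupling edges incident to each file."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def modularity(edges, communities):
    """Newman modularity: sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    if m == 0:
        return 0.0
    deg = degrees(edges)
    membership = {node: i for i, c in enumerate(communities) for node in c}
    q = 0.0
    for i, community in enumerate(communities):
        # L_c: edges with both endpoints inside community i
        internal = sum(
            1 for a, b in edges if membership[a] == i and membership[b] == i
        )
        # d_c: sum of the degrees of all nodes in community i
        degree_sum = sum(deg.get(node, 0) for node in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q
```

Comparing these values for a simulated and a mined change coupling graph at the same point in time gives a simple indication of how closely the simulated dependency structure follows the real one.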