4. Research Methodology

4.2. Data Collection

4.2.4. Extracting the Test Level

We developed three different rule sets to classify tests into unit and integration tests. The first two rule sets (ISTQB and IEEE) are based on the definitions of the ISTQB and the IEEE (Section 2.1.2), whereas the third rule set (DEV) represents the developer classification of a test and is based on popular coding conventions [119, 120].

Regardless of the rule set that is used, the input is the same for all three: coverage data that is recorded separately for each test. Hence, the coverage recording includes a list of executed tests together with their covered units. Beforehand, we filter out every test class that might be in the coverage data, as we want to base our test level assignment only on the covered production classes. We look at the path of each covered unit: if this path contains the word “test”, we assume that the unit is a test class; otherwise, we assume that it is a production class. We also exclude units that have “test” or “validate” as part of their name. A unit counts as covered if at least one of its methods was covered by the test. In addition, we only record the coverage of units that are within the project. Hence, if a test or another unit calls functions from a framework that are outside the scope of the project, we do not record coverage for these calls.

The reasoning behind this is that we want to focus on the project that we are analyzing and on what is really tested inside this project. Besides this filtering of the tested units, we also need to filter out tests that cannot be classified correctly. Hence, we filter out:

• tests that are skipped by the test execution framework.

• tests that are empty.

• tests that only test constants but no functions.

• tests that test other projects but not the project at hand (e.g., tests that ensure the correct working of java.io).

• tests that test the test setup itself (e.g., testing if the database is correctly running, but not executing any production code).

• tests where an exception is thrown directly after the first method call (if this is the case, no coverage gets collected and we cannot classify the test).
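The filtering described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the class CoverageFilter, its method names, the representation of coverage as a map from test name to covered unit paths, and the case-insensitive matching are all hypothetical. It also assumes that most of the listed cases (empty tests, tests of constants only, tests of other projects or of the test setup, and tests that fail immediately) show up as tests without any covered production unit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the coverage pre-filtering; all names are illustrative. */
final class CoverageFilter {

    /**
     * Heuristic from the text: a path containing "test" marks a test class,
     * and units with "test" or "validate" in their name are excluded.
     * Matching case-insensitively is an assumption of this sketch.
     */
    static boolean isProductionUnit(String unitPath) {
        String path = unitPath.toLowerCase(Locale.ROOT);
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        // The file name is part of the path, so "test" in the name is already covered.
        return !path.contains("test") && !fileName.contains("validate");
    }

    /** Keeps only production units per test and drops tests without remaining coverage. */
    static Map<String, List<String>> filter(Map<String, List<String>> coveredUnitsByTest) {
        Map<String, List<String>> classifiable = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : coveredUnitsByTest.entrySet()) {
            List<String> productionUnits = new ArrayList<>();
            for (String unitPath : entry.getValue()) {
                if (isProductionUnit(unitPath)) {
                    productionUnits.add(unitPath);
                }
            }
            // Assumption: a test without covered production units cannot be classified.
            if (!productionUnits.isEmpty()) {
                classifiable.put(entry.getKey(), productionUnits);
            }
        }
        return classifiable;
    }
}
```

Note that a plain substring check is deliberately simple and would also exclude a production unit whose path merely contains the letters "test".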

The resulting tests and their tested units are then used to assign a test level to each test separately. Table 4.3 shows the rules that we designed for each rule set. A test is a unit test with respect to the IEEE rule set if it only covers units from within one package. This represents the IEEE definition, where a unit test is a test that tests several units that are logically connected. In Java, related units are put into one package, as the official Java documentation states [213]. The same goes for Python projects [214]. However, if the test covers units from more than one package, it is classified as an integration test. The ISTQB definition is stricter than the IEEE definition. A test is a unit test with respect to the ISTQB rule set if it covers only one unit. If it covers more than one, the test is classified as an integration test. We developed several rules to represent the developer classification of a test based on coding conventions. If a test matches the name of a unit (e.g., if we have a unit called Fnatic and a test called FnaticTest) or the path to the test has the term “unit” in it (e.g., src/test/java/de/ugoe/unit/FnaticTest.java), it is classified as a unit test. If there is no unit matching the name of the test or the path to the test has the term “integration” or “IT” in it (e.g., src/test/java/de/ugoe/FnaticITTest.java), it is classified as an integration test. It is important to mention that we do not classify the intent of a test, but its actual type according to the definitions. Hence, we do not evaluate whether the test should be a unit test, but whether it is a unit (or integration) test.

Figure 4.4 gives some examples of our classification schema. The figure depicts four schematic call graphs. The first graph in Figure 4.4 shows a test that only calls one unit from within one package. If we apply the classification schema explained above, the test gets classified as a unit test for the IEEE and ISTQB definitions. This is different for the second graph in Figure 4.4. Here, two different units are called from test t1, which both reside in one package. Hence, the test gets classified as a unit test for the IEEE definition, but as an integration test for the ISTQB definition. The third graph in Figure 4.4 depicts a test that is classified as an integration test for both definitions, because the test calls two different units from two different packages. A more complex example is given in the fourth graph in Figure 4.4. While the test only calls one unit directly, this unit calls other units from within other packages. Hence, the coverage data for test t1 would include all four units from two different packages. Therefore, this test gets classified as an integration test for both definitions.
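The two coverage-based rule sets can be illustrated with a minimal Java sketch. The class CoverageRuleSets, its method names, and the representation of the per-test coverage as a map from covered unit to its package are hypothetical and do not describe the actual implementation. The sketch assumes that tests without any covered unit have already been filtered out.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the IEEE and ISTQB rule sets; all names are illustrative. */
final class CoverageRuleSets {

    enum TestLevel { UNIT, INTEGRATION }

    /** ISTQB rule: a test that covers exactly one unit is a unit test. */
    static TestLevel istqb(Map<String, String> packageByCoveredUnit) {
        requireCoverage(packageByCoveredUnit);
        return packageByCoveredUnit.size() == 1 ? TestLevel.UNIT : TestLevel.INTEGRATION;
    }

    /** IEEE rule: a test that only covers units from one package is a unit test. */
    static TestLevel ieee(Map<String, String> packageByCoveredUnit) {
        requireCoverage(packageByCoveredUnit);
        Set<String> packages = new HashSet<>(packageByCoveredUnit.values());
        return packages.size() == 1 ? TestLevel.UNIT : TestLevel.INTEGRATION;
    }

    /** Assumption: unclassifiable tests (no coverage) were removed beforehand. */
    private static void requireCoverage(Map<String, String> packageByCoveredUnit) {
        if (packageByCoveredUnit.isEmpty()) {
            throw new IllegalArgumentException("test covers no production unit");
        }
    }

    public static void main(String[] args) {
        // Second call graph of Figure 4.4: t1 covers two units that share package P1.
        Map<String, String> graph2 = Map.of("u1", "P1", "u2", "P1");
        System.out.println("IEEE: " + ieee(graph2) + ", ISTQB: " + istqb(graph2));
    }
}
```

For the second call graph, the sketch prints "IEEE: UNIT, ISTQB: INTEGRATION". Because the input is the recorded coverage and not the list of direct calls, the fourth call graph (one direct call, four covered units in two packages) yields an integration test under both rule sets.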

Rule set  Unit Test Classification Rules                 Integration Test Classification Rules
IEEE      • Only covers units from within one package    • Covers units from more than one package
ISTQB     • Only covers one unit                         • Covers more than one unit
DEV       • A unit matching the name of the test exists  • There exists no unit matching the name of the test
          • The path to the test has “unit” in it        • The path to the test has “integration” or “IT” in it

Table 4.3.: Rule sets for our test level classification.
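The DEV rules can be sketched in the same way. The following Java sketch is hypothetical: the class DevRuleSet, the stripping of a trailing "Test" suffix to obtain the name of the tested unit, the plain substring matching, and in particular the precedence between the rules (path markers before name matching, integration markers before the “unit” marker) are assumptions made for illustration, as Table 4.3 does not define how conflicting rules are resolved.

```java
import java.util.Set;

/** Hypothetical sketch of the DEV rule set; names and rule precedence are assumptions. */
final class DevRuleSet {

    enum TestLevel { UNIT, INTEGRATION }

    static TestLevel classify(String testPath, Set<String> unitNames) {
        // Assumed precedence 1: an integration marker in the path wins.
        if (testPath.contains("integration") || testPath.contains("IT")) {
            return TestLevel.INTEGRATION;
        }
        // Assumed precedence 2: a "unit" marker in the path.
        if (testPath.contains("unit")) {
            return TestLevel.UNIT;
        }
        // Otherwise: unit test only if a unit matches the name of the test.
        return unitNames.contains(testedUnitName(testPath))
                ? TestLevel.UNIT
                : TestLevel.INTEGRATION;
    }

    /** Derives the expected unit name, e.g. ".../FnaticTest.java" becomes "Fnatic". */
    static String testedUnitName(String testPath) {
        String fileName = testPath.substring(testPath.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        String className = dot >= 0 ? fileName.substring(0, dot) : fileName;
        return className.endsWith("Test")
                ? className.substring(0, className.length() - "Test".length())
                : className;
    }

    public static void main(String[] args) {
        Set<String> units = Set.of("Fnatic");
        System.out.println(classify("src/test/java/de/ugoe/FnaticTest.java", units));
        System.out.println(classify("src/test/java/de/ugoe/FnaticITTest.java", units));
    }
}
```

With the unit Fnatic, the first path is classified as a unit test because the test name matches the unit, and the second as an integration test because of the “IT” marker. Substring matching is intentionally naive and would, for example, also react to “unit” inside a longer directory name.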

To foster the understanding of our classification approach, we include two different real-world tests from the commons-io project. Listing 4.1 shows an IEEE/ISTQB unit test. This test asserts the correct behavior of the getPrefix function of the FilenameUtils class. More precisely, this test checks whether getPrefix returns the correct string if the input string contains null bytes. As this test only calls one unit (i.e., the FilenameUtils class) and the class itself does not call other units, this test gets classified as a unit test for both definitions.

    [...]
    package org.apache.commons.io;
    [...]

    @Test
    public void testGetPrefix_with_nullbyte() {
        try {
            assertEquals("~user\\", FilenameUtils.getPrefix("~u\u0000ser\\a\\b\\c.txt"));
        } catch (IllegalArgumentException ignore) {
        }
    }

Listing 4.1: Example of a unit test from the commons-io project [185].

On the other hand, Listing 4.2 depicts a test that is classified as an integration test for both definitions. Here, the test checks whether the ByteArrayOutputStream class of commons-io works as intended if it is used within the copy function of the CopyUtils class. As the CopyUtils class is from the org.apache.commons.io package and the ByteArrayOutputStream class is from the org.apache.commons.io.output package, this test gets classified as an integration test (according to both definitions).

Figure 4.4.: Different example call graphs. t1 depicts a test, ux depict different units, and Px different packages. 1) IEEE/ISTQB unit test; 2) IEEE unit test/ISTQB integration test; 3) IEEE/ISTQB integration test; 4) IEEE/ISTQB integration test.

    [...]

Listing 4.2: Example of an integration test from the commons-io project [185].

Besides the classification via per-test coverage data, we also evaluated other classification methods. Unfortunately, none of them could be reused. Orellana et al. [118] proposed to classify tests based on the Maven plugin that executes them (i.e., Maven Surefire [119] or Maven Failsafe [120]) (Section 3.1). The problem with this approach is its applicability, as it can only be applied to Java projects that use Maven and have configured both of the mentioned Maven plugins. Thus, the number of projects that can be analyzed via this approach is rather limited. For example, none of the projects that we have used in our study (Section 4.2.2) make use of the Maven Failsafe plugin, while some of the tests of these projects are indeed integration tests. In addition, the approach by Orellana et al. [118] classifies the whole test suite instead of a single test. A deeper analysis based on tests and not test suites would therefore not be possible.

Another possible classification method is the differentiation of tests based on their assertions. If a test only asserts one unit, it is classified as a unit test, and as an integration test otherwise. However, during the evaluation of this approach, we found that developers sometimes use only one assert, even for integration tests. They call several units during the test, but only assert one unit, e.g., to check whether the communication from one unit to another was working (e.g., by asserting whether a flag in the receiver unit was set).

In contrast to other approaches, our approach is fine-grained, as we classify each test separately instead of only each test suite. Therefore, our level of detail is higher and better reflects the development reality. Furthermore, our approach is dynamic, whereas other approaches rely on static analysis of the tests and production units. The downside is that we need to execute the tests to collect the coverage data before we can analyze them, which can be difficult, as Tufano et al. [215] highlight. Nevertheless, a dynamic analysis has the advantage that we get more reliable results, as a static analysis approach could not handle techniques like dependency injection [122], reflection [122], other dynamic structures (e.g., functions that can be given as parameters of other functions in Python), or the usage of mocking frameworks (e.g., [216, 217]), which are commonly used nowadays.