A Tool Chain for Analysis and Model Abstraction of C Control Programs / submitted by Thomas Böhm

Submitted at: System Software

Supervisor: a.Univ.-Prof. Dipl.-Ing. Dr. Herbert Prähofer

May 2018

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Austria, www.jku.at, DVR 0093696

Master Thesis to obtain the academic degree of Diplom-Ingenieur in the Master’s Program Computer Science

Abstract

This thesis describes the implementation of a tool chain which allows the analysis of C control programs for the purpose of program understanding and model abstraction. Relying on different analysis methods, the tool chain allows analyzing program code and creating different abstract model representations, such as an abstract syntax tree, value sets and a control flow graph. For acquiring those basic model representations, the Frama-C analysis framework has been used. Further, those models are then used to perform a symbolic execution of the program, which identifies all paths of the program and retrieves the conditions for each path. The theorem prover Z3 is used to determine the solvability of the paths. By performing the symbolic execution multiple times, a state model of the program can be created. The analysis results, in particular the created model representations and the state model, can finally help developers in analyzing and understanding the behavior of complex control programs.

Kurzfassung (German Abstract)

This thesis describes the realization of a tool chain for the static analysis of control programs written in the programming language C. The tool chain aims at building model abstractions of C control programs and thereby achieving a better understanding of these programs. Building on existing analysis methods and using the analysis tool Frama-C, different model representations are created, such as an abstract syntax tree, value sets for variables, and control flow graphs. These models are then used to perform a symbolic execution of the program, which identifies all possible paths of the program and determines their conditions. The theorem prover Z3 is used to determine the solvability of the logical path conditions. By symbolically executing the program multiple times, a state diagram is finally created which represents the reactive behavior of the control program. The analysis results can then be used to present the complex behavior of control programs appropriately and to give developers a better overview and understanding of the behavior of the program.

Acknowledgement

I want to thank my supervisor Herbert Prähofer for his helpful feedback in the implementation process, as well as the hints for writing this thesis.

The project has been carried out in cooperation with the Software Competence Center Hagenberg (SCCH). I especially want to thank Josef Pichler for his cooperation and fruitful discussions.

This research has been supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH.

Frama-C [1] is a static analysis tool for C programs. It implements a powerful abstract interpretation method which allows deriving possible values or value ranges of variables as well as possible pointer values for pointer variables at code positions. In this way, Frama-C allows drawing important conclusions about a C program, e.g., finding possible memory access violations or invalid arithmetic expressions.

In this thesis, it should be evaluated how the abstract interpretation methods of Frama-C can be exploited for program understanding and model abstraction. The goal is to use the information provided by Frama-C for building higher-level models and views of a C program. The views and models should then assist a developer in program understanding and defect localization. The methods developed should be evaluated based on several case examples. Further, the C++ frontend for Frama-C, which is currently in an experimental state, should be evaluated. This Master’s Thesis will be conducted in cooperation with and is funded by the Software Competence Center Hagenberg (www.scch.at).

References

[1] Florent Kirchner, Nikolai Kosmatov, Virgile Prevosto, Julien Signoles and Boris Yakobowski: Frama-C, A Software Analysis Perspective. In Formal Aspects of Computing, vol. 27 issue 3, March 2015. http://dx.doi.org/10.1007/s00165-014-0326-7.

a.Univ.Prof. Dr. Herbert Prähofer

Institute for System Software

Chapter 1 Introduction

This thesis describes the implementation of a tool chain which allows the analysis of C control programs for the purpose of program understanding and model abstraction. The tool chain is based on the Frama-C tool which provides different analysis methods [10]. Frama-C has been used to parse C programs, build an abstract syntax tree representation and perform a value analysis. These results are then further processed to finally do a symbolic execution and build a state model of the program.

The tool chain is supposed to support software developers in understanding the complex behavior of C programs implementing automation control components. Such controllers usually reveal complex behavior over time, i.e., the control behavior changes with the changing environmental conditions. Because of that, the concrete control outputs are often difficult to analyze. The tool chain should support the visualization of such complex behavior. This is accomplished by building state model representations of control behavior and by determining symbolic representations of outputs.

1.1 Tool Chain Overview

In the following, I briefly illustrate the tool chain, including a description of the relevant phases. While the main parts are described further in other chapters, I give an overview of the system here and describe the connections between the different steps.

Figure 1.1: Architecture of the system

Figure 1.1 shows the different phases of the tool chain. As we can see, the analysis consists of several steps, represented as blocks, which depend on each other.

The different phases are:

• Frama-C

The left block in Figure 1.1 and therefore the first component in the tool chain is Frama-C. Frama-C is a C code analysis tool [10]. Frama-C is used to parse the code and collect data about the analyzed code. It provides the possibility to be extended by plugins in order to perform additional calculation on the collected data. Some of those plugins are provided by default, like the value analysis plugin. By implementing and adding specific plugins, the tool can be extended. More information on Frama-C can be found in Chapter 3.

• Export Plugin

A plugin was created in order to retrieve additional data, like conditions for statements, and to export the results. That is, this plugin is used to collect information from Frama-C and filter the relevant parts. Furthermore, it finds dependencies between statements, e.g., under which conditions a statement is reachable. As the last step, it exports the collected data to a useful, easy-to-read file format that can be parsed by other applications. More information on the plugin can also be found in Chapter 3.

• Java Wrapper

The Java Wrapper reads the exported data from the export files generated by the export plugin and represents it in several different Java classes. By referencing this Java wrapper component, the following components can access the data and perform further computations. This component of the tool chain is described in more detail in Section 3.3.

• Analysis Data

In the next component of the tool chain, the Java objects are used to create different kinds of program representations of the analyzed program:

– An abstract syntax tree (AST) of the program is created

– A control flow graph (CFG) is built

– Value sets (information about variables and their current value) are calculated

– Based on the AST, a symbolic execution is performed

Those representations, provided as Java objects, can be used by the last part of the tool chain.

• Model Abstraction

In the last step, the analysis results are used for creating visualizations of the program behavior and building abstract high-level models.

1.1.1 Tools & Frameworks

The tool chain uses several program analysis tools and frameworks. As already outlined, as a basis for the implementation we used the C analysis tool Frama-C. For checking conditions for solvability and finding suitable values for variables, we used the Z3 satisfiability solver [3]. For the representation of the AST, the ASTM-Framework is used [15].

• Frama-C

Frama-C is an extensible tool that can be used for static analysis of C code. The developers used OCaml as implementation language [4]. The Frama-C tool can be extended by plugins, which allow the developer to extend the functionality, if needed. Such plugins can also be used to export the collected data [5].

Several plugins are provided by default. A plugin can use other plugins. This fact makes the tool very powerful [5]. Additional information on Frama-C can be found in Chapter 3.

• Z3

Z3 is a theorem prover developed by Microsoft. It allows checking logical expressions for solvability. A logical expression is defined as a set of and-concatenated assertions. If all assertions (and therefore the whole condition) are satisfiable, the tool provides a model (a possible variable assignment) that satisfies the logical term [3].

Z3 is provided as a standalone program or as a jar file that can be imported into a Java project [3]. In the tool chain, we used the jar file, since the system should check solvability automatically. Z3 is an important tool for the symbolic execution method of the tool chain.

• ASTM

ASTM is a Java project consisting of classes to represent and create an abstract syntax tree (AST). It includes all classes needed to represent a C program, and therefore fits the needs of an AST framework. Thus, it is not necessary to implement a separate AST representation.

The ASTM framework is used in the model abstraction step of the tool chain. This AST is also used as basis for the symbolic execution step of the tool chain.

1.2 Structure of the Thesis

Chapter 2 will give an overview of the important analysis methods used in this thesis work.

Chapter 3 describes value analysis with Frama-C and the plugin which was used to improve and export the data of the tool. Furthermore, it describes the Java wrapper. This chapter also contains information about the created models.

The next chapter describes the symbolic execution. It explains the underlying approach, including some implementation details and the algorithm that performs the symbolic execution. This chapter further contains a description of a use case of symbolic execution, the generation of an automaton model.

Chapter 2 Static Program Analysis

In this chapter, I want to give an overview of static program analysis methods. It provides information about static program analysis in general as well as additional information about some specific methods and concepts.

Static program analysis in general is a technique which deals with the analysis of programs, in order to extract metrics, find bugs, analyze the behavior or create documentation. When performing static analysis, the program is not executed, but the source code, i. e. its statements, control structures, functions, datatypes and other relevant structures, is analyzed [4].

There are several methods that can be used in order to perform static analysis. In the following sections, I want to introduce the most important underlying concepts that were used in this thesis. As we will see, although each of the methods can be used and gives results on its own, especially a combination of them can lead to powerful, insightful analyses.

Listing 2.1 introduces a C function which will be used as a running example to explain the analysis methods. This short listing contains a function with one input parameter (x) and one local variable (y), both of type int. In the first if and its then-block, the absolute value of the parameter is calculated. Afterwards, the absolute value of the input plus 1 is assigned to y. At the end of the code, either -1 or y is returned, depending on whether y is less than zero.

    int fun(int x) {
        int y;
        if (x < 0) {
            x *= -1;
        }
        y = x + 1;
        if (y < 0) {
            return -1;
        }
        return y;
    }

Listing 2.1: Sample Program

2.1 Abstract Syntax Tree

An abstract syntax tree (AST) is a tree that represents a program under inspection. As we can see in Figure 2.1 the tree represents the whole program: Starting from the compilation unit, the program is split up into functions, which further contain different statements. Each of these statements is again split up further, until all leaves are atoms of the program, like constant values, operators and identifiers. Note that it is important that the order of statements and fragments is still preserved. Because of that, it is possible to reproduce the program.

The AST can be used for different purposes:

• Optimizations

With the help of the AST, several optimizations can be done. This can reduce the size of the AST significantly, and therefore the corresponding code. Such optimizations are a first step in a compiler for producing optimized code.

• Basis for analysis

Since the AST is an exact representation of the program, it is the basis for program analysis. For example, it can be used for finding bad code smells or design problems [14].
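To make the tree structure more concrete, the following C sketch builds the AST fragments for the expressions x + 1 and x * -1 from the sample program and evaluates them bottom-up. The node kinds and all names (AstNode, ast_eval, eval_x_plus_one, eval_x_times_minus_one) are hypothetical and chosen for illustration only; they are not the classes of the ASTM framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; a real AST framework distinguishes many more. */
typedef enum { AST_INT_LITERAL, AST_IDENTIFIER, AST_ADD, AST_MUL, AST_NEG } AstKind;

typedef struct AstNode {
    AstKind kind;
    long value;                   /* used by AST_INT_LITERAL only */
    const char *name;             /* used by AST_IDENTIFIER only */
    const struct AstNode *left;   /* first operand, NULL for leaves */
    const struct AstNode *right;  /* second operand, NULL for leaves and AST_NEG */
} AstNode;

/* Evaluates an expression tree bottom-up. x_value is bound to every
   identifier, since the expressions used here only refer to x. */
long ast_eval(const AstNode *n, long x_value) {
    assert(n != NULL);
    switch (n->kind) {
    case AST_INT_LITERAL: return n->value;
    case AST_IDENTIFIER:  return x_value;
    case AST_ADD: return ast_eval(n->left, x_value) + ast_eval(n->right, x_value);
    case AST_MUL: return ast_eval(n->left, x_value) * ast_eval(n->right, x_value);
    case AST_NEG: return -ast_eval(n->left, x_value);
    }
    return 0; /* not reached for well-formed trees */
}

/* Builds the tree for "x + 1" and evaluates it for the given x. */
long eval_x_plus_one(long x) {
    const AstNode x_ref = { AST_IDENTIFIER, 0, "x", NULL, NULL };
    const AstNode one   = { AST_INT_LITERAL, 1, NULL, NULL, NULL };
    const AstNode add   = { AST_ADD, 0, NULL, &x_ref, &one };
    return ast_eval(&add, x);
}

/* Builds the tree for "x * -1" (unary minus applied to the literal 1). */
long eval_x_times_minus_one(long x) {
    const AstNode x_ref = { AST_IDENTIFIER, 0, "x", NULL, NULL };
    const AstNode one   = { AST_INT_LITERAL, 1, NULL, NULL, NULL };
    const AstNode neg   = { AST_NEG, 0, NULL, &one, NULL };
    const AstNode mul   = { AST_MUL, 0, NULL, &x_ref, &neg };
    return ast_eval(&mul, x);
}
```

Because the order of the operands is kept in the left and right fields, the original expression can be reproduced from the tree, which mirrors the order-preservation property described above.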

[Figure 2.1: AST of the sample program, from the CompilationUnit root through FunctionDefinition, IfStatement, ExpressionStatement, BinaryExpression and ReturnStatement nodes down to IdentifierReference and IntegerLiteral leaves]

2.2 Control Flow Graph

A control flow graph (CFG) is a graph representation of the control flow of a procedure. Each statement is represented as a node in the graph, while the successor relation between statements is represented by connecting edges.

Thus, CFGs are a good way to represent a piece of code visually, since they show the control dependencies between the different statements. Because of that, they help to get a better understanding of the program.

There are two different ways to represent calls to sub-procedures in a CFG. It can either be constructed context-sensitive or context-insensitive:

• Context-sensitive CFG

One approach is to represent calls to other procedures by inlining the CFG of the called function. Therefore, the CFG is more detailed, since the whole graph with all calls is represented. The disadvantage of this approach is a much bigger CFG. This is because multiple calls to one function are represented as separate subgraphs. Also, this approach is infeasible for recursive functions, since the number of inlining steps is unknown.

• Context-Insensitive CFG

The other approach is to simply provide a reference to the called function. The advantages and disadvantages are the exact opposite of the other approach: The CFG is smaller, but less detailed.

In Figure 2.2, we can see the visualization of the CFG of our sample program. Each ellipse represents a statement and each arrow shows the statements that can follow. As we can see, there are 4 different theoretical paths through the program, by going either the true- or false-path at each if.
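The number of theoretical paths can be checked with a small C sketch that encodes the CFG of the sample program as a successor table and counts the paths to the exit by depth-first traversal. The node names and the extra EXIT_NODE are assumptions made for this illustration; the traversal only terminates because this particular graph contains no loops.

```c
#include <assert.h>

#define MAX_SUCC 2

/* One node per statement of the sample program, plus a virtual exit node. */
enum { IF_X, NEGATE_X, ASSIGN_Y, IF_Y, RETURN_MINUS_ONE, RETURN_Y, EXIT_NODE, NODE_COUNT };

/* Successor table; -1 marks an unused slot. */
static const int successors[NODE_COUNT][MAX_SUCC] = {
    { NEGATE_X, ASSIGN_Y },          /* if (x < 0): true edge, false edge */
    { ASSIGN_Y, -1 },                /* x *= -1 */
    { IF_Y, -1 },                    /* y = x + 1 */
    { RETURN_MINUS_ONE, RETURN_Y },  /* if (y < 0): true edge, false edge */
    { EXIT_NODE, -1 },               /* return -1 */
    { EXIT_NODE, -1 },               /* return y */
    { -1, -1 }                       /* exit */
};

/* Counts the distinct paths from node `from` to EXIT_NODE.
   Only valid for acyclic graphs such as this one. */
int count_paths(int from) {
    assert(from >= 0 && from < NODE_COUNT);
    if (from == EXIT_NODE) {
        return 1;
    }
    int total = 0;
    for (int i = 0; i < MAX_SUCC; i++) {
        int next = successors[from][i];
        if (next >= 0) {
            total += count_paths(next);
        }
    }
    return total;
}
```

Calling count_paths(IF_X) yields the four syntactic paths mentioned above; whether each of them is actually executable is a separate question that the CFG alone cannot answer.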

Figure 2.2: CFG representation of sample program

The CFG is the basis for many more advanced analyses. For example, in combination with other techniques, unreachable paths can be identified. Such steps are relevant for the optimization of program code, either automatically (by simply removing the statements) or by developers (by providing information about unreachable code fragments). Furthermore, if the CFG is used for further analysis, such optimizations can improve the performance of the following tasks, since the number of nodes can be significantly smaller.

For example, as can be seen in the CFG for the example program, the then-branch of the second if can never be executed (as explained in Section 2.4), since y is definitely greater than zero. Therefore, the second if can be completely omitted. As a result, a smaller CFG can be created. This process is illustrated in Figure 2.3. However, to do this optimization, a simple value analysis is needed.

Figure 2.3: Simplification of the CFG

2.3 Data Flow Analysis

Data flow analysis is a technique that tries to determine the following information about data, like variables and their values:

• Definition of variables

• Assignments to variables

• Usage of variables

One important form of data flow analysis is reaching definitions. Reaching definitions describe the def-use dependencies of variables: For each variable definition in the form of an assignment, all statements that depend on this definition are detected. With the help of this information, expressions can be resolved by known definitions.

Figure 2.4 provides an example for reaching definitions: The assignment of value x + 1 to y is the definition part of the def-use information. The arrows show the corresponding statements that this definition reaches. That means, that definition y = x + 1 reaches statements if (y<0) and return y.

Figure 2.4: Example of Reaching Definitions

If a new value were assigned to y in the second if before the return statement, only the if-statement itself would be dependent on the definition.
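The def-use information for the sample program can be reproduced with the classic iterative reaching-definitions algorithm. The following C sketch is an illustration under assumptions: the node and definition names, the bit-set encoding and the gen/kill tables are hypothetical and were written by hand for the sample program.

```c
#include <assert.h>
#include <stdbool.h>

/* One node per statement of the sample program. */
enum { IF_X, NEGATE_X, ASSIGN_Y, IF_Y, RETURN_MINUS_ONE, RETURN_Y, NODE_COUNT };

/* One bit per definition: the parameter x, "x *= -1" and "y = x + 1". */
enum { DEF_X_PARAM = 1, DEF_X_NEGATE = 2, DEF_Y_ASSIGN = 4 };

/* Predecessor table; -1 marks an unused slot. */
static const int preds[NODE_COUNT][2] = {
    { -1, -1 }, { IF_X, -1 }, { IF_X, NEGATE_X },
    { ASSIGN_Y, -1 }, { IF_Y, -1 }, { IF_Y, -1 }
};

/* Definitions generated by a node and definitions it overwrites. */
static const unsigned gen_set[NODE_COUNT]  = { 0, DEF_X_NEGATE, DEF_Y_ASSIGN, 0, 0, 0 };
static const unsigned kill_set[NODE_COUNT] = { 0, DEF_X_PARAM, 0, 0, 0, 0 };

/* Returns the set of definitions that reach the entry of `node`,
   computed by iterating the data flow equations to a fixed point. */
unsigned reaching_in(int node) {
    assert(node >= 0 && node < NODE_COUNT);
    unsigned in[NODE_COUNT] = { 0 };
    unsigned out[NODE_COUNT] = { 0 };
    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 0; n < NODE_COUNT; n++) {
            /* The parameter definition is available at the function entry. */
            unsigned new_in = (n == IF_X) ? (unsigned)DEF_X_PARAM : 0u;
            for (int i = 0; i < 2; i++) {
                if (preds[n][i] >= 0) {
                    new_in |= out[preds[n][i]];
                }
            }
            unsigned new_out = gen_set[n] | (new_in & ~kill_set[n]);
            if (new_in != in[n] || new_out != out[n]) {
                changed = true;
            }
            in[n] = new_in;
            out[n] = new_out;
        }
    }
    return in[node];
}
```

In this encoding, the bit DEF_Y_ASSIGN is set in reaching_in(IF_Y) and reaching_in(RETURN_Y), which corresponds to the two arrows described for Figure 2.4.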

2.4 Value Analysis

Value analysis, as the name suggests, is the method of analyzing a program and calculating the values of its variables and other constructs, like arrays, pointers, structures and other memory areas, at specific positions of the program. This is done without executing the program.

Clearly, values strongly depend on the executed paths of the program, which might cause a broad variety of different possible values. Therefore, it is normally the case that a single possible value at a specific position cannot be determined. In those cases, the analysis tries to determine a subset or range of values for the variable. Moreover, sometimes a variable’s value is an expression that depends on other variables and constants. With this knowledge, ranges for different results can be estimated, which again gives information about the program’s behavior.

Figure 2.5: Value analysis of example program

As we can see in our example (visualized in Figure 2.5), at the start of our function, x can have the whole range of all possible integer values. After the first if, x contains a non-negative value, since the absolute value of x has been calculated. Since y gets the value of x + 1, y contains an integer value greater than 0. This observation helps us to identify the behavior of the function.
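The reasoning above can be sketched as a small interval analysis in C. This is a simplified illustration, not the abstract interpretation implemented by Frama-C: the type Interval and all function names are hypothetical, values are treated as mathematical integers, and integer overflow of the analyzed program is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed interval [lo, hi] of possible values of a variable. */
typedef struct { long long lo, hi; } Interval;

Interval make_interval(long long lo, long long hi) {
    assert(lo <= hi);
    Interval r = { lo, hi };
    return r;
}

/* Abstract effect of "if (x < 0) { x *= -1; }" on the range of x. */
Interval abs_transfer(Interval x) {
    assert(x.lo <= x.hi);
    if (x.lo >= 0) {
        return x;                            /* then-branch is never taken */
    }
    if (x.hi < 0) {
        return make_interval(-x.hi, -x.lo);  /* then-branch is always taken */
    }
    /* Both branches are possible: join [1, -lo] with [0, hi]. */
    long long negated_hi = -x.lo;
    return make_interval(0, negated_hi > x.hi ? negated_hi : x.hi);
}

/* Abstract effect of adding a constant. */
Interval add_const(Interval x, long long c) {
    return make_interval(x.lo + c, x.hi + c);
}

/* Range of y after "y = x + 1" in the sample program. */
Interval analyze_y(Interval x) {
    return add_const(abs_transfer(x), 1);
}

/* The then-branch of "if (y < 0)" is reachable only if y may be negative. */
bool second_then_reachable(Interval y) {
    return y.lo < 0;
}
```

For an input range such as [-10, 5], the sketch derives [0, 10] for x after the first if and [1, 11] for y, so the then-branch of the second if is reported as unreachable.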

2.5 Symbolic Execution

In this section, we take a look at symbolic execution. As the name suggests, the program is analyzed with symbolic values, i.e., input parameters and global variables are represented by symbolic constants instead of concrete values. Then, for all statements, each occurrence of the variables in the expressions is replaced by its symbolic value. At assignments, variables are replaced either by known values (like numbers), by symbols for unknowns, or by more complex expressions that consist of constants and symbols. After the symbolic execution has been performed completely, only expressions and conditions remain that depend solely on the symbolic variables or fixed values.

Symbolic execution creates a binary tree of a program, where each node contains all statements until a branch statement (e.g. if or while) is detected. The condition of that branch splits up the current path into two new paths, the true and the false path of that branch. This tree represents all possible paths through the program. In case of loops this tree may be infinite.

Figure 2.6: Performing symbolic execution on sample program

In Figure 2.6 we can see the symbolic execution binary tree of the sample program. The parameter x of the function is represented by the symbolic value x1. Each node contains the statements until a branch statement is reached. We can see that for each variable, expressions are assigned that just contain our symbolic value or constants.

There are two principal ways how the results of symbolic execution can be used:

• Determine path conditions

When we traverse the generated tree, we can and-concatenate all the conditions and find out all path conditions. A path condition is the condition under which this specific path is executed. With the help of this condition, which only depends on the input and global variables, we can determine if the path is solvable and which input values are needed in order to enter this path. This can be accomplished by an SMT solver, e.g., Z3 [3].

• Detect formulas

Since each return statement only contains expressions that depend on symbols for unknown input variables, the system finds a formula for the return value of the path that consists of symbols and constants. This can be used to find formulas that define the return value of a function when taking a specific path.
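Both uses can be illustrated for the sample program with a short C sketch. It writes down the path conditions over the symbolic input x1 by hand and replaces the SMT solver by a bounded brute-force search, which is an assumption made purely to keep the example self-contained: unlike Z3, such a search cannot prove that a path is unsolvable, it only fails to find a witness within the bound. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A path through the sample program, identified by the outcome of its two branches. */
typedef struct { bool first_taken; bool second_taken; } Path;

/* Evaluates the and-concatenated path condition for a candidate value of x1.
   Along the path, x is x1 or -x1, and y is x + 1. */
bool path_condition_holds(Path p, long x1) {
    if ((x1 < 0) != p.first_taken) {
        return false;
    }
    long x = p.first_taken ? -x1 : x1;
    long y = x + 1;
    return (y < 0) == p.second_taken;
}

/* Stand-in for the solver: searches [-bound, bound] for an input that enters the path. */
bool find_witness(Path p, long bound, long *witness) {
    assert(bound >= 0 && witness != NULL);
    for (long v = -bound; v <= bound; v++) {
        if (path_condition_holds(p, v)) {
            *witness = v;
            return true;
        }
    }
    return false;
}

/* Return value formula of a path: -1, -x1 + 1 or x1 + 1. */
long return_value(Path p, long x1) {
    if (p.second_taken) {
        return -1;
    }
    return (p.first_taken ? -x1 : x1) + 1;
}

/* Counts the paths for which a witness exists within the bound. */
int count_feasible_paths(long bound) {
    int feasible = 0;
    for (int first = 0; first <= 1; first++) {
        for (int second = 0; second <= 1; second++) {
            Path p = { first != 0, second != 0 };
            long witness;
            if (find_witness(p, bound, &witness)) {
                feasible++;
            }
        }
    }
    return feasible;
}
```

Within any bound that avoids overflow, witnesses are only found for the two paths that skip the then-branch of the second if, and their return value formulas are -x1 + 1 and x1 + 1.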

Chapter 3 Value Analysis with Frama-C

In this chapter, we take a look at the first part of the workflow, the Frama-C tool and the created plugin. Furthermore, we take a look at the Java wrapper, a class system that is used to represent the collected data and use it in other Java projects.

3.1 Frama-C

Frama-C is a tool for the static analysis of C program code. It combines several advanced analysis techniques which are implemented as plugins [10]. We take a look at the most important ones in the following subsection.

Frama-C has an extendable architecture and can be extended by plugins, some of which are included by default. Such plugins can interact with the Frama-C kernel or use other plugins to perform their analyses.

3.1.1 Analysis Techniques

Frama-C provides several advanced program analysis techniques for C programs. They are summarized in the following list:

• Value Analysis

One of the most important plugins available is the value analysis plugin, as it is the basis for many other plugins. It is an implementation of abstract interpretation [13]. This plugin is included by default. Value analysis provides information about possible value ranges and value sets for variables at all positions in the code [12].

The plugin is added to the execution of Frama-C by adding the parameter -val [12].

• Code Simplification / Normalization

Another very helpful feature provided by Frama-C is code simplification and normalization. Frama-C allows modifications of the analyzed code in order to have a reduced basis for its own analysis [10]. Since the code is simplified in the first step of the tool chain, the successor steps can also work with the simplified and normalized code.

In the following list, we can see some of the most important simplification steps:

– Loop conversion

All loops are converted to while-loops. For for-loops, this is done by splitting up the header into several statements and including them at the corresponding positions. Because of this feature, only one kind of loop has to be handled in the following phases.

– Condition splitting

Frama-C converts the conditions of conditional statements, so that each condition doesn’t contain any AND or OR-expressions. Therefore, each condition only contains one logical term. However, to achieve this, a statement has to be split into multiple statements. Frama-C adds goto statements to realize an equivalent behavior. This behavior can be suppressed by changing a flag before Frama-C is compiled.

– Only one return statement

Frama-C modifies the code in a way that a function only contains one return statement. This is done by introducing a variable __retres. For each return statement, the return value is assigned to this variable and a goto is inserted to jump to the one return.

• ACSL Annotations and Verification

An important aspect of Frama-C is the possibility to annotate code with formal assertions. We can distinguish between two different kinds of annotations: function annotations like function contracts or global invariants, and statement annotations like assertions and loop invariants. A full list can be found in the Frama-C ACSL description [11]. Those annotations are checked for correctness, and can therefore be used to find bugs in the code [11].

    int fun(int x) {
        int y;
        if (x < 0) {
            x *= -1;
        }
        //@ assert x >= 0;
        y = x + 1;
        if (y < 0) {
            return -1;
        }
        return y;
    }

Listing 3.1: Annotated C code

Listing 3.1 gives an example of the sample function, including one annotation. Annotations are notated in comments that start with either //@ or /*@ [11]. Frama-C checks this assertion and finds out that it is valid. The result of this test in Frama-C can be seen in Figure 3.1. The code in this figure has been normalized by Frama-C. The green circle means that the assertion is fulfilled [11].

• Correctness

With the help of annotations, the user can specify a functional specification for the program or its parts. Frama-C then tries to prove that the program matches the specification. This feature helps us to identify implementation errors [10].
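The loop conversion and the single-return normalization described earlier in this section can be illustrated with a hand-written C sketch. The function sum_until_negative is a hypothetical example, and the second version is only an approximation of what a normalized program looks like; the code actually produced by Frama-C differs in naming and details. In particular, the variable retres merely plays the role of the result variable that Frama-C introduces.

```c
#include <assert.h>

/* Original form: a for-loop and two return statements. */
int sum_until_negative(const int *values, int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (values[i] < 0) {
            return -1;
        }
        sum += values[i];
    }
    return sum;
}

/* Hand-normalized form: a while-loop and a single return statement. */
int sum_until_negative_normalized(const int *values, int n) {
    assert(n >= 0);
    int retres;              /* holds the value of the one remaining return */
    int sum = 0;
    int i = 0;               /* loop initialization moved in front of the loop */
    while (i < n) {
        if (values[i] < 0) {
            retres = -1;
            goto return_label;
        }
        sum += values[i];
        i++;                 /* loop increment moved to the end of the body */
    }
    retres = sum;
return_label:
    return retres;
}
```

Both versions compute the same results, but later analysis phases only have to handle one kind of loop and one exit point per function when they work on the second form.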

3.1.2 Usage

Figure 3.1: The result of the annotation test

• GUI

First, it can be used with a graphical interface. With the help of this interface, results can be seen interactively [10]. Figure 3.2 shows a part of the GUI of the tool. The results of the value analysis are shown in the bottom panel of the GUI, see Figure 3.3. As we can see, x, which is marked green, can contain a positive integer value.

Figure 3.2: The graphical interface of Frama-C

• Command Line

The second option is to use Frama-C with a command line tool. In this way, it is possible to perform an analysis of the program code and to export the results [10]. The command line variant can be executed by other tools to create data for further analyses.

Figure 3.3: The bottom panel of Frama-C showing the analysis results

• Plugins

As already stated in Section 1.1, Frama-C is extendable by plugins. Plugins can be used with either the command line variant or the graphical interface, and are therefore an addition to the first two variants. Plugins support adding further analyses on the code that are not supported by the Frama-C kernel. Some useful plugins are already included by default. With the help of such a plugin, we can collect and export the data about statements, functions and structs. More information about the plugin system can be found in Section 3.1.3, while the plugin implemented in this thesis is described in Section 3.2.

3.1.3 API and Plugin System

As already mentioned, plugins are a fundamental feature of Frama-C. In this section, I want to describe how the plugin system works.

Frama-C is written in OCaml [9], an object-functional programming language. Because of this, plugins also have to be implemented in OCaml.

For showing how plugins are implemented, Listing 3.2 contains the code for a simple hello world-plugin [9].

    let help_msg = "output a warm welcome message to the user"

    module Self = Plugin.Register
      (struct
        let name = "hello world"
        let shortname = "hello"
        let help = help_msg
      end)

    let run () =
      let chan = open_out "hello.out" in
      Printf.fprintf chan "Hello, world!\n";
      close_out chan

    let () = Db.Main.extend run

Listing 3.2: Hello World Plugin [9]

Basically, each Frama-C plugin consists of a function, often called run, which performs all the relevant tasks. Therefore, this function can be compared to the main function in other programs. In the example, the function opens an output channel for the file hello.out and prints the string Hello, world!\n.

Frama-C then provides a higher-order function Db.Main.extend that can be called with the run function as argument. With this call, we signal Frama-C to execute the specified function and thereby perform the defined operations [9].

With the lines module Self = Plugin.Register and the struct argument, specific values for the plugin, like the name and a help message, can be specified. In the listing, the plugin’s name is hello world, the shortname is specified as hello, and a help message is provided in a string variable [9].

Calling this plugin is simple: the user just calls Frama-C with the parameter -load-script helloWorld.ml, which runs Frama-C with the specified plugin [9].

In order to receive data from Frama-C, a visitor has to be implemented. This is done by creating a class that inherits from one of the provided visitor classes, e.g., class Visitor.frama_c_inplace. The visitor classes provide access to the internal AST that is created and used by Frama-C. There are two different kinds of visitors, the inplace-visitor and the copy-visitor. While the first one enables read operations on the AST, the second one also allows modifications [9].

global structure and statement information. All of them are separately called for each file, global construct or statement, and therefore provide access to all relevant information. A created visitor can be called with the function Visitor.visitFramacFileSameGlobals [9].

    (** visitor that calculates all function-names *)
    class get_functions_visitor =
      object (self)
        inherit Visitor.frama_c_inplace

        val mutable list = []

        method! vglob_aux (glob : Cil_types.global) =
          match glob with
          | GFun (fundec, _) ->
            let function_name = fundec.svar.vname in
            list <- function_name :: list;
            Cil.DoChildren
          | _ -> Cil.DoChildren

        method get_list () = List.rev list
      end

Listing 3.3: A simple Visitor

Listing 3.3 shows a visitor class that is responsible for collecting the names of all functions in a program. It consists of two methods, one for finding all function names (vglob_aux) and one for retrieving the result (get_list). Furthermore, it contains a mutable list that stores the collected names. This list is initially empty. The method vglob_aux is called for every global construct. It checks the type of the construct with pattern matching, and if it is a function (type GFun), the name of the function is retrieved and stored as the new head of the list. Other types are ignored by the method. In both cases, Cil.DoChildren is returned so that the child objects are processed. Since new names are prepended, the method get_list reverses the list so that the names are returned in the order in which they were visited.

As we have mentioned in previous sections, the value analysis plugin of Frama-C is important, since it is widely used. Therefore, plugins have to be able to access the results of the value analysis. Because of that, Frama-C provides several functions to check and receive data of the value analysis [9]:


• Db.Value.is_computed returns whether the analysis has been performed

• !Db.Value.compute performs a value analysis

• Db.Value.get_stmt_state gets the information about the results for a specific statement

• Db.Value.is_reachable checks if a statement is reachable

In Listing 3.4 we can see a method of a visitor that uses the value analysis functions: It gets the value state for a specific statement, checks the reachability, which is also provided by the value analysis, and then performs further operations if the statement is reachable. The mentioned function vstmt_aux is called for every statement. That current statement is accessible by the variable stmt. The state of this statement is retrieved with Db.Value.get_stmt_state, then the reachability of the statement is checked by calling Db.Value.is_reachable. If it is reachable, the local and formal variables of the function are collected and additional operations are performed in another function, iter_vi.

    method !vstmt_aux stmt =
      let state = Db.Value.get_stmt_state stmt in
      (* Only reachable statements *)
      if Db.Value.is_reachable state then (
        let vars = List.append (Kernel_function.get_formals kf) (Kernel_function.get_locals kf) in
        self#iter_vi vars stmt;
      );

Listing 3.4: Usage of value analysis functions

3.2 Plugin for Data Collection

In this section we want to take a look at the implemented plugin. It consists of several functions and classes that collect data about statements (id, function id, successors, predecessors ...), structs (name of the struct, names of the members, ...) and functions (id, name, their member statements, ...). This is done by implementing several visitor classes that access the collected data from Frama-C. Furthermore, it performs a value analysis to collect data about the values of variables at each position. Then it combines the relevant data and exports it.


To perform those tasks, several new types are introduced. The created types partly consist of Frama-C internal types, which are not presented in the following code snippets. However, their names are descriptive enough to understand their purpose.

In the following, the C code from Listing 3.5 is used as a running example. It is a short C function with an if statement and some variable assignments. For the following steps, the statement y=x; with id 4 is inspected.

    int fun(){          // id=1
        int x=0;        // id=1
        int y;          // id=2
        if (x>=0){      // id=3
            y=x;        // id=4
        }
        else{
            y=x+1;      // id=5
        }
        return y;       // id=6
    }

Listing 3.5: Sample code for plugin usage

The implemented plugin works in the following steps:

1. Collect Basic Data

First, a visitor object is created that finds all the statements, structs, types and functions. To do this, the methods vglob_aux and vstmt_aux are overridden to visit all global constructs and statements. For each of these items, the relevant data is collected and stored as elements of the following types, cf. Listing 3.6.

• The type cfg_statement stores data about a statement: funid is the id of the corresponding function, id the id of the statement, is_starter defines if the statement is the first one of a function, text contains the statement text, stmt_kind defines the statement kind, succ is a list of all successor ids and pred contains the ids of all predecessors of the statement.

• Elements of type function_knowledge contain information about a function: fname contains the name and the type of the function, formals contains a list with all formal parameters of the function, and locals contains information about all local variables of the function.

• struct_knowledge contains information about a struct in the program, including the id (str_id), the name (str_name) and the fields (str_fields).

• The type typedef stores information about a type definition, i.e., the name (type_name) and the underlying type (type_type).

    type cfg_statement = {
      funid : int;
      id : int;
      is_starter : bool;
      text : string;
      stmt_kind : string;
      succ : int list;
      pred : int list
    };;

    type function_knowledge = {
      fname : varinfo;
      formals : varinfo list;
      locals : varinfo list;
    };;

    type struct_knowledge = {
      str_id : int;
      str_name : string;
      str_fields : fieldinfo list;
    };;

    type typedef = {
      type_name : string;
      type_type : Cil_types.typ;
    };;

Listing 3.6: Basic types for statements and functions

Furthermore, we store the ids of blocks, such as the then- and else-blocks of an if statement, together with the ids of the statements in each block, in lists. For the running example, a map is stored that combines the if statement (id 3) with the then-block statements (only one statement, id 4) and with the else-block statements (also only one statement, id 5).

As examples, Listings 3.7 and 3.8 show the data elements for statement 4 and for the function fun. All known values are assigned to the corresponding fields of the element.

    cfg_statement:
      funid = 1
      id = 4
      is_starter = false
      text = "y=x"
      stmt_kind = "Assignment"
      succ = List(6)
      pred = List(3)

Listing 3.7: Initial knowledge of statement 4

    function_knowledge:
      fname = varinfo(int, "fun", ...)
      formals = List()
      locals = List(varinfo(int, "x", ...), varinfo(int, "y", ...))

Listing 3.8: Knowledge of the function fun

2. Calculate Conditions

As a next step, the plugin finds the condition under which a statement is executed. For this purpose, we use the function collect_further_data, which starts the condition retrieval process. It starts with the first statement of each function, which has no condition. The algorithm then walks down all paths of the function until every statement is processed. If a branch statement, e.g., an if statement, is found, we store the condition. Each successor gets the condition in the original or negated form.

At the join of two paths (a statement has multiple predecessors), if a found condition is the negated version of another condition, the plugin can remove both, since the following statements are no longer dependent on them. Information about such statements is stored in elements of another data type that reuses the former type (see Listing 3.9). cfg_condition_statement contains an element of the former type cfg_statement, called linkdata. Furthermore, it contains a list of cond elements (conds) to represent the conditions. Such a cond element consists of the condition text cond_text and a bool that specifies if the condition is negated (field negated).

    type cond = {
      cond_text : string;
      negated : bool
    };;

    type cfg_condition_statement = {
      linkdata : cfg_statement;
      conds : cond list
    };;

Listing 3.9: Datatype for statement with conditions


Listings 3.10 and 3.11 show the updated stored values for the inspected statement. As we can see, the condition contains the text and the negated flag, and the updated statement contains a link to the previously created statement data and the condition.

    cond:
      cond_text = "x>=0"
      negated = false

Listing 3.10: Stored values for a condition


    cfg_condition_statement:
      linkdata = (* cfg_statement from former step *)
      conds = List((* created cond element *))

Listing 3.11: Stored values for the updated statement
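The removal of complementary conditions at a join can be illustrated with a minimal Java sketch. It is not the plugin's OCaml implementation: the classes Cond and JoinConditions and the method merge are hypothetical names chosen for illustration, and the sketch assumes that two conditions are complementary exactly when their texts are equal and their negated flags differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java mirror of the cond record: condition text plus negation flag. */
final class Cond {
    final String text;
    final boolean negated;

    Cond(String text, boolean negated) {
        this.text = text;
        this.negated = negated;
    }

    /** True if other has the same text but the opposite negation flag. */
    boolean isNegationOf(Cond other) {
        return text.equals(other.text) && negated != other.negated;
    }

    /** True if other has the same text and the same negation flag. */
    boolean sameAs(Cond other) {
        return text.equals(other.text) && negated == other.negated;
    }
}

final class JoinConditions {
    /** Merges the condition lists of two predecessors and drops complementary pairs. */
    static List<Cond> merge(List<Cond> left, List<Cond> right) {
        List<Cond> result = new ArrayList<>(left);
        for (Cond candidate : right) {
            int partner = -1;
            boolean alreadyPresent = false;
            for (int i = 0; i < result.size(); i++) {
                if (result.get(i).isNegationOf(candidate)) {
                    partner = i;
                    break;
                }
                if (result.get(i).sameAs(candidate)) {
                    alreadyPresent = true;
                }
            }
            if (partner >= 0) {
                // The condition and its negation cancel each other out.
                result.remove(partner);
            } else if (!alreadyPresent) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

For the running example, statement 6 is reached from statement 4 under x>=0 and from statement 5 under the negated condition, so merging both lists leaves no condition for statement 6.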


3. Build Function List

As a next step, a visitor creates a list of all function names. This visitor was shown in Listing 3.3. This list is used later to iterate over all functions.


4. Value Analysis

The plugin needs the results from the value analysis to get the information about possible values. Therefore, !Db.Value.compute(); is called, which executes the value analysis for the current code.

5. Collect Value Analysis Results

After we have performed the value analysis, we have to collect the data. For this purpose, we use a further visitor which gets the values for each variable at each statement. However, the value analysis does not always know the value of a variable. In such cases no entry is stored for the variable; only actual knowledge is stored. Additional calculations for values are done in the Java wrapper part of the tool chain.

The type for the variable knowledge is shown in Listing 3.12. Elements of this type store the name of the variable (name), the name of the function (fun_name) and the value as string (value), so all possible values of different types can be represented.

    type var_data = {
      name : string;
      fun_name : string;
      value : string
    };;

Listing 3.12: Datatypes to represent results of the value analysis

Listing 3.13 shows the collected value of variable x at the inspected statement (id 4), containing variable name "x", function name "fun" and value "0".

    var_data:
      name = "x"
      fun_name = "fun"
      value = "0"

Listing 3.13: Stored value knowledge for variable x


Now, hash tables are created that map a statement id to a list of known variables. There are two such tables, one for the variables before the statement and one for the variables after the statement. Those tables can be accessed by two getter functions (get_table and get_table2) of the visitor class.

Table 3.1 shows the values before and after every statement of the example program:

    id   value before             value after
    1                             x = 0
    2    x = 0                    x = 0, y = undefined
    3    x = 0, y = undefined     x = 0, y = undefined
    4    x = 0, y = undefined     x = 0, y = 0
    5    x = 0, y = undefined     x = 0, y = 1
    6    x = 0, y = 0

Table 3.1: Known values for each statement

6. Combine Statements with Value Tables

As a next step, the statements and the value tables are combined. This is done based on the stored statement id. This results in data that is represented by a new type, see Listing 3.14. This new type reuses the former type cfg_condition_statement in the field static_data and adds the two new lists vars and vars_before. vars stores the known values after the statement, vars_before stores the known values before the statement.

    type cfg_vardata_statement = {
      static_data : cfg_condition_statement;
      vars : var_data list;
      vars_before : var_data list
    };;

Listing 3.14: Datatypes to represent the combination of statement data and values


For the running example, Listing 3.15 shows the stored data for statement 4 after the value analysis has been performed. The element consists of the previously collected statement data and two lists that represent the calculated values.

    cfg_vardata_statement:
      static_data = (* contains element from earlier step *)
      vars = List(var_data("x","fun","0"), var_data("y","fun","0"))
      vars_before = List(var_data("x","fun","0"))

Listing 3.15: Stored data after the value analysis
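The combination by statement id amounts to a lookup in both value tables for every statement. The following Java sketch illustrates this idea only; the plugin itself is written in OCaml, and the names CombinedStatement and ValueTableJoin as well as the use of plain strings for value facts are assumptions made for this example. Statements without an entry in a table receive an empty list, which corresponds to storing only actual knowledge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for cfg_vardata_statement: a statement id plus its value facts. */
final class CombinedStatement {
    final int id;
    final List<String> vars;        // known values after the statement
    final List<String> varsBefore;  // known values before the statement

    CombinedStatement(int id, List<String> vars, List<String> varsBefore) {
        this.id = id;
        this.vars = vars;
        this.varsBefore = varsBefore;
    }
}

final class ValueTableJoin {
    /** Attaches the entries of both value tables to each statement id. */
    static List<CombinedStatement> combine(List<Integer> statementIds,
                                           Map<Integer, List<String>> before,
                                           Map<Integer, List<String>> after) {
        List<CombinedStatement> result = new ArrayList<>();
        for (int id : statementIds) {
            // Missing entries mean that nothing is known at this position.
            List<String> valuesAfter = after.getOrDefault(id, Collections.<String>emptyList());
            List<String> valuesBefore = before.getOrDefault(id, Collections.<String>emptyList());
            result.add(new CombinedStatement(id, valuesAfter, valuesBefore));
        }
        return result;
    }
}
```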

7. Tokenize Data

In order to have additional information about the statements, conditions and functions, the stored strings are parsed to tokens. Such tokens contain additional information, like the kind of the token, which makes it easier to parse and use them in the following phases of the tool chain. For this purpose, some additional types are created, see Listing 3.16. Type tokenkind defines the kind of a token. Furthermore, a token consists of a tokenname of type string, which represents the token's value. The type cond_tokens consists of the tokenized condition (tokens) and the former flag negated. cfg_tokenized contains the earlier collected information without references to older statement types for easier exporting. Its tokens list stores the tokenized statement text. The other two types token_function_knowledge and token_struct_knowledge contain the same values as before, but in tokenized form.

    type tokenkind =
      | Name
      | VarName
      | FunName
      | Operation
      | Skip
      | Number
      | Char
      | String
      | Keyword
      | Label
    ;;

    type token = {
      kind : tokenkind;
      tokenname : string
    };;

    type cond_tokens = {
      tokens : token list;
      negated : bool
    };;

    type cfg_tokenized = {
      tid : int;
      tokens : token list;
      is_starter : bool;
      stmt_kind : string;
      succ : int list;
      pred : int list;
      conds : cond_tokens list;
      vars : var_data list;
      vars_before : var_data list;
      funid : int
    };;

    type token_function_knowledge = {
      tvid : int;
      tname : token list;
      tformals : token list list;
      tlocals : token list list;
    };;

    type token_struct_knowledge = {
      t_str_id : int;
      t_str_name : string;
      t_str_fields : token list list
    }

Listing 3.16: Datatypes after tokenizing step

Listings 3.17 and 3.18 show the stored data for statement 4 after the tokenizing step: The condition consists of a list of token elements and the former negated flag. cfg_tokenized contains all the previously collected data, but the statement's text and the condition are stored in tokenized form.

    cond_tokens:
      tokens = List(token(VarName,"x"), token(Operation,">="), token(Number,"0"))
      negated = false

Listing 3.17: The stored condition after tokenizing the data

    cfg_tokenized:
      tid = 4
      tokens = List(token(VarName,"y"), token(Operation,"="), token(VarName,"x"))
      is_starter = false
      stmt_kind = "Assignment"
      succ = List(6)
      pred = List(3)
      conds = List((* one element with cond_tokens from above *))
      vars = List(var_data("x","fun","0"), var_data("y","fun","0"))
      vars_before = List(var_data("x","fun","0"))
      funid = 1

Listing 3.18: The stored statement after tokenizing the data

8. Export

As the last step, we export the collected data to several files, where each line contains all the collected data of one element. The fields are separated by runs of ";" characters, which makes the files easy to parse for other applications. Listing 3.19 contains the export data for the example function fun. For an import, first split the line at the longest run of successive ";" characters, e.g., ";;;;" for the function export, to get the id, the function tokens, the formal parameter tokens and the local variable tokens. Most of these parts can then be split further at shorter runs of successive ";" characters.

    67;;;;int;Keyword;;fun;Name;;;;;;;;;;int;Keyword;;x;Name;;;;;int;Keyword;;y;Name;;;;;

Listing 3.19: A sample export line for the function fun
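The innermost level of this format, a single token list such as the function tokens of Listing 3.19, can be split as in the following Java sketch. It is a simplified illustration and not the Parser class of the Java wrapper: the names ExportedToken and TokenListParser are hypothetical, the field is assumed to have been isolated from the line already, and token texts are assumed not to contain ";" themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class for one exported token: its text and its kind name. */
final class ExportedToken {
    final String text;
    final String kind;

    ExportedToken(String text, String kind) {
        this.text = text;
        this.kind = kind;
    }
}

final class TokenListParser {
    /** Parses one token-list field such as "int;Keyword;;fun;Name;;". */
    static List<ExportedToken> parse(String field) {
        List<ExportedToken> tokens = new ArrayList<>();
        // Tokens are separated by ";;"; text and kind are separated by a single ";".
        for (String entry : field.split(";;")) {
            if (entry.isEmpty()) {
                continue; // an empty field yields no tokens
            }
            String[] parts = entry.split(";", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed token entry: " + entry);
            }
            tokens.add(new ExportedToken(parts[0], parts[1]));
        }
        return tokens;
    }
}
```

Applied to "int;Keyword;;fun;Name;;", the sketch yields the two tokens (int, Keyword) and (fun, Name). The higher levels of the format would be split in the same way at the longer separator runs before this method is applied.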

3.3 Java Wrapper

In this section, we take a look at the Java wrapper. The Java wrapper is the part of the project where the exported data of the plugin is read and parsed. The parsed data is then stored in several Java objects that represent the exported data, including references to other objects (e.g. successors,...) and some additional information, such as stored values with conditions. Those objects are the basis for the next steps in the tool chain.


The objects of the Java wrapper serve as the interface to other parts of the tool chain. The objects are accessible via a defined interface class DataProvider.

Figure 3.4 provides an overview of the architecture of the Java wrapper.

There are three fundamental classes that map the collected data of the OCaml types: FramaStatement, FramaFunction and FramaStruct. As the names suggest, objects of these kinds represent statements, functions and structs with all the relevant information that is needed for analyses.

FramaStatement represents the imported data of OCaml type cfg_tokenized. It contains an id, the id of the corresponding function, a list of tokens that represent the statement, the successors and predecessors, conditions under which the statement is executed and knowledge of variables at that statement.

FramaFunction contains the data of OCaml type token_function_knowledge. It contains the id, the name and the type of the function, the list of statements in the function and the start statement of the function.

FramaStruct represents data of token_struct_knowledge. It contains information about its id, the name and the member variables of the struct.

As we can see, there are several enumerations used for specifying the kinds of objects. I use this approach instead of creating subclasses, because the behavior of the different kinds does not differ much, but the kinds still have to be distinguished. However, FramaStatement has subclasses to represent if (FramaStatementIf), while (FramaStatementWhile) and their corresponding blocks (FramaStatementBlock). This is needed later on to build the statement hierarchy.

The class VarInfo, the corresponding class for the Frama-C type varinfo, contains the information about variables, like their names and types, and whether a variable is a pointer variable. This information is used in functions to specify the local variables and the parameters, and in structs to specify their members.

The class VarData, which represents the OCaml type var_data, and the class VarDataWithConds represent the knowledge of variables for each statement. Therefore, each statement has a list of such objects. They contain information about the variable, the current value and the condition under which the variable has the value. To represent such dependencies, the Condition class is used, which contains the corresponding conditions. Conditions are represented by a list of Token objects, and the corresponding class includes some additional functions. The values for variables can vary, since different paths lead to different values. We calculate the value in two private functions of FramaCDataHolder, collectFurtherData and searchParentData. In those functions, a statement and all predecessor paths are analyzed to get all possible current values and their conditions.

The access point for the data is the class DataProvider. It contains multiple getter methods to get all functions, statements, structs and global variables. The DataProvider class uses the Parser to read the export files of the plugin and creates the corresponding lists. As we can see, the Parser provides methods to retrieve lists of different objects that contain Frama-C data: parseStructData, parseStatementData and parseFunctionData. Those functions are called by the DataProvider to create the FramaCDataHolder.

3.4 Model Abstraction

This section describes the model abstraction phase of the tool chain. This phase uses the results of Frama-C that are provided through the Java wrapper and builds and provides representations of the program, like an AST, a CFG and value sets. These representations can either be visualized or be used for further analysis steps.

3.4.1 Program Representations

As the first step, the different kinds of program representations are described. Three different representations are provided and are briefly described in the following subsections. A more detailed description of the representations and their implementations is then provided in the next sections.

[Figure 3.4: Class diagram of the Java wrapper, showing the classes FramaFunction, FramaStatement (with the subclasses FramaStatementIf, FramaStatementWhile and FramaStatementBlock), FramaStruct, Token, VarInfo, VarData, VarDataWithConds, Condition, DataProvider, Parser and FramaCDataHolder, and the enumerations TokenKind, VarInfoType, VarDataKnowledge and StatementKind.]

3.4.1.1 ASTM

The first created program representation is an abstract syntax tree (AST). The AST is created with the Modisco ASTM framework [6], which is an implementation of the ASTM standard from the OMG [6]. This framework is chosen because it provides all the necessary classes to represent the language features of C, like classes for Expressions, Statements, FunctionDefinitions, Identifiers and IntegerLiterals. Therefore, no additions to this framework were necessary and it can be used without modifications. This is not the case for other languages, so additional steps might be necessary if the tool chain is to be used for another language.

3.4.1.2 CFG

For representing the control flow graph (CFG) of a program, the Java wrapper already contains the necessary information. All statements already have references to their successor and predecessor statements and their functions, while the functions hold the list of their statements.

However, this information is converted to a Heros CFG [8] from the Soot framework [7]. Heros is a generic framework that defines methods on a CFG to traverse the graph. Therefore, applications that use the HerosCFG interface are able to use the created CFG implementation via the predefined methods. To use this HerosCFG implementation, a class had to be overwritten and additional nodes for statements and functions were implemented.

3.4.1.3 Value Sets

As mentioned in Section 3.3, the Java wrapper contains information about variable values at code positions. This information, in the form of value sets, is an important input for the model abstraction steps.


3.4.2 Architecture

Now we take a look at the overview of the classes that were used in this part of the implementation. A class diagram is provided in Figure 3.5.

[Figure 3.5: Class diagram of the model abstraction phase, showing the classes Analyzer, FramaCToInfoStructureTransformation, OperationStackElement, Utils, FunctionNode, HerosCFG, StatementNode and VarInfoAstData, which receive their data from the DataProvider of the Java wrapper.]

The Analyzer class is the interface for accessing the relevant information. When the constructor is called, this class retrieves the data from the Java wrapper project (class DataProvider) and starts the transformation of the data to the respective representations. After the representations are created, they can be retrieved from this class by calling two getter functions, getAst and getCfg.

The FramaCToInfoStructureTransformation class is the second important class. It receives the Frama-C data from the Analyzer and creates the AST by using the ASTM framework and the CFG by implementing a HerosCFG. To achieve this goal, each structure, function and the corresponding statements are parsed (e.g. their tokens, successors etc. are analyzed) and the ASTM and CFG objects are created.

ASTM is a framework that consists of several packages and classes to build and manage abstract syntax trees. An important class of these packages is GastmFactory. This class provides the functions to create ASTM nodes, and an instance of this class is heavily used in the program to build the AST.

A difficulty in the AST is the correct creation of expressions. Operands and operators are just provided as tokens. Therefore, to build the AST, the correct operator precedence has to be evaluated. This problem is handled by the methods parseExpression and evalExpression. The method parseExpression builds two stacks for operands and operators of a statement. Therefore, the function has to identify brackets (brackets for a function call vs brackets in operations), check if "-" is a subtraction operator or sign for negative values, and many other similar problems. This is done by identifying the type of the token in front of the current token (e.g. if an operation token is in front of a "-" sign, the "-" denotes a negative number instead of the subtraction). Then the method stores the correctly identified operator in the class OperationStackElement. This class stores the operator and provides getter functions (getOperation, getPriority and getOperatorFromKind) to access the stored data. That means, the function builds two stacks, one for the created operators and one for the operands.
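The distinction between a sign and a subtraction can be sketched in Java as follows. The sketch only illustrates the rule described above and is not the code of parseExpression: the classes Tok and MinusClassifier are hypothetical, the token kind is reduced to a single flag, and treating a "-" at the very beginning of an expression as a sign is an additional assumption.

```java
import java.util.List;

/** Hypothetical, simplified token: its text and whether it is an operation token. */
final class Tok {
    final String text;
    final boolean isOperation;

    Tok(String text, boolean isOperation) {
        this.text = text;
        this.isOperation = isOperation;
    }
}

final class MinusClassifier {
    /** Returns true if the token at index is a "-" that denotes a negative sign. */
    static boolean isNegativeSign(List<Tok> tokens, int index) {
        if (!tokens.get(index).text.equals("-")) {
            return false;
        }
        if (index == 0) {
            return true; // assumption: a leading "-" has no left operand
        }
        // An operation token in front means that there is no left operand to subtract from.
        return tokens.get(index - 1).isOperation;
    }
}
```

For the tokens of y = -x, the "-" follows the operation token "=" and is classified as a sign, whereas in a - b it follows an operand and is classified as a subtraction.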

After the stacks for a statement are built, the function calls the method evalExpression, which then processes the stacks to build a single expression. The method evalExpression receives the created stacks and transforms them into arrays. It iterates over those arrays and finds the operator with the highest priority, creates an expression and combines the corresponding operands. The arrays are reduced and the next operator is searched. Once the arrays have been processed completely, only one expression is left, which represents the final expression. This expression is returned.
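The reduction by operator priority can be illustrated with the following Java sketch. It is a simplified stand-in for evalExpression under several assumptions: strings in parenthesized form take the place of ASTM Expression nodes, lists replace the arrays, only the binary operators +, -, *, / and % are supported, brackets have already been resolved, and among operators of equal priority the leftmost one is chosen. The name PriorityReducer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PriorityReducer {
    /** Priority of a binary operator; a higher value binds more tightly. */
    static int priority(String operator) {
        switch (operator) {
            case "*":
            case "/":
            case "%":
                return 2;
            case "+":
            case "-":
                return 1;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
    }

    /** Combines n + 1 operands and n operators into one parenthesized expression. */
    static String reduce(List<String> operands, List<String> operators) {
        if (operands.size() != operators.size() + 1) {
            throw new IllegalArgumentException("Expected one more operand than operators");
        }
        List<String> values = new ArrayList<>(operands);
        List<String> ops = new ArrayList<>(operators);
        while (!ops.isEmpty()) {
            // Find the leftmost operator with the highest priority.
            int best = 0;
            for (int i = 1; i < ops.size(); i++) {
                if (priority(ops.get(i)) > priority(ops.get(best))) {
                    best = i;
                }
            }
            // Combine its two operands and shrink both lists by one element.
            String combined = "(" + values.get(best) + " " + ops.get(best) + " "
                    + values.get(best + 1) + ")";
            values.set(best, combined);
            values.remove(best + 1);
            ops.remove(best);
        }
        return values.get(0);
    }
}
```

For the operands a, b, c, d and the operators +, *, -, the sketch first combines b and c, then adds a, and finally subtracts d, which results in ((a + (b * c)) - d).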

HerosCFG is a class that handles FunctionNodes and StatementNodes. It provides several functions that can be used to traverse the CFG (getSuccsOf), find the start statements (getStartPointsOf), check if a statement is an exit statement of a function (isExitStmt) and many others.

Building a HerosCFG works as follows: For each function and statement, a corresponding FunctionNode or StatementNode is created. Since all the needed data is provided by the Java wrapper objects or produced when the AST is generated, the CFG can be built in a straightforward manner.
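How such a graph is built from successor information and then traversed can be sketched in a few lines of Java. The class MiniCfg below is a hypothetical, heavily reduced stand-in for HerosCFG: it uses plain statement ids instead of StatementNode objects, does not implement the Heros interfaces, and only mirrors the method names getSuccsOf, getPredsOf and isExitStmt. The edges of the running example are read off Listing 3.5 and are an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class MiniCfg {
    private final Map<Integer, List<Integer>> succs = new TreeMap<>();
    private final Map<Integer, List<Integer>> preds = new TreeMap<>();

    /** Registers a control flow edge and keeps both directions in sync. */
    void addEdge(int from, int to) {
        succs.computeIfAbsent(from, key -> new ArrayList<>()).add(to);
        preds.computeIfAbsent(to, key -> new ArrayList<>()).add(from);
    }

    List<Integer> getSuccsOf(int id) {
        return succs.getOrDefault(id, Collections.<Integer>emptyList());
    }

    List<Integer> getPredsOf(int id) {
        return preds.getOrDefault(id, Collections.<Integer>emptyList());
    }

    /** In this sketch, a statement without successors is an exit statement. */
    boolean isExitStmt(int id) {
        return getSuccsOf(id).isEmpty();
    }

    /** Returns all statements reachable from start in breadth-first order. */
    List<Integer> reachableFrom(int start) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.add(start);
        while (!work.isEmpty()) {
            int current = work.poll();
            if (!seen.add(current)) {
                continue; // already visited via another path
            }
            work.addAll(getSuccsOf(current));
        }
        return new ArrayList<>(seen);
    }

    /** CFG of the function fun from Listing 3.5 (edges assumed from the source code). */
    static MiniCfg runningExample() {
        MiniCfg cfg = new MiniCfg();
        cfg.addEdge(1, 2);
        cfg.addEdge(2, 3);
        cfg.addEdge(3, 4);
        cfg.addEdge(3, 5);
        cfg.addEdge(4, 6);
        cfg.addEdge(5, 6);
        return cfg;
    }
}
```

In this example graph, statement 6 has the predecessors 4 and 5 and no successors, so it is the only exit statement of fun.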

The class StatementNode represents a statement in the CFG. It contains the corresponding ASTM statement node and the original FramaStatement. Therefore, the value sets can be accessed in the CFG. Furthermore, it contains additional information, like successors, predecessors and whether the StatementNode contains a call.

The class FunctionNode represents a function in the CFG. It contains the FramaFunction, the FunctionDefinition of the AST, the name of the function, and a list of StatementNode that belong to this function.

The combination of the classes HerosCFG, FunctionNode and StatementNode represents a complete CFG and can be used by the following parts of the tool chain.

For the value sets, no additional classes are required, since all the information has already been collected in the former phase and is stored in the objects that are created by the Java wrapper.

The classes not shown in the diagram are additional helper classes that are needed for the transformation, e.g. Utils or GlobalBlock.

3.4.3 Application Scenarios

This section looks at a usage scenario for the created models. The implementations were used to analyze and visualize functions and programs. Such visualizations are intended to help users understand program behavior.

One variant of model usage is to build model abstractions with value sets. For this purpose, a program was implemented that generates formulas and diagrams for assignments.

• Idea

The idea of model abstraction with value sets is to determine a formula or a single value for a variable at a specific position in the code. This formula depends on the possible values of the input parameters and local variables. Those variables determine the paths through the program and therefore the possible settings of the variables.

First, a configuration variable is set to restrict the possible paths through the program, which reduces the possible settings of the variables. When the path has been defined, the used variables can be set in different ways under different conditions, which leads to the final variable assignments. Furthermore, the variable’s value can be unknown if no specific value is set, but the input variable is used in the final assignment.

The user can specify which value is used for the relevant variables, and the formula for the expression can be evaluated. Then, a diagram that represents the assignment can be created.

• Approach

As a first step, the user defines the statement under inspection by providing the statement id. Furthermore, it is possible to set some config variables to fixed values. Such config variables are used in conditions of the program.

Next, the relevant variables of the specified statement are determined. For those variables, the possible values and their conditions for the different paths are determined and represented.

The user then selects one of the possible values for each variable. By doing this, the definite path to the statement under inspection is specified.

When all values are specified, the variables of the statement can be replaced by the selected values. With the help of a JavaScript interpreter, it is possible to simplify the expression, in some cases even to a single value.

As a last step, the program generates a diagram that represents the assignment operation.

• Example

To make the idea and the approach clearer, the function in Listing 3.20 is used. This function declares several local variables and contains an if statement that depends on an input variable. This input variable represents a configuration of the program. In this if statement and its nested if statement, local variables are set that are then used in the assignment of variable r. Finally, r is set to (x+x)*y.

    #define conf1 1
    #define conf2 2

    int fun(int conf, int c) {
        int x;
        int y;
        int r;
        if (conf == conf1) {
            if (c) {
                x = 1;
                y = 3;
            }
            else {
                x = 2;
                y = 4;
            }
        }
        else {
            x = 5;
            y = 6;
        }
        r = (x + x) * y;
        return r;
    }
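For the function in Listing 3.20, the idea can be illustrated with the following sketch. It hard-codes the three paths of the function by hand and evaluates the assignment r = (x+x)*y for each of them. The class PathValues and its fields are hypothetical and exist only for this illustration; the tool chain derives such information from the generated models rather than from a hand-written table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the values of x and y on each path of fun(conf, c).
class PathValues {
    final String condition;      // readable path condition
    final boolean requiresConf1; // true if the path needs conf == conf1
    final int x;
    final int y;

    PathValues(String condition, boolean requiresConf1, int x, int y) {
        this.condition = condition;
        this.requiresConf1 = requiresConf1;
        this.x = x;
        this.y = y;
    }

    // Evaluates the assignment r = (x+x)*y for this path.
    int r() { return (x + x) * y; }

    // The three paths of fun, transcribed by hand from the listing.
    static List<PathValues> pathsOfFun() {
        List<PathValues> paths = new ArrayList<>();
        paths.add(new PathValues("conf == conf1 && c != 0", true, 1, 3));
        paths.add(new PathValues("conf == conf1 && c == 0", true, 2, 4));
        paths.add(new PathValues("conf != conf1", false, 5, 6));
        return paths;
    }

    // Fixing the config variable restricts the set of possible paths.
    static List<PathValues> restrict(boolean confIsConf1) {
        List<PathValues> result = new ArrayList<>();
        for (PathValues p : pathsOfFun()) {
            if (p.requiresConf1 == confIsConf1) result.add(p);
        }
        return result;
    }
}
```

If conf is fixed to conf1, two paths remain and the value of r still depends on c (6 or 16). If conf is fixed to conf2, only the else branch remains and the expression simplifies to the single value 60.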