
Value Analysis with Frama-C

3.2 Plugin for Data Collection

In this section, we take a look at the implemented plugin. It consists of several functions and classes that collect data about statements (id, function id, successors, predecessors, ...), structs (name of the struct, names of the members, ...) and functions (id, name, their member statements, ...). This is done by implementing several visitor classes that access the data collected by Frama-C. Furthermore, the plugin performs a value analysis to collect data about the values of variables at each program point. Finally, it combines the relevant data and exports it.

To perform these tasks, several new types are introduced. Some of them are partly built from Frama-C internal types, which are not presented in the following code snippets. However, their names are descriptive enough to convey their purpose.

In the following, the C code from Listing 3.5 is used as a running example. It is a short C function with an if statement and some variable assignments. In the following steps, the statement y=x; with id 4 is inspected.

int fun() {       // id=1
  int x=0;        // id=1
  int y;          // id=2
  if (x>=0) {     // id=3
    y=x;          // id=4
  }
  else {
    y=x+1;        // id=5
  }
  return y;       // id=6
}

Listing 3.5: Sample code for plugin usage

The implemented plugin works in the following steps:

1. Collect Basic Data

First, a visitor object is created that finds all the statements, structs, types and functions.

To do this, the methods vglob_aux and vstmt_aux are overridden to visit all global constructs and statements. For each of these items, the relevant data is collected and stored as elements of the following types, cf. Listing 3.6.

• The type cfg_statement stores data about a statement: funid is the id of the corresponding function, id the id of the statement, is_starter defines whether the statement is the first one of a function, text contains the statement text, stmt_kind defines the statement kind, succ is a list of all successor ids and pred contains the ids of all predecessors of the statement.

• Elements of type function_knowledge contain information about a function: fname contains the name and the type of the function, formals contains a list with all formal parameters of the function, and locals contains information about all local variables of the function.

• struct_knowledge contains information about a struct in the program, including the id (str_id), the name (str_name) and the fields (str_fields).

• The type typedef stores information about a type definition, i.e. the name (type_name) and the underlying type (type_type).

type cfg_statement = {
  funid : int;
  id : int;
  is_starter : bool;
  text : string;
  stmt_kind : string;
  succ : int list;
  pred : int list
};;

type function_knowledge = {
  fname : varinfo;
  formals : varinfo list;
  locals : varinfo list;
};;

type struct_knowledge = {
  str_id : int;
  str_name : string;
  str_fields : fieldinfo list;
};;

type typedef = {
  type_name : string;
  type_type : Cil_types.typ;
};;

Listing 3.6: Basic types for statements and functions
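The relation between the succ and pred fields can be illustrated with a small, self-contained OCaml sketch. In Frama-C, CIL statements already carry successor and predecessor lists, so the plugin does not need to compute them; the function fill_preds below is a hypothetical helper that only shows that each pred list is the inverse of the succ lists. The record type follows Listing 3.6; the text and stmt_kind strings of all statements other than statement 4 are illustrative placeholders.

```ocaml
type cfg_statement = {
  funid : int;
  id : int;
  is_starter : bool;
  text : string;
  stmt_kind : string;
  succ : int list;
  pred : int list;
}

(* Hypothetical helper: recompute every pred list from the succ lists.
   p is a predecessor of s exactly when s.id appears in p.succ. *)
let fill_preds (stmts : cfg_statement list) : cfg_statement list =
  List.map
    (fun s ->
      let pred =
        List.filter_map
          (fun p -> if List.mem s.id p.succ then Some p.id else None)
          stmts
      in
      { s with pred })
    stmts

(* Running example of Listing 3.5; pred lists start empty.
   text and stmt_kind are placeholders except for statement 4. *)
let mk id is_starter text stmt_kind succ =
  { funid = 1; id; is_starter; text; stmt_kind; succ; pred = [] }

let example =
  fill_preds
    [ mk 1 true "int x=0;" "Assignment" [ 2 ];
      mk 2 false "int y;" "Declaration" [ 3 ];
      mk 3 false "if (x>=0)" "If" [ 4; 5 ];
      mk 4 false "y=x" "Assignment" [ 6 ];
      mk 5 false "y=x+1" "Assignment" [ 6 ];
      mk 6 false "return y;" "Return" [] ]
```

For statement 4, the sketch yields succ = [6] and pred = [3]; the return statement (id 6) gets both branch statements 4 and 5 as predecessors.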

Furthermore, we store the ids of blocks, such as the then- and else-blocks of an if statement, together with lists of the ids of the statements in each block. For the running example, a map is stored that associates the if statement (id 3) with its then-block statements (only one statement, id 4) and with its else-block statements (also only one statement, id 5).
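The block map described above could, for instance, be represented as a hash table from the id of an if statement to the statement ids of its then- and else-block. The following sketch assumes this representation; the names block_map and enclosing_if are hypothetical and not taken from the plugin.

```ocaml
(* Hypothetical block map: if-statement id -> (then-block ids, else-block ids). *)
type block_map = (int, int list * int list) Hashtbl.t

let blocks : block_map = Hashtbl.create 8

(* Running example: if statement 3 with then-block [4] and else-block [5]. *)
let () = Hashtbl.replace blocks 3 ([ 4 ], [ 5 ])

(* Find the if statement whose then- or else-block contains a statement:
   Some (if_id, true) for the then-block, Some (if_id, false) for the else-block. *)
let enclosing_if (m : block_map) (sid : int) : (int * bool) option =
  Hashtbl.fold
    (fun if_id (then_ids, else_ids) acc ->
      match acc with
      | Some _ -> acc
      | None ->
          if List.mem sid then_ids then Some (if_id, true)
          else if List.mem sid else_ids then Some (if_id, false)
          else None)
    m None
```

With this map, statement 4 is reported as part of the then-block of statement 3, statement 5 as part of its else-block, and statement 6 as belonging to no branch.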

As examples, Listings 3.7 and 3.8 show the data elements for statement 4 and for the function fun. All known values are assigned to the corresponding fields of the record.

cfg_statement:
  funid = 1
  id = 4
  is_starter = false
  text = "y=x"
  stmt_kind = "Assignment"
  succ = List(6)
  pred = List(3)

Listing 3.7: Initial knowledge of statement 4

function_knowledge:
  fname = varinfo(int, "fun", ...)
  formals = List()
  locals = List(varinfo(int, "x", ...), varinfo(int, "y", ...))

Listing 3.8: Knowledge of the function fun

2. Calculate Conditions

As a next step, the plugin determines the condition under which a statement is executed. For this, we use the function collect_further_data, which starts the condition retrieval process. It begins with the first statement of each function, which has no condition. The algorithm then walks along all paths of the function until every statement has been processed. Whenever a branch statement, e.g. an if statement, is found, its condition is stored, and each successor receives the condition either in its original form (then-branch) or in negated form (else-branch).

At the join of two paths (i.e. a statement with multiple predecessors), if one condition is the negated version of another condition, the plugin can remove both, since the following statements no longer depend on them. Information about such statements is stored in elements of another data type that reuses the former type (see Listing 3.9).

cfg_condition_statement contains an element of the former type cfg_statement, called linkdata. Furthermore, it contains a list of cond elements to represent the conditions. Such a cond element consists of the condition text cond_text and a bool that specifies whether the condition is negated (field negated).

type cond = {
  cond_text : string;
  negated : bool
};;

type cfg_condition_statement = {
  linkdata : cfg_statement;
  conds : cond list
};;

Listing 3.9: Datatype for statement with conditions
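The condition retrieval can be sketched as follows. This is a simplified illustration, not the code of collect_further_data: it assumes an acyclic control-flow graph whose nodes are listed in topological order (loops are not handled), and a hypothetical node type in which an if statement stores its condition text and lists its then-successor first. The cond type is the one from Listing 3.9. At a join, the conditions of all incoming paths are merged and every pair of a condition and its negation is dropped; conditions that are not complementary are kept.

```ocaml
type cond = { cond_text : string; negated : bool }

(* Hypothetical, reduced CFG node: an if statement carries its condition
   text in branch, and its first successor is the then-branch. *)
type node = { id : int; succ : int list; branch : string option }

(* Join of the condition lists of two paths: union of both lists,
   then drop every condition whose negation is also present. *)
let join (a : cond list) (b : cond list) : cond list =
  let all = a @ List.filter (fun c -> not (List.mem c a)) b in
  List.filter
    (fun c -> not (List.mem { c with negated = not c.negated } all))
    all

(* Conditions that predecessor p passes on to its successor s. *)
let outgoing (p : node) (conds : cond list) (s : int) : cond list =
  match p.branch, p.succ with
  | Some text, then_id :: _ ->
      conds @ [ { cond_text = text; negated = (s <> then_id) } ]
  | _ -> conds

(* Nodes must be in topological order, so every predecessor is
   already in the table when a node is processed. *)
let compute_conditions (nodes : node list) : (int, cond list) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun n ->
      let preds = List.filter (fun p -> List.mem n.id p.succ) nodes in
      let incoming =
        List.map (fun p -> outgoing p (Hashtbl.find tbl p.id) n.id) preds
      in
      let conds =
        match incoming with
        | [] -> [] (* first statement of the function: no condition *)
        | c :: rest -> List.fold_left join c rest
      in
      Hashtbl.replace tbl n.id conds)
    nodes;
  tbl

(* Running example of Listing 3.5. *)
let example =
  [ { id = 1; succ = [ 2 ]; branch = None };
    { id = 2; succ = [ 3 ]; branch = None };
    { id = 3; succ = [ 4; 5 ]; branch = Some "x>=0" };
    { id = 4; succ = [ 6 ]; branch = None };
    { id = 5; succ = [ 6 ]; branch = None };
    { id = 6; succ = []; branch = None } ]
```

For the running example, statement 4 receives x>=0 with negated = false, statement 5 receives the negated condition, and the two cancel out at the return statement (id 6), which therefore has no condition.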


Listings 3.10 and 3.11 show the updated stored values for the inspected statement. As we can see, the condition contains the text and the negated flag, and the updated statement contains a link to the previously created statement data and the condition.

cond:
  cond_text = "x>=0"
  negated = false

Listing 3.10: Stored values for a condition


cfg_condition_statement:
  linkdata = (* cfg_statement from former step *)
  conds = List((* created cond element *))

Listing 3.11: Stored values for the updated statement


3. Build Function List

As a next step, a visitor creates a list of all function names; this visitor was shown in Listing 3.3. The list is used later to iterate over all functions.

4. Value Analysis

The plugin needs the results of the value analysis to obtain information about possible values. Therefore, !Db.Value.compute () is called, which executes the value analysis for the current code.

5. Collect Value Analysis Results

Once the value analysis has been performed, its results have to be collected. For this purpose, another visitor retrieves the values of each variable at each statement. Sometimes, however, the analysis has no knowledge about the value of a variable. In such cases, no entry is stored, so only actual knowledge ends up in the results. Additional calculations on values are done in the Java wrapper part of the work chain.

The type for the variable knowledge is shown in Listing 3.12. Elements of this type store the name of the variable (name), the name of the function (fun_name) and the value as a string (value), so that values of all types can be represented.

type var_data = {
  name : string;
  fun_name : string;
  value : string
};;

Listing 3.12: Datatypes to represent results of the value analysis

Listing 3.13 shows the collected value of variable x at the inspected statement (id 4), containing the variable name "x", the function name "fun" and the value "0".

var_data:
  name = "x"
  fun_name = "fun"
  value = "0"

Listing 3.13: Stored value knowledge for variable x


Now, hashtables are created that map a statement id to a list of known variables. There are two such tables, one for the variables before the statement and one for the variables after the statement. Those tables can be accessed by two getter functions (get_table and get_table2) of the visitor class.

Table 3.1 shows the values before and after every statement of the example program:

id   value before           value after
1    -                      x = 0
2    x = 0                  x = 0, y = undefined
3    x = 0, y = undefined   x = 0, y = undefined
4    x = 0, y = undefined   x = 0, y = 0
5    x = 0, y = undefined   x = 0, y = 1
6    x = 0, y = 0           -

Table 3.1: Known values for each statement
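Such a value table can be sketched as a Hashtbl from statement ids to var_data lists. The sketch assumes a hypothetical raw_result record in which None marks a variable about which the analysis has no knowledge (like y = undefined in Table 3.1); such entries are skipped, so only actual knowledge is stored. How the plugin queries the value analysis for each variable is not shown here, and build_table is an illustrative name.

```ocaml
type var_data = { name : string; fun_name : string; value : string }

(* Hypothetical raw result for one variable at one statement;
   None means the analysis has no knowledge about the variable. *)
type raw_result = {
  stmt_id : int;
  var : string;
  fn : string;
  known : string option;
}

(* Map statement id -> known variables, skipping unknown values. *)
let build_table (results : raw_result list) : (int, var_data list) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun r ->
      match r.known with
      | None -> () (* no knowledge: nothing is stored *)
      | Some v ->
          let prev =
            Option.value (Hashtbl.find_opt tbl r.stmt_id) ~default:[]
          in
          Hashtbl.replace tbl r.stmt_id
            (prev @ [ { name = r.var; fun_name = r.fn; value = v } ]))
    results;
  tbl

(* Values before statement 4 of the running example (Table 3.1). *)
let before =
  build_table
    [ { stmt_id = 4; var = "x"; fn = "fun"; known = Some "0" };
      { stmt_id = 4; var = "y"; fn = "fun"; known = None } ]
```

For statement 4, only x = 0 ends up in the table before the statement, since nothing is known about y at that point.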

6. Combine Statements with Value Tables

As a next step, the statements and the value tables are combined, based on the stored statement id. The result is represented by a new type, see Listing 3.14. This new type reuses the former type cfg_condition_statement in the field static_data and adds the two new lists vars and vars_before: vars stores the known values after the statement, vars_before the known values before the statement.

type cfg_vardata_statement = {
  static_data : cfg_condition_statement;
  vars : var_data list;
  vars_before : var_data list
};;

Listing 3.14: Datatypes to represent the combination of statement data and values


For the running example, Listing 3.15 shows the stored data for statement 4 after the value analysis has been performed. The element consists of the previously collected statement data and two lists that represent the calculated values.

cfg_vardata_statement:
  static_data = (* contains element from earlier step *)
  vars = List(var_data("x", "fun", "0"), var_data("y", "fun", "0"))
  vars_before = List(var_data("x", "fun", "0"))

Listing 3.15: Stored data after the value analysis
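The combination step is essentially a join on the statement id. The following sketch uses reduced versions of the types from Listings 3.6, 3.9, 3.12 and 3.14 (cfg_statement is cut down to id and text) and a hypothetical function combine; a statement without an entry in one of the tables simply gets an empty list.

```ocaml
type var_data = { name : string; fun_name : string; value : string }
type cond = { cond_text : string; negated : bool }

(* Reduced cfg_statement: only id and text are kept in this sketch. *)
type cfg_statement = { id : int; text : string }
type cfg_condition_statement = { linkdata : cfg_statement; conds : cond list }

type cfg_vardata_statement = {
  static_data : cfg_condition_statement;
  vars : var_data list;
  vars_before : var_data list;
}

(* A missing entry means that no values are known for that statement. *)
let lookup tbl id = Option.value (Hashtbl.find_opt tbl id) ~default:[]

(* Hypothetical combine: join the statements with both value tables on the id. *)
let combine ~(after : (int, var_data list) Hashtbl.t)
    ~(before : (int, var_data list) Hashtbl.t)
    (stmts : cfg_condition_statement list) : cfg_vardata_statement list =
  List.map
    (fun s ->
      let id = s.linkdata.id in
      { static_data = s; vars = lookup after id; vars_before = lookup before id })
    stmts
```

Fed with the values of Table 3.1 for statement 4, the sketch produces the two lists shown in Listing 3.15.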

7. Tokenize Data

In order to have additional information about the statements, conditions and functions, the stored strings are parsed into tokens. Such tokens carry additional information, like the kind of token, which makes them easier to parse and use in the following phases of the work chain. For this, some additional types are introduced, see Listing 3.16. Type tokenkind defines the kind of a token. A token consists of its kind and a tokenname of type string, which represents the token's value. The type cond_tokens consists of the tokenized condition (tokens) and the former flag negated. cfg_tokenized contains the earlier collected information without references to older statement types, for easier exporting; its tokens list stores the tokenized statement text. The other two types, token_function_knowledge and token_struct_knowledge, contain the same values as before, but in tokenized form.

type tokenkind =
  | Name
  | VarName
  | FunName
  | Operation
  | Skip
  | Number
  | Char
  | String
  | Keyword
  | Label
;;

type token = {
  kind : tokenkind;
  tokenname : string
};;

type cond_tokens = {
  tokens : token list;
  negated : bool
};;

type cfg_tokenized = {
  tid : int;
  tokens : token list;
  is_starter : bool;
  stmt_kind : string;
  succ : int list;
  pred : int list;
  conds : cond_tokens list;
  vars : var_data list;
  vars_before : var_data list;
  funid : int
};;

type token_function_knowledge = {
  tvid : int;
  tname : token list;
  tformals : token list list;
  tlocals : token list list;
};;

type token_struct_knowledge = {
  t_str_id : int;
  t_str_name : string;
  t_str_fields : token list list
}

Listing 3.16: Datatypes after tokenizing step

Listings 3.17 and 3.18 show the stored data for statement 4 after the tokenizing step: the condition consists of a list of token elements and the former negated flag. cfg_tokenized contains all the previously collected data, but the statement's text and the condition are stored in tokenized form.

cond_tokens:
  tokens = List(token(VarName, "x"), token(Operation, ">="), token(Number, "0"))
  negated = false

Listing 3.17: The stored condition after tokenizing step

cfg_tokenized:
  tid = 4
  tokens = List(token(VarName, "y"), token(Operation, "="), token(VarName, "x"))
  is_starter = false
  stmt_kind = "Assignment"
  succ = List(6)
  pred = List(3)
  conds = List((* one element with cond_tokens from above *))
  vars = List(var_data("x", "fun", "0"), var_data("y", "fun", "0"))
  vars_before = List(var_data("x", "fun", "0"))
  funid = 1

Listing 3.18: The stored statement after tokenizing the data
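A very small tokenizer that produces the token lists of Listings 3.17 and 3.18 could look like the sketch below. The classification rules are assumptions for illustration: a short keyword list, an identifier directly followed by ( counts as FunName, runs of operator characters form one Operation token, and whitespace and ; are dropped. The plugin's actual rules, and its use of kinds such as Name, Skip or Label, are not covered by this sketch.

```ocaml
type tokenkind =
  | Name | VarName | FunName | Operation | Skip
  | Number | Char | String | Keyword | Label

type token = { kind : tokenkind; tokenname : string }

(* Hypothetical classification rules; only a few C keywords are listed. *)
let keywords = [ "int"; "char"; "void"; "if"; "else"; "while"; "for"; "return" ]

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_digit c = c >= '0' && c <= '9'
let is_op c = String.contains "+-*/%=<>!&|^~" c

let tokenize (s : string) : token list =
  let n = String.length s in
  (* first index at or after i where predicate p no longer holds *)
  let rec scan p i = if i < n && p s.[i] then scan p (i + 1) else i in
  let sub i j = String.sub s i (j - i) in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if c = ' ' || c = '\t' || c = ';' then go (i + 1) acc
      else if is_alpha c then begin
        let j = scan (fun ch -> is_alpha ch || is_digit ch) i in
        let w = sub i j in
        let kind =
          if List.mem w keywords then Keyword
          else if j < n && s.[j] = '(' then FunName
          else VarName
        in
        go j ({ kind; tokenname = w } :: acc)
      end
      else if is_digit c then begin
        let j = scan is_digit i in
        go j ({ kind = Number; tokenname = sub i j } :: acc)
      end
      else if is_op c then begin
        let j = scan is_op i in
        go j ({ kind = Operation; tokenname = sub i j } :: acc)
      end
      else
        (* parentheses, braces, commas: one-character Operation token *)
        go (i + 1) ({ kind = Operation; tokenname = String.make 1 c } :: acc)
  in
  go 0 []
```

Applied to the statement text y=x and the condition x>=0, the sketch yields the token sequences shown in Listings 3.18 and 3.17.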

8. Export

As the last step, we export the collected data to several files, where each line contains all collected data of one element, with the parts separated by runs of ";" characters, which makes the files easy to parse for other applications. Listing 3.19 contains the export line for the example function fun. For an import, first split the line at the highest number of successive ";" characters, e.g. ";;;;" for the function export, to get the id, the function tokens, the formal parameter tokens and the local variable tokens. Most of these parts can be further split at shorter runs of successive ";" characters.

67;;;;int;Keyword;;fun;Name;;;;;;;;;;int;Keyword;;x;Name;;;;;int;Keyword;;y;Name;;;;;

Listing 3.19: A sample export line for the function fun
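One separator scheme that reproduces the sample line of Listing 3.19 is sketched below: each token is written as text;Kind and terminated by ;;, each token list inside a list of lists is terminated by ;;;, and the four fields of token_function_knowledge are joined by ;;;;. This scheme is inferred from the sample line only and is not necessarily how the plugin's exporter is written; the function names are hypothetical.

```ocaml
type tokenkind =
  | Name | VarName | FunName | Operation | Skip
  | Number | Char | String | Keyword | Label

type token = { kind : tokenkind; tokenname : string }

type token_function_knowledge = {
  tvid : int;
  tname : token list;
  tformals : token list list;
  tlocals : token list list;
}

let string_of_kind = function
  | Name -> "Name" | VarName -> "VarName" | FunName -> "FunName"
  | Operation -> "Operation" | Skip -> "Skip" | Number -> "Number"
  | Char -> "Char" | String -> "String" | Keyword -> "Keyword"
  | Label -> "Label"

(* One token: text;Kind followed by the terminator ;; *)
let export_token t = t.tokenname ^ ";" ^ string_of_kind t.kind ^ ";;"

let export_tokens ts = String.concat "" (List.map export_token ts)

(* Every token list in a list of lists is terminated by ;;; *)
let export_token_lists ls =
  String.concat "" (List.map (fun ts -> export_tokens ts ^ ";;;") ls)

(* Fields: id, function tokens, formal parameters, local variables. *)
let export_function f =
  String.concat ";;;;"
    [ string_of_int f.tvid;
      export_tokens f.tname;
      export_token_lists f.tformals;
      export_token_lists f.tlocals ]
```

Because the formal parameter list of fun is empty, two ;;;; field separators follow directly on the ;; terminator of the last name token, which explains the long run of ; characters after fun;Name in the sample line.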