Systeme hoher Qualität und Sicherheit, Universität Bremen, WS 2015/2016
Christoph Lüth, Jan Peleska, Dieter Hutter
Lecture 09 (07-12-2015)
Static Program Analysis
Where are we?
01: Concepts of Quality
02: Legal Requirements: Norms and Standards
03: The Software Development Process
04: Hazard Analysis
05: High-Level Design with SysML
06: Formal Modelling with SysML and OCL
07: Detailed Specification with SysML
08: Testing
09: Static Program Analysis
10 and 11: Software Verification (Hoare-Calculus)
12: Model-Checking
13: Concurrency
14: Conclusions
Today: Static Program Analysis
Analysis of the run-time behavior of programs without executing them (sometimes called static testing).
Analysis is done for all possible runs of a program (i.e. considering all possible inputs).
Typical tasks
• Does the variable x have a constant value?
• Is the value of the variable x always positive?
• Can the pointer p be null at a given program point?
• What are the possible values of the variable y?
These tasks can be used for verification (e.g. is there any possible dereferencing of the null pointer), or for
optimisation when compiling.
Program Analysis in the Development Cycle
Usage of Program Analysis
Optimising compilers
• Detection of sub-expressions that are evaluated multiple times
• Detection of unused local variables
• Pipeline optimisations
Program verification
• Search for runtime errors in programs
• Null pointer dereference
• Exceptions which are thrown and not caught
• Over/underflow of integers, rounding errors with floating point numbers
• Runtime estimation (worst-case execution time, WCET)
In other words, specific verification aspects.
Program Analysis: The Basic Problem
Basic Problem: All interesting program properties are undecidable.
Given a property $P$ and a program $p$, we write $p \models P$ if $P$ holds for $p$. An algorithm (tool) $\phi$ which decides $P$ is a computable predicate $\phi : p \to \mathit{Bool}$. We say:
• $\phi$ is sound if whenever $\phi(p)$ then $p \models P$.
• $\phi$ is safe (or complete) if whenever $p \models P$ then $\phi(p)$.
From the basic problem it follows that there are no sound and safe tools for interesting properties.
• In other words, all interesting tools must either under- or overapproximate.
Program Analysis: Approximation
[Figure: all programs, divided into correct programs and programs with errors; the exact boundary is not computable, whereas the overapproximation and the underapproximation are computable.]
Underapproximation only finds correct programs but may miss out some.
• Useful in optimising compilers.
• Optimisation must respect the semantics of the program, but need not find every possible optimisation.
Overapproximation finds all errors but may find non-errors (false positives).
• Useful in verification.
• Safety analysis must find all errors, but may report some more.
• Too high a rate of false positives may hinder acceptance of the tool.
Program Analysis Approach
Provides approximate answers
• yes / no / don't know, or
• superset or subset of values
Uses an abstraction of the program's behavior
• Abstract data values (e.g. sign abstraction)
• Summarization of information from execution paths, e.g. branches of the if-else statement
Worst-case assumptions about the environment's behavior
• e.g. any value of a method parameter is possible
Sufficient precision with good performance
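As a small illustration of abstract data values, the sign abstraction mentioned above can be sketched in Python. The names `Sign`, `alpha`, `abs_add` and `abs_mul` are chosen for this sketch only and are not taken from the lecture; `TOP` plays the role of the "don't know" answer.

```python
from enum import Enum


class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "?"  # don't know: any sign is possible


def alpha(n: int) -> Sign:
    """Abstract a concrete integer to its sign."""
    if n < 0:
        return Sign.NEG
    if n == 0:
        return Sign.ZERO
    return Sign.POS


def abs_mul(a: Sign, b: Sign) -> Sign:
    """Abstract multiplication: exact on signs unless an operand is unknown."""
    if Sign.ZERO in (a, b):
        return Sign.ZERO  # zero times anything is zero
    if Sign.TOP in (a, b):
        return Sign.TOP
    return Sign.POS if a == b else Sign.NEG


def abs_add(a: Sign, b: Sign) -> Sign:
    """Abstract addition: loses precision when the signs differ."""
    if a == Sign.ZERO:
        return b
    if b == Sign.ZERO:
        return a
    if a == b:
        return a  # same sign is preserved (also covers TOP + TOP)
    return Sign.TOP  # e.g. positive + negative: result sign unknown


print(abs_mul(alpha(-3), alpha(-7)))
print(abs_add(alpha(5), alpha(-2)))
```

The product of two negative numbers is known to be positive, while the sum of a positive and a negative number can only be answered with "don't know". This is an overapproximation: the abstract result always covers the concrete one, at the cost of precision.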
Flow Sensitivity
Flow-sensitive analysis
Considers program's flow of control
Uses control-flow graph as a representation of the source
Example: available expressions analysis
Flow-insensitive analysis
Program is seen as an unordered collection of statements
Results are valid for any order of statements e.g. S1 ; S2 vs. S2 ; S1
Example: type analysis (inference)
Context Sensitivity
Context-sensitive analysis
Takes the stack of procedure invocations, the values of method parameters and the return values into account
Results of analysis of the method M depend on the caller of M
Context-insensitive analysis
Produces the same results for all possible invocations of M independent of possible callers and parameter values.
Intra- vs. Inter-procedural Analysis
Intra-procedural analysis
Single function is analyzed in isolation
Maximally pessimistic assumptions about parameter values and results of procedure calls
Inter-procedural analysis
Whole program is analyzed at once
Procedure calls are considered
Data-Flow Analysis
Focus on questions related to values of variables and their lifetime.
Selected analyses:
Available expressions (forward analysis)
• Which expressions have been computed already without change of the occurring variables (optimization)?
Reaching definitions (forward analysis)
• Which assignments contribute to a state in a program point (verification)?
Very busy expressions (backward analysis)
• Which expressions are executed in a block regardless of which path the program takes (verification)?
Live variables (backward analysis)
• Is the value of a variable in a program point used in a later part of the program (optimization)?
Our Simple Programming Language
In the last lecture, we introduced a very simple language with a C-like syntax.
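Synopsis:
Arithmetic expressions given by
\[ a ::= x \mid n \mid a_1 \; op_a \; a_2 \]
Boolean expressions given by
\[ b ::= \mathbf{true} \mid \mathbf{false} \mid \mathbf{not}\ b \mid b_1 \; op_b \; b_2 \mid a_1 \; op_r \; a_2 \]
with $op_b \in \{\mathbf{and}, \mathbf{or}\}$ and $op_r \in \{=, <, \le, >, \ge, \ne\}$.
Statements given by
\[ S ::= [x := a]^l \mid [\mathbf{skip}]^l \mid S_1; S_2 \mid \mathbf{if}\ [b]^l\ \{S_1\}\ \mathbf{else}\ \{S_2\} \mid \mathbf{while}\ [b]^l\ \{S\} \]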
Computing the Control Flow Graph
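To calculate the CFG, we define some functions on the abstract syntax:
• The initial label (entry point) $\mathit{init} : S \to \mathit{Lab}$
• The final labels (exit points) $\mathit{final} : S \to \mathbb{P}(\mathit{Lab})$
• The elementary blocks $\mathit{blocks} : S \to \mathbb{P}(\mathit{Blocks})$, where an elementary block is
  ► an assignment [x := a],
  ► or [skip],
  ► or a test [b]
• The control flow $\mathit{flow} : S \to \mathbb{P}(\mathit{Lab} \times \mathit{Lab})$ and the reverse control flow $\mathit{flow}^R : S \to \mathbb{P}(\mathit{Lab} \times \mathit{Lab})$.
The control flow graph of a program S is given by
• the elementary blocks $\mathit{blocks}(S)$ as nodes, and
• the control flow $\mathit{flow}(S)$ as edges.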
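Labels, Blocks, Flows: Definitions
\[
\begin{aligned}
\mathit{final}([x := a]^l) &= \{l\} \\
\mathit{final}([\mathbf{skip}]^l) &= \{l\} \\
\mathit{final}(S_1; S_2) &= \mathit{final}(S_2) \\
\mathit{final}(\mathbf{if}\ [b]^l\ \{S_1\}\ \mathbf{else}\ \{S_2\}) &= \mathit{final}(S_1) \cup \mathit{final}(S_2) \\
\mathit{final}(\mathbf{while}\ [b]^l\ \{S\}) &= \{l\} \\[1ex]
\mathit{init}([x := a]^l) &= l \\
\mathit{init}([\mathbf{skip}]^l) &= l \\
\mathit{init}(S_1; S_2) &= \mathit{init}(S_1) \\
\mathit{init}(\mathbf{if}\ [b]^l\ \{S_1\}\ \mathbf{else}\ \{S_2\}) &= l \\
\mathit{init}(\mathbf{while}\ [b]^l\ \{S\}) &= l \\[1ex]
\mathit{flow}([x := a]^l) &= \emptyset \\
\mathit{flow}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{flow}(S_1; S_2) &= \mathit{flow}(S_1) \cup \mathit{flow}(S_2) \cup \{(l, \mathit{init}(S_2)) \mid l \in \mathit{final}(S_1)\} \\
\mathit{flow}(\mathbf{if}\ [b]^l\ \{S_1\}\ \mathbf{else}\ \{S_2\}) &= \mathit{flow}(S_1) \cup \mathit{flow}(S_2) \cup \{(l, \mathit{init}(S_1)), (l, \mathit{init}(S_2))\} \\
\mathit{flow}(\mathbf{while}\ [b]^l\ \{S\}) &= \mathit{flow}(S) \cup \{(l, \mathit{init}(S))\} \cup \{(l', l) \mid l' \in \mathit{final}(S)\} \\
\mathit{flow}^R(S) &= \{(l', l) \mid (l, l') \in \mathit{flow}(S)\} \\[1ex]
\mathit{blocks}([x := a]^l) &= \{[x := a]^l\} \\
\mathit{blocks}([\mathbf{skip}]^l) &= \{[\mathbf{skip}]^l\} \\
\mathit{blocks}(S_1; S_2) &= \mathit{blocks}(S_1) \cup \mathit{blocks}(S_2) \\
\mathit{blocks}(\mathbf{if}\ [b]^l\ \{S_1\}\ \mathbf{else}\ \{S_2\}) &= \{[b]^l\} \cup \mathit{blocks}(S_1) \cup \mathit{blocks}(S_2) \\
\mathit{blocks}(\mathbf{while}\ [b]^l\ \{S\}) &= \{[b]^l\} \cup \mathit{blocks}(S) \\
\mathit{labels}(S) &= \{l \mid [B]^l \in \mathit{blocks}(S)\}
\end{aligned}
\]
$\mathit{FV}(a)$ = free variables in $a$
$\mathit{Aexp}(a)$ = non-trivial subexpressions in $a$ (variables and constants are trivial)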
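The definitions of init, final and flow translate almost literally into recursive functions. The following Python sketch is one possible encoding; the class names (`Assign`, `Skip`, `Seq`, `If`, `While`) and the choice to keep expressions as plain strings are assumptions made for this illustration. It is applied to the program P that serves as the running example of this lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    var: str
    expr: str
    label: int


@dataclass(frozen=True)
class Skip:
    label: int


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class If:
    cond: str
    label: int
    then: object
    orelse: object


@dataclass(frozen=True)
class While:
    cond: str
    label: int
    body: object


def init(s):
    if isinstance(s, Seq):
        return init(s.first)
    return s.label  # assignment, skip, if and while start at their own label


def final(s):
    if isinstance(s, Seq):
        return final(s.second)
    if isinstance(s, If):
        return final(s.then) | final(s.orelse)
    return {s.label}  # assignment, skip, and the test of a while loop


def flow(s):
    if isinstance(s, Seq):
        return (flow(s.first) | flow(s.second)
                | {(l, init(s.second)) for l in final(s.first)})
    if isinstance(s, If):
        return (flow(s.then) | flow(s.orelse)
                | {(s.label, init(s.then)), (s.label, init(s.orelse))})
    if isinstance(s, While):
        return (flow(s.body) | {(s.label, init(s.body))}
                | {(l, s.label) for l in final(s.body)})
    return set()  # assignment and skip have no internal flow


def flow_r(s):
    return {(l2, l1) for (l1, l2) in flow(s)}


# P = [x := a+b]1; [y := a*b]2; while [y > a+b]3 { [a := a+1]4; [x := a+b]5 }
P = Seq(Assign("x", "a+b", 1),
        Seq(Assign("y", "a*b", 2),
            While("y > a+b", 3,
                  Seq(Assign("a", "a+1", 4), Assign("x", "a+b", 5)))))

print(init(P), final(P), sorted(flow(P)))
```

Each function has exactly one case per syntactic form, mirroring the equations above; sequencing is the only place where edges between two sub-statements are created.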
An Example Program
P = [x := a+b]^1; [y := a*b]^2; while [y > a+b]^3 { [a := a+1]^4; [x := a+b]^5 }
[Figure: control flow graph of P with the nodes 1 to 5]
init(P) = 1
final(P) = {3}
blocks(P) = { [x := a+b]^1, [y := a*b]^2, [y > a+b]^3, [a := a+1]^4, [x := a+b]^5 }
flow(P) = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}
flowR(P) = {(2, 1), (3, 2), (4, 3), (5, 4), (3, 5)}
labels(P) = {1, 2, 3, 4, 5}
FV(a + b) = {a, b}
FV(P) = {a, b, x, y}
Aexp(P) = {a+b, a*b, a+1}
Available Expression Analysis
The available expression analysis will determine:
For each program point, which expressions must have already been computed, and not modified, on all paths to this program point.
[Figure: control flow graph of the example program S = [x := a+b]^1; [y := a*b]^2; while [y > a+b]^3 { [a := a+1]^4; [x := a+b]^5 }]
Available Expression Analysis
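\[
\begin{aligned}
\mathit{kill}([x := a]^l) &= \{a' \in \mathit{Aexp}(S) \mid x \in \mathit{FV}(a')\} \\
\mathit{kill}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{kill}([b]^l) &= \emptyset \\[1ex]
\mathit{gen}([x := a]^l) &= \{a' \in \mathit{Aexp}(a) \mid x \notin \mathit{FV}(a')\} \\
\mathit{gen}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{gen}([b]^l) &= \mathit{Aexp}(b) \\[1ex]
\mathit{AE}_{in}(l) &= \begin{cases} \emptyset & \text{if } l = \mathit{init}(S) \\ \bigcap \{\mathit{AE}_{out}(l') \mid (l', l) \in \mathit{flow}(S)\} & \text{otherwise} \end{cases} \\
\mathit{AE}_{out}(l) &= (\mathit{AE}_{in}(l) \setminus \mathit{kill}(B^l)) \cup \mathit{gen}(B^l), \quad \text{where } B^l \in \mathit{blocks}(S)
\end{aligned}
\]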
l   kill(l)              gen(l)
1   ∅                    {a+b}
2   ∅                    {a*b}
3   ∅                    {a+b}
4   {a+b, a*b, a+1}      ∅
5   ∅                    {a+b}

l   AEin       AEout
1   ∅          {a+b}
2   {a+b}      {a+b, a*b}
3   {a+b}      {a+b}
4   {a+b}      ∅
5   ∅          {a+b}
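The two tables can be reproduced mechanically. The sketch below hard-codes the blocks and the flow of the example program and iterates the AE equations until nothing changes. The data layout (`BLOCKS`, `FLOW`) and the helper names are assumptions of this sketch, and `fv` only works for the single-letter variable names used here.

```python
# Blocks of S: label -> (assigned variable, or None for a test;
#                        non-trivial subexpressions of the block)
BLOCKS = {
    1: ("x", {"a+b"}),
    2: ("y", {"a*b"}),
    3: (None, {"a+b"}),  # the test y > a+b
    4: ("a", {"a+1"}),
    5: ("x", {"a+b"}),
}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}
INIT = 1
AEXP = set().union(*(exprs for _, exprs in BLOCKS.values()))


def fv(expr):
    """Free variables of an expression written with single-letter names."""
    return {c for c in expr if c.isalpha()}


def kill(label):
    var, _ = BLOCKS[label]
    return set() if var is None else {e for e in AEXP if var in fv(e)}


def gen(label):
    var, exprs = BLOCKS[label]
    # For a test, var is None and every subexpression is generated
    return {e for e in exprs if var not in fv(e)}


def available_expressions():
    # A "must" analysis: start from the largest candidate and shrink it
    ae_in = {l: set(AEXP) for l in BLOCKS}
    ae_out = {l: set(AEXP) for l in BLOCKS}
    changed = True
    while changed:
        changed = False
        for l in sorted(BLOCKS):
            preds = [ae_out[p] for (p, q) in FLOW if q == l]
            new_in = set() if l == INIT else set.intersection(*preds)
            new_out = (new_in - kill(l)) | gen(l)
            if (new_in, new_out) != (ae_in[l], ae_out[l]):
                ae_in[l], ae_out[l] = new_in, new_out
                changed = True
    return ae_in, ae_out


ae_in, ae_out = available_expressions()
for l in sorted(BLOCKS):
    print(l, sorted(ae_in[l]), sorted(ae_out[l]))
```

Starting from the full set Aexp(S) matters: because the equations use intersection, the iteration converges to the largest solution, which is the most informative one for this analysis.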
Reaching Definitions Analysis
Reaching definitions (assignment) analysis determines:
An assignment of the form [x := a]^l may reach a certain program point k if there is an execution of the program where x was last assigned a value at l when the program point k is reached.
[Figure: control flow graph of the example program S = [x := 5]^1; [y := 1]^2; while [x > 1]^3 { [y := x*y]^4; [x := x-1]^5 }]
Reaching Definitions Analysis
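\[
\begin{aligned}
\mathit{kill}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{kill}([b]^l) &= \emptyset \\
\mathit{kill}([x := a]^l) &= \{(x, ?)\} \cup \{(x, k) \mid B^k \text{ is an assignment to } x \text{ in } S\} \\[1ex]
\mathit{gen}([x := a]^l) &= \{(x, l)\} \\
\mathit{gen}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{gen}([b]^l) &= \emptyset \\[1ex]
\mathit{RD}_{in}(l) &= \begin{cases} \{(x, ?) \mid x \in \mathit{FV}(S)\} & \text{if } l = \mathit{init}(S) \\ \bigcup \{\mathit{RD}_{out}(l') \mid (l', l) \in \mathit{flow}(S)\} & \text{otherwise} \end{cases} \\
\mathit{RD}_{out}(l) &= (\mathit{RD}_{in}(l) \setminus \mathit{kill}(B^l)) \cup \mathit{gen}(B^l), \quad \text{where } B^l \in \mathit{blocks}(S)
\end{aligned}
\]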
l   kill(B^l)                  gen(B^l)
1   {(x,?), (x,1), (x,5)}      {(x,1)}
2   {(y,?), (y,2), (y,4)}      {(y,2)}
3   ∅                          ∅
4   {(y,?), (y,2), (y,4)}      {(y,4)}
5   {(x,?), (x,1), (x,5)}      {(x,5)}

l   RDin                              RDout
1   {(x,?), (y,?)}                    {(x,1), (y,?)}
2   {(x,1), (y,?)}                    {(x,1), (y,2)}
3   {(x,1), (x,5), (y,2), (y,4)}      {(x,1), (x,5), (y,2), (y,4)}
4   {(x,1), (x,5), (y,2), (y,4)}      {(x,1), (x,5), (y,4)}
5   {(x,1), (x,5), (y,4)}             {(x,5), (y,4)}
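A minimal sketch of how these tables can be computed, assuming the example program is encoded by the hypothetical constants `ASSIGNS`, `LABELS`, `FLOW`, `INIT` and `VARS` below. In contrast to available expressions, this is a "may" analysis: it uses union and therefore starts from empty sets.

```python
# S = [x := 5]1; [y := 1]2; while [x > 1]3 { [y := x*y]4; [x := x-1]5 }
ASSIGNS = {1: "x", 2: "y", 4: "y", 5: "x"}  # label -> assigned variable
LABELS = [1, 2, 3, 4, 5]                     # label 3 is the test x > 1
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}
INIT = 1
VARS = {"x", "y"}


def kill(l):
    """All definitions of the variable assigned at l, including the unknown one."""
    if l not in ASSIGNS:
        return set()
    x = ASSIGNS[l]
    return {(x, "?")} | {(x, k) for k, v in ASSIGNS.items() if v == x}


def gen(l):
    return {(ASSIGNS[l], l)} if l in ASSIGNS else set()


def reaching_definitions():
    rd_in = {l: set() for l in LABELS}
    rd_out = {l: set() for l in LABELS}
    changed = True
    while changed:
        changed = False
        for l in LABELS:
            if l == INIT:
                # Initially every variable has an unknown definition "?"
                new_in = {(x, "?") for x in VARS}
            else:
                new_in = set().union(*(rd_out[p] for (p, q) in FLOW if q == l))
            new_out = (new_in - kill(l)) | gen(l)
            if (new_in, new_out) != (rd_in[l], rd_out[l]):
                rd_in[l], rd_out[l] = new_in, new_out
                changed = True
    return rd_in, rd_out


rd_in, rd_out = reaching_definitions()
for l in LABELS:
    print(l, sorted(rd_in[l], key=str), sorted(rd_out[l], key=str))
```

At the loop test (label 3) both definitions of x and both definitions of y arrive, one pair from before the loop and one from the loop body, which is exactly the union over the two incoming edges.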
Live Variables Analysis
A variable x is live at some program point (label l) if there exists a path from l to a use of x that does not change the variable.
Live Variables Analysis determines:
For each program point, which variables may be live at the exit from that point.
Application: dead code elimination.
[Figure: control flow graph of the example program S = [x := 2]^1; [y := 4]^2; [x := 1]^3; if [y > x]^4 { [z := y]^5 } else { [z := y*y]^6 }; [x := z]^7, where the test 4 branches to 5 (yes) and 6 (no)]
Live Variables Analysis
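\[
\begin{aligned}
\mathit{kill}([x := a]^l) &= \{x\} \\
\mathit{kill}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{kill}([b]^l) &= \emptyset \\[1ex]
\mathit{gen}([x := a]^l) &= \mathit{FV}(a) \\
\mathit{gen}([\mathbf{skip}]^l) &= \emptyset \\
\mathit{gen}([b]^l) &= \mathit{FV}(b) \\[1ex]
\mathit{LV}_{out}(l) &= \begin{cases} \emptyset & \text{if } l \in \mathit{final}(S) \\ \bigcup \{\mathit{LV}_{in}(l') \mid (l', l) \in \mathit{flow}^R(S)\} & \text{otherwise} \end{cases} \\
\mathit{LV}_{in}(l) &= (\mathit{LV}_{out}(l) \setminus \mathit{kill}(B^l)) \cup \mathit{gen}(B^l), \quad \text{where } B^l \in \mathit{blocks}(S)
\end{aligned}
\]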
l   kill(l)   gen(l)
1   {x}       ∅
2   {y}       ∅
3   {x}       ∅
4   ∅         {x, y}
5   {z}       {y}
6   {z}       {y}
7   {x}       {z}

l   LVin      LVout
1   ∅         ∅
2   ∅         {y}
3   {y}       {x, y}
4   {x, y}    {y}
5   {y}       {z}
6   {y}       {z}
7   {z}       ∅
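To connect the analysis with its application, the following sketch computes the live variables for the example program and then lists the assignments whose target is not live afterwards. The kill and gen sets are copied from the table above; the function `dead_assignments` is an addition of this sketch and only one simple way to use the result.

```python
# S = [x := 2]1; [y := 4]2; [x := 1]3;
#     if [y > x]4 { [z := y]5 } else { [z := y*y]6 }; [x := z]7
KILL = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(), 5: {"z"}, 6: {"z"}, 7: {"x"}}
GEN = {1: set(), 2: set(), 3: set(), 4: {"x", "y"}, 5: {"y"}, 6: {"y"}, 7: {"z"}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7)}
FINAL = {7}


def live_variables():
    lv_in = {l: set() for l in KILL}
    lv_out = {l: set() for l in KILL}
    changed = True
    while changed:
        changed = False
        # Visiting labels from the end propagates information faster
        for l in sorted(KILL, reverse=True):
            # (l', l) in flowR(S) means that l' is a successor of l in flow(S)
            succs = [q for (p, q) in FLOW if p == l]
            if l in FINAL:
                new_out = set()
            else:
                new_out = set().union(*(lv_in[q] for q in succs))
            new_in = (new_out - KILL[l]) | GEN[l]
            if (new_in, new_out) != (lv_in[l], lv_out[l]):
                lv_in[l], lv_out[l] = new_in, new_out
                changed = True
    return lv_in, lv_out


def dead_assignments(lv_out):
    """Labels of assignments whose target variable is not live at the exit."""
    return sorted(l for l, killed in KILL.items()
                  if killed and not (killed & lv_out[l]))


lv_in, lv_out = live_variables()
print(dead_assignments(lv_out))
```

For this program the sketch reports the labels 1 and 7. The assignment at label 1 is overwritten at label 3 before x is read. Label 7 is reported only because the equations assume that no variable is live at the end of the program; if x were an output of the program, that assumption would have to be changed.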
First Generalized Schema
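\[
\begin{aligned}
\mathit{Analysis}_\circ(l) &= \begin{cases} \mathbf{EV} & \text{if } l \in \mathbf{E} \\ \bigsqcup \{\mathit{Analysis}_\bullet(l') \mid (l', l) \in \mathbf{Flow}(S)\} & \text{otherwise} \end{cases} \\
\mathit{Analysis}_\bullet(l) &= f_l(\mathit{Analysis}_\circ(l))
\end{aligned}
\]
With:
$\bigsqcup$ is either $\bigcup$ or $\bigcap$
$\mathbf{EV}$ is the initial / final analysis information
$\mathbf{Flow}$ is either $\mathit{flow}$ or $\mathit{flow}^R$
$\mathbf{E}$ is either $\{\mathit{init}(S)\}$ or $\mathit{final}(S)$
$f_l$ is the transfer function associated with $B^l \in \mathit{blocks}(S)$
Backward analysis: $\mathbf{Flow} = \mathit{flow}^R$, $\bullet$ = IN, $\circ$ = OUT
Forward analysis: $\mathbf{Flow} = \mathit{flow}$, $\bullet$ = OUT, $\circ$ = IN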
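Partial Order
$L = (M, \sqsubseteq)$ is a partial order iff
• Reflexivity: $\forall x \in M.\ x \sqsubseteq x$
• Transitivity: $\forall x, y, z \in M.\ x \sqsubseteq y \land y \sqsubseteq z \Rightarrow x \sqsubseteq z$
• Anti-symmetry: $\forall x, y \in M.\ x \sqsubseteq y \land y \sqsubseteq x \Rightarrow x = y$
Let $L = (M, \sqsubseteq)$ be a partial order, $S \subseteq M$
• $y \in M$ is an upper bound for $S$ ($S \sqsubseteq y$) iff $\forall x \in S.\ x \sqsubseteq y$
• $y \in M$ is a lower bound for $S$ ($y \sqsubseteq S$) iff $\forall x \in S.\ y \sqsubseteq x$
• Least upper bound $\bigsqcup X \in M$ of $X \subseteq M$:
  ► $X \sqsubseteq \bigsqcup X \land \forall y \in M.\ X \sqsubseteq y \Rightarrow \bigsqcup X \sqsubseteq y$
• Greatest lower bound $\bigsqcap X \in M$ of $X \subseteq M$:
  ► $\bigsqcap X \sqsubseteq X \land \forall y \in M.\ y \sqsubseteq X \Rightarrow y \sqsubseteq \bigsqcap X$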
Lattice
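A lattice ("Verbund") is a partial order $L = (M, \sqsubseteq)$ such that
$\bigsqcup X$ and $\bigsqcap X$ exist for all $X \subseteq M$ (strictly speaking, this makes $L$ a complete lattice)
Unique greatest element $\top = \bigsqcup M = \bigsqcap \emptyset$
Unique least element $\bot = \bigsqcap M = \bigsqcup \emptyset$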
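A concrete instance may help here: the powerset of a finite set, ordered by inclusion, is such a lattice, with union as least upper bound and intersection as greatest lower bound. The sketch below builds it for the three expressions of Aexp(P); the helper names `powerset`, `lub` and `glb` are chosen for this illustration.

```python
from itertools import chain, combinations


def powerset(base):
    """All subsets of base as frozensets."""
    items = sorted(base)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def lub(xs):
    """Least upper bound w.r.t. inclusion: the union (empty set if xs is empty)."""
    return frozenset().union(*xs)


def glb(xs, base):
    """Greatest lower bound: the intersection (all of base if xs is empty)."""
    result = frozenset(base)
    for x in xs:
        result &= x
    return result


BASE = {"a+b", "a*b", "a+1"}
M = powerset(BASE)

top = lub(M)            # greatest element: the whole base set
bottom = glb(M, BASE)   # least element: the empty set
print(sorted(top), sorted(bottom))
```

In this lattice the greatest element is the full set and the least element is the empty set, and the bound of the empty collection is the opposite extreme, as stated above. Available expressions analysis uses the same carrier set with the reverse order, so that its least element is the full set Aexp(S).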
Transfer Functions
Transfer functions propagate information along the execution path (i.e. from input to output, or vice versa)
Let $L = (M, \sqsubseteq)$ be a lattice. Let $F$ be the set of transfer functions of the form
$f_l : L \to L$ with $l$ being a label
Knowledge transfer is monotone
• $\forall x, y.\ x \sqsubseteq y \Rightarrow f_l(x) \sqsubseteq f_l(y)$
Space $F$ of transfer functions
• $F$ contains all transfer functions $f_l$
• $F$ contains the identity function $\mathit{id}$: $\forall x \in M.\ \mathit{id}(x) = x$
• $F$ is closed under composition: $\forall f, g \in F.\ (f \circ g) \in F$
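As a sketch of why the kill/gen functions of the three analyses meet these requirements, write $K$ and $G$ (names chosen here) for the kill and gen sets of a block and take set inclusion as the order. The derivation below is a worked check, not part of the original slides' argument.

```latex
\begin{aligned}
f(x) &= (x \setminus K) \cup G \\
x \subseteq y \;&\Rightarrow\; x \setminus K \subseteq y \setminus K
  \;\Rightarrow\; f(x) \subseteq f(y) \\
(f_2 \circ f_1)(x) &= \bigl(((x \setminus K_1) \cup G_1) \setminus K_2\bigr) \cup G_2 \\
  &= \bigl(x \setminus (K_1 \cup K_2)\bigr) \cup \bigl((G_1 \setminus K_2) \cup G_2\bigr)
\end{aligned}
```

So every such function is monotone, the composition of two of them is again of kill/gen form (with kill set $K_1 \cup K_2$ and gen set $(G_1 \setminus K_2) \cup G_2$), and the identity is the special case $K = G = \emptyset$. Since monotonicity with respect to $\subseteq$ implies monotonicity with respect to $\supseteq$, the same holds for the reversed order used by available expressions.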
The Generalized Analysis
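\[
\begin{aligned}
\mathit{Analysis}_\circ(l) &= \bigsqcup \bigl( \{\mathit{Analysis}_\bullet(l') \mid (l', l) \in \mathit{Flow}(S)\} \cup \{\iota'_E\} \bigr) \\
&\quad \text{with } \iota'_E = \begin{cases} \mathit{EV} & \text{if } l \in E \\ \bot & \text{otherwise} \end{cases} \\
\mathit{Analysis}_\bullet(l) &= f_l(\mathit{Analysis}_\circ(l))
\end{aligned}
\]
With:
$L$ property space representing data flow information, with $(L, \sqsubseteq)$ a lattice
$\mathit{Flow}$ is a finite flow (i.e. $\mathit{flow}$ or $\mathit{flow}^R$)
$\mathit{EV}$ is an extremal value for the extremal labels $E$ (i.e. $\{\mathit{init}(S)\}$ or $\mathit{final}(S)$)
transfer functions $f_l$ of a space of transfer functions $F$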
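The generalized equations suggest a single solver that is parameterised by the flow, the extremal labels and value, the least element, the join and the transfer functions. The following Python sketch is one simple way to do this by repeated iteration; `solve` and `kill_gen` are hypothetical names, and the kill/gen data is copied from the tables of the available expressions and live variables examples.

```python
def solve(labels, flow, extremal, extremal_value, bottom, join, transfer):
    """Iterate both equations until nothing changes.

    Returns (circ, bullet): the values before and after the transfer function.
    Pass flow for a forward analysis and flowR for a backward analysis.
    """
    circ = {l: bottom for l in labels}
    bullet = {l: bottom for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            # Extremal contribution joined with all incoming bullet values
            value = extremal_value if l in extremal else bottom
            for (pred, succ) in flow:
                if succ == l:
                    value = join(value, bullet[pred])
            result = transfer(l, value)
            if value != circ[l] or result != bullet[l]:
                circ[l], bullet[l] = value, result
                changed = True
    return circ, bullet


def kill_gen(kill, gen):
    """Transfer functions of the form f_l(x) = (x minus kill(l)) union gen(l)."""
    return lambda l, x: (x - kill[l]) | gen[l]


# Available expressions: forward, join is intersection, bottom is Aexp(S)
AEXP = frozenset({"a+b", "a*b", "a+1"})
ae_in, ae_out = solve(
    labels=[1, 2, 3, 4, 5],
    flow={(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)},
    extremal={1}, extremal_value=frozenset(), bottom=AEXP,
    join=lambda x, y: x & y,
    transfer=kill_gen(
        kill={1: set(), 2: set(), 3: set(), 4: AEXP, 5: set()},
        gen={1: {"a+b"}, 2: {"a*b"}, 3: {"a+b"}, 4: set(), 5: {"a+b"}}),
)

# Live variables: backward (reverse flow), join is union, bottom is empty
flow_lv = {(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7)}
lv_out, lv_in = solve(
    labels=[1, 2, 3, 4, 5, 6, 7],
    flow={(q, p) for (p, q) in flow_lv},
    extremal={7}, extremal_value=frozenset(), bottom=frozenset(),
    join=lambda x, y: x | y,
    transfer=kill_gen(
        kill={1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(),
              5: {"z"}, 6: {"z"}, 7: {"x"}},
        gen={1: set(), 2: set(), 3: set(), 4: {"x", "y"},
             5: {"y"}, 6: {"y"}, 7: {"z"}}),
)

print(sorted(ae_in[3]), sorted(lv_out[3]))
```

Only the parameters differ between the two instances: the direction is chosen by passing the flow or its reverse, and the choice between "must" and "may" is made by the join and the least element. Note that for the backward instance the first returned map is the OUT information and the second one the IN information.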
Summary
Static Program Analysis is the analysis of run-time behavior of programs without executing them (sometimes called static testing).
Approximations of program behaviours by analyzing the program's CFG.
Analyses include
• available expressions analysis,
• reaching definitions,
• live variables analysis.
These are instances of a more general framework.
These techniques are used commercially, e.g.
• AbsInt aiT (WCET)