Intermediate Representation - Information flow analysis for mobile code in dynamic security env

This suits the application scenario outlined in the introduction: the verification of the source code by the software developer is independent of the verification of the bytecode by the app store provider. The software developer may switch to a different high-level code analysis or to a different compiler at any time without affecting the ability of the distribution platform maintainer to verify the bytecode program. Also, there are no requirements for the format of the submitted bytecode, as the translation to IR code happens only within the bytecode analysis.

Nevertheless, I give a type preservation result for the specific DSD compiler presented in the previous chapter. It shows that all bytecode programs produced by the compiler can be transformed into an IR program, such that if the original high-level program was typable with the high-level type system, then the corresponding IR program is typable with the IR type system. This gives the programmer the confidence that typable DSD programs will always be compiled to bytecode programs that are universally noninterferent. Note, however, that neither the type preservation result nor the specific compiler is required to certify bytecode for universal noninterference.

The Grail intermediate language [BMS03] serves a purpose similar to that of the intermediate representation: it is used to transfer results from a high-level resource analysis to simplify the analysis of low-level code. In contrast to the presented approach, however, the compiler of the Grail framework generates the Grail code directly from the high-level language, and additionally translates the high-level proof to a proof for the corresponding Grail code. To make this approach work in the application scenario presented here, one would have to make considerable changes to existing code distribution environments and the execution platform.

5.1 Intermediate Representation

The syntax of IR instructions is defined as follows:

temporary variables: t ∈ TVar

variables: x ∈ Var∪TVar

fields: f ∈ Fld

classes: C ∈ Cls

methods: m ∈ Mtd

instruction addresses: i,j ∈ Adr

expressions: Exp ∋ e ::= (as defined in the high-level language)

The instruction set consists of:

• blocks, which are sequences of assignment statements, similar to those from the high-level language;

• conditional and unconditional jumps, similar to those on the bytecode level, but with the conditional expression restored; and

• pseudo-instructions that indicate control dependence regions, with the same meaning as on the bytecode level.

The set of IR instructions is much smaller than the bytecode instruction set; most functionality is provided by the assignment block instruction $\mathtt{block}\ \vec{a}$. An occurrence of $x$ in assignments $a$ and high-level expressions $e$ may refer to either an ordinary variable or a temporary variable.
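The instruction forms described above can be sketched as a small set of Python data classes. This is only an illustration: the constructor names follow the instruction names that appear in Figure 5.1 (block, if, jmp, cjmp, cpush), while the class names, the field names, the use of integers for addresses, and the opaque `Expr` and `Assignment` placeholders are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union

Expr = Any        # placeholder: expressions are those of the high-level language
Assignment = Any  # placeholder: one assignment statement inside a block
Addr = int        # assumption: instruction addresses are modelled as integers


@dataclass(frozen=True)
class Block:
    """block a: a sequence of assignment statements."""
    assignments: Tuple[Assignment, ...]


@dataclass(frozen=True)
class If:
    """if e j: conditional jump with the conditional expression restored."""
    cond: Expr
    target: Addr


@dataclass(frozen=True)
class Jmp:
    """jmp j: unconditional jump."""
    target: Addr


@dataclass(frozen=True)
class CJmp:
    """cjmp j: jump that is related to control dependence regions."""
    target: Addr


@dataclass(frozen=True)
class CPush:
    """cpush j: pseudo-instruction marking a control dependence region."""
    addr: Addr


Instr = Union[Block, If, Jmp, CJmp, CPush]

# A tiny hypothetical method body; the strings stand in for real syntax trees.
example_body = [Block(("x := 1",)), If("x", 3), Jmp(0), Block(())]
```

Frozen data classes are used so that instructions compare by value, which is convenient when instructions are stored in maps keyed by method and address.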

IR programs are defined very similarly to bytecode programs.

Definition 5.1 An IR program P_IR is a tuple

(≺, fields, methods, margs, tvars, IR, mentry, mexit)

where

• ≺ ∈ 𝒫(Cls × Cls) is the subclass relation.

• fields : Cls → Fld* and methods : Cls → Mtd* assign to each class the identifiers of the fields and methods it contains.

• margs : Mtd → Var* is a function that describes the names of the formal arguments of each method m.

• tvars : Mtd → TVar* specifies for each method the temporary variables that may occur in the body of the method.

• IR : Mtd × Adr ⇀ Instr_IR assigns an IR instruction to methods and instruction addresses, thereby specifying the method implementations.

• mentry, mexit : Mtd → Adr specify the entry point and the exit point of methods.

The difference to bytecode programs is that IR method bodies contain IR instructions, of course, and that IR methods may also contain temporary variables specified by tvars. Apart from that, the meaning and well-formedness requirements are the same as for bytecode programs. In the following, let us assume a given fixed IR program P_IR whose components are all well-formed.
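As a rough illustration of Definition 5.1, the eight components can be held in one record. The sketch below is an assumption-laden encoding, not the thesis's formalization: identifiers are plain strings, sequences are tuples, the partial function IR is a dictionary, and the class name `IRProgram`, the helper `instr`, and the example program are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class IRProgram:
    subclass: FrozenSet[Tuple[str, str]]   # the relation written ≺
    fields: Dict[str, Tuple[str, ...]]     # Cls -> Fld*
    methods: Dict[str, Tuple[str, ...]]    # Cls -> Mtd*
    margs: Dict[str, Tuple[str, ...]]      # Mtd -> Var*
    tvars: Dict[str, Tuple[str, ...]]      # Mtd -> TVar*
    ir: Dict[Tuple[str, int], Any]         # partial map Mtd x Adr -> Instr_IR
    mentry: Dict[str, int]                 # Mtd -> Adr
    mexit: Dict[str, int]                  # Mtd -> Adr

    def instr(self, m: str, i: int) -> Optional[Any]:
        """Look up IR[m, i]; None models 'undefined' for the partial map."""
        return self.ir.get((m, i))


# Hypothetical one-class, one-method program; instructions are opaque here.
prog = IRProgram(
    subclass=frozenset(),
    fields={"A": ("f",)},
    methods={"A": ("m",)},
    margs={"m": ("x",)},
    tvars={"m": ("t1",)},
    ir={("m", 0): "block ...", ("m", 1): "jmp 1"},
    mentry={"m": 0},
    mexit={"m": 1},
)
```

The well-formedness requirements are not restated in this section, so the sketch does not attempt to check them.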

5.1.2 Semantics

An IR state σ_IR is a triple (s, s_t, h) where h is a heap, and s and s_t are two variable stores, one for ordinary and one for temporary variables. The definitions of stores, heaps, and values are exactly as for bytecode and high-level programs.

values: v ∈ Val : (as before)

state: σ_IR ∈ State_IR : Store × TStore × Heap

store: s ∈ Store : Var → Val

temporary store: s_t ∈ TStore : TVar → Val

heap: h ∈ Heap : Loc ⇀ (Cls × (Fld ⇀ Val))

Note that $s \cup s_t$ always forms a valid store, because TVar is disjoint from Var. For the same reason, it is always possible to split a combined store $s \cup s_t$ back into $s$ and $s_t$. All expressions $e$ that occur in statements are evaluated using an evaluation function that is defined in terms of the high-level expression evaluation:

$$\llbracket e \rrbracket^{¦}_{s,\,s_t,\,h} = \llbracket e \rrbracket^{¦}_{s \cup s_t,\,h}$$
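The combine-and-split argument can be made concrete with dictionaries. In the following minimal sketch, stores are Python dicts, the set of temporary variable names is passed explicitly, and `eval_high` is a hypothetical stand-in for the high-level evaluation function that simply treats an expression as a callable on a store and a heap; none of these names come from the source.

```python
def combine(s, st):
    """Form the combined store; valid because Var and TVar are disjoint."""
    assert s.keys().isdisjoint(st.keys()), "Var and TVar must be disjoint"
    return {**s, **st}


def split(combined, tvar_names):
    """Recover (s, st) from a combined store using the set of TVar names."""
    s = {x: v for x, v in combined.items() if x not in tvar_names}
    st = {x: v for x, v in combined.items() if x in tvar_names}
    return s, st


def eval_high(e, store, heap):
    """Stand-in for high-level evaluation: expressions are callables here."""
    return e(store, heap)


def eval_ir(e, s, st, h):
    """IR evaluation, defined via high-level evaluation on the combined store."""
    return eval_high(e, combine(s, st), h)


s, st = {"x": 1}, {"t": 2}
value = eval_ir(lambda store, heap: store["x"] + store["t"], s, st, {})
```

Splitting needs to know which names are temporary; in the formal setting this information is given by the disjoint sets Var and TVar.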

In the following, I define five mutually recursive transition relations, listed in the following table:

small-step transition for IR instructions: $(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s_t',h')$

big-step transition for IR instructions: $(m,i,s,s_t,h) \Longrightarrow_{¦} (m,i',s',s_t',h')$

big-step transition for IR methods: $(s,h) \overset{m}{\Longrightarrow}_{¦} (s',h')$

small-step transition for IR assignments: $(s,s_t,h) \xrightarrow{a}_{¦} (s',s_t',h')$

big-step transition for IR assignments: $(s,s_t,h) \overset{\vec{a}}{\Longrightarrow}_{¦} (s',s_t',h')$


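$$\frac{I = \mathtt{if}\ e\ j \qquad \llbracket e \rrbracket^{¦}_{s,\,s_t,\,h} = 0}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s,s_t,h)}$$

$$\frac{I = \mathtt{if}\ e\ j \qquad \llbracket e \rrbracket^{¦}_{s,\,s_t,\,h} \neq 0}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,j,s,s_t,h)}$$

$$\frac{I \in \{\mathtt{jmp}\ j,\ \mathtt{cjmp}\ j\}}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,j,s,s_t,h)}$$

$$\frac{I = \mathtt{cpush}\ j}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s,s_t,h)}$$

$$\frac{I = \mathtt{block}\ \vec{a} \qquad (s,s_t,h) \overset{\vec{a}}{\Longrightarrow}_{¦} (s',s_t',h')}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s',s_t',h')}$$

Figure 5.1: Mostly small-step semantics for IR instructions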

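$$\frac{S = a \in \{x := e,\ e.f := e,\ x := \mathtt{new}\ C(e)\} \qquad (s \cup s_t, h) \xrightarrow{S}_{¦} (s' \cup s_t', h')}{(s,s_t,h) \xrightarrow{a}_{¦} (s',s_t',h')}$$

$$\frac{\begin{array}{c} a = x := e.m(\vec{e}) \qquad \llbracket e \rrbracket^{¦}_{s,\,s_t,\,h} = r \qquad \llbracket \vec{e} \rrbracket^{¦}_{s,\,s_t,\,h} = \vec{v} \\ s_m = [this \mapsto r] \cup [margs(m) \mapsto \vec{v}] \cup [ret \mapsto defval] \\ (s_m, h) \overset{m}{\Longrightarrow}_{¦} (s_m', h') \qquad s' \cup s_t' = (s \cup s_t)[x \mapsto s_m'(ret)] \end{array}}{(s,s_t,h) \xrightarrow{a}_{¦} (s',s_t',h')}$$

Figure 5.2: Semantics for IR assignments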

The rules in Figure 5.1 define a (mostly) small-step operational semantics for single IR instructions

$$(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s_t',h')$$

which means that given a domain lattice ¦ and starting from state $(s,s_t,h)$ at instruction address $(m,i)$, the instruction $I$ leads to instruction address $(m,i')$ and new state $(s',s_t',h')$. A special relation for assignment blocks is defined below.
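The five rules of Figure 5.1 can be read as one step function. The sketch below is a simplified illustration under several assumptions: instructions are encoded as tuples such as `("if", e, j)`, expressions are Python callables on the combined store and the heap, the domain lattice is left out, and the big-step relation for assignment blocks is supplied by the caller as the hypothetical parameter `run_block`.

```python
def step(instr, i, s, st, h, run_block):
    """Perform one small step of instruction `instr` at address i.

    Returns (i', s', st', h'). The method name m is omitted because a single
    step never leaves the current method.
    """
    kind = instr[0]
    if kind == "if":
        _, e, j = instr
        # Jump to j when the condition is non-zero, otherwise fall through.
        taken = e({**s, **st}, h) != 0
        return (j if taken else i + 1), s, st, h
    if kind in ("jmp", "cjmp"):
        return instr[1], s, st, h
    if kind == "cpush":
        # Pseudo-instruction: no effect on the state, continue at i + 1.
        return i + 1, s, st, h
    if kind == "block":
        s2, st2, h2 = run_block(instr[1], s, st, h)
        return i + 1, s2, st2, h2
    raise ValueError(f"unknown instruction: {instr!r}")


def toy_run_block(assignments, s, st, h):
    """Toy block semantics for the example: every block sets x to 1."""
    return {**s, "x": 1}, st, h


cond = lambda store, heap: store["x"]
fallthrough = step(("if", cond, 5), 0, {"x": 0}, {}, {}, toy_run_block)
jump_taken = step(("if", cond, 5), 0, {"x": 7}, {}, {}, toy_run_block)
```

Passing `run_block` as a parameter mirrors the mutual recursion between the instruction relation and the assignment relation without fixing the assignment semantics in this sketch.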

While the relation is defined for arbitrary instructions $I$, it is linked to the instruction $IR[m,i]$ via a big-step transition, which is defined similarly to the bytecode big-step semantics:

$$(m,i^0,s^0,s_t^0,h^0) \Longrightarrow_{¦} (m,i^k,s^k,s_t^k,h^k)$$

if and only if there is a (possibly empty) sequence of small execution steps such that

$$(m,i^0,s^0,s_t^0,h^0) \xrightarrow{IR[m,i^0]}_{¦} (m,i^1,s^1,s_t^1,h^1) \xrightarrow{IR[m,i^1]}_{¦} \dots \xrightarrow{IR[m,i^{k-1}]}_{¦} (m,i^k,s^k,s_t^k,h^k).$$
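Operationally, the big-step relation is the iteration of small steps, each taken with the instruction stored at the current address. A minimal sketch follows, assuming that IR is a dictionary keyed by (method, address) and that a small-step function `step(instr, i, s, st, h)` is supplied by the caller. Because the relation itself allows stopping after any number of steps, this function fixes one choice: it stops at the first arrival at a given `target` address. The `fuel` bound is likewise an assumption, added only to keep the sketch terminating.

```python
def big_step(ir, m, i, s, st, h, target, step, fuel=10_000):
    """Iterate small steps from (m, i) until address `target` is reached."""
    while i != target:
        if fuel == 0:
            raise RuntimeError("step bound exceeded")
        fuel -= 1
        # Each step is performed with the instruction IR[m, i].
        i, s, st, h = step(ir[(m, i)], i, s, st, h)
    return i, s, st, h


def toy_step(instr, i, s, st, h):
    """Toy step for the example: every instruction increments a counter."""
    return i + 1, {**s, "n": s["n"] + 1}, st, h


toy_ir = {("m", 0): "inc", ("m", 1): "inc", ("m", 2): "halt"}
two_steps = big_step(toy_ir, "m", 0, {"n": 0}, {}, {}, 2, toy_step)
zero_steps = big_step(toy_ir, "m", 2, {"n": 0}, {}, {}, 2, toy_step)
```

The second call shows the empty sequence: when the start address already equals the target, the state is returned unchanged.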

I use the following notation as a shorthand for the execution of entire method bodies:

$$(s,h) \overset{m}{\Longrightarrow}_{¦} (s',h')$$

if and only if

$$\exists s_t.\ (m, mentry(m), s, [tvars(m) \mapsto defval], h) \Longrightarrow_{¦} (m, mexit(m), s', s_t, h').$$

The small-step rule for assignment blocks relies on a transition

$$(s^0,s_t^0,h^0) \overset{\vec{a}}{\Longrightarrow}_{¦} (s^k,s_t^k,h^k)$$

that executes the assignments in $\vec{a}$ in order; that is, there must exist a sequence of execution steps

$$(s^0,s_t^0,h^0) \xrightarrow{\vec{a}.1}_{¦} (s^1,s_t^1,h^1) \xrightarrow{\vec{a}.2}_{¦} \dots \xrightarrow{\vec{a}.k}_{¦} (s^k,s_t^k,h^k)$$

such that $|\vec{a}| = k$.
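Both shorthands amount to small wrappers around the relations they abbreviate. The sketch below is illustrative only: `assign_step` and `big_step` are hypothetical caller-supplied functions standing for the single-assignment relation and the instruction-level big-step relation, and the concrete value chosen for `DEFVAL` is an assumption, since the default value is not spelled out in this section.

```python
DEFVAL = 0  # assumption: a concrete stand-in for the default value defval


def run_block(assignments, s, st, h, assign_step):
    """Execute the assignments of a block in order, threading the state."""
    for a in assignments:
        s, st, h = assign_step(a, s, st, h)
    return s, st, h


def run_method(m, s, h, tvars, mentry, mexit, big_step):
    """Run the body of m from its entry point to its exit point.

    All temporaries start at DEFVAL; the final temporary store is dropped,
    matching the existential quantification in the method-level relation.
    """
    st0 = {t: DEFVAL for t in tvars[m]}
    _, s2, _, h2 = big_step(m, mentry[m], s, st0, h, mexit[m])
    return s2, h2


def toy_assign(a, s, st, h):
    """Toy assignment for the example: a is a (variable, value) pair."""
    x, v = a
    return {**s, x: v}, st, h


def toy_big_step(m, i, s, st, h, target):
    """Toy body for the example: stores t + 1 in ret and jumps to the exit."""
    return target, {**s, "ret": st["t"] + 1}, st, h


block_result = run_block([("x", 1), ("x", 2)], {}, {}, {}, toy_assign)
method_result = run_method("m", {"ret": 0}, {}, {"m": ("t",)},
                           {"m": 0}, {"m": 3}, toy_big_step)
```

The caller of `run_method` never sees the temporary store, so temporaries cannot carry values across method boundaries in this sketch.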

For each individual assignment, in turn, a transition $(s,s_t,h) \xrightarrow{a}_{¦} (s',s_t',h')$ is used, defined by the rules in Figure 5.2. IR assignments have exactly the same semantics as assignment statements in the DSD language; in fact, I directly use the operational semantics of DSD for variable updates, field updates, and object creation assignments. The only difference is that IR method calls initialize all temporary variables of a method with the default value defval, and use

Following the notation for bytecode execution successors, I write $(m,i) \overset{I}{\longmapsto} (m,i')$ if there are stores, temporary stores, and heaps such that $(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s_t',h')$.

The set of successors of an instruction is defined as $succ(m,i) = \{\, i' \mid (m,i) \overset{I}{\longmapsto} (m,i') \,\}$.
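The successor set is defined semantically, through the existence of suitable states. For illustration, a purely syntactic over-approximation can be read off the shape of the instruction at the address. In this sketch, the tuple encoding of instructions and the function name `succ_syntactic` are assumptions, and the result may be a strict superset of the semantic successor set (for example, when a condition can never be zero, or when a block does not terminate).

```python
def succ_syntactic(instr, i):
    """Addresses that the instruction at address i may lead to."""
    kind = instr[0]
    if kind == "if":
        # Either branch may be taken, depending on the state.
        return {i + 1, instr[2]}
    if kind in ("jmp", "cjmp"):
        return {instr[1]}
    if kind in ("cpush", "block"):
        return {i + 1}
    raise ValueError(f"unknown instruction: {instr!r}")


branch_successors = succ_syntactic(("if", None, 7), 2)
```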
