
This suits the application scenario outlined in the introduction: the verification of the source code by the software developer is independent of the verification of the bytecode by the app store provider. The software developer may switch to a different high-level code analysis or to a different compiler at any time without affecting the ability of the distribution platform maintainer to verify the bytecode program. Also, there are no requirements on the format of the submitted bytecode, as the translation to IR code happens only within the bytecode analysis.

Nevertheless, I give a type preservation result for the specific DSD compiler presented in the previous chapter. It shows that all bytecode programs produced by the compiler can be transformed into an IR program such that, if the original high-level program was typable with the high-level type system, the corresponding IR program is typable with the IR type system. This gives the programmer confidence that typable DSD programs will always be compiled to bytecode programs that are universally noninterferent. Note, however, that neither the type preservation result nor the specific compiler is required to certify bytecode for universal noninterference.

The Grail intermediate language [BMS03] serves a purpose similar to that of the intermediate representation: it is used to transfer results from a high-level resource analysis in order to simplify the analysis of low-level code. In contrast to the presented approach, however, the compiler of the Grail framework generates the Grail code directly from the high-level language and additionally translates the high-level proof into a proof for the corresponding Grail code. To make this approach work in the application scenario presented here, one would have to make considerable changes to existing code distribution environments and to the execution platform.

5.1 Intermediate Representation

The syntax of IR instructions is defined as follows:

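temporary variables: $t \in \mathit{TVar}$

variables: $x \in \mathit{Var} \cup \mathit{TVar}$

fields: $f \in \mathit{Fld}$

classes: $C \in \mathit{Cls}$

methods: $m \in \mathit{Mtd}$

instruction addresses: $i, j \in \mathit{Adr}$

instruction syntax: $\mathit{Instr}_{IR} \ni I ::= \mathtt{if}\ e\ j \mid \mathtt{jmp}\ j \mid \mathtt{cpush}\ j \mid \mathtt{cjmp}\ j \mid \mathtt{block}\ \vec{a}$

assignments: $\mathit{Assn} \ni a ::= x := e \mid e.f := e \mid x := \mathtt{new}\ C(\vec{e}) \mid x := e.m(\vec{e})$

expressions: $\mathit{Exp} \ni e ::= \ldots$ (as defined in the high-level language)

The instruction set consists of: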

• blocks, which are sequences of assignment statements, similar to those from the high-level language;

• conditional and unconditional jumps, similar to those on the bytecode level, but with the conditional expression restored; and

• pseudo-instructions that indicate control dependence regions, with the same meaning as on the bytecode level.

The set of IR instructions is much smaller than the bytecode instruction set; most functionality is provided by the assignment block instruction $\mathtt{block}\ \vec{a}$. An occurrence of $x$ in assignments $a$ and high-level expressions $e$ may refer to either an ordinary variable or a temporary variable.
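For illustration, the grammar above can be mirrored by a small set of Python records. This is a hypothetical encoding, not part of the formal development: all class names are invented for the sketch, and high-level expressions are kept as opaque strings because their syntax is inherited from the DSD language.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Exp = str  # placeholder: high-level expressions are kept opaque in this sketch

# Assignments a ::= x := e | e.f := e | x := new C(e...) | x := e.m(e...)
@dataclass(frozen=True)
class VarAssign:
    x: str
    e: Exp

@dataclass(frozen=True)
class FieldAssign:
    target: Exp
    f: str
    e: Exp

@dataclass(frozen=True)
class NewObj:
    x: str
    cls: str
    args: Tuple[Exp, ...]

@dataclass(frozen=True)
class MethodCall:
    x: str
    receiver: Exp
    m: str
    args: Tuple[Exp, ...]

Assn = Union[VarAssign, FieldAssign, NewObj, MethodCall]

# Instructions I ::= if e j | jmp j | cpush j | cjmp j | block a...
@dataclass(frozen=True)
class If:
    e: Exp
    j: int

@dataclass(frozen=True)
class Jmp:
    j: int

@dataclass(frozen=True)
class CPush:
    j: int

@dataclass(frozen=True)
class CJmp:
    j: int

@dataclass(frozen=True)
class Block:
    assignments: Tuple[Assn, ...]

Instr = Union[If, Jmp, CPush, CJmp, Block]

# A block mixing an ordinary variable y and a (hypothetical) temporary t0.
example = Block((VarAssign("t0", "y + 1"), FieldAssign("this", "f", "t0")))
```

Frozen dataclasses are hashable, so such instructions can be stored directly as values of a map from $(m, i)$ to instructions.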

IR programs are defined very similarly to bytecode programs.

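Definition 5.1 An IR program $P_{IR}$ is a tuple

$$(\prec, \mathit{fields}, \mathit{methods}, \mathit{margs}, \mathit{tvars}, \mathit{IR}, \mathit{mentry}, \mathit{mexit})$$ where

• $\prec \in \mathcal{P}(\mathit{Cls} \times \mathit{Cls})$ is the subclass relation.

• $\mathit{fields} : \mathit{Cls} \to \mathcal{P}(\mathit{Fld})$ and $\mathit{methods} : \mathit{Cls} \to \mathcal{P}(\mathit{Mtd})$ assign to each class the identifiers of the fields and methods it contains.

• $\mathit{margs} : \mathit{Mtd} \to \mathit{Var}^*$ is a function that describes the names of the formal arguments of each method $m$.

• $\mathit{tvars} : \mathit{Mtd} \to \mathcal{P}(\mathit{TVar})$ specifies for each method the temporary variables that may occur in the body of the method.

• $\mathit{IR} : \mathit{Mtd} \times \mathit{Adr} \rightharpoonup \mathit{Instr}_{IR}$ assigns to methods and instruction addresses an IR instruction, thereby specifying the method implementations.

• $\mathit{mentry}, \mathit{mexit} : \mathit{Mtd} \to \mathit{Adr}$ specify the entry point and the exit point of methods.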

The difference from bytecode programs is that IR method bodies contain IR instructions, of course, and that IR methods may also contain temporary variables specified by $\mathit{tvars}$. Apart from that, the meaning and well-formedness requirements are the same as for bytecode programs. In the following, let us assume a given fixed IR program $P_{IR}$ whose components are all well-formed.
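A direct, hypothetical transcription of Definition 5.1 into Python may help to see which parts of the tuple are new compared to bytecode programs. Sets are represented as frozensets, the partial function $\mathit{IR}$ as a dictionary keyed by $(m, i)$, and instructions are left untyped; the two helper methods are illustrative additions, not part of the definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Set, Tuple

@dataclass
class IRProgram:
    """Hypothetical encoding of (≺, fields, methods, margs, tvars, IR, mentry, mexit)."""
    subclass: Set[Tuple[str, str]]       # ≺ ⊆ Cls × Cls
    fields: Dict[str, FrozenSet[str]]    # class -> its field identifiers
    methods: Dict[str, FrozenSet[str]]   # class -> its method identifiers
    margs: Dict[str, Tuple[str, ...]]    # method -> formal argument names
    tvars: Dict[str, FrozenSet[str]]     # method -> temporary variables (new in IR)
    ir: Dict[Tuple[str, int], Any]       # partial map (m, i) -> IR instruction
    mentry: Dict[str, int]               # method -> entry address
    mexit: Dict[str, int]                # method -> exit address

    def addresses(self, m: str) -> List[int]:
        """Addresses at which method m has an instruction."""
        return sorted(i for (mm, i) in self.ir if mm == m)

    def temporaries_disjoint(self, m: str) -> bool:
        """Formal arguments are ordinary variables, so none may be a temporary."""
        return set(self.margs[m]).isdisjoint(self.tvars[m])
```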

5.1.2 Semantics

An IR state $\sigma_{IR}$ is a triple $(s, s_t, h)$ where $h$ is a heap, and $s$ and $s_t$ are two variable stores, one for ordinary and one for temporary variables. The definitions of stores, heaps, and values are exactly as for bytecode and high-level programs.

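values: $v \in \mathit{Val}$ (as before)

state: $\sigma_{IR} \in \mathit{State}_{IR} = \mathit{Store} \times \mathit{TStore} \times \mathit{Heap}$

store: $s \in \mathit{Store} = \mathit{Var} \to \mathit{Val}$

temporary store: $s_t \in \mathit{TStore} = \mathit{TVar} \to \mathit{Val}$

heap: $h \in \mathit{Heap} = \mathit{Loc} \rightharpoonup (\mathit{Cls} \times (\mathit{Fld} \rightharpoonup \mathit{Val}))$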

Note that $s \cup s_t$ always forms a valid store, because $\mathit{TVar}$ is disjoint from $\mathit{Var}$. For the same reason, it is always possible to split a combined store $s \cup s_t$ back into $s$ and $s_t$. All expressions $e$ that occur in statements are evaluated using an evaluation function that is defined in terms of the high-level expression evaluation:

$$[\![e]\!]^{¦}_{s,s_t,h} = [\![e]\!]^{¦}_{s \cup s_t,h}$$
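The evaluation on combined stores can be sketched as follows. The mini expression language with constants, variables, and addition is an assumption standing in for the DSD expression evaluation, and the heap argument is only passed through; the point is that evaluation simply sees the union of both stores, and that the union can be split again because the two variable sets are disjoint.

```python
from typing import Any, Dict, Tuple

Store = Dict[str, Any]

def eval_hl(e: Tuple, store: Store, heap: Dict) -> Any:
    """Stand-in for the high-level evaluation [[e]]_{s,h}.
    Hypothetical mini-syntax: ('const', n), ('var', x), ('add', e1, e2)."""
    tag = e[0]
    if tag == "const":
        return e[1]
    if tag == "var":
        return store[e[1]]
    if tag == "add":
        return eval_hl(e[1], store, heap) + eval_hl(e[2], store, heap)
    raise ValueError(f"unsupported expression {e!r}")

def eval_ir(e: Tuple, s: Store, st: Store, heap: Dict) -> Any:
    """[[e]]_{s,s_t,h} = [[e]]_{s ∪ s_t,h}: evaluate on the union of both stores."""
    if not s.keys().isdisjoint(st.keys()):
        raise ValueError("Var and TVar must be disjoint")
    return eval_hl(e, {**s, **st}, heap)

def split(combined: Store, tvars: set) -> Tuple[Store, Store]:
    """Split a combined store s ∪ s_t back into (s, s_t) using the temporaries."""
    s = {x: v for x, v in combined.items() if x not in tvars}
    st = {x: v for x, v in combined.items() if x in tvars}
    return s, st
```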

In the following, I define five mutually recursive transition relations, listed in the following table:

small-step transition for IR instructions: $(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s'_t,h')$

big-step transition for IR instructions: $(m,i,s,s_t,h) \Longrightarrow_{IR,¦} (m,i',s',s'_t,h')$

big-step transition for IR methods: $(s,h) \overset{m}{\Longrightarrow}_{IR,¦} (s',h')$

small-step transition for IR assignments: $(s,s_t,h) \xrightarrow{a}_{¦} (s',s'_t,h')$

big-step transition for IR assignments: $(s,s_t,h) \overset{\vec{a}}{\Longrightarrow}_{¦} (s',s'_t,h')$


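$$\frac{I = \mathtt{if}\ e\ j \qquad [\![e]\!]^{¦}_{s,s_t,h} = 0}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s,s_t,h)}$$

$$\frac{I = \mathtt{if}\ e\ j \qquad [\![e]\!]^{¦}_{s,s_t,h} \neq 0}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,j,s,s_t,h)}$$

$$\frac{I \in \{\mathtt{jmp}\ j, \mathtt{cjmp}\ j\}}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,j,s,s_t,h)}$$

$$\frac{I = \mathtt{cpush}\ j}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s,s_t,h)}$$

$$\frac{I = \mathtt{block}\ \vec{a} \qquad (s,s_t,h) \overset{\vec{a}}{\Longrightarrow}_{¦} (s',s'_t,h')}{(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i+1,s',s'_t,h')}$$

Figure 5.1: Mostly small-step semantics for IR instructions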

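$$\frac{S = a \in \{x := e,\ e.f := e,\ x := \mathtt{new}\ C(\vec{e})\} \qquad (s \cup s_t, h) \xrightarrow{S}_{¦} (s' \cup s'_t, h')}{(s,s_t,h) \xrightarrow{a}_{¦} (s',s'_t,h')}$$

$$\frac{\begin{array}{c} a = x := e.m(\vec{e}) \qquad [\![e]\!]^{¦}_{s,s_t,h} = r \qquad [\![\vec{e}]\!]^{¦}_{s,s_t,h} = \vec{v} \\ s_m = [\mathit{this} \mapsto r] \cup [\mathit{margs}(m) \mapsto \vec{v}] \cup [\mathit{ret} \mapsto \mathit{defval}] \qquad (s_m,h) \overset{m}{\Longrightarrow}_{IR,¦} (s'_m,h') \qquad s' \cup s'_t = (s \cup s_t)[x \mapsto s'_m(\mathit{ret})] \end{array}}{(s,s_t,h) \xrightarrow{a}_{¦} (s',s'_t,h')}$$

Figure 5.2: Semantics for IR assignments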

The rules in Figure 5.1 define a (mostly) small-step operational semantics for single IR instructions

$$(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s'_t,h')$$

which means that, given a domain lattice ¦ and starting from state $(s,s_t,h)$ at instruction address $(m,i)$, the instruction $I$ leads to instruction address $(m,i')$ and new state $(s',s'_t,h')$. A special relation for assignment blocks is defined below.
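The rules of Figure 5.1 can be read as a step function. The Python sketch below is a hypothetical rendering under simplifying assumptions: instructions are tuples, expressions are Python functions of the combined store $s \cup s_t$, blocks only contain assignments of the form $x := e$, and the domain-lattice parameter ¦ is dropped entirely.

```python
from typing import Any, Callable, Dict, Set, Tuple

Env = Dict[str, Any]
Expr = Callable[[Env], Any]                # assumption: expression = function of s ∪ s_t
Config = Tuple[str, int, Env, Env, Dict]   # (m, i, s, s_t, h)

def run_block(assignments, s: Env, st: Env, h: Dict, tvars: Set[str]):
    """Simplified block execution: only x := e, performed left to right."""
    s, st = dict(s), dict(st)
    for x, e in assignments:
        value = e({**s, **st})
        (st if x in tvars else s)[x] = value   # write to the store that owns x
    return s, st, h

def step(instr: Tuple, cfg: Config, tvars: Set[str]) -> Config:
    """One transition (m,i,s,s_t,h) -I-> (m,i',s',s_t',h') following Figure 5.1."""
    m, i, s, st, h = cfg
    kind = instr[0]
    if kind == "if":                       # if e j: fall through on 0, jump otherwise
        _, e, j = instr
        return (m, i + 1, s, st, h) if e({**s, **st}) == 0 else (m, j, s, st, h)
    if kind in ("jmp", "cjmp"):            # unconditional jumps to j
        return (m, instr[1], s, st, h)
    if kind == "cpush":                    # region marker: state unchanged
        return (m, i + 1, s, st, h)
    if kind == "block":                    # run all assignments, then continue at i+1
        s2, st2, h2 = run_block(instr[1], s, st, h, tvars)
        return (m, i + 1, s2, st2, h2)
    raise ValueError(f"unknown instruction {instr!r}")
```

Only the `block` case changes the state; all other instructions merely select the next address, which matches the rules in the figure.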

While the relation is defined for arbitrary instructions $I$, it is linked to the instruction $\mathit{IR}[m,i]$ via a big-step transition, which is defined similarly to the bytecode big-step semantics:

$$(m,i_0,s_0,s_t^0,h_0) \Longrightarrow_{IR,¦} (m,i_k,s_k,s_t^k,h_k)$$

if and only if there is a (possibly empty) sequence of small execution steps such that

$$(m,i_0,s_0,s_t^0,h_0) \xrightarrow{\mathit{IR}[m,i_0]}_{¦} (m,i_1,s_1,s_t^1,h_1) \xrightarrow{\mathit{IR}[m,i_1]}_{¦} \cdots \xrightarrow{\mathit{IR}[m,i_{k-1}]}_{¦} (m,i_k,s_k,s_t^k,h_k).$$
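Because each step is driven by $\mathit{IR}[m,i]$, the big-step relation can be explored by repeatedly applying a step function. The following sketch assumes that execution is deterministic and that a walk ends at an address with no instruction; it collects every configuration reachable from a start configuration. The `fuel` bound and the `toy_step` function (which only knows `cpush` and `jmp`) are hypothetical additions.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Tuple[str, int, Dict, Dict, Dict]  # (m, i, s, s_t, h)
Step = Callable[[Any, Config], Config]

def big_step_trace(ir: Dict[Tuple[str, int], Any], start: Config,
                   step: Step, fuel: int = 1000) -> List[Config]:
    """All configurations reachable from `start` by executing IR[m,i] repeatedly.

    start =IR=> c holds iff c is in the returned list; the first entry
    corresponds to the empty sequence of steps. The walk stops at an address
    without an instruction or when `fuel` runs out (keeps loops finite)."""
    trace = [start]
    cfg = start
    for _ in range(fuel):
        m, i = cfg[0], cfg[1]
        if (m, i) not in ir:
            break
        cfg = step(ir[(m, i)], cfg)
        trace.append(cfg)
    return trace

def toy_step(instr, cfg: Config) -> Config:
    """Hypothetical step function covering only cpush and jmp."""
    m, i, s, st, h = cfg
    if instr[0] == "cpush":
        return (m, i + 1, s, st, h)
    if instr[0] == "jmp":
        return (m, instr[1], s, st, h)
    raise ValueError(instr)
```

For instance, with $\mathit{IR}[m,0] = \mathtt{cpush}\ 2$ and $\mathit{IR}[m,1] = \mathtt{jmp}\ 3$, the trace starting at address 0 visits addresses 0, 1, and 3.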

I use the following notation as a shorthand for the execution of entire method bodies:

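$$(s,h) \overset{m}{\Longrightarrow}_{IR,¦} (s',h') \quad\text{if and only if}\quad \exists s_t.\ (m,\mathit{mentry}(m),s,[\mathit{tvars}(m) \mapsto \mathit{defval}],h) \Longrightarrow_{IR,¦} (m,\mathit{mexit}(m),s',s_t,h')$$

The small-step rule for assignment blocks relies on a transition

$$(s_0,s_t^0,h_0) \overset{\vec{a}}{\Longrightarrow}_{¦} (s_k,s_t^k,h_k)$$

that executes the assignments in $\vec{a}$ in order; that is, there must exist a sequence of execution steps

$$(s_0,s_t^0,h_0) \xrightarrow{\vec{a}.1}_{¦} (s_1,s_t^1,h_1) \xrightarrow{\vec{a}.2}_{¦} \cdots \xrightarrow{\vec{a}.k}_{¦} (s_k,s_t^k,h_k)$$

such that $|\vec{a}| = k$.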
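The block transition can be read as a left-to-right fold over the assignment sequence. The sketch below records all intermediate states, so the list it returns has exactly $|\vec{a}| + 1$ entries. It supports only assignments of the form $x := e$, written as pairs of a target name and a Python function of $s \cup s_t$, which is a simplifying assumption; the lattice parameter ¦ is omitted.

```python
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

Env = Dict[str, Any]
State = Tuple[Env, Env, Dict]                   # (s, s_t, h)
Assignment = Tuple[str, Callable[[Env], Any]]   # hypothetical: only x := e

def exec_assignment(a: Assignment, state: State, tvars: Set[str]) -> State:
    """One transition (s,s_t,h) -a-> (s',s_t',h') for an assignment x := e."""
    s, st, h = state
    x, e = a
    value = e({**s, **st})                      # evaluate on s ∪ s_t
    if x in tvars:
        return s, {**st, x: value}, h           # update the temporary store
    return {**s, x: value}, st, h               # update the ordinary store

def exec_block(block: Sequence[Assignment], state: State,
               tvars: Set[str]) -> List[State]:
    """States (s_0,...), ..., (s_k,...) with k = |a|; a.1 is executed first."""
    states = [state]
    for a in block:
        states.append(exec_assignment(a, states[-1], tvars))
    return states
```

Each assignment returns fresh dictionaries, so every intermediate state in the sequence remains available unchanged.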

For each individual assignment, in turn, a transition $(s,s_t,h) \xrightarrow{a}_{¦} (s',s'_t,h')$ is used, defined by the rules in Figure 5.2. IR assignments have exactly the same semantics as assignment statements in the DSD language; in fact, I directly use the operational semantics of DSD for variable updates, field updates, and object creation assignments. The only difference is that IR method calls initialize all temporary variables of a method with the default value $\mathit{defval}$ and use the IR method semantics $\overset{m}{\Longrightarrow}_{IR,¦}$ to execute the body of the called method.
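The method-call rule of Figure 5.2 can be sketched as a function that builds the callee store, runs the body, and writes the result back into the caller's combined store. Here the receiver and argument expressions are assumed to be already evaluated, the parameter `body` is a hypothetical stand-in for the method transition (which is also where the temporaries would be set to $\mathit{defval}$), and `DEFVAL = 0` is an assumed concrete default value.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Env = Dict[str, Any]
DEFVAL = 0  # assumption: concrete stand-in for defval

def call_assignment(x: str, receiver: Any, arg_values: Sequence[Any],
                    formals: Sequence[str],
                    body: Callable[[Env, Dict], Tuple[Env, Dict]],
                    s: Env, st: Env, h: Dict, tvars: set):
    """x := e.m(e...) following Figure 5.2, with e already evaluated to `receiver`
    and the arguments to `arg_values`; `body` plays the role of (s_m,h) =m=> (s_m',h')."""
    if len(formals) != len(arg_values):
        raise ValueError("arity mismatch")
    # s_m = [this -> r] ∪ [margs(m) -> v] ∪ [ret -> defval]
    s_m = {"this": receiver, **dict(zip(formals, arg_values)), "ret": DEFVAL}
    s_m_final, h_final = body(s_m, h)
    # s' ∪ s_t' = (s ∪ s_t)[x -> s_m'(ret)], then split again by membership in TVar
    combined = {**s, **st, x: s_m_final["ret"]}
    s_new = {y: v for y, v in combined.items() if y not in tvars}
    st_new = {y: v for y, v in combined.items() if y in tvars}
    return s_new, st_new, h_final
```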

Following the notation for bytecode execution successors, I write $(m,i) \overset{I}{\longmapsto} (m,i')$ if there are stores, temporary stores, and heaps such that $(m,i,s,s_t,h) \xrightarrow{I}_{¦} (m,i',s',s'_t,h')$.

The set of successors of an instruction is defined as $\mathit{succ}(m,i) = \{i' \mid (m,i) \overset{I}{\longmapsto} (m,i')\}$, where $I = \mathit{IR}[m,i]$.
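Since every rule in Figure 5.1 determines the target address from the instruction alone, a syntactic approximation of $\mathit{succ}$ can be read off directly. The sketch below is such an over-approximation, not a literal implementation of the definition: it counts both targets of an `if` even when the condition can only take one value (for example, a constant), and it assumes that blocks always terminate. The tuple encoding of instructions is hypothetical.

```python
from typing import Dict, Set, Tuple

def succ(ir: Dict[Tuple[str, int], Tuple], m: str, i: int) -> Set[int]:
    """Syntactic over-approximation of the successor addresses of IR[m,i]."""
    instr = ir[(m, i)]
    kind = instr[0]
    if kind == "if":                    # may fall through (value 0) or jump to j
        return {i + 1, instr[2]}
    if kind in ("jmp", "cjmp"):         # always continue at j
        return {instr[1]}
    if kind in ("cpush", "block"):      # continue at the next address
        return {i + 1}
    raise ValueError(f"unknown instruction {instr!r}")
```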