
(1)

Virtual Memory

Simulation Theorems

Mark A. Hillebrand, Wolfgang J. Paul

{mah,wjp}@cs.uni-sb.de

Saarland University, Saarbruecken, Germany

This work was partially

(2)

Overview

Theorem: the parallel user programs of a main frame see a sequentially consistent virtual shared memory (Correctness of main frame hardware & part of OS)

(3)

Context

A (practical) approach for the complete formal verification of real computer systems:

1. Specify (precisely)

2. Construct (completely)

3. Mathematical correctness proof

4. Check correctness proof by computer

5. Automate approach (partially; recall Gödel)

(4)

Example: Processor design

1.–3. [MP00] Computer Architecture: Complexity and Correctness, Springer

4. [BJKLP03] Functional verification of the VAMP processor, CHARME ’03

5. PhD thesis project of S. Tverdyshev (Khabarovsk State Technical University)

(5)

Why Memory Management?

Layers of computer systems (all using local computation and communication):

User Program

Operating System

Hardware

! In memory management, hardware and software are coupled extremely tightly.

(6)

DLX Configuration

A processor configuration of the DLX is a pair c = (R, M):

R : {register names} → {0, 1}^32

where register names: PC, GPR(r), status, . . .

M : {memory addresses} → {0, 1}^8, where memory addresses ∈ {0, 1}^32

The standard definition is an abstraction:

real hardware usually does not have 2^32 bytes of main memory

(7)

DLX V Configuration

A virtual processor configuration of DLX V is a triple c = (R, M, r):

R : {register names} → {0, 1}^32

where register names: PC, GPR(r), status, . . .

M : {virtual memory addresses} → {0, 1}^8, where virtual memory addresses ∈ {0, 1}^32

r : {virtual memory addresses} → 2^{R, W}, where the rights R (read) and W (write) are identical within each page (4K).

! DLX V is a basis for user programs.

(8)

DLX S Configuration

A real specification machine configuration of DLX S is a triple c_S = (R_S, PM, SM):

R_S \ R:

mode: system mode (0) or user mode (1)

pto: page table origin

ptl: page table length (only for exceptions)

PM: physical memory

SM: swap memory

! DLX S is hardware specification.
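As a reading aid (not part of the slides), the three configuration levels can be written down as plain data structures. The minimal Python sketch below uses dictionaries for the memories and string-keyed register files; these representation choices are assumptions made purely for readability.

from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class DLXConfig:          # c = (R, M)
    R: Dict[str, int] = field(default_factory=dict)    # register names -> 32-bit values
    M: Dict[int, int] = field(default_factory=dict)    # 32-bit byte addresses -> byte values

@dataclass
class DLXVConfig:         # c = (R, M, r): the user's (virtual) machine
    R: Dict[str, int] = field(default_factory=dict)
    M: Dict[int, int] = field(default_factory=dict)    # virtual memory
    r: Dict[int, Set[str]] = field(default_factory=dict)   # page index -> subset of {"R", "W"}

@dataclass
class DLXSConfig:         # c_S = (R_S, PM, SM): the hardware specification machine
    R_S: Dict[str, int] = field(default_factory=dict)  # includes mode, pto, ptl
    PM: Dict[int, int] = field(default_factory=dict)   # physical memory
    SM: Dict[int, int] = field(default_factory=dict)   # swap memory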

(9)

Page-Table Lookup

[Figure: page-table lookup — the page-table base (pto, 0^12) plus 4 · px selects the 32-bit page-table entry, which yields the 20-bit ppx]

Let c = (R_S, PM, SM).

Virtual address va = (px, bx)

px: page index, bx: byte index

PT_c(px) = PM_4(⟨pto⟩ + 4 · ⟨px⟩)

Page-table entry layout: bits [31:12] = ppx[19:0], bits [11:3] unused, bit 2 = r, bit 1 = w, bit 0 = v

ppx_c(va): physical page index

v_c(va): valid bit (↔ page in PM)

(10)

Address Translation

[Figure: address translation — the page-table lookup of the previous slide, extended by composing ppx (20 bits) with bx (12 bits) into pma_c(va)]

Let c = (R_S, PM, SM).

Virtual address va = (px, bx)

px: page index, bx: byte index

pma_c(va) = (ppx_c(va), bx)

pma_c: physical memory address

To access swap memory, we also define:

sma_c: swap memory address (e.g. sbase_c + va)
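A minimal executable sketch of the page-table lookup and the address translation from the last two slides, building on the DLXSConfig sketch above. Assumptions not in the slides: 4K pages, little-endian packing of the 4-byte page-table entry, and pto stored directly as the byte address of the page table.

PAGE_BITS = 12                      # 4K pages

def split_va(va: int):
    """Split a 32-bit virtual address va into (px, bx)."""
    return va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)

def read_word(mem, addr):
    """PM_4(addr): four consecutive bytes read as one 32-bit word (little-endian assumed)."""
    return sum(mem.get(addr + i, 0) << (8 * i) for i in range(4))

def pte(c, va):
    """PT_c(px) = PM_4(<pto> + 4 * <px>); pto is taken as a byte address here."""
    px, _ = split_va(va)
    return read_word(c.PM, c.R_S["pto"] + 4 * px)

def v_bit(c, va): return (pte(c, va) >> 0) & 1      # valid: page is in PM
def w_bit(c, va): return (pte(c, va) >> 1) & 1      # write right
def r_bit(c, va): return (pte(c, va) >> 2) & 1      # read right

def pma(c, va):
    """pma_c(va) = (ppx_c(va), bx), returned as one 32-bit address."""
    _, bx = split_va(va)
    ppx = pte(c, va) >> PAGE_BITS
    return (ppx << PAGE_BITS) | bx

def sma(c, va, sbase):
    """sma_c(va): swap memory address, e.g. sbase_c + va."""
    return sbase + va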

(11)

Instruction Execution

DLX V uses virtual addresses:

Fetch: va = DPC (delayed PC)

Effective address of load/store:

va = ea = GPR(RS1) + imm (register relative)

Hardware DLX S for mode = 1 (user):

If v_c(va), use the translated address pma_c(va) instead of va.

Otherwise, exception.

(hardware supplies parameters for page fault handler)
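A sketch of the resulting access rule of DLX S, reusing the translation helpers above; the PageFault exception type is a name invented for this sketch, not the hardware interface.

class PageFault(Exception):
    """Stands in for the hardware page-fault exception; the hardware passes va to the handler."""
    pass

def read_byte(c, va):
    """One byte access as seen by DLX S."""
    if c.R_S["mode"] == 0:              # system mode: untranslated access
        return c.PM.get(va, 0)
    if not v_bit(c, va):                # user mode, page not in PM: exception
        raise PageFault(va)
    return c.PM.get(pma(c, va), 0)      # user mode: translated access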

(12)

Hardware Implementation

[Figure: IMMU (fetch) and DMMU (load/store) between CPU and ICache/DCache, backed by PM]

Build 2 hardware boxes MMU (memory management unit for fetch and load/store) between CPU and caches

Show it translates

Done

(13)

Hardware Implementation

[Figure: IMMU (fetch) and DMMU (load/store) between CPU and ICache/DCache, backed by PM]

Build 2 hardware boxes MMU (memory management unit for fetch and load/store) between CPU and caches

Show it translates

Done? No!

(14)

Hardware Implementation

IMMU DMMU fetch

load, store

DCache ICache

PM CPU

Build hardware boxes MMU & a few gates

Identify software conditions

Show MMU translates if software conditions are met

Show software meets conditions

Almost done

We do not care about translation (purely technical), we care about a simulation theorem.

(15)

Simulating DLX V by DLX S

Let c = (R_S, PM, SM) and c_V = (R_V, VM, r). Define a projection: c_V = Π(c)

Identical register contents: R_V(r) = R_S(r)

Rights in page table:

R ∈ r(va) ⇔ r_c(va) = 1

W ∈ r(va) ⇔ w_c(va) = 1

(16)

Simulating DLX V by DLX S

VM(va) = PM(pma_c(va)) if v_c(va), SM(sma_c(va)) otherwise

[Figure: case v = 1 — page(px) resides in PM, PT(px) supplies ppx]

PM is a cache for virtual memory; PT is the cache directory.

Handlers (almost!) work accordingly (select victim, write back to SM, swap in from SM)
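A sketch of the simulated virtual memory VM and of the three handler steps named above (select victim, write back to SM, swap in from SM), again reusing the earlier helpers. The victim-selection policy choose_victim and the page-table update helper set_pte are assumptions of this sketch, not the slides' handler.

PAGE_SIZE = 1 << PAGE_BITS

def VM(c, va, sbase):
    """VM(va) = PM(pma_c(va)) if v_c(va), else SM(sma_c(va))."""
    if v_bit(c, va):
        return c.PM.get(pma(c, va), 0)
    return c.SM.get(sma(c, va, sbase), 0)

def handle_page_fault(c, va, sbase, choose_victim, set_pte):
    """Select a victim page, write it back to SM, then swap the faulting page in."""
    victim_px, ppx = choose_victim(c)                      # assumed replacement policy
    for off in range(PAGE_SIZE):                           # write victim back to SM
        c.SM[sbase + (victim_px << PAGE_BITS) + off] = c.PM.get((ppx << PAGE_BITS) + off, 0)
    set_pte(c, victim_px, ppx, False)                      # invalidate victim's PTE
    px, _ = split_va(va)
    for off in range(PAGE_SIZE):                           # swap faulting page in from SM
        c.PM[(ppx << PAGE_BITS) + off] = c.SM.get(sbase + (px << PAGE_BITS) + off, 0)
    set_pte(c, px, ppx, True)                              # map faulting page to the freed frame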

(17)

Simulating DLX V by DLX S

VM(va) = PM(pma_c(va)) if v_c(va), SM(sma_c(va)) otherwise

[Figure: case v = 0 — page(px) resides in SM, not in PM]

PM is a cache for virtual memory; PT is the cache directory.

Handlers (almost!) work accordingly (select victim, write back to SM, swap in from SM)

(18)

Software Conditions

1. OS code and data structures (PT, sbase, free space) maintained in system area Sys ⊆ PM

2. OS does not destroy its code & data

[Figure: PM divided into a system area Sys (page table, sbase, free space) and a User area]

3. User program (UP) cannot modify Sys (impossibility of hacking)

4. Writes to code section are separated from reads in code section by sync or (syncing) rfe

A standard sync empties the pipelined or out-of-order (OoO) machine before the next instruction is issued.

! Swapping in code followed by a user-mode fetch

= self-modification of code by OS & UP

(19)

Guaranteeing Software Conditions

1. Operating system construction

2. Operating system construction

3. No pages of Sys allocated via PT to UP

4. UP alone not self modifying, handlers end with rfe
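Condition 3 can be checked mechanically. The sketch below reuses the translation helpers and assumes that Sys is given as a set of physical page indices and that the user pages 0 … num_user_pages − 1 are the ones described by the page table; both are assumptions of this sketch.

def no_sys_pages_mapped(c, num_user_pages, sys_ppx):
    """True iff no valid user page-table entry maps into the system area Sys."""
    for px in range(num_user_pages):
        va = px << PAGE_BITS
        if v_bit(c, va) and (pma(c, va) >> PAGE_BITS) in sys_ppx:
            return False
    return True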

(20)

Hardware I

CPU – memory system – protocol

[Figure: CPU connected to ICache (fetch, DPC) and DCache (load/store), backed by PM; bus signals addr, dout, mr, mbusy, clk; timing shown for cache hit and cache miss]

(21)

Hardware II

Inserting 2 MMUs:

[Figure: IMMU and DMMU inserted between CPU and ICache/DCache, backed by PM]

Must obey memory protocol at both sides!

(22)

Primitive MMU

Primitive MMU controlled by finite state diagram (FSD)

[Figure: MMU datapath — address register ar[31:0], data register dr[31:0], page-table entry pte[31:0] with fields (r, w, v) and ppx[31:12], adder, comparison with ptl[19:0] (length exception lexcp), pto[19:0] padded with 0^12, and multiplexers between the processor-side signals (p.addr, p.din, p.req, p.t) and the cache-side signals (c.addr, c.din)]

[Figure: MMU FSD — states idle, add, seta, readpte, comppa, read, write; control signals arce, drce, c.mr, c.mw, p.mr, p.mw, c.busy, p.busy; exception signals lexcp, pteexcp]

(23)

MMU Correctness

Local translation lemma:

Let T and V denote the start and end of a translated read request, no excp. Let t ∈ {T, . . . , V }.

Hypothesis: the following 4 groups of inputs do not change in cycles t (i.e. X^t = X^T):

G0: va = p.addr^t, p.rd^t, p.wr^t, p.req^t

G1: pto^t, ptl^t, mode^t

G2: PT^t

G3: PM^t(pma^t(va))

Claim: p.din^V = PM^T(pma^T(va))

Proof: plain hardware correctness

(24)

Guaranteeing Hypotheses G_i

G0: MMU keeps p.busy active during the translation.

G1: Extra gates; a normal sync before issue is not enough. If an rfe or an update of {mode, pto, ptl} is in the issue stage, stop the translation of the fetch of the next instruction.

G2: User program cannot modify Sys; preceding system code has terminated (by sync).

G3: Fetch: correct by sync. Load: assumes a non-pipelined, in-order memory unit; extra arguments otherwise.

(25)

Global Hardware Correctness (fetch)

Define scheduling functions I(k, T) = i: instruction I_i is in stage k during cycle T iff (. . . )

Similar to tag computation

Key concept for hardware verification in SB

Hypothesis: I(fetch, T) = i, translation from T to V

Claim: IR.din^V = PM_S^i(pma_S^i(DPC_S^i))

Formal proof: part of PhD thesis project of I. Dalinger (Khabarovsk State Technical University)

(26)

Virtual Memory Theorem (SW only!)

Consider a computation of DLX S:

[Table: the computation is divided into phases — Initialisation (mode 0, configurations c^0 … c^α), followed by alternating User Program phases (mode 1) and Handler phases (mode 0)]

Initialisation: Π(c_S^α) = c_V^0

Simulation Step Theorem:

! 2 page faults per instruction possible (fetch & load/store)

(27)

Virtual Memory Theorem II

Assume Π(c_S^i) = c_V^j. Define:

Same cycle, or first cycle after the handler:

s1(i) = i, if ¬pff^i ∧ ¬pfls^i
s1(i) = min{j > i : mode^j}, otherwise

Cycle after the page-fault-free user mode step:

s2(i) = i + 1, if ¬pff^i ∧ ¬pfls^i
s2(i) = s1(s1(i)) + 1, otherwise

Claim: Π(c_S^{s2(i)}) = c_V^{j+1}
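The step functions s1 and s2 can be spelled out directly over a recorded DLX S computation. In this sketch pff, pfls and mode are lists indexed by the step number, which is an assumption made purely for illustration.

def s1(i, pff, pfls, mode):
    """Same cycle, or the first user-mode cycle after the handler."""
    if not pff[i] and not pfls[i]:
        return i
    return min(j for j in range(i + 1, len(mode)) if mode[j] == 1)

def s2(i, pff, pfls, mode):
    """Cycle after the page-fault-free user-mode step that simulates c_V^j."""
    if not pff[i] and not pfls[i]:
        return i + 1
    return s1(s1(i, pff, pfls, mode), pff, pfls, mode) + 1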

(28)

Liveness

We must maintain in Sys:

MRSI (most recently swapped-in page)

Page fault handler must not evict page MRSI !

Formal proof: PhD thesis project of T. In der Rieden (Saarbrücken)

(29)

Translation Look-aside Buffers I

1-level lookup: formally, caches for the PT region of PM

Consistency of 4 caches:

ICache, DCache, ITLB, DTLB

Simply invalidate TLBs at mode switch to 1, sufficient by sync conditions

(30)

Translation Look-aside Buffers II

Multi-level lookup: TLB is simplified cache hardware:

Normal cache entry at cache address c_ad: v = 1, tag, data = PM(tag, c_ad)

TLB entry at cache address c_ad: v = 1, tag, data = pma^t(tag, c_ad)

t: time of the last sync / rfe

Invalidate at mode switch to 1

No writeback or load of lines

Only ‘cache’ reads and writes of values pma(va) by MMU

Formal verification trivial from verified cache
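A toy sketch of this kind of TLB: a plain mapping from page index to translated page index, filled from the page table on a miss and flushed at every switch to user mode. The dict-based structure is an assumption standing in for the simplified cache hardware described above.

class TLB:
    def __init__(self):
        self.entries = {}                     # px -> ppx, i.e. cached translations

    def invalidate(self):
        """Flush all entries, e.g. at every mode switch to 1 (user mode)."""
        self.entries.clear()

    def translate(self, c, va):
        """Return pma(va), filling the entry from the page table on a miss."""
        px, bx = split_va(va)
        if px not in self.entries:
            self.entries[px] = pma(c, va) >> PAGE_BITS
        return (self.entries[px] << PAGE_BITS) | bx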

(31)

Multiuser with Sharing

Easy implementation and proof of protection properties using right bits r(va) and w(va)

(32)

Main Frames I

[Figure: several processors Proc connected to one Shared Memory]

PM: sequentially consistent shared memory (by cache coherence protocol)

New software condition: before any change of a page table entry, all processors sync

Sync hardware: some AND trees and driver trees interfaced with the CPUs

Considered alone, this is almost completely meaningless.

(33)

Main Frames II

Theorem: user programs see sequentially consistent virtual shared memory

Proof: phases alternate OS – UPs – OS – UPs – . . .

Global serial schedule:

in each phase from sequential consistency of the physical shared memory

straightforward composition across phases

remaining arguments not changed!

(34)

Summary

Mathematical treatment of memory management

Intricate combination of hardware and software considered

Formalization under way

(35)

Future Work

Formal verification of

compilers,

operating systems,

applications,

communication systems in industrial context. . .

(36)

Future Work

Formal verification of

compilers,

operating systems,

applications,

communication systems in industrial context. . .

. . . with a little help from my friends.
