Outlook
Background
Swapping
Contiguous Memory Allocation
Paging
Structure of the Page Table
Segmentation
Example: The Intel Pentium
Background
So far we have considered how to share the CPU among processes.
Processes require an additional vital resource: RAM. Active processes must be kept in memory and must share the available physical memory.
Physical memory
Large array of words/bytes
Each is identified by one address
Contiguous sequence of addresses (0, …, 2^n − 1)
How is this memory used?
Fetch, decode, execute, write back
Decode: possibly load more memory contents into registers
Write back: possibly store register into memory
(Some architectures: direct operations on memory)
Here, however, we are concerned only with the memory accesses themselves, not their meaning.
Basic Hardware Support
Support for (quick) memory access is definitely required
Memory and registers are the only storage that the CPU can access directly (with appropriate machine instructions)
Operations on registers are fast (typically one clock cycle)
Operations on RAM are very slow (many CPU cycles); access requires a CPU stall since the required data is not available.
Because memory access is so frequent, CPUs have caches (faster but more expensive memory) between registers and RAM.
Another question: how to protect several concurrently executing processes from
Overwriting each other's data and code?
Overwriting the kernel space?
We need some hardware support!
Basic Hardware Support
Each process gets a separate space in memory
Determine the range of legal access
Two register solution
Base and limit can only be overwritten in supervisor mode
The base/limit check is deactivated in supervisor mode
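The two-register check can be sketched as follows. This is a minimal illustration of the hardware's legality test on every user-mode access; the base/limit values are arbitrary example numbers, not from the slides.

```python
def check_access(address: int, base: int, limit: int) -> bool:
    """Legality check performed by the hardware on every user-mode
    memory access: the address must lie in [base, base + limit)."""
    return base <= address < base + limit

# Example: a process loaded at base 300040 with limit 120900
assert check_access(300040, 300040, 120900)      # first legal address
assert check_access(420939, 300040, 120900)      # last legal address
assert not check_access(420940, 300040, 120900)  # one past the end -> trap
```

Any access failing this check causes a trap into the operating system, which typically terminates the offending process.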
Motivation: Address Binding
On execution, each instruction needs to be referable by a unique address – this requires address binding.
Where does such binding take place? Consider the steps a user program typically goes through.
Possible steps for address binding
Compile time – absolute code (example: the old MS-DOS .COM format)
Load time – relocatable code
Execution time – movable during execution (special hardware must be available, most general purpose operating systems use this scheme)
Virtual and Physical Addresses
Physical address – the actual address value which is loaded into the memory address register (MAR)
Virtual address – the value used by the processor
We also speak of physical address space and logical address space.
Compile and load time binding
physical address space = virtual address space
Execution time
Requires mapping from virtual address space into physical address space
Requires hardware support: Memory Management Unit (MMU)
A simple MMU Scheme
A combination of relocation and limit register
Virtual address space: 0 to max
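The relocation/limit scheme can be sketched as a small translation function. The register values below are illustrative examples, not from the slides.

```python
def mmu_translate(virtual: int, relocation: int, limit: int) -> int:
    """Sketch of the simple MMU scheme: the virtual address is first
    compared against the limit register; if legal, the relocation
    register is added to form the physical address."""
    if not (0 <= virtual < limit):
        raise MemoryError("trap: addressing error")  # OS traps the process
    return virtual + relocation

# Example: process relocated to 14000 with a 3000-byte address space
assert mmu_translate(0, 14000, 3000) == 14000
assert mmu_translate(346, 14000, 3000) == 14346
```

Note that the process itself only ever sees virtual addresses 0 to max; the relocation is invisible to it.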
Remark: Dynamic Loading, Linking and Shared Libraries
In general: dynamic loading
a program does not reside completely in memory
called routines are loaded dynamically when they are called for the first time
advantage: routines that are never called do not occupy any memory
Library specific: static linking: the loader combines system libraries into the binary program image
waste of memory and disk space
Library specific: dynamic linking: similar to dynamic loading, but the linking step (instead of loading) is postponed until the first call, i.e. until execution time (see figure on slide 7 again)
Advantage: keep only one copy of the library code in memory instead of linking the library routines into each program that uses it (shared library)
Stubs for library reference
Locate the memory resident library or load it into memory (if not present)
Replace themselves with the address of the desired routine; the next time, that particular code segment calls the library routine directly and incurs no further cost
Swapping
Temporarily move process and its address space to backing store
And bring it back to memory later for continued execution
Enables execution of more processes than would fit into memory
Dispatcher is responsible for swapping in the selected process and possibly swapping out another one
Swapping
Address binding at compile or load time?
Process has to be swapped back into the same memory location
Execution time binding is much better!
Process can be swapped into a different location
But be careful: swapping of a process waiting for I/O?!?
Asynchronous I/O could then write into a memory region now occupied by a newly swapped-in process
Solutions: never swap a process with pending I/O, or use I/O buffers in kernel space
Typically (consider, for instance, UNIX) swapping is disabled when system load is low and only enabled when many processes use more than a threshold amount of memory
Reason: swapping is expensive (swapping time versus CPU execution time)
Contiguous Memory Allocation
Our Model for this Section
Motivation
Assumption for now
Each process is contained in a single contiguous section of memory
Each process has exclusive access on this region
Managing free and occupied space?
Fixed partition scheme
Contiguous memory allocation
Fixed Partition Scheme
Fixed partition scheme (simple but outdated)
Equal sized memory partitions
Each process is located in one partition
Example:
[Figure: memory divided into equal-sized partitions 1–5; processes p1–p4 occupy partitions 1–4]
Example: processes p1, p2, p3, p4 enter, process p2 leaves again, processes p5, p6 enter, ….
Contiguous Memory Allocation
Variable partition scheme
Organize free memory space in a table of memory "holes"
Initially one large hole of free memory
On process arrival search hole which is large enough
Allocate as much as required only
Possibly splitting the hole
On process termination release block of memory
Possibly merging holes
Contiguous Memory Allocation
Dynamic storage allocation problem – which of the free holes to select?
First fit – find the first matching hole (the next time continue from there or start from the beginning)
Best fit – find the smallest hole the process fits in
Worst fit – find the largest hole
Simulation results: first and best fit are better than worst fit
First fit and best fit are comparable but first fit is generally faster
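The three placement policies can be sketched as one selection function. The hole sizes below are arbitrary example values; real allocators also maintain the holes in a linked structure rather than a plain list.

```python
def pick_hole(holes, size, policy):
    """Return the index of the hole chosen for a request of `size`
    under the given placement policy (illustrative sketch)."""
    candidates = [i for i, h in enumerate(holes) if h >= size]
    if not candidates:
        return None                                     # no hole fits
    if policy == "first":
        return candidates[0]                            # first matching hole
    if policy == "best":
        return min(candidates, key=lambda i: holes[i])  # smallest fit
    if policy == "worst":
        return max(candidates, key=lambda i: holes[i])  # largest hole
    raise ValueError(policy)

holes = [100, 500, 200, 300, 600]
assert pick_hole(holes, 212, "first") == 1   # 500 is the first that fits
assert pick_hole(holes, 212, "best") == 3    # 300 leaves the least waste
assert pick_hole(holes, 212, "worst") == 4   # 600 is the largest hole
```

Note how best fit scans all holes on every request, which is why first fit is generally faster despite comparable fragmentation behaviour.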
In general two problems can occur with contiguous memory allocation
Internal fragmentation
External fragmentation
Internal Fragmentation
Example: allocate 18462 bytes given a hole of 18464
Remaining hole of 2 bytes
Small holes make no sense and increase overhead to keep track of all holes
Solution: Use fixed sized blocks and allocate a sequence of such blocks
This leads however to some internal fragmentation
Allocated memory might be more than needed:
[Figure: allocated block = required memory + unused memory]
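Rounding a request up to whole blocks and the resulting internal fragmentation can be computed directly; the 4 KB block size here is an assumed example value.

```python
BLOCK = 4096  # assumed fixed block size in bytes (example value)

def allocate_blocks(request: int) -> tuple[int, int]:
    """Round a request up to whole blocks; return the allocated size
    and the internal fragmentation, both in bytes."""
    blocks = -(-request // BLOCK)        # ceiling division
    allocated = blocks * BLOCK
    return allocated, allocated - request

# The 18462-byte request from the text now needs 5 blocks:
assert allocate_blocks(18462) == (20480, 2018)
```

The tiny 2-byte hole from the text disappears, but 2018 bytes inside the last block are wasted instead: external fragmentation and bookkeeping overhead are traded for internal fragmentation.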
External Fragmentation
As processes are loaded into and removed from memory, free memory space is broken into little pieces
Problem: enough total free memory exists, but no contiguous hole of the required size
An extreme case:
[Figure: processes P1–P5 separated by small holes; no single hole is large enough for P6]
Solutions for External Fragmentation
Compaction
Not possible for compile and load time address binding
Expensive scheme
Non-contiguous address space
Paging
Segmentation
Paging
Partition physical memory into fixed-sized blocks called frames
Logical memory is a contiguous address space built from fixed-sized blocks called pages
Page size has to be a power of two. Why?
4 KB frames and 4-byte page table entries: how much physical memory is addressable?
Logical address layout: page number | page offset
Examples:
Example: 32-byte memory and 4-byte pages
[Figure: logical addresses are translated via the page table to physical addresses]
Properties
A form of dynamic relocation (similar to table of base registers for each frame)
No external fragmentation
Internal fragmentation possible (maximum: page size − 1; average: ½ page size)
Hardware requires a page table only for the currently running process
However, operating system has to maintain a page table for each process
Used when mapping a logical address to a physical address manually (e.g. when a user process provides a logical address in an I/O system call)
Used for restoring the page table to be used after a context switch
Hardware Support
Every access to memory goes through the paging map, so efficiency is a major consideration
Solution 1: Dedicated registers
Fast access
Expensive context switch (reload the whole table into dedicated registers)
Useful for small tables
Example – PDP-11: 16-bit addresses, 8 KB page size → 8 table entries
Hardware Support
Solution 2: Keep page table in memory
Reasonable for large page tables
e.g. 32-bit addresses, 4 KB pages → more than one million entries
Only one register points to the page table
Context switch: load only one register!
(State of the art)
Address layout: 20-bit page number | 12-bit offset
The Problem with Solution 2
[Figure: the PTBR plus the page number indexes the page table in memory; the frame address from that entry plus the offset yields the physical address]
Every data access therefore requires two memory accesses: one for the page table entry and one for the data itself
Solution
Caching: translation look-aside buffer (TLB)
TLB issues
What happens on a TLB miss?
Look up the page number in the in-memory page table
Insert the mapping into the TLB for the next use
TLB full? OS has to follow a replacement policy (LRU, Random, …)
Some TLBs allow "wired-down" entries which are never replaced
TLB and Context switches?
Flush the entire TLB to avoid wrong mappings
TLB Hit Ratio
Hit ratio – percentage of times a page number is found in the TLB
Effective memory access time (example)
Memory access: 100 ns
TLB search: 20 ns
Hit ratio: 80%
Access time on TLB hit: 20 + 100 = 120 ns
Access time on TLB miss: 20 + 100 + 100 = 220 ns
Effective access time = 0.8 · 120 + 0.2 · 220 = 140 ns
40 percent slowdown
Due to reference locality, hit ratios above 98% are common; here that would mean a slowdown of less than 22%
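The effective-access-time formula from the example, as a small function:

```python
def effective_access_time(mem_ns: float, tlb_ns: float, hit_ratio: float) -> float:
    """Effective memory access time with a TLB: a hit costs the TLB
    search plus one memory access; a miss costs an additional memory
    access to fetch the page table entry first."""
    hit_cost = tlb_ns + mem_ns
    miss_cost = tlb_ns + 2 * mem_ns
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

# Numbers from the slide: 100 ns memory, 20 ns TLB search
assert abs(effective_access_time(100, 20, 0.80) - 140.0) < 1e-6  # 40% slowdown
assert abs(effective_access_time(100, 20, 0.98) - 122.0) < 1e-6  # 22% slowdown
```

The formula makes the sensitivity to the hit ratio explicit: each percentage point of hit ratio saves one full memory access time on one percent of all references.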
Protection
Page table entries may contain additional flags
Read, write, execute, valid–invalid
The first three flags limit memory use to reading, writing, or executing
The valid–invalid bit limits memory usage to valid pages
Alternative/supplementing solution: page table length register (PTLR)
The use of a PTLR reduces the memory overhead of the page table in case a process uses only a small portion of its address space
Shared Pages
Shared code must be "reentrant code" – it does not change during execution (e.g. clear the write flag in each page table entry)
Example
40 users
150 KB editor + 50 KB data per user → total memory required: 40 · (150 + 50) KB = 8000 KB
Sharing the editor code: 150 KB + 40 · 50 KB = 2150 KB
significant savings!
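The arithmetic from the editor example, spelled out:

```python
USERS, EDITOR_KB, DATA_KB = 40, 150, 50  # numbers from the slide example

# Without sharing, every user carries a private copy of the editor code:
no_sharing = USERS * (EDITOR_KB + DATA_KB)
# With sharing, one copy of the (reentrant) code serves all users:
with_sharing = EDITOR_KB + USERS * DATA_KB

assert no_sharing == 8000     # KB
assert with_sharing == 2150   # KB
```

The saving grows with the number of users, since the code cost is paid once instead of per user.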
Hierarchical Paging
Recall: 32-bit logical address, 4 KB page size → 2^32 / 2^12 page table entries (about one million)
Reducing the page table size – 2-level paging scheme
Quiz
Consider a two-level paging scheme for 32-bit addresses
Let the first 10 bits be used for the outer page table index
Let the next 8 bits be used for the inner page table index
Let the remaining 14 bits be used for the page offset
Ignoring any additional flags, what is (measured in bytes):
the size of the outer page table?
the size of an inner page table?
the size of a page?
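One possible worked answer, assuming 4-byte page table entries (the quiz itself leaves the entry size open, so treat the table sizes as conditional on that assumption):

```python
OUTER_BITS, INNER_BITS, OFFSET_BITS = 10, 8, 14
ENTRY_BYTES = 4  # assumed entry size, not given in the quiz

assert OUTER_BITS + INNER_BITS + OFFSET_BITS == 32  # bits add up

outer_table = (2 ** OUTER_BITS) * ENTRY_BYTES   # 1024 entries
inner_table = (2 ** INNER_BITS) * ENTRY_BYTES   # 256 entries each
page_size   = 2 ** OFFSET_BITS                  # offset spans one page

assert outer_table == 4096    # bytes
assert inner_table == 1024    # bytes
assert page_size == 16384     # bytes (16 KB)
```

Only the page size is independent of the entry-size assumption, since it is fixed by the 14-bit offset alone.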
2 Level Paging Scheme and 64 Bit Addresses?
2^42 entries in the outer page table!
Solution: n-level paging (e.g. 7-level paging)
Prohibitive number of memory accesses in case of a TLB miss!
We need other solutions here...
Address layout: 42-bit outer page index | 10-bit inner page index | 12-bit offset
Hashed Page Tables
Consider 4kByte frames. What is the physical address of (32,17) in the depicted example?
Example
[Figure: the hash chain for the hashed page number contains the pairs (39 → frame 4) and (32 → frame 2); the logical address is (page 32, offset 17)]
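A hashed page table lookup can be sketched as below, assuming (as the depicted chain suggests) that page 32 hashes to a chain containing the pairs (39, 4) and (32, 2); the hash function (modulo) and bucket layout are illustrative assumptions.

```python
def hashed_lookup(page: int, offset: int, buckets, page_size=4096):
    """Sketch of a hashed page table: hash the page number to a
    bucket, then walk the chain of (page, frame) pairs until the
    page number matches."""
    chain = buckets[page % len(buckets)]   # assumed hash: modulo
    for p, frame in chain:
        if p == page:
            return frame * page_size + offset
    raise KeyError("page fault")           # page not resident

# Hypothetical table: one bucket whose chain holds (39 -> 4), (32 -> 2)
buckets = [[(39, 4), (32, 2)]]
assert hashed_lookup(32, 17, buckets) == 2 * 4096 + 17  # = 8209
```

Under these assumptions, (32, 17) with 4 KB frames lands in frame 2 at physical address 8209.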
Inverted Page Tables
Example
Frame  PID  Page
  0     2    12
  1     1    14
  2     4     1
  3     3     2
  4     2     3
  5     1     4
Logical address to translate: (PID 2, page 3, offset 19)
16 Bit Addresses, 8-entry inverted page table.
Size of a page/frame?
Physical address in this example?
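One worked reading of the example: with 16-bit addresses and an 8-entry inverted table, interpret the frame number as 3 bits, leaving a 13-bit offset, i.e. 8 KB pages (this split is an assumption the quiz invites you to derive). The lookup then searches the table for the (PID, page) pair; the matching index is the frame number.

```python
PAGE_SIZE = 2 ** 13  # 8 KB: 16-bit addresses, 8 frames -> 13-bit offset

# (pid, page) per frame, as in the depicted table (six frames shown):
table = [(2, 12), (1, 14), (4, 1), (3, 2), (2, 3), (1, 4)]

def inverted_lookup(pid: int, page: int, offset: int) -> int:
    """Inverted page table: one entry per frame; the index at which
    the (pid, page) pair is found is the frame number."""
    frame = table.index((pid, page))       # linear search in the sketch
    return frame * PAGE_SIZE + offset

# (PID 2, page 3, offset 19) is found at index 4 -> frame 4
assert inverted_lookup(2, 3, 19) == 4 * PAGE_SIZE + 19  # = 32787
```

The linear search is the known drawback of inverted tables; real systems pair them with a hash table to speed up this lookup.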
Segmentation
Recall the simple hardware solution:
relocation (sometimes also called base register)
and limit register
Segmentation
Extending this idea to a table of base and limit values
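The per-segment base/limit lookup can be sketched as follows; the segment table contents are hypothetical example values.

```python
def seg_translate(segment: int, offset: int, seg_table) -> int:
    """Segmentation: each table entry holds (base, limit); the offset
    is checked against the limit and then added to the base."""
    base, limit = seg_table[segment]
    if not (0 <= offset < limit):
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

# Hypothetical segment table of (base, limit) pairs:
st = [(1400, 1000), (6300, 400), (4300, 400), (3200, 1100)]
assert seg_translate(3, 22, st) == 3222    # segment 3, offset 22
assert seg_translate(0, 999, st) == 2399   # last legal byte of segment 0
```

A segmented address is thus written as segment|offset, and the quiz questions below are exactly this translation applied to a given segment table.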
Segmentation Example
Quiz
Where is address 3|22 mapped to?
Where is address 0|1500 mapped to?
Example: Intel Pentium
This architecture supports both
Pure segmentation and
Segmentation with paging
We do not consider the whole memory management structure of the Pentium, but rather the major ideas on which it is based
Logical to physical address translation
Example: Intel Pentium
Two types of pages possible
Page types are determined by a page directory flag
Summary
Memory management typically includes
Checking an address and address use (e.g. writing to an address) for legality
Mapping an address to a physical address
This cannot be realized efficiently in software
Thus, memory management provided by an operating system is always constrained by the available hardware features (e.g. limit/base-register, translation tables, TLB)
We have considered two major techniques
Paging
Segmentation
(and a combination of both)
The study of memory management also includes
Fragmentation (internal, external)
Support for relocation to solve external fragmentation
Swapping to allow more processes than would fit into memory
Sharing code or data
Protection (execute-only, read-only, read-write)
References
Silberschatz, Galvin, Gagne, "Operating System Concepts", Seventh Edition, Wiley, 2005