Outlook
Background
Swapping
Contiguous Memory Allocation
Paging
Structure of the Page Table
Segmentation
Example: The Intel Pentium
Background
So far we have considered how to share the CPU among processes.
Processes require an additional vital resource: RAM. Active processes must be kept in memory and must share the available physical memory.
Physical memory
Large array of words/bytes
Each is identified by one address
Contiguous sequence of addresses (0, …, 2^n − 1)
How is this memory used?
Fetch, decode, execute, write back
Decode: possibly load more memory contents into registers
Write back: possibly store register into memory
(Some architectures: direct operations on memory)
Here, however, we are concerned only with the memory accesses themselves, not their meaning.
Basic Hardware Support
Support for (quick) memory access is definitely required
Memory and registers are the only storage that the CPU can access directly (with appropriate machine instructions)
Operations on registers are fast (typically one clock cycle)
Operations on RAM are very slow (many CPU cycles); access requires a CPU stall since the required data is not available.
Because memory access is so frequent, CPUs have caches (faster but more expensive memory) between registers and RAM.
Another question: how to protect several concurrently executing processes from
Overwriting each other's data and code?
Overwriting the kernel space?
We need some hardware support!
Basic Hardware Support
Each process gets a separate space in memory
Determine the range of legal access
Two register solution
Base and limit can only be overwritten in supervisor mode
The base/limit check is deactivated in supervisor mode
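The two-register check can be sketched as follows. This is a minimal illustration of the hardware's legality test on every user-mode access; the base/limit values are arbitrary example numbers, not from the slides.

```python
def check_access(address: int, base: int, limit: int) -> bool:
    """Legality check performed by the hardware on every user-mode
    memory access: the address must lie in [base, base + limit)."""
    return base <= address < base + limit

# Example: a process loaded at base 300040 with limit 120900
assert check_access(300040, 300040, 120900)      # first legal address
assert check_access(420939, 300040, 120900)      # last legal address
assert not check_access(420940, 300040, 120900)  # one past the end -> trap
```

Any access failing this check causes a trap into the operating system, which typically terminates the offending process.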
Motivation: Address Binding
On execution, each instruction needs to be referable by a unique address – this requires address binding.
Where does such binding take place? Consider the steps a user program typically goes through.
Possible steps for address binding
Compile time – absolute code (example: the old MS-DOS .COM format)
Load time – relocatable code
Execution time – movable during execution (special hardware must be available, most general purpose operating systems use this scheme)
Virtual and Physical Addresses
Physical address – the actual address value which is loaded into the memory address register (MAR)
Virtual address – the value used by the processor
We also speak of physical address space and logical address space.
Compile and load time binding
physical address space = virtual address space
Execution time
Requires mapping from virtual address space into physical address space
Requires hardware support: Memory Management Unit (MMU)
A simple MMU Scheme
A combination of relocation and limit register
Virtual address space: 0 to max
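The relocation/limit scheme can be sketched as a small translation function. The register values below are illustrative examples, not from the slides.

```python
def mmu_translate(virtual: int, relocation: int, limit: int) -> int:
    """Sketch of the simple MMU scheme: the virtual address is first
    compared against the limit register; if legal, the relocation
    register is added to form the physical address."""
    if not (0 <= virtual < limit):
        raise MemoryError("trap: addressing error")  # OS traps the process
    return virtual + relocation

# Example: process relocated to 14000 with a 3000-byte address space
assert mmu_translate(0, 14000, 3000) == 14000
assert mmu_translate(346, 14000, 3000) == 14346
```

Note that the process itself only ever sees virtual addresses 0 to max; the relocation is invisible to it.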
Remark: Dynamic Loading, Linking and Shared Libraries
In general: dynamic loading
a program does not reside completely in memory
called routines are loaded dynamically when they are called for the first time
advantage: routines that are never called do not occupy any memory
Library specific: static linking: the loader combines system libraries into the binary program image
waste of memory and disk space
Library specific: dynamic linking: similar to dynamic loading, but the linking step (instead of loading) is postponed until the first call, i.e. until execution time (see figure on slide 7 again)
Advantage: keep only one copy of the library code in memory instead of linking the library routines into each program that uses it (shared library)
Stubs for library reference
Locate the memory resident library or load it into memory (if not present)
Replace themselves with the address of the desired routine; the next time, that particular code segment calls the library routine directly and incurs no further cost
Swapping
Temporarily move process and its address space to backing store
And bring it back to memory later for continued execution
Enables execution of more processes than would fit into memory
Dispatcher is responsible for swapping in the selected process and possibly swapping out another one
Swapping
Address binding at compile or load time?
Process has to be swapped back into the same memory location
Execution time binding is much better!
Process can be swapped into a different location
But be careful: swapping of a process waiting for I/O?!?
Asynchronous I/O could then write into a memory region now occupied by a newly swapped-in process
Solutions: never swap a process with pending I/O, or use I/O buffers in kernel space
Typically (consider, for instance, UNIX) swapping is disabled when system load is low and only enabled when many processes use more than a threshold amount of memory
Reason: swapping is expensive (swapping time versus CPU execution time)
Contiguous Memory Allocation
Our Model for this Section
Motivation
Assumption for now
Each process is contained in a single contiguous section of memory
Each process has exclusive access on this region
Managing free and occupied space?
Fixed partition scheme
Contiguous memory allocation
Fixed Partition Scheme
Fixed partition scheme (simple but outdated)
Equal sized memory partitions
Each process is located in one partition
Example:
[Figure: memory divided into equal-sized partitions 1–5; processes p1–p4 occupy partitions 1–4]
Example: processes p1, p2, p3, p4 enter, process p2 leaves again, processes p5, p6 enter, ….
Contiguous Memory Allocation
Variable partition scheme
Organize free memory space in a table of memory "holes"
Initially one large hole of free memory
On process arrival search hole which is large enough
Allocate as much as required only
Possibly splitting the hole
On process termination release block of memory
Possibly merging holes
Contiguous Memory Allocation
Dynamic storage allocation problem – which of the free holes to select?
First fit – find the first matching hole (the next time continue from there or start from the beginning)
Best fit – find the smallest hole the process fits in
Worst fit – find the largest hole
Simulation results: first and best fit are better than worst fit
First fit and best fit are comparable but first fit is generally faster
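The three placement policies can be sketched as one selection function. The hole sizes below are arbitrary example values; real allocators also maintain the holes in a linked structure rather than a plain list.

```python
def pick_hole(holes, size, policy):
    """Return the index of the hole chosen for a request of `size`
    under the given placement policy (illustrative sketch)."""
    candidates = [i for i, h in enumerate(holes) if h >= size]
    if not candidates:
        return None                                     # no hole fits
    if policy == "first":
        return candidates[0]                            # first matching hole
    if policy == "best":
        return min(candidates, key=lambda i: holes[i])  # smallest fit
    if policy == "worst":
        return max(candidates, key=lambda i: holes[i])  # largest hole
    raise ValueError(policy)

holes = [100, 500, 200, 300, 600]
assert pick_hole(holes, 212, "first") == 1   # 500 is the first that fits
assert pick_hole(holes, 212, "best") == 3    # 300 leaves the least waste
assert pick_hole(holes, 212, "worst") == 4   # 600 is the largest hole
```

Note how best fit scans all holes on every request, which is why first fit is generally faster despite comparable fragmentation behaviour.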
In general two problems can occur with contiguous memory allocation
Internal fragmentation
External fragmentation
Internal Fragmentation
Example: allocate 18462 bytes given a hole of 18464
Remaining hole of 2 bytes
Small holes make no sense and increase overhead to keep track of all holes
Solution: Use fixed sized blocks and allocate a sequence of such blocks
This leads however to some internal fragmentation
Allocated memory might be more than needed:
[Figure: allocated block = required memory + unused memory]
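Rounding a request up to whole blocks and the resulting internal fragmentation can be computed directly; the 4 KB block size here is an assumed example value.

```python
BLOCK = 4096  # assumed fixed block size in bytes (example value)

def allocate_blocks(request: int) -> tuple[int, int]:
    """Round a request up to whole blocks; return the allocated size
    and the internal fragmentation, both in bytes."""
    blocks = -(-request // BLOCK)        # ceiling division
    allocated = blocks * BLOCK
    return allocated, allocated - request

# The 18462-byte request from the text now needs 5 blocks:
assert allocate_blocks(18462) == (20480, 2018)
```

The tiny 2-byte hole from the text disappears, but 2018 bytes inside the last block are wasted instead: external fragmentation and bookkeeping overhead are traded for internal fragmentation.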
External Fragmentation
As processes are loaded into and removed from memory, free memory space is broken into little pieces
Problem: enough total free memory exists, but no contiguous hole of the required size
An extreme case:
[Figure: processes P1–P5 separated by small holes; no single hole is large enough for P6]
Solutions for External Fragmentation
Compaction
Not possible for compile and load time address binding
Expensive scheme
Non-contiguous address space
Paging
Segmentation
Paging
Partition physical memory into fixed-sized blocks called frames
Logical memory is a contiguous address space built from fixed-sized blocks called pages
Page size has to be a power of two. Why?
4 KB frames and 4-byte page table entries: how much physical memory is addressable?
Logical address layout: page number | page offset
Examples:
Example: 32-byte memory and 4-byte pages
[Figure: logical addresses are translated via the page table to physical addresses]
Properties
A form of dynamic relocation (similar to table of base registers for each frame)
No external fragmentation
Internal fragmentation possible (maximum: page size − 1; average: ½ page size)
Hardware requires a page table only for the currently running process
However, operating system has to maintain a page table for each process
Used when mapping a logical address to a physical address manually (e.g. when a user process provides a logical address in an I/O system call)
Used for restoring the page table to be used after a context switch
Hardware Support
Every access to memory goes through the paging map, so efficiency is a major consideration
Solution 1: Dedicated registers
Fast access
Expensive context switch (reload the whole table into dedicated registers)
Useful for small tables
Example – PDP-11: 16-bit addresses, 8 KB page size → 8 table entries
Hardware Support
Solution 2: Keep page table in memory
Reasonable for large page tables
e.g. 32-bit addresses, 4 KB pages → more than one million entries
Only one register points to the page table
Context switch: load only one register!
(State of the art)
Address layout: 20-bit page number | 12-bit offset
The Problem with Solution 2
[Figure: the PTBR plus the page number indexes the page table in memory; the frame address from that entry plus the offset yields the physical address]
Every data access therefore requires two memory accesses: one for the page table entry and one for the data itself
Solution
Caching: translation look-aside buffer (TLB)
TLB issues
What happens on a TLB miss?
Look up the page number in the in-memory page table
Insert the mapping into the TLB for the next use
TLB full? OS has to follow a replacement policy (LRU, Random, …)
Some TLBs allow "wired-down" entries which are never replaced
TLB and Context switches?
Flush the entire TLB to avoid wrong mappings
TLB Hit Ratio
Hit ratio – percentage of times a page number is found in the TLB
Effective memory access time (example)
Memory access: 100 ns
TLB search: 20 ns
Hit ratio: 80%
Access time on TLB hit: 20 + 100 = 120 ns
Access time on TLB miss: 20 + 100 + 100 = 220 ns
Effective access time = 0.8 · 120 + 0.2 · 220 = 140 ns
40 percent slowdown
Due to reference locality, hit ratios above 98% are common; here that would mean a slowdown of less than 22%
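The effective-access-time formula from the example, as a small function:

```python
def effective_access_time(mem_ns: float, tlb_ns: float, hit_ratio: float) -> float:
    """Effective memory access time with a TLB: a hit costs the TLB
    search plus one memory access; a miss costs an additional memory
    access to fetch the page table entry first."""
    hit_cost = tlb_ns + mem_ns
    miss_cost = tlb_ns + 2 * mem_ns
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

# Numbers from the slide: 100 ns memory, 20 ns TLB search
assert abs(effective_access_time(100, 20, 0.80) - 140.0) < 1e-6  # 40% slowdown
assert abs(effective_access_time(100, 20, 0.98) - 122.0) < 1e-6  # 22% slowdown
```

The formula makes the sensitivity to the hit ratio explicit: each percentage point of hit ratio saves one full memory access time on one percent of all references.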
Protection
Page table entries may contain additional flags
Read, write, execute, valid–invalid
The first three flags limit memory use to reading, writing, or executing
The valid–invalid bit limits memory usage to valid pages
Alternative/supplementing solution: page table length register (PTLR)
The use of a PTLR reduces the memory overhead of the page table in case a process uses only a small portion of its address space
Shared Pages
Shared code must be "reentrant code" – it does not change during execution (e.g. clear the write flag in each page table entry)
Example
40 users
150 KB editor + 50 KB data per user → total memory required: 40 · (150 + 50) KB = 8000 KB
Sharing the editor code: 150 KB + 40 · 50 KB = 2150 KB
significant savings!
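The arithmetic from the editor example, spelled out:

```python
USERS, EDITOR_KB, DATA_KB = 40, 150, 50  # numbers from the slide example

# Without sharing, every user carries a private copy of the editor code:
no_sharing = USERS * (EDITOR_KB + DATA_KB)
# With sharing, one copy of the (reentrant) code serves all users:
with_sharing = EDITOR_KB + USERS * DATA_KB

assert no_sharing == 8000     # KB
assert with_sharing == 2150   # KB
```

The saving grows with the number of users, since the code cost is paid once instead of per user.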
Hierarchical Paging
Recall: 32-bit logical address, 4 KB page size → 2^32 / 2^12 page table entries (about one million)
Reducing the page table size – 2-level paging scheme
Quiz
Consider a two-level paging scheme for 32-bit addresses
Let the first 10 bits be used for the outer page table index
Let the next 8 bits be used for the inner page table index
Let the remaining 14 bits be used for the page offset
Ignoring any additional flags, what is (measured in bytes):
the size of the outer page table?
the size of an inner page table?
the size of a page?
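One possible worked answer, assuming 4-byte page table entries (the quiz itself leaves the entry size open, so treat the table sizes as conditional on that assumption):

```python
OUTER_BITS, INNER_BITS, OFFSET_BITS = 10, 8, 14
ENTRY_BYTES = 4  # assumed entry size, not given in the quiz

assert OUTER_BITS + INNER_BITS + OFFSET_BITS == 32  # bits add up

outer_table = (2 ** OUTER_BITS) * ENTRY_BYTES   # 1024 entries
inner_table = (2 ** INNER_BITS) * ENTRY_BYTES   # 256 entries each
page_size   = 2 ** OFFSET_BITS                  # offset spans one page

assert outer_table == 4096    # bytes
assert inner_table == 1024    # bytes
assert page_size == 16384     # bytes (16 KB)
```

Only the page size is independent of the entry-size assumption, since it is fixed by the 14-bit offset alone.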
2 Level Paging Scheme and 64 Bit Addresses?
2^42 entries in the outer page table!
Solution: n-level paging (e.g. 7-level paging)
Prohibitive number of memory accesses in case of a TLB miss!
We need other solutions here...
Address layout: 42-bit outer page index | 10-bit inner page index | 12-bit offset
Hashed Page Tables
Consider 4kByte frames. What is the physical address of (32,17) in the depicted example?
Example
[Figure: the hash chain for the hashed page number contains the pairs (39 → frame 4) and (32 → frame 2); the logical address is (page 32, offset 17)]
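A hashed page table lookup can be sketched as below, assuming (as the depicted chain suggests) that page 32 hashes to a chain containing the pairs (39, 4) and (32, 2); the hash function (modulo) and bucket layout are illustrative assumptions.

```python
def hashed_lookup(page: int, offset: int, buckets, page_size=4096):
    """Sketch of a hashed page table: hash the page number to a
    bucket, then walk the chain of (page, frame) pairs until the
    page number matches."""
    chain = buckets[page % len(buckets)]   # assumed hash: modulo
    for p, frame in chain:
        if p == page:
            return frame * page_size + offset
    raise KeyError("page fault")           # page not resident

# Hypothetical table: one bucket whose chain holds (39 -> 4), (32 -> 2)
buckets = [[(39, 4), (32, 2)]]
assert hashed_lookup(32, 17, buckets) == 2 * 4096 + 17  # = 8209
```

Under these assumptions, (32, 17) with 4 KB frames lands in frame 2 at physical address 8209.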
Inverted Page Tables
Example
Frame  PID  Page
  0     2    12
  1     1    14
  2     4     1
  3     3     2
  4     2     3
  5     1     4
Logical address to translate: (PID 2, page 3, offset 19)
16 Bit Addresses, 8-entry inverted page table.
Size of a page/frame?
Physical address in this example?
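One worked reading of the example: with 16-bit addresses and an 8-entry inverted table, interpret the frame number as 3 bits, leaving a 13-bit offset, i.e. 8 KB pages (this split is an assumption the quiz invites you to derive). The lookup then searches the table for the (PID, page) pair; the matching index is the frame number.

```python
PAGE_SIZE = 2 ** 13  # 8 KB: 16-bit addresses, 8 frames -> 13-bit offset

# (pid, page) per frame, as in the depicted table (six frames shown):
table = [(2, 12), (1, 14), (4, 1), (3, 2), (2, 3), (1, 4)]

def inverted_lookup(pid: int, page: int, offset: int) -> int:
    """Inverted page table: one entry per frame; the index at which
    the (pid, page) pair is found is the frame number."""
    frame = table.index((pid, page))       # linear search in the sketch
    return frame * PAGE_SIZE + offset

# (PID 2, page 3, offset 19) is found at index 4 -> frame 4
assert inverted_lookup(2, 3, 19) == 4 * PAGE_SIZE + 19  # = 32787
```

The linear search is the known drawback of inverted tables; real systems pair them with a hash table to speed up this lookup.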
Segmentation
Recall the simple hardware solution:
relocation (sometimes also called base register)
and limit register
Segmentation
Extending this idea to a table of base and limit values
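The per-segment base/limit lookup can be sketched as follows; the segment table contents are hypothetical example values.

```python
def seg_translate(segment: int, offset: int, seg_table) -> int:
    """Segmentation: each table entry holds (base, limit); the offset
    is checked against the limit and then added to the base."""
    base, limit = seg_table[segment]
    if not (0 <= offset < limit):
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

# Hypothetical segment table of (base, limit) pairs:
st = [(1400, 1000), (6300, 400), (4300, 400), (3200, 1100)]
assert seg_translate(3, 22, st) == 3222    # segment 3, offset 22
assert seg_translate(0, 999, st) == 2399   # last legal byte of segment 0
```

A segmented address is thus written as segment|offset, and the quiz questions below are exactly this translation applied to a given segment table.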
Segmentation Example
Quiz
Where is address 3|22 mapped to?
Where is address 0|1500 mapped to?
Example: Intel Pentium
This architecture supports both
Pure segmentation and
Segmentation with paging
We do not consider the whole memory management structure of the Pentium, but rather the major ideas on which it is based
Logical to physical address translation
Example: Intel Pentium
Two types of pages possible
Page types are determined by a page directory flag
Summary
Memory management typically includes
Checking an address and address use (e.g. writing to an address) for legality
Mapping an address to a physical address
This cannot be realized efficiently in software
Thus, memory management provided by an operating system is always constrained by the available hardware features (e.g. limit/base-register, translation tables, TLB)
We have considered two major techniques
Paging
Segmentation
(and a combination of both)
The study of memory management also includes
Fragmentation (internal, external)
Support for relocation to solve external fragmentation
Swapping to allow more processes than would fit into memory
Sharing code or data
Protection (execute-only, read-only, read-write)
References
Silberschatz, Galvin, Gagne, "Operating System Concepts", Seventh Edition, Wiley, 2005