Relational Database Systems 2

Academic year: 2021
(1)

Christoph Lofi, Philipp Wille

Institut für Informationssysteme

Relational Database Systems 2

2. Physical Data Storage

(2)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme 2

[Figure: DBMS architecture — Query Processor (DDL interpreter, DML compiler, embedded DML precompiler, query evaluation engine), Data Storage Manager (buffer manager, file manager, transaction manager, indices, statistics, catalog/dictionary), and application interfaces serving application programs, direct queries, application programmers, and DB administrators.]

2 Architecture

(3)

2.1 Introduction

2.2 Hard Disks

2.3 RAIDs

2.4 SANs and NAS

2.5 Case Study

2 Physical Data Storage

(4)

• DBMS needs to retrieve, update and process persistently stored data

Storage consideration is an important factor in planning a database system (physical layer)

Remember:

The data has to be securely stored, but access to the data should be declarative!


2.1 Physical Storage Introduction


(5)

• Data is stored on storage media. Media differ highly in terms of

– Random access speed

– Random/sequential read/write speed

– Capacity

– Cost per capacity

2.1 Physical Storage Introduction

(6)

Capacity: Quantifies the amount of data which can be stored

– Base units: 1 Bit; 1 Byte = 2³ Bit = 8 Bit

– Capacity units according to IEC, IEEE, NIST, etc:

Usually used for file sizes and primary storage (for higher degree of confusion, sometimes used with SI abbreviations…)

1 KiB = 1024¹ Byte; 1 MiB = 1024² Byte; 1 GiB = 1024³ Byte; …

– Capacity units according to SI:

Usually used for advertising secondary/tertiary storage

1 KB = 1000¹ Byte ≈ 0.976 KiB; 1 MB = 1000² Byte ≈ 0.954 MiB;

1 GB = 1000³ Byte ≈ 0.931 GiB; …

– Especially used by the networking community:

1 Kb = 1000¹ Bit = 0.125 KB ≈ 0.122 KiB; 1 Mb = 1000² Bit = 0.125 MB ≈ 0.119 MiB
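The difference between the unit systems can be illustrated with a small Python sketch (the constant and function names are illustrative, not standard library names):

```python
# Sketch: converting between IEC (binary) and SI (decimal) capacity units,
# following the definitions above.
KIB, MIB, GIB = 1024**1, 1024**2, 1024**3   # IEC: KiB, MiB, GiB
KB,  MB,  GB  = 1000**1, 1000**2, 1000**3   # SI:  KB,  MB,  GB

def si_as_iec(si_bytes, iec_unit):
    """Express an SI-labelled capacity in a given IEC unit."""
    return si_bytes / iec_unit

# An advertised "1 TB" (SI) disk as reported by an OS that counts in GiB:
print(round(si_as_iec(1000**4, GIB), 1))   # -> 931.3
```

This is exactly the gap between a disk's advertised and reported capacity mentioned above.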


2.1 Relevant Media Characteristics

(7)

2.1 A Kilo-Joke

(8)

Random Access Time: Average time to access a random piece of data at a known media position

Usually measured in ms or ns

Within some media, access time can vary depending on position (e.g. hard disks)

Transfer Rate: Average amount of consecutive data that can be transferred per time unit

Usually measured in KB/sec, MB/sec, GB/sec,…

Sometimes also in Kb/sec, Mb/sec, Gb/sec


2.1 Characteristic Parameters

(9)

Volatile: Memory needs constant power to keep data

Dynamic: Dynamic volatile memory needs to be “refreshed” regularly to keep data

Static: No refresh necessary

• Access Modes

Random Access: Any piece of data can be accessed in approximately the same time

Sequential Access: Data can only be accessed in sequential order

• Write Mode

Mutable storage: Can be read and written arbitrarily

Write Once Read Many (WORM): Can be written only once

2.1 Other characteristics

(10)

Online media

– “Always on”

– Each single piece of data can be accessed fast

– e.g. hard drives, main memory

Nearline media

– Compromise between online and offline

– Offline media that can automatically be put online

– e.g. juke boxes, robot libraries

Offline media (disconnected media)

– Not under direct control of the processing unit

– Have to be connected manually

– e.g. box of backup tapes in the basement


2.1 Online, Nearline, Offline

(11)

• Media characteristics result in a storage hierarchy

• DBMS optimize data distribution among the storage levels

Primary Storage: Fast, limited capacity, high price, usually volatile electronic storage

Frequently used data / current work data

Secondary Storage: Slower, large capacity, lower price

Main stored data

Tertiary Storage: Even slower, huge capacity, even lower price, usually offline

2.1 The Storage Hierarchy

(12)


2.1 The Storage Hierarchy

[Figure: storage hierarchy pyramid — Primary: cache, RAM (~100 ns); Secondary: flash, magnetic disks (~10 ms); Tertiary: optical disks, tape (> 1 s). Speed decreases and cost per capacity drops from top to bottom.]

(13)

Type | Media | Size | Random Acc. Speed | Transfer Speed | Characteristics | Price | Price/GB
Pri | L1 Processor Cache | 32 KB | 5 × 10⁻¹⁰ s | 15.4 GB/sec | Vol, Stat, RA, OL | – | –
Pri | DDR3-RAM (Corsair Dominator Platinum Series) | 8 GB | 2.6 × 10⁻⁸ s | 12.3 GB/sec | Vol, Dyn, RA, OL | € 160 | € 20
Sec | Harddrive SSD (Samsung 840 PRO) | 256 GB | 4 × 10⁻⁶ s | 513 MB/sec | Stat, RA, OL | € 187 | € 0.73
Sec | Harddrive Magnetic (Seagate ST2000DM001) | 2000 GB | 5.7 × 10⁻⁴ s | 153 MB/sec | Stat, RA, OL | € 100 | € 0.05
Ter | Blank recordable DVD-R disk | 4.7 GB | 9.8 × 10⁻² s | 11 MB/sec | Stat, RA, OF, WORM | € 0.15/Disk | € 0.03
Ter | LTO-5 tape (TDK LTO Ultrium 5 Data Cartridge) | 1500 GB | 58 s | 280 MB/sec | Stat, SA, OF | € 15/Tape | € 0.01

2.1 Storage Media – Examples

Pri = Primary, Sec = Secondary, Ter = Tertiary

Vol = Volatile, Stat = Static, Dyn = Dynamic, RA = Random Access, SA = Sequential Access, OL = Online, OF = Offline

(14)

• Hard drives are currently the standard for large, cheap and persistent storage

– Usually used as the main storage media for most data in a DB

• DBMS need to be optimized for efficient disk storage and access

– Data access needs to be as fast as possible

– Often used data should be accessible with highest speed, rarely needed data may take longer

– Different data items needed for certain reoccurring tasks should also be stored/accessed together


2.2 Magnetic Disk Storage – HDs

(15)

Directional magnetization of a ferromagnetic material

Realized on hard disk platters

Base platter made of non-magnetic aluminum or glass substrate

Magnetic grains worked into base platter to form magnetic regions

Each region represents 1 Bit

Read head can detect magnetization direction of each region

Write head may change direction

2.2 HD – How does it work?

(16)

Giant Magnetoresistance Effect (GMR)

Discovered 1988 simultaneously by Peter Grünberg and Albert Fert

Both honored with the 2007 Nobel Prize in Physics

Allows the construction of efficient read heads:

The electric resistance of a stack of alternating ferromagnetic and non-magnetic layers changes “giantly” with the direction of an applied magnetic field

http://www.research.ibm.com/research/demos/gmr/cyberdemo1.htm


2.2 HD – Notable Technology Advances

(17)

Perpendicular recording (used since 2005)

– Longitudinal recording is limited to ~200 Gb/inch² due to the superparamagnetic effect

Thermal energy may spontaneously change the magnetic direction

– Perpendicular recording allows for up to 1000 Gb/inch²

Very simplified: align the magnetic field orthogonal to the surface instead of parallel to it

– Magnetic regions can be smaller

2.2 HD – Notable Technology Advances

(18)

Usage of magnetic grains instead of continuous magnetic material

– Néel spikes form between magnetic direction transitions

Areas of uncertain magnetic direction

– Néel spikes are larger for continuous materials

– Magnetic regions can be smaller as the transition width can be reduced


2.2 HD – Notable Technology Advances

(19)

A hard disk is made up of multiple double-sided platters

Platter sides are called surfaces

Platters are fixed on main spindle and rotate at equal and constant speed (common: 5400 rpm / 7200 rpm)

Each surface has its own read and write head

Heads are attached to arms

– Arms can position the heads along the surface

– Heads cannot move independently

Heads have no contact with the surface and hover on top of an air bearing

2.2 HD – Basic Architecture

(20)

• Each surface is divided into circular tracks

Some disks may use spirals

• All tracks of all surfaces with the same diameter are called cylinder

Data within the same cylinder can be accessed very efficiently

[EN 13.2]

2.2 HD – Basic Architecture

(21)

• Each track is subdivided into sectors of equal capacity

a) Fixed angle sector subdivision

Same number of sectors per track, changing density, constant speed

b) Fixed data density

Outer tracks have more sectors than inner tracks

Transfer speed higher on outer tracks

• Adjacent sectors can be

2.2 HD – Basic Architecture

(22)

• Hard drives are not completely reliable!

– Drives do fail

– Means for physical failure recovery are necessary

Backups

Redundancy

• Hard drives age and wear down.

Wear increases significantly with:

– Contact cycles (head parking)

– Spindle start-stops

– Power-on hours

– Operation outside the ideal environment

Temperature too low/high

Unstable voltage


2.2 HD – Reliability

(23)

Reliability measures are statistical values assuming certain usage patterns

Desktop usage (all per year): 2,400 hours, 10,000 motor start/stops, 25°C temperature

Server usage (all per year): 8,760 hours, 250 motor start/stops, 40°C temperature

Non-Recoverable read errors: A sector on the surface cannot be read anymore – the data is lost

Desktop disk: 1 per 10¹⁴ read bits, Server: 1 per 10¹⁵ read bits

Disk can detect this!

– Maximum contact cycles: Maximum number of allowed head contacts (parking)

2.2 HD – Reliability

(24)

Mean Time Between Failure (MTBF): Statistically anticipated time after which half of a large disk population has failed

Drive manufacturers usually use optimistic simulations to estimate the MTBF

Desktop: 0.7 million hours (80 years), Server: 1.2 million hours (137 years) – manufacturers’ values

Annualized Failure Rate (AFR): Probability of a failure per year based on MTBF

AFR = OperatingHoursPerYear / MTBF_hours

Desktop: 0.34%, Server: 0.73%
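The AFR formula can be checked with a quick Python sketch, using the manufacturer figures and usage patterns quoted above:

```python
# Sketch: Annualized Failure Rate = operating hours per year / MTBF (hours).
def afr(operating_hours_per_year, mtbf_hours):
    return operating_hours_per_year / mtbf_hours

# Desktop: 2,400 h/year, MTBF 700,000 h; Server: 8,760 h/year, MTBF 1,200,000 h
print(f"{afr(2400, 700_000):.2%}")    # -> 0.34%
print(f"{afr(8760, 1_200_000):.2%}")  # -> 0.73%
```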


2.2 HD – Reliability

(25)

The failure rate during a hard disk’s lifespan is not constant

It can be better modeled by the “bathtub curve”, which has 3 components:

– Infant mortality failures

– Random failures

– Wear-out failures

2.2 HD – Reliability

(26)

• Report by Google

– 100,000 consumer grade disks (80-400GB, ATA Interface, 5,400- 7,200 RPM)

• Results (among others)

Drives fail often!

– There is infant mortality

– High usage increases infant mortality, but not later failure rates

– Observed AFR is around 7% and MTBF 16.6 years!


Failure Trends in a Large Disk Drive Population. E. Pinheiro, W.-D. Weber, L. A. Barroso. 5th USENIX Conference on File and Storage Technologies (FAST), 2007

2.2 Real World Failure Rates

Careful: 2+ year results are biased. See reference.

(27)

Seagate ST32000641AS 2 TB (Desktop Harddrive, 2011)

Manufacturer’s specifications

2.2 HD – Example Specs

Specification Value

Capacity 2 TB

Platters 4

Heads 8

Cylinders 16,383

Sectors per track 63

Bytes per sector 512

Spindle Speed 7200 RPM

MTBF 85 years

AFR 0.34 %

(28)

• Assume a storage need of 100 TB. Only following HDs are available

– Capacity: 1 TB capacity each

– MTBF: 100,000 hours each (ca. 11 years)

• Consider using 100 of these disks independently (w/o RAID).

– Total storage: 100,000 GB = 100 TB

– MTBF: 1,000 hours (ca. 42 days)

– THIS IS BAD!

• More sophisticated ways of using multiple disks are needed
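Under the independence assumption, the failure rates of the disks add up, so the array’s MTBF is MTBF_disk / N — a quick sketch of the calculation above:

```python
# Sketch: MTBF of N independently used disks with constant failure rates.
def mtbf_array(mtbf_disk_hours, n_disks):
    # failure rates add, so the mean time between failures divides by N
    return mtbf_disk_hours / n_disks

hours = mtbf_array(100_000, 100)
print(hours, round(hours / 24))  # -> 1000.0 42  (about 42 days)
```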


2.2 Reliability Considerations

(29)

• Alternative to hard-drives: SSD

Use microchips which retain data in non-volatile memory chips and contain no moving parts

• Use the same interface as hard disk drives

Easy replacement in most applications possible

• Key components

Memory

Controller

2.2 Solid State Disk (SSD)

(30)

Flash memory

– Most SSDs use NAND-based flash memory

– Retains data even without power

– Slower than DRAM solutions

– Single-level cell versus multi-level cell

– Wears down!

DRAM

– Uses volatile random access memory

– Ultrafast data access (< 10 microseconds)

– Sometimes uses an internal battery or external power device to ensure data persistence

– Only for applications that require even faster access but do not need data persistence after power loss


2.2 Memory

(31)

• The controller is an embedded processor

• Incorporates the electronics that bridge the NAND memory components to the host computer

• Some of its functions

error correction, wear leveling, bad block mapping, read and write caching, encryption, garbage collection

2.2 Controller

(32)

• Advantages

Low access time and latency

No moving parts → shock resistant

MTBF about 2 million hours

Lighter and more energy-efficient than HDDs

• Disadvantages

Divided into blocks/pages

If one byte changes the whole page has to be written

The old page will be marked as stale

Only whole blocks can be deleted

Limited ability of being rewritten (between 3,000 and 100,000 cycles per page)

Wear leveling algorithms assure that write operations are equally distributed to the pages


2.2 SSD – Summary

(33)

The disk controller organizes low level access to the disk

e.g. head positioning, error checking, signal processing

Usually integrated into the disk

Provides unified and abstracted interface to access the disks (e.g. LBA)

Connects the disk to a peripheral bus (e.g. IDE, SCSI, FibreChannel, SAS)

The host bus adapter (HBA) bridges between the peripheral bus and the system’s internal bus (like PCIe, PCI)

The internal bus is usually integrated into the system’s main board

Often confused with the disk controller

DAS (Directly Attached Storage)

2.2 HD – Controller

[Figure: disk controller → peripheral bus → host bus adapter → internal bus, inside the system / mainboard]

(34)

• Sectors can be logically grouped to blocks by the operating system

Sectors in a block do not necessarily need to be adjacent

e.g. NTFS defaults to 4 KiB per block

8 sectors on a modern disk

The hardware address of a block is a combination of cylinder number, surface number, and block number within the track

The controller maps the hardware address to a logical block address (LBA)
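As a sketch, the classic CHS-to-LBA translation a controller performs might look as follows (the geometry values are illustrative; the traditional 1-based sector numbering is assumed):

```python
# Sketch: classic CHS (cylinder/head/sector) -> LBA translation.
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """sector is 1-based, as in the traditional CHS scheme."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Illustrative geometry: 8 heads, 63 sectors per track
print(chs_to_lba(0, 0, 1, 8, 63))  # -> 0   (very first block)
print(chs_to_lba(1, 0, 1, 8, 63))  # -> 504 (one full cylinder later: 8 * 63)
```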


2.2 HD – Controller

(35)

• Disk controller transfers content of whole blocks to buffer

Buffer resides in primary storage and can be accessed efficiently

Time needed to transfer a random block (4 KiB/block on a ST3100034AS): < 10 msec

Seek time: Time needed to position the head on the correct cylinder (< 8 msec)

Latency (rotational delay): Time until the correct block arrives below the head (< 0.14 msec)

Block transfer time: Time to read all sectors of a block (< 0.01 msec)

Bulk transfer rate for n adjacent blocks (< 20 msec for n = 10)
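A rough model of these components as a Python sketch. The seek time is taken from above; the rotational latency uses the standard half-rotation average at the given spindle speed (which may differ from the per-drive figure quoted above), and the 150 MB/s transfer rate is an assumed value:

```python
# Sketch: estimated average time for one random block access.
def avg_access_ms(seek_ms, rpm, block_kib, transfer_mb_s):
    latency_ms = 0.5 * 60_000 / rpm                      # half a rotation, in ms
    transfer_ms = (block_kib / 1024) / transfer_mb_s * 1000
    return seek_ms + latency_ms + transfer_ms

# e.g. 8 ms seek, 7200 RPM, 4 KiB block, 150 MB/s sustained transfer:
print(round(avg_access_ms(8, 7200, 4, 150), 2))  # -> 12.19
```

The breakdown makes the point of the next slide obvious: seek and rotation dominate, the actual block transfer is negligible.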

2.2 HD – Controller

(36)

Locating data on a disk is a major bottleneck

Try operating on data already in buffer

Aim for bulk transfer, avoid random block transfer


2.2 HD – Controller

(37)

• A single HD is often not sufficient

Limited capacity

Limited speed

Limited reliability

• Idea: Combine multiple HDs into a RAID array (Redundant Array of Independent Disks)

RAID Array treats multiple hardware disks as a single logical disk

More HDs for increased capacity

2.3 RAID

(38)

The RAID controller connects to multiple hard disks

Disks are virtualized and appear to be just one single logical disk

The RAID controller acts as an extended, specialized HBA (Host Bus Adapter)

Still DAS (Directly Attached Storage)


2.3 RAID Controller

[Figure: RAID controller between the internal bus and the peripheral bus; multiple disks represented as a single logical disk]

(39)

Mirroring (or shadowing): Increases reliability by complete redundancy

• Idea: Mirror Disks are exact copies of original disk

Not space efficient

• Read speed can be n times as fast, write speed does not increase

• Increases reliability. Assume:

– Two disks with an MTBF of 11 years each

One original disk, one mirror disk

Assume disk failures are independent of each other (unrealistic)

Disk replacement time of 10 hours

2.3 RAID Principles – Mirroring

(40)

Striping: Improve performance by parallelism

• Idea: Distribute data among all disks for increased performance

Bit-level striping: Split all bits of a byte across the disks

– e.g. for 8 disks, write the i-th bit to disk i

– Number of disks needs to be a power of 2

– Each disk is involved in each access

Access rate does not increase

Read and write transfer speed increases linearly

Simultaneous accesses are not possible

– Good for speeding up few, sequential, large accesses

[Silber 11.3]

2.3 RAID Principles – Striping

(41)

Block Level Striping: Distribute blocks among the disks

Only one disk is involved in reading a specific block

Read and write speed of a single block is not increased

Other disks are still free to read/write other blocks

Read and write speed of multiple parallel accesses increases

Good for large number of parallel accesses

2.3 RAID Principles – Striping

(42)

Error Correction Codes: Increase reliability with computed redundancy

Hamming codes (~1940)

Can detect and repair 1-bit errors within a set of n data bits by computing k parity bits:

n = 2^k − k − 1

n = 1, k = 2; n = 4, k = 3; n = 11, k = 4; n = 26, k = 5; …

Especially used for in-memory and tape error correction

Media cannot detect errors autonomously

Not really used for hard drives anymore
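The relation n = 2^k − k − 1 is easy to tabulate, reproducing the sequence above:

```python
# Sketch: number of data bits n protectable by k Hamming parity bits.
def data_bits(k):
    return 2**k - k - 1

print([(k, data_bits(k)) for k in range(2, 6)])
# -> [(2, 1), (3, 4), (4, 11), (5, 26)]
```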


2.3 RAID Principles – Error Correction Codes

(43)

Interleaved Parity (Reed–Solomon algorithm on the GF(2) Galois field)

– Can repair 1-bit errors (when the error is known)

– Hard Disks can detect read errors themselves, no need for complete Hamming codes

– Basic Idea:

From n data pieces D1, …, Dn compute a parity piece Dp by combining the data using logical XOR (eXclusive OR)

XOR is associative and commutative

Important: A XOR B XOR B = A

i.e. Dp = D1 XOR D2 XOR … XOR Dn

2.3 RAID Principles – Error Correction Codes

(44)

• Interleaved parity. Example:

• A = 0101, B = 1100, C = 1011

• P = A XOR B XOR C = 0010

• C is lost. Since P = A XOR B XOR C, it follows that C = P XOR A XOR B:

– C = A XOR B XOR C XOR A XOR B

– C = A XOR A XOR B XOR B XOR C

– C = 0 XOR C

– C = 1011


2.3 RAID Principles – Interleaved Parity

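The worked example can be replayed in a few lines of Python (illustrative sketch; XOR is `^` on integers):

```python
# Sketch: interleaved parity — rebuild a lost piece by XORing the parity
# with all surviving pieces.
from functools import reduce

def parity(*pieces):
    return reduce(lambda a, b: a ^ b, pieces)

A, B, C = 0b0101, 0b1100, 0b1011
P = parity(A, B, C)
print(f"{P:04b}")                # -> 0010
print(f"{parity(P, A, B):04b}")  # -> 1011  (the lost C, reconstructed)
```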

(45)

The 3 RAID principles can be combined in multiple ways

Not every combination is useful

This led to the definition of 7 core RAID levels

RAID 0 – RAID 6

The most dominant levels are RAID 0, RAID 1, RAID 1+0, RAID 5

In the following examples, assume:

– An MTBF of 100,000 hours (11.42 years) per disk

– A Mean Time To Repair (MTTR) of 6 hours

– The failure rate is constant and failures between disks are independent

– MTBF_raid is the mean time to data loss within the RAID if each failing disk is replaced within the MTTR

– D is the number of drives in the RAID set

– C = 200 GB is the capacity of one disk, C_raid the capacity of the whole RAID

2.3 RAID in practical applications

(46)

• Mean Time to Repair (MTTR)

MTTR = TimeToNotice + RebuildTime

– Assume a time to notice of 0.5 hours

– Rebuild time is the time for completely writing back the lost data

Assume a disk capacity of 200 GB

Write-back speed of 10 MB/sec

Consists of reading the remaining disks and computing parity / reconstructing data

Rebuild time is around 5.5 hours

– During a rebuild, a RAID is especially vulnerable

• MTTR = 6 hours


2.3 RAID in practical applications

(47)

File A (A1-Ax), File B (B1-Bx), File C (C1-Cx)

Raid 0

Block-Level-Striping only

Increased parallel access and transfer speeds, reduced reliability

All disks contain data (0% overhead)

Works with any number of disks

MTBF_raid = MTBF_disk / D

4 disks:

MTBF_raid = 2.86 years

C_raid = 800 GB (0 GB wasted (0%))

Common size: 2 disks

2.3 RAID Levels

(48)

Raid 1

Mirroring only

Increased reliability, increased read transfer speed, low space efficiency

MTBF_raid = MTBF_disk^D / (D! · MTTR^(D−1))

4 disks:

MTBF_raid = 2.2 trillion years

C_raid = 200 GB (600 GB wasted (75%))

(The age of the universe may be around 15 billion years…)

Common size: 2 disks

MTBF_raid = 95,130 years

C_raid = 200 GB (200 GB wasted (50%))


2.3 RAID Levels

(49)

RAID 2

Not used anymore in practice

was used in old mainframes

Bit-level striping

Uses Hamming codes

Usually Hamming code (7,4) – 4 data bits, 3 parity bits

Reliable 1-bit error recovery (i.e. one disk may fail)

3 redundant disks per 4 data disks (75% overhead)

Ratio is better for larger numbers of disks

MTBF_raid = MTBF_disk² / (D · (D−1) · MTTR)

7 disks (does not really make sense for 4 – not comparable to the other values):

MTBF_raid = 4,530 years

2.3 RAID Levels

(50)

RAID 3

Interleaved parity

Byte-level striping

Dedicated parity disk

Bottleneck! Every write operation needs to update the parity disk.

No parallel writes

1 redundant disk per n data disks

Overhead decreases with the number of disks, while reliability also decreases

25% overhead for 4 data disks

MTBF_raid = MTBF_disk² / (D · (D−1) · MTTR)

4 disks:

MTBF_raid = 15,854 years

C_raid = 600 GB (200 GB wasted (25%))


2.3 RAID Levels

(51)

RAID 4

Block-Level Striping

As RAID 3 otherwise

4 disks (common size):

MTBF_raid = 15,854 years

C_raid = 600 GB (200 GB wasted (25%))

5 disks (also common size):

MTBF_raid = 9,513 years

C_raid = 800 GB (200 GB wasted (20%))

2.3 RAID Levels

(52)

RAID 5

Parity is distributed among the hard disks

May allow for parallel block writes

– As RAID 4 otherwise

– Bottleneck when writing many files smaller than a block

The whole parity block has to be read and re-written for each minor write

– Can recover from a single disk failure

– MTBF_raid and C_raid as for RAID 3 & 4


2.3 RAID Levels

(53)

RAID 6

Two independent parity blocks distributed among the disks

May be implemented by parity on orthogonal data or by using Reed–Solomon codes on GF(2⁸)

As RAID 5 otherwise

2 redundant disks per n data disks

Can recover from a double disk failure

No vulnerability during a single-failure rebuild

Very suitable for larger arrays

Write overhead due to the more complicated parity computation

MTBF_raid = MTBF_disk³ / (D · (D−1) · (D−2) · MTTR²)

4 disks:

MTBF_raid = 132 million years

C_raid = 400 GB (400 GB wasted (50%))

8 disks (common)

2.3 RAID Levels

(54)

Additionally, there are hybrid levels combining the core levels

– RAID 0+1, RAID 1+0, RAID 5+0, RAID 5+1, RAID 6+6, …

RAID 1+0

– Mirrored sets nested in a striped set

RAID 0 on top of sets of RAID 1 sets

– Very high read and write transfer speeds, increased reliability, low space efficiency, limited maximum size

– Most performant RAID combination

– D1 = drives per RAID 1 set, D0 = number of RAID 1 sets

MTBF_raid = (MTBF_disk^D1 / (D1! · MTTR^(D1−1))) / D0

– 4 disks: D1 = 2, D0 = 2

MTBF_raid = 47,565 years

C_raid = 400 GB (400 GB wasted (50%))

– 6 disks: D1 = 2, D0 = 3

MTBF_raid = 31,706 years

C_raid = 600 GB (600 GB wasted (50%))
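The MTBF formulas of the preceding slides can be checked with a small Python sketch, using the assumptions from above (100,000 h per disk, 6 h MTTR; the printed values differ from the slide figures only by rounding):

```python
# Sketch: MTBF formulas for the common RAID levels, in hours per disk.
from math import factorial

MTBF, MTTR = 100_000, 6
years = lambda h: h / 8760                       # hours -> years

raid0  = lambda d: MTBF / d                      # striping only
raid1  = lambda d: MTBF**d / (factorial(d) * MTTR**(d - 1))   # mirroring
raid5  = lambda d: MTBF**2 / (d * (d - 1) * MTTR)             # single parity
raid10 = lambda d1, d0: raid1(d1) / d0           # mirrored sets, striped

print(f"{years(raid0(4)):.2f}")     # -> 2.85   (slide: 2.86)
print(f"{years(raid1(2)):.0f}")     # -> 95129  (slide: 95,130)
print(f"{years(raid5(4)):.0f}")     # -> 15855  (slide: 15,854)
print(f"{years(raid10(2, 2)):.0f}") # -> 47565
```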


2.3 Practical use of RAIDS

(55)

• RAID controllers directly connect storage to the system bus

– Storage available to only one system/ server/ application

• Number of disks is limited

– Consumer-grade RAID: 2–4 disks

– Enterprise-grade RAID: 8–24+ disks

• Solutions

NAS (Network Attached Storage): Provide abstracted file systems via network (software solution)

SAN (Storage Area Network): Virtualized logical storage within a specialized network on block level (hardware solution)

2.4 Beyond RAID

(56)

• Before discussing NAS, we need file systems

• A file system is software for abstracting file operations on a logical storage device

– Files are a collection of binary data

Creating, reading, writing, deleting, finding, organizing

– How does a file access translate into low-level operations on a logical storage device?

e.g. which blocks have to be read/written?

Bridge between application software and (abstracted) hardware


2.4 File Systems vs. Raw Devices

[Figure: application software → file system → logical storage]

(57)

• Raw device access allows applications to bypass the OS and the file system

• The application may directly tune aspects of the physical storage

– There is still the hard drive controller… so it’s not really direct

• May lead to very efficient implementations

2.4 File Systems vs. Raw Devices

[Figure: application software accessing logical storage directly, bypassing the file system]

(58)

Idea: Provide a remote file system using already available network infrastructure

NAS: Network Attached Storage

Use specialized network protocols (e.g. CIFS, NFS, FTP, etc)

Easiest case: File Server (e.g. Linux+Samba)

Advantages:

– Easy to set up, easy to use, cheap infrastructure

– Allows sharing of storage among several systems

– Abstracts on the file-system level (easy for most applications)

Disadvantages

Inefficient and slow

large protocol and processing overhead

Abstracts on file system level (not suitable for special purposes like raw devices or storage virtualization)


2.4 NAS – Network Attached Storage

[Figure: application software → file system → network → NAS server hosting the logical storage]

(59)

SANs offer specialized high-speed networks for storage devices

Usually uses local FibreChannel networks

Remote locations may be connected via Ethernet or IP-WAN (Internet)

Network uses specialized storage protocols

iFCP (SCSI on FibreChannel)

iSCSI (SCSI on TCP/IP)

HyperSCSI (SCSI on raw ethernet)

SANs provide raw block-level access to logical storage devices

– Logical disks of any size can be offered by the SAN

– For a client system using a logical disk, it appears like a local disk or RAID

– The client system has full control over the file systems on the logical disk

2.4 SAN – Storage Area Network

Application Software

File System

SAN

(60)


2.4 SAN – Storage Area Network

[Figure: SAN topology — servers with SAN HBAs connected via SAN switches over a SAN bus (iFCP); disks attached via a SAN/RAID HBA and a peripheral bus (SCSI, SAS, etc.); a NAS head exporting a NAS protocol (CIFS) to an Ethernet network; remote sites linked via a WAN SAN bus (HyperSCSI).]

(61)

Advantages:

Very efficient

Highly optimized local network infrastructure

Optimized protocols with low overhead

Very flexible (any number of systems may use any number of disks at any location)

Helps for disaster protection

SAN can transparently span to even remote locations

May also employ NAS heads for NAS-like behavior

Disadvantages

2.4 SAN – Storage Area Network

(62)

• There are different types of storage

Usually, there is a storage hierarchy

Faster, smaller, more expensive storage

Slower, bigger, less expensive storage

• Hard drives are currently the most popular media

Mechanical device

High sequential transfer rates

Bad random access times, low random transfer rates

Prone to failure

DBMS must be optimized for the used storage devices!


2 Physical Storage

(63)

• Access Paths

Physical Data Access

Index Structures

Physical Tuning

2 Next Lecture
