Distributed Data Management

(1)

Prof. Dr. Wolf-Tilo Balke

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Distributed Data Management

(2)

Appointments for the oral exams can be made for the following dates:

– 23 and 24 July

– 16 and 17 August

– 6, 7, 27, and 28 September

Please contact our secretary Regine Dalkiran soon to get your individual slot.

Exams

(3)

9.1 Basic Chord Durability 9.2 Load Balancing

9.3 Power of Two Choices 9.4 Virtual Servers

9.5 LOCKSS

9.6 Special Purpose Databases

9.0 Durability

(4)

• Remember the Chord DHT

– Hash function for hashing data and nodes alike

– Each node is responsible for the address arc between itself and the previous node

9.0 Basic Chord

[Figure: Chord ring over the example key space 0…7, showing identifiers and nodes; e.g. successor(1) = 6, successor(6) = 7, successor(7) = 1]

(5)

A new node takes over some responsibility from an older node

i.e. key-value pairs are moved to the new node

Each node “knows” some other nodes

Finger table with increasingly distant nodes for 𝑂(log 𝑛) routing (see the sketch below)

– Finger distance based on address space

Successor list of the next 𝑘 nodes in ring for supporting stabilization

– Independent from address space distance

9.0 Basic Chord

[Figure: example ring showing the responsible arc of node 7, the fingers of 7 (all pointing to 16), the 2-predecessors and 2-successors of 7, and the stored data]
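The following minimal Python sketch (node IDs, ring size, and helper names are assumptions, not taken from the slides) illustrates how a node's finger table and successor list could be computed:

```python
M = 6                       # bits in the identifier space -> ring size 2^M
RING = 2 ** M

def successor(node_ids, key):
    """First node whose ID is >= key (mod RING), wrapping around the ring."""
    candidates = sorted(node_ids)
    for n in candidates:
        if n >= key % RING:
            return n
    return candidates[0]    # wrap-around

def finger_table(node_ids, n):
    """finger[i] = successor(n + 2^i) for i = 0..M-1 -> O(log n) routing hops."""
    return [successor(node_ids, (n + 2 ** i) % RING) for i in range(M)]

def successor_list(node_ids, n, k=2):
    """The next k nodes on the ring, independent of address-space distance."""
    ring = sorted(node_ids)
    idx = ring.index(n)
    return [ring[(idx + i) % len(ring)] for i in range(1, k + 1)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # example node IDs (assumed)
print(finger_table(nodes, 8))        # [14, 14, 14, 21, 32, 42]
print(successor_list(nodes, 8, k=2)) # [14, 21]
```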

(6)

Stabilize function continuously fixes broken finger table and successor list entries

Links to departed, unreachable, or failed nodes will be repaired

DHT routing will be resilient to failures

But: Basic Chord does not offer any data durability

Direct Storage:

Stored data and tuples are lost when a node fails!

Indirect Storage

Uses soft states to ensure timely updates of indirect links

Data is lost if the data-providing node fails!

This lecture: How can we introduce data durability to Chord?

9.0 Basic Chord

(7)

More issues with basic Chord

Hash function evenly distributes keys and nodes across the address space

Basic idea of hashing: even load distribution to the buckets

– But: often, this will not result in a load balanced system

User queries are usually not evenly distributed

“Hot topics” and “Long Tail”; i.e. data everybody wants and data nearly nobody wants

Even using a good hash function will not result in equal load distribution for nodes

Balancing necessary

Also this lecture: Load Balancing for DHTs

9.0 Basic Chord

(8)

For achieving durability in Chord, replication is needed

– 𝑘-resilient: 𝑘 nodes need to crash to lose data

– Simple Replication Strategies

• Just keep multiple copies

• Create new copies if a copy is lost

Load Balancing Replication

• Keep multiple copies

• Keep more copies of popular or high-demand data

9.1 Basic Chord Durability

(9)

Multiple Copies using Successor List

– Store data at responsible node

Additionally, replicate data to the 𝑘 next other nodes

After a node fails, stabilize will repair routing

• After routing is repaired, replicate to the next successor(s) until data is again replicated to 𝑘 additional nodes

9.1 Basic Chord Durability

[Figure: data stored at the responsible node and replicated to its 𝑘 successors]
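A minimal sketch of successor-list replication (class and method names are assumptions, not from the slides):

```python
# Minimal sketch of successor-list replication (illustrative only).
class ChordNode:
    def __init__(self, node_id, k=3):
        self.node_id = node_id
        self.k = k                 # replication factor: data survives k failures
        self.successors = []       # next k nodes on the ring (kept up to date by stabilize)
        self.data = {}             # key -> value for the node's own arc

    def store(self, key, value):
        """Store at the responsible node, then replicate to the k successors."""
        self.data[key] = value
        for succ in self.successors[:self.k]:
            succ.data[key] = value

    def on_successor_failed(self, new_successor):
        """After stabilize has repaired the ring, re-replicate so that the data
        is again held by k additional nodes."""
        for key, value in self.data.items():
            new_successor.data[key] = value
```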

(10)

Advantages

– After a node failure, its successor has the data already stored

• System function is not interrupted

Disadvantages

– Node stores 𝑘 intervals

• More data load

• Data localization more fuzzy

– After breakdown of a node

• Find new successor

• Replicate data to next successor

Message overhead during repair

– Stabilize function has to check every successor list

• Find inconsistent links

More message overhead

9.1 Basic Chord Durability

(11)

Multiple nodes per interval

– Responsibility of an address arc is fully shared by at least 𝑘 nodes

– New nodes arriving will be assigned to an arc

• New node obtains a copy of all arc data

• Responsibility is only split if 𝑘 is significantly exceeded, e.g. at 2𝑘

New arc segment will have 𝑘 responsible nodes

New link structure: links to other nodes in same interval

New nodes are announced to all other nodes in interval

• Also possible: pass new node on to the next interval if already full

9.1 Basic Chord Durability


(12)

Data Insertion

– Replicate data to all other nodes in arc

Failure

– No copy of data needed

– Data is already stored within same interval

– If arc is critically low, borrow nodes from neighbor arcs

• Use stabilization procedure to correct fingers

– As in original Chord

• Used by e.g. Kademlia (distributed BitTorrent Tracker)

9.1 Basic Chord Durability

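A minimal sketch of the shared-arc scheme (names and the 2𝑘 split handling are illustrative assumptions, not from the slides):

```python
K = 3    # every arc is shared by at least K responsible nodes

class Arc:
    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])   # node IDs jointly responsible for this arc
        self.data = {}                   # conceptually, every member holds a full copy

    def insert(self, key, value):
        """Data insertion: replicated to all nodes of the arc (shared dict here)."""
        self.data[key] = value

    def join(self, node):
        """A new node obtains a copy of all arc data; split only if K is
        significantly exceeded (e.g. at 2K)."""
        self.nodes.append(node)
        if len(self.nodes) >= 2 * K:
            return self.split()
        return None

    def split(self):
        """Create a new arc segment that again has K responsible nodes."""
        new_arc = Arc(self.nodes[K:])
        self.nodes = self.nodes[:K]
        new_arc.data = dict(self.data)   # both halves keep copies until the key
        return new_arc                   # range is re-partitioned between them
```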

(13)

Advantages

– Failure: usually, no additional copying of data needed

– Rebuild intervals with neighbors only if critical

– Requests can be answered by 𝑘 different nodes

• Query load balancing possible

Disadvantages

– Fewer intervals than in original Chord

• Solution: Virtual Servers

9.1 Basic Chord Durability

(14)

• Load balancing goal:

– Query and/or storage load should be distributed equally over all DHT nodes

• Common assumption

– DHTs are naturally load-balanced

• Storage load balancing due to good hash function

9.2 Load Balancing

(15)

Assumption 1: uniform key distribution

– Keys are generated uniformly by hash function

Assumption 2: equal data distribution

– Uniform keys will result in uniform data

– Data is thus uniformly distributed

Assumption 3: equal query distribution

– Uniform keys will result in uniform queries

– Each node thus has a similar query load

• But are these assumptions justifiable?

9.2 Load Balancing

(16)

• Analysis of distribution of data using simulation

• Example

– Parameters

4,096 nodes

500,000 documents

– Optimum

~122 documents per node

– Some items are highly replicated due to popularity

• ⇒ No optimal distribution in Chord without load balancing

9.2 Load Balancing

[Figure: optimal distribution of documents across nodes]

(17)

• Number of nodes without storing any document

– Parameters

4,096 nodes

100,000 to 1,000,000 documents

Some nodes without any load

• Why is the load unbalanced?

• We need load balancing to keep the complexity of DHT management low

9.2 Load Balancing

(18)

• Definitions

– DHT with 𝑁 nodes

Optimally Balanced:

• Load of each node is around 1/𝑁 of the total load

A node is overloaded (or heavy)

• Node has a significantly higher load compared to the optimal distribution of load

Else the node is light

9.2 Load Balancing
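A minimal sketch of this classification (the significance threshold ε is an assumption; the slides only say "significantly higher"):

```python
# Classify nodes as heavy/light relative to the optimal 1/N share of the load.
def classify(loads, epsilon=0.25):
    """loads: list of per-node loads. Optimal share is total/N per node;
    a node is 'heavy' if it exceeds that share by more than epsilon."""
    optimal = sum(loads) / len(loads)
    return ["heavy" if load > (1 + epsilon) * optimal else "light"
            for load in loads]

print(classify([10, 200, 90, 100]))   # ['light', 'heavy', 'light', 'light']
```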

(19)

Load Balancing Algorithms

Problem

• Significant difference in the load of nodes

• There are several techniques to ensure an equal data distribution

Power of Two Choices

• (Byers et al., 2003)

Virtual Servers

• (Rao et al., 2003)

Thermal-Dissipation-based Approach

• (Rieche et al., 2004)

Simple Address-Space and Item Balancing

• (Karger et al., 2004)

– …

9.2 Load Balancing

(20)

• Algorithms

Power of Two Choices (Byers et al., 2003)

• John Byers, Jeffrey Considine, and Michael Mitzenmacher:

“Simple Load Balancing for Distributed Hash Tables” in Second International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003

– Virtual Servers (Rao et al., 2003)

9.2 Load Balancing

(21)

Power of Two Choices

One hash function for nodes

• ℎ0

Multiple hash functions for data

• ℎ1, ℎ2, ℎ3, … , ℎ𝑑

Two options

• Data is stored at one node only

• Data is stored at one node &

other nodes store a pointer

9.3 Power of Two Choices

(22)

Inserting Data x

– Results of all hash functions are calculated

• ℎ1(𝑥), ℎ2(𝑥), ℎ3(𝑥), … , ℎ𝑑(𝑥)

– Contact all 𝑑 responsible nodes

• Data is stored on the node with the lowest load

– Alternative: other nodes store pointer

– The owner of the item has to insert the document periodically

Prevent removal of data after a timeout (soft state)

9.3 Power of Two Choices
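A minimal sketch of power-of-two-choices insertion and pointer-less retrieval (deriving the 𝑑 hash functions by salting SHA-1 and the DHT/node interface are assumptions, not the construction from the paper):

```python
import hashlib

D = 4                                   # number of data hash functions h1..hd
M = 2 ** 22                             # size of the address space (assumed)

def h(i, key):
    """i-th hash function: salt the key with the function index."""
    digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
    return int(digest, 16) % M

def insert(dht, key, value):
    """Contact all d responsible nodes; store on the one with the lowest load."""
    candidates = [dht.responsible_node(h(i, key)) for i in range(1, D + 1)]
    target = min(candidates, key=lambda node: node.load())
    target.put(key, value)              # alternative: the other nodes store a pointer

def lookup(dht, key):
    """Without pointers: ask each of the d candidate nodes (the slides suggest
    doing this in parallel); only the storing node answers."""
    for i in range(1, D + 1):
        node = dht.responsible_node(h(i, key))
        value = node.get(key)
        if value is not None:
            return value
    return None
```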

(23)

Retrieving

– Without pointers

• Results of all hash functions are calculated

• Request all of the possible nodes in parallel

• One node will answer

– With pointers

• Request only one of the possible nodes.

• Node can forward the request directly to the final node

9.3 Power of Two Choices

(24)

Advantages

– Simple

Disadvantages

– Message overhead for inserting data

– With pointers

• Additional administration of pointers leads to even more load

– Without pointers

• Message overhead for every search

9.3 Power of Two Choices

(25)

• Algorithms

– Power of Two Choices (Byers et al., 2003)

– Virtual Servers (Rao et al., 2003)

• Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica: “Load Balancing in Structured P2P Systems” in 2nd Int. Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003

9.4 Virtual Servers

(26)

[Figure: Chord ring with intervals assigned to virtual servers]

Each node is responsible for several intervals

• i.e. acts as multiple nodes

• log(𝑛) virtual servers

• "Virtual server"

9.4 Virtual Servers

(27)

• Each node is responsible for several intervals

Load balancing is achieved by creating or transferring virtual servers

• Virtual servers take over responsibility for an arc and obtain copies of data

• If a node is too heavy, it can transfer the virtual server to another node

– Different possibilities to change servers

• One-to-one

• One-to-many

• Many-to-many

9.4 Virtual Servers


(28)

Rules for transferring a virtual server

Transfer from heavy node to light node

– The transfer of a virtual server should not make the receiving node heavy

Receiving node should have enough capacity

The transferred virtual server is the lightest virtual server that makes the heavy node light

Transfer as much as needed, but not more

– If no single virtual server can make the node light, just transfer the heaviest one

In a second iteration, another virtual server can be transferred to another node

9.4 Virtual Servers
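A minimal sketch of this selection rule (the capacity model and names are assumptions, not from the slides):

```python
def pick_transfer(heavy_node_servers, heavy_load, light_load, capacity):
    """heavy_node_servers: list of (server_id, server_load) on the heavy node.
    Returns the lightest virtual server whose removal makes the heavy node
    light, provided the receiving light node stays within its capacity;
    otherwise the heaviest transferable server (a second iteration may then
    move another server to another node)."""
    target = capacity                     # load threshold for being 'light'
    fitting = [s for s in sorted(heavy_node_servers, key=lambda s: s[1])
               if heavy_load - s[1] <= target        # makes the sender light
               and light_load + s[1] <= target]      # does not overload the receiver
    if fitting:
        return fitting[0]                 # lightest one that is sufficient
    transferable = [s for s in heavy_node_servers
                    if light_load + s[1] <= target]
    return max(transferable, key=lambda s: s[1], default=None)

servers = [("v1", 10), ("v2", 25), ("v3", 40)]
print(pick_transfer(servers, heavy_load=75, light_load=30, capacity=60))  # ('v2', 25)
```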

(29)

• Scheme: One-to-One

– Light node picks a random ID

– Contacts the node 𝑥 responsible for it

– Accepts load if 𝑥 is heavy

9.4 Virtual Servers

[Figure: ring of light (L) and heavy (H) nodes]

(30)

• Scheme: One-to-Many

– Light nodes report their load information to directories

– Heavy node 𝐻 requests information on light nodes from the directory

• 𝐻 contacts the light node which can accept the excess load directly

9.4 Virtual Servers

[Figure: heavy nodes H1–H3 and light nodes L1–L5 reporting their load to directories D1 and D2]

(31)

Many-to-Many

– Heavy and light nodes rendezvous with directory

– Directories periodically compute the transfer schedule and report it back to the nodes

Nodes just follow directory plan

9.4 Virtual Servers

[Figure: heavy nodes H1–H3 and light nodes L1–L5 matched via directories D1 and D2]
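A minimal sketch of how a directory might compute such a transfer schedule (greedy matching over load amounts is an assumption; the slides do not prescribe an algorithm, and in practice whole virtual servers rather than arbitrary load amounts are moved):

```python
def transfer_schedule(heavy, light, capacity):
    """heavy: {node: load}, light: {node: load}. Returns a list of
    (from_node, to_node, amount) moves that the nodes then simply execute."""
    schedule = []
    spare = {n: capacity - load for n, load in light.items()}
    for h_node, h_load in sorted(heavy.items(), key=lambda x: -x[1]):
        excess = h_load - capacity
        for l_node in sorted(spare, key=spare.get, reverse=True):
            if excess <= 0:
                break
            move = min(excess, spare[l_node])
            if move > 0:
                schedule.append((h_node, l_node, move))
                spare[l_node] -= move
                excess -= move
    return schedule

print(transfer_schedule({"H1": 90, "H2": 70}, {"L1": 20, "L2": 40}, capacity=60))
# [('H1', 'L1', 30), ('H2', 'L2', 10)]
```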

(32)

Virtual Servers

Advantages

• Easy shifting of load

– Whole Virtual Servers are shifted

Disadvantages

• Increased administrative and message overhead

– Maintenance of all Finger-Tables

• A lot of load is shifted

9.4 Virtual Servers

(33)

• Simulation

Scenario

4,096 nodes

100,000 to 1,000,000 documents

Chord

M = 22 bits

Consequently, 2^22 = 4,194,304 possible nodes and documents

Hash function

SHA-1 (mod 2^m)

random

Analysis

Up to 25 runs per test

9.4 Virtual Servers

(34)

9.4 Virtual Servers

– Without load balancing: + simple, – bad load balancing

– Power of Two Choices: + simple, + lower load, – nodes w/o load remain

– Virtual Servers: + no nodes w/o load, – higher max. load than Power of Two Choices

(35)

Stands for: Lots Of Copies Keep Stuff Safe

Goal: disaster-proof long-term preservation of digital content

– Idea: distributing copies over the network will make access easy and keep material online, even in face of peer faults

– http://www.lockss.org

• HP Labs 1999

• Currently, many libraries world-wide participate in LOCKSS to preserve their digital content

– Base motivation: digital content is part of the world heritage and should be protected and preserved

• “...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.” — Thomas Jefferson, February 18, 1791

9.5 LOCKSS

(36)

LOCKSS is not a traditional archive

– Archives are for materials that are hard to replicate

e.g. an original book from the Middle Ages

– Archives sacrifice access to ensure preservation

e.g. disaster-proof underground archive

LOCKSS ensures ubiquitous access and preservation of digitally replicable material

– Allowing access puts preservation at risk, but risk can be minimized

• Central Question

– How do you ensure that copies in the system are not compromised and never lost?

9.5 LOCKSS

(37)

Design Goals of LOCKSS

Be affordable

Cheap hardware

Open-source software

Low administration “appliance”

Provide high data resilience and scalability

Provide heavy replication resilient to attacks and disasters

Scale to enormous rates of publishing

Allow access

Allow search and access features

Conform to publishers' access controls

– Libraries take custody of content

9.5 LOCKSS

(38)

• Why is Long-Term Storage Hard?

– Large-scale disaster

– Human error

– Media faults

– Component faults

– Economic faults

– Organized attack

– Organizational faults

– Media/hardware obsolescence

– Software/format obsolescence

– Lost context/metadata

9.5 LOCKSS

(39)

• Solving the problem

– Use a globally distributed P2P infrastructure

e.g. hosted by libraries

– Allows for affordable cost models

Commodity hardware

Reduce on-going costs

Replicate content, break correlations between replicas

Geographic, administrative, platform, media, formats…

Audit replicas proactively to detect damage

Data must be accessible to do this cheaply!

Regularly migrate content to maintain usability

To new hardware, formats, keys…

Avoid external dependencies

E.g. vendor lock-in, DRM issues

– Plan for data exit

9.5 LOCKSS

(40)

• Exploit existing replication

Testbed: electronic journals in libraries

Many libraries subscribe to the same materials

– Appliances used by libraries around the world

Cheap PC with some storage

Libraries maintain existing relationships with publishers

Materials are subscribed to be collected/preserved

Run a P2P audit/repair protocol between LOCKSS peers

Not a file sharing application

Survive or degrade gracefully in the face of attacks

Latent storage faults & sustained attacks

9.5 LOCKSS

(41)

• How does LOCKSS actually work?

The LOCKSS audit/repair protocol

A peer periodically audits its own content

• To check its integrity

• Calls an opinion poll on its content every 3 months

• Gathers repairs from peers

– Raises alarm when it suspects an attack

• Correlated failures

• IP address spoofing

• System slowdown

9.5 LOCKSS

(42)

• Sampled Opinion Poll

– Each peer holds a poll for each document

• Reference list of peers it has discovered

• History of interactions with others (balance of contributions)

– Periodically (faster than rate of storage failures)

• Poller takes a random sample of the peers in its reference list

• Invites them to vote: send a hash of their replica

– Compares votes with its local copy

• Overwhelming agreement (>70%) ⇒ Sleep blissfully

• Overwhelming disagreement (<30%) ⇒ Repair

• Too close to call ⇒ Raise an alarm

– Repair: peer gets pieces of replica from disagreeing peers

• Re-evaluates the same votes

– Every peer is both poller and voter

9.5 LOCKSS
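A minimal sketch of the sampled opinion poll (sample size and the peer interface are assumptions; only the 70%/30% thresholds come from the slide):

```python
import hashlib
import random

def opinion_poll(my_replica, reference_list, sample_size=10):
    """Poller hashes its own replica, asks a random sample of peers from its
    reference list for the hash of theirs, and acts on the level of agreement."""
    my_vote = hashlib.sha1(my_replica).hexdigest()
    voters = random.sample(reference_list, min(sample_size, len(reference_list)))
    votes = [peer.vote() for peer in voters]          # each peer hashes its replica
    agreement = sum(v == my_vote for v in votes) / len(votes)

    if agreement > 0.7:
        return "sleep"        # overwhelming agreement -> content looks intact
    if agreement < 0.3:
        return "repair"       # fetch pieces from disagreeing peers, re-evaluate votes
    return "alarm"            # too close to call -> suspect an attack
```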

(43)

• Most replicas the same

– No alarms

• Some replicas corrupted

– Alarms very likely

– To achieve full corruption:

• Adversary must pass through the “moat” of alarming states

• Damaged peers vote with undamaged peers

• Rate limitation helps

9.5 LOCKSS

(44)

• Probability of Irrecoverable Damage

9.5 LOCKSS

(45)

Traditional databases are usually all-purpose systems

– e.g. DB2, Oracle, MySQL, …

– Theoretically, general-purpose DBs provide all features needed to develop any data-driven application

Powerful query languages

SQL can be used to update and query data; even very complex analytical queries are possible

Expressive data model

• Most data modeling needs can be served by the relational model

9.6 Special Purpose Databases

(46)

Full transaction support

Transactions are guaranteed to be “safe”

i.e. ACID transaction properties

System durability and security

Database servers are resilient to failures

Log files are continuously written

» Transactions running during a failure can be recovered

Most databases have support for constant backup

» Even severe failures can be recovered from backups

Most databases support “hot standby”

» A 2nd database system running simultaneously can take over in case of severe failure of the primary system

Most databases offer basic access control

9.6 Special Purpose Databases

(47)

• In short, databases could be used as storage solutions in all kinds of applications

Furthermore, we have shown distributed databases which also support all features known from classical all-purpose databases

– In order to be distributed, additional mechanisms were needed

partitioning, fragmentation, allocation, distributed transactions, distributed query processor,….

9.6 Special Purpose Databases

(48)

However, classical all-purpose databases may lead to problems in extreme conditions

Problems when being faced with massively high query loads

i.e. millions of transactions per second

Load too high for a single machine or even a traditional distributed database

Limited scaling

Problems with fully global applications

Transactions originate from all over the globe

Latency matters!

Data should be geographically close to users

Claims:

Amazon: increasing the latency by 10% will decrease the sales by 1%

9.6 Special Purpose Databases

(49)

Problems with extremely high availability constraints

• Traditionally, databases can be recovered using logs or backups

• Hot-Standbys may help during repair time

• But for some applications, this is not enough:

Extreme Availability (Amazon)

– “… must be available even if disks are failing, network routes are flapping, and several data centers are destroyed by massive tornados”

– Additional availability and durability concepts needed!

9.6 Special Purpose Databases

(50)

• In extreme cases, specialized database-like systems may be beneficial

– Specialize on certain query types

Focus on a certain characteristic

• i.e. availability, scalability, expressiveness, etc…

– Allow weaknesses and limited features for other characteristics

9.6 Special Purpose Databases

(51)

• Typically, two types of queries can be identified in global businesses

OLTP queries

OnLine Transaction Processing

Typical business backend-data storage

• i.e. order processing, e-commerce, electronic banking, etc.

Focuses on data entry and retrieval

Usually, the possible transactions are known in advance and are only parameterized at runtime

The transaction load is very high

• Represents daily business

Each transaction is usually very simple and local

• Only few records are accessed in each transaction

• Usually, only basic operations are performed

9.6 Special Purpose Databases

(52)

OLAP queries

OnLine Analytical Processing

– Business Intelligence Queries

• i.e. complex and often multi-dimensional queries

– Usually, only few OLAP queries are issued by business analysts

• Not part of daily core business

– Individual queries may need to access large amounts of data and use complex aggregations and filters

• Runtime of a query may be very high

9.6 Special Purpose Databases

(53)

In recent years, “NoSQL” databases have become a very popular topic of discussion

– Careful: big misnomer!

• Does not necessarily mean that no SQL is used

– There are SQL-supporting NoSQL systems…

• NoSQL usually refers to “non-standard” architectures for database or database-like systems

– i.e. system not implemented as shown in RDB2

• Not formally defined, more used as a “hype” word

Popular base dogma: Keep It Stupid Simple!

9.6 Special Purpose Databases

(54)

• The NoSQL movement popularized the development of special purpose databases

In contrast to general purpose systems like e.g. DB2

• NoSQL usually means one or more of the following

Being massively scalable

Usually, the goal is unlimited linear scalability

Being massively distributed

Being extremely available

Showing extremely high OLTP performance

Usually, not suited for OLAP queries

9.6 Special Purpose Databases

(55)

Not being “all-purpose”

Application-specific storage solutions showing some database characteristics

Not using the relational model

Usually, much simpler data models are used

Not using strict ACID transactions

No transactions at all or weaker transaction models

Not using SQL

But using simpler query paradigms

Especially, not supporting “typical” query interfaces

i.e. JDBC

Offering direct access from application to storage system

9.6 Special Purpose Databases

(56)

• In short:

– Most NoSQL systems focus on building specialized high-performance data storage systems!

9.6 Special Purpose Databases

(57)

• NoSQL and special databases have been popularized by different communities and are driven by different design motivations

• Base motivations

Extreme Requirements

• Extremely high availability, extremely high performance, guaranteed low latency, etc.

• e.g. global web platforms

Alternative data models

• Less complex data model suffices

• Non-relational data model necessary

• e.g. multi-media or scientific data

Alternative database implementation techniques

• Try to maintain most database features but lessen the drawbacks

• e.g. “traditional” database applications, e.g. VoltDB

9.6 Special Purpose Databases

(58)

Motivation: Extreme Requirements

Extreme Availability

• No disaster or failure should ever block the availability of the database

Usually achieved by strong global replication

i.e. data is available in multiple sites with completely different location and connections

Guaranteed low latency

• The distance from users to data matters in terms of latency

e.g. crossing the Pacific from east-coast USA to Asia easily amounts to 500 ms of latency

• Data should be close to users

e.g. global allocation considering the network layer’s performance

Extremely high throughput

• Some systems need to handle extremely high loads

e.g. Amazon’s four million checkouts during holidays

9.6 Special Purpose Databases

(59)

Community: Alternative Data Models

– This is where NoSQL originally came from

– Base idea:

Use a very simple data model to improve performance

No complex queries supported

e.g. Document stores

Data consists of key-value pairs and an additional document payload

e.g. payload represents text, video, music, etc.

Often supports IR-like queries on documents

e.g. ranked full text searches

Examples

CouchDB, MongoDB

9.6 Special Purpose Databases

(60)

Key-Value stores

Each record consists of just a key-value pair

Very simple data and query capabilities

Put and Get

Usually implemented on top of a Distributed Hash Table

Examples: MemcacheDB and Amazon Dynamo

Both document and key-value stores offer low-level, one-record-at-a-time data interfaces

XML stores, RDF stores, Object-Oriented Databases, etc.

Not important in the current context, as most implementations neither offer high performance nor scale well

These follow the opposite philosophy of “classic” NoSQL: they do more, not less

9.6 Special Purpose Databases
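A minimal sketch of the put/get interface layered on a DHT (the DHT interface names are assumptions, not from the slides):

```python
class KeyValueStore:
    def __init__(self, dht):
        self.dht = dht                      # e.g. a Chord-like overlay

    def put(self, key, value):
        """Route the put to the node responsible for hash(key)."""
        node = self.dht.responsible_node(self.dht.hash_key(key))
        node.store_local(key, value)

    def get(self, key):
        """Route the get the same way; there is no query language beyond this."""
        node = self.dht.responsible_node(self.dht.hash_key(key))
        return node.load_local(key)
```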

(61)

Community: Alternative Database Implementation

OLTP Overhead Reduction

– Base observation: most time in traditional OLTP processing is spent in overhead tasks

• Four major overhead sources contribute roughly equally to most of the time spent

Base idea

• Avoid all those sources of unnecessary overhead

9.6 Special Purpose Databases

(62)

Logging

“Traditional” databases write everything twice

Once to tables, once to log

Log is also forced to disk ⇒ performance issues

Locking

For ensuring transactional consistency, usually locks are used

Locks force other transactions to wait for lock release

Strongly decreases maximum number of transactions!

Latching

Updates to shared data structures (e.g. B-tree indexes) are difficult for multiple threads

Latches are used (a kind of short-term lock for shared data structures)

9.6 Special Purpose Databases

(63)

Buffer Management

• Disk-based systems have problems randomly accessing small bits of data

• Buffer management locates the required data on disk and caches the whole block in memory

• While increasing the performance of disk based systems, it still is a considerable overhead by itself

9.6 Special Purpose Databases

(64)

• Current trend for overhead avoidance

Distributed single-threaded, minimum-overhead, shared-nothing parallel main-memory databases (OLTP)

• e.g. VoltDB (Stonebraker et al.)

Sharded row stores (mostly OLAP)

• e.g. Greenplum, MySQL Cluster, Vertica, etc.

9.6 Special Purpose Databases

(65)

10.0 Trade-Offs

CAP Theorem

BASE transactions

10.1 Showcase: Amazon Dynamo

Next Lecture
