13.0 The Cloud
Academic year: 2021


Wolf-Tilo Balke Christoph Lofi

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Distributed Data Management

13.0 Cloud beyond Storage
13.1 Computing as a Service
  – SaaS
  – PaaS
  – IaaS

Distributed Data Management – Wolf-Tilo Balke – Christoph Lofi – IfIS – TU Braunschweig 2

13.0 The Cloud


• The term “cloud computing” is often seen as the successor of client-server architectures
  – Often used as a synonym for centralized, on-demand, pay-what-you-use provisioning of general computation resources
    • Comparable to utility providers such as electric power grids or water supply
    • “Computing as a commodity”
  – “Cloud” is used as a metaphor for the Internet
    • Users or applications “just use” computation resources provided on the Internet instead of using local hardware or software

“Computation resources” can mean a lot of things:

– Dynamic access to “raw metal”
  • Raw storage space or CPU time
  • Fully operational servers are provided by the cloud
– Low-level services and platforms
  • e.g. runtime platforms like the Java JRE
    » Users can run applications directly on the cloud platform
    » No own servers or platform software needed
  • e.g. abstracted storage space, such as space within a database or a file system
    » This is what we did in the last weeks!
– Software services
  • i.e. some functionality required by user software is provided “by the cloud”
    » Used via web service remote procedure calls
    » e.g. delegate the rendering of a map in a user application to Google Maps
– Full software functionality
  • e.g. rented web applications replacing traditional server or desktop applications
    » e.g. rent CRM software online from SalesForce, use Google Apps instead of MS Office, etc.

Underlying base problem

– Successfully running IT departments and IT infrastructure can be very difficult and expensive for companies
– High fixed costs
  • Acquiring and paying competent IT staff
    – “Competent” is often very hard to get…
  • Buying and maintaining servers
  • Correctly hosting hardware
    – Proper power and cooling facilities, network connections, server racks, etc.
  • Buying and maintaining software

Load and utilization issues

• How many hardware resources are required by each application and/or service?
• How to handle scaling issues?
  – What happens if demand increases or declines?
  – How to handle spike loads?
    • “Digg effect”
• Traditional data centers are notoriously underutilized, often idle 85% of the time
  – Over-provisioning for future growth or spikes
  – Insufficient capacity planning and sizing
  – Improper understanding of scalability requirements, etc.
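To make the over-provisioning point concrete, here is a minimal sketch that computes the average utilization of a data center sized for its peak load. The hourly load profile and the headroom factor are invented for illustration; they are not measurements from the slides.

```python
def average_utilization(hourly_load, headroom=1.5):
    """Average utilization when capacity is sized as peak load times a safety headroom."""
    capacity = max(hourly_load) * headroom
    return sum(hourly_load) / (len(hourly_load) * capacity)


# Hypothetical daily profile: 20 quiet hours and a 4-hour spike (requests per second).
load = [10] * 20 + [100] * 4
print(f"average utilization: {average_utilization(load):.1%}")
```

Even a short spike forces a capacity that sits mostly idle for the rest of the day, which is the effect the slide describes.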

• Cloud computing centrally unifies computation resources and provides them on demand
  – The degree of centralization and provision may differ
    • Centralize hardware within a department? A company? A number of companies? Globally?
    • Provide resources only to oneself? To some partners? To anybody?
    • How to charge for resource usage?
      – Provide resources under a rental model (e.g. monthly fee)?
      – Provide resources metered on a what-is-used basis (e.g. similar to electricity or water)?
      – Provide resources for free?
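The difference between a rental model and a metered model can be sketched as a simple break-even comparison. The fee and the hourly rate below are hypothetical values chosen for illustration only.

```python
def cheaper_model(hours_used, monthly_fee, hourly_rate):
    """Return the cheaper billing model and its cost for a given monthly usage."""
    metered_cost = hours_used * hourly_rate
    if metered_cost < monthly_fee:
        return ("metered", metered_cost)
    return ("rental", monthly_fee)


# Hypothetical prices: flat fee of $50 per month versus $0.10 per hour used.
# The break-even point is monthly_fee / hourly_rate = 500 hours.
for hours in (100, 720):
    print(hours, cheaper_model(hours, monthly_fee=50.0, hourly_rate=0.10))
```

Under these assumed prices, light users are better off with metering, while a server running around the clock is cheaper under the flat rental fee.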

• Usually, three types of clouds are distinguished
  – Public Cloud
  – Private Cloud
  – Hybrid Cloud

Public Cloud

• “Traditional” cloud computing
• Services and resources are offered via the Internet to anybody willing to pay for them
  – Users just pay for services; usually no acquisition, administration, or maintenance of hardware or software is necessary
• Services are usually provided by off-site third-party providers
  – Open for use by the general public
• Exists beyond the firewall, fully hosted and managed by the vendor
• Customers are individuals, corporations, and others
• e.g. Amazon Web Services and Google AppEngine
• Offers startups and SMBs quick setup, scalability, flexibility, and automated management. The pay-as-you-go model helps startups to start small and go big
  – Security and compliance?
  – Reliability and privacy concerns hinder the adoption of the cloud
    • Amazon S3 services were down for 6 hours
    • What will Amazon do with all the data?

Private Cloud

• Cloud computing hardware is located within the premises of a company, behind the corporate firewall
• Resources are only provided internally for various departments
• Private clouds are still fully bought, built, and maintained by the company using them
  – But not by the individual departments!
  – Still, costs could be prohibitive and might exceed those of public clouds
• Fine-grained control over resources
• More secure, as they are internal to the organization
• Resources can be scheduled and reshuffled based on business demands
• Ideal for apps with tight security and regulatory requirements
• Development requires hardware investments and in-house expertise

Hybrid Cloud

• Both private and public cloud services, or even non-cloud services, are used or offered simultaneously
• “State of the art” for most companies relying on cloud technology

Properties promised by cloud computing

– Agility
  • Resources are quickly available when needed
    – i.e. servers do not need to be ordered and built, software does not need to be configured and installed, etc.
– Costs
  • Capital expenditure is converted to operational expenditure
– Independence
  • Services are available everywhere and for any device

– Multi-tenancy
  • Resources are shared by a larger pool of users
  • Resources can be centralized, which reduces costs
    – Load distribution differs between users
    – Peak loads can usually be distributed
    – Overall utilization and efficiency of resources are better
– Reliability
  • Most cloud services promise durable and reliable resources due to distribution and replication
– Scalability
  • If a user needs more resources or performance, they can easily be provisioned
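The multi-tenancy argument (peak loads of different users rarely coincide, so shared capacity can be smaller) can be illustrated with a small sketch. The three tenant profiles are made up for the example.

```python
def sum_of_peaks(profiles):
    """Capacity needed if every tenant provisions for its own peak separately."""
    return sum(max(profile) for profile in profiles)


def peak_of_sum(profiles):
    """Capacity needed if all tenants share one pool sized for the combined peak."""
    return max(sum(slot) for slot in zip(*profiles))


# Hypothetical load per time slot for three tenants that peak at different times.
tenants = [
    [8, 2, 2, 2],
    [2, 8, 2, 2],
    [2, 2, 8, 2],
]
print("separate:", sum_of_peaks(tenants), "shared:", peak_of_sum(tenants))
```

In this toy setting the shared pool needs half the capacity of the separately provisioned systems; how large the saving is in practice depends on how strongly tenant loads are correlated.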

– Low maintenance
  • Cloud services or applications are not installed on users’ machines, but maintained centrally by specialized staff
– Transparency and metering
  • Costs for computation resources are directly visible and transparent
  • “Pay-what-you-use” models

• Cloud computing generally promises to be beneficial for fast-growing startups, SMBs, and enterprises alike
  – Cost-effective solutions to key business demands
  – Improved overall efficiency

• The cloud heavily encourages a self-service model
  – Users can just request the resources they need and use them

13.1 XaaS

Everything-as-a-Service

– In general, cloud providers offer some computation resources “as a service”
– In the long run, all computation needs of a company should be modeled, provided, and used as a service
  • e.g. Amazon’s private and public cloud infrastructures: everything is a service!

– Services provide a strictly defined functionality with certain guarantees
  • Service description and service-level agreement (SLA)
  • The service description explains what is offered by the service
  • The SLA further clarifies the provisioning guarantees
    – Often: performance, latency, reliability, availability, etc.
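An availability guarantee in an SLA translates directly into a downtime budget. The sketch below shows the arithmetic; the availability levels and the 30-day month are illustrative assumptions, not figures from any particular provider's SLA.

```python
def allowed_downtime_minutes(availability, days=30):
    """Downtime budget in minutes implied by an availability guarantee over `days` days."""
    return (1.0 - availability) * days * 24 * 60


# Illustrative availability levels ("two nines" to "four nines").
for level in (0.99, 0.999, 0.9999):
    print(f"{level:.2%} availability -> {allowed_downtime_minutes(level):.1f} min per 30 days")
```

For example, 99.9% availability leaves roughly 43 minutes of downtime per 30-day month.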

• Usually, three main resources may be offered “as a service”
  – Software as a Service (SaaS)
  – Platform as a Service (PaaS)
  – Infrastructure as a Service (IaaS)

[Figure: layer stack with the labels Server, Infrastructure, Platform, Application, Client]

• Application services (services on demand)
  – Gmail, Google Calendar
  – Payroll, HR, CRM, etc.
  – SugarCRM, IBM Lotus Live
• Platform services (resources on demand)
  – Middleware, integration, messaging, information, connectivity, etc.
  – Amazon AWS, Boomi, CastIron, Google AppEngine
• Infrastructure as a service (physical assets as services)
  – IBM Blue House, VMware Cloud Edition, Amazon EC2, Microsoft Azure Platform, …

[Figure: cloud architecture. Users (individuals, corporations, non-commercial) access the cloud; the cloud middleware covers storage provisioning, OS provisioning, network provisioning, service (apps) provisioning, SLA monitoring, security, billing, and payment; the underlying resources are services, storage, network, and OS.]

3.1 IaaS

Infrastructure as a Service (IaaS)

– Provides raw computation infrastructure, i.e. usually a virtual server
  • Successor to dedicated server rental
– For the user, a virtual server is similar to a real server
  • Has CPU cores, main memory, hard disk space, etc.
  • Usually provided as a “self-service” raw machine
  • The user is responsible for installing and maintaining software such as the operating system, databases, or server software
  • The user does not need to buy, host, or maintain the actual hardware

• The IaaS provider can host multiple virtual servers on a single real machine
  – Usually 10-30 virtual servers per real server
  – Virtualization is used to abstract server hardware for virtual servers
    • Virtual systems are also often called virtual machines (neutral term) or appliances (usually suggesting a preinstalled OS and software)
    • e.g. Cloudera appliance from exercise 12
  – Virtualization of hardware is usually handled by a so-called hypervisor, e.g. Xen, KVM, VMware, Hyper-V, …

• In short, IaaS is virtualization on multiple hardware machines
  – Normal server
    • 1 machine with one OS
  – Traditional virtualization
    • 1 machine hosting multiple virtual servers
  – Distributed application
    • 1 appliance running on multiple machines
  – IaaS
    • Multiple machines running multiple virtual servers
    • Dynamic load balancing between machines

                  1 appliance              many appliances
1 machine         “Normal” server          “Traditional” virtualization
many machines     Distributed appliance    IaaS

• The hypervisor is responsible for allocating available resources to VMs
  – Dispatch VMs to machines
  – Relocate VMs to balance load
  – Distribute resources
    • Network adaptors, logical disks, RAM, CPU cores, etc.
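Dispatching VMs to machines is essentially a bin-packing problem. The following minimal sketch uses a first-fit heuristic on a single resource (RAM in GB). This policy is an assumption made for illustration; the slides do not say which placement strategy real hypervisors or cloud schedulers use, and production systems consider several resources at once.

```python
def first_fit(vm_sizes, machine_capacity):
    """Place each VM on the first machine with enough free capacity.

    Returns a list with the machine index chosen for each VM, in input order.
    New machines are opened on demand.
    """
    free = []        # remaining capacity per opened machine
    placement = []   # machine index per VM
    for size in vm_sizes:
        if size > machine_capacity:
            raise ValueError(f"VM of size {size} exceeds machine capacity")
        for index, remaining in enumerate(free):
            if size <= remaining:
                free[index] -= size
                placement.append(index)
                break
        else:
            # No opened machine can host this VM: open a new one.
            free.append(machine_capacity - size)
            placement.append(len(free) - 1)
    return placement


# Hypothetical VM sizes (GB RAM) placed on machines with 16 GB each.
print(first_fit([8, 4, 8, 2, 6], machine_capacity=16))  # [0, 0, 1, 0, 1]
```

Relocating VMs to balance load would require re-running such a placement with current usage figures, which this sketch leaves out.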

• Usually, virtual machines offered by IaaS infrastructures cannot grow arbitrarily big
  – Usually capped by the actual server size or a smaller server group
• Really big applications are usually deployed in so-called Pods
  – Similar to database shards
  – Group of machines running one or multiple appliances
  – Machines within a Pod are very tightly networked
  – i.e. each Pod is a full copy of the given virtual machines with full OS and application installed
    • Usually, there are multiple copies of a given Pod (and its VMs)
    • Each Pod is responsible for a disjoint part of the whole workload
  – Pods are usually scattered across availability zones (e.g. data centers or certain racks)
    • Physically separated, usually with their own power, network, etc.

• IaaS Pods

[Figure: IaaS Pods, from CloudScaling.com]

– Simplified Pod example: Google Mail
  • Multiple Pods, each Pod running on multiple machines with a full and independent installation of the Gmail software
  • The load balancer decides during user log-in which Pod will handle the user session
    – Users are distributed across Pods
  • Pods are dynamic by using the shared Google File System (GFS)
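How a load balancer might map a user to a Pod at log-in can be sketched with a stable hash of the user ID. This is only one possible scheme and an assumption of this example; the slides do not describe how the actual decision is made, and `pod_for_user` is a hypothetical helper.

```python
import hashlib


def pod_for_user(user_id, num_pods):
    """Map a user ID to a Pod index in [0, num_pods) using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_pods


# Hypothetical users; the same user always lands on the same Pod.
for user in ("alice@example.com", "bob@example.com"):
    print(user, "-> Pod", pod_for_user(user, num_pods=4))
```

A plain modulo mapping reshuffles most users when the number of Pods changes; schemes such as consistent hashing reduce that effect.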

• Mission-critical applications should be designed such that they run in multiple availability zones on multiple Pods
  – The cloud control system (CCS) is responsible for distribution and replication

Pod architectures

– Each Pod consists of multiple machines with mainboards, CPUs, and main memory
– Question: where to put secondary storage?
– Usually, three options
  • Storage area network (SAN)
  • Direct attached storage (DAS)
  • Network attached storage (NAS)

SAN Pods

– Individual servers don’t have their own secondary storage
– A storage area network provides shared hard disk storage for all machines of a Pod
– Pro
  • All machines have access to the same data
  • Allows for dynamic load balancing or migration of appliances
    – e.g. VMware vMotion
– Con
  • Very expensive
  • Higher latency than direct attached storage

[Figure: SAN Pods]

DAS Pods

– Each server has its own set of hard drives
– Accessing data from other servers may be difficult
– Pro
  • Cheap
  • Low latency for accessing local data
– Con
  • Usually, no shared data access
  • Usually, difficult to live-migrate appliances (due to no shared data)
– But: by using clever storage abstractions, these common problems can be circumvented
  • Use a distributed file system or a distributed data store!
    – e.g. Amazon S3 & SimpleDB, Google GFS & BigTable, Apache HBase & HDFS, etc.

[Figure: DAS Pods]

3.1 Amazon EC2

• IaaS example: Amazon EC2
  – Elastic Compute Cloud is one of the core services of the Amazon cloud infrastructure
    • Public IaaS cloud
  – Customers may rent virtual servers hosted at Amazon’s data centers
    • Can freely install OS and applications as needed
  – Virtual servers are offered in different sizes and are paid for by usage (per hour)
    • Basic storage is offered within the VM, but applications usually use additional storage services, which cost extra
      – e.g. S3 or SimpleDB

• Example: Small EC2 VM (as of July 2010)
  – 1.7 GB memory
  – 1 EC2 Compute Unit (CU)
    • 1 virtual core with 1 EC2 Compute Unit
    • 1 CU is roughly a 1.2 GHz 2007 Xeon processor
  – 160 GB instance storage (150 GB plus 10 GB root partition)
  – 32-bit platform
  – I/O performance: moderate
  – Costs $0.095 per hour
  – Most users start with the small instance; it is also heavily used for testing

• Example: Extra Large Instance EC2 VM
  – 15 GB memory
  – 8 EC2 Compute Units
    • 4 virtual cores with 2 EC2 Compute Units each
  – 1,690 GB instance storage
    • 4×420 GB plus 10 GB root partition
  – 64-bit platform
  – I/O performance: high
  – Costs $0.76 per hour
  – Most corporate users end up with this instance due to its higher I/O speed

• Example: High-Mem Quad XL EC2 VM
  – 68.4 GB of memory
  – 26 EC2 Compute Units
    • 8 virtual cores with 3.25 EC2 Compute Units each
  – 1,690 GB of instance storage
  – 64-bit platform
  – I/O performance: high
  – Costs $2.68 per hour
  – Largest standard instance
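The hourly prices of the three example instances can be turned into monthly costs and a price per Compute Unit with a few lines of arithmetic. The prices and Compute Unit counts are the July 2010 figures from the slides; the 30-day month and the dictionary keys are assumptions of this sketch.

```python
# Hourly prices (USD) and EC2 Compute Units as listed on the slides (July 2010).
HOURLY_PRICE_USD = {"small": 0.095, "extra_large": 0.76, "high_mem_quad_xl": 2.68}
COMPUTE_UNITS = {"small": 1, "extra_large": 8, "high_mem_quad_xl": 26}


def monthly_cost(instance, hours=24 * 30):
    """Cost of running one instance continuously for `hours` hours (default: 30 days)."""
    return HOURLY_PRICE_USD[instance] * hours


def price_per_compute_unit(instance):
    """Hourly price divided by the number of EC2 Compute Units of the instance."""
    return HOURLY_PRICE_USD[instance] / COMPUTE_UNITS[instance]


for name in HOURLY_PRICE_USD:
    print(f"{name}: ${monthly_cost(name):.2f} per 30 days, "
          f"${price_per_compute_unit(name):.4f} per CU-hour")
```

The small and extra large instances cost the same per Compute Unit, so with these figures the larger instance mainly buys more memory, storage, and I/O performance per machine.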

• Rough estimations (Oct 2009)
  – Roughly 40,000 servers
  – Uses standard server racks with 16 machines per rack
    • Mostly packed with 2U dual-socket quad-core Intel Xeons
      – Roughly matches the High-Mem Quad XL instance…
    • Uses around eight 500 GB RAID-0 disks
    • Target cost around $2,500 per machine on average
    • 75% of the machines are in the US, the remainder in Europe and Asia
  – Amazon aims at a utilization rate of 75%
  – Very rough guesses state that Amazon may earn $25,264 per hour with EC2!
    • http://cloudscaling.com/blog/cloud-computing/amazons-ec2-generating-220m-annually
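The rough estimates above can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted on the slide and assumes a 365-day year with constant hourly revenue.

```python
SERVERS = 40_000             # estimated number of servers
MACHINES_PER_RACK = 16       # machines per standard rack
COST_PER_MACHINE_USD = 2_500 # estimated average target cost per machine
REVENUE_PER_HOUR_USD = 25_264

racks = SERVERS // MACHINES_PER_RACK
hardware_cost = SERVERS * COST_PER_MACHINE_USD
annual_revenue = REVENUE_PER_HOUR_USD * 24 * 365

print(f"racks: {racks}")                            # 2500
print(f"hardware cost: ${hardware_cost:,}")         # $100,000,000
print(f"annual revenue: ${annual_revenue:,}")       # $221,312,640
```

The annual figure of about $221 million is consistent with the "220m annually" in the cited URL.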

3.2 PaaS

Platform as a Service (PaaS)

– Provides software platforms on demand
  • e.g. runtime engines (Java VM, .NET runtime, etc.), storage systems (distributed file systems or databases), web services, communication services, etc.
– PaaS systems are usually used to develop and host web applications or web services
  • User applications run on the provided platform
– In contrast to IaaS, no installation and maintenance of the operating system and server applications is necessary
  • Centrally managed and maintained
  • Services or runtimes are directly usable

3.2 Google AppEngine

• Google AppEngine provides users with a managed Python or Java runtime
• Web applications can be directly hosted in AppEngine
  – Just upload your WAR file and you are done…
• Users are billed by resource usage
  – Some free resources are provided every day
    • 1 GB of in- and outgoing traffic, 6.5 hours CPU, 500 MB storage overall

Resource             Unit          Unit cost
Outgoing bandwidth   GB            $0.12
Incoming bandwidth   GB            $0.10
CPU time             CPU hours     $0.10
Stored data          GB / month    $0.15
Recipients emailed   Recipients    $0.0001
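Billing by resource usage with a free daily quota can be sketched as follows. The unit costs and free quotas are the ones listed on the slide; reading "1 GB in- and out traffic" as 1 GB per direction is an assumption, and storage and email are left out to keep the example short.

```python
# Unit costs in USD, taken from the table on the slide.
RATES = {"outgoing_gb": 0.12, "incoming_gb": 0.10, "cpu_hours": 0.10}
# Free daily quota; 1 GB per traffic direction is an assumption of this sketch.
FREE_PER_DAY = {"outgoing_gb": 1.0, "incoming_gb": 1.0, "cpu_hours": 6.5}


def daily_bill(usage):
    """Bill for one day: only usage above the free quota is charged."""
    total = 0.0
    for resource, rate in RATES.items():
        billable = max(0.0, usage.get(resource, 0.0) - FREE_PER_DAY[resource])
        total += billable * rate
    return total


# Hypothetical day: 5 GB out, 2 GB in, 10 CPU hours.
print(f"${daily_bill({'outgoing_gb': 5, 'incoming_gb': 2, 'cpu_hours': 10}):.2f}")
```

For the hypothetical day above, the charge is 4 GB × $0.12 + 1 GB × $0.10 + 3.5 h × $0.10 = $0.93; an application that stays within the free quota pays nothing.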

• Each application can access system resources up to a fixed maximum
  – AppEngine is not fully scalable!
• AppEngine maximum values
  – CPU: 1,730 hours CPU per day; 72 minutes CPU per minute
  – Data in or out: 1 TB per day; 10 GB per minute
  – Requests: 43M web service calls per day; 30K calls per minute
  – Data storage: no limit (uses BigTable, which can scale in size!)
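The daily and per-minute maxima can be related by dividing each daily limit by the 1,440 minutes of a day. The sketch below uses the limits quoted on the slide and assumes 1 TB = 1,000 GB.

```python
MINUTES_PER_DAY = 24 * 60


def per_minute_average(per_day):
    """Average per-minute rate that exactly exhausts a daily limit."""
    return per_day / MINUTES_PER_DAY


cpu_minutes = per_minute_average(1730 * 60)   # CPU minutes per minute
requests = per_minute_average(43_000_000)     # web service calls per minute
data_gb = per_minute_average(1000)            # GB per minute, assuming 1 TB = 1000 GB

print(f"CPU: {cpu_minutes:.1f} CPU minutes per minute")
print(f"Requests: {requests:,.0f} calls per minute")
print(f"Data: {data_gb:.2f} GB per minute")
```

For CPU and requests, the per-minute limits (72 minutes, 30K calls) are roughly the daily limits spread evenly over the day, whereas the 10 GB per-minute data limit is well above the daily average and so leaves room for short bursts.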

3.2 Amazon SimpleDB

• Amazon SimpleDB is a data storage system roughly similar to Google BigTable
  – http://aws.amazon.com/simpledb
  – Simple table-centric database engine
    • SimpleDB is directly ready to use
      – No user configuration or administration
      – Accessible via web service
    • SimpleDB is highly available and uses flexible schemas and eventual consistency
      – Similar to HBase or BigTable

– Any application may use SimpleDB for data storage
  • A simple web service is provided to interact with SimpleDB
    – Create or delete a table (called a domain)
    – Put and delete rows
    – Query for rows
– Users pay for storage, data transfer, and computation time
  • 25 hours of computation time (for querying) are free per month
    – Beyond that: $0.154 per machine hour
  • 1 GB of data transfer is free per month
    – Beyond that: $0.15 per GB
  • 1 GB of data storage is free per month
    – Beyond that: $0.28 per GB
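The operations listed above (put, delete, and query rows in a schema-flexible domain) can be illustrated with a toy in-memory model. This is explicitly not the SimpleDB web service API; `ToyDomain` and its methods are hypothetical names invented for this sketch, and the model ignores distribution and eventual consistency.

```python
class ToyDomain:
    """In-memory stand-in for a domain: rows are named items with arbitrary attributes."""

    def __init__(self):
        self.rows = {}

    def put(self, item_name, **attributes):
        # Flexible schema: each row may carry a different set of attributes.
        self.rows.setdefault(item_name, {}).update(attributes)

    def delete(self, item_name):
        self.rows.pop(item_name, None)

    def query(self, **conditions):
        # Return the names of all rows whose attributes match every condition.
        return sorted(
            name
            for name, attrs in self.rows.items()
            if all(attrs.get(key) == value for key, value in conditions.items())
        )


users = ToyDomain()
users.put("u1", name="Ann", city="Braunschweig")
users.put("u2", name="Bob")  # no "city" attribute, which is allowed
print(users.query(city="Braunschweig"))  # ['u1']
```

Creating or deleting a domain corresponds to constructing or discarding a `ToyDomain` object in this model.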

3.3 SaaS

Software as a Service (SaaS)

– Full applications are offered on demand
  • Users just need to consume the software; no installation or maintenance is necessary
– All administrative and maintenance tasks are performed by the cloud provider
  • e.g. hosting physical hardware, maintaining platforms, maintaining software, dealing with security, scalability, etc.

3.3 SalesForce

• Salesforce.com
  – On-demand CRM software (customer relationship management)
    • Cooperation with Google Apps in early summer
  – Provides simple online services for
    • Customer database
    • Lead management
    • Call center
    • Customer portal
    • Knowledge bases
    • Email
    • Collaboration environments
    • etc.
  – Billed per user and month, depending on the edition

3.3 Google Apps

• Google Apps
  – Provides standard office applications on demand
    • i.e. targeting the lower end of the customer base of Microsoft Office
      – Microsoft counters with Office Online
  – Google Apps provides
    • Email & groupware
    • Spreadsheets
    • Documents
    • Presentations
    • Online forms
    • Drawings
    • etc.

Next Semester

• Knowledge-Based Systems and Deductive Databases
• Data Warehousing and Data Mining Techniques
• Spatial Databases and Geographic Information Systems
• Seminar “Best of Data Mining”

Thanks for your attention!
