
Cloud Data Management: Database- and Backend-as-a-Service

In the document Low Latency for Cloud Data Management (pages 61-65)

2.2 Backend Performance: Scalable Data Management

2.2.6 Cloud Data Management: Database- and Backend-as-a-Service

reusing high-level features such as schema modeling, transactions, and business logic across systems through a unified API. Orestes supports this strategy and provides standard interfaces for different database systems to integrate them into a polyglot database middleware exposed through a unified set of APIs.

[Figure 2.13 content: a matrix of cloud database classes with the axes deployment model and data model. Data models: structured (relational), schema-free, and unstructured. Cells: RDBMS machine image, NoSQL machine image, analytics machine image; managed RDBMS/DWH, managed NoSQL, Analytics-as-a-Service; RDBMS/DWH service, NoSQL service, analytics/ML APIs. Group label: Database-as-a-Service.]

Figure 2.13: Classes of cloud databases and DBaaS systems according to their data model and deployment model.

DBaaS providers can also develop a proprietary database or cloud infrastructure to achieve scalability and efficiency goals that are harder to implement with standard database systems. A proprietary architecture enables co-design of the database or analytics system with the underlying cloud infrastructure. For example, Amazon DynamoDB provides a large-scale, multi-tenant NoSQL database loosely based on the Dynamo architecture [Dyn17], and Google provides machine learning (ML) APIs for a variety of classification and clustering tasks [Goo17b].

This thesis will be primarily concerned with the managed deployment model, as we are convinced that existing database technology offers a suitable basis for data management functionality but currently lacks the non-functional ability to be provided as a low-latency DBaaS/BaaS. We refer to Lehner and Sattler [LS13] and Zhao et al. [ZSLB14] for a comprehensive overview of DBaaS research.

Backend-as-a-Service

Many data access and application patterns are very similar across different web and mobile applications and can therefore be standardized. This was recognized by the industry and led to BaaS systems that integrate DBaaS with application logic and predefined building blocks, e.g., for push notifications, user login, and static file delivery. BaaS is a rather recent trend and, similar to early cloud computing and Big Data processing, progress is currently driven by industry projects, while structured research has yet to be established [Use17, Par17, Dep17].

[Figure 2.14 content: web applications and mobile apps with SDKs access the Backend-as-a-Service APIs (data storage, custom code, query and search, backend code, user APIs, file storage and hosting, access control) via REST/HTTP over the Internet. The Backend-as-a-Service cloud infrastructure comprises business logic (FaaS), DBaaS, and standard APIs on top of an orchestration layer (multi-tenancy, scaling, metering, failover, …).]

Figure 2.14: Architecture and usage of a Backend-as-a-Service.

Figure 2.14 gives an overview of a generic BaaS architecture as similarly found in commercial services (e.g., Azure Mobile Services, Firebase, and Kinvey [Baq18]) as well as open-source projects (e.g., Meteor [HS16], Deployd [Dep17], Hoodie [Hoo17], Parse Server [Par17], BaasBox [Bas17], and Apache UserGrid [Use17]).

The BaaS cloud infrastructure consists of three central components. The DBaaS component is responsible for data storage and retrieval. Its abstraction level can range from structured relational data, over semi-structured JSON, to opaque files. The FaaS component is concerned with the execution of server-side business logic, for example, to integrate third-party services and perform data validation. It can either be invoked through an explicit API or be triggered by DBaaS operations. The standard API component offers common application functionality in a convention-over-configuration style, i.e., it provides defaults for tasks such as user login, push notifications, and messaging that are exposed for each tenant individually. The cloud infrastructure is orchestrated by the BaaS provider to ensure isolated multi-tenancy, scalability, availability, and monitoring.
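The interplay between the DBaaS and FaaS components can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the class `MiniBaaS`, its methods `on_insert`, `insert`, and `find`, and the handler `validate_comment` are hypothetical names chosen for illustration. The sketch assumes a single tenant and an in-memory store, and only shows how server-side validation code could be triggered by a write operation.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: MiniBaaS, on_insert, insert, and find are
# illustrative and do not correspond to any real BaaS product API.
Handler = Callable[[dict], dict]


class MiniBaaS:
    """Toy single-tenant backend: an in-memory store plus FaaS-style hooks."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[dict]] = {}             # DBaaS component
        self._before_insert: Dict[str, List[Handler]] = {}   # FaaS component

    def on_insert(self, table: str, handler: Handler) -> None:
        """Register business logic that is triggered by an insert."""
        self._before_insert.setdefault(table, []).append(handler)

    def insert(self, table: str, obj: dict) -> dict:
        """Run all registered handlers, then persist the resulting object."""
        for handler in self._before_insert.get(table, []):
            obj = handler(obj)  # a handler may validate or rewrite the object
        self._tables.setdefault(table, []).append(obj)
        return obj

    def find(self, table: str) -> List[dict]:
        """Return a copy of all objects stored in the given table."""
        return list(self._tables.get(table, []))


def validate_comment(obj: dict) -> dict:
    """Example server-side validation: reject empty comment texts."""
    text = obj.get("text", "").strip()
    if not text:
        raise ValueError("comment text must not be empty")
    return {**obj, "text": text}


backend = MiniBaaS()
backend.on_insert("comments", validate_comment)
stored = backend.insert("comments", {"text": "  hello  "})
```

Because the handler runs before the write is applied, a rejected object never reaches the store. A real BaaS would additionally have to sandbox tenant code and isolate tenants from each other, which this sketch leaves out.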

The BaaS is accessed through a REST API [Dep17, Hoo17, Par17, Bas17, Use17] (and sometimes WebSockets [HS16]) for use with different client technologies. To handle not only native mobile applications but also websites, BaaS systems usually provide file hosting to deliver website assets like HTML and script files to browsers. The communication with the BaaS is performed through SDKs employed in the frontend. The SDKs provide high-level abstractions to application developers, for example, to integrate persistence with application data models [TGPM17].

BaaS systems are thus confronted with even stronger latency challenges than a DBaaS: all clients access the system via a high-latency WAN, so that the latency for retrieving objects, files, and query results determines application performance. Similar to DBaaS systems, BaaS APIs usually provide persistence on top of one single database technology, making it infeasible to achieve all potential functional and non-functional application requirements. The problem is even more severe when all tenants are co-located on a shared database cluster. In that case, one database system configuration (e.g., the replication protocol) prescribes the guarantees for each tenant [ADE12].

[Figure 2.15 content: four stacks, each with the layers VM, hardware resources, database process, and database schema, for the models private OS/VM, private process/container, private schema, and shared schema (the latter with an additional virtual schema layer).]

Figure 2.15: Different approaches to multi-tenancy in DBaaS/BaaS systems. The dashed line indicates the boundary between shared and tenant-specific resources.

Multi-Tenancy

The goal of multi-tenancy in DBaaS/BaaS systems is to allow efficient resource pooling across tenants so that only the capacity for the global average resource consumption has to be provisioned and resources can be shared. There is an inherent trade-off between higher isolation of tenants and efficiency of resource sharing [ADE12]. As shown in Figure 2.15, the boundary between tenant-specific resources and shared provider resources can be drawn at different levels of the software stack [MB16, p. 562]:
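The effect of resource pooling can be made concrete with a small calculation. The load figures below are invented for illustration, and the sketch uses the peak of the aggregated load as a conservative stand-in for the capacity a pooled deployment has to provision; it is not a model taken from [ADE12].

```python
# Hypothetical load per time slot (e.g., requests per second) of three
# tenants whose peaks fall into different slots.
tenant_load = {
    "shop":   [10, 80, 20, 10],
    "game":   [20, 10, 70, 20],
    "report": [60, 10, 10, 10],
}

# Isolated provisioning: every tenant needs capacity for its own peak.
isolated_capacity = sum(max(load) for load in tenant_load.values())

# Pooled provisioning: capacity only has to cover the peak of the summed load.
aggregate = [sum(slot) for slot in zip(*tenant_load.values())]
pooled_capacity = max(aggregate)

# Fraction of capacity saved by pooling in this particular example.
savings = 1 - pooled_capacity / isolated_capacity
```

With these numbers, isolated provisioning needs 210 units while pooling needs 100. The saving shrinks when tenant peaks coincide, which is one reason why stronger sharing comes at the cost of weaker isolation.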

• With private operating system (OS) virtualization, each tenant is assigned to one or multiple VMs that execute the database process. This model achieves a high degree of isolation, similar to IaaS clouds. However, resource reuse is limited as each tenant has the overhead of a full OS and database process.

• By allocating a private process to each tenant, the overhead of a private OS can be mitigated. To this end, the provider orchestrates the OS to run multiple isolated database processes. This is usually achieved using container technology such as Docker [Mer14] that isolates processes within a shared OS.

• Efficiency can be further increased if tenants only possess a private schema within a shared database process. The database system can thus share various system resources (e.g., the buffer pool) between tenants to increase I/O efficiency.

• The shared schema model requires all tenants to use the same application that dictates the common schema. The schema can be adapted to specific tenant requirements by extending it with additional fields or tables [KL11]. A shared schema is frequently used in SaaS applications such as Salesforce [Onl17].
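The shared schema model from the list above can be sketched with Python's built-in `sqlite3` module. The table layout is an assumption made for illustration: a `tenant_id` discriminator column separates tenants, and a JSON-encoded `extra` column stands in for tenant-specific extension fields. The helper names `insert_account` and `accounts_of` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE account (
           tenant_id TEXT    NOT NULL,             -- shared by all tenants
           id        INTEGER NOT NULL,
           name      TEXT    NOT NULL,
           extra     TEXT    NOT NULL DEFAULT '{}', -- tenant-specific fields
           PRIMARY KEY (tenant_id, id)
       )"""
)


def insert_account(tenant_id: str, id_: int, name: str, **extra) -> None:
    """Store a row; extension fields are serialized into the extra column."""
    conn.execute(
        "INSERT INTO account VALUES (?, ?, ?, ?)",
        (tenant_id, id_, name, json.dumps(extra)),
    )


def accounts_of(tenant_id: str) -> list:
    """Return only the rows of one tenant, merged with its extension fields."""
    rows = conn.execute(
        "SELECT id, name, extra FROM account WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [{"id": i, "name": n, **json.loads(e)} for i, n, e in rows]


insert_account("acme", 1, "Alice", region="EU")  # tenant with an extra field
insert_account("globex", 1, "Bob")               # tenant without extensions
```

Every query has to filter on `tenant_id`, so isolation in this model depends entirely on the access layer and not on the database process or the OS.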

The major open challenge for multi-tenancy of NoSQL systems in cloud environments is database independence and the combination with multi-tenant FaaS code execution.

If a generic middleware can expose unmodified data stores as a scalable, multi-tenant DBaaS/BaaS, the problems of database and service architectures are decoupled, and polyglot persistence is enabled. In Chapter 3, we will outline the requirements for a generic, multi-tenant DBaaS/BaaS architecture and present our approach.
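As a rough illustration of this idea, the following sketch places a thin middleware in front of interchangeable stores. It is not the architecture presented in Chapter 3: the `DataStore` interface, the `InMemoryStore` stand-in, the `Middleware` class, and the key-prefix scheme for separating tenants are all assumptions made for this example.

```python
from typing import Dict, Optional, Protocol, Tuple


class DataStore(Protocol):
    """Minimal interface the hypothetical middleware expects from a store."""

    def put(self, key: str, value: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in for an unmodified database system."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class Middleware:
    """Unified API that maps each tenant to a store and namespaces its keys."""

    def __init__(self, stores: Dict[str, DataStore]) -> None:
        self._stores = stores
        self._assignment: Dict[str, str] = {}

    def register_tenant(self, tenant: str, store_name: str) -> None:
        if "/" in tenant:
            raise ValueError("tenant ids must not contain '/'")
        if store_name not in self._stores:
            raise KeyError(store_name)
        self._assignment[tenant] = store_name

    def _resolve(self, tenant: str, key: str) -> Tuple[DataStore, str]:
        # The tenant prefix keeps keys of co-located tenants apart.
        return self._stores[self._assignment[tenant]], f"{tenant}/{key}"

    def put(self, tenant: str, key: str, value: dict) -> None:
        store, namespaced = self._resolve(tenant, key)
        store.put(namespaced, value)

    def get(self, tenant: str, key: str) -> Optional[dict]:
        store, namespaced = self._resolve(tenant, key)
        return store.get(namespaced)


stores = {"kv": InMemoryStore(), "docs": InMemoryStore()}
api = Middleware(stores)
api.register_tenant("acme", "kv")
api.register_tenant("globex", "kv")      # shares a store with acme
api.register_tenant("initech", "docs")   # uses a different store
api.put("acme", "user:1", {"name": "Alice"})
api.put("globex", "user:1", {"name": "Bob"})
api.put("initech", "user:1", {"name": "Carol"})
```

The stores themselves know nothing about tenants. Multi-tenancy and the choice of store live in the middleware, which is the kind of decoupling the paragraph above describes.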

Most research efforts in the DBaaS community have been concerned with multi-tenancy and virtualization [ASJK11, AGJ+08, AJKS09, KL11, SKM08, WB09, JA07], database privacy and encryption [KJH15, Gen09, PRZB11, Pop14, KFPC16, PZ13, PSV+14], workload management [CAAS07, ZSLB14, ABC14, Bas12, XCZ+11, TPK+13, LBMAL14, PSZ+07, Sak14], resource allocation [MRSJ15, SLG+09], automatic scaling [KWQH16, LBMAL14], and benchmarking [DFNR14, CST+10, PPR+11, BZS13, BKD+14, BT11, BK13, BT14, Ber15, Ber14] (see Section 6.4.4). However, several DBaaS and BaaS challenges have remained unsolved. This thesis is focused on providing the following improvements to DBaaS and BaaS systems:

• Low latency access to DBaaS systems, to improve application performance and allow distribution of application logic and data storage

• Unified REST/HTTP access to polyglot data stores with service level agreements for functional and non-functional guarantees

• Elastic scalability of read and query workloads for arbitrary database systems

• Generic, database-independent APIs and capabilities for fundamental data management abstractions such as schema management, FaaS business logic, real-time queries, multi-tenancy, search, transactions, authentication, authorization, user management, and file storage for single databases and across databases.
