2.2 Backend Performance: Scalable Data Management
2.2.6 Cloud Data Management: Database- and Backend-as-a-Service
reusing high-level features such as schema modeling, transactions, and business logic across systems through a unified API. Orestes supports this strategy and provides standard interfaces for different database systems to integrate them into a polyglot database middleware exposed through a unified set of APIs.
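The following TypeScript sketch makes the idea of a unified API over polyglot storage more concrete. It routes collections to different backends behind one `save`/`load` interface. It is not the Orestes API: the interface `DataStore`, the classes `DocumentStore`, `KeyValueStore`, and `PolyglotMiddleware`, and the per-collection routing rule are all hypothetical, and both backends are in-memory stand-ins for real database systems.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical unified interface that every backend adapter implements.
interface DataStore {
  readonly name: string;
  put(collection: string, id: string, value: Doc): void;
  get(collection: string, id: string): Doc | undefined;
}

// In-memory stand-in for a document database that stores objects natively.
class DocumentStore implements DataStore {
  readonly name = "document";
  private data = new Map<string, Map<string, Doc>>();

  put(collection: string, id: string, value: Doc): void {
    let coll = this.data.get(collection);
    if (coll === undefined) {
      coll = new Map<string, Doc>();
      this.data.set(collection, coll);
    }
    coll.set(id, { ...value });
  }

  get(collection: string, id: string): Doc | undefined {
    return this.data.get(collection)?.get(id);
  }
}

// In-memory stand-in for a key-value store that only holds opaque strings.
class KeyValueStore implements DataStore {
  readonly name = "key-value";
  private data = new Map<string, string>();

  // JSON-encoding the (collection, id) pair yields an unambiguous flat key.
  private key(collection: string, id: string): string {
    return JSON.stringify([collection, id]);
  }

  put(collection: string, id: string, value: Doc): void {
    this.data.set(this.key(collection, id), JSON.stringify(value));
  }

  get(collection: string, id: string): Doc | undefined {
    const raw = this.data.get(this.key(collection, id));
    return raw === undefined ? undefined : (JSON.parse(raw) as Doc);
  }
}

// Middleware: the application sees one API; collections are routed to backends.
class PolyglotMiddleware {
  private routes: Map<string, DataStore>;
  private fallback: DataStore;

  constructor(routes: Map<string, DataStore>, fallback: DataStore) {
    this.routes = routes;
    this.fallback = fallback;
  }

  private storeFor(collection: string): DataStore {
    return this.routes.get(collection) ?? this.fallback;
  }

  // Persists the object and returns the name of the backend that stored it.
  save(collection: string, id: string, value: Doc): string {
    const store = this.storeFor(collection);
    store.put(collection, id, value);
    return store.name;
  }

  load(collection: string, id: string): Doc | undefined {
    return this.storeFor(collection).get(collection, id);
  }
}
```

Routing, for example, a `sessions` collection to the key-value store while all other collections fall back to the document store leaves application code unchanged: `save` and `load` are called identically in both cases.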
[Figure 2.13 (diagram): data model (structured/relational, schema-free, unstructured) versus deployment model; the classes range from machine images (RDBMS, NoSQL, analytics) over managed offerings (managed RDBMS/DWH, managed NoSQL, Analytics-as-a-Service) to Database-as-a-Service offerings (RDBMS/DWH service, NoSQL service, analytics/ML APIs).]
Figure 2.13: Classes of cloud databases and DBaaS systems according to their data model and deployment model.
DBaaS providers can also develop a proprietary database or cloud infrastructure specifically to achieve scalability and efficiency goals that are harder to implement with standard database systems. A proprietary architecture enables co-design of the database or analytics system with the underlying cloud infrastructure. For example, Amazon DynamoDB provides a large-scale, multi-tenant NoSQL database loosely based on the Dynamo architecture [Dyn17], and Google provides machine learning (ML) APIs for a variety of classification and clustering tasks [Goo17b].
This thesis is primarily concerned with the managed deployment model, as we are convinced that existing database technology offers a suitable basis for data management functionality but currently lacks the non-functional ability to be provided as a low-latency DBaaS/BaaS. We refer to Lehner and Sattler [LS13] and Zhao et al. [ZSLB14] for a comprehensive overview of DBaaS research.
Backend-as-a-Service
Many data access and application patterns are very similar across different web and mobile applications and can therefore be standardized. The industry recognized this, which led to BaaS systems that integrate DBaaS with application logic and predefined building blocks, e.g., for push notifications, user login, and static file delivery. BaaS is a rather recent trend: similar to early cloud computing and Big Data processing, progress is currently driven by industry projects, while structured research has yet to be established [Use17, Par17, Dep17].
[Figure 2.14 (diagram): web applications and mobile apps use SDKs to access the Backend-as-a-Service APIs (data storage, query/search, custom backend code, user APIs, file storage and hosting, access control) via REST/HTTP over the Internet; the Backend-as-a-Service cloud infrastructure comprises business logic (FaaS), DBaaS, and standard APIs on top of an orchestration layer (multi-tenancy, scaling, metering, failover, …).]
Figure 2.14: Architecture and usage of a Backend-as-a-Service.
Figure 2.14 gives an overview of a generic BaaS architecture as similarly found in commercial services (e.g., Azure Mobile Services, Firebase, and Kinvey [Baq18]) as well as open-source projects (e.g., Meteor [HS16], Deployd [Dep17], Hoodie [Hoo17], Parse Server [Par17], BaasBox [Bas17], and Apache UserGrid [Use17]).
The BaaS cloud infrastructure consists of three central components. The DBaaS component is responsible for data storage and retrieval. Its abstraction level can range from structured relational data over semi-structured JSON to opaque files. The FaaS component is concerned with the execution of server-side business logic, for example, to integrate third-party services and perform data validation. It can either be invoked through an explicit API or be triggered by DBaaS operations. The standard API component offers common application functionality in a convention-over-configuration style, i.e., it provides defaults for tasks such as user login, push notifications, and messaging that are exposed for each tenant individually. The cloud infrastructure is orchestrated by the BaaS provider to ensure isolated multi-tenancy, scalability, availability, and monitoring.
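The trigger path between the DBaaS and FaaS components can be sketched as follows, assuming a hypothetical store that runs registered server-side handlers before each insert. The names `TriggeringStore`, `onInsert`, and `validateUser` are illustrative only; a production system would additionally have to isolate tenant code and expose handlers as explicitly invocable APIs, both of which this sketch omits.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical server-side handler: receives a document before it is stored and
// returns the (possibly modified) document, or throws to reject the write.
type BeforeInsertHandler = (collection: string, doc: Doc) => Doc;

class TriggeringStore {
  private docs = new Map<string, Doc[]>();
  private handlers = new Map<string, BeforeInsertHandler[]>();

  // Registers business logic that runs on every insert into `collection`.
  onInsert(collection: string, handler: BeforeInsertHandler): void {
    const list = this.handlers.get(collection) ?? [];
    list.push(handler);
    this.handlers.set(collection, list);
  }

  insert(collection: string, doc: Doc): Doc {
    // Handlers run in registration order; an exception aborts the insert
    // before anything is written.
    let current: Doc = doc;
    for (const handler of this.handlers.get(collection) ?? []) {
      current = handler(collection, current);
    }
    const list = this.docs.get(collection) ?? [];
    list.push(current);
    this.docs.set(collection, list);
    return current;
  }

  count(collection: string): number {
    return this.docs.get(collection)?.length ?? 0;
  }
}

// Example handler: require an e-mail address and normalize it to lower case.
function validateUser(_collection: string, doc: Doc): Doc {
  const email = doc["email"];
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("invalid email");
  }
  return { ...doc, email: email.toLowerCase() };
}
```

Running validation inside the write path means invalid data never reaches storage, regardless of which client SDK issued the request.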
The BaaS is accessed through a REST API [Dep17,Hoo17,Par17,Bas17,Use17] (and sometimes WebSockets [HS16]) for use with different client technologies. To serve not only native mobile applications but also websites, BaaS systems usually provide file hosting to deliver website assets such as HTML and script files to browsers. Communication with the BaaS is performed through SDKs employed in the frontend. The SDKs provide high-level abstractions to application developers, for example, to integrate persistence with application data models [TGPM17].
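The following sketch shows how a frontend SDK can hide the REST API behind a high-level abstraction. The URL scheme `/db/<collection>/<id>`, the `EntityManager` class, and the injectable `Transport` function are assumptions for illustration and do not correspond to a particular BaaS product; in a browser, the transport would typically wrap `fetch`.

```typescript
// Hypothetical wire format; real BaaS products each define their own REST paths.
interface HttpRequest {
  method: "GET" | "PUT";
  path: string;
  body?: string;
}

interface HttpResponse {
  status: number;
  body: string;
}

// Injected transport: in a browser this would wrap fetch(); in tests, a fake.
type Transport = (req: HttpRequest) => Promise<HttpResponse>;

function objectPath(collection: string, id: string): string {
  return `/db/${encodeURIComponent(collection)}/${encodeURIComponent(id)}`;
}

// High-level SDK facade: application code works with typed objects, not HTTP.
class EntityManager {
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  async save<T extends object>(collection: string, id: string, entity: T): Promise<void> {
    const res = await this.transport({
      method: "PUT",
      path: objectPath(collection, id),
      body: JSON.stringify(entity),
    });
    if (res.status < 200 || res.status >= 300) {
      throw new Error(`save failed with status ${res.status}`);
    }
  }

  // Resolves to undefined if the object does not exist (HTTP 404).
  async load<T>(collection: string, id: string): Promise<T | undefined> {
    const res = await this.transport({ method: "GET", path: objectPath(collection, id) });
    if (res.status === 404) return undefined;
    if (res.status < 200 || res.status >= 300) {
      throw new Error(`load failed with status ${res.status}`);
    }
    return JSON.parse(res.body) as T;
  }
}
```

Injecting the transport keeps the SDK independent of the client technology, which mirrors the motivation for exposing a REST API in the first place.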
BaaS systems thus face even stronger latency challenges than a DBaaS: all clients access the system over a high-latency wide-area network (WAN), so the latency of retrieving objects, files, and query results determines application performance. As with DBaaS systems, BaaS APIs usually provide persistence on top of a single database technology, which makes it infeasible to satisfy all potential functional and non-functional application requirements. The problem is even more severe when all tenants are co-located on a shared database cluster: in that case, one database system configuration (e.g., the replication protocol) prescribes the guarantees for every tenant [ADE12].
[Figure 2.15 (diagram): four stacks of hardware resources, VM, database process, and database schema, labeled private OS/VM, private process/container, private schema, and shared schema (the latter with a virtual schema per tenant).]
Figure 2.15: Different approaches to multi-tenancy in DBaaS/BaaS systems. The dashed line indicates the boundary between shared and tenant-specific resources.
Multi-Tenancy
The goal of multi-tenancy in DBaaS/BaaS systems is to allow efficient resource pooling across tenants so that only the capacity for the global average resource consumption has to be provisioned and resources can be shared. There is an inherent trade-off between higher isolation of tenants and efficiency of resource sharing [ADE12]. As shown in Figure 2.15, the boundary between tenant-specific resources and shared provider resources can be drawn at different levels of the software stack [MB16, p. 562]:
• With private operating system (OS) virtualization, each tenant is assigned to one or multiple VMs that execute the database process. This model achieves a high degree of isolation, similar to IaaS clouds. However, resource reuse is limited, as each tenant incurs the overhead of a full OS and database process.
• By allocating a private process to each tenant, the overhead of a private OS can be mitigated. To this end, the provider orchestrates the OS to run multiple isolated database processes. This is usually achieved using container technology such as Docker [Mer14] that isolates processes within a shared OS.
• Efficiency can be increased further if tenants only possess a private schema within a shared database process. The database system can then share various system resources (e.g., the buffer pool) between tenants to increase I/O efficiency.
• The shared schema model requires all tenants to use the same application, which dictates the common schema. The schema can be adapted to specific tenant requirements by extending it with additional fields or tables [KL11]. A shared schema is frequently used in SaaS applications such as Salesforce [Onl17].
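A shared schema with tenant-specific extensions can be sketched as a single table whose rows carry a tenant identifier, the common columns, and a map of extension fields that each tenant must declare first. This is a hypothetical in-memory illustration (`SharedSchemaTable`, `declareExtension`, and `AccountRow` are invented names), not the specific technique of [KL11]; a relational implementation might instead map extensions to additional columns or tables.

```typescript
// Hypothetical row layout for a shared-schema table: a tenant discriminator,
// the columns every tenant shares, and a bag of tenant-specific extension fields.
interface AccountRow {
  tenantId: string;
  id: string;
  name: string;
  ext: Record<string, string | number>;
}

class SharedSchemaTable {
  private rows: AccountRow[] = [];
  // Extension fields each tenant has declared on top of the common schema.
  private extensions = new Map<string, Set<string>>();

  declareExtension(tenantId: string, field: string): void {
    const fields = this.extensions.get(tenantId) ?? new Set<string>();
    fields.add(field);
    this.extensions.set(tenantId, fields);
  }

  insert(
    tenantId: string,
    id: string,
    name: string,
    ext: Record<string, string | number> = {},
  ): void {
    // Reject extension fields that this tenant has not declared.
    const allowed = this.extensions.get(tenantId) ?? new Set<string>();
    for (const field of Object.keys(ext)) {
      if (!allowed.has(field)) {
        throw new Error(`extension field "${field}" not declared for tenant ${tenantId}`);
      }
    }
    this.rows.push({ tenantId, id, name, ext: { ...ext } });
  }

  // Every read is scoped to one tenant, so tenants never see each other's rows.
  scan(tenantId: string): AccountRow[] {
    return this.rows.filter((row) => row.tenantId === tenantId);
  }
}
```

The sketch makes the trade-off from the list above visible: all tenants share one table and one process, so isolation rests entirely on the tenant filter applied by the application layer.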
The major open challenge for multi-tenancy of NoSQL systems in cloud environments is database independence and the combination with multi-tenant FaaS code execution.
If a generic middleware can expose unmodified data stores as a scalable, multi-tenant DBaaS/BaaS, the problems of database and service architectures are decoupled, and polyglot persistence is enabled. In Chapter 3 we will outline the requirements for a generic, multi-tenant DBaaS/BaaS architecture and present our approach.
Most research efforts in the DBaaS community have been concerned with multi-tenancy and virtualization [ASJK11,AGJ+08,AJKS09,KL11,SKM08,WB09,JA07], database privacy and encryption [KJH15, Gen09, PRZB11, Pop14, KFPC16, PZ13, PSV+14], workload management [CAAS07,ZSLB14,ABC14,Bas12,XCZ+11,TPK+13,LBMAL14,PSZ+07,Sak14], resource allocation [MRSJ15,SLG+09], automatic scaling [KWQH16,LBMAL14], and benchmarking [DFNR14, CST+10, PPR+11, BZS13, BKD+14, BT11, BK13, BT14, Ber15, Ber14] (see Section 6.4.4). However, several DBaaS and BaaS challenges have remained unsolved. This thesis is focused on providing the following improvements to DBaaS and BaaS systems:
• Low-latency access to DBaaS systems, to improve application performance and allow the distribution of application logic and data storage
• Unified REST/HTTP access to polyglot data stores with service level agreements for functional and non-functional guarantees
• Elastic scalability of read and query workloads for arbitrary database systems
• Generic, database-independent APIs and capabilities for fundamental data management abstractions such as schema management, FaaS business logic, real-time queries, multi-tenancy, search, transactions, authentication, authorization, user management, and file storage for single databases and across databases.