
3.5 Orestes: A Data Management Middleware for Low Latency

3.5.1 Architecture

To expose existing data stores as a BaaS without prior modification, the Orestes middleware and its unified REST API have to be powerful enough to expose the capabilities of the underlying database system (e.g., conditional updates) without compromising its non-functional properties (e.g., scalability of data volume or linearizable consistency).

To this end, the Orestes architecture comprises a superset of database system capabilities, spanning from client-side persistence APIs to the server's REST interface.

Figure 3.6 shows the Orestes middleware architecture, which is designed to meet the requirements for database independence, Database- and Backend-as-a-Service functions, scalability, availability, multi-tenancy, and low latency with tunable consistency. The architecture encompasses the complete path from the client to the server and can therefore be split into three parts: server, network, and client.

[Figure 3.6 (diagram). Client: application with the Java and JavaScript persistence APIs (load(), save(), find(), login()) in a browser or mobile device. Network: forward-proxy caches, content delivery networks, reverse-proxy caches and load balancers. Server: HTTP servers hosting the DBaaS & BaaS modules (e.g., transactions, queries, schema, ACLs, indexing, partial updates, access control, multi-tenancy, data validation, cache coherence, SLAs, autoscaling), the replicated state service (schema, configuration, backend code), the Function-as-a-Service engine local to each server, the real-time and messaging layer (continuous queries, materialized views, query result invalidations), cluster orchestration (starting/stopping servers, health checks), and data stores (MongoDB, Redis, ElasticSearch, DynamoDB, Amazon S3). Exemplary request: GET /db/{bucket}/{id}, answered with 200 OK, Cache-Control: max-age=60, ETag: "3", and a JSON object.]

Figure 3.6: The Orestes middleware architecture with an exemplary request for loading a database object.

Server

The DBaaS/BaaS layer consists of a variable number of Orestes servers, which expose the REST/HTTP API and map it to the underlying database systems. The server tier and the database tier can be scaled and deployed independently. Because the Orestes servers are stateless, the middleware can scale horizontally, so latency and throughput are bounded only by the database system once it becomes saturated.

We distinguish between three types of modules in the server middleware. Data Modules express the mapping of data operations in the REST API to the underlying database (e.g., CRUD operations, queries, indexing, system configuration). Data Modules have to be implemented for each database that is to be exposed through the unified REST API. Default Modules, on the other hand, implement database-independent concerns that can be provided by default on top of a database system through a combination of data modules and middleware services (e.g., authentication, authorization, data validation, backend code, push notifications, transactions, schema management, SLA management, elastic scaling).

Default modules can be overridden to leverage existing native capabilities (e.g., table-level ACLs for authorization). Core Modules contribute the technical foundations of the system and are orthogonal to the underlying database (e.g., web caching, load balancing, HTTP networking, logging, TLS encryption).
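One way to picture the division between module types is a lookup with fallback: an interface is served by a database-specific data module when one exists (for example, a native implementation that overrides the default), and otherwise by the database-independent default module. The following is a minimal sketch under that assumption; `resolveModule` and the module objects are hypothetical and not taken from the Orestes code base.

```javascript
// Hypothetical sketch: choose the implementation for an interface, preferring a
// database-specific data module over the generic default module.
function resolveModule(name, dataModules, defaultModules) {
  if (Object.prototype.hasOwnProperty.call(dataModules, name)) {
    return dataModules[name]; // native capability of the underlying database
  }
  if (Object.prototype.hasOwnProperty.call(defaultModules, name)) {
    return defaultModules[name]; // database-independent default implementation
  }
  throw new Error(`No module implements interface "${name}"`);
}

// Example wiring for a hypothetical store without native ACLs
// but with native support for partial updates.
const defaultModules = {
  acl: { source: "default" },
  partialUpdate: { source: "default" },
};
const dataModules = {
  crud: { source: "data" }, // CRUD always has to come from a data module
  partialUpdate: { source: "data" },
};

console.log(resolveModule("acl", dataModules, defaultModules).source); // default
console.log(resolveModule("partialUpdate", dataModules, defaultModules).source); // data
```

The fallback order mirrors the text: default modules cover what a database lacks, while a data module with a native capability takes precedence.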

One of the central default modules in Orestes is the real-time and messaging layer InvaliDB. It provides continuous query processing on top of any underlying database system by operating purely at the level of generic Orestes objects. A scalable messaging layer connects Orestes servers to the InvaliDB stream processing cluster for the exchange of after-images (the state of objects after an update) as well as query registrations and match notifications. The ability to match updates against queries is necessary to enable caching of query results. The details of the real-time query matching as well as cache coherence for query results are described in depth in Chapter 4.
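The core matching step can be illustrated in a few lines. This minimal sketch assumes that queries are plain predicates over objects and that the matcher remembers which object IDs currently belong to each result, so every after-image can be classified as an add, change, or remove notification. `QueryMatcher`, its methods, and the event names are hypothetical; the actual InvaliDB design with its partitioned stream processing is the subject of Chapter 4.

```javascript
// Hypothetical sketch of matching after-images against registered queries.
class QueryMatcher {
  constructor() {
    this.queries = new Map(); // queryId -> { predicate, resultIds: Set }
  }

  register(queryId, predicate, initialIds = []) {
    this.queries.set(queryId, { predicate, resultIds: new Set(initialIds) });
  }

  // Returns the match notifications caused by one after-image.
  // A null after-image represents a deleted object.
  onAfterImage(id, afterImage) {
    const notifications = [];
    for (const [queryId, q] of this.queries) {
      const wasMatch = q.resultIds.has(id);
      const isMatch = afterImage !== null && q.predicate(afterImage);
      let type = null;
      if (isMatch && !wasMatch) type = "add";
      else if (isMatch && wasMatch) type = "change";
      else if (!isMatch && wasMatch) type = "remove";
      if (isMatch) q.resultIds.add(id);
      else q.resultIds.delete(id);
      if (type) notifications.push({ queryId, id, type });
    }
    return notifications;
  }
}

const matcher = new QueryMatcher();
matcher.register("fel", (obj) => obj.name.startsWith("Fel"));
console.log(matcher.onAfterImage("m1", { name: "Felix" })); // one "add" notification
console.log(matcher.onAfterImage("m1", { name: "Alice" })); // one "remove" notification
```

Any notification for a query means that a cached copy of its result is stale, which is exactly the signal needed to invalidate cached query results.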

Service orchestration involves all processes concerned with tying together the Orestes servers as one coherent unit, exchanging shared state, managing tenants, and providing the FaaS environment for backend code execution. The replicated state service holds the application schema, configuration parameters, and the uninterpreted backend code functions. It is partitioned in order to scale with increasing numbers of tenants and amounts of stored metadata.
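The text does not specify how the state service is partitioned. As a minimal sketch under the assumption of simple hash partitioning over a fixed number of partitions, tenant keys could be mapped with a 32-bit FNV-1a hash; `fnv1a32` and `partitionFor` are illustrative names, and the hash is computed over UTF-16 code units, which equals the byte-wise hash only for ASCII keys.

```javascript
// Hypothetical sketch: assign a tenant's metadata to one of N partitions.
// 32-bit FNV-1a hash over the characters of the tenant ID.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return hash >>> 0;
}

function partitionFor(tenantId, numPartitions) {
  return fnv1a32(tenantId) % numPartitions;
}

console.log(partitionFor("tenant-42", 8)); // stable partition index in [0, 8)
```

Plain modulo hashing reassigns most tenants when the partition count changes; a production system would more likely use consistent hashing to limit such movement.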

The FaaS engine is co-located with each Orestes server and executes application-defined procedures (implemented in JavaScript/Node.js [TV10] in the Orestes prototype). Backend code can take the form of handlers that are executed in an event-based fashion upon CRUD operations. Alternatively, it can be established as explicit microservices that applications invoke directly. This approach to server-side code execution is employed in our BaaS architecture because business logic and validation rules often should not be disclosed to clients.
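Event-based backend code can be thought of as handlers registered per class and operation that run before a CRUD operation is applied. The following minimal sketch illustrates that idea; `HandlerRegistry`, its `on`/`apply` methods, and the validation rule are hypothetical and do not reproduce the Orestes handler API.

```javascript
// Hypothetical sketch: server-side handlers that run before CRUD operations.
// A handler may return a modified object or throw to reject the operation,
// keeping the validation rule on the server instead of in client code.
class HandlerRegistry {
  constructor() {
    this.handlers = new Map(); // "Class.operation" -> handler function
  }

  on(className, operation, handler) {
    this.handlers.set(`${className}.${operation}`, handler);
  }

  apply(className, operation, obj) {
    const handler = this.handlers.get(`${className}.${operation}`);
    return handler ? handler(obj) : obj; // no handler: pass through unchanged
  }
}

const registry = new HandlerRegistry();
registry.on("Message", "insert", (msg) => {
  const text = typeof msg.message === "string" ? msg.message.trim() : "";
  if (text.length === 0) throw new Error("Message text must not be empty");
  return { ...msg, message: text };
});

console.log(registry.apply("Message", "insert", { name: "Felix", message: "  Hello World. " }));
```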

The cluster orchestration is handled by a service that starts and stops servers automatically based on workloads and machine or network failures. At the physical level, the cluster is based on containerization (Docker Swarm [Swa17] in the Orestes prototype).
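The scaling policy of the prototype is not stated here. As a minimal sketch of a workload-based rule, assuming a hypothetical per-server capacity and fixed lower and upper bounds, the orchestration service could derive a target number of stateless server containers from the observed request rate and start or stop the difference.

```javascript
// Hypothetical sketch of a workload-based scaling rule for stateless servers.
function desiredServerCount(requestsPerSecond, capacityPerServer, min = 1, max = 20) {
  const needed = Math.ceil(requestsPerSecond / capacityPerServer);
  return Math.min(max, Math.max(min, needed)); // clamp to the allowed range
}

// Compare the target with the running servers to decide what to start or stop.
function scalingAction(running, target) {
  if (target > running) return { start: target - running, stop: 0 };
  if (target < running) return { start: 0, stop: running - target };
  return { start: 0, stop: 0 };
}

console.log(desiredServerCount(4500, 1000)); // 5
console.log(scalingAction(3, desiredServerCount(4500, 1000))); // { start: 2, stop: 0 }
```

Because the servers are stateless, a container that fails its health check can simply be replaced by a fresh one without migrating any session state.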

This architecture allows assigning many logical Orestes servers to physical servers in the form of individual containers [Mer14]. Pairs of Orestes server containers and FaaS containers form an isolated network that does not interact with internal services or other tenants, but allows free communication with the Internet and between the paired containers.

Multi-tenancy is therefore straightforward, as each tenant runs in a different pair of containers that can be deployed on the cluster with little overhead and scaled across physical machines.

Module interfaces with different data management and BaaS capabilities decouple the access to underlying database systems. The major interfaces are:

CRUD. The CRUD interface contains the abstractions to create, read, update, and delete objects. Depending on the database, these objects can be key-value pairs, documents, records, rows, or nodes in a graph. Orestes assumes that objects carry version numbers in order to allow caching and concurrency control. The CRUD interface has to be implemented for each database system.

Schema. The schema API allows creating and evolving the application's data model. Typically, this interface is provided as a default module by Orestes and mapped to the replicated state service. However, schemaful database systems like RDBMSs can implement this interface.

Orestes. The Orestes interface bundles all service-related information and actions. This includes configuration, service discovery, system health, caching metadata, and information about rate-limited users. The Orestes interfaces are not implemented for specific systems, but provided in a generic fashion.

Query. The query interface allows executing database-specific queries. Orestes makes no assumptions about the structure of the query, but requires that a list of objects is returned as the result and that pagination or cursor operations are supported. Implementing this interface is beneficial for performance: without it, Orestes cannot exploit database-internal optimizations and therefore has to fall back to full-table scans to filter data.

Prepared Query. Through the prepared query interface, queries can be registered in the database. On the one hand, this allows the database to parse and pre-optimize the query. On the other hand, it allows exposing only certain parameterizations of queries to users.

Index. The index API is closely tied to queries and enables the definition of explicit secondary indexes for various data types, ranging from primitive types such as strings to full-text and geospatial data.

File. The file interface allows storing and retrieving blobs, as well as serving them over the web as assets for websites. Some database systems explicitly support file storage (e.g., GridFS in MongoDB [CD13]) and can provide this capability. For systems that do not, Orestes falls back to the object store (S3 [Ama17a] in the prototype).

User. The user interface is concerned with the registration, management, and login of users via different protocols (e.g., OAuth [Har14]). As this interface can map to CRUD operations, its implementation is optional.

Code. The code API allows updating and retrieving server-side handlers and methods. The implementation in the prototype realizes a scalable Node.js tier as an FaaS.

Device. Installations of mobile apps are tracked through the device interface. This allows sending push notifications to specific groups of devices or individual users. The implementation in the prototype supports the two widespread push protocols by Google and Apple [YAD14].

Partial Update. The partial update API permits modifying database objects in place, to circumvent read-modify-write cycles that are prone to contention. Without explicit partial update support, Orestes will use optimistic concurrency control based on version numbers of objects.

Transaction. The transaction API supports ACID transactions over arbitrary objects. The default implementation provides scalable cache-aware transactions as described in Chapter 4. For transactional database systems, the interface can be overridden.

Asset. Using the asset API, clients can use Orestes as a CDN: Orestes will fetch provided URLs from a given origin and apply the same caching techniques as for any data directly stored in the service. The interface thus allows easy integration and acceleration of legacy systems.

Event Sockets. As the only non-HTTP interface, the event sockets API exposes real-time queries and event notifications. Clients can pose queries via WebSocket connections and receive updates to the respective query results in real time. The default implementation using InvaliDB provides horizontal scalability and abstracts entirely from the underlying data store.
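The version numbers assumed by the CRUD interface are also what enables optimistic concurrency control when a database lacks native partial updates: a write carries the version it was based on and is rejected if the stored version has moved on. This minimal in-memory sketch shows the mechanism and a read-modify-write loop that retries on conflicts; `VersionedStore`, `VersionConflictError`, and `readModifyWrite` are hypothetical names, not part of the Orestes interfaces.

```javascript
// Hypothetical in-memory sketch of version-based optimistic concurrency control.
class VersionConflictError extends Error {}

class VersionedStore {
  constructor() {
    this.objects = new Map(); // id -> { version, data }
  }

  insert(id, data) {
    if (this.objects.has(id)) throw new Error(`Object ${id} already exists`);
    this.objects.set(id, { version: 1, data: { ...data } });
    return 1;
  }

  load(id) {
    const entry = this.objects.get(id);
    return entry ? { version: entry.version, data: { ...entry.data } } : null;
  }

  // Conditional update: succeeds only if the caller saw the latest version.
  update(id, data, expectedVersion) {
    const entry = this.objects.get(id);
    if (!entry) throw new Error(`Object ${id} not found`);
    if (entry.version !== expectedVersion) {
      throw new VersionConflictError(`Expected version ${expectedVersion}, found ${entry.version}`);
    }
    entry.version += 1;
    entry.data = { ...data };
    return entry.version;
  }
}

// Fallback when no native partial update exists: read, modify, and write
// conditionally, retrying if a concurrent writer changed the object meanwhile.
function readModifyWrite(store, id, modify, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const current = store.load(id);
    try {
      return store.update(id, modify(current.data), current.version);
    } catch (err) {
      if (!(err instanceof VersionConflictError)) throw err;
    }
  }
  throw new Error(`Gave up on ${id} after ${maxRetries} conflicting attempts`);
}
```

Under contention such a loop may need several round trips, which is why a native partial update interface is preferable where the database offers one.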

Integration of new database systems into the polyglot architecture of Orestes is easy, because many features are available out of the box through default modules. The Orestes server prototype is implemented in Java 8 and uses the Jetty framework [Jet] as a central HTTP networking component. The REST API uses a declarative framework that we implemented and made available as open source⁵. An overview of the internal server architecture is given in Section 3.5.9.

Network

Orestes relies on HTTP as the core protocol and uses web caches through their standardized expiration-based caching model. The Orestes servers can be exposed through load balancers or reverse proxy caches, as their statelessness allows handling arbitrary client requests without sticky sessions. Orestes has an extensible purging interface to support a broad set of invalidation-based caches with non-standard invalidation APIs. Optimizing network performance (see Section 2.3) is handled by Orestes and does not require specific support from the database systems. Optimizations include fast TCP and TLS handshakes for lower connection establishment latency, as well as HTTP/2 and its new performance mechanisms.

⁵ https://github.com/Baqend/restful-jetty
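The exemplary request in Figure 3.6 (GET /db/{bucket}/{id}, answered with Cache-Control: max-age=60 and ETag: "3") shows how object versions map onto standard HTTP caching. The following minimal sketch assumes that the object's version number is used directly as the entity tag and that a fixed TTL is passed in; `buildResponse` is a hypothetical helper, and the TTL choice in Orestes itself is not modeled. If-None-Match handling is simplified to a single tag.

```javascript
// Hypothetical sketch: answer a read with expiration-based caching headers and
// support conditional revalidation via If-None-Match.
function buildResponse(obj, requestHeaders = {}, ttlSeconds = 60) {
  const etag = `"${obj.version}"`; // the version number doubles as entity tag
  const headers = {
    "Cache-Control": `max-age=${ttlSeconds}`,
    ETag: etag,
  };
  if (requestHeaders["if-none-match"] === etag) {
    // The client or cache already holds this version, so no body is sent.
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body: JSON.stringify(obj) };
}

const stored = { id: "42", version: 3, name: "Felix" };
console.log(buildResponse(stored).status); // 200
console.log(buildResponse(stored, { "if-none-match": '"3"' }).status); // 304
```

Any standard web cache can serve such a response for up to 60 seconds without contacting the server and revalidate it cheaply afterwards.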

Orestes can be combined with any type of Content Delivery Network (CDN) for invalidation-based caching. In the general case, Orestes treats the CDN like any other cache. However, if the CDN supports the Varnish Configuration Language (VCL) [Kam17], the Orestes prototype enhances the cache with the ability to validate access tokens used for authentication, perform authorization on cached resources, and apply preventive measures against distributed denial of service (DDoS) attacks, in particular by serving rate-limited requesters from the cache only. This form of edge computing minimizes latency for protected cached data without compromising security.
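Treating every cache uniformly despite different invalidation APIs suggests an adapter structure. This minimal sketch assumes that each cache is wrapped in an adapter exposing `purge(urls)` and that a write triggers a purge of the object's URL in all registered caches; `PurgeDispatcher` and `inMemoryCacheAdapter` are hypothetical, and a real CDN adapter would issue asynchronous HTTP purge requests instead of deleting map entries.

```javascript
// Hypothetical sketch of an extensible purging interface: each cache type is
// wrapped in an adapter exposing purge(urls), regardless of its native API.
class PurgeDispatcher {
  constructor() {
    this.adapters = [];
  }

  register(adapter) {
    this.adapters.push(adapter);
  }

  // Called after a write: invalidate the object's URL in every registered cache.
  purgeObject(bucket, id) {
    const urls = [`/db/${bucket}/${id}`];
    for (const adapter of this.adapters) adapter.purge(urls);
    return urls;
  }
}

// Example adapter for a simple in-memory cache keyed by URL.
function inMemoryCacheAdapter(cache) {
  return { purge: (urls) => urls.forEach((url) => cache.delete(url)) };
}

const edgeCache = new Map([["/db/Message/1", "{...}"], ["/db/Message/2", "{...}"]]);
const dispatcher = new PurgeDispatcher();
dispatcher.register(inMemoryCacheAdapter(edgeCache));
dispatcher.purgeObject("Message", "1");
console.log([...edgeCache.keys()]); // [ '/db/Message/2' ]
```

Purging single object URLs is the simple case; cached query results affected by the same write are identified through the query matching described earlier.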

Client

Clients can either be pure DBaaS clients (application servers) in three-tier architectures or BaaS clients (browsers and mobile devices) in two-tier architectures. While Orestes can be adopted in any programming language and platform that supports HTTP communication, persistence APIs provide additional benefits. In particular, they integrate seamlessly with the application data model by exposing objects as native classes in the respective language. Additionally, developers get easy-to-use APIs for working with objects and executing queries, while the persistence API takes care of managing object lifecycles and ensuring that identical objects share an identity in the scope of a persistence session [TGPM17]. In the following, we describe the concrete persistence APIs of the Orestes prototype; similar APIs could be developed for other programming languages as well.
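The guarantee that identical objects share an identity within a persistence session is commonly realized with an identity map. A minimal sketch of that pattern follows; the `EntityManager` class and the synchronous `loadFromServer` callback (standing in for a GET /db/{bucket}/{id} request) are hypothetical simplifications, since the real SDK loads objects asynchronously.

```javascript
// Hypothetical sketch of an identity map inside an entity manager: loading the
// same object twice within one session yields the same JavaScript instance.
class EntityManager {
  constructor(loadFromServer) {
    this.loadFromServer = loadFromServer;
    this.identityMap = new Map(); // "bucket/id" -> managed object
  }

  load(bucket, id) {
    const key = `${bucket}/${id}`;
    if (!this.identityMap.has(key)) {
      this.identityMap.set(key, this.loadFromServer(bucket, id));
    }
    return this.identityMap.get(key);
  }
}

const em = new EntityManager((bucket, id) => ({ bucket, id }));
console.log(em.load("User", "1") === em.load("User", "1")); // true
```

Because both references point to one instance, a change made through one of them is visible through the other, and the entity manager has a single place to track it.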

The primary language used for clients in the Orestes prototype is JavaScript. JavaScript allows the persistence API to be applied to any type of website and web app, as well as to mobile application frameworks based on web views (e.g., Ionic [Hil16]) or JavaScript engines (e.g., React Native, Titanium, NativeScript [Rea17]). The Orestes JavaScript SDK is based on concepts of the Java Persistence API [DeM09], from which it inherits the model of entity managers, dirty checking, persistence by reachability, and object lifecycles.

However, it extends this model in many data management aspects, such as explicit support for semi-structured schemas and continuous queries. The BaaS functions are also deeply integrated into the persistence API. For example, the client and server can share data validation code, and high-level APIs support login and registration of users.
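Dirty checking, which the SDK inherits from the JPA model, can be sketched by keeping a snapshot of an object's state as last seen from the server and comparing against it before persisting. The helper names below are hypothetical, and a JSON snapshot only covers JSON-serializable fields (a reordering of keys would also count as a change in `isDirty`).

```javascript
// Hypothetical sketch of snapshot-based dirty checking.
function snapshot(obj) {
  return JSON.stringify(obj);
}

function isDirty(obj, lastSnapshot) {
  return snapshot(obj) !== lastSnapshot;
}

// Lists the fields whose values differ from the snapshot, so that only
// those need to be sent to the server.
function changedFields(obj, lastSnapshot) {
  const before = JSON.parse(lastSnapshot);
  const keys = new Set([...Object.keys(before), ...Object.keys(obj)]);
  return [...keys].filter(
    (key) => JSON.stringify(before[key]) !== JSON.stringify(obj[key])
  );
}

const user = { name: "Felix", age: 30 };
const loaded = snapshot(user);
user.age = 31;
console.log(isDirty(user, loaded), changedFields(user, loaded)); // true [ 'age' ]
```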

Listing 3.1 shows an example of the JavaScript API. It makes use of language features introduced with the ECMAScript 2015 standard [Int17]. All asynchronous operations in the SDK are based on the Promise concept (also known as futures) to avoid unstructured callback code.

```javascript
 1 // Create and insert a new message object
 2 const msg = new DB.Message();
 3 msg.name = 'Felix';
 4 msg.message = 'Hello World.';
 5 msg.insert().then(() => console.log('insert completed.'));
 6 // Perform a query
 7 DB.Message.find()
 8   .matches('name', /^Fel/)
 9   .descending('createdAt')
10   .limit(30)
11   .resultList()
12   .then((result) => console.log('result received.'));
13 // Register a user
14 DB.User.register(new DB.User({ 'username': 'test@example.com' }), 'password');
15 // Load a user and perform a partial, commutative update
16 DB.User.load(userId).then((user) => {
17   const update = user.partialUpdate()
18     .set('nickname', 'Alice') // sets 'nickname' to 'Alice'
19     .inc('age', 1); // increments 'age' property
20   return update.execute();
21 });
```

Listing 3.1: Example of using the Orestes JavaScript SDK.

In the example, first, a new Message object is created and inserted (lines 1 to 5). The Message class is generated by the SDK based on the schema defined at the server and inherits methods for CRUD operations as well as automatic dirty checking, which detects when changes need to be persisted to the server. All object instances are managed by an entity manager that guarantees referential integrity and is also responsible for cache coherence. Next, a query is executed using a builder pattern (lines 6 to 12). After that, a user registration is performed as a typical BaaS operation (lines 13 to 14). Afterwards, a user is fetched by its ID, and a commutative partial update is performed as an example of a more advanced data management operation (lines 15 to 21).
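The partial update at the end of Listing 3.1 sends operations instead of a complete object. A minimal sketch of how such operations could be applied on the server, assuming a hypothetical operation format `{ op, field, value }` and modeling only the `set` and `inc` operators used in the listing, shows why the increment is commutative: two increments applied in either order yield the same state, so no read-modify-write cycle is needed (whereas two `set` operations on the same field do not commute).

```javascript
// Hypothetical sketch: apply partial update operations to a stored object
// without transferring or overwriting the whole object.
function applyPartialUpdate(obj, operations) {
  const result = { ...obj };
  for (const { op, field, value } of operations) {
    switch (op) {
      case "set":
        result[field] = value;
        break;
      case "inc":
        result[field] = (result[field] ?? 0) + value; // missing field counts as 0
        break;
      default:
        throw new Error(`Unsupported operation: ${op}`);
    }
  }
  return result;
}

const stored = { nickname: "Bob", age: 30 };
console.log(applyPartialUpdate(stored, [
  { op: "set", field: "nickname", value: "Alice" },
  { op: "inc", field: "age", value: 1 },
])); // { nickname: 'Alice', age: 31 }
```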

Besides JavaScript, the Orestes prototype also supports TypeScript [Mic16], a statically typed superset of JavaScript [Int17] that is transpiled to JavaScript. The advantage of TypeScript is that Orestes can generate so-called typings from the schema. Typings make the data model available for compile-time type checking and code completion, thus preventing potential bugs. The SDK is complemented by a Command Line Interface (CLI) that facilitates the development and deployment process. The CLI allows starting and stopping tenants as well as deploying local folders, schemas, and backend code to staging and production environments. Additionally, the CLI supports cloning boilerplate projects for different frontend environments and frameworks (e.g., React, React Native, Angular, Bootstrap, Ionic, and Vue).
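Generating typings from the schema essentially translates each class and field type into a TypeScript declaration. This minimal sketch assumes a hypothetical schema format (class name mapped to field type names such as `String`, `Integer`, `Boolean`, `DateTime`) and emits plain interfaces; the actual Orestes schema format and the generated typings may differ.

```javascript
// Hypothetical sketch: derive TypeScript declarations from a schema description.
const TYPE_MAP = {
  String: "string",
  Integer: "number",
  Double: "number",
  Boolean: "boolean",
  DateTime: "Date",
};

function generateTypings(schema) {
  return Object.entries(schema)
    .map(([className, fields]) => {
      const lines = Object.entries(fields).map(
        ([field, type]) => `  ${field}: ${TYPE_MAP[type] ?? "any"};` // unknown types fall back to any
      );
      return `export interface ${className} {\n${lines.join("\n")}\n}`;
    })
    .join("\n\n");
}

console.log(generateTypings({
  Message: { name: "String", message: "String", createdAt: "DateTime" },
}));
```

With such declarations in place, the TypeScript compiler rejects accesses to fields that do not exist in the schema, such as a misspelled property name.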

The Java API of the Orestes prototype is a low-level API whose primary purpose is to provide a foundation for integration and performance tests. It replaces a previous implementation of the Java Data Objects (JDO) standard [Rus03b] that does not match the requirements of BaaS systems well. A different Java API is used for Android, which offers SDK abstractions geared particularly towards the requirements of mobile devices with limited computing and power capacities [Dom18]. A similar iOS SDK for Orestes is based on Swift [Sch17]. As the Orestes interface is additionally specified in the OpenAPI standard [Ope17], client SDKs for roughly 30 programming languages can be generated automatically. These do not provide advanced persistence features, but offer easy access to the REST interface through a typed API.

In the document Low Latency for Cloud Data Management (pages 107-114)