
Dr. Annette Bieniusa, M.Sc. Peter Zeller, M.Sc. Mathias Weber

TU Kaiserslautern

Department of Computer Science (Fachbereich Informatik), Software Technology Group (AG Softwaretechnik)

Exercise Project: Programming Distributed Systems (SS 2018)

This exercise sheet will be your final project for this course. Successfully implementing this project is a requirement for being admitted to the exams.

You have to solve and submit this task either individually or as a team together with one other student. If you work in a group, it must be visible to us that every group member worked on the code (e.g. in the Git log).

Submit your code via your Git repository before Wednesday, July 11, 23:59. Either create a merge request or notify us via mail when your project is ready to be reviewed and tested. We can also give you feedback on work in progress.

You can find a template for the project in the shared repository under

https://softech-git.informatik.uni-kl.de/progdist18/progdist_material/tree/master/code/minidote_template.

Homework policy: Programming is a creative process. Individuals must reach their own understanding of problems and discover paths to their solutions. During this time, discussions with friends and colleagues are encouraged, and they must be acknowledged when you submit your written work. When the time comes to write code, however, such discussions are no longer appropriate. Each program must be entirely your own group’s work!

Do not, under any circumstances, permit any student from another group to see any part of your program, and do not permit yourself to see any part of another group’s program. In particular, you may not test or debug another group’s code, nor may you have someone from another group test or debug your code. If you can’t get code to work, consult the teaching assistant! You may look in the library (including the internet, etc.) for ideas on how to solve homework problems, just as you may discuss problems with your classmates. All sources must be acknowledged. The standard penalty for violating these rules in the assignment is to not pass this exercise.

(The above policies were adapted from policies used by Norman Ramsey at Purdue University in Spring 1996.)

1 Final Project: A causally consistent CRDT database

For the final project you will develop a replicated data store named “Minidote”[1]. The database should be able to run replicated on multiple (2 to 10) machines. Each replica is a full replica that (eventually) stores all the data. The database must be highly available and provide low latency, so every replica should be able to handle requests, even if it is disconnected from the others.

[1] Named after “Antidote”, a planet-scale, available, transactional database with strong semantics. You are free to change the name of your project.


Data model: Minidote is a key-CRDT store: each replicated data object is stored under a key. The store provides an API to read the current state of an object given its key and to update objects. The supported update operations depend on the data type of the object. For example, a counter supports increment and decrement operations, while a set supports add and remove operations.

We will use the Antidote CRDT library[2] to support a variety of replicated data types.

Check the readme file of the library and the lecture on CRDTs for information on how to use the library.

API: We use a Protocol Buffer[3] interface to let clients written in a variety of languages interact with our database. We will reuse the protocol buffer interface of the Antidote database, so that we can reuse existing clients and benchmarks. The code to handle protocol buffer requests is already provided in the template. It will call the read_objects and update_objects functions in the minidote_server module, which you have to implement.

The details of this API are explained below in 1.1.

Consistency model: The data store must provide the following consistency guarantees:

Eventual visibility: Every event eventually becomes visible at all replicas.

Causality: If $e_1 \xrightarrow{\text{vis}} e_2$ and $e_2 \xrightarrow{\text{vis}} e_3$, then $e_1 \xrightarrow{\text{vis}} e_3$.

Correct return values: Each CRDT has a specification, which maps an abstract execution to a return value (hint: the CRDT library ensures correct return values if you use it correctly).

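For example, a multi-value register guarantees:

$$v \in \mathit{rval}(e) \iff \exists e_1 \in E.\; \mathit{op}(e_1) = \mathit{assign}(v) \wedge \nexists e_2 \in E.\; e_1 \xrightarrow{\text{vis}} e_2 \wedge \exists v'.\; \mathit{op}(e_2) = \mathit{assign}(v')$$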
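The specification can be checked mechanically on small examples. The following minimal Erlang sketch is a hypothetical helper module (it is not part of the Antidote CRDT library) that evaluates the right-hand side for an explicit list of events playing the role of $E$. It assumes that each assign event is tagged with a vector clock (a map from node to counter) and that $e_1 \xrightarrow{\text{vis}} e_2$ holds exactly when the clock of $e_1$ is strictly smaller than the clock of $e_2$.

```erlang
%% Hypothetical sketch: evaluates the multi-value register specification
%% over an explicit list of events E. Visibility is modelled by vector-clock
%% order (an assumption made for this sketch, not part of the specification).
-module(mvr_spec_sketch).
-export([rval/1, leq/2, lt/2]).

%% An assign event is {assign, Value, VC}, where VC is a map Node => Counter.

%% VC1 =< VC2 iff every entry of VC1 is =< the matching entry of VC2
%% (missing entries count as 0).
leq(VC1, VC2) ->
    lists:all(fun({Node, C}) -> C =< maps:get(Node, VC2, 0) end,
              maps:to_list(VC1)).

%% Strict order: VC1 =< VC2 and not VC2 =< VC1.
lt(VC1, VC2) ->
    leq(VC1, VC2) andalso not leq(VC2, VC1).

%% V is in rval iff some assign(V) event e1 in Events is not followed
%% (via visibility) by any other assign event e2 in Events.
rval(Events) ->
    Overwritten = fun(VC1) ->
                          lists:any(fun({assign, _, VC2}) -> lt(VC1, VC2);
                                       (_Other) -> false
                                    end, Events)
                  end,
    lists:usort([V || {assign, V, VC1} <- Events, not Overwritten(VC1)]).
```

For two concurrent assignments both values are returned: `rval([{assign, x, #{a => 1}}, {assign, y, #{b => 1}}])` yields `[x, y]`, while adding an assignment `{assign, z, #{a => 1, b => 1}}` that has seen both makes the result `[z]`.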

Atomic operations: When several objects are updated with one call to update_objects, it should not be possible to observe a state where some of the updates are visible and others are not.

Session guarantees: Each call $e_1$ to update_objects and read_objects returns a clock that identifies the database version after the operation was completed.

This clock can be passed to a subsequent API call $e_2$. In this case, it must be guaranteed that $e_1 \xrightarrow{\text{vis}} e_2$.
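One possible (not prescribed) representation for clock() is a vector clock that maps each node to the number of its updates a replica has applied. Under that assumption, the session guarantee reduces to a comparison: a replica may answer a request carrying a client clock only once its own clock dominates it. The module and function names below are hypothetical.

```erlang
%% Hypothetical sketch: vector clocks as one possible representation of
%% clock(). A clock is a map Node => number of updates from Node applied.
-module(vc_sketch).
-export([new/0, increment/2, leq/2, merge/2, can_serve/2]).

new() -> #{}.

%% Records one more update originating at Node.
increment(Node, VC) ->
    maps:update_with(Node, fun(C) -> C + 1 end, 1, VC).

%% VC1 =< VC2 iff every entry of VC1 is =< the matching entry of VC2
%% (missing entries count as 0).
leq(VC1, VC2) ->
    lists:all(fun({Node, C}) -> C =< maps:get(Node, VC2, 0) end,
              maps:to_list(VC1)).

%% Pointwise maximum of two clocks.
merge(VC1, VC2) ->
    maps:fold(fun(Node, C, Acc) ->
                      maps:update_with(Node, fun(C2) -> max(C, C2) end, C, Acc)
              end, VC2, VC1).

%% A request carrying ClientClock (or the atom ignore) may be answered by a
%% replica whose state is at ReplicaClock only if the replica has already
%% applied everything the client has observed.
can_serve(ignore, _ReplicaClock) -> true;
can_serve(ClientClock, ReplicaClock) -> leq(ClientClock, ReplicaClock).
```

When `can_serve/2` returns false, a server would typically postpone the reply (for example by keeping the caller in its gen_server state and answering later with `gen_server:reply/2`) rather than blocking, and would return a clock such as `merge(ClientClock, ReplicaClock)` once the request is served.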

Durability: After an update operation returns, the value must be guaranteed to be persistently stored on at least one machine. The update must not be lost when the machine crashes and the database restarts later. Read operations must not return results that are not yet persistently stored.

Fault model: We assume a non-Byzantine fault model. Messages might be lost or delayed. Nodes may crash and restart after some time.

In addition, the system should be able to handle unpredictable errors – if a single Erlang process crashes, supervisors should restart the relevant part of the system.

[2] https://github.com/SyncFree/antidote_crdt

[3] https://developers.google.com/protocol-buffers/


Performance requirements: We are not trying to implement the world’s fastest data store here. It is more important to be correct than to be fast. However, you should try to achieve decent performance and avoid performance degrading significantly when the system keeps running for a long time.

For the following performance requirements, we assume a system with 3 machines, each running one replica of the database. Each machine runs Linux and has an Intel Core i5-4310M CPU and 8 GB of RAM. The connection between nodes has an average latency below 50 ms and allows for a throughput of at least 10 Mbit/s.

Staleness: Under normal operation, updates should be delivered in a timely manner: if all nodes are connected, the latency between nodes is below 100 ms, and there are fewer than 10 updates per second with less than 1 kB of data per update, then updates should become visible at the other nodes within 5 seconds.

After a network problem (here: a node was disconnected for 10 seconds), updates should become visible within 30 seconds after the network connection has been restored to normal operation.

Latency: Assume a setup where 10 threads per node sequentially issue operations (50% reads, 50% writes) on counter objects. The keys are chosen randomly with a power-law probability distribution[4].

The average latency for write operations should be less than 100 ms, and 95% of write operations should have a latency below 200 ms.

For read operations, the average latency should be less than 50 ms.

Throughput: For the same setup as above but with 100 threads, the system should be able to perform at least 500 operations per second.

Testing: In the initial template you will find only some very basic tests. Over time, we will make more tests and benchmarks available.

To use our unit tests, you need to stick to the architecture described below.

Documentation and code style: We expect that you document your code with comments and write clean and readable code.

Architecture: There are three main components which you need to implement. In the overview below, these components are marked with solid borders. Components for which we provide libraries are marked with a dashed border.

[4] More precisely: a Pareto distribution, where keys 0 to 1000 are chosen 80% of the time.


[Figure: architecture overview. Components to implement: key-value server (1.1, minidote_server), logged causal order broadcast (1.2, minidote_logged_causal_broadcast), persistent log (1.3, minidote_op_log). Provided components: link layer (link_layer, link_layer_distr_erl), disk_log, Antidote CRDTs, protocol buffer interface (minidote_pb). Clients 1, 2, ... connect through the protocol buffer interface; the link layer connects to the other nodes.]

The protocol buffer (PB) interface manages a set of sockets (using the ranch library).

Clients connect through these sockets and send requests as PB messages. The PB module translates these messages to Erlang terms, and calls the key-value server. The result from the key-value server is again encoded into a PB response and sent back to the client.

1.1 Key-value server

Module name: minidote_server

The key-value server holds the state of all objects in memory under their respective key. A key is a 3-tuple consisting of a main identifier (Key), the CRDT type of the object (Type), and a namespace (Bucket).

-type key() :: {Key :: binary(), Type :: antidote_crdt:typ(), Bucket :: binary()}.

The key-value server should provide the API described below. You can choose an arbitrary representation for the type clock().

-type server() :: pid() | atom().

% Starts the key-value server and returns the process identifier of the server.
% The process will be registered locally under the given ServerName.
% During initialization the minidote_logged_causal_broadcast is instantiated.
% Before the server can answer any other requests, it recovers from the log.
% It will receive broadcast messages from the broadcast module and a
% log_recovery_done message once all broadcast messages from the log
% have been sent.
-spec start_link(atom()) -> {ok, server()} | ignore | {error, Error :: any()}.
start_link(ServerName) -> ...

% Takes a list of keys and returns the values of the corresponding objects.
% To get the value of a CRDT, it uses the antidote_crdt:value function.
% If there are no updates for a key, the initial value for the given Type is
% returned.
% The function takes a clock value, which can be the atom ignore or come from
% the result of another call of read_objects or update_objects.
% If the clock comes from another call, it is guaranteed that this
% read observes a state that is not older than the state after the
% previous call.
-spec read_objects(server(), [key()], clock() | ignore) ->
    {ok, [any()], clock()}
    | {error, any()}.
read_objects(Server, Objects, Clock) -> ...

% Takes a list of {Key, Op, Args} updates and executes them atomically.
% If several updates are given for the same key, the updates are performed
% sequentially from left to right.
% The function takes a clock value, which can be the atom ignore or come from
% the result of another call of read_objects or update_objects.
% If the clock comes from another call, it is guaranteed that this
% update operation is applied on a state that is not older than the state
% after the previous call.
-spec update_objects(server(), [{key(), Op :: atom(), Args :: any()}], clock() | ignore) ->
    {ok, clock()}
    | {error, any()}.
update_objects(Server, Updates, Clock) -> ...
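The "left to right" rule and the atomicity requirement can be illustrated on a plain in-memory map from key() to state. To stay self-contained, this hypothetical sketch uses a toy counter type named `counter` instead of the Antidote CRDT library; a real implementation would use the CRDT library's types and operations instead.

```erlang
%% Hypothetical sketch: applies one update_objects batch to an in-memory
%% store, a map from key() to state. Updates are applied strictly from left
%% to right. Erlang data is immutable, so the caller's old store stays
%% untouched unless the whole batch succeeds; a server that replaces its
%% state only with the result of apply_batch/2 never exposes a partial batch.
-module(batch_apply_sketch).
-export([apply_batch/2]).

%% Toy stand-in for a CRDT type (assumption, NOT antidote_crdt).
initial(counter) -> 0;
initial(_Other) -> undefined.

apply_op(counter, increment, N, State) when is_integer(N) -> {ok, State + N};
apply_op(counter, decrement, N, State) when is_integer(N) -> {ok, State - N};
apply_op(Type, Op, _Args, _State) -> {error, {unsupported, Type, Op}}.

%% Returns {ok, NewStore} or the first error encountered.
apply_batch(Updates, Store) ->
    apply_all(Updates, Store).

apply_all([], Acc) ->
    {ok, Acc};
apply_all([{{_Key, Type, _Bucket} = K, Op, Args} | Rest], Acc) ->
    State = maps:get(K, Acc, initial(Type)),
    case apply_op(Type, Op, Args, State) of
        {ok, NewState} -> apply_all(Rest, maps:put(K, NewState, Acc));
        {error, _} = Err -> Err
    end.
```

As a design suggestion (not a requirement stated in this sheet), the whole batch could also be broadcast and logged as a single message, so that remote replicas apply it in one step as well.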

1.2 Logged causal order broadcast

Module name: minidote_logged_causal_broadcast

This module provides functionality similar to the causal broadcast algorithm you implemented for Exercise 2. In addition to the features you implemented back then, the logged causal order broadcast module should be able to handle crashes. To this end, each message that is broadcast is stored in a log (using the persistent log module, see 1.3). On startup, this log is read and used to redeliver all messages from the log to all nodes (including the sender). When all messages from the log have been delivered, a log_recovery_done message is sent.

The module uses the link_layer and link_layer_distr_erl modules to communicate with other nodes. As a minor optimization, the broadcast function only delivers the message to the other nodes and not to the sender (the sender receives its own messages only during recovery).
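The delivery condition at the core of such a causal broadcast can be sketched with vector clocks, one common technique that this sheet does not prescribe. In this hypothetical sketch a message is a tuple `{Sender, MsgVC, Payload}`, where `MsgVC` is the sender's clock right after broadcasting; logging, the link layer and recovery are omitted.

```erlang
%% Hypothetical sketch of the vector-clock delivery condition for causal
%% broadcast. Clocks are maps Node => number of messages delivered from Node.
-module(causal_delivery_sketch).
-export([deliverable/3, deliver_pending/2]).

%% A message is deliverable at a replica with clock LocalVC iff it is the
%% next message from Sender and all messages it depends on from other
%% nodes have already been delivered.
deliverable(Sender, MsgVC, LocalVC) ->
    maps:get(Sender, MsgVC, 0) =:= maps:get(Sender, LocalVC, 0) + 1
        andalso lists:all(fun({Node, C}) ->
                                  Node =:= Sender orelse C =< maps:get(Node, LocalVC, 0)
                          end, maps:to_list(MsgVC)).

%% Repeatedly delivers buffered messages whose condition holds.
%% Returns {Payloads in delivery order, still pending messages, new clock}.
deliver_pending(Pending, LocalVC) ->
    deliver_pending(Pending, LocalVC, []).

deliver_pending(Pending, LocalVC, DeliveredRev) ->
    Ready = fun({S, VC, _Payload}) -> deliverable(S, VC, LocalVC) end,
    case lists:partition(Ready, Pending) of
        {[], Rest} ->
            {lists:reverse(DeliveredRev), Rest, LocalVC};
        {[{S, _VC, Payload} = Msg | _], _} ->
            NewVC = maps:update_with(S, fun(C) -> C + 1 end, 1, LocalVC),
            deliver_pending(lists:delete(Msg, Pending), NewVC, [Payload | DeliveredRev])
    end.
```

A message whose dependencies are still missing simply stays in the pending list; for example, if `{b, #{a => 1, b => 1}, m2}` arrives before `{a, #{a => 1}, m1}`, both are delivered in the order `[m1, m2]`.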

% Starts a process handling the broadcast algorithm.
% On success the function returns a tuple {ok, Beb}, where Beb is a process-id
% used in later calls to broadcast (see below).
% The first argument (RespondTo) is a process-id.
% When delivering a broadcast message Msg, the tuple {rco_deliver, Msg}
% should be sent to this process.
% The second argument is a ServerName that is passed to the log module's
% start_link function.
% After initialization, recovery from the logs starts automatically.
% Each message in the log is sent to the RespondTo process.
% When recovery is complete, a message log_recovery_done is sent to the
% RespondTo process.
-spec start_link(pid(), atom()) -> {ok, pid()} | ignore | {error, Error :: any()}.
start_link(RespondTo, ServerName) -> ...

% Broadcasts a message to all other nodes in the system.
% The first argument is the process-id returned by start_link.
% The second argument is the message to send.
% When the function returns 'ok', it is guaranteed that the
% message has been persistently stored to disk.
-spec broadcast(pid(), any()) -> ok.
broadcast(B, Msg) -> ...

% Stops the broadcast process.
-spec stop(pid()) -> any().
stop(B) -> ...

% Returns the name of the current node.
-spec this_node(pid()) -> any().
this_node(B) -> ...


1.3 Persistent Log

Module name: minidote_op_log

This module implements persistent storage of a log. The files created by this module should be stored in a folder that can be configured with the environment variable OP_LOG_DIR, which should default to "data/op_log/".

The log keeps a sequence of log entries for each node in the system. Each log entry consists of an index number and some data. For each node, the logged entries are indexed consecutively starting from 1.

-type log_entry() :: {Index :: pos_integer(), Data :: any()}.

% Starts the server.
% The first argument is a name for the server.
% Any files opened by this module must include the ServerName in the file path
% to ensure that we can run local tests without name clashes.
% The second argument is the process which receives recovery messages.
% After starting, log recovery starts automatically.
% For each entry in the log, a message {log_recovery, Node, {Index, Data}}
% is sent to the RecoveryReceiver, and when recovery is finished, a message
% 'log_recovery_done' is sent.
-spec start_link(atom(), pid()) -> {ok, pid()} | ignore | {error, Error :: any()}.
start_link(ServerName, RecoveryReceiver) -> ...

% Adds a log entry to the end of the log.
% Server is the process returned by start_link.
% Node is the node() where the message originally came from.
% Entry is the log entry to store, consisting of index and data.
% When the function returns 'ok', the entry is guaranteed to be persistently stored.
% When an entry with the same index already exists for the given node,
% an error is returned. If the previous index does not exist, the
% call waits for it to be added.
-spec add_log_entry(server(), node(), log_entry()) -> ok | {error, Reason :: term()}.
add_log_entry(Server, Node, Entry) -> ...
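Because indices are consecutive per node and start at 1, the three cases described for add_log_entry can be decided from the last stored index alone. This hypothetical sketch keeps the log in memory as a map `Node => [log_entry()]` (newest first) instead of using disk_log, and returns `{wait, Log}` where the real function would block.

```erlang
%% Hypothetical sketch of the index rules for add_log_entry/3, using an
%% in-memory map Node => [log_entry()] (newest first) instead of disk_log.
-module(op_log_index_sketch).
-export([classify/3, add/3]).

%% Decides what add_log_entry should do with Index for Node:
%%   ok        - Index is the next consecutive index, so it can be appended;
%%   duplicate - an entry with this index already exists (return an error);
%%   wait      - the previous index is still missing (defer the call).
classify(Log, Node, Index) ->
    Last = case maps:get(Node, Log, []) of
               [] -> 0;
               [{LastIndex, _Data} | _] -> LastIndex
           end,
    if
        Index =< Last -> duplicate;
        Index =:= Last + 1 -> ok;
        true -> wait
    end.

add(Log, Node, {Index, _Data} = Entry) ->
    case classify(Log, Node, Index) of
        ok -> {ok, maps:update_with(Node, fun(Es) -> [Entry | Es] end, [Entry], Log)};
        duplicate -> {error, {duplicate_index, Index}};
        wait -> {wait, Log}
    end.
```

In a gen_server, the `wait` case could correspond to keeping the caller's `From` in the server state and answering with `gen_server:reply/2` once the missing index has been added.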

% Reads all log entries belonging to a given node and in a certain range.
% The function works similarly to lists:foldl for reading the entries.
% Server is the process returned by start_link.
% Node is the node() where the message originally came from.
% FirstIndex is the first index to read.
% LastIndex is the last index to read (or 'all' for reading all entries).
% F is the fold function, which takes a single log entry and the current
% accumulator and returns the new accumulator value.
% Acc is the initial accumulator value.
% Returns the accumulator value after reading all matching log entries.
-spec read_log_entries(server(), node(), integer(), integer() | all,
                       fun((log_entry(), Acc) -> Acc), Acc) -> {ok, Acc}.
read_log_entries(Server, Node, FirstIndex, LastIndex, F, Acc) -> ...
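The fold semantics can be sketched on an in-memory list of one node's entries (oldest first). This hypothetical helper omits the Server and Node arguments and only shows how FirstIndex, LastIndex (including `all`) and the fold function interact.

```erlang
%% Hypothetical sketch of the fold semantics of read_log_entries/6 over an
%% in-memory list of log entries for a single node, ordered by index.
-module(op_log_fold_sketch).
-export([read_log_entries/5]).

read_log_entries(Entries, FirstIndex, LastIndex, F, Acc0) ->
    %% Keep entries with FirstIndex =< Index =< LastIndex ('all' = no upper bound).
    InRange = fun({Index, _Data}) ->
                      Index >= FirstIndex andalso
                          (LastIndex =:= all orelse Index =< LastIndex)
              end,
    %% F is called as F(Entry, Acc), exactly like the fun given to lists:foldl/3.
    {ok, lists:foldl(F, Acc0, lists:filter(InRange, Entries))}.
```

For example, `read_log_entries([{1, a}, {2, b}, {3, c}, {4, d}], 2, 3, fun({_, D}, Acc) -> [D | Acc] end, [])` returns `{ok, [c, b]}`, since entries are folded in index order and each one is prepended.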

Hint: You can use Erlang’s disk_log module for implementing the persistent log. The sync function can be used to ensure that data is persistently stored on disk. However, it can be slow, so you might not want to call it for every single call of add_log_entry. If there are concurrent requests, you can use a single call to sync for all of them.
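A minimal sketch of this group-commit idea with disk_log follows. The module name, the file layout `OP_LOG_DIR/<ServerName>/op.log` and the function names are assumptions for illustration; the disk_log and os calls are the standard OTP ones.

```erlang
%% Hypothetical sketch: storing log entries with Erlang's disk_log and
%% syncing once per batch (group commit). File layout and function names
%% are assumptions, not the required design.
-module(disk_log_sketch).
-export([open/1, append_batch/2, read_all/1, close/1]).

%% Opens (or creates) a log under OP_LOG_DIR (default "data/op_log/"),
%% including ServerName in the path to avoid clashes between local tests.
open(ServerName) ->
    Dir = os:getenv("OP_LOG_DIR", "data/op_log/"),
    File = filename:join([Dir, atom_to_list(ServerName), "op.log"]),
    ok = filelib:ensure_dir(File),
    case disk_log:open([{name, ServerName}, {file, File}]) of
        {ok, Log} -> {ok, Log};
        {repaired, Log, _Recovered, _BadBytes} -> {ok, Log};
        {error, _} = Err -> Err
    end.

%% Writes all entries of a batch, then forces them to disk with a single
%% sync; only after this returns should the waiting callers get 'ok'.
append_batch(Log, Entries) ->
    ok = disk_log:log_terms(Log, Entries),
    ok = disk_log:sync(Log).

%% Reads all terms back in write order (e.g. for recovery on startup).
read_all(Log) ->
    read_all(Log, start, []).

read_all(Log, Cont, Acc) ->
    case disk_log:chunk(Log, Cont) of
        eof -> lists:append(lists:reverse(Acc));
        {Cont2, Terms} -> read_all(Log, Cont2, [Terms | Acc]);
        {Cont2, Terms, _BadBytes} -> read_all(Log, Cont2, [Terms | Acc]);
        {error, _} = Err -> Err
    end.

close(Log) -> disk_log:close(Log).
```

A gen_server could collect the add_log_entry requests that arrive while a sync is in progress and pass them to `append_batch/2` together, replying `ok` to every caller only after it returns.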
