
3.4 Apache Cassandra

3.4.4 Data Processing

In this subsection, we give a short summary of how Cassandra writes, reads, and deletes data, as well as of its consistency levels.

3.4.4.1 Writing Data

A write request from the client is first handled by an arbitrary node in the Cassandra cluster, called the coordinator, which does not have to hold the row being written. The coordinator then forwards the request to all nodes holding the relevant replica. When the write is complete on a replica node (it has been written to the CommitLog and Memtable), that node sends a success acknowledgment back to the coordinator. Once the coordinator has received enough success acknowledgments, as determined by the write consistency level, the request is considered successful. The coordinator then responds to the client.

Otherwise, the coordinator will inform the client that there are not enough available replicas to perform the write operation.
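The coordinator's acknowledgment counting described above can be sketched as follows. This is a minimal in-memory model, not Cassandra's actual implementation; the names `Replica` and `coordinate_write` are illustrative.

```python
# Sketch of the write path: the coordinator forwards the write to all
# replicas and succeeds once enough acknowledgments (per the write
# consistency level) have arrived.

class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}          # stands in for CommitLog + Memtable

    def write(self, key, value, timestamp):
        if not self.up:
            return False        # node is down: no acknowledgment
        self.data[key] = (timestamp, value)
        return True             # success acknowledgment

def coordinate_write(replicas, key, value, timestamp, required_acks):
    """Forward the write to all replicas; succeed once enough acks arrive."""
    acks = sum(r.write(key, value, timestamp) for r in replicas)
    return acks >= required_acks

replicas = [Replica("C"), Replica("D", up=False), Replica("E")]
print(coordinate_write(replicas, "row1", "v1", 1, required_acks=1))  # True  (level ONE)
print(coordinate_write(replicas, "row1", "v1", 2, required_acks=3))  # False (level ALL)
```

With consistency level ONE a single surviving replica suffices, while ALL fails as soon as one replica is down, matching the availability trade-off discussed below.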

Figure 3.7 shows the procedure of performing a write in a single data center, which hosts an eight-node cluster with a replication factor of three using SimpleStrategy. The write consistency level is ONE. When the coordinator receives the success acknowledgment from the first replica node (node C), the write operation is considered a success.

If a replica is unavailable at this moment and therefore misses the write, Cassandra will make it eventually consistent using one of the synchronization measures mentioned in the previous subsection.

[Figure: eight-node ring (A-H); the client sends a write request to the coordinator, which forwards it to replica nodes 1-3 and returns a write response to the client.]

Figure 3.7: Procedure of Writing Data (Write Consistency Level is ONE)

In Cassandra, there is no update operation as in an RDBMS, where the old value of a column is replaced by a new one. Instead, when writing to an existing partition, the new value is written together with its modification time (timestamp). When that column is read, the value with the higher timestamp is returned. Hence, an “update” in Cassandra is simply an insert, which does not need to fetch and modify the previous value. This makes the “update” efficient.
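This timestamp-based “update as insert” behavior can be sketched with a tiny last-write-wins store. The helper names are illustrative, not Cassandra API.

```python
# Sketch of Cassandra's update-as-insert semantics: every write is a blind
# append carrying a timestamp, and a read returns the value with the
# highest timestamp -- no read-before-write is needed.

def write_column(store, key, value, timestamp):
    # Append a new (timestamp, value) version; never touch old versions.
    store.setdefault(key, []).append((timestamp, value))

def read_column(store, key):
    # The version with the highest timestamp wins.
    return max(store[key])[1]

store = {}
write_column(store, "name", "Alice", timestamp=1)
write_column(store, "name", "Bob", timestamp=2)   # the "update" is just another insert
print(read_column(store, "name"))  # Bob
```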

3.4.4.2 Reading Data

Similar to writing data, a read request is also first sent to an arbitrary node in the cluster. Reading data then proceeds in two steps.

Step one: the coordinator forwards a direct read request to the closest replica node, and a digest request to a number of replica nodes determined by the read consistency level. These nodes respond with the row or a digest of the requested data, respectively. If multiple nodes are contacted, the coordinator compares the rows in memory and sends the most recent data (based on the timestamp included in each column) back to the client. If the read consistency level cannot be fulfilled at the moment, the coordinator has to inform the client that the read has failed.

Step two: afterwards, the coordinator may also contact the remaining replica nodes in the background. The rows from all replicas are compared to detect inconsistent data. If the replicas are not consistent, the up-to-date data is pushed to the out-of-date replicas. As introduced above, this process is called Read Repair.
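The two steps above can be sketched as follows, with each replica modeled as a dict of key to (timestamp, value). The function name `read_with_repair` and the closest-replica choice are illustrative assumptions.

```python
# Sketch of a read at consistency level ONE followed by background read
# repair: answer from the closest replica, then reconcile all replicas.

def read_with_repair(replicas, key):
    # Step one: full data from the closest replica (here: the first one).
    ts, value = replicas[0][key]
    # Step two (background): find the newest version across all replicas...
    newest = max(r[key] for r in replicas)   # compares by timestamp first
    # ...and push it to any out-of-date replica.
    for r in replicas:
        if r[key] < newest:
            r[key] = newest
    return value

b = {"row": (1, "old")}
c = {"row": (2, "new")}
d = {"row": (2, "new")}
print(read_with_repair([b, c, d], "row"))  # old  (level ONE answers from B)
print(b["row"])                            # (2, 'new')  -- B has been repaired
```

Note that at level ONE the client may still receive the stale value; read repair only brings the replicas into agreement for subsequent reads.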


[Figure: two panels on the eight-node ring (A-H). Step one, reading data: the client sends a read request to the coordinator, which contacts replica nodes and returns a read response. Step two, read repair: the coordinator compares the remaining replicas and issues a write request to out-of-date ones.]

Figure 3.8: Procedure of Reading Data (read consistency level is ONE)

Figure 3.8 shows an example where the read consistency level is ONE and the up-to-date rows are held on replica nodes C and D. In the first step, only replica node B is contacted because it is the closest replica to the coordinator (that is, replica node B responds to node G the fastest). The data fetched from node B is returned to the client. In the second step, the remaining two replica nodes are contacted and all rows from the replicas are compared. The coordinator finds that the replica held on node B is out of date, so it issues a write to that node.

Cassandra improves its read performance by holding a partition key cache and a row cache, which help to avoid reading from disk. The partition key cache is enabled by default. It caches the partition index so that Cassandra knows where a partition is located on disk, decreasing seek times and saving CPU time as well as memory.

The row cache is similar to a traditional cache such as memcached in front of MySQL: it stores the entire contents of frequently accessed partitions. This feature, however, consumes large amounts of memory. Thus, the official website suggests not enabling it unless it is really needed, and typically only one of these two caches should be enabled for a column family8.

3.4.4.3 Consistency Levels

Cassandra provides tunable consistency, which means the client can specify how much consistency is required for each query. The consistency level refers to how many replica nodes are involved in a query. The higher the level, the more likely the fetched data is up to date (or the more replica nodes are synchronized), and consequently the lower the availability of the query.

There are several write/read consistency levels that a client can specify. Particularly, the QUORUM level in Cassandra is calculated as follows:

⌊replication factor / 2⌋ + 1 (3.1)

For example, if the replication factor is three, ⌊3/2⌋ + 1 = 2, so two replica nodes must respond to the write/read request.
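Formula (3.1) uses integer (floor) division, which can be checked directly:

```python
# Quorum per formula (3.1): floor(replication_factor / 2) + 1.

def quorum(replication_factor):
    return replication_factor // 2 + 1  # // is floor division in Python

for rf in (1, 2, 3, 5, 6):
    print(rf, quorum(rf))
# rf=3 -> 2, rf=5 -> 3, rf=6 -> 4
```

Note that for an even replication factor of six, a quorum of four still constitutes a strict majority, so overlapping read and write quorums always share at least one up-to-date replica.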

Write consistency levels: Table 3.6 shows the possible write consistency levels and their implications for a write request. It is noteworthy that the coordinator forwards a write request to all available replica nodes in all data centers, even if a low consistency level is specified.

Read consistency levels: the read consistency levels are listed in Table 3.7. A read consistency level states the number of replicas that must respond to a read request, so not all replica nodes are necessarily contacted. In addition, the ANY level is not supported for reads.

3.4.4.4 Deleting Data

In contrast to an RDBMS, Cassandra does not remove data from disk immediately when they are deleted. Instead, it adds a tombstone marker to the data and removes them later in the background during compaction. (So a delete in Cassandra is actually a write.) The reason is that Cassandra is designed to be distributed, durable, and eventually consistent. If a node is down, it can neither receive nor perform a delete request. When

8https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html (accessed 19.02.2016)


Level — Description

ONE/TWO/THREE/QUORUM/ALL — A write must succeed on at least one/two/three/a quorum of/all replica nodes. If there are not enough available replica nodes, the write will fail.

LOCAL_ONE/LOCAL_QUORUM — A write must succeed on at least one/a quorum of replica nodes in the local data center. These levels are used in multi-data-center clusters with the replica placement strategy NetworkTopologyStrategy.

ANY — A write must be written on at least one node. If all replica nodes are down at write time, the write can still succeed after a Hinted Handoff is written. However, this write cannot be read until the replica nodes for that partition have recovered.

Table 3.6: Write Consistency Levels [Dat16]

Level — Description

ONE/TWO/THREE/QUORUM/ALL — The coordinator returns the data after one/two/three/a quorum of/all replica nodes have responded.

LOCAL_ONE/LOCAL_QUORUM — The coordinator returns the data after one/a quorum of replica nodes in the local data center have responded.

Table 3.7: Read Consistency Levels [Dat16]

this node becomes available again, it will compare its data with other nodes. It will mistakenly conclude that all replica nodes that received the delete request have missed a write, and will therefore launch a repair. As a result, the deleted data would reappear.

Cassandra uses compaction to collect garbage regularly; a tombstone becomes eligible for removal after a grace period (10 days by default). Not only data with a tombstone marker, but also the out-of-date versions generated by “updates” are removed from disk.
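The delete-as-write behavior and the cleanup performed by compaction can be sketched with the same versioned store as before. This is a conceptual model; real compaction merges SSTables on disk and honors the grace period, which is omitted here.

```python
# Sketch of delete-as-write: a delete appends a tombstone version, and a
# later compaction drops tombstoned keys as well as superseded versions.

TOMBSTONE = object()  # sentinel marking a deleted value

def delete(store, key, timestamp):
    # A delete is just another write -- it appends a tombstone version.
    store.setdefault(key, []).append((timestamp, TOMBSTONE))

def compact(store):
    # Keep only the newest version of each key; drop the key entirely if
    # its newest version is a tombstone.
    for key in list(store):
        ts, val = max(store[key], key=lambda v: v[0])
        if val is TOMBSTONE:
            del store[key]            # data physically removed only now
        else:
            store[key] = [(ts, val)]  # out-of-date versions removed too

store = {}
store.setdefault("row", []).append((1, "v1"))
delete(store, "row", timestamp=2)
print("row" in store)  # True: the delete only wrote a tombstone
compact(store)
print("row" in store)  # False: compaction removed the data from the store
```

Because the tombstone is itself a timestamped write, a recovering replica that missed the delete sees it as the newest version during repair, which is exactly why the deleted data does not reappear.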