
4.7 Evaluation of Query Caching

4.7.2 Cloud-Based Evaluation of Query Caching

To demonstrate the effectiveness of Quaestor, we varied typical workload parameters such as the number of incoming connections, the number of queries and objects, and update rates. We studied Quaestor's scalability and performance under high throughput, extended the analysis to more clients, and measured staleness using simulation. We did not compare Quaestor to geo-replicated systems (e.g., Pileus), as our main point is to show that commodity web caching substantially improves latency with very little staleness and no additional servers. Geo-replication schemes tuned towards one specific geographical setup might still outperform Quaestor.


Figure 4.20: Throughput for a varying number of parallel connections comparing uncached database access (Uncached), query caching in the CDN (CDN only), query caching in the client (CS only), and full client and CDN query caching (Quaestor).

Read-Heavy Workload

We begin evaluating Quaestor on a read-heavy workload with 99% queries and reads (equally weighted) and 1% writes. Figure 4.20 demonstrates Quaestor's throughput scalability against a baseline without dynamic caching (Uncached), a CDN with InvaliDB (CDN only), and the client cache based on the Cache Sketch (CS) only (CS only). At maximum load (3 000 asynchronous connections delivered by 10 client instances), Quaestor achieves an 11-fold speedup versus an uncached baseline, a 5-fold improvement over the Cache Sketch-based client caches, and a 69.5% improvement over a CDN with InvaliDB. Using a CDN with InvaliDB yields superior performance to only using client caches, since clients rely on the CDN to fill up their caches quickly.

Client-side Bloom filters were refreshed every second (∆ = 1) to ensure minimal staleness. Figure 4.21 illustrates the latency distribution: while most queries are client cache hits with no latency, CDN hits induce an average latency of 4 ms and cache misses 150 ms. Mean round-trip latency between client instances and Quaestor was 145 ms, with a variance of 1 ms between runs (error bars omitted due to scale). Note that linear scalability is not possible, since an increasing number of clients increases the number of updates and thus reduces cacheability.

Figure 4.21: Query latency histogram showing peaks for client cache hits (capped), CDN cache hits, and cache misses.
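To make the client-side mechanism concrete, the following sketch shows how a Bloom filter of potentially stale keys can gate local cache hits. It is a minimal illustration under simplifying assumptions, not Quaestor's actual implementation; the names (CacheSketch, read, fetch_from_cdn) and the dict-based cache are ours.

```python
import hashlib

class CacheSketch:
    """Illustrative Bloom filter of potentially stale cache keys (a
    simplified stand-in, not Quaestor's implementation): the server adds
    every key whose cached copy was invalidated before its TTL expired,
    and clients refetch the filter periodically (every ∆ seconds)."""

    def __init__(self, bits: int = 8192, hashes: int = 4):
        self.bits = bits
        self.hashes = hashes
        self.array = [False] * bits

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.array[pos] = True

    def might_be_stale(self, key: str) -> bool:
        # Bloom filters have no false negatives: a False answer
        # guarantees the cached entry was not invalidated.
        return all(self.array[pos] for pos in self._positions(key))

def read(key, local_cache, sketch, fetch_from_cdn):
    """Client read path: serve from the local cache only if the Cache
    Sketch does not flag the key; otherwise revalidate via the CDN."""
    if key in local_cache and not sketch.might_be_stale(key):
        return local_cache[key]    # client cache hit: zero network latency
    value = fetch_from_cdn(key)    # CDN hit (~4 ms) or miss (~150 ms)
    local_cache[key] = value
    return value
```

Because the filter has no false negatives, a negative lookup safely permits a zero-latency client cache hit; a false positive merely causes an unnecessary revalidation.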


Figure 4.22: Object read latency for a varying number of parallel connections comparing cached to uncached database access.

Figures 4.22 and 4.23 show read and query latency for the same setup. For 3 000 connections, Quaestor achieved a mean query latency of 3.2 ms and a mean read latency of 17.5 ms. As there are 100× more records than queries, cache hit rates for queries are higher and latencies lower. Note that the latency of the variant with the client cache (CS only) increases due to more overhead at the database. In contrast, CDN latency for queries improves initially and remains constant afterwards, because separate clients access the same CDN edge.

Figure 4.23: Query latency for a varying number of parallel connections comparing cached to uncached database access.

It is important to note that the relation between query and read latency depends not only on access distributions, but also on how query predicates “cover” the space of primary keys with respect to the concurrent update operations. That is, if most queries select a key that is also frequently updated, invalidations and thus latency increase. In this workload, query predicates were selected uniformly over the primary keys, but not all primary keys were necessarily covered. With increasing query count, updates are more likely to trigger invalidations, which we demonstrate in the following by varying the number of queries executed by clients.
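The coverage argument admits a back-of-the-envelope calculation: if each of Q cached queries selects one key uniformly at random from N primary keys, a single update misses all of them with probability (1 - 1/N)^Q. The sketch below is our simplification for intuition, not a measurement from the experiment.

```python
def p_invalidate(num_keys: int, num_queries: int) -> float:
    """Probability that one uniformly chosen update hits a key selected
    by at least one of `num_queries` uniform point predicates over
    `num_keys` primary keys: 1 minus the chance of missing every query."""
    return 1 - (1 - 1 / num_keys) ** num_queries

# With 100 000 keys, more cached queries make each update more disruptive:
for q in (1_000, 2_000, 4_000, 10_000):
    print(f"{q:>6} queries -> {p_invalidate(100_000, q):6.1%} "
          "invalidation chance per update")
```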

Varying Query Count

Scalability with regard to query count is governed by the provided InvaliDB configuration (which scales linearly, as shown in Section 4.7.4). We demonstrate the effect of increasing query counts with regard to average request latency and cache hit rates for the same InvaliDB configuration used in the read-heavy workload (8 InvaliDB matching nodes).

Figure 4.24 shows how both read and query request latencies are affected by an increasing query count. Read latency improves because a larger portion of keys is part of a cached query result: when queries are cached as ID-lists, all records in a result are inserted into the cache as individual entries, thus causing read cache hits as a side effect. This improves read latency from initially 20 ms to a mean of 15 ms. The average query latency increases to slightly above 10 ms for larger query counts due to decreasing cache hit rates at the client, as shown in Figure 4.25. Cache hit rates at the CDN are comparably stable, since the concurrent client instances cause sufficient cache hits for each other as a side effect. Ultimately, Quaestor's performance for increasing query counts depends more on the popularity of individual queries and the update rate than on the total number of queries.
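The ID-list mechanism can be sketched as follows; this is an illustrative simplification (the function names and dict-based caches are ours, and TTLs and invalidation are omitted), not the system's actual code.

```python
def cache_query_result(query_key, records, query_cache, record_cache):
    """Cache a query result as an ID-list and seed the record cache:
    the query entry stores only primary keys, while each record becomes
    an individual cache entry. Subsequent reads of those records are
    then cache hits as a side effect."""
    query_cache[query_key] = [r["id"] for r in records]
    for r in records:
        record_cache[r["id"]] = r

def read_query(query_key, query_cache, record_cache, run_query):
    """Serve a query from cache if its ID-list and all referenced
    records are present; otherwise execute it against the database."""
    ids = query_cache.get(query_key)
    if ids is not None and all(i in record_cache for i in ids):
        return [record_cache[i] for i in ids]  # assembled from cached records
    records = run_query(query_key)             # cache miss: hit the database
    cache_query_result(query_key, records, query_cache, record_cache)
    return records
```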


Figure 4.24: Mean latency for reads and queries for different numbers of total queries.


Figure 4.25: Read and query cache hit rates at the client and CDN for different numbers of total queries.

Varying Write Rates

Read-dominant workloads naturally lend themselves to caching, since they allow higher consistency, longer TTLs, fewer invalidations, and less database load. With increasing update rates, throughput becomes limited by the database. Figure 4.26 shows how cache hit rates degrade as update rates increase (with read and query rates kept equal).

Only 1 200 connections were used to avoid being limited by the write throughput of the MongoDB cluster. Client cache hit rates for both records and queries decrease predictably with increasing update rate. Figure 4.26 also shows how staleness (the Cache Sketch refresh interval) can be used to mitigate performance degradation in write-heavy scenarios. Notably, the refresh interval has only a small impact on cache hit rate degradation. There is no linear correlation between a longer refresh interval and lower latency at higher write rates, because increasing write rates also lead to lower TTLs. Hence, extending Cache Sketch refresh intervals beyond a certain threshold only leads to more staleness without improved client performance.

Figure 4.26: Client cache hit rates for queries under varying update rates and different Cache Sketch refresh intervals. The four configurations (total objects/total queries/refresh interval) are 100k objects/1k queries at intervals of 1 s, 10 s, and 100 s, and 100k objects/10k queries at 1 s.
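The plateau can be illustrated with a simple model. Assuming Poisson writes, the expected time between writes to an object is 1/λ, which caps a frequency-based TTL; once that TTL drops below the refresh interval ∆, entries expire before a longer ∆ could pay off. The model below is our own back-of-the-envelope assumption, not the thesis' TTL estimator.

```python
def effective_window(write_rate: float, delta: float) -> float:
    """Expected time a client can keep serving a cached entry, under a
    simplified model: a frequency-based TTL of 1/write_rate, and
    staleness detected at the next Cache Sketch refresh (every `delta`
    seconds). Whichever happens first ends the cache window."""
    ttl = 1.0 / write_rate    # assumed TTL: expected write interarrival time
    return min(ttl, delta)

# Raising delta beyond the TTL changes nothing at high write rates:
for rate in (0.01, 0.1, 0.5):    # writes per object per second
    for delta in (1, 10, 100):   # Cache Sketch refresh interval (s)
        window = effective_window(rate, delta)
        print(f"rate={rate:<5} delta={delta:<4} window={window:7.1f} s")
```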

Varying Object Count

Finally, we investigate Quaestor's performance for varying object counts. Table 4.3 compares latencies for different database sizes indicated by the number of objects. Each collection contains 10 000 objects and is accessed by 100 distinct queries. We increased experiment durations to 600 s and changed the Zipf constant to 0.99 to account for the fact that caches take significantly longer to fill up with increasing object and query counts. Results show that for very small databases and distributions with high Zipf constants, reads and writes concentrate on the same few objects and thus limit cache hit rates. For increasing database sizes, caches take longer to fill up and TTLs have to be adjusted upwards, thus limiting performance during experiments. Nonetheless, query latencies remain below 35 ms, while read latencies slightly suffer from low cache hit rates during the (relatively) short experiment duration for higher numbers of total objects.

Objects       Queries    Query Latency    Read Latency
10 000        100        13.8 ms          70 ms
100 000       1 000      5.5 ms           40.2 ms
1 million     10 000     11.9 ms          27.2 ms
10 million    100 000    34.8 ms          133 ms

Table 4.3: Average query and read latency for increasing object counts under a request distribution with Zipfian constant 0.99.
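The concentration effect of the Zipfian constant can be checked with a short computation. The sketch below (illustrative, not the benchmark's actual request generator) computes the share of requests that the 100 most popular objects receive under a Zipf distribution with exponent 0.99; for small databases this share is much larger, so reads and writes collide on the same hot objects.

```python
def top_k_share(n: int, k: int, s: float = 0.99) -> float:
    """Fraction of requests hitting the k most popular of n objects under
    Zipfian popularity with exponent s (weight 1/rank**s per object)."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    return sum(weights[:k]) / sum(weights)

# Smaller databases concentrate traffic on very few hot objects:
for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: top-100 objects receive {top_k_share(n, 100):.1%} "
          "of requests")
```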

Production Results

Baqend currently hosts a range of production applications and has delivered performance improvements to numerous websites. As an example, we report the results of the e-commerce company Thinks. While being featured in a TV show with 3.5 million viewers, the shop had to provide low latency to potential customers. By relying on Orestes to cache all static data (e.g., files) and dynamic query results (e.g., articles with stock counters), the website achieved sub-second loads while being requested by 50 000 concurrent users (>20 000 HTTP requests per second). The business effect was measurable: the shop achieved a conversion rate of 7.8%, which is roughly 3 times the industry average [Cha17]. Usually, such a request volume requires massive scale in the backend. However, since the CDN cache hit rate was 98%, the load could be handled by 2 DBaaS servers and 2 MongoDB shards.
