

2.3.2 Latency on the Web

For interoperability reasons, REST APIs are the predominant type of interface in cloud data management; HTTP, in turn, is used by every website. The performance and latency of HTTP communication are determined by the protocols involved in each HTTP request.

[Figure 2.17 here: horizontal waterfall of an HTTPS request with the phases DNS Lookup, Initial Connection (TCP Handshake), TLS Handshake, Time to First Byte, and Content Download on a 0-100 ms time axis.]

Figure 2.17: Latency components across network protocols of an HTTP request against a TLS-secured URL.

Figure 2.17 shows the latency components of a single HTTP request, illustrated with exemplary delays (a small measurement sketch follows the list):

1. First, the URL’s domain (e.g., example.com) is resolved to an IP address using a UDP-based DNS lookup. To this end, the client contacts a configured DNS resolver. If the DNS entry is uncached, the resolver will contact a root DNS server that redirects to a DNS server responsible for the top-level domain (e.g., for .com). That name server will in turn redirect to the authoritative name server registered by the owner of the domain. This name server then returns one or multiple IP addresses for the requested host name. Depending on the location of the (potentially geo-redundant) DNS servers and the state of their caches, a typical DNS query will return in 10-100 ms. Like in HTTP, DNS caching is based on TTLs, with the associated staleness problems [TW11].

2. Next, a TCP connection between the client and the server is established using a three-way handshake. In the first round-trip, connection parameters are negotiated (SYN, SYN-ACK packets). In the second round-trip, the client can send the first portion of the payload. There is ongoing research on TCP fast open [CCRJ14], a mechanism that avoids one round-trip by sending data in the first SYN packet.

3. If the server supports and requires end-to-end encryption through HTTPS, i.e., TLS (Transport Layer Security), a TLS handshake needs to be performed [Gri13]. This requires two additional round-trips during which the server’s certificate is checked, session keys are exchanged, and a cipher suite for encryption and signatures is negotiated. TLS protocol extensions have been specified to allow data transmission during half-open TLS connections, reducing the TLS overhead to one round-trip (TLS false start). Alternatively, clients can reuse previous session parameters negotiated with the same server to abbreviate the handshake (TLS session resumption)17.

4. When the connection is established, the client sends an HTTP request that consists of an HTTP method, a URL, the protocol version, as well as HTTP headers encoding additional information like the desired content type and supported compression algorithms. The server processes the request and either fully assembles the response or starts transmitting it as soon as data is available (chunked encoding). The delay until the client receives the first response bytes is referred to as time-to-first-byte (TTFB).

5. Even though the connection is fully established, the response cannot necessarily be transmitted in a single round-trip, but requires multiple iterations for the content download. TCP employs a slow-start algorithm that continuously increases the transmission rate until the full aggregate capacity of all involved hops is saturated without packet loss and congestion [MSMO97]. Numerous congestion control algorithms have been proposed, most of which rely on packet loss as an indicator of network congestion [KHR02, WDM01]. For large responses, multiple round-trips are therefore required to transfer data over a newly opened connection, until TCP’s congestion window is sufficiently sized18. Increasing the initial TCP congestion window from 4 to 10 segments is ongoing work [CCDM13] and allows for typically 10 · 1500 B = 15 KB of data transmitted within a single round-trip, given the maximum transmission unit of 1500 B in an Ethernet network.
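The phases above can also be observed empirically. The following minimal Python sketch (an illustration only; example.com serves as a stand-in host) times each phase of a single HTTPS request using nothing but the standard socket and ssl modules:

```python
import socket
import ssl
import time

host, port = "example.com", 443  # stand-in host; any TLS-secured URL works

t0 = time.perf_counter()
ip = socket.gethostbyname(host)                    # 1. UDP-based DNS lookup
t_dns = time.perf_counter()

sock = socket.create_connection((ip, port))        # 2. TCP three-way handshake
t_tcp = time.perf_counter()

ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)  # 3. TLS handshake (SNI, certificate check)
t_tls = time.perf_counter()

# 4. Minimal HTTP/1.1 request; the first received bytes mark the time-to-first-byte.
tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
tls.recv(4096)
t_ttfb = time.perf_counter()

while tls.recv(4096):                              # 5. Content download: drain the
    pass                                           #    response over further round-trips
t_done = time.perf_counter()
tls.close()

for name, start, end in [("DNS lookup", t0, t_dns), ("TCP handshake", t_dns, t_tcp),
                         ("TLS handshake", t_tcp, t_tls), ("Time to first byte", t_tls, t_ttfb),
                         ("Content download", t_ttfb, t_done)]:
    print(f"{name:<20s}{(end - start) * 1000:8.1f} ms")
```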

In the best case and with all optimizations applied, an HTTP request over a new connection can hence be performed with one DNS round-trip and two server round-trips. DNS requests are aggressively cached, as IPs for DNS names are considered stable. The DNS overhead is therefore often minimal and can additionally be tackled by geo-replicated DNS servers that serve requests to nearby users (DNS anycast). To minimize the impact of TCP and TLS handshakes, clients keep connections open for reuse in future requests, which is an indispensable optimization, in particular for request-heavy websites.
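As a brief illustration (a sketch with a stand-in host, using only the Python standard library), the following snippet issues several requests over one persistent HTTP/1.1 connection, so that only the first request pays the DNS, TCP, and TLS handshake cost:

```python
import http.client

# The TCP and TLS handshakes happen once; HTTP/1.1 keep-alive then
# reuses the same connection for every subsequent request.
conn = http.client.HTTPSConnection("example.com")
for path in ("/", "/a", "/b"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # the body must be drained before the connection can be reused
    print(path, response.status)
conn.close()
```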

17 Furthermore, the QUIC (Quick UDP Internet Connections) protocol has been proposed as a UDP-based alternative to HTTP that has no connection handshake overhead [Gri13]. A new TLS protocol version with no additional handshakes has also been proposed [Res17].

18 The relationship between latency and potential data rate is called the bandwidth-delay product [Gri13]. For a given round-trip latency (delay), the effective data rate (bandwidth) is computed as the maximum amount of data that can be transferred (product) divided by the delay. For example, if the current TCP congestion window is 16 KB and the latency is 100 ms, the maximum data rate is 1.31 Mbit/s.
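A quick back-of-the-envelope computation reproduces the footnote’s numbers:

```python
window_bytes = 16 * 1024  # current TCP congestion window: 16 KB
rtt_seconds = 0.100       # round-trip latency: 100 ms

# Bandwidth-delay product: at most one congestion window per round-trip.
max_rate_mbit = window_bytes * 8 / rtt_seconds / 1e6
print(f"maximum data rate: {max_rate_mbit:.2f} Mbit/s")  # 1.31 Mbit/s
```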


The current protocol version 2 of HTTP [IET15] maintains the semantics of the original HTTP standard [KR01] but improves on many networking inefficiencies. Some optimizations are inherent, while others require active support by cloud services:

• Multiplexing all requests over one TCP connection avoids the overhead of multiple connection handshakes and circumvents head-of-line blocking19 (see the sketch after this list).

• Header Compression applies compression to HTTP metadata to minimize the impact of repetitive patterns (e.g., always requesting JSON as a format).

• If a server implements Server Push, resources can be sent to the client proactively whenever the server assumes that they will be requested. This requires explicit support by cloud services, as the semantics and usage patterns define which content should be pushed to reduce round-trips. However, inadequate use of pushed resources hurts performance, as the browser cache is rendered useless.

• By defining dependencies between resources, the server can actively prioritize important requests.
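As a client-side illustration (an assumption-laden sketch, not one of the techniques above): the third-party Python library httpx can negotiate HTTP/2, so that repeated requests share one multiplexed TCP connection and their repetitive headers are HPACK-compressed:

```python
# Requires the third-party httpx library with HTTP/2 support,
# e.g., pip install 'httpx[http2]'; example.com is a stand-in host.
import httpx

with httpx.Client(http2=True) as client:
    # All requests reuse a single TCP/TLS connection; the repeated
    # Accept header is compressed via HPACK on the wire.
    for i in range(3):
        r = client.get(f"https://example.com/items/{i}",
                       headers={"accept": "application/json"})
        print(r.http_version, r.status_code)  # e.g., HTTP/2 200
```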

As of 2017, still less than 20% of websites and APIs employ HTTP/2 [Usa17]. The techniques developed in this thesis apply to both HTTP/1.1 and HTTP/2, but profit from the improvements of HTTP/2. When all of the above protocols are in optimal use, the remaining latency bottleneck is the round-trip latency between API and browser clients and the server answering HTTP requests.

In mobile networks, the impact of HTTP request latency is even more severe: additional latency is caused by the mobile network infrastructure. With the older 2G and 3G mobile network standards, latencies between 100 ms (HSPA) and 750 ms (GPRS) are common [Gri13, Ch. 7]. With modern 4G LTE-Advanced (Long Term Evolution) networks, the standards prescribe strict latency bounds for a better user experience. As mobile devices share radio frequencies for data transmission, access has to be mediated and multiplexed.

This process is performed by a radio resource controller (RRC) located in the radio towers of the LTE cells that together comprise the radio access network (RAN). At the physical level, several latency-critical steps are involved in a request by a mobile device connected via a 4G network:

1. When a mobile device sends or receives data and was previously idle, it negotiates physical transmission parameters with the RRC. The standard prescribes that this control-plane latency must not exceed 100 ms [DPS13].

2. Any packet transferred from the mobile device to the radio tower must have a user-plane latency of below 5 ms.

19 Head-of-line blocking occurs when a request is scheduled, but no open connection can be used, as responses have not yet been received.

3. Next, the carrier transfers the packet from the radio tower to a packet gateway connected to the public Internet. This core network latency is not bounded.

4. Starting from the packet gateway, normal Internet routing with variable latency is performed.

Thus, in modern mobile networks, one-way latency will be at least 5-105 ms higher than in conventional networks. The additional latency is incurred for each HTTP request and each TCP/TLS connection handshake, making latency particularly critical for mobile websites and apps.
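The quoted range follows directly from the two bounded components; a minimal sanity check, assuming the standard’s limits are fully used:

```python
control_plane_ms = 100  # RRC negotiation, incurred only when the radio was idle
user_plane_ms = 5       # per-packet budget from device to radio tower

best_case = user_plane_ms                      # radio already active: +5 ms
worst_case = control_plane_ms + user_plane_ms  # radio idle: +105 ms
print(f"additional one-way latency: {best_case}-{worst_case} ms "
      f"(core network and Internet routing come on top)")
```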

In summary, to achieve low latency for REST and HTTP, many network parameters have to be explicitly optimized at the level of protocol parameters, operating systems, network stacks, and servers [Gri13]. In-depth engineering details of TCP/IP, DNS, HTTP, TLS, and mobile networking are provided by Grigorik [Gri13], Kurose and Ross [KR10], and Tanenbaum [TW11]. However, with all techniques and best practices applied, the physical latency from the client to the server remains the main bottleneck, along with the time-to-first-byte caused by processing in the backend. Both latency contributions can be addressed through caching.
