

6.2 Challenges for Web Browsing


Web browsing has less stringent network requirements than the peer-to-peer video streaming discussed in Section 5, as the resources accessed via the WWW are in most cases¹ not time-critical. Nonetheless, the volume of data is considerable for resource constrained environments. The pacemaker for web-based applications also seems to be what is possible today with fixed-line network access technologies. For instance, [114] claims that the average size of a web page, that is, the page itself and all elements loaded for it, quintupled from 2003 to 2009. However, the achievable throughput of fixed-line high-speed access lines, such as FTTH, has increased constantly, and at a faster pace than that of other technologies such as UMTS.

The trend of growing web page sizes affects the other end of the spectrum of network access lines: users with low-bandwidth network access, for instance dial-up modem users or mobile users in rural areas without fast mobile coverage. These users are also in a resource constrained network environment and suffer from the gap between their slow network access and the growing web pages: they either wait a very long time to see a page or give up, and thus can no longer participate in the web.

Whether a node (and its user) is in a resource constrained environment is relative to what the users of web applications expect of the network: a very patient user may be willing to wait a long time for a web page to be delivered, while an impatient user will soon give up waiting. There is a vast number of studies on how the time to deliver a web page influences user behavior, for instance [115, 116].

However, it turns out that the upper bound for a web page to be displayed to the user (including the network retrieval time and the rendering time of the browser) is below 1 minute. Most users will either stop accessing a particular web page if it loads too slowly, or refrain from using the Internet altogether if all web pages load too slowly.

We consider a user accessing a web page to be in a resource constrained environment if the user cannot access the desired web page within 1.5 times the regular acceptable "waiting" period for the retrieval and display of a web page.
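The definition above can be turned into a simple check. The following is a minimal sketch, not a measurement method used in this work: it assumes an idealised transfer time (page size divided by link rate, ignoring latency and protocol overhead), and the function names, the 500 kB page, the 10 s acceptable waiting period, and the link rates are hypothetical values chosen for illustration.

```python
def retrieval_time_s(page_bytes: int, bandwidth_bps: float) -> float:
    """Idealised transfer time: payload bits divided by the link rate."""
    return page_bytes * 8 / bandwidth_bps


def is_resource_constrained(page_bytes: int,
                            bandwidth_bps: float,
                            render_s: float,
                            acceptable_wait_s: float,
                            factor: float = 1.5) -> bool:
    """True if retrieval plus rendering exceeds factor * acceptable wait."""
    total_s = retrieval_time_s(page_bytes, bandwidth_bps) + render_s
    return total_s > factor * acceptable_wait_s


# Hypothetical example: 500 kB page, 1 s rendering, 10 s acceptable wait.
PAGE_BYTES = 500_000
for label, rate_bps in (("56 kbit/s dial-up", 56_000), ("2 Mbit/s DSL", 2_000_000)):
    constrained = is_resource_constrained(PAGE_BYTES, rate_bps, 1.0, 10.0)
    print(f"{label}: constrained = {constrained}")
```

Under these assumptions the dial-up user needs more than 70 s for the transfer alone and is therefore resource constrained, while the same page on the faster link stays well below the 15 s threshold.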

¹ The author cannot rule out this possibility, given the versatile usage of HTTP and HTML.

Chapter 6. Web Browsing in Constrained Environments 108

6.2.1 Preconditions

Web traffic has its own specific traffic pattern between the web browser and the server. The traffic is very bursty, with bursts followed by longer idle periods: the user requests a web page, and the browser loads it as fast as the network is able to deliver the page and the page's elements [117]. However, in recent years the traffic patterns have changed as the applications running on top of web traffic have changed.

Two recent developments illustrate this. The first are blogs, where users can actively publish their own content on web servers [118]. In this case, the traffic pattern changes in that elements are not only downloaded but also, more frequently, uploaded. The second are web pages based on Asynchronous JavaScript and XML (AJAX) [112], of which there are more and more, and which fetch further content even after the web page has been loaded. Compared to non-AJAX applications, AJAX applications transfer more bytes per session, issue a larger number of requests per session, and have significantly shorter inter-request times [119].

Nonetheless, one of the main characteristics of web traffic remains: its burstiness, albeit with changed inter-request times.

6.2.2 How much is cacheable?

As of today (2010), it is impossible to find literature on how much traffic HTTP proxies save, either in general or in particular situations. There are some vague numbers, but no reliable sources on the topic. The main reason seems to be that the savings achieved by HTTP proxies depend on the type of users using the proxy (e.g., enterprise vs. home users), the type of content accessed (e.g., news pages or social networking sites), and how web pages are designed or operated (e.g., web pages can allow or disallow the caching of elements).

We use two sample HTTP proxy excerpts to show how much traffic can be cached by such a proxy and how strongly this depends on the visited web sites, and thus on the users. The statistics in Table 6.1 and Table 6.2 are courtesy of two different companies, which provided them but are not named here. The proxies are located on the up/downlink of the companies and operate as caches. Both samples reflect the statistics for the period July 1-31, 2010.

Table 6.1 shows the top 10 entries accessed in company A, of which only a few are actually cached. The last column gives the number of bytes served from the cache, to be compared with the column "Total Bytes". There is almost no caching for the most popular site (google.de), but the 2nd and 3rd most popular sites are cached to some extent. These sites typically have a number of information elements that are cacheable and also usable by multiple users, e.g., common banner graphics.
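The comparison between the cached bytes and the column "Total Bytes" can be made explicit as a byte hit ratio. The following sketch uses the Total Bytes and Cache Bytes values of the three most popular sites in Table 6.1; the function name and the dictionary layout are illustrative choices, not part of the proxy's own reporting.

```python
def byte_hit_ratio(cache_mbytes: float, total_mbytes: float) -> float:
    """Share of the transferred volume served from the cache, in percent."""
    if total_mbytes <= 0:
        raise ValueError("total_mbytes must be positive")
    return 100.0 * cache_mbytes / total_mbytes


# (Total Bytes, Cache Bytes) in MBytes, first three rows of Table 6.1.
TABLE_6_1 = {
    "www.google.de": (8290, 147.07),
    "www.facebook.com": (13100, 4480),
    "www.bild.de": (43380, 33040),
}

ratios = {
    site: byte_hit_ratio(cached, total)
    for site, (total, cached) in TABLE_6_1.items()
}

for site, ratio in ratios.items():
    print(f"{site}: {ratio:.1f} % of bytes served from cache")
```

This gives roughly 2 % for www.google.de, 34 % for www.facebook.com, and 76 % for www.bild.de, which matches the observation that the most popular site is hardly cached while the next two are.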

Table 6.2 shows the top sites accessed in company B. The last column, "hit ratio", reflects how much of the traffic in bytes has actually been served from the cache. For instance, de.archive.ubuntu.com and www.spiegel.de both have a significant cache hit ratio, of 34.16 % and 51.58 %, respectively. Other sites, which generate a large


| Site | Requests | Page Views | Browse Time in days | Total Bytes in MBytes | Bytes Received in MBytes | Bytes Sent in MBytes | Cache Bytes in MBytes |
|---|---|---|---|---|---|---|---|
| www.google.de | 1813191 | 269.725 | 29.0 | 8290 | 6950 | 1340 | 147.07 |
| www.facebook.com | 1543093 | 236.547 | 25.2 | 13100 | 11850 | 1240 | 4480 |
| www.bild.de | 3868298 | 138.576 | 11.9 | 43380 | 39760 | 3620 | 33040 |
| webradio.antenne.de | 163239 | 128.379 | 7.6 | 292.67 | 194.06 | 98.60 | 50.61 |
| cluster.evig.de | 102657 | 98.342 | 5.4 | 145.52 | 83.93 | 61.59 | 3.75 |
| 85.17.147.1317765 | | 77.374 | 3.8 | 63.62 | 18.41 | 45.21 | 0.062 |
| www.google.at | 413575 | 73.704 | 7.2 | 2020 | 1700 | 328.18 | 39.37 |
| www.dir.bg | 75595 | 65.937 | 9.4 | 107.74 | 36.50 | 71.24 | 3.16 |
| www.meinvz.net | 1060616 | 63.283 | 8.0 | 7740 | 6870 | 888.88 | 3730 |
| service.gmx.net | 682701 | 60.362 | 3.0 | 2960 | 2160 | 813.42 | 1190 |

Table 6.1: Sample proxy statistics for company A, sample time July 1-31, 2010


| Destination | No. of requests | % of total requests | Hit rate in % | Seconds per request | Bytes overall | Part in % | Hit ratio bytes in % |
|---|---|---|---|---|---|---|---|
| de.archive.ubuntu.com | 65677 | 0.77 | 84.93 | 0.13 | 12816M | 5.57 | 34.16 |
| liveupdate.symantecliveupdate.com | 128346 | 1.51 | 18.82 | 0.10 | 6726011K | 2.86 | 18.28 |
| swr.ic.llnwd.net | 88 | 0.00 | 0.00 | 8388.06 | 6565074K | 2.79 | 0.00 |
| download.windowsupdate.com | 5011 | 0.06 | 16.82 | 0.84 | 6195866K | 2.63 | 1.25 |
| 195.10.10.207 | 29 | 0.00 | 0.00 | 12505.11 | 5508975K | 2.34 | 0.00 |
| ftp-stud.fht-esslingen.de | 572 | 0.01 | 43.71 | 2.10 | 5127182K | 2.18 | 0.01 |
| ftp.hp.com | 276 | 0.00 | 0.36 | 69.97 | 4033935K | 1.71 | 0.72 |
| armdl.adobe.com | 3788 | 0.04 | 1.11 | 6.09 | 2890612K | 1.23 | 0.53 |
| download.oracle.com | 588 | 0.01 | 36.73 | 2.34 | 2835923K | 1.20 | 0.21 |
| us.archive.ubuntu.com | 2784 | 0.03 | 5.46 | 0.47 | 2581540K | 1.10 | 0.01 |
| ftp.uni-erlangen.de | 2412 | 0.03 | 0.17 | 0.46 | 1996251K | 0.85 | 0.26 |
| safebrowsing-cache.google.com | 135601 | 1.60 | 34.44 | 0.04 | 1891303K | 0.80 | 43.73 |
| v17.lscache7.c.youtube.com | 41 | 0.00 | 0.00 | 30.77 | 1841443K | 0.78 | 0.00 |
| ftp.de.debian.org | 2393 | 0.03 | 31.30 | 0.30 | 1695222K | 0.72 | 10.07 |
| www.spiegel.de | 139922 | 1.65 | 72.12 | 0.03 | 1671029K | 0.71 | 51.85 |
| svn.ict-societies.eu | 407 | 0.00 | 0.00 | 1.69 | 1509855K | 0.64 | 0.00 |
| download.microsoft.com | 88 | 0.00 | 9.09 | 33.08 | 1326076K | 0.56 | 5.07 |
| www.myotherdrive.com | 253 | 0.00 | 0.00 | 16.10 | 1295668K | 0.55 | 0.00 |
| is2.myvideo.de | 375 | 0.00 | 47.20 | 1.46 | 1295302K | 0.55 | 32.19 |
| kh.google.com | 83592 | 0.98 | 0.05 | 0.07 | 1252498K | 0.53 | 0.00 |
| archlinux.limun.org | 1028 | 0.01 | 0.39 | 0.29 | 1246514K | 0.53 | 0.00 |
| au.download.windowsupdate.com | 2268 | 0.03 | 4.63 | 1.10 | 1147744K | 0.49 | 5.08 |
| www1011.megaupload.com | 1 | 0.00 | 0.00 | 229.54 | 1024000K | 0.43 | 0.00 |
| www1123.megaupload.com | 1 | 0.00 | 0.00 | 233.77 | 1024000K | 0.43 | 0.00 |
| www1041.megaupload.com | 1 | 0.00 | 0.00 | 253.29 | 1024000K | 0.43 | 0.00 |
| other: all-level-domains | 7921132 | 93.23 | 24.77 | 1.85 | 154974M | 67.38 | 10.17 |
| Sum | 8496674 | 100.00 | 25.80 | 1.87 | 230003M | 100.00 | 10.36 |

Table 6.2: Sample proxy statistics for company B, sample time July 1-31, 2010


amount of traffic, such as youtube.com, are not cached at all. This can be due to non-cacheable content or to a proxy configuration that ignores that type of content.

However, both tables show that the accessed web sites can vary a lot between different user groups, which in turn results in different amounts of traffic that can be cached.
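Table 6.2 reports two different cache metrics: the hit rate per request and the hit ratio in bytes (25.80 % vs. 10.36 % in the sum row). The following sketch illustrates how the two can diverge. The access log, the `Access` type, and the function names are hypothetical and only serve the illustration; they are not taken from the proxy software used by the companies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Access:
    size_bytes: int   # size of the delivered object
    cache_hit: bool   # True if served from the proxy cache


def request_hit_rate(log: Sequence[Access]) -> float:
    """Percentage of requests answered from the cache."""
    return 100.0 * sum(a.cache_hit for a in log) / len(log)


def byte_hit_ratio_from_log(log: Sequence[Access]) -> float:
    """Percentage of delivered bytes answered from the cache."""
    total = sum(a.size_bytes for a in log)
    cached = sum(a.size_bytes for a in log if a.cache_hit)
    return 100.0 * cached / total


# Hypothetical log: three small cached objects and one large uncached download.
log = [
    Access(10_000, True),
    Access(10_000, True),
    Access(10_000, True),
    Access(970_000, False),
]

print(f"request hit rate: {request_hit_rate(log):.1f} %")
print(f"byte hit ratio:   {byte_hit_ratio_from_log(log):.1f} %")
```

In this constructed example, three of four requests are cache hits, yet only 3 % of the bytes come from the cache, because the single large download dominates the volume. A similar effect may explain why the byte hit ratio in Table 6.2 is lower than the request hit rate, although the tables alone do not prove this.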

6.2.3 Challenges in Resource Constrained Environments

Access to web pages and the resulting process of retrieving the information elements are difficult to predict. First, it is hardly foreseeable when a user wants to browse a web page, i.e., issues a request in the browser. Second, the complete size of the web page to retrieve is not obvious, as the page consists of numerous information elements, each of which can have a different size.

The size is not known a priori in HTML, as only the element name and the link to it are given.

HTTP [RFC 2616] offers a way to retrieve the content length of an element with the HEAD method, which is discussed in Section 6.3.3. This is fundamentally different from the peer-to-peer video streaming application in Section 5, where both the size of the information elements (chunks) and their timing are known in advance.
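As an illustration of the HEAD method, the following minimal sketch asks a server for the headers of an element and reads the Content-Length field without transferring the body. It uses Python's standard `http.client` module; the function names are illustrative and this is not the mechanism of Section 6.3.3. Note that a server may omit Content-Length, for example for dynamically generated content, in which case the size remains unknown.

```python
import http.client
from typing import Mapping, Optional
from urllib.parse import urlsplit


def content_length_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def head_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Issue a HEAD request and return the announced element size in bytes."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)          # headers only, no message body
        response = conn.getresponse()
        return content_length_from_headers(response.headers)
    finally:
        conn.close()
```

A browser or proxy could call `head_content_length` for each element referenced in a page to estimate the total page size before retrieval, at the cost of one additional round trip per element.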