
7.3.2. WWW-Based Experiment

In our first experiment we executed particular CLD(M) queries over the WWW to test Hypothesis 7.1. In the following we describe this experiment and discuss the results.

Queries

For this experiment we used a mix of 18 queries. These queries, denoted by WQ1 to WQ18, can be found in the appendix (cf. Section D.1, page 211ff). Eleven of these queries have been proposed as Linked Data queries for the FedBench benchmark suite [141].

These “queries vary in a broad range of characteristics, such as number of sources involved, number of query results, and query structure” [141]. However, we had to slightly adjust five of these queries (WQ6, WQ7, WQ8, WQ9, and WQ10) because their original versions use terms from outdated vocabularies. These adjustments do not change the intent of the queries or their structural properties.

In addition to these eleven FedBench queries, we used seven queries that are updated versions of queries used for experiments in our earlier work [72,79]. These seven queries are designed such that each of the respective reachable subwebs covers data from a larger number of data providers than the reachable subwebs of the FedBench queries. Hence, these seven queries add more diversity to the query mix.

Overall the query mix covers a variety of structural properties. The BGPs in these queries consist of two to eight triple patterns and contain two to seven distinct query variables. Some of these BGPs are “star-shaped” (i.e., one variable is contained in all triple patterns of the BGP), others are “path-shaped” (i.e., every variable is contained in at most two triple patterns), and a third group combines star-shaped and path-shaped parts. Table 7.1 characterizes all 18 test queries w.r.t. these properties.
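To make the two shape criteria concrete, the following sketch classifies a BGP by counting variable occurrences. It is a minimal illustration only; the function name, the tuple-based triple-pattern representation, and the example patterns are our own assumptions, not taken from SQUIN or FedBench.

```python
from collections import Counter

# A triple pattern is a 3-tuple of terms; variables are strings starting with '?'.
TriplePattern = tuple[str, str, str]

def classify_bgp(bgp: list[TriplePattern]) -> str:
    """Classify a BGP as 'star-shaped', 'path-shaped', or 'mixed'
    based on how its query variables occur in the triple patterns."""
    occurrences = Counter()
    for pattern in bgp:
        for term in set(pattern):          # count each variable once per pattern
            if term.startswith("?"):
                occurrences[term] += 1

    n = len(bgp)
    # Star-shaped: one variable is contained in all triple patterns.
    if any(count == n for count in occurrences.values()):
        return "star-shaped"
    # Path-shaped: every variable is contained in at most two triple patterns.
    if all(count <= 2 for count in occurrences.values()):
        return "path-shaped"
    return "mixed"

# Example: a star-shaped BGP in which ?person occurs in every pattern.
star = [("?person", "foaf:name", "?name"),
        ("?person", "foaf:mbox", "?mbox"),
        ("?person", "foaf:knows", "?friend")]
print(classify_bgp(star))   # -> star-shaped
```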

Procedure

For the experiment we executed the 18 queries sequentially (i.e., one after the other). Such a sequential execution avoids measurement artifacts that concurrent executions might introduce. To exclude possible interference between subsequent query executions, we used SQUIN in its primary mode of operation as described in Section 7.3.1 (cf. page 152f); that is, each query execution within the sequence started with an initially empty query-local dataset. Hereafter, we refer to these executions of the test queries as data-retrieving executions.

Table 7.1.: Structural properties of the queries used for the WWW-based experiment.

To minimize the potential impact of unexpected network traffic on the experiment, we performed five of the aforementioned sequential runs and combined the measurements by calculating, for each query, the arithmetic mean. By this procedure we obtained the following primary measurements for each test query:

• the average number of documents retrieved during data-retrieving executions of the query,

• the average number of solutions returned for the query during data-retrieving executions, and

• the average time for completing the data-retrieving executions of the query.

Furthermore, for each test query we also recorded the minimum and maximum of (i) the number of retrieved documents, (ii) the number of returned solutions, and (iii) the query execution time as measured during the five runs (of data-retrieving executions).
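As an illustration of this bookkeeping, the following sketch aggregates the per-query measurements of the five runs into the averages and min/max ranges reported below. The record layout and function names are hypothetical, and the numeric values (apart from the document counts for WQ8, which echo the figures discussed later in this section) are purely illustrative.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RunMeasurement:
    """Measurements of one data-retrieving execution of one test query."""
    documents: int        # number of LD documents retrieved
    solutions: int        # number of solutions returned
    exec_time_s: float    # overall query execution time in seconds

def aggregate(runs: list[RunMeasurement]) -> dict:
    """Combine the runs of a query into mean, minimum, and maximum."""
    def stats(values):
        return {"avg": mean(values), "min": min(values), "max": max(values)}
    return {
        "documents": stats([r.documents for r in runs]),
        "solutions": stats([r.solutions for r in runs]),
        "exec_time_s": stats([r.exec_time_s for r in runs]),
    }

# Hypothetical numbers for one query and five sequential runs;
# the last run mimics an atypical execution with many lookup timeouts.
wq_runs = [RunMeasurement(331, 12, 41.0), RunMeasurement(331, 12, 39.5),
           RunMeasurement(331, 12, 40.2), RunMeasurement(331, 12, 43.1),
           RunMeasurement(59, 3, 38.7)]
print(aggregate(wq_runs))
```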

To test Hypothesis 7.1 we also need the data retrieval time, that is, the fraction of the query execution time that SQUIN spends on data retrieval. However, this fraction is difficult to determine during data-retrieving executions because data retrieval and query-local data processing are deeply interwoven in SQUIN (as suggested by our query execution model); furthermore, due to a multi-threaded implementation of data retrieval, SQUIN usually performs multiple URI lookups in parallel (cf. Section 7.3.1, page 152f). Consequently, simply summing up the runtime of all URI lookups would not be an accurate approach for measuring the fraction of query execution time spent on data retrieval (i.e., the data retrieval time). Therefore, we applied the following cache-based method to measure the data retrieval time for each test query.

We executed each test query twice: First, as for the aforementioned data-retrieving executions, we executed the query using SQUIN in its primary mode of operation (i.e., starting the query execution with an empty query-local dataset). Hence, during this execution, SQUIN populates the (initially empty) query-local dataset as usual. After this first execution we kept the populated query-local dataset and reused it as the initial query-local dataset for a second execution of the same test query. However, for this second execution we deactivated SQUIN’s data retrieval functionality and, thus, used SQUIN as if it were a standard SPARQL query engine. That is, we evaluated the test query over the fixed dataset that we obtained from the first execution. As a result, we may use the difference between the query execution time of this second execution (hereafter called cache-based execution) and the average query execution time of the data-retrieving executions as a measure of data retrieval time. Formally:

Definition 7.2 (Average Data Retrieval Time). Let $q$ be a test query; let $t_{\text{overall}}$ be the average query execution time measured for the data-retrieving executions of $q$; and let $t_{\text{local}}$ be the query execution time measured for the cache-based execution of $q$. The average data retrieval time for test query $q$ is $t_{\text{retrieval}} := t_{\text{overall}} - t_{\text{local}}$.

Arguably, this method of measuring data retrieval time is not completely accurate. By starting the cache-based executions with query-local datasets that are already populated completely, our query engine may compute additional intermediate solutions (that are not computed based on an initially empty, continuously augmented query-local dataset) [71]. Thus, for the cache-based executions, the amount of query-local data processing may be greater than it actually is in the data-retrieving executions.

Therefore, our method may underestimate the actual data retrieval times. However, such an underestimation does not invalidate a verification of Hypothesis 7.1: if even the underestimated data retrieval times dominate the overall query execution times, then the actual data retrieval times dominate them as well.
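The following sketch outlines this two-execution measurement procedure under stated assumptions: the engine interface (an `execute_query` method that accepts an optional pre-populated dataset and a flag to deactivate data retrieval, and that returns the solutions together with the resulting query-local dataset) is hypothetical and merely mirrors the description above; it is not SQUIN’s actual API.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return its result together with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def measure_data_retrieval_time(engine, query, runs=5):
    """Estimate the average data retrieval time of a query as in
    Definition 7.2: t_retrieval := t_overall - t_local."""
    # Data-retrieving executions: each starts with an empty query-local dataset.
    times, dataset = [], None
    for _ in range(runs):
        # Assumed return value: (solutions, populated query-local dataset).
        (solutions, dataset), t = timed(engine.execute_query, query,
                                        initial_dataset=None)
        times.append(t)
    t_overall = sum(times) / len(times)

    # Cache-based execution: reuse a populated dataset from a (non-atypical)
    # data-retrieving execution, with data retrieval deactivated.
    _, t_local = timed(engine.execute_query, query,
                       initial_dataset=dataset, retrieve_data=False)
    return t_overall - t_local
```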

Measurements

We conducted the experiment from September 17 to September 18, 2012. Thus, the measurements that we report in the following reflect the situation of Linked Data on the WWW during that time (and may be different for other points in time).

The chart in Figure 7.4 depicts the average number of solutions returned for each of the 18 test queries during the five primary, data-retrieving runs. Figure 7.5 reports the corresponding number of retrieved documents. The range bar laid over each of the main bars in these two charts denotes the minimum and maximum values that contribute to the average represented by the main bar (for the exact minimum and maximum values we refer to Table D.1 in the appendix, cf. page 215).

Figure 7.4.: Average number of solutions returned for each of the 18 test queries in the WWW-based experiment.

These range bars indicate that the measurements for queries WQ5, WQ8, WQ17, and WQ18 varied significantly across the five runs, whereas the measurements for the other queries were consistent. A closer inspection of the statistics recorded during the experiment reveals that, for each of the queries WQ5, WQ8, WQ17, and WQ18, some atypical URI lookup timeouts occurred during one of its five executions.

For instance, there are four executions of query WQ8 during which SQUIN recorded 48 lookup timeouts and retrieved 331 LD documents each; during the remaining execution of query WQ8, SQUIN observed an atypical number of 160 timeouts. As a result, during this atypical execution, SQUIN retrieved only 59 of the 331 LD documents.¹ The smaller number of retrieved documents also has an effect on the number of solutions that SQUIN returned during the atypical execution of query WQ8 (see the lower bound of the corresponding range bar in Figure 7.4).

An even more extreme example is query WQ17 for which the lookup of the seed URI of the query timed out during one of the five executions. In this case, the lack of seed data made the traversal-based discovery of further data impossible. As a result, during this atypical execution of query WQ17, SQUIN did not retrieve a single document and returned no solutions for the query.

¹We explain the disparity in the differences between the typical and the atypical number of timeouts (48 vs. 160) and between the typical and the atypical number of retrieved documents (331 vs. 59) as follows: Some of the documents missed due to the additional timeouts during the atypical execution had enabled SQUIN to discover further documents during the four typical executions.

Figure 7.5.: Average number of documents retrieved during executions of the 18 test queries in the WWW-based experiment.

While Figures 7.4 and 7.5 show measurements for the data-retrieving runs, corresponding numbers for the cache-based run follow from these measurements: We ensured that the particular (data-retrieving) executions based on which we populated the query-local datasets for the cache-based executions are not atypical. Hence, for each test query, the cache-based run (re)used a pre-populated dataset that consisted of the data from all documents whose retrieval contributed to the corresponding measurement in the rightmost column of Table D.1(b) (cf. page 215). The number of solutions returned in the cache-based executions is consistent with the numbers reported for the typical data-retrieving executions (that is, the numbers in the rightmost column of Table D.1(a), page 215).

Figure 7.6 reports query execution times. In particular, the dark gray, hatched bars in the chart represent the query execution times measured during the cache-based run. The light gray bars represent the average query execution times of the primary, data-retrieving executions; range bars, again, denote the minimum and maximum values that contribute to the average (Table D.2 in the appendix lists the exact minimum and maximum values; cf. page 216).

Result

Figure 7.6.: Comparison of overall execution times for each of the 18 test queries in the WWW-based experiment.

Based on Figure 7.6 we observe that for each test query the average execution time measured during the data-retrieving runs is significantly larger than the time required for the cache-based execution. More precisely, these times differ by two (for query WQ17) to five (for queries WQ6, WQ7, WQ8, WQ10, WQ15, and WQ16) orders of magnitude. As discussed in the context of Definition 7.2 (cf. page 155), these differences approximate the net times that SQUIN required for retrieving data during the traversal-based executions of the test queries. Thus, the measurements show that the overall (traversal-based) query execution time for the test queries is dominated by the data retrieval time, which verifies Hypothesis 7.1.
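To make the orders-of-magnitude comparison concrete, here is a small worked sketch of the arithmetic behind this result; the timing values are purely illustrative and are not measurements from the experiment.

```python
from math import log10

def retrieval_estimate(t_overall: float, t_local: float) -> tuple[float, float]:
    """Return the estimated data retrieval time (Definition 7.2) and the
    difference between the two times in orders of magnitude."""
    t_retrieval = t_overall - t_local
    return t_retrieval, log10(t_overall / t_local)

# Illustrative values only: 120 s with data retrieval, 0.012 s cache-based.
t_retrieval, magnitude = retrieval_estimate(120.0, 0.012)
print(f"t_retrieval = {t_retrieval:.3f} s, ~{magnitude:.0f} orders of magnitude")
# -> t_retrieval = 119.988 s, ~4 orders of magnitude
```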
