5.1 Performance and Scalability

In this section, we analyze the performance and scalability of our semantic interpretation engine through an experimental study. We describe the setup of the experiments used for the evaluation in detail. Afterwards, we present and discuss the results of our experiments.

This experimental study has two main goals:

• To analyze the performance of the system in terms of the time spent processing interpretation requests.

• To test the scalability of the system by analyzing the sensitivity of the interpretation service to the growth in analysis ABox size.

To achieve a rigorous evaluation of the semantic interpretation engine, we test it using a reasonably large corpus of documents taken from websites. In particular, we use 500 web pages taken from the IAAF and USTAF web sites [Int09, USA09] as test data.

In order to obtain high-quality training data for analysis tools, these web pages have been annotated by multiple human annotators. More precisely, each web page has been annotated by two human experts, and then a third expert has adjudicated on discrepancies between the annotations of different annotators. In the literature, this process of obtaining high-quality annotation data is called gold-standard annotation, and the resulting gold-standard annotations are often used to measure system performance [MMSW06].

The gold-standard annotations created for the BOEMIE project include both surface-level information, e.g. sport event or person names, and deep-level information, e.g. sport trials and athletes. In the annotation process, concept and role names from the athletics domain ontology AEO are used. Therefore, these annotations can be referred to as semantic annotations. Later, the gold-standard annotations were transformed into DL ABoxes. We henceforth call these ABoxes gold-standard interpretation ABoxes to emphasize the fact that they are derived from gold-standard annotations and contain both surface- and deep-level information.

In our first experiment, we want to study the performance and scalability of the interpretation service. In order to test the text interpretation web service of the semantic interpretation engine, we need analysis ABoxes, which contain surface-level information only. Therefore, we remove all deep-level information from the gold-standard interpretation ABoxes, and name the resulting ABoxes gold-standard analysis ABoxes.

Our first experiment has the following setup: A client application calls the interpretText web service of the semantic interpretation engine serially for each one of the 500 web pages. The gold-standard analysis ABoxes serve as input for the interpretation process. We use three metrics as performance measures for the interpretation of an analysis ABox: the number of fiat assertions, the number of all assertions, and the time spent to process the interpretation request.
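The following sketch illustrates how such a serial benchmark client could be structured in Java. The InterpretationService interface and the Result record are hypothetical stand-ins for the actual web service stub of the semantic interpretation engine; only the three metrics named above are modeled.

    import java.util.List;

    public class BenchmarkClient {

        // Hypothetical stub for the interpretText web service.
        interface InterpretationService {
            List<String> listAnalysisAboxes();   // IDs of the 500 gold-standard analysis ABoxes
            Result interpretText(String aboxId); // one interpretation request
        }

        // Metrics recorded for one request (assumed shape of the response).
        record Result(int fiatAssertions, int totalAssertions, long processingMillis) {}

        static void runBenchmark(InterpretationService service) {
            // Serial execution: the next request is sent only after the
            // previous response has been received.
            for (String aboxId : service.listAnalysisAboxes()) {
                Result r = service.interpretText(aboxId);
                System.out.printf("%s;%d;%d;%d%n",
                        aboxId, r.fiatAssertions(), r.totalAssertions(), r.processingMillis());
            }
        }
    }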

We define the time spent for interpretation as the time between the moment at which the interpretation request arrives at the semantic interpretation engine and the moment at which the response to the call, i.e. the interpretation ABox, is ready to be sent to the client. This definition considers solely the time needed by the interpretation algorithm, and does not include the time needed for the communication between the client and the semantic interpretation engine. This enables us to measure the system performance independently of external factors such as network latency, which may vary substantially.
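A minimal sketch of this server-side measurement, assuming a simple handler wrapped around the interpretation algorithm (the Interpreter interface and the handler class are illustrative, not the engine's actual API):

    public class TimedInterpretationHandler {

        // Illustrative abstraction over the interpretation algorithm.
        interface Interpreter {
            String interpret(String analysisAbox);  // returns the interpretation ABox
        }

        private final Interpreter interpreter;

        public TimedInterpretationHandler(Interpreter interpreter) {
            this.interpreter = interpreter;
        }

        public String handle(String analysisAbox) {
            long start = System.nanoTime();          // clock starts: request has arrived
            String interpretationAbox = interpreter.interpret(analysisAbox);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("interpretation took " + elapsedMillis + " ms");
            return interpretationAbox;               // clock stopped before the response is sent
        }
    }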

The experiments were run on a Macintosh machine (Mac OS X 10.4.11) with a 2.16 GHz Intel Core Duo processor and 2 GB of main memory. The semantic interpretation engine was deployed in the servlet container Apache Tomcat 6.0.14, running on Sun JVM version 1.5.0_16 with a maximum Java heap size of 512 megabytes. The semantic interpretation engine was configured to manage a single instance of the DL reasoner RacerPro in version 1.9.3. The background knowledge used by the semantic interpretation engine consists of the ontologies AEO version 2.13, MCO version 2.13, GIO version 2.5, and the text interpretation rule file sports rules text 1.racer, last modified on March 6, 2009.

Figure 5.1: The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes.

Figure 5.1 shows the performance results of our semantic interpretation engine for the interpretation of a corpus consisting of 500 text analysis ABoxes. Each point in the diagram represents a different text analysis ABox. For each text analysis ABox, the value on the horizontal axis denotes the number of fiat assertions, and the value on the vertical axis denotes the time spent for interpretation in minutes.

To get a clear picture of the relation between the number of fiat assertions and the time spent for interpretation, we built clusters of ABoxes with similar numbers of fiat assertions. We then selected one ABox from each cluster as a representative average member of that cluster. The diagram in Figure 5.2 shows the number of fiat assertions and the time spent for the interpretation of the selected text analysis ABoxes.
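The text does not fix a particular clustering method; the following sketch assumes fixed-width bins over the fiat-assertion count and selects, from each bin, the member closest to the bin average as its representative.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class ClusterRepresentatives {

        // One measured data point: an analysis ABox and its interpretation time.
        record Sample(String aboxId, int fiatAssertions, double minutes) {}

        // Group samples into fixed-width bins over the fiat-assertion count and
        // return, per bin, the member whose count is closest to the bin average.
        static List<Sample> representatives(List<Sample> samples, int binWidth) {
            Map<Integer, List<Sample>> bins = new TreeMap<>();
            for (Sample s : samples) {
                bins.computeIfAbsent(s.fiatAssertions() / binWidth,
                        k -> new ArrayList<>()).add(s);
            }
            List<Sample> reps = new ArrayList<>();
            for (List<Sample> bin : bins.values()) {
                double mean = bin.stream()
                        .mapToInt(Sample::fiatAssertions).average().orElse(0);
                reps.add(bin.stream()
                        .min(Comparator.comparingDouble(
                                s -> Math.abs(s.fiatAssertions() - mean)))
                        .orElseThrow());
            }
            return reps;
        }
    }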


Figure 5.2: The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of selected text analysis ABoxes.

The number of fiat assertions in an analysis ABox plays an important role in the amount of time needed to interpret that analysis ABox, because every fiat assertion represents an observation that is questioned. Therefore, the semantic interpretation engine is requested to compute preferred explanations for each fiat assertion.
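Schematically, the interpretation loop can be pictured as follows: one abductive query per fiat assertion, so the cost grows with the number of fiat assertions. The Reasoner interface is a hypothetical abstraction, not RacerPro's actual API, and the engine's preference strategy over alternative explanations is omitted.

    import java.util.ArrayList;
    import java.util.List;

    public class InterpretationLoop {

        // Hypothetical abstraction of the abductive inference service.
        interface Reasoner {
            // Compute the preferred explanations for one fiat assertion with
            // respect to the current ABox and the interpretation rules.
            List<String> preferredExplanations(String fiatAssertion, List<String> abox);
        }

        static List<String> interpret(List<String> fiatAssertions,
                                      List<String> bonaFideAssertions,
                                      Reasoner reasoner) {
            List<String> interpretation = new ArrayList<>(bonaFideAssertions);
            interpretation.addAll(fiatAssertions);
            // One abductive query per fiat assertion: the number of fiat
            // assertions therefore drives the overall interpretation time.
            for (String fiat : fiatAssertions) {
                interpretation.addAll(reasoner.preferredExplanations(fiat, interpretation));
            }
            return interpretation;
        }
    }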

Figure 5.3 shows the performance results of the semantic interpretation engine for the interpretation of the same 500 text analysis ABoxes, but with respect to the number of all assertions in the ABoxes. As in Figure 5.1, each point in the diagram represents a different analysis ABox, and the vertical axis denotes the time spent for interpretation in minutes. In contrast to Figure 5.1, however, the horizontal axis denotes the sum of fiat and bona fide assertions.


Figure 5.3: The sum of fiat and bona fide assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes.

In this experiment, assertions that regard concrete domains, so-called data properties in OWL, are excluded. The values shown on the horizontal axis of the diagram in Figure 5.3 include only so-called object properties, i.e. assertions that are not related to concrete domains. In our approach, contrary to data properties, object properties are used as predicates in text interpretation rules, and thus affect the time spent for interpretation.
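A small sketch of this counting rule, assuming an Assertion type with a flag marking concrete-domain (data property) assertions; both the type and the flag are illustrative:

    import java.util.List;

    public class AssertionCounter {

        // Illustrative assertion type; the concreteDomain flag marks data
        // property assertions, which are excluded from the counts in Figure 5.3.
        record Assertion(String predicate, boolean fiat, boolean concreteDomain) {}

        // Count fiat plus bona fide assertions, keeping object property
        // assertions only (data properties do not occur in the rules).
        static long countRelevant(List<Assertion> abox) {
            return abox.stream()
                    .filter(a -> !a.concreteDomain())
                    .count();
        }
    }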

Comparing the diagrams in Figure 5.1 and Figure 5.3, we observe that not only the number of fiat assertions affects the processing time, but also the sum of fiat and bona fide assertions. For example, in Figure 5.1 we observe that the largest amount of time, more than 6 minutes, was required for the interpretation of an analysis ABox with fewer than 250 fiat assertions. We can also see that the interpretation of many other ABoxes with a similar number of fiat assertions took between 1.6 and 3.6 minutes. This difference indicates the existence of another factor influencing the performance. In Figure 5.3 we identify the reason for this difference: the sum of fiat and bona fide assertions in the analysis ABox that required more than 6 minutes for interpretation is considerably higher than in the other analysis ABoxes with a similar number of fiat assertions.

To clarify the relationship between the number of fiat and bona fide assertions and the time spent for interpretation, we again built clusters of ABoxes with similar numbers of fiat assertions. From each cluster we selected one ABox to represent all ABoxes in that cluster. The plot in Figure 5.4 shows the number of fiat and bona fide assertions, and the time spent for the interpretation of the selected text analysis ABoxes.

The diagrams in Figure 5.2 and Figure 5.4 visualize the progression of the time required for interpretation. We can observe that the time required for interpretation increases approximately linearly with the number of fiat assertions and with the sum of fiat and bona fide assertions. These results are encouraging, because they show that the system scales well.
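To quantify the observed linear relation, a simple ordinary least-squares fit over the cluster representatives can be used; the following self-contained sketch uses placeholder values, not the measured data from the experiment.

    public class LinearFit {

        // Ordinary least squares: returns {slope, intercept} of y = slope*x + intercept.
        static double[] fit(double[] x, double[] y) {
            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i];
                sy += y[i];
                sxx += x[i] * x[i];
                sxy += x[i] * y[i];
            }
            double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double intercept = (sy - slope * sx) / n;
            return new double[] { slope, intercept };
        }

        public static void main(String[] args) {
            // Placeholder values for illustration only.
            double[] assertions = { 50, 100, 200, 300, 400 };
            double[] minutes = { 0.6, 1.2, 2.3, 3.5, 4.8 };
            double[] line = fit(assertions, minutes);
            System.out.printf("slope=%.4f min/assertion, intercept=%.2f min%n",
                    line[0], line[1]);
        }
    }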

In a practical scenario, the multimedia interpretation process can be considered part of an offline process, in which a repository of semantic descriptions is prepared before the repository is exploited by a multimedia retrieval system. Considering also the fact that this experimental study was conducted on hardware below today's standard, the performance of the system is quite promising.

In light of this experimental study, we identify further possibilities for improving the performance and scalability of the semantic interpretation engine, especially in practical settings:

• Several reasoning tasks provided by RacerPro are continuously being improved. The semantic interpretation engine will benefit from future improvements in RacerPro, such as support for incremental reasoning or optimizations of the abductive inference service.
