
In this chapter, we discussed the technologies involved in this thesis. XML, the de-facto standard for representing and exchanging data on the Web, was considered from the data-centric perspective. We also showed the basic concepts of XPath and XQuery, the most widely used XML query languages. Finally, we explained the XInclude and XLink/XPointer specifications for connecting distributed XML instances.

In the next chapter, we investigate how distributed XML instances interlinked with XLinks can be mapped to an integrated view. XInclude offers one possible mapping, but it is not flexible enough for general data integration tasks: it cannot combine data according to a given external schema.

Having defined the mapping, how can the resulting view be queried with XPath? We propose and discuss appropriate techniques concerning the evaluation of queries on this view in Chapter 4.

To the best of our knowledge, these issues have not been examined yet.

XML Sources

In this chapter, we describe a model for mapping interlinked XML instances to a logical view. This model is specified by a flexible and expressive extension “dbxlink” for XLink and provides a basis for the investigations conducted in the remainder of this work.

3.1 Motivation

XML documents are not required to be self-contained but may rather have links to remote XML sources. As shown in Section 2.3.3, such references to autonomous resources can be defined with the W3C XLink specification [XLi01] in terms of a syntactical representation as XML elements. In contrast to the context of browsing and navigating to remote documents via links, from the data-centric viewpoint, a set of distributed interlinked XML documents induces a logical view that can be considered as a virtual XML instance. Figure 3.1 illustrates how this scenario is related to the classical three-level database architecture [TK78].

[Figure: three levels, from top to bottom. External level: users access View 1, View 2, ..., View n. Logical level: the dbxlink logical view (virtual instance). Physical level: XML databases and XML files, interlinked by XLinks.]

Figure 3.1: Three-Level Database Architecture

On the physical level, interlinked XML sources (e.g. provided by database systems or stored as plain files) are given. These are mapped to an integrated view (the logical level), which in turn serves as the basis for defining further views on the external level, where users access the data.

Unfortunately, XLink does not specify how the referenced fragments should be mapped into the virtual instance. Thus, the XLink mechanism has to be extended with semantics that define how to actually handle instances with references. The primary goal is that XLink references are mapped to a logical model that is (or at least provides the look-and-feel of) a plain XML instance to which standard languages from the XML area can be applied. In particular, XPath, as the basic addressing mechanism underlying XQuery, must be applicable. Thus, a transparent modeling as an XML-to-XML transformation is desirable, where the XLink elements are present only on the syntactical level while queries navigate in the virtual instance along semantic notions.

Given an XML instance with XLink references, the actual specification of the logical model must be flexible enough to cope with data integration issues. For instance, if a target schema is given for a distributed XML scenario and has to be met by the virtual instance, the model should allow versatile mapping options. Considering simple XLinks, the naive mapping approach would be to replace an XLink element with the target of its “xlink:href” attribute, as done by XInclude (cf. Section 2.3.2). However, with respect to data integration, it might be useful or even necessary to have alternative mapping options, such as merging the XLink’s local data (i.e. non-XLink attributes and subelements) with the referenced nodes. In addition to that deficiency, the XLink specification solely specifies linking semantics for the context of hypermedia systems, while the data-centric viewpoint is not considered. Thus, two questions arise:

1. What kind of modeling options are useful for (simple) XLinks?

2. How can interlinked XML data instances be queried while navigating across links?
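The two mapping options contrasted above (XInclude-like replacement versus merging the link's local data with the referenced nodes) can be illustrated with a minimal Python sketch. It is not the dbxlink mechanism itself: the `SOURCES` dictionary, the example URL, the element names, and the function names are hypothetical, the fragment after `#` is treated as a plain ElementTree path rather than a full XPointer expression, and the merge variant shows only one possible reading of "merging".

```python
import copy
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # namespace URI of the XLink specification
HREF = f"{{{XLINK_NS}}}href"

# Hypothetical stand-in for remote sources: URL -> parsed document root.
SOURCES = {
    "http://example.org/cities.xml": ET.fromstring(
        "<cities><city id='c1'><name>Berlin</name></city></cities>"
    )
}

def resolve(href):
    """Return copies of the nodes addressed by 'url#path'.

    Assumption: the fragment is a plain ElementTree path relative to the
    remote root, not a full XPointer expression.
    """
    url, _, path = href.partition("#")
    return [copy.deepcopy(node) for node in SOURCES[url].findall(path)]

def map_links(parent, merge=False):
    """Resolve the simple XLinks among the direct children of `parent`.

    merge=False: replace the link element by its targets (XInclude-like).
    merge=True:  additionally copy the link's non-XLink attributes and its
                 subelements onto every target (one reading of "merging").
    """
    # Walk backwards so that insertions do not shift pending indices.
    for index, child in reversed(list(enumerate(parent))):
        if HREF not in child.attrib:
            continue
        targets = resolve(child.get(HREF))
        if merge:
            for target in targets:
                for name, value in child.attrib.items():
                    if not name.startswith(f"{{{XLINK_NS}}}"):
                        target.set(name, value)
                target.extend([copy.deepcopy(sub) for sub in child])
        parent.remove(child)
        for offset, target in enumerate(targets):
            parent.insert(index + offset, target)
    return parent

DOC = (
    "<country xmlns:xlink='http://www.w3.org/1999/xlink'>"
    "<capital role='seat' xlink:href='http://example.org/cities.xml#city'>"
    "<since>1990</since></capital></country>"
)

replaced = map_links(ET.fromstring(DOC))
merged = map_links(ET.fromstring(DOC), merge=True)
# Both virtual instances can be queried with an ordinary path expression.
print(replaced.find("city/name").text, merged.find("city").get("role"))
```

In both variants the link element no longer appears in the result, so a path query navigates the virtual instance as if it were a plain XML document; only the merge variant preserves the local `role` attribute and the `since` subelement.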

In order to propose a solution to these issues, we introduced additional modeling and querying directives as an extension to the XLink technology by the dbxlink namespace in [May02, MM03, BFM06a]. Similar to the xlink namespace, the dbxlink namespace offers several attributes that specify the database-specific semantics of XLink elements. Also, XLink’s attribute xlink:actuate is interpreted for the data-centric viewpoint on XML. To give a first intuition, we briefly mention the relevant attributes used for simple XLinks:

• dbxlink:transparent is used to specify how the referenced data is mapped into the referencing instance,

• xlink:actuate supplies the time-point for evaluating the reference (during parsing or query answering),

• for querying, dbxlink:eval specifies how the evaluation of the XPointer expression contained in the xlink:href attribute is distributed between the local and remote server, and

• with dbxlink:cache, it can be specified which intermediate results (of both the XPointer and query results) are cached for reuse.
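To make the attribute list above concrete, the following Python sketch reads these directives from a simple XLink element. Several names are assumptions made for illustration only: the dbxlink namespace URI, the element name `capital`, the example URL, and the `PLACEHOLDER` values, since the actual dbxlink values are not introduced at this point. Only the xlink namespace URI and the `onRequest` value of xlink:actuate are taken from the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"   # namespace URI of the XLink specification
DBXLINK_NS = "http://example.org/dbxlink"   # hypothetical URI, for illustration only

def link_directives(elem):
    """Collect the XLink/dbxlink directives of a simple XLink element."""
    href = elem.get(f"{{{XLINK_NS}}}href", "")
    # The part before '#' locates the remote document, the rest is the XPointer.
    url, _, xpointer = href.partition("#")
    return {
        "url": url,
        "xpointer": xpointer,
        "actuate": elem.get(f"{{{XLINK_NS}}}actuate"),
        "transparent": elem.get(f"{{{DBXLINK_NS}}}transparent"),
        "eval": elem.get(f"{{{DBXLINK_NS}}}eval"),
        "cache": elem.get(f"{{{DBXLINK_NS}}}cache"),
    }

# The dbxlink attribute values below are placeholders, not real directives.
LINK = ET.fromstring(
    "<capital xmlns:xlink='http://www.w3.org/1999/xlink' "
    "xmlns:dbxlink='http://example.org/dbxlink' "
    "xlink:href='http://example.org/cities.xml#xpointer(//city)' "
    "xlink:actuate='onRequest' "
    "dbxlink:transparent='PLACEHOLDER'/>"
)

directives = link_directives(LINK)
print(directives)
```

Attributes that are absent, such as dbxlink:eval and dbxlink:cache in this example, are reported as `None`, which leaves the choice of a default to the caller.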

The dbxlink:transparent directives are the main topic of this chapter, while the other directives will be explained in Section 4.1.3. Figure 3.2 gives an intuition of the logical model specified by the dbxlink:transparent attributes of XLinks, compared to the real data model. We thus have specified a model that transparently resolves and embeds XLinks into a virtual instance.

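[Figure: the real data model (real DM), in which a link element reached via xpath-expr1 references url#xpath-exprx, i.e. the nodes selected by xpath-exprx in the document at url, shown next to the transparent data model (transparent DM), in which the referenced data is reached via xpath-expr1.]

Figure 3.2: Extended XML Data Model with XLink Elements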

Remark. In this thesis, all investigations related to querying interlinked XML instances are restricted to simple XLinks. For extended XLinks, the modeling issues are more complex. However, the results presented in this work can serve as a basis for dealing with extended links.

Therefore, in the following, we assume any XLink to be a simple XLink. According to XLink 1.1 [XLi06], simple XLinks are an application default. Thus, we will occasionally omit the attribute xlink:type=“simple” and assume that it is given implicitly.
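The implicit typing can be checked mechanically. The sketch below follows the XLink 1.1 rule that an element carrying xlink:href but no xlink:type is treated as if it had xlink:type=“simple”; the function name and the example elements are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # namespace URI of the XLink specification
TYPE = f"{{{XLINK_NS}}}type"
HREF = f"{{{XLINK_NS}}}href"

def is_simple_xlink(elem):
    """True if `elem` is a simple XLink, explicitly or by the XLink 1.1 default."""
    link_type = elem.get(TYPE)
    if link_type is not None:
        return link_type == "simple"
    # No xlink:type given: an xlink:href alone makes the element a simple link.
    return HREF in elem.attrib

NS_DECL = "xmlns:xlink='http://www.w3.org/1999/xlink'"
implicit = ET.fromstring(f"<a {NS_DECL} xlink:href='doc.xml#x'/>")
explicit = ET.fromstring(f"<a {NS_DECL} xlink:type='simple' xlink:href='doc.xml#x'/>")
extended = ET.fromstring(f"<a {NS_DECL} xlink:type='extended'/>")
plain = ET.fromstring("<a/>")

print([is_simple_xlink(e) for e in (implicit, explicit, extended, plain)])
```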