
2.3 Linking XML Data

2.3.3 Remarks

Since the W3C and IETF standards and recommendations for XPath, XPointer, XQuery (and almost all other XML-related technologies) are evolving quickly, it is necessary to specify which versions of these technologies are used, understood and cited within the scope of this work. This work refers to:

• XPath: XML Path Language (XPath) Version 1.0, W3C Recommendation 16 November 1999 [XPa99]

• XPointer:

– XML Pointer Framework (XPointer), W3C Recommendation 25 March 2003 [XPt03b]

– XPointer element() Scheme, W3C Recommendation 25 March 2003 [XPt03a]

– XPointer xmlns() Scheme, W3C Recommendation 25 March 2003 [XPt03c]

– XPointer xpointer() Scheme, W3C Working Draft 19 December 2002 [XPt02b]

• XLink: XML Linking Language (XLink) Version 1.0, W3C Recommendation 27 June 2001 [XLi01a]

The namespace for the XML Linking Language is http://www.w3.org/1999/xlink.

Throughout the examples in this work, this namespace is always bound to the namespace prefix xlink, unless stated otherwise.
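The effect of this namespace binding can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample fragment is modeled on the examples of this work, and the helper name xlink_attr is hypothetical. The point is that a namespace-aware processor identifies the attributes by namespace URI and local name, not by the prefix xlink.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical fragment modeled on the examples of this work.
SAMPLE = """
<country xmlns:xlink="http://www.w3.org/1999/xlink" car_code="B">
  <capital xlink:type="simple"
           xlink:href="http://www.bar.de/cities-B.xml#xpointer(/cities/city[name='Brussels'])"/>
</country>
"""


def xlink_attr(element, local_name):
    """Return the value of the XLink attribute `local_name`, or None if absent."""
    # ElementTree stores namespaced attributes in {namespace-uri}local-name form.
    return element.get(f"{{{XLINK_NS}}}{local_name}")


capital = ET.fromstring(SAMPLE).find("capital")
link_type = xlink_attr(capital, "type")
link_href = xlink_attr(capital, "href")
```

Binding a different prefix to the same namespace URI in the sample would leave both lookups unchanged.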

Chapter 3

Querying XML Data with Simple Links

3.1 Query Support for XLinks

Consider the following XLink example: the geographical database Mondial is split up into several instances and distributed over a number of host locations.

An instance countries.xml contains country data; the instances cities-UK.xml, cities-B.xml and cities-D.xml contain data about all cities of a specific country (here, cities in the U.K., in Belgium and in Germany).

The fact that Antwerp is in Belgium is expressed via a Simple Link from inside the Belgium element in countries.xml to Antwerp's city element in the cities-B.xml document (see Figure 3.1). The fact that global organizations have members (countries) is represented with one Extended Link, containing one arc for each country↔organization membership relation:

[Diagram: the distributed Mondial instances memberships, orgs, countries, cities-B and cities-D, spread over host 1, host 2 and host 3; link labels: member-of, is-member, headq, capital, cities, neighbor]

How can XML documents linked in this way be queried? Many relations in the modeled data are expressed with XLinks. For example, to find out how many


<!-- http://www.foo.de/countries.xml -->
<countries>
  <country car_code="B" area="30510">
    <name>Belgium</name>
    <population>10170241</population>
    <capital xlink:type="simple" xlink:href=
      "http://www.bar.de/cities-B.xml#xpointer(/cities/city[name='Brussels'])" />
    <neighbor xlink:type="simple" xlink:href=
      "http://www.foo.de/countries.xml#xpointer(/countries/country[@car_code='D'])"
      borderlength="167"/>
    :
    <cities xlink:type="simple" xlink:href=
      "http://www.bar.de/cities-B.xml#xpointer(//city)" />
    :
  </country>
  <country car_code="D" area="356910">
    <name>Germany</name>
    <population>83536115</population>
    <capital xlink:type="simple" xlink:href=
      "http://www.bar.de/cities-D.xml#xpointer(/cities/city[name='Berlin'])" />
    <neighbor xlink:type="simple" xlink:href=
      "http://www.foo.de/countries.xml#xpointer(/countries/country[@car_code='B'])"
      borderlength="167"/>
    :
    <cities xlink:type="simple" xlink:href=
      "http://www.bar.de/cities-D.xml#xpointer(//city)" />
    :
  </country>
  :
</countries>

<!-- http://www.bar.de/cities-B.xml -->
<cities>
  <city>
    <name>Brussels</name>
    <population>951580</population>
    :
  </city>
  <city>
    <name>Antwerp</name>
    <population>459072</population>
    :
  </city>
  :
</cities>

<!-- http://www.bar.de/cities-D.xml -->
<cities>
  <city>
    <name>Berlin</name>
    <population>3472009</population>
    :
  </city>
  <city>
    <name>Hamburg</name>
    <population>1705872</population>
    :
  </city>
  :
</cities>

Figure 3.1: Excerpt of the Distributed Mondial XML Database [May07]

inhabitants the capital of Belgium has, it would be necessary to gather data from two different documents (countries.xml and cities-B.xml, possibly on two different hosts) during a single query execution.
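Outside a query language, this gathering step can be written by hand. The following Python sketch is illustrative only and makes several assumptions: the two documents are held in an in-memory dictionary instead of being fetched from their hosts, the helper names load and dereference are hypothetical, and only two simple shapes of xpointer() expressions (an absolute child path and a leading //) are handled, using the limited XPath subset of xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# In-memory stand-ins for the remote documents (assumption: no network access).
DOCUMENTS = {
    "http://www.foo.de/countries.xml": """
<countries xmlns:xlink="http://www.w3.org/1999/xlink">
  <country car_code="B">
    <name>Belgium</name>
    <capital xlink:type="simple"
      xlink:href="http://www.bar.de/cities-B.xml#xpointer(/cities/city[name='Brussels'])"/>
  </country>
</countries>""",
    "http://www.bar.de/cities-B.xml": """
<cities>
  <city><name>Brussels</name><population>951580</population></city>
  <city><name>Antwerp</name><population>459072</population></city>
</cities>""",
}


def load(uri):
    """Parse the document registered under `uri` and return its root element."""
    return ET.fromstring(DOCUMENTS[uri])


def dereference(href):
    """Resolve 'uri#xpointer(path)' to a list of elements (simple paths only)."""
    uri, _, fragment = href.partition("#")
    match = re.fullmatch(r"xpointer\((.*)\)", fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    path = match.group(1)
    root = load(uri)
    if path.startswith("//"):
        return root.findall("." + path)  # '//city' becomes './/city'
    # '/cities/city[...]': the first step must name the root element.
    first, _, rest = path.lstrip("/").partition("/")
    if first != root.tag:
        return []
    return root.findall("./" + rest) if rest else [root]


# Step 1: locate the link in the first document.
countries = load("http://www.foo.de/countries.xml")
capital = countries.find("./country[name='Belgium']/capital")
# Step 2: follow the link explicitly into the second document.
city = dereference(capital.get(XLINK_HREF))[0]
inhabitants = int(city.findtext("population"))
```

The two explicit steps are exactly what a query has to spell out when links are not part of the data model.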

The XML Query Requirements [XMQ03]¹ explicitly state that querying along references, both within an XML document and between documents, must be supported. Intra-document references are modeled in XML using the ID-IDREF construct. In XQuery, these references can be explicitly dereferenced with the XPath/XQuery function id(). Inter-document references in XML documents can be expressed with XLink constructs. How can they be queried? Can they be queried at all?

¹ The XML Query Requirements led to the specification of the XML Query Language (XQuery) by the World Wide Web Consortium. XPath is an XML navigation language based on path expressions, and is an integral part of XQuery. Thus, all XPath functions can be used within XQuery; within the scope of this work, therefore, no distinction is made between XPath and XQuery functions, and the term "XPath/XQuery function" is used instead. In the specification XQuery 1.0 and XPath 2.0 Functions and Operators [XPQ07], the distinction

With the XPath/XQuery function doc(), a remote document can be identified in a query, and with

    let $pointer :=
      doc("http://.../countries.xml")//country[name="Belgium"]/capital/@xlink:href/string()

one can select the URI value of the capital element's xlink:href attribute:

    "http://.../cities-B.xml#xpointer(/cities/city[name='Brussels'])"

which references the city element of Brussels. But inside XQuery, that attribute value is just a string, which cannot be resolved in order to dereference the capital Simple Link.

Hence, inter-document xlink:href references such as the one above cannot be resolved in XQuery, at least not in general.

However, there are some exceptions: if the URI's XPointer expression is a shorthand pointer, as in "http://.../countries.xml#B", or an XPointer scheme with an explicit ID value given, as in "http://.../countries.xml#xpointer(id(B))", the URI can be resolved by combining the doc() and the id() functions.
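The restricted case can be sketched in Python as follows. This is an illustration under assumptions, not part of any standard API: ElementTree does not read DTDs, so the attribute that carries the ID type (car_code is assumed here) is passed in by name, and the helper names build_id_index and resolve_by_id are hypothetical. Both pointer forms reduce to a single lookup in an ID index, which is why no general XPointer evaluation is needed.

```python
import re
import xml.etree.ElementTree as ET

# In-memory stand-in for countries.xml (assumption: car_code is of type ID).
COUNTRIES = """
<countries>
  <country car_code="B"><name>Belgium</name></country>
  <country car_code="D"><name>Germany</name></country>
</countries>"""


def build_id_index(root, id_attribute):
    """Map ID values to elements; `id_attribute` stands in for DTD ID typing."""
    return {
        element.get(id_attribute): element
        for element in root.iter()
        if element.get(id_attribute) is not None
    }


def resolve_by_id(fragment, index):
    """Resolve a shorthand pointer ('B') or 'xpointer(id(B))' via the ID index."""
    match = re.fullmatch(r"xpointer\(id\(['\"]?([^'\")]+)['\"]?\)\)", fragment)
    if match:
        return index.get(match.group(1))
    if re.fullmatch(r"[A-Za-z_][\w.-]*", fragment):  # rough NCName check
        return index.get(fragment)
    return None  # any other XPointer expression is out of scope here


index = build_id_index(ET.fromstring(COUNTRIES), "car_code")
by_shorthand = resolve_by_id("B", index)
by_id_scheme = resolve_by_id("xpointer(id(B))", index)
unsupported = resolve_by_id("xpointer(/countries/country[name='Belgium'])", index)
```

The last call returns None, mirroring the limitation described above: a path-based pointer cannot be handled by an ID lookup.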

Also, some XML processing applications provide proprietary functions that supply this functionality. For example, the Saxon XML processing software [Kay] provides an XSLT extension function saxon:evaluate(), which can be used to evaluate an XPath expression within a remote document specified by Saxon's doc function. Furthermore, [RBHS04] propose an XQuery extension with "execute at uri xquery {xquery}".

These solutions either work only on restricted URIs or rely on software that goes beyond the XQuery standard. Within the scope of the standard XQuery functions as given in XQuery 1.0 and XPath 2.0 Functions and Operators, the described dereferencing functionality cannot be made available for the general case.

Apart from being insular, the above approaches for querying in the presence of XLink references require explicit link dereferencing. Preferable to this would be an approach for handling distributed XML data where the links are transparent in the sense that they are seamlessly embedded into the common XML / XPath data model, so that queries could follow the links implicitly to the referenced nodes in other documents without "minding the gap" between two linked documents. This leads to a logical data model where distributed, XLinked XML documents represent a single, virtual, integrated XML instance, as shown in Figure 3.2. The XLink elements are seen as view definitions that integrate the referenced XML data into the referencing XML instance. The XLink element specifies the referenced nodes, and how they are mapped seamlessly into the surrounding instance. Of special interest here is how the link relation is

between XQuery and XPath functions has also been given up.

Figure 3.2: Extended XML Data Model with XLink Elements (panels: physical instances, virtual instance; labels: xpath-expr1, uri#xpath-exprx, uri, xpath-exprx)

mapped to a standard XML data model relation (e.g. child or attribute relation). The virtual instance can then be processed with standard languages like XPath, XQuery, or XSLT without need for specific link dereferencing operators.
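The idea of a virtual instance can be made concrete with a small Python sketch. It is only one possible reading of the model and rests on explicit assumptions: the referenced nodes are mapped to children of the link element (one choice among the mappings mentioned above), the documents are in-memory stand-ins, only simple xpointer() paths are resolved, and the names resolve and virtual_instance are hypothetical. The depth parameter bounds expansion, since links such as neighbor can form cycles.

```python
import copy
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"
XLINK_TYPE = f"{{{XLINK_NS}}}type"

# In-memory stand-ins for the physical instances (assumption: no network access).
DOCUMENTS = {
    "http://www.foo.de/countries.xml": """
<countries xmlns:xlink="http://www.w3.org/1999/xlink">
  <country car_code="B">
    <name>Belgium</name>
    <capital xlink:type="simple"
      xlink:href="http://www.bar.de/cities-B.xml#xpointer(/cities/city[name='Brussels'])"/>
  </country>
</countries>""",
    "http://www.bar.de/cities-B.xml": """
<cities>
  <city><name>Brussels</name><population>951580</population></city>
  <city><name>Antwerp</name><population>459072</population></city>
</cities>""",
}


def resolve(href):
    """Resolve 'uri#xpointer(path)' to elements (simple paths only)."""
    uri, _, fragment = href.partition("#")
    path = re.fullmatch(r"xpointer\((.*)\)", fragment).group(1)
    root = ET.fromstring(DOCUMENTS[uri])
    if path.startswith("//"):
        return root.findall("." + path)
    first, _, rest = path.lstrip("/").partition("/")
    if first != root.tag:
        return []
    return root.findall("./" + rest) if rest else [root]


def virtual_instance(root, depth=1):
    """Return a copy of `root` with link targets embedded as children of the links."""
    view = copy.deepcopy(root)
    if depth == 0:
        return view  # leave any remaining links unexpanded
    for link in list(view.iter()):  # snapshot, so appended targets are not revisited
        href = link.attrib.pop(XLINK_HREF, None)
        if href is None:
            continue
        link.attrib.pop(XLINK_TYPE, None)
        for target in resolve(href):
            link.append(virtual_instance(target, depth - 1))
    return view


physical = ET.fromstring(DOCUMENTS["http://www.foo.de/countries.xml"])
view = virtual_instance(physical)
# A plain path expression now crosses the former document boundary.
population = view.findtext("./country[name='Belgium']/capital/city/population")
```

On the virtual instance the query is an ordinary path expression; no dereferencing step appears in it, and the physical instance itself is left unchanged.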

3.2 Applications: Data Integration and