Projection of XML Documents and Fragments

Main-memory XQuery processors such as Saxon [SAX], Xalan [Xal] or Galax [Gal] cannot process queries on XML documents of arbitrary size because a document has to be loaded completely before the query is evaluated. To overcome this limitation, [MS03] proposes a method called projection that reduces an XML document to the parts that are relevant with respect to a query. For a given XPath query, a set of relevant projection paths is computed at compile time. These paths describe the parts of an XML document that are needed to answer the query, so the document can be reduced in size according to these paths before it is loaded. Experiments in [MS03] have shown that this method reduces the memory requirements to 5% on average.
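The effect of projection can be illustrated with a minimal Python sketch. It is not the algorithm of [MS03]: it assumes that projection paths consist of child steps only, it represents each path as a tuple of element names, and for brevity it prunes an already parsed tree, whereas the benefit of projection comes from pruning while the document is loaded. The function name project and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def project(elem, paths, prefix=()):
    """Prune elem in place; return True if elem itself has to be kept.

    paths is a list of tuples of element names (child steps only).
    """
    here = prefix + (elem.tag,)
    # A node at or below the end of a projection path is kept with its subtree.
    if any(here[:len(p)] == p for p in paths):
        return True
    # Otherwise the node survives only as an ancestor of a needed node.
    if not any(p[:len(here)] == here for p in paths):
        return False
    for child in list(elem):
        if not project(child, paths, here):
            elem.remove(child)
    return True

# Made-up sample data, loosely following the Mondial vocabulary.
doc = ET.fromstring(
    "<mondial>"
    "<country><name>Germany</name><population>83000000</population></country>"
    "<organization><name>EU</name></organization>"
    "</mondial>"
)
project(doc, [("mondial", "country", "name")])
print(ET.tostring(doc, encoding="unicode"))
# prints <mondial><country><name>Germany</name></country></mondial>
```

Only the country names and their ancestors survive; the population and organization elements are dropped because no projection path leads to them.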

If we want to apply this approach in our implementation, we have to discuss its relationship to the different query evaluation modes, i.e. query shipping, data shipping and hybrid shipping.

Query Shipping (dbxlink:eval="remote"). We delegate the evaluation of the XLink and of the remaining query to the remote server. The final result for this part is then sent back to the originating server. Projection methods are not required here because the result set cannot be reduced. However, the remote server might exploit these methods autonomously for its internal XPath evaluation processes.

Data Shipping (dbxlink:eval="local"). In this case, the referenced document is requested for local evaluation, i.e. the XPointer and the remaining query are subsequently processed on a local copy. To avoid storing big documents in the local database backend, we can apply projection (based on the XPointer of the currently processed link) to the received document before it is stored. This means that the requested document is projected with respect to the XPath expression contained in the pointer. Note, however, that this is only useful if projecting the document and evaluating the query on the smaller fragment is faster than storing the whole document and evaluating the query on it.

Hybrid Shipping (dbxlink:eval="distributed"). With this shipping strategy, the XPointer expression is evaluated remotely, and the referenced XML fragment is returned so that the remaining query can be applied to it locally. Thus, the remote server could apply projection to the returned fragment according to the remaining query. The following informal example illustrates how this approach could be used in our implementation for hybrid shipping.

Example 7.1

Assume that the following query has to be evaluated on a variant of Mondial’s distributed version in which dbxlink:eval="distributed" (i.e. hybrid shipping) is set for all links:

//organization[@abbrev="EU"]/member/id(@capital)/population

The XPointers of the member elements (which are children of the elements representing organizations) reference the countries that are members of the respective organization. When these pointers are resolved, the country elements are not transmitted as a whole; only the element “hull” is transmitted together with the capital XLink element (which, due to the “L”-directive “make-attribute”, contributes the capital attribute required by the next step). Likewise, when the capital link is resolved, the corresponding city elements are not transmitted as a whole, but only the element hulls with the population subelements that are needed to answer the last step of the query.
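The notion of an element hull used in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name hull, the attribute names and the sample country element are hypothetical, and the set of subelements to keep is passed in by hand rather than derived from the remaining query.

```python
import copy
import xml.etree.ElementTree as ET

def hull(elem, keep):
    """Return a copy of elem with its attributes but only the children named in keep."""
    shell = ET.Element(elem.tag, dict(elem.attrib))
    for child in elem:
        if child.tag in keep:
            shell.append(copy.deepcopy(child))
    return shell

# Made-up country element; the href value is a placeholder, not a real XPointer.
country = ET.fromstring(
    '<country car_code="D">'
    "<name>Germany</name>"
    '<capital href="#berlin"/>'
    "<population>83000000</population>"
    "</country>"
)

# Only the capital link is needed for the next step of the query.
reduced = hull(country, {"capital"})
print([child.tag for child in reduced])
# prints ['capital']
```

For the last step of the query, the same function would be applied to each city element with keep set to {"population"}.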

As sketched in the example above, given an XLink and an XPath user query, the idea is to project the referenced XML fragment according to the rest of the XPath query. This would be an “extended” hybrid shipping that leads to less data traffic over the network, but it requires the remote server to support this operation. This raises the question of whether it would be better to let the remote server process the whole query, i.e. to switch to query shipping. However, there are cases where hybrid shipping is preferable to query shipping, e.g. there might be servers that do not support query shipping according to the dbxlink approach but that, besides XPath processing facilities, offer projection methods.
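The three evaluation modes discussed in this section, and the place where projection could be applied in each of them, can be summarized in a small Python lookup table. The names Plan, PLANS and plan_for are hypothetical and not part of the dbxlink proposal; the table merely restates the discussion above.

```python
from typing import NamedTuple

class Plan(NamedTuple):
    pointer_at: str   # where the XPointer is evaluated
    rest_at: str      # where the remaining query is evaluated
    projection: str   # where projection could be applied

# Keys are the values of the dbxlink:eval directive.
PLANS = {
    "remote": Plan("remote", "remote", "not required (the remote server may use it internally)"),
    "local": Plan("local", "local", "local, w.r.t. the XPointer, before the document is stored"),
    "distributed": Plan("remote", "local", "remote, w.r.t. the remaining query"),
}

def plan_for(eval_mode):
    """Look up the evaluation plan for a dbxlink:eval value."""
    try:
        return PLANS[eval_mode]
    except KeyError:
        raise ValueError(f"unknown dbxlink:eval value: {eval_mode!r}") from None

print(plan_for("distributed").rest_at)
# prints local
```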

In this section, some related work is discussed and we also sketch some open issues that might lead to further work.

8.1 Related Work

Active XML. A general approach for integrating remote access functionality into XML documents is proposed by Active XML [ABM⁺02]: <axml:sc> elements allow for embedding service calls into XML documents. Active XML and dbxlink differ significantly with respect to generality (Active XML) versus specialization (dbxlink) and in the degree of integration with the database functionality. While the dbxlink approach is an incremental extension of the existing concepts of XLink and XPointer that aims to provide a transparent data model and XPath/XQuery support for them from the database point of view, Active XML is a generic extension of functionality towards Web Services. Nevertheless, as described below, dbxlink and Active XML can be used to implement each other.

Active XML has no processing directives (the left-hand or “L”-directives in our approach) that specify how the results of Web Services should be integrated. In particular, it is not possible in Active XML to create attributes or to duplicate elements. Additionally, the dbxlink proposal provides explicit processing strategies (cf. the dbxlink:eval and dbxlink:cache directives). Because there are some similarities between our approach and Active XML, we show how the two techniques relate to each other with respect to Web Services.

In Active XML, axml:sc elements represent Web Service calls, which are replaced by the result of the service call. The following is an example Active XML document¹:

<directory>
  <dep>
    <axml:sc>toy.xyz.com/GetToyPersonnel()</axml:sc>
  </dep>
  <dep>
    <axml:sc>dvd2000.com/GetDVDPersonnel()</axml:sc>
  </dep>
</directory>

Note that many additional parameters have to be supplied in order to fully specify a Web Service call, but the Active XML authors generally omit them for clarity.

¹ Taken from http://www.activexml.net.

On the one hand, an Active XML service that implements the dbxlink modeling and takes a dbxlink-extended XLink element could return the appropriate XML fragment, which is then integrated into the Active XML document in conformance with dbxlink:transparent="drop-element insert-nodes". Other mappings are not possible with Active XML. On the other hand, to embed Active XML into our proposal, evaluating XLink elements that refer to Web Services with a dbxlink-extended XPath/XQuery engine covers the basic Active XML functionality. Depending on the given Web Service, we can build appropriate calls as described later in Section 8.2.1.
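The way a service call result replaces its axml:sc element, which corresponds to dbxlink:transparent="drop-element insert-nodes", can be sketched in Python. Everything in this sketch is an assumption made for illustration: the namespace URI is a placeholder, fake_service stands in for a real Web Service call, and materialize is a hypothetical name that belongs neither to Active XML nor to the dbxlink proposal.

```python
import xml.etree.ElementTree as ET

AXML_NS = "http://example.org/axml"  # hypothetical namespace URI, for illustration only
SC = f"{{{AXML_NS}}}sc"

def materialize(parent, call_service):
    """Replace every axml:sc element below parent by the nodes its call returns."""
    # Iterate from the back so that insertions do not shift pending indexes.
    for index, child in reversed(list(enumerate(parent))):
        if child.tag == SC:
            results = call_service((child.text or "").strip())
            parent.remove(child)                      # drop-element
            for offset, node in enumerate(results):   # insert-nodes
                parent.insert(index + offset, node)
        else:
            materialize(child, call_service)

def fake_service(call):
    """Stand-in for a real Web Service call; returns one made-up element."""
    employee = ET.Element("employee")
    employee.text = "result of " + call
    return [employee]

doc = ET.fromstring(
    f'<directory xmlns:axml="{AXML_NS}">'
    "<dep><axml:sc>toy.xyz.com/GetToyPersonnel()</axml:sc></dep>"
    "<dep><axml:sc>dvd2000.com/GetDVDPersonnel()</axml:sc></dep>"
    "</directory>"
)
materialize(doc, fake_service)
print([elem.tag for elem in doc.iter()])
# prints ['directory', 'dep', 'employee', 'dep', 'employee']
```

The sketch has no counterpart for the other dbxlink directives, such as creating attributes or duplicating elements, which matches the observation that Active XML supports only this one mapping.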

Decomposing Queries on Distributed XML Data. Several approaches to decomposing queries on distributed XML data have been investigated, and their results can also be used for the implementation of the dbxlink specification. In [Suc02], distributed query evaluation for general semistructured data graphs is investigated. The approach assumes that a fixed community of sites agrees on sharing their data and answering queries. The query is split into a decomposed query, its parts are evaluated independently at each site, and the result fragments are assembled. In contrast, the scenario of our approach considers XLink references between arbitrary sources, together with the specification for mapping the linked fragments to a virtual instance and querying it.

The logical modeling of [Suc02] is similar to XInclude. In [BG03], the distribution of XML repositories is investigated, focusing on index structures. Both these approaches are orthogonal to ours (where the focus is on the modeling and handling of the interplay of links seen as views) and could probably be applied for a more efficient implementation.

SXLink. The XLink processor SXLink [LL05], implemented in Scheme, aims to offer methods for obtaining all information about the XLinks contained in a set of XML documents. In this system, queries are supported by an XPath extension called “XPathLink”, implemented as “SXPathLink” in Scheme. This language introduces an additional XPath axis traverse for traversing XLinks explicitly in XPath queries. As already discussed in Section 4.1.1, our approach, which handles queries over linked XML instances transparently, is preferable in the general case.
