CONCLUSION AND OUTLOOK - Accelerating XPath Evaluation in Any RDBMS

This work has been primarily motivated by the need for an XPath index structure that would be capable of

(1) running on top of a relational back-end to leverage its stability, scalability, and performance,

(2) providing coverage for all nodes of an XML document tree (such that the index itself can serve as the only representation of the document inside the database),

(3) closely tracking the XPath semantics (especially with respect to adequate support for all XPath axes and document order), as well as

(4) rooting XPath traversals in arbitrary context nodes.

The latter requirement, specifically, arose in the context of an ongoing project to construct an XQuery compiler: an XQuery expression like

for $v in e return $v/p

sequentially binds variable $v to the arbitrarily computed nodes of sequence e.

The index thus needs to evaluate path p rooted in context nodes scattered over the whole document tree. For the XPath accelerator, any context node is as good as any other; in particular, the index has no bias towards the document root element, unlike many related proposals.
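The following minimal sketch illustrates how an axis step can be rooted in arbitrary context nodes. It assumes the usual pre/post region conditions for the four major axes and a small hypothetical document a(b(c, d), e(f)); the names `Node`, `DOC`, `step`, and `for_each` are illustrative and not part of the accelerator itself. An in-memory list stands in for the relational table.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    pre: int   # preorder rank
    post: int  # postorder rank
    tag: str


# Hypothetical document a(b(c, d), e(f)), ranks starting at 0.
DOC: List[Node] = [
    Node(0, 5, "a"), Node(1, 2, "b"), Node(2, 0, "c"),
    Node(3, 1, "d"), Node(4, 4, "e"), Node(5, 3, "f"),
]

# Region of the pre/post plane selected by each axis for context node v.
AXES: Dict[str, Callable[[Node, Node], bool]] = {
    "descendant": lambda v, w: w.pre > v.pre and w.post < v.post,
    "ancestor":   lambda v, w: w.pre < v.pre and w.post > v.post,
    "following":  lambda v, w: w.pre > v.pre and w.post > v.post,
    "preceding":  lambda v, w: w.pre < v.pre and w.post < v.post,
}


def step(doc: List[Node], v: Node, axis: str) -> List[Node]:
    """Evaluate one axis step from context node v, in document order."""
    inside = AXES[axis]
    # Sorting the tuples orders them by pre rank, i.e. document order.
    return [w for w in sorted(doc) if inside(v, w)]


def for_each(doc: List[Node], contexts: List[Node], axis: str) -> List[str]:
    """Mimic `for $v in e return $v/axis::*` for a sequence of contexts."""
    return [w.tag for v in contexts for w in step(doc, v, axis)]
```

Note that `step` treats every context node alike: the same window test applies whether v is the root or a leaf deep inside the tree.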

While the above makes the XPath accelerator a promising target for XQuery compilation, another core feature of XQuery, element construction, needs to be addressed with care. An XQuery element constructor

element n {e1, e2}

constructs a new node (with tag n) with left and right subtrees e1 and e2, respectively. To generate a valid pre/post encoding for the constructed tree means to renumber all nodes in the encoding of e2. This is similar to the node insertion problem we have briefly discussed in Section 7, since the nodes of e2 are in the following axis of the nodes in e1. The compiler, however, has control over the order in which e1 and e2 are evaluated: since XQuery is a functional language without side effects, the evaluation order of subexpressions does not matter semantically. Now, if e1 is evaluated prior to e2, the compiler can (in anticipation of the ‘,’ sequence construction operator) encode the XML fragment e2 with pre/post ranks that immediately follow those used for e1. The encoding for e2 will be consistent in itself (remember that the absolute pre and post values are insignificant), and the element construction above is merely a matter of concatenating the pre/post encodings for e1 and e2. This threading of pre/post ranks through the evaluation of an XQuery program is a technique currently under investigation by the authors.
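The idea of threading ranks can be sketched as follows. This is only one possible reading of the technique, not the authors' implementation: the pair of counters passed through `encode`, the nested-tuple tree representation, and the choice to reserve the pre rank of the new root before e1 is encoded are all assumptions made for illustration.

```python
from typing import List, Tuple

Tree = Tuple[str, list]       # (tag, list of child trees); hypothetical shape
Row = Tuple[int, int, str]    # (pre, post, tag)


def encode(tree: Tree, pre: int, post: int) -> Tuple[List[Row], int, int]:
    """Encode a fragment starting at the given ranks.

    Returns the rows plus the next free pre and post ranks, so that the
    caller can thread them into the next fragment.
    """
    tag, children = tree
    my_pre = pre
    pre += 1
    rows: List[Row] = []
    for child in children:
        child_rows, pre, post = encode(child, pre, post)
        rows.extend(child_rows)
    # The node's post rank is assigned after all of its descendants.
    return [(my_pre, post, tag)] + rows, pre, post + 1


def construct_element(tag: str, e1: Tree, e2: Tree,
                      pre: int, post: int) -> Tuple[List[Row], int, int]:
    """Sketch of `element tag {e1, e2}` by concatenating two encodings."""
    root_pre = pre                                 # assumption: reserved up front
    rows1, pre, post = encode(e1, pre + 1, post)   # e1 is evaluated first
    rows2, pre, post = encode(e2, pre, post)       # e2 continues where e1 stopped
    return [(root_pre, post, tag)] + rows1 + rows2, pre, post + 1
```

Since only the relative order of ranks matters, calling `construct_element` with any starting pair of ranks yields an encoding that is consistent in itself, and no row of e2 has to be renumbered after the fact.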

More on the theoretical side, and geared towards the development of an optimizing XPath or XQuery compiler, we believe that the XPath accelerator provides the necessary hooks to incorporate an effective cost estimation for XPath queries.

As discussed in Section 4, it is the axis window area which dominantly influences the step evaluation cost. Since the pre/post plane allows us to very accurately estimate window sizes given only a context node’s descriptor, this could yield a cost model that is sensitive to the actual location of the context nodes and not only to the query itself.

The XPath symmetry rewrites explored in [Olteanu et al. 2001] could then be used to establish the space of equivalent XPath queries out of which a cost-based optimizer would pick candidates based on (a function of) pre/post plane window areas providing the cost measure.
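A minimal sketch of such a location-sensitive cost measure is given below. It uses the number of nodes inside an axis window as a stand-in for the window area, and it assumes that pre and post ranks start at 0, that the root has level 0, and that the node level is available alongside the descriptor (if only an upper bound on the level is known, the counts become estimates). The names `Descriptor`, `window_size`, and `cheapest` are hypothetical, and the equivalence of the candidate steps is taken for granted rather than derived from any rewrite rule.

```python
from typing import List, NamedTuple, Tuple


class Descriptor(NamedTuple):
    pre: int
    post: int
    level: int  # assumption: depth of the node, root = 0


def window_size(v: Descriptor, axis: str, n: int) -> int:
    """Number of nodes in the axis window of v for a document of n nodes."""
    if axis == "descendant":
        return v.post - v.pre + v.level
    if axis == "ancestor":
        return v.level
    if axis == "preceding":
        return v.pre - v.level          # nodes before v minus its ancestors
    if axis == "following":
        return n - 1 - v.post - v.level  # nodes after v minus its ancestors
    raise ValueError(f"unsupported axis: {axis}")


Candidate = Tuple[str, Descriptor, str]  # (label, context node, axis)


def cheapest(candidates: List[Candidate], n: int) -> Candidate:
    """Pick the candidate step with the smallest estimated window."""
    return min(candidates, key=lambda c: window_size(c[1], c[2], n))
```

For the same query, two context nodes at different locations in the plane can thus lead the optimizer to different choices, which is the location sensitivity referred to above.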

It will be interesting to compare this approach to more intricate cost models for XML queries as presented in [Chen et al. 2001] and [Wu et al. 2002].

ACKNOWLEDGMENTS

The authors would like to thank the Monet people at CWI (Amsterdam, The Netherlands) for their support and most useful feedback. Maurice van Keulen has been with the University of Konstanz as a DAAD INNOVATEC funded research fellow. The comments of the anonymous reviewers were of great help in improving the presentation of the material.

REFERENCES

Altinel, M. and Franklin, M. J. 2000. Efficient Filtering of XML Documents for Selective Dissemination of Information. In Proc. of the 26th Int’l Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers, Cairo, Egypt, 53–64.

Berglund, A., Boag, S., Chamberlin, D., Fernandez, M. F., Kay, M., Robie, J., and Siméon, J. 2002. XML Path Language (XPath) 2.0. Tech. Rep. W3C Working Draft, Version 2.0, World Wide Web Consortium. Aug. http://www.w3.org/TR/xpath20/.

Boag, S., Chamberlin, D., Fernandez, M., Florescu, D., Robie, J., and Siméon, J. 2002. XQuery 1.0: An XML Query Language. Tech. Rep. W3C Working Draft, World Wide Web Consortium. Aug. http://www.w3.org/TR/xquery.

Böhm, C., Berchtold, S., Kriegel, H.-P., and Michel, U. 2000. Multidimensional Index Structures in Relational Databases. Journal of Intelligent Information Systems (JIIS) 15, 1, 51–70.

Boncz, P. A. 2002. Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications. Ph.D. thesis, University of Amsterdam, The Netherlands.

Boncz, P. A. and Kersten, M. L. 1999. MIL Primitives for Querying a Fragmented World. The VLDB Journal 8, 2, 101–119.

Chen, Z., Jagadish, H., Korn, F., Koudas, N., Muthukrishnan, S., Ng, R., and Srivastava, D. 2001. Counting Twig Matches in a Tree. In Proc. of the 17th Int’l Conference on Data Engineering (ICDE). IEEE Computer Society, Heidelberg, Germany, 595–604.

Cohen, E., Kaplan, H., and Milo, T. 2002. Labeling Dynamic XML Trees. In Proc. of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM Press, Madison, Wisconsin, 271–281.

Cooper, B. F., Sample, N., Franklin, M. J., Hjaltason, G. R., and Shadmon, M. 2001. A Fast Index for Semistructured Data. In Proc. of the 27th Int’l Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers, Rome, Italy, 341–360.

Dietz, P. F. and Sleator, D. D. 1987. Two Algorithms for Maintaining Order in a List. In Conference Record of the 19th Annual ACM Symposium on Theory of Computing (STOC). ACM Press, New York City, 365–372.

Fernandez, M., Marsh, J., and Nagy, M. 2002. XQuery 1.0 and XPath 2.0 Data Model. Tech. Rep. W3C Working Draft, World Wide Web Consortium. Aug. http://www.w3.org/TR/query-datamodel.

Florescu, D. and Kossmann, D. 1999. A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database. Tech. Rep. 3680, INRIA, Rocquencourt, France. May.

Goldman, R. and Widom, J. 1997. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In Proc. of the 23rd Int’l Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers, Athens, Greece, 436–445.

Gottlob, G., Koch, C., and Pichler, R. 2002. Efficient Algorithms for Processing XPath Queries. In Proc. of the 28th Int’l Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers, Hong Kong, China, 95–106.

Grust, T. 2002. Accelerating XPath Location Steps. In Proc. of the 21st Int’l ACM SIGMOD Conference on Management of Data. ACM Press, Madison, Wisconsin, USA, 109–120.

Grust, T. and van Keulen, M. 2003. Tree Awareness for Relational Database Kernels: Staircase Join. In Intelligent Search on XML, H. Blanken, H.-J. Schek, and G. Weikum, Eds. Number 2818 in Lecture Notes in Computer Science. Springer Verlag, Heidelberg, Germany.

Guttman, A. 1984. R-Trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD 1984, Proc. of Annual Meeting. ACM Press, Boston, Massachusetts, 47–57.

Hellerstein, J. M., Naughton, J. F., and Pfeffer, A. 1995. Generalized Search Trees for Database Systems. In Proc. of the 21st Int’l Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers, Zurich, Switzerland, 562–573.

Kamel, I. and Faloutsos, C. 1993. On Packing R-Trees. In Proc. of the 2nd Int’l Conference on Information and Knowledge Management (CIKM). ACM Press, Washington DC, USA, 490–499.

Kaushik, R., Bohannon, P., Naughton, J. F., and Korth, H. K. 2002. Covering Indexes for Branching Path Queries. In Proc. of the 21st Int’l ACM SIGMOD Conference on Management of Data. ACM Press, Madison, Wisconsin, USA, 133–144.

Kriegel, H.-P., Pötke, M., and Seidl, T. 2000. Managing Intervals Efficiently in Object-Relational Databases. In Proc. of the 26th Int’l Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers, Cairo, Egypt, 407–418.

Li, Q. and Moon, B. 2001. Indexing and Querying XML Data for Regular Path Expressions. In Proc. of the 27th Int’l Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers, Rome, Italy, 361–370.

Olteanu, D., Meuss, H., Furche, T., and Bry, F. 2001. Symmetry in XPath. Tech. Rep. PMS-FB-2001-16, Institute of Computer Science, University of Munich, Germany.

Roussopoulos, N. and Leifker, D. 1985. Direct Spatial Search on Pictorial Databases Using Packed R-Trees. In Proc. of the ACM SIGMOD Int’l Conference on Management of Data. ACM Press, Austin, Texas, 17–31.

SAX (Simple API for XML). http://sax.sourceforge.net/.

Schmidt, A., Waas, F., Kersten, M., Carey, M. J., Manolescu, I., and Busse, R. 2002. XMark: A Benchmark for XML Data Management. In Proc. of the 28th Int’l Conference on Very Large Databases (VLDB). Morgan Kaufmann Publishers, Hong Kong, China, 974–985.

Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., and Naughton, J. 1999. Relational Databases for Querying XML Documents: Limitations and Opportunities. In Proc. of the 25th Int’l Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers, Edinburgh, Scotland, 302–314.

Suciu, D. and Milo, T. 1999. Index Structures for Path Expressions. In Proc. of the 7th Int’l Conference on Database Theory (ICDT). Number 1540 in Lecture Notes in Computer Science (LNCS). Springer Verlag, Jerusalem, Israel, 277–295.

Wu, Y., Patel, J. M., and Jagadish, H. 2002. Estimating Answer Sizes for XML Queries. In Proc. of the 8th Int’l Conference on Extending Database Technology (EDBT). Springer Verlag, Prague, Czech Republic, 590–608.

Zhang, C., Naughton, J., DeWitt, D., Luo, Q., and Lohman, G. 2001. On Supporting Containment Queries in Relational Database Management Systems. In Proc. of the ACM SIGMOD Int’l Conference on Management of Data. ACM Press, Santa Barbara, California, 425–436.

