Searching for Optimal Query Plans - Methods and Cost Models for XPath Query Processing in Main Memory Databases

In the introductory part of this thesis, we outlined the long-term objective of enabling strategic query optimization for the evaluation of XPath expressions. The present work does not solve this problem entirely, but it provides a large set of the required models and methods. We conclude by looking at the tasks that remain open.

We suggested constraining the optimization problem to single-step expressions. Hence, the solution space of conceivable query plans is highly limited. In fact, there are not just two plans, as shown in the introduction, but three once the additional key join required by Monet's vertically fragmented data model is taken into account (Fig. 5.1):

(a) Independent execution: the axis step and the name test are each executed on the entire node set, followed by a key join of both results.

(b) Axis step first: the axis step is performed on the entire node set, followed by the key join and the name test on the intermediate result.

(c) Name test first: the name test is performed on the entire node set, followed by the key join and the axis step over the restricted node set that results from the first two operations.

Figure 5.1: Possible query plans for single-step expressions combining an axis step, a key join ⋈, and a name test selection σ.

Obviously, the three plans shown above involve issues of tactical optimization as well. Whereas (a) and (b) allow the void axis step to be applied, the third variant falls back to the less efficient oid version. Similarly, the key join in (a) requires the sort-merge join implementation because its operands are of type oid. The faster fetch-join, however, can be applied in the other two query plans, where at least one operand is of type void.
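The pairing of plans and operator implementations described above can be written down as a small lookup table. The following Python sketch is only an illustration: the class name StepPlan, its field names, and the describe helper are hypothetical and are not part of Monet or of the implementation discussed in this thesis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StepPlan:
    """One of the three single-step query plans (hypothetical container type)."""
    label: str
    operator_order: Tuple[str, ...]  # operations in execution order
    axis_step_variant: str           # "void" or "oid" axis step implementation
    key_join_variant: str            # "fetch-join" or "sort-merge join"


# Operator variants as stated in the text: (a) and (b) use the void axis step,
# (c) falls back to the oid version; (a) needs the sort-merge join, while
# (b) and (c) can use the fetch-join.
PLANS = {
    "a": StepPlan("a", ("axis step", "name test", "key join"), "void", "sort-merge join"),
    "b": StepPlan("b", ("axis step", "key join", "name test"), "void", "fetch-join"),
    "c": StepPlan("c", ("name test", "key join", "axis step"), "oid", "fetch-join"),
}


def describe(plan: StepPlan) -> str:
    """Return a one-line, human-readable summary of a plan."""
    order = " -> ".join(plan.operator_order)
    return (f"({plan.label}) {order}; axis step: {plan.axis_step_variant}, "
            f"key join: {plan.key_join_variant}")


if __name__ == "__main__":
    for plan in PLANS.values():
        print(describe(plan))
```

In plan (a) the axis step and the name test are independent of each other, so the order of the first two entries in its tuple carries no meaning.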

Experiments covering scenarios that differ in the chosen context nodes, axis steps, and node tests might disqualify one of the three plans in advance, namely if it is outperformed in all tests by one of the other two. However, this will not make query optimization obsolete: first tests have already shown that at least plans (b) and (c) pass this preselection.
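The preselection criterion amounts to a simple dominance check over the test results. The sketch below assumes that each scenario yields one cost value per plan (lower is better). The function name and the scenario names are hypothetical, and the numbers are invented purely for illustration; they are not measurements from this thesis.

```python
from typing import Dict, Set


def disqualified_plans(results: Dict[str, Dict[str, float]]) -> Set[str]:
    """Return the plans that some other single plan outperforms in every scenario.

    `results` maps a scenario name to a mapping from plan label to cost.
    """
    plans = sorted({plan for costs in results.values() for plan in costs})
    dropped = set()
    for candidate in plans:
        for rival in plans:
            if rival == candidate:
                continue
            # The rival must be strictly cheaper in all scenarios.
            if all(costs[rival] < costs[candidate] for costs in results.values()):
                dropped.add(candidate)
                break
    return dropped


# Invented example values (NOT experimental data): plan "a" always loses to
# plan "b", while "b" and "c" each win in at least one scenario.
example_results = {
    "scenario 1": {"a": 90.0, "b": 40.0, "c": 70.0},
    "scenario 2": {"a": 120.0, "b": 100.0, "c": 30.0},
    "scenario 3": {"a": 150.0, "b": 60.0, "c": 80.0},
}

if __name__ == "__main__":
    print(disqualified_plans(example_results))
```

Any plan that is not returned by this check still has to be considered by the optimizer, because it is the cheapest choice in at least one of the tested situations or is never beaten consistently by a single rival.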

When calculating costs for entire step expressions, we further have to analyze how the proposed axis step cost models can be combined with those of the standard database operations involved. Using the number of cache misses as the cost indicator, we would expect that the cache misses of all three operations can simply be summed up to represent the total cost of a step expression. Comparing the three (or two, respectively) calculated results then enables the query optimizer to choose the optimal plan in any given situation.
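Under the additive assumption stated above, plan selection reduces to summing the estimated cache misses per plan and taking the minimum. The following sketch illustrates this. The function names are hypothetical, the per-operation estimates are invented placeholder numbers, and whether plain summation is actually adequate is exactly the open question raised here.

```python
from typing import Dict


def total_cost(cache_misses: Dict[str, float]) -> float:
    """Total cost of one plan: the sum of the cache misses of its operations."""
    return sum(cache_misses.values())


def choose_plan(estimates: Dict[str, Dict[str, float]]) -> str:
    """Return the label of the plan with the lowest summed cache-miss estimate."""
    return min(estimates, key=lambda label: total_cost(estimates[label]))


# Invented placeholder estimates (NOT derived from the cost models of this
# thesis): cache misses per operation for each of the three plans.
example_estimates = {
    "a": {"axis step": 1000.0, "name test": 800.0, "key join": 1500.0},
    "b": {"axis step": 1000.0, "key join": 400.0, "name test": 300.0},
    "c": {"name test": 800.0, "key join": 350.0, "axis step": 1400.0},
}

if __name__ == "__main__":
    for label, misses in example_estimates.items():
        print(label, total_cost(misses))
    print("chosen plan:", choose_plan(example_estimates))
```

If a plan is removed by the preselection, it is simply left out of the estimates mapping, which corresponds to comparing two results instead of three.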


From the document: Methods and Cost Models for XPath Query Processing in Main Memory Databases (pages 70-73)