4. XML Query Languages I


XML Databases

4. XML Query Languages, 17.11.08

Silke Eckstein Andreas Kupfer

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

4.1 Introduction

4.2 XPath

4.3 XPointer

4.4 XLink

4.5 Overview

4.6 References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 2

4.1 Introduction

• Querying XML documents

–"Querying XML data" essentially means

•to identify (or address) nodes,

•to test certain further properties of these nodes,

•then to operate on the matches,

•and finally, to construct result XML documents as answers.

–In the XML context, the language XQuery plays the role that SQL plays in relational databases.

–XQuery can express all of the above constituents of XML querying:

•XPath, as an embedded sublanguage, expresses the locate and test parts;

•XQuery can then iterate over selected parts, operate on them, and construct answers from them.

•Further XML languages also embed XPath as a sublanguage.

[Scholl07]

• XPath – Navigational access to XML documents

–In a sense, the traversal or navigation of trees of XML nodes lies at the core of every XML query language.

–To this end, XQuery embeds XPath as its tree navigation sub-language:

•Every XPath expression is also a correct XQuery expression.

•XPath 2.0: http://www.w3.org/TR/xpath20/

–Since navigation expressions extract (potentially huge volumes of) nodes from input XML documents, the efficient implementation of the sub-language XPath is a prime concern when it comes to the construction of XQuery processors.

• XPath as an embedded sublanguage

–XPath is a declarative, expression-based language to locate and test document nodes (with lots of syntactic sugar to make querying sufficiently sweet).

–Addressing document nodes is a core task in the XML world. XPath occurs as an embedded sub-language in

•XSLT (extract and transform XML documents [or fragments] into XML, XHTML, PDF, …)

•XQuery (compute with XML document nodes and contents, compute new documents, …)

•XPointer (represent the address of one or more document nodes in a given XML document)

•XML Schema (represent sets of elements as scopes for uniqueness or key constraints)

• XML Pointer Language

–Allows XPath expressions to be used in URIs

–Used to identify ranges within XML documents

• XML Linking Language

–To define links, which can…

•be as easy as HTML links

•have multiple end points

•be stored independently of the referenced documents

–Allows metadata to be added to links

–Uses XPointer and XPath

4.2 XPath

• Context node

–In XPath, a path traversal starts off from a sequence of context nodes.

•XPath navigation syntax is simple:

An XPath step

cs₀/step

•cs₀ denotes the context node sequence, from which a navigation in direction step is taken.

•It is a common error in XQuery expressions to try to start an XPath traversal without the context node sequence actually being defined.

• Multiple steps

–An XPath navigation may consist of multiple steps stepᵢ, i ≥ 1, taken in succession.

–Step step₁ starts off from the context node sequence cs₀ and arrives at a sequence of new nodes cs₁.

–cs₁ is then used as the new context node sequence for step₂, and so on.

Multi-step XPath path

cs₀/step₁/step₂/… ≡ ((cs₀/step₁)/step₂)/…

(where cs₀/step₁ yields cs₁)

• XPath location steps

–A step (or location step) stepᵢ specifies

1. the axis ax, i.e., the direction of navigation taken from the context nodes,

2. a node test nt, which can be used to navigate to nodes of a certain kind (e.g., only attribute nodes) or name,

3. optional predicates pᵢ, which further filter the sequence of nodes we navigated to.

XPath step

ax::nt[p₁]…[pₙ]

• XPath axes

–XPath defines a family of 12 axes allowing for flexible navigation within the node hierarchy of an XML tree. (XPath 2.0 additionally retains a deprecated namespace axis, which XQuery does not support.)

–XPath axes semantics, as used in the figures:

•○ marks the context node,

•@ marks attribute nodes,

•● represents any other node kind (inner ● nodes are element nodes).

[Figure: example tree with context node ○, attribute nodes @, and other nodes ●]

• XPath axes: child, parent, attribute

–The child axis does not navigate to the attribute nodes below ○.

–The only way to access attributes is to use the attribute axis explicitly.

[Figure: child, parent, and attribute axes]

• XPath axes: descendant, ancestor, self

–In a sense, descendant and ancestor represent the transitive closures of child and parent, respectively.

[Figure: descendant, ancestor, and self axes]

• XPath axes: preceding, following, ancestor-or-self

–Note: In the serialized XML document, nodes in the preceding (following) axis appear completely before (after) the context node.

[Figure: preceding, following, and ancestor-or-self axes]

• XPath axes: preceding-sibling, following-sibling, descendant-or-self

[Figure: preceding-sibling, following-sibling, and descendant-or-self axes]
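The axes can be made concrete with a small tree model. The following Python sketch is only an illustration under simplifying assumptions: the Node class and all function names are hypothetical (not part of any XPath library), attributes are kept apart from children as XPath requires, and following/preceding are only defined here for element and text nodes. The tree encodes the sample document <a b="0"><c d="1"><e>f</e></c><g><h/></g></a> used in the examples.

```python
class Node:
    """Hypothetical XML node: kind is "element", "attribute" or "text".

    As in XPath, attributes have a parent but are not among its children.
    """
    def __init__(self, kind, name=None, value=None, children=(), attributes=()):
        self.kind, self.name, self.value = kind, name, value
        self.parent = None
        self.children = list(children)
        self.attributes = list(attributes)
        for n in self.attributes + self.children:
            n.parent = self

def child(n):
    return list(n.children)          # never contains attribute nodes

def attribute(n):
    return list(n.attributes)        # the only way to reach attributes

def parent(n):
    return [n.parent] if n.parent is not None else []

def descendant(n):
    """Transitive closure of child, in document order (preorder)."""
    out = []
    for c in n.children:
        out.append(c)
        out.extend(descendant(c))
    return out

def ancestor(n):
    """Transitive closure of parent, nearest ancestor first."""
    out, p = [], n.parent
    while p is not None:
        out.append(p)
        p = p.parent
    return out

def following_sibling(n):
    if n.parent is None or n.kind == "attribute":
        return []
    sibs = n.parent.children
    return sibs[sibs.index(n) + 1:]

def preceding_sibling(n):
    if n.parent is None or n.kind == "attribute":
        return []
    sibs = n.parent.children
    return sibs[:sibs.index(n)]

def _tree_in_doc_order(n):
    root = n
    while root.parent is not None:
        root = root.parent
    return [root] + descendant(root)

def following(n):
    """Nodes after n in document order, minus n's descendants (n: element/text)."""
    nodes = _tree_in_doc_order(n)
    skip = {id(m) for m in descendant(n)}
    return [m for m in nodes[nodes.index(n) + 1:] if id(m) not in skip]

def preceding(n):
    """Nodes before n in document order, minus n's ancestors (n: element/text)."""
    nodes = _tree_in_doc_order(n)
    skip = {id(m) for m in ancestor(n)}
    return [m for m in nodes[:nodes.index(n)] if id(m) not in skip]

# Sample document: <a b="0"><c d="1"><e>f</e></c><g><h/></g></a>
f = Node("text", value="f")
e = Node("element", "e", children=[f])
c = Node("element", "c", children=[e], attributes=[Node("attribute", "d", "1")])
h = Node("element", "h")
g = Node("element", "g", children=[h])
b = Node("attribute", "b", "0")
a = Node("element", "a", children=[c, g], attributes=[b])
```

With this model, following(e) yields g and h but not e's own text child f, in line with the note that nodes on the following axis appear completely after the context node.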

• XPath axes: Examples (1)

–In these first examples, there is a single initial context node, i.e., a context node sequence of length 1: the root element a.

•Here, we set the node test nt to simply node(), which means not to filter any nodes selected by the axis.

XPath example

(<a b="0">
   <c d="1"><e>f</e></c>
   <g><h/></g>
 </a>)/child::node()
⇒
(<c d="1"><e>f</e></c>,
 <g><h/></g>)

(<a b="0">
   <c d="1"><e>f</e></c>
   <g><h/></g>
 </a>)/attribute::node()
⇒
attribute b { "0" }

(<a b="0">
   <c d="1"><e>f</e></c>
   <g><h/></g>
 </a>)/descendant::node()
⇒
(<c d="1"><e>f</e></c>,
 <e>f</e>,
 text { "f" },
 <g><h/></g>,
 <h/>)

–Notes:

•If an extracted node has no suitable XML representation by itself, the results on these slides are shown using the XQuery computed node constructor syntax, e.g., attribute b { "0" } or text { "f" }.

•Nodes are serialized showing their content. This does not imply that all of the content nodes have been selected by the XPath expression!

(<a b="0">
   <c d="1"><e>f</e></c>
   <g><h/></g>
 </a>)/child::node()/child::node()
⇒
(<e>f</e>,
 <h/>)

• XPath results: Order & duplicates

–XPath semantics: The result node sequence of any XPath navigation is returned in document order with no duplicate nodes (remember: node identity).

–Examples:

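Duplicate nodes are removed in XPath results …

(<a b="0">
   <c d="1"><e>f</e></c>
   <g><h/></g>
 </a>)/child::node()/parent::node()
⇒
<a>
  ...
</a>

–Both children c and g have the same parent a, which is returned only once.

(<a><b/><c/><d/></a>)/child::node()/following-sibling::node()
⇒
(<c/>,
 <d/>)

–<c/> and <d/> are following siblings of <b/>, and <d/> is also a following sibling of <c/>; each node is returned only once.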

• XPath: Results in document order

XPath: context node sequence of length > 1

(<a><b/><c/></a>,
 <d><e/><f/></d>)/child::node()
⇒
(<b/>,<c/>,
 <e/>,<f/>)

–Note:

•The XPath document order semantics require <b/> to occur before <c/> and <e/> to occur before <f/>.

–The result (<e/>,<f/>,<b/>,<c/>) would have been OK as well: the relative order of nodes from different trees is implementation-dependent, but it must be consistent.

–In contrast, the result (<b/>,<e/>,<c/>,<f/>) is inconsistent with respect to the order of nodes from separate trees!

• XPath: Node test

–Once an XPath step arrives at a sequence of nodes, we may apply a node test to filter nodes based on kind and name.

Kind test                   Semantics
node()                      let any node pass
text()                      preserve text nodes only
attribute()                 preserve attribute nodes only
comment()                   preserve comment nodes only
processing-instruction()    preserve processing instructions only
processing-instruction(p)   preserve processing instructions of the form <?p ...?>
document-node()             preserve the (invisible) document root node

• XPath: Name test

–A node test may also be a name test, preserving only those element or attribute nodes with matching names.

–Note:

•In general we will have cs/ax::* ⊆ cs/ax::node().

Name test   Semantics
name        preserve element nodes with tag name name only (for the attribute axis: preserve attributes named name)
*           preserve element nodes with arbitrary tag names (for the attribute axis: preserve all attributes)

• XPath: Node test example

–The XQuery built-in function string-join has the signature string-join(string*, string) as string.

Collect and concatenate all text nodes of a tree

string-join(
  <a>A<c>B</c>
     <d>C</d>
  </a>/descendant-or-self::node()/child::text(), "")
⇒ "ABC"

Equivalent: compute the string value of node a

string(<a>A<c>B</c>
         <d>C</d>
       </a>)
⇒ "ABC"
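The same collection of text nodes can be sketched with Python's standard ElementTree module (an illustrative choice, not something the slides use). ElementTree has no separate text node objects: a text node is stored either in an element's .text (before its first child) or in a child's .tail (after that child's end tag). The helper names text_nodes and string_value are hypothetical.

```python
import xml.etree.ElementTree as ET

def text_nodes(elem):
    """Yield the text nodes below elem in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:            # text following the child, still inside elem
            yield child.tail

def string_value(elem):
    # counterpart of string-join(elem/descendant-or-self::node()/child::text(), "")
    return "".join(text_nodes(elem))

a = ET.fromstring("<a>A<c>B</c><d>C</d></a>")
print(string_value(a))  # ABC
```

Unlike XQuery direct element constructors, which strip boundary whitespace by default, ElementTree keeps whitespace-only text, so the sample is written on a single line. The built-in a.itertext() performs the same document-order traversal.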

• XPath: Ensuring order is not for free

–The strict XPath requirement to construct a result in document order may imply sorting effort, depending on the actual XPath implementation strategy used by the processor.

(<x>
   <x><y id="0"/></x>
   <y id="1"/>
 </x>
)/descendant-or-self::x/child::y
⇒
(<y id="0"/>,
 <y id="1"/>)

•In many implementations, the descendant-or-self::x step will yield the context node sequence (<x>...</x>,<x>...</x>), outer x first, then inner x, for the child::y step.

•Such implementations thus will typically extract <y id="1"/> before <y id="0"/> from the input document and then have to sort the result.
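The evaluation rule cs₀/step₁/step₂/… ≡ ((cs₀/step₁)/step₂)/… together with the document-order requirement can be sketched in Python. Everything here is hypothetical and simplified: nodes carry a preorder rank pre, step evaluates an axis per context node without sorting, and ddo ("distinct document order") removes duplicates by node identity and sorts by rank. The sample is the <x>/<y> document above.

```python
class Node:
    """Hypothetical element node; `ident` mimics the id attribute of the example."""
    def __init__(self, name, children=(), ident=None):
        self.name, self.ident = name, ident
        self.children, self.parent = list(children), None
        for ch in self.children:
            ch.parent = self

def number(root):
    """Store each node's preorder rank in n.pre: document order becomes integer order."""
    counter = 0
    def visit(n):
        nonlocal counter
        n.pre = counter
        counter += 1
        for ch in n.children:
            visit(ch)
    visit(root)

def child(n):
    return n.children

def parent(n):
    return [n.parent] if n.parent is not None else []

def descendant_or_self(n):
    out = [n]
    for ch in n.children:
        out.extend(descendant_or_self(ch))
    return out

def named(axis, name):
    """axis::name, i.e. an axis followed by a name test."""
    return lambda n: [m for m in axis(n) if m.name == name]

def step(context, axis):
    """Evaluate the axis for each context node in turn and concatenate (no sorting)."""
    out = []
    for n in context:
        out.extend(axis(n))
    return out

def ddo(nodes):
    """Distinct document order: drop duplicates by node identity, sort by rank."""
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: n.pre)

def path(context, *axes):
    """cs/step1/step2/... == ((cs/step1)/step2)/..., normalizing after every step."""
    for axis in axes:
        context = ddo(step(context, axis))
    return context

# (<x><x><y id="0"/></x><y id="1"/></x>)
y0 = Node("y", ident="0")
inner = Node("x", [y0])
y1 = Node("y", ident="1")
outer = Node("x", [inner, y1])
number(outer)
```

Here descendant-or-self::x yields (outer, inner), so the unsorted child::y step produces y1 before y0; ddo restores (y0, y1). Likewise path([outer], child, parent) returns outer only once, as in the duplicate-removal example.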

• XPath: Predicates

–The optional third component of a step formulates a list of predicates [p₁]…[pₙ] against the nodes selected by an axis.

–XPath predicate evaluation:

•Predicates have higher precedence than the XPath step operator /, i.e.:

cs/step[p₁][p₂] ≡ cs/((step[p₁])[p₂])

•The pᵢ are evaluated left-to-right for each node in turn. In pᵢ, the current context item is available as '.'.

–Context item: predicates may be applied to sequences of arbitrary items (not only nodes).

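• XPath: Predicates

–An XPath predicate pᵢ may be any XQuery expression evaluating to some value v. To finally evaluate the predicate, XQuery computes the effective Boolean value ebv(v). (Exception: if v is a single numeric value, the predicate is a positional test; see "Positional access" below.)

Value v                          ebv(v)
()                               false()
0, NaN                           false()
""                               false()
false()                          false()
x                                true()
(x₁, x₂, …, xₙ), x₁ a node        true()

•x: a single node, or a single boolean, string, or numeric value ∉ {0, "", NaN, false()}. A sequence of two or more items whose first item is not a node has no effective Boolean value (type error err:FORG0006).

•The built-in function boolean(item*) as boolean also computes the effective Boolean value.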
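The table can be turned into a small Python function. This is a sketch under stated assumptions: XQuery sequences are modelled as Python lists, nodes as instances of a hypothetical Node class, and only booleans, strings, and numbers are handled among atomic values; the exception class name merely echoes the XQuery error code.

```python
import math

class Node:
    """Hypothetical stand-in for an XML node; only its identity matters here."""

class FORG0006(TypeError):
    """Raised where XQuery reports err:FORG0006 (no effective Boolean value)."""

def ebv(seq):
    """Effective Boolean value of a sequence given as a Python list."""
    if not seq:
        return False                      # ()
    first = seq[0]
    if isinstance(first, Node):
        return True                       # sequence starting with a node
    if len(seq) > 1:
        raise FORG0006("several atomic values")
    if isinstance(first, bool):           # test bool before int: bool subclasses int
        return first
    if isinstance(first, str):
        return first != ""
    if isinstance(first, (int, float)):
        return not (first == 0 or math.isnan(first))
    raise FORG0006(type(first).__name__)
```

Note that ebv(["0"]) is true: only the empty string is false, regardless of what the string looks like.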

• XPath: Predicate example

–Note: Existential semantics of path predicates: a path expression used as a predicate is true if it selects at least one node.

Select all elements with an id attribute

(<a id="0">
   <c id="1"/>
   <c><b/></c>
   <d id="2">e</d>
 </a>
)/descendant-or-self::*[./attribute::id]
⇒
(<a id="0">
   ...
 </a>,
 <c id="1"/>,
 <d id="2">e</d>)

Select all elements with a "b" grandchild element

(<a id="0">
   <c id="1"/>
   <c><b/></c>
   <d id="2">e</d>
 </a>
)/descendant-or-self::*[./child::*/child::b]
⇒
(<a id="0">
   ...
 </a>)

• XPath: Predicates and atomization

–In XQuery, if any item x (atomic value or node) is used in a context where a value is required, atomization is applied to convert x into an atomic value.

•Nodes in value contexts commonly appear in XPath predicates. Consider:

Value comparison in a predicate

(<a>
   <b>42</b>
   <c><d>42</d></c>
   <e>43</e>
 </a>)/descendant-or-self::*[. eq 42]
⇒
(<b>42</b>,
 <c><d>42</d></c>,
 <d>42</d>)

• Atomization

–Atomization turns a sequence (x₁,…,xₙ) of items into a sequence of atomic values (v₁,…,vₙ):

1. If xᵢ is an atomic value, vᵢ ≡ xᵢ.

2. If xᵢ is a node, vᵢ is the typed value of xᵢ.

–Remember: the typed value is equal to the string value if xᵢ has not been validated. In this case, vᵢ has type untypedAtomic.

–The XQuery built-in function

data(item*) as anyAtomicType*

may be used to perform atomization explicitly (rarely necessary).
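A minimal Python sketch of atomization, assuming ElementTree elements stand in for non-validated nodes and a hypothetical UntypedAtomic subclass of str marks the untypedAtomic type. Typed values of validated nodes are not modelled; atomize is an illustrative name, not a library function.

```python
import xml.etree.ElementTree as ET

class UntypedAtomic(str):
    """Stand-in for xs:untypedAtomic (hypothetical): a string value without a type."""

def atomize(items):
    """Sketch of fn:data(): atomic values pass through, nodes become their string value."""
    out = []
    for x in items:
        if isinstance(x, ET.Element):
            # not validated, so the typed value is the string value: concatenated text
            out.append(UntypedAtomic("".join(x.itertext())))
        else:
            out.append(x)
    return out

c = ET.fromstring("<c><d>42</d></c>")
values = atomize([c, 2, "x"])
```

The element <c><d>42</d></c> thus atomizes to the untyped string "42", which a later operation may still cast to a number.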

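• XPath: Predicates and atomization

–Note: the value comparison operator eq is witness to the value context in which '.' is used in this query.

–For the context item <c><d>42</d></c> (a non-validated node), data(.) returns "42" of type untypedAtomic.

–Caveat: in the final XQuery 1.0 Recommendation, eq casts an untypedAtomic operand to string, so comparing it with the integer 42 raises a type error (err:XPTY0004). The explicit cast below, or the general comparison [. = 42], which casts untypedAtomic to double when the other operand is numeric, yields the result shown above.

Atomization (and casting) made explicit

(<a>
   <b>42</b>
   <c><d>42</d></c>
   <e>43</e>
 </a>)/descendant-or-self::*[data(.) cast as double eq
                             42 cast as double]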

• Atomization and subtree traversals

–Since atomization of nodes is pervasive in XQuery expression evaluation, e.g., during evaluation of

•arithmetic and comparison expressions,

•function call and return,

•explicit sorting (order by),

–efficient subtree traversals are of prime importance for any implementation of the language:

Applying data() to a node and its subtree

data(<a>
       foo<c>
         <d>b</d><e>ar</e>
       </c>
     </a>)

[Figure: the subtree below a (elements c, d, e and the text nodes "foo", "b", "ar"), which data() must traverse to concatenate the text nodes in document order]

• XPath: Positional access

–Inside a predicate [p] the current context item is '.'.

•An expression may also access the position of '.' in the context sequence via position(). The first item is located at position 1.

•Furthermore, the position of the last context item is available via last().

•A predicate of the form [position() eq i], with i being any XQuery expression of numeric type, may be abbreviated by [i].

Positional access

(x₁,x₂,…,xₙ)[position() eq i] ⇒ xᵢ

(x₁,x₂,…,xₙ)[position() eq last()] ⇒ xₙ
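The positional rules can be sketched as a Python filter in which a hypothetical predicate function receives the context item together with position() and last(). Following the abbreviation [i], a numeric predicate result is compared with the position; any other result is simply passed to bool(), a simplification of the effective Boolean value.

```python
def filter_seq(seq, pred):
    """Sketch of seq[pred]; pred(item, position, last) plays the role of the predicate."""
    last = len(seq)
    result = []
    for position, item in enumerate(seq, start=1):   # the first item has position 1
        v = pred(item, position, last)
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            keep = (v == position)                   # [i] means [position() eq i]
        else:
            keep = bool(v)                           # simplified effective Boolean value
        if keep:
            result.append(item)
    return result

xs = ["x1", "x2", "x3"]
```

For example, filter_seq(xs, lambda it, pos, last: last) mirrors xs[last()] and keeps only "x3".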

• XPath: The context item '.'

–As a useful generalization, XPath makes the current context item '.' available in each step (not only in predicates).

–In the expression cs/e, expression e will be evaluated with '.' set to each item in the context sequence cs (in order). The resulting sequence is returned.

•Remember: if e returns nodes (e has type node*), the resulting sequence is sorted in document order with duplicates removed.

•In XPath 2.0, cs itself must be a sequence of nodes; otherwise a type error is raised (err:XPTY0019).

Accessing '.'

(<a>1</a>,<b>2</b>,<c>3</c>)/(. + 42) ⇒ (43.0,44.0,45.0)

(<a>1</a>,<b>2</b>,<c>3</c>)/name(.) ⇒ ("a","b","c")

(<a>1</a>,<b>2</b>,<c>3</c>)/position() ⇒ (1,2,3)

(<a><b/></a>)/(./child::b, .) ⇒ (<a><b/></a>,<b/>)

• Combining node sequences

–Sequences of nodes (e.g., the results of XPath location steps) may be combined via

•| and union (synonyms), intersect, except

–These operators remove duplicate nodes based on identity and return their result in document order.

–Note: they are introduced in the XPath context because a number of useful navigation idioms are based on these operators:

Selecting all x children and attributes of the context node

cs/(./child::x | ./attribute::x)

Selecting all siblings of the context node

cs/(./preceding-sibling::node() | ./following-sibling::node())

or

cs/(./parent::node()/child::node() except .)
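The identity-based semantics of these operators can be sketched in Python. Assumption: each hypothetical Node carries its rank in document order; two distinct nodes with the same name remain distinct, because duplicates are detected by identity (Python's id()), not by value. The function name except_ is chosen only because except is a Python keyword.

```python
class Node:
    """Hypothetical node with a name and its rank in document order."""
    def __init__(self, name, rank):
        self.name, self.rank = name, rank

def _in_doc_order(nodes):
    unique = {id(n): n for n in nodes}         # duplicate = same node identity
    return sorted(unique.values(), key=lambda n: n.rank)

def union(xs, ys):                             # also written |
    return _in_doc_order(list(xs) + list(ys))

def intersect(xs, ys):
    keep = {id(n) for n in ys}
    return _in_doc_order([n for n in xs if id(n) in keep])

def except_(xs, ys):
    drop = {id(n) for n in ys}
    return _in_doc_order([n for n in xs if id(n) not in drop])

# children of some parent, in document order; c is the context node
b, c, d = Node("b", 1), Node("c", 2), Node("d", 3)
siblings_of_c = except_([b, c, d], [c])        # ./parent::node()/child::node() except .
```

The last line corresponds to the second sibling idiom above: all children of c's parent except c itself, returned in document order.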

• XPath: Abbreviations

–Since XPath expressions are pervasive in XQuery, query authors commonly use the succinct abbreviated XPath syntax to specify location steps.

Abbreviation   Expansion
nt             child::nt
@              attribute::
..             parent::node()
//             /descendant-or-self::node()/
/¹             root(.)/
step¹          ./step

¹ At the beginning of a path expression.

• XPath abbreviation examples

Abbreviation   Expansion
a/b/c          ./child::a/child::b/child::c
a//@id         ./child::a/descendant-or-self::node()/attribute::id
//a            root(.)/descendant-or-self::node()/child::a
a/text()       ./child::a/child::text()
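The expansion rules in the abbreviation tables can be applied mechanically. The following Python function is a hypothetical sketch that handles only the abbreviations listed there: it splits naively on '/', so it does not support predicates, parenthesized expressions, or '.' steps.

```python
def expand(path):
    """Expand abbreviated XPath location paths into unabbreviated form (subset only)."""
    if path.startswith("//"):
        prefix, rest = "root(.)/descendant-or-self::node()/", path[2:]
    elif path.startswith("/"):
        prefix, rest = "root(.)/", path[1:]
    else:
        prefix, rest = "./", path                 # relative path: step == ./step
    if not rest:
        return prefix.rstrip("/")                 # the path "/" alone
    steps = []
    for s in rest.split("/"):
        if s == "":                               # empty step: came from an inner //
            steps.append("descendant-or-self::node()")
        elif s == "..":
            steps.append("parent::node()")
        elif s.startswith("@"):
            steps.append("attribute::" + s[1:])
        elif "::" in s:                           # already unabbreviated
            steps.append(s)
        else:                                     # default axis: child
            steps.append("child::" + s)
    return prefix + "/".join(steps)

for p in ["a/b/c", "a//@id", "//a", "a/text()"]:
    print(p, "->", expand(p))
```

The loop reproduces the four rows of the examples table.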

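XPath in XML Schema: the selector and field of identity constraints (key, keyref, unique) are expressions in a restricted subset of XPath, as in this PurchaseReport schema outline:

• PurchaseReport
  –regions [RegionsType]
    •{keyref pNumKey selector: zip/part field: @number}
  –parts [PartsType]
  –@period
  –@periodEnding
  –{unique selector: regions/zip field: @code}
  –{key pNumKey selector: parts/part field: @number}

• RegionsType
  –zip*
    •part*
      –@number
      –@quantity
    •@code

• PartsType
  –part*
    •@number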

4.3 XPointer

• XML Pointer Language

–Allows XPath expressions to be used in URIs

–Used to identify ranges within XML documents

–URI + XPath == XPointer

–[XPointer] 4 specifications:

•http://www.w3.org/TR/xptr-framework
XPointer Framework (Recommendation 25 March 2003)

•http://www.w3.org/TR/xptr-xmlns
xmlns() Scheme (Recommendation 25 March 2003)

•http://www.w3.org/TR/xptr-element
element() Scheme (Recommendation 25 March 2003)

•http://www.w3.org/TR/xptr-xpointer
xpointer() Scheme (Working Draft 19 December 2002)

• Framework

–Syntax: uri#pointer₁ … pointerₙ

pointerᵢ ::= idValue | linkSchema '(' schemaData ')'

•idValue (a "shorthand pointer") references an element by its ID value; it can only be used on its own

•"xpointer", "xmlns" and "element" are the values defined for linkSchema (the scheme name)

–Provides extensibility: further schemes can be defined

–Multiple pointer parts are resolved from left to right

•BUT: resolution stops at the first part that succeeds

–Context of XPointer expressions: the root node of the XML document referenced by the URI
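Left-to-right evaluation with "break on success" can be sketched in Python. The scheme handlers are hypothetical callables (a real processor would implement element(), xmlns() and xpointer() against the document); an unknown scheme, or a part that finds nothing, simply fails and the next part is tried. The splitter honours balanced parentheses and the circumflex escape (^) but, as a simplification, does not unescape the scheme data.

```python
def split_parts(pointer):
    """Split 'name(data) name(data) ...' into (name, data) pairs."""
    parts, i, n = [], 0, len(pointer)
    while i < n:
        if pointer[i].isspace():
            i += 1
            continue
        start = pointer.find("(", i)
        if start == -1 or not pointer[i:start].strip():
            raise ValueError(f"expected scheme name at offset {i}")
        name, depth, j = pointer[i:start].strip(), 1, start + 1
        while j < n and depth:
            if pointer[j] == "^":          # ^( ^) ^^ escape the next character
                j += 2
                continue
            if pointer[j] == "(":
                depth += 1
            elif pointer[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses")
        parts.append((name, pointer[start + 1:j - 1]))
        i = j
    return parts

def resolve(pointer, schemes):
    """Try the pointer parts left to right and return the first non-empty result."""
    for name, data in split_parts(pointer):
        handler = schemes.get(name)
        if handler is None:
            continue                      # unknown scheme: this part fails
        result = handler(data)
        if result:
            return name, result           # break on success
    return None

# Hypothetical handlers: xmlns() binds a prefix but never locates anything;
# element() looks the ID up in a toy table.
ids = {"intro": "<section id='intro'>"}
schemes = {
    "xmlns": lambda data: [],
    "element": lambda data: [ids[data]] if data in ids else [],
}
```

With these handlers, "xmlns(...) element(missing) element(intro)" fails on the first two parts and succeeds on the third.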

• xpointer() scheme: ranges and points

–Every location expression in XPath returns a node set

–XPointer can identify parts of documents which cannot be represented as an XPath node set

–Additional concept: locations and location sets. A location is a point, a range, or a standard XPath node

•Point location: a standard node plus an index (points either to a child node or to a character)

•Range location: 2 points, separated by the keyword "to"

–2 new node tests: range() and point()

–New functions working on ranges and points: range-to, string-range, range, range-inside, start-point, end-point, here and origin

• xmlns() scheme

–Namespace prefixes can only be bound via the xmlns() scheme

–xmlns() scheme expressions

•always fail (they never identify a subresource)

•assign a namespace to a prefix

–Subsequent scheme expressions can use the prefix

–Syntax: xmlns(NamespacePrefix=namespaceURI)

Example

xmlns(xlink=http://www.w3.org/1999/xlink) xpointer(//*[@xlink:role="rollen.xml#author"])

• element() scheme

–Identifies elements in documents by ID values and/or element numbers (positions of child elements)

–Examples:

Element node with the ID value xmldat06032005

element(xmldat06032005)

2nd child element of that node

element(xmldat06032005/2)

2nd child element of the 2nd child element of the root element

element(/1/2/2)
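A sketch of element() resolution on top of Python's ElementTree, under two assumptions: IDs are looked up in an attribute literally named id (a real processor uses the IDs declared by a DTD or schema), and child sequences follow the element() scheme syntax, where a leading /1 denotes the document element. Only element children are counted; resolve_element is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def resolve_element(root, data):
    """Resolve element() scheme data such as 'x', 'x/2' or '/1/2/2'; None on failure."""
    if data.startswith("/"):
        first, *steps = data[1:].split("/")
        if first != "1":                 # the document node has a single element child
            return None
        node = root
    else:
        name, *steps = data.split("/")
        node = next((el for el in root.iter() if el.get("id") == name), None)
        if node is None:
            return None
    for s in steps:
        kids = list(node)                # element children in document order
        i = int(s)
        if not 1 <= i <= len(kids):
            return None
        node = kids[i - 1]
    return node

doc = ET.fromstring("<r><a/><b id='x'><c/><d/></b></r>")
```

For this document, element(x/2) and element(/1/2/2) both select <d/>.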

4.4 XLink

• XML Linking Language

–To define links, which can…

•be as easy as HTML links

•have multiple end points

•be stored independently of the referenced documents

–Allows metadata to be added to links

–Uses XPointer and XPath

–[XLink] W3C Recommendation 27 June 2001 http://www.w3.org/TR/xlink

• XLink namespace

–The XLink namespace has to be declared, so that applications and tools can recognize XLink markup:

XLink namespace

<myElement

xmlns:xlink="http://www.w3.org/1999/xlink">

...

</myElement>

• Simple links

–… link exactly 2 resources with a reference from the local to the remote resource

–… are indicated by the attribute xlink:type="simple"

–… contain special attributes with information about:

•the remote resource: xlink:href

•properties of the link: xlink:role and xlink:arcrole (the value must be a URI)

•the meaning of the link: xlink:title

•the type of presentation of the remote resource: xlink:show (possible values: new, replace, embed, other, none)

•when the link is to be followed: xlink:actuate (possible values: onLoad, onRequest, other, none)

[Ray01]

Simple links – examples

<cite xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:type="simple"
      xlink:href="http://www.books.org/huckfinn.xml"
      xlink:show="new"
      xlink:actuate="onRequest">Huckleberry Finn</cite>

<graphic xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="simple"
         xlink:href="figs/diagram39.png"
         xlink:show="embed"
         xlink:actuate="onLoad" />

<dataref xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="simple"
         xlink:href="http://dataserv.buggs.com/db.xml#entry92"
         xlink:actuate="onLoad"
         xlink:show="embed" />
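To illustrate how an application can recognize XLink markup by its namespace, here is a Python sketch using ElementTree (an illustrative choice; the wrapper element doc and the helper simple_links are hypothetical). ElementTree exposes a namespaced attribute such as xlink:href under the key {http://www.w3.org/1999/xlink}href, independent of the prefix used in the document.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"   # ElementTree's {namespace}local notation

def simple_links(root):
    """Return (tag, href, show, actuate) for every simple link at or below root."""
    links = []
    for el in root.iter():
        if el.get(XLINK + "type") == "simple":
            links.append((el.tag, el.get(XLINK + "href"),
                          el.get(XLINK + "show"), el.get(XLINK + "actuate")))
    return links

doc = ET.fromstring("""<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <cite xlink:type="simple" xlink:href="http://www.books.org/huckfinn.xml"
        xlink:show="new" xlink:actuate="onRequest">Huckleberry Finn</cite>
  <graphic xlink:type="simple" xlink:href="figs/diagram39.png"
           xlink:show="embed" xlink:actuate="onLoad"/>
</doc>""")
```

Because matching is done on the namespace URI, the same code would find the links if the document bound the XLink namespace to a different prefix.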

• Extended XLinks

–Components:

•an element with attribute xlink:type="extended" containing all other link components

•elements representing local resources (xlink:type="resource")

•elements representing remote resources (xlink:type="locator")

•elements representing the actual links (arcs) between resources (xlink:type="arc")

•elements containing descriptions of the links (xlink:type="title")

•some further attributes

XLink attributes per value of xlink:type [XLink] (R: required, O: optional)

         simple  extended  locator  arc  resource  title
type     R       R         R        R    R         R
href     O                 R
role     O       O         O             O
arcrole  O                          O
title    O       O         O        O    O
show     O                          O
actuate  O                          O
label                      O             O
from                                O
to                                  O

• Extended links – example

<!ELEMENT courseload ((tooltip|person|course|gpa|go)*)>
<!ATTLIST courseload
  xmlns:xlink   CDATA      #FIXED "http://www.w3.org/1999/xlink"
  xlink:type    (extended) #FIXED "extended"
  xlink:role    CDATA      #IMPLIED
  xlink:title   CDATA      #IMPLIED>
<!ELEMENT tooltip ANY>
<!ATTLIST tooltip
  xlink:type    (title)    #FIXED "title"
  xml:lang      CDATA      #IMPLIED>
<!ELEMENT person EMPTY>
<!ATTLIST person
  xlink:type    (locator)  #FIXED "locator"
  xlink:href    CDATA      #REQUIRED
  xlink:role    CDATA      #IMPLIED
  xlink:title   CDATA      #IMPLIED
  xlink:label   NMTOKEN    #IMPLIED>
<!ELEMENT course EMPTY>
<!ATTLIST course
  xlink:type    (locator)  #FIXED "locator"
  xlink:href    CDATA      #REQUIRED
  xlink:role    CDATA      #FIXED "http://www.example.com/linkprops/course"
  xlink:title   CDATA      #IMPLIED
  xlink:label   NMTOKEN    #IMPLIED>
<!ELEMENT gpa ANY>
<!ATTLIST gpa
  xlink:type    (resource) #FIXED "resource"
  xlink:role    CDATA      #FIXED "http://www.example.com/linkprops/gpa"
  xlink:title   CDATA      #IMPLIED
  xlink:label   NMTOKEN    #IMPLIED>
<!ELEMENT go EMPTY>
<!ATTLIST go
  xlink:type    (arc)      #FIXED "arc"
  xlink:arcrole CDATA      #IMPLIED
  xlink:title   CDATA      #IMPLIED
  xlink:show    (new|replace|embed|other|none) #IMPLIED
  xlink:actuate (onLoad|onRequest|other|none)  #IMPLIED
  xlink:from    NMTOKEN    #IMPLIED
  xlink:to      NMTOKEN    #IMPLIED>

<courseload>
  <tooltip>Course Load for Pat Jones</tooltip>
  <person xlink:href="students/patjones62.xml" xlink:label="student62"
          xlink:role="http://www.example.com/linkprops/student"
          xlink:title="Pat Jones" />
  <person xlink:href="profs/jaysmith7.xml" xlink:label="prof7"
          xlink:role="http://www.example.com/linkprops/professor"
          xlink:title="Dr. Jay Smith" />
  <course xlink:href="courses/cs101.xml" xlink:label="CS-101"
          xlink:title="Computer Science 101" />
  <go xlink:from="student62" xlink:to="PatJonesGPA"
      xlink:show="new" xlink:actuate="onRequest"
      xlink:title="Pat Jones's GPA" />
  <go xlink:from="CS-101"
      xlink:arcrole="http://www.example.com/linkprops/auditor"
      xlink:to="student62" xlink:show="replace"
      xlink:actuate="onRequest" xlink:title="Pat Jones, auditing the course" />
  <go xlink:from="student62"
      xlink:arcrole="http://www.example.com/linkprops/advisor"
      xlink:to="prof7" xlink:show="replace"
      xlink:actuate="onRequest" xlink:title="Dr. Jay Smith, advisor" />
</courseload>
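How the arcs of an extended link connect labelled resources can be sketched in Python with ElementTree. Assumptions, all labelled as such: traversals is a hypothetical helper; an arc connects every participant labelled from with every participant labelled to; arcs lacking from or to are skipped (a simplification). Because ElementTree does not apply the DTD's #FIXED defaults, the abridged courseload sample spells out xlink:type explicitly, and the arc to PatJonesGPA is left out, since no element with that label appears in the instance above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XL = "{http://www.w3.org/1999/xlink}"

def traversals(ext):
    """List (start, end, title) for every traversal defined by the arcs of ext."""
    by_label = defaultdict(list)
    arcs = []
    for el in ext:
        kind = el.get(XL + "type")
        if kind == "locator" and el.get(XL + "label"):
            by_label[el.get(XL + "label")].append(el.get(XL + "href"))
        elif kind == "resource" and el.get(XL + "label"):
            by_label[el.get(XL + "label")].append(el.text or "")
        elif kind == "arc":
            arcs.append(el)
    result = []
    for arc in arcs:
        src, dst = arc.get(XL + "from"), arc.get(XL + "to")
        if src is None or dst is None:
            continue                      # simplification: only fully specified arcs
        for start in by_label.get(src, []):
            for end in by_label.get(dst, []):
                result.append((start, end, arc.get(XL + "title")))
    return result

# Abridged courseload instance with explicit xlink:type attributes.
courseload = ET.fromstring("""<courseload xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="extended">
  <person xlink:type="locator" xlink:href="students/patjones62.xml" xlink:label="student62"/>
  <person xlink:type="locator" xlink:href="profs/jaysmith7.xml" xlink:label="prof7"/>
  <course xlink:type="locator" xlink:href="courses/cs101.xml" xlink:label="CS-101"/>
  <go xlink:type="arc" xlink:from="CS-101" xlink:to="student62"
      xlink:title="Pat Jones, auditing the course"/>
  <go xlink:type="arc" xlink:from="student62" xlink:to="prof7"
      xlink:title="Dr. Jay Smith, advisor"/>
</courseload>""")
```

Each arc here connects exactly one start and one end resource, but if several locators shared a label, the same arc would yield one traversal per combination.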

• Simple vs. extended links

–Simple links cannot

•link an arbitrary number of local and remote resources

•specify a link from a remote resource to a local resource

•set a title to a fixed link

•set a role or title for the local resource

•set a role or title for the link itself

4.5 Overview

1. Introduction

2. XML Basics

3. Schema definition

4. XML query languages I

5. Mapping relational data to XML

6. SQL/XML

7. XML processing

8. XML query languages II

9. XML storage I

10. XML storage - index

11. XML storage - native

12. Updates / Transactions

13. Systems

14. XML Benchmarks

4.6 References

• [W3C] http://www.w3.org/

–XPath, XPointer, XLink, XQuery, XSLT

• [Scholl07] M. Scholl, "XML and Databases", Lecture, Uni Konstanz, WS07/08

• [EE04] R. Eckstein and S. Eckstein, "XML und Datenmodellierung", dpunkt.verlag, 2004, ISBN 3898642224

• [HM04] E. R. Harold and W. S. Means, "XML in a Nutshell", O'Reilly, 2004, ISBN 0596007647

• Now, or ...

• Room: IZ 232

• Office hours: Tuesday, 12:30 – 13:30, or by appointment

• Email: eckstein@ifis.cs.tu-bs.de


Questions, Ideas, Comments