• Keine Ergebnisse gefunden

10. XML Storage 1

N/A
N/A
Protected

Academic year: 2021

Aktie "10. XML Storage 1"

Copied!
14
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Silke Eckstein Andreas Kupfer

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

XML Databases

10 . XML Storage 1 – O verview

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 2

10. XML Storage 1

Applications require different types of XML documents

Structure vs. content

Regular vs. irregular

• Thus, XML documents are

Data-centric

Document-centric

or somewhere in-between

• Questions

Storage of XML documents

Efficient processing of queries on the stored documents or data

• There are several methods for storage

1

st

goal: Learn and understand methods

2

nd

goal: Classify methods

•Principles

•Advantages and disadvantages

•Usage

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 3

10.1 Motivation

Characterisation of XML documents:

Data-centric documents

• Structured, regular

• E.g. product catalog, order, invoice – Document-centric documents

• Unstructured, irregular

• E.g. scientific article, book, email, web page – Semi-structured documents

• Data-centric and document-centric parts

• E.g. publications, Amazon, MS Press (example chapters)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 4

10.1 Motivation

Requirements for the physical layer:

– Order preserving and lossless storage of XML documents

– Efficient access to XML documents or parts thereof

• Quick response time for

Queries

Update operations

• Indexing

• Transaction processing

• Support of XPath and XQuery

• Support of SAX and DOM for applications

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 5

10.1 Motivation

Storage approaches for XML documentsText-based

• Storage as character data – Model-based

• On top of the relational DBMS

• Inside the relational DBMS – Schema-based

• Mapping to (object-)relational databases

Deriving the database schema from the XML structure

Using user defined mapping procedures

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 6

10.1 Motivation

(2)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 7

10.1 Motivation

Text-based

storage Model-based

storage Schema-based

storage

– Examples –

10.1 Motivation

10.2 Text-based storage 10.2.1 Index structures 10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 8

10. XML Storage 1

• The whole XML document text is stored as character data – File in the file system

– CLOB (Character-Large-OBject) in the DBS

• Operations on documents as a whole are very efficient – Reading and writing the whole document

But the content is monolithic and opaque with respect to the relational query engine (query can't inspect a fragment)

• Getting granular access requires additional support – Full text index

– Path index

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 9

10.2 Text-based storage

• Index structures for XML documents allow efficient access for specific queries

– Different types of indexes are optimized for different types of queries

• Generate redundancy

– Index has to be up-to-date by propagating data changes

• Index structures can be storage structures as well – They define the storage method

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 10

10.2.1 Index structures

Types of index structures – Value index

Indexes atomar values of an XML document, like element content or attribute values

Index format for structured parts of XML documents

Already known from databases (B-trees, hash index, …) – Full text index

Indexes single words from the full text

Index format for unstructured parts of XML documents

Already known from Information Retrieval (inverted lists, tries, suffix trees, …)

– Path index

Indexes subtrees/paths in an XML document

Index format for semistructured parts of XML documents

Already known from object-databases (access support relations, …)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 11

10.2.1 Index structures

B-tree as value index for an XML fragment document

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 12 [Tür08]

10. 2.1 Index structures

(3)

Full text index

– Not limited to exact matches

• Keyword-based search and boolean retrieval

• Pattern search (with regular expressions) – Use of

• Statistical, word-based methods

–Stop word removal –Elimination of uncommon items

• Linguistic methods

–Normalization of words (e.g. capitalisation, hyphenation,) –Word decomposition by rules (engl.) or dictionaries (german) –Stemming

• Knowledge-based methods

–Use of ontologies and thesauri to search for synonyms, hypernyms and hyponyms

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 13

10. 2.1 Index structures

Inverted list as full text index for XML

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 14 [Tür08]

10. 2.1 Index structures

word occurrence word position in the text

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 15 [Tür08]

10. 2.1 Index structures

word occurrence

word occurrence

Path index

– Structure information must be identifiable and reconstructable

• Assigning the markup to the content as well as

• Representing the hierarchical nesting and order of elements/attributes

– Especially suited for keyword search with regard to structure or path expressions

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 16

10. 2.1 Index structures

FOR $b IN //book

WHERE CONTAINS($b/author,"Benjamin") RETURN $b

Types of path indexes – Nested path index

• Access to root node from every node

– Multi-index

• Accessing parent nodes – Join-index

• Access parent and child nodes – Access Support Relations

(ASR)

• Generalization of indexes above, by listing all paths in a table

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 17 [Tür08]

10. 2.1 Index structures

Conclusion

– Efficient query processing on XML documents requires different types of index structures – Value index

• For efficient access to structured parts

• Keyword search, value search – Full text index

• For efficient access to unstructured parts – Path index

• Using the document structure

• Navigating queries

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 18

10. 2.1 Index structures

(4)

Summary text-based storage

Schema definition:

•not required –

Document reconstruction:

•documents stay in their original format –

Queries:

•Information retrieval queries

•Processing the markup of the queries

•XML queries possible –

Special features:

•Full text functions –

Efficiency:

•Character string must be parsed on every access with XML processors expensive

•No concurrency on read or write no parallel processing –

Usage:

•For document-centric XML applications

•Suitable to only a limited extent also for semi-structured applications

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 19

10.2 Text-based storage

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS 10.4 Schema-based storage

10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 20

10. XML Storage 1

Idea: generic storage of the graph structure – XML elements, XML attributes, … are nodes of a graph – Nesting of elements defines edges

– Nodes get an (internal) ID based on graph traversal

• Using relations or object classes to store elements and attributes

• Document structure can be restored completely

• Extension for data type adapted storage is possible

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 21

10.3.1 Model-based storage

ID Element name Value Reference to preceeding Rank

ID Attribute name Value Reference to element Elements

Attributes

The EDGE approach [FK99]

– Variant BINARY: horizontal partition of EDGE based on label

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 22 [Tür08]

10.3.1 Model-based storage

XML documents

XML queries

– XML queries (XPath, XQuery) are mapped to SQL queries (taking storage structures into account) – Result of XML query is generated from result of

database query

• "Labeling" of the result tuples

• Result is in XML format

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 23 [Tür08]

10.3.1 Model-based storage

• Example: list bargain buy with prices

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 24

10.3.1 Model-based storage

SELECT a.content, b.content FROM Edge a, Edge b WHERE (a.label = 'price') AND (a.content < 10.00) AND (b.label = 'description')

AND (b.parent = a.parent) AND (a.key = b.key)

[Tür08]

(5)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 25 [Tür08]

10.3.1 Model-based storage

DOM-based storage – Information from the

Document Object Model are stored in the database – Storage alternatives

• (Object-)relational databases

• Object-oriented databases

• Developing own data structure

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 26 [Tür08]

10.3.1 Model-based storage

Node type:

ELEMENT Node type:

ATTRIBUTE

Node type:

TEXT

DOM-based storage – example

XML Queries

– XML queries (DOM method invocations) are mapped to SQL queries (taking storage structures into account)

– Result of method invocation is generated from result of database query

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 27 [Tür08]

10.3.1 Model-based storage

Summary model-based storage on top of the DBMS – Schema definition:

•not required for storage

– Document reconstruction:

•Possible, but expensive

– Queries:

•XML queries possible

•Adapted database queries

– Special features:

•Querying many elements/attributes is expensive

– Efficiency:

•Navigation from the given context is efficient

•Restoring the document and evaluating path expressions is inefficient

– Usage:

•For data- and document-centric as well as for semi-structured XML applications

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 28

10.3.1 Model-based storage

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS 10.4 Schema-based storage

10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 29

10. XML Storage 1

SQL/XML

– Datatype XML with belonging functions – Mapping between

SQL and XML

– Embedding XQuery in SQL

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 30 [Tür08]

10.3.2 XML data type

SQL database XML datatype SQL XQuery

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

</City>

<City>

<City>

Storing XML documents inside

the database as values of type XML

Generating XML documents using SQL/XML functions

Mapping between SQl and XML

(6)

• A value of the data type XML can contain – whole XML document

– XML element – a set of XML elements

• All XML publishing operators from chapter 9.4 create values of the data type XML, not a string

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 31

10.3.2 XML data type

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 32 [Tür08]

10.3.2 XML data type

XML(SEQUENCE

)

XML(CONTENT(ANY))

XML(CONTENT(XMLSCHEMA)) XML(CONTENT(UNTYPED))

XML(DOCUMENT(ANY))

XML(DOCUMENT(UNTYPED)) XML(DOCUMENT(XMLSCHEMA))

NULL or document node Untyped elements &

attributes,

elements not NULL 1 element child

Validated against schema

1 element child Validated against schema

1 element child

Specification of XML type

• Modifiers are optional

• Primary type modifier

DOCUMENT (XML document)

CONTENT(XML element)

SEQUENCE (sequence of XML elements)

• Secondary type modifier

UNTYPED

XMLSCHEMA(typed)

ANY(may be typed)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 33 [Tür08]

10.3.2 XML data type

XML [({DOCUMENT|CONTENT|SEQUENCE}

[({ANY|UNTYPED|XMLSCHEMA schema name})])]

• Create a table that is an XML data type in itself – CREATE TABLE XMLDOCUMENT OF XMLTYPE;

• Create a table containing an XMLType data type column

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 34 [Pow07]

10.3.2 XML data type

CREATE TABLE XML (

ID NUMBER NOT NULL,

XML XMLTYPE,

CONSTRAINT XPK PRIMARY KEY (ID) );

Example: Definition of an XML type column

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 35

10.3.2 XML data type

ID Name

123 <Groups>Annabelle</Groups>

234 <Groups>Magdalena, Marius</Groups>

345 <?xml version 1.0?>

<Groups>

<Person>Patrick</Person>

<Person>Robert</Person>

</Groups>

654 <Groups>Rebecca</Groups>

<Groups>Torben</Groups>

CREATE TABLE Groups (

ID INTEGER,

Name XML );

Characteristics – Allowed values:

• XML documents (including prolog)

• XML content according to XML 1.0 (includes pure text comments, PI?)

• NULL

No comparison possible (compare CLOB in SQL)

• User can define an order, if comparison is necessary – No corresponding type in programming languages for

embedding in SQL available

– Standard defines operators to convert to other SQL data types

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 36 [Kud07]

10.3.2 XML data type

(7)

Parsing & SerializationXMLParse:

• Parses a string value using an XML parser

• Produces value whose specific type is XML(DOCUMENT(ANY)), or …CONTENT…, orXMLSerialize

• Transforms an XML value into a string value (CHAR, VARCHAR, CLOB, or BLOB)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 37 [Mel05]

10.3.2 XML data type

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

</City>

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

</City>

Validation of XML

– Is like integrity constraints in DBs – Requires an XML Schema

– XML Schemas may be registered with the SQL-server

• Implementation-defined mechanism

• Known by SQL name & by target namespace URI

Schema does need a unique name

• Used by XMLValidate(), IS VALID, and to restrict values of

XML(DOCUMENT-or-CONTENT(XMLSCHEMA))

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 38 [Mel05]

10.3.2 XML data type

Schema registration

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 39

10.3.2 XML data type

Register XMLSCHEMA

'http://www.Alfred-Moos.de/GrussSchema.xsd' FROM 'file://c:/XML_Schemata/GrussSchema.xsd' AS GrussSchema

COMPLETE

;

CREATE TABLE Dokument_XML (Dokument_XML_Nr CHAR (4)

NOT NULL PRIMARY KEY, Dokument XML,

CONSTRAINT validieren

CHECK (Dokument IS VALIDATED ACCORDING TO XMLSCHEMA ID GrussSchema )

)

;

Schema definitionSyntax

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 40 [Tür08]

10.3.2 XML data type

XML(CONTENT(XMLSCHEMA) <schema> [<elements>]))

<schema> := URI <namespace> [LOCATION <loc>]

| NO NAMESPACE [LOCATION <loc>]

| ID <registered schema name>

<element> := [NAMESPACE <namespace>]

ELEMENT <element name>

New functions and predicates:

XMLValidate

• Validates an XML value against an XML Schema (or target namespace), returning new XML value with type annotations – IS VALID

• Tests an XML value to determine whether or not it is valid according to an XML Schema (or target namespace); return true/false without altering the XML value itself

IS DOCUMENT

• determines whether an XML value satisfies the (SQL/XML) criteria for an XML document

IS CONTENT

• determines whether an XML value satisfies the (SQL/XML) criteria for XML content

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 41 [Mel05]

10.3.2 XML data type

Benefits of schema registration – Security issues

• Schemas cannot “disappear” without SQL- server knowing about it

• Schemas cannot be “hijacked” (altered in inappropriate ways) without SQL-server knowing about it

• Documents cannot be marked “valid”

against schemas unless SQL-server knows about them

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 42 [Mel05]

10.3.2 XML data type

(8)

Predefined schemas (build-in namespaces)

xs:http://www.w3.org/2001/XMLSchema

xsi:http://www.w3.org/2001/XMLSchema-instance

sqlxml:http://standards.iso.org/iso/9075/2003/sqlxml – More depending on the DB implementation

Completely supported per XML+Namespaces:

XMLElement, XMLForest, XMLTable – Default namespace, explicit namespace (prefix)

– Declare namespace within scopes of WITH clause, column definitions, constraint definitions, insert/delete/update statements, compound statements

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 43 [Mel05]

10.3.2 XML standard

SQL/XML standard published as – ISO/IEC 9074-14:2003

• Mappings and Publishing Functions – ISO/IEC 9075-14:2006

• Adds XQuery, including Data Model, Validation – ISO/IEC 9075-14:2008

• Updates

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 44 [Mel05]

10.3.2 SQL/XML standard

SQL/XML:2003 plus – Additional publishing functions – XQuery data model

– More precise XML type (modifiers) – XMLQuery, XMLTable

– XMLValidate, IS VALID

– XMLExists, IS DOCUMENT, IS CONTENT – Casting between XML type and SQL types

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 45 [Mel05]

10.3.2 SQL/XML standard

Overview of some operators for the XML type

– XMLELEMENT – creates an XML element node

– XMLFOREST – creates a sequence of XML element nodes from a table

– XMLCOMMENT – creates an XML comment node – XMLTEXT – creates a text node

– XMLPI – creates a processing instruction – XMLAGG – aggregates XML values of a group – XMLCONCAT – concatenates XML type values – XMLTRANSFORM – applies an XSL to a document

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 46 [Tür08]

10.3.2 SQL/XML standard

... Overview of some operators for the XML type

– XMLPARSE – a well-formed SQL text to XML value – XMLSERIALIZE – converts an XML value to a SQL text – XMLDOCUMENT – creates an XML document node

from an XML value

– XMLVALIDATE – validates an XML value with a schema – XMLQUERY – evaluates an XQuery expression

– XMLTABLE – transforms an XQuery result to a SQL table – XMLITERATE – transforms an XQery sequence to a SQL

table

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 47 [Tür08]

10.3.2 SQL/XML standard

Review of SQL/XML – Two components

• A data type XML to store XML data

• Functions to map relational structures to XML – Only construction operators

• No extraction of values or search

• But construction operators are based on XQuery – Mapping of tables, schemas, catalogues ignores some

information from the relational schema

• UNIQUE

• REFERENCES

• CHECK

– Further extensions are expected

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 48 [Kud07]

10.3.2 SQL/XML standard

(9)

Summary model-based storage inside the DBMS – Schema definition:

not required for storage – Document reconstruction:

Possible without problems – Queries:

XML queries possible – Special features:

Full integration of SQL and XML – Efficiency:

Vendor specific

Optimized data structures possible – Usage:

For data- and document-centric as well as for semi-structured XML applications

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 49

10.3.2 Model-based storage

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage 10.4 Schema-based storage

10.4.1 Automatic mapping 10.4.2 User-defined mapping 10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 50

10. XML Storage 1

Motivation

– XML content shall be stored in a conventional database – Accepting the loss of native access

– DB schema is derieved from a DTD or an XML schema

Problem

– Generate DB schema automatically

– Thereby use as much structure information as possible

General approach for mapping from a DTD – Transform DTD into a tree representation – Nodes: element types, attributes, etc. (type layer!!!)

– Edges: nesting relationships of element types and their restrictions – Traverse tree in order to transform nodes and edges into database

tables (according to certain rules)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 51

10.4.1 Schema-based storage

Generating the DB schema for a DTD:

– Rules to map element types:

– Rules to map attributes:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 52

10.4.1 Schema-based storage

XML element type column of a table

Sequence of element types columns of a table Alternative of element types column of a table Element type with quantifier ? column with null values

Element type with quantifier +,* set/list of columns (SET OF, LIST OF)

Nested element types TUPLE OF

XML attribute column of a table

IMPLIED null values allowed

REQUIRED null values not allowed

Default value DEFAULT constraint

Mapping to relational databases – DTD is usually required

– Queries use SQL functionality

– RDBMS data types are used (e.g. prices are NUMERIC) – Problem: Mapping of collection types

Subdivide into additional relations – Example:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 53

10.4.1 Schema-based storage

Comment_ID Customer_info Feedback

44901 C0001 F0001

ID Fname Lname Email

C0001 Charles Sanchez C.Sanchez@hotmail...

ID Type Content

F001 opinion Darjeeling Special…

Comment:

Customer_Info:

Feedback:

Mapping to object relational databases – DTD is usually required

– Queries use SQL functionality – "Natural" mapping to tuple

types, collection types – In case of irregular document

structure databases contain many null values.

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 54

10.4.1 Schema-based storage

Comment_ID <Customer_info> <Feedback>

44901

Fname Lname Email

Charles Sanchez C.Sanchez@hotmail...

Type Content opinion Darjeeling Specia…

Comment:

(10)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 55 [Tür08]

10.4.1 Schema-based storage

• Mapping of recursive data definitions – DTDs can be recursive

– Infinite recursion is impossible on instance layer of a database – Procedure:

Marking the nodes

Subdividing into separate tables

Use primary and foreign keys in RDBMS

Use reference types in ORDBMS

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 56

10.4.1 Schema-based storage

<!ELEMENT book (front, body, references)>

<!ELEMENT references (book+)>

• Mapping of element sequences – Sequence can be important

• Use an additional attribute in these cases – Example:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 57

10.4.1 Schema-based storage

Order Lesson

1 Introduction

2 XML basics

<lecture>

<lesson>Introduction</lesson>

<lesson>XML basics</lesson>

• Mapping of alternatives – XML allows to specify alternatives – Example:

– Three possible storage variants

• Each alternative is stored as separate table column

• Subdivide alternatives in separate tables

• Use a table column of type XML type

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 58

10.4.1 Schema-based storage

<!ELEMENT car (compactCar | sedan | van)*>

Variant 1 – all alternatives in one table

– Problem: many null values (wasting storage space)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 59 [Tür08]

10.4.1 Schema-based storage

Variant 2 – subdivided into multiple tables

– For queries, combination of tables is needed

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 60 [Tür08]

10.4.1 Schema-based storage

(11)

Variant 3 – Using column type XML

– XML type allows XML queries or DOM methods

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 61 [Tür08]

10.4.1 Schema-based storage

Mapping with STORED (Semistructured TO RElational Data)

Basic idea: Use data mining techniques on the XML structure to find a good

mapping to tables [DFS99]

Input

•XML documents (or an average sample of the collection)

Query workload

•Restrictions of storage space, number of tables, …

•No DTD or XML schema is required!

Output

•Relational schema

•STORED-queries: Mapping instructions for XML documents to DB tables –

Procedure

•Determine the XML subtrees with the largest support in the collection and in the queries

•These subtrees are materialised in tables

•Irregular data is stored in overflow tables according to the EDGE approach

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 62

10.4.1 Schema-based storage

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig

Mapping with STORED – example

10.4.1 Schema-based storage

XML documents shown as tree structure Subtrees with

high support Subtrees with high support

63 [Tür08]

Schema-based storage with automatic mapping

– Advantages

• Queries, data types, aggregation functions, views

• Integration in other databases when storing structured data – Disadvantages

• Large schema, sparsely filled databases (many null values)

• No flexible data types, storage of alternatives has problems

• Less flexible queries

–No information retrieval queries possible without additional extensions –No full text operations for semi- or unstructured data

–Usuallynative access is not possibleany more

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 64

10.4.1 Schema-based storage

Summary Schema-based storage with automatic mapping

Schema definition:

•Is usually required and analysed

•not required, e.g. for STORED –

Document reconstruction:

•Limited (requires logging of the mapping process) –

Queries:

•Database queries

•XML queries possible,but lack the XPath horizontal axes, e.g. following, preceding-sibling

Special features:

•Federation with existing databases is possible –

Efficiency:

•High efficiency by using the DB-engine –

Usage:

•For data-centric XML applications, but with limited nesting

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 65

10.4.1 Schema-based storage

User defined mapping – Idea

In all previously shown methods it is not possible to affect the storage in the DB

With user defined mappings the

user defines the storage structure

The structure of XML documents and database schema can be designed independently from each other

Also possible: storing XML documents in existing databases – Annotation of DTD and XML schema, respectively

In many cases the mapping definition is combined with existing schema information

– Only limited XML queries possible

Logging of the mapping process from XML documents to databases

For a given query all relevant data has to be stored (lossless mapping)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 66

10.4.2 User-defined Mapping

(12)

• Mapping solutions with different specializations – Algorithms, middleware, commercial applications, … – Varying amount of required input or user decisions – Many algorithms create different database schemas

• Two phases – Mapping

• Assign a place for each node type in the DB – Shredding

• Import the XML data as DB tuples

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 67

10.4.2 User-defined Mapping

• The shredder can be part of the DB – Usually requires an XML schema

– In the IBM Data Studio, the shredder is part of the

"annotated XML schema decomposition"

– Direct approach in DB2:

• register the XML schema and call the stored procedure:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 68

10.4.2 User-defined Mapping

register xmlschema http://our.org/custacc from dec_files/custacc.xsd as cust_schema ;

complete xmlschema cust_schema enable decomposition ; call SYSPROC.XDBDECOMPXML ('VRODRIG', 'CUST_SCHEMA', ? ,

?, 1, null, null, null)

Shredding without XML schema in DB2 – XMLTABLE function in combination with an INSERT

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 69 http://www.ibm.com/developerworks/db2/l

ibrary/techarticle/dm-0801ledezma/

10.4.2 User-defined Mapping

INSERT INTO ENVELOPEXT (MAILFROM, MAILTO, MAILDATE, SUBJECT) SELECT MAILFROM, MAILTO, MAILDATE, SUBJECT

FROM XMLTABLE(

XMLNAMESPACES('http://www.sal.com/mails' AS "email"), '$doc/email:mails/mail' (: some xquery-expression :) PASSING xml-sourceAS "doc"

COLUMNS

MAILFROM VARCHAR (100) PATH 'envelope/from', MAILTO VARCHAR (100) PATH 'envelope/to', MAILDATE VARCHAR (30) PATH 'envelope/email:Date', SUBJECT VARCHAR (100) PATH 'envelope/Subject') AS T;

Example:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 70 [Tür08]

10.4.2 User-defined Mapping

mapping instruction XML document

Mapping instruction

– Example syntax for XML-DBMS (Roland Bourret)

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 71

10.4.2 User-defined Mapping

<ClassMap>

<ElementType Name="sales:SalesOrder"/>

<ToClassTable>

<Table Name="Sales"/>

</ToClassTable>

<PropertyMap>

<Attribute Name="SONumber"/>

<ToColumn>

<Column Name="Number"/>

</ToColumn>

</PropertyMap>

</ClassMap>

Connection between elements

and tables

Connection between elements/attributes

and table columns

Remarks

– Many different mapping languages or schema annotations

• Automatic mappings usually have an internal mapping language

– Remember the mapping constructs from last lecture (9). The SQL/XML annotations are a mapping language, too.

– DB2 uses similar annotations as SQL/XML

• see next slide:

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 72

10.4.2 User-defined Mapping

(13)

Name Balance

Joe 2000

Jim 3500

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 73 [Tür08]

<ACCOUNT>

<row>

<NAME>Joe</NAME>

<BALANCE>2000</BALANCE>

</row>

<row>

<NAME>Jim</NAME>

<BALANCE>3500</BALANCE>

</row>

</ACCOUNT>

Mapping SQL tables

<xsd:complexType xmlns:db2-xdb=

"http://www.ibm.com/xmlns/prod/db2/xdb1"

name="ROW.ACCOUNT">

<xsd:sequence>

<xsd:element name="NAME"

type="CHAR_20"

db2-xdb:rowSet="Account"

db2-xdb:column="Name"/>

<xsd:element name="BALANCE"

type="NUMERIC_12_2"/>

db2-xdb:rowSet="Account"

db2-xdb:column="Balance"/>

</xsd:sequence>

</xsd:complexType>

<xsd:complexType name="TABLE.ACCOUNT">

<xsd:sequence>

<xsd:element name="row"

type="ROW.ACCOUNT"/>

</xsd:sequence>

</xsd:complexType>

<xsd:element name="ACCOUNT"

type="TABLE.ACCOUNT"/>

CREATE TABLEAccount (

Name CHAR(20), BalanceNUMERIC(12,2), );

Mapping SQL table columns to XML elements

Mapping table rows to XML

<row>

elements

Summary schema-based storage with user defined mapping

– Schema definition:

•Depends on mapping language

– Document reconstruction:

•Not possible in most cases (requires logging of the mapping process)

– Queries:

•Database queries

•XML queries in rare cases only!

– Special features:

•Integration with existing databases is possible

– Efficiency:

•High efficiency by using the DB-engine

– Usage:

•For data-centric XML applications

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 74

10.4.2 User-defined Mapping

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 75

10. XML Storage 1

• Different methods for storage of XML documents – Text-based

Storing whole XML documents

as string

Can use full text index or path index – Model-based

Generic

mapping of the

tree

structure – Schema-based

Detect and analyse the structure of the XML documents

Derive a DB schema from the structure

– Hybrid approaches

A combination of some of those methods

– No algorithm has the optimal solution for all kind of XML documents

– Reasonable solution is heavily dependent on the application

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 76

10.5 Conclusion

• "XML und Datenbanken" [Tür08]

–Can Türker

–Lecture, University of Zurich, 2008

• "XML und Datenbanken" [KM03]

–M. Klettke, H. Meier –dpunkt.verlag, 2003

• "Generierung eines adaptiven Datenbankschemas für datenzentrierte XML- Dokumente" [Bus08]

–Carsten Busche

–Diplomarbeit, TU Braunschweig, 2008

• [FK99]

–D. Florescu, D. Kossmann: Storing and Querying XML Data using an RDBMS. IEEE Data engineering Bulletin (DEBU), Volume 22(3), Seiten 27-34, 1999.

• [DFS99]

–A. Deutsch, M.F. Fernández, D. Suciu: Storing Semistructured Data with STORED.

Proceedings of the 1999 ACM SIGMOD international conference on Management of data, Seiten 431-442, ACM, 1999.

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 77

10.6 References

Introduction and Basics 1. Introduction

2. XML Basics 3. Schema Definition 4. XML Processing Querying XML 5. XPath & SQL/XML

Queries

6. XQuery Data Model 7. XQuery

XML Updates 8. XML Updates & XSLT

Producing XML 9. Producing XML Storing XML 10. XML storage 11.Relational XML

storage

12. Storage Optimization Systems

13. Technology Overview

10.6 Overview

78 XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig

(14)

• Now, or ...

• Room: IZ 232

• Office our: Tuesday, 12:30 – 13:30 Uhr or on appointment

• Email: eckstein@ifis.cs.tu-bs.de

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 79

Questions, Ideas, Comments

Referenzen

ÄHNLICHE DOKUMENTE

Weitere anspruchsvolle Proben für das Fach Mathematik in der zweiten Klasse findest Du auf unserer Partnerseite www.CATLUX.de.. Dort gibt es ausführliche Musterlösungen,

Spatial Databases and GIS – Karl Neumann, Sarah Tauscher– Ifis – TU Braunschweig 860.. 10 Location

• The cloaked region is the MBR that includes the user and neighboring nodes.. – Clique

•  The cloaked region is the MBR that includes the user and neighboring nodes.. –  Clique

10.2 Text-based storage 10.2.1 Index structures 10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion.. 10.6 Overview

One actor stood at a bus stop in traditional Muslim dress. The other loudly argued that the Muslim could be a terrorist and tried to stop him from boarding the bus. Over and

tronen erzeugt, diese über eine hohe Anodenspannung ( U = 50 − 300 V ) beschleunigt, elektronenoptisch bündelt (z.B. W ehnelt-Zylinder) und den Elektronen dann eine An-.. tikathode

The M iniScribe IV series of highly reliable 5 1 /4-inch Winchester disk drives offers 10 or 20 megabytes of unformatted storage on one or two platters in an