• Keine Ergebnisse gefunden

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

N/A
N/A
Protected

Academic year: 2021

Aktie "Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de"

Copied!
79
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

XML Databases

10 O

Silke Eckstein Andreas Kupfer

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

10 . XML Storage 1 – O verview

(2)

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10. XML Storage 1

10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

(3)

Applications require different types of XML documents – Structure vs. content

– Regular vs. irregular

• Thus, XML documents are – Data-centric

– Document-centric

– or somewhere in-between

10.1 Motivation

– or somewhere in-between

• Questions

– Storage of XML documents

– Efficient processing of queries on the stored documents or data

• There are several methods for storage – 1st goal: Learn and understand methods – 2nd goal: Classify methods

Principles

Advantages and disadvantages

Usage

(4)

Characterisation of XML documents:

Data-centric documents

• Structured, regular

• E.g. product catalog, order, invoice

Document-centric documents

10.1 Motivation

Document-centric documents

• Unstructured, irregular

• E.g. scientific article, book, email, web page

Semi-structured documents

• Data-centric and document-centric parts

• E.g. publications, Amazon, MS Press (example chapters)

(5)

Requirements for the physical layer:

– Order preserving and lossless storage of XML documents

– Efficient access to XML documents or parts thereof

Quick response time for

10.1 Motivation

• Quick response time for

– Queries

– Update operations

• Indexing

• Transaction processing

• Support of XPath and XQuery

• Support of SAX and DOM for applications

(6)

Storage approaches for XML documents

Text-based

• Storage as character data

Model-based

10.1 Motivation

• On top of the relational DBMS

• Inside the relational DBMS

Schema-based

• Mapping to (object-)relational databases

– Deriving the database schema from the XML structure – Using user defined mapping procedures

(7)

10.1 Motivation

Text-based

storage Model-based

storage Schema-based

storage

– Examples –

(8)

10.1 Motivation

10.2 Text-based storage 10.2.1 Index structures

10. XML Storage 1

10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

(9)

• The whole XML document text is stored as character data

– File in the file system

– CLOB (Character-Large-OBject) in the DBS

• Operations on documents as a whole are very efficient

– Reading and writing the whole document

10.2 Text-based storage

Reading and writing the whole document

But the content is monolithic and opaque with respect to the relational query engine (query can't inspect a fragment)

• Getting granular access requires additional support

– Full text index

– Path index

(10)

• Index structures for XML documents

allow efficient access for specific queries

– Different types of indexes are optimized for different types of queries

10.2.1 Index structures

• Generate redundancy

– Index has to be up-to-date by propagating data changes

• Index structures can be storage structures as well

– They define the storage method

(11)

Types of index structures

– Value index

• Indexes atomar values of an XML document, like element content or attribute values

• Index format for structured parts of XML documents

• Already known from databases (B-trees, hash index, …)

– Full text index

10.2.1 Index structures

– Full text index

• Indexes single words from the full text

• Index format for unstructured parts of XML documents

• Already known from Information Retrieval (inverted lists, tries, suffix trees, …)

– Path index

• Indexes subtrees/paths in an XML document

• Index format for semistructured parts of XML documents

• Already known from object-databases (access support relations, …)

(12)

B-tree as value index for an XML fragment document

10. 2.1 Index structures

(13)

Full text index

– Not limited to exact matches

Keyword-based search and boolean retrieval

Pattern search (with regular expressions)

– Use of

Statistical, word-based methods

10. 2.1 Index structures

Statistical, word-based methods

Stop word removal

Elimination of uncommon items

Linguistic methods

Normalization of words (e.g. capitalisation, hyphenation,)

Word decomposition by rules (engl.) or dictionaries (german) Stemming

Knowledge-based methods

Use of ontologies and thesauri to search for synonyms, hypernyms and hyponyms

(14)

Inverted list as full text index for XML

10. 2.1 Index structures

word occurrence word position in the text

(15)

10. 2.1 Index structures

word occurrence

word occurrence

(16)

Path index

– Structure information must be identifiable and reconstructable

• Assigning the markup to the content as well as

• Representing the hierarchical nesting and order of

10. 2.1 Index structures

• Representing the hierarchical nesting and order of elements/attributes

– Especially suited for keyword search with regard to structure or path expressions

FOR $b IN //book

WHERE CONTAINS($b/author,"Benjamin") RETURN $b

(17)

Types of path indexes

– Nested path index

Access to root node from every node

– Multi-index

Accessing parent nodes

10. 2.1 Index structures

Accessing parent nodes

– Join-index

Access parent and child nodes

– Access Support Relations (ASR)

Generalization of indexes above,

by listing all paths in a table

(18)

Conclusion

– Efficient query processing on XML documents requires different types of index structures

– Value index

• For efficient access to structured parts

10. 2.1 Index structures

• For efficient access to structured parts

• Keyword search, value search

– Full text index

• For efficient access to unstructured parts

– Path index

• Using the document structure

• Navigating queries

(19)

Summary text-based storage – Schema definition:

not required

– Document reconstruction:

documents stay in their original format

– Queries:

Information retrieval queries

Processing the markup of the queries

10.2 Text-based storage

Processing the markup of the queries

XML queries possible

– Special features:

Full text functions

– Efficiency:

Character string must be parsed on every access with XML processors expensive

No concurrency on read or write no parallel processing

– Usage:

For document-centric XML applications

Suitable to only a limited extent also for semi-structured applications

(20)

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10.3.1 On top of the relational DBMS

10. XML Storage 1

10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS

10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

(21)

Idea: generic storage of the graph structure

– XML elements, XML attributes, … are nodes of a graph – Nesting of elements defines edges

– Nodes get an (internal) ID based on graph traversal

• Using relations or object classes to store elements and attributes

10.3.1 Model-based storage

attributes

• Document structure can be restored completely

• Extension for data type adapted storage is possible

ID Element name Value Reference to preceeding Rank

ID Attribute name Value Reference to element Elements

Attributes

(22)

The EDGE approach [FK99]

10.3.1 Model-based storage

– Variant BINARY: horizontal partition of EDGE based on label

XML documents

(23)

XML queries

– XML queries (XPath, XQuery) are mapped to SQL

10.3.1 Model-based storage

– XML queries (XPath, XQuery) are mapped to SQL queries (taking storage structures into account)

– Result of XML query is generated from result of database query

• "Labeling" of the result tuples

• Result is in XML format

(24)

• Example: list bargain buy with prices

10.3.1 Model-based storage

SELECT a.content, b.content FROM Edge a, Edge b WHERE (a.label = 'price') AND (a.content < 10.00) AND (b.label = 'description')

AND (b.parent = a.parent) AND (a.key = b.key)

(25)

10.3.1 Model-based storage

DOM-based storage

– Information from the

Document Object Model are stored in the database – Storage alternatives

– Storage alternatives

• (Object-)relational databases

• Object-oriented databases

• Developing own data

structure

(26)

10.3.1 Model-based storage

Node type:

ELEMENT Node type:

ATTRIBUTE

DOM-based storage – example

Node type:

TEXT

(27)

XML Queries

– XML queries (DOM method invocations) are mapped

10.3.1 Model-based storage

– XML queries (DOM method invocations) are mapped to SQL queries (taking storage structures into

account)

– Result of method invocation is generated from result

of database query

(28)

Summary model-based storage on top of the DBMS

Schema definition:

not required for storage

Document reconstruction:

Possible, but expensive

Queries:

XML queries possible

10.3.1 Model-based storage

XML queries possible

Adapted database queries

Special features:

Querying many elements/attributes is expensive

Efficiency:

Navigation from the given context is efficient

Restoring the document and evaluating path expressions is inefficient

Usage:

For data- and document-centric as well as for semi-structured XML applications

(29)

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10.3.1 On top of the relational DBMS

10. XML Storage 1

10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS

10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

(30)

SQL/XML

– Datatype XML with

10.3.2 XML data type

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

</City>

<City>

<City>

Storing XML documents inside

the database as values of type XML

Generating XML documents using SQL/XML functions

– Datatype XML with belonging functions – Mapping between

SQL and XML

– Embedding XQuery in SQL

SQL database XML datatype SQL XQuery

Mapping between SQl and XML

(31)

• A value of the data type XML can contain

– whole XML document – XML element

– a set of XML elements

• All XML publishing operators from chapter 9.4

10.3.2 XML data type

• All XML publishing operators from chapter 9.4

create values of the data type XML, not a string

(32)

10.3.2 XML data type

XML(SEQUENCE)

XML(CONTENT(ANY))

NULL or document node Untyped elements &

attributes,

elements not NULL 1 element child

Validated against schema

XML(CONTENT(XMLSCHEMA)) XML(CONTENT(UNTYPED))

XML(DOCUMENT(ANY))

XML(DOCUMENT(UNTYPED)) XML(DOCUMENT(XMLSCHEMA))

1 element child Validated against schema

1 element child

(33)

Specification of XML type

• Modifiers are optional

• Primary type modifier

10.3.2 XML data type

XML [({DOCUMENT|CONTENT|SEQUENCE}

[({ANY|UNTYPED|XMLSCHEMA schema name})])]

• Primary type modifier

DOCUMENT (XML document) – CONTENT (XML element)

SEQUENCE (sequence of XML elements)

• Secondary type modifier

UNTYPED

XMLSCHEMA (typed) – ANY (may be typed)

(34)

• Create a table that is an XML data type in itself

CREATE TABLE XMLDOCUMENT OF XMLTYPE;

• Create a table containing an XMLType data type column

10.3.2 XML data type

column

CREATE TABLE XML (

ID NUMBER NOT NULL,

XML XMLTYPE,

CONSTRAINT XPK PRIMARY KEY (ID) );

(35)

Example: Definition of an XML type column

10.3.2 XML data type

ID Name

123 <Groups>Annabelle</Groups>

CREATE TABLE Groups (

ID INTEGER,

Name XML );

123 <Groups>Annabelle</Groups>

234 <Groups>Magdalena, Marius</Groups>

345 <?xml version 1.0?>

<Groups>

<Person>Patrick</Person>

<Person>Robert</Person>

</Groups>

654 <Groups>Rebecca</Groups>

<Groups>Torben</Groups>

(36)

Characteristics

– Allowed values:

• XML documents (including prolog)

• XML content according to XML 1.0 (includes pure text comments, PI?)

• NULL

10.3.2 XML data type

• NULL

No comparison possible (compare CLOB in SQL)

• User can define an order, if comparison is necessary

– No corresponding type in programming languages for embedding in SQL available

– Standard defines operators to convert to other SQL

data types

(37)

Parsing & Serialization

XMLParse:

• Parses a string value using an XML parser

• Produces value whose specific type is

10.3.2 XML data type

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

</City>

• Produces value whose specific type is

XML(DOCUMENT(ANY)), or …CONTENT…, or

XMLSerialize

• Transforms an XML value into a string value (CHAR, VARCHAR, CLOB, or BLOB)

<City>

<Name>

Braunschweig

</Name>

<Zip>38100</Zip>

<Zip>38106</Zip>

<State>

Niedersachsen

</State>

(38)

Validation of XML

– Is like integrity constraints in DBs – Requires an XML Schema

– XML Schemas may be registered with the SQL-server

10.3.2 XML data type

XML Schemas may be registered with the SQL-server

• Implementation-defined mechanism

• Known by SQL name & by target namespace URI

– Schema does need a unique name

• Used by XMLValidate(), IS VALID, and to restrict values of

XML(DOCUMENT-or-CONTENT(XMLSCHEMA))

(39)

Schema registration

10.3.2 XML data type

Register XMLSCHEMA

'http://www.Alfred-Moos.de/GrussSchema.xsd' FROM 'file://c:/XML_Schemata/GrussSchema.xsd' AS GrussSchema

COMPLETE

;

CREATE TABLE Dokument_XML

(Dokument_XML_Nr CHAR (4) NOT NULL

PRIMARY KEY, Dokument XML,

CONSTRAINT validieren

CHECK (Dokument IS VALIDATED ACCORDING TO XMLSCHEMA ID GrussSchema

) )

;

(40)

Schema definition

Syntax

10.3.2 XML data type

XML(CONTENT(XMLSCHEMA) <schema> [<elements>]))

<schema> := URI <namespace> [LOCATION <loc>]

| NO NAMESPACE [LOCATION <loc>]

| NO NAMESPACE [LOCATION <loc>]

| ID <registered schema name>

<element> := [NAMESPACE <namespace>]

ELEMENT <element name>

(41)

New functions and predicates:

XMLValidate

Validates an XML value against an XML Schema (or target namespace), returning new XML value with type annotations

IS VALID

Tests an XML value to determine whether or not it is valid

10.3.2 XML data type

Tests an XML value to determine whether or not it is valid according to an XML Schema (or target namespace); return

true/false

without altering the XML value itself

IS DOCUMENT

determines whether an XML value satisfies the (SQL/XML) criteria for an XML document

IS CONTENT

determines whether an XML value satisfies the (SQL/XML)

criteria for XML content

(42)

Benefits of schema registration

– Security issues

• Schemas cannot “disappear” without SQL- server knowing about it

• Schemas cannot be “hijacked” (altered in

10.3.2 XML data type

• Schemas cannot be “hijacked” (altered in inappropriate ways) without SQL-server knowing about it

• Documents cannot be marked “valid”

against schemas unless SQL-server knows

about them

(43)

Predefined schemas (build-in namespaces)

xs:http://www.w3.org/2001/XMLSchema

xsi:http://www.w3.org/2001/XMLSchema-instance

sqlxml:http://standards.iso.org/iso/9075/2003/sqlxml

– More depending on the DB implementation

10.3.2 XML standard

More depending on the DB implementation

Completely supported per XML+Namespaces:

XMLElement, XMLForest, XMLTable

– Default namespace, explicit namespace (prefix)

– Declare namespace within scopes of WITH clause, column definitions, constraint definitions, insert/delete/update

statements, compound statements

(44)

SQL/XML standard published as

– ISO/IEC 9074-14:2003

• Mappings and Publishing Functions

– ISO/IEC 9075-14:2006

• Adds XQuery, including Data Model, Validation

10.3.2 SQL/XML standard

• Adds XQuery, including Data Model, Validation

– ISO/IEC 9075-14:2008

• Updates

(45)

SQL/XML:2003 plus

– Additional publishing functions – XQuery data model

– More precise XML type (modifiers) – XMLQuery, XMLTable

10.3.2 SQL/XML standard

– XMLQuery, XMLTable – XMLValidate, IS VALID

– XMLExists, IS DOCUMENT, IS CONTENT

– Casting between XML type and SQL types

(46)

Overview of some operators for the XML type

– XMLELEMENT – creates an XML element node

– XMLFOREST – creates a sequence of XML element nodes from a table

10.3.2 SQL/XML standard

from a table

– XMLCOMMENT – creates an XML comment node – XMLTEXT – creates a text node

– XMLPI – creates a processing instruction

– XMLAGG – aggregates XML values of a group

– XMLCONCAT – concatenates XML type values

– XMLTRANSFORM – applies an XSL to a document

(47)

... Overview of some operators for the XML type

– XMLPARSE – a well-formed SQL text to XML value

– XMLSERIALIZE – converts an XML value to a SQL text – XMLDOCUMENT – creates an XML document node

10.3.2 SQL/XML standard

– XMLDOCUMENT – creates an XML document node from an XML value

– XMLVALIDATE – validates an XML value with a schema – XMLQUERY – evaluates an XQuery expression

– XMLTABLE – transforms an XQuery result to a SQL table – XMLITERATE – transforms an XQery sequence to a SQL

table

(48)

Review of SQL/XML

– Two components

A data type XML to store XML data

Functions to map relational structures to XML

– Only construction operators

No extraction of values or search

10.3.2 SQL/XML standard

No extraction of values or search

But construction operators are based on XQuery

– Mapping of tables, schemas, catalogues ignores some information from the relational schema

UNIQUE

REFERENCES

CHECK

– Further extensions are expected

(49)

Summary model-based storage inside the DBMS

– Schema definition:

• not required for storage

– Document reconstruction:

• Possible without problems

– Queries:

10.3.2 Model-based storage

– Queries:

• XML queries possible

– Special features:

• Full integration of SQL and XML

– Efficiency:

• Vendor specific

• Optimized data structures possible

– Usage:

• For data- and document-centric as well as for semi-structured XML applications

(50)

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage 10.4 Schema-based storage

10. XML Storage 1

10.4 Schema-based storage

10.4.1 Automatic mapping 10.4.2 User-defined mapping 10.5 Conclusion

10.6 Overview and References

(51)

Motivation

XML content shall be stored in a conventional database

Accepting the loss of native access

DB schema is derieved from a DTD or an XML schema

Problem

Generate DB schema automatically

10.4.1 Schema-based storage

Generate DB schema automatically

Thereby use as much structure information as possible

General approach for mapping from a DTD

Transform DTD into a tree representation

Nodes: element types, attributes, etc. (type layer!!!)

Edges: nesting relationships of element types and their restrictions

Traverse tree in order to transform nodes and edges into database

tables (according to certain rules)

(52)

Generating the DB schema for a DTD:

– Rules to map element types:

10.4.1 Schema-based storage

XML element type column of a table

Sequence of element types columns of a table Alternative of element types column of a table

Element type with quantifier ? column with null values

– Rules to map attributes:

Element type with quantifier ? column with null values

Element type with quantifier +,* set/list of columns (SET OF, LIST OF)

Nested element types TUPLE OF

XML attribute column of a table

IMPLIED null values allowed

REQUIRED null values not allowed

Default value DEFAULT constraint

(53)

Mapping to relational databases

– DTD is usually required

– Queries use SQL functionality

– RDBMS data types are used (e.g. prices are NUMERIC) – Problem: Mapping of collection types

• Subdivide into additional relations

10.4.1 Schema-based storage

• Subdivide into additional relations

– Example:

Comment_ID Customer_info Feedback

44901 C0001 F0001

ID Fname Lname Email

C0001 Charles Sanchez C.Sanchez@hotmail...

ID Type Content

F001 opinion Darjeeling Special…

Comment:

Customer_Info:

Feedback:

(54)

Mapping to object relational databases

– DTD is usually required

– Queries use SQL functionality – "Natural" mapping to tuple

types, collection types

10.4.1 Schema-based storage

types, collection types

– In case of irregular document structure databases contain many null values.

Comment_ID <Customer_info> <Feedback>

44901

Fname Lname Email

Charles Sanchez C.Sanchez@hotmail...

Type Content

opinion Darjeeling Specia…

Comment:

(55)

10.4.1 Schema-based storage

(56)

• Mapping of recursive data definitions

– DTDs can be recursive

– Infinite recursion is impossible on instance layer of a database – Procedure:

• Marking the nodes

• Subdividing into separate tables

10.4.1 Schema-based storage

• Subdividing into separate tables

• Use primary and foreign keys in RDBMS

• Use reference types in ORDBMS

<!ELEMENT book (front, body, references)>

<!ELEMENT references (book+)>

(57)

• Mapping of element sequences

– Sequence can be important

• Use an additional attribute in these cases

– Example:

10.4.1 Schema-based storage

Order Lesson

1 Introduction

2 XML basics

<lecture>

<lesson>Introduction</lesson>

<lesson>XML basics</lesson>

⇓ ⇓

⇓ ⇓

⇓ ⇓

⇓ ⇓

(58)

• Mapping of alternatives

– XML allows to specify alternatives – Example:

10.4.1 Schema-based storage

<!ELEMENT car (compactCar | sedan | van)*>

– Three possible storage variants

• Each alternative is stored as separate table column

• Subdivide alternatives in separate tables

• Use a table column of type XML type

(59)

Variant 1 – all alternatives in one table

10.4.1 Schema-based storage

– Problem: many null values (wasting storage space)

(60)

Variant 2 – subdivided into multiple tables

10.4.1 Schema-based storage

– For queries, combination of tables is needed

(61)

Variant 3 – Using column type XML

10.4.1 Schema-based storage

– XML type allows XML queries or DOM methods

(62)

Mapping with STORED (Semistructured TO RElational Data)

– Basic idea: Use data mining techniques on the XML structure to find a good mapping to tables [DFS99]

– Input

XML documents (or an average sample of the collection)

Query workload

Restrictions of storage space, number of tables, …

10.4.1 Schema-based storage

Restrictions of storage space, number of tables, …

No DTD or XML schema is required!

– Output

Relational schema

STORED-queries: Mapping instructions for XML documents to DB tables

– Procedure

Determine the XML subtrees with the largest support in the collection and in the queries

These subtrees are materialised in tables

Irregular data is stored in overflow tables according to the EDGE approach

(63)

Mapping with STORED – example

10.4.1 Schema-based storage

Subtrees with high support Subtrees with

high support

XML documents shown as tree structure

(64)

Schema-based storage with automatic mapping

– Advantages

Queries, data types, aggregation functions, views

Integration in other databases when storing structured data

10.4.1 Schema-based storage

Integration in other databases when storing structured data

– Disadvantages

Large schema, sparsely filled databases (many null values)

No flexible data types, storage of alternatives has problems

Less flexible queries

No information retrieval queries possible without additional extensions No full text operations for semi- or unstructured data

(65)

Summary Schema-based storage with automatic mapping – Schema definition:

Is usually required and analysed

not required, e.g. for STORED

– Document reconstruction:

Limited (requires logging of the mapping process)

– Queries:

10.4.1 Schema-based storage

– Queries:

Database queries

XML queries possible,but lack the XPath horizontal axes, e.g. following, preceding-sibling

– Special features:

Federation with existing databases is possible

– Efficiency:

High efficiency by using the DB-engine

– Usage:

For data-centric XML applications, but with limited nesting

(66)

User defined mapping

– Idea

• In all previously shown methods it is not possible to affect the storage in the DB

• With user defined mappings the user defines the storage structure

• The structure of XML documents and database schema can be designed

10.4.2 User-defined Mapping

• The structure of XML documents and database schema can be designed independently from each other

• Also possible: storing XML documents in existing databases

– Annotation of DTD and XML schema, respectively

• In many cases the mapping definition is combined with existing schema information

– Only limited XML queries possible

• Logging of the mapping process from XML documents to databases

• For a given query all relevant data has to be stored (lossless mapping)

(67)

• Mapping solutions with different specializations

– Algorithms, middleware, commercial applications, … – Varying amount of required input or user decisions – Many algorithms create different database schemas

10.4.2 User-defined Mapping

Many algorithms create different database schemas

• Two phases

– Mapping

• Assign a place for each node type in the DB

– Shredding

• Import the XML data as DB tuples

(68)

• The shredder can be part of the DB

– Usually requires an XML schema

– In the IBM Data Studio, the shredder is part of the

"annotated XML schema decomposition"

10.4.2 User-defined Mapping

– Direct approach in DB2:

• register the XML schema and call the stored procedure:

register xmlschema http://our.org/custacc from dec_files/custacc.xsd as cust_schema ;

complete xmlschema cust_schema enable decomposition ;

call SYSPROC.XDBDECOMPXML ('VRODRIG', 'CUST_SCHEMA', ? ,

?, 1, null, null, null)

(69)

Shredding without XML schema in DB2

– XMLTABLE function in combination with an INSERT

10.4.2 User-defined Mapping

INSERT INTO ENVELOPEXT (MAILFROM, MAILTO, MAILDATE, SUBJECT) SELECT MAILFROM, MAILTO, MAILDATE, SUBJECT

FROM XMLTABLE(

XMLNAMESPACES('http://www.sal.com/mails' AS "email"), XMLNAMESPACES('http://www.sal.com/mails' AS "email"), '$doc/email:mails/mail' (: some xquery-expression :) PASSING xml-source AS "doc"

COLUMNS

MAILFROM VARCHAR (100) PATH 'envelope/from', MAILTO VARCHAR (100) PATH 'envelope/to',

MAILDATE VARCHAR (30) PATH 'envelope/email:Date', SUBJECT VARCHAR (100) PATH 'envelope/Subject') AS T;

(70)

Example:

10.4.2 User-defined Mapping

mapping instruction XML document

(71)

Mapping instruction

– Example syntax for XML-DBMS (Roland Bourret)

10.4.2 User-defined Mapping

<ClassMap>

<ElementType Name="sales:SalesOrder"/>

<ToClassTable>

Connection between elements

<ToClassTable>

<Table Name="Sales"/>

</ToClassTable>

<PropertyMap>

<Attribute Name="SONumber"/>

<ToColumn>

<Column Name="Number"/>

</ToColumn>

</PropertyMap>

</ClassMap>

between elements and tables

Connection between

elements/attributes and table columns

(72)

Remarks

– Many different mapping languages or schema annotations

• Automatic mappings usually have an internal mapping language

10.4.2 User-defined Mapping

language

– Remember the mapping constructs from last lecture (9). The SQL/XML annotations are a mapping language, too.

– DB2 uses similar annotations as SQL/XML

• see next slide:

(73)

Name Balance

Joe 2000

Jim 3500

Mapping SQL tables

<xsd:complexType xmlns:db2-xdb=

"http://www.ibm.com/xmlns/prod/db2/xdb1"

name="ROW.ACCOUNT">

<xsd:sequence>

<xsd:element name="NAME"

type="CHAR_20"

db2-xdb:rowSet="Account"

db2-xdb:column="Name"/>

<xsd:element name="BALANCE"

type="NUMERIC_12_2"/>

db2-xdb:rowSet="Account"

db2-xdb:column="Balance"/>

</xsd:sequence>

CREATE TABLE Account (

Name CHAR(20),

Balance NUMERIC(12,2), );

Mapping SQL table columns to XML elements

Mapping table rows to XML

<row>

elements

Jim 3500

<ACCOUNT>

<row>

<NAME>Joe</NAME>

<BALANCE>2000</BALANCE>

</row>

<row>

<NAME>Jim</NAME>

<BALANCE>3500</BALANCE>

</row>

</xsd:sequence>

</xsd:complexType>

<xsd:complexType name="TABLE.ACCOUNT">

<xsd:sequence>

<xsd:element name="row"

type="ROW.ACCOUNT"/>

</xsd:sequence>

</xsd:complexType>

<xsd:element name="ACCOUNT"

elements

(74)

Summary schema-based storage with user defined mapping

Schema definition:

Depends on mapping language

Document reconstruction:

Not possible in most cases (requires logging of the mapping process)

Queries:

10.4.2 User-defined Mapping

Queries:

Database queries

XML queries in rare cases only!

Special features:

Integration with existing databases is possible

Efficiency:

High efficiency by using the DB-engine

Usage:

For data-centric XML applications

(75)

10.1 Motivation

10.2 Text-based storage 10.3 Model-based storage

10. XML Storage 1

10.4 Schema-based storage 10.5 Conclusion

10.6 Overview and References

(76)

• Different methods for storage of XML documents

– Text-based

• Storing whole XML documents as string

• Can use full text index or path index

– Model-based

Generic mapping of the tree structure

– Schema-based

10.5 Conclusion

– Schema-based

• Detect and analyse the structure of the XML documents

Derive a DB schema from the structure

– Hybrid approaches

• A combination of some of those methods

– No algorithm has the optimal solution for all kind of XML documents

– Reasonable solution is heavily dependent on the application

(77)

• "XML und Datenbanken" [Tür08]

Can Türker

Lecture, University of Zurich, 2008

• "XML und Datenbanken" [KM03]

M. Klettke, H. Meier dpunkt.verlag, 2003

• "Generierung eines adaptiven Datenbankschemas für datenzentrierte XML- Dokumente" [Bus08]

10.6 References

Dokumente" [Bus08]

Carsten Busche

Diplomarbeit, TU Braunschweig, 2008

• [FK99]

D. Florescu, D. Kossmann: Storing and Querying XML Data using an RDBMS. IEEE Data engineering Bulletin (DEBU), Volume 22(3), Seiten 27-34, 1999.

• [DFS99]

A. Deutsch, M.F. Fernández, D. Suciu: Storing Semistructured Data with STORED.

Proceedings of the 1999 ACM SIGMOD international conference on Management of data, Seiten 431-442, ACM, 1999.

(78)

Introduction and Basics 1. Introduction

2. XML Basics

3. Schema Definition 4. XML Processing Querying XML

Producing XML 9. Producing XML Storing XML

10. XML storage

11.Relational XML

10.6 Overview

Querying XML

5. XPath & SQL/XML Queries

6. XQuery Data Model 7. XQuery

XML Updates

8. XML Updates & XSLT

11.Relational XML storage

12. Storage Optimization Systems

13. Technology Overview

(79)

• Now, or ...

• Room: IZ 232

• Office our: Tuesday, 12:30 – 13:30 Uhr

Questions, Ideas, Comments

• Office our: Tuesday, 12:30 – 13:30 Uhr or on appointment

• Email: eckstein@ifis.cs.tu-bs.de

Referenzen

ÄHNLICHE DOKUMENTE

• EMF is a modelling framework and code generation facility for building tools and other applications based on a structured data model.. • From a model specification described in

rdf:Property rdf:Property rdf:Property rdfs:Resource rdfs:Resource rdf:type The subject is an instance of a class. rdfs:subClassOf The subject is a subclass of

ƒ Datentypen keine eigenständige Objekte: beschreiben immer Inhalt von Element oder Attribut. ƒ Attribut-Werte

• Alle Elemente einer Menge sind spezifiziert. • Zugriff auf

Datentyp: gültiger Inhalt von Elementen oder Attributen Formal repräsentiert ein Datentyp eine Menge von gültigen Werten, den so genannten Wertebereich Wertebereich..

mit einem Wert in keyref kann man nicht ein einzelnes Element einer Liste, die im key steht, referenzieren.  Vorgängerachsen im XPath-Ausdruck von field nicht zugelassen → kein

 Heuristik: Zwei Elemente sind ähnlich, wenn ihre Nachbarn ähnlich

&lt;!ELEMENT order(part+, destination, note?) &gt;.