XML Databases
10 O
Silke Eckstein Andreas Kupfer
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
10 . XML Storage 1 – O verview
10.1 Motivation
10.2 Text-based storage 10.3 Model-based storage
10. XML Storage 1
10.4 Schema-based storage 10.5 Conclusion
10.6 Overview and References
• Applications require different types of XML documents – Structure vs. content
– Regular vs. irregular
• Thus, XML documents are – Data-centric
– Document-centric
– or somewhere in-between
10.1 Motivation
– or somewhere in-between
• Questions
– Storage of XML documents
– Efficient processing of queries on the stored documents or data
• There are several methods for storage – 1st goal: Learn and understand methods – 2nd goal: Classify methods
• Principles
• Advantages and disadvantages
• Usage
• Characterisation of XML documents:
– Data-centric documents
• Structured, regular
• E.g. product catalog, order, invoice
– Document-centric documents
10.1 Motivation
– Document-centric documents
• Unstructured, irregular
• E.g. scientific article, book, email, web page
– Semi-structured documents
• Data-centric and document-centric parts
• E.g. publications, Amazon, MS Press (example chapters)
• Requirements for the physical layer:
– Order preserving and lossless storage of XML documents
– Efficient access to XML documents or parts thereof
Quick response time for
10.1 Motivation
• Quick response time for
– Queries
– Update operations
• Indexing
• Transaction processing
• Support of XPath and XQuery
• Support of SAX and DOM for applications
• Storage approaches for XML documents
– Text-based
• Storage as character data
– Model-based
10.1 Motivation
• On top of the relational DBMS
• Inside the relational DBMS
– Schema-based
• Mapping to (object-)relational databases
– Deriving the database schema from the XML structure – Using user defined mapping procedures
10.1 Motivation
Text-based
storage Model-based
storage Schema-based
storage
– Examples –
10.1 Motivation
10.2 Text-based storage 10.2.1 Index structures
10. XML Storage 1
10.3 Model-based storage 10.4 Schema-based storage 10.5 Conclusion
10.6 Overview and References
• The whole XML document text is stored as character data
– File in the file system
– CLOB (Character-Large-OBject) in the DBS
• Operations on documents as a whole are very efficient
– Reading and writing the whole document
10.2 Text-based storage
Reading and writing the whole document
– But the content is monolithic and opaque with respect to the relational query engine (query can't inspect a fragment)
• Getting granular access requires additional support
– Full text index
– Path index
• Index structures for XML documents
allow efficient access for specific queries
– Different types of indexes are optimized for different types of queries
10.2.1 Index structures
• Generate redundancy
– Index has to be up-to-date by propagating data changes
• Index structures can be storage structures as well
– They define the storage method
• Types of index structures
– Value index
• Indexes atomar values of an XML document, like element content or attribute values
• Index format for structured parts of XML documents
• Already known from databases (B-trees, hash index, …)
– Full text index
10.2.1 Index structures
– Full text index
• Indexes single words from the full text
• Index format for unstructured parts of XML documents
• Already known from Information Retrieval (inverted lists, tries, suffix trees, …)
– Path index
• Indexes subtrees/paths in an XML document
• Index format for semistructured parts of XML documents
• Already known from object-databases (access support relations, …)
• B-tree as value index for an XML fragment document
10. 2.1 Index structures
• Full text index
– Not limited to exact matches
•
Keyword-based search and boolean retrieval
•
Pattern search (with regular expressions)
– Use of
•
Statistical, word-based methods
10. 2.1 Index structures
•
Statistical, word-based methods
– Stop word removal
– Elimination of uncommon items
•
Linguistic methods
– Normalization of words (e.g. capitalisation, hyphenation,)
– Word decomposition by rules (engl.) or dictionaries (german) – Stemming
•
Knowledge-based methods
– Use of ontologies and thesauri to search for synonyms, hypernyms and hyponyms
• Inverted list as full text index for XML
10. 2.1 Index structures
word occurrence word position in the text
10. 2.1 Index structures
word occurrence
word occurrence
• Path index
– Structure information must be identifiable and reconstructable
• Assigning the markup to the content as well as
• Representing the hierarchical nesting and order of
10. 2.1 Index structures
• Representing the hierarchical nesting and order of elements/attributes
– Especially suited for keyword search with regard to structure or path expressions
FOR $b IN //book
WHERE CONTAINS($b/author,"Benjamin") RETURN $b
• Types of path indexes
– Nested path index
•
Access to root node from every node
– Multi-index
•
Accessing parent nodes
10. 2.1 Index structures
•
Accessing parent nodes
– Join-index
•
Access parent and child nodes
– Access Support Relations (ASR)
•
Generalization of indexes above,
by listing all paths in a table
• Conclusion
– Efficient query processing on XML documents requires different types of index structures
– Value index
• For efficient access to structured parts
10. 2.1 Index structures
• For efficient access to structured parts
• Keyword search, value search
– Full text index
• For efficient access to unstructured parts
– Path index
• Using the document structure
• Navigating queries
• Summary text-based storage – Schema definition:
• not required
– Document reconstruction:
• documents stay in their original format
– Queries:
• Information retrieval queries
• Processing the markup of the queries
10.2 Text-based storage
• Processing the markup of the queries
• XML queries possible
– Special features:
• Full text functions
– Efficiency:
• Character string must be parsed on every access with XML processors expensive
• No concurrency on read or write no parallel processing
– Usage:
• For document-centric XML applications
• Suitable to only a limited extent also for semi-structured applications
10.1 Motivation
10.2 Text-based storage 10.3 Model-based storage
10.3.1 On top of the relational DBMS
10. XML Storage 1
10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS
10.4 Schema-based storage 10.5 Conclusion
10.6 Overview and References
• Idea: generic storage of the graph structure
– XML elements, XML attributes, … are nodes of a graph – Nesting of elements defines edges
– Nodes get an (internal) ID based on graph traversal
• Using relations or object classes to store elements and attributes
10.3.1 Model-based storage
attributes
• Document structure can be restored completely
• Extension for data type adapted storage is possible
ID Element name Value Reference to preceeding Rank
ID Attribute name Value Reference to element Elements
Attributes
• The EDGE approach [FK99]
10.3.1 Model-based storage
– Variant BINARY: horizontal partition of EDGE based on label
XML documents
• XML queries
– XML queries (XPath, XQuery) are mapped to SQL
10.3.1 Model-based storage
– XML queries (XPath, XQuery) are mapped to SQL queries (taking storage structures into account)
– Result of XML query is generated from result of database query
• "Labeling" of the result tuples
• Result is in XML format
• Example: list bargain buy with prices
10.3.1 Model-based storage
SELECT a.content, b.content FROM Edge a, Edge b WHERE (a.label = 'price') AND (a.content < 10.00) AND (b.label = 'description')
AND (b.parent = a.parent) AND (a.key = b.key)
10.3.1 Model-based storage
• DOM-based storage
– Information from the
Document Object Model are stored in the database – Storage alternatives
– Storage alternatives
• (Object-)relational databases
• Object-oriented databases
• Developing own data
structure
10.3.1 Model-based storage
Node type:
ELEMENT Node type:
ATTRIBUTE
DOM-based storage – example
Node type:
TEXT
• XML Queries
– XML queries (DOM method invocations) are mapped
10.3.1 Model-based storage
– XML queries (DOM method invocations) are mapped to SQL queries (taking storage structures into
account)
– Result of method invocation is generated from result
of database query
Summary model-based storage on top of the DBMS
–
Schema definition:
• not required for storage
–
Document reconstruction:
• Possible, but expensive
–
Queries:
• XML queries possible
10.3.1 Model-based storage
• XML queries possible
• Adapted database queries
–
Special features:
• Querying many elements/attributes is expensive
–
Efficiency:
• Navigation from the given context is efficient
• Restoring the document and evaluating path expressions is inefficient
–
Usage:
• For data- and document-centric as well as for semi-structured XML applications
10.1 Motivation
10.2 Text-based storage 10.3 Model-based storage
10.3.1 On top of the relational DBMS
10. XML Storage 1
10.3.1 On top of the relational DBMS 10.3.2 Inside the relational DBMS
10.4 Schema-based storage 10.5 Conclusion
10.6 Overview and References
• SQL/XML
– Datatype XML with
10.3.2 XML data type
<City>
<Name>
Braunschweig
</Name>
<Zip>38100</Zip>
<Zip>38106</Zip>
<State>
Niedersachsen
</State>
</City>
<City>
<City>
Storing XML documents inside
the database as values of type XML
Generating XML documents using SQL/XML functions
– Datatype XML with belonging functions – Mapping between
SQL and XML
– Embedding XQuery in SQL
SQL database XML datatype SQL XQuery
Mapping between SQl and XML
• A value of the data type XML can contain
– whole XML document – XML element
– a set of XML elements
• All XML publishing operators from chapter 9.4
10.3.2 XML data type
• All XML publishing operators from chapter 9.4
create values of the data type XML, not a string
10.3.2 XML data type
XML(SEQUENCE)
XML(CONTENT(ANY))
NULL or document node Untyped elements &
attributes,
elements not NULL 1 element child
Validated against schema
XML(CONTENT(XMLSCHEMA)) XML(CONTENT(UNTYPED))
XML(DOCUMENT(ANY))
XML(DOCUMENT(UNTYPED)) XML(DOCUMENT(XMLSCHEMA))
1 element child Validated against schema
1 element child
• Specification of XML type
• Modifiers are optional
• Primary type modifier
10.3.2 XML data type
XML [({DOCUMENT|CONTENT|SEQUENCE}
[({ANY|UNTYPED|XMLSCHEMA schema name})])]
• Primary type modifier
– DOCUMENT (XML document) – CONTENT (XML element)
– SEQUENCE (sequence of XML elements)
• Secondary type modifier
– UNTYPED
– XMLSCHEMA (typed) – ANY (may be typed)
• Create a table that is an XML data type in itself
– CREATE TABLE XMLDOCUMENT OF XMLTYPE;
• Create a table containing an XMLType data type column
10.3.2 XML data type
column
CREATE TABLE XML (
ID NUMBER NOT NULL,
XML XMLTYPE,
CONSTRAINT XPK PRIMARY KEY (ID) );
• Example: Definition of an XML type column
10.3.2 XML data type
ID Name
123 <Groups>Annabelle</Groups>
CREATE TABLE Groups (
ID INTEGER,
Name XML );
123 <Groups>Annabelle</Groups>
234 <Groups>Magdalena, Marius</Groups>
345 <?xml version 1.0?>
<Groups>
<Person>Patrick</Person>
<Person>Robert</Person>
</Groups>
654 <Groups>Rebecca</Groups>
<Groups>Torben</Groups>
• Characteristics
– Allowed values:
• XML documents (including prolog)
• XML content according to XML 1.0 (includes pure text comments, PI?)
• NULL
10.3.2 XML data type
• NULL
– No comparison possible (compare CLOB in SQL)
• User can define an order, if comparison is necessary
– No corresponding type in programming languages for embedding in SQL available
– Standard defines operators to convert to other SQL
data types
• Parsing & Serialization
– XMLParse:
• Parses a string value using an XML parser
• Produces value whose specific type is
10.3.2 XML data type
<City>
<Name>
Braunschweig
</Name>
<Zip>38100</Zip>
<Zip>38106</Zip>
<State>
Niedersachsen
</State>
</City>
• Produces value whose specific type is
XML(DOCUMENT(ANY)), or …CONTENT…, or
– XMLSerialize
• Transforms an XML value into a string value (CHAR, VARCHAR, CLOB, or BLOB)
<City><Name>
Braunschweig
</Name>
<Zip>38100</Zip>
<Zip>38106</Zip>
<State>
Niedersachsen
</State>
• Validation of XML
– Is like integrity constraints in DBs – Requires an XML Schema
– XML Schemas may be registered with the SQL-server
10.3.2 XML data type
XML Schemas may be registered with the SQL-server
• Implementation-defined mechanism
• Known by SQL name & by target namespace URI
– Schema does need a unique name
• Used by XMLValidate(), IS VALID, and to restrict values of
XML(DOCUMENT-or-CONTENT(XMLSCHEMA))
• Schema registration
10.3.2 XML data type
Register XMLSCHEMA
'http://www.Alfred-Moos.de/GrussSchema.xsd' FROM 'file://c:/XML_Schemata/GrussSchema.xsd' AS GrussSchema
COMPLETE
;
CREATE TABLE Dokument_XML
(Dokument_XML_Nr CHAR (4) NOT NULL
PRIMARY KEY, Dokument XML,
CONSTRAINT validieren
CHECK (Dokument IS VALIDATED ACCORDING TO XMLSCHEMA ID GrussSchema
) )
;
• Schema definition
– Syntax
10.3.2 XML data type
XML(CONTENT(XMLSCHEMA) <schema> [<elements>]))
<schema> := URI <namespace> [LOCATION <loc>]
| NO NAMESPACE [LOCATION <loc>]
–
| NO NAMESPACE [LOCATION <loc>]
| ID <registered schema name>
<element> := [NAMESPACE <namespace>]
ELEMENT <element name>
• New functions and predicates:
– XMLValidate
•
Validates an XML value against an XML Schema (or target namespace), returning new XML value with type annotations
– IS VALID
•
Tests an XML value to determine whether or not it is valid
10.3.2 XML data type
•
Tests an XML value to determine whether or not it is valid according to an XML Schema (or target namespace); return
true/falsewithout altering the XML value itself
– IS DOCUMENT
•
determines whether an XML value satisfies the (SQL/XML) criteria for an XML document
– IS CONTENT
•
determines whether an XML value satisfies the (SQL/XML)
criteria for XML content
• Benefits of schema registration
– Security issues
• Schemas cannot “disappear” without SQL- server knowing about it
• Schemas cannot be “hijacked” (altered in
10.3.2 XML data type
• Schemas cannot be “hijacked” (altered in inappropriate ways) without SQL-server knowing about it
• Documents cannot be marked “valid”
against schemas unless SQL-server knows
about them
• Predefined schemas (build-in namespaces)
– xs:http://www.w3.org/2001/XMLSchema
– xsi:http://www.w3.org/2001/XMLSchema-instance
– sqlxml:http://standards.iso.org/iso/9075/2003/sqlxml
– More depending on the DB implementation
10.3.2 XML standard
More depending on the DB implementation
• Completely supported per XML+Namespaces:
– XMLElement, XMLForest, XMLTable
– Default namespace, explicit namespace (prefix)
– Declare namespace within scopes of WITH clause, column definitions, constraint definitions, insert/delete/update
statements, compound statements
• SQL/XML standard published as
– ISO/IEC 9074-14:2003
• Mappings and Publishing Functions
– ISO/IEC 9075-14:2006
• Adds XQuery, including Data Model, Validation
10.3.2 SQL/XML standard
• Adds XQuery, including Data Model, Validation
– ISO/IEC 9075-14:2008
• Updates
• SQL/XML:2003 plus
– Additional publishing functions – XQuery data model
– More precise XML type (modifiers) – XMLQuery, XMLTable
10.3.2 SQL/XML standard
– XMLQuery, XMLTable – XMLValidate, IS VALID
– XMLExists, IS DOCUMENT, IS CONTENT
– Casting between XML type and SQL types
• Overview of some operators for the XML type
– XMLELEMENT – creates an XML element node
– XMLFOREST – creates a sequence of XML element nodes from a table
10.3.2 SQL/XML standard
from a table
– XMLCOMMENT – creates an XML comment node – XMLTEXT – creates a text node
– XMLPI – creates a processing instruction
– XMLAGG – aggregates XML values of a group
– XMLCONCAT – concatenates XML type values
– XMLTRANSFORM – applies an XSL to a document
• ... Overview of some operators for the XML type
– XMLPARSE – a well-formed SQL text to XML value
– XMLSERIALIZE – converts an XML value to a SQL text – XMLDOCUMENT – creates an XML document node
10.3.2 SQL/XML standard
– XMLDOCUMENT – creates an XML document node from an XML value
– XMLVALIDATE – validates an XML value with a schema – XMLQUERY – evaluates an XQuery expression
– XMLTABLE – transforms an XQuery result to a SQL table – XMLITERATE – transforms an XQery sequence to a SQL
table
• Review of SQL/XML
– Two components
•
A data type XML to store XML data
•
Functions to map relational structures to XML
– Only construction operators
•
No extraction of values or search
10.3.2 SQL/XML standard
•
No extraction of values or search
•
But construction operators are based on XQuery
– Mapping of tables, schemas, catalogues ignores some information from the relational schema
•
UNIQUE
•
REFERENCES
•
CHECK
– Further extensions are expected
Summary model-based storage inside the DBMS
– Schema definition:
• not required for storage
– Document reconstruction:
• Possible without problems
– Queries:
10.3.2 Model-based storage
– Queries:
• XML queries possible
– Special features:
• Full integration of SQL and XML
– Efficiency:
• Vendor specific
• Optimized data structures possible
– Usage:
• For data- and document-centric as well as for semi-structured XML applications
10.1 Motivation
10.2 Text-based storage 10.3 Model-based storage 10.4 Schema-based storage
10. XML Storage 1
10.4 Schema-based storage
10.4.1 Automatic mapping 10.4.2 User-defined mapping 10.5 Conclusion
10.6 Overview and References
• Motivation
–
XML content shall be stored in a conventional database
–Accepting the loss of native access
–
DB schema is derieved from a DTD or an XML schema
• Problem
–
Generate DB schema automatically
10.4.1 Schema-based storage
Generate DB schema automatically
–
Thereby use as much structure information as possible
• General approach for mapping from a DTD
–
Transform DTD into a tree representation
–
Nodes: element types, attributes, etc. (type layer!!!)
–
Edges: nesting relationships of element types and their restrictions
–Traverse tree in order to transform nodes and edges into database
tables (according to certain rules)
• Generating the DB schema for a DTD:
– Rules to map element types:
10.4.1 Schema-based storage
XML element type column of a table
Sequence of element types columns of a table Alternative of element types column of a table
Element type with quantifier ? column with null values
– Rules to map attributes:
Element type with quantifier ? column with null values
Element type with quantifier +,* set/list of columns (SET OF, LIST OF)
Nested element types TUPLE OF
XML attribute column of a table
IMPLIED null values allowed
REQUIRED null values not allowed
Default value DEFAULT constraint
• Mapping to relational databases
– DTD is usually required
– Queries use SQL functionality
– RDBMS data types are used (e.g. prices are NUMERIC) – Problem: Mapping of collection types
• Subdivide into additional relations
10.4.1 Schema-based storage
• Subdivide into additional relations
– Example:
Comment_ID Customer_info Feedback
44901 C0001 F0001
ID Fname Lname Email
C0001 Charles Sanchez C.Sanchez@hotmail...
ID Type Content
F001 opinion Darjeeling Special…
Comment:
Customer_Info:
Feedback:
• Mapping to object relational databases
– DTD is usually required
– Queries use SQL functionality – "Natural" mapping to tuple
types, collection types
10.4.1 Schema-based storage
types, collection types
– In case of irregular document structure databases contain many null values.
Comment_ID <Customer_info> <Feedback>
44901
Fname Lname Email
Charles Sanchez C.Sanchez@hotmail...
Type Content
opinion Darjeeling Specia…
Comment:
10.4.1 Schema-based storage
• Mapping of recursive data definitions
– DTDs can be recursive
– Infinite recursion is impossible on instance layer of a database – Procedure:
• Marking the nodes
• Subdividing into separate tables
10.4.1 Schema-based storage
• Subdividing into separate tables
• Use primary and foreign keys in RDBMS
• Use reference types in ORDBMS
<!ELEMENT book (front, body, references)>
<!ELEMENT references (book+)>
• Mapping of element sequences
– Sequence can be important
• Use an additional attribute in these cases
– Example:
10.4.1 Schema-based storage
Order Lesson
1 Introduction
2 XML basics
<lecture>
<lesson>Introduction</lesson>
<lesson>XML basics</lesson>
…
⇓ ⇓
⇓ ⇓
⇓ ⇓
⇓ ⇓
• Mapping of alternatives
– XML allows to specify alternatives – Example:
10.4.1 Schema-based storage
<!ELEMENT car (compactCar | sedan | van)*>
– Three possible storage variants
• Each alternative is stored as separate table column
• Subdivide alternatives in separate tables
• Use a table column of type XML type
• Variant 1 – all alternatives in one table
10.4.1 Schema-based storage
•
– Problem: many null values (wasting storage space)
• Variant 2 – subdivided into multiple tables
10.4.1 Schema-based storage
•
– For queries, combination of tables is needed
• Variant 3 – Using column type XML
10.4.1 Schema-based storage
– XML type allows XML queries or DOM methods
• Mapping with STORED (Semistructured TO RElational Data)
– Basic idea: Use data mining techniques on the XML structure to find a good mapping to tables [DFS99]
– Input
• XML documents (or an average sample of the collection)
• Query workload
• Restrictions of storage space, number of tables, …
10.4.1 Schema-based storage
• Restrictions of storage space, number of tables, …
• No DTD or XML schema is required!
– Output
• Relational schema
• STORED-queries: Mapping instructions for XML documents to DB tables
– Procedure
• Determine the XML subtrees with the largest support in the collection and in the queries
• These subtrees are materialised in tables
• Irregular data is stored in overflow tables according to the EDGE approach
• Mapping with STORED – example
10.4.1 Schema-based storage
Subtrees with high support Subtrees with
high support
XML documents shown as tree structure
• Schema-based storage with automatic mapping
– Advantages
•
Queries, data types, aggregation functions, views
•
Integration in other databases when storing structured data
10.4.1 Schema-based storage
•
Integration in other databases when storing structured data
– Disadvantages
•
Large schema, sparsely filled databases (many null values)
•
No flexible data types, storage of alternatives has problems
•
Less flexible queries
– No information retrieval queries possible without additional extensions – No full text operations for semi- or unstructured data
• Summary Schema-based storage with automatic mapping – Schema definition:
• Is usually required and analysed
• not required, e.g. for STORED
– Document reconstruction:
• Limited (requires logging of the mapping process)
– Queries:
10.4.1 Schema-based storage
– Queries:
• Database queries
• XML queries possible,but lack the XPath horizontal axes, e.g. following, preceding-sibling
– Special features:
• Federation with existing databases is possible
– Efficiency:
• High efficiency by using the DB-engine
– Usage:
• For data-centric XML applications, but with limited nesting
• User defined mapping
– Idea
• In all previously shown methods it is not possible to affect the storage in the DB
• With user defined mappings the user defines the storage structure
• The structure of XML documents and database schema can be designed
10.4.2 User-defined Mapping
• The structure of XML documents and database schema can be designed independently from each other
• Also possible: storing XML documents in existing databases
– Annotation of DTD and XML schema, respectively
• In many cases the mapping definition is combined with existing schema information
– Only limited XML queries possible
• Logging of the mapping process from XML documents to databases
• For a given query all relevant data has to be stored (lossless mapping)
• Mapping solutions with different specializations
– Algorithms, middleware, commercial applications, … – Varying amount of required input or user decisions – Many algorithms create different database schemas
10.4.2 User-defined Mapping
Many algorithms create different database schemas
• Two phases
– Mapping
• Assign a place for each node type in the DB
– Shredding
• Import the XML data as DB tuples
• The shredder can be part of the DB
– Usually requires an XML schema
– In the IBM Data Studio, the shredder is part of the
"annotated XML schema decomposition"
10.4.2 User-defined Mapping
– Direct approach in DB2:
• register the XML schema and call the stored procedure:
register xmlschema http://our.org/custacc from dec_files/custacc.xsd as cust_schema ;
complete xmlschema cust_schema enable decomposition ;
call SYSPROC.XDBDECOMPXML ('VRODRIG', 'CUST_SCHEMA', ? ,
?, 1, null, null, null)
• Shredding without XML schema in DB2
– XMLTABLE function in combination with an INSERT
10.4.2 User-defined Mapping
INSERT INTO ENVELOPEXT (MAILFROM, MAILTO, MAILDATE, SUBJECT) SELECT MAILFROM, MAILTO, MAILDATE, SUBJECT
FROM XMLTABLE(
XMLNAMESPACES('http://www.sal.com/mails' AS "email"), XMLNAMESPACES('http://www.sal.com/mails' AS "email"), '$doc/email:mails/mail' (: some xquery-expression :) PASSING xml-source AS "doc"
COLUMNS
MAILFROM VARCHAR (100) PATH 'envelope/from', MAILTO VARCHAR (100) PATH 'envelope/to',
MAILDATE VARCHAR (30) PATH 'envelope/email:Date', SUBJECT VARCHAR (100) PATH 'envelope/Subject') AS T;
• Example:
10.4.2 User-defined Mapping
mapping instruction XML document
• Mapping instruction
– Example syntax for XML-DBMS (Roland Bourret)
10.4.2 User-defined Mapping
<ClassMap>
<ElementType Name="sales:SalesOrder"/>
<ToClassTable>
Connection between elements
<ToClassTable>
<Table Name="Sales"/>
</ToClassTable>
<PropertyMap>
<Attribute Name="SONumber"/>
<ToColumn>
<Column Name="Number"/>
</ToColumn>
</PropertyMap>
</ClassMap>
between elements and tables
Connection between
elements/attributes and table columns
• Remarks
– Many different mapping languages or schema annotations
• Automatic mappings usually have an internal mapping language
10.4.2 User-defined Mapping
language
– Remember the mapping constructs from last lecture (9). The SQL/XML annotations are a mapping language, too.
– DB2 uses similar annotations as SQL/XML
• see next slide:
Name Balance
Joe 2000
Jim 3500
Mapping SQL tables
<xsd:complexType xmlns:db2-xdb="http://www.ibm.com/xmlns/prod/db2/xdb1"
name="ROW.ACCOUNT">
<xsd:sequence>
<xsd:element name="NAME"
type="CHAR_20"
db2-xdb:rowSet="Account"
db2-xdb:column="Name"/>
<xsd:element name="BALANCE"
type="NUMERIC_12_2"/>
db2-xdb:rowSet="Account"
db2-xdb:column="Balance"/>
</xsd:sequence>
CREATE TABLE Account (
Name CHAR(20),
Balance NUMERIC(12,2), );
Mapping SQL table columns to XML elements
Mapping table rows to XML
<row>
elements
Jim 3500
<ACCOUNT>
<row>
<NAME>Joe</NAME>
<BALANCE>2000</BALANCE>
</row>
<row>
<NAME>Jim</NAME>
<BALANCE>3500</BALANCE>
</row>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="TABLE.ACCOUNT">
<xsd:sequence>
<xsd:element name="row"
type="ROW.ACCOUNT"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="ACCOUNT"
elements
• Summary schema-based storage with user defined mapping
–
Schema definition:
• Depends on mapping language
–
Document reconstruction:
• Not possible in most cases (requires logging of the mapping process)
–
Queries:
10.4.2 User-defined Mapping
–
Queries:
• Database queries
• XML queries in rare cases only!
–
Special features:
• Integration with existing databases is possible
–
Efficiency:
• High efficiency by using the DB-engine
–
Usage:
• For data-centric XML applications
10.1 Motivation
10.2 Text-based storage 10.3 Model-based storage
10. XML Storage 1
10.4 Schema-based storage 10.5 Conclusion
10.6 Overview and References
• Different methods for storage of XML documents
– Text-based
• Storing whole XML documents as string
• Can use full text index or path index
– Model-based
• Generic mapping of the tree structure
– Schema-based
10.5 Conclusion
– Schema-based
• Detect and analyse the structure of the XML documents
• Derive a DB schema from the structure
– Hybrid approaches
• A combination of some of those methods
– No algorithm has the optimal solution for all kind of XML documents
– Reasonable solution is heavily dependent on the application
• "XML und Datenbanken" [Tür08]
– Can Türker
– Lecture, University of Zurich, 2008
• "XML und Datenbanken" [KM03]
– M. Klettke, H. Meier – dpunkt.verlag, 2003
• "Generierung eines adaptiven Datenbankschemas für datenzentrierte XML- Dokumente" [Bus08]
10.6 References
Dokumente" [Bus08]
– Carsten Busche
– Diplomarbeit, TU Braunschweig, 2008
• [FK99]
– D. Florescu, D. Kossmann: Storing and Querying XML Data using an RDBMS. IEEE Data engineering Bulletin (DEBU), Volume 22(3), Seiten 27-34, 1999.
• [DFS99]
– A. Deutsch, M.F. Fernández, D. Suciu: Storing Semistructured Data with STORED.
Proceedings of the 1999 ACM SIGMOD international conference on Management of data, Seiten 431-442, ACM, 1999.