XML Databases
Silke Eckstein Andreas Kupfer
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
9. Producing XML, 06.01.2010
9.1 Introduction
9.2 Mapping tables
9.3 Individual mapping
9. Producing XML
9.4 Mapping query results 9.5 Overview
9.6 References
• Why map relational database contents to XML?
– Interoperability
• We may want to use (parts of) our RDB contents in many
different application contexts (XML as data interchange format)
– Reconstruction
9.1 Introduction
– Reconstruction
• We might have stored (parts of) our XML documents in an RDBMS in the first place (RDBMS as XML store)
– Dynamic XML contents
• We may use RDBMS queries to retrieve dynamic XML contents (cf. dynamic Web sites)
– Wrapping
• Everybody likes XML …, so why don't we give it to them?
• Why do we look at that mapping?
– What we're really interested in is the mapping in the opposite direction:
How to get XML into a database!
– Yes, but…
9.1 Introduction
– Yes, but…
• This one is easier to start with.
• We do get some insight for the other mapping.
• We can see some of the problems.
• We'll see in what respect XML supports semi-structured data.
We'll learn more about SQL as well.
• XML vs. SQL
– Hierarchical vs. flat
– Loose schema vs. fixed schema
– Case-sensitive names vs. case-insensitive names
9.1 Introduction
Case-sensitive names vs. case-insensitive names – Several type concepts
– Strictly unicode based vs. heterogeneous encodings – Ordered vs. unordered
• Mapping of databases to XML
– Standard mapping of tables
– Individual mapping instructions
– Standard + individual mapping of query results
9.2 Mapping tables
9.1 Introduction
9.2 Mapping tables
9.3 Individual mapping
9. Producing XML
9.4 Mapping query results 9.5 Overview
9.6 References
• General approach
– How to map
• Table and column names to element and attribute names
• Data types to XML schema data types
• Data from the database to content in XML documents
9.2 Mapping tables
• Data from the database to content in XML documents
• Standard mapping of tables
– Database as a 3-tier hierarchy of
• Database
• Tables
• Columns
9.2 Mapping tables
Database
Table_1
Column_11 Column_12 Column_13
Table_2
Coulmn_21 Column_2o
Table_n
Column_n1 Column_np
• Columns
– Presentation of database contents and structures in an XML document
• As elements
• As elements and attributes
Column_11 Column_12 Column_13 Coulmn_21 Column_2o Column_n1 Column_np
• Mapping of SQL database to XML
– SQL charset to unicode (depends on implementation) – SQL identifiers to XML names
– SQL data types to XML schema data types
9.2 Mapping tables
SQL data types to XML schema data types – SQL values to XML values
– SQL tables to XML and XML schema documents – SQL schemas to XML and XML schema documents – SQL catalogues to XML and XML schema documents
• Mapping between SQL identifiers and XML names
• Charset
– XML is based on unicode, SQL is not
– Mapping between a SQL charset and unicode depends on the implementation
• Names
– Not every SQL name is allowed as XML name (delimited identifier)
9.2 Mapping tables
– Not every SQL name is allowed as XML name (delimited identifier) – Certain characters have to be masked (especially ":" and leading
"xml" )
• Ω _x03A9_ (the unicode value)
• Gehalt:FY2000 Gehalt_x003A_FY2000
• Work@home Work_x0040_home
• Home_Town Home_x005F_Town
– SQL name becomes XML name (in capitals)
• Employee <EMPLOYEE>…</EMPLOYEE>
• Mapping data type names
– Data type parameters are added to the XML name
• Example: DECIMAL_9_2, VARCHAR_10, BLOB_4000
• For TIME, TIMESTAMP, INTERVAL parameters are added about accuracy and time zone if applicable
– DOMAIN d in schema s in catalogue c is transformed to the XML name c.s.d (fully masked)
9.2 Mapping tables
XML name c.s.d (fully masked)
– DISTINCT TYPE is analogous to domain
– ARRAY of the type t with maximal m elements: ARRAY_m.t – MULTISET is analogous to ARRAY
– ROW types: depends on implementation, but begin with prefix Row
– Interval types, structured types and reference types can not be mapped
• Mapping of data types
– Using XML schema
• Many predefined data types
– <xsd:element name="city" type="xsd:string">
– <xsd:element name="zip" type="xsd:integer">
9.2 Mapping tables
– <xsd:element name="zip" type="xsd:integer">
– Extension of DTDs by reserved attributes
• <city xml-sqltype="varchar">Braunschweig</city>
• <zip xml-sqltype="integer">38100</zip>
– XML processors or applications have to know about and evaluate those information
• ... Mapping of data types
– Differences between SQL and XML schema
• SQL data types are mapped to the best matching XML schema type
• Facets limit the range of the XML schema type to the SQL range
9.2 Mapping tables
length scale maxElements domainName
maxLength minExponent final typeName
characterSet maxExponent catalogName mappedType
• Characteristics not mapped (like collation and character set) are annotated
– Cooperation of the namespaces XML schema and SQL/XML
• To define the SQL base data types und type constructors in XML
• Allows to map constructed and user defined SQL data types
characterSet maxExponent catalogName mappedType
collation userPrecision schemaName mappedElementType precision leadingPrecision
• Mapping of SQL basic data types: Character
– CHARACTER(20) CHARACTER SET LATIN1 COLLATION ENGLISH
<xsd:simpleType name="CHAR_20">
<xsd:annotation>
<xsd:appinfo>
<sqlxml:sqltype kind="PREDEFINED"
9.2 Mapping tables
Exact type description Exact type description
based on the data
<sqlxml:sqltype kind="PREDEFINED"
name="CHAR" length="20"
characterSetName="LATIN1"
collation="ENGLISH"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:restriction base="xsd:string">
<xsd:length value="20"/>
</xsd:restriction>
</xsd:simpleType>
based on the data type or type constructor from
SQL/XML
Mapping to data type of XML schema
• Mapping of SQL basic data types: Numeric
– NUMERIC(12,2) with an implementation using 13 decimals
9.2 Mapping tables
<xsd:simpleType name="NUMERIC_12_2">
<xsd:annotation>
<xsd:appinfo>
<sqlxml:sqltype kind="PREDEFINED"
name="NUMERIC"
There are different There are different attributes defined for name="NUMERIC"
precision="12"
scale="2"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:restriction base="xsd:decimal">
<xsd:totalDigits value="13"/>
<xsd:fractionDigits value="2"/>
</xsd:restriction>
</xsd:simpleType>
attributes defined for the various SQL data
types, like precision and fraction digits for
numeric data types
• Mapping of SQL basic data types: DECIMAL(12,2) ARRAY [10]
<xsd:complexType name="ARRAY_10.DECIMAL_12_2">
<xsd:annotation>
<xsd:appinfo>
<sqlxml:sqltype kind="ARRAY"
maxElements="10"
mappedElementType="NUMERIC_12_2"/>
9.2 Mapping tables
mappedElementType="NUMERIC_12_2"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:sequence>
<xsd:element name="Element"
minOccurs="0"
maxOccurs="10"
type="NUMERIC_12_2">
</xsd:element>
</xsd:sequence>
</xsd:complexType>
Maximum length of the array
• Mapping of SQL basic data types: CHARACTER(20) MULTISET
9.2 Mapping tables
<xsd:complexType name="MULTISET.CHAR_20">
<xsd:annotation>
<xsd:appinfo>
<sqlxml:sqltype kind="MULTISET"
mappedElementType="CHAR_20"/>
</xsd:appinfo>
</xsd:appinfo>
</xsd:annotation>
<xsd:sequence>
<xsd:element name="Element"
minOccurs="0"
maxOccurs="unbounded"
type="CHAR_20">
</xsd:element>
</xsd:sequence>
</xsd:complexType>
Unlimited cardinality
• Example: Mapping as elements Pizza Delivery: Address:
9.2 Mapping tables
Pid Name Category Address
1 Super Pizza 4 1
2 Turbo Pizza 3 2
Aid City Street Number
1 Braunschweig Big str. 55 2 Braunschweig Small str. 6
– Standard mapping of column names to element names
2 Turbo Pizza 3 2 2 Braunschweig Small str. 6
<PizzaDelivery>
<Pid>1</Pid>
<Name>Super Pizza</Name>
<Category>4</Category>
<Location>1</Location>
</PizzaDelivery>
• Example: Mapping as elements and attributes Pizza Delivery: Address:
9.2 Mapping tables
Pid Name Category Address
1 Super Pizza 4 1
2 Turbo Pizza 3 2
Aid City Street Number
1 Braunschweig Big str. 55 2 Braunschweig Small str. 6
– Standard mapping of the column names to attribute names
<PizzaDelivery Pid='1'
Name='Super Pizza' Category='4'
Location='1' />
• Mapping of keys and foreign keys
– Goal: Keys and foreign keys shall be represented appropriately in the XML document
– Mapping relations to element hierachy
• 1:n relations are mapped to nested elements
9.2 Mapping tables
• n:m relations are problematic
– Denormalized content in the XML document – possibly high redundancy
– Using XML schema
• Representing keys and foreign keys with key / keyref
– Using DTD
• Mapping keys and foreign keys to attributes (ID/IDREF)
• Make keys unique (ID must be unique document wide)
• Keys and foreign key as nested elements
9.2 Mapping tables
Pid Name Category Address
1 Super Pizza 4 1
2 Turbo Pizza 3 2
Aid City Street Number
1 Braunschweig Big str. 55 2 Braunschweig Small str. 6 FK
– Resolving of foreign key relations by embedding referenced elements
<!ELEMENT PizzaDelivery (Pid, Name, Category?, Address)>
<!ELEMENT Address (Aid, City, Street, Number)>
<!ELEMENT Pid (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Category (#PCDATA)>
<!ELEMENT Aid (#PCDATA)>
<!ELEMENT City (#PCDATA)>
• Mapping object relational databases
– Object relational databases support non atomic, complex columns
• Tuple values
• Collection values
9.2 Mapping tables
• Collection values
• Object values
• Reference values
– Existing structure shall be carried over to the XML document
• Instance layer
• Schema layer
• Mapping object relational databases
– Instance layer:Appropriate mapping of instances with complex attributes (tuple, sets or lists)
9.2 Mapping tables
Pid Name <Address> {Phone}
1 Super Pizza 4City Big str.Street 55Number 1
1 Super Pizza BS Big str. 55 {'0531/555-7447'
1 Super Pizza BS Big str. 55 {'0531/555-7447'
'0531/555-2232'}
<PizzaDelivery>
<Pid>1</Pid>
<Name>Super Pizza</Name>
<Address>
<City>BS</City>
<Street>Big str.</Street>
<Number>55</Number>
</Address>
<Phone>0531/555-7447</Phone>
• Mapping object relational databases
– Schema layer: Derive XML schema or DTD from object relational schema
9.2 Mapping tables
CREATE ROW TYPE AddressType(
City VARCHAR(25), Street VARCHAR(20), Number INTEGER);
CREATE TABLE PizzaDelivery(
CREATE TABLE PizzaDelivery(
Pid INTEGER NOT NULL PRIMARY KEY, Name VARCHAR(20) NOT NULL,
Address AddressType,
Phone SET(INTEGER NOT NULL));
<!ELEMENT PizzaDelivery (Pid, Name, Address, Phone+)>
<!ELEMENT Address (City, Street, Number)>
<!ELEMENT Pid (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Phone (#PCDATA)>
<!ELEMENT City (#PCDATA)>
…
• Mapping tables – summary
– Database output – complete – Required information – none – Variable output format – no
– Preservation of data types – by extended DTD or XML schema – Storing keys and foreign keys
9.2 Mapping tables
– Storing keys and foreign keys
• Mapping to hierachies in the XML document
• ID/IDREF
• XML schema
– Special features
• Parsers must evaluate DTD extensions
• XML schema is better suited
• Appropriate represenation of tuples, sets, lists and references from object relational databases in the XML document
9.1 Introduction
9.2 Mapping tables
9.3 Individual mapping
9. Producing XML
9.4 Mapping query results 9.5 Overview
9.6 References
• Variants for individual mapping instructions
1 2
Extended database query with transformation instructions
Standard
9.3 Individual mapping
2
3
Standard
transformation XML query language
Standard
transformation XSLT stylesheet
Database
XML document XML view
Standardized document Standardized
XML document
• (Extended) database query languages
9.3 Individual mapping
– First Idea: use basic SQL
Database
XML document
• Creating XML by using basic SQL
– Basic idea is to use string concatenation in the select clause to build XML markup
• || operator from the SQL standard
9.3 Individual mapping
postgres=# SELECT 'abc' || 'def' AS "unspecified";
• Concatenation requires values to be not NULL
– Implementation independent
SELECT '<pizzerias><name>'||Name||'</name>',
'<category>'||Category||'</category></pizzerias>' FROM PizzaDelivery;
postgres=# SELECT 'abc' || 'def' AS "unspecified";
Result:
abcdef
• Problems and limitations of basic SQL
– Valid XML documents require a header and one root element
• Workaround with additional SELECT statements
– Nesting is difficult
9.3 Individual mapping
• The output is a single table
• No dependance on previous or following SELECT statements
• Extend an SQL dialect by XML operators:
– Database query (in SQL) to select the data to be presented in XML
– XML query to define the syntax of the target format (e.g.
CONSTRUCT / RETURN / FOR XML)
• Example
Select * from PizzaDelivery, Address where (Address=Aid)
construct
<PizzaDelivery>
9.3 Individual mapping
<PizzaDelivery>
<name>
{$name}
</name>
<city>{$city}</city>
<address>{$street}
{$number}</address>
</PizzaDelivery>
Access to columns from the SQL result
Arbitrary XML names Arbitrary XML element and attribute
names
• Individual mapping instructions and XML queries
– Query to the database with XML query languages is supported
9.3 Individual mapping
Standard
Database
XML document XML view
Standard
transformation XML query language
• Implementing XML queries
– Naive Implementation
• Full content of the database is transformed to an XML document
• XML query is processed on this document
9.3 Individual mapping
• XML query is processed on this document
• Very inefficient (XML document can be very large, can contain irrelevant information)
– Better: XML view on the data in a database
• XML document will not be materialized (virtual)
• XML queries are processed on the virtual document Much processing within the native database engine
• Example: Silkroute
– Silkroute
• Middleware beween RDBMS and XML application
• Develloped by AT&T and University of Pennsylvania
– Creation of an XML view
9.3 Individual mapping
– Creation of an XML view
• Arbitrary output format
• View is not materialized
– XML queries to the view with XML-QL or XQuery – Problems
• Derive SQL queries from a view definition
• Processing of the whole view usually not neccessary
• Silkroute example
9.3 Individual mapping
Construct
<view>
from Address a, PizzaDelivery d construct
<PizzaDelivery>
<name>$d.name</name>
<address>
<city>$a.city</city>
<street>a.$street
a.$number</street>
</PizzaDelivery>
• Benefits of using standardized XML documents as intermediate format
– Less requirements, more flexibility, better compatibility
9.3 Individual mapping
Database
XML document
Standardized document Standardized
XML document Standard
transformation XSLT stylesheet
• Individual mapping instructions – summary
– Database output – complete or partial – Required information
• XML query
• XSLT (see last lecture)
• View definition
9.3 Individual mapping
• View definition
– Variable output format – yes
– Preservation of data types – usually not, but possible to deduce
– Storing keys and foreign keys
• Embedding, ID/IDREF, XML schema
– Special features
• none
9.1 Introduction
9.2 Mapping tables
9.3 Individual mapping
9. Producing XML
9.4 Mapping query results 9.5 Overview
9.6 References
• Mapping of SQL query results to XML documents or XML elements
– Standard XML representation
• Result table to rowset elements
• Every row to a row element
9.3 Mapping query results
• Every row to a row element
• Columns to subelements or XML attributes
• No variable structure
– Similar process is part of the SQL/XML standard
• SQL
• XML
9.4 Mapping query results
SELECT Name, Category, City FROM PizzaDelivery, Address WHERE City='Braunschweig' AND
PizzaDelivery.Address=Address.Aid
• XML
?<rowset>
<row no='1'>
<Name>Super Pizza</Name>
<Category>4</Category>
<City>Braunschweig</City>
</row>
</rowset>
…
?
• Creating XML by using defaults from the DBMS
– Implementation dependant, e.g.: MySQL has an option to produce XML output
• Shell command to execute a query with username and database name, piped to a file
9.4 Mapping query results
./mysql -ujon test --xml -e 'SELECT * FROM t1' > t1.xml
<?xml version="1.0"?>
– No need to dump the whole database (mysqldump --xml …)
<?xml version="1.0"?>
<resultset statement="SELECT * FROM t1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row> <field name="Pid">1</field>
<field name="Name">Super Pizza</field>
<field name="Category">4</field>
<field name="Location">1</field> </row>
<row>…</row>
</resultset>
• Mapping query results – summary
– Database output – partial
– Required information – database query or view – Variable output format – no
– Preservation of data types – usually not, but possible
9.4 Mapping query results
– Preservation of data types – usually not, but possible to deduce
– Storing keys and foreign keys
• Possible to represent relations within the table
– Embedding, ID/IDREF, XML schema
– Special features
• none
• Mapping query results – summary
– Database output – partial
– Required information – database query or view – Variable output format – yes SQL/XML
– Preservation of data types – usually not, but possible
9.4 Mapping query results
– Preservation of data types – usually not, but possible to deduce
– Storing keys and foreign keys
• Possible to represent relations within the table
– Embedding, ID/IDREF, XML schema
– Special features
• none
• XMLELEMENT creates an XML element
– Example: creating name and content
9.4 Publishing relational data
XMLELEMENT( NAME "City", 'Bad Oeynhausen' ) Creates
<City>Bad Oeynhausen</City>
– Can contain attributes, comments and other elements and options
<City>Bad Oeynhausen</City>
XMLELEMENT( NAME "City", XMLCOMMENT ( "Example 2" ), XMLATTRIBUTES('Bayern' AS "State",
'80469' AS "Zip" ),'München' ) Creates
<City State="Bayern" Zip="80469"><!– Example 2 -->
München</City>
• XMLELEMENT referencing the database
– Can be used directly from an SQL statement
9.4 Publishing relational data
SELECT XMLELEMENT( NAME "City",
XMLCOMMENT ( "Example 3" ),
XMLATTRIBUTES( "State", "Zip" AS "PLZ" ),
"City" )
• Creates
"City" ) FROM Cities WHERE …;
<City STATE="Niedersachsen" PLZ="38100">
<!– Example 3 -->
Braunschweig
</City>
• XMLELEMENT nesting
– Example
9.4 Publishing relational data
SELECT XMLELEMENT( NAME "City",
XMLELEMENT( NAME "Name", "City" ), XMLELEMENT( NAME "State", "State" ), XMLELEMENT( NAME "Zip", "Zip" ) )
• Creates
XMLELEMENT( NAME "Zip", "Zip" ) ) FROM Cities WHERE …;
<City>
<Name>Braunschweig</Name>
<State>Niedersachsen</State>
<Zip>38100</Zip>
</City>
• XMLELEMENT syntax diagram
9.4 Publishing relational data
• XMLFOREST
– Constructs a forest of elements without attributes
• Creates
9.4 Publishing relational data
SELECT XMLFOREST ( "City", "State" ) FROM Cities;
• Creates
<City>Braunschweig</City><State>Niedersachsen</State>
<City>Braunschweig</City><State>Niedersachsen</State>
<City>Braunschweig</City><State>Niedersachsen</State>
<City>Hannover</City><State>Niedersachsen</State>
• XMLFOREST syntax diagram
9.4 Publishing relational data
• XMLCONCAT
– Concatenates multiple XML fragments into a single XML pattern
– Compare outputs
9.4 Publishing relational data
SELECT XMLELEMENT("city", City) AS "CITY", SELECT XMLELEMENT("city", City) AS "CITY",
XMLELEMENT("zip", Zip) AS "ZIP",
XMLELEMENT("state", State) AS "STATE"
FROM Cities;
SELECT XMLCONCAT(
XMLELEMENT("city", CITY), XMLELEMENT("zip", ZIP),
XMLELEMENT("state", STATE) ) FROM Cities;
• XMLAGG
– Aggregates seperate lines of output into a single string
9.4 Publishing relational data
SELECT CITY, XMLAGG(
XMLELEMENT(NAME "Zip", Zip)) AS
"Zipcodes"
• Creates
City Zipcodes
Braunschweig <Zip>38100</Zip>
<Zip>38106</Zip>
Hannover <Zip>30159</Zip>
FROM Cities
GROUP BY City;
• XMLAGG
– Allows sorting
9.4 Publishing relational data
SELECT XMLAGG(
XMLELEMENT("address", Zip||' '||City) ORDER BY Zip DESC)
FROM Cities;
• Creates
– Disadvantage: Can only aggregate a single element, and thus fields are concatenated
FROM Cities;
<address>38106 Braunschweig</address>
<address>38100 Braunschweig</address>
<address>30159 Hannover</address>
9.1 Introduction
9.2 Mapping tables
9.3 Individual mapping
9. Producing XML
9.4 Mapping query results 9.5 Overview
9.6 References
• Conclusion
– Different methods to generate XML documents from stored information exist
• Standard mapping of tables to XML
– Standard XML document: fixed output format
• Individual mapping instructions
9. Producing XML
• Individual mapping instructions
– XML views on database contents
– XML query languages processed on views – Mapping using XSLT
• Standard mapping of query results
– Extend database queries by XML functionality – Standard or individual XML document
– Focus on relational and object relational databases
Introduction and Basics 1. Introduction
2. XML Basics
3. Schema Definition 4. XML Processing Querying XML
Producing XML 9. Producing XML Storing XML
10.XML storage
11. Relational XML storage
9.5 Overview
Querying XML
5. XPath & SQL/XML Queries
6. XQuery Data Model 7. XQuery
XML Updates
8. XML Updates & XSLT
11. Relational XML storage 12. Storage Optimization Systems
13. Technology Overview
• "XML und Datenbanken" [Tür08]
– Can Türker
– Lecture, University of Zurich, 2008
• Beginning XML Databases. [Pow07]
– Gavin Powell
Wiley & Sons, 2007, ISBN 0471791202
9.6 References
– Wiley & Sons, 2007, ISBN 0471791202
• "XML-Datenbanken", [Kud07]
– Thomas Kudraß
– Lecture, HTWK Leipzig, WS2007/2008
• "SQL/XML", [Mel05]
– Jim Melton,
– Oracle Corp. 2005
• XQuery und SQL/XML in DB2-Datenbanken:
Verwaltung und Erzeugung von XML- Dokumenten in DB2 [Moo08]
– Alfred Moos
9.6 References
– Vieweg+Teubner, 2008
• ISO/IEC 9075-14:2003 Information Technology -
Database Languages - SQL - Part 14: XML-Related Specifications (SQL/XML)
• DB2 SQL-Reference, IBM, March 2008 [IBM]
• "XML and Databases", [Scholl07]
– M. Scholl
– Lecture, Uni Konstanz, WS07/08
• "MySQL 9.1 Reference Manual", [MySQL]
9.6 References
• "MySQL 9.1 Reference Manual", [MySQL]
– Jon Stephens, MySQL Documentation Team – 2007
• "XML und Datenbanken", [KM03]
– M. Klettke, H. Meier – dpunkt.verlag, 2003