
XML Databases

2. XML Basics, 03.11.08

Silke Eckstein Andreas Kupfer

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Outline

2.1 Introduction
2.2 XML Formalization
2.3 Well-Formedness
2.4 XML Text Declarations
2.5 Namespaces
2.6 Overview
2.7 References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig

2.1 Introduction

• Structure of XML documents
–XML prolog
–Document Type Definition (DTD)
–Document instance
–Have to be well-formed (see later)

<Bücher>
  <Buch>
    <Autor id="1234567890">Rainer Eckstein</Autor>
    <Autor id="1234568723">Silke Eckstein</Autor>
    <Titel>XML und Datenmodellierung</Titel>
    <Untertitel>XML-Schema ...</Untertitel>
    <Verlag id="3-89864">dpunkt.Verlag</Verlag>
  </Buch>
</Bücher>

• A document instance is a set of tags that is customized to represent the content, e.g.:

<Autor>Silke Eckstein</Autor>

<Titel>XML und Datenmodellierung</Titel>

• New types of queries may require new tags: No problem for XML!

–Resulting set of tags forms a new markup language (XML dialect).

• All tags need to appear in properly nested pairs (e.g., <t> ... <s> ... </s> ... </t>).

• Tags can be freely nested to reflect the logical structure of the content.

[Scholl07]

• XML comes with a number of additional constructs which allow us to convey even more useful information, e.g.:

Attributes may be used to qualify tags (avoid the so-called tag soup).

Instead of

<question> Is it okay ...? </question>
<angry> Now I'm ... </angry>

use

<bubble tone="question">Is it okay ...?</bubble>
<bubble tone="angry">Now I'm ...</bubble>

• More additional constructs:

References establish links internal to an XML document:

Establish link target:

<character id="phb">The Pointy-Haired Boss</character>

Reference the target:

<bubble speaker="phb">Speed is the key to success.</bubble>
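The link between a speaker attribute and an id attribute can be followed programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the wrapper element <strip> and the function name resolve_speakers are assumptions made for illustration and do not appear in the slides.

```python
import xml.etree.ElementTree as ET

# <strip> is a hypothetical wrapper; <character> and <bubble> are from the slides.
DOC = """
<strip>
  <character id="phb">The Pointy-Haired Boss</character>
  <bubble speaker="phb">Speed is the key to success.</bubble>
</strip>
"""

def resolve_speakers(xml_text):
    """Return (speaker name, bubble text) pairs by following speaker -> id links."""
    root = ET.fromstring(xml_text)
    # Index every link target by the value of its id attribute.
    targets = {c.get("id"): c.text for c in root.iter("character")}
    # Look up each bubble's speaker attribute in that index.
    return [(targets.get(b.get("speaker")), b.text) for b in root.iter("bubble")]

pairs = resolve_speakers(DOC)
print(pairs)
```

Note that a plain XML parser treats id and speaker as ordinary attributes; the link only exists because the application (or a DTD, see later) interprets them that way.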

2.2 XML Formalization

Topics: Elements, Attributes, Entities, Miscellaneous, General structure

• We will now try to approach XML in a slightly more formal way.

• This discussion will be based on the central XML technical specification:

–Extensible Markup Language (XML) 1.1 (Second Edition) W3C Recommendation Aug 2006

(http://www.w3.org/TR/xml11)

[Scholl07]

Visit the W3C site

This lecture does not try to be a "guided tour" through the XML-related W3C technical documents (boring!).

Instead we will cover the basic principles and most interesting ideas. Visit the W3C site and use the original W3C documents to get a full grasp of their contents.

Elements

–… are the basic modules of XML documents
–… consist of a start-tag and an end-tag with the element content in between
–… may also be empty (with an empty-element tag then)
–… may be nested, which leads to the hierarchical structure of XML documents

[Scholl07]

Well-formed XML (fragments):

<foo> okay </foo>

<This-is-a-well-formed-XML-tag.> okay

</This-is-a-well-formed-XML-tag.>

<foo>okay</foo>

<foo/>

Non-well-formed XML:

<foo> oops </bar>

<foo> oops </Foo>

<foo> oops ... ‹EOT›
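The fragments above can be checked mechanically. This is a small sketch, assuming Python's standard xml.etree.ElementTree parser (which is based on expat and targets XML 1.0); the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """True if the fragment parses as a single well-formed element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

good = [
    "<foo> okay </foo>",
    "<This-is-a-well-formed-XML-tag.> okay </This-is-a-well-formed-XML-tag.>",
    "<foo/>",
]
bad = [
    "<foo> oops </bar>",   # end tag names a different element
    "<foo> oops </Foo>",   # element names are case-sensitive
    "<foo> oops ...",      # input ends before the end tag
]

results_good = [is_well_formed(f) for f in good]
results_bad = [is_well_formed(f) for f in bad]
print(results_good, results_bad)
```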

• Elements – examples:


Nested element:

<address>

<street> Rudower Chaussee </street>

<no> 25 </no>

<zip> 12489 </zip>

<city> Berlin </city>

</address>

Simple element:

<city> Berlin </city>

Empty element:

<fax/>

• Element content may contain document characters as well as properly nested elements (so-called mixed content):

[Scholl07]

Well-formed XML

<foo><bar>

<baz> okay </baz>

</bar>

<ok> okay </ok> still okay

</foo>

Non-well-formed XML

<foo><bar> oops </foo></bar>

<foo><bar> oops </bar><bar> oops </foo></bar>

• Element nesting establishes a parent-child relationship between elements:

–In the XML fragment <p> <c> ... </c> ... <c'> ... </c'> </p>,
•element p is the parent of elements c, c',
•elements c, c' are children of element p,
•elements c, c' are siblings.

• There is exactly one element that encloses the whole XML content: the root element.

[Scholl07]

Non-well-formed XML

<one> one eins un </one>

<two> two zwei deux </two>
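The parent, child and sibling relationships, and the single-root requirement, can be observed with a parser. A minimal sketch using Python's standard xml.etree.ElementTree follows; the element name d stands in for c' (which is not a legal XML name), and the helper has_single_root is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p><c>first</c> text <d>second</d></p>")

# Children of p, in document order; c and d are siblings.
children = [child.tag for child in root]

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
parents = {child.tag: parent.tag for child, parent in parent_of.items()}

def has_single_root(text):
    """True if the text parses, i.e. exactly one element encloses all content."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The non-well-formed example from the slide: two top-level elements.
two_roots = "<one> one eins un </one><two> two zwei deux </two>"
print(children, parents, has_single_root(two_roots))
```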

Attributes

–… may specify further properties of elements
–… may not be nested
–… are not considered to be children of the containing element (instead they are owned by the containing element)
–Attribute values are restricted to character data.

[Scholl07]

Well-formed XML (fragments)

<price currency="Euro"> 23.45 </price>

<price>

<currency> Euro </currency>

23.45

</price>

• An element can contain each attribute only once [Scholl07]:

Non-well-formed XML

<Team person='Erna' person='Hugo' person='Agnes'/>

Well-formed XML (fragments)

<Team persons='Erna Hugo Agnes'/>

<Team person1='Erna' person2='Hugo' person3='Agnes'/>

<Team>

<Person>Erna</Person>

<Person>Hugo</Person>

<Person>Agnes</Person>

</Team>
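The following sketch, assuming Python's standard xml.etree.ElementTree, shows that a parser rejects the repeated person attribute, while two of the well-formed alternatives carry the same three names. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Repeating an attribute name in one tag is rejected by the parser.
duplicate_ok = parses("<Team person='Erna' person='Hugo' person='Agnes'/>")

# Alternative: one attribute holding a whitespace-separated list.
listed = ET.fromstring("<Team persons='Erna Hugo Agnes'/>").get("persons").split()

# Alternative: one child element per person.
team = ET.fromstring(
    "<Team><Person>Erna</Person><Person>Hugo</Person><Person>Agnes</Person></Team>"
)
nested = [p.text for p in team.findall("Person")]
print(duplicate_ok, listed, nested)
```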

Entities

–In XML, document content and markup are specified using a single set of characters.
–The characters { <, >, &, ", ' } form pieces of XML markup. To represent them as content, they may be denoted by predefined entities:

Character   Entity
<           &lt;
>           &gt;
&           &amp;
"           &quot;
'           &apos;

–The XML entity facility is actually a versatile recursive macro expansion machinery (more on that later).

[Scholl07]

Well-formed XML:

<operators>

Valid comparison operators are

&lt;, =, &amp; &gt;.

</operators>
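Escaping with the predefined entities can be done with a library call rather than by hand. This is a minimal sketch using xml.sax.saxutils from the Python standard library; the variable names are illustrative only.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "Valid comparison operators are <, =, & >."

# escape() always replaces &, < and >; quotes need the extra mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# unescape() reverses the &lt; / &gt; / &amp; replacements.
restored = unescape(escaped)

# Embedding the escaped text yields well-formed XML that parses back to raw.
element = ET.fromstring("<operators>" + escaped + "</operators>")
round_trip = element.text
print(escaped)
```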

CDATA sections

–… may occur anywhere where character data may occur.

–… are used to escape blocks of text containing characters which would otherwise be recognized as markup.

–Within a CDATA section, only the string ']]>' is recognized as markup:
•left angle brackets and ampersands may occur in their literal form;
•they need not (and cannot) be escaped using "&lt;" and "&amp;".
–CDATA sections cannot nest.

[XML06]

Well-formed XML (fragments)

<![CDATA[<greeting>Hello, world!</greeting>]]>
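A parser hands the content of a CDATA section to the application as plain character data. The sketch below assumes Python's standard xml.etree.ElementTree; the enclosing <example> element is a hypothetical wrapper, needed because a CDATA section must occur inside element content.

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>"
element = ET.fromstring(doc)

# The angle brackets arrive as text; no <greeting> child element is created.
text = element.text
child_count = len(element)
print(repr(text), child_count)
```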

Comments

–… may appear anywhere in a document outside other markup
–… must not contain the string '--' and therefore may not end with '--->'

Well-formed XML (fragments)

<!-- declarations for <head> & <body> -->

Non-well-formed XML

<!-- B+, B, or B--->

2.3 Well-Formedness

• The W3C XML Recommendation is actually more formal and rigid in defining the syntactical structure of XML:

–"A textual object is well-formed XML if,
1. Taken as a whole, it matches the production labeled "document".
2. It meets all the well-formedness constraints given in this [the W3C XML Recommendation] specification. ..."

[Scholl07]

Well-formedness #1: Context-free Properties

–All context-free properties of well-formed XML documents are concisely captured by a grammar (using an EBNF-style notation).
–Grammar: system of production (rule)s of the form lhs ::= rhs
–Excerpt of the XML grammar:

[Scholl07]

No    lhs             ::=  rhs
[1]   document        ::=  ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[2]   Char            ::=  [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[2a]  RestrictedChar  ::=  [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[3]   S               ::=  (#x20 | #x9 | #xD | #xA)+
[4]   NameStartChar   ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]  NameChar        ::=  NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]   Name            ::=  NameStartChar (NameChar)*
[10]  AttValue        ::=  '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[14]  CharData        ::=  [^<&]* - ([^<&]* ']]>' [^<&]*)

No    lhs             ::=  rhs
[22]  prolog          ::=  XMLDecl Misc* (doctypedecl Misc*)?
[23]  XMLDecl         ::=  '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]  VersionInfo     ::=  S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]  Eq              ::=  S? '=' S?
[26]  VersionNum      ::=  '1.1'
[27]  Misc            ::=  Comment | PI | S
[39]  element         ::=  EmptyElemTag | STag content ETag
[40]  STag            ::=  '<' Name (S Attribute)* S? '>'
[41]  Attribute       ::=  Name Eq AttValue
[42]  ETag            ::=  '</' Name S? '>'
[43]  content         ::=  CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[44]  EmptyElemTag    ::=  '<' Name (S Attribute)* S? '/>'
[67]  Reference       ::=  EntityRef | CharRef
[68]  EntityRef       ::=  '&' Name ';'

N.B.

–The numbers in [] refer to the corresponding productions in the W3C XML Recommendation.

[Scholl07]

Expression   Equivalent to       Denotes
r*           ε, r, rr, rrr, …    zero or more repetitions of r
r+           rr*                 one or more repetitions of r
r?           r | ε               optional r
[abc]        a | b | c           character class
[^abc]                           inverted character class

Remarks

–As usual, the XML grammar may systematically be transformed into a program, an XML parser, to be used to check the syntax of XML input.

[Scholl07]

Rule   … implements this characteristic of XML:
[1]    an XML document contains exactly one root element
[10]   attribute values are enclosed in " or '
[22]   XML documents have to include a declaration prolog
[14]   characters < and & may not appear literally in element content
[43]   element content may contain character data and entity references as well as nested elements
[68]   entity references may contain arbitrary entity names (other than lt, amp, ...)

Parsing XML

1. Starting with the symbol document, the parser uses the lhs ::= rhs rules to expand symbols, constructing a parse tree.

2. The leaves of the parse tree are characters which have no further expansion.

3. The XML input is parsed successfully if it perfectly matches the parse tree's front (concatenate the parse tree leaves from left to right).

[Scholl07]
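To illustrate how production rules turn into a parser, here is a minimal recursive-descent sketch for a deliberately tiny subset of the grammar: rules [39], [42], [43] and [44] without attributes, whitespace inside tags, references, comments, or the prolog. The ASCII-only Name pattern and all function names are simplifying assumptions, not the full productions of the W3C Recommendation.

```python
import re

NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")  # ASCII-only stand-in for rule [5]
CHARDATA = re.compile(r"[^<&]*")                    # rule [14] without the ']]>' exclusion

class NotWellFormed(Exception):
    pass

def parse_element(s, i):
    """element ::= EmptyElemTag | STag content ETag; returns (tree, next index)."""
    if not s.startswith("<", i):
        raise NotWellFormed(f"expected '<' at {i}")
    m = NAME.match(s, i + 1)
    if not m:
        raise NotWellFormed(f"expected Name at {i + 1}")
    name, i = m.group(), m.end()
    if s.startswith("/>", i):                  # EmptyElemTag
        return (name, []), i + 2
    if not s.startswith(">", i):
        raise NotWellFormed(f"expected '>' at {i}")
    children, i = parse_content(s, i + 1)
    end_tag = "</" + name + ">"                # also enforces Element Type Match
    if not s.startswith(end_tag, i):
        raise NotWellFormed(f"expected {end_tag} at {i}")
    return (name, children), i + len(end_tag)

def parse_content(s, i):
    """content ::= CharData? (element CharData?)* (references etc. omitted)."""
    children = []
    while True:
        m = CHARDATA.match(s, i)               # always matches, possibly empty
        if m.end() > i:
            children.append(m.group())
        i = m.end()
        if i == len(s) or s.startswith("</", i):
            return children, i
        if s.startswith("&", i):
            raise NotWellFormed("references are outside this sketch")
        child, i = parse_element(s, i)
        children.append(child)

def parse_document(s):
    """Exactly one root element must consume the whole input."""
    tree, i = parse_element(s, 0)
    if i != len(s):
        raise NotWellFormed(f"unexpected input after root element at {i}")
    return tree

tree = parse_document("<foo><bar>okay</bar>still okay<baz/></foo>")
print(tree)
```

Each function corresponds to one nonterminal, and the nested tuples it returns mirror the element part of the parse tree. The end-tag comparison is already a context-dependent check, which the pure grammar cannot express.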

Example 1

–Parse tree for XML input

<?xml … ?> <bubble speaker="phb">Um... No.</bubble>

[Figure: parse tree rooted at document, which expands into prolog, element and Misc*. The element expands into STag (Name bubble, followed by S and the Attribute speaker = "phb", built from Name, Eq and AttValue), content (CharData "Um… No.") and ETag (Name bubble). Optional S? symbols and the trailing Misc* expand to ε.]

[Scholl07]

Example 2

–Parse tree for the "minimal" XML document

<?xml version="1.1"?><foo/>

[Figure: parse tree rooted at document, which expands into prolog, element and Misc*. The prolog expands into XMLDecl (<?xml, VersionInfo with S, version, Eq and VersionNum 1.1 in quotes, an empty EncodingDecl?, an empty S?, and ?>) followed by an empty Misc*. The element expands into EmptyElemTag (<, Name foo, an empty (S Attribute)*, an empty S?, and />). The trailing Misc* expands to ε.]

[Scholl07]

Well-formedness #2: Context-dependent Properties

–The XML grammar cannot enforce all XML well-formedness constraints (WFCs).
–Some XML WFCs depend on
1. what the XML parser has seen before in its input, or
2. on a global state, e.g., the definitions of user-declared entities.
–These WFCs cannot be checked by simply comparing the parse tree front against the XML input (context-dependent WFCs).

[Scholl07]

Sample WFCs

–All 10 XML WFCs are given in http://www.w3.org/TR/REC-xml

WFC                            Comment
(2) Element Type Match         The Name in an element's end tag must match the element name in the start tag.
(3) Unique Att Spec            No attribute name may appear more than once in the same start tag or empty element tag.
(5) No < in Attribute Values   The replacement text of any entity referred to directly or indirectly in an attribute value (other than &lt;) must not contain a <.
(9) No Recursion               A parsed entity must not contain a recursive reference to itself, either directly or indirectly.
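Some of these constraints can be provoked with a real parser. The following sketch assumes Python's standard xml.etree.ElementTree (an expat-based XML 1.0 parser); the sample documents and the helper first_error are made up for illustration, and the exact error messages depend on the parser version, so only the fact that an error occurs is relied upon.

```python
import xml.etree.ElementTree as ET

# One hypothetical violating document per constraint.
violations = {
    "Element Type Match": "<foo> oops </bar>",
    "Unique Att Spec": "<Team person='Erna' person='Hugo'/>",
    "No Recursion": "<!DOCTYPE a [<!ENTITY e '&e;'>]><a>&e;</a>",
}

def first_error(text):
    """Return the parser's error message, or None if the text is accepted."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

errors = {name: first_error(text) for name, text in violations.items()}
for name, message in errors.items():
    print(name, "->", message)
```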

2.4 XML Text Declarations

• The XML Text Declaration <?xml ... ?>

–A well-formed XML 1.1 document has to start with a header, the text declaration (grammar rule [23], which the W3C Recommendation calls the XML declaration):

XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

–VersionInfo:
•An XML document whose text declaration carries a VersionInfo of version="1.1" is required to conform to W3C's XML Recommendation posted on August 16, 2006 (see http://www.w3.org/TR/xml11).

[Scholl07]

• For XML 1.0 this header is optional

–I.e. the following is an XML 1.0 document because it does not have an XML declaration:

<greeting>Hello, world!</greeting>

• Encoding declaration:

–Documents that use an encoding other than UTF-8 or UTF-16 (see below) must announce so using the XML text declaration, e.g.

•<?xml version="1.1" encoding="iso-8859-15"?> or

•<?xml version="1.1" encoding="utf-32"?>

[Scholl07]

XML Documents and Character Encoding

–For a computer, a character like X is nothing but an 8 (16/32) bit number whose value is interpreted as the character X when needed (e.g., to drive a display).

–Trouble is, a large number of such number-to-character mapping tables, the so-called encodings, are in parallel use today.

[Scholl07]

–Due to the huge number of characters needed by the global computing community today (Latin, Hebrew, Arabic, Greek, Japanese, Chinese ... languages), conflicting intersections between encodings are common.

–Example:

Byte sequence         Encoding      Interpreted as
0xa4 0xcb 0xe4 0xd3   iso-8859-7    € Λ δ Σ
0xa4 0xcb 0xe4 0xd3   iso-8859-15   € Ë ä Ó

[Scholl07]
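The conflict can be reproduced by decoding the same four bytes under both encodings. This is a minimal sketch using the codecs that ship with Python; the variable names are illustrative.

```python
# The byte sequence from the example above.
data = bytes([0xA4, 0xCB, 0xE4, 0xD3])

# Same bytes, two different number-to-character tables.
as_greek = data.decode("iso-8859-7")
as_latin9 = data.decode("iso-8859-15")

print(as_greek, as_latin9)
```

Only the first byte happens to denote the same character in both tables; the remaining three bytes are read as Greek letters in one case and accented Latin letters in the other.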

Unicode

–The Unicode Initiative aims to define a new encoding that tries to embrace all character needs:

•characters of "all" languages of the world,

•plus scientific, mathematical, technical, box drawing, . . . symbols

(see http://www.unicode.org/charts/).

–Range of the Unicode encoding:

0x0000 – 0x10FFFF (17 * 65536 characters).

•Codes that fit into the first 16 bits (denoted U+0000 - U+FFFF) have been assigned to encode the most widely used languages and their characters (Basic Multilingual Plane, BMP).

•Codes U+0000 - U+007F have been assigned to match the 7-bit ASCII encoding which is pervasive today.

[Scholl07]

Unicode Transformation Formats

–Current CPUs operate most efficiently on 32-bit words (16-bit words, 8-bit bytes).

–Unicode thus developed Unicode Transformation Formats (UTF) which define how a Unicode character code between U+0000 – U+10FFFF is to be mapped into a 32-bit word (16-bit words, 8-bit bytes).

UTF-32 (map a Unicode character into a 32-bit word)

1. Map any Unicode character in the range U+0000 – U+10FFFF to the corresponding 32-bit value 0x00000000 – 0x0010FFFF.
2. N.B. For each Unicode character encoded in UTF-32 we waste at least 11 zero bits.

[Scholl07]

UTF-16 (map a Unicode character into one or two 16-bit words)

1. Apply the following mapping scheme:

Unicode range         Word sequence
U+000000 – U+00FFFF   xxxxxxxxxxxxxxxx
U+010000 – U+10FFFF   110110xxxxxxxxxx 110111xxxxxxxxxx

2. For the range U+000000 – U+00FFFF, simply fill the x positions with the 16 bits of the character code. (Code ranges U+D800 – U+DBFF and U+DC00 – U+DFFF are unassigned!)
3. For the U+010000 – U+10FFFF range, subtract 0x010000 from the character code and fill the x positions using the resulting 20-bit value.

[Scholl07]

Example

Unicode character U+012345 (0x012345 – 0x010000 = 0x02345):

UTF-16: 1101100000001000 1101111101000101
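The mapping steps above can be sketched in a few lines of Python. The function name utf16_words is hypothetical; Python's built-in "utf-16-be" codec performs the same mapping and can serve as a cross-check.

```python
def utf16_words(code_point):
    """Map a Unicode character code to its one or two 16-bit UTF-16 words."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode character code")
    if code_point <= 0xFFFF:
        return [code_point]                    # single word, code used as-is
    v = code_point - 0x10000                   # resulting 20-bit value
    high = (0b110110 << 10) | (v >> 10)        # 110110 + upper 10 bits
    low = (0b110111 << 10) | (v & 0x3FF)       # 110111 + lower 10 bits
    return [high, low]

words = utf16_words(0x12345)
bits = " ".join(format(w, "016b") for w in words)
print(bits)
```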

–UTF-16 is designed to facilitate efficient and robust decoding:
•If we see a leading 11011 bit pattern in a 16-bit word, we know it is the first or second word in a UTF-16 multi-word sequence.
•The sixth bit of the word then tells us if we actually look at the first or second word.

UTF-8

–UTF-8 (map a Unicode character into a sequence of 8-bit bytes)
•UTF-8 is of special importance because
a) a stream of 8-bit bytes (octets) is what flows over an IP network connection,
b) text-processing software today is built to deal with 8-bit character encodings (iso-8859-x, ASCII, etc.).

[Scholl07]

UTF-8 encoding

1. Apply the following mapping scheme:

Unicode range         Byte sequence
U+000000 – U+00007F   0xxxxxxx
U+000080 – U+0007FF   110xxxxx 10xxxxxx
U+000800 – U+00FFFF   1110xxxx 10xxxxxx 10xxxxxx
U+010000 – U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

2. The spare bits (x) are filled with the bits of the character code to be represented (rightmost is least significant bit, pad to the left with 0-bits).

Examples:

Unicode character U+00A9 (© sign):
UTF-8: 11000010 10101001 (0xC2 0xA9)

Unicode character U+2260 (math relation symbol ≠):
UTF-8: 11100010 10001001 10100000 (0xE2 0x89 0xA0)

[Scholl07]

Advantages of UTF-8 encoding

–For a UTF-8 multi-byte sequence, the length of the sequence is equal to the number of leading 1-bits (in the first byte), e.g.:

11100010 10001001 10100000

(Only single-byte UTF-8 encodings have a leading 0-bit.)

–Character boundaries are simple to detect (even when placed at some arbitrary position in a UTF-8 byte stream).

–UTF-8 encoding does not affect (binary) sort order.

–Text processing software which was originally developed to work with the pervasive 7-bit ASCII encoding remains functional. This is especially true for the C programming language and its string (char[]) representation.

[Scholl07]

XML and Unicode

–A conforming XML parser is required to correctly process UTF-8 and UTF-16 encoded documents. (The W3C XML Recommendation predates the UTF-32 definition.)

–Documents that use a different encoding must announce so using the XML text declaration, e.g.

<?xml version="1.1" encoding="iso-8859-15"?> or <?xml version="1.1" encoding="utf-32"?>

–Otherwise, an XML parser is encouraged to guess the encoding while reading the very first bytes of the input XML document:

Head of doc           Encoding guess
0x00 0x3C 0x00 0x3F   UTF-16 (big-endian)
0x3C 0x00 0x3F 0x00   UTF-16 (little-endian)
0x3C 0x3F 0x78 0x6D   UTF-8 (or ASCII, iso-8859-?: erroneous)

(Notice: < = U+003C, ? = U+003F, x = U+0078, m = U+006D)
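The guessing table translates into a simple prefix comparison. The sketch below covers only the three rows shown above; the function name guess_encoding is hypothetical, and a real parser checks further byte patterns (for example byte order marks) that are not part of this table.

```python
def guess_encoding(head):
    """Guess the encoding family from the first four bytes of a document."""
    table = [
        (b"\x00\x3C\x00\x3F", "UTF-16 (big-endian)"),
        (b"\x3C\x00\x3F\x00", "UTF-16 (little-endian)"),
        (b"\x3C\x3F\x78\x6D", "UTF-8 (or another ASCII-compatible encoding)"),
    ]
    for prefix, guess in table:
        if head.startswith(prefix):
            return guess
    return None  # no row of the table applies

declaration = '<?xml version="1.0"?><a/>'
guesses = [guess_encoding(declaration.encode(enc))
           for enc in ("utf-16-be", "utf-16-le", "utf-8")]
print(guesses)
```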

2.5 Namespaces

• Insertion: Uniform Resource Identifiers

–URL (Uniform Resource Locator): resolvable identifier on the Web
•The target of a URL pointer is an HTML file (virtual or materialized)
–URI (Uniform Resource Identifier): general purpose key to resources on the Web
•Uniquely identifies a resource
•Target need not be an HTML file, it can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
•Lifetime and scope of this "key" is user dependent
–IRI (Internationalized Resource Identifier)
•Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
–URL, URI, IRI
•All strings
•Very LONG strings

[Fisch05]

• How the web does work
–Individually created documents linked by ambiguous references

• How the web should work
–Global database of knowledge

• Key to doing that is to permit distributed knowledge creation and lazy integration

• Problems
–Vocabulary collisions
–Joins

• Namespaces
–Build on URI / IRI notion
–Make it possible to uniquely qualify intra-document name collisions

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<Book>
  <ISBN>0743204794</ISBN>
  <author>Kevin Davies</author>
  <title>Cracking the Genome</title>
  <price>20.00</price>
</Book>

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p><p>My books</p>
  </body>
</html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book>
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        <title>Cracking the Genome</title>
        <price>20.00</price>
      </Book>
    </p>
  </body>
</html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html>
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        <bo:title>Cracking the Genome</bo:title>
        <bo:price>20.00</bo:price>
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

[Lag05]

[Figure: two vocabularies. Vocabulary bo: bo:Book, bo:title, bo:author, bo:price, bo:ISBN. Vocabulary xhtml: xhtml:html, xhtml:head, xhtml:body, xhtml:p, xhtml:title.]

But who guarantees uniqueness of prefixes?

• Give prefixes only local relevance in an instance document

• Associate local prefix with global namespace name
–a unique name for a namespace
–uniqueness is guaranteed by using a URI in the domain of the party creating the namespace
–doesn't have any meaning, i.e. doesn't have to resolve into anything

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html
    xmlns:xhtml="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

[Lag05]

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book xmlns:bo="http://www.nogood.com/Book">
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book xmlns="http://www.nogood.com/Book">
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        ………
      </Book>
    </p>
  </body>
</html>
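The variants above differ only in how prefixes and default namespaces are declared; a namespace-aware parser should report the same expanded names for all of them. The following sketch assumes Python's standard xml.etree.ElementTree, which writes expanded names as {namespace-name}local-name. The documents are shortened versions of the slide examples, the namespace names are copied from the slides as spelled there, and the helper expanded_names is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3c.org/1999/xhtml"   # namespace name as spelled in the slides
BOOK = "http://www.nogood.com/Book"

# Variant with explicit prefixes declared on the root element.
prefixed = f"""<xhtml:html xmlns:xhtml="{XHTML}" xmlns:bo="{BOOK}">
  <xhtml:body><xhtml:p>My books
    <bo:Book><bo:ISBN>0743204794</bo:ISBN></bo:Book>
  </xhtml:p></xhtml:body>
</xhtml:html>"""

# Variant with default namespaces, redeclared locally on <Book>.
defaulted = f"""<html xmlns="{XHTML}">
  <body><p>My books
    <Book xmlns="{BOOK}"><ISBN>0743204794</ISBN></Book>
  </p></body>
</html>"""

def expanded_names(text):
    """List the expanded element names in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]

names_prefixed = expanded_names(prefixed)
names_defaulted = expanded_names(defaulted)
print(names_prefixed)
```

The prefixes themselves do not survive parsing, which reflects that they have only local relevance in the instance document.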

2.6 Overview

1. Introduction
2. XML Basics
3. Schema definition
4. XML query languages I
5. Mapping relational data to XML
6. SQL/XML
7. XML processing
8. XML query languages II
9. XML storage I
10. XML storage - index
11. XML storage - native
12. Updates / Transactions
13. Systems
14. XML Benchmarks

2.7 References

[W3C] http://www.w3.org/
[XML06] Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation 16 August 2006, edited in place 29 September 2006, http://www.w3.org/TR/xml11
[Scholl07] M. Scholl, "XML and Databases", Lecture, Uni Konstanz, WS07/08
[Lag05] Carl Lagoze, "Architecture of Web Information Systems", Cornell University, Spring 05, http://www.cs.cornell.edu/Courses/cs431/2005sp/syllabus.htm
[HM04] Harold & Means, "XML in a Nutshell", O'Reilly, 2004, ISBN 0596007647
The Unicode Consortium (http://www.unicode.org/), "The Unicode Standard, Version 5.0", Addison-Wesley, 5th edition, 2006, ISBN 0321480910
[Fisch05] Peter Fischer, "XML und Datenbanken", Lecture, ETH Zürich, WS 05/06

Questions, Ideas, Comments

• Now, or ...
• Room: IZ 232
• Office hour: Tuesday, 12:30 – 13:30, or by appointment
• Email: eckstein@ifis.cs.tu-bs.de
