
# What is a Tree?


## Similarity Search

Trees

Nikolaus Augsten

nikolaus.augsten@sbg.ac.at

Department of Computer Sciences

University of Salzburg

http://dbresearch.uni-salzburg.at

WS 2021/22

Version October 26, 2021

Augsten (Univ. Salzburg) Similarity Search WS 2021/22 1 / 21

## Outline

1 What is a Tree?

2 Encoding XML as Trees


## What is a Tree?

Graph: a pair (N, E) of nodes N and edges E between nodes of N

Tree: a directed, acyclic graph T that

- is connected and
- has no node with more than one incoming edge

Nodes: N(T) are the nodes of T

Edges: E(T) are the edges of T; an edge (p, c) ∈ E(T) is an ordered pair with p, c ∈ N(T)

"Special" nodes:

- parent/child: (p, c) ∈ E(T) ⇔ p is the parent of c, c is the child of p
- siblings: c1 and c2 are siblings if they have the same parent node
- root node: node without parent (no incoming edge)
- leaf node: node without children (no outgoing edge)
- fanout: the fanout $f_v$ of node v is the number of children of v
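The definitions above can be checked mechanically. Below is a minimal Python sketch, assuming a tree is given as a set of hashable node IDs `N` and a set `E` of `(parent, child)` pairs; the helper names are hypothetical and not part of the slides. The usage applies them to the example tree from the Unlabeled Trees slide.

```python
from collections import defaultdict

def is_tree(nodes, edges):
    """Check the tree conditions for a graph (N, E) with edges as (parent, child) pairs."""
    edges = set(edges)
    if any(p not in nodes or c not in nodes for p, c in edges):
        return False
    indegree = defaultdict(int)
    for _, c in edges:
        indegree[c] += 1
    # no node may have more than one incoming edge
    if any(d > 1 for d in indegree.values()):
        return False
    # exactly one root (node without incoming edge)
    roots = [v for v in nodes if indegree[v] == 0]
    if len(roots) != 1:
        return False
    # connected: every node is reachable from the root; together with the
    # in-degree conditions this also rules out cycles
    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen == set(nodes)

def root(nodes, edges):
    """The root is the only node without an incoming edge (ValueError otherwise)."""
    has_parent = {c for _, c in edges}
    (r,) = [v for v in nodes if v not in has_parent]
    return r

def leaves(nodes, edges):
    """Leaves are the nodes without outgoing edges."""
    has_child = {p for p, _ in edges}
    return {v for v in nodes if v not in has_child}

def fanout(v, edges):
    """Number of children of v."""
    return sum(1 for p, _ in edges if p == v)

def are_siblings(c1, c2, edges):
    """Two distinct nodes with the same parent."""
    parent = {c: p for p, c in edges}
    return c1 != c2 and c1 in parent and parent.get(c2) == parent[c1]

# Example tree T from the slides
N = {1, 3, 5, 4, 7}
E = {(1, 3), (1, 5), (5, 4), (5, 7)}
```

For T, node 1 is the root, 3, 4, and 7 are leaves, and node 5 has fanout 2.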


## Unlabeled Trees

Unlabeled Tree:

The focus is on the structure, not on distinguishing nodes. However, we need to distinguish nodes in order to define edges.

⇒ each node v has a unique identifier id(v) within the tree

Example: T = ({1,3,5,4,7}, {(1,3),(1,5),(5,4),(5,7)})

Figure: root 1 with children 3 and 5; node 5 has children 4 and 7.


## Edge Labeled Trees

Edge Labeled Tree:

an edge e ∈ E(T) between nodes a and b is a triple e = (id(a), id(b), λ(e)), where

- id(a) and id(b) are node IDs
- λ(e) is the edge label (not necessarily unique within the tree)

Example: T = ({1,3,5,4,7}, {(1,3,a),(1,5,b),(5,4,c),(5,7,a)})

Figure: root 1 with children 3 (edge label a) and 5 (edge label b); node 5 has children 4 (edge label c) and 7 (edge label a).
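A minimal Python sketch of the triple representation, assuming labels are strings; `edge_label` and `edges_with_label` are hypothetical helpers. It illustrates that an edge is identified by its node IDs, while its label may repeat.

```python
# Edge labeled tree from the example: edges are triples (id(a), id(b), λ(e)).
N = {1, 3, 5, 4, 7}
E = {(1, 3, "a"), (1, 5, "b"), (5, 4, "c"), (5, 7, "a")}

def edge_label(edges, a, b):
    """Return the label λ(e) of the edge e from node a to node b."""
    for p, c, label in edges:
        if (p, c) == (a, b):
            return label
    raise KeyError((a, b))

def edges_with_label(edges, label):
    """Edge labels are not unique, so one label may belong to several edges."""
    return {(p, c) for p, c, lab in edges if lab == label}
```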


## Node Labeled Trees

Node Labeled Tree:

- a node v ∈ N(T) is a pair (id(v), λ(v))
- id(v) is unique within the tree
- the label λ(v) need not be unique

Intuition:

- The identifier is the key of the node.
- The label is the data carried by the node.

Example: T = ({(1,a),(3,c),(5,b),(4,c),(7,d)}, {(1,3),(1,5),(5,4),(5,7)})

Figure: root (1,a) with children (3,c) and (5,b); node (5,b) has children (4,c) and (7,d).
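A minimal Python sketch, assuming nodes are given as `(id, label)` pairs: since the identifier is the key and the label is the data, the nodes map naturally to a dictionary from ID to label. The helper names are hypothetical.

```python
# Node labeled tree from the example: each node is a pair (id(v), λ(v)).
V = {(1, "a"), (3, "c"), (5, "b"), (4, "c"), (7, "d")}
E = {(1, 3), (1, 5), (5, 4), (5, 7)}

# The identifier is the key and the label is the data, so index the labels by id.
label = dict(V)
if len(label) != len(V):
    raise ValueError("node identifiers must be unique within the tree")

def nodes_with_label(label, lab):
    """Labels need not be unique: return all node ids carrying label lab."""
    return {v for v, l_v in label.items() if l_v == lab}

def child_labels(v, label, edges):
    """Labels of v's children; sorted only for a deterministic result,
    since this tree is not ordered."""
    return sorted(label[c] for p, c in edges if p == v)
```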


## Notation and Graphical Representation

Notation:

- node identifiers: id($v_i$) = i
- tree identifiers: $T_1, T_2, \ldots$

Graphical representation:

- we omit the brackets of (identifier, label) pairs
- we (sometimes) omit node identifiers altogether
- we do not show the direction of edges (edges are always directed from the root towards the leaves)

Figure: example drawings of an unlabeled tree, an edge labeled tree, and a node labeled tree.


## Ordered Trees

Ordered Trees: siblings are ordered

- contiguous siblings $s_1 < s_2$ have no sibling x such that $s_1 < x < s_2$
- $c_i$ is the i-th child of p if
  - p is the parent of $c_i$, and
  - $i = |\{x \in N(T) : (p,x) \in E(T),\ x \le c_i\}|$

Example (figure): two trees over the same labeled nodes (a, b, c, d, e, f) that differ only in the order of siblings are equal (=) as unordered trees but different (≠) as ordered trees.

Note: “ordered” does not necessarily mean “sorted alphabetically”
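The i-th child formula can be evaluated directly. This Python sketch assumes, as the formula does, that siblings are ordered by comparing their node identifiers with ≤ (not by their labels); the function names and the example edge set, in which node 1 gets a third child 9, are hypothetical.

```python
def child_position(p, c, edges):
    """Return i such that c is the i-th child of p:
    i = |{x in N(T) : (p, x) in E(T), x <= c}|.
    Assumes sibling order is the order of the node identifiers (<=)."""
    if (p, c) not in edges:
        raise ValueError(f"{c} is not a child of {p}")
    return sum(1 for q, x in edges if q == p and x <= c)

def ith_child(p, i, edges):
    """Return c_i, the i-th child of p (1-based), or raise IndexError."""
    kids = sorted(x for q, x in edges if q == p)
    if not 1 <= i <= len(kids):
        raise IndexError(i)
    return kids[i - 1]

def contiguous(s1, s2, edges):
    """True if s1 < s2 are siblings with no sibling x such that s1 < x < s2."""
    parent = {c: q for q, c in edges}
    if not (s1 < s2 and s1 in parent and parent.get(s2) == parent[s1]):
        return False
    p = parent[s1]
    return not any(q == p and s1 < x < s2 for q, x in edges)

# Hypothetical example: node 1 has the ordered children 3, 5, 9
E = {(1, 3), (1, 5), (1, 9), (5, 4), (5, 7)}
```

Here node 5 is the second child of node 1, and 3 and 5 are contiguous siblings while 3 and 9 are not.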


## Edit Operations

We assume ordered, labeled trees.

Rename node: ren(v, l′)

- change the label l of v to l′ ≠ l

Delete node: del(v) (v is not the root node)

- remove v
- connect v's children directly to v's parent node (preserving order)

Insert node: ins(v, p, k, m)

- remove m consecutive children of p, starting with the child at position k, i.e., the children $c_k, c_{k+1}, \ldots, c_{k+m-1}$
- insert $c_k, c_{k+1}, \ldots, c_{k+m-1}$ as children of the new node v (preserving order)
- insert the new node v as the k-th child of p

Insert and delete are inverse edit operations (i.e., insert undoes delete and vice versa).
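A minimal Python sketch of the three edit operations on an ordered, labeled tree, assuming each node stores its label and an ordered list of child IDs. The class and method names are hypothetical (`delete` stands for del, which is a Python keyword), and the position k is 1-based as on the slide. The usage replays the edit sequence from the example slide.

```python
class OrderedTree:
    """Ordered, node labeled tree: every node id has a label and an ordered child list."""

    def __init__(self, root, label):
        self.root = root
        self.label = {root: label}
        self.children = {root: []}
        self.parent = {root: None}

    def add_child(self, p, v, label):
        """Helper for building trees: append v as the last child of p."""
        self.label[v], self.children[v], self.parent[v] = label, [], p
        self.children[p].append(v)

    def ren(self, v, new_label):
        """ren(v, l'): change the label l of v to l' != l."""
        if self.label[v] == new_label:
            raise ValueError("the new label must differ from the old one")
        self.label[v] = new_label

    def delete(self, v):
        """del(v): remove v and connect its children to v's parent at v's
        position, preserving their order."""
        if v == self.root:
            raise ValueError("the root node cannot be deleted")
        p = self.parent.pop(v)
        kids = self.children.pop(v)
        k = self.children[p].index(v)
        self.children[p][k:k + 1] = kids  # splice the children into v's slot
        for c in kids:
            self.parent[c] = p
        del self.label[v]

    def ins(self, v, label, p, k, m):
        """ins((v, label), p, k, m): the children c_k, ..., c_{k+m-1} of p become
        children of the new node v, which is inserted as the k-th child of p."""
        siblings = self.children[p]
        if (v in self.label or not 1 <= k <= len(siblings) + 1
                or not 0 <= m <= len(siblings) - k + 1):
            raise ValueError("invalid insert operation")
        moved = siblings[k - 1:k - 1 + m]
        siblings[k - 1:k - 1 + m] = [v]
        self.label[v], self.children[v], self.parent[v] = label, moved, p
        for c in moved:
            self.parent[c] = v

    def to_tuple(self, v=None):
        """Nested (id, label, [subtrees]) view, handy for comparing trees."""
        v = self.root if v is None else v
        return (v, self.label[v], [self.to_tuple(c) for c in self.children[v]])

# Example: T0 --ins--> T1 --ren--> T2, then back with the inverse operations.
t = OrderedTree("v1", "a")
for v, lab in [("v3", "c"), ("v4", "c"), ("v7", "d")]:
    t.add_child("v1", v, lab)
T0 = t.to_tuple()
t.ins("v5", "b", "v1", 2, 2)  # T0 -> T1
T1 = t.to_tuple()
t.ren("v4", "x")              # T1 -> T2
T2 = t.to_tuple()
t.ren("v4", "c")              # inverse of ren(v4, x): T2 -> T1
t.delete("v5")                # inverse of the insert: T1 -> T0
```

Because `ins` replaces the slice of children by v and `delete` splices v's children back into v's position, the two operations undo each other when applied one right after the other.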


## Example: Edit Operations

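T0: (v1,a) with children (v3,c), (v4,c), (v7,d)

ins((v5,b), v1, 2, 2) transforms T0 into

T1: (v1,a) with children (v3,c), (v5,b); (v5,b) has children (v4,c), (v7,d)

ren(v4, x) transforms T1 into

T2: (v1,a) with children (v3,c), (v5,b); (v5,b) has children (v4,x), (v7,d)

The inverse operations lead back: ren(v4, c) transforms T2 into T1, and del(v5) transforms T1 into T0.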


## Representing XML as a Tree

There are many possibilities; we will consider:

- single-label tree
- double-label tree

Pros and cons depend on the application!


## XML as a Single-Label Tree

The XML document is stored as a tree with:

- XML element: node labeled with the element tag name
- XML attribute: node labeled with the attribute name
- text contained in elements/attributes: node labeled with the text value

Element nodes contain:

- nodes of their sub-elements
- nodes of their attributes
- nodes with their text values

Attribute nodes contain:

- a single node with their text value

Text nodes are always leaves.

Order:

- sub-element and text nodes are ordered
- attributes are not ordered (approach: store them before all sub-elements, sorted by attribute name)
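A sketch of the single-label encoding using Python's standard `xml.etree.ElementTree` module; the function name and the `(label, children)` tuple format are assumptions. It further assumes that whitespace-only text is dropped and text values are stripped, and it stores attributes first, sorted by name, as suggested above. In ElementTree, text that follows a sub-element is stored in that sub-element's `tail`, so the sketch adds it to the parent's children.

```python
import xml.etree.ElementTree as ET

def single_label_tree(elem):
    """Return (label, children) for an element, following the single-label encoding.
    Assumption: whitespace-only text is ignored and text values are stripped."""
    children = []
    # attributes are unordered: store them first, sorted by attribute name,
    # each with a single text node holding the attribute value
    for name in sorted(elem.attrib):
        children.append((name, [(elem.attrib[name], [])]))
    # text before the first sub-element
    if elem.text and elem.text.strip():
        children.append((elem.text.strip(), []))
    for child in elem:
        children.append(single_label_tree(child))
        # text after a sub-element (its tail) belongs to the current element
        if child.tail and child.tail.strip():
            children.append((child.tail.strip(), []))
    return (elem.tag, children)

article = ET.fromstring(
    "<article title='pq-Grams'><author>Augsten</author>"
    "<author>Boehlen</author><author>Gamper</author></article>")
tree = single_label_tree(article)
```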


## Example: XML as a Single-Label Tree

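```xml
<article title='pq-Grams'>
  <author>Augsten</author>
  <author>Boehlen</author>
  <author>Gamper</author>
</article>
```

Single-label tree: root article with the ordered children title, author, author, author; the attribute node title has the text node pq-Grams as its child, and the three author nodes have the text nodes Augsten, Boehlen, and Gamper as children.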


## XML as a Double-Label Tree

Node labels are pairs.

The XML document is stored as a tree with:

- XML element: node labeled with (tag-name, text-value)
- XML attribute: node labeled with (attribute-name, text-value)

Element nodes contain:

- nodes of their sub-elements and attributes

Attribute nodes are always leaves.

Element nodes without attributes or sub-elements are leaves.

Order:

- sub-element nodes are ordered
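A corresponding sketch for the double-label encoding, again with `xml.etree.ElementTree` and a hypothetical `((name, text-value), children)` tuple format. The slides do not define the text value of an element whose text is interleaved with sub-elements; this sketch marks such nodes with the placeholder `?`, uses `ε` for elements without text, and stores attributes first, sorted by name (all assumptions).

```python
import xml.etree.ElementTree as ET

EMPTY = "ε"  # text value of elements without (non-whitespace) text
MIXED = "?"  # assumed marker for text interleaved with sub-elements

def double_label_tree(elem):
    """Return ((name, text-value), children) following the double-label encoding."""
    # the element's own text pieces: text before the first child and all child tails
    own_text = [t.strip() for t in [elem.text] + [c.tail for c in elem]
                if t and t.strip()]
    if not own_text:
        value = EMPTY
    elif len(elem) == 0:
        value = own_text[0]
    else:
        value = MIXED
    # attribute nodes are leaves labeled (attribute-name, text-value)
    children = [((name, elem.attrib[name]), []) for name in sorted(elem.attrib)]
    # sub-element nodes keep their document order
    children += [double_label_tree(c) for c in elem]
    return ((elem.tag, value), children)

article = ET.fromstring(
    "<article title='pq-Grams'><author>Augsten</author>"
    "<author>Boehlen</author><author>Gamper</author></article>")
tree = double_label_tree(article)
```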


## Example: XML as a Double-Label Tree

```xml
<article title='pq-Grams'>
  <author>Augsten</author>
  <author>Boehlen</author>
  <author>Gamper</author>
</article>
```

Double-label tree: root (article, ε) with the ordered children (title, pq-Grams), (author, Augsten), (author, Boehlen), (author, Gamper).


## Example: Single- vs. Double-Label Tree

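```xml
<xhtml>
  <p>This is <b>bold</b> font.</p>
</xhtml>
```

Single-label tree: xhtml has the child p; p has the ordered children "This is", b, and "font."; b has the text child "bold".

Double-label tree: (xhtml, ε) has the child (p, ?), which has the child (b, bold). The text of p is interrupted by the sub-element b, so p has no single text value (marked "?").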


## Parsing XML

We discuss two popular parsers for XML:

- DOM – Document Object Model
- SAX – Simple API for XML


## DOM – Document Object Model

- W3C standard for accessing and manipulating XML documents
- Tree-based: represents an XML document as a tree (a single-label tree with additional node info, e.g., the node type)
- Elements, attributes, and text values are nodes
- DOM parsers load the XML document into main memory
  - random access by traversing the tree :-)
  - large XML documents do not fit into main memory :-(
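A small illustration with Python's `xml.dom.minidom`, one DOM implementation in the standard library: the whole document is parsed into an in-memory tree, which is then traversed in preorder, and an arbitrary node is accessed directly. The traversal helper `preorder` is hypothetical. In DOM, attribute nodes are reached through an element's `attributes`, not through its `childNodes`.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<article title='pq-Grams'><author>Augsten</author>"
    "<author>Boehlen</author><author>Gamper</author></article>")

def preorder(node, out=None):
    """Traverse the in-memory DOM tree and record (kind, name or text) per node."""
    out = [] if out is None else out
    if node.nodeType == node.ELEMENT_NODE:
        out.append(("element", node.tagName))
        # attribute nodes hang off node.attributes, not off node.childNodes
        for name, _value in node.attributes.items():
            out.append(("attribute", name))
    elif node.nodeType == node.TEXT_NODE:
        out.append(("text", node.data))
    for child in node.childNodes:
        preorder(child, out)
    return out

# Random access: the whole tree is in main memory, so any node can be reached directly.
second_author = doc.getElementsByTagName("author")[1].firstChild.data
```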


## SAX – Simple API for XML

- “de facto” standard for parsing XML (http://www.saxproject.org)
- Event-based: reports parsing events (e.g., start and end of elements)
  - no random access :-(
  - you see only one element/attribute at a time
  - you can parse (arbitrarily) large XML documents :-)
- Java APIs are available for both DOM and SAX

For importing XML into a database: use SAX!
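A minimal SAX sketch with Python's `xml.sax` module; the handler class `EventRecorder` is hypothetical. The parser calls `startElement`, `characters`, and `endElement` while it reads the input, and `characters` may deliver one text value in several chunks, so the handler buffers them. For illustration the handler keeps all events in a list; an importer would instead write each record (e.g., on `endElement`) and discard it, so that memory use does not grow with the document size.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Records SAX parsing events; only the pending text chunks are buffered."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # one text value may arrive in several chunks, so collect them first
        self._text.append(content)

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))

    def _flush(self):
        text = "".join(self._text).strip()
        if text:
            self.events.append(("text", text))
        self._text = []

handler = EventRecorder()
xml.sax.parseString(
    b"<article title='pq-Grams'><author>Augsten</author>"
    b"<author>Boehlen</author><author>Gamper</author></article>", handler)
```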
