Algorithm Engineering: “Parallel Algorithms”

Academic year: 2021

(1)

Stefan Edelkamp

(2)

Overview

Parallel External Search

Parallel Delayed Duplicate Detection

Parallel Expansion

Distributed Sorting

Parallel Structured Duplicate Detection

Disjoint Duplicate-Detection Scopes

Locks

Parallel Algorithms

Matrix Multiplication

List Ranking

Euler Tour

(3)

Distributed Search

Distributed setting provides more space.

Experiments show that internal time dominates I/O.

(4)

Exploiting Independence

Since the states in a bucket are independent of one another, they can be expanded in parallel.

Duplicate removal can be distributed across different processors.

Bulk (streamed) transfers perform much better than single ones.
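The two ideas above can be illustrated with a minimal Python sketch. It is not the implementation from the lecture: the successor function `successors`, the worker count, and the use of a thread pool are assumptions chosen for illustration. States of one bucket are expanded in independent chunks, and the generated successors are routed by hash value so that equal states always land in the same partition, where duplicates are removed by sorting and scanning.

```python
from concurrent.futures import ThreadPoolExecutor


def successors(state):
    """Hypothetical successor function on a toy integer state space."""
    return [state + 1, state * 2]


def expand_chunk(chunk):
    """Expand every state of one chunk; no state depends on another."""
    out = []
    for s in chunk:
        out.extend(successors(s))
    return out


def dedup_partition(part):
    """Remove duplicates inside one hash partition by sorting and scanning."""
    part.sort()
    return [s for i, s in enumerate(part) if i == 0 or s != part[i - 1]]


def expand_bucket(bucket, workers=4):
    """Expand a bucket in parallel, then deduplicate per hash partition."""
    chunks = [bucket[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        generated = [s for out in pool.map(expand_chunk, chunks) for s in out]
        # Equal states have equal hashes, so they meet in the same partition.
        parts = [[] for _ in range(workers)]
        for s in generated:
            parts[hash(s) % workers].append(s)
        deduped = list(pool.map(dedup_partition, parts))
    return sorted(s for part in deduped for s in part)
```

Passing whole chunks and whole partitions to the workers, rather than single states, mirrors the remark that bulk transfers are preferable to single ones.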

(5)

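Distributed Queue for Parallel Best-First Search

[Figure: processors P0, P1, and P2 share a queue whose entries have the form <g, h, start byte, size>. The entries shown are <15, 34, 0, 100>, <15, 34, 20, 100>, <15, 34, 40, 100>, and <15, 34, 60, 100>; a TOP marker points into the queue.]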
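A queue of <g, h, start byte, size> entries could be filled roughly as in the following sketch. The slide does not define `size` precisely, so the sketch assumes it is the length of the slice in bytes; `WorkItem`, `split_bucket`, `record_size`, and the file size of 80 bytes are hypothetical names and values, not taken from the lecture.

```python
from collections import deque
from typing import NamedTuple


class WorkItem(NamedTuple):
    g: int      # path cost of the bucket
    h: int      # heuristic value of the bucket
    start: int  # byte offset of the slice inside the bucket file
    size: int   # ASSUMPTION: length of the slice in bytes


def split_bucket(g, h, file_size, record_size, num_slices):
    """Cut one bucket file into record-aligned slices, one entry per slice."""
    records = file_size // record_size
    if records == 0:
        return []
    per_slice = -(-records // num_slices)  # ceiling division
    items = []
    for first in range(0, records, per_slice):
        count = min(per_slice, records - first)
        items.append(WorkItem(g, h, first * record_size, count * record_size))
    return items


# Hypothetical bucket (g=15, h=34): 80 bytes, 4-byte records, 4 slices.
queue = deque(split_bucket(15, 34, file_size=80, record_size=4, num_slices=4))
```

An idle processor would pop the entry at the head of the queue and read only its slice of the bucket file, so several processors can work on the same (g, h) bucket at once.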

(6)

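Multiple Processors, Multiple Disks Variant

[Figure: each of the processors P1 to P4 sorts its buffers w.r.t. the hash value and writes sorted files. The sorted buffers are divided w.r.t. the hash ranges h0 … hk-1, hk … hl-1, and so on; for each range, the sorted buffers from every processor are combined into one sorted file.]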
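The scheme in the figure can be sketched in a few lines of Python. This is a sequential simulation under assumptions: buffers are plain lists of hash values, `boundaries` is a hypothetical list of split points between hash ranges, and duplicate removal during the merge is added here because the sorted files serve duplicate detection.

```python
import heapq
from bisect import bisect_left


def split_by_ranges(sorted_buffer, boundaries):
    """Cut one sorted buffer at the hash-range boundaries."""
    cuts = [0] + [bisect_left(sorted_buffer, b) for b in boundaries]
    cuts.append(len(sorted_buffer))
    return [sorted_buffer[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]


def distributed_sort(buffers, boundaries):
    """buffers: one list of hash values per processor.
    boundaries: ascending split points between consecutive hash ranges.
    Returns one sorted, duplicate-free list ("file") per hash range."""
    # Step 1: every processor sorts its own buffer and cuts it by range.
    pieces = [split_by_ranges(sorted(buf), boundaries) for buf in buffers]
    files = []
    # Step 2: the owner of each range merges the pieces from all processors.
    for r in range(len(boundaries) + 1):
        out = []
        for v in heapq.merge(*(p[r] for p in pieces)):
            if not out or out[-1] != v:  # drop duplicates while merging
                out.append(v)
        files.append(out)
    return files
```

Because the ranges are disjoint, the per-range merges are independent and could run on different processors and disks.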

(7)

Parallel External A*

(8)

Parallel External A*

(9)
(10)

Distributed Heuristic Evaluation

Assume one child processor for each tile and one master processor.

[Figure: the abstract nodes B0 to B15, shown twice.]

(11)

Distributed Pattern Database Search

Only the pattern databases (PDBs) that include the client's tile need to be loaded on that client.

Because a pattern contains multiple tiles, each PDB is, from a bird's-eye view, loaded multiple times.

In the 15-Puzzle with corner and fringe PDBs, this saves RAM on the order of a factor of 2 on each machine compared to loading all PDBs.

In the 36-Puzzle with 6-tile pattern databases, this saves RAM on the order of a factor of 6 on each machine compared to loading all PDBs.

The approach extends to additive pattern databases.
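The selection rule can be written down directly. The sketch below is illustrative only: the pattern names, the tile sets, and the PDB sizes are hypothetical, and `load_fraction` is a helper introduced here to show where the memory saving comes from.

```python
def pdbs_for_client(tile, patterns):
    """Names of the pattern databases whose pattern contains `tile`."""
    return [name for name, tiles in patterns.items() if tile in tiles]


def load_fraction(tile, patterns, sizes):
    """Fraction of the total PDB memory that the client for `tile` loads."""
    needed = sum(sizes[name] for name in pdbs_for_client(tile, patterns))
    return needed / sum(sizes.values())


# Hypothetical example: 12 tiles split into two disjoint 6-tile patterns.
patterns = {
    "pdb_a": frozenset(range(1, 7)),
    "pdb_b": frozenset(range(7, 13)),
}
sizes = {"pdb_a": 1000, "pdb_b": 1000}  # assumed sizes in arbitrary units
```

With k disjoint patterns of equal size, each client loads one of k databases, which matches the order of magnitude of the savings quoted above.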

(12)

Distributed Heuristic Evaluation

(13)

Same bottleneck in external-memory search

Bottleneck: Duplicate detection

Duplicate paths cause parallelization overhead

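[Figure: a graph with nodes A, B, C, and D next to its search tree, in which B, C, and D appear more than once. Internal memory (fast) vs. external memory (slow).]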

(14)

Disjoint duplicate-detection scopes

[Figure: the abstract nodes B0 to B15 with four disjoint duplicate-detection scopes highlighted: B0 with B1 and B4, B3 with B2 and B7, B12 with B8 and B13, and B15 with B11 and B14.]

(15)

Finding disjoint duplicate-detection scopes

[Figure: the abstract graph with a counter (values 0 to 4) on each abstract node, shown over several steps. Labeled nodes include B0 with B1 and B4, B3 with B2 and B7, B12 to B15 with B8 and B11, and B5 with B1, B4, B6, and B9.]

(16)

Implementation of Parallel SDD

Hierarchical organization of hash tables

One hash table for each abstract node

Top-level hash function = state-space projection function

Shared-memory management

Minimum memory-allocation size m

Memory wasted is bounded by O(m · #processors)

External-memory version

I/O-efficient order of node expansions

I/O-efficient replacement strategy

Requires only a single mutex (“lock”)

[Figure: the abstract nodes B0 to B15.]
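How a single mutex can suffice is sketched below in Python. This is a simplified illustration, not the original implementation: it assumes a 4×4 board where abstract node Bi means "blank at position i", takes the duplicate-detection scope of Bi to be its abstract successors, and keeps a counter `sigma` per abstract node; the class and method names are hypothetical.

```python
import threading

N = 4  # 4x4 board; abstract node Bi = blank at position i (assumption)


def scope(b):
    """Duplicate-detection scope of Bi: the abstract successors of Bi."""
    r, c = divmod(b, N)
    cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return frozenset(rr * N + cc for rr, cc in cells
                     if 0 <= rr < N and 0 <= cc < N)


class ScopeManager:
    """Hands out abstract nodes whose scopes are pairwise disjoint."""

    def __init__(self):
        self._lock = threading.Lock()  # the single mutex
        # sigma[b]: number of in-use nodes whose scope overlaps scope(b)
        self._sigma = [0] * (N * N)
        self._in_use = set()

    def _overlapping(self, b):
        return [x for x in range(N * N) if scope(x) & scope(b)]

    def try_acquire(self, b):
        """Claim Bi if its scope is disjoint from all scopes in use."""
        with self._lock:
            if self._sigma[b] != 0:
                return False
            for x in self._overlapping(b):
                self._sigma[x] += 1
            self._in_use.add(b)
            return True

    def release(self, b):
        """Give Bi back after its states have been expanded."""
        with self._lock:
            if b not in self._in_use:
                raise ValueError(f"B{b} is not in use")
            self._in_use.remove(b)
            for x in self._overlapping(b):
                self._sigma[x] -= 1
```

The lock is held only while counters are updated. While a processor expands the states of its abstract node, it inserts successors into hash tables that no other processor touches, so that phase needs no synchronization.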

(17)

Parallel Matrix Multiplication

(18)
(19)

Parallel Matrix Multiplication

(20)

Exclusive Write

(21)

Parallel Copies

(22)

Conclusion: Matrix Multiplication

(23)

Parallel List Ranking

(24)

List Ranking

(25)

First Algorithm

(26)

Principle

(27)

Complexity

(28)

Improvements

(29)

Strategy

(30)

Independent Sets

(31)

2-Coloring

(32)

Reduction

(33)

Restoration

(34)

Example

(35)

Variables

(36)

Example (ctd.)

(37)

Pseudocode

(38)

Next Step

(39)

Analysis

(40)

Backup

(41)

Algorithm

(42)

Algorithm

(43)

Memory

(44)

Analysis

(45)

Outlook:

Randomized in O(n) w.h.p.?

(46)

Problems with DFS

(47)

Idea: Euler Tour

(48)

Parallel DFS

(49)

DFS Numbers

(50)

General Case

(51)

General Case

(52)

General Case

(53)

Example

(54)

One Cycle or Several?

(55)

Correctness

(56)

Correctness

(57)

Example

(58)

Constructing the Euler Tour

(59)

Conclusion: Euler Tours

(60)

GPU Architecture

(61)

Effectiveness

(62)

Hierarchical Memory

(63)

Hash-based Partitioning

(64)

BFS

(65)

Kernel Functions
