• Keine Ergebnisse gefunden

Massively Parallel Algorithms

N/A
N/A
Protected

Academic year: 2021

Aktie "Massively Parallel Algorithms"

Copied!
76
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Massively Parallel Algorithms

Parallel Prefix Sum And Its Applications

G. Zachmann

University of Bremen, Germany

(2)

§

Remember the reduction operation

§ Extremely important/frequent operation → Google's MapReduce

§

Definition prefix sum:

Given an input sequence

,

the (inclusive) prefix sum of this sequence is the output sequence where is an arbitrary binary associative operator.

The exclusive prefix sum is

whereιis the identity/zero element, e.g., 0 for the + operator.

§

The prefix sum operation is sometimes also called a scan (operation)

A = (a

0

, a

1

, a

2

, . . . , a

n 1

)

Aˆ = (a0, a1 a0,a2 a1 a0, . . .,an 1 · · · a0)

0 = (◆, a0,a1 a0, . . .,an 2 · · · a0)

(3)

§

Example:

§ Input:

§ Inclusive prefix sum:

§ Exclusive prefix sum:

§

Further variant: backward scan

§

Applications: many!

§ For example: polynomial evaluation (Horner's scheme)

§ In general: "What came before/after me?"

§ "Where do I start writing my data?"

§

The prefix sum problem appears to be "inherently sequential"

A = (3 1 7 0 4 1 6 3)

Aˆ = (3 4 11 11 15 16 22 25) Aˆ0 = (0 3 4 11 11 15 16 22)

(4)

Variation: Segmented Scan

§

Input: segments of numbers in one large vector

§

Task: scan (prefix-sum) within each segment

§

Output: prefix-sums for each segment, in one vector

§

Forms the basis for a wide variety of algorithms:

§ E.g., Quicksort, Sparse Matrix-Vector Multiply, Convex Hull

§ Note: take care to store the flags array space- and

bandwidth-efficient! (one int per flag is very in-efficient)

3

0 0 7 7 0 1 7

1

3 7 0 4 1 6 3

0

1 1 0 0 1 0 0 Flags array

(head-tail flags,

"1" = new segment) Payload data

(5)

Application from "Everyday" Life

§

Given:

§ A 100-inch sandwich

§ 10 persons

§ We know how many inches each person wants: [3 5 2 7 10 4 3 0 8 1]

§

Task: cut the sandwich quickly

§

Sequential method: one cut after another

(3 inches first, 5 inches next, …)

§

Parallel method:

§ Compute prefix sum

§ Make cuts in parallel with 10 knives

§

(6)

Importance of the Scan Operation

§

Assume the scan operation is a primitive that has unit time costs, then the following algorithms have the following complexities:

38 CHAPTER 3. THE SCAN PRIMITIVES

Model

Algorithm EREW CRCW Scan

Graph Algorithms

(nvertices,medges,mprocessors)

Minimum Spanning Tree lg2n lgn lgn

Connected Components lg2n lgn lgn

Maximum Flow n2lgn n2lgn n2

Maximal Independent Set lg2n lg2n lgn

Biconnected Components lg2n lgn lgn

Sorting and Merging (nkeys,nprocessors)

Sorting lgn lgn lgn

Merging lgn lg lgn lg lgn

Computational Geometry (npoints,nprocessors)

Convex Hull lg2n lgn lgn

Building aK-D Tree lg2n lg2n lgn

Closest Pair in the Plane lg2n lgnlg lgn lgn

Line of Sight lgn lgn 1

Matrix Manipulation (n×nmatrix,n2processors)

Matrix×Matrix n n n

Vector×Matrix lgn lgn 1

Matrix Inversion nlgn nlgn n

EREW=

exclusive-read,

exclusive-write PRAM CRCW=

concurrent-read,

concurrent-write PRAM Scan =

EREW with scan as unit-cost primitive

Guy E. Blelloch:

Vector Models for

Data-Parallel Computing

(7)

§

Actually, prefix-sum (a.k.a. scan) was considered such an

important operation, that it was implemented as a primitive in the CM-2 Connection Machine (of Thinking Machines Corp.)

(8)

Example: Line-of-Sight

§

Given:

§ Terrain as grid of height values (height map)

§ Point X in the grid (our "viewpoint", has a height, too)

§ Viewing direction, we can look up and down, but not to the left or right

§

Problem: find all visible points in the grid along the view direction

§

Assumption: we have already extracted a vector of heights from the grid containing all cells' heights that are in our viewing direction

(9)

3.3. EXAMPLE: LINE-OF-SIGHT 41

Figure 3.1: The line-of-sight problem for a single ray. The X marks the observation point.

The visible points are shaded. A point on the ray is visible if no previous point has a greater

§

The algorithm:

1. Convert height vector to vertical angles (as seen from X) → A

- One thread per vector element

2. Perform max-scan on angle vector (i.e., prefix sum with the max operator) → Â

3. Test âi < ai , if true then grid point is visible form X

Height vector

3.3. EXAMPLE: LINE-OF-SIGHT 41

Figure 3.1: The line-of-sight problem for a single ray. The X marks the observation point.

The visible points are shaded. A point on the ray is visible if no previous point has a greater angle. The angle is calculated as arctan(altitude/distance).

Angle vector (A)

3.3. EXAMPLE: LINE-OF-SIGHT 41

Figure 3.1: The line-of-sight problem for a single ray. The X marks the observation point.

The visible points are shaded. A point on the ray is visible if no previous point has a greater angle. The angle is calculated as arctan(altitude/distance).

Max-scan of angle vector (Â)

(10)

The Hillis-Steele Algorithm

§

Iterate log(n) times:

§

Notes:

§ Blue = active threads

§ Each thread reads from "another" thread, too → must use barrier sync

- Can save one barrier by double buffering

A: 3 1 7 0 4 1 6 3

B: 3 4 8 7 4 5 7 9

d = 0, stride 1

A: 3 4 11 11 12 12 11 14

B: 3 4 11 11 15 16 22 25

d = 1, stride 2

d = 2, stride 4

(11)

§

The algorithm as pseudo-code:

§ Note: barrier synchronization omitted for clarity

§

Remark: precision is usually better than the naïve sequential algo

§ Because, in the parallel version, summands (in each iteration) tend to be of the same order

§

Algorithmic technique: recursive/iterative doubling technique =

"Accesses or actions are governed by increasing powers of 2"

forall i in parallel do // n threads for d = 0...log(n)-1:

if i >= 2^d :

x[i] = x[ i − 2^d ] + x[i]

(12)

Definitions

§

Depth D(n) = "#iterations" = parallel running time Tp(n)

§ (Think of the loops unrolled and "baked" into a hardware pipeline)

§ Sometimes also called step complexity

§

Work W(n) = total number of operations performed by all threads

§ With sequential algorithms, work complexity = time complexity

§

Work-efficient:

A parallel algorithm is called work-efficient, if it performs no more work than the sequential one

(13)

§

Visual definition of depth/work complexity:

§ Express computation as a dependence graph of parallel tasks:

§ Work complexity = total amount of work performed by all tasks

§ Depth complexity = length of the "critical path" in the graph

§

Parallel algorithms should be always both work and depth efficient!

Parallel tasks

(14)

§

Complexity of the Hillis-Steele algorithm:

§ Depth d = Tp(n) = # iterations = log(n) → good

§ In iteration d:

§ Total number of adds = work complexity

§

Conclusion: not work-efficient

§ A factor of log(n) can hurt: 20x for 106 elements

n 2d 1 adds

W(n) =

log2n

X

d=1

(n 2d 1) =

log2n

X

d=1

n

log2n

X

d=1

2d 1 = n·logn n 2 O n log n

(15)

The Blelloch Algorithm (for Exclusive Scan)

§

Consists of two phases: up-sweep (= reduction) and down-sweep 1. Up-sweep:

3 1 7 0 4 1 6 3

3 4 7 7 4 5 6 9

3 4 7 11 4 5 6 14

3 4 7 11 4 5 6 25

d = 0, stride 1

d = 1, stride 2

d = 2, stride 4

(16)

2. Down-sweep:

§ First: zero last element (might seem strange at first thought)

§ Dashed line means "copy over" (overwriting previous content)

3 4 7 0 4 5 6 11

3 4 7 11 4 5 6 0

3 0 7 4 4 11 6 16

0 3 4 11 11 15 16 22

d = 0, stride 4

d = 1, stride 2

d = 2, stride 1

(17)

§

Depth complexity:

§ Performs 2.log(n) iterations

§ D(n) ∈ O( log n )

§

Work-efficiency:

§ Number of adds: n/2 + n/4 + .. + 1 + 1 + … + n/4 + n/2

§ Work complexity W(n) = 2.n = O(n)

§ The Blelloch algorithm is work efficient

§

This up-sweep followed by down-sweep is a very common pattern in massively parallel algorithms!

§

Limitations so far:

§ Only one block of threads (what if the array is larger?)

(18)

Working on Arbitrary Length Input

§

Partition array into b blocks

§ Choose fairly small block size = 2k, so we can easily pad array to b.2k

1. Run up-sweep on each block

2. Each block writes the sum of its partition (= last element after up- sweep) into a PartialSums array at blockIdx.x

3. Run prefix sum on the PartialSums array 4. Perform down-sweep on each block

5. Add PartialSums[blockIdx.x] to each element in "next"

array section blockIdx.x+1

(19)

Up-sweep block 0 Up-sweep block 1 Up-sweep block 2 Up-sweep block 3

Down-sweep block 0 Down-sweep block 1 Down-sweep block 2 Down-sweep block 3

Store block sums to auxiliary array PartialSums

Scan auxiliary array PartialSums

"Seed" last value in block i+1 with PartialSums[i], instead of 0

(20)

Further Optimizations

§

A simple & effective technique:

§ Each thread i loads 4 floats from global memory → float4 x

§ Store Σj=0…3 x[i][j] in shared memory a[i]

§ Compute the prefix-sum on a â

§ Store 4 values back in global memory:

-A[4*i] = â[i-1] + x[0]

-A[4*i+1] = â[i-1] + x[0] + x[1]

-A[4*i+2] = â[i-1] + x[0] + x[1] + x[2]

-A[4*i+3] = â[i-1] + x[0] + x[1] + x[2] + x[3]

§ Experience shows: 2x faster

§ Why does this improve performance? → Brent's theorem exclusive

(21)

Brent's Theorem

§

Frequent assumption when formulating parallel algorithms: we have arbitrarily many processors

§ E.g., O(n) many processors for input of size n

§ Kernel launch even reflects that:

- Often, we run as many threads as there are input elements - I.e., CUDA/GPU provide us with this (nice) abstraction

§

Real hardware: only has fixed number p of processors

§ E.g., on current GPUs: p ≈ 200–2000 (depending on viewpoint)

§

Question: how fast can an implementation of a parallel algorithm really be?

(22)

§

Assumptions for Brent's theorem: PRAM model

§ No explicit synchronization needed

§ Memory access = free (no cost)

§

Brent's Theorem:

Given a massively parallel algorithm A; let D(n) = its depth (i.e., parallel time) complexity, and W(n) = its work complexity.

Then, A can be run on a p-processor PRAM in time at most

(Note the "≤")

T(n, p)  W(n) p

+ D(n)

(23)

§

Proof:

§ For each iteration step i, 1 ≤ i ≤ D(n), let Wi(n) = number of operations in that step

§ In each iteration, distribute those Wi(n) operations on p processors:

- Execute operations on each of the p processors in parallel

- Takes time steps on the PRAM

§ Overall :

T(n,p) =

D(n)X

i=1

⇠Wi(n) p

D(n)X

i=1

Wi(n) p

+ 1  W (n) p

+ D(n)

Wi(n) p

Wi(n) p

(24)

Application of Brent's Theorem to our Optimization of Prefix-Sum

§

Assume that the optimized version loads f floats into local registers

§

Work complexity:

§ Without optimization:

§ With optimization:

§

Depth complexity:

§ Without optimization:

§ With optimization:

§

If f = 2, then W2 = W1 and D2 = D1, i.e., we gain nothing

§

If f > 2, speedup of version 2 (opt.) over version 1 (orig.):

D

1

(n) = 2 log(n) W

1

(n) = 2n

W2(n) = 2nf + nf ·f = n 1 + 2f

⇡ 2np

n

p 1 + 2f = 2f f + 2

D

2

(n) = 2 log(

nf

) + 2f = 2 log n 2 log f + 2f

Speedup(n) = T1(n) T2(n) =

W1(n)

p + D1(n)

W2(n)

p + D2(n)

<latexit sha1_base64="5WkMZq87Tct349y3qHAl0YsNb3E=">AAACxHicZVHbbhMxEPVugZZwaVpeQLxYVEitENFuXuAFqSoV4rGIpqnUjSLHO9ta9U32GIhW2xe+jw9AfAF/gTeXlrQjWXNm5njGPjOxUnjMst9Junbv/oP1jYedR4+fPN3sbm2feBMchwE30rjTCfMghYYBCpRwah0wNZEwnFx+bOvDb+C8MPoYpxZGip1rUQnOMKbG3Z+FseAYGqeZgvqrBSiDbXb1Hv1Ai8oxXtPjcd7GTYv6M3RTW/rhNcfG8ht6eB3fMPq3GItenXF3J+tlM6N3Qb4AO/uHv/5QQsjReCvZK0rDgwKNXDLvz/K+xVHNHAouoekUwYNl/JKdQz3TqKGvY6qklXHxaKSz7ApPTRXDi0hsnf+/dBawej+qhbYBQfN5rypIioa2mtJSOOAop5RxHp8UGMZR/ILFf2PUfmWMZ9rPBxVL2CkcaPjOjVJMl3VRMSXktISKBYlNXfhqiVc7iaAF/ohJLzxgsHXc5FtlSojraTVvN9xEcfPbUt4FJ/1envXyL1HlAzK3DfKSvCK7JCfvyD75TI7IgHDyN9lMnicv0k+pTH0a5tQ0Wdx5RlYsvfoHpkPY3w==</latexit><latexit sha1_base64="c7dBg+NdHIt6oHYWZ/7Tc500Zjo=">AAACxHicZVFLaxRBEO4ZH4nrIxu9KF6aBCFBXGb2ohchaBCPEbPZQGZZe3tqkib9ortaXYbx4u8TvIpnD/4Le/aRuElBU19VfV3V/dXESuExy34l6Y2bt26vrd/p3L13/8FGd/PhkTfBcRhwI407njAPUmgYoEAJx9YBUxMJw8n527Y+/AzOC6MPcWphpNipFpXgDGNq3P1eGAuOoXGaKag/WoAy2GZH79LXtKgc4zU9HOdt3LSoP0OXtaUfXnBsLD+n+xfxJaN/hbHo1Rl3t7NeNjN6HeQLsL23/+P31s8/nw7Gm8luURoeFGjkknl/kvctjmrmUHAJTacIHizj5+wU6plGDX0WUyWtjItHI51lV3hqqhieRWLr/P+lk4DVq1EttA0Ims97VUFSNLTVlJbCAUc5pYzz+KTAMI7iZyz+G6P2K2M8034+qFjCTuFAwxdulGK6rIuKKSGnJVQsSGzqwldLvNpJBC3wa0x64QGDreMmXyhTQlxPq3m74SaKm1+V8jo46vfyrJd/iCq/IXNbJ0/JFtkhOXlJ9sh7ckAGhJO/yUbyOHmSvktl6tMwp6bJ4s4jsmLpt38pldr2</latexit><latexit sha1_base64="c7dBg+NdHIt6oHYWZ/7Tc500Zjo=">AAACxHicZVFLaxRBEO4ZH4nrIxu9KF6aBCFBXGb2ohchaBCPEbPZQGZZe3tqkib9ortaXYbx4u8TvIpnD/4Le/aRuElBU19VfV3V/dXESuExy34l6Y2bt26vrd/p3L13/8FGd/PhkTfBcRhwI407njAPUmgYoEAJx9YBUxMJw8n527Y+/AzOC6MPcWphpNipFpXgDGNq3P1eGAuOoXGaKag/WoAy2GZH79LXtKgc4zU9HOdt3LSoP0OXtaUfXnBsLD+n+xfxJaN/hbHo1Rl3t7NeNjN6HeQLsL23/+P31s8/nw7Gm8luURoeFGjkknl/kvctjmrmUHAJTacIHizj5+wU6plGDX0WUyWtjItHI51lV3hqqhieRWLr/P+lk4DVq1EttA0Ims97VUFSNLTVlJbCAUc5pYzz+KTAMI7iZyz+G6P2K2M8034+qFjCTuFAwxdulGK6rIuKKSGnJVQsSGzqwldLvNpJBC3wa0x64QGDreMmXyhTQlxPq3m74SaKm1+V8jo46vfyrJd/iCq/IXNbJ0/JFtkhOXlJ9sh7ckAGhJO/yUbyOHmSvktl6tMwp6bJ4s4jsmLpt38pldr2</latexit><latexit sha1_base64="c7dBg+NdHIt6oHYWZ/7Tc500Zjo=">AAACxHicZVFLaxRBEO4ZH4nrIxu9KF6aBCFBXGb2ohchaBCPEbPZQGZZe3tqkib9ortaXYbx4u8TvIpnD/4Le/aRuElBU19VfV3V/dXESuExy34l6Y2bt26vrd/p3L13/8FGd/PhkTfBcRhwI407njAPUmgYoEAJx9YBUxMJw8n527Y+/AzOC6MPcWphpNipFpXgDGNq3P1eGAuOoXGaKag/WoAy2GZH79LXtKgc4zU9HOdt3LSoP0OXtaUfXnBsLD+n+xfxJaN/hbHo1Rl3t7NeNjN6HeQLsL23/+P31s8/nw7Gm8luURoeFGjkknl/kvctjmrmUHAJTacIHizj5+wU6plGDX0WUyWtjItHI51lV3hqqhieRWLr/P+lk4DVq1EttA0Ims97VUFSNLTVlJbCAUc5pYzz+KTAMI7iZyz+G6P2K2M8034+qFjCTuFAwxdulGK6rIuKKSGnJVQsSGzqwldLvNpJBC3wa0x64QGDreMmXyhTQlxPq3m74SaKm1+V8jo46vfyrJd/iCq/IXNbJ0/JFtkhOXlJ9sh7ckAGhJO/yUbyOHmSvktl6tMwp6bJ4s4jsmLpt38pldr2</latexit><latexit sha1_base64="KJvsRPzR+AV6bsz6POjq0fOL+lc=">AAACxHicZVHbbhMxEPUutxIuTcsLiBeLCKkVItrNC7wgVYAQj0U0TaVuFDne2daqb7LH0Gi1vPCVfAJ/gTfZtKQdyZozM8cz9pm5lcJjlv1J0jt3791/sPWw9+jxk6fb/Z3dY2+C4zDmRhp3MmcepNAwRoESTqwDpuYSJvOLT2198gOcF0Yf4cLCVLEzLSrBGcbUrP+7MBYcQ+M0U1B/twBlsM2e3qcfaFE5xmt6NMvbuGnRaImua2s/ueLYWH5DP1/F14zRDUbXqzfrD7JhtjR6G+QdGJDODmc7yX5RGh4UaOSSeX+ajyxOa+ZQcAlNrwgeLOMX7AzqpUYNfR1TJa2Mi0cjXWY3eGqhGJ5HYuv8/6XTgNX7aS20DQiar3pVQVI0tNWUlsIBR7mgjPP4pMAwjuLnLP4bo/YbYzzTfjWoWMNe4UDDT26UYrqsi4opIRclVCxIbOrCV2u82UkELfAyJr3wgMHWcZNvlSkhrqfVvN1wE8XNb0p5GxyPhnk2zL9lg4OPncxb5CV5RfZITt6RA/KVHJIx4eRvsp08T16kX1KZ+jSsqGnS3XlGNiz99Q/BhtaT</latexit>

(25)

Other Consequences of Brent's Theorem

§

Obviously,

§

In the sequential world, time = work:

§

In the parallel world:

§

Our speedup is

§

Assume,

i.e., our parallel algorithm would do asymptotically more work

§

Then,

because, on real hardware, p is bounded

Speedup(n)  p

T

S

(n) = W

S

(n)

TP(n) = WPp(n) + D(n)

W

P

(n) 2 ⌦ ( W

S

(n) )

Speedup(n) = TTS(n)

P(n) = WPW(n)S(n)

p +D(n)

Speedup(n) = WS(n)

⌦(WS(n) ) + D(n) ! 0 as n ! 1

(26)

§

Now, look at work-efficient parallel algorithms, i.e.

§

Then,

§

In this situation, we will achieve the optimal speedup of O(p), so long as

§

Consequence: given two work-efficient parallel algorithms, the one with the smaller depth complexity is better, because we can run it on hardware with more processors (cores) and still obtain a speedup of p over the sequential algorithm (in theory).

We say this algorithm scales better.

W

P

(n) 2 ⇥ ( W

S

(n) )

Speedup(n) = W (n)

W(n)

p + D(n) = pW(n)

W(n) + pD(n)

p 2 O W (n) D(n)

(27)

Limitations of Brent's Theorem

§

Brent's theorem is based on the PRAM model

§

That model makes a number of unrealistic assumptions:

§ Memory access has zero latency

§ Memory bandwidth is infinite

§ No synchronization among processors (threads) is necessary

§ Arithmetic operations cost unit time

§

With current hardware, rather the opposite is realistic

(28)

Digression: Radix-Sort

§

Modelled after sorting machines of post routing centers (but with a twist!)

§

Disadvantages:

§ Not generic like Quicksort, which require only a compare operator on pairs of elements

§ Works only on elements with a known, pre-defined

numeric representation (e.g., 32 bits)

§ Different representation require different versions of radix sort

§

Vorteil: sehr effizient!

(29)

§

Observation: integers can be represented with any base r

§

Naive (intuitive) idea:

§ Sort all elements according to the most significand digit into bins (one bin per digit)

§ Sort bin 0 using radix sort recursively

§ Sort bin 1 recursively, etc. …

§

This is called MSD radix sort (MSD = most significant digit)

§

For the algorithm on the next slide:

§ Choose radix r and fix it

§ Define z(t,a) = t-tth digit of number a when represented over base r,

(30)

The Algorithm (in Python)

A = array of numbers

i = current digit used for sorting ( 0 <= i <= d-1 ) d = total number of digits (same for all keys)

def msd_radix_sort( A, i, d ):

# init array of r empty lists = [ [], [], [], … ] bin = r * [[]]

# distribute all A's in bins according to z(i,.) for j in range(0, len(A) ):

bin[ z(i, A[j]) ].append( A[j] )

# sort bins if i >= 0:

for j in range(0, r):

msd_radix_sort( bin[j], i-1, d )

# gather bins A = []

for j in range(0, r):

A.extend( bin[j] ) bin[j] = []

Optional

(31)

Example

§

Keys = integers with 64 bits

§

Size of input = 224 (ca. 16 mio)

§

We choose r = 28 = 256 as base

§ E.g. "digits" = characters in fixed-length strings

§

On the first recursion level, the algo checks the left-most 8 bits of the keys and distributes each key into one of 256 bins

§

Average (expected) size of the bins (assuming uniform distribution of the keys) = 224 / 28 = 216 = 65536

(32)

§

Recursion tree:

§

Problem: in each recursion, we need to save r-1 many bins

§ Lots of house keeping necessary

§ Either using marker arrays like with Counting Sort

§ Or using arrays of lists (lots of allocations / deallocations)

(33)

§

Solution: LSD Radix-Sort (aka. Backward Radix-Sort)

§ First, sort according to least-significand digit, then according to least but second digit, etc.

§

Let d = number of digits, digit 0 = least-significand one

§

The algorithm:

§ Use, e.g., Counting Sort inside the loop lsd_radix_sort( A ):

for i = 0, ..., d-1:

do a stable sort on A with the

i-th digit of the elements as the key

(34)

Example

§

Sort 12 letters according to the post code (zip code)

§

In the first iteration, consider only the last digit

§ Notice that letters with the same digit did not change their position relative to each other!

Letters before the first iteration Letters after the first iteration

(35)

§

Sort by last but second digit

Letters before the second iteration Letters after the second iteration

(36)
(37)

Parallel Radix Sort, Based on the Split Operation

§

The split operation: rearrange elements according to a flag

§ Note: split maintains order within each group! (i.e., it is stable)

§

Radix sort (massively parallel):

where split(i,a) rearranges a by moving all keys that have bit i = 0 to the front, and all keys that have bit i = 1 to the back

(bit 0 = LSB)

1 1

1 0

0 0

0 0

1 0 0

0 1

0 0

1

radix_sort( array a, int len ):

for i = 0...numbits-1: // important: go from low to high bit!

split(i, a) // split a, based on bit i of keys

Flags.

Usually, there are payload data, too (omitted here)

(38)

Algorithm for the Split Operation

§

Split's job:

§ Determine new index for each element

§ Then perform the permutation

§

Algorithm (by way of the example):

§ Consider lowest bit of the keys 1. Compute exclusive "0"-scan:

fi = # 0's in (a0, …, ai-1)

2. Set F = total number of "0"s

3. Construct d = new pos. of ai's

- If ai's bit = 0 → new pos. di = fi - If ai = 1 → new pos. di = F + (i – fi),

because i – fi = # 1's to the left of i

100 111 010 110 001 101 001 000

0 1 2 3 4 5 6 7

a:

i:

=

(fn 1 + 1 an 1 = 0 fn 1 an 1 = 1

0 1 1 2 3 3 3 3

f:

4+(1-1) 4+(4-3)4+(5-3)4+(6-3)

dfor "1"s:

0 1 2 3

dfor "0"s: F=4

0 4 1 2 5 6 7 3

d:

4 7 2 6 1 5 1 0

0 1 2 3 4 5 6 7

a:

i:

4 2 6 0 7 1 5 1

a':

Example: split based on bit 0

(39)

§

A conceptual algorithm for the "0"-scan:

§ Extract the relevant bit (conceptually only)

§ Invert the bit

§ Compute regular prefix sum with +-operation

§

In a real implementation, you would, of course, implement this as a native "0"-scan routine with a special "+" operation!

§

Depth complexity:

100 111 010 110 001 101 001 000

a:

1 0 1 1 0 0 0 1

a':

0 1 1 2 3 3 3 3

f:

O(b * log(n) ), where b = #bits per integer, and n = # elements

(40)

Stream Compaction

§

Given: input stream A, and a flag/predicate for each ai

§

Goal: output stream A' that contains only ai's, for which flag = true

§

Example:

§ Given: array of upper and lower case letters

§ Goal: delete lower case letters and compact the others

to the front of the array

§

Solution:

§ Just like with the split operation, except we don't compute indices for the

"to-be-deleted" elements

§

Frequent task: e.g., collision detection,

§

Sometimes also called list packing, or stream packing

A x C P h w b Z

a:

1 0 1 1 0 0 0 1

a':

A C P Z

b:

(41)

Sparse Matrices

§

"Unstructured" sparse matrices:

§ Most common storage format is Compressed Sparse Row (CSR)

§ Matrix M, size m×n , k non-zero elements

§ Stored in three arrays V, C, R

- Row i of matrix M is stored in

- Element Vj in M's i-th row represents element

V C

R

VRi,. . .,VRi+1 1

<latexit sha1_base64="adkwkxf9pqSM4uu7pOYREbzTdT8=">AAACX3icZVBNbxMxEJ0sHy0BSgonhIQsAhKINFr3Ui5IEVw4FkTSSt1o5XhnW6v+WNljSrTaG7+GK/wZbvBP2E1aidAnWX56M543fotKq0Bp+quX3Lh56/bW9p3+3Xv3dx4Mdh/Ogote4lQ67fzxQgTUyuKUFGk8rjwKs9B4tDh/39WPvqAPytnPtKxwbsSpVaWSglopHzyd5fWnXDUjlunCURixlVCr17zZ400+GKbjdAV2nfBLMpw8z0a/AeAw3+29ygono0FLUosQTvh+RfNaeFJSY9PPYsBKyHNxivVq/Ya9aKWClc63xxJbqRt9ZmkEnbWN3RX+LZ1EKt/Ma2WrSGjlelYZNSPHuu+yQnmUpJdMSNmuFAW1VvJMeCGpjWXDJggb1kbZFe1nHi1eSGeMsEWdlcIovSywFFFTU2ehvOKbk1S0ir62YlABKVZ1hX7PuALZW1Z23m34Xbj8/yivk9n+mKdj/pEPJ+9gjW14As/gJXA4gAl8gEOYgoRv8B1+wM/en2Qr2UkG69akd/nmEWwgefwXIDi6WQ==</latexit><latexit sha1_base64="asGJRMYiMnpeSnTu0+CRvxlZcwo=">AAACX3icZVBNbxMxEHW2QEqANoUTQkIWAQlEGq17KZdKEVw4FkTSSt1o5XhnW6v+WNljaLTaW/9E/wJX+DMckfgheJNWIvRJlp/ejOeN37xS0mOa/uokG3fu3utu3u89ePhoa7u/83jqbXACJsIq647n3IOSBiYoUcFx5YDruYKj+fmHtn70FZyX1nzBRQUzzU+NLKXgGKW8/3ya159z2QxppgqLfkiXQi3fsmaXNXl/kI7SJehtwq7JYPwyG/4prrqH+U7nTVZYETQYFIp7f8L2KpzV3KEUCppeFjxUXJzzU6iX6zf0VZQKWloXj0G6VNf69EJzPIuN7eX/LZ0ELN/NammqgGDEalYZFEVL2+/SQjoQqBaUCxFXChyjlTjjjguMsazZeG78yii7ob3MgYFvwmrNTVFnJddSLQooeVDY1Jkvb/j6JBmMxIsoeukBQ1VX4Ha1LYAe0LL1juG34bL/o7xNpnsjlo7YJzYYvycrbJJn5AV5TRjZJ2PykRySCRHkknwnP8jPzu+km2wl/VVr0rl+84SsIXn6F9wcu2E=</latexit><latexit sha1_base64="asGJRMYiMnpeSnTu0+CRvxlZcwo=">AAACX3icZVBNbxMxEHW2QEqANoUTQkIWAQlEGq17KZdKEVw4FkTSSt1o5XhnW6v+WNljaLTaW/9E/wJX+DMckfgheJNWIvRJlp/ejOeN37xS0mOa/uokG3fu3utu3u89ePhoa7u/83jqbXACJsIq647n3IOSBiYoUcFx5YDruYKj+fmHtn70FZyX1nzBRQUzzU+NLKXgGKW8/3ya159z2QxppgqLfkiXQi3fsmaXNXl/kI7SJehtwq7JYPwyG/4prrqH+U7nTVZYETQYFIp7f8L2KpzV3KEUCppeFjxUXJzzU6iX6zf0VZQKWloXj0G6VNf69EJzPIuN7eX/LZ0ELN/NammqgGDEalYZFEVL2+/SQjoQqBaUCxFXChyjlTjjjguMsazZeG78yii7ob3MgYFvwmrNTVFnJddSLQooeVDY1Jkvb/j6JBmMxIsoeukBQ1VX4Ha1LYAe0LL1juG34bL/o7xNpnsjlo7YJzYYvycrbJJn5AV5TRjZJ2PykRySCRHkknwnP8jPzu+km2wl/VVr0rl+84SsIXn6F9wcu2E=</latexit><latexit sha1_base64="asGJRMYiMnpeSnTu0+CRvxlZcwo=">AAACX3icZVBNbxMxEHW2QEqANoUTQkIWAQlEGq17KZdKEVw4FkTSSt1o5XhnW6v+WNljaLTaW/9E/wJX+DMckfgheJNWIvRJlp/ejOeN37xS0mOa/uokG3fu3utu3u89ePhoa7u/83jqbXACJsIq647n3IOSBiYoUcFx5YDruYKj+fmHtn70FZyX1nzBRQUzzU+NLKXgGKW8/3ya159z2QxppgqLfkiXQi3fsmaXNXl/kI7SJehtwq7JYPwyG/4prrqH+U7nTVZYETQYFIp7f8L2KpzV3KEUCppeFjxUXJzzU6iX6zf0VZQKWloXj0G6VNf69EJzPIuN7eX/LZ0ELN/NammqgGDEalYZFEVL2+/SQjoQqBaUCxFXChyjlTjjjguMsazZeG78yii7ob3MgYFvwmrNTVFnJddSLQooeVDY1Jkvb/j6JBmMxIsoeukBQ1VX4Ha1LYAe0LL1juG34bL/o7xNpnsjlo7YJzYYvycrbJJn5AV5TRjZJ2PykRySCRHkknwnP8jPzu+km2wl/VVr0rl+84SsIXn6F9wcu2E=</latexit><latexit sha1_base64="vYhuK9XKkcgO7w6XSWvd5Am83JA=">AAACX3icZVBNTxsxEHW2pUBKaWhPCKmyGiFRFaI1F3qphMqFI1RNQGKjleOdBQt/rOxxabTaW39Nr+2f6ZF/gjcJUlOeZPnpzXje+E0qJT2m6d9O8uz5yovVtfXuy41Xm697W29G3gYnYCissu5ywj0oaWCIEhVcVg64nii4mNyetPWL7+C8tOYbTisYa35tZCkFxyjlvXejvP6ay2afZqqw6PfpTKjlR9YcsCbv9dNBOgN9StiC9MkCZ/lW50NWWBE0GBSKe3/FDisc19yhFAqabhY8VFzc8muoZ+s3dDdKBS2ti8cgnalLfXqqOd7Exvby/5auApafxrU0VUAwYj6rDIqipe13aSEdCFRTyoWIKwWO0UrccMcFxliWbDw3fm6UPdJu5sDAnbBac1PUWcm1VNMCSh4UNnXmy0e+PEkGI/FHFL30gKGqK3AH2hZAP9Oy9Y7ht+Gy/6N8SkaHA5YO2DnrH39ZxLxGdsh7skcYOSLH5JSckSER5Cf5RX6TP537ZDXZTHrz1qSzePOWLCHZfgAVe7iN</latexit>

Mi,Cj

<latexit sha1_base64="L7j+dafKqDJxwSNIwdYEgYgZeTQ=">AAACS3icZVBNaxRBEK3ZGI3rVxKPXhqXgIIuM7noJbAkFy9CBDcJZJalt6cmadNfdFdHl2H+Rq7mD/kD/B3exIM9uwm45kHTj1fV9arfzCkZKM9/Zr21e+v3H2w87D96/OTps82t7aNgoxc4FlZZfzLjAZU0OCZJCk+cR65nCo9nFwdd/fgSfZDWfKa5w4nmZ0bWUnBKUvlx2sg3rDmYfmnb6eYgH+YLsLukuCGD0U4RrwHgcLqVvS4rK6JGQ0LxEE6LXUeThnuSQmHbL2NAx8UFP8NmsWrLdpJUsdr6dAyxhbrSp+ea03lq7K7wb+k0Uv1+0kjjIqERy1l1VIws677GKulRkJozLkRaKXJKVuKcey4oRbBiE7gJS6PylvZLjwa/Cqs1N1VT1lxLNa+w5lFR25ShvuWrk2Q0kr4lMciAFF3j0L/VtkK2x+rOOwXdhVv8H+VdcrQ7LPJh8akYjPZhiQ14AS/hFRTwDkbwAQ5hDAIcXMF3uM5+ZL+y39mfZWsvu3nzHFbQW/8LhRC2Uw==</latexit><latexit sha1_base64="KhP90KnRRx+Q0keEcupMlIPLdGc=">AAACS3icZVDLahRBFK0ejcbxkUSXbgqHgIIO3dnoRhjMxo0QwUkC6WG4U307KVMvqm6pQ9NL/0Dc6h/kS/IB+Q534sLqmQQcc6Cow7m37rl1Zk7JQHl+kfVu3Fy7dXv9Tv/uvfsPNja3Hu4HG73AsbDK+sMZBFTS4JgkKTx0HkHPFB7MTne7+sEn9EFa84HmDicajo2spQBKUvlu2sjnvNmdfmzb6eYgH+YL8OukuCSD0XYRz86+ftubbmXPysqKqNGQUBDCUbHjaNKAJykUtv0yBnQgTuEYm8WqLd9OUsVr69MxxBfqSp+ea6CT1Nhd4d/SUaT61aSRxkVCI5az6qg4Wd59jVfSoyA15yBEWikCJStxAh4EpQhWbAKYsDQqr2i/9Gjws7Bag6masgYt1bzCGqKitilDfcVXJ8loJH1JYpABKbrGoX+hbYX8Na877xR0F27xf5TXyf7OsMiHxftiMHrDllhnj9kT9pQV7CUbsbdsj42ZYI59Zz/Yz+w8+5X9zv4sW3vZ5ZtHbAW9tb+cmLgq</latexit><latexit sha1_base64="KhP90KnRRx+Q0keEcupMlIPLdGc=">AAACS3icZVDLahRBFK0ejcbxkUSXbgqHgIIO3dnoRhjMxo0QwUkC6WG4U307KVMvqm6pQ9NL/0Dc6h/kS/IB+Q534sLqmQQcc6Cow7m37rl1Zk7JQHl+kfVu3Fy7dXv9Tv/uvfsPNja3Hu4HG73AsbDK+sMZBFTS4JgkKTx0HkHPFB7MTne7+sEn9EFa84HmDicajo2spQBKUvlu2sjnvNmdfmzb6eYgH+YL8OukuCSD0XYRz86+ftubbmXPysqKqNGQUBDCUbHjaNKAJykUtv0yBnQgTuEYm8WqLd9OUsVr69MxxBfqSp+ea6CT1Nhd4d/SUaT61aSRxkVCI5az6qg4Wd59jVfSoyA15yBEWikCJStxAh4EpQhWbAKYsDQqr2i/9Gjws7Bag6masgYt1bzCGqKitilDfcVXJ8loJH1JYpABKbrGoX+hbYX8Na877xR0F27xf5TXyf7OsMiHxftiMHrDllhnj9kT9pQV7CUbsbdsj42ZYI59Zz/Yz+w8+5X9zv4sW3vZ5ZtHbAW9tb+cmLgq</latexit><latexit sha1_base64="KhP90KnRRx+Q0keEcupMlIPLdGc=">AAACS3icZVDLahRBFK0ejcbxkUSXbgqHgIIO3dnoRhjMxo0QwUkC6WG4U307KVMvqm6pQ9NL/0Dc6h/kS/IB+Q534sLqmQQcc6Cow7m37rl1Zk7JQHl+kfVu3Fy7dXv9Tv/uvfsPNja3Hu4HG73AsbDK+sMZBFTS4JgkKTx0HkHPFB7MTne7+sEn9EFa84HmDicajo2spQBKUvlu2sjnvNmdfmzb6eYgH+YL8OukuCSD0XYRz86+ftubbmXPysqKqNGQUBDCUbHjaNKAJykUtv0yBnQgTuEYm8WqLd9OUsVr69MxxBfqSp+ea6CT1Nhd4d/SUaT61aSRxkVCI5az6qg4Wd59jVfSoyA15yBEWikCJStxAh4EpQhWbAKYsDQqr2i/9Gjws7Bag6masgYt1bzCGqKitilDfcVXJ8loJH1JYpABKbrGoX+hbYX8Na877xR0F27xf5TXyf7OsMiHxftiMHrDllhnj9kT9pQV7CUbsbdsj42ZYI59Zz/Yz+w8+5X9zv4sW3vZ5ZtHbAW9tb+cmLgq</latexit><latexit sha1_base64="jCzWxUv57dGH0FjiSJfbfYFtFF4=">AAACS3icZVBNaxRBEO3ZGI2r0SQevTQugoIuM3vRixDMxYuQgJsEMstS21OTdNJfdFeryzB/w6v+IX+Av8ObeEjP7gZc86Dpx6vqetVv5pQMlOe/st7Gnc2797bu9x883H70eGd37zjY6AWOhVXWn84goJIGxyRJ4anzCHqm8GR2ddDVTz6jD9KaTzR3ONFwbmQtBVCSyo/TRr7izcH0sm2nO4N8mC/Ab5NiRQZshcPpbvayrKyIGg0JBSGcFSNHkwY8SaGw7ZcxoANxBefYLFZt+fMkVby2Ph1DfKGu9em5BrpIjd0V/i2dRarfThppXCQ0YjmrjoqT5d3XeCU9ClJzDkKklSJQshIX4EFQimDNJoAJS6PyhvZLjwa/CKs1mKopa9BSzSusISpqmzLUN3x9koxG0tckBhmQomsc+tfaVsjf8brzTkF34Rb/R3mbHI+GRT4sjorB/vtVzFvsKXvGXrCCvWH77AM7ZGMmmGPf2Hf2I/uZ/c7+ZH+Xrb1s9eYJW0Nv8xqyDbSo</latexit>

(42)

§

Example:

M = 0 B B B B

@

a

0

0 0 a

1

0 0 a

2

0 0 0 0 0 a

3

0 0 0 a

4

0 a

5

a

6

0 0 0 0 a

7

1 C C C C A

<latexit sha1_base64="+MdAF5W+5Bxum+MuL62xLQ3xNW0=">AAAC2nicZVFLb9NAEF6bVzGPpnDkskpUBAeCHR7lghTBhQtSkUhbqRtZ4/U4XXW9tvYBRFYkxA1x5cJP49fAxm7apB1pvZ+/b2Zn9E1WS2FsHP8NwmvXb9y8tXU7unP33v3t3s6DA1M5zXHCK1npowwMSqFwYoWVeFRrhDKTeJidvl/qh19QG1Gpz3Ze47SEmRKF4GA9lfb+fKRvKctwJlRTl2C1+LaIII3pY7o6kCYtYizqfkdr4jndSS8u05C+PBdftd/X6xUXlXs0YqjyiyHS3iAexm3QqyA5A4Nxn/W/E0L2053gKcsr7kpUlksw5jgZ1XbagLaCS1xEzBmsgZ/CDJvWuQXd9VROi0r7oyxt2Y28cu7nOfGJy8usS8fOFm+mjVC1s6h491bhJLUVXTpNc6GRWzmnwLkfyYH1rfgJaODWb2SjjQFlukZsBSOmUeFXXpUleF9YAaWQ8xwLcNIuGmaKFd58STglrPePGWHQurqpUT8rqxz9ootlb7/3hTc3uWzlVXAwGibxMPmUDMbvSBdb5BHpkyckIXtkTD6QfTIhnPwLdoNh8Dxk4Y/wZ/irSw2Ds5qHZCPC3/8BV2nTpw==</latexit><latexit sha1_base64="QGNbtg4yXTSMe0CB/cZNiFv2vBc=">AAAC2nicZVFNb9QwEHXCVwlfWzhysXZVBAeWZKGUC9IKLlyQisS2lepVNHEmW6uOE9lOYRXlwq3iioT4PfwIxK8Bb9Jtd9uRHL+8N+MZvUlKKYwNw7+ef+36jZu3Nm4Hd+7eu/+gt/lwzxSV5jjhhSz0QQIGpVA4scJKPCg1Qp5I3E+O3y/0/RPURhTqs52XOM1hpkQmOFhHxb1fH+lbyhKcCVWXOVgtvjYBxCF9QpcH4qhFjAXd72hFPKc76eVlGuJX5+J2+329WnFRuUMDhiq9GCLuDcJh2Aa9CqIzMBj3Wf/05+8/u/Gm94ylBa9yVJZLMOYwGpV2WoO2gktsAlYZLIEfwwzr1rmGbjkqpVmh3VGWtuxaXj538xy5xMVlVqXDymZvprVQZWVR8e6trJLUFnThNE2FRm7lnALnbqQKrGvFj0ADt24ja20MKNM1YksYMI0Kv/Aiz8H5wjLIhZynmEElbVMzky3x+kuiUsI6/5gRBm1V1iXq53mRolt0tujt9t44c6PLVl4Fe6NhFA6jT9Fg/I50sUEekz55SiKyQ8bkA9klE8LJP2/LG3ovfOZ/80/9712q753VPCJr4f/4DxTj1eE=</latexit><latexit sha1_base64="QGNbtg4yXTSMe0CB/cZNiFv2vBc=">AAAC2nicZVFNb9QwEHXCVwlfWzhysXZVBAeWZKGUC9IKLlyQisS2lepVNHEmW6uOE9lOYRXlwq3iioT4PfwIxK8Bb9Jtd9uRHL+8N+MZvUlKKYwNw7+ef+36jZu3Nm4Hd+7eu/+gt/lwzxSV5jjhhSz0QQIGpVA4scJKPCg1Qp5I3E+O3y/0/RPURhTqs52XOM1hpkQmOFhHxb1fH+lbyhKcCVWXOVgtvjYBxCF9QpcH4qhFjAXd72hFPKc76eVlGuJX5+J2+329WnFRuUMDhiq9GCLuDcJh2Aa9CqIzMBj3Wf/05+8/u/Gm94ylBa9yVJZLMOYwGpV2WoO2gktsAlYZLIEfwwzr1rmGbjkqpVmh3VGWtuxaXj538xy5xMVlVqXDymZvprVQZWVR8e6trJLUFnThNE2FRm7lnALnbqQKrGvFj0ADt24ja20MKNM1YksYMI0Kv/Aiz8H5wjLIhZynmEElbVMzky3x+kuiUsI6/5gRBm1V1iXq53mRolt0tujt9t44c6PLVl4Fe6NhFA6jT9Fg/I50sUEekz55SiKyQ8bkA9klE8LJP2/LG3ovfOZ/80/9712q753VPCJr4f/4DxTj1eE=</latexit><latexit sha1_base64="QGNbtg4yXTSMe0CB/cZNiFv2vBc=">AAAC2nicZVFNb9QwEHXCVwlfWzhysXZVBAeWZKGUC9IKLlyQisS2lepVNHEmW6uOE9lOYRXlwq3iioT4PfwIxK8Bb9Jtd9uRHL+8N+MZvUlKKYwNw7+ef+36jZu3Nm4Hd+7eu/+gt/lwzxSV5jjhhSz0QQIGpVA4scJKPCg1Qp5I3E+O3y/0/RPURhTqs52XOM1hpkQmOFhHxb1fH+lbyhKcCVWXOVgtvjYBxCF9QpcH4qhFjAXd72hFPKc76eVlGuJX5+J2+329WnFRuUMDhiq9GCLuDcJh2Aa9CqIzMBj3Wf/05+8/u/Gm94ylBa9yVJZLMOYwGpV2WoO2gktsAlYZLIEfwwzr1rmGbjkqpVmh3VGWtuxaXj538xy5xMVlVqXDymZvprVQZWVR8e6trJLUFnThNE2FRm7lnALnbqQKrGvFj0ADt24ja20MKNM1YksYMI0Kv/Aiz8H5wjLIhZynmEElbVMzky3x+kuiUsI6/5gRBm1V1iXq53mRolt0tujt9t44c6PLVl4Fe6NhFA6jT9Fg/I50sUEekz55SiKyQ8bkA9klE8LJP2/LG3ovfOZ/80/9712q753VPCJr4f/4DxTj1eE=</latexit><latexit sha1_base64="ZwOyoS2mY6NU3G4wy39H97CAraM=">AAAC2nicZVFLb9NAEF6bVzGPpnDksiIqggPBDo9yQargwgWpSKSt1I2s8XqcrrpeW/sAIssXbogrF34avwY2dtIm7Ujr/fx9Mzujb7JaCmPj+G8QXrt+4+atrdvRnbv37m8Pdh4cmsppjhNeyUofZ2BQCoUTK6zE41ojlJnEo+zsw0I/+oraiEp9sfMapyXMlCgEB+updPDnE31HWYYzoZq6BKvF9zaCNKZP6OpAmnSIsaj/Ha+J53QvvbxMQ/rqXHzdfd+sV1xU7tGIocovhkgHw3gUd0GvgmQJhmQZB+lO8IzlFXclKsslGHOSjGs7bUBbwSW2EXMGa+BnMMOmc66lu57KaVFpf5SlHbuRV879PKc+cXGZdenE2eLttBGqdhYV798qnKS2ogunaS40civnFDj3IzmwvhU/BQ3c+o1stDGgTN+IrWDENCr8xquyBO8LK6AUcp5jAU7atmGmWOHNl4RTwnr/mBEGraubGvXzssrRL7pY9PZ7b725yWUrr4LD8SiJR8nnZLj/fmnzFnlEHpOnJCF7ZJ98JAdkQjj5F+wGo+BFyMIf4c/wV58aBsuah2Qjwt//AQML0kc=</latexit>

V

<latexit sha1_base64="Wx0lZ1PH1zcrcU3neZTK48t8oqc=">AAAChXicZVHbbhMxEJ0sl5ZwaQqPSMgiQmqlNtoNpeUFEQEPPLYSSSt1o8jxzrZWfVnZYyBa5Yf4Aj6D1/LEp+AkW4nQkaxzNDOeY5+ZVkp6StPrVnLn7r37G5sP2g8fPX6y1dl+OvI2OIFDYZV1Z1PuUUmDQ5Kk8KxyyPVU4en06uOifvoVnZfWfKFZhWPNL4wspeAUU5POpxF7x3YYn6R7LN+LmDXYb/B1gwcNvmnwsMEjttuedLppL10Gu02yhnQHL36e/AGA48l2azcvrAgaDQnFvT/P+hWNa+5ICoXzdh48Vlxc8Qusl5+cs1cxVbDSungMsWV2rU/PNKfL2LgA/2/pPFD5dlxLUwVCI1azyqAYWbYwhRXSoSA1Y1yI+KTAKUqJS+64oGjemoznxq+E8hvazh0a/Cas1twUdV5yLdWswJIHRfM69+UNX58kg5H0PSa99Eihqit0+9oWGHdSLrTjiubR3Ox/K2+TUb+Xpb3sJOsOPsAqNuE5vIQdyOAIBvAZjmEIAn7AL7iG38lGsp8cJIer1qTV3HkGa5G8/wvTEMCt</latexit><latexit sha1_base64="mzuhQbPf0fpuUaUNzxhDHLedmi8=">AAAChXicZVHbahsxEJW3l6TuzUkfC0HUFBJIzK6b20uoafvQxwRqJ5A1ZqydTUS00iKNkprFj/2ZfkE+o6/pL/QnKtsbqJsBcQ4zoznSmXGppKM4vmtEjx4/ebqy+qz5/MXLV69ba+sDZ7wV2BdGGXs2BodKauyTJIVnpUUoxgpPx1efZ/XTa7ROGv2NJiUOC7jQMpcCKKRGrS8DfsQ3OYzibZ5uB0xq7Nb4ocbdGvdq3K/xgG81R6123InnwR+SpCbt3sbtyZ8fG7fHo7XGVpoZ4QvUJBQ4d550SxpWYEkKhdNm6h2WIK7gAqv5J6f8fUhlPDc2HE18nl3qKyYF0GVonIH7t3TuKT8cVlKXnlCLxazcK06Gz0zhmbQoSE04CBGe5IGClLgEC4KCeUsyDrRbCKX3tJla1HgjTFGAzqo0h0KqSYY5eEXTKnX5PV+eJL2W9D0knXRIvqxKtDuFyTDsJJ9phxVNg7nJ/1Y+JINuJ4k7yUnS7n1ii1hlb9k7tskSdsB67Cs7Zn0m2E/2i92x39FKtBPtRvuL1qhR33nDliL6+BcsdMIT</latexit><latexit sha1_base64="mzuhQbPf0fpuUaUNzxhDHLedmi8=">AAAChXicZVHbahsxEJW3l6TuzUkfC0HUFBJIzK6b20uoafvQxwRqJ5A1ZqydTUS00iKNkprFj/2ZfkE+o6/pL/QnKtsbqJsBcQ4zoznSmXGppKM4vmtEjx4/ebqy+qz5/MXLV69ba+sDZ7wV2BdGGXs2BodKauyTJIVnpUUoxgpPx1efZ/XTa7ROGv2NJiUOC7jQMpcCKKRGrS8DfsQ3OYzibZ5uB0xq7Nb4ocbdGvdq3K/xgG81R6123InnwR+SpCbt3sbtyZ8fG7fHo7XGVpoZ4QvUJBQ4d550SxpWYEkKhdNm6h2WIK7gAqv5J6f8fUhlPDc2HE18nl3qKyYF0GVonIH7t3TuKT8cVlKXnlCLxazcK06Gz0zhmbQoSE04CBGe5IGClLgEC4KCeUsyDrRbCKX3tJla1HgjTFGAzqo0h0KqSYY5eEXTKnX5PV+eJL2W9D0knXRIvqxKtDuFyTDsJJ9phxVNg7nJ/1Y+JINuJ4k7yUnS7n1ii1hlb9k7tskSdsB67Cs7Zn0m2E/2i92x39FKtBPtRvuL1qhR33nDliL6+BcsdMIT</latexit><latexit sha1_base64="mzuhQbPf0fpuUaUNzxhDHLedmi8=">AAAChXicZVHbahsxEJW3l6TuzUkfC0HUFBJIzK6b20uoafvQxwRqJ5A1ZqydTUS00iKNkprFj/2ZfkE+o6/pL/QnKtsbqJsBcQ4zoznSmXGppKM4vmtEjx4/ebqy+qz5/MXLV69ba+sDZ7wV2BdGGXs2BodKauyTJIVnpUUoxgpPx1efZ/XTa7ROGv2NJiUOC7jQMpcCKKRGrS8DfsQ3OYzibZ5uB0xq7Nb4ocbdGvdq3K/xgG81R6123InnwR+SpCbt3sbtyZ8fG7fHo7XGVpoZ4QvUJBQ4d550SxpWYEkKhdNm6h2WIK7gAqv5J6f8fUhlPDc2HE18nl3qKyYF0GVonIH7t3TuKT8cVlKXnlCLxazcK06Gz0zhmbQoSE04CBGe5IGClLgEC4KCeUsyDrRbCKX3tJla1HgjTFGAzqo0h0KqSYY5eEXTKnX5PV+eJL2W9D0knXRIvqxKtDuFyTDsJJ9phxVNg7nJ/1Y+JINuJ4k7yUnS7n1ii1hlb9k7tskSdsB67Cs7Zn0m2E/2i92x39FKtBPtRvuL1qhR33nDliL6+BcsdMIT</latexit><latexit sha1_base64="MTqSy52YPbqzv8WpaOUdwGZdKPg=">AAAChXicZVFNTxsxEHWWttD0g0CPvViNKoEE0W5KoZeqqO2BI5WagMRG0cQ7CxZee2WPodEqf6i/plf6a+okRiJlJOs9zYzn2W8mtZKO0vSulaw9efpsfeN5+8XLV683O1vbQ2e8FTgQRhl7PgGHSmockCSF57VFqCYKzybX3+b1sxu0Thr9k6Y1jiq41LKUAiikxp3vQ/6Z73AYp3s83wuYRexH/BDxIOLHiIcRj/hue9zppr10EfwxySLpshin463Wbl4Y4SvUJBQ4d5H1axo1YEkKhbN27h3WIK7hEpvFJ2f8fUgVvDQ2HE18kV3pq6YV0FVonIN7WLrwVH4aNVLXnlCL5azSK06Gz03hhbQoSE05CBGe5IGClLgCC4KCeSsyDrRbCuX3tJ1b1HgrTFWBLpq8hEqqaYEleEWzJnflPV+dJL2W9CsknXRIvm5qtPuVKTDspJxrhxXNgrnZ/1Y+JsN+L0t72Y+se/w12rzB3rJ3bIdl7IgdsxN2ygZMsN/sD7tjf5P1ZD85SA6XrUkr3nnDViL58g/36b5m</latexit>

= (a

0

, a

1

, a

2

, a

3

, a

4

, a

5

, a

6

, a

7

) C

<latexit sha1_base64="XWQfpAJykbxFZMbbLxAsv0iSe4E=">AAACdHicZVFNaxsxEB1vv1L3y2mPLUXUBBJIza5TaC8F01x6TKBOAlljZO1sIqKPRRo1MYt/SP9Jb722P6G3/oqeK+86EDcDGj3ejOZJT7NKSU9p+ruT3Ll77/6DjYfdR4+fPH3W23x+5G1wAsfCKutOZtyjkgbHJEnhSeWQ65nC49nF/rJ+/BWdl9Z8oXmFE83PjCyl4BSpaW9vn31k2yxlu/ku22ty1uThDdzy71Z5Z9rrp4O0CXYbZCvQH73+fvgHAA6mm52dvLAiaDQkFPf+NBtWNKm5IykULrp58FhxccHPsG7etGBbkSpYaV1chljDrvXpueZ0HhuXm79ZOg1UfpjU0lSB0Ih2VhkUI8uWHrBCOhSk5owLEa8UOEUpcc4dFxS9WpPx3PhWKL+G3dyhwUthteamqPOSa6nmBZY8KFrUuS+v8fokGYykq0h66ZFCVVfo3mpbYPyCcqkdf2QRzc3+t/I2OBoOsnSQHWb90SdoYwNewhvYhgzewwg+wwGMQcA3+AE/4Vfnb/Iq6SdbbWvSWZ15AWuRDP4B4qq62w==</latexit><latexit sha1_base64="TdGzCAQxiza6XKZ1+0X1bRj4HtA=">AAACdHicZVFNTxsxEHW2tIX0g9AeQchqhAQSjXYDEr1UiuDCESQCSGwUOd5ZsPDHyh4XolWO/RH9J9x6pT+hv6H3nnF2g0RgJI+f3ozn2c+jQgqHcfy3Eb1aeP3m7eJS8937Dx+XWyufTp3xlkOfG2ns+Yg5kEJDHwVKOC8sMDWScDa6PpjWz36AdcLoExwXMFDsUotccIaBGrZ2Duh3ukljup1u050qJ1XuPsE1vzvLW8NWO+7EVdCXIJmBdm/97vjfz/W7o+FKYyvNDPcKNHLJnLtIugUOSmZRcAmTZuodFIxfs0soqzdN6EagMpobG5ZGWrFzfWqsGF6FxunmnpYuPObfBqXQhUfQvJ6Ve0nR0KkHNBMWOMoxZZyHK3mGQYpfMcs4Bq/mZBzTrhZKH2EztaDhhhulmM7KNGdKyHEGOfMSJ2Xq8kc8P0l4LfA2kE44QF+UBdivymQQviCfaocfmQRzk+dWvgSn3U4Sd5LjpN3bJ3UsklXyhWyShOyRHjkkR6RPOPlFfpN78qfxP1qL2tFG3Ro1Zmc+k7mIOg88DrxB</latexit><latexit sha1_base64="TdGzCAQxiza6XKZ1+0X1bRj4HtA=">AAACdHicZVFNTxsxEHW2tIX0g9AeQchqhAQSjXYDEr1UiuDCESQCSGwUOd5ZsPDHyh4XolWO/RH9J9x6pT+hv6H3nnF2g0RgJI+f3ozn2c+jQgqHcfy3Eb1aeP3m7eJS8937Dx+XWyufTp3xlkOfG2ns+Yg5kEJDHwVKOC8sMDWScDa6PpjWz36AdcLoExwXMFDsUotccIaBGrZ2Duh3ukljup1u050qJ1XuPsE1vzvLW8NWO+7EVdCXIJmBdm/97vjfz/W7o+FKYyvNDPcKNHLJnLtIugUOSmZRcAmTZuodFIxfs0soqzdN6EagMpobG5ZGWrFzfWqsGF6FxunmnpYuPObfBqXQhUfQvJ6Ve0nR0KkHNBMWOMoxZZyHK3mGQYpfMcs4Bq/mZBzTrhZKH2EztaDhhhulmM7KNGdKyHEGOfMSJ2Xq8kc8P0l4LfA2kE44QF+UBdivymQQviCfaocfmQRzk+dWvgSn3U4Sd5LjpN3bJ3UsklXyhWyShOyRHjkkR6RPOPlFfpN78qfxP1qL2tFG3Ro1Zmc+k7mIOg88DrxB</latexit><latexit sha1_base64="TdGzCAQxiza6XKZ1+0X1bRj4HtA=">AAACdHicZVFNTxsxEHW2tIX0g9AeQchqhAQSjXYDEr1UiuDCESQCSGwUOd5ZsPDHyh4XolWO/RH9J9x6pT+hv6H3nnF2g0RgJI+f3ozn2c+jQgqHcfy3Eb1aeP3m7eJS8937Dx+XWyufTp3xlkOfG2ns+Yg5kEJDHwVKOC8sMDWScDa6PpjWz36AdcLoExwXMFDsUotccIaBGrZ2Duh3ukljup1u050qJ1XuPsE1vzvLW8NWO+7EVdCXIJmBdm/97vjfz/W7o+FKYyvNDPcKNHLJnLtIugUOSmZRcAmTZuodFIxfs0soqzdN6EagMpobG5ZGWrFzfWqsGF6FxunmnpYuPObfBqXQhUfQvJ6Ve0nR0KkHNBMWOMoxZZyHK3mGQYpfMcs4Bq/mZBzTrhZKH2EztaDhhhulmM7KNGdKyHEGOfMSJ2Xq8kc8P0l4LfA2kE44QF+UBdivymQQviCfaocfmQRzk+dWvgSn3U4Sd5LjpN3bJ3UsklXyhWyShOyRHjkkR6RPOPlFfpN78qfxP1qL2tFG3Ro1Zmc+k7mIOg88DrxB</latexit><latexit sha1_base64="WCPO+XRvuACTZwa8/unq82HLC90=">AAACdHicZVFNTxsxEHUW2tLQj1CO7cEiQgKJRruhUnuphMqlRyo1gMRG0cQ7Cxb+WNnjttEqP4Rf02v7E/gjPdfZLBKBkTx+ejOeZz9PKyU9peltJ1lbf/L02cbz7uaLl69e97benHobnMCRsMq68yl4VNLgiCQpPK8cgp4qPJteHy/qZz/QeWnNd5pVONZwaWQpBVCkJr3DY/6Z7/GUH+QH/LDJWZOH9/CS/9Dm/Umvnw7SJvhjkLWgz9o4mWx19vPCiqDRkFDg/UU2rGhcgyMpFM67efBYgbiGS6ybN835bqQKXloXlyHesCt9eqaBrmLjYvP3SxeByk/jWpoqEBqxnFUGxcnyhQe8kA4FqRkHIeKVAlCUElfgQFD0akXGg/FLofwOdnOHBn8KqzWYos5L0FLNCiwhKJrXuS/v8OokGYykX5H00iOFqq7Qvde2wPgF5UI7/sg8mps9tPIxOB0OsnSQfcv6R19amzfYW7bD9ljGPrIj9pWdsBET7Ib9Zn/Y386/5F3ST3aXrUmnPbPNViIZ/AcHkriU</latexit>

= (0, 3, 1, 2, 1, 3, 4, 4)

R

<latexit sha1_base64="NEj1efQu+P/ZTiabQ8C52re8q4o=">AAACaHicZVFNaxsxEB1vv1L3y0kPpRSKiClNIDW7biC5FEx76TEpdRLIGiNrZxMRfSzSKKlZ/BP6a3rpofkhufXcX1F5nUDdDOjp8TSapxlNKiU9pelVK7lz9979BysP248eP3n6rLO6duBtcAKHwirrjibco5IGhyRJ4VHlkOuJwsPJ2af5+eE5Oi+t+UrTCkeanxhZSsEpSuPO2y/sA9tgKdvKt1i/wfcNbje40+Au2xx3umkvbYLdJtk16Q5e/9z/DQB749XWZl5YETQaEop7f5z1KxrV3JEUCmftPHisuDjjJ1g3fczYmygVrLQuLkOsUZfy9FRzOo2J883/e3QcqNwd1dJUgdCIRa0yKEaWzftmhXQoSE0ZFyI+KXCKVuKUOy4ozmfJxnPjF0b5DW3nDg1eCKs1N0Wdl1xLNS2w5EHRrM59ecOXK8lgJH2LopceKVR1he6dtgXGsZdz7/gLszjc7P9R3iYH/V6W9rL9rDv4CItYgVewDhuQwQ4M4DPswRAEfIcf8AsuW3+STvIieblITVrXd57DUiTrfwGk8rgw</latexit><latexit sha1_base64="nkM+NWRgN5uHCfx2fNGUh85tnQc=">AAACaHicZVFNbxMxEHWWrxK+UjgghFRZjRCt1Ea7oVJ7QYrgwrFFpK3UjaKJd7a16o+VPaZEqxw58mu49AA/hL8AfwJn00qEjuTnp+fxPM94UinpKU1/tZJbt+/cvbdyv/3g4aPHTzqrTw+9DU7gUFhl3fEEPCppcEiSFB5XDkFPFB5Nzt/Pz48+o/PSmk80rXCk4dTIUgqgKI07rz/yt3yDp3wr3+L9Bt80uNPgboN7fHPc6aa9tAl+k2RXpDtYuzz483Xtcn+82trMCyuCRkNCgfcnWb+iUQ2OpFA4a+fBYwXiHE6xbvqY8VdRKnhpXVyGeKMu5empBjqLifPN/3t0EqjcG9XSVIHQiEWtMihOls/75oV0KEhNOQgRnxSAopU4AweC4nyWbDwYvzDKr2k7d2jwQlitwRR1XoKWalpgCUHRrM59ec2XK8lgJH2JopceKVR1hW5b2wLj2Mu5d/yFWRxu9v8ob5LDfi9Le9lB1h28Y4tYYS/ZOttgGdtlA/aB7bMhE+wb+85+sJ+t30kneZ68WKQmras7z9hSJOt/Af5HuZY=</latexit><latexit sha1_base64="nkM+NWRgN5uHCfx2fNGUh85tnQc=">AAACaHicZVFNbxMxEHWWrxK+UjgghFRZjRCt1Ea7oVJ7QYrgwrFFpK3UjaKJd7a16o+VPaZEqxw58mu49AA/hL8AfwJn00qEjuTnp+fxPM94UinpKU1/tZJbt+/cvbdyv/3g4aPHTzqrTw+9DU7gUFhl3fEEPCppcEiSFB5XDkFPFB5Nzt/Pz48+o/PSmk80rXCk4dTIUgqgKI07rz/yt3yDp3wr3+L9Bt80uNPgboN7fHPc6aa9tAl+k2RXpDtYuzz483Xtcn+82trMCyuCRkNCgfcnWb+iUQ2OpFA4a+fBYwXiHE6xbvqY8VdRKnhpXVyGeKMu5empBjqLifPN/3t0EqjcG9XSVIHQiEWtMihOls/75oV0KEhNOQgRnxSAopU4AweC4nyWbDwYvzDKr2k7d2jwQlitwRR1XoKWalpgCUHRrM59ec2XK8lgJH2JopceKVR1hW5b2wLj2Mu5d/yFWRxu9v8ob5LDfi9Le9lB1h28Y4tYYS/ZOttgGdtlA/aB7bMhE+wb+85+sJ+t30kneZ68WKQmras7z9hSJOt/Af5HuZY=</latexit><latexit sha1_base64="nkM+NWRgN5uHCfx2fNGUh85tnQc=">AAACaHicZVFNbxMxEHWWrxK+UjgghFRZjRCt1Ea7oVJ7QYrgwrFFpK3UjaKJd7a16o+VPaZEqxw58mu49AA/hL8AfwJn00qEjuTnp+fxPM94UinpKU1/tZJbt+/cvbdyv/3g4aPHTzqrTw+9DU7gUFhl3fEEPCppcEiSFB5XDkFPFB5Nzt/Pz48+o/PSmk80rXCk4dTIUgqgKI07rz/yt3yDp3wr3+L9Bt80uNPgboN7fHPc6aa9tAl+k2RXpDtYuzz483Xtcn+82trMCyuCRkNCgfcnWb+iUQ2OpFA4a+fBYwXiHE6xbvqY8VdRKnhpXVyGeKMu5empBjqLifPN/3t0EqjcG9XSVIHQiEWtMihOls/75oV0KEhNOQgRnxSAopU4AweC4nyWbDwYvzDKr2k7d2jwQlitwRR1XoKWalpgCUHRrM59ec2XK8lgJH2JopceKVR1hW5b2wLj2Mu5d/yFWRxu9v8ob5LDfi9Le9lB1h28Y4tYYS/ZOttgGdtlA/aB7bMhE+wb+85+sJ+t30kneZ68WKQmras7z9hSJOt/Af5HuZY=</latexit><latexit sha1_base64="CN1PzUBD912+KrjMegh6xIwccW0=">AAACaHicZVHbThsxEHW2tKXpLZSHCvFiEVUFiUa7KRK8VELtSx8BNYDERtHEOwsWvqzsMW20yif0a3htP6S/0K+oswkSKSN5fHQ8nuM5HldKekrTP63k0crjJ09Xn7Wfv3j56nVn7c2pt8EJHAirrDsfg0clDQ5IksLzyiHoscKz8fWX2fnZDTovrflGkwqHGi6NLKUAitSo8/6Ef+LbPOW7+S7vN/ljk/eavN/kA74z6nTTXtoEfwiyBeiyRRyN1lo7eWFF0GhIKPD+IutXNKzBkRQKp+08eKxAXMMl1s0cU/4uUgUvrYvLEG/YpTo90UBXsXC2+ftHF4HKg2EtTRUIjZj3KoPiZPlsbl5Ih4LUhIMQ8UkBKEqJK3AgKPqzJOPB+LlQfgfbuUOD34XVGkxR5yVoqSYFlhAUTevcl3d4uZMMRtKPSHrpkUJVV+g+aFtgtL2cacdfmEZzs/+tfAhO+70s7WXHWffw88LmVbbJttg2y9g+O2Rf2REbMMF+slv2i/1u/U06ydtkY16atBZ31tlSJFv/AMnLtek=</latexit>

= (0, 2, 3, 4, 7, 8)

(43)

§

Implementation in C:

where

n_rows = m , nnz = k ,

val = V ,

col_idx = C , row_start = R

struct {

int n_rows; // number of rows

int nnz; // total number of non-zero elements int row_start[n_rows+1];

int col_idx[nnz];

double val[nnz];

}

row_start col_idx

val nnz

Abbildung

Figure 3.1: The line-of-sight problem for a single ray. The X marks the observation point.
Figure 4: The left column shows the original input images, the middle column are reconstructions from summed-area tables (SATs) generated using our method, and the right column are reconstructions from SATs generated with the old method
Figure 39-12 shows a simple scene rendered with approximate depth of field, so that objects far from the focal length are blurry, while objects at the focal length are in  fo-cus
Figure 39-12 shows a simple scene rendered with approximate depth of field, so that objects far from the focal length are blurry, while objects at the focal length are in  fo-cus

Referenzen

ÄHNLICHE DOKUMENTE

Bei einer spaltenorientierten Datenbank wird späte Materialisierung angestrebt, da die benötigten Attribute erst so spät wie möglich hinzu geladen werden. Nicht nur

The nature of the isomerisation was not further identified, but a BHT isomer was observed to be abundant in suboxic zones of stratified marine water columns (Sáenz et al., 2011)

direct and total transmission are overestimated at large SZA, i.e., where the incoming solar energy is small. Thus from an energetic point of view, it appears that the ICA is a

per, we use a subset of the WOCE/JGOFS/DOE/NOAA global CO 2 survey data from the Pacific Ocean to pro- vide new estimates of organic carbon remineralization rates in the water

The main parts of the S87 format file are the header line containing all pertinent station information, an id line with two character minimum mnemonics describing the data in

In this paper we investigate the change in temperature flux at different latitudes, compare the temperature flux with the mean geostrophic flow field and with

Enhanced concentra- tions were observed at about 100 m depth, where ethylene and methane were also enriched, and minor chlorophyll-a concentrations at surface waters at station

Two different metaheuristics are suitably combined with parts of a developed column generation approach: On the one hand a variable neighborhood search (VNS) acts as the sole