Academic year: 2021


Prof. G. Zachmann

Christoph Schröder (schroeder.c@cs.uni-bremen.de)

University of Bremen

School of Computer Science

CGVR Group

Summer Semester 2020

September 14, 2020

Assignment on Massively Parallel Algorithms - Sheet 2

Due Date

Exercise 1 (Moore’s Law and Power consumption, Credits)

a) Consider two approaches to doubling the number of transistors: halving the size of a single transistor while keeping the die area constant (Moore’s Law), versus keeping the size of a single transistor and doubling the die area. List at least three reasons why the first approach is superior to the second.

b) The idealized formula for the energy consumption of a processor core is

$E = c\,t\,f\,V^2$

where $c$ is a CPU-dependent constant, $t$ is the total execution time, $f$ is the processor’s clock frequency, and $V$ is the supply voltage. The frequency and voltage are correlated as follows:

$f = \alpha V$

with $\alpha = 0.2 \cdot 10^{9}\,\mathrm{Hz\,V^{-1}}$. Suppose our algorithm must complete in $t = 10$ seconds and needs a total of $10^{10}$ clock cycles to execute. What is the CPU energy consumption $E$ for one task that completes in $t = 10$ seconds? What is the energy saving ratio when we run this perfectly parallelizable algorithm as two tasks on two CPU cores in parallel, assuming each task takes $t = 10$ seconds to complete?
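One possible way to set up the calculation is to eliminate the voltage from the energy formula. The sketch below only uses the two relations given above; the symbols $n$ (total number of clock cycles) and $k$ (number of equally sized parallel tasks) are introduced here for illustration and do not appear on the original sheet.

```latex
% Substitute V = f / \alpha into E = c t f V^2:
E = c\,t\,f\left(\frac{f}{\alpha}\right)^{2} = \frac{c\,t\,f^{3}}{\alpha^{2}}

% A task that needs n cycles within time t must run at f = n / t:
E_{1} = \frac{c\,n^{3}}{\alpha^{2}\,t^{2}}

% k cores, each executing n / k cycles within the same time t
% (assumes perfect parallelization and no overhead):
E_{k} = k \cdot \frac{c\,(n/k)^{3}}{\alpha^{2}\,t^{2}} = \frac{E_{1}}{k^{2}}
```

Under these assumptions the energy grows with the cube of the required clock frequency, which is why distributing the cycles over more cores at a lower frequency pays off.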

Exercise 2 (Amdahl’s law, Credits)

Consider a single-core processor A and a multi-core processor B with N cores. Assume that all cores of A and B are identical.

a) A given program runs 1.7 times faster on processor B than on processor A. Compute the parallel portion of the program, i.e. f = P/(P+S), with P = execution time of the parallelizable part on a single processor and S = execution time of the inherently serial part on a single processor (see the slide on Amdahl’s Law (the "Pessimist") in the Introduction chapter).

b) Suppose the parallel portion f is 0.5. How many processor cores are needed to achieve an overall speedup of 1.6?
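For reference, Amdahl's law in its usual textbook form can be rearranged for either unknown, as sketched below. The notation $\mathrm{speedup}(N)$ is chosen here to avoid a clash with the serial time $S$; the lecture slide may use a different but equivalent notation.

```latex
% Amdahl's law with parallel portion f = P / (P + S) and N cores:
\mathrm{speedup}(N) = \frac{P + S}{S + P/N} = \frac{1}{(1 - f) + \dfrac{f}{N}}

% Solved for the parallel portion f (requires N > 1):
f = \frac{1 - \dfrac{1}{\mathrm{speedup}(N)}}{1 - \dfrac{1}{N}}

% Solved for the number of cores N:
N = \frac{f}{\dfrac{1}{\mathrm{speedup}(N)} - (1 - f)}
```

The last expression only yields a positive N when the requested speedup is below the upper bound 1/(1 - f).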

Exercise 3 (CUDA basics: Launching kernels, Credits)

Starting from the framework myFirstKernel:


a) Allocate device memory for array d_a to hold the results of the kernel.

In total, numBlocks × numThreadsPerBlock threads will be launched, and each thread writes to one array element.

b) Configure and launch the kernel myFirstKernel(int *d_a) using a 1D grid of 1D thread blocks.

c) Have each thread set an element of d_a as follows:

idx = blockIdx.x*blockDim.x + threadIdx.x

d_a[idx] = (blockIdx.x - 6) * (100 - threadIdx.x)

d) Copy the result in d_a back to host memory, into array h_a.

e) Free the device array d_a.

f) CUDA kernels cannot return a value. What could be the reason for this?

g) Measure the kernel execution time using CUDA events. Consider the following types and functions, which are described further in the CUDA docs:

cudaEvent_t event;
cudaEventCreate(&event);
cudaEventRecord(event);
cudaEventSynchronize(event);

float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);
printf("CUDA time: %f\n", milliseconds);
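Steps a) to e) and g) could fit together roughly as in the following sketch. It is not the course framework, which is not shown on this sheet: the values of numBlocks and numThreadsPerBlock, the CHECK macro, and the host-side verification loop are assumptions made here for illustration. The casts to int in the kernel are also an addition, so that the subtraction is done in signed arithmetic.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the sheet): abort on any CUDA runtime error.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// c) Each thread writes exactly one element of d_a.
__global__ void myFirstKernel(int *d_a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // blockIdx and threadIdx are unsigned, so cast before subtracting.
    d_a[idx] = ((int)blockIdx.x - 6) * (100 - (int)threadIdx.x);
}

int main()
{
    // Assumed launch configuration; the real framework may use other values.
    const int numBlocks = 8;
    const int numThreadsPerBlock = 8;
    const int n = numBlocks * numThreadsPerBlock;
    const size_t numBytes = n * sizeof(int);

    int *h_a = (int *)malloc(numBytes);
    if (h_a == NULL) return EXIT_FAILURE;

    // a) Allocate device memory for the result array.
    int *d_a = NULL;
    CHECK(cudaMalloc((void **)&d_a, numBytes));

    // g) Two events bracket the kernel launch.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // b) 1D grid of 1D thread blocks.
    dim3 dimGrid(numBlocks);
    dim3 dimBlock(numThreadsPerBlock);

    CHECK(cudaEventRecord(start));
    myFirstKernel<<<dimGrid, dimBlock>>>(d_a);
    CHECK(cudaGetLastError());          // catches launch configuration errors
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));  // wait until the stop event has completed

    float milliseconds = 0;
    CHECK(cudaEventElapsedTime(&milliseconds, start, stop));
    printf("CUDA time: %f\n", milliseconds);

    // d) Copy the result back to the host.
    CHECK(cudaMemcpy(h_a, d_a, numBytes, cudaMemcpyDeviceToHost));

    // Host-side check of every element against the formula from step c).
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int block = i / numThreadsPerBlock;
        int thread = i % numThreadsPerBlock;
        if (h_a[i] != (block - 6) * (100 - thread)) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    // e) Free device and host resources.
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_a));
    free(h_a);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Assuming the file is saved as myFirstKernel.cu (a name chosen here), it can be compiled with nvcc, for example `nvcc -o myFirstKernel myFirstKernel.cu`. The measured time for such a small kernel is dominated by launch overhead and will vary from run to run.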
