
Prof. G. Zachmann, A. Srinivas

University of Bremen, School of Computer Science, CGVR Group

Summer Semester 2014

July 16, 2014

Assignment on Massively Parallel Algorithms - Sheet 11

Due Date: 23.07.2014

Exercise 1 (Sorting Networks, 4 Credits)

a) Modify the single-block CUDA bubble sort implementation from the previous assignment (Assignment 10) so that it uses multiple blocks and can handle array lengths greater than twice the maximum number of threads per block of the GPU used.

b) Compare the runtime of the parallel bubble sort implemented above with that of the sequential version. Plot the speedup (speedup = runtime of sequential version / runtime of parallel version) on the y-axis against the size of the input array on the x-axis. Interpret the plot and justify your interpretation.

Hint: Use the logarithm of the input array size on the x-axis when plotting the graph.
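The measurement in part b) can be prepared on the host side along the following lines. This is only a sketch in C++, not a reference solution: all function names (bubbleSortSequential, oddEvenTranspositionSort, timeSequentialMs, speedup, log2Size) are hypothetical, and it assumes that the parallel bubble sort from Assignment 10 is an odd-even transposition sort, which this sheet does not confirm. The odd-even routine runs on the CPU and is meant only for checking the GPU output; the GPU runtime itself has to be measured separately.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential baseline: textbook bubble sort, O(n^2) comparisons.
void bubbleSortSequential(std::vector<int>& data)
{
    const std::size_t n = data.size();
    for (std::size_t pass = 0; pass + 1 < n; ++pass) {
        // After each pass the largest remaining element is in its final place.
        for (std::size_t i = 0; i + 1 < n - pass; ++i) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
            }
        }
    }
}

// CPU reference for odd-even transposition sort (assumed parallel variant).
// Within one phase all compared pairs are disjoint, so on a GPU each pair
// could be handled by one thread; here the pairs are simply looped over.
void oddEvenTranspositionSort(std::vector<int>& data)
{
    const std::size_t n = data.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        // Even phases compare (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        for (std::size_t i = phase % 2; i + 1 < n; i += 2) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
            }
        }
    }
}

// Wall-clock time of the sequential sort in milliseconds.
// The input is taken by value so the caller's array stays unsorted.
double timeSequentialMs(std::vector<int> data)
{
    const auto start = std::chrono::steady_clock::now();
    bubbleSortSequential(data);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// speedup = runtime of sequential version / runtime of parallel version
double speedup(double sequentialMs, double parallelMs)
{
    return sequentialMs / parallelMs;
}

// x-coordinate for the plot: base-2 logarithm of the input size.
double log2Size(std::size_t n)
{
    return std::log2(static_cast<double>(n));
}
```

For each input size n, one data point of the plot would then be (log2Size(n), speedup(tSeq, tPar)), where tSeq comes from timeSequentialMs and tPar is the measured runtime of the CUDA version in the same unit.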

Exercise 2 (Inter-Block Synchronization, Bonus Credits)

a) Is it possible to achieve global synchronization of all threads in all blocks within a CUDA kernel? Support your answer with appropriate arguments.
