
8.1.1 Uniform type, implementation variants

8.1. PROCESS PIPELINES

the two languages. Furthermore, we will compare recursive and non-recursive skeleton implementations and discuss the nesting of topology skeletons.

The strength of EdI is its explicit communication control, bought at the price of excess explicit evaluation control, reduced safety, and lengthy code. Explicit communication is essential for tasks like implementing topology skeletons. This is why the explicit communication features of Eden (new and parfill) have been added to the (otherwise functional) language, to allow performance optimisation.

Eden’s dynamic channels are a debatable concept because, when used inappropriately, they run the risk of losing referential transparency in programs. On the other hand, they are apparently an enormous benefit, if not a requirement, in reasonably implementing topology skeletons. Unless the tasks of a parallel algorithm interact in a tree shape (the natural process topology created by the call hierarchy), programmers will surely wish for more control over inter-process communication. On the other hand, by allowing completely free communication structures between parallel processes, e.g. in the style of MPI [MPI97], programming comfort and security are abandoned. The liberty offered by free communication can lead to hard-to-detect race conditions when new programs are developed, and is opposed to the general aim of parallel functional programming: reliability, readability, and provable soundness. This is why only a few parallel functional languages support arbitrary connections between their units of computation at all. Examples are the channel concept of Clean [SP99], as well as communication features of Facile [GMP89] and Concurrent ML [Rep99]. Even more general and powerful than the dynamic channels in Eden and EdI, they relax the type safety restrictions and one-to-one restrictions.

8.1 Process pipelines

CHAPTER 8. STRUCTURE-ORIENTED SKELETONS

History. In the early days of the Eden language, Galán et al. [GPP96] defined pipelines by folding a homogeneous list of process abstractions with a suitable process composition operator. The respective code is shown in Fig. 8.1 (we changed the (>>) operator from the original paper to (>->), since its form conflicts with a standard monad operation). This definition takes a list of process abstractions of uniform type and creates a process abstraction which will unfold a pipeline with the intended semantics. However, it is inefficient, since all pipeline processes are connected through their respective parent. Another, less obvious drawback adds considerable parallelisation overhead: the composition of two processes by (>->) yields a (composite) process, not a function. Consequently, the created pipeline has a spine of unnecessary intermediate “detour” processes (see the communication structure depicted on the right of Fig. 8.1). This drawback is, however, relative when seen in the broader development context of Eden: extended static analysis (with suitable runtime support) was supposed to ‘shortcut’ connections over intermediate processes and send data directly to the consuming process [KPS99, PPRS00].

(>->) :: (Trans a, Trans b, Trans c) =>
         Process a b -> Process b c -> Process a c
p >-> q = process (\xs -> q # (p # xs))

pipePF :: Trans a => Pipe a
pipePF [] xs = xs
pipePF fs xs =
    foldr1 (>->) (map process fs) # xs

[Diagram: parent process with input and output; four pipeline processes P connected via three intermediate composite processes P*]

Figure 8.1: Anno 1997: Pipeline by a fold
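The intended semantics of the fold-based definition can be checked against a purely sequential model. The following is a minimal sketch in plain Haskell, not Eden code: the type synonym Pipe is assumed here to be a function from a stage list and an input list to an output list (the same shape as the signature of pipeParent in Fig. 8.5), (>->) is redefined on ordinary functions instead of process abstractions, and pipeSpec is a hypothetical name. It says nothing about process topology; it only shows which result the fold computes.

```haskell
-- Sequential reference model only: no Eden processes are created here.
-- ASSUMPTION: 'Pipe a' is a stage list plus input list giving an output list.
type Pipe a = [[a] -> [a]] -> [a] -> [a]

-- Left-to-right composition on plain functions, mirroring the shape of
-- the process composition operator (>->) in Fig. 8.1.
(>->) :: (a -> b) -> (b -> c) -> (a -> c)
p >-> q = \xs -> q (p xs)

-- Hypothetical name: the result every pipeline variant should produce.
pipeSpec :: Pipe a
pipeSpec [] xs = xs
pipeSpec fs xs = foldr1 (>->) fs xs
```

For example, pipeSpec [map (+1), map (*2)] [1,2,3] applies the first stage first and evaluates to [4,6,8].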

pipenaive :: Trans a => Pipe a
pipenaive [] xs = xs
pipenaive (f:fs) xs =
    pipenaive fs (process f # xs)

[Diagram: parent process with input and output; each of the four pipeline processes P exchanges data with the parent]

Figure 8.2: Naïve tail-recursive pipeline

Recursive creation, variants. A slightly better, and more intuitive, way to create a pipeline is to use recursion in the skeleton. However, a naïve tail-recursive scheme for the pipeline stages creates a purely hierarchical communication structure as well, as shown in Fig. 8.2: all pipeline stages created by pipenaive send their results back to the caller, which forwards them to the next stage. Figure 8.3 (pipeR) shows how inner recursion in the process abstraction leads to a direct connection: each process creates its pipeline successor and forwards data directly to it as input. But still, the final result will flow back through all pipeline stages before reaching the caller. In order to establish direct communication between


pipeR :: Trans a => Pipe a
pipeR [] vals = vals
pipeR ps vals = (process (generatePipe ps)) # vals

generatePipe [p] vals = p vals
generatePipe (p:ps) vals =
    (process (generatePipe ps)) # (p vals)

[Diagram: parent process with input and output; four pipeline processes P connected directly in a chain]

Figure 8.3: Pipeline by inner recursion

pipeC :: Trans a => Pipe a
pipeC [] vals = vals
pipeC ps vals = new (\chan res ->
    (process (generatePipeC ps chan)) # vals `seq` res)

generatePipeC [f] c vals =
    parfill c (f vals) ()
generatePipeC (f:fs) c vals =
    (process (generatePipeC fs c)) # (f vals)

[Diagram: parent process with input and output; four pipeline processes P in a chain, output sent from the last stage to the parent]

Figure 8.4: Pipeline with dynamic reply channel

the last pipeline stage and the caller, Eden’s dynamic channels must be used, as shown in pipeC (Fig. 8.4, also presented in [PRS01]). In this version, each process creates its successor in the pipeline recursively, but the results are sent back to the first process as a side effect, via the dynamic channel c, which is embedded in the process abstraction.
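The reply-channel idea can be imitated in plain Haskell with threads, as a rough analogy rather than Eden code. In the sketch below, an MVar stands in for the dynamic channel, forkIO stands in for process instantiation, and the names pipeReply and stage are hypothetical. One difference is deliberate and should be kept in mind: Eden evaluates data to normal form before sending, whereas here a lazy value is handed over, so the sketch models only the communication topology, not where the work is done.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Thread-based analogy of pipeC (hypothetical names, not Eden code).
-- The caller creates one reply MVar, which plays the role of the
-- dynamic channel embedded in the process abstraction.
pipeReply :: [[a] -> [a]] -> [a] -> IO [a]
pipeReply [] vals = return vals
pipeReply fs vals = do
  reply <- newEmptyMVar
  _ <- forkIO (stage fs reply vals)
  takeMVar reply                       -- wait for the last stage only

-- Each stage forks its successor and passes the reply MVar along;
-- only the last stage writes to it, directly back to the caller.
stage :: [[a] -> [a]] -> MVar [a] -> [a] -> IO ()
stage []     reply vals = putMVar reply vals
stage [f]    reply vals = putMVar reply (f vals)
stage (f:fs) reply vals = do
  _ <- forkIO (stage fs reply (f vals))
  return ()
```

No intermediate stage ever sees the final result: it travels from the last thread to the caller in a single step, which is the property the dynamic channel provides in pipeC.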

Single-source versions. The pipeline can also be created directly from the parent, which then collects and distributes dynamic channels appropriately to connect the stages. This version requires additional demand control, and more communication, since channel names are communicated twice instead of being embedded in the process abstraction.

pipeParent :: Trans a => [[a]->[a]] -> [a] -> [a]
pipeParent [] xs = xs
pipeParent fs xs = new (\c_res res ->
    let clistL = [ createProcess (chproc f) # c
                 | (c,f) <- zip (c_res:(map deLift clistL)) (reverse fs)]
                 `using` seqList rwhnf
    in clistL `seq` parfill (deLift (last clistL)) xs res)

-- helper function: "channel process"
chproc :: (Trans a, Trans b) => (a -> b) -> Process (ChanName b) (ChanName a)
chproc f = process ch_f
  where ch_f c_out = new (\c_inp inp -> parfill c_out (f inp) c_inp)

Figure 8.5: Eden pipeline skeleton where caller creates all processes


ediPipeFold :: (NFData a) =>
               [([a] -> [a])] ->  -- all stages as a list (same type)
               [a] -> IO [a]
ediPipeFold stages input
  = do (outC, out) <- createC           -- create last out-channel
       inC <- foldM spawnWithChans outC (reverse stages)
                                        -- create stages right-to-left
       fork (sendNFStream inC input)    -- send input to first stage
       return out                       -- return result (to be received)

spawnWithChans :: (NFData a, NFData b) =>
                  ChanName' [b] ->      -- strict in result channel
                  ([a] -> [b]) ->       -- the stage
                  IO (ChanName' [a])    -- input channel returned
                                        -- (goes to previous stage)
spawnWithChans outC stage
  = do (inCC,inC) <- createC            -- create back-channel for input
       spawnProcessAt 0 (pipestage inCC outC stage)
       return inC                       -- return in-channel (sent back by pl stage)

pipestage :: (NFData a, NFData b) =>
             ChanName' (ChanName' [a]) ->  -- input back channel
             ChanName' [b] ->              -- output channel
             ([a] -> [b]) -> IO ()         -- functionality
pipestage inCC outC f
  = do (inC,input) <- createC           -- create in-channel
       sendWith rwhnf inCC inC          -- and send it back
       sendNFStream outC (f input)      -- send results

Figure 8.6: Pipeline skeleton in EdI, single-source variant

ediRecPipe :: NFData a => [[a] -> [a]] -> [a] -> IO [a]
ediRecPipe [] input = return input
ediRecPipe fs input = do (inCC,inC) <- createC
                         (resC,res) <- createC
                         spawnProcessAt 0 (doPipe inCC resC (reverse fs))
                         fork (sendNFStream inC input)
                         return res

doPipe :: NFData a =>
          ChanName' (ChanName' [a]) -> ChanName' [a] -> [[a] -> [a]] -> IO ()
doPipe incc resC [f] = do (inC,input) <- createC
                          sendNF incc inC
                          sendNFStream resC (f input)
doPipe incc resC (f:fs) = do (myInC,myIn) <- createC
                             spawnProcessAt 0 (doPipe incc myInC fs)
                             sendNFStream resC (f myIn)

Figure 8.7: Pipeline skeleton in EdI, recursive variant


An EdI version of the pipeline, shown in Fig. 8.6, can proceed exactly in the same way and saves some of the Eden process creation overhead: As shown previously, the first action of a newly instantiated process is to send an input channel to its parent. In the fold operation which unfolds the pipeline, this channel is directly forwarded to the preceding pipeline stage instead of being used to send yet another channel as the parent input.
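The right-to-left fold that threads channels through the stages can be sketched with ordinary Haskell threads. This is an analogy under stated assumptions, not EdI code: MVars replace EdI channels, forkIO replaces spawnProcessAt, and the names pipeFold and spawnStage are hypothetical. Because threads share memory, the parent can allocate each stage's input MVar itself, so the back-channel on which an EdI process returns its input channel is not needed here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (foldM)

-- Hypothetical thread-based analogy of spawnWithChans: start one stage
-- that writes to the given output MVar and return the MVar it reads from.
spawnStage :: MVar [a] -> ([a] -> [a]) -> IO (MVar [a])
spawnStage outV f = do
  inV <- newEmptyMVar
  _ <- forkIO (takeMVar inV >>= putMVar outV . f)
  return inV

-- Hypothetical analogy of ediPipeFold: stages are created right-to-left,
-- so the input MVar of one stage becomes the output MVar of its predecessor.
pipeFold :: [[a] -> [a]] -> [a] -> IO [a]
pipeFold stages input = do
  outV <- newEmptyMVar                          -- last output "channel"
  inV  <- foldM spawnStage outV (reverse stages)
  _ <- forkIO (putMVar inV input)               -- feed the first stage
  takeMVar outV                                 -- receive the final result
```

With an empty stage list the fold returns the output MVar unchanged, so the input is simply passed through.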

EdI can also use recursion to unfold the process pipeline. The code for the recursive version (shown in Fig. 8.7) looks very similar to the recursive Eden version with a channel at first sight. However, the pipeline is unfolded the other way round: The first process spawned by doPipe applies the last function.
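The reversed unfolding order can be made concrete with a thread-based sketch in plain Haskell. It is an analogy only: MVars stand in for EdI channels, forkIO for spawnProcessAt, and recPipe and doStage are hypothetical names. The first thread created applies the last function and owns the result MVar, each thread forks the stage that precedes it in the pipeline, and the innermost thread (first function) hands its input MVar back to the caller.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical analogy of ediRecPipe (threads and MVars, not EdI code).
recPipe :: [[a] -> [a]] -> [a] -> IO [a]
recPipe [] input = return input
recPipe fs input = do
  inVV <- newEmptyMVar            -- back-channel for the first stage's input
  resV <- newEmptyMVar            -- result written by the LAST function
  _ <- forkIO (doStage inVV resV (reverse fs))
  inV <- takeMVar inVV            -- input MVar, sent back by the first stage
  putMVar inV input
  takeMVar resV

-- The stage list arrives reversed: the head is the function applied last.
doStage :: MVar (MVar [a]) -> MVar [a] -> [[a] -> [a]] -> IO ()
doStage _ _ [] = return ()
doStage inVV resV [f] = do        -- innermost thread: first function
  inV <- newEmptyMVar
  putMVar inVV inV                -- tell the caller where to send input
  xs <- takeMVar inV
  putMVar resV (f xs)
doStage inVV resV (f:fs) = do
  myIn <- newEmptyMVar
  _ <- forkIO (doStage inVV myIn fs)   -- predecessor writes into myIn
  xs <- takeMVar myIn
  putMVar resV (f xs)
```

For the stage list [f1, f2, f3], the first forked thread applies f3 and the third applies f1, yet the computed result is still f3 (f2 (f1 input)).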

In document: Explicit and implicit parallel functional programming: concepts and implementation (pages 123-127)