Elan Library

The Elan library provides direct access to the Elan communication processor.

The functions in this library provide application programmers with high performance DMA and event handling functionality.

The Elan Widget library (and therefore all of Meiko's current message passing libraries) is built upon the Elan library. You may therefore embed Elan library functions within most CS-2 applications.

meiko

Parallel Programming 71

Key features of the Elan library are:

• Local or remote DMA.

• Broadcast DMA over contiguous processes.

• Event test and set functionality.

• C interface.

The Elan's DMA engine allows local, remote, and broadcast transfers. Completion of the DMA can be flagged at either (or both) the sender and recipient with Elan events.

The event functionality allows a process to test the state of an event, to set an event in its own address space, and to queue a DMA transfer on an event. Note that the DMA engine can be used to set remote events; a DMA transfer (possibly transferring 0 bytes) may be used to set events at both the sender and receiver.

For more information about this library see The Elan Library, Meiko document number S1002-10M131.

Example

The following example is taken from the Elan library documentation. It shows how to embed Elan library DMA and event functionality within a CSN application.

In this example the process initialisation is undertaken by functions in the CSN library, which indirectly call both the Elan Widget library and Elan library initialisation functions. Data buffers are created using malloc() and a DMA descriptor is created by memalign() (note that the Elan DMA descriptor must be aligned on an EW_ALIGN boundary).

The DMA will transfer data from process 0 into a data buffer somewhere in the address space of process 1. To signify completion of the transfer an Elan event will be set at both the source and destination process. The event structure and data buffer can exist anywhere in the target process's address space, so blocking CSN communication functions are used to notify these to the DMA sender. Note that you could use the Widget library ew_allocate() function to define both as global objects, and thus eliminate the CSN handshaking.
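The alignment requirement on the DMA descriptor can be illustrated without the Elan headers. The following is a minimal sketch only: `DESC_ALIGN` is a hypothetical stand-in for `EW_ALIGN` (whose value is not given here), the helper names are invented for illustration, and the C11 `aligned_alloc()` function is used in place of `memalign()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for EW_ALIGN; the real value is not stated here. */
#define DESC_ALIGN 64u

/* Round size up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Allocate at least size bytes on a DESC_ALIGN boundary.
 * Returns NULL on failure. aligned_alloc() expects the size to be a
 * multiple of the alignment, so the size is rounded up first. */
static void *alloc_descriptor(size_t size)
{
    return aligned_alloc(DESC_ALIGN, round_up(size, DESC_ALIGN));
}

/* Return non-zero if p lies on an align-byte boundary. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

A caller would obtain the descriptor with `alloc_descriptor(sizeof(...))`, check the result against NULL, and release it with `free()` when the transfer has completed.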

S1002-10M117.02

#include <stdio.h>

#define DMASIZE 1024

static unsigned char pattern[] = {0x00, 0x00, 0x00, 0x55, 0x55, 0x55,
                                  0xaa, 0xaa, 0xaa, 0xff, 0xff, 0xff};

unsigned char* buffer;

/* Package pointers to remote data objects in one structure so we */
/* can transfer both in one CSN message passing operation. */
struct {
    unsigned char* bufferp;
    ELAN_EVENT* eventp;
} rxbuffers;

/************* CSN library initialisation functions ****************/
cs_getinfo(&nproc, &me, &i);    /* i variable not used */
if (nproc != 2) {
    fprintf(stderr, "error: need 2 processors\n");
    exit(1);
}


/* Build structures in processes heap space */
/* DMA descriptor MUST BE 32 bit aligned. */
dmaDesc = (ELAN_DMA*) memalign(EW_ALIGN, sizeof(ELAN_DMA));
buffer = (unsigned char*) malloc(DMASIZE);
event = (ELAN_EVENT*) malloc(sizeof(ELAN_EVENT));

if (csn_open(CSN_NULL_ID, &t) != CSN_OK) {
    fprintf(stderr, "Cannot open transport\n");
    exit(-1);
}

if (me == 0) {
    /* Process 0 is DMA sender; receiver of addresses from CSN transport */
    /* Register my transport */
    if (csn_registername(t, "toProc0") != CSN_OK) {
        fprintf(stderr, "Cannot register transport name\n");
        exit(-1);
    }

    if (csn_lookupname(&next, "toProc0", 1) != CSN_OK) {
        fprintf(stderr, "Cannot lookup transport name\n");
        exit(-1);
    }

    /* Send address of my event and data buffers */
    rxbuffers.bufferp = buffer;
    rxbuffers.eventp = event;
    csn_tx(t, 0, next, (char*) &rxbuffers, sizeof(rxbuffers));


/********** Elan library DMA/Event functionality *********/
if (!elan_checkVersion(ELAN_VERSION)) {
    fprintf(stderr, "error: libelan version error\n");
    exit(1);
}

Process 0 defines the DMA transfer by initialising a DMA descriptor. This identifies the size of transfer, source and destination buffers, and events that are set on completion at both the sender and receiver. The transfer is initiated by a call to elan_dma(), and is tested for completion in both processes by a call to elan_waitevent().

dmaDesc->dma_type = DMA_TYPE(TR_TYPE_BYTE, DMA_NORMAL, 8);
dmaDesc->dma_size = DMASIZE;
dmaDesc->dma_source = buffer;
dmaDesc->dma_dest = rxbuffers.bufferp;         /* Address received from proc 1 */
dmaDesc->dma_destEvent = rxbuffers.eventp;     /* Address received from proc 1 */
dmaDesc->dma_destProc = 1;
dmaDesc->dma_sourceEvent = event;

/* Initiate DMA; the event signifies completion. */
printf("Process %d now transfering %d bytes by DMA\n", me, DMASIZE);
elan_dma(ew_ctx, dmaDesc);
elan_waitevent(ew_ctx, event, ELAN_POLL_EVENT);

} else {
    /* Process 1 is the DMA recipient */
    /* Wait for DMA to trigger dest. event */
    elan_waitevent(ew_ctx, event, ELAN_POLL_EVENT);

    /* Check received data pattern */
    for (i = 0; i < DMASIZE; i++)
        if (buffer[i] != pattern[i % sizeof(pattern)]) {
            fprintf(stderr, "Received data differs\n");
            exit(1);
        }

    printf("Data received and verified by process %d\n", me);
}
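The pattern check performed by the recipient can be tried in isolation. The sketch below is not part of the Meiko example: the helper names `fill_pattern` and `first_mismatch` are hypothetical, and no Elan or CSN calls are made, so any copy (for example `memcpy()`) has to stand in for the DMA transfer when experimenting with it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same 12-byte test pattern as in the example above. */
static const unsigned char pattern[] = {0x00, 0x00, 0x00, 0x55, 0x55, 0x55,
                                        0xaa, 0xaa, 0xaa, 0xff, 0xff, 0xff};

/* Fill buf with the repeating pattern (hypothetical helper). */
static void fill_pattern(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] = pattern[i % sizeof(pattern)];
}

/* Return the index of the first byte that differs from the expected
 * pattern, or len if the whole buffer matches (hypothetical helper). */
static size_t first_mismatch(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] != pattern[i % sizeof(pattern)])
            return i;
    return len;
}
```

Returning the index of the first differing byte, rather than exiting at once as the example does, makes it easier to see where a transfer went wrong.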

To compile this application you must link with the CSN library, the Elan Widget library, and the Elan library, as shown below:

user@cs2: cc -o csnDMA -I/opt/MEIKOcs2/include \
    -L/opt/MEIKOcs2/lib csnDMA.c -lcsn -lew -lelan

You run the program on two processors with prun:

user@cs2: prun -n2 -pparallel csnDMA
Process 0 now transfering 1024 bytes by DMA
Data received and verified by process 1

Glossary A

Configuration

The set of partitions that make up a CS-2 system.

Node

See Processing Element.

Pandora

The graphical user interface (GUI) to the resource management system. Pandora uses the facilities of a colour X workstation to display the status of a CS-2 system, to query its usage, and to manipulate its partitions. The user interface is via the familiar X-windows point-and-click system.

PFS

The Meiko parallel filesystem. Allows files to be striped over a number of Unix filesystems. Allows very large files to be created (as large as the total capacity of all the participating Unix filesystems) and provides higher performance data access for parallel programs, which need not compete for access to a single disk device when accessing data.

Partition

A set of processing elements dedicated to performing certain classes of work.

Most systems have at least 2 partitions, one for interactive tasks and one for parallel applications. Additional partitions can be added to support system admin tasks, device management and batch processing. Partitions are set up by the system manager.

Processing Element

A CS-2 system is made up of multiple processing elements (or PEs). Each is a SPARC processor, a memory system and an interface to the CS-2 inter-processor communication system. Some processing elements have additional vector processors, some control an I/O system.

Vector Processing Unit

CS-2 uses a Fujitsu vector processing unit to provide high floating point performance on certain classes of application. Each vector PE has a pair of these processors sharing memory with the SPARC. The compiler assigns vectorisable blocks of code to these processors and scalar code to the SPARC.

Hosted vs. Hostless

Some message passing libraries support both hosted and hostless models, others are limited to just one.

Hosted applications consist of two kinds of process: a host and a number of identical slave processes. The application is initiated by executing the host process which then spawns the slave processes into a partition. All processes, including the host itself, use message passing functions to cooperate and complete the task. Both PVM and the MPSC library may support this model.

Hostless applications have a number of identical slave processes that are spawned onto a partition by prun. The PVM, MPSC, and CSN libraries support this model.

In either model the host/loader program uses functions in the resource management user interface library to liaise with the Resource Manager for the slaves' processing resource. The host/loader executes in one segment (typically in your login partition) and the slaves execute in a second segment within some other partition.

Elan Id

The decimal representation of a node's unique route from the top of the network. For example, the node uniquely identified by the route <5>.<1> will have Elan Id 21 (hint: convert the route to binary 101.01 and convert this to decimal). See the Network Overview (document S1002-10M105) for a description of network routes.
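The conversion described in the hint can be sketched in a few lines of C. This is an illustration only: the function and structure names are hypothetical, and the field widths (3 bits for the first level, 2 bits for the second) are an assumption read off the worked example 101.01, not values taken from the Network Overview.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a route: the value at that level and the number of bits
 * used to encode it. The widths are an assumption based on the worked
 * example (3 bits, then 2 bits). */
struct route_level {
    unsigned value;
    unsigned bits;
};

/* Concatenate the binary fields of a route, top level first, and return
 * the result as an unsigned integer (hypothetical helper). */
static unsigned route_to_elan_id(const struct route_level *route, size_t levels)
{
    unsigned id = 0;
    size_t i;

    for (i = 0; i < levels; i++) {
        /* Each value must fit within its field. */
        assert(route[i].bits > 0 && route[i].bits < 32);
        assert(route[i].value < (1u << route[i].bits));
        id = (id << route[i].bits) | route[i].value;
    }
    return id;
}
```

With the route <5>.<1> written as the two levels {5, 3} and {1, 2}, the function returns 21, which matches the example above.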
