
5.4 Methods for Improving Scalability

5.4.3 Temporal and Spatial Segregation for a Composable and

in the unclustered SDFG1 (to the right of Fig. 5.12), scheduled by the RR SDFG scheduler, actor C comes to execution after 21 time units (under the assumption that no blocking on the FIFO buffers occurs), while in the clustered version it comes to execution after 23 time units. This shows that the timing semantics can be violated when applying the clustering method, in its general form, to SDFGs scheduled according to Round-Robin.

Mapping information (see D2) is needed, since only actors which are mapped to the same tile and which do not engage in inter-processor communication can be clustered. The reason is that actors of an SDFG engaged in inter-processor communication show different timing semantics when clustered (because of possible changes in the rates of the ports of the resulting hierarchical actor). This, in turn, could distort the access pattern on the shared interconnect and thereby lead to false real-time results.
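The clusterability condition above can be sketched as a small check. This is a minimal illustration, not the book's implementation; the representation of the mapping (actor → tile) and of the FIFO channels (source/destination actor pairs) is a hypothetical simplification:

```python
def can_cluster(actors, mapping, channels):
    """Check the clustering precondition: all candidate actors lie on the
    same tile, and none of them engages in inter-processor communication.

    actors:   set of candidate actor names
    mapping:  dict actor -> tile id
    channels: list of (src, dst) actor pairs connected by a FIFO
    """
    tiles = {mapping[a] for a in actors}
    if len(tiles) != 1:
        return False  # actors are spread over several tiles
    for src, dst in channels:
        # A channel whose endpoints sit on different tiles is
        # inter-processor communication; a candidate actor on such a
        # channel must not be clustered.
        if (src in actors or dst in actors) and mapping[src] != mapping[dst]:
            return False
    return True
```

For example, if actor B communicates with an actor C on another tile, the cluster {A, B} is rejected even though A and B share a tile.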

If the above conditions hold, clustering can be applied. After clustering, one issue remains: how to calculate the WCET/BCET of the resulting hierarchical actor Ω. If n denotes the number of actors in Z and γ(a) is the repetition-vector value of actor a (for notation details refer to Sect. 2.2.1.4), then the new wcet of the hierarchical actor Ω can be calculated as follows:

$$\gamma(\Omega) \times wcet(\Omega) = \sum_{i=1}^{n} \left(\gamma(a_i) \times wcet(a_i)\right)$$

$$wcet(\Omega) = \frac{\sum_{i=1}^{n} \left(\gamma(a_i) \times wcet(a_i)\right)}{\gamma(\Omega)}$$

Similarly, the bcet of the hierarchical actor Ω can be calculated as follows:

$$bcet(\Omega) = \frac{\sum_{i=1}^{n} \left(\gamma(a_i) \times bcet(a_i)\right)}{\gamma(\Omega)}$$
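The calculation can be sketched directly in code. This is a minimal illustration of the formulas, where `cluster`, `gamma`, `bound`, and `gamma_omega` are hypothetical names for the cluster's actors, their repetition-vector values, their per-actor WCET or BCET bounds, and γ(Ω):

```python
def hierarchical_bound(cluster, gamma, bound, gamma_omega):
    """bound(Omega) = sum_i(gamma(a_i) * bound(a_i)) / gamma(Omega).

    Pass the per-actor WCET map to obtain wcet(Omega), or the per-actor
    BCET map to obtain bcet(Omega); the formula has the same shape.
    """
    return sum(gamma[a] * bound[a] for a in cluster) / gamma_omega
```

For instance, two actors with repetition-vector values 2 and 3 and WCETs 4 and 2, clustered into an actor with γ(Ω) = 1, yield wcet(Ω) = (2·4 + 3·2)/1 = 14.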

It is important to note that, beyond the estimated WCET/BCET of the single actors when executed on a target processor, the clustering technique is fully independent of the target architecture of the MPSoC.


Figure 5.13: Scheduling hierarchy extended with TDMA clusters’ scheduler

As will be shown (in Sect. 7.2.5), with the help of this extension we are able to increase the number of actors analyzable by our approach on an MPSoC with a fixed number of tiles.

In the following, we assume that spatial isolation is already realized (e.g., either through virtualization with the help of a hypervisor [Fakih et al., 2013b] or through static memory allocation) and describe how such a composable RT analysis can be performed based on a TDMA clusters' scheduler and with the help of our state-based RT analysis method. Fig. 5.13 shows the scheduling hierarchy (see Sect. 4.2.4.2) extended with a non-preemptive TDMA clusters' scheduler at the top hierarchy level. This TDMA scheduler allows clusters of actors (still respecting the two lower scheduling levels: SO within an SDFG and SO or RR among SDFGs) to be executed only in specific time slots; it switches to the next slot as soon as the previous one expires, and is defined as follows:

Definition 5.4.1. (TDMA Clusters' Scheduler) A TDMA scheduler is defined as a tuple S = (F, SL), where F represents the functionality (code) of the scheduler (cf. pseudo-code in Sect. 6.5.2), which switches between the different slots on a tile, and SL is a finite set of slots Sl = (d, Cl), each having a duration d after which the slot expires and a cluster Cl ⊆ SO of different SDFG schedules to be executed in this slot. Let T be the number of tiles in the system; then every tile t ∈ T has


Figure 5.14: Example of TDMA scheduling of clusters of SDFGs

its own so_t ⊆ SO (see Def. 4.2.10), and every cluster Cl = {so_t0, so_t1, ..., so_tT} consists of a set of schedules to be executed on every tile in the current slot, where so_t0 ⊆ so_0, so_t1 ⊆ so_1 and so_tT ⊆ so_T, respectively.
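Def. 5.4.1 can be sketched as a slot data structure plus a non-preemptive switching loop. This is a hypothetical simplification (discrete time units, immediate switching, names `Slot` and `tdma_trace` invented for illustration), not the realization from Sect. 6.5.2:

```python
from dataclasses import dataclass

@dataclass
class Slot:
    duration: int   # d: time units until the slot expires
    cluster: list   # Cl: the SDFG schedules executed in this slot

def tdma_trace(slots, horizon):
    """Return the (time, active_slot_index) switch points of a
    non-preemptive TDMA clusters' scheduler up to `horizon` time units.

    Each slot runs to completion; when it expires, the scheduler
    switches to the next slot in round-robin order over SL.
    """
    trace, t, i = [], 0, 0
    while t < horizon:
        trace.append((t, i))
        t += slots[i].duration      # slot runs until its duration d expires
        i = (i + 1) % len(slots)    # switch to the next slot
    return trace
```

With two slots of durations 3 and 2, the scheduler alternates: slot 0 at time 0, slot 1 at time 3, slot 0 again at time 5, and so on.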

In this work, we simplify the general case of Def. 5.4.1 with respect to the granularity at which clusters may be constructed: we assume that a cluster consists of a number of SDFGs and that these are independent from the SDFGs mapped to other clusters. Note that the general case, which permits clustering at the granularity level of actors (see Fig. 5.13), could easily lead to deadlocks in the SUA if care is not taken; this is not the case if clustering is done at the granularity level of SDFGs.

Let us take a look at a concrete example to understand how the TDMA clusters' scheduler works. In a first step, clusters of SDFGs are identified, as shown in the simple example in Fig. 5.14, where SDFG1 and SDFG2 are mapped to cluster1 and SDFG3 and SDFG4 are mapped to cluster2. Next, a scheduling strategy is chosen for the SDFG scheduler (e.g., RR).

Figure 5.15: Two-Tier RT analysis method through TDMA clusters' scheduler

Now, the worst-case instance of some timing metric (e.g., period or end-to-end latency) for every SDFG in every cluster is obtained in isolation (without considering other clusters; see Fig. 5.15) with the help of our state-based RT method (presented in Sect. 5.3). In our example in Fig. 5.14, we can first analyze the Worst-Case Period (WCP) of SDFG1 and SDFG2 belonging to cluster1, considering all contentions on the shared bus (for different arbitration protocols) between the two SDFGs but without considering cluster2. Then we do the same for the cluster2 SDFGs without considering the cluster1 SDFGs. After that, every cluster is mapped to a slot of fixed size equal to the maximum worst-case time obtained among all SDFGs in this cluster (obtained from the state-based RT method), so that it is guaranteed that all SDFGs mapped to this slot have finished executing when the slot expires, without the need for preemption (see Fig. 5.15). The TDMA scheduler has the role of switching between the slots of the different clusters whenever the slot time of a cluster expires.

Assuming that the SDFGs running in one slot are independent from those running in other slots, in order to calculate the worst-case execution time (T_compos, see Fig. 5.15) of single SDFGs when all clusters are integrated and executed on the MPSoC platform, we take advantage of the composability property of such a TDMA-based scheduling and can calculate it (similar to Eq. 2.2) using the following formula:

$$T_{compos} = \sum_{i=0}^{Sl} T_{max}(i) + (Sl \times s), \qquad (5.6)$$

where T can either be the worst-case period or the worst-case end-to-end deadline, depending on the timing requirement we are interested in, T_max(i) is the maximal T among the SDFGs running in slot i, s is the scheduler's worst-case delay when switching from one slot to another, and Sl is the total number of slots.
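The slot-sizing rule and Eq. 5.6 can be sketched together. This is a minimal illustration under the section's assumptions (independent clusters, one slot per cluster); the function names `slot_durations` and `t_compos` are invented for the sketch:

```python
def slot_durations(cluster_wcps):
    """Each slot gets a fixed size equal to the maximum worst-case time
    (e.g., WCP) among the SDFGs mapped to its cluster, so all of them are
    guaranteed to finish before the slot expires.

    cluster_wcps: list of lists, one inner list of WCP values per cluster.
    """
    return [max(wcps) for wcps in cluster_wcps]

def t_compos(t_max_per_slot, switch_delay):
    """Eq. 5.6: T_compos = sum_i T_max(i) + Sl * s, where s is the
    scheduler's worst-case slot-switching delay and Sl the slot count."""
    sl = len(t_max_per_slot)
    return sum(t_max_per_slot) + sl * switch_delay
```

Using the per-cluster WCP values from Fig. 5.15 for illustration, cluster1 (54 529 and 59 895) yields a slot of 59 895 and cluster2 (85 001 and 44 236) a slot of 85 001; T_compos then adds both slot sizes plus one switching delay per slot.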

One possible realization of the above TDMA clusters' scheduler with the help of customized hardware timers can be found in Sect. 6.5.2. Another realization, which we described in [Fakih et al., 2013b] and which is used in the experiment in Sect. 7.2.5, requires the existence of a resource manager (hypervisor) in the MPSoC which takes care of the temporal and spatial segregation.