5.4 Methods for Improving Scalability
in the unclustered SDFG1 (to the right of Fig. 5.12, scheduled by the RR SDFG scheduler), actor C comes to execution after 21 time units (under the assumption that no blocking on the FIFO buffers occurs), while in the clustered version it comes to execution only after 23 time units. This shows that the timing semantics can be violated when applying the clustering method, in its general form, to SDFGs scheduled according to Round-Robin.
Mapping information (see D2) is needed, since only actors that are mapped to the same tile and do not engage in inter-processor communication can be clustered. The reason is that actors of an SDFG which are engaged in inter-processor communication show different timing semantics when clustered (because of possible changes in the rates of the ports of the resulting hierarchical actor). This, in turn, could distort the access pattern on the shared interconnect, which could lead to false real-time results.
If the above conditions hold, clustering can be applied. After clustering, one issue remains: how to calculate the WCET/BCET of the resulting hierarchical actor Ω. If n denotes the number of actors in Z and γ(a) is the repetition vector value of actor a (for notation details refer to Sect. 2.2.1.4), then the new wcet of the hierarchical actor Ω can be calculated as follows:
\[
\gamma(\Omega) \times wcet(\Omega) \;=\; \sum_{i=1}^{n} \bigl( \gamma(a_i) \times wcet(a_i) \bigr)
\;\Leftrightarrow\;
wcet(\Omega) \;=\; \frac{\sum_{i=1}^{n} \bigl( \gamma(a_i) \times wcet(a_i) \bigr)}{\gamma(\Omega)}
\]
Similarly, the bcet of the hierarchical actor Ω can be calculated as follows:
\[
bcet(\Omega) \;=\; \frac{\sum_{i=1}^{n} \bigl( \gamma(a_i) \times bcet(a_i) \bigr)}{\gamma(\Omega)}
\]
It is important to note that, beyond the estimated WCET/BCET of the single actors when executed on a target processor, the clustering technique is fully independent of the target architecture of the MPSoC.
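The WCET/BCET formulas above can be sketched directly in code. The following is a minimal illustration (not the author's implementation; all names are ours): `gamma` holds the repetition vector values and `wcet`/`bcet` the per-actor execution time bounds on the target processor.

```python
# Sketch of the clustering formulas: the hierarchical actor Omega fires
# gamma(Omega) times per iteration and must account for all firings of
# its member actors, hence the division by gamma(Omega).

def cluster_wcet(actors, gamma, wcet, gamma_cluster):
    """wcet(Omega) = sum_i(gamma(a_i) * wcet(a_i)) / gamma(Omega)."""
    return sum(gamma[a] * wcet[a] for a in actors) / gamma_cluster

def cluster_bcet(actors, gamma, bcet, gamma_cluster):
    """bcet(Omega) = sum_i(gamma(a_i) * bcet(a_i)) / gamma(Omega)."""
    return sum(gamma[a] * bcet[a] for a in actors) / gamma_cluster

# Illustrative example: actors A (repetition value 2) and B (value 1)
# clustered into Omega with repetition value 1.
gamma = {"A": 2, "B": 1}
wcet = {"A": 10, "B": 5}
print(cluster_wcet(["A", "B"], gamma, wcet, 1))  # 25.0
```

The example values are invented for illustration; in practice they would come from the SDFG's repetition vector and the per-actor timing estimates of Sect. 2.2.1.4.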
5.4.3 Temporal and Spatial Segregation for a Composable and Scalable Analysis
[Figure: three-level scheduling hierarchy — a TDMA clusters' scheduler at the top, the SDFG schedulers (SO, RR) over the SDFGs of Cluster 1 and Cluster 2 beneath it, and the actor schedules (SO) at the bottom]
Figure 5.13: Scheduling hierarchy extended with TDMA clusters’ scheduler
We will show later (in Sect. 7.2.5) that, with the help of this extension, we are able to increase the number of actors analyzable by our approach on an MPSoC with a fixed number of tiles.
In the following, we assume that the spatial isolation is already realized (e.g. either through virtualization with the help of a hypervisor [Fakih et al., 2013b] or through static memory allocation) and describe how such a composable RT analysis can be made based on a TDMA clusters' scheduler with the help of our state-based RT analysis method. Fig. 5.13 shows the scheduling hierarchy (see Sect. 4.2.4.2) extended with a non-preemptive TDMA clusters' scheduler at the top hierarchy level. This TDMA scheduler allows clusters of actors (still respecting the lower two scheduling levels: SO within an SDFG and SO or RR among SDFGs) to execute only in specific time slots; it switches to the next slot as soon as the previous one expires, and is defined as follows:
Definition 5.4.1. (TDMA Clusters' Scheduler) A TDMA scheduler is defined as a tuple S = (F, SL), where F represents the functionality (code) of the scheduler (cf. the pseudo-code in Sect. 6.5.2) which switches between the different slots on a tile, and SL is a finite set of slots Sl = (d, Cl), each having a duration d after which the slot expires and a cluster Cl ⊆ SO of different SDFG schedules to be executed in this slot. Let T be the number of tiles in the system. Then every tile t ∈ T has its own so_t ⊆ SO (see Def. 4.2.10), and every cluster Cl = {so_t0, so_t1, ..., so_tt} consists of a set of schedules to be executed on every tile in the current slot, where so_t0 ⊆ so_0, so_t1 ⊆ so_1, and so_tt ⊆ so_t, respectively.

[Figure: two tiles, each with a PE and data/instruction memories, connected over a shared bus to a memory; SDFG1 and SDFG2 form Cluster1 and SDFG3 and SDFG4 form Cluster2; a timer-driven TDMA schedule alternates slot1 (Cluster1) and slot2 (Cluster2) with a slot-switching delay s, and interrupts IRQ1/IRQ2 signal the slot boundaries to the tiles]
Figure 5.14: Example of TDMA scheduling of clusters of SDFGs
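Definition 5.4.1 can be illustrated with a small data model. This is a sketch of ours, not the realization of Sect. 6.5.2; the class and field names are assumptions:

```python
# Illustrative model of Def. 5.4.1: a TDMA clusters' scheduler cycles
# non-preemptively through a finite list of slots SL; each slot Sl has
# a duration d and a cluster Cl of SDFG schedules to run in that slot.
from dataclasses import dataclass

@dataclass
class Slot:
    d: int                 # slot duration; the slot expires after d time units
    cluster: list          # Cl: the SDFG schedules executed in this slot

@dataclass
class TdmaScheduler:
    slots: list            # SL: finite set of slots, served in a fixed cycle

    def active_cluster(self, t):
        """Return the cluster whose slot is active at global time t
        (switching happens only at slot boundaries)."""
        period = sum(s.d for s in self.slots)
        t = t % period
        for s in self.slots:
            if t < s.d:
                return s.cluster
            t -= s.d

# Example mirroring Fig. 5.14 (slot lengths are invented):
sched = TdmaScheduler([Slot(60, ["SDFG1", "SDFG2"]),
                       Slot(85, ["SDFG3", "SDFG4"])])
print(sched.active_cluster(70))  # ['SDFG3', 'SDFG4']
```

At time 70 the first slot (duration 60) has expired, so the scheduler has switched to the second slot and Cluster2's schedules are active.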
In this work, we simplify the general case of Def. 5.4.1 concerning the granularity at which clusters may be constructed: we assume that a cluster consists of a number of SDFGs which are independent from the SDFGs mapped to other clusters. Note that the general case, which permits clustering at the granularity level of actors (see Fig. 5.13), could easily lead to deadlocks in the SUA if care is not taken; this is not the case if the clustering is done at the granularity level of SDFGs.
Let us look at a concrete example to understand how the TDMA clusters' scheduler works. In a first step, clusters of SDFGs are identified, as shown in the simple example in Fig. 5.14, where SDFG1 and SDFG2 are mapped to cluster1 and SDFG3 and SDFG4 are mapped to cluster2. Next, a scheduling strategy is chosen for the SDFG scheduler (e.g. RR).
[Figure: two-tier flow — each cluster's design is first analyzed in isolation with the state-based RT method (timed-automata templates) against its requirements, yielding e.g. Cluster1 results WCPApp1 = 54 529 and WCPApp2 = 59 895 (both satisfied) and Cluster2 results WCPApp3 = 85 001 and WCPApp4 = 44 236; the TDMA scheduler is then configured with slot lengths derived from these worst-case periods, and the integrated MPSoC design with the TDMA scheduler is evaluated with the analytical TDMA RT method, giving the final composable results, e.g. WCPApp1 = WCPApp2 = WCPApp3 = WCPApp4 = 146 896, all satisfied]
Figure 5.15: Two-Tier RT analysis method through TDMA clusters' scheduler

Now, the worst-case instance of some timing metric (e.g. period or end-to-end latency) of every SDFG in every cluster is obtained in isolation (i.e. without considering the other clusters, see Fig. 5.15) with the help of our state-based RT method (presented in Sect. 5.3). In our example in Fig. 5.14, we first analyze the Worst-Case Period (WCP) of SDFG1 and SDFG2 belonging to cluster1, considering all contentions on the shared bus (for different arbitration protocols) between the two SDFGs but without considering cluster2. Then we do the same for the SDFGs of cluster2 without considering those of cluster1. After that, every cluster is mapped to a slot of a fixed size equal to the maximum worst-case time (obtained from the state-based RT method) among all SDFGs in this cluster, so that it is guaranteed that all SDFGs mapped to this slot have finished executing when the slot expires, without the need for preemption (see Fig. 5.15). The TDMA scheduler has the role of switching between the slots of the different clusters whenever the slot time of a cluster expires.
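The slot-sizing step can be sketched as a one-liner over the per-cluster isolation results. The WCP values below are the ones reported in Fig. 5.15; the dictionary layout is our own illustration:

```python
# Slot sizing: each cluster's slot length is the maximum worst-case
# time (here: WCP) among its SDFGs, obtained in isolation from the
# state-based RT method. This guarantees that every SDFG in the slot
# finishes before the slot expires, so no preemption is needed.
cluster_wcps = {
    "cluster1": {"SDFG1": 54529, "SDFG2": 59895},
    "cluster2": {"SDFG3": 85001, "SDFG4": 44236},
}

slot_lengths = {c: max(wcps.values()) for c, wcps in cluster_wcps.items()}
print(slot_lengths)  # {'cluster1': 59895, 'cluster2': 85001}
```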
Assuming that SDFGs running in one slot are independent from those running in other slots, in order to calculate the worst-case execution time (Tcompos, see Fig. 5.15) of single SDFGs when all clusters are integrated and executed on the MPSoC platform, we take advantage of the composability property of such a TDMA-based scheduling and can calculate it (similar to Eq. 2.2) using the following formula:
\[
T_{compos} \;=\; \sum_{i=0}^{Sl} T_{max}(i) \;+\; (Sl \times s), \tag{5.6}
\]
where T can either be the worst-case period or the worst-case end-to-end deadline, depending on the timing requirement we are interested in, T_max(i) is the maximal T among the SDFGs running in slot i, s is the scheduler's worst-case delay when switching from one slot to another, and Sl is the total number of slots.
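Eq. (5.6) reduces to a few lines of code, summing T_max over all slots and adding one switching delay per slot. The slot maxima below are the illustrative cluster results from Fig. 5.15, and the switching delay s = 1000 is an assumption of ours chosen so that the sum reproduces the integrated figure of 146 896:

```python
# Eq. (5.6): composable worst-case time once all clusters run under the
# TDMA scheduler. T_max[i] is the maximal T among the SDFGs of slot i,
# s is the scheduler's worst-case slot-switching delay, and the number
# of slots Sl is len(T_max).
def t_compos(T_max, s):
    Sl = len(T_max)
    return sum(T_max) + Sl * s

# Slot maxima from the two clusters of Fig. 5.15; s is assumed.
print(t_compos([59895, 85001], s=1000))  # 146896
```

Because the bound only depends on the slot maxima and the switching overhead, adding or re-analyzing one cluster never invalidates the results of the others, which is exactly the composability the TDMA scheme buys.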
One possible realization of the above TDMA clusters' scheduler with the help of customized hardware timers can be found in Sect. 6.5.2. Another realization, which we described in [Fakih et al., 2013b] and which is used in the experiment in Sect. 7.2.5, requires the existence of a resource manager (hypervisor) in the MPSoC which takes care of the temporal and spatial segregation.