Communication Resources - Timing Issues of MPSoCs

2.3 Timing Issues of MPSoCs

2.3.3 Communication Resources

On the communication resource level, contention arises when multiple PEs try to access the resource concurrently. In addition, contention can result from the coherence mechanism (e.g. when one PE updates data in the destination memory, this update triggers an invalidation on all caches that contain these data). Furthermore, contention can also be caused by devices other than the PEs, such as I/O devices or the DMA controller. Depending on the nature of the communication resource, these contentions can be avoided or minimized, e.g. in the case of a Network-on-Chip (NoC) with enough channels to serve all connected PEs [Kotaba et al., 2013]. Another major issue is the kind of arbitration used, which decides which PE is served first when accesses are concurrent. While the TDMA arbitration policy ensures determinism, since a maximum latency can be guaranteed, other policies that allow starvation of PEs (e.g. fixed-priority) or depend strongly on the run-time state (e.g. First-Come-First-Serve: FCFS) are more difficult to analyze.

At the level of a memory controller, contention due to concurrent accesses can take place (depending on the interconnect type). In this case, the memory controller must continuously open and close pages, because interleaved requests from different PEs tend to target different pages, which impacts the overall timing behavior of the MPSoC.

Bridges can be used to connect multiple buses and interconnects to each other. Contention also occurs at the level of bridges when multiple requests from the connected interconnects are issued to the bridge concurrently.

Since modeling communication resources (see Sect. 5.2.5) is a major contribution of this thesis, we elaborate in the following on their arbitration issues and their timing models.

2.3.3.1 Scheduling (arbitration)

Analogously to the scheduling mechanisms presented in Sect. 2.2.1.1, we describe in the following the arbitration mechanisms of shared communication resources, which control concurrent access requests of multiple PEs to a shared storage resource. All arbitration mechanisms used in this thesis are non-preemptive (cf. [Abel et al., 2013] for a description of non-preemptive arbitration protocols): the arbiter grants a newly arriving request access only if no other request is currently being served, and an ongoing transfer is never interrupted.

We distinguish (similarly to [Abel et al., 2013]) between time-driven arbitration (TDMA), where a predefined schedule assigns fixed time slots to every PE for accessing the communication resource, and event-driven arbitration (First-Come-First-Serve, Round-Robin, Fixed-priority), where the arbitration mechanism decides at run time which PE is granted access.

Time-driven Arbitration We have already described the TDMA mechanism used for scheduling SDFGs in Sect. 2.2.1.1. In the following, the same mechanism is applied to the arbitration of accessors of a shared communication resource. In this case, every PE is allocated a priori a time slot of fixed length in which it can perform its operations (transfers to the shared storage resource).

In order to ensure that every transaction of a PE finishes before its slot expires (since we assume non-preemptive arbitration), the length of all slots is set to the maximal transaction communication time (including the arbitration cycle time and the memory access delay) that can be requested on this communication resource. That is, if $t_{max_i}$ is the maximal communication time actor $i$ needs to transport a number of tokens over all its ports to a target shared storage resource (including the latency of the storage resource), then the slot size $T_{sl}$ can be calculated as follows:

$$T_{sl} = \max\{t_{max_1}, \dots, t_{max_n}\}$$

Knowing the slot size, we can calculate the WCRT of a PE's access to the shared storage resource under TDMA arbitration as follows:

$$t_{wcrt_j} = n \times T_{sl} \tag{2.4}$$

where $n$ is the number of PEs and $T_{sl}$ is the slot size (in time units). TDMA arbitration is composable and flexible for the same reasons mentioned in Sect. 2.2.1.1.
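The slot-size formula and Eq. 2.4 can be illustrated with a minimal Python sketch. The function names and the example values (in cycles) are hypothetical; each t_max value is assumed to already include the arbitration cycle time and the storage-resource latency, as required above.

```python
from typing import Sequence


def tdma_slot_size(t_max: Sequence[int]) -> int:
    """Slot size T_sl: the longest worst-case communication time t_max_i.

    Each t_max_i is assumed to include arbitration and storage latency,
    so that a non-preemptive transaction always fits into one slot.
    """
    if not t_max:
        raise ValueError("need at least one t_max value")
    return max(t_max)


def tdma_wcrt(num_pes: int, slot_size: int) -> int:
    """WCRT of a PE access under TDMA, Eq. 2.4: t_wcrt_j = n * T_sl."""
    if num_pes < 1:
        raise ValueError("need at least one PE")
    return num_pes * slot_size


# Hypothetical per-actor worst-case communication times (cycles).
t_max = [12, 20, 8, 15]
T_sl = tdma_slot_size(t_max)
print(T_sl, tdma_wcrt(3, T_sl))  # 20 60
```

Note that the bound depends only on $n$ and $T_{sl}$, not on the behavior of the other PEs, which is what makes TDMA composable.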

Event-driven Arbitration Here we elaborate on the event-driven arbitration policies used in this thesis. Under a fixed-priority arbitration policy, a unique priority is assigned to each PE, and if contention occurs on the communication resource, the PE with the highest priority is granted access. Suppose that PE₀ is the PE with the highest priority; then the WCRT of PE₀ can be calculated as follows [Pitter and Schoeberl, 2010]:

$$t_{wcrt_0} = \max_{0 < i \le n-1} \{ t_{WCCT_i} - 1 \} + t_0 \tag{2.5}$$

where $i$ identifies the lower-priority PEs, $n$ is the number of PEs in the system, $t_{WCCT_i}$ is the maximum duration over all storage resource access instances of PE_i, and $t_0$ is the time PE₀ needs to access the storage resource. Eq. 2.5 represents the worst-case scenario for PE₀: a lower-priority PE has just been granted the communication resource, and only one cycle later PE₀, the PE with the highest priority, issues its request, so it has to wait for the remaining $t_{WCCT_i} - 1$ cycles of the non-preemptive transfer. Calculating the WCRT of lower-priority PEs accessing storage resources, however, is much more difficult (according to [Pitter and Schoeberl, 2010]), because it strongly depends on the number of active PEs. For example, suppose the higher-priority PEs prevent a lower-priority PE from accessing the storage resource indefinitely; in this case, the WCRT of the lower-priority PE cannot be bounded at all.
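Eq. 2.5 can be evaluated with a short Python sketch. The function name and the example cycle counts are hypothetical; the sketch only covers the highest-priority PE₀, since, as noted above, no comparably simple bound exists for the lower-priority PEs.

```python
from typing import Sequence


def fp_wcrt_highest(t_wcct_lower: Sequence[int], t_0: int) -> int:
    """WCRT of the highest-priority PE_0 under non-preemptive
    fixed-priority arbitration, Eq. 2.5:
        max_{0 < i <= n-1} (t_WCCT_i - 1) + t_0

    t_wcct_lower[k] holds t_WCCT_i of lower-priority PE_i (i = k + 1);
    all times are in bus cycles.
    """
    if not t_wcct_lower:
        # Assumption: with no other PE in the system there is no blocking.
        return t_0
    # Worst case: PE_0 requests one cycle after the longest lower-priority
    # transfer started, then waits for it to finish before its own access.
    return max(t - 1 for t in t_wcct_lower) + t_0


# Hypothetical values: three lower-priority PEs, PE_0 needs 4 cycles.
print(fp_wcrt_highest([10, 6, 8], 4))  # 13
```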

Fair arbitration can be achieved through round-robin (RR) arbitration (similar to RR scheduling, Sect. 2.2.1.1). At every arbitration, a counter (typically starting from 0) indicates the identifier of the next PE to be granted access to the interconnect; the counter is incremented after every arbitration, which ensures starvation freedom among the accessors. The WCRT of a PE accessor on a communication resource with RR arbitration can be calculated as follows [Pitter and Schoeberl, 2010]:

$$t_{wcrt_j} = \sum_{\forall i \neq j} t_{WCCT_i} + t_j \tag{2.6}$$

where $t_{WCCT_i}$ is the maximum duration over all storage resource access instances of PE_i (the processors other than PE_j), and $t_j$ is the time PE_j needs to access the storage resource.
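The RR counter mechanism and Eq. 2.6 can be sketched in Python as follows. The class and function names and the example values are hypothetical. One assumption goes beyond the text: the counter skips PEs that currently have no request (a common work-conserving variant), whereas the description above only states that the counter is incremented after every arbitration.

```python
from typing import Optional, Sequence, Set


class RoundRobinArbiter:
    """Minimal RR arbiter sketch: a counter marks the PE checked first,
    and after every grant it moves just past the granted PE."""

    def __init__(self, num_pes: int) -> None:
        self.num_pes = num_pes
        self.counter = 0  # typically starts at 0

    def grant(self, requesting: Set[int]) -> Optional[int]:
        """Return the first requesting PE at or after the counter, or None."""
        for offset in range(self.num_pes):
            pe = (self.counter + offset) % self.num_pes
            if pe in requesting:
                self.counter = (pe + 1) % self.num_pes
                return pe
        return None  # no PE is requesting


def rr_wcrt(t_wcct: Sequence[int], t_j: int, j: int) -> int:
    """Eq. 2.6: sum of t_WCCT_i over all i != j, plus t_j."""
    return sum(t for i, t in enumerate(t_wcct) if i != j) + t_j


arb = RoundRobinArbiter(3)
print([arb.grant(s) for s in ({0, 1, 2}, {0, 1, 2}, {0, 2}, {0, 2})])  # [0, 1, 2, 0]
print(rr_wcrt([10, 6, 8], t_j=6, j=1))  # 24
```

Because every other PE is served at most once before PE_j gets its turn again, the bound in Eq. 2.6 is simply the sum of their longest accesses.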

In a First-Come-First-Serve (FCFS) arbitration, a FIFO queue holds the requests from the accessors, and the oldest request in the queue is granted access to the shared communication resource. Like RR arbitration, FCFS therefore ensures starvation freedom and fairness among accessors, since the PE with the oldest request is served before the others.

Figure 2.8: Cycle-accurate Write single-beat transfer (based on [ARM, 2006, ICVerification, 2015]). [Timing diagram: HCLK, HREQ, HGRANT, HCNTRL, HADDR, HWDATA and HREADY over time, divided into arbitration, address and data phases; a single NONSEQ transfer with the values 0xA000 0000 and 0x2F00 9801]

The authors of [Shabbir et al., 2010] note that, under FCFS arbitration, a new access request of PE_i arriving at the communication resource must, in the worst case, wait for all other PEs, each of which may already have a pending request in the FIFO queue. This WCRT can therefore be calculated in the same way as the WCRT under RR arbitration, according to Eq. 2.6.
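The FCFS behavior and its worst case can be checked with a small, hypothetical discrete simulation in Python. It assumes one outstanding request per PE, non-preemptive service, and that simultaneous arrivals enter the FIFO in the order they are listed; the request values are illustrative only.

```python
from collections import deque
from typing import Dict, List, Tuple

Request = Tuple[int, int, int]  # (arrival_cycle, pe_id, duration_cycles)


def simulate_fcfs(requests: List[Request]) -> Dict[int, int]:
    """Serve requests non-preemptively from a FIFO queue, oldest first.

    Returns the response time (completion - arrival) of each PE's request.
    Requests enter the FIFO in arrival order; sorted() is stable, so
    simultaneous arrivals keep their listed order.
    """
    queue = deque(sorted(requests, key=lambda r: r[0]))
    now = 0  # cycle at which the shared resource becomes free
    response: Dict[int, int] = {}
    while queue:
        arrival, pe, duration = queue.popleft()  # oldest request first
        start = max(now, arrival)  # wait until the resource is free
        now = start + duration     # the transfer is never interrupted
        response[pe] = now - arrival
    return response


reqs = [(0, 0, 10), (0, 1, 6), (1, 2, 8)]
print(simulate_fcfs(reqs))  # {0: 10, 1: 16, 2: 23}
```

PE_2 arrives last and waits behind both other PEs; its response time of 23 cycles stays within the Eq. 2.6 bound of 10 + 6 + 8 = 24 cycles.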

For applications with hard real-time requirements, time-driven arbitration policies (e.g. TDMA) are superior to event-driven ones, since their composable and predictable behavior makes validating their RT requirements feasible [Marwedel, 2010].

2.3.3.2 Timing models

Modeling communication resources with their timing properties is indispensable for validating the RT requirements of embedded applications running on MPSoCs (see Chap. 5). In this section, we describe two different timed models (cf. bus-functional models in [Cai and Gajski, 2003]) of communication resources: the time-accurate communication model and the cycle-accurate communication model.

A cycle-accurate model specifies delays (time) in terms of the bus master's clock cycles, which can be derived from the communication interconnect protocol. Fig. 2.8 and Fig. 2.9 show what such a cycle-accurate protocol looks like for a Write single-beat transfer and a Write burst transfer, respectively. In the following, we describe, by way of example, the main differences between the two transfer styles according to the AHB bus protocol [ARM, 2006, ICVerification, 2015]. In both cases, an arbitration phase takes place first, in which a bus master requests access to the bus using the HREQ signal. If the arbiter decides that this master has the highest priority (according to its arbitration mechanism), HGRANT is asserted. After some delay, the bus master is notified and set as the current bus master via the HMASTER signal (not depicted in the figures). Afterwards, the address and data phases follow, in which the single-beat transfer (see Fig. 2.8) and the burst transfer (see Fig. 2.9) differ. After checking that the slave is ready (HREADY), the master drives HADDR along with other control signals that indicate the type (Read/Write), size (byte, half-word, word) and length of the transaction.

Figure 2.9: Cycle-accurate Write burst transfer (4 beats, based on [ARM, 2006, ICVerification, 2015]). [Timing diagram: HCLK, HREQ, HGRANT, HCNTRL, HADDR, HWDATA and HREADY over time; an arbitration phase followed by a transfer phase with HTRANS = NONSEQ, SEQ, SEQ, SEQ, addresses 0x20, 0x24, 0x28, 0x2C and data beats Data (0x20) to Data (0x2C)]

In a single-beat transfer (see Fig. 2.8), the length of the transaction is set to single, and the HTRANS signal (part of the control signals HCNTRL) is set to NONSEQ. In addition, arbitration is redone directly after the transaction has finished (signaled by deasserting HGRANT): if the master wants to continue communicating, it must acquire the bus again (by asserting HREQ), and its access is granted or blocked depending on the arbitration mechanism.

In a burst transfer (see Fig. 2.9), however, the length can range from two to several beats (4, 8 or 16 beats per transfer for the fixed-length bursts of AHB). Fig. 2.9 depicts four sequential single-beat write accesses. After the arbitration phase, the master indicates a burst in the AHB protocol using the HTRANS signal (part of the control signals HCNTRL). HTRANS is first set to NONSEQ, indicating the first transfer of a new transaction. In the next address phase, HTRANS is set to SEQ, indicating that a sequential transfer of the same transaction follows. During this phase, the address is simply incremented to the next “beat” (incrementing burst: e.g. 0x20, 0x24, 0x28 and 0x2C in Fig. 2.9). Meanwhile, the bus remains reserved for the current master (HGRANT remains high) until the last “beat” access is done. Any request from another master arriving in the meantime is blocked until the burst transfer has finished and a new arbitration phase begins.
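The HTRANS/HADDR sequence of an incrementing burst described above can be generated by a simplified Python sketch. The function name is hypothetical, and the sketch deliberately ignores wait states (HREADY low), IDLE/BUSY transfers and the AHB rule that bursts must not cross a 1 KB address boundary.

```python
from typing import List, Tuple


def ahb_incr_burst(start_addr: int, beats: int,
                   size_bytes: int = 4) -> List[Tuple[str, int]]:
    """(HTRANS, HADDR) pairs of a simplified AHB incrementing burst.

    The first beat is NONSEQ; every following beat is SEQ and its address
    is the previous one plus the transfer size (4 bytes = word).
    beats=1 yields a single-beat transfer.
    """
    if beats < 1:
        raise ValueError("beats must be >= 1")
    return [("NONSEQ" if k == 0 else "SEQ", start_addr + k * size_bytes)
            for k in range(beats)]


for htrans, haddr in ahb_incr_burst(0x20, 4):
    print(htrans, hex(haddr))
# NONSEQ 0x20
# SEQ 0x24
# SEQ 0x28
# SEQ 0x2c
```

Calling the same function with `beats=1` reproduces the single NONSEQ transfer of the single-beat case, after which arbitration is redone.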

In contrast to a cycle-accurate model, a time-accurate model specifies lower and upper latency bounds (e.g. in Fig. 2.10 the time lies in the range between 25 and 75), which are determined with the help of the timing diagram of the interconnect's protocol while abstracting away the details of the communication protocol. This model is appropriate when no accurate (constant) timings can be obtained for transferring data (of a specific size) through an interconnect with a specific communication protocol.

Figure 2.10: Time-accurate bus-functional model (taken from [Cai and Gajski, 2003]). [Diagram: signals address [15:0], data [31:0], ready and ack; four protocol steps with delay intervals (5, 15), (10, 20), (5, 25) and (5, 15); lower bound = 5 + 10 + 5 + 5 = 25, upper bound = 15 + 20 + 25 + 15 = 75]
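The bound computation of Fig. 2.10 amounts to summing the per-step intervals of the protocol. A minimal Python sketch (the function name is hypothetical; the interval values are those of Fig. 2.10):

```python
from typing import List, Tuple


def latency_bounds(steps: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum per-step (lower, upper) delays of a protocol into overall bounds."""
    for lo, hi in steps:
        if lo > hi:
            raise ValueError(f"invalid interval ({lo}, {hi})")
    lower = sum(lo for lo, _ in steps)
    upper = sum(hi for _, hi in steps)
    return lower, upper


# Delay intervals of the four protocol steps in Fig. 2.10.
print(latency_bounds([(5, 15), (10, 20), (5, 25), (5, 15)]))  # (25, 75)
```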

In the document State-Based Real-Time Analysis of Synchronous Data-flow (SDF) Applications on MPSoCs with Shared Communication Resources (pages 38-43)