State-of-The-art in Multicast Routing Methodology and Theory

The development of programming models for NoC-based multiprocessor systems has recently become a hot topic in multicomputer research. Multicast communication has been a standard service in data-parallel programming languages such as Fortran-D [73], Distributed Fortran 90 [155] and High Performance Fortran (HPF) [95]. Message-passing libraries such as the Message Passing Interface (MPI) [156] and Parallel Virtual Machine (PVM) [79], [80], which are commonly used to design message-passing programming models, also include standard procedures for collective communication such as multicasting and broadcasting. Both libraries have been developed for languages such as Fortran and C/C++.

Multicast communication in these programming models can be implemented effectively and efficiently in the application layer of NoC-based multiprocessor systems, as long as the hardware infrastructure in the network, data-link and physical layers supports multicast services. Indeed, multicast support, as one of the collective communication services, can simplify the programming models and reduce the programming effort for NoC-based multiprocessor systems.

In the Internet community, multicast data communication has also been an active topic. The works in [71], [170] and [112] present protocols that support multicast data transmission. The work in [112] in particular provides reliable multicast communication by using multiple multicast channels to reduce receiver processing costs and network bandwidth consumption in a multicast session. These works target off-chip networks rather than on-chip network platforms. However, implementing a multicast protocol has the same motivation on both platforms, i.e. to reduce communication time and energy.

5.2 State-of-The-art in Multicast Routing Methodology and Theory


Fig. 5.1: The traffic formations by using static tree-based, dual-path and multi-path multicast routing methods. Panels: static XY tree-based multicast, dual-path multicast and multi-path multicast, each marking the source node and the destination nodes on a mesh with x- and y-addresses 0 to 5.

the message with minimum number of paths.

The works in [137], [63] and [39] present multicast methodologies based on the path-based method. In path-based multicast routing, the PEs that inject a message have to set up the order of the headers containing the addresses of all multicast destination nodes in order to find optimum paths from the PEs to the destination nodes. Message preparation at the source nodes therefore incurs a time overhead. Path-based multicast routing aims to reduce, or possibly prevent, conflicts between multicast messages in intermediate nodes. Each multicast packet acquires at most two sinking ports in a destination node to forward the multicast message, i.e. the LOCAL port (connected directly to a resource tile) and one other port for forwarding/duplicating the multicast message to further destination nodes. In general, path-based multicast routing can be classified into dual-path and multi-path multicast routing. In dual-path multicast routing, the maximum number of paths formed in the network is two, while in multi-path multicast routing the maximum number of paths is four. Path-based multicast routing, however, avoids branching. In each intermediate destination node, a packet is first forwarded from an input port to the LOCAL output port while it is kept in the input port. Afterwards, the packet is routed to the other requested output port. Fig. 5.1 presents the different routing paths formed by the dual-path and multi-path multicast routing methods.
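The header ordering step at the source node can be illustrated with a small sketch. It assumes the snake-like Hamiltonian-path node labeling that is commonly used for dual-path multicast in 2D meshes; the text above does not state which labeling the cited works use, and the names `hamiltonian_label` and `dual_path_headers` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) mesh address


def hamiltonian_label(node: Node, width: int) -> int:
    """Assumed snake-like labeling: even rows run left to right, odd rows right to left."""
    x, y = node
    return y * width + (x if y % 2 == 0 else width - 1 - x)


def dual_path_headers(source: Node, dests: List[Node], width: int) -> Tuple[List[Node], List[Node]]:
    """Split the destinations into two ordered header lists, one per path.

    Destinations labeled above the source are visited in ascending label order,
    destinations labeled below the source in descending label order.
    """
    src = hamiltonian_label(source, width)

    def label(d: Node) -> int:
        return hamiltonian_label(d, width)

    high = sorted((d for d in dests if label(d) > src), key=label)
    low = sorted((d for d in dests if label(d) < src), key=label, reverse=True)
    return high, low


if __name__ == "__main__":
    # 6 x 6 mesh, source at (2, 2)
    print(dual_path_headers((2, 2), [(0, 0), (5, 1), (4, 3), (1, 5), (3, 2)], 6))
```

The sorting is the preparation overhead paid at the source; in exchange, each of the two resulting messages follows a single path and never has to branch.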

The works in [19], [147] and [124] present routing methodologies based on multicast trees. In tree-based multicast routing, header ordering in the source nodes is not required (the order of the destination addresses can be chosen freely). The multicast routing forms communication paths like the branches of a tree, connecting the source node with the destination nodes at the end points of the branches. The disadvantage of tree-based multicast routing is a higher probability that multicast deadlock occurs in intermediate nodes. However, the novel multicast scheduling for adaptive tree-based multicast routing presented in this thesis solves the multicast deadlock problem in the intermediate nodes effectively and efficiently, which makes the methodology more attractive.
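How a tree branch forms at a router can be sketched as follows. The sketch assumes static XY (dimension-order) routing as named in Fig. 5.1, with y increasing towards NORTH; it is not the adaptive scheduling of the thesis, and the port names other than LOCAL, as well as the functions `xy_port` and `branch`, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y) mesh address


def xy_port(current: Node, dest: Node) -> str:
    """Static XY routing: correct the x offset first, then the y offset."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:  # assumption: y grows towards NORTH
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"


def branch(current: Node, dests: List[Node]) -> Dict[str, List[Node]]:
    """Group destinations by output port; each non-empty group is one tree branch."""
    groups: Dict[str, List[Node]] = defaultdict(list)
    for d in dests:
        groups[xy_port(current, d)].append(d)
    return dict(groups)


if __name__ == "__main__":
    print(branch((2, 2), [(4, 2), (2, 2), (0, 5), (2, 0), (5, 1)]))
```

Because each destination is routed independently of the others, the order of the destination list does not matter, which is why no header ordering is needed at the source.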

In general, the multicast routing methods presented in [137], [63], [39], [19], [147] and [124] are not suitable for on-chip networks. All of these works use virtual channels to solve multicast deadlock problems, and the FIFO queues that form the main components of virtual channels dominate the logic gate consumption. In the XHiNoC, the same adaptive routing algorithms route both unicast and multicast packets, and multicast contentions are solved without virtual channels, resulting in a very efficient gate-level implementation of the routing function and data buffers.

A path-based multicast routing method dedicated to NoCs has been introduced in [146]. It is designed to avoid multicast deadlock in the destination nodes by reserving virtual channels and by giving multicast messages priority over unicast messages in the arbitration of link bandwidth. Experiments in that work show that the proposed multicast technique improves throughput and does not exhibit a significant impact on unicast performance in a network with mixed unicast-multicast traffic “only if” the network is not saturated.

Compared to the work presented in [146], the tree-based multicast scheduling proposed in this thesis does not give priority to multicast messages (fair flit-by-flit arbitration between unicast and multicast messages). Hence, our multicast technique does not have a significant impact on unicast performance “even if” the network is saturated. The multicast routing methodology used in the XHiNoC also shows interesting performance characteristics under both saturated and non-saturated conditions.

Moreover, the NoC router in [146] has not been synthesized to the logic-gate level.
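The difference between priority-based and fair flit-by-flit arbitration can be made concrete with a small sketch. It assumes a round-robin policy, which is only one possible way to realize fair arbitration; the text above does not specify the policy used in the XHiNoC, and the class `RoundRobinArbiter` is hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one requesting input per flit slot, rotating the priority after each grant."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # Start so that input 0 is checked first on the first call.
        self.last = num_inputs - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests."""
        for offset in range(1, self.num_inputs + 1):
            idx = (self.last + offset) % self.num_inputs
            if requests[idx]:
                self.last = idx  # the winner gets the lowest priority next time
                return idx
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    # Inputs 0 and 1 (say, one unicast and one multicast stream) compete for a link.
    print([arbiter.grant([True, True, False]) for _ in range(4)])
```

With this policy the two competing streams alternate on the link regardless of whether a flit belongs to a unicast or a multicast message, whereas a priority scheme would serve the multicast stream until it stops requesting.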

5.2.2 Source and Distributed Multicast Routing

According to where the routing paths and routing decisions are made, multicast routing can be divided into centralized (source) multicast routing and distributed multicast routing. With static tree-based multicast routing in a regular mesh-based network, for instance, the routing decision can be made with the distributed routing approach, because the network orientation can easily be mapped to every network router to form correct routing paths.

The work in [216] addresses the problem of synthesizing custom NoC architectures that are optimized for a given application, and considers both unicast and multicast traffic flows in the input specification. Several algorithms that systematically examine different flow partitionings are proposed. Algorithms based on the Rectilinear Steiner Tree are then used to generate an efficient network topology. The design flow integrates floorplanning and deadlock-free routing determination. The work proposes a static solution for deadlock-free multicast routing that is fixed to a specific NoC application. Hence, the work in [216] can apparently be classified as centralized multicast routing with off-line (design-time) computation of the multicast routing paths.

The work in [138] presents a new heuristic multicast routing scheme that combines the distributed routing and source routing methods. The proposed path-based multicast routing scheme consists of two algorithms: a preprocessing algorithm for message preparation, which runs at the source node and finds the routing control information that will be carried by the message, and a message routing algorithm, which is executed in a distributed manner in the intermediate nodes. The generated routing control information is used in conjunction with the destination addresses so that efficient routing decisions can be made by the forwarding nodes.

Some other works have introduced NoC routers with a multicast routing service. The work in [1], for example, presents a Multicast Router Rotary (MRR). The multicast routing algorithm in the MRR can be classified as a distributed routing method. Multicast contention in the MRR is solved by implementing two unidirectional internal rings in the switch, one in the clockwise direction and the other in the counter-clockwise direction. Without a careful data flow rule, a permanent deadlock can occur, especially when packets arrive from all input ports and each of them requests all output ports simultaneously. The data flow rule proposed for the MRR must even allow misrouting to avoid deadlock when a packet cannot find a free output port. In any case, misrouting increases communication energy because of the extra misrouted traffic, and it can lead to a livelock situation. The work in [1] has not yet addressed this livelock issue. Moreover, the 10 additional internal buffers (5 for each ring) in the MRR increase the area overhead of the router.

The work in [189] presents Broadcast-multicast-enabled Logic-based Distributed Routing (BLBDR). Another routing approach, the Recursive Partitioning Multicast (RPM) method, is presented in [208]. The BLBDR and RPM methods need a global network view and a preprocessing algorithm for network partitioning. In the RPM method, a routing decision is made based on the current network partitioning, which has previously been computed recursively in a source node. The whole network is divided into at most eight subnets by the source node. The objective of the network partitioning is to minimize the packet replication time. In general, a preprocessing algorithm for network partitioning leads to an initiation time overhead.
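To illustrate how a source node can divide a mesh into at most eight subnets, the following sketch classifies destinations by compass region relative to the source. This is only one plausible reading of the partitioning; the exact rule and the recursion used in [208] are not given in the text above, and the functions `region` and `partition` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # (x, y) mesh address


def region(source: Node, dest: Node) -> Optional[str]:
    """Return one of N, S, E, W, NE, NW, SE, SW, or None if dest is the source."""
    sx, sy = source
    dx, dy = dest
    ns = "N" if dy > sy else "S" if dy < sy else ""  # assumption: y grows towards N
    ew = "E" if dx > sx else "W" if dx < sx else ""
    return (ns + ew) or None


def partition(source: Node, dests: List[Node]) -> Dict[str, List[Node]]:
    """Group the destinations into at most eight subnets around the source."""
    parts: Dict[str, List[Node]] = defaultdict(list)
    for d in dests:
        r = region(source, d)
        if r is not None:
            parts[r].append(d)
    return dict(parts)


if __name__ == "__main__":
    print(partition((2, 2), [(2, 4), (4, 4), (0, 2), (1, 0), (2, 2)]))
```

Running such a classification once per message at the source is the kind of preprocessing step that causes the initiation time overhead mentioned above.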

A Virtual Circuit Tree Multicasting (VCTM) method is presented in [107]. In the VCTM method, a setup packet must be sent into the network to configure a switched tree-based multicast virtual circuit. The virtual circuit configuration is implemented using virtual channels. Hence, both the VCTM and the RPM [208] multicast routers have a large logic area cost due to the replication of buffers and control logic for VC arbitration.

Fig. 5.2: Multicast deadlock configurations when using tree-based and path-based multicast routing in mesh networks. Panels: (a) tree-based multicast routing; (b) branching in dual/multi-path multicasting. Each panel shows Message A and Message B on mesh nodes (1,1) to (3,2).

In document: Microarchitecture and Implementation of Networks-on-Chip with a Flexible Concept for Communication Media Sharing (pages 154-158)