
6.1.1 A basic instance of the NUM problem

In this section, block-coordinate implementations of the mapping G are tested on a benchmark instance of the NUM problem discussed in Section 3.4.5. We consider the formulation of the NUM problem studied in [ZRJO11] and already encountered in Example 2.1, where a network with node set N = {1, ..., n} and edge set E is represented by a directed graph G = (N, E). Each edge connects a pair of nodes i, j ∈ N and is denoted by the ordered pair (i, j) ∈ E, where node i is arbitrarily chosen as the origin and j as the destination. A pair of nodes is connected by at most one edge. Direct transmissions of information are only possible between neighbours, i.e. pairs of nodes connected by an edge. The set of neighbours of a node i is denoted by Ni ⊂ N, and we have j ∈ Ni iff (i, j) ∈ E or (j, i) ∈ E.

The considered problem can be stated as

  minimise_x   ∑_{i=1}^n fi(xi)
  subject to   Ax = b.   (6.1)

The objective is to minimise a convex function f : R^p → R of the information flows on the edges, gathered in the composite vector x = (x1, ..., xn), where, for i ∈ N, the vector xi ∈ R^{pi} is assigned to node i and contains the pi flow variables (arranged in arbitrary order) corresponding to all the edges with origin i (∑_{i=1}^n pi = p with pi ≥ 0 for i = 1, ..., n). The objective function is assumed to be additively separable with respect to x1, ..., xn, and we write


f = ∑_{i=1}^n fi with fi : R^{pi} → R convex. The network supports a single information flow specified, for each node i ∈ N, by the incoming rate bi ≥ 0 if i is a source node, or by the outgoing rate bi < 0 if i is a sink node. Flow conservation at each node is guaranteed by the constraint Ax = b, where b = (b1, ..., bn) is a vector of R^n such that ∑_{i=1}^n bi = 0, and A denotes the node-edge incidence block matrix. For i, j ∈ N, Aji is a 1×pi row vector such that

  Aji = (1, ..., 1)                  if j = i,
        (0, ..., 0)                  if j ≠ i, (i, j) ∉ E,
        (0, ..., 0, −1, 0, ..., 0)   if (i, j) is the kth edge with origin i,

where, in the last case, the −1 is preceded by k − 1 zeros.

The flow conservation constraint can be rewritten as hi(x) = 0 for i ∈ N, where we define hi(x) = ∑_{j∈Ni∪{i}} Aij xj − bi.

The problem (6.1) is an instance of Problem 3.3 in Section 3.4.5 with ui = 1, vi = 0, mi = 1 and thus Yi = R for i ∈ N. From (B.41), we find

  ∇²ij g(y) = −∑_{k∈Ñij} Aik [∇²fk(xk(y))]^{−1} Ajk^⊤,   i, j ∈ N, y ∈ Y.   (6.2)

Since ∇²f = diag(∇²f1, ..., ∇²fn) is a block-diagonal matrix, (6.2) can be rewritten as

  ∇²g(y) = −A [∇²f(x(y))]^{−1} A^⊤,   y ∈ Y,   (6.3)

where [∇²f(x(y))]^{−1} is diagonal with positive elements. Hence −∇²g can be seen as a scaled version of the form AA^⊤, which is called the Laplacian matrix of the directed graph and such that

  [AA^⊤]ij = deg(i)   if i = j,
             −1       if i ∈ Nj or j ∈ Ni,
             0        otherwise,   (6.4)

where deg(i) denotes the degree of node i, i.e. the number of nodes connected to i by an edge or, equivalently, the number of neighbours of i.
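As a quick sanity check on (6.4) and the blockwise definition of A, the following sketch builds the node-edge incidence matrix of a hypothetical 3-node graph with edges (0, 1), (0, 2) and (1, 2) (0-based labels, chosen for illustration only) and verifies that AA^⊤ carries the node degrees on its diagonal and −1 between neighbours.

```python
# Sketch: build the node-edge incidence matrix A of a small directed graph
# and verify the Laplacian structure of A A^T stated in (6.4).
# The 3-node graph with edges (0,1), (0,2), (1,2) is a hypothetical example.

edges = [(0, 1), (0, 2), (1, 2)]   # (origin, destination), 0-based labels
n = 3                              # number of nodes
p = len(edges)                     # total number of flow variables

# A[i][e] = +1 if node i is the origin of edge e, -1 if it is the
# destination, 0 otherwise -- the blockwise definition of A_ji above.
A = [[0] * p for _ in range(n)]
for e, (i, j) in enumerate(edges):
    A[i][e] = 1       # row block A_ii: one +1 per edge with origin i
    A[j][e] = -1      # row block A_ji: -1 at the position of edge (i, j)

# Compute L = A A^T and check the structure (6.4).
L = [[sum(A[i][e] * A[j][e] for e in range(p)) for j in range(n)]
     for i in range(n)]

deg = {0: 2, 1: 2, 2: 2}           # every node touches two edges here
for i in range(n):
    assert L[i][i] == deg[i]       # diagonal: node degrees
    for j in range(n):
        if i != j:
            assert L[i][j] == -1   # all node pairs are neighbours here
```
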

In our simulations, we use the objective function suggested in [ZRJO11] and defined, for all i ∈ N, by

  fi : x ∈ R^{pi} → fi(x) = ∑_{j=1}^{pi} (e^{γxj} + e^{−γxj}),   (6.5)

where xj denotes the jth component of x and γ is a positive constant.
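The objective (6.5) has a diagonal, strictly positive Hessian, which is what makes the inverses appearing in (6.2) inexpensive to apply. A minimal sketch, with the hypothetical illustration values γ = 1 and pi = 3:

```python
import math

# Sketch of the objective (6.5): fi(x) = sum_j (exp(g*xj) + exp(-g*xj)),
# together with its gradient and (diagonal) Hessian.  The value gamma = 1
# and the test point are hypothetical illustration choices.

def fi(x, gamma=1.0):
    return sum(math.exp(gamma * xj) + math.exp(-gamma * xj) for xj in x)

def grad_fi(x, gamma=1.0):
    return [gamma * (math.exp(gamma * xj) - math.exp(-gamma * xj)) for xj in x]

def hess_diag_fi(x, gamma=1.0):
    # The Hessian is diagonal with entries gamma^2 (e^{g xj} + e^{-g xj}) > 0,
    # so fi is strictly convex and [hess fi]^{-1} is trivial to form.
    return [gamma ** 2 * (math.exp(gamma * xj) + math.exp(-gamma * xj))
            for xj in x]

x = [0.0, 0.0, 0.0]
assert fi(x) == 6.0                        # each term contributes 2 at xj = 0
assert grad_fi(x) == [0.0, 0.0, 0.0]       # the unconstrained minimiser is 0
assert hess_diag_fi(x) == [2.0, 2.0, 2.0]  # strictly positive curvature
```
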

Generation of t = 1000 networks with n = 50 and random links, and computation, for S (dotted line) and N(q) with q = 0, ..., 7 (plain lines), of Ft(ρ) = (1/t) ∑_{s=1}^t δ_{ρs ≤ ρ}, where ρs denotes the spectral radius of the matrix convergence rate of the considered method observed for the sth sample.

Figure 6.1: Spectral radii of the Gauss-Seidel and accelerated methods

Rates of convergence

Fig. 6.1 compares, for this particular formulation of the NUM problem, the spectral radii ρ of the asymptotic convergence rates of the Gauss-Seidel algorithm S and the family of accelerated synchronous algorithms {N(q)} given in (4.82).

We generate connected networks with 50 nodes where the existence of each communication edge is decided at random with probability 1/2. The empirical probability distributions of the spectral radii of the convergence rates derived from the simulations using (4.83) and (4.85) are depicted in the figure. One sees that ρ is smaller for S than for N(0) with similar amounts of information transmitted between the nodes. In other words, the Gauss-Seidel algorithm (S) used with local Newton scaling not only has the advantage over the Jacobi mode (G ≡ N(0)) that it does not require global synchronism, but also converges asymptotically faster in terms of local gradient descents per node.

This result is in accordance with the Stein-Rosenberg theorem (Theorem A.1 in Appendix A). Indeed, it is easily seen that −∇²g meets the conditions of Theorem A.1 by using the decomposition proposed in (4.57) and writing −∇²g = D − L − L^⊤, where, for any y ∈ Y, we know from (6.2) that D(y) is diagonal and positive definite and L is strictly lower triangular with nonnegative elements, and we note that the forms D^{−1}(L + L^⊤) and (D − L)^{−1}L^⊤ give the asymptotic matrix convergence rates of the Jacobi and Gauss-Seidel algorithms, respectively.
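The Stein-Rosenberg ordering can be reproduced in miniature. The sketch below takes a hypothetical Laplacian-like matrix M = D − L − L^⊤ (the Laplacian of a 4-node cycle shifted by the identity, standing in for −∇²g) and estimates the spectral radii of the Jacobi and Gauss-Seidel iteration matrices by power iteration:

```python
# Sketch: spectral radii of the Jacobi matrix D^{-1}(L + L^T) and the
# Gauss-Seidel matrix (D - L)^{-1} L^T for a Laplacian-like matrix
# M = D - L - L^T, illustrating the Stein-Rosenberg ordering.
# M (4-cycle Laplacian + identity) is a hypothetical stand-in for -grad^2 g.

M = [[3, -1, 0, -1],
     [-1, 3, -1, 0],
     [0, -1, 3, -1],
     [-1, 0, -1, 3]]
n = len(M)

def jacobi_apply(v):
    # w = D^{-1} (L + L^T) v, where L + L^T is the negated off-diagonal of M
    return [sum(-M[i][j] * v[j] for j in range(n) if j != i) / M[i][i]
            for i in range(n)]

def gauss_seidel_apply(v):
    # Solve (D - L) x = L^T v by forward substitution.
    w = [sum(-M[i][j] * v[j] for j in range(i + 1, n)) for i in range(n)]
    x = [0.0] * n
    for i in range(n):
        x[i] = (w[i] + sum(-M[i][j] * x[j] for j in range(i))) / M[i][i]
    return x

def power_radius(apply_mat, iters=300):
    # Power iteration; valid here because both iteration matrices are
    # entrywise nonnegative, so the spectral radius is a Perron eigenvalue.
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = apply_mat(v)
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

rho_j = power_radius(jacobi_apply)
rho_gs = power_radius(gauss_seidel_apply)
assert abs(rho_j - 2.0 / 3.0) < 1e-9   # known value for this matrix
assert 0.0 < rho_gs < rho_j            # Stein-Rosenberg ordering
```
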

By increasing the value of the parameter q and the quantity of exchanged information, ρ can be further reduced for N(q) and becomes arbitrarily small for q large enough.

Constrained feasible sets and sequential implementations

The sequential implementations of the mapping G are now tested in constrained sets. Our problem is modified by considering, in each node i ∈ N, an additional inequality constraint of the type di(x) ≤ 0, where di(x) = ∑_{j∈Ni∪{i}} ζij(xj). We introduce an operator (·)+ which replaces the scalar components of a vector v = (v1, ..., vl) by their absolute values, i.e. (v)+ := (|v1|, ..., |vl|). A limitation of the total activity of each node i ∈ N can then be obtained by setting

  ζij(xj) = Aii(xi)+ − di   if j = i,
            −Aij(xj)+       if j ∈ Ni,   (6.6)

where di is a positive constant. The new problem can be stated as

  minimise_x   ∑_{i=1}^n fi(xi)
  subject to   ∑_{j∈Ni∪{i}} Aij xj − bi = 0,   i ∈ N,
               ∑_{j∈Ni∪{i}} ζij(xj) ≤ 0,   i ∈ N,   (6.7)

and is an instance of Problem 3.3 with ui = vi = 1, mi = 2, and Yi = R × R≥0 for i ∈ N.
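Under the choice (6.6), the constraint di(x) ≤ 0 simply caps the total absolute flow entering and leaving node i. A minimal sketch of this interpretation, on a hypothetical 3-node line network with illustrative flow values and caps:

```python
# Sketch: the node-activity constraint d_i(x) <= 0 built from (6.6) caps the
# total absolute flow entering and leaving node i.  The 3-node line graph,
# the flow values and the caps d_i below are hypothetical illustration data.

edges = [(0, 1), (1, 2)]           # (origin, destination)
flow = {(0, 1): 0.4, (1, 2): 0.7}  # one flow variable per edge
cap = {0: 1.0, 1: 1.0, 2: 1.0}     # the positive constants d_i

def activity(i):
    # Sum of the zeta_ij terms: |outgoing flows of i| + |incoming flows of i|
    out_part = sum(abs(f) for (a, b), f in flow.items() if a == i)
    in_part = sum(abs(f) for (a, b), f in flow.items() if b == i)
    return out_part + in_part

def d(i):
    return activity(i) - cap[i]    # d_i(x) <= 0: node i is within its budget

assert d(0) <= 0                   # node 0 carries 0.4 <= 1.0
assert d(1) > 0                    # node 1 relays 0.4 + 0.7 = 1.1 > 1.0
assert d(2) <= 0                   # node 2 receives 0.7 <= 1.0
```
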

The parameters {di}i∈N are chosen large enough so that the problem is well-conditioned and has one finite solution y ∈ Y. We randomly generate the network with 50 nodes and 100 links depicted in Fig. 6.2. Then we run the Gauss-Seidel implementation of G from various starting points chosen at random in Y, as well as a sequential arbitrary implementation mode where it is assumed that the time t between two local projected gradient descents in a subset Yi is independently and identically distributed for all i ∈ N with exponential distribution F(t) = 1 − exp(−t), and that the block-coordinate sequences (ρk) are generated accordingly. Both algorithms are successively tested with local Newton and diagonal scaling.

The assumption on {di}i∈N ensures that the nonnegativity constraints specifying Y are all active at y. By Proposition 4.7, local Newton scaling and diagonal scaling are thus equivalent in the reduced space at y, which is reached in finite time by all the algorithms in accordance with Proposition 4.6.

Table 6.1 displays the mean number k of gradient descents per node needed by the algorithms to uncover {Ai(y)}i∈N together with an estimate of the standard deviation for random starting points. Thanks to second-order scaling, the Gauss-Seidel implementations require only 1 to 3 cycles to identify these constraints, while 4 to 5 updates per node are needed on average by the random algorithms. Hence the effort-saving diagonal scaling strategy seems more appropriate in this particular example, although local Newton scaling sometimes proves significantly faster in more difficult problems.

The table also displays, for various precisions ϵ, the number of gradient descents per node observed before the norm of the projected gradient becomes less than ϵ. This quantity is given for the Gauss-Seidel algorithms by

(a) Topology (b) Optimal routing

Random generation of a network with 50 nodes and 100 edges depicted in Figure 6.2(a), where the source nodes are depicted in white and sink nodes in grey, with diameters directly proportional to the rate of incoming or outgoing information. Figure 6.2(b) displays the optimal information flows, where the thickness of an edge is directly proportional to the optimal rate of information transmitted over the edge.

Figure 6.2: Network with 50 nodes and 100 edges

Table 6.1: Convergence of the Gauss-Seidel and random algorithms

Statistics of k and τ(·) for the Gauss-Seidel mode with local Newton scaling (Sˇ) and diagonal scaling (S¨), and for the random mode with local Newton scaling (Aˇk) and diagonal scaling (A¨k) [estimated mean ± standard deviation].

       k         τ(10⁻¹)     τ(10⁻²)     τ(10⁻³)     τ(10⁻⁴)     τ(10⁻⁵)     τ(10⁻⁶)
Sˇ   2.2±0.4   26.3±3.7    44.4±4.1    62.7±4.6    81.0±5.1    99.5±5.4   117.9±5.9
S¨   3.0±0.5   24.2±3.4    42.4±3.6    61.0±3.6    79.4±3.8    98.1±3.9   117.2±4.3
Aˇk  4.3±1.2   48.7±5.2    86.6±5.4   122.9±5.9   161.0±5.6   199.0±6.9   238.2±8.1
A¨k  5.2±1.5   43.8±7.8    80.3±7.4   116.4±7.3   154.7±8.9   193.0±9.8   231.9±9.3

τ(η) = min{k̄ : ∥E(yk)∇g(yk)∥ < η, ∀k ≥ k̄}, where E is defined in (4.56).
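Note that τ(η) is determined by the last time the projected-gradient norm reaches η, not the first time it drops below it. A small sketch, using a hypothetical recorded sequence of norms:

```python
# Sketch: tau(eta) = min { k_bar : ||E(y_k) grad g(y_k)|| < eta for all
# k >= k_bar }, computed from a recorded sequence of projected-gradient
# norms.  The sample sequence below is hypothetical illustration data.

def tau(norms, eta):
    k_bar = 0
    for k, g in enumerate(norms):
        if g >= eta:
            k_bar = k + 1   # the norm was still >= eta at step k
    return k_bar

# A non-monotone sequence: the norm dips under 0.1 at k = 2 but bounces
# back above it at k = 3, so tau(0.1) is 4, not 2.
norms = [1.0, 0.5, 0.05, 0.2, 0.01, 0.001]
assert tau(norms, 0.1) == 4
assert tau(norms, 1e-2) == 5   # only the last entry is below 1e-2
```
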

The magnitude of the projected gradient vanishes in a seemingly linear fashion for all the algorithms. In this problem, we note that convergence is about twice as fast for the Gauss-Seidel mode as for the random mode.

6.1.2 Network lifetime-maximisation

In this section we address a more complex routing problem, similar to those treated in [ML06, BAR11, Bil12], where the aim is to optimise the lifetime, as previously defined in (2.14), of a network subject to constraints concerned with the capacity of the transmission channels and the maximum power consumption of the nodes.

We consider a connected network with n nodes distributed in a region.

The set of nodes is denoted by {1, ..., n} and the set of directed edges by E. Each node i has a limited battery supply bi, and either generates packets of information or collects the packets forwarded by other nodes. The

parameter si is the rate of information generated (si ≥ 0) or collected (si < 0) by node i. The set of neighbours Ni of a node i is defined as in Section 6.1.1 as the set of the nodes located at short distances from i. Direct transmissions of information are only possible between neighbours.

In the present setting, a nonnegative variable x̆, already introduced in Example 2.4, stands for an upper bound on the 'inverse lifetime' of the network. Our objective is thus to minimise x̆. Since x̆ is a global variable, we introduce the n auxiliary variables x̆1, ..., x̆n, where x̆i is a copy of x̆ assigned to node i together with the consistency constraints of the type x̆i = x̆j for any neighbouring node j ∈ Ni (cf. Example 3.9). We use nonnegative flow variables as in Example 2.3, which facilitate the introduction of the constraints. To each node i is assigned a vector x̿i containing the flow variables x⃗ij of all the edges (i, j) ∈ E starting from i, as well as the variables x⃗ji for all the edges (j, i) leading to i. The problem variables form a composite vector x = ((x̿1, x̆1), ..., (x̿n, x̆n)), where (x̿i, x̆i) contains all the nonnegative variables assigned to the sensor i ∈ N.

By definition of the network lifetime, the constraint pi(x̿i) ≤ bi x̆i should be introduced for every node i ∈ N, where pi(x̿i) is an estimate of the expected power consumption at i under the local flow policy x̿i. Reception power consumption is not considered, and it is assumed that the power consumption pi is given by the linear model

  pi(x̿i) = ∑_{j:(i,j)∈E} ē^i_{ij} x⃗ij + ∑_{j:(j,i)∈E} ē^i_{ji} x⃗ji,   ∀i ∈ N,   (6.8)

where, for every j ∈ Ni, ē^i_{ij} is the mean energy cost of the transmission of a packet from i to j.
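Evaluating the linear model (6.8) and the associated lifetime constraint is straightforward. A sketch for a single node, with hypothetical energy costs, flows, battery level and inverse-lifetime value:

```python
# Sketch of the linear power model (6.8) for one node i, and the lifetime
# constraint p_i <= b_i * inv_lifetime.  All numbers are hypothetical
# illustration data; node i is node 0 here.

# Mean transmission energy per packet on each edge touching node 0,
# keyed by (origin, destination), and the corresponding packet flows.
energy = {(0, 1): 2.0, (0, 2): 3.0, (1, 0): 1.5}
flow = {(0, 1): 0.2, (0, 2): 0.1, (1, 0): 0.4}

def power(i):
    # p_i: sum over outgoing edges plus sum over incoming edges, as in (6.8)
    return sum(energy[e] * flow[e] for e in flow if i in e)

b = 10.0            # battery supply of node 0
inv_lifetime = 0.2  # the local copy of the inverse-lifetime variable
p0 = power(0)
assert abs(p0 - (2.0 * 0.2 + 3.0 * 0.1 + 1.5 * 0.4)) < 1e-12
assert p0 <= b * inv_lifetime   # 1.3 <= 2.0: the lifetime bound holds
```
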

Since our intention is to apply dual distributed methods, we need to make sure that the dual function g enjoys attractive properties. In the presence of continuous constraints, we know from Corollary 3.1 that the compactness of the feasible set and the strict convexity of the cost function are sufficient conditions for the dual function to be differentiable on its domain.

Compactness of the feasible set X is ensured by adding the additional constraint 0 ≤ (x̿i, x̆i) ≤ x̄i at each node i ∈ N. Treating all the inequality constraints as side constraints, the feasible set reduces to the Cartesian product set X = ∏_{i∈N} Xi, where we define

  Xi = {(y̿, y̆) ∈ R^{2|Ni|} × R : 0 ≤ (y̿, y̆) ≤ x̄i, pi(y̿) − bi y̆ ≤ 0},   ∀i ∈ N,   (6.9)

and |Ni| denotes the number of neighbours of i. The positive vector x̄i has 2|Ni| + 1 components (i ∈ N). It specifies both limitations on the capacities of the communication channels of the sensor i and an arbitrarily large upper bound on the inverse-lifetime variable x̆i.

(a) Topology (b) Optimal routing

Figure 6.3: Random network with 25 nodes

Strict convexity of the cost function in all the problem variables is obtained by regularisation. We proceed as in (3.59) by adding a separable term quadratic in all the variables with a small positive coefficient ϵ. All in all, the lifetime-maximising routing problem is formulated as the quadratic program

  minimise_{x∈X}   ∑_{i∈N} x̆i² + ϵ∥x∥²
  subject to       Ai x̿i = si,   ∀i ∈ N,
                   x̆i = x̆j,   ∀i ∈ N, j ∈ Ni,   (6.10)

where Ai is the local incidence matrix at i, and X is defined in (6.9). Using the results of Section 3.4.3, it is straightforward to show that the dual function g obtained after dualisation of the equality constraints in (6.10) is everywhere differentiable and piecewise quadratic over the dual feasible set Y (cf. Example 3.6), which for this problem is the whole dual space. The dual problem can be solved in a distributed manner by the sensors using the gradient methods explored in Chapter 4. In the rest of this section we report basic numerical experiments showing both the necessity of using line-search routines for the step-sizes and the importance of scaling.

A collection of sensor networks are randomly generated¹, so that the networks are connected and distributed uniformly in a circular region. Figure 6.3(a) depicts the topology and connectivity of an instance of these networks with 25 nodes and 104 edges. It is assumed that bi is equal for all the nodes and that each si is chosen randomly and uniformly in the interval [−1, 1], except in one node j where sj = −∑_{k≠j} sk. We make the assumption of low traffic conditions by assigning large values to the channel capacities {x̄i}i∈N, and use, as a path loss model for the transmissions, the d⁻⁴ power law suggested in Section 2.1.3. Figure 6.3(b) depicts the flow policy solving (6.10), where the segment sections are proportional to the corresponding packet flows.

¹Since the effects of scaling and line-search are clearly noticeable on small problems with only hundreds of variables, and increasing the complexity of the problem tends to prohibitively slow down the convergence of the steepest gradient method, the upcoming tests are restricted to small networks.


Table 6.2: Convergence of the gradient projection and the Gauss-Seidel algorithms

Statistics of τ(·) and φ(·) for the gradient projection algorithm with decreasing step-size (GP), the Gauss-Seidel mode with unit scaling (S˙), and the Gauss-Seidel mode with local Newton scaling (Sˇ) [estimated mean ± standard deviation].

      τ(10⁻¹)          τ(10⁻²)           τ(10⁻³)          τ(10⁻⁴)
GP   (1.6±1.4)×10³   (10.4±9.3)×10³   (6.3±7.4)×10⁴   (1.4±1.4)×10⁵
S˙    7.3±1.1         15.8±3.0         32.1±9.2        48.9±16.7
Sˇ    6.7±1.3          9.6±1.5         11.7±2.0        13.4±2.5

      φ(10⁻¹)          φ(10⁻³)           φ(10⁻⁵)          φ(10⁻⁷)
GP   (7.1±4.6)×10⁴   (4.6±3.2)×10⁵    (2.9±3.2)×10⁶   (7.1±9.5)×10⁶
S˙   (7.8±3.2)×10²   (18.0±7.5)×10²   (3.4±1.2)×10³   (5.4±1.9)×10³
Sˇ   (6.6±3.0)×10²   (9.0±4.6)×10²    (10.2±4.8)×10²  (11.2±4.9)×10²

The dual of (6.10) is solved for all the network samples by successively applying the three following distributed optimisation methods: the gradient projection algorithm (3.80) with identity matrix scaling (steepest descent) and with decreasing step-size law ak = 0.1 k^{−0.5} (GP), the Gauss-Seidel implementation of G with identity scaling (S˙), and the Gauss-Seidel implementation of G with local Newton scaling (Sˇ).
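The gap between GP and the scaled methods is easy to reproduce in miniature. The sketch below minimises a hypothetical one-dimensional quadratic, once with the decreasing step-size law ak = 0.1 k^{−0.5} and once with a Newton-scaled step, counting the gradient steps needed to reach a 10⁻⁶ gradient norm (the model problem is illustrative, not the dual of (6.10)):

```python
# Sketch: minimise f(x) = 0.5 * c * x^2 with (a) the decreasing step-size
# law a_k = 0.1 * k^{-0.5} and (b) a Newton-scaled step a_k = 1/c, counting
# gradient steps until |f'(x)| < 1e-6.  The quadratic is a hypothetical
# stand-in for the dual problem, chosen so the effect of scaling is visible.

c = 1.0  # curvature of the model problem

def iterations(step_size, x0=1.0, tol=1e-6, max_iter=100_000):
    x = x0
    for k in range(1, max_iter + 1):
        grad = c * x
        if abs(grad) < tol:
            return k - 1          # number of gradient steps performed
        x = x - step_size(k) * grad
    return max_iter

slow = iterations(lambda k: 0.1 * k ** -0.5)   # GP-style decreasing law
fast = iterations(lambda k: 1.0 / c)           # Newton scaling

assert fast == 1                   # one scaled step solves the quadratic
assert slow > 1000                 # thousands of steps for the same accuracy
assert slow < 100_000              # ...though it does converge eventually
```
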

Since the primal of (6.10) only explicitly displays equality constraints, the gradient of the dual function is expected to vanish at dual solutions, and the function τ defined in the previous section may be used to analyse the speed of convergence of the algorithms. A second function φ is introduced to characterise the quantity of information exchanged by the sensors during the optimisation process. Given an optimisation algorithm and a positive scalar η > 0, φ(η) is defined as an estimate of the average number of messages each sensor exchanges by message passing up to time τ(η) during a run of the algorithm. Table 6.2 displays, for various precisions η, the expectation and standard deviation of τ and φ for the three methods as estimated from the random experiments. We see that the number of iterations and information transfers needed by the steepest gradient method is highly prohibitive even for small problems. Convergence is, however, much faster when line-search routines are implemented. According to the table, the convergence times and numbers of data transfers for this problem are further reduced when line-search is combined with second-order scaling.

In view of these experiments, it appears that scaling and line-search accelerate the convergence of gradient methods by several orders of magnitude. The implementation of such techniques is thus needed in practice. In the network depicted in Figure 6.3, for instance, Sˇ identified the active constraints

at the optimum after only 7 iterations and solved the problem in a few tens of node updates, while hundreds of thousands of iterations were needed by the steepest gradient algorithm, which is clearly unmanageable in practical problems.