Projected Push-Sum Gradient Descent-Ascent for Convex Optimization with Application to Economic Dispatch Problems


Accepted for publication at the 2020 59th Conference on Decision and Control (CDC), December 14-18, 2020, Jeju Island, Republic of Korea

DOI:10.1109/CDC42340.2020.9304360


Jan Zimmermann, Tatiana Tatarenko, Volker Willert, Jürgen Adamy

Abstract. We propose a novel algorithm for solving convex, constrained, distributed optimization problems defined on multi-agent networks, where each agent has exclusive access to a part of the global objective function. The agents can exchange information over a directed, weighted communication graph, which can be represented as a column-stochastic matrix. The algorithm combines an adjusted push-sum consensus protocol for information diffusion with a gradient descent-ascent on the local cost functions, providing convergence to the optimum of their sum. We provide results on a reformulation of the push-sum into single matrix updates and prove convergence of the proposed algorithm to an optimal solution under standard assumptions in distributed optimization. The algorithm is applied to a distributed economic dispatch problem, in which the constraints can be expressed in local and global subsets.

I. INTRODUCTION

We consider constrained optimization problems that are distributed over multi-agent networks. In such scenarios, each agent has a local cost function that is known only to that agent. The overall goal of the network is to minimize the sum of all local functions, while the exact form of the latter should remain private. This type of problem is known as social welfare optimization. Depending on the application, the decision variables are often subject to a variety of constraints that need to be considered in the optimization process. For many such constrained problems, one can distinguish between global constraints, which affect all agents in the system, and local constraints, which are only relevant to a single agent. An example application is the distributed economic dispatch problem (DEDP), where each agent represents a generator with a distinct cost function. The goal of the DEDP is to minimize the overall cost of producing power while matching the demand and keeping the production within the generators' limits. In such problems, the balancing constraint is globally defined, as it constrains the power production of all generators, whereas the limits of each generator should remain private and therefore local. Depending on the choice of cost function, the resulting problem is either convex or non-convex.

We employ first-order gradient methods as the core of the optimization strategy. Much recent work has been dedicated to optimization methods that use gradient tracking instead of the gradient at a single point in time. These methods, published for example in [7], [9] and [12], have the advantage that a constant step-size can be used for the gradient update, whereas first-order methods usually require a diminishing step-size sequence for convergence. However, constraints have not yet been considered in gradient tracking. On the other hand, several publications have already focused on first-order methods that are able to respect constraints. A projection-free method that also uses the push-sum algorithm for spreading information was published in [13] and further analyzed in [17]. In this method, the constraints are incorporated into the objective by a penalty function. One property of this approach is that it considers all constraints to be local, which makes the method applicable to a wide range of optimization problems, including the DEDP. However, in addition to the step-size sequence, a second sequence for a penalty parameter needs to be determined, and choosing these dependent sequences optimally has proved to be non-trivial [17].

The authors are with the Control Methods and Robotics Lab at TU Darmstadt, Germany.

This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) within the SPP 1984 “Hybrid and multimodal energy systems: System theoretical methods for the transformation and operation of complex networks”.

One of the first projection-based, distributed gradient methods was published in [8]. However, this method relies on doubly stochastic matrix updates, which restrict the communication to undirected graphs. The contributions in [4] and [15] rely on row-stochastic communication matrices for diffusing the projected gradient information. Finally, [14] employs the push-sum consensus combined with a projection that uses a convex proximal function. With respect to the specific structure of the DEDP under consideration, the major drawback of these projection-based methods is the assumption that all constraints of the distributed optimization problem are known by every agent and are therefore global. This restricts the privacy of agents with local constraints. Compared to the projection-free method in [13], however, they have the advantage that no penalty parameter sequence needs to be chosen.

Our work seeks to combine the reduced number of parameters of the projection methods with the ability to respect local constraints, while assuming directed communication architectures. This is achieved by explicitly distinguishing between local and global constraints. Similar to the approaches in [5] and [13], we employ the push-sum average consensus for information diffusion. However, by formulating the push-sum algorithm as a single row-stochastic matrix update for easier analysis, the basic structure of the algorithm is closer to the ones in [4] or [15], which also use row-stochastic updates but do not rely on the push-sum consensus. For convergence to the optimum, we propose a novel distributed gradient descent-ascent algorithm, which uses a projection to incorporate global constraints, while respecting local constraints by adding them to the local cost functions via Lagrange multipliers.

Within this article, we make the following contributions: First, we provide a reformulation of the unconstrained push-sum consensus, which differs from the one published in, e.g., [5], and derive convergence properties of the update matrix. Second, we prove convergence of the distributed gradient descent-ascent method to an optimal solution of the problem while respecting the privacy of the local constraints. Finally, we show that the proposed algorithm is applicable to the DEDP. The paper is structured as follows: In Section II we introduce our notation for formulas and graphs. The main part begins in Section III with the formulation of the problem class and the results on the reformulation of the push-sum, before our algorithm for solving the defined problems is proposed. Section IV provides the convergence proof of the proposed method. In Section V we support the theoretical results with a simulation of a basic DEDP, before we summarize and conclude in Section VI.

II. NOTATION AND GRAPHS

Throughout the paper we use the following notation: The set of non-negative integers is denoted by $\mathbb{Z}_+$. All time indices $t$ in this work belong to this set. To differentiate between multi-dimensional vectors and scalars we use boldface. $\|\cdot\|$ denotes the standard Euclidean norm, and $\mathbf{1}$ denotes the vector of all ones. The set $\{1, \dots, n\}$ is denoted by $[n]$. The element $ij$ of a matrix $M$ is denoted by $M_{ij}$. The operator $\Pi_{\mathcal{X}}(u)$ denotes the projection of $u$ onto the convex set $\mathcal{X}$, such that $\Pi_{\mathcal{X}}(u) = \arg\min_{x \in \mathcal{X}} \|x - u\|$. Instances of a variable $x$ at time $t$ are denoted by $x[t]$. The directed graph $G = \{V, E\}$ consists of a set of vertices $V$ and a set of edges $E \subseteq V \times V$. Vertex $j$ can send information to vertex $i$ if $(j, i) \in E$. The communication channels can be described by the Perron matrix $P$, where $P_{ij} > 0$ if $(j, i) \in E$ and $P_{ij} = 0$ otherwise. This notation includes self-loops, such that $P_{ii} > 0$. The set of in-neighbors of node $i$ is $N_i^+$, while $N_i^-$ is the set of its out-neighbors. The out-degree of node $i$ is denoted by $d_i = |N_i^-|$. We say that a directed graph is strongly connected if there exists a path from every vertex to every other vertex.

III. PROJECTED PUSH-SUM GRADIENT DESCENT-ASCENT

A. Problem formulation and matrix update of the push-sum consensus

We consider optimization problems of the form

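$$\min_{x} F(x) = \min_{x} \sum_{i=1}^{n} F_i(x), \tag{1a}$$
$$\text{s.t.}\quad g_i(x) \le 0, \quad g_i(x) = \left(g_{i1}(x), \dots, g_{im}(x)\right)^T, \tag{1b}$$
$$x \in \mathcal{X} \subset \mathbb{R}^d, \tag{1c}$$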

where the functions $F_i : \mathbb{R}^d \to \mathbb{R}$ and $g_{ij} : \mathbb{R}^d \to \mathbb{R}$ are differentiable for $i \in [n]$ and $j \in [m]$.¹ While we consider $g_i(x)$ to be the local constraint functions of agent $i$, the constraint set $\mathcal{X}$ is assumed to be global and therefore known by every agent.

¹ For ease of notation, we assume here that all agents have $m$ local constraints. However, the following considerations hold for arbitrary, yet finite, numbers of local constraints that can differ between the agents.

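After defining $\mathcal{M}_i^0 = \{\mu_i \in \mathbb{R}^m \mid \mu_i \ge 0\}$ and $\mu = (\mu_i^T)_{i=1}^n$, the dual function of the problem takes the form
$$q(\mu) = \inf_{x \in \mathcal{X}} \sum_{i=1}^n \left( F_i(x) + \mu_i^T g_i(x) \right), \tag{2}$$

with which the dual problem

$$\max_{\mu}\ q(\mu), \tag{3a}$$
$$\text{s.t.}\quad \mu \in \mathcal{M}^0, \tag{3b}$$
with $\mathcal{M}^0 = \{\mu \in \mathbb{R}^{n \times m} \mid \mu_i \in \mathcal{M}_i^0,\ i \in [n]\}$, can be defined. Accordingly, the global Lagrangian is the sum of the local Lagrangians

$$L(x, \mu) = \sum_{i=1}^n L_i(x, \mu_i) = \sum_{i=1}^n \left( F_i(x) + \mu_i^T g_i(x) \right). \tag{4}$$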

Before we continue with the analysis of the push-sum consensus for information diffusion, we make the following assumptions regarding the above problem:

Assumption 1. $F$ is strongly convex on $\mathcal{X}$ and $g_i$ is convex for $i \in [n]$. The optimal value $F^*$ is finite and there is no duality gap, namely $F^* = q^*$. There exist compact sets $\mathcal{M}_i \subset \mathcal{M}_i^0$, $i \in [n]$, containing the dual optimal vectors $\mu_i^*$, $i \in [n]$.

Assumption 2. $\mathcal{X} \subset \mathbb{R}^d$ is convex and compact.

Remark 1. Given the convexity properties of the problem and Slater's constraint qualification, strong duality holds, which implies that the duality gap is zero. Furthermore, the dual optimal vectors $\mu_i$ are then uniformly bounded in norm (see [2]). Hence, Assumption 1 holds, for example, if Slater's condition holds, $F$ and $g_i$, $i \in [n]$, are continuous, and the domain $\mathcal{X}$ is compact.

Assumption 3. The gradients $\nabla_x L_i(x, \mu_i)$ and $\nabla_{\mu_i} L_i(x, \mu_i)$ exist and are uniformly bounded on $\mathcal{X}$ and $\mathcal{M}_i$, i.e., there exist $L_x < \infty$ and $L_{\mu_i} < \infty$ such that $\|\nabla_x L_i(x, \mu_i)\| \le L_x$ and $\|\nabla_{\mu_i} L_i(x, \mu_i)\| \le L_{\mu_i}$ for all $x \in \mathcal{X}$, $\mu_i \in \mathcal{M}_i$.

Remark 2. Note that this means that, for either fixed $\mu_i$ or fixed $x$, the Lagrangian $L_i(x, \mu_i)$ is Lipschitz-continuous with constant $L_x$ or $L_{\mu_i}$, respectively.

Assumption 4. The graph $G = \{V, E\}$ is fixed and strongly connected. The associated Perron matrix $P$ is column-stochastic.

Remark 3. If, for example, each agent $j$ weights its outgoing messages by $1/d_j$, with $d_j$ representing the out-degree of $j$ (including its self-loop), the resulting communication matrix contains the elements $P_{ij} = 1/d_j$ for $i \in N_j^-$, which achieves column-stochasticity of $P$.
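The weighting rule of Remark 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical four-agent digraph given as out-neighbor lists that include each agent's self-loop; the function name `column_stochastic_perron` is our own. Exact fractions are used so that the column sums can be checked without rounding.

```python
from fractions import Fraction


def column_stochastic_perron(out_neighbors):
    """Build P with P[i][j] = 1/d_j for every out-neighbor i of agent j.

    out_neighbors[j] lists the agents that j sends to, including j itself
    (self-loop), so d_j = len(out_neighbors[j]) is the out-degree of j.
    """
    n = len(out_neighbors)
    P = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in enumerate(out_neighbors):
        if j not in targets:
            raise ValueError(f"agent {j} must include its self-loop")
        d_j = len(targets)
        for i in targets:
            P[i][j] = Fraction(1, d_j)  # agent j splits its message evenly
    return P


# Hypothetical strongly connected digraph: ring 0->1->2->3->0 plus chord 0->2.
out_nb = [[0, 1, 2], [1, 2], [2, 3], [3, 0]]
P = column_stochastic_perron(out_nb)
col_sums = [sum(P[i][j] for i in range(4)) for j in range(4)]
row_sums = [sum(row) for row in P]
print("column sums:", col_sums)
print("row sums:   ", row_sums)
```

Every column sums to one, whereas the row sums generally differ from one (row 2 sums to 4/3 in this example), which is why push-sum needs the additional weights $y$ to recover an unbiased average.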


Recall the push-sum consensus protocol from [1], [3]:
$$y[t+1] = P\,y[t], \tag{5a}$$
$$x[t+1] = P\,x[t], \tag{5b}$$
$$z_i[t+1] = \frac{x_i[t+1]}{y_i[t+1]}, \tag{5c}$$

with initial states $x[0] = z[0] = x_0$ and $y[0] = \mathbf{1}$. The agent-wise update of $z$ can be rewritten as a matrix update, as done for example in [6]. For that, define the time-dependent matrix $Q[t]$ such that

$$Q(y[t+1], y[t]) = Q[t] = \operatorname{diag}(y[t+1])^{-1} P \operatorname{diag}(y[t]). \tag{6}$$
Thereby, we can merge the update equations of $x$ and $z$ into
$$z[t+1] = Q[t]\,z[t]. \tag{7}$$
Some important properties of $Q[t]$ can now be proven, which hold independently of the values of $y[t]$ at different time instances $t$. They are summarized in the following lemma. But first, we introduce the matrix

$$\Phi(t, s) = Q[t]\,Q[t-1] \cdots Q[s+1]\,Q[s], \quad t > s, \tag{8}$$
with $\Phi(t, t) = Q[t]$, for easier notation.
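To see that (6) and (7) reproduce the agent-wise push-sum (5), the following pure-Python sketch runs both forms side by side on a hypothetical column-stochastic matrix (weights $1/d_j$ of a small four-agent digraph). At every step it checks that $Q[t]$ is row-stochastic and that $Q[t]z[t]$ matches $x_i[t+1]/y_i[t+1]$. The helper names are illustrative only.

```python
def matvec(M, v):
    """Return the matrix-vector product M v for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def push_sum(P, x0, steps):
    """Run push-sum (5) and its matrix form (6)-(7) in parallel."""
    n = len(x0)
    x, y, z = list(x0), [1.0] * n, list(x0)  # x[0] = z[0] = x0, y[0] = 1
    for _ in range(steps):
        y_next = matvec(P, y)                                  # (5a)
        # Q[t] = diag(y[t+1])^{-1} P diag(y[t]), equation (6)
        Q = [[P[i][j] * y[j] / y_next[i] for j in range(n)] for i in range(n)]
        assert all(abs(sum(row) - 1.0) < 1e-12 for row in Q)  # row-stochastic
        z_matrix = matvec(Q, z)                                # (7)
        x = matvec(P, x)                                       # (5b)
        z = [x[i] / y_next[i] for i in range(n)]               # (5c)
        assert max(abs(a - b) for a, b in zip(z, z_matrix)) < 1e-9
        y = y_next
    return z


# Column-stochastic example matrix (weights 1/d_j of a hypothetical digraph).
P = [
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
    [1 / 3, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
z_final = push_sum(P, x0=[1.0, 2.0, 3.0, 10.0], steps=200)
print(z_final)  # every entry approaches the average 4.0
```

Because $P$ is column-stochastic, $\sum_i x_i[t]$ and $\sum_i y_i[t] = n$ are preserved, so the ratios converge to the average of $x_0$ even though $P$ is not row-stochastic.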

Lemma 1. Given Assumption 4, the time-dependent communication matrix $Q[t]$ of (6) and the matrix product $\Phi(t, s)$ defined in (8) have the following properties:

a) $Q[t]$ and $\Phi(t, s)$ are row-stochastic for $0 \le s \le t$ and all $t$.

b) $\lim_{t\to\infty} \Phi(t, 0) = \frac{1}{n}\mathbf{1}\mathbf{1}^T$.

c) $\lim_{t\to\infty} \Phi(t, s) = \frac{1}{n}\mathbf{1}\,y[s]^T$ for finite $0 \le s < t$.

Proof. Part a):

According to equation (6), we can write

$$\Phi(t, s) = \operatorname{diag}(y[t+1])^{-1} P^{t+1-s} \operatorname{diag}(y[s]).$$
For $s = 0$ it holds that

$$\Phi(t, 0)\mathbf{1} = \operatorname{diag}(y[t+1])^{-1} P^{t+1} I\,\mathbf{1} = \mathbf{1},$$

because, with $y[t+1] = P^{t+1}\mathbf{1}$, each entry of $y[t+1]$ equals the sum over the respective row of $P^{t+1}$. Therefore, $\operatorname{diag}(y[t+1])^{-1}$ normalizes the rows of $P^{t+1}$, such that the above holds. As this is true for arbitrary $t$, we can factor out $Q[t]$ and show row-stochasticity of every $Q[t]$:

$$\mathbf{1} = \Phi(t, 0)\mathbf{1} = Q[t]\,\Phi(t-1, 0)\mathbf{1} = Q[t]\,\mathbf{1}.$$
Consequently, we also have for $0 \le s \le t$

$$\Phi(t, s)\mathbf{1} = Q[t]\,Q[t-1] \cdots Q[s]\,\mathbf{1} = \mathbf{1}.$$

Part b):

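From the Perron-Frobenius theorem [11], we know that, for a column-stochastic matrix $P$ that is primitive (which holds here, since $G$ is strongly connected and $P_{ii} > 0$), the limit

$$\lim_{t\to\infty} y[t+1] = \lim_{t\to\infty} P^{t+1}\mathbf{1} = w\,\mathbf{1}^T\mathbf{1} = n\,w$$

holds, with $w$ being the right eigenvector of $P$ for the eigenvalue $\lambda = 1$, normalized such that $\mathbf{1}^T w = 1$. Therefore,
$$\lim_{t\to\infty} \Phi(t, 0) = \frac{1}{n}\operatorname{diag}(w)^{-1} w\,\mathbf{1}^T = \frac{1}{n}\mathbf{1}\mathbf{1}^T,$$
which is a doubly stochastic matrix.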

Part c):

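Using the results from b), we can write for finite $s < t$
$$\lim_{t\to\infty} \Phi(t, s) = \frac{1}{n}\mathbf{1}\mathbf{1}^T \operatorname{diag}(y[s]) = \frac{1}{n}\mathbf{1}\,y[s]^T.$$
Using the column-stochasticity of $P$, we have

$$\frac{1}{n}\mathbf{1}\,y[s]^T\mathbf{1} = \frac{1}{n}\mathbf{1}\left(\mathbf{1}^T P^s \mathbf{1}\right) = \frac{1}{n}\mathbf{1}\,\mathbf{1}^T\mathbf{1} = \mathbf{1},$$
which shows row-stochasticity.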
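Lemma 1 b) and c) can also be checked numerically for one example. The sketch below, again on a hypothetical four-agent column-stochastic matrix, forms $\Phi(t,s)$ as the product $Q[t]\cdots Q[s]$ from (8) and compares it with $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ and $\frac{1}{n}\mathbf{1}\,y[s]^T$. For this particular $P$ the normalized Perron vector is $w = (3, 2, 4, 4)/13$, so $y[t]$ should approach $n\,w$. The check illustrates the lemma; it does not replace the proof.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def phi(P, t, s):
    """Return Phi(t, s) = Q[t] Q[t-1] ... Q[s] from (8) and the list y[0..t+1]."""
    n = len(P)
    ys = [[1.0] * n]                      # ys[k] holds y[k], starting from y[0] = 1
    for _ in range(t + 1):
        ys.append(matvec(P, ys[-1]))
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for k in range(s, t + 1):             # left-multiply Q[s], Q[s+1], ..., Q[t]
        Q_k = [[P[i][j] * ys[k][j] / ys[k + 1][i] for j in range(n)] for i in range(n)]
        result = matmul(Q_k, result)
    return result, ys


P = [
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
    [1 / 3, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
n, t = 4, 80
Phi_t0, ys = phi(P, t, 0)
Phi_t3, _ = phi(P, t, 3)
err_b = max(abs(Phi_t0[i][j] - 1 / n) for i in range(n) for j in range(n))
err_c = max(abs(Phi_t3[i][j] - ys[3][j] / n) for i in range(n) for j in range(n))
print(err_b, err_c)   # both are tiny for large t
print(ys[-1])         # close to n * w = [12/13, 8/13, 16/13, 16/13]
```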

The following Lemma provides us with bounds on the matrix updates.

Lemma 2. Let Assumption 4 hold, and let $Q[t]$ be defined according to (6) and $\Phi(t, s)$ as in (8). Then, there exist constants $C > 0$ and $\lambda \in (0, 1)$ that satisfy the following inequalities for $i, j \in [n]$, $0 \le s \le t$ and all $t$:

a) $$\left| \Phi(t, 0)_{ij} - \frac{1}{n}\sum_{k=1}^n \Phi(t, 0)_{kj} \right| \le C\lambda^t, \tag{9}$$
b) $$\left| \Phi(t, s)_{ij} - \frac{1}{n}\sum_{k=1}^n \Phi(t, s)_{kj} \right| \le C\lambda^{t-s}. \tag{10}$$

Proof. Part a):

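We add $-1/n + 1/n$ to the term on the left-hand side of (9) and apply the triangle inequality, which results in
$$\left| \Phi(t, 0)_{ij} - \frac{1}{n} \right| + \left| \frac{1}{n}\sum_{k=1}^n \Phi(t, 0)_{kj} - \frac{1}{n} \right|.$$

By Lemma 1 b), $\Phi(t, 0)_{ij} \to 1/n$, so the first term converges to zero. Since each column of $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ sums to 1, the column average $\frac{1}{n}\sum_{k=1}^n \Phi(t, 0)_{kj}$ converges to $1/n$, so the second term converges to zero as well. As $P$ is primitive, $P^t$ approaches $w\mathbf{1}^T$ geometrically fast, so this convergence is geometric. Therefore, we can bound the above expression by $C\lambda^t$ with $C > 0$ and $\lambda \in (0, 1)$, which is a standard procedure for products of row-stochastic, non-negative matrices, see for example Proposition 1 in [5].

Part b):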

Following the same line of thought as in a), we add $+\frac{1}{n}y_j[s] - \frac{1}{n}y_j[s]$ to the left-hand side of (10) and obtain
$$\left| \Phi(t, s)_{ij} - \frac{1}{n}y_j[s] \right| + \left| \frac{1}{n}\sum_{k=1}^n \Phi(t, s)_{kj} - \frac{1}{n}y_j[s] \right|.$$

Again, Lemma 1 c) shows convergence of the first term to zero for finite $s < t$. Summing the entries of each column of $\frac{1}{n}\mathbf{1}\,y[s]^T$ yields the vector $y[s]^T$, and therefore the second term converges to zero as well. Note that this limit argument requires $t - s \to \infty$ and does not cover small gaps such as $s = t$; since the left-hand side of (10) is at most 1, these cases are covered by choosing $C$ sufficiently large. Thereby, we can bound the above by $C\lambda^{t-s}$ with $C > 0$ and $\lambda \in (0, 1)$, as done in the proposition cited in a).


B. Projected push-sum gradient descent-ascent

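We propose the following agent-wise update equations for solving problem (1):
$$y_i[t+1] = \sum_{j=1}^n P_{ij}\,y_j[t], \tag{11a}$$
$$z_i[t] = \frac{1}{y_i[t+1]}\sum_{j=1}^n P_{ij}\,y_j[t]\,x_j[t], \tag{11b}$$
$$x_i[t+1] = \Pi_{\mathcal{X}}\left( z_i[t] - \alpha_t \frac{\nabla_x L_i(z_i[t], \mu_i[t])}{y_i[t+1]} \right), \tag{11c}$$
$$\mu_i[t+1] = \Pi_{\mathcal{M}_i}\left( \mu_i[t] + \alpha_t \nabla_{\mu_i} L_i(z_i[t], \mu_i[t]) \right). \tag{11d}$$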

The algorithm above is based on the idea of the push-sum consensus protocol in (5), combined with the descent-ascent procedure to update the local optimization variable xi and dual variable µi for each agent i ∈ [n]. Moreover, note that the dual variables are projected on the local sets Mi, which are defined in Assumption 1. We refer the reader to [16] for possible strategies each agent can use to define its own Mi locally.
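A compact sketch of one possible implementation of (11a)-(11d) is given below for a hypothetical scalar toy problem that is not taken from this paper: $F_i(x) = (x - c_i)^2$, local constraints $g_i(x) = x - u_i \le 0$, global set $\mathcal{X} = [-10, 10]$, local dual sets $\mathcal{M}_i = [0, 20]$, and $\alpha_t = 0.5/(t+1)^{0.6}$, which satisfies Assumption 5. With $c = (1, 2, 3, 10)$ and $u = (5, 5, 5, 3)$, only the fourth agent's constraint is active, so $x^* = 3$ and $\mu^* = (0, 0, 0, 8)$. In each step, agent $i$ only uses $P_{ij}$, $y_j$ and $x_j$ of its in-neighbors together with its private $c_i$ and $u_i$.

```python
def projected_push_sum_gda(P, c, u, x_box, mu_max, steps, alpha0=0.5, gamma=0.6):
    """Sketch of (11a)-(11d) for the toy problem F_i(x) = (x - c_i)^2,
    g_i(x) = x - u_i <= 0, X = [x_box[0], x_box[1]], M_i = [0, mu_max]."""
    n = len(c)
    lo, hi = x_box
    y = [1.0] * n
    x = [0.0] * n          # initial estimates, assumed to lie in X
    mu = [0.0] * n
    for t in range(steps):
        alpha = alpha0 / (t + 1) ** gamma                                   # step size
        y_next = [sum(P[i][j] * y[j] for j in range(n)) for i in range(n)]  # (11a)
        z = [sum(P[i][j] * y[j] * x[j] for j in range(n)) / y_next[i]      # (11b)
             for i in range(n)]
        for i in range(n):
            grad_x = 2.0 * (z[i] - c[i]) + mu[i]      # gradient of L_i w.r.t. x
            grad_mu = z[i] - u[i]                     # gradient of L_i w.r.t. mu_i
            x[i] = min(max(z[i] - alpha * grad_x / y_next[i], lo), hi)   # (11c)
            mu[i] = min(max(mu[i] + alpha * grad_mu, 0.0), mu_max)      # (11d)
        y = y_next
    return x, mu


P = [
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
    [1 / 3, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
x, mu = projected_push_sum_gda(P, c=[1.0, 2.0, 3.0, 10.0], u=[5.0, 5.0, 5.0, 3.0],
                               x_box=(-10.0, 10.0), mu_max=20.0, steps=20000)
print(x, mu)  # x_i should be near 3 and mu near (0, 0, 0, 8)
```

In the scalar case both projections reduce to clipping; for vector-valued $x$ the call to `min(max(...))` would be replaced by the projection onto the respective global set.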

The zi- and yi-update equations can be written in the more concise matrix notation

$$y[t+1] = P\,y[t], \qquad z[t] = Q[t]\,x[t],$$

as $P_{ij}\,y_j[t]/y_i[t+1]$ are the elements $ij$ of the matrix $Q[t]$. Recall that, by Assumption 4, the Perron matrix $P$ is column-stochastic and $P_{ij} = 0$ if agent $j$ has no communication link to $i$. The local gradients $\nabla_x L_i(z_i[t], \mu_i[t])$ of each agent are locally weighted with $1/y_i[t+1]$. Note that $y_i[t] > 0$ for $i \in [n]$ and all $t$, resulting from the initialization $y_i[0] = 1$, $i \in [n]$, and the update by a column-stochastic, non-negative matrix.

We reformulate the above procedure for easier analysis. For that, we define the local disturbance terms

$$\epsilon_i^x[t] = \Pi_{\mathcal{X}}\left( z_i[t] - \alpha_t \frac{\nabla_x L_i(z_i[t], \mu_i[t])}{y_i[t+1]} \right) - z_i[t], \qquad \epsilon_i^{\mu}[t] = \Pi_{\mathcal{M}_i}\left( \mu_i[t] + \alpha_t \nabla_{\mu_i} L_i(z_i[t], \mu_i[t]) \right) - \mu_i[t]$$

and express equations (11c) and (11d) by

$$x_i[t+1] = z_i[t] + \epsilon_i^x[t], \tag{12a}$$
$$\mu_i[t+1] = \mu_i[t] + \epsilon_i^{\mu}[t]. \tag{12b}$$

IV. CONVERGENCE OF PROPOSED ALGORITHM

In what follows, we show convergence of the proposed algorithm in (11) to an optimal primal dual pair of the distributed problem. For that, we first make a standard assumption in distributed optimization regarding the step-size of the distributed gradient descent and ascent:

Assumption 5. The non-increasing, positive step-size sequence $\alpha_t$ has the properties:

a) $\lim_{t\to\infty} \alpha_t = 0$, b) $\sum_{t=0}^\infty \alpha_t = \infty$, c) $\sum_{t=0}^\infty \alpha_t^2 < \infty$.

For example, this assumption holds for step-sizes of the form $\alpha_t = c/t^\gamma$, $\forall t \ge 1$, with $c > 0$ and $\gamma \in (0.5, 1]$.
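For the example family $\alpha_t = c/t^\gamma$, properties b) and c) can be verified with the integral test; the following short derivation is a standard argument added for illustration, using that $s \mapsto s^{-\gamma}$ is decreasing:

$$\sum_{t=1}^{T}\frac{c}{t^{\gamma}} \;\ge\; c\int_{1}^{T+1}s^{-\gamma}\,ds = \begin{cases}\dfrac{c}{1-\gamma}\left((T+1)^{1-\gamma}-1\right), & \gamma<1,\\[4pt] c\,\ln(T+1), & \gamma=1,\end{cases} \;\longrightarrow\; \infty \quad (T \to \infty),$$

$$\sum_{t=1}^{\infty}\frac{c^{2}}{t^{2\gamma}} \;\le\; c^{2}\left(1+\int_{1}^{\infty}s^{-2\gamma}\,ds\right) = c^{2}\,\frac{2\gamma}{2\gamma-1} \;<\;\infty \quad \text{for } \gamma>0.5.$$

For the step size used in Section V, $c = 15$ and $\gamma = 0.6$, this gives $\sum_t \alpha_t^2 \le 225 \cdot 6 = 1350$, while $\sum_{t \le T} \alpha_t$ grows at least like $37.5\left((T+1)^{0.4} - 1\right)$.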

It is possible to bound the norms of the disturbances $\epsilon_i^x[t]$ and $\epsilon_i^{\mu}[t]$, defined in the reformulations (12a) and (12b), using the non-expansiveness of the projection operator and the fact that $z[t] = Q[t]x[t]$, obtained with the row-stochastic matrix $Q[t]$, lies inside the convex constraint set $\mathcal{X}$. This is the case because all $x_i[t]$ were projected onto this constraint set in the previous time step and every $z_i[t]$ lies inside the convex hull spanned by $x[t]$. Therefore, $z_i[t]$ and $\mu_i[t]$ lie inside the sets $\mathcal{X}$ and $\mathcal{M}_i$, respectively, at every time step. A similar approach can be found in [15]. This allows us to exploit Assumption 3 on the boundedness of the Lagrangian gradients as follows:

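$$\|\epsilon_i^x[t]\| \le \alpha_t \left\| \frac{\nabla_x L_i(z_i[t], \mu_i[t])}{y_i[t+1]} \right\| \le \frac{\alpha_t L_x}{y_i[t+1]}, \tag{13}$$
$$\|\epsilon_i^{\mu}[t]\| \le \alpha_t \left\| \nabla_{\mu_i} L_i(z_i[t], \mu_i[t]) \right\| \le \alpha_t L_{\mu_i}. \tag{14}$$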

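Using the step-size properties of Assumption 5, and since $y_i[t] > 0$ converges to $n\,w_i > 0$ (see the proof of Lemma 1) and is therefore bounded away from zero, it can be concluded that
$$\lim_{t\to\infty}\alpha_t = 0 \implies \lim_{t\to\infty}\|\epsilon_i^x[t]\| = 0, \quad \lim_{t\to\infty}\|\epsilon_i^{\mu}[t]\| = 0, \tag{15}$$
$$\sum_{t=0}^\infty \alpha_t^2 < \infty \implies \sum_{t=0}^\infty \alpha_t\|\epsilon_i^x[t]\| < \infty, \quad \sum_{t=0}^\infty \alpha_t\|\epsilon_i^{\mu}[t]\| < \infty. \tag{16}$$
This result will be used in the proof of the following lemma.

Lemma 3. Let Assumptions 2, 3 and 4 hold. Denote the average at time $t$ by $\bar{x}[t] = \frac{1}{n}\sum_{i=1}^n x_i[t]$. Then,

a) if Assumption 5 a) holds, $\lim_{t\to\infty}\|x_i[t] - \bar{x}[t]\| = 0$;

b) if Assumption 5 c) holds, $\sum_{t=0}^\infty \alpha_t\|x_i[t] - \bar{x}[t]\| < \infty$.

Proof. Part a):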

Expanding equation (12a), xi[t] can be expressed as

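$$x_i[t] = \sum_{j=1}^n \Phi(t-1, 0)_{ij}\,x_j[0] + \sum_{s=0}^{t-2}\sum_{j=1}^n \Phi(t-1, s+1)_{ij}\,\epsilon_j^x[s] + \epsilon_i^x[t-1].$$

Inserting this into $\left\|x_i[t] - \frac{1}{n}\sum_{k=1}^n x_k[t]\right\|$, repeatedly applying the triangle inequality and using the results from Lemma 2, we obtain
$$\begin{aligned} \|x_i[t] - \bar{x}[t]\| \le\ & C\lambda^{t-1}\sum_{j=1}^n \|x_j[0]\| \\ & + \sum_{s=0}^{t-2} C\lambda^{t-s-2}\sum_{j=1}^n \|\epsilon_j^x[s]\| \\ & + \|\epsilon_i^x[t-1]\| + \frac{1}{n}\sum_{k=1}^n \|\epsilon_k^x[t-1]\|. \end{aligned}$$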


We now consider the limit $t \to \infty$ for each line separately.

The expression on the right-hand side of the first line converges to zero, because $\lambda \in (0, 1)$ and $\|x_j[0]\|$, $j \in [n]$, can be assumed to be finite. For the second line, we use Lemma 7 from [8], which states that if a positive scalar sequence $\gamma_t$ satisfies $\lim_{t\to\infty}\gamma_t = 0$, then

$$\lim_{t\to\infty}\sum_{s=0}^{t}\beta^{t-s}\gamma_s = 0,$$

where $\beta \in (0, 1)$. From implication (15) we know that, given Assumption 5, the scalar sequence $\|\epsilon_j^x[s]\|$ converges to zero for $j \in [n]$. Therefore, we can apply Lemma 7 from [8] to the second line by substituting $k = t - 2$ and conclude

$$\lim_{k\to\infty}\sum_{s=0}^{k} C\lambda^{k-s}\sum_{j=1}^n \|\epsilon_j^x[s]\| = 0.$$

Following from implication (15), the third line converges to zero as well, which concludes the proof of part a).

Part b): We have
$$\sum_{t=0}^\infty \alpha_t\|x_i[t] - \bar{x}[t]\| \le \sum_{t=0}^\infty \alpha_t a_t + \sum_{t=0}^\infty \alpha_t b_t + \sum_{t=0}^\infty \alpha_t\left( \|\epsilon_i^x[t-1]\| + \frac{1}{n}\sum_{k=1}^n \|\epsilon_k^x[t-1]\| \right) \tag{17}$$
with the sequences
$$a_t = \sum_{j=1}^n \left| \Phi(t-1, 0)_{ij} - \frac{1}{n}\sum_{k=1}^n \Phi(t-1, 0)_{kj} \right| \|x_j[0]\|,$$
$$b_t = \sum_{s=0}^{t-2}\sum_{j=1}^n \left| \Phi(t-1, s+1)_{ij} - \frac{1}{n}\sum_{k=1}^n \Phi(t-1, s+1)_{kj} \right| \|\epsilon_j^x[s]\|.$$

From Lemma 2 a) we know that the sequence $a_t$ can be bounded by a sequence $a'_t$ as follows:

$$a_t \le C\lambda^{t-1}\sum_{j=1}^n \|x_j[0]\| = a'_t.$$
The series $\sum_{t=0}^\infty a'_t$ converges, as it is a geometric series with $0 < \lambda < 1$. By the direct comparison test, $\sum_{t=0}^\infty a_t$ converges as well, because $0 \le a_t \le a'_t$.

By Assumption 5, $\alpha_t$ is a positive, non-increasing sequence. Therefore, there exists $0 < K < \infty$ such that $\alpha_t \le K$ (for instance, $K = \alpha_0$). Consequently, the first term on the right-hand side of (17) is finite:

$$\sum_{t=0}^\infty \alpha_t a_t \le K\sum_{t=0}^\infty a_t < \infty. \tag{18}$$

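Using Lemma 2 b), we obtain

$$b_t \le \sum_{s=0}^{t-2} C\lambda^{t-s-2}\sum_{j=1}^n \|\epsilon_j^x[s]\|.$$
Multiplying by $\alpha_t$ and summing over all $t$, we get

$$\sum_{t=0}^\infty \alpha_t b_t \le \sum_{t=0}^\infty \sum_{s=0}^{t-2} C\lambda^{t-s-2}\sum_{j=1}^n \alpha_s\|\epsilon_j^x[s]\|,$$

where we used the non-increasing property $\alpha_t \le \alpha_s$ for $s \le t$. Now, define
$$\gamma_s = \sum_{j=1}^n \alpha_s\|\epsilon_j^x[s]\|.$$

We know from implication (16) that $\sum_{t=0}^\infty \gamma_t < \infty$. According to Lemma 7 from [8], it follows that

$$\sum_{t=0}^\infty\sum_{s=0}^{t-2}\lambda^{t-s-2}\gamma_s < \infty.$$

Applying this to our case, we conclude $\sum_{t=0}^\infty \alpha_t b_t < \infty$. Finally, we have
$$\sum_{t=0}^\infty \alpha_t\|\epsilon_i^x[t-1]\| \le \sum_{t=0}^\infty \alpha_{t-1}\|\epsilon_i^x[t-1]\| < \infty,$$

where we again used implication (16) and the fact that $\alpha_t$ is non-increasing; the averaged term $\frac{1}{n}\sum_{k=1}^n \|\epsilon_k^x[t-1]\|$ in (17) is handled in the same way. Therefore,
$$\sum_{t=0}^\infty \alpha_t\|x_i[t] - \bar{x}[t]\| < \infty$$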

holds, which concludes the proof.

The above Lemma is necessary for the convergence proof of the suggested algorithm. Next, we provide an upper bound for the algorithm updates in each time step.

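Proposition 1. Let Assumptions 2, 3 and 4 hold. Then, for the optimal primal dual pair $x^* \in \mathcal{X}$, $\mu_i^* \in \mathcal{M}_i$ and $t \ge t_0$, the following bound holds:

$$\begin{aligned} \sum_{i=1}^n \left( y_i[t+1]\,\|x_i[t+1] - x^*\|^2 + \|\mu_i[t+1] - \mu_i^*\|^2 \right) \le\ & \sum_{i=1}^n \left( y_i[t]\,\|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 \right) \\ & - 2\alpha_t\big( L(\bar{x}[t], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t]) \big) \\ & + 2\alpha_t L_x\, n\sum_{i=1}^n \|x_i[t] - \bar{x}[t]\| + L_x^2\,\alpha_t^2\sum_{i=1}^n \frac{1}{w_i} + \alpha_t^2\sum_{i=1}^n L_{\mu_i}^2, \end{aligned}$$

where $\mu^* = (\mu_1^*, \dots, \mu_n^*)$ and $w_i$ is the $i$-th entry of the normalized right eigenvector $w$ of $P$ from the proof of Lemma 1, such that $\lim_{t\to\infty} y_i[t] = n\,w_i$.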


Proof. Inserting xi[t + 1] and µi[t + 1] of equations (11c) and (11d), using the non-expansive property of the projection operator, the fact that the optimal values (x∗, µ∗) lie within the sets X and Mi, respectively, and by expanding the quadratic norm, the following inequality holds

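$$\begin{aligned} & y_i[t+1]\,\|x_i[t+1] - x^*\|^2 + \|\mu_i[t+1] - \mu_i^*\|^2 && \text{(19a)}\\ & \le y_i[t+1]\,\|z_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 && \text{(19b)}\\ & \quad - 2\alpha_t\,\nabla_x L_i(z_i[t], \mu_i[t])^T (z_i[t] - x^*) && \text{(19c)}\\ & \quad + 2\alpha_t\,\nabla_{\mu_i} L_i(z_i[t], \mu_i[t])^T (\mu_i[t] - \mu_i^*) && \text{(19d)}\\ & \quad + \frac{\alpha_t^2}{y_i[t+1]}\,\|\nabla_x L_i(z_i[t], \mu_i[t])\|^2 && \text{(19e)}\\ & \quad + \alpha_t^2\,\|\nabla_{\mu_i} L_i(z_i[t], \mu_i[t])\|^2. && \text{(19f)} \end{aligned}$$

Next, we sum the left- and right-hand sides of the above inequality from $i = 1$ to $n$ and analyze each line on the right-hand side separately.

Using Jensen's inequality and the fact that $z[t] = Q[t]x[t]$ with row-stochastic $Q[t]$, we get
$$\left\| \sum_{j=1}^n Q[t]_{ij}\,x_j[t] - x^* \right\|^2 \le \sum_{j=1}^n Q[t]_{ij}\,\|x_j[t] - x^*\|^2.$$

Thus, we can bound the first addend in (19b), summed over $i$, as
$$\begin{aligned} \sum_{i=1}^n y_i[t+1]\,\|z_i[t] - x^*\|^2 &\le \sum_{i=1}^n y_i[t+1]\sum_{j=1}^n \frac{P_{ij}\,y_j[t]}{y_i[t+1]}\,\|x_j[t] - x^*\|^2 \\ &= \sum_{j=1}^n y_j[t]\,\|x_j[t] - x^*\|^2 \sum_{i=1}^n P_{ij} \\ &= \sum_{i=1}^n y_i[t]\,\|x_i[t] - x^*\|^2, \end{aligned}$$

where we substituted the entries of $Q[t]$ in the first line, rearranged the sums in the second line, and used the column-stochasticity of $P$. With that,

$$\sum_{i=1}^n (19\text{b}) \le \sum_{i=1}^n \left( y_i[t]\,\|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 \right).$$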

For (19c), because of convexity of Li(zi[t], µi[t]) for fixed µi[t], we have

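$$-\nabla_x L_i(z_i[t], \mu_i[t])^T (z_i[t] - x^*) \le L_i(x^*, \mu_i[t]) - L_i(z_i[t], \mu_i[t]).$$
In (19d), $L_i(z_i[t], \mu_i[t])$ depends affinely on $\mu_i[t]$ for fixed $z_i[t]$ and therefore

$$\nabla_{\mu_i} L_i(z_i[t], \mu_i[t])^T (\mu_i[t] - \mu_i^*) = L_i(z_i[t], \mu_i[t]) - L_i(z_i[t], \mu_i^*).$$
Combining the above results, adding $+L_i(x^*, \mu_i^*) - L_i(x^*, \mu_i^*)$ and $+L_i(\bar{x}[t], \mu_i^*) - L_i(\bar{x}[t], \mu_i^*)$, and summing from $i = 1$ to $n$, we obtain
$$\begin{aligned} \sum_{i=1}^n \big((19\text{c}) + (19\text{d})\big) \le\ & -2\alpha_t\big( L(\bar{x}[t], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t]) \big) \\ & - 2\alpha_t\sum_{i=1}^n \big( L_i(z_i[t], \mu_i^*) - L_i(\bar{x}[t], \mu_i^*) \big). \end{aligned}$$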

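The last line of the above expression can be bounded further:

$$\begin{aligned} -2\alpha_t\sum_{i=1}^n \big( L_i(z_i[t], \mu_i^*) - L_i(\bar{x}[t], \mu_i^*) \big) &\le 2\alpha_t\sum_{i=1}^n \big| L_i(z_i[t], \mu_i^*) - L_i(\bar{x}[t], \mu_i^*) \big| \\ &\le 2\alpha_t L_x\sum_{i=1}^n \|z_i[t] - \bar{x}[t]\| \\ &\le 2\alpha_t L_x\sum_{i=1}^n\sum_{j=1}^n Q[t]_{ij}\,\|x_j[t] - \bar{x}[t]\| \\ &\le 2\alpha_t L_x\, n\sum_{i=1}^n \|x_i[t] - \bar{x}[t]\|, \end{aligned}$$

using the $L_x$-Lipschitz continuity of the Lagrangian for fixed $\mu_i^*$, the triangle inequality and the fact that $0 \le Q[t]_{ij} \le 1$. With that,
$$\begin{aligned} \sum_{i=1}^n \big((19\text{c}) + (19\text{d})\big) \le\ & -2\alpha_t\big( L(\bar{x}[t], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t]) \big) \\ & + 2\alpha_t L_x\, n\sum_{i=1}^n \|x_i[t] - \bar{x}[t]\|. \end{aligned}$$

Since $\lim_{t\to\infty} y_i[t] = n\,w_i > 0$, there exists a $t_0$ such that for all $t \ge t_0$ it holds that $y_i[t] \ge \frac{n\,w_i}{2} \ge w_i$ (for $n \ge 2$). Therefore, using the gradient bounds of Assumption 3, it holds for $t \ge t_0$ that

$$\sum_{i=1}^n \big((19\text{e}) + (19\text{f})\big) \le L_x^2\,\alpha_t^2\sum_{i=1}^n \frac{1}{w_i} + \alpha_t^2\sum_{i=1}^n L_{\mu_i}^2.$$

Combining the above results concludes the proof.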

Before we are able to finally prove convergence of our method to the optimum of problem (1), we provide the following Lemma, which is the deterministic version of a Theorem in [10]:

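Lemma 4. Let $\{v_t\}_{t=0}^\infty$, $\{u_t\}_{t=0}^\infty$, $\{b_t\}_{t=0}^\infty$ and $\{c_t\}_{t=0}^\infty$ be non-negative sequences such that $\sum_{t=0}^\infty b_t < \infty$, $\sum_{t=0}^\infty c_t < \infty$ and

$$v_{t+1} \le (1 + b_t)\,v_t - u_t + c_t, \quad \forall t \ge 0.$$

Then $v_t$ converges and $\sum_{t=0}^\infty u_t < \infty$.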

This Lemma will be the key element for proving the following main result:

Theorem 1. Let Assumptions 1-5 hold. Then, $x_i[t]$ and $\mu_i[t]$, updated by the rules in (11a)-(11d), converge to an optimal primal dual pair $(x^*, \mu_i^*) \in \mathcal{X} \times \mathcal{M}_i$ for $i \in [n]$ as $t \to \infty$.


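Proof. To apply Lemma 4, let us define
$$v_t = \sum_{i=1}^n \left( y_i[t]\,\|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 \right),$$
$$u_t = 2\alpha_t\big( L(\bar{x}[t], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t]) \big),$$
$$c_t = 2\alpha_t L_x\, n\sum_{i=1}^n \|x_i[t] - \bar{x}[t]\| + L_x^2\,\alpha_t^2\sum_{i=1}^n \frac{1}{w_i} + \alpha_t^2\sum_{i=1}^n L_{\mu_i}^2, \qquad b_t = 0.$$

To show that $c_t$ is summable, we recall from Lemma 3 b) that, under the given assumptions, $\sum_{t=0}^\infty \alpha_t\|x_i[t] - \bar{x}[t]\| < \infty$. Therefore,
$$2L_x n\sum_{t=0}^\infty \alpha_t\sum_{i=1}^n \|x_i[t] - \bar{x}[t]\| < \infty.$$

By Assumption 5, we directly have $\left( L_x^2\sum_{i=1}^n \frac{1}{w_i} + \sum_{i=1}^n L_{\mu_i}^2 \right)\sum_{t=0}^\infty \alpha_t^2 < \infty$. Together, this results in $\sum_{t=0}^\infty c_t < \infty$.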

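Applying Lemma 4, we can then make the statements
$$\exists\,\delta \ge 0:\quad \lim_{t\to\infty} v_t = \lim_{t\to\infty}\sum_{i=1}^n \left( y_i[t]\,\|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 \right) = \delta, \tag{20}$$
$$\sum_{t=0}^\infty u_t = \sum_{t=0}^\infty 2\alpha_t\big( L(\bar{x}[t], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t]) \big) < \infty. \tag{21}$$
Because the sum of the step-sizes is infinite by assumption, $\sum_{t=0}^\infty \alpha_t = \infty$, there must exist subsequences $\bar{x}[t_l]$, $\mu_i[t_l]$ such that

$$\lim_{l\to\infty}\big( L(\bar{x}[t_l], \mu^*) - L(x^*, \mu^*) + L(x^*, \mu^*) - L(x^*, \mu[t_l]) \big) = 0.$$
Since $(x^*, \mu^*)$ is an optimal primal dual pair and there is no duality gap (Assumption 1), it is a saddle point of $L$, where $L$ is convex in $x$ for fixed $\mu^*$ and affine in $\mu$ for fixed $x^*$. Hence, for all $t_l$,

$$L(\bar{x}[t_l], \mu^*) - L(x^*, \mu^*) \ge 0, \qquad L(x^*, \mu^*) - L(x^*, \mu[t_l]) \ge 0.$$
Therefore, the limit holds if and only if

$$\lim_{l\to\infty} L(\bar{x}[t_l], \mu^*) = L(x^*, \mu^*), \qquad \lim_{l\to\infty} L(x^*, \mu[t_l]) = L(x^*, \mu^*).$$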

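Following from the convergence of $v_t$ to some constant $\delta$, the subsequences $\bar{x}[t_l]$ and $\mu[t_l]$ are bounded. With that, we can choose convergent subsequences $\bar{x}[t_{l_s}]$ and $\mu[t_{l_s}]$, such that

$$\lim_{s\to\infty}\big(\bar{x}[t_{l_s}], \mu[t_{l_s}]\big) = (\hat{x}, \hat{\mu}) \in \mathcal{X} \times \mathcal{M},$$

since $\mathcal{X}$ and $\mathcal{M} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_n$ are closed. Therefore, by continuity of $L$, it holds that
$$\lim_{s\to\infty} L(\bar{x}[t_{l_s}], \mu^*) = L(\hat{x}, \mu^*) = L(x^*, \mu^*) \quad \text{and} \quad \lim_{s\to\infty} L(x^*, \mu[t_{l_s}]) = L(x^*, \hat{\mu}) = L(x^*, \mu^*).$$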

Due to the strong convexity of $F(x)$ over $\mathcal{X}$, the equality $L(\hat{x}, \mu^*) = L(x^*, \mu^*) = \min_{x \in \mathcal{X}} L(x, \mu^*)$ implies that $\hat{x} = x^*$. Due to the dual feasibility of $\hat{\mu}$, $\hat{x} = x^*$ and $L(x^*, \hat{\mu}) = L(x^*, \mu^*) = \max_{\mu \ge 0} L(x^*, \mu)$, it follows that $(\hat{x}, \hat{\mu}) = (x^*, \mu^*)$. Next, taking into account Lemma 3 a) and (20), we obtain
$$\delta = \lim_{t\to\infty}\sum_{i=1}^n \left( y_i[t]\,\|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 \right) = \lim_{s\to\infty}\sum_{i=1}^n \left( y_i[t_{l_s}]\,\|\bar{x}[t_{l_s}] - x^*\|^2 + \|\mu_i[t_{l_s}] - \mu_i^*\|^2 \right) = 0.$$

Finally, as $y_i[t] > 0$ for all $t$ and $y_i[t]$ converges to $n\,w_i > 0$, we conclude
$$\lim_{t\to\infty}\|x_i[t] - x^*\| = 0, \qquad \lim_{t\to\infty}\|\mu_i[t] - \mu_i^*\| = 0 \quad \text{for } i \in [n].$$

V. SIMULATION

As motivated in the introduction, we consider an economic dispatch problem as an example application. In this problem, a group of networked generators seeks to meet a predefined demand $D$ while minimizing the sum of their local cost functions, which are assumed to be quadratic. Formally, the problem can be defined by

$$\min_{p}\sum_{i=1}^n C_i(p_i) = \min_{p}\sum_{i=1}^n \left( a_i p_i^2 + b_i p_i + c_i \right), \tag{22a}$$
$$\text{s.t.}\quad \sum_{i=1}^n p_i = D, \tag{22b}$$
$$p_{i,\min} \le p_i \le p_{i,\max}, \quad \forall i \in [n]. \tag{22c}$$

The power balance constraint (22b) involves the sum of all local decision variables $p_i$ and is therefore part of the global constraint set. Furthermore, we add the technical constraint that all power outputs of the generators should be non-negative, $p \ge 0$, defining the global constraint set as $\mathcal{P} = \{p \mid p \ge 0,\ \sum_{i=1}^n p_i = D\}$. Thereby, we ensure that $\mathcal{P}$ is convex, closed and bounded, and therefore compact, satisfying Assumption 2. Furthermore, the projection onto $\mathcal{P}$ bounds each generation $p_i$, because, regardless of the other $p_j$, $j \ne i$, it holds that $0 \le p_i \le D$. The lower and upper power limits of each generator in constraint (22c) constitute the local part of the constraint set and are therefore exclusively known by the respective agent $i$.
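Because $\mathcal{P}$ is a scaled probability simplex, the projection $\Pi_{\mathcal{P}}$ in (11c) can be evaluated exactly up to a scalar threshold. The sketch below uses the standard sort-and-threshold method for simplex projections; it is one possible way to compute the projection, not necessarily the implementation used for the simulation, and the function name is hypothetical.

```python
def project_onto_scaled_simplex(v, D):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = D} for D > 0.

    The projection has the form p_i = max(v_i - theta, 0); theta is found by
    sorting v in decreasing order and locating the last active component.
    """
    if D <= 0:
        raise ValueError("D must be positive")
    s = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, s_k in enumerate(s, start=1):
        cumsum += s_k
        candidate = (cumsum - D) / k
        if s_k - candidate > 0:      # component k is still active
            theta = candidate
    return [max(v_i - theta, 0.0) for v_i in v]


# Hypothetical estimate of four generator outputs after a gradient step.
p = project_onto_scaled_simplex([2.0, 1.0, 0.5, -1.0], D=3.0)
print(p, sum(p))
```

Every output of this projection satisfies $0 \le p_i \le D$, which is exactly the boundedness property used in the argument above.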

The feasible set is non-empty if $$\sum_{i=1}^n p_{i,\min} \le D \le \sum_{i=1}^n p_{i,\max}.$$

Therefore, there exists at least one relative interior point for which the affine equality and inequality constraints are fulfilled, which satisfies Slater's constraint qualification. Together with the strong convexity of the cost functions, we conclude that Assumption 1 holds for this problem.

[Figure 1: relative errors δ1 to δ4 of one agent plotted against the iteration number, 0 to 4000.]

Fig. 1. Convergence of the relative error with step size $\alpha_t = 15/t^{0.6}$. Largest error after 4000 iterations: 0.01.

The local Lagrangian of agent $i$ takes the form

$$L_i(p_i, \mu_i) = a_i p_i^2 + b_i p_i + c_i + \mu_{i,1}\left(p_{i,\min} - p_i\right) + \mu_{i,2}\left(p_i - p_{i,\max}\right).$$
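In the simulation setup, agent $i$ holds an estimate of the full output vector, but its Lagrangian depends only on its own component. A minimal sketch of the two gradients needed in (11c) and (11d) follows; the function and argument names are hypothetical.

```python
def local_lagrangian_gradients(i, p, mu_i, a_i, b_i, p_min_i, p_max_i):
    """Gradients of L_i = a_i p_i^2 + b_i p_i + c_i
    + mu_i[0] (p_min_i - p_i) + mu_i[1] (p_i - p_max_i).

    p is agent i's estimate of the full output vector; only entry i of the
    gradient with respect to p is non-zero (c_i does not affect gradients).
    """
    grad_p = [0.0] * len(p)
    grad_p[i] = 2.0 * a_i * p[i] + b_i - mu_i[0] + mu_i[1]
    grad_mu = (p_min_i - p[i], p[i] - p_max_i)  # values of the local constraints
    return grad_p, grad_mu


g_p, g_mu = local_lagrangian_gradients(1, [1.0, 2.0, 3.0], (0.5, 0.25),
                                       a_i=0.1, b_i=2.0, p_min_i=1.0, p_max_i=5.0)
print(g_p, g_mu)
```

Both gradients are affine in $(p, \mu_i)$, so they stay bounded whenever $p$ stays in the compact set $\mathcal{P}$ and $\mu_i$ in $\mathcal{M}_i$.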

To check whether Assumption 3 is satisfied, we inspect the gradients of the local Lagrangians. First, we note that $p_i[t] \equiv z_i[t]$ of equation (11b) is uniformly bounded at every time step. This results from projecting the gradient update onto the convex and compact set $\mathcal{P}$ and communicating the result ($x[t+1]$ in equation (11c)) in the next time step via a row-stochastic communication matrix, such that the result lies inside the convex hull spanned by $x[t+1]$ and therefore inside $\mathcal{P}$. Using this result and the fact that Slater's constraint qualification holds, which provides uniform bounds on $\mu_i$ (see Remark 1), we can conclude that both $\nabla_p L_i(p_i, \mu_i)$ and $\nabla_{\mu_i} L_i(p_i, \mu_i)$ can be uniformly bounded. Together with the strong convexity of the cost functions (22a), we conclude that Assumption 3 holds for Problem (22).

We chose a simple setup of four generators and designed the demand $D$ and the generator limits such that the feasibility condition above holds. Each agent $i$ maintains an estimate $p^i = (p^i_k)_{k=1}^n$ of all decision variables. The agents were connected by a static, strongly connected graph, such that Assumption 4 is satisfied. Finally, the step-size sequence was chosen by a grid search according to Assumption 5 as $\alpha_t = 15/t^{0.6}$.

Figure 1 depicts the convergence of the relative errors $\delta_k = |p_k - p_k^*|/p_k^*$, $k \in [4]$, of one example agent. Three of the four states approach 0 after only 500 iterations, while the error $\delta_3$ converges slowly over several hundred iterations. After 1000 iterations, all errors are below 0.04, and after 4000 iterations the highest relative error in the agent system is lower than 0.01.
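Evaluating $\delta_k$ requires the optimal dispatch $p^*$. For quadratic costs, one common way to obtain it centrally is the equal-incremental-cost (lambda-iteration) condition $2a_i p_i + b_i = \lambda$, clipped to the generator limits, with $\lambda$ found by bisection so that (22b) holds. The sketch below is such a reference solver under the assumptions $a_i > 0$ and $p_{i,\min} \ge 0$; the paper does not state how $p^*$ was computed, and the coefficients in the usage example are made up.

```python
def dispatch_reference(a, b, p_min, p_max, D, iters=200):
    """Centralized solution of (22) via bisection on the incremental cost lambda."""
    if not sum(p_min) <= D <= sum(p_max):
        raise ValueError("infeasible demand")

    def outputs(lam):
        # Each generator produces where its marginal cost 2 a_i p_i + b_i equals lam,
        # clipped to its limits.
        return [min(max((lam - bi) / (2.0 * ai), lo), hi)
                for ai, bi, lo, hi in zip(a, b, p_min, p_max)]

    lam_lo = min(2.0 * ai * lo + bi for ai, bi, lo in zip(a, b, p_min))  # all at p_min
    lam_hi = max(2.0 * ai * hi + bi for ai, bi, hi in zip(a, b, p_max))  # all at p_max
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if sum(outputs(lam)) < D:
            lam_lo = lam
        else:
            lam_hi = lam
    return outputs(0.5 * (lam_lo + lam_hi))


def relative_errors(p, p_star):
    """delta_k = |p_k - p*_k| / p*_k as plotted in Fig. 1 (requires p*_k > 0)."""
    return [abs(pk - sk) / sk for pk, sk in zip(p, p_star)]


# Made-up two-generator example: the upper limit of generator 1 is active.
p_star = dispatch_reference(a=[1.0, 1.0], b=[0.0, 0.0], p_min=[0.0, 0.0],
                            p_max=[1.0, 10.0], D=4.0)
print(p_star)                               # approximately [1.0, 3.0]
print(relative_errors([1.1, 3.0], p_star))
```

Bisection is valid here because the total output is continuous and non-decreasing in $\lambda$, and the initial bracket places all generators at their lower and upper limits, respectively.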

VI. CONCLUSION

In the work at hand, we tailored a solution method to a class of distributed, convex optimization problems that need to respect both global and local constraints. Each agent projects its gradient update onto the global set while updating a Lagrange multiplier, via which its local constraints are added to the cost function. Convergence to the optimal value was proven, and some convergence properties were briefly discussed using the example of the economic dispatch problem. Future work will include extending the method to time-varying communication architectures in order to make the algorithm more robust against failing communication channels.

REFERENCES

[1] T. Charalambous and C. N. Hadjicostis. Average consensus in the presence of dynamically changing directed topologies and time delays. Proceedings of the IEEE Conference on Decision and Control, 2015(February):709–714, 2014.

[2] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms: 1. Fundamentals, 1996.

[3] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, volume 2003-January, pages 482–491. IEEE Computer Society, 2003.

[4] H. Li, Q. Lu, and T. Huang. Distributed Projection Subgradient Algorithm over Time-Varying General Unbalanced Directed Graphs. IEEE Transactions on Automatic Control, 64(3):1309–1316, 2019.

[5] A. Nedić and A. Olshevsky. Distributed optimization over time-varying directed graphs. IEEE Transactions on Automatic Control, 60(3):601–615, March 2015.

[6] A. Nedić, A. Olshevsky, and M. G. Rabbat. Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization. Proceedings of the IEEE, 106(5):953–976, 2018.

[7] A. Nedić, A. Olshevsky, and W. Shi. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization, 27(4):2597–2633, 2017.

[8] A. Nedić, A. E. Ozdaglar, and P. A. Parrilo. Constrained consensus and optimization in multi-agent networks. IEEE Trans. Automat. Contr., 55(4):922–938, 2010.

[9] S. Pu, W. Shi, J. Xu, and A. Nedić. Push-pull gradient methods for distributed optimization in networks. CoRR, abs/1810.06653, 2018.

[10] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing methods in statistics, Academic Press, New York, pages 233–257, 1971.

[11] E. Seneta. Non-negative Matrices and Markov Chains. Springer Series in Statistics. Springer New York, New York, NY, 1981.

[12] W. Shi, Q. Ling, G. Wu, and W. Yin. EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.

[13] T. Tatarenko, J. Zimmermann, V. Willert, and J. Adamy. Penalized push-sum algorithm for constrained distributed optimization with application to energy management in smart grid. In 58th Conference on Decision and Control (CDC), December 2019.

[14] K. I. Tsianos, S. Lawlor, and M. G. Rabbat. Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1543–1550. IEEE, October 2012.

[15] Sy Van Mai and Eyad H. Abed. Distributed optimization over weighted directed graphs using row stochastic matrix. Proceedings of the American Control Conference, 2016-July:7165–7170, 2016.

[16] M. Zhu and E. Frazzoli. Distributed robust adaptive equilibrium computation for generalized convex games. Automatica, 63:82–91, 2016.

[17] J. Zimmermann, T. Tatarenko, V. Willert, and J. Adamy. Optimales Energie-Management über verteilte, beschränkte Gradientenverfahren. at - Automatisierungstechnik, 67(11):922–935, November 2019.