
other point of X, i.e. if there exists an ϵ > 0 such that

f(x)≤ f(y), ∀y ∈ X. (3.12)

The vector x is a strict local minimum if the inequality in (3.11) is strict for y ≠ x, and a strict global minimum if the inequality in (3.12) is strict for y ≠ x. Similarly, if f : Rn → [−∞,∞) and X ⊂ Rn, a vector x ∈ dom(f) ∩ X is said to be a local (or global) maximum of f over X if it is a local (respectively, global) minimum of −f over X.

Solving (3.10) is generally a difficult task without further assumptions on the feasible set and objective function. Many applications also consider the set X to be convex and closed. A collection of systematic methods for finding local optima over convex sets is reported in Section 3.4.4. The same methods yield global minima when, in addition, the objective function has the attractive property of being convex. The next result states that, if in (3.10) both the feasible set X and the objective function f are convex, then any local minimum of f over X is also a global minimum over X, hence a solution of the problem.

Proposition 3.4 (Minima of convex functions over convex sets) Let f : Rn → (−∞,∞] be a convex function and X a convex subset of Rn. Then any local minimum of f over X is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f over X.

The study of optimisation problems with convex feasible sets and objective functions forms a distinctive branch of the nonlinear optimisation framework, called convex optimisation. The convex optimisation problem will be introduced in Section 3.4.2 and discussed throughout this manuscript.

3.4.1 The general nonlinear optimisation problem

In many nonlinear optimisation problems, the structure of the feasible set X is specified by a system of inequality and equality constraints. We now formulate a variant of (3.10) with explicit characterisation of the feasible set, and refer to it as the general nonlinear optimisation problem.

Problem 3.1 (General nonlinear optimisation problem) Consider the optimisation problem

minimise_{x ∈ S}  f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., q
            hi(x) = 0,  i = 1, ..., r          (3.13)

where f0 : Rn → (−∞,∞] is continuously differentiable, f1, ..., fq and h1, ..., hr are functions Rn → (−∞,∞], and S is a nonempty subset of Rn.

3.4. Convex optimisation 47

We refer to f1, ..., fq as the inequality constraint functions, and to h1, ..., hr as the equality constraint functions. The feasible set of Problem 3.1 is thus given by

X = {x ∈ S | fi(x) ≤ 0, i = 1, ..., q, hj(x) = 0, j = 1, ..., r}.          (3.14)

Notice that equality constraints are redundant in theoretical analyses, since any equality constraint hi(x) = 0 can be replaced by a system of two inequality constraints hi(x) ≤ 0, −hi(x) ≤ 0. Besides, the constraint x ∈ S, called side constraint, has minor importance and is often omitted in the literature.

Indeed, it is easily seen that the problem minimise_{x ∈ S} f0(x) is equivalent to minimise_x f0(x) + χS(x), where the characteristic function χS is defined as χS(x) = 0 if x ∈ S, and χS(x) = ∞ if x ∉ S. We nevertheless decide to keep both the equality constraints and the side constraint in the formulation of Problem 3.1, as they sometimes simplify the notation and analysis.

3.4.2 Convex optimisation

Introduction and examples

The developments of this study are mainly concerned with convex optimisation. The problem stated in (3.10) is called a convex optimisation problem when the feasible set X is convex and f is a convex, continuously differentiable function. Convex optimisation problems have the specificity that all their minimisers are global minimisers, in accordance with Proposition 3.4.

A variant of the convex optimisation problem arises when a concave function f : Rn → [−∞,∞) is to be maximised over a convex set, which is equivalent to the convex problem of minimising −f over the same set and can be considered in a similar way. We refer to this problem as the convex optimisation problem with concave objective.

Basic examples of convex optimisation problems, such as linear programs and quadratic programs, are presented in Examples 3.1 and 3.2.

Example 3.1 (Linear program) A linear programming (LP) problem (also called linear program) is a convex optimisation problem with affine objective and constraint functions. It can be shown that every linear program can be rewritten in the standard form

minimise_x  cᵀx
subject to  Ax = b
            x ≥ 0          (3.15)

where the variable x belongs to some real space Rn, c is a vector of Rn, A a r × n real matrix, and b a vector of Rr.

Example 3.2 (Quadratic program) A quadratic programming (QP) problem (sometimes quadratic program) is usually defined as a convex optimisation problem with a quadratic objective function and affine constraint functions. The quadratic program can be formulated as

minimise_x  (1/2) xᵀCx + dᵀx
subject to  Ax = b
            Gx ≤ h          (3.16)

where the variable x is a vector of Rn, C is a symmetric matrix of Rn×n, d ∈ Rn, A ∈ Rr×n, b ∈ Rr, G ∈ Rq×n, and h ∈ Rq.

Another example of a convex optimisation problem with quadratic objective function is the orthogonal projection on a convex set, defined in Example 3.3.

In Chapter 4 we will use the scaled projection operator, which is introduced in Example 3.4 as an extension of the orthogonal projection operator.

Example 3.3 (Orthogonal projection) The orthogonal projection [x]⁺_X of a vector x ∈ Rn on a closed convex set X is defined by

[x]⁺_X = arg min_{y ∈ X} ∥x − y∥²,          (3.17)

where ∥x∥ = (xᵀx)^{1/2} is the Euclidean norm of x (see also Equation (A.1) in Appendix A). Equivalently, [x]⁺_X can be defined as the solution of the convex optimisation problem

minimise_y  (x − y)ᵀ(x − y)
subject to  y ∈ X          (3.18)

which reduces to a quadratic program when X is a polyhedron specified by a finite collection of affine inequality constraints.
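As a concrete illustration (our own toy instance, not taken from the text), when X is a box the projection (3.17) admits a closed form, since problem (3.18) then separates across coordinates. A minimal Python sketch, with a helper name of our choosing:

```python
# Sketch: orthogonal projection (3.17) onto the box X = [lo, hi]^n.
# For a box, problem (3.18) separates across coordinates, so the
# projection reduces to componentwise clamping (a standard fact).

def project_box(x, lo, hi):
    """Clamp each component of x to the interval [lo, hi]."""
    return [min(max(xi, lo), hi) for xi in x]

x = [2.0, -3.0, 0.5]
p = project_box(x, -1.0, 1.0)
print(p)                                  # [1.0, -1.0, 0.5]

# The projection satisfies (x - p)^T (y - p) <= 0 for every y in X;
# we spot-check this variational characterisation on a few points.
for y in ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]):
    assert sum((xi - pi) * (yi - pi) for xi, pi, yi in zip(x, p, y)) <= 1e-12
```

Components of x already inside the box are left untouched, as expected from the separability of (3.18).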

Example 3.4 (Scaled projection) For any vector x ∈ Rn and any symmetric, positive definite scaling matrix T ∈ Rn×n, we define the scaled norm of x by

∥x∥_T = (xᵀT x)^{1/2}.          (3.19)

The scaled projection [x]⁺_{X,T} of x on a closed convex set X is defined as

[x]⁺_{X,T} = arg min_{y ∈ X} ∥x − y∥²_{T⁻¹}.          (3.20)

If T is the identity matrix, ∥x∥_T reduces to the Euclidean norm of x, and [x]⁺_{X,T} to the orthogonal projection of x on X, denoted by [x]⁺_X. More generally, [x]⁺_{X,T} is the solution of the convex optimisation problem

minimise_y  (x − y)ᵀT⁻¹(x − y)
subject to  y ∈ X          (3.21)

which reduces to a quadratic program when X is a polyhedron. Some useful properties of the scaled norm and the scaled projection operator are derived in Appendix B.4.
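As a further toy illustration (ours, with a standard closed form not derived in the text): when X is a halfspace {y : aᵀy ≤ β}, the stationarity condition of (3.21) gives [x]⁺_{X,T} = x − T a max(0, aᵀx − β)/(aᵀT a). A sketch with a diagonal scaling matrix:

```python
# Sketch: scaled projection (3.20)-(3.21) onto the halfspace
# X = {y : a^T y <= beta}, with a diagonal scaling matrix T = diag(t).
# Closed form derived from the optimality conditions of (3.21):
#   [x]+_{X,T} = x - T a max(0, a^T x - beta) / (a^T T a).

def scaled_project_halfspace(x, a, beta, t):
    ax = sum(ai * xi for ai, xi in zip(a, x))
    if ax <= beta:
        return list(x)                    # x is already feasible
    ata = sum(ai * ti * ai for ai, ti in zip(a, t))     # a^T T a
    lam = (ax - beta) / ata
    return [xi - ti * ai * lam for xi, ai, ti in zip(x, a, t)]

x, a, beta = [2.0, 2.0], [1.0, 1.0], 0.0
p_scaled = scaled_project_halfspace(x, a, beta, [1.0, 4.0])  # T = diag(1, 4)
p_eucl = scaled_project_halfspace(x, a, beta, [1.0, 1.0])    # T = I
# With T = I the scaled projection coincides with the orthogonal one;
# both results lie on the boundary a^T y = 0 of the halfspace.
assert abs(sum(p_scaled)) < 1e-9 and abs(sum(p_eucl)) < 1e-9
```

Note how the scaling tilts the projection: with T = diag(1, 4), the point (2, 2) maps near (1.2, −1.2), while the Euclidean projection (T = I) is (0, 0).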

Optimality condition

The optimality condition in convex optimisation problems takes the following form.


Proposition 3.5 (First-order optimality condition) Let f : Rn → (−∞,∞] be a convex continuously differentiable function and X a convex subset of Rn. A vector x ∈ dom(f) ∩ X is a global minimum of f over X if and only if

∇f(x)ᵀ(y − x) ≥ 0,  ∀y ∈ X.          (3.22)

If the vector x is an interior point of X, then (3.22) reduces to ∇f(x) = 0.

The first-order optimality condition at a given point of the convex set X can be interpreted in terms of the existence of descent directions at this point. A vector d ∈ Rn is called a feasible direction at a point x ∈ X if one can find an ϵ > 0 such that x + ϵd ∈ X. The vector d is said to be a descent direction for f at x if ∇f(x)ᵀd < 0. Hence, the condition (3.22) states that no feasible descent direction exists at x, or equivalently, that the value of the function cannot be decreased by a displacement from x along any feasible direction. The computation of descent directions plays an important part in many iterative algorithms for convex optimisation. If such a direction can be found at a feasible point xk, then it is always possible to find a `better' point xk+1 such that f(xk+1) < f(xk) by searching along the considered descent direction.

In a broader context where f is not necessarily convex, any vector x ∈ dom(f) ∩ X which satisfies (3.22) is said to be a stationary point. Note that, when the function f is not convex, this condition is satisfied by local minima but also by some other points such as local maxima. When f is convex, all the stationary points are global minima over X and thus solutions of the considered optimisation problem. Since the developments of this manuscript are mainly concerned with the minimisation of convex functions over convex sets, we will mostly speak of global minima and solutions (rather than of stationary points) so as to avoid any confusion.

Standard formulation

In this section, we formulate the convex optimisation problem in standard form, where inequality and equality constraints are displayed explicitly. The standard form of the convex optimisation problem resembles the general nonlinear optimisation problem (Problem 3.1), with the additional assumption that the objective and inequality constraint functions are convex, the equality constraint functions are affine, and S is a convex set.

Problem 3.2 (Convex optimisation problem in standard form) Consider the optimisation problem

minimise_{x ∈ S}  f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., q
            hi(x) = 0,  i = 1, ..., r          (3.23)

where f0 : Rn → (−∞,∞] is convex continuously differentiable, f1, ..., fq are convex functions Rn → (−∞,∞], h1, ..., hr are affine functions Rn → R, and S is a convex subset of Rn.

It follows from Proposition B.4 in Appendix B.1 that the feasible set of Problem 3.2, still given by (3.14), is now the intersection of convex sets and thus convex. Notice that the assumption that any equality constraint hi(x) = 0 is affine can be explained by the fact that this equality constraint should be equivalent to the system of the two convex inequality constraints hi(x) ≤ 0 and −hi(x) ≤ 0, which is only possible if hi is both concave and convex and thus affine. Again, the equality constraints and the side constraint are dispensable, but their presence will simplify the notation in the sequel.

3.4.3 Lagrange duality

In this section we consider the general nonlinear optimisation problem formulated in Section 3.4.1 (Problem 3.1). Note that neither the objective function of Problem 3.1, nor its feasible set (3.14), is necessarily convex.

The dual problem and weak duality

The Lagrange duality framework consists of considering simultaneously the objective and constraints of the problem by augmenting the objective function with a weighted sum of the constraint functions. In Problem 3.1, the Lagrangian function l : Rn × Rq × Rr → (−∞,∞] is defined by

l(x, y, z) = f0(x) + ∑_{i=1}^q yi fi(x) + ∑_{i=1}^r zi hi(x)          (3.24)

where y = (y1, ..., yq) is a vector of Rq, z = (z1, ..., zr) is a vector of Rr, and y1, ..., yq and z1, ..., zr are called the Lagrange multipliers or dual variables.

The Lagrange dual function g is obtained by minimisation of the Lagrangian along the primal variables x1, ..., xn, i.e.

g(y, z) = inf_{x ∈ S ∩ D} l(x, y, z)          (3.25)

where we define D = [∩_{i=1}^q dom(fi)] ∩ [∩_{i=1}^r dom(hi)]. Note that when the problem is convex, we can simply write g(y, z) = inf_{x ∈ S} l(x, y, z).

It is reasonable to assume that the objective function f0 takes a finite value at at least one vector of Rn, i.e. dom(f0) ≠ ∅ (such a function is sometimes called proper). Under this assumption, the dual function g : Rq × Rr → [−∞,∞) is concave, as the pointwise infimum over S ∩ D of l(x, y, z) regarded as an affine function of (y, z) with parameter x (see Appendix B.1). Notice that g is concave even when Problem 3.1 is not convex.

Recall (3.14) and consider the optimal value of Problem 3.1, given by

p⋆ = inf_{x ∈ X} f0(x).          (3.26)

For any solution x⋆ of Problem 3.1, we have f0(x⋆) = p⋆. Suppose now that x is a feasible point of Problem 3.1. We have x ∈ S, fi(x) ≤ 0 for i = 1, ..., q, and hi(x) = 0 for i = 1, ..., r. It follows that l(x, y, z) ≤ f0(x) and thus g(y, z) ≤ f0(x) for any (y, z) such that y ≥ 0. Since the last result holds for any feasible x, we find g(y, z) ≤ p⋆ for any (y, z) in Rq≥0 × Rr, i.e. the dual function gives a lower bound on the optimal value p⋆.
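As a toy illustration of this bound (our own example, not from the text): for the problem of minimising x² subject to 1 − x ≤ 0, with S = R, the dual function has a closed form that can be checked against p⋆:

```python
# Toy instance of Problem 3.1: minimise x^2 subject to f1(x) = 1 - x <= 0.
# The Lagrangian l(x, y) = x^2 + y(1 - x) is minimised at x = y/2,
# giving the dual function g(y) = y - y^2/4; the primal optimal value
# is p* = 1, attained at x = 1.

def g(y):
    return y - y * y / 4.0

p_star = 1.0
# Weak duality: g(y) <= p* for every y >= 0 ...
assert all(g(0.1 * k) <= p_star + 1e-12 for k in range(81))
# ... and the lower bound is tight at y = 2.
print(g(2.0))     # 1.0
```

Here the bound is attained, anticipating the strong duality discussion below.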

The question that duality theory raises is how close the largest lower bound supplied by the dual function,

d⋆ = sup_{y ≥ 0, z} g(y, z),          (3.27)

is to the optimal value p⋆. The quantity d⋆ is in fact the optimal value of the dual problem

maximise_{y,z}  g(y, z)
subject to  y ≥ 0          (3.28)

which is a convex optimisation problem with concave objective, where the inequality constraint takes the simple form y ≥ 0. The feasible set of the dual problem, or dual feasible set, is the set

Y = {(y, z) ∈ Rq × Rr | y ≥ 0, g(y, z) > −∞}.          (3.29)

Any vector (y, z) ∈ Y is called dual feasible. It is common to refer to the initial problem (3.13) as the primal problem, and to any feasible point of the primal problem as primal feasible.

For any pair of primal and dual feasible vectors x, (y, z), we have

g(y, z) ≤ d⋆ ≤ p⋆ ≤ f0(x).          (3.30)

We are thus able to locate the optimal values of the primal and dual problems in the interval [g(y, z), f0(x)]. The width f0(x) − g(y, z) of this interval is called the duality gap associated with x and (y, z). The duality gap is used in particular to derive stopping criteria for iterative algorithms, in the form of error bounds for the estimation of the optimal values.

The smallest achievable duality gap is the quantity p⋆ − d⋆, called the optimal duality gap of the problem, and we have p⋆ − d⋆ ≥ 0. The nonnegativity of the optimal duality gap is called the weak duality principle. Weak duality states that the optimal value of the dual problem is an underestimator of that of the primal problem. It is thus possible to find a lower bound on the optimal value of Problem 3.1 by solving its dual (3.28), which is a convex optimisation problem (with concave objective) amenable to the convex optimisation methods discussed further in the chapter.

Strong duality and constraint qualication

In the case when the optimal duality gap is zero, i.e. p⋆ = d⋆, one says that strong duality holds. Under strong duality, the best lower bound for p⋆ provided by the dual function is tight, and the optimal value of the primal problem can be found by solving the dual problem.

Although strong duality does not necessarily hold for the general formulation of Problem 3.1, there exist some assumptions on the solutions of the primal problem, called regularity conditions or constraint qualifications, that guarantee the existence of solutions of the dual problem with zero optimal duality gap. A minimiser of the primal problem which satisfies a constraint qualification is called a regular point. For instance, a minimiser x of Problem 3.1 in the interior of S is regular if the inequality and equality constraint functions are all affine (linearity constraint qualification), if the constraint functions are continuously differentiable at x and the gradients at x of the active inequality constraints and of the equality constraints are linearly independent (linear independence constraint qualification), or even if these gradients are only positive-linearly independent (Mangasarian-Fromovitz constraint qualification).

Strong duality is easier to obtain in the convex setting of Problem 3.2.

One constraint qualification, called the Slater condition, is specific to convex optimisation problems. Problem 3.2 satisfies the Slater condition if one can find a feasible vector x where the non-affine inequality constraints hold with strict inequality (such a point is sometimes called strictly feasible), i.e. if there exists an x in the interior of S such that fi(x) ≤ 0 for i = 1, ..., q, hi(x) = 0 for i = 1, ..., r, and fi(x) < 0 for i ∈ I, where I ⊂ {1, ..., q} denotes the index set of the inequality constraint functions that are not affine. The notion of constraint qualification is illustrated by Example B.1 in Appendix B.3.

The concept of duality is explained in Example 3.5 for the case when the optimisation problem is a linear program. When the orthogonal and scaled projections, previously introduced in Examples 3.3 and 3.4, are performed on polyhedral sets, they reduce to quadratic programs with strictly convex objective. The duality of a general instance of a quadratic program is considered in Example 3.6.


Example 3.5 (Dual of a linear program) Consider the linear program in standard form (3.15).

The Lagrangian is given by

l(x, y, z) = cᵀx − yᵀx + zᵀ(Ax − b) = (Aᵀz − y + c)ᵀx − bᵀz,          (3.31)

where we introduce y ∈ Rn as the vector of the dual variables corresponding to the inequality constraints, and z ∈ Rr as the vector of the dual variables related to the equality constraints. By minimisation of the Lagrangian with respect to x, we obtain the dual function

g(y, z) = −bᵀz if Aᵀz − y + c = 0,  and  g(y, z) = −∞ if Aᵀz − y + c ≠ 0.          (3.32)

It follows from (3.32) that the dual problem maximise_{y ≥ 0, z} g(y, z) reduces to

maximise_{(y,z)}  −bᵀz
subject to  Aᵀz − y + c = 0
            y ≥ 0          (3.33)

or, equivalently,

minimise_z  bᵀz
subject to  Aᵀz + c ≥ 0          (3.34)

which is a linear program. Since strong duality holds, the problem (3.15) and its dual (3.33) have the same optimal value. Similarly, it is easily seen that the dual of (3.34) yields back the initial problem (3.15).
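The identity between the two optimal values is easy to check numerically on a tiny standard-form instance (the data below are an arbitrary illustrative choice, not from the text):

```python
# LP duality on a tiny instance of (3.15):
#   minimise c^T x  subject to  x1 + x2 = 1, x >= 0,  with c = (1, 2).
# The feasible set is a segment with vertices (1, 0) and (0, 1), so the
# primal optimal value can be found by enumerating the vertices.

c = [1.0, 2.0]
vertices = [[1.0, 0.0], [0.0, 1.0]]
p_star = min(sum(ci * xi for ci, xi in zip(c, v)) for v in vertices)

# Dual problem (3.33): maximise -b^T z subject to A^T z - y + c = 0,
# y >= 0; here A = [1 1] and b = 1, so dual feasibility reads
# z + c_i >= 0 for each i, i.e. z >= -1, and the objective is -z.
feasible_z = [-1.0 + 0.01 * k for k in range(301)]   # samples of z >= -1
d_star = max(-z for z in feasible_z)

print(p_star, d_star)    # 1.0 1.0 -- the optimal values coincide
assert p_star == d_star == 1.0
```

The dual maximum is attained on the boundary z = −1 of the dual feasible set, mirroring the active vertex x = (1, 0) of the primal.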

Example 3.6 (Dual of a quadratic program) We study the dual of the quadratic program (3.16), where we assume that the symmetric matrix C is positive definite and thus invertible. The Lagrangian is given by

l(x, y, z) = (1/2) xᵀCx + dᵀx + yᵀ(Gx − h) + zᵀ(Ax − b) = (1/2) xᵀCx + (Gᵀy + Aᵀz + d)ᵀx − (hᵀy + bᵀz),          (3.35)

where y ∈ Rq and z ∈ Rr are the dual variable vectors. For any (y, z) ∈ Rq × Rr, the Lagrangian is minimised when its derivative with respect to x is zero, i.e. when Cx + Gᵀy + Aᵀz + d = 0, or equivalently x = −C⁻¹(Gᵀy + Aᵀz + d). It follows that the dual function is given by

g(y, z) = −(1/2)(Gᵀy + Aᵀz + d)ᵀC⁻¹(Gᵀy + Aᵀz + d) − (hᵀy + bᵀz)          (3.36)
        = −(1/2)(y, z)ᵀ[(G, A)C⁻¹(G, A)ᵀ](y, z) + [−(h, b) − (G, A)C⁻¹d]ᵀ(y, z) − (1/2) dᵀC⁻¹d          (3.37)

where (G, A) ∈ R(q+r)×n denotes the vertical concatenation of the matrices G and A, and it is easily seen that (G, A)C⁻¹(G, A)ᵀ is a positive semidefinite symmetric matrix with dimension (q + r).

The dual function g is thus a concave quadratic function with affine gradient

∇g(y, z) = −[(G, A)C⁻¹(G, A)ᵀ](y, z) − (h, b) − (G, A)C⁻¹d,  for all (y, z) ∈ Rq × Rr,

and constant Hessian ∇²g(y, z) = −(G, A)C⁻¹(G, A)ᵀ. If I denotes the identity matrix in Rq×q and we set C̃ = (G, A)C⁻¹(G, A)ᵀ and d̃ = −(h, b) − (G, A)C⁻¹d, we find that the dual problem is given by

maximise_{(y,z)}  −(1/2)(y, z)ᵀC̃(y, z) + d̃ᵀ(y, z)
subject to  (I, 0)(y, z) ≥ 0          (3.38)

which is a quadratic program in concave form with concave objective and affine inequality constraints.

Complementary slackness and Karush-Kuhn-Tucker conditions

In this section we discuss the optimality conditions of primal minimisers. Consider Problem 3.1 and suppose that, under some constraint qualification, the optimal duality gap is zero. Let x⋆ and (y⋆, z⋆) be solutions of the primal and dual problems, respectively. We have

d⋆ = g(y⋆, z⋆) ≤ l(x⋆, y⋆, z⋆) ≤ f0(x⋆) = p⋆.          (3.39)

Since d⋆ = p⋆, we find that the inequalities in (3.39) hold with equality. It follows on the one hand that x⋆ and (y⋆, z⋆) have zero duality gap, and on the other hand that ∑_{i=1}^q y⋆i fi(x⋆) = 0, and thus

y⋆i fi(x⋆) = 0,  i = 1, ..., q,          (3.40)

by nonnegativity of y⋆1, ..., y⋆q. The condition (3.40), known as complementary slackness, is a necessary condition for the optimality of a primal-dual pair of vectors.

In many applications the objective function f0 and constraint functions f1, ..., fq and h1, ..., hr happen to be continuously differentiable, in which case the optimality conditions take a particular form. Under the assumption of zero optimal duality gap, we consider two points x⋆ and (y⋆, z⋆), respectively solutions of the primal and dual problems. Recalling that the inequalities in (3.39) hold with equality, it follows from (3.25) that x⋆ minimises l(·, y⋆, z⋆) over S ∩ D. If x⋆ is an interior point of S, and f0, f1, ..., fq and h1, ..., hr are continuously differentiable at x⋆, then the gradient at x⋆ of l(x, y⋆, z⋆) seen as a function of x must be 0, i.e. ∇f0(x⋆) + ∑_{i=1}^q y⋆i ∇fi(x⋆) + ∑_{i=1}^r z⋆i ∇hi(x⋆) = 0. The next result summarises the necessary optimality conditions of a primal-dual pair under strong duality.

Proposition 3.6 (Karush-Kuhn-Tucker (KKT) conditions) Let x⋆ be a local minimiser of Problem 3.1 with dom(f0) ≠ ∅, and assume that x⋆ is an interior point of S, that f0, ..., fq and h1, ..., hr are continuously differentiable at x⋆, and that a constraint qualification holds at x⋆. Then there exists a point (y⋆, z⋆) ∈ Rq × Rr satisfying

fi(x⋆) ≤ 0,  i = 1, ..., q,          (3.41)
hi(x⋆) = 0,  i = 1, ..., r,          (3.42)
y⋆i ≥ 0,  i = 1, ..., q,          (3.43)
y⋆i fi(x⋆) = 0,  i = 1, ..., q,          (3.44)
∇f0(x⋆) + ∑_{i=1}^q y⋆i ∇fi(x⋆) + ∑_{i=1}^r z⋆i ∇hi(x⋆) = 0.          (3.45)


In particular, (3.41) and (3.42) are called the primal feasibility conditions, (3.43) the dual feasibility condition, and (3.44) can be identified as the complementary slackness condition previously stated in (3.40). Notice, in particular, that if x⋆ is a global minimiser of Problem 3.1 satisfying the KKT conditions together with a dual pair (y⋆, z⋆), then

g(y⋆, z⋆) = l(x⋆, y⋆, z⋆) = f0(x⋆),          (3.46)

where the first equality follows from (3.45) and the second from (3.42) and (3.44), and it follows that x⋆ and (y⋆, z⋆) are primal and dual optimal with zero duality gap. The KKT conditions will be used in Chapter 4 for the convergence analysis of parallel optimisation methods. One specificity of convex optimisation is that the KKT optimality conditions (3.41)-(3.45) are also sufficient. The next proposition provides sufficient optimality conditions for the convex optimisation problem in standard form.

Proposition 3.7 (KKT conditions for convex problems) Consider Problem 3.2 with dom(f0) ≠ ∅. Let x be a vector in the interior of S where f0, ..., fq and h1, ..., hr are continuously differentiable. The point x is a solution if and only if there exists a point (y, z) ∈ Rq × Rr which, together with x, satisfies the Karush-Kuhn-Tucker conditions (3.41)-(3.45).

One should keep in mind that the existence of a primal-dual pair of points with zero duality gap is conditioned by the satisfaction of a constraint qualification such as the Slater condition. In Appendix B.3 we provide a proof of Proposition 3.7 based on the characterisation of closed convex sets by their containing halfspaces (Proposition 3.1), as well as a geometric interpretation of the constraint qualification issue.
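The conditions (3.41)-(3.45) can be verified mechanically at a candidate primal-dual pair. A minimal sketch on a small convex problem of our own choosing (not an example from the text):

```python
# Checking the KKT conditions (3.41)-(3.45) on the toy convex problem
#   minimise 1/2 ||x||^2  subject to  f1(x) = x1 + x2 + 2 <= 0,
# whose solution is x* = (-1, -1) with multiplier y* = 1 (no equality
# constraints, so (3.42) is vacuous and there is no z).

x_star = [-1.0, -1.0]
y_star = 1.0

f1 = x_star[0] + x_star[1] + 2.0        # inequality constraint value
grad_f0 = list(x_star)                  # gradient of 1/2 ||x||^2 at x*
grad_f1 = [1.0, 1.0]                    # gradient of f1

assert f1 <= 0.0                        # (3.41) primal feasibility
assert y_star >= 0.0                    # (3.43) dual feasibility
assert y_star * f1 == 0.0               # (3.44) complementary slackness
stationarity = [g0 + y_star * g1 for g0, g1 in zip(grad_f0, grad_f1)]
assert stationarity == [0.0, 0.0]       # (3.45) stationarity
print("KKT conditions hold at (x*, y*)")
```

By Proposition 3.7, passing all four checks certifies that x⋆ is a solution of this convex problem.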

Differentiability of the dual function

One difficulty in solving the dual of an optimisation problem is that an expression for the dual function g is not explicitly available. Indeed, the computation of g(y, z) via (3.25) requires, in the general case, to solve a different optimisation problem for each new value of (y, z). It is nonetheless possible to solve the dual problem efficiently when the dual function is differentiable. In this section, we discuss the conditions that guarantee the differentiability of the function g for Problems 3.1 and 3.2.

Non-convex optimisation. We first introduce a point-to-set mapping X and a function x, which establish a connection between dual and primal points.

Definition 3.3 (Mapping X and function x) Consider Problem 3.1 and the set-valued mapping X : Rq × Rr → 2^{Rn} defined by

X(y, z) = arg min_{x ∈ S} l(x, y, z),  (y, z) ∈ Rq × Rr.          (3.47)

At every (y, z) ∈ Rq × Rr where X(y, z) is a singleton, we define x(y, z) as the only element of X(y, z).

According to the above definition², and for any (y, z) ∈ Rq × Rr, a vector x ∈ S belongs to X(y, z) if and only if g(y, z) = l(x, y, z). Depending on the problem and the point (y, z), the number of elements of X(y, z) may be zero, one (in which exclusive case x(y, z) is defined), finite, infinite, or even the entire set S. Also, x is only defined on a subset of Rq × Rr, where the dual function g can be shown to be differentiable under certain conditions discussed in the rest of the section. In particular, we have g(y, z) = l(x(y, z), y, z) wherever x(y, z) is defined.

The next lemma and theorem are known results of nonlinear optimisation.

For these results it is assumed that the set S is compact, i.e. closed and bounded. The boundedness of S implies that any sequence of vectors in S has at least one limit point (see Theorem A.3 in Appendix A), and the closedness of S that every limit point of this sequence lies in S. It is also assumed that the objective and constraint functions are continuous over S, which guarantees that the Lagrangian l(x, y, z) is continuous in x over S for any (y, z), and that its minimum for x ∈ S is attained in S, in accordance with the Weierstrass theorem (Theorem A.4 in Appendix A). The following lemma is a consequence of these observations and of the continuity of the objective and constraint functions over S.

Lemma 3.1 (Closure of X) Consider Problem 3.1, in which we assume that the set S is a non-empty compact subset of Rn, and the objective f0 and constraint functions f1, ..., fq and h1, ..., hr are continuous over S. Let (ȳ, z̄) ∈ Rq × Rr and suppose that X(ȳ, z̄) is a singleton. If {(yk, zk)} is a sequence in Rq × Rr with (yk, zk) → (ȳ, z̄), then for every sequence {xk} such that xk ∈ X(yk, zk) for all k, we have xk → x(ȳ, z̄).

The next theorem follows from Lemma 3.1. We introduce the constraint vectors f : Rn → Rq and h : Rn → Rr, respectively defined by f(x) = (f1(x), ..., fq(x)) and h(x) = (h1(x), ..., hr(x)), and denote by (f, h) the vertical concatenation of all the constraint functions of the considered optimisation problem.

² Notice that the definition of X is not extended to infinite values. Thus, if g(y, z) = −∞, we have X(y, z) = ∅, and x is not defined at (y, z).


Theorem 3.3 (Differentiability of the dual function) Consider Problem 3.1, where S is a non-empty compact set, and f0, f1, ..., fq and h1, ..., hr are assumed to be continuous over S. If (y, z) ∈ Rq × Rr and X(y, z) is a singleton, then the dual function g is differentiable at (y, z) and its gradient at (y, z) is given by

∇g(y, z) = (f(x(y, z)), h(x(y, z))). (3.48)
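Formula (3.48) is easy to test numerically on a toy instance (our own illustrative data, not from the text), by comparing a finite-difference estimate of the dual gradient with the constraint value at the Lagrangian minimiser:

```python
# Check of Theorem 3.3 on the toy problem: minimise x^2 over S = [-3, 3]
# with a single inequality constraint f1(x) = 1 - x <= 0 (q = 1, r = 0).
# Here the Lagrangian x^2 + y(1 - x) has the unique minimiser x(y) = y/2
# in S for y in [0, 6], so (3.48) predicts g'(y) = f1(x(y)).

def x_of_y(y):
    return y / 2.0                       # unique element of X(y)

def g(y):
    x = x_of_y(y)
    return x * x + y * (1.0 - x)         # g(y) = l(x(y), y) = y - y^2/4

def f1(x):
    return 1.0 - x

for y in (0.5, 1.0, 2.0, 3.5):
    eps = 1e-6
    num_grad = (g(y + eps) - g(y - eps)) / (2.0 * eps)   # finite difference
    assert abs(num_grad - f1(x_of_y(y))) < 1e-6          # matches (3.48)
print("gradient formula (3.48) verified")
```

Note that X(y) ceases to be a singleton outside the sampled range only because of the side constraint S; on the tested points the hypotheses of Theorem 3.3 hold.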

Convex problems. Consider now Problem 3.2 and assume that S is compact, that the objective and constraint functions are continuous³ over S, and that the objective function f0 is strictly convex. Under the assumption made in Problem 3.2 that f1, ..., fq are convex and h1, ..., hr affine, it is easily seen that the Lagrangian l(x, y, z) is also strictly convex in x for every (y, z) provided that y ≥ 0. Since the set S is compact, it then follows from Proposition 3.4 and Theorem A.4 that for every (y, z) with y ≥ 0, l(x, y, z) attains its minimum for x ∈ S at exactly one vector of S. Consequently, the set X(y, z) is a singleton for every vector of the dual feasible set, now given by Y = {(y, z) ∈ Rq × Rr | y ≥ 0}, and the function x is defined everywhere in Y. It follows from Lemma 3.1 that, for every (ȳ, z̄) ∈ Y and every sequence {(yk, zk)} in Y such that (yk, zk) → (ȳ, z̄), we have x(yk, zk) → x(ȳ, z̄). By continuity of f and h, we find f(x(yk, zk)) → f(x(ȳ, z̄)) and h(x(yk, zk)) → h(x(ȳ, z̄)), hence the gradient ∇g, given in (3.48), is continuous at (ȳ, z̄), and thus over all of Y.

We infer the following corollary of Lemma 3.1 and Theorem 3.3, which provides a sufficient condition for the continuous differentiability of the dual function of convex problems with strictly convex objectives.

Corollary 3.1 (Differentiability for convex problems) Consider Problem 3.2, where it is further assumed that the set S is a non-empty compact set, the objective and constraint functions f0, f1, ..., fq are continuous over S, and the objective function f0 is strictly convex. Then the function x is defined and continuous over Y = {(y, z) ∈ Rq × Rr | y ≥ 0}, and the dual function g is continuously differentiable over Y, where its gradient is given by (3.48).

As we will see in Section 3.4.4, many efficient optimisation methods are designed for continuously differentiable functions. Hence the continuous differentiability of the dual function proves a very useful property for solving the dual problem. Note that the condition that S be compact is not severely

³ The continuity over S of the objective and constraint functions of Problem 3.2 is guaranteed, in particular, when S is a subset of the interior of [∩_{i=0}^q dom(fi)] ∩ [∩_{i=1}^r dom(hi)].

Example 3.7 (Differentiability of the dual function) Consider the convex optimisation problem depicted in Figure 3.5(a)

minimise_{x ∈ S}  e^{x1} + e^{x2}
subject to  x1 + x2 ≥ a
            x1 − x2 = 0          (3.49)

where we first assume that S = R². The Lagrangian is given by

l(x, y, z) = e^{x1} + e^{x2} + y(−x1 − x2 + a) + z(x1 − x2) = l1(x1, y, z) + l2(x2, y, z),          (3.50)

where we define l1(x1, y, z) = e^{x1} − (y − z)x1 + ay, and l2(x2, y, z) = e^{x2} − (y + z)x2. The function x, introduced in Definition 3.3, is only defined over the set {(y, z) ∈ R² : y > |z|}, where we find x(y, z) = (x1(y, z), x2(y, z)), with

x1(y, z) = ln(y − z),  x2(y, z) = ln(y + z),  y > |z|.          (3.51)

By minimisation of l(·, y, z) we find, for (y, z) ∈ R², g(y, z) = g1(y, z) + g2(y, z), with

g1(y, z) = −∞ if y − z < 0,  ay if y − z = 0,  h(y − z) + ay if y − z > 0,
g2(y, z) = −∞ if y + z < 0,  0 if y + z = 0,  h(y + z) if y + z > 0,          (3.52)

where we have introduced the function h : R>0 → R defined by h(t) = t(1 − ln t), and such that dh(t)/dt = −ln t for t > 0. We find that the dual function g, depicted in Figure 3.5(b), is continuously differentiable over the set {(y, z) ∈ R² : y > |z|} (i.e. the domain of definition of x), where its gradient is given by ∇g(y, z) = (a − ln(y² − z²), ln((y − z)/(y + z))). Notice that ∇g(e^{a/2}, 0) = 0, thus (e^{a/2}, 0) is dual optimal by concavity of g. Since strong duality holds, the solution of (3.49) is unique and given by x(e^{a/2}, 0) = (a/2, a/2).

Suppose now that S is the compact set S = {x ∈ R² : |x1| ≤ b, |x2| ≤ b}, where b is a positive constant satisfying |a| < 2b < ∞. The function x is now defined on all of R² and (3.51) becomes

x1(y, z) = −b if y − z ≤ e^{−b},  ln(y − z) if e^{−b} < y − z < e^{b},  b if y − z ≥ e^{b},
x2(y, z) = x1(y, −z),          (3.53)

while (3.52) becomes

g2(y, z) = e^{−b} + b(y + z) if y + z ≤ e^{−b},  h(y + z) if e^{−b} < y + z < e^{b},  e^{b} − b(y + z) if y + z ≥ e^{b},
g1(y, z) = g2(y, −z) + ay.          (3.54)

The dual function g, depicted in Figure 3.5(c), is continuously differentiable on R², and ∇g is now defined on R² and such that ∇g(y, z) = 0 if and only if (y, z) = (e^{a/2}, 0). Also, the solution of (3.49) is still given by x(e^{a/2}, 0) = (a/2, a/2). Notice that ∇²g is discontinuous on the lines y ± z = e^{−b} and y ± z = e^{b}, and defined and continuous over each of the 9 regions delineated by these borders.

Figure 3.5: Example 3.7. (a) Objective function and constraints. (b) Dual function for z ≥ 0 with b = ∞ (S = R²). (c) Dual function for z ≥ 0 with |a| < 2b < ∞.

Example 3.8 (Dierentiability: non-strictly convex objective) Consider the convex opti-misation problem depicted in Figure 3.6(a)

minimise

x∈S x1+x2

subject to x21+x22a2 x1x2= 0

(3.55)

where S = {x R2 : |x1| ≤ b, |x2| ≤ b} with b < .The Lagrangian is given by l(x, y, z) = l1(x1, y, z) +l2(x2, y, z), where we dene l1(x1, y, z) = (yx1+ (z+ 1))x1a2y and l2(x2, y, z) = (yx2(z1))x2. For the sake of concision, only the dual points(y, z)lying in the zone of interest Y =R×R≥0are considered. The function x is uniquely dened over Y \ {(0,1),(0,1)}, where x(y, z) = (x1(y, z), x2(y, z))and

x1(y, z) =

bf igure ifz+ 12by

z+12y if|z+ 1|<2by b ifz+ 1≤ −2by

, x2(y, z) =

b ifz1≤ −2by

z−1

2y if|z1|<2by b ifz12by

. (3.56) For (y, z) Y \ {(0,1),(0,1)}, we nd g(y, z) = l(x(y, z), y, z) = l1(x1(y, z), y, z) + l2(x2(y, z), y, z) =g1(y, z) +g2(y, z), wheregi(y, z) =li(xi(y, z), y, z)(i= 1,2) and thus

g1(y, z) =

(b2a2)yb(z+ 1) ifz+ 12by

(z+1)4y 2 a2y if|z+ 1|<2by (b2a2)y+b(z+ 1) ifz+ 1≤ −2by

,

g2(y, z) =

b2y+b(z1) ifz1≤ −2by

(z−1)4y 2 if|z1|<2by b2yb(z1) ifz12by

.

(3.57)

Noting that g(0, z) =2b for|z| ≤1, we nd that g is continuous over Y and continuously dier-entiable overY \ {(0,1),(0,1)}, where g(y, z) = (x(y, z)2a2, x1(y, z)x2(y, z)), whileg shows discontinuities at(1,0)and(1,0). Ifb >

2

2 a, theng is zero at the unique point(12a,0), and we infer from (3.56) and Figure 3.6(c) that(

2 2 a,

2

2 a)is the unique solution of (3.55).

Whenb=

2

2 a, notice thatg is maximal and equal to

2aall over the triangle{(y, z)|y0,|z| ≤ 1

2ay}, and again(

2 2 a,

2

2 a)is the unique primal solution.

If b <

2

2 a, the optimality condition (3.22) is satised by all the points of the open segment {(0, z)| |z| < 1}, and the solutions of the dual problem are the points (0, z) with |z| ≤ 1, while the unique primal solution is(b,b).

If b = ∞, then S = R² is not compact, x(y, z) is not defined if y = 0, and (3.56) becomes

x1(y, z) = −(z + 1)/(2y),  x2(y, z) = (z − 1)/(2y),  y > 0. (3.58)

After computations, we find g(y, z) = −(z² + 1)/(2y) − a²y for y > 0, where x is uniquely defined and g is continuously differentiable with gradient ∇g(y, z) = ((z² + 1)/(2y²) − a², −z/y), and g(y, z) = −∞ if y = 0, as depicted in Figure 3.6(b). Notice then that ∇g(1/(√2 a), 0) = 0, and it follows from (3.58) that (−(√2/2)a, −(√2/2)a) is the solution of (3.55).
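As a numerical sanity check (not part of the manuscript), the closed-form expressions for the b = ∞ case can be verified for a = 1; all function names below are illustrative.

```python
import math

# Check of Example 3.8 with b = infinity (S = R^2), taking a = 1.
a = 1.0

def g(y, z):
    # Dual function g(y, z) = -(z^2 + 1)/(2y) - a^2 * y, valid for y > 0.
    return -(z**2 + 1) / (2.0 * y) - a**2 * y

def grad_g(y, z):
    # Gradient from the text: ((z^2 + 1)/(2y^2) - a^2, -z/y).
    return ((z**2 + 1) / (2.0 * y**2) - a**2, -z / y)

# The claimed dual optimum (1/(sqrt(2) a), 0).
y_star, z_star = 1.0 / (math.sqrt(2) * a), 0.0
print(grad_g(y_star, z_star))   # both components vanish up to rounding

# Primal minimiser recovered from (3.58):
x1 = -(z_star + 1) / (2 * y_star)
x2 = (z_star - 1) / (2 * y_star)
print(x1, x2)                   # both equal -sqrt(2)/2

# No duality gap: g(y*, z*) equals the primal value x1 + x2 = -sqrt(2).
print(abs(g(y_star, z_star) - (x1 + x2)) < 1e-12)   # True
```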

3.4. Convex optimisation 61

Figure 3.6: Example 3.8. (a) Objective function and constraints: the circle x1² + x2² = a², the line x1 − x2 = 0, and the solution (−(√2/2)a, −(√2/2)a). (b) Dual function with b = ∞ (S = R²), maximised at (1/(√2 a), 0). (c) Dual function with (√2/2)a < b < ∞, with regions delimited by the lines z ± 1 = ±2by and maximiser (1/(√2 a), 0).

restrictive, even when the feasible set of the problem is unbounded. In practice, it is usual to specify S as a compact set characterised by a collection of suitable (possibly large) lower and upper bounds on the primal variables, so that the feasible region surrounding the optimal points is not affected. This is done in Example 3.7, where it is shown how, in a convex optimisation problem where the objective function is strictly convex, adding a side constraint of the type x ∈ S with S compact can ensure the continuous differentiability of the dual function. The compactness of S alone, however, is not a sufficient condition for the differentiability of the dual function, as seen in Example 3.8, where the gradient of the dual function of a convex optimisation problem with a linear (hence not strictly convex) objective is discontinuous at some dual feasible points even when S is specified as a compact set.

When the objective function of a convex optimisation problem is not strictly convex, differentiability of the dual function can still be achieved through regularisation techniques. These techniques aim to find the optima of unfriendly optimisation problems, e.g. problems with non-strictly convex objective functions, by successively solving (exactly or approximately) sequences of regular problems that get asymptotically close to the initial problem.

Regularisation

Strict convexity of the objective function can be obtained by adding a quadratic term with small positive coefficient ϵ, i.e.

f̃0(x, ϵ) = f0(x) + ϵ∥x∥². (3.59)

Minimising (3.59) leads to suboptimal solutions for the optimisation problem. Note that the term ϵ∥x∥² = ∑_{i=1}^{n} ϵxi² is additively separable with respect to the coordinates of x and does not compromise the separability of the problem, as we shall explain in Section 3.4.5. By decreasing ϵ to values small enough, the solution of the regularised problem can be made arbitrarily close to the set of solutions of the initial problem. It is thus possible to iteratively solve the initial problem by considering a sequence of subproblems with strictly convex objectives f̃0(x, ϵk), where ϵk ↓ 0. That the sequence (ϵk) must vanish leads to implementation issues, and to ill-conditioning issues for small values of ϵk. To avoid these difficulties it is common to resort to more elaborate regularisation techniques such as proximal point methods, or the augmented Lagrangian techniques that will be introduced in Section 3.4.4.
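A minimal sketch of the regularisation (3.59) on a toy instance, assuming a linear (hence not strictly convex) objective c·x minimised over a box; the function name and data are illustrative, not from the text.

```python
# Quadratic regularisation of the linear objective c.x over the box
# [-b, b]^n. The regularised objective c.x + eps*||x||^2 is strictly
# convex and separable, so each coordinate is solved in closed form.
def regularised_minimiser(c, b, eps):
    # Coordinate-wise: minimise c_i*x + eps*x^2 over [-b, b];
    # the unconstrained minimiser -c_i/(2*eps) is clipped to the box.
    return [max(-b, min(b, -ci / (2.0 * eps))) for ci in c]

c, b = [1.0, 1.0], 2.0
for eps in [1.0, 0.1, 0.01]:
    print(eps, regularised_minimiser(c, b, eps))
# As eps decreases to 0, the regularised solutions converge to (-2, -2),
# a solution of the original problem.
```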

Given the optimisation problem of minimising a real function f over a convex set X and a starting point x0 ∈ X, the proximal point algorithm is given by

xk+1 = arg min_{x∈X} { f(x) + (1/(2ck))∥x − xk∥² },  k = 0, 1, 2, ..., (3.60)


where (ck) is a sequence of positive scalars. If f is convex, it is easily shown that (3.60) rewrites as xk+1 = Pk xk, where we introduce the successive operators Pk = (1 + ckT)⁻¹, in which the operator T is such that Tx is a subgradient⁴ of f at x. Then T is called a monotone operator, Pk is single-valued and said to be non-expansive for all k, and the algorithm converges to a solution if the sequence (ck) is bounded away from zero. The proximal point algorithm has the advantage over the regularisation technique (3.59) that the sequence (ck) need not grow to infinity, and thus ill-conditioning can be avoided.
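A minimal sketch of iteration (3.60), assuming f(x) = |x| on X = R, for which the proximal step Pk has a closed form (soft-thresholding); the names and constants below are illustrative choices.

```python
# Proximal point iteration (3.60) for f(x) = |x| on X = R. Here
# prox_{c|.|}(v) = sign(v) * max(|v| - c, 0) (soft-thresholding), so
# x_{k+1} = P_k x_k can be evaluated exactly.
def prox_abs(v, c):
    # argmin_x  |x| + (1/(2c)) * (x - v)^2
    if v > c:
        return v - c
    if v < -c:
        return v + c
    return 0.0

x, c = 5.0, 1.0            # starting point x0 and constant step c_k = c
trace = [x]
for k in range(10):
    x = prox_abs(x, c)     # x_{k+1} = P_k x_k
    trace.append(x)
print(trace)               # reaches the minimiser x* = 0 after 5 steps
```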

3.4.4 Iterative optimisation methods

In this section we briefly discuss the basic optimisation methods for finding stationary points in optimisation problems. These methods are called iterative as they start with some initial guess x0 and generate a sequence of vectors {xk}∞k=1 which is expected to get closer and closer to the set of solutions.

An iterative algorithm is called convergent if any limit point of the sequences of vectors it generates is a stationary point of the problem. Since the scope of this study is limited to convex problems (where stationary points are actual solutions), we will mostly speak in terms of convergence to solutions. The methods listed in this section may nonetheless be used to uncover the stationary points of nonconvex problems. Focus is set on a particular class of optimisation algorithms, called the gradient methods, which allow for parallel implementations and are therefore particularly attractive in the context of distributed network optimisation. Other accepted families of methods which are not intended for parallel execution (e.g. the interior-point and barrier methods [FM68, Wri05], the augmented Lagrangian methods and methods of multipliers [Hes69, Pow69, PT73, Ber76a, Roc76a, BT94, BT97, NW99], or sequential quadratic programming (SQP) [GW12, BT95]) are not discussed in this text.
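As a minimal illustration of such an iterative scheme (a sketch, not a method prescribed by the text; the objective and step size are arbitrary choices), plain gradient descent generates a sequence of iterates approaching the solution:

```python
# Plain gradient descent x_{k+1} = x_k - t * grad_f(x_k), the simplest
# gradient method. Illustrative objective f(x) = (x - 3)^2, whose unique
# minimiser is x* = 3; the step size t = 0.25 is one valid choice.
def gradient_descent(grad_f, x0, t=0.25, iters=50):
    x = x0
    for _ in range(iters):
        x = x - t * grad_f(x)   # move against the gradient
    return x

x_final = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(abs(x_final - 3.0) < 1e-6)   # True: the iterates converge to x* = 3
```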

Convergence of iterative algorithms for convex problems

An iterative optimisation algorithm is globally convergent if the vector sequences it generates are guaranteed to converge to a solution of the problem provided that such a point exists. The main properties that motivate the choice of a globally convergent iterative algorithm include the complexity of its implementation, the computational expense and the resource consumption

⁴The vector s is a subgradient of f at x ∈ dom(f) if f(y) ≥ f(x) + sᵀ(y − x) for all y ∈ dom(f). An operator T is monotone if (Tx − Ty)ᵀ(x − y) ≥ 0 for all x, y. It is said to be non-expansive with respect to the Euclidean norm if ∥Tx − Ty∥ ≤ ∥x − y∥ for all x, y.

overhead it causes, and the quickness (or slowness) with which the generated sequences of vectors are expected to converge. These criteria are correlated in the sense that quick convergence is likely to minimise both the computational load and the energy consumption overhead. In particular, the local convergence of an algorithm is concerned with the asymptotic properties of a sequence generated by the algorithm in the close vicinity of its point of convergence. Local convergence is typically characterised in terms of orders and rates of convergence, which we now define.

Suppose that {xk} is a sequence of vectors of a real vector space which converges to a vector x. We say that {xk} converges Q-linearly to x if there exists a scalar µ ∈ (0, 1) such that

lim sup_{k→∞} ∥xk+1 − x∥ / ∥xk − x∥ = µ. (3.61)

The quantity µ is then called the (asymptotic) rate of convergence of {xk}. A Q-linearly convergent sequence with a small convergence rate converges relatively fast to its limit. If however the rate is close to 1, the convergence will be slow. If (3.61) holds with µ = 1, the sequence {xk} is said to converge sublinearly. If (3.61) holds with µ = 0, then the sequence converges superlinearly.
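The rate µ in (3.61) can be estimated empirically from successive error ratios. The following sketch (illustrative, with synthetic error sequences) contrasts a Q-linear sequence with a superlinear one:

```python
# Estimating the asymptotic rate mu of (3.61) from sample sequences.
# e_k = 0.5**k converges Q-linearly with rate mu = 0.5, while
# e_{k+1} = e_k**2 gives ratios tending to 0, i.e. superlinear
# convergence in the sense of (3.61).
def q_ratios(errors):
    # Successive ratios ||x_{k+1} - x*|| / ||x_k - x*||.
    return [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

linear = [0.5**k for k in range(10)]          # Q-linear, mu = 0.5
quadratic = [0.5**(2**k) for k in range(5)]   # quadratically convergent

print(q_ratios(linear))     # constant ratios equal to 0.5
print(q_ratios(quadratic))  # ratios shrinking towards 0 (superlinear)
```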

The convergence of a sequence can be further characterised by its order.

Let {xk} converge to a vector x. The sequence {xk} is said to converge with order q to x if q ≥ 1 and there exists a scalar µ > 0 such that

lim sup_{k→∞} ∥xk+1 − x∥ / ∥xk − x∥^q = µ. (3.62)

Convergence is thus linear if (3.62) holds with q = 1, and it is called quadratic if (3.62) holds with q = 2.

The definition of the order of convergence is often extended to encompass some sequences which converge reasonably fast but do not satisfy (3.62). A sequence {xk} is then said to converge (at least) with order q to x if there exists a scalar sequence {yk} which converges to 0 with order q according to (3.62) and such that

∥xk − x∥ ≤ yk, ∀k. (3.63)

In particular, sequences which satisfy (3.63) with q = 1 are said to converge at least R-linearly.
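The order q in (3.62) can likewise be checked empirically: for errors satisfying e_{k+1} = µ e_k^q, the ratios e_{k+1}/e_k^q are constant. A small illustrative sketch with q = 2:

```python
# Checking the order of convergence via (3.62): for a sequence with
# errors e_{k+1} = e_k**2 (quadratic convergence, mu = 1), the ratios
# e_{k+1} / e_k**q are constant for q = 2 and vanish for q = 1.
def order_ratios(errors, q):
    return [errors[k + 1] / errors[k]**q for k in range(len(errors) - 1)]

errors = [0.5]
for _ in range(4):
    errors.append(errors[-1]**2)   # e_{k+1} = e_k^2

print(order_ratios(errors, q=2))   # all ratios equal 1 -> order 2
print(order_ratios(errors, q=1))   # ratios -> 0: superlinear under q = 1
```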

In this study we are mainly concerned with algorithms which generate linearly convergent sequences. A vector sequence {xk} in Rn which converges