
other point of X, i.e. if there exists an ϵ > 0 such that

f(x)≤ f(y), ∀y ∈ X. (3.12)

The vector x is a strict local minimum if the inequality in (3.11) is strict for y ≠ x, and a strict global minimum if the inequality in (3.12) is strict for y ≠ x. Similarly, if f : Rn → [−∞,∞) and X ⊂ Rn, a vector x ∈ dom(f) ∩ X is said to be a local (or global) maximum of f over X if it is a local (respectively, global) minimum of −f over X.

Solving (3.10) is generally a difficult task without further assumptions on the feasible set and objective function. Many applications also consider the set X to be convex and closed. A collection of systematic methods for finding local optima over convex sets is reported in Section 3.4.4. The same methods yield global minima when, in addition, the objective function has the attractive property of being convex. The next result states that, if in (3.10) both the feasible set X and the objective function f are convex, then any local minimum of f over X is also a global minimum over X, hence a solution of the problem.

Proposition 3.4 (Minima of convex functions over convex sets) Let f : Rn → (−∞,∞] be a convex function and X a convex subset of Rn. Then any local minimum of f over X is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f over X.

The study of optimisation problems with convex feasible sets and objective functions forms a distinctive branch of the nonlinear optimisation framework, called convex optimisation. The convex optimisation problem will be introduced in Section 3.4.2 and discussed throughout this manuscript.

3.4.1 The general nonlinear optimisation problem

In many nonlinear optimisation problems, the structure of the feasible set X is specified by a system of inequality and equality constraints. We now formulate a variant of (3.10) with explicit characterisation of the feasible set, and refer to it as the general nonlinear optimisation problem.

Problem 3.1 (General nonlinear optimisation problem) Consider the optimisation problem

minimise_{x ∈ S}  f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., q
            hi(x) = 0,  i = 1, ..., r          (3.13)

where f0 : Rn → (−∞,∞] is continuously differentiable, f1, ..., fq and h1, ..., hr are functions Rn → (−∞,∞], and S is a nonempty subset of Rn.

3.4. Convex optimisation 47

We refer to f1, ..., fq as the inequality constraint functions, and to h1, ..., hr as the equality constraint functions. The feasible set of Problem 3.1 is thus given by

X = {x ∈ S | fi(x) ≤ 0, i = 1, ..., q, hj(x) = 0, j = 1, ..., r}.          (3.14)

Notice that equality constraints are redundant in theoretical analyses, since any equality constraint hi(x) = 0 can be replaced by a system of two inequality constraints hi(x) ≤ 0, −hi(x) ≤ 0. Besides, the constraint x ∈ S, called side constraint, has minor importance and is often omitted in the literature.

Indeed, it is easily seen that the problem minimise_{x ∈ S} f0(x) is equivalent to minimise_x f0(x) + χS(x), where the characteristic function χS is defined as χS(x) = 0 if x ∈ S, and χS(x) = ∞ if x ∉ S. We nevertheless decide to keep both the equality constraints and the side constraint in the formulation of Problem 3.1, as they sometimes simplify the notation and analysis.

3.4.2 Convex optimisation

Introduction and examples

The developments of this study are mainly concerned with convex optimisation. The problem stated in (3.10) is called a convex optimisation problem when the feasible set X is convex and f is a convex, continuously differentiable function. Convex optimisation problems have the specificity that all their minimisers are global minimisers, in accordance with Proposition 3.4.

A variant of the convex optimisation problem arises when a concave function f : Rn → [−∞,∞) is to be maximised over a convex set, which is equivalent to the convex problem of minimising −f over the same set and can be considered in a similar way. We refer to this problem as the convex optimisation problem with concave objective.

Basic examples of convex optimisation problems, such as linear programs and quadratic programs, are presented in Examples 3.1 and 3.2.

Example 3.1 (Linear program) A linear programming (LP) problem (also called linear program) is a convex optimisation problem with affine objective and constraint functions. It can be shown that every linear program can be rewritten in the standard form

minimise_x  cᵀx
subject to  Ax = b
            x ≥ 0          (3.15)

where the variable x belongs to some real space Rn, c is a vector of Rn, A a r × n real matrix, and b a vector of Rr.

Example 3.2 (Quadratic program) A quadratic programming (QP) problem (sometimes quadratic program) is usually defined as a convex optimisation problem with a quadratic objective function and affine constraint functions. The quadratic program can be formulated as

minimise_x  (1/2) xᵀCx + dᵀx
subject to  Ax = b
            Gx ≤ h          (3.16)

where the variable x is a vector of Rn, C is a symmetric matrix of Rn×n, d ∈ Rn, A ∈ Rr×n, b ∈ Rr, G ∈ Rq×n, and h ∈ Rq.

Another example of a convex optimisation problem with quadratic objective function is the orthogonal projection on a convex set, defined in Example 3.3.

In Chapter 4 we will use the scaled projection operator, which is introduced in Example 3.4 as an extension of the orthogonal projection operator.

Example 3.3 (Orthogonal projection) The orthogonal projection [x]⁺_X of a vector x ∈ Rn on a closed convex set X is defined by

[x]⁺_X = arg min_{y ∈ X} ∥x − y∥²,          (3.17)

where ∥x∥ = (xᵀx)^{1/2} is the Euclidean norm of x (see also Equation (A.1) in Appendix A). Equivalently, [x]⁺_X can be defined as the solution of the convex optimisation problem

minimise_y  (x − y)ᵀ(x − y)
subject to  y ∈ X          (3.18)

which reduces to a quadratic program when X is a polyhedron specified by a finite collection of affine inequality constraints.
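As a concrete illustration (our own toy instance, not taken from the text), when X is a box the projection (3.17) admits a closed form, since problem (3.18) then separates across coordinates. A minimal Python sketch, with a helper name of our choosing:

```python
# Sketch: orthogonal projection (3.17) onto the box X = [lo, hi]^n.
# For a box, problem (3.18) separates across coordinates, so the
# projection reduces to componentwise clamping (a standard fact).

def project_box(x, lo, hi):
    """Clamp each component of x to the interval [lo, hi]."""
    return [min(max(xi, lo), hi) for xi in x]

x = [2.0, -3.0, 0.5]
p = project_box(x, -1.0, 1.0)
print(p)                                  # [1.0, -1.0, 0.5]

# The projection satisfies (x - p)^T (y - p) <= 0 for every y in X;
# we spot-check this variational characterisation on a few points.
for y in ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]):
    assert sum((xi - pi) * (yi - pi) for xi, pi, yi in zip(x, p, y)) <= 1e-12
```

Components of x already inside the box are left untouched, as expected from the separability of (3.18).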

Example 3.4 (Scaled projection) For any vector x ∈ Rn and any symmetric, positive definite scaling matrix T ∈ Rn×n, we define the scaled norm of x by

∥x∥_T = (xᵀT x)^{1/2}.          (3.19)

The scaled projection [x]⁺_{X,T} of x on a closed convex set X is defined as

[x]⁺_{X,T} = arg min_{y ∈ X} ∥x − y∥²_{T⁻¹}.          (3.20)

If T is the identity matrix, ∥x∥_T reduces to the Euclidean norm of x, and [x]⁺_{X,T} to the orthogonal projection of x on X, denoted by [x]⁺_X. More generally, [x]⁺_{X,T} is the solution of the convex optimisation problem

minimise_y  (x − y)ᵀT⁻¹(x − y)
subject to  y ∈ X          (3.21)

which reduces to a quadratic program when X is a polyhedron. Some useful properties of the scaled norm and the scaled projection operator are derived in Appendix B.4.
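As a further toy illustration (ours, with a standard closed form not derived in the text): when X is a halfspace {y : aᵀy ≤ β}, the stationarity condition of (3.21) gives [x]⁺_{X,T} = x − T a max(0, aᵀx − β)/(aᵀT a). A sketch with a diagonal scaling matrix:

```python
# Sketch: scaled projection (3.20)-(3.21) onto the halfspace
# X = {y : a^T y <= beta}, with a diagonal scaling matrix T = diag(t).
# Closed form derived from the optimality conditions of (3.21):
#   [x]+_{X,T} = x - T a max(0, a^T x - beta) / (a^T T a).

def scaled_project_halfspace(x, a, beta, t):
    ax = sum(ai * xi for ai, xi in zip(a, x))
    if ax <= beta:
        return list(x)                    # x is already feasible
    ata = sum(ai * ti * ai for ai, ti in zip(a, t))     # a^T T a
    lam = (ax - beta) / ata
    return [xi - ti * ai * lam for xi, ai, ti in zip(x, a, t)]

x, a, beta = [2.0, 2.0], [1.0, 1.0], 0.0
p_scaled = scaled_project_halfspace(x, a, beta, [1.0, 4.0])  # T = diag(1, 4)
p_eucl = scaled_project_halfspace(x, a, beta, [1.0, 1.0])    # T = I
# With T = I the scaled projection coincides with the orthogonal one;
# both results lie on the boundary a^T y = 0 of the halfspace.
assert abs(sum(p_scaled)) < 1e-9 and abs(sum(p_eucl)) < 1e-9
```

Note how the scaling tilts the projection: with T = diag(1, 4), the point (2, 2) maps near (1.2, −1.2), while the Euclidean projection (T = I) is (0, 0).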

Optimality condition

The optimality condition in convex optimisation problems takes the following form.


Proposition 3.5 (First-order optimality condition) Let f : Rn → (−∞,∞] be a convex continuously differentiable function and X a convex subset of Rn. A vector x ∈ dom(f) ∩ X is a global minimum of f over X if and only if

∇f(x)ᵀ(y − x) ≥ 0,  ∀y ∈ X.          (3.22)

If the vector x is an interior point of X, then (3.22) reduces to ∇f(x) = 0.

The first-order optimality condition at a given point of the convex set X can be interpreted in terms of the existence of descent directions at this point. A vector d ∈ Rn is called a feasible direction at a point x ∈ X if one can find an ϵ > 0 such that x + ϵd ∈ X. The vector d is said to be a descent direction for f at x if ∇f(x)ᵀd < 0. Hence, the condition (3.22) states that no feasible descent direction exists at x, or equivalently, that the value of the function cannot be decreased by a displacement from x along any feasible direction. The computation of descent directions plays an important part in many iterative algorithms for convex optimisation. If such a direction can be found at a feasible point xk, then it is always possible to find a `better' point xk+1 such that f(xk+1) < f(xk) by searching along the considered descent direction.

In a broader context where f is not necessarily convex, any vector x ∈ dom(f) ∩ X which satisfies (3.22) is said to be a stationary point. Note that, when the function f is not convex, this condition is satisfied by local minima but also by some other points such as local maxima. When f is convex, all the stationary points are global minima over X and thus solutions of the considered optimisation problem. Since the developments of this manuscript are mainly concerned with the minimisation of convex functions over convex sets, we will mostly speak of global minima and solutions (rather than of stationary points) so as to avoid any confusion.

Standard formulation

In this section, we formulate the convex optimisation problem in standard form, where inequality and equality constraints are displayed explicitly. The standard form of the convex optimisation problem resembles the general nonlinear optimisation problem (Problem 3.1), with the additional assumption that the objective and inequality constraint functions are convex, the equality constraint functions are affine, and S is a convex set.

Problem 3.2 (Convex optimisation problem in standard form) Consider the optimisation problem

minimise_{x ∈ S}  f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., q
            hi(x) = 0,  i = 1, ..., r          (3.23)

where f0 : Rn → (−∞,∞] is convex continuously differentiable, f1, ..., fq are convex functions Rn → (−∞,∞], h1, ..., hr are affine functions Rn → R, and S is a convex subset of Rn.

It follows from Proposition B.4 in Appendix B.1 that the feasible set of Problem 3.2, still given by (3.14), is now the intersection of convex sets and thus convex. Notice that the assumption that any equality constraint hi(x) = 0 is affine can be explained by the fact that this equality constraint should be equivalent to the system of the two convex inequality constraints hi(x) ≤ 0 and −hi(x) ≤ 0, which is only possible if hi is both concave and convex and thus affine. Again, the equality constraints and the side constraint are dispensable, but their presence will simplify the notation in the sequel.

3.4.3 Lagrange duality

In this section we consider the general nonlinear optimisation problem formulated in Section 3.4.1 (Problem 3.1). Note that neither the objective function of Problem 3.1, nor its feasible set (3.14), is necessarily convex.

The dual problem and weak duality

The Lagrange duality framework consists of considering simultaneously the objective and constraints of the problem by augmenting the objective function with a weighted sum of the constraint functions. In Problem 3.1, the Lagrangian function l : Rn × Rq × Rr → (−∞,∞] is defined by

l(x, y, z) = f0(x) + ∑_{i=1}^q yi fi(x) + ∑_{i=1}^r zi hi(x)          (3.24)

where y = (y1, ..., yq) is a vector of Rq, z = (z1, ..., zr) is a vector of Rr, and y1, ..., yq and z1, ..., zr are called the Lagrange multipliers or dual variables.

The Lagrange dual function g is obtained by minimisation of the Lagrangian along the primal variables x1, ..., xn, i.e.

g(y, z) = inf_{x ∈ S ∩ D} l(x, y, z)          (3.25)

where we define D = [∩_{i=1}^q dom(fi)] ∩ [∩_{i=1}^r dom(hi)]. Note that when the problem is convex, we can simply write g(y, z) = inf_{x ∈ S} l(x, y, z).

It is reasonable to assume that the objective function f0 takes a finite value at at least one vector of Rn, i.e. dom(f0) ≠ ∅ (such a function is sometimes called proper). Under this assumption, the dual function g : Rq × Rr → [−∞,∞) is concave, as the pointwise infimum over S ∩ D of l(x, y, z) regarded as an affine function of (y, z) with parameter x (see Appendix B.1). Notice that g is concave even when Problem 3.1 is not convex.

Recall (3.14) and consider the optimal value of Problem 3.1, given by

p⋆ = inf_{x ∈ X} f0(x).          (3.26)

For any solution x⋆ of Problem 3.1, we have f0(x⋆) = p⋆. Suppose now that x is a feasible point of Problem 3.1. We have x ∈ S, fi(x) ≤ 0 for i = 1, ..., q, and hi(x) = 0 for i = 1, ..., r. It follows that l(x, y, z) ≤ f0(x) and thus g(y, z) ≤ f0(x) for any (y, z) such that y ≥ 0. Since the last result holds for any feasible x, we find g(y, z) ≤ p⋆ for any (y, z) in Rq≥0 × Rr, i.e. the dual function gives a lower bound on the optimal value p⋆.
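As a toy illustration of this bound (our own example, not from the text): for the problem of minimising x² subject to 1 − x ≤ 0, with S = R, the dual function has a closed form that can be checked against p⋆:

```python
# Toy instance of Problem 3.1: minimise x^2 subject to f1(x) = 1 - x <= 0.
# The Lagrangian l(x, y) = x^2 + y(1 - x) is minimised at x = y/2,
# giving the dual function g(y) = y - y^2/4; the primal optimal value
# is p* = 1, attained at x = 1.

def g(y):
    return y - y * y / 4.0

p_star = 1.0
# Weak duality: g(y) <= p* for every y >= 0 ...
assert all(g(0.1 * k) <= p_star + 1e-12 for k in range(81))
# ... and the lower bound is tight at y = 2.
print(g(2.0))     # 1.0
```

Here the bound is attained, anticipating the strong duality discussion below.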

The question that duality theory raises is how close the largest lower bound supplied by the dual function,

d⋆ = sup_{y ≥ 0, z} g(y, z),          (3.27)

is to the optimal value p⋆. The quantity d⋆ is in fact the optimal value of the dual problem

maximise_{y,z}  g(y, z)
subject to  y ≥ 0          (3.28)

which is a convex optimisation problem with concave objective, where the inequality constraint takes the simple form y ≥ 0. The feasible set of the dual problem, or dual feasible set, is the set

Y = {(y, z) ∈ Rq × Rr | y ≥ 0, g(y, z) > −∞}.          (3.29)

Any vector (y, z) ∈ Y is called dual feasible. It is common to refer to the initial problem (3.13) as the primal problem, and to any feasible point of the primal problem as primal feasible.

For any pair of primal and dual feasible vectors x, (y, z), we have

g(y, z) ≤ d⋆ ≤ p⋆ ≤ f0(x).          (3.30)

We are thus able to locate the optimal values of the primal and dual problems in the interval [g(y, z), f0(x)]. The width f0(x) − g(y, z) of this interval is called the duality gap associated with x and (y, z). The duality gap is used in particular to derive stopping criteria for iterative algorithms, in the form of error bounds for the estimation of the optimal values.

The smallest achievable duality gap is the quantity p⋆ − d⋆, called the optimal duality gap of the problem, and we have p⋆ − d⋆ ≥ 0. The nonnegativity of the optimal duality gap is called the weak duality principle. Weak duality states that the optimal value of the dual problem is an underestimator of that of the primal problem. It is thus possible to find a lower bound on the optimal value of Problem 3.1 by solving its dual (3.28), which is a convex optimisation problem (with concave objective) amenable to the convex optimisation methods discussed further in the chapter.

Strong duality and constraint qualication

In the case when the optimal duality gap is zero, i.e. p⋆ = d⋆, one says that strong duality holds. Under strong duality, the best lower bound for p⋆ provided by the dual function is tight, and the optimal value of the primal problem can be found by solving the dual problem.

Although strong duality does not necessarily hold for the general formulation of Problem 3.1, there exist some assumptions on the solutions of the primal problem, called regularity conditions or constraint qualifications, that guarantee the existence of solutions of the dual problem with zero optimal duality gap. A minimiser of the primal problem which satisfies a constraint qualification is called a regular point. For instance, a minimiser x of Problem 3.1 in the interior of S is regular if the inequality and equality constraint functions are all affine (linearity constraint qualification), if the constraint functions are continuously differentiable at x and the gradients at x of the active inequality constraints and of the equality constraints are linearly independent (linear independence constraint qualification), or even if these gradients are only positive-linearly independent (Mangasarian-Fromovitz constraint qualification).

Strong duality is easier to obtain in the convex setting of Problem 3.2.

One constraint qualification, called the Slater condition, is specific to convex optimisation problems. Problem 3.2 satisfies the Slater condition if one can find a feasible vector x where the non-affine inequality constraints hold with strict inequality (such a point is sometimes called strictly feasible), i.e. if there exists an x in the interior of S such that fi(x) ≤ 0 for i = 1, ..., q, hi(x) = 0 for i = 1, ..., r, and fi(x) < 0 for i ∈ I, where I ⊂ {1, ..., q} denotes the index set of the inequality constraint functions that are not affine. The notion of constraint qualification is illustrated by Example B.1 in Appendix B.3.

The concept of duality is explained in Example 3.5 for the case when the optimisation problem is a linear program. When the orthogonal and scaled projections, previously introduced in Examples 3.3 and 3.4, are performed on polyhedral sets, they reduce to quadratic programs with strictly convex objective. The duality of a general instance of a quadratic program is considered in Example 3.6.


Example 3.5 (Dual of a linear program) Consider the linear program in standard form (3.15).

The Lagrangian is given by

l(x, y, z) = cᵀx − yᵀx + zᵀ(Ax − b) = (Aᵀz − y + c)ᵀx − bᵀz,          (3.31)

where we introduce y ∈ Rn as the vector of the dual variables corresponding to the inequality constraints, and z ∈ Rr as the vector of the dual variables related to the equality constraints. By minimisation of the Lagrangian with respect to x, we obtain the dual function

g(y, z) = −bᵀz if Aᵀz − y + c = 0,  and  g(y, z) = −∞ if Aᵀz − y + c ≠ 0.          (3.32)

It follows from (3.32) that the dual problem maximise_{y ≥ 0, z} g(y, z) reduces to

maximise_{(y,z)}  −bᵀz
subject to  Aᵀz − y + c = 0
            y ≥ 0          (3.33)

or, equivalently,

minimise_z  bᵀz
subject to  Aᵀz + c ≥ 0          (3.34)

which is a linear program. Since strong duality holds, the problem (3.15) and its dual (3.33) have the same optimal value. Similarly, it is easily seen that the dual of (3.34) yields back the initial problem (3.15).
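The identity between the two optimal values is easy to check numerically on a tiny standard-form instance (the data below are an arbitrary illustrative choice, not from the text):

```python
# LP duality on a tiny instance of (3.15):
#   minimise c^T x  subject to  x1 + x2 = 1, x >= 0,  with c = (1, 2).
# The feasible set is a segment with vertices (1, 0) and (0, 1), so the
# primal optimal value can be found by enumerating the vertices.

c = [1.0, 2.0]
vertices = [[1.0, 0.0], [0.0, 1.0]]
p_star = min(sum(ci * xi for ci, xi in zip(c, v)) for v in vertices)

# Dual problem (3.33): maximise -b^T z subject to A^T z - y + c = 0,
# y >= 0; here A = [1 1] and b = 1, so dual feasibility reads
# z + c_i >= 0 for each i, i.e. z >= -1, and the objective is -z.
feasible_z = [-1.0 + 0.01 * k for k in range(301)]   # samples of z >= -1
d_star = max(-z for z in feasible_z)

print(p_star, d_star)    # 1.0 1.0 -- the optimal values coincide
assert p_star == d_star == 1.0
```

The dual maximum is attained on the boundary z = −1 of the dual feasible set, mirroring the active vertex x = (1, 0) of the primal.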

Example 3.6 (Dual of a quadratic program) We study the dual of the quadratic program (3.16), where we assume that the symmetric matrix C is positive definite and thus invertible. The Lagrangian is given by

l(x, y, z) = (1/2) xᵀCx + dᵀx + yᵀ(Gx − h) + zᵀ(Ax − b) = (1/2) xᵀCx + (Gᵀy + Aᵀz + d)ᵀx − (hᵀy + bᵀz),          (3.35)

where y ∈ Rq and z ∈ Rr are the dual variable vectors. For any (y, z) ∈ Rq × Rr, the Lagrangian is minimised when its derivative with respect to x is zero, i.e. when Cx + Gᵀy + Aᵀz + d = 0, or equivalently x = −C⁻¹(Gᵀy + Aᵀz + d). It follows that the dual function is given by

g(y, z) = −(1/2)(Gᵀy + Aᵀz + d)ᵀC⁻¹(Gᵀy + Aᵀz + d) − (hᵀy + bᵀz)          (3.36)
        = −(1/2)(y, z)ᵀ[(G, A)C⁻¹(G, A)ᵀ](y, z) + [−(h, b) − (G, A)C⁻¹d]ᵀ(y, z) − (1/2) dᵀC⁻¹d          (3.37)

where (G, A) ∈ R(q+r)×n denotes the vertical concatenation of the matrices G and A, and it is easily seen that (G, A)C⁻¹(G, A)ᵀ is a positive semidefinite symmetric matrix with dimension (q + r).

The dual function g is thus a concave quadratic function with affine gradient

∇g(y, z) = −[(G, A)C⁻¹(G, A)ᵀ](y, z) − (h, b) − (G, A)C⁻¹d,  for all (y, z) ∈ Rq × Rr,

and constant Hessian ∇²g(y, z) = −(G, A)C⁻¹(G, A)ᵀ. If I denotes the identity matrix in Rq×q and we set C̃ = (G, A)C⁻¹(G, A)ᵀ and d̃ = −(h, b) − (G, A)C⁻¹d, we find that the dual problem is given by

maximise_{(y,z)}  −(1/2)(y, z)ᵀC̃(y, z) + d̃ᵀ(y, z)
subject to  (I, 0)(y, z) ≥ 0          (3.38)

which is a quadratic program in concave form with concave objective and affine inequality constraints.

Complementary slackness and Karush-Kuhn-Tucker conditions

In this section we discuss the optimality conditions of primal minimisers. Consider Problem 3.1 and suppose that, under some constraint qualification, the optimal duality gap is zero. Let x⋆ and (y⋆, z⋆) be solutions of the primal and dual problems, respectively. We have

d⋆ = g(y⋆, z⋆) ≤ l(x⋆, y⋆, z⋆) ≤ f0(x⋆) = p⋆.          (3.39)

Since d⋆ = p⋆, we find that the inequalities in (3.39) hold with equality. It follows on the one hand that x⋆ and (y⋆, z⋆) have zero duality gap, and on the other hand that ∑_{i=1}^q y⋆i fi(x⋆) = 0, and thus

y⋆i fi(x⋆) = 0,  i = 1, ..., q,          (3.40)

by nonnegativity of y⋆1, ..., y⋆q. The condition (3.40), known as complementary slackness, is a necessary condition for the optimality of a primal-dual pair of vectors.

In many applications the objective function f0 and constraint functions f1, ..., fq and h1, ..., hr happen to be continuously differentiable, in which case the optimality conditions take a particular form. Under the assumption of zero optimal duality gap, we consider two points x⋆ and (y⋆, z⋆), respectively solutions of the primal and dual problems. Recalling that the inequalities in (3.39) hold with equality, it follows from (3.25) that x⋆ minimises l(·, y⋆, z⋆) over S ∩ D. If x⋆ is an interior point of S, and f0, f1, ..., fq and h1, ..., hr are continuously differentiable at x⋆, then the gradient at x⋆ of l(x, y⋆, z⋆) seen as a function of x must be 0, i.e. ∇f0(x⋆) + ∑_{i=1}^q y⋆i ∇fi(x⋆) + ∑_{i=1}^r z⋆i ∇hi(x⋆) = 0. The next result summarises the necessary optimality conditions of a primal-dual pair under strong duality.

Proposition 3.6 (Karush-Kuhn-Tucker (KKT) conditions) Let x⋆ be a local minimiser of Problem 3.1 with dom(f0) ≠ ∅, and assume that x⋆ is an interior point of S, that f0, ..., fq and h1, ..., hr are continuously differentiable at x⋆, and that a constraint qualification holds at x⋆. Then there exists a point (y⋆, z⋆) ∈ Rq × Rr satisfying

fi(x⋆) ≤ 0,  i = 1, ..., q,          (3.41)
hi(x⋆) = 0,  i = 1, ..., r,          (3.42)
y⋆i ≥ 0,  i = 1, ..., q,          (3.43)
y⋆i fi(x⋆) = 0,  i = 1, ..., q,          (3.44)
∇f0(x⋆) + ∑_{i=1}^q y⋆i ∇fi(x⋆) + ∑_{i=1}^r z⋆i ∇hi(x⋆) = 0.          (3.45)


In particular, (3.41) and (3.42) are called the primal feasibility conditions, (3.43) the dual feasibility condition, and (3.44) can be identified as the complementary slackness condition previously stated in (3.40). Notice, in particular, that if x⋆ is a global minimiser of Problem 3.1 satisfying the KKT conditions together with a dual pair (y⋆, z⋆), then

g(y⋆, z⋆) = l(x⋆, y⋆, z⋆) = f0(x⋆),          (3.46)

where the first equality follows from (3.45) and the second from (3.42) and (3.44), and it follows that x⋆ and (y⋆, z⋆) are primal and dual optimal with zero duality gap. The KKT conditions will be used in Chapter 4 for the convergence analysis of parallel optimisation methods. One specificity of convex optimisation is that the KKT optimality conditions (3.41)-(3.45) are also sufficient. The next proposition provides sufficient optimality conditions for the convex optimisation problem in standard form.

Proposition 3.7 (KKT conditions for convex problems) Consider Problem 3.2 with dom(f0) ≠ ∅. Let x be a vector in the interior of S where f0, ..., fq and h1, ..., hr are continuously differentiable. The point x is a solution if and only if there exists a point (y, z) ∈ Rq × Rr which, together with x, satisfies the Karush-Kuhn-Tucker conditions (3.41)-(3.45).

One should keep in mind that the existence of a primal-dual pair of points with zero duality gap is conditioned by the satisfaction of a constraint qualification such as the Slater condition. In Appendix B.3 we provide a proof of Proposition 3.7 based on the characterisation of closed convex sets by their containing halfspaces (Proposition 3.1), as well as a geometric interpretation of the constraint qualification issue.
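The conditions (3.41)-(3.45) can be verified mechanically at a candidate primal-dual pair. A minimal sketch on a small convex problem of our own choosing (not an example from the text):

```python
# Checking the KKT conditions (3.41)-(3.45) on the toy convex problem
#   minimise 1/2 ||x||^2  subject to  f1(x) = x1 + x2 + 2 <= 0,
# whose solution is x* = (-1, -1) with multiplier y* = 1 (no equality
# constraints, so (3.42) is vacuous and there is no z).

x_star = [-1.0, -1.0]
y_star = 1.0

f1 = x_star[0] + x_star[1] + 2.0        # inequality constraint value
grad_f0 = list(x_star)                  # gradient of 1/2 ||x||^2 at x*
grad_f1 = [1.0, 1.0]                    # gradient of f1

assert f1 <= 0.0                        # (3.41) primal feasibility
assert y_star >= 0.0                    # (3.43) dual feasibility
assert y_star * f1 == 0.0               # (3.44) complementary slackness
stationarity = [g0 + y_star * g1 for g0, g1 in zip(grad_f0, grad_f1)]
assert stationarity == [0.0, 0.0]       # (3.45) stationarity
print("KKT conditions hold at (x*, y*)")
```

By Proposition 3.7, passing all four checks certifies that x⋆ is a solution of this convex problem.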

Differentiability of the dual function

One difficulty in solving the dual of an optimisation problem is that an expression for the dual function g is not explicitly available. Indeed, the computation of g(y, z) via (3.25) requires, in the general case, to solve a different optimisation problem for each new value of (y, z). It is nonetheless possible to solve the dual problem efficiently when the dual function is differentiable. In this section, we discuss the conditions that guarantee the differentiability of the function g for Problems 3.1 and 3.2.

Non-convex optimisation. We first introduce a point-to-set mapping X and a function x, which establish a connection between dual and primal points.

Definition 3.3 (Mapping X and function x) Consider Problem 3.1 and the set-valued mapping X : Rq × Rr → 2^{Rn} defined by

X(y, z) = arg min_{x ∈ S} l(x, y, z),  (y, z) ∈ Rq × Rr.          (3.47)

At every (y, z) ∈ Rq × Rr where X(y, z) is a singleton, we define x(y, z) as the only element of X(y, z).

According to the above definition², and for any (y, z) ∈ Rq × Rr, a vector x ∈ S belongs to X(y, z) if and only if g(y, z) = l(x, y, z). Depending on the problem and the point (y, z), the number of elements of X(y, z) may be zero, one (in which exclusive case x(y, z) is defined), finite, infinite, or even the entire set S. Also, x is only defined on a subset of Rq × Rr, where the dual function g can be shown to be differentiable under certain conditions discussed in the rest of the section. In particular, we have g(y, z) = l(x(y, z), y, z) wherever x(y, z) is defined.

The next lemma and theorem are known results of nonlinear optimisation.

For these results it is assumed that the set S is compact, i.e. closed and bounded. The boundedness of S implies that any sequence of vectors in S has at least one limit point (see Theorem A.3 in Appendix A), and the closedness of S that every limit point of this sequence lies in S. It is also assumed that the objective and constraint functions are continuous over S, which guarantees that the Lagrangian l(x, y, z) is continuous in x over S for any (y, z), and that its minimum for x ∈ S is attained in S, in accordance with the Weierstrass theorem (Theorem A.4 in Appendix A). The following lemma is a consequence of these observations and of the continuity of the objective and constraint functions over S.

Lemma 3.1 (Closure of X) Consider Problem 3.1, in which we assume that the set S is a non-empty compact subset of Rn, and the objective f0 and constraint functions f1, ..., fq and h1, ..., hr are continuous over S. Let (ȳ, z̄) ∈ Rq × Rr and suppose that X(ȳ, z̄) is a singleton. If {(yk, zk)} is a sequence in Rq × Rr with (yk, zk) → (ȳ, z̄), then for every sequence {xk} such that xk ∈ X(yk, zk) for all k, we have xk → x(ȳ, z̄).

The next theorem follows from Lemma 3.1. We introduce the constraint vectors f : Rn → Rq and h : Rn → Rr, respectively defined by f(x) = (f1(x), ..., fq(x)) and h(x) = (h1(x), ..., hr(x)), and denote by (f, h) the vertical concatenation of all the constraint functions of the considered optimisation problem.

² Notice that the definition of X is not extended to infinite values. Thus, if g(y, z) = −∞, we have X(y, z) = ∅, and x is not defined at (y, z).


Theorem 3.3 (Differentiability of the dual function) Consider Problem 3.1, where S is a non-empty compact set, and f0, f1, ..., fq and h1, ..., hr are assumed to be continuous over S. If (y, z) ∈ Rq × Rr and X(y, z) is a singleton, then the dual function g is differentiable at (y, z) and its gradient at (y, z) is given by

∇g(y, z) = (f(x(y, z)), h(x(y, z))). (3.48)
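Formula (3.48) is easy to test numerically on a toy instance (our own illustrative data, not from the text), by comparing a finite-difference estimate of the dual gradient with the constraint value at the Lagrangian minimiser:

```python
# Check of Theorem 3.3 on the toy problem: minimise x^2 over S = [-3, 3]
# with a single inequality constraint f1(x) = 1 - x <= 0 (q = 1, r = 0).
# Here the Lagrangian x^2 + y(1 - x) has the unique minimiser x(y) = y/2
# in S for y in [0, 6], so (3.48) predicts g'(y) = f1(x(y)).

def x_of_y(y):
    return y / 2.0                       # unique element of X(y)

def g(y):
    x = x_of_y(y)
    return x * x + y * (1.0 - x)         # g(y) = l(x(y), y) = y - y^2/4

def f1(x):
    return 1.0 - x

for y in (0.5, 1.0, 2.0, 3.5):
    eps = 1e-6
    num_grad = (g(y + eps) - g(y - eps)) / (2.0 * eps)   # finite difference
    assert abs(num_grad - f1(x_of_y(y))) < 1e-6          # matches (3.48)
print("gradient formula (3.48) verified")
```

Note that X(y) ceases to be a singleton outside the sampled range only because of the side constraint S; on the tested points the hypotheses of Theorem 3.3 hold.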

Convex problems. Consider now Problem 3.2 and assume that S is compact, that the objective and constraint functions are continuous³ over S, and that the objective function f0 is strictly convex. Under the assumption made in Problem 3.2 that f1, ..., fq are convex and h1, ..., hr affine, it is easily seen that the Lagrangian l(x, y, z) is also strictly convex in x for every (y, z) provided that y ≥ 0. Since the set S is compact, it then follows from Proposition 3.4 and Theorem A.4 that for every (y, z) with y ≥ 0, l(x, y, z) attains its minimum for x ∈ S at exactly one vector of S. Consequently, the set X(y, z) is a singleton for every vector of the dual feasible set, now given by Y = {(y, z) ∈ Rq × Rr | y ≥ 0}, and the function x is defined everywhere in Y. It follows from Lemma 3.1 that, for every (ȳ, z̄) ∈ Y and every sequence {(yk, zk)} in Y such that (yk, zk) → (ȳ, z̄), we have x(yk, zk) → x(ȳ, z̄). By continuity of f and h, we find f(x(yk, zk)) → f(x(ȳ, z̄)) and h(x(yk, zk)) → h(x(ȳ, z̄)), hence the gradient ∇g, given in (3.48), is continuous at (ȳ, z̄), and thus over all of Y.

We infer the following corollary of Lemma 3.1 and Theorem 3.3, which provides a sufficient condition for the continuous differentiability of the dual function of convex problems with strictly convex objectives.

Corollary 3.1 (Differentiability for convex problems) Consider Problem 3.2, where it is further assumed that the set S is a non-empty compact set, the objective and constraint functions f0, f1, ..., fq are continuous over S, and the objective function f0 is strictly convex. Then the function x is defined and continuous over Y = {(y, z) ∈ Rq × Rr | y ≥ 0}, and the dual function g is continuously differentiable over Y, where its gradient is given by (3.48).

As we will see in Section 3.4.4, many efficient optimisation methods are designed for continuously differentiable functions. Hence the continuous differentiability of the dual function proves a very useful property for solving the dual problem. Note that the condition that S be compact is not severely

³ The continuity over S of the objective and constraint functions of Problem 3.2 is guaranteed, in particular, when S is a subset of the interior of [∩_{i=0}^q dom(fi)] ∩ [∩_{i=1}^r dom(hi)].

Example 3.7 (Differentiability of the dual function) Consider the convex optimisation problem depicted in Figure 3.5(a)

minimise_{x ∈ S}  e^{x1} + e^{x2}
subject to  x1 + x2 ≥ a
            x1 − x2 = 0          (3.49)

where we first assume that S = R². The Lagrangian is given by

l(x, y, z) = e^{x1} + e^{x2} + y(−x1 − x2 + a) + z(x1 − x2) = l1(x1, y, z) + l2(x2, y, z),          (3.50)

where we define l1(x1, y, z) = e^{x1} − (y − z)x1 + ay, and l2(x2, y, z) = e^{x2} − (y + z)x2. The function x, introduced in Definition 3.3, is only defined over the set {(y, z) ∈ R² : y > |z|}, where we find x(y, z) = (x1(y, z), x2(y, z)), with

x1(y, z) = ln(y − z),  x2(y, z) = ln(y + z),  y > |z|.          (3.51)

By minimisation of l(·, y, z) we find, for (y, z) ∈ R², g(y, z) = g1(y, z) + g2(y, z), with

g1(y, z) = −∞ if y − z < 0,  ay if y − z = 0,  h(y − z) + ay if y − z > 0,
g2(y, z) = −∞ if y + z < 0,  0 if y + z = 0,  h(y + z) if y + z > 0,          (3.52)

where we have introduced the function h : R>0 → R defined by h(t) = t(1 − ln t), and such that dh(t)/dt = −ln t for t > 0. We find that the dual function g, depicted in Figure 3.5(b), is continuously differentiable over the set {(y, z) ∈ R² : y > |z|} (i.e. the domain of definition of x), where its gradient is given by ∇g(y, z) = (a − ln(y² − z²), ln((y − z)/(y + z))). Notice that ∇g(e^{a/2}, 0) = 0, thus (e^{a/2}, 0) is dual optimal by concavity of g. Since strong duality holds, the solution of (3.49) is unique and given by x(e^{a/2}, 0) = (a/2, a/2).

Suppose now that S is the compact set S = {x ∈ R² : |x1| ≤ b, |x2| ≤ b}, where b is a positive constant satisfying |a| < 2b < ∞. The function x is now defined on all of R² and (3.51) becomes

x1(y, z) = −b if y − z ≤ e^{−b},  ln(y − z) if e^{−b} < y − z < e^{b},  b if y − z ≥ e^{b},
x2(y, z) = x1(y, −z),          (3.53)

while (3.52) becomes

g2(y, z) = e^{−b} + b(y + z) if y + z ≤ e^{−b},  h(y + z) if e^{−b} < y + z < e^{b},  e^{b} − b(y + z) if y + z ≥ e^{b},
g1(y, z) = g2(y, −z) + ay.          (3.54)

The dual function g, depicted in Figure 3.5(c), is continuously differentiable on R², and ∇g is now defined on R² and such that ∇g(y, z) = 0 if and only if (y, z) = (e^{a/2}, 0). Also, the solution of (3.49) is still given by x(e^{a/2}, 0) = (a/2, a/2). Notice that ∇²g is discontinuous on the lines y ± z = e^{−b} and y ± z = e^{b}, and defined and continuous over each of the 9 regions delineated by these borders.

Figure 3.5: Example 3.7. (a) Objective function and constraints. (b) Dual function for z ≥ 0 with b = ∞ (S = R²). (c) Dual function for z ≥ 0 with |a| < 2b < ∞.

Example 3.8 (Dierentiability: non-strictly convex objective) Consider the convex opti-misation problem depicted in Figure 3.6(a)

minimise

x∈S x1+x2

subject to x21+x22a2 x1x2= 0

(3.55)

where S = {x R2 : |x1| ≤ b, |x2| ≤ b} with b < .The Lagrangian is given by l(x, y, z) = l1(x1, y, z) +l2(x2, y, z), where we dene l1(x1, y, z) = (yx1+ (z+ 1))x1a2y and l2(x2, y, z) = (yx2(z1))x2. For the sake of concision, only the dual points(y, z)lying in the zone of interest Y =R×R≥0are considered. The function x is uniquely dened over Y \ {(0,1),(0,1)}, where x(y, z) = (x1(y, z), x2(y, z))and

x1(y, z) =

bf igure ifz+ 12by

z+12y if|z+ 1|<2by b ifz+ 1≤ −2by

, x2(y, z) =

b ifz1≤ −2by

z−1

2y if|z1|<2by b ifz12by

. (3.56) For (y, z) Y \ {(0,1),(0,1)}, we nd g(y, z) = l(x(y, z), y, z) = l1(x1(y, z), y, z) + l2(x2(y, z), y, z) =g1(y, z) +g2(y, z), wheregi(y, z) =li(xi(y, z), y, z)(i= 1,2) and thus

g1(y, z) =

(b2a2)yb(z+ 1) ifz+ 12by

(z+1)4y 2 a2y if|z+ 1|<2by (b2a2)y+b(z+ 1) ifz+ 1≤ −2by

,

g2(y, z) =

b2y+b(z1) ifz1≤ −2by

(z−1)4y 2 if|z1|<2by b2yb(z1) ifz12by

.

(3.57)

Noting that g(0, z) =2b for|z| ≤1, we nd that g is continuous over Y and continuously dier-entiable overY \ {(0,1),(0,1)}, where g(y, z) = (x(y, z)2a2, x1(y, z)x2(y, z)), whileg shows discontinuities at(1,0)and(1,0). Ifb >

2

2 a, theng is zero at the unique point(12a,0), and we infer from (3.56) and Figure 3.6(c) that(

2 2 a,

2

2 a)is the unique solution of (3.55).

Whenb=

2

2 a, notice thatg is maximal and equal to

2aall over the triangle{(y, z)|y0,|z| ≤ 1

2ay}, and again(

2 2 a,

2

2 a)is the unique primal solution.

If b <

2

2 a, the optimality condition (3.22) is satised by all the points of the open segment {(0, z)| |z| < 1}, and the solutions of the dual problem are the points (0, z) with |z| ≤ 1, while the unique primal solution is(b,b).

If b = ∞, then S = R² is not compact, x(y, z) is not defined if y = 0, and (3.56) becomes

x1(y, z) = −(z + 1)/(2y),  x2(y, z) = (z − 1)/(2y),  y > 0. (3.58)

After computations, we find g(y, z) = −(z² + 1)/(2y) − a²y for y > 0, where x is uniquely defined and g is continuously differentiable with gradient ∇g(y, z) = ((z² + 1)/(2y²) − a², −z/y), and g(y, z) = −∞ if y = 0, as depicted in Figure 3.6(b). Notice then that ∇g(1/(√2 a), 0) = 0, and it follows from (3.58) that (−(√2/2)a, −(√2/2)a) is the solution of (3.55).
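As a numerical sanity check (not part of the manuscript), the closed-form expressions for the b = ∞ case can be verified for a = 1; all function names below are illustrative.

```python
import math

# Check of Example 3.8 with b = infinity (S = R^2), taking a = 1.
a = 1.0

def g(y, z):
    # Dual function g(y, z) = -(z^2 + 1)/(2y) - a^2 * y, valid for y > 0.
    return -(z**2 + 1) / (2.0 * y) - a**2 * y

def grad_g(y, z):
    # Gradient from the text: ((z^2 + 1)/(2y^2) - a^2, -z/y).
    return ((z**2 + 1) / (2.0 * y**2) - a**2, -z / y)

# The claimed dual optimum (1/(sqrt(2) a), 0).
y_star, z_star = 1.0 / (math.sqrt(2) * a), 0.0
print(grad_g(y_star, z_star))   # both components vanish up to rounding

# Primal minimiser recovered from (3.58):
x1 = -(z_star + 1) / (2 * y_star)
x2 = (z_star - 1) / (2 * y_star)
print(x1, x2)                   # both equal -sqrt(2)/2

# No duality gap: g(y*, z*) equals the primal value x1 + x2 = -sqrt(2).
print(abs(g(y_star, z_star) - (x1 + x2)) < 1e-12)   # True
```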

3.4. Convex optimisation 61

Figure 3.6: Example 3.8. (a) Objective function and constraints: the circle x1² + x2² = a², the line x1 − x2 = 0, and the solution (−(√2/2)a, −(√2/2)a). (b) Dual function with b = ∞ (S = R²), maximised at (1/(√2 a), 0). (c) Dual function with (√2/2)a < b < ∞, with regions delimited by the lines z ± 1 = ±2by and maximiser (1/(√2 a), 0).

restrictive, even when the feasible set of the problem is unbounded. In practice, it is usual to specify S as a compact set characterised by a collection of suitable (possibly large) lower and upper bounds on the primal variables, so that the feasible region surrounding the optimal points is not affected. This is done in Example 3.7, where it is shown how, in a convex optimisation problem where the objective function is strictly convex, adding a side constraint of the type x ∈ S with S compact can ensure the continuous differentiability of the dual function. The compactness of S alone, however, is not a sufficient condition for the differentiability of the dual function, as seen in Example 3.8, where the gradient of the dual function of a convex optimisation problem with a linear (hence not strictly convex) objective is discontinuous at some dual feasible points even when S is specified as a compact set.

When the objective function of a convex optimisation problem is not strictly convex, differentiability of the dual function can still be achieved through regularisation techniques. These techniques aim to find the optima of unfriendly optimisation problems, e.g. problems with non-strictly convex objective functions, by successively solving (exactly or approximately) sequences of regular problems that get asymptotically close to the initial problem.

Regularisation

Strict convexity of the objective function can be obtained by adding a quadratic term with small positive coefficient ϵ, i.e.

f̃0(x, ϵ) = f0(x) + ϵ∥x∥². (3.59)

Minimising (3.59) leads to suboptimal solutions for the optimisation problem. Note that the term ϵ∥x∥² = ∑_{i=1}^{n} ϵxi² is additively separable with respect to the coordinates of x and does not compromise the separability of the problem, as we shall explain in Section 3.4.5. By decreasing ϵ to values small enough, the solution of the regularised problem can be made arbitrarily close to the set of solutions of the initial problem. It is thus possible to iteratively solve the initial problem by considering a sequence of subproblems with strictly convex objectives f̃0(x, ϵk), where ϵk ↓ 0. That the sequence (ϵk) must vanish leads to implementation issues, and to ill-conditioning issues for small values of ϵk. To avoid these difficulties it is common to resort to more elaborate regularisation techniques such as proximal point methods, or the augmented Lagrangian techniques that will be introduced in Section 3.4.4.
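A minimal sketch of the regularisation (3.59) on a toy instance, assuming a linear (hence not strictly convex) objective c·x minimised over a box; the function name and data are illustrative, not from the text.

```python
# Quadratic regularisation of the linear objective c.x over the box
# [-b, b]^n. The regularised objective c.x + eps*||x||^2 is strictly
# convex and separable, so each coordinate is solved in closed form.
def regularised_minimiser(c, b, eps):
    # Coordinate-wise: minimise c_i*x + eps*x^2 over [-b, b];
    # the unconstrained minimiser -c_i/(2*eps) is clipped to the box.
    return [max(-b, min(b, -ci / (2.0 * eps))) for ci in c]

c, b = [1.0, 1.0], 2.0
for eps in [1.0, 0.1, 0.01]:
    print(eps, regularised_minimiser(c, b, eps))
# As eps decreases to 0, the regularised solutions converge to (-2, -2),
# a solution of the original problem.
```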

Given the optimisation problem of minimising a real function f over a convex set X and a starting point x0 ∈ X, the proximal point algorithm is given by

xk+1 = arg min_{x∈X} { f(x) + (1/(2ck))∥x − xk∥² },  k = 0, 1, 2, ..., (3.60)


where (ck) is a sequence of positive scalars. If f is convex, it is easily shown that (3.60) rewrites as xk+1 = Pk xk, where we introduce the successive operators Pk = (1 + ckT)⁻¹, in which the operator T is such that Tx is a subgradient⁴ of f at x. Then T is called a monotone operator, Pk is single-valued and said to be non-expansive for all k, and the algorithm converges to a solution if the sequence (ck) is bounded away from zero. The proximal point algorithm has the advantage over the regularisation technique (3.59) that the sequence (ck) need not grow to infinity, and thus ill-conditioning can be avoided.
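A minimal sketch of iteration (3.60), assuming f(x) = |x| on X = R, for which the proximal step Pk has a closed form (soft-thresholding); the names and constants below are illustrative choices.

```python
# Proximal point iteration (3.60) for f(x) = |x| on X = R. Here
# prox_{c|.|}(v) = sign(v) * max(|v| - c, 0) (soft-thresholding), so
# x_{k+1} = P_k x_k can be evaluated exactly.
def prox_abs(v, c):
    # argmin_x  |x| + (1/(2c)) * (x - v)^2
    if v > c:
        return v - c
    if v < -c:
        return v + c
    return 0.0

x, c = 5.0, 1.0            # starting point x0 and constant step c_k = c
trace = [x]
for k in range(10):
    x = prox_abs(x, c)     # x_{k+1} = P_k x_k
    trace.append(x)
print(trace)               # reaches the minimiser x* = 0 after 5 steps
```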

3.4.4 Iterative optimisation methods

In this section we briefly discuss the basic optimisation methods for finding stationary points in optimisation problems. These methods are called iterative as they start with some initial guess x0 and generate a sequence of vectors {xk}∞k=1 which is expected to get closer and closer to the set of solutions.

An iterative algorithm is called convergent if any limit point of the sequences of vectors it generates is a stationary point of the problem. Since the scope of this study is limited to convex problems (where stationary points are actual solutions), we will mostly speak in terms of convergence to solutions. The methods listed in this section may nonetheless be used to uncover the stationary points of nonconvex problems. Focus is set on a particular class of optimisation algorithms, called the gradient methods, which allow for parallel implementations and are therefore particularly attractive in the context of distributed network optimisation. Other accepted families of methods which are not intended for parallel execution (e.g. the interior-point and barrier methods [FM68, Wri05], the augmented Lagrangian methods and methods of multipliers [Hes69, Pow69, PT73, Ber76a, Roc76a, BT94, BT97, NW99], or sequential quadratic programming (SQP) [GW12, BT95]) are not discussed in this text.
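As a minimal illustration of such an iterative scheme (a sketch, not a method prescribed by the text; the objective and step size are arbitrary choices), plain gradient descent generates a sequence of iterates approaching the solution:

```python
# Plain gradient descent x_{k+1} = x_k - t * grad_f(x_k), the simplest
# gradient method. Illustrative objective f(x) = (x - 3)^2, whose unique
# minimiser is x* = 3; the step size t = 0.25 is one valid choice.
def gradient_descent(grad_f, x0, t=0.25, iters=50):
    x = x0
    for _ in range(iters):
        x = x - t * grad_f(x)   # move against the gradient
    return x

x_final = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(abs(x_final - 3.0) < 1e-6)   # True: the iterates converge to x* = 3
```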

Convergence of iterative algorithms for convex problems

An iterative optimisation algorithm is globally convergent if the vector sequences it generates are guaranteed to converge to a solution of the problem provided that such a point exists. The main properties that motivate the choice of a globally convergent iterative algorithm include the complexity of its implementation, the computational expense and the resource consumption

⁴The vector s is a subgradient of f at x ∈ dom(f) if f(y) ≥ f(x) + sᵀ(y − x) for all y ∈ dom(f). An operator T is monotone if (Tx − Ty)ᵀ(x − y) ≥ 0 for all x, y. It is said to be non-expansive with respect to the Euclidean norm if ∥Tx − Ty∥ ≤ ∥x − y∥ for all x, y.

overhead it causes, and the quickness (or slowness) with which the generated sequences of vectors are expected to converge. These criteria are correlated in the sense that quick convergence is likely to minimise both the computational load and the energy consumption overhead. In particular, the local convergence of an algorithm is concerned with the asymptotic properties of a sequence generated by the algorithm in the close vicinity of its point of convergence. Local convergence is typically characterised in terms of orders and rates of convergence, which we now define.

Suppose that {xk} is a sequence of vectors of a real vector space which converges to a vector x. We say that {xk} converges Q-linearly to x if there exists a scalar µ ∈ (0, 1) such that

lim sup_{k→∞} ∥xk+1 − x∥ / ∥xk − x∥ = µ. (3.61)

The quantity µ is then called the (asymptotic) rate of convergence of {xk}. A Q-linearly convergent sequence with a small convergence rate converges relatively fast to its limit. If however the rate is close to 1, the convergence will be slow. If (3.61) holds with µ = 1, the sequence {xk} is said to converge sublinearly. If (3.61) holds with µ = 0, then the sequence converges superlinearly.
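The rate µ in (3.61) can be estimated empirically from successive error ratios. The following sketch (illustrative, with synthetic error sequences) contrasts a Q-linear sequence with a superlinear one:

```python
# Estimating the asymptotic rate mu of (3.61) from sample sequences.
# e_k = 0.5**k converges Q-linearly with rate mu = 0.5, while
# e_{k+1} = e_k**2 gives ratios tending to 0, i.e. superlinear
# convergence in the sense of (3.61).
def q_ratios(errors):
    # Successive ratios ||x_{k+1} - x*|| / ||x_k - x*||.
    return [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

linear = [0.5**k for k in range(10)]          # Q-linear, mu = 0.5
quadratic = [0.5**(2**k) for k in range(5)]   # quadratically convergent

print(q_ratios(linear))     # constant ratios equal to 0.5
print(q_ratios(quadratic))  # ratios shrinking towards 0 (superlinear)
```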

The convergence of a sequence can be further characterised by its order.

Let {xk} converge to a vector x. The sequence {xk} is said to converge with order q to x if q ≥ 1 and there exists a scalar µ > 0 such that

lim sup_{k→∞} ∥xk+1 − x∥ / ∥xk − x∥^q = µ. (3.62)

Convergence is thus linear if (3.62) holds with q = 1, and it is called quadratic if (3.62) holds with q = 2.

The definition of the order of convergence is often extended to encompass some sequences which converge reasonably fast but do not satisfy (3.62). A sequence {xk} is then said to converge (at least) with order q to x if there exists a scalar sequence {yk} which converges to 0 with order q according to (3.62) and such that

∥xk − x∥ ≤ yk, ∀k. (3.63)

In particular, sequences which satisfy (3.63) with q = 1 are said to converge at least R-linearly.
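The order q in (3.62) can likewise be checked empirically: for errors satisfying e_{k+1} = µ e_k^q, the ratios e_{k+1}/e_k^q are constant. A small illustrative sketch with q = 2:

```python
# Checking the order of convergence via (3.62): for a sequence with
# errors e_{k+1} = e_k**2 (quadratic convergence, mu = 1), the ratios
# e_{k+1} / e_k**q are constant for q = 2 and vanish for q = 1.
def order_ratios(errors, q):
    return [errors[k + 1] / errors[k]**q for k in range(len(errors) - 1)]

errors = [0.5]
for _ in range(4):
    errors.append(errors[-1]**2)   # e_{k+1} = e_k^2

print(order_ratios(errors, q=2))   # all ratios equal 1 -> order 2
print(order_ratios(errors, q=1))   # ratios -> 0: superlinear under q = 1
```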

In this study we are mainly concerned with algorithms which generate linearly convergent sequences. A vector sequence {xk} in Rn which converges