NOT FOR QUOTATION WITHOUT PERMISSION OF THE AUTHOR
A DESCENT ALGORITHM FOR LARGE-SCALE LINEARLY CONSTRAINED CONVEX NONSMOOTH MINIMIZATION
Krzysztof C. Kiwiel
April 1984 CP-84-15
Collaborative Papers report work which has not been performed solely at the International Institute for Applied Systems Analysis and which has received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute, its National Member Organizations, or other organizations supporting the work.
INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS, A-2361 Laxenburg, Austria
A Descent Algorithm for Large-Scale Linearly Constrained Convex Nonsmooth Minimization
Krzysztof C. Kiwiel
Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland.
Abstract. A descent algorithm is given for solving a large convex program obtained by augmenting the objective of a linear program with a (possibly nondifferentiable) convex function depending on relatively few variables. Such problems often arise in practice as deterministic equivalents of stochastic programming problems. The algorithm's search direction finding subproblems can be solved efficiently by the existing software for large-scale smooth optimization. The algorithm is both readily implementable and globally convergent.
AMS 1980 subject classification. Primary: 65K05. Secondary: 90C25.
Key words: Nonsmooth optimization, nondifferentiable programming, linear constraints, convex programming, descent methods, large-scale optimization
1. Introduction
This paper presents a method for solving the following problem
minimize $\langle c, y\rangle + f(x)$ over all $(y, x) \in \mathbb{R}^{M+N}$
satisfying $Ay + Bx \le b$, (1.1)
where $c \in \mathbb{R}^M$, $A$ is a $P \times M$ matrix, $B$ is a $P \times N$ matrix, $b \in \mathbb{R}^P$, and $f : \mathbb{R}^N \to \mathbb{R}$ is a (possibly nondifferentiable) convex function. We suppose that the set of feasible points
$S = \{(y, x) \in \mathbb{R}^{M+N} : Ay + Bx \le b\}$
is nonempty and bounded, and that at each $(y, x) \in S$ we can compute $f(x)$ and a certain subgradient $g_f(x) \in \partial f(x)$, i.e. an arbitrary element of the subdifferential $\partial f(x)$ of $f$ at $x$ on which we cannot impose any further restrictions. Problems of the form (1.1) are often encountered in practice, especially as deterministic equivalents of two-stage stochastic programming problems [K1], [NW1], [W1]. In many applications the number $M$ of "linear" variables $y_i$ is much larger than the number $N$ of "nonlinear" variables $x_i$, and the matrices $A$ and $B$ are sparse (have relatively few nonzero entries). In such cases problem (1.1) can be solved by the existing algorithms for large-scale optimization (e.g. MINOS [MS1]) if $f$ is differentiable. In the nondifferentiable large-scale case, only a few algorithms have been proposed [BW1], and they frequently assume the knowledge of the full subdifferential $\partial f(x)$ at each $x$.
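To make the oracle assumption concrete, here is a minimal Python sketch for one simple class of nonsmooth convex functions, $f(x) = \max_i (\langle a_i, x\rangle + b_i)$. The function name `max_affine_oracle` and the data are hypothetical and are not taken from the paper; the point is only that the oracle returns $f(x)$ together with one arbitrary subgradient, never the full subdifferential.

```python
def max_affine_oracle(pieces, x):
    """Return (f(x), g) for f(x) = max_i (<a_i, x> + b_i).

    pieces: list of (a_i, b_i) pairs (hypothetical illustration data).
    g is the slope a_i of one active piece, hence one subgradient of f at x.
    """
    best_val, best_a = None, None
    for a, b in pieces:
        val = sum(ai * xi for ai, xi in zip(a, x)) + b
        if best_val is None or val > best_val:
            best_val, best_a = val, a
    return best_val, list(best_a)


# f(x) = max(x1 + x2, -x1, -x2 - 1) on R^2
pieces = [([1.0, 1.0], 0.0), ([-1.0, 0.0], 0.0), ([0.0, -1.0], -1.0)]
fx, g = max_affine_oracle(pieces, [1.0, 2.0])

# Subgradient inequality f(t) >= f(x) + <g, t - x> at another point t
t = [-2.0, 0.0]
ft, _ = max_affine_oracle(pieces, t)
lower = fx + sum(gi * (ti - xi) for gi, ti, xi in zip(g, t, [1.0, 2.0]))
```

When several pieces are active, the sketch returns the first one found, which mirrors the assumption that no further restriction can be imposed on the subgradient delivered.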
The method presented in this paper modifies one given in [K3] to make use of the special structure of problem (1.1). It is a feasible point method of descent in the sense of generating successive points in $S$ with nonincreasing objective values.
To deal with nondifferentiability of $f$, at each iteration a piecewise linear (polyhedral) approximation to $f$ is constructed from at most $N+2$ subgradients of $f$ calculated previously at certain trial points. A search direction is found by solving a quadratic programming subproblem obtained by replacing $f$ in (1.1) by its polyhedral approximation augmented with a simple quadratic term. Then a line search finds the next approximation and the next trial point. The two-point line search is employed to detect discontinuities in the gradient of $f$.
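The polyhedral approximation described here can be illustrated with a short Python sketch. It assumes the usual cutting-plane form $\hat f(x) = \max_j \{ f(\bar x^j) + \langle g^j, x - \bar x^j\rangle \}$ built from trial points $\bar x^j$; the name `polyhedral_model` and the one-dimensional data are hypothetical.

```python
def polyhedral_model(bundle, x):
    """Evaluate fhat(x) = max_j (f(xbar_j) + <g_j, x - xbar_j>).

    bundle: list of (xbar_j, f(xbar_j), g_j) collected at earlier trial points.
    """
    return max(
        fj + sum(gi * (xi - ti) for gi, xi, ti in zip(g, x, xbar))
        for xbar, fj, g in bundle
    )


# f(x) = |x| in one dimension, linearized at the trial points -1 and 2
bundle = [([-1.0], 1.0, [-1.0]), ([2.0], 2.0, [1.0])]
vals = [polyhedral_model(bundle, [t]) for t in (-1.0, 0.0, 0.5, 2.0)]
```

Each linearization is a global minorant of a convex $f$, so the maximum of the collected linearizations never overestimates $f$; in this tiny example two cuts already reproduce $|x|$ exactly.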
We show that the method is globally convergent under no additional assumptions. We may add that the method will find a solution in a finite number of iterations if $f$ is polyhedral and certain technical conditions are satisfied (see [K2]). For lack of space, we shall pursue this subject elsewhere. The method is implementable in the sense of requiring bounded storage and a finite number of simple operations per iteration. For problems with large sparse matrices $A$ and $B$ and relatively few nonlinear variables $x_i$, the method can use MINOS [MS1] for solving its quadratic programming subproblems. In fact, an efficient implementation of the method would require modifying MINOS to exploit the fact that consecutive subproblems retain the original constraints of (1.1), differ only in a few auxiliary linear constraints on $x$, have simple terms quadratic in $x$ as the only nonlinearities in their objectives, etc. It would be interesting to perform the necessary numerical experimentation, but we have not had the means to do so.
Other descent methods for solving problem (1.1) can be found in [DV1], [~4], [LSB1], [~1], [~2], [x'~I] and [s~ill]. None of their search direction finding subproblems can be solved efficiently by the available software when problem (1.1) is large. Therefore, we hope that our method could compete with the existing algorithms.
The method is derived and stated in Section 2. Its global convergence is established in Section 3, where we also discuss the case of an unbounded feasible set $S$. Finally, we have a conclusion section.
We shall use the following notation and terminology. $\mathbb{R}^M$ and $\mathbb{R}^N$ denote the $M$- and $N$-dimensional Euclidean spaces with the usual inner products $\langle \cdot, \cdot\rangle$ and the associated norms $|\cdot|$, respectively. We use $x_i$ to denote the $i$-th component of the vector $x$. Superscripts are used to denote different vectors, e.g. $x^1$ and $x^2$. All vectors are column vectors. However, for convenience a column vector in $\mathbb{R}^{M+N}$ is sometimes denoted by $(y, x)$ even though $y$ and $x$ are column vectors in $\mathbb{R}^M$ and $\mathbb{R}^N$, respectively. For any $x \in \mathbb{R}^N$ and $\varepsilon \ge 0$,
$\partial_\varepsilon f(x) = \{ g \in \mathbb{R}^N : f(\bar x) \ge f(x) + \langle g, \bar x - x\rangle - \varepsilon$ for all $\bar x \in \mathbb{R}^N \}$
denotes the $\varepsilon$-subdifferential of $f$ at $x$. We denote by $\partial f(x)$ the set $\partial_0 f(x)$, i.e. the ordinary subdifferential. Note that $f$ is continuous and the mapping $(x, \varepsilon) \to \partial_\varepsilon f(x)$ is locally bounded, because $f$ is real-valued and convex on $\mathbb{R}^N$ (see, e.g. [DV1]).
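The $\varepsilon$-subdifferential can be made tangible with a small numerical check. The sketch below uses the standard fact that a subgradient $g$ of a convex $f$ at a point $\bar x$ is an $\varepsilon$-subgradient at any other point $x$ with $\varepsilon = f(x) - f(\bar x) - \langle g, x - \bar x\rangle$. The helper names are hypothetical, and the check runs on a finite grid only, so it illustrates the definition rather than proving membership.

```python
def linearization_error(f, x, xbar, g):
    """eps = f(x) - [f(xbar) + g*(x - xbar)] for scalar x, xbar, g."""
    return f(x) - (f(xbar) + g * (x - xbar))


def is_eps_subgradient(f, x, g, eps, grid, tol=1e-12):
    """Check f(t) >= f(x) + g*(t - x) - eps on a finite grid of test points."""
    return all(f(t) >= f(x) + g * (t - x) - eps - tol for t in grid)


f = abs
x, xbar, g = -0.5, 1.0, 1.0      # g = 1 is a subgradient of |.| at xbar = 1
eps = linearization_error(f, x, xbar, g)
grid = [i / 10.0 for i in range(-50, 51)]
ok_eps = is_eps_subgradient(f, x, g, eps, grid)    # holds with eps = 1
ok_zero = is_eps_subgradient(f, x, g, 0.0, grid)   # fails: g is not in the ordinary subdifferential at x
```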
2. The Method
Given a starting point $z^1 = (y^1, x^1) \in S$, the algorithm described below generates sequences of points $z^k = (y^k, x^k)$ in $S$, search directions $d^k = (d_y^k, d_x^k)$ in $\mathbb{R}^{M+N}$ and stepsizes $t_L^k$ in $\{0, 1\}$, related by $z^{k+1} = z^k + t_L^k d^k$ for $k = 1, 2, \ldots$. The sequence $\{z^k\}$ is intended to converge to a solution of problem (1.1).
minimize $\hat F^k(z^k + d) + \tfrac12 |d_x|^2$ over all $d = (d_y, d_x)$ satisfying $z^k + d \in S$, (2.1)
where the penalty term $|d_x|^2/2$ serves to keep $\bar x^{k+1} = x^k + d_x^k$ in the region where $\hat f^k$ is a close approximation to $f$, so that $\hat F^k(\cdot)$ is close to $F(\cdot)$ at $\bar z^{k+1} = z^k + d^k$. Clearly, $d^k$ may be found from the solution $(d_y^k, d_x^k, u^k) \in \mathbb{R}^{M+N+1}$ to the following $k$-th quadratic programming subproblem
minimize $\langle c, d_y\rangle + u + \tfrac12 |d_x|^2$ over all $(d_y, d_x, u) \in \mathbb{R}^{M+N+1}$
satisfying $f_j^k + \langle g^j, d_x\rangle \le u$ for $j \in J^k$, (2.2)
$A(y^k + d_y) + B(x^k + d_x) \le b$.
Moreover, we may interpret
$v^k = \hat F^k(z^k + d^k) - F(z^k) = \langle c, d_y^k\rangle + u^k - f(x^k)$ (2.3)
as an approximate derivative of $F$ at $z^k$ in the direction $d^k$.
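A toy instance may help to see how (2.2) and (2.3) interact. The Python sketch below assumes the simplest possible setting, which is not the paper's large-scale case: no linear variables ($M = 0$), one nonlinear variable ($N = 1$), and $S$ an interval $[lo, hi]$. Eliminating $u$ turns (2.2) into the minimization of $\max_j (f_j^k + g^j d) + d^2/2$ over $d$, which is strictly convex, so a plain ternary search stands in for a quadratic programming code such as MINOS. The function name `solve_subproblem_1d` is hypothetical.

```python
def solve_subproblem_1d(cuts, x, lo, hi, iters=200):
    """Solve min_d max_j(f_j + g_j*d) + d*d/2  subject to lo <= x + d <= hi.

    cuts: list of (f_j, g_j), linearization values at x and their slopes.
    Returns (d, u) with u = max_j(f_j + g_j*d).
    Ternary search is valid because the objective is strictly convex in d.
    """
    def model(d):
        return max(fj + gj * d for fj, gj in cuts)

    a, b = lo - x, hi - x
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if model(m1) + 0.5 * m1 * m1 <= model(m2) + 0.5 * m2 * m2:
            b = m2
        else:
            a = m1
    d = 0.5 * (a + b)
    return d, model(d)


# f(x) = |x|, S = [-2, 2], current point x = 1 with the single cut (f, g) = (1, 1)
x = 1.0
d, u = solve_subproblem_1d([(1.0, 1.0)], x, -2.0, 2.0)
v = u - abs(x)   # (2.3) without linear variables: v = u - f(x)
```

Here the subproblem objective is $1 + d + d^2/2$, minimized at $d = -1$, so the model predicts a decrease of $v = -1$ in the objective.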
It will be convenient to describe the $P$ linear constraints of problem (1.1) in terms of $P$ affine functions $h_i : \mathbb{R}^{M+N} \to \mathbb{R}$ such that
$S = \{ (y, x) \in \mathbb{R}^{M+N} : h_i(y, x) \le 0$ for $i \in I \}$,
where $I = \{1, \ldots, P\}$. Then subproblem (2.2) takes on the form
minimize $\langle c, d_y\rangle + u + \tfrac12 |d_x|^2$ over all $(d_y, d_x, u) \in \mathbb{R}^{M+N+1}$
satisfying $f_j^k + \langle g^j, d_x\rangle \le u$ for $j \in J^k$, (2.4)
$h_i^k + \langle \nabla_y h_i, d_y\rangle + \langle \nabla_x h_i, d_x\rangle \le 0$ for $i \in I$
with $h_i^k = h_i(y^k, x^k)$ for $i \in I$, since
$h_i(y^k + d_y, x^k + d_x) = h_i(y^k, x^k) + \langle \nabla_y h_i, d_y\rangle + \langle \nabla_x h_i, d_x\rangle$
for all $(d_y, d_x)$, because each $h_i$ is affine.
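The identity used to pass from the original constraints to the linearized ones is exact for affine functions, as the following Python sketch checks numerically. It assumes that $h_i(y, x) = \langle A_i, y\rangle + \langle B_i, x\rangle - b_i$, with $A_i$ and $B_i$ the $i$-th rows of $A$ and $B$, so that $\nabla_y h_i = A_i$ and $\nabla_x h_i = B_i$; the data are hypothetical.

```python
def dot(p, q):
    """Euclidean inner product of two equally long lists."""
    return sum(pi * qi for pi, qi in zip(p, q))


def h(A_row, B_row, b_i, y, x):
    """Affine constraint function h_i(y, x) = <A_i, y> + <B_i, x> - b_i."""
    return dot(A_row, y) + dot(B_row, x) - b_i


# Hypothetical data: one constraint with M = 2 linear and N = 1 nonlinear variable.
A_row, B_row, b_i = [1.0, -2.0], [3.0], 4.0
y, x = [1.0, 0.5], [0.25]
d_y, d_x = [0.5, 1.0], [-1.0]

y_new = [yi + di for yi, di in zip(y, d_y)]
x_new = [xi + di for xi, di in zip(x, d_x)]

lhs = h(A_row, B_row, b_i, y_new, x_new)
# For affine h_i the first-order expansion has no remainder term.
rhs = h(A_row, B_row, b_i, y, x) + dot(A_row, d_y) + dot(B_row, d_x)
```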
Having motivated the search direction finding subproblems, we shall now state the method in detail, commenting on its rules below.
Algorithm 2.1
Step 0 (Initialization). Select a starting point $z^1 = (y^1, x^1) \in S$, a final accuracy tolerance $\varepsilon_s \ge 0$ and a line search parameter $m \in (0, 1)$. Set $J^1 = \{1\}$, $\bar z^1 = (\bar y^1, \bar x^1) = z^1$, $g^1 = g_f(\bar x^1)$ and $f_1^1 = f(\bar x^1)$. Set the counters $k = 1$, $l = 0$ and $k(0) = 1$.
Step 1 (Direction finding). Find the solution $(d_y^k, d_x^k, u^k)$ to subproblem (2.4), and Lagrange multipliers $\lambda_j^k$, $j \in J^k$, and $\mu_i^k$, $i \in I$, of (2.4) such that the set
$\hat J^k = \{ j \in J^k : \lambda_j^k \ne 0 \}$
satisfies $|\hat J^k| \le N+1$. Set $d^k = (d_y^k, d_x^k)$ and compute $v^k$ by (2.3).
Step 2 (Stopping criterion). If $v^k \ge -\varepsilon_s$, terminate; otherwise, continue.
Step 3 (Line search). Set $\bar z^{k+1} = (\bar y^{k+1}, \bar x^{k+1}) = z^k + d^k$. If the descent test (2.5) holds, set $t_L^k = 1$ (serious step), set $k(l+1) = k+1$ and increase $l$ by 1; otherwise, i.e. if (2.5) does not hold, set $t_L^k = 0$ (null step). Set $z^{k+1} = (y^{k+1}, x^{k+1}) = z^k + t_L^k d^k$.
Step 4 (Linearization updating). Set $J^{k+1} = \hat J^k \cup \{k+1\}$. Set $g^{k+1} = g_f(\bar x^{k+1})$,
$f_{k+1}^{k+1} = f(\bar x^{k+1}) + \langle g^{k+1}, x^{k+1} - \bar x^{k+1}\rangle$, (2.6)
$f_j^{k+1} = f_j^k + \langle g^j, x^{k+1} - x^k\rangle$ for $j \in \hat J^k$.
Increase $k$ by 1 and go to Step 1.
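The flow of the steps above can be sketched in Python for a toy setting. Several assumptions are made here that are not taken from the paper: there are no linear variables ($M = 0$), one nonlinear variable ($N = 1$), and $S = [lo, hi]$; all linearizations are kept rather than selected through Lagrange multipliers; the subproblem is solved by ternary search rather than by a quadratic programming code; and the descent test of Step 3 is taken in the common form $f(\bar x^{k+1}) \le f(x^k) + m v^k$. The name `bundle_descent_1d` is hypothetical.

```python
def bundle_descent_1d(f, subgrad, x, lo, hi, m=0.1, eps_s=1e-6, max_iter=100):
    """Toy sketch of the descent scheme for M = 0, N = 1 and S = [lo, hi].

    Simplifications (assumptions): every cut is kept, the subproblem is
    solved by ternary search, and the descent test is f(trial) <= f(x) + m*v.
    """
    cuts = [(f(x), subgrad(x))]          # pairs (f_j^k, g^j), values taken at x^k
    for _ in range(max_iter):
        model = lambda d: max(fj + gj * d for fj, gj in cuts)
        a, b = lo - x, hi - x            # Step 1: direction finding
        for _ in range(200):
            m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
            if model(m1) + 0.5 * m1 * m1 <= model(m2) + 0.5 * m2 * m2:
                b = m2
            else:
                a = m1
        d = 0.5 * (a + b)
        v = model(d) - f(x)              # predicted decrease, cf. (2.3)
        if v >= -eps_s:                  # Step 2: stopping criterion
            break
        trial = x + d                    # Step 3: serious step or null step
        t = 1.0 if f(trial) <= f(x) + m * v else 0.0
        x_new = x + t * d
        # Step 4: move old linearizations to x_new as in (2.6), then add the new cut
        cuts = [(fj + gj * (x_new - x), gj) for fj, gj in cuts]
        g_new = subgrad(trial)
        cuts.append((f(trial) + g_new * (x_new - trial), g_new))
        x = x_new
    return x


# Minimize f(x) = |x| over S = [-2, 2], starting from x = 1.5
x_star = bundle_descent_1d(abs, lambda t: 1.0 if t >= 0 else -1.0, 1.5, -2.0, 2.0)
```

On this example the run consists of a serious step to 0.5, a null step that adds the cut with slope $-1$ from the trial point $-0.5$, and a serious step to 0, after which the stopping criterion is met. Keeping every cut lets storage grow with the iteration count, which is exactly what the multiplier-based selection in Step 1 is designed to avoid.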
A few remarks on the algorithm are in order.
For problems of interest to us, subproblems (2.4) will have relatively few nonlinear variables ($N \ll M$) and large, sparse constraint matrices. Such subproblems can be solved by MINOS [MS1] in a finite number of iterations; moreover, MINOS will automatically find at most $N+1$ nonzero Lagrange multipliers $\lambda_j^k$ for the first constraints of (2.4), since these constraints involve only $N+1$ variables.
In Step 2 we always have
$F(z) \ge F(z^k) + v^k - |v^k|^{1/2} |x - x^k|$ for all $z = (y, x) \in S$, (2.7)
and hence
$F(z^k) \le \min \{ F(z) : z \in S \} - v^k + \ldots$
This will be proved in the next section. The above estimates justify the stopping criterion of the method.
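The passage from the first estimate to a bound on the optimality gap can be spelled out as follows. This is a sketch under stated assumptions: the first estimate is taken in the form written in the comment below, $v^k \le 0$, and $z^* = (y^*, x^*)$ denotes any solution of problem (1.1), a symbol introduced here for illustration only.

```latex
% Assumed form of the first estimate, for all z = (y, x) in S:
%   F(z) >= F(z^k) + v^k - |v^k|^{1/2} |x - x^k|.
\begin{align*}
F(z^*) &\ge F(z^k) + v^k - |v^k|^{1/2}\,|x^* - x^k|
  && \text{(take } z = z^* \text{)} \\
F(z^k) &\le \min\{F(z) : z \in S\} - v^k + |v^k|^{1/2}\,|x^* - x^k| .
\end{align*}
% If the method stops with -\varepsilon_s \le v^k \le 0, then
\[
F(z^k) - \min\{F(z) : z \in S\}
  \le \varepsilon_s + \varepsilon_s^{1/2}\,|x^* - x^k| .
\]
```

Since $S$ is bounded, $|x^* - x^k|$ is bounded as well, so a small $\varepsilon_s$ forces a small optimality gap at termination.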
Step 3 is always entered with $v^k < 0$. The trial point $\bar z^{k+1}$ is accepted as the next iterate $z^{k+1}$ only if this significantly decreases the objective value. Otherwise the algorithm stays at $z^{k+1} = z^k$ (a null step), but the new subgradient information collected at $\bar z^{k+1}$ will aid in finding a better next search direction, since $k+1 \in J^{k+1}$. Of course, $\{z^k\} \subset S$, because $\bar z^{k+1} = z^k + d^k \in S$ for all $k$.
We may add that if there are no linear variables in problem (1.1) ($M = 0$), then Algorithm 2.1 becomes similar to the method of [K3].
3. Convergence
In this section we show that the algorithm generates a minimizing sequence $\{z^k\} \subset S$, i.e. $F(z^k) \to \min \{ F(z) : z \in S \}$; moreover, there exists $\tilde z = (\tilde y, \tilde x)$ in the set of solutions of problem (1.1) such that $x^k \to \tilde x$ and $y^k \to \tilde y$ along some infinite set $K \subset \{1, 2, \ldots\}$. We assume, of course, that the final accuracy tolerance $\varepsilon_s$ is set to zero. Our analysis will draw on the results in [K2], [K3].
We start by analyzing the following dual to the $k$-th subproblem
subject to $\lambda_j \ge 0$ for $j \in J^k$, $\sum_{j \in J^k} \lambda_j = 1$, (3.1)
$\mu_i \ge 0$ for $i \in I$,
where
$\alpha_j^k = f(x^k) - f_j^k$ for $j \in J^k$.
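For orientation, a Lagrangian dual of subproblem (2.4) can be derived as follows. This is a sketch of one standard derivation and not necessarily the exact arrangement of (3.1); the aggregate vector $p$ is notation introduced here for illustration.

```latex
% Multipliers: \lambda_j >= 0 for the linearization constraints,
%              \mu_i >= 0 for the affine constraints of (2.4).
\begin{align*}
L &= \langle c, d_y\rangle + u + \tfrac12 |d_x|^2
   + \sum_{j \in J^k} \lambda_j \bigl(f_j^k + \langle g^j, d_x\rangle - u\bigr) \\
  &\quad + \sum_{i \in I} \mu_i \bigl(h_i^k + \langle \nabla_y h_i, d_y\rangle
   + \langle \nabla_x h_i, d_x\rangle\bigr).
\intertext{Minimizing $L$ over $u$, $d_y$ and $d_x$ gives}
& \sum_{j \in J^k} \lambda_j = 1, \qquad
  c + \sum_{i \in I} \mu_i \nabla_y h_i = 0, \qquad
  d_x = -p, \quad
  p = \sum_{j \in J^k} \lambda_j g^j + \sum_{i \in I} \mu_i \nabla_x h_i .
\intertext{With $\alpha_j^k = f(x^k) - f_j^k$, the dual is then equivalent to}
& \text{minimize } \tfrac12 |p|^2 + \sum_{j \in J^k} \lambda_j \alpha_j^k
  - \sum_{i \in I} \mu_i h_i^k
  \quad \text{over } \lambda \ge 0,\ \mu \ge 0
  \text{ satisfying the two equalities above.}
\end{align*}
```

The last sum is nonnegative because $h_i^k \le 0$ whenever $z^k \in S$. In this form the dual has only $|J^k| + P$ variables, and the primal direction in $x$ is recovered from the multipliers as $d_x^k = -p^k$.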
Lemma 3.1. (i) The Lagrange multipliers $(\lambda^k, \mu^k)$ of (2.4) solve (3.1) and yield the unique part $(d_x^k, u^k)$ of the solution $(d_y^k, d_x^k, u^k)$ of (2.4) by
where
(ii) The optimal value $w^k$ of (3.1) satisfies
and one has