Approximation algorithms for the interval constrained coloring problem


Ernst Althaus · Stefan Canzar · Khaled Elbassioni · Andreas Karrenbauer · Julian Mestre

Abstract We consider the interval constrained coloring problem, which appears in the interpretation of experimental data in biochemistry. Monitoring hydrogen-deuterium exchange rates via mass spectroscopy experiments is a method used to obtain information about protein tertiary structure. The output of these experiments provides data about the exchange rate of residues in overlapping segments of the protein backbone. These segments must be re-assembled in order to obtain a global picture of the protein structure. The interval constrained coloring problem is the mathematical abstraction of this re-assembly process.

The objective of the interval constrained coloring problem is to assign a color (exchange rate) to a set of integers (protein residues) such that a set of constraints is satisfied. Each constraint is made up of a closed interval (protein segment) and requirements on the number of elements that belong to each color class (exchange rates observed in the experiments).

An extended abstract appeared in the proceedings of the 11th Scandinavian Workshop on Algorithm Theory, SWAT 2008, held in Gothenburg, Sweden in July 2008.

A. Karrenbauer is supported by the Deutsche Forschungsgemeinschaft (DFG) within Priority Programme 1307 "Algorithm Engineering".

J. Mestre's research is supported by an Alexander von Humboldt fellowship.

E. Althaus
Johannes Gutenberg-Universität, Mainz, Germany

S. Canzar
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

K. Elbassioni · J. Mestre
Max-Planck-Institut für Informatik, Saarbrücken, Germany
e-mail: jmestre@mpi-inf.mpg.de

A. Karrenbauer
Institute of Mathematics, EPFL, Lausanne, Switzerland

http://dx.doi.org/10.1007/s00453-010-9406-0

Konstanzer Online-Publikations-System (KOPS) URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-172659


We show that the problem is NP-complete for an arbitrary number of colors and we provide algorithms that, given a feasible instance, find a coloring that satisfies all the coloring requirements within $\pm 1$ of the prescribed value. In light of our first result, this is essentially the best one can hope for. Our approach is based on polyhedral theory and randomized rounding techniques. Furthermore, we consider a variant of the problem where we are asked to find a coloring satisfying as many fragments as possible. If we relax the coloring requirements by a small factor of $(1+\epsilon)$, we propose an algorithm that finds a coloring "satisfying" this maximum number of fragments and that runs in quasi-polynomial time if the number of colors is polylogarithmic.

Keywords Approximation algorithms · Coloring problems · LP rounding

1 Introduction

Our motivation for the interval constrained coloring problem comes from an application in biochemistry. The problem has been introduced recently by Althaus et al. [1].

To be self-contained, we restrict ourselves to a very brief and informal description in this paper and refer the interested reader to the publication mentioned above.

A challenging and important problem in biochemistry is to determine the tertiary structure of a protein, i.e. the spatial arrangement, which is indispensable for its function. There are various approaches each with advantages and drawbacks. One method for this task is the so-called hydrogen-deuterium exchange, abbreviated by HDX.

This is a chemical reaction where a hydrogen atom of the protein is replaced by a deuterium atom, or vice versa. To this end, the protein solution is diluted with D$_2$O.

Intuitively, the exchange process happens at a higher rate at amino acids, or residues, that are more exposed to the solvent. Put differently, the exchange rates for residues at the outside of the complex are higher than inside. Note that though deuterium is heavier than hydrogen, they are almost identical from a chemical point of view. Hence, the exchange rate may be monitored by mass spectroscopy while the tertiary structure remains unaffected by the process. However, this method is not fine-grained enough to determine the exchange rate of each residue directly. Rather, we get bulk information for fragments of the protein. For example, we get the number of slow, medium, and fast residues for each of several overlapping fragments covering the whole protein. That is, the experimental data only tells us how many residues of a fragment react at low, medium, and high exchange rate, respectively. Moreover, we know the exact location and size of each fragment in the protein. It remains to find a valid assignment of all residues to exchange rates that matches the experimentally found bulk information. If the solution is not unique, we want to enumerate all feasible solutions or a representative subset thereof as a basis for further chemical considerations.

The problem can be rephrased in mathematical terms as follows. We are given a protein of n residues and a set of fragments, which correspond to intervals of [n].

The fragments cover the whole protein and may overlap. Furthermore, there are $k$ possible exchange rates to which we refer as colors in the following. The goal is to produce a coloring of the set $[n]$ using $k$ colors such that a given set of requirements is satisfied. Each requirement is made up of a closed interval $I \subseteq [n]$ and a complete specification of how many elements in $I$ should be colored with each color class. We refer to this problem as the interval constrained coloring problem.

More formally, let $\mathcal{I}$ be a set of intervals defined on the set $V = [n]$, let $[k]$ be a set of color classes, and let $r : \mathcal{I} \times [k] \to \mathbb{Z}_+$ be a requirement function such that $\sum_{c \in [k]} r(I, c) = |I|$ for all $I \in \mathcal{I}$. A coloring $\chi : V \to [k]$ is said to be feasible if for every $I \in \mathcal{I}$ we have

\[ |\{ i \in I : \chi(i) = c \}| = r(I, c) \quad \text{for all } c \in [k]. \tag{1} \]

Given this information, we would like to determine whether or not a feasible coloring exists, and if so, to produce one.
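As an illustration of condition (1), the following minimal Python sketch checks whether a given coloring is feasible and, for tiny instances only, searches all $k^n$ colorings exhaustively. The function names and the data layout (intervals as 1-based inclusive pairs, requirements as tuples of $k$ counts) are assumptions made for this example; they are not taken from the paper.

```python
from itertools import product

def is_feasible(coloring, requirements, k):
    """Check condition (1).

    coloring[i - 1] is the color (1..k) of position i.
    requirements maps an interval (s, t), 1-based and inclusive,
    to a tuple of k counts (hypothetical layout for this sketch).
    """
    for (s, t), req in requirements.items():
        segment = coloring[s - 1:t]
        if any(segment.count(c) != req[c - 1] for c in range(1, k + 1)):
            return False
    return True

def brute_force(n, k, requirements):
    """Try all k^n colorings in lexicographic order; exponential time."""
    for coloring in product(range(1, k + 1), repeat=n):
        if is_feasible(coloring, requirements, k):
            return list(coloring)
    return None

# Interval [1,3] needs two positions of color 1 and one of color 2;
# interval [2,4] needs one position of color 1 and two of color 2.
example = {(1, 3): (2, 1), (2, 4): (1, 2)}
print(brute_force(4, 2, example))
```

The exhaustive search is only meant to make the definition concrete; it says nothing about the algorithms developed in the rest of the paper.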

The problem is captured by the integer program given below. The binary variable $x_{i,c}$ indicates whether $i$ is colored $c$ or not. Constraint (2) enforces that each residue gets exactly one color and constraint (3) enforces that every requirement is satisfied.

\[ \sum_{c \in [k]} x_{i,c} = 1 \quad \forall i \in [n], \tag{2} \]

\[ \sum_{i \in I} x_{i,c} = r(I, c) \quad \forall I \in \mathcal{I},\ c \in [k], \tag{3} \]

\[ x_{i,c} \in \{0, 1\} \quad \forall i \in [n],\ c \in [k]. \tag{4} \]

Let $P$ be the polytope obtained by relaxing the integrality constraint (4) in the above integer program. That is, $P$ is the set of values of $x$ obeying (2), (3) and $0 \le x_{i,c} \le 1$ for all $i$ and $c$.

1.1 Previous and Related Work

The polyhedral description was introduced in [1] and has served there as a basis to attack the problem by integer programming methods and tools, which perform well in practice. Moreover, the authors established the polynomial-time solvability of the two-color case by the integrality of the polytope $P$ and also provided a combinatorial algorithm for this case. However, the complexity of the general problem was left open. Very recently, Komusiewicz et al. [2] showed that the problem is fixed-parameter tractable with respect to parameters such as the maximum fragment length and the maximum number of fragments containing a given residue.

A closely related problem is broadcast scheduling, where a server must decide which data item to broadcast at each time step in order to satisfy client requests. The literature in broadcast scheduling is vast and many variations of the problem have been studied (see [3,4] and references therein). In the variant we are concerned with here, a client request is specified by a time window $I$ and a data type $A$. The request is satisfied if $A$ is broadcast at least once in $I$. The similarities between the two problems should be clear, with time steps, time windows and data types in broadcast scheduling playing the respective roles of positions, intervals and colors in interval constrained coloring. There are, however, important differences. First, whereas in broadcast scheduling it does not hurt to broadcast an item more times than the prescribed number, in our problem it does. Second, an interval is satisfied only if all the requirements for that interval are satisfied exactly, which indicates that our problem may be harder.

1.2 Contributions of this Paper

As mentioned above, the complexity status for the interval constrained coloring problem has been open. In Sect. 4 we partly settle this by showing that deciding whether a feasible coloring exists is NP-complete when $k$ is part of the input.

Although the polytope $P$ is integral for $k = 2$ [1], it need not be for $k \ge 3$. Nevertheless, we can check in polynomial time whether $P = \emptyset$. If that is the case then we know that there is no feasible coloring. Otherwise we can find a feasible fractional solution. In Sect. 2 we will show how to round this fractional solution to produce a coloring where all the requirements are satisfied within a mere additive error of one.

In practice, the data emanating from the experiments is noisy, which normally causes the instance to be infeasible and in some case even forces P to be empty.

To deal with this problem, in Sect. 3 we study a variant of the problem in which we want to maximize the number of requirements that are satisfied. We relax the coloring requirements by a small factor of $(1+\epsilon)$ and propose a divide-and-conquer algorithm that finds a coloring "satisfying" this maximum number of requirements in time $n^{O(\frac{k^2}{\epsilon} \log n \log |\mathcal{I}|)}$, for any $\epsilon > 0$. Another way to deal with noisy data is to model the noise in the linear programming relaxation to get a new set of requirements on which to run the algorithm from Sect. 2. The latter approach was explored by Althaus et al. [1]; the reader is referred to their paper for details.

2 A ±1 guarantee

Let $x$ be a fractional solution in $P$. We use the scheme of Gandhi et al. [4] to round $x$ to an integral solution $\hat{x}$.

Theorem 1 Given a fractional solution $x \in P$ we can construct in polynomial time an integral solution $\hat{x}$ with the following properties:

(P1) For every $i \in [n]$ there exists $c \in [k]$ such that $\hat{x}_{i,c} = 1$ and $\hat{x}_{i,d} = 0$ for all $d \ne c$.

(P2) For every $I \in \mathcal{I}$ and $c \in [k]$ we have $\left| \sum_{i \in I} \hat{x}_{i,c} - r(I, c) \right| \le 1$.

(P3) Every $I \in \mathcal{I}$ is satisfied with probability at least $\gamma_k = \frac{k(k+1-H_{k-1})}{(k+1)!}$, where $H_{k-1}$ is the $(k-1)$st harmonic number $1 + \frac{1}{2} + \cdots + \frac{1}{k-1}$.

In other words, each position gets exactly one color (P1), every coloring requirement is off by at most one from the prescribed number (P2), and all the requirements for a given interval $I$ are satisfied exactly ($\sum_{i \in I} \hat{x}_{i,c} = r(I, c)$ for all $c \in [k]$) with probability at least $\gamma_k$. An interesting corollary of this theorem is that if $P$ is non-empty then there always exists a coloring satisfying at least $\gamma_k |\mathcal{I}|$ intervals, and such a coloring can be found in polynomial time.
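To get a feeling for how quickly the guarantee $\gamma_k$ of (P3) decays with the number of colors, the closed form can be evaluated exactly. The short Python sketch below does this with rational arithmetic; the helper names `harmonic` and `gamma` are chosen for this illustration only.

```python
from fractions import Fraction
from math import factorial

def harmonic(m):
    """m-th harmonic number H_m as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j) for j in range(1, m + 1))

def gamma(k):
    """gamma_k = k (k + 1 - H_{k-1}) / (k + 1)!  from property (P3)."""
    return Fraction(k) * (k + 1 - harmonic(k - 1)) / factorial(k + 1)

for k in range(2, 6):
    print(k, gamma(k), float(gamma(k)))
```

For instance, the formula gives $\gamma_2 = 2/3$ and $\gamma_3 = 5/16$, so with three exchange rates the corollary guarantees a coloring that satisfies at least $5/16$ of the intervals exactly whenever $P$ is non-empty.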

The high level idea is to simplify the polytope $P$ into another integral polytope with basic solutions satisfying (P1) and (P2). Then we show how to select a basic solution satisfying (P3). This is done by defining a set of blocks and then setting up an assignment problem instance between $[n]$ and the set of blocks, whose polytope is integral.

Fig. 1 How the blocks $B_j^c$ are constructed. The $x_{i,c}$ values appear on top and the $y_{i,(c,j)}$ values appear on the edges. Note that a block can only overlap with its predecessor or successor. In this case $\alpha_c = 0.7$

For each color class $c \in [k]$ we choose a real number $\alpha_c \in [0, 1]$, to be specified shortly. For a fixed color class $c$, let $b_c = \left\lceil \sum_{i \in [n]} x_{i,c} - \alpha_c \right\rceil + 1$. We define blocks $B_1^c, B_2^c, \ldots, B_{b_c}^c$ as follows

\[ B_j^c = \Bigl\{ t \in [n] : j - 2 + \alpha_c < \sum_{i \le t} x_{i,c} \ \text{ and } \ \sum_{i < t} x_{i,c} < j - 1 + \alpha_c \Bigr\}. \tag{5} \]

For each $i \in B_j^c$ we define a variable $y_{i,(c,j)}$. If $i$ belongs to a single block $B_j^c$ of color $c$ then we set $y_{i,(c,j)} = x_{i,c}$. Otherwise, $i$ belongs to two adjacent blocks $B_{j+1}^c$ and $B_j^c$, in which case we set $y_{i,(c,j+1)} = \sum_{l \le i} x_{l,c} - (j - 1 + \alpha_c)$ and $y_{i,(c,j)} = x_{i,c} - y_{i,(c,j+1)}$. See Fig. 1 for an example of how the blocks and the solution $y$ are constructed. Another, equivalent, way to define $y$ is to ask that $x_{i,c} = \sum_j y_{i,(c,j)}$, $\sum_{i \in B_1^c} y_{i,(c,1)} = \alpha_c$ and $\sum_{i \in B_j^c} y_{i,(c,j)} = 1$ for every $1 < j < b_c$.
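The block construction (5) can be made concrete for a single color class. The Python sketch below is an illustration under assumptions: it takes the column $x_{\cdot,c}$ as a list of exact fractions, computes each $y_{i,(c,j)}$ as the overlap between the cumulative-sum range of position $i$ and the unit range $(j-2+\alpha_c,\, j-1+\alpha_c)$ of block $j$, and returns the blocks together with $y$. The function name `build_blocks` and the sample values are hypothetical.

```python
from fractions import Fraction
from math import ceil

def build_blocks(xc, alpha):
    """Blocks B_1..B_b and values y[(i, j)] for one color class.

    xc[i - 1] is x_{i,c} for position i; alpha is the offset alpha_c.
    Both are expected to be Fractions so that comparisons are exact.
    """
    b = ceil(sum(xc) - alpha) + 1
    blocks = {j: [] for j in range(1, b + 1)}
    y = {}
    prefix = Fraction(0)
    for t, value in enumerate(xc, start=1):
        before, after = prefix, prefix + value
        for j in range(1, b + 1):
            lo, hi = j - 2 + alpha, j - 1 + alpha
            if lo < after and before < hi:      # membership test from (5)
                blocks[j].append(t)
                y[(t, j)] = min(after, hi) - max(before, lo)
        prefix = after
    return blocks, y

F = Fraction
xc = [F(2, 10), F(1, 10), F(5, 10), F(3, 10), F(6, 10), F(3, 10)]
blocks, y = build_blocks(xc, F(7, 10))
print(blocks)   # position 3 is shared by blocks 1 and 2
```

With these sample values position 3 belongs to two adjacent blocks and its mass $0.5$ is split into $0.4$ and $0.1$; the first block receives total mass $\alpha_c = 0.7$ and the interior block receives total mass $1$, in line with the equivalent characterization of $y$ given above.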

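Thus $y$ defines a feasible fractional assignment between $[n]$ and the set of blocks. Let $Q$ be the polytope of this assignment problem, namely, the set of vectors $y$ such that

\[ \sum_{(c,j)\,:\,B_j^c \ni i} y_{i,(c,j)} = 1 \quad \forall i \in [n], \tag{6} \]

\[ \sum_{i \in B_j^c} y_{i,(c,j)} = 1 \quad \forall c \in [k] \text{ and } 1 < j < b_c, \tag{7} \]

\[ \sum_{i \in B_j^c} y_{i,(c,j)} \le 1 \quad \forall c \in [k] \text{ and } j \in \{1, b_c\}, \tag{8} \]

\[ y_{i,(c,j)} \ge 0 \quad \forall i \in [n],\ c \in [k],\ j \in [b_c]. \tag{9} \]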

It is well known that the LP matrix defining $Q$ is totally unimodular [5, Chap. 18], which in turn implies that the extreme points of $Q$ are integral. Therefore, if there exists a fractional solution $y \in Q$ then there must exist an integral solution $\hat{y} \in Q$. Furthermore, we can find such an integral solution in polynomial time. Notice that an integral solution $\hat{y}$ to $Q$ induces an integral solution $\hat{x}$ by setting $\hat{x}_{i,c} = 1$ if and only if $\hat{y}_{i,(c,j)} = 1$ for some $j$. Constraint (6) implies that $\hat{x}$ satisfies (P1). Furthermore, $\hat{x}$ also satisfies (P2).

Lemma 1 Let $\hat{y}$ be an integral solution for $Q$ and let $\hat{x}$ be the coloring induced by $\hat{y}$. Then $\left| \sum_{i \in I} \hat{x}_{i,c} - r(I, c) \right| \le 1$ for all $I \in \mathcal{I}$ and $c \in [k]$.

Proof Since $\sum_{i \in I} x_{i,c} = r(I, c)$, the number of blocks of color $c$ that intersect $I$ is either $r(I, c)$ or $r(I, c) + 1$. Furthermore, at least $r(I, c) - 1$ of these blocks lie entirely within $I$ and at most two blocks intersect $I$ partially. Due to constraints (6) and (7), each internal block will force a different position in $I$ to be colored $c$. On the other hand, the fringe blocks, if any, can force at most two additional positions in $I$ to be colored $c$. Hence, the lemma follows. $\square$

Remark In our application domain the goal usually is not to find a single solution, but to generate a number of candidate solutions. An expert then examines this candidate set and selects the most biologically relevant one. Our framework is amenable to this task since there are very efficient algorithms to enumerate all the integral solutions of $Q$ [6].

It only remains to prove that $\hat{x}$ obeys (P3). To do so, we need to introduce some randomization in our construction. First, we will choose the offset $\alpha_c$ of each color $c \in [k]$ independently and uniformly at random. Second, instead of choosing any extreme point of $Q$, we choose one using a randomized rounding procedure.

Gandhi et al. [4] showed that any fractional solution $y \in Q$ can be rounded to an integral solution $\hat{y} \in Q$ such that the probability that $\hat{y}_{i,(c,j)} = 1$ is exactly $y_{i,(c,j)}$. It is important to note that these events are not independent of each other.

Lemma 2 Let $\hat{y}$ be the solution output by the randomized rounding procedure and $\hat{x}$ the coloring induced by it. For any interval $I \in \mathcal{I}$, the probability that $\sum_{i \in I} \hat{x}_{i,c} = r(I, c)$ for all $c \in [k]$ is at least $\frac{k(k+1-H_{k-1})}{(k+1)!}$.

Proof Let $I$ be an arbitrary, but fixed, interval throughout the proof and for the time being let us concentrate on a fixed, but arbitrary, color $c \in [k]$. Let $f$ and $l$ be the indices of the first and last blocks of color class $c$ that intersect $I$ and define $\beta_c = \sum_{i \in I \cap B_f^c} y_{i,(c,f)}$, or, equivalently, $\sum_{i \in I \cap B_l^c} y_{i,(c,l)} = 1 - \beta_c$.

Intuitively, the probability that $\sum_{i \in I} \hat{x}_{i,c} = r(I, c)$ should be greater when the blocks of $c$ are aligned with $I$ (when $\beta_c$ is close to $0$ or $1$) and it should be low when they are not (when $\beta_c$ is around $0.5$). By choosing $\alpha_c$ uniformly at random, $\beta_c$ also becomes a random variable uniformly distributed in $[0, 1]$. Thus, we have a decent chance of getting a "good value" of $\beta_c$.

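Let us formalize and make more precise the above idea. Denote with $\xi_f$ and $\xi_l$ the events $\sum_{i \in I \cap B_f^c} \hat{y}_{i,(c,f)} = 1$ and $\sum_{i \in I \cap B_l^c} \hat{y}_{i,(c,l)} = 1$ respectively. Let $\beta = (\beta_1, \ldots, \beta_k)$ be the vector of offsets for the color classes. For brevity's sake we denote $\Pr[\,\cdot \mid \beta\,]$ with $\Pr_\beta[\,\cdot\,]$.

\[
\begin{aligned}
\Pr_\beta\Bigl[\sum_{i \in I} \hat{x}_{i,c} \ne r(I, c)\Bigr]
&= \Pr_\beta\bigl[\xi_f \xi_l \vee \bar{\xi}_f \bar{\xi}_l\bigr] \\
&= \Pr_\beta\bigl[\xi_f \xi_l\bigr] + \Pr_\beta\bigl[\bar{\xi}_f \bar{\xi}_l\bigr] \\
&\le \min\bigl\{\Pr_\beta[\xi_f], \Pr_\beta[\xi_l]\bigr\} + \min\bigl\{\Pr_\beta[\bar{\xi}_f], \Pr_\beta[\bar{\xi}_l]\bigr\}.
\end{aligned}
\]

Since $\Pr_\beta[\xi_f] = \beta_c$ and $\Pr_\beta[\xi_l] = 1 - \beta_c$, it follows that

\[ \Pr_\beta\Bigl[\sum_{i \in I} \hat{x}_{i,c} \ne r(I, c)\Bigr] \le 2 \min\{\beta_c, 1 - \beta_c\}. \tag{10} \]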

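As a warm-up we first show that the probability that all requirements for $I$ are fulfilled is at least $\frac{1}{(k+1)!}$. Denote with $\Gamma$ the event $\forall c : \sum_{i \in I} \hat{x}_{i,c} = r(I, c)$. Recall that the vector $\beta$ is distributed uniformly over the domain $D = [0, 1]^k$. Conditioning on $\beta$ and averaging over $D$ gives the desired result.

\[
\begin{aligned}
\Pr[\Gamma]
&= \int_D \Pr_\beta\Bigl[\forall c : \sum_{i \in I} \hat{x}_{i,c} = r(I, c)\Bigr] \, d\beta_1 \cdots d\beta_k \\
&\ge \int_D \max\Bigl\{0,\ 1 - \sum_{c \in [k]} \Pr_\beta\Bigl[\sum_{i \in I} \hat{x}_{i,c} \ne r(I, c)\Bigr]\Bigr\} \, d\beta_1 \cdots d\beta_k \\
&\ge \int_D \max\Bigl\{0,\ 1 - 2 \sum_{c \in [k]} \min\{\beta_c, 1 - \beta_c\}\Bigr\} \, d\beta_1 \cdots d\beta_k \\
&= 2 \int_D \max\Bigl\{0,\ \tfrac{1}{2} - \sum_{c \in [k]} \min\{\beta_c, 1 - \beta_c\}\Bigr\} \, d\beta_1 \cdots d\beta_k.
\end{aligned}
\]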

The first inequality follows from the union bound and the second from (10). A moment's thought reveals that the function inside the integral is symmetrical in the $2^k$ orthants around the point $(\frac{1}{2}, \ldots, \frac{1}{2}) \in D$. Therefore, setting $D' = [0, \frac{1}{2}]^k$ we get

\[ \Pr[\Gamma] \ge 2^{k+1} \int_{D'} \max\Bigl\{0,\ \tfrac{1}{2} - \sum_{c \in [k]} \beta_c\Bigr\} \, d\beta_1 \cdots d\beta_k. \]

D eElk]

The right hand side of the above inequality can be interpreted as the volume of a $(k+1)$-dimensional simplex.

\[ \Pr[\Gamma] \ge 2^{k+1} \, \mathrm{Vol}\Bigl(\lambda \in \mathbb{R}_+^{k+1} : \sum_{i \in [k+1]} \lambda_i \le \tfrac{1}{2}\Bigr) = 2^{k+1} \, \frac{(\frac{1}{2})^{k+1}}{(k+1)!} = \frac{1}{(k+1)!}. \]

In order to get the stronger bound in the statement of the lemma we need two more ideas. First, we claim that we only need to condition on fulfilling $k - 1$ requirements: because $\sum_{c \in [k]} r(I, c) = |I|$, once we get $k - 1$ colors right, the $k$th requirement must be satisfied as well. Second, since we can condition on any $k - 1$ colors, we had better condition on the ones with smallest offset, that is, those that are close to $0$ or $1$.

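\[
\begin{aligned}
\Pr[\Gamma]
&= \int_D \Pr_\beta\Bigl[\forall c : \sum_{i \in I} \hat{x}_{i,c} = r(I, c)\Bigr] \, d\beta_1 \cdots d\beta_k \\
&\ge \int_D \max_{d \in [k]} \max\Bigl\{0,\ 1 - \sum_{c \ne d} \Pr_\beta\Bigl[\sum_{i \in I} \hat{x}_{i,c} \ne r(I, c)\Bigr]\Bigr\} \, d\beta_1 \cdots d\beta_k \\
&\ge \int_D \max_{d \in [k]} \max\Bigl\{0,\ 1 - 2 \sum_{c \ne d} \min\{\beta_c, 1 - \beta_c\}\Bigr\} \, d\beta_1 \cdots d\beta_k \\
&= 2^k \int_{D'} \max_{d \in [k]} \max\Bigl\{0,\ 1 - 2 \sum_{c \ne d} \beta_c\Bigr\} \, d\beta_1 \cdots d\beta_k \\
&= 2^{k+1} \int_{D'} \max\Bigl\{0,\ \tfrac{1}{2} - \sum_{c \in [k]} \beta_c + \max_{d \in [k]} \beta_d\Bigr\} \, d\beta_1 \cdots d\beta_k.
\end{aligned}
\]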

CE[k]

The last integral can be simplified by assuming that the maximum $\beta_d$ is attained by the last variable. Of course, the maximum can be any of the $k$ variables, thus these two quantities are related by a factor of $k$.

\[ \Pr[\Gamma] \ge k \, 2^{k+1} \int_0^{1/2} \Bigl[ \int_{[0,z]^{k-1}} \max\Bigl\{0,\ \tfrac{1}{2} - \sum_{c \in [k-1]} \beta_c\Bigr\} \, d\beta_1 \cdots d\beta_{k-1} \Bigr] dz. \]

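Let $T(z)$ denote $\mathrm{Vol}\bigl(\lambda \in \mathbb{R}_+^k : \sum_{i=1}^k \lambda_i \le \frac{1}{2} \text{ and } \lambda_1, \ldots, \lambda_{k-1} \le z\bigr)$. Then we can rewrite the above integral as

\[ \Pr[\Gamma] \ge k \, 2^{k+1} \int_0^{1/2} T(z) \, dz. \tag{11} \]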

The volume computed by $T(z)$ is not a simplex, but it can be reduced to a summation involving only the volume of simplices using the principle of inclusion/exclusion.

Let $V(p)$ denote the volume $\mathrm{Vol}\bigl(\lambda \in \mathbb{R}_+^k : \sum_{i=1}^k \lambda_i \le p\bigr)$ and recall that $V(p) = \frac{p^k}{k!}$. Consider what happens when $z \in [\frac{1}{4}, \frac{1}{2})$; clearly $T(z) < V(\frac{1}{2})$ since $V(\frac{1}{2})$ includes points $\lambda \in \mathbb{R}_+^k$ such that $\lambda_i > z$ for exactly one coordinate $i \in [k-1]$ (since $z \ge \frac{1}{4}$). Notice that

\[ \mathrm{Vol}\Bigl(\lambda \in \mathbb{R}_+^k : \sum_{j=1}^k \lambda_j \le \tfrac{1}{2} \text{ and } \lambda_i > z\Bigr) = V\bigl(\tfrac{1}{2} - z\bigr). \]

Thus $T(z) = V(\frac{1}{2}) - (k-1) V(\frac{1}{2} - z)$ for $z \in [\frac{1}{4}, \frac{1}{2}]$, but $T(z) > V(\frac{1}{2}) - (k-1) V(\frac{1}{2} - z)$ for $z \in [0, \frac{1}{4})$ since the volume of points $\lambda$ such that the constraint $\lambda_i \le z$ is violated for two coordinates is subtracted twice. To avoid cumbersome notation, assume $V(p) = 0$ if $p \le 0$. A simple inclusion/exclusion argument yields

\[ T(z) = \sum_{i=0}^{k-1} \binom{k-1}{i} (-1)^i \, V\bigl(\tfrac{1}{2} - iz\bigr). \tag{12} \]

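Plugging (12) into (11) we get

\[
\begin{aligned}
\Pr[\Gamma]
&\ge 2^{k+1} k \Bigl( \int_0^{1/2} V\bigl(\tfrac{1}{2}\bigr) \, dz + \sum_{i=1}^{k-1} \binom{k-1}{i} (-1)^i \int_0^{1/2} V\bigl(\tfrac{1}{2} - iz\bigr) \, dz \Bigr) \\
&= 2^{k+1} k \Bigl( \int_0^{1/2} \frac{(\frac{1}{2})^k}{k!} \, dz + \sum_{i=1}^{k-1} \binom{k-1}{i} (-1)^i \int_0^{\frac{1}{2i}} \frac{(\frac{1}{2} - iz)^k}{k!} \, dz \Bigr) \\
&= 2^{k+1} k \Bigl( \Bigl[ \frac{z}{k! \, 2^k} \Bigr]_0^{1/2} + \sum_{i=1}^{k-1} \binom{k-1}{i} (-1)^i \Bigl[ \frac{(\frac{1}{2} - iz)^{k+1}}{(k+1)! \, (-i)} \Bigr]_0^{\frac{1}{2i}} \Bigr) \\
&= 2^{k+1} k \Bigl( \frac{1}{k! \, 2^{k+1}} + \sum_{i=1}^{k-1} \binom{k-1}{i} \frac{(-1)^i}{(k+1)! \, 2^{k+1} \, i} \Bigr) \\
&= \frac{k}{(k+1)!} \Bigl( k + 1 + \sum_{i=1}^{k-1} \binom{k-1}{i} \frac{(-1)^i}{i} \Bigr).
\end{aligned}
\]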

Using induction on $k$, it is straightforward to show that the sum in the last line adds up exactly to $-H_{k-1}$, which gives us the desired bound of $\gamma_k$. $\square$
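The closing identity, $\sum_{i=1}^{k-1} \binom{k-1}{i} \frac{(-1)^i}{i} = -H_{k-1}$, is easy to sanity-check numerically for small $k$. The following Python sketch does so with exact rational arithmetic; it is a check for illustration, not a substitute for the induction argument, and the function names are chosen for this example.

```python
from fractions import Fraction
from math import comb

def alternating_sum(k):
    """Sum over i = 1..k-1 of C(k-1, i) * (-1)^i / i, as an exact fraction."""
    return sum(Fraction(comb(k - 1, i) * (-1) ** i, i) for i in range(1, k))

def harmonic(m):
    """m-th harmonic number H_m as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, j) for j in range(1, m + 1))

def identity_holds(max_k):
    """True if the sum equals -H_{k-1} for every k from 1 to max_k."""
    return all(alternating_sum(k) == -harmonic(k - 1) for k in range(1, max_k + 1))

print(identity_holds(20))
```

For $k = 3$, for example, the sum is $-2 + \frac{1}{2} = -\frac{3}{2} = -H_2$.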

3 Maximum Coloring

In this section we study a variant of the interval constrained coloring problem to deal with instances that do not admit a feasible coloring. For these instances we consider the problem of finding a coloring that maximizes the number of intervals satisfying (1).

More generally, we assume a non-negative weight $w(I)$, associated with each interval $I \in \mathcal{I}$, and seek a subset $\mathcal{I}' \subseteq \mathcal{I}$, maximizing $w(\mathcal{I}') = \sum_{I \in \mathcal{I}'} w(I)$, such that there exists a coloring of $V$ satisfying (1) for each $I \in \mathcal{I}'$. We call this problem MAXCOLORING. Let $\mathrm{OPT} \subseteq \mathcal{I}$ be a subset achieving this maximum. For $\alpha \in (0, 1]$ and $\beta \ge 1$, an $(\alpha, \beta)$-approximation of the problem is given by a pair $(\chi, \mathcal{I}')$ of a subset $\mathcal{I}' \subseteq \mathcal{I}$, and a coloring $\chi : V \to [k]$, such that $\sum_{I \in \mathcal{I}'} w(I) \ge \alpha \cdot w(\mathrm{OPT})$, and $\frac{1}{\beta} r(I, c) \le N_\chi(I, c) \le \beta \, r(I, c)$, where $N_\chi(I, c)$ is the number of positions in $I$ colored $c$ by $\chi$.


Theorem 2 Consider an instance $(V, \mathcal{I})$ of MAXCOLORING with $|V| = n$ and $|\mathcal{I}| = m$. Then we can find a $(1, 1+\epsilon)$-approximation in quasi-polynomial time $n^{O(\frac{k^2}{\epsilon} \log n \log m)}$, for any $\epsilon > 0$.

Note that the above bound is quasi-polynomial for $k = \mathrm{polylog}(n, m)$. To prove Theorem 2 we use a similar technique as in [7]. Our approach can be divided into two parts: (i) Reducing the search space to $\epsilon$-partial assignments (to be defined): An $\epsilon$-partial assignment represents a set of colorings, for which we can bound the range of the number of vertices of a certain color within intervals $I \in \mathcal{I}$ by a factor of $(1+\epsilon)$. Evaluating intervals $I \in \mathcal{I}$ on the basis of $\epsilon$-partial assignments then allows us to limit the violation of their requirements by any of the corresponding colorings to a factor of $(1+\epsilon)$. (ii) Developing a divide-and-conquer algorithm that finds an $\epsilon$-partial assignment, and thus a coloring, that "satisfies" a maximum weight set of intervals. We explain these two steps in more detail in the next subsections.

3.1 Reducing the Search Space

Let $\epsilon > 0$ be a given constant. For a vertex $u \in V$ and a set of intervals $\mathcal{I}$ on $V$, denote respectively by $\mathcal{I}_L(u)$, $\mathcal{I}_R(u)$ and $\mathcal{I}(u)$, the subsets of intervals of $\mathcal{I}$ that lie to the left of $u$, lie to the right of $u$, and span $u$, that is

\[ \mathcal{I}_L(u) = \{[s, t] \in \mathcal{I} : t \le u - 1\}, \quad \mathcal{I}_R(u) = \{[s, t] \in \mathcal{I} : s \ge u + 1\}, \quad \mathcal{I}(u) = \{[s, t] \in \mathcal{I} : s \le u \le t\}. \]

Denote by $V_L(u)$ and $V_R(u)$ the sets of vertices that lie to the left and right of $u \in V$, respectively: $V_L(u) = \{i \in V : i \le u\}$ and $V_R(u) = \{i \in V : i > u\}$.
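The three interval families are straightforward to compute. The Python sketch below splits a list of intervals around a vertex $u$ and, as one possible way to pick a dividing vertex, scans for the first $u$ for which neither $\mathcal{I}_L(u)$ nor $\mathcal{I}_R(u)$ contains more than half of the intervals. The function names and the linear scan are assumptions of this illustration; the paper does not prescribe how such a vertex is located.

```python
def split_intervals(intervals, u):
    """Return (I_L(u), I_R(u), I(u)) for intervals given as (s, t) pairs."""
    left = [(s, t) for (s, t) in intervals if t <= u - 1]
    right = [(s, t) for (s, t) in intervals if s >= u + 1]
    spanning = [(s, t) for (s, t) in intervals if s <= u <= t]
    return left, right, spanning

def balanced_vertex(intervals, n):
    """First u in 1..n with |I_L(u)| <= m/2 and |I_R(u)| <= m/2."""
    m = len(intervals)
    for u in range(1, n + 1):
        left, right, _ = split_intervals(intervals, u)
        if 2 * len(left) <= m and 2 * len(right) <= m:
            return u
    return None

intervals = [(1, 3), (2, 5), (4, 6), (6, 8), (7, 8)]
u = balanced_vertex(intervals, 8)
print(u, split_intervals(intervals, u))
```

Such a vertex always exists: take the largest $u$ with $|\mathcal{I}_L(u)| \le m/2$; either $u = n$ and $\mathcal{I}_R(u)$ is empty, or more than $m/2$ intervals end at or before $u$, so fewer than $m/2$ can start after it.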

Our divide and conquer algorithm will reduce the original problem into many subproblems which are very easy to solve. More precisely, the algorithm constructs a recursion tree such that, in the subproblems defined at the leaf nodes, the requirements are essentially defined on intervals that either start or end at the same point. This motivates the following definition.

Definition 1 (Assignment) Let $V' = \{p, p+1, \ldots, q\}$. An assignment on $V'$ is a pair $\mathcal{A} = (\mathcal{I}, r)$ of intervals $\mathcal{I}$ on $V'$ and a function $r : \mathcal{I} \times [k] \to \{0, 1, \ldots, |V'|\}$ such that

(C1) $r(I, c) \le r(I', c)$ for all $I, I' \in \mathcal{I}$ with $I \subseteq I'$ and all $c \in [k]$, and

(C2) $\sum_{c \in [k]} r(I, c) = |I|$ for every $I \in \mathcal{I}$.

$\mathcal{A}$ is called a left-assignment, respectively right-assignment, if all intervals in $\mathcal{I}$ start at $p$, respectively end at $q$.

We will show in Lemma 4 that the subproblem corresponding to a left- or right-assignment is straightforward to solve. To reduce the original requirements on arbitrary intervals into requirements on left- and right-assignments we use a simple idea.


Suppose that we consider all intervals containing a given vertex $u^*$. Then an optimal coloring can be used to divide the requirements on any such interval $I$ into two groups: the requirements on subintervals of $I$ to the left of $u^*$ and the ones on right subintervals of $I$. These two groups define respectively a right- and a left-assignment.

The main observation now (see Lemma 3 and Observation 1 below) is that if we are willing to have a violation of the requirement by a factor of $(1+\epsilon)$ then we do not need to guess all left- and right-assignments; it is enough to guess the ones at which the number of vertices of a certain color increases by powers of $(1+\epsilon)$. This restriction leads to a logarithmic number of intervals in the assignments. Since the total number of left- and right-assignments we have to evaluate in the course of our algorithm depends exponentially on the number of intervals these assignments contain, this restriction is necessary to obtain the desired polylogarithmic time bound. The resulting guess is represented by an $\epsilon$-partial assignment, which we will define next.

Definition 2 ($\epsilon$-Partial Assignment) Let $u^* \in V'$ be a given vertex of $V' = \{p, p+1, \ldots, q\}$. A set of $h_1 + h_2 + 2$ intervals $\mathcal{I} = \mathcal{I}_A \cup \mathcal{I}_B$, $\mathcal{I}_A = \{I_1, \ldots, I_{h_1}, I_{h_1+1}\}$ and $\mathcal{I}_B = \{I'_1, \ldots, I'_{h_2}, I'_{h_2+1}\}$, together with a function $r : \mathcal{I} \times [k] \to \{0, 1, \ldots, |V|\}$, such that

(R1) all intervals end at $u^*$, or start at $u^* + 1$, respectively: $I_j = [u_j, u^*]$ for $j \in \{1, 2, \ldots, h_1\}$, $I_{h_1+1} = [p, u^*]$, $I'_j = [u^* + 1, u'_j]$ for $j \in \{1, 2, \ldots, h_2\}$, and $I'_{h_2+1} = [u^* + 1, q]$, where $u_{h_1} < u_{h_1-1} < \cdots < u_1 < u^*$ and $u^* + 1 < u'_1 < u'_2 < \cdots < u'_{h_2}$,

(R2) $(\mathcal{I}_A, r)$ is a right-assignment on $\{p, \ldots, u^*\}$, and $(\mathcal{I}_B, r)$ is a left-assignment on $\{u^* + 1, \ldots, q\}$,

(R3) for every $I \in \mathcal{I} \setminus \{I_{h_1+1}, I'_{h_2+1}\}$ there exists a color $c \in [k]$ and an integer $i \in \mathbb{Z}_+$ such that $r(I, c) = \lceil (1+\epsilon)^i \rceil$, and

(R4) for every color $c \in [k]$ and integer $i \in \mathbb{Z}_+$ with $i \le \lfloor \log r(I_{h_1+1}, c) / \log(1+\epsilon) \rfloor$, there exists $I \in \mathcal{I}_A$ such that $r(I, c) = \lceil (1+\epsilon)^i \rceil$; likewise, for every $c \in [k]$ and $i \in \mathbb{Z}_+$ with $i \le \lfloor \log r(I'_{h_2+1}, c) / \log(1+\epsilon) \rfloor$, there exists $I' \in \mathcal{I}_B$ such that $r(I', c) = \lceil (1+\epsilon)^i \rceil$,

will be called an $\epsilon$-partial assignment w.r.t. $u^*$, denoted by $\mathcal{P} = (u^*, \mathcal{I}, r)$.

Properties (R3) and (R4) together ensure that the intervals in the left- and right-assignment defining the $\epsilon$-partial assignment are exactly those where the number of vertices of a certain color increases by a power of $(1+\epsilon)$. From property (C2) of an assignment (with $|I| \le n$) and property (R3) of an $\epsilon$-partial assignment it follows that $h_1, h_2 \le \lceil k \log n / \log(1+\epsilon) - 1 \rceil$. In Fig. 2 vertices $\{p, p+1, \ldots, u^*\} \subseteq V'$ are shown along with four intervals from $\mathcal{I}_A$, all ending at $u^*$ (R1). Note that intervals $I_{j_1}$, $I_{j_2}$ and $I_{j_h}$ satisfy condition (R4) for color $c \in [k]$, since $r(I_{j_1}, c) = \lceil (1+\epsilon)^1 \rceil$, $r(I_{j_2}, c) = \lceil (1+\epsilon)^2 \rceil$ and $r(I_{j_h}, c) = \lceil (1+\epsilon)^h \rceil$, for $h = \lceil \log r(I_{h_1+1}, c) / \log(1+\epsilon) - 1 \rceil$.

Fig. 2 The number of vertices in interval $[u, u^*]$ colored $c \in [k]$ by $\chi$ is monotonically increasing on $|u - u^*|$. According to (R4), an $\epsilon$-partial assignment consistent with $\chi$ has to contain intervals $I_{j_1}$, $I_{j_2}$ and $I_{j_h}$ with $r(I_{j_1}, c) = \lceil (1+\epsilon)^1 \rceil$, $r(I_{j_2}, c) = \lceil (1+\epsilon)^2 \rceil$ and $r(I_{j_h}, c) = \lceil (1+\epsilon)^h \rceil$, for $h = \lceil \log r(I_{h_1+1}, c) / \log(1+\epsilon) - 1 \rceil$. Interval $I_{h_1+1} = [p, u^*]$ is required by (R1)

The total number $\mu(n)$ of possible $\epsilon$-partial assignments with respect to a given vertex $u^* \in V$ with $|V| = n$ can be bounded as follows: There are at most $n^{h_1+h_2}$ possible choices for the vertices $u_1, u_2, \ldots, u_{h_1}, u'_1, u'_2, \ldots, u'_{h_2}$. For each interval $I \in \mathcal{I}$, the number of non-negative integer requirements $r(I, c)$, $c \in [k]$, satisfying (C2) is

\[
\binom{|I| + k - 1}{k - 1}
= \prod_{j=1}^{k-1} \Bigl(1 + \frac{|I|}{j}\Bigr)
\le \Bigl( \frac{1}{k-1} \sum_{j=1}^{k-1} \Bigl(1 + \frac{|I|}{j}\Bigr) \Bigr)^{k-1}
= \Bigl(1 + \frac{|I|}{k-1} H_{k-1}\Bigr)^{k-1}
\le \Bigl(1 + \frac{|I|}{k-1} \bigl(1 + \ln(k-1)\bigr)\Bigr)^{k-1}.
\]

The first inequality follows from the inequality of arithmetic and geometric means, which states that for any $n$ non-negative real numbers $x_1, x_2, \ldots, x_n$

\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdot x_2 \cdots x_n}. \tag{13} \]

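Letting $i_1 = |I_1|$, $i_j = |I_j \setminus I_{j-1}|$, for $2 \le j \le h_1 + 1$, and similarly $i'_1 = |I'_1|$, $i'_j = |I'_j \setminus I'_{j-1}|$, for $2 \le j \le h_2 + 1$, we observe by (C1) and (C2) that

\[
\begin{aligned}
\mu(n)
&\le n^{h_1+h_2} \prod_{j=1}^{h_1+1} \binom{i_j + k - 1}{k - 1} \prod_{j=1}^{h_2+1} \binom{i'_j + k - 1}{k - 1} \\
&\le n^{h_1+h_2} \prod_{j=1}^{h_1+1} \Bigl(1 + \frac{i_j}{k-1} (\ln k + 1)\Bigr)^{k-1} \prod_{j=1}^{h_2+1} \Bigl(1 + \frac{i'_j}{k-1} (\ln k + 1)\Bigr)^{k-1} \\
&\le n^{h_1+h_2} \Biggl( \frac{\sum_{j=1}^{h_1+1} \bigl(1 + \frac{i_j}{k-1} (\ln k + 1)\bigr) + \sum_{j=1}^{h_2+1} \bigl(1 + \frac{i'_j}{k-1} (\ln k + 1)\bigr)}{h_1 + h_2 + 2} \Biggr)^{(k-1)(h_1+h_2+2)} \\
&= n^{h_1+h_2} \Bigl(1 + \frac{\ln k + 1}{k - 1} \cdot \frac{n}{h_1 + h_2 + 2}\Bigr)^{(k-1)(h_1+h_2+2)} \\
&\le n^{\frac{2k^2 \log n}{\log(1+\epsilon)} + 4k - 2} \cdot \Bigl(2 \cdot \frac{\ln k + 1}{k - 1}\Bigr)^{(k-1)\bigl(\frac{2k \log n}{\log(1+\epsilon)} + 4\bigr)},
\end{aligned}
\tag{14}
\]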

which is $n^{\mathrm{polylog}(n)}$ for every fixed $\epsilon > 0$ and $k = \mathrm{polylog}(n)$. The third inequality follows from (13), and concerning the last step of the calculation, note that the $+1$ summand in big brackets can be replaced by a multiplicative factor of $2$ for sufficiently large $n$. Using $h_1, h_2 \le k \log n / \log(1+\epsilon) + 1$ then gives the desired bound.

Definition 3 (Consistent Assignment) Let $\chi : V \to [k]$ be a coloring of $V$. We say that an assignment $\mathcal{A} = (\mathcal{I}, r)$ is consistent with $\chi$ if $N_\chi(I, c) = r(I, c)$ for all $c \in [k]$ and $I \in \mathcal{I}$. Two assignments $\mathcal{A}_1$ and $\mathcal{A}_2$ are said to be consistent if there exists a coloring $\chi$ with which both are consistent.

Lemma 3 Let $\chi$ be a coloring of $V'$ and $u^* \in V'$ be an arbitrary vertex. Then there exists an $\epsilon$-partial assignment $\mathcal{P} = (u^*, \mathcal{I}, r)$ on $V'$ w.r.t. $u^*$ that is consistent with $\chi$.

Proof Assume that $V' = \{p, p+1, \ldots, q\}$. Clearly, for every $c \in [k]$ the function $N_\chi([u^* + 1, u], c)$ is monotonically increasing on $u > u^*$ with a maximum positive increment of $1$. This allows us to define $\mathcal{P}$ as follows. Let $u'_0 = u^* + 1$. For $j = 1, 2, \ldots, h_2$ let

\[ u'_j = \min\bigl\{ u > u'_{j-1} \ \big|\ \exists\, i \in \mathbb{Z}_+,\ c \in [k] : N_\chi([u^* + 1, u], c) = \lceil (1+\epsilon)^i \rceil \text{ and } \chi(u) = c \bigr\}. \tag{15} \]

The highest index $j$ for which such a $u'_j < q$ exists determines the value of $h_2$. In accordance with condition (R1), we set $I'_j = [u^* + 1, u'_j]$ for $j = 1, 2, \ldots, h_2$ and $I'_{h_2+1} = [u^* + 1, q]$. In a similar way we define $h_1$ and the intervals $I_j$ for $j = 1, 2, \ldots, h_1 + 1$ (see Fig. 2). Finally, we define $r(I, c) = N_\chi(I, c)$ for all $c \in [k]$ and $I \in \mathcal{I}$, which naturally satisfies (C1) and (C2). The definition of interval endpoints according to (15) guarantees (R3) and (R4). $\square$
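The construction in (15) amounts to a single left-to-right scan. The Python sketch below computes the right-hand breakpoints $u'_1 < \cdots < u'_{h_2}$ for a given coloring. It rests on two assumptions that are made for this illustration only: the exponents $i$ start at $1$ (as in the example of Fig. 2), and $\epsilon$ is supplied as an exact fraction so that $\lceil (1+\epsilon)^i \rceil$ is computed without rounding error. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def thresholds(eps, limit):
    """The values ceil((1 + eps)^i) for i >= 1 that do not exceed limit."""
    marks, power, base = set(), Fraction(1), 1 + Fraction(eps)
    while True:
        power *= base
        value = ceil(power)
        if value > limit:
            return marks
        marks.add(value)

def right_breakpoints(coloring, u_star, q, eps):
    """Breakpoints u'_j of (15); coloring maps each vertex in u*+1..q to a color."""
    marks = thresholds(eps, q - u_star)
    counts, points = {}, []
    for u in range(u_star + 1, q + 1):
        c = coloring[u]
        counts[c] = counts.get(c, 0) + 1        # N_chi([u*+1, u], c)
        # (15) asks for u > u'_0 = u* + 1, and the proof keeps only u'_j < q
        if u_star + 1 < u < q and counts[c] in marks:
            points.append(u)
    return points

alternating = {u: "ab"[(u - 1) % 2] for u in range(1, 11)}
print(right_breakpoints(alternating, 0, 10, Fraction(1, 2)))
```

Each vertex can trigger at most one breakpoint, since only the count of its own color changes there, so scanning in increasing order yields exactly the successive minima in (15). The left-hand breakpoints $u_j$ would be obtained by the mirrored scan from $u^*$ down to $p$.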

We observe that an $\epsilon$-partial assignment $\mathcal{P}$ is an effective abstraction of the set of colorings that $\mathcal{P}$ is consistent with:

Fig. 3 For an $\epsilon$-partial assignment $\mathcal{P}$ w.r.t. $u^*$ and a given interval $I \in \mathcal{I}$, $j(I, \mathcal{P})$ and $\ell(I, \mathcal{P})$ are defined to be the smallest and largest indices, respectively, such that $[u_{j(I,\mathcal{P})}, u'_{\ell(I,\mathcal{P})}] \subseteq I$

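Observation 1 Let $\mathcal{P} = (u^*, \mathcal{I}_{\mathcal{P}}, r_{\mathcal{P}})$ be an $\epsilon$-partial assignment w.r.t. $u^*$. Given an interval $I = [s, t] \in \mathcal{I}$ with $u^* \in I$, we let $j(I, \mathcal{P})$ and $\ell(I, \mathcal{P})$ be, respectively, the smallest and largest indices such that $[u_{j(I,\mathcal{P})}, u'_{\ell(I,\mathcal{P})}] \subseteq I$, i.e. $j(I, \mathcal{P}) = \min\{i : u_i \ge s\}$ and $\ell(I, \mathcal{P}) = \max\{i : u'_i \le t\}$ (see Fig. 3). If either of these indices does not exist, we set the corresponding value of $r_{\mathcal{P}}(I'_{\ell(I,\mathcal{P})}, c)$ or $r_{\mathcal{P}}(I_{j(I,\mathcal{P})}, c)$ to $0$. Then by property (R4) of an $\epsilon$-partial assignment

\[ r_{\mathcal{P}}(I'_{\ell(I,\mathcal{P})}, c) + r_{\mathcal{P}}(I_{j(I,\mathcal{P})}, c) \le N_{\chi'}(I, c) \le (1+\epsilon) \bigl( r_{\mathcal{P}}(I'_{\ell(I,\mathcal{P})}, c) + r_{\mathcal{P}}(I_{j(I,\mathcal{P})}, c) \bigr) \tag{16} \]

holds for any $c \in [k]$ and coloring $\chi' : V \to [k]$ such that $\mathcal{P}$ is consistent with $\chi'$.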

In the next section we show how to compute an assignment that represents $(1, 1+\epsilon)$-approximate colorings by recursively merging consistent $\epsilon$-partial assignments.

3.2 A Divide-and-Conquer Algorithm

The pseudocode describing our divide-and-conquer (D&C) algorithm is presented below as a procedure called MAXCOLORINGAPPROX, which takes as parameters an instance $(n, k, \mathcal{I}, r)$ of problem MAXCOLORING and consistent left- and right-assignments $\mathcal{A}_L$ and $\mathcal{A}_R$. To compute a $(1, 1+\epsilon)$-approximation, we set $\mathcal{A}_L$ and $\mathcal{A}_R$ to be empty in the initial call.

The algorithm is based on a divide-and-conquer paradigm where a vertex $u^*$ in the middle of $V$ is picked and all intervals containing $u^*$ are evaluated to determine whether they should be taken into the solution. To do this evaluation conservatively, the procedure iterates over all $\epsilon$-partial assignments $\mathcal{P}$ w.r.t. the middle vertex $u^*$ that are consistent with $\mathcal{A}_L$ and $\mathcal{A}_R$, then recurses on the subsets of intervals to the left and right of $u^*$.
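One way to pick such a middle vertex is sketched below in Python. It assumes, as the text suggests, that the left and right subproblems contain the intervals lying strictly to the left and strictly to the right of $u^*$; choosing the median right endpoint is an illustrative choice that meets the balance condition, not necessarily the rule used by the authors. The names `pick_pivot` and `split` are hypothetical.

```python
def pick_pivot(intervals):
    """Return a vertex u* such that at most half of the intervals end before
    u* and at most half start after u*. Intervals are (s, t) pairs."""
    if not intervals:
        raise ValueError("need at least one interval")
    right_ends = sorted(t for _, t in intervals)
    # The ceil(m/2)-th smallest right endpoint (0-based index (m - 1) // 2).
    return right_ends[(len(right_ends) - 1) // 2]


def split(intervals, u_star):
    """Partition the intervals into those left of, containing, and right of u*."""
    left = [(s, t) for s, t in intervals if t < u_star]
    mid = [(s, t) for s, t in intervals if s <= u_star <= t]
    right = [(s, t) for s, t in intervals if s > u_star]
    return left, mid, right
```

Fewer than $\lceil m/2 \rceil$ intervals end before the chosen endpoint, and every interval that starts after it also ends after it, so at most $\lfloor m/2 \rfloor$ intervals land on the right. This halving is what bounds the recursion depth by $O(\log m)$.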

Procedure MAXCOLORINGAPPROX uses two subroutines: MAXCOLORINGSPECIAL checks whether a pair of a left- and a right-assignment is consistent, and if so, returns a feasible coloring; REDUCE($V_L(u^*)$, $\mathcal{P}$, $\mathcal{A}_L$, $\mathcal{A}_R$) (REDUCE($V_R(u^*)$, $\mathcal{P}$, $\mathcal{A}_L$, $\mathcal{A}_R$)) combines the assignments $\mathcal{P}$, $\mathcal{A}_L$ and $\mathcal{A}_R$ into left- and right-assignments $\mathcal{A}'_L$, $\mathcal{A}'_R$ on $V_L(u^*)$ (respectively, on $V_R(u^*)$). For a more detailed description of the two subroutines see below.

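From the recursive calls in lines 7 and 8 we obtain two independent colorings $\chi_1 : V_L(u^*) \to [k]$ and $\chi_2 : V_R(u^*) \to [k]$, which are combined in line 9 into a coloring $\chi = \chi_1 \cup \chi_2$ defined in the obvious way: $\chi(u) = \chi_1(u)$ if $u \in V_L(u^*)$ and $\chi(u) = \chi_2(u)$ if $u \in V_R(u^*)$.

Algorithm 1: MAXCOLORINGAPPROX($V$, $\mathcal{I}$, $\mathcal{A}_L$, $\mathcal{A}_R$)
Data: An instance $(n, k, \mathcal{I}, r)$ of MAXCOLORING
Result: A $(1, 1+\epsilon)$-approximation $(\chi, \mathcal{J})$
 1 if $|\mathcal{I}| = 0$ then
 2     $\chi \leftarrow$ MAXCOLORINGSPECIAL($\mathcal{A}_L$, $\mathcal{A}_R$)
 3     return $(\chi, \emptyset)$
 4 let $u^* \in V$ be such that $|\mathcal{I}_L(u^*)| \le m/2$ and $|\mathcal{I}_R(u^*)| \le m/2$
 5 forall $\epsilon$-partial assignments $\mathcal{P} = (u^*, \mathcal{I}_{\mathcal{P}}, r_{\mathcal{P}})$ do
 6     if $\mathcal{P}$ is consistent with $\mathcal{A}_L$ and $\mathcal{A}_R$ then
 7         $(\chi_1, \mathcal{J}_1) \leftarrow$ MAXCOLORINGAPPROX($V_L(u^*)$, $\mathcal{I}_L(u^*)$, REDUCE($V_L(u^*)$, $\mathcal{P}$, $\mathcal{A}_L$, $\mathcal{A}_R$))
 8         $(\chi_2, \mathcal{J}_2) \leftarrow$ MAXCOLORINGAPPROX($V_R(u^*)$, $\mathcal{I}_R(u^*)$, REDUCE($V_R(u^*)$, $\mathcal{P}$, $\mathcal{A}_L$, $\mathcal{A}_R$))
 9         $\chi \leftarrow \chi_1 \cup \chi_2$
10         $\mathcal{K} \leftarrow \{I \in \mathcal{I}(u^*) : \frac{r(I,c)}{1+\epsilon} \le r_{\mathcal{P}}(I_{\ell(I,\mathcal{P})}, c) + r_{\mathcal{P}}(I_{j(I,\mathcal{P})}, c) \le r(I, c) \ \forall c\}$
11         $\mathcal{J} \leftarrow \mathcal{K} \cup \mathcal{J}_1 \cup \mathcal{J}_2$
12         store $(\chi, \mathcal{J})$
13 return the recorded solution with largest $w(\mathcal{J})$ value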

Given a left-assignment $\mathcal{A}_L = (\mathcal{I}_L, r_L)$ and right-assignment $\mathcal{A}_R = (\mathcal{I}_R, r_R)$ on a vertex set $V' = \{p, \ldots, q\}$ and an $\epsilon$-partial assignment $\mathcal{P} = (u^*, \mathcal{I}_A \cup \mathcal{I}_B, r_{\mathcal{P}})$, procedure REDUCE constructs, considering the recursive call on $V'_L(u^*)$ in line 7, a left-assignment $\mathcal{A}'_L = (\mathcal{I}'_L, r'_L)$ and right-assignment $\mathcal{A}'_R = (\mathcal{I}'_R, r'_R)$ on vertex set $V'' = \{p, \ldots, u^*\}$ by cutting intervals at $u^*$ as follows (see Fig. 4):

• $\mathcal{I}'_L = \{[p, t] \in \mathcal{I}_L \mid t \le u^*\}$,
• $\mathcal{I}'_R = \mathcal{I}_A \cup \{[s, u^*] \mid \exists [s, q] \in \mathcal{I}_R : s < u^*\}$,
• $r'_L(I, c) = r_L(I, c)$ for $I \in \mathcal{I}'_L$ and $r'_R(I, c) = r_{\mathcal{P}}(I, c)$ for $I \in \mathcal{I}_A$, for all $c \in [k]$,
• $r'_R([s, u^*], c) = r_R([s, q], c) - r_{\mathcal{P}}([u^* + 1, q], c)$ for $[s, q] \in \mathcal{I}_R$, $s < u^*$, for all $c \in [k]$.

In the recursive call in line 8 procedure REDUCE combines the given assignments according to a symmetric schema. Notice that $\mathcal{I}_L = \emptyset$ in the leftmost and $\mathcal{I}_R = \emptyset$ in the rightmost path of the recursion tree.
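The four cutting rules for the left recursive call can be sketched in Python as follows. The data layout is an assumption made for illustration: each assignment is a dictionary from an interval `(s, t)` to a list of `k` requirements, and the partial assignment is passed as two dictionaries `P_A` and `P_B` for its parts on either side of $u^*$. The function name `reduce_left` and the conflict check are likewise hypothetical.

```python
def reduce_left(u_star, q, A_L, A_R, P_A, P_B):
    """Build the left- and right-assignment on {p, ..., u*} from A_L, A_R and
    the partial assignment (P_A, P_B) by cutting intervals at u*."""
    # Rule 1 and first half of rule 3: keep intervals of A_L ending by u*.
    new_L = {(s, t): list(r) for (s, t), r in A_L.items() if t <= u_star}
    # Rule 2 and second half of rule 3: start from the left part of P.
    new_R = {I: list(r) for I, r in P_A.items()}
    # Requirements that P places on the whole right side [u* + 1, q].
    tail = P_B[(u_star + 1, q)]
    for (s, t), r in A_R.items():
        if s < u_star:
            # Rule 4: subtract what P already accounts for right of u*.
            cut = [rc - tc for rc, tc in zip(r, tail)]
            if new_R.setdefault((s, u_star), cut) != cut:
                raise ValueError("conflicting requirements for a cut interval")
    return new_L, new_R
```

The subtraction in rule 4 is what makes the two recursive calls independent: once $\mathcal{P}$ fixes how many vertices of each color lie in $[u^* + 1, q]$, a constraint on $[s, q]$ turns into a constraint on $[s, u^*]$ alone.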

In the following Lemma 4 and Theorem 3 we show how procedure MAXCOLORINGSPECIAL can check consistency of assignments $\mathcal{A}_L = (\mathcal{I}_L, r_L)$ and $\mathcal{A}_R = (\mathcal{I}_R, r_R)$ on vertex set $V$ in line 2 in time $O(k \cdot (|\mathcal{I}_L| + |\mathcal{I}_R|))$.

Note that sets $\mathcal{I}_L$ and $\mathcal{I}_R$ each contain an interval spanning all vertices in $V$. This is due to intervals $I_{h_1+1}$ and $I_{h_2+1}$ in Definition 2 of an $\epsilon$-partial assignment and due to the specific structure of the assignments constructed by procedure REDUCE (see Fig. 4).


Fig. 4 For given left-assignment $\mathcal{A}_L = (\mathcal{I}_L, r_L)$, right-assignment $\mathcal{A}_R = (\mathcal{I}_R, r_R)$ and an $\epsilon$-partial assignment $\mathcal{P} = (u^*, \mathcal{I}_A \cup \mathcal{I}_B, r_{\mathcal{P}})$, in the recursive call on $V_L(u^*)$ procedure REDUCE cuts intervals on the vertical line at index $u^*$ such that the new left- and right-assignments $\mathcal{A}'_L$ and $\mathcal{A}'_R$ contain the intervals shown by solid lines. Interval $[p, q]$, contained in $\mathcal{I}_R$, and interval $[p, q]$, contained in $\mathcal{I}_L$, are omitted

Following the terminology introduced before we call a coloring $\chi$ feasible for an assignment $\mathcal{A} = (\mathcal{I}, r)$, if $N_\chi(I, c) = r(I, c)$ for all $c \in [k]$ and $I \in \mathcal{I}$. In other words, $\chi$ is feasible for $\mathcal{A}$, if $\mathcal{A}$ is consistent with $\chi$. We call two assignments $\mathcal{A}$ and $\mathcal{A}'$ equivalent, if they have the same set of feasible colorings.
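As a concrete reading of this definition, the following Python sketch tests feasibility directly by counting. It assumes a coloring stored as a dictionary from vertices to colors in `range(k)` and an assignment stored as a dictionary from intervals `(s, t)` to requirement lists; the names `color_count` and `is_feasible` are illustrative only.

```python
def color_count(chi, interval, c):
    """N_chi(I, c): number of vertices in the closed interval I colored c."""
    s, t = interval
    return sum(1 for u in range(s, t + 1) if chi[u] == c)


def is_feasible(chi, assignment, k):
    """True if every interval meets every color requirement exactly."""
    return all(color_count(chi, I, c) == req[c]
               for I, req in assignment.items()
               for c in range(k))
```

Two assignments are then equivalent exactly when `is_feasible` returns the same answer for both on every coloring.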

Lemma 4 Let $\mathcal{A} = (\mathcal{I}, r)$ be an assignment on $V = \{1, 2, \ldots, n\}$ where set $\mathcal{I}$ can be partitioned into two sets $\mathcal{I}_1$ and $\mathcal{I}_2$, such that for $p \in \{1, 2\}$ it holds

(W1) $I_i \cap I_j = \emptyset$ for all distinct $I_i, I_j \in \mathcal{I}_p$, i.e. intervals are disjoint and
(W2) $\bigcup_{I \in \mathcal{I}_p} I = [1, n]$, i.e. the intervals span all vertices.

Then it can be decided in time $O(k \cdot |\mathcal{I}|)$ whether a feasible coloring $\chi : V \to [k]$ for $\mathcal{A}$ exists, i.e. a coloring $\chi$ such that $\mathcal{A}$ is consistent with $\chi$.

Proof We represent interval set $\mathcal{I}_1$ as sequence $([s_1, t_1], [s_2, t_2], \ldots, [s_l, t_l])$ and set $\mathcal{I}_2$ as sequence $([\bar{s}_1, \bar{t}_1], [\bar{s}_2, \bar{t}_2], \ldots, [\bar{s}_m, \bar{t}_m])$, where $s_i = t_{i-1} + 1$ for $2 \le i \le l$, and similarly $\bar{s}_i = \bar{t}_{i-1} + 1$ for $2 \le i \le m$. Property (W2) implies $s_1 = \bar{s}_1 = 1$ and $t_l = \bar{t}_m = n$. For $1 \le i \le l$ we denote $[s_i, t_i]$ by $I_i$ and for $1 \le i \le m$ we denote $[\bar{s}_i, \bar{t}_i]$ by $\bar{I}_i$.

From assignment $\mathcal{A}$ we construct an equivalent assignment $\mathcal{A}' = (\mathcal{I}', r')$, where intervals in $\mathcal{I}'$ are disjoint and therefore feasibility of $\mathcal{A}'$ can be determined by verifying for every interval $[s, t] \in \mathcal{I}'$ that

$$r'([s, t], c) \ge 0, \quad \text{for all } c \in [k].$$

We define $\mathcal{I}'$ to be the partition of $\{1, 2, \ldots, n\}$ into a minimal number of intervals, such that for each interval $I' \in \mathcal{I}'$ and each element $I \in \mathcal{I}$ either $I' \subseteq I$ or $I' \cap I = \emptyset$ (see Fig. 5(a)). We represent $\mathcal{I}'$ by sequence $([s'_1, t'_1], [s'_2, t'_2], \ldots, [s'_r, t'_r])$ and again denote $[s'_i, t'_i]$ by $I'_i$ for $1 \le i \le r$.

What remains is the assignment of requirements to intervals in $\mathcal{I}'$, i.e. the definition of $r' : \mathcal{I}' \times [k] \to \{1, 2, \ldots, n\}$. We will define function $r'$ recursively, i.e. for $c \in [k]$ the value $r'(I'_i, c)$ might depend on values $r'(I'_j, c)$ for $j < i$. Due to the minimality of $\mathcal{I}'$, $t'_1 = \min(t_1, \bar{t}_1)$ and interval $I'_1$ will coincide with either $I_1$ or $\bar{I}_1$. In Fig. 5(a) the latter case holds. Therefore any coloring $\chi$ feasible for assignment $\mathcal{A}$ will satisfy (1) for interval $I'_1$ if and only if $r'(I'_1, c) = r(I_1, c)$ or $r'(I'_1, c) = r(\bar{I}_1, c)$, respectively, for all $c \in [k]$.

Fig. 5 (a) Sets $\mathcal{I}_1$ and $\mathcal{I}_2$ satisfy (W1) and (W2) in Lemma 4. For each interval $I' \in \mathcal{I}'$ and each element $I \in \mathcal{I}$, $\mathcal{I} = \mathcal{I}_1 \cup \mathcal{I}_2$, either $I' \subseteq I$ or $I' \cap I = \emptyset$. (b) In the construction of an equivalent assignment $\mathcal{A}'$ in the proof of Lemma 4 the number of vertices that have to be colored $c$ in interval $I'_i$ is obtained by (17)

Now consider an interval $I'_i$ for arbitrary $2 \le i \le r$. If $I'_i \in \mathcal{I}_1$ or $I'_i \in \mathcal{I}_2$, as e.g. $I'_2 \in \mathcal{I}_2$ in Fig. 5(a), for assignment $\mathcal{A}'$ to be equivalent with assignment $\mathcal{A}$ it must hold $r'(I'_i, c) = r(I'_i, c)$, for all $c \in [k]$. Otherwise, without loss of generality assume $s'_i = s_q$ for some $I_q \in \mathcal{I}_1$ and $t'_i = \bar{t}_{q'}$ for some $\bar{I}_{q'} \in \mathcal{I}_2$. Let $p$ be such that $I'_p \in \mathcal{I}'$ and $s'_p = \bar{s}_{q'}$ (see Fig. 5(b)). If we assume that any coloring $\chi$ feasible for $\mathcal{A}$ satisfies (1) for all intervals $I'_j$ with $1 \le j \le i - 1$, then $\chi$ will satisfy (1) for interval $I'_i$ if and only if

$$r'(I'_i, c) = r(\bar{I}_{q'}, c) - \sum_{j=p}^{i-1} r'(I'_j, c), \quad \text{for all } c \in [k]. \tag{17}$$

$\square$

Clearly the above lemma can be generalized to the case where $\mathcal{I}$ can be partitioned into an arbitrary number of sets, each satisfying conditions (W1) and (W2).
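The construction in the proof of Lemma 4 can be sketched as a single left-to-right sweep in Python. This is a hedged illustration rather than the authors' implementation: both partitions are assumed to be given as lists of `((s, t), requirements)` sorted by `s`, and the function name `refine_and_check` is hypothetical. Beyond the non-negativity test stated in the proof, the sketch also checks that the requirements of each piece sum to its length and that the two partitions agree whenever both have an interval ending at the same vertex; these two checks are assumptions added here to make the sketch self-contained.

```python
def refine_and_check(n, k, part1, part2):
    """Sweep over the common refinement of two partitions of [1, n].

    Returns the list of (piece, requirements) if a feasible coloring exists,
    otherwise None. Runs in O(k * (len(part1) + len(part2))) time."""
    i1 = i2 = 0
    acc1 = [0] * k  # requirements already given to pieces of part1[i1]
    acc2 = [0] * k  # requirements already given to pieces of part2[i2]
    start, pieces = 1, []
    while start <= n:
        (_, t1), r1 = part1[i1]
        (_, t2), r2 = part2[i2]
        end = min(t1, t2)
        # Every covering interval that ends here forces the piece's values,
        # as in equation (17): requirement minus what earlier pieces used.
        forced = []
        if t1 == end:
            forced.append([r1[c] - acc1[c] for c in range(k)])
        if t2 == end:
            forced.append([r2[c] - acc2[c] for c in range(k)])
        req = forced[0]
        if any(other != req for other in forced[1:]):
            return None  # the two partitions contradict each other
        if min(req) < 0 or sum(req) != end - start + 1:
            return None  # negative requirement or wrong number of vertices
        pieces.append(((start, end), req))
        # Advance past finished intervals, otherwise accumulate usage.
        if t1 == end:
            i1, acc1 = i1 + 1, [0] * k
        else:
            acc1 = [a + x for a, x in zip(acc1, req)]
        if t2 == end:
            i2, acc2 = i2 + 1, [0] * k
        else:
            acc2 = [a + x for a, x in zip(acc2, req)]
        start = end + 1
    return pieces
```

When the function returns a list, a feasible coloring is obtained by coloring each piece arbitrarily according to its requirement vector, since the pieces are disjoint.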

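Theorem 3 Let $V = \{1, 2, \ldots, n\}$. For given left-assignment $\mathcal{A}_L = (\mathcal{I}_L, r_L)$ with $[1, n] \in \mathcal{I}_L$ and right-assignment $\mathcal{A}_R = (\mathcal{I}_R, r_R)$ with $[1, n] \in \mathcal{I}_R$, it can be decided in time $O(k \cdot (|\mathcal{I}_L| + |\mathcal{I}_R|))$ whether $\mathcal{A}_L$ and $\mathcal{A}_R$ are consistent.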

Proof Let $\mathcal{I}_L = ([1, t_1], [1, t_2], \ldots, [1, t_p])$ with $t_p = n$ and $\mathcal{I}_R = ([s_1, n], [s_2, n], \ldots, [s_q, n])$ with $s_1 = 1$ be sorted with respect to "$\subseteq$" and "$\supseteq$", respectively, in non-decreasing order. Then assignments $\mathcal{A}_L$ and $\mathcal{A}_R$ are consistent if and only if the following assignments $\mathcal{A}'_L = (\mathcal{I}'_L, r'_L)$ (see Fig. 6) and $\mathcal{A}'_R = (\mathcal{I}'_R, r'_R)$ are consistent:

• $\mathcal{I}'_L = ([1, t_1], [t_1 + 1, t_2], \ldots, [t_{p-1} + 1, t_p])$,
• $r'_L([1, t_1], c) = r_L([1, t_1], c)$ and $r'_L([t_{i-1} + 1, t_i], c) = r_L([1, t_i], c) - r_L([1, t_{i-1}], c)$, for $2 \le i \le p$ and $c \in [k]$,
• $\mathcal{I}'_R = ([s_1, s_2 - 1], [s_2, s_3 - 1], \ldots, [s_q, n])$, and
• $r'_R([s_q, n], c) = r_R([s_q, n], c)$ and $r'_R([s_i, s_{i+1} - 1], c) = r_R([s_i, n], c) - r_R([s_{i+1}, n], c)$, for $1 \le i < q$ and $c \in [k]$.

Interval sets $\mathcal{I}'_L$ and $\mathcal{I}'_R$ satisfy conditions (W1) and (W2) in Lemma 4 and therefore the claim follows. $\square$
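The transformation used in this proof, from nested prefix and suffix intervals to disjoint ones, amounts to taking differences of consecutive requirement vectors. A minimal Python sketch follows, assuming the left-assignment is given as a list of `(t_i, requirements of [1, t_i])` sorted by `t_i` and the right-assignment as a list of `(s_i, requirements of [s_i, n])` sorted by `s_i`; the function names are hypothetical.

```python
def prefixes_to_partition(prefix_reqs):
    """Turn requirements on [1, t_1], [1, t_2], ... into requirements on
    [1, t_1], [t_1 + 1, t_2], ... by subtracting the previous prefix."""
    out = []
    prev_t, prev_r = 0, None
    for t, r in prefix_reqs:
        if prev_r is None:
            diff = list(r)
        else:
            diff = [a - b for a, b in zip(r, prev_r)]
        out.append(((prev_t + 1, t), diff))
        prev_t, prev_r = t, r
    return out


def suffixes_to_partition(suffix_reqs, n):
    """Turn requirements on [s_1, n], [s_2, n], ... into requirements on
    [s_1, s_2 - 1], [s_2, s_3 - 1], ..., [s_q, n] by subtracting the next suffix."""
    out = []
    for idx, (s, r) in enumerate(suffix_reqs):
        if idx + 1 < len(suffix_reqs):
            next_s, next_r = suffix_reqs[idx + 1]
            out.append(((s, next_s - 1), [a - b for a, b in zip(r, next_r)]))
        else:
            out.append(((s, n), list(r)))
    return out
```

Both outputs are partitions of $[1, n]$ into disjoint intervals, so they have the form required by conditions (W1) and (W2) of Lemma 4.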
