

4.2.3 Tractability Results


by construction of U′, it follows that S ∩ U′ ≠ ∅. Analogously, the row r_{S,1} is mapped to a pattern vector such that its ⋆-symbol is assigned 0. Thus, S ∩ (U \ U′) ≠ ∅.

This proves the correctness of our reduction. Note that in this reduction the alphabet size is |U| + 3. By applying Lemma 4.1, we get an equivalent instance with |Σ| = 2 in which the number of columns is |U|·log|U|. Thus, the statement of the theorem follows.


given hint. If no such solution exists, then the first phase will generate another hint. The decisive point is to find a realization of the first phase which generates, for all yes-instances, at least one “correct” hint, that is, a hint which leads to a solution.

More precisely, we have the following.

Hint Enumeration

Input: A matrix M ∈ Σ^{n×m} and a pattern mask P ∈ {⋆, ?}^{p×m}.

Task: Enumerate "hint" functions h: R(P) → R(M) ∪ {∅} that map each pattern vector either to a row of the input matrix M or to ∅.

The hint gives information about the solution ϕ: For each pattern vector, one either fixes that it is not used in the solution, that is, it is mapped to ∅, or one fixes one row from the input matrix which is mapped to the pattern vector in the solution. A hint function h is correct if there is a solution ϕ of the BASIC HOMOGENEOUS TEAM FORMATION instance (M, P) such that

∀x ∈ R(P): (h(x) ≠ ∅ → ϕ(h(x)) = x) ∧ (h(x) = ∅ → ¬∃y ∈ R(M): ϕ(y) = x).
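To make this condition concrete, the following sketch (Python; rows are addressed by index, ∅ is modelled as None, and all identifiers are illustrative assumptions rather than notation from the thesis) checks whether a hint h is correct with respect to a given solution ϕ.

def is_correct_hint(h, phi, patterns, num_rows):
    """h: dict pattern vector -> row index or None (None plays the role of ∅);
    phi: dict row index -> pattern vector (the solution mapping)."""
    for x in patterns:
        if h[x] is not None:
            if phi[h[x]] != x:      # the hinted row must indeed be mapped to x
                return False
        else:
            if any(phi[r] == x for r in range(num_rows)):  # x must stay unused
                return False
    return True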

The second phase efficiently computes a solution that respects the hint whenever such a solution exists. We use the term preimage type to denote those row types which can safely be mapped to a specific pattern vector consistently with the hint.

Definition 4.5. Let (M ∈ Σ^{n×m}, P) be an instance of BASIC HOMOGENEOUS TEAM FORMATION, and let h: R(P) → R(M) ∪ {∅} be a hint function. We say that a row type X of M is a preimage type of a pattern vector v of P if rows from X can potentially be mapped to v while respecting the hint h.

More precisely, a row type X is a preimage type of a pattern vector v with y := h(v) ≠ ∅ if for any row x of X (recall that all rows of X are identical) it holds for all 1 ≤ i ≤ m that

(v[i] = ⋆) ⟹ (x[i] = y[i]).
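This condition amounts to a one-line check. In the following sketch (Python; '*' stands in for the ⋆-symbol, an assumption about the notation), x is a representative row of the type X and y = h(v) is the hinted row.

STAR = '*'  # stand-in for the ⋆-symbol of the pattern mask

def is_preimage_type(x, v, y):
    """True iff the row type represented by x is a preimage type of the pattern
    vector v with hinted row y = h(v) != ∅: x must agree with y on every
    ⋆-position of v."""
    return all(x[i] == y[i] for i in range(len(v)) if v[i] == STAR)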


Hint Checking

Input: A matrix M ∈ Σ^{n×m}, a pattern mask P ∈ {⋆, ?}^{p×m}, a hint function h: R(P) → R(M) ∪ {∅}, three functions l, u, c: R(P) → N, and a cost bound s ∈ N.

Task: Compute a consistent function ϕ mapping the rows of M to the pattern vectors of P with cost at most s, fulfilling the size constraints l and u, and respecting the hint h, or answer "no" if there is no such mapping.

In Phase 2, we have the following situation. Suppose that there are t input row types and that a solution uses p′ ≤ p pattern vectors; these are the vectors which the hint function h does not map to ∅. In the following we represent the set of input row types by Tin := {1, . . . , t} and the set of pattern vectors used in the solution by Tout := {1, . . . , p′}. Let κ: Tin × Tout → {0, 1} be the function expressing whether an input row type is a preimage type of a pattern vector. The size constraints are expressed by the integers αi and βi with i ∈ Tout, where αi = l(ρi) and βi = u(ρi), with ρx denoting the xth pattern vector used in the solution. Furthermore, let ωi with i ∈ Tout denote the cost c(ρi) of the ith pattern vector and let nj with j ∈ Tin denote the number of rows in the jth input row type. A consistent mapping ϕ that fulfills the size constraints l and u, has cost at most s, and respects the preimage types corresponds to a solution of a slight modification² of the ROW ASSIGNMENT [Bre+14e] problem. It is defined as follows.

ROW ASSIGNMENT

Input: Nonnegative integers s, α1, . . . , αp′, β1, . . . , βp′, ω1, . . . , ωp′, and n1, . . . , nt with ∑_{i=1}^{t} ni = n, and a function κ: Tin × Tout → {0, 1}.

Question: Is there a function g: Tin × Tout → {0, . . . , n} such that

κ(i,j)·n ≥ g(i,j)   ∀i ∈ Tin, ∀j ∈ Tout   (4.1)

αj ≤ ∑_{i=1}^{t} g(i,j) ≤ βj   ∀j ∈ Tout   (4.2)

∑_{j=1}^{p′} g(i,j) = ni   ∀i ∈ Tin   (4.3)

∑_{i=1}^{t} ∑_{j=1}^{p′} g(i,j)·ωj ≤ s   (4.4)

² In Inequality 4.2 the modified ROW ASSIGNMENT has a specific lower bound αj and a specific upper bound βj for each j ∈ Tout instead of a uniform upper bound k.

Let us see why ROW ASSIGNMENT correctly captures the Hint Checking task. We interpret g(i,j) = ℓ in the former problem to mean that the function ϕ of the latter problem maps exactly ℓ rows of input type i to pattern vector j. Inequality 4.1 ensures that for each pattern vector v ∈ P, only rows from its preimage types are mapped to v. Inequality 4.2 ensures that the mapping fulfills the size constraints l and u. Equation 4.3 states that all rows of each input type are mapped to some pattern vector; this ensures that each input row is mapped to a pattern vector. Inequality 4.4 ensures that the cost of the mapping is at most s.
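As a sanity check of this correspondence, the following sketch (Python; all identifiers are illustrative) verifies Constraints 4.1 to 4.4 for a candidate assignment g.

def satisfies_row_assignment(g, kappa, alpha, beta, omega, n_rows, s):
    """g, kappa: dicts over pairs (i, j) of row type and pattern vector index;
    alpha, beta, omega: dicts over pattern vector indices; n_rows: dict mapping
    each row type i to n_i; s: cost bound."""
    types, patterns = n_rows.keys(), alpha.keys()
    n = sum(n_rows.values())
    # (4.1): rows go only to pattern vectors of which their type is a preimage type
    ok1 = all(kappa[i, j] * n >= g[i, j] for i in types for j in patterns)
    # (4.2): lower and upper size bounds for every used pattern vector
    ok2 = all(alpha[j] <= sum(g[i, j] for i in types) <= beta[j] for j in patterns)
    # (4.3): every row of every input type is assigned to some pattern vector
    ok3 = all(sum(g[i, j] for j in patterns) == n_rows[i] for i in types)
    # (4.4): the total cost is at most s
    ok4 = sum(g[i, j] * omega[j] for i in types for j in patterns) <= s
    return ok1 and ok2 and ok3 and ok4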

The following lemma shows that ROW ASSIGNMENT can be solved in polynomial time. The proof is similar to the original proof showing that ROW ASSIGNMENT is polynomial-time solvable [Bre+14e, Lemma 1].

Lemma 4.2. There is an algorithm that solves ROW ASSIGNMENT in time O(tp·log(t+p)·(tp + (t+p)·log(t+p))).

Proof. We reduce ROW ASSIGNMENT to the CAPACITATED MINIMUM COST FLOW problem, which is defined as follows [Orl88]:

CAPACITATED MINIMUM COST FLOW

Input: A flow network I with a directed graph G = (V, A), a capacity function c: A → N, a demand function d: V → Z on the nodes, and a cost function b: A → N on the arcs.

Task: Find a function f which minimizes ∑_{(u,v)∈A} b(u,v)·f(u,v) and satisfies:

∑_{v | (u,v)∈A} f(u,v) − ∑_{v | (v,u)∈A} f(v,u) = d(u)   ∀u ∈ V

0 ≤ f(u,v) ≤ c(u,v)   ∀(u,v) ∈ A


Figure 4.4: Example of the constructed network with t = 5 and p′ = 4. The pair (x, y) on each arc denotes cost x and capacity y. The number next to each node denotes its demand.

We first describe the construction of the network with demands, costs, and capacities. For each ni, 1 ≤ i ≤ t, add a node vi with demand −ni (that is, a supply of ni) and for each 1 ≤ j ≤ p′ add a node uj with demand αj. If κ(i,j) = 1, then add an arc (vi, uj) with cost ωj and capacity ∞. Finally, add a sink τ with demand ∑_{i=1}^{t} ni − ∑_{j=1}^{p′} αj and the arcs (uj, τ) with cost zero and capacity βj − αj. See Figure 4.4 for an example of the construction.

The CAPACITATED MINIMUM COST FLOW problem is known to be solvable in O(|A|·log(|V|)·(|A| + |V|·log(|V|))) time [Orl88]. Since our constructed flow network has O(t + p) nodes and O(t·p) arcs, we can solve our CAPACITATED MINIMUM COST FLOW instance in O(tp·log(t+p)·(tp + (t+p)·log(t+p))) time.

It remains to prove that the ROW ASSIGNMENT instance is a yes-instance if and only if the constructed network has a minimum-cost flow of cost at most s.

For the "only if" part, assume that g is a function fulfilling Constraints 4.1 to 4.4. Then define a flow f as follows: For each 1 ≤ i ≤ t, 1 ≤ j ≤ p′, set f(vi, uj) = g(i,j) and f(uj, τ) = ∑_{i=1}^{t} g(i,j) − αj. Since g satisfies Equation 4.3 and Inequality 4.2, we get that the flow f fulfills the demands on the nodes. Since g fulfills Inequality 4.4 and the cost of each arc (uj, τ), 1 ≤ j ≤ p′, is zero, the flow f has cost at most s.

For the "if" part, assume that f is a flow with cost at most s. All costs, constraints, and demands are integer-valued and hence, due to the Integrality Property [AMO93] of network flow problems, there exists an optimal flow with integer values. Then set g(i,j) = f(vi, uj) for each 1 ≤ i ≤ t, 1 ≤ j ≤ p′. Note that g fulfills Equation 4.3 and Inequality 4.2 due to the demands on the nodes of the network and the capacities of the ingoing arcs of τ. Since ni ≤ n for all 1 ≤ i ≤ t, also Inequality 4.1 is fulfilled. Note that f has cost at most s and, hence, g fulfills Inequality 4.4.
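For concreteness, the reduction can be run with an off-the-shelf minimum-cost-flow solver. The following sketch uses networkx (an assumption about tooling, not part of the thesis) to build the network of Figure 4.4 and to read off g(i,j) = f(vi, uj); all identifiers are illustrative.

import networkx as nx

def solve_row_assignment(kappa, alpha, beta, omega, n_rows, s):
    """kappa: dict (i, j) -> {0, 1}; alpha, beta, omega: dicts over pattern
    vector indices j; n_rows: dict i -> n_i; s: cost bound.
    Returns an assignment g of cost at most s, or None if none exists."""
    G = nx.DiGraph()
    for i, n_i in n_rows.items():
        G.add_node(('v', i), demand=-n_i)            # supply n_i at node v_i
    for j, a_j in alpha.items():
        G.add_node(('u', j), demand=a_j)             # demand alpha_j at node u_j
    G.add_node('sink', demand=sum(n_rows.values()) - sum(alpha.values()))
    for (i, j), compatible in kappa.items():
        if compatible:                               # arc (v_i, u_j): cost omega_j, capacity unbounded
            G.add_edge(('v', i), ('u', j), weight=omega[j])
    for j in alpha:                                  # arc (u_j, sink): cost 0, capacity beta_j - alpha_j
        G.add_edge(('u', j), 'sink', weight=0, capacity=beta[j] - alpha[j])
    try:
        flow = nx.min_cost_flow(G)
    except nx.NetworkXUnfeasible:
        return None
    if nx.cost_of_flow(G, flow) > s:
        return None
    return {(i, j): flow[('v', i)].get(('u', j), 0) for i in n_rows for j in alpha}

Any other minimum-cost-flow implementation that supports node demands would serve equally well.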

Computing the function κ as needed in ROW ASSIGNMENT takes O(p·t·m) time and as preprocessing we have to compute the input row types in O(n·m) time (by constructing a trie on the rows [Fre60]). We obtain the following lemma.
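This preprocessing can be sketched as follows (Python; a hash map stands in for the trie of [Fre60], an implementation assumption): equal rows are grouped into row types and κ is computed for the pattern vectors used by a hint h.

from collections import Counter

STAR = '*'  # stand-in for the ⋆-symbol

def row_types(M):
    """Group equal rows of M into row types; return representatives and multiplicities."""
    counts = Counter(tuple(row) for row in M)
    reps = list(counts)
    return reps, {i: counts[r] for i, r in enumerate(reps)}

def compute_kappa(reps, used_patterns, hint):
    """kappa[i, j] = 1 iff row type i is a preimage type of the j-th used pattern
    vector, i.e. its rows agree with the hinted row on all ⋆-positions."""
    return {(i, j): int(all(x[k] == hint[v][k]
                            for k in range(len(v)) if v[k] == STAR))
            for i, x in enumerate(reps)
            for j, v in enumerate(used_patterns)}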

Lemma 4.3. Hint Checking can be solved in O(t²·p²·log(tp) + t·p·m + n·m) time.

Next, we describe several fixed-parameter algorithms for the Hint Enumeration phase of the above-described algorithmic scheme. The respective algorithms differ in the parameters that are used.

Homogeneity Parameterizations

Next, we use the algorithmic scheme to show that HOMOGENEOUS TEAM FORMATION becomes (fixed-parameter) tractable for instances with a certain degree of homogeneity. We start with the question whether HOMOGENEOUS TEAM FORMATION is still intractable (that is, NP-hard) when the number p of pattern vectors, that is, the number of possible teams, is a constant. Combining p with the parameter t denoting the number of input row types, we show that HOMOGENEOUS TEAM FORMATION is fixed-parameter tractable with respect to the combined parameter (p, t). To this end, we use a brute-force realization of the Hint Enumeration. The corresponding algorithm (consisting of both phases) can also be interpreted as an XP-algorithm for HOMOGENEOUS TEAM FORMATION parameterized by p, that is, HOMOGENEOUS TEAM FORMATION is polynomial-time solvable for constantly many pattern vectors.


Theorem 4.3. There is an algorithm solving HOMOGENEOUS TEAM FORMATION in time O(t^p·2^p·(t²·p²·log(tp) + t·p·m) + n·m).

Proof. The parameterized hint enumeration works in two steps as follows.

1. For each pattern vector v, determine whether it is used in the solution, that is, determine whether v occurs in the image of the mapping.

2. For each pattern vector v that is used in the solution, guess one of the rows which are mapped to v in the solution.

We realize both steps by branching over all possibilities. Step 1 can be realized by branching on 2^p possibilities. In Step 2, we have to consider up to t^p possibilities. Since we consider all possibilities, one clearly finds a correct hint function for every yes-instance.
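A literal rendering of this brute-force hint enumeration might look as follows (Python sketch; type_reps holds one representative row per row type, and all names are illustrative).

from itertools import combinations, product

def enumerate_hints(patterns, type_reps):
    """Yield hints as dicts mapping each pattern vector to a representative row or
    to None (playing the role of ∅). Step 1 chooses the used pattern vectors
    (2^p choices); Step 2 chooses one row type per used vector (at most t^p choices)."""
    for k in range(len(patterns) + 1):
        for used in combinations(patterns, k):
            unused = [v for v in patterns if v not in used]
            for choice in product(type_reps, repeat=len(used)):
                hint = {v: None for v in unused}
                hint.update(dict(zip(used, choice)))
                yield hint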

Since t ≤ |Σ|^m and we can assume without loss of generality that p ≤ 2^m·t, fixed-parameter tractability for (p, t) transfers to fixed-parameter tractability for (|Σ|, m).

The fixed-parameter algorithm behind Theorem 4.3 seems practically relevant if t and, more importantly, p are relatively small. Adapting the ideas behind this algorithm, an integer linear programming formulation solving HOMOGENEOUS TEAM FORMATION can be found. More precisely, one combines the brute-force approach for the hint enumeration with the ROW ASSIGNMENT approach for the hint checking (see Figure 4.5 for details).

Since the number of variables is upper-bounded by a function depending only on t and p, fixed-parameter tractability would already follow solely from this ILP formulation [Len83] (see also Section 2.5). Although the corresponding running time bound would be worse than that stated in Theorem 4.3, it might still be useful in practice: Evidence for this is that a closely related integer linear programming formulation for a combinatorial data anonymization problem performs surprisingly well on empirical data, dramatically better than its theoretical running time bounds indicate (see Section 5.3 for details).

Theorem 4.3 shows fixed-parameter tractability for HOMOGENEOUS TEAM FORMATION with respect to the combined parameter (t, p). Next, we develop a fixed-parameter algorithm for the individual parameter t when there are no upper bounds on the team sizes. This is mainly a classification result because its current running time is impractical. Furthermore, it seems reasonable to assume that the number of possible teams can be bounded by a function in the number of different individuals in most realistic instances. Then, however, one would always prefer the algorithm from Theorem 4.3.


min ∑_{i=1}^{t} ∑_{j=1}^{p} x_{i,j}·c(j)   (4.5)

∑_{i=1}^{t} con(i,j,i′)·x_{i,j} ≤ h_{j,i′}·n   for 1 ≤ j ≤ p, 1 ≤ i′ ≤ t   (4.6)

∑_{i=1}^{t} con(i,j,i′)·x_{i,j} + l(j)·(1 − h_{j,i′}) ≥ l(j)   for 1 ≤ j ≤ p, 1 ≤ i′ ≤ t   (4.7)

∑_{i=1}^{t} con(i,j,i′)·x_{i,j} − u(j)·(1 − h_{j,i′}) ≤ u(j)   for 1 ≤ j ≤ p, 1 ≤ i′ ≤ t   (4.8)

∑_{j=1}^{p} x_{i,j} = n(i)   for 1 ≤ i ≤ t   (4.9)

∑_{i′=1}^{t} h_{j,i′} ≤ 1   for 1 ≤ j ≤ p   (4.10)

Figure 4.5: Integer linear program formulation combining hint enumeration and hint checking. The integer variables x_{i,j} denote the number of rows from type i being mapped to the pattern vector j. The binary variable h_{j,i′} (representing the hint) is 1 if some row of type i′ is mapped to pattern vector j; otherwise it is set to 0. Furthermore, n(i) denotes the number of rows of type i and con(i,j,i′) is 1 if one can consistently map any row r of type i and any other row r′ of type i′ to pattern vector j, that is, row r and row r′ are identical on the ⋆-positions of pattern vector j; otherwise con(i,j,i′) = 0. The goal function (4.5) ensures that the solution has minimum cost. Constraint set (4.6) ensures that the variables h_{j,i′} are set consistently with the variables x_{i,j}, that is, if there is some positive variable x_{i,j} indicating that some row of type i′ is mapped to pattern vector j, then h_{j,i′} = 1. Constraint sets (4.7) and (4.8) ensure that the size constraints are fulfilled. Constraint set (4.9) ensures that each individual is in some team. Constraint set (4.10) ensures that there is at most one instance of each team.
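For illustration, the program of Figure 4.5 can be transcribed almost line by line into an off-the-shelf ILP modeller. The following sketch uses PuLP (an assumption about tooling, not part of the thesis); con, cost, lower, upper, and n_of_type are plain dictionaries and all names are illustrative.

import pulp

def build_htf_ilp(t, p, con, cost, lower, upper, n_of_type):
    """con: dict (i, j, i2) -> {0, 1}; cost, lower, upper: dicts over pattern
    vectors 1..p; n_of_type: dict over row types 1..t."""
    n = sum(n_of_type.values())
    T, P = range(1, t + 1), range(1, p + 1)
    prob = pulp.LpProblem("HomogeneousTeamFormation", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (T, P), lowBound=0, cat="Integer")
    h = pulp.LpVariable.dicts("h", (P, T), cat="Binary")
    prob += pulp.lpSum(x[i][j] * cost[j] for i in T for j in P)           # (4.5)
    for j in P:
        for i2 in T:
            expr = pulp.lpSum(con[i, j, i2] * x[i][j] for i in T)
            prob += expr <= h[j][i2] * n                                   # (4.6)
            prob += expr + lower[j] * (1 - h[j][i2]) >= lower[j]           # (4.7)
            prob += expr - upper[j] * (1 - h[j][i2]) <= upper[j]           # (4.8)
    for i in T:
        prob += pulp.lpSum(x[i][j] for j in P) == n_of_type[i]            # (4.9)
    for j in P:
        prob += pulp.lpSum(h[j][i2] for i2 in T) <= 1                      # (4.10)
    return prob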



We begin with an important observation on the solution mappings which holds when there are no upper bounds on the team sizes. The following lemma says that without loss of generality one may assume that there is an optimal solution that uses at most t pattern vectors.

Lemma 4.4. Let (M, P, l, u, c, s) be a yes-instance of HOMOGENEOUS TEAM FORMATION with u = 〈n〉, that is, there are no upper bounds on the team sizes. If M has t row types, then there exists a solution mapping ϕ for (M, P, l, u, c, s) whose image contains at most t elements.

Proof. Let ϕ be a consistent mapping fulfilling the size constraint l and having cost at most s. If the image of ϕ has at most t elements, then there is nothing to prove. So let the image of ϕ contain more than t elements, that is, ϕ uses more than t pattern vectors. We now describe an operation that reduces the number of pattern vectors used by ϕ without increasing its cost.

We call a pattern vector v used by ϕ redistributable if, for each row r mapped to v, there is another pattern vector v′ used by ϕ such that c(v′) ≤ c(v) and mapping r to v′ instead of v does not violate the consistency of the mapping. Observe that if a pattern vector used by ϕ is redistributable, then we can eliminate this pattern vector from the image of ϕ by "moving" each of its rows to a different, at most as expensive pattern vector, while preserving the lower-bound condition on the pattern vectors remaining in the image of ϕ. This operation reduces the number of pattern vectors used by ϕ without increasing the cost. As long as there are redistributable pattern vectors left, we repeatedly eliminate pattern vectors from the image of the mapping in this manner.

Let ϕ′ denote the mapping which results from exhaustive application of this procedure to ϕ.

Now, we analyze the properties of the modified mapping ϕ′. Clearly, its image contains only pattern vectors that are not redistributable. Consider any pattern vector v′ in the image of ϕ′. Since v′ is not redistributable, there exists a row r′ mapped to v′ such that no row that has the same row type as r′ can be consistently mapped to another pattern vector from the image of ϕ′ with at most the same cost. In this sense, v′ is the "cheapest possible" pattern vector for at least one row type. Hence, every pattern vector which is used by ϕ′ is the cheapest possible pattern vector for some row type. Since there are only t row types, at most t pattern vectors can be the "cheapest possible" pattern vector for a row type. Hence, the image of ϕ′ contains at most t pattern vectors.

Theorem 4.4. There is an algorithm that solves HOMOGENEOUS TEAM FORMATION in O(2^(n²)·n^(2n+2)·(m + n²·log n) + n·m) time. If there are no upper bounds on the team sizes, then HOMOGENEOUS TEAM FORMATION can be solved in O(2^(t²)·t^(2t+2)·(m + t²·log t) + n·m) time.

Proof. We first show that HOMOGENEOUS TEAM FORMATION is fixed-parameter tractable for the single parameter t when there are no upper bounds on the team sizes. Then, we show how to adapt the proof to also work for general HOMOGENEOUS TEAM FORMATION with the parameter n.

To show fixed-parameter tractability for the parameter t, we need a more refined realization of the hint enumeration phase. Clearly, whenever p ≤ t, we use the brute-force realization from Theorem 4.3 without any modification.

The corresponding running time is O(t^t·2^t·(t⁴·log t + t²·m) + n·m). For p > t, we slightly modify Step 1 of the algorithm behind Theorem 4.3.

Recall that in Step 1 one determines a set P′ ⊆ P of pattern vectors that are used in the solution. Due to Lemma 4.4 we know that without loss of generality |P′| ≤ t. In Theorem 4.3 we simply try all size-at-most-t subsets of P. Here, we show that for guessing we only have to take into account a relatively small subset P* ⊆ P with |P*| ≤ g(t), where g is a function which only depends on t.

Consider a pattern vector v of the unknown P′. In Phase 2 of the algorithm (polynomial-time solving with the help of the hint), we determine the preimage types, that is, the set of input row types that may contain rows that are mapped to v in the solution. Assume that the preimage types for all pattern vectors from P′ are fixed. To determine which concrete pattern vector corresponds to a set of preimage types, we only have to take into account the t cheapest compatible pattern vectors, where compatible means that all rows of these preimage types coincide at the ⋆-symbol positions.

By definition, there exist at most 2^t different sets of preimage types. Thus, keeping for each set of preimage types the t cheapest pattern vectors and removing the rest results in a set P* of size at most 2^t·t.
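One possible way to compute the reduced candidate set P* is sketched below (Python; '*' stands in for the ⋆-symbol, cost is assumed to be a function from pattern vectors to their costs, and all names are illustrative).

from itertools import chain, combinations

STAR = '*'  # stand-in for the ⋆-symbol

def candidate_patterns(type_reps, patterns, cost, t):
    """For every nonempty set S of row types, keep the t cheapest pattern vectors
    that are compatible with S, i.e. all representatives in S agree on the
    ⋆-positions of the pattern vector; return the union of these lists."""
    def compatible(S, v):
        star = [k for k in range(len(v)) if v[k] == STAR]
        return all(x[k] == S[0][k] for x in S for k in star)
    subsets = chain.from_iterable(combinations(type_reps, r)
                                  for r in range(1, len(type_reps) + 1))
    reduced = set()
    for S in subsets:
        fitting = sorted((v for v in patterns if compatible(S, v)), key=cost)
        reduced.update(fitting[:t])
    return reduced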

Summarizing, when p > t, we realize Step 1 by computing P* as described above and branching on all subsets P′ ⊆ P* of size at most t. This can be done in O((2^t·t choose t)) ≤ O(2^(t²)·t^t) time. Step 2 in the algorithm behind Theorem 4.3 remains unchanged, that is, for each pattern vector v ∈ P′ we guess one row from M which is mapped to v in the solution. Altogether, we solve HOMOGENEOUS TEAM FORMATION in O(t^t·(2^(t²)·t^t)·(t²·m + t⁴·log t) + n·m) time.

Clearly, since t ≤ n, our result also holds for the parameter n. Moreover, we only need the fact that there are no upper bounds on the team sizes in order to bound the number |P′| of used pattern vectors via Lemma 4.4. This can be replaced by the bound |P′| ≤ n, which already follows from the problem definition. Thus, HOMOGENEOUS TEAM FORMATION (even with upper bounds on the team sizes) can be solved in O(2^(n²)·n^(2n+2)·(m + n²·log n) + n·m) time.

For Theorem 4.3 we described an XP-algorithm with respect to the parameter p, that is, an algorithm with polynomial running time for constant values of p. We leave it open whether there also exists an algorithm where the degree of the polynomial is independent of p, that is, whether HOMOGENEOUS TEAM FORMATION is fixed-parameter tractable for the parameter p.

Note that, as a consequence of the "binarization" in Lemma 4.1, the question whether HOMOGENEOUS TEAM FORMATION is fixed-parameter tractable with respect to the combined parameter (p, |Σ|) is equivalent to the question whether HOMOGENEOUS TEAM FORMATION is fixed-parameter tractable with respect to p alone.

However, at least for the special case without upper bounds on the team sizes and with s ≥ n·max_{v∈R(P)} c(v), that is, the costs are effectively unbounded, we can show fixed-parameter tractability. This result obviously transfers to BASIC HOMOGENEOUS TEAM FORMATION.

Theorem 4.5. If there are no upper bounds on the team sizes and no cost bound, then HOMOGENEOUS TEAM FORMATION can be solved in O(p!·m·n²·t⁴·log t) time.

Proof. To prove this statement, we describe a greedy algorithm for computing a hint function h.

We will show that there is a correct permutation of pattern vectors such that applying the following greedy procedure leads to a correct hint function h. For now, assume that a specific permutation of pattern vectors is given.

Greedy hint construction

1. Start with h(x) := ∅ for all pattern vectors x ∈ P.


2. Take the first pattern vector z with h(z) = ∅.

3. Set h(z) to be the first row in M.

4. Remove from M the row h(z) and remove all rows which coincide with h(z) at the ⋆-positions of z. If M is nonempty, then go to Step 2.

The hint function constructed by this greedy procedure highly depends on the ordering of the pattern vectors. Thus, we simply try all possible orderings to realize the Hint Enumeration. It remains to show the following:
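A compact rendering of the greedy construction together with the enumeration over all p! orderings might look as follows (Python sketch; '*' stands in for the ⋆-symbol and all names are illustrative).

from itertools import permutations

STAR = '*'  # stand-in for the ⋆-symbol

def greedy_hint(rows, ordered_patterns):
    """Greedy hint construction for one fixed ordering of the pattern vectors."""
    remaining = list(rows)
    hint = {v: None for v in ordered_patterns}            # Step 1
    for v in ordered_patterns:                            # Step 2: next pattern vector with h(v) = ∅
        if not remaining:
            break
        first = remaining[0]
        hint[v] = first                                   # Step 3: hint the first remaining row
        star = [k for k in range(len(v)) if v[k] == STAR]
        remaining = [r for r in remaining                 # Step 4: drop rows coinciding on ⋆-positions
                     if any(r[k] != first[k] for k in star)]
    return hint

def enumerate_greedy_hints(rows, patterns):
    """Hint Enumeration: run the greedy construction for every ordering (p! many)."""
    for order in permutations(patterns):
        yield greedy_hint(rows, list(order))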

Claim. Let ϕ be a solution for HOMOGENEOUS TEAM FORMATION which uses a minimal number of pattern vectors and let P′ ⊆ P be the set of pattern vectors used by ϕ. Then, there is a permutation of the pattern vectors such that applying the "Greedy hint construction" procedure to this permutation produces a correct hint, that is, it assigns one row ri to each pattern vector pi ∈ P′ such that ϕ(ri) = pi, and no row to any pattern vector in P \ P′.

Proof of Claim. Given ϕ and P′, we show the existence of a correct ordering of the pattern vectors by construction. To this end, let first(M) denote the first row in the matrix M and let π be an (initially empty) list of pattern vectors.

a) Insert p := ϕ(first(M)) at the end of π.

b) Remove all rows x from M which coincide with first(M) at the ⋆-positions of p.

c) If M is nonempty, then go to Step (a).

d) Finally, insert all pattern vectors from P \ P′ at the end of π.

The ordering of the pattern vectors in π is correct: Observe that, since ϕ uses a minimal number of pattern vectors, for each pattern vector pi ∈ P′ there is at least one row which cannot be mapped to any other pattern vector from P′. Hence, every pattern vector from P′ has some position in π. Now, apply "Greedy hint construction" with the ordering of the pattern vectors given by π to obtain the hint function h. Consider pi, the ith pattern vector in π. By construction of π, row h(pi) was the first row in the matrix in the ith iteration of the construction procedure for π. Thus, by Step (a), ϕ(h(pi)) = pi. Furthermore, "Greedy hint construction"