Multivariate complexity analysis of team management problems



Universitätsverlag der TU Berlin


Foundations of Computing

Volume 3

Robert Bredereck

Multivariate Complexity Analysis

of Team Management Problems

This thesis identifies and develops simple combinatorial models for four natural team management tasks. The considered tasks include building as well as modifying one team, partitioning individuals into multiple teams, and the redistribution of team members under several optimization goals. To this end, it employs surprising relations to concepts from voting theory, data anonymization, and graph theory.

A multivariate complexity analysis of the underlying problems shows that all tasks are computationally intractable in general. However, the analysis also reveals meaningful tractable special cases for each task. Intractability is shown by hardness results with respect to the complexity classes NP and LOGSNP and the parameterized intractability classes W[1] and W[2]. Tractable cases are obtained through exact fixed-parameter and polynomial-time algorithms, integer linear programming formulations, and heuristics. Some of the most promising algorithms of the thesis are tested on synthetic and empirical data.


http://verlag.tu-berlin.de ISBN 978-3-7983-2764-1 (print)

ISBN 978-3-7983-2765-8 (online)


The series Foundations of Computing of the Technische Universität Berlin is edited by:

Prof. Dr. Stephan Kreutzer, Prof. Dr. Uwe Nestmann, Prof. Dr. Rolf Niedermeier


Bibliographic information of the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

Universitätsverlag der TU Berlin, 2015

http://verlag.tu-berlin.de

Fasanenstr. 88, 10623 Berlin

Tel.: +49 (0)30 314 76131 / Fax: -76133 / E-mail: publikationen@ub.tu-berlin.de

Also: doctoral thesis, Technische Universität Berlin, 2014
First reviewer: Prof. Dr. Rolf Niedermeier
Second reviewer: Prof. Dr. Arkadii Slinko
Third reviewer: Prof. Dr. Toby Walsh

The thesis was successfully defended on 7 October 2014 at Faculty IV under the chairmanship of Prof. Dr. Stefan Jaehnichen.

The manuscript is protected by copyright. Printing: docupoint GmbH

Typesetting/layout: Robert Bredereck

Cover photo: vladsilver | 123RF stock photo | http://de.123rf.com/profile_vladsilver | Image no. 15395546

ISBN 978-3-7983-2764-1 (print) ISBN 978-3-7983-2765-8 (online) ISSN 2199-5249 (print)

ISSN 2199-5257 (online)

Simultaneously published online on the digital repository of the Technische Universität Berlin:

URN urn:nbn:de:kobv:83-opus4-66067


Summary

In this dissertation, we identify and develop simple combinatorial models for four natural team management tasks and investigate computationally tractable and intractable cases. To this end, we analyze the multivariate complexity of the underlying problems and test some of our algorithms on synthetic and empirical data.

Our first task is to find a team that is accepted by a community and matches the ideas (in the following “agenda”) of a leader. We formalize this task by a simple combinatorial model, extending a well-known procedure from the voting context by an agenda model. In this model, the community is represented by voters, each with a “favorite set”. We show that the resulting problems UNANIMOUSLY ACCEPTED BALLOT and MAJORITYWISE ACCEPTED BALLOT are NP-hard, even if the leader has no agenda. Herein, UNANIMOUSLY ACCEPTED BALLOT asks whether there is a team that is accepted by all voters, and MAJORITYWISE ACCEPTED BALLOT asks whether there is a team that is accepted by a strict majority of the voters. In this context, acceptance means that a voter supports the majority of the team members. On the positive side, we show fixed-parameter tractability (FPT) for the parameters “number of potential team members” and “number of voters”. For the parameter “maximum size of the favorite sets”, we show an FPT result for UNANIMOUSLY ACCEPTED BALLOT and W[1]-completeness for MAJORITYWISE ACCEPTED BALLOT. On the negative side, we show W[2]-hardness for the parameter “size of the solution” and NP-hardness for various special cases.

Our second task is to partition a set of individuals into homogeneous groups. Exploiting concepts of the combinatorial data anonymization model k-ANONYMITY, we develop a new model that formalizes this task. Herein, the homogeneity requirements of each potential group are specified by a “pattern vector”. The information about the individuals is stored in a matrix where individuals are represented by rows and their attributes by columns. We show that some special cases of the resulting problem HOMOGENEOUS TEAM FORMATION are NP-hard while others allow for FPT results. For example, HOMOGENEOUS TEAM FORMATION is already NP-hard if the matrix has only two columns. In contrast, we obtain an FPT result for the parameter “number of potential teams” as well as for the parameter “number of different individuals”. We transfer our “pattern vector concept” back to the world of combinatorial data anonymization and show that it can help to improve the usability of the anonymized data. We show that the underlying problem is NP-hard and complement this by an FPT result with respect to a “homogeneity parameter”. Building on this, we develop both an exact ILP-based solution method and a heuristic. In experiments with empirical data, we show that our algorithm competes well with the established “Mondrian” algorithm for k-ANONYMITY in terms of anonymization quality and beats it in terms of efficiency.

Our third task is to train a team effectively in order to ensure that, from a set of important skills, each skill is mastered by the majority of the team members. We formalize this task by a natural matrix modification problem on binary matrices, where team members are represented by rows and their skills by columns. The resulting problem is known as LOBBYING in the context of bribery in elections. We study how natural parameters such as “number of rows”, “number of columns”, “number of rows to modify”, or the “maximum number of ones missing per column to obtain a majority of ones” (in the following “gap value”) influence the computational complexity of our problem.

On the negative side, we show NP-hardness even if each row contains at most three ones. On the positive side, we show, for example, an FPT result for the parameter “number of columns” and develop a heuristic with a logarithmic approximation factor that yields optimal results for instances with up to four columns. We also show empirically that this heuristic works well on general instances. As a further key result, we show that our problem is LOGSNP-complete for constant gap values.

Our fourth task is to redistribute teams of equal size. More precisely, one tries to reduce the number of equal-size teams by dissolving some teams and distributing their members to remaining teams that are not in conflict with them, while ensuring that all new teams are again of equal size. We formalize this task by a new combinatorial graph model. We show its relations to known graph concepts such as perfect matchings, flow networks, and star partitions of graphs. On the negative side, we show that the underlying problem is NP-hard, even if the old team size and the team size increase are distinct constants. On the positive side, we show that our problem is polynomial-time solvable if there are no conflicts or if the teams to dissolve and the teams to gain are already known. Furthermore, we show an FPT result with respect to the parameter “treewidth” if the old team size and the team size increase are constants.


Abstract

In this thesis, we identify and develop simple combinatorial models for four natural team management tasks and identify tractable and intractable cases with respect to their computational complexity. To this end, we perform a multivariate complexity analysis of the underlying problems and test some of our algorithms on synthetic and empirical data.

Our first task is to find a team that is accepted by competing groups and also satisfies the agenda of some principal. Extending an approval balloting procedure by an agenda model, we formalize this task as a simple combinatorial model where potential team members are represented by a set of proposals and the competing groups are represented by voters with favorite ballots, that is, subsets of proposals. We show that the underlying problems UNANIMOUSLY ACCEPTED BALLOT and MAJORITYWISE ACCEPTED BALLOT are NP-hard even without an agenda for the principal. Herein, UNANIMOUSLY ACCEPTED BALLOT asks for a set of proposals that is accepted by all voters, and MAJORITYWISE ACCEPTED BALLOT asks for a set of proposals that is accepted by a strict majority of the voters, where a voter accepts a set of proposals if he supports the majority of its proposals. On the positive side, we show fixed-parameter tractability with respect to the parameters “number of proposals” and “number of voters”. With respect to the parameter “maximum size of the favorite ballots”, we show fixed-parameter tractability for UNANIMOUSLY ACCEPTED BALLOT and W[1]-completeness for MAJORITYWISE ACCEPTED BALLOT. On the negative side, we show W[2]-hardness for the parameter “size of the solution” and NP-hardness for various special cases.
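Fixed-parameter tractability for the parameter “number of proposals” can be illustrated by plain brute force. With m proposals there are only 2^m candidate ballots, and each one can be checked against all n voters in polynomial time, which gives O(2^m · n · m) time. The sketch below is an illustration, not the algorithm from the thesis. It assumes (i) that a voter accepts a ballot if strictly more than half of its proposals lie in his favorite ballot, and (ii) that a solution must be non-empty and contain every agenda proposal. The thesis's lower-bound variant of the agenda is not modeled. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional


def accepts(favorite: FrozenSet[str], ballot: FrozenSet[str]) -> bool:
    """Assumed rule: a voter accepts a ballot if he supports a strict
    majority of its proposals, i.e. more than half lie in his favorite ballot."""
    return 2 * len(ballot & favorite) > len(ballot)


def find_accepted_ballot(
    proposals: Iterable[str],
    favorites: List[FrozenSet[str]],
    agenda: FrozenSet[str] = frozenset(),
    unanimous: bool = True,
) -> Optional[FrozenSet[str]]:
    """Brute force over all subsets of the m proposals (FPT in m).

    unanimous=True  -> every voter must accept (UNANIMOUSLY ACCEPTED BALLOT)
    unanimous=False -> a strict majority must accept (MAJORITYWISE ACCEPTED BALLOT)
    Returns a smallest qualifying non-empty ballot containing the agenda, or None.
    """
    ordered = sorted(proposals)
    n = len(favorites)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            ballot = frozenset(combo)
            if not agenda <= ballot:
                continue  # assumed hard agenda constraint: all agenda proposals included
            supporters = sum(accepts(f, ballot) for f in favorites)
            if (unanimous and supporters == n) or (not unanimous and 2 * supporters > n):
                return ballot
    return None
```

Consider three voters with favorite ballots {a, b}, {b, c}, and {a, c}. Only the full ballot {a, b, c} is accepted unanimously, because every voter supports two of its three proposals. A strict majority, however, already accepts the single-proposal ballot {a}.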

Our second task is to partition a set of individuals into homogeneous groups. Using concepts from the combinatorial data anonymization model k-ANONYMITY, we develop a new model which formalizes this task. The information about the individuals is stored in a matrix where rows represent individuals and columns represent attributes of the individuals. The homogeneity requirement of each potential group is specified by a “pattern vector”. We show that some special cases of the underlying problem HOMOGENEOUS TEAM FORMATION are NP-hard while others allow for (fixed-parameter) tractability results. For example, the problem is NP-hard even if the matrix has only two columns. In contrast, the problem is fixed-parameter tractable for the parameter “number of potential groups” as well as for the parameter “number of different individuals”. We transfer our “pattern vector” concept back to combinatorial data anonymization and show that it may help to improve the usability of the anonymized data. We show that the underlying problem PATTERN-GUIDED k-ANONYMITY is NP-hard and complement this by a fixed-parameter tractability result based on a “homogeneity parameterization”. Building on this, we develop an exact ILP-based solution method as well as a simple but very effective greedy heuristic. Experiments on several real-world datasets show that our heuristic matches the established “Mondrian” algorithm for k-ANONYMITY in terms of anonymization quality and outperforms it in terms of running time.

Our third task is to effectively train team members in order to ensure that, from a set of important skills, each skill is covered by a majority of the team. We formalize this task by a natural binary matrix modification problem where team members are represented by rows and skills are represented by columns. The underlying problem is known as LOBBYING in the context of bribery in voting. We study how natural parameters such as “number of rows”, “number of columns”, “number of rows to modify”, or the “maximum number of ones missing for any column to have a majority of ones” (referred to as “gap value”) govern the computational complexity. On the negative side, we show NP-hardness even if each row contains at most three ones. On the positive side, for example, we prove fixed-parameter tractability for the parameter “number of columns” and provide a greedy logarithmic-factor approximation algorithm. We also show empirically that this greedy algorithm performs well on general instances. As a further key result, we prove LOGSNP-completeness for constant gap values.
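The sketch below illustrates the matrix view of LOBBYING together with a natural greedy rule. It rests on two assumptions: modifying a row turns all of its entries into ones, and a column is satisfied once it has a strict majority of ones. Under these assumptions, the gap value of a column is the number of ones it still misses. The greedy step repeatedly modifies a row that has zeros in the largest number of still-unsatisfied columns, in the spirit of the classical greedy algorithm for set multicover. This is an illustration under the stated assumptions and not necessarily the exact heuristic analyzed in the thesis. The function names are hypothetical.

```python
from typing import List, Sequence, Set


def column_gaps(matrix: Sequence[Sequence[int]], modified: Set[int]) -> List[int]:
    """Ones still missing per column for a strict majority (assumed threshold).
    Rows in `modified` count as all-ones rows. Assumes a non-empty matrix."""
    n = len(matrix)
    need = n // 2 + 1
    gaps = []
    for j in range(len(matrix[0])):
        ones = sum(1 for i, row in enumerate(matrix) if i in modified or row[j] == 1)
        gaps.append(max(0, need - ones))
    return gaps


def greedy_lobbying(matrix: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of the rows modified (set to all ones) by the greedy rule."""
    chosen: List[int] = []
    gaps = column_gaps(matrix, set(chosen))
    # A column with a positive gap always has a zero in some unmodified row,
    # so every round reduces at least one gap and the loop terminates.
    while any(gaps):
        candidates = [i for i in range(len(matrix)) if i not in chosen]
        # Gain of a row: number of unsatisfied columns in which it has a zero.
        best = max(
            candidates,
            key=lambda i: sum(1 for j, g in enumerate(gaps) if g > 0 and matrix[i][j] == 0),
        )
        chosen.append(best)
        gaps = column_gaps(matrix, set(chosen))
    return chosen
```

Consider five rows 001, 010, 100, 000, and 111. Every column misses exactly one one, so every gap value is 1. Modifying the all-zero row closes all three gaps at once, and the greedy rule picks that row first.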

Our fourth task is to redistribute teams of equal size. More precisely, one asks to reduce the number of equal-size teams by dissolving some teams, distributing their team members to non-conflicting non-dissolved teams, and ensuring that all new teams are again of equal size. We formalize this task by a new combinatorial graph model. We show relations to known graph concepts such as perfect matchings, flow networks, and star partitions. On the negative side, we show that the underlying problem is NP-hard even if the old team size and the team size increase are distinct constants. On the positive side, we show that even the two-party variant of the problem is polynomial-time solvable when there are no conflicts or when the teams to dissolve and the teams to win are known. Furthermore, we show fixed-parameter tractability with respect to treewidth when the old team size and the team size increase are constants.


Preface

This thesis summarizes parts of my work as a research assistant at Friedrich-Schiller-Universität Jena and Technische Universität Berlin, supported by the Deutsche Forschungsgemeinschaft (DFG) under the project “Parametrisierte Algorithmik für Wahlsysteme” (parameterized algorithmics for voting systems), PAWS, project number NI 369/10.

The results in this thesis are partially contained in journal and conference publications and have been obtained in close collaboration with several coauthors. In the following, I explain this in more detail.

In Chapter 3, we discuss the computational complexity of the problem UNANIMOUSLY ACCEPTED BALLOT and its majority variant MAJORITYWISE ACCEPTED BALLOT. The research on this topic was stimulated by a discussion with Jérôme Lang (Université Paris-Dauphine) while he was visiting TU Berlin in October 2012. I developed the model with Jiehua Chen, my supervisor Rolf Niedermeier, and Gerhard J. Woeginger (TU Eindhoven), who was visiting TU Berlin as a Humboldt Research Award winner from October 2012 until June 2013. Together with Jiehua Chen, I developed the basic intractability results, such as the W[2]-hardness result with respect to the parameter “solution size” as well as the “non-existence of polynomial kernels”, and we developed the integer linear programming formulation. In collaboration with Stefan Kratsch, who joined us for this project, I revised the proof of W[1]-membership with respect to the parameter “maximum size of the favorite ballot”, which was first proposed by Gerhard J. Woeginger and realized by Jiehua Chen. I presented the corresponding extended abstract at the 3rd International Conference on Algorithmic Decision Theory (ADT ’13) [Alo+13a]. Compared to this paper, I extended the agenda model to allow the leader to specify a lower bound on the number of agenda proposals that should be contained in the solution ballot (instead of forcing all agenda proposals to be part of the solution ballot). I adapted all algorithms to work with this extended model and added a W[1]-hardness proof for the majority variant with respect to the parameter “maximum size of the favorite ballot”. The combinatorial results contributed by Noga Alon (Tel Aviv University) to the conference paper are not part of this thesis. The full version containing all these results will be published in Transactions on Economics and Computation [Alo+15].

In Chapter 4, we discuss the computational complexity of the HOMOGENEOUS TEAM FORMATION problem, which is closely related to the k-ANONYMITY problem used in combinatorial data anonymization. While Geevarghese Philip (The Institute of Mathematical Sciences, Chennai) was visiting our group in November 2010, Rolf Niedermeier proposed to analyze the parameterized complexity of combinatorial data anonymization. Together with André Nichterlein, we started our investigation with the k-ANONYMITY problem, mainly with respect to homogeneity parameters. Already at the beginning of our discussions, we also considered a user-oriented model where the “types of allowed suppressions” can be specified by “pattern vectors”, and we observed that this model has a nice “data clustering view”. Most of the work was done in close collaboration when Geevarghese Philip visited us in Berlin (November 2010 and March 2011) and when I visited him in Chennai, India (December 2010). Our results for k-ANONYMITY appeared in an extended abstract which was presented by André Nichterlein at the 18th International Symposium on Fundamentals of Computation Theory (FCT ’11) [Bre+11b]; a full version appeared in Data Mining and Knowledge Discovery [Bre+14e]. In parallel to FCT, I presented an extended abstract containing our findings concerning the user-oriented anonymization and clustering model at the 36th International Symposium on Mathematical Foundations of Computer Science (MFCS ’11) [Bre+11a]. For the full version of the MFCS paper, we changed the focus of our user-oriented model to the clustering view, more precisely to the team formation view. To this end, we slightly extended the model and adapted the algorithms accordingly. Furthermore, Thomas Köhler, who worked on this topic in his diploma thesis under the co-supervision of André Nichterlein, Rolf Niedermeier, and me, joined our project and contributed some improvements to the intractability proofs as well as a fixed-parameter algorithm with respect to the number of pattern vectors if there are no cost bounds. The full paper appeared in Algorithmica [Bre+15b].

In Chapter 5, we discuss the computational complexity of the PATTERN-GUIDED k-ANONYMITY problem. It is an improved version of our first user-oriented k-ANONYMITY approach [Bre+11a]. I worked on this slightly modified model together with André Nichterlein and Rolf Niedermeier. I organized the experimental evaluation, implemented the ILP-based algorithm, and co-supervised our student assistants Kolja Stahl, who implemented the greedy algorithm, and Thomas Köhler, who collected and prepared the test data. I also contributed to most parts of the theoretical analysis. An extended abstract appeared in the Proceedings of the Joint Conference of the 7th International Frontiers of Algorithmics Workshop and the 9th International Conference on Algorithmic Aspects of Information and Management (FAW-AAIM ’13) [BNN13b]. The full paper appeared in Algorithms [BNN13a].

In Chapter 6, we discuss the multivariate complexity of the LOBBYING problem. Working on this problem was proposed by Rolf Niedermeier while Ondřej Suchý (Universität des Saarlandes) visited our group in the second half of 2011. Besides Ondřej, Jiehua Chen, Sepp Hartung, and Rolf Niedermeier were also part of the research team. I was mainly responsible for the NP-hardness result when each row contains at most three ones and for the intractability of the below-guarantee parameterization. An extended abstract appeared in the Proceedings of the 26th Conference on Artificial Intelligence (AAAI ’12) [Bre+12]. Gerhard J. Woeginger joined the team for the journal version of the paper and showed LOGSNP-completeness for instances with constant maximum gap value. For this journal version, I revised the correctness proof for our greedy algorithm for instances with up to four columns and showed a logarithmic approximation factor for general instances. Furthermore, I organized and evaluated the experiments and co-supervised our student assistant Kolja Stahl, who extracted the real-world test data. The full version appeared in the Journal of Artificial Intelligence Research [Bre+14b]. Compared to this journal version, I extended the experimental evaluation in this thesis to a larger dataset.

In Chapter 7, we introduce and discuss a model for network-based vertex dissolution. We analyze its relation to known graph concepts and the computational complexity of the underlying problems. Motivated by gerrymandering scenarios, I proposed to work on “redistricting models” at a group-internal workshop in March 2013. Together with René van Bevern, Jiehua Chen, Vincent Froese, Rolf Niedermeier, and Gerhard J. Woeginger, I developed the corresponding model. Since we had identified several open questions already for the STAR PARTITIONING problem, which is a special case of our new model, we first analyzed the computational complexity of STAR PARTITIONING for several subclasses of perfect graphs. Laurent Bulteau joined our team for the interval graph case. The corresponding results are published in the Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP ’14) [Bev+14a], but they are not part of this thesis. We also pursued the full “dissolution model”. Herein, I led the discussion and contributed to many results, such as the relation to flow networks and restricted two-factors as well as the algorithms for cliques and graphs of bounded treewidth. I presented the corresponding results at the 39th International Symposium on Mathematical Foundations of Computer Science (MFCS ’14) [Bev+14b]. The full paper will be published in SIAM Journal on Discrete Mathematics [Bev+15].

Besides the publications already discussed, I also contributed to several other journal and conference publications which are not part of this thesis. These include two survey articles on multivariate complexity analysis for computational social choice problems [Bet+12a; Bre+14a] and several publications on restricted domains for preference profiles [BCW13a; BCW13b], parameterized algorithms for voting [BBN10; BBN14; Bre+14c], graph anonymization [Bre+13b; Bre+14d], graph modification [Bet+11; Bet+12b; Bet+14; Bev+13], and explaining vectors [Bre+13a; Bre+15a].

Clarifications. Throughout the thesis, “log” denotes the logarithm to base two. We also assume that zero belongs to the set of natural numbers.

In most places where this thesis refers to unspecified people, we explicitly write “he or she”, emphasizing that we consider both male and female representatives. However, to keep sentences simple, we sometimes arbitrarily fix one case (for example, in Chapter 3 we refer to male voters and female leaders).

Acknowledgements. First of all, I am grateful to my supervisor Rolf Niedermeier, who guided me through my scientific career from my beginnings as a student research assistant to my PhD. I am deeply indebted to him for his guidance, advice, and countless tips. Furthermore, I would like to thank all my coauthors and (former) colleagues for fruitful discussions and endless rounds of correcting and improving each other’s work. Finally, special thanks go to my family. Seeing the thirst for knowledge of my children Mira and Wim gave me a key motivation for my research. Without the help of my wife Nina, my parents Frank and Andrea, and my in-laws Baldur and Maria, this thesis would not exist.


Contents

I Introduction and Basics

1 Introduction
1.1 Building Teams
1.2 Modifying Teams

2 Basic Concepts and Notation
2.1 Voting and Approval Balloting
2.2 Combinatorial Data Anonymization
2.3 Graphs
2.4 Computational Complexity
2.5 Algorithmic Techniques

II Building Teams

3 Collectively Accepted Ballots
3.1 Motivation and Model
3.1.1 The Model
3.1.2 Our Contributions
3.2 Theoretical Results
3.2.1 NP-Completeness
3.2.2 Few Proposals or Few Voters
3.2.3 Small Ballots
3.2.4 Further Parameterizations
3.3 Discussion

4 Homogeneous Team Formation
4.1 Motivation and Model
4.1.2 The Full Model
4.1.3 Our Contributions
4.2 Theoretical Results
4.2.1 NP-Hardness
4.2.2 Limits of Preprocessing
4.2.3 Tractability Results
4.3 Discussion

5 Pattern-Guided k-Anonymity
5.1 Motivation and Model
5.1.1 The Model
5.1.2 Our Contributions
5.2 Theoretical Results
5.2.1 Parameterized Complexity
5.2.2 ILP Formulation
5.2.3 Greedy Heuristic
5.3 Experimental Results
5.3.1 Data
5.3.2 Implementation Setup
5.3.3 Quality Criteria
5.3.4 Evaluation
5.3.5 Heuristic vs. Mondrian
5.3.6 Heuristic vs. Exact Solution

III Modifying Teams

6 Lobbying
6.1 Motivation and Model
6.1.1 Related Models
6.1.2 Our Contributions
6.2 Theoretical Results
6.2.1 NP-Completeness
6.2.2 LOGSNP-Completeness
6.2.3 Limits of Preprocessing
6.2.4 At Most Two Ones per Row
6.2.6 Further Parameterizations
6.3 Experimental Results
6.3.1 Random Instance Generation
6.3.2 Real-World Data Generation
6.3.3 Results

7 Network-Based Vertex Dissolution
7.1 Model and Motivation
7.1.1 Related Work
7.1.2 The Model
7.1.3 Our Contributions
7.1.4 Relation to Established Models
7.2 Theoretical Results
7.2.1 Complexity Dichotomy for Dissolution
7.2.2 Complexity Dichotomy for Biased Dissolution
7.2.3 Planar Graphs
7.2.4 Cliques
7.2.5 Graphs of Bounded Treewidth

IV Conclusion

8 Conclusion and Outlook


Part I: Introduction and Basics


1 Introduction

Teams are important whenever single agents, which can be humans, robots, or even computer programs, cannot execute some job without the help of others. Sometimes teams emerge without external control. Often, however, the requirements for the teams and their jobs are very complex and external control becomes necessary.

Assume, for example, that a company wants to distribute its employees into two-person teams, each sharing an office. Each employee has a strict preference ordering over the other employees. The task is to partition the employees into teams of size two such that no two employees prefer each other over their assigned partners.
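The stability condition can be checked directly. Two employees who are not partners form a blocking pair if each ranks the other above his or her assigned partner, and a pairing is stable exactly if no blocking pair exists. The minimal sketch below assumes complete strict preference lists (most preferred first) and a perfect pairing. The function name is hypothetical. The code only checks a given pairing, whereas finding a stable one is a separate algorithmic task.

```python
from typing import Dict, List, Tuple


def blocking_pairs(prefs: Dict[str, List[str]], partner: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return all pairs (x, y) who are not partners but prefer each other
    over their assigned partners. prefs[x] lists all others, best first."""
    # rank[x][y] = position of y in x's list (smaller is better)
    rank = {x: {y: i for i, y in enumerate(order)} for x, order in prefs.items()}
    people = sorted(prefs)
    pairs = []
    for i, x in enumerate(people):
        for y in people[i + 1:]:
            if partner[x] == y:
                continue
            if rank[x][y] < rank[x][partner[x]] and rank[y][x] < rank[y][partner[y]]:
                pairs.append((x, y))
    return pairs
```

Suppose that A and B rank each other first. Then pairing A with C and B with D is blocked by (A, B), while pairing A with B and C with D has no blocking pair.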

Since such team management tasks are often too large to be solved by hand, formal models and efficient algorithms are important to obtain useful solutions. The above-mentioned employee matching task is known as the “stable roommates problem” and is a variant of “stable marriage”, which is probably the most famous team management task that has been successfully solved by means of a combinatorial model [GI89]. In its classical setting, the input consists of a set of men and a set of women, and each person has a strict preference ordering over the persons of the opposite sex. The task is to find a stable matching between men and women, that is, a matching where no two people of opposite sex prefer each other over their assigned partners. Gale and Shapley [GS62] showed that such a stable matching always exists and developed a polynomial-time algorithm which computes it. This algorithm is used in various applications [Rot84; TST01]. The importance of “stable marriage” was underlined in particular by the award of the 2012 “Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel” to Alvin E. Roth and Lloyd S. Shapley for their research in this context.
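The following minimal sketch shows the men-proposing deferred-acceptance algorithm of Gale and Shapley. It assumes equally many men and women and complete strict preference lists. Each free man proposes to the best woman he has not yet proposed to, and each woman tentatively keeps the best proposal she has received so far. Every man proposes to every woman at most once, so for n men and n women there are at most n² proposals. The function and variable names are illustrative.

```python
from collections import deque
from typing import Dict, List


def gale_shapley(men_prefs: Dict[str, List[str]], women_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Men-proposing deferred acceptance; returns a stable matching man -> woman.
    Assumes equally many men and women and complete strict preference lists."""
    # rank[w][m] = position of man m in woman w's list (smaller is better)
    rank = {w: {m: i for i, m in enumerate(order)} for w, order in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to propose to
    engaged: Dict[str, str] = {}  # woman -> tentatively accepted man
    free = deque(men_prefs)
    while free:
        m = free.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged.get(w)
        if current is None:
            engaged[w] = m  # w was free and accepts
        elif rank[w][m] < rank[w][current]:
            engaged[w] = m  # w trades up; her old partner becomes free
            free.append(current)
        else:
            free.append(m)  # w rejects m; he will try his next choice
    return {m: w for w, m in engaged.items()}
```

Suppose that both men prefer woman a, and that a prefers y. Then y ends up with a and x with b, and no pair blocks this matching.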

The literature discusses several variants and extensions of the stable marriage model. Some of them, like the “stable roommates problem”, remain efficiently solvable, while others, like the “hospitals/residents problem with couples”, are computationally intractable (NP-hard) [GI89]. Efficient algorithms are desirable in most situations, but efficiency can also be a disadvantage. For example, there is an efficient algorithm which computes the optimal manipulation strategy for the protocol used in the algorithm of Gale and Shapley [TST01]. Herein, manipulating means providing untruthful preferences in order to obtain a better outcome than by reporting truthful preferences. Roth [Rot82] showed that in principle all stable marriage protocols can be manipulated, which means that one cannot completely rule out manipulation. However, Pini et al. [Pin+11] proposed a way out of this quandary by designing a stable marriage procedure which is NP-hard to manipulate, that is, manipulation is computationally intractable in the worst case. This kind of “computational barrier” against manipulation is commonly used in voting theory and was originally proposed by Bartholdi III, Tovey, and Trick [BTT92].

In this thesis, we consider and analyze further team management tasks. Here, the stable marriage example provides two important insights which we follow up. First, useful team management tasks may be formalized by simple combinatorial models. Second, analyzing the computational complexity of the underlying problems is interesting for two reasons: on the one hand, it helps to find efficient algorithms; on the other hand, it helps to identify computational barriers, which might even be desirable. Note, however, that an NP-hard problem may still admit an algorithm that is efficient on average, and all “realistic instances” may even be solvable efficiently. There are several approaches to cope with hard problems. For example, one may study the average-case complexity or search for approximation algorithms. If one does not want to give up exact solutions and still aims for worst-case performance guarantees, then a multivariate complexity analysis is an appropriate approach.

Multivariate Complexity Analysis. Often, one is confronted with NP-hard problems. However, worst-case NP-hardness does not imply that instances occurring in real-world applications are hard to solve as well. These instances may have structural properties that allow for efficient algorithms. In a multivariate complexity analysis, one tries to identify problem parameters, and combinations thereof, that have a strong influence on the computational complexity of the problem. To this end, the concept of fixed-parameter tractability, which is central in parameterized algorithmics [DF13; FG06; Nie06], comes into play. Roughly speaking, a problem is fixed-parameter tractable with respect to some parameter if there is an algorithm which solves the problem exactly and the (probably unavoidable) super-polynomial part of its running time depends only on the parameter. Furthermore, parameterized complexity theory also allows one to identify parameters which presumably do not allow for fixed-parameter tractability.

Related Work. Compared, for example, to graph theory or voting theory, there is no established research area or research community dedicated to team management tasks, and we are not aware of surveys covering different types of team management tasks. However, several models for team management tasks have been considered, for example, in operations research [Abd09; BDD07; CGS07; CL04; Wi+09; ZK99] and social network analysis [KAZ12; LLT09; MDN12]. The models are often quite complex and the problems tend to be NP-hard. Analyzing the computational complexity of the corresponding problems, however, often plays a minor role. The investigations so far mainly focus on approximation algorithms, heuristics, and linear programming approaches. To the best of our knowledge, no multivariate complexity analysis of any team management task has been performed so far. We initiate this line of research in this thesis.

Our Goal. We propose simple combinatorial models for natural team management tasks and show that a multivariate complexity analysis of the underlying problems helps to identify practically relevant situations where a task can be solved efficiently. To identify and develop these models, we use concepts from voting theory, data anonymization, and graph theory. Then, we identify important tractable and intractable special cases, and develop polynomial-time algorithms as well as fixed-parameter algorithms. Although the main focus of the thesis is the theoretical analysis, we also verify some of our findings by experiments. More precisely, for two models, where real-world test data as well as competing algorithms are available, we empirically evaluate the performance of our algorithms and also develop heuristics.

In this thesis, we focus on four natural team management tasks, which fall into two main categories. Whenever one asks to find or select individuals in order to form one or multiple teams, we speak about a “team building task”. Whenever one or multiple teams are already present and one asks to improve the team(s) or its (their) composition(s), we speak about a “team modification task”. For each of the two categories, we consider one task affecting a single team and one task affecting multiple teams.

In the following, we briefly sketch the four tasks we consider in this thesis and illustrate each task by a simple example. We discuss the tasks and their formalization in detail in the two main parts of the thesis: we explore the team building tasks in Part II and the team modification tasks in Part III.

1.1 Building Teams

Stable marriage is a very basic task where one asks to partition a set of individuals into teams of size exactly two. However, in many situations one needs larger teams and less restrictive size constraints. An important example is the task of assigning individuals to project teams. Usually, the objective is to maximize a certain kind of “team expertise”. To this end, several different approaches are discussed in the context of operations research [Abd09; BDD07; Wi+09; ZK99]. Team expertise is clearly not the only interesting criterion to form teams; one should additionally take into account aspects such as similarity of team members, conflicts between team members, or preferences of individuals. Next, we sketch the two team building tasks which will be analyzed in Part II of the thesis.

Selecting an Accepted Team. Assume that a society wants to agree on a team that should represent the society. The input consists of a set of acceptable team members for each member of the society. We assume that a member of the society accepts a team if he or she accepts a strict majority of the team members. We are interested in two tasks: satisfying the whole society or a majority of the society. In addition, we consider an agenda containing the favorites of some principal; in this variant, the task additionally requires satisfying the agenda of the principal.

Example 1.1. Assume that the head of the department of mathematics and philosophy wants to find a team of researchers representing the department at a prestigious congress. There are the following proposals for team members: an algebraist, a logician, an existentialist, and a statistician. Each workgroup of the department has a set of acceptable representatives as depicted in Figure 1.1. Sending only the logician would be acceptable for all workgroups. However, the head of the department wants to send the existentialist, since the existentialist is an exceptionally gifted speaker. Then, the team “algebraist, logician, and existentialist” is the only team that is acceptable for all workgroups and for the head of the department. None of the other teams that are acceptable for all workgroups contains the existentialist.

Workgroup              Acceptable representatives
algebra                algebraist, logician
logic                  algebraist, logician, statistician
stochastic             algebraist, logician, statistician
modern philosophy      logician, existentialist, statistician
classical philosophy   logician, existentialist

Figure 1.1: Acceptable representatives for the workgroups.
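To make the acceptance rule concrete, the following Python sketch enumerates all 2^m candidate teams of the example by brute force. It is only an illustration, not the algorithm developed in the thesis. It assumes that the principal's wish is modeled as a set `required` of candidates that must belong to the team; the names `ACCEPTABLE`, `accepts`, and `accepted_teams` are hypothetical.

```python
from itertools import combinations

# Acceptable representatives of each workgroup, transcribed from Figure 1.1.
ACCEPTABLE = {
    "algebra": {"algebraist", "logician"},
    "logic": {"algebraist", "logician", "statistician"},
    "stochastic": {"algebraist", "logician", "statistician"},
    "modern philosophy": {"logician", "existentialist", "statistician"},
    "classical philosophy": {"logician", "existentialist"},
}
CANDIDATES = ["algebraist", "logician", "existentialist", "statistician"]


def accepts(approved, team):
    """A workgroup accepts a team if it approves a strict majority of the members."""
    return 2 * len(approved & team) > len(team)


def accepted_teams(acceptable, candidates, required=frozenset()):
    """Brute force over all non-empty teams (2^m of them for m candidates);
    keep those accepted by every workgroup that contain all `required` members."""
    teams = []
    for size in range(1, len(candidates) + 1):
        for team in map(frozenset, combinations(candidates, size)):
            if required <= team and all(accepts(a, team) for a in acceptable.values()):
                teams.append(team)
    return teams


print(accepted_teams(ACCEPTABLE, CANDIDATES))
print(accepted_teams(ACCEPTABLE, CANDIDATES, frozenset({"existentialist"})))
```

On the data of Figure 1.1, the search reports exactly two teams accepted by all workgroups, namely the logician alone and the team of algebraist, logician, and existentialist; only the latter remains when the existentialist is required.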

Using concepts from voting theory, we formalize the above task as a simple combinatorial model. To this end, we use a known voting procedure [FP04] and extend it by an agenda concept. In Chapter 3, we show that the underlying problems are computationally intractable even without an agenda. On the positive side, we provide fixed-parameter tractability results which indicate that there are relevant cases where exact solutions can be computed efficiently, for example, when the number of potential team members or the size of the society is small. Moreover, we identify cases where satisfying the whole society is computationally easier than satisfying the majority of the society.

Forming Homogeneous Teams. Assume that students have to be assigned to project teams. To be carried out successfully, the projects require some degree of homogeneity among the team members. For example, for a sports project it might be important that the team members have a similar fitness level and body height. For a mathematics project it could be important to agree on whether zero is a natural number. Formally, we consider the task where the input is a set of individuals with attributes and a set of projects with different requirements on the homogeneity of the attributes. The task is to assign each individual to some project ensuring that the homogeneity requirements are satisfied.

Attributes of the students:

prog. language   LP-solver   location
C++              CPLEX       Berlin
Haskell          Gurobi      Berlin
C++              CPLEX       Jena
C++              CPLEX       Saarbrücken
Haskell          CPLEX       Berlin

Homogeneity pattern of the projects:

LP implementation     □   □   ?
Traffic monitoring    □   ?   □

Homogeneous teams respecting the pattern:

Team 1:   3× C++       CPLEX   ?
Team 2:   2× Haskell   ?       Berlin

Figure 1.2: Example assignment of students to project teams. A □-symbol enforces homogeneity and a ?-symbol allows heterogeneity. A solution respecting the given homogeneity pattern is given in the bottom table.

Example 1.2. Assume that five students apply for two programming projects. The first project comprises an implementation for which knowledge of some high-level programming language and an LP-solver is required. To work together on such a project, the students must agree on the programming language as well as the LP-solver. The second project is a software implementation for a traffic monitoring system. For testing the system in a real-world scenario, the students should live in the same city, and for realizing the implementation they also have to agree on the programming language. See Figure 1.2 for an illustration.
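The homogeneity requirement can be checked mechanically. The sketch below encodes each project pattern as a tuple of booleans (True for an attribute that must be homogeneous within the team, False where the ?-symbol allows heterogeneity) and verifies the assignment shown in Figure 1.2. This encoding and all names are assumptions made for illustration.

```python
# Attribute rows (programming language, LP-solver, location), transcribed from Figure 1.2.
STUDENTS = [
    ("C++", "CPLEX", "Berlin"),
    ("Haskell", "Gurobi", "Berlin"),
    ("C++", "CPLEX", "Jena"),
    ("C++", "CPLEX", "Saarbrücken"),
    ("Haskell", "CPLEX", "Berlin"),
]
# Hypothetical encoding: True = attribute must be homogeneous, False = heterogeneity allowed (?).
PATTERNS = {
    "LP implementation": (True, True, False),
    "Traffic monitoring": (True, False, True),
}


def respects(rows, pattern):
    """All rows of a team must agree on every attribute marked as homogeneous."""
    return all(len({row[j] for row in rows}) == 1
               for j, homogeneous in enumerate(pattern) if homogeneous)


def valid_assignment(students, patterns, assignment):
    """`assignment` maps each project name to a list of student indices."""
    assigned = sorted(i for team in assignment.values() for i in team)
    if assigned != list(range(len(students))):  # every student in exactly one team
        return False
    return all(respects([students[i] for i in team], patterns[project])
               for project, team in assignment.items())


solution = {"LP implementation": [0, 2, 3], "Traffic monitoring": [1, 4]}
print(valid_assignment(STUDENTS, PATTERNS, solution))  # True
```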

In Chapter 4, using concepts from combinatorial data anonymization, we develop a new combinatorial model formalizing the above task. We show that the underlying problems are computationally intractable in general, but we identify several tractable cases by means of a multivariate complexity analysis. For example, the problem becomes tractable if there are only few potential teams or if there are only few classes of individuals (with respect to the attributes of the individuals). In Chapter 5, we transfer our new concept back to a closely related data anonymization task. We again analyze the multivariate complexity of the newly introduced anonymization model. Using an algorithmic scheme from Chapter 4, we develop a greedy heuristic and an integer linear programming formulation and test them on synthetic and empirical data. This allows us to empirically evaluate the usefulness of our findings from Chapter 5, at least for the closely related data anonymization model where real-world test data is available.

1.2 Modifying Teams

Team management remains challenging if the teams already exist. Many interesting tasks arise if the teams are known. Next, we sketch the two team modification tasks which will be analyzed in Part III of the thesis.

Effective Training of Team Members. Assume that a company wants to improve the performance of a team. Having identified a set of important skills, the goal is to train some team members in order to ensure that each important skill is sufficiently covered. The input is a set of individuals, each with a set of personal skills, and a set of important skills. The task is to send a minimum number of individuals to training courses such that afterwards each important skill is covered by a majority of the team members. We assume that an individual that was sent to a training course gains all important skills.

Example 1.3. Assume that the head of a research group wants to train her group members. She identified the following key skills necessary for high-quality papers: English language, mathematical accuracy, motivation, and problem solving. The head wants to ensure that each skill is covered by more than half of the group members by sending some members to training courses. The personal skills of the group members are depicted in Table 1.1. The head can reach her goal by sending the first two group members to training courses. To verify this, replace the first two rows of the table by all-1 rows and check that afterwards each column contains a majority of 1s.

Table 1.1: Skills of the research group members. There is a “1” in row i and column j if and only if group member i covers skill j.

member   English language   mathematical accuracy   motivation   problem solving
1        1                  0                       0            0
2        0                  0                       0            1
3        0                  1                       1            0
4        1                  0                       0            1
5        1                  0                       1            0
6        0                  1                       0            1

We formalize the above task by a simple combinatorial matrix modification problem which is known from voting theory [Chr+07]. To this end, we identify team members with rows and important skills with columns of a binary matrix (as sketched in Table 1.1). In Chapter 6, we analyze the multivariate complexity of the corresponding problem and identify tractable and intractable cases. Furthermore, we develop a greedy heuristic and an integer linear programming formulation and evaluate their quality and performance on synthetic and real-world test data. For the above-mentioned team modification task, our results imply that instances with small team sizes or with a small number of important skills can be solved efficiently.
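The following brute-force sketch, which is neither the greedy heuristic nor the integer linear programming formulation mentioned above, searches for a smallest set of group members to train on the data of Table 1.1. It assumes that training turns a member's row into an all-1 row and that “covered by a majority” means strictly more than half of the rows; the helper names are hypothetical.

```python
from itertools import combinations

# Rows = group members, columns = (English language, mathematical accuracy,
# motivation, problem solving); transcribed from Table 1.1.
SKILLS = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]


def majority_everywhere(matrix, trained):
    """After replacing the rows in `trained` by all-1 rows, does every column
    contain strictly more 1s than half the number of rows?"""
    n = len(matrix)
    for j in range(len(matrix[0])):
        ones = sum(1 if i in trained else matrix[i][j] for i in range(n))
        if 2 * ones <= n:
            return False
    return True


def min_training(matrix):
    """Exhaustive search by increasing size (exponential in the number of rows)."""
    n = len(matrix)
    for k in range(n + 1):
        for trained in combinations(range(n), k):
            if majority_everywhere(matrix, set(trained)):
                return trained
    return None


print(min_training(SKILLS))  # (0, 1): the first two group members
```

Training a single member cannot suffice here, since the column “mathematical accuracy” contains only two 1s and needs at least four.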

Redistributing Teams of Equal Size. Assume that a company has a set of important resources, each of which serves a team of s people. Due to some technical improvement, each resource can suddenly serve ∆s additional people. Now, the idea is to sell some resources, dissolve the corresponding teams, and redistribute their members to the remaining teams. However, there are some conflicts between the teams which prevent an arbitrary redistribution between the teams. Altogether, our input is a set of size-s teams, a conflict relation between the teams, and a number ∆s. The task is to dissolve some of the teams and redistribute their members to the non-conflicting remaining teams such that each new team is of size s + ∆s.

Example 1.4. Consider a research group with three offices, each occupied by a two-person team. Now, the group moves into a new building with three-person offices. To distribute the group members to the new offices, one team must be dissolved. However, there are private conflicts between the first two teams (so no member of the first team may join the second team and vice versa), so that only the third team can be dissolved. See Figure 1.3 for an illustration.

(Figure 1.3 graphic; labels: Team 1, Team 2, Team 3, s = 2, ∆s = 1, conflict.)

Figure 1.3: An illustration of the dissolution of one out of three size-two teams. Teams are represented by large circles and team members by small circles. The conflicts are illustrated by the graph on the top. A solution is to dissolve the team which is not involved in a conflict and to distribute one team member to each of the remaining teams (illustrated on the left), resulting in two teams of size three (illustrated on the right).

In Chapter 7, we formalize the above task by a new combinatorial graph model. We show relations of our new model to classic graph concepts such as perfect matchings, flow networks, and star partitions. Then, we analyze the computational complexity of the underlying graph problems and identify tractable and intractable cases. On the positive side, we show, for example, that the above task can be solved efficiently if the teams to be dissolved are known in advance, if there are no conflicts, or if s = ∆s. On the negative side, we show, for example, that the underlying problems are computationally intractable even if s and ∆s are small constants.
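For the tiny instance of Example 1.4, the model can be explored by exhaustive search. The sketch below assumes that a conflict between teams i and j forbids moving members of team i into team j (and vice versa) and tries every set of dissolved teams. It is exponential and only meant to make the model concrete; it does not reflect the efficient algorithms of Chapter 7. All function names are hypothetical.

```python
from itertools import combinations


def can_redistribute(dissolved, remaining, conflicts, s, ds):
    """Backtracking: move the s members of every dissolved team to non-conflicting
    remaining teams so that each remaining team gains exactly ds members."""
    if len(dissolved) * s != len(remaining) * ds:
        return False
    free = {t: ds for t in remaining}                  # open slots per remaining team
    movers = [t for t in dissolved for _ in range(s)]  # origin team of each member to move

    def place(i):
        if i == len(movers):
            return True  # slot and member counts match, so every slot is filled
        for dst in remaining:
            if free[dst] > 0 and frozenset((movers[i], dst)) not in conflicts:
                free[dst] -= 1
                if place(i + 1):
                    return True
                free[dst] += 1
        return False

    return place(0)


def find_dissolution(teams, conflicts, s, ds):
    """Try all non-empty proper subsets of the teams as the set of dissolved teams."""
    for k in range(1, len(teams)):
        for dissolved in combinations(teams, k):
            remaining = [t for t in teams if t not in dissolved]
            if can_redistribute(dissolved, remaining, conflicts, s, ds):
                return set(dissolved)
    return None


# Example 1.4: three two-person teams, three-person offices, teams 1 and 2 in conflict.
print(find_dissolution([1, 2, 3], {frozenset((1, 2))}, s=2, ds=1))  # {3}
```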


2 Basic Concepts and Notation

In this chapter, we give a brief introduction to concepts which play an important role in this thesis and we explain the notation employed throughout this thesis.

2.1 Voting and Approval Balloting

Voting scenarios arise whenever the preferences of different parties have to be aggregated to form a joint decision, for example in political elections, group decisions, web site rankings, or multiagent systems.

An election (A, V) consists of a set A of alternatives and a multiset V of votes. In classical elections, votes are linear orders on A. For example, the vote a ≻ b ≻ c expresses that alternative a is most preferred and alternative c is least preferred. The set of (different) linear orders in V is called the preference profile of (A, V).

We also consider approval elections (A, V) where the votes are subsets of the set A of alternatives. A subset of A is also called a ballot. The idea is that each voter’s ballot contains all alternatives the voter approves. Analogously to preference profiles, an approval profile is the set of different ballots in V. A voting procedure maps an election to the winner set, which is a subset of the set of all alternatives. For example, the voting procedure Plurality maps an election to the set of alternatives that are most preferred in the maximum number of votes. As another example, the voting procedure Approval Voting maps an approval election to the set of alternatives that are approved in the maximum number of votes. Intuitively, a voting procedure aims to find the “best alternative”; a winner set containing more than one alternative means that the procedure found several “equally good alternatives”.

An approval balloting procedure [KM12] is a function which maps an approval election to the multi-winner set, which is a set of ballots, that is, a family of subsets of A. For example, the approval balloting procedure Ballot Plurality maps an election to the set of ballots which occur most frequently in V. Intuitively, an approval balloting procedure aims to find the “best ballot”; a multi-winner set containing more than one ballot means that the procedure found several “equally good ballots”.

To guard against misunderstandings concerning Approval Voting and approval balloting, we explain the difference between the voting procedure Approval Voting and the approval balloting procedure Ballot Plurality in the following example.

Example 2.1. Given is an approval election (A, V) as follows. The set of alternatives is {1, 2, 3, 4, 5}. The multiset of votes V contains {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}, and twice the ballot {4, 5}. The winner set with respect to Approval Voting is {1, 2, 3}, because each of these alternatives is approved three times and no alternative is approved four times. The multi-winner set with respect to Ballot Plurality is {{4, 5}}, because {4, 5} occurs twice while all other ballots occur once.
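Both procedures from Example 2.1 take only a few lines of Python. This is a hedged sketch: ties are kept in the (multi-)winner set as described above, and the function names are our own, not taken from [KM12].

```python
from collections import Counter

# Example 2.1: alternatives {1, ..., 5}; the ballot {4, 5} is cast twice.
VOTES = [{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}, {4, 5}, {4, 5}]


def approval_voting(votes):
    """Winner set: all alternatives approved in the maximum number of votes."""
    score = Counter(a for ballot in votes for a in ballot)
    best = max(score.values())
    return {a for a, sc in score.items() if sc == best}


def ballot_plurality(votes):
    """Multi-winner set: all ballots that occur most frequently."""
    count = Counter(frozenset(ballot) for ballot in votes)
    best = max(count.values())
    return {b for b, c in count.items() if c == best}


print(approval_voting(VOTES))   # {1, 2, 3}
print(ballot_plurality(VOTES))  # {frozenset({4, 5})}
```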

Note that, depending on the application, the alternatives are also called proposals, issues, or candidates.

Sometimes one considers elections with restricted domains of preference (respectively approval) profiles where not all votes may occur together. For example, we say a preference profile is single-peaked [Bla48] if there is a linear order $>_L$ over the alternatives such that

$$(a >_L b >_L c) \lor (c >_L b >_L a) \implies \text{in each vote: } (a \succ b) \rightarrow (b \succ c).$$

Analogously, Faliszewski et al. [Fal+11] introduced the concept of single-peaked approval profiles as follows. We say an approval profile is single-peaked if there is a linear order $>_L$ over the alternatives such that

$$a >_L b >_L c \implies \text{in each vote } A_i \subseteq A\colon (\{a, c\} \subseteq A_i) \rightarrow (b \in A_i).$$
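For a fixed order $>_L$, the condition for approval profiles says that every ballot forms a contiguous block of consecutive alternatives. The sketch below checks this for a given order and, purely for illustration, tries all orders by brute force, which is exponential in |A|. The function names are hypothetical.

```python
from itertools import permutations


def single_peaked_wrt(order, ballots):
    """For a fixed order: every ballot must occupy a contiguous block of positions,
    i.e., if a and c are approved, so is every b between them."""
    pos = {a: i for i, a in enumerate(order)}
    for ballot in ballots:
        if ballot:
            idx = [pos[a] for a in ballot]
            if max(idx) - min(idx) + 1 != len(ballot):
                return False
    return True


def find_single_peaked_order(alternatives, ballots):
    """Brute force over all orders; only feasible for a handful of alternatives."""
    for order in permutations(alternatives):
        if single_peaked_wrt(order, ballots):
            return order
    return None


VOTES = [{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}, {4, 5}, {4, 5}]  # Example 2.1
print(find_single_peaked_order([1, 2, 3, 4, 5], VOTES))        # None
print(find_single_peaked_order(["a", "b", "c"], [{"a", "c"}, {"b"}]))
```

For the ballots of Example 2.1 no suitable order exists, because {1, 2}, {1, 3}, and {2, 3} would all have to be contiguous.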

2.2 Combinatorial Data Anonymization

Assume that data about n individuals is represented by length-m vectors consisting of attribute values. In other words, we are given an n × m data matrix with entries from some (potentially large) alphabet Σ. We distinguish between four categories of attributes: identifying attributes, quasi-identifying attributes, sensitive attributes, and non-sensitive attributes. Identifying attributes (such as names) are those attributes which directly allow one to identify the “owner” of the data row. Quasi-identifying attributes (such as gender or date of birth) contain publicly known information about the owner of the data row which may indirectly allow for identification. For example, Sweeney [Swe00] demonstrated that using a combination of the quasi-identifying attributes “gender, ZIP code, and birth date”, one can identify 87 % of the US population. Sensitive attributes (such as disease) are those attributes that contain private information that, on the one hand, is most interesting for people using the data (for instance, in medical studies) and, on the other hand, needs to be protected from being linked to the owner of the data row. All remaining attributes are called non-sensitive.

Clearly, to achieve anonymity and to protect sensitive information, one has to remove the identifying attributes. The owner of the data row can still be identified by the quasi-identifying attributes, but one cannot simply remove all quasi-identifying attributes since they may contain important information for the user of the data. Hence, to achieve anonymity, other strategies have been considered for the quasi-identifying attributes.

Samarati and Sweeney [SS98], Samarati [Sam01], and Sweeney [Swe02b] devised the notion of k-anonymity to quantify the degree of anonymity in sanitized data. This notion formalizes the intuition that entities that have the same values in all quasi-identifying attributes cannot be distinguished from one another. For a positive integer k, we say that a matrix M is k-anonymous if, for each row r in M, there are at least k − 1 other rows in M which are identical to r. Thus, k-anonymity provides a clear and simple combinatorial model for sanitizing data: choose a value of k which would satisfy the relevant privacy requirements, and then try to modify, “at minimum cost”, the data matrix restricted to quasi-identifying attributes in such a way that it becomes k-anonymous. The corresponding decision problem k-ANONYMITY asks, additionally given an upper bound s on the number of suppressions allowed, whether a matrix can be made k-anonymous by suppressing (blanking out) at most s entries.
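A minimal sketch of the k-anonymity test and of suppression follows. It assumes that suppressed entries are written as the symbol `*` and that two suppressed entries count as identical; the example data are made up, and the function names are hypothetical.

```python
from collections import Counter

SUPPRESSED = "*"


def is_k_anonymous(rows, k):
    """Every row must be identical to at least k - 1 other rows."""
    return all(c >= k for c in Counter(map(tuple, rows)).values())


def suppress(rows, cells):
    """Blank out the given (row, column) cells; return the new matrix and the
    number of suppressed entries."""
    out = [list(r) for r in rows]
    for i, j in cells:
        out[i][j] = SUPPRESSED
    return out, len(set(cells))


# Made-up quasi-identifiers (gender, ZIP code prefix) of four individuals.
rows = [("f", "101"), ("f", "102"), ("m", "101"), ("m", "102")]
print(is_k_anonymous(rows, 2))                       # False
sanitized, cost = suppress(rows, [(i, 1) for i in range(4)])
print(is_k_anonymous(sanitized, 2), cost)            # True 4
```

Here suppressing the ZIP column (four entries) yields a 2-anonymous matrix, so this instance of k-ANONYMITY with k = 2 is a yes-instance for every bound s ≥ 4.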

One drawback of the k-anonymity concept is as follows. Assume that there is a set of at least k rows which are identical with respect to both the quasi-identifying attributes and the sensitive attributes. Then, an attacker cannot tell which of these rows belongs to a particular individual, but nevertheless learns the individual’s private information. To prevent this, different concepts were introduced, e.g., p-SENSITIVITY [TV06], ℓ-DIVERSITY [Mac+07], and t-CLOSENESS [LLV07].

The p-SENSITIVITY concept is an enhancement of k-ANONYMITY in the sense that there is the additional requirement that in the output matrix each maximal set of rows that are identical in the quasi-identifiers contains at least p different private values. The private value can be seen as the concatenated values of the sensitive attributes. The ℓ-DIVERSITY concept requires that each maximal set of rows that are identical in the quasi-identifiers contains at least ℓ “well represented” private values. A simple realization of “well represented” is, for example, to require that in each such set of rows the relative frequency of each private value is at most 1/ℓ [Won+06; XT06].
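The following sketch checks p-sensitivity (together with k-anonymity) and the frequency-based realization of ℓ-diversity described above. It assumes that the quasi-identifying and sensitive columns are given as index tuples and that the private value is the tuple of sensitive entries. The data are made up and mirror the drawback discussed above: the 2-anonymous group of rows with gender “f” reveals its private value.

```python
from collections import Counter, defaultdict


def private_values_by_class(rows, qi_cols, sensitive_cols):
    """Group the private values (concatenated sensitive entries) by the maximal
    sets of rows that agree on all quasi-identifiers."""
    classes = defaultdict(list)
    for r in rows:
        key = tuple(r[c] for c in qi_cols)
        classes[key].append(tuple(r[c] for c in sensitive_cols))
    return classes.values()


def is_p_sensitive_k_anonymous(rows, qi_cols, sensitive_cols, k, p):
    """Each class has at least k rows and at least p different private values."""
    return all(len(vals) >= k and len(set(vals)) >= p
               for vals in private_values_by_class(rows, qi_cols, sensitive_cols))


def is_frequency_l_diverse(rows, qi_cols, sensitive_cols, ell):
    """In each class, every private value has relative frequency at most 1/ell."""
    return all(max(Counter(vals).values()) * ell <= len(vals)
               for vals in private_values_by_class(rows, qi_cols, sensitive_cols))


# Made-up data: (gender, suppressed ZIP code, disease).
rows = [("f", "*", "flu"), ("f", "*", "flu"), ("m", "*", "cold"), ("m", "*", "flu")]
QI, SENS = (0, 1), (2,)
print(is_p_sensitive_k_anonymous(rows, QI, SENS, k=2, p=2))      # False
print(is_frequency_l_diverse(rows, QI, SENS, ell=2))             # False
print(is_p_sensitive_k_anonymous(rows[2:], QI, SENS, k=2, p=2))  # True
```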

2.3 Graphs

Unless stated otherwise, we consider simple, undirected graphs $G = (V, E)$, where $V$ is a set of $n$ vertices and $E \subseteq \binom{V}{2}$ is a set of $m$ edges. We use $\binom{V}{2}$ to denote the family of all size-two subsets of $V$. For a given graph $G$, we denote by $V(G)$ the set of vertices and by $E(G)$ the set of edges of $G$. For a subset $V' \subseteq V(G)$ of vertices and a subset $E' \subseteq E(G) \cap \binom{V'}{2}$ of edges, the graph $G' = (V', E')$ is called a subgraph of $G$. We also say $G$ contains $G'$. For a vertex subset $V' \subseteq V$, the induced subgraph $G[V']$ of $G$ is defined as $G[V'] := (V', E \cap \binom{V'}{2})$.

A path is a graph $P = (V, E)$ with vertex set $V = \{v_1, v_2, \dots, v_n\}$ and edge set $E = \{\{v_1, v_2\}, \{v_2, v_3\}, \dots, \{v_{n-1}, v_n\}\}$. The vertices $v_1$ and $v_n$ are the endpoints of the path $P$. We say two vertices $v$ and $v'$ in a graph $G$ are connected if $G$ contains a path with the endpoints $v$ and $v'$. A graph is connected if every two vertices are connected. The connected components of a graph are its maximal connected subgraphs. For a vertex $v \in V$, we denote by $N(v) := \{u \in V \mid \{u, v\} \in E\}$ the neighborhood of $v$, that is, all vertices that are connected to $v$ by an edge.

A t-star is a graph $K_{1,t} = (V, E)$ with vertex set $V = \{v_1, v_2, \dots, v_{t+1}\}$ and edge set $E = \{\{v_1, v_i\} \mid 2 \le i \le t + 1\}$. The vertex $v_1$ is called the center of the star. A t-star partition of $G$ is a partition $\{V_1, \dots, V_{\lfloor n/(t+1) \rfloor}\}$ of the vertex set $V$ into subsets of size $t + 1$ such that each $G[V_i]$ contains a t-star as a subgraph.
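Checking whether a given partition is a t-star partition is straightforward. The sketch below (hypothetical names, graph given as a vertex list and an edge list) tests for each block whether some vertex is adjacent to all other t vertices of the block. Note that for t = 1 a t-star is a single edge, so a 1-star partition is exactly a perfect matching.

```python
def is_t_star_partition(vertices, edges, parts, t):
    """Check that `parts` partitions `vertices` into blocks of size t + 1 and that
    each induced subgraph G[V_i] contains a t-star, i.e., some vertex of V_i is
    adjacent to all t other vertices of V_i."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    flat = [v for part in parts for v in part]
    if len(flat) != len(vertices) or set(flat) != set(vertices):
        return False  # not a partition of the vertex set
    for part in parts:
        block = set(part)
        if len(block) != t + 1:
            return False
        if not any(block - {c} <= adj[c] for c in block):
            return False  # no vertex of the block can serve as center
    return True


# The path 1-2-3-4-5-6.
PATH_V = [1, 2, 3, 4, 5, 6]
PATH_E = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(is_t_star_partition(PATH_V, PATH_E, [[1, 2, 3], [4, 5, 6]], 2))    # True
print(is_t_star_partition(PATH_V, PATH_E, [[1, 2, 4], [3, 5, 6]], 2))    # False
print(is_t_star_partition(PATH_V, PATH_E, [[1, 2], [3, 4], [5, 6]], 1))  # True
```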

2.4 Computational Complexity

Our aim is to provide a deeper understanding of the computational complexity of some NP-hard problems. To this end, we employ classical complexity classes such as P (polynomial time) and NP (nondeterministic polynomial time) [GJ79] as well as the class LOGSNP of limited nondeterminism (presumably lying between P and NP) [GLM96; PY96] and parameterized complexity classes such as FPT (fixed-parameter tractability), W[2] (second level of the “weft hierarchy” of presumable parameterized intractability), and XP [DF13; FG06; Nie06].

To study computational complexity, we formally introduce the concept of decision problems as follows. Let Σ be some finite alphabet and let Σ* denote the set of all finite words consisting of letters from Σ. A decision problem is a language L ⊆ Σ*. An instance of L is an element of Σ*. Intuitively, L encodes a set of instances which have some “common property”. For example, consider the toy problem ODD INTEGER, which can be formalized by the language L = {x ∈ {0, 1}* | x ends with 1}. Instead of this formal definition, we use an input-question notation as follows.

ODD INTEGER
Input: An integer x (encoded in binary).
Question: Is x odd?

In general, the concrete encoding of a problem may have an influence on its computational complexity. However, for all problems considered in this thesis we can assume a standard encoding for example by bit strings. Furthermore, we usually limit our computational complexity analysis to decision problems, but all our algorithms can be adapted to also construct the corresponding “solution”.

Finally, for an instance x ∈ Σ* and a problem L, we say x is a yes-instance of L if x ∈ L; otherwise, we say x is a no-instance of L.

P vs. NP. In classical computational complexity theory, to distinguish between tractability and intractability one uses the complexity classes P and NP. Herein, P contains all problems that can be decided in polynomial time by a deterministic Turing machine and NP contains all problems that can be decided in polynomial time by a nondeterministic Turing machine. It is widely believed that NP contains problems which are not in P. Intuitively, NP contains problems which are harder to decide than any problem in P. To identify the “hardest” problems in NP, we use the concept of polynomial-time many-one reductions as follows.

Definition 2.1. Let L and L′ be two decision problems. Then, a function $f\colon \Sigma^* \to \Sigma^*$ is called a polynomial-time many-one reduction (or simply polynomial-time reduction) from L to L′ if for each instance $x \in \Sigma^*$

• $x' := f(x)$ can be computed in polynomial time, and

• $x \in L \iff x' \in L'$.

This can be read as “L′ is at least as hard as L up to polynomial factors”. A problem L is NP-hard if for each problem L′ from NP there is a polynomial-time reduction from L′ to L. We say an NP-hard problem is NP-complete if it is also in NP. Hence, with respect to polynomial-time reductions, NP-complete problems are the “hardest problems in NP”.

LOGSNP. Papadimitriou and Yannakakis [PY96] introduced LOGSNP to precisely characterize the computational complexity of certain problems in NP that are neither known to be NP-complete nor known to be solvable in polynomial time. LOGSNP is a subclass of problems in NP which can be decided in polynomial time with an initial phase of O(log² N) nondeterministic steps, where N is the overall input size. LOGSNP does not include all problems decidable in polynomial time after O(log² N) nondeterministic steps, since it puts additional restrictions on the computation. We omit the formal definition of LOGSNP, because it is quite technical and not needed in our proofs.

It is widely believed that LOGSNP is properly intermediate between P and NP. Problems complete for LOGSNP under polynomial-time reductions include RICH HYPERGRAPH COVER (see Subsection 6.2.2 for the definition) and LOGADJUSTMENT [PY96]. In LOGADJUSTMENT, a Boolean expression in conjunctive normal form with r variables and a truth assignment T are given, and the question is whether there is a satisfying truth assignment whose Hamming distance from T is at most log r.
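To illustrate why LOGADJUSTMENT fits the limited-nondeterminism view, the sketch below deterministically tries every set of at most ⌊log₂ r⌋ variables to flip in T. It assumes that log means log₂ and encodes clauses DIMACS-style as lists of signed integers; the names are hypothetical. A nondeterministic machine would instead guess the at most log r flipped variables, each with O(log r) bits, that is, O(log² r) bits in total. The deterministic simulation enumerates r^{O(log r)} candidates, which is quasi-polynomial but not polynomial.

```python
from itertools import combinations


def satisfies(cnf, assignment):
    """cnf: list of clauses, each a list of non-zero ints (i = variable i,
    -i = its negation); assignment: dict mapping variable -> bool."""
    return all(any((lit > 0) == assignment[abs(lit)] for lit in clause) for clause in cnf)


def log_adjustment(cnf, r, T):
    """Search for a satisfying assignment within Hamming distance floor(log2 r) of T."""
    radius = r.bit_length() - 1  # floor(log2 r) for r >= 1
    for d in range(radius + 1):
        for flips in combinations(range(1, r + 1), d):
            candidate = dict(T)
            for x in flips:
                candidate[x] = not candidate[x]
            if satisfies(cnf, candidate):
                return candidate
    return None


T = {i: False for i in range(1, 5)}           # r = 4 variables, radius 2
print(log_adjustment([[1], [2]], 4, T))       # {1: True, 2: True, 3: False, 4: False}
print(log_adjustment([[1], [2], [3]], 4, T))  # None (three flips would be needed)
```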

To identify LOGSNP-complete problems, we use polynomial-time reductions from (showing hardness) or to (showing membership) known LOGSNP-complete problems.

We mention in passing that alternative characterizations of LOGSNP exist [Cai+97; FG06, Sec. 15.2].

Fixed-Parameter Tractability and XP. The concept of parameterized complexity was pioneered by Downey and Fellows [DF13] (see also further textbooks [FG06; Nie06]). The fundamental goal is to find out whether the seemingly unavoidable combinatorial explosion occurring in algorithms to solve NP-hard problems can be confined to certain problem-specific parameters. If such a parameter assumes only small values in applications, then an algorithm with a running time that is exponential exclusively with respect to this parameter may be efficient. Formally, a parameterized problem is a language L ⊆ Σ* × Σ*, and the second component is called the parameter. For the sake of convenience, we assume that a parameter is a non-negative integer and that a combined parameter (which is a vector of parameters) can simply be seen as the sum of its components.

Definition 2.2. A parameterized problem L is fixed-parameter tractable if there is a deterministic algorithm which, for every instance (x, p), decides whether (x, p) ∈ L in $f(p) \cdot |x|^{O(1)}$ time, where f is a computable function solely depending on p. Equivalently, we say L is contained in the parameterized complexity class FPT.

If the problem can be solved in polynomial running time where the degree of the polynomial depends on p (such as $|x|^{f(p)}$), then, for parameter p, the problem is said to lie in the parameterized complexity class XP, which is strictly larger than FPT [DF13]. Note that containment in XP ensures polynomial-time solvability for a constant parameter p, whereas FPT additionally ensures that the degree of the corresponding polynomial is independent of the parameter p.
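As a standard textbook illustration of Definition 2.2 (VERTEX COVER parameterized by the solution size k is not one of the problems studied in this thesis), the following bounded search tree runs in O(2^k · |E|) time, so the exponential part depends only on k. The function name is hypothetical.

```python
def vertex_cover(edges, k):
    """Bounded search tree: return a vertex cover of size at most k, or None.
    Some endpoint of any uncovered edge {u, v} must be in the cover, so we branch
    on u and on v; the recursion depth is at most k (at most 2^k leaves)."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    for w in (u, v):
        rest = [e for e in edges if w not in e]  # edges not yet covered by w
        cover = vertex_cover(rest, k - 1)
        if cover is not None:
            return cover | {w}
    return None


TRIANGLE = [(1, 2), (2, 3), (1, 3)]
print(vertex_cover(TRIANGLE, 1))  # None
print(vertex_cover(TRIANGLE, 2))  # {1, 2}
```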

Kernelization. A common way of showing fixed-parameter tractability is through kernelization. A kernelization algorithm takes as input a problem instance x together with a parameter p and transforms it in polynomial time into an instance x′ with parameter p′ such that (x, p) is a yes-instance if and only if (x′, p′) is a yes-instance, and there is a function f such that p′ ≤ f(p) and |x′| ≤ f(p). The function f measures the size of the (problem) kernel (x′, p′). A problem kernel is said to be a polynomial kernel if f is polynomially bounded. Note that it is well known that a decidable problem is fixed-parameter tractable with respect to a parameter if and only if it admits a problem kernel [Cai+97]. The corresponding kernels, however, may have exponential size, and it is of particular interest to determine which problems, with respect to which parameter(s), allow for polynomial-size problem kernels [Bod09; GN07; Kra14].
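Continuing the same textbook example, VERTEX COVER parameterized by k admits a kernel with at most k² edges via Buss' high-degree rule. The sketch below assumes the graph is given as an edge list, so isolated vertices are dropped implicitly; the function name is hypothetical.

```python
from collections import Counter


def buss_kernel(edges, k):
    """A vertex of degree > k must be in every vertex cover of size <= k, so take it
    and decrease k. Afterwards every vertex has degree <= k, so k cover vertices
    cover at most k*k edges. Returns (kernel_edges, k', forced_vertices), or None
    if the instance is recognized as a no-instance."""
    edges = {frozenset(e) for e in edges}
    forced = set()
    while True:
        if k < 0:
            return None
        degree = Counter(v for e in edges for v in e)
        high = next((v for v, d in degree.items() if d > k), None)
        if high is None:
            break
        forced.add(high)
        edges = {e for e in edges if high not in e}
        k -= 1
    if len(edges) > k * k:
        return None
    return edges, k, forced


print(buss_kernel([(0, i) for i in range(1, 6)], 1))  # (set(), 0, {0})
print(buss_kernel([(1, 2), (3, 4), (5, 6)], 1))       # None
```

The remaining instance has at most k² edges and hence at most 2k² non-isolated vertices, so this is a polynomial kernel in the sense defined above.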

Bodlaender, Thomassé, and Yeo [BTY11] introduced a refined concept of reductions that allows one to transfer “non-existence results” for polynomial-size problem kernels to other problems. It is defined as follows.

Definition 2.3. Let L and L′ be two parameterized problems. Then, a function $f\colon \Sigma^* \times \Sigma^* \to \Sigma^* \times \Sigma^*$ is called a polynomial time and parameter transformation from L to L′ if there is some polynomial g such that for each instance $(x, p) \in \Sigma^* \times \Sigma^*$

• $(x', p') := f(x, p)$ can be computed in polynomial time,

• $(x, p) \in L \iff (x', p') \in L'$, and

• $p' \le g(p)$.

The difference between a polynomial time and parameter transformation and a classical many-one reduction is that the parameter in the instance one reduces to has to be bounded by some polynomial solely depending on the parameter in the problem instance one reduces from.

Bodlaender, Thomassé, and Yeo [BTY11] showed that, if there are two parameterized problems L and L′ such that the unparameterized versions of L and L′ are NP-complete and there is a polynomial time and parameter transformation from L to L′, then a polynomial problem kernel for L′ implies a polynomial problem kernel for L. Equivalently, if L does not admit a polynomial problem kernel, then neither does L′.

Parameterized Intractability. Downey and Fellows [DF13] introduced a framework of parameterized intractability. Herein, the central tool is the W-hierarchy consisting of the following classes and interrelations:

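$$\mathrm{FPT} \subseteq \mathrm{W[1]} \subseteq \mathrm{W[2]} \subseteq \dots \subseteq \mathrm{W}[t] \subseteq \dots \subseteq \mathrm{XP}.$$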

To show W[t]-hardness for any positive integer t, we use the concept of a parameterized reduction, which is defined as follows.

Definition 2.4. Let L and L′ be two parameterized problems. Then, a function $f\colon \Sigma^* \times \Sigma^* \to \Sigma^* \times \Sigma^*$ is called a parameterized reduction from L to L′ if there are two computable functions $g_1$ and $g_2$ such that for each instance $(x, p) \in \Sigma^* \times \Sigma^*$

• $(x', p') := f(x, p)$ can be computed in $g_1(p) \cdot |x|^{O(1)}$ time,

• $(x, p) \in L \iff (x', p') \in L'$, and

• $p' \le g_2(p)$.

The difference between a parameterized reduction and a polynomial time and parameter transformation is that the former is allowed to take FPT-time and that the parameter in the instance one reduces to may be bounded by any computable function (instead of a polynomial) solely depending on the parameter in the problem instance one reduces from.

A problem L′ is W[t]-hard if for every problem L in W[t] there is a parameterized reduction from L to L′. Typically, to show that some problem L′ is W[t]-hard, we start from some known W[t]-hard problem L and reduce L to L′. It is widely believed that FPT ≠ W[1], that is, that W[1]-hard problems are not fixed-parameter tractable [DF13].

W[1]-complete problems include INDEPENDENT SET parameterized by the solution size, while W[2]-complete problems include SET COVER parameterized by the solution size [DF13].
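Brute force places INDEPENDENT SET in XP: trying all size-k vertex subsets takes O(n^k · k²) time, which is polynomial for every constant k, but the degree of the polynomial grows with k. Assuming FPT ≠ W[1], no algorithm with running time f(k) · n^{O(1)} exists for this W[1]-complete problem. The sketch uses a hypothetical function name.

```python
from itertools import combinations


def independent_set(vertices, edges, k):
    """Try all size-k vertex subsets: O(n^k * k^2) time for n vertices, i.e.,
    polynomial for every fixed k, but with an exponent that depends on k."""
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(vertices, k):
        if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2)):
            return set(subset)
    return None


print(independent_set([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], 2))  # {1, 3}
print(independent_set([1, 2, 3], [(1, 2), (2, 3), (1, 3)], 2))     # None
```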

If a problem is shown to be NP-hard even if the parameter p is a constant, then it cannot be contained in XP, and hence also not in FPT, unless P = NP [DF13].

2.5 Algorithmic Techniques

In the remainder of this chapter, we introduce some important notations and algorithmic techniques used to develop our algorithms.

Flow Networks. Flow networks are a powerful technique to design poly-nomial time algorithms [AMO93] which we employ several times in this thesis. We use the following notation.

A flow network $I^*$ consists of a directed graph $G^* = (V^*, A^*)$, where $V^*$ is the set of nodes and $A^*$ is a set of arcs, an arc capacity function $c^*\colon A^* \to \mathbb{R}^+$, and two distinguished nodes $\sigma, \tau \in V^*$ denoted as the source and the target of the network. An arc is an ordered pair of nodes from $V^*$, and $\mathbb{R}^+$ is the set of non-negative real numbers.

A $(\sigma,\tau)$-flow $f\colon A^* \to \mathbb{R}^+$ is an arc value function with $f(u, v) \ge 0$ for all $(u, v) \in A^*$ such that

• the capacity constraint is fulfilled, i.e., $\forall (u, v) \in A^*\colon f(u, v) \le c^*(u, v)$, and

• the conservation property is satisfied, i.e.,
$$\forall u \in V^* \setminus \{\sigma,\tau\}\colon \sum_{(u,v) \in A^*} f(u, v) = \sum_{(v,u) \in A^*} f(v, u).$$
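The two constraints can be made concrete with a small Python checker. This is a minimal sketch under assumed representations: a network is a dictionary mapping each arc (u, v) to its capacity $c^*(u, v)$, and a flow is a dictionary from arcs to values. These representations and the function names are hypothetical, not the thesis's notation.

```python
def is_valid_flow(capacity, flow, source, target):
    """Check the capacity constraint and the conservation property.

    Arcs missing from `flow` carry zero flow. Conservation is tested with
    exact equality, which assumes integer (exactly representable) values.
    """
    # Capacity constraint: each flow arc exists and 0 <= f(u, v) <= c*(u, v).
    for arc, value in flow.items():
        if arc not in capacity or value < 0 or value > capacity[arc]:
            return False
    nodes = {u for u, _ in capacity} | {v for _, v in capacity}
    # Conservation: outflow equals inflow at every node except source and target.
    for u in nodes - {source, target}:
        outflow = sum(value for (a, _), value in flow.items() if a == u)
        inflow = sum(value for (_, b), value in flow.items() if b == u)
        if outflow != inflow:
            return False
    return True


def flow_value(flow, source):
    """Value of the flow: total flow on the arcs leaving the source."""
    return sum(value for (u, _), value in flow.items() if u == source)
```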

We call $f$ integer if all its values are integers. The value of $f$ is defined as $\sum_{(\sigma,u) \in A^*} f(\sigma, u)$. A flow of maximum value can be computed in polynomial time. Moreover, if all capacities are integers, then the maximum value is an integer and is attained by an integer flow [AMO93].
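The polynomial-time computation of a maximum flow can be illustrated with the Edmonds-Karp algorithm. It is one standard choice among several, and the text does not say which algorithm it relies on. The sketch below assumes the same hypothetical arc-dictionary representation and runs in $O(|V^*| \cdot |A^*|^2)$ time. With integer capacities, every augmentation pushes an integer amount, which matches the integrality statement above.

```python
from collections import deque


def max_flow_value(capacity, source, target):
    """Edmonds-Karp: augment along shortest residual paths found by BFS.

    `capacity` maps arcs (u, v) to non-negative integers; returns the
    maximum (sigma, tau)-flow value.
    """
    if source == target:
        raise ValueError("source and target must differ")
    # residual[u][v] = remaining capacity on (u, v); reverse arcs start at 0.
    residual = {source: {}, target: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c
    total = 0
    while True:
        # BFS from the source in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return total  # no augmenting path left, so the flow is maximum
        # Bottleneck: smallest residual capacity on the BFS path.
        bottleneck = None
        v = target
        while parent[v] is not None:
            r = residual[parent[v]][v]
            bottleneck = r if bottleneck is None else min(bottleneck, r)
            v = parent[v]
        # Push the bottleneck along the path and update the reverse arcs.
        v = target
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```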

Note that we distinguish between vertices in graphs and nodes in flow networks.

Integer Linear Programming. To design fixed-parameter algorithms, integer linear programming has become a powerful tool due to a famous result by Lenstra [Len83], which was later improved by Frank and Tardos [FT87] and Kannan [Kan87]. We use this technique several times in this thesis.

Often, the key achievement of the mentioned results is summarized as “integer linear programming is fixed-parameter tractable with respect to the number of variables”. Formally, they considered the following decision problem.

INTEGER LINEAR FEASIBILITY

Input: An $n \times m$ matrix $A$ with integer elements and a length-$n$ integer vector $b$.

Question: Is there a length-$m$ integer vector $x$ such that $A \cdot x \le b$?
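The semantics of $A \cdot x \le b$ can be made concrete with a Python sketch that checks a candidate vector row by row and then exhaustively searches a box of integer vectors. This search is deliberately naive and is not Lenstra's algorithm. The box bound is a hypothetical parameter, so a negative answer only rules out solutions inside the box. The second test case shows why integrality matters: $2x \le 1$ and $-2x \le -1$ force $x = 1/2$, which is feasible over the reals but has no integer solution.

```python
from itertools import product


def satisfies(A, b, x):
    """Check A . x <= b row by row (A is n x m, b has length n, x has length m)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
        for row, b_i in zip(A, b)
    )


def find_integer_solution(A, b, bound):
    """Exhaustively search x in {-bound, ..., bound}^m (illustration only).

    Returns a feasible integer vector or None. The search takes
    (2 * bound + 1)^m steps, so it is only usable for tiny m and bound.
    """
    m = len(A[0]) if A else 0
    for x in product(range(-bound, bound + 1), repeat=m):
        if satisfies(A, b, x):
            return list(x)
    return None
```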

We interpret the entries of x as variables and the rows as constraints and use the standard syntax from linear programming.

Lenstra [Len83] showed that INTEGER LINEAR FEASIBILITY is fixed-parameter tractable with respect to the number $m$ of variables. Frank and Tardos [FT87] and Kannan [Kan87] improved the corresponding running-time bounds, ending up with the following theorem.

Theorem 2.1 ([FT87; Kan87; Len83]). INTEGER LINEAR FEASIBILITY can be solved using $O(m^{2.5m + o(m)} \cdot \ell)$ arithmetic operations, where $\ell$ is the number of bits in the input. The space requirement is polynomial in $\ell$.


To transfer this fixed-parameter tractability result to integer linear programming formulations, one has to take care of the additional objective function on $x$. In most cases, we can simply assume some upper or lower bound for the objective value (which is often explicitly given in the decision variant of the problem) and replace the objective function by an additional constraint. If only a bounded number $k$ of values are possible for the objective function, then one can even simulate the minimization or maximization process by iteratively decreasing or increasing the bound. This multiplies the running-time bound by an additional factor of $k$. We use this trick at several places in the thesis.
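The bound-iteration trick can be sketched generically in Python for minimization. The function assumes only a decision oracle, here called `is_feasible_with_bound` (a hypothetical name), which decides whether the program extended by the constraint "objective ≤ t" is feasible. In practice such an oracle could be an INTEGER LINEAR FEASIBILITY solver. The sketch relies on feasibility being monotone in t: it decreases the bound until the program becomes infeasible, which costs at most k oracle calls. The toy oracle uses brute force over a small box purely for illustration.

```python
from itertools import product


def minimize_via_feasibility(is_feasible_with_bound, candidate_values):
    """Find the minimum objective value using only a feasibility oracle.

    is_feasible_with_bound(t) answers whether the program with the extra
    constraint "objective <= t" is feasible. Since this answer is monotone
    in t, we lower the bound until it becomes infeasible; this takes at
    most k = len(candidate_values) oracle calls.
    """
    best = None
    for t in sorted(candidate_values, reverse=True):
        if not is_feasible_with_bound(t):
            break
        best = t
    return best


# Toy oracle (illustration only): minimize x1 + x2 subject to x1 + 2*x2 >= 4
# over integers 0..4, checked by brute force with the extra constraint x1 + x2 <= t.
def toy_oracle(t):
    return any(x1 + 2 * x2 >= 4 and x1 + x2 <= t
               for x1, x2 in product(range(5), repeat=2))


print(minimize_via_feasibility(toy_oracle, range(9)))  # prints 2
```

Maximization works symmetrically, by raising a lower bound on the objective until the program becomes infeasible.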


Part II


In this first of two main parts, we develop and discuss two models for team-building tasks: one for forming a single team out of a pool of many individuals and one for partitioning a pool of individuals into teams. We show how concepts from voting theory and data anonymization help to understand and to perform these tasks.

Selecting an Accepted Team

Scenario 1. Assume that the head of a large department wants to send a team of representatives to a prestigious congress. In doing so, the head has to ensure that several rivaling groups accept the team. One way is to determine the best representatives in a large open discussion. Unfortunately, experience tells us that such open discussions usually become very chaotic and do not end with a useful compromise. Hence, the head has to find a promising team that is likely to be accepted by everyone. From polls, the set of team members accepted by each rivaling group is known. Possibly, the head also has his or her own agenda that should be realized, that is, there is a set of favorites of which at least some should get into the team.

To perform this task, different preferences have to be aggregated into a joint decision. Thus, it seems natural to apply voting methods. Indeed, the individuals of the society can be interpreted as voters casting votes over the potential team members (interpreted as alternatives). More precisely, each voter approves or disapproves of each single alternative. In our model, the goal is to ensure that the voters accept the selected team. To this end, we assume that a voter accepts a selected team if he or she approves of a majority of the team members.

There are several related models in the context of approval-based multiwinner rules [Azi+14; Kil10; KM12], collective domination [ELS11], proportional representation in multiwinner elections [BSU13; CC83; Elk+14; LB11; Mon95; PB98; PRZ08; SFS13a], resource allocation [SFL15; SFS13b], and voting by committees [BGS93; BMN05; BSZ91]. In all cases, one aims to select certain alternatives that provide the “best representation” of the voters’ will. The central difference in our model, however, is that we aim for alternatives that are merely acceptable to all voters (or to a majority of them) and that additionally respect the agenda of the head.

In the context of approval balloting [Kil10; KM12], an approval balloting procedure using threshold functions which is very close to our scenario