4.1.1 The Basic Model

The attributes of the individuals are stored row-wise in an n×m matrix M over a finite alphabet Σ. The homogeneity constraints are expressed by a p×m matrix P over a binary alphabet {□, ?}, where p denotes the total number of allowed teams. That is, each team is represented by a pattern vector in {□, ?}^m, where □ means that homogeneity is required for the corresponding attribute and ? means that individuals in the group may have different values for the corresponding attribute. A mapping from input rows of M to pattern vectors of P is consistent if all rows that are mapped to the same pattern vector agree at the □-positions. One arrives at the following basic decision problem.

BASIC HOMOGENEOUS TEAM FORMATION

Input: A matrix M ∈ Σ^{n×m} and a homogeneity pattern P ∈ {□, ?}^{p×m}.
Question: Is there a consistent mapping ϕ from input rows of M to pattern vectors of P?
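The consistency condition can be made concrete with a small check. The following is only a minimal sketch, not the thesis's own algorithm: it assumes that rows and pattern vectors are given as tuples, that a mapping is a list `phi` with `phi[i]` being the index of the pattern vector assigned to row `i`, and that the homogeneity symbol is written "□". The name `is_consistent` is hypothetical.

```python
from collections import defaultdict

HOMOGENEOUS = "□"  # assumed symbol: all rows of the team must agree here
FREE = "?"         # rows of the team may differ here


def is_consistent(M, P, phi):
    """Check whether phi (phi[i] = pattern index of row i) is consistent."""
    if len(phi) != len(M):
        return False
    # Collect the rows that are mapped to each pattern vector.
    teams = defaultdict(list)
    for row, pattern_index in zip(M, phi):
        teams[pattern_index].append(row)
    # Rows sharing a pattern vector must agree at its homogeneous positions.
    for pattern_index, rows in teams.items():
        for j, symbol in enumerate(P[pattern_index]):
            if symbol == HOMOGENEOUS and len({row[j] for row in rows}) > 1:
                return False
    return True


M = [("a", "x"), ("a", "y"), ("b", "y")]
P = [(HOMOGENEOUS, FREE), (FREE, HOMOGENEOUS)]
print(is_consistent(M, P, [0, 0, 1]))  # True
print(is_consistent(M, P, [0, 0, 0]))  # False
```

The check only verifies a given mapping; deciding whether some consistent mapping exists is the actual decision problem.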

Example 4.1. Figure 4.1 depicts the assignment of students to project teams. Consider seven students who have to apply for implementation projects that are to be realized in teams. The corresponding professor provides two sorts of projects with at most two suitable supervisors each.

Projects of the first sort comprise two implementations for which knowledge of some high-level programming language and an LP-solver is required. To work together on such a project, the students must agree on the programming language as well as the LP-solver. Projects of the second sort consist of two different software implementations for a traffic monitoring system. The students are asked to test their implementations and to present their results in a collaborative talk. For testing in a real-world scenario, the students should live in the same city. Clearly, for realizing the implementation and the talk, they also have to agree on the programming language and the style of the slides. A solution respecting the given homogeneity pattern is given in the bottom table of Figure 4.1. Note that, for instance, there would be no solution if there was only one traffic monitoring project but three LP implementation projects.

4 Homogeneous Team Formation

Attributes of the students:

prog. language   LP-solver   location      slides style
C++              CPLEX       Berlin        LibreOffice
Java             CPLEX       Saarbrücken   LibreOffice
Haskell          Gurobi      Berlin        Latex Beamer
C++              CPLEX       Jena          Latex Beamer
C++              CPLEX       Saarbrücken   LibreOffice
Java             Gurobi      Saarbrücken   LibreOffice
Haskell          CPLEX       Berlin        Latex Beamer

Homogeneity patterns of the projects:

project                 prog. language   LP-solver   location   slides style
2× LP implementation    □                □           ?          ?
2× Traffic monitoring   □                ?           □          □

Homogeneous teams respecting the pattern matrix:

team     prog. language   LP-solver   location      slides style
Team 1   C++              CPLEX       ?             ?
         C++              CPLEX       ?             ?
         C++              CPLEX       ?             ?
Team 2   Java             ?           Saarbrücken   LibreOffice
         Java             ?           Saarbrücken   LibreOffice
Team 3   Haskell          ?           Berlin        Latex Beamer
         Haskell          ?           Berlin        Latex Beamer

Figure 4.1: Example assignment of students to project teams.
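The instance of Figure 4.1 is small enough to be explored by exhaustive search. The following sketch encodes the seven students and the four pattern vectors (two per project sort) and tries all mappings; it is an illustration under assumptions, not the method developed in this chapter. The homogeneity symbol is assumed to be "□", the student order follows the figure's top table, and the names `consistent`, `find_consistent_mapping`, and `figure_assignment` are hypothetical. The search models only the basic problem, so it imposes no bounds on team sizes.

```python
from itertools import product

SQUARE, STAR = "□", "?"  # assumed symbols: "must agree" / "may differ"

# Rows: (prog. language, LP-solver, location, slides style)
students = [
    ("C++", "CPLEX", "Berlin", "LibreOffice"),
    ("Java", "CPLEX", "Saarbrücken", "LibreOffice"),
    ("Haskell", "Gurobi", "Berlin", "Latex Beamer"),
    ("C++", "CPLEX", "Jena", "Latex Beamer"),
    ("C++", "CPLEX", "Saarbrücken", "LibreOffice"),
    ("Java", "Gurobi", "Saarbrücken", "LibreOffice"),
    ("Haskell", "CPLEX", "Berlin", "Latex Beamer"),
]
LP = (SQUARE, SQUARE, STAR, STAR)
TRAFFIC = (SQUARE, STAR, SQUARE, SQUARE)
patterns = [LP, LP, TRAFFIC, TRAFFIC]


def consistent(M, P, phi):
    """True if rows sharing a pattern vector agree at its SQUARE positions."""
    required = {}  # (pattern index, column) -> value fixed by the first row
    for row, t in zip(M, phi):
        for j, symbol in enumerate(P[t]):
            if symbol == SQUARE and required.setdefault((t, j), row[j]) != row[j]:
                return False
    return True


def find_consistent_mapping(M, P):
    """Try all len(P)**len(M) mappings; return one consistent mapping or None."""
    for phi in product(range(len(P)), repeat=len(M)):
        if consistent(M, P, phi):
            return phi
    return None


# Teams 1, 2, 3 of the figure use pattern vectors 0, 2, 3, respectively.
figure_assignment = (0, 2, 3, 0, 0, 2, 3)
print(consistent(students, patterns, figure_assignment))  # True
print(find_consistent_mapping(students, patterns) is not None)  # True
```

The enumeration is exponential in the number of rows and is meant only to make the definition tangible on toy instances.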

Starting from this basic problem variant, we also study more general versions. In particular, we allow the user to specify a lower and an upper bound for the size of each team. Furthermore, we extend the model such that the user may fix some costs for assigning an individual to a team and ask for solutions not exceeding some prespecified cost bound. A formal definition of the extended model follows in Subsection 4.1.2.

Relation to k-Anonymity and Related Work. An n×m matrix M over a fixed alphabet is said to be k-anonymous if for every row r in M there are at least k−1 further rows in M that are identical with r. The intuitive idea which motivates this notion for data privacy is as follows: Suppose each row in M contains data about a distinct person. Even if the table does not contain data (such as names or passport IDs) that is usually classified as "identifying information", it is possible, as has been remarkably illustrated using US Census data [Swe00], that rows can be associated with specific individuals by observing unique combinations of their attributes. If the matrix M is k-anonymous for a large-enough k, then, since there are no unique rows in M, one cannot associate a specific individual with one row of data in M [Sam01; SS98; Swe02b].

4.1 Motivation and Model
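The definition of k-anonymity translates directly into a counting check: every row must occur at least k times in the matrix. The following minimal sketch assumes that the matrix is given as a list of rows; the function name `is_k_anonymous` is hypothetical.

```python
from collections import Counter


def is_k_anonymous(M, k):
    """True if every row of M occurs at least k times in M."""
    counts = Counter(tuple(row) for row in M)
    return all(count >= k for count in counts.values())


M = [
    ("C++", "CPLEX"),
    ("C++", "CPLEX"),
    ("Java", "Gurobi"),
]
print(is_k_anonymous(M, 1))  # True
print(is_k_anonymous(M, 2))  # False: ("Java", "Gurobi") is unique
```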

The well-studied problem of making a matrix k-anonymous by suppressing a minimum number of entries, that is, by replacing a minimum number of matrix entries with the ?-symbol, is closely related to homogeneous team formation. Each group of at least k identical rows can be seen as a homogeneous team. Our full model can be seen as an extension of this concept (see Subsection 4.1.2). We also provide a cost measure similar to counting the number of suppressions and allow for specifying bounds on the team sizes similar to the degree k of anonymity. Additionally, we allow for specifying homogeneity patterns expressing which combinations of attributes have to be identical, thus incorporating user guidance.
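To illustrate the suppression view, the following sketch takes one group of rows and replaces every column in which the rows disagree with the ?-symbol, counting the suppressed entries. It is an illustration of the k-anonymity setting only, under the assumption that the input rows contain no ?-symbols themselves; in the team formation model the suppressed positions are prescribed by a pattern vector rather than derived from the disagreements. The name `suppress_to_common_row` is hypothetical.

```python
def suppress_to_common_row(rows, star="?"):
    """Make all rows of a group identical by suppressing disagreeing columns.

    Returns the suppressed rows and the number of suppressed entries.
    """
    columns = list(zip(*rows))
    # Keep a column's value if all rows agree on it, otherwise suppress it.
    template = tuple(col[0] if len(set(col)) == 1 else star for col in columns)
    suppressions = sum(
        1 for row in rows for new, old in zip(template, row) if new != old
    )
    return [template] * len(rows), suppressions


group = [
    ("C++", "CPLEX", "Berlin"),
    ("C++", "CPLEX", "Jena"),
]
print(suppress_to_common_row(group))
# ([('C++', 'CPLEX', '?'), ('C++', 'CPLEX', '?')], 2)
```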

For k ≥ 3, it is NP-hard to make a given matrix k-anonymous by suppressing a minimum number of entries [Bon+11; MW04]. However, homogeneity in the input as well as in the solution has a (positive) effect on the computational complexity of the problem [Bre+14e]. For example, the problem becomes fixed-parameter tractable for the parameter "number of different input rows" or for the parameter combination "number of different output row types" and "number of suppressions" [Bre+14e].

Our research is also related to the work of Aggarwal et al. [Agg+10], who proposed a new model of data anonymization based on clustering. While they develop several polynomial-time approximation algorithms, their basic modeling idea is, roughly, to cluster the rows of the input matrix and then to publish the "cluster centers"; importantly, it is required that each cluster contains at least k rows, and this corresponds to the k-anonymity concept.

In Chapter 5 we analyze how an adaptation of our team formation model can be used for anonymization purposes, including encouraging experimental findings.

We are not aware of any combinatorial models for homogeneous team formation in the literature. Next, we formally introduce the full model and the notation which we use in this chapter.
