Technical Report AC-TR-18-002

June 2018

Parameterized Algorithms for the Matrix Completion Problem

Robert Ganian, Iyad Kanj, Sebastian Ordyniak, Stefan Szeider

This is the authors’ copy of a paper that will appear in the proceedings of the 35th International Conference on Machine Learning (ICML’18), Stockholm, Sweden.

Proceedings of Machine Learning Research (PMLR) vol. 80, 2018 www.ac.tuwien.ac.at/tr


Parameterized Algorithms for the Matrix Completion Problem

Robert Ganian1 Iyad Kanj2 Sebastian Ordyniak3 Stefan Szeider1

Abstract

We consider two matrix completion problems, in which we are given a matrix with missing entries and the task is to complete the matrix in a way that (1) minimizes the rank, or (2) minimizes the number of distinct rows. We study the parameterized complexity of the two aforementioned problems with respect to several parameters of interest, including the minimum number of matrix rows, columns, and rows plus columns needed to cover all missing entries. We obtain new algorithmic results showing that, for the bounded domain case, both problems are fixed-parameter tractable with respect to all aforementioned parameters. We complement these results with a lower-bound result for the unbounded domain case that rules out fixed-parameter tractability w.r.t. some of the parameters under consideration.

1. Introduction

Problem Definition and Motivation. We consider the matrix completion problem, in which we are given a matrix M (over some field that we also refer to as the domain of the matrix) with missing entries, and the goal is to complete the entries of M so as to optimize a certain measure. There is a wealth of research on this fundamental problem (Candès & Plan, 2010; Candès & Recht, 2009; Candès & Tao, 2010; Elhamifar & Vidal, 2013; Hardt et al., 2014; Fazel, 2002; Keshavan et al., 2010a;b; Recht, 2011; Saunderson et al., 2016) due to its ubiquitous applications in recommender systems, machine learning, sensing, computer vision, data science, and predictive analytics, among others. In these areas, the matrix completion problem naturally arises after

*Equal contribution. 1Algorithms and Complexity Group, TU Wien, Vienna, Austria. 2School of Computing, DePaul University, Chicago, USA. 3Algorithms Group, University of Sheffield, Sheffield, UK. Correspondence to: Robert Ganian <rganian@gmail.com>, Iyad Kanj <ikanj@cdm.depaul.edu>, Sebastian Ordyniak <sordyniak@gmail.com>, Stefan Szeider <stefan@szeider.net>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

observing a sample from the set of entries of a low-rank matrix, and attempting to recover the missing entries with the goal of optimizing a certain measure. In this paper, we focus our study on matrix completion with respect to two measures (considered separately): (1) minimizing the rank of the completed matrix, and (2) minimizing the number of distinct rows of the completed matrix.

The first problem we consider—matrix completion w.r.t. rank minimization—has been extensively studied, and is often referred to as the low-rank matrix completion problem (Candès & Plan, 2010; Candès & Recht, 2009; Candès & Tao, 2010; Hardt et al., 2014; Fazel, 2002; Keshavan et al., 2010a;b; Recht, 2011; Saunderson et al., 2016). A celebrated application of this problem lies in the recommender systems area, where it is known as the Netflix problem (net). In this user-profiling application, an entry of the input matrix represents the rating of a movie by a user, where some entries could be missing. The goal is to predict the missing entries so that the rank of the complete matrix is minimized.

The low-rank matrix completion problem is known to be NP-hard, even when the matrix is over the field GF(2) (i.e., each entry is 0 or 1), and the goal is to complete the matrix into one of rank 3 (Peeters, 1996). A significant body of work on the low-rank matrix completion problem has centered around proving that, under some feasibility assumptions, the matrix completion problem can be solved efficiently with high probability (Candès & Recht, 2009; Recht, 2011). These feasibility assumptions are: (1) low rank; (2) incoherence; and (3) randomness (Hardt et al., 2014). Hardt et al. (2014) argue that feasibility assumption (3), which states that the subset of determined entries in the matrix is selected uniformly at random and has a large (sampling) density, is very demanding. In particular, they justify that in many applications, such as the Netflix problem, it is not possible to arbitrarily choose which matrix entries are determined and which are not, as those may be dictated by outside factors. The low-rank matrix completion problem also has other applications in the area of wireless sensor networks. In one such application, the goal is to reconstruct a low-dimensional geometry describing the locations of the sensors based on local distances sensed by each sensor; this problem is referred to as TRIANGULATION FROM INCOMPLETE DATA (Candès & Recht, 2009). Due to its inherent hardness, the low-rank matrix completion problem has also been studied with respect to various notions of approximation (Candès & Recht, 2009; Candès & Tao, 2010; Frieze et al., 2004; Hardt et al., 2014; Keshavan et al., 2010a;b; Recht, 2011).

The second problem we consider is the matrix completion problem w.r.t. minimizing the number of distinct rows. Although this problem has not received as much attention as low-rank matrix completion, it certainly warrants studying. In fact, minimizing the number of distinct rows represents a special case of the SPARSE SUBSPACE CLUSTERING problem (Elhamifar & Vidal, 2013), where the goal is to complete a matrix in such a way that its rows can be partitioned into the minimum number of subspaces. The problem we consider corresponds to the special case of SPARSE SUBSPACE CLUSTERING where the matrix is over GF(2) and the desired rank of each subspace is 1. Furthermore, one can see the relevance of this problem to the area of recommender systems; in this context, one seeks to complete the matrix in such a way that the profile of each user is identical to a member of a known (possibly small) group of users.

In this paper, we study the two aforementioned problems through the lens of parameterized complexity (Downey & Fellows, 2013). In this paradigm, one measures the complexity of problems not only in terms of their input size n but also by a certain parameter k ∈ N, and seeks—among other things—fixed-parameter algorithms, i.e., algorithms that run in time f(k)·n^{O(1)} for some function f. Problems admitting such algorithms are said to be fixed-parameter tractable (or contained in the parameterized complexity class FPT). The motivation is that the parameter of choice—usually describing some structural properties of the instance—can be small in some instances of interest, even when the input size is large. Therefore, by confining the combinatorial explosion to this parameter, one can obtain efficient algorithms for problem instances with a small parameter value for NP-hard problems. Problems that are not (or unlikely to be) fixed-parameter tractable can still be solvable in polynomial time for every fixed parameter value, i.e., they can be solved in time n^{f(k)} for some function f. Problems of this kind are contained in the parameterized complexity class XP.

We also consider randomized versions of FPT and XP, denoted by FPT_R and XP_R, containing all problems that can be solved by a randomized algorithm with a run-time of f(k)·n^{O(1)} and O(n^{f(k)}), respectively, with a constant one-sided error probability. Finally, problems that remain NP-hard for some fixed value of the parameter are hard for the parameterized complexity class paraNP. We refer to the respective textbooks for a detailed introduction to parameterized complexity (Downey & Fellows, 2013; Cygan et al., 2015). Parameterized complexity is a rapidly growing field with various applications in many areas of computer science, including artificial intelligence (Bäckström et al., 2015; van Bevern et al., 2016; Bessiere et al., 2008; Endriss et al., 2015; Ganian & Ordyniak, 2018; Gaspers & Szeider, 2014; Gottlob & Szeider, 2006).

Parameterizations. The parameters that we consider in this paper are: the number of (matrix) rows that cover all missing entries (row); the number of columns that cover all missing entries (col); and the minimum number of rows and columns which together cover all missing entries (comb).

Although we do discuss and provide results for the unbounded domain case, i.e., the case that the domain (field size) is part of the input, we focus on the case when the matrix is over a bounded domain: this case is the most relevant from a practical perspective, and most of the related works focus on this case. It is easy to see that, when stated over any bounded domain, both problems under consideration are in FPT when parameterized by the number of missing entries, since an algorithm can brute-force through all possible solutions. On the other hand, parameterizing by row (resp. col) is very interesting from a practical perspective, as rows (resp. columns) with missing entries represent the newly-added elements (e.g., newly-added users/movies/sensors, etc.); here, the above brute-force approach naturally fails, since the number of missing entries is no longer bounded by the parameter alone. Finally, the parameterization by comb is interesting because this parameter subsumes (i.e., is smaller than) the other two parameters (i.e., row and col). In particular, any fixed-parameter algorithm w.r.t. this parameter implies a fixed-parameter algorithm w.r.t. the other two parameters, but can also be efficient in cases where the number of rows or columns with missing entries is large.

Results and Techniques. We start in Section 3 by considering the BOUNDED RANK MATRIX COMPLETION problem over GF(p) (denoted p-RMC), in which the goal is to complete the missing entries in the input matrix so that the rank of the completed matrix is at most t, where t ∈ N is given as input. We present a (randomized) fixed-parameter algorithm for this problem parameterized by comb. This result is obtained by applying a branch-and-bound algorithm combined with algebraic techniques, allowing us to reduce the problem to a system of quadratic equations in which only few (bounded by some function of the parameter) equations contain non-linear terms. We then use a result by Miura et al. (2014) (improving an earlier result by Courtois et al. (2002)) in combination with reduction techniques to show that solving such a system of equations is in FPT_R parameterized by the number of equations containing non-linear terms. In the case where the domain is unbounded, we show that RMC is in XP parameterized by either row or col, and in XP_R parameterized by comb.

In Section 4, we turn our attention to the BOUNDED DISTINCT ROW MATRIX COMPLETION problem over both bounded domain (p-DRMC) and unbounded domain (DRMC); here, the goal is to complete the input matrix so that the number of distinct rows in the completed matrix is at most t. We start by showing that p-DRMC parameterized by comb is fixed-parameter tractable. We obtain this result as a special case of a more general result showing that both DRMC and p-DRMC are fixed-parameter tractable parameterized by the treewidth (Robertson & Seymour, 1986; Downey & Fellows, 2013) of the compatibility graph, i.e., the graph having one vertex for every row and an edge between two vertices if the associated rows can be made identical. This result also allows us to show that DRMC is fixed-parameter tractable parameterized by row. Surprisingly, DRMC behaves very differently when parameterized by col, as we show that, for this parameterization, the problem becomes paraNP-hard.

          row             col              comb
p-RMC     FPT (Th. 3)     FPT (Cor. 4)     FPT_R (Th. 7)
p-DRMC    FPT (Th. 12)    FPT (Th. 12)     FPT (Th. 12)
RMC       XP (Cor. 5)     XP (Cor. 5)      XP_R (Cor. 8)
DRMC      FPT (Th. 13)    paraNP (Th. 14)  paraNP (Th. 14)

Table 1. The parameterized complexity results obtained for the problems p-RMC and p-DRMC and their unbounded domain variants RMC and DRMC w.r.t. the parameters row, col, comb.

We chart our results in Table 1. Interestingly, in the unbounded domain case, both considered problems exhibit wildly different behaviors: while RMC admits XP algorithms regardless of whether we parameterize by row or col, using these two parameterizations for DRMC results in the problem being FPT and paraNP-hard, respectively.

On the other hand, in the (more studied) bounded domain case, we show that both problems are in FPT (resp. FPT_R) w.r.t. all parameters under consideration. Finally, we prove that 2-DRMC remains NP-hard even if every column and row contains (1) a bounded number of missing entries, or (2) a bounded number of determined entries. This effectively rules out FPT algorithms w.r.t. the parameters: maximum number of missing/determined entries per row or column.

2. Preliminaries

For a prime number p, let GF(p) be a field of order p; recall that each such field can be equivalently represented as the set of integers modulo p. For positive integers i and j > i, we write [i] for the set {1, 2, . . . , i}, and i : j for the set {i, i+1, . . . , j}.

For an m×n matrix M (i.e., a matrix with m rows and n columns), and for i ∈ [m] and j ∈ [n], M[i, j] denotes the element in the i-th row and j-th column of M. Similarly, for a vector d, we write d[i] for the i-th coordinate of d. We write M[∗, j] for the column-vector (M[1, j], M[2, j], . . . , M[m, j]), and M[i, ∗] for the row-vector (M[i, 1], M[i, 2], . . . , M[i, n]). We will also need to refer to submatrices obtained by omitting certain rows or columns from M. We do so by using sets of indices to specify which rows and columns the matrix contains. For instance, the matrix M[[i], ∗] is the matrix consisting of the first i rows and all columns of M, and M[2 : m, 1 : n−1] is the matrix obtained by omitting the first row and the last column from M.

The row-rank (resp. column-rank) of a matrix M is the maximum number of linearly independent rows (resp. columns) in M. It is well known that the row-rank of a matrix is equal to its column-rank, and this number is referred to as the rank of the matrix. We let rk(M) and dr(M) denote the rank and the number of distinct rows of a matrix M, respectively. If M is a matrix over GF(p), we call GF(p) the domain of M.

An incomplete matrix over GF(p) is a matrix which may contain not only elements from GF(p) but also the special symbol •. An entry is a missing entry if it contains •, and is a determined entry otherwise. A (possibly incomplete) m×n matrix M' is consistent with an m×n matrix M if and only if, for each i ∈ [m] and j ∈ [n], either M'[i, j] = M[i, j] or M'[i, j] = •.

2.1. Problem Formulation

We formally define the problems under consideration below.

BOUNDED RANK MATRIX COMPLETION (p-RMC)
Input: An incomplete matrix M over GF(p) for a fixed prime number p, and an integer t.
Task: Find a matrix M' consistent with M such that rk(M') ≤ t.

BOUNDED DISTINCT ROW MATRIX COMPLETION (p-DRMC)
Input: An incomplete matrix M over GF(p) for a fixed prime number p, and an integer t.
Task: Find a matrix M' consistent with M such that dr(M') ≤ t.

Aside from the problem variants where p is a fixed prime number, we also study the case where matrix entries range over a domain that is provided as part of the input. In particular, the problems RMC and DRMC are defined analogously to p-RMC and p-DRMC, respectively, with the sole distinction that the prime number p is provided as part of the input.

We note that 2-RMC is NP-hard even for t = 3 (Peeters, 1996), and the same holds for 2-DRMC (see Theorem 15).

Without loss of generality, we assume that the rows of the input matrix are pairwise distinct.
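To make the definitions above concrete, the following small sketch represents an incomplete matrix as a list of rows with None standing in for the symbol •, checks consistency of a completion, and evaluates the measure dr. The helper names (is_consistent, num_distinct_rows) are ours and serve only as an illustration of the definitions; they are not part of the algorithms of this paper.

# A minimal sketch of the problem data: an incomplete matrix is a list of rows,
# with None standing in for the missing-entry symbol "•".

def is_consistent(M_completed, M_incomplete):
    """Check that M_completed agrees with M_incomplete on every determined entry."""
    for row_c, row_i in zip(M_completed, M_incomplete):
        for a, b in zip(row_c, row_i):
            if b is not None and a != b:
                return False
    return True

def num_distinct_rows(M_completed):
    """The measure dr(M'): the number of distinct rows of a completed matrix."""
    return len({tuple(row) for row in M_completed})

# Tiny example over GF(2): one missing entry, completed so that dr(M') = 2.
M  = [[1, None], [1, 0], [0, 1]]
M1 = [[1, 0],    [1, 0], [0, 1]]
assert is_consistent(M1, M) and num_distinct_rows(M1) == 2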

2.2. Parameterized Complexity

The class FPT consists of all parameterized problems solvable in time f(k)·n^{O(1)}, where k is the parameter and n is the input size; we refer to such running time as FPT-time. The class XP contains all parameterized problems solvable in time O(n^{f(k)}). The following relations hold among the parameterized complexity classes: FPT ⊆ W[1] ⊆ XP.

The class paraNP is defined as the class of parameterized problems that are solvable by a non-deterministic Turing machine in FPT-time. In the paraNP-hardness proofs, we will make use of the following characterization of paraNP-hardness (Flum & Grohe, 2006): any parameterized problem that remains NP-hard when the parameter is a constant is paraNP-hard. For problems in NP, it holds that XP ⊆ paraNP; in particular, showing the paraNP-hardness of a problem rules out the existence of algorithms running in time O(n^{f(k)}) for the problem. We also consider randomized versions of FPT and XP, denoted by FPT_R and XP_R, containing all problems that can be solved by a randomized algorithm with a run-time of f(k)·n^{O(1)} and O(n^{f(k)}), respectively, with a constant one-sided error probability.

2.3. Treewidth

Treewidth (Robertson & Seymour, 1986) is one of the most prominent decompositional parameters for graphs and has found numerous applications in computer science. A tree-decomposition T of a graph G = (V, E) is a pair (T, χ), where T is a tree and χ is a function that assigns each tree node t a set χ(t) ⊆ V of vertices such that the following conditions hold:

(TD1) For every edge uv ∈ E(G) there is a tree node t such that u, v ∈ χ(t).

(TD2) For every vertex v ∈ V(G), the set of tree nodes t with v ∈ χ(t) forms a non-empty subtree of T.

The sets χ(t) are called bags of the decomposition T, and χ(t) is the bag associated with the tree node t. The width of a tree-decomposition (T, χ) is the size of a largest bag minus 1. A tree-decomposition of minimum width is called optimal. The treewidth of a graph G, denoted by tw(G), is the width of an optimal tree decomposition of G. For the presentation of our dynamic programming algorithms, it is convenient to consider tree decompositions in the following normal form (Kloks, 1994): a tuple (T, χ) is a nice tree decomposition of a graph G if (T, χ) is a tree decomposition of G, the tree T is rooted at node r, and each node of T is of one of the following four types:

1. a leaf node: a node t having no children and |χ(t)| = 1;

2. a join node: a node t having exactly two children t1, t2, and χ(t) = χ(t1) = χ(t2);

3. an introduce node: a node t having exactly one child t', and χ(t) = χ(t') ∪ {v} for a node v of G;

4. a forget node: a node t having exactly one child t', and χ(t) = χ(t') \ {v} for a node v of G.

For convenience we will also assume that χ(r) = ∅ for the root r of T. For t ∈ V(T) we denote by T_t the subtree of T rooted at t, and we write χ(T_t) for the set ⋃_{t' ∈ V(T_t)} χ(t').

Proposition 1 ((Kloks, 1994; Bodlaender, 1996; Bodlaender et al., 2016)). It is possible to compute an optimal (nice) tree-decomposition of an n-vertex graph G with treewidth k in time k^{O(k^3)}·n, and to compute a 5-approximate one in time 2^{O(k)}·n. Moreover, the number of nodes in the obtained tree decompositions is O(kn).

2.4. Problem Parameterizations

One advantage of the parameterized complexity paradigm is that it allows us to study the complexity of a problem w.r.t. several parameterizations of interest/relevance. To provide a concise description of the parameters under consideration, we introduce the following terminology: we say that a • entry at position [i, j] in an incomplete matrix M is covered by row i and by column j. In this paper, we study RMC and DRMC w.r.t. the following parameterizations (see Figure 1 for an illustration):

col: The minimum number of columns in the matrix M covering all occurrences of • in M.

row: The minimum number of rows in the matrix M covering all occurrences of • in M.

comb: The minimum value of r + c such that there exist r rows and c columns in M with the property that each occurrence of • is covered by one of these rows or columns.

( 1 1 1 0 • 1 )
( 0 0 1 0 • 1 )
( 0 • • 0 • • )
( 1 1 0 1 0 1 )

Figure 1. Illustration of the parameters col, row, and comb in an incomplete matrix. Here col = 4, row = 3, and comb = 2.

For instance, the aforementioned problem TRIANGULATION FROM INCOMPLETE DATA, where a small number of distance-sensors are faulty, would result in matrix completion instances where col and row are both small.

We denote the parameter under consideration in brackets after the problem name (e.g., DRMC[comb]). As mentioned in Section 1, both p-RMC and p-DRMC are trivially in FPT when parameterized by the number of missing entries, and hence this parameterization is not discussed further.

Given an incomplete matrix M, computing the parameter values for col and row is trivial. Furthermore, the parameter values satisfy comb ≤ row and comb ≤ col. We show that the parameter value for comb can also be computed in polynomial time.


Proposition 2. Given an incomplete matrix M over GF(p), we can compute the parameter value for comb, along with sets R and C of total cardinality comb containing the indices of covering rows and columns, respectively, in time O((n·m)^{1.5}).

Proof. We begin by constructing an auxiliary bipartite graph G from M as follows. For each row i containing a •, we create a vertex v_i in G; similarly, for each column j containing a •, we create a vertex w_j. For each • that occurs at position [i, j], we add an edge between v_i and w_j.

We observe that if R contains row indices and C contains column indices which together cover all occurrences of •, then X = {v_i | i ∈ R} ∪ {w_j | j ∈ C} is a vertex cover of G. Similarly, for each vertex cover X of G, we can obtain a set R = {i | v_i ∈ X} and a set C = {j | w_j ∈ X} such that R and C cover all occurrences of • in M. Hence there is a one-to-one correspondence between vertex covers in G and sets R and C which cover all • symbols in M, and in particular, the size of a minimum vertex cover in G is equal to comb. The proposition now follows by König's theorem and the Hopcroft–Karp algorithm, which allow us to compute a minimum vertex cover in a bipartite graph G in time O(|E(G)|·√|V(G)|).
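The reduction in this proof is easy to implement directly. The sketch below builds the bipartite graph (one vertex per row and per column containing a •, one edge per missing entry) and computes the size of a maximum matching, which by König's theorem equals comb. For brevity it uses Kuhn's augmenting-path algorithm rather than Hopcroft–Karp, so it returns the same value with a slightly worse running time; the function name and the None-for-• convention are ours.

# Sketch of Proposition 2: comb equals the size of a minimum vertex cover of the
# bipartite "rows with • vs. columns with •" graph, which by König's theorem
# equals the size of a maximum matching. Kuhn's algorithm is used for brevity.

def comb_value(M):
    rows_with_hole = sorted({i for i, row in enumerate(M)
                             for x in row if x is None})
    adj = {i: [] for i in rows_with_hole}          # row vertex -> column vertices
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            if x is None:
                adj[i].append(j)

    match_col = {}                                 # column j -> matched row i

    def try_augment(i, visited):
        for j in adj[i]:
            if j in visited:
                continue
            visited.add(j)
            if j not in match_col or try_augment(match_col[j], visited):
                match_col[j] = i
                return True
        return False

    matching = 0
    for i in rows_with_hole:
        if try_augment(i, set()):
            matching += 1
    return matching                                # = comb, by König's theorem

# The matrix of Figure 1 (None = •): comb is 2 (third row plus fifth column).
M = [[1, 1, 1, 0, None, 1],
     [0, 0, 1, 0, None, 1],
     [0, None, None, 0, None, None],
     [1, 1, 0, 1, 0, 1]]
assert comb_value(M) == 2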

3. Rank Minimization

In this section we present our results for BOUNDED RANK MATRIX COMPLETION under various parameterizations.

3.1. Bounded Domain: Parameterization by row

As our first result, we present an algorithm for solving p-RMC[row]. This will serve as a gentle introduction to the techniques used in the more complex result for p-RMC[comb], and will also be used to give an XP algorithm for RMC[row].

Theorem 3. p-RMC[row] is in FPT.

Proof. Let R be the (minimum) set of rows that cover all occurrences of • in the input matrix M. Since the existence of a solution does not change if we permute the rows of M, we permute the rows of M so that the rows in R have indices 1, . . . , k. We now proceed in three steps.

For the first step, we will define the notion of signature: a signature S is a tuple (I, D), where I ⊆ R and D is a mapping from R \ I to (I → GF(p)). Intuitively, a signature S specifies a subset I of R which is expected to be independent in M[k+1 : m, ∗] ∪ I (i.e., adding the rows in I to M[k+1 : m, ∗] is expected to increase the rank of M[k+1 : m, ∗] by |I|); and for each remaining row of R, S specifies how that row should depend on I. The latter is carried out using D: for each row in R \ I, D provides a set of coefficients expressing the dependency of that row on the rows in I. Formally, we say that a matrix M' that is compatible with the incomplete matrix M matches a signature (I, D) if and only if, for each row (i.e., vector) d ∈ R \ I, there exist coefficients a^d_{k+1}, . . . , a^d_m ∈ GF(p) such that d = a^d_{k+1}·M[k+1, ∗] + · · · + a^d_m·M[m, ∗] + Σ_{i∈I} D(d)(i)·i.

The first step of the algorithm branches through all possible signatures S. Clearly, the number of distinct signatures is upper-bounded by 2^k · p^{k^2}.

For the second step, we fix an enumerated signature S. The algorithm will verify whether S is valid, i.e., whether there exists a matrix M' compatible with M that matches S. To do so, the algorithm will construct a system of |R \ I| equations over vectors of size n, and then transform this into a system Υ_S of |R \ I|·n equations over GF(p) (one equation for each vector coordinate). For each d ∈ R \ I, Υ_S contains the following variables:

◦ one variable for each coefficient a^d_{k+1}, . . . , a^d_m, and

◦ one variable for each occurrence of • in the rows of R.

For instance, the first equation in Υ_S has the following form: d[1] = a^d_{k+1}·M[k+1, 1] + · · · + a^d_m·M[m, 1] + Σ_{i∈I} D(d)(i)·i[1], where a^d_{k+1}, . . . , a^d_m are variables, and d[1] as well as each i[1] in the sum could be a variable or a fixed number. Crucially, Υ_S is a system of at most (k·n) linear equations over GF(p) with at most m + kn variables, and can be solved in time O((m + kn)^3) by Gaussian elimination. Constructing the equations takes time O(m·n).

During the second step, the algorithm determines whether a signature S is valid or not, and in the end, after going through all signatures, selects an arbitrary valid signature S = (I, D) with minimum |I|. For the final third step, the algorithm checks whether |I| + rk(M[k+1 : m, ∗]) ≤ t. We note that computing rk(M[k+1 : m, ∗]) can be carried out in time O(nm^{1.4}) (Ibarra et al., 1982). If the above inequality does not hold, the algorithm rejects; otherwise it recomputes a solution to Υ_S and outputs the matrix M' obtained from M by replacing each occurrence of • at position [i, j] by the value of the variable i[j] in the solution to Υ_S. The total running time is O((2^k·p^{k^2})·((m + kn)^3 + nm^{1.4})) = O(2^k·p^{k^2}·(m + kn)^3).
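Both steps above rely on elementary linear algebra over GF(p): Gaussian elimination solves the linear system Υ_S and also yields ranks such as rk(M[k+1 : m, ∗]). As an illustration of this building block, here is a plain rank routine over GF(p); it is our own sketch (the name rank_gfp is ours) and not the faster rank algorithm of Ibarra et al. cited above.

# Illustration of the linear-algebra subroutine used above: rank of a matrix
# over GF(p) by plain Gaussian elimination. This is only a sketch; the paper
# invokes faster methods for the rank computation.

def rank_gfp(M, p):
    A = [[x % p for x in row] for row in M]        # work on a copy, reduced mod p
    m, n = len(A), len(A[0]) if A else 0
    rank, pivot_row = 0, 0
    for col in range(n):
        # find a row (at or below pivot_row) with a non-zero entry in this column
        pivot = next((r for r in range(pivot_row, m) if A[r][col] % p != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        inv = pow(A[pivot_row][col], p - 2, p)      # multiplicative inverse mod p (p prime)
        A[pivot_row] = [(x * inv) % p for x in A[pivot_row]]
        for r in range(m):
            if r != pivot_row and A[r][col] % p:
                factor = A[r][col]
                A[r] = [(a - factor * b) % p for a, b in zip(A[r], A[pivot_row])]
        pivot_row += 1
        rank += 1
    return rank

# Example over GF(2): the third row is the sum of the first two, so the rank is 2.
assert rank_gfp([[1, 0, 1], [0, 1, 1], [1, 1, 0]], 2) == 2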

To argue the correctness of the algorithm, consider a matrix M' that the algorithm outputs. Obviously, M' is consistent with M. Furthermore, M' has rank at most t; indeed, the rank of M[k+1 : m, ∗] is at most t − |I|, and every row d ∈ R \ I can be obtained as a linear combination of rows in M[k+1 : m, ∗] (using coefficients a^d_{k+1}, . . . , a^d_m) and I (using coefficients D(d)).

Conversely, assume that there exists a matrix M' that is consistent with M and that has rank at most t. Choose M' to be of minimum rank over all matrices consistent with M. Consider the signature S obtained ("reverse-engineered") from M' as follows. First, we choose a row-basis B of M' such that |B ∩ R| is minimized, and we set I = R ∩ B. Now, each row in R \ I can be obtained as a linear combination of rows in B and, in particular, as a linear combination of the rows in M'[k+1 : m, ∗] and I. This can be expressed as a system of equations Υ', where, for each row d ∈ R \ I, we write d = a^d_{k+1}·M'[k+1, ∗] + · · · + a^d_m·M'[m, ∗] + Σ_{i∈I} D(d)(i)·i, and our variables are a^d_{k+1}, . . . , a^d_m and, for all i ∈ I, D(d)(i). Let us fix an arbitrary solution to Υ' and use the values assigned to the variables D(d)(i), for all d ∈ R \ I and i ∈ I, to define D. Observe that S = (I, D) was chosen so that Υ_S is guaranteed to have a solution.

Next, we argue that |I| + rk(M[k+1 : m, ∗]) = rk(M'). Indeed, assume for a contradiction that there exists a row r ∈ R ∩ I which can be obtained as a linear combination of the rows in I and in M[k+1 : m, ∗]; then, we could replace r in B by a row r' from M[k+1 : m, ∗], which would violate the minimality of |B ∩ R|. So, |I| + rk(M[k+1 : m, ∗]) = rk(M'), which means that our algorithm is guaranteed to set S as a valid branch, and hence, will either output a matrix compatible with M which matches S, or a matrix compatible with M which matches a different signature but has the same rank as M'.

Since the row-rank of a matrix M is equal to its column-rank, the transpose of M has the same rank as M. Hence:

Corollary 4. p-RMC[col] is in FPT.

As a consequence of the running time of the algorithm given in the proof of Theorem 3, we obtain:

Corollary 5. RMC[row] and RMC[col] are in XP.

3.2. Bounded Domain: Parameterization by comb

In this subsection, we present a randomized fixed-parameter algorithm for p-RMC[comb] with constant one-sided error probability. Before we proceed to the algorithm, we need to introduce some basic terminology related to systems of equations. Let Υ be a system of ℓ equations EQ_1, EQ_2, . . . , EQ_ℓ over GF(p); we assume that the equations are simplified as much as possible. In particular, we assume that no equation contains two terms over the same set of variables such that the degree/exponent of each variable in both terms is the same. Let EQ_i be a linear equation in Υ, and let x be a variable which occurs in EQ_i (with a non-zero coefficient). Naturally, EQ_i can be transformed into an equivalent equation EQ_{i,x}, where x is isolated, and we use Γ_{i,x} to denote the side of EQ_{i,x} not containing x, i.e., EQ_{i,x} is of the form x = Γ_{i,x}. We say that Υ' is obtained from Υ by substitution of x in EQ_i if Υ' is the system of equations obtained by:

1. computing EQ_{i,x} and in particular Γ_{i,x} from EQ_i;

2. setting Υ' := Υ \ {EQ_i}; and

3. replacing x with Γ_{i,x} in every equation in Υ'.

Observe that Υ' has size O(n·ℓ), and can also be computed in time O(n·ℓ), where n is the number of variables occurring in Υ. Furthermore, any solution to Υ' can be transformed into a solution to Υ in linear time, and similarly any solution to Υ can be transformed into a solution to Υ' in linear time (i.e., Υ' and Υ are equivalent). Moreover, Υ' contains at least one fewer variable and one fewer equation than Υ.

The following proposition is crucial for our proof, and is of independent interest.

Proposition 6. Let Υ be a system of ℓ quadratic equations over GF(p). Then computing a solution for Υ is in FPT_R parameterized by ℓ and p, and in XP_R parameterized only by ℓ.

Proof. Let n be the number of variables in Υ. We distinguish two cases. If n ≥ ℓ(ℓ+3)/2, then Υ can be solved in randomized time O(2^ℓ·n^3·ℓ·(log p)^2) (Miura et al., 2014). Otherwise, n < ℓ(ℓ+3)/2, and we can solve Υ by a brute-force algorithm which enumerates (all of the) at most p^n < p^{ℓ(ℓ+3)/2} assignments of values to the variables in Υ. The proposition now follows by observing that the given algorithm runs in time O(2^ℓ·n^3·ℓ·(log p)^2 + p^{ℓ(ℓ+3)/2}·ℓ^2).
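The second case of this proof is a plain exhaustive search, which is easy to illustrate. In the sketch below a system over GF(p) is given as a list of Python callables that evaluate each equation (written in the form "left-hand side = 0") on an assignment; this representation and the function name are our own choices for illustration, and the randomized algorithm of Miura et al. used in the first case is not reproduced here.

from itertools import product

# Sketch of the brute-force branch of Proposition 6: when the number n of
# variables is small (n < l(l+3)/2), simply enumerate all p^n assignments.

def solve_by_enumeration(equations, n_vars, p):
    """Return an assignment (tuple of length n_vars) satisfying all equations
    modulo p, or None if the system has no solution."""
    for assignment in product(range(p), repeat=n_vars):
        if all(eq(assignment) % p == 0 for eq in equations):
            return assignment
    return None

# Two quadratic equations over GF(5) in variables x0, x1 (written as "= 0"):
#   x0*x1 - 2 = 0   and   x0 + x1 - 3 = 0
eqs = [lambda v: v[0] * v[1] - 2,
       lambda v: v[0] + v[1] - 3]
print(solve_by_enumeration(eqs, n_vars=2, p=5))    # prints (1, 2): 1*2 = 2, 1+2 = 3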

Theorem 7. p-RMC[comb] is in FPT_R.

Proof. We begin by using Proposition 2 to compute the sets R and C containing the indices of the covering rows and columns, respectively; let |R| = r and |C| = c, and recall that the parameter value is k = r + c. Since the existence of a solution for p-RMC does not change if we permute rows and columns of M, we permute the rows of M so that the rows in R have indices 1, . . . , r, and subsequently, we permute the columns of M so that the columns in C have indices 1, . . . , c.

Before we proceed, let us give a high-level overview of our strategy. The core idea is to branch over signatures, which will be defined in a similar way to those in Theorem 3. These signatures will capture information about the dependencies among the rows in R and columns in C; one crucial difference is that for columns, we will focus only on dependencies in the submatrix M[r+1 : m, ∗] (the reason will become clear later, when we argue correctness). In each branch, we arrive at a system of equations that needs to be solved in order to determine whether the signatures are valid. Unlike Theorem 3, here the obtained system of equations will contain non-linear (but quadratic) terms, and hence solving the system is far from being trivial. Once we determine which signatures are valid, we choose one that minimizes the total rank.


For the first step, let us define the notion of signature that will be used in this proof. A signature S is a tuple (I_R, D_R, I_C, D_C) where:

1. I_R ⊆ R;

2. D_R is a mapping from R \ I_R to (I_R → GF(p));

3. I_C ⊆ C; and

4. D_C is a mapping from C \ I_C to (I_C → GF(p)).

We say that a matrix M' compatible with the incomplete matrix M matches a signature (I_R, D_R, I_C, D_C) if:

◦ for each row d ∈ R \ I_R, there exist coefficients a^d_{r+1}, . . . , a^d_m ∈ GF(p) such that d = a^d_{r+1}·M'[r+1, ∗] + · · · + a^d_m·M'[m, ∗] + Σ_{i∈I_R} D_R(d)(i)·i; and

◦ for each column h ∈ C \ I_C, there exist coefficients b^h_{c+1}, . . . , b^h_n ∈ GF(p) such that h[r+1 : m] = b^h_{c+1}·M'[r+1 : m, c+1] + · · · + b^h_n·M'[r+1 : m, n] + Σ_{i∈I_C} D_C(h)(i)·i[r+1 : m].

The number of distinct signatures is upper-bounded by 2^r·p^{r^2}·2^c·p^{c^2} ≤ 2^k·p^{k^2}, and the first step of the algorithm branches over all possible signatures S. For the second step, fix an enumerated signature S. The algorithm will verify whether S is valid, i.e., whether there exists a matrix M', compatible with the incomplete M, that matches S.

To do so, the algorithm will construct a system of |R \ I_R| equations over vectors of size n and of |C \ I_C| equations over vectors of size m − r, and then transform this into a system Υ_S of |R \ I_R|·n + |C \ I_C|·(m − r) equations over GF(p) (one equation for each vector coordinate). For each d ∈ R \ I_R, Υ_S contains the following variables:

◦ one variable for each a^d_{r+1}, . . . , a^d_m, and

◦ one variable for each occurrence of •.

For instance, the first equation in Υ_S for some d ∈ R \ I_R has the following form: d[1] = a^d_{r+1}·M[r+1, 1] + · · · + a^d_m·M[m, 1] + Σ_{i∈I_R} D_R(d)(i)·i[1], where a^d_{r+1}, . . . , a^d_m are variables, D_R(d)(i) is a number, and all other occurrences are either variables or numbers. Crucially, for all j > c, the equations for d[j] defined above contain only linear terms; however, for j ∈ [c] these equations may also contain non-linear terms (in particular, a^d_{r+1}, . . . , a^d_m are variables and M[r+1, j], . . . , M[m, j] can contain • symbols, which correspond to variables in the equations). For z ∈ [m] and y ∈ [n], if an element M[z, y] contains •, then we will denote the corresponding variable used in the equations as x_{z,y}.

Next, for each h ∈ C \ I_C, Υ_S contains the following variables:

◦ one variable for each b^h_{c+1}, . . . , b^h_n, and

◦ one variable for each occurrence of •.

For instance, the second equation in Υ_S for some h ∈ C \ I_C has the following form: h[r+2] = b^h_{c+1}·M[r+2, c+1] + · · · + b^h_n·M[r+2, n] + Σ_{i∈I_C} D_C(h)(i)·i[r+2], where b^h_{c+1}, . . . , b^h_n are variables, D_C(h)(i) is a number, and all other occurrences are either variables or numbers. Observe that all of these equations for h are linear, since the submatrix M[r+1 : m, c+1 : n] contains no • symbols.

This completes the definition of our system of equations Υ_S. Recall that the only equations in Υ_S that may contain non-linear terms are those for d[j] when j ≤ c, and in particular Υ_S contains at most k^2 equations with non-linear terms (k equations for at most k vectors d in R \ I_R). We will now use substitutions to simplify Υ_S by removing all linear equations; specifically, at each step we select an arbitrary linear equation EQ_i containing a variable x, apply substitution of x in EQ_i to construct a new system of equations with one fewer equation, and simplify all equations in the new system. If at any point we reach a system of equations which contains an invalid equation (e.g., 2 = 5), then Υ_S does not have a solution, and we discard the corresponding branch. Otherwise, after at most |R \ I_R|·n + |C \ I_C|·(m − r) ∈ O(kn + km) substitutions we obtain a system of at most k^2 quadratic equations Ψ_S such that any solution to Ψ_S can be transformed into a solution to Υ_S in time at most O(kn + km). We can now apply Proposition 6 to solve Ψ_S and mark S as a valid signature if Ψ_S has a solution.

After all signatures have been processed, in the third—and final—step we select a valid signature S = (I_R, D_R, I_C, D_C) that has the minimum value of |I_R| + |I_C|. The algorithm will then check whether |I_R| + |I_C| + rk(M[r+1 : m, c+1 : n]) ≤ t. If this is not the case, the algorithm rejects the instance. Otherwise, the algorithm recomputes a solution to Υ_S, and outputs the matrix M' obtained from M by replacing each occurrence of • at position [i, j] by the value of x_{i,j} in that solution.

We now proceed to proving the correctness of the algorithm.

We do so by proving the following two claims:

Claim 1. If there exists a valid signature S = (I_R, D_R, I_C, D_C) for M such that |I_R| + |I_C| + rk(M[r+1 : m, c+1 : n]) ≤ t, then there exists a matrix M' compatible with M such that rk(M') ≤ |I_R| + |I_C| + rk(M[r+1 : m, c+1 : n]). In particular, if S is marked as valid by the algorithm, then the algorithm outputs a matrix M' satisfying the above.

Proof of Claim. Since S is valid, the system of equations Υ_S has a solution; fix one such solution. Consider the matrix M' obtained from M by replacing each occurrence of • at position [i, j] by the value of x_{i,j} from the selected solution to Υ_S. Then the solution to Υ_S guarantees that each row in R \ I_R can be obtained as a linear combination of rows in M' \ (R \ I_R), and hence deleting all rows in R \ I_R will result in a matrix M'_1 such that rk(M') = rk(M'_1).

Next, consider the matrix M'[r+1 : m, ∗], which is obtained by removing all rows in I_R from M'_1; clearly, this operation decreases the rank by at most |I_R|, and hence rk(M'[r+1 : m, ∗]) ≤ rk(M'_1) ≤ rk(M'[r+1 : m, ∗]) + |I_R|.

Third, consider the matrix M'_2 obtained from M'[r+1 : m, ∗] by removing all columns in C \ I_C. The solution to Υ_S guarantees that each removed column can be obtained as a linear combination of columns in M'[r+1 : m, ∗] \ (C \ I_C), and hence rk(M'[r+1 : m, ∗]) = rk(M'_2). Finally, we consider the matrix M'[r+1 : m, c+1 : n] = M[r+1 : m, c+1 : n], which is obtained by removing all columns in I_C from M'_2. Clearly, removing |I_C| columns decreases the rank by at most |I_C|, and hence rk(M[r+1 : m, c+1 : n]) ≤ rk(M'_2) ≤ rk(M[r+1 : m, c+1 : n]) + |I_C|. Putting the above inequalities together, we get rk(M') ≤ rk(M'[r+1 : m, ∗]) + |I_R| ≤ rk(M[r+1 : m, c+1 : n]) + |I_C| + |I_R|.

Claim 2. If there exists a matrix M' compatible with M such that rk(M') ≤ t, then there exists a valid signature S = (I_R, D_R, I_C, D_C) such that |I_R| + |I_C| + rk(M[r+1 : m, c+1 : n]) ≤ t.

Proof of Claim. Consider the following iterative procedure that creates a set I_R from the hypothetical matrix M'. Check, for each row r ∈ R, whether r can be obtained as a linear combination of all other rows in M', which can be done by solving a system of linear equations; if this is the case, remove r from M' and restart from any row in R that remains in M'. In the end, we obtain a submatrix M'_R of M' which only contains those rows in R that cannot be obtained as a linear combination of all other rows in M'_R; let I_R be the set of rows in R that remain in M'_R. Furthermore, since each row r' ∈ R \ I_R can be obtained as a linear combination of rows in M'_R, for each such r' we compute a set of coefficients τ_{r'} that can be used to obtain r' and store those coefficients corresponding to I_R in D_R. For instance, if row r' ∈ R \ I_R can be obtained by an additive term containing 1 times row u ∈ I_R, then we set D_R(r') = (u ↦ 1).

At this point, we have identified I_R and D_R. Next, we turn our attention to the submatrix M'[r+1 : m, ∗], where we proceed similarly but for columns. In particular, for each column c ∈ C restricted to M'[r+1 : m, ∗], we check whether c can be obtained as a linear combination of all other columns in M'[r+1 : m, ∗], and if the answer is positive then we remove c from M'[r+1 : m, ∗] and restart from any column in C that remains in M'[r+1 : m, ∗]. This results in a new submatrix M'_C of M'[r+1 : m, ∗], and those columns of C that remain in M'_C are stored in I_C. Then, for each column c' ∈ C \ I_C, we compute a set of coefficients τ_{c'} that can be used to obtain that column and store the values of the coefficients that correspond to I_C in D_C, analogously as we did for the rows.

At this point, we have obtained a signature S. The validity of S follows from its construction. Indeed, to solve Υ_S, we can set each variable x_{i,j} representing the value of a • symbol at M[i, j] to M'[i, j], and all other variables will capture the coefficients that were stored in τ_{r'} and τ_{c'} for a row r' or a column c', respectively.

Finally, we argue that |I_R| + |I_C| + rk(M[r+1 : m, c+1 : n]) ≤ t. Since M'_R was obtained from M' only by deleting linearly dependent rows, rk(M') = rk(M'_R). Furthermore, since M'[r+1 : m, ∗] can be obtained by deleting |I_R| rows from M'_R, and all of these deleted rows are linearly independent of all other rows in M'_R, we obtain rk(M'[r+1 : m, ∗]) = rk(M'_R) − |I_R|. By repeating the above arguments, we see that rk(M'[r+1 : m, ∗]) = rk(M'_C) and rk(M'[r+1 : m, c+1 : n]) = rk(M'_C) − |I_C|. Recall that M'[r+1 : m, c+1 : n] = M[r+1 : m, c+1 : n]. Putting the above together, we obtain rk(M'[r+1 : m, c+1 : n]) + |I_C| = rk(M'[r+1 : m, ∗]), and rk(M'[r+1 : m, c+1 : n]) + |I_C| + |I_R| = rk(M') ≤ t.

Finally, the total running time of the algorithm is obtained by combining the branching factor of branching over all signatures (O(2^k·p^{k^2})) with the run-time of Proposition 6 for k^2 many quadratic equations (O(3^{k^2}·n^3·(log p)^2 + p^{k^4})). We obtain a running time of O(3^{k^2}·p^{k^4}·n^3).

As a consequence of the running time of the algorithm given in the proof of Theorem 7, we obtain:

Corollary 8. RMC[comb] is in XP_R.

4. Bounded Distinct Row Matrix Completion

Let (p, M, t) be an instance of DRMC. We say that two rows of M are compatible if, whenever the two rows differ at some entry, then one of the rows has a • at that entry. The compatibility graph of M, denoted by G(M), is the undirected graph whose vertices correspond to the row indices of M and in which there is an edge between two vertices if and only if their two corresponding rows are compatible.

See Figure 2 for an illustration.
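The definition translates directly into code. The sketch below (None again stands for •, and the function names are ours) builds the edge set of the compatibility graph over 1-based row indices, as in Figure 2.

import itertools

# Sketch of the compatibility graph G(M): two rows are compatible if wherever
# they differ, at least one of them has a missing entry (None stands for "•").

def compatible(row_a, row_b):
    return all(a is None or b is None or a == b for a, b in zip(row_a, row_b))

def compatibility_graph(M):
    """Return the edge set of G(M) over the row indices 1..m (as in Figure 2)."""
    m = len(M)
    return {(i + 1, j + 1) for i, j in itertools.combinations(range(m), 2)
            if compatible(M[i], M[j])}

# The matrix of Figure 2:
M = [[1, None, 0, None, None, 1],
     [1, 0, 0, 1, None, None],
     [1, 0, None, 1, 0, 1],
     [1, 0, 1, 1, 0, None],
     [1, 0, 1, 1, 0, 0]]
print(sorted(compatibility_graph(M)))   # [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]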

We start by showing that DRMC (and therefore p-DRMC) can be reduced to the CLIQUE COVER problem, which is defined as follows.


( 1 • 0 • • 1 )
( 1 0 0 1 • • )
( 1 0 • 1 0 1 )
( 1 0 1 1 0 • )
( 1 0 1 1 0 0 )

Figure 2. Illustration of a matrix and its compatibility graph. The vertex label indicates the corresponding row number.

CLIQUE COVER (CC)
Input: An undirected graph G and an integer k.
Task: Find a partition of V(G) into at most k cliques, or output that no such partition exists.

Lemma 9. An instance I = (p, M, t) of DRMC has a solution if and only if the instance I' = (G(M), t) of CC does. Moreover, a solution for I' can be obtained in polynomial time from a solution for I and vice versa.

Proof. Let M' be a solution for I and let 𝒫 be the partition of the indices of the rows of M' such that two row indices r and r' belong to the same set in 𝒫 if and only if M'[r, ∗] = M'[r', ∗]. Then 𝒫 is also a solution for I', since G[P] is a clique for every P ∈ 𝒫.

Conversely, let 𝒫 be a solution for I'. We claim that there is a solution M' for I such that M'[r, ∗] = M'[r', ∗] if and only if r and r' are contained in the same set of 𝒫. Towards showing this, consider a set P ∈ 𝒫 and a column index c of M, and let E(M[P, c]) be the set of all values occurring in M[P, c]. Then |E(M[P, c]) \ {•}| ≤ 1, that is, all entries of M[P, c] that are not • are equal; otherwise, G[P] would not be a clique. Consequently, by replacing every • occurring in M[P, c] with the unique value in E(M[P, c]) \ {•} if E(M[P, c]) \ {•} ≠ ∅, and with an arbitrary value otherwise, and by doing so for every column index c and every P ∈ 𝒫, we obtain the desired solution M' for I.
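This second direction of the proof is constructive, and the sketch below mirrors it: given a partition of the rows into cliques of G(M), every • is replaced by the unique determined value in its column within its clique (or an arbitrary value if the whole column of the clique is missing). The partition is assumed to be given as lists of 0-based row indices; the function name and the default value 0 are ours.

# Sketch of the constructive direction of Lemma 9.

def complete_from_partition(M, partition, default=0):
    n = len(M[0])
    M_completed = [list(row) for row in M]
    for part in partition:                      # part: list of 0-based row indices
        for c in range(n):
            values = {M[r][c] for r in part if M[r][c] is not None}
            fill = values.pop() if values else default   # at most one value for a clique
            for r in part:
                M_completed[r][c] = M[r][c] if M[r][c] is not None else fill
    return M_completed

# Figure 2 again: rows {1,2,3} and {4,5} (1-based) partition G(M) into two cliques,
# so the completion below has dr(M') = 2.
M = [[1, None, 0, None, None, 1],
     [1, 0, 0, 1, None, None],
     [1, 0, None, 1, 0, 1],
     [1, 0, 1, 1, 0, None],
     [1, 0, 1, 1, 0, 0]]
M_prime = complete_from_partition(M, [[0, 1, 2], [3, 4]])
assert len({tuple(row) for row in M_prime}) == 2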

Theorem 10. CC is in FPT when parameterized by the treewidth of the input graph.

Proof. Let I = (G, k) be an instance of CC. We will prove the theorem using a standard dynamic programming algorithm on a tree-decomposition of G. Because of Proposition 1 we can assume that we are given a nice tree-decomposition (T, χ) of G of width ω. For every node t ∈ V(T) we will compute the set R(t) of records containing all pairs (𝒫, c), where 𝒫 is a partition of χ(t) into cliques, i.e., for every P ∈ 𝒫 the graph G[P] is a clique, and c is the minimum integer such that G[χ(T_t)] has a partition 𝒫' into c cliques with 𝒫 = {P' ∩ χ(t) | P' ∈ 𝒫'} \ {∅}. Note that I has a solution if and only if R(r) contains a record (∅, c) with c ≤ k, where r is the root of (T, χ). It hence suffices to show how to compute the set of records for the four different types of nodes of a nice tree-decomposition.

Let l be a leaf node of T with χ(l) = {v}. Then R(l) := {({{v}}, 1)}. Note that R(l) can be computed in constant time.

Let t be an introduce node of T with child t' and χ(t) = χ(t') ∪ {v}. Then R(t) can be obtained from R(t') as follows. For every (𝒫', c') ∈ R(t') and every P' ∈ 𝒫' such that G[P' ∪ {v}] is a clique, we add the record ((𝒫' \ {P'}) ∪ {P' ∪ {v}}, c') to R(t). Moreover, for every (𝒫', c') ∈ R(t'), we add the record (𝒫' ∪ {{v}}, c' + 1) to R(t). Note that R(t) can be computed in time O(|R(t')|·ω^2).

Let t be a forget node of T with child t' and χ(t) ∪ {v} = χ(t'). Then R(t) consists of all records (𝒫, c) such that c is the minimum integer such that there is a record (𝒫', c) ∈ R(t') and a set P' ∈ 𝒫' with v ∈ P' and (𝒫' \ {P'}) ∪ {P' \ {v}} = 𝒫; if no such record exists, (𝒫, c) is not in R(t). Note that R(t) can be computed in time O(|R(t')|·ω^2).

Let t be a join node with children t1 and t2. Then R(t) contains all records (𝒫, c) such that there are integers c1 and c2 with c1 + c2 − |𝒫| = c and (𝒫, c1) ∈ R(t1) and (𝒫, c2) ∈ R(t2). Note that R(t) can be computed in time O((|R(t1)| + |R(t2)|)·ω) (assuming that the records are kept in an ordered manner).

The total run-time of the algorithm is then the number of nodes of T, i.e., O(ω·|V(G)|), times the maximum time required at any of the four types of nodes, i.e., O(|R(t)|·ω^2), which because |R(t)| ≤ ω! is at most O(ω!·ω^3·|V(G)|).

Note that the above theorem also implies that the well-known COLORING problem is FPT parameterized by the treewidth of the complement of the input graph. The theorem below follows immediately from Lemma 9 and Theorem 10.

Theorem 11. DRMC and p-DRMC are in FPT when parameterized by the treewidth of the compatibility graph.

4.1. p-DRMC

Theorem 12. p-DRMC[comb] is in FPT.

Proof. Let (M, t) be an instance of p-DRMC, and let k be the parameter comb. By Proposition 2, we can compute a set R* of rows and a set C* of columns, where |R* ∪ C*| ≤ k, and such that every occurrence of • in M is either contained in a row of R* or a column of C*. Let R and C be the set of rows and columns of M, respectively.

Let 𝒫 be the unique partition of R \ R* such that two rows r and r' belong to the same set in 𝒫 if and only if they are identical on all columns in C \ C*. Then |P| ≤ (p+1)^k for every P ∈ 𝒫, since two rows in P can differ on at most |C*| ≤ k entries, each having (p+1) values to be chosen from. Moreover, any two rows in R \ R* that are not contained in the same set in 𝒫 are not compatible, which implies that they appear in different components of G(M) \ R*, and hence the set of vertices in every component of G(M) \ R* is a subset of P for some P ∈ 𝒫. It is now straightforward to show that tw(G(M)) ≤ k + (p+1)^k, and hence tw(G(M)) is bounded by a function of the parameter k. Towards showing this, consider the tree-decomposition (T, χ) for G(M), where T is a path containing one node t_P with χ(t_P) = R* ∪ P for every P ∈ 𝒫. Then (T, χ) is a tree-decomposition of width k + (p+1)^k − 1 for G(M). The theorem now follows from Theorem 11.
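The partition 𝒫 used in this proof is simply a grouping of the rows outside the covering row set by their values on the columns outside the covering column set (on those columns such rows have no •, so the grouping key is fully determined). The sketch below illustrates this step only; the 0-based index sets and the function name are our own choices.

from collections import defaultdict

# Sketch of the partition P from the proof of Theorem 12: rows outside R_star
# are grouped by their values on the columns outside C_star.

def group_uncovered_rows(M, R_star, C_star):
    n = len(M[0])
    free_cols = [c for c in range(n) if c not in C_star]
    groups = defaultdict(list)
    for r, row in enumerate(M):
        if r in R_star:
            continue
        key = tuple(row[c] for c in free_cols)   # identical key <=> same set of P
        groups[key].append(r)
    return list(groups.values())

# The matrix of Figure 1 with covering row {2} and covering column {4} (0-based):
M = [[1, 1, 1, 0, None, 1],
     [0, 0, 1, 0, None, 1],
     [0, None, None, 0, None, None],
     [1, 1, 0, 1, 0, 1]]
print(group_uncovered_rows(M, R_star={2}, C_star={4}))   # three singleton groups

# Each bag of the path decomposition in the proof is R_star together with one
# group P of this partition, giving width at most |R_star| + (p+1)^k - 1.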

4.2. DRMC

The proof of the following theorem is very similar to the proof of Theorem 12, i.e., we mainly use the observation that the parameter row is also a bound on the treewidth of the compatibility graph and then apply Theorem 11.

Theorem 13. DRMC[row] is in FPT.

Proof. Let (p, M, t) be an instance of DRMC, let k be the parameter row, and let R* be a set of rows with |R*| ≤ k covering all occurrences of • in M. Then G(M) \ R* is an independent set, since any two distinct rows without • are not compatible. It is now straightforward to show that tw(G(M)) ≤ k and hence bounded by a function of our parameter k. Towards showing this, consider the following tree-decomposition (T, χ) for G(M), where T is a path containing one node t_r with χ(t_r) = R* ∪ {r} for every r ∈ R \ R*. Then (T, χ) is a tree-decomposition of width k for G(M). The theorem now follows from Theorem 11.

For our remaining hardness proofs we will make use of the following problem.

PARTITIONING INTO TRIANGLES (PIT)
Input: A graph G.
Task: Is there a partition 𝒫 of V(G) into triangles, i.e., such that G[P] is a triangle for every P ∈ 𝒫?

We will often use the following easy observation.

Observation 1. A graph G that does not contain a clique with four vertices has a partition into triangles if and only if it has a partition into at most |V(G)|/3 cliques.

Theorem 14. DRMC[col] is paraNP-hard.

Proof. We will reduce from the following variant of 3-SAT, which is NP-complete (Berman et al., 2003).

3-SATISFIABILITY-2 (3-SAT-2)
Input: A propositional formula φ in conjunctive normal form such that (1) every clause of φ has exactly three distinct literals and (2) every literal occurs in exactly two clauses.
Task: Is φ satisfiable?

To make our reduction easier to follow, we will divide the reduction into two steps. Given an instance (formula) φ of 3-SAT-2, we will first construct an equivalent instance G of PIT with the additional property that G does not contain a clique on four vertices. We note that similar reductions from variants of the satisfiability problem to PIT are known (and hence our first step does not show anything new for PIT); however, our reduction is specifically designed to simplify the second step, in which we will construct an instance (M, |V(G)|/3) of DRMC such that G(M) is isomorphic to G and M has only seven columns. By Observation 1 and Lemma 9, this proves the theorem, since (M, |V(G)|/3) has a solution if and only if φ does.

Let φ be an instance of 3-SAT-2 with variables x_1, . . . , x_n and clauses C_1, . . . , C_m. We first construct the instance G of PIT such that G does not contain a clique of size four. For every variable x_i of φ, let G(x_i) be the graph with vertices x^1_i, x^2_i, x̄^1_i, x̄^2_i, x_i and edges forming a triangle on the vertices x^1_i, x^2_i, and x_i as well as a triangle on the vertices x̄^1_i, x̄^2_i, and x_i. Moreover, for every clause C_j with literals l_{j,1}, l_{j,2}, and l_{j,3}, let G(C_j) be the graph with vertices l^1_{j,1}, l^2_{j,1}, l^1_{j,2}, l^2_{j,2}, l^1_{j,3}, l^2_{j,3}, h^1_j, and h^2_j and edges between l^1_{j,r} and l^2_{j,r} for every r ∈ {1, 2, 3} as well as edges forming a complete bipartite graph between {h^1_j, h^2_j} and all other vertices of G(C_j). Let f : [m]×[3] → {x^o_i, x̄^o_i | 1 ≤ i ≤ n ∧ 1 ≤ o ≤ 2} be any bijective function such that for every j and r with 1 ≤ j ≤ m and 1 ≤ r ≤ 3, it holds that: if f(j, r) = x^o_i (for some i and o), then x_i is the r-th literal of C_j; and if f(j, r) = x̄^o_i, then x̄_i is the r-th literal of C_j. Figures 3 and 4 illustrate the gadgets G(x_i) and G(C_j).

The graph G is obtained from the disjoint union of the graphs G(x_1), . . . , G(x_n), G(C_1), . . . , G(C_m) after applying the following modifications:

◦ For every j and r with 1 ≤ j ≤ m and 1 ≤ r ≤ 3, add edges forming a triangle on the vertices l^1_{j,r}, l^2_{j,r}, f(j, r).

◦ For every i with 1 ≤ i ≤ 2n−m, add the vertices g^1_i, g^2_i and an edge between g^1_i and g^2_i. Finally we add edges forming a complete bipartite graph between all vertices in {g^o_i | 1 ≤ i ≤ 2n−m ∧ 1 ≤ o ≤ 2} and all vertices in {h^o_j | 1 ≤ j ≤ m ∧ 1 ≤ o ≤ 2}.

This completes the construction of G. The following claim concludes the first step of our reduction.


Figure 3. An illustration of the gadget G(x_i) introduced in the reduction of Theorem 14. The label of each vertex v indicates the row vector R(v): x_i gets (i, •, 0, 0, 0, 0, 0); x^1_i gets (i, 1, •, •, 0, 0, 0); x^2_i gets (i, 1, •, 0, •, 0, 0); x̄^1_i gets (i, 0, •, •, 0, 0, 0); x̄^2_i gets (i, 0, •, 0, •, 0, 0).

Figure 4. An illustration of the gadget G(C_j) introduced in the reduction of Theorem 14. The label of each vertex v indicates the row vector R(v); here we assume that f(j, 1) = x^1_4, f(j, 2) = x^2_5, and f(j, 3) = x̄^1_6: h^1_j gets (•, •, j, 1, 1, 1, •); h^2_j gets (•, •, j, 1, 1, 2, •); l^1_{j,1} and l^2_{j,1} get (4, 1, j, 1, •, •, 0); l^1_{j,2} and l^2_{j,2} get (5, 1, j, •, 1, •, 0); l^1_{j,3} and l^2_{j,3} get (6, 0, j, 1, •, •, 0).

Claim 3. φ is satisfiable if and only if G has a partition into triangles. Moreover, G does not contain a clique of size four.

Proof. We first show that G does not contain a clique of size four by showing that the neighborhood of any vertex in G does not contain a triangle.

◦ If v = x_i for some i with 1 ≤ i ≤ n, then N_G(v) = {x^1_i, x^2_i, x̄^1_i, x̄^2_i} and does not contain a triangle.

◦ If v = x^o_i for some i and o with 1 ≤ i ≤ n and 1 ≤ o ≤ 2, then N_G(v) = {x_i, x^{(o mod 2)+1}_i, l^1_{j,r}, l^2_{j,r}}, where f^{-1}(x^o_i) = (j, r), and does not contain a triangle.

◦ The case for v = x̄^o_i for i and o as above is analogous.

◦ If v = l^o_{j,r} for some j, r, and o with 1 ≤ j ≤ m, 1 ≤ r ≤ 3, and 1 ≤ o ≤ 2, then N_G(v) = {l^{(o mod 2)+1}_{j,r}, f(j, r), h^1_j, h^2_j} and does not contain a triangle.

◦ If v = h^o_j for some j and o with 1 ≤ j ≤ m and 1 ≤ o ≤ 2, then N_G(v) = {l^{o'}_{j,r} | 1 ≤ r ≤ 3 ∧ 1 ≤ o' ≤ 2} ∪ {g^1_{j'}, g^2_{j'} | 1 ≤ j' ≤ 2n−m} and does not contain a triangle.

◦ If v = g^o_j for some j and o with 1 ≤ j ≤ 2n−m and 1 ≤ o ≤ 2, then N_G(v) = {g^{(o mod 2)+1}_j} ∪ {h^{o'}_{j'} | 1 ≤ j' ≤ m ∧ 1 ≤ o' ≤ 2} and does not contain a triangle.

We now show that φ is satisfiable if and only if G has a partition into triangles. Towards showing the forward direction, let τ be a satisfying assignment for φ. Then a partition 𝒫 of G into triangles contains the following triangles:

(1) for every i with 1 ≤ i ≤ n, the triangle x_i, x^1_i, x^2_i if τ(x_i) = 0, and the triangle x_i, x̄^1_i, x̄^2_i otherwise,

(2) for every j with 1 ≤ j ≤ m and every r with 1 ≤ r ≤ 3 such that the r-th literal of C_j is satisfied by τ, the triangle l^1_{j,r}, l^2_{j,r}, f(j, r),

(3) for every j with 1 ≤ j ≤ m and every r with 1 ≤ r ≤ 3 such that the r-th literal of C_j is not satisfied by τ, the triangle l^1_{j,r}, l^2_{j,r}, h^o_j, where o ∈ {1, 2} and the r-th literal of C_j is the o-th literal in C_j that is not satisfied by τ; note that this is always possible because C_j has at most two literals that are not satisfied by τ,

(4) let A be the subset of {h^o_j | 1 ≤ j ≤ m ∧ 1 ≤ o ≤ 2} containing all h^o_j that are not yet part of a triangle, i.e., that are not part of a triangle added in (3). Then |A| = 2n−m and it is hence possible to add the following triangles, i.e., for every v ∈ A a triangle containing v and the two vertices g^1_p and g^2_p for some p with 1 ≤ p ≤ 2n−m.

Towards showing the reverse direction, let 𝒫 be a partition of V(G) into |V(G)|/3 triangles. Then 𝒫 satisfies:

(A1) For every i with 1 ≤ i ≤ n, 𝒫 either contains the triangle {x_i, x^1_i, x^2_i} or the triangle {x_i, x̄^1_i, x̄^2_i}.

(A2) For every j with 1 ≤ j ≤ m, there is an r with 1 ≤ r ≤ 3 such that 𝒫 contains the triangle {l^1_{j,r}, l^2_{j,r}, f(j, r)}.

(A1) follows because these are the only two triangles in G containing x_i for every i with 1 ≤ i ≤ n. Moreover, (A2) follows because for every j and r with 1 ≤ j ≤ m and 1 ≤ r ≤ 3 there are only three triangles containing one of the vertices l^1_{j,r} and l^2_{j,r}, i.e., the triangles {l^1_{j,r}, l^2_{j,r}, h^1_j}, {l^1_{j,r}, l^2_{j,r}, h^2_j}, and {l^1_{j,r}, l^2_{j,r}, f(j, r)}; (A2) now follows because 𝒫 can contain at most two triangles containing one of h^1_j and h^2_j. But then the assignment τ setting τ(x_i) = 1 for every i with 1 ≤ i ≤ n if and only if 𝒫 contains the triangle {x_i, x̄^1_i, x̄^2_i} is a satisfying assignment for φ, because of (A2).

We will now proceed to the second (and final) step of our reduction, i.e., we will construct an instance (M, |V(G)|/3) of DRMC such that G(M) is isomorphic to G and M
