Local search for k-Set Packing

Back in 2012, when I was a post-doc in Lugano, Switzerland, I was working with Fabrizio Grandoni and Monaldo Mastrolili on a pricing problem we called Hypermatching Assignment Problem. This problem is at the same time a generalization of the Generalized Assignment Problem and of the k-Set Packing problem, which brought my attention to the approximation status of the latter, being is the subject of this post.

k-Set Packing is a basic combinatorial optimization problem, where given a universe U and a family of its subsets F, we are to find a maximum subfamily of pairwise disjoint subsets of F. A special case of 3-Set Packing is the 3-Dimensional Matching problem. A good argument showing that those two problems are indeed classic is the fact that both belong to the Karp's list of 21 NP-hard problems.

A local search routine tries to locally modify a current solution in order to get a better one. For the k-Set Packing by p-local search, we denote an algorithm, which starts with an empty set A, and tries to improve it by adding a set A_0 of at most p new sets, such that the number of sets we need to remove in order to maintain a feasible solution is strictly smaller than the number of sets we have added. Having a local search maximum, our hope is to prove that an optimum solution is not much better.

pseudocode-localsearch

It is easy to see, that a 1-local search maximum gives an inclusionwise maximum subfamily of F, which in turn implies an approximation ratio of k. It. is not hard to show, that a 2-local search maximum yields a (k+1)/2 approximation ratio. In 1989 Hurkens and Schrijver [SIDMA'89] have shown, that with growing values of p, a p-local search maximum gives (k+eps_p)/2 approximation, where eps_p is a constant depending on p converging to 0. Up to now this was the best known polynomial time approximation algorithm for both k-Set Packing and 3-Dimensional Matching.

Later in [SODA'95]  Halldorsson proved that when we consider logarithmic values of p, the approximation ratio is upper bounded by (k+2)/3, which was slightly improved in our Hypergraph Assignment Problem paper downto (k+1+eps)/3, showing that in quasipolynomial time one can break the 1.5-approximation ratio for 3-Dimensional Matching. A natural question is the following. Can we implement the O(log n)-local search in polynomial time? This brings us to the area of fixed parameter tractability, where we are aiming at a 2^O(r) poly(|F|) time algorithms. Unfortunately we have shown that the parameterized local search is W[1]-hard, even in the more relaxed permissive variant, which means that f(r)poly(|F|) time algorithm is unlikely to exist, no matter how fast growing the function f() is.

Therefore we have to modify our strategy. In our second take we try the following. First, we inspect the existing analyses of approximation ratios upper bounds to see what is the exact set of swaps, which our current solution needs to be impossible to improve with, in order for the proof to go through. Next, we want to show, that for this particular set of swaps we can implement the local search algorithm effectively.

In the analysis of Halldorsson, a notion of a conflict graph is used. Imagine a bipartite graph with sets of our current solution A on one side, sets of FA on the other, where a set of A and a set of FA are connected by an edge if they share a common element.

conflict

Note that an edge indicates that it is impossible to have both sets at the same time in a solution. The main part of the analysis of Halldorsson is showing that if (k+2)/3|A| < |OPT|, then there exists a subset X of FA of size O(log n), such that |N(X)| < |X| in the conflict graph, that is the number of sets that we need to remove after adding sets of X is strictly smaller than the size of X itself. However, when inspected more closely the set X has some structure, that is G[N[X]] is a subdivision of one of the following graphs:

subdivisions2This means that in our local search algorithm we do not have to look for all improving sets X of logarithmic size, but it is enough to consider sets X, such that G[N[X]] is of constant pathwidth! In this way by using the color coding technique of Alon, Yuster and Zwick [JACM'95] we obtain a polynomial time (k+2)/3 approximation algorithm. Unfortunatelly this is not yet enough to break the 1.5 approximation ratio for 3-Dimensional Matching, as the existing upper bound of (k+1+eps)/3 in fact uses improving sets of unbounded pathwidth. This is the main technical hurdle, overcoming which required proving lemmas relating pathwidth, average degree and girth of a graph. Finally, we have obtained a (k+1+eps)/3 polynomial time approximation algorithm for k-Set Packing, implying a 1.34-approximation for 3-Dimensional Matching.

The described results were obtained by Maxim Sviridenko & Justin Ward, published at ICALP'13, and by myself, to appear in FOCS'13 and available on arxiv. Since the papers contain significant intersection of ideas, we decided to prepare a common journal version. Personally I am very happy about the described result, because it utilizes techniques from the parameterized complexity in a nontrivial way, to obtain an improvement in a over 20 years old approximation ratio of a basic problem.

WG 2014

The call for papers for WG 2014 is out.

Let me say that I really like the WG series. Not only because WG'08 was my first conference (and now WG'14 the first one I'm in a PC), but mainly because there is always a nice bunch of papers with cute combinatorics, and you always travel back home from WG with a full sack of cool open problems to think on.

Speaking of these, I went through my private "open problem list" and found a few problems that, in my opinion, may nicely suit WG. Hey, there are still a few months till the deadline, so why not solve some of the problems and go for a trip to this lovely castle in France?

The problems are from parameterized complexity, since I mostly work in this area. To the best of my knowledge, there are currently open. On all of them I spent some significant time somewhere in the past.

Maybe a small disclaimer is in place: although I found these problems interesting and nice, the techniques to solve them may turn out to be boring, other PC members of WG 2014 may have different opinions on their importance, etc. So, in any sense you should not treat this as some promise that a solution will get into WG. I just wanted to inspire some research, and see solutions to some nice problems I thought about and didn't succeed, and that is the sole motivation for this post 🙂

Cutting short paths. In this paper the authors study (among others) the following problem: given a (directed or undirected) graph with source and sink , and integers and , cut at most edges of so that a shortest path from to is of length larger than . They show an FPT algorithm, parameterized by both and . The question is: does this problem admit a polynomial kernel with respect to this parameterization?

Constrained bipartite vertex cover. The minimum vertex cover problem in bipartite graphs is solvable in polynomial time. What about the following variant: given a bipartite graph with fixed bipartition , and integers and , find a vertex cover of the graph with at most vertices in and vertices in . This is NP-hard. There is a simple reduction rule: a vertex from of degree larger than needs to be included in a solution, and symmetrically the same holds for a vertex in of degree larger than . After this reduction is exhaustively applied, note that a solution may cover at most edges, so we have a kernel of this size. Can you do better with respect to the number of vertices in the kernel? The classical vertex cover problem in arbitrary graphs has a kernel with linear number of vertices.

Imbalance minimization. Given an undirected graph , and an ordering of its vertices, the imbalance of in this ordering equals

that is, the absolute value of the difference between the number of neighbours of before and after in the ordering. The imbalance of the ordering is the sum of the imbalances of all vertices. Here the authors prove that the problem of finding an ordering of imbalance at most , parameterized by , is FPT. Does this problem admit a polynomial kernel?

Max-leaf outbranching, parameterized by treewidth. In a directed graph, an outbranching is a subgraph that is a rooted tree, where each arc is directed downwards. In the max-leaf outbranching problem we seek for an outbranching in the given graph with maximum number of leaves. We are interested in solving this problem, when we are given a tree decomposition of of width , that is, we study treewidth DPs. Here we have shown an -time randomized algorithm, but we could not get a matching lower bound (as we did for most other problems studied there). Is the optimal base of the exponent? (Of course, assuming Strong ETH). Or maybe you can do better?

Our Approximation Algorithm Library is Gaining Momentum

As theoreticians we usually tend to be over pessimistic about practical applicability of our results. This becomes even more visible in the area of approximation algorithms. It seems to be the case that algorithms with theoretical guarantees on approximation ratios are easily beaten by meta-heuristics like Simulated Annealing. Here, for example the recent 1.39-approximation algorithm for Steiner tree problem comes to mind. The algorithm, at first glance, seems too complicated to be implemented, and it seems that even if it was implemented it cannot deliver decent results in practice. However, both of these statements are only partially true. First of all, if one has the right tools, then implementing this algorithm is not impossible - the code developed by Maciej Andrejczuk can be seen here: http://siekiera.mimuw.edu.pl:8082/#/c/59/. (It is still undergoing some reviews, but should be shortly merged into the master branch.) The code is based on the iterative rounding framework we have developed in our approximation algorithms library: http://siekiera.mimuw.edu.pl/~paal/. The algorithm, when enhanced with some speed up heuristics, is able to solve reasonable size instances. Although, more improvements and tests are planned.

The other things we have implemented using the iterative rounding framework include 2-apprixmate solution to the tree augmentation problem, as well as the +1-approximation algorithm for the degree bounded spanning tree problem by Singh and Lau. Although, the second one of them still requires adding some documentation. On the other hand, we have developed a nice framework for local search algorithms that can be used to solve: TSP, capacitated and non-capacitated facility location k-median, as well as some greedy approximation algorithms. Finally, we are currently working on a framework for primal-dual algorithms, which could be used to implement several classical approximation algorithms. So stay tuned!

3 of our papers accepted for FOCS!

It seems that the 'God of FOCS' was favorable for us again and we managed to get the following 3 papers accepted for FOCS 2013:

  • The planar directed k-Vertex-Disjoint Paths problem is fixed-parameter
    tractable, Marek Cygan, Dániel Marx, Marcin Pilipczuk and Micha? Pilipczuk, arXiv.
  • Improved approximation for 3-dimensional matching via bounded pathwidth local search, Marek Cygan, arXiv.
  • Algebraic Algorithms for b-Matching, Shortest Undirected Paths, and f-Factors, Harold N. Gabow and Piotr Sankowski, arXiv, post on the paper.

In the search for lost ark - the shortest path problem we have forgotten to solve

The shortest path problem is one of the central research problem in the algorithmic research. The main setup for this problem is to find distances from given start vertex to all other nodes in the graph. Probably, the most important results here are Dijkstra and Belmann-Ford algorithms -- both from around 1960. Dijkstra considered directed graphs with non-negative weight edges. His algorithm can be implemented to run in O(m+n log n) time using Fibonacci heaps. On the other hand, Belman-Ford solved the directed problem with negative weights in O(nm) time. You might note that for non-negative weights the shortest paths problem in undirected graph can be solved via reduction to directed case, so Dijkstra is applicable here as well. Although there exists a faster linear time solution for undirected graph with integral weights that is due to Thorup [1]. Similarly, for the case of directed graphs with negative weights there has been some progress. First, in 80s two scaling algorithms working in [2] and [3] have been given. In these algorithms one assumes that edge weights are integral with absolute value bounded by W. Second, in 2005 two algebraic algorithms working in time have been presented [4,5], where  is matrix multiplication exponent. There is even more work in between these results that I did not mention, and much more papers studying all pairs shortest paths, or special cases like planar graphs. However, we have not mentioned the undirected shortest path problem on graphs with negative weight edges yet... Is this problem actually solvable in polynomial time? Yet it is, this has been shown by Edmonds in '67 via a reduction to matchings [6]. These notes are probably lost... maybe not lost but probably hard to access. Anyway the reduction is given in Chapter 12.7 of [7]. Let us recall it We will essentially show that in order to find the distance from fixed source s to fixed sink t one needs to solve minimum weight perfect matching problem once.

Let G=(V,E) be an undirected graph, let be the edge weight function and let be the set of edges with negative weights. We will define a graph that models paths in G by almost perfect matchings. We define the split graph with the weight function in the following way

An undirected graph G and its graph are shown on the figure below.

fig5

In zigzag edges weigh -1, dashed edges weigh 1 and the remaining edges weigh 0. Vertices corresponding to negative edges of G are white squares. The far right shows a matching of weight -2, which corresponds to a shortest path between a and c. Note how a length-two path in G, say a,b,c with w(ab)?0>w(bc) and e=bc, corresponds to
a matching in such as , having the same total weight.

An important property is that we can assume . This follows since we can assume , as otherwise the set of negative edges contains a cycle.

Let us consider minimum weight perfect matchings in , then we can observe the following.

Lemma 1.
Let , let M be the minimum weight perfect matching, and let be the minimum weight almost perfect matching in that does not match . If G does not contain negative weight cycles then and the shortest path weight from u to v in G is equal to .

Hence, in order to solve the same problem as Belman and Ford did, i.e., to compute the distances from given source to all other nodes we need to run some matching algorithm O(n) times. This does not seem to be right. Even using fast implementation of Edmonds weighted matching algorithm [8] one needs time, whereas Belman-Ford algorithm works just in O(nm) time. There seem to be something lost here, or at least overlooked through the years. Searching through the literature one can find partial answers that shed some light on the structure of such shortest paths. Sebö has characterized the structure of single-source shortest paths in undirected graphs, first for graphs with ±1 edge weights [9] and then extending to general weights by reduction [10]. Equation (4.2) of [9] (for ±1-weights, plus its version achieved by reduction for arbitrary weights) characterizes the shortest paths from a fixed source in terms of how they enter and leave "level sets" determined by the distance function. However, this is just a partial answer as it does not show how the shortest path "tree" looks like, or does not give an efficient way to compute it. We write "tree", because one might observe that the shortest paths do not necessary form a tree -- as shown on the figure below.

triangle

So how do shortest paths look like? Is there some notion of shortest path tree here? If yes then is this shortest path tree of O(n) size? In our recent paper with Hal Gabow, that is available on arXiv, we give answers to these question. For undirected shortest paths we get a somewhat simple definition of a generalized shortest-path tree -- see Section 6.1 of our paper. (It seems to us that such a definition may have been overlooked due to reliance on reductions.) The generalized shortest-path tree is a combination of the standard shortest-path tree and the matching blossom tree. This is not so astonishing if you recall the above reduction to matchings in general graphs. Examining the blossom structure of the resulting graph enables us to define our generalized shortest-path tree that, like the standard shortest-path tree for directed graphs, specifies a shortest path to every vertex from a chosen source. We give a complete derivation of the existence of this shortest-path structure, as well as an algebraic algorithm to construct it in time . We also construct the structure with combinatoric algorithms, in time or . Hence, this settles the problem, as these bounds are all within logarithmic factors of the best-known bounds for constructing the directed shortest-path tree with negative weights.

[1] M. Thorup, Undirected single-source shortest paths with positive integer weights in linear time, JACM, 46(3):362--394, 1999.

[2] H. N. Gabow and R. E. Tarjan, Faster scaling algorithms for network problems, SIAM Journal on Computing, 18(5):1013--1036, 1989.

[3] A. V. Goldberg, Scaling algorithms for the shortest paths problem, SODA '93.

[4] R. Yuster and U. Zwick, Answering distance queries in directed graphs using fast matrix multiplication, FOCS'05.

[5] P. Sankowski,  Shortest Paths in Matrix Multiplication Time, ESA'05.

[6] J. Edmonds, An introduction to matching. Mimeographed notes, Engineering Summer Conference, U. Michigan, Ann Arbor, MI, 1967.

[7] R. K. Ahuja, T. L. Magnanti and J.B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice Hall, 1993.

[8] H. N. Gabow, Data Structures for Weighted Matching and Nearest Common Ancestors with Linking, SODA'90.

[9] A. Sebö, Undirected distances and the postman-structure of graphs, J. Combin. Theory Ser. B, 49(1):10--39, 1990.

[10] A. Sebö, Potentials in Undirected Graphs and Planar Multiflows, SIAM J. Comput., 26(2):582--603, 1997.

Workshop on Kernelization 2013

Last week we organized Workshop on Kernelization (WorKer 2013) in Warsaw. The programmee included

  • an update meeting on graph cut problems,
  • two tutorials (Matroid theory and kernelization by Saket Saurabh, Stefan Kratsch and Magnus Wahlström and Kernel-size lower bounds: the evidence from complexity theory by Andrew Drucker),
  • two invited talks on techniques used outside the kernelization area which might turn out useful in kernelization (on planar graphs by Piotr Sankowski and on spanners by Seth Pettie)
  • a few short contributed talks

We hope it was fun! Below you can see some pictures.

Presentation: Monge Property and Max-Flows in Planar Graphs

If you are interested in max-flow computations in planar graphs, below is the talk I have given at WORKER 2013. It gives an overview of how to use shortest paths and Monge property to get the following three results:

  • O(n log log n) time algorithm for undirected max-flow in planar graphs [1],
  • almost linear time algorithm for all-source all-sink max-flow in undirected planar graphs, i.e., it computes Gumory-Hu tree [2],
  • almost linear time algorithm for single-source all-sink max-flow in directed planar graphs [3].

[1] Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Improved algorithms for min cut and max flow in undirected planar graphs. STOC 2011: 313-322.
[2] Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen: Min st-cut Oracle for Planar Graphs with Near-Linear Preprocessing Time. FOCS 2010: 601-610.
[3] Jakub Lacki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Single Source - All Sinks Max Flows in Planar Digraphs. FOCS 2012: 599-608.
If you are interested in max-flow computations in planar graphs, below is the talk I have given at WORKER 2013. It gives an overview of how to use shortest paths and Monge property to get the following three results:

  • O(n log log n) time algorithm for undirected max-flow in planar graphs [1],
  • almost linear time algorithm for all-source all-sink max-flow in undirected planar graphs, i.e., it computes Gumory-Hu tree [2],
  • almost linear time algorithm for single-source all-sink max-flow in directed planar graphs [3].

[1] Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Improved algorithms for min cut and max flow in undirected planar graphs. STOC 2011: 313-322.
[2] Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen: Min st-cut Oracle for Planar Graphs with Near-Linear Preprocessing Time. FOCS 2010: 601-610.
[3] Jakub Lacki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Single Source - All Sinks Max Flows in Planar Digraphs. FOCS 2012: 599-608.
If you are interested in max-flow computations in planar graphs, below is the talk I have given at WORKER 2013. It gives an overview of how to use shortest paths and Monge property to get the following three results:
- O(n log log n) time algorithm for undirected max-flow in planar graphs [1],
- almost linear time algorithm for all-source all-sink max-fl

[1] Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Improved algorithms for min cut and max flow in undirected planar graphs. STOC 2011: 313-322.
[2] Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen: Min st-cut Oracle for Planar Graphs with Near-Linear Preprocessing Time. FOCS 2010: 601-610.
[3] Jakub Lacki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Single Source - All Sinks Max Flows in Planar Digraphs. FOCS 2012: 599-608.

If you are interested in max-flow computations in planar graphs, below is the talk I have given at WORKER 2013. It gives an overview of how to use shortest paths and Monge property to get the following three results:
- O(n log log n) time algorithm for undirected max-flow in planar graphs [1],
- almost linear time algorithm for all-source all-sink max-flow in undirected planar graphs, i.e., it computes Gumory-Hu tree [2],
- almost linear time algorithm for single-source all-sink max-low in directed planar graphs [3].

[1] Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Improved algorithms for min cut and max flow in undirected planar graphs. STOC 2011: 313-322.
[2] Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen: Min st-cut Oracle for Planar Graphs with Near-Linear Preprocessing Time. FOCS 2010: 601-610.
[3] Jakub Lacki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Single Source - All Sinks Max Flows in Planar Digraphs. FOCS 2012: 599-608.

cat row 1, celldf adsf sad fdsa fsda fsdf sadad sda dsa ds 2
cat row 1, celldf adsf sad fdsa fsda fsdf sadad sda dsa ds 2
cat row 1, celldf adsf sad fdsa fsda fsdf sadad sda dsa ds 2

According to the report by Foundation for Polish Science working in Poland can be seen as hurting your scientific carrier (114873,10657310, Badacze_o_Polsce__praca_tam_moze_zaszkodzic_karierze.html">article in polish).

cat

We do not really agree with it and the above meme was meant to be self-ironic. It of course could be better - as everywhere. Anyway, if you would like to see how it is to work in Poland we have two postdoc positions in algorithms open - see the call.
According to the report by Foundation for Polish Science working in Poland can be seen as hurting your scientific carrier (114873, 10657310,Badacze_o_Polsce__praca_tam_moze_zaszkodzic_karierze.html">article in polish).

cat

We do not really agree with it and the above meme was meant to be self-ironic. It of course could be better - as everywhere. Anyway, if you would like to see how it is to work in Poland we have two postdoc positions in algorithms open - see the call.
If you are interested in max-flow computations in planar graphs, below is the talk I have given at WORKER 2013. It gives an overview of how to use shortest paths and Monge property to get the following three results:

  • O(n log log n) time algorithm for undirected max-flow in planar graphs [1],
  • almost linear time algorithm for all-source all-sink max-flow in undirected planar graphs, i.e., it computes Gumory-Hu tree [2],
  • almost linear time algorithm for single-source all-sink max-flow in directed planar graphs [3].

[1] Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Improved algorithms for min cut and max flow in undirected planar graphs. STOC 2011: 313-322.
[2] Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen: Min st-cut Oracle for Planar Graphs with Near-Linear Preprocessing Time. FOCS 2010: 601-610.
[3] Jakub Lacki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen: Single Source - All Sinks Max Flows in Planar Digraphs. FOCS 2012: 599-608.