The subgraph isomorphism problem asks whether a graph G has a subgraph G' ⊂ G that is isomorphic to a graph P. So basically you have the picture on the box of a puzzle (G) and want to know where a particular piece (P) fits, if at all. It is NP-complete because Hamiltonian cycle is a special case.

In 1976 Ullmann proposed a backtracking algorithm for this problem. The writeup in the original paper is hard to follow because the pseudo-code doesn’t use functions or even loops. Maybe those weren’t invented back then.¹

The basic algorithm

It is possible to encode a subgraph isomorphism as a |V_P| × |V_G| matrix M in which each row contains exactly one 1 and each column contains at most one 1. We set m_ij to 1 iff v_j ∈ G corresponds to v_i ∈ P in the isomorphism. Then P = M(MG)^T, where I use P and G to stand for the adjacency matrices. If we aren’t looking for induced subgraphs, P ≤ M(MG)^T (componentwise), i.e. the subgraph of G selected by M might contain additional edges not present in P.
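As a sanity check, the matrix condition can be verified mechanically. Here is a small sketch in Python using numpy; the example graphs are made up for illustration:

```python
import numpy as np

# G: a triangle on vertices 0,1,2 plus a pendant vertex 3 attached to 2.
G = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
# P: a triangle.
P = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
# M maps P's vertex i to G's vertex j where M[i, j] == 1.
M = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# Induced case: P must equal M (M G)^T.
print(np.array_equal(P, M @ (M @ G).T))       # True for this M

# Non-induced case: P <= M (M G)^T componentwise.
print(bool(np.all(P <= M @ (M @ G).T)))       # True
```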

The algorithm works by systematically enumerating possible matrices M and checking whether they actually encode an isomorphism.

We start by setting up a |V_P| × |V_G| matrix M⁰ that contains a 1 at (i,j) if it is possible that v_i ∼ v_j in some subgraph isomorphism. For now, we only use the degree as a criterion, i.e. we can map v_i to v_j if the latter has enough neighbours: m⁰_ij = 1 iff deg(v_i) ≤ deg(v_j). If we were cleverer we could remove more 1’s and reduce our runtime.
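Building M⁰ from the degree criterion takes only a few lines. A sketch in Python, with adjacency matrices as lists of 0/1 rows (the function name is my own):

```python
def initial_candidates(P, G):
    """M0[i][j] = 1 iff deg(v_i) in P <= deg(v_j) in G."""
    deg_P = [sum(row) for row in P]
    deg_G = [sum(row) for row in G]
    return [[1 if dp <= dg else 0 for dg in deg_G] for dp in deg_P]

# Triangle P into a triangle-plus-pendant G: vertex 3 of G has degree 1,
# so no vertex of P can be mapped to it.
P = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
G = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
M0 = initial_candidates(P, G)
print(M0)  # every row is [1, 1, 1, 0]
```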

Now all we need to do is try all matrices M that can be obtained from M⁰ by removing all but one 1 from each row while keeping at most one 1 in each column. We do that recursively.

recurse(used_columns, cur_row, G, P, M)
    if cur_row = num_rows(M)
        if M is an isomorphism
            output yes and end the algorithm
        return

    M' = M
    prune(M')

    for all unused columns c where M'(cur_row, c) = 1
        set (cur_row, c) in M' to 1 and the rest of row cur_row to 0
        mark c as used
        recurse(used_columns, cur_row+1, G, P, M')
        mark c as unused

output no once the initial call to recurse returns

Making the M' copy of M isn’t really necessary yet, but it will become so once we implement our pruning procedure.
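Putting the pieces so far together, here is a minimal runnable Python version of the search, without the pruning step yet. The names (find_isomorphism, is_embedding) are my own, and instead of aborting the whole program on success it returns the assignment:

```python
def is_embedding(assignment, P, G):
    """True iff sending P's vertex i to G's vertex assignment[i] maps every
    edge of P to an edge of G (the componentwise P <= M(MG)^T check)."""
    n = len(P)
    return all(not P[i][j] or G[assignment[i]][assignment[j]]
               for i in range(n) for j in range(n))

def find_isomorphism(P, G):
    """Backtracking over the rows of M; returns a vertex mapping or None."""
    nP, nG = len(P), len(G)
    # M0 from the degree criterion.
    M = [[1 if sum(P[i]) <= sum(G[j]) else 0 for j in range(nG)]
         for i in range(nP)]
    used = [False] * nG
    assignment = [None] * nP

    def recurse(cur_row):
        if cur_row == nP:
            return is_embedding(assignment, P, G)
        for c in range(nG):
            if M[cur_row][c] and not used[c]:
                assignment[cur_row] = c
                used[c] = True
                if recurse(cur_row + 1):
                    return True
                used[c] = False
        return False

    return list(assignment) if recurse(0) else None

P = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]                       # triangle
G = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
print(find_isomorphism(P, G))  # [0, 1, 2]
```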

Pruning

We want to (safely) change at least some of the 1’s in our matrix to 0’s to reduce the computation time. For that, we use a simple observation. If some p ∈ V_P has neighbours p_1, …, p_l, and we map it to some g ∈ V_G, then we’d better also map p_1, …, p_l to neighbours of g.

Remember that a 1 at (i,j) in M means that we still think that v_i ∈ P can correspond to v_j ∈ G. But if we have already found out that some neighbour of v_i can’t be mapped to any neighbour of v_j, then the 1 at (i,j) is clearly wrong and we can safely change it to a 0. This change might make more mappings impossible, so we iterate this check until nothing changes. If we remove all 1’s from a row during this refinement, we can stop the whole process, since M can’t be completed to an isomorphism anymore.

do
    for all (i,j) where M is 1
        for all neighbors x of vi in P
            if there is no neighbor y of vj s.t. M(x,y)=1
                M(i,j)=0
while M was changed
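The refinement loop above, sketched in Python on the same list-of-lists representation, including the early exit when a row runs empty:

```python
def prune(M, P, G):
    """Repeatedly clear M[i][j] if some neighbour of v_i in P has no
    remaining candidate among the neighbours of v_j in G.
    Returns False if a row becomes all zero, True otherwise."""
    nP, nG = len(P), len(G)
    changed = True
    while changed:
        changed = False
        for i in range(nP):
            for j in range(nG):
                if M[i][j]:
                    for x in range(nP):
                        if P[i][x] and not any(G[j][y] and M[x][y]
                                               for y in range(nG)):
                            M[i][j] = 0
                            changed = True
                            break
            if not any(M[i]):
                return False   # row i is empty: no completion possible
    return True

# A triangle cannot embed into the path 0-1-2: pruning empties a row.
P = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
M = [[1 if sum(P[i]) <= sum(path[j]) else 0 for j in range(3)]
     for i in range(3)]
print(prune(M, P, path))  # False
```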

Now the effectiveness of our pruning procedure depends on the order of the vertices. The earlier in the recursion we find a 1 that can be changed to a 0, the better. Thus it is a good idea to order the vertices of P so that high-degree vertices come first.
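One simple way to get such an order is a hypothetical helper that sorts P’s vertices by descending degree:

```python
def degree_order(P):
    """Vertex indices of P, highest degree first (ties keep index order)."""
    return sorted(range(len(P)), key=lambda v: -sum(P[v]))

# On a path 0-1-2 the middle vertex has the highest degree and comes first.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(degree_order(path))  # [1, 0, 2]
```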

Clever Implementation

This algorithm seems rather costly, since we do a matrix multiplication for every leaf in the recursion tree and manipulate the matrix quite a lot in between. However, note that we’re dealing with Boolean matrices here. It’s a good idea to encode them as bit vectors. Then multiplication can be done efficiently using bit-twiddling, even if we still use the naive O(n³) algorithm. Similarly, setting a column to 1 and the other columns to 0 can be done with bit-twiddling, and finding viable neighbours during the pruning step is fast too. Using these implementation tricks we can speed up the naive algorithm by some largish constant factor, depending on the word size of the CPU and whether it supports SSE or similar vector operations.
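To illustrate the bit-vector idea, here is a sketch in Python with ints as bit sets; in C one would use machine words the same way:

```python
def rows_to_bits(A):
    """Encode each 0/1 row as an int, bit j set iff A[row][j] == 1."""
    return [sum(1 << j for j, x in enumerate(row) if x) for row in A]

G = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
M = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
G_bits, M_bits = rows_to_bits(G), rows_to_bits(M)

# The pruning test "does some neighbour y of v_j in G have M[x][y] = 1"
# collapses to a single AND over whole rows:
x, j = 1, 0
print((G_bits[j] & M_bits[x]) != 0)  # True

# Likewise, "remove all but one 1 from row cur_row" is just a shift:
cur_row, c = 0, 2
M_bits[cur_row] = 1 << c
```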


  1. Clifford Wolf says: I think those were invented in the late 60’s by Dijkstra (“Structured Programming” by E.W.Dijkstra in Nato Science Committee - Software Engineering Techniques in April 1970 and “GO TO Statement Considered Harmful” by E.W.Dijkstra in CACM V 11 No. 3 in March 1968). But afaik the concept of programming using block structures was unknown to most people in computer science before Kernighan and Plauger’s 1974 book “The Elements of Programming Style” and probably Nassi and Shneiderman’s 1973 paper “Flowchart techniques for structured programming”. So at least it was a rather new idea in 1976.


CC-BY-SA Adrian Neumann (PGP Key A0A8BC98)

adriann.github.io
