
3. Visual Analysis of Weighted Directed Graphs 45

3.5. Visual Analysis of Graph Motifs

3.5.2. Graph Motifs

The algorithmic determination of motif frequencies (i.e., the maximum number of occurrences of a motif in a graph) for all possible motifs of a certain size is an NP-hard problem [GK07]. Schreiber et al. [SS04] define three concepts for determining pattern frequency. Under concept $F_1$, all occurrences of a pattern are counted. Under concept $F_2$, only motifs with non-overlapping edges are counted. Concept $F_3$ restricts $F_2$ further by disallowing the reuse of nodes in counting. Concept $F_1$ finds all possible matches of a motif and thus gives a complete overview of the patterns in the graph. Therefore, we use this concept in our work.

For finding motifs, exact search is generally preferred [SS05, WR06, MSOI02]. However, many approaches use heuristics to accelerate the analysis [SS04, Sch08, Wer06, GK07]. These heuristics are developed for finding all motifs of a certain size. In our case, we concentrate on one selected (predefined or user-defined) motif. Therefore, we follow the approach of [GK07] for finding motifs.

In our work, we have implemented specific search procedures for the predefined motifs in order to reduce their computation time. They exploit the motif structure during the search and apply symmetry-breaking rules to avoid isomorphic matches, as proposed in [GK07] (see Algorithm 3.5.2.1 for an example feed-back search using the notation presented below). Such adjustments are not possible for user-defined motifs that are not included in the set of predefined motifs, as their structure is not known in advance. In the following, we therefore present a modified algorithm for finding user-defined motifs based on the algorithm of Grochow and Kellis [GK07].

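Algorithm 3.5.2.1 FINDFEEDBACKMOTIFS(G)

Input: $G = (V^G, E^G)$, the searched graph; all vertices $v^G \in V^G$ have unique identifiers $\mathrm{ID}(v^G) \in \{1, \dots, n\}$, $n = |V^G|$

Output: $M$, all matches of the feed-back motif found in graph $G$; $M = \bigcup_i M_i$, $M_i = (V_i, E_i)$, $V_i \subseteq V^G$, $E_i \subseteq E^G$

    $M \leftarrow \emptyset$
    for all $v^G_1 \in V^G$ do
        for all $v^G_2 \in \mathrm{Successors}(v^G_1) \setminus \{v^G_1\}$ do
            if $\mathrm{ID}(v^G_2) > \mathrm{ID}(v^G_1)$ then {symmetry breaking}
                for all $v^G_3 \in \mathrm{Successors}(v^G_2) \setminus \{v^G_2\}$ do
                    if $\mathrm{ID}(v^G_3) > \mathrm{ID}(v^G_1)$ and $v^G_1 \in \mathrm{Successors}(v^G_3)$ then {symmetry breaking, closing edge}
                        $e^G_{1,2} \leftarrow (v^G_1, v^G_2)$
                        $e^G_{2,3} \leftarrow (v^G_2, v^G_3)$
                        $e^G_{3,1} \leftarrow (v^G_3, v^G_1)$
                        $M \leftarrow M \cup \{(\{v^G_1, v^G_2, v^G_3\}, \{e^G_{1,2}, e^G_{2,3}, e^G_{3,1}\})\}$
                    end if
                end for
            end if
        end for
    end for
    return $M$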
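The feed-back search can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: it assumes the graph is given as a dictionary mapping each integer vertex ID to the set of its successor IDs, and the function name `find_feedback_motifs` is hypothetical. The sketch tests explicitly that the closing edge from the third vertex back to the first exists, and it uses the vertex with the smallest ID as the anchor for symmetry breaking, so that each directed 3-cycle is reported once.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: vertex ID -> set of successor IDs.
Graph = Dict[int, Set[int]]


def find_feedback_motifs(succ: Graph) -> List[Tuple[int, int, int]]:
    """Return every directed 3-cycle v1 -> v2 -> v3 -> v1 exactly once."""
    matches: List[Tuple[int, int, int]] = []
    for v1 in succ:
        for v2 in succ[v1]:
            # Symmetry breaking: v1 must carry the smallest ID of the cycle.
            # The strict comparison also skips self-loops.
            if v2 <= v1:
                continue
            for v3 in succ.get(v2, set()):
                if v3 <= v1 or v3 == v2:
                    continue
                # The closing edge v3 -> v1 must exist in the graph.
                if v1 in succ.get(v3, set()):
                    matches.append((v1, v2, v3))
    return matches


example = {1: {2}, 2: {3}, 3: {1, 4}, 4: {2}}
cycles = sorted(find_feedback_motifs(example))
```

For the example graph, the two cycles 1→2→3→1 and 2→3→4→2 are found, each under its smallest vertex ID.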

The algorithm of Grochow and Kellis [GK07] describes motif search in undirected graphs. To find motifs in directed graphs, this algorithm is performed first, and the found matches are then divided according to their isomorphic types. This part of the algorithm is, however, not described in more detail. Searching for motifs in the undirected graph first increases the need for additional discrimination of the found motifs according to edge direction. For example, in the undirected case, feed-forward and feed-back motifs cannot be distinguished.

Consequently, more candidate matches need to be checked. We overcome this drawback by checking edge direction directly during the algorithm iterations. This leads to a lower number of matches that need to be checked for isomorphism at the end of the procedure and to earlier termination of the iterations owing to stricter matching criteria. Moreover, these direction-preserving checks also cover bi-directional edges, which is not possible in the algorithm of Grochow and Kellis [GK07]. In practice, checking for neighbors and degrees in the original algorithm has been replaced by checking for child and parent relationships and for in- and out-degrees. Additional improvements include 1) sorting the motif vertices only once at the beginning of the procedure rather than in each iteration, 2) omitting the checks for non-neighboring nodes in ISOMORPHICEXTENSIONS (see Algorithm 3.5.2.3), as they do not lead to new restrictions, and 3) speeding up the search for new node matches by checking only those nodes that are neighbors of the already matched node being examined (not of all matched nodes) and by checking for support of the new match. These changes lead to a significant improvement in computation time. The improvement depends on the structure and size of the graph. For example, for finding feed-back motifs in a graph with ca. 105,000 nodes and ca. 124,000 edges (see Section 3.7.2), our procedure is ca. 10 times faster than a comparable implementation of the algorithm of Grochow and Kellis [GK07].
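The ambiguity of the undirected case can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: edges are pairs `(u, v)` pointing from `u` to `v`, and the helper names `undirected_skeleton` and `classify_triangle` are hypothetical. The feed-forward and the feed-back triangle share the same undirected skeleton and can only be told apart once edge direction is taken into account, here via the out-degree sequence.

```python
from collections import Counter
from typing import FrozenSet, Set, Tuple

Edge = Tuple[int, int]  # assumed convention: (u, v) points from u to v


def undirected_skeleton(edges: Set[Edge]) -> Set[FrozenSet[int]]:
    """Forget edge direction."""
    return {frozenset(e) for e in edges}


def classify_triangle(edges: Set[Edge]) -> str:
    """Classify three directed edges on three vertices (one edge per pair)."""
    vertices = {v for e in edges for v in e}
    skeleton = undirected_skeleton(edges)
    is_simple_triangle = (
        len(vertices) == 3
        and len(edges) == 3
        and len(skeleton) == 3
        and all(len(pair) == 2 for pair in skeleton)
    )
    if not is_simple_triangle:
        return "other"
    out_deg = Counter(u for u, _ in edges)
    out_sequence = sorted(out_deg[v] for v in vertices)
    # A directed cycle has out-degrees 1, 1, 1; the transitive triangle 0, 1, 2.
    return "feed-back" if out_sequence == [1, 1, 1] else "feed-forward"


feed_forward = {(1, 2), (2, 3), (1, 3)}
feed_back = {(1, 2), (2, 3), (3, 1)}
same_skeleton = undirected_skeleton(feed_forward) == undirected_skeleton(feed_back)
```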

The new algorithm consists of the procedures described below (see also Algorithms 3.5.2.2 to 3.5.2.6). The main procedure is FINDSUBGRAPHINSTANCES (see Algorithm 3.5.2.2). Where possible, we use the same notation as in [GK07] in order to make the two algorithms easy to compare. The algorithm searches for all matches $M$ of the motif graph $H = (V^H, E^H)$ in the graph $G = (V^G, E^G)$. $V^H$ and $V^G$ are the vertex sets of graphs $H$ and $G$; $E^H$ and $E^G$ are their edge sets. $M = \bigcup_i M_i$, $M_i = (V_i, E_i)$, $V_i \subseteq V^G$, $E_i \subseteq E^G$, where $V_i$ is determined by the function $f_{V_i} : V^H \to V^G$ and $E_i$ by $f_{E_i} : E^H \to E^G$. Note that the superscript denotes the graph to which an element belongs. In the algorithms, $m^H, dm^H$ and $m^G, dm^G$ denote vertices of graphs $H$ and $G$, respectively. $D^H_V \subseteq V^H$ is the domain of the function $f_V$, and $D^H_E \subseteq E^H$ is the domain of the function $f_E$. A procedure call is written in small capital letters, e.g., CALLPROCEDURE.

• FINDSUBGRAPHINSTANCES($H, G$): The main procedure for finding all matches of the motif $H$ in graph $G$. It follows Grochow and Kellis [GK07]. The procedure iterates over all nodes $g^G \in V^G$ and tries to match each of them to one of the nodes $h^H \in V^H$. At the end of each iteration, the node $g^G$ and its incident edges are removed from the graph (see Algorithm 3.5.2.2).

• FINDAUTOMORPHISMS($H$): The procedure finds all automorphisms of the motif $H$. For this purpose, FINDSUBGRAPHINSTANCES($H, H$) is called without the symmetry-breaking parts.

• FINDSYMMETRYBREAKINGCONDITIONS($H, Aut^H$): The procedure creates symmetry-breaking conditions for the motif graph $H$ given all its automorphisms $Aut^H$. It follows the procedure proposed by Grochow and Kellis [GK07]. We refer to this publication for details.

• ORDERNODES($G$): A procedure for sorting the nodes $V^G$ of graph $G$ in increasing order using criteria that take into account the in- and out-degrees of the nodes (in contrast to the total degrees used by Grochow and Kellis [GK07]). The comparison starts with the larger of the out- and in-degree. If these maximum degrees are equal, the other (smaller) out-/in-degrees are compared. If they are equal as well, the largest out- and in-degrees among all neighbors of the two vertices are compared in the same way.

• NODEGCANSUPPORTNODEH($h^H, g^G, H, G$): The procedure checks whether the node $g^G$ can support the node $h^H$. The test succeeds if both the out-degree and the in-degree of node $g^G$ are larger than or equal to the corresponding out-/in-degree of node $h^H$.

• ISOMORPHICEXTENSIONS($f_V, H, G, [C]$): A recursive procedure that finds new motif matches by adding new node matches $f(m^H) = m^G$ to the already matched motif nodes in $f_V$. The recursion terminates when all motif nodes $V^H$ have a match in graph $G$ (see Algorithm 3.5.2.3). The parameter $C$ is used when symmetry-breaking conditions are to be applied.

• TESTFORFULLMATCH($f_V, H, G$): A procedure testing whether the matched nodes form a complete match, i.e., whether all vertices and edges of $H$ have corresponding vertices and edges in $G$ (see Algorithm 3.5.2.4).

• GETMOSTCONSTRAINEDNODENOTIND($f_V, H$): A procedure that selects the next motif node $m^H \notin D^H$ to be matched in graph $G$ as the most constrained node among all neighbors of the already matched nodes $d \in D^H$, $D^H \subseteq V^H$. The procedure follows Grochow and Kellis [GK07]. Additionally, it returns a node $dm^H \in D^H$ that is the already matched neighbor of $m^H$.

• GETPOSSIBLEMATCHESOFMING($dm^H, m^H, dm^G, f_V, H, G$): The procedure finds all possible matching nodes $m^G$ of node $m^H$ in graph $G$. The difference to Grochow and Kellis [GK07] lies in the iteration. They iterate over all nodes outside $D$ that neighbor any node in $D$, whereas we iterate only over the neighbors of the already matched node $dm^G$ that is currently being examined. We also distinguish the type of relation between $m$ and $dm$, i.e., whether we search for successors, predecessors, or both (in the case of a bi-directional relationship). We additionally check for support of the new match (see Algorithm 3.5.2.5).

• TESTPOSSIBLEMATCH($m^H, m^G, f_V, H, G, [C]$): A procedure testing whether a possible matching node $m^G \in V^G$ of node $m^H \in V^H$, found by GETPOSSIBLEMATCHESOFMING, obeys the neighborhood constraints and, if applicable, the symmetry-breaking constraints. In contrast to Grochow and Kellis [GK07], successor and predecessor nodes (not neighbors in general) are tested independently. Testing for non-neighbors is omitted (see Algorithm 3.5.2.6).
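The two degree-based procedures, ORDERNODES and NODEGCANSUPPORTNODEH, can be sketched in Python as follows. This is one possible reading of the textual description, not the implementation used in this work. It assumes that a graph is a dictionary mapping every vertex (including vertices without outgoing edges) to the set of its successors; the function names and the exact tie-breaking order of the sort key are assumptions.

```python
from typing import Dict, Hashable, List, Set

Vertex = Hashable
Graph = Dict[Vertex, Set[Vertex]]  # assumed: every vertex appears as a key


def predecessors(succ: Graph) -> Graph:
    """Build the predecessor sets from the successor sets."""
    pred: Graph = {v: set() for v in succ}
    for u, outs in succ.items():
        for v in outs:
            pred[v].add(u)
    return pred


def order_nodes(succ: Graph) -> List[Vertex]:
    """Sort vertices increasingly by directed-degree criteria."""
    pred = predecessors(succ)

    def key(v: Vertex):
        out_d, in_d = len(succ[v]), len(pred[v])
        neighbors = succ[v] | pred[v]
        # Largest out- and in-degree found among the neighbors of v.
        nb_out = max((len(succ[n]) for n in neighbors), default=0)
        nb_in = max((len(pred[n]) for n in neighbors), default=0)
        # Larger degree first, then the other one, then the same for neighbors.
        return (max(out_d, in_d), min(out_d, in_d),
                max(nb_out, nb_in), min(nb_out, nb_in))

    return sorted(succ, key=key)


def can_support(h: Vertex, g: Vertex, h_succ: Graph, h_pred: Graph,
                g_succ: Graph, g_pred: Graph) -> bool:
    """True if graph node g has at least the out- and in-degree of motif node h."""
    return (len(g_succ[g]) >= len(h_succ[h])
            and len(g_pred[g]) >= len(h_pred[h]))


graph = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": set()}
motif = {"x": {"y"}, "y": set()}
order = order_nodes(graph)
```

In the example, the isolated vertex `d` is sorted first, and node `c` cannot support motif node `x` because it has no outgoing edge.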

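Algorithm 3.5.2.2 FINDSUBGRAPHINSTANCES(H, G)

Input: $H = (V^H, E^H)$, the motif graph; $G = (V^G, E^G)$, the searched graph

Output: $M$, all matches of motif $H$ found in $G$; $M = \bigcup_i M_i$, $M_i = (V_i, E_i)$, $V_i \subseteq V^G$, $E_i \subseteq E^G$, where $V_i$ is determined by the function $f_{V_i} : V^H \to V^G$ and $E_i$ by $f_{E_i} : E^H \to E^G$

    $M \leftarrow \emptyset$; $D^H_V \leftarrow \emptyset$; $D^H_E \leftarrow \emptyset$
    [$Aut^H \leftarrow$ FINDAUTOMORPHISMS($H$), when using symmetry breaking]
    [$C \leftarrow$ FINDSYMMETRYBREAKINGCONDITIONS($H, Aut^H$), when using symmetry breaking]
    $V^G \leftarrow$ ORDERNODES($G$); $V^H \leftarrow$ ORDERNODES($H$)
    for all $g^G \in V^G$ do
        for all $h^H \in V^H$ such that NODEGCANSUPPORTNODEH($h^H, g^G, H, G$) do
            $f_V(h^H) \leftarrow g^G$; $D^H_V \leftarrow \{h^H\}$
            $M \leftarrow M \cup$ ISOMORPHICEXTENSIONS($f_V, H, G, C$)
        end for
        $E^G \leftarrow E^G \setminus \{e^G(g^G)\}$ {remove all edges incident to vertex $g^G$ from $E^G$}
        $V^G \leftarrow V^G \setminus \{g^G\}$ {remove vertex $g^G$ from $V^G$}
    end for
    return $M$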


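Algorithm 3.5.2.3 ISOMORPHICEXTENSIONS($f_V, H, G, C$)

Input: $H = (V^H, E^H)$, $G = (V^G, E^G)$, $f_V : D^H_V \to R^G_V$, $D^H_V \subseteq V^H$, $R^G_V \subseteq V^G$; $C$ refers to the symmetry-breaking conditions to be satisfied

Output: $M$, a set of matches $(f_V, f_E)$, where $f_E : D^H_E \to R^G_E$, $D^H_E \subseteq E^H$, $R^G_E \subseteq E^G$

    $M \leftarrow \emptyset$
    $(c, f_E) \leftarrow$ TESTFORFULLMATCH($f_V, H, G$)
    if $c = \mathrm{true}$ and $(f_V, f_E) \notin M$ then
        $M \leftarrow \{(f_V, f_E)\}$
    else if $|D^H_V| < |V^H|$ then
        $\{dm^H, m^H\} \leftarrow$ GETMOSTCONSTRAINEDNODENOTIND($f_V, H$)
        if $m^H$ exists then
            $dm^G \leftarrow f_V(dm^H)$
            $R^G \leftarrow$ GETPOSSIBLEMATCHESOFMING($dm^H, m^H, dm^G, f_V, H, G$), $R^G \subseteq V^G$
            if $R^G \neq \emptyset$ then
                for all $m^G \in R^G$ do
                    if TESTPOSSIBLEMATCH($m^H, m^G, f_V, H, G, C$) is true then
                        $M \leftarrow M \cup$ ISOMORPHICEXTENSIONS($f_V \cup \{f(m^H) = m^G\}, H, G, C$)
                    end if
                end for
            end if
        end if
    end if
    return $M$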

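Algorithm 3.5.2.4 TESTFORFULLMATCH($f_V, H, G$)

Input: $H = (V^H, E^H)$, $G = (V^G, E^G)$, $f_V : D^H_V \to R^G_V$, $D^H_V \subseteq V^H$, $R^G_V \subseteq V^G$

Output: $(c, f_E)$, a flag $c$ and a set of edge matches $f_E$; $c$ is true if a match has been found, otherwise $c$ is false; $f_E : D^H_E \to R^G_E$, $D^H_E \subseteq E^H$, $R^G_E \subseteq E^G$

    $f_E \leftarrow \emptyset$
    if $|D^H_V| < |V^H|$ then
        return (false, $\emptyset$)
    else if $|D^H_V| = |V^H|$ then
        for all $e^H = (s^H_1, s^H_2) \in E^H$ do
            if $\exists\, e^G = (f_V(s^H_1), f_V(s^H_2)) \in E^G$ then
                $f_E \leftarrow f_E \cup \{f(e^H) = e^G\}$
            else
                return (false, $\emptyset$)
            end if
        end for
        if $|R^G_E| = |E^H|$ then
            return (true, $f_E$)
        end if
    end if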
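The full-match test can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work. It assumes that both graphs are dictionaries mapping every vertex to the set of its successors, that `f_v` is a dictionary from motif vertices to graph vertices, and that an edge `(u, v)` points from `u` to `v`; the function name `full_match` is hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Vertex = Hashable
Graph = Dict[Vertex, Set[Vertex]]  # assumed: every vertex appears as a key
Edge = Tuple[Vertex, Vertex]


def full_match(f_v: Dict[Vertex, Vertex], h_succ: Graph,
               g_succ: Graph) -> Tuple[bool, Dict[Edge, Edge]]:
    """Return (True, f_e) if every motif edge has a counterpart in the graph."""
    # Not all motif vertices are matched yet.
    if len(f_v) < len(h_succ):
        return False, {}
    f_e: Dict[Edge, Edge] = {}
    for s1, outs in h_succ.items():
        for s2 in outs:
            g1, g2 = f_v[s1], f_v[s2]
            # The directed edge g1 -> g2 must exist in the searched graph.
            if g2 not in g_succ.get(g1, set()):
                return False, {}
            f_e[(s1, s2)] = (g1, g2)
    return True, f_e


motif = {"x": {"y"}, "y": {"z"}, "z": {"x"}}   # feed-back motif
graph = {1: {2}, 2: {3}, 3: {1}}
ok, edge_map = full_match({"x": 1, "y": 2, "z": 3}, motif, graph)
```

Because the edges are looked up with their direction, a vertex assignment that traverses the cycle in the wrong orientation is rejected.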

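Algorithm 3.5.2.5 GETPOSSIBLEMATCHESOFMING($dm^H, m^H, dm^G, f_V, H, G$)

Input: $H = (V^H, E^H)$, $G = (V^G, E^G)$, $f_V : D^H_V \to R^G_V$, $D^H_V \subseteq V^H$, $R^G_V \subseteq V^G$, nodes $dm^H, m^H, dm^G$

Output: $R^G \subseteq V^G$

    $R^G \leftarrow \emptyset$
    if $\exists\, e^H = (m^H, dm^H)$ and $\exists\, e^H = (dm^H, m^H)$ then
        for all $d \in (\mathrm{Successors}(dm^G) \cap \mathrm{Predecessors}(dm^G)) \setminus R^G_V$ do
            if NODEGCANSUPPORTNODEH($m^H, d, H, G$) then
                $R^G \leftarrow R^G \cup \{d\}$
            end if
        end for
    else if $\exists\, e^H = (dm^H, m^H)$ then
        for all $d \in \mathrm{Successors}(dm^G) \setminus R^G_V$ do
            if NODEGCANSUPPORTNODEH($m^H, d, H, G$) then
                $R^G \leftarrow R^G \cup \{d\}$
            end if
        end for
    else {$\exists\, e^H = (m^H, dm^H)$}
        for all $d \in \mathrm{Predecessors}(dm^G) \setminus R^G_V$ do
            if NODEGCANSUPPORTNODEH($m^H, d, H, G$) then
                $R^G \leftarrow R^G \cup \{d\}$
            end if
        end for
    end if
    return $R^G$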
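The candidate search can be sketched in Python as follows. This is a sketch under stated assumptions, not the implementation used in this work: an edge `(u, v)` points from `u` to `v`, both graphs are dictionaries mapping every vertex to its successor set, and the support test is inlined as a comparison of out- and in-degrees. The names `invert` and `possible_matches` are hypothetical.

```python
from typing import Dict, Hashable, Set

Vertex = Hashable
Graph = Dict[Vertex, Set[Vertex]]  # assumed: every vertex appears as a key


def invert(succ: Graph) -> Graph:
    """Build the predecessor sets from the successor sets."""
    pred: Graph = {v: set() for v in succ}
    for u, outs in succ.items():
        for v in outs:
            pred[v].add(u)
    return pred


def possible_matches(dm_h: Vertex, m_h: Vertex, dm_g: Vertex,
                     f_v: Dict[Vertex, Vertex],
                     h_succ: Graph, g_succ: Graph) -> Set[Vertex]:
    """Graph nodes that may match m_h, given that dm_h is matched to dm_g."""
    h_pred, g_pred = invert(h_succ), invert(g_succ)
    m_to_dm = dm_h in h_succ[m_h]   # motif edge m_h -> dm_h
    dm_to_m = m_h in h_succ[dm_h]   # motif edge dm_h -> m_h
    if m_to_dm and dm_to_m:         # bi-directional relation
        candidates = g_succ[dm_g] & g_pred[dm_g]
    elif dm_to_m:                   # m_h is a successor of dm_h
        candidates = g_succ[dm_g]
    else:                           # m_h is a predecessor of dm_h
        candidates = g_pred[dm_g]
    used = set(f_v.values())        # graph nodes that are already matched
    return {
        d for d in candidates - used
        # Support test: d needs at least the out- and in-degree of m_h.
        if len(g_succ[d]) >= len(h_succ[m_h])
        and len(g_pred[d]) >= len(h_pred[m_h])
    }


graph = {1: {2, 3}, 2: {1}, 3: set()}
one_way = {"x": {"y"}, "y": set()}
two_way = {"x": {"y"}, "y": {"x"}}
forward = possible_matches("x", "y", 1, {"x": 1}, one_way, graph)
both = possible_matches("x", "y", 1, {"x": 1}, two_way, graph)
```

Only the neighbors of the single matched node `dm_g` are inspected, and the direction of the motif edge decides which neighbor set is used.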

Algorithm 3.5.2.6 TESTPOSSIBLEMATCH($m^H, m^G, f_V, H, G, C$)

Input: $m^G$, a possible match of $m^H$ in $G$; $H = (V^H, E^H)$, $G = (V^G, E^G)$, $f_V : D^H_V \to R^G_V$, $D^H_V \subseteq V^H$, $R^G_V \subseteq V^G$; $C$, the set of symmetry-breaking rules

Output: true if $m^G$ is a match of $m^H$, otherwise false

    for all $successor \in \mathrm{Successors}(m^H)$ do
        if $successor \in D^H_V$ and $f_V(successor) \notin \mathrm{Successors}(m^G)$ then
            return false
        end if
    end for
    for all $predecessor \in \mathrm{Predecessors}(m^H)$ do
        if $predecessor \in D^H_V$ and $f_V(predecessor) \notin \mathrm{Predecessors}(m^G)$ then
            return false
        end if
    end for
    if $C \neq \emptyset$ then
        return result of the check for symmetry breaking using $C$ for the new node $m^G$ (according to [GK07])
    else
        return true
    end if
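The direction-aware candidate test can be sketched in Python as follows. This is an illustration only. Graphs are assumed to be dictionaries mapping every vertex to its successor set, and the function name `possible_match_ok` is hypothetical. The encoding of the symmetry-breaking conditions is likewise an assumption made for this sketch: each condition is a pair `(a, b)` of motif vertices requiring that the graph node matched to `a` has a smaller ID than the one matched to `b`; see [GK07] for the actual construction.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Vertex = Hashable
Graph = Dict[Vertex, Set[Vertex]]  # assumed: every vertex appears as a key


def possible_match_ok(m_h: Vertex, m_g: Vertex, f_v: Dict[Vertex, Vertex],
                      h_succ: Graph, g_succ: Graph,
                      conditions: Iterable[Tuple[Vertex, Vertex]] = ()) -> bool:
    """Check the candidate pair (m_h, m_g) against the nodes matched so far."""
    # Matched successors of m_h must map to successors of m_g.
    for s in h_succ[m_h]:
        if s in f_v and f_v[s] not in g_succ[m_g]:
            return False
    # Matched predecessors of m_h must map to predecessors of m_g.
    for p, outs in h_succ.items():
        if m_h in outs and p in f_v and m_g not in g_succ[f_v[p]]:
            return False
    # Hypothetical condition format: (a, b) means ID(f(a)) < ID(f(b)).
    extended = dict(f_v)
    extended[m_h] = m_g
    for a, b in conditions:
        if a in extended and b in extended and not extended[a] < extended[b]:
            return False
    return True


motif = {"x": {"y"}, "y": {"z"}, "z": {"x"}}   # feed-back motif
graph = {1: {2}, 2: {3}, 3: {1}}
accepted = possible_match_ok("z", 3, {"x": 1, "y": 2}, motif, graph)
```

Non-neighbors of `m_h` are not inspected at all, in line with the omission of the non-neighbor checks described above.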
