
4.4 Proof Architecture

4.4.1 Library of Invariants

In the previous section, we described the proof architecture, which enables us to establish invariants of the depth-first search algorithm. In this section, we show how this architecture is put to use.

Based on the extensive state given in Section 4.3, we provide a variety of invariants that use the information in the state at different levels of detail. Note that these invariants do not depend on the extension part of the state and can thus be proven independently of the hook functions, which only update the extension part. Further note that we present them as they occur in the locale DFS_invar, which fixes the state s and assumes that the most specific invariant holds for it.

Since maps in Isabelle are partial functions whose results are wrapped in an option type, we use the shorthand notation δ s v for the discovery time of node v in state s, and ϕ s v for its finishing time.

For the sets dom (discovered s) of discovered nodes and dom (finished s) of finished nodes, we prove, among others, the following properties:

lemma (in DFS_invar) disc_lt_fin: v ∈ dom (finished s) ⟹ δ s v < ϕ s v

lemma (in DFS_invar) stack_set_def: set (stack s) = dom (discovered s) − dom (finished s)

lemma (in DFS_invar) finished_closed: E `` (dom (finished s)) ⊆ dom (discovered s)

abbreviation (in DFS_invar) reachable ≡ E^* `` V₀

lemma (in DFS_invar) nc_finished_eq_reachable:
  ¬cond s ∧ ¬is_break s ⟹ dom (finished s) = reachable

The first lemma states that for each finished node, the discovery time is smaller than the finishing time (the assumption is necessary, as unfinished nodes have no finishing time). The second lemma states that the nodes on the stack are exactly those that have already been discovered but not yet finished. The third lemma states that edges from finished nodes lead to discovered nodes, and the last lemma expresses that the finished nodes are exactly the nodes reachable from V₀ when the algorithm terminates without being interrupted.
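These invariants can be made tangible outside Isabelle. The following minimal Python sketch is only an illustration under our own assumptions, not part of the verified development: the graph is a dictionary of successor lists, the dictionaries `discovered` and `finished` stand in for the maps discovered s and finished s, and the helpers `dfs_times` and `reachable` are hypothetical. It checks stack_set_def and finished_closed during the search, and disc_lt_fin and nc_finished_eq_reachable on the final state.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # hypothetical representation: successor lists


def dfs_times(graph: Graph, roots: List[str]):
    """Recursive DFS recording discovery/finishing times (hypothetical helper)."""
    discovered: Dict[str, int] = {}  # analogue of the map 'discovered s'
    finished: Dict[str, int] = {}    # analogue of the map 'finished s'
    stack: List[str] = []            # analogue of 'stack s'
    clock = 0

    def visit(u: str) -> None:
        nonlocal clock
        discovered[u] = clock
        clock += 1
        stack.append(u)
        # stack_set_def: stack nodes = discovered nodes minus finished nodes
        assert set(stack) == set(discovered) - set(finished)
        for v in graph.get(u, []):
            if v not in discovered:
                visit(v)
        stack.pop()
        finished[u] = clock
        clock += 1
        # finished_closed: successors of finished nodes are discovered
        assert all(w in discovered for x in finished for w in graph.get(x, []))

    for r in roots:
        if r not in discovered:
            visit(r)
    return discovered, finished


def reachable(graph: Graph, roots: List[str]) -> Set[str]:
    """E^* `` V0, computed by a plain worklist search."""
    seen, todo = set(roots), list(roots)
    while todo:
        u = todo.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
disc, fin = dfs_times(g, ["a"])
assert all(disc[v] < fin[v] for v in fin)   # disc_lt_fin
assert set(fin) == reachable(g, ["a"])      # nc_finished_eq_reachable
print(sorted(fin))                           # ['a', 'b', 'c', 'd']
```

Node e is never finished because it is not reachable from the root a, exactly as nc_finished_eq_reachable predicts.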

We also prove more sophisticated properties found in standard textbooks (e. g., [5, pp. 606–608]), like the Parenthesis Theorem (the discovery/finishing intervals of two nodes are either disjoint or one is contained in the other; they never partially overlap) or the White-Path Theorem (a node v is reachable in the search tree from a node u iff there is a white path from u to v, i. e., a path all of whose nodes except u are not yet discovered when u is discovered).

lemma (in DFS_invar) parenthesis:
  assumes v ∈ dom (finished s) and w ∈ dom (finished s) and δ s v < δ s w
  shows (* disjoint *) ϕ s v < δ s w
      ∨ (* v contains w *) ϕ s w < ϕ s v

definition (in DFS_invar) white_path where
  white_path x y ≡ x ≠ y
    ⟶ (∃p. path E x p y ∧
          (δ s x < δ s y ∧ (∀v ∈ set (tl p). δ s x < δ s v)))

lemma (in DFS_invar) white_path:
  assumes v ∈ reachable and w ∈ reachable and ¬cond s ∧ ¬is_break s
  shows white_path v w ⟷ (v,w) ∈ (tree s)^*
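Both theorems can be tested on concrete runs. The following Python sketch uses hypothetical helpers `dfs_tree`, `tree_star`, and `white_path`. The last one mirrors the definition above: it looks for a path from x to y all of whose nodes after x were discovered later than x. The sketch checks the Parenthesis Theorem and the White-Path Theorem for all pairs of nodes of a small graph after the search has terminated.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def dfs_tree(graph: Graph, roots: List[str]):
    """DFS returning discovery times, finishing times and tree edges."""
    disc: Dict[str, int] = {}
    fin: Dict[str, int] = {}
    tree: Set[Tuple[str, str]] = set()
    clock = 0

    def visit(u: str) -> None:
        nonlocal clock
        disc[u] = clock
        clock += 1
        for v in graph.get(u, []):
            if v not in disc:
                tree.add((u, v))  # (u, v) becomes a tree edge
                visit(v)
        fin[u] = clock
        clock += 1

    for r in roots:
        if r not in disc:
            visit(r)
    return disc, fin, tree


def tree_star(tree: Set[Tuple[str, str]], x: str, y: str) -> bool:
    """(x, y) in (tree s)^*: reflexive-transitive closure of the tree edges."""
    seen, todo = {x}, [x]
    while todo:
        u = todo.pop()
        for (a, b) in tree:
            if a == u and b not in seen:
                seen.add(b)
                todo.append(b)
    return y in seen


def white_path(graph: Graph, disc: Dict[str, int], x: str, y: str) -> bool:
    """Path x -> y whose nodes after x were all discovered after x."""
    if x == y:
        return True
    seen, todo = {x}, [x]
    while todo:
        u = todo.pop()
        for v in graph.get(u, []):
            if v not in seen and disc[v] > disc[x]:
                seen.add(v)
                todo.append(v)
    return y in seen


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
disc, fin, tree = dfs_tree(g, ["a", "e"])
for v, w in product(fin, fin):
    if disc[v] < disc[w]:
        # parenthesis: disjoint, or v's interval contains w's
        assert fin[v] < disc[w] or fin[w] < fin[v]
    # white-path theorem
    assert white_path(g, disc, v, w) == tree_star(tree, v, w)
print(sorted(tree))  # [('a', 'b'), ('a', 'c'), ('b', 'd')]
```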

The Parenthesis Theorem is important for reasoning about paths in the search tree, as it allows us to draw conclusions about them just by looking at the timestamps:

lemma (in DFS_invar) tree_path_iff_parenthesis:
  assumes v ∈ dom (finished s) and w ∈ dom (finished s)
  shows (v,w) ∈ (tree s)⁺ ⟷ δ s v < δ s w ∧ ϕ s v > ϕ s w

This theorem expresses the relation between two finished nodes v and w: there is a path in the search tree from v to w iff the discovery and finishing times of v enclose those of w, forming the eponymous parenthesis.
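As a small illustration of reading tree paths off timestamps, the following Python sketch compares the tree relation with the purely timestamp-based test from tree_path_iff_parenthesis for every pair of finished nodes. The tree relation is computed by walking parent pointers. The names are hypothetical, and the iterative DFS with parent pointers is our own choice, not the framework's.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def dfs_iterative(graph: Graph, roots: List[str]):
    """Iterative DFS with an explicit stack of successor iterators."""
    disc: Dict[str, int] = {}
    fin: Dict[str, int] = {}
    parent: Dict[str, str] = {}  # tree edge (parent[v], v)
    clock = 0
    for r in roots:
        if r in disc:
            continue
        disc[r] = clock
        clock += 1
        stack = [(r, iter(graph.get(r, [])))]
        while stack:
            u, succs = stack[-1]
            v = next(succs, None)
            if v is None:            # all successors explored: finish u
                stack.pop()
                fin[u] = clock
                clock += 1
            elif v not in disc:      # tree edge (u, v): discover v
                parent[v] = u
                disc[v] = clock
                clock += 1
                stack.append((v, iter(graph.get(v, []))))
    return disc, fin, parent


def in_tree_plus(parent: Dict[str, str], v: str, w: str) -> bool:
    """(v, w) in (tree s)^+: v is a proper ancestor of w."""
    while w in parent:
        w = parent[w]
        if w == v:
            return True
    return False


def by_timestamps(disc: Dict[str, int], fin: Dict[str, int], v: str, w: str) -> bool:
    """Right-hand side of tree_path_iff_parenthesis."""
    return disc[v] < disc[w] and fin[v] > fin[w]


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
disc, fin, parent = dfs_iterative(g, ["a"])
for v in fin:
    for w in fin:
        assert in_tree_plus(parent, v, w) == by_timestamps(disc, fin, v, w)
print(by_timestamps(disc, fin, "a", "d"), by_timestamps(disc, fin, "c", "d"))
# True False
```

Once the timestamps are known, `by_timestamps` takes constant time, whereas walking parent pointers may take time proportional to the depth of the tree.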

From the relative position of two nodes in the search tree, we can deduce several properties of those nodes (e. g., via the → direction of tree_path_iff_parenthesis). This can be used, for example, to show properties of back edges, such as:

lemma (in DFS_invar) back_edge_impl_tree_path:
  ⟦(v,w) ∈ back_edges s; v ≠ w⟧ ⟹ (w,v) ∈ (tree s)⁺

That is, for any back edge that is not a self-loop, there exists a path in the search tree in the opposite direction.
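To see back_edge_impl_tree_path at work, the following Python sketch classifies every explored edge by the state of its target:

- undiscovered (tree edge);
- discovered but unfinished, i.e. on the stack (back edge);
- finished.

We assume that the last category, which also contains what textbooks call forward edges, corresponds to cross_edges s, since the text splits edges s into tree, cross, and back edges only. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]
Edge = Tuple[str, str]


def classify_edges(graph: Graph, roots: List[str]):
    """DFS that sorts every explored edge into tree, back, or cross edges."""
    disc: Set[str] = set()
    fin: Set[str] = set()
    parent: Dict[str, str] = {}
    tree: Set[Edge] = set()
    back: Set[Edge] = set()
    cross: Set[Edge] = set()

    def visit(u: str) -> None:
        disc.add(u)
        for v in graph.get(u, []):
            if v not in disc:       # undiscovered target: tree edge
                tree.add((u, v))
                parent[v] = u
                visit(v)
            elif v not in fin:      # target still on the stack: back edge
                back.add((u, v))
            else:                   # target finished (incl. forward edges)
                cross.add((u, v))
        fin.add(u)

    for r in roots:
        if r not in disc:
            visit(r)
    return tree, back, cross, parent


def in_tree_plus(parent: Dict[str, str], v: str, w: str) -> bool:
    """(v, w) in (tree s)^+, by walking parent pointers upwards from w."""
    while w in parent:
        w = parent[w]
        if w == v:
            return True
    return False


g = {"a": ["b", "c"], "b": ["d", "b"], "c": ["d"], "d": ["a"]}
tree, back, cross, parent = classify_edges(g, ["a"])
# back_edge_impl_tree_path: a back edge (v, w) with v != w implies (w, v) in tree+
for (v, w) in back:
    if v != w:
        assert in_tree_plus(parent, w, v)
print(sorted(back), sorted(cross))
# [('b', 'b'), ('d', 'a')] [('c', 'd')]
```

The self-loop (b, b) is also classified as a back edge, which is why the lemma needs the side condition v ≠ w.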

Example 4.4.3 (Cyclicity Checker: Proof)

The notion of cycles in the set of reachable edges is independent of any DFS instantiation. Therefore, we can provide invariants about the (a)cyclicity of those edges in the general library, the most important one linking acyclicity to the existence of back edges:

lemma (in DFS_invar) cycle_iff_back_edges:
  acyclic (edges s) ⟷ back_edges s = {}

Here, edges s is the union of all tree, cross, and back edges.

The → direction follows as an easy corollary of the lemma back_edge_impl_tree_path shown above: a back edge (v,w) is either a self-loop or, together with the tree path from w to v, closes a cycle. The ← direction follows from the fact that acyclic (tree s ∪ cross_edges s), the proof of which uses the Parenthesis Theorem.
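cycle_iff_back_edges can be checked empirically against an independent acyclicity test. The Python sketch below (hypothetical helper names) uses Kahn's algorithm, which repeatedly removes nodes without incoming edges, so that the reference check does not itself rely on DFS. It asserts that edges s is acyclic exactly when no back edge was found.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]
Edge = Tuple[str, str]


def classify_edges(graph: Graph, roots: List[str]):
    """Return (tree, back, cross) edges of a DFS from the given roots."""
    disc: Set[str] = set()
    fin: Set[str] = set()
    tree: Set[Edge] = set()
    back: Set[Edge] = set()
    cross: Set[Edge] = set()

    def visit(u: str) -> None:
        disc.add(u)
        for v in graph.get(u, []):
            if v not in disc:
                tree.add((u, v))
                visit(v)
            elif v not in fin:
                back.add((u, v))
            else:
                cross.add((u, v))
        fin.add(u)

    for r in roots:
        if r not in disc:
            visit(r)
    return tree, back, cross


def acyclic(edges: Set[Edge]) -> bool:
    """Kahn's algorithm: repeatedly remove nodes with no incoming edges."""
    nodes = {x for e in edges for x in e}
    indeg = {n: 0 for n in nodes}
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1
    sources = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while sources:
        n = sources.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                sources.append(m)
    return removed == len(nodes)  # all nodes removed iff there is no cycle


graphs = [
    {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []},  # diamond (acyclic)
    {"a": ["b"], "b": ["c"], "c": ["a"]},                # 3-cycle
    {"a": ["a"]},                                        # self-loop
]
results = []
for g in graphs:
    tree, back, cross = classify_edges(g, ["a"])
    edges = tree | back | cross                  # edges s
    assert acyclic(edges) == (not back)          # cycle_iff_back_edges
    results.append(acyclic(edges))
print(results)  # [True, False, False]
```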


Moreover, we need the fact that at the end of the search, edges s is the set of all reachable edges:

lemma nc_edges_covered:
  assumes ¬cond s and ¬is_break s
  shows E ∩ reachable × UNIV = edges s

With those facts from the library, we recall the definition of the cyclicity checker in our framework as presented in Examples 4.2.1 and 4.3.1:

definition cyc_checker where
  cyc_checker = ⦇
    on_init ≡ return ⦇cyc = False⦈,               (* initially no cycle has been found *)
    on_back_edge ≡ λu v s. return ⦇cyc = True⦈,    (* cycle! *)
    is_break ≡ λs. cyc s                           (* break iff cycle has been found *)
  ⦈

interpretation cyc!: param_DFS E V₀ cyc_checker for E V₀
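The parameterization by hook functions can be mimicked in Python. In this sketch, `Hooks` is a hypothetical stand-in for the record passed to param_DFS. It contains only the three hooks used by the cyclicity checker. The generic search never inspects the extension state itself; that state is only read and updated through the hooks, and the search stops as soon as `is_break` holds. The Isabelle definition wraps each result in return, whereas the hooks here are plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Graph = Dict[str, List[str]]
Ext = Dict[str, bool]  # extension part of the state, e.g. {"cyc": False}


@dataclass
class Hooks:
    """Hypothetical analogue of the parameter record passed to param_DFS."""
    on_init: Callable[[], Ext]
    on_back_edge: Callable[[str, str, Ext], Ext]
    is_break: Callable[[Ext], bool]


def param_dfs(graph: Graph, roots: List[str], hooks: Hooks) -> Ext:
    """Generic DFS; the hooks only read and update the extension state."""
    ext = hooks.on_init()
    discovered: Set[str] = set()
    finished: Set[str] = set()

    def visit(u: str) -> bool:
        """Explore u; return True if the search was interrupted."""
        nonlocal ext
        discovered.add(u)
        for v in graph.get(u, []):
            if hooks.is_break(ext):       # the loop condition checks is_break
                return True
            if v not in discovered:
                if visit(v):
                    return True
            elif v not in finished:       # v on the stack: back edge
                ext = hooks.on_back_edge(u, v, ext)
        finished.add(u)
        return False

    for r in roots:
        if hooks.is_break(ext):
            break
        if r not in discovered and visit(r):
            break
    return ext


cyc_checker = Hooks(
    on_init=lambda: {"cyc": False},                   # initially no cycle found
    on_back_edge=lambda u, v, s: {**s, "cyc": True},  # cycle!
    is_break=lambda s: s["cyc"],                      # break iff cycle found
)

print(param_dfs({"a": ["b"], "b": ["c"], "c": ["a"]}, ["a"], cyc_checker))
# {'cyc': True}
print(param_dfs({"a": ["b", "c"], "b": ["c"], "c": []}, ["a"], cyc_checker))
# {'cyc': False}
```

Keeping the generic search separate from the hooks means that a different instantiation only has to supply a different `Hooks` value, which is the design idea behind the interpretation of param_DFS.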

As the cyc flag is set when a back edge is encountered, the following invariant is easily proved:

lemma i_cyc_eq_back:
  is_invar (λs. cyc s ⟷ back_edges s ≠ {})
  apply (induct rule: establish_invar)
  apply (simp_all add: cond_def cong: cyc_more_cong)
  apply (simp add: empty_state_def)
  done

This happens to be the only invariant that needs to be shown for the correctness proof. Using the invariants mentioned above, we easily get the following lemma inside the locale DFS_invar, i. e., under the assumption rwof s:

lemma (in DFS_invar) cycc_correct_aux:
  assumes ¬cond s
  shows cyc s ⟷ ¬acyclic (E ∩ reachable × UNIV)

Intuitively, this lemma states that the cyc flag is equivalent to the existence of a reachable cycle upon termination of the algorithm. Finally, we obtain the correctness lemma of the cyclicity checker as an easy consequence:

lemma cyc_correct:
  cyc.dfs E V₀ ≤ spec s. cyc s ⟷ ¬acyclic (E ∩ reachable × UNIV)
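The correctness statement can also be tested, although testing is no substitute for the proof. The following Python sketch (hypothetical names) runs a cyclicity checker that stops at the first back edge and asserts the invariant of i_cyc_eq_back after every explored edge. It then compares the final flag with an independent test for a cycle among the reachable edges E ∩ reachable × UNIV on randomly generated small graphs.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]


def cyc_dfs(graph: Graph, roots: List[int]) -> bool:
    """Cyclicity checker: set cyc on the first back edge, then stop."""
    state = {"cyc": False}
    back: Set[Tuple[int, int]] = set()  # back_edges s
    disc: Set[int] = set()
    fin: Set[int] = set()

    def visit(u: int) -> None:
        disc.add(u)
        for v in graph.get(u, []):
            if state["cyc"]:            # is_break holds: stop exploring
                return
            if v not in disc:
                visit(v)
            elif v not in fin:          # back edge found
                back.add((u, v))
                state["cyc"] = True
            assert state["cyc"] == bool(back)  # i_cyc_eq_back
        fin.add(u)

    for r in roots:
        if state["cyc"]:
            break
        if r not in disc:
            visit(r)
    return state["cyc"]


def has_reachable_cycle(graph: Graph, roots: List[int]) -> bool:
    """not acyclic (E ∩ reachable × UNIV), via Kahn's algorithm."""
    reach, todo = set(roots), list(roots)
    while todo:
        u = todo.pop()
        for v in graph.get(u, []):
            if v not in reach:
                reach.add(v)
                todo.append(v)
    indeg = {u: 0 for u in reach}
    for u in reach:
        for v in graph.get(u, []):
            indeg[v] += 1
    sources = [u for u in reach if indeg[u] == 0]
    removed = 0
    while sources:
        u = sources.pop()
        removed += 1
        for v in graph.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                sources.append(v)
    return removed < len(reach)  # some node never became a source: cycle


random.seed(0)
for _ in range(500):
    n = random.randint(1, 6)
    g = {u: [v for v in range(n) if random.random() < 0.3] for u in range(n)}
    roots = random.sample(range(n), random.randint(1, n))
    assert cyc_dfs(g, roots) == has_reachable_cycle(g, roots)  # cyc_correct
print("ok")
```

Restricting to reachable edges matters: a cycle that is not reachable from the roots, such as the self-loop at node 1 in `{0: [], 1: [1]}` with root 0, is correctly ignored by both functions.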

Further examples of general properties (involving SCCs) are presented when proving the correctness of Tarjan’s algorithm in Section 4.7.2.

4.5 Refinement

So far, we have described the general and the abstract framework. While the former defines the structure of the parameterized search algorithms, the latter adds an explicit state and therefore allows us to reason about the properties of the algorithms. But this is exactly where the scope of the abstract framework ends: it is only meant to help reasoning about algorithms implementing DFS, which is achieved by keeping a very high-level view, hence the designation abstract.

Of course, in most cases this is not the final phase. In particular, for our goal of implementing a model checker, we need executable code. Directly executing the abstract definitions, even if that were possible, would yield very poor performance, making any practical use unlikely.

The common way to handle this, and the reason the Refinement Framework was created, is to build a more optimized version of the algorithm (cf. Section 2.2), which can then be exported into executable code using the code generator of Isabelle/HOL [16].

This optimization process can consist of several sub-steps, each optimizing a particular aspect of the algorithm. Such aspects include data refinement, i. e., replacing data structures by better-suited and more efficient ones (e. g., sets by lists or red-black trees), and structural refinement, i. e., optimizing the algorithm itself, for example by adding heuristics.

Both of these aspects are directly supported by the framework and are detailed in this section. While the refinement steps are applied sequentially (first data refinement, then structural refinement, finally code generation), they are designed independently of one another: the first two are enabled by a library of possible refinements, which can be combined in a plug-and-play fashion. The last one, code generation, is done using the Autoref Tool [24] and thus separately for each algorithm.
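The idea of data refinement can be illustrated with a small Python sketch. It is only an analogy: the Refinement Framework proves such correspondences, whereas here they are merely tested. For illustration, we assume that sets of nodes are refined to sorted, duplicate-free lists. All function names, and the refinement relation pairing a list with the set it represents, are hypothetical.

```python
import bisect
import random
from typing import List, Set


# Abstract level: the algorithm is written against mathematical sets.
def abs_insert(s: Set[int], x: int) -> Set[int]:
    return s | {x}


def abs_member(s: Set[int], x: int) -> bool:
    return x in s


# Concrete level: sets represented by sorted, duplicate-free lists.
def conc_insert(xs: List[int], x: int) -> List[int]:
    i = bisect.bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return xs                     # already present: nothing to do
    return xs[:i] + [x] + xs[i:]      # insert at the position that keeps it sorted


def conc_member(xs: List[int], x: int) -> bool:
    i = bisect.bisect_left(xs, x)     # binary search
    return i < len(xs) and xs[i] == x


def invar(xs: List[int]) -> bool:
    """Data-structure invariant: strictly increasing."""
    return all(a < b for a, b in zip(xs, xs[1:]))


def alpha(xs: List[int]) -> Set[int]:
    """Abstraction function: the set a concrete list represents."""
    return set(xs)


# Refinement relation R = {(xs, s) | invar xs and alpha xs = s}: run both
# levels in lockstep and check that R is preserved and answers agree.
random.seed(1)
xs: List[int] = []
s: Set[int] = set()
for _ in range(1000):
    x = random.randint(0, 50)
    assert conc_member(xs, x) == abs_member(s, x)
    xs, s = conc_insert(xs, x), abs_insert(s, x)
    assert invar(xs) and alpha(xs) == s
print(conc_insert([1, 3], 2))  # [1, 2, 3]
```

Because the abstract algorithm only uses the operations through this relation, the concrete representation can be swapped without touching the abstract correctness proof. This is the plug-and-play property described above.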

From the document: CAVA – A Verified Model Checker (pages 61–64)