
3.1.4 Implementation

The goal of the original automata library was to provide an executable library that could be used by other programs if needed.

Therefore, the formalizations need to be refined to executable code. In effect, this only needs to cover the NFA, as a semi-automaton is not going to be used directly by the user. Since the main component of an automaton, as described so far, is the underlying transition system, the LTS is refined on its own and then used by the NFA as an existing component.

The library, due to its age, uses the original version of the Isabelle Collections Framework [28] directly and not the Isabelle Refinement Framework [24] as an intermediary.

In this original framework, the first step (when refining data structures, not necessarily plain algorithms) is to give a specification of said structure. This is not the formalization, but a set of functions that an implementation must provide (i.e., an interface in software-engineering terms).

The most important function of such a specification is the abstraction function (conventionally denoted by the suffix _α) that converts an object of the implementation world into the datatype used in the formalization. In the case of the labeled transition system, this is the following function type, where 'Li is the implementation type:

type_synonym ('q, 'l, 'Li) lts_α = 'Li ⇒ ('q, 'l) LTS
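As a toy illustration only (not one of the library's implementations), one could take a plain list of triples as the implementation type, with the abstraction simply collecting its elements into a set; this assumes ('q, 'l) LTS is the set-of-transition-triples type of the formalization:

type_synonym ('q, 'l) lts_list = ('q × 'l × 'q) list

definition list_lts_α :: ('q, 'l, ('q, 'l) lts_list) lts_α where
  list_lts_α l ≡ set l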

Together with a possible invariant on the implementation this results in a locale representing some implementation of an LTS:

locale lts =
  fixes α :: ('q, 'l, 'Li) lts_α
  fixes invar :: 'Li ⇒ bool

Further extensions like finiteness or determinism are obtained by extending said locale, where the additional property is assumed to follow from the invariant:


locale finite_lts = lts +
  assumes invar l =⇒ finite (α l)

locale dlts = lts +
  assumes invar l =⇒ LTS_is_deterministic (α l)

Operations on the structure are added piece by piece: for each operation a corresponding function on the implementation is defined, which is then encapsulated in a specific locale with corresponding assumptions¹. For instance, the successor function is added like this:

type_synonym ('q, 'l, 'Li) lts_succ = 'Li ⇒ 'q ⇒ 'l ⇒ 'q option

locale lts_succ = lts +
  fixes succ :: ('q, 'l, 'Li) lts_succ
  assumes invar l =⇒ succ l v w = None =⇒ ∀v'. (v, w, v') ∉ (α l)
      and invar l =⇒ succ l v w = Some v' =⇒ (v, w, v') ∈ (α l)

Here lts_succ is a function type that, when passed an instance of the LTS implementation, yields a successor function. In the locale of the same name, an instance of the lts_succ function type is fixed (i.e., in theory, an implementation may have multiple variants of successor definitions).

In the same fashion, other operations like membership testing, emptiness check, insertion, etc. are defined.
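For instance, the membership test could be specified analogously to lts_succ; the following is only a sketch by analogy, and the exact signature and assumptions in the library may differ:

type_synonym ('q, 'l, 'Li) lts_memb = 'Li ⇒ 'q ⇒ 'l ⇒ 'q ⇒ bool

locale lts_memb = lts +
  fixes memb :: ('q, 'l, 'Li) lts_memb
  assumes invar l =⇒ memb l v w v' ⟷ (v, w, v') ∈ (α l)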

For easier usage, common operations are combined into one record type, so that an LTS implementation can be expressed by an instance of such a type:

record ('q, 'l, 'Li) lts_ops =
  lts_op_α :: ('q, 'l, 'Li) lts_α
  lts_op_invar :: 'Li ⇒ bool
  lts_op_empty :: ('q, 'l, 'Li) lts_empty
  lts_op_memb :: ('q, 'l, 'Li) lts_memb
  lts_op_succ :: ('q, 'l, 'Li) lts_succ
  . . .

The record itself only contains the operations. This has two drawbacks: the correctness properties defined inside the appropriate locales are not contained, and using any implementation requires the implementation record as a parameter, making the code cumbersome to read and write. For example, given that lts is an LTS and L is the operations record, one would have to write lts_op_succ L lts q a to get a successor for node q and label a.

Therefore, an additional locale² is introduced: it takes the implementation record as a parameter and then defines abbreviations for its fields, allowing direct usage of the functions. It also connects those functions to the appropriate correctness locales:

locale StdLTS =
  finite_lts α invar +
  lts_empty α invar empty +
  lts_memb α invar memb +
  lts_succ α invar succ +
  . . .
  fixes ops :: ('q, 'l, 'Li) lts_ops
begin

abbreviation α where α ≡ lts_op_α ops
abbreviation invar where invar ≡ lts_op_invar ops
abbreviation empty where empty ≡ lts_op_empty ops
abbreviation memb where memb ≡ lts_op_memb ops
abbreviation succ where succ ≡ lts_op_succ ops
. . .

end

¹This approach allows an implementation to provide only a subset of operations.
²For implementation reasons, in Isabelle proper, this is defined as two locales, as inheritance cannot use the later-defined abbreviations.

A similar locale is created for deterministic LTS.
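As a rough sketch of what such a locale might look like (the name StdDLTS and the exact set of imported locales are assumptions here, not taken from the library), it would essentially swap in the determinism locale:

locale StdDLTS =
  dlts α invar +
  lts_empty α invar empty +
  lts_succ α invar succ +
  . . .
  fixes ops :: ('q, 'l, 'Li) lts_ops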

Tuerk provides multiple implementations of LTS, all of which are based on what he calls TripleSets, a map of maps of sets (those TripleSets are also introduced by Tuerk, but any details will be omitted here). The difference between those implementations is the order of the components (starting node × label to set of resulting nodes; label × starting node to set of resulting nodes; starting node × resulting node to set of labels between them), as the use case might make one variant perform better than another; a sketch of these three shapes is given after the following enumeration. All the implementations are based on the same principle:

1. Define a new locale fixing possible parameters and sub-implementations. In the case presented this is the actual implementation of TripleSets:

locale ltsbm_QAQ_defs = ts: triple_set ts_α ts_invar
  for ts_α :: 'ts ⇒ ('Q × 'A × 'Q) set
  and ts_invar

2. Define the basic constructs: an abstraction function and the general invariant on the concrete data structure:

abbreviation (in ltsbm_QAQ_defs) ltsbm_α ≡ ts_α

abbreviation (in ltsbm_QAQ_defs) ltsbm_invar ≡ ts_invar

As the LTS implementation is a very shallow layer on top of the TripleSets, neither the abstraction nor the invariant defines anything on its own; they are renames of their counterparts in the TripleSet. So they are, in this instance, only defined out of convenience, because by convention every implementation is expected to provide both _invar and _α.

3. Define the basic operations as needed by lts_ops and show that they fulfill the necessary properties as defined in the respective locales. For the LTS implementations, Tuerk does not define the operations, as they are identical to the operations on the underlying TripleSet. Thus he only proves that they are correct for the application as an LTS. In theory, additional abbreviations (as for ltsbm_α) could have been introduced, but in practice the definitions inside the locale are seldom used and therefore not necessary.


lemma ltsbm_memb_correct:
  triple_set_memb ts_α ts_invar memb =⇒ lts_memb ltsbm_α ltsbm_invar memb
  unfolding lts_memb_def triple_set_memb_def by simp

lemma ltsbm_add_correct:
  triple_set_add ts_α ts_invar add =⇒ lts_add ltsbm_α ltsbm_invar add
  unfolding lts_add_def triple_set_add_def by simp

4. Usually, the final step is the definition of a particular instance of the ops record (lts_ops) and showing that the assumptions of the corresponding locale (StdLTS) are met. Tuerk omits this from the theories for the LTS implementations and instead only defines them at the stage prior to code generation, when all decisions for the underlying data structures have been made. This is probably due to the number of parameters that can be passed to the implementations of TripleSets (two for the maps, one for the final result set). To follow the example of Tuerk, when using Red-Black-Trees for all those underlying data structures, the final step looks like the following³:

definition rs_lts_ops :: ('V, 'E, ('V, 'E) rs_lts) lts_ops where
  rs_lts_ops ≡ ⦇
    lts_op_α = rs_lts_α,
    lts_op_invar = rs_lts_invar,
    lts_op_empty = rs_lts_empty,
    lts_op_memb = rs_lts_memb,
    . . .
  ⦈

lemma rs_lts_impl: StdLTS rs_lts_ops
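To make the "map of maps of sets" shape mentioned before this enumeration concrete, the three orderings can be pictured roughly as follows. This is an illustrative simplification with assumed names, using HOL's partial-function type ⇀; the actual TripleSets are instead parameterized over the map and set implementations of the Collections Framework:

type_synonym ('q, 'a) ts_QAQ = 'q ⇀ 'a ⇀ 'q set  (* starting node to label to resulting nodes *)
type_synonym ('q, 'a) ts_AQQ = 'a ⇀ 'q ⇀ 'q set  (* label to starting node to resulting nodes *)
type_synonym ('q, 'a) ts_QQA = 'q ⇀ 'q ⇀ 'a set  (* starting node to resulting node to labels *)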

The implementation of NFAs (represented as a tuple) is then defined in terms of those LTS implementations:

type_synonym ('q_set, 'l_set, 'd) NFA_impl = 'q_set × 'l_set × 'd × 'q_set × 'q_set

locale nfa_by_lts_defs =
  s!: StdSet s_ops (* Set operations on states *) +
  l!: StdSet l_ops (* Set operations on labels *) +
  d!: StdLTS d_ops (* An LTS *)
  for s_ops :: ('q, 'q_set, _) set_ops
  and l_ops :: ('l, 'l_set, _) set_ops
  and d_ops :: ('q, 'l, 'd, _) lts_ops

³The rs_lts_ functions are explicit definitions for an instance of ltsbm_QAQ_defs with Red-Black-Trees. The old Collections Framework unfortunately needed a lot of boilerplate code and technical definitions to be lifted from locales.

Due to the tuple structure, the component parts of the automaton need additional extraction functions so that the code remains readable:

definition (in nfa_by_lts_defs) nfa_states A ≡ fst A
definition (in nfa_by_lts_defs) nfa_labels A ≡ fst (snd A)
definition (in nfa_by_lts_defs) nfa_trans A ≡ fst (snd (snd A))
. . .

which can then be used to define the straightforward abstraction function:

definition (in nfa_by_lts_defs) nfa_α :: ('q_set, 'l_set, 'd) NFA_impl ⇒ ('q, 'l) NFA where
  nfa_α A = ⦇
    Q = s.α (nfa_states A),
    Σ = l.α (nfa_labels A),
    ∆ = d.α (nfa_trans A),
    I = s.α (nfa_initial A),
    F = s.α (nfa_accepting A)
  ⦈

While the LTS implementation itself was very short and in general only mapped to the underlying TripleSet, the NFA implementation is more involved (88 lines vs 3500 lines).

This is due to the multitude of operations that are defined on the abstract NFA definition and are now replaced by efficient implementations. So, while for the LTS we had more or less a chaining of the underlying map/set structures, the implementations for the NFA are often very different from their abstract counterparts. Therefore the proofs are more complicated.

We will not go into any more detail on the NFA implementation. While it is, as mentioned, more complicated, the general idea is the same as for the LTS.

Eventually, Tuerk defines an instance of the NFA implementation using Red-Black-Trees throughout. This is then used for code generation, together with an accessor layer around the generated structures, so that it can be used from raw SML or OCaml code as a mathematically correct library. In [31], Tuerk and Lammich go into more detail for one specific algorithm (Hopcroft's algorithm for automata minimisation [20]) and present benchmarks for the generated code in comparison with other, unverified, implementations.
