## Finite Maps in Isabelle

Isabelle (or Isabelle/HOL) is a proof assistant supporting some degree of automated reasoning, and can be programmed using a dialect of ML. This makes it perfect for proving properties of functional programs (there’s even a converter from Haskell to Isabelle), and I’ve been learning it for the past month or so with the intention of proving correct bits of Déjà Fu.

Recently I “finished” a theory of finite maps in Isabelle, which I’m going to use to construct a theory of finite tries. Then, if you squint hard enough at the `BPOR`

type defined in `Test.DejaFu.SCT.Internal`

, it’s a finite trie! Because it’s a fairly awkward data structure, I want to get all this foundational work done well, to ease reasoning about the higher-level thing.

### Theories

An Isabelle source file is called a *theory*, and is a collection of types, definitions, functions, lemmas, theorems, and so on. A theory starts with a header which specifies its name (which must match the source file) and what theories this builds upon:

```
theory FMap
imports BNF_Cardinal_Arithmetic BNF_Def Map FSet
begin
```

Theories can also contain commentary, which can be compiled to LaTeX. Various section types as well as the ability to embed arbitrary TeX exist. With Isabelle, the culture seems to be that the code *is* the documentation. It certainly makes getting a paper out of your work easy.

```
section {* Finite Maps *}
```

### Types

There are two ways to define new types in Isabelle: through the *datatype* command, which is used to create regular sum and product types; and through the *typedef* command, which is used to impose constraints on already existing types.

I wanted to reuse the original map type to make proving things about this as easy as possible, so I went for a *typedef* over that, and just imposed the finiteness constraint:

```
typedef ('k, 'v) fmap = "{M :: ('k ⇀ 'v). finite (dom M)}" morphisms fmap Abs_fmap
proof -
have "finite (dom empty)"
by simp
thus ?thesis
by blast
qed
```

All types in Isabelle are inhabited, so a proof obligation is generated here of the form `∃x. x ∈ {M. finite (dom M)}`

. I have proven this by providing a witness: `empty`

is the empty map which, naturally, has a finite domain. `simp`

and `blast`

are different proovers: `simp`

applies a collection of rewrite rules, called simplification rules; `blast`

is one of the few different solvers for first-order logic.

This type definition also gives us morphisms to convert a regular map to our new finite map type and back, provided that the finiteness requirement holds.

Rather than write our types out as `('k, 'v) fmap`

all the time, new syntax can be defined to make the theories look prettier:

```
type_notation "fmap" (infixr "|~=>|" 0)
type_notation (xsymbols) "fmap" (infixr "↪" 0)
```

I spent quite some time looking for the unicode arrow which felt right to me but, alas, didn’t find the *perfect* arrow.

### Lifting

Isabelle lets us reuse definitions and functions using the base type with the abstract type, as long as we can prove that the invariant is satisfied. This is called *lifting*, and we need to explicitly turn it on for a type:

```
setup_lifting type_definition_fmap
lift_definition dom :: "('k ↪ 'v) ⇒ 'k set"
is "Map.dom".
lift_definition ran :: "('k ↪ 'v) ⇒ 'v set"
is "Map.ran".
```

In general, lifted definitions come with a proof obligation that the invariants of the abstract type get preserved, but here this is sufficiently trivial that `lift_definition`

completely handles it for us.

```
lift_definition elems :: "('k ↪ 'v) ⇒ ('k × 'v) set"
is "λ m. {(a, b). a ∈ Map.dom m ∧ b = the (m a)}".
```

Maps are defined as a function from `'k`

to `'v option`

, which if you’ve used Haskell is just `Maybe`

. The function `the`

extracts a value when there is one.

The reason `elems`

is defined like is to make the later proofs of `dom_elems`

and `ran_elems`

simpler. However, the more obvious formulation is also useful to have:

```
lemma elems_def_alt: "elems m = {(k, v). fmap m k = Some v}"
by (smt Collect_cong case_prod_unfold domI domIff elems.rep_eq option.collapse option.sel)
```

The `smt`

prover is used here which, given a collection of lemmas and definitions, attempts to prove the goal by finding a satisfying assignment of the free variables.

Naturally, the domain, range, and elements are all related:

```
lemma dom_elems: "dom m = fst ` elems m"
by (transfer, force)
```

`transfer`

is a prover which attempts to use the relation between the concrete and the abstract types to simplify the goal. `force`

is another of the first-order logic provers. It doesn’t always work, but when it does, `transfer`

vastly simplifies proofs involving lifted definitions.

The proof for `ran_elems`

is a little less simple:

```
lemma ran_elems: "ran m = snd ` elems m"
apply (transfer, simp add: Map.ran_def)
by force
```

There’s an extra simplification step here, because the definition of `elems`

isn’t as obviously related to `ran`

as it is to `dom`

. `apply`

applies the given provers, in this case, `transfer`

and `simp`

. Proofs with lots of `apply`

lines are a bit difficult to read, so structured proofs written in the Isar language tend to be preferred for difficult goals.

We also know that the domain, range, and elements are finite sets, by the definition of the type; so it’s no surprise that we can prove these properties. Here an Isar proof has been used to show the reasoning steps more clearly:

```
lemma elems_finite: "finite (elems m)"
apply transfer
unfolding Map.dom_def
apply simp
(* At this stage, the proof obligation is:
⋀m. finite {a. ∃y. m a = Some y} ⟹
finite {(a, b). (∃y. m a = Some y) ∧ b = the (m a)}
*)
proof -
fix m
assume "finite {a. ∃y. m a = Some y}"
hence "finite ((λ k. (k, the (m k))) ` {a. ∃y. m a = Some y})"
by simp
hence "finite {(a, b). (∃ y. m a = Some y) ∧ b = the (m a)}"
apply (simp add: image_def)
by (smt Collect_cong case_prod_unfold fst_conv prod.collapse snd_conv)
thus "finite {(a, b). (∃y. m a = Some y) ∧ b = the (m a)}"
by auto
qed
lemma dom_finite: "finite (dom m)"
by (simp add: elems_finite dom_elems)
lemma ran_finite: "finite (ran m)"
by (simp add: elems_finite ran_elems)
```

Isabelle has two levels of logic, on term-level bools and on the meta-logic. `⋀`

is the meta-logic equivalent of `∀`

; `⟹`

the equivalent of `⟶`

. In general the automated reasoning tools can decompose things expressed in the meta-logic more easily than they can things expressed in terms of bools, and so it should be preferred wherever it can be used.

Note that Isar and apply-style proofs can be mixed. This is good because sometimes a short chain of `apply`

invocations will do the job, where writing out each step explicitly would just be needlessly verbose.

We can also prove some facts about the cardinality of these sets, due to two facts: keys are unique (as it’s just a function), but values are not necessarily. Therefore there are as many elements as there are keys, and at most as many values as there are keys.

It’s convenient to approach this by first proving an auxiliary lemma about the underlying map type:

```
lemma map_elems_card: "card {(a, b). m a = Some b} = card (Map.dom m)"
proof -
have "Map.dom m = fst ` {(a, b). m a = Some b}"
by force
also have "card … = card {(a, b). m a = Some b}"
by (subst card_image, auto simp: inj_on_def)
finally show ?thesis by simp
qed
```

Here `…`

expands to the right-hand side of the prior proof line. More specifically, it pattern-matches the prior line against `?P ?x ?y`

and returns `?y`

. It’s convenient to avoid repeating things.

This makes the corresponding `fmap`

lemmas very simple:

```
lemma dom_elems_card: "card (dom m) = card (elems m)"
by (simp add: elems_def_alt FMap.dom_def map_elems_card)
lemma dom_ran_card: "card (dom m) ≥ card (ran m)"
by (simp add: ran_elems dom_elems_card elems_finite card_image_le)
lemma ran_dom_card: "card (ran m) ≤ card (dom m)"
by (simp add: dom_ran_card)
lemma ran_elems_card: "card (ran m) ≤ card (elems m)"
by (metis dom_elems_card dom_ran_card)
```

### The empty map

As the empty map was used as a witness earlier in the definition of `fmap`

, it’s no surprise that we might want to lift it to `fmap`

directly:

```
lift_definition empty :: "('k ↪ 'v)"
is "Map.empty"
by simp
```

Here the proof obligation is not so simple it can be solved by `lift_definition`

itself, but one call to `simp`

is all that is needed to complete it.

```
lemma elems_empty: "elems x = {} ⟷ x = empty"
by (transfer, auto)
lemma dom_empty: "dom x = {} ⟷ x = empty"
by (auto simp add: dom_elems elems_empty)
lemma ran_empty: "ran x = {} ⟷ x = empty"
by (auto simp add: ran_elems elems_empty)
```

### Functorial mapping

It should come as no surprise to anyone familiar with Hindley-Milner-style parametric polymorphism that `fmap`

is a functor when the key type is fixed! We can define the mapping function quite simply:

```
lift_definition fmap_map :: "('v ⇒ 'w) ⇒ ('k ↪ 'v) ⇒ ('k ↪ 'w)"
is "λ f m x. Option.bind (m x) (Some ∘ f)"
by (smt Collect_cong Map.dom_def bind.simps comp_def domD mem_Collect_eq option.distinct)
```

Then we have the functor laws to prove for this:

```
lemma fmap_map_id: "fmap_map id m = m"
by (auto simp add: fmap_map_def map_fun_def fmap_inverse)
lemma fmap_map_comp: "fmap_map (g ∘ f) m = fmap_map g (fmap_map f m)"
apply transfer
apply auto
by (simp add: comp_def)
```

To make other things easier later on, we’ll also prove some additional lemmas which, strictly speaking, we don’t need as we can derive from what we have already:

```
lemma fmap_map_dom: "dom m = dom (fmap_map f m)"
apply transfer
apply auto
by (meson bind_eq_Some_conv)
lemma fmap_map_ran: "ran (fmap_map f m) = f ` ran m"
apply transfer
apply auto
apply (smt CollectD Map.ran_def bind_eq_Some_conv comp_apply image_iff option.sel ranI)
by (smt Map.ran_def bind_lunit comp_eq_dest_lhs mem_Collect_eq)
```

### Relations

We can lift a relation over pairs of values to a relation over pairs of maps, where elements with the same key are compared:

```
lift_definition fmap_rel :: "('v ⇒ 'w ⇒ bool) ⇒ ('k ↪ 'v) ⇒ ('k ↪ 'w) ⇒ bool"
is "λ R m1 m2. (dom m1 = dom m2 ∧
(∀ x ∈ dom m1. R (the (fmap m1 x)) (the (fmap m2 x))))".
```

We can also define a notion of zipping maps together, where elements with the same key are tupled up. This clearly relates to the `fmap_rel`

definition just given:

```
axiomatization fmap_zip :: "('k ↪ 'v) ⇒ ('k ↪ 'w) ⇒ ('k ↪ 'v × 'w)"
where
fmap_zip_dom: "dom (fmap_zip m1 m2) = {k. fmap m1 k ≠ None ∧ fmap m2 k ≠ None}" and
fmap_zip_ran: "ran (fmap_zip m1 m2) = {(a,b). ∃ k. fmap m1 k = Some a ∧ fmap m2 k = Some b}" and
fmap_zip_fst_ran: "fst ` ran (fmap_zip m1 m2) ⊆ ran m1" and
fmap_zip_fst: "dom m1 = dom m2 ⟹ m1 = fmap_map fst (fmap_zip m1 m2)" and
fmap_zip_snd_ran: "snd ` ran (fmap_zip m1 m2) ⊆ ran m2" and
fmap_zip_snd: "dom m1 = dom m2 ⟹ m2 = fmap_map snd (fmap_zip m1 m2)" and
fmap_zip_rel: "fmap_rel R m1 m2 ⟷ ran (fmap_zip m1 m2) ⊆ {(x, y). R x y}"
```

I have not provided a definition for `fmap_zip`

directly, instead I have merely given a collection of named axioms it satisfies. This is yet another way of defining things in Isabelle, and is useful for reasoning about abstract concepts for which there perhaps isn’t a single sensible definition. I went for this approach because I couldn’t come up with a definition for `fmap_zip`

which was nice enough to let me prove all the lemmas I wanted, which is not a very good motivation.

### Bounded natural functors

Isabelle supports directly and indirectly recursive datatypes. By *directly* recursive, I mean something like this:

```
datatype 'a binary_tree = Leaf 'a | Branch "'a binary_tree" "'a binary_tree"
```

Here the recursion in `binary_tree`

happens on a `binary_tree`

value. Whereas *indirect* recursion would be something like this:

```
datatype 'a rose_tree = Leaf 'a | Branch "'a rose_tree list"
```

Where there is an intermediary type in the way, `list`

in this case. Not just any type can be used there, it must be a type which has been registered as a *bounded natural functor*.

A bounded natural functor (or BNF) is a functor which has some additional natural transformations associated with it to do with relations, and which preserves finiteness. Fortunately we’ve almost done everything we need in order to prove that `fmap`

is a BNF!

Just before we do, it’s convenient to relate the two notions of set cardinality in Isabelle. There is cardinality of sets, and there is a more generic notion of cardinality relating to types, which BNFs use:

```
lemma card_leq_ordLeq:
assumes "finite x"
assumes "finite y"
assumes "card x ≤ card y"
shows "ordLeq3 (card_of x) (card_of y)"
using assms
by (meson card_le_inj card_of_ordLeq)
```

The `assumes`

/`shows`

notation is a shorthand for `⟦assm1; …; assmnN⟧ ⟹ shows`

, which itself is a shorthand for `assm1 ⟹ … ⟹ assmN ⟹ shows`

.

Registering a type as a BNF consists of providing some definitions, and then proving that they satisfy a bunch of properties:

```
bnf "'k ↪ 'v"
map: fmap_map
sets: ran
bd: "BNF_Cardinal_Arithmetic.csum natLeq (card_of (UNIV :: 'k set))"
wits: empty
rel: fmap_rel
```

This gives rise to 10 proof obligations. Isar proofs with multiple obligations proceed by having multiple `show`

lines, one for each obligation. The `next`

keyword can be used to split up the proof:

```
proof -
show "fmap_map id = id"
by (auto simp add: fmap_map_id)
next
show "⋀ f g. fmap_map (g ∘ f) = fmap_map g ∘ fmap_map f"
by (auto simp add: fmap_map_comp)
next
show "⋀ x f g. (⋀z. z ∈ ran x ⟹ f z = g z) ⟹ fmap_map f x = fmap_map g x"
apply transfer
by (metis bind.simps comp_apply not_None_eq ranI)
next
show "⋀f. ran ∘ fmap_map f = op ` f ∘ ran"
by (auto simp add: fmap_map_ran)
```

Now we have some facts about cardinality that I don’t *really* understand. I *think* that it’s first establishing that the cardinality property given is a valid cardinality, and that the definition provided for the `sets`

returns a set within this cardinality bound:

```
next
show "card_order (BNF_Cardinal_Arithmetic.csum natLeq (card_of UNIV))"
apply (rule card_order_csum)
apply (rule natLeq_card_order)
by (rule card_of_card_order_on)
next
show "BNF_Cardinal_Arithmetic.cinfinite (BNF_Cardinal_Arithmetic.csum natLeq (card_of UNIV))"
apply (rule cinfinite_csum)
apply (rule disjI1)
by (rule natLeq_cinfinite)
next
fix m :: "('k, 'v) fmap"
have "ordLeq3 (card_of (ran m)) (card_of (dom m))"
by (simp add: ran_dom_card ran_finite dom_finite card_leq_ordLeq)
also have "ordLeq3 (card_of (dom m)) (card_of (UNIV::'k set))" (is "ordLeq3 _ ?U")
by (simp add: card_of_mono1)
also have "ordLeq3 ?U (BNF_Cardinal_Arithmetic.csum natLeq ?U)"
by (rule ordLeq_csum2) (rule card_of_Card_order)
finally show "ordLeq3 (card_of (ran m)) (BNF_Cardinal_Arithmetic.csum natLeq ?U)"
using ordLeq_transitive
by blast
```

As I said, I don’t really understand it. It’s entirely possible I have picked a wildly inappropriate cardinality bound, which is *way* bigger than it needs to be. Oh well.

Next we have some facts about `OO`

, relational composition:

```
next
show "⋀R S. fmap_rel R OO fmap_rel S ≤ fmap_rel (R OO S)"
by (smt fmap_rel.transfer pick_middlep predicate2I relcompp.relcompI)
next
show "⋀R. fmap_rel R = (BNF_Def.Grp {x. ran x ⊆ {(x, y). R x y}} (fmap_map fst))¯¯ OO BNF_Def.Grp {x. ran x ⊆ {(x, y). R x y}} (fmap_map snd)"
unfolding Grp_def fun_eq_iff relcompp.simps conversep.simps
apply auto
proof -
show "⋀R x xa.
fmap_rel R x xa ⟹
∃b. x = fmap_map fst b ∧
ran b ⊆ {(x, y). R x y} ∧
xa = fmap_map snd b ∧ ran b ⊆ {(x, y). R x y}"
by (meson fmap_rel.abs_eq fmap_zip_fst fmap_zip_rel fmap_zip_snd)
next
show "⋀R b. ran b ⊆ {(x, y). R x y} ⟹
fmap_rel R (fmap_map fst b) (fmap_map snd b)"
unfolding fmap_rel_def
apply auto
apply (metis fmap_map_dom)
apply (metis fmap_map_dom)
proof -
(* At this stage the proof obligation is:
⋀R b x. ran b ⊆ {(x, y). R x y} ⟹
x ∈ dom (fmap_map fst b) ⟹
R (the (fmap (fmap_map fst b) x)) (the (fmap (fmap_map snd b) x))
*)
fix R m x
assume "ran m ⊆ {(x, y). R x y}"
assume "x ∈ dom (fmap_map fst m)"
obtain a b where "fmap m x = Some (a, b)"
by (metis `x ∈ FMap.dom (fmap_map fst m)` dom.rep_eq domD fmap_map_dom prod.swap_def swap_swap)
have thea: "a = the (fmap (fmap_map fst m) x)"
by (simp add: `fmap m x = Some (a, b)` fmap_map.rep_eq)
have theb: "b = the (fmap (fmap_map snd m) x)"
by (simp add: `fmap m x = Some (a, b)` fmap_map.rep_eq)
have "R a b"
using `FMap.ran m ⊆ {(x, y). R x y}` `fmap m x = Some (a, b)` ran.rep_eq ranI
by fastforce
thus "R (the (fmap (fmap_map fst m) x)) (the (fmap (fmap_map snd m) x))"
by (simp add: thea theb)
qed
qed
```

The last obligation is very simple, the emptiness witness provided contains no values:

```
next
show "⋀x. x ∈ ran empty ⟹ False"
by (simp add: empty.abs_eq empty.rsp ran.abs_eq)
qed
```

Finally, the `begin`

at the start of the theory is closed:

```
end
```

Overall, I quite like Isabelle.

It’s taken me a little while to get into the swing of things, but I think I’ve got the hang of it now. There are still many things to learn, but progress is being made. I’m slowly building up a library of useful types and lemmas in my isabelle-library repository, which I hope will be useful for more than just proofs for Déjà Fu.

If you want to give Isabelle a try for yourself, you can download the latest release from the official website, which also has documentation. There’s also a mailing list, IRC channel (#isabelle on freenode, a little slow but has a few knowledgeable people), and the Archive of Formal Proofs, a collection of theorems in Isabelle, some of which are explained quite well.