
Formal Verification

from types to theorems

Why This Matters

Memory corruption bugs cause approximately 70% of security vulnerabilities in systems software. Buffer overflows, use-after-free, double-free — the same bugs, decade after decade, in code written by experts.

Testing doesn't solve this. You can't test your way to security. Attackers find the cases you didn't test.

Space takes a different approach: make the bugs impossible. Not unlikely. Not caught-by-tests. Impossible. The type system rejects programs that could have memory bugs. If it compiles, those bugs don't exist.

The best code review is a machine-checked proof. The best test suite is a theorem.

Testing vs. Proof

Testing shows that a program works for specific inputs. Proof shows that a program works for all inputs.

Approach          Guarantee                     Coverage
Manual testing    Works for cases I tried       Dozens of cases
Unit tests        Works for cases I wrote       Hundreds of cases
Property tests    Works for random cases        Thousands of cases
Fuzzing           Works for adversarial cases   Millions of cases
Proof             Works for all cases           Infinite cases

This isn't a difference of degree. It's a difference of kind. A million test cases still leave infinitely many untested. A proof leaves none.
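
To make the difference concrete, here is the smallest possible example of a proof, written in F* (the language introduced below). It is an illustrative sketch, not part of the Space sources: a property test would check commutativity of addition for a few thousand sampled pairs, while this lemma states it for every pair of integers, and the compiler checks it.

// A theorem about all inputs, checked at compile time.
// The empty body () means the SMT solver finds the proof on its own.
let add_comm (x y: int) : Lemma (x + y == y + x) = ()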

The Verification Stack

Space achieves verified correctness through a chain of tools, each building on the last:

Space source code
        ↓
F* (specification + proof)
        ↓
Low* (low-level verified code)
        ↓
Karamel (extraction)
        ↓
C / WebAssembly

F*

F* is a dependently-typed programming language developed by Microsoft Research and Inria. "Dependently-typed" means types can depend on values:

// A vector type that knows its own length
type vector (a: Type) (n: nat) = ...

// This function only accepts vectors of equal length
val zip: #a:Type -> #b:Type -> #n:nat
       -> vector a n -> vector b n -> vector (a * b) n

The type vector a n isn't just "a list of things" — it's "a list of exactly n things." The compiler tracks this. If you try to zip vectors of different lengths, it's not a runtime error. It's a type error. The program doesn't compile.
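
For concreteness, here is one way such a length-indexed vector can be written out in plain F*. This is an illustrative sketch, not the Space definition; vec, Nil, and Cons are hypothetical names, and a & b is F*'s pair type.

// The length is part of the type: Nil has length 0, Cons adds one.
type vec (a: Type) : nat -> Type =
  | Nil  : vec a 0
  | Cons : #n:nat -> hd:a -> tl:vec a n -> vec a (n + 1)

// Both arguments share the same n, so mismatched lengths cannot occur;
// the "missing" cases (Nil paired with Cons) are provably unreachable.
let rec zip (#a #b: Type) (#n: nat) (xs: vec a n) (ys: vec b n)
  : Tot (vec (a & b) n) (decreases xs) =
  match xs, ys with
  | Nil, Nil -> Nil
  | Cons x xs', Cons y ys' -> Cons (x, y) (zip xs' ys')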

Refinement Types

F* extends this with refinement types — types constrained by predicates:

// A natural number that's less than n
type index (n: nat) = x:nat{x < n}

// Array access that's bounds-checked at compile time
val get: #a:Type -> #n:nat -> arr:array a n -> i:index n -> a

The type index n means "a natural number less than n." When you call get, the compiler must prove your index is in bounds. No runtime check needed. If you can't prove it, you can't call the function.
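
At a call site this looks like ordinary indexing. The sketch below reuses the hypothetical array and get signatures above and assumes an array with at least three elements; the literals 0, 1, and 2 are accepted only because Z3 can prove each is below n.

// The refinement n >= 3 is what lets Z3 prove every index is in bounds.
let sum_first_three (#n: nat{n >= 3}) (arr: array int n) : int =
  get arr 0 + get arr 1 + get arr 2

// Writing `get arr 3` here would not compile: 3 < n does not follow from n >= 3.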

Low*

Low* is a subset of F* designed for systems programming. It provides a C-like memory model with an explicit stack and heap, machine integers, mutable buffers, and control flow that maps directly to C, while excluding features that have no C counterpart, such as closures and garbage collection.

Code written in Low* carries the same proofs as F*, but compiles to efficient C without garbage collection.
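
For a flavour of the style, here is a minimal sketch built on F*'s standard LowStar.Buffer library (illustrative, not taken from the Space implementation). It allocates a small buffer on the stack, writes a value, and reads it back; the bounds and liveness obligations are discharged during verification, so the extracted C contains no runtime checks.

module Lowstar.Sketch

open FStar.HyperStack.ST
module B = LowStar.Buffer
module U32 = FStar.UInt32

// Stack allocation between push_frame and pop_frame extracts to a plain
// C stack array; there is no heap allocation and nothing to collect.
let read_back () : Stack U32.t (requires (fun _ -> True)) (ensures (fun _ _ _ -> True)) =
  push_frame ();
  let b = B.alloca 0ul 8ul in   // 8 cells, initialized to 0
  B.upd b 0ul 42ul;             // in-bounds write, proven at compile time
  let r = B.index b 0ul in      // in-bounds read, proven at compile time
  pop_frame ();
  r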

Karamel

Karamel extracts Low* code to readable C or WebAssembly. The result isn't obfuscated machine output; it's clean, auditable C that preserves the structure of the original.

Crucially, extraction is semantics-preserving. The proofs verified in F* apply to the extracted C. You're not trusting Karamel to be correct; you're trusting the mathematical relationship between the source and target.

What Space Proves

The Space implementation includes machine-checked proofs of these properties:

Stack Correctness

// Push then pop returns the original value
val push_pop_identity: s:stack -> v:cell ->
  Lemma (pop (push s v) == (s, v))

// Stack size increases by one after push
val push_size: s:stack -> v:cell ->
  Lemma (size (push s v) == size s + 1)

These aren't comments or documentation. They're theorems. The F* compiler verifies them using the Z3 SMT solver. If the lemma is false, compilation fails.

Universe Isolation

// Pointers are tagged with their universe of origin
type ptr = { universe: universe_id; offset: nat }

// Operations check universe membership
val read: u:universe -> p:ptr{p.universe == u.id} -> cell

The type system ensures you can only read from pointers belonging to the current universe. Cross-universe access requires explicit borrowing, which the type system also tracks.
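
As a small illustration of how such a refinement behaves at a call site (reusing the hypothetical universe, ptr, and read declarations above), a helper can only accept pointers whose tags provably match the universe it was handed:

// Both pointers must carry proof that they belong to u; a pointer tagged
// with any other universe_id is rejected by the type checker.
let read_pair (u: universe) (p: ptr{p.universe == u.id}) (q: ptr{q.universe == u.id})
  : cell & cell =
  (read u p, read u q)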

Discipline Compliance

// Linear values must be consumed
type linear_cell = c:cell{must_consume c}

// The type system tracks consumption
val consume: linear_cell -> unit  // removes the obligation

Linear values carry a proof obligation: they must be consumed before the universe can be destroyed. The type system tracks this obligation and rejects programs that might drop linear values.

Borrow Safety

// Borrowed pointers are affine (must be dropped)
type borrowed (u: universe_id) = {
  ptr: ptr;
  source: s:universe_id{s == u};  // the borrow remembers its source universe
  active: bool                    // tracked in the type
}

// The source universe can't be released while borrows of it are active
val release_universe: u:universe{no_active_borrows u} -> unit

The type tracks which borrows are active. A universe can only be released when all borrows have been dropped. This is checked at compile time, not runtime.

The SMT Connection

F* doesn't prove everything manually. It delegates to Z3, an SMT (Satisfiability Modulo Theories) solver developed by Microsoft Research.

When you write a lemma, F* translates it into a logical formula and asks Z3: "Is there any way this could be false?" If Z3 finds a counterexample, compilation fails. If Z3 proves no counterexample exists, the lemma is verified. And if Z3 can do neither within its time budget, compilation fails too, and you must supply more of the proof yourself.

This automation is crucial. Writing proofs by hand is tedious and error-prone. SMT solvers handle the mechanical parts, letting you focus on the interesting properties.
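
As a sketch of this division of labour (standard F*, not Space code): the first lemma below is linear arithmetic and Z3 discharges it from an empty proof body, while the second states a nonlinear fact about pow2 and is proved by calling an existing lemma from F*'s FStar.Math.Lemmas library instead of reasoning by hand.

open FStar.Mul  // so that * means integer multiplication below

// Linear arithmetic: Z3 proves this with no help.
let twice_sum (x y: nat) : Lemma ((x + y) * 2 == 2 * x + 2 * y) = ()

// Nonlinear facts often need a hint. Here the "proof" is a call to the
// library lemma pow2_plus, which states pow2 a * pow2 b == pow2 (a + b).
let pow2_sum (a b: nat) : Lemma (pow2 (a + b) == pow2 a * pow2 b) =
  FStar.Math.Lemmas.pow2_plus a b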

Trusted Computing Base

Every verified system has a trusted computing base (TCB) — the components you must trust to be correct, because they're not themselves verified.

Space's TCB is minimal:

Component           Role (why it must be trusted)
F* type checker     Verifies proofs
Z3 solver           Discharges proof obligations
Karamel extractor   Preserves semantics
C compiler          Compiles extracted code
Hardware            Executes instructions

Notably absent: the Space compiler itself. The compiler's correctness is proven, not assumed. Bugs in the compiler would be caught by F*'s type checker.

Formal Verification in Practice

From the Space implementation:

module Space.Stack.Properties

open Space.Types
open Space.Stack

// Push-pop identity
let lemma_push_pop (s: stack) (v: cell) :
  Lemma (pop (push s v) == (s, v)) = ()

// Size after push
let lemma_push_size (s: stack) (v: cell) :
  Lemma (size (push s v) == size s + 1) = ()

// Size after pop (non-empty)
let lemma_pop_size (s: stack{size s > 0}) :
  Lemma (size (fst (pop s)) == size s - 1) = ()

// Empty stack has size zero
let lemma_empty_size () :
  Lemma (size empty == 0) = ()

Each lemma is a theorem about stack behavior. The empty proof body = () means Z3 can verify it automatically — the property follows directly from the definitions.
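
To see why an empty proof body can be enough, here is a toy model under which the same push/pop lemma goes through. The definitions of cell, stack, push, pop, and size below are illustrative stand-ins built on F* lists, not the Space.Types and Space.Stack definitions.

// A toy model: stacks as F* lists.
type cell = int
type stack = list cell

let push (s: stack) (v: cell) : stack = v :: s

let pop (s: stack{Cons? s}) : stack & cell =
  match s with
  | v :: rest -> (rest, v)

let size (s: stack) : nat = FStar.List.Tot.length s

// Under this model the lemma holds by unfolding push and pop, which is
// exactly the kind of reasoning Z3 performs when the proof body is ().
let lemma_push_pop (s: stack) (v: cell)
  : Lemma (pop (push s v) == (s, v)) = ()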

What Proofs Cannot Do

Proofs are powerful, but not omnipotent. They cannot guarantee that the specification itself captures what you actually intended, that the components in the trusted computing base are free of bugs, or that the hardware is immune to side channels and physical faults.

Verification reduces the attack surface. It doesn't eliminate it entirely.

The Proof Burden

Writing verified code is harder than writing unverified code. You must write precise specifications, state the invariants every function maintains, and guide the prover when SMT automation falls short.

This burden is front-loaded. You pay it during development, not after deployment. And you pay it once, not repeatedly — unlike tests that must be maintained and re-run.

For security-critical code, this tradeoff is favorable. The cost of a memory corruption vulnerability — in breaches, patches, reputation — dwarfs the cost of verification.

Further Reading