203 lines
13 KiB
Markdown
203 lines
13 KiB
Markdown
---
|
|
description: Formalize the system design into a TLA+ specification and verify it
|
|
allowed-tools: [Read, Write, Edit, Glob, Grep, Bash, Task]
|
|
model: opus
|
|
---
|
|
|
|
# FormSpec Formalize
|
|
|
|
You are translating the system's natural-language design into a TLA+ formal specification, then running the TLC model checker to verify it. This is the core of the formspec workflow and demands precision.
|
|
|
|
There are two failure modes that you must handle differently:
|
|
1. **Meaning drift**: The TLA+ does not faithfully represent the design (your error to fix).
|
|
2. **Design flaw**: The TLA+ faithfully represents the design, but TLC finds a property violation (the developer must decide how to resolve).
|
|
|
|
## Phase 1 — Read all design artifacts
|
|
|
|
Read every artifact that exists:
|
|
- `spec/system.md`
|
|
- All feature design docs: Glob for `spec/features/*/design.md`
|
|
- `spec/formal/System.tla` (may not exist yet)
|
|
- `spec/formal/System.cfg` (may not exist yet)
|
|
- `spec/formal/traceability.md` (may not exist yet)
|
|
|
|
Extract a comprehensive list of all requirements across all documents:
|
|
- User stories (US-N)
|
|
- Success criteria (SC-N)
|
|
- Architectural constraints
|
|
- Security considerations
|
|
- Pathological/adversarial use cases
|
|
- Performance constraints that can be modeled
|
|
|
|
Determine whether this is a fresh creation or an update to an existing specification.
|
|
|
|
## Phase 2 — Write or update the TLA+ specification
|
|
|
|
### If System.tla does not exist: create from scratch
|
|
|
|
Write `spec/formal/System.tla` modeling the full system. The spec must:
|
|
|
|
- **Model all actors** described in the design as processes or roles
|
|
- **Model all states and transitions** from user stories and system behavior
|
|
- **Define invariants** for every success criterion that can be expressed as a state predicate
|
|
- **Define temporal properties** for liveness and fairness requirements
|
|
- **Include safety properties** that prevent every pathological behavior described in the design docs
|
|
- **Include best-practice safety properties** for this type of system, even if not explicitly in the design. Examples:
|
|
- Concurrent systems: no deadlocks, no starvation
|
|
- Auth systems: no privilege escalation, no backdoor tokens
|
|
- Distributed data: no lost updates, eventual consistency where claimed
|
|
- Financial systems: conservation of value, no double-spend
|
|
- Queue systems: no message loss, no duplicate delivery (if claimed)
|
|
|
|
### If System.tla exists: modify it
|
|
|
|
Diff the current spec against the new/updated design requirements. Add, modify, or remove states, transitions, invariants, and properties to match the current design. Preserve existing properties that remain valid.
|
|
|
|
### System.cfg
|
|
|
|
Write `spec/formal/System.cfg` with:
|
|
- `SPECIFICATION Spec` (or whatever the spec's temporal formula is named)
|
|
- `CONSTANTS` with small but meaningful values (enough to expose bugs, small enough for tractable checking)
|
|
- `INVARIANT` listing all invariants
|
|
- `PROPERTY` listing all temporal properties
|
|
|
|
### Subsystem specs
|
|
|
|
For standalone subcomponents with clean boundaries (e.g., a rate limiter, a queue, an auth module), create partial specs in `spec/formal/subsystems/` that abstract their communication with the rest of the system. These are optional and should only be created when a component is complex enough to warrant independent verification.
|
|
|
|
## Phase 3 — Write traceability mapping
|
|
|
|
Create or update `spec/formal/traceability.md` with this table:
|
|
|
|
```markdown
|
|
# Traceability Matrix
|
|
|
|
| Source Document | Requirement ID | TLA+ Property | Property Type | Status |
|
|
|----------------|----------------|---------------|---------------|--------|
|
|
| features/YYYY-MM-DD-feature/design.md | US-1 | PropertyName | invariant/temporal/safety/liveness | formalized |
|
|
| features/YYYY-MM-DD-feature/design.md | SC-1 | PropertyName | invariant | formalized |
|
|
| system.md | Best practice: no deadlock | NoDeadlock | safety | formalized |
|
|
```
|
|
|
|
**Status values**: `formalized` (written but not yet checked), `checked` (TLC passed), `violated` (TLC found counterexample), `not-yet-formalized` (requirement identified but no TLA+ property yet — must include justification).
|
|
|
|
Every requirement from Phase 1 must appear in this table. If a requirement cannot be formalized (e.g., "UI should feel responsive"), mark it `not-yet-formalized` with a note explaining why.
|
|
|
|
## Phase 4 — Run TLC model checker
|
|
|
|
Run:
|
|
```bash
|
|
tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir /tmp/tlc-states -cleanup
|
|
```
|
|
|
|
**Important**: Run this from the project root directory. The `-metadir` flag directs TLC's working state files to a temporary directory, and `-cleanup` removes them after a successful run. This prevents TLC from littering the project with a `states/` directory full of binary artifacts.
|
|
|
|
**Timeout policy**: Use a 5-minute timeout by default. If TLC has not completed within this limit:
|
|
|
|
1. **Stop TLC** and report the state space explored so far (states found, distinct states, queue size).
|
|
2. **Attempt mitigations** in this order, re-running after each:
|
|
- Reduce model constants (fewer processes, smaller domains)
|
|
- Enable symmetry sets if the model has interchangeable actors
|
|
- Check invariants and temporal properties separately (invariants are cheaper)
|
|
- Switch to simulation mode (`-simulate num=10000`) for probabilistic coverage
|
|
3. **If mitigations are insufficient**, this is a signal that the system-level spec may be too complex for monolithic checking. See "Complexity management" below — subsystem decomposition is likely the right response.
|
|
4. **The timeout may be extended** if there is a specific reason to expect longer runtimes (e.g., a known-large but finite state space that is close to completing, or a liveness property that requires deep exploration). The developer must explicitly approve any timeout beyond 15 minutes.
|
|
|
|
Parse the TLC output to determine:
|
|
- Number of states explored and distinct states
|
|
- Whether each invariant and property passed or failed
|
|
- Any counterexample traces
|
|
- Checking duration
|
|
|
|
## Phase 5 — Handle results
|
|
|
|
### All properties pass
|
|
|
|
Update all statuses in `spec/formal/traceability.md` from `formalized` to `checked`. Report success with a summary of what was verified: number of properties checked, state space explored, time taken. Suggest proceeding with `/formspec.plan.4` to create an implementation plan.
|
|
|
|
### Meaning drift detected (Phase 2 issue)
|
|
|
|
If during Phase 2 you discover that the design is ambiguous or contradictory, making faithful TLA+ translation impossible:
|
|
|
|
1. Attempt to resolve the ambiguity by choosing the most natural reading of the design.
|
|
2. If the ambiguity cannot be resolved confidently:
|
|
- **STOP**
|
|
- Present the specific ambiguity to the developer with exact quotes from the design doc
|
|
- Explain what the two (or more) possible interpretations are
|
|
- Offer to amend the design doc to resolve the ambiguity
|
|
- After the developer decides, continue from Phase 2
|
|
|
|
### Design flaw detected (TLC finds a violation)
|
|
|
|
If TLC finds a property violation and the TLA+ faithfully represents the design:
|
|
|
|
**IMMEDIATELY STOP ALL WORK.**
|
|
|
|
Present to the developer:
|
|
|
|
1. **Which property was violated**: Name and human-readable description
|
|
2. **The counterexample trace**: Simplified to be understandable — show the sequence of states that leads to the violation, translated back into domain language (not raw TLA+ state dumps)
|
|
3. **Traceability**: Which design requirement this traces back to, with a link to the source document
|
|
4. **Plain-language explanation**: What this means for the system's actual behavior — what could go wrong in production
|
|
5. **Classification**: Whether this violates an explicit design requirement or a best-practice property that you added
|
|
|
|
**Do not proceed** until the developer explicitly acknowledges and decides how to resolve. Possible resolutions:
|
|
- Redesign the component (go back to `/formspec.design.1`)
|
|
- Add a constraint to the design
|
|
- Accept the risk with documented justification
|
|
- Modify the TLA+ model if the property was too strict
|
|
|
|
#### Recording design amendments
|
|
|
|
When the resolution involves changing the design doc (adding a constraint, modifying a user story, strengthening a precondition), record the amendment in the affected design document:
|
|
|
|
1. **Edit the requirement text** to reflect the corrected design.
|
|
2. **Add an inline reference** next to the changed text: `(see Amendment N)` where N is the next sequential amendment number for that document.
|
|
3. **Add an Amendments section** at the bottom of the design doc (create it if it doesn't exist) with a numbered entry explaining what changed and why:
|
|
|
|
```markdown
|
|
## Amendments
|
|
|
|
1. **US-2, "persists the updated queue"** — Added during formalization.
|
|
The original design did not persist the queue on enqueue. TLC found
|
|
a DeadlockFreedom counterexample: a crash after enqueue silently
|
|
drops the waiter, who then never acquires the lock. The queue must
|
|
be persisted to satisfy the architectural constraint on deadlock
|
|
freedom under crashes.
|
|
```
|
|
|
|
Each amendment entry must include:
|
|
- Which requirement was changed (e.g., US-2, SC-3)
|
|
- What the change was (quote or paraphrase the new text)
|
|
- The TLA+ property whose counterexample motivated the change
|
|
- A plain-language explanation of why the original design was insufficient
|
|
|
|
Amendments are permanent — they document the design's provenance and record which requirements were non-obvious enough to require formal verification to get right. Only remove an amendment if the requirement it annotates is itself removed through a larger redesign, or changed significantly enough that a plain rewrite is simpler than preserving the amendment history.
|
|
|
|
**Evaluating amendment volume**: Multiple amendments in a single formalization run are not necessarily a problem — they may indicate that the design was written at a high level (e.g., by an AI agent or during an early brainstorming phase) and the formalization is doing its job of tightening it. The signal to watch for is not the count but the *pattern*:
|
|
|
|
- **Amendments spread across different parts of the spec** (different user stories, different components, different properties) suggest the design was broadly sketched and TLC is finding the expected gaps. This is normal and productive.
|
|
- **Amendments concentrated around the same requirement or interaction** suggest a localized design flaw that incremental patching may not resolve. After 2-3 amendments to the same area, consider whether the component needs a broader redesign via `/formspec.design.1` rather than further patching.
|
|
|
|
After recording the amendment, restart from Phase 2.
|
|
|
|
## Complexity management
|
|
|
|
The TLA+ specification should be as simple as possible while still faithfully modeling the properties that matter. If the spec or the TLC checking becomes unwieldy, that is a signal to simplify — not to push harder.
|
|
|
|
**Rules of thumb for spec complexity** (these are guidelines, not hard limits — use judgment based on the system being modeled):
|
|
|
|
- **State variables**: If the main module exceeds ~10-15 state variables, consider whether some of them can be abstracted away or moved to a subsystem spec. Variables that exist solely to make TLC tractable (bookkeeping that doesn't correspond to anything in the design) are a warning sign.
|
|
- **Model constants**: Start small (2-3 processes, small domains). Only increase if TLC passes quickly and you want higher confidence. The goal is to find bugs, not to exhaustively verify at production scale — even 2 processes can expose most concurrency bugs.
|
|
- **Spec readability**: If the TLA+ spec is hard to read even for someone who understands the design, it is too complex. A spec that cannot be reviewed is not trustworthy.
|
|
|
|
**When to decompose into subsystems**: The `spec/formal/subsystems/` directory exists precisely for this case. If the full system spec becomes impractical to model-check — TLC times out even with minimal constants, or the spec is too complex to write correctly — partition it:
|
|
|
|
1. Identify components with clear boundaries and limited interaction with the rest of the system.
|
|
2. Create subsystem specs in `spec/formal/subsystems/` that model the component's internal behavior, abstracting its environment (other components) as nondeterministic inputs.
|
|
3. Simplify the main `System.tla` by replacing the detailed component behavior with an abstract summary of its guarantees (the properties verified by the subsystem spec).
|
|
4. The main spec then checks system-level properties (cross-component interactions, end-to-end guarantees) against the abstract component summaries, at a manageable state space.
|
|
|
|
This decomposition mirrors how real systems are built and verified — components have contracts, and the system is verified against those contracts rather than against full implementation details.
|
|
|
|
**Reporting**: After each TLC run, report these complexity metrics alongside the results: number of state variables, number of states explored, number of distinct states, checking duration. These help the developer track whether the spec is growing beyond tractable bounds across formalization iterations.
|