Write a workflow for formal spec-driven development based on TLA+
.claude/commands/formspec.check.8.md
---
description: Cross-artifact consistency check across all formspec artifacts
allowed-tools: [Read, Glob, Grep, Bash, Task]
---

# FormSpec Check

You are performing a comprehensive cross-artifact consistency analysis. This command is **read-only** for spec artifacts — it runs verification tools but does not modify spec files. It reports findings for the developer to act on.

## Steps

### 1. Read all artifacts

Read every artifact that exists (skip gracefully if some don't exist yet):
- `spec/system.md`
- All feature design docs: Glob for `spec/features/*/design.md`
- `spec/formal/System.tla` and `spec/formal/System.cfg`
- `spec/formal/traceability.md`
- `spec/plan.md`
- `spec/tasks.md`
- `spec/verification/matrix.md`

Also scan:
- Implementation source code (Glob for `**/*.rs`, `**/*.go`, `**/*.fst`, `**/*.py`, `**/*.ts`, `**/*.java`, etc., excluding `spec/` and common dependency directories like `node_modules/`, `target/`, `vendor/`)
- Test files (Glob for `**/*_test.go`, `**/test_*.py`, `**/*_test.rs`, `**/*.test.ts`, `**/*.spec.ts`, etc.)
- User documentation (`README.md`, `docs/**/*.md`, excluding `spec/`)
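
The two sweeps above can be sketched in shell for the Bash tool. This is a minimal sketch: the prune list and extension set are illustrative, not exhaustive, and real runs would use the Glob tool instead.

```shell
# Named spec artifacts (silently skipping any that don't exist yet).
ls spec/system.md spec/plan.md spec/tasks.md spec/verification/matrix.md 2>/dev/null
ls spec/features/*/design.md spec/formal/System.tla spec/formal/System.cfg 2>/dev/null

# Implementation and test files, skipping spec/ and dependency directories.
find . \( -path ./spec -o -path ./node_modules -o -path ./target -o -path ./vendor \) -prune \
  -o -type f \( -name '*.rs' -o -name '*.go' -o -name '*.fst' -o -name '*.py' -o -name '*.ts' \) -print
```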

### 2. Run consistency checks

Perform each of these checks and record findings:

**Design <-> TLA+ (if both exist)**:
- Every formalizable requirement (US-N, SC-N, architectural constraints, security considerations) in the design docs has a corresponding TLA+ property in traceability.md
- No TLA+ properties exist in System.tla without a corresponding entry in traceability.md
- No orphaned entries in traceability.md (property exists in table but not in System.tla)
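
One cheap first pass for the orphaned-definition direction, assuming properties are written as top-level `Name == ...` operators in System.tla (tune the pattern to the spec's actual style):

```shell
# Flag top-level definitions in System.tla that traceability.md never mentions.
grep -E '^[A-Za-z][A-Za-z0-9_]* *==' spec/formal/System.tla |
  sed -E 's/ *==.*//' | sort -u |
  while read -r prop; do
    grep -q "$prop" spec/formal/traceability.md ||
      echo "ORPHAN: $prop defined in System.tla but absent from traceability.md"
  done
```

This only seeds the check; whether each traceability entry is semantically right still needs reading.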
**TLA+ <-> Implementation (if plan.md and matrix.md exist)**:
|
||||
- Every TLA+ property has at least one implementation-level verification method in plan.md
|
||||
- Every verification method has been executed and recorded in matrix.md
|
||||
- No FAIL or PENDING entries in matrix.md that lack explanation
|
||||
- Verification matrix entries correspond to completed milestones, not individual tasks — flag any property marked PASS whose milestone tasks are not all complete
|
||||
|
||||
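
If matrix.md is kept as a markdown table, the FAIL/PENDING pass can start from a grep that surfaces the rows; judging whether an explanation is adequate remains a manual step. The table layout is an assumption:

```shell
# Surface FAIL/PENDING rows in the verification matrix for review.
grep -nE '\|[[:space:]]*(FAIL|PENDING)[[:space:]]*\|' spec/verification/matrix.md \
  || echo "no FAIL or PENDING entries found"
```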
**Design <-> Implementation (if tasks.md exists)**:
|
||||
- Every user story (US-N) from every feature design doc is covered by at least one task
|
||||
- Every task is marked complete or has a documented reason for incompleteness
|
||||
- No tasks reference nonexistent user stories or properties
|
||||
- Every TLA+ property with a runtime verification method has a corresponding milestone
|
||||
- Every milestone's task list includes all the components needed to execute its verification method (e.g., a crash recovery milestone must include tasks for persistence, the coordinator, and the recovery test)
|
||||
- No milestone claims verification of a property that its tasks don't collectively cover
|
||||
|
||||
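
The user-story coverage check can be seeded mechanically, assuming US-N ids appear literally in both the design docs and tasks.md:

```shell
# List US-N ids from design docs that tasks.md never references.
# -w avoids US-1 matching inside US-12.
grep -hoE 'US-[0-9]+' spec/features/*/design.md 2>/dev/null | sort -u |
  while read -r us; do
    grep -qw "$us" spec/tasks.md || echo "UNCOVERED: $us has no task in tasks.md"
  done
```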
**Interface Contract <-> Implementation (if design docs define Interface Contracts)**:
|
||||
- Every schema defined in a design doc's Interface Contract section has a corresponding implementation (struct definition, protobuf message, CLI argument parser, etc.)
|
||||
- Implementation field names and types match the contract (reasonable naming convention differences are fine — `request_id` vs `RequestId` — but semantic mismatches are not)
|
||||
- Error conventions in the implementation match what the contract specifies (status codes, error shapes, exit codes)
|
||||
- Wire format matches (if the contract says newline-delimited text, the implementation should not use length-prefixed binary)
|
||||
|
||||
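
A very rough first pass for field-name presence, treating case and underscores as interchangeable so `request_id` matches `RequestId`. The `src/` path is illustrative, the pattern will pick up any backticked snake_case token (not just fields), and semantic matching still needs human judgment:

```shell
# For each backticked snake_case token in a design doc, check that some
# spelling of it appears in the implementation tree.
grep -ohE '`[a-z][a-z0-9_]*`' spec/features/*/design.md 2>/dev/null | tr -d '`' | sort -u |
  while read -r field; do
    flat=$(printf '%s' "$field" | tr -d '_')
    grep -riq "$flat" src/ 2>/dev/null || echo "MISSING: $field not found under src/"
  done
```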
**Terminology consistency**:
|
||||
- Key domain terms used in design docs match naming in TLA+ variables/operators
|
||||
- Implementation code uses consistent naming with the TLA+ spec (where reasonable — snake_case vs CamelCase differences are fine, semantic mismatches are not)
|
||||
|
||||
**Orphaned artifacts**:
|
||||
- No design requirements without traceability entries
|
||||
- No TLA+ properties without implementation verification
|
||||
- No implementation code files that don't trace back to any task (heuristic — flag files that seem substantial but aren't referenced)
|
||||
|
||||
**Tests <-> Specification (if test files exist)**:
|
||||
- Every TLA+ property with a runtime verification method (from plan.md's Verification Strategy) has corresponding test cases in the codebase
|
||||
- Test names or comments reference the TLA+ property or design requirement they verify
|
||||
- No test files exist that don't trace back to any task or verification method (heuristic — flag test files that seem unrelated to the spec)
|
||||
- Property-based tests (if specified in the plan) exist and cover the properties they claim to verify
|
||||
- Properties claimed as "structural guarantee" in the verification matrix are actually enforced by the data model — verify that the implementation's types make violation unrepresentable, and that no runtime test exists that claims to check the property but is actually a no-op (a test that can never fail provides no signal and misrepresents the verification method)
|
||||
- Properties claimed as "compile-time guarantee" in the verification matrix are actually enforced by the language used for that component. In polyglot projects, verify that the component responsible for the property is written in the language whose type system or ownership model is claimed to enforce it. For example: if the matrix claims Rust ownership enforces a memory safety property, verify the relevant code is actually in Rust, not in a Go or Python wrapper that bypasses the guarantee. If the matrix claims F* dependent types prove a bound, verify an `.fst` module exists that covers the claimed property. Flag cases where a compile-time guarantee is claimed but the enforcement boundary is incomplete — e.g., an FFI call, an `unsafe` block, a type assertion, or a serialization boundary that re-introduces the class of error the type system was supposed to prevent
|
||||
|
||||
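
A starting point for the untraceable-test heuristic. The US-N pattern and the assumption that property names sit in the first column of a traceability.md table are both per-project guesses:

```shell
# Flag test files that never mention a requirement id or a known property name.
props=$(grep -oE '^\| *[A-Za-z][A-Za-z0-9_]*' spec/formal/traceability.md 2>/dev/null | tr -d '| ')
find . \( -name '*_test.go' -o -name 'test_*.py' -o -name '*_test.rs' \) -type f |
  while read -r f; do
    hit=no
    grep -qE 'US-[0-9]+' "$f" && hit=yes
    for p in $props; do grep -qw "$p" "$f" && hit=yes; done
    [ "$hit" = yes ] || echo "UNTRACED: $f"
  done
```

False positives are expected; the point is to produce a short list worth reading, not a verdict.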
**Documentation <-> Specification (if user docs exist)**:
|
||||
- Every user story marked as user-facing has corresponding documentation
|
||||
- Documentation traceability comments (`<!-- spec: ... -->`) reference valid design documents and requirement IDs
|
||||
- No behavioral claims in documentation contradict the design documents or formal properties
|
||||
- Behavioral guarantees stated in docs correspond to verified TLA+ properties (cross-reference traceability.md — a documented guarantee should trace to a "checked" property)
|
||||
- Documentation is not stale: if design docs are newer than documentation files, flag potential drift
|
||||
|
||||
**Staleness**:
|
||||
- Compare file modification times: if design docs are newer than System.tla, flag potential drift
|
||||
- If System.tla is newer than verification/matrix.md, flag that verification may be stale
|
||||
- If design docs are newer than documentation files, flag potential documentation drift
|
||||
|
||||
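
The mtime comparisons map directly onto the shell's `-nt` ("newer than") test, which bash and most other shells provide:

```shell
# Flag potential drift by comparing modification times.
# [ a -nt b ] is true when a was modified more recently than b.
for d in spec/features/*/design.md; do
  [ -e "$d" ] || continue
  [ "$d" -nt spec/formal/System.tla ] && echo "STALE: $d is newer than System.tla"
done
if [ spec/formal/System.tla -nt spec/verification/matrix.md ]; then
  echo "STALE: System.tla is newer than the verification matrix"
fi
```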
### 3. Run TLC (if spec exists)
|
||||
|
||||
If `spec/formal/System.tla` and `spec/formal/System.cfg` exist:
|
||||
|
||||
```bash
|
||||
tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir /tmp/tlc-states -cleanup
|
||||
```
|
||||
|
||||
Record pass/fail for each property.
|
||||
|
||||
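
Since the spec may predate any tooling setup, the invocation is worth guarding. The `tlc` wrapper name is an assumption; some installs invoke `java -jar tla2tools.jar` instead:

```shell
# Run TLC only when a wrapper is on PATH; otherwise report a graceful skip.
if command -v tlc >/dev/null 2>&1; then
  tlc spec/formal/System.tla -config spec/formal/System.cfg \
      -workers auto -metadir /tmp/tlc-states -cleanup
else
  echo "SKIP: tlc not found on PATH"
fi
```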
### 4. Run language-specific verification tools (if implementation exists)
|
||||
|
||||
Detect the project's language(s) from source files and run appropriate tools:
|
||||
- **Rust**: `cargo test`, `cargo clippy`
|
||||
- **Go**: `go test ./...`, `go vet ./...`
|
||||
- **F***: `fstar.exe` on `.fst` files
|
||||
- **Python**: `pytest` if tests exist
|
||||
- **TypeScript/JavaScript**: `npm test` if package.json exists
|
||||
|
||||
Only run tools that are available and applicable. Skip gracefully if tooling isn't set up.
|
||||
|
||||
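
Detection can key off manifest files, guarded the same way. The manifest-to-toolchain mapping below is a sketch, not exhaustive:

```shell
# Run a toolchain only if both its manifest and its tools are present.
run_if() { [ -f "$1" ] && command -v "$2" >/dev/null 2>&1; }

run_if Cargo.toml cargo  && { cargo test; cargo clippy; }
run_if go.mod go         && { go test ./...; go vet ./...; }
run_if package.json npm  && npm test
find . -name '*.fst' 2>/dev/null | grep -q . &&
  command -v fstar.exe >/dev/null 2>&1 && fstar.exe ./*.fst
find . -name 'test_*.py' -o -name '*_test.py' 2>/dev/null | grep -q . &&
  command -v pytest >/dev/null 2>&1 && pytest
true  # a missing toolchain is a skip, not a failure
```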
### 5. Report
|
||||
|
||||
Produce a structured report organized by severity:
|
||||
|
||||
```
|
||||
## CRITICAL
|
||||
[TLA+ violations, verification failures, design-spec mismatches, documentation contradicting the spec, tests failing]
|
||||
|
||||
## HIGH
|
||||
[Missing traceability links, unverified properties, stale artifacts after design changes, documented guarantees not backed by verified properties, missing tests for runtime-verified properties]
|
||||
|
||||
## MEDIUM
|
||||
[Terminology inconsistencies, missing task coverage, partial traceability, documentation missing spec traceability comments, untraceable test files]
|
||||
|
||||
## LOW
|
||||
[Minor staleness, documentation gaps for non-critical features, style issues]
|
||||
```
|
||||
|
||||
Each finding should include:
|
||||
- What the issue is
|
||||
- Which artifacts are affected
|
||||
- Suggested resolution (which formspec command to run, or what manual action to take)
|
||||
|
||||
If no issues are found at a severity level, omit that section. If everything is clean, say so clearly.
|
||||