dotfiles/.claude/commands/formspec.check.8.md

---
description: Cross-artifact consistency check across all formspec artifacts
argument-hint: scope
allowed-tools: Read, Glob, Grep, Bash, Task
---

# FormSpec Check

You are performing a comprehensive cross-artifact consistency analysis. This command is read-only for spec artifacts — it runs verification tools but does not modify spec files. It reports findings for the developer to act on.

## Steps

### 1. Read all artifacts

Read every artifact that exists (skip gracefully if some don't exist yet):

- `spec/system.md`
- All feature design docs: Glob for `spec/features/*/design.md`
- `spec/formal/System.tla` and `spec/formal/System.cfg`
- `spec/formal/traceability.md`
- `spec/plan.md`
- `spec/tasks.md`
- `spec/verification/matrix.md`

Also scan:

- Implementation source code (Glob for `**/*.rs`, `**/*.go`, `**/*.fst`, `**/*.py`, `**/*.ts`, `**/*.java`, etc., excluding `spec/` and common dependency directories like `node_modules/`, `target/`, `vendor/`)
- Test files (Glob for `**/*_test.go`, `**/test_*.py`, `**/*_test.rs`, `**/*.test.ts`, `**/*.spec.ts`, etc.)
- User documentation (`README.md`, `docs/**/*.md`, excluding `spec/`)
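The source scan can be sketched as a small helper; the skip-list and suffix set below are assumptions mirroring the exclusions above, not a fixed formspec contract:

```python
from pathlib import Path

# Assumed exclusions and source suffixes, matching the scan list above.
SKIP_DIRS = {"spec", "node_modules", "target", "vendor"}
SRC_SUFFIXES = {".rs", ".go", ".fst", ".py", ".ts", ".java"}

def scan_sources(root: str = ".") -> list[Path]:
    """Collect implementation source files, excluding spec/ and dependency dirs."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix in SRC_SUFFIXES
        and not any(part in SKIP_DIRS for part in p.parts)
    )
```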

### 2. Run consistency checks

Perform each of these checks and record findings:

**Design <-> TLA+** (if both exist):

- Every formalizable requirement (US-N, SC-N, architectural constraints, security considerations) in the design docs has a corresponding TLA+ property in traceability.md
- No TLA+ properties exist in System.tla without a corresponding entry in traceability.md
- No orphaned entries in traceability.md (property exists in the table but not in System.tla)
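A minimal sketch of this cross-check; the extraction heuristics (top-level `==` definitions as candidate properties, backticked identifiers as traceability entries) and the sample strings are assumptions, and a real pass may need to consult the `INVARIANTS`/`PROPERTIES` sections of System.cfg instead:

```python
import re

def tla_operators(tla_text: str) -> set[str]:
    """Heuristic: every top-level operator definition is a candidate property."""
    return set(re.findall(r"^(\w+)\s*==", tla_text, re.MULTILINE))

def traced_properties(traceability_text: str) -> set[str]:
    """Heuristic: backticked identifiers in the traceability table."""
    return set(re.findall(r"`(\w+)`", traceability_text))

tla = "TypeOK == x \\in Nat\nSafety == [](x # 0)\n"      # hypothetical System.tla
trace = "| US-1 | `TypeOK` | checked |\n"                 # hypothetical traceability.md
print(sorted(tla_operators(tla) - traced_properties(trace)))  # -> ['Safety'] (untraced)
print(sorted(traced_properties(trace) - tla_operators(tla)))  # -> [] (orphaned)
```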

**TLA+ <-> Implementation** (if plan.md and matrix.md exist):

- Every TLA+ property has at least one implementation-level verification method in plan.md
- Every verification method has been executed and recorded in matrix.md
- No FAIL or PENDING entries in matrix.md without an explanation
- Verification matrix entries correspond to completed milestones, not individual tasks — flag any property marked PASS whose milestone tasks are not all complete

**Design <-> Implementation** (if tasks.md exists):

- Every user story (US-N) from every feature design doc is covered by at least one task
- Every task is marked complete or has a documented reason for incompleteness
- No tasks reference nonexistent user stories or properties
- Every TLA+ property with a runtime verification method has a corresponding milestone
- Every milestone's task list includes all the components needed to execute its verification method (e.g., a crash-recovery milestone must include tasks for persistence, the coordinator, and the recovery test)
- No milestone claims verification of a property that its tasks don't collectively cover
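Story coverage can be diffed with a simple ID scrape; the sample design-doc and tasks.md snippets below are hypothetical:

```python
import re

def story_ids(text: str) -> set[str]:
    """All US-N identifiers mentioned in a document."""
    return set(re.findall(r"\bUS-\d+\b", text))

design = "US-1: import data\nUS-2: export report\n"        # hypothetical design doc
tasks = "- [x] T1: importer (US-1)\n- [ ] T2: cleanup\n"   # hypothetical tasks.md
uncovered = story_ids(design) - story_ids(tasks)   # stories with no task coverage
phantom = story_ids(tasks) - story_ids(design)     # tasks citing nonexistent stories
print(sorted(uncovered))  # -> ['US-2']
print(sorted(phantom))    # -> []
```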

**Interface Contract <-> Implementation** (if design docs define Interface Contracts):

- Every schema defined in a design doc's Interface Contract section has a corresponding implementation (struct definition, protobuf message, CLI argument parser, etc.)
- Implementation field names and types match the contract (reasonable naming-convention differences are fine — request_id vs RequestId — but semantic mismatches are not)
- Error conventions in the implementation match what the contract specifies (status codes, error shapes, exit codes)
- Wire formats match (if the contract says newline-delimited text, the implementation should not use length-prefixed binary)
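Tolerating naming-convention differences while catching real mismatches can be done by canonicalizing field names before comparing; the contract and implementation field sets here are illustrative:

```python
import re

def canon(name: str) -> str:
    """Collapse snake_case/kebab-case/camelCase so request_id matches RequestId."""
    return re.sub(r"[_\-]", "", name).lower()

contract = {"request_id", "created_at"}   # hypothetical Interface Contract fields
impl = {"RequestId", "createdAt"}         # hypothetical implementation fields
impl_canon = {canon(f) for f in impl}
missing = sorted(f for f in contract if canon(f) not in impl_canon)
print(missing)  # -> [] : names differ only by convention, not semantics
```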

**Terminology consistency:**

- Key domain terms used in design docs match the naming of TLA+ variables and operators
- Implementation code uses naming consistent with the TLA+ spec (where reasonable — snake_case vs CamelCase differences are fine, semantic mismatches are not)

**Orphaned artifacts:**

- No design requirements without traceability entries
- No TLA+ properties without implementation verification
- No implementation code files that don't trace back to any task (heuristic — flag files that seem substantial but aren't referenced)

**Tests <-> Specification** (if test files exist):

- Every TLA+ property with a runtime verification method (from plan.md's Verification Strategy) has corresponding test cases in the codebase
- Test names or comments reference the TLA+ property or design requirement they verify
- No test files exist that don't trace back to any task or verification method (heuristic — flag test files that seem unrelated to the spec)
- Property-based tests (if specified in the plan) exist and cover the properties they claim to verify
- Properties claimed as a "structural guarantee" in the verification matrix are actually enforced by the data model — verify that the implementation's types make violation unrepresentable, and that no runtime test claims to check the property while actually being a no-op (a test that can never fail provides no signal and misrepresents the verification method)
- Properties claimed as a "compile-time guarantee" in the verification matrix are actually enforced by the language used for that component. In polyglot projects, verify that the component responsible for the property is written in the language whose type system or ownership model is claimed to enforce it. For example: if the matrix claims Rust ownership enforces a memory-safety property, verify the relevant code is actually in Rust, not in a Go or Python wrapper that bypasses the guarantee. If the matrix claims F* dependent types prove a bound, verify an `.fst` module exists that covers the claimed property. Flag cases where a compile-time guarantee is claimed but the enforcement boundary is incomplete — e.g., an FFI call, an unsafe block, a type assertion, or a serialization boundary that reintroduces the class of error the type system was supposed to prevent

**Documentation <-> Specification** (if user docs exist):

- Every user story marked as user-facing has corresponding documentation
- Documentation traceability comments (`<!-- spec: ... -->`) reference valid design documents and requirement IDs
- No behavioral claims in the documentation contradict the design documents or formal properties
- Behavioral guarantees stated in docs correspond to verified TLA+ properties (cross-reference traceability.md — a documented guarantee should trace to a "checked" property)
- Documentation is not stale: if design docs are newer than documentation files, flag potential drift
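Traceability comments can be scraped and then validated; this sketch assumes the `<!-- spec: path#ID -->` shape shown above, and the sample doc text is hypothetical:

```python
import re

SPEC_COMMENT = re.compile(r"<!--\s*spec:\s*([^\s>]+)\s*-->")

doc = (
    "Writes survive a crash once sync returns.\n"
    "<!-- spec: spec/features/storage/design.md#US-3 -->\n"  # hypothetical doc
)
for ref in SPEC_COMMENT.findall(doc):
    path, _, req = ref.partition("#")
    # Follow-up checks: path exists on disk, req is a real US-N/SC-N in that doc.
    print(path, req)
```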

**Staleness:**

- Compare file modification times: if design docs are newer than System.tla, flag potential drift
- If System.tla is newer than verification/matrix.md, flag that verification may be stale
- If design docs are newer than documentation files, flag potential documentation drift
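The mtime comparison is straightforward; the drift pairs listed below are examples of the relationships above, not an exhaustive contract:

```python
from pathlib import Path

def newer(a: str, b: str) -> bool:
    """True when both files exist and a was modified more recently than b."""
    pa, pb = Path(a), Path(b)
    return pa.exists() and pb.exists() and pa.stat().st_mtime > pb.stat().st_mtime

# Illustrative drift pairs: flag when the first artifact postdates the second.
pairs = [
    ("spec/system.md", "spec/formal/System.tla"),
    ("spec/formal/System.tla", "spec/verification/matrix.md"),
]
for fresh, derived in pairs:
    if newer(fresh, derived):
        print(f"potential drift: {fresh} is newer than {derived}")
```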

### 3. Run verification

If spec/hooks/run-checks exists and is executable, delegate all verification (TLC and language-specific tools) to it (see /formspec.init for the hook interface). Build the request from spec/formal/traceability.md (TLA+ properties), spec/plan.md (verification methods), and detected source files (language-specific checks). The external system runs all checks independently and returns results.

If the hook does not exist or fails, run checks locally:

**TLC** (if `spec/formal/System.tla` and `spec/formal/System.cfg` exist):

```sh
tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir /tmp/tlc-states -cleanup
```

Record pass/fail for each property.
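Pass/fail can be recorded by scraping TLC's output; this is a heuristic sketch, and the sample line assumes TLC's standard invariant-violation message (temporal-property failures are reported differently and would need their own pattern):

```python
import re

def tlc_violations(output: str) -> list[str]:
    """Names of invariants that TLC reports as violated (heuristic scrape)."""
    return re.findall(r"Error: Invariant (\w+) is violated", output)

sample = "Error: Invariant TypeOK is violated.\n"  # assumed TLC output shape
print(tlc_violations(sample))  # -> ['TypeOK']
```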

**Language-specific verification tools** (if implementation exists):

Detect the project's language(s) from source files and run the appropriate tools:

- Rust: `cargo test`, `cargo clippy`
- Go: `go test ./...`, `go vet ./...`
- F*: `fstar.exe` on `.fst` files
- Python: `pytest` if tests exist
- TypeScript/JavaScript: `npm test` if `package.json` exists

Only run tools that are available and applicable. Skip gracefully if tooling isn't set up.
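Detection can be a simple extension-to-command mapping; the table below mirrors the list above and is an assumption, not a fixed formspec contract (availability checks, e.g. via `shutil.which`, would still be needed before running anything):

```python
from pathlib import Path

# Assumed mapping from source suffix to candidate check commands.
TOOLING = {
    ".rs": ["cargo test", "cargo clippy"],
    ".go": ["go test ./...", "go vet ./..."],
    ".fst": ["fstar.exe"],
    ".py": ["pytest"],
    ".ts": ["npm test"],
}

def checks_for(files: list[str]) -> list[str]:
    """Deduplicated check commands implied by the project's source files."""
    cmds: list[str] = []
    for f in files:
        for cmd in TOOLING.get(Path(f).suffix, []):
            if cmd not in cmds:
                cmds.append(cmd)
    return cmds

print(checks_for(["src/main.rs", "scripts/gen.py"]))
# -> ['cargo test', 'cargo clippy', 'pytest']
```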

### 4. Report

Produce a structured report organized by severity:

## CRITICAL
[TLA+ violations, verification failures, design-spec mismatches, documentation contradicting the spec, tests failing]

## HIGH
[Missing traceability links, unverified properties, stale artifacts after design changes, documented guarantees not backed by verified properties, missing tests for runtime-verified properties]

## MEDIUM
[Terminology inconsistencies, missing task coverage, partial traceability, documentation missing spec traceability comments, untraceable test files]

## LOW
[Minor staleness, documentation gaps for non-critical features, style issues]

Each finding should include:

- What the issue is
- Which artifacts are affected
- Suggested resolution (which formspec command to run, or what manual action to take)

If no issues are found at a severity level, omit that section. If everything is clean, say so clearly.