Create an optional hook system for externally-managed verification

2026-02-10 18:52:52 +02:00 · 2026-02-10 18:52:52 +02:00 · 51ff9e2572
commit 51ff9e2572
parent bb7422f4c0
5 changed files with 99 additions and 9 deletions
--- a/.claude/commands/formspec.check.8.md
+++ b/.claude/commands/formspec.check.8.md
@ -83,9 +83,13 @@ Perform each of these checks and record findings:
 - If System.tla is newer than verification/matrix.md, flag that verification may be stale
 - If design docs are newer than documentation files, flag potential documentation drift

-### 3. Run TLC (if spec exists)
+### 3. Run verification

-If `spec/formal/System.tla` and `spec/formal/System.cfg` exist:
+If `spec/hooks/run-checks` exists and is executable, delegate all verification (TLC and language-specific tools) to it (see `/formspec.init` for the hook interface). Build the request from `spec/formal/traceability.md` (TLA+ properties), `spec/plan.md` (verification methods), and detected source files (language-specific checks). The external system runs all checks independently and returns results.
+
+If the hook does not exist or fails, run checks locally:
+
+**TLC** (if `spec/formal/System.tla` and `spec/formal/System.cfg` exist):

 ```bash
 tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir /tmp/tlc-states -cleanup
@ -93,7 +97,7 @@ tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir

 Record pass/fail for each property.

-### 4. Run language-specific verification tools (if implementation exists)
+**Language-specific verification tools** (if implementation exists):

 Detect the project's language(s) from source files and run appropriate tools:
 - **Rust**: `cargo test`, `cargo clippy`
@ -104,7 +108,7 @@ Detect the project's language(s) from source files and run appropriate tools:

 Only run tools that are available and applicable. Skip gracefully if tooling isn't set up.

-### 5. Report
+### 4. Report

 Produce a structured report organized by severity:

--- a/.claude/commands/formspec.formalize.2.md
+++ b/.claude/commands/formspec.formalize.2.md
@ -113,7 +113,11 @@ Parse the TLC output to determine:

 ### All properties pass

-Update all statuses in `spec/formal/traceability.md` from `formalized` to `checked`. Report success with a summary of what was verified: number of properties checked, state space explored, time taken. Suggest proceeding with `/formspec.plan.4` to create an implementation plan.
+If `spec/hooks/run-checks` exists and is executable, delegate TLC verification to it instead of running TLC locally (see `/formspec.init` for the hook interface). Present the returned results to the developer, including any dashboard URLs.
+
+If the hook does not exist or fails, run TLC locally as described in Phase 4.
+
+In either case, update all statuses in `spec/formal/traceability.md` from `formalized` to `checked`. Report success with a summary of what was verified: number of properties checked, state space explored, time taken. Suggest proceeding with `/formspec.plan.4` to create an implementation plan.

 ### Meaning drift detected (Phase 2 issue)

@ -133,6 +137,8 @@ If TLC finds a property violation and the TLA+ faithfully represents the design:

 **IMMEDIATELY STOP ALL WORK.**

+Update `spec/formal/traceability.md` to set the violated property's status to `violated`.
+
 Present to the developer:

 1. **Which property was violated**: Name and human-readable description
--- a/.claude/commands/formspec.implement.6.md
+++ b/.claude/commands/formspec.implement.6.md
@ -75,11 +75,13 @@ After completing a task, if the next task's dependencies are all met, continue i

 After completing a task, check whether any verification milestones in `spec/tasks.md` have all their tasks complete. For each newly-completed milestone:

-**a. Run the milestone's verification method** (e.g., integration tests, property-based tests, recovery tests). These are the tests that validate TLA+ properties end-to-end across multiple components.
+**a. Run the milestone's verification**: If `spec/hooks/run-checks` exists and is executable, delegate the milestone's verification to it (see `/formspec.init` for the hook interface). The request should include the milestone's properties, verification methods, and hints (e.g., the test commands from the plan). The external system runs the checks independently and returns results.

-**b. Handle milestone verification failures**: If the integration test fails, diagnose whether the issue is in one of the component tasks (go back and fix it) or a cross-component interaction that wasn't caught by unit tests. If it reveals a spec inconsistency, STOP as described above.
+If the hook does not exist or fails, run the milestone's verification method locally (e.g., integration tests, property-based tests, recovery tests). These are the tests that validate TLA+ properties end-to-end across multiple components.

-**c. Update the verification matrix**: After the milestone's verification passes, update `spec/verification/matrix.md`. Create the file if it doesn't exist, using this format:
+**b. Handle milestone verification failures**: If the verification fails (whether reported by the hook or observed locally), diagnose whether the issue is in one of the component tasks (go back and fix it) or a cross-component interaction that wasn't caught by unit tests. If it reveals a spec inconsistency, STOP as described above.
+
+**c. Update the verification matrix**: If the hook succeeded, it is the record-keeper — skip the markdown table update. If checks ran locally (no hook, or hook failed), update `spec/verification/matrix.md` directly. Create the file if it doesn't exist, using this format:

 ```markdown
 # Verification Matrix
--- a/.claude/commands/formspec.init.0.md
+++ b/.claude/commands/formspec.init.0.md
@ -32,6 +32,7 @@ Create the following directories:
 spec/formal/subsystems/
 spec/features/
 spec/verification/
+spec/hooks/
 ```

 ### 4. Generate spec/system.md
@ -63,3 +64,75 @@ The System Overview should be a concise but complete description — enough for
 ### 5. Report

 Tell the user what was created and suggest they proceed with `/formspec.design.1 <feature-name>` to start designing features.
+
+## Hook convention
+
+The `spec/hooks/` directory holds executable scripts that formspec commands invoke at defined points. This allows projects to plug in custom behavior (e.g., delegating verification to a TCMS, triggering CI pipelines, posting notifications) without modifying the commands themselves.
+
+If a hook script does not exist, the command falls back to its default behavior (running checks locally and updating markdown tables). If a hook exists and is executable, the command delegates to it instead.
+
+### run-checks
+
+**Path**: `spec/hooks/run-checks`
+
+**Called by**: `/formspec.formalize`, `/formspec.verify`, `/formspec.implement`, `/formspec.check`
+
+**When**: When verification needs to be performed (TLC model checking, milestone tests, consistency checks).
+
+**Purpose**: Delegate verification execution to an external system. The agent determines *what* needs checking from the workflow state; the hook tells an external system to *run* the checks, store the results in its own datastore, and report back. This keeps the external system as the independent record-keeper — results are produced by the verification infrastructure, not self-reported by the agent.
+
+**Request** (JSON on stdin):
+
+```json
+{
+  "command": "formspec.implement",
+  "project_root": "/home/user/projects/mutex",
+  "checks": [
+    {
+      "property": "MutualExclusion",
+      "source": "features/2026-02-10-mutex/design.md",
+      "requirement": "US-1",
+      "type": "invariant",
+      "method": "integration test",
+      "hint": "go test -race -run TestMutualExclusion ./..."
+    }
+  ]
+}
+```
+
+**Request fields**:
+- `command`: Which formspec command is requesting verification.
+- `project_root`: Absolute path to the project root, so the external system can locate source files and specs.
+- `checks`: Array of verification requests:
+  - `property`: TLA+ property name (or descriptive label for non-formalized checks).
+  - `source`: Design document path relative to `spec/`.
+  - `requirement`: Requirement ID (e.g., `US-1`, `SC-3`).
+  - `type`: `invariant`, `temporal`, `safety`, `liveness`, `structural`, `compile-time`, `static-analysis`, `unit-test`, or `integration-test`.
+  - `method`: Description of the verification method.
+  - `hint`: Suggested command or approach. The external system may use this directly, adapt it to its own environment, or ignore it entirely.
+
+**Response** (JSON on stdout):
+
+```json
+{
+  "results": [
+    {
+      "property": "MutualExclusion",
+      "status": "pass",
+      "details": "1024 states explored, 0 violations",
+      "url": "https://dashboard.example.com/runs/4821#MutualExclusion"
+    }
+  ]
+}
+```
+
+**Response fields**:
+- `results`: Array of verification outcomes, one per requested check:
+  - `property`: Echoes back the property name from the request.
+  - `status`: `pass`, `fail`, or `error`.
+  - `details`: Human-readable summary of the result — counterexample, error message, state space size, etc. The agent presents this to the developer.
+  - `url`: Optional. Link to the full result in the external system's dashboard. If present, the agent includes it in the report to the developer.
+
+**Exit codes**: 0 = all checks executed (individual checks may still have `fail` status). Non-zero = the hook itself failed (could not reach the external system, configuration error, etc.). On non-zero exit, the command warns and falls back to running checks locally and updating the markdown tables.
+
+**Behavior when hook is absent**: Commands run checks locally (TLC, `go test`, etc.) and update `spec/formal/traceability.md` and `spec/verification/matrix.md` directly. The markdown tables serve as the built-in, zero-configuration default.
--- a/.claude/commands/formspec.verify.3.md
+++ b/.claude/commands/formspec.verify.3.md
@ -19,6 +19,10 @@ If either is missing, tell the user to run `/formspec.formalize.2` first and sto

 ### 2. Run TLC

+If `spec/hooks/run-checks` exists and is executable, delegate verification to it instead of running TLC locally (see `/formspec.init` for the hook interface). Build the request from `spec/formal/traceability.md` (properties and their types) and `spec/formal/System.cfg` (the TLC invocation hint). Present the returned results to the developer, including any dashboard URLs.
+
+If the hook does not exist or fails, run TLC locally:
+
 ```bash
 tlc spec/formal/System.tla -config spec/formal/System.cfg -workers auto -metadir /tmp/tlc-states -cleanup
 ```
@ -27,12 +31,13 @@ Run from the project root directory. The `-metadir` flag directs TLC's working s

 ### 3. Parse and report

-Read the TLC output and produce a structured report:
+Read the TLC output (or hook response) and produce a structured report:

 **Summary**:
 - States explored / distinct states
 - Duration
 - Overall result: PASS or FAIL
+- Dashboard URL (if provided by hook)

 **Property results** (table format):