Steward H2 proof — 2026-06-10

Status: H2 smoke passed · Audience: maintainers and agents

Claim

mcp_flutter is the first external Skill Steward adoption target with a current steward/v1 contract and a passing H2 smoke loop. This proves contract discovery, safe action inspection, quick probe execution, and one strict dogfood benchmark. It does not prove product runtime correctness, full H4 workflow breadth, or H5 promoted harness maturity.

Repo state

  • Branch: codex/steward-adoption-h2
  • Contract commit: af35ac3185c31fb26edb4aa546834f70208a5600
  • Redacted proof snapshot commit: f6fd961bf3ba34bc9b663469169733fc3bfa091c
  • Worktree state during H2 loop: clean
  • Native gate: make check-contracts
  • Steward scenario: mcp_flutter.web-dogfood-warm
  • Local ignored benchmark summary: .steward/benchmark-summaries/mcp_flutter.web-dogfood-warm.strict.json
  • Tracked redacted review summary: docs/evidence/generated/mcp_flutter.web-dogfood-warm.strict.redacted.json

Contract refresh

  • repo.archetype now uses the public Skill Steward archetype vocabulary: harness.
  • stewardship.repo_quality is declared with contract_spec, maturity_model, and evidence path.
  • AGENTS.md now records the released steward executable as the reusable command surface. The earlier tool/steward/run.sh bridge was temporary proof scaffolding for a stale local binary and is not the adoption pattern for future repos.

Portable commands

Run from the repository root after installing a current steward binary.

make check-contracts
steward doctor --json
steward actions list --json
steward action inspect fmt.check.tool-prefix --json
steward probe --profile quick --json
steward benchmark --scenario mcp_flutter.web-dogfood-warm --strict --output .steward/benchmark-summaries/mcp_flutter.web-dogfood-warm.strict.json --json
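
The command sequence above can also be scripted so that a failing gate stops the loop. The following is a minimal sketch, not part of the repository: it assumes each command signals failure with a non-zero exit code (not confirmed here for steward), and the names `H2_COMMANDS`, `run_command`, and `run_h2_loop` are hypothetical helpers introduced for illustration.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

SUMMARY = ".steward/benchmark-summaries/mcp_flutter.web-dogfood-warm.strict.json"

# Commands copied from the portable command list, in the same order.
H2_COMMANDS: List[str] = [
    "make check-contracts",
    "steward doctor --json",
    "steward actions list --json",
    "steward action inspect fmt.check.tool-prefix --json",
    "steward probe --profile quick --json",
    "steward benchmark --scenario mcp_flutter.web-dogfood-warm --strict "
    f"--output {SUMMARY} --json",
]


def run_command(argv: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def run_h2_loop(
    commands: Sequence[str] = H2_COMMANDS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run commands in order and stop at the first non-zero exit code.

    Assumption: a non-zero exit code means the gate failed.
    Returns the commands that passed before any failure.
    """
    passed: List[str] = []
    for command in commands:
        if runner(shlex.split(command)) != 0:
            break
        passed.append(command)
    return passed
```

Calling `run_h2_loop()` from the repository root runs the real commands; passing a different `runner` lets the ordering logic be exercised without a steward binary installed.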

Local provenance

The original run used the maintainer's private Dart SDK path plus a sibling Skill Steward checkout. A later proof used a temporary repo-local wrapper to bridge a stale global binary. Both setups are recorded as provenance only and should not be copied; future evidence should use the released steward executable, Skill Steward's setup action, or an explicit maintainer source checkout.

Results

| Gate | Result |
| --- | --- |
| make check-contracts | Passed; existing skill metadata/source warnings only |
| doctor --json | Passed; config.valid: true, repo.archetype: "harness" |
| actions list --json | Passed; exposed fmt.check.tool-prefix |
| action inspect fmt.check.tool-prefix --json | Passed; action is bounded_local / auto, with no writes, no network, no secrets, and no destructive effects |
| probe --profile quick --json | Passed; selected fmt.check.tool-prefix |
| benchmark --strict --output ... | Passed; result: "pass", blocked_by: null, durability.status: "ready", proof.status: "ready" |

The tracked redacted summary preserves the benchmark result, proof status, durability status, subject commit, and warnings while omitting machine-local command paths.
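
The strict benchmark pass criteria listed in the results can be checked mechanically against the local summary file. The sketch below assumes the summary is a JSON object in which dotted names such as durability.status map to nested keys; that layout is an assumption, not a documented schema, and `EXPECTED`, `get_path`, and `check_summary` are hypothetical names.

```python
import json
from typing import Any, Dict, List

_MISSING = object()  # sentinel so a missing key is distinct from a JSON null

# Expected values taken from the benchmark row of the results.
EXPECTED: Dict[str, Any] = {
    "result": "pass",
    "blocked_by": None,
    "durability.status": "ready",
    "proof.status": "ready",
}


def get_path(data: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return _MISSING if absent."""
    current = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def check_summary(summary: Dict[str, Any]) -> List[str]:
    """Return human-readable mismatches; an empty list means all fields match."""
    problems: List[str] = []
    for path, expected in EXPECTED.items():
        actual = get_path(summary, path)
        if actual is _MISSING:
            problems.append(f"{path}: missing")
        elif actual != expected:
            problems.append(f"{path}: expected {expected!r}, got {actual!r}")
    return problems


def check_summary_file(path: str) -> List[str]:
    """Load a summary JSON file and check it against EXPECTED."""
    with open(path, encoding="utf-8") as handle:
        return check_summary(json.load(handle))
```

The redacted summary may not carry every field the local summary does, so a "missing" entry there is not necessarily a failure.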

Non-claims

  • This does not prove the WebMCP runtime dogfood path.
  • This does not promote a new diagnostic or action to H5.
  • This does not prove every mcp_flutter workflow is agent-operable.
  • This does not prove remote equivalence. source.commit is treated as the benchmark subject commit; a later local HEAD can differ from it and will produce a warning.
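
The last non-claim can be illustrated with a small comparison between a recorded subject commit and the local HEAD. This is a sketch of the idea only, not steward's own check: `local_head` and `head_warning` are hypothetical helpers, and the warning text is invented for illustration.

```python
import subprocess
from typing import Optional


def local_head(repo_dir: str = ".") -> str:
    """Return the full commit hash of HEAD in repo_dir via git rev-parse."""
    completed = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def head_warning(subject_commit: str, head: str) -> Optional[str]:
    """Return a warning string if head differs from the subject commit.

    Only full hashes are compared. A match says nothing about whether
    either commit exists on a remote.
    """
    subject = subject_commit.strip().lower()
    current = head.strip().lower()
    if subject == current:
        return None
    return (
        f"local HEAD {current[:12]} differs from "
        f"benchmark subject commit {subject[:12]}"
    )
```

For example, `head_warning(subject, local_head())` returns `None` when the worktree is still at the benchmarked commit and a message otherwise; in both cases the comparison is purely local.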