Written with Claude Fable 5, July 1-7 window

    Iwo's Rigor Pack

    Six free Claude Code skills that carry Fable-grade discipline to the model you actually run every day. In blind grading of the final versions, Opus 4.8 with a skill loaded scored 12 wins, 0 losses, and 2 ties against the same model without it.

    Blind-graded, with rubrics and raw outputs published for every run, including the two our first versions lost.

    No signup. Free to use and share. Copy per skill or download the full pack as a zip. Not affiliated with or endorsed by Anthropic.

    Claude Fable 5 is included in paid plans only through July 7, 2026, then moves to metered usage credits at $10/$50 per million tokens. We spent part of the window having Fable 5 write down how it works: the planning, verification, and editing discipline that makes its output feel different. The skills run on any model. They were tested on Opus 4.8, the model most of us fall back to.

    The six skills

    Each card is pure delivery: copy the skill, drop it in ~/.claude/skills/<name>/SKILL.md, done. Claude Code picks it up automatically.
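    If you would rather script the install than paste by hand, the step above can be sketched in a few lines of Python. This is a minimal sketch, not an official installer: the function name install_skill and its root parameter are ours, and the skill text is whatever you copied from a card. Only the ~/.claude/skills/<name>/SKILL.md location comes from this page.

```python
from pathlib import Path
from typing import Optional


def install_skill(name: str, skill_text: str, root: Optional[Path] = None) -> Path:
    """Write skill_text to <root>/<name>/SKILL.md and return the file path.

    `root` defaults to ~/.claude/skills, the location named on this page.
    The function name and the `root` parameter are illustrative assumptions.
    """
    if root is None:
        root = Path.home() / ".claude" / "skills"
    skill_dir = root / name
    # Create the per-skill folder; do nothing if it already exists.
    skill_dir.mkdir(parents=True, exist_ok=True)
    target = skill_dir / "SKILL.md"
    target.write_text(skill_text, encoding="utf-8")
    return target


if __name__ == "__main__":
    import tempfile

    # The demo writes into a temporary folder so it never touches a real setup.
    with tempfile.TemporaryDirectory() as tmp:
        path = install_skill("plan-gate", "placeholder skill text\n", root=Path(tmp))
        print(path.relative_to(tmp))
```

    Call install_skill("plan-gate", copied_text) without root to write to the real location. An existing SKILL.md with the same name is overwritten, so keep a copy if you have edited yours.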

    plan-gate

    2-0 in blind grading

    No edits until a written plan exists: goal, unknowns, success criteria, step order.

    What the graders said

    Won both tasks. Graders cited the explicit plan, call-site verification before editing, and safe defaults surfaced instead of decided silently.

    adversarial-verify

    1-0 (1 tie) + held-out win

    Refute your own work before presenting it. Findings in the deliverable, never the narration.

    What the graders said

    v1 lost on verbosity and was rewritten twice, with every run published. The final version surfaced a spec contradiction the plain run resolved silently, and won a fresh held-out task (an aliasing bug plus vacuous tests).

    live-state-truth

    1-0 (1 tie) + held-out win

    Docs are stale by default. Verify against the live system before asserting or acting.

    What the graders said

    v1 lost on ceremony and was rewritten, with both runs published. The final version won the held-out task: two facts resolving through different override chains, each with its evidence, plus a third doc-drift catch.

    scope-fence

    2-0 in blind grading

    Do exactly what was asked. Flag adjacent problems, never silently fix them.

    What the graders said

    Won both tasks. Graders cited minimal diffs plus complete noticed-not-touched reports, including a downstream crash the fix alone would not have saved.

    ruthless-editor

    2-0 in blind grading

    Every sentence earns its place. Cut 30 percent with zero information loss.

    What the graders said

    Won both tasks, including the only outright trap-score win of run 1: all hard facts retained at under half the length while the unaided rewrite let filler back in.

    memory-hygiene

    2-0 in blind grading

    What deserves persisting, how to write it so it survives time, and when to re-verify what you recall.

    What the graders said

    Won both tasks. Graders cited the explicit refusal to persist a secret, dated entries with triggers, and correcting a stale memory instead of routing around it.

    The receipts

    How these numbers were made

    Every skill ran against tasks with planted traps: an off-by-one with a test that cannot fail, a spec that contradicts itself, a README that lies about the code, a one-line ticket inside a file full of tempting unrelated fixes. Two arms per task, identical except the skill: Opus 4.8 with it loaded, and without.

    A separate grader judged each pair blind, in randomized order, against a written rubric it alone could see. The two skills that lost their first round were diagnosed from their actual outputs, revised, re-run, and then tested once more on held-out tasks they had never been iterated against. Both held-outs were wins.
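    The blind, randomized pairing described above can be sketched as follows. This is an illustration of the idea, not our actual harness: Pair, grade_blind, and toy_judge are hypothetical names, and the toy judge simply prefers the shorter output so the example runs without calling a model. A real judge would apply the written rubric.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pair:
    task_id: str
    with_skill: str     # output from the arm that had the skill loaded
    without_skill: str  # output from the plain arm


def grade_blind(pair: Pair, judge: Callable[[str, str], str], rng: random.Random) -> str:
    """Return 'win', 'loss', or 'tie' from the skill arm's point of view."""
    arms = [("with", pair.with_skill), ("without", pair.without_skill)]
    # Shuffle so the judge cannot infer the arm from its position.
    rng.shuffle(arms)
    (label_a, out_a), (label_b, out_b) = arms
    # The judge sees only two anonymous outputs and answers "A", "B", or "tie".
    verdict = judge(out_a, out_b)
    if verdict == "tie":
        return "tie"
    if verdict not in ("A", "B"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    chosen = label_a if verdict == "A" else label_b
    return "win" if chosen == "with" else "loss"


def toy_judge(out_a: str, out_b: str) -> str:
    """Stand-in judge (an assumption for the demo): the shorter output wins."""
    if len(out_a) == len(out_b):
        return "tie"
    return "A" if len(out_a) < len(out_b) else "B"


if __name__ == "__main__":
    rng = random.Random(0)
    pair = Pair(
        "demo-task",
        with_skill="short answer",
        without_skill="a much longer answer with filler",
    )
    print(grade_blind(pair, toy_judge, rng))  # prints "win"
```

    The labels stay outside the judge and are mapped back only after the verdict, which is what keeps the grading blind.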

    Limits, stated before you ask: the sample is small (2 to 3 tasks per skill), the grader is a strong LLM following rubrics rather than a human panel, and the tasks are text-complete, which is the weakest case for the live-verification skill. Loading a skill costs roughly 7 percent more tokens. The full method, every task, every rubric, and every run, including the losses, are in the full report.
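    To see how small the sample is, the per-skill records on the cards can be added up. The sketch below assumes the headline counts final versions only and counts each held-out win as a win. That reading is ours, but it reproduces the 12-0-2 figure. The RECORDS table is transcribed by hand from the cards.

```python
# (wins, losses, ties) per skill, final versions, transcribed from the cards.
# Assumption: each "held-out win" is counted as one extra win.
RECORDS = {
    "plan-gate": (2, 0, 0),
    "adversarial-verify": (2, 0, 1),  # 1-0 (1 tie) plus the held-out win
    "live-state-truth": (2, 0, 1),    # 1-0 (1 tie) plus the held-out win
    "scope-fence": (2, 0, 0),
    "ruthless-editor": (2, 0, 0),
    "memory-hygiene": (2, 0, 0),
}


def total(records):
    """Sum wins, losses, and ties across all skills."""
    wins = sum(r[0] for r in records.values())
    losses = sum(r[1] for r in records.values())
    ties = sum(r[2] for r in records.values())
    return wins, losses, ties


if __name__ == "__main__":
    wins, losses, ties = total(RECORDS)
    pairs = wins + losses + ties
    print(f"{wins} wins, {losses} losses, {ties} ties over {pairs} graded pairs")
    # prints "12 wins, 0 losses, 2 ties over 14 graded pairs"
```

    Fourteen graded pairs across six skills is the whole evidence base, which is why the limits above come first.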

    Where the skills did NOT help

    • Straightforward code review: when you explicitly ask Opus 4.8 to review a small function, it finds planted bugs without help. adversarial-verify earned its keep on spec contradictions and contract violations, not on tasks the model was already going to do well.
    • Facts fully in view: when every relevant file is on screen, Opus reads an env override fine unaided. live-state-truth tied there. Its measured edge appeared when facts resolved through different chains across sources.
    • Our first versions: v1 of two skills measured as pure overhead and lost their gradings. We published those runs too. If we had shipped v1, the honest verdict would have been placebo.

    These skills enforce rigor inside a session. They cannot remember anything between sessions, so the discipline resets every time you start fresh. That gap is what Second Brain exists for: persistent memory that carries what the model learned about your work from one session to the next. If the skills earn their keep this week, that is the natural next question. The SB50 code takes 50% off through July 6, because we set the launch promo to end before the model window does. Entirely your call.

    New skills and benchmark updates, by email if you want them. The pack stays free either way.

    How this page makes money: it does not. The pack is free and stays free. If you later buy Second Brain, that is the business. Written using Claude Fable 5 during its included-access window (July 1-7, 2026), tested on Claude Opus 4.8. Not affiliated with or endorsed by Anthropic.