Test Integrity¶

When a test fails, the path of least resistance is to weaken the test: replace the assertion with assert True, swap an integration call for a Mock(), or sprinkle @pytest.mark.skip to make red go green. You want an LLM reviewer that catches these patterns at the moment the test edit lands, with full diff context.

from __future__ import annotations

from captain_hook import (
    BaseHookEvent,
    Event,
    HookResult,
    Prompt,
    SourceEdits,
    TestFile,
    Tool,
    on,
    prompt_check,
)

INTEGRITY_TEMPLATE = """
You are reviewing a test edit for signs the agent weakened tests to make them pass.

Block if you see any of:
- An assertion replaced by `assert True`, `pass`, or a no-op.
- A real call replaced by a `Mock()` that defeats the test's purpose.
- A bulk addition of `@pytest.mark.skip` or `pytest.skip(...)` without justification.
- An integration boundary (DB, HTTP, file I/O) swapped for a stub.

File: {fp}

--- old ---
{old}
--- new ---
{new}
"""


@on(Event.PostToolUse, only_if=[SourceEdits(lang="py", include_tests=True), TestFile(), Tool("Edit")])
def guard_test_edits(evt: BaseHookEvent) -> HookResult | None:
    if not (fp := evt.file) or not (old := evt.old) or not (new := evt.content):
        return None
    return prompt_check(
        evt,
        Prompt.from_template(INTEGRITY_TEMPLATE, fp=fp.path, old=old, new=new),
        prefix="TEST INTEGRITY",
        suffix=" If unsure whether the change weakens the test, allow.",
    )

What to learn: prompt_check() runs an inline LLM evaluation and returns a HookResult | None you can pass straight back from an @on handler. Prompt.from_template(TEMPLATE, **vars) renders the template with str.format-style substitution and dedents the block. The SourceEdits(lang="py", include_tests=True) + TestFile() combo restricts the hook to test file edits only, which is the only slice where weakening matters.