Workflows
Workflows enforce multi-step checklists before an agent is allowed to stop. They are the right choice when you need to guarantee that a sequence of actions has been completed -- running linters, executing tests, generating artifacts -- before work is considered done.
When to use workflows
Use a workflow when:
- Multiple steps must be completed in a session (not just one gate condition)
- You need to validate that files were created on disk
- You want to run a callback after all steps pass
- The agent should receive specific "next step" instructions when blocked
For single-condition gates, use gate() instead. Workflows are for ordered checklists.
Registration
Register a workflow with workflow(). It creates a blocking hook on SubagentStop with max_fires=1.
```python
from captain_hook import workflow, Step, text_matches

workflow(
    label="QA-GATE",
    marker="QA COMPLETE",
    steps=[
        Step(
            name="run linter",
            check=text_matches(r"ruff check"),
            stopped_at="Linter has not been run.",
            next_step="Run: ruff check src/",
        ),
        Step(
            name="run tests",
            check=text_matches(r"pytest.*passed"),
            stopped_at="Tests have not been run.",
            next_step="Run: pytest tests/ -x",
        ),
    ],
)
```
Parameters:

- `label` -- prefix for all messages from this workflow (e.g., `"QA-GATE"`)
- `marker` -- text that must appear in the transcript to indicate completion
- `steps` -- ordered list of `Step` objects to validate
- `artifacts` -- list of `Artifact` objects for file validation (optional)
- `post_complete` -- callback `(evt) -> HookResult | None` invoked after all checks pass (optional)
- `tests` -- inline test dict (optional)
How workflows resolve
When a subagent attempts to stop, the workflow guard runs this evaluation pipeline:
```mermaid
flowchart TD
    A[SubagentStop event] --> B{Marker in transcript?}
    B -->|No| C[Find first incomplete step]
    C --> D[Block: label INCOMPLETE - stopped_at + next_step]
    B -->|Yes| E{All artifacts valid?}
    E -->|No| F[Block: label INCOMPLETE - artifact error]
    E -->|Yes| G{post_complete defined?}
    G -->|No| H[Allow stop]
    G -->|Yes| I[Run post_complete callback]
    I --> H
```
Step 1: Marker check
The guard first checks whether the marker string appears anywhere in the transcript's full text. If found, the workflow skips step evaluation and proceeds directly to artifact validation.
The marker is a string the agent is expected to output (or that appears in tool output) when the workflow is complete. Choose a distinctive marker that won't appear accidentally.
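A minimal sketch of the marker check follows. It assumes the check is a plain, case-sensitive substring test over the transcript's full text, which is consistent with the case-sensitivity note under Tips. `marker_present` is a hypothetical helper for illustration and is not part of captain_hook. The last line shows why a short marker such as "done" is risky: it matches incidental text.

```python
def marker_present(full_text: str, marker: str) -> bool:
    # Hypothetical stand-in for the guard's marker check:
    # a case-sensitive substring search anywhere in the text.
    return marker in full_text


transcript = "Linter clean.\nTests passed.\nQA COMPLETE"

print(marker_present(transcript, "QA COMPLETE"))    # True
print(marker_present(transcript, "qa complete"))    # False: casing differs
# A vague marker fires on incidental text:
print(marker_present("I'm not done yet.", "done"))  # True
```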
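Step 2: Step evaluation
If the marker is absent, the guard iterates through the steps in order. The first step whose check predicate returns False blocks the stop. The resulting message is prefixed with the workflow label (LABEL INCOMPLETE: ...) and combines that step's stopped_at and next_step text. Later steps are not reported until the earlier ones pass.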
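The first-incomplete-step rule can be sketched in plain Python. This sketch makes two assumptions. First, `StepSketch` is a hypothetical mirror of the four `Step` fields, but its `check` receives the transcript text (a `str`) rather than a `Transcript`. Second, the exact wording produced by `block_message` is a guess modeled on the `LABEL INCOMPLETE: ...` format used elsewhere on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class StepSketch:
    # Hypothetical mirror of Step; check receives plain text here.
    name: str
    check: Callable[[str], bool]
    stopped_at: str
    next_step: str


def first_incomplete(steps: Sequence[StepSketch], text: str) -> Optional[StepSketch]:
    # Steps are evaluated in order; the first failing check wins.
    for step in steps:
        if not step.check(text):
            return step
    return None


def block_message(label: str, step: StepSketch) -> str:
    # Assumed layout: label prefix, then stopped_at, then next_step.
    return f"{label} INCOMPLETE: {step.stopped_at} {step.next_step}"


steps = [
    StepSketch("run linter", lambda t: "ruff check" in t,
               "Linter has not been run.", "Run: ruff check src/"),
    StepSketch("run tests", lambda t: "passed" in t,
               "Tests have not been run.", "Run: pytest tests/ -x"),
]

pending = first_incomplete(steps, "$ ruff check src/")
print(pending.name)                       # run tests
print(block_message("QA-GATE", pending))
# QA-GATE INCOMPLETE: Tests have not been run. Run: pytest tests/ -x
```

If neither command has run, the linter step is reported first, because evaluation stops at the first failure.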
Step 3: Artifact validation
After the marker is found, each Artifact is validated:
- The file at `artifact.path` must exist on disk
- The file must parse as valid JSON matching the artifact's Pydantic `model`
- The optional `validate` callback must return `None` (any string return is treated as an error message)
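The three artifact checks run in order and stop at the first failure. The sketch below reproduces that order using only the standard library. A hypothetical `parse` callable stands in for Pydantic model validation. Here it is a dataclass constructor, which rejects missing or extra keys but does not coerce types. `ArtifactSketch` and `check_artifact` are illustrative names, not captain_hook API. The error strings are also illustrative, except for the `not found` wording, which matches the example message later on this page.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ArtifactSketch:
    # Hypothetical mirror of Artifact; `parse` replaces the Pydantic model.
    path: str
    parse: Callable[[Any], Any]
    validate: Optional[Callable[[Any], Optional[str]]] = None


def check_artifact(artifact: ArtifactSketch, root: str) -> Optional[str]:
    """Return an error message, or None if the artifact is valid."""
    full_path = os.path.join(root, artifact.path)
    # 1. The file must exist.
    if not os.path.exists(full_path):
        return f"{artifact.path} not found"
    # 2. The content must be JSON that fits the model.
    try:
        with open(full_path, encoding="utf-8") as fh:
            obj = artifact.parse(json.load(fh))
    except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
        return f"{artifact.path} invalid: {exc}"
    # 3. The optional validator must return None.
    if artifact.validate is not None:
        return artifact.validate(obj)
    return None


@dataclass
class LintReport:
    errors: int
    warnings: int


lint = ArtifactSketch(
    path="reports/lint.json",
    parse=lambda data: LintReport(**data),
    validate=lambda r: "Too many errors" if r.errors > 0 else None,
)

with tempfile.TemporaryDirectory() as root:
    missing = check_artifact(lint, root)
    os.makedirs(os.path.join(root, "reports"))
    report_path = os.path.join(root, lint.path)
    with open(report_path, "w", encoding="utf-8") as fh:
        json.dump({"errors": 2, "warnings": 0}, fh)
    failing = check_artifact(lint, root)
    with open(report_path, "w", encoding="utf-8") as fh:
        json.dump([{"code": "F401"}], fh)  # a JSON list, not an object
    malformed = check_artifact(lint, root)

print(missing)  # reports/lint.json not found
print(failing)  # Too many errors
```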
Step 4: Post-complete callback
Once the marker is present and every artifact validates, the optional post_complete callback runs. It receives the event and can return a HookResult for a final gate or warning, or None to allow the stop.
Step objects
Each Step represents one checkpoint in the workflow.
```python
from captain_hook import Step, text_matches

Step(
    name="run tests",
    check=text_matches(r"pytest.*\d+ passed"),
    stopped_at="Tests have not been run.",
    next_step="Run: pytest tests/ -x",
)
```
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Human-readable name for the step (used in logging) |
| `check` | `(Transcript) -> bool` | Predicate that returns `True` when the step is complete |
| `stopped_at` | `str` | Message explaining what is incomplete |
| `next_step` | `str` | Instruction telling the agent what to do next |
The check function receives the full Transcript object. Use text_matches() for simple regex checks against the transcript text, or write a custom predicate for more complex logic:
```python
from captain_hook import Step, Transcript


def has_enough_tests(t: Transcript) -> bool:
    return t.tool_uses.where(name="Edit", file="test_*.py").count() >= 3


Step(
    name="write tests",
    check=has_enough_tests,
    stopped_at="Not enough test files have been written.",
    next_step="Write at least 3 test files before stopping.",
)
```
text_matches
A convenience factory that creates a check predicate from a regex pattern. It searches the transcript's full_text property.
```python
from captain_hook import text_matches

# Matches if "ruff check" appears anywhere in the transcript
check = text_matches(r"ruff check")

# Matches if pytest reports passing tests
check = text_matches(r"pytest.*\d+ passed")

# Matches if a specific command was run and "ok" follows it
check = text_matches(r"uv run mtest.*ok")
```
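A rough equivalent of text_matches is useful for reasoning about what a pattern will match. This is a sketch, not the library's source: `text_matches_sketch` and its `flags` parameter are hypothetical. This page also does not say whether the real text_matches uses `re.search` or sets flags such as `re.DOTALL`. The example shows why that matters. By default `.` does not match a newline, so `pytest.*\d+ passed` fails when the command and its summary line are on different lines.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeTranscript:
    # Stand-in for captain_hook's Transcript; only full_text is modeled.
    full_text: str


def text_matches_sketch(pattern: str, flags: int = 0) -> Callable[[FakeTranscript], bool]:
    # Compile once, then search anywhere in the transcript text.
    compiled = re.compile(pattern, flags)

    def check(t: FakeTranscript) -> bool:
        return compiled.search(t.full_text) is not None

    return check


t = FakeTranscript("$ pytest tests/ -x\ncollected 12 items\n===== 12 passed in 0.53s =====")

print(text_matches_sketch(r"ruff check")(t))                     # False
print(text_matches_sketch(r"pytest.*\d+ passed")(t))             # False: '.' stops at newline
print(text_matches_sketch(r"pytest.*\d+ passed", re.DOTALL)(t))  # True
```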
Artifacts
Artifact objects validate that files exist on disk and contain valid data. They are checked after the marker is found in the transcript.
```python
from pydantic import BaseModel

from captain_hook import workflow, Step, Artifact, text_matches


class LintReport(BaseModel):
    errors: int
    warnings: int


workflow(
    label="QUALITY",
    marker="QUALITY CHECK DONE",
    steps=[
        Step(
            name="run linter",
            check=text_matches(r"ruff check"),
            stopped_at="Run the linter.",
            # ruff's JSON output is a list of diagnostics, not an object with
            # errors/warnings counts, so the agent writes a summary instead.
            next_step='Run: ruff check src/, then write {"errors": N, "warnings": M} to reports/lint.json',
        ),
    ],
    artifacts=[
        Artifact(
            path="reports/lint.json",
            model=LintReport,
            validate=lambda r: "Too many errors" if r.errors > 0 else None,
        ),
    ],
)
```
| Field | Type | Description |
|---|---|---|
| `path` | `str` | File path to check (relative to the project root) |
| `model` | `type[BaseModel]` | Pydantic model to validate the file's JSON content |
| `validate` | `(M) -> str \| None` | Optional validator; return a string to block, `None` to pass |
Validation sequence:
- The file must exist at `path`.
- The file content must parse as JSON and validate against `model`.
- If `validate` is provided, it must return `None`.
Any failure blocks the stop with a message prefixed by the workflow label. For example, a missing file produces: QUALITY INCOMPLETE: reports/lint.json not found.
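Because validate receives the already-parsed model instance, you can test it without running an agent. In this sketch, a standard-library dataclass stands in for the Pydantic LintReport so the code runs without dependencies. `lint_validator` is a hypothetical named version of the lambda in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LintReport:
    # Stand-in for the Pydantic model used in the Artifact example.
    errors: int
    warnings: int


def lint_validator(report: LintReport) -> Optional[str]:
    # Same rule as the lambda: any error blocks, warnings are tolerated.
    return "Too many errors" if report.errors > 0 else None


assert lint_validator(LintReport(errors=0, warnings=7)) is None
assert lint_validator(LintReport(errors=1, warnings=0)) == "Too many errors"
```

With Pydantic installed, you would pass instances of the real model instead. The validator body stays the same.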
post_complete callback
Run a final check or side effect once the marker is found and all artifacts pass:
```python
from captain_hook import workflow, Step, text_matches, HookResult, Action


def final_review(evt):
    edit_count = evt.ctx.t.tool_uses.where(name="Edit").count()
    if edit_count > 20:
        return HookResult(
            action=Action.warn,
            message=f"Large changeset ({edit_count} edits) -- consider splitting the PR.",
        )
    return None


workflow(
    label="REVIEW",
    marker="REVIEW COMPLETE",
    steps=[
        Step(
            name="self-review",
            check=text_matches(r"self.review|code review"),
            stopped_at="Perform a self-review.",
            next_step="Review your changes against the style guide.",
        ),
    ],
    post_complete=final_review,
)
```
The callback receives the event and returns HookResult | None. A block result prevents the stop; a warn result adds advisory context; None allows the stop.
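The three outcomes can be modeled as a small decision function. This is an illustration only. `ResultSketch` and `apply_post_complete` are hypothetical stand-ins for HookResult and the guard's internal handling. Actions are represented as the strings "warn" and "block" rather than the library's Action enum.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class ResultSketch:
    action: str  # "warn" or "block" in this sketch
    message: str


def apply_post_complete(
    callback: Callable[[Any], Optional[ResultSketch]], evt: Any
) -> Tuple[bool, Optional[str]]:
    """Return (stop_allowed, message_for_agent)."""
    result = callback(evt)
    if result is None:
        return True, None             # nothing to add: allow the stop
    if result.action == "warn":
        return True, result.message   # allow, but surface advisory text
    if result.action == "block":
        return False, result.message  # prevent the stop
    raise ValueError(f"unexpected action: {result.action!r}")


def final_review(edit_count: int) -> Optional[ResultSketch]:
    # Mirrors the example above, with the event reduced to an edit count.
    if edit_count > 20:
        return ResultSketch("warn", f"Large changeset ({edit_count} edits)")
    return None


print(apply_post_complete(final_review, 5))   # (True, None)
print(apply_post_complete(final_review, 42))  # (True, 'Large changeset (42 edits)')
```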
Tips
Naming conventions. Use uppercase labels like "QA-GATE", "REVIEW", "DEPLOY-CHECK" so the block messages stand out in the transcript. The label is prepended to all messages: QA-GATE INCOMPLETE: ....
Marker placement. The marker should be something the agent explicitly outputs after completing the workflow, not something that appears incidentally. "QA COMPLETE" is better than "done".
Debugging workflows. When a workflow unexpectedly blocks, check:
- Is the marker text exactly as specified (case-sensitive)?
- Does the `check` predicate match the actual transcript text? Use `text_matches()` with a regex that accounts for formatting variation.
- For artifacts, does the file exist at the exact `path`, and does its content validate against the `model`?
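The first two questions can be checked offline against saved transcript text. The helper below is a hypothetical debugging aid, not part of captain_hook. It assumes the marker is matched as a case-sensitive substring and that step patterns behave like `re.search` with default flags, which may differ from the library. Unlike the guard, which stops at the first failing step, it reports every failing check at once.

```python
import re
from typing import Dict, List


def diagnose(full_text: str, marker: str, patterns: Dict[str, str]) -> List[str]:
    """List likely reasons a workflow would block on this text."""
    problems: List[str] = []
    if marker not in full_text:
        if marker.lower() in full_text.lower():
            problems.append(f"marker {marker!r} appears only with different casing")
        else:
            problems.append(f"marker {marker!r} not found")
    for name, pattern in patterns.items():
        if re.search(pattern, full_text) is None:
            problems.append(f"step {name!r}: pattern {pattern!r} did not match")
    return problems


saved = "$ ruff check src/\nAll checks passed!\nqa complete"
report = diagnose(saved, "QA COMPLETE", {
    "run linter": r"ruff check",
    "run tests": r"pytest.*passed",
})
for problem in report:
    print(problem)
# marker 'QA COMPLETE' appears only with different casing
# step 'run tests': pattern 'pytest.*passed' did not match
```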
Single fire. Workflows register with max_fires=1 on SubagentStop. Once the workflow allows the stop, it will not fire again in the same session.
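Here is a toy model of the single-fire behavior described above. It assumes, as the paragraph states, that it is the allowed stop that exhausts the hook. This page does not document how captain_hook counts fires internally, so treat `SingleFireGuard` as a hypothetical illustration. One consequence it makes visible: later SubagentStop events in the same session are no longer gated by this workflow.

```python
from typing import Callable


class SingleFireGuard:
    """Hypothetical model: blocks repeat freely; the first allow retires the hook."""

    def __init__(self, is_complete: Callable[[str], bool]) -> None:
        self.is_complete = is_complete
        self.retired = False

    def on_subagent_stop(self, text: str) -> str:
        if self.retired:
            return "skipped"  # hook no longer fires this session
        if not self.is_complete(text):
            return "block"
        self.retired = True   # allowing the stop uses up the single fire
        return "allow"


guard = SingleFireGuard(lambda text: "QA COMPLETE" in text)
print(guard.on_subagent_stop("working..."))       # block
print(guard.on_subagent_stop("QA COMPLETE"))      # allow
print(guard.on_subagent_stop("second subagent"))  # skipped
```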