When to Use What: Choosing the Right Claude Code Harness for the Task

Hooks, skills, CLAUDE.md, MCPs, sub-agents, plan mode. A practical decision framework for picking the piece of the harness that actually earns its keep on the task in front of you.

Eli Wood

April 23, 2026 6 min read

Every Claude Code session we run at Black Flag Design happens inside a harness: hooks, skills (slash commands), project CLAUDE.md, MCP servers, sub-agents, plan mode. Each of these pieces is a different shape of leverage, and each one has a setup cost. The most common mistake we see, in our own work and on client teams, is reaching for the wrong piece, or for all of them at once, on the task at hand.

This is the decision framework we actually use. The question on every task is the same: what is the smallest harness change that will make this session land cleanly?

1. Plan mode is for scope you don't trust yet

Plan mode earns its keep when you don't yet know the shape of the diff. The prompt is fuzzy. The blast radius is unclear. You're one wrong turn away from a sprawling commit nobody wants to review.

The tell is simple: can you describe the change in one sentence? If not, start in plan mode. Let Claude Code propose the moves, push back on the scope, and only exit the plan once the diff fits in your head. Running plan mode on a one-line fix is overhead. Skipping it on a cross-cutting refactor is how you end up with a 40-file diff at 4:30pm on a Friday.

What to do: default to plan mode when the prompt is larger than the patch. Skip it when you already know exactly which file, which function, and which acceptance check.

2. Skills are for disciplines you run more than twice

A slash command (skill) is a frozen prompt plus a frozen workflow. Every skill costs you an hour or two to write well. It pays back every time you run it.

The rule we use: if we've done the same kind of work three times and the second and third runs felt like we were re-explaining ourselves, the fourth run gets a skill. Meeting notes to action items. PRD drafts. Feedback triage. Weekly client updates. Those are skills in our repo because they happen on a rhythm.

Don't make a skill out of a one-off. Don't make a skill out of a task that changes shape every time you touch it. Skills are for disciplines, not novelty.

What to do: track repeated prompts for a week. The ones you typed three times become skills. The ones you typed once stay as prompts.
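The weekly tally can be as simple as a counter over a prompt log. This is a minimal sketch, assuming you keep the week's prompts as plain strings somewhere of your own; the `normalize` rule, the `skill_candidates` name, and the sample prompts are hypothetical, and the threshold of three is the rule of thumb above.

```python
from collections import Counter


def normalize(prompt: str) -> str:
    """Collapse case and whitespace so near-identical prompts count together."""
    return " ".join(prompt.lower().split())


def skill_candidates(prompts: list[str], threshold: int = 3) -> list[str]:
    """Return normalized prompts seen at least `threshold` times, most frequent first."""
    counts = Counter(normalize(p) for p in prompts)
    return [p for p, n in counts.most_common() if n >= threshold]


# Hypothetical week of prompts, for illustration only.
week_log = [
    "Turn these meeting notes into action items",
    "turn these meeting notes into action items",
    "Turn these meeting notes into  action items",
    "Draft the weekly client update",
    "Draft the weekly client update",
    "Rename the hero component",
]

print(skill_candidates(week_log))
# ['turn these meeting notes into action items']
```

Exact-match counting will miss prompts that are worded differently but mean the same thing, so treat the output as a shortlist to review rather than a verdict.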

3. CLAUDE.md is for the rules of the house

CLAUDE.md lives in the repo and loads into every session in that repo, automatically. That is its power and its trap. Everything you put there costs attention on every single run.

So the bar is high: CLAUDE.md is for rules that apply to every task in this codebase. Which design tokens to use. Which folders are read-only. Which conventions matter. Which clients and their stakeholders the repo serves. Short, declarative, pointer-heavy. Not a place for task instructions or one-time context.

If a rule only applies to some sessions, it belongs in a skill or a prompt, not here. If it applies to everyone every time, write it once in CLAUDE.md and stop re-explaining it.

What to do: audit your CLAUDE.md monthly. Move out anything that doesn't apply to every session in that repo. The file should get shorter over time, not longer.

4. Hooks are for rules the model should not be allowed to break

Hooks run deterministically, outside the model's control. A pre-commit hook that blocks a missing test. A post-edit hook that runs the linter. A pre-tool hook that rejects a hardcoded color.

The distinction that matters: CLAUDE.md is guidance; hooks are enforcement. Guidance drifts. Enforcement doesn't. Any rule where you cannot accept a single exception (security, licensing, design-token purity, destructive-action gates) should be a hook, not a paragraph in a markdown file.

The cost is real: hooks need maintenance, they can fire in the wrong place, and a badly written hook slows every session in the repo. Keep them narrow, loud when they trigger, and easy to bypass when a human explicitly decides to.

What to do: write a hook the second time the same rule gets violated in a review. Once is a lesson. Twice is evidence it needs to be mechanical.
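To make the hardcoded-color example concrete, here is a minimal sketch of the check a pre-tool hook script might run. It rests on assumptions you should verify against the Claude Code hooks reference: that the hook receives the pending tool call as JSON on stdin, that the new text arrives under `tool_input.content` (for a write) or `tool_input.new_string` (for an edit), and that exit code 2 signals "block this call" with stderr passed back to the model. The function names and the hex-color rule are hypothetical.

```python
import json
import re
import sys

# Hypothetical house rule: no raw hex colors in edits; use a design token instead.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}\b")


def find_hardcoded_colors(text: str) -> list[str]:
    """Return every 3- or 6-digit hex color literal found in the text."""
    return HEX_COLOR.findall(text)


def check_tool_call(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message). Assumption: exit code 2 means 'block'."""
    tool_input = payload.get("tool_input", {})
    # Assumed field names: a write sends `content`, an edit sends `new_string`.
    new_text = tool_input.get("content") or tool_input.get("new_string") or ""
    colors = find_hardcoded_colors(new_text)
    if colors:
        found = ", ".join(colors)
        return 2, f"Blocked: hardcoded color(s) {found}. Use a design token."
    return 0, ""


def main() -> int:
    """Read one tool call from stdin and report the verdict on stderr."""
    code, message = check_tool_call(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # loud when it triggers
    return code


# When saved as a hook script, finish with: sys.exit(main())
```

The check is deliberately narrow: one rule, one clear message, no side effects. That keeps the per-session cost low and makes a false positive easy to diagnose.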

5. MCPs are for reaching into real systems, not for reading files

An MCP server puts a live system (Linear, Gmail, your CMS, your database) inside the session's reach. The value is the live part. The moment Claude Code can query the real ticket, the real inbox thread, the real blog draft, you stop watching it hallucinate what those things contain.

But MCPs are not a substitute for the filesystem. If the information lives in the repo, read the repo. If it lives in a system outside the repo, and you'd otherwise paste a screenshot or a CSV into the prompt, that's the MCP signal.

We add an MCP when the same external system shows up in three different sessions and the workaround is getting painful. We don't add one because it exists.

What to do: list the external systems you paste into prompts this week. Three or more paste-ins of the same system is the threshold for an MCP.

6. Sub-agents are for parallel, independent, scoped work

Sub-agents (spawning a teammate for a specific slice of work) are the piece most teams over-reach with. They shine when the work splits cleanly into independent chunks: five meetings to process, three clients to update, six components to audit. One sub-agent per slice, explicit scope, no shared state beyond a task list.

They fail when the work is sequential, or when the slices depend on each other's output. Two sub-agents fighting over the same files is slower than one session doing them in order. We've burned enough hours on git lock contention and conflicting edits to believe this firmly.

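We wrote up a year of Claude Code trails (sessions, commits, PRs, review notes) in "What a Year of Claude Code Trails Tells You About Your Team," and the pattern held: the teams that got the most out of sub-agents were also the strictest about scoping them.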

What to do: spawn sub-agents only when the work is genuinely parallel and the slices don't touch each other. One teammate per client, one per transcript, one per component. Never one per "phase."
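A quick way to test "the slices don't touch each other" before spawning anything is to list the files each slice expects to edit and look for overlap. The sketch below assumes you can write that list down up front; the function names, slice names, and file paths are hypothetical.

```python
from itertools import combinations


def find_overlaps(slices: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of slices that would edit at least one common file."""
    overlaps = []
    for (a, files_a), (b, files_b) in combinations(slices.items(), 2):
        shared = files_a & files_b
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


def safe_to_parallelize(slices: dict[str, set[str]]) -> bool:
    """True only when no two slices share a file."""
    return not find_overlaps(slices)


# One sub-agent per component: no shared files, so the split is clean.
per_component = {
    "button": {"src/Button.tsx"},
    "modal": {"src/Modal.tsx"},
    "nav": {"src/Nav.tsx"},
}

# One sub-agent per "phase": both phases edit the same file.
per_phase = {
    "scaffold": {"src/Nav.tsx", "src/theme.ts"},
    "polish": {"src/Nav.tsx"},
}

print(safe_to_parallelize(per_component))  # True
print(find_overlaps(per_phase))            # [('scaffold', 'polish', {'src/Nav.tsx'})]
```

File overlap is only the mechanical half of the test. Slices can also depend on each other's output without sharing a file, and that still calls for one session working in order.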


The one-paragraph version

Match the harness piece to the shape of the task. Plan mode when the scope is fuzzy. Skills when the discipline repeats. CLAUDE.md for rules that apply every run. Hooks when the rule must be mechanical. MCPs when the information lives outside the repo. Sub-agents only when the work parallelizes cleanly. Everything else is a plain session with a tight prompt, which is still the right answer more often than people admit.
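The same mapping reads naturally as a checklist. This is an illustrative sketch only, not a tool we ship: the `Task` fields and the `harness_pieces` function are hypothetical names, and each flag stands for a judgment you make about the task, not something the code can detect.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Judgment calls about a task; every field name here is hypothetical."""
    scope_is_fuzzy: bool = False                  # can't describe the change in one sentence
    repeats_on_a_rhythm: bool = False             # same discipline, three or more runs
    rule_applies_every_run: bool = False          # true for every session in the repo
    rule_must_never_break: bool = False           # no exception is acceptable
    needs_external_system: bool = False           # the information lives outside the repo
    splits_into_independent_slices: bool = False  # slices share no files or outputs


def harness_pieces(task: Task) -> list[str]:
    """Map the shape of a task to the harness pieces that earn their keep."""
    checks = [
        (task.scope_is_fuzzy, "plan mode"),
        (task.repeats_on_a_rhythm, "skill"),
        (task.rule_applies_every_run, "CLAUDE.md"),
        (task.rule_must_never_break, "hook"),
        (task.needs_external_system, "MCP"),
        (task.splits_into_independent_slices, "sub-agents"),
    ]
    picks = [name for flag, name in checks if flag]
    # With no flag set, the default is no harness change at all.
    return picks or ["plain session with a tight prompt"]


print(harness_pieces(Task()))
# ['plain session with a tight prompt']
print(harness_pieces(Task(scope_is_fuzzy=True, needs_external_system=True)))
# ['plan mode', 'MCP']
```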

If you're rebuilding your harness this quarter, start with the task log, not the tool list. The tasks will tell you which pieces earn their keep.

About the author

Eli Wood

CEO, Black Flag Design

Eli Wood leads Black Flag Design, a creative technology company focused on shipping ambitious digital products, AI systems, and design-forward software with a direct point of view on how technology changes work.
