The Black Flag Playbook: Six Principles for Shipping with AI

Battle-tested principles for teams building real software with AI-generated code. Human judgment, tight scope, and weekly evidence — the disciplines that keep AI-built systems reliable.

Keith Pattison

April 20, 2026 · 6 min read

AI can generate code faster than any team can review it. That is the problem, not the product. The teams shipping reliable software with AI are not the ones running agents unattended for hundreds of hours — they are the ones keeping humans in the loop, scope small, and evidence weekly.

These six principles are how we work at Black Flag Design. We built them out of real engagements — financial services, longevity, education — and we apply them on every project. They are not a philosophy. They are a discipline.

01. Aim small, win small every day

Making complex systems feel simple requires more work than making them complex.

We avoid the "run for hundreds of hours" hype that surrounds today's AI agents. Our clients don't need technical complexity — they need software that works simply.

We keep scope small at first. We build functional slices of the product. Clients see working software by week two. We deliver updates every week after that. Each feature starts small and scales up only after proving its value.

When working with AI, we face constant temptation to build the entire application at once. We resist. Instead, we aim small, build working pieces first, and expand only when those pieces deliver clear value. This discipline prevents wasted effort and keeps focus on what matters.

In practice: client-facing capabilities get early investment and sustained attention. Infrastructure scales in direct response to demonstrated need. The teams that try to build comprehensive technical foundations before establishing user value consistently hit delays and false starts.

02. Reject the narrative that AI can operate independently

Humans define requirements. AI proposes implementations. Humans verify quality.

From day one, we establish clear boundaries around AI responsibilities. We build systems where humans maintain control and direction while AI accelerates execution. This approach demands more from our team, not less.

When we face pressure to delegate core responsibilities to AI, we refuse. The consistency creates trust with our clients. They see AI as our tool, never our replacement.

Our pattern is simple: Generate → Validate → Annotate. AI generates. Humans validate. We annotate what worked and what didn't, and we feed that back into our rules. When conflicts arise between human judgment and AI suggestions, human judgment prevails.
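
One way to picture the Generate → Validate → Annotate loop is the minimal Python sketch below. It is an illustration under stated assumptions, not our actual tooling: the names `Annotation`, `Rulebook`, and `run_cycle` are hypothetical, the "rules" are plain strings, and the `generate` and `validate` callables stand in for an AI call and a human review step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """A human note on one AI proposal (hypothetical record shape)."""
    proposal: str
    accepted: bool
    note: str


@dataclass
class Rulebook:
    """Rules fed back into later generations; a plain list of strings here."""
    rules: List[str] = field(default_factory=list)

    def learn(self, annotation: Annotation) -> None:
        # Assumption for this sketch: only rejected proposals add a rule.
        if not annotation.accepted:
            self.rules.append(annotation.note)


def run_cycle(
    requirement: str,
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], Annotation],
    rulebook: Rulebook,
) -> Annotation:
    """Run one Generate -> Validate -> Annotate pass."""
    proposal = generate(requirement, rulebook.rules)  # AI generates
    annotation = validate(proposal)                   # a human validates
    rulebook.learn(annotation)                        # annotate and feed back
    return annotation                                 # the human verdict is final
```

The design point is that the verdict returned by `validate` is never overridden by the generator, and each rejection leaves a rule behind for the next pass.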

Teams that delegated decision authority to AI without oversight hit extremes — either uncontrolled feature proliferation or endless optimization cycles. Balanced work distribution is the clearest indicator of effective human oversight.

03. Repeat the purpose to yourself, your clients, and your AI

Technology exists to serve client outcomes, not the other way around.

We develop our approach to AI with a simple rule: technology serves client outcomes. We establish this principle early and maintain it relentlessly. We reject features that showcase technical prowess but fail to improve client retention.

Every morning, we ask the same question: how does this technology serve our customers today? The repetition creates consistency.

Our commitment to purpose extends into our AI interactions. We build standardized prompts that reinforce client-first thinking. These prompts serve as guardrails that prevent feature creep and maintain focus on measurable outcomes.

The clearer our purpose statements, the more precisely our AI-generated solutions align with client needs. When we specify the exact outcome and name the component to modify, AI-assisted work produces focused changes. When we hand over a general mandate like "enhance user experience," effort scatters across interface, logic, and data at once.
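
A standardized prompt of this kind could be sketched as follows. This is a hypothetical template, not the prompts we actually use: `TaskBrief`, `build_prompt`, and the wording of the prompt are assumptions made for illustration. The only idea it encodes is that a brief without an exact outcome and a named component is refused before it reaches the AI.

```python
from dataclasses import dataclass

# Purpose line taken from the principle above; the rest is illustrative.
PURPOSE = "Technology serves client outcomes."


@dataclass(frozen=True)
class TaskBrief:
    outcome: str    # the exact, observable result wanted
    component: str  # the single component the change may touch


def build_prompt(brief: TaskBrief) -> str:
    """Build a purpose-first prompt, refusing briefs that are too vague."""
    outcome = brief.outcome.strip()
    component = brief.component.strip()
    if not outcome or not component:
        raise ValueError("A brief needs both an exact outcome and a target component.")
    return (
        f"Purpose: {PURPOSE}\n"
        f"Outcome: {outcome}\n"
        f"Modify only: {component}\n"
        "Do not change anything outside that component."
    )
```

A brief such as `TaskBrief("enhance user experience", "")` raises `ValueError` here, which is the point: the general mandate never becomes a prompt.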

04. Plans are worthless, but planning is everything

We track velocity, not completion percentages.

We build momentum through regular delivery cycles. We set aside overly detailed roadmaps in favor of weekly progress. When user interviews lead to new product directions, our products adapt because we haven't locked ourselves into rigid plans.

Our process feels methodical, sometimes frustratingly so, but delivers reliable results week after week. The product grows stronger through this discipline.

We create structure that supports consistent progress: regular check-ins with defined outcomes, every team member participating regardless of role. AI tools accelerate individual work but demand more coordination, not less.

The pattern we see across engagements: code optimization investment stays deliberately modest through the first four months, then accelerates in the final period. Teams following rigid advance plans do the opposite — they optimize prematurely, refining systems before validating their fundamental value. Velocity collapses when user feedback demands course corrections.

05. Everybody is a manager now — don't be a chill one

Managing AI resembles directing junior developers, not solving puzzles.

Using AI to build means everybody is a manager now. The AI requires clear direction, constant oversight, and regular course correction.

User stories and acceptance criteria aren't bureaucratic overhead — they're essential guardrails that keep AI-assisted development on track. Without these structures, output drifts rapidly from intended direction.

We rebalance weekly, not quarterly — the way a desk rebalances a portfolio. It keeps risk in range, protects compounding, and avoids costly resets.

The signal in the data: roughly 85% of commits touch fewer than 200 lines — controlled, incremental progress. But the occasional 2,000-line spike tells the story. Those aren't planned features. They're corrections triggered by oversight gaps — moments when vague specifications let AI drift, requiring large fixes to bring systems back on track.
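
A team that wants to look for the same signal in its own history could start from a sketch like this one. It assumes the text comes from `git log --numstat --format=@%H`, counts a commit's size as lines added plus lines deleted (our reading of "touch", which is an assumption), and reuses the 200 and 2,000 line thresholds from the paragraph above. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Union


def parse_numstat(log_text: str) -> Dict[str, int]:
    """Map commit hash -> lines added + deleted.

    Expects the output of: git log --numstat --format=@%H
    """
    sizes: Dict[str, int] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # A new commit starts; merge commits may have no numstat lines.
            current = line[1:].strip()
            sizes[current] = 0
            continue
        parts = line.split("\t")
        if current is None or len(parts) < 3:
            continue  # blank or unrelated line
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-"; they count as zero lines here.
        if added.isdigit() and deleted.isdigit():
            sizes[current] += int(added) + int(deleted)
    return sizes


def summarize(
    sizes: Dict[str, int], small: int = 200, spike: int = 2000
) -> Dict[str, Union[int, float, List[str]]]:
    """Share of commits under `small` lines, plus commits at or above `spike`."""
    total = len(sizes)
    small_count = sum(1 for n in sizes.values() if n < small)
    spikes = sorted(h for h, n in sizes.items() if n >= spike)
    return {
        "commits": total,
        "small_share": small_count / total if total else 0.0,
        "spikes": spikes,
    }
```

The commits listed under `spikes` are the ones worth reading by hand: the question for each is whether a vague specification preceded it.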

Ship features and upkeep together. Every release carries new capability and maintenance. The surface stays stable, outages stay rare, and momentum stays high.

06. If you're not getting better, you're getting worse

Reliability beats novelty.

Clients buy outcomes they can trust. We set our priorities by the promises we make: stable screens, safe data, clear workflows, and predictable support. Every technical choice answers those promises. If a decision threatens them, we change course.

We handle pivots like a desk rebalancing risk. We shift effort from polish to controls, logging, and permissions. When exposure comes back within range, we resume feature work.

Protect the promise, absorb the cost. When risk rises, fund the rework yourself and keep the client experience unchanged.

Choose vendors you can defend. If you cannot justify a provider to a client or regulator in one line, switch. If a vendor cannot keep pace with how fast you are moving, move on.

Keep optionality high. We avoid lock-ins on models, vendors, and data. When better tools arrive, we can switch without disruption. That is how we stay current without re-platforming.
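
In code, avoiding model lock-in often comes down to depending on a narrow interface rather than on one vendor's SDK. The sketch below shows the idea under loud assumptions: `TextModel`, `EchoModel`, `UppercaseModel`, and `draft_release_note` are hypothetical names, and the two "providers" are stand-ins that call no real service.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider for this sketch; a real adapter would wrap a vendor SDK."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseModel:
    """A second stand-in, to show that swapping providers needs no other change."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def draft_release_note(model: TextModel, change: str) -> str:
    # Application code sees only the TextModel interface, never a vendor type.
    return model.complete(f"Summarize this change for a client: {change}")
```

Switching providers then means writing one new adapter class and changing the object passed in, while `draft_release_note` and everything above it stay untouched.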

Quiet, disciplined moves protect the promise — and discipline pays off. Large rewrites are the exception, not the rule, because we choose to improve continuously.


The common thread

These six principles share one premise: AI is leverage, not autonomy. Every rule here — small scope, human oversight, repeated purpose, planning rhythm, active management, continuous improvement — is a way of keeping a human hand on the wheel while AI does the heavy lifting.

That's the difference between AI-assisted software that ships reliably and AI-generated software that doesn't. The teams doing it well aren't the ones with the most sophisticated prompts. They're the ones with the most sophisticated disciplines.

If any of this sounds familiar — or if you're trying to apply it in a regulated environment where the stakes are real — we'd be glad to talk.

About the author

Keith Pattison

Founder, Black Flag Design

Keith leads Black Flag Design, a studio that ships production-ready software with AI-assisted development. He writes about the disciplines — small scope, weekly evidence, and human oversight — that keep AI-built systems reliable in the real world.
