Six Principles for Shipping Reliable Software with AI

How we stay sane while "vibes" infiltrate codebases — the playbook we run on every engagement.

Keith Pattison

April 20, 2026 5 min read

When AI-generated code started flooding projects — tools writing thousands of lines in minutes — we faced the same chaos everyone did. Promises of speed delivered drift instead. We found sanity through discipline.

Six principles emerged from our work in school districts and financial services. The vendors change. The models change. The principles hold.

1. Aim small, win small every day

Making complex systems feel simple requires more work than making them complex. We avoid the "run for hundreds of hours" hype that surrounds today's AI agents. Our clients don't need technical complexity — they need software that works simply.

We keep scope small at first. We build functional slices of the product. Clients see working software by week two, and we deliver updates every week after that. Each feature starts small and scales up only after proving its value.

When working with AI, we face constant temptation to build the entire application at once. We resist. Aim small, build working pieces first, expand only when those pieces deliver clear value.

In practice: across a recent six-month financial services engagement, client-facing capabilities received early investment while infrastructure scaled in direct response to demonstrated need. Database architecture, authentication, and API layers materialized in proportion to validated demand — not anticipated demand.

2. Reject the narrative that AI can operate independently

From day one, we establish clear boundaries around AI responsibilities. Humans define requirements. AI proposes implementations. Humans verify quality. This cycle — Generate, Validate, Annotate — governs every aspect of our development process. When conflicts arise between human judgment and AI suggestions, human judgment prevails.
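
A minimal sketch of how such a cycle could be wired up in code. The names here (`Outcome`, `run_cycle`, `toy_generate`, `toy_validate`) are hypothetical, and reading "Annotate" as "record the reviewer's note for the next round" is an assumption, not something the process above spells out. The point it illustrates is that the human check is the only thing that can mark work as approved.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    requirement: str
    implementation: str
    approved: bool
    notes: list[str] = field(default_factory=list)


# generate stands in for an AI tool; validate stands in for a human reviewer.
Generate = Callable[[str, list[str]], str]
Validate = Callable[[str, str], tuple[bool, str]]


def run_cycle(requirement: str, generate: Generate, validate: Validate,
              max_rounds: int = 3) -> Outcome:
    """Run Generate, Validate, Annotate until the reviewer approves or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    notes: list[str] = []
    implementation = ""
    for _ in range(max_rounds):
        implementation = generate(requirement, notes)            # Generate
        approved, note = validate(requirement, implementation)   # Validate
        notes.append(note)                                       # Annotate (assumed meaning)
        if approved:
            return Outcome(requirement, implementation, True, notes)
    # The reviewer never approved, so the work stays unapproved.
    return Outcome(requirement, implementation, False, notes)


# Toy stand-ins so the sketch runs; both are hypothetical.
def toy_generate(requirement: str, notes: list[str]) -> str:
    return f"draft {len(notes) + 1} for: {requirement}"


def toy_validate(requirement: str, implementation: str) -> tuple[bool, str]:
    if implementation.startswith("draft 2"):
        return True, "meets the acceptance criteria"
    return False, "missing error handling"


outcome = run_cycle("export report as CSV", toy_generate, toy_validate)
print(outcome.approved, outcome.notes)
```

Nothing in the loop lets the generator approve its own output, which mirrors the rule that human judgment prevails when the two disagree.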

We assign AI tasks based on our team members' skills. The designer directs visual development. The engineer manages system architecture. When responsibilities blur, we clarify them explicitly.

In practice: across 626 distinct development decisions in one engagement, roughly two-thirds focused on new capabilities and one-third on refinement. The balance held throughout. Organizations that delegated decisions to unsupervised AI experienced extremes — uncontrolled feature proliferation on one end, endless optimization paralysis on the other.

3. Repeat the purpose to yourself, your clients, and your AI

Technology exists to serve client outcomes, not the other way around. We establish this principle early and maintain it relentlessly. We reject features that showcase technical prowess but fail to improve client retention.

Our commitment to purpose extends beyond team meetings into our AI interactions. We build standardized prompts that reinforce a client-first mentality. They serve as guardrails that prevent feature creep and keep focus on measurable outcomes.

Clarity creates quality. The clearer our purpose statements, the more precisely AI-generated solutions align with client needs. Documentation beats conversation — we build rule engines for AI interactions instead of relying on unstructured dialogue.
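
As an illustration of what a documented, standardized prompt might look like, here is a minimal sketch. The rule texts, the `Rule` and `build_prompt` names, and the sample purpose and task are all hypothetical. It only shows the shape of the idea: rules live in data, and every request carries the purpose statement with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    text: str


# Hypothetical standing rules; a real project would write its own.
CLIENT_FIRST_RULES: tuple[Rule, ...] = (
    Rule("purpose", "Every change must serve the stated client outcome."),
    Rule("scope", "Do not add features that the task does not ask for."),
    Rule("evidence", "State how the change can be verified."),
)


def build_prompt(purpose: str, task: str,
                 rules: tuple[Rule, ...] = CLIENT_FIRST_RULES) -> str:
    """Prefix a task with the purpose statement and the standing rules."""
    if not purpose.strip():
        # Refuse to send a task that has no stated purpose.
        raise ValueError("a purpose statement is required")
    lines = [f"Purpose: {purpose.strip()}", "Rules:"]
    lines += [f"- [{rule.name}] {rule.text}" for rule in rules]
    lines += ["Task:", task.strip()]
    return "\n".join(lines)


# Hypothetical example inputs.
prompt = build_prompt(
    purpose="Help district staff find a student record in under a minute.",
    task="Add search by student ID to the records page.",
)
print(prompt)
```

Keeping the rules in one place means a change to the guardrails is a reviewed edit to a file rather than something that depends on who happened to be typing the prompt.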

4. Plans are worthless, but planning is everything

We build momentum through regular delivery cycles. We set aside overly detailed roadmaps in favor of weekly progress. We track velocity, not completion percentages.

When user interviews lead to new product directions, our products adapt because we haven't locked ourselves into rigid plans. The process feels methodical, sometimes frustratingly so, but it delivers reliable results week after week.

In practice: new capability development progressed steadily across six months. Code optimization investment stayed deliberately modest through the first four months, then accelerated substantially in the final period — when patterns had crystallized across validated features. Organizations following rigid advance planning hit the inverse pattern: they optimized prematurely, then watched velocity collapse when user feedback demanded course corrections.

5. Everybody is a manager now — don't be a chill one

Using AI to build means everybody is a manager now. Managing AI resembles directing a team of junior developers more than solving a technical puzzle. The AI requires clear direction, constant oversight, and regular course correction.

User stories and acceptance criteria aren't bureaucratic overhead — they're essential guardrails that keep AI-assisted development on track.

We rebalance weekly, not quarterly. We correct small drifts each week, the way a trading desk rebalances a portfolio. It keeps risk in range, protects compounding, and avoids costly resets.

In practice: 85% of commits in the same engagement touched fewer than 200 lines — controlled, incremental progress. But six commits spiked above 2,000 lines each. Those weren't planned features. They were corrections triggered by oversight gaps — moments when vague specifications let AI drift, requiring large fixes to bring systems back on track. The sub-200-line commits came from clear user stories with acceptance criteria. AI knew exactly what to change and when to stop.
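
A team that wants to watch for the same pattern could compute it from its own commit history. The sketch below is one hedged way to do that. The 200 and 2,000 line thresholds come from the figures above; the function and class names and the sample data are hypothetical, and the input is assumed to be a mapping from commit id to lines changed, however the team chooses to extract it.

```python
from dataclasses import dataclass

SMALL_LIMIT = 200    # lines changed; threshold taken from the figures above
SPIKE_LIMIT = 2000   # lines changed; threshold taken from the figures above


@dataclass
class CommitSizeReport:
    total: int
    small_share: float   # fraction of commits under SMALL_LIMIT lines
    spikes: list[str]    # ids of commits above SPIKE_LIMIT lines


def summarize_commit_sizes(commits: dict[str, int]) -> CommitSizeReport:
    """Summarize commit sizes; commits maps a commit id to lines changed."""
    if not commits:
        return CommitSizeReport(total=0, small_share=0.0, spikes=[])
    small = sum(1 for lines in commits.values() if lines < SMALL_LIMIT)
    spikes = [cid for cid, lines in commits.items() if lines > SPIKE_LIMIT]
    return CommitSizeReport(len(commits), small / len(commits), spikes)


# Hypothetical sample data, not figures from the engagement.
sample = {"a1": 40, "b2": 150, "c3": 90, "d4": 2600, "e5": 310}
report = summarize_commit_sizes(sample)
print(f"{report.small_share:.0%} of commits under {SMALL_LIMIT} lines")
print("spikes to review:", report.spikes)
```

Each id in `spikes` is a prompt for a question rather than a verdict: was this a planned change, or a correction after a vague specification?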

6. If you're not getting better, you're getting worse

Reliability beats novelty. Clients buy outcomes they can trust. We set our priorities by the promises we make: stable screens, safe data, clear workflows, and predictable support. Every technical choice answers those promises. If a decision threatens them, we change course.

We handle pivots the same way a trading desk rebalances risk. We shift effort from polish to controls, logging, and permissions. When exposure comes back within range, we resume feature work.

We keep optionality high. We avoid lock-ins on models, vendors, and data. When better tools arrive, we can switch without disruption. That is how we stay current without re-platforming.

Velocity without chaos

AI tools write code. They don't define requirements, verify edge cases, or maintain customer trust.

The discipline feels tedious. Write acceptance criteria before code. Review every AI output. Hold daily standups. Track velocity metrics. Validate constantly. The structure works because it acknowledges a simple truth: AI accelerates execution but cannot replace judgment.

The technology evolves constantly. New models arrive monthly. Capabilities expand rapidly. The infrastructure shifts. But the principles remain constant because they address the unchanging challenge — delivering reliable software while managing tools that generate thousands of lines in minutes.

That's the real work. That's what clients buy.

About the author

Keith Pattison

Founder, Black Flag Design

Keith leads Black Flag Design, a studio that ships production-ready software with AI-assisted development. He writes about the disciplines — small scope, weekly evidence, and human oversight — that keep AI-built systems reliable in the real world.
