Six Principles for Shipping Reliable Software with AI

When AI-generated code started flooding projects — tools writing thousands of lines in minutes — we faced the same chaos everyone did. Promises of speed delivered drift instead. We found sanity through discipline.

Six principles emerged from our work in school districts and financial services. The vendors change. The models change. The principles hold.

1. Aim small, win small every day

Making complex systems feel simple requires more work than making them complex. We avoid the "run for hundreds of hours" hype that surrounds today's AI agents. Our clients don't need technical complexity — they need software that works simply.

We keep scope small at first. We build functional slices of the product. Clients see working software by week two, and we deliver updates every week after that. Each feature starts small and scales up only after proving its value.

When working with AI, we face constant temptation to build the entire application at once. We resist. Aim small, build working pieces first, expand only when those pieces deliver clear value.

In practice: across a recent six-month financial services engagement, client-facing capabilities received early investment while infrastructure scaled in direct response to demonstrated need. Database architecture, authentication, and API layers materialized in proportion to validated demand — not anticipated demand.

2. Reject the narrative that AI can operate independently

From day one, we establish clear boundaries around AI responsibilities. Humans define requirements. AI proposes implementations. Humans verify quality and annotate the result. This cycle (Generate, Validate, Annotate) governs every aspect of our development process. When conflicts arise between human judgment and AI suggestions, human judgment prevails.
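One way to picture the Generate, Validate, Annotate cycle is as a loop with a human gate in the middle. The sketch below is illustrative only: `Proposal`, `run_cycle`, and the `generate` and `validate` callables are hypothetical names, not part of any tool described here. `generate` stands in for an AI call and `validate` for a human review that returns a list of objections.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """An AI-proposed implementation plus the reviewer notes attached to it."""
    requirement: str
    code: str
    approved: bool = False
    notes: List[str] = field(default_factory=list)


def run_cycle(
    requirement: str,
    generate: Callable[[str, List[str]], str],  # hypothetical AI call
    validate: Callable[[str], List[str]],       # hypothetical human review
    max_rounds: int = 3,
) -> Proposal:
    """Generate, validate, annotate; approve only when the reviewer has no objections."""
    notes: List[str] = []
    proposal = Proposal(requirement=requirement, code="")
    for _ in range(max_rounds):
        proposal.code = generate(requirement, notes)  # AI proposes
        objections = validate(proposal.code)          # human verifies
        if not objections:
            proposal.approved = True
            break
        notes.extend(objections)                      # annotate for the next round
    proposal.notes = notes
    return proposal
```

Nothing is approved by default: if the reviewer still objects after `max_rounds`, the proposal comes back unapproved, which is the code-level version of human judgment prevailing.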

Each AI task has a human owner, chosen by skill. The designer directs visual development. The engineer manages system architecture. When responsibilities blur, we clarify them explicitly.

In practice: across 626 distinct development decisions in one engagement, roughly two-thirds focused on new capabilities and one-third on refinement. The balance held throughout. Organizations that delegated decisions to unsupervised AI experienced extremes — uncontrolled feature proliferation on one end, endless optimization paralysis on the other.

3. Repeat the purpose to yourself, your clients, and your AI

Technology exists to serve client outcomes, not the other way around. We establish this principle early and maintain it relentlessly. We reject features that showcase technical prowess but fail to improve client retention.

Our commitment to purpose extends beyond team meetings into our AI interactions. We build standardized prompts that reinforce a client-first mentality. They serve as guardrails that prevent feature creep and keep focus on measurable outcomes.

Clarity creates quality. The clearer our purpose statements, the more precisely AI-generated solutions align with client needs. Documentation beats conversation — we build rule engines for AI interactions instead of relying on unstructured dialogue.
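A standardized prompt can be as simple as a fixed set of written rules that is prepended to every task. The following is a minimal sketch under that assumption; the `Rule` type, the rule wording, and `build_prompt` are hypothetical examples, not the actual prompts used on client work.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """One documented guardrail, written once and reused in every prompt."""
    name: str
    text: str


# Example rules only; real rules would come from the project's own documentation.
RULES = (
    Rule("purpose", "Every change must serve the stated client outcome."),
    Rule("scope", "Implement only what the task asks for; propose extras separately."),
)


def build_prompt(purpose: str, task: str, rules: Sequence[Rule] = RULES) -> str:
    """Assemble a prompt that restates the purpose and rules before the task."""
    lines = [f"Purpose: {purpose}", "Rules:"]
    lines += [f"- [{rule.name}] {rule.text}" for rule in rules]
    lines.append(f"Task: {task}")
    return "\n".join(lines)
```

Because the rules live in code rather than in someone's memory of a conversation, they can be reviewed, versioned, and changed in one place.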

4. Plans are worthless, but planning is everything

We build momentum through regular delivery cycles. We set aside overly detailed roadmaps in favor of weekly progress. We track velocity, not completion percentages.

When user interviews lead to new product directions, our products adapt because we haven't locked ourselves into rigid plans. The process feels methodical, sometimes frustratingly so, but it delivers reliable results week after week.

In practice: new capability development progressed steadily across six months. Code optimization investment stayed deliberately modest through the first four months, then accelerated substantially in the final period — when patterns had crystallized across validated features. Organizations following rigid advance planning hit the inverse pattern: they optimized prematurely, then watched velocity collapse when user feedback demanded course corrections.

5. Everybody is a manager now — don't be a chill one

Using AI to build means everybody is a manager now. Managing AI resembles directing a team of junior developers more than solving a technical puzzle. The AI requires clear direction, constant oversight, and regular course correction.

User stories and acceptance criteria aren't bureaucratic overhead — they're essential guardrails that keep AI-assisted development on track.
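As a rough illustration, a user story can be treated as a small record that is not handed to the AI until it has acceptance criteria. The `UserStory` class and its fields below are hypothetical, a sketch of the idea and not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserStory:
    """A unit of work with explicit stopping conditions."""
    title: str
    acceptance_criteria: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)

    def is_ready(self) -> bool:
        """Ready for AI work only when at least one non-empty criterion exists."""
        return any(criterion.strip() for criterion in self.acceptance_criteria)


story = UserStory(
    title="Parent can reset a password",
    acceptance_criteria=["Reset link expires after one use"],
    out_of_scope=["Redesigning the login screen"],
)
```

The `out_of_scope` list matters as much as the criteria: it tells the AI when to stop.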

We rebalance weekly, not quarterly. We correct small drifts each week, the way a trading desk rebalances a portfolio. Weekly correction keeps risk in range, protects compounding gains, and avoids costly resets.

In practice: 85% of commits in the same engagement touched fewer than 200 lines — controlled, incremental progress. But six commits spiked above 2,000 lines each. Those weren't planned features. They were corrections triggered by oversight gaps — moments when vague specifications let AI drift, requiring large fixes to bring systems back on track. The sub-200-line commits came from clear user stories with acceptance criteria. AI knew exactly what to change and when to stop.
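A commit-size check like this is easy to run on any repository. The sketch below assumes a list of lines changed per commit is already available (for example, summed from the output of `git log --numstat`); the function name and the default thresholds of 200 and 2,000 lines mirror the figures above and are otherwise arbitrary.

```python
from typing import Dict, Iterable, Union


def summarize_commit_sizes(
    lines_changed: Iterable[int],
    small: int = 200,
    large: int = 2000,
) -> Dict[str, Union[int, float]]:
    """Report the share of small commits and the count of oversized ones."""
    sizes = list(lines_changed)
    if not sizes:
        return {"total": 0, "small_share": 0.0, "large_count": 0}
    small_count = sum(1 for n in sizes if n < small)   # strictly under the small threshold
    large_count = sum(1 for n in sizes if n > large)   # strictly over the large threshold
    return {
        "total": len(sizes),
        "small_share": small_count / len(sizes),
        "large_count": large_count,
    }
```

Reviewing the oversized commits each week, rather than at the end of a project, is one way to spot the stories whose specifications were too vague.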

6. If you're not getting better, you're getting worse

Reliability beats novelty. Clients buy outcomes they can trust. We set our priorities by the promises we make: stable screens, safe data, clear workflows, and predictable support. Every technical choice answers those promises. If a decision threatens them, we change course.

We handle pivots like a trading desk rebalancing risk. When exposure rises, we shift effort from polish to controls, logging, and permissions. When exposure comes back within range, we resume feature work.

We keep optionality high. We avoid lock-ins on models, vendors, and data. When better tools arrive, we can switch without disruption. That is how we stay current without re-platforming.
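In code, avoiding model lock-in usually means application logic depends on a small interface and not on a vendor SDK. The sketch below shows the shape of that idea; `CodeModel`, `EchoModel`, `UpperModel`, and `propose_change` are hypothetical stand-ins, and a real adapter would wrap an actual vendor client inside `complete`.

```python
from typing import Protocol


class CodeModel(Protocol):
    """The only surface the rest of the codebase is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider for illustration; a real adapter would call a vendor SDK here."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """A second stand-in, showing that providers are interchangeable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def propose_change(model: CodeModel, task: str) -> str:
    """Application code sees only the CodeModel interface, never a vendor type."""
    return model.complete(f"Task: {task}")
```

Swapping providers then touches one adapter class and leaves every caller of `propose_change` unchanged.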

Velocity without chaos

AI tools write code. They don't define requirements, verify edge cases, or maintain customer trust.

The discipline feels tedious. Write acceptance criteria before code. Review every AI output. Hold daily standups. Track velocity metrics. Validate constantly. The structure works because it acknowledges a simple truth: AI accelerates execution but cannot replace judgment.

The technology evolves constantly. New models arrive monthly. Capabilities expand rapidly. The infrastructure shifts. But the principles remain constant because they address the unchanging challenge — delivering reliable software while managing tools that generate thousands of lines in minutes.

That's the real work. That's what clients buy.

About the author

Keith Pattison

Founder, Black Flag Design

Keith leads Black Flag Design, a studio that ships production-ready software with AI-assisted development. He writes about the disciplines — small scope, weekly evidence, and human oversight — that keep AI-built systems reliable in the real world.
