Packaging a Proven Model: Applied AI Without Flattening What Made It Work

Organizations that have spent two decades perfecting a method face a cruel paradox when they try to scale it: the moment they write the model down so others can run it, they risk losing the judgment that made it work. Applied AI can carry the procedure without flattening the practice, but only if you separate the two on purpose.

Eli Wood

June 24, 2026 · 4 min read

The problem: the model lives in people's heads

There's a specific kind of organization that has earned the right to scale. It has run the same hard work long enough — a decade, two decades — that the method genuinely works. The outcomes are real and repeatable in-house. So the natural next move is to package it: write it down, codify it, hand it to new sites, new teams, new partners.

Then something quietly breaks. The version that gets handed over is a thinner version of the thing. The steps survive; the judgment doesn't. What made the original work was never the checklist — it was a practitioner reading a situation and knowing which of the rules to bend, and when. Compress that into a template and you ship the skeleton without the muscle.

This is the trap of productizing expertise. The pressure to make a model legible to outsiders is exactly the pressure that flattens it.

The insight: a procedure is not the same thing as a judgment

The useful move is to stop treating "the model" as one undifferentiated asset and split it into two engines that have always been tangled together.

There is a rules engine: the parts of the work that are stable, repeatable, and the same every time. Sequencing. Documentation. Surfacing the right reference material at the right step. Checking that nothing was skipped. These are real and valuable, and they are also boring in the precise way that means a machine can carry them faithfully.

Then there is a judgment engine: the parts that depend on context a practitioner is holding in their head. Whether this case is the exception. What a quiet signal actually means. When the standard path is wrong for this particular situation. This is the part you spent twenty years building, and it is the part you must not automate away.

Applied AI earns its place on the first engine and stays out of the way on the second. It takes over the repetitive scaffolding so the expensive human judgment has more room, not less. Done well, a new practitioner using the system spends their attention on the decisions that matter instead of on remembering the procedure — which is the opposite of flattening.
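
One way to picture the split is a workflow where every step carries a tag saying which engine it belongs to. The sketch below is only an illustration under assumptions: the `Engine`, `Step`, and `process` names and the three example steps are invented here, not part of any existing system. Rule steps run on their own; judgment steps are never executed by the machine and are simply handed to a person.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Engine(Enum):
    RULE = "rule"          # same every time; a machine can carry it
    JUDGMENT = "judgment"  # depends on context a practitioner holds


@dataclass
class Step:
    name: str
    engine: Engine
    # Hypothetical automation hook; only meaningful for RULE steps.
    run: Optional[Callable[[dict], dict]] = None


def process(case: dict, steps: list[Step]) -> tuple[dict, list[str]]:
    """Run rule steps automatically; collect everything else for a person."""
    needs_human: list[str] = []
    for step in steps:
        if step.engine is Engine.RULE and step.run is not None:
            case = step.run(case)
        else:
            # Judgment steps (and rule steps with no automation yet)
            # are left for a practitioner.
            needs_human.append(step.name)
    return case, needs_human


# Illustrative steps only; the names are made up for this sketch.
steps = [
    Step("attach_reference_material", Engine.RULE,
         lambda c: {**c, "references": ["intake-guide"]}),
    Step("check_nothing_skipped", Engine.RULE,
         lambda c: {**c, "complete": "notes" in c}),
    Step("is_this_case_the_exception", Engine.JUDGMENT),
]

case, needs_human = process({"notes": "quiet signal at intake"}, steps)
```

The design choice worth noticing is the default: anything not explicitly marked as an automated rule falls through to a human, so forgetting to classify a step never results in the machine making a call.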

The path: start where judgment is expensive and repetitive

The instinct is to automate the whole model at once. Don't. Start where two things overlap: the judgment is expensive and the surrounding work is repetitive. That's where AI removes the most friction with the least risk.

Keep a human in the loop wherever being wrong is costly. If a bad call has real consequences for a real person downstream, the machine drafts and a person decides. At those moments the system's job is not to answer; it is to assemble the context, show its reasoning, and make the human's decision faster and better informed. Earn trust with explainability: a recommendation you can't interrogate is one nobody will rely on twice.
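
A minimal sketch of "the machine drafts and a person decides" might look like the following. Everything named here (`Draft`, `Decision`, `decide`, the `costly_if_wrong` flag) is an assumption made for illustration; how you judge that a step is costly is itself a human call and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Draft:
    recommendation: str
    reasoning: list[str]  # shown to the reviewer, never hidden
    context: dict = field(default_factory=dict)


@dataclass
class Decision:
    final: str
    decided_by: str  # "machine" or the reviewer's name
    overridden: bool


def decide(draft: Draft, costly_if_wrong: bool,
           human_choice: Optional[str] = None,
           reviewer: Optional[str] = None) -> Decision:
    """Accept a draft automatically only when a mistake is cheap."""
    if not costly_if_wrong:
        return Decision(draft.recommendation, "machine", overridden=False)
    if human_choice is None or reviewer is None:
        # The machine has drafted, but nobody has decided yet.
        raise ValueError("a human decision is required for this step")
    return Decision(human_choice, reviewer,
                    overridden=(human_choice != draft.recommendation))


# Invented example data.
draft = Draft("standard path", ["no exception flags found"],
              {"case_id": "demo-1"})
decision = decide(draft, costly_if_wrong=True,
                  human_choice="escalate", reviewer="practitioner")
```

On a costly step there is no code path that returns a decision without a named person attached, and the draft's reasoning travels with it so the reviewer can interrogate it.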

A concrete way to start in two days:

  • Day one — map the two engines. Walk through one real instance of your model end to end. At each step write down: is this a rule (same every time) or a judgment (depends)? You'll be surprised how much is rules wearing a judgment costume — and you'll find the two or three judgment moments that are the actual product.
  • Day two — automate one rule, instrument one judgment. Pick a single repetitive step and have AI carry it, with a person reviewing the output. For one judgment moment, don't automate it — build the thing that hands the practitioner everything they need to decide well. Watch where they override it. Those overrides are the tacit knowledge you've been trying to capture; now it's visible.
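
To make the day-two overrides visible, it can be enough to log each review and count how often the practitioner changed the draft. The sketch below assumes a hypothetical `Review` record and invented sample data; it is one simple way to surface the pattern, not a prescribed format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    moment: str     # which judgment moment this was
    drafted: str    # what the system suggested
    decided: str    # what the practitioner chose
    note: str = ""  # why, in the practitioner's own words


def override_rates(reviews: list[Review]) -> dict[str, float]:
    """Share of reviews per judgment moment where the draft was changed."""
    totals: Counter = Counter()
    overrides: Counter = Counter()
    for r in reviews:
        totals[r.moment] += 1
        if r.decided != r.drafted:
            overrides[r.moment] += 1
    return {m: overrides[m] / totals[m] for m in totals}


# Invented sample data for illustration.
reviews = [
    Review("exception_check", "standard path", "escalate",
           "client went quiet after intake"),
    Review("exception_check", "standard path", "standard path"),
    Review("signal_reading", "follow up", "follow up"),
]

rates = override_rates(reviews)
override_notes = [r.note for r in reviews if r.decided != r.drafted]
```

The rates show where the system and the practitioner disagree most; the notes attached to each override are where the tacit knowledge actually gets written down.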

At the end of two days you have a small, working seam between procedure and practice — and a much clearer picture of which parts of your model are safe to scale and which parts are the model.

A proven method is worth protecting. The way you protect it while scaling is not to refuse to write it down — it's to be ruthless about what gets handed to a machine and what stays a human call.

Black Flag Design builds applied-AI products that respect the difference between a procedure and a judgment. If you've got a model worth scaling and you're worried about flattening it, spend two days with us — we call it a Foundation Sprint.

About the author

Eli Wood

CEO, Black Flag Design

Eli Wood leads Black Flag Design, a creative technology company focused on shipping ambitious digital products, AI systems, and design-forward software with a direct point of view on how technology changes work.
