Packaging a Proven Model: Applied AI Without Flattening It

The problem: the model lives in people's heads

There's a specific kind of organization that has earned the right to scale. It has run the same hard work long enough — a decade, two decades — that the method genuinely works. The outcomes are real and repeatable in-house. So the natural next move is to package it: write it down, codify it, hand it to new sites, new teams, new partners.

Then something quietly breaks. The version that gets handed over is a thinner version of the thing. The steps survive; the judgment doesn't. What made the original work was never the checklist — it was a practitioner reading a situation and knowing which of the rules to bend, and when. Compress that into a template and you ship the skeleton without the muscle.

This is the trap of productizing expertise. The pressure to make a model legible to outsiders is exactly the pressure that flattens it.

The insight: a procedure is not the same thing as a judgment

The useful move is to stop treating "the model" as one undifferentiated asset and split it into two engines that have always been tangled together.

There is a rules engine: the parts of the work that are stable, repeatable, and the same every time. Sequencing. Documentation. Surfacing the right reference material at the right step. Checking that nothing was skipped. These are real and valuable, and they are also boring in the precise way that means a machine can carry them faithfully.

Then there is a judgment engine: the parts that depend on context a practitioner is holding in their head. Whether this case is the exception. What a quiet signal actually means. When the standard path is wrong for this particular situation. This is the part you spent twenty years building, and it is the part you must not automate away.

Applied AI earns its place on the first engine and stays out of the way on the second. It takes over the repetitive scaffolding so the expensive human judgment has more room, not less. Done well, a new practitioner using the system spends their attention on the decisions that matter instead of on remembering the procedure — which is the opposite of flattening.
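One way to picture the split, as a minimal sketch: tag every step of a workflow as a rule or a judgment, let software carry the rules, and queue the judgments for a person. The `Step` class, `run_workflow`, and the step names are hypothetical illustrations, not part of any real system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Kind(Enum):
    RULE = "rule"          # same every time; safe for software to carry
    JUDGMENT = "judgment"  # depends on context; stays a human call


@dataclass
class Step:
    name: str
    kind: Kind
    # Only rule steps carry an automated action (hypothetical callables).
    action: Optional[Callable[[dict], None]] = None


def run_workflow(steps: list[Step], case: dict) -> list[str]:
    """Run every automated rule step; return the names of the steps
    that must be handed to a practitioner."""
    needs_human = []
    for step in steps:
        if step.kind is Kind.RULE and step.action is not None:
            step.action(case)
        else:
            # Judgments, and rules with no automation yet, go to a person.
            needs_human.append(step.name)
    return needs_human


def record_intake(case: dict) -> None:
    case["intake_recorded"] = True


def attach_reference(case: dict) -> None:
    case["reference"] = "standard-path guide"


steps = [
    Step("record intake", Kind.RULE, record_intake),
    Step("attach reference material", Kind.RULE, attach_reference),
    Step("decide whether this case is the exception", Kind.JUDGMENT),
]

case = {"id": "example-001"}
pending = run_workflow(steps, case)
```

The point of the design is that the judgment step has no `action` at all: the software does not get a weaker version of the decision, it gets none of it.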

The path: start where judgment is expensive and repetitive

The instinct is to automate the whole model at once. Don't. Start where two things overlap: the judgment is expensive and the surrounding work is repetitive. That's where AI removes the most friction with the least risk.

Keep a human in the loop wherever being wrong is costly. If a bad call has real consequences for a real person downstream, the machine drafts and a person decides. The system's job at those moments is not to answer. It is to assemble the context, show its reasoning, and make the human's decision faster and better informed. Earn trust with explainability: a recommendation you can't interrogate is one nobody will rely on twice.
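The "machine drafts, person decides" rule can be sketched as a gate. This is an illustration under stated assumptions: `Draft`, `decide`, and the `costly_if_wrong` flag are hypothetical names, and how a real system judges cost is left open.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    recommendation: str
    reasoning: str  # shown to the reviewer so it can be interrogated
    context: dict = field(default_factory=dict)  # facts assembled for the decision
    costly_if_wrong: bool = True  # assumption: default to the cautious path


def decide(draft: Draft, ask_human: Callable[[Draft], str]) -> tuple[str, str]:
    """Return (final_decision, decided_by).

    When a bad call is costly, the draft is only an input and the
    human's answer is final. Otherwise the recommendation passes through.
    """
    if draft.costly_if_wrong:
        return ask_human(draft), "human"
    return draft.recommendation, "system"


def reviewer(draft: Draft) -> str:
    # Stand-in for a real review screen: the practitioner reads the
    # reasoning and context, then makes their own call.
    return "escalate"


draft = Draft(
    recommendation="follow standard path",
    reasoning="matches the usual pattern on the recorded fields",
    context={"case_id": "example-001"},
)
final, decided_by = decide(draft, reviewer)
```

Here the reviewer disagrees with the draft, and the reviewer's answer is the one that stands. Defaulting `costly_if_wrong` to `True` means a step has to be deliberately marked low-stakes before the system is allowed to decide alone.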

A concrete way to start in two days:

Day one: map the two engines. Walk through one real instance of your model end to end. At each step write down: is this a rule (same every time) or a judgment (depends)? You'll be surprised how much is rules wearing a judgment costume, and you'll find the two or three judgment moments that are the actual product.
Day two: automate one rule, instrument one judgment. Pick a single repetitive step and have AI carry it, with a person reviewing the output. For one judgment moment, don't automate the decision. Build the thing that hands the practitioner everything they need to decide well, then watch where they override what the system puts in front of them. Those overrides are the tacit knowledge you've been trying to capture, and now it's visible.
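The day-two instrumentation can be as small as a log that stores what the system suggested next to what the practitioner decided. A minimal sketch follows; the field names and the `override_rate` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    moment: str      # which judgment moment this was
    suggested: str   # what the system drafted
    final: str       # what the practitioner actually decided
    note: str = ""   # optional reason, in the practitioner's words

    @property
    def overridden(self) -> bool:
        return self.suggested != self.final


def override_rate(log: list[Decision]) -> dict[str, float]:
    """Share of decisions, per judgment moment, where the human overrode the draft."""
    totals: dict[str, int] = defaultdict(int)
    overrides: dict[str, int] = defaultdict(int)
    for d in log:
        totals[d.moment] += 1
        if d.overridden:
            overrides[d.moment] += 1
    return {m: overrides[m] / totals[m] for m in totals}


log = [
    Decision("is this the exception?", "standard path", "standard path"),
    Decision("is this the exception?", "standard path", "exception",
             note="quiet signal in the history"),
    Decision("ready to close?", "close", "close"),
]
rates = override_rate(log)
```

The rate only tells you where to look. The notes attached to the overridden decisions are where the tacit knowledge actually shows up.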

At the end of two days you have a small, working seam between procedure and practice — and a much clearer picture of which parts of your model are safe to scale and which parts are the model.

A proven method is worth protecting. The way you protect it while scaling is not to refuse to write it down — it's to be ruthless about what gets handed to a machine and what stays a human call.

Black Flag Design builds applied-AI products that respect the difference between a procedure and a judgment. If you've got a model worth scaling and you're worried about flattening it, spend two days with us — we call it a Foundation Sprint.

About the author

Eli Wood

CEO, Black Flag Design

Eli Wood leads Black Flag Design, a creative technology company focused on shipping ambitious digital products, AI systems, and design-forward software with a direct point of view on how technology changes work.


