The directory was never the problem
Every organization that connects people to resources eventually builds a directory. Hundreds of programs, each with its own eligibility rules, intake quirks, capacity limits, and renewal cadence. The directory grows, someone keeps it tidy, and everyone congratulates themselves on coverage. Then you watch a real person try to use it and the illusion breaks. They don't know which of the three hundred programs applies to them. They don't know the order to apply in, or that this benefit disqualifies them from that one, or that the program that fits closed its intake last month. A searchable list assumes the person already knows what they're looking for. The people who need help most rarely do.
The actual problem is a matching problem on a messy graph. On one side, a person with a tangled, half-articulated situation. On the other, a web of programs with overlapping criteria, hidden dependencies, and constantly shifting availability. The work is traversing that graph — figuring out which paths are open, which are blocked, which unlock others — and doing it fast enough to matter. That is exactly the kind of work applied AI is good at, and exactly the kind where getting it wrong can cost someone the help they needed.
Separate the rules engine from the judgment engine
The mistake is to build one model that ingests a person and emits an answer. The discipline is to split the system in two.
The first half is a rules engine. Eligibility is mostly deterministic: income thresholds, status requirements, geography, documentation. Encode it as rules, not vibes. A model that guesses whether someone qualifies for a hard-criteria program is a liability; a rules engine that checks is an asset, and it can tell you exactly why something was included or excluded. This is where you get reliability and an audit trail for free.
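Here is a minimal sketch of the rules half, in Python. It assumes eligibility can be written as independent yes/no predicates over an intake record. The `Applicant`, `Rule`, `Program`, `Decision`, and `evaluate` names are hypothetical, and so are the program and its thresholds. The point is that every rule is evaluated and recorded, so the output explains exclusions as well as inclusions.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical applicant record; a real intake form will carry many more fields.
@dataclass
class Applicant:
    household_size: int
    monthly_income: float
    county: str
    has_id_document: bool


@dataclass(frozen=True)
class Rule:
    description: str                    # human-readable text for the audit trail
    check: Callable[[Applicant], bool]  # deterministic predicate: no model involved


@dataclass
class Program:
    name: str
    rules: list[Rule]


@dataclass
class Decision:
    program: str
    eligible: bool
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def evaluate(program: Program, applicant: Applicant) -> Decision:
    """Run every rule (no short-circuit) so the decision records all reasons."""
    decision = Decision(program=program.name, eligible=True)
    for rule in program.rules:
        if rule.check(applicant):
            decision.passed.append(rule.description)
        else:
            decision.failed.append(rule.description)
            decision.eligible = False
    return decision


# Hypothetical program and thresholds, for illustration only.
food_program = Program(
    name="Food assistance (hypothetical)",
    rules=[
        Rule("Monthly income at or below 1,600 plus 550 per additional household member",
             lambda a: a.monthly_income <= 1600 + 550 * (a.household_size - 1)),
        Rule("Lives in Example County", lambda a: a.county == "Example County"),
        Rule("Has a government-issued ID", lambda a: a.has_id_document),
    ],
)

applicant = Applicant(household_size=3, monthly_income=2500.0,
                      county="Example County", has_id_document=False)
decision = evaluate(food_program, applicant)
print(decision.eligible)  # False
print(decision.failed)    # ['Has a government-issued ID']
```

Because each rule carries its own description, the "why" behind an exclusion comes straight from the engine rather than being reconstructed after the fact.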
The second half is a judgment engine, and it earns its keep on everything the rules can't settle. Reading a messy intake narrative and inferring what someone actually needs. Ranking ten programs the person technically qualifies for by which will genuinely move their situation. Spotting that two needs are really one underlying problem. This is fuzzy, contextual, and high-value — and it's where a model that explains its reasoning beats a static decision tree.
Keep them separate and each does what it's good at. The rules engine narrows the graph to what's actually open. The judgment engine reasons over what's left. When the two disagree, that disagreement is a signal worth surfacing, not a bug to paper over.
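The sketch below shows one way to wire the two halves together, under some explicit assumptions. `judge` is a hypothetical placeholder for a model call. Here it uses crude keyword overlap so the example runs offline. Eligibility arrives precomputed from a rules engine, and the 0.5 `disagreement_threshold` is arbitrary. The rules decide what is open, and the judgment step ranks only the open programs. If the rules excluded a program but the judgment step rates it highly, that program is returned separately as a disagreement instead of being silently dropped.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    program: str
    score: float
    rationale: str


def judge(narrative: str, program: str, keywords: set[str]) -> Match:
    """Toy stand-in for the judgment engine. A real system would call a model
    here; keyword overlap keeps this example runnable offline."""
    words = set(re.findall(r"[a-z]+", narrative.lower()))
    hits = sorted(keywords & words)
    score = len(hits) / len(keywords) if keywords else 0.0
    return Match(program, score, "mentions: " + (", ".join(hits) or "nothing relevant"))


def match(narrative: str, eligibility: dict[str, bool],
          catalog: dict[str, set[str]], disagreement_threshold: float = 0.5):
    """Rules decide what is open; judgment ranks it; conflicts are returned."""
    ranked, disagreements = [], []
    for program, keywords in catalog.items():
        m = judge(narrative, program, keywords)
        if eligibility[program]:
            ranked.append(m)
        elif m.score >= disagreement_threshold:
            # Judgment says "good fit" but a hard rule excluded it: surface it.
            disagreements.append(m)
    ranked.sort(key=lambda x: x.score, reverse=True)
    return ranked, disagreements


catalog = {  # hypothetical programs and the needs they address
    "Food assistance": {"food", "groceries", "kids"},
    "Rental relief": {"rent", "eviction", "behind"},
    "Job training": {"job", "training", "skills"},
}
eligibility = {"Food assistance": True, "Rental relief": False, "Job training": True}
narrative = "Lost my job last month, behind on rent, need food for the kids."
ranked, disagreements = match(narrative, eligibility, catalog)
print([m.program for m in ranked])         # ['Food assistance', 'Job training']
print([m.program for m in disagreements])  # ['Rental relief']
```

In this run the narrative strongly suggests rental relief, but the hypothetical rules exclude it. So it comes back as a disagreement for a caseworker to check rather than disappearing.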
Keep the human on the high-stakes edges
Most matches are low-stakes and high-volume: route someone to a food program, a job board, a benefits screening. Automate those relentlessly, with the reasoning visible so a caseworker can audit a sample and trust the rest. But some cases carry real consequence: a crisis, a disqualifying interaction between benefits, an irreversible decision. Those are the cases where being wrong is expensive, and exactly the edges where a human has to stay in the loop. The system's job there isn't to decide; it's to assemble the picture, show its work, and hand a named person a decision they can actually make in minutes instead of hours.
That's the trust mechanism. Not a confidence score nobody understands, but a recommendation that shows every program it considered, why each was kept or dropped, and which rule or inference drove the call. Explainability is what allows an organization to hand off the routine matches without losing sleep over the consequential ones.
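Here is a sketch of the routing and the explanation record. It assumes cases arrive already tagged with risk flags. The flag names, the `Recommendation` and `Consideration` types, `route`, and `audit_sample` are all hypothetical. High-stakes flags send the case to a named reviewer. Everything else is automated, and a reproducible random sample is pulled for caseworker audit. Each recommendation carries one line per program considered, saying whether it was kept or dropped and whether a rule or an inference drove the call.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical flags that mark a case as high-stakes.
HIGH_STAKES_FLAGS = {"crisis", "benefit_conflict", "irreversible"}


@dataclass
class Consideration:
    program: str
    kept: bool
    reason: str
    source: str  # "rule" or "inference": which engine drove this call


@dataclass
class Recommendation:
    case_id: str
    flags: set[str]
    considered: list[Consideration] = field(default_factory=list)

    def explain(self) -> str:
        """One line per program considered, kept or dropped, with its driver."""
        return "\n".join(
            f"{c.program}: {'kept' if c.kept else 'dropped'} ({c.source}) - {c.reason}"
            for c in self.considered
        )


def route(rec: Recommendation, reviewer: str) -> tuple[str, Optional[str]]:
    """High-stakes cases go to a named person; everything else is automated."""
    if rec.flags & HIGH_STAKES_FLAGS:
        return "human", reviewer
    return "auto", None


def audit_sample(auto_case_ids: list[str], rate: float = 0.1, seed: int = 0) -> list[str]:
    """Pick a reproducible random sample of automated cases for a caseworker to audit."""
    if not auto_case_ids:
        return []
    k = min(len(auto_case_ids), max(1, round(len(auto_case_ids) * rate)))
    return random.Random(seed).sample(auto_case_ids, k)


rec = Recommendation(
    case_id="case-001",
    flags={"benefit_conflict"},
    considered=[
        Consideration("Disability benefit", True,
                      "meets hypothetical income and medical criteria", "rule"),
        Consideration("Earned-income supplement", False,
                      "may reduce the disability benefit", "inference"),
    ],
)
print(route(rec, reviewer="J. Rivera"))  # ('human', 'J. Rivera')
print(rec.explain())
print(len(audit_sample([f"case-{i:03d}" for i in range(20)])))  # 2
```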
A two-day start
Don't boil the graph. Pick one population with a recurring, painful matching problem and the messiest intake you have. Stand up the rules engine for just that slice of programs — the hard eligibility criteria, encoded and testable. Put a thin judgment layer on top that reads the intake, ranks the open options, and writes a one-paragraph explanation a caseworker can read at a glance. Route anything high-stakes to a human queue. Run it against real historical cases and compare its matches to what actually happened.
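For the comparison against historical cases, a small backtest harness might look like the sketch below. It assumes each past case records the program the person was actually matched to, and that the system returns a ranked list of program names. `HistoricalCase`, `backtest`, and the sample data are hypothetical. The harness reports how often the historical match was the system's top pick or appeared in its top k, and it lists the misses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HistoricalCase:
    case_id: str
    actual_program: str  # what the caseworker actually matched the person to


def backtest(cases: list[HistoricalCase],
             predict: Callable[[HistoricalCase], list[str]],
             k: int = 3) -> dict:
    """Compare the system's ranked matches with what actually happened."""
    if not cases:
        raise ValueError("need at least one historical case")
    hits_at_1 = hits_at_k = 0
    misses = []
    for case in cases:
        ranked = predict(case)
        if ranked[:1] == [case.actual_program]:
            hits_at_1 += 1
        if case.actual_program in ranked[:k]:
            hits_at_k += 1
        else:
            misses.append(case.case_id)  # worth a human look, not automatically "wrong"
    n = len(cases)
    return {"top1": hits_at_1 / n, f"top{k}": hits_at_k / n, "misses": misses}


# Hypothetical history and system output, for illustration only.
cases = [
    HistoricalCase("c1", "Food assistance"),
    HistoricalCase("c2", "Rental relief"),
    HistoricalCase("c3", "Job training"),
    HistoricalCase("c4", "Childcare subsidy"),
]
predictions = {
    "c1": ["Food assistance", "Job training"],
    "c2": ["Food assistance", "Rental relief"],
    "c3": ["Food assistance", "Rental relief", "Childcare subsidy"],
    "c4": ["Childcare subsidy"],
}
result = backtest(cases, lambda case: predictions[case.case_id])
print(result)  # {'top1': 0.5, 'top3': 0.75, 'misses': ['c3']}
```

What actually happened is not necessarily the best possible match, so treat the misses as a review queue rather than a score to minimize. Some will be system errors. Others will be cases where the system found an option the caseworker missed.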
In two days you'll know the two things that matter: where the rules are clean enough to automate, and where judgment is doing the real work. That boundary is the map for everything you build next — automate the traversal, keep the human on the consequential edges, and make every match show why.
Black Flag Design builds applied-AI products for organizations where being wrong is costly and the right answer is buried in complexity. If you have the directory but not the matching, spend two days with us — we call it a Foundation Sprint.