Field notes
AI vendor shortlists that don't rot in six months.
Every quarter or so we get asked to weigh in on a shortlist. Usually it's three vendors. Sometimes seven, occasionally a number a CFO would call "distressing." The deck that comes with it is always the same: a feature grid, a price column, maybe a security-questionnaire status row in red and green. By the time we read it, the decision has all but been made; we're being brought in to confirm it or to provide a defensible reason to overturn it.
Feature grids are bad procurement tools for AI vendors specifically. The features all converge — every serious vendor in a category will have the same dozen capabilities within nine months — and the things that diverge are exactly the things that don't fit in a grid. The grid that ages well doesn't compare features. It compares assumptions.
The four assumption rows that matter
When we run a shortlist, the rows we add to the grid are the ones we'd be embarrassed not to have at the bottom of the page when the contract is signed.
- Cost curve assumption — at our projected scale 12 months from now, what is the marginal cost of an additional unit of work, and how is that priced? Per token, per call, per seat, per user? What changes when the upstream model gets cheaper?
- Release cadence assumption — how often does this vendor ship breaking changes to their API, their pricing, their underlying model? Are we comfortable being on the receiving end of that cadence?
- Moat assumption — what specifically is this vendor doing that we couldn't, in principle, do ourselves with three engineers and a quarter? Is the moat the integration surface, the data, the brand, or genuinely the model? If the answer is "we can't tell," that is a finding.
- Exit assumption — when (not if) we move off this vendor, what comes with us? Whose data is it? Are the prompts portable? Is the evaluation harness portable? If we're locked in by gravity rather than contract, that is a real cost.
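The cost curve row is the one that lends itself most directly to arithmetic. What follows is a minimal sketch, not a pricing model for any real vendor: the two plan shapes (`UsagePricing`, `SeatPricing`), every price in the usage note, and the `pass_through` parameter are assumptions introduced for illustration. It estimates the marginal cost of an additional unit of work at a projected volume, and shows how a usage-priced plan would move if the vendor passed some fraction of an upstream model price cut on to us.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UsagePricing:
    """Hypothetical usage-based plan: flat platform fee plus a per-unit price."""
    platform_fee: float     # dollars per month
    price_per_unit: float   # dollars per unit of work (a call, a document, ...)

    def monthly_cost(self, units: int, seats: int) -> float:
        # Seats are ignored here; the parameter keeps both plans interchangeable.
        return self.platform_fee + self.price_per_unit * units


@dataclass(frozen=True)
class SeatPricing:
    """Hypothetical seat-based plan: each seat bundles some units, then overage."""
    price_per_seat: float
    units_included_per_seat: int
    overage_per_unit: float

    def monthly_cost(self, units: int, seats: int) -> float:
        included = seats * self.units_included_per_seat
        overage_units = max(0, units - included)
        return self.price_per_seat * seats + self.overage_per_unit * overage_units


def marginal_cost(plan, units: int, seats: int, step: int = 1000) -> float:
    """Average cost of one more unit of work, measured over `step` extra units."""
    extra = plan.monthly_cost(units + step, seats) - plan.monthly_cost(units, seats)
    return extra / step


def after_upstream_cut(plan: UsagePricing, cut: float, pass_through: float) -> UsagePricing:
    """Reprice a usage plan after the upstream model gets cheaper.

    cut:          fractional drop in the upstream price (0.5 means half price)
    pass_through: fraction of that drop the vendor passes on (an assumption
                  to confirm with the vendor, not something to guess at)
    """
    if not (0.0 <= cut <= 1.0 and 0.0 <= pass_through <= 1.0):
        raise ValueError("cut and pass_through must be between 0 and 1")
    return replace(plan, price_per_unit=plan.price_per_unit * (1 - cut * pass_through))
```

With made-up numbers, `marginal_cost(SeatPricing(50.0, 1000, 0.05), units=500_000, seats=200)` asks what the next unit costs once we are well past the bundled allowance, and comparing `after_upstream_cut(plan, cut=0.5, pass_through=0.0)` against `pass_through=1.0` brackets the answer to "what changes when the upstream model gets cheaper?" The useful output is less the number than the list of inputs the vendor could not or would not supply.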
What we recommend instead of the feature grid
Run a two-week paid pilot with each finalist against your real data. Not a sandbox — your data. Make each vendor write the integration; this is itself a signal about how they treat customers. Score on three things: time-to-first-working-result, quality of communication during the pilot, and the assumption rows above. The feature grid can sit at the back of the deck as an appendix nobody reads.
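One way to keep the pilot scoring honest is to write the scorecard down before the pilots start. The sketch below is illustrative only: the 1-to-5 scales, the equal weighting of the three criteria, and the names `PilotResult` and `pilot_score` are assumptions of this example, not a fixed method. It refuses to produce a score if any of the four assumption rows is left blank.

```python
from dataclasses import dataclass

# The four assumption rows from the grid.
ASSUMPTION_ROWS = ("cost_curve", "release_cadence", "moat", "exit")


@dataclass(frozen=True)
class PilotResult:
    vendor: str
    days_to_first_result: int     # days into the pilot until something worked
    communication: int            # reviewer judgment, 1 (poor) to 5 (excellent)
    assumptions: dict[str, int]   # 1 to 5 per assumption row


def pilot_score(result: PilotResult, pilot_days: int = 14) -> float:
    """Combine the three pilot criteria into a 0..1 score (equal weights assumed)."""
    missing = [row for row in ASSUMPTION_ROWS if row not in result.assumptions]
    if missing:
        # An unanswered assumption row is a finding, not a zero.
        raise ValueError(f"{result.vendor}: unscored assumption rows: {missing}")

    # Faster first result scores higher; no result within the pilot scores 0.
    speed = max(0.0, 1 - result.days_to_first_result / pilot_days)
    # Map the 1..5 scales onto 0..1.
    communication = (result.communication - 1) / 4
    assumptions = sum(
        (result.assumptions[row] - 1) / 4 for row in ASSUMPTION_ROWS
    ) / len(ASSUMPTION_ROWS)

    return round((speed + communication + assumptions) / 3, 3)


def rank(results: list[PilotResult]) -> list[tuple[str, float]]:
    """Vendors sorted from highest to lowest pilot score."""
    scored = [(r.vendor, pilot_score(r)) for r in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The single number matters less than the discipline: every finalist gets scored on the same three things, and the per-row scores stay visible next to the total.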
In our experience, how impressive a vendor looks in a 60-minute demo tells you very little about how well that vendor will have aged at 18 months.
When a shortlist fights you
Sometimes the shortlist won't converge. Two vendors are close on everything that matters and you can't justify the price delta. That's almost always a sign that you're shortlisting in the wrong category. The actual question is one level up: is this a workflow we should be buying, building, or partnering on? The shortlist itself is a symptom of having answered the wrong question.
When we end up in this conversation we'll usually recommend stepping back to a one-week Strategy sprint to re-examine the underlying buy-vs-build call. Sometimes the shortlist gets shorter; sometimes it gets longer; sometimes it gets replaced entirely with a two-engineer build. The procurement exercise that was about to consume a quarter of someone's time goes away.

