#117: AI infra is a margin trap, unless you pass this checklist
A practical 5-point checklist to evaluate AI infra startups. Use it to filter out weak wrappers, margin traps, and infra that won’t scale.
In the last three parts of this ModelOps series, I went deep into how real-world AI systems should be built and managed: clean data in, recoverable states out, and live models that don’t silently decay.
But what about the teams building the infra behind all this?
Over the past 12 months, I’ve met dozens of founders building AI infra. Most are chasing weak wrappers, stacking brittle layers on top of someone else’s moat, or being chased by LLM feature creep. But a few, very few, are building infra that will outlive today’s foundation model race.
So here’s the checklist I now use. Five simple filters to stress-test any AI infra startup. You don’t need to be an ML engineer to apply them. You just need to think like someone who has to live with the system at scale.
How I stress-test any AI infra startup today
If they fail 3 out of these 5 filters, run.
AI infra isn’t about cool demos anymore. It’s about reliability, margin, and control. Most startups in this space are one model update away from being irrelevant.
Here’s my 5-filter test. Most don’t pass.
1. Is it infra or just a fancy wrapper?
A lot of “infra” startups are just a skin on top of someone else’s API. If OpenAI changes its pricing or product, you’re dead.
How to check:
Are they solving a hard problem with real abstraction?
Or just moving data from one box to another? (See the sketch below.)
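To make that concrete, here’s a minimal Python sketch. Every name in it is invented for illustration, not any real product’s API: a thin wrapper just forwards the prompt to one vendor’s SDK, while something closer to real infra owns a provider-agnostic interface plus the routing and fallback policy around it, so a vendor pricing or model change becomes a config edit rather than a rewrite.

```python
# Hypothetical sketch; all class and function names are illustrative, not a real product's API.

class Provider:
    """Provider-agnostic interface the startup owns and controls."""
    name = "base"

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class OpenAIProvider(Provider):
    name = "openai"

    def complete(self, prompt: str) -> str:
        # In real life this would call the vendor SDK; stubbed for the sketch.
        return f"[{self.name}] answer to: {prompt[:30]}"


class SelfHostedProvider(Provider):
    name = "self-hosted"

    def complete(self, prompt: str) -> str:
        # Could be an open-weights model the startup runs itself; stubbed here.
        return f"[{self.name}] answer to: {prompt[:30]}"


class Router:
    """The part that is actually infra: routing, fallback, and cost/latency
    policy live here, so an upstream change doesn't take the product down."""

    def __init__(self, providers: list[Provider]):
        self.providers = providers

    def complete(self, prompt: str) -> str:
        last_error = None
        for provider in self.providers:  # preferred/cheapest first, fall back on failure
            try:
                return provider.complete(prompt)
            except Exception as exc:
                last_error = exc
        raise RuntimeError("All providers failed") from last_error


# A thin wrapper, by contrast, is just `def ask(prompt): return vendor_sdk.call(prompt)`:
# no routing, no fallback, no layer the startup actually owns.
router = Router([SelfHostedProvider(), OpenAIProvider()])
print(router.complete("Summarize this support ticket"))
```

The value lives in the Router and the policy around it, not in the call to any single provider.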
2. Can someone outside tech understand & explain the value?
If the only people who get it are ML engineers on Twitter, it’s not a business. Infra that sticks helps real users save time, cut cost, or make more money. Even my non-tech friend in marketing spotted ChatGPT’s various use cases & automated 50% of their workflows within the first 2-3 months of its launch.
How to check:
Can a head of ops or marketing explain why they use it?
Would the business miss it if it disappeared?
3. Do they ship like operators or publish like researchers?
Cool research doesn’t matter if the thing breaks in production. Infra wins by being boring, fast, and predictable. If it needs constant tuning, it won’t scale.
How to check:
Have they shipped software used in production before?
Can they talk in uptime and latency (business use case), not just accuracy and models (tech use case)? (See the sketch below.)
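As a rough illustration of what “talking in uptime and latency” means, here’s a tiny Python sketch. The log format (per-request latency in milliseconds plus a success flag) is an assumption made up for the example:

```python
# Hypothetical sketch: the input format (latency in ms, success flag per request)
# is assumed for illustration.

import statistics

requests = [
    (220, True), (310, True), (180, True), (2400, False), (290, True),
    (350, True), (410, True), (275, True), (330, True), (5000, False),
]

latencies = sorted(ms for ms, _ in requests)
p95_ms = statistics.quantiles(latencies, n=20)[-1]    # ~95th percentile latency
success_rate = sum(ok for _, ok in requests) / len(requests)

print(f"p95 latency: {p95_ms:.0f} ms")        # what the customer's product feels
print(f"success rate: {success_rate:.1%}")    # what the customer's SLA cares about
```

If a founding team can’t quote numbers like these for their own service, they’re publishing, not operating.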
4. Do unit economics get better with time?
Infra on top of someone else’s infra is a margin trap. If your costs grow faster than your usage, the business model is broken.
How to check:
Can they cut costs over time?
Are they in control of the stack or just renting layers? (See the back-of-envelope below.)
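Here’s a hypothetical back-of-envelope to show why this matters; every number is invented for illustration. A startup renting its whole stack pays the same per-request cost no matter how big it gets, while one that owns more of the stack (caching, batching, distillation, self-hosted models) can push unit cost down as volume grows:

```python
# Hypothetical back-of-envelope: every number here is invented for illustration.

def gross_margin(price_per_request: float, cost_per_request: float) -> float:
    return (price_per_request - cost_per_request) / price_per_request

price = 0.010  # what the customer pays per request, in USD

# Renting the stack: the upstream API bill scales 1:1 with usage,
# so the margin is stuck no matter how big the company gets.
rented_costs = {100_000: 0.006, 1_000_000: 0.006, 10_000_000: 0.006}

# Owning more of the stack: caching, batching, distillation, or self-hosted
# models push the per-request cost down as volume grows.
owned_costs = {100_000: 0.006, 1_000_000: 0.003, 10_000_000: 0.0015}

for label, costs in (("rented", rented_costs), ("owned", owned_costs)):
    for volume, cost in costs.items():
        print(f"{label:>6} stack @ {volume:>10,} req/mo -> "
              f"gross margin {gross_margin(price, cost):.0%}")
```

A gross margin stuck at 40% at 10 million requests a month is the trap; margins that climb toward software economics are the tell that the unit economics actually improve with time.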
5. Does this still matter if the model race ends tomorrow?
If OpenAI or Anthropic becomes the default, do you still exist? Great infra companies survive platform shifts. They don’t bet on chaos lasting forever. This is easier said than done.
How to check:
Is it model-agnostic?
Would users still need this if the market consolidates?
If they fail 3 of these, it’s a pass.
If they fail 4, it’s a deck.
If they fail all 5, it’s a feature.
Infra isn’t supposed to be exciting. It’s supposed to work! Every time, without surprises.
If a startup can pass 4 out of 5 of these filters, you probably want to work with them. Or back them. Or at least not bet against them.
Everything else? Either a feature, a weak wrapper (wrappers are not necessarily bad, despite the word on the street), or a soon-to-be-ghosted pilot.
This post is part of my ongoing ModelOps series, a hands-on look at how modern AI infrastructure gets built, deployed, and scaled in the real world. If you liked this, you’ll probably enjoy the earlier parts of the series.
And if you’re building in this space and think you pass these filters, I want to hear from you.