2 Comments
Sam

You've done a good job of framing the urgency and surfacing under-discussed pain points, like pipeline fragility, rollback blind spots, and the inadequacy of legacy infra for ML-specific needs. A few observations, though:

1. In practice, very few orgs outside hyperscalers or top AI labs face these problems at that scale today

2. DVC and LakeFS lack full backup semantics, but well-run infra teams often stitch them together with Airflow, S3 versioning, or even Git LFS. Saying they don’t offer full-fledged backup is accurate, but it is dismissive of how power users extend them

3. The big enterprise shift I’m seeing is that AI governance is no longer about “what model did we use?” but “what decision did this model influence, and based on what data?” That means you need data backups + prompt logs + retraining checkpoints all tied together. This convergence is worth a deeper dive, and maybe only a handful are solving it

4. Backup sounds great until your CFO sees the AWS invoice. There’s a trade-off between granularity, retention period, and cost here

Lays a good foundation for the problem deep dive ngl, but it needs a level-up to become a proper thesis

Siddharth Shah

Appreciate the push to level up. I see this post as laying the groundwork for my understanding of this space. I think we're early: not many companies have tons of data yet, but this problem hurts at scale, so it should begin at hyperscalers and become mid-market hygiene at some point. I'm just trying to be early and frame the thesis before it becomes widespread. As for cost, the cost of compliance is far less than the cost of non-compliance, and those who scale will take cognizance. This is also where air-gapped backup from raw backups, or even blockchain, can help. Again, early days, but exciting nonetheless
