Discussion about this post

Sam

You've done a good job of framing the urgency and surfacing underdiscussed pain points, like pipeline fragility, rollback blind spots, and the inadequacy of legacy infra for ML-specific needs. A few observations, though:

1. In practice, very few orgs outside the hyperscalers and top AI labs face these problems at that scale today.

2. DVC and LakeFS lack full backup semantics, but in well-run infra teams they’re often stitched together with Airflow, S3 versioning, or even Git LFS. Saying they don’t offer full-fledged backup is accurate, but it’s dismissive of how power users actually extend them.

3. The big enterprise shift I’m seeing is that AI governance is no longer about "what model did we use?" but "what decision did this model influence, and based on what data?" That means you need data backups, prompt logs, and retraining checkpoints all tied together. This convergence is worth a deeper dive, and maybe only a handful are solving it.

4. Backup sounds great until your CFO sees the AWS invoice. There’s a trade-off between granularity, retention period, and cost.
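
To put rough numbers on point 4, here's a back-of-the-envelope sketch. It assumes full, non-deduplicated snapshots and a single flat storage price; the function names and every input value are hypothetical placeholders, not real AWS pricing.

```python
def steady_state_storage_gb(snapshot_gb, snapshots_per_day, retention_days):
    """GB held once the retention window is full.

    Assumes every snapshot is a full copy (no dedup, no incrementals),
    so this is a worst-case estimate.
    """
    return snapshot_gb * snapshots_per_day * retention_days


def monthly_cost(snapshot_gb, snapshots_per_day, retention_days, price_per_gb_month):
    """Monthly storage bill at a flat, assumed price per GB-month."""
    stored_gb = steady_state_storage_gb(snapshot_gb, snapshots_per_day, retention_days)
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    PRICE = 0.02  # placeholder price per GB-month, not a quoted rate
    daily = monthly_cost(100, 1, 30, PRICE)    # one 100 GB snapshot per day, kept 30 days
    hourly = monthly_cost(100, 24, 30, PRICE)  # same data, hourly snapshots
    print(f"daily snapshots:  {daily:.2f} per month")
    print(f"hourly snapshots: {hourly:.2f} per month")
```

Under these assumptions the bill scales linearly with both granularity and retention, so going from daily to hourly snapshots multiplies it by 24. Incremental or deduplicated backups would flatten that curve, which is exactly where the trade-off gets interesting.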

Lays a good foundation for the problem deep dive, ngl, but it needs a level-up to be a proper thesis.
