3 Comments
Sam

The pain is real, and automation here is overdue. That said, I would've liked more depth, for example on how this differs from observability/metadata tools, and how "auto" plays out across modalities beyond tabular/text.

Today, observability tools like Monte Carlo, Metaplane, or even Databand, and metadata layers like OpenLineage, Marquez, or Feast (for features), are becoming core to how preprocessing is validated and governed.

The $6-10B TAM makes sense if you include all ETL, wrangling, and data-integration efforts. But if we're strictly talking about ML-specific preprocessing, the addressable market narrows fast!

Still, a good thesis kickoff, man!

Siddharth Shah

Again, appreciate the nudge to level up. I will read up more on observability tools.

On TAM, yes, preprocessing today is a niche. But don't you think that as ML pipelines and enterprise DataOps converge (especially in mid-market orgs where teams are small and consequently wear multiple hats), a wedge will develop?

Sam

Spot on. I was part of a small startup where one team managed everything from ingestion to modeling, and this became a recurring drag. In larger orgs, teams are siloed, so it doesn't affect them as much. The wedge is to unlock serious and compounding time savings.
