#120- The AI trust crisis is already here
And no one’s talking about the layer that will make or break enterprise deployments
Over the past twelve months, much of the conversation around large language models has focused on capability: what these systems can generate, automate, or accelerate. That discourse has been necessary, but it is also incomplete.
Far fewer have asked a more consequential question: what should these models be allowed to do?
As enterprises move from experimentation to deployment, copilots and intelligent assistants are being embedded into core business workflows. The pace of rollout is accelerating. But so is exposure. In most of these environments, foundational safeguards are still missing.
Today, the majority of enterprise AI deployments cannot answer three simple, yet critical questions:
Who retrieved which piece of data?
Did the model hallucinate or leak sensitive information?
Can any given output be reconstructed, explained, or audited later?
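Answering these is, at minimum, a logging problem. As a hedged, minimal sketch of what that implies (field names here are illustrative assumptions, not a standard schema or any vendor’s format), this is the kind of retrieval audit record a deployment would need to write on every query:

```python
# Illustrative sketch of a retrieval audit record.
# Field names are assumptions, not a standard schema.
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class RetrievalAuditRecord:
    user_id: str               # who issued the query
    query: str                 # what was asked
    retrieved_doc_ids: list    # which pieces of data were pulled into context
    policy_applied: str        # which access policy governed the retrieval
    model_output_id: str       # links the retrieval to the generated answer
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def log_retrieval(record: RetrievalAuditRecord, path: str = "retrieval_audit.jsonl") -> None:
    """Append the record to an append-only JSONL log so outputs can be reconstructed later."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Without something like this being written at inference time, the third question, reconstructing an output after the fact, simply has no answer.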
What seem like immaterial implementation details are actually material blockers to safe and scalable adoption. Without clear answers to these questions, AI systems will remain untrustworthy no matter how capable they become.
This is eerily similar to the early phase of internet adoption. There is widespread enthusiasm, rapid experimentation, and a sense of inevitability. But there are also glaring structural gaps. The current enterprise AI stack is built for performance, not control; for experimentation, not accountability.
We are deploying powerful new systems far faster than we are even thinking about the infrastructure to govern them. That imbalance will define the next stage of enterprise AI. And resolving it is no longer optional.
The next attack will be internal, not external
The most immediate threat in enterprise AI isn’t going to come from a sophisticated, malicious outsider. It’s far more likely to come from a well-intentioned insider.
Across most organizations, GenAI systems are being deployed without even basic safeguards. Prompt pipelines are undocumented. Retrieval logic is improvised. Vector stores contain sensitive data but offer no meaningful access control. There is little visibility into what data was queried, by whom, and under what governance context.
It would be comforting if this were an edge case. Sadly, it’s the prevailing default.
It only takes one careless interaction: one hallucinated output drawn from a sensitive document, shared with a customer or regulator, embedded in a presentation or filing, with no trace of how it got there or who put it there. There is no malicious actor in that scenario. Just a system operating exactly as it was allowed to, but without constraints.
This is what makes invisible failure so dangerous. It doesn’t announce itself. It hides in plain sight, until the consequences surface.
Shifting from building fast to building responsibly
Over the last two years, the dominant question in enterprise AI was: which model should we use? That phase is now closing. The question today is far more difficult, and far more important: how do we know what this model just did?
We are moving from experimentation to production. From developer-led enthusiasm to board-level scrutiny. From showcasing capabilities to managing consequences. This is the moment where AI stops being a technical accelerant and starts becoming an operational liability unless governed correctly.
The historical parallel is instructive. In the late 2000s to early 2010s, cybersecurity went from an IT concern to a boardroom imperative. No major system could be deployed without a security review. Tomorrow, AI governance will follow that same arc, emerging as the trust layer that determines whether AI systems can be adopted at scale in high stakes, high trust, highly regulated environments.
In this context, governance is not a dashboard, a feature, or a compliance checklist. It is infrastructure that decides whether a powerful system can be used or must be shut down.
What governance means
There is still considerable confusion around what AI governance entails. In many enterprise discussions, the term is reduced to compliance checklists, redaction tools, fairness audits, or vague gestures at the “responsible AI” buzzword and deepfake avoidance.
That’s not enough.
Properly understood, governance is not a policy binder or a PR gesture. It is a real-time control system. A control tower for how AI systems interact with enterprise data, users, and output channels.
At a functional level, this means visibility and control across four dimensions:
What data gets used
– Content tagging, role-based access, sensitivity classification
How that data gets used
– Retrieval constraints, prompt-level enforcement, usage throttling
What is generated
– Output filtering, hallucination detection, redaction layers, real-time suppression
Who sees what
– Identity-aware delivery, user clearance mapping, access logging and traceability
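As a rough illustration of where these four controls sit in a retrieval-augmented pipeline, here is a minimal sketch. The helpers (fetch_candidates, generate, redact), the clearance labels, and the field names are hypothetical placeholders, not any product’s API:

```python
# Hypothetical governance hooks around a RAG call.
# fetch_candidates, generate, and redact are stand-ins, not real library calls.

SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]

def allowed(doc_sensitivity: str, user_clearance: str) -> bool:
    # What data gets used: compare the document's sensitivity tag to the user's clearance.
    return SENSITIVITY_ORDER.index(doc_sensitivity) <= SENSITIVITY_ORDER.index(user_clearance)

def governed_answer(user, query, fetch_candidates, generate, redact, audit_log):
    # How that data gets used: retrieval is constrained to documents the caller may see.
    candidates = fetch_candidates(query)
    permitted = [d for d in candidates if allowed(d["sensitivity"], user["clearance"])]

    # What is generated: the raw model output passes through a redaction/filter layer.
    raw_output = generate(query, context=permitted)
    safe_output = redact(raw_output)

    # Who sees what: identity-aware delivery plus an audit trail for later reconstruction.
    audit_log.append({
        "user": user["id"],
        "query": query,
        "doc_ids": [d["id"] for d in permitted],
        "output": safe_output,
    })
    return safe_output
```

The point of the sketch is placement: classification happens before ingestion, entitlement checks happen at retrieval, filtering happens on the output, and every interaction leaves a trace.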
The discussion has moved on from academic chatter to operational necessity. Sadly, most GenAI deployments today meet none of these standards. Enterprises are building production-grade systems without permissions, safeguards, or traceability. We’ve built copilots with no black box, no brakes, and no flight log. That is an enormous amount of risk at scale.
Governance is no longer optional
Across boardrooms, regulatory bodies, and IT budgets, one trend is becoming unmistakable: governance is no longer a downstream concern. It is moving rapidly to the front of the AI adoption lifecycle. Several independent signals are converging to make governance inevitable:
1. Regulatory Momentum
The EU AI Act (final text adopted March 2024) requires high-risk AI systems to demonstrate:
Traceability of outputs
Logging of training and inference steps
Human oversight and clear accountability
This affects any AI system used in hiring, credit scoring, legal advice, medical decisioning, and more.
India: The Digital Personal Data Protection Act, 2023 explicitly places obligations on entities processing personal data, including logging, consent, and redressal. Drafts of India’s AI governance framework further push for "techno-legal" controls like rule-encoded outputs and audit trails.
The US: the White House AI Executive Order (Oct 2023) directs federal agencies and their contractors toward explainability, red-teaming, and audit mechanisms for AI systems, with the NIST AI RMF as the reference risk framework.
Regulation is moving beyond models to usage and behavior controls and it’s arriving faster than most enterprises expect.
2. Board-level urgency
In April 2023, Samsung banned employee use of ChatGPT after sensitive code was leaked via a prompt. The ban was later reversed, but only after stricter usage controls and internal sandboxing were put in place. A 2024 report showed multiple Fortune 500 CIOs halting LLM deployments until legal, security, and governance reviews were complete, even for internal use cases. Many boards now ask the same question about AI they once asked about cybersecurity: “What is our exposure?” Governance has become a directors’ liability issue.
3. Trust barriers in regulated sectors
JPMorgan Chase banned ChatGPT internally due to data security concerns. They now run closed pilots under strict internal access controls. In healthcare, LLM outputs cannot be used for clinical decision-making unless auditability, data provenance, and usage constraints are proven per HIPAA, FDA guidance, and hospital review boards. In pharma, documentation for clinical trials and regulatory filings must be traceable. AI-generated content without lineage or approval logs cannot be submitted. Without governance, AI is unusable in sectors where risk is regulated.
4. Infra spend is shifting
Spending on GenAI infra is now outpacing model training spend. Top categories include:
AI observability (Arize, WhyLabs)
ModelOps & governance (Credo AI, TruEra, Robust Intelligence)
RAG + semantic retrieval (Pinecone, Weaviate)
Within enterprises, studies show that post-pilot GenAI spend is flowing into governance layers like access control, logging, and explainability.
5. Real-world vulnerabilities are surfacing
A Stanford study (2024) showed prompt injection bypassing LLM guardrails with a ~93% success rate under basic conditions. Simon Willison documents live exploits in tools like ChatGPT and Bing. There have also been hallucination incidents: Air Canada was held legally liable when its chatbot offered an incorrect refund policy, and the airline had to honor the AI’s false statement. Apple and Goldman Sachs banned third-party LLMs over concerns that sensitive internal data was being used for inference by public models. Vulnerabilities are now material, documented legal liabilities.
But the governance stack is broken
The AI governance and data infrastructure ecosystem is expanding rapidly. But it remains deeply fragmented. Most current solutions address only isolated parts of the problem. What’s emerging is not a unified governance stack, but a mosaic of tools, each solving for narrow layers of control, monitoring, or security.
This fragmentation poses two challenges for enterprise adoption:
There is no standard way to enforce governance across the full AI lifecycle from ingestion to inference to output.
Most tools were not designed for unstructured data or real-time usage in LLM contexts, and are now being retrofitted for a use case they never anticipated.
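To make the first challenge concrete: there is no shared contract that ingestion tools, vector stores, and orchestration frameworks all agree to implement. The interface below is a hypothetical sketch of what such a contract could look like; nothing like it is standardized today.

```python
# Hypothetical lifecycle policy contract -- no such standard exists today,
# which is the fragmentation problem described above.
from abc import ABC, abstractmethod

class GovernancePolicy(ABC):
    @abstractmethod
    def on_ingest(self, document: dict) -> dict:
        """Classify and tag content before it enters a vector store."""

    @abstractmethod
    def on_retrieve(self, user: dict, candidates: list) -> list:
        """Filter retrieved chunks against the caller's entitlements."""

    @abstractmethod
    def on_output(self, user: dict, text: str) -> str:
        """Redact or suppress generated text before delivery."""
```

Because nothing plays this role end to end, each tool in the mapping below enforces its own slice, if it enforces anything at all.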
Below is a mapping of key segments in this evolving landscape:
1. Data catalog & Metadata governance
Players like Alation and Collibra were originally built for structured data environments; they help enterprises organize and manage metadata, data lineage, and policy definitions. They are increasingly exploring AI plugins but still operate largely at the metadata and documentation layer, not at runtime or the semantic content level. Notable backers include Sapphire Ventures, ICONIQ, and Google (CapitalG).
2. Access control for structured data
Players like Immuta and Privacera enforce row-level, column-level, or role-based access controls across data lakes and analytics systems. These tools are vital for analytics pipelines but do not cover unstructured data or LLM-based access. Notable backers include Intel Capital and Insight Partners.
3. Unstructured Data ETL and Preprocessing
Players like Unstructured.io and Haystack (deepset) are open-source, developer-first tools focused on parsing, chunking, and cleaning unstructured data for retrieval-based AI use. They form the ingestion layer of the RAG stack but do not provide policy enforcement or usage governance. Notable backers include Menlo Ventures and GV.
4. Large scale unstructured data management
Players like Komprise and Varonis analyze and organize massive file-based data stores, often for tiering or archiving. Some provide access analytics or permissions audits, but none address real-time LLM governance. Notable backers include Canaan and the public equity markets.
5. Vector databases
Players like Pinecone, Weaviate, and Chroma store and retrieve the high-dimensional embeddings critical to semantic search and RAG systems. While performant, they lack built-in enforcement for role-based access or policy-aware retrieval logic. A notable backer is a16z.
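A common workaround is to attach access metadata to each embedding and filter on it at query time; most vector databases expose some form of metadata filtering. The sketch below shows the pattern in plain Python with an in-memory store, so the field names and the store itself are illustrative, not any vendor’s API.

```python
# Illustrative pattern: metadata filtering as a stand-in for real access control.
# The in-memory "store" and its fields are assumptions, not a vector DB API.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def query_with_acl(store, query_vec, user_groups, top_k=3):
    # Filter first: only score chunks whose ACL intersects the caller's groups.
    visible = [item for item in store if set(item["allowed_groups"]) & set(user_groups)]
    ranked = sorted(visible, key=lambda item: cosine(item["embedding"], query_vec), reverse=True)
    return ranked[:top_k]

store = [
    {"id": "doc-1", "embedding": [0.1, 0.9], "allowed_groups": ["finance"]},
    {"id": "doc-2", "embedding": [0.8, 0.2], "allowed_groups": ["engineering", "finance"]},
]
print(query_with_acl(store, [0.7, 0.3], user_groups=["engineering"]))  # returns doc-2 only
```

The limitation is that this is filtering, not policy: it only works if the metadata is complete, current, and synchronized with the source system’s permissions.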
6. RAG Orchestration and LLMOps
LangChain, LlamaIndex, Dust, Gantry, and Arize AI are frameworks that simplify building and chaining LLM-based applications with memory, retrieval, and tool use. However, most prioritize speed and developer experience over security or governance. Emerging LLMOps platforms offer evaluation and monitoring but do not enforce semantic policy constraints. Notable backers include Sequoia, Benchmark, and Greylock.
7. Data & model quality
Cleanlab and Snorkel improve training data integrity and reduce model error through programmatic labeling, error detection, and data curation. These tools sit upstream of inference governance but are foundational to trust in AI systems. Notable backers include Menlo, Bain Capital, and Lightspeed.
8. AI governance & policy oversight
Players like Credo AI, IBM WatsonX Governance, and Microsoft Purview provide governance frameworks for documenting model risks, defining acceptable use, and managing fairness and compliance audits. These are essential at the policy level but often lack runtime enforcement across unstructured data or retrieval pipelines.
9. AI security & model hardening
Robust Intelligence, Protect AI, HiddenLayer, and Adversa focus on AI runtime and model perimeter defense: prompt injection protection, output validation, and vulnerability scanning. These tools complement governance by reducing adversarial risk, but they do not manage internal usage governance or semantic control. Backers include Cisco (via M&A), Booz Allen, and Acrew Capital.
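To give a flavor of what “output validation” means in this category, here is a toy redaction pass that scrubs a few obviously sensitive patterns before a response is returned. The patterns and labels are illustrative only; commercial tools in this segment go well beyond regular expressions.

```python
# Toy output-validation pass: redact obviously sensitive strings before a
# response leaves the system. Patterns are illustrative, not exhaustive.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

def validate_output(text: str) -> str:
    redacted = text
    for label, pattern in PATTERNS.items():
        redacted = pattern.sub(f"[REDACTED:{label}]", redacted)
    return redacted

print(validate_output("Contact jane.doe@corp.com, SSN 123-45-6789."))
# -> Contact [REDACTED:email], SSN [REDACTED:ssn].
```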
The mapping above is what a governance stack should look like, but doesn’t. The host of tools creates the illusion of safety, but behind the neat boxes is a fragmented, fragile pipeline.
The core takeaway: just because tools exist at every layer doesn’t mean governance exists. The stack looks orderly, but enforcement is either absent, retrofitted, or broken by design.
The current governance stack is a patchwork adapted from adjacent domains like data security, analytics, and traditional compliance. None of these systems were originally designed for real-time LLM-based workflows, unstructured data retrieval, or semantic-level access control.
As enterprises scale GenAI deployment across copilots, assistants, and automated decisioning tools, this fragmented tooling environment will create both technical and regulatory friction. The market is ripe for consolidation or reinvention where governance becomes an integrated, usage-aware, content-level infrastructure layer.
Until then, most AI deployments will continue to run with partial visibility and limited control.
What this means
For Founders
The biggest open space in GenAI infra is governance that operates at the semantic, unstructured, real-time level. If you’re building Okta for AI, Datadog for prompts, or Snowflake with policy enforcement, you’re not late. You’re early.
For Investors
The first wave was model labs. The second wave is agent tooling. The third is where real enterprise value compounds: trust infrastructure, the access layers, guardrails, observability, and policy engines that make GenAI usable in sensitive environments. We’ve funded what AI can do. Now it’s time to fund what makes it safe to use.
For Enterprises
If you're deploying copilots, internal assistants, or LLM-based automation, and you don’t know:
Who accessed what
Under which policy
And why the model said what it said
You’re already out of compliance. You just haven’t been asked yet.
The reality is that if you don’t know who accessed what, under which policy, and why the model said what it said, you’ve already lost the AI race. AI is not going away. But neither is risk. Governance is the layer that decides whether we scale safely or fail silently.
If you’re building in this space, or facing these issues in production, I want to hear from you.