Your LLM Isn’t the Risk. Your Data Is.
Why this matters right now
If you’re putting large language models (LLMs) or “AI copilots” into real workflows, you’ve quietly taken on two new responsibilities:
- Becoming a data controller for prompts and outputs (often full of customer data, secrets, and PII).
- Becoming a model owner (even if you’re calling a vendor API) from a risk, audit, and governance standpoint.
Most teams are still treating LLMs like “expensive autocomplete.” From a privacy & governance perspective, that’s wrong. They are:
- Extremely capable data mixture machines (training, fine-tuning, retrieval, logging).
- New exfiltration surfaces for sensitive data (via prompts, context windows, and embeddings).
- Hard to reason about from an audit and policy standpoint (what went where, when, and why).
What changed is not just “we use AI now.”
What changed is that your worst data is now concentrated in one place, often sent off-prem to a third party, with logs and caches you only sort-of control.
If you care about SOC2, ISO 27001, HIPAA, GDPR, or just “not leaking production secrets,” you need a concrete model for:
- Data retention (prompts, outputs, embeddings, fine-tunes, logs).
- Model risk (hallucinations, data reconstruction, shadow training).
- Auditability (who did what, with which data, when).
- Policy-as-code (enforceable controls, not PDF manuals).
What’s actually changed (not the press release)
Three shifts matter in practice.
1. Prompt = data lake
Previous “AI” integrations mostly worked on controlled feature sets. With LLMs:
- Prompts often contain raw, unfiltered:
- Customer support transcripts
- CRM notes
- Source code
- Contracts and legal docs
- Internal strategy docs
- Retrieval-Augmented Generation (RAG) shoves indexed copies of that into:
- Vector DBs (embeddings)
- Caches (prompt/response)
- Fine-tune datasets (if you go there)
Your LLM system is now an alternate universe data lake that rarely went through the same privacy design as your “real” data lake.
2. Observability is worse than your web app
Your main app:
- Has request-level logging tunable by field.
- Has mature log PII redaction policies.
- Integrates with SIEM and IAM.
Most LLM stacks today:
- Have ad-hoc logs from:
- Frontend (user input)
- Orchestration layer (prompt templates, chain-of-thought, tool calls)
- LLM vendor (if logging enabled)
- Mix user data and system prompts in one blob.
- Rarely record what documents or tools were used as context in a way that’s queryable for an audit.
So when the privacy officer asks, “Which customer data did the assistant use to answer this user?” your honest answer is: “We’re not sure.”
3. Policy lag vs. model speed
Org patterns:
- Security & privacy policies: updated annually (if that).
- LLM features: shipped weekly.
Security teams are used to: “new microservice, same patterns, same controls.”
LLMs broke that: they behave like a multi-tenant, ever-learning interpreter sitting at the center of your stack.
Result: most orgs have a policy gap of 6–18 months between what’s in their SOC2/ISO controls and what’s actually happening in their LLM stack.
How it works (simple mental model)
Here’s a practical mental model to reason about LLM privacy & governance:
Four flows, four scopes:
- Prompt flow – What goes into the model
- Response flow – What comes out of the model
- Context flow – What extra data you attach (RAG, tools, DBs)
- Lifecycle flow – Where any of the above is stored (logs, caches, training)
Map these across four scopes:
- User device / client
- Your backend / orchestration layer
- Model provider / infra providers
- Downstream consumers (logs, analytics, BI, SIEM)
For each intersection, ask 5 questions:
- What kinds of data? (PII, PHI, secrets, regulated data, internal only)
- Who can access it? (humans & systems)
- How long is it kept? (and where)
- What policies should apply? (retention, masking, jurisdiction)
- How is that policy enforced? (code, config, or “trust me bro”)
Practically, that means:
- Treat prompts + context as first-class data assets, not transient strings.
- Treat your LLM pipeline as a system with its own data classification matrix.
- Make retention and access decisions at the pipeline step level, not “for the LLM” generically.
Example mental map (simplified):
- Frontend:
- Prompt typed by user → client logs? browser storage? analytics SDK?
- Orchestrator:
- Prompt + user profile + retrieved docs → request log, feature flags, tracing.
- Model API:
- Provider: stores prompts for N days unless you disable logging.
- Vector DB:
- Indexed chunks of contracts → backups, replicas, dev copies.
Every box above needs explicit answers on: retention, encryption, masking, and audit.
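The map above can be encoded as a reviewable, diffable policy table instead of a whiteboard sketch. This is a hypothetical sketch: the step names, field names, and values are illustrative assumptions, not a standard schema.

```python
# Encode the "every box needs explicit answers" map as data.
# retention_days=None means UNKNOWN and must be confirmed.
LLM_DATA_MAP = {
    "frontend.prompt": {
        "data_types": ["pii", "free_text"],
        "retention_days": 0,          # do not persist raw prompts client-side
        "masked": False,
        "third_party": None,
    },
    "orchestrator.request_log": {
        "data_types": ["pii", "internal"],
        "retention_days": 30,
        "masked": True,               # redact before writing
        "third_party": None,
    },
    "model_api.vendor_log": {
        "data_types": ["pii", "customer_confidential"],
        "retention_days": None,       # UNKNOWN -- confirm with the vendor
        "masked": False,
        "third_party": "llm_vendor",
    },
    "vector_db.chunks": {
        "data_types": ["customer_confidential"],
        "retention_days": 365,
        "masked": False,
        "third_party": "vector_db_saas",
    },
}

def unknown_retention(data_map):
    """Return the pipeline steps whose retention is still unverified."""
    return [step for step, p in data_map.items() if p["retention_days"] is None]
```

A table like this makes the gaps queryable: `unknown_retention(LLM_DATA_MAP)` is exactly the list of questions to take to your vendors.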
Where teams get burned (failure modes + anti-patterns)
Some anonymised patterns seen in production LLM deployments:
Failure mode 1: “It’s just an API call”
Pattern:
- Team calls a hosted LLM API directly from backend.
- Leaves vendor logging on by default.
- Sends user identifiers + raw content (chat logs, tickets, code) verbatim.
Later:
- Privacy team discovers that the vendor retained all prompts for 30+ days.
- Some prompts contained regulated or contractual data that was never supposed to leave the org or region.
Root cause: treating the LLM like a stateless math function, not a service with its own data lifecycle.
Failure mode 2: RAG as a data dumpster
Pattern:
- Team stands up a RAG system.
- Dumps entire S3 bucket / internal Confluence into a vector DB.
- No filtering on:
- Access level
- Data type (e.g., HR docs, legal, secrets)
- Data subject (customers vs internal)
Later:
- A junior support agent asks the assistant about a customer issue.
- The model surfaces snippets from internal compliance investigations and other customers’ tickets.
- Now you have cross-tenant data leakage via retrieval.
Root cause: indexing without authorization-aware filtering and classification.
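The fix is to filter retrieved chunks against the caller's authorization before anything reaches the prompt. A minimal sketch, assuming a simple `Chunk` shape and `visibility` labels rather than any specific vector-DB API:

```python
# Authorization-aware retrieval: run the vector search as usual, then
# drop every chunk the calling user is not allowed to see.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    visibility: str   # "org-wide" | "team" | "user"
    owner_id: str     # the team or user that owns the source document

def authorized(chunk, caller_id, caller_teams):
    if chunk.visibility == "org-wide":
        return True
    if chunk.visibility == "team":
        return chunk.owner_id in caller_teams
    if chunk.visibility == "user":
        return chunk.owner_id == caller_id
    return False      # unknown labels are denied, never defaulted open

def filter_context(chunks, caller_id, caller_teams):
    """Apply after vector search, before prompt assembly."""
    return [c for c in chunks if authorized(c, caller_id, caller_teams)]
```

The important design choice is fail-closed: a chunk with a missing or unrecognized `visibility` label never reaches the model.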
Failure mode 3: “Redaction in the UI is enough”
Pattern:
- UI masks PII (e.g., emails, card numbers) before showing logs to users.
- Backend logs the raw, unredacted prompts.
- LLM observability tool ingests logs and snapshots prompts for analytics.
Later:
- SOC2 auditor asks, “How do you ensure logs don’t contain cardholder data?”
- Logs clearly show full card numbers and addresses in prompts.
Root cause: redaction implemented cosmetically at the presentation layer; no policy-as-code at the logging layer.
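Redaction belongs at the logging layer, where every handler sees it. A minimal sketch using Python's standard `logging.Filter`; the regexes are deliberately simple illustrations and production patterns need more care:

```python
# Redact high-risk fields in every log record before any handler
# writes it -- the UI then has nothing sensitive left to mask.
import logging
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

class RedactionFilter(logging.Filter):
    def filter(self, record):
        msg = record.getMessage()
        msg = CARD_RE.sub("[CARD]", msg)
        msg = EMAIL_RE.sub("[EMAIL]", msg)
        record.msg, record.args = msg, None
        return True

# Attach once, at the root, so every sink inherits it:
# logging.getLogger().addFilter(RedactionFilter())
```

Because the filter rewrites the record itself, downstream observability tools that ingest these logs never see the raw values either.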
Failure mode 4: No traceability from answer to source
Pattern:
- LLM copilot suggests actions in a CRM or incident management system.
- No persistent record of:
- Which context docs were used
- Which tools / APIs were called at the model’s “decision time”
Later:
- A customer asks: “Why did the system send this email with incorrect financial data?”
- You can’t reconstruct:
- What the model saw
- Which records it pulled from
- Whether it hallucinated or RAG misfired
Root cause: missing LLM-specific telemetry and immutable event logs tied to each answer or action.
Practical playbook (what to do in the next 7 days)
You can’t fix everything in a week, but you can radically reduce risk.
Day 1–2: Build the “data map” for your LLM stack
- Inventory all LLM entry points:
- Chatbots, copilots, internal tools, backend batch jobs.
- For each, sketch the 4 flows (prompt, response, context, lifecycle) across the 4 scopes.
- Mark, for each step:
- Data types (PII/PHI/secrets/customer secrets/source code).
- Retention assumptions (if you don’t know, write “UNKNOWN”).
- Third parties involved (LLM vendors, vector DB SaaS, observability tools).
You want one ugly-but-honest diagram you can show to security.
Day 3: Apply a minimal policy matrix
Draft a simple classification → rule matrix for LLM-related data, reusing your existing privacy framework where possible:
Example:
- Public / low sensitivity
  - Retention: follow app default logs.
  - Allowed with third-party LLM logs on.
- Internal-only
  - Retention: 30–90 days max in logs.
  - Third-party LLM logs: off, or masked.
- Customer confidential / contractual
  - Retention: 30 days max; must be encrypted and access logged.
  - Third-party LLM logs: off.
  - Only via vendors with DPAs and data residency alignment.
- Regulated (PII/PHI/PCI)
  - Retention: strict minimum, ideally none outside primary data system.
  - Third-party LLM logs: off, or use in-house/self-hosted models.
  - Require explicit legal & security review.
You’re not aiming for perfection; you’re creating a baseline you can encode later as policy-as-code.
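The matrix itself can be the first thing you encode. A sketch, assuming the classification labels and limits from the draft above; adjust the names to your own framework:

```python
# The Day 3 matrix as data a code review can diff.
POLICY_MATRIX = {
    "public": {
        "max_log_retention_days": None,   # follow app default logs
        "vendor_logging_allowed": True,
    },
    "internal": {
        "max_log_retention_days": 90,
        "vendor_logging_allowed": False,  # or masked
    },
    "customer_confidential": {
        "max_log_retention_days": 30,
        "vendor_logging_allowed": False,
        "requires_dpa": True,
    },
    "regulated": {
        "max_log_retention_days": 0,      # nothing outside the primary system
        "vendor_logging_allowed": False,
        "requires_review": True,
    },
}

def policy_for(classification):
    """Fail closed: unknown classifications get the strictest rules."""
    return POLICY_MATRIX.get(classification, POLICY_MATRIX["regulated"])
```

The lookup's default is the point: anything your pipeline cannot classify is treated as regulated until someone proves otherwise.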
Day 4–5: Change configs, not slides
Implement concrete, reversible changes:
- Turn off vendor logging where not absolutely necessary.
- Truncate or hash user identifiers in:
- Prompts
- Telemetry
- LLM traces
- Mask obvious high-risk fields before LLM boundary:
- Emails, phone numbers, card numbers, national IDs.
- On your vector DB:
- Remove obviously out-of-bounds collections (HR, legal investigations, secrets).
- Add per-document ACLs and filter results by user auth where possible.
These changes alone usually knock out 60–80% of the practical privacy risk.
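For the "truncate or hash user identifiers" step, a keyed HMAC works better than a plain hash: traces stay joinable internally, but the vendor never sees the raw ID and third parties cannot brute-force it without the key. A sketch; the key handling is a placeholder, not a recommendation:

```python
# Pseudonymize stable user IDs before anything crosses the LLM boundary.
import hashlib
import hmac

# Placeholder: load from a secret store in practice, and plan for rotation.
PSEUDONYM_KEY = b"rotate-me-from-a-secret-store"

def pseudonymize(user_id, length=12):
    """Deterministic keyed digest: same input -> same token."""
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return f"u_{digest[:length]}"
```

Because the mapping is deterministic under one key, you can still correlate all traces for a user during an incident, yet rotating the key severs that link if the pseudonyms ever leak.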
Day 6: Add minimal auditability
For each LLM feature, start capturing LLM-specific audit events:
- `user_id`
- `session_id` (or correlation ID)
- `timestamp`
- `model_name` and provider
- `context_sources` (IDs of documents/tools used)
- `actions_taken` (e.g., “sent email”, “updated ticket”, “drafted contract section”)
You do not need full prompts in audit logs; capture metadata and document IDs instead. Store this in the same place you keep other security-relevant logs (or at least in a queryable store with retention controls).
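A minimal sketch of such an event writer: metadata and document IDs only, never raw prompts, appended as JSON lines to whatever sink you already treat as security-relevant. Field names follow the list above; the sink shape is an assumption.

```python
# Append-only audit events for each LLM answer or action.
import json
import time
import uuid

def record_llm_audit_event(sink, *, user_id, session_id, model_name,
                           provider, context_sources, actions_taken):
    """Write one JSON line per answer; sink is anything with .write()."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "session_id": session_id,
        "model_name": model_name,
        "provider": provider,
        "context_sources": context_sources,  # document/tool IDs, not content
        "actions_taken": actions_taken,
    }
    sink.write(json.dumps(event) + "\n")
    return event
```

With this in place, "which customer data did the assistant use to answer this user?" becomes a query over `context_sources` instead of a shrug.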
Day 7: Define “policy-as-code v0”
Take your matrix and codify two or three concrete rules:
Examples:
- Rule 1:
  Any call to `callLLM()` must pass `data_classification` and `user_region`.
  - If `data_classification` in {regulated, customer_confidential} → `vendor_logging=false`.
  - If `user_region == EU` → use EU-hosted or EU-capable model endpoint.
- Rule 2:
  Any document ingested into RAG must have `owner_id` and `visibility` (e.g., `org-wide`, `team`, `user`).
  Retrieval enforces `visibility` against the caller’s auth context.
- Rule 3:
  Logs that include any `llm_context_source` are stored with max 30-day retention and are excluded from non-production log sinks.
Implement as:
- Middleware in your LLM client/orchestrator.
- Static analysis or CI checks on LLM integration code.
- Runtime guards in your API gateway.
Policy-as-code here is basic: central choke points + explicit parameters + automated checks.
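As a concrete example, Rule 1 as a middleware choke point might look like the sketch below. The endpoint names and parameter shapes are assumptions about a hypothetical orchestrator, not a specific SDK:

```python
# One function every LLM call must pass through before the client fires.
SENSITIVE_CLASSES = {"regulated", "customer_confidential"}

def resolve_call_policy(data_classification, user_region):
    """Rule 1: classification and region are mandatory, not optional hints."""
    if data_classification is None or user_region is None:
        raise ValueError("data_classification and user_region are mandatory")
    return {
        "vendor_logging": data_classification not in SENSITIVE_CLASSES,
        "endpoint": ("eu-llm-endpoint" if user_region == "EU"
                     else "default-llm-endpoint"),
    }
```

Calls that omit the parameters fail loudly at the choke point, which is exactly the behavior a CI check or API-gateway guard can then assert on.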
Bottom line
LLMs didn’t magically blow up privacy and governance. They amplified problems you already had:
- Unstructured, under-classified data everywhere.
- Third-party services with default-on logging.
- Weak traceability from system behavior back to data and policy.
The difference now is:
- Prompts and embeddings carry some of your most sensitive data.
- The blast radius of a bad choice is multi-tenant and hard to see.
- Auditors and customers are starting to ask specific questions about “AI features,” not just generic security.
If you treat your LLM stack as:
- A first-class data system,
- With explicit classification, retention, and access rules,
- Enforced via policy-as-code, not Word docs,
you can make meaningful progress in a week and be in good shape for SOC2/ISO-style scrutiny.
Ignore it, and your “AI assistant” will quietly become the most dangerous data product in your company. Not because of the model, but because of the way you wired your data into it.
