Your LLM Strategy Is Not SOC2‑Ready (Yet)
Why this matters right now
Most organizations already have shadow LLM usage.
If you haven’t seen it yet, look closer:
- Engineers are pasting logs and stack traces into public chatbots.
- Analysts are dropping CSVs into “AI assistants”.
- Product is experimenting with SaaS copilots wired into internal tools.
You can pretend this isn’t happening, or you can treat it like any other high‑risk data processor:
- What is sent?
- Where does it go?
- How long is it stored?
- Who can see it?
- How do we prove that to an auditor?
LLM security, privacy, and governance problems are not abstract:
- Training data leaks can violate DPAs and NDAs.
- Prompt logs can contain secrets and PII that never got threat‑modeled.
- “Temporary logs” turn into long‑lived vendor data stores.
- SOC2 / ISO 27001 controls get quietly bypassed because “it’s just experimentation”.
If your security program is mature for web apps and data warehouses but you treat LLMs like a toy, you’ve just created a parallel, unmanaged data plane.
This post is about how to bring LLM usage back under the same discipline you already apply to any high‑risk SaaS or data processor.
What’s actually changed (not the press release)
Three things have meaningfully shifted in the last 12–18 months:
1. LLM traffic is now infrastructure, not just research
- LLM calls are in production paths: customer support, code review, fraud detection, internal search.
- That makes LLM providers de facto critical vendors and data processors.
- Their telemetry and logging (prompts, completions, embeddings, feedback) now hold sensitive content.
2. Model risk is no longer just “hallucinations”
Traditional model risk for ML was about bias, drift, and performance. With LLMs you now have:
- Prompt injection leading to data exfiltration.
- Cross‑tenant inference risk when fine‑tuning or using shared context.
- Training data feedback loops (your data used to train a frontier model).
- Unbounded input surface: prompts can contain arbitrary content and code.
It looks much more like supply-chain risk than classical ML risk.
3. Vendors are racing ahead of your policies
- Vendors toggle “no training on your data” flags, change retention defaults, and ship features like “chat history” without enterprise‑grade defaults.
- “Enterprise” pricing tiers often come with better retention and isolation, but:
- Not everyone is on those tiers.
- Shadow usage is almost never on those tiers.
Your policy deck from 2021 doesn’t describe:
- Fine‑tuning on SaaS infrastructure.
- Cross‑region inference.
- LLM-based “AI assistants” embedded into every third‑party tool.
How it works (simple mental model)
Use this mental model: LLM usage is a new data processor with four planes.
1. Input Plane (what you send)
- Prompts, documents, logs, database rows, user messages.
- Derived artifacts: embeddings, feature vectors, tool parameters.
Key questions:
- Does this contain PII, PHI, PCI, or secrets?
- Is there a classification label on it?
- Is it scoped to a tenant / customer?
2. Processing Plane (how it’s handled)
- Inference only vs. fine‑tuning vs. RAG (retrieval‑augmented generation).
- Where the model runs: vendor cloud, your VPC, on‑prem.
- Controls applied: content filters, policy enforcement, redaction.
Key questions:
- Is processing stateless or is state reused across requests?
- Are prompts persisted to improve the service?
- Can the model or its tooling call out to the internet or internal services?
3. Persistence Plane (what gets stored, where, and for how long)
- Prompt/response logs for debugging and quality.
- Vector stores / indexes.
- Fine‑tuned model artifacts or adapters.
- Analytics events.
Key questions:
- Who owns the storage? You vs. vendor.
- Default retention periods and deletion guarantees.
- Encryption at rest and key management (your KMS or theirs).
4. Governance Plane (how you control and prove behavior)
- Policies: which data can be sent, which models are allowed.
- Enforcement: gateways, SDKs, middleware, DLP.
- Visibility: audit logs, access logs, model spending reports.
- Compliance: SOC2, ISO 27001, GDPR, HIPAA mappings.
Key questions:
- Can you express policy as code and enforce centrally?
- Can you show an auditor: “here’s who sent what, where, and when”?
If you don’t explicitly design all four planes, developers will fill in the gaps:
- Inputs: “Just send the whole object.”
- Processing: “Use the default model from the docs.”
- Persistence: “Let the vendor keep logs, so we can debug.”
- Governance: “We’ll add it to the risk register later.”
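A lightweight way to make the four planes concrete is to require every LLM call to carry explicit metadata for each plane. A minimal sketch in Python (all field names are illustrative, not tied to any vendor SDK):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMRequest:
    # Input plane: what is being sent, and how sensitive it is
    prompt: str
    classification: str           # e.g. "public", "internal", "pii"
    tenant_id: str
    # Processing plane: where and how it runs
    vendor: str
    model: str
    allow_training: bool = False  # explicit, never a vendor default
    # Persistence plane: what may be stored, and for how long
    store_prompt: bool = False
    retention_days: int = 30
    # Governance plane: who is accountable for this call
    caller: str = "unknown"
    environment: str = "dev"

req = LLMRequest(
    prompt="Summarize this ticket",
    classification="internal",
    tenant_id="acme",
    vendor="openai",
    model="gpt-4o",
    caller="support-service",
    environment="prod",
)
```

Once every call site has to construct such an object, the governance plane has a single choke point to inspect instead of gaps for developers to fill in.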
Where teams get burned (failure modes + anti-patterns)
Here are the recurring failure patterns from real‑world deployments.
1. “Free tier now, enterprise later”
Pattern:
- Teams prototype with free/public endpoints.
- Prompts contain production data during “experiments”.
- Later, the company signs an enterprise DPA and assumes they’re clean.
Problems:
- Early data may already be in vendor training corpus or logs with longer retention.
- No audit trail of who sent what.
- Hard to answer due diligence questions: “Has any customer data ever been sent to non‑enterprise endpoints?”
Mitigation:
- From day zero, route all LLM traffic through:
- A single enterprise tenant with contractual guarantees, or
- A self‑hosted / VPC deployment.
- Block public endpoints at network and proxy layers.
2. “Chat history as source of truth”
Pattern:
- Product teams rely on built‑in “chat history” for UX.
- That history lives in the vendor’s multi‑tenant infrastructure.
- No data classification, no per‑tenant logical deletion.
Problems:
- Hard to satisfy “right to be forgotten” and strict data retention policies.
- Support chats may contain secrets, PII, and internal notes permanently stored.
- Auditors ask: “Where is this data? How do you delete it?”
Mitigation:
- Treat vendor chat history as disabled by default.
- Implement your own history store:
- In your DB, tied to tenants.
- Under your existing data retention and deletion policies.
3. Unbounded logging in observability systems
Pattern:
- Engineers log full prompts and responses into:
- Application logs
- Tracing systems
- Central log aggregators
Problems:
- These systems were never intended for unredacted PII and secrets.
- Logs often have long retention (years) without easy per‑record deletion.
- Violates internal policy and external data protection obligations.
Mitigation:
- Introduce structured LLM logging:
- Store only hashes, identifiers, or truncated/filtered content.
- Apply the same masking / redaction as you do for API payloads.
- Configure separate retention for LLM logs.
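A structured-logging helper along these lines might look like the following sketch (the record shape and the 64-character preview limit are assumptions to adapt to your own logging pipeline):

```python
import hashlib

MAX_LOGGED_CHARS = 64

def llm_log_record(prompt: str, response: str, request_id: str) -> dict:
    """Build a log record that never stores raw prompt/response content."""
    return {
        "request_id": request_id,
        # Hashes let you correlate duplicate content without keeping it
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
        # A short truncated preview is often enough for debugging;
        # run it through your existing masking first if it may hold PII
        "prompt_preview": prompt[:MAX_LOGGED_CHARS],
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }

record = llm_log_record("Summarize ticket #123", "Ticket summary ...", "req-42")
```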
4. Vector stores becoming shadow data warehouses
Pattern:
- Teams spin up vector DBs and dump entire tables or document sets:
- Contracts, tickets, support chat, internal docs.
- Little thought to:
- Row‑level security
- Multi‑tenant isolation
- Data minimization
Problems:
- Vector DB now holds the most sensitive slices of your business.
- Many vector DBs are young products, less battle‑tested against hard security requirements.
- Access patterns are often looser than main DB (e.g., “just give the app full access”).
Mitigation:
- Treat vector DBs as regulated data stores:
- Per‑tenant indexes or namespaces.
- Strict authz, encryption, and monitoring.
- Minimize embeddings:
- Chunking + selective fields, not raw documents with PII.
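Data minimization here can be as simple as an explicit field allow-list applied before anything is embedded. A sketch (the field names are hypothetical):

```python
def embed_payload(ticket: dict) -> str:
    """Select only the fields worth embedding; drop PII-bearing ones."""
    # Allow-list, not deny-list: new fields stay out until reviewed
    allowed = ("title", "product_area", "resolution_notes")
    parts = [f"{k}: {ticket[k]}" for k in allowed if k in ticket]
    return "\n".join(parts)

ticket = {
    "title": "Login fails after password reset",
    "customer_email": "jane@example.com",   # never embedded
    "product_area": "auth",
    "resolution_notes": "Cleared stale session cache.",
}
text = embed_payload(ticket)
```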
5. Governance by wiki page
Pattern:
- Company publishes an “AI usage policy” in a wiki.
- No technical enforcement, no central routing, no audits.
Problems:
- Developers ignore it under delivery pressure.
- Shadow usage via plugins, SaaS copilots, and browser tools explodes.
- Compliance evidence is non‑existent.
Mitigation:
- Move to policy‑as‑code:
- Gate LLM usage through a shared service / SDK / proxy.
- Implement model and data access rules as code (e.g., OPA/Rego, custom middleware).
- Provide approved building blocks (libraries, endpoints) so “doing the right thing” is the easiest path.
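A first cut at policy-as-code does not need OPA on day one; even a reviewable Python mapping enforced in a shared gateway gets you most of the value. A sketch (the classification names and rules are illustrative):

```python
# Policy data lives in one reviewable place (could equally be OPA/Rego)
POLICY = {
    "public":   {"vendors": {"openai", "anthropic"}, "vendor_logging": True},
    "internal": {"vendors": {"openai"},              "vendor_logging": True},
    "pii":      {"vendors": {"openai"},              "vendor_logging": False},
}

def allowed(classification: str, vendor: str, vendor_logging: bool) -> bool:
    """Return True only if this request is permitted by policy."""
    rule = POLICY.get(classification)
    if rule is None:
        return False  # unknown classification: fail closed
    if vendor not in rule["vendors"]:
        return False
    if vendor_logging and not rule["vendor_logging"]:
        return False
    return True
```

Because the policy is data, changes to it go through code review like any other security-relevant change.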
Practical playbook (what to do in the next 7 days)
Assume you’re a security‑minded engineer, tech lead, or CTO. Here’s a concrete 7‑day plan.
Day 1–2: Inventory and traffic stop
1. Discover existing LLM usage
- Code search: `openai`, `gpt-`, `anthropic`, `vertexai`, `llm`, `chatCompletion`.
- Proxy / egress logs: look for requests to known model vendors.
- Browser extension review for high‑risk teams (support, sales, eng).
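The code search can be automated with a small script; a rough sketch (the pattern list and file extensions are starting points, not a complete inventory):

```python
import re
from pathlib import Path

# Strings that suggest an LLM integration; extend for your vendors
PATTERNS = re.compile(
    r"openai|gpt-|anthropic|vertexai|chatCompletion|\bllm\b", re.IGNORECASE
)

def find_llm_usage(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for every suspicious match."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix not in {".py", ".js", ".ts", ".go", ".java", ".rb"}:
            continue
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for n, line in enumerate(lines, 1):
            if PATTERNS.search(line):
                hits.append((str(path), n, line.strip()))
    return hits
```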
2. Classify usage
For each integration, capture:
- Data types involved (PII/PHI/PCI/secrets/internal).
- Environment (prod/stage/dev).
- Vendors and models used.
- Retention / training settings if known.
3. Impose a gentle change freeze
- Communicate: “No new external LLM integrations this week without review.”
- Not a ban; just pause the blast radius while you get control.
Day 3: Minimum viable guardrails
1. Choose a default pattern: central gateway or blessed SDK
- If your org is small: a minimal internal SDK that:
- Wraps vendor calls.
- Enforces model allow‑list.
- Sets `data_processing=false` / “no training” / minimum retention.
- If you’re bigger / have a platform team: deploy a central LLM proxy:
- All LLM traffic goes through it.
- Policy enforcement happens there.
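A blessed SDK can be very small and still enforce the defaults that matter. A sketch (`store` and `allow_training` are illustrative parameter names, not a real vendor API; the actual vendor call is left as a stub):

```python
ALLOWED_MODELS = {"gpt-4o", "claude-3-5-sonnet"}  # the blessed list

class ModelNotAllowed(Exception):
    pass

def _vendor_call(params: dict) -> str:
    raise NotImplementedError("wire up your actual vendor client here")

def chat(model: str, prompt: str, _send=None) -> str:
    """The one blessed entry point for LLM calls inside the org."""
    if model not in ALLOWED_MODELS:
        raise ModelNotAllowed(f"{model} is not on the approved list")
    # Enforced defaults the caller cannot forget or override:
    params = {
        "model": model,
        "prompt": prompt,
        "store": False,          # no vendor-side persistence
        "allow_training": False, # never train on our data
    }
    send = _send or _vendor_call  # real vendor client in production
    return send(params)
```

Because every call funnels through `chat`, a later policy or redaction step has exactly one place to hook in.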
2. Set default security & privacy controls
At this stage, be opinionated and simple:
- Only 1–2 approved providers and 2–3 approved models.
- Explicit configuration for:
- No training on your data.
- Minimal logging at vendor.
- Region pinning if needed for data residency.
3. Turn off high‑risk vendor features where possible
- Chat history, “improve our models with your data”, broad “AI assistant” ingest.
- Document what you’ve disabled and why.
Day 4–5: Policy‑as‑code for LLMs
1. Define a small initial policy surface
Express rules like:
- Data classification to model mapping:
- Public data → any approved model.
- Internal data → approved vendors with enterprise DPAs.
- PII/PHI/PCI → allowed only via specific models + no vendor logging.
- Environment constraints:
- Dev/stage can’t call production‑only models with real customer data.
- Usage caps:
- Per‑service and per‑user rate and spend limits.
2. Enforce in the gateway/SDK
Concretely:
- Inspect request metadata (classification, tenant, env).
- Block or route based on policy.
- Attach a unique request ID for audit.
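The enforcement step itself can be a short routing function; a sketch (the classification/environment-to-model mapping is illustrative):

```python
import uuid

# Which model a request may reach, keyed by (classification, environment)
APPROVED = {
    ("pii", "prod"): "private-vpc-model",
    ("internal", "prod"): "enterprise-model",
}

def route(classification: str, env: str, payload: dict) -> dict:
    """Decide allow/block, and stamp an audit ID on every request."""
    request_id = str(uuid.uuid4())
    target = APPROVED.get((classification, env))
    if target is None:
        # No approved path for this combination: fail closed
        return {"request_id": request_id, "decision": "block"}
    return {"request_id": request_id, "decision": "allow",
            "model": target, "payload": payload}

decision = route("pii", "prod", {"prompt": "..."})
```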
3. Hook in lightweight redaction
- Before sending to vendor, apply:
- Regex‑based secret scanning (API keys, tokens).
- Basic PII masking where feasible.
- Log redaction events (with hashes, not raw content) for later improvement.
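A minimal redaction pass might look like this sketch (the patterns are deliberately rough; tune them to your real token and PII formats):

```python
import re
import hashlib

# Rough patterns; extend for your actual secret and PII formats
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),      # API-key-like strings
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email-like strings
]

def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace likely secrets/PII; return redacted text plus event hashes."""
    events = []

    def _mask(match):
        # Log a hash of what was caught, never the raw content
        events.append(hashlib.sha256(match.group().encode()).hexdigest())
        return "[REDACTED]"

    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub(_mask, prompt)
    return prompt, events

clean, events = redact("key sk-abcdefghijklmnopqrstu, mail jane@example.com")
```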
Day 6: Auditing and logging
1. Define what you audit
For each LLM request, log (internally):
- Timestamp
- Caller (service, user, role)
- Tenant / customer ID
- Data classification
- Vendor + model
- Whether redaction was applied
- Whether blocked/allowed by policy
- Token counts / cost (for cost governance)
Do not log full prompts/responses by default.
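An audit event built from exactly these fields, and nothing more, might look like this sketch:

```python
import time

def audit_record(*, caller: str, tenant: str, classification: str,
                 vendor: str, model: str, redacted: bool,
                 allowed: bool, tokens: int, cost_usd: float) -> dict:
    """One audit event per LLM request; no prompt or response text."""
    return {
        "ts": time.time(),
        "caller": caller,
        "tenant": tenant,
        "classification": classification,
        "vendor": vendor,
        "model": model,
        "redaction_applied": redacted,
        "policy_decision": "allow" if allowed else "block",
        "tokens": tokens,
        "cost_usd": cost_usd,
    }

event = audit_record(caller="support-service", tenant="acme",
                     classification="internal", vendor="openai",
                     model="gpt-4o", redacted=True, allowed=True,
                     tokens=812, cost_usd=0.004)
```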
2. Wire logs into your existing security stack
- SIEM alerts for:
- New vendors / models being called.
- High‑sensitivity data classifications hitting non‑approved paths.
- Sudden usage spikes by a service or user.
Day 7: Compliance alignment and docs
1. Map to SOC2 / ISO 27001 controls
At minimum, document how LLM usage maps to:
- Access control (who can configure models and vendors).
- Change management (how model changes are reviewed/deployed).
- Vendor management (DPA, security review of LLM providers).
- Data retention and deletion (for logs, vector stores, fine‑tunes).
- Incident response (how you’d investigate an LLM‑driven data leak).
2. Write a 2‑page “LLM Data Handling Standard”
Keep it short and concrete for engineers:
- Approved vendors and models.
- What data may / may not be sent.
- Required integration pattern (gateway/SDK).
- How to request exceptions.
- Links to code examples.
Then iterate. You will evolve this like any other security control, but in one week you’ve moved from “we have no idea what’s going on” to “we have a governed data plane with auditability.”
Bottom line
LLMs introduce new data flows more than they introduce new math.
If you already care about:
- Privacy and data retention
- Vendor risk and model risk
- SOC2 / ISO 27001 alignment
- Policy‑as‑code and reproducible security posture
…then you already know what to do. The problem is not inventing new frameworks; it’s applying your existing discipline to a new set of infrastructure.
Key points to internalize:
- Treat LLM providers as high‑risk data processors from day one.
- Route all usage through a governed plane (gateway or SDK), not ad‑hoc calls.
- Minimize what you send, and be explicit about what gets stored, where, and for how long.
- Make policies executable, not just documented.
- Design for auditability early; retrofitting it later is painful.
Your future audit finding will not say “you used AI”. It will say:
You transmitted high‑sensitivity data to external processors without adequate controls, visibility, or retention policies.
You can fix that now, before your LLM stack becomes the least governed part of your security program.
