Your AI Stack Is Now Your Attack Surface


Why this matters right now

You don’t have “an AI feature” anymore. You have:

  • A zoo of model runtimes (hosted, self-managed, embedded)
  • Secrets sprawled across notebooks, CI, and prompt templates
  • Third‑party AI APIs plugged directly into production workflows
  • Auto-remediation bots with write access to your infra

This is a new security perimeter, and most orgs are treating it like a demo environment.

Specific changes that make cybersecurity by design non-optional for AI systems:

  • Identity is weird now

    • Fine-grained access to models, features, and vector stores is almost always an afterthought.
    • “The model did it” is becoming a real problem when models can mutate infra, tickets, and user data.
  • Secrets handling is worse, not better

    • API keys in prompts, env vars in notebooks, model weights in S3 with broad access.
    • “Temporary” dev shortcuts end up in production: hard-coded tokens, shared accounts, no rotation.
  • Cloud security posture is drifting faster

    • New GPU clusters, ephemeral workloads, and data copies for training/testing lead to IAM sprawl and S3/RDS sprawl.
    • Data scientists and ML engineers often operate outside the usual infra guardrails.
  • Supply chain = model + data + code

    • You now depend on model hubs, pre-trained weights, and fine‑tuning pipelines in addition to libraries and containers.
    • Attacks look like: poisoned datasets, malicious model artifacts, compromised embeddings libraries.
  • Incident response is lagging reality

    • Runbooks rarely cover “model exfiltrates secrets via prompt injection” or “fine-tuning job leaked PII.”
    • Security teams can’t reason about model behavior with the same tools they use for infra.

If you’re running AI in production, you already have these risks. The question is whether you can observe, limit, and recover from them.


What’s actually changed (not the press release)

Ignore the generic “AI will change everything” noise. From a practitioner’s perspective, these are the concrete shifts.

1. Models have power, not just access

Previously:

  • Services read/write data within clearly bounded apps.
  • “Smart” components were advisory (e.g., recommendations) with limited side effects.

Now:

  • Agents and LLMs can:
    • Execute tools that call internal APIs.
    • Modify tickets, configs, dashboards, code.
    • Orchestrate other services.

Net effect: Logical decisions (what to do) are increasingly delegated to components that are probabilistic and hard to audit.

2. Your blast radius extends into third-party AI providers

You’re shipping:

  • Chatbots that send user inputs (including PII) to external LLM APIs.
  • RAG systems that fetch internal docs and pass them verbatim to hosted models.
  • Monitoring or support tools that auto-summarize incidents with production logs.

Net effect: Data exfiltration risk is higher, and your trust boundary now spans vendors’ security posture and retention policies.

3. Data duplication and drift are normal

MLOps reality:

  • Same dataset in:
    • Raw lake
    • Feature store
    • Training bucket
    • Fine-tuning bucket
    • Notebook exports
  • Feature variants and embeddings are often less governed than core tables.

Net effect: Data classification and access controls are weaker exactly where the most sensitive data often ends up.

4. Supply chain now includes models and datasets

Previously:

  • Main concerns: dependencies, containers, CI artifacts.

Now additionally:

  • Models downloaded from public hubs.
  • Fine-tuning jobs on user-generated content.
  • Data pipelines that pull from semi-trusted sources.

Net effect: You can ship a compromised model or poisoned dataset even if your application code and containers are clean.


How it works (simple mental model)

A workable mental model for “cybersecurity by design” in AI systems:

Four planes of control: identity, data, execution, and feedback.
Make each explicit, minimal, and observable.

1. Identity plane: who is allowed to do what?

Think beyond “users and services.”

You now have:

  • Human users (end-users, ops, DS/ML, admins)
  • Machine identities (services, jobs, agents, notebooks)
  • External providers (LLM APIs, hosted vector DBs)
  • Derived identities (tenants, projects, experiments)

Design principles:

  • One identity per boundary-crossing actor
    • Each model-serving endpoint, training job, and agent gets its own identity.
  • RBAC/ABAC for AI resources
    • Permissions for:
      • Which datasets can be read.
      • Which tools/APIs can be called.
      • Which tenants can be accessed.

2. Data plane: what data lives where?

You need to explicitly model:

  • Data classes (public, internal, sensitive, regulated)
  • Data states (raw, feature, embedding, training, logs)
  • Data flows (source → transform → model → logs/metrics)

Design principles:

  • Classification & tagging, at least for:
    • Training datasets
    • Embedding corpora
    • Model I/O logs
  • Policy per class:
    • Where it can be stored.
    • Who/what can read/write.
    • Whether it can leave your VPC / region.
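A policy-per-class lookup can be as simple as a table consulted before every write or egress. This is a sketch under assumed class names and sinks; the `POLICY` table and function names are illustrative, not a real governance tool.

```python
# Hypothetical policy table: for each data class, which sinks it may be
# written to and whether it may leave the VPC/region.
POLICY = {
    "public":    {"sinks": {"s3", "vector-db", "llm-api"}, "egress": True},
    "internal":  {"sinks": {"s3", "vector-db"},            "egress": False},
    "sensitive": {"sinks": {"s3"},                         "egress": False},
    "regulated": {"sinks": set(),                          "egress": False},
}

def may_write(data_class: str, sink: str) -> bool:
    """Unknown classes fail closed."""
    policy = POLICY.get(data_class)
    return policy is not None and sink in policy["sinks"]

def may_leave_vpc(data_class: str) -> bool:
    return POLICY.get(data_class, {"egress": False})["egress"]

assert may_write("internal", "vector-db")
assert not may_write("sensitive", "llm-api")  # sensitive data never hits the hosted model
assert not may_leave_vpc("regulated")
```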

3. Execution plane: how and where code/models run

The execution surface includes:

  • Model servers (containers, serverless, managed)
  • Training/fine-tuning jobs
  • Batch inference jobs
  • Agent frameworks invoking tools

Design principles:

  • Least privilege by execution context
    • Training job identity != serving identity.
    • Agents have constrained toolsets.
  • Guardrails as code, not conventions:
    • Allowed tools, rate limits, output filters enforced in code.
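"Guardrails as code, not conventions" means the allowlist, rate limit, and output filter run regardless of what the prompt says. A minimal sketch, assuming a hypothetical `GuardedToolRunner` wrapper; the secret pattern is illustrative, not a complete detector.

```python
import re
import time
from collections import deque

class GuardedToolRunner:
    """Enforces a tool allowlist, a per-minute rate limit, and an output
    filter in code. All names here are illustrative."""

    def __init__(self, allowed_tools, max_calls_per_minute=10):
        self.allowed = set(allowed_tools)
        self.max_calls = max_calls_per_minute
        self.calls = deque()  # timestamps of recent calls
        # toy pattern for API-key-like strings; real filters are broader
        self.secret_pattern = re.compile(r"(sk-|AKIA)[A-Za-z0-9]{8,}")

    def run(self, tool_name, tool_fn, *args):
        if tool_name not in self.allowed:
            raise PermissionError(f"tool {tool_name!r} not in allowlist")
        now = time.monotonic()
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()        # drop timestamps older than a minute
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("rate limit exceeded")
        self.calls.append(now)
        out = str(tool_fn(*args))
        return self.secret_pattern.sub("[REDACTED]", out)  # output filter

runner = GuardedToolRunner(allowed_tools={"search_docs"})
result = runner.run("search_docs", lambda q: f"results for {q}", "vpn setup")
# A write tool that was never allowlisted fails closed:
try:
    runner.run("delete_config", lambda: "gone")
except PermissionError:
    pass
```

Because the checks live in the wrapper rather than the prompt, a prompt-injected "ignore previous instructions" cannot widen the toolset.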

4. Feedback plane: what you learn and how it feeds back

Modern AI systems are often self-referential:

  • Logs and user feedback are used to tune prompts, models, or guardrails.
  • Incidents trigger policy updates.

Design principles:

  • Make behavior observable:
    • Capture model/tool call traces with actor, inputs, outputs, and decisions.
  • Close the loop:
    • Incidents → updated data access policies.
    • Misuse → updated agent tools/permissions.
    • Drift → updated guardrails.

This four-plane model lets you reason systematically instead of treating AI features as “just another microservice.”


Where teams get burned (failure modes + anti-patterns)

Failure mode 1: “It’s internal, so it’s safe”

Pattern:

  • Internal RAG system built for employees.
  • Connects to wide-open document storage.
  • Exposes a chatbot that can answer “anything.”

What goes wrong:

  • Sensitive HR/legal/exec docs become discoverable by anyone with chat access.
  • No per-tenant or per-role filtering at retrieval time.

Mitigation:

  • Apply row/document-level security at the retrieval layer.
  • Include identity in the query to the vector store (not just in the app).
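Document-level security at the retrieval layer can be sketched like this: the caller's roles travel with the query, and the ACL filter runs inside retrieval so the app cannot forget it. `Doc`, `search`, and the ACL field are illustrative, not a real vector-store API.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    acl: set  # roles allowed to see this document

# Toy corpus; in practice the ACL lives alongside the embedding metadata.
CORPUS = [
    Doc("VPN setup guide", acl={"employee"}),
    Doc("Q3 layoff plan", acl={"exec", "hr"}),
]

def search(query: str, roles: set) -> list[str]:
    # A real system runs similarity search here; the point is that the
    # ACL intersection happens in the same layer as retrieval.
    return [d.text for d in CORPUS
            if query.lower() in d.text.lower() and d.acl & roles]

assert search("plan", {"employee"}) == []            # filtered at retrieval
assert search("plan", {"hr"}) == ["Q3 layoff plan"]  # authorized role sees it
```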

Failure mode 2: Secrets leakage via prompts and logs

Pattern:

  • API keys or internal URLs embedded directly into prompts (“call this endpoint with this token”).
  • Model I/O logging dumps entire prompts and responses into a central log store.

What goes wrong:

  • Logs become a goldmine of secrets.
  • A compromised log system or LLM training loop leaks access to prod systems.

Mitigation:

  • Structured logging: separate sensitive fields; avoid raw prompts with injected secrets.
  • Use short-lived, scoped tokens passed alongside the request, not embedded in the prompt.
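A defense-in-depth sketch for the logging side: even if a secret slips into a prompt, it is redacted before the log entry is written. The patterns below are illustrative examples, not a complete secret scanner.

```python
import re

# Toy patterns for token-like strings; real scanners use broader rule sets.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),           # API-key-like tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._-]{16,}"),  # bearer tokens
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def log_model_call(prompt: str, response: str) -> dict:
    """Structured log entry with secrets scrubbed from both directions."""
    return {"prompt": redact(prompt), "response": redact(response)}

entry = log_model_call("Call the API with sk-abcdefgh12345678 please", "done")
assert entry["prompt"] == "Call the API with [REDACTED] please"
```

Redaction is the backstop, not the fix: the primary control is keeping the token out of the prompt entirely and passing it as scoped request metadata.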

Failure mode 3: Over-trusting AI agents with tools

Pattern:

  • Agent can create tickets, run queries, and even modify configs.
  • Safety is “the prompt says be careful.”

What goes wrong:

  • Prompt injection leads to unintended actions: ticket spam, config changes, data exports.
  • Hard to prove which actions were legitimate vs. manipulated.

Mitigation:

  • Treat tools as mutating operations with approvals, not free-for-all:
    • Read‑only tools are cheap to expose.
    • Write tools require stricter policies, sometimes human approval.
  • Capture a decision trace: why was this tool invoked, with what context?
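The read/write split plus decision trace can be sketched as a tool registry: read-only tools run freely, mutating tools require an approval, and every invocation is traced. `ToolRegistry` and its fields are illustrative names, not an existing framework.

```python
TRACE = []  # in practice, an access-controlled trace store

class ToolRegistry:
    def __init__(self):
        self.tools = {}  # name -> (fn, mutating)

    def register(self, name, fn, mutating=False):
        self.tools[name] = (fn, mutating)

    def invoke(self, name, *args, approved_by=None, context=""):
        fn, mutating = self.tools[name]
        if mutating and approved_by is None:
            # denied invocations are traced too: that's your audit trail
            TRACE.append({"tool": name, "decision": "denied", "context": context})
            raise PermissionError(f"{name} is mutating and needs approval")
        TRACE.append({"tool": name, "decision": "allowed",
                      "approved_by": approved_by, "context": context})
        return fn(*args)

registry = ToolRegistry()
registry.register("get_ticket", lambda tid: {"id": tid}, mutating=False)
registry.register("close_ticket", lambda tid: "closed", mutating=True)

registry.invoke("get_ticket", 42, context="user asked for status")
try:
    registry.invoke("close_ticket", 42, context="model decided to close")
except PermissionError:
    pass
assert TRACE[-1]["decision"] == "denied"
```

The trace records why each tool ran (the `context`), which is what lets you later distinguish legitimate actions from manipulated ones.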

Failure mode 4: Shadow MLOps outside security guardrails

Pattern:

  • Data scientists spin up ad-hoc clusters, storage buckets, or notebooks.
  • Pull production data snapshots into “temporary” test environments.

What goes wrong:

  • Sensitive data lands in under-protected environments.
  • No central visibility into what models/data exist and where.

Mitigation:

  • Provide paved roads for experimentation:
    • Default secure S3 buckets, managed notebooks, pre-approved model registries.
  • Central asset inventory: models, datasets, jobs, and ownership.

Failure mode 5: Ignoring model and data supply-chain risks

Pattern:

  • Download a popular pre-trained model.
  • Fine-tune on user data.
  • Deploy without scanning or integrity controls.

What goes wrong:

  • Model already contains malicious backdoors or data exfiltration behavior.
  • Poisoned training data produces targeted failures.

Mitigation:

  • Treat models like artifacts:
    • Signed, versioned, scanned.
  • Maintain provenance metadata:
    • Source, hashes, training data lineage, approvals.
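"Treat models like artifacts" starts with verifying a hash against pinned provenance metadata before loading anything. A sketch under assumed names: the `REGISTRY` dict and its fields stand in for a real model registry.

```python
import hashlib

# Hypothetical provenance registry: source, pinned hash, and approval
# recorded when the model was admitted.
REGISTRY = {
    "sentiment-v3": {
        "source": "internal-hub",
        "sha256": hashlib.sha256(b"model-bytes").hexdigest(),
        "approved_by": "ml-platform",
    }
}

def verify_artifact(name: str, artifact: bytes) -> bool:
    """Refuse unknown or tampered artifacts; fail closed."""
    meta = REGISTRY.get(name)
    if meta is None:
        return False
    return hashlib.sha256(artifact).hexdigest() == meta["sha256"]

assert verify_artifact("sentiment-v3", b"model-bytes")
assert not verify_artifact("sentiment-v3", b"tampered-bytes")
assert not verify_artifact("mystery-model", b"whatever")
```

In production this pairs with signing (hash alone proves integrity, not origin), but even the hash check blocks the "someone swapped the weights file" class of attack.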


Practical playbook (what to do in the next 7 days)

You can’t redesign everything in a week, but you can materially reduce risk.

Day 1-2: Inventory and classify

  1. Catalog AI entry points:

    • List all:
      • LLM/ML-backed APIs
      • Internal/external AI apps
      • Agent-like systems with tools
    • For each, capture:
      • Owner
      • What data it touches
      • What it can mutate
  2. Classify data flows:

    • For each AI system:
      • Inputs (user data, logs, documents)
      • Outputs (responses, summaries, actions)
      • External calls (LLM APIs, vectors, third-party tools)
    • Mark sensitive data flows (PII, financial, health, source code).

Day 3: Lock down the obvious holes

  1. Tighten secrets and logs:

    • Check that:
      • No API keys are included in prompts.
      • Model logs exclude or redact secrets, auth tokens, and high-sensitivity fields.
    • Rotate any secrets found in prompts/notebooks/repos.
  2. Restrict external model calls:

    • Add:
      • One network egress point for third‑party AI APIs.
      • Basic allowlist (which services can call which models).
    • Disable or gate any “experiment” endpoints hitting external LLM APIs with production data.

Day 4: Enforce least privilege in one critical path

  1. Pick one high-value AI workflow (e.g., support agent, RAG over company docs):
    • Enforce:
      • Per-user/role filtering in the retrieval layer.
      • Separate identities for:
        • Frontend app
        • Retrieval service
        • Model service
    • Reduce each identity’s permissions to the minimum actually used.

Day 5: Add traceability

  1. Turn on structured, privacy-aware traces for AI calls:
    • Capture:
      • Actor identity (user, service, tenant)
      • Model/endpoint version
      • Tools called and their parameters (redacted where needed)
      • High-level classification of inputs/outputs (not full content for sensitive classes)
    • Route to an observable but access-controlled store.
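A trace record covering those fields might look like the following sketch. Field names and the model version string are illustrative; note that sensitive content is summarized by classification, not logged verbatim.

```python
import json
import time

def build_trace(actor, tenant, model_version, tools, io_class):
    """One structured, privacy-aware record per AI call."""
    return {
        "ts": time.time(),
        "actor": actor,                  # user or service identity
        "tenant": tenant,
        "model_version": model_version,  # pin the exact model/endpoint version
        "tools": tools,                  # calls with redacted parameters
        "io_class": io_class,            # e.g. "internal" -- the class, not the content
    }

record = build_trace(
    actor="svc:support-bot",
    tenant="acme",
    model_version="example-model-2024-06",  # hypothetical version string
    tools=[{"name": "search_docs", "params": "<redacted>"}],
    io_class="internal",
)
line = json.dumps(record)  # route to an access-controlled store
```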

Day 6: Integrate AI into incident response

  1. Extend one incident playbook to cover an AI-specific scenario:
    • Example scenarios:
      • Prompt injection leads to data exfiltration.
      • Misconfigured RAG exposure.
    • Define:
      • How to disable or roll back specific models/agents quickly.
      • How to search logs/traces for blast radius.
      • Who owns decision-making for turning features off.

Day 7: Decide on your default posture

  1. Make two explicit decisions and document them:
    • Default for external AI APIs:
      • “No regulated/sensitive data ever,” or
      • “Allowed only through specific vetted services with DPIAs / contracts,” etc.
    • Default for AI agents with write capabilities:
      • “No direct mutations in prod without human approval,” or
      • “Only for low-risk systems with strict scopes and rate limits.”

Write these down in your engineering handbook or runbook. Ambiguity is where breaches hide.


Bottom line

AI adoption has quietly redefined your security perimeter:

  • Models and agents are new high-privilege actors.
  • Data is copied, transformed, and logged in places your legacy controls don’t cover.
  • Third-party AI providers and model artifacts extend your supply chain risk.

“Cybersecurity by design” for AI isn’t about buying new tools; it’s about:

  • Making identity, data, execution, and feedback planes explicit.
  • Applying boring, well-understood security practices (least privilege, auditability, separation of duties) to these new components.
  • Treating models, datasets, and AI workflows as first-class citizens in your security and incident response programs.

If you ship AI to production, you already have an AI security architecture. The choice is whether it’s accidental or intentional.
