Your VPC Is Not a Security Boundary: Hard Truths About AWS Serverless Security
Why this matters right now
Serverless on AWS has quietly become the default for a lot of new work:
- APIs on API Gateway + Lambda or Lambda + ALB
- Event-driven glue with SQS, SNS, EventBridge
- Data pipelines with Kinesis, Lambda, Step Functions, DynamoDB
- Internal platforms built on top of these building blocks
From a security and cloud engineering perspective, that changes where you’re actually exposed:
- You have more identity and permissions edges (IAM, resource policies, service roles)
- You have less control of the OS and network, but more control of app logic and policies
- Observability and incident response now depend heavily on logs, traces, and config state, not host forensics
At the same time:
- Attackers are actually exploiting misconfigured IAM, public S3, vulnerable Lambda dependencies, and over-permissioned roles.
- Cloud bills are quietly inflated by always-on security scanning tools you barely use or understand.
- Platform teams are now responsible for securing capabilities, not just EC2 instances.
This post is about building a realistic mental model for AWS serverless security, and avoiding the assumptions that get teams breached or burned.
What’s actually changed (not the press release)
Strip away the marketing, and the real security-relevant changes with AWS serverless are:
- Network is weaker as a primary control; identity is stronger
- You used to think in terms of “inside the VPC = trusted.”
- With Lambda, API Gateway, EventBridge, S3, DynamoDB, Step Functions:
- Many things are publicly addressable by design (APIs, S3 policies, EventBridge buses).
- Resource policies + IAM conditions + auth layers are now more important than subnets and security groups.
- VPC-only Lambdas are common, but they often call public AWS control plane APIs that matter more than east-west traffic.
- The security boundary is now at the integration surface
- Instead of “the server running this app,” you have:
- API Gateway → Lambda → SQS → Lambda → DynamoDB → EventBridge → Step Functions → external SaaS
- Each hop has a separate auth and authorization model (IAM roles, resource policies, JWTs, custom auth).
- Mistakes in any one can yield privilege escalation or data exfiltration.
- Blast radius lives in IAM, not instance size
- A single over-permissioned Lambda execution role is now equivalent to a compromised root process on a large EC2 box.
- Lambda concurrency and auto-scaling can amplify an issue very quickly:
- Misconfigured role + compromised function = rapid data destruction or exfiltration.
- Observability has become your incident response substrate
- No host-level forensics. You have:
- CloudTrail, Lambda logs, API Gateway logs, VPC Flow Logs, ALB logs, X-Ray traces, Config snapshots.
- If you didn’t design for this upfront, your “forensics” during an incident will be guesswork.
- Platform engineering and security are now coupled
- Internal platforms (golden paths, templates, “paved road” stacks) define:
- Default IAM boundary strength
- Logging defaults
- Encryption defaults
- How easy it is to accidentally publish something to the internet
- In practice, the platform enforces most of your real security posture, not just your security team.
How it works (simple mental model)
You can reason about AWS serverless security using three layers:
1. Identity and policy plane (the real perimeter)
This is IAM, resource policies, and auth:
- Principals: IAM roles (Lambda execution roles, ECS tasks, CI jobs), users, federated identities.
- Permissions:
- Identity policies (attached to roles/users)
- Resource policies (S3 bucket policy, Lambda resource policy, API Gateway resource policy, SQS queue policy, KMS key policy).
- Conditions: `aws:SourceArn`, `aws:SourceAccount`, VPC endpoint conditions, source IP, organization ID, tags.
Think:
“Who can call which API, with what conditions, and what can that API then do on their behalf?”
This is where most real-world breaches start:
- Publicly accessible API with weak or missing authorization
- Lambda execution role with `*:*` or wildcards across sensitive services
- Resource policies that allow cross-account access without strong conditions
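To make that first triage concrete, here is a minimal sketch (in Python; not an official AWS tool) of the kind of check you can run against a role's policy document to spot wildcard grants. It handles the common JSON shapes only (single or list-valued `Statement`, `Action`, and `Resource`):

```python
# Sketch: flag IAM "Allow" statements that use broad wildcards.
def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements with '*' or 'service:*' actions, or a bare '*' resource."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # single-statement shorthand
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources  # bare "*" only; scoped ARNs pass
        if broad_action or broad_resource:
            flagged.append(stmt)
    return flagged

risky = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}],
}
print(find_wildcard_statements(risky))  # flags the statement above
```

This is deliberately crude; it will miss prefix wildcards like `s3:Get*`, but it catches the grants that matter most in the breach patterns above.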
2. Data plane (movement and storage of data)
This is what actually flows:
- S3 objects, DynamoDB items, Kinesis records, SQS messages, event payloads, logs.
- Movement between services: Lambda reads/writes, Step Functions transitions, SNS fan-out, EventBridge routing.
Security questions here:
- Is data encrypted at rest (KMS keys, key policies, grants)?
- Is data encrypted in transit (TLS, mTLS where relevant, private endpoints)?
- What’s the maximum fan-out of any given message or event?
- Where can sensitive payloads leak (logs, DLQs, retry queues, debug output)?
3. Control plane (configuration and observability)
This is how the system is configured and monitored:
- CloudFormation, CDK, Terraform, serverless frameworks
- CloudWatch Metrics, Logs, X-Ray, CloudTrail, AWS Config
- Security tooling (Config rules, Security Hub, GuardDuty, custom detectors)
Security issues here:
- Who can change security-relevant config (IAM, bucket policies, API Gateway authorizers)?
- Are security-relevant changes reviewed and auditable?
- Do you have alerts that fire on misconfigurations, not just runtime events?
- Can you reconstruct “what happened” from logs if something goes wrong?
When you think about a serverless system, analyze it top-down:
- Identity plane: Who can call it / what can it call?
- Data plane: What sensitive data flows through / where can it go?
- Control plane: Who can change its behavior or weaken its security?
Where teams get burned (failure modes + anti-patterns)
1. “VPC = safe” fallacy
Example:
An internal team builds a “private” Lambda API behind an ALB, in private subnets. They:
- Use AWS SDK from the function to hit various services.
- Give the function an execution role with `AdministratorAccess` during prototyping.
- Never tighten it.
Incident pattern:
- SSRF or injection vulnerability in their handler → attacker runs arbitrary AWS SDK calls using that role.
- Result: reading S3 buckets, secret exfiltration from Secrets Manager, changing IAM.
Root issue: IAM, not the VPC, determined the real blast radius.
2. Over-permissioned cross-account access
Example:
A platform account hosts an EventBridge bus for org-wide events. An application account has:
- A rule in the platform account sending events to its bus.
- A resource policy on the app account’s bus allowing events from the platform account with `Principal: "*"` and missing or weak `aws:SourceAccount` conditions.
Failure mode:
- Any compromised principal in the platform account can send arbitrary events into the app account’s bus.
- Downstream Lambdas in the app account trust event origin and may take sensitive actions.
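A sketch of the tighter shape, with placeholder account, region, and org IDs: the statement names an explicit principal and adds a condition instead of `Principal: "*"`. (For cross-account `events:PutEvents`, `aws:PrincipalOrgID` is a common choice of condition key; which condition fits depends on who the caller is.)

```python
# Illustrative EventBridge bus resource policy; all IDs are placeholders.
import json

PLATFORM_ACCOUNT = "111111111111"  # hypothetical platform account

bus_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowPlatformEventsOnly",
        "Effect": "Allow",
        # Explicit principal instead of "*":
        "Principal": {"AWS": f"arn:aws:iam::{PLATFORM_ACCOUNT}:root"},
        "Action": "events:PutEvents",
        "Resource": "arn:aws:events:us-east-1:222222222222:event-bus/app-bus",
        # Belt-and-suspenders: also require the caller to be in your org.
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
    }],
}
print(json.dumps(bus_policy, indent=2))
```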
3. Public APIs with “soft” auth
Example:
- API Gateway + Lambda public endpoint for partner integrations.
- Security handled via custom headers or weakly validated JWTs.
- No WAF, no meaningful rate limits, no strict auth enforcement at the gateway level.
What goes wrong:
- Attackers brute-force or bypass the soft auth.
- Even without full compromise, they drive up Lambda invocations and downstream database load (cost + availability hit).
- Logs fill with sensitive payloads from noisy attack traffic.
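The fix is to enforce auth at the gateway, deny by default. Below is a minimal sketch of an API Gateway Lambda authorizer (simple-response format; HTTP APIs lowercase header names). The HMAC-signed token format and shared secret are illustrative assumptions; in production you would validate real JWTs against your IdP's keys (e.g., Cognito) and fetch secrets from Secrets Manager:

```python
# Sketch only: HMAC token check standing in for real JWT validation.
import hashlib
import hmac

SHARED_SECRET = b"example-secret"  # placeholder; never hard-code in practice

def token_is_valid(token: str) -> bool:
    """Illustrative token format: '<payload>.<hex hmac-sha256 of payload>'."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)  # constant-time compare

def authorizer_handler(event, context):
    """Lambda authorizer, simple response: deny unless the token verifies."""
    token = event.get("headers", {}).get("authorization", "")
    return {"isAuthorized": token_is_valid(token)}
```

The important property is not the token scheme but the placement: requests that fail auth never reach your Lambdas or your database, which also caps the cost and availability damage from noisy attack traffic.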
4. Logs as a liability
Example:
- Lambdas and Step Functions log entire request/response payloads (including tokens, PII, secrets) to CloudWatch Logs for “debugging.”
- Devs forget to sanitize or disable this in production.
- No retention policies or log access limits.
Risk:
- Anyone with broad CloudWatch Logs or Athena-on-logs access has indirect access to sensitive data.
- In an incident, logs themselves become a breach surface.
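A cheap mitigation is to redact structured log payloads before anything is written to CloudWatch Logs. A minimal sketch; the key names are illustrative and you would extend the set for your own data:

```python
# Sketch: recursive redaction of sensitive fields in a log payload.
SENSITIVE_KEYS = {"password", "token", "authorization", "secret", "ssn"}

def redact(payload):
    """Replace values of sensitive keys with a marker, recursing into containers."""
    if isinstance(payload, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    return payload

print(redact({"user": "alice", "token": "eyJhbGci..."}))
```

Wiring this into a shared logging helper in your platform templates matters more than any single handler fix: it makes the safe path the default path.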
5. Security tools that inflate your bill without improving posture
Pattern:
- Multiple agents / scanners / config tools, each:
- Running frequent API scans across all accounts
- Producing noisy findings someone occasionally exports to a spreadsheet
- Little integration into CI/CD or platform templates
Impact:
- Material AWS bill from these tools and their underlying data stores.
- Engineering teams ignore alerts due to baseline noise.
Security-wise, you’ve just bought findings you won’t act on.
Practical playbook (what to do in the next 7 days)
Assuming you run on AWS with some serverless footprint (Lambda, API Gateway, SQS, SNS, EventBridge, DynamoDB, S3):
Day 1–2: Establish where your real risk sits
- Inventory critical serverless workloads
- APIs handling auth, payments, PII
- High-volume event pipelines
- Cross-account shared infrastructure (buses, buckets, KMS keys)
- For each, list:
- Entry points (public/private APIs, events, queues)
- IAM roles involved (execution roles, CI roles, cross-account roles)
- Quick IAM blast radius review
  - For the top 10 most critical Lambda functions, inspect the execution role:
    - Look for `*` actions (`"Action": "*"`) or wide wildcards (`"dynamodb:*"`, `"s3:*"`) on sensitive resources.
    - Check whether the role can modify IAM, KMS, CloudTrail, S3, or parameter stores.
- If you find obviously over-broad roles, document them; don’t fix yet. You want a plan, not ad hoc edits.
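The "can it modify IAM, KMS, CloudTrail, S3, or parameter stores?" question can be triaged mechanically. A sketch (not exhaustive; it checks action prefixes only, not resource scoping or Deny statements):

```python
# Sketch: which security-critical services can a policy's Allow statements reach?
SENSITIVE_PREFIXES = ("iam:", "kms:", "cloudtrail:", "s3:", "ssm:", "secretsmanager:")

def touches_sensitive_services(policy: dict) -> set[str]:
    """Return the sensitive service prefixes reachable via Allow statements."""
    hits = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            if action == "*":  # full admin reaches everything
                hits.update(p.rstrip(":") for p in SENSITIVE_PREFIXES)
            elif action.lower().startswith(SENSITIVE_PREFIXES):
                hits.add(action.split(":", 1)[0].lower())
    return hits
```

Running this over your top-10 execution roles (policy documents fetched via the IAM APIs or your IaC state) gives you a ranked "who can hurt us most" list for the plan you'll execute on days 3–4.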
Day 3–4: Lock the identity plane down one step
- Define a hard “no” list for Lambda roles
  - Unless absolutely necessary, Lambda execution roles should NOT have:
    - `iam:*`
    - `kms:*` (beyond specific key usage)
    - `s3:*` on all buckets
    - `sts:AssumeRole` into more-privileged accounts
- Apply minimal scope to 2–3 critical Lambdas
  - Rewrite their IAM policies to:
    - Use resource-level constraints (specific buckets, tables, queues).
    - Prefer explicit action lists (e.g., `dynamodb:GetItem`, `dynamodb:PutItem`) over `dynamodb:*`.
  - Validate with integration tests (or targeted manual tests) to avoid breaking prod.
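The target shape of such a rewrite, sketched with a hypothetical table name and placeholder account ID: explicit actions and a concrete ARN instead of `dynamodb:*` on `Resource: "*"`.

```python
# Illustrative least-privilege statement; names and IDs are placeholders.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        # Only the operations the handler actually performs:
        "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
        # Only the one table it needs:
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    }],
}
```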
- Add missing resource policy conditions on cross-account resources
  - For S3, SQS, EventBridge, and KMS keys used cross-account:
    - Ensure both `Principal` and `Condition` (`aws:SourceArn`, `aws:SourceAccount`) are locked down.
    - Avoid “allow org” wildcards without checking that’s truly desired.
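That checklist item can be encoded as a review helper. A sketch that handles the common statement shapes only (it will not catch every wildcard variant, e.g. list-valued principals):

```python
# Sketch: does a cross-account statement pin both the principal and a source condition?
def has_locked_down_access(stmt: dict) -> bool:
    """True only if Principal is non-wildcard AND a SourceArn/SourceAccount condition exists."""
    principal = stmt.get("Principal")
    if principal in (None, "*") or principal == {"AWS": "*"}:
        return False
    condition = stmt.get("Condition", {})
    # Condition is {operator: {key: value}}; collect the keys, case-insensitively.
    keys = {k.lower() for operator in condition.values() for k in operator}
    return bool({"aws:sourcearn", "aws:sourceaccount"} & keys)
```

Run it over the statements of every cross-account bucket, queue, bus, and key policy you inventoried on days 1–2; anything returning `False` goes on the fix list.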
Day 5: Get observability to minimum viable incident-response
- Turn on / verify key logs
- CloudTrail org-level, multi-region for management events.
- API Gateway access logs for public APIs with at least: status code, caller IP, latency, request path.
- Lambda logs with sanitized payloads (remove tokens, PII).
- Set coarse but meaningful alerts
  - Use CloudWatch alarms or existing tooling to alert on:
- Sudden spike in API 5xx for critical APIs
- Sudden spike in Lambda errors for security-sensitive functions
- CloudTrail events:
- IAM policy changes
- KMS key policy changes
- CloudTrail modifications
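The tripwire list can live as data your alerting pipeline checks incoming CloudTrail events against. A sketch with a starting set of real CloudTrail event names (a starting point, not a complete catalog):

```python
# Sketch: CloudTrail event names that should page someone immediately.
SECURITY_EVENTS = {
    # IAM policy changes
    "PutRolePolicy", "AttachRolePolicy", "CreatePolicyVersion",
    # KMS key policy changes and key destruction
    "PutKeyPolicy", "ScheduleKeyDeletion",
    # CloudTrail tampering
    "StopLogging", "DeleteTrail", "UpdateTrail",
}

def is_tripwire(cloudtrail_event: dict) -> bool:
    """True if this CloudTrail record matches the tripwire list."""
    return cloudtrail_event.get("eventName") in SECURITY_EVENTS
```

In practice you would express the same set as an EventBridge rule pattern or CloudWatch Logs metric filter rather than a Lambda; the point is that the list is explicit, reviewed, and small enough that every hit gets investigated.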
You’re not building a full SIEM in a week; you’re establishing tripwires.
Day 6: Fix the worst logging liabilities
- Search for PII or secrets in logs:
  - Spot-check Lambda handlers for `console.log`/`print` of full payloads.
  - Identify services where request/response bodies are routinely logged.
- For the worst offenders:
- Remove or redact sensitive fields from logs.
- Add basic logging guidelines to your internal platform templates / coding standards.
Day 7: Align platform engineering with security
- If you have a platform team, run a 60–90 minute working session with security:
- Review your “golden path” templates (CDK constructs, Terraform modules, starter repos).
- Ensure defaults include:
- Encrypted data stores with customer-managed KMS keys where warranted
- Least-privilege IAM roles baked into templates
- Logging enabled by default and reasonably structured
- Sensible API Gateway + Lambda auth patterns (Cognito, JWT validation, custom authorizers, or ALB auth)
- Decide on one short feedback loop:
- E.g., security reviews changes to platform modules and provides threat models, not individual app stacks.
- Over time, this will have more impact than chasing every single misconfigured Lambda.
Bottom line
On AWS serverless, the real security boundary is:
- IAM principals and policies
- Resource policies and their conditions
- The pathways data can take between managed services
- The observability you have when something goes wrong
VPCs, subnets, and security groups still matter, but they’re no longer the primary defense.
If you:
- Treat every Lambda execution role like a root shell on your account,
- Make cross-account and public entry points explicit and tightly constrained,
- Design logs and metrics as incident-response tools, not afterthoughts,
- And push these rules into platform-level defaults,
you’ll be meaningfully safer than most AWS users, without doubling your cloud bill or your cognitive load.
The hard part isn’t more tooling or more features. It’s accepting that your “servers” are now policies, identities, and events—and engineering them with the same discipline you used to apply to machines.
