Your VPC Is Not a Castle: Real Security in Serverless AWS

Why this matters right now

If your org is going heavy on AWS serverless and platform engineering, your threat model has quietly changed more in the last three years than in the previous ten.

Key shifts:

You don’t own the OS anymore.
Your “hosts” are ephemeral and unpatchable (by you).
Blast radius is defined by IAM and configuration, not subnets and firewalls.
Observability tools are chatty but blind in the places that matter for modern cloud security.

Meanwhile:

Attackers are exploiting misconfigured IAM roles, event-driven fan-out, and overlooked managed services (S3, SQS, EventBridge, Step Functions) at scale.
Your existing security program is still optimized for EC2, bastion hosts, and perimeter firewalls.
Cost optimization has pushed you to consolidate onto shared infrastructure (shared VPCs, shared Kubernetes clusters, shared accounts), which increases lateral movement opportunities if you get one thing wrong.

This post is about the boring but real mechanics of securing AWS serverless and platform-engineered setups with a focus on:

Identity and access (IAM, roles, trust policies).
Network boundaries (VPC, private endpoints, egress).
Event-driven attack paths (Lambda triggers, SQS, SNS, EventBridge).
Observability and detection that actually catches abuse.

If you’re responsible for production reliability and cloud security, you need a mental model that matches how AWS actually works in 2026, not 2015.

What’s actually changed (not the press release)

From a cybersecurity and cloud engineering standpoint, three structural changes matter.

1. Identity is now the real perimeter

In classic EC2-centric setups:

Security groups + subnets + NACLs = primary guardrails.
IAM was often “glue” for automation.

In modern AWS serverless stacks:

The vast majority of critical permissions are IAM-based (Lambda, Step Functions, EventBridge, S3, DynamoDB, KMS, Secrets Manager).
Most services talk to each other through AWS API endpoints, not over TCP connections inside your VPC.
Cross-account access via roles and resource policies is the norm.

Consequence: One overly permissive Lambda role or S3 bucket policy can replace an entire compromised subnet in terms of blast radius.

2. Platform engineering teams have become de facto security gatekeepers

Platform teams roll out:

Shared CI/CD workflows.
Shared VPCs and transit gateways.
Centralized logging and observability.
“Golden paths” for serverless patterns.

In practice:

Whatever defaults they choose (or fail to choose) become your security posture.
Overly generic platform modules (“lambda-with-full-s3-access”) get widely reused.
“Security as a platform feature” is still young; most platforms are optimized for DX and speed, not least-privilege.

3. Observability is noisy where it’s safe, quiet where it’s dangerous

You have:

CloudWatch logs, traces, metrics.
Distributed tracing across Lambda and API Gateway.
Structured logs from your apps (hopefully).

But you’re often missing:

Systematic visibility into IAM changes and role usage.
End-to-end mapping of which identities can touch which data.
Context-aware alerts on cross-account role assumptions or anomalous event triggers.

Consequence: You know when a Lambda is slow, but not when it’s exfiltrating data through a misconfigured role.

How it works (simple mental model)

You can think of a modern AWS serverless environment as four interlocking planes:

Identity Plane (IAM, STS, resource policies)
- Who can call what, on whose behalf.
- Roles, policies, trust relationships, condition keys.
- Examples: Lambda execution roles, Step Functions state machine roles, cross-account roles.
Data Plane (S3, DynamoDB, RDS, SQS, KMS)
- Where sensitive data lives and moves.
- Protected by resource policies, encryption, VPC endpoints.
- Accessed almost always via AWS API calls authorized by IAM.
Event Plane (API Gateway, EventBridge, SNS, S3 notifications, DynamoDB streams)
- What triggers code and in what sequence.
- Defines implicit connectivity between services that might otherwise seem isolated.
- Harder to visualize, easier to misuse.
Boundary Plane (VPC, PrivateLink, security groups, egress controls)
- Controls where packets go when leaving AWS services that support VPC networking.
- Increasingly used not for “north-south” traffic but for “east-west” containment and exfiltration control.

Security and reliability come from aligning these planes:

Identity plane defines who can do things.
Event plane defines when and under what conditions actions occur.
Data plane defines what is at risk.
Boundary plane caps where data can escape.

If you treat serverless security like EC2 security, you’ll over-invest in the boundary plane and under-invest in identity and events.

Where teams get burned (failure modes + anti-patterns)

1. “One role to rule them all” Lambda patterns

Anti-pattern:

Platform team creates a Terraform module: generic_lambda with:
- A catch-all IAM policy (s3:* on *, dynamodb:* on *, kms:Decrypt on *).
- Handy for prototyping, never cleaned up.
100+ functions now share this execution role.

Failure mode:

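Code injection or a file-read bug (for example, SSRF that can reach local files) in one API Lambda → attacker reads the Lambda role credentials, which the runtime exposes to the function as environment variables.
With that, they can enumerate S3 buckets, read sensitive logs, decrypt KMS-encrypted secrets, and potentially assume other roles.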
This shows up in CloudTrail as your own Lambda calling APIs; not an obvious “red alert.”
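
A quick way to see how far this anti-pattern has spread is to group functions by execution role. The sketch below assumes you already have a function-to-role mapping (for example, collected from the Role field of each Lambda function's configuration); the function names, account ID, and role names are hypothetical.

```python
from collections import defaultdict


def find_shared_roles(function_roles, threshold=2):
    """Return {role_arn: sorted function names} for roles used by
    at least `threshold` functions."""
    by_role = defaultdict(list)
    for function_name, role_arn in function_roles.items():
        by_role[role_arn].append(function_name)
    return {
        role: sorted(names)
        for role, names in by_role.items()
        if len(names) >= threshold
    }


# Hypothetical inventory: function name -> execution role ARN.
inventory = {
    "orders-api": "arn:aws:iam::111111111111:role/generic-lambda",
    "thumbnailer": "arn:aws:iam::111111111111:role/generic-lambda",
    "billing-export": "arn:aws:iam::111111111111:role/billing-export",
}

shared = find_shared_roles(inventory)
for role, functions in shared.items():
    print(f"{role} is shared by {len(functions)} functions: {functions}")
```

Any role that shows up here means a bug in one function exposes the permissions of every other function on the list.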

2. Over-trusted cross-account roles

Pattern:

“Central security” or “central data” account with:
- Cross-account roles that can read logs or access data.
- Trust policies allowing sts:AssumeRole that broadly trust every prod account.

Failure mode:

Compromise in a single prod account → attacker uses the trust relationship to move into the central account.
From there, they gain global visibility and potentially write access to logging/detection systems.

This is the cloud security equivalent of putting the root CA on a public terminal because “we need access from everywhere.”
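
To make "trusts broadly" concrete, here is a minimal sketch of a trust-policy audit. It takes a role's trust policy as a plain dict and flags two things: a wildcard principal, and a whole account trusted (account root or bare account ID) with no Condition block. The account IDs and role name are hypothetical, and a real audit would cover more cases (federated principals, specific condition keys).

```python
import re

# Matches "arn:aws:iam::<12 digits>:root" or a bare 12-digit account ID,
# both of which delegate trust to the whole account.
ACCOUNT_WIDE = re.compile(r"(?:arn:aws:iam::\d{12}:root|\d{12})")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def audit_trust_policy(policy):
    """Return a list of (finding, principal) tuples for risky Allow statements."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            aws_principals = ["*"]
        else:
            aws_principals = _as_list(principal.get("AWS", []))
        has_condition = bool(stmt.get("Condition"))
        for p in aws_principals:
            if p == "*":
                findings.append(("wildcard-principal", p))
            elif ACCOUNT_WIDE.fullmatch(p) and not has_condition:
                findings.append(("whole-account-trusted", p))
    return findings


# Hypothetical trust policy on a role in the central account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
            "Action": "sts:AssumeRole",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222222222222:role/log-shipper"},
            "Action": "sts:AssumeRole",
        },
    ],
}

findings = audit_trust_policy(trust_policy)
print(findings)
```

The second statement passes because it names one specific role; the first is flagged because any sufficiently privileged identity in that account could assume the central role.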

3. “Private VPC = safe” thinking

Pattern:

Lambda, Fargate, and RDS all in private subnets.
NAT gateway for outbound.
No serious egress controls or DNS filtering.

Failure mode:

Compromised function still has:
- Outbound internet via NAT.
- Access to internal services exposed via security groups.
If IAM roles are powerful, an attacker can also use the stolen credentials to call S3, KMS, and Secrets Manager APIs directly, from outside the VPC entirely.

The VPC keeps inbound noise low, but doesn’t meaningfully protect you from a compromised identity.

4. Event-driven escalation paths

Example pattern:

S3 upload → triggers Lambda A (validation) → writes to SQS → Lambda B (processing) → writes to DynamoDB, whose stream triggers Lambda C (indexing).

Failure modes:

A single misconfigured S3 bucket policy allows an attacker to upload crafted files.
Lambda A uses a wide-scope role that can write arbitrary messages to SQS.
Lambda B’s code trusts message content too much (command injection, path traversal, etc.).
An attacker uses the event chain as a programmable pipeline to reach deeper systems.

Most teams threat-model HTTP APIs; few threat-model S3 notifications and EventBridge buses with the same rigor.
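
One cheap way to break this chain is for Lambda B to treat every queue message as untrusted input, even though it "came from" Lambda A. The sketch below assumes a hypothetical message shape (a JSON object with an action and an S3-style key) and a hypothetical allowed prefix; the point is the strict allow-listing, not the specific fields.

```python
import json
import posixpath
import re

ALLOWED_PREFIX = "uploads/"            # hypothetical prefix this pipeline owns
ALLOWED_ACTIONS = {"resize", "scan"}   # hypothetical set of supported actions
KEY_PATTERN = re.compile(r"[A-Za-z0-9._/-]{1,512}")


class InvalidMessage(ValueError):
    """Raised when a queue message fails validation."""


def parse_message(body):
    """Parse and validate a message body; return only the allow-listed fields."""
    try:
        msg = json.loads(body)
    except json.JSONDecodeError as exc:
        raise InvalidMessage("body is not JSON") from exc
    if not isinstance(msg, dict):
        raise InvalidMessage("body must be a JSON object")

    action = msg.get("action")
    key = msg.get("key")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise InvalidMessage("unknown action")
    if not isinstance(key, str) or not KEY_PATTERN.fullmatch(key):
        raise InvalidMessage("key has unexpected characters")

    # Reject anything that normalizes to a different path ("..", "//", "./")
    # or that falls outside the prefix this function is meant to touch.
    normalized = posixpath.normpath(key)
    if normalized != key or not normalized.startswith(ALLOWED_PREFIX):
        raise InvalidMessage("key escapes the allowed prefix")

    return {"action": action, "key": key}


good = parse_message('{"action": "resize", "key": "uploads/a.png"}')
print(good)

try:
    parse_message('{"action": "resize", "key": "uploads/../secrets/db.txt"}')
except InvalidMessage as err:
    print("rejected:", err)
```

Returning a fresh dict with only the validated fields keeps unexpected extra keys from leaking into downstream code.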

Practical playbook (what to do in the next 7 days)

Assuming you have typical AWS serverless / platform engineering patterns in place, here is a focused, realistic 7‑day plan.

Day 1–2: Get a truthful map of your identity plane

Enumerate high-privilege machine identities
- Lambda execution roles, ECS task roles, CI/CD roles, Step Functions roles.
- Criteria: AdministratorAccess, *:* actions, or wide-scoped s3:*, kms:*, iam:*, sts:*.
Flag shared roles
- Any role used by >1 logical application or service.
- Any role assumed cross-account by multiple principals.
Outcome: A short list (ideally <20) of “if this role is compromised, we’re in trouble” identities.
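
The criteria above can be sketched as a small check over policy documents. This version works on policies already loaded as dicts (how you fetch them is up to you) and only looks at Allow statements; it does not evaluate NotAction, conditions, or permission boundaries, so treat it as a first-pass filter rather than a full analysis. The example policy is hypothetical.

```python
# Services whose "<service>:*" grants are treated as high risk, per the criteria above.
SENSITIVE_SERVICES = {"s3", "kms", "iam", "sts"}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def risky_actions(policy):
    """Return the sorted Allow actions that are '*', '*:*', or a full
    wildcard on a sensitive service (for example 's3:*')."""
    found = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            lowered = action.lower()
            if lowered in ("*", "*:*"):
                found.add(action)
                continue
            service, _, name = lowered.partition(":")
            if name == "*" and service in SENSITIVE_SERVICES:
                found.add(action)
    return sorted(found)


# Hypothetical policy attached to a shared execution role.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:*", "dynamodb:GetItem"], "Resource": "*"},
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*"},
        {"Effect": "Deny", "Action": "iam:*", "Resource": "*"},
    ],
}

flagged = risky_actions(example_policy)
print(flagged)
```

Running this across every machine role and sorting by the number of flagged actions gives a reasonable starting order for the short list.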

Day 3: Enforce a minimum of least-privilege

For the top 5–10 high-risk roles:

Scope S3 permissions to specific buckets and prefixes.
Scope KMS permissions to specific keys and actions (Decrypt only where needed).
Remove iam:* unless absolutely necessary (and then lock to particular roles).

Don’t chase perfect least-privilege yet. Aim for:

“Full wildcard to app-specific wildcard” (e.g., s3:* → s3:GetObject and s3:PutObject on arn:aws:s3:::myapp-*/*).
“Account-wide to resource-specific” (e.g., kms:* on * → key ARNs).
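
As a rough illustration of both moves, the sketch below builds a narrowed policy document. The bucket prefix, account ID, and key ID are hypothetical placeholders. Note that object-level actions such as s3:GetObject apply to object ARNs, so the S3 resource ends in /*.

```python
import json


def scoped_s3_statement(bucket_prefix, actions=("s3:GetObject", "s3:PutObject")):
    """Allow only the listed object actions on buckets matching the app prefix."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": f"arn:aws:s3:::{bucket_prefix}*/*",
    }


def scoped_kms_statement(key_arns, actions=("kms:Decrypt",)):
    """Allow only the listed KMS actions on explicitly named keys."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": list(key_arns),
    }


def build_policy(statements):
    return {"Version": "2012-10-17", "Statement": list(statements)}


# Hypothetical values: replace with your app prefix and real key ARNs.
policy = build_policy([
    scoped_s3_statement("myapp-"),
    scoped_kms_statement(
        ["arn:aws:kms:us-east-1:111111111111:key/EXAMPLE-KEY-ID"]
    ),
])

print(json.dumps(policy, indent=2))
```

This is still a wildcard on the bucket name, which is the point: it is a large reduction from s3:* on * that you can ship in a day, and tighten further later.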

Day 4: Put a fence around outbound traffic

For workloads in a VPC:

Introduce or tighten egress control
- Use a dedicated egress NAT or a firewall service (for example, AWS Network Firewall).
- Deny random outbound to the internet; allow-list what’s truly needed (third-party APIs, etc.).
Prefer AWS PrivateLink / VPC endpoints
- For S3, DynamoDB, Secrets Manager, SSM, KMS.
- Block corresponding public endpoints from your VPC where feasible.

Goal: If a Lambda or container is compromised, exfiltration has to go through well-defined, observable paths, not arbitrary HTTPS to the internet.
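
For S3, two policy documents do most of the work, sketched below as plain dicts. The first is a VPC endpoint policy that only lets traffic through the endpoint reach buckets you name, which limits exfiltration to attacker-owned buckets. The second is a bucket policy that denies requests not arriving via a given endpoint, using the aws:SourceVpce condition key. Bucket names and the endpoint ID are hypothetical, and the deny statement will also lock out console and admin access from outside the VPC, so it needs exceptions in practice.

```python
import json


def s3_endpoint_policy(bucket_names):
    """VPC endpoint policy: only the named buckets are reachable via the endpoint."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")
        resources.append(f"arn:aws:s3:::{name}/*")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowOnlyKnownBuckets",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": resources,
        }],
    }


def vpce_only_bucket_policy(bucket_name, vpce_id):
    """Bucket policy: deny any request that does not come through the endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Hypothetical names and IDs.
endpoint_policy = s3_endpoint_policy(["myapp-data", "myapp-logs"])
bucket_policy = vpce_only_bucket_policy("myapp-data", "vpce-0123456789abcdef0")

print(json.dumps(endpoint_policy, indent=2))
print(json.dumps(bucket_policy, indent=2))
```

Worth trying in a non-production VPC first: workloads sometimes depend on buckets you do not own, and those would need to be added to the endpoint policy.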

Day 5: Harden your event plane

Enumerate external entry points
- API Gateway, ALB, S3 buckets with public or cross-account access, EventBridge rules that accept events from other accounts or SaaS partners.
For each external entry point:
- Confirm there’s some notion of authentication/authorization (IAM auth, JWT, signed URLs, bucket policies with conditions, etc.).
- Validate that the triggered Lambda roles are not over-privileged.
Add basic abuse logging
- Ensure request IDs, caller identity (if any), and key metadata are logged in a consistent format for these entry points.
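
A consistent format can be as simple as one helper that every entry-point handler calls first. The sketch below assumes an API Gateway REST API proxy event (requestContext.requestId, requestContext.identity.userArn and sourceIp); other triggers carry identity in different places, so the field extraction is an assumption to adapt per source. The field names in the output record are my own choice, not a standard.

```python
import json
import logging

logger = logging.getLogger("entrypoint")


def entry_log_record(event, lambda_request_id, source):
    """Build one flat, consistently named record for an inbound request."""
    request_context = event.get("requestContext") or {}
    identity = request_context.get("identity") or {}
    return {
        "source": source,
        "lambda_request_id": lambda_request_id,
        "gateway_request_id": request_context.get("requestId"),
        "caller_arn": identity.get("userArn"),   # None for unauthenticated calls
        "source_ip": identity.get("sourceIp"),
        "http_method": event.get("httpMethod"),
        "path": event.get("path"),
    }


def log_entry(event, lambda_request_id, source):
    """Emit the record as a single JSON line and return it."""
    record = entry_log_record(event, lambda_request_id, source)
    logger.info(json.dumps(record, sort_keys=True))
    return record


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "httpMethod": "POST",
    "path": "/orders",
    "requestContext": {
        "requestId": "req-123",
        "identity": {"sourceIp": "203.0.113.10", "userArn": None},
    },
}

record = log_entry(sample_event, "lambda-req-456", source="apigw:orders-api")
print(json.dumps(record, sort_keys=True))
```

In a real handler, lambda_request_id would come from the Lambda context object (context.aws_request_id).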

Day 6: Basic identity-centric observability

Turn critical security events into first-class alerts
- New IAM role creation with sts:AssumeRole trust.
- Role policy attachment that includes wildcards for iam:*, kms:*, s3:*, sts:*.
- Changes to resource policies on S3, KMS, and EventBridge buses.
**Track role usage, not just API
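
For the configuration-change alerts listed above, a minimal sketch is a lookup over CloudTrail records keyed on eventSource and eventName. The labels are my own naming, the sample record is hypothetical and heavily trimmed, and the check for wildcards inside an attached policy (which means inspecting the policy document itself) is not shown.

```python
# (eventSource, eventName) pairs worth alerting on, with a label of our choosing.
WATCHED_EVENTS = {
    ("iam.amazonaws.com", "CreateRole"): "iam-role-created",
    ("iam.amazonaws.com", "UpdateAssumeRolePolicy"): "iam-trust-changed",
    ("iam.amazonaws.com", "AttachRolePolicy"): "iam-role-policy-changed",
    ("iam.amazonaws.com", "PutRolePolicy"): "iam-role-policy-changed",
    ("s3.amazonaws.com", "PutBucketPolicy"): "resource-policy-changed",
    ("kms.amazonaws.com", "PutKeyPolicy"): "resource-policy-changed",
    ("events.amazonaws.com", "PutPermission"): "resource-policy-changed",
}


def classify(cloudtrail_record):
    """Return an alert dict for watched events, or None for everything else."""
    key = (
        cloudtrail_record.get("eventSource"),
        cloudtrail_record.get("eventName"),
    )
    label = WATCHED_EVENTS.get(key)
    if label is None:
        return None
    identity = cloudtrail_record.get("userIdentity") or {}
    return {
        "label": label,
        "event": key[1],
        "actor": identity.get("arn"),
        "time": cloudtrail_record.get("eventTime"),
    }


# Hypothetical, trimmed CloudTrail record.
sample_record = {
    "eventSource": "s3.amazonaws.com",
    "eventName": "PutBucketPolicy",
    "eventTime": "2026-01-15T10:00:00Z",
    "userIdentity": {
        "arn": "arn:aws:sts::111111111111:assumed-role/deploy/ci-run",
    },
}

alert = classify(sample_record)
print(alert)
```

Including the actor in every alert matters more than the event itself: a policy change made by your deploy role is routine, and the same change made by a Lambda execution role is not.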
