Your “Secure” AWS Serverless Stack Probably Isn’t


Why this matters right now

If you’re moving hard into AWS serverless—Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, S3—you’ve probably been told it’s “secure by default.”

It isn’t.

You inherit less OS and network surface area, but the remaining attack surface is mostly:

  • IAM misconfiguration
  • Over-permissive event wiring
  • Insecure defaults in managed services
  • Missing observability and guardrails

And because serverless is cheap per unit and highly parallel, small mistakes become:

  • Large data exfiltration in seconds
  • Massive unauthorized fan-out writes
  • Huge cloud bills from abuse

At the same time:

  • Security teams can’t keep up with the velocity of infra changes.
  • Platform teams are asked to “own security” without being given authority or tooling.
  • Auditors don’t understand serverless semantics and lean on irrelevant controls.

If your AWS security model is still “VPC + security groups + WAF on an ALB,” you’re missing the main ways serverless systems get compromised.

This post is about practical security for AWS serverless and platform engineering: what’s changed, where the real risks are, and how to build guardrails that actually work in production.


What’s actually changed (not the press release)

Three concrete shifts matter for cybersecurity on AWS serverless:

1. Identity is the primary perimeter

In traditional architectures, you had:

  • Network boundaries (VPC, subnets, SGs)
  • Host boundaries (EC2 instances, containers)

In serverless, most resources are:

  • Publicly reachable endpoints (API Gateway, AppSync)
  • Event buses and queues (EventBridge, SNS, SQS)
  • Data stores exposed via IAM (DynamoDB, S3, Secrets Manager)

IAM policies and resource-based policies are now your de facto firewall.

Implication: A single bad s3:* or dynamodb:* in a Lambda role is the equivalent of punching giant holes in your perimeter.
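The difference is easy to see in policy form. A minimal sketch (the account ID, region, and table name are invented) contrasting the wildcard grant with a scoped one:

```python
import json

# Hypothetical ARN -- substitute your own account, region, and table.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/orders"

# Anti-pattern: the "make it work" policy. Any compromise of a role
# carrying this can read or destroy every table in the account.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}
    ],
}

# Scoped: only the actions this function actually performs, only on
# the one table it owns.
scoped = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": TABLE_ARN,
    }],
}

print(json.dumps(scoped, indent=2))
```

The scoped version costs a few minutes per function up front; the wildcard version costs you the whole account later.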

2. Everything is “integration by default”

AWS has made it trivial to glue services:

  • S3 → Event notifications → Lambda
  • DynamoDB Streams → Lambda
  • API Gateway → Lambda / Step Functions
  • EventBridge buses → Targets everywhere

The good: easy to compose.

The bad: easy to create unbounded trust chains and unexpected call graphs.

Implication: An attacker who gets code execution in one function can pivot across:

  • Other functions (via EventBridge, SNS, SQS)
  • Data stores (DynamoDB, S3, RDS Proxy)
  • External systems (webhooks, Slack, email)

3. Security posture is now code, not tickets

Reality in most orgs:

  • Infra is managed via Terraform / CDK / CloudFormation.
  • PR review is the only real gate.
  • Security teams can’t manually review every template.

Implication: If your platform engineering team doesn’t provide secure defaults and automated checks, every squad will re-discover the same security pitfalls.

This is why “we’ll fix it later with a security review” is fiction in serverless environments.


How it works (simple mental model)

Use this mental model for serverless security on AWS:

Every unit of compute is cheap, short-lived, and extremely powerful. Your job is to constrain what each unit can see, do, and emit.

Break it down into four planes:

1. Identity plane

Questions:

  • Who can assume which roles?
  • What can those roles do?
  • How are tokens and credentials scoped and rotated?

Tactics:

  • One IAM role per Lambda function or narrowly scoped group.
  • Restrictive policies:
    • No * actions unless truly necessary.
    • Resource-level constraints everywhere possible.
  • Use session policies and condition keys (aws:PrincipalTag, aws:SourceArn, aws:SourceAccount) to bind permissions to specific call paths.
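Binding a permission to a call path is mostly a matter of adding condition keys. A sketch (function, bucket, and account values are made up) of the resource-based statement that lets S3 invoke a Lambda only for events from one specific bucket in one specific account:

```python
def s3_invoke_permission(function_name: str, bucket_arn: str, account_id: str) -> dict:
    """Resource-based policy statement letting S3 invoke a Lambda, but only
    for notifications originating from one bucket in one account. Without
    the conditions, any bucket -- even in another account -- that learns
    the function name could trigger it (the "confused deputy" problem)."""
    return {
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": f"arn:aws:lambda:us-east-1:{account_id}:function:{function_name}",
        "Condition": {
            "ArnLike": {"aws:SourceArn": bucket_arn},
            "StringEquals": {"aws:SourceAccount": account_id},
        },
    }

stmt = s3_invoke_permission("thumbnailer", "arn:aws:s3:::uploads-bucket", "123456789012")
```

In practice this is what `lambda add-permission` with `--source-arn` and `--source-account` produces; the point is that both conditions belong on every service-to-service grant.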

2. Data plane

Questions:

  • Which data stores does a function talk to?
  • At what granularity is access controlled?
  • How is sensitive data classified and encrypted?

Tactics:

  • Limit each function’s data access to the minimal tables/buckets/keys.
  • Use:
    • DynamoDB fine-grained access control where feasible.
    • S3 bucket policies that deny by default and allow only specific principal ARNs.
    • Customer-managed KMS keys with key policies bound to roles.
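A deny-by-default bucket policy can be generated rather than hand-written. A sketch (bucket and role names are invented) using an explicit Deny with an allowlist of principal ARNs:

```python
def deny_all_except(bucket: str, allowed_role_arns: list) -> dict:
    """Bucket policy that explicitly denies all S3 access unless the caller
    is one of the allowlisted roles. Caution: an explicit Deny overrides any
    Allow, so the allowlist must include your deploy and break-glass roles,
    or you will lock yourself out too."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAllButAllowlistedRoles",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Negated operators with a list: the Deny applies when the
            # caller matches NONE of the listed ARNs.
            "Condition": {"StringNotLike": {"aws:PrincipalArn": allowed_role_arns}},
        }],
    }

policy = deny_all_except("pii-archive", ["arn:aws:iam::123456789012:role/etl-reader"])
```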

3. Event plane

Questions:

  • What can produce events?
  • What can consume them?
  • Can untrusted inputs trigger high-privilege behavior?

Tactics:

  • Separate untrusted and trusted event sources:
    • External HTTP calls → dedicated “edge” Lambdas with minimal privileges.
    • Internal system events → internal buses/queues in private accounts/VPCs.
  • Use EventBridge permissions to tightly scope which principals can put events on which buses.
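The untrusted/trusted split usually lives in a small "edge" function. A minimal sketch (the action names and field set are invented for illustration) of normalizing an external webhook payload into a constrained internal event:

```python
# Hypothetical allowlist of event types the edge function will forward.
ALLOWED_ACTIONS = {"order.created", "order.cancelled"}

def to_internal_event(raw: dict) -> dict:
    """Normalize an untrusted webhook payload into a constrained internal
    event. Only allowlisted action types and expected fields survive, so a
    forged payload can't smuggle extra instructions onto the internal bus."""
    action = raw.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"rejected unknown action: {action!r}")
    return {
        # Source is fixed by the edge function, never taken from the caller.
        "Source": "edge.webhooks",
        "DetailType": action,
        # Copy only the fields we expect, coerced to known types.
        "Detail": {"order_id": str(raw.get("order_id", ""))},
    }
```

The edge Lambda then needs only events:PutEvents on one internal bus; everything downstream can treat the normalized shape as trusted.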

4. Observability plane

Questions:

  • Can you see abnormal behavior quickly?
  • Can you correlate identity, data access, and events?

Tactics:

  • Centralized logging for:
    • CloudTrail
    • Lambda logs
    • API Gateway / ALB / CloudFront logs
    • DynamoDB / S3 access logs
  • Alerts on:
    • New IAM roles with broad privileges.
    • Lambdas assuming privileges they never used before.
    • Sudden spikes in invocations, failed auth, or throttles.

Keep this mental model in mind when evaluating any design: for each Lambda, queue, or bus, ask what it can see/do/emit and how you’d notice if it misbehaved.
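The "privileges they never used before" alert can start as a simple diff against a baseline. A sketch assuming a simplified CloudTrail-like event shape (real events carry many more fields, and eventName values vary by service):

```python
def new_privilege_use(baseline: set, events: list) -> list:
    """Given a baseline set of (principal ARN, API action) pairs previously
    observed in CloudTrail, return each pair that appears for the first
    time -- e.g., a metrics Lambda suddenly calling Scan on a table it has
    never touched."""
    flagged = []
    for e in events:
        pair = (e["userIdentity"]["arn"], e["eventName"])
        if pair not in baseline and pair not in flagged:
            flagged.append(pair)
    return flagged
```

Run nightly over the last day's trail, page on anything flagged for high-sensitivity roles, and you have a crude but real "Lambda behaving strangely" detector.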


Where teams get burned (failure modes + anti-patterns)

1. “*” everywhere in IAM

Pattern: To ship faster or “make it work,” teams apply policies like:

  • lambda:InvokeFunction on *
  • dynamodb:* on application tables
  • s3:* on critical buckets

Failure mode:

  • A compromised low-importance Lambda (e.g., via a vulnerable third-party dependency) can:
    • Read/write prod data.
    • Invoke internal admin Lambdas.
    • Put events on privileged EventBridge buses.

Example:
A fintech platform had a “metrics collector” Lambda with dynamodb:* on multiple tables for convenience. A supply-chain attack in an NPM package allowed RCE in that function, which was then used to dump user PII from an unrelated auth table.

2. Overloaded “god” Lambdas

Pattern: One big Lambda per domain:

  • Handles public API requests.
  • Talks to all core data stores.
  • Publishes to multiple event buses.

Failure mode:

  • Any bug or exploit in this function is a single point of catastrophic compromise.
  • Hard to reason about permissions because everything is “needed somewhere.”

Example:
An e-commerce team used a single Lambda for all checkout flows. It both processed orders and managed refunds. A logic bug in a discount endpoint allowed crafted requests to hit refund code paths, issuing arbitrary refunds without authorization.

3. Untrusted event sources wired directly to privileged targets

Pattern:

  • API Gateway / AppSync calling high-privilege Lambdas directly.
  • Public SNS topics with wide “publish” permissions feeding sensitive processing.
  • EventBridge buses accepting events from any principal in the account.

Failure mode:

  • Attacker bypasses normal authorization path by sending forged events.
  • System treats event as trusted and executes sensitive actions.

Example:
A SaaS platform allowed any internal microservice to put events on a central EventBridge bus. A compromised internal service forged “user-verified” events that triggered provisioning flows, granting large resource allocations to attacker-controlled tenants.

4. No cross-account boundary

Pattern: Entire stack (dev, test, prod) in a single AWS account:

  • Shared IAM namespace.
  • Shared event buses.
  • Shared CloudTrail and logs.

Failure mode:

  • Misconfigurations or compromised entities in non-prod impact prod.
  • Hard to apply least privilege across environments.

5. Observability as an afterthought

Pattern:

  • CloudTrail logs are on, but nobody looks at them.
  • CloudWatch logs exist, but no structured logging.
  • No correlation between identity, data access, and application logs.

Failure mode:

  • Breaches go unnoticed.
  • Forensics is painful or impossible.
  • Teams under-react because impact is unclear.


Practical playbook (what to do in the next 7 days)

You won’t rebuild your platform in a week, but you can dramatically improve your security posture with targeted moves.

Day 1–2: Inventory and classify

  1. List your serverless entry points:

    • API Gateway routes
    • AppSync operations
    • Public S3 buckets / static sites
    • EventBridge buses exposed cross-account
    • SNS topics with external publishers
  2. For each entry point, identify:

    • Which IAM role executes on its behalf (e.g., Lambda role).
    • Which data stores that role can access.
    • Which events it emits (SNS, SQS, EventBridge, Step Functions).
  3. Classify by sensitivity:

    • High: modifies money, identity, permissions, or PII.
    • Medium: internal state that can cause major operational impact.
    • Low: metrics, logs, cache, non-sensitive content.

Your goal: a rough map of “edge” → “privilege” flows.
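Even a hand-maintained map answers the question most teams can't. A toy sketch (all names and the one-hop model are invented; a real map would follow transitive event chains too):

```python
# Hypothetical inventory: entry point -> executing role,
# role -> data stores it can touch, store -> sensitivity class.
ENTRY_POINTS = {"POST /checkout": "checkout-role", "GET /health": "health-role"}
ROLE_ACCESS = {
    "checkout-role": {"orders-table", "payments-table"},
    "health-role": set(),
}
SENSITIVITY = {"orders-table": "medium", "payments-table": "high"}

def risky_entry_points(level: str = "high") -> list:
    """Which public entry points can reach a data store at the given
    sensitivity? This is the 'edge -> privilege' map in its simplest form."""
    return [
        ep for ep, role in ENTRY_POINTS.items()
        if any(SENSITIVITY.get(store) == level for store in ROLE_ACCESS[role])
    ]
```

Start with a spreadsheet if you must; the point is that the question "which edges touch high-sensitivity data?" becomes answerable in minutes, not meetings.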

Day 3: Kill the worst “*” policies

  1. Enable and use IAM Access Analyzer (or equivalent) to find:

    • Roles used by Lambda with Action: "*".
    • Policies with broad Resource: "*", especially for data services.
  2. For the top 5 most-invoked high-sensitivity Lambdas:

    • Tighten IAM policies:
      • Restrict S3 to specific ARNs.
      • Restrict DynamoDB to specific tables and actions.
      • Restrict KMS to specific keys.
    • If you can’t fully scope in one pass, at least:
      • Remove obvious overreach (e.g., iam:*, ec2:*).
      • Split out separate roles for clearly separate functions if currently shared.

Day 4–5: Put guardrails in your platform stack

If you have Terraform, CDK, or CloudFormation:

  1. Create secure-by-default modules / constructs:

    • secure_lambda that:
      • Requires an explicit allowed_resources list.
      • Attaches structured logging (JSON logs, correlation IDs).
    • secure_bucket that:
      • Blocks public access.
      • Enforces encryption.
      • Optionally writes access logs to a central bucket.
  2. Add simple static checks in CI:

    • Fail PRs that:
      • Introduce policies with Action: "*".
      • Mark S3 buckets as public.
      • Wire public endpoints (API Gateway) directly to high-privilege Lambdas without an “edge” function.
  3. Document and socialize one pattern per team:

    • E.g., “Edge Lambda with minimal privileges that calls internal private API with auth context.”

You’re building paved roads that are easier to use than insecure ad-hoc configs.
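The Action: "*" check from step 2 can be a few dozen lines run against rendered policy JSON in CI. A simplified sketch (real templates need ref/intrinsic resolution first, and the data-service list here is a deliberate subset):

```python
DATA_SERVICES = {"s3", "dynamodb", "kms", "secretsmanager"}

def policy_violations(policy: dict) -> list:
    """Return human-readable problems for an IAM policy document:
    wildcard actions anywhere, and wildcard resources on data services."""
    problems = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions:
            problems.append('Action: "*" is never acceptable in an application role')
        if "*" in resources and any(a.split(":")[0] in DATA_SERVICES for a in actions):
            problems.append('Resource: "*" on a data service -- scope to specific ARNs')
    return problems
```

Wire it so a PR fails whenever `policy_violations` returns a non-empty list for any rendered policy; exemptions go through an explicit, reviewed allowlist rather than silence.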

Day 6: Harden the event plane

  1. EventBridge:

    • Restrict PutEvents to specific roles and accounts.
    • For cross-account, use aws:PrincipalOrgID and aws:SourceAccount conditions.
  2. SNS/SQS:

    • Lock topic/queue policies to specific principals.
    • Remove “everyone in the account” style policies if not necessary.
  3. API Gateway / AppSync:

    • Ensure authentication and authorization are explicit:
      • Cognito / JWT validation.
      • IAM auth where appropriate.
    • Avoid allowing unverified clients to directly trigger high-privilege actions.
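"Explicit authentication" ultimately means code that checks a signature and an expiry before anything privileged runs. A deliberately minimal HS256 sketch to show the shape (production should use a vetted JWT library, and Cognito tokens are RS256 verified against the pool's JWKS, not a shared secret):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(s: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_jwt_hs256(token: str, secret: bytes) -> dict:
    """Minimal HS256 JWT check: constant-time signature comparison plus
    expiry. Illustrative only -- it skips alg-header validation, audience,
    and issuer checks that a real authorizer must also perform."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise PermissionError("token expired")
    return claims
```

The design point: the privileged Lambda never sees a request until something like this has run and attached a verified identity to it.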

Day 7: Minimal observability you’ll actually use

  1. Centralize logs:

    • Ensure all Lambda logs are:
      • Structured (JSON).
      • Tagged with key fields: request ID, principal/user ID (if any), correlation ID.
  2. Set 5–10 meaningful alerts:

    • New IAM roles with admin-level actions.
    • Lambdas failing due to AccessDeniedException.
    • Sudden spikes in Lambda invocations on security-sensitive functions.
    • CloudTrail events for:
      • PutRolePolicy, AttachRolePolicy, PutBucketPolicy on sensitive resources.
  3. Practice one incident scenario:

    • “Lambda abused to exfiltrate data from DynamoDB.”
    • Walk through:
      • How would you detect?
      • How would you revoke access?
      • How would you assess what was accessed?
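The structured log lines in step 1 need nothing fancier than a small helper every handler calls. A sketch (field names are one reasonable convention, not a standard):

```python
import json
import sys
import uuid

def log(event, correlation_id=None, **fields):
    """Emit one structured JSON log line with the key fields the playbook
    calls for: event name, correlation ID, and any extras (request ID,
    principal, etc.). Returns the line so callers and tests can inspect it."""
    record = {
        "event": event,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        **fields,
    }
    line = json.dumps(record)
    print(line, file=sys.stderr)  # CloudWatch captures stderr per invocation
    return line
```

Once every function logs this way, CloudWatch Logs Insights queries on correlation_id let you follow one request across the whole event chain, which is exactly what the incident walkthrough in step 3 requires.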

Don’t aim for perfection; aim for fast detection and bounded blast radius.


Bottom line

AWS serverless doesn’t remove security work; it moves it up the stack:

  • From hosts and networks → to identity, events, and data access.
  • From manual reviews → to platform engineering and guardrails.
  • From static perimeters → to dynamic, fine-grained policies.

If you own a serverless-heavy AWS environment and can’t answer:

  • “Which public entry points can indirectly touch our most sensitive data?”
  • “Which Lambdas have more privilege than they actually use?”
  • “How quickly would we notice if one Lambda started doing something weird?”

—you don’t have a credible security posture yet.

The fix is not more process or more tickets; it’s:

  • Opinionated modules and reference architectures.
  • Strict IAM and event-plane boundaries.
  • Observability wired for identity and data access.

Treat each Lambda as cheap, disposable, and dangerously powerful. Design so that when—not if—one is compromised, the damage is small, detectable, and reversible.
