Stop Calling It an “Agent” If It’s Just a Cron Job with an LLM

Why this matters this week
The AI automation conversation has shifted from “can we?” to “should we ship this to production?”
In the past two weeks, I’ve seen three patterns repeat:
- A fintech team ripped out 60% of their RPA flows and replaced them with an LLM-driven workflow engine, cutting their “script babysitting” time by half.
- A SaaS vendor quietly rolled out an internal “copilot for ops” that now handles ~25% of their Tier-1 support tickets end-to-end (including refunds), with measured guardrails and human review.
- A manufacturing company put a multi-step “agent” in front of a legacy ERP to handle order changes. It worked great in staging, then flooded the warehouse with conflicting instructions when a schema changed in production.
Same technology family, wildly different outcomes.
This week’s reality:
AI automation (agents, copilots, orchestrated workflows) is mature enough to replace brittle RPA and rule engines for some classes of work — but only if you treat it like a distributed, partially-stochastic system, not a clever macro.
If you’re responsible for reliability, cost, or compliance, you need a clear mental model of:
- What actually changed technically in the last 6–12 months
- Where these systems fail in production
- What you can ship this week without betting the company
What’s actually changed (not the press release)
Three concrete shifts make AI automation materially different from “RPA + chatbots”:
1. LLMs are now competent state transformers, not just text generators
- Modern models can:
  - Read semi-structured inputs (emails, PDFs, logs)
  - Normalize them into a structured schema (JSON payloads)
  - Decide which tools to call in what order
- This turns a pile of fragile regex/if-else/RPA logic into a single question: “Given this state + tools, propose the next action and its arguments.”
- Evidence: you can now reliably ask a model to:
  - Parse a gnarly invoice into a typed schema
  - Choose the correct internal API to call
  - Handle edge cases reasonably, if you constrain the space
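That “typed schema” step can be sketched in a few lines. The `parse_invoice` helper and the stubbed model reply below are illustrative (no specific vendor API is assumed); the point is that the model’s text gets parsed into a typed object and rejected loudly if it doesn’t fit:

```python
import json
from dataclasses import dataclass

@dataclass
class InvoiceData:
    invoice_id: str
    vendor: str
    total_cents: int

def parse_invoice(raw_model_output: str) -> InvoiceData:
    """Parse the model's JSON reply into a typed object, failing loudly on bad shapes."""
    payload = json.loads(raw_model_output)
    # Explicit field access raises KeyError/ValueError instead of trusting the model.
    return InvoiceData(
        invoice_id=str(payload["invoice_id"]),
        vendor=str(payload["vendor"]),
        total_cents=int(payload["total_cents"]),
    )

# In production this string comes from the LLM; here it is a stub.
model_reply = '{"invoice_id": "INV-42", "vendor": "Acme GmbH", "total_cents": 12999}'
invoice = parse_invoice(model_reply)
```

Everything downstream then works with `InvoiceData`, never with raw model text.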
2. Tooling + orchestration frameworks stopped being toys
- We now have:
  - Function calling / tool calling as a first-class API pattern
  - Workflow engines that treat LLM calls as nodes with retries, timeouts, and circuit breakers
  - Vector search + retrieval that’s good enough for many knowledge tasks
- The net effect: you can build deterministic scaffolding with probabilistic decisions inside, instead of all-or-nothing stochastic flows.
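A minimal sketch of the “LLM call as a workflow node” idea: bounded retries around a flaky call, then an explicit failure. The `flaky_llm_call` stub stands in for a real client call; a real implementation would catch the client library’s specific exceptions rather than `Exception`:

```python
import time

def call_with_retries(fn, max_attempts=3, backoff_s=0.0):
    """Treat a probabilistic call as a workflow node: bounded retries, then explicit failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # in real code, catch the client's specific error types
            last_error = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between attempts
    raise RuntimeError(f"node failed after {max_attempts} attempts") from last_error

# Demo with a flaky stand-in for an LLM call: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_llm_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return {"action": "lookup_account"}

result = call_with_retries(flaky_llm_call)
```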
3. Costs and latency dropped to “operationally tolerable” for many tasks
- Running “AI as glue” between systems is now:
  - Cents, not dollars, per multi-step flow in many cases
  - 1–5 seconds end-to-end for moderately complex workflows
- That’s still too slow/expensive for high-frequency, low-value events, but fine for:
  - Support tickets
  - Back-office ops
  - Partner integrations
  - Exception handling previously done manually
What hasn’t changed:
- No guarantee of correctness or consistency across calls
- No first-class transactional semantics (no atomic “all or nothing” across tools)
- No free interpretability — your “business logic” is partly inside a model you can’t inspect
Plan accordingly.
How it works (simple mental model)
Drop the “agent” buzzword. Treat AI automation as a workflow engine with:
1. Deterministic skeleton (state machine / DAG)
2. Probabilistic decision points (LLM calls)
3. Side-effecting tools (APIs, scripts, RPA leftovers)
A simple mental model:
1. State
- You maintain an explicit workflow state object, e.g.:

```json
{
  "ticket_id": "123",
  "customer_message": "...",
  "parsed_intent": "...",
  "account_status": "active",
  "proposed_actions": [],
  "audit_log": []
}
```
2. Policy / Guardrails
- You define hard constraints outside the model, e.g.:
  - Max refund without human review = $100
  - Never delete data without dual control
  - Only call tools from an allowlist
3. LLM as decision function
- At specific points, you call the model with:
  - Current state
  - Available tools + schemas
  - Business constraints
- You ask for:
  - Next action: which tool to call
  - Arguments: structured JSON
  - Rationale (optional, mostly for debugging)
4. Tool execution + observation
- You execute the selected tool deterministically.
- You capture:
  - Success/failure
  - Response payload
- You append this to the state + audit log.
5. Loop or exit
- You terminate when:
  - A goal condition is met (ticket closed, order updated, incident escalated)
  - A safety constraint triggers (too many steps, cost cap, policy violation)
  - You hit an explicit “hand off to human” path
This is effectively an orchestrated agent:
- The orchestrator controls:
- When the LLM is called
- Maximum number of steps
- What tools are allowed
- The LLM controls:
- Which allowed tool to use next
- How to map unstructured input to structured actions
Key implication: reliability doesn’t come from the model; it comes from the orchestration, constraints, and observability around it.
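That split fits in a page of framework-free code. Everything in this sketch is illustrative (the tool names, the scripted `decide` callable standing in for an LLM, and the state fields), but it shows where the hard constraints live: in the loop, not in the model.

```python
def run_workflow(state, decide, tools, max_steps=5):
    """Deterministic skeleton: the orchestrator owns the loop, step cap, and tool allowlist.
    The model (any callable passed as `decide`) only picks the next allowed tool + args."""
    for _ in range(max_steps):
        action = decide(state)                      # probabilistic decision point
        if action["tool"] == "done":
            state["status"] = "completed"
            return state
        if action["tool"] not in tools:             # hard constraint outside the model
            state["status"] = "escalated"
            return state
        result = tools[action["tool"]](state, action.get("args", {}))
        state["audit_log"].append({"tool": action["tool"], "result": result})
    state["status"] = "escalated"                   # step cap hit: hand off to a human
    return state

# Illustrative wiring: one stub tool, and a scripted stand-in for the LLM.
tools = {"fetch_account": lambda state, args: {"account_status": "active"}}
script = iter([{"tool": "fetch_account"}, {"tool": "done"}])
final = run_workflow({"ticket_id": "123", "audit_log": []}, lambda s: next(script), tools)
```

Note that a hallucinated tool name or an exhausted step budget both land in the same safe place: escalation, with a full audit log attached.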
Where teams get burned (failure modes + anti-patterns)
A few recurring failure modes in production AI automation:
1. “Invisible business logic inside the prompt”
Symptoms:
- A single 300-line prompt encodes your refund, escalation, and fraud rules.
- Product asks for a small policy change. You tweak the prompt. Something unrelated breaks.
Why it happens:
- It’s fast to jam logic into natural language.
- It feels flexible — until you need versioning, testing, or auditability.
Mitigation:
- Keep business invariants outside prompts:
  - Limits, thresholds, roles, approval rules
- Use prompts for:
  - Interpretation, classification, summarization
  - Mapping state → recommended tools/arguments
- Treat prompts as code:
  - Version them
  - Add tests with fixed seeds and snapshots
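One cheap way to start treating prompts as code is to fingerprint them and pin the fingerprint in a snapshot test, so a “small tweak” can never slip through CI unnoticed. This sketch assumes nothing beyond the standard library; the prompt text is a placeholder:

```python
import hashlib

def prompt_fingerprint(prompt: str) -> str:
    """Stable hash so CI can detect that a prompt actually changed."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]

# Prompts live in version control as data, not inline string literals.
v1 = 'Classify the ticket. Return JSON with an "intent" field.'
v2 = v1 + " Allowed intents: refund, cancel, other."

# A snapshot test pins the fingerprint; any edit forces a deliberate snapshot
# update (and, ideally, a re-run of your fixed-seed evaluation suite).
assert prompt_fingerprint(v1) != prompt_fingerprint(v2)
```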
2. Over-trusting model output for side effects
Symptoms:
- The model fabricates IDs, endpoints, or fields.
- It calls the wrong API with plausible arguments.
- Integrations “sort of work” in staging, then corrupt data in production.
Why it happens:
- Function calling encourages belief that “the model will follow the schema.”
- In reality, it follows the spirit of the schema, not the letter.
Mitigation:
- Always validate:
  - JSON schema validation before side effects
  - Reference checks (e.g., does this ID exist?)
- Enforce:
  - Role-based access per tool
  - Dry-run mode in non-prod with synthetic data
- Add “compensating actions”:
  - If a step fails, record and halt; don’t guess a fallback.
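A sketch of that validate-before-side-effects gate. Field names are illustrative; for the structural half, a real system would use a proper JSON Schema validator (e.g., the `jsonschema` library) rather than hand-rolled checks:

```python
def validate_refund_proposal(proposal: dict, known_ticket_ids: set) -> list:
    """Gate model output before any side effect: schema check, then reference check."""
    errors = []
    # Schema check: required fields with the right types.
    if not isinstance(proposal.get("ticket_id"), str):
        errors.append("ticket_id must be a string")
    if not isinstance(proposal.get("amount_cents"), int) or proposal.get("amount_cents", 0) <= 0:
        errors.append("amount_cents must be a positive integer")
    # Reference check: does the ID the model produced actually exist?
    if proposal.get("ticket_id") not in known_ticket_ids:
        errors.append("unknown ticket_id (model may have fabricated it)")
    return errors

ok = validate_refund_proposal({"ticket_id": "T-1", "amount_cents": 500}, {"T-1"})
bad = validate_refund_proposal({"ticket_id": "T-999", "amount_cents": -5}, {"T-1"})
```

The refund tool only ever runs when the error list is empty; anything else is recorded and escalated.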
3. Turning RPA spaghetti into “agent spaghetti”
Symptoms:
- You replace 200 RPA scripts with a single “smart agent” that:
  - Logs into 10 systems
  - Handles 20 edge-case flows
- Debugging becomes impossible. Failures look like “the agent did something weird.”
Why it happens:
- The pendulum swings from hyper-explicit flows to “let the agent figure it out.”
Mitigation:
- Decompose by business capability, not by technology:
  - “Invoice matching agent”
  - “Subscription change agent”
  - “KYC document reviewer”
- Keep each agent’s:
  - Tool set small
  - Responsibilities clear
  - Flows observable (traces, logs, per-step metrics)
4. No SLOs, no guardrails, no budget
Symptoms:
- Your AI automation system silently:
  - Generates large cloud bills
  - Introduces latency spikes into critical flows
- Or worse, it:
  - Executes long, looping tool calls with no cap
Mitigation:
- Define SLOs up front:
  - Max cost per workflow
  - Max latency per class of request
  - Max steps per run
- Enforce:
  - Hard per-run ceilings
  - Circuit breakers (disable automation if error rate spikes)
  - Shadow-mode rollouts before full autonomy
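Hard per-run ceilings are a few lines of deterministic code the orchestrator consults before every step. This `RunBudget` class is an illustrative sketch, not a library API; the specific limits are placeholders:

```python
class RunBudget:
    """Hard per-run ceilings checked by the orchestrator before each step."""

    def __init__(self, max_steps: int = 10, max_cost_usd: float = 0.50):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record one step; return False as soon as any ceiling is exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        return self.steps <= self.max_steps and self.cost_usd <= self.max_cost_usd

budget = RunBudget(max_steps=3, max_cost_usd=0.10)
allowed = [budget.charge(0.04) for _ in range(4)]  # later calls trip the ceilings
```

The error-rate circuit breaker sits one level up: track failures across runs, and stop dispatching new runs when the rate spikes.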
Practical playbook (what to do in the next 7 days)
If you’re a tech lead or CTO, you can move from “we should look at agents” to a concrete, low-risk pilot in one week.
Day 1–2: Identify one candidate workflow
Look for:
- High manual load, moderate volume
- Text-heavy input, structured output
- Clear success criteria
Good candidates:
- Classify + respond to specific types of support tickets
- Normalize inbound partner emails into structured requests
- Validate and route inbound forms or applications
- Resolve low-value exceptions in back-office ops
Avoid (for your first iteration):
- Direct payment movement
- Irreversible destructive operations (deletes, hard cancels)
- Anything legally sensitive without clear compliance guidance
Define:
- Target automation rate (e.g., 30–50% of cases)
- Acceptable error rate and failure modes
- “Always human” edge cases (e.g., VIP customers, large amounts)
Day 3: Sketch the orchestration, not the prompt
On a whiteboard or in code:
- Define your state object (fields, lifecycle)
- List tools:
  - Fetch customer/account
  - Fetch knowledge (retrieval)
  - Apply action (e.g., issue refund, update subscription)
- Decide control flow:
  - Where do you ask the LLM for help?
  - Where are decisions purely deterministic?
You should end up with something like:
- Ingest request → enrich with customer data
- LLM: classify intent + compute proposed action
- Policy engine: validate proposal against rules
- If safe → apply action
- Else → escalate with full context to human
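The “policy engine” box in that flow can start life as a plain function of deterministic rules. The action names, the $100 limit, and the VIP rule below are placeholders for your actual policy:

```python
def policy_check(proposal: dict, account: dict) -> tuple:
    """Deterministic rules the model cannot override; returns (safe, reason)."""
    MAX_AUTO_REFUND_CENTS = 10_000  # $100 auto-approval ceiling; adjust to your policy
    if proposal["action"] not in {"refund", "update_subscription"}:
        return False, "action not on allowlist"
    if proposal["action"] == "refund" and proposal["amount_cents"] > MAX_AUTO_REFUND_CENTS:
        return False, "refund above auto-approval limit"
    if account.get("vip"):
        return False, "VIP accounts always go to a human"
    return True, "ok"

safe, reason = policy_check({"action": "refund", "amount_cents": 2_500}, {"vip": False})
```

When `safe` is False, the proposal plus `reason` plus the full state object is exactly the context a human reviewer needs.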
Day 4–5: Build a vertical slice in “shadow” mode
Implement:
- LLM calls with:
  - A system prompt that describes tools and constraints
  - JSON-only output enforced via schema
- Tool wrappers with:
  - Input validation
  - Logging (inputs/outputs)
- Observability:
  - Trace each workflow from input → all intermediate steps → output
  - Tag each trace with cost, latency, and outcome
Run it on real traffic in shadow mode:
- The AI proposes and logs actions, but humans still perform the real work; you then compare its proposals against what the humans actually did.
