Your RPA Scripts Are Just Legacy Code With Worse Tooling


Why this matters right now

Most “AI in business” pitch decks are fluff. But underneath the noise, something real is happening:
teams are quietly ripping out brittle RPA scripts and duct-taped macros, and replacing them with AI-driven workflows and agents that:

  • Survive UI changes and minor process drift
  • Handle edge cases without exploding into 500-rule monstrosities
  • Reduce manual triage work instead of just automating mouse clicks

This is not about “AGI agents that run your company.” It’s about:

  • Replacing fragile RPA with workflow-centric automation that uses LLMs as flexible “glue” between systems
  • Building copilots for internal teams that talk to your real systems, not just your wiki
  • Using orchestration frameworks to coordinate calls to LLMs, tools, and traditional services

If you own a backlog of “we should automate this later” tickets, AI automation is now a credible way to ship real, maintainable systems instead of yet another Selenium zombie.


What’s actually changed (not the press release)

The story isn’t “agents are here and will replace humans.”
The story is: we finally have a general-purpose interpreter that’s good enough to sit inside workflows.

Concretely, three things shifted in the last 18 months:

  1. LLMs became decent at “structure in, structure out”

    The initial wave of GPT demos was chat toys. That's mostly irrelevant for operations.
    What matters for automation:

    • Models can reliably emit JSON, CSV, and function-call arguments that match a given schema
    • With tool APIs, you get: “Here is the user query → here is the function I want to call + arguments”
    • You can enforce schemas, validate, and retry cheaply

    That makes them usable as dynamic routers and policy interpreters, not just text generators.

  2. Tool use and function calling became first-class

    This is the key mechanism for replacing brittle RPA:

    • RPA automates the surface layer (UI clicks, DOM paths, XPaths)
    • Tool-using LLMs automate the intent layer (decide what action makes sense, and call the right API)

    Instead of automating “click button X then Y,” you automate “if the refund is below $500 and the account is in good standing, approve; else escalate.”
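The refund policy above can be made concrete with two pieces: tool schemas handed to the model (in the JSON-Schema style most function-calling APIs accept) and a deterministic re-check of the policy before anything executes. Tool names and fields here are illustrative, not a specific vendor API.

```python
# Hypothetical tool schemas exposed to the model for function calling.
TOOLS = [
    {
        "name": "approve_refund",
        "description": "Approve a refund for an account in good standing.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
                "amount_usd": {"type": "number"},
            },
            "required": ["account_id", "amount_usd"],
        },
    },
    {
        "name": "escalate",
        "description": "Hand off to a human agent.",
        "parameters": {
            "type": "object",
            "properties": {"reason": {"type": "string"}},
            "required": ["reason"],
        },
    },
]

def policy_check(tool_name: str, args: dict, account_in_good_standing: bool) -> bool:
    """Deterministic re-check: the model proposes an action, code disposes."""
    if tool_name == "approve_refund":
        return args["amount_usd"] < 500 and account_in_good_standing
    return tool_name == "escalate"
```

The model chooses at the intent layer; the `policy_check` gate means a hallucinated approval still cannot clear a $900 refund.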

  3. Open-source + hosted orchestration frameworks matured

    You no longer have to build your own agent framework from scratch:

    • You can define workflows: multi-step tasks with branching, retries, and human-in-the-loop
    • You can plug in LLMs, tools, and classic microservices as nodes in the graph
    • You get observability: logs, traces, token usage, error metrics

    This moves “AI agents” from being a PowerPoint architecture to something you can deploy, monitor, and roll back.

None of this is magic. It’s just enough to make LLMs act like a configurable, probabilistic decision engine in the middle of your existing systems.


How it works (simple mental model)

Forget “agents as autonomous employees.”
Use this mental model instead:

A modern AI automation system is: a workflow engine + a set of tools + an LLM that decides which tool to use next.

Breakdown:

  1. Workflow layer (orchestration)

    • Describes the skeleton of your process:
      • Steps
      • Branching logic
      • Where humans get pulled in
    • Think: “Handle support ticket” → classify → enrich data → propose actions → get approval → execute

    This is where you codify guarantees:

    • No financial transaction runs without a human approval step
    • Timeouts, retries, compensating actions
  2. Tools layer (deterministic operations)

    Tools are your ground truth operations:

    • Internal APIs (CRM, ERP, billing, inventory, email, Slack)
    • Databases and search indices
    • Checkers/validators (compliance rules, permission checks, schema validation)

    Each tool has:

    • A contract: name, parameters, possible responses
    • Rate limiting and access control
    • Clear semantics for failure / idempotency
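A tool contract of this shape can be written down as a small record that the orchestrator enforces before any real system is touched. Field names below are assumptions for the sketch, not a standard.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolContract:
    """Illustrative contract for one tool: name, parameters, failure semantics."""
    name: str
    parameters: dict              # JSON-Schema-style parameter description
    idempotent: bool              # safe to retry without duplicating side effects?
    allowed_roles: set            # RBAC: which caller roles may invoke this tool
    handler: Callable[..., dict]  # the actual operation

def invoke(contract: ToolContract, caller_role: str, **kwargs) -> dict:
    """Enforce the contract before touching the underlying system."""
    if caller_role not in contract.allowed_roles:
        raise PermissionError(f"{caller_role} may not call {contract.name}")
    missing = set(contract.parameters) - set(kwargs)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return contract.handler(**kwargs)
```

Note the access check runs per caller role, not as a blanket "the agent is allowed," which also addresses the permission-leakage failure mode discussed later.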
  3. LLM layer (decision + glue)

    The LLM sits between workflow and tools:

    • Maps fuzzy input → concrete action (“what tool should I call, and with what arguments?”)
    • Interprets unstructured data (emails, PDFs, logs) into structured forms
    • Generates text or summaries where needed

    You define:

    • System prompts: “You are an internal ops assistant for X domain…”
    • Tool descriptions and constraints
    • Guardrails (e.g., only suggest actions that match a whitelist of tools)
  4. Control loop

    For each “task”:

    1. Workflow state + context → LLM prompt
    2. LLM decides: call tool, ask user, or return result
    3. Tool executes (if called), updates state
    4. Workflow engine moves to next step based on outcome

    Importantly: the workflow engine is in charge, not the LLM.
    The LLM is a policy subroutine, not a root authority.

That’s the architecture that can replace a surprising amount of RPA without re-creating the fragility at a different layer.


Where teams get burned (failure modes + anti-patterns)

Here’s where production teams lose time and credibility.

1. Letting the LLM drive everything

Anti-pattern: “We’ll just prompt it to follow the process.”

  • No explicit workflow
  • No explicit tools
  • Giant prompt with all the instructions

Result:

  • Hard to test
  • Hard to update policy
  • Intermittent failures that you can’t reason about

Fix:
Treat the LLM as an implementation detail behind typed interfaces and workflows. Externalize policy.

2. Automating the wrong layer (UI scraping instead of system calls)

Example pattern:

  • Company has an internal web app with no APIs
  • They add an “AI agent” that drives a headless browser via LLM instructions
  • DOM changes → whole thing breaks

This is just RPA with extra latency and hallucinations.

Better options:

  • Expose minimal internal APIs (even if ugly and undocumented) tailored for automation
  • Use LLMs to handle messy input, but keep interactions with core systems deterministic

3. No guardrails around side effects

Common failure modes:

  • Agent sends duplicate emails or refunds due to retries and ambiguous states
  • LLM composes an email, then a bug causes it to be re-sent 20 times

Mitigations:

  • Idempotency keys for side-effecting operations (refunds, updates, emails)
  • “Dry run” mode and audit logs for every action
  • Require explicit human approval for high-risk operations (payments, legal communication)
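The idempotency-key mitigation can be sketched like this: derive a key from the operation's identity and refuse to repeat work already recorded. The in-memory set stands in for a durable store (a DB table, Redis, or the payment provider's own idempotency support); the refund function is a hypothetical example.

```python
import hashlib

_completed: set = set()  # stand-in for a durable idempotency store

def idempotency_key(operation: str, **identity) -> str:
    """Stable key derived from what makes this side effect unique."""
    payload = operation + "|" + "|".join(f"{k}={identity[k]}" for k in sorted(identity))
    return hashlib.sha256(payload.encode()).hexdigest()

def send_refund_once(account_id: str, amount_usd: float, ticket_id: str) -> str:
    key = idempotency_key("refund", account_id=account_id,
                          amount_usd=amount_usd, ticket_id=ticket_id)
    if key in _completed:
        return "skipped: already executed"
    # ... call the real payments API here, ideally passing `key` through
    # as its idempotency key so the provider also deduplicates ...
    _completed.add(key)
    return "executed"
```

With this in place, a workflow retry or a re-sent LLM decision hits the same key and becomes a no-op instead of a duplicate refund.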

4. Ignoring data quality and access control

Two big issues:

  1. Garbage in, garbage out
    If your CRM is a mess, an AI agent will just make bad decisions faster.

  2. Permission leakage
    Slapping “agent” on top of systems without per-user access control can leak data or enable actions users previously couldn’t do.

Mitigations:

  • Perform a minimal data cleanup for the specific automation use case (don’t fix the world; fix what the workflow touches)
  • Enforce RBAC at the tool layer: tools check the requesting user’s privileges, not just “agent has God mode”

5. No observability or evaluation

Symptoms:

  • You don’t know:

    • Error rates
    • How often humans override the agent
    • Which prompts or models perform better
  • Debugging is based on “someone complained in Slack.”

Needed basics:

  • Central logging of:
    • Inputs, tool calls, outputs (with PII controls)
    • Latency, token usage, model versions
  • Simple evaluation harness:
    • Golden test cases
    • Regression checks when changing prompts/models

Practical playbook (what to do in the next 7 days)

Assume you’re a tech lead / architect tasked with “doing something real with AI automation this quarter.”

Day 1–2: Pick one surgical use case

Criteria:

  • High manual load, low emotional stakes
  • Structured enough that you can outline the steps
  • Touches few systems (1–3 APIs)

Examples that work well:

  • Classifying and routing inbound support tickets and proposing responses
  • Enriching sales leads from public data and internal CRM and suggesting next actions
  • Extracting fields from semi-structured documents (invoices, contracts) into your system of record

Explicitly avoid for v1:

  • Money movement without approvals
  • Anything legally binding (contracts sent without review)
  • Fully “autonomous” processes with no human checkpoints

Day 3: Map the workflow on a whiteboard

Define:

  • Steps and branches:
    • “Receive email → classify intent → pull account context → propose 2–3 actions → human chooses → agent executes”
  • What is deterministic vs fuzzy:
    • Deterministic: checking account status, updating tickets, sending notifications
    • Fuzzy: interpreting user intent, generating text, choosing among allowed actions
  • Where humans must approve or can override

Deliverable: a very boring flowchart with explicit success/failure paths.

Day 4: Define tools (APIs) and contracts

For the chosen workflow:

  • List every external action:
    • “get_ticket(id)”, “update_ticket(id, status)”, “send_email(to, subject, body)”
  • For each:
    • Input schema
    • Output schema
    • Error cases
    • Permission model

Wrap these behind an internal service or SDK that the agent/orchestrator calls, not direct DB access for v1.
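One lightweight way to write these contracts down before any agent code exists is with typed records plus an argument validator at the SDK boundary. Names mirror the examples in this section and stay hypothetical; the `T-` ticket-ID convention is an assumption for the sketch.

```python
from typing import TypedDict, Literal

class Ticket(TypedDict):
    id: str
    status: Literal["open", "pending", "closed"]
    subject: str

class ToolError(TypedDict):
    code: str      # e.g. "NOT_FOUND", "FORBIDDEN", "RATE_LIMITED"
    message: str

# Contracts (return Ticket or ToolError, never raw DB rows):
#   get_ticket(id)                  -> Ticket | ToolError   (idempotent)
#   update_ticket(id, status)       -> Ticket | ToolError   (idempotent)
#   send_email(to, subject, body)   -> {"message_id": str} | ToolError  (NOT idempotent)

def validate_update(ticket_id: str, status: str) -> None:
    """Reject bad arguments at the SDK boundary, before the real system."""
    if not ticket_id.startswith("T-"):  # assumed ID convention for the example
        raise ValueError("ticket_id must look like T-<number>")
    if status not in ("open", "pending", "closed"):
        raise ValueError(f"unknown status: {status}")
```

Marking which operations are idempotent up front tells the orchestrator which ones are safe to retry blindly and which need idempotency keys.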

Day 5: Build a thin orchestration skeleton

In your preferred stack:

  • Implement a simple workflow runner:
    • Given a task, move step-by-step, store state in DB
    • Call tools, handle retries, record logs
  • Add LLM integration:
    • One or two calls where you truly need “intelligence”
    • Provide:
      • System prompt
      • Relevant context (never more than needed)
      • Tool descriptions (name + purpose + parameters)

Run it without auto-execution of side effects: log suggested actions rather than doing them.
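The suggest-only mode is one flag away from auto-execution, which is exactly why it is worth building first. A minimal sketch, with an in-memory log standing in for your real audit store:

```python
# Suggest-only ("dry run") execution: record what the agent *would* do
# instead of doing it. Flipping dry_run=False later enables auto-execution
# on paths you have validated.

suggestion_log: list = []

def execute_or_log(tool_name: str, args: dict, tools: dict, dry_run: bool = True) -> dict:
    if dry_run:
        suggestion_log.append({"tool": tool_name, "args": args})
        return {"status": "suggested"}
    return tools[tool_name](**args)
```

Because every suggestion is logged with its arguments, the same log later feeds your acceptance-rate metrics and your golden test cases.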

Day 6: Put a human in the loop

Ship an internal preview:

  • UI:
    • Show input (ticket/email/etc.)
    • Show agent’s proposed actions or draft response
    • Let human approve, edit, or reject
  • Log:
    • Which suggested actions were taken or overridden
    • Free-text reasons for rejection (optional)

Run this for a small group of power users. Tell them explicitly: “You are supervising a junior assistant, not replacing yourself.”

Day 7: Instrument and define upgrade criteria

Add basic metrics:

  • Automation coverage:
    • % of cases where the agent’s suggestion is accepted with no edits
  • Latency per step
  • Tool failure rates
  • Token usage and model cost per task
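The coverage metric above falls straight out of the human-in-the-loop log. A sketch, assuming each review event records an outcome and whether the human edited the suggestion (the record shape is an assumption):

```python
def automation_coverage(events: list) -> float:
    """Fraction of suggestions accepted verbatim (no human edits)."""
    if not events:
        return 0.0
    accepted = sum(1 for e in events if e["outcome"] == "accepted" and not e["edited"])
    return accepted / len(events)
```

Tracking this number per week gives you an objective trigger for the "suggest-only" to "auto-execute" upgrade below, instead of a gut-feel decision.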

Define thresholds for moving from:

  1. “Suggest-only” → “Auto-execute on low-risk paths with human spot checks”
  2. “Pilot team” → “Broader rollout”

Write down the conditions under which you’d roll back or switch models (e.g., acceptance rate drops below X%).


Bottom line

AI automation in real businesses isn’t about autonomous agents that run everything. It’s about:

  • Using LLMs as flexible interpreters inside well-defined workflows
  • Letting them handle fuzz and glue, while traditional systems own state and side effects
  • Replacing brittle RPA and scripts with something:
    • Easier to adapt as processes change
    • Easier to observe and test
    • Safer to run at scale

If you treat “agents” as magic employees, you’ll get unpredictable systems that are hard to debug.
If you treat them as probabilistic subroutines in an orchestrated workflow, you can ship real value this quarter:

  • Faster response times
  • Fewer manual tickets
  • Lower maintenance cost than sprawling RPA farms

The technology is good enough. The constraint now is engineering discipline: where you draw boundaries, how you enforce guarantees, and whether you build observable, testable systems instead of the next generation of brittle automation.
