AI agents keep failing. The fix is 40 years old.

Code examples in this post are available in five languages. Pick yours:

The pattern I keep seeing

An agent reads a function that takes a list and returns a list. It writes tests. They pass. The function fails in production because it depends on a global config and a database singleton the signature never declared. The agent had no way to know. This isn’t a model problem. Functional programmers solved it in the 1980s.

I’ve shipped AI products for over a decade, and the trajectory is always the same: impressive demo, promising pilot, gradual degradation, debugging nightmare, project abandoned. Most agent projects never make it to production. The ones that do often get rolled back within a year. MIT found 95% of AI pilots fail to deliver ROI. The instinct is to blame the models. “GPT-5 will fix it” or “we need better prompts.” The failures are architectural.

When an agent writes code into a mutable, tightly-coupled codebase, it’s producing non-deterministic output that depends on hidden state it can’t see. The global config object three modules away, the function that logs to disk as a side effect, the test that was mocking a database that behaves differently in production: the agent has no way to know about any of it.

The codebase is hostile to automation, and we keep blaming the agent.

Why agents need different code

A human developer builds a mental model of a codebase over months. They know where the bodies are buried: which functions mutate state, which modules share globals, which tests are flaky. They carry this context between sessions.

Agents don’t have that luxury. Every session starts from scratch. An agent reads the code that’s in front of it, follows the explicit contracts, and produces output based on what it can verify. This means anything implicit, any hidden state, any side effect buried inside a “pure” function, becomes a trap.

Here’s a function that looks fine to a human:

function evaluate_options(options):
    threshold = GLOBAL_CONFIG["decision_threshold"]   // hidden dependency
    history = Database.get_instance().get_history()    // hidden dependency
    return options where option.score > threshold

def evaluate_options(options):
    threshold = GLOBAL_CONFIG["decision_threshold"]
    history = Database.get_instance().get_history()
    return [opt for opt in options if opt.score > threshold]

function evaluateOptions(options: Option[]): Option[] {
  const threshold = GLOBAL_CONFIG.decisionThreshold;
  const history = Database.getInstance().getHistory();
  return options.filter((opt) => opt.score > threshold);
}

func EvaluateOptions(options []Option) []Option {
    threshold := globalConfig.DecisionThreshold
    history := db.GetInstance().GetHistory()
    var result []Option
    for _, opt := range options {
        if opt.Score > threshold {
            result = append(result, opt)
        }
    }
    return result
}

fn evaluate_options(options: &[Option]) -> Vec<Option> {
    let threshold = GLOBAL_CONFIG.lock().unwrap().decision_threshold;
    let history = Database::instance().get_history();
    options.iter()
        .filter(|opt| opt.score > threshold)
        .cloned()
        .collect()
}

A developer on the team knows that config gets loaded from a YAML file at startup and the database accessor is a singleton that needs initialization. An agent sees a function that takes a list and returns a list. It writes tests against that contract, the tests pass in isolation, and the function fails in production because the global config wasn’t loaded.

Now multiply this across a codebase with hundreds of these hidden dependencies. Every function the agent touches has an invisible blast radius. Every change it makes can break something in a module it never read. This is why agent projects degrade: each iteration introduces subtle state corruption that compounds.

The agent sees inputs and outputs. The hidden dependencies are invisible.

The fix is forty years old

Functional programming solves these problems because it was designed to eliminate exactly the properties that make code hostile to automated reasoning. This isn’t a new insight. ML researchers have known since the 1980s that referentially transparent code is easier for machines to analyze, optimize, and transform. We just haven’t applied the lesson to the agents writing our code.

The principles are straightforward:

Pure functions return the same output for the same input, with no global state, database calls, or logging inside the function body. An agent can test a pure function by calling it with no setup or mocking required.

Explicit data flow means you can trace how inputs become outputs by reading the code linearly, without action-at-a-distance or mutations happening in a callback three layers deep. An agent can follow the data pipeline and understand what each step does.

Side effects at the boundaries means I/O, database access, and external API calls happen in a thin outer layer. The core logic is deterministic. An agent can rewrite core logic without worrying about accidentally triggering a payment or sending an email.

Composition over coupling means small functions that snap together like Lego bricks. An agent can replace one function without understanding the entire module graph.

This isn’t about purity for its own sake. I don’t care about monads or category theory. I care that when an agent modifies a function, the scope of possible breakage is exactly one function.

SUPER: five principles for agent-friendly code

I put these into an acronym because that’s how principles survive in organizations. Hover any term below for the full definition.

SUPER is five constraints on how you write code. Side Effects at the Edge means I/O happens in a thin outer layer, never inside business logic. Uncoupled Logic means dependencies are passed in, never pulled from globals. Pure & Total Functions means deterministic functions that handle every input. Explicit Data Flow means you can trace data linearly from input to output. Replaceable by Value means any expression can be swapped with its computed result.

The practical effect: an agent working on SUPER-compliant code can modify any function by reading only that function and its type signature. No hidden state to trace, no global config to discover, no side effects to accidentally trigger. Here’s what that looks like on a real function:

Before: the evaluate_options function from earlier, with its hidden dependencies.

function evaluate_options(options):
    threshold = GLOBAL_CONFIG["decision_threshold"]   // hidden
    history = Database.get_instance().get_history()    // hidden
    log("evaluating " + options.length + " options")   // side effect
    return options where option.score > threshold

def evaluate_options(options):
    threshold = GLOBAL_CONFIG["decision_threshold"]   # hidden
    history = Database.get_instance().get_history()    # hidden
    logger.info(f"evaluating {len(options)} options")  # side effect
    return [opt for opt in options if opt.score > threshold]

function evaluateOptions(options: Option[]): Option[] {
  const threshold = GLOBAL_CONFIG.decisionThreshold;   // hidden
  const history = Database.getInstance().getHistory();  // hidden
  console.log(`evaluating ${options.length} options`);  // side effect
  return options.filter((opt) => opt.score > threshold);
}

func EvaluateOptions(options []Option) []Option {
    threshold := globalConfig.DecisionThreshold        // hidden
    history := db.GetInstance().GetHistory()            // hidden
    log.Printf("evaluating %d options", len(options))  // side effect
    var result []Option
    for _, opt := range options {
        if opt.Score > threshold {
            result = append(result, opt)
        }
    }
    return result
}

fn evaluate_options(options: &[Option]) -> Vec<Option> {
    let threshold = GLOBAL_CONFIG.lock().unwrap().decision_threshold; // hidden
    let history = Database::instance().get_history();                 // hidden
    log::info!("evaluating {} options", options.len());               // side effect
    options.iter()
        .filter(|opt| opt.score > threshold)
        .cloned()
        .collect()
}

An agent writing tests for this function will miss the config dependency, the database singleton, and the logger. The tests pass in isolation. The function fails in production.

After: the same logic, SUPER-compliant. Dependencies are parameters. I/O is the caller’s job. Every input is explicit.

// pure: all inputs explicit, no side effects
function evaluate_options(options, threshold) -> list:
    return options where option.score > threshold

// edge: caller handles I/O and config
threshold = config.get("decision_threshold")
result = evaluate_options(orders, threshold)
log("evaluated " + result.length + " options")

# pure: all inputs explicit, no side effects
def evaluate_options(options: list, threshold: float) -> list:
    return [opt for opt in options if opt.score > threshold]

# edge: caller handles I/O and config
threshold = config.get("decision_threshold")
result = evaluate_options(orders, threshold)
logger.info(f"evaluated {len(result)} options")

// pure: all inputs explicit, no side effects
function evaluateOptions(options: Option[], threshold: number): Option[] {
  return options.filter((opt) => opt.score > threshold);
}

// edge: caller handles I/O and config
const threshold = config.get("decisionThreshold");
const result = evaluateOptions(orders, threshold);
console.log(`evaluated ${result.length} options`);

// pure: all inputs explicit, no side effects
func EvaluateOptions(options []Option, threshold float64) []Option {
    var result []Option
    for _, opt := range options {
        if opt.Score > threshold {
            result = append(result, opt)
        }
    }
    return result
}

// edge: caller handles I/O and config
threshold := config.Get("decision_threshold")
result := EvaluateOptions(orders, threshold)
log.Printf("evaluated %d options", len(result))

// pure: all inputs explicit, no side effects
fn evaluate_options(options: &[Option], threshold: f64) -> Vec<Option> {
    options.iter()
        .filter(|opt| opt.score > threshold)
        .cloned()
        .collect()
}

// edge: caller handles I/O and config
let threshold = config.get("decision_threshold");
let result = evaluate_options(&orders, threshold);
log::info!("evaluated {} options", result.len());

The agent can now test evaluate_options by calling it with a list and a number. No mocking, no setup, no teardown. If the function is wrong, the agent sees it immediately. If it’s right, it stays right regardless of what the rest of the codebase does. The blast radius of any change is exactly one function.

Deep dive: each principle in practice

The acronym is easy to remember. Knowing when you’re violating each principle is harder. Here’s what each one looks like in real code, with the specific failure mode it prevents.

Side Effects at the Edge

A function that sends a notification inside business logic means every test, every agent run, and every dry run triggers a real notification. Move the side effect to the caller. The function computes what to send; the boundary layer sends it.

// bad: side effect buried in logic
function process_order(order):
    total = calculate_total(order.items)
    send_email(order.user, "Your total is " + total)  // fires every call
    return total

// good: pure core, I/O at the edge
function process_order(order) -> (total, email):
    total = calculate_total(order.items)
    email = make_email(order.user, "Your total is " + total)
    return (total, email)

// caller decides when to send
(total, email) = process_order(order)
send_email(email)

# bad: side effect buried in logic
def process_order(order):
    total = calculate_total(order.items)
    send_email(order.user, f"Your total is {total}")  # fires every call
    return total

# good: pure core, I/O at the edge
def process_order(order) -> tuple[float, Email]:
    total = calculate_total(order.items)
    email = make_email(order.user, f"Your total is {total}")
    return total, email

# caller decides when to send
total, email = process_order(order)
send_email(email)

// bad: side effect buried in logic
function processOrder(order: Order): number {
  const total = calculateTotal(order.items);
  sendEmail(order.user, `Your total is ${total}`);  // fires every call
  return total;
}

// good: pure core, I/O at the edge
function processOrder(order: Order): { total: number; email: Email } {
  const total = calculateTotal(order.items);
  const email = makeEmail(order.user, `Your total is ${total}`);
  return { total, email };
}

// caller decides when to send
const { total, email } = processOrder(order);
sendEmail(email);

// bad: side effect buried in logic
func ProcessOrder(order Order) float64 {
    total := CalculateTotal(order.Items)
    SendEmail(order.User, fmt.Sprintf("Your total is %.2f", total)) // fires every call
    return total
}

// good: pure core, I/O at the edge
func ProcessOrder(order Order) (float64, Email) {
    total := CalculateTotal(order.Items)
    email := MakeEmail(order.User, fmt.Sprintf("Your total is %.2f", total))
    return total, email
}

// caller decides when to send
total, email := ProcessOrder(order)
SendEmail(email)

// bad: side effect buried in logic
fn process_order(order: &Order) -> f64 {
    let total = calculate_total(&order.items);
    send_email(&order.user, &format!("Your total is {total}"));  // fires every call
    total
}

// good: pure core, I/O at the edge
fn process_order(order: &Order) -> (f64, Email) {
    let total = calculate_total(&order.items);
    let email = make_email(&order.user, &format!("Your total is {total}"));
    (total, email)
}

// caller decides when to send
let (total, email) = process_order(&order);
send_email(&email);

The agent can test process_order a thousand times without sending a single email. When something breaks, you know it’s the computation, not the network.

Uncoupled Logic

A function that imports a module to get its dependencies is married to that module. Pass dependencies as arguments instead. This lets an agent swap implementations for testing without touching the import graph.

// bad: coupled to a specific cache implementation
function get_user(id):
    cached = RedisCache.get("user:" + id)      // locked to Redis
    if cached: return cached
    user = UserDB.find(id)                      // locked to this DB
    RedisCache.set("user:" + id, user)
    return user

// good: dependencies are parameters
function get_user(id, cache, db):
    cached = cache.get("user:" + id)
    if cached: return cached
    user = db.find(id)
    cache.set("user:" + id, user)
    return user

# bad: coupled to a specific cache implementation
def get_user(id: str) -> User:
    cached = RedisCache.get(f"user:{id}")       # locked to Redis
    if cached:
        return cached
    user = UserDB.find(id)                      # locked to this DB
    RedisCache.set(f"user:{id}", user)
    return user

# good: dependencies are parameters
def get_user(id: str, cache: Cache, db: DB) -> User:
    cached = cache.get(f"user:{id}")
    if cached:
        return cached
    user = db.find(id)
    cache.set(f"user:{id}", user)
    return user

// bad: coupled to a specific cache implementation
function getUser(id: string): User {
  const cached = RedisCache.get(`user:${id}`);  // locked to Redis
  if (cached) return cached;
  const user = UserDB.find(id);                 // locked to this DB
  RedisCache.set(`user:${id}`, user);
  return user;
}

// good: dependencies are parameters
function getUser(id: string, cache: Cache, db: DB): User {
  const cached = cache.get(`user:${id}`);
  if (cached) return cached;
  const user = db.find(id);
  cache.set(`user:${id}`, user);
  return user;
}

// bad: coupled to a specific cache implementation
func GetUser(id string) User {
    cached := redisCache.Get("user:" + id)       // locked to Redis
    if cached != nil {
        return cached
    }
    user := userDB.Find(id)                      // locked to this DB
    redisCache.Set("user:"+id, user)
    return user
}

// good: dependencies are parameters
func GetUser(id string, cache Cache, db DB) User {
    cached := cache.Get("user:" + id)
    if cached != nil {
        return cached
    }
    user := db.Find(id)
    cache.Set("user:"+id, user)
    return user
}

// bad: coupled to a specific cache implementation
fn get_user(id: &str) -> User {
    if let Some(cached) = REDIS_CACHE.get(&format!("user:{id}")) {  // locked to Redis
        return cached;
    }
    let user = USER_DB.find(id);                                     // locked to this DB
    REDIS_CACHE.set(&format!("user:{id}"), &user);
    user
}

// good: dependencies are parameters
fn get_user(id: &str, cache: &dyn Cache, db: &dyn DB) -> User {
    if let Some(cached) = cache.get(&format!("user:{id}")) {
        return cached;
    }
    let user = db.find(id);
    cache.set(&format!("user:{id}"), &user);
    user
}

An agent testing the good version passes in a hash map as the cache and a list as the database. No Redis, no connection strings, no Docker containers.

Pure & Total Functions

A function that throws on unexpected input is a function that’s lying about its return type. A total function handles every case. Agents can’t catch exceptions they don’t know about; they can read a return type that says “this might fail.”

// bad: partial function, throws on bad input
function parse_age(input):
    n = to_integer(input)
    if n is null: throw "not a number"
    if n < 0:    throw "negative age"
    if n > 150:  throw "unrealistic age"
    return n

// good: total function, returns a result for every input
function parse_age(input) -> result:
    n = to_integer(input)
    if n is null: return error("not a number: " + input)
    if n < 0:    return error("negative age: " + n)
    if n > 150:  return error("unrealistic age: " + n)
    return ok(n)

# bad: partial function, throws on bad input
def parse_age(input: str) -> int:
    n = int(input)          # raises ValueError
    if n < 0:
        raise ValueError("negative age")
    if n > 150:
        raise ValueError("unrealistic age")
    return n

# good: total function, returns a result for every input
def parse_age(input: str) -> int | str:
    try:
        n = int(input)
    except ValueError:
        return f"not a number: {input}"
    if n < 0:
        return f"negative age: {n}"
    if n > 150:
        return f"unrealistic age: {n}"
    return n

// bad: partial function, throws on bad input
function parseAge(input: string): number {
  const n = parseInt(input, 10);
  if (isNaN(n)) throw new Error("not a number");
  if (n < 0)    throw new Error("negative age");
  if (n > 150)  throw new Error("unrealistic age");
  return n;
}

// good: total function, returns a result for every input
function parseAge(input: string): { ok: true; value: number } | { ok: false; error: string } {
  const n = parseInt(input, 10);
  if (isNaN(n)) return { ok: false, error: `not a number: ${input}` };
  if (n < 0)    return { ok: false, error: `negative age: ${n}` };
  if (n > 150)  return { ok: false, error: `unrealistic age: ${n}` };
  return { ok: true, value: n };
}

// bad: panics on bad input
func ParseAge(input string) int {
    n, err := strconv.Atoi(input)
    if err != nil {
        panic("not a number")
    }
    if n < 0 {
        panic("negative age")
    }
    if n > 150 {
        panic("unrealistic age")
    }
    return n
}

// good: total function, returns a result for every input
func ParseAge(input string) (int, error) {
    n, err := strconv.Atoi(input)
    if err != nil {
        return 0, fmt.Errorf("not a number: %s", input)
    }
    if n < 0 {
        return 0, fmt.Errorf("negative age: %d", n)
    }
    if n > 150 {
        return 0, fmt.Errorf("unrealistic age: %d", n)
    }
    return n, nil
}

// bad: panics on bad input
fn parse_age(input: &str) -> u32 {
    let n: i32 = input.parse().expect("not a number");
    if n < 0   { panic!("negative age"); }
    if n > 150 { panic!("unrealistic age"); }
    n as u32
}

// good: total function, returns a result for every input
fn parse_age(input: &str) -> Result<u32, String> {
    let n: i32 = input.parse().map_err(|_| format!("not a number: {input}"))?;
    if n < 0   { return Err(format!("negative age: {n}")); }
    if n > 150 { return Err(format!("unrealistic age: {n}")); }
    Ok(n as u32)
}

The agent reads the total version’s return type and knows it can fail. It writes tests for both paths. The partial version’s return type says int, so the agent writes tests that assume success, and the first bad input crashes production.

Explicit Data Flow

When data moves through nested callbacks or mutates an object across multiple methods, an agent can’t follow the pipeline. Linear data flow, where each step takes input and returns output, is readable by both humans and machines.

// bad: mutation spread across methods
request = new Request()
request.set_headers(auth_headers)
request.set_body(payload)
request.add_retry_policy(3)
request.sign(secret_key)
response = request.send()

// good: linear pipeline, each step returns new data
headers  = add_auth(default_headers, token)
body     = serialize(payload)
signed   = sign(headers, body, secret_key)
response = send(signed, retries=3)

# bad: mutation spread across methods
request = Request()
request.set_headers(auth_headers)
request.set_body(payload)
request.add_retry_policy(3)
request.sign(secret_key)
response = request.send()

# good: linear pipeline, each step returns new data
headers  = add_auth(default_headers, token)
body     = serialize(payload)
signed   = sign(headers, body, secret_key)
response = send(signed, retries=3)

// bad: mutation spread across methods
const request = new Request();
request.setHeaders(authHeaders);
request.setBody(payload);
request.addRetryPolicy(3);
request.sign(secretKey);
const response = request.send();

// good: linear pipeline, each step returns new data
const headers  = addAuth(defaultHeaders, token);
const body     = serialize(payload);
const signed   = sign(headers, body, secretKey);
const response = send(signed, { retries: 3 });

// bad: mutation spread across methods
request := NewRequest()
request.SetHeaders(authHeaders)
request.SetBody(payload)
request.AddRetryPolicy(3)
request.Sign(secretKey)
response := request.Send()

// good: linear pipeline, each step returns new data
headers  := AddAuth(defaultHeaders, token)
body     := Serialize(payload)
signed   := Sign(headers, body, secretKey)
response := Send(signed, 3)

// bad: mutation spread across methods
let mut request = Request::new();
request.set_headers(auth_headers);
request.set_body(payload);
request.add_retry_policy(3);
request.sign(&secret_key);
let response = request.send();

// good: linear pipeline, each step returns new data
let headers  = add_auth(&default_headers, &token);
let body     = serialize(&payload);
let signed   = sign(&headers, &body, &secret_key);
let response = send(&signed, 3);

In the mutation version, the agent has to read the Request class to know what sign does to the internal state. In the pipeline version, sign takes three values and returns a new value. The agent can test it in isolation.

Replaceable by Value

If you can swap a function call with its return value and the program still behaves the same, that function is referentially transparent. This property lets agents cache results, skip redundant computation, and reason about code by substitution.

// bad: not replaceable by value (side effect)
function discount_price(price):
    result = price * 0.9
    Analytics.track("discount_applied", result)   // side effect
    return result

total = discount_price(100) + discount_price(100)
// can't replace with: total = 90 + 90
// because that skips two analytics events

// good: replaceable by value
function discount_price(price):
    return price * 0.9

total = discount_price(100) + discount_price(100)
// identical to: total = 90 + 90

# bad: not replaceable by value (side effect)
def discount_price(price: float) -> float:
    result = price * 0.9
    analytics.track("discount_applied", result)   # side effect
    return result

total = discount_price(100) + discount_price(100)
# can't replace with: total = 90 + 90
# because that skips two analytics events

# good: replaceable by value
def discount_price(price: float) -> float:
    return price * 0.9

total = discount_price(100) + discount_price(100)
# identical to: total = 90 + 90

// bad: not replaceable by value (side effect)
function discountPrice(price: number): number {
  const result = price * 0.9;
  analytics.track("discount_applied", result);   // side effect
  return result;
}

let total = discountPrice(100) + discountPrice(100);
// can't replace with: total = 90 + 90
// because that skips two analytics events

// good: replaceable by value
function discountPrice(price: number): number {
  return price * 0.9;
}

let total = discountPrice(100) + discountPrice(100);
// identical to: total = 90 + 90

// bad: not replaceable by value (side effect)
func DiscountPrice(price float64) float64 {
    result := price * 0.9
    analytics.Track("discount_applied", result)   // side effect
    return result
}

total := DiscountPrice(100) + DiscountPrice(100)
// can't replace with: total = 90 + 90
// because that skips two analytics events

// good: replaceable by value
func DiscountPrice(price float64) float64 {
    return price * 0.9
}

total := DiscountPrice(100) + DiscountPrice(100)
// identical to: total = 90 + 90

// bad: not replaceable by value (side effect)
fn discount_price(price: f64) -> f64 {
    let result = price * 0.9;
    analytics::track("discount_applied", result);   // side effect
    result
}

let total = discount_price(100.0) + discount_price(100.0);
// can't replace with: let total = 90.0 + 90.0;
// because that skips two analytics events

// good: replaceable by value
fn discount_price(price: f64) -> f64 {
    price * 0.9
}

let total = discount_price(100.0) + discount_price(100.0);
// identical to: let total = 90.0 + 90.0;

When every function is replaceable by its value, an agent can reason about your code algebraically. It can inline, extract, reorder, and cache without fear of changing behavior. That’s the foundation the other four principles build toward.

SPIRALS: a process loop for human-agent collaboration

SUPER handles the code. But agents also need a structured process, or they drift. Anyone who’s watched Auto-GPT burn through API credits in an infinite loop knows what unstructured agent autonomy looks like.

SPIRALS is a seven-step loop that I run agents through on every task. It’s not a waterfall; it’s a tight cycle, often sub-minute, that keeps agents focused and gives humans natural checkpoints to intervene.

Sense

Gather context: read the relevant files, check git status, identify what already exists. Agents that skip this step rebuild things that already work.

Plan

Draft an approach, consider trade-offs, and define what “done” looks like. The human validates before any code gets written.

Inquire

Identify gaps in knowledge. What assumptions is the agent making? What doesn’t it know? This prevents the confident hallucination problem where an agent barrels ahead on wrong assumptions.

Refine

Simplify the plan. Apply the 80/20 rule. If a ticket is bigger than 3 story points, split it. Complexity gets killed here, before it enters the codebase.

Act

Write the code, following SUPER principles, as small bounded changes with tests alongside.

Learn

Run the tests and check the output. If something failed, the agent records what specifically went wrong for the next iteration.

Scan

The step Auto-GPT never had. The agent zooms out, looks for duplication, new risks, and things the change might have broken elsewhere. This is why Auto-GPT looped forever: it never checked whether it was actually making progress.

The seven steps split into two phases:

\underbrace{\textsf{S} \cdot \textsf{P} \cdot \textsf{I} \cdot \textsf{R}}_{\text{plan}} \;\Big|\; \underbrace{\textsf{A} \cdot \textsf{L} \cdot \textsf{S}}_{\text{execute}}

In practice, I run these as two separate commands. The planning phase (Sense, Plan, Inquire, Refine) produces design docs, tickets, and a burndown. A human reviews and approves. Only then does the execution phase (Act, Learn, Scan) start, and it runs per-ticket: write the code, verify it works, check for regressions, commit, move to the next ticket. The gate between SPIR and ALS is the only point where I require human approval. Everything else, the agent handles.

The SPIRALS loop: each iteration cycles through all seven steps until Scan confirms the goal is met.

repeat up to MAX_LOOPS times:
    context  = sense()
    plan     = make_plan(context, goal)
    gap      = inquire(plan)
    if gap exists:
        context = context + fetch(gap)
        plan    = refine(plan, context)
    outcome  = act(plan)
    context  = learn(outcome, context)
    if scan(context, goal) passes:
        stop

for _ in range(max_loops):
    ctx  = sense()
    plan = plan(ctx, goal)
    if gap := inquire(plan):
        ctx.update(fetch(gap))
        plan = refine(plan, ctx)
    outcome = act(plan)
    ctx     = learn(outcome, ctx)
    if scan(ctx, goal):
        break

for (let i = 0; i < maxLoops; i++) {
  let ctx = sense();
  let plan = makePlan(ctx, goal);
  const gap = inquire(plan);
  if (gap) {
    ctx = { ...ctx, ...fetch(gap) };
    plan = refine(plan, ctx);
  }
  const outcome = act(plan);
  ctx = learn(outcome, ctx);
  if (scan(ctx, goal)) break;
}

for i := 0; i < maxLoops; i++ {
    ctx := Sense()
    plan := Plan(ctx, goal)
    if gap := Inquire(plan); gap != nil {
        ctx.Merge(Fetch(gap))
        plan = Refine(plan, ctx)
    }
    outcome := Act(plan)
    ctx = Learn(outcome, ctx)
    if Scan(ctx, goal) {
        break
    }
}

for _ in 0..max_loops {
    let mut ctx = sense();
    let mut plan = make_plan(&ctx, &goal);
    if let Some(gap) = inquire(&plan) {
        ctx.merge(fetch(&gap));
        plan = refine(plan, &ctx);
    }
    let outcome = act(&plan);
    ctx = learn(outcome, ctx);
    if scan(&ctx, &goal) {
        break;
    }
}

The loop terminates when Scan confirms the goal is met. If it doesn’t converge, Scan flags it and a human decides what to do next, so you don’t wake up to an infinite loop that burned through your API budget overnight.

Why they work together

SUPER without SPIRALS gives you clean code with no process. The agent writes a perfect function, then writes nine more that weren’t needed. Or it refactors something that didn’t need refactoring. Discipline in the code means nothing without discipline in the workflow.

SPIRALS without SUPER gives you a structured process applied to a messy codebase. The agent follows all seven steps, but the Act step produces code with hidden dependencies that corrupt on the next iteration. The loop degrades because the underlying code can’t support reliable automated modification.

Together:

Side effects at the edge means only the Act step touches the real world. Sense, Plan, Inquire, and Refine are pure reasoning, safe to retry and cheap to test.
Uncoupled logic means each SPIRALS step can be its own module or its own agent. You can swap in a better planner without rewiring the system.
Purity means Plan and Refine are deterministic. Same input state, same plan. You can reproduce bugs by replaying inputs.
Explicit data flow means you can trace exactly what happened at each step. When something goes wrong at minute 47 of a long run, you read the log linearly and find it.
Referential transparency means intermediate results are cacheable. If Sense returns the same context, skip to Plan.

What this looks like in practice

I use SUPER and SPIRALS on every project now. This website, Unfudged, Intraview, all of it.

The concrete difference: agents working on SUPER-compliant code produce changes that pass tests on the first try about 3x more often than agents working on typical imperative code with global state. I don’t have a rigorous study for this; it’s what I’ve observed across projects over the past year. The debugging time drops even more because when something does fail, the failure is local to one function, not spread across a graph of shared state.

The process difference with SPIRALS: agents used to require heavy babysitting, where I’d check every output and try to catch hallucinations before they landed. With SPIRALS, the Scan step catches most regressions before I see them. I review at the Plan and Learn steps and skip the rest unless Scan flags something. My involvement per task dropped from continuous to two checkpoints.

Neither framework requires rewriting your codebase from scratch. Start with SUPER’s “S”: move side effects out of your three most-modified modules. That alone makes agent modifications safer. Add the Scan step to your agent workflows. You’ll catch the infinite loops and the confident-but-wrong outputs before they cost you.

Both frameworks are in my CLAUDE.md files, so every agent I work with follows them from the first prompt.

Where to start

You don’t need to rewrite your codebase. Pick one module and work through these five steps.

Find your three most-modified modules

Run git log --format=format: --name-only | sort | uniq -c | sort -rn | head -20. The files your team touches most are where hidden dependencies cause the most damage. Start there.

Move the side effects out

Find every function in those modules that reads global config, hits a database, writes a log, or calls an external API. Pull that I/O into the caller. The function’s job is to compute; the caller’s job is to interact with the world.

Make dependencies explicit

Every value a function needs should be in its parameter list. If a function reaches into a singleton or ambient context, add the parameter and pass it in. The function signature should be the complete contract.

Add the Scan step

After your agent completes a change, have it zoom out: check for duplication, look for things the change might have broken elsewhere, and verify the goal is actually met. This is the step that prevents infinite loops and confident-but-wrong outputs.

Measure the difference

Run your agent against the refactored code. Count how often the tests pass on the first try compared to before. If the architecture is right, you’ll see it in the numbers.

The industry is moving toward more agent autonomy, not less. If your code can’t be reasoned about by a machine, no amount of model improvement will save you.

The fix has been in your CS textbook for forty years. The agents just made it urgent.

The pattern I keep seeing

Why agents need different code

The fix is forty years old

SUPER: five principles for agent-friendly code

Deep dive: each principle in practice

SPIRALS: a process loop for human-agent collaboration

Why they work together

What this looks like in practice

Where to start

Related Posts

I built a tech debt simulation. At 10x coding speed, feature output approaches zero.

Stop reading walls of text from your AI agent

Pretext is a text measurement library. The most interesting use cases have no DOM at all.