
Agentic Lamarckian Evolution - Teaching the AI Ant Colony to Evolve.

  • Writer: Cynthia Unwin
  • 1 day ago
  • 9 min read


Recently, it was suggested that I read a book called Children of Time by Adrian Tchaikovsky. It was an excellent read, and it turned out to be a very timely suggestion, as my big plan for this weekend was to upgrade my AI ants from a colony that works together to produce complex outcomes to a colony that learns from experience, so that it drives better complex outcomes over time.

In this vein, I've been thinking a lot about how to have the AI ants learn. I was originally going to implement a system based on the memory and context management cookbook from Anthropic. I can't use the actual Anthropic tools because my sandbox environment runs gpt-4o, but I could do something similar, and I was just about ready to commit the design to paper (so to speak) and start writing code. I hadn't actually done it yet because I had a niggling sense that I really wanted something different. Something informed by that pattern; something that would build on that pattern; but still something different. I just wasn't sure what.


Then along came Children of Time to provide the missing piece. But it wasn't the ants in Children of Time that inspired me, it was the spiders. I won't give away any more of the plot (really, just go read the book yourself) but the topic of Lamarckian evolution was raised. While this doesn't actually work in biological systems, there is no reason that it can't work in software.



For those who haven't read my previous posts, the short version is this. I'm building an AIOps system where simple LLM-powered agents — ants — collaborate to detect and resolve production incidents. The ants don't communicate with each other directly. They coordinate through a shared blackboard, a stigmergic environment where each ant reads the current state, decides autonomously whether it can contribute, and writes its findings back. No central controller. No orchestrator deciding who goes next. Resolution emerges from the collective contributions, not from a single coordinator directing action.


The ants are organised into types — sensors that observe, actuators that remediate, and assessment ants that correlate. Each ant has a goal (keep the system available), access to specific tools, and a behavioral loop: form a hypothesis about what's happening, take an action using available tools, then validate whether the outcome matched the hypothesis.
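The read-decide-write loop above can be sketched in miniature. This is an illustrative toy, not the actual system: the `Blackboard`, `Entry`, and `Ant` names are invented for this post, and the real ants are LLM-powered rather than a Python callable.

```python
import dataclasses
import time
from typing import Callable, Optional

@dataclasses.dataclass
class Entry:
    """One blackboard entry: who wrote it, what kind, and the payload."""
    author: str
    kind: str            # e.g. "observation", "correlation", "hypothesis"
    payload: dict
    ts: float = dataclasses.field(default_factory=time.time)

class Blackboard:
    """Shared environment: ants only ever read and write here."""
    def __init__(self):
        self.entries: list[Entry] = []
    def write(self, entry: Entry):
        self.entries.append(entry)
    def read(self, kind: Optional[str] = None) -> list[Entry]:
        return [e for e in self.entries if kind is None or e.kind == kind]

class Ant:
    """One behavioural cycle: read state, decide autonomously, write findings."""
    def __init__(self, name: str, reacts_to: str, produces: str,
                 contribute: Callable[[Entry], Optional[dict]]):
        self.name, self.reacts_to, self.produces = name, reacts_to, produces
        self.contribute = contribute
        self.seen: set[int] = set()
    def step(self, board: Blackboard):
        for entry in board.read(self.reacts_to):
            if id(entry) in self.seen:
                continue
            self.seen.add(id(entry))
            finding = self.contribute(entry)   # may decline to participate
            if finding is not None:
                board.write(Entry(self.name, self.produces, finding))
```

There is no orchestrator in this sketch: each ant polls the board, decides on its own whether an entry is relevant, and deposits its contribution for whoever reads next.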

That architecture is working and can even handle cascading failure. What it doesn't do yet is learn.


Every time I restart the ants, they start from scratch with whatever domain knowledge was baked into their system prompt. If a sensor ant spent three days learning that a particular metric pattern correlates with billing pipeline failures 87% of the time, that knowledge dies when the ant is stopped or the history rolls off the blackboard. The next iteration of the sensor ant has to rediscover patterns from scratch.


Why not just give the ants memory?


This was my original plan, and it's the obvious one. They already use the blackboard to retain knowledge and the Anthropic memory cookbook lays out a clean pattern to give an individual ant a file-based memory store, let it write down what it learns, and reload those notes at the start of the next session. It's elegant, it works, and it was what I was going to build.

But there was a problem, and it took me a while to articulate it. The memory cookbook is fundamentally a pattern for individual agent learning. One agent, remembering its own past, applying its own notes to its own future sessions. That's a diary. It's powerful for a single agent that operates across many sessions, but it doesn't quite fit what I need.


The ant colony isn't one agent running across sessions. It's a population of agents running concurrently. I don't need one ant to remember what it learned last week. I need the colony to get smarter over time. I need the knowledge that one ant discovered to flow to other ants — including ants that haven't been spawned yet. I need the population to develop expertise that no single ant could accumulate in its own lifetime.

I need inheritance, not memory. And that's what Tchaikovsky's spiders gave me.



In biology, Lamarck was wrong. Giraffes can't pass on stretched necks. Acquired characteristics don't make it into the DNA. But software agents aren't biological. They can serialize what they've learned. An ant that spent days discovering that connection pool exhaustion follows a specific metric signature can write that knowledge into a structured packet. The next generation of ants can be born knowing it.


This is Lamarckian evolution — the inheritance of acquired characteristics — and it's the architectural pattern that connects the Anthropic memory cookbook's mechanics to the population-level learning I actually need.


The key differences from the Anthropic cookbook pattern are worth being explicit about:

The cookbook is one agent writing notes to its future self. Our pattern is a population where the ant that discovered a correlation is not necessarily the ant that will use it. So, architecturally, we introduce the crystalliser, which consolidates learning, and the spawner, which uses that learning to create the next generation of ants.


The crystalliser extracts knowledge from all the ants at retirement or after a set period of time, the gene pool stores it, and the spawner injects it into new ants at birth. The new ant has never seen the old ant's trials, but it inherits distilled heuristics (not memories).
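To make "distilled heuristics, not memories" concrete, here is a hedged sketch of what crystallisation might look like. The packet schema, thresholds, and trial shape are all my assumptions for illustration, not the system's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ExperiencePacket:
    """What a retiring ant deposits in the gene pool: heuristics, not logs."""
    ant_id: str
    ant_type: str                     # sensor / assessment / actuator
    niche: str                        # e.g. "database", "network"
    fitness: float
    correlations: list = field(default_factory=list)
    anti_patterns: list = field(default_factory=list)

def crystallise(ant_id, ant_type, niche, trials,
                min_support=3, min_confidence=0.6):
    """Distill a trial history (pattern, validated) pairs into generalised
    heuristics. Patterns without enough evidence are simply dropped."""
    buckets = defaultdict(list)
    for pattern, validated in trials:
        buckets[pattern].append(validated)
    correlations, anti_patterns = [], []
    for pattern, outcomes in buckets.items():
        if len(outcomes) < min_support:
            continue                              # too little evidence
        confidence = sum(outcomes) / len(outcomes)
        if confidence >= min_confidence:
            correlations.append({"pattern": pattern,
                                 "confidence": round(confidence, 2)})
        elif confidence <= 1 - min_confidence:
            anti_patterns.append(pattern)         # reliably wrong: a warning
    total = sum(len(o) for o in buckets.values())
    fitness = sum(v for o in buckets.values() for v in o) / max(1, total)
    return ExperiencePacket(ant_id, ant_type, niche, fitness,
                            correlations, anti_patterns)
```

The important property is that the raw trial log never leaves the episodic layer; only the compressed, scored heuristics survive into the gene pool.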


The Anthropic cookbook's memory is accumulative: a pattern written in session 1 persists regardless of whether it helps or hurts in sessions 2 through 100, which is wildly effective for long-running or multi-session agents. Our gene pool, by contrast, can apply selection pressure across a population. Experience packets carry fitness scores derived from an ant's actual operational track record, and knowledge that doesn't prove useful in practice gets outcompeted. Our spawner recombines acquired layers from multiple ancestors — crossover from evolutionary computation — and then applies mutation to prevent the population from converging on local optima.
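Crossover and mutation over acquired layers might look like the following sketch. The packet shape, the "inherit the best-known version of each correlation" rule, and the mutation rate are assumptions for illustration, not the real implementation.

```python
import random

def crossover(parents):
    """Recombine acquired layers from several ancestor packets. Each
    correlation is inherited from whichever parent knows it with the
    highest confidence."""
    merged = {}
    for parent in parents:
        for c in parent["correlations"]:
            key = c["pattern"]
            if key not in merged or c["confidence"] > merged[key]["confidence"]:
                merged[key] = dict(c)
    return {"correlations": list(merged.values())}

def mutate(layer, rng, rate=0.1, jitter=0.1):
    """Randomly perturb some confidences so the population keeps exploring
    instead of converging on a local optimum."""
    for c in layer["correlations"]:
        if rng.random() < rate:
            c["confidence"] = min(1.0, max(0.0,
                c["confidence"] + rng.uniform(-jitter, jitter)))
    return layer
```

A child ant therefore carries a union of its ancestors' heuristics, each at its strongest observed confidence, with a little noise injected on top.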


The cookbook agent is immortal in principle. Our ants have bounded lifetimes with explicit retirement triggers. Without generations, there's no evolutionary pressure. The retirement/crystallisation/spawning cycle is what makes the system genuinely evolutionary rather than just a persistent cache.


That said, the Anthropic cookbook informed two important concrete decisions in our new architecture. First, client-side storage: the gene pool is structured JSON files on disk, owned by the infrastructure, not an event stream. Second, the principle of storing patterns rather than history: the crystalliser extracts distilled heuristics, not logs of the events they were derived from. This has the added benefit of allowing a specific ant colony to evolve independently, in reference to the environment it operates in.


The architecture that is emerging operates across three distinct timescales: stigmergic, episodic, and evolutionary. I think this is a very important insight.


The stigmergic layer operates in seconds. This is the blackboard — the real-time coordination mechanism that already exists. Ants read entries, decide whether to participate, write their findings. An entry about a latency spike appears, a sensor ant reads it, investigates, writes a correlation finding, an assessment ant reads the correlation, synthesises a root cause hypothesis. This is all happening in real time, through the shared environment, with no direct communication between ants. This layer is unchanged.

The episodic layer operates over hours to days. This is an ant's lifetime. Each hypothesis-action-validation cycle produces a scored trial. Over its lifetime, the ant accumulates a track record: which hypotheses were accurate, which tools were effective, which environmental patterns it discovered. This is the raw material for crystallisation. When the ant reaches end-of-life — whether through time-based rotation, a deployment-triggered refresh, fitness degradation, or an explicit human signal — the crystalliser distills its trial history into a structured experience packet and deposits it in the gene pool.
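The episodic layer's bookkeeping can be sketched as a trial log plus a retirement check. The trial fields, trial counts, and accuracy threshold below are invented for illustration; the post names the triggers (time-based rotation, fitness degradation, deployment refresh, human signal) but not their values.

```python
from dataclasses import dataclass, field

@dataclass
class Trial:
    """One hypothesis-action-validation cycle."""
    hypothesis: str
    action: str
    validated: bool        # did the outcome match the hypothesis?

@dataclass
class AntLifetime:
    trials: list = field(default_factory=list)
    max_trials: int = 200              # time-based rotation, roughly

    def record(self, trial: Trial):
        self.trials.append(trial)

    def accuracy(self) -> float:
        return sum(t.validated for t in self.trials) / max(1, len(self.trials))

    def should_retire(self, min_accuracy=0.3, min_evidence=20) -> bool:
        """End-of-life check: rotation by trial count, or fitness degradation
        once there is enough evidence to judge. Deployment-triggered and
        human-signalled retirement would be external inputs here."""
        if len(self.trials) >= self.max_trials:
            return True
        return len(self.trials) >= min_evidence and self.accuracy() < min_accuracy
```

When `should_retire` fires, the crystalliser would consume `trials` and the ant's record would roll off into the gene pool.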

The evolutionary layer operates over weeks to months. This is the gene pool. Crystallised experience packets from retired ants, fitness-ranked, partitioned by domain, available for recombination when spawning new ants. When a new ant needs to be created, the spawner selects multiple high-fitness packets from the relevant niche, recombines their acquired layers, applies mutation, and expresses the result as a concrete system prompt and tool configuration (possibly requiring a human in the loop to build, define, and test new tools). The new ant starts life with its ancestors' distilled wisdom already baked into its prompt and context.
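Parent selection from the partitioned gene pool could be as simple as the sketch below. This is pure elitism within a niche/type partition; the real spawner would presumably also apply the diversity-preservation mechanism mentioned later, and the field names are assumed.

```python
def select_parents(pool, niche, ant_type, k=3):
    """Pick the k fittest experience packets from the matching partition.
    Database sensor ants inherit only from database sensor ancestors."""
    partition = [p for p in pool
                 if p["niche"] == niche and p["type"] == ant_type]
    return sorted(partition, key=lambda p: p["fitness"], reverse=True)[:k]
```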


The three timescales interact but don't interfere. The blackboard handles the present. The episodic layer captures within-lifetime learning. The gene pool preserves cross-generational knowledge. An ant reads both its inherited genome (the slow channel) and the current blackboard state (the fast channel) to determine its behaviour. Different timescales of knowledge, different propagation mechanisms, same shared environment.


For an LLM-based ant, the genome is concrete: it's the system prompt and the tool configuration.


The innate layer is the type template, tool definitions, and base behavioral policy. The acquired layer — learned correlations with confidence scores, tuned thresholds, topology models, anti-patterns — is the inherited experience.


At spawn time, the spawner combines both into a runtime config. Acquired correlations get injected into the system prompt as prior domain knowledge. Anti-patterns get injected as warnings. The new ant starts with better priors than its predecessor.
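Expressing the genome into a prompt might look like this sketch. The template wording and the "AVOID:" convention are my inventions; the point is only that inherited heuristics become literal prompt text.

```python
def express_prompt(base_template, acquired):
    """Render innate template + inherited heuristics into one system prompt.
    Correlations become prior knowledge; anti-patterns become warnings."""
    lines = [base_template, "", "Inherited domain knowledge:"]
    for c in sorted(acquired.get("correlations", []),
                    key=lambda c: -c["confidence"]):
        lines.append(f"- {c['pattern']} (confidence {c['confidence']:.0%})")
    for warning in acquired.get("anti_patterns", []):
        lines.append(f"- AVOID: {warning}")
    return "\n".join(lines)
```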


The truly Lamarckian mechanism is innate layer promotion. When 80%+ of high-fitness ants in a niche independently develop the same correlation, the spawner promotes it from acquired to innate — baking it into the system prompt template for all future ants of that type in that niche. Conversely, tools that no high-fitness ant ever used get pruned from the tool definitions, reducing noise and token cost. The prompt evolves. The tool set evolves. The species template changes based on what individuals learned.
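Innate-layer promotion and tool pruning reduce to counting across the high-fitness population. The packet fields below are assumed, and 80% is the threshold from the text.

```python
from collections import Counter

def promote_to_innate(packets, threshold=0.8):
    """Patterns independently discovered by >= threshold of high-fitness
    packets get promoted into the species template. Each packet counts a
    pattern at most once."""
    counts = Counter()
    for p in packets:
        counts.update({c["pattern"] for c in p["acquired"]["correlations"]})
    n = len(packets)
    return [pattern for pattern, k in counts.items() if k / n >= threshold]

def prune_tools(template_tools, packets):
    """Drop tools that no high-fitness ant ever used, cutting noise and
    token cost from the tool definitions."""
    used = set()
    for p in packets:
        used.update(p.get("tools_used", []))
    return [t for t in template_tools if t in used]
```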


Every ant in the existing system already follows the hypothesis-action-validation loop. Every cycle produces a self-scored trial: hypothesis accuracy, validation outcome, and messaging. That last metric is purely stigmergic: the blackboard tracks which entries get queried and acted on by other ants. A sensor ant whose entries consistently trigger productive assessment responses has high fitness. One whose entries get ignored has low fitness. No direct communication required.


The composite fitness score blends four components: hypothesis accuracy (effectiveness), signal-to-noise ratio (useful blackboard entries vs. total), overall outcome, and improvement over time. That last component matters — an ant that failed early but adapted scores higher than one that was consistently mediocre. We're selecting for adaptability, not just correctness.
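One way to compute that blend, including the improvement term, is sketched below. The weights and the early-vs-late split are illustrative guesses, not values from the actual system.

```python
def composite_fitness(trials, useful_entries, total_entries, outcome_score,
                      weights=(0.35, 0.25, 0.2, 0.2)):
    """Blend the four components: hypothesis accuracy, signal-to-noise,
    overall outcome, and improvement over time.
    trials: chronological list of per-cycle validation results (bools)."""
    accuracy = sum(trials) / max(1, len(trials))
    snr = useful_entries / max(1, total_entries)
    half = len(trials) // 2
    early = sum(trials[:half]) / max(1, half)
    late = sum(trials[half:]) / max(1, len(trials) - half)
    improvement = max(0.0, late - early)   # reward ants that adapted
    w_acc, w_snr, w_out, w_imp = weights
    return (w_acc * accuracy + w_snr * snr
            + w_out * outcome_score + w_imp * improvement)
```

With this shape, an ant that failed early and then adapted outscores one with the same overall accuracy that stayed flat, which is exactly the selection-for-adaptability behaviour described above.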


The gene pool is partitioned by niche and type. Database sensor ants inherit from database sensor ancestors. Network ants from network ancestors. Each population evolves under its own selection pressure, accumulating domain-specific expertise — connection pool behaviour vs. latency propagation vs. memory leak signatures, based on the environment they act in — exactly as biological populations adapt to local environments.


The niche partitioning also means different evolutionary tempos emerge naturally. Stable infrastructure domains converge quickly on effective patterns. Domains under active development face constant environmental disruption, and effectiveness-triggered ant retirement ensures those populations keep adapting rather than ossifying. The evolutionary clock ticks at the rate the environment actually changes.


None of this violates the architectural principles from the original system. The ants still communicate exclusively through the blackboard. No direct inter-ant communication. No central controller. The spawner creates ants but doesn't direct their behaviour — it's infrastructure, not orchestration. The blackboard remains the real-time coordination layer. The gene pool is a separate, slower store operating on a completely different timescale.

The evolutionary layer adds a new capability without replacing or undermining the stigmergic foundation. If anything, it strengthens it. Ants that are better at contributing useful blackboard entries — the ones whose findings trigger productive responses from other ants and support favorable outcomes — have higher fitness and pass their knowledge forward. The evolutionary pressure is aligned with stigmergic effectiveness. Evolution selects for good blackboard citizens.


There are things I don't have answers for yet:


Should the crystalliser use the LLM to distill patterns? The crystalliser needs to cluster similar hypotheses and extract generalized correlations from raw trial data. Using the LLM for this adds intelligence but also adds cost and hallucination risk. The alternative is purely statistical clustering using text embeddings. I'm genuinely undecided.

Should knowledge cross niche boundaries? There might be patterns worth sharing across domains — cascading timeout behavior looks similar whether the origin is in the database layer or the network layer. The architecture has a placeholder for cross-niche correlations, but I haven't worked out the mechanics of when and how to share them without diluting the benefits of niche specialization.

What role will human approval play in innate mutations? When the spawner promotes an acquired pattern into the system prompt template, it's modifying the species template. A bad innate mutation affects every future ant of that type in that niche. The stakes are materially higher than a bad acquired layer entry. We will require human review for innate mutations, at least initially, but their performance really needs to be evaluated against metrics. This is where lower environments become critical.

How do we prevent catastrophic forgetting of rare events? The gene pool pruner removes low-fitness packets to prevent unbounded growth. But what about the knowledge that came from the once-a-year cascading failure? If recent generations have high fitness without it, the pruner might discard it. The diversity preservation mechanism in the selection algorithm is a partial answer, but I suspect I'll also need a dedicated rare event archive.

How do we attribute fitness in collaborative outcomes? When an incident is resolved through multi-ant collaboration — sensor detects, assessment correlates, actuator remediates — each ant gets credit for its own hypothesis accuracy. The downstream activation metric captures some of the collaborative value. But I'm not fully satisfied with the attribution model yet.


I'll be back when I've tested this against real incidents. The code is coming together. What I need now is to measure whether generation five ants actually outperform generation one ants, and see whether the niche partitioning produces the kind of domain-specific specialization I'm expecting.


If the hypothesis-action-validation cycle works as a fitness function — if it produces selection pressure that actually drives improvement — then this is a fundamentally different kind of learning system than anything the current agentic AI frameworks are building. We are not building a single agent that remembers. We are not building an orchestrator that optimizes. We are building a population that evolves.


 
 
 
