The first rule of Agent Driven AIOps
- Cynthia Unwin

- Jan 2
- 4 min read

There are lots of rules for success when it comes to Agent Driven AIOps. Allowing non-deterministic software to take action in your critical environments is high risk. But it's also high reward. So, how do we manage this risk?
There are many layers to the answer, but let's start with something that is obvious yet harder than it looks. Software running in a complex system is affected by circumstances external to its own mechanisms. Complex systems by their nature are full of unseen feedback loops, complex relationships, and hidden influences. Adding non-deterministic agentic AI to this mix only increases that complexity. That's ok, but it needs to be managed.
This is why the first rule of Agent Driven AIOps is that pervasive, automated self-testing of AI processes is not an add-on but an intrinsic part of any agentic system. (Reinforcement learning will play a role here, but we can talk about that another day.)
There are a couple of truths we need to think about. First, you can test your software pre-release and sanity check it at release, but you cannot reasonably expect the performance of non-deterministic software to remain consistent over time in a complex system with changing components and data. Second, the amount of ongoing manual testing required to keep reviewing agentic output in a large and complex environment would overshadow the benefit of its implementation. This means we need reliable, repeatable automated processes that consider actions over a wide topology and time frame (because operational impact is not necessarily local or tightly time-bound).
Because there is not always a clear next action when restoring a complex system to a desired state, and because restoring that state often requires multiple actions, AIOps systems can't always be tested with simple testing strategies. There are four mechanisms we can use to implement ongoing testing of non-deterministic systems, and they need to be used in combination to provide the results we need in an AIOps environment.
Standard deterministic testing: Executing process A with input B should produce output C. We can use this on all the deterministic bits of code within our systems: the tools and the basic application of them. This gives us confidence that if an AI agent takes the right action, it will achieve the intended outcome. I recommend that we continue to run these tests on an ongoing basis as a health check, just as we would on any software system. These tests can include agents or not. For example, we can test that a log writing tool writes a log entry when called, but for a more complete end-to-end experience we can test that when an agent calls the tool it consistently results in the log being properly written. We are not testing here that the agent knows to call the tool, but that when it does call the tool it gets a predictable result across scenarios.
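A minimal sketch of what this could look like in practice, assuming a hypothetical write_log tool and an assumed agent fixture that can call it; this illustrates the pattern rather than prescribing an implementation.

```python
# Sketch of deterministic tool tests. write_log and the `agent` fixture
# are hypothetical names used for illustration.
import json
from pathlib import Path


def write_log(message: str, path: Path) -> None:
    """Hypothetical tool: append a structured log entry to a file."""
    with path.open("a") as f:
        f.write(json.dumps({"message": message}) + "\n")


def test_tool_writes_entry(tmp_path):
    """Deterministic check: process A with input B produces output C."""
    log_file = tmp_path / "ops.log"
    write_log("disk cleanup complete", log_file)
    entries = [json.loads(line) for line in log_file.read_text().splitlines()]
    assert entries == [{"message": "disk cleanup complete"}]


def test_agent_tool_call_is_predictable(tmp_path, agent):
    """End-to-end check: when the agent calls the tool, the log is written.

    We are not testing that the agent chooses to call the tool, only that
    the call produces the expected result. `agent` is an assumed fixture.
    """
    log_file = tmp_path / "ops.log"
    agent.call_tool("write_log", message="disk cleanup complete", path=log_file)
    assert log_file.exists() and "disk cleanup complete" in log_file.read_text()
```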
Hypothesis Testing: Before an agent takes an action, it needs to predict what the outcome of that action is going to be and compare that prediction to the actual outcome. I propose that this be a central feature of any Agent Driven AIOps platform, as it increases transparency and provides a measurable way to judge the accuracy of the agent's understanding of context and situational awareness. This is different from testing for the desired result; this only tests whether the result is consistent with the system's expectation. While no agent, just as no engineer, will produce the expected result every time, there needs to be a threshold beneath which the system self-identifies that it is not behaving within acceptable operational parameters and recuses itself until the root of the problem can be identified.
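One way this could be wired up is as a wrapper around every agent action: record the prediction, take the action, compare, and recuse the agent when rolling accuracy drops below a threshold. The agent interface (predict_outcome, act), the outcome comparison, and the threshold values below are all assumptions for illustration.

```python
# Sketch of hypothesis testing around agent actions. The agent methods
# and the equality-based outcome comparison are assumptions.
from collections import deque


class HypothesisChecker:
    def __init__(self, threshold: float = 0.8, window: int = 50):
        self.threshold = threshold           # minimum acceptable accuracy
        self.results = deque(maxlen=window)  # rolling record of hits/misses
        self.recused = False

    def run(self, agent, action, environment):
        if self.recused:
            raise RuntimeError("Agent recused: prediction accuracy below threshold")

        predicted = agent.predict_outcome(action, environment)  # hypothesis first
        actual = agent.act(action, environment)                 # then the action
        self.results.append(predicted == actual)

        accuracy = sum(self.results) / len(self.results)
        if len(self.results) == self.results.maxlen and accuracy < self.threshold:
            self.recused = True  # stop acting until the root cause is found
        return predicted, actual, accuracy
```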
LLM as a Judge: The system includes random assessment from another agent set that establishes whether it agrees with the analysis and action of the primary system. This is required in addition to hypothesis testing because a system that accurately predicts that its action will have a certain outcome but consistently selects non-optimal actions is still an issue. A system that accurately predicts that deleting the contents of a folder will release space on a drive, and knows which tool to use to do that, is effective in implementing its plans. However, if the correct action was to zip the files and move them to an archive so they remain available for audit purposes, it is still flawed. This test is about correct action. This can be difficult because in real-world scenarios the correct action can be far less obvious than in this example.
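One possible shape for this, sketched under assumptions: decisions are randomly sampled at some rate and sent to an independent judge agent whose review method and verdict format are hypothetical.

```python
# Sketch of random "LLM as a judge" sampling. The judge_agent interface
# and the decision record structure are assumptions.
import random
from dataclasses import dataclass


@dataclass
class Decision:
    analysis: str   # the primary system's reading of the situation
    action: str     # the action it chose to take


def maybe_judge(decision: Decision, judge_agent, sample_rate: float = 0.1):
    """Randomly send a fraction of decisions to an independent judge agent.

    The judge scores whether the chosen action was the *correct* one,
    not merely whether it matched the primary system's own prediction.
    """
    if random.random() > sample_rate:
        return None  # not sampled this time

    verdict = judge_agent.review(
        analysis=decision.analysis,
        action=decision.action,
        question="Was this the correct action, or was a better one available?",
    )
    return verdict  # e.g. {"agrees": False, "preferred_action": "archive files"}
```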
Evolution Testing: Is the system consistently improving based on its growing understanding of an environment, or has it been polluted with noise and now performs less accurately than it did initially? This tests whether a system is building consistently better representations of its environment and is able to use those representations to drive better outcomes. It is less a test of the agents and more a test of the overall system's knowledge synthesis and communication, but it is key to the development of intelligent systems because agents do not operate in silos; they are composite parts of a complex system. (Again, this comes back in many ways to reinforcement learning.)
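A rough sketch of how evolution testing might look: replay a fixed benchmark of past incidents and compare the system's current score against the baseline recorded at initial deployment. The scenario set, the scoring function, and the system's decide interface are assumptions for illustration.

```python
# Sketch of evolution testing against a fixed benchmark and a recorded
# baseline. All names and interfaces here are hypothetical.
from statistics import mean


def score_action(chosen, known_good) -> float:
    """Hypothetical scorer: 1.0 for a match with the known-good action."""
    return 1.0 if chosen == known_good else 0.0


def evolution_check(system, scenarios, baseline_score: float, margin: float = 0.02):
    """Return (current_score, regressed) for a fixed benchmark of scenarios.

    `scenarios` is a list of (situation, known_good_action) pairs drawn
    from past incidents; regression means the system's environmental
    knowledge has been polluted rather than improved.
    """
    scores = []
    for situation, known_good_action in scenarios:
        chosen = system.decide(situation)  # assumed interface
        scores.append(score_action(chosen, known_good_action))

    current = mean(scores)
    regressed = current < baseline_score - margin
    return current, regressed
```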
Regular human analysis of whether the system is providing the value it was designed to provide is also important, because correct and valuable aren't always synonymous. As we move to more and more intelligent AIOps systems, having better and better ways to maintain transparency and resilience becomes more important. We need to build the foundations for the systems we will have: systems that are intelligent, autonomous, and pervasive. We need to do that now because, as Gall's law reminds us, "A complex system that works is invariably found to have evolved from a simple system that worked."



