Things I probably should have known...



What I learned building an agentic ant colony.
Photo by Jorge Coromina on Unsplash Over the past week or so I have been working on coding an agent ant colony that restores service to a running application. The agents don't do complex RCA, log tickets, interact with engineers, etc. They just keep the application up and running. It was definitely fun, and I learned some interesting and useful things. What I built: A simple Python-based web ordering application with a front end supported by two micro-services each with

Cynthia Unwin
Jan 24 · 7 min read


Agents: How do we know they work?
Photo by Dean Pugh on Unsplash Agentic platforms are everywhere and we are pushing forward to use more and more AI-driven software. As Site Reliability Engineers we need to really think about what it means to run diverse agent platforms at scale. We need to think about what needs to be in place to make them manageable. How do we know right now that our agents are working? What do we need to see in the logs to troubleshoot when they don't? What data needs to be gathered a

Cynthia Unwin
Jan 15 · 7 min read


Framing the Problem
Photo by Gaspar Uhas on Unsplash Following through on the fundamental assumption that the key to solving a problem is understanding what that problem is and being able to ask the right questions about that problem to expose how to create a solution, it's time to step back and take a quick review of what it is that we, the AI Enabled SRE community, are talking about when we discuss AIOps. When we look at IT Operations through the lens of how we implement AIOps we deal wi

Cynthia Unwin
Jan 8 · 4 min read


Asking the right question
Photo by Camylla Battani on Unsplash "If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." Albert Einstein I've been thinking about Bas Pluim's comment from my post the other day. It isn't a new thought but it is a really important one. As I look back on my career, it is clear how much time we (as an industry) spend solving the wrong problem and Bas's comment about not just achieving a goal but taking

Cynthia Unwin
Jan 4 · 4 min read


The first rule of Agent Driven AIOps
Photo by Raffaele Parente on Unsplash There are lots of rules for success when it comes to Agent Driven AIOps. Allowing non-deterministic software to take action in your critical environments is high risk. But it's also high reward. So, how do we manage this risk? There are a lot of layers to the answer to this question but let's start with something that is obvious, but is harder than it looks. Software running in a complex system is affected by circumstances external t

Cynthia Unwin
Jan 2 · 4 min read


It's really a search problem...
Photo by ün LIU on Unsplash Or more specifically it's a knowledge synthesis problem. Building AI agents or agent teams for AIOps systems isn't hard. Even if you build them from scratch, a bit of Python and an API key gets you a piece of non-deterministic software that can legitimately do some cool things. It can even do some smart things. The trick is to get it to do consistently useful things. This is much harder. Lots of things contribute to this from choosing the ri

Cynthia Unwin
Jan 1 · 3 min read


Ants as Agents
Photo by Christian Holzinger on Unsplash Today I learned a new word. Stigmergy. It's a good word. Stigmergy is a form of indirect communication where agents coordinate their actions by modifying their shared environment, leaving traces (like pheromones or digital markers) that influence the subsequent behavior of other agents, creating complex, self-organized systems without central control. I learned this word when I was reading about ants. I was reading about ants b

Cynthia Unwin
Dec 14, 2025 · 3 min read


Getting AIOps Right
Photo by Immo Wegmann on Unsplash Several years ago I wrote an article about there no longer being a role for support teams who are just...

Cynthia Unwin
Aug 4, 2025 · 6 min read


What Oatmeal Taught me about Software
Photo by Andrea Tummons on Unsplash Recently I had a life changing moment. I was having lunch with a friend and he told me that he...

Cynthia Unwin
Jan 20, 2025 · 4 min read


We need to talk about Agile...
Photo by Trnava University on Unsplash I know a lot of companies and teams do an excellent job of Agile development at Enterprise scale....

Cynthia Unwin
Jan 13, 2025 · 3 min read


There is no more "keeping the lights on"
Several years ago I came across a presentation by an Architect at IBM named Simon Grieg. It was a presentation that I found when I was...

Cynthia Unwin
Jun 5, 2023 · 5 min read


Why Cloud Projects Fail
I talk to teams on a daily basis that explain to me that their solution is different. The normal rules don't apply to them because they...

Cynthia Unwin
Mar 18, 2023 · 3 min read


The Search for Broken Feedback Loops
Digital transformation is only partly about technology and in many very real ways technology is the easy part. The real crux of...

Cynthia Unwin
Jan 17, 2023 · 5 min read


Why is declarative state so important to the future of operations?
I spend a large amount of time working with teams that are struggling to manage the increase in solution complexity that is affecting all...

Cynthia Unwin
Jan 8, 2023 · 6 min read


Time to Think
Last week I went to the office for the first time in more than 2 years. I had to wear shoes. I lived through it though, and coming out...

Cynthia Unwin
Apr 26, 2022 · 3 min read


The power of testing your understanding.
I've been waging war against technical debt for the better part of two decades. I've watched good teams drown slowly beneath waves of...

Cynthia Unwin
Nov 17, 2021 · 5 min read


What does it take to make technical operations teams resilient?
Over the past year of lockdowns and home schooling, of changing roles at work and at home, I have spent a lot of time thinking about...

Cynthia Unwin
Jun 26, 2021 · 7 min read


Episode Two: What Makes Mainframes Different?
If you started this journey with me in "Episode One" you will have read about mainframe computers being fast, resilient, secure and...

Cynthia Unwin
Apr 15, 2020 · 4 min read


Episode One: Why Mainframes?...
I have spent the last 20 years absorbed by computers, computing, and software and I like to think that I have a broad range of skills and...

Cynthia Unwin
Apr 5, 2020 · 4 min read
