Getting AIOps Right
- Cynthia Unwin

- Aug 4
- 6 min read

Several years ago I wrote an article about there no longer being a role for support teams who are just keeping the lights on. I talked about complex systems and the necessity for teams to really dig in and understand their environments to be successful. Now we stand at the beginning of a fundamental change in how we support large, complex platforms. We need to think differently and I find those words from years ago to be more true than ever.
I won't beat around the bush. By treating IT support as a low skilled role we cripple our ability to meet the challenges we currently face. My apologies to all the excellent support teams out there, but I am seeing the industry taking the wrong path in more and more situations. Staffing with lower and lower skilled resources that are measured on the wrong things, because we are going to depend on AI and Automation to solve problems, will drive exactly the wrong outcome. We need to be investing in higher skilled resources that can really drive the adoption and implementation of AI and Automation at scale, not undermining our ability to adapt by delegating critical thinking to AI and leaving lower skilled tasks to humans. It's a chicken and egg problem. How do we integrate really smart solutions to complex problems into platforms if we take away the people who understand the problems and how the solutions can best address them?
The industry continues to struggle with reliability in systems that are highly distributed, externally impacted, and difficult to trace. Now we are adding non-deterministic software into the mix, which will vastly increase the overall complexity. Our approach to this, as an industry, is to staff support teams with fewer, lower skilled resources. While I understand that the idea is that AI and automation are going to make us more productive and remove the need for higher skills, experience tells me this is not going to have the outcome we are looking for. I agree that we will likely need fewer people overall to do the same things (freeing people to do work that isn't currently getting done), but it isn't the low skill people that we will need to rely on to fill the gaps. The automation will do this; it's the hard problems that we will need people for.
To be successful working in an AI enabled world, supporting non-deterministic software and using non-deterministic tools, we need support resources with exceptional communication skills; an understanding of business outcomes; an almost compulsive desire to understand how things work; a willingness to learn new skills at an unprecedented rate; and no fear of change.
I used to spend a huge part of my time advocating for the inclusion of Site Reliability Engineering (SRE) teams on every project. I don't support that approach any more. SRE style resources can't be an add-on to support teams in an era where the automation of toil can be pervasive. SRE style resources need to replace lower skill support teams. We shouldn't need the standard sysadmin role in modern systems. That repetitive, known-outcome type of role needs to be automated out of existence. However, that doesn't mean that we don't need those people. The types of problem that teams face in an AIOps driven environment are different than they were in the past, and we need to build our teams differently to meet that change. We need people who work differently (notice how I didn't say different people). We need problem solvers, innovators, and communicators to meet ongoing challenges and to drive continuous improvement in a world that is changing quickly.
Our next generation support teams don't need to be large, but they do need to be high skill and they need to be a mix of junior and senior resources. Support resources in this era need to be able to code and follow a rigorous SDLC. They need to be able to build automation scripts, write Python, and test effectively because this will be the core of their role. Scripts, bots, and processes will monitor systems and gather information. Humans will step in when new bots and scripts are required, and when problems exceed the complexity of the existing automation. This means the teams need to not just be able to run and monitor automation, they need to be able to constantly improve and adjust it. Automation can't be done to a team; it has to be the primary focus of the team.
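To make that division of labour concrete, here is a minimal sketch in Python of automation handling the known problems and handing everything else to a person. Everything in it is an assumption made for illustration: the `Alert` type, the `RUNBOOK` registry, the `clear_temp_files` remediation, and the alert and host names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    kind: str    # hypothetical alert category, e.g. "disk_full"
    target: str  # hypothetical host or service name


def clear_temp_files(alert: Alert) -> bool:
    """Hypothetical remediation; returns True if the automation fixed the issue."""
    # A real script would act on alert.target here; this sketch only pretends.
    return True


# Registry of known, repeatable problems that automation already handles.
RUNBOOK: Dict[str, Callable[[Alert], bool]] = {
    "disk_full": clear_temp_files,
}


def triage(alerts: List[Alert]) -> List[Alert]:
    """Run automation for known alert kinds; return the alerts that need a human."""
    escalated = []
    for alert in alerts:
        handler = RUNBOOK.get(alert.kind)
        # Escalate when no automation exists yet, or when it ran and failed.
        if handler is None or not handler(alert):
            escalated.append(alert)
    return escalated


needs_human = triage([Alert("disk_full", "db-01"), Alert("latency_spike", "api-gw")])
print([a.kind for a in needs_human])
```

In a setup like this, every escalated alert is a candidate for a new runbook entry. The team's real job is growing and testing that registry, not working the queue by hand.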
AI enabled support teams need to have excellent communication skills. They need to be able to understand user and business problems at a fundamental level. As software (and support processes) become less deterministic, teams need to understand intent to a far greater extent than was necessary with software that did a thing in a certain way. "Working as designed" will be harder and harder to determine and less and less acceptable as an explanation for a problem. Support teams are the advocates for client users and need to really understand what it means to provide a valuable experience, not just check boxes on a list. "Working to contract" in a world that is using technology to create lower and lower friction outcomes will miss the mark. Teams that can't meet experience expectations will no longer be able to rest on having met SLAs.
We have talked about a commitment to improving skills, to continuous learning cultures, for many years. This becomes critical for support teams moving into an AIOps driven world. As technology changes and matures at an exponential rate we need to both learn new things and cast aside old ideas at a rate that requires constant attention. Technologists who are not dedicating a portion of every day to new learning are going to fall behind very quickly. Maintaining a rate of technical learning high enough to stay current doesn't just require teams that are driven to learn; it requires a management commitment to setting an expectation that support resources invest in their own skills. Not only does this expectation need to be set from the leadership level, it needs to be given direction by technical leadership. While all learning is good learning, in the current climate it is very easy for technologists to fall into silos that keep them from maintaining the breadth they need to be successful. It is very easy for people to be pulled into a vendor specific viewpoint if they aren't constantly being challenged to think more broadly.
Finally, AIOps driven teams need to not just be comfortable with change, they need to actively seek it. Metathesiophobia is the fear or dread of any kind of life adjustment. This anxiety about change is disturbingly common and can very quickly undermine the value of a team. Any time I hear a team say "that won't work here" instead of "how can we make that work here" I know I've met a metathesiophobic team. Support teams in the new world need to wake up every day thinking about how they can do things better. They need to constantly push for change because the world is changing around them, and if they remain constant they are falling behind. Change can be uncomfortable but it can also be freeing, and this ability to be freed by change, I would argue, is every bit as valuable and important as technical skills. Without the courage to do things differently, support teams atrophy and fail to meet challenges effectively.
I was in a meeting about a year ago, and I don't remember what project it was, or who said it, but someone accused us of being well intentioned amateurs. That phrase really stuck with me. Change is hard, and it requires work and commitment. As we seek to launch into pervasively AI driven IT support models the risk behind this phrase becomes more and more real. We can't do new things the same way as we did what came before or we won't get the revolutionary outcomes we are looking for. To be in front of the change we need to invest in the skills, processes and mindsets that are necessary to be successful. We can't just implement new processes or use new tools, we need a new approach. We need teams that are prepared to meet the challenges ahead.



