Why is declarative state so important to the future of operations?

Cynthia Unwin
Jan 8, 2023

I spend a large amount of time working with teams that are struggling to manage the increase in solution complexity affecting every Enterprise undergoing digital transformation. While different groups have different problems, it is clear that we can't continue to support modern Enterprise platforms in the same ways that we have over the past several decades. While all platforms and processes evolve and build on good practices from the previous generation of solutions, we are seeing a need to make a fundamental shift in how we approach delivering and supporting solutions over the long term. This isn't just a change of tools or practices; this shift will require a change in how we think about how we do our jobs.

This is not a new message. The technology industry understands on an academic level that we need to work differently to keep up with the explosion of complexity but, as a group, we lack a good understanding of exactly how to make this change. There is no simple solution or silver bullet that will result in the cultural transformation that we really need to make. Abandoning the processes and practices we have spent decades learning in favour of starting fresh will result in us repeating the mistakes we made two decades ago, but clinging to how we have been successful in the past will not allow for the necessary shift in thinking that will drive continued success in the future.

At a fundamental level, much of the problem is that it is very difficult to identify, as part of our day-to-day actions, which of our behaviours are the cause of our current challenges. If I walked into your organisation today and told all of your operational teams that they needed to change the way that they tied their shoes before they left for work in the morning, then even if everyone agreed this was something they needed to do, I would predict that compliance with the new shoe process would drop off after the first couple of days. This would not be because people were choosing not to comply, but because shoe fastening is so habitual that people don't think about how they are doing it; it just gets done. This is the problem. How we work, as an industry, is so deeply ingrained in our days that we perform much of what we do automatically. Shifting this is hard. In order to change what we do, we need to change how we think, so that these habitual actions start to stand out as counterproductive.

There are many components to this. However, one of the key shifts that will enable teams to approach their work differently is the move from thinking about systems imperatively to thinking about them declaratively.

Just so we are all rowing in the same direction: imperative programming focuses on providing a list of steps that a computer performs to achieve an outcome, whereas declarative programming focuses on the result that it seeks to achieve. This may seem like a semantic distinction, and in many ways it is, but it is critically important.
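To make the distinction concrete, here is a minimal Python sketch. The service names and both function names are hypothetical, chosen only for illustration. The imperative version runs a fixed list of steps and trusts the result; the declarative version is handed the desired end state and works out what needs to change from wherever the system currently is.

```python
# Illustrative sketch only: "web", "worker" and "legacy" are made-up service names.

def imperative_deploy(running: set[str]) -> set[str]:
    """Follow a fixed list of steps, regardless of the starting point."""
    running = set(running)
    running.add("web")      # step 1: start the web service
    running.add("worker")   # step 2: start the worker service
    return running          # assumes the steps left us in the right state


def declarative_plan(desired: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Compare desired state with actual state and derive the required actions."""
    return {
        "start": desired - actual,  # wanted but not running
        "stop": actual - desired,   # running but not wanted
    }
```

If a stray "legacy" service is already running, the imperative steps complete without complaint and leave it in place, while the declarative plan flags it for shutdown because it is not part of the desired state.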

The distinction matters because this shift in thinking is the key to actually achieving that ever-elusive state of no touch production: that nirvana where your production systems are self-healing and operational teams don't spend their days fighting fires. For a complex system to truly reach a state where it requires minimal human intervention to keep it running, that system needs to be aware of how it should be running, rather than simply performing rote actions that are expected to lead to outcomes. By working towards maintaining a target state, the system can adapt to what is happening in real time and adjust its actions accordingly.

To illustrate this we can talk about the example of the Therac-25. Many of you probably know this story but I'll summarise for those who don't. The Therac-25 was a radiation therapy machine designed to use targeted beams of radiation to destroy cancerous tumours inside a human body. For many people this machine was a lifesaver. Unfortunately, between 1985 and 1987 it delivered massive overdoses of radiation to six patients, causing deaths and severe injuries. A variety of flaws led to these terrible outcomes, including issues with the fundamental design, significant limitations in testing, and insufficient use of fail-safe mechanisms. There is lots of information on exactly what happened if you want to read the detail (http://www.cs.umd.edu/class/spring2003/cmsc838p/Misc/therac.pdf). However, this example, partially due to how horrific the outcome was, is an excellent one for setting up the discussion of the value of maintaining a declarative state over performing imperative actions.

The Therac-25 used software to set up the machine, aim the beams, set radiation levels, and so on. The software did this by following procedures selected based on input supplied by an operator. That is, it followed imperative commands and assumed that at the end of those command sequences it would be ready to safely administer a dose of radiation to a patient. Usually this was true, but in at least six cases, it wasn't. More than thirty years later, teams are still learning the peril of believing that a prescribed set of actions will always get you to the same place.

If the Therac-25 had been programmed to achieve an end state that was consistent with a set of conditions, it would have been far less likely to operate in an unsafe state. For example:

Imperative Method:

Perform: set dosage; aim beam; set filters; fire radiation.

Declarative Method:

Obtain state of: dosage set, beam aimed and filters set - then fire radiation

The declarative method, by its nature, seeks to be in a correct state, and when it is not the software continues to adjust. If for some reason a timing issue, counter overflow, or some other unforeseen event occurs in the initial set-up process, the target state won't be achieved and the radiation beam is not fired. However, if something goes wrong in the do a, then b, then c process of the imperative method, and it is not caught by error handling, the machine will operate in an unsafe state.
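The two methods above can be sketched in Python as follows. This is a toy model under stated assumptions, not a representation of the real Therac-25 software: the `Machine` class, its fields, and the `faulty_filters` flag (which simulates a set-up step that silently fails) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Toy stand-in for a device; NOT a model of the real Therac-25."""
    dosage_set: bool = False
    beam_aimed: bool = False
    filters_set: bool = False
    fired: bool = False
    faulty_filters: bool = False  # hypothetical fault: filter set-up silently fails

    def set_dosage(self) -> None:
        self.dosage_set = True

    def aim_beam(self) -> None:
        self.beam_aimed = True

    def set_filters(self) -> None:
        if not self.faulty_filters:
            self.filters_set = True

    def in_target_state(self) -> bool:
        # A real device would read independent sensors here,
        # not the same flags that the commands write.
        return self.dosage_set and self.beam_aimed and self.filters_set


def imperative_treat(m: Machine) -> None:
    """Do a, then b, then c, then fire; assumes every step worked."""
    m.set_dosage()
    m.aim_beam()
    m.set_filters()
    m.fired = True


def declarative_treat(m: Machine, max_attempts: int = 3) -> bool:
    """Adjust towards the target state; fire only once it is observed."""
    for _ in range(max_attempts):
        if not m.dosage_set:
            m.set_dosage()
        if not m.beam_aimed:
            m.aim_beam()
        if not m.filters_set:
            m.set_filters()
        if m.in_target_state():
            m.fired = True
            return True
    return False  # target state never reached, so the beam is never fired
```

With `Machine(faulty_filters=True)`, `imperative_treat` fires even though the filters were never set, while `declarative_treat` gives up after its attempts and returns `False` without firing. Note that the declarative version did not need to anticipate the specific fault; it only needed to know what a safe state looks like.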

It is true that good programming and testing could have avoided many of the Therac-25's problems. Imperative checks of state before completing the process could have been implemented; stronger error handling could have been in place; all the things that programmers and testers have learned to do in the intervening years to avoid this type of problem would have helped. However, all of these techniques rely to some extent on the programmer or tester anticipating what could go wrong. I am sure that I am not the only one who has spent an entire career striving to prepare for anything that could go wrong, only to be repeatedly and completely blindsided by something I had never imagined. We will never think of everything.

So what does this have to do with changing the way teams think and, through this, eventually achieving the goal of "no touch production"?

If we want systems to be adaptable and react to unique situations, they need to "know" what an acceptable state is. As long as systems seek to act in a certain way and then assume that this will result in them being in the correct state every time, they will continue to fail in new and wondrous ways. Examples of automation that ran successfully thousands of times and then suddenly reduced a production system to an incoherent mess, because it encountered a novel situation, are not hard to find (https://fortune.com/2021/12/10/amazon-software-problem-cloud-outage-cause/, https://www.indiatoday.in/technology/news/story/google-explains-why-gmail-youtube-and-other-services-suffered-outage-1751088-2020-12-19).

By seeking to maintain specified parameters, a system is free to adapt and (theoretically) pull back towards the correct state when things start to go wrong, rather than compounding errors or continuing in an unsafe manner. This isn't foolproof, of course. The defined state could result in undesirable outcomes even if the defined parameters have been met, but targeting an outcome is a far more robust and flexible model than repeating steps that "should" get you where you need to go.
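As a rough sketch of what "pulling back towards the correct state" can look like, the Python below runs a simple control loop: on each tick something disturbs the system, and the loop then compares observed state with desired state and corrects the difference. The function names, the instance-count model, and the disturbances are assumptions made for illustration; this is not how any particular platform implements it.

```python
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> dict[str, int]:
    """Return the per-service change in instance count needed to reach desired."""
    names = desired.keys() | observed.keys()
    delta = {n: desired.get(n, 0) - observed.get(n, 0) for n in names}
    return {n: d for n, d in delta.items() if d != 0}


def run_control_loop(
    desired: dict[str, int],
    observed: dict[str, int],
    disturbances: list[dict[str, int]],
) -> list[dict[str, int]]:
    """Apply one disturbance per tick, then reconcile; record state after each tick."""
    state = dict(observed)
    history = []
    for disturbance in disturbances:
        # Something unexpected happens: instances crash or stray ones appear.
        for name, change in disturbance.items():
            state[name] = max(0, state.get(name, 0) + change)
        # The loop does not care why the state drifted; it only corrects the gap.
        for name, change in reconcile(desired, state).items():
            state[name] = state.get(name, 0) + change
        state = {n: c for n, c in state.items() if c > 0}
        history.append(dict(state))
    return history


history = run_control_loop(
    desired={"web": 3},
    observed={"web": 3},
    disturbances=[{"web": -2}, {"rogue": 1}, {}],
)
```

In this run, two "web" instances disappear on the first tick and an unexpected "rogue" instance appears on the second, yet the state recorded after every tick matches the desired state. The loop contains no handling specific to either event.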

This is the fundamental idea behind how Kubernetes (k8s) and k8s operators manage a system. However, to be successful, we need to propagate this mindset out to all levels of Enterprise system management. This shift focuses our attention on what we are trying to achieve and embraces the fact that there is often more than one way to get where we need to go. All systems will still need to be told how to perform the steps necessary to make changes and achieve goals, but by shifting the focus from performing actions to obtaining an outcome, we open the door to adaptability both in software and in teams.
