Cynthia Unwin

The Search for Broken Feedback Loops

Digital transformation is only partly about technology; in many very real ways, technology is the easy part. The real crux of successful long-term transformation is shifting the way people think about building and supporting technical solutions, and shifting how they approach problems. There is endless discussion of how to orchestrate change, how to motivate people to work differently, how to get people to make this shift. We talk about establishing actionable metrics; we talk about psychological safety and making safe places to fail and grow, and these discussions are often insightful and definitely important. Today, however, as I wade through another round of trying to help struggling teams change how they work while under time, resource, geographic, and cultural constraints, I'm looking for something more concrete and actionable.


There is no simple solution to this problem. However, I recently read David Anderson's The Value Flywheel Effect, and at the end of a chapter, in a brief discussion of socio-technical systems, he made a comment about looking for broken feedback loops. While his context was slightly different, I found that directive exceedingly compelling.


If we look at the repeating patterns that get technology teams into trouble when managing platforms where change happens quickly, complexity increases exponentially, and there never seems to be enough time for everything, a couple of things stand out. First, many teams focus very heavily on measuring outcomes that are lagging indicators of success: how long it took to release code, how much additional traffic a change drove, how much it cost to implement a new feature. These are all things that are only measurable when the work is complete. Lagging indicators are usually easy to calculate and they are important, but in reality they appear too late in the cycle to produce change. They look at how well we did, not how well we are doing right now. We can't change what we did; we can change what we are doing. What we are doing right now is the realm of the leading indicator: less concrete KPIs that predict an outcome. Leading indicators include how much work we are completing in this sprint compared to how much we expected to do, code coverage, code churn, sprint defect rate, ongoing new skill acquisition, production error rates, and chaos testing results. Leading indicators measure the things that lead to success; they measure processes while in flight, when we still have the power to change them.
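To make the distinction concrete, a leading indicator like sprint progress can be checked while the sprint is still running. Here is a minimal sketch; the function names, the 10-day sprint, the linear burn expectation, and the 25% tolerance are my own illustrative assumptions, not anything prescribed in this post:

```python
def sprint_completion_ratio(completed_points: int, planned_points: int) -> float:
    """Fraction of the planned work completed so far."""
    if planned_points == 0:
        return 0.0
    return completed_points / planned_points


def mid_sprint_warning(completed: int, planned: int,
                       days_elapsed: int, sprint_days: int = 10) -> bool:
    """Flag the sprint while it is still in flight, when we can react.

    Assumes a naive linear burn: by day N we expect N/sprint_days of
    the work done. Warn if we are more than 25% behind that pace.
    """
    expected = days_elapsed / sprint_days
    actual = sprint_completion_ratio(completed, planned)
    return actual < expected * 0.75


# Day 6 of 10 with 15 of 40 points done: well behind pace, warn now,
# not in the retrospective three weeks from now.
print(mid_sprint_warning(15, 40, 6))  # True
```

The point is not the arithmetic; it is that the check runs mid-sprint, while the team can still change course.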


Measuring leading indicators can be tricky, though, and this is where the feedback loop comes in. The ability to see trends that will ultimately lead to failure depends entirely on having a consistent feedback loop in place, one that tells the right group that a shift needs to be made before it is too late to affect the outcome. Intentionally designing and monitoring feedback loops across a platform is critical to the long-term growth of a team, but it is not always as easy as it sounds.


There are a few feedback loop anti-patterns to be aware of when looking for problems:

  • The first is the most obvious: the completely missing feedback loop. For example, k8s needs to restart a microservice every 48 hours because it is running out of memory, CPU, or threads, but since there is no user impact, the restarts never get tracked or raised beyond the L1 ops team (no feedback loop), and investigating or fixing them never makes it into the backlog. The issue festers under the surface until a new feature is deployed that makes it worse (and now we don't know which code introduced the issue), or the environment changes and the issue causes an outage. At that point a loop further down the chain picks it up, but it is now a problem.

  • Next we see feedback loops that provide feedback to the wrong place. Example: a delivery manager produces a monthly incident report, shared with the business, that shows incident counts by category, but it is never reviewed with the Reliability Architect, and nobody ever completes a technical trend analysis to find the underlying patterns.

  • Closely related is the loop that terminates without ownership: the loop that provides feedback to nobody in particular. A warning is sent to a group mailbox; a condition is raised in a Slack channel that many people can see but nobody is actually responsible for monitoring. A feedback loop is only useful if somebody does something with the feedback.

  • Finally, there is the feedback loop that provides no useful information (noise). The classic examples are the alert that pages someone when there is no action to be taken, and the monitoring dashboard full of charts with no clear relationship between the charts and the actual operation of the system.
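The first and third anti-patterns above can often be closed with something very small: trend the signal nobody is watching and route the result to a named owner. A hypothetical sketch, assuming daily snapshots of pod restart counts; the pod names, threshold, and owner address are invented for illustration:

```python
from collections import defaultdict

# A loop needs an owner: someone accountable for acting on the output,
# not a group mailbox nobody reads. (Address is hypothetical.)
OWNER = "payments-team@example.com"


def restart_deltas(samples):
    """samples: chronological (pod, cumulative_restart_count) snapshots.

    Returns how many restarts each pod accumulated over the window.
    """
    history = defaultdict(list)
    for pod, count in samples:
        history[pod].append(count)
    return {pod: counts[-1] - counts[0] for pod, counts in history.items()}


def pods_needing_backlog_item(samples, threshold=3):
    """Pods restarting steadily deserve a ticket, not silence."""
    return [pod for pod, delta in restart_deltas(samples).items()
            if delta >= threshold]


# Three days of snapshots: svc-a is quietly restarting, svc-b is fine.
samples = [("svc-a", 0), ("svc-b", 0),
           ("svc-a", 2), ("svc-b", 0),
           ("svc-a", 5), ("svc-b", 1)]
for pod in pods_needing_backlog_item(samples):
    print(f"file backlog item for {pod}, assign to {OWNER}")
```

The mechanism is trivial on purpose: the hard part is deciding that the restart trend is worth watching and naming who acts on it.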


When I look back over the years at the teams I have worked with, this pattern is clearly visible. For teams that have really struggled, I can almost always trace the issue back to a missing or broken feedback loop. These loops can be tactical or strategic, depending on the long-term goal they support. Teams that don't track how they spend their time end up with velocity problems because they can't allocate resources properly; teams that don't unit test effectively miss release dates because of defects built on top of defects; teams with poor monitoring have more outages; teams that don't track whether their members are happy lose skills to greener pastures. If your team is struggling, look for the loop that is missing or goes to the wrong place.


Platforms that are deliberate about creating useful, intentional feedback loops run into fewer surprises and can better manage their time and resources over the platform's life cycle. Teams that test effectively, earlier, and more often can better control their development cycles than teams that test later; teams that know exactly how a system will perform during failure scenarios have fewer outages and react faster when they do occur; teams that actively promote skills development and regularly make time for people to learn new things develop the skills to grow with a solution. While none of these points are new, the lens is slightly different. If we really look at what information we are tracking, when we are tracking it, what it tells us, and who needs to know, we can both streamline our processes and reduce the amount of information teams need to process to be successful. Just like dashboards, feedback loops need to be created intentionally; while in some cases they will grow organically, sometimes they will need to be pruned or trained to grow in a different direction.


