The Hidden Cost of Process: A Guide for Tech Leaders
Building processes is a way to resolve issues fully. But processes are also bottlenecks that slow your organization down. How do you navigate this conundrum?
Welcome to the Scarlet Ink newsletter. I'm Dave Anderson, an ex-Amazon Tech Director and GM. Each week I write a newsletter article on tech industry careers and specific leadership advice. Free members can read part of each article, while paid members can read the full article. For some, part of the article is plenty! But if you'd like to read more, I'd love for you to consider becoming a paid member!
I really wanted to send someone an article about process, but the one I had was from a few years ago (and needed work). So, per policy, I rewrote it this week in my current writing style. Hope you enjoy!
I was talking recently to a manager who had left Amazon for a much smaller company. She said it was great how fast the company could move. Decisions were made quickly, and it felt like people just got things done. Projects that would have taken a couple of weeks at Amazon were pushed to production in a day. This was in contrast to her prior (huge) org at Amazon, where it felt like even the smallest change took forever.
I asked her, now that she’d been at the company for a couple of months, what her first focus might be. Perhaps something she had already observed that she could improve using her experience.
She said she was hoping to “level up” their processes a bit. Everything was ad-hoc, and that made her worry they were going to make mistakes. So she wanted to bring in an operations meeting, a program review meeting, and some processes around change management.
I thought that was pretty funny, considering what she admired about her new workplace. I am not usually subtle with my communication, but I tried hard.
Well, subtle isn’t my strong suit. She responded with something like, “Yeah, I’ll take things slowly,” and then changed the subject. I am not practiced at telling people they’re potentially wrong in a way that doesn’t hurt their feelings.
But why was this manager looking to change things and add process? In large part because managers are expected to do something. Because she was new to her role, she worried that she had nothing to show for her presence. If she didn’t make a change, what was she there for?
And managers have limited tools in their toolbox. If an engineer wants to make a difference as a new hire, they can build software. If a manager needs to make an impact, they… make process? When you have a hammer, everything looks like a nail (as they say).
This problem of “I need to do something!” is even worse when something is going wrong.
When something goes wrong.
Earlier in my Amazon career, I was a manager in a large organization. Our organization had made it through years of traffic increases without a major incident. Given Amazon’s historical growth, that meant each service had dealt with the growing pains of traffic doubling every year for many years. Sometimes that meant only additional hardware. Sometimes it meant rebuilding a service to make it more robust. Either way, dealing with scaling was a large part of the engineering focus.
Then one Q4 (peak traffic for most parts of Amazon), a critical team hadn't properly tested the ability of a new service to scale. There was no red flag beforehand. It was simply an oversight by a team distracted by many projects that were barely meeting their dates. Whatever the cause, the new service didn't handle the increased load. It took days of throttling and scrambling engineers before the related system could be improved enough to handle the load. It was a painful event for everyone involved. Customers were impacted, though not drastically.
Engineers stayed up late, and their management teams had to repeatedly explain the situation across departments. It was certainly not at the scale to impact an earnings report, but it was metaphorical egg on the face of the organization's business leaders. They needed to do something.
You can make mistakes at Amazon, but you’re always asked, “What are you doing about it?” Because something has to be done.
And that’s the root of this article. It’s about leaders who, for various reasons, feel the need to do something. It covers the cost of doing something, and how you should think about these decisions.
So what did they do?
If a leader wants to keep their job, they can’t shrug and say that there was nothing they could have done to avoid a bad event. And I don’t disagree with that assessment. You don’t want someone shrugging when bad things happen.
So in my example story, the leadership team, being the type of people intent on keeping their jobs, turned to their teams and demanded that something be done. "Explain how this will be prevented in the future."
As smart Amazon employees, we knew the general answer to this question. It’s almost always mechanisms. Mechanisms are essentially processes (or changes to existing processes) that enable you to solve issues systematically, rather than dealing with one-off problems.
An additional benefit of talking about mechanisms is that there is less focus on individual people or teams. So the team that made the mistake wouldn’t need to throw themselves under the bus. Instead, they could point out (correctly) that this could have happened to anyone. It’s a way of taking broader ownership of an issue without suggesting that any human deserves the blame. That actually made working at Amazon more bearable, since it’s already a high-pressure environment. Because most situations were framed around mechanisms, you could focus less on individual mistakes and more on how you got there.
Therefore, in this situation, as in others, you could explain that this wasn’t an error by this team. It was instead a missing process, which meant we (the entire organization) were all at fault.
The team wrote that the missing mechanism was a process for preparing for Q4 load. They explained that every team did ad-hoc preparation for Q4. The word “ad-hoc,” as used here, was code for “randomly.” That was essentially the team waving a red flag in front of our leaders, because doing things randomly is clearly bad. By pointing out that everyone was doing it badly, they framed this error as the inevitable consequence of an organizational mistake.
The team proposed that the only way to avoid painful customer-facing issues in the future was to create a centralized Q4 operational planning process. They graciously volunteered to lead the effort.
Because this article isn’t about this particular Q4 event, I’m going to cut the story short. They created a documented and detailed Q4 operational planning process. As the next Q4 approached, every team in the organization had to prepare a document, run a set of load tests, run various availability tests, review their alarms, and give a presentation in front of their leadership teams.
Did this mechanism help?
Did this new mechanism save the overall organization from another painful and embarrassing outage? Well, there wasn’t a major outage that Q4, but there’s no proof that the preparation actually saved anyone. It’s hard to prove a negative. The next year there was an outage, and the responsible team added more steps to the Q4 preparation process because something new had gone wrong.
Politically, I assure you that the widely broadcast message outside our organization was that we had invented the best mechanism, and that everyone should consider doing the same in their organization. In large companies, you’re rewarded for the perception of your actions, not necessarily for actual results.
No one can prove or disprove that the Q4 preparation process saved us, but you can claim it did.
However, here’s the kicker. Who paid for the added cost of this new process? Everyone. It became part of the cost of doing business. No one sat down and said, “Ok, this process costs approximately X. If it saves us one outage per year on average, and an outage costs Y, let’s compare the two.”
And I assure you, as a member of that organization, that the Q4 preparation process was incredibly expensive. Dozens of hours of work from every team across a massive organization adds up to millions in resources spent on what? Making it less likely that a problem would happen. I’m not saying it wasn’t worthwhile. I’m saying it was expensive, and I don’t think this organization (or most organizations) properly evaluated that expense.
Ignoring the cost of process.
In general, while large companies reward the perception of action, they ignore the cost of process. So what happens? Large companies and organizations tend to create a lot of process, and rarely remove it. They’ll even say that their process was invented to allow them to move quickly.
When asked why they’re less agile, they’ll say that smaller companies couldn’t possibly understand the challenges of operating at their scale. Which is true, to a point. I know many engineers who joined Amazon and were shocked at the scale of the software, and at the impact of making a small mistake.
But large companies and organizations continue to add processes, and a decade later, they realize that things are getting done a lot more slowly than at those smaller companies. Is it because their systems are more complex, or their scale is just too big to move fast? Partially. But another reason is that processes are, by definition, bottlenecks. They're institutionalized bottlenecks, purposefully put in place to slow people down to ensure they do the right thing.
Mechanisms can be an invaluable tool, but they frequently end up being one of the largest (and hardest to see or measure) costs to your business. Why are processes expensive? Let’s walk through three specific costs...