Deciding When a Fix Is Worthwhile
Beth had asked for my opinion on a feature she was finalizing. The project was a biggie: year-long, 20+ engineers, many different tech stacks that had to integrate with legacy code, and so on. I sat down, and she showed me her laptop screen and then groaned. “I forgot to turn on the Product Set feature flag. Hang on.” She sighed. She tapped out a few keys and then sat there. We waited. And waited. A minute and a half went by, which is an eternity when you’re waiting for code. Finally it launched. Then she had to click around a few times to get back to the screen that had the thing she was demoing.
The feature was fine; in fact, I don’t even remember what it was. What concerned me was how long the feedback loop was between changing code and seeing the effect of that change. The longer that delay is, the less productive people are. I decided to bring it up.
“Uh, so that build time seems pretty long… what happened to the automatic reloading?”
“Oh, that’s been broken for weeks. The only way to bring up the site is to run it in prod.”
<strange high-pitched sound I make when I’m trying to remain calm but not doing so great at it>
This was bad. Not “we’re screwed” bad, but still, not good. A production build on this app meant that even for a small change, the build system had to process all of the tens of thousands of lines of code. A developer will make hundreds of changes to code in a day. Multiply that by the number of people on the team, and pretty soon you’re talking about a big chunk of time. Further, production mode meant that a lot of debugging tools weren’t available. This was a React app, and there was a rich ecosystem of tooling that couldn’t be used.
After a little more prodding, it turned out that this started when we had to consolidate the back-end API and the front-end React code into a single repository. The developer who did that was in a time crunch and did it quickly, which broke the automated build. The fix was to spend a little bit of time updating the build scripts so that both the API and the front-end could run in debug mode concurrently. It took about four hours to research, implement, and test. That response loop dropped to just a few seconds, debug tools were available again, and there was much rejoicing.
Four hours is not much time in the grand scheme of things. But that time was never budgeted in any sprint planning, triage, or any other kind of scheduling meeting. A common trap that developers can fall into is tunnel vision: they think, “Just get the task done. Just get the task done. Just get…” and lose sight of the forest for the trees. To go back up in the story briefly, I should say that Beth is a fantastic developer who consistently built systems quickly and at a high standard of quality, but even someone like that can slip into an anti-pattern.
A management super-power that is worth developing is to identify tunnel vision when it happens and correct it. This can be tricky because another common trap is to spend far more time automating a process than would ever be spent dealing with the “problem” manually. Here are the criteria I use when making this determination:
1. How much time is wasted each time the problem occurs? And what’s the frequency?
2. How many people encounter this?
3. Will either of those numbers increase or decrease over time?
4. How much time will the fix take? (This assumes the fix is a known quantity.)
In this case, #1 was a trivial amount of time compounded by its frequency, then compounded by #2. And since code has a tendency to only grow over time, which increases build time, #3 was going in the wrong direction too. On top of that, adding more people to the project would increase the total time spent dealing with the issue. Unless the fix was gonna take months, this was a no-brainer. Fortunately, I was familiar enough with the build system that I could take a reasonable stab at estimating #4 and knew it wouldn’t take long.
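If you like to see the arithmetic, the four questions can be turned into a back-of-the-envelope calculation. This is only a rough sketch, not a tool we actually used: the class and function names are made up for illustration, and apart from the 90-second wait and the four-hour fix, the example numbers (20 restarts per person per day, exactly 20 people, no growth) are assumptions chosen to keep the math easy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annoyance:
    """Hypothetical container for the four questions above."""
    seconds_lost_per_occurrence: float     # question 1: time wasted each time
    occurrences_per_person_per_day: float  # question 1: frequency
    people_affected: int                   # question 2
    daily_growth_rate: float               # question 3: 0.01 means cost grows 1% per day
    fix_hours: float                       # question 4


def hours_lost(a: Annoyance, working_days: int) -> float:
    """Total team hours lost over a number of working days if nothing is fixed."""
    daily_seconds = (
        a.seconds_lost_per_occurrence
        * a.occurrences_per_person_per_day
        * a.people_affected
    )
    total_seconds = 0.0
    for _ in range(working_days):
        total_seconds += daily_seconds
        daily_seconds *= 1 + a.daily_growth_rate  # the cost drifts up or down over time
    return total_seconds / 3600


def days_to_break_even(a: Annoyance, max_days: int = 365) -> Optional[int]:
    """First working day on which the accumulated loss covers the cost of the fix."""
    for day in range(1, max_days + 1):
        if hours_lost(a, day) >= a.fix_hours:
            return day
    return None  # the fix does not pay for itself within max_days


# Illustrative numbers only: a 90-second wait, an assumed 20 restarts per person
# per day, an assumed 20 people, no growth, and a four-hour fix.
slow_build = Annoyance(90, 20, 20, 0.0, 4)

# The opposite case: a 5-second papercut one person hits once a day,
# with an assumed 40-hour fix.
papercut = Annoyance(5, 1, 1, 0.0, 40)

print(hours_lost(slow_build, 1))       # 10.0 team hours lost in a single day
print(days_to_break_even(slow_build))  # 1
print(days_to_break_even(papercut))    # None
```

With those assumed numbers, the slow build costs the team more in one day than the fix cost in total, while the papercut never pays for itself within a year. The exact inputs matter less than the shape of the result: when the break-even point is days away rather than months, the fix is worth scheduling.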
If you’ve answered those questions and are still unsure of the direction to go, I recommend following Dave Thomas’s advice about working with agility (little A). I have a lot of criticisms about the state of Agile (capital A) these days, but this is gold:
“When faced with two or more alternatives that deliver roughly the same value, take the path that makes future change easier.”
In this case, the tradeoff was simple. Speeding up the build and the evaluation loop of software development will absolutely make future changes easier, and it’s almost always worth putting in extra effort to improve those.
I like situations like these. I mean, I don’t seek them out, but when they happen, it feels great to solve them. The solution wasn’t painful, and the result made the team faster and better. That’s a big win.