"They couldn't count what mattered so they made what they could count matter."
That quote is from a Vietnam veteran in Ken Burns' Vietnam documentary. It really stuck with me, as I'd long been conflicted over the use of metrics in my professional career. I've seen metrics used well, serving as a critical factor in important decisions. I've also seen metrics used incredibly poorly, but I could never articulate a good argument or alternative in those cases. That changed when I watched this part of the documentary and heard that quote.
For background: this part of the documentary covered how difficult it was for the US to track progress in the Vietnam War. In previous wars you could look at the territory you held. Here, territory meant little, as it was all theoretically held by South Vietnam and the US.
What was the solution? How could they measure progress in a way that would look clear in a report?
Body count.
The metric was the number of people who died on each side: the side with fewer deaths and more kills was winning. It was an easy number to put on a report for people up the chain. What about morale? Or public support? Those were harder to measure, so... why bother. The results were not great.
Trying to improve efficiency in software development doesn't quite compare to the Vietnam War, but sometimes extreme examples help put things into perspective. In every company in every industry, similar mistakes are made: you have a goal that you don't know how to measure, so you find something measurable and make that the goal.
In software development, a lot of the true goals actually are measurable. High user engagement. A high number of active users. And of course: high revenue. The problem with these goals is the significant lag between the start of development and the moment these metrics move. You can't see how many active users you'll have until after the software has launched. What if it takes 6 months to get something launched? Will management be happy with no visibility into progress for 6 months?
Unlikely.
What can we measure in the meantime? The number of JIRA tickets completed. Sprint velocity. Number of commits (I see you paying attention to your GitHub contribution graph).
What we ultimately want, though, is paying users. The number of paying users is measurable, but the reason they're paying is less so. We can talk about user value, but that's not the easiest thing to calculate. Sure, looking at a puppy photo may improve my mental health, but by how much? 3 mental health units? 10 mental health units? What the hell is a mental health unit?
In an ideal world, our requirements are perfect, and so every JIRA ticket perfectly represents user value. That's almost never the case. That leaves a disconnect between what we can measure and our actual goals. If you've spent any amount of time in software development, you know which one of those wins out.
Things get worse when performance evaluations come into play. What can we measure? JIRA tickets completed. What actually matters? Users. You can't evaluate a developer's performance based on whether users like the product, though; it was someone else's job to gather requirements! Yet when a development team is asked to take on something new, the process automatically adds it to a backlog to be evaluated at a later date. That task could surface new information about users that invalidates the current sprint's work, yet it still goes to the backlog and waits its turn.
Sometimes it's not management putting metrics in place for performance evaluations that causes this behavior. Sometimes, developers do it to themselves.
I remember one job where my team "failed" 90% of our sprints. I'm probably being generous there, too, because it seemed like a miracle every time we actually completed a sprint. This bothered me far more than it bothered my boss. I used to put the blame on interruptions, as I was constantly being interrupted by various people in the company. My manager asked me to create a log of these interruptions (I suspect he knew what the results were going to be, which is why he never cared about our failed sprints).
The result was that I was interrupted somewhere between 25 and 30 times a day. Focus time was non-existent. I'm willing to bet that most developers reading that immediately cringed. I cringed a little just remembering it.
However, my manager and I looked at that log a bit deeper. Why was I interrupted? More importantly, what was accomplished in those interruptions? We realized that those interruptions created more value for the company than a lot of the work my team couldn't get done in our sprints. That value wasn't really measurable; we were making a judgement call based on our understanding of the business. I had just been caught up in the goals I could measure for myself rather than our actual goals.
Therein lies what I believe to be the underlying problem. When your ultimate goal isn't measurable, you have to do the work to make a judgement call. That's a lot harder than collecting and looking at a metric. It requires a deep understanding of the problem, and there's no deflecting the blame if you're wrong. This fear of using good judgement is what causes us to flock to the comfort of watching numbers go up and down. It keeps us in that comfort zone, with all the misplaced incentives that are created as a result. And it affects nearly everything, from software projects to major world conflicts.