#54: Code Coverage
Hello! Today, we will discuss code coverage and whether we should aim for a specific target.
When we talk about code coverage, we are generally referring to line coverage: the percentage of lines of code executed during testing.
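To make the metric concrete, here is a minimal Go sketch (the function, package, and values are invented for illustration; most languages have an equivalent of Go’s built-in `go test -cover`):

```go
// sign.go (hypothetical example)
package sign

// Sign contains three statements across two branches.
func Sign(n int) string {
	if n < 0 {
		return "negative"
	}
	return "non-negative"
}
```

```go
// sign_test.go: exercises only the non-negative branch.
package sign

import "testing"

func TestSign(t *testing.T) {
	if got := Sign(1); got != "non-negative" {
		t.Errorf("Sign(1) = %q, want %q", got, "non-negative")
	}
}
```

Running `go test -cover` here reports 66.7% of statements: two of the three statements executed, while the `return "negative"` statement never ran.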
Though it isn’t a perfect metric for assessing the quality of our tests, code coverage is a standard, objective measure in our industry. That’s one of its main benefits: whether we move to another team or company, code coverage holds the same meaning. Therefore, tracking this metric is something we likely want in any codebase. As Kate Meyer once told me:
“What’s important with code coverage is that it exists.”
I think this captures quite well the essence of why code coverage matters.
As software engineers, it’s important to be aware that high code coverage doesn’t guarantee high-quality test coverage. Generally, the higher the coverage, the less likely we are to miss a bug, but this isn’t a strict guarantee. Code coverage simply measures the lines of code that are executed by tests, but it doesn’t assess the effectiveness of those tests. For example, if our tests miss key behaviors or edge cases, even 100% coverage can leave bugs unnoticed.
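Here is a small, contrived Go example of that gap (the function is invented for illustration): a test suite with 100% coverage that still lets an obvious bug through.

```go
// div.go (hypothetical example)
package div

// Divide returns a/b. It is a single statement, so any test
// that calls it once produces 100% line coverage.
func Divide(a, b int) int {
	return a / b
}
```

```go
// div_test.go: coverage is 100%, yet Divide(1, 0) panics with a
// division by zero, and no test documents or prevents that behavior.
package div

import "testing"

func TestDivide(t *testing.T) {
	if got := Divide(10, 2); got != 5 {
		t.Errorf("Divide(10, 2) = %d, want 5", got)
	}
}
```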
Yet, aiming for high code coverage should remain, in general, a good practice. As Google puts it:
“Our experience shows that efforts in increasing code coverage can often lead to culture changes in engineering excellence that, in the long run, reduce defects.”
This brings us to the million-dollar question: should we enforce a specific code coverage percentage, for example, by preventing any code from being merged if its coverage is below x%? In my opinion, no.
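For context, such a gate is typically a small CI step. Below is a hedged Go sketch of one, assuming Go’s coverage tooling; the 80% threshold and the program itself are illustrative, not a recommendation:

```go
// coverage-gate (hypothetical): a pre-merge check. Pipe the output of
// `go tool cover -func=coverage.out` into it; it exits non-zero when
// total statement coverage is below the (arbitrary) threshold.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const threshold = 80.0 // the kind of fixed number this article argues against

func main() {
	var total float64
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		// The report's last line looks like: "total: (statements) 66.7%".
		if strings.HasPrefix(line, "total:") {
			fields := strings.Fields(line)
			pct := strings.TrimSuffix(fields[len(fields)-1], "%")
			total, _ = strconv.ParseFloat(pct, 64)
		}
	}
	if total < threshold {
		fmt.Printf("coverage %.1f%% is below the %.0f%% gate\n", total, threshold)
		os.Exit(1) // blocks the merge
	}
}
```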
I once worked on a team where we had to target 80% test coverage. On many occasions, we suffered from this constraint, as we ended up writing tests just to meet the number, not necessarily to improve quality. Again, tracking the code coverage metric is important; yet, as stated by Goodhart’s law:
“When a measure becomes a target, it ceases to be a good measure.”
The situation I observed in many teams was the following: “My code meets the coverage requirements. Cool, I’m ready to create my pull request.” Said differently, developers treated code coverage as a box to tick, often stopping once they hit the target (and I have to admit I reasoned similarly from time to time).
Instead, I believe that judgment matters more than any rigid percentage. This approach places the responsibility on the coder. I also think this is a better way to help junior developers grow: letting them think for themselves about what should be tested rather than blindly following a metric. And if gaps remain, they can still be addressed during pull request reviews: “This function is not covered; how about adding a test for X and Y?”
Another challenge with a strict target is defining the right number. For example, why choose 80%? Why not 91% or even 67%? There’s no universal coverage target that would fit every project. Even inside a project, some parts deserve more attention and coverage than others.
At Google, there’s no hard requirement for coverage; instead, we consider 60% acceptable, 75% commendable, and 90% exemplary. We also favor mutation testing, and many teams use an internal tool that warns (but doesn’t block) changes if they lead to a significant coverage drop.
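Since mutation testing may be less familiar, here is a conceptual Go sketch of the idea (this is not Google’s internal tool; the code and names are invented, and open-source mutation tools exist in most ecosystems):

```go
package price

// Discount gives seniors a reduced rate (hypothetical example).
func Discount(age int) float64 {
	if age > 65 {
		return 0.10
	}
	return 0
}

// A mutation tool generates "mutants" (small changes) such as:
//
//	if age >= 65 { ... } // boundary flipped
//
// and reruns the test suite. If the tests only check Discount(70)
// and Discount(30), both return the same values under the mutant,
// so it "survives": the line is covered, but the behavior at
// exactly age 65 is never verified.
```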
Lastly, pushing for very high coverage has its costs. Beyond a certain point, the effort required isn’t linear: going from 90% to 95% can take much more work than going from 55% to 60%. Also, aiming for 100% can lead to overly specific tests that are too tightly coupled to the code and, hence, harder to maintain.
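To illustrate that coupling, here is a made-up Go example of a test written mainly to reach the last few coverage points:

```go
// report.go (hypothetical example)
package report

import "fmt"

// summary is a trivial formatting helper.
func summary(n int) string {
	return fmt.Sprintf("processed %d item(s) successfully", n)
}
```

```go
// report_test.go: this test pins the exact output string. It catches
// few real bugs, yet any harmless rewording of the message breaks it:
// high coverage, high maintenance.
package report

import "testing"

func TestSummary(t *testing.T) {
	if got := summary(3); got != "processed 3 item(s) successfully" {
		t.Errorf("summary(3) = %q", got)
	}
}
```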
In conclusion, while code coverage is valuable, its effectiveness depends on the context. Our best bet is to use it as one tool among many and to balance it with experience and judgment.
Explore Further
Tomorrow, you will receive your weekly recap on the theme of unit tests.