
Reviewing Automated Test Code

Software engineering research and my own experience have taught me the value of code reviews and automated tests, especially automated unit tests written using test-driven development (TDD).

In the past, my pattern when doing code reviews has been to focus on the production code and give the unit test code only a minimal review. But over the past year, I have seen indications that this might be a mistake. One respected colleague even advised me to reverse my priorities and focus primarily on the tests. So why would reviewing test code be so important?

Reasons to review test code

Research by Capers Jones indicates that errors in test cases are quite common - as high as 20%. In other words, one in five test cases has something wrong. Jones was likely referring to manual test scripts rather than automated test code, but there is no reason to expect the act of automation to magically fix such errors. I recently reviewed a set of unit tests for a class and found one test that made no sense - it appeared to verify behavior against the wrong expected result. When I raised the issue with the developer, they confirmed that the test was wrong. Fixing the test revealed a defect in the production code, which they then promptly fixed. This was the same production code that I had already carefully reviewed.
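To make this concrete, here is a minimal sketch of how a wrong test and a wrong implementation can pass together. The class, the discount figures, and the business rule (a 10% discount) are invented for illustration, not taken from the code I reviewed:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical production code with a defect: it applies a 5% discount
    // where the (assumed) business rule calls for 10%.
    class InvoiceCalculator {
        double totalWithDiscount(double amount) {
            return amount * 0.95; // defect: should be amount * 0.90
        }
    }

    public class InvoiceCalculatorTest {
        // The expected value below matches the defective implementation rather than
        // the business rule, so the wrong test and the wrong code pass together.
        @Test
        public void appliesVolumeDiscount() {
            double total = new InvoiceCalculator().totalWithDiscount(200.00);
            assertEquals(190.00, total, 0.001); // should expect 180.00
        }
    }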

Automated tests can have other issues besides simply being incorrect. Tests can be incomplete, weak, fragile, or hard to read and maintain.

Incomplete tests fail to exercise some of the required functionality. Reviewing the output of a code coverage tool is an easy way to identify gaps in coverage of the existing production code. A more careful review of the actual tests is still needed, however, to confirm that all required functionality is implemented, since code coverage cannot reveal a gap when the production code itself is missing.
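As an invented illustration of both points (the shipping rule and class names are assumptions), the suite below leaves one branch uncovered - something a coverage tool will flag - while a requirement that was never implemented would not show up in any coverage report:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical rule: orders of 50.00 or more ship free, otherwise a flat fee applies.
    class ShippingCalculator {
        double shippingCost(double orderTotal) {
            if (orderTotal >= 50.00) {
                return 0.0;
            }
            return 5.99;
        }
    }

    public class ShippingCalculatorTest {
        // Only the free-shipping branch is exercised, so a coverage report would flag
        // the flat-fee branch as untested. If a required behavior (say, expedited
        // shipping) was never implemented at all, no coverage report can reveal that
        // gap - only a review against the requirements can.
        @Test
        public void largeOrdersShipFree() {
            assertEquals(0.0, new ShippingCalculator().shippingCost(75.00), 0.001);
        }
    }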

Weak tests have a low likelihood of detecting defects. One example is tests for trivial getters and setters - such tests are so weak that I generally consider them useless clutter and delete them when I see them in a review. Test weakness is really a continuum, from virtually useless, as in the getter-and-setter example above, to extremely powerful, and part of reviewing tests is identifying where the power of a test can be improved. A while ago I had to change a business rule to use less-than-or-equals logic rather than the less-than logic originally specified. I confirmed there were tests covering the rule, changed the logic, and re-ran the tests expecting them to fail - but to my surprise they all passed. Upon investigating, I discovered that the tests exercised the boundary condition improperly: they tested a value below the boundary and a value above it, but not one on the boundary. The equal-to-boundary case, which was exactly the logic I was changing, was never tested. This prompted me to review the tests for similar functionality, where I found the same pattern of weak boundary tests, a few of which had already let defects escape detection. If we had been more diligent reviewing the unit tests when this functionality was first written, we could have avoided these problems.
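Here is a minimal sketch of the missing coverage, using an invented rule (10 or more items qualify for a discount): tests at 9 and 11 pass whether the implementation uses greater-than or greater-than-or-equals, so only the on-the-boundary test has the power to catch the difference.

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    // Hypothetical rule: 10 or more items qualify for a discount (greater-than-or-equals).
    class DiscountRule {
        boolean qualifies(int quantity) {
            return quantity >= 10;
        }
    }

    public class DiscountRuleTest {
        private final DiscountRule rule = new DiscountRule();

        @Test
        public void belowBoundaryDoesNotQualify() {
            assertFalse(rule.qualifies(9));
        }

        // This is the case the weak tests were missing: only a test on the boundary
        // can tell a greater-than rule apart from a greater-than-or-equals rule.
        @Test
        public void onBoundaryQualifies() {
            assertTrue(rule.qualifies(10));
        }

        @Test
        public void aboveBoundaryQualifies() {
            assertTrue(rule.qualifies(11));
        }
    }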

Fragile tests have a high likelihood of requiring updates when production code is changed. In other words, changes are likely to cause such tests to start failing. Having too many fragile tests can lead to maintenance issues down the road - developers may choose to delete failing fragile tests rather than fix them, or worse, may stop running the tests altogether. Martin Fowler termed this test cancer. Reviews by individuals experienced in maintaining unit tests are the only way I know of to detect and minimize fragile tests.
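One common source of fragility a reviewer can spot is a test that couples itself to incidental detail rather than required behavior. The sketch below is invented for illustration; the first test breaks whenever the message wording is tweaked, while the second survives as long as the actual requirement is met:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    // Hypothetical class whose confirmation message wording changes frequently.
    class Order {
        private final String number;

        Order(String number) {
            this.number = number;
        }

        String confirmationMessage() {
            return "Your order " + number + " has been received and will ship soon.";
        }
    }

    public class OrderConfirmationTest {
        // Fragile: pins the exact wording, so every copy change breaks the test
        // even though the behavior the requirement cares about is unchanged.
        @Test
        public void fragileExactMessageText() {
            assertEquals("Your order ABC-123 has been received and will ship soon.",
                         new Order("ABC-123").confirmationMessage());
        }

        // More robust: verifies only the property the requirement actually demands.
        @Test
        public void messageContainsOrderNumber() {
            assertTrue(new Order("ABC-123").confirmationMessage().contains("ABC-123"));
        }
    }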

Test code needs to be clean - easy to read and maintain - just like production code. In fact, many proponents of automated tests argue that well-written tests can serve as documentation, highlighting what the production code is supposed to do and providing examples of how to use specific APIs. In my experience, using tests as documentation only works when the test code is clean, which generally requires reviews to ensure it happens consistently.
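As a sketch of what I mean by clean (the cart API is invented for illustration), a well-named test reads like a usage example: the name states the rule and the body shows, step by step, how the API is meant to be used.

    import static org.junit.Assert.assertEquals;

    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.junit.Test;

    // Hypothetical API under test.
    class ShoppingCart {
        private final Map<String, Integer> lines = new LinkedHashMap<>();

        void add(String sku) {
            lines.merge(sku, 1, Integer::sum);
        }

        int lineCount() {
            return lines.size();
        }

        int quantityOf(String sku) {
            return lines.getOrDefault(sku, 0);
        }
    }

    public class ShoppingCartTest {
        // Name plus arrange/act/assert structure is what lets this double as documentation.
        @Test
        public void addingTheSameItemTwiceIncreasesQuantityInsteadOfAddingALine() {
            ShoppingCart cart = new ShoppingCart();

            cart.add("SKU-42");
            cart.add("SKU-42");

            assertEquals(1, cart.lineCount());
            assertEquals(2, cart.quantityOf("SKU-42"));
        }
    }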

Incorrect perceptions regarding reviewing test code

Given all these compelling arguments for reviewing unit tests, why have I not reviewed tests as much in the past? The 'excuses' I have used seem to be due to incorrect perceptions on my part.

As an architect, I often do what I call architect-level reviews, focusing on the overall structure of the production code and its non-functional attributes. I am often not familiar with the detailed functional requirements the code is trying to satisfy, so I don't bother looking at the details of the test code, since I usually can't confirm whether the expected results match what the requirements specify. But wait - if I can still effectively review the production code without knowing the detailed requirements, then I can perform the same type of review of the test code, examining its structure, readability, and coverage rather than its functional correctness.

I focus first on production code due to my underlying belief that correct and clean production code is more important than test code. In one sense, this is true. But there are other viewpoints. One view is that if it isn't tested, it doesn't work. Another is that the long-term vitality of the production code over multiple releases depends on a solid, well-written suite of automated tests.

Related to the prior error in perception is the issue of schedule and budget constraints. Given limited time or effort, where should I allocate my time during a review? If my priority is production code, then I focus on that and neglect the test code. But this very question contains a flawed assumption: that I should sacrifice the thoroughness of the review to other constraints. The defects I fail to detect as a result of this sacrifice will likely be caught and fixed later at much greater expense. Reviews are one of the most effective means of defect removal - they typically find defects at about twice the rate of most kinds of testing, and the defects they find cost roughly half as much to fix.

Conclusion

While I hope this article helps convince you of the importance of reviewing test code, I wrote it just as much for myself, to reinforce my own commitment to doing these reviews. Here are some pointers for getting started:

  • Use a code coverage tool like Cobertura to review how well the tests cover the production code.
  • Review the tests first, before reviewing the production code.
  • Use the following questions as a guide to reviewing test code:
    • Are the requirements fully tested? Are there any omissions of functionality?
    • Are the tests weak or fragile?
    • Are the tests readable and the code clean (e.g. no duplication)?


2 Comments on “Reviewing Automated Test Code”

  1. James Mills says:

    Awesome article. Absolutely hit the nail on the head.

  2. Thanks for the comment. (You were the respected colleague, so thanks for your guidance.)
