Wednesday, December 16, 2009

Is Test Overlap A Necessary Evil?

In a recent blog post titled "The Limitations of TDD", Jolt Awards colleague Andrew Binstock shared some reservations Cédric Beust has about TDD. When a person of extensive experience like Cédric speaks about testing, you pay attention. And I did.

Among the very interesting quotes from Cédric that Andrew has reproduced, the following really struck me:
Another important point is that unit tests are a convenience for *you*, the developer, while functional tests are important for your *users*. When I have limited time, I always give priority to writing functional tests. Your duty is to your users, not to your test coverage tools.

You also bring up another interesting point: overtesting can lead to paralysis. I can imagine reaching a point where you don't want to modify your code because you will have too many tests to update (especially in dynamically typed languages, where you can't use tools that will automate this refactoring for you). The lesson here is to do your best so that your tests don't overlap.
Trust me, as a test-infected developer, I would love to stay in a state of self-delusion and pretend that test-induced paralysis doesn't exist. But that would be a lie: the reality is grimmer than the wonderland of testing I would wish to live in. The reality is that tests both encourage and resist change.

On the one hand, tests encourage and support refactoring: when the behavior of the application should not change but the code needs to be reorganized, tests are a blessing. They give you the courage to change code because of the immediate feedback they give when you've been refactoring a little too aggressively. And this is priceless.

On the other hand, tests resist behavioral changes. Because tests have captured all the nitty-gritty of your application, when the time comes to change its behavior, you will need to invest time adapting your tests accordingly, whether you rework the tests first or not. As Cédric pointed out, in a dynamically typed language this is immensely painful, as development tools are almost useless in assisting you with the required changes. Similarly, if you use mock objects, you are headed for a deeper Circle of Hell, where more painful and frustrating manual fixes await you.

So, is there any hope of escaping this love/hate relationship? Knowing that "the only way to go fast is to go well", dumping tests altogether is certainly not an option. Could the solution lie in Cédric's very last words: "do your best so that your tests don't overlap"?

At this point, I don't know yet, but I've decided that, as a starting point, I should estimate the amount of overlap I'm dealing with in the Erlang game server I'm working on. Interestingly, what I've found could pretty much apply to the vast majority of Java projects I have previously worked on. Maybe it applies to your projects too?

The first thing I've looked at is the testing overlap that exists between two layers of our application:

As you can see, the overlap exists because tests of the upper layer rely on mocks to simulate all the happy paths and most of the unhappy paths of the underlying layer. The overlap is not total because a layer tends to reduce the granularity of the unhappy paths it faces internally, in order to expose the upper layer to a limited number of bad situations to deal with. Hence the limited number of mocked features in the overlap area.
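To make the overlap concrete, here is a minimal sketch in Python (not the Erlang code of the actual game server). Every name in it (`ScoreStore`, `Leaderboard`, `StoreTimeout`, `StoreCorrupted`, `ServiceUnavailable`) is hypothetical and only serves the illustration. The upper-layer tests restate the lower layer's behavior through mocks, while two fine-grained lower-layer failures collapse into a single coarse one.

```python
from unittest import mock


class StoreTimeout(Exception):
    """Lower-layer failure: the storage backend did not answer in time."""


class StoreCorrupted(Exception):
    """Lower-layer failure: the stored record could not be decoded."""


class ServiceUnavailable(Exception):
    """The single coarse failure the upper layer exposes to its callers."""


class ScoreStore:
    """Lower layer (hypothetical): keeps raw scores per player."""

    def __init__(self):
        self._scores = {}

    def save(self, player, score):
        self._scores[player] = score

    def load(self, player):
        return self._scores[player]


class Leaderboard:
    """Upper layer (hypothetical): built on top of a ScoreStore."""

    def __init__(self, store):
        self._store = store

    def best_of(self, players):
        try:
            scores = {p: self._store.load(p) for p in players}
        except (StoreTimeout, StoreCorrupted) as error:
            # Two fine-grained failures collapse into one coarse failure.
            raise ServiceUnavailable() from error
        return max(scores, key=scores.get)


def test_store_round_trip():
    # Lower-layer test: exercises the real happy path of the store.
    store = ScoreStore()
    store.save("ann", 10)
    assert store.load("ann") == 10


def test_leaderboard_happy_path():
    # Upper-layer test: the mock restates the store's happy path,
    # which is where the overlap with the test above comes from.
    store = mock.Mock(spec=ScoreStore)
    store.load.side_effect = {"ann": 10, "bob": 7}.get
    assert Leaderboard(store).best_of(["ann", "bob"]) == "ann"


def test_leaderboard_unhappy_paths():
    # Both lower-layer failures are simulated, but they map to a single
    # upper-layer outcome, so the overlap is only partial.
    for failure in (StoreTimeout, StoreCorrupted):
        store = mock.Mock(spec=ScoreStore)
        store.load.side_effect = failure()
        try:
            Leaderboard(store).best_of(["ann"])
        except ServiceUnavailable:
            continue
        raise AssertionError("expected ServiceUnavailable")
```

If the contract of `ScoreStore.load` changes, both the store test and the mock setup in the leaderboard tests have to be updated by hand, which is the cost of the overlap.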

When applied to a typical vertical slice of our system, it looks like this:


This is not too bad. Until the wind of feature change comes blowing on this mock-based house of cards, life is peachy.

Until now, the tests I have been looking at were only unit and database ones. If I add our functional tests on top of the overlap diagram, here is what I get:


Now the application container is also tested, plus we get an insane amount of overlap.

But the amount of overlap is not what I want to discuss first: it's the test coverage profile that I want to look at. Notice how the functional tests explore fewer unhappy paths as they exercise deeper application layers. This can be explained simply: some unhappy paths are very hard to reproduce via the reduced set of functionalities exposed at the top level, oftentimes because they require a very specific and complex state to be established beforehand, or conditions that could only be met in case of low-level failures (loss of networking, for example).
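As an illustration of such a low-level unhappy path, the sketch below (Python, with entirely hypothetical names `FlakyTransport` and `send_with_retry`) shows how a unit test can simulate a loss of networking by injecting a failing test double, something that is hard to provoke on demand through the top-level features of a running system.

```python
class FlakyTransport:
    """Hypothetical test double: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def send(self, payload):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated loss of networking")
        return "ack:" + payload


def send_with_retry(transport, payload, attempts=3):
    """Hypothetical deep-layer code: retry a send a bounded number of times."""
    last_error = None
    for _ in range(attempts):
        try:
            return transport.send(payload)
        except ConnectionError as error:
            last_error = error
    # All attempts failed: surface the last network error to the caller.
    raise last_error


def test_recovers_after_two_network_losses():
    transport = FlakyTransport(failures=2)
    assert send_with_retry(transport, "move") == "ack:move"
    assert transport.calls == 3


def test_gives_up_when_network_stays_down():
    transport = FlakyTransport(failures=5)
    try:
        send_with_retry(transport, "move")
    except ConnectionError:
        assert transport.calls == 3
    else:
        raise AssertionError("expected ConnectionError")
```

A functional test would need a way to cut the network at exactly the right moment to cover the same two paths, which is why the coverage profile thins out in the deeper layers.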

It's obviously out of the question to consider dropping functional tests in order to reduce the testing overlap. As Cédric said, they are the only tests that have a true value for the end user of the system. My experience confirms that you can reach a nearly flawless first-time client integration if your functional tests have a coverage profile that is similar to the one in the last figure above.

The only problem lies in the quality of feedback you get from functional testing: because it's impossible to make the gory details of the errors encountered when exploring unhappy paths surface at the uppermost level, your system must have a solid logging strategy that allows you to precisely track issues, should you decide to code using functional tests as your only safety net.
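One possible shape for such a logging strategy, sketched in Python with the standard `logging` module: the deep layer logs the full details under an incident identifier and only that identifier travels upward. The names `load_player`, `RequestFailed` and the `game.store` logger are assumptions made for the example, not part of any real system described here.

```python
import logging
import uuid

logger = logging.getLogger("game.store")  # hypothetical logger name


class RequestFailed(Exception):
    """Coarse error seen at the top level; carries only an incident id."""

    def __init__(self, incident_id):
        super().__init__("request failed (incident %s)" % incident_id)
        self.incident_id = incident_id


def load_player(fetch, player_id):
    """Call the low-level fetch function and log any failure in detail."""
    try:
        return fetch(player_id)
    except Exception:
        incident_id = uuid.uuid4().hex
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception(
            "load_player failed incident=%s player=%s", incident_id, player_id
        )
        # The caller only sees the incident id, not the low-level cause.
        raise RequestFailed(incident_id) from None
```

When a functional test fails with a `RequestFailed`, the incident id in the error message can be searched for in the logs to find the original traceback.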

So are the unit tests overlapped by the functional tests the ones that must go? Cédric again gives the answer: if time is short, it's better to focus on the functional tests. Of course, if you have a battery of unit tests in place, keep them.

But, maybe, just maybe, as you move to your next project, consider writing functional tests first? That way you would build the tests that truly matter first and, if time permits, write unit tests as you implement the features expected by the functional tests.