Code Coverage, Metrics & Cyclomatic Complexity
Arguably a controversial one this time. How valuable are cyclomatic complexity and code coverage metrics?
These two concepts are not entirely unrelated. As it happens, I am a fan of both in the right place, since path coverage calculations ultimately use elements of cyclomatic complexity to calculate the paths through the programme and check that each one has been covered. First, a recap.
Code Coverage (aka Test Coverage)
This is the amount of the source code of a programme which is covered by a test suite. There are a few traditional breakdowns of this, including:
- Statement coverage - How much of the [executable] source code of a programme is touched by tests
- Path coverage - Perhaps the more interesting metric: the number of paths through the programme which are exercised by tests.
There is one crucial thing to note here: you cannot have path coverage without statement coverage! However, you can certainly have statement coverage without path coverage. For example, a statement could call a method without all branches within that method being tested: statement coverage reaches an IF statement, takes one branch, and never goes further into the nesting, yet the IF statement itself still counts as covered. If your tool measures code coverage using statement coverage alone, you don't have anywhere near enough confidence that your code is free of bugs caused by missing tests in the test suite.
The significant measure of test coverage is path coverage, not statement coverage; path coverage gives you everything statement coverage does and more. A lot of commentators have made the sweeping statement that test coverage is useless, but what they're actually saying is that statement coverage is the weaker form of test coverage. In mathematical terms, regard it as a necessary but not sufficient condition. People also wrongly associate statement coverage with quality, and one thing it is not is a measure of quality. More on the difference between path and statement coverage here.
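To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical names, since the article's examples are C#) of a function that a single test covers 100% by statement count while still missing a path:

```python
def apply_discount(price, is_member):
    """Apply a members-only discount; illustrative only."""
    total = price
    if is_member:
        total = price * 0.9  # 10% member discount
    return total

# One test touches every statement (statement coverage: 100%)...
assert apply_discount(100, True) == 90.0

# ...but the is_member=False path was never exercised.
# Path coverage demands a second test:
assert apply_discount(100, False) == 100
```

A statement-coverage tool would report the first assertion alone as full coverage, yet the untested branch could hide a bug (imagine the non-member path accidentally applying the discount too).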
Paths, Paths and More Paths
Consider the following example C# code.
public bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
{
    return ( age > 17 ) && hasLicense
        && ( carTaxExpiry >= DateTime.Today ) && hasInsurance;
}
This code snippet needs 16 different test cases to cover all combinations of:
- Age - Aged 17 or under, or over 17
- License - Has and hasn't got a license
- Car Tax - Expired or not
- Insurance - Driver insured or not
That is 2^4 = 16 combinatorial possibilities; hence, full coverage is 16 tests. Statement coverage, by contrast, registers 100% after just one test. That isn't sufficient to exercise all possibilities, even if some of them are invalid.
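As a sketch (ported to Python with hypothetical names, since the original is C#), enumerating the combinations shows the count directly:

```python
from datetime import date, timedelta
from itertools import product

def is_legal_driver(age, has_license, car_tax_expiry, has_insurance):
    """Python port of the C# IsLegalDriver snippet (illustrative only)."""
    return (age > 17 and has_license
            and car_tax_expiry >= date.today() and has_insurance)

today = date.today()
expired = today - timedelta(days=1)

# All 2^4 = 16 input combinations across the four binary conditions.
cases = list(product([17, 18], [False, True], [expired, today], [False, True]))
print(len(cases))  # 16 combinations in total

# Only one combination makes every condition true.
legal = [c for c in cases if is_legal_driver(*c)]
print(len(legal))  # exactly 1 combination returns true
```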
Expand the Graph
For those who have written languages and compilers (or at least syntax analysers) in their time, you'll know that statements can effectively be expanded into a syntax tree. In a similar way, the above return statement can be expanded through its syntax tree and then, with the introduction of the terminal characters, become a series of subtrees which can be combined into a whole complex tree of possibilities.
To illustrate this, consider the tree from the point of view of the branch (IF) statements, which effectively create the following 4 subtrees.
Now, start to combine these as you read the RETURN statement from left to right (bearing in mind the return is based on the AND of these, so the optimised code resolution path looks like):
(optimised tree of the RETURN statement - When one AND is false, entire RETURN statement is false)
However, the tree only has 5 terminal points, right? So why 16 tests? Well, the clue is in the caption. Remember that an AND statement only requires one of its binary inputs to be false for the whole statement to be false. What the figure of 16 tests gives you is the number of paths needed to test all unoptimised paths, regardless of validity. De-optimising this tree gives us the full control flow through the programme.
This time, we have the full gamut of all 16 endpoints, one for each test! As you can see, it's a combination of all IF statement resolutions of TRUE and FALSE. After all, it's the terminal states we're interested in (they are the post-conditions/Assertions). It's tests for the positive and negative paths through the system. Does this mean the previous tree with 5 terminal points is useless? Far from it.
Understanding the Role of Cyclomatic Complexity
You might be asking yourself where this fits in. If not, then you might want to voice that question in your head. It's OK, I'll wait.
The cyclomatic complexity of a system measures the number of linearly independent paths of control through the application. The most famous measure of cyclomatic complexity is McCabe's, developed in 1976 (http://en.wikipedia.org/wiki/Cyclomatic_complexity), and in software this maps to:
C = E - V + 2P
Where:
C = Cyclomatic Complexity
E = Number of edges in the control flow graph (the branches and lines of a piece of code)
V = Number of vertices in the graph (the statements)
P = Number of connected components: programmes, methods or functions in the flow. For a single method, this equals 1.
For the previous RETURN statement, expanded in IF statement form, the cyclomatic complexity is:
E = 9 (8 branches and lines, plus 1 for the entry point to the method)
V = 6 (all statements, plus the entry point and the exit point RETURN)
P = 1 (a single method)
So C = 9 - 6 + (2 x 1) = 5. Recognise that number? Yes, it's the number of post-conditions (end points) in the middle graph.
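The arithmetic can be sketched directly (Python, using the figures from above as inputs):

```python
def mccabe_complexity(edges, vertices, components):
    """McCabe's cyclomatic complexity: C = E - V + 2P."""
    return edges - vertices + 2 * components

# Figures for the expanded IF-statement form of the RETURN statement:
# 9 edges (8 branches/lines plus the entry edge), 6 vertices
# (statements plus entry and exit points), 1 method.
c = mccabe_complexity(edges=9, vertices=6, components=1)
print(c)  # 5 -> matches the 5 terminal points of the optimised tree
```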
Why are they different?
This may sound daft, but materially they are not! If you look at the number of tests we're running, a lot of them assert against the same end result. Specifically, any path returning FALSE does so for exactly the same reason: it failed one portion of the AND statement in the return. It doesn't matter whether one, two or three subconditions evaluated to false; effectively, they are the same test assertion (i.e. return FALSE).
What is 100% coverage?
This is where it gets interesting. 100% coverage should be the number of tests required to cover the whole control flow of the programme. However, using the example, people often confuse this with having to cover that return statement with 16 tests rather than 5! 16 is the maximum number of tests you could write to cover it. This often matches exploratory testing techniques, since you have to fill in all combinations of data to discover that there are only 5 relevant execution paths anyway. The 5 is the minimum size of any test set that covers the code 100%.
Why is that? I'll cover the mathematical treatment in a future post, which will also introduce ways to determine the number of tests you actually need. In short, though, it all revolves around the AND statement. Any one of its subconditions being false allows the method to return FALSE, so the internal control flow can short-circuit and return FALSE without evaluating anything in the AND chain after that point. However, there is only one combination that allows it to return TRUE. This is why you only need 5 tests instead of 16.
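A quick sketch of that short-circuiting (Python again, with hypothetical names, mimicking C#'s && operator): instrumenting each condition and walking all 16 input combinations shows only 5 distinct execution paths, matching the cyclomatic complexity:

```python
from datetime import date
from itertools import product

def legal_driver_path(age, has_license, car_tax_expiry, has_insurance):
    """Return the sequence of conditions actually evaluated,
    mimicking C#'s short-circuiting && (illustrative only)."""
    path = []
    for name, ok in [("age", age > 17),
                     ("license", has_license),
                     ("tax", car_tax_expiry >= date.today()),
                     ("insurance", has_insurance)]:
        path.append(name)
        if not ok:           # && short-circuits on the first false input
            return tuple(path)
    return tuple(path + ["TRUE"])

today = date.today()
# Walk all 16 input combinations, but record only the distinct paths taken.
paths = {legal_driver_path(a, l, t, i)
         for a, l, t, i in product([17, 18], [False, True],
                                   [date.min, today], [False, True])}
print(len(paths))  # 5 distinct paths: four early FALSE exits plus one TRUE
```

Four of those paths end in FALSE (one per subcondition that can fail first) and one ends in TRUE, which is exactly the 5 terminal points of the optimised tree.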
If you consider all the test sets that offer 100% coverage, you only need to test to the 100% point and that's it (it's the minimum you want, not the maximum). Covering the other evaluations of the AND just duplicates the Assert.IsFalse(...) tests, which is near enough pointless.
Conclusion
I personally find test coverage metrics extremely important. They cover off the number of tests you should have throughout all suites. As you sail through the sea of a development programme, they are the robustness in your regression toolkit, and you'll need those tools to fix bugs found in your system. When water starts to seep into the boat your system is afloat in, the lower the coverage, the more holes your bailing bucket has, the less water you can bail out of your canoe and the more likely you are to sink. Appropriate regression suites offer a shield against the slings and arrows of outrageous misfortune; you're more likely to find out if shooting Jaws shot a hole in your bucket too.
Coverage metrics add both governance and risk management to a code-base. If someone says to you "Code coverage metrics don't define software quality", I'd agree on the semantics, since coverage is not software quality, but I'd also argue that indirectly it can very definitely show you where there are holes in your process, and those holes are the problems most likely to introduce poor-quality software into the business and, crucially, show it to customers. Hence, where systems have value, don't skimp! Cyclomatic complexity should match the number of tests you have for the main control flow (obviously, add more for exceptions as needed). If they don't match, then you're either missing tests, or you've likely got duplication.
Happy TDD'ing!