Monday, 13 July 2015

The test first attitude

Problem statement:

I want to ensure a number is within the range 5 to 9, inclusive.

Thinking test first

I can produce 5 test cases for this very simple problem:

A test of a number less than 5, e.g. 2. Returns false.
A test of the left bound, 5. Returns true.
A test of a number in the middle of the range, e.g. 7. Returns true.
A test of the right bound, 9. Returns true.
A test of a number greater than 9, e.g. 15. Returns false.

Taking a step back, if we want to be really thorough, we could write the following tests as well:
A test just outside the left bound, 4. Returns false.
A test just inside the left bound, 6. Returns true.

A test just inside the right bound, 8. Returns true.
A test just outside the right bound, 10. Returns false.

I define an interface to the method.

public boolean ensureNumberIsIn5To9Range(int numberToTest);

First I code up the 9 tests against the specified interface.
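To make the test-first step concrete, here is a minimal sketch of the nine cases as one data-driven table. It assumes plain Java with no test framework: the method under test is passed in as a java.util.function.IntPredicate, and the names RangeTests and countFailures are hypothetical stand-ins for what would normally be a JUnit test class. The stub throws UnsupportedOperationException, so all nine tests fail until solution code exists.

```java
import java.util.function.IntPredicate;

public class RangeTests {
    // The nine cases from the lists above, as parallel arrays: input and expected result.
    static final int[] INPUTS = {2, 4, 5, 6, 7, 8, 9, 10, 15};
    static final boolean[] EXPECTED = {false, false, true, true, true, true, true, false, false};

    /** Runs every case against the method under test and returns how many failed. */
    static int countFailures(IntPredicate methodUnderTest) {
        int failures = 0;
        for (int i = 0; i < INPUTS.length; i++) {
            boolean passed;
            try {
                passed = methodUnderTest.test(INPUTS[i]) == EXPECTED[i];
            } catch (RuntimeException e) {
                passed = false; // an unimplemented method counts as a failing test
            }
            if (!passed) {
                failures++;
                System.out.println("FAIL: input " + INPUTS[i] + ", expected " + EXPECTED[i]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Before any solution code exists the method is only a stub, so every case fails.
        IntPredicate stub = n -> {
            throw new UnsupportedOperationException("not implemented yet");
        };
        System.out.println(countFailures(stub) + " of " + INPUTS.length + " tests failed");
    }
}
```

Keeping the cases in one table makes it cheap to add the four "really thorough" boundary tests alongside the original five.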

Execute all tests. They fail.

Now I start the solution code... I'm done when all tests pass.

I come up with a pretty standard if statement, using logical AND and two return statements. The tests all pass.

However, I now realise I could make this more concise and easier for the reader. So I refactor the code to simply return the result of the logical AND to the caller. The tests all pass.
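The post doesn't show the code, so here is a minimal sketch of what the two versions described above might look like. They are written as static methods for brevity (the interface above declares an instance method), and the name ensureNumberIsIn5To9RangeFirstDraft is hypothetical, used only to keep both versions side by side.

```java
public class RangeCheck {
    // First version: an if statement using logical AND, with two return statements.
    public static boolean ensureNumberIsIn5To9RangeFirstDraft(int numberToTest) {
        if (numberToTest >= 5 && numberToTest <= 9) {
            return true;
        }
        return false;
    }

    // Refactored version: return the result of the logical AND directly.
    public static boolean ensureNumberIsIn5To9Range(int numberToTest) {
        return numberToTest >= 5 && numberToTest <= 9;
    }

    public static void main(String[] args) {
        // The same nine inputs must give the same answers before and after the refactor.
        int[] inputs = {2, 4, 5, 6, 7, 8, 9, 10, 15};
        for (int n : inputs) {
            System.out.println(n + " -> " + ensureNumberIsIn5To9Range(n)
                + " (first draft: " + ensureNumberIsIn5To9RangeFirstDraft(n) + ")");
        }
    }
}
```

Because the refactor only changes the internals, the nine tests run unchanged against both versions, which is what makes the refactor safe.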

Happy days.

Now I check the code coverage, just to be sure it meets targets. 100%. I'm done. Check the whole lot in.

Thinking solution first

I'm going to promise to write some tests to get us code coverage, with a target of 80%. Management love code with 80% coverage.

I will just code up an if statement with two boundary clauses ANDed together. Simple. I'll then write a test that triggers the boundary condition, and another test that doesn't, so happy out.

Next I'll have a think about the interface, it's gonna be pretty simple in this case.

Open up my IDE. Fire in the interface and the algorithm, and now I'll think about tests.

So looking at my code, I can see the branch (the if statement). The first test I write just takes any number outside the boundary condition. So I trigger the method with 15, expecting false. I run it, it passes and I get 75% code coverage. Wow, I'm nearly there with just one test!

If I trigger the boundary condition on the if statement, then I can increase this figure. So I'll write one more test. This time I'm going to pass 7, right in the middle of the range, so I really expect it to return true.

I execute it. It passes. 100%!!!!!!! Happy days. Check the whole lot in. Home time.

But wait...
Did you spot the bug in the solution-first approach? The upper boundary has been incorrectly coded: values of 10 will return true. It's a good thing this code wasn't used to control an automobile safety system, an aircraft or a train!
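The post doesn't show the faulty code, so the sketch below is a hypothetical reconstruction: one plausible way the upper bound could be miscoded (<= 10 instead of <= 9). The class name BuggyRangeCheck is illustrative. The two coverage-driven tests (15 and 7) both pass, while the "just outside the right bound" test with 10 from the test-first list exposes the bug.

```java
public class BuggyRangeCheck {
    // Hypothetical reconstruction of the solution-first code: the upper bound is
    // written as "<= 10" instead of "<= 9", so 10 is wrongly accepted.
    public static boolean ensureNumberIsIn5To9Range(int numberToTest) {
        return numberToTest >= 5 && numberToTest <= 10;
    }

    public static void main(String[] args) {
        // The two coverage-driven tests both pass...
        System.out.println("15 -> " + ensureNumberIsIn5To9Range(15) + " (expected false)");
        System.out.println(" 7 -> " + ensureNumberIsIn5To9Range(7) + " (expected true)");
        // ...but the "just outside the right bound" boundary test exposes the bug.
        System.out.println("10 -> " + ensureNumberIsIn5To9Range(10) + " (expected false)");
    }
}
```

The last line printed is `10 -> true (expected false)`: a single boundary test catches what 100% coverage missed.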

So I've got 100% coverage, but I'm still leaking bugs. Why?

The set of useful unit tests

We explored the set of useful tests in another article. In this simple example, the set of useful tests consists of 9 tests, and I should implement all 9 to ensure the code is correct. The set of tests that gives 100% code coverage is just 2 tests, a small minority of the tests I actually need. Hence when reality throws in something we didn't test for, we find bugs, even though we have 100% code coverage. Even in this simple application, testing for coverage leaves a lot of room for bugs.

Spending time writing more automated tests around the top 20% of use cases your users actually exercise will give you a much greater reduction in bug count in future releases. Spending any time increasing code coverage when that 20% of your code base isn't well tested is waste.

Tuesday, 7 July 2015

Defining a good Unit Test suite

I measure test suites against 6 main criteria. The criteria are pretty hard and fast, and there are key indicators to measure them. There is also an AND relationship between them: a good suite meets all six, so if you can tick 5 out of the 6, the remaining one still needs to be addressed.

  1. Trust. Tests pass when the component is OK.
  2. Comprehensiveness. The majority of the ways the component is used are covered by the tests.
  3. Correct level of abstraction. Tests should be written to a stable, well-defined interface. Unit tests facilitate refactoring.
  4. Language. Tests should match the language of the problem.
  5. Reliability. Tests fail only when the code is not OK.
  6. Independence. Tests should be independent of other tests, methods and classes, in a pragmatic way, i.e. each test should only use methods that are "well used" in the public domain. This does not include data-driven approaches.

What defines a useful test suite is:

  1. A developer can be pretty sure, once the unit test suite passes, that no other functional issues will be found. We are happy to release the product after the automated suite passes.
  2. The majority of problems are found at the unit-test level. Our Fault Slip-through Analysis of our bugs indicates that the majority of bugs are found at the right level of test.
  3. The unit tests are a vital tool to help refactoring. I can do multiple run-test, refactor, run-test cycles without making a change to the tests. The unit tests are written towards the "thing" wrapped in an interface, and not just any method or any class.
  4. They reflect the language of the problem definition, i.e. they use the terminology the customer used. Ideally, customers should be able to understand the tests.
  5. When a test case fails, it points to an actual problem in the component.
  6. Test cases shouldn't change when we change or extend the system, therefore I can trust them. Test cases are the guarantee that what worked yesterday still works today. If our tests reference common methods and utility classes that change as the system grows, then we cannot depend on our tests. In other words, if I change my test code, who will test my tests?
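As an illustration of points 3 and 4, here is a minimal sketch in plain Java, assuming no test framework. The RangeChecker interface wraps the method signature from the range-check example; IfStatementChecker, ExpressionChecker and the two test methods are hypothetical names. The tests talk only to the interface and are named in the language of the problem (bounds, range), so swapping one implementation for the other requires no change to the tests.

```java
public class InterfaceLevelTests {
    // The stable, public interface the tests are written against.
    interface RangeChecker {
        boolean ensureNumberIsIn5To9Range(int numberToTest);
    }

    // Two implementations with different internals; the tests below don't care which one they get.
    static class IfStatementChecker implements RangeChecker {
        public boolean ensureNumberIsIn5To9Range(int numberToTest) {
            if (numberToTest < 5) return false;
            if (numberToTest > 9) return false;
            return true;
        }
    }

    static class ExpressionChecker implements RangeChecker {
        public boolean ensureNumberIsIn5To9Range(int numberToTest) {
            return numberToTest >= 5 && numberToTest <= 9;
        }
    }

    // Test names use the vocabulary of the problem statement, not of the solution.
    static boolean numbersAtEitherBoundAreAccepted(RangeChecker checker) {
        return checker.ensureNumberIsIn5To9Range(5) && checker.ensureNumberIsIn5To9Range(9);
    }

    static boolean numbersJustOutsideTheRangeAreRejected(RangeChecker checker) {
        return !checker.ensureNumberIsIn5To9Range(4) && !checker.ensureNumberIsIn5To9Range(10);
    }

    public static void main(String[] args) {
        RangeChecker[] checkers = {new IfStatementChecker(), new ExpressionChecker()};
        for (RangeChecker checker : checkers) {
            System.out.println(checker.getClass().getSimpleName() + ": "
                + numbersAtEitherBoundAreAccepted(checker) + ", "
                + numbersJustOutsideTheRangeAreRejected(checker));
        }
    }
}
```

Neither test mentions an if statement, a helper method or a design pattern, so refactoring the internals never forces a test to change.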

The "smells" that indicate a unit test suite is useless:

  1. Developers don't trust the unit tests to verify the component. This means, more or less, that a developer isn't confident enough to release the component based on the unit tests alone. We require a manual test before we are confident to release the product.
  2. The majority of problems are being found in later stages of testing. Our Fault Slip-through Analysis is showing large numbers of bugs appearing in later phases of test that could have been found in earlier phases.
  3. The unit tests are written at too low a level and now hinder refactoring. I change the internals of a component and several unit tests no longer compile, never mind run. Every method of every class has at least one unit test associated with it. Worse still, methods that should be private are public to enable testing!
  4. They reflect the terminology of the code - we see language of the solution in the tests. For example things like factories or other design patterns start appearing in the tests.
  5. Test cases regularly fail at random times during various runs. Failures are "false" because they were caused by some environmental or platform problem. For example a database service we needed wasn't started or the disk was full.
  6. All my tests depend on a test utility method I wrote a good while back and this utility method needs to be regularly changed when we add new features. Most times I add new tests, I have to change the utility method, causing a subtle change in all my tests.

Updated 26th May, 2016

Updated 3rd June, 2016

Updated 23rd August, 2016

Updated 4th September, 2017

Updated 26th October, 2017

Wednesday, 1 July 2015

The set of useless unit-tests

This diagram depicts the sets of unit tests that exist.

What's immediately obvious is that tests are either considered useful or useless. There is no common ground. It's obvious that the set of useful unit tests is smaller than the set of useless unit tests, but not by much. 

Also, you can see that you only need a fraction of the tests to get maximum code coverage: the set in blue. This blue subset is only a fraction of the tests in either set.

What's not obvious is that useful tests are more difficult to write because they require a good understanding of the problem, while useless unit tests are relatively easy to develop because they require only an understanding of the code. For most developers, it's far easier and less hassle to focus on the code, as opposed to the problem we are trying to solve.

If you just mandate developers to write unit tests, without much support from the people who specify the problem, inevitably they will end up writing tests from the set on the right. These tests, while you think you want them, are pure waste: extra lines of code. At worst you have to maintain them, and at best you have to be aware of them.

If you then mandate your developers to target code coverage, they will focus on the subset in blue, in the set on the right. This isn't the worst thing in the world. It means they will write fewer wasteful tests, and that's not a bad thing. However, those tests are still waste.

If you have developers who are mature and experienced enough to write their tests in the set on the left, then you are really lucky. They are a very rare resource. The more of this set we complete, the better. We will have fewer bugs in the future.

When you set a coverage target while writing great tests, there is a danger you write the least valuable tests. The 80:20 rule tells us that 20% of our code is run 80% of the time by our users, or that users need just 20% of a component 80% of the time. So, thinking of our users, we need tests that verify this 20% of the code base works, all the time. In other words, you get most of the value while ignoring 80% of the code base.

So if time is a limiting factor and you are writing good tests, targeting code coverage isn't a good idea because it means that you are writing the least valuable tests.

Fragile Tests