OTTERs and the Theory of Automatic Testgeneration

Creating test cases by hand can be a lot of effort. It takes time, and so costs plenty of money. It is estimated that testing on average costs roughly 50% of the project budget.So maybe, we could try and skip it? Well, we still need to test and, among other things, make sure that the system behaves in the way we specified. But maybe we can develop an automatic method for creating tests? And this is the core idea: Why not use the specification to generate the tests?

 This is by no means an easy task and today we will only scratch the surface of what is possible. So to keep it simple, we restrict ourselves to simple systems that get a couple of booleans as an input and return a single boolean as an output.This could be a setting, where we want to use simple buttons to control a light or something similar. Think of a system with two buttons (A and B) and a light: The light should only be turned on when both buttons are pressed. Someone implements a function that controls the light given the states of the two buttons and it is our task to create tests, that should in some sense verify that the system behaves correctly.

One possible implementation may look like this:

foo(A, B) {
    if (A) {
        if(B) {
            return true;
        } else {
            return false;
        }
    } else {
        return false;
    }
}

The easiest way would be to simply have a test for every possible input and compare it with the expected output. In this case this would be quite simple since there are only four different input combinations. It also would have the nice effect that it would show that the function behaves for all inputs the way it was specified. In practice this is done only in rare occasions, since the number of tests doubles for each parameter, making testing for all possible inputs difficult.Another popular option is to have a test for every path through the program: This means that if there is an if/else condition, both the if and the else block are each executed by at least one test. In this case there are three different paths, and there are two possible testsuites for this function: 

ABfoo(A,B)
000
100
111

and

ABfoo(A,B)
010
100
111

Depending on the complexity of the function this may already help a lot, but let’s make it a bit more difficult. Say we don’t get to look at how the function was actually implemented (maybe because we want to do test driven development and create tests before the code is written). So we have to treat the function as a black box. We can run the function for different inputs but we can’t look inside. We know its behaviour, but we don’t know its inner workings. Testing all possible paths through the program is now no longer possible. Or is it?Well, since this blog post isn’t finished yet, you can probably guess what comes next.

Probing Probable Program Paths

Since we now no longer have access to the code we have to think of something else. So we turn back to our system specification. There are only so many ways one may implement a system (especially a system as easy as the one we are considering), so maybe we can enumerate those implementations and go back to the all-path testing mechanism…Well… About that…While this would work in theory with a system as easy as this one, this approach has an even worse runtime than simply testing all possible inputs (for those curious: the runtime grows superfactorial meaning that every additional parameter increases the runntime by an exponentially larger factor).So we have to be more clever and make some assumptions about how the program might look like:This assumption will be that the program won’t do any unnecessary variable checks, so if we can implement the specification by only looking at the first parameter, the system will not look at the second. What does this mean for our example system?Well there are two possible “simple” implementations of our specification:

foo(A, B) {
    if (A) {
        if(B) {
            return true;
        } else {
            return false;
        }
    } else {
        return false;
    }
}
foo(A, B) {
    if (B) {
        if(A) {
            return true;
        } else {
            return false;
        }
    } else {
        return false;
    }
}

They only differ in the order we check the variables. We can categorize the input into different classes depending on the program path they will take:Program 1:

  • (1,1)
  • (1,0)
  • (0,*) (We don’t care about B since we return after having only tested A)

Program 2:

  • (1,1)
  • (0,1)
  • (*,0) (We don’t care about A since we return after having only tested B)

Covering multiple programs with one testsuite

Already we can see that even though these two implementations are different, they both contain the (1,1) path. So we can see that different implementations may have similar structures. It makes sense that we have a test for (1,1) since we know all reasonable implementations will have a path that is tested by it.If we look at the two programs long enough we can see that with three tests we can fully test both programs:

  • (1,1) First Path in both Programs
  • (1,0) Second/Third Path in the Programs respectively
  • (0,1) Third/Second Path in the Programs respectively

The Otter and Where to Learn More

But here is the kicker: Using a bit of algorithmic magic, we can always compute such a testsuite that coveres all simple implementations without having to enumerate them. This algorithm is calledOpTimal TEstsuites for Requirements (OTTER for short) and you can read all the details here (be warned it gets quite theoretical). What this means is the we can still apply all paths testing even when we don’t know the actual implementation of the program just by assuming that the programmer will not add any unnecessary tests.

Summary

To reiterate, there are different strategies how to test our code. We have seen why testing all input combinations is not a good idea and looked at a better alternative that simply tests all paths through the program. We then tried to apply this same approach to the area of black box testing, where the actual implementation of the program is not known and saw the difficulties with that. Finally, we looked at an assumption that allowed us to use a new algorithm to generate all-paths testsuites even without knowing the actual implementation. This allows us to do more informed and precise black box testing.