The myth of validating what your application should NOT do

April 24, 2020

It is usually after a disaster (small or big) that people start proclaiming:

The application should NOT crash!
The application should NOT block waiting for that downstream service!
The application should NOT allow unauthorised access!

And similar statements describing what the application should NOT do instead of what it should.

After such a proclamation, and since the statement (or at least the intention behind it) is reasonable, people are left trying to figure out how to achieve it.

In this blog post, I would like to share my view on the futility of negative validation, and on how pursuing it may end up harming rather than benefiting your software delivery capability.

Deduction vs Induction

First, some philosophy:

Deduction is the process of reaching a specific conclusion based on general and absolute truths. Example: all employees of “ACME Labs” work from home this week, Anna works for “ACME Labs”, thus Anna works from home this week.
Induction is the process of taking many particular instances and trying to figure out the general truths. Example: all the cars that I’ve observed in my country drive on the left side, thus in my country everyone drives on the left side.

The catch is that deduction is logically certain but induction is generally uncertain. Example: All swans ever observed are white, thus all swans are white[1]. Here we see how an induction process can lead us to the wrong conclusion.

The kingdom of deduction is Mathematics, where axioms provide the base from which to deduce (“prove”) theorems, and those theorems can then be used to prove further theorems. Unfortunately for us humans, reality is not obligated to reveal its laws to us. So, we are left to piece together the general truths using our powers of induction.

Software testing

Automated software testing is an inductive process. We put the application (or part of it) under a test harness and pass it some input, so that we can assert that the application does the right thing under the given conditions. By running many tests under different conditions, we conclude that the application will also do the right thing in production! The more tests we have, the more confident we can be about our conclusions.

In those tests, we can have both positive and negative assertions, making sure that what we wanted to happen actually happened and that what we wanted to avoid did not happen. We can have a ton of assertions ensuring that every aspect of the system behaves as expected! Problem solved, right?
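To make the two kinds of assertion concrete, here is a minimal sketch in plain Python. The `Account` class and its `withdraw` method are hypothetical, invented only for this illustration.

```python
class Account:
    """Hypothetical account, used only to illustrate the two kinds of assertion."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def test_withdraw_within_balance():
    account = Account(balance=100)
    account.withdraw(30)
    # Positive assertion: what we wanted to happen did happen.
    assert account.balance == 70
    # Negative assertion: what we wanted to avoid did not happen.
    assert not account.balance < 0


def test_overdraw_is_rejected():
    account = Account(balance=100)
    try:
        account.withdraw(130)
    except ValueError:
        rejected = True
    else:
        rejected = False
    # Positive assertion: the withdrawal was rejected.
    assert rejected
    # Negative assertion: the balance did not change.
    assert not account.balance != 100
```

Both tests pass, but note that each one speaks only about a single starting balance and a single amount.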

Here is the catch: even though our stakeholders used the word “NOT”, what they really meant was “NEVER”!

It is not about how precisely you can assert a given state, but rather about how many states you assert.

It is not about NOT, it is about NEVER!

Let’s dig a bit deeper into the intention behind the earlier proclamations:

The application should NEVER crash!
The application should NEVER block waiting for that downstream service!
The application should NEVER allow unauthorised access!

Now you can probably spot the problem: it is not about the number of assertions, but rather about the number of different conditions under which we need to test the application to ensure that it always behaves correctly.

Each test is asserting that the application is behaving correctly only under a very specific set of conditions. How can we draw a conclusion for all possible conditions? Depending on the domain, the number of conditions can vary from unfathomably large to infinite.
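As a small illustration of that gap, consider the sketch below. The `average` function is a hypothetical stand-in for any code that "should NEVER crash", and the sampled inputs are arbitrary choices made for this example.

```python
def average(values):
    """Hypothetical function that 'should NEVER crash'."""
    return sum(values) / len(values)


def test_average_under_sampled_conditions():
    # Each assertion covers exactly one condition out of infinitely many.
    assert average([1, 2, 3]) == 2
    assert average([10]) == 10
    assert average([-4, 4]) == 0
    assert average([0.5, 1.5]) == 1.0


def crashes(values):
    """Return True if average() raises for the given input."""
    try:
        average(values)
    except Exception:
        return True
    return False
```

The test passes, so by induction we might conclude that `average` never crashes. Yet `crashes([])` returns `True`, because an empty list leads to a division by zero. Adding a test for the empty list closes that one gap, but it says nothing about the next condition nobody thought of.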

Deciding that a requirement ALWAYS holds by using an inductive process such as automated testing is simply impossible.

In critical software that must always behave as expected, formal verification methods[2] are used in addition to testing.

Trying to achieve such an outcome can have negative consequences:

The more tests you have, the slower your pipelines are going to be, creating a bottleneck in your development process.
As you strive to test under more and more exotic conditions, the probability of creating flaky tests increases.
Software is always in a state of change. If even small changes require many tests to be updated, your delivery performance is going to suffer.

Let the impact guide you

As I mentioned in the beginning, it is usually after a disaster that people react by stating in absolute terms what the software must not do. But we need to take a step back and figure out another way to avoid disasters.

The size of the disaster depends on its impact. The bigger the business impact the bigger the disaster. Not all business processes supported by the application have the same value. Actually, the value probably follows a power law distribution with a few processes capturing most of the value, and many more with only a little value.

A better approach would be to identify the processes with the most value, so that those scenarios can be tested more thoroughly.
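One possible way to make that prioritisation explicit is sketched below. The process names, their relative values and the 80% threshold are all made up for illustration; in practice they would come from your own business data.

```python
def most_valuable(processes, share=0.8):
    """Return the top-valued processes that together cover `share` of total value."""
    total = sum(processes.values())
    ranked = sorted(processes.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for name, value in ranked:
        if covered >= share * total:
            break
        selected.append(name)
        covered += value
    return selected


# Made-up relative values, for illustration only.
processes = {
    "checkout": 600,
    "search": 250,
    "wishlist": 60,
    "newsletter signup": 50,
    "profile avatar upload": 40,
}

print(most_valuable(processes))  # ['checkout', 'search']
```

With these numbers, two of the five processes carry 85% of the value, so they are the ones that deserve the most thorough test scenarios.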

The other important aspect of a disaster is its duration. Modern software development strategies focus more on improving the “Mean Time To Repair” over the “Mean Time Between Failures”.

Scaling groups, automatic health checks and fast pipelines are some examples of tools and practices that reduce the duration of outages and sometimes even automate the recovery process.
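A rough way to see why repair time matters so much is the common steady-state approximation of availability, MTBF / (MTBF + MTTR). The sketch below applies it to made-up numbers; the figures are assumptions chosen only to show the effect.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Made-up figures, for illustration only.
baseline = availability(mtbf_hours=500, mttr_hours=2)
fewer_failures = availability(mtbf_hours=1000, mttr_hours=2)  # failures half as often
faster_repair = availability(mtbf_hours=500, mttr_hours=1)    # repairs twice as fast

print(f"baseline:       {baseline:.4%}")
print(f"fewer failures: {fewer_failures:.4%}")
print(f"faster repair:  {faster_repair:.4%}")
```

Under this approximation, halving the repair time yields the same availability as doubling the time between failures. The difference is that repair time is something tooling and practices can directly influence, whereas NEVER failing is out of reach.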

Links