Test Quality

New advances in text analytics make the tech news nearly every week, most prominently IBM Watson, but more recently also AI approaches such as ELMo or BERT. Text analytics even made world news during the pandemic caused by the Covid-19 virus, when the White House requested help from the NLP community.

Text Analytics and Natural Language Processing (NLP) deal with all kinds of automatic processing of text and are often built on top of machine learning or artificial intelligence approaches. The idea of this article is not to explain how text analytics works, but to show what is possible.

Creating test cases by hand is a lot of effort. It takes time, and therefore costs plenty of money. It is estimated that testing consumes, on average, roughly 50% of the project budget. So maybe we could try and skip it? Well, we still need to test and, among other things, make sure that the system behaves in the way we specified. But maybe we can develop an automatic method for creating tests? And this is the core idea: Why not use the specification to generate the tests?
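To make that idea tangible, here is a minimal sketch in Python of what "generating tests from the specification" could look like at its simplest: pulling "shall" statements out of a spec and turning them into test stubs. The requirement format, the regex, and the stub layout are all illustrative assumptions, not the output of any particular tool.

```python
import re

# Toy specification: one "shall" requirement per line (illustrative format).
SPEC = """
REQ-1: The system shall reject passwords shorter than 8 characters.
REQ-2: The system shall lock the account after 3 failed login attempts.
"""

REQ_PATTERN = re.compile(r"(REQ-\d+): The system shall (.+)\.")

def generate_test_stubs(spec: str) -> list[str]:
    """Turn each 'shall' requirement into a skeleton test case."""
    stubs = []
    for match in REQ_PATTERN.finditer(spec):
        req_id, behavior = match.groups()
        stubs.append(
            f"Test for {req_id}:\n"
            f"  Given the system is running\n"
            f"  When the tester exercises: {behavior}\n"
            f"  Then verify the specified behavior is observed\n"
        )
    return stubs

if __name__ == "__main__":
    for stub in generate_test_stubs(SPEC):
        print(stub)
```

Real approaches are far more sophisticated, of course, but the principle is the same: the more structure the specification has, the more of the test can be derived from it mechanically.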

There are many good reasons to do automated testing at the GUI level. Automated tests are fast, repeatable, and (hopefully) provide reliable results. In the long run, they might even be cheaper than manual testing (the only alternative for GUI testing). Done right, you can even integrate them into your build system, giving you the final verdict that each build behaves as you expect.
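As a rough sketch of what "integrated into your build system" can look like: a GUI test written with pytest and Selenium (one example stack among many; the URL and element IDs below are hypothetical). Since pytest exits with a non-zero code when an assertion fails, the CI server can turn that directly into a red build.

```python
# Minimal GUI smoke test, runnable as: pytest test_login_smoke.py
# Selenium is just one example stack; URL and element IDs are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By

def test_login_page_loads():
    driver = webdriver.Chrome()  # assumes a local chromedriver is installed
    try:
        driver.get("https://example.com/login")  # hypothetical app URL
        # The build's verdict comes from assertions: pytest exits non-zero
        # on failure, which makes the CI build fail.
        assert driver.find_element(By.ID, "username").is_displayed()
        assert driver.find_element(By.ID, "password").is_displayed()
    finally:
        driver.quit()
```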

One Can’t Go Wrong With Test Automation, Right?

All that sounds very promising. However, it’s quite easy to waste all these beautiful advantages of test automation by having a test suite of poor quality:
  • If your automated test suite is fragile and breaks every second time it is executed, testing quickly becomes annoying. And what is the benefit of an unreliable test suite? Would you trust the outcome of such a test suite? (A typical culprit is timing; see the sketch after this list.)
  • If your test cases are labor-intensive to change, you will slow down your development pace. Put differently: the only way to keep your test suite updated at the same pace as you develop new or changed features is to spend more effort on changing your test suite. And soon you will ask yourself: “Why am I spending so much effort on test automation? What has my automated test suite ever done for me?”
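Here is the promised sketch of where fragility typically comes from, again with Selenium as the example stack (locators and timings are invented): a fixed sleep plus a brittle XPath breaks as soon as the page gets slower or its layout changes, while an explicit wait on a stable locator does not.

```python
# A common cause of fragile GUI tests: fixed sleeps and brittle locators.
# (Selenium used as an example; locators and timings are illustrative.)
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def fragile_click(driver):
    time.sleep(2)  # breaks whenever the page takes longer than 2 seconds
    driver.find_element(By.XPATH, "/html/body/div[3]/div/button[2]").click()

def robust_click(driver):
    # Wait explicitly for the element, up to 10 s, and use a stable locator.
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "submit-order"))
    ).click()
```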

Team Foundation Server (TFS) and its software-as-a-service counterpart Microsoft Visual Studio Team Services (VSTS) are widely used application lifecycle management (ALM) and test management tools. They offer many great facilities to create tests, manage test plans, and execute them. Consequently, many of our clients as well as prospective customers want to use our test improvement software Test Scout together with TFS/VSTS to improve their test case quality. So here is the question we always face: How do we get the data from a testing tool into Test Scout? As always in life, there is a straightforward solution and a fancy one. Let me show you what I mean.

First: a simple integration

Test Scout can process almost any kind of text format. So integrating test management tools such as TFS/VSTS is quite straightforward: for each test management tool, we create exports, which we then import into the Scout. For HP ALM, for example, we use a simple script to create a database dump containing all currently existing test cases. We then automatically import and process this data in Test Scout to evaluate the test case quality. Since Test Scout keeps versions of each import in its database, the history of all test cases remains available. Therefore, all features, such as comparing different versions of test cases or tracking their historical development, still work out of the box.
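For TFS/VSTS, such an export can be scripted against the REST API. The sketch below is a rough illustration, not our actual integration code: account, project, and token are placeholders, and the API version and field names may differ for your TFS/VSTS instance.

```python
# Sketch: exporting Test Case work items from VSTS via its REST API,
# to be fed into a text-based import. Account/project/token are
# placeholders; the API version may differ for your TFS/VSTS instance.
import requests

ACCOUNT = "your-account"             # placeholder
PROJECT = "your-project"             # placeholder
PAT = "your-personal-access-token"   # placeholder
BASE = f"https://{ACCOUNT}.visualstudio.com/{PROJECT}/_apis"
AUTH = ("", PAT)  # PAT is sent as the basic-auth password

# 1) Query the IDs of all Test Case work items via WIQL.
wiql = {"query": "SELECT [System.Id] FROM WorkItems "
                 "WHERE [System.WorkItemType] = 'Test Case'"}
ids = [item["id"] for item in requests.post(
    f"{BASE}/wit/wiql?api-version=4.1", json=wiql, auth=AUTH
).json()["workItems"]]

# 2) Fetch the work items and dump title + steps as text.
for start in range(0, len(ids), 200):  # the API limits batch size
    chunk = ids[start:start + 200]
    items = requests.get(
        f"{BASE}/wit/workitems?ids={','.join(map(str, chunk))}"
        f"&api-version=4.1",
        auth=AUTH,
    ).json()["value"]
    for item in items:
        fields = item["fields"]
        print(fields["System.Title"])
        print(fields.get("Microsoft.VSTS.TCM.Steps", ""))  # steps as XML
```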

Integrating ALM tools via exports


For high requirements quality, we need quality assurance. In a previous post, I explained why automatic methods cannot replace manual ones. Instead, I suggested combining both worlds. And the ugly truth is: in both system testing and requirements engineering, we need manual and automatic quality assurance to control requirements quality and test quality. Now you may wonder: how? I've got you covered. In this brief post, I want to point out how you can combine the two worlds and how you benefit from the combination.

Ok, after I publish this blog post, I will probably get some angry calls from my sales department... Well, the truth must be told. There are many crazy defects in requirements, and, as I wrote in my last post, you can detect quite a bunch of them automatically (and you should do so!). When I present our automatic methods for natural language requirement smells or for detecting defects in tests to our customers, I'm proud to say that they are usually very excited. Sometimes they are too excited, and that can turn into a problem: I explain all the amazing things you can detect with tools, and suddenly people think the tool will solve all the problems they face. Spoiler alert: it doesn't. And because we're a company that is interested in happy customers, I want to briefly summarize the problems (at least those that come to my mind) that a tool can't solve. And because I don't want to leave you in despair, I will also suggest how I personally would work on each of them.

Typically, you structure your test suite hierarchically to keep it organized. The way you structure your system test suite has a considerable impact on how effectively and efficiently you can use your tests. A good structure of a system test suite supports:

  • maintaining tests when requirements change
  • determining which parts of your functionality have been tested, and to which degree (coverage)
  • finding and reusing related tests while creating new tests
  • selecting a set of test cases to execute (test plan)
  • finding the root cause of a defect (debugging)
In my opinion, the first two points are the most important, as they touch the core of what system tests should do, namely ensure that the system fulfills its requirements. Of course, each project is different, and no matter which structure I choose, I always run into the "tyranny of the dominant decomposition" in the end (i.e., there is no such thing as THE best way to build a hierarchy). Nevertheless, I have seen a couple of anti-patterns in the past that only very rarely make sense. (The coverage point, in particular, is easy to illustrate; see the sketch below.)
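Once tests are traceable to requirements, coverage gaps fall out almost for free. A toy sketch, with all names and data invented:

```python
# Toy illustration: if tests are traceable to requirements, finding
# coverage gaps is a one-liner. All names and data are invented.
requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}

tests = {
    "login_happy_path": {"REQ-1"},
    "login_lockout": {"REQ-1", "REQ-2"},
    "password_reset": {"REQ-3"},
}

covered = set().union(*tests.values())
print("Uncovered requirements:", requirements - covered)  # {'REQ-4'}
```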

I am a strong advocate of modern, automatic methods to improve our day-to-day work. And so I really don't want to check by hand whether my tests and requirements fit my template, or whether my sentences are readable. Quality assurance and defect detection, for example reviews or inspections, should therefore use automation as far as possible. BUT: when I speak to clients, people sometimes get so hooked on the idea of automatic smell detection that I need to slow them down. Therefore, this post tries to give a rough overview: what can be detected automatically? The answer basically depends on two questions:

  1. How much syntax (or structure) do your artifacts and tests have?
  2. Which language do you use?
In this post I will refer to requirements artifacts here and there, but the answers are pretty much the same for system tests and requirements alike.
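To give a taste of the low end of that spectrum: for plain English text with no structure at all, even a simple wordlist check already finds classic "vague word" smells. The sketch below uses an illustrative subset of such a wordlist; real checks are considerably more elaborate.

```python
import re

# Illustrative subset of "vague word" smells in English requirements/tests.
VAGUE_WORDS = {"appropriate", "as needed", "fast", "easy", "some",
               "several", "etc", "user-friendly"}

def find_smells(text: str) -> list[tuple[int, str]]:
    """Return (line number, vague word) pairs found in the text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in VAGUE_WORDS:
            if re.search(rf"\b{re.escape(word)}\b", line, re.IGNORECASE):
                findings.append((lineno, word))
    return findings

print(find_smells("The system shall respond fast.\nLog out as needed."))
# [(1, 'fast'), (2, 'as needed')]
```

The more syntax your artifacts have on top of plain text (templates, fields, step structure), the more precise such checks can become.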

At Qualicen, it's often my job to check other people's system test cases and tell the team what I think about these tests. So what do I look for? Well, in principle it is simple: after tests are written down, they are "only" executed and maintained. So this is where tests can be bad, and I try to spot things that make execution and maintenance harder. For maintenance, the largest problem is clones, which we covered in our last blog post. For test execution, the main problem you want to avoid is that different testers test different things. This is called ambiguity, and it comes in many flavors. In this blog post, I want to explain what structural ambiguity is and why it is bad, and this way help you to create better test cases. (Scroll to the summary if you don't care about the details) ;)

Ambiguous test flow

The problem for test execution that I want to discuss here is an ambiguous test flow. This means that, for a single test case, there are multiple paths a tester can follow when she executes the test. Let's look at an example.

A simple, straight-forward natural language system test case.
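To quantify why an ambiguous flow is bad: if several steps each offer the tester a choice, the number of distinct executions multiplies per step. Here is a toy sketch (the step texts are invented):

```python
from itertools import product

# Toy model of an ambiguous test: each step lists the alternatives a
# tester could choose (step wording is invented for illustration).
steps = [
    ["open the file via the menu", "open the file via the toolbar"],
    ["enter the customer name"],
    ["save with Ctrl+S", "save via the menu", "save on close"],
]

paths = list(product(*steps))
print(f"{len(paths)} distinct ways to execute this one test case")  # 6
for path in paths:
    print(" -> ".join(path))
```

Six different executions of one and the same "test case" means six different things being tested, depending on who sits in front of the keyboard.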

I recently reviewed a manual test suite of one of our customers. One of the first things I check very early in a review is the number of clones (i.e., duplicated parts of a test suite, usually created by copy and paste). In this recent case, I discovered that nearly 70% of the test suite was duplicated. That means that when I pick an arbitrary test step, the chance is 70% that it is a 1:1 copy of another step. At the top of the post is a tree map that visualizes the amount of clones I found. Each rectangle represents a test; the redder a rectangle, the larger the amount of cloning. In my experience, cloning is the biggest problem for the maintainability of a test suite. Cloning causes considerable costs, as the effectiveness of the test suite decreases and the effort for maintenance skyrockets. In this post I take a closer look at cloning in test suites. I show you an example to illustrate what clones can look like and explain where they come from. Later, I give you good reasons why you should care about clones in tests and discuss strategies you can employ to avoid, or at least deal with, clones.
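The 70% figure above comes from a proper clone detector, but the simplest possible version of the idea fits in a few lines: normalize each test step and count exact duplicates. A minimal sketch with invented data:

```python
from collections import Counter

# Minimal sketch of step-level clone detection: normalize each test step
# and count exact duplicates. Test data is invented for illustration.
test_suite = {
    "TC-1": ["Open the login page", "Enter valid credentials", "Click Login"],
    "TC-2": ["Open the login page", "Enter valid credentials",
             "Click Login", "Open the settings page"],
    "TC-3": ["open the login page ", "Enter invalid credentials"],
}

def normalize(step: str) -> str:
    # Case- and whitespace-insensitive comparison of steps.
    return " ".join(step.lower().split())

counts = Counter(normalize(s) for steps in test_suite.values() for s in steps)
cloned = sum(c for c in counts.values() if c > 1)
total = sum(counts.values())
print(f"Cloned steps: {cloned}/{total} = {cloned/total:.0%}")  # 7/9 = 78%
```

Industrial clone detectors additionally find near-misses (steps that differ in a parameter or a single word), which is where most of the real maintenance pain hides.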