BeBuggingForTesters

Mike: have you played with tools that introduce errors into the source code to see if the tests catch them? Kent Beck, in the prerelease version of his latest book on TestDrivenDevelopment, mentioned that he ran a Java defect-injection tool, and the ONLY line of code that could be changed without breaking a test was a hard-coded hash function that wasn't really being used. (His intention was to make a more robust hash function later, but later never came.) Would you feel very effective if changing any line of the code under test was detected by at least one of your tests? -- KeithRay

I can't think of a less effective way to test whether one line changed, nor why I should care. I'd leave that to the audit trail of my SCM tool. Building traps to catch edits with check-sums delivers no value I can identify with, unless you're testing the operation of the SCM package. I've been there - if tabs get converted to spaces or vice-versa, is it changed? That was one trick we had to soften to avoid nearly 100% of lines being reported as replaced when an editor made tabs on all lines "canonical". [Our algorithm became to compare any run of whitespace except newline as a single blank.] BobLee 2002.04.19
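
A minimal sketch of the whitespace rule described above, assuming Python; the function names are hypothetical and this is not the original SCM code. Each run of whitespace other than a newline is treated as a single blank before two lines are compared:

```python
import re


def canonical_line(line: str) -> str:
    """Collapse every run of non-newline whitespace into one blank."""
    return re.sub(r"[^\S\n]+", " ", line)


def lines_differ(old: str, new: str) -> bool:
    """True only when the lines differ by more than their whitespace runs."""
    return canonical_line(old) != canonical_line(new)
```

Under this rule a tab-indented line and its space-indented twin compare equal, while a line that gains or loses all of its indentation still counts as changed.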

Bob, I'm not sure you understand what I'm trying to say. This isn't checksumming or anything related to checksumming. This is regression testing, and testing the tests by injecting errors into the code under test (say by changing in the code-under-test an 'x equals y' expression into an 'x not equals y' expression, or changing a loop to be off-by-one).

If you can inject errors into the code-under-test, and the regression tests don't catch them, then you don't have good enough testing, for some value of good enough. This goes beyond branch coverage and statement coverage in determining the extent and quality of the tests.
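
A small illustration of why this goes beyond coverage, assuming Python and an invented `is_adult` function (none of these names come from the discussion). The weak suite executes every statement of the code-under-test, yet an injected comparison error slips through; only the boundary case exposes it:

```python
def is_adult(age: int) -> bool:
    # Code under test (hypothetical example).
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Injected error: ">=" changed to ">".
    return age > 18


def weak_suite(fn) -> bool:
    """Runs every statement of fn but never probes the boundary value."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that tells the original from the mutant."""
    return weak_suite(fn) and fn(18) is True
```

Both suites report full statement coverage of `is_adult`; only `strong_suite` fails when handed `is_adult_mutant`.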

In real life, programmers maintaining a project will need to fix bugs, do refactoring, add new features, and so on, and modify the tests to match (though refactoring usually doesn't need tests to be modified). But they could also inject errors, and if there were not sufficient tests, they would not immediately know that they had injected errors until real users or manual testing exercised the code more aggressively than their current set of tests. Defect-injection to test the tests can give the team confidence that they have enough tests to catch the one-character or one-line problems that sometimes occur in refactoring or other changes.

So my question to Mike was "how would you feel if a tool gave you an objective measure of the extent and quality of your tests?"

KeithRay 2002.04.20

Wow. I think you just expanded the scope of the project by nearly an order of magnitude. If your regression tests can catch deliberate off-by-one errors, you're fighting the tools battle at a more difficult level. Java and other interpretive languages will catch an off-by-one if it exceeds bounds, as will other intrusive test tools [BoundsChecker, etc.]. The design of injected bugs must be an interesting science: How do you develop an actual bug to inject that will afford an error but not trigger a crash before you can evaluate?

Is the framework of the tests valuable enough to warrant that effort? What value does it add vs. what costs does it incur? In the end, the results matter, not how the program chooses to get there. A benign but irrelevant piece of code may be undetectable via testing, but might be noticed in inspections. Example: replicate a line of the form "x = y;". The double-up is benign but useless. Would I test for it? No. I'd notice it next visit, and the nanoseconds lost per century would be cheaper than testing cycles to find it.
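
The duplicated-assignment example can be sketched as follows, assuming Python and hypothetical function names. The two versions return the same result for every input, so no behavioral test can tell them apart:

```python
def copy_value(y):
    x = y
    return x


def copy_value_doubled(y):
    x = y
    x = y  # duplicated line: redundant, but the result is unchanged
    return x
```

A change like this is something only an inspection (or a static analysis tool) would flag, which is the point being made above.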

If my application is life critical (pacemaker controller) I would add inspection and testcase effort. I'd insist on very clean refactoring and understandability so my inspections would produce the maximum benefit. I would also exploit the SCM tool's version DIF logs for questionable modules. I think that humans are better at intent consistency checking than a test frame. I would expect those tests to be more black box.

I'm certain that no testing mechanism can discover a mistake within a comment line.

BobLee 2002.04.20

I couldn't find it using , but looking through the TestDrivenDesign book-draft identified the tool I'm talking about.

Another way of evaluating test quality is defect insertion. The idea is simple: change the meaning of a line of code and a test should break. You can do this manually, or with a tool like Jester.

When I said 'off-by-one' I was thinking of situations where the underlying 'bounds-checking' code will not find the problem, such as "sum the first ten values" but it actually sums the first 9 values.
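
A minimal sketch of that kind of off-by-one, assuming Python and hypothetical function names. The faulty version never steps outside the list, so no bounds check fires; only a test on the returned sum notices:

```python
def sum_first_ten(values):
    """Sum the first ten values of the sequence."""
    return sum(values[:10])


def sum_first_ten_off_by_one(values):
    """Injected error: silently sums only the first nine values."""
    return sum(values[:9])


data = list(range(1, 21))  # 1, 2, ..., 20
```

With `data` as above, the correct function yields 55 and the faulty one 45, and neither raises an exception.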

KeithRay 2002.04.20

Yes, but a sufficient test framework to catch this is intrusive. To make a correct change to the source code requires reengineering the tests and adjusting them. A benign refactoring would be caught as a change, wouldn't it?

BobLee 2002.04.20

I don't know what you mean by intrusive. The tests may be "black box" or "white box", but the tests don't change.

The tool that tests the tests, Jester, automatically changes one line of the source-under-test, rebuilds it, runs the tests, and records whether the change caused any test to fail (and then restores the source code to its original state). Jester can be left to run all night, changing each and every line of source-under-test.

Jester finds cases where changing the behavior of a line of source code doesn't break any tests -- a more strict evaluation of test quality than just measuring code coverage.

Jester might actually be changing the byte codes of the compiled class files, to avoid working with the source files, but the functionality is the same. Apparently Jester is smart enough not to make changes that can't compile, and it doesn't rename public classes or methods.
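
The mutate, rebuild, test, record cycle described above can be sketched in a few lines. This is a toy under stated assumptions, written in Python, and is not Jester's actual implementation: the source under test, the two textual mutations, and all function names are invented for illustration. Each mutant is built as a copy, so the original source is never altered and needs no restoring.

```python
# Hypothetical source under test, held as text so it can be mutated.
SOURCE = '''
def is_even(n):
    return n % 2 == 0

def is_negative(n):
    return n < 0
'''

# Assumed, deliberately tiny set of textual mutations.
MUTATIONS = [("==", "!="), ("<", "<=")]


def run_tests(ns) -> bool:
    """Return True when every check passes against the loaded code."""
    try:
        return (ns["is_even"](4) is True
                and ns["is_even"](3) is False
                and ns["is_negative"](-5) is True
                and ns["is_negative"](7) is False)
    except Exception:
        return False


def mutants(source):
    """Yield (line_number, mutated_source), changing one line at a time."""
    lines = source.splitlines()
    for i, line in enumerate(lines):
        for old, new in MUTATIONS:
            if old in line:
                changed = lines[:i] + [line.replace(old, new, 1)] + lines[i + 1:]
                yield i + 1, "\n".join(changed)


def surviving_mutants(source):
    """Run the tests against each mutant; collect those no test caught."""
    survivors = []
    for lineno, mutated in mutants(source):
        namespace = {}
        try:
            exec(mutated, namespace)  # "rebuild" the mutated code
        except SyntaxError:
            continue  # skip mutants that do not compile
        if run_tests(namespace):  # tests still pass, so the mutant survived
            survivors.append((lineno, mutated.splitlines()[lineno - 1].strip()))
    return survivors
```

Here the `==` to `!=` mutant is caught, but the `<` to `<=` mutant survives because no check calls `is_negative(0)`. Every line is executed by the checks, so a coverage report alone would not reveal that gap.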

"Refactoring" is changing the design without changing the behavior, so pure refactorings should never cause tests to fail. That's why having extensive unit tests is important when doing manual refactoring, because when manually refactoring, it is easy to introduce a defect without realizing it, but running all the tests after each step of a refactoring helps detect that accidental defect injection almost immediately.

KeithRay 2002.04.20

Hey Keith, I think I have a better idea of your use of the term "fault injection". This is different than either bebugging, as I understand it, or "fault injection" as it is used here. The software you describe sounds like an improvement on code coverage tools, but its use assumes a dedicated effort to write structural, aka white box, tests. I have yet to work at a place where code coverage topped 80% as measured by lines touched by tests, so I can't imagine that the fault injection software would do more than lower that percentage considerably. Focused structural tests should easily push the 80% line coverage to 100%, but I have yet to see it done. Again, as Steve noted, the universal "don't have time" rationale is invoked. As a tester, I don't write structural tests, but focus on interfaces (aka integration) and systems. There are many anomalies to be found there that can't be touched by structural testing. MikeMelendez 2002.04.23

When it comes to increasing code-coverage by tests, there is an alternative that probably doesn't get much discussion. Instead of adding more tests, simplify the code-under-test. If you have 80% coverage by the tests, what would happen if that 20% not currently covered is removed? Would it be possible to remove half of that 20%?
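
A quick worked example of that arithmetic, assuming Python and a hypothetical 1000-line module with 800 lines touched by tests. Removing half of the uncovered lines raises the coverage figure without adding a single test:

```python
def line_coverage(covered: int, total: int) -> float:
    """Fraction of lines touched by tests."""
    return covered / total


# Hypothetical module: 1000 lines, 800 of them exercised by tests.
before = line_coverage(800, 1000)

# Delete half of the 200 uncovered lines as dead code.
after = line_coverage(800, 1000 - 100)
```

Coverage moves from 80% to roughly 89%, and there is less code left to maintain.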

One of my favorite (unofficial) refactorings is "remove dead code".

KeithRay 2002.04.23

I just learned the 'correct' phrase is "mutation testing" and not "defect injection".

KeithRay 2002.04.23

I like the idea of "code-under-test". It's a new idea to me that I would classify under Testability. That's a buzzword that's just hatching locally. By that I mean, developers are wondering, "Testability sounds like a good idea. How do I do that?" As buzzwords do, this one interacts with Fault Injection, with the developers saying, "Ah, that's what Testability means! As soon as I get some time I'll do some of that." Keep the ideas coming, Keith. MikeMelendez 2002.04.24

I'll mention this - maybe someone is into it. One guy on usenet apparently "proves" program correctness using a formal language suited to that purpose, and then translates that proved program into whatever actual programming language he is using. I don't expect many people do this.

This guy says that TestFirstProgramming seems very similar to this formal technique, in spirit anyway, and TestFirstProgramming doesn't require writing the code twice.

A search of recent messages on comp.software.extreme-programming would turn up these messages, if someone is interested in this.

KeithRay 2002.04.24

I've worked at places that did some research into formal proofs of correctness. The last I'd heard (several years dated now), formal program proofs required programs longer than those being proved correct. This unfortunately leads to an infinite chain of required proofs. Perhaps things have changed? MikeMelendez 2002.04.25

I've heard of people using Zed, translation and formal proof for models and design work, but not for programs.

- BeckyWinant 2002.04.25

I put a short essay about test-first in C++ at This is basically the first draft, so I do plan to clean it up a lot. If something is confusing, or you have other comments, please email me at ckeithray @ attbi.com KeithRay 2002.04.27

Updated: Saturday, April 27, 2002