Entomology

Some words set alarm bells ringing in our heads whenever we hear them, because in the past we've repeatedly found them at the core of some problem or troubled situation. One of my alarm bell words is "bug" - meaning not a member of the order Hemiptera, but the species found in software.

I don't mind people using "bug" in informal conversation. You'll catch me saying "bug" often enough, in situations where I don't have to be precise. It's a different matter if important decisions are riding on what I am saying, because a lot of what matters to a software professional is in one way or another related to "bugs". As a software developer, I have deep misgivings about the word "bug" in professional contexts. I find the word vague and almost entirely without merit in problem-solving. Worse, I have often heard people use the word "bug" solely to assign blame, or avoid responsibility.

Tales From a Troubled Team

I once worked on a vertical software package which my employer had acquired from a company that was going out of business. I was consulted briefly when negotiations opened, and one of the things my boss mentioned was the "bug list" for the product, which, I was told, had 200 items on it.

Some time passed; the negotiations concluded, assisted by a small team of one technician and one domain expert; the acquisition went through. A little later, my boss asked me to take a technical lead position on the project, to "stabilize" the product preparatory to a significant sales effort. When I inquired about the "bug list", I heard that it had been cut down to 5 items.

I didn't believe that a small team working on an unfamiliar product could fix 200 bugs in three weeks. I was puzzled and dug a little deeper. The acquisition team, it turned out, had gone over the list and removed all items which were not an active concern of an existing customer. This cleared things up: one definition of "bug" I'm familiar with is "a concern about the quality of the product", or alternatively "a discrepancy between the user's expectations and the actual behavior of the program". If no one is concerned about it, it's not a bug.

Shortly after, in the course of a conversation with my boss, I got puzzled again. I had been wading through the quarter-million lines of C++ code which the product comprised, in order to determine how much effort would have to be spent on "stabilizing" it. I hadn't liked what I saw: spaghetti code and pseudo-object-oriented programming, huge redundancy between modules, and all manner of Bad Things. I was voicing these concerns to my manager.

"Don't worry," he told me, "we know the quality of the code is adequate. After all, we have only 5 known bugs." Now, my manager was a former programmer. To many programmers, and the dictionaries concur with them, the term "bug" means more specifically a flaw or a mistake in the program. In a lot of cases, this means a flaw in their work. Of course under this definition, "no bugs" would mean "no flaws".

I felt, of course, that my concerns should count for something. Under the first definition of "bug" as "a concern about quality", any concern of mine would have been a "bug". My objections were mostly brushed aside on the grounds that "since there are users actually using the product and they aren't complaining, we can assume that the quality is adequate".

This wasn't quite true: on one occasion, we were called to resolve a pressing problem at one of our customer sites. Was there a "bug"? asked my colleague who took the call. The user didn't know. All he knew was that when he started the application, it went half-way through its initialization phase then crashed. We went on-site to investigate. The "bug", it turned out, was a corrupted version of the file to which the application saved the positions of opened windows. Once a good version of the file was restored, everything went back to normal. We couldn't get the application to fail again.

Inspection of the relevant code showed no obvious problems. None, that is, that were obvious to us. My colleagues argued that this was an "intermittent bug", harder to fix since it couldn't be reliably reproduced. I was worried about the potentially large support costs if such incidents should happen on a regular basis, but in the end the decision was to defer the issue to an indeterminate date.

No one seemed to feel very responsible - perhaps with justification. The image the term "bug" evokes is of small creatures creeping into programs unnoticed, through the cracks of our understanding as it were, an image which tends to absolve the programmer of responsibility for the consequences of "buggy" programs. As long as she has in good faith made an effort to remove all the "bugs" she could find, who can blame her if some remain in the product that is eventually shipped to users?

What really threw me for a loop, though, was when one of my programmer colleagues announced in a status report that she had been fixing a "bug" I was responsible for. Since this was CCed to my manager as well as the rest of the team, I naturally investigated the issue in some detail.

What I found was this: in the original version of the application, some code was illegally accessing a block of memory which had been freed by a previous operation. However, since no memory-related calls were made between the two, the illegal access had gone undetected. I had recently modified some unrelated part of the application, with the result that a memory operation was now made at the crucial point. The application was now crashing, and it hadn't been before: it could only be "my fault".
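The mechanism is easier to see in a toy model. The sketch below is not the product's C++ code; it simulates a heap in Python, and every name in it (ToyHeap, original_version, modified_version) is hypothetical. It assumes an allocator that leaves stale contents in a freed block and hands the most recently freed block to the next allocation. In this model the exposed defect shows up as wrong data rather than a crash.

```python
class ToyHeap:
    """Toy allocator: a freed block keeps its old contents until it is reused."""

    def __init__(self):
        self._blocks = {}       # handle -> contents
        self._free = []         # handles available for reuse
        self._next_handle = 0

    def alloc(self, contents):
        # Reuse the most recently freed block if there is one, overwriting it.
        if self._free:
            handle = self._free.pop()
        else:
            handle = self._next_handle
            self._next_handle += 1
        self._blocks[handle] = contents
        return handle

    def free(self, handle):
        # The block is released, but its stale contents are left in place.
        self._free.append(handle)

    def read(self, handle):
        # No validity check, much like dereferencing a raw pointer.
        return self._blocks[handle]


def original_version(heap):
    title = heap.alloc("window title")
    heap.free(title)               # the defect: freed while still in use
    return heap.read(title)        # stale read that happens to still work


def modified_version(heap):
    title = heap.alloc("window title")
    heap.free(title)               # the same, pre-existing defect
    heap.alloc("unrelated data")   # the new, legitimate modification
    return heap.read(title)        # the stale read now sees someone else's data


print(original_version(ToyHeap()))   # window title
print(modified_version(ToyHeap()))   # unrelated data
```

The defect is the early free in both versions; the later, legitimate allocation only changes whether a failure is observed.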

What's in a Word?

In the space of one project, many different definitions and uses of "bug" had surfaced:

  • concerns about quality;
  • discrepancies between behavior and specifications;
  • or between behavior and expectations;
  • failures of the product during use;
  • programming mistakes;
  • reprehensible behavior on the part of a developer (me).

We had accepted these various interpretations tacitly, not really recognizing how they evolved over the course of the project, how different people held different interpretations at the same time, or how these interpretations clashed with each other in various ways.

I find myself in full agreement with Von Neumann: "There's no sense in being precise when you don't even know what you're talking about." But that has a corollary: if you do know what you are talking about, being precise can have a payoff.

As in all things, there are tradeoffs involved: excessive precision has its own risks, such as losing people's attention. One way to think about this is in terms of a tradeoff curve. Tradeoffs matter when you are close to the curve, where any gains resulting from increased precision (fewer mistakes, fewer avoidable conflicts) must be balanced against increased costs (more training for new hires, for instance).

On balance, my own experience is that many teams are not even on the tradeoff curve. They are in the area below it, where it is possible to improve the quality of communications without significant additional effort.

One good way to improve quality is to spend a little time in the team discussing alternative terms which communicate more clearly than "bug". The literature on software testing suggests several, such as "fault", "failure", "defect", "error", "issue". (Naturally - that is another aspect of the tradeoff - the "exact" definitions of these terms are a subject of debate within the software community. I don't want to come out in favor of one set of definitions over another, because that would require a whole new article.) With the terminology come useful models and techniques, such as diagrams of effects to trace the causes of defects, or "issue triage" to avoid wasting time on minor problems when more severe ones require attention.

If you are looking at a stack trace and think you remember a place in the code where you were worrying about being passed a null parameter, you're not "fixing a bug". You're trying to locate a defect, and you have some specific ideas of the root causes. When I'm working in this mode, the kind of phrases I want to hear from my coworkers are "my hypothesis is that", "let's see if we can reproduce that state", "this rules that out, therefore", etc. I have found that interactions with coding partners were a lot slower when hampered by vague phrases or imprecise thinking, and a lot faster when supported by a wider range of vocabulary than just the term "bug", and by reasoning based on test cases and controlled experiments.
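To make that way of working concrete, here is a minimal sketch in Python of a hypothesis turned into a small controlled experiment. It is only an illustration: window_label and reproduces_failure are hypothetical names, and the suspected defect (no handling of a missing title) is an assumption invented for the example, not something taken from the project described above.

```python
def window_label(title):
    # Suspected defect: nothing here handles a missing (None) title.
    return title.strip().upper()


def reproduces_failure(candidate_input):
    """Controlled experiment: does this input put the code in the failing state?"""
    try:
        window_label(candidate_input)
    except AttributeError:
        return True
    return False


# Hypothesis: the failure occurs when the title is missing.
print(reproduces_failure(None))       # True
# Control: an ordinary title does not fail, which rules out "it always fails".
print(reproduces_failure(" Main "))   # False
# Competing hypothesis: an empty title triggers the failure. Ruled out.
print(reproduces_failure(""))         # False
```

Each call tests one stated hypothesis against one input, so a coworker can see what has been ruled out and why.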

Even if you're using a vague term such as "bug" intentionally, for instance to refer to "quality concerns" in the broadest possible sense of the term, in order to elicit exploration of these concerns, you might want to be vague in a precise manner: that is, make sure your definition of "bug" is explicitly known to the participants in the conversation. Make sure it is understood that no blame or divisiveness is attached to the term.

At the very least, stop to think for a second whenever you're about to use the word "bug" at work, when fixing one, writing a report about one, or thinking you've noticed one in someone's code. Reflect on the possible misinterpretations between you and your coworkers. Take particular note whenever something seems to be amiss, and you will soon spot ways to make your communication more effective.

In the project I mentioned above, many misunderstandings might have been avoided. For instance, calling the initial "bug list" an issue list would have called attention to the fact that all the items on the list were there because a customer had at some point brought an issue to the previous team's attention. With that in mind, I wouldn't have been surprised to hear that some issues had gone "stale" and it was no longer of any value to keep a record of them.

If we'd been familiar with the notion of "defect-prone modules" (and by implication had agreed on a definition of "defect" beforehand), my concerns about the internal quality of the code might not have been dismissed. The terms "fault" and "failure" and "failure conditions" might have been good tools to analyze what was going on at our customer's. A less simplistic model of how code modifications give rise to defects, including some legitimate modifications, might have avoided a lot of blame-throwing.

Words Reveal Cultures

My colleague Phlip, a developer like me, observes wryly: "We all know bugs are a fact of programming life. Our job is to write code, find bugs in it, make a list of them, and remove them one by one." This, of course, is irony. Only when we act as if we believed in "death, taxes and bugs" do we make it come true. (We still can't do much about death and taxes, but we know "bugs" are well within our sphere of control.)

All cultures have their particular blind spots and weaknesses, and the most serious errors we make are those resulting from such blind spots. The only solution I know is to treat cultures themselves, lucidly, as man-made conventions which we have the power to change. When we notice that a part of our culture, such as a popular term, is leading us to error, we can take steps to effect a deliberate change.

Think about how you and others on your team are using the word "bug". Discuss it. You'll save yourselves a lot of grief.

Acknowledgements

This article owes much to the efforts of the Shape community, and in particular Bob Lee, James Ward, James Bach, Dave Liebreich, Bill Hamaker, Sue Petersen, Dwayne Phillips, Michael Bolton, Sharon Marsh Roberts, C. Keith Ray, Stephen Norrie, David Bowles, and Jim Batterson. They are a perfect counterexample to the old saw about too many cooks spoiling the broth. Any infelicities left within are entirely mine.

References

See Jerry Weinberg's Quality Software Management:

  • volume 2, p. 237ff, for a discussion of the terminology of software errors and defects;
  • volume 4, p. 177, for an example of a tradeoff curve.

The Shape community lives at http://www.geraldmweinberg.com/shape.html.

Laurent can be reached at [email protected]. Your feedback is welcome.

