Designing Useful Metrics: Using Observation, Modeling, and Measurement to Make Decisions

Originally published in STQE magazine, May/June 2000


If you’d like to see the figures associated with this article, read it on Esther’s site:

As a manager, you want to increase effectiveness and improve the quality of software. Using measurement as a tool for accomplishing this, however, may be something you’re skeptical about. Why? Perhaps you don’t have an established measurement culture, and implementing a measurement seems awfully daunting. Or maybe you’ve been around a measurement program that didn’t quite work as planned. I’d like to encourage you to take another look at metrics, and show you how you can use observation, modeling and measurement to manage more effectively within your team.

What makes a measurement useful? Here’s my definition: a useful measurement is one that helps you understand and make decisions. The cost of gathering the information can’t exceed the benefit it provides, and the measurement shouldn’t have lots of unintended side effects.

Let’s work from the basic premise that there’s a continuum of measurement. On one end is measurement to learn about what’s going on (to get information), and on the other is measurement to modify behavior. And there are lots of points in between that have some characteristics of both. (For more discussion on this read Tom DeMarco’s essay, “Mad About Measurement,” in his book Why Does Software Cost So Much?)

I might as well confess my bias up front: I’m not a fan of software measurement programs that aim to motivate certain behavior by tying metric results to performance evaluations or compensation. There are a number of problems with this type of program, starting with the fact that the people being measured often figure out how to nudge the numbers to make themselves look better, or they neglect some important aspect of what they’re doing because it’s not being measured. This is not merely human nature; it’s a flawed measurement design.

I want to look instead at the other end of the scale: measurement to gather information so that you understand the current situation, and can make good decisions.

This isn’t part of a formal “official” measurement program. It’s the kind of observation, modeling, and simple data gathering that an experimenter does to learn something about the world around her. Measuring to obtain information assumes that you do not have a preconceived outcome in mind — you’re open to whatever the data tells you. The key is that you are looking to learn more about the environment so that you can manage more effectively.

Jerry Weinberg, in his series of books Quality Software Management, calls this kind of metric first-order measurement: the sort of informal, “just enough” measurement that’s used to get a system working (versus second-order measurements, which are used to fine-tune a system). To illustrate the difference, consider two projects. Project A produced 200 lines of code (LOC) per programmer day. Project B cranked out 300 LOC. If you focus on LOC, Project B did better. But LOC is a second-order measure: it’s about tuning performance. First-order measurement is noticing that even though Project B produced more clean code, the project was cancelled after dragging on for six months after its original delivery date. Now you and I wouldn’t pronounce Project B more successful based solely on LOC, but it does make the point: Start with first-order measurements and the basics of getting the system working, and then worry about tuning it.

Measuring within a system

When you set out to gather information it helps to have some notion of how the system you are setting out to measure works. All measurement exists within a system, and can only be understood within that system. Designing, building, testing, and supporting software involves complex technical and human systems — and, like any complex systems, they are made up of multiple factors, any one of which might influence any other. Rarely are the relationships linear, or straightforward cause and effect.

You may not be an expert on modeling systems, but you probably are an expert on the system you work with. You just need a clear picture of it and some evidence of how it’s working.

Let’s examine an example of such a system, and how some simple guidelines can help you use measurement to your organization’s advantage.

Suppose for a moment that you work for Software Planet, a company that sells software, and that you manage the customer support function. You have a team of technical support representatives who take calls from customers who have questions or problems with the software. The agents may resolve the customer’s needs with a short answer, or they may work through a complicated problem with the customer on the phone.

One day your manager comes in and tells you that customers have been complaining about the length of time they spend on calls with your technical support representatives. Some customers have even cancelled contracts and gone to the competition.

Straight cause and effect would indicate that long calls cause decreased customer satisfaction. Simple, right? Reduce the length of the call, and satisfaction will go up. Figure 1 illustrates that model: When the length of call goes down, customer satisfaction with support services goes up. It’s an inverse relationship; when one value goes up, the other goes down, or vice versa.

Figure 1. As the length of the call goes up, customer satisfaction goes down. (The dot with bi-directional arrows indicates an inverse relationship: as one factor increases, the other decreases.)

The phone system automatically tracks the length of calls, which you could use as a measure — but is it a useful measure? Will measuring the duration of calls help you understand what is really happening or help you decide how to go about increasing customer satisfaction?

First-order measurement can help you understand what’s going on and make a good decision, using the following steps.

First, model the system

You know that Software Planet’s system is more complex than what’s shown in Figure 1. Before you set out to drive down call duration, consider what other factors could be influencing the length of the calls, and what else might be causing customer dissatisfaction.

You can start by building a more robust model of your system to get some ideas about where to collect data. What factors could be affecting the duration of calls? What other factors might affect customer satisfaction? What other things can you observe about the system?

Maybe the reason customers don’t like being on the phone so long is that they spend half the time on hold while your team tries to get additional resources on the line. Perhaps they spend twenty minutes on the phone and don’t get a solution — but wouldn’t mind being on the phone for twenty minutes if they were getting results. Figure 2 shows how the system looks when you add in other factors that could influence call duration and customer satisfaction.

Figure 2. Call support system model with the addition of other factors that could be influencing call length and customer satisfaction. (The dot with bi-directional arrows indicates an inverse relationship: as one factor increases, the other decreases.)

Involve the team

Involving your team in a discussion about the problem and its possible causes will help build a more accurate model of the system. If you ask the people doing the work, they may point you to other factors that are affecting their ability to solve customer problems quickly. Perhaps they can’t find the information they need in product manuals, or the network drops and they have to reboot during calls. Add these to your model (as in Figure 3). Even though our Software Planet model is now more complex than what we started with (call duration goes down -> customer satisfaction goes up), the systems you work with are probably even more complex.

Figure 3. System model with additional effects suggested by the team
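If the figures aren't in front of you, the same kind of model can be written down as a small signed graph. The sketch below is only an illustration: the factor names and the list of edges are assumptions drawn from the factors mentioned in the text, not a transcription of Figure 3, and the helper names are hypothetical. Each edge carries +1 when two factors move together and -1 for an inverse relationship. Multiplying the signs along a path tells you which direction a change is expected to push customer satisfaction.

```python
# Hypothetical factor names and edges, loosely based on the factors named in
# the text. Sign +1: the two factors move together. Sign -1: inverse
# relationship (the dot in the figures).
EDGES = [
    ("time_on_hold", "call_duration", +1),
    ("network_drops", "call_duration", +1),
    ("info_easy_to_find", "call_duration", -1),
    ("call_duration", "customer_satisfaction", -1),
    ("problem_solved", "customer_satisfaction", +1),
]


def paths(edges, start, goal, visited=()):
    """Yield each cycle-free path from start to goal as a list of edges."""
    if start == goal:
        yield []
        return
    visited = visited + (start,)
    for cause, effect, sign in edges:
        if cause == start and effect not in visited:
            for rest in paths(edges, effect, goal, visited):
                yield [(cause, effect, sign)] + rest


def net_effects(edges, start, goal):
    """Return (route, sign) per path; sign is the product of the edge signs."""
    results = []
    for path in paths(edges, start, goal):
        sign = 1
        for _, _, edge_sign in path:
            sign *= edge_sign
        route = " -> ".join([start] + [effect for _, effect, _ in path])
        results.append((route, sign))
    return results


for factor in ("network_drops", "info_easy_to_find"):
    for route, sign in net_effects(EDGES, factor, "customer_satisfaction"):
        direction = "raises" if sign > 0 else "lowers"
        print(f"More {factor} {direction} satisfaction via {route}")
```

The signs say only which way an effect points, not how strong it is. When two paths from the same factor disagree, the model alone can't tell you which one wins, and that is a good place to go and collect data.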

Now that you have a better notion of what the system looks like, you can think about gathering data. But before you start…

Create safety

Before you begin collecting data, talk with your team. They need to understand why you are collecting data, and how that data will be used. It’s critical that your team understands that they will not be blamed, ranked, or rated based on the data. Blaming the people for the data will produce measurement distortion — not because of a flaw in human nature, but because of the flawed measurement design. You are depending on the team to collect the data, and you want accurate data. Once you assure your team that you won’t be using the data to evaluate them, be sure you don’t. The first time you do, or your team thinks you do, you will lose a valuable source of information. If your team starts reporting the data in a way designed to make their performance look better, you will be relying on distorted data — and that can be worse than having no data at all.

Convincing people that data is being collected only for information can be a tough sell if employees have been evaluated or penalized by measurement programs in the past. One way to increase employee safety is to arrange the collection so that you (and other managers) only see aggregate data, not the results associated with any one employee.
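If the tallies end up in a spreadsheet or a script, one way to keep that promise is to strip individual identities before anything reaches a manager. The sketch below assumes a hypothetical record layout (a dictionary with "rep", "minutes", and "handed_off" keys) and a hypothetical minimum group size. Neither comes from the article.

```python
def team_summary(records, min_reps=3):
    """Summarize call records for the whole team, with no per-rep figures.

    Returns None when fewer than min_reps people contributed, because a
    "team" figure built from one or two people would identify them.
    """
    reps = {record["rep"] for record in records}
    if len(reps) < min_reps:
        return None
    calls = len(records)
    return {
        "calls": calls,
        "mean_minutes": sum(r["minutes"] for r in records) / calls,
        "handoff_rate": sum(r["handed_off"] for r in records) / calls,
    }


# Made-up sample records, for illustration only.
RECORDS = [
    {"rep": "A", "minutes": 10, "handed_off": False},
    {"rep": "B", "minutes": 20, "handed_off": True},
    {"rep": "C", "minutes": 30, "handed_off": False},
    {"rep": "A", "minutes": 20, "handed_off": True},
]

print(team_summary(RECORDS))
```

The rep identifier is used only to check that enough people contributed. It never appears in the output.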

Gather data

Based on our Software Planet model (Figure 3), here are some things you might want to find out about:

How many separate problems are addressed in the average call?
How often does a solution require more than one call?
How often do representatives have to put off answering a question because they can’t find the information in the documentation or product manual?
How many problems are handed off to some other area within the company?
How long does it take to find the correct person to solve a problem that the agent can’t answer?
How long does the representative who first gets the call spend talking to the customer before she hands it off?
How often does the infrastructure fail during a customer call?

You don’t have to have fancy automated data collection or an elaborate measurement program to do this. Since you’ll only be collecting data for a short period of time — a few days — a paper form and a pencil will do. It will be a little extra effort for the people taking the calls, and will require some effort to tabulate and analyze — but you will collect some useful, accurate (and therefore interesting) data. The benefits of having a better understanding of the problem before you attempt a solution should outweigh these costs.
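When it's time to tabulate the forms, a few lines of code can stand in for a calculator. The sketch below is one possible layout, not a prescribed one: the CallForm fields and the tally function are hypothetical names chosen to mirror the questions listed above, and the sample forms are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallForm:
    """One paper form, filled in by the rep at the end of a call."""
    problems_addressed: int
    is_repeat_call: bool          # the solution needed more than one call
    info_not_found: bool          # answer put off: not in docs or manual
    handed_off: bool              # passed to another area of the company
    infrastructure_failed: bool   # e.g. the network dropped during the call
    minutes_to_find_expert: Optional[float] = None  # only when handed off


def tally(forms):
    """Turn a stack of forms into the handful of numbers the questions ask for."""
    if not forms:
        raise ValueError("no forms to tally")
    n = len(forms)
    waits = [f.minutes_to_find_expert for f in forms
             if f.minutes_to_find_expert is not None]
    return {
        "calls": n,
        "problems_per_call": sum(f.problems_addressed for f in forms) / n,
        "repeat_call_rate": sum(f.is_repeat_call for f in forms) / n,
        "info_not_found_rate": sum(f.info_not_found for f in forms) / n,
        "handoff_rate": sum(f.handed_off for f in forms) / n,
        "infrastructure_failure_rate": sum(f.infrastructure_failed for f in forms) / n,
        "mean_minutes_to_find_expert": sum(waits) / len(waits) if waits else None,
    }


# Made-up forms, for illustration only.
FORMS = [
    CallForm(1, False, False, False, False),
    CallForm(2, True, True, True, False, 20.0),
    CallForm(1, False, False, True, True, 10.0),
    CallForm(2, False, True, False, False),
]

for name, value in tally(FORMS).items():
    print(f"{name}: {value}")
```

Note that the form has no field for who took the call, so the tally can only ever describe the team as a whole.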

Act on the data

Once the data is collected, your new information will help you steer the system in a way that will increase customer satisfaction with the service your team provides. The data we have collected for Software Planet, for example, might show that technical service representatives don’t have a good understanding of the system they support, and that they need more training. Or it could show that when reps try to bring additional technical resources into the call it takes twenty minutes to locate and get that person in on the call. Each situation calls for a different remedy. When you consider what action to take, use your system model to anticipate what effects your change will have. You probably won’t identify every unintended consequence — but you will increase your chance of success by thinking things through with the model and giving some thought to what could go wrong.

Check on progress

Now that you’ve done your research and implemented a plan to improve customer satisfaction, you’ll want to collect some data to know whether or not your intervention is working. But you probably won’t want to collect all the data you did in the information-gathering stage. Metrics should have a sunset clause: If there isn’t a good reason to keep collecting the data, stop.

What measurements should you continue? Customer satisfaction is worth measuring, but it’s a lagging measure — you may know there are problems only when customer satisfaction goes down. As a manager you want to know sooner than that, so that you can correct problems before they impact the customer. When you designed your intervention, you targeted some key elements in the system for change, with the goal of improved customer satisfaction. Gather data on how that change affects both the changed elements and customer satisfaction.

Suppose the intervention involved decreasing the wait between (a) when a technical rep identifies that the call demands additional expertise and (b) getting that expert person on the phone. Watch to see if that measure decreases. Once you are convinced that your intervention is working, continue to check on this measure. You may not track it all the time; you may just sample the data intermittently, as scientists do. If the number moves, it’s time to observe, model the system, and gather data. Find out if there is some new factor in the environment that could be causing the number to move. Figure out if the metric is telling you about a trend, or if there was an isolated event or situation that caused the movement. If the indicator still holds, make a correction before customer satisfaction takes a dive.
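One rough way to decide whether an intermittent sample has "moved" is to compare it with the baseline you recorded once the intervention settled in. The sketch below is a rule of thumb, not a formal statistical test, and everything in it is an assumption made for illustration: the function name, the threshold of two standard errors, and the sample numbers. It flags a sample when its mean is more than k standard errors from the baseline mean.

```python
from math import sqrt
from statistics import mean, stdev


def has_moved(baseline, sample, k=2.0):
    """True if the sample mean is more than k standard errors from the baseline mean.

    The standard error is estimated from the spread of the baseline and the
    size of the new sample. This is a rough screen, not a formal test.
    """
    standard_error = stdev(baseline) / sqrt(len(sample))
    return abs(mean(sample) - mean(baseline)) > k * standard_error


# Made-up minutes spent waiting for an expert to join the call.
baseline_waits = [4, 5, 6, 5, 4, 6, 5, 5]
this_weeks_sample = [9, 10, 8, 11]

if has_moved(baseline_waits, this_weeks_sample):
    print("The wait has moved: time to observe, model, and gather data.")
```

A flag from a check like this is a prompt to go and look, not a verdict. A single odd day can trip it, which is why the next step is to find out whether you are seeing a trend or an isolated event.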

Keep your eye on the goal

It’s fairly common in organizations for the metric to become the goal, rather than a useful piece of information. The key to keeping measurements useful is to pay attention to the meaning, not just the number — if a number starts going up (or down), it’s time to gather data and tune the system, not just announce that the number needs to move in a particular direction.

This is especially important if the measure you’re using is of some aspect that contributes to the desired result but does not measure the desired result itself. In our Software Planet example, increased customer satisfaction with technical support services is the desired result; shorter calls are one factor that contributes to the result. Lengthy calls were an indicator of a problem — but focusing only on shortening the duration of the calls would not have increased customer satisfaction (and might have actually decreased satisfaction, had technical reps become adept at getting the customer off the phone but not solving the problem).

First-order measurement can help you understand what’s going on, make decisions, and improve results. Observation, modeling, and simple data gathering are things that you can implement in your work group without a big measurement program or big funding. Our Software Planet example looked at software support, but you can apply the same techniques and analysis to software development, software maintenance, testing, or project management. As practice, start by modeling your system and working out on paper how different measures will affect your system. Then involve your team, expand your model, and try some simple data gathering. This approach to measurement is one more tool in your toolkit, and it will move your organization toward better quality.
