
UsingStatisticsToHelp

JonathanSiegel writes "I'm a consulting statistician and information systems developer with 15 years' experience. I help people collect and use information to learn about their business. I try to think of my work as using statistics and information systems to help people learn -- to help people develop and improve models of their business and their world. There have been times I've been able to do this in a very tangible and successful way, and I would like to be able to do it more."

I would like to hear stories about both your successful and unsuccessful experiences using statistics to help people understand their systems and change their models.

SteveSmith 2003.05.01


OK. In about 1984, our online group medical claim payment system (Big Hunk of Mainframe) was turning down too many enhancement requests. Their home-grown disk file access worked, but was brittle to expand.

We (the DBA organization) proposed that they use a commercial DBMS (IDMS) to service new data requirements. The online folks naturally asked us to prove that it wouldn't bring the system to its knees.

We designed a streamlined database for them, and the refrain remained "Prove it!"

We sampled transaction access patterns and traffic patterns, and built a model of the proposed data access, with a probability on each potential DB I/O. We reviewed the probability model with them and gained agreement that the probabilities were correct.
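In rough terms (and in present-day Python rather than anything we actually ran back then), the model amounted to a frequency estimate per potential I/O, something like this sketch with invented operation names:

    from collections import Counter

    # Hypothetical sample: each sampled transaction lists which potential
    # database I/Os it actually performed (operation names are invented).
    sampled_transactions = [
        {"read_member", "read_claim"},
        {"read_member", "read_claim", "write_claim"},
        {"read_member"},
        # ... many more, sampled from the live system
    ]

    def io_probabilities(transactions):
        """Estimate P(I/O occurs in a transaction) as its observed frequency."""
        counts = Counter()
        for txn in transactions:
            counts.update(txn)
        total = len(transactions)
        return {io: n / total for io, n in counts.items()}

    model = io_probabilities(sampled_transactions)
    # e.g. roughly {'read_member': 1.0, 'read_claim': 0.67, 'write_claim': 0.33}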

Then we built a stand-alone program that attached a variable number of concurrent threads, each with a different random seed. Each thread would "roll the dice" and perform database I/Os according to the agreed probabilities, sleep a while, and repeat. We also attached a background thread that sponged up CPU resources. By varying the number of worker threads and the delay window, we were able to demonstrate concurrent DBMS load equal to and surpassing that of the live system, and so show there was sufficient capacity.
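A present-day sketch of that driver might look like the following, assuming the probability model above and a stand-in perform_io() in place of real IDMS calls (all names here are illustrative):

    import random
    import threading
    import time

    def perform_io(name):
        """Stand-in for one real database I/O against the test database."""
        time.sleep(0.001)   # pretend each I/O takes about a millisecond

    def worker(seed, model, max_delay, stop):
        """One simulated online user: roll the dice for each potential I/O,
        sleep a random while, and repeat until told to stop."""
        rng = random.Random(seed)            # every thread gets its own seed
        while not stop.is_set():
            for io_name, probability in model.items():
                if rng.random() < probability:
                    perform_io(io_name)
            time.sleep(rng.uniform(0.0, max_delay))

    def cpu_sponge(stop):
        """Background thread that soaks up spare CPU, as on the busy live machine."""
        x = 0
        while not stop.is_set():
            x = (x + 1) % 1_000_003          # pointless busy work

    def run_load_test(model, n_workers=50, max_delay=0.5, duration=60):
        stop = threading.Event()
        threads = [threading.Thread(target=worker, args=(seed, model, max_delay, stop))
                   for seed in range(n_workers)]
        threads.append(threading.Thread(target=cpu_sponge, args=(stop,)))
        for t in threads:
            t.start()
        time.sleep(duration)                 # let the simulated load run
        stop.set()
        for t in threads:
            t.join()

Raising n_workers and shrinking max_delay pushes the simulated concurrency past the measured peak of the live system.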

Since system outage on that machine was estimated at $150,000 per hour of down time, people needed some demonstrated comfort to buy into newer technology. We were able to get them out of home-grown disk access. (Of course, we also had to develop dual disk journalling and automated recovery, but that's a different story.)

BobLee 2003.05.01


In 1995 I was hired as a consultant to a major investment bank. They wanted a statistical analysis of their disk caches on the Internet firewalls. The Internet was new at that time and they wanted to know how big disk caches should be for optimal performance.

I made them some nice graphs that showed the tradeoff. The main graph was a curve showing, for each possible cache size in megabytes of storage, the percentage of requests that "hit" -- that is, the probability that a request would be found in the cache.

I found that their disk drives were too big. The drives could be shrunk drastically and the users would never notice the difference: with much smaller drives there would be hardly any loss in the hit rate. Performance would probably even improve, since each request would have a smaller disk to search.
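For the curious, such a curve can be produced by replaying a request log through a simulated cache at several candidate sizes. Here is a rough Python sketch using LRU eviction; the log loader and the sizes are hypothetical, and the real analysis may well have differed:

    from collections import OrderedDict

    def hit_rate(requests, capacity_mb):
        """Replay (url, size_mb) requests through an LRU cache of the given
        capacity and return the fraction of requests served from the cache."""
        cache, used, hits = OrderedDict(), 0.0, 0
        for url, size_mb in requests:
            if url in cache:
                hits += 1
                cache.move_to_end(url)                # mark as most recently used
            else:
                cache[url] = size_mb
                used += size_mb
                while used > capacity_mb and cache:   # evict least recently used
                    _, evicted_size = cache.popitem(last=False)
                    used -= evicted_size
        return hits / len(requests)

    # Plotting hit rate against cache size shows where the curve flattens out.
    # requests = load_proxy_log("access.log")         # hypothetical log loader
    # for mb in (50, 100, 200, 500, 1000, 2000):
    #     print(mb, hit_rate(requests, mb))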

Management used this result to justify buying BIGGER disks. Since that's what they wanted to do anyway, they just paraded my charts in front of upper management and asked for some money. No one above looked at what the charts really showed. The managers who hired me were interested in the mathematics and thought all the work I did was 'fun' to learn about, but they did not see a use in thinking too hard about their system.

KenEstes 2003.05.01


