NeedForProductionKeyProcessArea

In the past two years I have discovered the CMM (Capability Maturity Model) for software development. I have never worked in an organization which uses this framework to model the development process, but I enjoy reading the literature for ideas to help me organize my own development and give perspective to the whole organization in which I work. The model, and its popularization by Steve McConnell in "Software Project Survival Guide", helps me reason about the root causes of many organizational and planning problems which I encounter in my work.

However, the CMM is not a complete fit for the kind of work in which I have been engaged. I often work at companies which have a production department responsible for keeping the software running after the development and coding have been done. This production group inherits the responsibility for running and monitoring the programs after they have been written. Most studies show that the majority of the cost of using proprietary software comes after the transition to production. The CMM is largely silent on the issues surrounding the move of software into production. This means that much of the lifetime of the software is not adequately represented in the software development models proposed by the CMM.

I have been toying with the idea of a CMM for production: a framework to help organize the key responsibilities of a production group. The framework should not unduly constrain how the production group performs its tasks or specify a minimal level of competency for each task. It should rather invite the organization to examine how these tasks are carried out and expand those processes which do not currently fit the need.

I have only worked in a few kinds of production environments, so I can only speak about these sorts of organizations. Some production shops are 'Real Time', that is, the software must be running continuously. Financial firms, and their 'Application Service Providers' (ASPs), often have trading software which must run during trading hours or large amounts of money will be lost. Many of the large Internet companies (dotcoms) must have their service available nearly all the time. Additionally, financial services have 'Trade Processing' software which is constrained to finish processing before the start of work the next day. Increasingly these applications are viewed as 'Real Time' because they are being run at the limits of their deadlines and regulations are pushing what was traditionally an after-hours processing operation into the main trading software.

While I have been able to extract the key tasks involved in production management, I am disappointed in my organizational skills. I have been unable to derive a general framework to organize these tasks into related skill sets. The net result is a long list of bullet points which are hard to understand and remember. If, after reading this list, you have ideas which could help categorize and summarize this mountain of information, I would be delighted to hear about them.

Production has traditionally been left out of the design and testing process. This means that there are often no provisions for production to monitor new software. The physical component structure should be designed to mimic the logical components of the system. This will allow individual components to be upgraded, patched, restarted, and monitored separately. Too often when the production team discovers that there is a problem, there is no way for them to isolate the component which is causing trouble or to undertake simple maintenance without the developers' help. Clear error messages in the event of an environmental problem, like a hardware or network failure, would be beneficial. The organization and physical location of the configuration files is often of great concern to the production group but is ignored as an issue by the developers. Additional system maintenance features, like the ability to turn off features which are broken, to see live system performance for each module, or to control the physical design of the configuration system, must be designed into the system from the start, and the production group must be involved in all of these decisions.
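As a rough illustration of what "designed in from the start" could look like, here is a minimal sketch in Python. Everything in it is hypothetical (the `Component` and `Registry` classes, the component names, the error text); it only shows the shape of the idea: each logical component can be checked and switched off on its own, and an environmental failure produces a message that production can act on without a developer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Component:
    """One logical component that production can inspect on its own."""
    name: str
    check: Callable[[], None]   # raises an exception when the component is unhealthy
    enabled: bool = True        # production can switch a broken feature off


@dataclass
class Registry:
    """Hypothetical registry of all components in one running system."""
    components: Dict[str, Component] = field(default_factory=dict)

    def register(self, component: Component) -> None:
        self.components[component.name] = component

    def disable(self, name: str) -> None:
        self.components[name].enabled = False

    def status(self) -> Dict[str, str]:
        """Return one clear status line per component."""
        report = {}
        for name, comp in self.components.items():
            if not comp.enabled:
                report[name] = "DISABLED by production"
                continue
            try:
                comp.check()
                report[name] = "OK"
            except Exception as exc:
                report[name] = f"FAILED: {exc}"
        return report


def check_feed() -> None:
    # Simulated environmental problem with an explicit, readable message.
    raise ConnectionError("market feed host unreachable (network problem, not a code bug)")


registry = Registry()
registry.register(Component("order_entry", check=lambda: None))
registry.register(Component("market_feed", check=check_feed))
```

Calling `registry.status()` reports each component separately, and `registry.disable("market_feed")` turns off the failing piece without touching the rest of the system.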

Too often developers make their programming job easier at the expense of the production department. Without financial monitoring of the maintenance cost of individual programs, it becomes difficult to get production-related changes into the software development life cycle. Too often fragile code is released which needs constant babysitting and monitoring because the development organization does not feel the costs of ongoing maintenance; they are only concerned about getting marketing features into the next release before the deadlines. These costs are the largest part of the software development life cycle and are often ignored. Without clear accounting, production's needs are ignored.

At one job the developers decided that making the software self-configuring was too difficult to write in the time that management had given them to deploy the software. They wrote a beautiful front end to allow end users to configure the software as they needed it. This front end was not actually connected to the software the users thought they were configuring. Instead, each configuration edit generated an email which was sent to the production group. The production group had to retype each email by hand. The expected load was several thousand emails a month. Clearly this design should never have been implemented.

Production is the group where the rubber meets the road. At no other point in the company can the load on the system (CPU, disk, network, etc.) and the characteristics of the input (user behavior, quality of market feeds, quality of network connection) be accurately measured. Too often there is no feedback loop back to the development organization with this vital data. Production is left with mysterious capacity limits which are exceeded seemingly at random. Developers design their code without knowing the real input data (ratio of database queries to database inserts, number of users on a CPU, amount of network congestion between the production database and web machines), and thus the code is not properly fitted to its task.

At my last job Production implemented a series of byzantine rules to add capacity to the system on an as-needed basis. While the rules had great intuitive appeal, this system could not be made to work. Too many of the inputs were arbitrary, so that it took 'skill', otherwise known as hindsight, to determine when to add new machines to production. The rules were based on gross system characteristics which were easily available to production. They should have been derived from an integrated view of the internal implementation: the developers should have allowed internal data structures and database operations to be monitored for design bottlenecks. The system often had spikes in usage which caused unpredictable and nonlinear capacity problems. This meant that the simple linear rules would often underrepresent the real capacity needs even under the most liberal interpretation. Real data was sorely needed to plan effectively.
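A small Python sketch of why a rule built on gross averages can fall short. The numbers are invented for illustration (they are not data from that job), and the per-machine capacity is an assumption. It only shows the simplest version of the problem: sizing from the average load hides the spikes. It does not try to model the nonlinear effects described above.

```python
import math
from statistics import mean


def machines_needed(load: float, capacity_per_machine: float) -> int:
    """Machines required to carry `load` requests per second."""
    return math.ceil(load / capacity_per_machine)


# Hypothetical requests per second, sampled once a minute; two samples are spikes.
samples = [100, 110, 95, 105, 900, 100, 98, 102, 850, 104]
capacity_per_machine = 120  # assumed sustainable requests per second per machine

# A simple linear rule applied to the average load.
by_average = machines_needed(mean(samples), capacity_per_machine)

# The same rule applied to the worst sample actually observed.
by_peak = machines_needed(max(samples), capacity_per_machine)
```

With these made-up samples the average suggests 3 machines while the worst minute needs 8, which is the kind of gap that only shows up when the real measurements are fed back into planning.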

Too often production is forced to maintain any code which development produces. Over time many different schools of thought arise in the development organization as different managers and different developers work using their favorite methodologies. To the development organization each project is unique and is an opportunity for new tools and techniques. Development management often allows each project to be run with only the minimum necessary communication between development groups. This is in stark contrast to the view of production. To production there is one very large, complex system. Monitoring and architectural decisions need to be made in a uniform manner to allow a consistent and maintainable view of the whole operation. Production must help the developers see which parts of their project need to be uniform and ensure that the new code fits into the necessary systems for monitoring and maintenance. Production has a more holistic view of all the disparate programs in use in the company, and it needs to find ways to communicate this view to the organizations which only see a few individual parts.

Release and rollback of the production code is closely related to other version control issues. However, there are some differences to consider when working with binary files. It is not possible to check out an arbitrary state of the system: some upgrades must be done together, and some code cannot be rolled back. Not all related files are found close together. OS issues often require certain types of files to live in special directories (for starting on reboot or to allow the files to be exported to the network).

Just as developers review their code, production should regularly review the configuration files. Since production will be editing and adjusting a set of text files, they must regularly review the accumulated effects of these changes and ensure that the results are still readable and understandable. Too often many small changes to these files result in a large maintenance debt which must be paid during a system crisis. Too many times quick but dangerous fixes are made to these files during a crisis and the configurations are not reversed when the crisis ends. These little time bombs are only uncovered during the next crisis. Once our production group accidentally rolled back the software they thought they had just deployed. The software caused some problems, and while trying to clean up the problems an accidental rollback occurred. The lack of upgrade was not discovered for several days.
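Both problems in this paragraph (crisis edits that are never reversed, and a release that is silently not the one production thinks it is) can be caught by a routine comparison. The following Python sketch is only an illustration under assumptions: the file contents, setting names, and version strings are made up, and `config_drift` and `version_mismatch` are hypothetical helpers, not part of any existing tool. It uses the standard library's `difflib.ndiff` to list the lines that differ between the last reviewed configuration and the live one.

```python
import difflib
from typing import List


def config_drift(reviewed: str, live: str) -> List[str]:
    """Return the lines that differ between the reviewed and the live config."""
    diff = difflib.ndiff(reviewed.splitlines(), live.splitlines())
    # ndiff marks removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped here.
    return [line for line in diff if line.startswith(("- ", "+ "))]


def version_mismatch(expected: str, running: str) -> bool:
    """True when the running release is not the one production meant to deploy."""
    return expected != running


# Hypothetical contents: someone raised a limit during a crisis and never reverted it.
reviewed_config = "max_connections = 200\ntimeout_seconds = 30\n"
live_config = "max_connections = 2000\ntimeout_seconds = 30\n"

drift = config_drift(reviewed_config, live_config)
rolled_back = version_mismatch(expected="2.4.1", running="2.4.0")
```

Run on a schedule, a check like this would surface a forgotten crisis fix or an accidental rollback at the next review instead of during the next crisis.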

Production should proactively monitor recommended OS patches and security upgrades for all third-party software. Few organizations currently track vendor patches, and this is a large risk. Security problems are discovered regularly, and it is always possible that new bugs in software will corrupt data and damage files. Only by assigning people to monitor vendors' errata announcements can an organization protect itself from major known risks in third-party software. The management of the production organization needs to have a clear idea of what its time to deploy these 'bugfix' upgrades is so that it can determine whether this is an acceptable risk.

Clearly define what production's main responsibilities are. In the event of a catastrophe this can serve as a guide to which parts of the system are most critical, which customers need immediate attention, and who needs to be informed of the problems. Without clear and documented goals the individuals fixing the problems will have different ideas of what is important and will work at cross purposes.

Production is often the first place that the lack of communication between various IT groups can be clearly seen. The Network, Security, and QA groups all have influence on the development process, but too often any single group can be ignored by the developers. Production needs to help facilitate the communication between these groups. This is easy because production can see the inconsistencies and problems caused by the lack of communication.

Production needs its own change control system to manage changes to the production environment. This is a complementary task to development change control. Production needs to ensure that it can adequately prepare for the new releases and that the releases conform to any standards which are in place. It is a good idea to separate those upgrades which will occur frequently from those upgrades which carry large risks.

I have prepared a detailed checklist of 'good practices' from my work on Unix systems. If you are interested in the complete list please let me know.

Who wrote this? --KeithRay

I wrote it (KenEstes). I wrote it AFTER I had tried to abstract out a list of the issues which I thought that a production group should be working on (see the links on my guest book page for both articles). When I tried to get feedback on this list of skills, I found that most people did not find a big list of abstract ideas nearly as interesting as I did. I tried to justify the whole idea by telling a few stories from my experiences. Do you like it?

--Ken Estes

Ken, I just came across this - a year late. I am interested in any findings you've made since then. Can you tell me where you are on this a year later?

Just read your comments on the Satir Change model page. Will get back to you on this.

- BeckyWinant 2002.10.21

Ken, Have you been following the Software Maintenance thread on Jerry's SHAPE forum lately? (It started up on 2002.10.15) I think you might care to contribute on that thread.

--BobLee 2002.10.21

Thanks for the heads up. I got behind in SHAPE in early spring and never got caught up. I intend to read it all some day soon. But I guess I will have to start working on this now.

--KenEstes 2002.10.21

Ken,

Here's a few thoughts from someone who isn't a production person, but spends time up front trying to help clients get things off to as good a start as possible. And I did spend quite a few years as a developer back when development and testing were part of the same job description:

- Not only is production left out of the design and test process, it often is left out of an explicit budget process!

- In an environment where production doing their job well means the customers are happy, the lessons learned in that area would be gold to the folks doing the next version or variation on a given product.

- Bad design, from what I know and have read, tends to be more a by-product of poor management or bad politics, not of poor developers wanting to make production people squirm. Most software developers really want to do the best job possible, and many developers actually understand good principles of design - they just don't always get to apply them.

- Since you cite good design principles that have been continually studied and defined and discussed since the 70s (perhaps earlier, but I wasn't there), I'd have to ask about cultures that don't value good design. So, good design may only be found on an individual basis. (I have run into a developer or two who has done really bad design, but no one seemed to care.)

- Developers aren't always encouraged to design to long term care issues. This is sad. Sometimes contracts or marketing windows (and the politics thereof) look to the short term. Some companies actually bid on the maintenance contracts separately, and what do you think that does to motivate developers?

- I agree that production needs a holistic view. Heaven knows that someone does! I've seen that people only face this issue in a crisis. Then, everyone is aghast - as if it just happened.

- Your observations and desires to improve production practice seem reasonable, desirable and, well, bound to deliver a cost saving - but perhaps not soon enough for short-term thinking companies.

BeckyWinant 2002.10.22

It's the short term thinking that gets to us again and again. The issues that give the production people grief are the same ones that create technical debt, that reduce / eliminate testing time, and so on.

Perhaps a production key process area would help those companies which want or need certification.

SherryHeinze 2002.10.22

> Ken, I just came across this - a year late. I am interested in any
> findings you've made since then. Can you tell me where you are on
> this a year later?

It's not really clear where I am. These kinds of work organization ideas (CMM, McConnell) are still fascinating to me and I believe they are important on some level. However, after much discussion with people in the extended AYE community, I have not heard much which would encourage this sort of view. I have noticed that in my own life I hardly ever follow the checklists I make for myself. Just this summer I had a list of goals, some concrete and some emotional, to accomplish on a vacation. During the trip, I often looked at this list but chose not to follow it. If I can so easily dismiss a list when it contained goals I had agreed to and which were not buried under some deep emotional significance, how can I expect such lists to be useful in more complex environments?

The best advice I have received so far came from Jerry, who one year ago told me, "Good management will create good processes, not vice versa." In this form it is easy to agree with. Which would you rather have: good managers who make good decisions but often do the same things differently, or some book written by the smartest people there are but implemented by a bunch of "average" managers? Clearly, no matter how smart the book is, the devil is in the implementation. It would be better to have smart people who evolve the parts of the system which can be adapted to become more regular and predictable over time than to have the alternative: a bunch of people who are not thinking carefully, shouting "but the procedure says".

Oh, and I am off to the library this week to get some articles recommended in Dwayne Phillips' book. I am particularly intrigued by the idea of "Design by Walking Around". This technique describes a set of documents which look at a design from several (five?) different angles.

I am also very interested in issues about Group Facilitation (hence my InternationalAssociationOfFacilitators page) and Agile (XP, JAD) methodologies. I am thinking that part of the big trouble here is to get consensus from below. These seem to be ways of helping to do that.

--KenEstes 2002.10.23

Ken,

> These kinds of work organization ideas (CMM, McConnell) are still fascinating to me and I believe they are important on some level. However, after much discussion with people in the extended AYE community, I have not heard much which would encourage this sort of view.

For one, I believe the issues are important at every level. I think you might get more feedback from a group like SHAPE (which BobLee also suggested) since their interactions shape the SHAPE forum. Not all AYE participants join in the AYE wiki, so it may be a bit more catch as catch can. I do not know how much you talked with people in private email or at the conference itself.

My experience supports Jerry's advice. Having been a manager and executive and developer/tester in my career, I've seen that the majority of processes (good or bad) are supported, or not, by management. Grass roots initiatives can be very difficult in cultures where the initiative conflicts with management desires or goals.

Dwayne Phillips' book sounds interesting indeed. I may do some library searching myself :)

> I am thinking that part of the big trouble here is to get consensus from below. These seem to be ways of helping to do that.

Ask Jerry about the engineering model, the Newtonian model, the learning model and the one-person-at-a-time model. You'll gain insight into why any method, process, or change initiative may or may not, will or won't work.

- BeckyWinant 2002.10.23

Updated: Wednesday, October 23, 2002