As we’ve grown over the years (we used to be a 2-person IT team doing less than $300k in revenue; now we’re a 100-person team managing nearly 40 systems and doing $200M+ in revenue), we’ve found it necessary, like most IT departments, to move to a more process-driven approach (I used to write software and build servers; now I write guidelines and checklists). As we’ve moved on to writing guidelines for how processes are developed (is there a concept of metaprogramming that can be applied to management?), we also decided it was time to sit down and put our IT “philosophy” on paper:

1. The worst line of code in production is worth more than the best line of code in development.

Programmers are always certain that they will get it better the next time. But code in production has the advantage of having been beaten to death by real users, something you can’t recreate for all the unit tests and UAT cycles in the world. So maybe it’s not fast enough, or particularly elegant, but it works. With the new stuff, you have no guarantee.

2. Plan to fail. Every line of code will have a bug, every test case is incomplete, and every deployment will miss a step. Expect failure at every step and build in the time to recover.

Our schedules used to have a cycle of QA testing, followed by UAT testing, followed by production. What good is that? It assumes every phase passes the first time. Every step we take, up to and including the production release, should be expected to fail and uncover problems. So every step should be followed by time reserved to correct and adjust, whether that means changing code, deployment processes, test plans, or estimation processes.
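To make the difference concrete, here is a minimal Python sketch of a schedule that reserves a recovery slot after every phase instead of running QA, UAT, and production back to back. The `plan_with_recovery` function, the phase durations, and the 25% recovery allowance are hypothetical illustrations, not figures from our own schedules.

```python
from typing import List, Tuple

Phase = Tuple[str, int]       # (phase name, planned days)
Row = Tuple[str, int, int]    # (label, start day, end day)


def plan_with_recovery(phases: List[Phase], recovery_pct: int) -> List[Row]:
    """Lay phases out back to back, following each with a 'correct and adjust' slot.

    recovery_pct is a hypothetical allowance: each recovery slot is that
    percentage of the phase length, rounded up, and never less than one day.
    """
    schedule: List[Row] = []
    day = 0
    for name, days in phases:
        schedule.append((name, day, day + days))
        day += days
        # Assume this phase will fail somewhere and reserve time to fix it.
        fix = max(1, -(-days * recovery_pct // 100))  # ceiling division
        schedule.append((f"{name}: correct and adjust", day, day + fix))
        day += fix
    return schedule


# Hypothetical phases and durations, for illustration only.
plan = plan_with_recovery(
    [("QA testing", 10), ("UAT testing", 5), ("Production release", 1)],
    recovery_pct=25,
)
for label, start, end in plan:
    print(f"day {start:>2}-{end:>2}  {label}")
print("total days:", plan[-1][2])  # total days: 22
```

Putting the slack after every phase, rather than in one lump at the end, means a failure found in QA does not silently eat into UAT.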

3. The most important part of any deployment is the rollback plan.

No matter how well you’ve planned, something may go wrong (after all, who knows how crazy timezones work in Brazil). The only thing you can be sure of is that your systems worked before you touched them, so you had better be ready to get back to that state. Being able to go backwards is sometimes more valuable than being able to go forwards.
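A rollback plan boils down to one rule: capture the state you know works before you touch it, and be able to restore it if the change fails. The Python sketch below shows that shape for a configuration change, assuming the system's state can be represented as a dictionary and that some health check exists; `deploy_with_rollback`, the health check, and the sample settings are all hypothetical.

```python
import copy
from typing import Callable, Dict

State = Dict[str, str]


def deploy_with_rollback(
    current: State,
    change: Callable[[State], State],
    healthy: Callable[[State], bool],
) -> State:
    """Apply `change` to a copy of `current`; fall back to `current` if it fails."""
    # The one state we know works: snapshot it before touching anything.
    known_good = copy.deepcopy(current)
    try:
        candidate = change(copy.deepcopy(current))
    except Exception:
        # The change itself failed partway: go back to the known-good state.
        return known_good
    if not healthy(candidate):
        # The change applied, but the system no longer passes its check.
        return known_good
    return candidate


# Hypothetical example: a release that breaks timezone handling.
def bad_release(s: State) -> State:
    return {**s, "app_version": "1.5", "tz_rules": ""}


def tz_check(s: State) -> bool:
    return s["tz_rules"] != ""


live = {"app_version": "1.4", "tz_rules": "2007"}
print(deploy_with_rollback(live, bad_release, tz_check))
# {'app_version': '1.4', 'tz_rules': '2007'}
```

In practice the "state" is code, configuration, data, and servers together, and restoring it is rarely this simple; the point of the sketch is that the known-good snapshot is taken before the change, not improvised after it fails.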

4. There are no delays. A project will take its allotted time. All delays are failures to properly assess and mitigate risk.

This has always been the most controversial guideline, but over the years I’ve found very few things that actually affect the duration of a project (and can be seen without the benefit of hindsight): scope, technical implementation, and acceptance of risk. Once those choices have been made, very little will change how long a project will take. So why do so many projects run over budget and over time? Because most project schedules are built from a collection of estimates: a rough estimate of the technical implementation, an estimate of resource availability, and an estimate of the business’s tolerance for the schedule (“if we don’t get this out the door by Q4, the market will destroy us!”). The implementation plan is largely immutable, but the project plan is a negotiation.

Strive to be honest about the plan. You can be honest ahead of time, or beg for forgiveness at the end, but no amount of negotiation will change how long the work actually takes. Cut scope or cut the crap. Anything less leads to death marches (for which there is no evidence that they ever actually shorten a project).

5. Everyone is responsible for a project’s success, but you are the one responsible for its failure.

When a bug happens (whether it’s a code bug, a configuration bug, a resource “bug,” etc.), you need to assume it’s your fault, and therefore within your ability to fix. If you are absolutely sure it’s not your bug, then prove it. Otherwise, it gets too easy for people to spend weeks handing problems off, until the organization has to assume that certain bugs are unfixable. A bug that is your problem is liberating: aren’t you going to get a lot of credit for fixing it?

6. Code = Configuration = Data = Architecture.

We tend to think of code, configuration files, servers, and data as different things, but they aren’t. They work in concert to produce a running system. Most organizations closely monitor one or two of these and are cavalier with the rest. But to your customers, a misconfigured server is just as problematic as a poorly implemented algorithm. It must all work, or none of it works. Configuration management is everyone’s responsibility.

7. Make your decisions before you have to make them.

Most decisions are easy enough to see coming: are we going to have to roll back this release? Are we going to have to cut this feature? Am I going to have to hire or fire a programmer? When the decision must finally be made, there may or may not be much time to actually make it, so you don’t want to spend that time floundering. You should be able to write the pseudocode for a decision ahead of time:

if (average workweek > 50 hours && new project start < 2 months) hire new programmer

if (system downtime > 30 minutes && invoicing system is not up) roll back release

Make your decisions before you have to, and when time runs short, things won’t be as stressful. Make sure everyone knows as many of the decision criteria as possible, so everyone can agree on them and help collect the data.
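The two rules from the pseudocode above can be written down as data that everyone can read and agree on before the moment arrives. The Python sketch below assumes the relevant measurements are gathered into a dictionary; the metric names, the `Decision` class, and the `decide` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric names; invoicing_up is stored as a bool.
Metrics = Dict[str, float]


@dataclass(frozen=True)
class Decision:
    action: str
    criteria: Callable[[Metrics], bool]


# Agreed ahead of time; thresholds mirror the pseudocode above.
DECISIONS: List[Decision] = [
    Decision(
        "hire new programmer",
        lambda m: m["avg_workweek_hours"] > 50 and m["months_until_new_project"] < 2,
    ),
    Decision(
        "roll back release",
        lambda m: m["downtime_minutes"] > 30 and not m["invoicing_up"],
    ),
]


def decide(metrics: Metrics) -> List[str]:
    """Return every pre-agreed action whose criteria the current metrics meet."""
    return [d.action for d in DECISIONS if d.criteria(metrics)]


today = {
    "avg_workweek_hours": 55,
    "months_until_new_project": 1,
    "downtime_minutes": 12,
    "invoicing_up": True,
}
print(decide(today))  # ['hire new programmer']
```

Because the criteria live in one shared list, the same definitions drive both the data collection and the decision, so nobody is arguing about thresholds while the invoicing system is down.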

8. There are no outsiders.

We’re a large organization now, with outsource partners, contractors, consultants, business user liaisons, and revenue partners. The majority of our internal staff spends its time managing technical relationships. It is very easy, and very dangerous, to slip into an “us vs. them” mentality. But our customers don’t care whether we failed or one of our vendors did. If it doesn’t matter to them, then it can’t matter to us.

9. The right tool for the job.

Homogeneous environments are typically cheaper, but we’d rather make sure everyone has the right tool. The slight extra cost of running both Sun and Intel servers, Java and .NET, usually pays off when it reduces programmer time. As for training costs: we don’t want too many programming languages out there, but we’ve found it takes much more time to get good with a particular technology (Oracle, JBoss, Rails, JMS, Berkeley DB, etc.) than it does to move from Ruby to VB.NET to Java. Now, if we introduced LISP…

10. Everything changes…

When was the last time you worked on a config.sys file? Everything I’ve said here is a good idea, but it’s written in pencil. We reserve the right to change our minds when we find something better.