
Case Study: Continuous deployment makes releases non-events

The following is a case study of one entrepreneur's transition from a traditional development cycle to continuous deployment. Many people still find this idea challenging, even at companies that operate solely on the web. This case presents a further complication: desktop software. Without being able to transparently modify the software in situ, is it still possible to deploy on a continuous basis? Read on to find out.


Ash Maurya is the founder of WiredReach, a bootstrapped startup that he has been running for seven years. Recently, he was bitten by the lean startup bug and has started writing about his experiences attempting to apply lean startup and customer development principles. I've previously named his post Achieving Flow in a Lean Startup as one of my favorite blog posts of 2009. 

What follows is his own account of the challenges he faced as well as the solutions he adopted, lightly edited for style. If you're interested in contributing a case study for publication here, consider getting started by adding it to the Lean Startup Wiki case study section. -Eric

Of all the Lean Startup techniques, Continuous Deployment is by far the most controversial. Continuous Deployment is a process by which software is released several times throughout the day – in minutes versus days, weeks, or months. Continuous Flow Manufacturing is a Lean technique that boosts productivity by rearranging manufacturing processes so products are built end-to-end, one at a time (using single-piece flow), versus the more prevalent batch-and-queue approach.

Continuous Deployment is Continuous Flow applied to software. The goal of both is to eliminate waste. The biggest waste in manufacturing is created from having to transport products from one place to another. The biggest waste in software is created from waiting for software as it moves from one state to another: waiting to code, waiting to test, waiting to deploy. Reducing or eliminating these waits leads to faster iteration, which is the key to success.



My transition to Continuous Deployment

Prior to adopting continuous deployment, I used to release software on a weekly schedule (come rain or shine), which I viewed as pretty agile, disciplined, and aggressive. I identified the must-have code updates on Monday, official code cutoff was on Thursday, and Friday was slated for the big release event. The release process took at least half a day and sometimes the whole day. Dedicating up to 20% of the week to releasing software is incredibly wasteful for a small team. This is not counting the ongoing coordination effort also needed to prioritize the ever-changing release content for the week as new critical issues are discovered. Despite these challenges, I fought the temptation to move to a longer bi-weekly or monthly release cycle because I wanted to stay highly responsive to customers (something our customers repeatedly tell us they appreciate). Managing weekly releases got a lot harder once I started doing customer development. Spending more time outside the building meant less time for coding, testing, and deploying. Things started to slip. That is when I devised a set of work hacks to manage my schedule (described here), and what drove me to adopt Continuous Deployment.

My transition from staged releases to continuous deployment took roughly 2 weeks. I read Eric Ries' 5 step primer to getting started with Continuous Deployment and found that I already had a lot of the necessary pieces. Continuous integration, deployment scripts, monitoring, and alerting are all best practices for any release process - staged or continuous.

The fundamental challenge with Continuous Deployment is getting comfortable with releasing all the time.
Continuous deployment makes releases non-events, and checking in code is synonymous with triggering a release. On the one hand, this is the ultimate in customer responsiveness. On the other hand, it is scary as hell. With staged releases, time provides a (somewhat illusory) safety net. There is also comfort in sharing test responsibility with someone else (the QA team). No one wants to be solely responsible for bringing a production system down. For me, neither was a consideration. I didn't have time or a QA team.

I took things easy at first - made small changes and audited the release process maniacally. I started relying heavily on functional tests (over unit tests), which allowed me to test changes as a user would. I also identified a set of events that would indicate something going terribly wrong (e.g. no users on the system) and built real-time alerting around them (using nagios/ganglia). As we built confidence, we started committing bigger, multi-part changes, each time building up our suite of testing and monitoring scripts. After a few iterations, our fear level was actually lower than it used to be after a staged release. Because we were committing less code per release, we could correlate issues to a release with certainty.
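
As a rough illustration of that kind of real-time alert, here is a minimal nagios-style check written in Python. The metrics endpoint, thresholds, and messages are placeholders invented for illustration, not the actual WiredReach configuration; the only real convention it relies on is that nagios plugins signal OK/WARNING/CRITICAL via exit codes.

```python
#!/usr/bin/env python
# Hypothetical nagios-style check: alert if no users are active on the system.
# The metrics URL and thresholds are assumptions made for illustration.
import sys
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2

def active_user_count(url="http://localhost:8080/metrics/active_users"):
    """Fetch the current active-user count from an internal metrics endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(resp.read().decode().strip())

def main():
    try:
        count = active_user_count()
    except Exception as exc:
        print(f"CRITICAL - could not read metric: {exc}")
        return CRITICAL
    if count == 0:
        print("CRITICAL - no users on the system")
        return CRITICAL
    if count < 5:
        print(f"WARNING - only {count} active users")
        return WARNING
    print(f"OK - {count} active users")
    return OK

if __name__ == "__main__":
    sys.exit(main())
```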

These days, we never wonder if unexpected errors could have been introduced as a result of a large code merge (since there is no branching). We also rely on more testing and monitoring automation, which is way more robust and consistent than what we were doing before.

All that said, mistakes are still made and we commit bad code now and then. None has taken the system down (not yet, anyway). Rather than seeing these as shortcomings of the process, we view them as opportunities to build up our Cluster Immune System. We try and follow a Five Whys approach to keep these errors from recurring. There is always some action to take: writing more tests, more monitoring, more alerts, more code, or more process.

Looking back, I struggled to balance the opposing pulls of "outside the building" versus "inside the building" activities. Adopting Continuous Deployment has allowed me to build "flow" into my day, which allows me to do both. But easier releases are not the only benefit of Continuous Deployment. Smaller releases lead to faster build/measure/learn loops. I've used these faster build/measure/learn loops to optimize my User Activation flow, delight customers with "near-instant" fixes to issues, and even eliminate features that no one was using.

While it is somewhat easier to continuously deploy web-based software, with a little discipline, desktop-based software too can be built to flow. Here's how I implement continuous deployment for my desktop-based application (CloudFire).

My Continuous Deployment process


Don't push features

If you've followed a customer discovery process, identified a problem worth solving, and built out your minimum viable product, DON'T keep adding features until you've validated the MVP, or more specifically the unique value proposition of the MVP. Unneeded features are waste and not only create more work but can needlessly complicate the product and prolong the "customer validation" phase.

Every new feature should ideally be pulled by more than one customer before showing up in a release.
Build in response to a signal from the "customer", and otherwise rest or improve.
As a technologist, I too love to measure progress based on how much stuff I build. But instead of channeling all my energy towards building new features, I channel roughly 80% of it towards measuring and optimizing existing features. I am not advocating adding no features at all. Users will naturally ask for more stuff and your MVP by definition is minimal and needs more love. Just don't push it.

Code in small batches

I've previously described my 2 hour blocks of maker time for maximizing my work "flow". Prior to starting any maker activity, I clearly identify what needs to get done (the goal) and sketch out how it needs to get done (the design).

It is important to point out that the goal of the maker activity need not be a user-facing feature or even a complete feature. There is inherent value in committing incremental work into production to defuse future integration surprises. During the maker activity, I code, unit test, and create or update functional tests, as needed. At the end of the maker activity, I check in code, which automatically triggers a build on a continuous integration server that is then run through a battery of unit and functional tests. The artifacts created at the end of the build are installers for Mac and Windows (for new users) along with an Eclipse P2 repository (OSGI) for automatic software updates (for current users). The release process takes ~15 minutes and runs in the background.
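
To make the shape of that pipeline concrete, here is a deliberately simplified sketch in Python. The real process runs on a continuous integration server and produces installers plus an Eclipse P2 repository; the commands, script names, and test paths below are assumptions made purely for illustration.

```python
#!/usr/bin/env python
# Simplified, hypothetical sketch of a post-check-in pipeline: run tests,
# then build and publish release artifacts. The shell scripts referenced
# here are placeholders, not the actual build system.
import subprocess
import sys

STEPS = [
    ("unit tests", ["python", "-m", "pytest", "tests/unit"]),
    ("functional tests", ["python", "-m", "pytest", "tests/functional"]),
    ("build installers", ["./build_installers.sh"]),          # hypothetical script
    ("publish update repository", ["./publish_p2_repo.sh"]),  # hypothetical script
]

def run_pipeline():
    for name, cmd in STEPS:
        print(f"==> {name}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Pipeline failed at step: {name}")
            return result.returncode
    print("Release artifacts built and published.")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```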

Prefer functional tests over unit tests whenever possible

I don't believe in blindly writing unit tests just to achieve 100% code coverage as reported by some tool. To do that I would have to mock (simulate) too many critical components. I deem excessive unit testing a form of waste. Whenever possible, I rely on functional tests that verify user actions. I use Selenium, which lets me control the application on multiple browsers and OS platforms, just as a user would. One thing to be wary of is that functional tests are longer-running than unit tests and will gradually increase the release cycle time. Parallelization of tests across multiple test machines is a way to address this. I am not at that point yet, but Selenium Grid looks like a good option. So does Go Test It.
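
For readers who haven't used Selenium, here is a minimal functional test using its Python bindings. The URL, element locators, and assertion are invented for illustration; the point is simply that the test drives a real browser the way a user would, rather than mocking internals.

```python
# Minimal Selenium functional test sketch (Python bindings).
# The URL and element names are hypothetical placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

def test_login_shows_dashboard():
    driver = webdriver.Firefox()
    try:
        driver.get("http://localhost:8080/login")            # placeholder URL
        driver.find_element(By.NAME, "email").send_keys("test@example.com")
        driver.find_element(By.NAME, "password").send_keys("secret")
        driver.find_element(By.ID, "login-button").click()
        assert "Dashboard" in driver.title                    # user-visible outcome
    finally:
        driver.quit()
```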

Always test the User Activation flow

After the integration tests are run and the software packaged, I always verify my User Activation flow before going live. The user activation flow is the most critical path toward achieving initial user gratification or product/market fit. My user activation flow is automatically tested on both a Mac and a Windows machine.

Utilize automagic software updates

A major challenge with desktop-based (versus web-based) software is propagating software updates. Studies have shown that users find traditional software update dialogs annoying. To overcome this, I am using a software update strategy that works silently without ever interrupting the user, much like an appliance. Google Chrome utilizes a similar update process. The biggest risk with this approach is that users will find it Orwellian. So far no one has complained, and many users like the auto-update feature. It helps that CloudFire, being a p2web app, runs headlessly with a browser-based UI.

This is how the software update process currently works (a rough sketch of the polling logic follows the list):
  1. At the end of each build, we push an Eclipse P2 repository (OSGI), which is a set of versioned plug-ins that make up the application. Because the application is composed of many small plug-ins, and because we commit small code batches, each software update is small and downloads quickly.
  2. Every time the user starts up the application, it checks for a new update, downloads and installs one if available. Depending on the type of update, it could take effect immediately or require an application restart. If an application restart is required, we wait until the next user initiated relaunch of the application or trigger one silently when the system is idle.
  3. If the application is already running, it periodically polls for new updates. If an update is found, it is also downloaded and installed in the background (as above) without interrupting the user.
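
The Python sketch below illustrates only the check-on-startup and poll-in-the-background control flow described above. The actual update mechanism is Eclipse P2/OSGI; the version strings and helper functions here are stand-ins so the flow is readable and runnable.

```python
# Illustrative sketch of the silent-update control flow, not the real
# Eclipse P2 implementation. The "repository" is stubbed for illustration.
import threading
import time

POLL_INTERVAL_SECONDS = 60 * 60   # poll roughly once an hour while running
CURRENT_VERSION = "1.0.0"

def check_for_update():
    """Return the newest available version, or None. Stubbed for illustration."""
    latest = "1.0.1"              # would come from the update repository
    return latest if latest != CURRENT_VERSION else None

def download_and_install(version):
    """Download and stage the update in the background (stub)."""
    print(f"staged update {version}; takes effect on next relaunch or when idle")

def apply_update_if_available():
    version = check_for_update()
    if version is not None:
        download_and_install(version)

def start_background_updater():
    apply_update_if_available()   # check once at startup
    def poll():
        while True:               # then poll periodically, never interrupting the user
            time.sleep(POLL_INTERVAL_SECONDS)
            apply_update_if_available()
    threading.Thread(target=poll, daemon=True).start()
```
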
Alerts and monitoring

I use nagios and ganglia to implement both system- and application-level monitoring and alerting on the overall health of the production cluster. Examples of things I monitor are the number of user activations, active users, and aggregate page hits to user galleries. Any out-of-the-norm dip in these numbers immediately alerts us (via twitter/SMS) to a potential issue.
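
A toy version of that "out-of-the-norm dip" logic might look like the following. The baseline calculation, threshold, and notification transport (Twitter/SMS in the setup described above) are all assumptions made for illustration.

```python
# Sketch of a dip check: compare the current value of a business metric
# against its recent average and alert on a sharp drop.
def check_for_dip(current, recent_history, drop_threshold=0.5):
    """Alert if the current value falls below half of the recent average."""
    if not recent_history:
        return False
    baseline = sum(recent_history) / len(recent_history)
    if current < baseline * drop_threshold:
        notify(f"metric dipped: {current} vs recent average {baseline:.1f}")
        return True
    return False

def notify(message):
    print(f"ALERT: {message}")   # stand-in for a Twitter/SMS alert

# Invented example: activations per hour recently vs. right now.
check_for_dip(current=3, recent_history=[40, 38, 45, 41])
```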

Application level diagnostics

Despite the best testing, defects still happen. More testing is not always the answer as some defects are intermittent and a function of the end-user's environment. It is virtually impossible to test all combinations of hardware, OS, browsers, and third party apps (e.g. Norton anti-virus, Zone Alarm, etc.).

Relying on users to report errors doesn't always work in practice. To compensate, we've had to build basic diagnostics right into the application itself. They can notify both the user and us of unexpected errors, and allow us to pull configuration information and logs remotely. We can also do remote rollbacks this way.
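
One simple way to wire up that kind of in-app error reporting is to hook uncaught exceptions, as in the Python sketch below. The reporting endpoint and payload format are hypothetical; a real implementation would also need the remote log-pull and rollback pieces described above.

```python
# Sketch of in-application diagnostics: catch unexpected errors, tell the
# user, and report details back for remote debugging. Endpoint is invented.
import json
import sys
import traceback
import urllib.request

REPORT_URL = "https://example.com/diagnostics/report"   # placeholder

def report_unexpected_error(exc_type, exc_value, exc_tb):
    payload = json.dumps({
        "error": repr(exc_value),
        "traceback": "".join(traceback.format_exception(exc_type, exc_value, exc_tb)),
    }).encode("utf-8")
    try:
        req = urllib.request.Request(REPORT_URL, data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)
    except Exception:
        pass                                    # never let reporting crash the app
    print("Something went wrong; a report has been sent.")   # notify the user

sys.excepthook = report_unexpected_error        # hook uncaught exceptions
```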

Tolerate unexpected errors exactly once

Unexpected errors provide the opportunity to learn and bullet-proof a system early. Ignoring them or implementing quick-and-dirty patches inevitably leads to repeat errors, which are another form of waste. I try and follow a formalized Five Whys process (using our internal wiki) for every error. This forces us to really stop, think, and fix the right problem(s).

My continuous deployment process is summarized below:


So why is Continuous Deployment so controversial?
Eric has addressed a lot of the objections already on his blog. One that I hear a lot is the belief that you need a massive team to pull off continuous deployment. I would argue that the earlier in the development cycle and the smaller the team, the easier it is to implement a continuous deployment process. If you are a start-up with an MVP, there is no better time to adopt a continuous deployment process than the present. You don't yet have hundreds of customers, dozens of peers, or dozens of features. It is a lot easier to lay the groundwork now with time on your side.

If you enjoyed Ash's writing in this case study, I suggest you subscribe to his blog. -Eric


Continuous deployment for mission-critical applications

Having evangelized the concept of continuous deployment for the past few years, I've come into contact with almost every conceivable question, objection, or concern that people have about it. The most common reaction I get is something like, "that sounds great - for your business - but that could never work for my application." Or, phrased more hopefully, "I see how you can use continuous deployment to run an online consumer service, but how can it be used for B2B software?" Or variations thereof.

I understand why people would think that a consumer internet service like IMVU isn't really mission critical. I would posit that those same people have never been on the receiving end of a phone call from a sixteen-year-old girl complaining that your new release ruined her birthday party. That's where I learned a whole new appreciation for the idea that mission critical is in the eye of the beholder. But, even so, there are key concerns that lead people to conclude that continuous deployment can't be used in mission critical situations.

Implicit in these concerns are two beliefs:

1. That mission critical customers won't accept new releases on a continuous basis.
2. That continuous deployment leads to lower quality software than software built in large batches.

These beliefs are rooted in fears that make sense. But, as is often the case, the right thing to do is to address the underlying cause of the fear instead of avoiding improving the process. Let's take each in turn.

Another release? Do I have to?
Most customers of most products hate new releases. That's a perfectly reasonable reaction, given that most releases of most products are bad news. It's likely that the new release will contain new bugs. Even worse, the sad state of product development generally means that the new "features" are as likely to be ones that make the product worse, not better. So asking customers if they'd like to receive new releases more often usually leads to a consistent answer: "No, thank you." On the other hand, you'll get a very different reaction if you ask customers "next time you report an urgent bug, would you prefer to have it fixed immediately or to wait for a future arbitrary release milestone?"

Most enterprise customers of mission critical software mitigate these problems by insisting on releases on a regular, slow schedule. This gives them plenty of time to do stress testing, training, and their own internal deployment. Smaller customers and regular consumers rely on their vendors to do this for them, and are otherwise at their mercy. Switching these customers directly to continuous deployment sounds harder than it really is. That's because of the anatomy of a release. A typical "new feature" release is, in my experience, about 80% changes to underlying API's or architecture. That is, the vast majority of the release is not actually visible to the end-user. Most of these changes are supposed to be "side effect free" although few traditional development teams actually achieve that level of quality. So the first shift in mindset required for continuous deployment is this: if a change is supposedly "side effect free," release it immediately. Don't wait to bundle it up with a bunch of other related changes. If you do that, it will be much harder to figure out which change caused the unexpected side effects.

The second shift in mindset required is to separate the concept of a marketing release from the concept of an engineering release. Just because a feature is built, tested, integrated and deployed doesn't mean that any customers should necessarily see it. When deploying end-user-visible changes, most continuous deployment teams keep them hidden behind "flags" that allow for a gradual roll-out of the feature when it's ready. (See this blog post from Flickr for how they do this.) This allows the concept of "ready" to be much more all-encompassing than the traditional "developers threw it over the wall to QA, and QA approved of it." You might have the interaction designer who designed it take a look to see if it really conforms to their design. You might have the marketing folks who are going to promote it double-check that it does what they expect. You can train your operations or customer service staff on how it works - all live in the production environment. Although this sounds similar to a staging server, it's actually much more powerful. Because the feature is live in the real production environment, all kinds of integration risks are mitigated. For example, many features have decent performance themselves, but interact badly when sharing resources with other features. Those kinds of features can be immediately detected and reverted by continuous deployment. Most importantly, the feature will look, feel, and behave exactly like it does in production. Bugs that are found in production are real, not staging artifacts.
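
As a concrete, deliberately generic illustration of hiding already-deployed code behind a flag, here is a minimal Python sketch. The flag table, hashing scheme, and rollout percentages are assumptions for illustration only; they are not Flickr's or any particular team's actual implementation.

```python
# Minimal sketch of a feature flag with a gradual, percentage-based rollout.
# The code path ships to production immediately; customers only see it once
# the rollout percentage is raised.
import hashlib

FLAGS = {
    "new_checkout_flow": {"enabled": True, "rollout_percent": 5},
}

def feature_enabled(flag_name, user_id):
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Deterministic per-user bucket, so a given user always sees the same thing.
    bucket = int(hashlib.md5(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

if feature_enabled("new_checkout_flow", user_id="12345"):
    pass  # show the new, not-yet-announced feature
else:
    pass  # show the existing experience
```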

Plus, you want to get good at selectively hiding features from customers. That skill set is essential for gradual roll-outs and, most importantly, A/B split-testing. In traditional large batch deployment systems, split-testing a new feature seems like considerably more work than just throwing it over the wall. Continuous deployment changes that calculus, making split-tests nearly free. As a result, the amount of validated learning a continuous deployment team achieves per unit time is much higher.
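
Once features are flagged, the measurement side of a split-test can be as simple as counting exposures and conversions per arm. The numbers below are invented and a real analysis would add significance testing; this only sketches why the marginal cost of a split-test becomes so low.

```python
# Toy sketch of split-test measurement: count exposures and conversions per
# arm, then compare rates. Data here is invented for illustration.
from collections import defaultdict

exposures = defaultdict(int)
conversions = defaultdict(int)

def record_exposure(arm):
    exposures[arm] += 1

def record_conversion(arm):
    conversions[arm] += 1

def conversion_rate(arm):
    return conversions[arm] / exposures[arm] if exposures[arm] else 0.0

for _ in range(1000):
    record_exposure("A")          # existing experience
for _ in range(1000):
    record_exposure("B")          # new feature variant
for _ in range(52):
    record_conversion("A")
for _ in range(61):
    record_conversion("B")

print(f"A: {conversion_rate('A'):.1%}  B: {conversion_rate('B'):.1%}")
```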

The QA dilemma
A traditional QA process works through a checklist of key features, making sure that each feature works as specified before allowing the release to go forward. This makes sense, especially given how many bugs in software involve "action at a distance" or unexpected side-effects. Thus, even if a release is focused on changing Feature X, there's every reason to be concerned that it will accidentally break Feature Y. Over time, the overhead of this approach to QA becomes very expensive. As the product grows, the checklist has to grow proportionally. Thus, in order to get the same level of coverage for each release, the QA team has to grow (or, equivalently, the amount of time the product spends in QA has to grow). Unfortunately, it gets worse. In a successful startup, the development team is also growing. That means that there are more changes being implemented per unit time as well. Which means that either the number of releases per unit time is growing or, more likely, the number of changes in each release is growing. So for a growing team working on a growing product, the QA overhead is growing polynomially, even if the team is only growing linearly.

For organizations that have the highest standards for mission critical, and the budget to do it, full coverage can work. In fact, that's just what happens for organizations like the US Army, who have to do a massive amount of integration testing of products built by their vendors. Having those products fail in the field would be unacceptable. In order to achieve full coverage, the Army has a process for certifying these products. The whole process takes a massive amount of manpower, and requires a cycle time that would be lethal for most startups (the major certifications take approximately two years). And even the Army recognizes that improving this cycle time would have major benefits.

Very few startups can afford this overhead, and so they simply accept a reduction in coverage instead. That solves the problem in the short term, but not in the long term - because the extra bugs that get through the QA process wind up slowing the team down over time, imposing extra "firefighting" overhead, too.

I want to directly challenge the belief that continuous deployment leads to lower quality software. I just don't believe it. Continuous deployment offers five significant advantages over large batch development systems. Some of these benefits are shared by agile systems which have continuous integration but large batch releases, but others are unique to continuous deployment.
  1. Faster (and better) feedback. Engineers working in a continuous deployment environment are much more likely to get individually tailored feedback about their work. When they introduce a bug, performance problem, or scalability bottleneck, they are likely to know about it immediately. They'll be much less likely to hide behind the work of others, as happens with large batch releases - when a release has a bug, it tends to be attributed to the major contributor to that release, even if that association is untrue. 
  2. More automation. Continuous deployment requires living the mantra: "have every problem only once." This requires a commitment to realistic prevention and learning from past mistakes. That necessarily means an awful lot of automation. That's good for QA and for engineers. QA's job gets a lot more interesting when we use machines for what machines are good for: routine repetitive detailed work, like finding bug regressions. 
  3. Monitoring of real-world metrics. In order to make continuous deployment work, teams have to get good at automated monitoring and reacting to business and customer-centric metrics, not just technical metrics. That's a simple consequence of the automation principle above. There are huge classes of bugs that "work as designed" but cause catastrophic changes in customer behavior. My favorite: changing the checkout button in an e-commerce flow to appear white on a white background. No automated test is going to catch that, but it still will drive revenue to zero. Continuous deployment teams will get burned by that class of bug only once.
  4. Better handling of intermittent bugs. Most QA teams are organized around finding reproduction paths for bugs that affect customers. This made sense in eras where successful products tended to be used by a small number of customers. These days, even niche products - or even big enterprise products - tend to have a lot of man-hours logged by end-users. And that, in turn, means that rare bugs are actually quite exasperating. For example, consider a bug that happens only one-time-in-a-million uses. Traditional QA teams are never going to find a reproduction path for that bug. It will never show up in the lab. But for a product with millions of customers, it's happening (and being reported to customer service) multiple times a day! Continuous deployment teams are much better able to find and fix these bugs.
  5. Smaller batches. Continuous deployment tends to drive the batch size of work down to an optimal level, whereas traditional deployment systems tend to drive it up. For more details on this phenomenon, see Work in small batches and the section on the "batch size death spiral" in Product Development Flow.
For those of you who are new to continuous deployment, these benefits may not sound realistic. In order to make sense of them, you have to understand the mechanics of continuous deployment. To get started, I recommend these three posts: Continuous deployment in 5 easy steps, Timothy Fitz's excellent Continuous Deployment at IMVU: Doing the impossible fifty times a day, and Why Continuous Deployment?

Let me close with a question. Imagine with me for a moment that continuous deployment doesn't prevent us from doing staged releases for customers, and it actually leads to higher quality software. What's preventing you from using it for your mission-critical application today? I hope you'll share your thoughts in a comment.


Why Continuous Deployment?

Of all the tactics I have advocated as part of the lean startup, none has provoked as many extreme reactions as continuous deployment, a process that allows companies to release software in minutes instead of days, weeks, or months. My previous startup, IMVU, has used this process to deploy new code as often as an average of fifty times a day. This has stirred up some controversy, with some claiming that this rapid release process contributes to low-quality software or prevents the company from innovating. If we accept the verdict of customers instead of pundits, I think these claims are easy to dismiss. Far more common, and far more difficult, is the range of questions from people who simply wonder if it's possible to apply continuous deployment to their business, industry, or team.

The particulars of IMVU’s history give rise to a lot of these concerns. As a consumer internet company with millions of customers, it may seem to have little relevance for an enterprise software company with only a handful of potential customers, or a computer security company whose customers demand a rigorous audit before accepting a new release. I think these objections really miss the point of continuous deployment, because they focus on the specific implementations instead of general principles. So, while most of the writing on continuous deployment so far focuses on the how of it, I want to focus today on the why. (If you're looking for resources on getting started, see "Continuous deployment in 5 easy steps")

The goal of continuous deployment is to help development teams drive waste out of their process by simultaneously reducing the batch size and increasing the tempo of their work. This makes it possible for teams to get – and stay – in a condition of flow for sustained periods. This condition makes it much easier for teams to innovate, experiment, and achieve sustained productivity. And it nicely complements other continuous improvement systems, such as Five Whys.

One large source of waste in development is “double-checking.” For example, imagine a team operating in a traditional waterfall development system, without continuous deployment, test-driven development, or continuous integration. When a developer wants to check in code, this is a very scary moment. He or she has a choice: check in now, or double-check to make sure everything still works and looks good. Both options have some attraction. If they check in now, they can claim the rewards of being done sooner. On the other hand, if they cause a problem, their previous speed will be counted against them. Why didn't they spend just another five minutes making sure they didn't cause that problem? In practice, how developers respond to this dilemma is determined by their incentives, which are driven by the culture of their team. How severely is failure punished? Who will ultimately bear the cost of their mistakes? How important are schedules? Does the team value finishing early?

But the thing to notice in this situation is that there is really no right answer. People who agonize over the choice reap the worst of both worlds. As a result, developers will tend towards two extremes: those who believe in getting things done as fast as possible, and those who believe that work should be carefully checked. Any intermediate position is untenable over the long-term. When things go wrong, any nuanced explanation of the trade-offs involved is going to sound unsatisfying. After all, you could have acted a little sooner or a little more carefully – if only you’d known what the problem was going to be in advance. Viewed through the lens of hindsight, most of those judgments look bad. On the other hand, an extreme position is much easier to defend. Both have built-in excuses: “sure there were a few bugs, but I consistently over-deliver on an intense schedule, and it’s well worth it” or “I know you wanted this done sooner, but you know I only ever deliver when it’s absolutely ready, and it’s well worth it.”

These two extreme positions lead to factional strife in development teams, which is extremely unpleasant. Managers start to make a note of who’s on which faction, and then assign projects accordingly. Got a crazy last-minute feature, get the Cowboys to take care of it – and then let the Quality Defenders clean it up in the next release. Both sides start to think of their point of view in moralistic terms: “those guys don’t see the economic value of fast action, they only care about their precious architecture diagrams” or “those guys are sloppy and have no professional pride.” Having been called upon to mediate these disagreements many times in my career, I can attest to just how wasteful they are.

However, they are completely logical outgrowths of a large-batch-size development process that forces developers to make trade-offs between time and quality, using the old “time, quality, money – pick two” fallacy. Because feedback is slow in coming, the damage caused by a mistake is felt long after the decisions that caused the mistake were made, making learning difficult. Because everyone gets ready to integrate with the release batch around the same time (there being no incentive to integrate early), conflicts are resolved under extreme time pressure. Features are chronically on the bubble, about to get deferred to the next release. But when they do get deferred, they tend to have their scope increased (“after all, we have a whole release cycle, and it’s almost done…”), which leads to yet another time crunch, and so on. And, of course, the code rarely performs in production the way it does in the testing or staging environment, which leads to a series of hot-fixes immediately following each release. These come at the expense of the next release batch, meaning that each release cycle starts off behind.

Many times when I interview a development team caught in the pincers of this situation, they want my help "fixing people." Thanks to a phenomenon called the Fundamental Attribution Error in psychology, humans tend to become convinced that other people’s behavior is due to their fundamental attributes, like their character, ethics, or morality – even while we excuse our own actions as being influenced by circumstances. So developers stuck in this world tend to think the other developers on their team are either, deep in their souls, plodding pedants or sloppy coders. Neither is true – they just have their incentives all messed up.

You can’t change the underlying incentives of this situation by getting better at any one activity. Better release planning, estimating, architecting, or integrating will only mitigate the symptoms. The only traditional technique for solving this problem is to add in massive queues in the forms of schedule padding, extra time for integration, code freezes and the like. In fact, most organizations don’t realize just how much of this padding is already going on in the estimates that individual developers learn to generate. But padding doesn’t help, because it serves to slow down the whole process. And as all development teams will tell you – time is always short. In fact, excess time pressure is exactly why they think they have these problems in the first place.

So we need to find solutions that operate at the systems level to break teams out of this pincer action. The agile software movement has made numerous contributions: continuous integration, which helps accelerate feedback about defects; story cards and kanban that reduce batch size; a daily stand-up that increases tempo. Continuous deployment is another such technique, one with a unique power to change development team dynamics for the better.

Why does it work?

First, continuous deployment separates out two different definitions of the term “release.” One is used by engineers to refer to the process of getting code fully integrated into production. Another is used by marketing to refer to what customers see. In traditional batch-and-queue development, these two concepts are linked. All customers will see the new software as soon as it’s deployed. This requires that all of the testing of the release happen before it is deployed to production, in special staging or testing environments. And this leaves the release vulnerable to unanticipated problems during this window of time: after the code is written but before it's running in production. On top of that overhead, by conflating the marketing release with the technical release, the amount of coordination overhead required to ship something is also dramatically increased.

Under continuous deployment, as soon as code is written, it’s on its way to production. That means we are often deploying just 1% of a feature – long before customers would want to see it. In fact, most of the work involved with a new feature is not the user-visible parts of the feature itself. Instead, it’s the millions of tiny touch points that integrate the feature with all the other features that were built before. Think of the dozens of little API changes that are required when we want to pass new values through the system. These changes are generally supposed to be “side effect free” meaning they don’t affect the behavior of the system at the point of insertion – emphasis on supposed. In fact, many bugs are caused by unusual or unnoticed side effects of these deep changes. The same is true of small changes that only conflict with configuration parameters in the production environment. It’s much better to get this feedback as soon as possible, which continuous deployment offers.

Continuous deployment also acts as a speed regulator. Every time the deployment process encounters a problem, a human being needs to get involved to diagnose it. During this time, it’s intentionally impossible for anyone else to deploy. When teams are ready to deploy, but the process is locked, they become immediately available to help diagnose and fix the deployment problem (the alternative, that they continue to generate, but not deploy, new code just serves to increase batch sizes to everyone’s detriment). This speed regulation is a tricky adjustment for teams that are accustomed to measuring their progress via individual efficiency. In such a system, the primary goal of each engineer is to stay busy, using as close to 100% of his or her time for coding as possible. Unfortunately, this view ignores the overall throughput of the team. Even if you don’t adopt a radical definition of progress, like the “validated learning about customers” that I advocate, it’s still sub-optimal to keep everyone busy. When you’re in the midst of integration problems, any code that someone is writing is likely to have to be revised as a result of conflicts. Same with configuration mismatches or multiple teams stepping on each others’ toes. In such circumstances, it’s much better for overall productivity for people to stop coding and start talking. Once they figure out how to coordinate their actions so that the work they are doing doesn’t have to be reworked, it’s productive to start coding again.
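
A deliberately tiny sketch of that speed-regulator idea follows. A real system would coordinate the lock through shared infrastructure rather than a local file, and every name here is illustrative rather than any team's actual tooling.

```python
# Sketch of a deploy lock: when a deploy hits a problem, further deploys are
# halted until a human diagnoses and clears it. Paths and helpers are stubs.
import os

LOCK_FILE = "/tmp/deploys.lock"   # placeholder; real systems use a shared store

def deploys_locked():
    return os.path.exists(LOCK_FILE)

def lock_deploys(reason):
    with open(LOCK_FILE, "w") as f:
        f.write(reason)

def unlock_deploys():
    if os.path.exists(LOCK_FILE):
        os.remove(LOCK_FILE)

def push_to_production(changeset):
    return True                    # stub for illustration

def deploy(changeset):
    if deploys_locked():
        raise RuntimeError("Deploys are locked - help diagnose the current problem instead.")
    if not push_to_production(changeset):
        lock_deploys(f"deploy of {changeset} failed; investigating")
```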

Returning to our development team divided into Cowboy and Quality factions, let’s take a look at how continuous deployment can change the calculus of their situation. For one, continuous deployment fosters learning and professional development – on both sides of the divide. Instead of having to argue with each other about the right way to code, each individual has an opportunity to learn directly from the production environment. This is the meaning of the axiom to “let your defects be your teacher.”

If an engineer has a tendency to ship too soon, they will tend to find themselves grappling with the cluster immune system, continuous integration server, and five whys master more often. These encounters, far from being the high-stakes arguments inherent in traditional teams, are actually low-risk, mostly private or small-group affairs. Because the feedback is rapid, Cowboys will start to learn what kinds of testing, preparation and checking really do let them work faster. They’ll be learning the key truth that there is such a thing as “too fast” – many quality problems actually slow you down.

But engineers who have the tendency to wait too long before shipping have lessons to learn, too. For one, the larger the batch size of their work, the harder it will be to get it integrated. At IMVU, we would occasionally hire someone from a more traditional organization who had a hard time letting go of their “best practices” and habits. Sometimes they’d advocate for doing their work on a separate branch, and only integrating at the end. Although I’d always do my best to convince them otherwise, if they were insistent I would encourage them to give it a try. Inevitably, a week or two later, I’d enjoy the spectacle of watching them engage in something I called “code bouncing.” It's like throwing a rubber ball against the wall. In a code bounce, someone tries to check in a huge batch. First they have integration conflicts, which require talking to various people on the team to know how to resolve them properly. Of course, while they are resolving, new changes are being checked in. So new conflicts appear. This cycle repeats for a while, until the team either catches up to all the conflicts or just asks the rest of the team for a general check-in freeze. Then the fun part begins. Getting a large batch through the continuous integration server, incremental deploy system, and real-time monitoring system almost never works on the first try. Thus the large batch gets reverted. While the problems are being fixed, more changes are being checked in. Unless we freeze the work of the whole team, this can go on for days. But if we do engage in a general check-in freeze, then we’re driving up the batch size of everyone else – which will lead to future episodes of code bouncing. In my experience, just one or two episodes are enough to cure anyone of their desire to work in large batches.

Because continuous deployment encourages learning, teams that practice it are able to get faster over time. That’s because each individual’s incentives are aligned with the goals of the whole team. Each person works to drive down waste in their own work, and this true efficiency gain more than offsets the incremental overhead of having to build and maintain the infrastructure required to do continuous deployment. In fact, if you practice Five Whys too, you can build all of this infrastructure in a completely incremental fashion. It’s really a lot of fun.

One last benefit: morale. At a recent talk, an audience member asked me about the impact of continuous deployment on morale. This manager was worried that moving their engineers to a more-rapid release cycle would stress them out, making them feel like they were always fire fighting and releasing, and never had time for “real work.” As luck would have it, one of IMVU’s engineers happened to be in the audience at the time. They provided a better answer than I ever could. They explained that by reducing the overhead of doing a release, each engineer gets to work to their own release schedule. That means, as soon as they are ready to deploy, they can. So even if it’s midnight, if your feature is ready to go, you can check-in, deploy, and start talking to customers about it right away. No extra approvals, meetings, or coordination required. Just you, your code, and your customers. It’s pretty satisfying.


Fear is the mind-killer

Fear is an emotion that slows teams down. It makes us more cautious. It makes us over-invest in prevention. It makes us less willing to trust, to communicate openly, and - most painfully - to take risks. It is the dominant reason I see teams fall back on "best practices" which may not be effective, but are at least reassuring. Unfortunately, these actions generally work to increase the batch size of our work, which magnifies the consequences of failure and therefore leads to more fear. Reducing fear is a heuristic we can use to judge process improvements. Anything that reduces fear is likely to speed up the fundamental feedback loop.

The interesting thing about fear is that to reduce it requires two contradictory impulses. First, we can reduce fear by mitigating the consequences of failure. If we construct areas where experimentation is less costly, we can feel safer and therefore try new things. On the other hand, the second main way to reduce fear is to engage in the feared activity more often. By pushing the envelope, we can challenge our assumptions about consequences and get better at what we fear at the same time. Thus, it is sometimes a good idea to reduce fear by slowing down, and sometimes a good idea to reduce fear by speeding up.

To illustrate this point, I want to excerpt a large part of a recent blog post by Owen Rogers, who organized my recent trip to Vancouver. I spent some time with his company before the conference and discussed ways to get started with continuous deployment, including my experience introducing it at IMVU. He summarized that conversation well, so rather than re-tread that material, I'll quote it here:

One thing that I was surprised to learn was that IMVU started out with continuous deployment. They were deploying to production with every commit before they had an automated build server or extensive automated test coverage in place. Intuitively this seemed completely backwards to me - surely it would be better to start with CI, build up the test coverage until it reached an acceptable level and then work on deploying continuously. In retrospect and with a better understanding of their context, their approach makes perfect sense. Moreover, approaching the problem from the direction that I had intuitively is a recipe for never reaching a point where continuous deployment is feasible.

Initially, IMVU sought to quickly build a product that would prove out the soundness of their ideas and test the validity of their business model. Their initial users were super early adopters who were willing to trade quality for access to new features. Getting features and fixes into hands of users was the greatest priority - a test environment would just get in the way and slow down the validation coming from having code running in production. As the product matured, they were able to ratchet up the quality to prevent regression on features that had been truly embraced by their customers.

Second, leveraging a dynamic scripting language (like PHP) for building web applications made it easy to quickly set up a simple, non-disruptive deployment process. There’s no compilation or packaging steps which would generally be performed by an automated build server - just copy and change the symlink.

Third, they evolved ways to selectively expose functionality to sets of users. As Eric said, “at IMVU, ‘release’ is a marketing term”. New functionality could be living in production for days or weeks before being released to the majority of users. They could test, get feedback and refine a new feature with a subset of users until it was ready for wider consumption. Users were not just an extension of the testing team - they were an extension of the product design team.

Understanding these three factors makes it clear as to why continuous deployment was a starting point for IMVU. In contrast, at most organizations - especially those with mature products - high quality is the starting point. It is assumed that users will not tolerate any decrease in quality. Users should only see new functionality once it is ready, fully implemented and thoroughly tested, lest they get a bad impression of the product that could adversely affect the company’s brand. They would rather build the wrong product well than risk this kind of exposure. In this context, the automated test coverage would need to be so good as to render continuous deployment infeasible for most systems. Starting instead from a position where feedback cycle time is the priority and allowing quality to ratchet up as the product matures provides a more natural lead in to continuous deployment.

The rest of the post, which you can read here, discusses the application of these principles to other contexts. I recommend you take a look.

Returning to the topic at hand, I think this example illustrates the tension required to reduce fear. In order to do continuous deployment at IMVU, we had to handle fear two ways:

  1. Reduce consequences - by emphasizing the small number of customers we had, we were able to convince ourselves that exposing them to a half-baked product was not very risky. Although it was painful, we focused our attention on the even bigger risks we were mitigating: the risk that nobody would use our product, the risk that customers wouldn't pay for virtual goods, and the risk that we'd spend years of our lives building something that didn't matter - again.

  2. Fear early, fear often - by actually doing continuous deployment before we were really "ready" for it, we got used to the real benefits and consequences of acting at that pace. On the negative side, we got a visceral feel for the kinds of changes that could really harm customers, like commits that take the whole site down. But on the plus side, we got to see just how powerful it is to be able to ship changes to the product at any hour of the day, to get rapid feedback on new ideas, and to not have to wait for the next "release train" to put your ideas in action. On the whole, it made it easier for us to decide to invest in preventive maintenance (i.e. the Cluster Immune System) rather than just slow down and accept a larger batch size.
Making this fear-reduction strategy work required more than just the core team getting used to continuous deployment. We eventually discovered (via five whys) that we also had to get each new employee acculturated to a fearless way of thinking. For people we hired from larger companies especially, this was challenging. To get them over that hurdle, we once again turned to the "reduce consequences" and "face your fears" duality.

When a new engineer started at IMVU, I had a simple rule: they had to ship code to production on their first day. It wasn't an absolute rule; if it had to be the second day, that was OK. But if it slipped to the third day, I started to worry. Generally, we'd let them pick their own bug to fix, or, if necessary, assign them something small. As we got better at this, we realized the smaller the better. Either way, it had to be a real bug and it had to be fixed live, in production. For some, this was an absolutely terrifying experience. "What if I take the site down?!" was a common refrain. I tried to make sure we always gave the same answer: "if you manage to take the site down, that's our fault for making it too easy. Either way, we'll learn something interesting."

Because this was such a big cultural change for most new employees, we didn't leave them to sink or swim on their own. We always assigned them a "code mentor" from the ranks of the more established engineers. The idea was that these two people would operate as a unit, with the mentor's job performance during this period evaluated by the performance of the new person. As we continued to find bugs in production caused by new engineers who weren't properly trained, we'd do root cause analysis, and keep making proportional investments in improving the process. As a result, we had a pretty decent curriculum for each mentor to follow to ensure the new employee got up to speed on the most important topics quickly.

These two practices worked together well. For one, it required us to keep our developer sandbox setup procedure simple and automated. Anyone who had served as a code mentor would instinctively be bothered if someone else made a change to the sandbox environment that required special manual setup. Such changes inevitably waste a lot of time, since we generally build a lot more developer sandboxes than we realize. Most importantly, we immediately thrust our new employees into a mindset of reduced fear. We had them imagine the most risky thing they could possibly do - pushing code to production too soon - and then do it.

Here's the key point. I won't pretend that this worked smoothly every time. Some engineers, especially in the early days, did indeed take the site down on their first day. And that was not a lot of fun. But it still turned out OK. We didn't have that many customers, after all. And continuous deployment meant we could react fast and fix the problem quickly. Most importantly, new employees realized that they weren't going to be fired for making a mistake. We'd immediately involve them in the postmortem analysis, and in a lot of cases it was the newcomer themselves (with the help of their mentor) who would build the prophylactic systems required to prevent the next new person from tripping over that same issue.

Fear slows teams of all sizes down. Even if you have a large team, could you create a sandboxed environment where anyone can make changes that affect a small number of customers? Even as we grew the team at IMVU, we always maintained a rule that anyone could run a split-test without excess approvals as long as the total number of customers affected was below a critical threshold. Could you create a separate release process for small or low-risk commits, so that work that happens in small batches is released faster? My prediction in such a situation is that, over time, an increasing proportion of your commits will become eligible for the fast-track procedure.

Whatever fear-reducing tactics you try, share your results in the comments. Or, if fear's got you paralyzed, share that too. We'll do our best to help.

(with apologies to Frank Herbert)



Continuous deployment with downloads

One of my goals in writing posts about topics like continuous deployment is the hope that people will take those ideas and apply them to new situations - and then share what they learn with the rest of us. So I was excited to read a recent post about applying the concept of continuous deployment to that thickest-of-all-clients, the MMOG. Joe Ludwig goes through his current release process and determines that the current time to make and deploy a release is about seven and a half hours, which is why his product is released about once a month. While that's actually quite speedy for an MMOG, Joe goes through the thought experiment of what it would take to do it much faster:
Programmer Joe – Continuous Deployment with Thick Clients
If it takes seven and a half hours to deploy a new build you obviously aren’t going to get more than one of them out in an 8 hour work day. Let’s forget for a moment that IMVU is able to do this in 15 minutes and pretend that our target is an hour. For now let’s assume that we will spend 10 minutes on building, 10 minutes on automated testing, 30 minutes on manual testing ...

In fact, if those 30 minutes of manual testing are your bottleneck and you can keep the pipeline full, you can push a fresh build every 30 minutes or 16 times a day. Forget entirely about pushing to live for a moment and consider what kind of impact that would have on your test server. Your team could focus on fixing issues that players on the test server find while those players are still online. Assuming a small change that you can make in half an hour, it would take only an hour from the start of the work on that fix to when it is visible to players. That pace is fast enough that it would be possible to run experiments with tuning values, prices of items, or even algorithms. Of course for any of this to work the entire organization needs to be arranged around responding to player feedback multiple times per day. The real advantage of a rapid deployment system is to make your change -> test -> respond loop faster.
This is a great example of lean startup thinking. Joe is getting clear about what steps in the current process actually deliver value to the company, then imagining a world in which those steps were emphasized and others minimized. Of course, as soon as you do that, you start to reap other benefits, too.

I'd like to add one extra thought to Joe's thought experiment. Let's start with a distinction between shipping new software to the customer, and changing the customer's experience. The idea is that often you can change the customer's experience without shipping them new software at all. This is one of the most powerful aspects of web architecture, and it often gets lost in other client-server programming paradigms.

From one point of view, web browsers are a horribly inefficient platform. We often send down complete instructions for rendering a whole page (or even a series of pages) in response to every single click. Worse, those instructions often kick off additional requests back and forth. It would be a lot more efficient to send down a compressed packet with the entire site's data and presentation in an optimized format. Then we could render the whole site with much less latency, bandwidth usage, and server cost.

Of course, the web doesn't work this way for good reasons. Its design goals aren't geared towards efficiency in terms of technical costs. Instead, it's focused on flexibility, readability, and ease of interoperability. For example, it's quite common that we don't know the exact set of assets a given customer is going to want to use. By deferring their selection until later in the process, we can give up a lot of bookkeeping (again trading off for considerable costs). As a nice side-effect, it's also an ideal platform for rapid changes, because you can "update the software" in real time without the end-user even needing to be aware of the changes.

Some conclude that this phenomenon is made possible because the web browser is a fully general-purpose rendering platform, and assume that it'd be impossible to do this in their app without creating that same level of generality. But I think it's more productive to think of this as a spectrum. You can always move logic changes a little further "upstream" closer to the source of the code that is flowing to customers. Incidentally, this is especially important for iPhone developers, who are barred by Apple Decree from embedding a programming language or interpreter in their app (but who are allowed to request structured data from the server).

For example, at IMVU we would often run split-test experiments that affected the behavior of our downloadable client. Although we had the ability to do new releases of the client on a daily basis (more on this in a moment), this was actually too slow for most of the experiments we wanted to run. Plus, having the customer be aware that a new feature is part of a new release actually affects the validity of the experiment. So we would often ship a client that had multiple versions of a feature baked into it, and have the client call home to find out which version to show to any given customer. This added to the code complexity, latency, and server cost of a given release, but it was more than paid back by our ability to tune and tweak the experimental branches in near-real-time.
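
A stripped-down version of that call-home pattern might look like the sketch below. The endpoint, response format, and experiment name are invented for illustration; the essential property is the safe fallback to the existing behavior when the server can't be reached.

```python
# Sketch of a client "calling home" to learn which variant of a feature to
# show. The URL and JSON shape are hypothetical placeholders.
import json
import urllib.request

def fetch_variant(experiment, user_id,
                  base_url="https://example.com/experiments"):
    url = f"{base_url}?experiment={experiment}&user={user_id}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("variant", "control")
    except Exception:
        return "control"   # if the call fails, fall back to existing behavior

variant = fetch_variant("new_chat_ui", user_id="12345")
if variant == "control":
    pass  # existing behavior baked into this client build
else:
    pass  # one of the experimental versions also shipped in this build
```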

Further upstream on the spectrum are features that can be parameterized. Common examples are random events which have some numeric weighting associated with them (and which can be tuned) or user interface elements that are composed of text or graphics. We tacked on a feature to the IMVU client that worked like this: whenever the client called home to report a data-warehousing event, we used to have a useless return field (the client doesn't care if the event was successfully recorded). We repurposed that field to optionally include some XML describing an on-screen dialog box. That meant we could notify some percentage of customers of something at any time, which was great for split-testing. A new feature would often be first "implemented" by a dialog box shown to a few percent of the userbase. We'd pay attention to how many clicked the embedded link to get an early read on how much they cared about the feature at all. Often, we'd do this before the feature existed at all, apologizing all the way.

There are plenty more techniques even further upstream. Eventually, you wind up with specialized state machines, interpreters, or a full-fledged embedded platform. We went so far as to embed the Flash interpreter into our process, so we could experiment with our UI more quickly.

In fact, we considered releases themselves to be a special case of this more general system. We had a structured, automated release process. After all, the release itself was just a static file checked into our website source control. Every new candidate release was automatically shown to a small number of volunteers, who would be prompted to upgrade. The system would monitor their data and, if it looked within norms, gradually offer the release to a small number of new users (who had no prior expectation of how the product should work). It would carefully monitor their behavior, and especially their technical metrics, like crashes and freezes. If their data looked OK, we'd have the option to ramp up the number of customers bit by bit until finally all new users were given the new release and all existing users were prompted to upgrade. Although we'd generally do a prerelease every day, we wouldn't pull the trigger on a full release that often, because our upgrade path for existing users wasn't (yet) without cost. It also gave our manual QA team a chance to inspect the client before it was widely deployed, given the lower level of automated test coverage we had on that part of the product (it's much harder to test).
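
The gating logic was simple, even if the surrounding machinery wasn't. A deliberately simplified sketch (metric names and thresholds are invented; a real system watches many more signals and pages a human instead of silently halting):

```python
# Hypothetical sketch of a gradual rollout gate for a client release.
RAMP_STEPS = [0.01, 0.05, 0.25, 1.0]   # fraction of new users offered the build

def next_ramp_step(current_fraction, candidate_metrics, baseline_metrics):
    """Advance the rollout only if the candidate cohort looks healthy."""
    healthy = (
        candidate_metrics["crash_rate"] <= baseline_metrics["crash_rate"] * 1.1
        and candidate_metrics["freeze_rate"] <= baseline_metrics["freeze_rate"] * 1.1
    )
    if not healthy:
        return 0.0                      # halt: keep serving the previous release
    for step in RAMP_STEPS:
        if step > current_fraction:
            return step                 # ramp up to the next exposure level
    return current_fraction             # already fully rolled out
```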

In effect, every time we check in code to our client code base, we are kicking off another split-test experiment that asks: "is the business better off with this change in it than without it?" Because of the mechanics of our download process, this is answered a little slower than on the web. But that doesn't make it any less important to answer.

To return to the case of the thick-client MMOG, there are some additional design constraints. It's more risky to have different players using different versions of the software, because that might introduce gameplay fairness issues. I think Joe's idea of deploying to the test server is a great way around this, especially if there is a regular crew of players subjecting the test server to near-normal behavior. But I also think this could work in a lot of production scenarios. Take the case of determining the optimal spawn rate for a certain monster or treasure. Since this is a parameterized scenario, it should be possible to do time-based experiments. Change the value periodically, and have a process for measuring the subsequent behavior of customers who were subjected to each unique value. If you want to be really fancy, you can segregate the data for customers who saw multiple variations. I bet you'd be able to write a simple linear optimizer to answer a lot of design questions that way.
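
Concretely, the time-based approach could be as crude as rotating the value on a fixed schedule and tagging every session with the value it saw. A minimal sketch (the rates and rotation period are invented):

```python
# Hypothetical sketch of a time-based parameter experiment for an MMOG.
import time

SPAWN_RATES = [0.5, 1.0, 2.0]       # candidate values for the monster spawn rate
ROTATION_SECONDS = 6 * 60 * 60      # try a different value every six hours

def current_spawn_rate(now=None):
    now = time.time() if now is None else now
    index = int(now // ROTATION_SECONDS) % len(SPAWN_RATES)
    return SPAWN_RATES[index]

def start_session(user_id, exposure_log):
    rate = current_spawn_rate()
    # Tag the session with the value it was exposed to, so an analysis job can
    # later segment playtime, retention, etc. by spawn rate.
    exposure_log.append({"user": user_id, "spawn_rate": rate, "ts": time.time()})
    return rate
```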

My experience is that once you have a tool like this, you start to use it more and more. Even better, you start to migrate your designs to be able to take ever-increasing advantage of it. At IMVU, we found ourselves constantly migrating functionality upstream, in order to get faster iteration. It was a nearly unconscious process; we just felt that much more productive with rapid feedback.

So thanks for sharing, Joe! Good luck with your thought experiment, and let us know if you ever decide to make those changes a reality.


Continuous deployment and continuous learning

0 comments
At long last, some of the actual implementers of the advanced systems we built at IMVU for rapid deployment and rapid response are starting to write about it. I find these on-the-ground descriptions of the system and how it works so much more credible than just theory-type posts that I am excited to share them with you. I can personally attest that these guys know what they are talking about; I saw them do it first-hand. I will always be full of awe and gratitude for what they accomplished.
Continuous Deployment at IMVU: Doing the impossible fifty times a day by Timothy Fitz
Continuous Deployment isn’t just an abstract theory. At IMVU it’s a core part of our culture to ship. It’s also not a new technique here, we’ve been practicing continuous deployment for years; far longer than I’ve been a member of this startup.

It’s important to note that the system I’m about to explain evolved organically in response to new demands on the system and in response to post-mortems of failures. Nobody gets here overnight, but every step along the way has made us better developers.

The high level of our process is dead simple: Continuously integrate (commit early and often). On commit automatically run all tests. If the tests pass deploy to the cluster. If the deploy succeeds, repeat.

Our tests suite takes nine minutes to run (distributed across 30-40 machines). Our code pushes take another six minutes. Since these two steps are pipelined that means at peak we’re pushing a new revision of the code to the website every nine minutes. That’s 6 deploys an hour. Even at that pace we’re often batching multiple commits into a single test/push cycle. On average we deploy new code fifty times a day.
We call this process continuous deployment because it seemed to us like a natural extension of the continuous integration we were already doing. Our eventual conclusion was that there was no reason to have code that had passed the integration step but was not yet deployed. Every batch of software for which that is true is an opportunity for defects to creep in: maybe someone is changing the production environment in ways that are incompatible with code-in-progress; maybe someone in customer support is writing up a bug report about something that's just being fixed (or worse, the symptom is now changing); and no matter what else is happening, any problems that arise due to the code-in-progress require that the person who wrote it still remember how it works. The longer you wait to find out about the problem, the more likely it is to have fallen out of the human-memory cache.
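
For readers who want to see the shape of that loop in code, here's a minimal sketch (the script names are invented; IMVU's real pipeline ran on BuildBot and was far more sophisticated):

```python
# Hypothetical sketch of the commit -> test -> deploy loop. A real system is
# event-driven (triggered on each commit) rather than a polling loop like this.
import subprocess
import time

def run(cmd):
    return subprocess.run(cmd, shell=True).returncode == 0

def pipeline_loop():
    last_attempted = None
    while True:
        run("git fetch origin")
        head = subprocess.check_output(
            "git rev-parse origin/master", shell=True, text=True).strip()
        if head != last_attempted:
            if run("./run_all_tests.sh"):      # the full automated test battery
                run("./deploy_to_cluster.sh")  # push the revision to production
            # If the tests fail, the revision never ships; the author fixes it
            # and the next commit goes through the same gate.
            last_attempted = head
        time.sleep(30)
```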

Now, continuous deployment is not the only possible way to solve these kinds of problems. In another post I really enjoyed, Timothy explains five non-solutions that seem like they will help, but really won't.
1. More manual testing.
This obviously doesn’t scale with complexity. This also literally can’t catch every problem, because your test sandboxes or test clusters will never be exactly like the production system.
2. More up-front planning
Up-front planning is like spices in a cooking recipe. I can’t tell you how much is too little and I can’t tell you how much is too much. But I will tell you not to have too little or too much, because those definitely ruin the food or product. The natural tendency of over planning is to concentrate on non-real issues. Now you’ll be making more stupid mistakes, but they’ll be for requirements that won’t ever matter.
3. More automated testing.
Automated testing is great. More automated testing is even better. No amount of automated testing ensures that a feature given to real humans will survive, because no automated tests are as brutal, random, malicious, ignorant or aggressive as the sum of all your users will be.
4. Code reviews and pairing
Great practices. They’ll increase code quality, prevent defects and educate your developers. While they can go a long way to mitigating defects, ultimately they’re limited by the fact that while two humans are better than one, they’re still both human. These techniques only catch the failures your organization as a whole already was capable of discovering.
5. Ship more infrequently
While this may decrease downtime (things break and you roll back), the cost on development time from work and rework will be large, and mistakes will continue to slip through. The natural tendency will be to ship even more infrequently, until you aren’t shipping at all. Then you’ve gone and forced yourself into a total rewrite. Which will also be doomed.
What all of these non-solutions have in common is that they treat only one aspect of the problem, but at the expense of another aspect. This is a common form of sub-optimization, where you gain efficiency in one of the sub-parts at the expense of the efficiency of the overall process. You can't make these global efficiency improvements until you get clear about the goal of your development process.

That leads to a seemingly-obvious question: what is progress in software development? It seems like it should be the amount of correctly-working code we've written. Heck, that's what it says right there in the agile manifesto. But, unfortunately, startups can't afford to adopt that standard. As I've argued elsewhere, my belief is that startups (and anyone else trying to find an unknown solution to an unknown problem) have to measure progress with validated learning about customers. In a lot of cases, that's just a fancy name for revenue or profit, but not always. Either way, we have to recognize that the biggest form of waste is building something that nobody wants, and continuous deployment is an optimization that tries to shorten this code-data-learning feedback loop.

Assuming you're with me so far, what will that mean in practice? Throwing out a lot of code. That's because as you get better at continuous deployment, you learn more and more about what works and what doesn't. If you're serious about learning, you'll continuously learn to prune the dead weight that doesn't work. That's not entirely without risk, which is a lesson we learned all-too-well at IMVU. Luckily, Chad Austin has recently weighed in with an excellent piece called 10 Pitfalls of Dirty Code.

IMVU was started with a particular philosophy: We don't know what customers will like, so let's rapidly build a lot of different stuff and throw away what doesn't work. This was an effective approach to discovering a business by using a sequence of product prototypes to get early customer feedback. The first version of the 3D IMVU client took about six months to build, and as the founders iterated towards a compelling user experience, the user base grew monthly thereafter.

This development philosophy created a culture around rapid prototyping of features, followed by testing them against large numbers of actual customers. If a feature worked, we'd keep it. If it didn't, we'd trash it.

It would be hard to argue against this product development strategy, in general. However, hindsight indicates we forgot to do something important when developing IMVU: When the product changed, we did not update the code to reflect the new product, leaving us with piles of dirty code.

So that you can learn from our mistakes, Chad has helpfully listed ten reasons why you want to manage this dirty-code (sometimes called "technical debt") problem proactively. If we could do it over again, I would have started a full continuous integration, deployment, and refactoring process from day one, complete with five why's for root cause analysis. But, to me anyway, one of the most inspiring parts of the IMVU story is that we didn't start with all these processes. We hadn't even heard of half of them. Slowly, painfully, incrementally, we were able to build them up over time (and without ever having a full-stop-let's-start-over timeout). If you read these pieces by the guys who were there, you'll get a visceral sense for just how painful it was.

But it worked. We made it. So can you.


A new version of the Joel Test (draft)

0 comments
(This article is a draft - your comments are especially welcome as I think through these issues. Please leave feedback!)

I am convinced one of Joel Spolsky's lasting contributions to the field of managing software teams will turn out to be the Joel Test, a checklist of 12 essential practices that you could use to rate the effectiveness of a software product development team. He wrote it in 2000, and as far as I know has never updated it.

I have been thinking a lot about what a new version of this test would look like, given what I've seen work and not work in startups. Like many forms of progress, most of the items on the new test don't replace items from Joel's - they either supplement or extend the ideas on which the original is based.

Let's start with the original list:

  1. Do you use source control? This is still an essential practice, especially on the web. There was a time when "web content" was considered "not code" and therefore not routinely source controlled, but I have not seen that dysfunction in any of the startups I advise, so hopefully it's behind us. Joel mentions that "CVS is fine" and so is Subversion, its successor. I know plenty of people who prefer more advanced source control systems, but my belief is that many agile practices diminish the importance of advanced features like branching.
  2. Can you make a build in one step? You'd better. But if you want to practice rapid deployment, you need to be able to deploy that build in one step as well. If you want to do continuous deployment, you'd better be able to certify that build too, which brings us to...
  3. Do you make daily builds? Daily builds are giving way to true continuous integration, in which every checkin to the source control system is automatically run against the full battery of automated tests. At IMVU, our engineering team accumulated thousands upon thousands of tests, and we had a build cluster (using BuildBot) that ran them. We did our best to keep the runtime of the tests short (10 minutes was our target), and we always treated a failing test as a serious event (generally, you couldn't check in at all if a test was failing). For more on continuous deployment, see Just-in-time Scalability.
  4. Do you have a bug database? Joel's Painless Bug Tracking is still the gold standard.
  5. Do you fix bugs before writing code? Increasingly, we are paying better lip service to this idea. It's incredibly hard to do. Plus, as product development teams in lean startups become adept at learning-and-discovery (as opposed to just executing to spec), it's clear that some bugs shouldn't be fixed. See the discussion of defects later in this post for my thoughts on how to handle those.
  6. Do you have an up-to-date schedule? This and #7 ("Do you have a spec?") are the parts of the Joel Test I think are most out-of-date. It's not that the idea behind them is wrong, but I think agile team-building practices make scheduling per se much less important. In many startup situations, ask yourself "Do I really need to accurately know when this project will be done?" When the answer is no, we can cancel all the effort that goes into building schedules and focus on making progress evident. Everyone will be able to see how much of the product is done vs undone, and see the finish line either coming closer or receding into the distance. When it's receding, we rescope. There are several ways to make progress evident - the Scrum team model is my current favorite.
  7. Do you have a spec? I think the new question needs to be "does the team have a clear objective?" If you have a true cross-functional team, empowered (a la Scrum) to do whatever it takes to succeed, it's likely they will converge on the result quickly. You can keep the team focused on customer-centric results, rather than conformance to spec. Now, all well-run teams have some form of spec that they use internally, and Joel's advice on how to craft that spec is still relevant. But increasingly we can move to a world where teams are chartered to accomplish results instead of tasked with executing to spec.
  8. Do programmers have quiet working conditions? Joel is focused on the fact that in many environments, programmers are considered "just the hired help" akin to manual labor, and not treated properly. We always have to avoid that dysfunction - even the lean manufacturing greats realized that they couldn't afford to see their manual-labor workforce that way. I think we need to modify this question to "Do programmers have access to appropriate working conditions?" We want every knowledge worker to be able to retreat into a quiet haven whenever they need deep concentration. But it's not true that energized programmers primarily do solitary work; certainly that's not true of the great agile teams I've known. Instead, teams should have their own space, under their control, with the tools they need to do the job.
  9. Do you use the best tools money can buy? Joel said it: "Top notch development teams don't torture their programmers." Amen.
  10. Do you have testers? I think reality has changed here. To see why, take a look at Joel's Top Five (Wrong) Reasons You Don't Have Testers. Notice that none of those five reasons deals with TDD or automated testing, which have changed the game. Automated testing dramatically reduces the cost of certifying changes, because it removes all of the grunt work QA traditionally does in software. Imagine a world where your QA team never, ever worries about bug regressions. They just don't happen. All of their time is dedicated to finding novel reproduction paths for tricky issues. That's possible now, and it means that the historical ratio of QA to engineering is going to have to change (on the other hand, QA is now a lot more interesting of a job).
  11. Do new candidates write code during their interview? Completely necessary. I would add, though, a further question: Do new employees write code on their first day? At IMVU, our rule was that a new engineer needed to push code to production on their first day. Occasionally, it'd have to be their second day. But if it languished until the third day, something was seriously wrong. This is a test of many key practices: do you have a mentoring system? Is your build environment difficult to set up? Are you afraid someone might be able to break your product without your automated defenses knowing about it?
  12. Do you do hallway usability testing? I love Joel's approach to usability, and I still recommend his free online book on UI design. Some people interpret this to mean that you have to do your usability "right" the first time. I strongly disagree. Usability design is a highly iterative process, and the more customers who are involved (via in-person interview, split-test experiment, etc) the better.

Now let's take a look at some new questions:

Do you work in small batches? Just like in lean manufacturing, it's generally more efficient to drive down the batch size. I try to encourage engineers to check in anytime they have the software in a working state in their sandbox. This dramatically reduces the waste of integration risk. We rarely have code conflicts, since nobody gets out of sync for very long. And it's way easier to deploy small bits of code, since if something goes wrong, the problem is automatically localized and easy to revert.

Do you routinely split-test new features? I hope to write at a future date about how to build your application so that A/B tests are just as easy as not doing them.
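
In the meantime, the core of the idea is small enough to sketch here: a single helper that deterministically buckets each user and records the exposure, so adding a split-test costs about one if-statement. (The names below are invented for illustration, not a real library.)

```python
# Hypothetical sketch of a split-test helper: deterministic bucketing plus an
# exposure log, so running an A/B test is barely more work than not running one.
import zlib

def ab(test_name, user_id, variants=("control", "treatment")):
    bucket = zlib.crc32(("%s:%s" % (test_name, user_id)).encode()) % len(variants)
    variant = variants[bucket]
    log_exposure(test_name, user_id, variant)  # without this, there is nothing to analyze
    return variant

def log_exposure(test_name, user_id, variant):
    pass  # stand-in for the event-reporting pipeline

# Feature code just branches on the result:
# if ab("new_invite_flow", user_id) == "treatment":
#     show_new_invite_flow()
# else:
#     show_old_invite_flow()
```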

Do you practice Five Why's? Joel himself has written about this topic, in the context of doing root cause analysis to provide excellent quality of service without SLAs. I'm not aware of anyone using this tool as extensively as we did at IMVU, where it became the key technique we used to drive infrastructure and quality improvements. Instead of deciding upfront what might go wrong, we used what actually went wrong to teach us which prevention tactics we needed. Our version of this was to insist that, for every level of the problem that the post-mortem analysis uncovered, we'd take at least one corrective action. So if an employee pushed code that broke the site, we'd ask: why didn't our cluster immune system catch that? Why didn't our automated tests catch it? Why couldn't the engineer see the problem in their sandbox? Why didn't they write better code? Why weren't they trained adequately? And then we'd make all five of those fixes.

Do you write tests before fixing bugs? If a bug is truly a defect, then it's something that we don't want to ever see again. Fixing the underlying problem in the code is nice, but we need to go further. We need to prevent that bug from ever recurring. Otherwise, the same blind spot that led us to create the bug in the first place is likely to allow it to happen again. This is the approach of test-driven development (TDD). Even if you've developed for years without automated tests, this one practice is part of a remarkable feedback loop. As you write tests for the bugs you actually find and fix, you'll tend to spend far more time testing and refactoring the parts of the code that are slowing you down the most. As the code improves, you'll spend less time testing. Pretty soon, you'll have forgotten that pesky impulse to do a ground-up rewrite.
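
As a tiny, invented example of the practice: suppose a customer reports that prices with thousands separators crash an importer. The first step is to capture the report in a test (which fails against the old code), and only then change the code:

```python
# Hypothetical example: the regression test is written from the bug report
# first, then the fix below makes it pass and keeps it passing forever.
import unittest

def parse_price(text):
    # Old buggy version was float(text.replace("$", "")), which choked on
    # "1,299.00"; stripping the comma is the fix.
    return float(text.replace("$", "").replace(",", ""))

class PriceRegressionTest(unittest.TestCase):
    def test_price_with_thousands_separator(self):
        self.assertEqual(parse_price("$1,299.00"), 1299.00)

if __name__ == "__main__":
    unittest.main()
```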

Can you tell defects from polish? Bugs that slow you down are defects, and have to be fixed right away. However, bugs that are really problems with the experience design of your product should only be fixed if they are getting in the way of learning about customers. This is an incredibly hard distinction to understand, because we're so used to a model of product development teams as pure "execution to spec" machines. In that model, anything that the product owner/designer doesn't like is a bug, and Joel's right that we should always fix before moving on (else you pile up an infinite mound of debt). However, in the learning phase of a product's life, we're still trying to figure out what matters. If we deploy a half-done feature, and customers complain about some UI issues (or split-tests demonstrate them), we should refine and fix. But oftentimes, nobody cares. There are no customers for that feature, UI issues or no. In that case, you're better off throwing the code away, rather than fixing the UI. The hardest part is forcing yourself to make this decision binary: either continue to invest and polish or throw the code out. Don't leave it half-done and move on to new features; that's the fallacy Joel tried to warn us about in the first place.

Do your programmers understand the product they are building and how it relates to your company's strategy? How can they iterate and learn if they don't know what questions are being asked at the company's highest levels? At IMVU, we opened up our board meetings to the whole company, and invited all of our advisers to boot. Sometimes it put some serious heat on the management team, but it was well worth it, because everyone walked out of that room feeling, at a visceral level, the challenges the company faced.

What other questions would you ask a brand-new startup about its product development practices? What answers would predict success?


Customer Development Engineering

0 comments

Yesterday, I had the opportunity to guest lecture again in Steve Blank's entrepreneurship class at the Berkeley-Columbia executive MBA program. In addition to presenting the IMVU case, we tried for the first time to do an overview of a software engineering methodology that integrates practices from agile software development with Steve's method of Customer Development.

I've attempted to embed the relevant slides below. The basic idea is to extend agile, which excels in situations where the problem is known but the solution is unknown, into areas of even greater uncertainty, such as your typical startup. In a startup, both the problem and solution are unknown, and the key to success is building an integrated team that includes product development in the feedback loop with customers.



As always, we had a great discussion with the students, which is helping refine how we talk about this. As usual, I'm heavy on the theory and not on the specifics, so I thought I'd share some additional thoughts that came up in the course of the classroom discussion.

  1. Can this methodology be used for startups that are not exclusively about software? We talk about taking advantage of the incredible agility offered by modern web architecture for extremely rapid deployment, etc. What about a hardware business with some long-lead-time components?

    To be clear, I have never run a business with a hardware component, so I really can't say for sure. But I am confident that many of these ideas still apply. One major theory that has influenced the way I think about processes comes from Lean Manufacturing, where they use these same techniques to build cars. If you can build cars with it, I'm pretty sure you can use it to add agility and flexibility to any product development process.

  2. What's an example of a situation where "a line of working code" is not a valid unit of progress?

    This is incredibly common in startups, because you often build features that nobody wants. We had lots of these examples at IMVU; my favorite is the literally thousands of lines of code we wrote for IM interoperability. This code worked pretty well, was under extensive test coverage, worked as specified, and was generally a masterpiece of amazing programming (if I do say so myself). Unfortunately, positioning our product as an "IM add-on" was a complete mistake. Customers found it confusing, and it turned out to be at odds with our fundamental value proposition (which really requires an independent IM network). So we had to completely throw that code away, including all of its beautiful tests and specs. Talk about waste.


  3. There were a lot of questions about outsourcing/offshoring and startups. It seems many startups these days are under a lot of pressure to outsource their development organization to save costs. I haven't had to work under that model myself, so I can't say anything definitive. I do have faith that, whatever situation you find yourself in, you can always find ways to increase the speed of iteration. I don't see any reason why having the team offshore is any more of a liability in this area than, say, having to do this work while selling through a channel (and hence, not having direct access to customers). Still, I'm interested in exploring this - some of the companies I work with as an advisor are tackling this problem as we speak.


  4. Another question that always comes up when talking about customer development, is whether VC's and other financial backers are embracing this way of building companies. Of course, my own personal experience has been pretty positive, so I think the answer is yes. Still, I thought I'd share this email that happened to arrive during class. Names have, of course, been changed to protect the innocent:


    Hope you're well; I thought I'd relay a recent experience and thank you.
    I've been talking to the folks at [a very good VC firm] about helping them with a new venture ... Anyway, a partner was probing me about what I knew about low-burn marketing tactics, and I mentioned a book I read called "Four Steps to the…"
    It made me a HUGE hit, with the partner explaining that they "don't ramp companies like they used to, and have very little interest in marketing folks that don't know how to build companies in this new way."

Anyway, thanks to Steve and all of his students - it was a fun and thought-provoking experience.
