Embrace technical debt

Financial debt plays an important and positive role in our economy under normal conditions. Yet, especially in times like these, it’s easy to rail against the badness of being in debt; it’s a very human feeling. Remember Hamlet?

LORD POLONIUS:
Neither a borrower nor a lender be;
For loan oft loses both itself and friend,
And borrowing dulls the edge of husbandry.

Technical debt works the same way, and has the same perils. Here’s one of my favorite introductions to the subject, courtesy of Martin Fowler:

In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.

The human tendency to moralize about debt affects engineers, too. Many conclude that technical debt is a bad thing, and that teams that incur technical debt are sloppy, irresponsible or stupid.

In this post, I want to challenge that idea, by talking about real-world situations where debt is highly valuable. I hope to show why lean and agile techniques actually reduce the negative impacts of technical debt and increase our ability to take advantage of its positive effects. As usual, this will require a little theory and a willingness to move beyond the false dichotomy of “all or nothing” thinking.

I won’t pretend that there aren’t teams that take on technical debt for bad reasons. Many legacy projects become completely swamped servicing the debt caused by past mistakes. But there is more to technical debt than just the interest payments that come due. Startups especially can benefit by using technical debt to experiment, invest in process, and increase their product development leverage.

In a startup, we should take full advantage of our options, even if they feel dirty or riddled with technical debt. Those moralizing feelings are not always reliable. In particular, try these three things:

Invest in technical debts that may never come due.
The biggest source of waste in new product development is building something that nobody wants. This is a sad outcome which we should work very hard to avoid. Yet there is one silver lining when it does happen: we wind up throwing out working code, debt-riddled and elegantly designed alike. This happened quite often in the early days of IMVU.

For example, I’ve talked often about our belief that an instant messaging add-on product would allow IMVU to take advantage of a network effects strategy. Unfortunately, customers hated that initial product. The thousands of lines of code that made that feature work were a mixed bag – some elegantly designed and under great test coverage, others a series of hacks. The failure of the feature had nothing to do with the quality of the code. As a result, many technical debts were summarily cancelled. Had we taken longer to get that feedback by insisting on writing cleaner code, the debt would have been much deeper.

Accept that good design sometimes leads to technical debt anyway.
Discussions of technical debt are usually framed this way (again from Martin Fowler):

The metaphor also explains why it may be sensible to do the quick and dirty approach. Just as a business incurs some debt to take advantage of a market opportunity developers may incur technical debt to hit an important deadline.

This framing takes for granted that the quick and dirty approach will incur significantly more technical debt than the slow and clean approach. Yet other agile principles suggest the opposite, as in YAGNI and DoTheSimplestThingThatCouldPossiblyWork. Reconciling these principles requires a little humility.

Most of us think we know a good design when we see it. Unfortunately, no matter how much up-front analysis we do, until the design is tested by actual practice, we can't really know. Outside the world of hypothetical examples, it's more important to make continual progress than to build the ultimate design.

For example, at a previous virtual world company, we spent years developing an architecture to cope with millions of simultaneous users. Unfortunately, we made two critically flawed assumptions: that customers would primarily consume first-party assets that we shipped to them on CD and that they would tend to congregate in a relatively uniform way. Neither assumption proved remotely accurate. The design failure meant that there was constant thrashing as the servers struggled to provision capacity according to the “elegant” algorithm we’d designed.

As in many scalability decisions, we’d have been much better off investing in agility, so that we could change the architecture in response to actual customer demand, rather than trying to predict the future. That’s what Just-in-time Scalability is all about. Sometimes quick and dirty actually incurs less debt.

Leverage product development with open source and third parties.
Financial leverage refers to investing that is supplemented by borrowed money. Similarly, product development leverage refers to situations in which our own work is fortified by the work of outsiders. For example, early on at IMVU, we incorporated tons of open source projects. This was a huge win (and we were delighted to give credit where it was due), because it allowed our initial products to get to market much faster. The downside was that we had to combine dozens of projects whose internal architectures, coding styles, and general quality varied widely. It took us a long time to pay off all the debt we incurred that way – but it was worth it.

In addition, third-party services and APIs enabled us to do more with less, but at a cost: taking on the technical debt of products and teams outside our direct control. We’re not accustomed to accounting for technical debt that occurs in code that we don’t write, but this is short-sighted. It’s important to learn to see the whole system that makes our product work: human as well as machine, internal as well as external.

For example, IMVU’s early business model was made possible by PayPal’s easy self-serve and open access payment system. However, we’ve often had to put up with unreliable service, caused by their inflexible internal architecture. We had to live with their technical debts without being able to repay them. It was still a good trade.

Not all debts are created equal.
Interest rates vary, so we should be selective about taking on new debts. Given the choice between incurring technical debt in a particular end-user-visible feature and incurring the same level of debt in a core system, I’d much prefer the former. Here’s why:

  • There’s a chance that I’ll never have to pay for that particular debt, because the feature may have no value for customers.

  • It’s possible that the feature, even with debt, might be good enough, and therefore not need revision for a long time. Technical debt manifests as rigidity or inflexibility. When modifying a part of the product afflicted by debt, the work requires a lot of extra – and unpredictable – clean up. But if a given feature is rarely modified, its debt is much less expensive.

The opposite is true with debt in a core system; it’s much more likely that this debt will slow down our ability to make changes later on. For example, an unreliable library deep in the core will manifest as intermittent defects all throughout the product, each of which is hard to localize and debug. Side-effects that reduce agility are the most damaging symptoms of technical debt.
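To make that comparison concrete, here is a minimal back-of-the-envelope sketch. The model is my own simplification, not a formula from any methodology: expected interest is the chance the code survives, times how often we touch it, times the extra clean-up each change costs. The `Debt` class and every number in it are hypothetical, chosen only to illustrate the shape of the tradeoff.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    """One technical debt. All fields are illustrative assumptions."""
    name: str
    p_survives: float              # chance the code is kept rather than thrown away
    changes_per_year: float        # how often we expect to modify this area
    extra_hours_per_change: float  # "interest": extra clean-up per modification

    def expected_interest(self, years: float = 1.0) -> float:
        """Expected extra hours spent servicing this debt over the period."""
        return (self.p_survives
                * self.changes_per_year
                * self.extra_hours_per_change
                * years)


# Same per-change cost in both places; only survival odds and change rate differ.
feature = Debt("experimental feature", p_survives=0.3,
               changes_per_year=4, extra_hours_per_change=5)
core = Debt("core library", p_survives=0.95,
            changes_per_year=40, extra_hours_per_change=5)

for debt in (feature, core):
    print(f"{debt.name}: ~{debt.expected_interest():.0f} extra hours per year")
```

With these made-up inputs the same level of debt costs far more in the core system, simply because that code is almost certain to stick around and gets touched constantly. Real estimates will be much fuzzier, but the ordering is usually what matters.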

Lean vs. debt
In the world of physical goods, the leaner a supply chain is, the less debt is required to operate it. This makes lean supply chains more robust in the face of the unexpected: if sales suddenly dry up, they are stuck with less unsold inventory and simultaneously have less debt to service. The just-in-time nature of the value chain reduces risk in the face of uncertainty and is also more capital efficient.

A similar relationship applies to technical debt. Teams that practice an agile or lean development process are able to minimize the accumulation of technical debt without sacrificing speed, because they work in smaller batches. They also take better advantage of debt, because they find out sooner if a particular investment has paid off. Traditional development teams, by contrast, often build and deploy large systems before learning if their early choices were sensible, and therefore wind up with a much larger debt to pay. In fact, by the time they become aware of it, they’ve already started to pay significant interest on that debt.

Invest in speed instead of features or debt
This relationship between lean and debt opens up new approaches for dealing with technical debt. The usual debate is phrased as an either-or choice between taking more time to “build it right” or taking a shortcut and incurring more debt. But those are not our only two options. Taking on technical debt does free up energy to invest elsewhere, but new features are not the only place to invest it.

We can trade technical debt for process improvement, too. If that improvement pays off (by reducing the batch size of our work, for example), it becomes easier to address all technical debt in the future – including the debt just incurred. And because any particular debt might never come due, this is a better trade. To take one concrete example, it’s often worthwhile to write test coverage for legacy code even without taking the time to refactor.

This reverses the standard intuition about what engineering activities add value, which usually concludes that test coverage is a form of necessary waste but a refactoring is value-added work. However, a refactoring (by itself) might go stale or introduce unintended side-effects. Adding test coverage will make it easier to refactor in the future and also reduce our fear of making changes elsewhere.
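As a sketch of what covering legacy code without refactoring can look like, here is a small characterization test: it records what the code does today, quirks included, rather than what it ought to do. The `legacy_shipping_cost` function is invented for illustration and stands in for any tangled routine we are afraid to touch.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical legacy routine: messy, but it is what production runs."""
    c = 5
    if weight_kg > 10:
        c = c + (weight_kg - 10) * 2
    if express:
        c = c * 2
        if weight_kg > 10:
            c = c - 1  # unexplained adjustment; we pin it rather than "fix" it
    return c


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down current behavior so a later refactoring can be checked against it."""

    def test_current_behavior(self):
        # Expected values were obtained by running the existing code,
        # not by reading a spec.
        cases = [
            ((1, False), 5),
            ((10, False), 5),
            ((12, False), 9),
            ((1, True), 10),
            ((12, True), 17),
        ]
        for args, expected in cases:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Saved in a test module, this runs with `python -m unittest`. The function itself is left exactly as it was; the only change is that the next person to modify it finds out immediately if they alter its behavior.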

Investing in the dynamics of development is more valuable than investing in the static status quo. Startups are always moving, so invest in moving faster and better.

Technical debt in the real world
So far, all of these considerations have been framed in the form of abstract either-or tradeoffs. Real life seldom presents such comparable choices. Instead, we balance lots of unknowns. How much technical debt will a particular approach incur? How likely is it that customers will ultimately use that feature? How painful will it be to refactor later? How much will it slow us down in the meantime? And how much more expensive would it be to do it right? Oh, and how likely is it that the “right” approach actually is right?

Luckily, there are better options for these complex decisions than picking an easy extreme, like “never incur technical debt” or “anything goes.” Instead, we can choose a disciplined approach to making proportional investments in prevention and paying down debt, such as Five Whys. Techniques like this work by focusing our energy on making process and technical changes in precisely those areas that are causing the biggest waste and slowdown.

This is better than making abstract choices about where to invest: better design, paying down old debts, or better process. Instead, techniques like Five Whys teach us to view the entire application and product development team as one integrated system. From this holistic viewpoint, we can optimize accordingly.

Once we can see opportunities for truly global efficiency gains, all that remains is to ensure our team actually makes room for those investments. To do that, we add specific speed regulators, like integrating source control with our continuous integration server or the more elaborate dance required for continuous deployment. This produces a powerful combination: the speed of just-in-time experimentation wedded to a discipline of rigorous waste-reduction.

One last thought. When I talk and write about the advanced product development process at IMVU today, like the cluster immune system or the disciplined approach we take to split-testing and interaction design, it may sound as if we had that capability from the start. Nothing could be further from the truth. The early IMVU was riddled with legacy code and technical debt. We spent endless hours arguing about whether we’d made the right choices in the past. And with the benefit of hindsight, it’s clear that we often made serious mistakes. As one engineer recently told me, “Once we had money in the bank and were near-profitable, I think we would have been well-served by increased up-front product and technology planning. As a culture, we hadn’t yet learned how to make long-term decisions.” He’s right.

In the end, what mattered wasn’t that we did everything right, but that our fundamental approach was flexible and resilient. At no point did we stop everything and do a ground-up rewrite. Instead, we incrementally improved our process, architecture, and infrastructure, always learning and adjusting. The blur you see today is the result of the beneficial compounding interest of that approach applied with discipline over many years. Trust me, it’s a lot of fun.

(This post was tremendously enhanced by a number of early readers from the Twitterverse. You know who you are. Thanks so much.)
