Why PHP won

When I first learned to program on the web, Perl + CGI was the dominant platform. But by the time I was building my first websites for commercial use, PHP had taken over. Since then, PHP (as part of the LAMP stack) has really been the dominant development platform, at least in the free software and startup worlds. Through my platform choices, I have forced many people to learn PHP and to work with it on a regular basis. Some of them are probably still cursing my name, because - let's face it - PHP can be pretty painful. As a language, it's inelegant. Its object-orientation support is "much improved" - which is another way of saying it's been horrendous for a long time. Writing unit tests or mock objects in PHP is an exercise in constant frustration.

And yet I keep returning to PHP as a development platform, as have most of my fellow startup CTOs. This post is about why.

Let's start with some circular reasoning. The number one reason I keep coming back to PHP is that it has overwhelming community support. I've written elsewhere that success in creating a platform is "becoming a function not of the size and resources of the company that builds it, but of the size of the community that supports it." When we started IMVU in 2004, we could rely on a staggering amount of open source software that jumpstarted our initial product offering. From e-commerce to blogs to photo galleries, we benefited from tens of thousands of lines of code that had already been battle-tested. Add onto that the incredible amounts of infrastructure code that we reused, and we were able to bring our product to market months earlier.

But that just raises the question: why does PHP have the most thriving community? It's an example of an information cascade - as PHP started to pull ahead of the platform pack, more and more people started working on it, increasing the size of the community and therefore making it more likely that even more people would choose it.

So, to understand how that cascade got started (and keeps going), we have to look at several key attributes of PHP. You'll notice a pattern: most of the key features that enabled PHP's success are considered among its defects by experts in language design. Ironically, as PHP has grown up, its designers have been busy "fixing" these shortcomings. It remains to be seen whether they will be able to do so without losing what enabled them to succeed in the first place. Or, perhaps they'll create an opening for another web platform to start a new cascade. If that happens, I expect you'll see some of these attributes in that new challenger:
  1. Speed of iteration, thanks to reload-pages-every-time. CGI scripts had notoriously bad performance. That's because for every page request the web server had to spawn a new interpreter process, load the script, and return the output across process boundaries. So when people started using Apache's module system to build better web application platforms, it was natural that they'd try to make the system more efficient. Take mod_perl, for example. In that system, the first time you loaded a script to handle a page, the whole script (as well as associated data) was kept resident in memory. The next time you needed to handle that page, you could take advantage of caching for excellent performance. There was only one real disadvantage: when the script source code changed, you needed to restart Apache so that it could reload. This sounds like a good trade-off, but it turned out to be a classic sub-optimization.

    Memory-resident code made it slightly slower to iterate on scripts. Every change required a server restart. Even on a single server, those extra few seconds between iterations are costly. Writing code with PHP, even though the language is crufty, feels fast, because of the tight write-test-debug loop. Plus, with memory-resident code, it's pretty easy to forget which version of the code is actually running at any given time. The stuff in memory is invisible - you can't double check it. As a result, you have to be a lot more careful.

    This feature also supported multi-user environments much better. You could give someone a directory on your server as what was effectively a simple sandbox. You didn't need to worry about them breaking your server configuration or restarting it while you were in the middle of something important. This made it easier for ISPs to offer PHP hosting on shared servers, too. All of which meant it has always been exceptionally easy to get started with PHP. And when considering a new language or platform, the getting-started barrier is probably the single most important factor.

  2. Direct mapping of outputs to inputs: URLs to files, code and presentation intermixing. PHP has a number of other seeming defects that make it easy to understand what's happening in a PHP application. This allows new people to get integrated into an existing PHP project faster. For example, I have repeatedly taught PHP to complete novices in several companies, including people who had never before programmed at all.

    Almost every PHP-based website is burdened with ugly URLs that contain the name of the file that implements each page. This is considered bad practice, because URLs are supposed to be about resources (meaning the content of a given page), not implementation. In theory, you should be able to completely change platforms, languages, and servers and not have to change your URLs. However, this approach, while elegant, has a major drawback: it requires non-trivial understanding to figure out where to find the code for a given page. Just consider one way this is commonly done: having a table of regular expressions somewhere that maps URLs to pages. This sounds simple, but in order to use it everyone on your team has to: 1) know where to find the list of mappings and 2) be able to read regular expressions correctly. Every other system I've seen has some drawback like this. PHP has no such compunction: the name and location of the file are hanging out in the open, for everyone (including your teammates) to see.

    And then let's consider what happens when you get to the implementation file itself. PHP encourages one of the worst programming heresies of all: intermixing code and presentation logic in a giant mish-mash. If you're emitting HTML, as 99% of all PHP apps are, it's just so damn convenient to throw in PHP tags along with your layout tags, all inline on the page. In many ways, this is a maintainability nightmare, but it has a compensating benefit. In most cases, simple changes are simple to make. At IMVU, when we'd hire a new engineer, we could get them to ship code to production on their first day, even if they had never programmed in PHP before. That's simply impossible on most platforms. That benefit carries over through the whole life of a project. The vast majority of features are actually only a few lines of code. All the difficulty and effort is in finding the right place to put them. PHP's transparency makes that easier, encouraging more experimentation and fine-tuning.

    All of which leads to a non-intuitive conclusion. PHP shines on one of the most important criteria for platform selection: readability. Yes, you heard me right. Even though PHP's syntax is an inelegant beast, the overall platform is impressively readable.

  3. Incoherent (but huge) standard library. These days, successful programming is as much about processing data as creating algorithms. To be a good data-processing language, you need a large standard library. PHP has always scored well on this count, with lots of support for database drivers, URL parsing, HTTP fetching, regular expressions - you name it. And this all came bundled up in another bad practice: a big monolithic interpreter. For most of its existence, PHP didn't have a standard package-distribution system or very good module support. As a result, important features that were used widely almost had to be bundled into the interpreter itself. Although this was annoying for sysadmins, security consultants, and language purists (and for those who had proprietary modules that couldn't be bundled), it was a huge boon for developers. It meant that the PHP brand stood for the same set of tools always and everywhere. If someone offers you a library or script, it's a huge benefit to know that it will run in your environment, without having to worry about dependencies, which leads to a lot more code sharing and code reuse.

  4. Bad OOP support. PHP began, as many scripting languages do, with pretty primitive language features. It was especially criticized for its lack of object-oriented programming support. Although recent versions have added robust objects, inheritance, and interfaces, for most of its life PHP's objects were little more than decorated dictionaries. This is another feature that makes language purists cringe, but was key to PHP's success.

    One of the overriding benefits of OOP is encapsulation: the ability to take a chunk of code and data and put them behind an interface. At the cost of a little extra indirection, you can organize your code around a series of long-lived objects, each of which communicates with the other objects in simple, easy-to-understand ways.

    In practice, though, this extra indirection imposes a steep penalty in transparency. Most PHP scripts are not long-lived, meaning that every object has to be constructed, used, and disposed of for every request. And most PHP servers are run in a stateless mode, delegating storage to other processes like memcached and MySQL. So, in many cases, there's not much readability benefit to constructing an object, when all it amounts to is a lightweight wrapper for external data access methods. A "module" of related functions does the job just as well, and is a lot easier to find when you're trying to understand how an application works. This is yet another way that PHP supports non-programmers well.

    There's another big benefit: one of the major strengths of the web is that it auto-encapsulates code behind URLs. That's part of its magic, and PHP takes advantage of it. Everything is request-oriented, short-lived, and stateless. A PHP script is like an object itself, encapsulating a bit of functionality behind an interface defined by HTTP. It's one of the most effective paradigms for software development in history. The platforms that win on the web are those that mirror its fundamental structure, not those that try to morph it into a more traditional "elegant" shape.
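
To make the URL-mapping trade-off from point 2 concrete, here is a rough sketch of the two styles side by side. It is written in Python only because it reads close to pseudocode; it is not how any particular framework or PHP itself is implemented. The route patterns, the handler names, and the /var/www/html document root are all hypothetical, invented for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical route table. Every pattern and handler name below is made up
# for this sketch and does not come from a real framework or codebase.
ROUTES = [
    (re.compile(r"^/products/(?P<id>\d+)/?$"), "catalog.views.product_detail"),
    (re.compile(r"^/users/(?P<name>[\w-]+)/photos/?$"), "gallery.views.user_photos"),
]


def find_handler_via_table(url_path):
    """Indirect style: scan the table until some pattern matches the URL.

    To locate the code for a page, a reader must first find this table and
    then read each regular expression correctly.
    """
    for pattern, handler in ROUTES:
        if pattern.match(url_path):
            return handler
    return None


def find_file_directly(url_path, docroot="/var/www/html"):
    """Direct style: the URL path is simply a file path under the docroot.

    The docroot default is an assumption. This is only an illustration of
    the mapping, not a safe resolver (it does no path sanitizing).
    """
    return str(PurePosixPath(docroot) / url_path.lstrip("/"))
```

With the table, /products/42 resolves to a handler name that appears nowhere in the URL, so you have to know where the table lives to find the code. With the direct mapping, /catalog/product.php points straight at /var/www/html/catalog/product.php, which is the transparency the argument above relies on.
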

I'll close with one last thought. The inimitable Paul Graham has an excellent essay called The Python Paradox in which he argues:

"...that you could get smarter programmers to work on a Python project than you could to work on a Java project.

"I didn't mean by this that Java programmers are dumb. I meant that Python programmers are smart. It's a lot of work to learn a new programming language. And people don't learn Python because it will get them a job; they learn it because they genuinely like to program and aren't satisfied with the languages they already know.

"Which makes them exactly the kind of programmers companies should want to hire. Hence what, for lack of a better name, I'll call the Python paradox: if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it."

As always, Paul is right. So, given that PHP is so popular, you might think it wouldn't be a great choice for companies that are trying to hire great programmers. Given that startups depend on superstars to survive, why stick with PHP? In my role as a CTO, I've always tried to choose the right tool for the right job. The IMVU downloadable client, just to name one example, is written primarily in Python. This let us hire extremely high-caliber programmers to work on it. You might then assume that our backend was written in mod_python. But it's not.

It's easy to forget that these decisions need to be made, not on a language-by-language basis, but on a platform-by-platform basis. PHP may not be a great language, but the platform it enables does attract one particular kind of great developer: the cutting-edge web gurus who work primarily in JavaScript, DHTML, and Ajax. For them, PHP is an ideal language precisely because it gets out of their way, allowing them to build a simple foundation for their complex and beautiful browser-based cathedrals.

When simple things are simple, and hard things are possible, great people can take it from there. So here's to the team that built PHP. Thank you.