Faster PHP fo shizzle—HipHop for PHP

The Lunch Break

I had to move my car during the break, so I missed most of it. I suppose I should note now that I wasn’t impressed with the cafeteria food, but since it was the second time I’ve been able to actually eat at Facebook (they grew too big and I grew too impatient all the other times), I suppose I shouldn’t complain. Oh yeah, I’m apparently Foursquare Mayor of all the fast food places in Mission Beach, so that gives you an idea of my palate.

In any case, I used that time to catch up with Haiping. There was some amusement when he thought everyone in the PHP world was as skeptical about HipHop as me, Lucas, and Shire.

PHP at Facebook (before HipHop)

“Move fast and break things.”
—Mark Zuckerberg, CEO of Facebook, talking about Facebook’s core engineering philosophy

(This was given by Haiping.)

HP likes PHP

Put simply, PHP means fast development.

First, it is easy to pick up. It is much easier to hire PHP developers than developers for any other language out there. Moreover, you don’t have to hire PHP developers; you can just hire engineers who know other languages and teach them PHP. In Facebook’s case, 50-60% of their engineers do not know PHP coming in, but they pick it up in a few weeks.

Here is “Hello World” in PHP:
Hello World!

Paul Reinheimer (and Andrei)
Facebook, Palo Alto, California

Olympus E-P2, M.ZUIKO Digital ED 14-42mm 1:3.5-5.6
1/60sec @ ƒ3.5, ISO500, 14mm (24mm)

“You’ve probably visited one of my sites.”
—Paul Reinheimer (http://blog.preinheimer.com/)

Paul is the senior developer at a bunch of large websites that the internet is for. Needless to say, scaling traffic is an issue for them.

Furthermore, PHP is easy to debug. It really fits well with an open source/open development structure. In Facebook’s case, because of PHP, every engineer can own and understand the entire stack. This means code can be modified by anyone else and, since every patch is code reviewed and test plans are written, this can be carried over and the necessary parts can be cobbled together for release. (This is the subject of talks already given by other Facebook engineers at conferences, so I won’t go into it.)

Finally, from Haiping’s perspective (as a C++ coder), there are a number of wins. The language is extremely straightforward, with a syntax very similar to C++ and Java. However, you can see that PHP is a thin layer over the kernel (he means lower-level languages like C), unlike .NET or Java, which are highly abstracted. Here traditional weaknesses of PHP (like inconsistent function naming and lack of lexical scoping) become strengths when you are trying to write a cross-compiler. For instance, the reason function naming and parameter ordering are inconsistent in PHP is that those are the names and orderings in the C libraries they are based on; they’re almost straight translations. Another example is that PHP has only one level of scoping (no more), and global variables have to be explicitly imported. This will make a huge difference when parsing for static types, as we’ll see later. Given what I’ve written about him in the past, you can almost see Haiping champing at the bit to tackle this optimization problem.

Four problems of PHP

PHP does present four challenges for Facebook.

The first is high CPU utilization.

CPU shootout (by language)

A comparison of CPU performance on some artificial benchmarks—the data is here. Smaller bars are better.

Some notes about the benchmarks: The tests were done on a quad-core 64-bit Linux machine, which is a typical type of CPU/OS for a website on Facebook’s scale. The benchmarks, however, are quite artificial compared to our problem space: scripting languages should do much worse, and Python is more optimized for these than the others. Finally, some marks are missing, making the geometric mean a little flawed. They do, however, demonstrate a point. While you can’t take the raw numbers as gospel (no, PHP is not 30 times slower than Java on the web), the relative rankings should be mostly unchanged.

If we could perform the suite on this set, I’d expect HipHop for PHP to improve the performance by a conservative 3x. This would put it near the performance of Erlang: significantly better than Python, Perl, and Ruby, but similarly significantly slower than C#, Java and native C++.

In general, Facebook claims that the ratio of users to developers on their site means the problem gets amplified with scale. (I tend to think the ratio of user-space coders to core developers is also significant: Yahoo has hundreds of C extensions for PHP; Facebook has around a dozen.) From this Haiping concludes something well-known operationally on consumer-facing websites of this scale: small savings in time = big savings in money, because hardware costs and power costs take a huge chunk of the overall budget. In Facebook’s case, currently a 20 millisecond per page savings is estimated at a $10 million/year overall savings.
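To put the quoted figure in per-millisecond terms, here is a rough sketch. The only inputs taken from the talk are the 20 ms and $10 million/year numbers; the linear scaling is my own assumption (real savings probably come in server-sized steps), and the function name is made up for illustration. Under that assumption each millisecond shaved off a page is worth about $500,000 a year.

```python
# Figures quoted in the talk: 20 ms saved per page is worth about $10M/year.
QUOTED_MS_SAVED = 20
QUOTED_USD_PER_YEAR = 10_000_000

# Implied value of shaving a single millisecond off every page.
USD_PER_MS = QUOTED_USD_PER_YEAR / QUOTED_MS_SAVED


def yearly_savings_usd(ms_saved_per_page):
    """Estimated yearly savings, ASSUMING cost scales linearly with page time."""
    return USD_PER_MS * ms_saved_per_page


for ms in (1, 5, 20):
    print(f"{ms:>2d} ms/page -> ${yearly_savings_usd(ms):,.0f}/year")
```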

The performance gap between PHP and C++ is emphasized simply by the architecture of Facebook. Facebook is a very monolithic PHP application (unlike Yahoo, which would be many independent separate applications of reasonable size). In their infrastructure currently, include structures may chain as deep as 50 levels, with over a thousand separate file includes in order to render one page.

(This is highly unusual, even for a social network, and normally would have been coded around. That can’t be done here. We’ll see why when we look at previous attempts to leave PHP.)

The second problem is high memory usage:

Memory usage (by language)

This is the memory usage for the above benchmarks. Smaller bars are better.

Java looks horrible because of its automatic garbage collection.

A more telling feature of the quirky memory usage of PHP is this completely artificial benchmark, array assignment:

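$a = array();  // start from an empty array; $max is the size given in the table below
for ($i = 0; $i < $max; $i++) {
	$a[] = $i;  // append one integer per iteration
}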

Yields the following memory footprint:

System                      Memory (MB)
PHP 5.2, $max=1,000,000     150
PHP 5.2, $max=5,000,000     700
HipHop, $max=1,000,000      17
HipHop, $max=5,000,000      47

Of course, this is an extreme case, but the point is memory efficiency is bad. (Given some of the patches to PHP core I’ve seen coming out of Facebook, I wouldn’t be surprised if some code there is assigning an array with a million nodes.)
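The table is easier to read as a per-element cost. The sketch below just divides the reported memory by the element count. Two assumptions of mine: "MB" means 2**20 bytes, and the whole footprint is attributed to the array (any fixed process overhead is ignored, which flatters nobody in particular). By this reckoning PHP 5.2 spends roughly 150 bytes per integer, while HipHop spends somewhere between 10 and 18.

```python
# Figures copied from the table above: (system, element count) -> memory in MB.
measurements = {
    ("PHP 5.2", 1_000_000): 150,
    ("PHP 5.2", 5_000_000): 700,
    ("HipHop", 1_000_000): 17,
    ("HipHop", 5_000_000): 47,
}

MB = 1024 * 1024  # assumption: the talk's "MB" means 2**20 bytes


def bytes_per_element(memory_mb, elements):
    """Average footprint of one array element, ignoring any fixed overhead."""
    return memory_mb * MB / elements


per_element = {
    key: bytes_per_element(mb, key[1]) for key, mb in measurements.items()
}

for (system, n), size in per_element.items():
    print(f"{system:8s} n={n:>9,d}: ~{size:.0f} bytes per element")
```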

(This is significant because it sounds like Facebook web servers are running at peak util on both RAM and CPU—a very uncommon situation. This is a sign of bad luck or a highly-optimized, well-utilized PHP installation.)

The third problem is that Facebook has a strong C++ and Python following (as well as other languages). PHP is used in a number of areas: display (to HTML and Ajax), business logic, and data models. All of that is built on infrastructure modules, some of which are also built on PHP. These C++/Python developers would like access to components, some of which are written only in PHP. This means that it is difficult to use PHP logic in other systems.

(Note this isn’t as bad as it sounds. Facebook does have an API layer in the form of Thrift for inter-language communication. But not all stuff is Thrift-enabled and there is the RPC overhead associated with this. Most startups on this scale are less tightly coupled than Facebook. For instance, at a social network I worked at, all high-performance modules were written in Java, not PHP; no display logic was coded without a corresponding API (both Ajax and binary protocols); and the Java layer “spoke” PHP and directly accessed the same memcached objects, databases and filesystem as PHP.)

The fourth problem is that extensions are hard to write for most PHP developers (at least at Facebook). While Yahoo has had this approach to performance, a survey of extension use at Facebook finds only 15 custom PHP extensions written from scratch there—seven of which were written by Haiping.

Other attempts to leave PHP

“Gallium Arsenide is the ‘material of the future’… and will be so in the future.”
—Aphorism in solid state physics

There have been many attempts to migrate away from PHP at Facebook over the years.

Since 2007 alone, Haiping named four failed attempts: to Python (twice), to Java, to C++.

What happened was the same problem that you see in GaAs technology for semiconductors. A friend of mine is fond of quoting the aphorism above. In materials science, as a semiconductor, gallium arsenide has a number of advantages over silicon (imagine 300GHz CPUs and you get the idea). Because of that, a lot of research over the years has been invested in trying to make GaAs microprocessors. But the issue is that silicon too has a number of advantages that allow it to ride Moore’s law. After a decade of research, a GaAs microprocessor may have matched silicon’s performance numbers only to find out silicon has increased to 32x the processing power during that same time period: chasing and chasing, and only falling further behind.

These attempts fall into this same classic trap of architecture migration. Let’s say you gather five great engineers who are really excited about moving Facebook over to Python. They’re so motivated and so good that they work late every night to turn out 7500 lines of Python a day. The problem is, in the same day, Facebook has a hundred engineers who maybe are not as motivated, but they’re just as talented, and they’ve turned out 10,000 new lines of PHP, a language with a similar work/line of code as Python. Not only that, how do you keep up the rate as new hires come in and increase the total PHP code output of the company? You’ve lost.
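The arithmetic of that trap fits in a few lines. This is a toy model, not anything Facebook presented: the 7,500 and 10,000 lines/day rates come from the example above, while the starting codebase size (one million lines) is a purely hypothetical number I picked, and I assume one ported Python line retires exactly one PHP line. The backlog grows by 2,500 lines every day no matter where you start.

```python
def remaining_php_lines(days, initial_lines, php_per_day=10_000, ported_per_day=7_500):
    """Lines of PHP still unported after `days` of the two teams working in parallel."""
    return initial_lines + (php_per_day - ported_per_day) * days


def port_rate_needed(days, initial_lines, php_per_day=10_000):
    """Daily porting rate required to drive the backlog to zero in `days`."""
    return php_per_day + initial_lines / days


HYPOTHETICAL_CODEBASE = 1_000_000  # made-up starting size, for illustration only

for days in (0, 30, 365):
    backlog = remaining_php_lines(days, HYPOTHETICAL_CODEBASE)
    print(f"day {days:>3d}: {backlog:,d} lines of PHP left to port")

needed = port_rate_needed(365, HYPOTHETICAL_CODEBASE)
print(f"to finish in a year you would need ~{needed:,.0f} ported lines/day")
```

The second function makes the point from the other side: the migration team has to outpace the whole company's daily output just to stand still, and then clear the existing code on top of that.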

Facebook offices
Facebook, Palo Alto, California

Sony DSC-WX1
1/100sec @ ƒ2.4, ISO640, 4.3mm (24mm)

Facebook, like many Web 2.0 companies, has open offices. They use the same model desks that I used at Tagged (Ikea Galant), which is a step down from the ones at Plaxo and Meebo (Ikea Effektiv). The actual size of the desks is slightly smaller than the one I had because we wanted ours to face each other: designed to fit 2 wide = 1 length, but they’re currently laid out in long horizontal lines. These sorts of inches add up in a company: you can fit far more engineers in a tiny space this way than with cubes.

To effect this sort of change, you depend on everything to come to a screeching halt as you migrate the platform. It’s not a language thing—Friendster tried to migrate to PHP—it’s a process thing. And amplified when your CEO is repeating the mantra: “Move fast and break things.”

From my personal experience, I was involved with a re-architecture (without language change) of a social network. At every point the new code added had to be more efficient than the piece it replaced and feature failover was designed to bootstrap the old code if needed. There were only a few “old infrastructure” programmers to compete with. Many of the re-architected libraries allowed the developers to code much more efficiently under the new system. The existing release cycle was slow (seven major releases a year when I started). Even under these relatively benign re-architecture conditions, it took two years to complete and I exited with a modern web release cycle in place: 250 releases a year. Facebook has an architecture as variegated as the teams that built the features, has over a hundred “old infrastructure” programmers, has no obvious global efficiencies that haven’t already been worked around through a patch to the PHP engine or APC, and releases multiple times a day!

Improving PHP

Another approach is to improve the Zend core. In fact, many times in Facebook’s history, the growth curve was about to break everything when some improvement to APC, Zend, etc. was done to get them out of the bind. You can get a great idea of some of those from Brian Shire’s PHP Tek talk on it (Facebook blog), and there were other ideas left to try, like warming up the interpreter, rolling back, and restarting.

For some reason or another, engineering management felt these sorts of patches would only yield a marginal increase in performance going forward.

Another approach is to migrate Facebook to Apache 2.0. Facebook is on Apache 1.3. This was a little confusing, but what they really mean is upgrading to a threaded Apache 2.0 instead of Apache in MPM prefork mode. They were further confused in not understanding that the PHP core is actually already thread-safe, but none of the libraries are. PHP is a lot like Linux: when we use the term “Linux” we don’t mean just the kernel, but the entire GNU/Linux software stack that forms the operating system; when we use the term “PHP” it refers to more than Zend and the core, to the entire commonly used extension stack that sits on top of it. Auditing PHP for multithreading would mean auditing every library, which is not a simple task and definitely one with no support from the community.

Well in any case, for better or worse, after about four years of straight successes with this process, it lost out to the HipHop project.

61 thoughts on “Faster PHP fo shizzle—HipHop for PHP”

    1. My blog post isn’t finished but I haven’t claimed “only if you’re big.” (yet). My claim is that you need over 2 machines and a bottleneck in the application server (which is rare).

      To your point specifically, Harry, you are correct. If your latency gets decreased then this is good. But taking a real world example, before I joined one startup, "Hello World" took 240msec; a rearchitecture (without something as drastic as HipHop) dropped it to 15msec. I don’t think latency would be a win at that point alone. My guess is similar improvements can be found at other companies that are >90msec. However, that is not always the case: Rasmus feels this may be a win for frameworks, since their bloat usually destroys response time.

      In Facebook’s case, all the "big latency" hurdles were eliminated when they moved to lazy-loading APC; they are clearly thinking about eliminating even more with the ability to snapshot the core and restart from there (though that approach is highly complex). And their big issue is sheer cost, not latency. So to them total time matters and latency is simply a matter of running the servers sparsely.

      Unfortunately I cannot quote Facebook’s numbers on CPU time. You will have to ask them, or figure it out yourself by the copyright trick.

  1. Maybe I came off wrong in my comment but it was more a reflection of the hype and not the technology.

    I shouldn't have acted so quickly to say that WordPress used eval all over the place because as you say that isn't correct. I was trying to find the least complicated reason to give for why people using WordPress shouldn't bother.

    My last resort comment was based on there not being much more you can do to speed up PHP than compiling it to binary. If there were then Facebook would have done that instead. That isn't a bad thing but it is a reflection of what people will be using the tool for.

    The hype around HipHop makes it out to be something everyone that uses PHP will be using (that is why you get comments like Patric's about how great it will be for WordPress users). You need to be committed to using PHP compiled with HipHop. I'm sure the hype will die down but what is worrisome is that people who don't understand what to use it for will fail in their attempts and complain about it.

    On another subject, does anyone know when it is going to be released? I want to actually use it so my comments are based on some reality.

    1. Hmm, what part of “possibly” did you miss?

      In any case, it depends on the benchmark, but if the benchmark is artificial enough (like many of those in the Alioth shootout), then the static analyzer can replace nearly everything with native C++ calls. At that point, you’re basically benchmarking Java vs. C++, not PHP.

      If you look at the same tables you reference, you’ll note that C++ does better in CPU usage than Java. Both C++ and PHP (native) already do miles better than Java in total memory usage (because of automatic garbage collection).

      In practice, I’d say it puts them in the same class in terms of CPU—mostly slightly slower, but a few times much faster. This should come as no surprise because Java has a JIT and HipHop is a cross-compile to C++ which is a straight compile.

  2. HipHop is interesting but, I'll definitely argue about it being PHP. By picking and choosing what language features they'll support they're building a language kinda sorta like PHP but not quite PHP.

    Considering how many OSS apps and frameworks use eval() I also think it's disingenuous of them to characterize it as a rarely used feature. Now, maybe it's one that _should_ be rarely used but, that's a different argument.

    1. I understand what you’re saying but it’s a losing argument. Almost all of the frameworks you mention that can’t support HipHop can’t because they depend on dynamic scripting of template pages for performance. That would no longer be needed in HipHop.

      I’m not saying you are wrong right now, I’m just saying that it’s a lot easier to port frameworks than you think. They simply have to add a flag to allow you to turn off any dependency on dynamic scripting components like Smarty. They shouldn’t be necessary to run the base framework.

      Before HipHop, there was no reason to not do dynamic scripting and a whole host of reasons why performance improves when you do. Now, HipHop changes that cost-benefit. To not expect framework developers (who I feel have as a failing their alacrity in which they adopt anything new), to change due to that is short-sighted.

      I argue in the article why OSS apps probably won’t change.

  3. Oh, I think that much of the OSS world will adapt and quickly. Supporting HipHop will likely become a checklist feature and looking at the usage of eval() in some projects it would be trivial to remove. I mainly cited them as part of taking issue with their "rarely used feature" characterization.

    My larger point is that instead of actually supporting the PHP language, they're moving the goal posts to a position more convenient for them and calling it PHP. For better or for worse, eval() is part of PHP.

  5. I'm interested in what language features besides eval() are not supported. They give eval() as an example but imply there are others. Seems kind of important to be able to consider what will and will not be available before getting TOO excited…

    1. I have a list of some which I’ll get to when I finish the article but here is a quick rundown off the top of my head.

      – eval() not supported
      – dynamic scripting is not allowed (That's where you use PHP to create a PHP file. Like when you use Smarty to compile a file).
      – create_function() is not supported
      – preg_replace when using e (execute PHP code on match)
      – some functions are not implemented yet/were overlooked (An example was that php_version() was not returning anything, which was crashing HPHPi when it was running against the WordPress codebase. These bugs should be reported and fixed though.)

      …and there was something to do with ordering where it works in PHP but won't when the static analyser hits it. Meaning in some of your scripts you may have to move things around for it to work.

  6. hiphop won’t change the fact that php is a language most people grow to hate. i don’t know anyone who likes it more after a year than they did on day 1. so hiphop doesn’t make the rewrite argument go away. it might delay it, but inevitably the pain of actually writing and maintaining php remains.

    1. You dance with the one who brung you.

      I didn’t advocate when Friendster decided to switch from Java to PHP. I didn’t advocate when Del.icio.us decided to switch from Perl/Mason to PHP/symfony. I don't advocate anyone switch to PHP because of HipHop on PHP. Why would I start arguing that a company leave PHP because apparently according to your limited experience nobody "likes it after more than one year"?

      Architecture changes are hard because they are inherently waterfall. They are especially hard since the web development cycle is tight (if the company is any good). If you want to shoot yourself in the head, (or if you are a consultant, cause your clients to shoot themselves) be my guest.

      1. "I guarantee those engineers who failed were a lot smarter than you."

        ? you don't even know who the hell i am. i single-handedly created the most popular news website in the world, which has been #1 for a decade. you can piss on that too or you can admit maybe you and your cabal are by no means the last word in who knows how to build websites.

        1. No, I don’t know who you are. The fact that you hide behind a curtain of anonymity while I don’t speaks volumes as to your authority.

          You’re afraid to put up, therefore you get shut down.

          Unlike you, I am never secret in my affiliations. Thus, to my knowledge, I’m not part of a cabal. But, I would like to know the number of your crack dealer, since you are obviously on it. 😀

    2. I started using PHP for side projects about 8–9 years ago. I started using it as one of my primary responsibilities at work about 4 years ago. While I don't recall exactly how much I liked it 9 years ago, I'm quite fond of it now.

      Poorly designed code is painful, regardless of its language, and "the pain of actually writing and maintaining [code]" is part of software. I don't see how that's unique to PHP.

  8. It's a shame that so much time has been wasted creating a PHP to C++ cross compiler. Sure, it will help Facebook, and some other large websites in speeding up their systems, but it encourages more PHP usage, which is a downright awful language. PHP needs to die.

    Also, your arguments about PHP being a more universally supported language than some other scripting languages is archaic. Shared hosting is approaching the end of its lifetime, and anyone who wants to create a Python/Ruby/Scala/etc website will be able to do so thanks to on-demand cloud computing.

    TLDR: PHP is dead. Get over it.

    1. Ahh, another anonymous comment with a blanket “PHP is bad” statement and no evidence to back it up. Did I get slashdotted and no one tell me?

      Shared hosting may be EOL for those people doing Web 2.0 startups, but for the SME market it is not only alive and well, but thriving. The SME market is many orders of magnitude larger than Web 2.0 startups—talk to GoDaddy sometime before you make that claim. In fact, I’ve noticed that 3 of the top 3 open source CMSs (which pretty much own about 90% of the open-source CMS market) are written in PHP. Shared hosting was instrumental, and no amount of slicehosting will eliminate that, since slicehosting is not used by non web-based SMEs.

  9. "phalanger? – MS bought the team, they’ve disappeared"

    Are you sure about that? Phalanger is alive and well and being developed to a new version (3.0) by a UK software company and a team at Charles University. It's being deployed in the Enterprise and in Government.

  11. I also think that many developers will not use it, as many people say it only makes sense if you know what you are doing, if the problem is cpu/memory AFTER profiling the code, and if you have at least 3 servers and you can perhaps save one of them.
    I'm really interested in the source code, examples, "compatibility lists", translated extensions and so on, this will take a while until we are able to use it I think.

    1. That is like the assertion that until 1994, COBOL was the language of choice, because it took that long for the new code to outstrip the legacy. And if you count ABAP/4 as COBOL, you could claim it came even later.

    2. Wow, passing C++ and Visual Basic. That’s phenomenal (and unexpected).

      Still most problems can be solved with any language. I feel even Facebook’s could have been. The issue is that language would have come with its own baggage.

  12. I too have a question about the 'only worthwhile if you're big' sentiment: wouldn't improved memory efficiency be very important to a site that's running on a tiny VPS? Or are the memory gains not really that substantial?

  13. You are too pessimistic, HipHop PHP is going to change everything!!!! Be more excited and happy my friend, this enables developers to streamline other development concerns by increasing performance.

    For instance, our development cluster in-house is larger than our production because we do massive testing (i.e. download the entire site and analyze it). This will be a great win for many development teams if harnessed correctly.

  14. Terry, can you please notify people somehow (Twitter probably) once you finish this article? I don't want to check every week if you updated it (or not). That would be nice.

    And, it seems like there is some issue with displaying/formatting posts' dates on your blog.
