Faster PHP fo shizzle—HipHop for PHP

FAQs

[Note: Where I mark a quote “(paraphrased),” the people linked didn’t actually say those words; the quote expresses a sentiment, or a possible misinterpretation, that I want to address.]

[IN PROGRESS: THIS SECTION IS INCOMPLETE & MAY HAVE ERRORS]

“If successful, this is going to fragment the PHP community.” (paraphrased)

Doubtful.

The PHP community is actually mostly userspace PHP, and userspace doesn’t need (and can’t use) HipHop. PHP, in the end, is built on the backs of userspace applications run on shared hosting. This is the large funnel that drives adoption and has catapulted PHP to the 3rd most popular language overall.

Core development is much more pragmatic than userspace, but key limitations of HipHop (mentioned below) preclude the core from running in that direction. Besides, Ilia cogently notes that PHP core developers are C coders, not C++.

At the Summit
Facebook, Palo Alto, California

Sony DSC-WX1
1/125sec @ f/2.4 ISO400, 4.2mm (24mm), vertical panorama mode.

The usual suspects, minus a few people, are in the room. I should have gone the second day so I could get plastered. I’ll be blogging a little about these people as breaks from the article at large (too large?).

“PHP is a slow interpreted language”

[TODO]

“HipHop is a [JIT|Rewrite of the core|new language].” (paraphrase)

No. No. And no.

A lot of this came about because the rumor mill was going heavily before the release and I share in some of the blame by not being clear in my reference to it back in December. Here’s an example of how bad it got by January:

“For example, this guy right now is single-handedly rewriting, essentially, the entire site. Our site is coded, I’d say, 90% in PHP. All the front end — everything you see — is generated via a language called PHP. He is creating HPHP, Hyper-PHP, which means he’s literally rewriting the entire language…However, if you went to go talk to him about basketball, you would probably have the most awkward conversation you’d have with a human being in your entire life. You just can’t talk to these people on a normal level.”
Anonymous Facebook employee

I kept that last sentence in because I have had no trouble talking to Haiping about basketball—just keep your discussion focused on the Houston Rockets ;-)

“This is no different than [some other PHP compiler]”

[TODO]
- Roadsend, Quercus, Phc, etc.

Here’s a history lesson: there were three nuclear power plant designs in the United States. The first to be designed and completed was General Electric’s (that’s the one that powered Three Mile Island). The next was designed and completed by Westinghouse (that’s the basis of the light water reactor designs you see talked about recently). The third was designed by General Atomics.

Since my dad worked for General Atomics in graduate school and was a Westinghouse employee for thirty years, I asked him about which design was the best. He said, “I felt the GA design was the safest of the three, but it was never produced because the other designs were completed first. When you deal with nuclear power, nobody wants to take risks.”

HipHop has one thing going for it the others don’t: it’s in use, live, on a site that does billions of monthlies. Its performance gains are there for the world to see.

And when you deal with websites on this scale, nobody wants to take risks unless they have to.

So this means that WordPress will be converted to HipHop?

[TODO]

So this means that there’s no point in converting anything to HipHop.

[TODO]

Marco Tabini
Facebook, Palo Alto, California

Olympus E-P2, Cosina-Voigtländer NOKTON Classic 40mm f1.4 S.C.
1/100sec, ISO320, 40mm (70mm)

Marco Tabini is the co-founder of php|architect. He also does PHP consulting through Blue Parabola, LLC.

About three years ago, he tried to make a PHP script to C extension compiler. You can read his take on HipHop here.

“If I only had HipHop, my scalability problems would have been solved!” (paraphrased)

HipHop is a performance mod, not a scalability one. That’s good because scalability is an architecture problem, not a language one.

But clearly the example I linked is scaled. Let’s take that example:
30 million monthlies on 25 Apache web servers, 6 MySQL databases + ? memcached boxes + squid + PHP on eAccelerator = running hotter than 20%.

They’re a blogging site, so I should use WordPress.com’s numbers, but I won’t. I think we all can agree a social network is more data intensive, so let’s take numbers I am more familiar with:
7 billion monthlies (20 billion dynamic pages monthly) on 120 Apache web servers, 10 Oracle databases + 30 memcached boxes + Netscalers + PHP on APC = nominal load.

Why the roughly 50x difference in server efficiency? I’d hazard a guess that the system, like most systems, is bottlenecked on the data tier. Was it really PHP that was crashing on these Grey’s Anatomy spikes, or was it that PHP was waiting on the database disk and thus all the processes were backed up? Is all your data in RAM somewhere? What’s your memcached hit rate? Facebook’s is at 98%. How about network I/O? Can your servers handle twice the I/O per machine?

(Well, actually, a large difference I didn’t mention is that 80 machines, plus almost all new server outlays, were repurposed for back-end business logic services written in Java, but I’m talking about the webserving stack.)
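For what it’s worth, here is the back-of-the-envelope arithmetic behind that ratio as a small PHP sketch. It uses only the web server counts and monthlies quoted above. The function name monthlies_per_server() is made up for illustration, and the sketch deliberately ignores the database, memcached, and proxy tiers.

```php
<?php
// Monthly page views handled per Apache web server, from the figures quoted above.
// monthlies_per_server() is a hypothetical helper, not part of any library.
function monthlies_per_server($monthlies, $servers)
{
    return $monthlies / $servers;
}

$blog   = monthlies_per_server(30e6, 25);  // the blogging site
$social = monthlies_per_server(7e9, 120);  // the social network

printf("blog site:      %.1fM per web server\n", $blog / 1e6);    // 1.2M
printf("social network: %.1fM per web server\n", $social / 1e6);  // 58.3M
printf("ratio:          %.1fx\n", $social / $blog);               // 48.6x
```

Counting dynamic pages instead of monthlies would widen the gap, but I don’t have that figure for the blogging site.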

Rasmus likes to say:

PHP is rarely the bottleneck.

And Facebook? 250 billion monthlies (a trillion dynamic pages generated?) on [unknown #] Apache webservers, 600 MySQL databases + 700 memcached boxes + at least 3 Netscalers + PHP + APC = no outages due to traffic.

And, this is before HipHop replaced it, and in line with the previous numbers, so why the change to HipHop? Look at those numbers again: to support that across three data centers—you’re hitting a power ceiling. There’s just a physical limitation here that HipHop is supposedly extending.

“Facebook is reported double the performance, without reporting how. I don’t buy it.”

Well, nobody there seemed to dispute the numbers when they were given. They seemed in line with expectation and possibly a bit conservative (no -O3 compiler optimization, debug build). But roughly, the gain comes in the form of both lower total CPU time per request and lower memory use: in other words, they can run more requests per machine and each request finishes quicker.

In practice, the actual gain may vary.

“Facebook sees about a 50% reduction in CPU usage when serving equal amounts of Web traffic when compared to Apache and PHP. Facebook’s API tier can serve twice the traffic using 30% less CPU.”
Claim on Facebook’s website.

Why more gain on the API tier? API requests finish faster than web requests so the improved performance of a libevent-based custom web server starts to add up quickly.
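One way to read those two claims side by side is as work done per unit of CPU. The sketch below is my own arithmetic on the quoted percentages, not anything Facebook published, and throughput_per_cpu() is a hypothetical helper.

```php
<?php
// Relative throughput per unit of CPU: (traffic served) / (CPU used),
// both expressed as multiples of the Apache + PHP baseline.
// throughput_per_cpu() is a made-up name for this illustration.
function throughput_per_cpu($traffic_factor, $cpu_factor)
{
    return $traffic_factor / $cpu_factor;
}

$web = throughput_per_cpu(1.0, 0.5);  // same traffic, 50% less CPU
$api = throughput_per_cpu(2.0, 0.7);  // twice the traffic, 30% less CPU

printf("web tier: %.2fx\n", $web);    // 2.00x
printf("API tier: %.2fx\n", $api);    // 2.86x
```

Read that way, the API tier claim works out to nearly three times the work per CPU, against two times for the web tier.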

How does HipHop achieve its performance gains?

C++ is simply a lower level language than PHP. Because of this, it can be further optimized and the execution occurs at the machine level as opposed to the virtual machine level.

That answer is a cop-out because what you really want to know is how HipHop in particular leverages C++ for more efficient code.

If your pages are really small, the biggest gain will come from the fact that HipHop is multi-threaded on a libevent-based webserver (as noted above). There are a number of other features which I’ll talk about in detail later, but the single largest is the static analyzer. The first stage of the translator, the parser, reads your entire PHP code base, and the second stage tries to find where it can replace dynamically typed variables with statically typed ones. And then it does so if it can.

Think of this snippet of code:

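function fibonacci($count)
{
    // Builds the first $count Fibonacci numbers; the list never has fewer than two entries.
    for ( $l = array(1,1), $i = 2, $x = 0; $i < $count; ++$i )
    {
        $l[] = $l[$x++] + $l[$x];
    }
    return $l;
}

$fibmax = 10; // set here so the snippet runs; in a real codebase it could be set in any file
for ( $j = 1; $j <= $fibmax; ++$j ) {
    $fibs = fibonacci($j);
    printf("fib(%d) = %d\n", $j, $fibs[$j - 1]);
}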

In this case, because of dynamic typing, all the variables must remain variants in the virtual machine. In HipHop’s case it works like this. In the first pass of the static analyzer it can clearly detect that $i and $x are integers, and not variants. In later passes it will realize that $j is unmodified inside its include scope and therefore always an integer. Eventually, it will survey all the files (finding everywhere you’ve set $fibmax or called fibonacci()) and realize that $count and $fibmax can safely be made integers. (When it can’t infer the type, it will default to a catch-all “variant” type which resembles ZVAL in extension writing.)

After finding out that everything is an integer (in this case), the optimizer can replace all operations with native C++ ones. After a number of passes (it could even be optimized to realize the array $l can be unrolled), the result might resemble the C++ version linked above, modulo some name changes to avoid scoping collisions in that language.
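To make the “multiple passes” idea concrete, here is a toy sketch in PHP. It is emphatically not HipHop’s analyzer: infer_types() and its input format are hypothetical, invented for this illustration. Each variable lists everything assigned to it anywhere in the codebase, and the loop keeps making passes until nothing new can be settled, falling back to “variant” for anything it can’t pin down.

```php
<?php
// Toy multi-pass type inference. This is NOT HipHop's analyzer; all names are hypothetical.
// $assignments maps each variable to everything assigned to it anywhere in the codebase:
// a literal type name ('int', 'string') or '$other', meaning "same type as $other".
function infer_types(array $assignments)
{
    $types = array();
    do {
        $changed = false;
        foreach ($assignments as $var => $sources) {
            if (isset($types[$var])) {
                continue;                        // already settled on an earlier pass
            }
            $seen  = array();
            $ready = true;
            foreach ($sources as $src) {
                if ($src[0] !== '$') {
                    $seen[$src] = true;          // literal type
                } elseif ($src !== '$' . $var) { // ignore self-references such as ++$i
                    $dep = substr($src, 1);
                    if (isset($types[$dep])) {
                        $seen[$types[$dep]] = true;
                    } else {
                        $ready = false;          // dependency unknown: retry next pass
                    }
                }
            }
            if ($ready) {
                // Exactly one candidate type wins; anything else becomes a variant.
                $types[$var] = (count($seen) === 1) ? key($seen) : 'variant';
                $changed = true;
            }
        }
    } while ($changed);

    // Whatever could not be resolved falls back to the catch-all type.
    foreach ($assignments as $var => $sources) {
        if (!isset($types[$var])) {
            $types[$var] = 'variant';
        }
    }
    return $types;
}

// Modeled loosely on the fibonacci snippet above.
$types = infer_types(array(
    'count'  => array('$j'),            // fibonacci($j) is the only call site
    'fibmax' => array('int'),           // assumed: only ever set to an integer
    'i'      => array('int', '$i'),     // $i = 2; ++$i
    'x'      => array('int', '$x'),     // $x = 0; $x++
    'j'      => array('int', '$j'),     // $j is initialized to an integer; ++$j
    'mixed'  => array('int', 'string'), // hypothetical variable that gets both
));

foreach ($types as $name => $type) {
    printf('$%s: %s' . "\n", $name, $type);
}
```

Here $count can’t be resolved on the first pass because $j hasn’t been typed yet, which is why a second pass is needed. A real analyzer has to handle far more than this (function return types, arrays, includes), so treat it as a cartoon of the idea.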

The only reason Facebook used this solution was “they’re stuck with all these developers that only know PHP so it was somehow cheaper to engineer a way to change PHP to C++ than it was to retrain developers on C++ (or, probably more realistic, Java).” —Hans Leellelid

As you’ll read later, less than half of Facebook engineers who work in web development know PHP coming in. One of PHP’s attractions is how easily engineers can get up and running in it, no matter where they come from. This allows Facebook to hire “good engineers” without any care for their PHP experience. Anyone who has gone through their hiring process can attest to this.

You’ll also read that since 2007 alone, there have been multiple attempts to leave PHP for another language, two of which were to languages Hans mentioned. Those attempts failed to produce anything because rapid development is itself a feature of the language: the PHP codebase kept moving faster than any rewrite could.

As a language choice, Java might have been a mistake prima facie simply because it would have bottlenecked the servers on memory. C++ can be eliminated as the development language because the time to develop in C++ vs. PHP is at least a full order of magnitude longer. (Hiring, training, and managing thousands of C++ engineers instead of a hundred PHP ones? No thanks.)

The 2x performance gain that Facebook is touting is a big deal to them because of their size: we are talking thousands of machines with over 250 billion monthlies and probably close to 1 trillion dynamic requests. That nets out to an operational cost savings of around $100 million going forward. Honestly, the examples I’ve been reading about on the blogs are tiddlywinks; many PHP sites, like Yahoo, do a lot of traffic and haven’t needed HipHop. It is simply impractical for Facebook to make the architectural changes needed within userspace PHP, for the same reason it is impossible to recode the site in another language: too much code has been written and continues to be written. Facebook has hundreds of engineers churning out new features, fixing bugs, and pushing releases multiple times a day.

“Had Facebook started with Ruby, they’d have had JRuby and not needed this hack.” (paraphrased)

Bullshit.

In a synthetic benchmark, Ruby MRI to JRuby nets a 3x performance gain. Sounds impressive, huh? That’s until you realize that JRuby still performs 8x slower than Java on those same benchmarks. Synthetic benchmarks are just that: synthetic.

In practice, Facebook chose HipHop not over PHP, but over PHP + APC + performance patches, based on real-world numbers. Those alone, I would hazard, outperform JRuby in real life. After all, Facebook must have gotten a larger gain from that approach alone, without switching to HipHop, just to keep up with their growth from 2006 to 2009, apart from building out a new datacenter.

Furthermore, much of the gain that HipHop provides is in the static analyzer (as mentioned above). That kind of gain simply isn’t available unless the language is statically typed (Java is; Ruby, PHP, and Perl are not). It’s not like you suddenly have a statically typed setup when you run JRuby, and I guarantee that without analyzing the entire codebase ahead of time (not just at runtime), the type inference will be less efficient.

Finally, every language choice has consequences far beyond the simple ones. In the case of Facebook, PHP meant it’s easy to hire developers and to “move fast and break things,” but it also encouraged a monolithic architecture with a deep-include hierarchy that is most of what HipHop tries to work around. Fine, you may dodge the last with Ruby, but you’ll also have adopted something like the Rails framework. Web frameworks, even efficient ones, are about 5x slower than minimalist approaches in the same language, and Ruby on Rails is not an efficient framework. (They get away with being so inefficient because most sites are bottlenecked on the data tier, which has already been optimized in Facebook’s case.) You just trade one set of development headaches for a different set.

Twitter is a tiny application in terms of engineering. And despite rumors to the contrary, go ask Twitter whether they really have no Rails left there.

You dance with the one who brung you.

What’s missing in HipHop that’s in PHP?

There are three major incompatibilities of HipHop, and a host of minor ones:

First, there are language features that aren’t supported and will never be supported. The example you hear bandied around is eval(). Since the static analyzer has to survey the entire PHP codebase and features like eval can dynamically write code, they cannot be supported by HipHop—and probably will never be supported.

Second, extensions must be recoded to work in HipHop. Facebook has recoded the extensions they use internally. To the extent that the PHP versions are open source, they’ll roll those out with HipHop. This is also huge. For instance, on PECL alone there are currently 238 extensions, and each one may have dozens of functions associated with it. Only a tiny fraction of these have been ported, and you can count on many of them not exhibiting exactly the same behavior as in PHP (as they have to be ported by hand).

Moreover, this task is not as easy as Facebook intimates. PHP’s core is multi-thread capable, but the single biggest hurdle blocking PHP from being a multi-threaded language is that the extensions built on PHP depend on libraries which are not thread-safe. Since HipHop is multi-threaded, those libraries need to be made thread-safe, which may not always be feasible. Stas goes into this in detail.

Stanislav Malyshev
Facebook, Palo Alto, California

Olympus E-P2, Cosina-Voigtländer NOKTON Classic 40mm f1.4 S.C.
1/100sec, ISO200, 40mm (70mm)

Stas is one of Zend’s PHP core developers.

Read his take on HipHop.

Third, there are language behaviors and features that aren’t supported or incorrectly supported. Here’s how the process at Facebook worked. One day someone at Facebook noticed that HipHop had a runaway process and was dying, “Hey, Haiping, where did max_execution_time and set_time_limit() go?” Haiping reads PHP’s online manual, “Oh, I don’t have that.” “Well, if a script goes on forever, it will take up all your resources and take down your server.” “I guess that’s a good thing to support then.” And it gets supported.

There are simply too many functions in PHP and quirky behaviors for all of them to be supported as-is. The day before I was at Facebook, David Recordon had tried getting HPHPi to accept the WordPress codebase and it had failed—in one case simply because phpversion() wasn’t returning anything!

Having said that, this is exactly the kind of thing that’s “easy” to fix: if you find it, you can report the bug and the HipHop team will patch it. I put this in quotes because experience has shown that this well is infinite. The two will never achieve 100% compatibility, and in the case of “echo 08;” linked above, probably never should.

There are also a number of minor incompatibilities.

Related to the above is that the server is not Apache. LAMP is an integrated solution and Apache, or whatever server you are using, is part of that. PHP tries to play friendly with that. This server uses libevent, which makes it far more efficient than Apache prefork, but it is probably missing features you’ve become accustomed to. Not only that, but this approach implies multi-threading and the can of worms mentioned above.

Another is that the PHP version HipHop targeted was 5.2.5, and later 5.2.6. PHP is currently at 5.3.1, but the new language features of 5.3 are not supported (yet).

Also, there is currently no Windows support. That’s because Facebook doesn’t deploy on Windows.

“I’m interested in what language features besides eval() are not supported. They give eval() as an example but imply there are others.”—Jenn (in comments below)

Well, that list is infinite, as I just explained, but the question is really asking which things won’t ever be supported. Here is a list:

  • eval() is not supported.
  • create_function() is not supported.
  • Dynamic scripting is not allowed. That’s where you use PHP to create a PHP file. The most common example is using Smarty to compile a file for performance. All template files would have to be precompiled or the Smarty include() hook would crash HipHop.
  • preg_replace() with the e modifier (execute PHP code on match) is not supported.
  • I think $$ is not supported, for the same static type optimizer reason above. call_user_func(), however, is.
  • Order-dependent symbol lookups (checking whether something exists and then doing something) won’t behave the same way. This is because of the way the static analyzer iterates over everything in multiple passes.
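If you wanted your own code to steer clear of the first few items on that list, the usual substitutions look something like the sketch below. The PHP functions used are standard ones, but the helper names (double_it(), double_match(), set_option()) are made up for illustration, and whether any particular HipHop build accepts this exact code is an assumption I haven’t verified. I’ve avoided closures on purpose, since those are a 5.3 feature.

```php
<?php
// Instead of create_function('$n', 'return $n * 2;'): a plain named function.
function double_it($n)
{
    return $n * 2;
}

// Instead of preg_replace() with the e modifier: preg_replace_callback()
// with a named callback that receives the match array.
function double_match(array $m)
{
    return (string) double_it((int) $m[0]);
}

// Instead of variable variables ($$name = $value): an explicit array keyed by name.
function set_option(array &$options, $name, $value)
{
    $options[$name] = $value;
}

$doubled = array_map('double_it', array(1, 2, 3));
$text    = preg_replace_callback('/\d+/', 'double_match', 'buy 2 get 10');

$options = array();
set_option($options, 'theme', 'dark');

// call_user_func() with a function name is on the "supported" side of the list above.
$result = call_user_func('double_it', 21);

echo implode(',', $doubled), "\n";  // 2,4,6
echo $text, "\n";                   // buy 4 get 20
echo $options['theme'], "\n";       // dark
echo $result, "\n";                 // 42
```

The common thread is that every function and every name is visible in the source ahead of time, which is what a whole-codebase analyzer needs.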

What about security?

Well most security is a matter of process, not language. Usually the biggest, easiest exploits are the stuff outside the engine itself—for instance in userspace.

In any case, this is an issue, which Ilia discusses better than I ever would. You can also check out echolibre.

Ilia Alshanetsky multitasks
Ilia Multitasky
Facebook, Palo Alto, California

Olympus E-P2, M.ZUIKO Digital ED 14-42mm 1:3.5-5.6
1/60sec @ ƒ5.6, ISO1000, 14mm (24mm)

Ilia Alshanetsky was the release manager of PHP 5.2 and is a web security expert. You can read his take on HipHop here.

Can’t they just add support for [feature that isn’t supported] to HipHop later? Can’t they turn it into a JIT later?

The complete code analysis is key to the performance gain.

Note that just because I said “probably never” doesn’t mean “maybe.” Think of it this way: the Facebook engineers already wrote a lot of code to get HipHop to run at all, and it’s still a leaky boat. There is a reason why something as large as dynamic scripting wasn’t supported. At the point where you support it, that code will have lost any performance gains that attracted you to HipHop as a solution. Doubly so, since you’d have some very costly context switch at that point.

That’s not to say the whole thing is impossible, just unlikely to get there from here.

How do I test if my code would run in HipHop without recompiling?

Facebook will be releasing an interpreter called HPHPi that is supposed to be able to run the code without compiling it.

This is probably the least ready part of HipHop, so details are very sketchy.

“My main concern about HPHPi is the xdebug compatibility. I really hope the team at facebook made this so seamlessly integrated that we won’t notice a difference in our development process” —David

It is my understanding that HipHop doesn’t have xdebug compatibility, since they use xhprof internally to fill a similar role. Therefore HPHPi won’t have such a thing.

Facebook would argue that you could always make a HipHop XDebug extension (how?). Or, you can still develop in PHP+Zend (preferred), then run it through HPHPi to ensure compatibility with HipHop.

About tychay

light writing, word loving, ❤ coding