Two and a half years ago, when first wrestling with the Tagged codebase, I asked Andrei about replacing all my PHP includes with __autoload. I was told in no uncertain terms not to do this.
It’s not that Andrei is wrong in his admonition. Far from it! For reasons that I don’t quite care to know, there are caching and lookup optimizations that APC cannot do when it has to switch context to run __autoload. But the problem in practice was two-fold:
- The company was bug-driven, and the easiest way to eliminate an “Undefined class” error was to go into the preinclude script and include the missing file. Voilà! Problem solved, at the expense of code bloat. (This bug happens often when deserializing nested objects from cache.)
- There are slowdowns when you use include_once where include would do, or when you don’t use the full path in your include, or when you construct your full path from symbols. How many of us do this? Heck, I’m still trying to get used to the idea of include_once and require_once. Ahh the days when I’d have to write symbols with every include file!
- More on the previous point: if you have deep dependencies and don’t use a FrontController pattern, you’re going to have to use require_once(), which will get executed multiple times. An __autoload only gets executed once per class.
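The "only once" behavior in the last point is easy to check. Here is a minimal sketch, assuming a modern PHP where spl_autoload_register() stands in for the old __autoload() function (which was later deprecated and then removed in PHP 8). The class name Widget and the eval() stand-in for a real require are hypothetical, purely for illustration:

```php
<?php
// Count how often the autoloader actually runs.
$autoloadCalls = 0;

spl_autoload_register(function (string $class) use (&$autoloadCalls): void {
    $autoloadCalls++;
    // Stand-in for `require`ing the class file: define an empty class inline.
    // A real autoloader would map $class to a file path instead.
    eval("class {$class} {}");
});

new Widget(); // class unknown, so the autoloader fires
new Widget(); // class already defined, so the autoloader is skipped

echo $autoloadCalls, "\n";
```

A require_once() sitting at the top of a file, by contrast, is evaluated every time that line runs, even when it ends up doing nothing.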
At a certain point, optimization gives way to convenience and practicality.
For Tagged, that point came when PHP would allocate 12MB/80ms to say “hello world”, 20MB/465ms to display the homepage, and 22MB/1965ms/1207ms to return my profile page.
After the rewrite it takes 0.3MB/3ms to say hello world and 3.7MB/109ms to return my profile page.
This was done by wrapping all functionality into classes, rigidly naming those classes so they can easily be loaded on use, writing bootstrapping code so it can be backward compatible (you have to be careful with any nested defines as they’re not auto-included), and then slowly migrating parts of the site over as new parts are being added in a new framework.
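To make "rigidly naming those classes" concrete, here is a rough sketch of the idea, not Tagged's actual code. It assumes a hypothetical convention where underscores (or namespace separators) in a class name map to directories under a lib/ folder; the function name class_to_path() and the class names are made up for the example:

```php
<?php
// Hypothetical convention: class "Tag_User_Profile" lives in
// <base>/Tag/User/Profile.php. The names and layout are assumptions.
function class_to_path(string $class, string $baseDir): string
{
    return $baseDir . '/' . str_replace(['_', '\\'], '/', $class) . '.php';
}

// Load a class file the first time the class is used, and only then.
spl_autoload_register(function (string $class): void {
    $path = class_to_path($class, __DIR__ . '/lib');
    if (is_file($path)) {
        require $path;
    }
});

echo class_to_path('Tag_User_Profile', '/srv/app/lib'), "\n";
```

Because the path is derived from the name alone, nothing needs to be added to a preinclude script when a new class shows up. The flip side is that anything that isn't a class (a nested define, say) never triggers the loader.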
But what if you don’t have two years?
I think at that point, one should go the Facebook route and employ an obscure trick called Lazy Loading in APC.
Brian Shire does an excellent job of describing it, but I’ll just reword it.
PHP compiles your scripts into virtual machine instructions known as zend opcodes. These opcodes are grouped into oparrays indexed by function or script name. So if you have an include with 10 functions defined in it, that’d be 11 op arrays.
APC then stores these oparrays into shared memory so that the Zend Engine doesn’t have to do this compilation process over and over again. It simply copies the oparrays into execution space every time an include occurs that has already been compiled.
The problem comes when, say, you only need one function of the 11 oparrays you just compiled. The whole schmeer is copied to userspace before one executes. Lazy Loading gets rid of this.
I should probably tell Tagged again about this option. But then again, I’ve been trying for a year and a half to get them to set apc.stat=off, so why bother?
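For reference, both knobs live in php.ini. This is a sketch assuming an APC 3.1.x build that ships the lazy loading directives, which, as far as I recall, the APC documentation still marked experimental; check your version before relying on them:

```ini
; Assumed: APC 3.1.x with lazy loading support compiled in.
apc.lazy_functions = 1   ; load functions from the cache only when first called
apc.lazy_classes = 1     ; same idea for classes
apc.stat = 0             ; stop stat()ing every script on each request
```

With apc.stat off, APC no longer notices changed files, so a deploy has to clear the cache or restart the web server.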
I’d note a tiny error: when referencing an awesome article showing the inclued graphs of various frameworks, Shire implies that the more complex the include hierarchy, the more it might benefit from Lazy Loading. This isn’t necessarily the case; it just practically is the case. Same argument as above as to why.
BTW, here is the inclued for saying Hello World in Tagged’s framework: