Eternal Pointlessness of the PHP Compiler

Out of nowhere, the tired old dog of PHP compilation rears its ugly head.

Here is a recent e-mail exchange that puts it in its place (along with some information about code browsers and documentors and a lot of confusion on my part). I don’t think anyone is going to mind me posting these, since the only one who comes off as an idiot is me.


Me, Myself and I

Do you know any good PHP code browser? Like a web page where I can click on a function call and it actually jumps to the function’s declaration? Also, is there a PHP -> C/C++ codegen/compiler, or any effort on that?

HP,

Happy birthday. I hope all is going well at Facebook. :-)

The Zend IDE will do click-on-function-code-jump and it’s quite convenient (I’m told by my devs, I’m old school). It can’t do dynamic function calls or eval()’d function calls or variable-variables, but for the basics it works fine. I don’t know if Komodo can do this, so I’m cc:ing this to the head of ActiveState’s tech support, Jeff Griffiths, so he can answer that. Since I paid for it, I’m sort of interested in the answer myself.

There was an effort about three years ago on a PHP->C/C++ extension code generator by a number of developers. I think Marco Tabini of php|architect worked on one, or if not, he knows who was trying to do this. I added Marco to the cc: so maybe he can provide the status of those things.

There is no C++ codegen. Getting C++ extensions to work in PHP is a chore, IMO. I’d stick with C. I know you love C++, but trust me on this.

This PEAR package is just used to translate an XML definition file into C extension stubs, which is not what you are looking for. Probably the authority on this will have to be Sara Golemon, who is also on the cc. She has a book called Extending and Embedding PHP which you’ll need to look at. I included her Amazon link.

I bet Brian or Lucas has a copy somewhere.

Finally, a word of warning. When this thing was all the rage three years ago (like a PHP JIT, this sort of stuff tends to go in cycles), I talked with George about this. He’s the guy who co-wrote APC, which you use at Facebook. He pointed out that PHP is filled with edge cases that you wouldn’t think anyone would be stupid enough to exploit until you find out they do. “All APC does” is cache code, and yet bugs are still being fixed in it to this day: ask your coworker Brian Shire. A codegen would have to require a “color between the lines” approach to PHP programming in order to actually work, though now with PHP 5 the compatibility set will be much larger.

In general, I’m told Yahoo! maintains 100+ PHP extensions in their codebase. I’d like to believe that this has more to do with enabling special bindings for stuff PHP is bad at than with improving the performance of stuff PHP does just fine. I think that every language has a philosophy that it is important to understand, because it points to both its strengths and weaknesses. PHP’s philosophies are “shared nothing” and “glue language,” which create a strength in its simplicity, flexibility, and scalability, but a glaring weakness in performance and consistency. Codegen is inherently a performance thing, which fights against the core nature of PHP. I don’t think it’s wise to go against the core nature of any language, but as always, I wish you luck, hope to be proven wrong, and am curious to hear what the state of this is.

Jeff Griffiths, ActiveState Komodo

*puts demo-guy hat on*

Komodo IDE 4.0 has ‘go to definition’ as a new feature, so you can select a function and Komodo will go to the function where it is defined, assuming Komodo knows where that is. We’re releasing Komodo 4.0.3 this week which will introduce ‘include everything’ semantics for projects and selected directories, which will greatly improve this kind of feature for people using auto-load mechanisms like Cake or Drupal. Komodo formerly relied on include statements to find stuff.

Marco Tabini, php|architect

My attempt was just a proof-of-concept attempt at demonstrating that it was possible to convert PHP into C (not C++—see below). I (and some others) proved that it was possible, and we left it at that, mostly because we realized that it was pointless—the performance gain of straight PHP->C conversion was not significant enough to justify the loss in maintainability and the potential for incompatibilities between the compiled code and its PHP equivalent.

There is, however, a (now open-sourced) compiler called Roadsend that claims to be able to compile PHP scripts directly into native binaries on various platforms. I don’t know much about it, but you might want to take a look at it.

> There is no C++ codegen. Getting C++ extensions to work in PHP is a chore, IMO. I’d stick with C. I know you love C++, but trust me on this.

That is a very generous assessment of the situation :-)

I am not an expert on internals, but IMHO you’re better off focusing on using PHP as a prototyping environment for those extensions that you eventually want to move to C or C++. A code generator won’t give you that much of a performance boost (because it will still have to deal with PHP’s “magic,” like type juggling) and will introduce all sorts of probable incompatibilities that are bound to come back and bite you eventually.

Plus, don’t forget that Y! counts Rasmus, Andrei and Sara in their ranks—which is enough to hack any part of PHP into submission :-)

Haiping, Facebook

Thanks, Marco. I thought I was just asking Terry some random questions, but he’s so resourceful, and I got so much help from experts, that I’m really thankful here.

  1. For the code browser, I kinda want to stay away from any IDEs, just because everyone has their own way of coding. All I need is a program that I can run to generate static HTML pages linkified by functions and references. That would give us a web site that everyone can jump on to browse code without installing any IDE. I’m looking into Eclipse, as they have a Java PHP parser and formatter that seems to work for me for this purpose. Is anyone working on that project, or are there any comments on whether I should or should not go for that? Perhaps there are some other PHP parser packages that I can use?
  2. The PHP->C/C++ conversion thing is actually not for performance or for replacing any of the PHP code we have. I only need it to borrow some simple PHP code so I can call it from my C/C++ programs. Embedded PHP has some memory problems, due to our own coding (not PHP itself). But I can read more on that to see if there is a better way. But Marco, is your proof-of-concept code open sourced? Is it possible for me to have a copy of that? Thanks.

Jeff Griffiths

Ah, ok. “Code Browser” seems to be overloaded. I think you want to look into either doxygen or more likely PHPDocumentor.

Komodo uses a Scintilla-related component called SilverCity, if you wanted something to call from C++ or Python code.

Haiping

PHPDocumentor doesn’t deal with the real code, does it? It only works on comments, right?

Marco Tabini

…code browser…

Apologies for my abysmal ignorance here, but isn’t that something that phpDoc would be able to do?

…converter code sample…

I would have no problem giving it to you—except it’s long gone :-(

Have you looked at the source for Roadsend? Maybe you can use that as a starting point.

Me

PHPDocumentor tokenizes the source code (I think it uses token_get_all() to do it) when processing. The javadoc comments assist in the documentation (for instance, by declaring type information on the input parameters), but PHPDocumentor will process uncommented code just fine.

8 thoughts on “Eternal Pointlessness of the PHP Compiler”

  1. Hartmut Holzgraefe

    > There is no C++ codgen. Getting C++ extensions to work in PHP is a chore, IMO. I’d stick with C. I know you love C++, but trust me on this.
    > This PEAR package is just used to translate a XML definition file into a C extension stubs, not what you are looking for.

    Small correction: CodeGen_PECL can actually generate both C and C++
    extension code. The generated code skeleton will still be procedural and
    mostly look like plain C in C++ mode, but it will allow you to put C++
    code into the function bodies and to link against C++ libraries just fine …

  2. SantosJ

    I think a 2x increase in performance in 99% of the test cases is a big deal when you have a massive loop of data processing.

    The datatype conversion isn’t a big deal nor would it greatly decrease the performance of any application. I say so, because type conversion and type checking is widely used in C/C++ development. It is more, well like you said, for maintenance. Using Templates in C++ does allow for some dynamic typing, but not in all of the cases PHP handles. RTTI is another technique, but it is highly advanced, which would also be a (-1) for maintainability.

    It isn’t that C++ and PHP Extensions are a hassle, it is just that PHP doesn’t have any built in support for C++ classes, RTTI, Templates, Namespaces. Interestingly, the PHP internals uses the Object (Class) pattern throughout the core and would be better fit using C++ classes for the sake of maintainability, but not so much for speed.

    I think (and I’m sort of flaming here again, so shame on me), that the core developers have something against C++, but it would be so much easier for them and others to get into the Engine and Extension building. One reason could be for speed, virtual methods do incur performance costs that the C objects do not (in part because virtual methods aren’t used). Virtual methods do not need to be used however.

    I think the difference is not so much in web scripts, where you don’t do a lot of processing, where the database and file access account for most of the overhead, and where you will cache the page in most cases.

    You would see the difference more in cron jobs, where you can save 5 seconds, 10 seconds, maybe even a few minutes, on computation-heavy tasks. Okay, so should you do all of that in C/C++ and have the maintainability issue, or should you keep it in PHP and compile or just-in-time it? Okay, so it also depends on how crappy the algorithm is, but sometimes, like in AI or big math-heavy equations, you have no other choice.

  3. Greg Beaver

    You might want to tell him that phpDocumentor’s --sourcecode=on option will generate xref-style source code, which is what he is looking for (clickable function names, object members, etc.).

  4. Stanislav Malyshev

    I’m actually surprised compiling the PHP code gives only 2x performance. I would expect more. However, translating PHP to good C code, which could be optimized by something like gcc, is indeed a very non-trivial task. The first problem, as was correctly pointed out, is the edge cases, which are many and strange in PHP. The second problem is the dynamic nature of the PHP language, meaning eval, include, etc. performed at runtime make the compiler’s work very problematic, especially if we are talking not about an isolated function set but about a full application. Also, I see Roadsend in fact does not use the PHP engine to work with PHP code, meaning there’s no guarantee their compiler understands PHP the same way the PHP engine does. And since there’s no spec for PHP (unfortunately), there’s little way to ensure the code won’t break. And indeed, the Roadsend docs show a lot of PHP functionality is not available, so the holy grail of the performance-hungry PHP developer, to take their code and compile it into a binary, is still more of a dream than reality.
    Also, one of the strengths of PHP is its function libraries, and again, it’s not clear how well the compiler would work with them, especially given that it doesn’t have the real engine these functions rely on. Judging from their docs, they have to manually fit the extensions to their PHP version, meaning their users can never rely on the whole wealth of extensions available to the “real PHP” users. That of course reduces the value of the compiler significantly, and I believe it is the wrong way to go. The right way would be to make extensions work with the compiled code. This is a very non-trivial task, however.
    In summary, it’s very hard to do it right and of little use to do it wrong. Nevertheless, I wish Roadsend all luck in their interesting project and we’ll see where it goes…
