The Zend Engine is my bitch

I don’t read php-internals anymore because I’m partial to getting work done, but there was an interesting question the dealmac developer posted. Basically dealmac, like my current employer, has a large array structure in a PHP file somewhere that is included on every page. It’s abusing memory.

Brian then noticed that by using var_export() (a function I keep forgetting exists), he was able to cut the memory usage from 5MB to 1.2MB. Storing it serialized reduced the memory usage to 20%, but with a performance penalty of double the load time.
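For a rough way to get numbers like these yourself, here is a minimal sketch that measures what a single include costs in memory. It is not how Brian measured; the helper name include_cost() and the one-line stand-in file are made up for illustration, and the figures will vary with the PHP version and with any opcode cache.

```php
<?php
// Hypothetical helper: bytes of memory that including $file adds to this request.
function include_cost($file)
{
    $before = memory_get_usage();
    include $file; // compiles the file and runs it in this function's scope
    return memory_get_usage() - $before;
}

// Stand-in data file so the sketch runs on its own; point it at your real array file.
$file = tempnam(sys_get_temp_dir(), 'arr');
file_put_contents($file, "<?php\n\$CATEGORIES[202]['id'] = '202';\n");

$bytes = include_cost($file);
printf("%d bytes\n", $bytes);
unlink($file);
```

The delta covers both the compiled code and the data it builds, so it cannot tell you which of the two is the expensive part.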

How could a 300K file use up so much space in memory?

The way you answer this is to use vld to make the Zend Engine your bitch.


What VLD is

Vulcan Logic Disassembler was written by Derick and Andrei back in the dark ages of PHP 4 when opcode encryption was in vogue. Nobody remembers it exists which is a pity because it’d make things like PINT a whole lot easier to swallow (pardon the pun).

It’s a PHP extension that does something real simple: it spits out the Zend opcodes of a script.

Huh?

A simple explanation of the way PHP runs is that your PHP scripts are parsed into symbols, and those symbols are turned into a set of virtual assembly language instructions called opcodes. The opcodes are stored in oparrays, which are indexed by the filename (or function name, etc.).

The Zend Engine is then the virtual machine that executes the opcodes in the main oparray. If it hits certain events like an include() or eval(), it starts up the PHP engine to repeat the process and add to the oparrays.

All of those oparrays are stored in memory during execution, and are sometimes cached by code caching mechanisms like Zend Platform, APC, XCache, what-have-you.
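The first step, source text into symbols, can be seen from userland with token_get_all(). This is a minimal sketch, assuming the tokenizer extension is loaded (it normally is); the one-line source string is a made-up stand-in for the real file. The opcode step is not reachable this way, which is where VLD comes in.

```php
<?php
// Made-up one-line script standing in for a line of the real array file.
$source = '<?php $CATEGORIES[202][\'id\'] = \'202\';';

$names = array();
foreach (token_get_all($source) as $token) {
    // Tokens are either [id, text, line] arrays or bare one-character strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), "\n";
```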

If you understand this then you can answer two interview questions I always ask candidates (and they always bomb).

12 lines later…

So I pasted a segment of the code sample Brian gave into a file called “big_array_array.php.”

I then wrote a script called “big_array_make.php” to make the other two files I needed.
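The script itself isn't shown here, so what follows is a hypothetical reconstruction rather than the original: a minimal sketch of a generator that takes the array and writes out a var_export() version and a serialize() version. The two-element stand-in array, the variable names, and the choice of the temp directory are assumptions; only the output file names come from the listings further down.

```php
<?php
// Stand-in for the data big_array_array.php would define.
$CATEGORIES = array();
$CATEGORIES[202]['id'] = '202';
$CATEGORIES[202]['name'] = 'clothing & accessories';

// Variant 1: the whole structure as one array literal.
$exportSource = "<?php\n\$CATEGORIES = " . var_export($CATEGORIES, true) . ";\n";

// Variant 2: one string constant handed to unserialize() at load time.
// var_export() on the string takes care of quoting and escaping it.
$serializeSource = "<?php\n\$CATEGORIES = unserialize("
    . var_export(serialize($CATEGORIES), true) . ");\n";

// Written to the temp directory so the sketch does not clobber anything.
$exportFile = sys_get_temp_dir() . '/big_array_varexport.php';
$serializeFile = sys_get_temp_dir() . '/big_array_serialize.php';
file_put_contents($exportFile, $exportSource);
file_put_contents($serializeFile, $serializeSource);
```

Including either generated file should leave you with the same $CATEGORIES the assignment version builds, as long as the data holds no references or objects.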

Now run vld on the stuff (remember to mod your php.ini file to load vld)

test$ php -dvld.active=1 big_array_array.php 2>output1.txt
test$ php -dvld.active=1

Here is a relevant sample of the array segment that writes two elements

filename:       /home/tychay/compile/utils/big_array_array.php
function name:  (null)
number of ops:  202
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   3     0  FETCH_W                                          $0, 'CATEGORIES'
         1  FETCH_DIM_W                                      $1, $0, 202
         2  FETCH_DIM_W                                      $2, $1, 'id'
         3  ASSIGN                                               $2, '202'
   4     4  FETCH_W                                          $4, 'CATEGORIES'
         5  FETCH_DIM_W                                      $5, $4, 202
         6  FETCH_DIM_W                                      $6, $5, 'name'
         7  ASSIGN                                               $6, 'clothing+%26+accessories'

4 opcodes to write each element, 3 registers created per element…

Here is a relevant sample of a var_export() segment that writes the same two elements

filename:       /home/tychay/compile/utils/big_array_varexport.php
function name:  (null)
number of ops:  80
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   4     0  INIT_ARRAY                                       ~1, '202', 'id'
   5     1  ADD_ARRAY_ELEMENT                                ~1, 'clothing+%26+accessories', 'name'

1 opcode per array element, 1 register for entire thing, <1 reference/element.
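To tie the two listings back to source, here is a sketch of the two shapes of PHP that would plausibly produce them. This is inferred from the opcodes, not copied from Brian's file; the variable names are made up, and the string is shown as plain text rather than in the URL-encoded form VLD prints.

```php
<?php
// Shape 1: one assignment statement per element
// (the FETCH_W / FETCH_DIM_W / FETCH_DIM_W / ASSIGN pattern).
$byAssignment = array();
$byAssignment[202]['id'] = '202';
$byAssignment[202]['name'] = 'clothing & accessories';

// Shape 2: a single array literal, which is what var_export() emits
// (the INIT_ARRAY / ADD_ARRAY_ELEMENT pattern).
$byLiteral = array(
    202 => array(
        'id' => '202',
        'name' => 'clothing & accessories',
    ),
);

var_dump($byAssignment === $byLiteral); // bool(true)
```

Same data at the end, very different amounts of work handed to the virtual machine. The exact opcodes depend on the PHP version, so run VLD on your own build before trusting the counts.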

Here is the entire oparray for the serialize version:

filename:       /home/tychay/compile/utils/big_array_serialize.php
function name:  (null)
number of ops:  5
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   1     0  SEND_VAL                                             'a%3A13%3A%7Bi%3A202%3Ba%3A15%3A%7Bs%3A2%3A%22id%22%3Bs%3A3%3A%22202%22%3Bs%3A4%3A%22name%22%3Bs%3A22%3A%22clothing+%26+accessories%22%3Bs%3A6%3A%22parent%22%3Bs%3A1%3A%220%22%3Bs%3A10%3A%22standalone%22%3Bs%3A0%3A%22%22%3Bs%3A11%3A%22description%22%3Bs%3A8%3A%22clothing%22%3Bs%3A10%3A%22precedence%22%3Bs%3A1%3A%220%22%3Bs%3A9%3A%22preferred%22%3Bs%3A1%3A%220%22%3Bs%3A10%3A%22searchable%22%3Bs%3A1%3A%221%22%3Bs%3A7%3A%22product%22%3Bs%3A1%3A%221%22%3Bs%3A10%3A%22aliased_id%22%3Bs%3A1%3A%220%22%3Bs%3A4%3A%22path%22%3Bs%3A22%3A%22clothing+%26+accessories%22%3Bs%3A13%3A%22url_safe_name%22%3Bs%3A20%3A%22clothing-accessories%22%3Bs%3A11%3A%22child_count%22%3Bs%3A1%3A%226%22%3Bs%3A9%3A%22childlist%22%3Ba%3A13%3A%7Bi%3A0%3Bi%3A202%3Bi%3A1%3Bi%3A2%3Bi%3A2%3Bi%3A275%3Bi%3A3%3Bi%3A4%3Bi%3A4%3Bi%3A481%3Bi%3A5%3Bi%3A446%3Bi%3A6%3Bi%3A454%3Bi%3A7%3Bi%3A436%3Bi%3A8%3Bi%3A205%3Bi%3A9%3Bi%3A227%3Bi%3A10%3Bi%3A203%3Bi%3A11%3Bi%3A280%3Bi%3A12%3Bi%3A204%3B%7Ds%3A8%3A%22children%22%3Ba%3A12%3A%7Bi%3A2%3BN%3Bi%3A275%3BN%3Bi%3A4%3BN%3Bi%3A481%3BN%3Bi%3A446%3BN%3Bi%3A454%3BN%3Bi%3A436%3BN%3Bi%3A205%3BN%3Bi%3A227%3BN%3Bi%3A203%3BN%3Bi%3A280%3BN%3Bi%3A204%3BN%3B%7D%7Di%3A2%3BR%3A31%3Bi%3A275%3BR%3A32%3Bi%3A4%3BR%3A33%3Bi%3A481%3BR%3A34%3Bi%3A446%3BR%3A35%3Bi%3A454%3BR%3A36%3Bi%3A436%3BR%3A37%3Bi%3A205%3BR%3A38%3Bi%3A227%3BR%3A39%3Bi%3A203%3BR%3A40%3Bi%3A280%3BR%3A41%3Bi%3A204%3BR%3A42%3B%7D'
         1  DO_FCALL                                      1  $1, 'unserialize', 0
         2  FETCH_W                                          $0, 'CATEGORIES'
         3  ASSIGN                                               $0, $1
         4  RETURN                                               1

5 opcodes, 2 registers for the entire structure.

I wonder if ZendPlatform optimizes the first case. Does anyone know if the output of VLD is valid if ZendOptimizer is on?

Passing note

I’d like to thank Brian for pointing out var_export(). It’s surprisingly more efficient than I thought and possibly a nice alternative to serialize for special cases (no references, no objects). I’ll have to use the new Benchmark2 I wrote (but haven’t had time to blog) to decide.
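The "no references" caveat is easy to check. This is a minimal sketch on a made-up two-element array, using eval() only to read the var_export() output back in: the reference survives a serialize() round trip but is flattened into a plain copy by var_export().

```php
<?php
// Made-up array where 'alias' is a reference to 'id'.
$original = array('id' => '202');
$original['alias'] = &$original['id'];

// Round trip through serialize(): the reference is recorded and restored.
$viaSerialize = unserialize(serialize($original));
$viaSerialize['id'] = 'changed';

// Round trip through var_export(): only the values are written out.
$viaExport = eval('return ' . var_export($original, true) . ';');
$viaExport['id'] = 'changed';

var_dump($viaSerialize['alias']); // string(7) "changed"
var_dump($viaExport['alias']);    // string(3) "202"
```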

9 thoughts on “The Zend Engine is my bitch”

  1. VLD on the actual categories array yields.

    code (334k file)
    —-
    33,160 opcodes
    33,157 registers
    2.7MB disassembled output file

    export (244k file)
    ——
    9,606 opcodes
    2,080 registers
    2,083 temporary registers
    830K disassembled output file

    ser (220K file)

    5 opcodes
    2 registers
    360K disassembled output file

    (Obviously not all instructions run at the same speed. The register space should be freed at the end of the execution of the oparray, but the oparray itself will stay resident in memory for the entire execution.)

  2. VLD should work with the optimizer on, however, it changes some opcodes so that it’s not always 100% correct (as opcode types are added for example).

  3. @gopal:

    Thanks. I read your blog, or at least references to your blog. For instance, just the other day I was considering whether your now #define’s-for-real extension could be used on our site. 😛

    As for the VLD thing: there is actually a nice little tutorial on building your own VLD in George’s Advanced PHP Programming book. “Not that anyone reads it.”

  4. VLD worked well before zendOptimizer version 3.0, but after 3.0 the optimizer does not work if VLD is installed.

  5. So, the point is that you were filling an array inefficiently. Someone might learn PHP, others might look at the var_export() output… but why the hell would anyone use VLD for such a purpose?! Funny.

  6. VLD shows you how the PHP compiler tells the virtual machine to execute a set of commands. “Learning PHP” won’t tell you the answer, and the answer itself will vary from PHP version to version. For all I know, since the article was written, the compiler has changed to be more efficient for assignments and serialize may benchmark much faster.

  7. We can say through Zend fastest execution. We can encrypt our code for security point of view. Server fast response and execution of script fast and optimize
