The Zend Engine is my bitch

I don’t read php-internals anymore because I’m partial to getting work done, but there was an interesting question the dealmac developer posted. Basically dealmac, like my current employer, has a large array structure in a PHP file somewhere that is included on every page. It’s abusing memory.

Brian then noticed that by using var_export() (a function I keep forgetting exists), he was able to cut the memory usage from 5MB to 1.2MB. Storing it serialized reduced the memory usage to 20%, but the performance penalty was double the load time.

How could a 300K file use up so much space in memory?

The way you answer this is to use vld to make the Zend Engine your bitch.


What VLD is

Vulcan Logic Disassembler was written by Derick and Andrei back in the dark ages of PHP 4 when opcode encryption was in vogue. Nobody remembers it exists which is a pity because it’d make things like PINT a whole lot easier to swallow (pardon the pun).

It’s a PHP extension that does something real simple: it spits out the Zend opcodes of a script.

Huh?

A simple explanation of the way PHP runs is that your PHP scripts are parsed into symbols, and those symbols are turned into a set of virtual assembly language instructions called opcodes. The opcodes are stored in oparrays that are indexed by the filename (or function name, etc.).

The Zend Engine is then a virtual machine that executes the opcodes in the main oparray. If it hits certain events like an include() or eval(), it starts up the PHP engine again to repeat the process and add to the oparrays.

All of those oparrays are stored in memory during execution, and are sometimes cached by code caching mechanisms like Zend Platform, APC, XCache, what-have-you.
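
If you want to see the first stage (script to symbols) for yourself, here is a small sketch using token_get_all() from the tokenizer extension. It assumes that extension is loaded, and the one-line sample script is made up for illustration. It only shows the symbols, not the opcodes; the opcodes are what vld is for.

```php
<?php
// Hypothetical one-line script to tokenize; not taken from Brian's data.
$source = '<?php $CATEGORIES[202]["id"] = "202";';

$names = array();
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens are array(id, text, line); translate the id to a name.
        $names[] = token_name($token[0]);
        echo token_name($token[0]), ' ', var_export($token[1], true), "\n";
    } else {
        // Single-character symbols such as [ ] = ; come back as plain strings.
        $names[] = $token;
        echo $token, "\n";
    }
}
```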

If you understand this, then you can answer two interview questions I always ask candidates (and they always bomb).

12 lines later…

So I pasted a segment of the code sample Brian gave into a file called “big_array_array.php.”

I then wrote a script called “big_array_make.php” to make the other two files I needed.
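
The generator script itself isn’t shown here, so what follows is only a minimal sketch of what something like big_array_make.php could look like, not the original. It assumes the data lives in a global $CATEGORIES (as the opcode dumps suggest), uses a tiny stand-in array in place of the real include, and writes to the temp directory so it runs on its own.

```php
<?php
// Stand-in data; a real run would include big_array_array.php instead.
$CATEGORIES = array();
$CATEGORIES[202]['id']   = '202';
$CATEGORIES[202]['name'] = 'clothing & accessories';

// Output location is an assumption; the temp dir keeps the sketch self-contained.
$dir = sys_get_temp_dir();

// var_export() version: the whole structure as one array(...) literal.
$varexportFile = $dir . '/big_array_varexport.php';
file_put_contents(
    $varexportFile,
    '<?php' . "\n" . '$CATEGORIES = ' . var_export($CATEGORIES, true) . ";\n"
);

// serialize() version: one string literal handed to unserialize().
// var_export() on the string takes care of quoting and escaping it.
$serializeFile = $dir . '/big_array_serialize.php';
file_put_contents(
    $serializeFile,
    '<?php $CATEGORIES = unserialize(' . var_export(serialize($CATEGORIES), true) . ");\n"
);
```

Including either generated file should leave you with the same $CATEGORIES as the original, which is the whole point of the comparison.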

Now run vld on the stuff (remember to mod your php.ini file to load vld):

test$ php -dvld.active=1 big_array_array.php 2>output1.txt
test$ php -dvld.active=1

Here is a relevant sample of the array segment, which writes two elements:

filename:       /home/tychay/compile/utils/big_array_array.php
function name:  (null)
number of ops:  202
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   3     0  FETCH_W                                          $0, 'CATEGORIES'
         1  FETCH_DIM_W                                      $1, $0, 202
         2  FETCH_DIM_W                                      $2, $1, 'id'
         3  ASSIGN                                               $2, '202'
   4     4  FETCH_W                                          $4, 'CATEGORIES'
         5  FETCH_DIM_W                                      $5, $4, 202
         6  FETCH_DIM_W                                      $6, $5, 'name'
         7  ASSIGN                                               $6, 'clothing+%26+accessories'

4 opcodes to write each element, 3 registers created per element…
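
For orientation, source along these lines is what I’d expect to compile to the eight opcodes above. It is reconstructed from the operands in the dump (vld URL-encodes string operands, so 'clothing+%26+accessories' is 'clothing & accessories'), so treat the exact text as an assumption rather than Brian’s actual file.

```php
<?php
// Reconstructed from the vld operands; an assumption, not the original file.
// Each statement re-fetches $CATEGORIES, walks two dimensions, then assigns.
$CATEGORIES[202]['id']   = '202';                    // ops 0-3
$CATEGORIES[202]['name'] = 'clothing & accessories'; // ops 4-7
```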

Here is a relevant sample of a var_export() segment that writes the same two elements

filename:       /home/tychay/compile/utils/big_array_varexport.php
function name:  (null)
number of ops:  80
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   4     0  INIT_ARRAY                                       ~1, '202', 'id'
   5     1  ADD_ARRAY_ELEMENT                                ~1, 'clothing+%26+accessories', 'name'

1 opcode per array element, 1 register for entire thing, <1 reference/element.
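
The source behind those two opcodes would be a nested array literal, roughly like the sketch below. Again this is reconstructed from the operands, so the layout is an assumption (real var_export() output is indented differently).

```php
<?php
// Reconstructed from the vld operands; an assumption, not the generated file.
// The inner array is built in one temporary, one opcode per element.
$CATEGORIES = array(
    202 => array(
        'id'   => '202',                    // INIT_ARRAY
        'name' => 'clothing & accessories', // ADD_ARRAY_ELEMENT
    ),
);
```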

Here is the entire oparray for the serialize version:

filename:       /home/tychay/compile/utils/big_array_serialize.php
function name:  (null)
number of ops:  5
line     #  op                           fetch          ext  operands
-------------------------------------------------------------------------------
   1     0  SEND_VAL                                             'a%3A13%3A%7Bi%3A202%3Ba%3A15%3A%7Bs%3A2%3A%22id%22%3Bs%3A3%3A%22202%22%3Bs%3A4%3A%22name%22%3Bs%3A22%3A%22clothing+%26+accessories%22%3Bs%3A6%3A%22parent%22%3Bs%3A1%3A%220%22%3Bs%3A10%3A%22standalone%22%3Bs%3A0%3A%22%22%3Bs%3A11%3A%22description%22%3Bs%3A8%3A%22clothing%22%3Bs%3A10%3A%22precedence%22%3Bs%3A1%3A%220%22%3Bs%3A9%3A%22preferred%22%3Bs%3A1%3A%220%22%3Bs%3A10%3A%22searchable%22%3Bs%3A1%3A%221%22%3Bs%3A7%3A%22product%22%3Bs%3A1%3A%221%22%3Bs%3A10%3A%22aliased_id%22%3Bs%3A1%3A%220%22%3Bs%3A4%3A%22path%22%3Bs%3A22%3A%22clothing+%26+accessories%22%3Bs%3A13%3A%22url_safe_name%22%3Bs%3A20%3A%22clothing-accessories%22%3Bs%3A11%3A%22child_count%22%3Bs%3A1%3A%226%22%3Bs%3A9%3A%22childlist%22%3Ba%3A13%3A%7Bi%3A0%3Bi%3A202%3Bi%3A1%3Bi%3A2%3Bi%3A2%3Bi%3A275%3Bi%3A3%3Bi%3A4%3Bi%3A4%3Bi%3A481%3Bi%3A5%3Bi%3A446%3Bi%3A6%3Bi%3A454%3Bi%3A7%3Bi%3A436%3Bi%3A8%3Bi%3A205%3Bi%3A9%3Bi%3A227%3Bi%3A10%3Bi%3A203%3Bi%3A11%3Bi%3A280%3Bi%3A12%3Bi%3A204%3B%7Ds%3A8%3A%22children%22%3Ba%3A12%3A%7Bi%3A2%3BN%3Bi%3A275%3BN%3Bi%3A4%3BN%3Bi%3A481%3BN%3Bi%3A446%3BN%3Bi%3A454%3BN%3Bi%3A436%3BN%3Bi%3A205%3BN%3Bi%3A227%3BN%3Bi%3A203%3BN%3Bi%3A280%3BN%3Bi%3A204%3BN%3B%7D%7Di%3A2%3BR%3A31%3Bi%3A275%3BR%3A32%3Bi%3A4%3BR%3A33%3Bi%3A481%3BR%3A34%3Bi%3A446%3BR%3A35%3Bi%3A454%3BR%3A36%3Bi%3A436%3BR%3A37%3Bi%3A205%3BR%3A38%3Bi%3A227%3BR%3A39%3Bi%3A203%3BR%3A40%3Bi%3A280%3BR%3A41%3Bi%3A204%3BR%3A42%3B%7D'
         1  DO_FCALL                                      1  $1, 'unserialize', 0
         2  FETCH_W                                          $0, 'CATEGORIES'
         3  ASSIGN                                               $0, $1
         4  RETURN                                               1

5 opcodes, 2 registers for the entire structure.

I wonder if ZendPlatform optimizes the first case. Does anyone know if the output of VLD is valid if ZendOptimizer is on?

Passing note

I’d like to thank Brian for pointing out var_export(). It’s surprisingly more efficient than I thought and possibly a nice alternative to serialize for special cases (no references, no objects). I’ll have to use the new Benchmark2 I wrote (but haven’t had time to blog) to decide.
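
To show why “no references” matters, here is a small sketch with a made-up two-element array. It round-trips the array through both var_export() and serialize(), then changes one element to see whether the reference survived. The eval() call just stands in for including a generated file.

```php
<?php
// Made-up data: 'y' is a reference to 'x'.
$data = array('x' => 1);
$data['y'] = &$data['x'];

// var_export() emits PHP source; eval() stands in for include here.
$viaExport    = eval('return ' . var_export($data, true) . ';');
$viaSerialize = unserialize(serialize($data));

// Change 'x' in each copy and see whether 'y' follows.
$viaExport['x']    = 2;
$viaSerialize['x'] = 2;

var_dump($viaExport['y'], $viaSerialize['y']);
```

The var_export() copy comes back as two independent values, while the serialized copy keeps 'y' tied to 'x'. That is fine for a flat lookup table like the category list, but worth knowing before swapping one for the other.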

About tychay

light writing, word loving, ❤ coding