The Shu Ha Ri of strings

In a discussion on the speed differences between various types of strings, I was completely misunderstood (or ignored).

Sad.

George told me his all-time favorite PHP talk of mine was the first one I gave: “OOPs: The PHP Fear and Loathing Guide to Basic Object-Oriented Design.” Perhaps one reason is its segment on Shu-Ha-Ri.

Shu-Ha-Ri is the way you learn in Aikido, but it applies to everything. For those who haven’t seen that talk, or didn’t understand it, here are the ideas I want you to have in your head:

  1. Shu – Hold – Copy
  2. Ha – Break – Deconstruct
  3. Ri – Leave – Transcend

Shu: Use the single quote

Here is a summary of results for a PHP developer who leads the unexamined life:

  • Single quotes are always faster than double quotes, but the difference is negligible.
  • Function parameterization is faster than concatenation (use commas instead of periods when calling echo).
  • String concatenation is always faster than string interpolation.
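If you want to check those Shu-level claims on your own PHP version, a minimal timing harness might look like the sketch below. It assumes PHP 7.4 or later (for hrtime() and arrow functions), and time_it() is a hypothetical helper. The closure call adds overhead to every variant, and a runtime loop like this doesn't measure compile time, so treat the numbers as rough.

```php
<?php
// Rough micro-benchmark sketch; assumes PHP >= 7.4 (hrtime(), arrow functions).
// Each closure builds the same string a different way.

function time_it(callable $build, int $iterations): float
{
    $start = hrtime(true);                 // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $build();
    }
    return (hrtime(true) - $start) / 1e6;  // elapsed time in milliseconds
}

$bar = 'bar';
$variants = [
    'single-quoted concat' => fn() => 'foo ' . $bar,
    'double-quoted concat' => fn() => "foo " . $bar,
    'interpolation'        => fn() => "foo $bar",
];

// Sanity check: all variants must produce the same string.
foreach ($variants as $build) {
    assert($build() === 'foo bar');
}

foreach ($variants as $name => $build) {
    printf("%-22s %8.2f ms\n", $name, time_it($build, 200000));
}
```

Run it a few times and on more than one PHP version; the spread between runs is often as large as the spread between variants.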

But this can be understood with a simple Ri maxim:

“In PHP, always code for readability first.”
—Me circa 2004

I’ll explain how to get from Shu to Ri in gory detail below and answer a couple of my favorite interview questions in the process.


Ha: String interpolation is the devil

The comment system on the article mangled the hyperlink I provided to the diagnosis Sara did.

Even though I’ve said this twice in the comments and tried to provide an article teaching them how to fish, it appears that people are so dense that they need a direct answer.

So, to be very direct and specific, the opcodes for f1 ($c = "foo ".$bar) and f3 ($c = 'foo '.$bar) are identical:

FETCH_R
CONCAT
FETCH_W
ASSIGN

but for f2 ($c = "foo $bar") they look like this:

INIT_STRING
ADD_STRING
ADD_STRING
FETCH_R
ADD_VAR
FETCH_W
ASSIGN

It gets worse the more word breaks there are in your string. It gets downright ugly, in fact.
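To reproduce dumps like these yourself, you can put the three variants in a file and run it through VLD. The sketch below assumes the VLD extension is installed and uses its vld.active and vld.execute ini settings; it also assumes PHP 7.0+ for the type declarations. Wrapping each assignment in a function (f1, f2 and f3, as named in the text) and passing $bar as a parameter means the surrounding opcodes will differ from the listing above, and opcode names vary between PHP versions.

```php
<?php
// opcodes.php: the f1/f2/f3 assignments from the text, wrapped in functions
// so VLD prints each one separately. Dump the compiled opcodes with:
//   php -d vld.active=1 -d vld.execute=0 opcodes.php
// (requires the VLD extension; opcode names vary between PHP versions)

function f1(string $bar): string
{
    $c = "foo " . $bar;   // double-quoted literal, then concatenation
    return $c;
}

function f2(string $bar): string
{
    $c = "foo $bar";      // string interpolation
    return $c;
}

function f3(string $bar): string
{
    $c = 'foo ' . $bar;   // single-quoted literal, then concatenation
    return $c;
}

// All three return the same value; only the compiled form can differ.
echo f1('bar'), "\n", f2('bar'), "\n", f3('bar'), "\n";
```

Adding more words and variables to the f2 string is an easy way to watch how the interpolated form grows on your engine.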

Ha: Function parameterization

Here is a favorite interview question for the PHP developer who thinks they’re hot shit. I give the developer five echo statements and ask them to sort them in increasing order of performance (and explain why):

  1. echo "foo$bar";
  2. echo 'foo' , $bar;
  3. echo "foo" , $bar;
  4. echo "foo" . $bar;
  5. echo 'foo' . $bar;

Nobody gets this one. :-D

Given the discussion above, you can order 1 and then 4/5, but to figure out where 3 and 2 go, you’d have to know the thing Sara mentions in her post.

The answer is knowing that ECHO is faster than CONCAT plus a temporary register. I found that out when Rasmus posted about this a few years ago. If someone ever answers (1,4,5,3,2) and then immediately explains correctly why, I’ll know that they’ve either been reading my blog, or I should be on the lookout for the candidate pulling a John Nash down the road.
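If you’d rather measure than memorize, here is a sketch that times the five statements from the question. It assumes PHP 7.3+ for hrtime(); time_echo() is a hypothetical helper, and output is captured with ob_start()/ob_end_clean() so nothing floods the terminal. Each form runs inside a closure, which adds its own call overhead, so the differences are only indicative.

```php
<?php
// Times the five echo statements from the interview question.
// Output is buffered and discarded so the terminal isn't flooded.
// Assumes PHP >= 7.3 for hrtime(); results depend on the PHP version.

function time_echo(callable $emit, int $iterations): float
{
    ob_start();                          // capture everything $emit echoes
    $start = hrtime(true);               // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $emit();
    }
    $elapsed = hrtime(true) - $start;
    ob_end_clean();                      // discard the captured output
    return $elapsed / 1e6;               // elapsed time in milliseconds
}

$bar = 'bar';
$forms = [
    1 => function () use ($bar) { echo "foo$bar"; },
    2 => function () use ($bar) { echo 'foo' , $bar; },
    3 => function () use ($bar) { echo "foo" , $bar; },
    4 => function () use ($bar) { echo "foo" . $bar; },
    5 => function () use ($bar) { echo 'foo' . $bar; },
];

foreach ($forms as $n => $emit) {
    printf("form %d: %8.2f ms\n", $n, time_echo($emit, 200000));
}
```

The ranking you get on a modern engine may not match the answer above, which reflects the PHP of the time; that is itself a useful Ha-level lesson.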

Ha: Single quotes are always better

It might also be instructive to note in my answer that single quotes are still faster than double quotes, even though the opcodes generated are identical. The benchmarking scripts provided don’t measure Zend compile time. That’s where the difference between single and double quotes manifests itself (assuming you aren’t using the string interpolation features).

Basically, the compiler needs to introspect a double-quoted string to see whether it needs to be interpolated. It can skip this for single-quoted strings.
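A runtime benchmark can’t show that compile-time cost, but you can see why the lexer has more to do on a double-quoted string by comparing what each kind of literal produces. In single quotes only \' and \\ are escapes; in double quotes the lexer must also process sequences such as \n and watch for a $ that starts a variable.

```php
<?php
// Single quotes: only \' and \\ are escapes, so the text is taken nearly verbatim.
$single = 'foo\nbar $baz';   // backslash, n and "$baz" all stay literal

// Double quotes: the lexer scans for escapes (\n here) and for variables.
$double = "foo\nbar";         // \n becomes a single newline character

var_dump(strlen($single));    // int(13)
var_dump(strlen($double));    // int(7)
```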

Ha: HERE documents are the new ugly

Speaking of interpolation, want to see really bad performance? Run a HERE document through VLD and look at the opcodes on that. (Hint: it’s the same as running string interpolation.) Or consider which is faster: executing multiple interleaved <?php echo ?> blocks, or constructing the same line with a single comma-separated echo?

That last question has a huge performance impact on most templating systems out there.
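For reference, here are the three shapes being compared: interleaved PHP tags, a single comma-separated echo, and a heredoc with interpolation. This is a sketch with hypothetical render_* functions and $title/$body inputs; it only confirms that all three produce the same markup, so you can feed each one to VLD or a timer on your own PHP version.

```php
<?php
// Three ways to render the same fragment. Function names are hypothetical.
// All escape their input with htmlspecialchars() before output.

function render_interleaved(string $title, string $body): string
{
    ob_start();   // collect the inline HTML and echoes below
    ?><h1><?php echo htmlspecialchars($title); ?></h1><p><?php echo htmlspecialchars($body); ?></p><?php
    return ob_get_clean();
}

function render_single_echo(string $title, string $body): string
{
    ob_start();
    echo '<h1>', htmlspecialchars($title), '</h1><p>', htmlspecialchars($body), '</p>';
    return ob_get_clean();
}

function render_heredoc(string $title, string $body): string
{
    $t = htmlspecialchars($title);
    $b = htmlspecialchars($body);
    // Heredoc bodies interpolate like double-quoted strings.
    return <<<HTML
<h1>$t</h1><p>$b</p>
HTML;
}

echo render_single_echo('Hello', 'World'), "\n";
```

Which one wins depends on the engine and on whether an opcode cache is in play, which is exactly why it’s worth measuring rather than assuming.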

Ri: …sometimes they’re not

That brings us to a more important point. Here is another interesting result:

'foo
bar'

is faster than

"foo\nbar"

which is faster than

'foo'."\n".'bar'

And yet, I prefer the second form in the interest of readability.
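The three forms are different spellings of the same value, which is why the choice between them can be made on readability alone. A quick check, assuming the file is saved with Unix (LF) line endings, since a line break typed inside single quotes keeps whatever bytes the file contains:

```php
<?php
// Assumes LF line endings: the raw line break below becomes part of the string.
$literal   = 'foo
bar';                                // newline typed directly inside single quotes
$escaped   = "foo\nbar";            // \n escape inside double quotes
$assembled = 'foo' . "\n" . 'bar';  // three pieces joined with concatenation

var_dump($literal === $escaped && $escaped === $assembled);   // bool(true)
```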

APC bundling in PHP6 and Zend Platform at work eliminate the difference between the first two, since your script will be pulled from the cache in compiled form. See, that’s where practice trumps knowledge.

In most cases, I feel single quotes are more readable. Besides, the maxim “single quotes over double quotes” encourages better code from the Shu programmer by discouraging string interpolation for those who don’t know the finer details of when and where one is better than the other. Because we now have a Ha-level understanding, we know why. But our goal is to reach the transcendental state of “Ri-level” and know there are no rules.

Ri: Considering i18n/l10n

At a certain point, performance has to give way to readability and modularization. Take g11n. Concatenation is way faster than sprintf(), but one is localizable and the other isn’t. Which one do you think a good developer uses? (This is an interview question I ask PHP developers. Few of them get it right, and most of those who do only explain it at the Shu level, because they’ve i18n’d a site before.)

The answer is to use sprintf() and printf(), because the strings you create using gettext are actually l10n-able. Unfortunately, the only language I know is English, so I can’t really demonstrate it. But in simple terms, consider the following string: $user_name.'’s homepage', which might be l10n’d into another language as 'The homepage of '.$user_name.

How much i18n logic would you need to write so that it is l10n-able? That’s not a good proposition. Far simpler is to export the following string into gettext:
sprintf(_('%s’s homepage'),$user_name);
which can be flipped in a po file as 'The homepage of %s'.

Benchmark this and you’ll see that it executes extremely slowly. The difference is that slow wins: down the road, a successful site will be internationalized, and the latter form makes that easy. Even if performance “bites,” it’s extremely cheap to develop this way, and development time is the scarce resource here.
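To see why the format-string version localizes cleanly, here is a sketch in which a plain array stands in for the .po catalog. translate() and homepage_title() are hypothetical helpers for illustration; real code would call gettext’s _() instead of translate(). It assumes PHP 7.0+ for the ?? operator.

```php
<?php
// $catalog stands in for a gettext catalog: msgid => translated msgstr.
function translate(string $msgid, array $catalog): string
{
    return $catalog[$msgid] ?? $msgid;   // untranslated: fall back to the msgid
}

function homepage_title(string $user_name, array $catalog): string
{
    // The whole phrase is one translatable unit, so a translation is free to
    // put the name anywhere; concatenation would hard-code the word order.
    return sprintf(translate('%s’s homepage', $catalog), $user_name);
}

$untranslated = [];
$reordered    = ['%s’s homepage' => 'The homepage of %s'];

echo homepage_title('alice', $untranslated), "\n";   // alice’s homepage
echo homepage_title('alice', $reordered), "\n";      // The homepage of alice
```

When a phrase has more than one placeholder, sprintf()’s argument-swapping form (%1$s, %2$s) lets a translation reorder those too. Keep such format strings in single quotes: inside double quotes, "$s" would be parsed as a variable.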

Ri: What language is this again?

As PHP moves toward standardization on PHP6, Moore’s law, safe and standardized code caching, compiler optimization, and JIT execution, among other things, will eventually make these differences moot.

But consider that they’re moot now: Are you waiting on your string processing or your database?

“Premature optimization is the root of all evil in programming.”
—Donald Knuth, “The Errors of TeX”

PHP is a web language designed to solve the web problem. It is the glue, it is not the house.

Ri: Reality is the ultimate arbiter

There are only a couple of languages that execute slower than PHP for any synthetic benchmark you can imagine (MacPascal?). For all intents and purposes, PHP is the slowest language on the internet (okay, ColdFusion gives it a run for its money).

Yet this “slowest language” powers the most-trafficked websites on the internet. PHP is the engine of the internet; other, much faster languages are not, despite additional advantages in modular libraries, readability, hype, monopoly levers, or money. (That’s Perl, Python, Ruby on Rails, dotNET ASP, and Java J2EE, respectively, for those keeping score at home, all of which benchmark faster than PHP.)

Think about that for a moment.

The why of this is probably best answered by the epigraph in Andrei’s blog:

“Man, if you gotta ask, you’ll never know.” —Louis Armstrong, when asked “What is jazz?”

Maybe you know the answer?

You have reached “Ri” and can depart from this article.

9 thoughts on “The Shu Ha Ri of strings”

  1. Great article! Really enjoyed it. I’ve always used single quotes instead of doubles, but I never even considered using comma separators instead of concats. Then again, I find that I’m often building a string through the course of several if/thens and other constructs, so the comma separation wouldn’t really be available.

  2. Totally disagree. Performance difference between interpolation and concatenation is so small that almost no web developer is ever going to notice the difference. Even massive scale projects have no need for the negligible performance gain by concatenation.

    BTW try a loop that does:

    $a = "a $a a";

    and

    $a = 'a '.$a.' a';

    The interpolated version is faster! Not all concatenation performs better.

    Writing READABLE code is far more important to any project.

  3. David,

    How does your thesis “totally disagree” with my statement, where I advocate the “super slow” sprintf() over any of the others? Besides being even more readable than your first example, it’s also i18n-ready. I’m trying to point out that, in any case where you need to think about strings in this manner, sprintf() is invariably the forward-thinking function.

    BTW, the interpolated version is not faster in most versions of PHP (and was not faster in any version of PHP at the time I wrote this article). Taking it apart, you can see that the early PHP compiler puts an ADD_STRING before every whitespace. This has since been fixed, but it pretty much amounts to the difference between ADD_VAR and CONCAT (CONCAT is slightly slower).

    But before you advocate interpolation, fire up PHP 5 before 5.2, or any version of PHP 4, make a large HERE document with a few variable replacements (as you would for a viral e-mail), and then benchmark the difference. Then come back to me and say it’s negligible (which I think I’ve said in this old article).

  4. The three characters you’ve got listed vertically are in backwards order from the translation… not that it really matters. Just thought that since you took the trouble to put up the nice post, you might as well get that right.
