One of the fringe benefits of open sourcing an existing code base is that you have an opportunity to set error_reporting to E_ALL | E_STRICT, or perhaps rather to 2147483647. When you do that, you find small problems in your code base that you missed the first time you sloppily wrote it.
In my case, I noticed that date() was throwing strict errors. For example:

    error_reporting(E_ALL | E_STRICT);
    ini_set('date.timezone',false);
    echo date('c');

shows you
I’m sure if you’re Derick, you are intimate with date()ing, but I had forgotten about this wasted guess_timezone() sys call and the suppressed strict error (which still takes time in PHP 5).
I sent an e-mail with this bug, along with the one line fix to the php.ini, to site operations…and promptly forgot about it. That is until the ticket was sent back with the message that it needed to be “tested in dev and stage before making it to production.”
(The younger, less-tolerant terry would have blown a fuse at this point.) The older, jaded terry simply became curious about what the costs of date() really are.
Benchmarking
The last time I did synthetic benchmarking was week 1 at Tagged where I wrote a harness to compare sqlrelay vs. two home grown connection pools. Not happy with PEAR’s Benchmark_Iterate, I figured it was time to write a benchmarking suite again.
    // Bootstrap this without framework
    include('timer.php');
    include('iterate.php');

    //mimic production
    $error_level = error_reporting(0);

    // {{{ date()
    ini_set('date.timezone',false);
    $b1 = new tgif_benchmark_iterate(true);
    $b1->run(10000, 'date', 'c');
    $b1->description = 'date("c")';
    // }}}
    // {{{ date() + date.timezone
    ini_set('date.timezone','America/Los_Angeles');
    $b2 = new tgif_benchmark_iterate(true);
    $b2->run(10000, 'date', 'c');
    $b2->description = 'date("c") + date.timezone';
    // }}}
    // {{{ iterate date() + ini_set
    //date_default_timezone_set('');
    ini_set('date.timezone',false);
    function ini_and_date() {
        ini_set('date.timezone','America/Los_Angeles');
        date('c');
    }
    $b4 = new tgif_benchmark_iterate(true);
    $b4->run(10000, 'ini_and_date');
    $b4->description = 'iterate date("c") + ini_set';
    // }}}
    // {{{ date() + date_default_timezone_set
    ini_set('date.timezone',false);
    date_default_timezone_set('America/Los_Angeles');
    $b3 = new tgif_benchmark_iterate(true);
    $b3->run(10000, 'date', 'c');
    $b3->description = 'date("c") + date_default_timezone_set()';
    // }}}
    // {{{ iterate date() + date_default_timezone_set
    //date_default_timezone_set('');
    ini_set('date.timezone',false);
    function set_and_date() {
        date_default_timezone_set('America/Los_Angeles');
        date('c');
    }
    $b5 = new tgif_benchmark_iterate(true);
    $b5->run(10000, 'set_and_date');
    $b5->description = 'iterate date("c") + date_default_timezone_set';
    // }}}

    echo tgif_benchmark_iterate::format($b1->compare($b2,$b3,$b4,$b5));

    // restore
    error_reporting($error_level);

(Here are timer.php and iterate.php. If there are major bugs, I apologize, I hacked it together in bed last night.)
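Since timer.php and iterate.php are not reproduced here, the following is a minimal sketch of what an iterate-style timer could look like. It is not the tgif_benchmark_iterate class: bench_iterate() is a hypothetical name, the code assumes PHP 7 or later, and it assumes that "resource time" means CPU time as reported by getrusage().

```php
<?php
// Hypothetical helper, NOT the author's tgif_benchmark_iterate class.
// Calls $callback $iterations times and returns the average wall-clock
// seconds and CPU (user + system) seconds per call.
function bench_iterate(int $iterations, callable $callback, ...$args): array
{
    $usageStart = getrusage();
    $wallStart  = microtime(true);

    for ($i = 0; $i < $iterations; $i++) {
        $callback(...$args);
    }

    $wallEnd  = microtime(true);
    $usageEnd = getrusage();

    // getrusage() reports seconds and microseconds as separate fields.
    $cpu = static function (array $u): float {
        return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
             + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
    };

    return [
        'wall' => ($wallEnd - $wallStart) / $iterations,
        'cpu'  => ($cpu($usageEnd) - $cpu($usageStart)) / $iterations,
    ];
}

// Usage: time date("c") with an explicitly set timezone.
date_default_timezone_set('America/Los_Angeles');
$result = bench_iterate(10000, 'date', 'c');
printf("date(\"c\"): %.6fs wall, %.6fs cpu\n", $result['wall'], $result['cpu']);
```

The per-call averages from two such runs can then be divided to get speed-up ratios like the ones in the table below.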
Results
I should have really rebuilt my dev install not to have xdebug, inclued, and other debugging crap here. In any case, here is the result when performed on my MacBook Pro:
mark | wall time | resource time |
---|---|---|
date("c") | 0.000043s | 0.000044s |
date("c") + date.timezone | 5.53x | 5.60x |
date("c") + date_default_timezone_set() | 6.79x | 6.96x |
iterate date("c") + ini_set | 3.30x | 3.37x |
iterate date("c") + date_default_timezone_set | 3.45x | 3.55x |
Yeah, we’re talking about microseconds here, but as you can see, even if you set the default timezone on every request and only call date() once on average, you’re still much better off with a userspace ini_set or date_default_timezone_set than with doing nothing. That’ll add up if some idiot programmer has date() caught in a tight loop to build a calendar or something (don’t laugh, I’ve seen this). And since doing it in user-space doesn’t require another bounced trouble ticket, I promptly did just that.
We’ve been running that way for a couple weeks now.
No one has noticed yet.
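For reference, the user-space fix amounts to something like the sketch below. Where it lives is an assumption (a hypothetical bootstrap file that runs once per request), and the timezone is simply the one used in the benchmark.

```php
<?php
// Assumed to run once per request, early in a hypothetical bootstrap file,
// before anything calls date().
date_default_timezone_set('America/Los_Angeles');

// Later date() calls use the explicit setting instead of guessing.
echo date('c'), "\n";
```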
you should figure out how to fix our timezone problem and then post that up too haha. oh it’s so bad.
I just thought you’d be happy that I wrote a pretty benchmark comparison routine. It should make optimizations on tag_encode much easier now. 🙂
Thanks for this. I’m one of those people who have date being called in a tight loop. 🙁 What would you do instead of calling date in a loop?
Also, does strtotime suffer from the same problems?
For most use cases you should be able to get a speed up by simply doing:
date('c', $_SERVER['REQUEST_TIME']);
@David: try storing the result of date() outside of the loop and reusing it. Setting it from the request time, as Lukas suggested, also avoids an internal time() call.
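Put together, the two suggestions above might look like this sketch (PHP 7 or later assumed; $rows is hypothetical stand-in data, and the time() fallback is only there in case REQUEST_TIME is unavailable):

```php
<?php
date_default_timezone_set('America/Los_Angeles');

// Format the request time once, outside the loop.
$requestTime = $_SERVER['REQUEST_TIME'] ?? time();
$stamp = date('c', $requestTime);

$rows  = ['first', 'second', 'third']; // hypothetical data
$lines = [];
foreach ($rows as $row) {
    // Reuse the preformatted string instead of calling date() per row.
    $lines[] = "$stamp $row";
}

echo implode("\n", $lines), "\n";
```

This only helps when every row shows the same moment; rows with their own timestamps still need one format call each.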
@Lukas, Wes: Hmm…
Yields:
The difference is less but noticeable (a 104% speed up). Good point though: if I had used a static time too, the impact of guess_timezone() would have been more pronounced.
…
BTW, here is a result comparing three different hashing algorithms (used for generating unique ids of various lengths based on a small amount of server related random data).
Of course, the CRC32 is only 32 bits (5 digits when represented as a base64 number).
David, I don’t have date in a tight loop, per se. What I have is calls to date (2 per month) that calculate the day of the week the first falls on and the total number of days in the month.
At this rate, I have a total of 24 calls to date per year. Now, I could perhaps instead use just two date calls: check whether it is a leap year and find what day the first of January falls on. Then I could build an algorithm that loops through the 12 months, keeping track of the day the next month’s first falls on.
The first is pretty clean and the time cost is reasonable given how often it is actually used. I actually have date.timezone set in the php.ini, so I don’t incur any cost from the lookup.
The algorithm for the first is reasonably easy and someone who doesn’t know shit can pick it up and modify it with very few problems. In fact, most of the code is basically class syntax and inline commenting noise.
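The two-call alternative described in this comment could be sketched roughly as follows. month_layout() is a hypothetical name, not code from the commenter, and the sketch assumes PHP 7 or later and the Gregorian month lengths.

```php
<?php
date_default_timezone_set('America/Los_Angeles');

// Hypothetical helper: for each month 1..12, returns the weekday of the 1st
// (0 = Sunday .. 6 = Saturday) and the number of days in that month,
// using only two date() calls per year.
function month_layout(int $year): array
{
    $jan1    = mktime(0, 0, 0, 1, 1, $year);
    $isLeap  = date('L', $jan1) === '1'; // date() call 1: leap year flag
    $weekday = (int) date('w', $jan1);   // date() call 2: weekday of Jan 1

    $lengths = [31, $isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

    $layout = [];
    foreach ($lengths as $index => $days) {
        $layout[$index + 1] = ['first_weekday' => $weekday, 'days' => $days];
        // The next month starts $days later, so advance the weekday modulo 7.
        $weekday = ($weekday + $days) % 7;
    }

    return $layout;
}

$layout = month_layout(2024);
printf("February 2024 has %d days\n", $layout[2]['days']);
```

Whether this is worth it over 24 straightforward calls is exactly the readability trade-off the comment describes.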
I was thinking more in regards to formatting dates in a result set.
Eg, listing a table of orders made in the last month where I want it to show the date the order was created, and the date the order was dispatched. Obviously, if the date comes pre-formatted from the database, then that would eliminate the need for date() calls completely. But is the database (MySQL in my case) *that* much faster at formatting dates?
Ok, maybe it wasn’t a great example… the main point is that date is formatting a unique timestamp each time it’s called, vs something that can be called once and reused throughout the script.
David,
A common use case is to make a clickable calendar on a blog. People usually build this by going column by column, row by row and calling the date() function. George Schlossnagle once had a talk showing the homepage of a serendipity install and how a profiler would show that 10% of the entire calltime of the page was wasted in date()s for the tiny calendar on the right—a fact not evident unless you were to use APD or XDebug.
There isn’t a calendar like that on the Tagged website (to my knowledge), but when I did this at Plaxo, I generated the calendar entirely in javascript (retrieved via “Ajax”). In the previous version of the site (older by a couple months), I used html tables generated from premade clearsilver template data generated by a C engine.
Dates in a result set like you describe (say, a list of comments ordered by date, or the newsfeed ordered by date) have a date() function call for each one at Tagged. In actuality, the date() call can be removed, because on render the client-side javascript transparently replaces those dates with a javascript date call that reads a UTC timecode from a span attribute (or interprets the actual RFC-compliant date if that isn’t available) and turns it into a relative date (you know, like 3 seconds ago… 1 minute ago… etc., a la Twitter, only done where the processing would be easiest). 🙂
(By the way, since we used to do the computation on the server side, dates that fell outside “relative dating” were very hard to compute and required remembering the user’s time zone preferences. This is the “timezone” problem that Mark alludes to in his first comment. I’ve eliminated this for a couple parts of the site through the trick above.)
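The server side of that trick could be as small as the sketch below. This is not Tagged’s actual markup: render_date_span(), the reldate class, and the data-utc attribute are all hypothetical names, and the client-side script that rewrites the span into a relative date is not shown.

```php
<?php
// Hypothetical helper: renders a Unix timestamp as a span that carries the
// UTC timecode in an attribute, with an RFC 3339 UTC date as fallback text.
function render_date_span(int $timestamp): string
{
    $utc = gmdate('Y-m-d\TH:i:s\Z', $timestamp);

    return sprintf(
        '<span class="reldate" data-utc="%d">%s</span>',
        $timestamp,
        htmlspecialchars($utc, ENT_QUOTES, 'UTF-8')
    );
}

echo render_date_span(0), "\n";
// prints: <span class="reldate" data-utc="0">1970-01-01T00:00:00Z</span>
```

Because the server only ever emits UTC, it no longer needs to know the user’s time zone; the browser applies the local offset when it rewrites the span.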
I hope this helps outline some strategies that can deal with date()ing problems. The actual solution you use will depend on the problem. Writing software is about making choices.
Take care,
terry
Marco Tabini plugs Derick’s book (which is linked in the above article).
@Terry
Thanks for the feedback. Never occurred to me to use JavaScript to do the date formatting. Very cool.