Original article posted to PHP Advent 2008. Happy Christmas!
Take a simple PHP trick and follow it on a huge tangent to the philosophy of good web architecture.
It’s an honor to be asked to share my ideas with the PHP community. When Chris and Sean asked me to write an entry for the Advent Calendar, I had to accept. Like last year, this article will be quite long. If you need something short and sweet like the other advent entries, you can just read the first section. But if you read it all, there might be a worthwhile concept buried in this logorrhea.
Funky Caching
Funky Caching is an obscure trick often attributed to Rasmus but actually invented by Stig. It is also known as the “ErrorDocument trick,” “Smarter Caching” and “Rasmus’s Trick.” It was first presented by PHP creator, Rasmus Lerdorf, in his fun Tips and Tricks talk.
It entails the following:
First you create an ErrorDocument line in your apache.conf.
ErrorDocument 404 /error.php
What this does is tell the web server to internally redirect requests for missing files to a PHP script called “error.php” in your document root. You then create this handler:
$filepath = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); // or $_SERVER['REDIRECT_URL']
$basepath = dirname(__FILE__) . DIRECTORY_SEPARATOR;

// Test to see if you can work with it
if (false) { //…EDIT…
    // output a 404 page
    include('404.html'); // see http://www.alistapart.com/articles/perfect404/ for tips
    return;
}

// Generate the file
// …EDIT…
$data = 'something';

// Don't send 404 back, send 200 OK because this is a pretty smart 404
// not a clueless one! http://www.urbandictionary.com/define.php?term=404
header(sprintf('%s 200', $_SERVER['SERVER_PROTOCOL']));

// Show the file
echo $data;

// Store the page to bypass PHP on the next request. Use a temp file with a
// link trick in order to avoid race conditions between concurrent PHP
// processes.
$tmpfile = tempnam($basepath . 'tmp', 'fc');
$fp = fopen($tmpfile, 'w');
fputs($fp, $data);
fclose($fp);
// link($target, $link): the finished temp file is the target, the web path is the new name
@link($tmpfile, $basepath . ltrim($filepath, '/')); // suppress errors due to losing race
unlink($tmpfile);
Other than the two areas delimited by …EDIT…, the code above is pretty canonical.
What does this trick do? Basically, when a file doesn’t exist, it gives an opportunity for PHP to create the data and return it instead of a 404 error page.
But after that is where the extra magic happens. It places the generated file directly into the web server path. What this means is that the next request to the same web resource goes directly for that resource without starting up this code and regenerating the data. Thus PHP never gets instantiated again.
At that point it truly becomes PHP without PHP.
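The store-and-publish step at the end of the handler can be pulled into a small helper. What follows is a minimal sketch, not the author's code: the function name funky_cache_store, its parameters, and the decision to create missing directories are all assumptions made for illustration. It leans on the fact that link() fails when the destination already exists, so when two PHP processes race, the first one wins and the loser simply cleans up after itself.

```php
<?php
/**
 * Publish generated content into the web root so the web server can serve
 * it as a static file on the next request.
 *
 * Hypothetical helper: the name and signature are illustrative only.
 */
function funky_cache_store(string $docroot, string $urlPath, string $data): bool
{
    // Refuse anything that could escape the document root.
    if (strpos($urlPath, '..') !== false || strpos($urlPath, "\0") !== false) {
        return false;
    }

    $dest = rtrim($docroot, '/') . '/' . ltrim($urlPath, '/');
    $dir  = dirname($dest);
    if (!is_dir($dir) && !@mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }

    // Write to a temp file in the same directory (hard links cannot cross
    // filesystems), then link it into place.
    $tmp = tempnam($dir, 'fc');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $data) === false) {
        @unlink($tmp);
        return false;
    }
    chmod($tmp, 0644); // tempnam() creates the file with mode 0600

    // link() fails if $dest already exists, so the first writer wins.
    $linked = @link($tmp, $dest);
    unlink($tmp);

    // Losing the race still counts as success: the file is in place.
    return $linked || is_file($dest);
}
```

In the handler above, the last six lines would then collapse to a single call such as funky_cache_store(dirname(__FILE__), $filepath, $data).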
Words have meaning through paradigm
The foundation of Steve McConnell’s seminal text, Code Complete, was that software development should be based around the paradigm of construction. But that was fundamentally flawed because of the mythical man-month—the “man-month” term itself originally coming from construction work. We now know McConnell was wrong and software isn’t construction, it is engineering. And we’re called “software engineers,” not “software workers” for this reason: we’re engineers, not construction workers.
My title at Tagged is currently “software architect.” And I have a “radical” idea that maybe titles are that way because they mean something. Meaning that if I’m hired as a software architect then I should think like an architect and find my inspiration from architecture.
Fallingwater
Nestled along a creek in the woods of southwestern Pennsylvania is a house whose angular features are cantilevered 40 feet above a waterfall. This was the summer home of the Kaufmann family, owners of a Pittsburgh department store. (I remember that department store well because I spent days next to it in a neighboring newsstand reading issues of MAD magazine and Cracked.)
When I was a kid, my dad took us to visit the place, and I became one of millions of visitors to Fallingwater, a home that on its inception was hailed by TIME magazine and became known as the quintessential example of the organic architecture of architect Frank Lloyd Wright.
Why is Fallingwater, a summer home for a Pittsburgh family, so obviously beautiful that hundreds of thousands make the trek (50 miles from the nearest city) each year, that it was voted “best all-time work of American architecture” in 1991 by the American Institute of Architects, that pictures of it are as instantly recognizable as any natural wonder?
Maybe it’d be enlightening to consider how Frank Lloyd Wright built it. Before he started, he commissioned a survey of the entire topography around the waterfall and had the surveyors include all trees and boulders. He then came up with the idea of a cantilevered house that would stretch out so that it looked like it floated in air above the waterfall.
But more important, perhaps, are details such as these:
- A staircase connects the house directly to the stream below. The stream and waterfall, though not seen, can be heard in all parts of the house.
- Windows wrap around the entire building such that the tungsten lighting of the interior would complement the fall foliage and the house and greenery would merge optically throughout the house.
- The panes of glass were often caulked directly to each other and to the steel frame or stone walls to further symbolize the lack of barriers between the construction and the surrounding nature.
- The driveway trellis had a semi-circular cutaway to accommodate an existing tree.
- The stone floor in the living room, quarried from local rocks, is actually built around a boulder undisturbed from its original position in the site.
From details like that to the whole view taken in at once, one gets a feeling that, in spite of the sharp horizontal and vertical lines of the building, the whole lives in harmony with its environment “instead of lording above [it] in an isolated spot as a man-made imposition.”
Frank Lloyd Wright designed on the principles of: “organic, democratic, plasticity, continuity.” We can see how this example holds true to those values.
Could this building have been built anywhere else?
Why is the Funky Caching trick so prevalent in the PHP world?
If you look at Funky Caching, it doesn’t need PHP to implement it. This raises the question of why it first appeared in the PHP world. Why is this obscure design pattern so ubiquitous there? In fact, you, as a PHP developer, use it every day when you visit the PHP.net website the instant you type http://www.php.net/strstr to figure out the order of the needle in the haystack.
A cynic would say that because PHP is so slow to execute, it needs solutions like this to perform well. The problem with this argument is that no single web language outperforms the fastest static servers out there, or even comes close to the slow ones. There is no web language that wouldn’t benefit from this trick.
But there is truth to the cynic’s statement. The PHP world may have discovered this first because it trades off speed of execution for speed and ease of development. As Andrei mentioned earlier, that is fundamental to its design. In fact, all dynamically typed scripting languages make this tradeoff.
The ubiquity of this trick in the PHP world is because it—like Frank Lloyd Wright’s Fallingwater—lives in harmony with its environment. The environment is an Apache web server, persistent data store in the form of a relational database, and the demands of large-scale consumer-facing web. Would this solution exist without an Error Document handler built into Apache? Would this solution exist if we didn’t persist content on a (relatively) slow data store like a database? Would this solution exist if the consumer didn’t demand millisecond response time for dynamic content?
Funky Caching in the PHP world lives in harmony with that environment. It lives in harmony with PHP itself.
The architectural principles of PHP itself
PHP is a language that is designed to solve the “web problem.”
PHP is built as a component of the web architecture, as Maggie mentioned earlier. Without Apache serving it, without a database backing it, without the demands of the web behind it, without thousands of hosting sites installing it, without hundreds of open-source packages written in it, it would be useless.
The language itself, like Fallingwater, is customized for the problem at hand and complements the environment in which it lives. Just as Wright’s design lived true to his principles, so do PHP and its solutions live true to its principles: “cheap, scalable, pragmatic.”
When using PHP, let us not forget PHP’s three principles that attract us to the language in the first place:
- Cheap (cheap on developer time and resources): “A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl.” —Philip Greenspun
- Scalable (shared-nothing architecture): “That a Java servlet performs better than a PHP script, under optimal conditions [has] nothing to do with scalability. The point is can your application continue to deliver consistent performance as volume increases…PHP delegates all the “hard stuff” to other systems.” —Harry Fuecks
- Pragmatic (designed to solve the web problem): “PHP is not about purity in CS principles or architecture, it is about solving the ugly web problem with an admittedly ugly, but extremely functional and convenient solution. If you are looking for purity you are in the wrong boat. Get out now before you get hit by a wet cat!” —Rasmus Lerdorf
Could this “ugly, but extremely functional and convenient” web language have been built to solve anything other than the “ugly web problem”?
Bellefield Tower
One block from where my mother used to work, on the corner of Fifth Avenue and Bellefield in Pittsburgh, stands a strange sight. A very modern building wraps around in a Jobsian-loving rounded rectangle, narrowly avoiding a gothic Romanesque tower a century its senior. An uglier and more out-of-place architectural juxtaposition I have never yet seen.
If you weren’t in Pittsburgh in the late 1980’s you wouldn’t have understood how this could have happened. On this ground once stood the original Bellefield church built in the 1880’s. Since its congregation had been moved further down Fifth Avenue, the building was sold a century later to developers trying to exploit the new business attracted by the Pittsburgh Supercomputing Center and the joint CMU/Pitt software building. They wanted to level it and build a new building, but were blocked when people mobilized to save the old tower. The developer then proceeded to honor this by demolishing everything but the tower and building the ironically-named “Bellefield Towers” next to it.
You can see the current Bellefield Presbyterian Church as a common example of the gothic architecture of the area. You can also note Carnegie Library of Pittsburgh and the Cathedral of Learning, both next door, both reflecting the gothic Romanesque architecture, and both figuring prominently in iconic photos of the most famous game in baseball.
Why is Bellefield Towers so obviously ugly? The old Bellefield Church tower stands next to Bellefield Towers with a “sawed-off” quality to it. The curved modern architecture of the latter serves only to emphasize how it was built with no consideration of the surrounding environment. The Oakland and Shadyside areas of the city that the old Bellefield Church straddled contain many unique examples of this Romanesque gothic architecture. When faced with a gorgeous 100-year-old example of the area’s architecture, instead of working with the environment like Frank Lloyd Wright did with Fallingwater (in the same area of Pennsylvania, no less!), the developer simply sawed it off!
I remember watching it happen, and this (literal) architectural lesson guides me to this day about the follies of architectural hubris in software.
What hubris?
“What hubris?”
Well, have you ever seen developers write code without considering the environment in which the code will live?
I guess my big beef with most frameworks is that they’re often written with no consideration of the environment—that is almost by definition. The best frameworks are ones that are less frameworks than applications which force constraints of an environment.
As Paul mentioned earlier, even if you build it your way and customize the solution for your application, it’s still a framework. But it’s a framework most likely to have at least one successful user… yourself.
“I’m a developer, I can make the software conform to my needs.”
Oh really? That sounds a lot like trying to “lord over the environment with an isolated man-made imposition.”
“But what I mean is it’s all man-made in software, there is no environment.”
You don’t develop in a community, as Chris mentioned earlier? That’s environment. You never took over a project you didn’t write or worked at a company with a pre-existing code base? That’s environment. You never dealt with an installation problem because your host was configured differently than your development environment? That’s environment. You never had business needs trump the little feature creature sitting on your shoulder? That’s environment. You’ve never listened to a user request, as Paul mentioned earlier? That’s environment.
“There is no danger of that environment being different.”
When I joined my current company, they had a couple services written in Java, only Zend Accelerator could code cache their PHP 4 installation, Oracle RAC powered the back-end and development occurred by engineers working in cubes with a relatively heavyweight waterfall development process.
Even though I prefer Python to Java for services, we’ve increased our Java development to around 40% of our code base! Even though I prefer MySQL to Oracle, we still use Oracle as our back-end. Even the transition to the open office occurred after it became apparent the company had outgrown the cube-space.
Why? Because that is the environment and your solutions have to work within that environment. Anything else is architectural hubris.
“But that’s not an architecture decision.”
Let’s say it is the early days of social networking and you join a company that is using Java/J2EE instead of PHP, or Oracle instead of MySQL, or they’re using Perl/Mason instead of your favorite (PHP) framework, as Marco mentioned earlier—there are so many to choose from that the number is second only to Java.
Do you go in and say your experience building a CMS or e-store trumps their experience working on a nascent social network? Do you replace all the Java engineers with PHP ones? Do you replace MySQL with Oracle? Do you rewrite the site from scratch using your favorite framework?
These things may have happened and more.
Maybe if a high-flying startup is attritting many top people, it’s because the new people they’re bringing in are guilty of this same mistake: the architectural hubris of not surveying the existing development before making their decisions. We have jokes in the Valley about the egos of former Apple V.P.’s, or the poor management style of former SGI employees, the wastefulness of former Microsoft execs, and the passive aggressiveness of former Yahoo! vice presidents; there will be jokes about former Googlers, Facebookers, or startuppers.
And those jokes will be deserved.
“So you’re always right?”
I’m not saying that in all these instances these architects shouldn’t have made the decisions they did. I am not qualified to answer that since I didn’t work at these places.
But what I do know is that in the vast majority of cases, people went in without considering the existing environment. I do know the dynamics of a Facebook.com are different from the dynamics of Gamespot.com or Amazon.com. I do know a social network is different from a CMS or e-store. And all these solutions are very different from ones in enterprise.
Like building Fallingwater without getting an adequate survey done, every day people make the mistake of not looking before acting. They try to make PHP look like Java with dollar signs, as Luke mentioned earlier. They expect the environment to conform to their reality so they can lord over it with “some isolated man-made imposition.”
And in those cases, you’re more likely to build a Bellefield Towers than a Fallingwater.
The Golden Gate Bridge
I’ve long since moved from the woods of Western Pennsylvania to the San Francisco Peninsula. I am fortunate that my weekly run passes with a near-constant view of the most recognizable architecture in the American West:
What’s interesting is that there are much longer spans in the country and the world. Even in the same city, there exists a beautiful bridge that is both longer and of more utility. And yet this bridge represents the icon of San Francisco and of the state as a whole.
Why?
I’m not sure, but consider these things:
- The original design was for a hybrid cantilever and suspension structure. But it was replaced with a pure suspension design because the former was deemed too ugly. A pure suspension of this length had never been attempted before.
- Irving Morrow designed the bridge tower, lighting, and pedestrian walkways with an entirely Art Deco decorative influence.
- The bridge was painted in a specially formulated anti-rust paint in International Orange on demand from locals.
Think for a moment about any of those design decisions. Each of them, along with the building of the structure in the first place, was fought as an uphill battle against economists, the rail lines, engineers, the War Department, and others. The Navy alone originally demanded it be painted black with yellow stripes to ensure visibility to passing ships.
Can you imagine that?
I run by or cycle over the Golden Gate Bridge once a week at all times of day in all weather conditions, and, whether seen from the north side or the south, from the east or the west, I’m struck by the salient fact that it is iconic because the rust-colored suspension-only art deco structure is just right for the environment it is in.
The rust-colored paint evokes the hills of Marin to the north as well as the setting sun. It is natural and visible enough to be safe. It becomes an icon. Every week I pass by it and am inspired and thankful I can live in so beautiful a city.
The Design Pattern
To me, the most salient point of a design pattern comes from its original definition. From Christopher Alexander’s book on architecture, The Timeless Way of Building:
Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.
So if Funky Caching is a design pattern, then it too can be used a million times over, without ever doing it the same way twice.
But how are we to know which way to do it? Or even if it is the right pattern to be using in our situation?
The answer is found in how both a house in Pennsylvania and a bridge in San Francisco represent ultimate expressions of architecture. They are wholly appropriate for the environment in which they stand.
When choosing between a singleton and a global variable, which pattern to use is determined by the environment. A cantilever is wholly appropriate to create the floating look of Fallingwater, but that same pattern would have disrupted the naturalness of the Golden Gate Bridge.
So too must the solutions that use Funky Caching (or PHP in general) be wholly appropriate for the problem at hand.
In Rasmus’s original talk, he suggests that this solution can also be used to do the following: search for the closest matching valid URL and redirect, and use the attempted URL text as a DB keyword lookup. We can see php.net’s solution outlined right there!
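The first of those suggestions, redirecting to the closest valid URL, might look something like the sketch below. It is an illustration only: the function name closest_url, the list of valid paths, and the distance threshold are assumptions, and it uses PHP's built-in levenshtein() rather than whatever lookup php.net actually performs.

```php
<?php
/**
 * Return the known path closest to the requested one, or null if nothing
 * is within $maxDistance edits.
 *
 * Hypothetical helper: name, parameters, and threshold are illustrative.
 */
function closest_url(string $requested, array $validPaths, int $maxDistance = 3): ?string
{
    // Before PHP 8.0, levenshtein() returns -1 for strings over 255 characters.
    if (strlen($requested) > 255) {
        return null;
    }

    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($validPaths as $path) {
        if (strlen($path) > 255) {
            continue;
        }
        $distance = levenshtein($requested, $path);
        if ($distance < $bestDistance) {
            $best = $path;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Inside an ErrorDocument handler, usage could be along these lines:
//   $match = closest_url($filepath, $knownPaths);
//   if ($match !== null) { header('Location: ' . $match, true, 302); return; }
```

A linear scan like this is only reasonable for a small list of paths; a site with many pages would more likely push the lookup into its database, which is the second suggestion in the talk.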
At Plaxo, we had the problem where images are stored on the database but need to be generated in multiple sizes and thumbnails and streamed fast to the user. Databases are slow, lumbering stores. The solution was Funky Caching:
Recently, Tagged has run across the very same performance (size and number) issues in JavaScript that Helgi mentioned earlier. The solution: Funky Caching hooked up to a JavaScript compressor powered by a Java service back-end to dynamically concatenate and compress scripts into a unique URL on demand.
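To make the idea concrete, here is a rough sketch of the concatenation half of such a handler. Everything specific in it is an assumption for illustration: the /js/combo/ URL layout, the function names, and the source directory are invented, and the compression step (a Java service in the setup described above) is left out entirely.

```php
<?php
/**
 * Turn a combo URL such as /js/combo/jquery,site.js into a list of script
 * file names. Returns null when the URL does not match the assumed layout.
 */
function scripts_from_combo_url(string $urlPath): ?array
{
    if (!preg_match('#^/js/combo/([A-Za-z0-9_,-]+)\.js$#', $urlPath, $m)) {
        return null;
    }
    $scripts = [];
    foreach (explode(',', $m[1]) as $name) {
        if ($name === '') {
            return null; // reject empty segments such as "a,,b"
        }
        $scripts[] = $name . '.js';
    }
    return $scripts;
}

/**
 * Concatenate the named scripts from $sourceDir, or return null if any
 * of them is missing.
 */
function build_bundle(string $sourceDir, array $scripts): ?string
{
    $bundle = '';
    foreach ($scripts as $script) {
        // basename() keeps a crafted name from reaching outside $sourceDir.
        $file = rtrim($sourceDir, '/') . '/' . basename($script);
        if (!is_file($file)) {
            return null;
        }
        // A trailing semicolon keeps adjacent files from running together.
        $bundle .= rtrim(file_get_contents($file)) . ";\n";
    }
    return $bundle;
}
```

Because the URL itself names the scripts, the ErrorDocument handler needs no other state: it parses the path, builds the bundle, echoes it, and stores it at that same path so the next request is served statically.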
We recently imported 1/64th of our production data over to the staging environment for testing, but the users’ images would take too much time and disk space to import. We could just link the images, but then testers didn’t know which ones they uploaded and which ones were proxied from the live website. The solution was to spend an hour writing a Funky Caching Proxy. If the image was missing, the ErrorDocument handler would try to grab the image from the production website and add a watermark.
Here is the complete code. Since this is only for testing, there is no need to waste disk space by storing the created file. The performance hit of real-time generation of redundant requests is unnoticeable to Quality Assurance.
$watermark = '3129080702_c4e76f71d7_o.png';
$dead_url = 'http://example.com/dead_image.png';

// {{{ start_image($filename, &$data)
/**
 * Creates a gd handle for a valid file
 * @param $filename string the file to get
 * @param $data array the imagesize
 * @return resource GD handle
 */
function start_image($filename, &$data)
{
    $data = @getimagesize($filename);
    if (empty($data)) {
        return null;
    }
    $data['ratio'] = $data[0] / $data[1];
    // getimagesize() reports the type as an IMAGETYPE_* constant, not IMG_*
    // (IMG_PNG is 4 but IMAGETYPE_PNG is 3, hence the PNG mismatch)
    switch ($data[2]) {
        case IMAGETYPE_GIF:
            return imagecreatefromgif($filename);
        case IMAGETYPE_PNG:
            return imagecreatefrompng($filename);
        case IMAGETYPE_JPEG:
            return imagecreatefromjpeg($filename);
        case IMAGETYPE_WBMP:
            return imagecreatefromwbmp($filename);
        case IMAGETYPE_XBM:
            return imagecreatefromxbm($filename);
    }
    return null;
}
// }}}

$requestimg = $_SERVER['REDIRECT_URL'];
if (empty($_SERVER['QUERY_STRING'])) {
    // redirect user to invalid image
    // (tag_http is not defined in this listing; presumably an in-house helper)
    tag_http::redirect($dead_url);
    return '';
}

// grab image to temp {{{
$ch = curl_init($_SERVER['QUERY_STRING']);
$tempfile = tempnam('/tmp', 'prod_remote_');
$fp = fopen($tempfile, 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
// }}}

// configure image and dimensions {{{
$size_data = array();
$im = start_image($tempfile, $size_data);
if (!$im) {
    unlink($tempfile);
    tag_http::redirect($dead_url);
    return;
}
// }}}

// get watermark information {{{
$wm_data = array();
$wm = start_image($watermark, $wm_data);
if (!$wm) {
    unlink($tempfile);
    tag_http::redirect($dead_url);
    return;
}
// }}}

// add watermark {{{
if ($size_data['ratio'] > $wm_data['ratio']) {
    // image is wider format than the watermark
    $new_smaller_dim = $wm_data[0] * ($size_data[1] / $wm_data[1]);
    $dst_x = ($size_data[0] - $new_smaller_dim) / 2;
    $dst_y = 0;
    $dst_w = $new_smaller_dim;
    $dst_h = $size_data[1];
} else {
    // image is taller format than the watermark
    $new_smaller_dim = $wm_data[1] * ($size_data[0] / $wm_data[0]);
    $dst_x = 0;
    $dst_y = ($size_data[1] - $new_smaller_dim) / 2;
    $dst_w = $size_data[0];
    $dst_h = $new_smaller_dim;
}
// GD expects integer coordinates, so truncate the computed values
imagecopyresized($im, $wm, (int) $dst_x, (int) $dst_y, 0, 0, (int) $dst_w, (int) $dst_h, $wm_data[0], $wm_data[1]);
header(sprintf('%s 200', $_SERVER['SERVER_PROTOCOL']));
header(sprintf('Content-type: %s', $size_data['mime']));
// }}}

switch ($size_data[2]) {
    case IMAGETYPE_GIF:
        imagegif($im);
        break;
    case IMAGETYPE_PNG:
        imagepng($im);
        break;
    case IMAGETYPE_JPEG:
        imagejpeg($im);
        break;
    case IMAGETYPE_WBMP:
        imagewbmp($im);
        break;
    case IMAGETYPE_XBM:
        imagexbm($im, null); // null filename streams to output
        break;
}
imagedestroy($wm);
imagedestroy($im);
unlink($tempfile);
With a bit of creativity this concept can apply to modern applications where instead of caching on the filesystem, you cache in memcache; instead of bypassing the application server, you bypass the web servers themselves with a content distribution network; instead of serving static content from the edge, you serve dynamic pages.
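As a rough illustration of the first substitution (a memory cache in place of the filesystem), the sketch below keeps the same shape as the ErrorDocument handler: look for the finished page first, and only generate and store it on a miss. The interface, class, and function names are all hypothetical, and the array-backed store merely stands in for a real memcache client so that the example runs on its own.

```php
<?php
/** Minimal cache contract; a real adapter would wrap a memcache client. */
interface KeyValueStore
{
    public function get(string $key): ?string;
    public function set(string $key, string $value, int $ttl): void;
}

/** In-process stand-in used only for demonstration; it ignores the TTL. */
class ArrayStore implements KeyValueStore
{
    private $items = [];

    public function get(string $key): ?string
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, string $value, int $ttl): void
    {
        $this->items[$key] = $value;
    }
}

/**
 * Return the page for $urlPath, calling $generate only on a cache miss.
 */
function cached_page(KeyValueStore $store, string $urlPath, callable $generate, int $ttl = 300): string
{
    $key = 'page:' . md5($urlPath); // hashing keeps keys short and free of odd characters
    $hit = $store->get($key);
    if ($hit !== null) {
        return $hit;
    }
    $page = $generate($urlPath);
    $store->set($key, $page, $ttl);
    return $page;
}
```

Unlike the filesystem version, this one still starts PHP on every request; what it skips is the expensive generation step. Bypassing the application entirely is what the content distribution network in the paragraph above would add.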
Whether to use it and how to use it is always determined by the environment.
Comments
I hope this tour helps you see software development in a different way: that finding solutions is about using the right solution in a manner that fits with the environment. Even then I realize that we can’t architect structures that work as harmoniously together as a city such as San Francisco:
…but one can always hope. 🙂
Happy Holidays from me and the PHP community to you and yours.
If you would like to comment on this article, you can leave a comment here.
by terry chay
“I drop the F-bomb.”
URL: http://terrychay.com/blog
Location: San Francisco, California
In it’s simple form it’s
No first apostrophe.
or they’re using Perl/Mason instead of your favorite (PHP) framework, as Marco noted earlier, there are so many to choose from that the number is second only to Java.
I’d vote for splitting after the first comma into a second, parenthetical sentence.
Feel free to delete this comment. Great article.
Every time I come to this post, it’s longer.
Amazing commentary Terry.
Corrections made. Thanks everyone. This will appear in the PHP Advent Calendar tomorrow (last entry).
Oh boy, you’ve spoiled my holidays 🙁
Ok Ok, we get it – you hate frameworks (though you have difficulty telling framework from the language it’s written in) and you are an ARCHITECT. Cool.
The first thing you should do to blend better into environment – stop storing images in DB. That’s exactly the case of that ugly tower.
The second thing would be to stop hijacking ErrorDocument. We have RewriteCond -f to check whether the file exists.
Just read it in the PHP Advent Calendar. Great article! I’ll need some more time to think about all the things you wrote, but I’m pretty sure I learned some things. 😉
The difference in quality between this and that CakePHP-centric crap Nate wrote yesterday is day and night. Thanks for ending PHP Advent on a spectacular note!
Great article Terry, the architecture comparisons and the environment topic was very refreshing and inspiring! thanks
@Grinch
Sorry about your Christmas. You didn’t have to read the article.
I have no difficulty in telling a framework apart from a language. I will point out however, that a framework takes on with it the characteristics of the language and language developers. For instance, most PHP frameworks are web frameworks and PHP is a web-focused language.
If you mean the PHP/Ruby/Ruby on Rails debate, I already outlined many times that I dislike the moving target. When I talk about Rails on the web, I get: “That’s Rails, that’s not Ruby” but when these same people brag about Ruby’s popularity on the web, they actually mean Rails. Quid rides? De te fabula narratur.
Not all images are stored in the database. In fact, at Tagged, not a single image is stored on a database. However, at Plaxo, the admin tool to manage ecard content requires the assets be done by a non-developer but the only way of pushing out filesystem changes is through change control and there is little need of user generated content. In fact, outside e-cards, the entire Plaxo address book database (including the thumbnail photo of yourself) is stored in a database. I’m not the architect of that, nor do I see much a problem with it.
What I point out is that you CAN now get away with putting image content on the database again with this trick. That’s pretty impressive because different assets on a site (statics, javascripts, web pages, user data, application, back-end, corporate site, customer support, etc) are usually managed on an ad hoc basis optimized for the task at hand. Not all of them can resort to using a filesystem push or web tier restart.
I realize mod_rewrite is far more flexible. It is also much more complicated to use. It also may not be as available in .htaccess in hosting services. It also would slow down valid requests to the server. All of which gets away from the point. I introduce a pattern and you quibble on implementation religion. Patterns are implementation independent, so this debate distracts from anything didactic. Don’t let your dislike of me personally destroy the basic foundations of fact. 🙁
For instance, the one line you call me out on:
would have to be written your way something like:
And front_controller.php would be similarly more complicated, as it would have to have a switch to handle the valid file cases (or the RewriteCond would have to be a bigger mess than it is).
I end with pointing out that this starts resembling an entirely different pattern than the one I’m discussing. 😀
@Mike, @Anon, @Robert: Thanks! I really appreciate it.
@Anon: Nate did mention some stuff about i18n/l10n that bears repeating. I can’t tell you how many candidates I interview don’t know the difference or give me an incorrect answer. I did try to close this year’s advent by referencing 8 out of the other 23 previous advent entries, but I didn’t see how I could reference/relate Nate’s to this.
Besides, I’m trying to follow the spirit of two of the previous Advent articles (Elizabeth Smith and Andrei) and not piss on the hard work that others do, even if I’m not a fan of CakePHP. 😉
Fantastic article. I’ve created hacky versions of this late in the development process when customer-driven site mutations create problems. Obviously need to plan for it in the beginning 🙂
Just for historical purposes, the capture of the 404 and generating cached pages was an old trick CNET used over 10 years ago (it later became Vignette’s StoryServer). AND simply for the amusement factor, check out http://www.freepatentsonline.com/y2008/0040424.html.
I really enjoyed this article, thank you for taking the time to write it. You don’t see many php articles with this much umph these days. Keep it up, I’m off to scour your past articles now. Thanks again.
It’s great to see some unique content and a good quality blog for once, actually I would be very interested in doing a link exchange with you.
It is still php with php ..