Mr. Hansson doesn’t get to shart on sharding

(A draft of this article appeared on Wednesday because I hit the wrong button on WordPress. I apologize for the confusion it may have caused. What can I say except, “Freedom is messy.”)

This morning Andrei sent me an article from David Heinemeier Hansson titled, “Mr. Moore gets to punt on sharding.”

Since Andrei and I work at pretty well-trafficked websites which couldn’t operate without the very thing David is advocating against, normally I’d just laugh at the naïveté in his observations—it’s been eight years since the Internet goldrush and all that’s happened is that a new generation is repeating our mistakes and rationalizing the inevitable fail that ensues.

But there are tons of people who quote David Heinemeier Hansson’s words to me at conferences and on the blogs. For every speaking engagement in which I’ve saved someone from a huge architectural misconception, Mr. Hansson has indoctrinated ten more future programmers who will make that same mistake. Like a glacier during global warming, I move forward one inch during the winter and retreat a foot during the summer.

If I don’t do something about this… well someone’s gotta think about the polar bears?

[Photo: Su Lin, San Diego Wild Animal Park, Escondido, California. Nikon D3, Nikkor 70-200mm f/2.8G VR.]

Okay, this is not a polar bear, but I couldn’t get a good photograph of one. This is a different bear similarly endangered due to habitat destruction.

No, Mr. Hansson doesn’t get to shart on sharding. I’m going to Bush Doctrine it before I see this shitfart come out of the mouths of any of my colleagues.

Before ranting, it is best if we review what we’re really talking about, lest we start confusing sharding with sharting. And if we can’t explain this in a manner that non-programmers can understand, then we’re liable to hide behind terminology and quotes from famous computer scientists when confronted with facts.

A short primer on databases

The job of the database for a web application is rather simple: it’s where you put all the data the website needs to survive a power failure—it’s the web equivalent of a filing cabinet.

It was a revelation when I first learned that this filing consisted of basically a bunch of spreadsheets managed by a Chinese Room. These spreadsheet-like tables basically look like this:

user_id | e-mail | password | name                     | birthdate
------- | ------ | -------- | ------------------------ | ----------
1       |        |          | Terry Chay               | 1971-06-09
2       |        |          | Andrei Zmievski          | 1977-11-02
3       |        |          | David Heinemeier Hansson | 1979-10-15
4       |        |          | John Searle              | 1932-07-31
5       |        |          | Gene Amdahl              | 1922-11-16
6       |        |          | David Allen              | 1945-12-28
7       |        |          | Gordon Moore             | 1929-01-03
8       |        |          | Jeremy Zawodny           | 1974-06-04
9       |        |          | Martin Fowler            |
10      |        |          | Donald Knuth             | 1938-01-10
11      |        |          | Don MacAskill            | 1976-02-19
12      |        |          | Chris Shiflett           | 1976-05-21

The complexity—if you can call this triviality complex—comes because there are more columns of data in this table (like gender, zip code, school attended), there are more rows in this table (in this case, more users), there are other tables containing other data, and these tables are linked together (hence the need for a “user_id,” which acts as a social security number for linking to other tables). But that’s it!
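To make that linking concrete, here is a toy sketch in Python (purely illustrative: a real RDBMS is not a pile of dicts, and the posts table here is my own hypothetical addition) of how a second table uses user_id to point back at the users table above:

```python
# The users table from above, keyed by user_id (e-mail/password omitted).
users = {
    1: {"name": "Terry Chay", "birthdate": "1971-06-09"},
    2: {"name": "Andrei Zmievski", "birthdate": "1977-11-02"},
}

# A second, hypothetical table; each row links to a user via user_id,
# the way a foreign key links tables in a relational database.
posts = [
    {"user_id": 1, "title": "Mr. Hansson doesn't get to shart on sharding"},
]

def posts_with_author(posts, users):
    """Resolve each post's user_id against the users table (a toy join)."""
    return [
        {"title": p["title"], "author": users[p["user_id"]]["name"]}
        for p in posts
    ]
```

The user_id is doing exactly the social-security-number job described above: it is the only thing the posts table needs to know about a user.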

Now the “web problem”—if you can call this a problem—is that this Chinese Room, known colloquially as a Relational Database Management System (RDBMS), has poor performance and, due to a quirk in Amdahl’s Law, this is the bottleneck of the entire web application.

When the bottleneck ends up being an issue for our customers, we say we have a “performance” problem. When this performance problem worsens as our application gets bigger and busier, we say the performance problem is one of “scalability.”

This happens a whole lot.

Which explains why so many of us can remain gainfully employed.

A short primer on database scaling

Now, when faced with this scalability performance problem in our relational database, there are a number of possible solutions, each with its own terminology.

First, we can ask whether we should be storing this data at all. I like to call this the David Allen approach to database scalability. The reality is that even if you really should be trashing the data, a convoluted business case will be made to keep it.

Next, we can ask whether we should be storing it in the Chinese Room at all. This is data partitioning. Believe it or not, this is actually a common solution. Web log files are handled independently of regular OLTP-centric databases. Multiplayer online video games have high performance needs that force them to use large memory-based software with simpler flat files as non-volatile storage. Many search engines deal with so much throughput that they opt to merge the data and the work and then distribute this among many nodes via MapReduce. No major website could operate without some sort of simpler, faster unit to assist in reading—memcached being a popular solution today, Berkeley DB being one from a kinder, gentler Web era. Still, for most of the “traditional web,” the relational database forms both the linchpin and the bottleneck.

Next, we can buy a better, faster Chinese Room. This is called vertical hardware scaling. One of its biggest problems is expense, because of the law of diminishing returns—twice the money won’t buy you twice the performance. The other is the ugly reality that there is a strict upper bound on performance—the fastest computers are only so fast at any given moment in time. This is the solution David Heinemeier Hansson advocates in the article. It is done by nearly everyone, to differing extents.

Next, we can buy more Chinese Rooms and have them act as a single big Chinese Room. This is called database clustering. The biggest problems here are that these things are expensive and difficult to manage, and that the aforementioned Amdahl’s Law means we get diminishing returns as the cluster grows. Surprisingly, despite these disadvantages, I’ve seen a number of websites (including Tagged at one time) try out this solution before abandoning it for the same reasons they abandoned vertical hardware scaling. In my more cynical moments, I say that this stupidity explains why Larry Ellison is so rich.

Finally, we can split the data set across multiple Chinese Rooms. This is called database partitioning and there are two ways of doing this.

One way is to split the columns. For example, say the first Chinese Room works on username and password (logging in) and the second Room works on the rest of the stuff. This is called vertical partitioning. It is often unattractive because the dynamics of the two rooms aren’t comparable. By this I mean there is no guarantee that splitting the amount of data evenly between two databases in this manner will ensure the work the data has to do is evenly split. Even if you could do that now, there is no guarantee that the dynamics will remain the same going forward. It is also non-trivial to write an algorithm to automatically partition the data in this manner.
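At its core, vertical partitioning is just a rule mapping columns to rooms. A minimal sketch (my own illustration; the split and the database names are hypothetical, not anyone’s actual setup):

```python
# Vertical partitioning: route columns of the users table to different
# databases. The login/profile split and database names are hypothetical.
LOGIN_COLUMNS = {"user_id", "e-mail", "password"}

def database_for_column(column):
    """The first Chinese Room handles logging in; the second, everything else."""
    return "login_db" if column in LOGIN_COLUMNS else "profile_db"
```

Note that nothing in this rule guarantees the two databases see comparable load, which is exactly the objection above: the split is by column, not by work.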

An example of this sort of partitioning: there is a part of Tagged that accounts for around 40% of the overall website traffic and yet only operates on two tables, so vertical partitioning was a sound strategy for improving performance.

The other way is to split the rows so that, say, the first Chinese Room works on the even-numbered users and the second room works on the odd-numbered ones. This is called horizontal partitioning. The term sharding is also used, since a particular worker in a horizontal partition is called a “shard”—but there is some confusion because “sharding” is also used in horizontal data partitioning: Google refers to their map-reduce workers (above) this way. Terminology aside, this is pretty much the accepted way of dealing with “the web problem” because, unlike the other solutions, it has nearly no scaling upper bound, the problem is homogeneous, and a partitioning algorithm is easily written—in the example I gave, it’s even-odd. Remember these three advantages as they are all important.
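The even/odd split really is a one-line algorithm. A sketch (mine, not anyone’s production code) that also shows why generalizing from 2 shards to N is trivial:

```python
def shard_for_user(user_id, num_shards=2):
    """Pick the shard (Chinese Room) responsible for a user.

    With num_shards=2 this is exactly the even/odd split: even user_ids
    land on shard 0, odd ones on shard 1. Growing to N shards uses the
    same modulo rule, which is why the partitioning algorithm is
    "easily written."
    """
    return user_id % num_shards
```

The homogeneity advantage is visible here too: every shard runs the same schema and the same code; only the user_ids differ.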

Tagged, for instance, has one database with 64 logical partitions divided across 16 machines running Oracle (8 “Chinese Rooms” with an equal number of backups). Facebook uses thousands of MySQL machines and eBay uses thousands of in-memory MySQL servers backed by over 15,000 disk-based Oracle machines.
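The reason for 64 logical partitions on far fewer physical machines is flexibility. This sketch (my own, assuming a simple modulo mapping rather than whatever Tagged actually runs) shows how logical partitions can be remapped onto more machines without re-partitioning the data itself:

```python
NUM_LOGICAL = 64  # fixed number of logical partitions, chosen up front

def logical_partition(user_id):
    """A user's logical partition never changes."""
    return user_id % NUM_LOGICAL

def physical_host(user_id, num_hosts=8):
    """Map the logical partition onto one of the physical machines.

    Growing num_hosts (say 8 -> 16) moves whole logical partitions
    between machines, but no row ever changes its logical partition.
    """
    return logical_partition(user_id) % num_hosts
```

Because data is addressed by logical partition, adding hardware is a matter of moving partitions, not rewriting the partitioning scheme.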

This last scaling strategy is the solution that DHH feels is unnecessary for his company (and yours) because of Moore’s Law.

What Mr. Moore really said

Moore’s Law should have been called Moore’s observation.

The history of switching is an odd one. At one time, all switching was done by humans, and this was fast becoming infeasible: demand for telephone switching was soon to reach a point where every woman in America would have to be employed as a telephone operator to satisfy American demand for telephone service.
[Photo: telephone operators]

The vacuum tube solved that problem, at least temporarily, until the much smaller transistor came along. This was quickly miniaturized into the integrated circuit, or “chip.”

What Moore observed was that the density of switches on integrated circuits seemed to have doubled every two years since 1958.

Such phenomenal exponential growth over such a long time is very rare indeed.

Things that could tie their business to this growth curve could really benefit.

The internet happens to be one of those things.

Jeremy’s observation

When you cut through all the smoke and mirrors, the meat of the “Mr. Moore gets to punt on sharding” article is that Mr. Hansson doesn’t have to deal with any other forms of database scaling because he won’t have to double in size more than once every two years, at which point he can just buy a bigger computer.

Jeremy Zawodny took exception to this belief.

David Heinemeier Hansson, for those of you who don’t know, is a partner at a small web-based software company that puts out four software products in the individual and small team productivity space. At current count, their sites have an Alexa traffic rank of #14,340 where it reaches 0.00063% of the global internet, who view about 5.8 pages per day there.

Though we’ve crossed paths a number of times, I’ve never actually met him. All politics aside, he might be a fun person to have a drink with.

He’s also the inventor of a Ruby framework called Ruby on Rails and, from its position as the media darling of the Web 2.0 world, he has great influence over a great many programmers.

Unfortunately, I’ve actually met many of these programmers. They’re fun people to have drinks with right until the moment they lecture me on how I’m wrong and DHH is right.

(That’s the politics part.)

Jeremy observes a number of logical flaws in the argument:

  1. Moore’s Law is about transistor count doubling, not performance doubling—the latter only loosely tracks the former.
  2. There are diminishing returns with hardware at any given time. This means the system cost isn’t linear with this form of scalability.
  3. He’s assuming that his problems are, and will remain, RAM-limited. There are other components (like clock speed) which may become limiting factors and are uncorrelated with Moore’s Law.
  4. Even if the problem stays RAM-limited, RAM performance does not scale linearly with the amount of RAM.
  5. The particular “Chinese Room” 37signals is using (MySQL) is particularly vulnerable to these bottlenecks. Jeremy happens to specialize in MySQL performance and scalability.

Jeremy’s right, of course. Projecting out based on Moore’s law is highly optimistic at best, and company-killing at worst.

Jeremy Zawodny is a database specialist, first at Yahoo and now at Craigslist. Yahoo has a traffic rank of #2 where it reaches 27% of the global internet who view an average of 12.89 pages per day there. Craigslist has a traffic rank of #33 where it reaches 1.5% of the global internet who view an average of 20.1 pages per day there.

I’ve gotten drunk with Jeremy a couple of times. I’m certain he’s a fun person to get drunk with.

Now, before you jump to the conclusion that I’m about to engage in an ad hominem, I’m just observing that when you work on a site that does 8000x the traffic (and worked at a company with 100,000x the traffic) and this area is your area of expertise, then perhaps you have a little more experience than someone who has neither the traffic nor the expertise.

You don’t have to be drunk to see the wisdom in that. ;-)

Moore is a red herring

Even if we ignore Jeremy’s arguments, David’s advice is extremely bad.

I have the equivalent role to David’s at my company, which has a traffic rank of #75, where it reaches 0.72% of the global internet, who view an average of 13 pages per day there (around 4000x the traffic of David’s).

This modest statistic maps onto real numbers of 5 billion pages per month generated by 90 million registered users. At one time (two years…or one Moore’s Law doubling…ago) those numbers were less than half a billion (1/10th) and 20 million (1/5th) respectively.

I’ll add that the growth didn’t occur smoothly either: the bulk of the page growth (8x) occurred in the last 10 months of this year and the bulk of the user growth (4x) occurred during three months in the year previous.

And while those numbers are large enough to be pretty remarkable, neither percentage is all that unusual. In fact, in the space I am in, Jeremy is in, and David is in (known colloquially as Web 2.0), that sort of growth is so ubiquitous that we have a term for it when it happens: the “hockey stick.”

David’s rationalization

David’s defense is to quote Martin Fowler’s First Law of Distributed Object Design:

Don’t distribute your objects!

As commenters have noted, this is just an application of this famous quote:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
—Donald Knuth, “Structured Programming with go to Statements”, 1974

This philosophy has been encased in the programming maxim: “You ain’t gonna need it.”

Let’s stop right there.

Where has the ability to think critically gone?

Defining things ex post facto

A very popular “Fast Company” business book is The Innovator’s Dilemma. The core thesis is that good companies fail, not because of bad management, but because they fail to capitalize on “disruptive technologies.”

It’s a most amusing book.

The natural question is: what defines a disruptive technology?

Basically it’s a technology that is inferior in price-performance to the existing technology that it ends up (over time) replacing.

You can imagine people reading that book and Only the Paranoid Survive and then throwing good money after bad with the thinking that whatever worthless thing they’re investing in is going to be “disruptive.”

Oh wait, that already happened!

We called that the Internet Bubble.

The fundamental problem with Clayton’s definition is that a great many bad technologies look “disruptive” right until the moment they turn out not to be. For instance, the Personal Digital Assistant (PDA) looks disruptive—smaller computers replace bigger ones: laptop over desktop PC over workstation over minicomputer over mainframe. And you can fund Microsoft R&D or speculate on shares in Palm right up to the moment the BlackBerry/iPhone comes along and wipes the entire market out.

The problem, you see, is that a technology isn’t “disruptive” until it actually is disruptive.

The same concept can be applied to “premature optimization.”

Or, as Chris Shiflett likes to say:

Premature anything is bad.

Forgetting large efficiencies

It’s ironic that the people who are fond of quoting Donald Knuth are also fond of forgetting the first half of the quote: “We should forget about small efficiencies.” What they forget is that horizontal scaling is not a small efficiency; it’s a very, very large one.

As mentioned earlier, horizontal scaling has no upper bound on scale. This is because it is a form of shared-nothing architecture and achieves the same benefits as one. By this, I mean that if the database is the bottleneck, Tagged’s eight machines improve performance by very nearly a factor of 8. Similarly, eBay improves its performance by a factor of 15,000 over the single-machine case!

(That sort of scalability is the reason we architected it this way in the first place.)

In particular, when we are talking about web application development, we know we will be bottlenecked at the database. Ask yourself: have you ever run into a single website that wasn’t bottlenecked at the database?

Is it really premature optimization if you know you’ll have performance and scalability issues in this area before you even start writing? To an experienced web developer building a consumer-facing web application predicated on viral growth, saying “no thank you” to horizontal partitioning is like saying you aren’t going to need a lifeboat when the Titanic is sinking—that’s fine until it actually sinks. How many architects wish they had waited to partition their database? (Answer: none.)

“You ain’t gonna need it” unless you do.

Just because DHH said that you probably don’t need to shard, just because Martin Fowler said you shouldn’t distribute your objects, just because Donald Knuth said that premature optimization is bad, doesn’t mean that the argument is sound.

That’s the very definition of ad hominem.

The (real) ad hominem

Many people have used this argument: “My site isn’t as big as Tagged or Digg or Yahoo! or Craigslist. Their problems are not mine. Why should I have to worry about scalability? Terry, Andrei, and Jeremy are out of touch with the needs of the small developer like David Heinemeier Hansson (and me).”

But that argument is also an ad hominem. Apparently I’m unqualified to talk because I’m “too big.”

Here is an example from Don MacAskill, whom I chose because he puts things much more tactfully than I do:

37signals should count their blessings that they have a profitable, growing business that can ignore sharding for now. That sounds like heaven. :)

I think [DHH makes] a very valid point by saying “very few applications actually need to scale.” Those of us working on applications that do need to scale often forget that the vast majority don’t.

Don’s site is SmugMug, which has an Alexa traffic rank of #1966, where it reaches 0.041% of the global internet, who view about 6 pages per day there (about 70x David’s 37signals sites).

I’ve made the exact same argument as Don, but the more I think about it, the more I think it is a bad argument.

Getting it.

Even if Moore’s Law were some silver bullet then, at best, an application near load can only double in size once every two years. In reality, a majority of applications in our space—by this, I mean Web 2.0 apps—do need to scale, as they are consumer-facing and have hockey-stick/viral growth.

The very fact that DHH is considering the problem implies that his servers are running “hot”—at 70% of capacity or more. This means he has a performance problem. If this occurs because his site is growing, then this problem is one of scalability.

It may seem that I don’t “get it” because the site I (currently) work on is very large, but I do “get it.”

If the site is a vertical or an enterprise app, You Ain’t Gonna Need It, and therefore horizontal partitioning is premature optimization. An example is an enterprise application in which a Ruby/Rails system is slowly supplanting an existing Java/J2EE one. Another example is building a course registration application for a university: attendance at your university doesn’t double overnight!

But just because DHH hasn’t faced the problem doesn’t mean that the majority of Web 2.0 applications out there don’t double in size faster than every two years. If the site is a consumer-facing, viral-growth-based company, you are definitely going to need it, so it’s not premature to buffer against explosive growth; it’s common sense.

A site based on the viral Web 2.0 model inherently runs on a hockey-stick growth curve. This means you need it to be able to double in size in less than a month—that has happened at least four different times in Tagged’s history, and every account of every successful Web 2.0 company includes similar growth doublings occurring faster than the two-year timescale provided by even the most generous reading of Moore’s Law. Running your servers hot in the face of this model is irresponsible.

Missing the curve

In other words, the reason David hasn’t had to deal with the web problem is that 37signals has never once actually experienced viral growth. My guess is that this is because their product is a vertical in which social-networking e-mail spam dynamics aren’t effective.

I’ve used Palm PDAs since the Palm V (1999) and, being a nerd, I’ve used it pretty extensively. I used to say that the problem with Palm application development is if you looked at how people use the Palm, most of them pretty much used just the address book app and memo pad, if they’re really committed they might use the to do list, and that depending on their job description these committed people might use the calendar. The problem is that all four apps are bundled with the Palm. People like me are a tiny minority of a tiny minority.

When you look at 37signals, you see they take the following applications and put them on the web: a contact manager, a project manager/collaborative to-do list, a group calendar, and a group chat system—three of the four applications that relegated PDAs to a niche. That sounds like a vertical market to me.

Another possibility is that their pay-to-use model regulates their growth intentionally or limits their growth unintentionally. Certainly usage of subscription sites is much lower, which is one of the reasons newspapers are in trouble today.

Don MacAskill’s SmugMug is also pay-to-use in a vertical market, but it’s two orders of magnitude bigger. I wonder: if we could peek at the internal traffic numbers of 37signals’s four sites, would we not see missed opportunities for viral growth, triggered by a news item, that were not sustained because the site fell to its knees (a.k.a. was “Slashdotted”)? Is it possible that avoiding scalability is a post hoc rationalization of those missed opportunities?

Whose growth are you?

You can tell from the David Allen reference above that I’ve read Getting Things Done. I can remember one time, six years later, when an executive read it for the first time and bought a copy for all the managers in the office.

“Oh that book!” I said with a smile.

“Why do you look bemused? It’s a great book.”

“Yeah it’s good, but talk to me in six months and we’ll see how many of you have a tickler file.” Of course, none of them did six months later.

By this I mean that we need to be realistic. Is your site more like Tagged or like Basecamp?

Here is the typical argument, directly from David:

“And there are plenty of web applications out there that are much smaller than Basecamp. They should probably worry about getting to the size of Basecamp before worrying about getting to the size of Yahoo. The former is much more likely as well.”

Your difference with Tagged may be one of quantity (raw size), but your difference with Basecamp may be one of quality (different market size and revenue model).

Is your site ad-driven, consumer-facing, broad-appealing, with virally-driven user growth? If so, then you should consider scalability sooner rather than later, as any growth comes with high variance. The only proven scalability that can grow fast enough is a shared-nothing architecture. If you have a relational database, this means horizontal partitioning.

Or is your site in a small vertical? Is the upper bound of scale determined? Is it replacing an existing system that has been running successfully as-is for a long time? Is it not even web-based or consumer-facing? Then perhaps you should wait on scaling: you already know what the existing scaling and growth are going to be. Horizontal partitioning is not a panacea for all performance problems, just scalability ones.

I don’t know the answer to this question. But neither does David Heinemeier Hansson. You, however, do.

The only exception is if your database isn’t running close to single-server load.

XXXX EDITING IN PROGRESS (Ick!) XXXXX

My Moore’s Law crushes your Moore’s Law

sweet spots. Kevin Burton:

Long story short.. if DHH would have sharded his DB now he could have bought 2-64GB boxes or 4-32GB boxes (though with one replica) and saved from 5-20k.

The 4-8GB DIMMs needed to build a 128GB box are NOT cheap. The sweet spot is 1-2GB.

four years: hardware cost and depreciation

cloud computing: http://www.phpcult.com/blog/sharding-startups/

Hidden assumptions and other errors

That I work for a big company. That I don’t know small company needs.
Jeremy writes:

I think there’s some confusion here. Apps that don’t need to scale don’t need to scale. Period. Those folks really don’t need to be in this discussion, aside from any passing interest in “how other people make it work.”

Apps that *do* need to scale shouldn’t rely on Moore’s Law to save their ass. Why? Because the moment you fall behind the curve, the battle becomes dramatically hard to win. In fact, simply “staying afloat” becomes quite challenging.

I’ve seen this first hand more than once or twice. It’s not pretty. Not at all.

That I don’t scale vertically. That I haven’t tried other solutions.

The complaint about schema changes (amusing)

That I never started out sharding. DHH: I absolutely agree that if you’re Yahoo or Flickr or LiveJournal, you’ll need to shard sooner rather than later. But bear in mind that neither Flickr or LiveJournal started life as a fully sharded application. When did they start sharding? When they needed to!…But thinking that you need to shard ahead of time to Play With The Big Boys is just stupid. Chances are you won’t build an application that’s that hot. And if you should be so lucky, you can deal with the problem then. Just like Flickr and LiveJournal and every big site under the sun did. (vs. WordPress)

The RAM argument. The part about it being disk based

What horizontal partitioning looks like

partitioning code

logical vs. physical partitions

handling joins

handling transactions.

Why Rails doesn’t partition

Hibernate is based on Active Record and it has it. SQLAlchemy, etc.

Self-selection, i.e. he self-selects the problems he faces.
I believe someone who chooses Rails self-selects as despising the database.
If there is one lesson from Rails, it’s that it’s not anti-PHP, it’s anti-database.
He thinks the database is messy, so he abstracts his way away from it via ActiveRecord.
Ergo any solution which brings him face to face with the messiness of a database is, by definition, going to be a solution he avoids. Therefore he chooses a pay-first model in order to make his growth deliberately slow.
For instance, most people look at 37signals and say, “How lucky his product (to-do lists, project management) didn’t have viral growth.”
I say, “He chose a niche BECAUSE it doesn’t have viral growth.”
