The question came up in conversation today of whether there are other Ruby on Rails sites bigger than Twitter. The answer is yes.
I get a lot of mileage out of ripping on Rails: besides being easy, it’s also quite fun, and I’m always game for a cheap laugh. (I’ll get some more blog mileage out of it this summer.) But it is important to remember, the next time you go Fail Whaling, that there are other Rails sites out there, some of which you’ve heard of and use.
Some of them even run.
Scribd
Read correctly, this graph basically says that Scribd is easily the largest Rails site on the internet, and that at least three others have traffic parity with Twitter.
Scribd hosts a presentation they made last year on how they scale Rails. It’s worth a look, though I’m curious what their numbers are today, and what counts as a “request.” The need (at the time) for only a single web server had to do with the mostly static, segmented nature of their problem area. Remember: many CMS-style news sites reached very large traffic levels before memcache even existed. I’m also very curious why their app server is disk-IO-bound. The section about analytics is quite humorous in its naïveté.
In spite of the caveat about the problem space, it’s a pretty good job to do all that with three people. It reminds me of when I was shocked to find out Cal built Flickr on his laptop (buy his book; see the review).
Yellow Pages
YellowPages.com made a high-profile switch from J2EE to Rails last year. The cynical me notes that the only thing that gets beaten on more than Rails in my universe is J2EE. It’s impressive that they handle 23 million visitors a month, which puts them at around half a Tagged by dead reckoning. Of course, if you are counting dynamic page views, we do more in a day than they do in a month, so I guess it depends on how you count. Another cynical thought: even with improvements like memcache that have come out since YellowPages was first built, they still had to write some binary-level code for Ruby and increase the server count.
But I’ve always said re-architecture midstream is one of the hardest things to do, so the team there deserves props, because they managed to change both the architecture and the app platform midstream with nary a hitch.
Hulu
I love Hulu. They’ve had phenomenal growth that is very deserved.
Of course, the cynical me notes that it’s no YouTube in traffic or even in community structure. And YouTube managed to migrate from PHP to a pure Python implementation fairly easily. It makes me think that there really isn’t much of a challenge in building a mostly static, high-bandwidth content-delivery website that gets huge traffic, on any platform. Yeah, I’m looking at you, FunnyOrDie… and Scribd… and Yellow Pages. 😉
Still, I love Hulu, if only to get my BSG fix. You should join.
Justin.TV
One Friday last year, I cut across Washington Square to drop off a letter on my way to work. I passed a group of people starting a BBQ and stared at one of them, who stared back at me. When I got to work, before we headed to our 50-millionth-user party, I asked a co-worker, “Hey, that guy you mentioned who films himself 24/7: is he Asian, with a camera on his hat and a backpack?”
“Yeah, I think so.”
“Then, I think I saw him on my way to work.”
That’s the first time I met Justin of Justin.TV.
They’ve since moved from the hellhole in North Beach they used to inhabit to posher digs, and their traffic is also respectably within the realm of Twitter’s.
Another reason they deserve mention is that, while they are one of those high-bandwidth, content-driven sites, Justin.TV stands out from the others above in two key ways: live streaming and a large social-networking component. I am mildly curious how much of the live-stream and live-chat architecture is powered by Rails, but not enough to bother asking at a party. Besides, I have a feeling I already know the answer.
But it does go to show that the lines between “content management” and “social networking” (the lines between “data-driven” and “user-driven”) are blurring.
Or maybe it’s that social networking, like content management, is becoming old hat, and its problems are now solved problems.
And that means it soon won’t matter what language you choose or what architecture you use.
Anyone know of any good memcache bindings for HOMESPRING? 🙂
I think most of these just go to show that the problem with Twitter is not Rails but their write-heavy architecture, small team, and tiny infrastructure.
Scribd serves lots of medium-sized files in a weird Flash interface.
Yellow Pages serves lots of mostly-static content.
Hulu and Justin.tv serve streaming video. Surely their backends are not Rails, and neither is Twitter’s, for that matter, or so it would seem.
Don’t get me wrong: Twitter is great anti-Rails fodder, but their real problem is [hundreds or thousands of] thousands of writes per day, and proportionally more reads. Traditional technologies are read-centric.
Twitter, Facebook, Tagged. Those are all write-heavy sites, and the others in your list are read-heavy.
S
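Here is a toy model in Python that makes the read-heavy versus write-heavy point concrete. The `CacheAside` class and the `run` helper are hypothetical and are not anyone’s production code. The model assumes a simple cache-aside layer, memcache-style, in which every write invalidates the single cached key that readers want. Under that assumption, caching pays off in proportion to how many reads land between writes.

```python
from dataclasses import dataclass, field


@dataclass
class CacheAside:
    """Toy cache-aside store: reads try the cache, writes invalidate it."""
    db: dict = field(default_factory=dict)     # stands in for the database
    cache: dict = field(default_factory=dict)  # stands in for memcache
    hits: int = 0
    misses: int = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.db.get(key)   # the expensive trip to the database
        self.cache[key] = value    # populate the cache for later readers
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # drop the now-stale cached copy

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(reads_per_write, rounds=100, key="page"):
    """Each round: one write, then `reads_per_write` reads of the same key."""
    store = CacheAside()
    for i in range(rounds):
        store.write(key, i)
        for _ in range(reads_per_write):
            store.read(key)
    return store.hit_rate()


print(f"read-heavy  (100 reads/write): {run(100):.2f}")  # 0.99
print(f"write-heavy (1 read/write):    {run(1):.2f}")    # 0.00
```

With one read per write, every read follows an invalidation and misses. With a hundred reads per write, only the first read after each write goes to the database.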
Alexa is probably the only thing you can use, but it’s almost certainly wrong, because it only covers web site traffic.
Twitter folks have told me, and I believe them, that the web site accounts for only a small fraction of their traffic.
And, as Sean said, the architectural requirements for Twitter are far different. Rails plays to the needs of the other sites you mention here. Solving the problems of Twitter with *any* language/framework would not be easy, and requires expertise outside the norm for the web app industry.
I was going to come in and say the same thing Ed said.
Alexa (and other sites like Compete) is usually way off, especially for tech sites. But considering these are all tech sites, they’re probably all way off by a similar factor.
I’ve also heard multiple times that API traffic is somewhere on the order of 10x (and possibly more) their web site traffic.
I forgot that Yellowpages is actually not tech-related, so it’s probably vastly over-represented on Alexa. Perhaps Hulu too, to some extent.
I renormalized the Alexa traffic based on Tagged’s internal numbers; Tagged is a write-heavy, Ajax-driven site, as are all social networks. In the comments I noted the differences between content-driven and user-driven sites.
As for Twitter’s traffic being mostly non-web: how much? 10x? That’d put Twitter at parity with Yellow Pages. I actually think it’s more like an additional 1x (2x total), and reading between the lines of this article makes me think so.
http://www.techcrunch.com/2008/04/29/end-of-speculation-the-real-twitter-usage-numbers/
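The 10x and 2x figures in this thread read the multiplier differently. Here is a quick Python sketch of the arithmetic. It assumes the multiplier means API traffic measured as a multiple of web traffic. The function name is hypothetical, and web traffic is normalized to 1 rather than taken from any real measurement.

```python
def total_traffic(web, api_multiplier):
    """Web-only traffic (what Alexa can see) plus API traffic at
    `api_multiplier` times the web volume (an assumed reading)."""
    return web * (1 + api_multiplier)


# Web traffic normalized to 1.0; no real numbers are assumed here.
print(total_traffic(1.0, 1))   # 2.0  -> "an additional 1x (2x total)"
print(total_traffic(1.0, 10))  # 11.0 -> "10x more" than web means 11x total
```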
I’ll blog again in the future about why Twitter’s problems are related to Rails. 🙂
While I haven’t seen the internals of Twitter’s architecture, I tend to think Russell Beattie’s blog post probably has some truth to it:
http://www.russellbeattie.com/blog/let-the-microblogs-bloom
Sure, it’s fun to blame Rails and it probably accounts for a large portion of their problems (if not directly, indirectly for sure), but I am guessing that isn’t the only big problem.
One other tiny beef: sure, DB session storage would never be my first choice by any stretch of the imagination, but I have seen far worse solutions used. I’d much rather an inexperienced dev use that (especially since switching to something better should take ten seconds if the DB implementation was done right) than some hacked-up solution that makes my stomach turn just thinking about it.
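The “ten second switch” holds when session storage sits behind a small interface. Below is a minimal sketch in Python rather than Ruby. It uses hypothetical `DBSessionStore` and `CacheSessionStore` classes, not Rails’ session API. `sqlite3` stands in for the app database, and a plain dict with expiry stands in for memcache.

```python
import sqlite3
import time


class DBSessionStore:
    """Sessions kept in a database table (sqlite3 stands in for the app DB)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)"
        )

    def get(self, session_id):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None

    def set(self, session_id, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, data),
        )
        self.conn.commit()


class CacheSessionStore:
    """Sessions kept in a process-local dict with a TTL (a memcache stand-in)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.entries = {}

    def get(self, session_id):
        entry = self.entries.get(session_id)
        if entry is None:
            return None
        data, expires_at = entry
        if time.monotonic() > expires_at:  # expired: evict and report a miss
            del self.entries[session_id]
            return None
        return data

    def set(self, session_id, data):
        self.entries[session_id] = (data, time.monotonic() + self.ttl)


def handle_login(store, session_id, user):
    """App code only calls get/set, so it never cares which store it has."""
    store.set(session_id, user)
    return store.get(session_id)


store = DBSessionStore(sqlite3.connect(":memory:"))
# store = CacheSessionStore()   # the "ten second switch"
print(handle_login(store, "abc123", "alice"))  # alice
```

Because `handle_login` only calls `get` and `set`, changing backends touches one line of setup rather than the application code.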
You’ve seen far worse solutions for sessions used back in 2002, before memcache existed.
If you choose a framework that is built around an Active Record hammer, then your problems all look like nails. So yes, Rails is to blame for Twitter’s downtime today, since it was the prefab house they started with and the usage patterns were evident from day one.
I don’t code websites in assembly; I don’t make video games in php-gtk.
Clint, Ed,
Maybe we should make a rule that anyone who doubts the Alexa numbers has to cough up his Hitwise login and password to prove the blogger wrong. It’s only fair. No Comscore numbers, please.