At Tagged, I’ve been working on a caching project for the last month. Basically, it lets us cache dynamic parts of pages throughout the site while keeping the cached data from going stale. Since we call our APIs both externally and internally (over 20 calls go into generating a user’s profile page alone), the caching is turned on at the API level.
The initial hit to the system was severe.
Basically, I was causing the profile page to take an extra 30 milliseconds to render. The profile page accounts for 17% of all page views on a site where we do about 220 million “views” a day. Those 30 milliseconds, believe it or not, were dropping profile page views by 5%. It took about 20 hours of back and forth before I finally resolved to rewrite the whole thing so that it could be rolled out gradually without any performance hit.
That means I cost the company one and a half million ad impressions. 🙁
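The back-of-the-envelope math behind that loss, using the numbers above (the gap between roughly 1.87 million lost views and 1.5 million impressions comes down to how many ads a profile view actually serves, which I’m not modeling here):

```python
# Rough arithmetic for the traffic lost to the extra 30 ms, using the
# figures from the post; the impressions-per-view ratio is not modeled.
total_views_per_day = 220_000_000   # site-wide page "views" per day
profile_share = 0.17                # profile page's share of all views
drop = 0.05                         # view drop caused by the extra latency

profile_views = total_views_per_day * profile_share  # ~37.4M profile views/day
views_lost = profile_views * drop                    # ~1.87M views/day

print(f"~{views_lost:,.0f} profile views lost per day")
```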
The piece I cached on the second day is actually measurable on the live site with tools I wrote: I am saving between 17 and 900 milliseconds, depending on the state of the backend and the load on the server.
Since I can’t measure how many actual page views I’m adding back into the system, I was curious how much extra server capacity these changes are buying me.
In other words, how much is a millisecond?
If I save a millisecond on the profile page, that millisecond is saved on each of roughly 37,400,000 daily profile views (17% of 220 million), which works out to about 37,400,000 CPU-process-milliseconds over the course of the day, or about half a CPU-process-day. During peak hours we run about 50 processes per machine.
Dividing that half a CPU-process-day across the 50 processes per machine means a millisecond saved is worth at least a hundredth of a machine at capacity.
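Spelled out as code, using the numbers from the post:

```python
# One saved millisecond on the profile page, in CPU-process time and machines.
profile_views_per_day = 220_000_000 * 0.17   # ~37.4M profile views/day
ms_saved_per_view = 1                        # the hypothetical 1 ms saving

cpu_ms_saved = profile_views_per_day * ms_saved_per_view
cpu_process_days = cpu_ms_saved / (1000 * 60 * 60 * 24)  # ms in a day

processes_per_machine = 50   # peak-hour process count per machine
machines_freed = cpu_process_days / processes_per_machine

# prints: 0.43 CPU-process-days ~ 0.0087 machines
print(f"{cpu_process_days:.2f} CPU-process-days ~ {machines_freed:.4f} machines")
```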
I estimate the new caching system, rolled out gradually, will be like adding five machines to the server pool each day, which adds capacity about 5x faster than our growth rate.
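As a sanity check on that figure, we can invert the hundredth-of-a-machine estimate; the simplifying assumption (mine, not a measurement) is that all the savings land on profile-page renders:

```python
# Invert "1 ms saved ~ 1/100 of a machine" to see how many saved
# milliseconds per view "five machines a day" implies.
machines_per_ms = 0.01          # the post's per-millisecond estimate
target_machines_per_day = 5

ms_needed = target_machines_per_day / machines_per_ms
print(f"~{ms_needed:.0f} ms of savings per profile view")  # ~500 ms
```

That figure of roughly 500 ms sits comfortably inside the 17 to 900 ms range I measured for the second day’s cached piece.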
I have enough caching projects to keep doing this for the rest of the month. 🙂