You explain to them that it'll just take a little while for the data to be updated, but the customer doesn't like that answer. The data needs to always be current.
Apparently, you need to flush parts of the cache as new data arrives. Unfortunately, though, you can't, as memcache is a strict key/value store. So you change how you name the cache keys and make them dependent on, say, the max(timestamp) of your hugetable.
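A rough sketch of that key-naming trick, in Python. Everything here is made up for illustration: `FakeCache` is an in-memory stand-in for a memcached client, and `get_max_timestamp` and `run_expensive_query` are hypothetical callables standing in for `SELECT max(timestamp) FROM hugetable` and the slow report query.

```python
import hashlib


class FakeCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def versioned_key(base, version):
    # Fold the table's max(timestamp) into the key, so a new row
    # automatically yields a key that has never been cached.
    raw = f"{base}:{version}"
    return hashlib.md5(raw.encode()).hexdigest()


def cached_report(cache, get_max_timestamp, run_expensive_query):
    # The cheap max(timestamp) lookup still runs on every request,
    # which is why the database is not left completely alone.
    version = get_max_timestamp()
    key = versioned_key("hugetable_report", version)
    result = cache.get(key)
    if result is None:
        result = run_expensive_query()
        cache.set(key, result)
    return result
```

Old entries are never deleted in this scheme. They simply stop being requested and are left for the cache to evict.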
Load goes back up to 2 because all requests now still have to check the table.
But it's still not as bad.
Until the next phone call...
Let's say that hugetable is some interface table filled by a different system you have no control over. You could add a trigger on the database that shells out to some script to clean the cache, but if that external tool adds rows one by one, that's really expensive (aside from the fact that this is NOT what triggers were invented for).
Or the data in hugetable depends on a lot of different components in your application. Then it's really hard to always be sure to invalidate the cache correctly and there's sure to be a location where you'll forget.
In addition, invalidating the cache on write runs counter to the pattern described in the tutorial, which concentrates the caching around retrieval.
Don't get me wrong: I agree with the article. It's just never as easy as these tutorials make it seem.
Huh? You can delete keys in memcached just fine.
> So you change how you name the cache keys and make them dependent on, say, the max(timestamp) of your hugetable.
Or you could use memcached's existing expiration support.
Using memcached's expiration is what the article does. But I was referring to the requirement of real-time data. If the source data changes, you might want to see the new data and not have to wait for the cached data to expire.
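For what it's worth, the two suggestions in this subthread (deleting keys and relying on expiration) can be combined. The following is only a sketch under assumptions: `FakeExpiringCache` is an in-memory stand-in whose `get`/`set`/`delete` methods are modelled loosely on what memcached clients typically offer, and `insert_row` and `run_expensive_query` are hypothetical callables. It also assumes all writes go through your own code, which is exactly what the parent says may not hold.

```python
import time


class FakeExpiringCache:
    """In-memory stand-in for a memcached client with TTL support."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, expire=0):
        # expire=0 means "never expires" in this stand-in.
        deadline = self._clock() + expire if expire else None
        self._data[key] = (value, deadline)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


REPORT_KEY = "hugetable_report"


def read_report(cache, run_expensive_query, ttl=300):
    # Read path: cache around retrieval, with a TTL as a safety net
    # for writes that bypass write_row().
    result = cache.get(REPORT_KEY)
    if result is None:
        result = run_expensive_query()
        cache.set(REPORT_KEY, result, expire=ttl)
    return result


def write_row(cache, insert_row, row):
    # Write path: store the row first, then drop the stale entry so
    # the next read recomputes the report.
    insert_row(row)
    cache.delete(REPORT_KEY)
```

Writes made by a system outside your control never reach `write_row`, so for those the data is only as fresh as the TTL allows.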
Although "One day the Sysadmin realizes" is pretty bad. They should have daily trends showing them it's coming days before it actually happens (unless it was some big release / marketing day).
-Terje Mathisen