This is a story of caching

(code.google.com)

95 points · terpua · 15y ago · 18 comments

15 comments · 6 top-level

pilif · 15y ago · 6 in thread

... and then the customer called and asked why the graphs on the front page were wrong even though they clearly just edited hugetable.

You explain to them that it'll just take a little while to be updated, but the customer doesn't like that answer. The data needs to always be current.

Apparently, you need to flush parts of the cache as new data arrives. Unfortunately though, you can't as memcache is a strict key/value store. So you change how you name the cache keys and make them dependent on, say, the max(timestamp) of your hugetable.

Load goes back up to 2 because all requests now still have to check the table.

But it's still not as bad.

Until the next phone call...
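A minimal sketch of the key-naming trick described above, assuming an in-memory stand-in for memcached. `FakeCache`, `max_timestamp`, `expensive_graph_data`, and the `graph:` key prefix are hypothetical names for illustration, not anything from the article. It shows why stale entries stop being served without any delete, and why every request still has to hit the table once for the cheap query.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


# Hypothetical stand-in for hugetable: rows of (timestamp, value).
hugetable = [(100, 5), (101, 7)]
query_count = {"cheap": 0, "expensive": 0}


def max_timestamp():
    # Cheap query, but it still runs on every single request.
    query_count["cheap"] += 1
    return max(ts for ts, _ in hugetable)


def expensive_graph_data():
    # Stand-in for the slow aggregate behind the front-page graph.
    query_count["expensive"] += 1
    return sum(value for _, value in hugetable)


def graph_data(cache):
    # The key embeds the table's newest timestamp, so new rows
    # produce a new key and the old entry is simply never read again.
    key = f"graph:{max_timestamp()}"
    value = cache.get(key)
    if value is None:
        value = expensive_graph_data()
        cache.set(key, value)
    return value
```

Old entries are never deleted here. In a real memcached they would sit around until evicted or expired, which is the price of not having to track key names.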

fliph · 15y ago

Or you could just update the cache when the data changes.

pilif · 15y ago

True, if it's possible.

Let's say that hugetable is some interface table filled by a different system you have no control over. You could add a trigger on the database that shells out to some script to clean the cache, but if that external tool adds rows one by one, that's really expensive (aside from the fact that this is NOT what triggers were invented for).

Or the data in hugetable depends on a lot of different components in your application. Then it's really hard to always be sure to invalidate the cache correctly and there's sure to be a location where you'll forget.

In addition, invalidating the cache on write works counter to the pattern described in the tutorial that concentrates the caching around retrieval.

Don't get me wrong: I agree with the article. It's just never as easy as these tutorials make it seem.
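To make the trade-off in this exchange concrete, here is a rough sketch of updating the cache on write, again with an in-memory stand-in for memcached. All names (`FakeCache`, `rows`, `write_row`, `read_total`) are made up for illustration. It works as long as every writer goes through `write_row`, and it goes stale as soon as something else touches the data, which is the external-system case described above.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


cache = FakeCache()
rows = {}  # hypothetical stand-in for the table: row_id -> value


def compute_total():
    # Stand-in for the expensive query.
    return sum(rows.values())


def read_total():
    # Read path: the usual check-then-fill pattern from the tutorial.
    total = cache.get("total")
    if total is None:
        total = compute_total()
        cache.set("total", total)
    return total


def write_row(row_id, value):
    # Write path: change the data, then refresh the cached result
    # immediately so readers never see the old value.
    rows[row_id] = value
    cache.set("total", compute_total())
```

A writer that bypasses `write_row` (for example, assigning into `rows` directly) leaves the cached total untouched, so `read_total()` keeps returning the old number until something refreshes it.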

xentronium · 15y ago

Best way to deal with cache invalidation is not to invalidate it (smart key names technique). The other way round you may miss some edge-cases and/or add a lot of code smell.

jemfinch · 15y ago

> Apparently, you need to flush parts of the cache as new data arrives. Unfortunately though, you can't as memcache is a strict key/value store.

Huh? You can delete keys in memcached just fine.

> So you change how you name the cache keys and make them dependent on, say, the max(timestamp) of your hugetable.

Or you could use memcached's existing expiration support.
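For comparison, a small sketch of expiration-based caching. Real memcached clients accept an expiry in seconds when storing a value. The `ExpiringCache` and `FakeClock` classes below are hypothetical in-memory stand-ins so the behaviour can be shown without a server. The point is that the cached value stays stale until the TTL runs out, no matter when the source changes.

```python
class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class ExpiringCache:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._data[key]
            return None
        return value


clock = FakeClock()
cache = ExpiringCache(clock)
source = {"value": 1}  # hypothetical stand-in for the database


def load():
    value = cache.get("graph")
    if value is None:
        value = source["value"]
        cache.set("graph", value, ttl=60)
    return value
```

With a 60 second TTL, a change to `source` made right after a `load()` is invisible to readers for up to a minute, which is fine for many pages and not fine for the customer in the story.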

pilif · 15y ago

Yes. You can delete keys. But you need to know the name of the key. You cannot use some kind of tagging ("these keys are related to component A" and later "invalidate all entries related to component A").

Using memcached's expiration is what the article does. But I was referring to the requirement of real-time data. If the source data changes, you might want to see the new data and not have to wait for the cached data to expire.
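One common workaround for the missing tagging is to emulate it with a per-component version number stored in the cache itself. The sketch below is only an illustration under that assumption. `FakeCache`, `namespaced_key`, `invalidate_namespace`, and the `ns:` prefix are hypothetical names, and the cache is an in-memory stand-in.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def namespaced_key(cache, namespace, name):
    # Every key for a component embeds that component's current version.
    version = cache.get(f"ns:{namespace}")
    if version is None:
        version = 1
        cache.set(f"ns:{namespace}", version)
    return f"{namespace}:v{version}:{name}"


def invalidate_namespace(cache, namespace):
    # Bumping the version makes all old keys unreachable at once.
    # Nothing is deleted; the orphaned entries just age out.
    current = cache.get(f"ns:{namespace}") or 0
    cache.set(f"ns:{namespace}", current + 1)
```

Against a real server the bump would more likely use memcached's atomic `incr` command rather than a get followed by a set. The cost is one extra cache round trip per lookup to fetch the version, and if the version key itself gets evicted the counter restarts, so this is a sketch of the idea and not a hardened implementation.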

tomjen3 · 15y ago

So you make the app update the cache when the customer visits - now the newest version is available and it works.

sloak · 15y ago · 2 in thread

The real story is in how they push untested code into production just to see what happens. ;)

bigiain · 15y ago

Heh - 'cause none of _us_ have ever done that, right? <looks around nervously for any colleagues reading>

viraptor · 15y ago

It depends on what they meant by a load average of 20, or specifically, what kind of workload it is. In some cases LA 20 is pretty much standard; in others you cannot even log in to that box anymore. If it was the second case... well - since nothing works anyway, someone might just as well push untested code into production ;)

Although "One day the Sysadmin realizes" is pretty bad. They should have daily trends showing them it's coming days before it actually happens (unless it was some big release / marketing day).

fizzfur · 15y ago · 1 in thread

hehe, all software documentation should come in 3 forms: Reference, Tutorial and Pop-up Book

steveklabnik · 15y ago

_why actually proposed (on a few separate occasions) that there should be more computer books in the 80 page range. More like the Poignant Guide or Nobody Knows Shoes than a dead-tree, slow version of Google.

fliph · 15y ago

For some reason, I started reading the story with the assumption that it was a "don't do it this way" tutorial, and I got very nervous towards the end. ("But that's exactly how I use memcache!")

adamtj · 15y ago

Programmer and Sysadmin were either very lucky, or not working on anything important, or else they would have been fired or gone out of business. You can't just add caching and magically expect things to work. You have to think hard about expiration policies and test to make sure you aren't going to get wrong answers, or else you need to prove that wrong answers are ok.

mikeklaas · 15y ago

"All programming is an exercise in caching."

-Terje Mathisen
