Have you ever worked in an environment at the scale of Amazon? Machines go down all the time. I used the "six months" example to illustrate that the shopping cart is persistent, but the machine crash could just as easily happen the moment before you push "checkout." Losing shopping cart data just because one machine crashed is totally unacceptable.
Programming distributed systems that must survive machine failure and network partitions is a completely different ball of wax compared to simple web programming. If you haven't done it before, you would not believe how much more complicated it is.
Here is an extremely simple example. Suppose you're a radio station that's taking phone calls from people and you want to give an award to the 5th caller. Doing this on a single machine is easy, and a program something like this would do it:

    import socketserver  # this module was named SocketServer in Python 2

    class MyHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # Each connection is one "call"; the count lives on the server object.
            self.server.caller += 1
            if self.server.caller == 5:
                self.request.sendall(b"Congratulations, you are the 5th caller!\n")
            else:
                msg = "Sorry, you're caller #%d\n" % self.server.caller
                self.request.sendall(msg.encode())

    server = socketserver.TCPServer(("localhost", 9999), MyHandler)
    server.caller = 0  # number of calls answered so far
    server.serve_forever()

Using this program, you can "call" the program by doing "telnet localhost 9999" and the program will tell you what caller you are. This took me about 5 minutes to write, and I'd never used this Python API before.
Now imagine that you want to implement this same logic, but using a cluster of machines that could go down at any time. You want the group of machines to form "consensus" about which number each caller is; consensus in this context just means that the group of machines arrives at a single answer, and any machine you ask will give you the same answer.
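To make the problem concrete, here is a rough sketch (Python 3, no networking) of what happens if you just run the single-machine logic on several machines and spread the calls across them. The names `NaiveStation` and `simulate`, and the round-robin routing, are made up for illustration. Each machine keeps its own private counter, so the machines never agree on who caller #5 is:

```python
class NaiveStation:
    """One machine with its own private caller counter (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.caller = 0

    def answer(self):
        # Same logic as the single-machine server: count, then report.
        self.caller += 1
        return self.caller


def simulate(num_machines, num_calls):
    """Route calls round-robin and list every call that was told it won.

    Returns (real_call_number, machine_name) pairs.
    """
    machines = [NaiveStation("m%d" % i) for i in range(num_machines)]
    winners = []
    for call in range(num_calls):
        machine = machines[call % num_machines]
        if machine.answer() == 5:
            winners.append((call + 1, machine.name))
    return winners


if __name__ == "__main__":
    print(simulate(1, 15))  # [(5, 'm0')]
    print(simulate(3, 15))  # [(13, 'm0'), (14, 'm1'), (15, 'm2')]
```

With one machine the real 5th caller wins. With three machines, three different people are told they are the 5th caller, and none of them actually is. And this is before any machine has crashed.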
Finding an algorithm that can do this robustly is so difficult that it was a major breakthrough when one was first written up in 1989. It's called Paxos; see http://en.wikipedia.org/wiki/Paxos_(computer_science) for the details. Even though it has been known for over 20 years, it is still a complex topic whose details very few people understand.
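For a flavor of what the core idea looks like, here is a heavily simplified, in-memory sketch of single-decree Paxos, where the cluster agrees on one value: who caller #5 is. This is not the full protocol. The names `Acceptor` and `propose` are my own, "crashes" are just a flag, there is no network, no retry logic and no state saved to disk, and I'm assuming every proposer uses a distinct proposal number.

```python
class Acceptor:
    """One machine's state for a single decision (illustrative only)."""

    def __init__(self):
        self.up = True        # set to False to simulate a crashed machine
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) last accepted, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if not self.up or n <= self.promised:
            return None       # crashed, or already promised a newer proposal
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if not self.up or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


def propose(acceptors, n, value):
    """Try to get `value` chosen with proposal number n.

    Returns the chosen value, or None if no majority could be reached.
    """
    majority = len(acceptors) // 2 + 1
    replies = [r for r in (a.prepare(n) for a in acceptors) if r is not None]
    if len(replies) < majority:
        return None
    # If any acceptor has already accepted a value, adopt the one from the
    # highest-numbered proposal instead of pushing our own.
    earlier = [acc for (_, acc) in replies if acc is not None]
    if earlier:
        value = max(earlier, key=lambda acc: acc[0])[1]
    votes = sum(1 for a in acceptors if a.accept(n, value))
    return value if votes >= majority else None


if __name__ == "__main__":
    cluster = [Acceptor() for _ in range(5)]
    cluster[0].up = False              # two of the five machines are down
    cluster[1].up = False
    print(propose(cluster, 1, "Alice"))  # Alice
    print(propose(cluster, 2, "Bob"))    # Alice
    cluster[2].up = False              # now a majority is down
    print(propose(cluster, 3, "Carol"))  # None
```

The second proposer wanted "Bob" but learns that "Alice" was already chosen and goes along with it. Once a majority is down, the cluster refuses to answer rather than risk giving two different answers. A real implementation also has to deal with lost and reordered messages, machines that restart, and proposers that keep preempting each other, and that is where most of the difficulty lives.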
The point of all of this is just to say: you can't compare a single-process in-memory cache to a distributed and fault-tolerant system. They are completely different beasts, and many business problems do indeed need the latter.