Any recommendations/gotchas, or should everything be fine out-the-box?
And while we're on the subject, what other recommendations have people got for speeding up/optimising PHP/LAMP?
Do you have a cache policy? For files? For DB data? Do you use the APC user cache or Memcached? The APC user cache is faster (on the web front-end), but is not shared. Memcached is shared but there will always be small network latency. If DB access is a bottleneck, consider using a data cache.
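To make the APC vs. Memcached trade-off concrete, here is a rough sketch of a two-tier lookup: check the per-server cache first, then the shared cache, and only then the database. The class name `TwoTierCache` and the array-backed tiers are hypothetical stand-ins so the snippet runs without any extension. In a real setup the local tier would presumably be the APC user cache and the shared tier a Memcached client.

```php
<?php
// Sketch only: in-memory stand-ins for the two cache tiers (assumptions).
//   $local  ~ APC user cache: per web server, no network hop, not shared
//   $shared ~ Memcached: shared by all servers, small network latency
class TwoTierCache
{
    private $local = [];
    private $shared;

    public function __construct(ArrayObject $shared)
    {
        $this->shared = $shared;
    }

    // Returns the cached value, or calls $loadFromDb and fills both tiers.
    public function get($key, callable $loadFromDb)
    {
        if (array_key_exists($key, $this->local)) {
            return $this->local[$key];          // fastest path: local hit
        }
        if ($this->shared->offsetExists($key)) {
            $value = $this->shared[$key];       // shared hit
        } else {
            $value = $loadFromDb($key);         // miss everywhere: ask the DB
            $this->shared[$key] = $value;
        }
        $this->local[$key] = $value;
        return $value;
    }
}

// Two "web servers" with their own local tier and one shared tier.
$shared  = new ArrayObject();
$web1    = new TwoTierCache($shared);
$web2    = new TwoTierCache($shared);
$dbCalls = 0;
$load    = function ($key) use (&$dbCalls) {
    $dbCalls++;
    return "config for $key";
};

$web1->get('site', $load); // miss in both tiers, loads from the "database"
$web1->get('site', $load); // served from web1's local tier
$web2->get('site', $load); // local miss on web2, served from the shared tier
```

The sketch leaves out expiry and invalidation on purpose. Since the local tier is not shared, a value changed on one server stays stale on the others until it expires, which is why it suits rarely-changing data best.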
Do you serve a lot of static files? Consider using a CDN (not a silver bullet! There are drawbacks.)
There are a lot of strategies for high performance on http://highscalability.com/ . But don't overdo it! Most of the time you don't need to set up your infrastructure to be scalable for millions of clients.
As missenlinx pointed out, also try to speed up the page display by following Yahoo!'s recommendations on http://developer.yahoo.com/performance/rules.html . Reducing the number of requests per page certainly made a difference when we optimized it, at least in my experience. YMMV.
Our app is generally read-heavy, but we're optimising different parts depending on the read/write balance. We use a mix of InnoDB and MyISAM. My understanding is that MyISAM is better for reads, while InnoDB gives you transactions and row-level locking, so it's better for lots of writes while still reading. Does that hold water as an uber-simplified rule of thumb?
We're not using the APC user cache or Memcached yet, so thanks for pointing out the differences. The FB presentation linked on the other thread gave central app config data as a good example for using the APC user cache.
For interest: what scale do you need to reach before it's worth looking into a CDN? (We're definitely not there yet!)
More generally, we're building features with a focus on developer productivity, then optimising as bottlenecks arise.
The "feature configuration" system at Facebook is pretty good. That said, it's not perfect: the fact that there are two different caches depending on whether you're running in CLI or Apache SAPI mode has been a problem for me in the past. I still haven't figured out a way to run scripts from a crontab while getting access to the data cached by web pages. Running curl might be an option...
For a CDN, it really depends on the number of files you have and how much data you're sending out. If you're paying for a hosting plan with more bandwidth just to serve all your static images, then a CDN might be worth a look. You can calculate the costs yourself, actually: http://calculator.s3.amazonaws.com/calc5.html There are alternatives to S3. (I compared Akamai and S3 in a study at the web company I work for, and S3 came out cheaper for our traffic patterns.) This is not to say that cheaper is better; you obviously have to look at the whole picture.
In general, the main problem we have to examine before moving our files to a CDN is the cache policy. S3 has datacenters both in Europe and the US, and there certainly is a propagation delay when you write a file. There are techniques to work around this, such as naming your images with a version number, but you have to make sure your clients are going to find the files when your pages start linking to them.
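As an illustration of the version-number trick, here is a small sketch of a helper that builds versioned URLs. The function name, the `cdn.example.com` host and the `.v<N>` naming scheme are all made up for the example; any scheme works as long as a changed file gets a new name.

```php
<?php
// Sketch only: host name and ".v<N>" naming scheme are assumptions.
function versioned_asset_url($path, $version, $cdnHost = 'http://cdn.example.com')
{
    $info = pathinfo($path);
    // pathinfo() reports "." as the directory for a bare file name.
    $dir  = ($info['dirname'] === '.') ? '' : $info['dirname'] . '/';
    $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';

    return rtrim($cdnHost, '/') . '/' . ltrim($dir, '/')
        . $info['filename'] . '.v' . (int) $version . $ext;
}

echo versioned_asset_url('img/logo.png', 7), "\n";
// prints http://cdn.example.com/img/logo.v7.png
```

The order of operations matters more than the helper: upload the file under its new name first, give it time to propagate, and only then deploy the pages that link to it. The old version stays reachable for clients still holding old pages.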
Another important issue is downtime. S3 recently had a several-hour downtime... this has never happened with our local hosting provider. What will your plan be when this happens? If you can detect that your CDN is down, can you redirect the traffic to your own host? Will that risk bringing you down too, if you downsized your equipment after the CDN migration?
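One possible shape for the "redirect the traffic to your host" plan, as a sketch: build static URLs through a single function that picks the base URL from a health check. The function name, both host names and the health-check callable are hypothetical. How you actually detect an outage (a monitoring job, a flag in a cache, a manual switch) is left open.

```php
<?php
// Sketch only: host names and the health check are assumptions.
function static_base_url(
    callable $cdnIsHealthy,
    $cdnHost = 'http://cdn.example.com',
    $originHost = 'http://www.example.com/static'
) {
    // Fall back to our own host when the CDN is reported down.
    return $cdnIsHealthy() ? $cdnHost : $originHost;
}

$cdnUp   = function () { return true; };
$cdnDown = function () { return false; };

echo static_base_url($cdnUp), "/img/logo.png\n";   // served by the CDN
echo static_base_url($cdnDown), "/img/logo.png\n"; // served by the origin
```

The health check should read a cheap, precomputed flag rather than probe the CDN on every request. The fallback also only helps if the origin still has the capacity to take the full static traffic.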
It's certainly not an easy decision to take, but if done right and properly planned, a CDN can save you a lot of money.
Anyway, have a look at Memcached. We added a "Cacheable" attribute to class definitions in our in-house ORM so that cacheable data is taken from Memcache instead of querying the DB (that is, if it is available in the cache; otherwise there is a miss + Memcache SET). For our relatively large website (~1000 people connected now, 1.7M subscribers), Memcache is saving 1 billion queries a month. I would suspect that APC is saving at least that many also.
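To give an idea of what such a "Cacheable" switch could look like, here is a rough sketch. It is not our actual ORM: `ArrayCache` stands in for a Memcached client so the snippet runs on its own, and `Model`, `Country`, `find()` and `loadFromDb()` are hypothetical names.

```php
<?php
// Sketch only: in-memory stand-in for a Memcached client (assumption).
class ArrayCache
{
    private $data = [];

    public function get($key)
    {
        // Like Memcached::get(), report a miss as false.
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
    }
}

abstract class Model
{
    protected static $cacheable = false; // subclasses opt in
    public static $cache;                // shared cache client
    public static $dbQueries = 0;        // counter, only to show the effect

    // Stand-in for the real SELECT by primary key.
    abstract protected static function loadFromDb($id);

    public static function find($id)
    {
        if (!static::$cacheable) {
            return self::queryDb($id);
        }
        $key = static::class . ':' . $id;
        $row = self::$cache->get($key);
        if ($row === false) {            // miss: query the DB, then SET
            $row = self::queryDb($id);
            self::$cache->set($key, $row);
        }
        return $row;
    }

    private static function queryDb($id)
    {
        self::$dbQueries++;
        return static::loadFromDb($id);
    }
}

class Country extends Model
{
    protected static $cacheable = true;

    protected static function loadFromDb($id)
    {
        return ['id' => $id, 'name' => 'Country #' . $id];
    }
}

Model::$cache = new ArrayCache();
Country::find(33); // miss: one DB query, result stored in the cache
Country::find(33); // hit: no DB query
```

Calling code only ever sees `Country::find()`, so caching can be switched on or off per class without touching it. The sketch ignores invalidation on writes and treats a cached `false` as a miss, both of which a real implementation has to handle.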
Having this feature in the ORM makes it transparent for the developers. This is important, because having to wrap hundreds of DB GETs in
    // try the cache first; false means a miss
    $val = get_from_cache($key);
    if ($val !== false) {
        return $val;
    } else {
        // miss: read from the DB and store the result for next time
        $val = get_from_db($key);
        cache_set($key, $val);
        return $val;
    }
is pretty tedious and leads to unmaintainable code.
Use a different programming language.
No, seriously. http://shootout.alioth.debian.org/gp4/benchmark.php?test=all...
If runtime speed is your design goal, then you definitely want to think outside the PHP/Perl/Python/Ruby box. Those are fast enough for most people, but other languages are a lot faster. OCaml and Haskell are amazing; SBCL is pretty good too.
Let's give people advice for their actual problem and their specific language, and keep opinions on other languages and frameworks out of it.
If you're looking to optimize your site, YSlow is a good place to start, as messenlinx suggested. Next, I would suggest looking at your queries: replacing a SELECT across a big table with a more refined query over a smaller data set will do wonders regardless of the language.
If your data is updated infrequently, you can look at caching your queries at the database level. If not, maybe look at Memcached?
Cheers