Whether the title wording is ironic or sincere -- and I'm guessing it's used jokingly -- using it presumes a very specific shared outlook with the reader that is, likely as not, wrong. Certainly wrong in my case.
I'm not claiming the title is homophobic. I'm claiming it distracts some portion of readers and introduces the actual content of the linked submission poorly. The submission title doesn't come close to reflecting the attitude or writing of the linked post, which is, by comparison, specific, technical, and non-abrasive.
Fundamentally, if you want to extrapolate most of these points to general programming technique, they come with so many caveats that you're much better off simply saying "benchmark your actual cases and try the alternatives".
This means that you will need around 10GB of memory when serving 10,000 concurrent requests (about 1MB of thread stack each), while an event-based app can serve the same number of requests with minimal memory.
You also have to take into account that thread creation and context switching are really expensive operations (contrary to what the OP is saying), so the thread-per-request app is adequate only for a small number of concurrent requests; for large numbers you will need the event-based approach.
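For concreteness, a thread-per-request server in Java looks roughly like the sketch below: a minimal line-echo server on a loopback port, where `ThreadPerRequestEcho`, `acceptLoop` and `echoOnce` are made-up names for illustration. Every accepted connection gets its own platform thread, and each thread reserves its own stack, which is where the per-request memory figure comes from: 10,000 concurrent connections at about 1 MB of stack each is about 10 GB. The 1 MB figure is an assumption (a common `-Xss` default on 64-bit Linux), not something this sketch measures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Thread-per-request sketch: every accepted connection gets its own
// platform thread, and every such thread reserves its own stack.
final class ThreadPerRequestEcho {

    // Accept loop: hands each new connection to a brand-new thread.
    private static void acceptLoop(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                Thread worker = new Thread(() -> echoLines(client));
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                return; // server socket was closed
            }
        }
    }

    // Blocking I/O: the worker thread sits idle in readLine() while the client is quiet.
    private static void echoLines(Socket client) {
        try (client;
             BufferedReader in = reader(client);
             PrintWriter out = writer(client)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException ignored) {
            // connection dropped; the thread simply ends
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts the server on an ephemeral loopback port, echoes one line, shuts down.
    static String echoOnce(String message) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> acceptLoop(server));
            acceptor.setDaemon(true);
            acceptor.start();
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = writer(socket);
                 BufferedReader in = reader(socket)) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

The code per connection is trivially simple, which is the appeal of this model; the cost is that an idle client still pins a whole thread and its stack.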
You might get by with a thread-pool-based approach, but keep in mind that if your protocol is not stateless, you will need shared state, which means using concurrent data structures, which can complicate the code.
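Here is a minimal sketch of the thread-pool variant with shared state, assuming a hypothetical stateful protocol where each request carries a session id and the server counts requests per session. Because any pool thread can pick up any request, the per-session state has to live in a concurrent structure (`ConcurrentHashMap` here). `PooledSessionCounter` and its methods are illustrative names, not from any library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stateful protocol: each request carries a session id and the
// server counts requests per session. Any pool thread may handle any request,
// so the per-session state lives in a shared concurrent map.
final class PooledSessionCounter {

    private final ExecutorService pool;
    private final Map<String, Integer> requestsPerSession = new ConcurrentHashMap<>();

    PooledSessionCounter(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(String sessionId) {
        // ConcurrentHashMap.merge() runs atomically; a plain HashMap with
        // get() followed by put() here could lose updates under contention.
        pool.execute(() -> requestsPerSession.merge(sessionId, 1, Integer::sum));
    }

    // Lets queued requests finish, then returns an immutable snapshot of the state.
    Map<String, Integer> shutdownAndSnapshot() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Map.copyOf(requestsPerSession);
    }

    public static void main(String[] args) {
        PooledSessionCounter server = new PooledSessionCounter(4);
        for (int i = 0; i < 10_000; i++) {
            server.submit(i % 2 == 0 ? "alice" : "bob");
        }
        System.out.println(server.shutdownAndSnapshot());
    }
}
```

A single atomic `merge` keeps this sketch simple; state that spans several fields, or that must be updated in a particular order per session, would need more coordination than one map call.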
1. prefer processes over threads -- because your architecture can scale horizontally across multiple boxes more easily, because you are free to write each piece in a different language, and because mutable shared memory is problematic in very subtle, counter-intuitive ways
2. prefer events over processes or threads -- because you can handle much higher concurrent IO traffic on a single machine, due in part to reduced memory use
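To make the events-vs-threads contrast concrete, here is a minimal single-threaded echo server built on `java.nio.channels.Selector` (`SelectorEcho`, `serve` and `echoOnce` are hypothetical names). One thread multiplexes every connection, and the per-connection state is just a small buffer attached to the selection key, which is where the memory savings in point 2 come from. It is a sketch only: no timeouts, and a fixed 1 KB buffer per connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Event-based sketch: one thread and one Selector serve every connection.
// Per-connection state is a 1 KB ByteBuffer attached to the key, not a thread.
final class SelectorEcho {

    // The event loop; runs until the selector is closed.
    static void serve(Selector selector) {
        try {
            while (true) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // this readiness event is being handled now
                    try {
                        handle(key, selector);
                    } catch (IOException | CancelledKeyException e) {
                        closeQuietly(key); // drop this connection, keep serving the rest
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed: leave the loop
        }
    }

    private static void handle(SelectionKey key, Selector selector) throws IOException {
        if (key.isAcceptable()) {
            SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
            if (client != null) {
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
            }
            return;
        }
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (key.isReadable() && channel.read(buf) < 0) {
            channel.close(); // peer closed the connection
            return;
        }
        // Echo what is buffered. If the socket can't take it all, wait for
        // OP_WRITE instead of blocking the only thread.
        buf.flip();
        channel.write(buf);
        buf.compact();
        key.interestOps(buf.position() > 0 ? SelectionKey.OP_WRITE : SelectionKey.OP_READ);
    }

    private static void closeQuietly(SelectionKey key) {
        try {
            key.channel().close();
        } catch (IOException ignored) {
        }
    }

    // Starts the loop on an ephemeral loopback port, echoes one message, shuts down.
    static String echoOnce(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            Thread loop = new Thread(() -> serve(selector));
            loop.setDaemon(true);
            loop.start();
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (out.hasRemaining()) {
                    client.write(out);
                }
                ByteBuffer in = ByteBuffer.allocate(out.capacity());
                while (in.hasRemaining() && client.read(in) >= 0) {
                    // keep reading until the full echo has arrived
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

The price is visible in `handle`: instead of straight-line blocking code, each connection becomes a small state machine driven by readiness events.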
1) Recent versions of the 1.6 JDK will use epoll on Linux, so the benchmark should be re-evaluated; poll() is known not to scale well. There is an issue, however: NIO only supports level-triggered (not edge-triggered) epoll.
2) This doesn't cover the case of thread-pool starvation, i.e., multiple connections where some are very fast and some are very slow.
A prime example would be a client for a WAN-distributed database or file system: most operations are local (5 ms), some are remote (80 ms). Remote operations linger in a fixed-size thread pool, occupying more and more of its threads and leaving fewer for incoming operations, which then wait longer. You can hit this even without WAN distribution, e.g., when 80% of operations require no random disk seeks (they are served from cache) and 20% do (an operation orders of magnitude slower, even with elevator scheduling).
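The level-triggered point in item 1 can be observed through the plain `Selector` API. The sketch below (`LevelTriggeredDemo` is a hypothetical name) uses a `Pipe` instead of sockets: after writing 8 bytes and reading only 1, the next selection still reports the source as readable, and only a full drain makes it go quiet. An edge-triggered interface would report the arrival once and stay silent until new data came in. Which OS mechanism backs the selector (epoll, poll or something else) is up to the JDK's `SelectorProvider`; the level-triggered contract is the same either way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Arrays;

// Level-triggered readiness as exposed by java.nio.channels.Selector: a channel
// keeps being reported readable for as long as unread data remains.
final class LevelTriggeredDemo {

    // Ready-key counts from three selections: after the write,
    // after a 1-byte partial read, and after draining everything.
    static int[] readinessCounts() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false); // required before register()
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}));

                int[] counts = new int[3];
                counts[0] = selector.select(1000); // data arrived: readable
                // Clear the selected-key set; otherwise the next select() would
                // count 0 updated keys even though the channel is still ready.
                selector.selectedKeys().clear();

                source.read(ByteBuffer.allocate(1)); // consume 1 of the 8 bytes
                counts[1] = selector.select(1000);  // 7 bytes left: still readable
                selector.selectedKeys().clear();

                ByteBuffer drain = ByteBuffer.allocate(64);
                while (source.read(drain) > 0) {
                    drain.clear();
                }
                counts[2] = selector.selectNow(); // nothing left: not ready
                return counts;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readinessCounts()));
    }
}
```

On Linux this should print `[1, 1, 0]`. The practical consequence is that a handler that deliberately leaves data unread (say, for fairness across connections) gets woken again on every selection until it drains the channel or drops `OP_READ` from its interest set.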
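The starvation in item 2 is easy to reproduce with a simulation. This sketch (hypothetical names; `Thread.sleep` stands in for the 5 ms local and 80 ms remote operations from the example, not real network or disk calls) fills a fixed-size pool with remote operations and then measures how long a single local operation sits in the queue before a thread frees up.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Simulated mixed workload on a fixed-size pool: "remote" operations take
// 80 ms, "local" ones 5 ms. Thread.sleep stands in for the real I/O.
final class PoolStarvationDemo {

    static final long LOCAL_MS = 5;
    static final long REMOTE_MS = 80;

    private static void work(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Occupies every pool thread with a remote operation, then submits one local
    // operation and returns how long (ms) it waited in the queue before starting.
    static long localQueueWaitMillis(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> work(REMOTE_MS));
            }
            long submitted = System.nanoTime();
            Future<Long> local = pool.submit(() -> {
                long waited = System.nanoTime() - submitted;
                work(LOCAL_MS);
                return TimeUnit.NANOSECONDS.toMillis(waited);
            });
            return local.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("local op waited ~" + localQueueWaitMillis(4) + " ms");
    }
}
```

With every thread tied up by a remote operation, the 5 ms local operation waits roughly 80 ms before it even starts. Under sustained load the same thing happens gradually: the more threads are held by slow operations, the fewer are left for fast ones.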