If it was possible in 2002-2004, I am not impressed that it is still possible in 2011.
One of the optimizations was to reduce the per-connection TCP buffers (net.ipv4.tcp_{mem,rmem,wmem}) to only allocate one physical memory page (4kB) per client, so that one million concurrent TCP connections would only need 4GB RAM. His machine had barely more than 4GB RAM (6 or 8? can't remember), which was a lot of RAM at the time.
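For reference, a sketch of that kind of tuning (the values below are illustrative, not necessarily the ones he used):

```shell
# Shrink per-socket TCP buffers so each connection starts at ~one 4kB page.
# rmem/wmem format: min default max, in bytes.
sysctl -w net.ipv4.tcp_rmem="4096 4096 16384"
sysctl -w net.ipv4.tcp_wmem="4096 4096 16384"
# Global TCP memory thresholds: low, pressure, high -- in 4kB pages.
sysctl -w net.ipv4.tcp_mem="262144 393216 524288"
```

Note tcp_mem is counted in pages while tcp_rmem/tcp_wmem are in bytes, which trips people up; the kernel can still grow a busy socket's buffers above the default up to the max.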
I cannot find a link to my story though...
http://www.erlang-consulting.com/thesis/tcp_optimisation/tcp...
http://www.trapexit.org/Building_a_Non-blocking_TCP_server_u...
http://groups.google.com/group/erlang-programming/browse_thr...
(This Erlang mailing list thread is pretty typical: if you post your code and describe your app, hardware, network, database/external dependencies, etc., you'll get a ton of good advice about killing off bottlenecks.) Another example:
http://groups.google.com/group/erlang-programming/browse_frm...
Running netstat|grep like this on a high-concurrency server takes a long time. I've never found a faster way to get real-time stats on our busy servers and would be interested to hear if anyone else has.
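For what it's worth, a couple of things that tend to be much cheaper than walking the whole table with netstat (assuming a reasonably modern Linux):

```shell
# The kernel keeps aggregate socket counts here; reading it is O(1),
# unlike netstat, which dumps and formats every single socket.
cat /proc/net/sockstat

# ss (iproute2) talks to the kernel over netlink and is much faster
# than netstat; -s prints summary counters only.
ss -s

# Counting established TCP connections without formatting each line
# (subtract one for the header row):
ss -tn state established | wc -l
```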
I hope they publish how they did it - in fact let me drop them an email and see if I can convince them to do so.
FWIW, Yahoo still uses FreeBSD extensively.
http://www.metabrew.com/article/a-million-user-comet-applica...
Part 3 is my favorite. http://www.metabrew.com/article/a-million-user-comet-applica...
http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-...
The main advantage Erlang has over C/Python/Ruby/etc. is that asynchronous IO is the default throughout all its libraries, and it has a novel technique for handling errors. Its asynchronous design is ultimately about fault tolerance, not raw speed. Also, it can automatically and intelligently handle a lot of asynchronous control flow that node.js makes you manage by hand (which is so 70s!).
You can make event-driven asynchronous systems pretty smoothly in languages with first class coroutines/continuations (like Lua and Scheme), but most libraries aren't written with that use case in mind. Erlang's pervasive immutability also makes actual parallelism easier.
With that many connections, another big issue is space usage -- keeping buffers, object overhead, etc. low per connection. Some languages fare far, far better than others here.
It was several years ago, but I've done my share of high-concurrency stuff under Linux, and the highest I got was about 200K connections - at which point the single-threaded server bottlenecked on its disk I/O.
The main issue is not the raw connection count, it's the per-socket OS overhead (so you don't exhaust non-swappable kernel memory), how many sockets are concurrently active (have inbound or outbound data queued), and whether the application can keep up with all the events that epoll/kqueue report. This is not rocket science by any means, and the kernel is relatively easy to fine-tune even while the actual load is present.
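As a concrete starting point for that fine-tuning, these are the knobs I'd check first (a sketch; the right values depend entirely on the workload):

```shell
# System-wide and per-process file descriptor ceilings -- each TCP
# connection consumes one fd, so these must exceed the target count:
sysctl fs.file-max fs.nr_open
ulimit -n
# Listen backlog caps, so accept() can keep up with connection bursts:
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
```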
Using conntrack on a 1MC system will waste even more kernel memory!
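If you can't unload the conntrack modules entirely, one workaround (assuming iptables with the raw table available; port 8080 here is just an example) is to exempt the service's traffic from tracking:

```shell
# Skip connection tracking for the service port, so conntrack never
# allocates a per-connection entry (each one costs a few hundred
# bytes of unswappable kernel memory, plus hash table overhead).
iptables -t raw -A PREROUTING -p tcp --dport 8080 -j NOTRACK
iptables -t raw -A OUTPUT     -p tcp --sport 8080 -j NOTRACK
```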