SC does support automatic horizontal scaling across any number of machines out of the box if you're running it on Kubernetes.
There's also a CLI tool to deploy it automatically to any Kubernetes cluster: https://www.npmjs.com/package/baasil
See https://github.com/SocketCluster/socketcluster/blob/master/s...
From my understanding, you're basically saying "You can combine SocketCluster with the MQ of your choice (the installation and configuration of which is left as an exercise to the reader) and then between Docker, Kubernetes, and Baasil you can orchestrate and deploy it across a cluster". That sounds a bit more complex than just using SocketCluster, which is what the OP seemed to indicate was all you needed. It also brings in the DevOps story, which I don't think either he or I intended to include.
I was not trying to indicate that SocketCluster can't be -used- to scale websockets horizontally, but that it's not just an off the shelf solution that would have solved Discord's problem either. It requires other parts, as both the docs and you mention.
I'll also reiterate from my post: SocketCluster has no benchmarks pertaining to what happens when you -do- scale horizontally (per docs here - http://socketcluster.io/#!/performance ). That lack alone would kill my interest in it (as would scc-state being a single instance, which would make fault tolerance a real concern to me, but it looks like you know that already). Is performing horizontal scalability tests on the roadmap?
It should only take a few minutes to deploy a cluster across hundreds of machines. The only limit is the maximum number of hosts that Kubernetes itself can handle (which I think is over 1000 now). SCC is self-sharding and runs and scales itself automatically with no downtime.
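To illustrate what sharding channels across brokers means in general, here is a minimal sketch. It is not SCC's actual algorithm (which isn't described in this thread); the function name `pick_broker` and the broker names are hypothetical, and it assumes a plain hash-modulo scheme.

```python
import hashlib


def pick_broker(channel: str, brokers: list[str]) -> str:
    """Map a channel name to one broker, deterministically.

    Assumption: simple hash-modulo sharding, used purely for illustration.
    """
    if not brokers:
        raise ValueError("at least one broker is required")
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(channel.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


# Hypothetical broker names.
BROKERS = ["broker-0", "broker-1", "broker-2"]
```

Every instance that runs the same function over the same broker list agrees on which broker owns a channel, so no central lookup is needed. The weakness of plain modulo is that adding or removing a broker remaps most channels; consistent hashing is the usual way to limit that.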
You can easily handle 5 million concurrent users with a small cluster. SC's problem isn't scalability, it's marketing.
For me to pick SocketCluster for a distributed solution (or more broadly, for -any- technical solution), I'd want to know three things: what else I need to pair it with (which the docs actually mislead me on), what else I can benefit from (which the docs don't tell me, but which does exist per your links), and what benefits I stand to get from using it. On that last point, the docs give me only marketing claims, with no metrics, performance numbers, data, etc. for what happens in a distributed context, so I would avoid it.
Ideally, set up a clustered performance test, and then make as many of the artifacts (docker images, configs, readme, etc) available so others can conduct the same performance test themselves (as well as have a reference architecture for their own solution). Heck, if you're doing it in AWS, consider making the AMIs available along with whatever modifications need to happen. -That- would be very convincing for someone looking to adopt a solution in this space, if they could literally just spin up some EC2s and immediately start throwing load at a fully configured cluster.
Also, to make it clear, is this handling message passing between instances in the cluster?
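To be concrete about what I mean by message passing between instances, here is a toy, in-memory sketch. The `Broker` and `Instance` classes are hypothetical stand-ins, not SocketCluster's API; in a real deployment the broker would be a separate networked process.

```python
from typing import Callable


class Broker:
    """Hypothetical stand-in for a shared broker that all instances connect to."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel: str, message: str) -> None:
        # Relay the message to every subscriber, whichever instance it lives on.
        for callback in self._subscribers.get(channel, []):
            callback(message)


class Instance:
    """Hypothetical websocket server instance holding some client connections."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.delivered: list[tuple[str, str]] = []  # messages pushed to local clients

    def client_subscribe(self, channel: str) -> None:
        self.broker.subscribe(
            channel, lambda message: self.delivered.append((channel, message))
        )

    def client_publish(self, channel: str, message: str) -> None:
        self.broker.publish(channel, message)


broker = Broker()
instance_a = Instance("a", broker)
instance_b = Instance("b", broker)
instance_b.client_subscribe("chat")
instance_a.client_publish("chat", "hello")  # published on A, delivered on B
```

The question is whether the cluster does this relaying for me, so that a client connected to one instance receives messages published by a client connected to another.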