> What happens when 1 million connections get disconnected and try to reconnect at the same time?
I think the distributed-systems term for your problem is the 'thundering herd problem,' so searching for that, or for "thundering herd websockets," should be fruitful.
From a reliability perspective, implement exponential back-off on the client that includes jitter. This is a core necessity in all clients. I only skimmed this article, but it looked right: https://aws.amazon.com/blogs/architecture/exponential-backof...
When Signal had outages from the increased load during the WhatsApp exodus, it was due to this not being implemented in their clients.
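A minimal sketch of client-side exponential backoff with jitter, in Python. It uses the "full jitter" variant (sleep a random amount between 0 and the exponential ceiling). The names `backoff_delay` and `reconnect_with_backoff`, the `connect` callable, and the defaults (1 s base, 60 s cap, 8 attempts) are my own assumptions for illustration, not anything from a particular client library.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def reconnect_with_backoff(connect, max_attempts=8, base=1.0, cap=60.0,
                           sleep=time.sleep):
    """Call the hypothetical `connect` until it succeeds or attempts run out.

    `connect` is assumed to raise ConnectionError on failure.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final failure
            sleep(backoff_delay(attempt, base, cap))
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

The jitter is what matters for the herd: without it, a million clients that dropped at the same instant all retry at 1 s, 2 s, 4 s and so on in lockstep. The random delay spreads each retry wave over the whole interval.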
Additionally, consider your load balancing architecture. If one machine goes down, do all reconnects go to that machine, or do the reconnects get distributed to all the machines? Can you administratively drain a machine? Can you quickly allocate some spare capacity?
Lastly, you can get into situations where your entire infrastructure is overloaded. You will need a throttling mechanism, and it can work together with your load balancer or client. If you benchmark your server and it can only handle 500 concurrent reconnections, then that is a known hard limit, and you can enforce it with fail-fast behavior: reject anything over the limit immediately instead of letting it queue up.
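A rough sketch of that fail-fast limit on the server side, in Python, assuming a threaded server. `ReconnectThrottle`, `handle_reconnect`, and the `do_handshake` callable are hypothetical names, and the 503/101 return values are just illustrative status codes; the 500 default is the benchmark figure from the example above.

```python
import threading


class ReconnectThrottle:
    """Caps the number of reconnections being processed at once."""

    def __init__(self, limit=500):
        self._slots = threading.BoundedSemaphore(limit)

    def try_admit(self):
        # Non-blocking acquire: returns False immediately when full,
        # so excess clients are rejected rather than queued.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def handle_reconnect(throttle, do_handshake):
    """Run the hypothetical `do_handshake` only if a slot is free."""
    if not throttle.try_admit():
        return 503  # fail fast; the client is expected to back off and retry
    try:
        do_handshake()
        return 101  # handshake completed
    finally:
        throttle.release()
```

A rejected client falls back to its own backoff-with-jitter loop, which is how the throttle and the client end up working together. Making the limit adjustable at runtime would give you the administrative throttle as well.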
Summary:
Clients implemented with exponential backoff and jitter
Load balancer architecture that spreads reconnects across machines
Defensive "fail fast" throttling, or the ability to throttle administratively