Node event loop blockages are the primary reason we have so many processes running. We have enough integrations, and iterate on them quickly enough, that our infrastructure essentially treats them as untrusted/breakable. We want to keep ReDoS-style bugs from affecting more than the current request, so we handle one request per process. It's a little inelegant, but we've still been able to scale the system horizontally, and frankly the extra infrastructure cost hasn't been high enough to justify the effort of changing it.
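For anyone who hasn't run into this failure mode, here is a minimal sketch of why one bad regex hurts every request sharing a process. The pattern and input are hypothetical stand-ins rather than anything from our integrations, and the timer plays the role of "another request" on the same event loop.

```typescript
// Hypothetical pattern with nested quantifiers. It backtracks exponentially
// on inputs that almost match, which is the classic ReDoS shape.
const vulnerable = /^(a+)+$/;

// Runs the match synchronously and returns how long the thread was held, in ms.
function timeMatch(input: string): number {
  const start = Date.now();
  vulnerable.test(input);
  return Date.now() - start;
}

// Stand-in for another request waiting on the same event loop.
// It asks to run immediately but cannot fire until the regex returns.
const scheduledAt = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - scheduledAt} ms`);
}, 0);

// Each extra "a" roughly doubles the work. The trailing "!" forces the
// engine to try every way of splitting the run of "a"s before giving up.
const blockedFor = timeMatch("a".repeat(24) + "!");
console.log(`regex held the event loop for ${blockedFor} ms`);
```

The timer's delay tracks the regex's run time, since nothing else can run on that thread in the meantime. With one request per process, the only request that waits is the one that triggered the bug.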
To get around the start-task rate limit, we've tried running multiple identical containers per ECS task. However, they need to be marked as "essential" in CloudFormation so that our capacity doesn't degrade when a container exits, and ECS stops every other container in a task when an essential container exits. So one container exiting takes down its siblings in the same task.
Multiple processes per container is another interesting approach. We've used Node subprocesses in the past, but we found them tricky for reasons that are unrelated to deploy speed.
One thing we've really liked about rolling our own approach is that we decide when to declare a deploy complete. ECS is pretty conservative here: it doesn't declare a deploy complete until the final container has finished draining, which can take minutes for some of our requests. With our fast deploys, we declare a deploy complete when the final container running old code stops accepting new requests, which is significantly sooner. This makes follow-on deploys and rollbacks much smoother.
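To make the difference between the two completion rules concrete, here is a rough sketch. The `Container` type, the state names, and both function names are hypothetical and only model the behavior described above. This is not ECS's actual logic or our real deploy code, and it ignores whether the new containers are healthy.

```typescript
// Hypothetical lifecycle of a container during a rolling deploy.
type ContainerState = "accepting" | "draining" | "stopped";

interface Container {
  version: string;
  state: ContainerState;
}

// Conservative rule: complete only once every old-version container has
// fully stopped, i.e. finished draining its in-flight requests.
function completeWhenDrained(containers: Container[], newVersion: string): boolean {
  return containers.every(c => c.version === newVersion || c.state === "stopped");
}

// Faster rule: complete as soon as no old-version container is still
// accepting new requests, even if some are still draining.
function completeWhenNotAccepting(containers: Container[], newVersion: string): boolean {
  return containers.every(c => c.version === newVersion || c.state !== "accepting");
}

const fleet: Container[] = [
  { version: "v2", state: "accepting" },
  { version: "v2", state: "accepting" },
  { version: "v1", state: "draining" }, // still finishing a long request
];

console.log(completeWhenDrained(fleet, "v2"));      // false
console.log(completeWhenNotAccepting(fleet, "v2")); // true
```

The gap between the two answers is the drain time of the slowest old container, which is exactly the window in which a follow-on deploy or rollback would otherwise be stuck waiting.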