https://medium.com/@NetflixTechBlog/performance-under-load-3...
(And while reading that and thinking "I swear I recently read something else talking about applying TCP congestion control to RPC queueing". And indeed I had: http://www.evanjones.ca/prevent-server-overload.html)
I did this almost 10 years ago for accessing Amazon SimpleDB, too: http://www.daemonology.net/blog/2008-06-29-high-performance-...
Our back-end processing code would throttle itself back hard (to about 50% of the rate it had reached) whenever it got a DynamoDB throttle message in response, and then would ramp itself back up steadily. Combined with good retry logic in the front-end services, this meant we could keep the DynamoDB table humming along at near-maximum usage.
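A minimal sketch of that throttle-and-ramp behavior, in the spirit of TCP's multiplicative-decrease/additive-increase. The class and method names here are illustrative, not from any AWS SDK; the 50% cut and linear ramp are the only parts taken from the description above.

```java
// Illustrative AIMD-style rate controller: halve the target rate on a
// throttle response, recover linearly on success.
public class AimdRateController {
    private double rate;          // current target requests/sec
    private final double maxRate; // provisioned ceiling
    private final double step;    // additive ramp-up per success

    public AimdRateController(double initialRate, double maxRate, double step) {
        this.rate = initialRate;
        this.maxRate = maxRate;
        this.step = step;
    }

    // Called when the store returns a throttle error
    // (e.g. a ProvisionedThroughputExceeded-style response).
    public void onThrottled() {
        rate = Math.max(1.0, rate * 0.5); // back off hard: drop to ~50%
    }

    // Called on each successful request: ramp back up steadily.
    public void onSuccess() {
        rate = Math.min(maxRate, rate + step);
    }

    public double currentRate() { return rate; }
}
```

The multiplicative cut makes the client react quickly to overload, while the small additive step probes for capacity slowly enough to avoid immediately re-triggering throttles.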
> SBroker is a framework to build fault tolerant task schedulers. It includes several built in schedulers based on TCP (and other network scheduling) algorithms.
> [...] in an ideal situation a target queue time would be chosen that keeps the system feeling responsive and clients would give up at a rate such that in the long term clients spend up to the target time in the queue. This is sojourn (queue waiting) time active queue management. CoDel and PIE are two state of the art active queue management algorithms with a target sojourn time, so should use those with defaults that keep systems feeling responsive to a user.
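The core of sojourn-time queue management can be sketched very simply: stamp each task on enqueue, and shed it at dequeue time if it has already waited past the target. This is only the kernel of the idea, not CoDel itself (real CoDel tracks how long the target has been exceeded and drops at an increasing rate); all names here are illustrative.

```java
import java.util.ArrayDeque;

// Sheds tasks whose queue wait (sojourn time) exceeds a target,
// so clients never receive work that already feels unresponsive.
public class SojournQueue {
    static final class Entry {
        final Runnable task;
        final long enqueuedAtNanos;
        Entry(Runnable task, long t) { this.task = task; this.enqueuedAtNanos = t; }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();
    private final long targetNanos;

    public SojournQueue(long targetMillis) {
        this.targetNanos = targetMillis * 1_000_000L;
    }

    public void offer(Runnable task, long nowNanos) {
        queue.addLast(new Entry(task, nowNanos));
    }

    // Returns the next task still within the target sojourn time,
    // dropping any that have waited too long.
    public Runnable poll(long nowNanos) {
        Entry e;
        while ((e = queue.pollFirst()) != null) {
            if (nowNanos - e.enqueuedAtNanos <= targetNanos) return e.task;
            // else: exceeded target sojourn time; shed it
        }
        return null;
    }
}
```

Time is passed in explicitly here to keep the sketch testable; a real implementation would read `System.nanoTime()`.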
"Executor -- The BlockingAdaptiveExecutor adapts the size of an internal thread pool to match the concurrency limit based on measured latencies of Runnable commands and will block when the limit has been reached."
I'm often surprised this kind of auto-scaling thread pool is not a more common thing in Java land.
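The blocking-at-the-limit behavior is easy to sketch with a plain `Semaphore` around an executor. This is not the Netflix implementation — concurrency-limits additionally adjusts the limit from measured latencies, whereas here the limit is fixed for brevity.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// A semaphore caps in-flight tasks; execute() blocks the caller
// once the concurrency limit has been reached.
public class BlockingLimitedExecutor {
    private final ExecutorService delegate;
    private final Semaphore inFlight;

    public BlockingLimitedExecutor(int limit) {
        this.delegate = Executors.newCachedThreadPool();
        this.inFlight = new Semaphore(limit);
    }

    // Blocks when `limit` tasks are already running, applying
    // backpressure to the producer instead of queueing unboundedly.
    public void execute(Runnable task) throws InterruptedException {
        inFlight.acquire();
        delegate.execute(() -> {
            try { task.run(); } finally { inFlight.release(); }
        });
    }

    public void shutdown() { delegate.shutdown(); }
}
```

Turning this into the adaptive version would mean timing each task and feeding the latencies into something like a gradient or Vegas-style limit algorithm that resizes the semaphore.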
This adds the flexibility to use different algorithms in different load-balancing regions: mobile, being a lossy fabric, is better served by a conventional algorithm; desktop and server clients can use a smarter throughput-maximizing one; and in countries where a high percentage of connections are laggy DSL lines, you can use something else entirely.
Netflix puts out some amazing Java libraries. I've had excellent results using Hystrix [0]. It has been an excellent addition to our systems.
Most problematic backends are not the ones with a 20ms response time on almost every request. The backends with problems are the ones that might reply in 10ms or in 2 minutes ...
Yes, the basic implementation does reject arbitrary requests. We do have a partitioned limit strategy (currently experimental, which is why it wasn't brought up in the techblog). A partitioned limiter lets you guarantee a portion of the limit to certain types of requests. For example, say you want to give priority to live vs. batch traffic: live gets 90% of the limit, batch gets 10%. If live requests only account for 50% of the limit, then batch can use up to the remaining 50%. But if there's a sudden sustained increase in live traffic, you're guaranteed that live requests will only be rejected once they exceed 90% of the limit.
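The admission rule described above can be sketched as: reject a request only when the total limit is reached AND its own partition has exceeded its guaranteed share; otherwise it may borrow spare capacity. This is illustrative, not the actual concurrency-limits API, and under this rule the total can transiently exceed the limit while a partition bursts back into its guarantee (in-flight requests can't be revoked).

```java
import java.util.HashMap;
import java.util.Map;

// Each partition is guaranteed a fraction of the total limit and may
// borrow unused capacity from the others.
public class PartitionedLimiter {
    private final int totalLimit;
    private final Map<String, Double> share;     // partition -> guaranteed fraction
    private final Map<String, Integer> inFlight = new HashMap<>();
    private int totalInFlight = 0;

    public PartitionedLimiter(int totalLimit, Map<String, Double> share) {
        this.totalLimit = totalLimit;
        this.share = share;
    }

    public synchronized boolean tryAcquire(String partition) {
        int used = inFlight.getOrDefault(partition, 0);
        double guaranteed = totalLimit * share.getOrDefault(partition, 0.0);
        // Reject only if the total limit is hit AND this partition is
        // already at or over its guaranteed share.
        if (totalInFlight >= totalLimit && used >= guaranteed) return false;
        inFlight.put(partition, used + 1);
        totalInFlight++;
        return true;
    }

    public synchronized void release(String partition) {
        inFlight.merge(partition, -1, Integer::sum);
        totalInFlight--;
    }
}
```

With the 90/10 live/batch split from the example, batch can fill the whole limit while live is idle, yet a live request is still admitted the moment it arrives, because live is under its 90% guarantee.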
From https://medium.com/@NetflixTechBlog/performance-under-load-3...
> The discovered limit and number of concurrent requests can therefore vary from server to server, especially in a multi-tenant cloud environment. This can result in shedding by one server when there was enough capacity elsewhere. With that said, using client side load balancing a single client retry is nearly 100% successful at reaching an instance with available capacity. Better yet, there’s no longer a concern about retries causing DDOS and retry storms as services are able to shed traffic quickly in sub millisecond time with minimum impact to performance.
Edit: In terms of how they decide what to reject, from reading the blog post, there is a queue and there is a limit to how big the queue can be. Requests that come in while the queue is "full" get rejected immediately. They don't wait in the queue and timeout.
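That reject-when-full behavior maps directly onto a bounded queue whose non-blocking `offer` fails fast. A minimal sketch (names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Bounded queue that rejects immediately when full, so overloaded
// requests get a fast error instead of waiting in line to time out.
public class RejectingQueue<T> {
    private final ArrayBlockingQueue<T> queue;

    public RejectingQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false without blocking when the queue is full; the caller
    // can immediately send back an overload response.
    public boolean submit(T request) {
        return queue.offer(request);
    }

    public T next() {
        return queue.poll();
    }
}
```

The key point is that `offer` never blocks: rejection happens at admission time, which is what keeps retries cheap (sub-millisecond) for the client.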
Is this better?