https://medium.com/@NetflixTechBlog/performance-under-load-3...
(And while reading that and thinking "I swear I recently read something else talking about applying TCP congestion control to RPC queueing". And indeed I had: http://www.evanjones.ca/prevent-server-overload.html)
I did this almost 10 years ago for accessing Amazon SimpleDB, too: http://www.daemonology.net/blog/2008-06-29-high-performance-...
Our back-end processing code would throttle itself back hard (to about 50% of the rate it had reached) whenever it got a DynamoDB throttle message in response, and then would ramp itself back up steadily. Combined with good retry logic in the front-end services, this meant we could keep the DynamoDB table humming along at near-maximum usage.
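A minimal sketch of that throttle-and-ramp behavior, in the spirit of TCP's multiplicative-decrease/additive-increase. The class and method names here are illustrative, not from any AWS SDK; the 50% cut and linear ramp are the only parts taken from the description above.

```java
// Illustrative AIMD-style rate controller: halve the target rate on a
// throttle response, recover linearly on success.
public class AimdRateController {
    private double rate;          // current target requests/sec
    private final double maxRate; // provisioned ceiling
    private final double step;    // additive ramp-up per success

    public AimdRateController(double initialRate, double maxRate, double step) {
        this.rate = initialRate;
        this.maxRate = maxRate;
        this.step = step;
    }

    // Called when the store returns a throttle error
    // (e.g. a ProvisionedThroughputExceeded-style response).
    public void onThrottled() {
        rate = Math.max(1.0, rate * 0.5); // back off hard: drop to ~50%
    }

    // Called on each successful request: ramp back up steadily.
    public void onSuccess() {
        rate = Math.min(maxRate, rate + step);
    }

    public double currentRate() { return rate; }
}
```

The multiplicative cut makes the client react quickly to overload, while the small additive step probes for capacity slowly enough to avoid immediately re-triggering throttles.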
> SBroker is a framework to build fault tolerant task schedulers. It includes several built in schedulers based on TCP (and other network scheduling) algorithms.
> [...] in an ideal situation a target queue time would be chosen that keeps the system feeling responsive and clients would give up at a rate such that in the long term clients spend up to the target time in the queue. This is sojourn (queue waiting) time active queue management. CoDel and PIE are two state of the art active queue management algorithms with a target sojourn time, so should use those with defaults that keep systems feeling responsive to a user.
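The core of sojourn-time queue management can be sketched very simply: stamp each task on enqueue, and shed it at dequeue time if it has already waited past the target. This is only the kernel of the idea, not CoDel itself (real CoDel tracks how long the target has been exceeded and drops at an increasing rate); all names here are illustrative.

```java
import java.util.ArrayDeque;

// Sheds tasks whose queue wait (sojourn time) exceeds a target,
// so clients never receive work that already feels unresponsive.
public class SojournQueue {
    static final class Entry {
        final Runnable task;
        final long enqueuedAtNanos;
        Entry(Runnable task, long t) { this.task = task; this.enqueuedAtNanos = t; }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();
    private final long targetNanos;

    public SojournQueue(long targetMillis) {
        this.targetNanos = targetMillis * 1_000_000L;
    }

    public void offer(Runnable task, long nowNanos) {
        queue.addLast(new Entry(task, nowNanos));
    }

    // Returns the next task still within the target sojourn time,
    // dropping any that have waited too long.
    public Runnable poll(long nowNanos) {
        Entry e;
        while ((e = queue.pollFirst()) != null) {
            if (nowNanos - e.enqueuedAtNanos <= targetNanos) return e.task;
            // else: exceeded target sojourn time; shed it
        }
        return null;
    }
}
```

Time is passed in explicitly here to keep the sketch testable; a real implementation would read `System.nanoTime()`.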
"Executor -- The BlockingAdaptiveExecutor adapts the size of an internal thread pool to match the concurrency limit based on measured latencies of Runnable commands and will block when the limit has been reached."
I'm often surprised this kind of auto-scaling thread pool is not a more common thing in Java land.
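The blocking-at-the-limit behavior is easy to sketch with a plain `Semaphore` around an executor. This is not the Netflix implementation — concurrency-limits additionally adjusts the limit from measured latencies, whereas here the limit is fixed for brevity.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// A semaphore caps in-flight tasks; execute() blocks the caller
// once the concurrency limit has been reached.
public class BlockingLimitedExecutor {
    private final ExecutorService delegate;
    private final Semaphore inFlight;

    public BlockingLimitedExecutor(int limit) {
        this.delegate = Executors.newCachedThreadPool();
        this.inFlight = new Semaphore(limit);
    }

    // Blocks when `limit` tasks are already running, applying
    // backpressure to the producer instead of queueing unboundedly.
    public void execute(Runnable task) throws InterruptedException {
        inFlight.acquire();
        delegate.execute(() -> {
            try { task.run(); } finally { inFlight.release(); }
        });
    }

    public void shutdown() { delegate.shutdown(); }
}
```

Turning this into the adaptive version would mean timing each task and feeding the latencies into something like a gradient or Vegas-style limit algorithm that resizes the semaphore.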
This adds the flexibility to use different algorithms in different load-balancing regions: mobile, being a lossy fabric, is better served by a conventional algorithm; desktop and server clients can use a smarter throughput-maximizing one; and in countries where a high percentage of connections are laggy DSL lines, you can use something else entirely.
Netflix puts out some amazing Java libraries. I've had excellent results using Hystrix [0]. It has been an excellent addition to our systems.
Most problematic backends are not the ones with a 20ms response time on almost every request. The backends with problems are the ones that might reply in 10ms or in 2 minutes ...
Yes, the basic implementation does reject arbitrary requests. We do have a partitioned limit strategy (currently experimental, which is why it wasn't brought up in the techblog). A partitioned limiter lets you guarantee a portion of the limit to certain types of requests. For example, say you want to give priority to live vs. batch traffic: live gets 90% of the limit, batch gets 10%. If live requests only account for 50% of the limit, then batch can use up to the remaining 50%. But if there's a sudden sustained increase in live traffic, you're guaranteed that live requests will only be rejected once they exceed 90% of the limit.
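The admission rule described above can be sketched as: reject a request only when the total limit is reached AND its own partition has exceeded its guaranteed share; otherwise it may borrow spare capacity. This is illustrative, not the actual concurrency-limits API, and under this rule the total can transiently exceed the limit while a partition bursts back into its guarantee (in-flight requests can't be revoked).

```java
import java.util.HashMap;
import java.util.Map;

// Each partition is guaranteed a fraction of the total limit and may
// borrow unused capacity from the others.
public class PartitionedLimiter {
    private final int totalLimit;
    private final Map<String, Double> share;     // partition -> guaranteed fraction
    private final Map<String, Integer> inFlight = new HashMap<>();
    private int totalInFlight = 0;

    public PartitionedLimiter(int totalLimit, Map<String, Double> share) {
        this.totalLimit = totalLimit;
        this.share = share;
    }

    public synchronized boolean tryAcquire(String partition) {
        int used = inFlight.getOrDefault(partition, 0);
        double guaranteed = totalLimit * share.getOrDefault(partition, 0.0);
        // Reject only if the total limit is hit AND this partition is
        // already at or over its guaranteed share.
        if (totalInFlight >= totalLimit && used >= guaranteed) return false;
        inFlight.put(partition, used + 1);
        totalInFlight++;
        return true;
    }

    public synchronized void release(String partition) {
        inFlight.merge(partition, -1, Integer::sum);
        totalInFlight--;
    }
}
```

With the 90/10 live/batch split from the example, batch can fill the whole limit while live is idle, yet a live request is still admitted the moment it arrives, because live is under its 90% guarantee.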
From https://medium.com/@NetflixTechBlog/performance-under-load-3...
> The discovered limit and number of concurrent requests can therefore vary from server to server, especially in a multi-tenant cloud environment. This can result in shedding by one server when there was enough capacity elsewhere. With that said, using client side load balancing a single client retry is nearly 100% successful at reaching an instance with available capacity. Better yet, there’s no longer a concern about retries causing DDOS and retry storms as services are able to shed traffic quickly in sub millisecond time with minimum impact to performance.
Edit: In terms of how they decide what to reject, from reading the blog post, there is a queue and there is a limit to how big the queue can be. Requests that come in while the queue is "full" get rejected immediately. They don't wait in the queue and timeout.
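That reject-when-full behavior maps directly onto a bounded queue whose non-blocking `offer` fails fast. A minimal sketch (names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Bounded queue that rejects immediately when full, so overloaded
// requests get a fast error instead of waiting in line to time out.
public class RejectingQueue<T> {
    private final ArrayBlockingQueue<T> queue;

    public RejectingQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false without blocking when the queue is full; the caller
    // can immediately send back an overload response.
    public boolean submit(T request) {
        return queue.offer(request);
    }

    public T next() {
        return queue.poll();
    }
}
```

The key point is that `offer` never blocks: rejection happens at admission time, which is what keeps retries cheap (sub-millisecond) for the client.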
Is this better?