The most important thing to understand about queues (2016) (opens in new tab)

(blog.danslimmon.com)

218 pointsdhotson4y ago120 comments

120 comments

87 comments · 24 top-level

dcow4y ago· 14 in thread

I like the author's solution (always use bounded queues) because it usually forces you to confront back-pressure up front. It doesn't matter how big your queue is, your system must be designed to work at peak throughput essentially without a queue, and thus your system must handle the possibility that an event fails to be processed and must be retried/dropped. Queues only serve to mitigate bursts.

It's annoying but also understandable how often people just dump stuff into an unbounded queue and punt on making sure things work until the system is falling down. Often queues are more of a tool developers and operators can use to navigate unknown performance characteristics and scale later than they are a requirement for the actual system itself.

lanstin4y ago

This philosophy, all queues have explicit tuned limits with a drop log and increment a metric on queue full, was used thoroughly at AOL in the 90s. It worked very well, and let us use hardware at much higher loadings than is popular currently. Our goal was ~90% CPU at peak (systems I have worked with more recently start to die around fifty percent). Also of course there was a meeting each Friday to look at all queue depths and queue latencies and to see where we needed more capacity coming up. We did have so many subscribers that traffic was fairly smooth over all.

_vvhw4y ago

> I like the author's solution (always use bounded queues) because it usually forces you to confront back-pressure up front.

I work on a system that can process a million financial transactions a second, and this principle of "always use bounded queues" has definitely made an incredible impact on the design.

We use bounded queues everywhere — and everything is also static allocation: memory, network resources, disk too, even all the messages that might possibly be needed to execute different variants of the distributed consensus protocol. It's really nice coding on it, because all the limits are clearly defined.

It forces a little more upfront thinking. This thinking can be a little difficult for a day or two, but the net result is a better design.

I love Little's law.

konha4y ago

> and thus your system must handle the possibility that an event fails to be processed and must be retried/dropped

Isn’t a retry just extending the queue to the caller?

dcow4y ago

That's why I said "retry/drop". Handling back-pressure means you must propagate it backwards to a point in your program where there is enough context to enable decision to be made on how to handle the failure. Also, a retry can wait some amount of time until the system is less busy (and this can be enforced by some other part if needed). Further, you may have many callers and one central thing processing requests so propagating back-pressure to callers also allows you to push the problem back to a point where there are potentially more resources. In a silly example, if someone spamming your queue starts to experience a sluggish UI or run out of RAM because back-pressure has forced the queue to grow on their machine an not your infrastructure, perhaps they'll get the message and cool it (ideally the program handles this gracefully too and tells the user a maximum has been hit, but even without that back-pressure is useful).

So, yes. But that's the point.

ggrrhh_ta4y ago

That's the best scenario; regardless of whether the event reaches the destination or not, in most cases (except continuous monitoring of some parameter which admits losses) the source still will keep the information or the way to generate that information, so the amount of state that the queue at the source requires is still being used. But moreover, the source can throttle the generation of events, or it can just decide that that event is stale and can be dropped, or can pack that event with the next events, or can signal other processes about what is happening, or it can choose other path, or maybe it has some out-of-band signaling capability, or....

pornel4y ago

Even worse, it multiplies the amount of requests hitting the overloaded processor. You can create a "metastable failure" situation that will prevent an overloaded service from ever recovering.

3 more replies

im3w1l4y ago

Yes but that queue is bounded too. And if you follow the systems backward eventually you end up with a real person. That is then forced to prioritize away less important tasks.

sdevonoes4y ago

> It's annoying but also understandable how often people just dump stuff into an unbounded queue and punt on making sure things work until the system is falling down.

It's annoying if it is done by the infrastructure team (I mean, they should know the details of the queue they are managing). It's understandable if it is done by product developers (they are more into "inheritance vs composition" kind of things).

dcow4y ago

I've seen plenty of product engineers not understand this fundamental aspect of queues and just add them because it "felt right" or "for scale" or something silly...

1 more reply

capn_duck4y ago

I've never met an application developer who was unaware of what a "queue" was or the problems they purport to solve. Pretty sure stacks/queues are among the first "data structures" that students create, which inevitably leads to the question of, "what to do when full?" i.e. circular queues, double ended queues, priority queues. I know that the enterprise grade queuing systems we're talking about are a lot more involved than that, but to suggest that developers don't grok queues is pretty disingenuous. And the implications of rate in > rate out is pretty obvious for anyone that's ever had a clogged sink.

1 more reply

marcosdumay4y ago

> Queues only serve to mitigate bursts.

Well, they also serve to hold your state while you add processing capacity.

If you look at a market queues, when they start to grow, management deviates people from non time sensitive tasks into cashiers. The queues make keeps things working during the change.

Equivalent situations happen on software too.

dcow4y ago

It's two sides of the same coin. In your example management only adds more cashiers if the queues have become unacceptably long (they've tripped a soft limit). The queues are serving to accommodate irregular shopper purchasing patterns because shoppers don't arrive consistently throughout the day. So really they are serving to smooth over bursts. Adding more cashiers is simply a response to queues hitting an unacceptably long length. If management had infinite cashiers and you could add them instantly then you wouldn't need queues. But, and this is especially obvious if you consider the scenario where all registers are staffed, then you still need queues to handle bursts. Costco is a good place to experience this.

mwcampbell4y ago

I wonder which combination of HTTP reverse proxy (e.g. nginx, cloud load balancer) and application server (e.g. Python WSGI server) is best at using bounded queues and signaling backpressure (e.g. HTTP 503) rather than just letting requests pile up.

slrz4y ago

I guess that basically everything you'd consider for production use allows you to configure queue sizes and related behaviour however you fancy.

cjg4y ago· 8 in thread

I just finished implementing a simple backpressure algorithm. Once the queue is more than a certain percentage full, it rejects new items. These rejections get turned into HTTP 503s.

usrbin4y ago

What is the purpose of cutting off the queue at a n% full? Isn't that effectively saying that the queue size is actually n% of the stated size, or am I missing detail?

bombcar4y ago

Perhaps "rejects new items" is the key - a web server that hits a certain percentage utilization only allow existing clients/connections/logged in users/whatever to connect; refuse new connections.

cjg4y ago

It's probabilistic.

morelisp4y ago

Wait until you hear about hash table capacities.

SketchySeaBeast4y ago

Haven't you just moved the pressure back a step? Now you have a bunch of items retrying to get their stuff on the queue and a bunch of places to try and locate the problem, instead of an overly large queue. Is the idea that the messages aren't important enough to be placed on the queue in the first place if demand is high?

cjg4y ago

The idea is that it's not a sudden hard failure. Instead only a few things fail which helps in two ways: reducing the load by discarding some work and alerting that the system is beginning to be overloaded.

1 more reply

bmm6o4y ago

I think 429 is the preferred status code for this. Of course you can use whatever you want, but 503 can presumably come up in other scenarios and it's nice to have something unambiguous.

cjg4y ago

429 is only applicable if it's the same user sending all the traffic.

503 explicitly mentions server overload.

galaxyLogic4y ago· 6 in thread

Here's another article about the same issue I think https://ferd.ca/queues-don-t-fix-overload.html .

Solution: "Step 1. Identify the bottleneck. Step 2: ask the bottleneck for permission to pile more data in"

bombcar4y ago

This is wonderful and shows the problem very clearly.

If your system doesn't have a way to shed load, it will eventually overload.

The problem for many is that when it does finally shed load, the load shedder gets blamed (and turned off). See how many people look for ways to turn off the OOMKiller and how few look to figure out how to get more RAM.

dcow4y ago

This is also wonderful (emphasis mine):

> All of a sudden, the buffers, queues, whatever, can't deal with it anymore. You're in a critical state where you can see smoke rising from your servers, or if in the cloud, things are as bad as usual, but more!

cecilpl24y ago

This is a fantastic article, thank you for sharing!

prometheus764y ago

I work in a custom fabrication environment, and this solution doesn't really apply, because what happens in a dynamic system is that the bottleneck shifts all around the shop dynamically. It's never just one operation that is the constant bottleneck.

owenmarshall4y ago

How doesn't it? I've lived in a world of coordinated microservices before, where a nice linear progression of queues only happens at the atomic level; when you zoom out you've got a digraph full of cycles, nodes that come & go in response to various events & schedules, and the transitory nature of those services entering could cause a smooth-sailing pipeline to suddenly seize up because some downstream dependency is servicing higher-priority requests.

The only way we could stay sane was aggressively following Ferd's rule – nobody got to surface an API unless it surfaced HTTP 429s on overload[1], and if you called a downstream API without checking for and responding to a 429, tough noogies, your requests got dropped and you had to figure out what to do to fix it.

Funny enough, at the time I did software for a manufacturing facility that did high-throughput manufacturing for many varying products. I was lucky enough to do a "gemba walk" where they made us engineers put on hard-hats and eye protection and reminded us who actually made the company money, and by god those guys were doing exactly what we did: they had JIT delivery of raw materials / sub-components to assembly lines, and the delivery of resources to the line was always constrained by a fixed buffer, quite literally "how many units you could put on a shelf". You couldn't just toss more product at a line that wasn't ready for it, it would literally not fit.

Sure, they used different terms than we did – muda and muri and value-stream mapping – but the outcome was very similar. Their version of the NOC was a team of managers who watched the cycle time of each line, how long it took material to be transported, which areas were bottlenecking and which were at risk of running into resource starvation, and they had the ability to balance the hot spots by bursting product in, caching upstream components into hold docks, releasing held work product to downstream lines, or even turning on a flex line to work around longer-term constraints. They rarely had fixed, static bottlenecks, because they optimized the life out of them. Their bottlenecks arose when a shipment didn't arrive (and you better believe those were tracked just as aggressively), a machine broke down, an operator got sick, things like that.

But at the end of the day? Same principle: downstream components would only accept as much as they could process from upstream components; you asked before you pushed.

[1]: and if you said "I swear, you can't overload this service, we don't need to send 429s" a swarm of senior engineers descended upon you and only left until you convinced them they were wrong, or – way more likely – they convinced you.

1 more reply

hammock4y ago

Love that solution. It's plainly unfair but in my experience so critical to getting things done in business, even setting aside engineering.

mopierotti4y ago· 5 in thread

I'm no expert, but I don't think this is true. Couldn't tasks arrive into the queue at the same rate that they are processed, resulting in a fixed queue size at 100% utilization?

Put another way, in the second "No good" diagram showing one task being worked on with none in the queue, another task could arrive before the current one is finished.

I suspect the counterargument might be that the task arrival rate is not usually that predictable, but even so, I expect that the amount of variance predicts how true this theory is in practice.

seandavidfisher4y ago

The author addressed this in the comments.

> What’s to prevent the system from bouncing between “1 task in processor & 1 task in queue” and “1 task in processor & 2 tasks in queue” while maintaining 100% utilization?

> Nothing! That could totally happen in a queueing system. However, the arrival process would need to be tuned quite precisely to the processing rate. You would need to watch the status of the processor and, when a task finishes, only then insert a new task into the queue. But this implies a shell game: if you always have a task ready to put into the queue when it needs to be padded back out, then isn’t that task in a sense already “queued”?

> Instead, in queueing theory we usually assume a random arrival process: sometimes a minute may pass between arrivals into the queue; sometimes only a second. So the system can’t bounce for arbitrarily long between 1-in-queue and 2-in-queue states. Eventually, one of two things will randomly occur:

> 1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing queue size to 0.

> 2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.

drivebycomment4y ago

Classical misunderstanding of queueing. If you have a perfect control over an arrival, you can consider it as already queued. I.e. bank customers or grocery store checkout customers or online http requests for a service all arrive at random interval. You don't have a control over their arrival timing.

cormacrelf4y ago

This is not helped by the language used in the post. This was supposed to be an introduction to queuing, people shouldn’t come away with their classic misunderstandings completely intact.

> When a system has steady-state 100% utilization, its queue will grow to infinity. (Emph mine.)

This is clearly false, there are clearly counterexamples where the queue does not grow that way. You don’t have control over the arrival rate etc but it could just be your lucky day. It would be more accurate to say you can’t guarantee that either the queue length or wait time are finite. Author didn’t include the words “assume the worst case” or anything like that. I feel like that’s a pretty crucial piece of framing that’s missing.

2 more replies

crdrost4y ago

Yes, that is a very basic point, one basic reason to do queuing is that queues are buffers for uncertainty. If things are perfectly predictable, as you see on some robots, you allocate exactly what you need and no extra.

The uncertainty exists in two places though... It's not just the events coming in but also the time to process each queued event, is another source of uncertainty in the same system that can have the same effect.

The author mentions “M/M/1,” this quantifies that there is only one queue-handling process, that events come in with the sort of randomness that raindrops have (no periodicity), and that handling an event does not have any “timeouts” nor any “long tails” where once a handler has been occupied for 3 or 4 T (where T is the average time to resolve) then the prediction for how much time is remaining for them either drops to 0 (timeout) or escalates dramatically (long tail)... The exponential distribution is used, kind of to be nicely in between these, it is a model where over a time dt the handler is asked to pick a random number with success rate p=dt/T, if they pick the right random number then the event is handled, otherwise a new random number is generated, repeat ad nauseam.

The basic lessons are surprisingly transferable though. So for example traffic on the highway does not have this raindrop sort of randomness because cars kind of repel each other, they space out much more uniformly than you would have predicted on short term scales... The handling time is also nothing like this model of random guessing. Nevertheless, traffic jams still form at high utilization! And they form for a really interesting reason which is that the n handlers—lanes, we call them—have correlations between them. All these queuing models optimistically assume that a handler never stops another handler, “hey have you ever seen anything like this? where can I find that password?” etc.

But in traffic, a car can only lane change from one lane to another if both lanes synchronize speed. With a fixed ability to accelerate, this depends the amount of space between you and the next car, and the amount of space in the lane you want to switch to: as utilization increases both of these spaces drop and synchronization becomes a massive source of inefficiency.

Same happens at a company too, it is possible that an employee might spend more time waiting for meetings, and finding convenient times to schedule them, in order to synchronize two streams of work, than the work itself takes. Can be very frustrating!

remram4y ago

It does not immediately grow towards infinity, but it is unbounded. Over a sufficiently long period of time, any given queue size will be observed.

Assuming the input and output are just a little bit random, and are only equal on average. You will have periods during which the input and output balances out. You will have periods during which the output is faster, and the queue will shrink to 0 (but no lower). And you will have periods where the output is slower, and the queue grows, with no bound.

kingdomcome504y ago· 5 in thread

And what if items are processed at exactly the same rate they are added? Something is missing from this thesis. There must be an additional assumption at play beyond what is stated in the article.

seandavidfisher4y ago

The assumption you're looking for is there but relatively inconspicuous. He mentioned a M/M/1/∞ queue but then didn't go into the details about the M/M/1 part. From the Wikipedia page[0]:

> an M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process...

So the arrival times are not arriving at any fixed rate, but according to a Poisson distribution. In the article he did reference this fact right before the first plot:

> (For the wonks: I used a Poisson arrival process and exponentially distributed processing times)

And then finally, he answered this question more directly in the comments to the article:

> What’s to prevent the system from bouncing between “1 task in processor & 1 task in queue” and “1 task in processor & 2 tasks in queue” while maintaining 100% utilization?

> 1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing queue size to 0.

> 2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.

[0]: https://en.wikipedia.org/wiki/M/M/1_queue

kingdomcome504y ago

Of course I know what was missing. My comment was more about making the implicit more explicit. I am not the only one who immediately thought of the above as a counter-example to the bold text (repeated twice) in the article.

prometheus764y ago

It's not just the arrival rate that is a poisson distribution, but it's also the processing time being a poisson distribution that really sets the table for the scenario the article is addressing. For me, it's easier to understand in a grocery store situation, where customers arrive in a poisson distribution at the checkout line, and that each cart has a poisson distribution on how long it takes to process the items in the cart. That second level of variance is key to the queue buildup, especially as the grocer checker reaches 100% utilization.

btilly4y ago

The processing time is exponential, not Poisson.

What all of this actually boils down to, though, is that the odds of your next event being completing a job or getting a new one are exactly even.

estro01824y ago

Theoretically I think that’s possible. In a practical setting it is not since task arrival times are of a probability distribution; this means that utilization is not 100% and therefore queues do not grow to infinity. Given the author’s point about ~80% and above being bad, I imagine the immediate processing scenario to still be pathological.

Edit: https://news.ycombinator.com/item?id=30601286

mwcampbell4y ago· 4 in thread

> but once you understand it, you’ll have deeper insight into the behavior not just of CPUs and database thread pools, but also grocery store checkout lines, ticket queues, highways – really just a mind-blowing collection of systems.

...and karaoke singer rotations. In 6 years of frequently singing karaoke at bars, I've never known a host to turn away new singers, unless it's almost time to end the show. So you could say the queue is unbounded, with predictable results for the few singers who show up early and then get frustrated with the ever-increasing time until their next turn as more singers arrive. I don't know what the best solution is.

bombcar4y ago

That's the same load shedding as grocery stores use - if everything gets too crowded people start leaving (not queueing).

Now that may actually be suboptimal for the business (if they can only let 10 people in they'd rather let the 10 who will spend the most, say) which is why things like reservations, etc come into play. I wonder if restaurants that do both reservations and a queue see one group pay more on average ...

Animats4y ago

That's the same load shedding as grocery stores use - if everything gets too crowded people start leaving (not queueing).

Yes. That's called the "rejection rate".

Unless, some of the time, queue length is zero, you will have a nonzero rejection rate. This is worth bringing up with store managers who want to run understaffed checkouts. One of the things retail consultants do is point out how sales are being lost that way, both in customers who leave and customers who never come back.

Much of my early work on network congestion was based on that. In the early days of networking, everyone was thinking Poisson arrivals, where arrivals are unaffected by queue lengths. This is partly because the original analysis for the ARPANET, by Leonard Klienrock, was done that way. It was done that way because his PhD thesis was based on analyzing Western Union Plan 55-A, which handled telegrams. (Think of Plan 55-A as a network of Sendmail servers, but with queues made from paper tape punches feeding paper tape readers. The queue was a bin between punch and reader.)[1], at 7:00. Queue length was invisible to people sending telegrams, so senders did behave like Poisson arrivals. That's still true of email today.

The IP layer is open loop with rejection. Transport protocols such as TCP are closed loop systems. The two have to be considered together. Everybody gets this now, but it was a radical idea in 1985.[2]

[1] https://archive.org/details/Telegram1956

[2] https://datatracker.ietf.org/doc/rfc970/

1 more reply

claytonjy4y ago

I've never managed a restaurant, but thinking about my own habits I'd bet that a restaurant that takes reservations but also some walk-ins almost surely makes more on the reservation guests. The meal is the main attraction for them, often a special occasion, while walk-ins are more likely to spread their money over other businesses (shopping before, drinks elsewhere after, etc.). I bet group size is also higher for reservations; most people know a reservation is all-but-required for larger groups.

testudovictoria4y ago

Focus on the bottleneck. There is a finite amount of time. Increase it to allow more participants.

There's also putting restrictions on the input. Have a maximum song length.

Otherwise, require additional constraints before queueing like making it computationally expensive or having a priority system based on criteria. Charge a cover (with increasing costs) or have a membership club that gives more/better access.

_0w8t4y ago· 3 in thread

A lot of frameworks have queues bounded only by the available memory. As the article nicely demonstrates it implies that CPU must be idle at least 20% of time for that to work to have reasonable bounds on latency. With various GUI event queues that is OK, but it puzzled me why, for example, Erlang opted to have unbounded message queues. Did not Erlang designers care about latency?

yetihehe4y ago

In erlang it is standard to have 5 seconds timeout on queries. So, when many processes start to send to one process and it gets overwhelmed, other processes start having timeouts and typically they have some backoff strategy. Essentially queues are not bounded by size, but by time. Most processes don't send messages in a fire-and-forget manner, but wait for confirmation of execution.

toast04y ago

Erlang designers certainly cared about latency, but a bounded message queue changes local messaging from reliable to unreliable and remote messaging mostly reliable (as long as the nodes don't disconnect) to unreliable. That's a big change and they were unwilling to do it for a long time, although I think there is an option to put bounds on the message queue now. It was possible to have bounds before --- either with brute force looking at the queue length for all processes and killing them, or looking at the queue length within the process and discarding requests if there are too many.

Also, for a long time, sending messages to a process with a long queue might be deprioritized (suspended), as a form of backpressure, although that got removed for local processes because it wasn't effective with SMP, if I'm remembering properly.

actionfromafar4y ago

Maybe the Erlang way is more letting the entire thread crash and "bound" it with a restart?

civilized4y ago· 3 in thread

> If the processor is always busy, that means there’s never even an instant when a newly arriving task can be assigned immediately to the processor. That means that, whenever a task finishes processing, there must always be a task waiting in the queue; otherwise the processor would have an idle moment. And by induction, we can show that a system that’s forever at 100% utilization will exceed any finite queue size you care to pick:

This argument is incorrect because it proves too much. If tasks arrive predictably, say at the rate of one per minute, and the processor completes tasks at the same rate, you can have maximum throughput and a finite queue -- indeed, the queue never has to exceed two waiting tasks. The unbounded queue growth problem only arises when tasks arrive unpredictably. Since this argument proves the same conclusion without regard to predictability, it's incorrect.

Here's how I think it actually works: if tasks randomly come in and out at the same rate, the size of the queue over time can be modeled as an unbiased random walk with a floor at zero. Elementary techniques prove that such a random walk will hit zero in finite time, and will therefore hit zero infinitely many times. These correspond to the idle times of the processor. The only way to avoid hitting zero over and over is if tasks come in faster than they come out, which leads to tasks piling up over time.

assbuttbuttass4y ago

Thanks, I was confused by the article's induction proof which seems to completely ignore new tasks coming in. The random walk argument is very clear.

KaiserPro4y ago

I think you are putting too many fancy words into the problem.

Queues are balloons. If you put more air in than you are taking out, they grow until they pop. As utilisation grows (ie as the input volume of air starts to match the output) it takes much longer to get rid of the backlog.

This means that any traffic anomalies that would otherwise be hidden by the queue are instantly amplified.

civilized4y ago

If all you care about is engineering intuition, you can use any number of metaphors. I'm talking about what it takes to understand queues mathematically and prove correct results about them, rather than incorrect results.

lliamander4y ago· 3 in thread

This explains why every JIRA backlog I've seen always grows until it becomes unwieldy.

I wonder if anyone here has experience with a fixed-size backlog for work tickets.

dcow4y ago

Shape Up [1] (from basecamp) rejects the notion of backlogs essentially for this reason. We've leaned into adopting it without using basecamp software specifically and it's pretty refreshing.

[1]: https://basecamp.com/shapeup

jrockway4y ago

What do you do when the queue overflows? Block work on backlog capacity? Drop the new item on the floor?

The problem is that people add things to the backlog during their ordinary work, so as you scale up the number of workers, the pressure on the queue increases.

That said, my experience in doing bug triage is that some large percentage of tickets are never going to be done. Those should just be dropped on the floor.

dcow4y ago

Yeah you drop the old ones. And then the insight is that nobody ever has time to process the backlog anyway so just drop the new ones instead and delete the backlog. If the work is important, it will come back up as a new request during cycle planning.

Scene_Cast24y ago· 2 in thread

What are the underlying assumptions we can break here?

For example, what if tasks were not monolithic? As the queue size grows, increase the caching timeout and don't hit that DB, or decrease timeout, or something like that.

Or, what if the input task variance was bounded and we just initialized the queue with 10 tasks? This way, the addition of tasks would never dip below 1 and would never exceed 20 (for example).

jusssi4y ago

Some more:

Processing capacity might not be constant. If you're in a cloud, maybe you can launch more processing power as queue length increases.

Multiple items in the queue might be satisfiable by the same processing, e.g. multiple requests for the same item. In that case, having more requests in queue can increase processing efficiency.

Edit: another one. All requests may not need the same quality of service. For some, best effort might be acceptable.

GordonS4y ago

I wonder if it might also be possible to introduce a kind of tiered storage, where, for example, the queue starts persisting to cheap, massive block storage (such as cloud block storage) instead of it's usual mechanism. That does imply tier 2 events would read slower when the busy period ceased though.

bob10294y ago· 2 in thread

For an alternative take on this, read up on the LMAX Disruptor. Of particular note is section 2.5 "The problem of queues."

https://lmax-exchange.github.io/disruptor/disruptor.html

It has a completely opposite pattern: the more you load it the faster it goes.

gpderetta4y ago

> It has a completely opposite pattern: the more you load it the faster it goes.

Not sure what do you mean with that. The article is about the general theoretical properties of (unbounded) queues.

LMAX is a bounded queue, with the quirk that it will drop older messages in favour of newer ones and it assumes that either there is a side channel for recovery or consumer(s) can tolerate message drops.

bob10294y ago

You're right, that was a rough take. My presentation of LMAX was to simply provide a different perspective over this space.

What I was trying to get at is the specific memory access patterns, batching effects and other nuances related to the physical construction of the CPU which modulate the actual performance of these things. I think this quote better summarizes what I was trying to convey:

> When consumers are waiting on an advancing cursor sequence in the ring buffer an interesting opportunity arises that is not possible with queues. If the consumer finds the ring buffer cursor has advanced a number of steps since it last checked it can process up to that sequence without getting involved in the concurrency mechanisms. This results in the lagging consumer quickly regaining pace with the producers when the producers burst ahead thus balancing the system. This type of batching increases throughput while reducing and smoothing latency at the same time. Based on our observations, this effect results in a close to constant time for latency regardless of load, up until the memory sub-system is saturated, and then the profile is linear following Little’s Law. This is very different to the “J” curve effect on latency we have observed with queues as load increases.

maayank4y ago· 2 in thread

Any good book/exploratory paper for queuing theory?

Jtsummers4y ago

https://web2.uwindsor.ca/math/hlynka/qonline.html

A large list of books (free and online), I cannot speak to the quality of any particular book but you can take some of the titles and authors and perhaps find reviews.

http://web2.uwindsor.ca/math/hlynka/qbook.html

Linked at the top of the first page I shared, but not available online (though a few are available through some services like O'Reilly's digital library).

maayank4y ago

Many thanks!

FourthProtocol4y ago· 2 in thread

Messaging middleware such as queues is mostly redundant. Simplistically -

Client --

If you cache a new submission (message) locally before submitting, you just keep re-submitting until the server returns an ACK for that submission; and your local cache is empty.

Scale clients out as needed.

Server --

If a message is received in good order, return an ACK.

If a message is a duplicate, discard the message and return an ACK.

If a message can only be received once, discard if it already exists on the server, and return an ACK.

If a message is invalid, return a FAIL.

Scale hardware out or up, as needed (ref. "capacity, item 2 in the linked blog post above). Scale queues out by adding service endpoints on the server. Async/await makes the client experience painless. You save your employer $$ because no additional procurement contracts, no licensing fees, no additional server-side infrastructure to run queues, and no consultancy fees to set up and operate Rabbit MQ/Tibco/IBM MQ/Amazon SQS/Azure Queue Storage or whatever other queueing solution the org uses.

Message passing includes concepts auch as durability, security policies, message filtering,delivery policies, routing policies, batching, and so on. The above can support all of that and, if your scenario calls for it, none of it.

The financial argument reduces to dev time vs. procurement, deployment and operational costs of whatever 3rd party product is used, as well as integration, of course.

* I note and welcome the downvotes. However I'd love it more if you'd present a coherent counter argument with your vote.

remram4y ago

> If you cache a new submission (message) locally before submitting, you just keep re-submitting until the server returns an ACK for that submission; and your local cache is empty.

This has terrible resource use and offers no visibility into how many clients are waiting. And yet it's still a queue. Why would anyone do that?

The rest of your post I can't parse at all.

joseph8th4y ago

I think you have defined "queue" too narrowly in the context of the OP article. MQs are one thing, but the article is about queues as data structures.

A directory of files to be processed may be treated as a queue. Add several directories for different stages of processing, and you have a rudimentary state machine.

Distributed systems in particular may benefit from an MQ, but are by no means necessary.

Generally, when we add an MQ we are really regulating an already existing implicit queue. It's such a common and intuitive data structure that one may easily create one without even realizing it.

natmaka4y ago· 1 in thread

If some logic re-orders a job-related queue in order to group similar things, enhancing caches' hit-ratio, then a quite large queue can be quite effective (especially if there is no priority nor barrier, however in such a case the high throughput cost is a high maximal latency).

taneq4y ago

Exactly. In fact I'd go so far as to say this is an equally Important Thing to the Important Thing in the article. If your queue is just a simple FIFO / LIFO / whatever to buffer surges in demand until you get to them, then you absolutely get the "queue suddenly goes to infinity as you hit a critical point near maximum throughput" thing happening.

If, however, you use the queue as a way to process queries more efficiently then that's a different story.

__dt__4y ago· 1 in thread

Interesting article, what bothers me a bit is

> I won’t go into the M/M/1 part, as it doesn’t really end up affecting the Important Thing I’m telling you about. What does matter, however, is that ‘∞’.

I'm really no expert but if I understand it correctly the `M/M` part defines the distributions of the arrival and service processes, so this definitely is important as others have already mentioned in the comments. E.g. a D/D/1 queue where D stands for deterministic arrival shouldn't suffer from this problem.

This doesn't change the interesting fact the article presents but throwing in technical terms without proper explanation is imo bad style. Either don't mention it (would be better in this case) or explain it and why it is relevant.

This is also a common "mistake" unexperienced people make when writing papers. They mention a fact that is somehow related but is completely irrelevant to the topic and the rest of the paper. I don't want to assume anything but to me this often smells like showing-off.

joseph8th4y ago

It's also a problem experts have, when assuming what is "common knowledge" to their readers.

Relevant xkcd: https://www.explainxkcd.com/wiki/index.php/2501:_Average_Fam...

encoderer4y ago· 1 in thread

This is unworkable if you are actively scaling your system. Am i supposed to calculate ideal queue size with each scale out of my data platform?

Instead, the right way to think about limiting queue size is load shedding when you feel back pressure.

Here’s an example at Cronitor: if our main sqs ingestion queue backs up, our pipeline will automatically move from streaming to micro-batching, drastically reducing the number of messages on the queue at the expense of slightly increased latency per message. At the same time, a less critical piece of our infrastructure pauses itself until the queue is healthy again, shedding one of the largest sources of db load and giving that capacity to ingestion.

To me the goal is to feel and respond to back pressure before blowing up and rejecting messages.

charcircuit4y ago

How do you feel back pressure? Do you measure the derivative of the queue in respect to time?

detritus4y ago· 1 in thread

The only thing to understand - and accept - is that you will always pick the wrong queue. So will everyone else. Always.

I appreciate that this truism has nothing to do with the article.

prometheus764y ago

To enumerate your viewpoint: if there are five checkout lines at the grocery store, your odds of picking the fastest line are 1/5 (20%). The more grocery store lines there are, the lower the odds are of you picking the fastest line.

They added "express lanes" back in the 80s to somewhat address this, but then you have the person who ignores the sign and gets in the line with a full cart.

jph4y ago

Queuing theory in more detail: https://github.com/joelparkerhenderson/queueing-theory

The article by Dan is referenced there too.

sporkland4y ago

Perfect time to ask, I've been toying with using the netflix concurrency limits library for a while as opposed to the manual tuning of threadpools, queue depths, etc to achieve good utilization at certain latency. Curious if others have experience with it, and their thoughts: https://github.com/Netflix/concurrency-limits

FWIW envoy also has an adaptive concurrency experimental plugin that seems similar that I'd also love to hear about any real world experience with: https://www.envoyproxy.io/docs/envoy/latest/configuration/ht...

kazinator4y ago

What the naysayers are missing is the definition of load; read it again: "[C]apacity is the number of tasks per unit time that can be processed."

But tasks are variable. One task requires 5 minutes of processing, another one of 35. They arrive randomly. This is also explicitly given, in a caption under one of the diagrams. "A queueing system with a single queue and a single processor. Tasks (yellow circles) arrive at random intervals and take different amounts of time to process."

People calling the article wrong may be thinking of capacity as "unit time amount of work"; like when the processor is at full capacity, it's doing one second's worth of work every second.

If we define capacity this way, the problem goes away: the processor just becomes a leaky bucket. So that is to say, if we know exactly how long each task will take, then we can measure the queue size in terms of total number of seconds of work in the queue. And so then, as long as no more than one second's worth of work is being added to the queue per second, it will not grow without bound, just like a leaky bucket that is not being refilled faster than its leak rate.

When capacity is given as a maximum number of tasks per second, there has to be some underlying justification for that, like there is some fixed part to servicing a job such as set up time and clean up time that doesn't go away even if the job takes next to zero seconds, such that jobs effectively have a built-in minimum duration. If it takes one second to set up a job, and one second to clean up, then the maximum capacity is half a job per second: 1800 jobs per hour and so on. Of course the queue starts to backlog when you approach capacity, because the jobs also require nonzero real work in relation to the administrative time.

If jobs have no minimum fixed cost attached, then the maximum job rate is unbounded: the shorter the jobs being queued, the more of them can be done per unit time: one million one-microsecond jobs can be done in a second, or a billion one-nanosecond jobs, and so on.

motohagiography4y ago

If you are interested in modelling microservices, service busses, and pubsub topologies, you can do some rough models of queues in python with this: queueing-tool.readthedocs.io/

KaiserPro4y ago

This is a good article.

Ted Dzuba Monitoring theory covers unbounded queues pretty well: http://widgetsandshit.com/teddziuba/2011/03/monitoring-theor...

AtlasBarfed4y ago

backpressure is the mechanism that solves this

or less elegantly, circuit breaking / fail-retry

ncmncm4y ago

Adding just a little capacity makes need for queue capacity collapse.

But if rates are not well specified, what "a little" means is also not, and rejecting additions to the queue is the only answer.

j / k navigate · click thread line to collapse