I was doing this in 1992, so it's at least 12 years older than the 2004 implementation. I suspect it was being done long before that. Back then the read and write indexes were being updated by separate processors (even more fun, processors with different endianness) with no locking. The only assumption being made was that updates to the read/write pointers were atomic (in this case 'atomic' meant that the two bytes making up a 16-bit counter were written atomically). Comically, on one piece of hardware this was not the case, and I spent many hours inside the old Apollo works outside Boston with an ICE and a bunch of logic analyzers figuring out what the hell was happening on some weird EISA bus add-on to some HP workstation.
It's unclear to me why the focus on a 2^n sized buffer just so you can use & for the mask.
Edit: having had this discussion I've realized that Juho's implementation is different from the 1992 implementation I was using because he doesn't ever reset the read/write indexes. Oops.
Try late 1960s. Generally known then, widely used.
For an interesting proof about tokens in ring buffers, check out https://www.cs.utexas.edu/users/EWD/ewd04xx/EWD426.PDF, which, for 1974, has an interesting bit of multiprocessing.
The cost of a mask can probably be entirely buried in the instruction pipeline, so that it's hardly any more expensive than whatever it costs just to move from one register to another.
Modulo requires division. Division requires a hardware algorithm that iterates, consuming multiple cycles (pipeline stall).
That's for "normal" ring buffers. I suspect that the design described in the article can be implemented for non power-of-two without division but I'll need to think about the details.
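As a concrete sketch of what the thread is comparing (the names and buffer size here are made up), advancing an index with a modulo versus a mask:

```c
#include <stdint.h>

#define BUF_SIZE 16u  /* must be a power of two for the mask version */

/* modulo version: works for any size, but compiles to a divide
   (or a multiply/shift sequence when the size is a compile-time constant) */
static uint32_t advance_mod(uint32_t i) {
    return (i + 1) % BUF_SIZE;
}

/* mask version: a single AND, only valid when BUF_SIZE is a power of two */
static uint32_t advance_mask(uint32_t i) {
    return (i + 1) & (BUF_SIZE - 1);
}
```

Both wrap 15 back to 0 for a 16-element buffer; the difference is only in what the compiler emits.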
movl $-252645135, %edx
movl %edi, %eax
mull %edx
movl %edx, %eax
shrl $4, %eax
movl %eax, %edx
sall $4, %edx
addl %edx, %eax
subl %eax, %edi

BTW: read and write pointers, power of two, did that in BeOS (1998), and many sound drivers did it earlier than that. To me, that seemed like the obvious way to do it when I needed it.
- subtracting buffer size from both pointers once the read pointer has wrapped.
- choosing a longer int for the math operation where possible
That seems a small price for the freedom to be able to choose an appropriate buffer size.
Last time I checked, the operation cost was 26k cycles on my PIC.
Using a 2^n + mask made my queue perform 10 times faster (if not more).
IIRC the way we made this really fast was to write the buffer backwards. That way you can detect wrapping around the buffer because DEC will underflow and set the sign flag. Then you can JS to whatever code needs to ADD back the buffer length to handle the wraparound.
But 2^n has another problem (back in that era): buffer size. You are stuck with 1K, 2K, 4K, etc. buffers. When memory is tight you likely need something very specific, so you end up with the solution we had.
But, hey, if memory is free use 2^n bytes for your buffer.
Probably. Use the Ian Knot: http://www.fieggen.com/shoelace/ianknot.htm
Seriously, spend 20 mins practising this, and you'll never go back to the clumsy old way again.
I usually tie this knot twice over the lifetime of a pair of shoes. Once when I get them, and once more when they're worn in and need to be tightened.
The really nice thing about this knot is that it also looks good, so you can use it on both running shoes and dress shoes.
It makes no sense to teach the more common shoe tying knots.
Since learning the Ian knot (and correct starting knot) I can honestly say I enjoy tying my shoes every day and relish the opportunity to tie a bow at any other time.
I use the "Two Loop Shoelace Knot Bad Technique 1" from that page.
In recent years I've been wearing shoes with a different fastening mechanism, but I have to tie some dress shoes for a wedding tomorrow, so this is very timely knowledge!
More importantly, I can't seem to get an Ian knot very tight. Does this get better over time?
So many people walk around assuming they need to do complicated double knots to stop their shoelaces untying themselves. If only they knew they were doing Granny Knots, and that a standard knot is perfectly secure if tied properly.
I like this kind of article and enjoyed this particular one, but the long discussion above about the "right" way to do it goes some way to justifying why so many people are happy to do it the "wrong" way.
I've implemented and used ring buffers the "wrong" way many times (with the modulus operator as well!) and the limitations of this method have never been a problem or bottleneck for me, while its simplicity means that it's easier to write and understand than almost any other data structure.
In most practical applications, it's memory barriers that you really have to worry about.
I must admit that I never actually benchmarked my implementation properly -- it might be interesting to see if there are actual trade-offs between mmap vs. copying. (I'm guessing that nothing can beat MMU support, but I think the MMU also supports copy operations, so...?)
EDIT: Oops, I see they use mirrored memory here as well.
https://www.kernel.org/doc/Documentation/circular-buffers.tx...
Note that wake_up() does not guarantee any sort of barrier unless something
is actually awakened. We therefore cannot rely on it for ordering. However,
there is always one element of the array left empty. Therefore, the
producer must produce two elements before it could possibly corrupt the
element currently being read by the consumer. Therefore, the unlock-lock
pair between consecutive invocations of the consumer provides the necessary
ordering between the read of the index indicating that the consumer has
vacated a given element and the write by the producer to that same element.

EMPTY -> (read == write)
FULL -> (read == (write + SIZE) % (2 * SIZE))

Basically you're full if you're at the same relative index but on different laps, and you're empty if you're at the same relative index on the same lap. If you do this with a power-of-two SIZE then the 'lap' is just the single bit above the size bits.

(The trick is that SIZE has to be a power of two, or else when you increment from 2^32-1 to 0, your pointers will jump to a different position in the array.)
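A sketch of that lap scheme: indices are kept in [0, 2*SIZE), so one extra bit records which lap each pointer is on (SIZE and the helper names here are my own, not from the comment):

```c
#include <stdint.h>

#define SIZE 8u  /* power of two */

/* indices run from 0 to 2*SIZE-1; the extra range records the lap */
static uint32_t inc(uint32_t i)  { return (i + 1) % (2u * SIZE); }
static uint32_t slot(uint32_t i) { return i % SIZE; }  /* position in the array */

static int rb_empty(uint32_t r, uint32_t w) { return r == w; }
static int rb_full(uint32_t r, uint32_t w)  { return r == (w + SIZE) % (2u * SIZE); }
```

With r == w == 3 the buffer is empty; with r == 3 and w == 11 (same slot, different lap) it is full.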
Because it's easier to understand at first glance, has no performance penalty, and for most busy programmers that often wins.
The real reason to stick with the first approach is that your static analysis tools won't freak out that you have intentional unsigned int overflow. Heck, some compilers will now scream at you for doing this. Then what happens when someone goes to port your code to a language with stricter overflow behavior? It won't work.
IMO even in realtime systems, I don't use this. Heck, the linux kernel even uses the original version.
This question needs little context to be relevant, so long as the topic is "computer programming".
Certainly not limited to writing ring buffers. It could be an apropos comment in almost any discussion.
Of course in many cases, the part about "no performance penalty" does not apply. Performance is a routine trade off for some other perceived gain.
Using the mask method is slick (I'd cache that mask with the array to reduce runtime calculations), but it's definitely going to add cognitive overhead and get messy if you want to make it lockless with CAS semantics.
So, store size-1 instead of size, and add one when asked for the size? I can see that, though I'm not confident it's worth the conceptual overhead.
If you mean storing it in addition to the size, I think that's a bad trade - cache is far more precious than many decrements.
Of course, if the size is fixed at compile time, the mask will probably be stored baked into the instructions (andl <const>, ...).
Doesn't it break the order invariant of the buffer, though? I can't see a way to do this without the risk of getting reads of newer data prior to older data. That's probably fine in many cases, but something like non-timestamped-debugging strikes me as a case where I'd want to know that the data arrived in the order I'm seeing.
No, if you increment the read pointer prior to the write pointer, the read pointer will still point at the oldest valid value in the buffer.
So, in pseudo code:
    if (w + 1 >= r) {
        r = w + 2
    }
    w++
    b[w - 1] = value
For a debugging ring buffer (i.e. looking at it in a core file), you have the last value of the write pointer, so you can simply read from write pointer + 1 back around to the write pointer and have your messages in order. This makes the assumption that there are no readers of the debug buffer, so you're only having to deal with the one pointer.

When that's the case, a ring buffer is a great choice. It's not required, though - the writer could block when it detects a full buffer.
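A hedged sketch of that overwrite-the-oldest idea, using free-running indices and a power-of-two size (the names and the exact full check are mine, not the parent's code):

```c
#include <stdint.h>

#define N 4u  /* power of two */

static int buf[N];
static uint32_t r, w;  /* free-running read/write indices */

/* push always succeeds; when full, the oldest entry is dropped */
static void push(int value) {
    if (w - r == N)
        r++;                  /* advance read past the element being overwritten */
    buf[w & (N - 1)] = value;
    w++;
}
```

After pushing 1 through 6 into the 4-slot buffer, r indexes the oldest surviving value (3), and walking from r to w-1 yields 3, 4, 5, 6 in order.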
When pushing D in their example they overwrite the value about to be read, and items are now out of order.
But maybe I'm missing something, I lost interest at all the bit-twiddling.
We've been using similar code in PortAudio since the late 90s[0]. I'm pretty sure Phil Burk got the idea from his hardware work.
[0] https://app.assembla.com/spaces/portaudio/git/source/master/...
No, this is a well known construct in digital design. Basically, for a 2^N deep queue you only need two N+1 bit variables:
http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIF...
PicoLisp has a built-in dynamic fifo function: http://software-lab.de/doc/refF.html#fifo
Great! Just don't use it if the indices are N bits wide and the array has 2^N elements. :)
Not unheard of. E.g. tiny embedded system. 8 bit variables, 256 element buffer.
I guess that's the reason most people don't do it: they'd rather waste O(1) space than waste mental effort on trying to save it.
It's a dynamically sized ring buffer with an optimization analogous to that of C++ strings; if the required capacity is small enough, the buffer is stored inline in the object rather than in a separate heap-allocated object. So something in the spirit of (but not exactly like):
struct rb {
    union {
        Value* array;
        // Set N such that this array uses the same amount of space as the pointer.
        Value inline_array[N];
    };
    uint16_t read;
    uint16_t write;
    uint16_t capacity;
};
You'd dynamically switch between the two internal representations, and choose whether to read from array or inline_array based on whether capacity is larger than N. In this setup it'd be pretty common for N to be 1. Having to add a special case to every single method would kind of suck; generic code that could handle any size seemed like a nice property to have.

I've always been doing it the "wrong" way, mostly on embedded systems. My classic application is a ring buffer for the received characters over a serial port. What's nice is that this sort of data structure doesn't need a mutex or such to protect access. Only the ISR changes the head, and only the main routine changes the tail.
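A minimal sketch of that ISR/main-loop pattern (hypothetical names; this assumes a single-core MCU where interrupt/main ordering makes the volatile accesses sufficient - on a multicore system you'd need the memory barriers discussed elsewhere in the thread):

```c
#include <stdint.h>

#define N 64u  /* power of two */

static volatile uint8_t rx_buf[N];
/* free-running indices: only the ISR writes head, only main writes tail */
static volatile uint32_t head, tail;

/* called from the UART receive interrupt */
static void isr_put(uint8_t c) {
    if (head - tail != N) {          /* drop the byte if the buffer is full */
        rx_buf[head & (N - 1)] = c;
        head++;
    }
}

/* called from the main loop; returns -1 when empty */
static int main_get(void) {
    if (head == tail)
        return -1;
    uint8_t c = rx_buf[tail & (N - 1)];
    tail++;
    return c;
}
```

Because each index has exactly one writer, neither side ever races on the same variable, which is why no mutex is needed.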
http://stackoverflow.com/questions/1583123/circular-buffer-i...
size() { return write - read; }
0 - UINT_MAX - 1 = ?

[EDIT] Changed constant to reflect use of unsigned integers, which I forgot to specify initially.
What I find interesting are the trade-offs: machine vs explicit integer wrap-around and buffers with maximum ~size(int)/2 vs ~size(int).
(0 - (2^32 - 1)) % 2^32 = 1

PS. No wrap-around either, for different reasons.
You'll have to explain that to me, since I can't assign `x = 2^32` without wraparound when x is an unsigned 32 bit integer.
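To make the arithmetic concrete (a sketch with made-up values): unsigned subtraction in C is defined modulo 2^32, so write - read gives the correct element count even after write has wrapped past zero and is numerically smaller than read:

```c
#include <stdint.h>

/* correct even when write has wrapped and is numerically smaller than read */
static uint32_t rb_size(uint32_t read, uint32_t write) {
    return write - read;
}
```

For example, with read = 0xFFFFFFFE and write = 2 (write wrapped after four pushes), the result is 4.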
I understand not wanting to waste one slot. A third variable (first, last, count) isn't too bad. But if you really hate that third variable, why not just use first and count variables? You can then compute last from first and count, and the two boundary cases show up as count = 0 and count = capacity.
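The first + count layout suggested here might look like this (a sketch; the names and capacity are mine):

```c
#include <stdint.h>

#define CAP 8u

struct rb {
    int data[CAP];
    uint32_t first;  /* index of the oldest element */
    uint32_t count;  /* stored elements: 0 means empty, CAP means full */
};

/* the write position is derived rather than stored */
static uint32_t rb_next(const struct rb *b) {
    return (b->first + b->count) % CAP;
}
```

Empty and full are unambiguous (count == 0 vs count == CAP), at the cost of one modulo whenever the write position is needed.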
I think he addressed that in the post:
The most common use for ring buffers is for it to be the intermediary between a concurrent reader and writer (be it two threads, two processes sharing memory, or a software process communicating with hardware). And for that, the index + size representation is kind of miserable. Both the reader and the writer will be writing to the length field, which is bad for caching. The read index and the length will also need to always be read and updated atomically, which would be awkward.
0xffffffff % 7 = 3, but (0xffffffff + 1) % 7 = 0.
Why would someone do this instead of re-using previous (or third-party) implementations? Of course unless it's all in different languages, but I don't think that's the case here.
I didn't even know what a ring buffer was
where do I dispose of my programmer membership card?
edit : lol, what a hostile reaction...
pls explain I'd love to hear