Write to ring: ring[511&(head++)] = data
Read from ring: data = ring[511&(tail++)]
Ring is empty: head == tail
Ring is full: tail + 512 == head (head and tail are unsigned and never reset, so overflow wraps harmlessly)
The flag could have been stored as a bit in some other field. Then it could at least be masked with a simple AND operation, which is usually faster than branching, especially on pipelined CPUs.
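A minimal C sketch of the masked, free-running-index ring from the first comment, assuming a 512-slot ring of int and 32-bit unsigned indices. The names ring_put, ring_get, ring_empty and ring_full are illustrative, not from the original. Because head and tail only ever increase and are masked when used, unsigned wraparound keeps head - tail equal to the number of stored items, so the full test below is equivalent to tail + 512 == head.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 512u            /* must be a power of two */
#define RING_MASK (RING_SIZE - 1) /* 511 */

static int ring[RING_SIZE];
static uint32_t head = 0; /* free-running count of writes */
static uint32_t tail = 0; /* free-running count of reads */

static bool ring_empty(void) { return head == tail; }

/* Same as tail + 512 == head; unsigned subtraction survives wraparound. */
static bool ring_full(void) { return head - tail == RING_SIZE; }

static bool ring_put(int data) {
    if (ring_full()) return false;
    ring[RING_MASK & head++] = data; /* mask picks the slot, head keeps counting */
    return true;
}

static bool ring_get(int *data) {
    if (ring_empty()) return false;
    *data = ring[RING_MASK & tail++];
    return true;
}
```

This depends on the size being a power of two, so that `& 511` behaves like `% 512`; in exchange, all 512 slots are usable without a separate full flag.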
Update: Quick implementation: https://gist.github.com/dpc/a194b7784adfa150a450
This fix for the concurrency issue is an ugly hack. I'm not sure it's even correct in this particular scenario, and it's definitely not appropriate for anything that aspires to be good reusable code. I'd advise pushing the atomicity requirement onto the caller: IRQs should be disabled by the calling code.
The "register" keyword is obsolete: modern compilers do their own register allocation, so there's no point in using it.
An unfortunate effect of this implementation is that the consumer modifies both indices. That makes it unsafe to write and read data from different contexts (for example, producing in an interrupt handler and consuming in the main loop), which is exactly what would be most useful in a driver.
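One way to get the property this comment asks for is a single-producer/single-consumer ring in which the producer only ever writes head and the consumer only ever writes tail. The sketch below assumes exactly one producer and one consumer and a C11 compiler with <stdatomic.h>. The acquire/release ordering is a swapped-in technique, not something from the original code, and the spsc_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPSC_SIZE 512u            /* power of two, as in the masked ring */
#define SPSC_MASK (SPSC_SIZE - 1)

struct spsc_ring {
    int buf[SPSC_SIZE];
    _Atomic uint32_t head; /* written only by the producer */
    _Atomic uint32_t tail; /* written only by the consumer */
};

static void spsc_init(struct spsc_ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: reads tail, writes only head. */
static bool spsc_put(struct spsc_ring *r, int data) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == SPSC_SIZE) return false; /* full */
    r->buf[head & SPSC_MASK] = data;
    /* Release: the slot write above is visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: reads head, writes only tail. */
static bool spsc_get(struct spsc_ring *r, int *data) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return false; /* empty */
    *data = r->buf[tail & SPSC_MASK];
    /* Release: finish reading the slot before handing it back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because neither side ever stores to the other's index, no lock or IRQ masking is needed for the indices themselves under these assumptions.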
Now it makes sense.
It's easier to understand without the decrementing. Initialize head and tail to 0, keep incrementing them, and wrap around at the end as usual. When removing an element, check whether head and tail are equal before removing. When adding an element, check whether they are equal after adding; if so, set head to a sentinel value, say INT_MAX. Now you can check for the sentinel before attempting to add an element.
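A minimal C sketch of the sentinel scheme, assuming head is the write index, tail is the read index, and INT_MAX is the sentinel. The comment doesn't say what happens when an element is removed from a full ring; this sketch assumes head is restored to tail's position, since the slot being freed is where the next write goes. The sring_* names and the 8-slot size are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define SRING_SIZE 8       /* any size works; no power-of-two requirement */
#define SRING_FULL INT_MAX /* sentinel stored in head when the ring is full */

struct sring {
    int buf[SRING_SIZE];
    int head; /* next slot to write, or SRING_FULL */
    int tail; /* next slot to read */
};

static void sring_init(struct sring *r) { r->head = 0; r->tail = 0; }

static bool sring_put(struct sring *r, int data) {
    if (r->head == SRING_FULL) return false;      /* check sentinel before adding */
    r->buf[r->head] = data;
    r->head = (r->head + 1) % SRING_SIZE;         /* wrap around at the end */
    if (r->head == r->tail) r->head = SRING_FULL; /* caught up with tail: full */
    return true;
}

static bool sring_get(struct sring *r, int *data) {
    if (r->head == r->tail) return false;         /* empty: check before removing */
    /* Assumption: leaving the full state, the freed slot is the next write slot. */
    if (r->head == SRING_FULL) r->head = r->tail;
    *data = r->buf[r->tail];
    r->tail = (r->tail + 1) % SRING_SIZE;
    return true;
}
```

Unlike the masked version, the size need not be a power of two, and every slot is usable without a separate full flag.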
Neat trick!