You can do exactly this with MySQL and SQL Server too: MySQL 8.0+ supports SKIP LOCKED, and SQL Server offers the comparable READPAST table hint.
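For anyone who hasn't seen the pattern, here is a minimal sketch of claiming a job with SKIP LOCKED on MySQL 8.0+. The `jobs` table, its `status` and `payload` columns, and the `claim_job` helper are all made up for illustration. It assumes a DB-API connection with autocommit off and the `%s` parameter style (as PyMySQL uses).

```python
# Hypothetical schema: jobs(id, status, payload). Not from any real project.
CLAIM_SQL = (
    "SELECT id, payload FROM jobs "
    "WHERE status = 'pending' "
    "ORDER BY id LIMIT 1 "
    "FOR UPDATE SKIP LOCKED"
)
MARK_SQL = "UPDATE jobs SET status = 'processing' WHERE id = %s"


def claim_job(conn):
    """Claim one pending job and return its row, or None if nothing is free.

    `conn` is assumed to be a DB-API connection with autocommit disabled.
    """
    cur = conn.cursor()
    try:
        # Rows locked by other workers are skipped instead of blocking us.
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return None
        # Record the claim before the commit releases the row lock.
        cur.execute(MARK_SQL, (row[0],))
        conn.commit()
        return row
    finally:
        cur.close()
```

Each worker calls `claim_job` in a loop. Two workers never get the same row, because the second one skips whatever the first has locked.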
Interestingly, the plain old file system on Linux also makes a perfectly acceptable basis for a message queue for many use cases - the thing that makes it work is that a file move (rename) within a single file system is atomic. Atomic moves are what make queuing systems possible.
You could write a file system based message queue in 100 lines of async Python, which I did here:
https://github.com/bootrino/arniesmtpbufferserver
File system based message queues can be written in any language, are extremely simple and, most importantly, need zero configuration. One of the most frustrating things about queuing systems is configuration, and that includes database backed queuing systems. They can also be fast - I wrote one in Rust which maxed out the hard disk's random write capability well before maxing out the CPU. From memory, it beat most of the common queuing systems in terms of messages per second.
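To make the atomic-move idea concrete, here is a rough sketch of the general pattern. It is not the code from the linked repo. The directory names (`tmp`, `outbox`, `processing`) and the function names are assumptions for illustration, and all three directories are assumed to live on the same file system so the rename stays atomic.

```python
import os
import uuid


def init_queue(root):
    """Create the directories this sketch uses (the names are assumptions)."""
    for name in ("tmp", "outbox", "processing"):
        os.makedirs(os.path.join(root, name), exist_ok=True)


def enqueue(root, payload: bytes) -> str:
    """Write the message under tmp/, then atomically publish it to outbox/."""
    name = f"{uuid.uuid4().hex}.msg"
    tmp_path = os.path.join(root, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # Consumers never see a half-written file: it appears in outbox/ whole.
    os.rename(tmp_path, os.path.join(root, "outbox", name))
    return name


def claim(root):
    """Try to claim one message; return its new path, or None if none left."""
    outbox = os.path.join(root, "outbox")
    for name in os.listdir(outbox):
        src = os.path.join(outbox, name)
        dst = os.path.join(root, "processing", name)
        try:
            os.rename(src, dst)  # only one worker can win this rename
        except FileNotFoundError:
            continue  # another worker claimed it first, try the next file
        return dst
    return None
```

A worker would loop on `claim`, handle the file, then delete it (or move it back to `outbox` on failure). No locks are needed because the rename itself decides who owns the message.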
Not all use cases for queues need globally distributed message queues with the sort of guarantees needed for financial transaction processing. I would suggest to you that in fact most queues out there are used as outbound SMTP queues, which are then over engineered to use something like Celery, which is a nightmare to configure and debug.
Do you have any experience with either?
Lock free.
Also listdir is a big bottleneck here:
    while True:
        # get files in outbox
        files_in_outbox = [f'{PREFIX}/outbox/{x}' for x in os.listdir(f'{PREFIX}/outbox')]
It's the move/rename that is atomic.
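On the listdir point: `os.listdir` builds a list of every name in the directory on each pass, while `os.scandir` returns an iterator you can stop reading early. A possible sketch of taking a bounded batch is below. `next_batch` and its `limit` parameter are invented for illustration, and whether this is the real bottleneck in that code is an assumption.

```python
import os
from itertools import islice


def next_batch(outbox, limit=100):
    """Return up to `limit` file paths without listing the whole directory."""
    with os.scandir(outbox) as entries:
        files = (entry.path for entry in entries if entry.is_file())
        # The list is built inside the `with`, before the iterator is closed.
        return list(islice(files, limit))
```

With a large backlog in the outbox, each loop iteration then touches at most `limit` entries instead of the full directory.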
I used to work at a brokerage that worked with a panel of around 12 providers, all of whom offered 5+ products that were updated multiple times per year to meet changing regulatory requirements. When a sales adviser recommended product X from provider Y to a customer, the adviser would then need to fill in an application form that was precisely tailored to that particular revision of that particular product. Bear in mind that these were complex, multi-sectioned forms. Needless to say, this created a huge workload for the devteam to keep track of all the product changes and update the UI accordingly each time.
At some point, someone on the devteam had the genius idea to simply take the PDF application forms from the provider, extract the individual pages as PNGs and overlay HTML form elements on top of them. The provider would essentially be doing our UI design for us. Add in an editor tool so the sales managers could set up the forms themselves and a tagging system so specific fields could be pre-filled from customer data we already had stored in the DB and the devteam's workload dropped by maybe 90%. Simple, stupid perhaps, but effective.
Seriously... How can you ever consider saying "an RDBMS is just fine as a Kafka alternative" under those conditions?
Your messages expire so if you want to archive you need a consumer to write them to disk. That sounds very similar.
There are a lot of people who are ideologically opposed to database backed message queues. They're usually reluctant to give detailed explanations why, because it's an emotional thing.
Funny, we were going to use Kafka too, but just sending to MySQL worked just fine <shrug>. One day that will change, obviously.
Why do you have to lose things if the DB goes down? Agreed, untuned & unconfigured MySQL (and MongoDB) out-of-the-box can lose things due to bugs and design issues, but that is the case even when they are running. However, DBs, in general are made precisely for the purpose of safely storing things and not losing them.
OTOH, the number of Kafka setups I have seen that'd lose things when something goes down ... maybe this is not a guaranteed win for the Kafka side of arguments.
> Having a distributed, resilient queue has availability benefits.
High availability is not a function exclusive to Kafka. On the other hand, there's some functions that may come in handy to use in a queue that Kafka simply cannot provide, but DBs can. Off the top of my head: ACID, instant scalability (both up and down) of consumer groups, and the sheer flexibility (and power) that comes with a DB in general.
----
Overall, there's some merits to using a distributed log as a message queue, sure, but there are also merits to using a DB for that.
Queues are great for semi-infinite scalability, but you rarely need it.
There are numerous subtle benefits to using a DB compared to regular message queues that are often overlooked.
Being able to delete, reorder, or edit specific messages can be a lifesaver when things go wrong.
The most common issue with queue-based systems is getting overwhelmed with messages: either your consumers went down, or your producers had a bug or executed too frequently.
Being able to recover with a simple SQL query is a blessing.
1. Huge number of messages from test system were accidentally inserted into production.
Queue solution: disable consumers, move messages to a temporary queue while filtering them, move messages back to the old queue, enable consumers
DB solution: just delete rogue messages
2. We want to store some of the messages, but we're not ready to process them yet
Queue solution: create a separate queue for each message type, insert messages to different queues, manually move them when you're ready to process them (and keep in mind that not every queue can persist messages forever, SQS for example can't hold messages longer than 14 days)
DB solution: just skip those messages while processing
3. Consumers went down and the queue now contains a big number of duplicate messages. While it was fine to just wait a couple hours to let it stabilize, a whale customer started complaining
Queue solution: none (any hacky solution would take longer than it takes for the system to naturally stabilize)
DB solution: move whale customer messages to the front of the queue
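The three DB fixes above can be sketched with SQLite from the Python standard library. The `messages` table and its `source`, `customer`, `kind` and `priority` columns are invented for the example. A real queue table would look different, and a production setup would more likely use Postgres or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        customer TEXT NOT NULL,
        kind TEXT NOT NULL,
        priority INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO messages (source, customer, kind) VALUES (?, ?, ?)",
    [
        ("prod", "acme", "email"),
        ("test", "acme", "email"),      # rogue message from the test system
        ("prod", "smallco", "report"),  # a type we are not ready to process
        ("prod", "whale", "email"),
    ],
)

# Case 1: just delete the rogue messages.
conn.execute("DELETE FROM messages WHERE source = 'test'")

# Case 3: move the whale customer's messages to the front of the queue.
conn.execute("UPDATE messages SET priority = 10 WHERE customer = 'whale'")


def next_message(conn):
    """Pick the next message, skipping types we can't process yet (case 2)."""
    return conn.execute(
        "SELECT id, customer, kind FROM messages "
        "WHERE kind != 'report' "
        "ORDER BY priority DESC, id LIMIT 1"
    ).fetchone()


first = next_message(conn)  # the whale's message now comes out first
```

The skipped `report` rows simply stay in the table until the `WHERE` clause is relaxed, so nothing has to be moved between queues.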
2. Kafka - do what you did with your DB consumers.
3. Kafka - consumers going down can't cause duplicates, what's even going on with your queue?
But you do have a lot of valid points, particularly about failure recovery. My approach tends to involve a hybrid of the two if high speed is a necessary component somewhere. The system can restore its runtime state from a db, but the active mechanics of it are running on message passing of some kind.
They avoided over engineering and building more complexity than required.
Google Spanner for example is up to five nines. Are there a lot of systems that need to be three orders of magnitude more reliable than Google Ads?
Wouldn’t it have been easier to just use Kafka?
"We didn't actually evaluate Kafka for this use case, because we know Postgres, so we used that."
Weird article angle.
It seems like they only tried Postgres? So they didn't compare it with Kafka at all... so yeah, you can make a message queue or log table in an RDBMS.
Maybe it’s not that Postgres is better than Kafka so much as that Postgres is a better fit than Kafka for what they were trying to do?
LOL