Ubuntu kernels might ship a different default IO elevator than Proxmox kernels. If the issue is in the IO elevator (e.g. it is reordering IOs in a way that delays some of them indefinitely before they are sent to the underlying device) and the two use different elevators by default, that would explain why Ubuntu is unaffected while Proxmox is. There is some evidence for this in the comments, where people report that the issue is lessened by switching to mq-deadline. That is why one of my questions asks which Linux IO elevator people's disks are using.
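For anyone answering that question: the active elevator for each disk is visible in sysfs, shown in [brackets] in the queue/scheduler file. A small sketch of how to list it for every block device (the function name and the optional sysfs-root parameter are my own, added only so the snippet is easy to test; on a real system it just reads /sys):

```shell
# List the active IO elevator for each block device under a sysfs root
# (defaults to /sys). The scheduler currently in use is the one shown
# in [brackets], e.g. "[mq-deadline] kyber bfq none".
print_schedulers() {
    root="${1:-/sys}"
    for f in "$root"/block/*/queue/scheduler; do
        # Skip if the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        dev="${f#"$root"/block/}"
        dev="${dev%/queue/scheduler}"
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

print_schedulers   # prints one line per disk; the active elevator is in [brackets]
```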
The correct IO elevator for disks given to ZFS is none/noop, since ZFS has its own IO elevator. ZFS sets the Linux elevator to that automatically on disks where it controls the partitioning. However, when the partitioning was done outside of ZFS, the default Linux elevator is used underneath ZFS, and in practice that is never none/noop, since other Linux filesystems benefit from other elevators. If Proxmox is doing the partitioning itself, then it is almost certainly using the wrong IO elevator with ZFS, unless it sets the elevator to none/noop when ZFS is using the device. That ordinarily should not cause such severe problems, but it is within the realm of possibility that the Linux IO elevator being set by Proxmox has a bug.
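Switching a disk to none by hand goes through the same sysfs file. A minimal sketch, with a hypothetical helper name and an optional sysfs-root parameter added by me for testability; on a real system this needs root, and the change does not persist across reboots (a udev rule would be needed for that):

```shell
# Set the IO elevator for a disk by writing the scheduler name into its
# queue/scheduler file. Assumed usage on a real system (as root):
#   set_elevator sda none
set_elevator() {
    disk="$1"
    sched="$2"
    root="${3:-/sys}"
    echo "$sched" > "$root/block/$disk/queue/scheduler"
}
```

The kernel validates the written name against the schedulers it has available, so writing an unknown name to the real sysfs file fails rather than silently misconfiguring the disk.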
I suspect there are multiple disparate issues causing the txg_sync thread to hang for people, rather than just one issue. Historically, things that cause the txg_sync thread to hang are external to ZFS (with the notable exception of data deduplication), so it is quite likely that the issues are external here too. I will watch the thread and see what feedback I get from people who are having the txg_sync thread hang.