Hell, I've slung C structs across the network between 3 CPU architectures. And I didn't even use htons!
Maybe it's not portable to some ancient architecture, but it has worked on every one I've used.
If there is undefined behavior, it's certainly never been a problem either.
I've also seen a lot of talk about TLB shootdowns, so I tried to reproduce those problems. Even with more than 32 threads, mmap was still faster than fread into memory in the tests I ran.
Look, obviously there are use cases for libraries like that, but a lot of the time you just need something simple, and writing some structs to disk can go a long way.
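To make that concrete, here's a rough sketch of the kind of thing I mean: dump an array of structs with fwrite, then read it back through mmap. The `struct record` layout and the function names are made up for illustration, and it assumes a POSIX system, fixed-width fields, and that the reader has the same endianness and struct layout as the writer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record layout: fixed-width fields, ordered so that no
 * padding is needed on common ABIs. */
struct record {
    uint64_t id;
    uint32_t flags;
    float    score;
};

/* Fail the build if the compiler lays the struct out differently. */
static_assert(sizeof(struct record) == 16, "unexpected struct padding");

/* Write n records to path as raw bytes. Returns 0 on success, -1 on error. */
int save_records(const char *path, const struct record *recs, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(recs, sizeof *recs, n, f);
    int close_err = fclose(f);
    return (written == n && close_err == 0) ? 0 : -1;
}

/* Map the file read-only and return a pointer to its records.
 * Stores the record count in *n. Returns NULL on error, including
 * an empty file or a size that is not a whole number of records. */
const struct record *map_records(const char *path, size_t *n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(struct record) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *n = (size_t)st.st_size / sizeof(struct record);
    return p;
}

/* Release a mapping returned by map_records. */
void unmap_records(const struct record *recs, size_t n)
{
    munmap((void *)recs, n * sizeof *recs);
}
```

Call `save_records("data.bin", recs, n)` once, then `map_records("data.bin", &n)` wherever you need the data, and index the result like a normal array. The static_assert is the cheap insurance here: if a compiler ever pads the struct differently, the build breaks instead of the file format.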