I'm gonna try to address everything I can in this reply. Sorry it has taken so long.
> C compilers existed in 1989 that would assume different structure types with common initial sequences couldn't alias? With the "common initial sequence" carve-out in §3.3.2.3? I am sceptical...
That's not quite what I was talking about. What some compilers would do is not generate extra reads when you had a `long *` and an `int *` in the same scope - the idea being that those two cannot point to the same data, and thus it is not necessary to reread the data from the `int *` when you write to the `long *`. Compilers have now taken it a slight step further - an `int` that belongs to a `struct a` and an `int` that belongs to a `struct b` can't alias - but it is really not much different from the original idea (and hence why it is legal). What the standard really describes is that objects have a single defined type and that accessing objects through a type other than their original type is invalid, which fits with what compiler writers have taken to doing. That said, I would not be opposed to the standard simply making that kind of access legal. While avoidable in most cases, the rule does cause problems in some instances (BSD sockets being a very notable example), and I'd wager it only brings marginal optimizations (for which `restrict` already provides a solution).
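To make the `long *` / `int *` case concrete, here is a minimal sketch. The function name `store_then_read` is made up for illustration. Because an `int` and a `long` are assumed not to alias, a compiler is allowed to return the constant `1` without rereading `*i` after the store through `l`.

```c
/* Hypothetical example: 'i' and 'l' have incompatible types, so the
 * compiler may assume the store through 'l' cannot change '*i'. */
static int store_then_read(int *i, long *l)
{
    *i = 1;
    *l = 2;      /* assumed not to touch *i */
    return *i;   /* may be compiled as 'return 1' with no reload */
}
```

Called with two distinct objects this is well defined either way. The trouble only starts when someone passes the same storage through both pointers via a cast.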
> I do not think this is that easy when you include sockaddr_un into the mix, because of the way that sockaddr doesn't include the full size of its path member. This is, in fact, the point that I throw up my hands and use -fno-strict-aliasing because the fact that pointer provenance, rather than just value and type, is important together with the fact that it's not actually clear whether you've correctly laundered the pointer through a union or not, makes it all too... grey.
Technically, you could use a `char` array for the `sockaddr_un`, and then just cast it to the right type. That's legal because `char` can alias. That said, I'm fairly sure that `sockaddr_un` has a defined size - it doesn't use a FAM in implementation, it's just that the length of its path member can vary between systems. The POSIX standard isn't as clear as can be, but it notes that the size is left unspecified only because different Unixes use different maximum lengths, and it says that it's typically somewhere in the range of 92 to 108. That, along with the typical usage of `sockaddr_un`, implies to me that it is perfectly fine to declare one, it just doesn't have a guaranteed length. Used in a `union` it should be fine. (All that said, I think what you've said also shows another current issue with C - there should be a way to statically declare a `struct` that has a FAM at the end by providing the length for the FAM. There's no way to do this currently except using a `char` array and casting, which is not an acceptable solution IMO.)
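As a rough sketch of the `union` approach, assuming a POSIX system where `sockaddr_un` is an ordinary fixed-size struct: the union name `any_sockaddr` and the helper `fill_unix_addr` are made up for illustration, and nothing here is taken from POSIX beyond the struct and member names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Hypothetical union: every address type shares the same storage, so the
 * object can be handed to the socket calls without casting between
 * unrelated struct pointers. */
union any_sockaddr {
    struct sockaddr         sa;
    struct sockaddr_in      in;
    struct sockaddr_un      un;
    struct sockaddr_storage storage;
};

/* Fill in a Unix-domain address. Returns 0 on success, or -1 if the path
 * does not fit in this platform's sun_path. */
static int fill_unix_addr(union any_sockaddr *addr, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof addr->un.sun_path)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->un.sun_family = AF_UNIX;
    memcpy(addr->un.sun_path, path, len + 1);   /* include the terminator */
    return 0;
}
```

You would then pass `&addr.sa` and `sizeof addr.un` to something like `bind()`. The size check uses `sizeof` rather than a hard-coded number precisely because the path length differs between systems.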
On that note though, the entire issue here could actually be largely resolved by simply adding the `may_alias` gcc attribute to the definition of `struct sockaddr` (and `struct sockaddr_storage`). It would declare that a `sockaddr` can alias any other type you want, and thus would make it legal to read from a `sockaddr` and then cast it to another type for use - removing the need for the `union` BS and all the other various hacks to get around this issue. Obviously that's not standard C, but I think it makes a pretty good argument that adding something like `may_alias` to the standard would be a very good idea.
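Here's a small sketch of what `may_alias` buys you, using made-up types rather than the real `sockaddr` definitions (which I can't change from user code). It assumes GCC or a compatible compiler such as Clang. `header`, `ping`, `text` and `kind_of` are all hypothetical names.

```c
/* GCC extension: accesses through 'struct header' are treated like
 * accesses through a character type for aliasing purposes. */
struct header {
    int kind;
} __attribute__((__may_alias__));

/* Two unrelated message types that both start with an 'int kind'. */
struct ping { int kind; int seq; };
struct text { int kind; char body[16]; };

/* Read the leading 'kind' field of any message that starts with one. */
static int kind_of(const void *msg)
{
    return ((const struct header *)msg)->kind;
}
```

Without the attribute, reading a `struct ping` through a `struct header *` is exactly the kind of access strict aliasing forbids. With it, the compiler has to assume the pointer may refer to anything.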
And that touches on the larger problem with strict-aliasing that I see - there's no way to opt out of it. We have `restrict`, which ironically lets us rule out aliasing for pointers to which strict-aliasing doesn't apply, but we have no way to tell the compiler that two pointers (or types) can alias when it thinks they can't. `may_alias` is one solution, but really any solution that fixes that problem would be extremely welcome in my book. I think the standards writers currently consider `union` to be the solution, but IMO that's simply not sufficient.
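For comparison, this is the direction `restrict` works in. Two `int *` arguments are the same type, so strict-aliasing tells the compiler nothing about them and it has to assume they may overlap. `restrict` is the programmer's promise that they don't. The function name `copy_and_double` is made up for illustration.

```c
/* Without 'restrict' the compiler must reload '*in' after the first store,
 * since 'out' and 'in' have the same type and might point to the same int.
 * With 'restrict' it may keep the first load in a register. */
static int copy_and_double(int *restrict out, const int *restrict in)
{
    *out = *in;
    *out += *in;   /* '*in' may be reused from the first load */
    return *out;
}
```

There is no standard keyword that goes the other way and says "these two differently typed pointers may overlap".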
> No, that isn't guaranteed by POSIX - it has to have an sa_family_t member, but it doesn't have to be the first one.
>
> I also think it's problematic that the use explicitly contemplated by POSIX is considered ill-formed C.
As long as all the `sa_family_t` members in all of the various `sockaddr` types overlap then you could make it work (if they don't overlap I feel like that would create lots of other issues). Obviously though this is a pretty clumsy solution.
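Whether the family members actually overlap on a given system can be checked with `offsetof`. A minimal sketch, assuming the usual POSIX headers are available; the function name `family_members_overlap` is made up, and the result is a property of the platform, not something POSIX promises.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Returns 1 if the family member sits at the same offset in every
 * address type checked here, 0 otherwise. */
static int family_members_overlap(void)
{
    size_t base = offsetof(struct sockaddr, sa_family);

    return base == offsetof(struct sockaddr_in, sin_family)
        && base == offsetof(struct sockaddr_in6, sin6_family)
        && base == offsetof(struct sockaddr_un, sun_family)
        && base == offsetof(struct sockaddr_storage, ss_family);
}
```

The same comparisons could be written as `_Static_assert`s so that a build fails outright on a platform where the layouts differ.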
And I would agree - I wouldn't say it's anybody's fault in particular that we've hit this point (though you could argue that compiler writers jumped the gun on this one), but it is an issue worth addressing. I do think it's possible to use the interface correctly with a few different techniques, but 1. most programs already written don't do that, and 2. like I said before, you shouldn't have to jump through a million hoops (that aren't even mentioned) to use the interface correctly.