
0 points · beefhash · 6y ago · 0 comments

Wait, doesn't this mean that the BSD sockets API is inherently dependent on UB, casting different socket types to each other and sometimes only using the first few members, or am I misunderstanding you?


7 comments · 3 top-level

pascal_cuoq · 6y ago · 4 in thread

Yes and no.

The thing I am describing is when you link a compilation unit using:

  struct internal_state { int dummy; } state;

with another compilation unit that defined the same state differently:

  struct internal_state {
     int actual_meaningful_member_1;
     unsigned long actual_meaningful_member_2; } state;

As far as I know, BSD sockets do not do this. Zlib was doing this (https://github.com/pascal-cuoq/zlib-fork/blob/a52f0241f72433... ), but I have had the privilege of discussing this with Mark Adler, and I think the no-longer-necessary hack was removed from Zlib.

BSD sockets probably have a different kind of UB, related to the so-called “strict aliasing” rules, unless they have been carefully audited and revised since the carefree times in which they were written. I am going to have to let you read this article for details (example st1, page 5): https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.p...

loeg · 6y ago

BSD sockets are weird in that the first struct's (sockaddr) size wasn't big enough, so APIs all take a nominal pointer to sockaddr but may require larger storage (sockaddr_storage) depending on the actual address.

  /*
   * Structure used by kernel to store most
   * addresses.
   */
  struct sockaddr {
          unsigned char   sa_len;         /* total length */
          sa_family_t     sa_family;      /* address family */
          char            sa_data[14];    /* actually longer; address value */
  };


  /*
   * RFC 2553: protocol-independent placeholder for socket addresses
   */
  #define _SS_MAXSIZE     128U
  #define _SS_ALIGNSIZE   (sizeof(__int64_t))
  #define _SS_PAD1SIZE    (_SS_ALIGNSIZE - sizeof(unsigned char) - \
                              sizeof(sa_family_t))
  #define _SS_PAD2SIZE    (_SS_MAXSIZE - sizeof(unsigned char) - \
                              sizeof(sa_family_t) - _SS_PAD1SIZE - _SS_ALIGNSIZE)
  
  struct sockaddr_storage {
          unsigned char   ss_len;         /* address length */
          sa_family_t     ss_family;      /* address family */
          char            __ss_pad1[_SS_PAD1SIZE];
          __int64_t       __ss_align;     /* force desired struct alignment */
          char            __ss_pad2[_SS_PAD2SIZE];
  };
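
To make the size mismatch concrete, here is a minimal sketch in C, assuming a POSIX system that provides <sys/socket.h> and <netinet/in.h>. The helper names fits_in_sockaddr and fits_in_sockaddr_storage are hypothetical and exist only for this illustration; the exact sizes are platform-specific.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: does an address structure of n bytes fit into the
 * generic struct sockaddr that the socket calls nominally take? */
static int fits_in_sockaddr(size_t n)
{
    return n <= sizeof(struct sockaddr);
}

/* Hypothetical helper: does it fit into struct sockaddr_storage, the
 * protocol-independent placeholder quoted above? */
static int fits_in_sockaddr_storage(size_t n)
{
    return n <= sizeof(struct sockaddr_storage);
}
```

On common systems an IPv4 address (struct sockaddr_in) fits in struct sockaddr, while an IPv6 address (struct sockaddr_in6) needs the larger struct sockaddr_storage even though the API still takes a struct sockaddr pointer.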

wahern · 6y ago

struct sockaddr_storage is insufficient as well. A Unix domain socket path can be longer than `sizeof ((struct sockaddr_un){ 0}).sun_path`. That's a major reason why all the socket APIs take a separate socklen_t argument. Most people just assume that a domain socket path is limited to a relatively short string, but it's not (except possibly Minix, IIRC).
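
A rough sketch of what sizing the address by the path rather than by the struct could look like, assuming a POSIX system with <sys/un.h>. The helper name make_unix_addr is hypothetical, and whether a given kernel accepts a length larger than sizeof(struct sockaddr_un) is platform-specific and should be checked per system. On systems that have a sun_len member, this sketch leaves it zeroed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: build a heap-allocated AF_UNIX address for `path`,
 * sized to the path instead of to sizeof(struct sockaddr_un).
 * Stores the value to pass as the socklen_t argument in *len_out and
 * returns the address (caller frees), or NULL if allocation fails. */
static struct sockaddr_un *make_unix_addr(const char *path, socklen_t *len_out)
{
    size_t path_len = strlen(path) + 1;   /* include the terminating NUL */
    size_t used = offsetof(struct sockaddr_un, sun_path) + path_len;

    /* Never allocate less than the declared struct. */
    size_t total = used < sizeof(struct sockaddr_un)
                       ? sizeof(struct sockaddr_un)
                       : used;

    struct sockaddr_un *addr = calloc(1, total);
    if (addr == NULL)
        return NULL;

    addr->sun_family = AF_UNIX;
    /* Copy through a char pointer: the bytes may extend past the declared
     * sun_path array but stay inside the allocation. */
    memcpy((char *)addr + offsetof(struct sockaddr_un, sun_path),
           path, path_len);

    *len_out = (socklen_t)used;
    return addr;
}
```

The returned pointer would be cast to struct sockaddr * and passed together with the stored length to calls such as bind() or connect().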


haberman · 6y ago

I'm curious what exactly makes this undefined behavior.

And in particular, what about something like this?

    struct Foo {
    #ifdef __cplusplus
      int bar() const { return bar_; }
     private:
    #endif
      int bar_;
    };

Or, taking this a step further:

    struct _Foo;
    typedef struct _Foo Foo;

    // In C "struct _Foo" is never defined.
    int Foo_bar(const Foo* foo) { return *(const int*)foo; }
    void Foo_setbar(Foo* foo, int bar) { *(int*)foo = bar; }
    Foo* Foo_new() { return malloc(sizeof(int)); }

    #ifdef __cplusplus
    struct _Foo {
      void set_bar(int bar) { bar_ = bar; }
      int bar() const { return bar_; }
     private:
      int bar_;
    };
    #endif

The above isn't ideal but it does provide encapsulation in a way that doesn't seem to violate strict aliasing (the memory location is consistently read/written as "int").

pascal_cuoq · 6y ago

I think this is plenty ok. For one thing, if a struct has a member of type T, it's ok to access it through a pointer to T (and also the address of the struct is guaranteed to be identical to the address of the first member). For another, you are using dynamically allocated memory, so the only thing that matters is the type of the pointer when the access is finally made. It doesn't matter that it was a Foo* before, if what you dereference is an int*.

This is different from pretending that the address of a struct s { int a; double b; } is the address of a struct t { int a; long long c; } and accessing it through a pointer to that. If you do that, C compilers will (given the opportunity) assume that the write-through-a-pointer-to-struct-t does not modify any object of type “struct s”. This is what the example st1 in the article illustrates.

The latter is what I suspect plenty of socket implementations still do (because there are several types of sockets, represented by different struct types with a common prefix). It is possible to revise them carefully so that they do not break the rules, but I doubt this work has been done.
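
The contrast between the two cases can be sketched as follows. This is an illustration in the spirit of the description above, not a reproduction of example st1 from the article; the function names are hypothetical, and the first function is only well defined when its two arguments point to distinct objects.

```c
struct s { int a; double b; };
struct t { int a; long long c; };

/* Problematic shape: struct s and struct t are incompatible types, so a
 * compiler may assume that the write through q cannot modify *p and may
 * return 1 without reloading p->a. Only call this with distinct objects. */
static int write_through_structs(struct s *p, struct t *q)
{
    p->a = 1;
    q->a = 2;
    return p->a;
}

/* Fine shape: both accesses are made through int lvalues, so the compiler
 * has to allow for p and q pointing at the same int. */
static int write_through_ints(int *p, int *q)
{
    *p = 1;
    *q = 2;
    return *p;
}
```

Given a struct s x, the call write_through_ints(&x.a, (int *)&x) is well defined and returns 2, because the address of a struct converts to the address of its first member. Passing (struct t *)&x as the second argument of write_through_structs is the pattern that breaks the rules.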


loeg · 6y ago

Yeah, the BSD socket API is kind of terrible like that. You could consider it an unspecified union type, or use memcpy() exclusively to access it safely.
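
A minimal sketch of the memcpy() approach, assuming a POSIX system and the usual convention that the family field sits at the same offset in struct sockaddr and in the protocol-specific structs. The helper name inet_port_of is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port (host byte order) of an AF_INET
 * address passed as a generic pointer plus length, or -1 otherwise. */
static int inet_port_of(const struct sockaddr *addr, socklen_t len)
{
    assert(addr != NULL);

    sa_family_t family;
    if (len < offsetof(struct sockaddr, sa_family) + sizeof family)
        return -1;

    /* Copy the family field out byte-wise instead of reading
     * addr->sa_family through the struct sockaddr lvalue. */
    memcpy(&family,
           (const char *)addr + offsetof(struct sockaddr, sa_family),
           sizeof family);

    if (family != AF_INET || len < sizeof(struct sockaddr_in))
        return -1;

    /* Copy into a local object of the real type, then read its members. */
    struct sockaddr_in in4;
    memcpy(&in4, addr, sizeof in4);
    return ntohs(in4.sin_port);
}
```

The caller can keep the address in a struct sockaddr_storage and pass (const struct sockaddr *)&storage along with the length the kernel reported. Every read of the bytes then goes through memcpy() into an object of the correct type.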

emilfihlman · 6y ago

Yeah, it depends on a well-agreed convention, but one that is UB according to the standard.
