Modernizing C arrays for greater memory safety: a case study in the Linux kernel

(people.kernel.org)

284 points · diegocg · 3y ago · 118 comments

61 comments · 11 top-level

WalterBright · 3y ago · 14 in thread

> int flex[] __attribute__((__element_count__(items)));

While what the article describes is clever, it is needlessly complex, and filled with various compiler switches and extensions.

In contrast, here's a stupid simple approach:

https://www.digitalmars.com/articles/C-biggest-mistake.html

where bounds-checkable arrays are declared as:

    int a[..];

`a` consists of two fields, a `length` and a `pointer`. Indexing it means the compiler can (optionally) insert a bounds check.

    char s[..] = "string";
    s[10] = 'x'; // fatal runtime error

We can turn a pointer into a bounds checked array by "slicing" it:

    int *p = (int*) malloc(10 * sizeof(int));
    int a[..] = p[0 .. 10];

A bounds checked array can be turned into a pointer:

    int *p = &a[3];  // point to 3rd element of a[..]

That's all there is to it. No pages and pages of compiler switches and extensions.

Does it work? We've been doing that with D for over 20 years. Hell yeah, it works. It works fantastically well. It does not disturb any existing C code.
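
To make the two-field idea concrete in today's C, here is a minimal sketch. The names `int_slice`, `slice_make`, and `slice_at` are made up for illustration; this is not the `[..]` syntax and not how D implements it, and it returns NULL on a bad index where the proposal would raise a fatal runtime error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a length and a pointer travelling together. */
typedef struct {
    size_t length;
    int *ptr;
} int_slice;

/* "Slice" a raw pointer. The caller vouches that p[lo] .. p[hi - 1]
   are valid, exactly as with the p[0 .. 10] form above. */
static int_slice slice_make(int *p, size_t lo, size_t hi)
{
    assert(lo <= hi);
    int_slice s = { hi - lo, p + lo };
    return s;
}

/* Checked indexing: yields NULL instead of touching memory out of bounds. */
static int *slice_at(int_slice s, size_t i)
{
    return i < s.length ? &s.ptr[i] : NULL;
}
```

With `int buf[10]` and `int_slice a = slice_make(buf, 0, 10)`, `slice_at(a, 3)` points at `buf[3]`, while `slice_at(a, 10)` is NULL.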

ndesaulniers · 3y ago

That's nice, but attributes let us easily retrofit existing C code such as the Linux kernel in a way that supports multiple compilers and compiler versions. Just extensions, not compiler switches. And they don't muck with the ABI which is a requirement for stable kernel driver interfaces.

Also what you're proposing...would be an extension!

WalterBright · 3y ago

Reading the article, it doesn't look easy at all.

With the [..] proposal, it is easy enough to convert it back and forth between pointers and [..] to conform to required interfaces. One could even make the [..] implicitly convertible to a pointer.

downvotetruth · 3y ago

Starting with "That's nice" is extremely flippant and is an immediate turn-off to any subsequent statement. A stronger opening to discredit the [..] proposal would be to take the closing quote "We've been doing that with D for over 20 years" and point out that storing the length of an array of a certain type is a specialization to arrays of dependent typing, which has been around since Howard and de Bruijn extended lambda calculus to match predicate logic by creating types for dependent functions and pairs. Each axis of the lambda cube is another source of nondeterminism that has to be reasoned about, and one that languages which explicitly exclude such functionality for simplicity can point to as justification for those restrictions.

eklitzke · 3y ago

This isn't the same thing though. A flex array includes the data inline in the struct so allocating a struct with a flex array at the end requires just one call to malloc and avoids a pointer indirection when indexing into the array.
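
A minimal sketch of the single-allocation layout described here, assuming a made-up `struct int_vec` whose `count` field records how many elements follow the header. The helper name `int_vec_new` is hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header and payload share one allocation. */
struct int_vec {
    size_t count;
    int items[];   /* C99 flexible array member, stored inline */
};

/* One malloc covers the header plus `count` trailing ints.
   Returns NULL if the size would overflow or the allocation fails. */
static struct int_vec *int_vec_new(size_t count)
{
    if (count > (SIZE_MAX - sizeof(struct int_vec)) / sizeof(int))
        return NULL;
    struct int_vec *v = malloc(sizeof *v + count * sizeof v->items[0]);
    if (v != NULL)
        v->count = count;
    return v;
}
```

Indexing `v->items[i]` is an offset from `v` itself, so no second pointer has to be loaded, and a single `free(v)` releases both the header and the data.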

WalterBright · 3y ago

Bounds checked arrays do not have an extra indirection.

lelanthran · 3y ago

I like how this is going, but I'm missing a few things here:

> We can turn a pointer into a bounds checked array by "slicing" it:

> int *p = (int*) malloc(10 * sizeof(int));
> int a[..] = p[0 .. 10];

When `p` is a parameter in a function, the function cannot know that it can create a slice of up to 10 elements (I assume that the `p[0 .. 10]` creates an array indexed from 0 - 9).

What if the line was:

     int a[..] = p[0..12];

Do we still get undefined behaviour?

> A bounds checked array can be turned into a pointer:

> int *p = &a[3]; // point to 3rd element of a[..]

Assuming that a indexes from 0 to 9, what happens when we use p with an out of range index, for example:

     int *p = &a[8];
     blah = p[3];

My main concern is how to tell other functions that the array has a maximum size, and how to determine (inside a function) what the maximum length of its parameters is.

WalterBright · 3y ago

> When `p` is a parameter in a function, the function cannot know that it can create a slice of up to 10 elements (I assume that the `p[0 .. 10]` creates an array indexed from 0 - 9).

That's right, when a bounds checked array is converted to a pointer, the bounds do not go with it. Presumably, the function receiving the p has some way to determine the length (such as strlen, or via another parameter) from which the correct array can be reconstructed by doing a slice.

> What if the line was: int a[..] = p[0..12] Do we still get undefined behaviour?

Yes, if the 12 extends past the end of the data p points to.

> Assuming that a indexes from 0 to 9, what happens when we use p with an out of range index, for example: int *p = &a[8]; blah = p[3];

You get undefined behavior.

> My main concern is how to tell other functions that the array has a maximum size

The same way it's done now, by strlen, passing another argument with the length, or the function is able to get the length by other means. When a bounds checked array is converted to a pointer, the bounds are not part of the pointer.

crabbone · 3y ago

You aren't going to get a lot of love from C programmers by casting malloc() result...

uecker · 3y ago

We have bounds-checkable arrays already since C99:

    int (*p)[n] = malloc(sizeof *p);
    (*p)[i] = 1; // run-time bounds check

https://godbolt.org/z/vb8dqx1od

But yes, having a type that included the bound makes sense. But I do not think using array syntax for pointers as in your proposal makes any sense.

Dennis Ritchie got it right: https://www.bell-labs.com/usr/dmr/www/vararray.pdf

Ritchie DM. Variable-size arrays in C. The Journal of C Language Translation 1990;2:81-86.
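
For readers who have not used the pointer-to-array form, here is a small sketch of how the bound becomes part of the type. The function name `vla_bytes` is made up, and the code assumes a compiler that supports variably modified types (optional since C11). Any run-time index check would presumably come from a sanitizer such as `-fsanitize=bounds` rather than from the language itself; that is an assumption about the linked example, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n ints through a pointer-to-array type. The bound n is part
   of the pointed-to type, so sizeof *p is computed at run time. */
static size_t vla_bytes(size_t n)
{
    assert(n > 0);                 /* a variably modified type needs n > 0 */
    int (*p)[n] = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    (*p)[0] = 1;                   /* index 0 is always in bounds here */
    size_t bytes = sizeof *p;      /* equals n * sizeof(int) */
    free(p);
    return bytes;
}
```

Calling `vla_bytes(5)` yields `5 * sizeof(int)`, which shows that the length travels with the type of `*p` rather than in a separate variable the programmer must remember to pass to `sizeof`.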

WalterBright · 3y ago

> We have bounds-checkable arrays already since C99

    void foo(int n, int (*p)[n]) {
      (*p)[n] = 1;
    }

which has failed to catch on, because it still stores the pointer and the length as two separately handled objects.

> Dennis Ritchie got it right

"This paper proposes to extend C by allowing pointers to adjustable arrays and arranging that the pointers contain the array bounds necessary to do subscript calculations and compute sizes."

It appears to be phat pointers.

cassepipe · 3y ago

I had always assumed it was because of backwards compat that the proposal was never accepted, but since it turns out that is not the issue, do you have any idea why the proposal was never accepted?

pjmlp · 3y ago

Objective-C and C++ never had any issues having additional types for arrays and strings. C could do the same, but WG14 has clearly decided they don't want to do that.

planede · 3y ago

The problem is that this would have one specific ABI, which probably wouldn't match many existing structs with a flexible array member at the end. It could potentially be used for new code (while requiring every user to upgrade the standard they compile with), but it risks not being usable for modernizing old code.

WalterBright · 3y ago

You're right that bounds checked arrays do nothing at all for existing code. But they can be added incrementally to an existing code base, as a normal part of working on the code.

In that aspect it's like when prototypes were added to C. Nothing changed for existing code, but prototypes are so advantageous people would retrofit existing code incrementally when doing routine maintenance.

segfaultbuserr · 3y ago · 12 in thread

For code that is critical to performance, C99's "flexible array at the end of a struct" is a useful tool. It basically allows you to attach a header at the beginning of some dynamically allocated binary data of arbitrary length (yes, it can be implemented as a pointer at the end of the struct, but the extra latency of another pointer chasing can reduce performance). Before C99, the "size-1 hack" or "size-0 GCC extension" for this purpose was already widespread in both the Linux kernel and Windows [1], but with the disadvantage of triggering memory-safety tools, as the author pointed out.

Meanwhile, unlike C99, this construction is not allowed by any version of the C++ standard; any such use would be a non-standard extension, which I think is unfortunate. I only write C, so I wonder if any C++ guru out there can answer this question: does modern C++ have a better solution to implement the same thing?

[1] https://devblogs.microsoft.com/oldnewthing/20040826-00/?p=38...

jcalvinowens · 3y ago

> Does modern C++ have a better solution to implement the same thing?

I'm no guru, but I know from experience you can do it in C++:

  {0}[calvin ~] cat test.cpp
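  #include <cstdlib>
  #include <iostream>
  #include <memory>
  
  struct foo {
    int len;
    int v[];  // flexible array member: a compiler extension in C++
  };
  
  // malloc'd memory must be released with free, not delete
  struct free_deleter {
    void operator()(void *p) const { std::free(p); }
  };
  
  int main(void) {
    auto p = std::unique_ptr<foo, free_deleter>(static_cast<foo *>(
        std::malloc(sizeof(foo) + sizeof(int) * 2)));
  
    p->v[1] = 99;
    std::cerr << p->v[1] << std::endl;
  
    return 0;
  }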
  {0}[calvin ~] g++ -Wall -Wextra -std=c++17 test.cpp -o test
  {0}[calvin ~] ./test 
  99
  {0}[calvin ~] clang++ -Wall -Wextra -std=c++17 test.cpp -o test
  {0}[calvin ~] ./test 
  99

EDIT: Remove unnecessary extern block, as pointed out by wahern in the replies.

wahern · 3y ago

Flexible array members are commonly supported as an extension in C++ compilers, but the C++ standard itself does not permit them. And FWIW `extern "C"` doesn't drop a C++ compiler into a "C mode", it merely affects linkage (i.e. name mangling); all the code within the scope must still be valid, compiler-supported C++. Your example code compiles the same without the extern "C" declaration. And it compiles the same if main is also placed within the extern "C" scope; but not the headers, as C++ constructs like templates cannot have C linkage.

loeg · 3y ago

Yes, you can do this in C++, but keep in mind that implicitly generated copy/move constructors don't understand this and will not copy the full object. This can produce surprising memory corruption that can be difficult to debug. So you should be sure to either explicitly mark the struct not copy/movable, or implement smarter copy/move operators.

jenadine · 3y ago

That it compiles and works doesn't mean it is not undefined behaviour. You should at least try with the UB sanitizer.

But in this case, as others said, it may be accepted as an extension. Still not part of C++.

quelsolaar · 3y ago

You can do it without trailing arrays, by stacking the structures one after another:

    MyStructA *a;
    MyStructB *b;

    a = malloc((sizeof *a) + (sizeof *b));
    b = (MyStructB *)&a[1];

You need to make sure that the second struct doesn't have stricter alignment requirements than the one preceding it, but using this technique you can stack any number of structures or arrays of structures in one allocation.

(I would generally not recommend this coding style unless you have very specific requirements of memory usage)

loeg · 3y ago

This is pretty similar to:

  typedef struct {
    ...
    MyStructB b[];
  } MyStructA;

  MyStructA *a = malloc(sizeof(MyStructA) + sizeof(MyStructB));
  MyStructB *b = &a->b[0];

(Except, of course, that the syntax for locating 'b' is nicer this way, because you don't have to explicitly address the memory after 'a' and cast it to 'MyStructB'.)

planede · 3y ago

> does modern C++ have a better solution to implement the same thing?

No, the only standard way is to allocate a buffer large enough, and placement new the header and the bulk payload separately. The manual handling of alignment makes this very cumbersome.

Flexible array members would be nice, but I don't like that it is yet another overload on some array declaration syntax (other meanings of "T x[]": 1) declare an array of unknown size, 2) in a definition, deduce the size). For some reason C likes to overload the array declaration syntax with widely different meanings (looking at you, VLAs).

jancsika · 3y ago

> yes, it can be implemented as a pointer at the end of the struct, but the extra latency of another pointer chasing can reduce performance

For the latency to be significant, there must be a lot of repeated calls which only read/write a very small number of elements in the array. Otherwise the accumulation of read/write operations performed when iterating over the data would dwarf the single pointer dereference.

So I'm curious now-- do OS kernels spend most of their time doing lots of calls that dive into such dynamically-allocated data just to extract a single datum or two?

neerajsi · 3y ago

I think they were slightly off the mark. The bigger deal is avoiding an extra allocation for an entity that has a variable sized part.

pjmlp · 3y ago

Because C++ has offered better alternatives for bounds-checked data structures for as long as it has existed.

hn_go_brrrrr · 3y ago

No, C++ has nothing better.

cozzyd · 3y ago

here's a worse way that's a lot more work and maybe has some advantages:

https://gist.github.com/cozzyd/efda739301bb7eb3a4a63a145c93e...

ashvardanian · 3y ago · 8 in thread

Call me crazy, but zero-length arrays are a great abstraction when you are working with implicit data structures. Not safe, but elegant and performant. Many codebases could be 2x faster if their designers embraced that concept.

segfaultbuserr · 3y ago

> Not safe, but elegant and performant.

I'd say it's also not more dangerous (or equally dangerous, depending on your camp) than a pointer from malloc().

zabzonk · 3y ago

both c and c++ have the concept of zero-length arrays, if you malloc them - int * a = malloc(0); is ok

monocasa · 3y ago

I think the parent is talking about the c pattern of having the last member of a struct be a zero length array, which is actually a dynamically sized array that the struct is only the header to (ostensibly with another field of the struct specifying the length of the array). It's fallen a bit out of favor, but it is a handy way to commingle the header and array with one allocation/pointer.

And interestingly COBOL handled this in a cleaner way. I forget some of the specifics but there was a way to specify to the compiler that one field of a record specified the length of the following array, allowing the same pattern in a type safe way.

ChrisSD · 3y ago

Kind of:

> If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object

So it may actually allocate (although the allocation is unusable).
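
One common way to sidestep the two permitted behaviours is to never request zero bytes. The wrapper below is a sketch under that assumption; the name `alloc_bytes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Never asks malloc for zero bytes, so a NULL result always means
   allocation failure rather than "empty request". */
static void *alloc_bytes(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

A caller can then treat `alloc_bytes(0) == NULL` as an out-of-memory condition on every platform, and the pointer it gets back can be passed to `free` as usual.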

tmtvl · 3y ago

Hang on, let me think this through...

If malloc(0) gets called as first malloc in the program the system break does not need to be moved, as there is always 0 bytes space available... but malloc does like to move sysbreak by a large amount at a time to reduce the need for repeated calls...

I'm guessing malloc(0) does not move sysbreak and simply returns a pointer to the bottom of the heap?

russdill · 3y ago

Then you have two allocations instead of one and related data is now likely to be farther away, possibly even in a different page.

yyyk · 3y ago

The return value of malloc(0) is implementation-defined under POSIX and can be NULL (IIRC it was on NetBSD).

kevin_thibedeau · 3y ago

malloc() doesn't allocate arrays. It allocates blocks of memory. Hence sizeof doesn't work the same for malloc() objects as it does on arrays.

chungy · 3y ago · 6 in thread

> C is not just a fancy assembler any more

I wish this trope would die. It really never was one.

bpye · 3y ago

Optimising C compilers maybe not, but you absolutely can naïvely translate C into assembler - at least for stack based machines. I do think ‘fancy assembler’ is a fitting description in that case.

nequo · 3y ago

In what sense was C never a fancy assembler?

I am not an expert on C nor assembly and would be curious if you could expand on this. The statement makes sense to me because my impression is that most of what happens in C code gets translated fairly straightforwardly to machine code, with the compiler taking care of bridging differences in the instruction sets of targeted architectures. I guess the reason this is simplistic is the inlining and loop unrolling done by an optimizing compiler. Is this what you mean?

jcranmer · 3y ago

The basic problem with this assumption is that people who follow it tend to get it into their heads that, since C is merely "translat[ing] fairly straightforwardly to machine code", they can rely on the semantics of the machine code being the semantics of their C program. That isn't true, and hasn't been for a long time [1]: compilers are only required to uphold the looser semantics of C, and they will happily apply optimizations that deviate from the semantics of a purported naive translation to machine code. The usual example brought out to explore this difference is signed integer overflow, which has nothing to do with inlining or loop unrolling.

[1] I don't know enough about the early history of C to be able to assert that it was never true, but it certainly hasn't been true since at least 1989.
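
To illustrate the signed-overflow point: a check written as `a + b < a` relies on the machine's wraparound, but in C the overflowing addition is undefined, so a compiler is allowed to assume it never happens and drop the test. The sketch below shows one well-defined alternative; the function name `add_overflows` is made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Reports whether a + b would overflow int, without ever performing
   the overflowing addition. Only comparisons and in-range subtractions
   are used, so every step has defined behaviour. */
static bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow when b <= 0 */
}
```

For example, `add_overflows(INT_MAX, 1)` is true and `add_overflows(1, 2)` is false, regardless of optimization level.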

astrange · 3y ago

C is a language with a specification which defines it in terms of a virtual machine, not translation to machine code. The memory model is also totally different and it has lots of undefined behavior.

jokoon · 3y ago

Care to explain?

Not saying C is a great standard, but that idea means that using C instead of assembly doesn't generate a big overhead, and it's still much easier to write C than assembly, especially when compiler support is very common.

I'm currently making a language that translates directly to C.

pjmlp · 3y ago

I have a couple of books from the days when C was relatively new (around the mid-1980s) that claim otherwise.

tmtvl · 3y ago · 6 in thread

TL;DR is the introduction of C99 VLAs, not Pascal-style arrays, though a potential attribute could be added so we could do

  int some_int;
  int some_array[] __attribute__((__element_count__(some_int)));

to store the size of some_array in some_int.

kevin_thibedeau · 3y ago

This has nothing to do with VLAs. The article covers FAMs at the end of structs.

tmtvl · 3y ago

I must have misunderstood, thank you for the clarification. I thought that having no size specifier in the array declaration turned it into a VLA, which could have repercussions when embedding structs, e.g.:

  struct foo
  {
    int a;
    /* ... */
    int b[];
  };

  struct bar
  {
    struct foo foo;
    /* ... */
    int c;
  };

I would expect to have issues when trying to access the other members of struct bar like c.

xenadu02 · 3y ago

You could take this idea further by adding an "array_ref" type to the C standard where it is a length and elements. In practice this would be broken into separate parameters when passed to a function so existing functions could be fixed post-hoc using attributes - similar to printf formatting attributes. Then passing an array_ref to a function that expects a pointer/length would let the compiler automatically translate. Then you could define the rules for how an array_ref decays to a pointer and how to make one out of a pointer and length.

In other words the C standard could make bounds-checking of arrays possible with a good interop story if the standards committee believed it was worth doing. Compilers would have a flag to enable or disable the runtime checks based on safety/perf tradeoffs. Libraries would slowly add the relevant annotations. Eventually most code would have the option of having all array accesses bounds checked.

loeg · 3y ago

Sibling already pointed out this article is not talking about VLAs, but I do like your flexible-array attribute extension proposal.

pabs3 · 3y ago

The proposal is almost exactly the same as the one in the article (near the end).

yyyk · 3y ago

Since clang will never support VLAIS, your proposition is a non-starter.

hgs3 · 3y ago · 2 in thread

Good article. If you're compiling C with MSVC then you can use SAL annotations [1] which serve the same purpose.

[1] https://learn.microsoft.com/en-us/cpp/code-quality/annotatin...

funny_falcon · 3y ago

Wow, such a great annotation language. Wish it were in GCC as well.

leeter · 3y ago

I keep hoping that the C and C++ committees will get together and standardize some of that in the form of C23/C++11 style attributes. But that is sadly likely a naïve hope.

zabzonk · 3y ago · 2 in thread

> Is it actually a 4 element array, or is it sized by the bytes member?

i give up, what does sizeof say? and why would it be sized by bytes?

ntrz · 3y ago

The previous paragraph says

> ...due to yet more historical situations (e.g. struct sockaddr, which has a fixed-size trailing array that is not supposed to actually be treated as fixed-size), GCC and Clang actually treat all trailing arrays as flexible arrays.

But I don't know, that doesn't seem to match the result I am getting with clang 13.1.6. It does seem to respect the array size declared in the struct, not treat it as a flexible array. I get -Warray-bounds warnings if I try to access anything past o->variable[3]. Maybe I'm misunderstanding what they're saying or my example is screwed up.

Edit: Actually, I guess it does end up treating it like a flexible array -- it produces -Warray-bounds warnings when compiling, but the resulting binary works (and doesn't trigger asan). Not sure I entirely understand it though.

wahern · 3y ago

It treats them as flexible arrays in the sense that it doesn't assume indexing beyond the declared size is undefined behavior, which would have implications for code elision and other optimizations.

ufo · 3y ago

Does anyone know the status of their refactoring effort to update all the flexible array declarations in the kernel? How far along are they?

cryptonector · 3y ago

IMO the right approach is to start with counted array struct wrapper types like `struct array_of_xyz { unsigned count; xyz a[1]; };` and use them to hold and pass by reference. When the array sizes are fixed, then use `struct array5_of_xyz { xyz a[5]; };` and pass by reference or by value as needed. Add to this a decoration to indicate that the `count` field is a count of the number of elements in the array and now the compiler can do bounds checking.

Then fix codebases recursively until it's all ok. At ABI boundaries that don't use such types create values of such types corresponding to the given arguments (e.g., you could count the elements of `argv[]` then create a wrapper for the `argv`).
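
As a rough sketch of the `argv` case: the wrapper below counts the entries once at the boundary and carries the count alongside the pointer from then on. The names `str_array`, `wrap_argv`, and `str_array_get` are hypothetical, and this version stores a pointer rather than an inline `a[1]` array because the `argv` storage already exists elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counted wrapper for an argv-style vector. */
struct str_array {
    size_t count;
    char **items;
};

/* Count entries up to the terminating NULL and record the result. */
static struct str_array wrap_argv(char **argv)
{
    struct str_array a = { 0, argv };
    while (argv[a.count] != NULL)
        a.count++;
    return a;
}

/* Checked access: NULL for any index at or beyond count. */
static char *str_array_get(struct str_array a, size_t i)
{
    return i < a.count ? a.items[i] : NULL;
}
```

Code past the boundary then takes a `struct str_array` instead of a bare `char **`, so every access has a count available to check against.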

manv1 · 3y ago

Everyone says a memory-safe C would be slower, but has anyone actually tested that recently?

It seems that a memory safe C would be faster, in that you wouldn't have to learn yet another language and runtime to deploy your stuff.

tmsln · 3y ago

> A simpler approach is the addition of struct member attributes, and is under discussion and early development by both the GCC and Clang developer communities.

Does anyone know where I can follow these discussions?
