For instance, say we write a function that rotates an array of N bytes: it moves the low M bytes to the top of the array, and shuffles the remaining N - M bytes down to the bottom. This function will work fine with the zero-byte memmove or memcpy operations in the special case when N == 0, because the pointer will be valid.
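A minimal sketch of such a rotate, to make the zero-size case concrete. The name rotate_bytes and the caller-supplied scratch area are assumptions for illustration, not something from the comment above. With n == 0 and m == 0 every call copies zero bytes, and that is fine only because buf and scratch are valid pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rotate buf[0..n): the low m bytes move to the top, and the remaining
 * n - m bytes shuffle down to the bottom.
 * scratch is a hypothetical caller-supplied area of at least m bytes;
 * it has to be a valid pointer even when m == 0. */
void rotate_bytes(unsigned char *buf, size_t n, size_t m,
                  unsigned char *scratch)
{
    assert(m <= n);
    memcpy(scratch, buf, m);            /* save the low m bytes         */
    memmove(buf, buf + m, n - m);       /* slide the remainder down     */
    memcpy(buf + (n - m), scratch, m);  /* saved bytes go to the top    */
}
```

Nothing here special-cases zero: the three calls simply degenerate to zero-length copies.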
Now say we have something like this:
struct buf {
    char *ptr;
    size_t size;
};
We would like it so that when the size is zero, we don't have an allocated buffer there. But we'd like to support a zero-sized memcpy in that case: memcpy(buf->ptr, whatever, 0), or likewise in the other direction.
We now have to check for buf->ptr being null in the code that deals with resizing.
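One way to keep the zero-size case away from memcpy, assuming an empty buffer is represented by a null ptr. The helper name buf_store is hypothetical; this is a sketch of the guard, not a full buffer API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct buf {
    char *ptr;    /* assumed null when size == 0 (nothing allocated) */
    size_t size;
};

/* Hypothetical helper: copy n bytes from src to the start of b.
 * The memcpy call is skipped when n == 0, so a null b->ptr (or a
 * null src) is never handed to memcpy. */
void buf_store(struct buf *b, const void *src, size_t n)
{
    assert(n <= b->size);
    if (n != 0)
        memcpy(b->ptr, src, n);
}
```

The cost is one branch per call; the benefit is that the empty buffer needs no dummy allocation.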
Here is a snag in the C language related to zero sized arrays. The call malloc(0) is allowed to return a null pointer, or a non-null pointer that can be passed to free.
Oops! In the one case, the pointer may not be used with a zero-sized memcpy; in the other case it can.
This also goes for realloc(NULL, 0) which is equivalent to malloc(0).
And, OMG I just noticed ...
In C99, this was valid: realloc(ptr, 0), where ptr is a valid, allocated pointer. You could realloc an object to zero.
I'm looking at the April 2023 draft (N3096). It states that realloc(ptr, 0) is undefined behavior.
When did that happen?
[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
When those implementations eventually pick up C23, they surely could fix the bug as well. At best this should have been an erratum/defect for the previous standard, so that the previous standards document the behavior of implementations of said standards.
The requirements in C99 and before are perfectly clear. realloc is described as liberating the old pointer, and then allocating a new one as if by malloc. (Except that it magically has access to both objects so it can transfer the necessary bytes that must be transferred from the old to the new.)
It is perfectly clear what happens when size is zero. No byte can be copied from the old object, if any. The behavior is like free(oldptr) followed by return malloc(newsize).
Your IQ would have to be well below 85 to misunderstand the requirements.
And those requirements are still there; there is still the description of realloc in terms of freeing the old pointer and allocating a new object with malloc.
There was no need to insert a gratuitous removal of definedness for the size zero case, given that malloc handles it.
Applications now have to do this:
void *sane_realloc(void *ptr, size_t size)
{
    if (size == 0) {
        // behave literally as required in C99
        free(ptr);
        return malloc(0);
    }
    return realloc(ptr, size);
}
Supposedly because a few vendors were not able to code this logic in their realloc functions?
C17 says "If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated" (emphasis added).
What standard, exactly, is BSD violating?
It's very strange. I wrote my own memory allocator and I can't figure out the right way to handle this. Eliminating the need for these "technically" valid pointers that can't actually be accessed because they're zero sized seems like the better solution.
> When did that happen?
More importantly, why did that happen? People have told me that I should care about the C standards committee because they take backwards compatibility very seriously. Then they come out with breaking changes like these.
Mainly, that it has supported that before and programs rely on it.
Programs written to the C99 standard can resize a dynamic vector down to empty with a resize(ptr, 0). The pointer coming from that will be the same as if malloc(0) had been called.
So now, that has been taken away; those programs can now make demons fly out of your nose.
Thank you, ISO C!
> Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...
Implementations of malloc(0) that don't return null are required to return a unique object. To do that, all they have to do is pretend that the size is some nonzero value like 1 byte. (The application must not assume that there is any byte there that can be accessed).
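From the application side, the same trick can be sketched as a wrapper that never asks malloc for zero bytes. The names malloc_nonzero and demo_distinct are hypothetical. The point is only that every successful call yields a distinct, non-null pointer that can be passed to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: treat a zero-size request as a 1-byte request.
 * The caller still must not read or write through the result when it
 * asked for zero bytes. Returns NULL only if the allocation fails. */
void *malloc_nonzero(size_t size)
{
    return malloc(size != 0 ? size : 1);
}

/* Usage: two zero-size requests give distinct pointers when both succeed. */
void demo_distinct(void)
{
    void *a = malloc_nonzero(0);
    void *b = malloc_nonzero(0);
    if (a != NULL && b != NULL)
        assert(a != b);
    free(a);
    free(b);
}
```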
C99 has no resize() function. Assuming you mean realloc(), C99 does not guarantee you can use realloc() in this manner.
See also:
https://news.ycombinator.com/item?id=38850575
https://stackoverflow.com/questions/16759849/using-realloc-x...
https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...
https://developers.redhat.com/articles/2023/07/26/checking-u...
Another option is to treat them as being of size 1.
(In theory you could do endless allocations of size 0, and eventually you'd run out of space, even though you've allocated 0 bytes in total. But you end up in exactly that situation, whatever the allocation size, if you don't take bookkeeping overhead into account!)
[1]: https://github.com/rust-lang/unsafe-code-guidelines/issues/4...
If you ensure that the 'zero page' (so to speak) is empty you can also exploit this property for optimizations, and in some cases the emscripten toolchain will do so.
i.e. if you have
struct MyArray<T> {
    uint length;
    T items[0];
}
you can elide null pointer checks and just do a single direct bounds check before dereferencing an element, because for a nullptr, (&ptr->length) == nullptr, and if you reserve the zero page and keep it empty, (nullptr)->length == 0.
This complicates the idea of 'passing nothing' because now it is realistically possible for your code to get passed nullptr on purpose and it might be expected to behave correctly when that happens, instead of asserting or panicking like it would on other (sensible) targets.
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
But this is fine, since pointer comparisons (as in, less/greater comparisons) are actually both pretty restricted and required to have reasonable semantics when comparing pointers that point into the same object/array: When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
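As a small illustration of the quoted rules, the usual loop idiom forms a one-past-the-end pointer and only ever compares it against pointers into the same array. The function name sum_ints is an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first n ints of an array, using a one-past-the-end pointer
 * as the loop bound. a must point into an array with at least n
 * elements; a null a is rejected because null + 0 is not covered by
 * the rules quoted above. */
long sum_ints(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;   /* may be formed and compared, never dereferenced */
    long total = 0;
    for (const int *p = a; p < end; p++)
        total += *p;
    return total;
}
```

With n == 0 the bound equals a, the comparison p < end is false at once, and nothing is dereferenced.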
By the way, this means that, among other things, if you use number N to represent a null pointer then number N-1 can not ever be a valid pointer to anything: adding 1 to a valid pointer is always allowed, and this addition should produce a non-null pointer — because the resulting pointer is required to be well-behaved in comparisons, and comparisons with null pointer are UB.
E.g., I am pretty sure Go relies on some of the behavior described here: that the 0 page is unmapped, and that accesses will trap. This is why Go code will sometimes SIGSEGV despite being an almost memory-safe language: Go is explicitly depending on that trap (and it permits Go, in those cases, to elide a validity check). (Vs. some memory accesses will incur a bounds check & panic, if Go cannot determine that they will definitely land in the first page; Go there must emit the validity check, and failing it is a panic, instead of a SIGSEGV.)
IIRC, Linux doesn't permit at least unprivileged processes to map address 0, I believe. (Although I can't find a source right now for that.)
¹Yes, in most languages this is UB … but what I'm saying is that having it trap makes errors — usually security errors — obvious & fail, instead of really letting the UB just do whatever and really going off into "it's really undefined now" territory.
> But suppose we want an empty (length zero) slice.
So is there an actual rationale for this? I've written the memory allocator and am in the process of developing the foreign interface. I've been wondering if I should explicitly support zero length allocations. Even asked this a few times here on HN but never got an answer. It seems to be a thing people sort of want but for unknown reasons.
I definitely see the benefits of well-defined arithmetic on null pointers. As a data type though it seems to me that any pointer could be a zero sized allocation.
There are a bunch of languages where empty arrays are "falsy", and in those it's not recommendable to use the two to differentiate valid states. Feels like the same could apply here
The C++ type discussed is much newer than Rust (std::span was standardized in C++ 20).
Yes in many cases what C++ APIs mean here isn't a slice of zero Ts at all but instead None, and Rust has an appropriate type for that Option<&[T]> which works as expected, and so in many cases where people have built an API which they think is &[T] and are trying to make it with the unsafe functions mentioned it's actually Option<&[T]> they needed anyway, they don't even have a type correct design.
Arrays know their size, so the "I'll interpret it as zero Ts" makes even less sense for an array where we know up front the size as it is part of the type.
Pass something with a 0 length, pointing to NULL. Enjoy your blue screens and kernel panics.
The solution is clear: just ignore the C spec. It’s total garbage. Of course you can memcpy between any ptr values if the count is zero and those ptr values don’t have to point to anything.
Making it UB to pass null to memcpy means that after that call, the pointer is assumed to be non-null. So if(ptr) can constant fold. Maybe faster.
I'm in agreement with you on this but your compiler probably isn't.
No need for that.
> the pointer is assumed to be non-null
Just give us an option to tell the compiler to stop assuming nonsense like that. I'm gonna make it standard on my makefiles just like -fno-strict-aliasing and -fwrapv.
There's no use trying to work around C standard problems. Compilers should just be told to define the undefined and to disable everything that can't be defined. Then we can write code on solid foundations instead of quicksand.
> Or your own memcpy with a different name.
I wish. I couldn't escape that function even on my freestanding nolibc project. The compilers will happily emit calls to memcpy and memset all by themselves whenever they feel like it and god help you if you don't provide them because for some reason this nonsense can't be disabled.
I don't trust the clang -fno-strict-aliasing -fno-pointer-whatever strategy. There's too many ways for that to go wrong. Code needs to be correct/safe by default and opt into optimisations to have a chance of working, otherwise it is really easy to fail to check that flag.
There are a few fairly simple C compilers out there. LCC, the one that derives from the obfuscated project, one in gnu mes associated with guix. There's a grammar from the compcert people.
I haven't convinced myself writing a working C compiler is a weekend project but it's surely less than a year, seriously considering it on paranoia grounds. Idea being use it as a reference - when I suspect clang to be breaking things, run against the dumb one that doesn't really do optimisations as a comparison.
My impression is that Zig doesn't have a documented memory model that cares about things like whether an address corresponds to an allocation or not, so problems relating to this sort of thing cannot come up yet :)
https://github.com/ziglang/zig/commit/32e0dfd4f0dab351a024e7...
From the title, I assumed that this article was going to be about either (a) permissive grading standards at university or (b) chronic constipation.