But you could get most of the benefit by just isoheaping (strictly allocate different types in different heaps).
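A minimal sketch of the isoheap idea, assuming small fixed-capacity pools. The names (`DEFINE_POOL`, `node_alloc`, `buffer_alloc`) are hypothetical and this is not how bmalloc does it. The point is only that each type gets its own storage, so a freed slot can never be handed out again as a different type.

```c
#include <assert.h>
#include <stddef.h>

/* Two unrelated types that must never share storage. */
struct Node   { struct Node *next; int value; };
struct Buffer { char bytes[32]; };

/* DEFINE_POOL is a hypothetical helper: one fixed-size heap per type. */
#define DEFINE_POOL(T, NAME, CAP)                                   \
    static struct T NAME##_slots[CAP];                              \
    static unsigned char NAME##_used[CAP];                          \
    static struct T *NAME##_alloc(void) {                           \
        for (size_t i = 0; i < (CAP); i++) {                        \
            if (!NAME##_used[i]) {                                  \
                NAME##_used[i] = 1;                                 \
                return &NAME##_slots[i];                            \
            }                                                       \
        }                                                           \
        return NULL; /* this type's heap is exhausted */            \
    }                                                               \
    static void NAME##_free(struct T *p) {                          \
        if (p == NULL) return;                                      \
        size_t i = (size_t)(p - NAME##_slots);                      \
        assert(i < (CAP) && NAME##_used[i]);                        \
        NAME##_used[i] = 0;                                         \
    }

DEFINE_POOL(Node, node, 4)     /* node_alloc / node_free     */
DEFINE_POOL(Buffer, buffer, 4) /* buffer_alloc / buffer_free */
```

A dangling `struct Node *` can then only ever alias another `struct Node`, which limits what a use-after-free can be turned into, although it does not prevent the use-after-free itself.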
Because this is C, the programmers can do whatever they want, but first they must do some negotiation with the static analysis.
In Rust you're told that if it's impossible under ownership then you should find a different way to express it rather than trying to circumvent ownership. I guess it's different in C.
Here’s an allocator optimized for that use case.
https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/li...
The experience is similar to changing a header file to use a const argument where previously the argument was non-const. This change will propagate everywhere.
I also think a similar experience is converting JavaScript to TypeScript. The type system will complain before it stabilizes.
As in, running nothing but memory-safety-ified C code down to the syscall boundary?
Can you elaborate on this? I've been reading up on memory allocation algorithms and most of them seem to favor segregation of blocks by element size instead. Are there additional benefits to coming up with a complex typing scheme for a custom memory allocation interface?
CVEs are gross. How do you prove your code is free from use-after-free and so on?
> If this can be reasonably retrofitted to existing libraries and projects
That's the problem.
If you want to fool around in this space, consider revisiting C++ to Rust conversion. There's something called Corrode, which compiles C to a weird subset of Rust full of objects that implement C raw pointers. The output is verbose and unmaintainable. What's needed is something that can figure out how big things are and who owns what, possibly guessing, and generate appropriate idiomatic Rust. Now that LLMs are sort of working, that might be possible.
Can you ask GitHub Copilot to look at C code and answer the question "What is the length of the array 'buf' passed to this function"? That tells you how to express the array in a language where arrays have enforced lengths, which includes both C++ and Rust. With hints like that, idiomatic translation becomes possible. Bad guesses will result in programs that subscript out of range, which is caught at run time. But guesses should be correct most of the time, because C programmers tend to use the same idioms for arrays with lengths. Forms such as "int read(int fd, char* buf, size_t buf_l)" show up often.
Using LLMs to help with tightening up existing code might work.
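The pointer-plus-length idiom above can be made explicit even in C. A minimal sketch, with hypothetical names (`ByteSpan`, `span_from`, `span_get`, `span_set`), of the kind of code a translator could emit once it has guessed that `buf_l` is the length of `buf`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the buffer and its length travel together. */
typedef struct {
    char  *data;
    size_t len;
} ByteSpan;

/* Wrap the (buf, buf_l) pair that C functions usually pass separately. */
static ByteSpan span_from(char *buf, size_t buf_l) {
    ByteSpan s = { buf, buf_l };
    return s;
}

/* Checked read: returns false instead of subscripting out of range. */
static bool span_get(ByteSpan s, size_t i, char *out) {
    if (i >= s.len) return false;
    *out = s.data[i];
    return true;
}

/* Checked write with the same contract. */
static bool span_set(ByteSpan s, size_t i, char value) {
    if (i >= s.len) return false;
    s.data[i] = value;
    return true;
}
```

If the guessed length is wrong, the failure shows up as a rejected access at run time rather than as silent memory corruption, which is the trade-off described above.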
Optional memory safety is fine when you can opt an entire project into a "strict" mode and this becomes trivially verifiable by others. I imagine that's the goal here.
Optional security is a problem only when you need to remember a million different rules and gotchas, because you will inevitably miss a spot. But if it's a global toggle, it's pretty good. "Use '-fmemsafe' for C/C++" is as tractable as "don't use 'unsafe' in Rust".
Yeah, as you note, library compatibility is an issue. But it's an even bigger issue when bootstrapping a new, safe language: you gotta implement the libraries from scratch, and you never really get to full parity with C/C++. Getting it done for the top 10 most-used libraries would make a spectacular difference in itself.
I should note that I'm not a huge believer in "saving" C/C++ as the memory-safe language of the future - I think there are lingering cultural problems around the standards that we had no luck overcoming for decades - but I also don't think the duo is going away any time soon, so might as well expend some effort on making it a safer tool.
This is the way you tell C the size of an array.
void f(int n, int a[n]) {
}
To the compiler, that is the same declaration as
void f(int n, int a[]) {
}
Why? So that you can write
void f(int n, int m, int a[n][m]) {
}
which declares a 2-dimensional array parameter. In that case, the "m" is used to compute the position in the array for a 2D array. The "n" doesn't do anything.
This is equivalent to writing
void f(int n, int m, int a[][m]) {
}
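A small sketch of the two equivalent declarations in use. The function names are made up for illustration, and it assumes a compiler that supports variably modified parameter types (C99, or C11 and later without `__STDC_NO_VLA__`).

```c
/* The inner dimension m is part of the parameter's type, so a[i][j]
   is addressed as base + i*m + j. Here n is only used as a loop bound. */
static long sum2d(int n, int m, int a[n][m]) {
    long total = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            total += a[i][j];
    return total;
}

/* Same parameter type: the outer dimension can be left out entirely. */
static long sum2d_alt(int n, int m, int a[][m]) {
    return sum2d(n, m, a);
}
```

Called as `sum2d(2, 3, grid)` with `int grid[2][3]`, both versions walk the same six elements. Nothing in the language checks that `grid` really has `n` rows.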
This is C's minimal multidimensional array support, known by few and used by fewer. Over a decade ago, I proposed that sizes in parameters should be checkable and readable. I worked out how to make it work.[1] But I didn't have time for the politics of C standards.
[1] http://animats.com/papers/languages/safearraysforc43.pdf
The major difference is when the array is multi-dimensional. If you don't have VLAs then you can only set the inner dimensions at compile time, or alternatively use pointer-based work-arounds.
Even in the case of one-dimensional arrays, a compiler or a static analyzer can take advantage of the VLA size information to insert run-time checks in debug mode, or to perform compile-time checks.
Parameters like `const double b[static restrict 10]` say that b points to at least 10 elements and doesn't alias other parameters.
Syntactically this is pretty weird.
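For readers who have not seen that syntax, a minimal sketch (the function `scale10` is invented for illustration). These are promises made by the caller. The standard does not require the compiler to verify them, although some compilers warn in obvious cases.

```c
#include <stddef.h>

/* `static 10` inside the brackets: the caller promises at least 10 elements.
   `restrict` inside the brackets: the pointers are promised not to overlap.
   `const double`: the elements of a are read-only through a. */
static void scale10(double out[static restrict 10],
                    const double a[static restrict 10],
                    double factor) {
    for (size_t i = 0; i < 10; i++)
        out[i] = a[i] * factor;
}
```

Both parameters are still ordinary pointers after adjustment, so `sizeof out` inside the function is the size of a pointer, not of ten doubles.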
Owner pointers take on the responsibility of owning the pointed object and its associated memory, treating them as distinct entities. A common practice is to implement a delete function to release both resources, as illustrated in Listing 7:
Listing 7 - Implementing the delete function
#include <ownership.h>
#include <stdlib.h>
struct X {
char *owner text;
};
void x_delete(struct X *owner p) {
if (p) {
/*releasing the object*/
free(p->text);
/*releasing the memory*/
free(p);
}
}
int main() {
struct X * owner pX = calloc(1, sizeof * pX);
if (pX) {
/*...*/;
x_delete( pX);
}
}
struct X {
text: Option<Box<str>>,
}
fn main() {
let pX = Box::new(X { text: None });
// automatically dropped (freed) at end of scope
}
For instance, this code is correct.
#include <ownership.h>
#include <stdlib.h>
struct X {
char * owner text;
};
void x_delete(struct X * owner p)
{
if (p)
{
free(p->text);
free(p);
}
}
int main() {
struct X * owner p = malloc(sizeof(struct X));
p->text = malloc(10);
free(p->text); //object text destroyed
struct X x2 = {0};
*p = x2; //x2 MOVED TO *p
x_delete(p);
//no need to destroy x2
}
Even languages like Rust make memory safety optional. One can drop into an unsafe block and perform all sorts of abominable things. Such escape hatches are necessary to color outside of the lines when one must do system software development or optimize software beyond what the compiler can do on its own. At some point, the developer must be trusted to learn the tool or to use discretion when considering something like unsafe. In both cases, a development team can peer review these choices.
What makes me interested in tools like Cake and similar tools is that these bring us closer to being able to use proof assistants to build up reasoning about the times when we must color outside of the lines. Whether C, C++, or Rust, being able to import code into a proof assistant or extract efficient code from a proof assistant can further assist us when our use cases exceed what is possible with the safety features in our language or tooling.
> Bad guesses will result in programs that subscript out of range, which is caught at run time.
That is sadly not always the case.
A good mempool implementation is all you need (i.e. keeps track of every request, and zeros out the memory on release)
A lot of code in that article doesn't use mempools, and furthermore, just because a double free exists doesn't mean that it's always exploitable. And if it's exploitable, it doesn't mean that you can gain a shell or even exfil data, sometimes it means you can just crash the program.
Fundamentally, if you write a wrapper around memory management that keeps track of allocated resources, much in the same way that Rust includes some runtime code during compilation for memory safety, you gain the same functionality.
Can you substantiate that? There are commonly employed tracking allocators, such as ASAN, which can catch certain kinds of UB, and UBSAN, which catches others, and with special interpreters you can catch even more. But even basic ASAN is more exhaustive than what you are suggesting, and it provably can't provide the same guarantees that safe and sound Rust gives you (https://stackoverflow.com/a/48902567):
> And that is not accounting for the fact that sanitizers are incompatible with each others. That is, even if you were willing to accept the combined slow-down (15x-45x?) and memory overhead (15x-30x?), you would still NOT manage for a C++ program to be as safe as a Rust one.
Also, I think you misunderstand the way Rust works, it does compile-time ownership checking, which allows it to avoid run-time checking, so this part "same way how rust includes some runtime code during compilation for memory safety" is factually wrong.
The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release chunk twice cause it keeps tracks of allocated chunks.
Use after free is mitigated in the same way. When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.
No true Scotsman.
> The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted. And the memory pool will never release chunk twice cause it keeps tracks of allocated chunks.
Then you've just moved the same problem one layer up - "use after returned to mempool" takes the place of "use after free" and causes the same kind of problems.
> When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.
And the program - or, more likely, library code that it called - still has a copy of that pointer that it made when it was valid?
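To make the objection concrete, here is a minimal sketch of such a pool. All names (`pool_alloc`, `pool_release`, `Handle`) are hypothetical and this is one reading of the scheme described above, not anyone's actual implementation. The pool never calls free, tracks its slots, ignores double release, and zeroes the handle, yet a copy of the pointer made while it was valid still aliases the next allocation.

```c
#include <stddef.h>

#define POOL_SLOTS 2

/* Hypothetical pool: fixed storage, never freed, tracks what is in use. */
static int  pool_data[POOL_SLOTS];
static char pool_used[POOL_SLOTS];

/* The "struct that contains a pointer to the data". */
typedef struct { int *ptr; } Handle;

static Handle pool_alloc(void) {
    Handle h = { NULL };
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            h.ptr = &pool_data[i];
            break;
        }
    }
    return h;
}

/* Release zeroes the slot and the handle, and ignores a double release. */
static void pool_release(Handle *h) {
    if (h->ptr == NULL) return;
    size_t i = (size_t)(h->ptr - pool_data);
    pool_used[i] = 0;
    pool_data[i] = 0;
    h->ptr = NULL;
}

/* Returns 1 if a pointer copied before release aliases the next allocation. */
static int stale_copy_aliases_next_alloc(void) {
    Handle a = pool_alloc();
    int *copy = a.ptr;        /* a copy kept elsewhere, e.g. by library code */
    pool_release(&a);         /* a.ptr is now NULL, copy is unchanged */
    Handle b = pool_alloc();  /* first free slot: the one a just gave back */
    int aliases = (copy == b.ptr);
    pool_release(&b);
    return aliases;
}
```

Nothing here is undefined behavior, since the storage is a static array, which is exactly why no tool flags it: the stale copy simply reads and writes someone else's object.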
So you're describing fork() and _exit(). That's my favorite memory manager. For example, chibicc never calls free() and instead just forks a process for each item of work in the compile pipeline. It makes the codebase infinitely simpler. Rui literally solved memory leaks! No idea what you're talking about.
Maybe it was bad luck on my part, and other embedded frameworks are better; but I got into both ESP32 and STM32, both frameworks are the worst spaghetti code I have ever seen. You need to jump through at least one, often two layers of indirection to understand what a particular function call will do. Here's an example of what I mean:
// peripheral_conf.h
#define USE_FOOBAR_PERIPHERAL 1
// obj_t.h
#define USE_OBJ_PARAM2
// In the library header
#ifdef USE_FOOBAR_PERIPHERAL
#define DoSomethingCallback FoobarCallback
#endif
// foobar.h
status_t FoobarCallback(int32_t data, int32_t param);
// obj_t.c
status_t Init(Obj_t* obj) {
obj->param1 = obj->init.initparam & 0xFF;
#ifdef USE_OBJ_PARAM2
obj->param2 = (obj->init.initparam >> 16) & 0xFF;
#endif
obj->callback = DoSomethingCallback;
return OK;
}
status_t DoSomething(Obj_t *obj, int32_t data) {
#ifdef USE_OBJ_PARAM2
return obj->callback(data, obj->param2);
#else
return obj->callback(data, obj->param1);
#endif
}
// main.c
Obj_t obj = {0};
obj.init.initparam = 0x12345678;
Init(&obj);
DoSomething(&obj, 0x42);
And that's an easy example. Macros everywhere, you need to grok what's happening in four different files to understand what the hell a single function call will actually do. Sure, the code is super efficient, because once it's compiled all the extraneous information is pre-processed away if you don't use such and such peripheral or configuration option. But all this could be replaced by an abstract class, perhaps some templates... And if you disable stuff you may not need (RTTI, exceptions) then you'd get just as efficient compiled code. It would be much easier to understand what's going on, and you wouldn't be able to call DoSomething on uninitialized data... Because you'd have to call the constructor first to even have access to the method.
Anyway, thank god for debuggers, step-by-step execution, and IDEs.
Isn't it just that your personal in-head GPT has been trained on C++ and wants to see it everywhere? It's not so easy to make very small embedded implementations and there's a reason after 25+ years C++ has not made inroads there.
No, C++ is not even my programming language of predilection. Not sure why you would make assumptions about my background while knowing nothing about me. But I can recognize OOP patterns when I see them. There's even a book about that (https://www.cs.rit.edu/~ats/books/ooc.pdf). C-styled OOP is not a new concept. C++ just does it better.
The reason C++ has "not made inroads" may just be inertia, you know. And look at Arduino - if C++ code can run on an 8bit ATmega MCU, it can run anywhere. The whole language is designed around "pay for what you use and nothing else".
An abstract class is a struct with function pointers in it. Mark the fields const and the instance const and it'll be devirtualised and optimised away. If you miss overloading, `static inline __attribute__((overloadable))` wrappers in a header will bring it back.
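A minimal sketch of that pattern, with invented names (`PeripheralOps`, `foobar_transform`, `peripheral_do`). Whether the indirect call is actually devirtualized depends on the compiler and optimization level, so treat that part as a possibility rather than a guarantee.

```c
#include <stdint.h>

/* The "abstract class": a table of operations. */
typedef struct {
    int32_t (*transform)(int32_t data, int32_t param);
} PeripheralOps;

/* One concrete implementation (hypothetical). */
static int32_t foobar_transform(int32_t data, int32_t param) {
    return data + param;
}

/* A const table: the target of every call through it is known statically. */
static const PeripheralOps foobar_ops = { foobar_transform };

/* An object binds its state to an immutable operations table. */
typedef struct {
    const PeripheralOps *const ops;
    int32_t param;
} Peripheral;

static int32_t peripheral_do(const Peripheral *p, int32_t data) {
    return p->ops->transform(data, p->param);
}
```

Because `ops` is a const member, a `Peripheral` has to be given its table when it is initialized, e.g. `const Peripheral p = { &foobar_ops, 2 };`, and the table cannot be swapped afterwards. A zero-initialized instance would still have a null table, so this is weaker than a constructor.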
Code generators can be better for debugging than built in templates. At the source level they look the same, but if it's behaving weirdly, you can look at the generated C instead of the templated layer.
C code can look rather like modern C++. If you're up for feeding it to a custom preprocessor to implement templates, or especially if you've gone as hardcore as the compiler front end under discussion here, C++ starts to look a lot like a syntactic obfuscation over C.
[it's not quite syntax over C, the languages play divergent games with semantics as well, but picking a different set of syntax abstractions over C to the C++ one is an interesting way to go]
There would be many more steps required "toward" memory safety, such as eliminating all forms of UB including uninitialized memory, out of bounds pointers, data races, etc. but if this direction is to be pursued it has to start somewhere.
FILE* open_file(const char* p) {
FILE* owner f = …
return f;
}
Now open_file callers would need to know that ownership is being returned which means that local variables would need to have the owner annotation propagated. That’s what I mean when I say it’s not composable - the ownership has to propagate fully throughout the codebase for a specific resource. Of course maybe you know better as this is just an initial glimpse on my part.
Just for that #embed directive I would already use cake for the moment (although it seems like it is only doing the file->array conversion)
If that's correct, then this is somewhat practically limited: either pre-existing codebases will need to be retrofitted with an essentially bespoke set of macros, or the compiler will need to be "fail open" by default. The tradeoffs between these two are hard (substantial developer pain versus being ineffective against the bulk of a compiled program's API surface).
(Also, this design appears to be for temporal safety only, not spatial safety. But again I might have missed something.)
In particular: there are a lot of ~universally used libraries with ludicrously complex C APIs that undergo significant own/borrow semantic changes between releases. Things like OpenSSL. These libraries will need to carry these annotations upstream for correctness and up-datedness reasons.
The temporary solution is to re-declare malloc etc. when compiling with cake, and not to complain when function signatures differ only by owner qualifiers.
If this ownership were standard, then the gcc and msvc headers would have the qualifiers, enabled or not, but they would be there.
https://reviews.llvm.org/D64448
https://github.com/microsoft/GSL/blob/main/docs/headers.md#g...
The difference between cake ownership and RAII is that with C++ RAII the destructor is unconditionally called at end of scope, so flow analysis is not required.
Cake requires flow analysis because "destructor" is not unconditionally called.
When the compiler can see that the owner is not owning an object (because the pointer is null, for instance) then the "destructor" is not necessary.
To understand the difference.
With flow analysis (how it works today)
int main()
{
FILE *owner f = fopen("file.txt", "r");
if (f)
fclose(f);
}
Without flow analysis (or with a very simple one, where the destroy must be the last statement)
void fclose2(FILE * owner p) {
if (p) fclose(p);
}
int main()
{
FILE *owner f = fopen("file.txt", "r");
if (f){
}
fclose2(f);
}
struct X x = {0};
//...
view struct X x2 = x;
destroy(&x);
//x2 does not need destructor
That it can translate C23 to C89 means it has most of the work in place to translate C23 to C23, or C99 to C99 etc. If that is done in a (mostly) reversible fashion - successfully re-encode back to the original, where you `preprocess -> parse -> unparse -> re-preprocess` which is a nuisance but possible, then it opens the door to much more aggressive type systems.
In particular, the input can be C with the ownership annotations, and if they're valid, the output can be C with those annotations dropped to be fed into some other compiler. Or whatever other invariant systems the compiler dev is interested in.
Or the input could be C extended with namespace {} syntax, C++ style lambdas, contract checking - whatever you wish really, and the output can be the extensions desugared into C. Templates (possibly the D style ones) can be implemented as instantiating normal functions from said template.
That the output is C means this is usable in all the pipelines that already work with C.
Good stuff, thanks for posting.
Equally one which replaces 'auto' with the name of the type (and similar desugaring games) is still a C to C compiler, just running as a C23 to C99 or whatever. Resolve the branch in _Generic before emitting code as part of downgrading C11.
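As an illustration of that kind of downgrade, a minimal sketch with invented names. The `ABS` macro is C11 source, and the `lowered_*` functions show what a C11-to-C99 pass could plausibly emit for two call sites. This is not Cake's actual output.

```c
/* C11 source: the branch is chosen from the static type of x. */
static int    abs_int(int x)       { return x < 0 ? -x : x; }
static double abs_double(double x) { return x < 0 ? -x : x; }
#define ABS(x) _Generic((x), int: abs_int, double: abs_double)(x)

/* Hypothetical lowered form of ABS(-3) and ABS(-2.5): the selection has
   already been resolved, so no _Generic remains in the emitted code. */
static int    lowered_int(void)    { return abs_int(-3); }
static double lowered_double(void) { return abs_double(-2.5); }
```

The selection depends only on types, never on run-time values, which is why it can be resolved completely before any code is emitted.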
The lifetime annotations are an interesting one because they're a different language which, if it typechecks, can be losslessly converted into C (by dropping the annotations on the way out).
I'm not sure where in that design space the current implementation lies. In particular folding preprocessed code back into code that has the #defines and #includes in is a massive pain and only really valuable if you want to lean into the round trip capability.
From what I understand, this appears to be a separate binary from GCC/Clang that does static analysis and outputs C99.
Can this be a GCC plugin? I know we can write plugins that are activated when a specific macro is provided, and the GCC plugin event list allows intercepting the AST at every function declaration/definition. Unless you're rewriting the AST substantially, I feel this could be a compiler plugin. I'd like to know a bit more about what kinds of AST transformations/checks are run as part of Cake.
Inside Visual Studio, for instance, we can add it under external tools:
C:\Program Files (x86)\cake\cake.exe $(ItemPath) -msvc-output -no-output -analyze -nullchecks
The main annotations are qualifiers (similar to const). C23 attributes were considered instead of qualifiers, but qualifiers have better integration with the type system. In any case, the qualifiers are spelled through macros, so they can be defined as empty when necessary.
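A sketch of the empty-macro approach. The feature-test macro `CHECKER_HAS_OWNERSHIP`, the `OWNER` macro and the `_Owner` spelling are all placeholders chosen for this example, not a statement of what Cake's headers define.

```c
#include <stdlib.h>

/* Assumption: the checking compiler defines CHECKER_HAS_OWNERSHIP and
   accepts `_Owner` as a qualifier. Both names are placeholders. */
#ifdef CHECKER_HAS_OWNERSHIP
#define OWNER _Owner
#else
#define OWNER /* expands to nothing for ordinary compilers */
#endif

struct Item { int value; };

/* Under the checker the result must be held by an owner; for any other
   compiler this is a plain `struct Item *`. */
static struct Item *OWNER item_new(int value) {
    struct Item *OWNER p = calloc(1, sizeof *p);
    if (p) p->value = value;
    return p;
}

/* Takes ownership of p and releases it. */
static void item_delete(struct Item *OWNER p) {
    free(p);
}
```

The same source then builds unchanged with GCC, Clang or MSVC, and only the checking tool sees the annotations.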
The qualifiers and the rules can be applied to any compiler. Something harder to specify (but not impossible) is the flow analysis.
A sample rule for compilers:
int * owner a; int * b; a = b;
We cannot assign a view to an owner object. This kind of rule does not require flow analysis.
int * owner p = calloc(1, sizeof(int));
defer free(p);
However, with ownership checks, the code is already safe. This may also change the programmer's style, as generally, C code avoids returns in the middle of the code. In this scenario, defer makes the code more declarative and saves some lines of code. It can be particularly useful when the compiler supports defer but not ownership.
One difference between defer and ownership checks, in terms of safety, is that the compiler will not prompt you to create the defer. But, with ownership checks, the compiler will require an owner object to hold the result of malloc, for instance. It cannot be ignored.
The same happens with C++ RAII. If you forget to free something in your destructor, or forget to create the destructor, the compiler will not complain.
In cake ownership this cannot be ignored.
struct X {
FILE * owner file;
};
int main(){
struct X x = {};
//....
} //error x.file not freed
https://floooh.github.io/2018/06/17/handles-vs-pointers.html
and most issues can be caught by using a static analyser or a memory leak checker (getting ppl to consistently use them is another issue, but still)
On the other hand, static analysis will catch the error at first compilation, even in code that is almost never executed.