Rust also statically prevents race conditions in multi-threaded code, via the Send and Sync traits. This blog post describes annotations which make Xr0 a safer dialect of C, but its still far less safe than rust.
This looks like a cool project, but I've gotta agree with other commenters. The blog post title is a false promise. If you think the project is good (and it looks good!), then why lie about its capabilities?
The fundamental problem with fixing C/C++ is that you have to take stuff out of the language to make it safer. That's not going to happen.
The trouble with going to Rust is that the language is so different from C/C++ that conversion is tough, both for code and for programmers. It's not the syntax. It's the semantics of affine types. Strict single ownership requires a major rethink of program organization. The result is, arguably, better programs. But it's a wrenching change. Interlinked data types have to be completely redesigned.
(There are people trying to fix that problem in Rust. There's "Arc::new_cyclic", which lets you have weak backlinks. Now you can find your parent in the hierarchy, which means you can do object-like things if you want to. There's also UniqueRc in development.[1] There's the theoretical possibility of proving at compile time that no .upgrade() operation on such objects will result in a failure due to two strong pointers existing at the same time. If that could be done, the reference counts could be eliminated and Rust would have safe back-references. Nobody has figured out how to do that yet. So for now, the mismatch between C/C++ and Rust is pretty bad.)
Yes, Xr0 is young. It took Rust nine years to reach v1.0.0. Xr0 may need slightly over the nine months its had.
I am curious to know how you would handle tagged unions, say, something like this:
struct object {
int type; /* OBJ_INT, OBJ_STR, etc. */
union {
int i;
char *s; /* heap allocated */
/* etc */
} val;
};
Does Xr0 prevent type confusion, and can it model the data-dependent behaviour of any functions that operate on struct object?How would you model things like reference counting? In CPython, for example, one of the most pernicious programming problems is invalid reference counting because the C API functions are rather inconsistent in their handling of reference counts. One has to carefully peruse the documentation to understand if the function will steal a reference (from a parameter), borrow a reference (for the return value), or if it makes a new reference. Such setups are perfect candidates for annotation.
let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1);
is in fact safe code would suggest they indeed don't have much concern for thread safety.But yes, thread safety is down the road for us, when we get to C11.
We're beginning with what is central in C and then moving outward. C, as designed by Ritchie, and for the first 40 years of its existence had no official support for multithreading, so this can hardly be called central.
If we add support for multithreading it'll be after finishing with C89 and C99.
I feel this frequently with C. I would love to formalize pointer use in interfaces. But Xr0's proposed syntax is clunky and not backward-compatible.
void foo(int* x) ~ [ free(x); ];
Yikes... Why not // <xr0.h>
#ifndef __XR0__
#define freed
#endif
// my code:
void foo(int* freed x);
Boom. Looks clean, easily readable, backward compatible with standard compilers.As per the issue of backward-compatibility, Xr0 strips the annotations with a simple command: `0v -s`, producing C source that works with C compilers.
For complicated annotations, instead of
void foo(int* x) ~ [ free(x); ];
They could do: #include <xr0.h>
void foo(int* x) XR0( free(x); );
Then this will compile in a standard C compiler (assuming the XR0 macro is defined properly in xr0.h)Yes. The ability to mutate an object necessarily implies the ability to mutate the state of an object to break guarantees. While there are definitely patterns where multiple mutable references to the same object can be safe, it is not inherently safe, and proving safety is far more challenging because you can't as easily rely on pre/postconditions for lemmas.
The particular case that makes the utility of a one-mutable-xor-many-read-only model clear is something I trip over surprisingly frequently in C/C++:
// Getting a value gets a pointer to somewhere in the hashtable...
Value &V = hash_table[key];
// This may cause the hash table to grow, thereby invalidating said pointer...
hash_table.insert(key2, value2);
// ... and now I used it!
V = meow;
Sure, in a three-line function, it's kind of obvious (if you know to be aware of this possibility!), but in a function where the pointer and the use are separated by hundreds of lines of code, it's easy to miss that there is a requirement that the hash table not be resized between the two calls. With Rust, the fact that &mut for V means that I can't touch the hash table until I'm finished with V makes it a compiler error.Yeah, there are times when Rust's rule here pisses me off and feels too finicky, but it's also saved my butt several times. And quite frankly, a lot of the finicky can be worked around by building some solid container libraries to handle the cases where you can guarantee the safety with higher-level abstractions (say append-only data structures).
More to the crux of the article however, I'm not generally impressed with automated theorem proving as a solution to safety. Already from the meat of the article, it seems that this solution is setting you up for using a complex language to specify the interface, which immediately begs the question of how you can be sure the interface is correctly specified.
This is insightful. That is exactly what is happening, but our sense is that this language will be no more complicated than C itself, and will to a large extent parallel it.
The question is really what is more effective, working under one set of constraints imposed universally (like the ownership semantics in Rust), or taking upon yourself the responsibility of defining such constraints within each program. In this way what we're doing is an application of the "trust the programmer" design philosophy; trust him, not only to know what he's doing, but to be able to express it (in the interface specification).
> Sure, in a three-line function, it's kind of obvious (if you know to be aware of this possibility!), but in a function where the pointer and the use are separated by hundreds of lines of code, it's easy to miss that there is a requirement that the hash table not be resized between the two calls. With Rust, the fact that &mut for V means that I can't touch the hash table until I'm finished with V makes it a compiler error.
It can seem as though the example we provide is contrived (because of course it is), but it is not mere contrivance. The point we stress the most is not that the bug is obvious if there are only three lines, but that the presence or absence of a bug is impossible to see even with a single function call whose interface is informal. So what is remarkable is not that we can write three lines of code that have or do not have an obvious error, but that by the mere fact of the interfaces being informal, a three-lined function can be impossible to analyse by itself.
> The ability to mutate an object necessarily implies the ability to mutate the state of an object to break guarantees. While there are definitely patterns where multiple mutable references to the same object can be safe, it is not inherently safe, and proving safety is far more challenging because you can't as easily rely on pre/postconditions for lemmas.
This doesn't answer the question. "it is not inherently safe" does not mean "it is inherently unsafe", it only means "it may be unsafe". So our point is that Rust precludes many kinds of programming that can be safe in relevant contexts, and this is what is so limiting.
The point of interfaces is usually to hide underlying complexity. What's the point of interfaces that are just as complicated as their implementation?
> The question is really what is more effective, working under one set of constraints imposed universally (like the ownership semantics in Rust), or taking upon yourself the responsibility of defining such constraints within each program.
One of the things I personally like about Rust is that its rules are very simple and still capture 95% of all situations I find myself in. Adding the complexity of having to write your own formal specifications for the remaining 5% doesn't seem like it's worth the effort.
> So our point is that Rust precludes many kinds of programming that can be safe in relevant contexts, and this is what is so limiting.
Xr0 will also either 1) reject programs that are safe or 2) accept programs that are unsafe. An (computable) algorithm that decides for every C program whether it is safe or not is mathematically impossible. Start implementing loops and you'll quickly see why the idea of provable correctness without any restrictions is doomed to fail.
Not that similar projects don't exist: CBMC, Frama-C, ... How does Xr0 differ from those existing projects?
int *p = malloc(1);
bar(p);
free(p);
Wouldn't the safe way to write that code be to just allocate the value on the stack so it has automatic lifetime managed for you?Excepting that, why are you annotating functions, it's the pointer itself that has the use case invariant. Shouldn't the type be annotated to declare these semantics instead?
> Wouldn't the safe way to write that code be to just allocate the value on the stack so it has automatic lifetime managed for you?
To be fair, there are a lot of reasons why you might not do that. One reason is that stack space is limited, so if you are allocating a large object, it would be unwise to necessarily use stack space in some cases, as you might overflow the stack, and it's hard to know if your code is overflow-prone since different platforms have different default max stack sizes. Another reason is if you don't know how much memory you might need at compile-time and thus can't use static allocations. Finally, and probably one of the main reasons, sometimes the lifetime of a memory allocation just doesn't map to a stack frame; it may malloc/free within one stack frame, or it may malloc and then get freed in a parent stack frame indeterminately. Basically, any reason you might use a smart pointer in C++ is also a reason you'd reach for dynamic allocations in C.
Sure, if you malloc(1) specifically, you know it's only one byte, and a static allocation. But clearly, that's just a contrived example, so focusing too hard on it is a little silly.
> Excepting that, why are you annotating functions, it's the pointer itself that has the use case invariant. Shouldn't the type be annotated to declare these semantics instead?
Their entire hypothesis is the idea that the boundaries between interfaces is where the unsafety occurs. The pointer itself has the invariant, but locally proving invariants like "pointer can never be freed twice" or "pointer is either freed or returned" can be done trivially using control flow analysis; the problem comes when you start calling other functions. If you add annotations and make it a compiler error to have invalid annotations, you can make it so that these invariants can extend across function boundaries without needing to recursively prove the control flow over the entire program, which would usually be infeasible due to the amount of possible code paths.
I'm not suggesting this is necessarily the best way to go FWIW, but this is also pretty similar to how, say, TypeScript adds incremental safety to JavaScript; null safety is accomplished by having not just the types but also the interface boundaries contain assertions that can be checked on both sides to avoid needing deep, recursive control flow analysis.
Instead, I think something like "Makes C safer than the C code you were using previously" is the better approach.
I haven't read every detail of this long post, but I do want to highlight something I strongly agree with:
> Interface informality is the root cause of unsafety
When writing C and C++, I've always felt an uneasy when passing a pointer to a function, because I don't really have a way of knowing if this is safe or not.
The comparison is not meant antagonistically, but almost as tongue-in-cheek.
In making the comparison we're acknowledging that Rust is the touchstone for safety in today's (programming) world. Our praise for Rust in the article shows this.
Most C programmers use C because they are forced to use it, by technical limitations like lack of compiler access or human limitations like lack of funding, curiosity or motivation.
C unfortunately became the API specification language of the most popular operating systems in the world. Both Unix and Windows world defines their low level system API using C types that strongly ties everybody and every program to C language, C standard library and the preferred compiler of the OS. Even Rust cannot escape it now. However Rust may give us hope / motivation and may force people to formally define the interfaces outside C so we can truly get rid of it one day.
Or other ecosystem effects; if I want to work on existing kernel code for pretty much any FOSS unix-family OS, C isn't possible to avoid, and AIUI even for all-new code it's still really spotty (like, you can write Linux drivers in Rust now, but only certain kinds of drivers).
I think the problem with your example here is that it's a bit simplistic. In your example, if we drop the concept of ownership, sure it's not unsafe they can both point to the same allocation. But multiple mutable references can absolutely be a big issue. For example:
pub enum Thing {
AA(i32),
BB(String),
}
pub fn foo() {
let mut thing = Thing::BB(String::from("Hello"));
let Thing::BB(str_ref) = &mut thing else {unreachable!()};
bar(&mut thing);
println!("{str_ref}");
}
fn bar(_: &mut Thing) { ... };
Without knowing the body of `bar` it's not possible to determine whether `foo` is sound. `bar` could change the enum to the `AA` variant, at which point `str_ref` is now pointing at an integer and some padding bytes, not a String. We've violated type safety, and read uninitialized memory both in the enum itself and in the now-dropped heap allocation.This is literally the point we're making in the article. Note that what protects you from violating type safety is not the ownership rules, but the ability to propagate these rules by annotating `bar`. So we're proposing a direct focus on this propagation.
How anyone can think pointer arithmetic is elegant is beyond me. Especially when the title mentions a language with algebraic data types and higher order functions. Or even just RAII.
I'm also not sure C is the lingua franca of programming anymore. From my personal experience (and language is strongly tied to geography and culture) people tend to talk about strings and booleans, JSON and other custom types, stacks and heaps. ints and longs have fixed sizes and representations in their minds. (In fact, many C concepts are used erroneously from a strictly-C-language point of view.)
This is not to insult C, however, which is elegant in its simplicity (and perhaps The Mother of Modern Programming Languages, if you fancy a majestic title), nor the project, which has a noble goal.
> How anyone can think pointer arithmetic is elegant is beyond me. Especially when the title mentions a language with algebraic data types and higher order functions. Or even just RAII.
I have nothing against algebraic data types. They're great. With respect to elegance, I will refer to nothing more than this (from K&R):
/* strcmp: return <0 if s<t, 0 if s==t, >0 if s>t */
int strcmp(char *s, char *t)
{
for ( ; *s == *t; s++, t++)
if (*s == '\0')
return 0;
return *s - *t;
}
I challenge you to provide a more elegant function with the same specification. void with_bytes(size_t count, void (*cb)(void *bytes);
You also may need to write similar wrappers around dependencies that require manual freeing of resources.Programming in this style will force you to think about the life cycle of every allocation and there are ways you can still shoot yourself in the foot. But as long as you avoid storing references in static variables, you will generally avoid memory leaks and use after free when programming in this style.
And you can just do this with any c compiler without relying on new dependencies.
This seems to try to mimic Rust safety rails as impractical annotations. Even if these annotations can express non-trivial cases (which I doubt), they will constrain you for reasons Rust lifetimes can be constraining, you'll have to do tons of extra work, with crappy bare-bone programming language, inside an ecosystem where no one gives a crap and that can't even agree on a common sane tooling.
And then I have no idea from where does it follow that the at thend it will be "Safer than Rust".
How is that better idea than just using a most loved programming language along with a maturing large ecosystem that actually has a culture of building safe and reliable software, using approaches that are not bolted on but built-in?
Nobody is claiming this is the only possible way to make safe programs, as far as I know formal proofs a la seL4, when done correctly, can also prove the absence of essentially any kind of defect. The problem with multiple mutable references is that if you have them, you need some way to validate that across all possible mutable references to a given piece of memory, there are no unsafe concurrent accesses. Rust's claim to fame is "fearless concurrency", and as far as I know borrow checking with one mutable reference exclusive with all other references is the simplest approach to providing concrete data race safety in the face of still having mutable shared state across code. Nothing on this page even mentions data races, and I'm not sure how you would accomplish safety against data races from this design alone.
I would prefer something like this:
void* foo() /* return malloc(SIZE) */ {
void *p = (void*) = malloc(SIZE);
bzero(p, SIZE);
return p;
}
It remains compatible with any compiler of choise.*