https://em-tg.github.io/csborrow/
These kinda-sorta fall under borrow checking or regions, just without any annotations. Then again, Ada/Spark's strategy also technically falls under Tofte-Talpin regions:
https://www.cs.cornell.edu/people/fluet/research/substruct-r...
Also, since it’s an error, I guess it must be different from the -Wreturn-local-addr warnings in gcc and clang (and likely other C compilers) (https://www.emmtrix.com/wiki/Clang:Flag/-Wreturn-local-addr), which can have false positives.
What’s the difference? Is the language stricter, disallowing some constructs that are valid but hard to prove, or is the error not always triggered for buggy code?
That constraint is just a Tofte-Talpin region restriction, e.g. lexically scoped regions.
https://github.com/prasannavl/WinApi
The performance of this interop layer is so close to native it's difficult to argue for doing things the much more painful way.
With an acute accent, that should be roughly /ˌkɜːrˈseɪd/ “curse-ay-d”. (Think “café” or “sashayed”.)
The stylised pronunciation being evoked is roughly /ˈkɜːrˌsɛd/, “curse-ed”, and would be written with a grave accent: “cursèd”.
It is really good as food for thought though.
It reminds me of the early days of the web, when text was king and content was king. I particularly like the sidenotes in the margins approach.
(Hope the author sees this comment :) Hats off)
There's some great tooling for that via https://edwardtufte.github.io/tufte-css/ and https://tufte-latex.github.io/tufte-latex/.
> Wait a minute, this list goes to 17, yet the intro only mentions 14! I actually did that because a couple might overlap and a couple of them are half-approaches, and that last one is just here for fun. Besides, as I learn more approaches and add them to the list, the title will get more and more out of date anyway.
1. laissez-faire / manual memory management (c, c++, etc)
In this approach, the programmer decides everything.
2. dictatorship / garbage collection (java, go, etc)
In this approach, the runtime decides everything.
3. deterministic / lifetime memory management (rust, c with arenas, etc)
In this approach, the problem determines everything.
https://www.absint.com/astree/index.htm
For those who think static analyzers cannot do that, notice the word “sound”. This is a different type of static analyzer than the more common ones that do not catch everything.
Sadly, there is no open source option that works across a broad range of software. NASA’s IKOS is open source for example, but it does not support multithreading and some other things that I do not recall offhand, which makes it unable to catch all memory safety bugs in software using the features that it does not support. For now, people who want to use sound static analyzers need to use closed source tools or restrict themselves to a subset of what C/C++ can do so they can use IKOS:
You can think of it as the rather classic "Vec of struct + numeric IDs" that is used a lot e.g. in Rust to represent complex graph-like structures.
This combined with bounds checking is absolutely memory safe. It has a bunch of correctness issues that can arise due to index confusion, but those are not safety issues. When combined with some kind of generational counters, those correctness issues also go away, but they are only caught at runtime, not at compile time (and they incur a runtime cost).
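A minimal sketch of the "Vec of struct + numeric IDs" pattern with generational counters, assuming a single-threaded arena. The names `Arena`, `Handle`, and `Slot` are hypothetical and not taken from any particular crate:

```rust
/// A handle is an index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Hypothetical arena: values live in a Vec and are addressed by `Handle`.
pub struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Stores a value, reusing a freed slot if one is available.
    pub fn insert(&mut self, value: T) -> Handle {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    /// Removes a value and bumps the generation so old handles go stale.
    pub fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation || slot.value.is_none() {
            return None;
        }
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(h.index);
        slot.value.take()
    }

    /// Bounds-checked and generation-checked lookup.
    pub fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation == h.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses the slot with a new generation
    assert_eq!(arena.get(a), None); // stale handle is caught at runtime
    assert_eq!(arena.get(b), Some(&"second"));
}
```

Without the generation field, `get(a)` would silently return `"second"`: memory safe, but exactly the index confusion described above.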
Rust's memory safety is about avoiding liveness issues (that become type confusions since all memory allocators will reuse memory for different types), nothing more, nothing less.
These are not memory safety issues. But they definitely can lead to security issues with some sort of confused deputy attack.
For example a capability based system that relied on just this form of memory safety would be pointless.
Of course this can be mitigated by adding version counters to objects and object selectors.
If a language is merely type-safe, then it might be OK to silently replace a value with a different one of the same type, sure, fine. Who cares if the program transmits the wrong message to the wrong recipient as long as it's definitely some message and some recipient?
But a value-safe language, I suggest, is one that doesn't pull this kind of switcheroo.
I am interested in Vale and it feels very promising, though because of my interest in bootstrapping I don't like that it is written in Scala. I know, that is shallow, but that's a thing that limits my enthusiasm.
If you are like me and don't like jumping around between notes and text and you prefer to read the notes anyway, here is a little snippet you can run in Web Inspector's Console:
document.querySelectorAll(".slice-contents a[data-noteid]").forEach(e => {document.querySelectorAll('.slice-notes [data-noteid="' + e.attributes["data-noteid"].nodeValue + '"] p').forEach(p => {p.style.fontSize = 'smaller'; e.parentNode.insertBefore(p, e)}); e.remove() })
It will replace note links with notes themselves making them smaller, because they will not always fit smoothly.
Borrow Checking, RC, GC, and the Eleven (!) Other Memory Safety Approaches - https://news.ycombinator.com/item?id=41974185 - Oct 2024 (1 comment)
Borrow Checking, RC, GC, and the Eleven (!) Other Memory Safety Approaches - https://news.ycombinator.com/item?id=40146615 - April 2024 (68 comments)
- Type safety > Memory management and type safety: https://en.wikipedia.org/wiki/Type_safety#Memory_management_...
- Memory safety > Classification of memory safety errors: https://en.wikipedia.org/wiki/Memory_safety#Classification_o...
- Template:Memory management https://en.wikipedia.org/wiki/Template:Memory_management
- Category:Memory_management https://en.wikipedia.org/wiki/Category:Memory_management
The only real problem with this approach is code reuse, because library writers will insist on opaque structs and malloc rather than letting the caller allocate.
https://www.absint.com/astree/index.htm
None make any mention of stack exhaustion that I can find in a cursory search.
https://en.wikipedia.org/wiki/Stack_overflow
> In software, a stack overflow occurs if the call stack pointer exceeds the stack bound.
Also, searching the Astrée site reveals:
> StackAnalyzer automatically determines the worst-case stack usage of the tasks in your application. It lets you find any stack overflows, or formally prove the absence thereof.
Something that's missing is full-on formal verification where you write unrestricted C code and then mathematically prove it doesn't have any bugs. Nobody does this because proving a C program is correct is harder than mining a bitcoin block by hand, but it's useful to anchor one end of the safety/freedom spectrum. Many other approaches (such as borrow checking) can also be viewed as variants of this where you restrict the allowed program constructs to ones that are easier to prove things about.
This isn't to say reference counting is without problems (there are plenty of them, inability to collect cyclical references being the most well known), but I don't normally think of it as a slow technique, particularly on modern CPUs.
[1] an atomic inc on x86 is typically ~30 clock cycles, doesn't really pipeline well and will stall at the very least other load operations.
Is that actually true? I'm pretty sure you need the borrow checker even for single threaded Rust to prevent use after frees.
My understanding is that most (all?) GC languages are memory safe, but do not ensure statically verifiable thread safety at all. Like Java, Go, C#, Python, etc.
Also, no point in calling it “tracing garbage collection”. It's just “garbage collection”. If you’re garbage collecting, you’re tracing.
You're against more explicit naming just for the sake of it? In the literature, reference counting is also referred to as a type of garbage collection, and it doesn't involve tracing. If you're talking about a specific context you can probably drop the "tracing", but in a general article like this it would just be very confusing?
This way, someone can google "tracing garbage collection", and will find the relevant wikipedia article: https://en.wikipedia.org/wiki/Tracing_garbage_collection
For example, nobody says that Objective-C is garbage collected just because it has ARC. Nobody says that C++ is garbage collected even though shared_ptr is widespread. And systems that do tracing GC just call it GC (see for example https://www.oracle.com/webfolder/technetwork/tutorials/obe/j...)
To think clearly about the tradeoff between GC and RC it’s important to acknowledge the semantic differences:
- GC definitely collects dead cycles.
- RC knows exactly when objects die, which allows for sensible destructor semantics and things like CoW.
- it’s possible to use RC as an optimization in a GC, but then you end up with GC semantics and you still have tracing (hence: if it’s got tracing, it’s a garbage collector).
It’s a recent fad to say that RC is a kind of GC, but I don’t think it ever took off outside academia. Folks who write GCs call them GCs. Folks who do shared_ptr or ARC say that they don’t use GC.
And it's good if this fad dies, because saying that RC is a kind of GC causes folks to overlook the massive semantic elephant in the room: if you use a GC then you can’t swap it for RC because you’d leak memory (RC would fail to delete cycles), and if you use RC and swap it for a GC then you’d leak resources (your destructors would no longer get called when you expect them to).
On the other hand, it is possible to change the guts of an RC impl without anyone noticing. And it’s possible to change the guts of a GC while preserving compatibility. So these are really two different worlds.
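To make the semantic difference concrete, here is a small sketch using Rust's `Rc`, assuming a hypothetical `Node` type that counts its own destructor calls. It shows both halves of the claim: destructors run at a known point, and a strong cycle never gets collected.

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

/// Hypothetical node type; `drops` counts how many nodes were destroyed.
struct Node {
    drops: Rc<Cell<u32>>,
    strong_next: RefCell<Option<Rc<Node>>>,
    weak_next: RefCell<Weak<Node>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn new_node(drops: &Rc<Cell<u32>>) -> Rc<Node> {
    Rc::new(Node {
        drops: Rc::clone(drops),
        strong_next: RefCell::new(None),
        weak_next: RefCell::new(Weak::new()),
    })
}

/// Builds a two-node cycle and returns how many destructors had run
/// by the time both local handles went out of scope.
fn drops_after_cycle(use_weak_back_edge: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = new_node(&drops);
        let b = new_node(&drops);
        *a.strong_next.borrow_mut() = Some(Rc::clone(&b));
        if use_weak_back_edge {
            // The back edge does not keep `a` alive.
            *b.weak_next.borrow_mut() = Rc::downgrade(&a);
        } else {
            // Strong cycle: neither count can ever reach zero.
            *b.strong_next.borrow_mut() = Some(Rc::clone(&a));
        }
    } // both local handles are released here
    drops.get()
}

fn main() {
    assert_eq!(drops_after_cycle(false), 0); // leaked: no destructor ran
    assert_eq!(drops_after_cycle(true), 2); // both freed at scope exit
}
```

A tracing collector would reclaim the strong cycle eventually, but it would not promise that the destructors had run by the closing brace.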
But concurrent GC is the basis for making deterministic GC, since it gives you the option of scheduling GC work whenever you like rather than pausing the world.
Some concurrent GCs are also deterministic while others aren’t. I’ve written both kinds.
Intuitively it feels like making it concurrent should do the opposite of making GC deterministic! I’d love to read something showing that intuition is wrong.
If you want to see my latest concurrent GC, see
https://github.com/pizlonator/llvm-project-deluge/blob/delug...
https://github.com/pizlonator/llvm-project-deluge/blob/delug...
fn modify(val: &mut u8) {
// ...
}
No other language appears to have this seemingly trivial capability; their canonical alternatives are all, IMO, clumsier. In light of the article, is this due to Rust's memory model, or an unrelated language insight?
This is automatically thread-safe, without any locks. It's guaranteed that there can't be any side effects that could affect this memory, no matter what code you call. You don't need any defensive coding copying the memory just in case. It optimizes well, because it's certain that it won't overlap with any other region.
C++ doesn't have that kind of strong no-alias guarantee. Even memory behind const pointers can be mutated by something else at distance. The closest equivalent is C's restrict pointers, but they're more coarse-grained, and aren't checked by the compiler.
There is no equivalent in Python, Javascript, Typescript, Java, or Kotlin.
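A small sketch of what the exclusive borrow buys. The body of `modify` and the helper `add_twice` are assumptions added for illustration:

```rust
/// Same signature as the `modify` quoted above; the body is an assumption.
fn modify(val: &mut u8) {
    *val = val.wrapping_add(1);
}

/// Hypothetical helper. Because `dst` is an exclusive borrow, `src` cannot
/// alias it, so the compiler is free to read `*src` once and reuse it.
fn add_twice(dst: &mut u8, src: &u8) {
    *dst = dst.wrapping_add(*src);
    *dst = dst.wrapping_add(*src);
}

fn main() {
    let mut x = 1u8;
    modify(&mut x);
    assert_eq!(x, 2);

    let y = 3u8;
    add_twice(&mut x, &y);
    assert_eq!(x, 8);

    // Rejected by the borrow checker: `x` would be borrowed mutably and
    // immutably at the same time.
    // add_twice(&mut x, &x);
}
```

In C, the equivalent of the commented-out call compiles even with `restrict` on the parameters, and the result is undefined behavior rather than a diagnostic.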
Safety requires unions to be safe, so unions have to become tagged enums. To have tagged enums usable, you have to have pattern matching, otherwise you'd get something awkward like C++ std::variant.
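As a rough illustration of that step, here is a tagged enum with pattern matching. The `Value` type and `describe` function are hypothetical:

```rust
/// Hypothetical tagged union: the tag and the payload always travel together,
/// so the payload can never be read as the wrong variant.
enum Value {
    Int(i64),
    Text(String),
    Pair(i32, i32),
}

/// The `match` must cover every variant, or the program does not compile.
fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("int {}", n),
        Value::Text(s) => format!("text of {} bytes", s.len()),
        Value::Pair(a, b) => format!("pair summing to {}", a + b),
    }
}

fn main() {
    println!("{}", describe(&Value::Int(7)));
    println!("{}", describe(&Value::Text("hello".to_string())));
    println!("{}", describe(&Value::Pair(2, 3)));
}
```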
Borrow checking works only on borrowed values (as the name suggests), so you will need something else for long-lived/non-lexical storage. To avoid GC or automatic refcounting, you'll want moves with exclusive ownership.
Exclusive ownership lets you find all the places where a value is used for the last time, and you will want to prevent double-free and uninitialized memory, which is a job for RAII and destructors.
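A sketch of moves plus destructors, assuming a hypothetical `Resource` type that logs when it is released:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records when it is released.
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Takes ownership; the resource is released when this function returns.
fn consume(r: Resource) {
    r.log.borrow_mut().push(format!("use {}", r.name));
}

/// Returns the order in which uses and releases happened.
fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Resource { name: "a", log: Rc::clone(&log) };
        let _b = Resource { name: "b", log: Rc::clone(&log) };
        consume(a); // `a` is moved; using it after this line will not compile
        log.borrow_mut().push("after consume".to_string());
    } // `_b` is released here, exactly once
    let out = log.borrow().clone();
    out
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The last use of each value is known statically, so there is no runtime bookkeeping and no way to release the same value twice.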
Static analysis of manual malloc/realloc and casting to/from void* is difficult, slow, and in many cases provably impossible, so you'll want to have safely implemented standard collections, and for these you'll want generics.
Not all bounds checks can be eliminated, so you'll want to have iterators to implement typical patterns without redundant bounds checks. Iterators need closures to be ergonomic.
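For instance, a dot product written both ways. Both function names are hypothetical, and whether the indexed version keeps its checks depends on the optimizer:

```rust
/// Index-based version: each `xs[i]` and `ys[i]` is bounds-checked unless
/// the optimizer can prove the index is in range.
fn dot_indexed(xs: &[i64], ys: &[i64]) -> i64 {
    let n = xs.len().min(ys.len());
    let mut acc = 0;
    for i in 0..n {
        acc += xs[i] * ys[i];
    }
    acc
}

/// Iterator version: `zip` stops at the shorter slice, so no index is ever
/// formed. The closure passed to `map` is what keeps this readable.
fn dot_iter(xs: &[i64], ys: &[i64]) -> i64 {
    xs.iter().zip(ys).map(|(x, y)| x * y).sum()
}

fn main() {
    let xs = [1, 2, 3];
    let ys = [4, 5, 6];
    assert_eq!(dot_indexed(&xs, &ys), dot_iter(&xs, &ys));
    println!("{}", dot_iter(&xs, &ys));
}
```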
…and so on.
Every time you plug a safety hole, it needs a language feature to control it, and then it needs another language feature to make this control fast and ergonomic.
If you start with "C but safe", and keep pulling that thread, nearly all of Rust will come out.
So there are obviously different ways of addressing this.
Memory safety is just one aspect where the compiler can help make sure a program is correct. The more the compiler helps with static analysis, the less we need to rely on writing tests for edge cases.
I feel as though not enough attention is given to how std is designed. For example: [u8], str, Path, and OsStr may be confusing at first, but when you understand why they are there any other approach feels icky. std guides you down a path of caring about things that really should matter (at least if you're only unwrapping provably safe values).
Have you considered what happens if not-utf8 data winds up in an environment variable that you are writing to stdout? What if it contains malicious VT commands?
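One way to handle that case, sketched with std only. The helper `sanitize_for_terminal` and the choice of `PATH` as the variable are assumptions for illustration:

```rust
use std::env;
use std::ffi::OsStr;

/// Hypothetical helper: makes an OS string safe to print on a terminal.
/// Invalid UTF-8 becomes U+FFFD via `to_string_lossy`, and control
/// characters (including ESC, which starts VT sequences) are replaced too.
fn sanitize_for_terminal(raw: &OsStr) -> String {
    raw.to_string_lossy()
        .chars()
        .map(|c| if c.is_control() { '\u{FFFD}' } else { c })
        .collect()
}

fn main() {
    // `var_os` returns the raw value; `env::var` would instead return an
    // error if the value is not valid Unicode.
    match env::var_os("PATH") {
        Some(value) => println!("{}", sanitize_for_terminal(&value)),
        None => println!("PATH is not set"),
    }
}
```

The type split forces the question: an `OsString` cannot be passed to `println!` directly, so the lossy conversion has to be written out where a reviewer can see it.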
https://en.wikipedia.org/wiki/Maya_peoples
and secondly, the reason why the pre-Columbian cultural texts and script are not in use today, even by the people who speak the 28 Mayan languages currently in use, is because of genocide by Columbus and those that followed. The Catholic church destroyed every piece of Mayan script they could get their hands on.
The article reads like the author is not aware of these basic facts of American geography and history.
This is wrong. Interaction nets (and combinators) can model any kind of computational system, including ones that use mutation. In fact, ICs are not really about types at all, although they do come from a generalization of Girard's proof nets, which came from work in linear logic.
The interesting thing about ICs is that they are beta-optimal (any encoding of a computation will be done in the minimum number of steps required -- there is no useless work being done), and maximally parallel with only local synchronization (all reduction steps are local, and all work that can be parallelized will be parallelized).
Additionally ICs have the property that any encoding of a different computational system in ICs will preserve the asymptotic behavior of all programs written for the encoded computational system. In fact, ICs are the only computational system with this property.
Interaction nets absolutely require garbage collection in the general sense. However, interaction combinators are linear and all garbage collection is explicit (but still exists). HVM's innovation is that by restricting the class of programs encoded in the ICs you can get very cheap lambda duplication and eschew the need for complex garbage collection while also reducing the overhead of implementing ICs on regular CPUs (no croissants or brackets, see Asperti[1] for what that means).
Having a linear language with the above restriction allows for a very efficient implementation with a very simple GC, while maximizing the benefits of ICs. In principle any language can be implemented on top of ICs, but to get most benefits you want a language with these properties. It's not that HVM starts with affine types and an efficient lazy clone operation, it's that a linear language allows extremely efficient lazy cloning (including lambda cloning) to be implemented on top of ICs, and the result of that is HVM.
> The HVM runtime implements this for Haskell.
This is very wrong. HVM has nothing to do with Haskell. HVM3 is written in C[2], HVM2 has three implementations, one in C[3], one in Rust[4], and a CUDA[5] one. HVM1 was just a prototype and was written in Rust[6].
HOC[7], the company behind HVM, provides two languages that compile to HVM: Bend[8] and Kind[9]. Bend is a usual functional language, while Kind is a theorem prover based on self types.
Haskell is not involved in any of these things except that the HVM compiler (not runtime) is written in Haskell, but that is irrelevant: before Haskell it was written in TypeScript and then in Agda (Twitter discussion, sorry, no reference). It's an implementation detail, not something the user sees.
Please note that HVM adds some stuff on top of ICs that makes it not strictly beta-optimal, but nevertheless the stuff added is useful in practice and the practical downgrade from theoretical behaviour is minimal.
[1] Andrea Asperti, The Optimal Implementation of Functional Programming Languages, ISBN-13: 978-0060815424
[2] https://github.com/HigherOrderCO/HVM3/blob/main/src/HVML/Run...
[3] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.c
[4] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.rs
[5] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.cu
[6] https://github.com/HigherOrderCO/HVM1
A generalized RCU is just a tracing GC.
The problem is that it is very easy to write non-GC'd code in a GC'd language, but the other way around it is much much harder.
Therefore, I think the fundamental choice of Rust to not support a GC is wrong.
If GC is an option and you want all the nice parts of Rust, use OCaml
Having said that, there are still OCaml (as you noted), Haskell, .NET languages with Native AOT, JVM languages with GraalVM/OpenJ9, D, Nim, Swift, ....
And if one wants to get fancy with type systems, Idris, Dafny, FStar,..
So are you saying it would be possible to use a hypothetical "non-GC-enabled" OCaml compiler that complains if GC'd code is invoked/generated, and it would be a similar experience as using Rust?
Do you think that projects that have a large GUI component should be written in Rust?
What if a project has both a "systems" and a GUI component to it?
I was talking with a colleague about that. He said, “In C I know exactly where things are when.” I replied that under any OS with virtual memory, you have basically no clue where things are at any time across the N levels of cache, and you cannot make accurate timing predictions anyway… [1]
I’m convinced today that GC is the way to go for almost all cases. And until 5 years ago or so, I was totally opposed to that view.
Many of their customers are factory processes and military deployments with weapons control, two scenarios where any kind of stall might produce deadly results.
It just runs at compile time. Bonus feature, it helpfully prevents a number of common bugs too.
If you have to do it yourself, then it does not "have" a GC.
Use after free is important, but in my experience it is not common and not too hard to track down when it happens (maybe I'm lucky? - we generally used a reference-counted GC for the cases where ownership is hard to track down in C++).
I'm more worried about other memory safety issues that are not addressed: writing into someone else's buffer, which is generally caused by writing off the end of your own buffer.
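For comparison, this is roughly how a bounds-checked language handles the off-the-end write. The sketch is in Rust rather than C++, and `checked_write` is a hypothetical helper:

```rust
/// Hypothetical helper: writes `byte` at `index` only if it is in range,
/// and reports whether the write happened.
fn checked_write(buf: &mut [u8], index: usize, byte: u8) -> bool {
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = byte;
            true
        }
        None => false, // out of range: nothing outside `buf` is touched
    }
}

fn main() {
    let mut mine = [0u8; 4];
    let neighbour = [7u8; 4];

    assert!(checked_write(&mut mine, 3, 1));
    assert!(!checked_write(&mut mine, 4, 1)); // one past the end is refused

    assert_eq!(mine, [0, 0, 0, 1]);
    assert_eq!(neighbour, [7, 7, 7, 7]); // the adjacent buffer is untouched
}
```

Plain indexing (`mine[4] = 1`) would panic instead of returning `false`; either way the write never lands in someone else's memory.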
Yes. A garbage collector is only safe if it works correctly. What an irrelevant observation. Nothing can guarantee that something works correctly if it doesn't work correctly.