I don’t even like rust.
It does feel like the code was ported as-is to Rust, and only adjusted slightly to compile; there are going to be pain points as a result of this process. I suspect this is the source of some of the author's complaints, especially given:
> Although it provides us with a greater sense of how the code is executing, it brings very little value to the algorithm itself.
Rust is, in general, for people who find value in having that information; it is okay to not want to have to worry about ownership, borrowing, safety, etc., but it seems a bit odd to complain about this when that's what Rust is for? If you want to focus on just the algorithm, and not how it's executing, then OCaml is definitely a valid choice.
However, the point about GADTs - can Rust's recently-stabilized GATs not work in the same way? Though I will admit that Rust's GATs don't seem nearly as powerful as OCaml's GADTs in this regard.
that's the point of the article - rust gives you a lot of low-level control, but if you don't actually need that control then you're paying the cost in ergonomics for nothing.
The Rust one reads like...well a program. A program that's not as beautiful, but is very much designed to be taken apart, debugged, improved, etc.
I fully agree that if you're writing pure, recursive, data structure manipulations, OCaml is likely a better fit. It's closer to mathematical notation and I see the elegance in that. But if I were to take that data structure manipulation and turn it into a compiler with thousands of lines that I navigate with an IDE, with logging, with annoying little edge cases, with dozens of collaborators, I'd choose Rust.
You can add additional annotations in OCaml if you want, or just query the type of a term in Merlin.
> a compiler with thousands of lines that I navigate with an IDE, with logging, with annoying little edge cases, with dozens of collaborators, I'd choose Rust.
Why? OCaml supports logging and IDEs. Simple, elegant code without the burden of manual memory management is better able to cope with edge cases, being taken apart, refactored, etc. Less of the complexity budget has already been spent.
To be clear, elegance is important. A language absent of elegance would be a bore to write (cough Java cough). But too much elegance and it can eclipse the legibility of the language. No type annotations is elegant. Is it legible? Not in my opinion. But perhaps it is in yours.
If types can be perfectly inferred, I don't see why I wouldn't take advantage of it. Whatever OCaml IDE you use will easily be able to tell you the types anyway.
While working at Routine (YC W21), I was tasked with porting our core library to iOS to minimize duplication of business logic. This was a lucky opportunity to write something resembling a compiler: it took in schemas described with our in-house data exchange library and generated C (for FFI) and Swift code (for the end users, i.e., iOS developers).
Since Routine uses OCaml for everything (which was a big motivator for joining the company—I wanted to see how that would work out), I wrote it in OCaml. The end result is a 3-5k LOC project. It's by no means a full compiler, but it was lots of fun to write. The language got in the way incredibly rarely. On average, it made my life a lot easier. We did encounter our fair share of issues, mostly due to the cross-compilation tooling[1], third-party libraries, and intricacies of FFI. Those do take their toll on sanity.
I tried my hand at writing small compilers / interpreters in Rust, and the experience was nowhere near as smooth. It was fun, and the runtime performance is definitely there, but the ergonomics aren't the same. I especially miss first-class modules whenever I code in something other than OCaml now.
[1]: we initially used esy [2], flirted with Nix, and eventually switched to opam-cross-ios [3].
[2]: https://github.com/esy/esy/
[3]: https://github.com/ocaml-cross/opam-cross-ios
With such requirements, I think a point that is fair to make is that Rust acts as a great middle-ground. It avoids the cost of automatic memory management and provides low-level control while also having a more powerful type system and a more "functional" style.
Brushing off the actual efficiency of the produced binary seems like a huge oversight when dealing with a compiler.
You could write a compiler for Pascal in Python or another very slow language and it would be faster than a Rust or C++ compiler written in Rust or C++. That is because those languages have designs that make compilation algorithmically slow, while Pascal was designed to be fast to compile.
It becomes visible when you build a large project: you notice that when you face 100k LOCs, efficiency of every compiler's part starts to matter, and RAM usage may grow to uncomfortable levels if your compiler does not care enough.
People like fast compile times and people either like to use (or are forced to use) languages that are inherently slow to compile. That's exactly why compiler performance is absolutely critical.
If the compiler is for JIT, then efficiency will be important.
I'd want to believe that compiler engineers really put effort into compiler performance, but I just don't buy it.
LLVM, GCC, MSVC, etc.: all of them touch C/C++ and are slow as hell.
For compilers written in other languages, I'd say LLVM is still the bottleneck.
>It avoids the cost of automatic memory management and provides low-level control.
What "low-level control" do you need? It is not firmware development.
Btw: Microsoft rewrote their C# compiler from C++ to C#.
It's really easy to accidentally write code that walks off a performance cliff on unexpected input, but that's likely to get hacked around if someone reports it, as slow compilers do annoy people.
For instance, passing RefCell<u32> by value as their code does makes no sense (just use u32...), and the code seems to have a lot of clones, most of which are probably unnecessary, while not having a single instance of the "mut" keyword.
In fact, I'm pretty sure it's completely broken, since their gensym doesn't do what they want, due to their wrong use of clones and refcells (it should take an &mut u32 and just increment it).
And definitely not idiomatic at all.
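To make the gensym point concrete, here is a minimal sketch of the `&mut u32` shape suggested above. The `x` prefix and the returned `String` are assumptions for illustration; the article's actual naming scheme is not shown in this thread.

```rust
/// Hypothetical fresh-name generator. The caller owns the counter and
/// lends it mutably, so each call sees the previous call's increment.
/// (Passing `RefCell<u32>` by value would instead bump a private copy.)
fn gensym(counter: &mut u32) -> String {
    // The "x" prefix is an assumption made for this sketch.
    let name = format!("x{}", *counter);
    *counter += 1;
    name
}
```

A caller would keep a single `let mut counter = 0;` and pass `&mut counter` to every call, so two consecutive calls never return the same name.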
In my opinion, OCaml would see even more widespread use if the documentation were improved. I find it a chore to figure out how to use OCaml well. I also would like to use third-party libraries like Jane Street's Base because they've put a lot of work into providing even more functionality in their standard library, but their documentation is absolutely atrocious (where it exists at all).
OCaml is a mature language but does not have a very supportive ecosystem. I'm hoping the renewed interest will prompt changes there.
I felt that way about a dozen years ago. These things have cycles, apparently. But they also recently released multi-core OCaml in OCaml 5 which opens some doors for OCaml that were previously not open.
Edit:
I have used OCaml in production and currently I don't see a point to doing it again for the vast majority of problems. From a holistic language + runtime point of view OCaml occupies a space where it's not useful enough from a runtime perspective to replace any of the more convenient languages that exist and not low-level enough to fill the spot of any of the good alternatives in that space. Modularity-wise, functors are nice but ultimately plenty of alternatives exist even in the lower-level languages.
With all that said, people should probably use the hell out of it if they're excited. It's a bit tiring seeing the constant stream of misinformation regarding alternatives to OCaml, though. There are good reasons it's losing out in industrial use to even languages like Haskell.
People should do basic research before writing something silly like this. Qualifying your statement with 'usually' is just a chicken sh*t approach. Common Lisp and Racket have optional strong typing, leaving the responsibility and choice to the developer. Common Lisp is great for implementing compilers. You also have things like Typed Racket and Coalton. The latter is completely statically typed ala MLTON
Sounds to me like they are comparing bad code and good code, not the languages themselves.
So the answer apparently is simple: between two languages, for an important project use the one which you wield best.
The same article could easily be written the other way around. We could start with a high performance rust program (which makes use of arena allocators, internal mutation and any other rust features you love) and then try and convert it line by line into ocaml. We would find that many of rust's concepts can't be clearly expressed in ocaml. The ocaml code would end up uglier and measurably slower than rust. And just like that the article would reach the opposite conclusion - that rust is clearly the better language!
But this is silly.
In general, you obviously can't translate between languages line by line like this and expect to have a good time. A beautiful C program is constructed using different ideas than a beautiful Lua program. And a beautiful Ocaml program is very different from a beautiful rust program.
Some obvious examples of ocaml ideas being overapplied to rust in this article:
1. The types don't really need to be wrapped in Rc here.
2. Rust generally prefers mutable imperative code over applicative code. And if you insist on applicative patterns, functions should take a &Foo.
3. Rust code usually doesn't rely on recursion that much, so the lack of guaranteed TCO isn't something people in the community care about.
4. Rust is optimized for runtime performance over code beauty or code size. Of course rust is less elegant looking than a garbage collected language! The trade is that it should also run faster. But where are the benchmarks to make the comparison fair?
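Points 1 to 3 can be sketched roughly as follows. This assumes a hypothetical two-variant `Term` (the article's real type has more cases) and uses a made-up `eval_bool` helper: children are owned through `Box` rather than `Rc`, the function borrows its input as `&Term`, and a loop with a mutable flag replaces recursion.

```rust
// Hypothetical, cut-down term type: owned children via Box, no Rc.
enum Term {
    Bool(bool),
    Not(Box<Term>),
}

/// Evaluates a chain of `Not`s iteratively, borrowing the term.
fn eval_bool(term: &Term) -> bool {
    let mut negate = false; // parity of the `Not`s peeled off so far
    let mut cur = term;
    loop {
        match cur {
            // Flip the leaf's value iff an odd number of `Not`s was seen.
            Term::Bool(b) => return *b != negate,
            Term::Not(inner) => {
                negate = !negate;
                cur = &**inner; // step into the boxed child, no clone
            }
        }
    }
}
```

This only works as a plain loop because `Not` has a single child; a tree-shaped term would need an explicit stack or ordinary recursion.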
The match example is just straight out bad rust code. This code:
    fn eval(term: &Term) -> Value {
        match term {
            Bool(b) => Value::Bool(*b),
            Not(m) => match eval(m) {
                Value::Bool(b) => Value::Bool(!b),
                _ => panic!("`Not` on a non-boolean value"),
            },
            // ... lots more nested matches & panics
        }
    }
Can be flattened, to approximately halve the length of the program, like this:
    fn eval(term: &Term) -> Value {
        match term {
            Bool(b) => Value::Bool(*b),
            Not(Value::Bool(b)) => Value::Bool(!b),
            // ... (all other valid patterns)
            _ => panic!("{term} invalid"),
        }
    }
There's an old saying: "Every programming language you learn should teach you to see programs in a new way". Rust is not a crappy alternate syntax for ocaml any more than ocaml is a crappy, alternate syntax for rust. The only thing I learned from the article is that the author doesn't know rust well enough to evaluate it.
What would be the idiomatic way for the function `eval` to provide good error messages in Rust?
What you really probably want is to make this a Result type-returning thing, and then have `not` be a function of type Value -> Result<Value,ErrType>, and then you can do not(eval(m)) and panic at the top-level.
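A minimal sketch of that Result-returning shape, under stated assumptions: the `Int` variants and the `EvalError` type are hypothetical (added only so that a non-boolean value can exist), and since `eval` itself returns a `Result`, the call is written `not(eval(m)?)` rather than a bare `not(eval(m))`.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum Term {
    Bool(bool),
    Int(i64), // hypothetical extra variant, not from the article
    Not(Box<Term>),
}

#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64), // hypothetical extra variant, not from the article
}

/// Hypothetical error type standing in for `ErrType`.
#[derive(Debug, PartialEq)]
enum EvalError {
    NotOnNonBool(Value),
}

impl fmt::Display for EvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvalError::NotOnNonBool(v) => {
                write!(f, "`Not` on a non-boolean value: {:?}", v)
            }
        }
    }
}

/// Value -> Result<Value, EvalError>, as suggested above.
fn not(v: Value) -> Result<Value, EvalError> {
    match v {
        Value::Bool(b) => Ok(Value::Bool(!b)),
        other => Err(EvalError::NotOnNonBool(other)),
    }
}

fn eval(term: &Term) -> Result<Value, EvalError> {
    match term {
        Term::Bool(b) => Ok(Value::Bool(*b)),
        Term::Int(n) => Ok(Value::Int(*n)),
        // `?` forwards an error from the recursive call unchanged.
        Term::Not(m) => not(eval(m)?),
    }
}
```

The top level can then decide what to do with a failure, for example `eval(&t).unwrap_or_else(|e| panic!("{}", e))`, while tests and tooling can inspect the error value instead of catching a panic.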
Note that with a definition like
    enum Term { Bool(bool), Not(Box<Term>), ... }
your code simply does not typecheck. `Not` expects a `Box<Term>`, not a `Value`.
It's also worth noting that one would probably want to consider something like
Not(Not(Bool(true)))
a valid term, which your implementation wouldn't.
In any case, I stand by all the other points I’ve made in my comment.
No it can't. You're missing the recursive call to `eval`.
> Other alternatives to consider is Haskell and various Lisp dialects. If you have already “tamed” Haskell (my congratulations and condolences), probably learning OCaml just for writing a compiler is not going to be worth it; if you have not, OCaml is a much more approachable language.
This is an interesting claim, as I thought Haskell and OCaml were more or less equivalently inscrutable.
1) built in eval -- this allows you to transpile to the host language which is invaluable for writing small tests
2) multiline string syntax -- for evaling more than just one liners
3) built in associative and sequential arrays (for the ast)
4) first class closures
5) panic support (for aborting early from unimplemented use cases)
The AST can be represented as an associative array. Each element type can have a 'type' field and rather than pattern matching, you can use if/else. Performance doesn't really matter for the bootstrap compiler because it will only ever be run on relatively small input sets. To get started, you simply walk the ast to transpile to the host language. The snippet is then evaled in the host language to test functionality. Closures allow you to implement the visitor pattern for each ast node, which allows contextual information to be seamlessly interwoven amongst ast nodes during the analysis/transpilation steps.
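The approach above targets a dynamic host language, so the following Rust rendering is only an approximation of the idea: nodes are string-keyed maps with a `type` field, dispatch is plain if/else, and unimplemented cases panic. The node kinds (`num`, `add`, `not`), the field names, and the Lua-like output syntax are all assumptions for this sketch, and Rust has no built-in eval for the emitted snippet.

```rust
use std::collections::HashMap;

// A dynamically shaped AST node: field name -> field value.
type Node = HashMap<String, Field>;

#[derive(Debug, Clone)]
enum Field {
    Str(String),
    Num(i64),
    Node(Node),
}

/// Builds a node with the given `type` tag and extra fields.
fn node(ty: &str, fields: Vec<(&str, Field)>) -> Node {
    let mut n = Node::new();
    n.insert("type".to_string(), Field::Str(ty.to_string()));
    for (k, v) in fields {
        n.insert(k.to_string(), v);
    }
    n
}

/// Fetches a child node, panicking early if it is missing.
fn child<'a>(n: &'a Node, key: &str) -> &'a Node {
    match n.get(key) {
        Some(Field::Node(c)) => c,
        _ => panic!("missing child `{}`", key),
    }
}

/// Walks the AST and emits host-language (Lua-like) source text.
fn transpile(n: &Node) -> String {
    let ty = match n.get("type") {
        Some(Field::Str(s)) => s.as_str(),
        _ => panic!("node without a `type` field"),
    };
    // if/else on the tag instead of pattern matching on variants.
    if ty == "num" {
        match n.get("value") {
            Some(Field::Num(v)) => v.to_string(),
            _ => panic!("`num` node without a numeric `value`"),
        }
    } else if ty == "add" {
        format!("({} + {})", transpile(child(n, "left")), transpile(child(n, "right")))
    } else if ty == "not" {
        format!("(not {})", transpile(child(n, "operand")))
    } else {
        panic!("unimplemented node type: {}", ty)
    }
}
```

In a language with a built-in eval, the string returned by `transpile` would be evaluated directly to test each construct as it is implemented.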
Keeping all of this in mind, I have identified luajit as my personal favorite language for compiler development. It checks the boxes above, has excellent all around performance for a dynamic language (particularly when startup time is included -- js implementations may beat it on many benchmarks but almost always have slow start up time relative to luajit) and provides a best in class ffi for host system calls. You can run 5000+ line lua scripts faster than most compilers can compile hello, world.
The other reason I like lua(jit) is the minimalism. Once you master lua (which is possible because of its small size) it becomes very obvious that if you can implement something in lua, you can translate the lua implementation to essentially any other language. In this way, there is a sense in which writing a lua implementation becomes almost like a rosetta stone in which a translation can be produced for nearly any other language. With more powerful languages, it is hard to resist the temptation to utilize features that can't always be easily transported to another language. In other words, lua makes it easy to write portable code. This is true both in the sense that lua can be installed on practically any computer in at most a few minutes and in the sense that the underlying structure of a lua program that transcends the syntax of the language can be ported to another computing environment/language.
Another benefit of transpiling to lua is that your new language can easily inherit lua's best properties such as embeddability, fast start up time and cross platform support while removing undesirable features like global variables. Your language can then also be used to replace lua in programs like nginx, redis and neovim that have lua scripting engines. This of course extends to transpiling to any language, which again should be relatively easy if you have already transpiled to lua.
I also wish it was a bit more performant, but here it's likely my medium to high level code and not luajit's fault. However, running the test suite in plain Lua seems roughly an order of magnitude slower than in luajit, so it's a lot faster than plain Lua at least.