Yet it seems that OCaml is more popular among programmers and real-world projects, even though these programmers mostly come from academia, given the niche language that OCaml still is. For example, the impressive MirageOS project chose OCaml instead of SML.
So my question is:
How is that? Why is OCaml so much more popular, despite having just one implementation and no real spec? Why is SML, with its real spec and multiple implementations, not at least equally popular?
EDIT: Here are two possible answers that I don't think apply:
1. OCaml may be "good enough", which, combined with network effects, makes choosing OCaml over SML a self-fulfilling prophecy. I don't think it is that simple, because OCaml users and projects come mostly from the academic field. They are deeply concerned with correctness of code, which would mean they should all have favored SML over OCaml. In fact, sometimes correctness seems to be the sole motivation. For example, the author(s) of OCaml-TLS didn't just want to create yet another TLS library in a hip language. They are concerned with the state of OpenSSL and similar libraries, and wanted to create a 100% correct, bullet-proof alternative.
2. Although one could attribute this to the "O" in Objective Caml, I don't think it is that simple, because the object-oriented extensions seem almost unused, and wherever I saw them being used (e.g. LablGTK, an OCaml wrapper for the GTK UI library) I didn't see much value; sticking to plain OCaml modules and functors would have led to a better interface.
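To make the modules-vs-objects contrast concrete, here is a minimal sketch. Both the `counter` class and the `Make_counter` functor are invented for illustration (they are not from LablGTK or any real library); the point is just that the functor version expresses the same parameterized API without touching OCaml's object layer.

```ocaml
(* Object-style API, as a binding like LablGTK might expose it
   (hypothetical example): *)
class counter init = object
  val mutable n = init
  method get = n
  method incr = n <- n + 1
end

(* Plain module/functor alternative: the "configuration" is a module
   argument instead of constructor state or inheritance. *)
module type STEP = sig val step : int end

module Make_counter (S : STEP) = struct
  let n = ref 0
  let get () = !n
  let incr () = n := !n + S.step
end

module C = Make_counter (struct let step = 2 end)

let () =
  let o = new counter 0 in
  o#incr;
  C.incr ();
  Printf.printf "object: %d, functor: %d\n" o#get (C.get ())
  (* prints: object: 1, functor: 2 *)
```

The functor version also stays within the part of the language SML shares, which is arguably why many OCaml codebases never reach for objects at all.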
My own extrapolation is that a multitude of SML implementations meant that programmers couldn't take advantage of implementation-specific extensions, as that would prevent their code from running elsewhere. This hindered real-world experimentation (as opposed to academic investigations) and organic growth of the language.
But I should say that, as a starting point for creating new functional languages, SML is an exceedingly clean design. I'm using it myself for my own language design efforts.
I would definitely agree with the opinion above that the SML language designers fall in the camp of deliberating extensively and fully formalizing all new language features before considering them for inclusion.
There are two challenges right now in the Successor ML space:
- There's some broad agreement on a few things that should obviously be improved, but today many of the implementation owners don't have a huge amount of time, so getting them to also agree to the new features is complicated. Life is way easier when you only have one implementation.
- There's a bunch of fantastic ideas (largely driven by Bob Harper) to integrate stuff like parallelism, cost model, etc. that will be fantastic for a next version of the language and specification. It's just a lot of work, and not really publishable, so it's mainly tenured faculty and "alumni" (like myself and probably the people you work with) driving it forward in our free time.
That said, I still love working on SML and its implementations (I mainly contribute to SML/NJ and Manticore) and it's a wonderful escape from the grim reality of quarterly goals and "get it out the door" release deadlines :-)
Caml had these, and thus was used to build some software somewhat popular on Unix systems (e.g. Unison, MLDonkey, a Flash compiler). And while these days it might seem like we've got plenty of freely available, fast-compiling, native languages to choose from, just a few years ago the situation was very different. So some just chose the language for that reason, never mind syntax or semantics. As long as it wasn't C.
So you just got the feeling of more dedicated, pragmatic language maintainers and actual community use. People who really cared about purism went over to Haskell anyway.
(Personally, I like the idea of languages with more than one compiler, if only the "Standard" part of "SML" were a bit bigger.)
- CakeML is written in HOL4, which uses ML.
- SML, unlike OCaml, has a well-defined semantics, so there is a good starting point for a verified compiler.
Second, regarding OCaml, the simplest reason for choosing OCaml over SML is that there is only one implementation of the language, which everybody uses. This makes it a lot more likely that the OCaml compiler will still be around a few decades from now. There are several modern SML compilers, most of which are more or less unmaintained. Poly/ML, which everybody seems to be using these days, is substantially developed by a single person.
That said, the CakeML team welcomes a translator for OCaml that outputs CakeML programs. Additionally, the OCaml compiler's architecture was clean enough that Esterel was able to do the source-to-object-code validation required for its DO-178B-certified code generator. They said they had to do far less work modifying and analyzing it than they expected. That means the OCaml compiler itself might be a candidate for verification, albeit probably a non-optimizing form of it. I'd also try the K framework, which successfully handled the C semantics with its KCC compiler. If it can handle C, I'd venture a guess that it should handle a better-designed compiler and language. ;)
EDIT to add: OCaml syntax support is being worked on at the link below.
https://github.com/CakeML/cakeml/tree/master/unverified/ocam...
It doesn't matter whether you actually code with objects. The important point for adoption is that the language supports them.
So for a noob .NET programmer you say, hey, look at F#! You can code it just like C#. Kinda.
But as soon as they start coding, you tell them nope, all of these types of things are actually antipatterns.
The real value of objects in F# lies in the CLR compatibility. Access to the BCL and some existing code is a big advantage over OCaml for .neteers.
I think network effects do matter much more than you say. There is virtually no community around SML.
Side note: Cake stands for CAmbridge KEnt, which is where (most of) CakeML's verification was carried out.
The pioneering project in this space was X. Leroy's CompCert, the first verified optimising compiler. More precisely, it is a realistic, moderately optimising compiler for a large subset of the C language down to PowerPC and ARM assembly code.
Is this a first because it is theoretically difficult to do, or because it requires a lot of implementation time? What are some key points to read up on and understand in order to properly appreciate this result, beyond the Wikipedia article on formal verification [1]?
Thank you in advance for any elaboration.
- Provide a verified software toolchain for programmers with a minimal trusted computing base.
- Investigate how the cost (in a general sense) of formal verification in general and compiler verification in particular can be lowered, ideally to the point that normal programmers can routinely use formal verification.
The advance that CakeML makes over CompCert is bootstrapping: CakeML can compile itself, while CompCert (being a C compiler written in OCaml) can't. Simplifying a bit, bootstrapping lowers the trusted computing base, because you no longer have to trust an unverified compiler to build the verified one.
Maybe Leroy's [3, 4] are good starting points for learning about this field.
[1] T. Nipkow, G. Klein, Concrete Semantics. http://www.concrete-semantics.org/
[2] A. Chlipala, A verified compiler for an impure functional language. http://adam.chlipala.net/papers/ImpurePOPL10
[3] X. Leroy, Verifying a compiler: Why? How? How far? http://www.cgo.org/cgo2011/Xavier_Leroy.pdf
[4] X. Leroy, Formal verification of a realistic compiler. http://gallium.inria.fr/~xleroy/publi/compcert-CACM.pdf
I know of a few programs that work fine when compiled with -O2 but break with -O3. This is because some of the optimizations that get applied are just not what the programmer expected, or, in the older days, simply broke working code.
Formally verifying the output is difficult because you need to prove that the same computation is performed in both cases, even when the two versions look extremely different. I'm assuming that's where the difficulty comes in.
Just email them first in case someone has done the work already. Academics are sometimes slow to update web sites because they're digging deep into their research. ;) The best uses I can think of for CakeML are:
A reference implementation to do equivalence checks against, with the main implementation (in an ML or not) being something optimized.
Something to build other tools in that need high assurance of correctness. Prototype to get the algorithm right, using whatever brains and tooling already exist, with an equivalent CakeML program coming out of that. Then, that turns into vetted object code.
A nice language for writing low-level interpreters, assemblers, or compilers that bootstrap others in a high-confidence way. The idea comes up in verifiable or reproducible builds, where you want a starting point that can be verified by eye. Auditors can look at the CakeML and assembly output with some extra assurance on top of doing it by hand. One might even use the incremental compilation paper to build up a Scheme, ending with a powerful starting language plus assurance that the binary matches the code.
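The first use above (equivalence checks against a reference) is essentially differential testing. Here is a minimal sketch of the idea in OCaml; both sum functions are invented for illustration, standing in for a simple verified reference and an optimized production implementation.

```ocaml
(* Reference implementation: naive, obviously-correct sum of 1..n.
   In the CakeML scenario this would be the verified program. *)
let sum_ref n =
  let rec go acc i = if i > n then acc else go (acc + i) (i + 1) in
  go 0 1

(* "Optimized" implementation: closed-form formula, the version you
   actually ship and want to cross-check. *)
let sum_opt n = n * (n + 1) / 2

(* Equivalence check on many random inputs: any divergence between
   the optimized version and the reference trips the assertion. *)
let () =
  Random.init 42;
  for _ = 1 to 1000 do
    let n = Random.int 10_000 in
    assert (sum_ref n = sum_opt n)
  done;
  print_endline "equivalence check passed"
```

Random testing only samples the input space, of course; the appeal of a CakeML reference is that the reference side of the comparison is itself proven correct, so the check narrows trust to the optimized side alone.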
val num = 10
Then take a look at the x86 generation. What is all of that? It doesn't look like executable code that should be needed. Is that just implicit functions or something baked into the language? If it is, why isn't it being tree-shaken?