Yet it seems that OCaml is more popular among programmers and real-world projects, even though these programmers mostly come from academia, given the niche language that OCaml still is. For example, the impressive MirageOS project chose OCaml instead of SML.
So my question is:
How is that? Why is OCaml so much more popular, despite having just one implementation and no real spec? Why is SML, with its real spec and multiple implementations, not at least equally popular?
EDIT: Here are two possible answers that I don't think apply:
1. OCaml may be "good enough", which, combined with network effects, makes choosing OCaml over SML a self-fulfilling prophecy. I don't think it is that simple, because OCaml users and projects come mostly from the academic field. They are deeply concerned with correctness of code, which would mean they should all have favored SML over OCaml. In fact, sometimes correctness seems to be the sole motivation. For example, the author(s) of OCaml-TLS didn't just want to create yet another TLS library in a hip language. They are concerned with the state of OpenSSL and similar libraries, and wanted to create a 100% correct, bullet-proof alternative.
2. Although one could attribute this to the "O" in Objective Caml, I don't think it is that simple, because the object-oriented extensions seem almost unused, and wherever I saw them being used (e.g. LablGTK, an OCaml wrapper for the GTK UI library) I didn't see much value; sticking to plain OCaml modules and functors would have led to a better interface.
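To make the modules-vs-objects contrast concrete, here is a minimal sketch. Both the `counter` class and the `Make_counter` functor are invented for illustration (they are not from LablGTK or any real library); the point is just that the functor version expresses the same parameterized API without touching OCaml's object layer.

```ocaml
(* Object-style API, as a binding like LablGTK might expose it
   (hypothetical example): *)
class counter init = object
  val mutable n = init
  method get = n
  method incr = n <- n + 1
end

(* Plain module/functor alternative: the "configuration" is a module
   argument instead of constructor state or inheritance. *)
module type STEP = sig val step : int end

module Make_counter (S : STEP) = struct
  let n = ref 0
  let get () = !n
  let incr () = n := !n + S.step
end

module C = Make_counter (struct let step = 2 end)

let () =
  let o = new counter 0 in
  o#incr;
  C.incr ();
  Printf.printf "object: %d, functor: %d\n" o#get (C.get ())
  (* prints: object: 1, functor: 2 *)
```

The functor version also stays within the part of the language SML shares, which is arguably why many OCaml codebases never reach for objects at all.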
My own extrapolation is that a multitude of SML implementations meant that programmers couldn't take advantage of implementation-specific extensions, as that would prevent their code from running elsewhere. This hindered real-world experimentation (as opposed to academic investigations) and organic growth of the language.
But I should say that, as a starting point for creating new functional languages, SML is an exceedingly clean design. I'm using it myself for my own language design efforts.
I would definitely agree with the opinion above that the SML language designers fall in the camp of deliberating extensively and fully formalizing all new language features before considering them for inclusion.
There are two challenges right now in the Successor ML space:
- There's some broad agreement on a few things that should obviously be improved, but today many of the implementation owners don't have a huge amount of time, so getting them to also agree to the new features is complicated. Life is way easier when you only have one implementation.
- There's a bunch of fantastic ideas (largely driven by Bob Harper) to integrate stuff like parallelism, cost model, etc. that will be fantastic for a next version of the language and specification. It's just a lot of work, and not really publishable, so it's mainly tenured faculty and "alumni" (like myself and probably the people you work with) driving it forward in our free time.
That said, I still love working on SML and its implementations (I mainly contribute to SML/NJ and Manticore) and it's a wonderful escape from the grim reality of quarterly goals and "get it out the door" release deadlines :-)
Caml had these, and thus was used to build some software somewhat popular on Unix systems (e.g. Unison, MLDonkey, a Flash compiler). And while these days it might seem like we've got plenty of freely available, fast-compiling, native languages to choose from, just a few years ago the situation was very different. So some just chose the language for that reason, never mind syntax or semantics. As long as it wasn't C.
So you just got the feeling of more dedicated, pragmatic language maintainers and actual community use. People who really cared about purism went over to Haskell anyway.
(Personally, I like the idea of languages with more than one compiler, if only the "Standard" part of "SML" were a bit bigger.)
- CakeML is written in HOL4, which uses ML.
- SML, unlike OCaml, has a well-defined semantics, so there is a good starting point for a verified compiler.
Second, regarding OCaml, the simplest reason for choosing OCaml over SML is that there is only one implementation of the language, which everybody uses. This makes it a lot more likely that the OCaml compiler will still be around a few decades from now. There are several modern SML compilers, most of which are more or less unmaintained. Poly/ML, which everybody seems to be using these days, is substantially developed by a single person.
That said, the CakeML team welcomes a translator for OCaml that outputs CakeML programs. Additionally, the OCaml compiler's architecture was clean enough that Esterel was able to do the source-to-object-code validation required for its DO-178B-certified code generator. They said they had to do far less work modifying and analyzing it than they expected. That means the OCaml compiler itself might be a candidate for verification, albeit probably a non-optimizing form of it. I'd also try the K framework, which successfully handled the C semantics with its KCC compiler. If it can handle C, I'd venture a guess that it should handle a better-designed compiler and language. ;)
EDIT to add: OCaml syntax support is being worked on at the link below.
https://github.com/CakeML/cakeml/tree/master/unverified/ocam...
It doesn't matter whether you actually code with objects. The important point for adoption is that the language supports them.
So for a noob .NET programmer you say, hey, look at F#! You can code it just like C#. Kinda.
But as soon as they start coding, you tell them nope, all of these types of things are actually antipatterns.
The real value of objects in F# lies in the CLR compatibility. Access to the BCL and some existing code is a big advantage over OCaml for .neteers.
I think network effects do matter much more than you say. There is virtually no community around SML.
Side note: Cake stands for CAmbridge KEnt, which is where (most of) CakeML's verification was carried out.
The pioneering project in this space was X. Leroy's CompCert, the first verified optimising compiler. More precisely, it is a realistic, moderately optimising compiler for a large subset of the C language down to PowerPC and ARM assembly code.
Is this a first because it is theoretically difficult to do, or because it requires a lot of implementation time? What are some key points to read up on and understand in order to properly appreciate this result, beyond the Wikipedia article on formal verification [1]?
Thank you in advance for any elaboration.
- Provide a verified software toolchain for programmers with a minimal trusted computing base.
- Investigate how the cost (in a general sense) of formal verification in general and compiler verification in particular can be lowered, ideally to the point that normal programmers can routinely use formal verification.
The advance that CakeML makes over CompCert is bootstrapping: CakeML can compile itself, while CompCert (being a C compiler written in OCaml) can't. Simplifying a bit, bootstrapping lowers the trusted computing base, because you no longer have to trust an unverified compiler to build the verified one.
Maybe Leroy's [3, 4] are good starting points for learning about this field.
[1] T. Nipkow, G. Klein, Concrete Semantics. http://www.concrete-semantics.org/
[2] A. Chlipala, A verified compiler for an impure functional language. http://adam.chlipala.net/papers/ImpurePOPL10
[3] X. Leroy, Verifying a compiler: Why? How? How far? http://www.cgo.org/cgo2011/Xavier_Leroy.pdf
[4] X. Leroy, Formal verification of a realistic compiler. http://gallium.inria.fr/~xleroy/publi/compcert-CACM.pdf
I know of a few programs that work fine when compiled with -O2 but break with -O3. This is because some of the optimizations that get applied are just not what the programmer expected, or, in the older days, simply broke working code.
Formally verifying the output is difficult because you need to prove that the same computation is performed in both cases, even when the two versions look extremely different. I'm assuming that's where the difficulty comes in.
Just email them first in case someone has done the work already. Academics are sometimes slow to update web sites because they're digging deep into their research. ;) The best uses I can think of for CakeML are:
A reference implementation to do equivalence checks against, with the main implementation (in an ML or not) being something optimized.
Something to build other tools in that need high assurance of correctness. Prototype to get the algorithm right, using whatever brains and tooling already exist, with an equivalent CakeML program coming out of that. Then, that turns into vetted object code.
A nice language for writing low-level interpreters, assemblers, or compilers that bootstrap others in a high-confidence way. The idea comes up in verifiable or reproducible builds, where you want a starting point that can be verified by eye. Auditors can look at the CakeML and assembly output with some extra assurance on top of doing it by hand. One might even use the incremental compilation paper to build up a Scheme, ending with a powerful starting language plus assurance that the binary matches the code.
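The first use above (equivalence checks against a reference) is essentially differential testing. Here is a minimal sketch of the idea in OCaml; both sum functions are invented for illustration, standing in for a simple verified reference and an optimized production implementation.

```ocaml
(* Reference implementation: naive, obviously-correct sum of 1..n.
   In the CakeML scenario this would be the verified program. *)
let sum_ref n =
  let rec go acc i = if i > n then acc else go (acc + i) (i + 1) in
  go 0 1

(* "Optimized" implementation: closed-form formula, the version you
   actually ship and want to cross-check. *)
let sum_opt n = n * (n + 1) / 2

(* Equivalence check on many random inputs: any divergence between
   the optimized version and the reference trips the assertion. *)
let () =
  Random.init 42;
  for _ = 1 to 1000 do
    let n = Random.int 10_000 in
    assert (sum_ref n = sum_opt n)
  done;
  print_endline "equivalence check passed"
```

Random testing only samples the input space, of course; the appeal of a CakeML reference is that the reference side of the comparison is itself proven correct, so the check narrows trust to the optimized side alone.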
val num = 10
Then take a look at the x86 generation. What is all of that? It doesn't look like executable code that should be needed. Is that just implicit functions or something baked into the language? If it is, why isn't it being tree-shaken?