Algebraic Data Types are almost always one of the things I miss when I use imperative languages. I have to do Java at work, and while I've kind of come around on Java and I don't think it's quite as bad as I have accused it of being, there have been several dozen instances of "man, I wish Java had F#'s discriminated unions".
Obviously I'm aware that you can spoof it with a variety of techniques, and often enums are enough for what you need, but most of those techniques lack the flexibility and terseness of proper ADTs; if nothing else those techniques don't have the sexy pattern matching that you get with a functional language.
This C extension looks pretty sweet since it appears to have the pattern matching I want; I'll see if I can use it for my Arduino projects.
:)
To me it feels very similar to an interface (trait) implemented by a bunch of classes (structs). I have multiple times wondered which of those two approaches would be better in a given situation, often wanting some aspects of both.
Being able to exhaustively pattern match is nice. But being able to define my classes in different places is also nice. And being able to define methods on the classes is nice. And defining a function that will only accept a particular variant is nice.
From my perspective a discriminant vs a vtable pointer is a boring implementation detail the compiler should just figure out for me based on what would be more optimal in a given situation.
I do find it a little annoying that it's taken so long for Java to get a feature that, in my opinion, was so clearly useful; it feels like they were about a decade later on this than they should have been, but I'll take whatever victories I can get.
Java has https://github.com/functionaljava/functionaljava
which is unsupported but stable.
It wasn't that I thought that the JVM was incapable of doing something like an ADT, just that vanilla Java didn't support it. While it's easy to say that "companies should just use Kotlin", that's a bit of a big ordeal if you already have a 15 year old codebase that's written in Java.
I've heard of but never used the Functional Java library, though it'd be a tough sell to get my work to let me import a library that hasn't been updated in two years.
For Java, see https://www.baeldung.com/java-lts-21-new-features
Kotlin's: https://www.baeldung.com/kotlin/when
Make up your own mind.
Algebraic data types and pattern matching actually work really well in imperative languages, too. See eg Rust.
They've recently added support for compiler-enforced pattern matching over sealed classes, which I suppose does get you halfway there though.
Think Scala, Elm and Haskell have it as well.
Having that and Elixir's pattern matching would be insane.
I only sometimes use it as an "I would recommend this repo" -- how can one do that anyways, given that the repo could morph into something one would no longer recommend?
I've known C for almost 20 years, and never would I have thought the macro system was powerful enough to allow such black magic.
This is awesome!
The author is only 19 years old. I feel really dumb now.
They are kind of cursed but at their core they are actually incredibly simple and a reliable tool for reducing cognitive complexity and boilerplate in C based projects.
There is a lengthy blog post about the same stuff, except that the author doesn't seem to have come across the said wiki section yet: https://nandakumar.org/blog/2023/12/paradigms-in-disguise.ht...
Kudos to the dev of datatype99 for showing the problem with such ad-hoc methods in the readme right away.
Seems like a pretty big footgun. But otherwise, very cool.
I once wrote an immediate-mode UI out of macros, and this reminded me of that, although in my case it isn't a problem: some blocks are deliberately ones you can "break" out of. For example, you can use "break" to exit a win_form block ("goto" also works), while a win_command block does not capture "break", so using break (or goto) inside a win_command block will break out of whatever block the win_command is in (probably a win_form block; for example, this would commonly be used for a "Cancel" button).
So it's not just about being slightly better in some ways, but smoothing over so many paper cuts that it can be hard to see how they have added up over time across ecosystems, like CPython and co having so many of its own vocab types, or HPC libs.
For example, the problems this macro causes wouldn't even be problems in a well-written Rust macro. They're artifacts of smart people trying to work around C's limitations.
But then the macro wouldn't have been written anyway because this is a port of a native Rust feature (which means it gets taken advantage of in community software).
Let's say further that you already know Rust exists, and aren't going to use it for reasons that anyone writing a C program already knows.
At least consider Zig. Here's a little something I wrote in Zig two days ago:
/// Adjust a label-bearing OpCode by `l`. No-op if no label.
pub fn adjust(self: *OpCode, l: i16) void {
switch (self.*) {
inline else => |*op| {
const PayType = @TypeOf(op.*);
if (PayType != void and @hasField(PayType, "l")) {
op.*.l += l;
}
},
}
}
This uses comptime (inline else) to generate all branches of a switch statement over a tagged union, and to add an offset to members of that union which have an "l" field. You can vary the nature of the branches on any comptime-available type info, which is a lot, and all the conditions are compile-time; each branch of the switch has only the logic needed to handle that variant.

"But my program is already in C, I just need it for one file" -- right. Try Zig. You might like it.
Considered the example given. Now my eyes hurt. I think I started appreciating Lisp more.
https://gist.github.com/unclechu/eb37cc81e80afbbb5e74990b62e...
The difference here is that Nim compiles to C and you can turn the garbage collector off; Zig compiles C and there's no garbage collector. That means the entire standard library is available when generating object code. It's also trivial to opt in to the C ABI on a fine-grained basis, by defining a function or struct with the extern keyword.
I believe this is still fairly current about the difficulties of building Nim dylibs for C programs: https://peterme.net/dynamic-libraries-in-nim.html
I expect Nim will stabilize about where D has: it will have a dialect of the language which, with relatively painless accommodations, is able to produce object code which speaks C ABI. Zig is different. The language is relentlessly focused on providing a better alternative to C while occupying the same niche, and a lot of design time has been spent on making it practical to take an existing C program and start writing the new parts of it in Zig.
It's a good language, Nim, and getting better. I'd recommend it for someone who is considering Go, for example.
The only issue is you can't do a clean switch statement that matches on the specific value of a field, but nested switch statements aren't that messy.
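In C terms, a sketch of that nested-switch workaround might look like the following (the Shape type and its variants are hypothetical, just for illustration):

```c
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeTag;

typedef struct {
    ShapeTag tag;
    union {
        struct { int radius; } circle;
        struct { int w, h; } rect;
    } u;
} Shape;

/* No single switch can match "a circle with radius 0", but nesting
 * a second switch on the payload field stays readable enough. */
const char *describe(const Shape *s) {
    switch (s->tag) {
    case SHAPE_CIRCLE:
        switch (s->u.circle.radius) {
        case 0:  return "point";
        default: return "circle";
        }
    case SHAPE_RECT:
        return "rectangle";
    }
    return "unknown";
}
```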
The trouble with this approach is that there's a lot of mental overhead in dotting all of your i's and crossing all of your t's. It's draining, so you start to, e.g., shoehorn additional functionality into existing classes instead of making new ones.
You eventually wind up perceiving the abstraction as costly, which lessens your use of it at the expense of producing a more elegant solution to the problem(s) you're solving.
tl;dr? The ability to just state "Darmok and Jalad at Tanagra" is transformative when the alternative is telling an entire story every time you want to reference a complex idea.
The modelling aspects can be simulated, yes, but that's barely half of the benefits of ADTs. Pattern matching is a big ergonomic benefit.
The critical thing is that the compiler (or macro system) needs to check that you've checked all the alternatives.
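Plain C gets partway there: GCC and Clang's -Wswitch (part of -Wall) warns when a switch over an enum misses a case, but only if you resist writing a default. A small sketch of that discipline:

```c
typedef enum { COLOR_RED, COLOR_GREEN, COLOR_BLUE } Color;

/* With -Wswitch, the compiler warns if a Color case is missing here.
 * Adding a `default:` would silence that check, which is why
 * exhaustiveness in C is a convention rather than a guarantee. */
int brightness(Color c) {
    switch (c) {
    case COLOR_RED:   return 30;
    case COLOR_GREEN: return 59;
    case COLOR_BLUE:  return 11;
    }
    return 0; /* unreachable for valid enum values */
}
```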
datatype(
BinaryTree,
(Leaf, int),
(Node, BinaryTree *, int, BinaryTree *)
);
No names for the struct fields, so you need to rely on the position. And then used:
int sum(const BinaryTree *tree) {
match(*tree) {
of(Leaf, x) return *x;
of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
}
// Invalid input (no such variant).
return -1;
}
Where lhs, x, rhs magically match the types defined above. What a nonsense design! (And in fact Algol 68 had a better implementation than most later languages, but Algol 68 completely lacked documentation suitable for newbies, like tutorials and programming examples, and was not promoted by any hardware vendor, like IBM or DEC, so it was doomed.)
https://web.eecs.umich.edu/~bchandra/courses/papers/Hoare_Hi...
I also would love a future where ADTs are more common in imperative languages.
contrived :: ([a], Char, (Int, Float), String, Bool) -> Bool
contrived ([], 'b', (1, 2.0), "hi", True) = False
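For contrast, here's a hypothetical hand-written C version of that single Haskell equation (the struct layout, field names, and the `true` fallback are my assumptions, not from the original):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flattening of the Haskell tuple into a struct. */
typedef struct {
    int list_len;      /* stand-in for [a]; only emptiness matters here */
    char ch;
    int pair_i;
    double pair_f;
    const char *str;
    bool flag;
} Contrived;

/* Every nested pattern becomes one more hand-written condition to
 * keep in sync -- exactly the boilerplate being complained about. */
bool contrived(const Contrived *t) {
    if (t->list_len == 0 && t->ch == 'b' &&
        t->pair_i == 1 && t->pair_f == 2.0 &&
        strcmp(t->str, "hi") == 0 && t->flag) {
        return false;
    }
    return true; /* assumed result for all other inputs */
}
```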
To achieve a result like this using Zig's switch syntax would seem to involve a huge amount of boilerplate code and nested switch statements.

Most of the common languages today have product types.
Java[1], Rust, Haskell, etc. have sum types.
I think it gets a bit more esoteric beyond that though - I don't doubt that there's probably some Haskell extension for quotient types[2] or some other category-theory hijinks.
Most languages have ADTs built in.
[1] https://blogs.oracle.com/javamagazine/post/inside-the-langua... [2] https://en.wikipedia.org/wiki/Quotient_type
https://medium.com/dartlang/dart-3-1-a-retrospective-on-func...
You can do the same thing with boolean logic and just have not-and, but thankfully we have and, or, not, xor. Similarly, you don't need greater-than-or-equal because you can just write 'x > y || x == y'.
Comprehension is linked to how closely you can express the idea of what you're doing in the object language that you have to look at. It might be convenient to compile everything down to SK combinators so that your optimizer and evaluator can be simpler, but people should never look at that level (at least not until you suspect a compiler defect).
So we get to object-oriented programming, where our data expression has an AND property (a class has an INT field AND a STRING field), an existential property (interfaces: there exists some object with these methods), and inheritance (a truly bizarre feature where we duct-tape subtyping to a method-and-field grouping mechanism with a bunch of hooks).
With interfaces and inheritance you can simulate both a universal property (generic) and an OR property. But because it's not a direct expression, we leave this giant gap for what people intended to happen to diverge from what actually happens. Especially after time passes, defects are found, and requirements change. [For example, when using interfaces to simulate an OR property, there really isn't any mechanism to let everyone know that this construct is closed. So if something erroneously gets added, you won't know to check the entire code base. And if requirements change and you need to add a new case, then you have to check the entire code base. Completeness checking of ADTs gives you this for free in your pattern matches.]
Too many non-trivial architectural messes that I've encountered in my career have been due to either someone trying to solve all of their problems with interfaces or the same with inheritance* when a simple OR data structure would have made everything simple, clear, and correct.
[*] - Inheritance being more problematic when someone tries to create a non-trivially sized category hierarchy, which ruins the day when requirements change and suddenly the tree needs to be reorganized but doing so would invalidate entire swaths of the code base already accepting types with a different assumed (and undocumented) hierarchal tree structure. Thankfully most people have gotten the memo and switched to interfaces.
They were both introduced in the same decade.
I think that's actually wrong for "Sum types". Product types, sure. The idea of storing a bunch of fields in a single thing matches the way we've been organizing information since we started writing things down.
But I genuinely don't think I've seen an attempt at a sum/union/enumerant/whatever syntax in a programming language that wasn't horrifyingly confusing.
Where by extension: class-based inheritance is actually pretty simple to understand. The classic "IS-A" relationship isn't as simple as "fields in a struct", but it's not hard to understand (c.f. all the animal analogies), and the syntax for expressing it is pretty clean in most languages.
Is it the "best" way to solve a problem? Maybe not. Neither are ADT sum types. But I think there's a major baby-in-the-bathwater problem with trying to be different. I really don't think, for the case of typical coders writing typical code, that ADTs are bringing as much to the table as the experts think.
Syntaxes like: `A | B(Int) | C(String)`. That means A, B, or C.
> Where by extension: class-based inheritance is actually pretty simple to understand. The classic "IS-A" relationship isn't as simple as "fields in a struct", but it's not hard to understand
This value is either `A`, an Int (`B(Int)`) or a String (`C(String)`). Or: this knapsack either contains an A, B, or C. Difficult?
> (c.f. all the animal analogies),
Reminds me of the fact that software isn’t typically as static as animal taxonomies.
> and the syntax for expressing it is pretty clean in most languages.
I’m most used to Java, where you spread `extends` over N files. (Sealed classes in Java are an exception.)
It’s fine. I don’t understand how it is particularly clean.
> Is it the "best" way to solve a problem? Maybe not. Neither are ADT sum types.
Is this an argument? ’Cause I don’t see it.
> But I think there's a major baby-in-the-bathwater problem
Inheritance is something that needs to have some concrete implementation affordance. Baby in the bathwater? I don’t see how you bolt this onto the struct model in a way that gets out of the way for people who don’t want to use it (zero-cost abstraction, in the “you don’t pay for what you don’t use” sense, is important to some low-level languages).[1]
Maybe the designers of hypothetical language X thinks that algebraic data types is enough. What baby are you missing then?
[1] For algebraic data types: structs are straightforward enough. While the “sum type” can be implemented by leaving enough space for the largest variant. That one-size-fits-all strategy isn’t perfect for all use-cases but it seems to have been good enough for Rust which has a lot, a lot of design discussions over minutiae.
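That one-size-fits-all layout is easy to see with sizeof; a small sketch, with hypothetical variants:

```c
#include <stdint.h>

/* Hypothetical sum type: one tag byte plus space for the largest variant. */
typedef struct {
    uint8_t tag; /* 0 = small, 1 = large */
    union {
        uint8_t small;    /* 1-byte payload */
        double  large[4]; /* 32-byte payload */
    } payload;
} Sum;
```

On a typical 64-bit ABI the payload is aligned to 8 bytes, so sizeof(Sum) comes out around 40 even when only the one-byte variant is in use -- the space is reserved for the largest case.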
> trying to be different.
With a technology from the ’70s. I also saw your “one oddball new idea is a revolution” snark. You’re clearly being very honest.
data Bool = True | False

Simple to understand, a nightmare to debug, as you'll be chasing your data and data contracts across a ton of files.
It honestly does annoy me that a lot of mainstream languages still haven't really adopted ADTs; when Java 8 added a lot of (well-needed) new syntax, it felt like that was an ideal opportunity to add ADTs and pattern matching (though I'm sure that was easier said than done).
Well at least Java does now (as of Java 21) have pattern matching (including nested record destructuring) and sealed classes, which let you have decent sum types.
The one issue is that everything is nullable, but that's a wider Java issue.
Also, algebraic data types can be seen as a hierarchy consisting of an abstract base class and several final child classes. So it is an inheritance model, just a restricted one.
However
What we do see is a bunch of mathematical disciplines that end up creating properties like: AND, OR, Universal, Existential, Implication, (and a few others). They end up in places like: set theory, type theory, category theory, various logics, lattice theory, etc.
Now, maybe they're only copying one another and this is more of a memetic phenomenon. Or maybe they've hit upon something that's important for human comprehensibility.
That would be the 'evidence' of the positive effect of ADTs (scare quotes because it might just be math memes and not fundamental). But we can also think about what I feel is legit evidence for the negative effect of lacking ADTs.
Consider what happens if instead of having the standard boolean logic operators and, or, not, xor, we only have the universal not-and operator. Now a straightforward statement like: A && B || C becomes (((A !& B) !& (A !& B)) !& ((A !& B) !& (A !& B))) !& (C !& C). It's more complicated to tell what's actually supposed to be going on AND the '&&' simulation can get intertwined with the '||' simulation. The result being that requirements changes or defect fixes end up modifying the object level expression in a way where there is no longer any mapping back to standard boolean logic. Comprehensibility approaches zero.
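A quick truth-table check in C, with a nand helper standing in for !&, confirms that the NAND-only expansion of (A && B) || C agrees with the direct form:

```c
#include <stdbool.h>

/* nand(a, b) plays the role of the !& operator. */
static bool nand(bool a, bool b) { return !(a && b); }

/* Brute-force all eight assignments of A, B, C and compare the
 * NAND-only expansion of (A && B) || C against the direct form. */
bool expansion_matches(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int c = 0; c <= 1; c++) {
                bool ab = nand(nand(a, b), nand(a, b));          /* A && B */
                bool nand_only = nand(nand(ab, ab), nand(c, c)); /* ab || C */
                if (nand_only != ((a && b) || c)) return false;
            }
    return true;
}
```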
And we've seen this happen with interfaces and inheritance being used to implement what would otherwise be a relatively simple OR property (with the added benefit that pattern matching ADTs often comes with totality checking; not something you can do with interfaces which can always have another instance even up to and including objects loaded at runtime).
Swift, Kotlin and Scala all have had ADTs for a while, even Java has it now.
Niklaus Wirth is well known as a critic of Algol 68 (before the design of Algol 68 was finalized), but in the case of his variant records he has completely failed to create something competitive.
In practice you need to couple it with an enum, and your visitation mechanism is a switch statement. But C doesn't impose that on you and lets you do it as you see fit.
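Hand-rolled, the BinaryTree example from upthread comes out roughly like this (a sketch; the tag and field names are mine, not what datatype99 actually expands to):

```c
typedef struct BinaryTree BinaryTree;
typedef enum { LEAF, NODE } BinaryTreeTag;

struct BinaryTree {
    BinaryTreeTag tag;
    union {
        int leaf;
        struct { BinaryTree *lhs; int x; BinaryTree *rhs; } node;
    } as;
};

/* The discriminant is the enum; the visitation mechanism is a switch. */
int sum(const BinaryTree *tree) {
    switch (tree->tag) {
    case LEAF:
        return tree->as.leaf;
    case NODE:
        return sum(tree->as.node.lhs) + tree->as.node.x + sum(tree->as.node.rhs);
    }
    return -1; /* invalid tag (no such variant) */
}
```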
It also has all the features of Haskell, since you can implement a Haskell compiler in C.
The worst codebases to inherit as a maintenance programmer are the ones where people got clever with the C preprocessor. Impossible to debug and impossible to maintain.
This library on the other hand addresses a nasty papercut whose presence usually stops folks with modern language experience from choosing C when it might otherwise be valid. Plus you can't beat C's long-term stability.
Though I agree that 90+% who _think_ they still need C should probably move on to making Rust work for them, instead.