But there are many strongly-typed languages that look at types at compile time and decide how large they are, by which I mean, how many bytes of physical RAM they take. These languages have gotten better and better over time at the APIs they offer that give the benefits of this without incurring the programmer costs, but under the hood, no matter how slick they've gotten, there is still a mapping of type -> size & layout. In these languages, this sort of narrowing is a great deal more complicated, because it implies a new type, which will probably require more allocation. For instance, in the example given between a "string" and a "number", that definitely looks like two differently-sized types to me. This would become very complicated very quickly as you start having to allocate these things in ways that can't be transparent to the user, you have to pass these types around, etc. etc.
One would expect that the affordances of statically-typed languages would end up being very different. And in fact, they are. Many modern statically-typed languages can solve the same problems outlined in the post, they just solve them in a different way. In this case, with an Either, or more generally, a sum type. Then "getViewsOf" returns a value of that sum type and you can switch on it based on the value. It isn't exactly the same (in fact I'm not sure there's one aspect of it that's the same, let alone the whole), but it solves the same problems.
So my suggestion would be that there are in fact a lot of languages that essentially have the same feature. They just don't call it "flow typing" and it's not spelled the same way.
But you kind of get at the real answer toward the end despite talking around it most of your post: Sum types plus pattern matching solves the same set of problems union types with flow typing solves, and a bunch that the latter doesn't (like composability.)
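For concreteness, here's a hedged TypeScript sketch of the overlap (all names here are invented for illustration, not from the original post): a discriminated union doubles as a sum type, and an exhaustive switch plays the role of pattern matching, solving the same problem flow typing solves.

```typescript
// Hypothetical sketch: a discriminated union serves as a sum type, and an
// exhaustive switch plays the role of pattern matching.
type ViewsResult =
  | { kind: "views"; count: number }
  | { kind: "error"; message: string };

function describe(r: ViewsResult): string {
  switch (r.kind) {
    case "views":
      // r is narrowed to the "views" branch here
      return `views: ${r.count}`;
    case "error":
      // and to the "error" branch here
      return `error: ${r.message}`;
  }
}
```

TypeScript accepts the switch with no default because it can see the union is exhausted, which is exactly the flow-typing and pattern-matching overlap being described.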
What you call "speaking around" I call an important thing to understand about a lot of languages: At some point, your data will be physically laid out in memory. If you don't care about performance... and I mean that as a perfectly valid choice in many situations, not a snark... it doesn't much matter how it is. But if you do, and you selected a language based on that, it matters a lot, and you have to view every type system detail through the lens of "what does it look like in memory?" if you want to understand it. The choices this class of language makes for their type systems will never fully make sense if you are not paying attention to this, or if you gloss over the legitimate difficulties behind any "why don't they just...?" sort of question.
(In particular, you really don't understand how good Rust is until you look at it through this lens, then look at just how much simultaneous power and convenience they've drawn out while never compromising on the memory issues. It's an incredible piece of work. It probably isn't "optimal" because such things don't exist in our real universe, but it probably is as close as we'll ever see for that class of language.)
Any given typing judgement we might want to make about a particular term can be checked statically or not, and it can be checked dynamically or not.
If the language has terms (... that might be evaluated) that don't get checked statically or dynamically, then it is going to be weakly typed (with respect to those typing judgements we're considering).
If we've already checked something statically, we "know" that it will always hold, and checking it dynamically is in some sense redundant (provided we actually believe our checking to be correct and our environment to be sufficiently robust that things don't change out from under us) and I think that's a part of the reason the two feel opposed, but in principle you could check a property both statically and dynamically for a particular term.
Step back a little to the language level and they aren't at all opposed; you might want to check some properties statically and some dynamically (eg. static checking of class/interface vs dynamic nullability in Java) or check a given property statically for some terms and dynamically for others (eg. gradual typing).
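A small TypeScript sketch of that mix, assuming a made-up `parsePort` boundary function: the property "is a valid port number" is checked dynamically once at the untyped edge, and statically everywhere after.

```typescript
// Illustrative only: `parsePort` is a hypothetical boundary function.
// The "number in valid range" property is checked dynamically here...
function parsePort(raw: unknown): number {
  if (typeof raw !== "number" || !Number.isInteger(raw) || raw < 0 || raw > 65535) {
    throw new Error(`not a port: ${String(raw)}`);
  }
  return raw;
}

// ...and statically (as `number`) everywhere downstream, with no
// redundant re-checking needed.
const port: number = parsePort(JSON.parse("8080"));
```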
Pattern matching and sum types help, but they solve different problems.
1) The type is generic: A function may be called with a different type on each call site - but for each particular call site, the type is known at compile time.
2) The type is polymorphic - i.e. the full type is not known at compile time at all.
When the first case occurs, a compiler could either go the C++ way and generate different, non-generic variants of the function - or go the Java way and just pretend the type is polymorphic.
For the "C++" option, I think flow typing would fit very well without introducing additional complexity: For each variant the compiler generates, it knows which concrete type is used, so it can evaluate the condition at compile-time and just drop the branches that don't apply for a particular variant of the function.
For polymorphic types, on the other hand, the additional complexity is already priced in: You need some kind of mechanism to represent values of different structure anyway (pointers, unions, whatever) - so flow typing wouldn't be much more than some syntactic sugar to save you from writing casts you'd have to do anyway.
e.g.

    interface A { type: 'a' }
    interface B { type: 'b' }
    type C = A | B;
    const c: C = ...;
    if (c.type === 'a') { /* c is of type A here */ }

Take rust. You have a function that on one path of a branch returns an int, in the other, a float. Rust says that's an error. I say your function returns a union type :)
And this is totally pervasive. A function which returns an int in an if branch (w/out an else) and elsewhere returns nothing actually returns an optional.
A variable which is assigned in one branch to a reference to an int (an l-value in languages that screwed this up, like rust; a ref int in languages that, sensibly, use types to encode this property, like algol68), and to an int value in the other, is technically a union type. If that union is added to a float - that shouldn't be an error! The int should be widened, the ref int should be dereferenced, then widened.
OTOH, if you try to assign an int to that union - now that is an error, because values can't be assigned to. You can only assign to types which represent a location in memory, like ref int, unlike int.
In the above discussion, the effect of control flow on the types is critical. Languages like rust ignore this, and to my mind, their ergonomics suffer considerably because of this. C++ side-steps this through statement based syntax, which makes variant types clunky.
Union types are beautiful and powerful, but there seems to be a lack of appreciation for the subtle and deep ways they impact language design. You have to design them in right from the beginning; it's a tricky thing to get right, and doing so absolutely requires flow typing.
A first class sum type in c++ ala Rust or Haskell would certainly be appreciated, as the clunkiness of std::variant/std::visit is a well known annoyance.
Flow typing need not imply any kind of "narrowing", though. It could simply endow a control-flow branch with a statically-typed proof object, asserting whatever post-conditions were established in the branch. The "narrowing" could then become a separately-coded step, taking that proof object as part of its input arguments.
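One way to sketch that idea in TypeScript (all names here are invented; this is not an existing API): the branch produces a value that witnesses the post-condition, and the narrowing step is a separately-coded function consuming that witness explicitly.

```typescript
// Hypothetical "proof object" sketch: `NonNullProof` witnesses that a
// value was checked non-null; consumers demand the proof rather than
// relying on an implicitly retyped variable.
type NonNullProof<T> = { readonly value: NonNullable<T> };

function proveNonNull<T>(x: T): NonNullProof<T> | null {
  return x === null || x === undefined ? null : { value: x as NonNullable<T> };
}

// The "narrowing" step takes the proof as part of its input:
function useProof(p: NonNullProof<string | null>): number {
  return p.value.length; // licensed by the proof, not by control flow
}

const p = proveNonNull("hi" as string | null);
if (p !== null) {
  useProof(p);
}
```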
But if you mean what I think you mean, now your entire compiler suite needs to be reworked to change its internal concept of what guarantees a "type" has. Previously the compiler was able to assume it was a contiguous chunk of memory, but now it needs the concept of a non-contiguous chunk of memory representing a certain type, and it needs everything everywhere updated to accommodate that, and it may need to add code to make copies for things that didn't used to need to make copies if you create an array of this new type, etc. It goes on and on.
I'm not the world's expert on all this under-the-hood stuff, but there is still something to be said for learning a bit of C or something deep enough to understand how it interacts with the machine, and C probably is still the best language for this despite its manifold other flaws. There's all kinds of fun games you can play in conceptual math space to make awesome programming languages, but if you want your language to play in the "goes fast" space you will find only some of those games map to anything efficient on real machines. (There is certainly a space for languages that just provide the nice abstractions, too; between "C performance" and "Python performance" there's a lot of room to play!)
I do think that was the right call at that time.
Just that he picks a language that doesn't support union-types. But that doesn't mean that flow typing would be necessary here - it means that the language(s) should support union-types and extend their pattern matching accordingly.
In fact, I would say that flow typing is almost like a workaround for missing pattern matching.
If flow typing is 'type narrowing' (the article kinda dragged on pulling in unrelated concepts so it lost me), then if anything, it is at least as good as pattern-matching/switch-blocks. At least in Typescript. That is because it works in switch statements and non-switch statements and provides the same guarantees. I don't get this argument.
Consider something like:

    const data: {pages: number} | null = f()
    if (!data) {
        return 0
    }
    return data.pages + 5

Sure you can switch on the 'data' as well (exhaustive type-switches are a thing in Typescript too), but that's a syntactic preference. It surely is nice to get the same guarantees without _needing_ a switch statement.

Maybe it's because you focus on typescript here. But typescript is in a position that other languages are not: it has to make things work within an existing ecosystem, in particular with an untyped language. Under these constraints, flow typing might be the best solution. But that doesn't mean it's generally "at least as good as pattern-matching/switch-blocks".
Btw, pattern matching and switch-blocks are very different things - maybe it would be easier for you to dive into a language that has native pattern matching and no switch-blocks at all, because from what you write it seems to me that you haven't been exposed to pattern matching much yet.
Is it?

    let data: Option<usize> = f();
    match data {
        Some(pages) => pages + 5,
        None => 0
    }

Seems pretty nice.

Or if you want more structurally similar code with a guard (and removing the useless bits):

    let Some(pages) = f() else {
        return 0
    };
    pages + 5

For example, consider this Rust snippet:
    pub fn positive1(x: isize) -> Option<usize> {
        if x > 0 { Some(x as usize) } else { None }
    }

Unless I'm mistaken, without type narrowing, this cast would be impossible, and there would be no way to avoid the redundant runtime check caused by try_into().unwrap(), which is not even optimized out unless you trick the compiler.

Same-sized integer casts in Rust are no-ops [0]; the conditional isn't type narrowing, it just avoids the cases where the cast would not preserve the same semantic value.
[0] https://doc.rust-lang.org/reference/expressions/operator-exp...
    sealed interface Foo<T> permits Impl {} // union type
    record Impl() implements Foo<String> {} // product type

    <T> T m(Foo<T> foo) {
        return switch(foo) {
            case Impl impl -> "hello"; // flow typing T=String
        };
    }

    switch resp {
    case Result(val, meta):
        doStuff(val)
    case Error(msg, code):
        logger.main.debug(msg)
    }

    zip : Vec n t1 -> Vec n t2 -> Vec n (t1, t2)
    zip Nil Nil = Nil
    zip (Cons x xs) (Cons y ys) = Cons (x, y) (zip xs ys)

Here the 'Vec' type has two constructors, Nil and Cons. Since their types specify the same length, the type-checker knows we can ignore the 'zip Nil Cons' or 'zip Cons Nil' cases.

> Since their types specify the same length, the type-checker knows we can ignore the 'zip Nil Cons' or 'zip Cons Nil' cases.
Specifically, I have trouble understanding "their types specify the same length". Whose types?
I often want a single constructor/branch of an enum (sum type) to a be a type as well, specifically a sub-type of the full enum. So once I learn about what branch a value is, I can treat it like that and even first class pass it around – without unpacking.
I think this should be implementable in Rust without any runtime overhead as pure syntactic sugar. You get the same effect by creating a new struct for every set of values per branch and using those wrappers instead.
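For contrast, TypeScript's structural unions already give roughly this: each branch of a discriminated union is a named type in its own right, and a narrowed value can be passed around at that narrower type without any unpacking. A small sketch (all names invented):

```typescript
// Each branch is its own first-class type...
type Circle = { kind: "circle"; radius: number };
type Square = { kind: "square"; side: number };
type Shape = Circle | Square;

// ...so a function can demand exactly one branch, no unwrapping needed.
function circleArea(c: Circle): number {
  return Math.PI * c.radius * c.radius;
}

function measure(s: Shape): number {
  if (s.kind === "circle") {
    return circleArea(s); // s is passed along at the narrower type Circle
  }
  return s.side * s.side;
}
```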
    data Up
    data Down

    data Foo tag a where
        Up :: a -> Foo Up a
        Down :: String -> Int -> Foo Down a

When you enable all the necessary extensions for this to compile, it gives you exactly what you've asked for. If you leave the tag variable polymorphic in a function argument, you can receive values of either constructor. If you specify Foo Up a, you can only receive values with the Up constructor.

It might be a bit more verbose than you want, with needing to declare types to use as the type level tags. (I looked into using PolyKinds and DataKinds to use the constructor as its own type argument, but GHC doesn't allow that type of recursion.) But it does do exactly what you've requested - it allows you to treat each constructor as a separate type in some contexts, and allow any of them in other contexts.
It's a loooong time ago, but the pattern exhaustiveness checker of GHC wasn't up to this last time I tried. But my guess is things are much better now and this might actually work.
e.g. inside "if (x != null) { }", or after "x = y ?? throw new ArgumentNullException()" then x is known not null.
C# has flow types, but only in this specific and limited way. You could say that about a lot of programming language features.
Most languages don’t have the type system necessary to make this work in a sensible way.
Doing a null check means that after that you can treat the variable as non nullable. Doing a type check, narrows down the type to what you just checked. Or going down a switch statement branch on the type actually implicitly does the cast as well.
It's both strongly typed and convenient.
Kotlin even goes a step further and introduced contracts a few versions ago that you can bind to functions. For example calling isNullOrBlank() on a nullable string changes the type to nullable if the answer is false. You can write your own contracts for your own functions even. This is what the source code for this function looks like:
    @kotlin.internal.InlineOnly
    public inline fun CharSequence?.isNullOrEmpty(): Boolean {
        contract {
            returns(false) implies (this@isNullOrEmpty != null)
        }
        return this == null || this.length == 0
    }

I guess typescript does something similarish.

    function isFish(pet: Fish | Bird): pet is Fish {
        return (pet as Fish).swim !== undefined;
    }
and Typescript is smart enough to let you use it in an if-branch

    // Both calls to 'swim' and 'fly' are now okay.
    let pet = getSmallPet();
    if (isFish(pet)) {
        pet.swim();
    } else {
        pet.fly();
    }
Typescript also has typechecker support for assertion functions. So like,

    function assertIsDefined<T>(val: T): asserts val is NonNullable<T> {
        if (val === undefined || val === null) {
            throw new AssertionError(`Expected 'val' to be defined, but received ${val}`);
        }
    }

Which we can use like so:

    function doSomething(pet: Fish | undefined) {
        assertIsDefined(pet);
        // safe to use without checking for null/undefined,
        // because our assertion function narrowed the type for
        // usage later in the function
        pet.swim();
    }

For anyone confused like I was, that's a typo, it changes the type to not nullable. Likewise, isNullOrBlank() is an extension of the listed isNullOrEmpty().
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/is-...
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/is-...
> One of Typed Racket’s distinguishing type system features is occurrence typing, which allows the type system to ascribe more precise types based on whether a predicate check succeeds or fails.
One of the big advantages is that it lets a text editor use the added information to supply better auto-complete suggestions. It also makes it possible to eliminate some safety checks at runtime because the type checker has ensured that certain calls will be safe based on prior conditional checks.
[^1]: https://docs.racket-lang.org/ts-guide/occurrence-typing.html
There is a language construct which is purpose built for unpacking sum types: the case expression (sometimes called a match expression)

    case resp of
        Views n -> ...
        Error e -> ...

Inside each branch, we have access to a value of the more specific type.

Consider predicate types, or first-class null safety through unions.
If statements introduce propositions, of which type systems could take advantage.

example:

    let idx = randomInt();
    if (0 <= idx && idx < arr.length) { // proposition introduced
        // idx now has type: int where 0 <= idx && idx < arr.length
        // irrefutable proof of safe indexing by type propositions
        let value = arr[idx];
    }

This is one of the many reasons why I've found Kotlin so refreshing after five years of writing enterprise Java
The idea from a programming point of view is really nice and in a dynamic programming language environment everything is an object and you can pass whatever through any function. This doesn't work all that well if you want to translate your program into a statically typed environment, in a cost effective way.
You need some way to reason about the type information that hopefully doesn't create memory allocations all over the place.
This can be done by introducing tagged data types but it can also lead to a combinatorial nightmare where you need to generate a lot of extra code.
In general, I don't think all this complexity is warranted in a statically typed environment, and depending on what you are doing it doesn't end up being that important. Even then, it's mostly just that: a nice-to-have.
Take the example of a variant type, which is what I'm most interested in.
You have something like

    type X = int32 | string | boolean | SomeCustomType

When you then write a function that accepts X as a type, you need to be able to pass any of these things into that function. To do that you need an auto-generated user-defined type that can carry the information. In C that might look like this:

    struct {
        union { int a; char* b; bool c; } data;
        Tag type;
    };

then your type assertions can work on this. If you wrote this in my fancy language:

    if x != nil {
        // x cannot be nil here
        x.foo
    }

the equivalent C code for this would be something like

    void f(X x) {
        if (x.type != WELL_KNOWN_NIL) {
            x.data.foo; // we know it's safe to access foo (assuming we typed X as Foo | nil)
        }
    }
Trying to solve this at compile time without any runtime checks might be possible but I don't know that it would be better, it might be too slow if I have to solve some nasty combinatorial problems. This is at least somewhat easy to explain.
One question I have is whether there's any possibility that TypeScript could, in the long run, gain performance advantages over pure JS? That the compiler could leave behind some type information artifacts so that V8 (or similar) could use that information at runtime to optimize method dispatch and other operations?
Likewise with the flow pieces mentioned here, I gotta think that the VM could optimize out certain runtime checks on dispatch or conditional branching if it knows that the compiler has already checked for these?
It would have to be a new language, sharing maybe 90% of syntax etc. but differing in some places where interoperability with JS has forced compromises (base number types is one thing I can think of. I'd like separate int and floats, etc.) and, yeah, targeting WASM etc. And honestly, I'd be game for that. Esp if it came with performance advantages.
When you write your code in a way that is akin to what you would do in a static environment, V8 emits additional type checks, if these pass, then it will run your code through a optimized version of the code that makes certain assumptions about the data types in use. If these type checks fail, then V8 will deoptimize the function/code.
Deoptimization needs to happen because the assumptions made about the execution of the code was wrong and the optimized code cannot handle this special edge case. V8 will then revert to less optimal code for the specific case but this code is more general and can handle the special case that occurred.
V8 can and will toggle between optimized and unoptimized versions of your code now and then but it has limits. If it cannot settle on a version of your function that is optimized it will stop trying to optimize the code because the cost of doing so is significant.
When you write your JavaScript as if it was more statically typed than it actually is, you do enjoy optimization benefits from V8.
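As a sketch of what "writing JavaScript as if it were statically typed" means in practice (function and type names here are mine): every call site below passes objects of one consistent shape, so the engine can keep a single specialized version of the hot function instead of deoptimizing.

```typescript
// A consistent shape: always the same two fields, always numbers.
type Point = { x: number; y: number };

function dist(a: Point, b: Point): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Monomorphic call sites: the same hidden class every time, so V8's
// type checks keep passing and the optimized code keeps running.
const d = dist({ x: 0, y: 0 }, { x: 3, y: 4 });
```

Mixing in a call like `dist({ x: "0", y: 0 } as any, ...)` is exactly the kind of shape change that would fail those checks and trigger deoptimization.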
A better option is to make/use a language which interfaces well with JavaScript but is not JavaScript and actually checks its types. If the type system is actually sound, and any code coming in from JavaScript is type-checked at runtime, then compiler optimizations are reasonable.
As for your last point, I know that JIT-compilers like V8 can and do infer types at runtime (e.g. by noticing if a particular variable or function arg is always set to the same type) and will compile specialized function overloads which take advantage of the inferred type for optimizations (e.g. using fixed-width integers even though all JavaScript numbers are doubles).
I thought it also supported cases like:

    if (foo instanceof Bar) {
        // foo is typed as Bar in this block
    }

Is that true, or did I get that wrong?
    if (o instanceof String s) {
        var foo = s.indexOf("bar");
    }

Java 17 introduced a preview of the same feature for switch statements

    return switch (o) {
        case Integer i -> String.format("int %d", i);
        case Long l -> String.format("long %d", l);
        case Double d -> String.format("double %f", d);
        case String s -> String.format("String %s", s);
        default -> o.toString();
    };

Future versions will have improved support: http://openjdk.java.net/jeps/420

Most languages have embraced statements over expressions for a lot of language constructs.
This causes a lot of issues:
An if-statement is in essence a function from boolean to unit/() (i.e. from a barely useful type to a type with no useful information), while an if-expression will contain enough type information to at least provide this kind of “flow typing” (even if it induces an amount of boolean blindness.)
It’s even worse when you get to loop-statements, which often is a function from unit/() to (unit/() | bottom/⊥), compared to a map or a reduce or fold, which again provide enough type information to provide this kind of “flow typing”.
Sometimes you will of course need to provide impure conditionals or loops, but that can be made explicit in the type system.
IMHO statements in programming languages are an anti-pattern.
Expression-based languages tend to have pattern matching which provides another way to solve the same problem.
Flow typing is most useful in imperative languages where code like this is common:
    if (foo is! Bar) return "not a Bar";
    foo.someBarMethod();

Early returns and other imperative control flow is idiomatic and it's annoying if the static type system doesn't understand it.

In a more functional or expression-oriented language, early returns and other imperative control flow like that is rarer, so there's less "flowing" to type over.
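That Dart-ish snippet transposes directly to TypeScript (a sketch; `Bar` and `formatBar` are invented names), showing the flow typing an imperative early return relies on:

```typescript
type Bar = { someBarMethod: () => string };

function formatBar(foo: Bar | string): string {
  // Early return in idiomatic imperative style...
  if (typeof foo === "string") return "not a Bar";
  // ...and the static type system understands it: foo is Bar from here on.
  return foo.someBarMethod();
}
```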
Some of the most used languages with flow typing have statements and follow them for typing, so that is emphatically not the reason.
(Specifically for this point: “and follow them for typing”)
Flow typing is definitely a thing also in compiled languages, one example is Crystal, which is both compiled and has flow typing.
However, if I had a time machine and could redesign Dart from scratch, I would be tempted to avoid flow typing and instead do something more like Rust and Swift do: Variables keep their static type, but you have pattern matching and nice syntactic sugar for simple use cases of it to make it easier to bind new variables with the narrowed type.
The main problem is that flow typing is very complex, subtle, and can fail in ways that users find surprising. For example:
    foo(Object obj) {
      closure() {
        obj = "not int any more.";
      }
      if (obj is int) {
        print(obj.abs());
      }
      closure();
    }

This function is technically safe, but it's very hard for static analysis to reliably prove what kinds of flow analysis are valid when closures come into play. If that closure can escape, it can be impossible to prove that it won't be called before the variable is used.

A simpler, more annoying example is:
    class C {
      Object obj;
      foo() {
        if (obj is int) {
          bar();
          print(obj.abs());
        }
      }
      bar() {}
    }

This code looks like it should be fine. But if C is an unsealed class and some subclass overrides `bar()` to assign to `obj`, then the promotion could fail. Because of this, Dart can't promote fields and it causes no end of user annoyance. Top-level variables and static fields have similar limitations.

Even when it works, it can be surprising:
    foo(Object obj) {
      if (obj is int) {
        var elements = [obj];
      }
    }

Should `elements` be inferred as a `List<Object>` or `List<int>`? What about here:

    foo(Object obj) {
      if (obj is int) {
        var elements = [obj];
        elements.add("not int");
      }
    }

Should that `add()` call be OK or an error?

Flow analysis is cool and feels like magic. It does the right thing 90+% of the time, but there's a lot going on under the hood that pops up in weird surprising ways sometimes.
It's probably the right thing to do if your language has already invested in an imperative style. But if you have the luxury of defining a language from scratch, I think you can get something simpler and more predictable if you make it easier to define new variables of the refined type instead of mutating the type of an existing variable.
Dart is heavily imperative, so I think flow analysis makes sense for it. Accommodating an imperative style is one of the main things that makes Dart so easy for new users to pick up, and that's an invaluable property. But I admit I envy Swift and Rust at times.
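A sketch of that alternative in TypeScript terms (illustrative names only): binding the refined value to a fresh `const` sidesteps the closure problem entirely, because no closure can reassign the new binding out from under you.

```typescript
function foo(obj: unknown): number {
  if (typeof obj === "number") {
    const n: number = obj; // fresh binding with the narrowed type
    const closure = () => { obj = "not a number any more"; };
    closure(); // may mutate obj, but cannot touch n
    return Math.abs(n); // still safe
  }
  return 0;
}
```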
> almost every enterprise Java codebase I've had to work with has observed this pattern of testing whether an object is an instance of particular subclass, and using that to drive further computation. Even in the presence of such tests, Java demands that the author cast their objects appropriately. Of course you can always introduce an intermediate variable after the test, but I argue that this is still too much—upon verifying the instance of an object, the type system should be smart enough to update the object's type appropriately!
Because Java considers null to inhabit every type, in a Java project a few years ago, I handled this in a dynamic_cast-like way, as follows (abbreviated):
    public abstract class Security { // ...
        public Stock asStock() { return null; }
        public Future asFuture() { return null; }
    }

    public class Stock extends Security { // ...
        public Stock asStock() { return this; }
    }

    public class Future extends Security {
        public final SecurityTable factory;
        public final String exchange, symbol; // ...
        public Future asFuture() { return this; }
    }

This allows you to write relatively uncluttered type-dispatching downcasting code like this:

    public boolean contains(Security sec) {
        Future f = sec.asFuture();
        return f != null
            && SecurityTable.this == f.factory
            && f.symbol.equals(symbol);
    }

Of course this violates the open-closed principle (the abstract base class should be closed for modification) and official OO doctrine is that if you want different behavior for different subclasses you should put that behavior into a method that gets overridden by the subclasses, not in a conditional that attempts a downcast, so we did that a lot more often. But I found it a pleasant solution to the problem in the context of this Java project.

Of course it doesn't help in languages without implicit nullability, like TypeScript, which would ideally be all statically-typed languages.
— ⁂ —
The big difficulty with flow typing is, as I see it, not that it clashes with nominal typing; it's that it incorporates your compiler's static control flow analysis capabilities into your language's type system. Consider Hafiz's Java example:
    if (node instanceof DomNode.Element) {
        Layout layout = ((DomNode.Element) node).layout;
        return new RenderNode.Styled(layout, /* ... */);
    }
    return new RenderNode.Noop(/* ... */);

It is entirely reasonable to request that the type system handle this and allow you to write:

    if (node instanceof DomNode.Element) {
        Layout layout = node.layout;
        return new RenderNode.Styled(layout, /* ... */);
    }
    return new RenderNode.Noop(/* ... */);

But how about this? Now we're no longer inside the static extent of the `if`, and we're depending on the compiler to recognize the unconditional early return:

    if (!(node instanceof DomNode.Element)) {
        return new RenderNode.Noop(/* ... */);
    }
    Layout layout = node.layout;
    return new RenderNode.Styled(layout, /* ... */);

How about constant folding and partial evaluation?

    if (1 == 0 || node instanceof DomNode.Element) {
        Layout layout = node.layout;
        return new RenderNode.Styled(layout, /* ... */);
    }
    return new RenderNode.Noop(/* ... */);

Do we want to skip type checking entirely for dead code? Straightforwardly, flow typing gives us that the type of every variable inside unreachable code is void, the uninhabited type, so any operation whatsoever on it is type-valid. Do we want the compiler to accept code like this?

    if (1 == 0) {
        Layout layout = node.layout + node / node;
        return new RenderNode.Styled(layout, /* ... */);
    }
    return new RenderNode.Noop(/* ... */);

And of course doing control-flow analysis precisely isn't feasible due to the halting problem; you need to do some conservative approximation. So, if you want your programs to be portable from one version of the compiler to the next, somewhere you need to write down precisely what conservative approximation you're using for control-flow analysis, and in particular what you're not.
This 'flow typing' is a programming topic, nothing to do with mobile keyboards.