That said, I agree with the rest of your comment - static types for correctness and static types for performance are two different goals, and most gradually typed systems work towards the former and not the latter. (And more to the point: adding types alone is not sufficient for the latter. Python is not slow because it has no types, Python is slow because it isn't designed to be fast, and changing that now would require a lot of fundamental changes to the language and ecosystem, of which types would be one of the least relevant.)
What I've heard from people who work on performance is that type hints could potentially provide benefits, but they are not nearly as useful as other runtime improvements. The development effort is simply better spent elsewhere.
It is valid to critique gradually typed systems like Python for not improving performance.
But in a dynamic language like Python your options are limited.
Consider:
import random
def add(a: int, b: int) -> int: return a + b
assert add(1, 2) == 3
a, b = random.choice([(3, 4), ("foo", "bar")])
assert add(a, b) in (7, "foobar")
In the first call we use ints and are golden. In the second we get either ints or strings, and we cannot know which until runtime.
So if we are going to use type information for speed improvements, how should we proceed?
Some ideas.
1. Generate a special-purpose small (32- or 64-bit) integer version of the function in a custom type-aware bytecode format or ASM, plus a common version. Use run-time type checks to decide which one to use.
We generate two versions of the bytecode instead of one, and we run some optimizations on the integer version to actually get a speedup when we call it.
To do the integer optimization we probably need a new, type-aware bytecode format (or we compile directly to assembly) to gain an advantage over the interpreter, where everything, including integers, is an object.
When we call 'add' we just check the types and send the values to our new bytecode format, and bam: a faster add (possibly faster, anyway).
Having two versions of the function will take extra memory and increase startup latency (during compilation to bytecode).
In some cases we could call the integer version of add directly: in the first call we know we are working with ints. In the second call we cannot know whether we are dealing with ints or strings until the type check runs.
Will all of this be worth it? We pay with extra memory, slower startup, and an extra type check whenever we are not 100% clear on the types.
For hot code it could be worth it, but we do not know what code is hot and what is cold until we run the program. We could tell the compiler what is hot and what's not, like 'numba' does, but then we are going beyond typing.
So why does Python not do this? Well, for a start it's very unclear that we would gain performance from this extra compilation / bytecode swapping, but it would certainly slow down startup. It would also be a massive change to the internals of the standard interpreter and need tons of man-hours to support this beast.
At first glance this extra compilation does not look that hard, but the add function is very simple. What about more complex code that calls unknown methods that have not been defined yet, possibly even loaded very late at runtime? Will the ASM layer help or hinder when you need to go back and forth between the interpreter and optimized code? The interpreter really does not have enough information to make intelligent decisions on this.
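A rough sketch of the dispatch in option 1, written in plain Python. Everything here is an assumption for illustration: the names add_generic, add_small_int and is_small_int are made up, and the "optimized" version is just ordinary Python standing in for what would really be machine code on unboxed integers.

```python
# Hypothetical bounds for the "small integer" fast path (signed 64-bit).
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Counts which version handled each call, only so the dispatch is observable.
calls = {"fast": 0, "generic": 0}

def add_generic(a, b):
    # Common version: full dynamic dispatch via __add__ / __radd__.
    calls["generic"] += 1
    return a + b

def add_small_int(a, b):
    # Stand-in for the specialized version. Real specialized code would also
    # have to detect overflow of the 64-bit result and bail out to the
    # generic version.
    calls["fast"] += 1
    return a + b

def is_small_int(x):
    # type(x) is int deliberately rejects bool and other int subclasses.
    return type(x) is int and INT64_MIN <= x <= INT64_MAX

def add(a, b):
    # The run-time type check that picks one of the two versions.
    if is_small_int(a) and is_small_int(b):
        return add_small_int(a, b)
    return add_generic(a, b)
```

Note that the check runs on every call, which is exactly the overhead being weighed against the speedup.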
2. We could go the route of static languages. Compile only a special case integer version.
Assuming that we always use the function correctly this could be fast.
But this would be extremely unsafe in a dynamic language: if we don't have an integer but do integer addition on the object's internals, anything could happen.
Now if you have full static typing you can do this; really, you should do exactly this. It is safe to assume that we are calling the function with the correct types, since that is enforced at the language level. That opens the door to all kinds of optimizations that the dynamic nature of Python prevents.
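To make the "not enforced" part concrete, here is a small check of what the annotations on add actually do at run time. It only shows that the hints are metadata; it does not demonstrate the unsafe compiled case itself.

```python
def add(a: int, b: int) -> int:
    return a + b

# The hints are stored on the function object...
hints = add.__annotations__

# ...but nothing checks them when the function is called, so a call with
# strings goes through without complaint.
result = add("foo", "bar")
```

A compiler that trusted these hints and emitted integer-only code would have no guarantee that callers respect them.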
3. Just use a JIT like 'pypy'.
A JIT would do basically option 1, but only after hot code paths have been identified. Startup time is less of an issue since we delay compilation until we are sure we need it (or fairly sure at least).
Type hinting could help guide the JIT if you are using fairly specific types such as 'int32' instead of int, to signal that we are only interested in small numbers. But the information might also be misleading, so the JIT might opt out of using it altogether.
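A toy illustration of the "only after it is hot" idea, again in plain Python. This is not how PyPy works internally; jit_like, HOT_THRESHOLD and the stats dictionary are all invented for the sketch, and the fast path is a stand-in for compiled code. It ignores the annotations entirely and relies on the types it observes.

```python
import functools

HOT_THRESHOLD = 1000  # hypothetical cutoff for calling a function "hot"

def jit_like(func):
    stats = {"calls": 0, "only_ints": True, "specialized": False}

    def int_fast_path(a, b):
        # Stand-in for code compiled under the assumption "both args are int".
        return a + b

    @functools.wraps(func)
    def wrapper(a, b):
        ints = type(a) is int and type(b) is int
        if stats["specialized"]:
            if ints:
                return int_fast_path(a, b)  # guard holds: use fast path
            return func(a, b)               # guard fails: fall back
        # Still profiling: count calls and remember what types showed up.
        stats["calls"] += 1
        stats["only_ints"] = stats["only_ints"] and ints
        if stats["calls"] >= HOT_THRESHOLD and stats["only_ints"]:
            stats["specialized"] = True
        return func(a, b)

    wrapper.stats = stats
    return wrapper

@jit_like
def add(a: int, b: int) -> int:
    return a + b
```

Cold functions never pay for specialization, and the guard keeps the string case correct even after the int fast path is installed.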
-------------------------------
CPython does not do (1) since it's not clear that you get any substantial speedup and startup latency is a big deal. It does not do (2) since it's super unsafe. And it's not a JIT (3).