I think Vera might be missing something here. In my experience, LLMs code better the less of a mental model you need and the more is spelled out in text on the page.
Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell, it's pretty bad, Erlang, not wonderful. You need much more of a mental model for those languages.
For Vera, not having names removes key information that the model would have, and replaces it with mental modelling of the stack of arguments.
How often do you read assembly to check what your compiler is doing?
There is a niche of people doing it when they have special constraints, but that's a tiny niche.
"No LLM has ever been trained on Vera. There are no Vera examples on GitHub, no Stack Overflow answers, no tutorials — the language was created after these models' training cutoffs. Every token of Vera code in these results was written by a model that learned the language entirely from a single document (SKILL.md [https://veralang.dev/SKILL.md]) provided in the prompt at evaluation time."
If LLMs do much better with Vera (or something like it) than with traditional languages, we may be entering a time when most machine-written code will be difficult for humans to review - but maybe that ship has already sailed.
I’m surprised by this. Most likely significant white space is a big part of the problem (LLMs seem horrible at white space). Functional with types has been a win for me with Gleam.
I don't think that follows. It could just be that there is way more Go and Java code to train on than Haskell and Erlang. Haskell's terseness and symbol-named operators probably don't help either.
Seems like a weird decision taken from a weird paper that makes everything harder for humans AND LLMs. Variable names give useful context when correctly named.
Surely, denser languages should be better for LLMs?
I think in the context of already trained LLMs, the languages most suited to LLMs are also the ones most suited to humans. Besides having the most code to train on, these languages fit limitations that humans share: if a language is too dense, you have to be very careful in considering how to do something; if it's too sparse, the code becomes a pain to maintain.
If I had to design one of these, I'd go for:
1. Token minimization (which may be circular, I'm sure tokens are selected for these models at least in part based on syntax of popular languages)
2. As many compile time checks as possible (good for humans, even better for machines with limited context)
3. Maximum locality. That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase. Because of how context and attention work. This is the one I don't see much in commercially popular languages. It's more of a declarative thing, "configuration driven development".
So, orthogonal to the accepted, common code organization idiom (no matter how infrequently adhered to)?
Fascinating! Just the other day I decomposed a massive Demeter violation into stepwise proxying "message passing." I was concerned that implementing this entire feature (well, at least a solid chunk of it) as a single, feature-scoped module would cause the next developer's eyes to glaze over upon encountering such a ball-of-mud, such a dense vortex of spaghetti.
But, as I drove home that evening, I couldn't help but wonder if I hadn't, instead, merely buried the Gordian lede behind so many ribbons of silk.
This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?
Throwing out another idea: I wonder if we could see some kind of equivalent of C header files for more modern languages, so that an LLM just has to read the equivalent of a .h file to start using a library.
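To make the header-file idea concrete, here is a minimal sketch of what generating such a summary could look like. It uses Python's standard `ast` module to keep top-level signatures and drop bodies. The function name `interface_summary` and the one-line stub format are my own assumptions for illustration, not an existing tool or convention.

```python
import ast


def interface_summary(source: str) -> list[str]:
    """Return one stub line per top-level function or class in `source`.

    Hypothetical helper: signatures are kept, bodies are replaced by `...`,
    roughly the way a .h file exposes declarations without definitions.
    """
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            # Keep the return annotation only if the source declared one.
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"{prefix} {node.name}({ast.unparse(node.args)}){returns}: ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return lines


library_source = (
    "def area(width: int, height: int) -> int:\n"
    "    return width * height\n"
    "\n"
    "class Shape:\n"
    "    pass\n"
)
for stub in interface_summary(library_source):
    print(stub)
```

A real version would presumably also want docstrings, methods, and type aliases; this only shows that the "declarations only" view is cheap to derive from source.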
There is no actual thought occurring. Arguably, we can say the same about a lot of humans at any given moment, but with machines there never is. It's all statistics.
You already lost me here. There's a reason variable names are a thing in programming, and that's to semantically convey meaning. This matters no matter whether a human is writing the code or an LLM.
So I wonder, doesn't this apply to function names too, which the author keeps in? I've seen LLMs use wrong functions/classes as well.
I think a proper harness, LSP and tests already solve everything Vera is trying to solve. They mostly cite research from 2021, before coding harnesses and agentic loops were a thing, back when they were basically trying to one-shot with relatively weak models (by modern standards).
Tokens combine multiple characters into a single vector. Attention computes similarity scores between vectors. This means you'd want each variable to be a single token so that the LLM can instantly know that two names refer to the same variable. If everything is numbered, the attention mechanism will attend every first parameter to every first parameter in every function. This means that the numbering scheme would have to be randomized instead of starting at zero.
Coding agents are now capable of using tools, including text search, which means that having the ability to look for specific variable names is extremely helpful. By using numbering, the author of the language has now given himself the burden of relying entirely on LSPs rather than innate model properties that operate on the text level.
So yeah, on a textual level, the language is designed for an era of LLMs that has been obsolete for a long time.
Agreed.
I'm working on a language designed for machines to write and humans to understand and review.
It doesn't seem worthwhile to have code nobody can understand.
Good luck managing hallucinations on that context
It doesn't have Hindley-Milner type inference, but it has very strong type inference.
We will get linearity soon thanks to and as part of the Capybara[1] effort.
Refinement types have long been a reality.
The whole new effect tracking thing is based on delimited continuations.
The Unison-style content addressability comes up now and then; maybe it will become a reality at some point. Though it's mostly not a language thing but more of a build system thing.
Scala is already great for LLMs, also for other reasons:
https://arxiv.org/html/2510.11151v1
[1] https://2025.workshop.scala-lang.org/details/scala-2025/6/Sy...
Elaborate a little here.
C# can do something similar with null references. It can require you to indicate which arguments and variables are capable of being null, and then compiler error/warning if you pass it to something that expects a non-null reference without a null check.
LLMs are stateless, they don't "remember" your bespoke programming language manual and examples between completion calls, so you have to repeatedly include all that with each and every completion call, which balloons the number of tokens used, reduces how much useful work you can do with the remaining tokens and attention, and is a costly waste of tokens and electricity and money.
That isn't anywhere near as effective or efficient as using the LLM's pre-existing training on billions of lines of well-known programming languages, manuals, tutorials, examples, code bases, Stack Overflow discussions, books, GitHub repos, PRs, etc.
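A back-of-the-envelope sketch of that overhead, to make the arithmetic explicit. The token counts below are purely hypothetical placeholders, not measurements of SKILL.md or of any model, and the sketch assumes the full manual is resent on every call with no prompt caching.

```python
def total_prompt_tokens(manual_tokens: int, task_tokens: int, calls: int) -> int:
    """Prompt tokens sent across `calls` completions when the language
    manual is re-included in every prompt (assumption: no caching)."""
    return calls * (manual_tokens + task_tokens)


def useful_fraction(manual_tokens: int, task_tokens: int) -> float:
    """Share of each prompt that is the actual task rather than the manual."""
    return task_tokens / (manual_tokens + task_tokens)


# Hypothetical numbers: a 9,000-token manual, 1,000-token tasks, 50 calls.
print(total_prompt_tokens(9_000, 1_000, 50))
print(useful_fraction(9_000, 1_000))
```

With a language the model already knows, the `manual_tokens` term is zero and the whole prompt is task.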
What is your extraordinary evidence for your extraordinary claims? Have you empirically measured how well it works, or is it just vibes and handwaving?
Whittling everything down so the language is relatively 1-to-1 with the structure of the compute. With little or no extraneous decoration.
So then why is the first mentioned and most obvious difference from other languages
> There are no variable names. @Int.0 is the most recent Int binding
LLMs are trained on code written by humans. They are most “familiar” with popular programming languages, have large datasets of examples and idioms to draw on. I don’t see the advantage of inventing a new language the machine must “learn” with syntax unlike anything it’s been trained on.
Validation and testing are also already things we do with human written code, too.
Edit: the more I think about it the more this seems like a really bad idea. Three more issues come to mind: 1) it becomes impossible to grep for a variable, which I know agents do all the time. 2) editing code at the top of the function, say introducing a new variable, can require editing all the code in the rest of the function, even if it was semantically unchanged! 3) they say it is less context for the LLM to track but now, instead of just having to know the name of one variable, you have to keep track of every other variable in the function
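Issue 2 can be sketched in a few lines. This is a toy model in Python, not Vera: it assumes only what the quoted line says, that a reference like `@Int.0` means "the most recent Int binding" and higher indices reach further back. The names `Ref`, `resolve`, and `shift_after_insert` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A nameless reference: the index-th most recent binding of `type_name`."""
    type_name: str
    index: int


def resolve(bindings, ref):
    """Look up `ref` in `bindings`, a list of (type_name, value) pairs
    in binding order, oldest first."""
    matches = [value for type_name, value in reversed(bindings)
               if type_name == ref.type_name]
    return matches[ref.index]


def shift_after_insert(refs, inserted_type):
    """Rewrite refs so they keep pointing at the same bindings after one new
    binding of `inserted_type` is introduced.

    Assumes every ref sits after the insertion point and targets a binding
    older than the inserted one.
    """
    return [Ref(r.type_name, r.index + 1) if r.type_name == inserted_type else r
            for r in refs]


# Two Int parameters (think width=10, height=3); the body reads both.
params = [("Int", 10), ("Int", 3)]
body = [Ref("Int", 0), Ref("Int", 1)]
print([resolve(params, r) for r in body])

# Introduce one new Int binding before the body: the untouched body now
# silently reads different values, so every Int reference must be rewritten.
with_new_binding = params + [("Int", 99)]
print([resolve(with_new_binding, r) for r in body])
print([resolve(with_new_binding, r) for r in shift_after_insert(body, "Int")])
```

With names, the same change touches one line and the rest of the body stays byte-for-byte identical, which is also why grep keeps working.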
The major design decision I'm a little skeptical about is removing variable names; it would be interesting to see empirical data on that as it seems a bit unintuitive. I would expect almost the opposite, that variable names give LLMs some useful local semantics.
Yes, you’re adding this layer of verification, but LLMs don’t think in ASTs or use formal logic.
They are statistical predictors, just predicting what the next token will be.
There is a reason they perform best with TS/PY and not Haskell: the difference in the size of the code corpus for each language.
The premise behind this seems to ignore all of that.
This reminds me of Dafny: https://dafny.org/
Actually, that's an interesting question: how good are LLMs at writing Dafny?
The right granularity for agents isn't files or lines, it's entities: functions, classes, methods. That's how both humans and agents actually think about code.
We built sem (Ataraxy-Labs/sem), which extracts entities from 30+ languages via tree-sitter and builds a cross-file dependency graph, as a basis for semantic version control and semantic diff. weave (same org) takes it further and does git merges at the entity level: it matches functions by name and merges their bodies independently.
The dependency graph also answers questions LLMs can't. I love the analysis based on ASTs.
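For readers who want a feel for entity-level analysis, here is a rough sketch of the general idea. It is not how sem or weave are implemented: it swaps in Python's built-in `ast` module for tree-sitter, handles a single file rather than a cross-file graph, and the name `entity_graph` is hypothetical.

```python
import ast


def entity_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function in `source` to the other top-level
    functions it calls by plain name."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body
                 if isinstance(node, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        # Collect every call of the form `some_name(...)` inside this entity.
        called = {call.func.id for call in ast.walk(node)
                  if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)}
        # Keep only calls to entities defined in this file, minus self-calls.
        graph[name] = (called & set(functions)) - {name}
    return graph


example = (
    "def load():\n"
    "    return 1\n"
    "\n"
    "def parse():\n"
    "    return load() + len('x')\n"
    "\n"
    "def main():\n"
    "    return parse()\n"
)
print(entity_graph(example))
```

A graph like this can answer "what breaks if I change `load`?" without any model in the loop, which is the kind of question the comment above is pointing at.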
This has been tried so many times already. It works nice for functions that only do some arithmetic. But in any real life system that pushes data around over the network or to databases, most things will happen inside effects which leaves the compiler clueless as to whether the function implementation does what it's supposed to do or not.
Don't get me wrong, I'm a big fan of using the compiler to improve productivity and I also believe strong typing leverages LLM power. But this kind of function specification is a dead end IMO.
Nanolang: A tiny experimental language designed to be targeted by coding LLMs
https://news.ycombinator.com/item?id=46684958
https://github.com/jordanhubbard/nanolang
SudoLang: A Powerful Pseudocode Programming Language for LLMs:
https://medium.com/javascript-scene/sudolang-a-powerful-pseu...
Programming Without People: Designing a Language for LLMs:
(ALaS (AI Language Specification)):
https://dshills.medium.com/programming-without-people-design...
LMQL ("LMQL is a programming language for LLMs."):
Which language is best for AI code generation? The answer might surprise you:
https://revelry.co/insights/artificial-intelligence/which-la...
1) Programming language training happens by volume, and the amount of JS/TS/Python out there, together with the rate it's growing at, is creating a training feedback loop, which means that for a few generations of models these will be the best-performing languages. It will be hard for a contender to spin up.
2) At some point, if we plateau on productivity, efficiency improvements will happen, which will open a door for programming languages that maintain productivity but are 10x cheaper.
3) I think more immediate gains are at the cloud level. IMO, one of the reasons Google Cloud is performing better (along with Firebase) is a much better overall CLI experience, leading to a pleasurable experience developing against it. This part of the market is ripe: whoever builds the most LLM-friendly cloud has a shot at shooting up. Hence projects like exe.dev, and whatever Cloudflare and Vercel are trying. It would be good to have some shakeup in the cloud world.
Anyway, this is where my thoughts are currently.
I do applaud the lang design effort as there are countless routes of accepting Jesus as your savior.
Is this a thing for the LLMs? As a human, I also prefer being told what went wrong and why and how to fix it, rather than `expected {`
Seemingly opposing forces.
I do, too!
It appears that the creator and I have had vastly different experiences with LLMs and their capabilities with complex code bases and complicated business logic.
My observations point to LLMs being much more successful when variables and methods have explicit, detailed names; it's the best way to keep them on track and minimize the chance of confusion, the next closest thing being explicit comments and inline documentation.
Poorly named and poorly documented things in a codebase only cause it to reason more on what it could be, often reaching a (wrong) conclusion, wasting tokens, wasting time.
Perhaps this divergence in philosophy is due to fundamental differences in how we view the tool at hand.
I do not trust the machine; as such, I review its output, and if the variables lacked names, that would be significantly harder. But if I had a "Jesus, take the wheel!" attitude, perhaps I'd care far less.