I have a hypothesis that an LLM can act as a pseudocode-to-code translator, where the pseudocode can tolerate a mixture of code-like and natural-language specification. The benefit is that it formalizes the human as the specifier (which must be done anyway) and the LLM as the code writer. This might also enable lower-resource "non-frontier" models to be more useful. Additionally, it tolerates syntax mistakes or, in the worst case, plain natural language if needed.
In other words, I think LLMs don't need new languages; we do.
That is, in the same way that event sourcing materializes state from a series of change events, this language needs to materialize a codebase from a series of "modification instructions". Different models may materialize a different codebase from the same series of instructions (like different compilers), or under different "environmental factors" (e.g. which database or cloud provider is available). It's as if the codebase itself is no longer the important artifact; the sequence of prompts is. You could also use this sequence of prompts to generate a testing suite completely independent of the codebase.
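The event-sourcing analogy can be made concrete with a small sketch (Python here, and the instruction tuples are invented for illustration): the codebase is just a fold over modification instructions, with an LLM standing in where this sketch does a plain assignment.

```python
# Hypothetical sketch: a codebase "materialized" from modification
# instructions, the way event sourcing folds change events into state.
instructions = [
    ("create", "cart.py", "class Cart: ..."),
    ("modify", "cart.py", "class Cart:\n    def add(self, item): ..."),
    ("create", "test_cart.py", "def test_add(): ..."),
]

def materialize(events):
    """Fold modification instructions into a {path: content} codebase."""
    codebase = {}
    for op, path, content in events:
        if op in ("create", "modify"):
            # A real translator would hand `content` to an LLM here;
            # different models could materialize different code.
            codebase[path] = content
        elif op == "delete":
            codebase.pop(path, None)
    return codebase

files = materialize(instructions)
```

Replaying the same instruction log through a different "compiler" (model) yields a different but hopefully equivalent codebase, which is the sense in which the prompt sequence, not the code, becomes the artifact.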
Back in the day, JetBrains tried revision-controlling ASTs (PSI nodes, in their parlance). That project was cancelled when it turned into a research challenge, roughly 10 years ago. At this point things may work out; time will tell.
Jonathan Edwards (of the Subtext language) has a lot of great research on this.
- LLMs can act as pseudocode to code translators (they are excellent at this)
- LLMs still create bugs and make errors; a reasonable hypothesis is that they do so at a rate in direct proportion to the "complexity" or "buggedness" of the underlying language.
In other words, give an AI a footgun and it will happily use it, unawares. That doesn't mean, however, that it can't rapidly turn your pseudocode into code.
None of this means that LLMs can magically correct your pseudocode at all times if your logic is vastly wrong for your goal, but I do believe they'll benefit immensely from new languages that reduce the kind of bugs they make.
This is the moment we can create these languages, because LLMs can optimize for things that humans can't. It seems possible to design new languages that reduce bugs in ways that work for LLMs but are less effective for people (due to syntax, ergonomics, verbosity, or anything else).
This is crucially important. Why? Because 99% of all code written in the next two decades will be written by AI, and we will also produce 100x more code than has ever been written before (because the cost of producing it has dropped essentially to zero). This means that, short of some revolution in language technology, the number of bugs and vulnerabilities we can expect will also grow 100x.
That's why ideas like this are needed.
I believe in this too and am working on something also targeting LLMs specifically; I've been working on it since mid-to-late November last year. A business model will make such a language sustainable.
It sounds like your plan is for it to write fewer bugs in NewLang, but that seems a bit hard to achieve in the abstract. Judging from bugs I've fixed in early LLM-generated code, it was just bad code: multiple variables for the same thing, especially. Recently they've gotten better at that, but it still happens.
For a concrete example, take any app dealing with points in time, which sometimes have a date attached and sometimes don't. And also: what even are timezones? The complexity is there because it depends on what you're trying to do. An alarm clock is different from a calendar is different from a pomodoro timer. How are you going to reduce the bugged-ness of that without making one of those use cases more complicated than it needs to be, given access to various primitives?
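The point-in-time ambiguity is easy to demonstrate; a small Python sketch (the use-case values are invented for illustration) shows that the "right" primitive differs per application, and that mixing naive and aware datetimes fails loudly:

```python
from datetime import datetime, time, timezone, timedelta

# Different use cases want different "point in time" primitives:
alarm = time(7, 30)  # wall-clock time, no date, follows the local zone
meeting = datetime(2024, 3, 10, 9, 0,
                   tzinfo=timezone(timedelta(hours=-5)))  # absolute instant
pomodoro_end = datetime.now(timezone.utc) + timedelta(minutes=25)  # duration

# Naive vs. aware datetimes are not comparable; the ambiguity surfaces
# as a TypeError rather than a silent bug:
naive = datetime(2024, 3, 10, 9, 0)
try:
    _ = naive < meeting
    comparable = True
except TypeError:
    comparable = False  # Python refuses to guess the missing timezone
```

A language could force the choice of primitive up front, but as the comment says, that risks making the alarm-clock case pay for the calendar case's complexity.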
This is something that could be distilled from industries like aviation, where the specification of software (requirements, architecture documents, etc.) is even more important than the software itself.
The problem is that natural language is inherently ambiguous, and people don't really grasp the importance of clear specification (how many times have I repeated "put units and tolerances on any limits you specify in requirements").
Another problem: natural language doesn't have "defaults". If you don't specify something, it's open to interpretation, and people _will_ interpret something instead of saying "yep, I don't know this".
This is literally what software developers are actually paid to do. They are not paid to write code. This is reinventing software development.
That's programming languages again. The real issue with LLMs now is that it doesn't matter if they can generate code quickly; someone still has to read, verify, and test it.
Perhaps we need a terse programming language that can be read quickly and verified. You could call that a specification.
The programming language can look more like code in parts where the specification needs to be very detailed. I think people can get intuition about where the LLM is unlikely to be successful. It can have low detail for boilerplate or code that is simple to describe.
You should be able to alter and recompile the specification, unlike the wandering prompt, which makes changes faster than normal version-control practices can keep up with.
Perhaps there's a world where reading the specification rather than the compiled code is sufficient in order to keep cognitive load at reasonable levels.
At the very least, you can read compiled code until you can establish your own validation set and create statistical expectations about your domain. Fundamentally, these models will always be statistical in nature, so we probably need to start operating inside that kind of framework if we really want to be professional about it.
Many of our traditional functional languages, ML family in particular, let you write hyper concise expressions (pure math if you’re in to that sort of thing), craft DSLs of unlimited specifiable power (‘makeTpsReportWith “new cover page format”’), and also in natural language (function names like `emptied cart should have zero items`).
I think if we did that and leveraged the type systems of those languages and the systematic improvements we see from ADTs and pattern matching in those languages, combined with a specification first approach like TDD, that we’d have a great starting point to have an LLM generate the rest of the system perfectly.
… yes, that is just writing Haskell/OCaml/F# with extra steps.
… yes, that level of specification is also the point with those languages where your exploratory type-diddling suddenly goes ‘presto’ and you magically have a fully functioning system.
I guess I’m old-fashioned, but sometimes I wonder if compilers are good for what they’re good for.
This is where LLMs slip up. I need a higher-level spec language where I don't have to specify to an LLM that I want the jpeg crop to be lossless if possible. It's doubly obvious that I wouldn't want it to be lossy, especially because making it lossy likely makes the resulting files larger. This is not obvious to an LLM, but it's absolutely obvious if our objects are users and user value.
A truly higher-level spec language compiler would recognize when actual functionality disappeared when a feature was removed, and would weigh the value of that functionality within the value framework of the hypothetical user. It would be able to recognize the value of redundant functionality by putting a value on user accessibility - how many ways can the user reach that functionality? How does it advertise itself?
We still haven't even thought about it properly. It's that "software engineering" thing whose very existence we kept arguing about.
It's just part of the software lifecycle. People think their job is to "write code" and that means everything becomes more and more features, more abstractions, more complex, more "five different ways to do one thing".
Many, many examples: C++, and Java especially circa 2000-2010, and on and on. There's no hope for older languages. We need simpler languages.
Of course someone eventually will, so I might as well: well, except for lisp-likes. I think the main reason programming languages grow and grow is that people want to use them in "new" (sometimes new-new, sometimes existing) ways. And how do you add new language features to a programming language? You change the core of the language in some way.
What if instead you made it really easy to change the core language from the language itself, when you need to, without impacting other parts of the codebase? Usually if you use a language from the lisp-"family" of languages, you'll be able to.
So instead of the programming language everyone uses growing regardless of whether you need it to or not, it can stay simple and relatively small for everyone, while the people who need more can grow their own hairballs "locally" (or be solid engineers and avoid hairballs in the first place, though that requires tenure or similar).
Just this week, I decided to start learning Kotlin because I want to build a mobile app.
Everything was going great until I reached lambda functions.
Honestly, I can't wrap my head around either their purpose or their syntax. I find them incredibly confusing. Right now, they feel like something that was invented purely to confuse developers.
I know this might just be one of those topics where you suddenly have an "aha" moment and everything clicks, but so far, that moment hasn't come.
Did anyone else coming from older, more imperative languages struggle this much with lambdas? Any tips or mental models that helped you finally "get" them?
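Not Kotlin-specific advice, but one mental model that tends to trigger the "aha": a lambda is just a function value without a name, something you can store in a variable or pass as an argument. Sketched in Python for brevity (the helper names are invented):

```python
# A lambda is an anonymous function value you can pass around.
def apply_twice(f, x):
    """Call the function value `f` on `x`, then on the result."""
    return f(f(x))

double = lambda n: n * 2      # equivalent to: def double(n): return n * 2
assert apply_twice(double, 3) == 12          # double(double(3))
assert apply_twice(lambda n: n + 1, 0) == 2  # defined inline at the call site
```

Kotlin's `{ n -> n * 2 }` is the same idea with different syntax; higher-order functions like `map` and `filter` are where lambdas start to pay off.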
And then we can look at multiple LLM-generated implementations to inform how the prompt might need to be updated further until it's a one-shot.
Now you have perfect intention behind code, and you can refine the intention if it's wrong.
p.s. a combination of the above fares very well during my agentic coding adventures.
Consider:
"Eat grandma if you're hungry"
"Eat grandma, if you're hungry"
"Eat grandma. if you're hungry"
Same words and entirely different outcome.
Pseudo code to clarify:
[Action | Directive - Eat] [Subject - Grandma] [Conditional of Subject - if hungry]
The code was always a secondary effect of making software. The pain is in fully specifying behavior.
But seriously, LLMs can transmit ideas to each other through English, which we do understand; we are screwed if it's another language lol
My understanding/experience is that LLM performance in a language scales with how well the language is represented in the training data.
From that assumption, we might expect LLMs to actually do better with an existing language for which more training code is available, even if that language is more complex and seems like it should be “harder” to understand.
This does fill up context a little faster, but (1) not as much as debugging the problem would have in a dynamic language, and (2) better agentic frameworks are coming that "rewrite" context history for dynamic, on-the-fly context compression.
This isn't even true today. Source: heavy user of claude code and gemini with rust for almost 2 years now.
This is such a silly thing to say. Either you set the bar so low that "hello world" qualifies or you expect LLMs to be able to reason about lifetimes, which they clearly cannot. But LLMs were never very good at full-program reasoning in any language.
I don't see this language fixing this, but it's not trying to; it just seems to be removing cruft.
I still experience agents slipping in a `todo!` and other hacks to get code to compile, lint, and pass tests.
The loop with tests and doc tests are really nice, agreed, but it'll still shit out bad code.
Additionally just the ability to put an entire language into context for an LLM - a single document explaining everything - is also likely to close the gap.
I was skimming some nano files and while I can't say I loved how it looked, it did look extremely clear. Likely a benefit.
Eventually AIs will create their own languages. And humans will, of course, continue designing hobbyist languages for fun. But in terms of influence, there will not be another human language that takes the programming world by storm. There simply is not enough time left.
This isn't really true. LLMs understand grammars really really well. If you have a grammar for your language the LLM can one-shot perfect code.
What they don't know is the tooling around the language. But again, this is pretty easily fixed - they are good at exploring cli tools.
In the long term I expect it won't matter. Already, GPT-3.5 was able to reason reasonably well about the basic semantics of programs in languages "synthesised" zero-shot in context, just by describing them as a combination of existing languages (e.g. "Ruby with INTERCAL's COME FROM") or by providing a grammar (e.g. simple EBNF plus some notes on new or different constructs), and it could explain what a program written in a franken-language it had never seen before was likely to do.
I think long before there is enough training data for a new language to be on equal grounds in that respect, we should expect the models to be good enough at this that you could just provide a terse language spec.
But at the same time, I'd expect the same improvement to future models to be good enough at working with existing languages that it's pointless to tailor languages to LLMs.
The characteristics of failures have been interesting: as I anticipated, an over-ambitious refactoring was a train wreck, easily reverted. But something as simple as regenerating Android launcher icons in a Flutter project was a total blind spot. I had to Google that like some kind of naked savage running through the jungle.
Getting the Doom sound working on it involved me sitting there typing "No, I can't hear anything" over and over until it magically worked...
Maybe I should have written a helper program to listen using the microphone or something.
As others said, the key is feedback and prompting. In a model with long context, it'll figure it out.
https://github.com/jordanhubbard/nanolang/blob/main/MEMORY.m...
Optimistically I dumped the whole thing into Claude Opus 4.5 as a system prompt to see if it could generate a one-shot program from it:
llm -m claude-opus-4.5 \
  -s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \
  'Build me a mandelbrot fractal CLI tool in this language' \
  > /tmp/fractal.nano
Here's the transcript for that. The code didn't work: https://gist.github.com/simonw/7847f022566d11629ec2139f1d109...

So I fired up Claude Code inside a checkout of nanolang, told it how to run the compiler, and let it fix the problems... which DID work. Here's that transcript:
https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5...
And the finished code, with its output in a comment: https://gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d0...
So yeah, a good LLM can definitely figure out how to use this thing given access to the existing documentation and the ability to run that compiler.
# Control flow
if (> x 0) {
(println "positive")
} else {
(println "negative or zero")
}
But that's nothing compared to the scream for a case/switch statement in the Mandelbrot example...

# Gradient: " .:-=+*#%@"
let gradient: string = " .:-=+*#%@"
let gradient_len: int = 10
let idx: int = (/ (* iter gradient_len) max_iter)
if (>= idx gradient_len) {
return "@"
} else {
if (== idx 0) {
return " "
} else {
if (== idx 1) {
return "."
} else {
if (== idx 2) {
return ":"
} else {
if (== idx 3) {
return "-"
} else {
if (== idx 4) {
return "="
} else {
if (== idx 5) {
return "+"
} else {
if (== idx 6) {
return "*"
} else {
if (== idx 7) {
return "#"
} else {
if (== idx 8) {
return "%"
} else {
return "@"
}
}

Maybe I'm missing some context, but all that actually should be needed in the top-level else block is 'gradient[idx]'. Pretty much anything else is going to be longer, harder to read, and less efficient.
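For comparison, the whole nested cascade collapses to a single index into the gradient string; a sketch in Python (the clamping behavior is assumed to match the quoted code, which falls through to '@'):

```python
def gradient_char(iter_count, max_iter, gradient=" .:-=+*#%@"):
    """Map an iteration count onto an ASCII shade, clamping at the end."""
    idx = iter_count * len(gradient) // max_iter
    return gradient[min(idx, len(gradient) - 1)]

assert gradient_char(0, 100) == " "    # idx 0, same as the nested version
assert gradient_char(55, 100) == "+"   # idx 5
assert gradient_char(100, 100) == "@"  # idx >= 10 clamps to the last char
```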
The thing that really unlocked it was Claude being able to run a file listing against nanolang/examples and then start picking through the examples that were most relevant to figuring out the syntax: https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5...
So I wonder how exhausting would it be to write in a language that required, for all functions, that they are tested with 100% path coverage.
Of course, this by itself wouldn't still be equivalent to proving the code, but it would probably point people to the corner cases of code quite rapidly. Additionally it would make it impossible to have code that cannot be tested with 100% path coverage due to static relationships within it, that are not (or cannot be) expressed in the type system, e.g. if (foo) { if (!foo) {..} }.
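Here is a minimal instance of the static relationship being described; in a hypothetical 100%-path-coverage language, the compiler would reject this function because one textual path can never execute:

```python
def weird(foo: bool) -> str:
    if foo:
        if not foo:  # dead branch: foo is already known to be True here
            return "unreachable"
        return "foo was true"
    return "foo was false"

# Only 2 of the 3 textual paths are executable, so no test suite can
# reach 100% path coverage; a path-coverage-mandatory language would
# flag the dead branch instead of letting it linger.
assert weird(True) == "foo was true"
assert weird(False) == "foo was false"
```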
And would such a language need to have some kind of dynamic dependency injection mechanism for mocking the tests?
I'm still skeptical of the value-add of having to teach a custom language to an LLM instead of using something like Lua or Python and applying constraints like test requirements onto that.
I think that a real-world file of source code will either be completely polluted by tests (they are way longer than the actual code they test) or become
fn process_order {
...
}
shadow process_order {
assert test_process_order
}
and the test code will be written in another file, and every function in the test code will have its own shadow function asserting true, to please the compiler.

Seems unlikely for an out-of-distribution language to be as effective as one that's got all the training data in the world.
Really needs an agent-oriented “getting started” guide to put in the context, and evals vs. the same task done with Python, Rust etc.
It has several such documents, including a ~1400 line MEMORY.md file referencing several other such files, a language specification, a collection of ~100 documents containing just about every thought Jordan has ever had about the entire language and the evolution of its implementation, and a collection of examples that includes an SDL2 based OpenGL program.
Obviously, jkh clearly understands the need to bootstrap LLMs on his ~5 month old, self-hosted solo programming language.
Summary:
- Co-created FreeBSD.
- Led UNIX technologies at Apple for 13 years
- iXsystems, led FreeNAS
- idk something about Uber
- Senior Director for GPU Compute Software at NVIDIA
For whatever it’s worth.
Apparently he did as well[1]: "The start of the 2.0 ports collection. No sup repository yet, but I'll make one when I wake up again.. :)" Submitted by: jkh Aug 21, 1994
[1] https://github.com/freebsd/freebsd-ports/commit/7ca702f09f29...
Interesting commit starting Ports 2.0: three versions of bash, four versions of Emacs, plus jove.
I might accidentally summon a certain person from Ork.
Quick reaction:
1. Nanolang is a total thought experiment. The key word in its description is "experimental"; whether it's a Good experiment or a Bad experiment can be argued either way, especially by language purists!
2. Yes, it's a total Decorator Crab of a language. An unholy creation by Dr Frankenstein, yes! Those criticisms are entirely merited. It wasn't designed, it accreted features and was a fever dream I couldn't seem to stop having. I should probably take my own temperature.
3. I like prefix notation because my first calculator was an HP calculator (the HP 41C remains, to this day, my favorite calculator of ALL TIME). I won't apologize for that, but I DO get that it's not everybody's cup of tea! I do, however, use both vi and emacs now.
Umm. I think that about covers it. All of this LLM stuff is still incredibly young to me and I'm just firing a shotgun into the dark and listening to hear if I hit anything. It's going to be that way for a while for all of us until we figure out what works and what does not!
- jkh
That's an educated assumption to make. But therein lies the issue with every LLM-"optimized" language, including the recent ones posted here oriented toward minimizing tokens: assumptions, unvalidatable and unfalsifiable, about the kind of output LLMs synthesize/emit when that output is code (or any output, really).
``` #this is where functions are defined and should compile and give syntax errors ```
:->r = some(param)/connected(param, param, @r)/calls(param)<-:
(Yeah, ugly, but the idea is there.) The point is that the behavior could change. In the streaming world it may, for instance, have guarantees about what executes and what doesn't in case of errors. Maybe transactional guarantees in the stream blocks compared to pure compile optimization in the other blocks? The point here isn't that this is the golden idea, but that we probably should think about the use cases more. High on my list of use cases to consider (I think):
- language independence: LLMs are multilingual and this should be multilingual from the start.
- support streaming vs definition of code.
- Streaming should consider parallelism/async in the calls.
- the language should consider cached token states to call back to. (define the 'now' for optimal result management, basically, the language can tap into LLM properties that matter)
Hmm... those are my top-of-the-head thoughts, at least.
What you want is something that is safe, performant, uses minimal tokens and takes careful note of effects and capabilities. Tests aren't really even important for that use case.
Hey, I found this project on December 23, and you just commented on another thing I posted: "amazing one shot that". I will give you an invite if you want (because it also does that); check my bio, will add contact details now...
It was posted to this site about 20 days ago and front-paged, and hilariously about half the comments were shooting it down; the top-voted comment was even "this is the worst website ever", lol xD. It has since gone invite-only to manage abuse (it's a very capable service and currently free).
It's capable of what you just mentioned, and it made the other site whose one-shot you said was amazing (I literally cut and pasted the comment into the prompt; the second prompt was "Good, now do it better").
context("Loading configuration from {file}")
Then you get a useful error message by unfolding all the errors at some point in the program where it makes sense to talk to a human, e.g. logs, RPC errors, etc.:

Failed: Loading configuration from .config because: couldn't open file .config because: file .config does not exist.
It shouldn't be harder than a context command in functions. But somehow Rust conspires to require all this error type conversion and question marks. It's all just a big uncomfortable donkey game, especially when you have nested closures forced to return errors of a specific type.
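The "unfold the chain" behavior being asked for can be sketched with Python's exception chaining, which is the closest stdlib analogue to anyhow-style `context` in Rust (the helper names here are invented):

```python
def load_config(path):
    """Attach human-readable context while preserving the original cause."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        raise RuntimeError(f"Loading configuration from {path}") from e

def describe(exc):
    """Unfold the cause chain into one 'because:'-joined message."""
    parts = []
    while exc is not None:
        parts.append(str(exc))
        exc = exc.__cause__
    return " because: ".join(parts)

try:
    load_config(".config-that-does-not-exist")
except RuntimeError as e:
    msg = describe(e)  # "Loading configuration from ... because: [Errno 2] ..."
```

In Rust, anyhow's `.context(...)` plus printing the error with the alternate `{:#}` format gives a similarly flattened chain without hand-writing error conversions.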
Clankers (AI assistants) also love to unwrap, and if you don't catch them you have an abort waiting for you.
(You should try to get something like that into various language specs; I'd love for you to succeed with it.)
EDIT: typo fixed.
It’s peculiar to see s-expressions mixed together with imperative style. I’ve been experimenting along similar lines - mixing s-expressions with ML style in the same dialect (for a project).
Having an agentic partner toiling away with the lexer/parser/implementation details is truly liberating. It frees the human to explore crazy ideas that would not have been feasible for a side/toy/hobby project earlier.
This is why I made nanolang always report line numbers accurately and also have a built-in "trace mode" where, with an environment variable set, it would tell you exactly what line number was producing which C code and which behaviors were being exhibited (state changes). The LLM uses this aggressively for debugging!
I am using a variation of spec-driven development.
I also have a few geometric 3D printed objects on my desk that I made with openscad as "printing challenges" and then beat my head against my 3D printers for hours trying to actually print them. They did not need to be printed, they serve no purpose other than to be aesthetically pleasing and educational. :)
NanoLang solves three problems:
LLM Code Generation - Unambiguous syntax reduces AI errors
Testing Discipline - Mandatory tests improve code quality
particularly the first.
LLM Code Generation - Unambiguous syntax reduces AI errors
Testing Discipline - Mandatory tests improve code quality
Simple & Fast - Minimal syntax, native performance
Design Philosophy:
Minimal syntax (18 keywords vs 32 in C)
One obvious way to do things
Tests are part of the language, not an afterthought
Transpile to C for maximum compatibility
Ehh, I don't think the overhead of inventing a new language makes up for the lack of training data around it. In fact, if you're close enough to Rust/C, then LLMs are MORE likely to make up stuff from their training data and screw up your minimal language.
(pls argue against this, i want to be proven wrong)
so like Go?
> Key Features; Prefix Notation
wow
NEXT!