Show HN: Marsha – An LLM-Based Programming Language

(github.com)

140 points · ISV_Damocles · 2y ago · 66 comments


56 comments · 16 top-level

vector_spaces · 2y ago · 11 in thread

I don't think it is correct to call this a programming language.

This program enforces some structure on your specifications for LLMs and provides some guardrails, which is absolutely a move in the right direction, but these relate more to formatting than to specification, i.e. it provides syntax without transparent or unambiguous semantics. Ultimately this is more of a markup format than a programming language. Since:

> The next section is the description of the function. Here you explain what the function should do. Being more explicit here will reduce variability in the generated output and improve reliability in behavior, but it's up to you just how explicit you will be and how much you leave to the LLM to figure out.

There are reasons that natural language is problematic when specifying systems! I wish I could tape a sign that says this to the top of HN given the number of projects on the front page the past few months calling all sorts of things compilers that are just thin wrappers over the OpenAI API.

These projects frame programming with natural language as though it is something desirable, like it is the holy grail of computer science that will open up engineering to the masses. But we use programming languages not to make programming difficult, but because unrestricted natural language is problematic. Systems are best specified with highly constrained languages that are (more) semantically unambiguous. Without sufficient constraints, there's a good chance that we don't even know what we ourselves are talking about when we specify systems -- how can we hope then that computers will?

Even software engineers misinterpret and misunderstand requirements: requirements are poorly specified, or they are (apparently) well understood but the space of possible inputs and use cases is not. This is why mathematicians use a (loosely) restricted jargon and notation, and even they run into difficulties.

All of that said: LLMs are surely a useful tool for software engineering and systems-building -- I personally use them most days to that end. But make no mistake that this is a markup language with some added guardrails to help users make better specifications for LLMs and verify the output. It is not a programming language, and programming with natural language is not something that is generally possible or even desirable.

FireInsight · 2y ago

> calling all sorts of things compilers that are thin wrappers over the OpenAI API

Well, compilation and transpilation are kind of related, and translation is kind of an ambiguous version of transpilation, and LLMs are kind of a more nondeterministic superset of natural language translation (processing), which really makes calling it a compiler a huge stretch, I agree.

This all makes me wonder, could it be possible for an LLM to spit out the bits, or even assembly, for a hello world program written in another programming language. That'd kind of make it a really bad compiler.

obeavs · 2y ago

Nicely considered. I've spent the last two months writing a paper on how natural language (and the ambiguities within) drive most of the inefficiency in today's financial markets. Properly designed DSLs for these spaces (which we've been working on) would do absolute wonders for how anything from housing development to critical infrastructure is financed.

It's remarkable how much clearer logic becomes when semantic flourish (and the associated ambiguity) of the English language is thoughtfully constrained away, and it's always seemed interesting to me that computer engineers see natural language as the target.

phosphor.co/20x if you'd like to see how this works in contract law and finance :).

ibestvina · 2y ago

I'm always drawn to these types of initiatives, and your whitepaper looks (at least with my very limited knowledge of the domain) interesting.

What I am always wondering, and maybe you can give some details here, is the following: isn't the fact that regulations are in natural language, with all its ambiguity, a necessary requirement to have the system operate without being fully specified?

In other words, wouldn't any kind of strict DSL force us to think through all the edge cases that might possibly arise, instead of dealing with them when they do arise, which is basically what the judiciary is for? And isn't that a price too high to implement these kinds of systems?

Der_Einzige · 2y ago

If it's Turing complete, it's a programming language. That's it. There are no other requirements.

williamstein · 2y ago

Turing completeness is a property that a programming language may or may not have. BlooP is a non-Turing-complete programming language: https://en.m.wikipedia.org/wiki/BlooP_and_FlooP

mepian · 2y ago

Let me introduce you to https://en.wikipedia.org/wiki/Total_functional_programming

jachee · 2y ago

So, Magic: the Gathering (being Turing complete[0]) is a programming language?

[0]: https://arxiv.org/abs/1904.09828

ISV_Damocles (OP) · 2y ago

I don't believe that I can change your mind on this, so I didn't intend to respond, but as this is the top comment, I do want to provide a rebuttal on why we do think this is actually a programming language, that the code we have written is actually a compiler, and why Marsha is a useful exploration of the programming language design space.

First, a programming language is just a syntax to describe functionality that could be turned into an actual program. Lisp[1] was defined in 1958 but didn't have a full compiler until 1962. Was it not a programming language in the intervening 4 years? Marsha does not fall into this, since it can already generate working code, but the bar for what is a programming language, I believe, is lower than most would immediately think.

Second, a programming language does not need to be imperative to be a programming language; otherwise languages like Lean[2], where you write proofs and the compiler then figures out how to generate code that fulfills them, would not be programming languages. Lean, Coq, and other such languages are much more technically impressive than Marsha, true, but they share the property that you describe the properties a function should have and the compiler then generates the program that fulfills those properties.

Marsha differs from these proof-based languages in that poor specificity still produces some sort of program instead of a compilation error, which makes it sort of like JavaScript, which will do something with the code you write as long as it is syntactically valid. This is not a desirable property of Marsha, but it is a trade-off that in practice makes it more immediately usable to a larger number of people than Lean or Coq, because the skill level required is lower.

This is also, as you allude to, the current state of the world in most software development -- project managers come up with high-level requirements for new features, technical leads on engineering teams convert this into tasks and requirements for individual contributors who then write the code and tests which are then peer reviewed by the team as a sanity check and then committed. This process may or may not cover all situations and the specifications at all levels are likely not as rigorous as what Lean would require of you.

Marsha mimics this process, starting from the tech lead level and bleeding into the individual contributor level. The type and function descriptions are analogous to the tech lead requirements and the examples are analogous to the test suite the individual contributor would write. Just like in real world development, if these are not well specified, the resulting code will likely have logic bugs that would need to be addressed with a stricter definition and improved test cases.

The compiler consumes this definition into an AST[3], walks the tree to generate intermediate forms, and generates an output in a format that can be executed by a computer. Some use "transpiler" for a compiler that targets another language, but that is a subset of compilers, not a separate kind of tool, in my opinion, or the Java compiler would be a "transpiler" for the JVM bytecode format that is also not directly executable by a computer.
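A toy sketch of the parse, walk, and emit stages described above, for integer arithmetic only. It is not Marsha's compiler (Marsha's own parser is linked as [3]); it assumes Python's standard `ast` module as the parser, a hypothetical `StackCodeGen` visitor that emits a stack-machine intermediate form, and a tiny hypothetical `run` interpreter standing in for the executable target.

```python
from __future__ import annotations

import ast
import operator

# Hypothetical opcode names for a tiny stack machine (the "intermediate form").
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
BINOPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


class StackCodeGen(ast.NodeVisitor):
    """Walks an expression AST and emits postfix stack-machine instructions."""

    def __init__(self) -> None:
        self.code: list[tuple[str, int | None]] = []

    def visit_Expression(self, node: ast.Expression) -> None:
        self.visit(node.body)

    def visit_BinOp(self, node: ast.BinOp) -> None:
        # Post-order walk: emit both operands, then the operator.
        self.visit(node.left)
        self.visit(node.right)
        op = OPCODES.get(type(node.op))
        if op is None:
            raise SyntaxError(f"unsupported operator: {type(node.op).__name__}")
        self.code.append((op, None))

    def visit_Constant(self, node: ast.Constant) -> None:
        if type(node.value) is not int:
            raise SyntaxError("only integer literals are supported")
        self.code.append(("PUSH", node.value))

    def generic_visit(self, node: ast.AST) -> None:
        # Anything not handled above is outside this toy language.
        raise SyntaxError(f"unsupported syntax: {type(node).__name__}")


def compile_expr(source: str) -> list[tuple[str, int | None]]:
    tree = ast.parse(source, mode="eval")  # 1. parse the source into an AST
    gen = StackCodeGen()
    gen.visit(tree)                         # 2. walk the tree, emitting intermediate code
    return gen.code                         # 3. output for the target "machine"


def run(code: list[tuple[str, int | None]]) -> int:
    """Executes the emitted instructions on a simple operand stack."""
    stack: list[int] = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(BINOPS[op](left, right))
    return stack.pop()
```

For example, `run(compile_expr("2 + 3 * 4"))` evaluates to 14, while anything outside the toy grammar (division, variable names) is rejected at compile time with `SyntaxError`.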

We are still in the very early stages with Marsha and agree that more syntax could be helpful -- we already have 4 different syntactic components to Marsha versus the fully open-ended text entry behavior of GitHub Copilot or ChatGPT. But what makes Marsha interesting (to me) is that it makes it possible to explore a totally new dimension in programming language design: the formalization of the syntax to define a program itself. In many papers on new algorithms, the logic is described as a human-readable list of steps without the hard specificity of programming languages, improving the ability of the reader to understand the core of the algorithm, rather than getting bogged down in the implementation details of this or that programming language. There is still a formalism, but it differs from that of traditional programming languages, and Marsha lets you work with your computer in a similar way.

Are there cases where this is a bad idea? Absolutely. Just like there are cases where writing your code in Python is a bad idea versus writing it in Rust. There is no perfect programming language useful for all scenarios, and there probably never will be. But there will be a subset of situations where the trade-offs Marsha provides make sense. By being more forgiving than even the most forgiving interpreted languages out there, Marsha is in a good position to fill that niche if the primary barrier is difficulty.

[1]: https://en.wikipedia.org/wiki/Lisp_(programming_language)#Hi...
[2]: https://en.wikipedia.org/wiki/Lean_(proof_assistant)
[3]: https://github.com/alantech/marsha/blob/main/marsha/parse.py...

satisfice · 2y ago

If I understand this correctly, the source code of a Marsha program does not fully determine the running code. And we aren’t talking about immaterial optimizations, the LLM could do vastly different things with the same Marsha source.

A programmer is a human who connects the world of humans with the world of machines. To do this, he is required to sufficiently understand both worlds. On the human side this requires social competence and professional accountability, which machines don’t have. On the computing side, it requires at least that machines behave in predictable and comprehensible ways. Marsha appears to fall short on both counts.

Using an LLM for programming is inherently irresponsible. The people arguing in favor of doing so have not subjected LLMs to any kind of rigorous testing. They simply have unshakeable faith.

I am in the midst of a careful review and surgical takedown of a 9000 word demonstration of ChatGPT’s supposed ability to help testers test. It took maybe 20 minutes for some drooling consultant fan-boy to produce the demo. It has so far been about 30 hours of work to carefully pore through each sentence and show how it is wrong. I am doing the testing and critical thinking that the original consultant failed to do.

The Marsha site has a brief line about how it produces “tested” Python code. The one thing you can bank on with LLMs is none of you big eyed enthusiasts have a serious attitude about testing. It’s all simplistic demonstration.

I’m frustrated by this culture of fawning adoration of unproven and unprovable tools. I hope this trend peaks and becomes a generally acknowledged joke soon! Then we can resume with craftsmanship and responsible engineering.


amoss · 2y ago

> First, a programming language is just a syntax to describe functionality that could be turned into an actual program. Lisp[1] was defined in 1958 but didn't have a full compiler until 1962. Was it not a programming language in the intervening 4 years?

The claim that you make here is not true, and the example that you give does not support your claim. A programming language is more than just syntax - it is the combination of both syntax and semantics together to give a computational meaning to the strings in a language. This is not controversial, this is emphasized in the introduction to any textbook on compilers / language theory so I'll just give you one easy to google reference for this claim:

* https://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/p/Progra...

A programming language is more than a well defined set of strings. Each of those strings defines a particular computation. This is not true of natural language, where any definition of semantics relies on the semantics of the natural language.

For your specific example of lisp, here is the original 1958 letter:

* https://dl.acm.org/doi/10.1145/368405.1773349

As you can see the description is more than just the syntax of expressions - it describes the evaluation process and how to perform it. This is different from a modern description of semantics as it predates the introduction of operational and denotational styles by a couple of decades.

From the same era, here is ALGOL; again, it is more than the syntax, as a description of the semantics is required to define which computation is being written down in the language:

* https://www.softwarepreservation.org/projects/ALGOL/report/B...

One of the pillars that you are building your argument upon is very faulty, and I think it would be good to take a moment and consider what that means. Marsha is clearly a program synthesis tool. It is clearly automated in the production of programs. It looks useful in the overall process of programming. But describing it as a programming language is not helpful or useful. Watering down language and definitions does not help to explain what Marsha is or can do, and when you have made something new there is no particular need to try to fit it into an old label that means something else.

wokwokwok · 2y ago

> Marsha does not fall into this, since it can already generate working code, but the bar for what is a programming language, I believe, is lower than most would immediately think.

Well, we can go back and forth about the technical definition of individual words all day, but 'is it a programming language?' is such a vague question, the argument is basically meaningless.

Do you want to put that label on it? Ok. Someone else disagrees? Huh. Someone called something else a programming language? Someone disagreed with that?

eh...

Since it's purely opinion based, who cares? There's no answer which is 'right'.

I would argue that regardless of semantic details about terminology, there is a fundamental difference between what you're doing here and most common programming languages:

You can have:

1) A series of instructions to do a task, which can be unambiguously mapped into a series of instructions in another format.

2) A series of instructions to do a task, which is mapped non-deterministically into a series of instructions in another format.

Just like you have functions (deterministic) and probability functions (non-deterministic), there is a difference here between those two things.

...

In this case, you're basically generating non-deterministic imperative logic; that's obviously and unambiguously distinct from a deterministic sequence of imperative logic.
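A toy illustration of the two categories, assuming a hypothetical instruction "double x" and a hand-picked list of candidate translations. Real LLM decoding is far more complex; `random.Random.choice` only stands in for sampling so the contrast is visible: the table lookup always returns the same snippet, while the sampler can return a plausible-looking but wrong one (`x ** 2`).

```python
import random

# Hypothetical candidate translations of the instruction "double x".
# A sampler may pick any of them; the last one looks plausible but is wrong.
CANDIDATES = ["x * 2", "x + x", "x ** 2"]


def deterministic_translate(instruction: str) -> str:
    """Case 1: a fixed mapping, so the same input always yields the same output."""
    table = {"double x": "x * 2"}
    return table[instruction]


def sampled_translate(instruction: str, rng: random.Random) -> str:
    """Case 2: the output is drawn from a distribution over candidates."""
    if instruction != "double x":
        raise ValueError(f"no candidates for {instruction!r}")
    return rng.choice(CANDIDATES)


def run_snippet(snippet: str, x: int) -> int:
    # Evaluate the translated snippet with only x bound and no builtins exposed.
    return eval(snippet, {"__builtins__": {}}, {"x": x})
```

With a fixed seed the sampler is reproducible, but the source instruction alone no longer determines which snippet, and therefore which result for `x = 3` (6 or 9), you get.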

It is novel; it is interesting. ...but I don't think it's worth the argument about 'is it a programming language'; it's clearly very different from existing languages.

> improving the ability of the reader to understand the core of the algorithm, rather than getting bogged down in the implementation details of this or that programming language. There is still a formalism, but it differs from that of traditional programming languages, and Marsha lets you work with your computer in a similar way.

I applaud this intent, but I'm skeptical.

Once again, you are non-deterministically mapping the 'core logic' of the algorithm into a sequence of deterministic steps that may or may not match the request. That's the point; it's non-deterministic.

It could do anything; the P value of it doing something crazy might drop, but it's not zero; and fundamentally, how can you rely on a system where the instructions you give may or may not map to the machine code output?

You add tests? Sure... but, those are generated too right?

You have to dance through a series of tighter and tighter hoops to try to reduce the P value of "crazy hallucination and chaos", but I see no meaningful insight here about how you plan to mitigate that problem completely?

...and if you don't mitigate it completely, unlike a constraint solver, the non-deterministic output you get cannot be validated to be correct...

It's not about specifying the syntax in a different more readable form; it's about confidence that the output matches the constraints of the input; and I don't see that here.

Given the context length (and nature of large contexts in general) in LLMs, I also ponder whether it's even possible to do this beyond the trivial form, because it seems like as the constraint set scales, the capability of any LLM to address those constraints (and to be confident that it has) seems like a difficult problem to solve.

However, I would like to say that I see this domain as an interesting area of research; and most certainly neither a) a solved problem, or b) a dead end. There's definitely stuff here worth playing with and exploring.

...regardless of if people think of it 'as a programming language', or not.


andreygrehov · 2y ago · 7 in thread

Sharing a comment of mine (that got downvoted) from another, unrelated, thread. IMHO, it somewhat applies here as well:

> Looking back, we can see how Machine Code, with its intricate and challenging nature, paved the way for more accessible options. Assembly language then emerged, providing a higher level of abstraction and reducing the complexities of directly working with machine instructions. And of course, C followed suit, offering even greater simplicity and ease of use compared to Assembly.

> Imagine a future where programming languages, as we know them today, become akin to CPU instructions – a foundational and low-level primitive. LLMs will revolutionize the way we interact with code, providing a unified interface where the complexities of various languages are distilled into a common representation. The proliferation of individual programming languages will wane. Knowing Java or C++ will become a rare skill, akin to individuals specializing in low-level optimizations using Assembly language these days.

> As time progresses, even the convenience of LLMs may pose challenges, given our inherent tendency towards laziness, so an additional layer of abstraction will be introduced, bridging the gap between LLMs and spoken languages. BCIs will revolutionize the act of coding itself so that individuals can seamlessly "code" by simply "thinking" about their desired actions.

albert_e · 2y ago

Mostly agree.

Not sure why you were downvoted -- but this seems fairly straightforward in terms of logical progression of abstraction.

I would say LLMs may or may not be that next step in the chain before we see direct interfaces with brain for creating new programs -- but they sure seem promising.

BCI itself will still need to go through a few steps where first we can reliably select from choices (click buttons) across diverse use cases, and then graduate to inferring intent / extracting logic or language etc. -- to reach a stage where we are able to ask a question of Alexa/Siri or prompt ChatGPT in a coherent way by simply thinking about it. Even the hardware would need to achieve a form factor that can be adopted by the masses.

By that time hopefully LLMs would have graduated to a level that can take that input and give us robust programs / API calls etc. Or something else other than LLMs emerges by that time to fulfil the same function.

But once we get there though -- our communication with machines need not be constrained solely by clicks and words or programming constructs and logic -- but could take on a much more complex and unconstrained form. We can think in abstract concepts, ideas, visualizations, feelings and emotions -- and all of them are rich fodder as input for BCI + generative models.

For example, I could think of musical ideas and generate actual music live -- or "save" musical ideas as .RAW thoughts for later processing.

Or I think of the concept of frisson and imagine the feeling of goosebumps -- and am able to generate content / physical design ideas that result in that sensation or state of mind.

Telepathy is one step away from there.

mathbalduino · 2y ago

Agree. It’s just that the majority of Hacker News users will resist such a major change

Right now, it’s not that reliable, of course, but eventually it will be. As everything else, it will evolve

Don’t think it will be production-ready in the next 10 years or so, but eventually it will happen…

(I’ll save this reply to revisit in the future lol)

TheMode · 2y ago

> Right now, it’s not that reliable, of course, but eventually it will be. As everything else, it will evolve

What makes you think that natural languages will ever become reliable/stable? They haven't been for millennia.


nborwankar · 2y ago

I have a fundamental disagreement with the premise that abstraction from CPU instructions should be the basis of a language model. I posit that the starting point should be lambda calculus, which is formally equivalent to Turing-machine-based theories. Lambda calculus -> functional languages -> algebraic-type-aware languages (Haskell?) -> languages used in theorem proving (Agda?..) is a progression that allows logic, provably correct programs, and reasoning about programs. Messy natural language can be used by an LLM as a way to collect requirements, which it can translate into formally correct tests and specifications; these can be used to instruction-train LLMs that could (insert frenetic hand waving here) write provably correct (or close) code in, say, Haskell or Agda. In summary, lambda calculus is a better starting point because it’s logically closer to formal languages and reasoning.

andreygrehov · 2y ago

My speculation was not about LLM being an abstraction around CPU instructions. I don't think such a jump would produce a reliable code, so I hands down agree with you. In fact, your proposal is on par with what I had in mind.

bryanrasmussen · 2y ago

You seem to believe that code will be able to be written in natural language?

Many people argue this will never happen because of the ambiguity of spoken language. I suppose you will agree that the level of ambiguity in spoken language will never decrease significantly, given that it does not seem to have done so over the course of human evolution.

IF you do believe it will decrease - how?

I agree there will probably be thought -> computer interfaces at some point, but it seems to me, from what I understand of human thought, that non-verbal thought can be even more ambiguous than verbal thought. In any case, how precisely people can think varies greatly. Surely some people will be more suited to precise thinking and be the ones who write programs, perhaps using some sort of tool to focus their thinking and make it more precise, such as writing, which I have found is often a great enabler of focus and clarity.

If, however, you agree that ambiguity will not decrease, what does this mean for the inherent bugs introduced by ambiguity? How do you see the debugging process? Do you think that the ambiguity of spoken language creates more and greater bugs than the sort of bugs that occur today in programs? I personally believe that ambiguity bugs will be a whole new order of bugs layered on top of all the other types of bugs we have now.

Thus, due to the problems of ambiguity in human thought, I do not believe this vision will come to pass, although I believe there will be spoken instructions to computers that do particular things.

quickthrower2 · 2y ago

Take a typical remote-first development effort like an OSS project. How does this get developed? Language as text!

So at some level it must be possible to specify what you mean in language and have agents turn that into code.

I think what you need is more refined language. Don’t just say I want X. Give examples (these become unit tests). Have the LLM ask clarifying questions.

Whether we can make something useful today like that with GPT4 I am not sure but it will be possible.

The language crafting (and understanding the importance of accuracy) will be the hired skill when “programmers are fired”. That is a harder skill to AI-ify.

sudosteph · 2y ago · 7 in thread

Is there a new trend to give AI/tech products common women's names, (Alexa, Macie, Ada, Clara, Julia, etc)? The only male named tech product I can think of really is Watson, and that's old hat by now. It honestly kind of weirds me out and I feel for people who have to share their names with a product. At least Siri and Cortana seem pretty unique in that respect. Fwiw, I'm sure plenty of products with traditionally male names exist, I just can't recall encountering many at work or casually.

FireInsight · 2y ago

I think it's some weird manifestation of women being culturally more often in jobs as assistants to men and men in the leading roles. That and wanting the AI to seem approachable and human.

ISV_Damocles (OP) · 2y ago

Well, a prior project we worked on was named Alan[1]. The choice was somewhat arbitrary: https://marsha.ai was available and we thought it was a fine name so here we are.

[1]: https://alan-lang.org

quickthrower2 · 2y ago

Bard (neutral but makes me think of Shakespeare), Claude?

swyx · 2y ago

there's also Jarvis and Alfred, on the male side, but yeah, it's a known cognitive bias to implicitly associate subservient things with female names. not much to do except be aware and try to correct for it. i think there's like 3 AI assistants named Samantha. obviously inspired by the movie, but does show a lack of imagination.

gs17 · 2y ago

ELIZA probably started it back in the 60s.

bestcoder69 · 2y ago

Hm, well, ChatGPT is a boy.

FireInsight · 2y ago

> As an artificial intelligence language model, I don't have a gender or personal identity. I'm just a program designed to assist and provide information to users like you. How can I help you today?

_ea1k · 2y ago · 5 in thread

This feels inevitable to me. All software engineering problems can be solved by the addition of another layer of abstraction.

Why not abstract away the "how" layer and focus completely on the "what" layer?

For a lot of data processing and integration problems, this would both eliminate a lot of work and increase reliability.

pombo · 2y ago

I'm not sure I agree that an abstraction is always the answer, but your last sentence outlines precisely one of our motivations behind Marsha. You can specify an imperative set of steps in the description of a function/program, or you can (in the future) write a Python function that you use from Marsha.

_ea1k · 2y ago

The first sentence was a little tongue in cheek, so I'm glad that you didn't 100% agree with it. :) But it was an all-too-common paradigm in my early years as a Java developer.

ModernMech · 2y ago

Because a lot of the time, exactly how is very important. The devil is in the details.

pombo · 2y ago

I would say that it depends. Python and JS, two of the most used programming languages, abstract away memory management and threading details that most of the time you don't need, but you can "drop down" to write code that does pay attention to that within the language (building an object pool and reusing it, or memoization, etc) and if that's not enough, go to the "deeper" language like C/Rust to handle those cases with first-class primitives.
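A minimal sketch of the two in-language techniques mentioned, in Python: memoization via the standard `functools.lru_cache` decorator, and a hypothetical `BufferPool` class that reuses `bytearray` buffers instead of allocating new ones. The pool is only an illustration, not a production allocator (it is not thread-safe and never shrinks).

```python
from functools import lru_cache
from typing import List


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each fib(k) is computed once, then served from the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


class BufferPool:
    """Minimal (hypothetical) object pool handing out reusable fixed-size bytearrays."""

    def __init__(self, size: int, count: int) -> None:
        self._size = size
        self._free: List[bytearray] = [bytearray(size) for _ in range(count)]

    def acquire(self) -> bytearray:
        # Reuse a free buffer when one is available; otherwise allocate a new one.
        return self._free.pop() if self._free else bytearray(self._size)

    def release(self, buf: bytearray) -> None:
        # Zero the contents so the next user never sees stale data.
        buf[:] = bytes(len(buf))
        self._free.append(buf)
```

Both stay inside the language; dropping down to C or Rust is the next step only when this kind of control is not enough.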

_ea1k · 2y ago

That's what all the C engineers told me about manual memory management.

Of course, there are cases where they were right.

cabalamat · 2y ago · 2 in thread

How easy would it be to make it output programs in a functional language, such as Haskell? It would be nice if it could be made to produce functions that are guaranteed to be free from side effects.

In the long term I can imagine that the output languages of code generators like this might be ones specially designed to work well with AI code generators.

ISV_Damocles (OP) · 2y ago

So, targeting other languages is on our roadmap, and we have found higher reliability when we stick to a functional style with Marsha, so I think Haskell will be one of the "good ones" amongst target languages, eventually.

But we are focusing on Python first because it's one of the most popular languages on the planet and many LLMs out there are able to generate high quality code for it, while less-used languages tend to produce worse results. Haskell may or may not need a manually-tuned LLM to generate solid results.

cabalamat · 2y ago

> I think Haskell will be one of the "good ones" amongst target languages, eventually

Another advantage of Haskell is that its side-effect-free nature makes it harder for a rogue AI to put in malicious code, thus helping to solve the alignment problem.

It may be that languages and OSes will be specifically designed to put everything in a sandbox (or multiple layers of sandboxing), which will also be helpful when humans write code.

> while less-used languages tend to produce worse results

I figured that might be the case.

> Haskell may or may not need a manually-tuned LLM to generate solid results.

One possibility would be to write some Haskell programs that illustrate paradigms in the language and train the LLM on them. Then get the LLM to solve similar problems, and automatically test how good its solutions are on grounds of solving the problem (passing test cases), performance (time and space used), and how easy it is for humans to understand the programs.

Then use those solutions to train the LLM to be better at Haskell, and give it another set of slightly harder problems to solve.

In this way it would learn programming pretty much as humans do, by solving incrementally harder problems.
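A sketch of the automatic scoring step proposed above, written in Python rather than Haskell for consistency with the rest of this page. It assumes candidate solutions are callables and test cases are `(args, expected)` pairs; `score_candidate` and `rank_candidates` are hypothetical names. It measures only pass rate and wall-clock time; space usage and human readability would need separate tooling, and the training step itself is out of scope.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Case = Tuple[tuple, Any]  # (positional args, expected result)


@dataclass
class Score:
    pass_rate: float  # fraction of cases whose result equals the expectation
    seconds: float    # total wall-clock time over all cases


def score_candidate(candidate: Callable[..., Any], cases: Sequence[Case]) -> Score:
    """Run one candidate on every case; assumes `cases` is non-empty."""
    passed = 0
    start = time.perf_counter()
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    elapsed = time.perf_counter() - start
    return Score(pass_rate=passed / len(cases), seconds=elapsed)


def rank_candidates(candidates: Sequence[Callable[..., Any]],
                    cases: Sequence[Case]) -> List[Tuple[Score, Callable[..., Any]]]:
    """Highest pass rate first; ties broken by faster total time."""
    scored = [(score_candidate(c, cases), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0].pass_rate, pair[0].seconds))
    return scored
```

The top-ranked solutions from one round would then be the material for the next training pass and the next, harder problem set.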

Winse · 2y ago · 2 in thread

I thought this would kind of just be a pile of garbage, but I have to admit I was drawn in. There are some interesting novel pieces in Marsha and I am somewhat impressed with this project.

weego · 2y ago

The most interesting thing for me is them providing examples of the function made me realise this is the logical conclusion of TDD.

Write tests to build the input -> output contracts, have AI build the logic that conforms to those contracts.
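A minimal sketch of that TDD reading using Python's standard `unittest`: the input/output examples become a test case, and any implementation, hand-written or generated, is accepted only if it satisfies them. `slugify` and its examples are hypothetical and do not come from Marsha's syntax.

```python
import unittest

# Input -> output examples, as a spec might list them (hypothetical).
EXAMPLES = [
    ("Hello World", "hello-world"),
    ("  Marsha  rocks ", "marsha-rocks"),
    ("", ""),
]


def slugify(text: str) -> str:
    """Stand-in for generated logic: lowercase words joined by hyphens."""
    return "-".join(text.lower().split())


class SlugifyContract(unittest.TestCase):
    def test_examples(self) -> None:
        # Each example is checked separately so one failure doesn't hide others.
        for given, expected in EXAMPLES:
            with self.subTest(given=given):
                self.assertEqual(slugify(given), expected)
```

Run it with `python -m unittest <file>`; swapping in a different implementation of `slugify` leaves the contract unchanged.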

ISV_Damocles (OP) · 2y ago

yeah, the AI hype cycle can be a bit tiring, but I'm glad you took a second look. What parts did you find interesting, in particular?

roguecoder · 2y ago · 2 in thread

It is interesting to me how enthusiastic people are about natural language interfaces despite presumably having worked with requirements documents already (which are a programming language compiled by programmers).

If we want our systems to behave gracefully and predictably, they aren't going to be built in natural language: they are going to be built in a system with locally-specific semantics.

To build that with natural language, I would expect to see a conversational system that can request clarifications for ambiguity and negotiate local, specific definitions of the words it is using, similar to Domain Driven Design today. But for that to work, it will need semantic modeling not just the correlation in LLMs.

voidhorse · 2y ago

This touches on how I've felt about this since the LLM craze started and people began exploring how to use it for programming. It's kind of amazing that we can define formal languages in which it's possible to specify provably correct behavior and now we want to....introduce randomness and greater possibility for unanticipated errors by generating code via LLMs?

I feel like those pursuing LLMs basically lack respect for the inherent complexities of the many spheres of life. Not everything is reducible to linguistic pattern matching.

ISV_DamoclesOP2y ago

I think you are painting with too broad of a brush. There are many domains that I would never use an LLM-based tool for; all tools can be used incorrectly, but that doesn't make the tool at fault.

Software engineering is about trade-offs. For LLM-based code generation in general, the trade-off is speeding up the writing of code at the expense of precision in what is generated. When you use something like Copilot, it uses the comment or function signature to "guess" what you intend to write, and sometimes it's right, sometimes it's not.

Marsha is exploring that trade-off space. Copilot usually finishes in 5-20 seconds, while Marsha is slower: sometimes as fast as 20 seconds, but usually a little over a minute. The syntax requires you to provide more information up front than just a comment or a function signature. Marsha also uses that up-front information to generate a test suite that improves the reliability of its output, which increases iterations with the LLM and therefore slows it down.

Only when the code generated passes the test suite will it actually return an output to you, so the code it generated passes the cases that you were able to think of, which should make it much more precise than Copilot. That may still fail, but probably in ways your own code would have failed for cases you hadn't considered, so this particular trade-off feels closer to "free" versus writing it up by hand, in my opinion.

But again, when to use the tool is a decision you must make. You can see from our own examples that we've only used it so far on toy problems or problems small enough that manual review is feasible. Since the test suite always passes at the end (or it simply fails to generate if not), that makes it better than many Eng I and some Eng II level engineers I have worked with in the past. ;)
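The behavior described above is "only return code that passes the test suite, otherwise fail to generate". It can be sketched as a bounded generate-and-test loop. This is an illustration under assumptions, not Marsha's actual pipeline:

* `generate_until_passing`, `run_tests` and `scripted_generate` are hypothetical names.
* The generator is a scripted stub in place of an LLM call.
* `exec` runs the candidate without any sandboxing.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical shapes: a generator turns failure feedback into candidate source code,
# and a test is a (description, predicate over the candidate's namespace) pair.
Generator = Callable[[List[str]], str]
Test = Tuple[str, Callable[[Dict[str, object]], bool]]


def run_tests(source: str, tests: Sequence[Test]) -> List[str]:
    """Load candidate source into a fresh namespace and return descriptions of failing tests."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)  # no sandboxing here; real generated code would need isolation
    except Exception as err:
        return [f"code failed to load: {err!r}"]
    failures = []
    for description, check in tests:
        try:
            if not check(namespace):
                failures.append(description)
        except Exception as err:
            failures.append(f"{description}: {err!r}")
    return failures


def generate_until_passing(
    generate: Generator, tests: Sequence[Test], max_attempts: int = 3
) -> Optional[str]:
    """Return the first candidate that passes every test, or None after max_attempts."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        source = generate(feedback)
        failures = run_tests(source, tests)
        if not failures:
            return source  # only fully passing code is returned
        feedback = failures  # the next attempt sees what went wrong
    return None  # give up rather than return failing code


# Scripted stand-in for an LLM: wrong on the first try, fixed once it sees feedback.
CANDIDATES = [
    "def double(x):\n    return x + 2\n",
    "def double(x):\n    return x * 2\n",
]


def scripted_generate(feedback: List[str]) -> str:
    return CANDIDATES[1] if feedback else CANDIDATES[0]


TESTS: List[Test] = [
    ("double(3) == 6", lambda ns: ns["double"](3) == 6),
    ("double(0) == 0", lambda ns: ns["double"](0) == 0),
]

print(generate_until_passing(scripted_generate, TESTS) == CANDIDATES[1])  # True
print(generate_until_passing(scripted_generate, TESTS, max_attempts=1))   # None
```

The attempt cap is the design choice that matters here. With it, the caller gets either passing code or an explicit `None`, never a best-effort candidate that failed its tests.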

ilaksh2y ago· 2 in thread

This is great. Have you thought about adding tree of thoughts?

pombo2y ago

one of the contributors here. we have a statistical test script we can run on our branches to test the compile time, cost and reliability. we want to try tree of thought and also something like https://www.reddit.com/r/ChatGPT/comments/14d7pfz/become_god.... that said, we found that when we asked GPT to first explain why a test case is failing and then to correct that failure, instead of just asking it to correct the failure, unexpectedly, costs went up and reliability went down.
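Claims like "costs went up and reliability went down" come from comparing pass rates across repeated compiles. With small run counts, those rates are noisy. One standard way to put error bars on a pass rate is the Wilson score interval.

This sketch is not the project's `.time.py` script. The function names are hypothetical, and the 18/20 and 12/20 counts are placeholders chosen to show the arithmetic, not measurements.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Crude screen: overlapping intervals suggest more runs are needed before calling a difference."""
    return a[0] <= b[1] and b[0] <= a[1]


# Placeholder counts for two hypothetical prompting strategies (illustrative, not measured).
strategy_a = wilson_interval(18, 20)
strategy_b = wilson_interval(12, 20)
print(strategy_a, strategy_b)
print(intervals_overlap(strategy_a, strategy_b))  # True
```

With only 20 runs per strategy, even a 90% versus 60% pass rate gives overlapping intervals (about 0.70-0.97 versus 0.39-0.78). Overlap does not prove there is no difference, but it is a sign that more runs are needed before trusting the comparison.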

ISV_DamoclesOP2y ago

Btw, here's the test job: https://github.com/alantech/marsha/blob/main/.github/workflo... And the core script for the job: https://github.com/alantech/marsha/blob/main/marsha/.time.py

jstarfish2y ago· 1 in thread

Heh...and here I thought I was being clever using Inform 7 as a metalanguage. This is way more concise.

ISV_DamoclesOP2y ago

Well, to be fair, we do bulk up what you type quite a bit[1] to improve GPT's response.

[1]: https://github.com/alantech/marsha/blob/main/marsha/parse.py...

cabalamat2y ago· 1 in thread

Why not just define the function headers in Python? It's less verbose.

ISV_DamoclesOP2y ago

Simply, Marsha is not Python. ;)

More seriously, in Python, types are optional and the exact behavior is explicitly defined at runtime, but the author of the code already had a behavior in mind when they wrote the code. Getting the intended type information from the user and providing it to the LLM improves the quality of the generated code, which is actually somewhat lossy. The "intended type information" isn't "explicit type information", though. A type can contain any character other than these four: `(`, `)`, `,` and `:`. That includes spaces and periods, so you can describe the type as a short sentence if you aren't exactly sure what the structure ought to be.[1]

And since in Marsha you don't actually write an imperative function body, argument names don't matter; only the argument order does (Marsha does not support named parameters, at least right now). So there's no argument naming in the declaration. That can make it more succinct than Python `def` lines in certain situations, though to be fair none of our examples do this.

[1]: https://github.com/alantech/marsha/blob/main/examples/web/du...
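To make the "any character except `(),:`" rule concrete, here is a small Python parser for a signature line. The `func name(type, type): return type` shape is an assumption for illustration, as are `parse_signature` and `Signature`. This is not Marsha's parser.

Because those four characters are the only delimiters, each type can be a free-form phrase. Arguments are recorded purely by position, since the declaration carries no argument names.

```python
from dataclasses import dataclass
from typing import List

# The only characters a type description may not contain, because they delimit the signature.
RESERVED = set("(),:")


@dataclass
class Signature:
    name: str
    arg_types: List[str]  # positional only: order matters, there are no argument names
    return_type: str


def parse_signature(line: str) -> Signature:
    """Parse a hypothetical `func name(type, type): return type` line."""
    prefix = "func "
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with 'func '")
    rest = line[len(prefix):]
    open_idx = rest.index("(")             # raises ValueError if missing
    close_idx = rest.index(")", open_idx)  # first ')' after the '('
    name = rest[:open_idx].strip()
    if not name or RESERVED & set(name):
        raise ValueError(f"invalid function name: {name!r}")
    after = rest[close_idx + 1:].lstrip()
    if not after.startswith(":"):
        raise ValueError("expected ':' before the return type")
    args_text = rest[open_idx + 1:close_idx]
    arg_types = [t.strip() for t in args_text.split(",")] if args_text.strip() else []
    return_type = after[1:].strip()
    for type_text in arg_types + [return_type]:
        # Free-form phrases (spaces, periods) are fine; empty or delimiter-bearing ones are not.
        if not type_text or RESERVED & set(type_text):
            raise ValueError(f"invalid type description: {type_text!r}")
    return Signature(name, arg_types, return_type)


sig = parse_signature("func fibonacci(integer): integer in the set of fibonacci numbers")
print(sig.arg_types, "->", sig.return_type)
# prints: ['integer'] -> integer in the set of fibonacci numbers
```

A return type such as `b: c` is rejected, because a stray `:` would make the boundary between signature and type ambiguous. That is the practical reason those four characters are reserved.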

ianbicking2y ago

This reminds me a lot of AutoGPT and GPT-engineer and other "full automation" systems, only much more practical and reasonable.

Like if I was going to try to make one of those self-programming systems work, I'd be focused on getting it to outline and specify components, then divide up the work on those components, maybe doing revision passes, think about the specification order, etc. Many of those steps are things programmers naturally do and are supported by modules and functions and tests...

So phrasing the specification language as a programming language instead of a single prompt is pretty reasonable.

That said, I'm not sure if this needs to be its own language? Could it be in Python directly? (I'm reminded of my own magic Python module: https://ianbicking.org/blog/2023/01/infinite-ai-array.html – though honestly I haven't touched it since writing it... too much magic makes something not really useful.)

I think it would help if the development process was maybe more incremental. I don't really use REPLs because I find it hard to construct and maintain the REPL environment. But maybe those compromises are more reasonable given the unique challenges of AI-written code. Like maybe the AI should be building mocks from the beginning, kind of gentle placeholders that get thrown away quickly, but since the AI wrote the original code no one really cares.

I guess I'm proposing processes (architectures?) of construction that are specific to AI. Maybe not entirely novel, but the result of looking holistically at the development experience and picking out ideas that allow for incremental confirmation and refinement. AI is often better at making changes and refinements than creating something from whole cloth.

chrisjj2y ago

> Marsha uses this to provide more information to the LLM to generate the logic you want, but also uses it to generate a test suite to validate that what it has generated actually does what you want it to.

What I can't find here is the component that reads "what you want it to" from the mind of the user.

chadnorvell2y ago

Check out maccarone (https://github.com/bsilverthorn/maccarone) for a different take on this concept.

petabytes2y ago

I just made something exactly like this a few weeks ago. My main goal was to see if I could self-host it. How long will it be before Marsha is self-hosted?

dd_xplore2y ago

I never knew there was a Wikipedia package!

franciscomello2y ago

Sounds very interesting.

