Software 3.1? – AI Functions

(blog.mikegchambers.com)

42 points · aspittel · 4mo ago · 59 comments

58 comments · 32 top-level

renegade-otter · 4mo ago · 7 in thread

This has big "let's do this because we can" energy.

What is the BENEFIT of all this?

Let's use Blockchain instead of a database - because we can.

Let's create a maze of microservices - because we can.

Let's make every function a lambda function - because we can.

Let's make AI write code, run it, verify it, fix it, then run it again - because we can.

Let's burn untold amounts of energy to do simple things - because we can.

marginalia_nu · 4mo ago

Because we can? More like because I have equity in a company that sells this stuff.

bilekas · 4mo ago

"Does this mean I won't need to pay a skilled person?"

1-6 · 4mo ago

To you, what's the point of spending countless billions on space exploration?

otikik · 4mo ago

Space exploration needs things to be better. Better propulsors. Better seals. Better materials. Or else the space capsule explodes and people die.

What the article is proposing is making programming worse, for no apparent benefit for anyone except those who sell AI data center cycles.

gdulli · 4mo ago

Good comp. Working with expensive materials and stuff that can explode while people are inside by necessity forces a greater scrutiny of good vs. bad ideas. You don't get that ideal balance between experimentation and wisdom when anyone can type anything into an editor at no cost.

renegade-otter · 4mo ago

You can make that argument about every single thing that is wasteful but can be justified as "research".

Sure, every bit of f--ing around is research, but ROI is far from constant.

gdulli · 4mo ago

Discretion will be the better part of the tech industry, if we ever reach that maturity level.

leoedin · 4mo ago · 6 in thread

I can't even imagine how many joules would be used per function call!

As an experiment, it's kind of cool. I'm kind of at a loss as to what useful software you'd build with it, though. Surely once you've run the AI function once it would be much simpler to cache the resulting code than repeatedly re-generate it?

Can anyone think of any uses for this?

ryancoleman · 4mo ago

They're handy for situations where it would be impractical to anticipate the ways your input might vary. Say you want to accept invoices or receipts in a variety of file formats where the data structure varies, but you can rely on the LLM to parse and organize them. AI Functions lets you describe how that logic should be generated on demand for the input received, with post-conditions (another Python function the dev writes) that define what successful outcomes look like. Morgan wrote about the receipt parser scenario here: https://dev.to/morganwilliscloud/the-python-function-that-im... (FYI I'm on the Strands Agents team)
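For readers unfamiliar with the pattern, here is a minimal sketch of a generate, execute, check-post-conditions, retry loop. It is not the Strands AI Functions API: `fake_llm`, `run_ai_function` and the receipt post-condition are hypothetical names, and the model is replaced by a stub that returns canned code so the loop runs offline. Running generated code with `exec` in-process like this is unsafe outside a sandbox.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a model call. A real system would send `spec` and the
# previous failure (`feedback`) to an LLM; here we return canned attempts instead.
_CANNED_ATTEMPTS = [
    "def parse(text):\n    return {'total': text}\n",  # first try: wrong type
    "def parse(text):\n    return {'total': float(text.split('$')[-1])}\n",
]

def fake_llm(spec: str, feedback: Optional[str], attempt: int) -> str:
    return _CANNED_ATTEMPTS[min(attempt, len(_CANNED_ATTEMPTS) - 1)]

def total_is_positive_float(result: dict) -> None:
    """Post-condition: plain assertions describing what a correct result looks like."""
    assert isinstance(result.get("total"), float), "total must be a float"
    assert result["total"] > 0, "total must be positive"

def run_ai_function(
    spec: str,
    arg: str,
    postconditions: Sequence[Callable[[dict], None]],
    max_attempts: int = 3,
) -> dict:
    feedback: Optional[str] = None
    for attempt in range(max_attempts):
        source = fake_llm(spec, feedback, attempt)
        namespace: dict = {}
        try:
            exec(source, namespace)  # runs generated code in-process: sandbox this in practice
            result = namespace["parse"](arg)
            for check in postconditions:
                check(result)
            return result
        except Exception as exc:  # a failed check becomes feedback for the next attempt
            feedback = f"attempt {attempt} failed: {exc!r}"
    raise RuntimeError(f"no valid result after {max_attempts} attempts ({feedback})")
```

With the canned attempts above, `run_ai_function("Extract the receipt total", "TOTAL: $42.50", [total_is_positive_float])` fails the post-condition on the first attempt and returns `{'total': 42.5}` on the second.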

simsla · 4mo ago

I've used stuff like this for a hobby project where "effort to write it" vs "times I'm going to use it" is heavily skewed [0]. For production use cases, I can only see it being worth it for things that require using an ML model anyway, like "summarize this document".

[0] e.g. something like the below which I expect to use maybe a dozen times total.

Main routine: In folder X are a bunch of ROM files (iso, bin, etc) and a JSON file with game metadata for each. Look for missing entries, and call [subroutine] once per file (can be called in parallel). When done, summarise the results (successes/failures) based on the now updated metadata.

Subroutine: (...) update XYZ, use metacritic to find metadata, fall back to Google.

amelius · 4mo ago

You just tell the AI: use as little energy as possible, by whatever means necessary!

pphysch · 4mo ago

Anthropic announces deal to buy 100% of Idaho's potato crop, in return for options, in new energy efficiency push

re-thc · 4mo ago

> run the AI function once it would be much simpler to cache the resulting code than repeatedly re-generate it?

Surely, you'll run a function that does an AI call to cache the resulting code.

ryancoleman · 4mo ago

The initial version on GitHub does not implement caching or memoization, but it's possible and is where the project will likely head. (FYI I'm on the Strands Agents team.)

chaboud · 4mo ago · 4 in thread

Why stop there? Just call the LLM with the data and function description and get it to return the result!

(I'll admit that I've built a few "applications" exploring interaction descriptions with our Design team that do exactly this - but they were design explorations that, in effect, used the LLM to simulate a back-end. Glorious, but not shippable.)

ryancoleman · 4mo ago

That's basically how it works! (with human authored functions that validate the result, automatically providing feedback to the LLM if needed)

falcor84 · 4mo ago

Because you often need the result not as a standalone artifact, but as a piece in a rigid process consisting of well-defined business logic and control flow, which you can't yet trust AI with.

mtw14 · 4mo ago

What was the gap you discovered that made it not shippable? This is an experimental project, so I'm curious to know what sorts of problems you ran into when you tried a similar approach.

chaboud · 4mo ago

Three things:

1. Confirmable, predictable behavior (can we test it, can we make assurances to customers?).

2. Comparative performance (having an LLM call to extract from a list in 100s of ms instead of code in <10ms).

3. Operating costs. LLM calls are spendy. Just think of them as hyper-unoptimized lossy function executors (along with being lossy encyclopedias), and the work starts to approach bogo algorithm levels of execution cost for some small problems.

Buuuuuut.... I had working functional prototype explorations with almost no work on my end, in an hour.

We've now extended this thinking to some experience exploration builders, so it definitely has a place in the toolbox.

stackghost · 4mo ago · 2 in thread

I'm normally pessimistic about LLMs but I'll be the contrarian here and suggest there's actually a potential use case for what TFA proposes and it's programmatic/procedural generation for large game worlds.

renegade-otter · 4mo ago

There is a use for everything. The problem is, people will try to use this to create CRUD apps for no goddamned reason.

stackghost · 4mo ago

>There is a use for everything.

Eventually, perhaps. I've yet to see a use case for blockchains that isn't merely a worse facsimile of something already existing.

But the electron was useless when it was discovered, so maybe one day.

waynesonfire · 4mo ago · 2 in thread

Obviously you have never built software. English is a terrible programming language; you cannot have ambiguity in defining your computation.

PhunkyPhil · 4mo ago

Product owners and business people request code in vague English all the time. It's our job to parse it to code using our own judgement.

squeefers · 4mo ago

> you cannot have ambiguity in defining your computation

nobody except maybe NASA would make software in this scenario.

throwup238 · 4mo ago · 1 in thread

Haven’t we been seeing libraries that implement this pattern going on two years now? Take the docstring and monkey patch the function with llm generated code, with optional caching against an AST hash key.

The reason it hasn’t taken off is that it’s a supremely bad and unmaintainable idea. It also just doesn’t work very well, because the LLM doesn’t have access to the rest of the codebase without an agentic loop to ground it.

kingstnap · 4mo ago

The real reason it's bad is that it's not really easier or more productive to do this:

> You write a Python function with a natural language specification instead of implementation code. You attach post-conditions – plain Python assertions that define what correct output looks like.

> You write a Python function with ~~a natural language specification instead of~~ implementation code.

In many cases.

manofmanysmiles · 4mo ago · 1 in thread

I'd like to see this with a proper local "instruction cache."

It might even be fun if the first call generates Python (or another language), and subsequent calls go through that. This "optimized" or "compiled" natural language is "LLMJitted" into Python. With interesting tooling, you could then click on the implementation and see the generated code, a bit like looking at the generated assembly. Usually you'd just write in some hybrid of Python and natural language, but have the ability to look deeper.

I can also imagine some additional tooling that keeps track of good implementations of ideas that have been validated. This could extend to the community. Package manager. Throw in TRL + web of trust and... this could be wild.

Really tricky functions that the LLM can't solve could be delegated back for human implementation.

falcor84 · 4mo ago

Nice! I can almost see your vision. In terms of tooling, I think this could be integrated with deep instrumentation (à la Datadog) and used to create self-improving systems.

zeckalpha · 4mo ago · 1 in thread

There were people doing this sort of thing 2-3 years ago. What are they doing now?

blibble · 4mo ago

apparently still writing blog posts on it and posting them to HN

furyofantares · 4mo ago · 1 in thread

People did this 3 or more years ago. It's funny, but no less dumb now than it was then.

re-thc · 4mo ago

It's in the title. Software 3.1 (years ago).

moffers · 4mo ago · 1 in thread

Could you do this with Erlang’s term_to_binary functionality?

Stromgren · 4mo ago

I use Tidewave as my coding agent and it’s able to execute code in the runtime. I believe it’s using Code.eval_string/3, but you should be able to check the implementation. It’s the project_eval tool.

In my experience it’s a huge leap in terms of the agent being able to test and debug functionality. It’ll often write small code snippets to test that individual functions work as expected.

kaspermarstal · 4mo ago

I’m quite sure that’s the end state of software, except without the software around it: there will only be an AI and an interface. For now, though, while tokens cost a non-trivial amount of energy, I think you can do something more useful by having the LLM modify the program at runtime, because that’s many orders of magnitude cheaper. For example, use the BEAM, its actor model, hot code reloading, and REPL introspection, and you can build a program that an LLM can change, e.g. the user says “become a calculator” and then “become a PDF-to-HTML converter”.

I’m not just making this stuff up, of course; I got the idea yesterday after reading Karpathy’s tweet about Nanoclaw’s contribution model (don’t submit PRs with features, submit PRs that tell an LLM how to modify the program). Now I can’t concentrate on my day job. Can’t stop thinking about my little Elixir BEAM project.

spoj · 4mo ago

There are a lot of valid concerns about arbitrary code execution in prod: security, performance, auditability, etc.

However, the post does resonate with me somewhat when I think about some accounting processes.

Accounting is where I came from, and a lot of the data processing we do is mostly deterministic, with some "smartness" or judgement sprinkled in. Take bank reconciliation, for example: the basic process is to match bank statement lines with accounting entry lines. In practice, dates, descriptions, and amounts often mismatch between the two for various reasons (typos, grouped bookings, value date vs. transaction date differences, truncated values). This affects a lot of SMEs, and these basic accounting processes are still manual because they need eyeballing. Look at a typical back-office Excel spreadsheet and you'll understand.

You can pre-program the matching rules up to a certain point, until they become unmaintainable. Or you can use an LLM to generate data-dependent matching logic on the fly. I think there is space for the latter approach if we keep the scope tight and well contained. As with all engineering, it's about the trade-offs.
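To make the "pre-program the matching rules" half concrete, here is a minimal sketch of a deterministic first pass: exact amount plus a date tolerance, greedy one-to-one. The `Line` type, the 3-day window and integer cents are assumptions for illustration. Grouped bookings and typo'd descriptions are exactly what rules like this do not handle, which is the residue the comment suggests handing to generated logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass(frozen=True)
class Line:
    when: date          # value/booking date
    amount_cents: int   # integer cents avoid float rounding surprises
    text: str           # free-text description (not used by these rules)

def match_lines(
    bank: List[Line], ledger: List[Line], max_days: int = 3
) -> Tuple[List[Tuple[Line, Line]], List[Line]]:
    """Greedy one-to-one match: same amount, dates at most `max_days` apart.

    Returns (matched pairs, unmatched bank lines). The unmatched residue is the
    part that still needs eyeballing or more flexible logic.
    """
    remaining = list(ledger)
    matched: List[Tuple[Line, Line]] = []
    unmatched: List[Line] = []
    for b in bank:
        candidates = [
            e for e in remaining
            if e.amount_cents == b.amount_cents and abs((e.when - b.when).days) <= max_days
        ]
        if candidates:
            best = min(candidates, key=lambda e: abs((e.when - b.when).days))
            remaining.remove(best)  # each ledger line can be used only once
            matched.append((b, best))
        else:
            unmatched.append(b)
    return matched, unmatched
```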

Useful targets for an LLM to generate could be subsets of SQL statements (CREATE VIEW and SELECT) or pure functions (Haskell?), where side effects are strictly limited and it's only data in, data out. I am toying with the SQL idea myself (GH: https://github.com/spoj/taskgraph).
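One way to enforce "data in, data out" for generated SQL, sketched with the standard library's `sqlite3.Connection.set_authorizer`: statements are prepared under an authorizer that allows only reads, so writes are rejected before they execute. SQLite and this particular allow-list are assumptions for illustration, not a description of how taskgraph works, and `run_generated_query` is a hypothetical name. The sketch allows SELECT only; permitting CREATE VIEW would need more action codes.

```python
import sqlite3

# Action codes the authorizer lets through: plain reads only.
# SQLITE_FUNCTION (value 31 in SQLite) covers calls such as sum() or count();
# getattr keeps this working if a Python build doesn't export the constant.
_ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    getattr(sqlite3, "SQLITE_FUNCTION", 31),
}

def _read_only(action, arg1, arg2, db_name, trigger_or_view):
    return sqlite3.SQLITE_OK if action in _ALLOWED_ACTIONS else sqlite3.SQLITE_DENY

def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    """Run (hypothetically LLM-generated) SQL on a connection that may only read.

    The authorizer stays installed, so use a connection dedicated to generated
    queries; denied statements fail at prepare time with sqlite3.DatabaseError.
    """
    conn.set_authorizer(_read_only)
    return conn.execute(sql).fetchall()
```

A denied statement never runs, so a generated `DELETE` leaves the table untouched while aggregate `SELECT`s still work.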

aspittel (OP) · 4mo ago

AWS just shipped an experimental library through strands-labs, AI Functions, which execute LLM-generated code at runtime and return native Python objects. They use automated post-conditions to verify outputs continuously. Unlike generate-and-verify approaches, the AI-generated code runs directly in your application.

xiphias2 · 4mo ago

This looks like Symbolica, except that the great thing about what they're doing is that they're setting new ARC-AGI records.

https://www.symbolica.ai/blog/arcgentica

kkukshtel · 4mo ago

I wrote about something along these lines 3 years ago, but used the name "Heisenfunctions," which I think is better :)

https://kylekukshtel.com/incremental-determinism-heisenfunct...

A lot of this was also inspired by Ian Bicking's work here:

https://ianbicking.org/blog/2023/01/infinite-ai-array.html

amelius · 4mo ago

Why even return Python data structures? You might even return things like "A list that contains in order 1 ... 10, except the number 5".

adityagolatkar · 4mo ago

Part of the Strands team here. We hope AI functions will expand the tool belt of programmers. They aren't meant to replace regular Python, but to handle the inherently unpredictable parts of a pipeline that require an agent: parsing unstructured uploads, normalizing messy user input, research tasks, etc.

With AI Functions and post-conditions, we want to make this process more robust, ergonomic and cheaper: you don't always need a frontier model for ambiguous tasks. Smaller/faster agents can do the work if you have robust correctness checks.

On the roadmap: JIT-compiled functions that reuse previously generated code to cut costs, LLM-based backprop for learning/memory/prompt tuning, and strong remote sandboxing for code execution. We're focused on getting the DevX right before shipping these — happy to answer questions.

bilekas · 4mo ago

> Now consider a different arrangement. The LLM generates code that actually runs inside your application – at call time, every time the function is invoked.

I'm sure a lot of effort went into this, god knows why, but I pray I never have this in a production environment I'm on.

bwestergard · 4mo ago

The "Grace" language is based on the same idea, but lets you get the full benefit of specifying static types.

https://github.com/Gabriella439/grace

It's still probably not a great idea.

vjerancrnjak · 4mo ago

Funny how Pydantic is used to parse and not validate, but then there are post-conditions after parsing, which you should actually fold into the parsing, or which could be enforced with a JSON schema and properly implemented constrained sampling on the LLM side.
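A minimal sketch of the "parse, don't validate" point: fold the would-be post-conditions into the type, so an invalid result cannot be constructed at all. It uses a stdlib dataclass instead of Pydantic to stay dependency-free; `Receipt` and `parse_receipt` are hypothetical names, and the JSON-schema and constrained-sampling side is not shown.

```python
from dataclasses import dataclass
from typing import Any, Mapping

@dataclass(frozen=True)
class Receipt:
    merchant: str
    total_cents: int

    def __post_init__(self) -> None:
        # The former post-conditions live in the type: a bad Receipt can't exist.
        if not self.merchant.strip():
            raise ValueError("merchant must be non-empty")
        if self.total_cents <= 0:
            raise ValueError("total_cents must be positive")

def parse_receipt(raw: Mapping[str, Any]) -> Receipt:
    """Parse untrusted (e.g. LLM-produced) data into a Receipt, or raise."""
    merchant, total = raw.get("merchant"), raw.get("total_cents")
    if not isinstance(merchant, str):
        raise TypeError("merchant must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(total, int) or isinstance(total, bool):
        raise TypeError("total_cents must be an int")
    return Receipt(merchant, total)
```

Downstream code that receives a `Receipt` no longer needs separate post-condition checks.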

yomismoaqui · 4mo ago

I guess the next one will be Software 3.11 (for Workgroups)

bilater · 4mo ago

Had a similar idea a couple of years ago but I think this is still tied to the old way of doing things. More like software 2.9 rather than 3.1.

alecco · 4mo ago

This is why RAM is 5x.

nglander · 4mo ago

Apparently we have blogging-3.0 as well, since the article is littered with AI-isms.

These attempts at generating code that adheres to whatever spec, in Python of all languages, are futile and just please investors.

There is a reason that really proving adherence to a spec or making arguments that the spec is reasonable in the first place is hard.

But hey, thinking is hard, let's go AI shopping.

fd-codier · 4mo ago

Is there even a single benefit to using this?

Kuinox · 4mo ago

It may seem like a terrible idea, but I think it's good for running quick scripts. It means you can delegate some uninteresting parts the AI is likely to succeed at.

For example, connecting to endpoints, etc... then the logic of your script can run.

otikik · 4mo ago

Why would I want to do that?

bpavuk · 4mo ago

so, the idea goes as follows: expose programmatic access to your program, which potentially operates in a destructive manner (no Undo button) on potentially sensitive data; give a sloppy LLM (sloppy due to its sheer unpredictability and its ability to fuck things up in ways a sober human with common sense never would) a Python interpreter; then let it run away with it and hope that your boundaries are enough to stop it at the edges YET don't limit the user too much?

nah, I'm skipping this update.

OutOfHere · 4mo ago

It is a horrific idea because it takes the energy requirement of Python and multiplies it by 1000. I guess they're looking for people they can fool.

bilekas · 4mo ago

"Code will be in constant flux and nobody but the allocated llm will understand it"

exfalso · 4mo ago

This is a terrible idea

khalic · 4mo ago

Is this satire?
