In theory, any prompt should result in a good output, just as if I suggested it to an engineer. In practice, I find there are real limitations that require a lot of iteration and "handholding" — that is, unless I want something that has already been solved and whose solution is widely available. One simple example: I prompted for a physics simulation in C++ using a physics library. It got a good portion correct, but the code didn't compile. When it compiled, it didn't work, and when it worked, it wasn't remotely close to "good" by the standards a human engineer would apply to their own output if I asked for the same thing — never mind production-ready or multiplatform. I just have not experienced any LLM capable of taking ANY prompt. But because they do complete some prompts, and those prompts do have some value, it seems as if the possibilities are endless.
This is a lot easier to see with generative image models, e.g. Flux, Sora, etc. We can see amazing examples, but does that mean the model can generate anything I can imagine and prompt for? In my experience, not even close. I can imagine some wild things, and I can express them in whatever detail is necessary. I have experimented with generative models, and it turns out they have real limitations on what they can "imagine". Maybe they can generate a car driving along a mountain road, rendered perfectly, but when you make the prompt less generic — adding details like the car model or the time of day — it starts to break down. When you prompt something completely wild, say, the car transforming into a robot and doing a backflip, it fails spectacularly. There is no "logic" to what it can or cannot generate, as one might expect there to be. A talented artist who can create a 3D scene with a car can also create a scene with a car transforming into a robot (granted, it might take more time and require experimentation).
The main point is that LLMs lack a creative capability, and this will translate to engineering in some form, but it's not something that can be easily measured right away. Orgs will adapt and are already extracting value from LLMs, but I'm wondering what the real long-term cost is going to be.