Edit: Better at chain of thought, long-running agentic tasks, and following rigid directions.
Figures that any article written on LLM limits is immediately out of date. I'll write an update piece to summarize new findings.
It's very hard to evaluate whether one model is better than another; doing it in a scientifically sound way is especially time-consuming and hard.
This is why I find comments like "model X is so much better than model Y" to be about as useful as "chocolate ice cream is so much better than vanilla".
I'm no expert in the matter, but for "holistic" things (where there are a lot of cross-connections and inter-dependencies) it feels like a diffusion-based generative structure would be better-suited than next-token-prediction. I've felt this way about poetry-generation, and I feel like it might apply in these sorts of cases as well.
Additionally, this is a highly-specialized field. From the conclusion of the article:
> Overall we have some promising directions. Using LLMs for circuit board design looks a lot like using them for other complex tasks. They work well for pulling concrete data out of human-shaped data sources, they can do slightly more difficult tasks if they can solve that task by writing code, but eventually their capabilities break down in domains too far out of the training distribution.
> We only tested the frontier models in this work, but I predict similar results from the open-source Llama or Mistral models. Some fine tuning on netlist creation would likely make the generation capabilities more useful.
I agree with the authors here.
While it's nice to imagine that AGI would be able to generalize skills to work competently in domain-specific tasks, I think this shows very clearly that we're not there yet, and if one wants to use LLMs in such an area, one would need to fine-tune for it. I'd like to see a round 2 of this using a fine-tuning approach.
But I think there's also a bitter lesson to be learned here: when people say LLMs won't do well on a task, they are often surprised, either immediately or a few months later.
Overall not sure what to expect, but fine tuning experiments would be interesting regardless.
I have my own library of nuances, but how would you even fine-tune anything to understand the black-box abstraction of an IC well enough to work out whether a nuance applies between it and a load, or what a transmission line or edge would look like between the IC and the load?
This is where understanding trumps generative AI instantly.
Heh. This is very true. I think perhaps the thing I'm most amazed by is that simple next-token prediction seems to work unreasonably well for a great many tasks.
I just don't know how well that will scale into more complex tasks. With simple next-token prediction there is little mechanism for the model to iterate or to revise or refine as it goes.
There have been some experiments with things like speculative generation (where multiple branches are evaluated in parallel) to give a bit of a lookahead effect and help avoid the LLM locking itself into dead-ends, but they don't seem super popular overall -- people just prefer to increase the power and accuracy of the base model and keep chugging forward.
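The branching idea can be sketched in a few lines. This is a toy: the hard-coded `SCORES` table is a hypothetical stand-in for an LLM's next-token probabilities, and the point is only to show how keeping several branches alive avoids the greedy dead-end.

```python
import heapq

# Hypothetical hard-coded "model": next-token probabilities per prefix.
# In a real system these would come from an LLM's output distribution.
SCORES = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("a",): {"cat": 0.9, "dog": 0.1},
    ("the", "cat"): {"sat": 1.0},
    ("the", "dog"): {"ran": 1.0},
    ("a", "cat"): {"sat": 1.0},
    ("a", "dog"): {"ran": 1.0},
}

def lookahead_decode(steps=3, beam=2):
    """Keep the `beam` best branches alive at each step instead of
    greedily committing to the single most likely next token."""
    beams = [(1.0, ())]  # (cumulative probability, prefix)
    for _ in range(steps):
        candidates = []
        for score, prefix in beams:
            for tok, p in SCORES.get(prefix, {}).items():
                candidates.append((score * p, prefix + (tok,)))
        if not candidates:
            break
        beams = heapq.nlargest(beam, candidates)
    return beams[0][1]

# Greedy decoding commits to "the" (p=0.6) and ends at probability 0.30;
# keeping two branches alive finds "a cat sat" at probability 0.36.
```

Greedy decoding locks itself into the locally-best "the" and never recovers; the lookahead finds the globally better continuation.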
I can't help feeling like a fundamental shift toward something more akin to a diffusion-based approach would be helpful for such things. I just want some sort of mechanism where the model can "think" longer about harder problems. If you present a simple chess board or a complex board to an LLM and ask it to generate the next move, it always responds in the same amount of time. That alone should tell us that LLMs are not intelligent, that they are not "thinking", and that they will be insufficient for this going forward.
I believe Yann LeCun is right -- simply scaling LLMs is not going to get us to AGI. We need a fundamental structural shift to something new, but until we stop seeing such insane advancements in the quality of generation with LLMs (looking at you, Claude!!), I don't think we will move beyond. We have to get bored with LLMs first.
There is one posted on HN every week. How many more do we need before we accept that this tech is not what it's sold as, and that we're bored waiting for it to get good? I'm not saying "get better", because it keeps getting better, but somehow it doesn't get good.
It's frustrating because it's infantilizing, it derails the potential of an interesting technical discussion (e.g., here, diffusion), and it misses the mark substantially.
At the end of the day, it's useful in a thousand ways day to day, and the vast majority of people feel this way. The only people I see vehemently arguing the opposite seem to assume only things with 0 error rate are useful or are upset about money in some form.
But is that really it? I'm all ears. I'm on a 5-hour flight. I'm genuinely unclear on what's going on that leads people to take this absolutist position, that they're waiting for ??? to admit ??? about LLMs.
Yes, the prose machine didn't nail circuit design, but that doesn't mean whatever "They" you're imagining needs to give up and accept ???
Soon everything you see and hear will be built up through a myriad of AI models and pipelines.
If you are interested, I highly recommend this + your favorite LLM. It does not do everything, but it is far superior to some highly expensive tools in flexibility and repeatability. https://github.com/devbisme/skidl
One thing I've been personally really intrigued by is the possibility of using self-play and adversarial learning as a way to advance beyond our current stage of imitation-only LLMs.
Having a strong rules-based framework for measuring the quality and correctness of solutions is necessary for any RL training setup. I think skidl could be a really nice framework to be part of an RL-trained LLM's curriculum!
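A minimal sketch of what such a rules-based scorer could look like. The netlist format and rule set here are made up for illustration; a real setup would run an actual electrical rules check or a SPICE simulation instead.

```python
def connectivity_reward(netlist, required_pairs):
    """Score a candidate design against hard connectivity rules.

    netlist: dict mapping net name -> set of pins, e.g. {"VCC": {"U1.8", "C1.1"}}
    required_pairs: list of (pin_a, pin_b) pairs that must share a net.
    Returns the fraction of required connections satisfied, in [0, 1].
    """
    if not required_pairs:
        return 1.0
    satisfied = sum(
        1 for a, b in required_pairs
        if any(a in pins and b in pins for pins in netlist.values())
    )
    return satisfied / len(required_pairs)

# A design that ties the decoupling cap to power and ground but forgets
# a feedback resistor scores 2/3 -- a graded signal an RL loop can climb.
design = {
    "VCC": {"U1.8", "C1.1"},
    "GND": {"U1.4", "C1.2"},
}
rules = [("U1.8", "C1.1"), ("U1.4", "C1.2"), ("U1.1", "R1.2")]
```

A graded reward like this (rather than pass/fail) is what lets an RL curriculum make incremental progress on partially-correct designs.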
I've written down a bunch of thoughts [1] on using games or code-generation in an adversarial training setup, but I could see circuit design being a good training ground as well!
As for the topic: it is impossible to synthesize STEM things other than the way an engineer does it. I mean, thou shalt know some typical solutions and have calculations for everything that's happening in the schematic being developed.
Textbooks are not a joke, no matter who you are: a human or a device.
Yes, as well as dealing with a variable-length window.
When generating images with diffusion, one specifies the image size ahead of time. When generating text with diffusion, it's a bit more open-ended. How long do we want this paragraph to go? Well, that depends on what goes into it -- so how do we adjust for that? Do we use a hierarchical tree-structure approach? Chunk it and do a chain of overlapping fixed-length segments (possibly combined with a transformer model)?
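To make the fixed-length issue concrete, here's a toy discrete-diffusion-style decoder (the mechanics are entirely hypothetical): it starts from an all-MASK canvas whose length must be chosen up front, and reveals positions over several passes, where each fill can look at the whole canvas rather than just the prefix.

```python
import random

def iterative_unmask(length, fill_fn, passes=4, seed=0):
    """Toy diffusion-style text decoder.

    Start from a fixed-length canvas of MASK tokens and reveal a chunk
    of positions on each pass. `fill_fn(canvas, i)` proposes a token for
    position i given the entire current canvas -- unlike next-token
    prediction, it can condition on both left and right context.
    The catch: `length` must be committed to before generation starts.
    """
    rng = random.Random(seed)
    canvas = ["<MASK>"] * length
    masked = list(range(length))
    for p in range(passes):
        rng.shuffle(masked)
        k = max(1, len(masked) // (passes - p))  # reveal a chunk per pass
        for i in masked[:k]:
            canvas[i] = fill_fn(canvas, i)
        masked = masked[k:]
        if not masked:
            break
    return canvas
```

The `fill_fn` here is a placeholder for a learned denoiser; the point is structural: every open question in the comment above (how long? hierarchical? overlapping chunks?) is about how to escape that fixed `length` parameter.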
Hard to say what would finally work in the end, but I think this is the sort of thing that YLC is talking about when he encourages students to look beyond LLMs. [1]
I cannot help but think there are some similarities between large model generative AI and human reasoning abilities.
For example, if I ask a physician with a really high IQ some general questions about, say, fixing the shocks on my minivan, he may have some better ideas than me.
However, he may be wrong, since he specialized in medicine, although he may have provided some good overall info.
Let's take a lower-IQ mechanic who has worked as a mechanic for 15 years. Despite having a lower IQ and less overall knowledge of general topics, he gives a much better answer about fixing my shocks.
So with LLMs, fine-tuning looks to be key, as it is with human beings: large data sets that are filtered and summarized with specific fields as the focus.
> The AI generated circuit was three times the cost and size of the design created by that expert engineer at TI. It is also missing many of the necessary connections.
Exactly what I expected.
Edit: to clarify this is even below the expectations of a junior EE who had a heavy weekend on the vodka.
- https://www.damninteresting.com/on-the-origin-of-circuits/
- https://www.sciencedirect.com/science/article/abs/pii/S03784...
It's a distinction I fear many people will have trouble keeping in-mind, faced with the misleading eloquence of LLM output.
What natural language processing does is just make a much smarter (and dumber, in many ways) parser that can make an attempt to infer the intent, as well as be instructed how to recover from mistakes.
Personally, I'm a skeptic, since I've seen some hilariously bad hallucinations in generated code (and unlike a human engineer, who will say "idk but I think this might work", an LLM says "yessir, this is the solution!"). If you have to double-check every output manually, it's not that much better than learning yourself. However, at least with programming tasks, LLMs are fantastic at giving wrong answers with the right vocabulary -- which makes it possible to check and find a solution through authoritative sources and references instead of blindly analyzing a problem or paying a human a lot of money to tell you the answer to your query.
For example, I don't use LLMs to give me answers. I use them to help explore a design space, particularly by giving me the vocabulary to ask better questions. And that's the real value of a conversational model today.
The AI is happy because the circuit worked for the first 10 ns of the cycle.
Agree with OP that the raw models aren't that useful for schematic/pcb design.
It's why we built Flux from the ground up to provide the models with the right context. The models are great moderators but poor sources of knowledge.
Here are some great use cases:
https://www.youtube.com/watch?v=XdH075ClrYk
https://www.youtube.com/watch?v=J0CHG_fPxzw&t=276s
https://www.youtube.com/watch?v=iGJOzVf0o7o&t=2s
and here's a great example of leveraging AI to go from idea to full design: https://x.com/BuildWithFlux/status/1804219703264706578
It kind of grosses me out that we are entering a world where programming could be just testing (to me) random permutations of programs for correctness.
Most people are wrong when they say AI won't be able to do this soon. Just as you can't expect an AI to generate a website in assembly but CAN expect it to generate one with React/Tailwind, you can't expect an AI to generate circuits without strong functional blocks to work with.
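The "functional blocks" framing can be sketched in a few lines. This is a hypothetical representation, not any real tool's API: primitives compose into higher-level blocks much like React components, so a model would only have to wire blocks together instead of raw pins.

```python
def resistor(name, value):
    """Primitive block: a component record plus its two pin names."""
    return {"ref": name, "type": "R", "value": value,
            "pins": [f"{name}.1", f"{name}.2"]}

def voltage_divider(name, r_top, r_bottom):
    """Compose primitives into a higher-level block.

    Returns (components, nets), where nets maps each exposed net name
    to the list of pins connected to it. A model generating at this
    level of abstraction never touches individual pin numbering.
    """
    rt = resistor(f"{name}_RT", r_top)
    rb = resistor(f"{name}_RB", r_bottom)
    nets = {
        f"{name}.IN":  [rt["pins"][0]],
        f"{name}.OUT": [rt["pins"][1], rb["pins"][0]],
        f"{name}.GND": [rb["pins"][1]],
    }
    return [rt, rb], nets
```

The divider exposes only IN/OUT/GND; the internal pin-level wiring is fixed and correct by construction, which is exactly what "strong functional blocks" buys you.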
Great work from the author studying existing solutions/models- I'll post some of my findings soon as well! The more you play with it, the more inevitable it feels!
Can you? Because last time I tried (probably around February), it still wasn't a thing.
The industry does not like sharing, and the openly available datasets are full of mistakes. As a junior EE you learn quite quickly to never trust third-party symbols and footprints - if you can find them at all. Even when they come directly from the manufacturer there's a decent chance they don't 100% agree with the datasheet PDF. And good luck if that datasheet is locked behind a NDA!
If we can't even get basic stuff like that done properly, I don't think we can reasonably expect manufacturers to provide ready-to-use "building blocks" any time soon. It would require the manufacturers to invest a lot of engineer-hours into manually writing those, for essentially zero gain to them. After all, the information is already available to customers via the datasheet...
Are you able to accomplish this with prompt-engineering, or are you doing fine-tuning of LLMs / custom-trained models?
I don't know how feasible it is. This would probably take low millions of dollars of training, data collection, and research to get results that aren't trash.
I'd certainly love it for trying to diagnose circuits.
It's probably not really possible even at higher-end consumer-grade 1200 dpi.
And the devices, in this case, bluetooth aux transceivers, they all do the same things. They've even more or less converged on all being 3 buttons. When optimizing for cost reduction with the commodity chips that everyone is using to do the same things, the manufacturer variation isn't that vast.
In the same way you can get 3d models from 2d photos because you can identify the object based on a database of samples and then guess the 3d contours, the hypothesis to test is whether with enough scans and schematics, a sufficiently large statistical model will be good enough to make decent guesses.
If you've got, say, 40 devices with 80% of the same chips doing the same things for the same purpose, a 41st device might have lots of guessable things that you can't necessarily capture on a cheap flatbed.
This will probably work, but it's a couple million dollars away from becoming a reality. There are shortcuts that might make this a couple-$100,000s project (essentially data contracts with bespoke chip printers), but I'd have to make those connections. And even then, it's just a hobbyist product. The chances of recouping that investment are probably zero, although the tech would certainly be cool and useful. Just not "I'll pay you money" level useful.
They are already far ahead of many others with respect to next generation EE CAD.
Judicious application of AI would be a big win for them.
Edit: adding "TL;DRN'T" to my vocabulary XD
Adding Skynetn't to company charter...
"If we make a really really good specialty text-prediction engine, it could be able to productively mimic an imaginary general AI, and if it can do that then it can productively mimic other specialty AIs, because it's all just intelligence, right?"
Few really understand what the limits of the tech are, or whether it will even unlock the use cases for which it is being touted.
TLDR: We test LLMs to figure out how helpful they are for designing a circuit board. We focus on the utility of frontier models (GPT-4o, Claude 3 Opus, Gemini 1.5) across a set of design tasks to find where they are and are not useful. They look pretty good for building skills, writing code, and getting useful data out of datasheets.
TLDRN'T: We do not explore any proprietary copilots, or how to apply things like a diffusion model to the place-and-route problem.
* Failed to properly understand and respond to the requirements for component selection, which were already pretty generic.
* Succeeded in parsing the pinout for an IC but produced an incomplete footprint with incorrect dimensions.
* Added extra components to a parsed reference schematic.
* Produced very basic errors in a description of filter topologies and chose the wrong one given the requirements.
* Generated utterly broken schematics for several simple circuits, with missing connections and aggressively-incorrect placement of decoupling capacitors.
Any one of these failures, individually, would break the entire design. The article's conclusion for this section buries the lede slightly:
> The AI generated circuit was three times the cost and size of the design created by that expert engineer at TI. It is also missing many of the necessary connections.
Cost and size are irrelevant if the design doesn't work. LLMs aren't a third as good as a human at this task, they just fail.
The LLMs do much better converting high-level requirements into (very) high-level source code. This makes sense (it's fundamentally a language task), but it also isn't very useful. Turning "I need an inverting amplifier with a gain of 20" into "amp = inverting_amplifier('amp1', gain=-20.0)" is pretty trivial.
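For a sense of how trivial that mapping is, the entire "design" behind such a call could be a hypothetical helper like this (mirroring the `inverting_amplifier` call above; not the article's actual implementation):

```python
def inverting_amplifier(name, gain, r_in=1_000.0):
    """Pick resistor values for an ideal op-amp inverting stage.

    Gain = -Rf / Rin, so Rf = |gain| * Rin. Everything hard about the
    amplifier -- op-amp selection, bandwidth, stability, tolerances,
    supply constraints -- is left unaddressed.
    """
    if gain >= 0:
        raise ValueError("an inverting stage has negative gain")
    return {"name": name, "R_in": r_in, "R_f": abs(gain) * r_in}
```

With `gain=-20.0` and the default 1 kΩ input resistor this picks a 20 kΩ feedback resistor, which is the part a novice could do by hand; the part that matters is everything the helper ignores.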
The fact that LLMs apparently perform better if you literally offer them a cookie is, uh... something.
But the bottom line is that it's a task that a novice could have solved with a Google search or two, and the LLM fumbled it in ways that'd be difficult for a non-expert to spot and rectify. LLMs are generally pretty good at information retrieval, so it's quite disappointing.
The cookie thing... well, they learn statistical patterns. People on the internet often try harder if there is a quid-pro-quo, so the LLMs copy that, and it slips past RLHF because "performs as well with or without a cookie" is probably not one of the things they optimize for.
The number of times I've had to entirely redo a circuit because of one misplaced connection, yeah, none of those circuits worked for any price before I fixed every single error.
I think Gemini could definitely do that microphone study. Good test case! I remember spending 8 hours on DigiKey in the bad old times, looking for an audio jack that was 0.5mm shorter.