One of my prior projects involved working with a lot of ex-FPGA developers. This is obviously a rather biased group, but I heard a lot of very negative feedback about FPGAs.
One comment that's telling: since the 90s, FPGAs were seen as the obvious "next big technology" for the HPC market... and then Nvidia came out and pushed CUDA hard, and now GPGPUs have cornered the market. FPGAs are still trying to make inroads (the article here mentions it), but my general sense is that success has not been forthcoming.
The issue with FPGAs is that you start with a clock rate in the 100s of MHz (the exact rate depends on how long the paths need to be), compared with a few GHz for GPUs and CPUs. Thus you need a 5× performance win from switching to an FPGA just to break even, and probably another 2× on top of that to motivate people to go through the pain of FPGA programming. Nvidia made GPGPU work by demonstrating performance gains large enough to make the cost of rewriting code worth it; FPGAs have yet to do that.
Edit: It's worth noting that the programming model has consistently been cited as the thing holding FPGAs back for the past 20 years. The success of GPGPU (which also required moving to a different programming model to achieve its gains), combined with the inability of the FPGA community to furnish the necessary magic programming model, suggests to me (and my FPGA-skeptic coworkers) that the programming model isn't the actual issue preventing FPGAs from succeeding; rather, FPGAs have structural issues (e.g., low clock speeds) that limit their utility in wider market classes.
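The break-even arithmetic above can be sketched in a few lines (the clock figures are illustrative placeholders, not measurements):

```python
# Back-of-the-envelope: how much work per cycle an FPGA design must do,
# relative to a CPU/GPU, just to match throughput at its lower clock.
# Clock figures are illustrative, not measured.
cpu_clock_hz = 2.0e9    # "a few GHz"
fpga_clock_hz = 400e6   # "100s of MHz" after place & route

break_even = cpu_clock_hz / fpga_clock_hz  # parallel speedup needed just to tie
worth_the_pain = 2 * break_even            # ~2x more to justify the rewrite

print(break_even)      # 5.0
print(worth_the_pain)  # 10.0
```

So an FPGA port has to extract roughly an order of magnitude more per-cycle parallelism than the CPU/GPU baseline before anyone bothers.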
However, some applications do not map well to GPUs. In particular, applications with a great deal of bit-level parallelism can achieve enormous speedups with bespoke hardware. For those applications where it doesn't make sense to tape out an ASIC, FPGAs are beautiful--even if they only operate at a few hundred MHz.
I think the "programming model" is actually the biggest barrier to wider adoption. Your comment is suffused with what I believe is the source of this disagreement: the idea that one programs an FPGA. One designs hardware that is implemented on an FPGA. The difference may sound pedantic, but it really is not. There is a massive difference between software programming and hardware design, and hardware design is downright unnatural for software developers. They are completely different skill sets.
On top of that, add all the headaches that come with implementing a physical device with physical constraints (the article complains about P&R times, but that is far from the only burden), and it becomes clear that FPGAs are, quite frankly, a massive pain in the ass compared to software running on CPUs or GPUs.
(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)
The rebuttal to your objection is always tools like HLS (High-Level Synthesis), or in plain English, "C to HDL". (FPGAs are 'programmed' in one of the two hardware description languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages; they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
This model maps very, very poorly onto traditional programming languages, which makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can produce output decent enough to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to have to write some HDL yourself, which is hard and forces you onto the industry's worst tools.
Thus, FPGAs languish.
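A toy Python model of that "everything always executes in parallel" rule may make it concrete (the three-gate netlist here is hypothetical, and this is a sketch of the semantics, not a real simulator):

```python
# Toy model of combinational hardware: every gate computes its output on
# every evaluation pass, whether or not anything "needs" it; a mux at the
# end merely selects which always-computed result is visible.
def evaluate(netlist, signals):
    # Iterate to a fixed point, like delta cycles in an HDL simulator.
    changed = True
    while changed:
        changed = False
        for out, fn, ins in netlist:
            val = fn(*(signals[i] for i in ins))
            if signals.get(out) != val:
                signals[out] = val
                changed = True
    return signals

# out_add and out_and are both computed every pass; "sel" only picks one.
netlist = [
    ("out_add", lambda a, b: (a + b) & 0xFF, ("a", "b")),
    ("out_and", lambda a, b: a & b,          ("a", "b")),
    ("result",  lambda s, x, y: x if s else y, ("sel", "out_add", "out_and")),
]
sigs = evaluate(netlist, {"a": 12, "b": 10, "sel": 1})
print(sigs["result"])  # 22 (the AND result, 8, was still computed anyway)
```

Note there is no "skipping" the unselected branch: the hardware for every path exists and toggles every cycle, which is exactly the mental shift software developers stumble over.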
The tools, yes, because it seems like hardware engineers have a fetish for all-encompassing, painful, vendor-specific IDEs with half the features that we software developers have, and with a crapload of vendor lock-in... but I digress.
I find working in Verilog to be pretty pleasant. Yes, I can see that with sufficient complexity it wouldn't scale well, but SystemVerilog does give you some pretty good tools for managing complexity through modularity.
On the other hand, I've never particularly enjoyed working with GPUs, CUDA, etc.
So I would agree with your statement that structural issues prevent their utility in wider market classes -- and those really are, as you say, lower clock speeds and cost, but also vendor tooling.
FPGAs could really do with GCC/LLVM-style open, universal, modular tooling. I use fusesoc, which is about as close to that as I will get (a declarative build that generates the Vivado project behind the scenes), but it's still not perfect.
> it seems like hardware engineers have a fetish for all-encompassing painful vendor specific IDEs
Hardware engineers feel pain just like you do. The reason they put up with those awful software suites is that they have features they need that aren't available elsewhere. In particular, they interface with IP blocks and hard blocks, including at a debug + simulation level. Those tend to evolve quickly, and last time I looked -- which admittedly was a while ago -- the open source FPGA tooling pretty much completely ignored them, even though they're critical to commercial development.
If you are content to live without gigabit transceivers, PCIe controllers, DRAM controllers, embedded ARM cores, and so on, I suspect it would be relatively easy to use the open source tooling, but you would only be able to address a small fraction of FPGA applications.
The main challenge I had was compilation time. Compiling even a simple application can sometimes take overnight if there's a lot of nested looping, only to have it run out of gates. This can be a royal pain.
I'd expect most HPC scenarios to have lots of nested looping, and probably memory accesses, and thus to require a lot of time writing state machines to get around gate-count limitations and to wait for memory responses, at which point you're basically designing a 200 MHz CPU.
So I don't see it as being very useful for general purpose acceleration, but could be a good CPU offload for some very specific use cases that are more bit-banging than computing. Azure accelerates all its networking via FPGA, which seems like the ideal use case.
Verilog and VHDL have basically nothing in common with any language you've ever used.
Compilation can take multiple days. This means that debugging happens in simulation, at maybe 1/10000th of the desired speed of the circuit.
If you try to make something too big, it just plain won't fit. There is no graceful degradation in performance; an inefficient design will just not function, come Hell or high water.
The existing compilers will happily build you the wrong thing if you write something ill-defined. There are a ton of things expressible in a hardware description language that don't actually map onto a real circuit (at least not one that can be automatically derived). In a normal programming language, anything you can express is well-defined and can be compiled and executed. Not so in hardware.
Timing problems are a nightmare. Every single logic element acts like its own processor, writing directly into the registers of its neighbours, with no primitives for coordination. Imagine if you had to worry about race conditions inside of a single instruction!
Maybe if all these problems are solved FPGAs still wouldn't catch on, but let's not pretend the programming model isn't a problem. Hardware is fundamentally hard to design and the tooling is all 50 years out of date.
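That race-condition point is exactly why synchronous design uses a two-phase discipline: every register's next value is computed from the old state, then all registers commit at once. A toy illustration (everything here is a sketch, not real tooling):

```python
# Toy model of a clock edge. Updating registers in place, one at a time,
# is the "race condition inside a single instruction" a software mindset
# produces; real hardware samples all inputs from the OLD state first.
def clock_edge(state, next_fns):
    # Phase 1: evaluate every next-state function against the old state.
    # Phase 2: commit all updates simultaneously.
    return {reg: fn(state) for reg, fn in next_fns.items()}

# A 2-stage shift register: b samples a, c samples b, on every edge.
next_fns = {
    "a": lambda s: s["a"],   # held constant
    "b": lambda s: s["a"],
    "c": lambda s: s["b"],
}
state = {"a": 1, "b": 0, "c": 0}
state = clock_edge(state, next_fns)  # {'a': 1, 'b': 1, 'c': 0}
state = clock_edge(state, next_fns)  # {'a': 1, 'b': 1, 'c': 1}
print(state)
```

Had the updates been applied in place (b first, then c reading the new b), the 1 would have raced through both stages in a single edge, which is the class of bug timing closure exists to prevent.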
I'd argue FPGAs aren't programmed and don't have a programming model. Complaints that the programming model of FPGAs holds their adoption back are thus conceptually ill-founded. (The tooling still sucks).
The iCE40 series is almost there, but not quite. It's a bit pricey (sometimes that's okay, sometimes it's a dealbreaker), but its care and feeding is too annoying. Who wants to source a separate configuration memory? Sometimes I don't have the space for that crap.
If any company can bring a small, cheap, low power FPGA to the market, preferably with onboard non-volatile configuration memory, a microcontroller-like peripheral mix (UART, I2C, SPI, etc.), easy configuration (re)loading, and with good tool and dev board support, they'll sell a lot of units. They don't even have to be fast!
Their development environment is Eclipse-based, with numerous libraries for audio processing, interface management, DFU, etc. They use a variant of C (xc) that lets you send data between channels/tiles and easily parallelize processing.
An example use is voice assistants, where multiple microphones need to be analyzed simultaneously, echo and background noise have to be eliminated, and the speaker isolated into a single audio stream. I've used it for an audio processing product that needed to match hardware timers exactly, provide USB access, offer matched input and output, etc.
So, for FPGAs to be the next big thing in HPC, you'd need to find a class of workloads that benefit from the FPGA architecture, for long enough and at high enough volume to be worth the work of moving over, yet that are also unstable or low-volume enough that it's not worth making them their own chip.
For example, timing protocols on backbone equipment handling 100-400 Gbps. Depending on how it's configured, you may need to do different things. Additionally, you probably don't want to replace six-figure hardware every generation.
Another example is test equipment where you can't run the tests in parallel. A single piece of hardware can be far more portable and cost-effective.
There's one more big one: the ability to update the logic in the field.
It's so easy that it's quite common to see people pass work off to the FPGA when it involves some slightly heavier data processing, which is exactly how it should be.
https://github.com/xupgit/FPGA-Design-Flow-using-Vivado/tree...
https://www.xilinx.com/support/university.html
https://www.xilinx.com/video/hardware/getting-started-with-t...
There are others that cover the SDK side of things, but the HW side/Vivado is well documented.
If some FPGA company comes along, throws out conventional market wisdom (the old Henry Ford quote seems pertinent: "If I'd asked customers what they wanted, they would have said 'a faster horse'"), and makes an FPGA with software tools that are fast, non-buggy, and have good UI/UX, I think they would be able to steal significant market share. Early FPGA patents should be expiring by now...
I guess the one place where GPGPU-based solutions wouldn't work is when the code you want to accelerate is necessarily acting as some kind of Turing machine (i.e., emulation of some other architecture). However, I can't think of a situation where an FPGA programmed with the netlist for arch A, running alongside a CPU running arch B, would make more sense than just getting the arch-B CPU to emulate arch A -- unless, perhaps, the instructions in arch A are very, very CISC, perhaps with analogue components (e.g., RF logic, like a cellular baseband modem).
You saw correctly, work is indeed being done to build "shells" that can accept workloads without the user having to go through the FPGA tooling/build process.
So it's unlikely ever to gain broad acceptance, because software vendors would have to support such a high number of permutations and the return can be questionable. This is why you see far more accelerators based on ASICs, which have higher clock speeds and baked-in circuitry for specific tasks, with standardized APIs.
But sure, there's nothing preventing you from buying an FPGA board, hooking it up to your PC, creating a few images that do the accelerations you want, and writing software that uses them, swapping the image in when your program loads. You could even write a smart driver that swaps the image only if it's not in use by another app, or whatever. It's just unlikely you'll ever find a bunch of third-party software that supports it.
I could imagine Apple including something like this in the Apple Silicon SoC for ARM Macs.
The Afterburner card is not user-programmable, but maybe it will be in the future, and this was just a first try at getting the hardware into the field.
They are good at a lot of things at smaller scales, like general prototyping/testing/simulation, telecom, special-purpose real-time computing, etc.
The underlying logic is that FPGAs can never make things as flexible as software, and flexible software always offsets the inefficiency of a non-configurable chip. Simply comparing FPGAs against CPUs/GPUs will never teach FPGA vendors this reality; or perhaps they choose to ignore it after all...
- The first one is FPGA programming. Using OpenCL and HLS to design your own accelerators is now much easier than using VHDL/Verilog.
- The second one is FPGA deployment and integration. Until now it was very difficult to integrate your design with applications, to scale out efficiently, and to share it among multiple threads/users. The main reason was the lack of an OS layer (or abstraction layer) that would allow FPGAs to be treated like any other computing resource (CPU, GPU).
This is why at inaccel we developed a vendor-agnostic orchestrator for FPGAs. The orchestrator allows much easier integration, scaling, and resource sharing of FPGAs.
That way we have managed to decouple the FPGA designer from the software developer. The FPGA designer creates the bitstream, and the software developer just calls the function they want to accelerate. No need to specify the bitstream file, the interface, or the memory buffer allocation.
And the best part: it is vendor- and platform-agnostic. The FPGA designer creates multiple bitstreams for different platforms, and the software developer couldn't care less. The developer just calls the function, and the inaccel FPGA orchestrator magically configures the right FPGA for the right function.
Really? I'm assuming if this is true it can only be for tiny parts of the design, or they have some gigantic wafer-scale FPGA that they're not telling anyone about :-) Anyway I thought they mainly used software emulation to verify their designs.
1. It's not just a single FPGA but a large box full of them. For example: https://www.synopsys.com/verification/emulation/zebu-server....
2. Software models are employed for parts of the system (For example, the southbridge and all the peripherals connected to it are generally a software model which communicates with the hardware emulated portion in the FPGA via a PCIe model which is partly in hardware and partly in software.) This saves a lot of gates in the FPGA - those parts have already been well tested anyway so no need to put them into the hardware emulation.
- modern FPGAs are huge.
- when an asic design won't fit in a single FPGA, it's usually possible to partition the design into multiple FPGAs
- software emulation/simulation is not guaranteed to be "more accurate". FPGAs can interact with a real-world environment in ways that simulation simply cannot
- simulations run 1000s of times slower than FPGAs. Months of simulation time can be covered in minutes on the FPGA
Edit: to be clear, they all use simulation too, but FPGAs are used to accelerate the verification process
We had 10 such boards, good for millions of dollars in hardware, and a small team to keep it running.
These platforms were mostly used by the firmware team to develop everything before real silicon came back. They could run the full design at ~1-10 MHz, vs. 500+ MHz on silicon or ~10 kHz in simulation.
After running for a while, that FPGA platform crashed on a case where a FIFO in a memory controller overflowed.
Our VP of engineering said that finding this one bug was sufficient to justify the whole FPGA emulation investment.
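The clock ratios quoted above translate into dramatic wall-clock differences; a quick illustrative calculation (figures taken from the comment, low end of the FPGA range):

```python
# One second of wall-clock time on 500 MHz silicon, replayed at the
# emulation and simulation speeds mentioned above (all figures illustrative).
silicon_cycles = 500e6 * 1.0  # cycles covered in 1 s on real silicon

sim_hz = 10e3    # ~10 kHz software simulation
fpga_hz = 1e6    # ~1 MHz FPGA platform (low end of the 1-10 MHz range)

sim_hours = silicon_cycles / sim_hz / 3600
fpga_minutes = silicon_cycles / fpga_hz / 60

print(round(sim_hours, 1))     # 13.9 hours in simulation
print(round(fpga_minutes, 1))  # 8.3 minutes on the FPGA platform
```

That gap is why rare events like a FIFO overflow, which only show up after enormous cycle counts, are realistically reachable only on the FPGA platform.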
One of the nicer stories about the first ARM chip is that they built a software simulator to verify the design and as a result they found plenty of bugs in the hardware before committing to silicon. The first delivered chips worked right away.
Also, there are prototyping boards specifically built for emulation that integrate multiple FPGAs, although this does introduce a partitioning problem that has to be solved either manually or via dedicated emulator software.
First off, mapping an entire CPU to an FPGA cluster is a design challenge in itself. Assuming you can build an FPGA cluster large enough to hold your CPU, and reliable enough to get work done on it, you have the problem of partitioning your design across the FPGAs. Second problem: observability. In a simulator, you can probe anywhere trivially; with an FPGA cluster, you must route the probed signal to something you can observe. (I am not even going to talk about getting stimulus in and results out, since with an FPGA or a simulator you have that problem either way; it is just different mechanics.)
The big problem is that an FPGA models each signal with two states: 1 and 0. A logic simulator can use more states, in particular U, or "unknown". All latches should come up U, and getting out of reset (a non-trivial problem) is, to grossly oversimplify, a matter of "chasing the U's away". An FPGA model could, in theory, model signals with more than two states, but the model size grows quickly.
Source: Once upon a time I was pre-silicon validation manager for a CPU you have heard of, and maybe used. Once upon a time I was architect of a hardware-implemented logic simulator that used 192 states (not 2) to model the various vagaries of wired-net resolution. Once upon a time I watched several cube-neighbors wrestle with the FPGA model of another CPU you have heard of, and maybe used.
Note: What would 3 state truth tables look like, with states 0,1,U? 0 and 1 is 0. 0 and U is 0. 1 and U is U -- etc. You can work out the rest with that hint, I think.
Edit to add: Why are U's important? They uncover a large class of reset bugs and bus-clash bugs. I once worked on a mainframe CPU where we simulated the design using a two-state simulator. Most of the bugs in bring-up were getting out of reset. Once we could do load-add-store-jump, the rest just mostly worked. Reset bugs suck.
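Working out that hint, here is a minimal three-valued logic in Python (states 0, 1, and "U"); the rule is that a controlling input still decides the output, and otherwise U propagates:

```python
# Three-valued logic: 0, 1, and U (unknown). A controlling value (0 for
# AND, 1 for OR) decides the output regardless of the unknown; any other
# combination involving U stays unknown.
U = "U"

def and3(a, b):
    if a == 0 or b == 0:
        return 0          # 0 is controlling for AND, even against U
    if a == 1 and b == 1:
        return 1
    return U

def or3(a, b):
    if a == 1 or b == 1:
        return 1          # 1 is controlling for OR, even against U
    if a == 0 and b == 0:
        return 0
    return U

def not3(a):
    return U if a == U else 1 - a

print(and3(0, U))  # 0 -- the unknown is masked
print(and3(1, U))  # U -- the unknown propagates
print(or3(U, 1))   # 1
print(not3(U))     # U
```

This is why U's catch reset bugs: an uninitialized latch comes up U, and unless reset logic drives every path with a controlling value, the U propagates all the way to the outputs, where a two-state simulator would have silently picked 0 or 1 and hidden the bug.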
Indeed they do. And even if you have working chips, you get the next stage: board-level reset bugs. An MC68K board I helped develop didn't want to boot; a nasty side effect of a reset line that didn't stay at the right level long enough kept the CPU from resetting reliably when everything else did just fine. That took a while to debug.
That's a really narrow market. Telecom equipment and lab equipment, basically.
If I need volume, I need at least an ASIC. If I need to manage power, I need a full custom design.
Or you might imagine a chip with an FPGA on the side (I expected Intel to ship this after acquiring Altera, but it never happened). But the FPGA would somehow have to have access to the paths that caused the vulnerability, which is highly unlikely, and it would also be really slow compared to what they actually do, which is hacking around it with microcode changes.
They did: https://www.anandtech.com/show/12773/intel-shows-xeon-scalab...
But I get the sense this part was aimed at a few very specific customers. It required some PCB-level power delivery changes, so you couldn't even drop it into a standard server motherboard.
I don't think they are very popular, though. Maybe they are sometimes used for machine learning?
- Spark/k8s integration
- Abstraction of popular cores
- Python APIs
- Serverless deployments
- Etc.