(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)
The rebuttal to your objection is always tools like HLS (High-Level Synthesis), or in plain English, "C to HDL". (FPGAs are 'programmed' in one of two Hardware Description Languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages; they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
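As a rough software analogy of that model (this is illustrative C++, not real HDL — in actual hardware there is no "function call", every unit is physically present and always switching):

```cpp
#include <cassert>
#include <cstdint>

// Software analogy for combinational hardware: every operation's result
// exists simultaneously; a multiplexer merely selects which one is "valid".
uint32_t alu(uint32_t a, uint32_t b, int sel) {
    // All three "units" compute every cycle, whether their result is
    // needed or not...
    uint32_t sum  = a + b;
    uint32_t diff = a - b;
    uint32_t conj = a & b;
    // ...and the mux picks one. Nothing is "skipped" in hardware; an
    // if/else in HDL describes a mux, not a branch.
    switch (sel) {
        case 0:  return sum;
        case 1:  return diff;
        default: return conj;
    }
}
```

The mental shift is that the `switch` isn't control flow at all — it's a description of wiring. That's the part that maps poorly onto how programmers read code.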
This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.
Thus, FPGAs languish.
A weak spot of the high-end commercial HLS tools (Catapult, Stratus) is interfacing with the rest of the hardware world, and in particular how the clock is handled: either entirely by you (SystemC) or only vaguely (Catapult's ac_channel). Having HLS handle pipeline scheduling is great, but sometimes you want to break through and do something with the clock. Want to write a memory DMA in HLS? Talk AXI? Build a NoC in HLS? Build even something like a CPU in HLS? Interface with "legacy" RTL blocks, whether combinational, or a straight pipeline, or with ready/valid interfaces, or whatever? These things are just barely feasible at present with the commercial HLS tools, but very, very hard (I've tried).
If they want to stick with it, I think C++11 could provide a superior type-safe metaprogramming facility for building hardware, compared to the extremely primitive metaprogramming and lack of type-safety notions in SystemVerilog, or to generators such as Chisel or the hand-written Perl/Python/Tcl/whatever ones in use at most companies. But sometimes you need to break down and do something with the clock, or interface with things that care about a clock, much as one would drop inline asm statements into C code. I want to be able to do that, yet not have to deal with the clock the 95% of the time when I don't really need to, which is where the generators fail (let the tool determine the schedule most of the time). HLS needs to sit between the two: not a generator (glorified RTL), but not "pretend you write untimed C++ all the time" (not hardware at all).
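To make the metaprogramming point concrete, here's a minimal sketch (hypothetical names, not any real HLS library) of the kind of width-tracked arithmetic C++11 templates allow — the compiler tracks bit growth in the type system, where SystemVerilog would silently truncate:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical width-tracked hardware integer. "Bits" and "add" are
// illustrative names, not a real library.
template <int W>
struct Bits {
    static_assert(W > 0 && W <= 64, "width out of range");
    uint64_t value;
    // Mask to W bits on construction, like a W-bit register would.
    explicit Bits(uint64_t v)
        : value(v & (W == 64 ? ~0ull : ((1ull << W) - 1))) {}
};

// Adding a W-bit and a V-bit value yields a max(W,V)+1-bit result: the
// carry bit is part of the *type*, so it cannot be dropped by accident.
template <int W, int V>
Bits<(W > V ? W : V) + 1> add(Bits<W> a, Bits<V> b) {
    return Bits<(W > V ? W : V) + 1>(a.value + b.value);
}
```

For example, `add(Bits<8>(200), Bits<8>(100))` has type `Bits<9>` and value 300, whereas assigning `200 + 100` back into an 8-bit SystemVerilog signal would quietly wrap to 44.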
I worked on hardware for something akin to an FPGA at a much coarser granularity (kind of like coarse-grained reconfigurable arrays) — close enough that you have to adapt tools like place-and-route to compile for the hardware. Programming for it was mostly done in pretty vanilla C++, with some extra intrinsics thrown in. This C++ came close enough to handcoded performance that many people didn't even bother trying to tune their applications by hand-coding in the assembly-ish syntax.
This helped bolster my opinion that FPGAs aren't really the answer that most people are looking for, and that there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU.
> [...] there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU
I think CGRAs are really cool but they're even more niche, and I suspect your original point about GPUs eating everyone's lunch applies particularly strongly to CGRAs. The point is well taken, though, and I don't necessarily disagree.
I think things are about to change thanks to yosys and other open source tools.
> VHDL (bad) or Verilog (worse,
VHDL and its software counterpart Ada are very well thought out and great to use once you get to know them (and understand why they are the way they are). Yeah, they're a bit verbose, but I prefer a strong base to syntactic sugar.
As a professional FPGA developer: VHDL (and Verilog even more so) are bad [1] at what they're used for today: implementing and verifying digital hardware designs. In fact, they're at most moderately tolerable at what they were originally intended for: describing hardware.
[1] They're not completely terrible – a completely terrible idea would be to start with C and try to bend it so that you can design FPGAs with it...
Yup. I know HLS has gotten a lot better recently but my impression is that, somewhat like fusion, HLS as a first-class design paradigm is always a decade away.
> FPGA tools are just some of the lowest quality garbage out there
Absolutely. I think the problem is vendors see FPGA tooling as a cost center and a necessary evil in order to use their real products, the chips themselves. Users are also highly technical and traditionally have no alternative, so (mostly) working but poor-quality software is simply pushed out the door. "They'll figure it out".
Finally, to expand on the difficulties imposed by physical constraints, I think another huge blocker to wide adoption is that FPGAs are physically incompatible. I cannot take a bitstream compiled for one FPGA and program it to any other FPGA. Hell, I can't even take a bitstream compiled for one FPGA and use that bitstream for any other device in the same device family. Without some kind of standardized portability, FPGAs will remain niche devices used only for very specific applications.
Isn't that like dumping the memory contents of one PC, reinjecting them into another with a different RAM layout and different devices, and then complaining that the OS and programs can't continue running? Is that a sane expectation?
There are upstream formats targeting FPGAs that can be shared, although, yes, redoing place and route is slow.
Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
Alternatively, would building whole images for many families of FPGA make sense? It feels like programs distributed as binaries for p OS variants times q hardware architectures, each combination producing a different binary... as a random example, https://github.com/krallin/tini/releases/tag/v0.19.0 has 114 assets.
No. Bitstream formats are not in any way compatible across devices. Because timing is a factor, even if you had the same physical layout of LUTs and routing, it's unlikely that your design would work.
(From parent)
> use that bitstream for any other device in the same device family
Not at the bitstream level. However, you can take a place&routed chunk of logic and treat it as a unit. You can replicate it (without repeating P&R), move it around, copy it onto other devices in the same family. This is super useful as most FPGA applications have large repeating structures, but P&R doesn't know that it's a factorable unit. It'll repeat P&R for each instance and you'll get unpredictable timing characteristics.
> Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
> would building whole images for many families of FPGA make sense
You can license libraries that are a P&R'd blob and drop them into your design. There's no easy way to make this generalizable across devices without shipping the original RTL, and conversion from RTL->bitstream is where most of the pain lies.
Even worse: it's more like that, plus extracting the raw microarchitectural state of a CPU, serializing it in a somewhat arbitrary way, shoving that blob into a different CPU, and still expecting everything to continue running.
I'm not necessarily complaining, just pointing out this significant difference WRT software programs running on CPUs.
> There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.
Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?
> Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
Like you say, at the very least you will need to redo place and route. But actually the problem is much worse than that. Different FPGAs have different physical resources: not just differing amounts of logic area, but different amounts of block RAM, different DSP blocks in varying numbers, high-speed transceivers, etc. This necessitates making different design trade-offs. Simply shoehorning the same design into different FPGAs, even if it were kind of possible, would not work well.
> Alternatively, would building whole images for many families of FPGA make sense?
Currently I think that's the only real option. But the extreme overhead, duplication of effort and maintenance burden make it very unattractive.
My napkin sketch is some sort of generalized array of partial reconfiguration regions with standardized resources in each region. Accelerator applications can distribute versions targeting different numbers of regions (e.g. one version for FPGAs supporting up to 8 regions, one for FPGAs supporting up to 16 regions, etc.). The FPGA gets loaded with a bitstream supporting a PCIe endpoint and management engine, and some sort of crossbar between regions. At accelerator load time, previously mapped, placed, and routed logical regions used in the application are placed onto actual partial reconfiguration regions and connections between regions are routed appropriately. The idea is to pre-compute as much of the work as possible, leaving a lower dimension problem to solve for final implementation. Timing closure and clock management are left as exercises for the reader :P.
Yes to a degree, but another part of the problem is the "physical constraints" you mention. FPGA tooling has to solve multiple hard problems, on the fly, at large scale (some of the latest chips are edging up to 10M logic elements). Unfortunately for the FPGA industry, I think that this is unavoidable - though a lot of interesting work is being done around partial reconfiguration, which should allow for users to work with smaller designs on a large chip.
I think partial reconfiguration is really sexy, but it's been around for a long time. What's new and exciting there? Genuinely curious.
There's a subtle point here: Verilog/SystemVerilog and VHDL are also just not powerful languages. While they support parameterization, they lack polymorphism, object-oriented programming (excluding SV's simulation-only constructs), functional programming, etc.
Your point about the abstraction being different is well taken: hardware description languages describe circuits, and programming languages describe programs. However, it's exceedingly unfortunate that the industry is stuck in a rut of such weak languages, and trying to explain that weakness to hardware engineers, who haven't seen anything else, runs into the "Blub paradox" (e.g., a programmer who only knows assembly can't evaluate the benefits of C++). [^1]
Edit: Disclaimer, I'm well aware of the pros and cons of these paradigms in software development and use them plenty
Polymorphism makes it way easier to build hardware that can handle any possible data type. Things like queues and arbiters beg for type parameters (you should be able to enqueue any data). Without polymorphism you can make something parameterized by data width (and then flatten/reconstruct the data), but it's janky and you lose any concept of type safety (as you're "casting" to a collection of bits and then back).
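For illustration, here's roughly what a type-parameterized queue looks like in a language that has this feature; a C++ sketch (made-up names, behavioral model only, not synthesizable), standing in for what SystemVerilog can't express:

```cpp
#include <cassert>
#include <deque>

// A sketch of the kind of type-parameterized queue the comment asks for:
// enqueue any T, keep type safety, no flattening to a bit vector and back.
// Depth is a compile-time parameter, like an HDL module parameter.
template <typename T, int Depth>
class Fifo {
    std::deque<T> q;
public:
    bool enq(const T& v) {                     // refuses when full,
        if ((int)q.size() >= Depth) return false;  // like a ready/valid handshake
        q.push_back(v);
        return true;
    }
    bool deq(T& out) {                         // refuses when empty
        if (q.empty()) return false;
        out = q.front();
        q.pop_front();
        return true;
    }
};
```

The point is that `Fifo<Packet, 16>` and `Fifo<int, 4>` come from one definition, and trying to enqueue the wrong type is a compile error — whereas the width-parameterized SystemVerilog equivalent accepts any bits that happen to fit.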
There was some interesting work out of the University of Washington [^1] to build a "standard template library" using SystemVerilog. Polymorphism was identified as one of the shortcomings that made this difficult (Section 5: "A Wishlist for SystemVerilog"). [^2]
[^1]: https://github.com/bespoke-silicon-group/basejump_stl [^2]: http://cseweb.ucsd.edu/~mbtaylor/papers/BaseJump_STL_DAC_Sli...