(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)
The rebuttal to your objection is always tools like HLS (High-Level Synthesis), or in plain English, "C to HDL". (FPGAs are 'programmed' in one of two Hardware Description Languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages; they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
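As a rough software analogy of that model (this is illustrative C++, not real HDL — in actual hardware there is no "function call", every unit is physically present and always switching):

```cpp
#include <cassert>
#include <cstdint>

// Software analogy for combinational hardware: every operation's result
// exists simultaneously; a multiplexer merely selects which one is "valid".
uint32_t alu(uint32_t a, uint32_t b, int sel) {
    // All three "units" compute every cycle, whether their result is
    // needed or not...
    uint32_t sum  = a + b;
    uint32_t diff = a - b;
    uint32_t conj = a & b;
    // ...and the mux picks one. Nothing is "skipped" in hardware; an
    // if/else in HDL describes a mux, not a branch.
    switch (sel) {
        case 0:  return sum;
        case 1:  return diff;
        default: return conj;
    }
}
```

The mental shift is that the `switch` isn't control flow at all — it's a description of wiring. That's the part that maps poorly onto how programmers read code.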
This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.
Thus, FPGAs languish.
A weak spot of the high-end commercial HLS tools (Catapult, Stratus) is interfacing with the rest of the hardware world, and in particular how the clock is handled: either entirely by you (SystemC) or only vaguely (Catapult's ac_channel). Having HLS handle pipeline scheduling is great, but sometimes you want to break through and do something with the clock. Want to write a memory DMA in HLS? Talk AXI? Build a NoC in HLS? Build even something like a CPU in HLS? Interface with "legacy" RTL blocks, whether combinational, or a straight pipeline, or with ready/valid interfaces, or whatever? These things are just barely feasible at present with the commercial HLS tools, but very, very hard (I've tried).
If they want to stick with it, I think C++11 could provide a superior type-safe metaprogramming facility for building hardware, compared to the extremely primitive metaprogramming and lack of type-safety notions in SystemVerilog, or to generators such as Chisel or the hand-written Perl/Python/Tcl/whatever ones in use at most companies. But sometimes you need to break down and do something with the clock, or interface with things that care about a clock, much as one would drop inline asm statements into C code. I want to be able to do that, yet not have to deal with the clock the 95% of the time when I don't really need to, which is where the generators fail (let the tool determine the schedule most of the time). HLS needs to sit between the two: not a generator (glorified RTL), but not "pretend you write untimed C++ all the time" (not hardware at all).
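To make the metaprogramming point concrete, here's a minimal sketch (hypothetical names, not any real HLS library) of the kind of width-tracked arithmetic C++11 templates allow — the compiler tracks bit growth in the type system, where SystemVerilog would silently truncate:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical width-tracked hardware integer. "Bits" and "add" are
// illustrative names, not a real library.
template <int W>
struct Bits {
    static_assert(W > 0 && W <= 64, "width out of range");
    uint64_t value;
    // Mask to W bits on construction, like a W-bit register would.
    explicit Bits(uint64_t v)
        : value(v & (W == 64 ? ~0ull : ((1ull << W) - 1))) {}
};

// Adding a W-bit and a V-bit value yields a max(W,V)+1-bit result: the
// carry bit is part of the *type*, so it cannot be dropped by accident.
template <int W, int V>
Bits<(W > V ? W : V) + 1> add(Bits<W> a, Bits<V> b) {
    return Bits<(W > V ? W : V) + 1>(a.value + b.value);
}
```

For example, `add(Bits<8>(200), Bits<8>(100))` has type `Bits<9>` and value 300, whereas assigning `200 + 100` back into an 8-bit SystemVerilog signal would quietly wrap to 44.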
I worked on hardware for something akin to an FPGA at a much coarser granularity (kind of like coarse-grained reconfigurable arrays) — close enough that you have to adapt tools like place-and-route to compile for the hardware. Programming for it was mostly done in pretty vanilla C++, with some extra intrinsics thrown in. This C++ came close enough to handcoded performance that many people didn't even bother trying to tune their applications by hand-coding in the assembly-ish syntax.
This helped bolster my opinion that FPGAs aren't really the answer that most people are looking for, and that there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU.
> [...] there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU
I think CGRAs are really cool but they're even more niche, and I suspect your original point about GPUs eating everyone's lunch applies particularly strongly to CGRAs. The point is well taken, though, and I don't necessarily disagree.
I think things are about to change thanks to yosys and other open source tools.
> VHDL (bad) or Verilog (worse,
VHDL and its software counterpart Ada are very well thought out and great to use once you get to know them (and understand why they are the way they are). Yeah, they're a bit verbose, but I prefer a strong base to syntactic sugar.
As a professional FPGA developer: VHDL (and Verilog even more so) are bad [1] at what they're used for today: implementing and verifying digital hardware designs. In fact, they're at most moderately tolerable at what they were originally intended for: describing hardware.
[1] They're not completely terrible – a completely terrible idea would be to start with C and try to bend it so that you can design FPGAs with it...
Yup. I know HLS has gotten a lot better recently but my impression is that, somewhat like fusion, HLS as a first-class design paradigm is always a decade away.
> FPGA tools are just some of the lowest quality garbage out there
Absolutely. I think the problem is vendors see FPGA tooling as a cost center and a necessary evil in order to use their real products, the chips themselves. Users are also highly technical and traditionally have no alternative, so (mostly) working but poor-quality software is simply pushed out the door. "They'll figure it out".
Finally, to expand on the difficulties imposed by physical constraints, I think another huge blocker to wide adoption is that FPGAs are physically incompatible. I cannot take a bitstream compiled for one FPGA and program it to any other FPGA. Hell, I can't even take a bitstream compiled for one FPGA and use that bitstream for any other device in the same device family. Without some kind of standardized portability, FPGAs will remain niche devices used only for very specific applications.
Isn't that like dumping the memory contents of one PC, reinjecting them into another with a different RAM layout and different devices, and then complaining that the OS and programs can't continue running? Is that a sane expectation?
There are upstream formats targeting FPGAs that can be shared, although, yes, redoing place and route is slow.
Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
Alternatively, would building whole images for many families of FPGA make sense? It feels like programs distributed as binaries for p OS variants times q hardware architectures, each combination producing a different binary... as a random example, https://github.com/krallin/tini/releases/tag/v0.19.0 has 114 assets.
No. Bitstream formats are not in any way compatible across devices. Because timing is a factor, even if you had the same physical layout of LUTs and routing, it's unlikely that your design would work.
(From parent)
> use that bitstream for any other device in the same device family
Not at the bitstream level. However, you can take a place&routed chunk of logic and treat it as a unit. You can replicate it (without repeating P&R), move it around, copy it onto other devices in the same family. This is super useful as most FPGA applications have large repeating structures, but P&R doesn't know that it's a factorable unit. It'll repeat P&R for each instance and you'll get unpredictable timing characteristics.
> Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
> would building whole images for many families of FPGA make sense
You can license libraries that are a P&R'd blob and drop them into your design. There's no easy way to make this generalizable across devices without shipping the original RTL, and conversion from RTL->bitstream is where most of the pain lies.
Even worse: it's more like that, plus extracting the raw microarchitectural state of a CPU, serializing it in a somewhat arbitrary way, shoving that blob into a different CPU, and still expecting everything to continue running.
I'm not necessarily complaining, just pointing out this significant difference WRT software programs running on CPUs.
> There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.
Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?
> Should manufacturers provide new formats closer to the final form that would still allow binaries to be adjusted, kind of like .a, .so, or even LLVM bitcode?
Like you say, at the very least you will need to redo place and route. But actually the problem is much worse than that. Different FPGAs have different physical resources: not just differing amounts of logic area, but different amounts of block RAM, different DSP blocks in varying numbers, high-speed transceivers, etc. This necessitates making different design trade-offs. Simply shoehorning the same design into different FPGAs, even if it were kind of possible, would not work well.
> Alternatively, would building whole images for many families of FPGA make sense?
Currently I think that's the only real option. But the extreme overhead, duplication of effort and maintenance burden make it very unattractive.
My napkin sketch is some sort of generalized array of partial reconfiguration regions with standardized resources in each region. Accelerator applications can distribute versions targeting different numbers of regions (e.g. one version for FPGAs supporting up to 8 regions, one for FPGAs supporting up to 16 regions, etc.). The FPGA gets loaded with a bitstream supporting a PCIe endpoint and management engine, and some sort of crossbar between regions. At accelerator load time, previously mapped, placed, and routed logical regions used in the application are placed onto actual partial reconfiguration regions and connections between regions are routed appropriately. The idea is to pre-compute as much of the work as possible, leaving a lower dimension problem to solve for final implementation. Timing closure and clock management are left as exercises for the reader :P.
Yes to a degree, but another part of the problem is the "physical constraints" you mention. FPGA tooling has to solve multiple hard problems, on the fly, at large scale (some of the latest chips are edging up to 10M logic elements). Unfortunately for the FPGA industry, I think that this is unavoidable - though a lot of interesting work is being done around partial reconfiguration, which should allow for users to work with smaller designs on a large chip.
I think partial reconfiguration is really sexy, but it's been around for a long time. What's new and exciting there? Genuinely curious.
There's a subtle point here: Verilog/SystemVerilog and VHDL are also just not powerful languages. While they support parameterization, they lack polymorphism, object-oriented programming (excluding SV's simulation-only constructs), functional programming, etc.
Your point about the abstraction being different is well taken: hardware description languages describe circuits, and programming languages describe programs. However, it's exceedingly unfortunate that the industry is stuck in a rut of such weak languages, and trying to explain that weakness to hardware engineers, who haven't seen anything else, runs into the "Blub paradox" (e.g., a programmer who only knows assembly can't evaluate the benefits of C++). [^1]
Edit: Disclaimer, I'm well aware of the pros and cons of these paradigms in software development and use them plenty
Polymorphism makes it way easier to build hardware that can handle any possible data type. Things like queues and arbiters beg for type parameters (you should be able to enqueue any data). Without polymorphism you can make something parameterized by data width (and then flatten/reconstruct the data), but it's janky and you lose any concept of type safety (as you're "casting" to a collection of bits and then back).
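For illustration, here's roughly what a type-parameterized queue looks like in a language that has this feature; a C++ sketch (made-up names, behavioral model only, not synthesizable), standing in for what SystemVerilog can't express:

```cpp
#include <cassert>
#include <deque>

// A sketch of the kind of type-parameterized queue the comment asks for:
// enqueue any T, keep type safety, no flattening to a bit vector and back.
// Depth is a compile-time parameter, like an HDL module parameter.
template <typename T, int Depth>
class Fifo {
    std::deque<T> q;
public:
    bool enq(const T& v) {                     // refuses when full,
        if ((int)q.size() >= Depth) return false;  // like a ready/valid handshake
        q.push_back(v);
        return true;
    }
    bool deq(T& out) {                         // refuses when empty
        if (q.empty()) return false;
        out = q.front();
        q.pop_front();
        return true;
    }
};
```

The point is that `Fifo<Packet, 16>` and `Fifo<int, 4>` come from one definition, and trying to enqueue the wrong type is a compile error — whereas the width-parameterized SystemVerilog equivalent accepts any bits that happen to fit.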
There was some interesting work out of the University of Washington [^1] to build a "standard template library" using SystemVerilog. Polymorphism was identified as one of the shortcomings that made this difficult (Section 5: "A Wishlist for SystemVerilog"). [^2]
[^1]: https://github.com/bespoke-silicon-group/basejump_stl [^2]: http://cseweb.ucsd.edu/~mbtaylor/papers/BaseJump_STL_DAC_Sli...