
foxhill · 12y ago · 0 points

> But yes, OpenCL with its basic C API is way behind what CUDA offers in terms of language support.

strange, language support is really the only thing that CUDA doesn't have over OpenCL. there are C++ (and Python, Java, various others) bindings for host code, at least. if you're looking to use device intrinsics in your kernel code (at the cost of portability) then blame nvidia for not exposing them (and for their lack of support for OpenCL in general).

> Maybe SPIR will fix it, but it remains to be seen if anyone on HPC will care.

yes, there are people doing HPC that care.


6 comments · 2 top-level

oneofthose · 12y ago

I guess "language support" was referring to the restrictions within the kernel language. And CUDA is better there: it supports templates, and a CUDA kernel call can be type-checked. With OpenCL you have to jump through hoops to get that. While some projects have managed to do it, it could be better still.
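To make the "hoops" concrete, here is a minimal sketch of one common workaround: generating the OpenCL C kernel source as text on the host, once per element type. The helper name `make_axpy_source` and the kernel naming scheme are hypothetical, chosen only for illustration; no particular project is known to do it exactly this way.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds OpenCL C source for y = a * x + y,
// specialised for one element type by plain text substitution.
std::string make_axpy_source(const std::string& type) {
    assert(!type.empty());
    std::string src;
    // On OpenCL 1.0/1.1 devices, double requires the cl_khr_fp64 extension.
    if (type == "double")
        src += "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n";
    src += "__kernel void axpy_" + type + "(" + type + " a,\n";
    src += "    __global const " + type + "* x,\n";
    src += "    __global " + type + "* y) {\n";
    src += "    size_t i = get_global_id(0);\n";
    src += "    y[i] = a * x[i] + y[i];\n";
    src += "}\n";
    return src;
}
```

The returned string would then be passed to the OpenCL runtime (for example via `clCreateProgramWithSource`). Because the kernel only exists as text at C++ compile time, the host compiler cannot check that the arguments passed later match the kernel signature.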

What bugs me about OpenCL is the intentional vagueness of the specification that gives every implementer the freedom to do whatever they want with the result that performance portability is often difficult to achieve.

foxhill (OP) · 12y ago

templates are not something that (outside of simple uses) you'd want to use in your kernels. regardless, nvidia should be pushing their improvements through to OpenCL by exposing extensions. they might get adopted into the core profile.

> What bugs me about OpenCL is the intentional vagueness of the specification that gives every implementer the freedom to do whatever they want with the result that performance portability is often difficult to achieve.

well, that flexibility is required for OpenCL to be meaningful. that's where the variation in the hardware platforms exists. it's what differentiates compute devices. if that vagueness wasn't there, then we couldn't have things like OpenCL on FPGAs (altera, xilinx)

as for your statement on performance portability, perhaps that is an issue (but that's entirely dependent on the type of problem you're trying to compute). but something i don't understand is this:

you could have picked a proprietary API to do your compute. but say you choose CL. you optimize for your hardware, then what do you know - it's not really that fast on other hardware. but you're entirely overlooking the biggest boon here - your code ran on the other hardware in the first place. getting performant code is now only a matter of optimizing for that piece of hardware.

you could argue that's entirely too complicated, but that's what we have been doing already with our regular C/C++ programs (SSE/AVX/SMP...)

oneofthose · 12y ago

Templates are an essential tool to write type-independent algorithms. They enable meta-programming, an invaluable tool to provide flexible yet efficient active libraries to users. They allow automated kernel-space exploration. So templates are exactly what you want.
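As a rough illustration of the two uses named above (type-independent algorithms and kernel-space exploration), here is a plain C++ sketch. It runs on the host only; in CUDA the same shape would be a `template <typename T> __global__` kernel. The function `axpy` and the `Unroll` parameter are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel body: y = a * x + y.
// T is the element type; Unroll is a compile-time tuning knob, so each
// (T, Unroll) pair is a distinct variant that an autotuner could time.
template <typename T, std::size_t Unroll = 1>
void axpy(T a, const std::vector<T>& x, std::vector<T>& y) {
    static_assert(Unroll > 0, "Unroll must be at least 1");
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::size_t i = 0;
    // Main loop: Unroll elements per iteration, with a fixed inner trip count.
    for (; i + Unroll <= n; i += Unroll)
        for (std::size_t u = 0; u < Unroll; ++u)
            y[i + u] = a * x[i + u] + y[i + u];
    // Remainder loop for the last n % Unroll elements.
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling `axpy<float, 2>(2.0f, x, y)` and `axpy<double, 4>(2.0, x, y)` instantiates two separately compiled variants from one definition, and passing a `std::vector<double>` where `T` is `float` fails at compile time rather than at run time.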

I understand the need for a standard that supports various different architectures, even architectures that might not exist yet. I guess I just dislike the way they did it. Compared to other standards (that also leave various things to the implementer), I think they did a poor job. They should have defined the semantics and the types better. The entire buffer mapping for example is a huge mess. Nvidia went ahead and fitted pinned memory in there somewhere. Others didn't, with the result that the meaning of the code changes completely depending on which library you link against.

I'm not arguing against OpenCL here, I'm saying they could do even better. It should not be too much effort either. And if companies like Apple and Google had chimed in, we would have a pretty awesome OpenCL standard and implementations today.

As for your argument about hand-optimization: C++ library implementers [0,1] (and compiler vendors probably too) found abstractions, tricks and tools that give performance portability today. They are of course domain-specific but it is possible.

[0] https://github.com/MetaScale/nt2

[1] http://eigen.tuxfamily.org/


pjmlp · 12y ago

So where are those Fortran and C++ compilers for OpenCL kernel code?

foxhill (OP) · 12y ago

at the kernel level, fortran isn't really that different to C. it has a power operator. there is no for loop. you certainly will not be reading from files inside a kernel, so.. why do you want to use fortran? i can't tell you who, but one of the big OpenCL vendors is actually working on fortran OpenCL kernels, as a direct result of SPIR.

if you want C++ in your kernels.. well, you're going to have a bad time if you want performance.

MaxBarraclough · 12y ago

> why do you want to use fortran?

Performance, and legacy code?

> if you want C++ in your kernels.. well, you're going to have a bad time if you want performance.

Not necessarily, no. Only if you abuse it.

