The target image processing algorithms are low level. Think "How can I write the fastest 3x3 box blur for Sandy Bridge and later architectures", not "How can I speed up my face detector".
Examples of scheduling decisions that Halide deals with: process the image in 4x4, 8x8, or 16x16 chunks? Use a temporary buffer for an intermediate processing stage, accepting worse cache behavior in exchange for reusing results that are needed more than once? Use a sliding window the size of a couple of rows as a compromise?
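To make the tradeoff concrete, here is a toy sketch in plain C++ (not Halide) of two "schedules" for the same separable 3x3 box blur: one materializes the horizontal pass in a full-size temporary buffer, the other fuses the two passes and recomputes values. Both produce identical output; they differ only in memory use and redundant work.

```cpp
#include <cassert>
#include <vector>

// Toy separable 3x3 box blur (horizontal pass, then vertical pass) on a
// W x H image with clamp-to-edge borders. Plain C++, not Halide; it only
// illustrates the scheduling tradeoff described above.
using Image = std::vector<std::vector<int>>;

static int at(const Image& im, int x, int y) {
    int W = (int)im[0].size(), H = (int)im.size();
    x = x < 0 ? 0 : (x >= W ? W - 1 : x);   // clamp to edge
    y = y < 0 ? 0 : (y >= H ? H - 1 : y);
    return im[y][x];
}

static int hblur(const Image& im, int x, int y) {
    return (at(im, x - 1, y) + at(im, x, y) + at(im, x + 1, y)) / 3;
}

// Schedule A: materialize the whole horizontal pass in a temporary buffer.
// Every hblur value is computed exactly once, but the temporary is as big
// as the image, which hurts cache behavior on large images.
Image blur_full_temp(const Image& in) {
    int H = (int)in.size(), W = (int)in[0].size();
    Image tmp(H, std::vector<int>(W)), out(H, std::vector<int>(W));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            tmp[y][x] = hblur(in, x, y);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            out[y][x] = (at(tmp, x, y - 1) + at(tmp, x, y) + at(tmp, x, y + 1)) / 3;
    return out;
}

// Schedule B: fuse the passes and recompute the three hblur values each
// output pixel needs. No temporary buffer and great locality, but each
// hblur value is recomputed up to 3x. (A sliding window a few rows tall
// would be the compromise between these two extremes.)
Image blur_fused(const Image& in) {
    int H = (int)in.size(), W = (int)in[0].size();
    Image out(H, std::vector<int>(W));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            out[y][x] = (hblur(in, x, y - 1) + hblur(in, x, y) + hblur(in, x, y + 1)) / 3;
    return out;
}
```

In C++ you have to pick one of these structures and bake it into the loop nest; Halide's point is that the choice becomes a one-line schedule change.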
This kind of work is the difference between a laggy and an interactive Gaussian Blur radius slider in Photoshop.
The Halide code shown at the talk was replacing really hard-to-read code with lots of SIMD compiler intrinsics. Dozens of lines of code doing something that would be 5 lines in a naive implementation. With Halide, it's almost as readable as the naive version because the scheduling stuff is separated from the algorithm.
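For reference, the canonical blur example from the Halide paper and website shows what that separation looks like (this requires the Halide library and `Halide.h`; it is not standalone C++):

```cpp
Func blur_3x3(Func input) {
    Func blur_x, blur_y;
    Var x, y, xi, yi;

    // The algorithm: what is computed. No loop order, no tiling,
    // no storage decisions, no vectors.
    blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1)) / 3;

    // The schedule: how it is computed. Vectorized 256x32 tiles
    // across threads, with just the slice of blur_x each tile needs
    // computed inside the tile loop.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    return blur_y;
}
```

Retargeting to a different architecture means rewriting the two schedule lines, not the algorithm.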
For an application like Photoshop, this is a big win because they will never choose code readability over performance. Performance wins every time. If they can get the same performance with readable code, they are very happy.
GPU code generation falls naturally out of the scheduling intelligence and restricted problem domain.
I have never used Halide, so I do not intend to endorse it, but this line of inquiry is absolutely useful for a certain niche of programming.
The long-term plan is basically to write an optimizer that can intuit good schedules, with the caveat that they admit they don't yet have a good idea of how to do this. I believe they can currently run some ML algorithms that churn through a bunch of different schedules and find the one best suited to a problem, but it's rather brute-force and slow at the moment.
That said, the conceptual separation of the algorithm from the scheduling also lets you tune schedules by hand much more easily than would otherwise be possible.
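The brute-force flavor of autotuning described above can be sketched in a few lines of plain C++ (this is a toy illustration, not Halide's actual tuner): run each candidate schedule, time it, keep the fastest. Here the "schedule knob" is the tile size of a blocked matrix transpose.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Tiled transpose of an n x n matrix stored row-major; `tile` is the
// schedule parameter being tuned.
void transpose_tiled(const std::vector<int>& in, std::vector<int>& out,
                     int n, int tile) {
    for (int by = 0; by < n; by += tile)
        for (int bx = 0; bx < n; bx += tile)
            for (int y = by; y < std::min(by + tile, n); ++y)
                for (int x = bx; x < std::min(bx + tile, n); ++x)
                    out[x * n + y] = in[y * n + x];
}

// Brute-force tuner: time each candidate tile size and keep the fastest.
// Real autotuners search a vastly larger schedule space and amortize the
// cost; this just illustrates the "churn through schedules and measure"
// idea from the comment above.
int pick_best_tile(int n, const std::vector<int>& candidates) {
    std::vector<int> in(n * n), out(n * n);
    for (int i = 0; i < n * n; ++i) in[i] = i;

    int best_tile = candidates[0];
    double best_time = 1e300;
    for (int tile : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 5; ++rep)          // a few reps to smooth noise
            transpose_tiled(in, out, n, tile);
        double dt = std::chrono::duration<double>(
                        std::chrono::steady_clock::now() - t0).count();
        if (dt < best_time) { best_time = dt; best_tile = tile; }
    }
    return best_tile;
}
```

The expensive part is that every candidate has to be compiled and run on real inputs, which is why this approach is slow compared to an optimizer that could predict good schedules directly.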
This approach makes a lot of sense: abstract away the annoying low-level architecture details. They have a lot of targets, which is fantastic: x86/SSE, ARM v7/NEON, CUDA, Native Client, and OpenCL. Let the architecture specialist worry about the architecture specifics. The disadvantage: the achieved performance then depends on the quality and wisdom of the compiler. But once something is optimized for a specific architecture, every user benefits.
How they do it on the compiler end of things, I'm not sure. There are a number of techniques; among the simpler is auto-tuning. There is also a newer term, "copious parallelism" [0], which acknowledges that to achieve performance portability across platforms, algorithms must offer explicit ways of parametrizing and tuning them for different platforms. I think this is the right concept, but I believe it could be implemented within the compiler. The domain specialist should not have to think about those things.
[0] http://www.hpcwire.com/2014/01/09/future-accelerator-program...
I have the print version of the book in my collection. It is interesting.
Now, superoptimization is not a particularly new topic either (see e.g. [1]), but Halide demonstrated that improvements can be significant in this domain. Far more impressive than, e.g., superoptimization applied to generic assembly code (hasn't really worked so far), query optimization, etc.
Now, it's certainly possible to bolt general-purpose programming features onto a DSL (e.g. MATLAB or R), but the approach (also used by Halide) of embedding the DSL into a general-purpose language (here: C++) is, in my eyes, far better suited to scaling up from experimental code to something that can become part of a larger system (think OpenCV in robotics, etc.).
A fair bit of work has recently been done on making Halide work well on Windows. We are leaning toward requiring newer versions of Visual Studio, however. (E.g. there are advantages to being able to rely on C++11 support when generating Halide code.)
I do not see any issues filed on things not working with Intel's compiler. Feel free to file one. Halide is an open source project and both bug reports and patches help a great deal.
(1) Halide can operate as either a Just In Time (JIT) compiler or an Ahead Of Time (AOT) compiler. Generally AOT is much more convenient for deploying in production. The end result of compiling a single filter in Halide is an object file and a header file declaring a single C-linkage function, with all of its parameters passed in on the call. The minimal runtime is included in the object file, and dependencies on libraries are also quite minimal: things like libc and pthreads. Functionality provided by the runtime can easily be overridden by providing one's own definitions for various functions, since weak linkage is used.
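Roughly, the generated header for a hypothetical AOT-compiled filter named `brighten` looks like this (a sketch based on Halide's documented AOT workflow; `brighten` and its parameters are made-up names, and the exact buffer struct has changed across Halide versions):

```cpp
// Generated header (sketch): one plain C-linkage entry point per filter.
// Buffers are passed as plain structs, scalar parameters by value, and a
// nonzero return value signals an error reported through the runtime.
extern "C" int brighten(halide_buffer_t *input,   // input image
                        uint8_t offset,           // scalar parameter
                        halide_buffer_t *output); // output image
```

Because the interface is just a C function plus the object file, the filter can be called from C, C++, or anything with a C FFI, with no Halide installation on the deployment machine.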
There is also a set of Python bindings, which can be used to write Halide code which is then AOT compiled. The AOT compiled object file can be linked into pretty much anything that can handle C functions either via direct linkage or a foreign function interface.
Here's the most famous Halide in Turkey: http://en.wikipedia.org/wiki/Halide_Edip_Ad%C4%B1var
I want to write in OpenCV, but I also want the compiler to fix everything across function boundaries :-)
If you look at Google RenderScript, Microsoft C++ AMP, NVIDIA CUDA, OpenCL and whatnot, you'll realize that they all do essentially the same thing; the differences are only at the syntax level (with a few caveats, but those are relatively minor). Some are more cumbersome to use than others.
Halide looks neat to use syntax-wise, but it doesn't seem to make the actual algorithms any easier to write. You still have to do the same mental work at the same level as you do with OpenCL.
All of these are to each other what Fortran, Pascal and C are to each other: the same basic idea in a different package. I'm waiting for the first system that is the equivalent of C++, something that really brings new concepts to the table instead of just a different syntax.
The end result is not like OpenCL, CUDA/PTX, or RenderScript. (All of which we regard as suitable target languages for Halide. PTX (CUDA's low-level assembly language) and OpenCL are currently supported.) As a concrete example, there is no explicit memory allocation in Halide and loops are often implicit. The overall character of Halide is that of a functional programming language.
As another example, segmentation violations due to errors in Halide code are impossible, barring bugs in the compiler implementation or errors in user-provided C++ code, which Halide has no control over. Some bounds errors are reported at compile time, and the rest turn into (efficient) assertion checks at runtime.
Schedules do get pretty low-level and writing them is difficult. Algorithms are generally a fair bit easier to write and while there is plenty to be done in improving the language, it is already a step up in productivity over e.g. OpenCL or even straight C++ for a lot of tasks within the domain of data parallel processing.
> I'm waiting for the first system that is equivalent of C++.
Not gonna happen anytime soon. C++ is much more than just a programming language; it is an entire "ecosystem". You have toolchains, libraries, drivers, APIs, and existing software stacks, all written in C++ and able to interface with C/C++ directly, i.e. no translation layer needed. You're not just going to replace that with a new language.

Sure, Halide may be seen as just syntactic sugar (much like quite a few bits and pieces of C++11), but it actually provides a different level of abstraction than, say, OpenCL, MPI or OpenMP, where you are very specific about e.g. the level of concurrency (which heavily impacts the design of your algorithm), while Halide tries to almost completely separate algorithm and scheduling.
It's nice that Halide attempts to separate the algorithm and the scheduling, but as of now that's possible in OpenCL too. The implementations are just so bad at it that it's not really useful.
Obligatory HN reaction: I'd wait it out for the first Lisp to appear.
This is about optimizing memory locality and parallelism. It's not about getting access to the underlying hardware, as with Microsoft C++ AMP, NVIDIA CUDA, or OpenCL.
If you write a kernel which uses neither local memory nor the local ID, you get a kernel that is effectively a Halide pipeline stage: the points can be evaluated in arbitrary order (the spec says it is implementation-defined).
If we look at the blur example in the video, the OpenCL implementation is also free to effectively merge the stages as Halide does, because the spec only requires that whatever the previous kernel invocation has written must be visible to the next kernel. Nothing more and nothing less.
Sure, OpenCL allows you to fiddle around with low-level details, but it also allows you to write completely platform-neutral code that it is then the platform's responsibility to optimize.
I do agree that Halide makes it easy to explore the different scheduling options; that's something OpenCL is not capable of. In OpenCL you either do it manually or leave it entirely up to the implementation.