The tl;dr of Haxl: what if you could describe accessing a data store (a la SQL) and have the compiler and library work together to "figure out" the most efficient way to perform queries, including performing multiple queries in parallel? That's what Haxl does: it lets you specify the "shape" of your query, the type checker verifies its correctness, and the library executes it in parallel for you, without the developer needing to know anything about synchronizing access.
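For a flavor of the trick, here's a toy sketch (emphatically not the real Haxl API, just the kernel of the idea): an Applicative whose <*> merges the pending requests of both sides, so independent fetches written as ordinary Applicative code get issued as a single batch.

```haskell
-- Toy sketch of Haxl's core trick (NOT the real Haxl API).
-- A computation is either finished, or blocked on some requests
-- (here just user ids) plus a continuation expecting their results.
data Fetch a = Done a | Blocked [Int] ([String] -> Fetch a)

instance Functor Fetch where
  fmap f (Done a)       = Done (f a)
  fmap f (Blocked rs k) = Blocked rs (fmap f . k)

instance Applicative Fetch where
  pure = Done
  Done f       <*> x              = fmap f x
  Blocked rs k <*> Done x         = Blocked rs (\res -> k res <*> Done x)
  -- The key case: both sides are blocked, so merge their requests
  -- into one batch and split the results back out afterwards.
  Blocked rs k <*> Blocked rs' k' =
    Blocked (rs ++ rs')
            (\res -> let (a, b) = splitAt (length rs) res
                     in k a <*> k' b)

-- One primitive request: look up a user name by id.
getUser :: Int -> Fetch String
getUser uid = Blocked [uid] (\(name:_) -> Done name)

-- The "scheduler": run round by round, issuing all pending
-- requests of a round as one batch.
run :: ([Int] -> IO [String]) -> Fetch a -> IO a
run _     (Done a)       = return a
run batch (Blocked rs k) = do
  putStrLn ("one batched round: " ++ show rs)
  res <- batch rs
  run batch (k res)

-- A fake data source standing in for a real backend.
fakeDb :: [Int] -> IO [String]
fakeDb = return . map (\i -> "user" ++ show i)

main :: IO ()
main = do
  -- Two independent fetches, written as plain Applicative code:
  pair <- run fakeDb ((,) <$> getUser 1 <*> getUser 2)
  print pair  -- prints ("user1","user2") after a single round [1,2]
```

Both lookups go out in a single round even though the code never mentions batching. The real Haxl generalizes the request type per data source, adds caching and deduplication, and overlaps calls to different sources, but this is the core mechanism.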
Here's a link to their paper (PDF): http://www.haskell.org/wikiupload/c/cf/The_Haxl_Project_at_F...
* - I am not sure if he's still committing, or if he's only doing application development. His accomplishments in Haskell land, though, are many.
Edited: I removed my comment about GitHub issues, seems it's a known problem. :)
[1] http://chimera.labs.oreilly.com/books/1230000000929
[2] http://www.serpentine.com/blog/2014/03/18/book-review-parall...
I haven't used databases much, but don't most SQL implementations already "figure out" the most efficient way to perform queries? Can't most implementations already perform queries in parallel?
http://www.haskell.org/haskellwiki/ZuriHac2014#Talk_by_Simon...
As said by @nbm, we also have a blog post up: https://code.facebook.com/posts/302060973291128/open-sourcin....
(I appreciate that migrating code that already works to a new language often just introduces bugs for no gain, so please don't take my questions as trying to dig up dirt or anything. I'm genuinely just curious.)
We previously had a custom DSL, and it outgrew its DSL-ness. The DSL was really good at one thing (implicit concurrency and scheduling IO) and bad at everything else (CPU, memory, debugging, tooling). The predecessor was wildly successful and created new problems. Once all those secondary concerns became first order, we didn't want to start building all this ecosystem stuff for our homemade DSL. We needed to go from DSL to, ya know, an L. So the question is which...
If you understand the central idea of Haxl, I don't know of any other language that would let you do what Haxl in Haskell does. The built-in language support for building DSLs (hijacking the operators, including applicative/monadic operations) -really- shines in this case. I would -love- to see Haxl-like implicit concurrency in other languages that feels as natural and concise. Consider that a challenge. I thought about trying to do it in C++ for edification/pedagogical purposes, but it's an absolutely brutal mess of templates and hackery. There may be a better way, though.
It contains a lot more information about the problem it was originally created to solve, and potential other use cases.
I am really interested in seeing how you solve problems for distributed systems with Haxl and how query sharding is handled etc..
I wasted a whole day a few weeks ago looking for Haxl online, only to find out that it hadn't been released yet. The release really makes me happy :)
Query sharding is at the data source layer, which Haxl doesn't delve into. It's up to each data source integration with Haxl to do the appropriate routing/etc.
Hope you find it useful!
Is it like a query engine, where you work with the entire query up-front, apply transforms and build a query plan?
Or is it more like an event loop, where you run as far as you can until the code blocks on IO, batch up and send all the pending IO requests, and run further when the tasks you're blocked on resolve?
That said, the way it currently works is more like the first. You can think of the entire Haxl run (program) as an AST that is handed to the executor. It expands as much of the AST as possible (anything that's not IO), and anywhere it needs IO it enqueues those requests to be scheduled. Once it has explored as much as possible, it aggressively schedules the IO (deduping, batching, and overlapping the calls). Once the results come back, it unblocks the AST where it can and repeats the process.
This isn't necessarily the optimal scheduling (as you point out, unblocking each part of the tree as each result comes in might be better). It was specifically designed to make it easy to experiment with this kind of thing later: since the concurrency is entirely implicit, the implementation is fully abstracted away.
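In rough pseudocode (illustrative names, not Haxl's internal API), each round of that process looks like:

```
runRounds program =
  explore program as far as possible without doing IO
  if finished:
      return the result
  else:
      reqs    = dedupe(all pending requests)
      batches = group reqs by data source
      results = issue all batches concurrently   -- overlap the calls
      substitute results into the blocked leaves
      runRounds (the now-unblocked program)
```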
Interpreted code was no longer cutting it for perf reasons, and any time you create your own language you end up reinventing the entire toolchain (debuggers, profilers, etc.). Haskell provides so much functionality in the language itself and has mature solutions to the other issues plaguing us in FXL, so it was a natural choice.
Why don't Haskell libraries on Hackage come with even a single example: no getting-started guide, no how-to, no quick start, nothing, really, just function declarations? This scares Haskell newbies.
Most libraries list a "Home Page" that more often than not includes more useful documentation (Haxl's, for example, has the things you've mentioned).
I concur that, most of the time, the documentation on Hackage isn't really sufficient, but I've found that for the most part I just use it to find the homepage, and then go there to read the actual documentation.
I agree that it would be nice if everything was all in one place.
I actually find "distilled reference with links to source" a fantastically valuable view. I've no objection to providing some sort of combined view, but let's not lose what we have in a quest for consolidation. I've no idea if that's what you meant or not, and don't mean to put words in your mouth of course, just expressing a concern.
http://hackage.haskell.org/package/pipes-4.1.2/docs/Pipes-Tu...
http://hackage.haskell.org/package/aeson-0.7.0.6/docs/Data-A...
Like all Facebook services, they are communicated with over the Thrift RPC system; they may have PHP (or any other language) clients, and they may talk to other services using Thrift (or occasionally other protocols), some of which may use PHP.
If you're asking a general question about batching requests in PHP, http://docs.hhvm.com/manual/en/hack.async.php may be informative.