fnk's use their argument names to define how to connect edges of the graph together. This means that fnk's are not modular -- they are inherently coupled with the particular graph they are used in by their argument names.
You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.
How does this work in practice at Prismatic? If there is a common computation you use across graphs, do you just pull the logic into a regular function in a library and then make a wrapper fnk in each graph that just calls the function? Making a wrapper fnk seems awkward / boilerplate-y to me.
Another thing I wanted to see from Graph but didn't notice any mention of was modular / composable graphs. If I had a simple univariate stats graph (like the example) that takes an input stream and produces aggregate metrics (count, sum, average, median, variance, etc.) I would want to re-use that all over the place as a sub-graph of other graphs. Again, you run into problems with the implicit-glue of using function arguments. How do you know what output names the sub-graph would use? Add a namespace/prefix? It will get messy quickly.
EDIT: I found one mention in a slide from your Strange Loop talk about nesting graphs. Is this transparent to the compiler (i.e. you write a fnk that runs a compiled graph inside of it) or can the compiler optimize and reason about the sub-graph? By composing graphs I am talking about the graph compiler being aware of and able to optimize the computation of the sub-graph. For example, lets say I have a sub-graph that calculates univariate stats and also generates 1TB of random numbers. If no graph node is hooked up to the 1TB of random number output, the Graph compiler should optimize it out and never run it. Is that possible with sub-graphs?
So, I think the implicit-glue of using argument names is a trade-off that may make it hard to improve Graph in the future. It would be very interesting to hear what you Prismatic folks think because you surely discussed the trade-offs while building it.
> You couldn't make a utility library of commonly used fnk's > because you would need their argument names to match up > with the graph you are using.
This is a great point. This is an issue that we've largely solved in our own codebase, but haven't quite polished yet -- look for a release soon.
Long-story short, we have a macro 'instance' that works on fnks and graphs, which looks like this:
(def graph-using-stats (graph :my-data ... :stats (instance stats-graph [my-data] {:xs my-data})))
For the fnk case, instance can be defined just as:
(defmacro instance ([f bind m] `(pfnk/comp-partial ~f (fnk ~bind ~m))))
This allows you to provide arguments to a subgraph or node fnk via arbitrary computations on input parameters or other node values, including the trivial case of renaming.
With this in place, you can always name your fnk arguments and Graph nodes whatever makes sense in this particular context, and then adapt the graph to a new circumstance using instance.
We use this strategy extensively across our codebase, and will provide lots more examples as we release more of our infrastructure. Please let me know if this makes sense, seems reasonable to you, or you have questions.
> If no graph node is hooked up to the 1TB of random number > output, the Graph compiler should optimize it out and never > run it. Is that possible with sub-graphs?
Yes, one of the design goals of Graphs is to make everything transparent, until the last second when you compile a Graph. Our current compilation strategies are pretty simple (and it's very simple to build your own), but right now you can lazily compile a hierarchical graph and any results that are unused (including in subgraphs) will not be executed.
https://github.com/prismatic/plumbing
and a literate test with lots of real examples:
https://github.com/Prismatic/plumbing/blob/master/test/plumb...
We're also working on other kinds of compilation, including 'direct' ones that compile directly to a single fn, and 'async' ones that handle asynchronous functions and are smarter about spinning up threads:
I work on a stream processing/workflow engine used by a few large physics experiments. It's declarative in nature too, but we use XML and let users write the glue. We also have the notion of persistent files and variables, although we don't compile and verify dependencies quite so much.
http://en.wikipedia.org/wiki/Dataflow_programming
http://www.altdevblogaday.com/2011/02/24/introduction-to-beh...
The first comes up anytime you want to make a signal processing chain more modular and composable(graphics and audio are the classic applications) and many of its concepts share space with FP theory. Graph demonstrates a implementation built around certain needs of web apps. Note that it seems like implementations vary a lot with the data types - audio processing, for example, may allow for cyclical feedback loops, and mainly distinguishes between two types of data - multi-channel PCM data(which may be split and combined between nodes) and parameter changes over time.
The second describes a form of concurrent finite states with good compositional properties - parent-child relationships that result in concurrency expressions passed back to parents(success, failure, in progress). Coroutines are comparable in power, but put emphasis on direct control of the concurrency, while BTs use modules of state + logic with pre-designed yielding points. (I think other finite state constructs have applications, too, BTs just happen to be my focus right now)
I currently believe that highly-concurrent applications can be abstractly architected as a combination of dataflow, behavior trees, and asynchronous events - each one of those covers a very distinct set of concepts surrounding concurrency problems, and they present natural boundary points with each other.
If by 'handle cycles', you mean 'throw an exception', then yes :). Graph models single-pass data flows, which must be acyclic, and the (graph) constructer and (*-compile) methods throw if you give them cyclic specifications. Do you have a particular use case in mind where cycles are desirable?
For example, I don't think protocols or multimethods could easily do any of the things mentioned in the second half of the post (execute part of a computation, auto-parallelize it, monitor the components, etc).
That said, there is actually one case where we use Graphs to solve a difficult polymorphism problem, which I discussed a bit in my Strange Loop talk. Our core newsfeed generation logic used to be composed of protocols/multimethods (we tried both), since each feed type (we have about 10) can define different variants of various steps in the pipeline (but most of the steps are the same). This worked fairly well, but as our system grew more and more complex, we found that there was still a lot of overhead, since the protocol had to contain all the steps that could change, leading to lots of extra complexity.
We've replaced all of this with Graph, where we just define an 'abstract' graph with the most common steps, and each feed type modifies the graph by changing or adding steps -- and we've found this way to be much simpler and easy to understand than what we had before.
This case is special, since it involves both a complex composition and polymorphism. Everywhere else in our codebase, we use (and love) protocols and multimethods for polymorphism.
It seems to be similar to something I've been thinking about and trying to build lately, so I'm definitely going to check this out.
I wanted to point out that the programming model behind Dryad and Storm represent computations as graphs, but that the programming model behind Pregel is for computations on graphs. It's a subtle difference in words, but an enormous difference in what you actually do.
If one has not stumbled upon specific use cases like disparate data sources, custom/widely varying transformation logic between these data sources and more then it might be difficult to appreciate your contribution. Thanks for this..even if not right away I hope to utilize it for our startup!