Graph: Abstractions for Structured Computation

(blog.getprismatic.com)

143 points · harper · 13y ago · 48 comments

37 comments · 14 top-level

w01fe · 13y ago · 7 in thread

I'm one of the authors of Graph, and I'll be here to answer questions and read comments. Please let us know what you think, and help us make plumbing and Graph better. Thanks!

rryan · 13y ago

I've poked around at Graph a little bit and it looks nice but I have a few nagging questions bothering me:

fnk's use their argument names to define how to connect edges of the graph together. This means that fnk's are not modular -- they are inherently coupled with the particular graph they are used in by their argument names.

You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.

How does this work in practice at Prismatic? If there is a common computation you use across graphs, do you just pull the logic into a regular function in a library and then make a wrapper fnk in each graph that just calls the function? Making a wrapper fnk seems awkward / boilerplate-y to me.

Another thing I wanted to see from Graph but didn't notice any mention of was modular / composable graphs. If I had a simple univariate stats graph (like the example) that takes an input stream and produces aggregate metrics (count, sum, average, median, variance, etc.) I would want to re-use that all over the place as a sub-graph of other graphs. Again, you run into problems with the implicit-glue of using function arguments. How do you know what output names the sub-graph would use? Add a namespace/prefix? It will get messy quickly.

EDIT: I found one mention in a slide from your Strange Loop talk about nesting graphs. Is this transparent to the compiler (i.e. you write a fnk that runs a compiled graph inside of it), or can the compiler optimize and reason about the sub-graph? By composing graphs, I mean the graph compiler being aware of and able to optimize the computation of the sub-graph. For example, let's say I have a sub-graph that calculates univariate stats and also generates 1TB of random numbers. If no graph node is hooked up to the 1TB of random number output, the Graph compiler should optimize it out and never run it. Is that possible with sub-graphs?

So, I think the implicit-glue of using argument names is a trade-off that may make it hard to improve Graph in the future. It would be very interesting to hear what you Prismatic folks think because you surely discussed the trade-offs while building it.

w01fe · 13y ago

> fnk's use their argument names to define how to connect edges of the graph together. This means that fnk's are not modular

> You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.

This is a great point. This is an issue that we've largely solved in our own codebase, but haven't quite polished yet -- look for a release soon.

Long story short, we have a macro, 'instance', that works on fnks and graphs and looks like this:

  (def graph-using-stats
    (graph
      :my-data ...
      :stats (instance stats-graph [my-data] {:xs my-data})))

For the fnk case, instance can be defined just as:

  (defmacro instance
    ([f bind m]
      `(pfnk/comp-partial ~f (fnk ~bind ~m))))

This allows you to provide arguments to a subgraph or node fnk via arbitrary computations on input parameters or other node values, including the trivial case of renaming.

With this in place, you can always name your fnk arguments and Graph nodes whatever makes sense in this particular context, and then adapt the graph to a new circumstance using instance.

We use this strategy extensively across our codebase, and will provide lots more examples as we release more of our infrastructure. Please let me know if this makes sense, seems reasonable to you, or you have questions.
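A rough sketch of the renaming idea in plain Clojure, for anyone who wants to play with it before the release. This is not Plumbing's `instance` or `comp-partial`: `node`, `adapt`, and `mean-node` are hypothetical names, and a node here is assumed to be just a function of one input map that records the keys it needs as metadata.

```clojure
;; Hypothetical sketch: not Plumbing's API.
;; A "node" is a function of one input map, tagged with the keys it requires.
(defn node [deps f]
  (with-meta f {:deps (set deps)}))

;; A reusable node that expects its data under :xs.
(def mean-node
  (node [:xs] (fn [{:keys [xs]}] (/ (reduce + xs) (count xs)))))

;; adapt wraps a node so it can be used where the inputs have other names.
;; new-deps are the keys the wrapper needs; bind-fn computes the inner
;; node's inputs from the outer input map.
(defn adapt [inner new-deps bind-fn]
  (node new-deps (fn [in] (inner (merge in (bind-fn in))))))

;; Use mean-node in a context where the data is called :my-data.
(def my-mean
  (adapt mean-node [:my-data] (fn [{:keys [my-data]}] {:xs my-data})))

(my-mean {:my-data [1 2 3 6]})   ; => 3
(:deps (meta my-mean))           ; => #{:my-data}
```

The wrapper advertises `:my-data` as its only dependency, so a library node written against `:xs` can be wired into a graph that uses different names.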

> If no graph node is hooked up to the 1TB of random number > output, the Graph compiler should optimize it out and never > run it. Is that possible with sub-graphs?

Yes, one of the design goals of Graphs is to make everything transparent, until the last second when you compile a Graph. Our current compilation strategies are pretty simple (and it's very simple to build your own), but right now you can lazily compile a hierarchical graph and any results that are unused (including in subgraphs) will not be executed.
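To make the "unused results are not executed" point concrete, here is a minimal sketch using `delay` from clojure.core. It is not Plumbing's lazy compilation: the graph representation (`{:deps [...] :f ...}`), `lazy-run`, and the `calls` log are all assumptions made for illustration.

```clojure
;; Hypothetical sketch: not Plumbing's lazy compilation.
;; A graph is a map from node key to {:deps [...] :f fn-of-input-map}.
(def calls (atom []))   ; records which nodes actually ran

(def g
  {:n      {:deps [:xs]
            :f (fn [{:keys [xs]}] (swap! calls conj :n) (count xs))}
   :mean   {:deps [:xs :n]
            :f (fn [{:keys [xs n]}] (swap! calls conj :mean) (/ (reduce + xs) n))}
   :random {:deps [:n]
            :f (fn [{:keys [n]}] (swap! calls conj :random) (repeatedly n rand))}})

;; lazy-run wraps every node in a delay. Forcing a node's delay forces only
;; the delays of its own dependencies, so unrequested nodes never execute.
(defn lazy-run [graph inputs]
  (let [cells (atom nil)
        value (fn [k] (if (contains? inputs k) (get inputs k) @(get @cells k)))]
    (reset! cells
            (into {}
                  (for [[k {:keys [deps f]}] graph]
                    [k (delay (f (zipmap deps (map value deps))))])))
    @cells))

(def result (lazy-run g {:xs [1 2 3 6]}))
@(:mean result)   ; => 3
@calls            ; => [:n :mean]  (:random never ran)
```

Because a sub-graph's nodes would be delays like any others in this scheme, an unrequested output inside a sub-graph is skipped the same way.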

w01fe · 13y ago

Also, here's a direct link to the source:

https://github.com/prismatic/plumbing

and a literate test with lots of real examples:

https://github.com/Prismatic/plumbing/blob/master/test/plumb...

We're also working on other kinds of compilation, including 'direct' ones that compile directly to a single fn, and 'async' ones that handle asynchronous functions and are smarter about spinning up threads:

https://gist.github.com/w01fe/4710008

juiceandjuice · 13y ago

Hi, do you actually do the bookkeeping for the processing inside graph? How do you do this?

I work on a stream processing/workflow engine used by a few large physics experiments. It's declarative in nature too, but we use XML and let users write the glue. We also have the notion of persistent files and variables, although we don't compile and verify dependencies quite so much.

scott_s · 13y ago

Is this what you work on? "Combining in-situ and in-transit processing to enable extreme-scale scientific analysis": http://dl.acm.org/citation.cfm?id=2389063

w01fe · 13y ago

Interesting! What do you mean by 'bookkeeping'?

joe_the_user · 13y ago

It looks pretty interesting.

chipsy · 13y ago · 6 in thread

Two related things I've been studying in more depth lately: dataflow programming and behavior trees (the game AI concept).

http://en.wikipedia.org/wiki/Dataflow_programming

http://www.altdevblogaday.com/2011/02/24/introduction-to-beh...

The first comes up anytime you want to make a signal processing chain more modular and composable (graphics and audio are the classic applications), and many of its concepts share space with FP theory. Graph demonstrates an implementation built around certain needs of web apps. Note that implementations seem to vary a lot with the data types: audio processing, for example, may allow for cyclical feedback loops, and mainly distinguishes between two types of data, multi-channel PCM data (which may be split and combined between nodes) and parameter changes over time.

The second describes a form of concurrent finite states with good compositional properties: parent-child relationships that result in concurrency expressions passed back to parents (success, failure, in progress). Coroutines are comparable in power, but put emphasis on direct control of the concurrency, while BTs use modules of state + logic with pre-designed yielding points. (I think other finite state constructs have applications, too; BTs just happen to be my focus right now.)

I currently believe that highly-concurrent applications can be abstractly architected as a combination of dataflow, behavior trees, and asynchronous events - each one of those covers a very distinct set of concepts surrounding concurrency problems, and they present natural boundary points with each other.

tel · 13y ago

I'd love to talk with you about this design. I've been looking into a similar kind of build and I'm really curious to compare notes.

chipsy · 13y ago

Shoot me an email. (I just updated my profile)

vdm · 13y ago

Behavior Trees as a way to do REST hypermedia. http://vimeo.com/50215125

w01fe · 13y ago

I'd love to chat about this as well -- I'll send you an email.

msutherl · 13y ago

Can we make a party?

drudru11 · 13y ago

this is why I read HN. I'm going to enjoy reading about behavior trees for the next week :-) Thanks for posting.

olenhad · 13y ago · 2 in thread

This is quite amazing, and frankly quite an eye-opener in the way large Clojure projects can be organized. Just curious though: does Graph handle cycles?

w01fe · 13y ago

Thanks!

If by 'handle cycles', you mean 'throw an exception', then yes :). Graph models single-pass data flows, which must be acyclic, and the (graph) constructor and (*-compile) methods throw if you give them cyclic specifications. Do you have a particular use case in mind where cycles are desirable?
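For the curious, rejecting cyclic specifications can be sketched as a topological sort that throws when no node is ready. This is an illustration in plain Clojure, not Graph's actual check; `topo-order` and the `deps-of` representation (node key to the set of keys it depends on) are hypothetical, and keys that are not nodes are assumed to be external inputs.

```clojure
;; Hypothetical sketch: not Graph's implementation.
;; deps-of maps each node key to the set of keys it depends on.
;; Keys that are not nodes themselves are treated as external inputs.
(defn topo-order [deps-of]
  (loop [remaining deps-of, order []]
    (if (empty? remaining)
      order
      (let [done  (set order)
            ready (for [[k deps] remaining
                        :when (every? #(or (done %) (not (contains? deps-of %)))
                                      deps)]
                    k)]
        ;; If nodes remain but none has all its dependencies met,
        ;; the remaining nodes must contain a cycle.
        (when (empty? ready)
          (throw (ex-info "Graph contains a cycle"
                          {:nodes (set (keys remaining))})))
        (recur (apply dissoc remaining ready)
               (into order (sort ready)))))))

(topo-order {:n #{:xs} :m #{:xs :n} :v #{:m :n}})   ; => [:n :m :v]

(try (topo-order {:a #{:b} :b #{:a}})
     (catch clojure.lang.ExceptionInfo e (ex-data e)))   ; => {:nodes #{:a :b}}
```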

olenhad · 13y ago

I was thinking of nodes with feedback loops, which are desirable in some data flows, particularly for learning agents.

msandford · 13y ago · 2 in thread

Can graph programs modify the graph they're in, or is that completely fixed? Add new computation nodes, say, if necessary.

w01fe · 13y ago

Any particular execution is fixed once it's compiled. But it's easy to compile different variants of a graph and choose between them based on the input parameters, if that's all you need.

msandford · 13y ago

I was referring more to "do this computation and then, based on the output, run X or Y," for automated decision making. When the computation is expensive and you're going for "real time" (people waiting around), it's nice to shave off any measurable fraction of a second.

dschiptsov · 13y ago · 2 in thread

Something which cannot be made out of conses in Scheme or CL?)

w01fe · 13y ago

I'm not sure I follow, can you elaborate? I think something similar could be done in CL, although some of the design decisions might be different because Clojure has nice map literals and function metadata.

dschiptsov · 13y ago

I'm trying to get what all the excitement is about. "We have put functions and data in the same graph-like data-structure because Clojure is so cool"?)

vannevar · 13y ago · 1 in thread

It doesn't take much of a stretch to see Graph integrated with something like Nathan Marz's Storm (also written in Clojure) to provide the distribution and deployment aspect. Have you guys given that any consideration?

w01fe · 13y ago

For now we're focusing on the in-process use case, which we think is underserved and allows the simplicity of Graph to really shine. That said, distributed Graphs (and possibly, integration with frameworks like Storm) are on the horizon. If this is something you're interested in working with us on, please let us know.

islon · 13y ago · 1 in thread

What graphs let you do that multimethods and/or protocols/records don't?

w01fe · 13y ago

Protocols and multimethods are great tools to manage polymorphism, whereas Graph is about composition. We use both extensively in our codebase, and treat them as separate tools in our toolbox for building fine-grained, composable abstractions.

For example, I don't think protocols or multimethods could easily do any of the things mentioned in the second half of the post (execute part of a computation, auto-parallelize it, monitor the components, etc).

That said, there is actually one case where we use Graphs to solve a difficult polymorphism problem, which I discussed a bit in my Strange Loop talk. Our core newsfeed generation logic used to be composed of protocols/multimethods (we tried both), since each feed type (we have about 10) can define different variants of various steps in the pipeline (but most of the steps are the same). This worked fairly well, but as our system grew more and more complex, we found that there was still a lot of overhead, since the protocol had to contain all the steps that could change, leading to lots of extra complexity.

We've replaced all of this with Graph, where we just define an 'abstract' graph with the most common steps, and each feed type modifies the graph by changing or adding steps -- and we've found this way to be much simpler and easier to understand than what we had before.

This case is special, since it involves both a complex composition and polymorphism. Everywhere else in our codebase, we use (and love) protocols and multimethods for polymorphism.
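A toy illustration of the 'abstract graph plus per-type changes' idea, assuming only that the steps live in a plain map so a variant is an `assoc` away. The step names, `step-order`, and the feed variants are invented for this sketch and are not Prismatic's pipeline; the steps run as a fixed linear sequence here rather than through Graph's dependency resolution.

```clojure
;; Hypothetical sketch: step names and feed types are invented.
(require '[clojure.string :as str])

;; Each step takes the running state map and returns an updated one.
(def base-feed
  {:fetch  (fn [s] (assoc s :docs ["a" "bb" "ccc"]))
   :rank   (fn [s] (update-in s [:docs] #(vec (sort-by count > %))))
   :render (fn [s] (assoc s :out (str/join "," (:docs s))))})

(def step-order [:fetch :rank :render])

;; A variant is the base map with one step swapped out;
;; the other steps are shared, not copied.
(def alphabetical-feed
  (assoc base-feed
         :rank (fn [s] (update-in s [:docs] #(vec (sort %))))))

(defn run-feed [feed]
  (:out (reduce (fn [s k] ((feed k) s)) {} step-order)))

(run-feed base-feed)          ; => "ccc,bb,a"
(run-feed alphabetical-feed)  ; => "a,bb,ccc"
```

Compared with a protocol, nothing has to declare up front which steps may vary: any key in the map can be replaced or a new one added.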

shurcooL · 13y ago · 1 in thread

This looks very interesting.

It seems to be similar to something I've been thinking about and trying to build lately, so I'm definitely going to check this out.

w01fe · 13y ago

Thanks! We'd love to hear your feedback -- and if Graph doesn't meet your needs, work with you to fix that.

jared314 · 13y ago · 1 in thread

This initially looks like an IOC Container (StructureMap, etc) with automatic dependency resolution, except you can control the compilation of the internal graph. Is that accurate?

w01fe · 13y ago

Interesting, I hadn't heard of StructureMap. It seems related, but Graph is less complex -- just the dependency and composition parts, without being tied to any particular use case.

scott_s · 13y ago

When reading the background on Graph from October (http://blog.getprismatic.com/blog/2012/10/1/prismatics-graph...), I came across this: "Of course, this idea is not new; for example, it is the basis of graph computation frameworks like Pregel, Dryad, and Storm, and existing libraries for system composition such as react."

I wanted to point out that the programming models behind Dryad and Storm represent computations as graphs, whereas the programming model behind Pregel is for computations on graphs. It's a subtle difference in words, but an enormous difference in what you actually do.

saurabh · 13y ago

Here's a cool presentation on Graph that I watched a couple of days back.

http://www.infoq.com/presentations/Graph-Clojure-Prismatic

Moocar · 13y ago

I think this could be used to solve similar problems for event-driven programming. For instance, in Aleph/Lamina (async Clojure library), pipelines work great when only one value is returned. But if you want to wait for two remote calls to return in parallel, and feed both results into the next function, the syntax can be a bit painful. Here, you could supply something like async-compile which would work similarly to parallel-compile but use pipelines and merge-results under the covers.
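A minimal sketch of the "two calls in parallel, then combine" shape, using plain `future` and `promise` from clojure.core rather than Lamina pipelines. `future-run`, the `{:deps [...] :f ...}` node shape, and the fake remote calls are assumptions for illustration; a real async-compile would avoid blocking threads.

```clojure
;; Hypothetical sketch: not Lamina's pipelines and not a Graph compiler.
;; A graph maps node key -> {:deps [...] :f fn-of-input-map}.
(defn future-run [graph inputs]
  (let [cells (promise)   ; delivered once every future has been started
        value (fn [k] (if (contains? inputs k)
                        (get inputs k)
                        @(get @cells k)))
        started (into {}
                      (for [[k {:keys [deps f]}] graph]
                        ;; Each node runs on its own thread and blocks
                        ;; only on the nodes it depends on.
                        [k (future (f (zipmap deps (map value deps))))]))]
    (deliver cells started)
    started))

(def g
  {:user    {:deps [:id]
             :f (fn [{:keys [id]}] (Thread/sleep 50) {:id id :name "ann"})}
   :orders  {:deps [:id]
             :f (fn [{:keys [id]}] (Thread/sleep 50) [10 20])}
   :summary {:deps [:user :orders]
             :f (fn [{:keys [user orders]}]
                  (str (:name user) ":" (reduce + orders)))}})

(def result (future-run g {:id 7}))
@(:summary result)   ; => "ann:30"
```

The two sleeping "remote calls" overlap, and `:summary` starts its real work as soon as both have returned. This sketch assumes an acyclic graph; a cycle would deadlock rather than throw. In a standalone script, calling `(shutdown-agents)` at the end lets the JVM exit promptly.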

maheshcr · 13y ago

Brilliant! Been following Prismatic/Bradford for a while now and thought you would not share your 'Graph' library.

If one has not stumbled upon specific use cases, such as disparate data sources and custom or widely varying transformation logic between them, it might be difficult to appreciate your contribution. Thanks for this. Even if not right away, I hope to use it for our startup!

owenjones · 13y ago

Related functionality as data, I like it.
