Differentiable Programming Mega-Proposal (opens in new tab)

(forums.swift.org)

162 pointsxmmrm6y ago74 comments

74 comments

53 comments · 16 top-level

abeppu6y ago· 10 in thread

I'm not a Swift programmer, so perhaps my confusion is just a symptom of broader ignorance, but I find two things unclear here: - What does 'first-class' mean, really? - Which of these benefits are unique to integrating notions of derivatives into the language, and which could be enjoyed well-written libraries?

The mega-proposal links out to a separate doc on embedded DSLs, with broad statements about what's "typical" or "often" true of existing DSLs -- but which of those issues are insurmountable? That section mentions Swift's limited metaprogramming facilities. Why choose to carve out this added support for a single family of algorithms rather than add in some more general metaprogramming abilities that enable better EDSLs?

aylmao6y ago

From the proposal:

> While the differentiation APIs are flexible and fully dynamic, differentiation is based on a program transformation that happens at compile-time. This enables many static analyses that not only help produce more efficient programs, but also detect common numerical programming mistakes such as non-differentiable functions and zero derivatives.

> With a first-class differentiable programming language, some of the most common runtime errors in machine learning become directly debuggable without library boundaries. Simply step through backpropagation using LLDB to debug derivatives.

abeppu6y ago

I still don't get it. Why can't I use a debugger to step through derivatives when autodiff is implemented as a library?

FooBarBizBazz6y ago

Reverse mode autodiff is best implemented with a non-local transformation of the program: First you run the original operations forward; then you run corresponding operations in reverse order.

You can do this with a library by implementing "number" types that, as a side effect of arithmetic operations, record those operations onto a "tape", so that corresponding (different) operations can be played back later in reverse order.

Unfortunately, those side effects have a cost during the forward pass. And during the backwards pass you are essentially running a little interpreter to execute your instructions.

Putting this stuff into the compiler lets the forward pass just be normal, native code on doubles/floats, and the same for the reverse-pass code. Moreover, all of this can now be worked on by the optimizing compiler.

This is especially important for embedded applications where you can't afford all this interpretation.

I guess the original TensorFlow "computation graph" approach is a little different to the "tape" I described because the graph is built explicitly rather than through side effects, but that just makes even more clear that you're really assembling an AST for some /other/ language and (when not using XLA) running an interpreter.

In principle some suitably powerful macro language with full access to the AST might be able to do good compile-time reverse-mode AD, but I am not aware of any language with powerful enough macros. Again, this is because the transformation is not just a local pattern replacement; it involves flipping the code upside down.

What's funny is that physicists have been able to do this kind of program transformation in a slightly clunky(?) way -- FORTRAN in, FORTRAN out -- since the 70s, supposedly.

It all makes me think that our ideas about what a "language" is, what a "compiler" is, what a "library" is, are all stuck in convention and prematurely ossified. The first compilers, which we celebrate, looked to their users like codegen (which we detest, I think)! I would love for the boundary between "language" and "compiler" to be broken down more so that AD could be more easily done "within the language"; maybe one day that will happen in Julia, or Nim, or Jai, or Terra.

But for now I think I agree with the designers that the best hope for good results, and really the most straightforward way, is to just do it in the compiler.

2 more replies

aylmao6y ago

I think it's mostly due to a couple of reasons:

1. The debugging experience is probably better. Computing a derivative can be complex— you might be seeing a high value where you were expecting a low one, and you want to "step through the equation and how it changes. Having to do that when "the equation" is a bunch of obscure data structures, through the internal representation of functions in 3-rd party library could get very complex very quickly.

2. You might not catch non-differentiable functions and zero derivatives. Moreover, given just the nature of math, you could see millions of inputs that wouldn't trigger an exception, ship your model, and then one day the one that yields a 0 shows up, your model crashes, and you don't know why. Having the compiler essentially act as proof that something will _never_ be zero is awesome for correctness and reliability.

If you think about it, derivatives aren't really an operation "through" the equation, but "on" the equation. You're writing some function, but instead of passing a value through it, you're changing the function itself.

So the functions are the values, and need to be changed, morphed, combined, split, etc. If a library wanted to do this, my guess is either:

a) devx would suffer since wouldn't be writing functions normally, but rather defining them as objects with verbose constructors, etc.

b) for the sake of devx, the library would have to do some hacky introspection and jump hoops to get to work on the functions themselves, not with them, at the cost of performance or debuggability.

There's already software we use all the time that takes these functions, breaks them apart, understands them and does things with them though— the compiler. It transforms functions to machine code. Let's have it add a step in the middle there, and if a function is marked as being derived, let's have the compiler take it, transform it to its derivative, and then to machine code ¯\_(ツ)_/¯.

mpweiher6y ago

>Why choose to carve out this added support for a single family

> of algorithms rather than add in some more general

> metaprogramming abilities that enable better EDSLs?

It is The Swift Way™

The answer is always to add something to the language and the compiler, the question seems to be largely irrelevant.

Of course, that's a consequence of not heeding Alan Kay's advice from 1998, that when you design a new language, you need to focus on the metasystem first, the rest will follow.

http://wiki.c2.com/?AlanKayOnMessaging

When you don't do that, every new requirement comes as a surprise that you need to hack into the compiler somehow. Objective-C compatibility: hack the language; Python integration: hack the language; SwiftUI: hack the language; Differentiable Programming: hack the language.

And of course, each additional hack makes the already not-to-elegant base language even more difficult for implementing stuff without hacking the language.

What kind of a metasystem would we need in order to not have to hack the language for all of these features? That, detective, is the right question!

https://www.youtube.com/watch?v=ZKxr0wyIic4

thedataangel6y ago

My understanding of automatic differentiation (AD) is that it's only really possible at the compiler level, since you need the ability to interpret and manipulate function definitions themselves. Certainly, no library would be able to offer the same level of guarantees telling you if you've done it wrong, nor the same opportunities for optimisation.

abeppu6y ago

It's certainly not the case that autodiff is only possible at the compiler level. I've implemented forward mode (via dual numbers) and reverse mode (via tapes / wengert nodes) autodiff in libraries before.

mlevental6y ago

notice the qualifier "really". obviously you can implement autodiff kind of outside the complier since pytorch and tensor flow exist. but those implementations constrain you to a select few compositions (please no comments on Turing completeness with just loops and conditionals). so for example if statements in pytorch are not differentiable (they might have piece wise continuous derivates) because pytorch doesn't actually trace the ast. I'm not a languages expert but outside of implementing in the compiler I imagine you'd need a homoiconic language to implement as a library.

1 more reply

seanmcdirmid6y ago

AD is totally possible at the library level (I did it in C# 10 years ago using Conal Elliott’s ideas), however if you want to autodiff more than an expression-only language, compiler support is useful.

abeppu6y ago

Maybe this is the piece I don't understand. What do you mean by "more than an expression-only language"? What is a non-expression in a language? And what would it mean to have a derivative for your non-expression?

1 more reply

wenc6y ago· 9 in thread

I'm curious (for folks in the know): how does differentiable programming handle non-differentiable points? Can it detect non-differentiable/non-smooth functions?

Non-smooth functions like abs(), max(), min() have points where derivatives do not exist. ReLU functions are non-differentiable at their hinge points.

Disjoint IF-THEN-ELSE conditions are discontinuities in the function space, and are traditionally handled in optimization with mixed-integer formulations (i.e. split up the space and do something clever like branch-and-bound to find the optimum).

chillee6y ago

Other people have answered what they do, but this is the big gap between people talking about 'differentiable programming' in theory, and having it actually work in practice.

It's true that once you have control flow, the gradient quickly becomes meaningless. I posted an example here: https://news.ycombinator.com/item?id=20892287

That's also the biggest reason I tend to find much of this "differentiable programming" stuff to be overhyped. It's hard to reformulate programs in a way s.t. the derivative can mean something meaningful. And I'm not convinced traditional languages will benefit.

That's not to say that there isn't cases where your program can be formulated to have a meaningful derivative.

See this differentiable ray tracer: https://people.csail.mit.edu/tzumao/diffrt/

taliesinb6y ago

> It's hard to reformulate programs in a way s.t. the derivative can mean something meaningful.

Really? The gradients computed by AD are the exact answer to the following question: if I were to change this input or parameter an infinitesimal amount, how much would it change the output of my function? That is always meaningful (when it is defined), and means what I just said. You can easily make functions where it is not defined, of course, just like you can make a sphere into two spheres with the Banach-Tarski theorem!

But there are vast, vast forests of numerical computation employed in industry, science, finance, engineering, where it is almost always defined.

And even for more “chunky” computations where the non-differentiability is more severe, there are algorithms like REINFORCE that you can use to estimate gradients through these parts.

chillee6y ago

It's not meaningful in the sense that it won't correspond with your intuition of "will increasing the input increase my output". Presumably the point of differentiable programming isn't just getting the derivative for fun, it's for optimizing some quantity.

For example, take this code.

     x,y
     for (int i=0; i<x; i++)
         y += 1
     return y

It's technically true that the gradient of 0 is correct (modulo boundaries). But if someone was trying to optimize this function, that's not very helpful.

I believe REINFORCE is not of much help either - it's not magic. I'm not aware of any stochastic gradient estimators that are helpful in this case (although if there is a method I'd like to hear about it).

taliesinb6y ago

While the gradients don’t technically exist, you usually use the limiting value from one side (e.g for the commonly used ReLU activation, you can use either 0 or 1). This is justified by the observation that randomly initialized networks are extremely unlikely to encounter these particular values without some kinder of deeper conspiracy happening. You could put this on a mathematical footing by saying the set of non differentiable points has measure zero, for example.

abeppu6y ago

> randomly initialized networks

But "networks" here, you're thinking of ANNs, yes?

But in the context of proposing differential programming as an addition to a general purpose language (and where the proposal explicitly brings up a bunch of cases outside of deep learning), is it fair to justify behavior based on what makes sense in a popular but narrow application?

taliesinb6y ago

It’s a good question what the plans are for DP languages to handle situations where non-differentiability shouldn’t be ignored.

For sensitivity analysis it might be disastrous to conclude that an output is sensitive to an input when it is actually not, merely because an intermediary ReLU hit 0, for example.

A conservative approach could be to define versions of the relevant functions that threw exceptions at such points, or that also calculated the trusted margin of the resulting gradients; non-differentiability would then produce a zero trust margin.

mlevental6y ago

he/she addressed that - the points at which the function isn't differentiable has measure zero. besides this isn't some kind of new hack - one sided limits (and therefore derivatives) were invented exactly for such cases (min, max, abs) and have been used by mathematicians probably since just about when calculus was invented.

JustFinishedBSG6y ago

> on-smooth functions like abs(), max(), min() where derivatives do not exist

While these functions are not differentiable they are sub-differentiable. Which is an extension of differentiability.

Subderivatives are set, if the set only contain one point the function is differentiable. Otherwise it doesn't matter for gradient descent you can just use any element of the set.

> Disjoint IF-THEN-ELSE conditions are discontinuities in the function space

Same, it doesn't matter, they mathematically are equivalent to indicator functions and are subdifferentiable.

https://en.wikipedia.org/wiki/Subderivative

abeppu6y ago

My understanding is you don't explicitly handle that. For whichever place you're evaluating, you have some computation graph, and you use the elementary operators on that graph. So, if you have defined ReLU with an `if x >= 0 return x` clause, then if you evaluate the derivative at 0, that's the branch you go down, and you say the answer is 1.

hsaliak6y ago· 5 in thread

How does this compare to Jax? https://github.com/google/jax Why do this in the language instead of a library?

taliesinb6y ago

Probably the main justification is that the analysis and transformation steps needed to compute the vjp and jvp pullbacks of a function (which correspond to reverse- and forward-mode automatic differentiation) require enough of the other machinery of a compiler that they are best done WITHIN a compiler. Then other things become quite natural, too, like producing the tangent vector versions of data structures like tuples and maps!

Moreover, a statically typed language like Swift is a much better starting point for this kind of effort than Python. Array shapes and dimensions are already a type system - you might as well go the whole distance and get all the other safety, readability, and efficiency benefits!

PS shout out for named array axes as the future of array-based (and hence differentiable) programming... see http://nlp.seas.harvard.edu/NamedTensor for a good rationale

hsaliak6y ago

jax does have a just in time compiler that lets you compile your python functions to XLA-optimized kernels (through llvmlite under the hood?).

The fact that you can jit compile and gain the benefits of "doing this within the compiler" is one of its main selling points.

mlevental6y ago

is there something in the proposal that addresses named tensors? ctrlf doesn't find anything.

optevo6y ago

If you were targeting a Lisp you could do this as a (macro) library

andrekandre6y ago

that’s basically the conversation that is being had in the linked thread:

should this be done as a special case in the compiler

shouldn’t this be a more general meta programming based feature

with those in favor of meta-programming also bringing up the potential complexity/speed loss added to the compiler for this feature

the problem for swift is, there hasn’t been a coherent meta-programming story (something like a meta-programming mega proposal) so it seems hard to push for that instead right now....

thedataangel6y ago· 4 in thread

This is actually huge. I saw a proof of concept of something like this in Haskell a few years back, but it's amazing it see it (probably) making it into the core of a mainstream language.

This may let them capture a large chunk of the ML market from Python - and hopefully greatly improve ML apis while they're at it.

krapht6y ago

Huh? Nobody is writing numerically intensive libraries in Python. Clearly this language proposal is taking aim at C++ and Fortran. Even if this caused TensorFlow & others to rewrite everything in Swift, people would write Python bindings to it and keep using Python.

I'll get excited if Apple actually merges this into Swift. It's a niche feature that their compiler team will need to maintain forever. I actually have been working on algorithmic differentiation in C++, so it's not even that I wouldn't want to try Swift out if it actually made it in. However, because this sort of thing is of such narrow interest I believe the future will stay with embedded DSLs / libraries / ugly macro/template hackery.

mlevental6y ago

lol this is literally by the group that's rewriting tensorflow in Swift https://www.tensorflow.org/swift so you're off on the intention here in that it is exactly taking aim at python as the main data ecosystem language.

seanmcdirmid6y ago

> I saw a proof of concept of something like this in Haskell a few years back, but it's amazing it see it (probably) making it into the core of a mainstream language.

Probably Conal Elliott’s work, eg in Vertigo (http://conal.net/Vertigo/, circa 2005)? There he was using it for normal computations used in pixel shading, pretty cool stuff. He is still active in this field, and has a lot of new papers that are more ML focused. I do wonder if “general” AD support will be useful for computer graphics as well as ML?

thedataangel6y ago

It was Conal Elliott's stuff, though it was a more recent paper - Compiling to Categories, if memory serves me.

cs7026y ago· 3 in thread

How does this relate to all the work that Christ Lattner et al have been doing at Google with Swift, MLIR, etc.?[a]

Is this... a separate, parallel, more encompassing proposal?

Is there any coordination between these two groups?

[a] https://www.youtube.com/watch?v=yCd3CzGSte8

samtheprogram6y ago

I don't know if "Christ" Lattner was intentional (humorous) or an accident, but I chuckled.

cs7026y ago

Accident.

adamnemecek6y ago

It's the same group.

__erik6y ago· 3 in thread

I'm really excited for this. I'm unaware of any mainstream language with first class support for differentiation, I think its going to be really interesting to see what people use it for out side of ML.

adamnemecek6y ago

Julia is in the same space.

taliesinb6y ago

Yup, work is continuing apace with Julia’s next-gen Zygote project.

Also, from the GP’s thought about applications beyond DL, my favorite examples so far are for model-based RL [1] and Neural ODEs [2]

[1] https://fluxml.ai/2019/03/05/dp-vs-rl.html

[2] https://julialang.org/blog/2019/01/fluxdiffeq

__erik6y ago

Julia is great but it doesnt play in the domain of apps and servers

3 more replies

taliesinb6y ago· 1 in thread

Capitalizing on the presence of people who might be new to Automatic Differentiation and want a deeper understanding of how it works, here is an interactive Colab notebook I wrote about this topic entitled “Build your Own TensorFlow” for the Deep Learning Indaba that just happened in Kenya: https://colab.research.google.com/drive/14GeXkFd5pQKKNIJ7BMs...

Donald6y ago

That's an excellent tutorial - more are available on http://www.deeplearningindaba.com/practicals-2019.html for those interested.

chriscaruso6y ago· 1 in thread

I don't see why a well written library could not serve the same purpose. It seems like a lot of cruft. I doubt, for example, Python would ever consider adding this and it's the defacto language that would benefit the most from something like this - due to the existing tools and communities.

It just seems so narrow and not at the same level of abstraction that languages typically sit at. I could see the language supporting higher level functionality so a library could do this without a bunch of extra work (such as by some reflection).

AbrahamParangi6y ago

I would counter that differentiable programming should perhaps rise to the level of baseline functionality that most languages should offer.

I think the applications for automatic differentiation and gradient optimization well exceed what we think of as ML and data science today.

antpls6y ago· 1 in thread

Can "Differentiable Programming" be related to "Differentiable Privacy", or have we now one word (and acronym!) to describe two different things?

omaranto6y ago

I think you meant "differential privacy", and no, it's not closely related to differentiable programming.

[1] https://en.wikipedia.org/wiki/Differential_privacy

taliesinb6y ago

If you are interested in following along, the Swift for TensorFlow team has a design meeting every Friday. The meetings are live-streamed and anyone can join. I recommend them if you are curious and want to hear more about the challenges and opportunities! A recent talk had Jeremy Howard (of fasti.ai fame) present the full-featured deep learning API he has designed on top of Swift for TensorFlow.

You can find out more on the mailing list https://groups.google.com/a/tensorflow.org/forum/m/#!forum/s...

mark_l_watson6y ago

This will certainly help people who work on new deep learning theories and model architectures, but not so much the large crowd of deep learning practitioners.

In his excellent interview with Lex Fridman, Yann LeCun was critical of any approach to AI that was not differentiable, even constraint satisfaction, and other solid optimization techniques. In the context of scaling to very large problems or models with many billions of parameters, he is probably correct.

I have had problems with the Swift and TensorFlow code drops. Sometimes they work for me and sometime they don’t. So, very good technology but perhaps wait for it to mature. I read that some students for the fast.ai course using Swift have also had some setup difficulties.

EDIT: you might also want to look at Julia for differentiable programming and Julia with deep learning libraries like Flux is also a ‘turtles all the way down’ system, where unlike TensorFlow where the guts are implemented in C++, for Swift and Julia the entire stack can be implemented in a single language.

akhilcacharya6y ago

I really like the idea of building AD directly into the language and compiler infrastructure itself, but the challenge of upstreaming it in a language built for very specific tasks makes me concerned. Is Latner just going to end up making a GSwift? Will it just be forked?

layoutIfNeeded6y ago

Ew, that’s really not something I would do at the language level...

oyashius6y ago

This is super exciting

adamnemecek6y ago

Now do automatic integration.

sethryclaus6y ago

Well done Swift. Insightful language design.

j / k navigate · click thread line to collapse

74 comments

53 comments · 16 top-level

abeppu6y ago· 10 in thread

aylmao6y ago

From the proposal:

abeppu6y ago

I still don't get it. Why can't I use a debugger to step through derivatives when autodiff is implemented as a library?

FooBarBizBazz6y ago

Reverse mode autodiff is best implemented with a non-local transformation of the program: First you run the original operations forward; then you run corresponding operations in reverse order.

Unfortunately, those side effects have a cost during the forward pass. And during the backwards pass you are essentially running a little interpreter to execute your instructions.

This is especially important for embedded applications where you can't afford all this interpretation.

What's funny is that physicists have been able to do this kind of program transformation in a slightly clunky(?) way -- FORTRAN in, FORTRAN out -- since the 70s, supposedly.

But for now I think I agree with the designers that the best hope for good results, and really the most straightforward way, is to just do it in the compiler.

2 more replies

aylmao6y ago

I think it's mostly due to a couple of reasons:

So the functions are the values, and need to be changed, morphed, combined, split, etc. If a library wanted to do this, my guess is either:

a) devx would suffer since wouldn't be writing functions normally, but rather defining them as objects with verbose constructors, etc.

b) for the sake of devx, the library would have to do some hacky introspection and jump hoops to get to work on the functions themselves, not with them, at the cost of performance or debuggability.

mpweiher6y ago

>Why choose to carve out this added support for a single family

> of algorithms rather than add in some more general

> metaprogramming abilities that enable better EDSLs?

It is The Swift Way™

The answer is always to add something to the language and the compiler, the question seems to be largely irrelevant.

Of course, that's a consequence of not heeding Alan Kay's advice from 1998, that when you design a new language, you need to focus on the metasystem first, the rest will follow.

http://wiki.c2.com/?AlanKayOnMessaging

And of course, each additional hack makes the already not-to-elegant base language even more difficult for implementing stuff without hacking the language.

What kind of a metasystem would we need in order to not have to hack the language for all of these features? That, detective, is the right question!

https://www.youtube.com/watch?v=ZKxr0wyIic4

thedataangel6y ago

abeppu6y ago

mlevental6y ago

1 more reply

seanmcdirmid6y ago

abeppu6y ago

1 more reply

wenc6y ago· 9 in thread

I'm curious (for folks in the know): how does differentiable programming handle non-differentiable points? Can it detect non-differentiable/non-smooth functions?

Non-smooth functions like abs(), max(), min() have points where derivatives do not exist. ReLU functions are non-differentiable at their hinge points.

chillee6y ago

Other people have answered what they do, but this is the big gap between people talking about 'differentiable programming' in theory, and having it actually work in practice.

It's true that once you have control flow, the gradient quickly becomes meaningless. I posted an example here: https://news.ycombinator.com/item?id=20892287

That's not to say that there isn't cases where your program can be formulated to have a meaningful derivative.

See this differentiable ray tracer: https://people.csail.mit.edu/tzumao/diffrt/

taliesinb6y ago

> It's hard to reformulate programs in a way s.t. the derivative can mean something meaningful.

But there are vast, vast forests of numerical computation employed in industry, science, finance, engineering, where it is almost always defined.

And even for more “chunky” computations where the non-differentiability is more severe, there are algorithms like REINFORCE that you can use to estimate gradients through these parts.

chillee6y ago

For example, take this code.

     x,y
     for (int i=0; i<x; i++)
         y += 1
     return y

It's technically true that the gradient of 0 is correct (modulo boundaries). But if someone was trying to optimize this function, that's not very helpful.

taliesinb6y ago

abeppu6y ago

> randomly initialized networks

But "networks" here, you're thinking of ANNs, yes?

taliesinb6y ago

It’s a good question what the plans are for DP languages to handle situations where non-differentiability shouldn’t be ignored.

For sensitivity analysis it might be disastrous to conclude that an output is sensitive to an input when it is actually not, merely because an intermediary ReLU hit 0, for example.

mlevental6y ago

JustFinishedBSG6y ago

> on-smooth functions like abs(), max(), min() where derivatives do not exist

While these functions are not differentiable they are sub-differentiable. Which is an extension of differentiability.

Subderivatives are set, if the set only contain one point the function is differentiable. Otherwise it doesn't matter for gradient descent you can just use any element of the set.

> Disjoint IF-THEN-ELSE conditions are discontinuities in the function space

Same, it doesn't matter, they mathematically are equivalent to indicator functions and are subdifferentiable.

https://en.wikipedia.org/wiki/Subderivative

abeppu6y ago

hsaliak6y ago· 5 in thread

How does this compare to Jax? https://github.com/google/jax Why do this in the language instead of a library?

taliesinb6y ago

PS shout out for named array axes as the future of array-based (and hence differentiable) programming... see http://nlp.seas.harvard.edu/NamedTensor for a good rationale

hsaliak6y ago

jax does have a just in time compiler that lets you compile your python functions to XLA-optimized kernels (through llvmlite under the hood?).

The fact that you can jit compile and gain the benefits of "doing this within the compiler" is one of its main selling points.

mlevental6y ago

is there something in the proposal that addresses named tensors? ctrlf doesn't find anything.

optevo6y ago

If you were targeting a Lisp you could do this as a (macro) library

andrekandre6y ago

that’s basically the conversation that is being had in the linked thread:

should this be done as a special case in the compiler

shouldn’t this be a more general meta programming based feature

with those in favor of meta-programming also bringing up the potential complexity/speed loss added to the compiler for this feature

the problem for swift is, there hasn’t been a coherent meta-programming story (something like a meta-programming mega proposal) so it seems hard to push for that instead right now....

thedataangel6y ago· 4 in thread

This is actually huge. I saw a proof of concept of something like this in Haskell a few years back, but it's amazing it see it (probably) making it into the core of a mainstream language.

This may let them capture a large chunk of the ML market from Python - and hopefully greatly improve ML apis while they're at it.

krapht6y ago

mlevental6y ago

seanmcdirmid6y ago

> I saw a proof of concept of something like this in Haskell a few years back, but it's amazing it see it (probably) making it into the core of a mainstream language.

thedataangel6y ago

It was Conal Elliott's stuff, though it was a more recent paper - Compiling to Categories, if memory serves me.

cs7026y ago· 3 in thread

How does this relate to all the work that Christ Lattner et al have been doing at Google with Swift, MLIR, etc.?[a]

Is this... a separate, parallel, more encompassing proposal?

Is there any coordination between these two groups?

[a] https://www.youtube.com/watch?v=yCd3CzGSte8

samtheprogram6y ago

I don't know if "Christ" Lattner was intentional (humorous) or an accident, but I chuckled.

cs7026y ago

Accident.

adamnemecek6y ago

It's the same group.

__erik6y ago· 3 in thread

adamnemecek6y ago

Julia is in the same space.

taliesinb6y ago

Yup, work is continuing apace with Julia’s next-gen Zygote project.

Also, from the GP’s thought about applications beyond DL, my favorite examples so far are for model-based RL [1] and Neural ODEs [2]

[1] https://fluxml.ai/2019/03/05/dp-vs-rl.html

[2] https://julialang.org/blog/2019/01/fluxdiffeq

__erik6y ago

Julia is great but it doesnt play in the domain of apps and servers

3 more replies

taliesinb6y ago· 1 in thread

Donald6y ago

That's an excellent tutorial - more are available on http://www.deeplearningindaba.com/practicals-2019.html for those interested.

chriscaruso6y ago· 1 in thread

AbrahamParangi6y ago

I would counter that differentiable programming should perhaps rise to the level of baseline functionality that most languages should offer.

I think the applications for automatic differentiation and gradient optimization well exceed what we think of as ML and data science today.

antpls6y ago· 1 in thread

Can "Differentiable Programming" be related to "Differentiable Privacy", or have we now one word (and acronym!) to describe two different things?

omaranto6y ago

I think you meant "differential privacy", and no, it's not closely related to differentiable programming.

[1] https://en.wikipedia.org/wiki/Differential_privacy

taliesinb6y ago

You can find out more on the mailing list https://groups.google.com/a/tensorflow.org/forum/m/#!forum/s...

mark_l_watson6y ago

This will certainly help people who work on new deep learning theories and model architectures, but not so much the large crowd of deep learning practitioners.

akhilcacharya6y ago

layoutIfNeeded6y ago

Ew, that’s really not something I would do at the language level...

oyashius6y ago

This is super exciting

adamnemecek6y ago

Now do automatic integration.

sethryclaus6y ago

Well done Swift. Insightful language design.

j / k navigate · click thread line to collapse