- Only supports fully connected layers for now. No convnets or RNNs.
- Requires a GPU. No option to run on CPU, not even for development.
- Setup instructions for Ubuntu only. No Mac or Windows.
- Uses JSON to define the network architecture, which limits what you can build.
- Takes in data in NetCDF format only.
- Very little documentation.
- The name is bad. I'm not going to remember how to spell DSSTNE.
It seems like a very early proof of concept. I wouldn't expect it to be useful to most people at this point. Built-in support for sparse vectors is interesting, but not a strong selling point by itself. I hope Amazon continues to develop it. Or, even better, contribute to one of the existing more mature frameworks.

"My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."
And you're right that it's a specialized framework/engine. But IMO making it more general purpose is a matter of cutting and pasting the right cuDNN code or we can double down on emphasizing sparse data. Amazon OSSed this partially IMO to see what people would want here.
An interesting quote.
Replicating the functioning of the brain, or some major subsystem of it, is no doubt going to require far more than just billions of parameters. The cortex contains >15 billion neurons, but there are also the neurons contained in all the other brain structures. Furthermore, neurons connect via dense dendritic trees, the human brain having on the order of 100 trillion synapses.
Adding to the complexity, neurons have numerous "communication ports", including numerous pre- and postsynaptic neurotransmitter receptors, and a wide range of receptors for endocrine, immune system and other types of signals. Message propagation also typically involves a layer of complex intracellular "second-messenger" transformations.
While it's highly probable that future NNs will be developed that do even more amazing things than are now possible, I think the challenge of equaling what real brains do is, to say the least, enormously daunting.
Somebody smarter than me could probably estimate the magnitude: how many nodes or weights it would take for a NN to function like the brain. I imagine it will be a really impressive number.
Edit: typos
It's totally fine that it's a specialized framework, and it doesn't need to become general purpose. I just think the product description should do a better job positioning it and explaining what it's NOT intended for to set expectations correctly.
Agreed, it looks like a rushed response to TensorFlow.
My perception of Amazon is that they take everything from open source but don't actively give back. Amazon and open source have never gone hand in hand. Making their deep learning frameworks open source is cool. Kudos to the team that managed to do this. I am sure that internally it must have been a huge struggle to get approval from the execs.
[Edit: Grammar]
===From Glassdoor===
Cons
====
The management process is abusive, and I'm currently a manager. I've seen too much "behind the wall" and hate how our individual performers can be treated. You are forced to ride people and stack rank employees...I've been forced to give good employees bad overall ratings because of politics and stack ranking.

Advice to Management: Don't pretend that the recent NY Times article was all about "isolated incidents". The culture IS abusive and it WILL backfire once stock value starts to drop. I'm an 8 year veteran and I no longer recommend former peers to interview with Amazon.
== [Edit: Formatted to make it clear what was pulled from Glassdoor]
I just joined so I really am not a statistically significant case, but so far it's nowhere near what was in that NYT article.
Edit: I can't read apparently :) thanks heuving for clarifying and the commenter for reformatting
In my tenure at Amazon, I went from getting a 2 and a PIP, then to a 4, then having my promotion held up because my VP didn't like me. Finally, when I left for Google, they offered SDE3 and another $15k a year. I didn't take that offer.
This was all back in the 2001-2006 timeframe. Sounds like nothing has changed.
There's nothing wrong with this. There's no contract when using open-source and this is probably how 99% of people interact with it.
Another example is topology =)
* College or 11th std in India is the same as 11th grade High school in the US.
1. DSSTNE was designed two years ago specifically for product recommendations from Amazon's catalog. At that time, there was no TensorFlow, only Theano and Torch. DSSTNE differentiated from these two frameworks by optimizing for sparse data and multi-GPU spanning neural networks. What it's not currently is another framework for running AlexNet/VGG/GoogleNet etc, but about 500 lines of code plus cuDNN could change that if the demand exists. Implementing Krizhevsky's one weird trick is mostly trivial since the harder model parallel part has already been written.
2. DSSTNE does not yet explicitly support RNNs, but it does have support for shared weights, and that's more than enough to build an unrolled RNN. We tried a few, in fact. cuDNN 5 can be used to add LSTM support in a couple hundred lines of code. But since (I believe) the LSTM in cuDNN is a black box, it cannot be spread across multiple GPUs. Not too hard to write from the ground up, though.
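The shared-weights trick described above is framework-agnostic: an RNN unrolled for T steps is just T copies of the same dense layer reusing one weight matrix. Here's a minimal pure-Python sketch of that idea (not DSSTNE code; all names are made up for illustration):

```python
import math

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def unrolled_rnn(inputs, W_in, W_rec, h0):
    """Run an RNN by unrolling it in time: every step reuses (shares)
    W_in and W_rec, exactly as T feed-forward layers with tied weights."""
    h = h0
    for x in inputs:
        # h_t = tanh(W_in @ x_t + W_rec @ h_{t-1}) with the SAME weights each step
        h = tanh_vec(vec_add(matvec(W_in, x), matvec(W_rec, h)))
    return h

# Toy usage: 2-unit hidden state, 3 time steps of 1-d input.
W_in = [[0.5], [-0.25]]
W_rec = [[0.1, 0.0], [0.0, 0.1]]
h = unrolled_rnn([[1.0], [0.0], [1.0]], W_in, W_rec, [0.0, 0.0])
print(h)
```

In a framework with shared weights, each unrolled step becomes a layer whose weight tensor points at the same storage, which is why shared-weight support is "more than enough" to express this.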
3. There are a huge number of collaborators and people behind the scenes that made this happen. I'd love to acknowledge them openly, but I'm not sure they want their names known.
4. Say what you want about Amazon, and they're not perfect, but they let us build this from the ground up and now they have given it away. Google OTOH hired me away from NVIDIA in 2011 (another one of those offers I couldn't refuse), blind-allocated me into search, and would not let me work with GPUs, despite my being one of the founding members of NVIDIA's CUDA team, because they had not yet seen them as useful. I didn't stay there long. DSSTNE is 100% fresh code, warts and all, and I thank Amazon both for letting me work on a project like this and for OSSing the code.
5. NetCDF is a nice efficient format for big data files. What other formats would you suggest we support here?
6. I was boarding a plane when they finally released this. I will be benchmarking it in the next few days. TLDR spoilers: near-perfect scaling for hidden layers with 1000 or so hidden units per GPU in use, and effectively free sparse input layers because both activation and weight gradient calculation have custom sparse kernels.
7. The JSON format made sense in 2014, but IMO what this engine needs now is a TensorFlow graph importer. Since the engine builds networks from a rather simple underlying C struct, this isn't particularly hard, but it does require supporting some additional functionality to be 100% compatible.
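The point about the engine building networks from a simple underlying struct can be made concrete: importing any graph format mostly means translating it into a flat list of per-layer records. A hedged sketch, using a hypothetical JSON layout (this is NOT DSSTNE's actual schema, just an illustration of the mechanical nature of such a translation):

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical network description -- invented for illustration,
# not DSSTNE's real JSON format.
NETWORK_JSON = """
{
  "layers": [
    {"name": "input",  "kind": "Input",  "units": 256},
    {"name": "hidden", "kind": "Hidden", "units": 128, "source": "input"},
    {"name": "output", "kind": "Output", "units": 10,  "source": "hidden"}
  ]
}
"""

@dataclass
class LayerDescriptor:
    """Mirrors a simple C struct: one flat record per layer."""
    name: str
    kind: str
    units: int
    source: Optional[str] = None

def parse_network(text: str):
    """Translate the JSON graph definition into flat layer records."""
    spec = json.loads(text)
    return [LayerDescriptor(l["name"], l["kind"], l["units"], l.get("source"))
            for l in spec["layers"]]

layers = parse_network(NETWORK_JSON)
print([(l.name, l.units) for l in layers])
```

A TensorFlow graph importer would be the same shape of code, just walking TF's graph proto instead of a JSON document.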
8. I left Amazon 4 months ago after getting an offer I couldn't refuse. I was the sole GPU coder on this project. I can count the number of people I'd trust with an engine like this with two hands and most of them are already building deep learning engines elsewhere. I'm happy to add whatever functionality is desired here. CNN and RNN support seem like two good first steps and the spec already accounts for this.
9. Ditto for a Python interface, easily implemented IMO through the Python C/C++ extension mechanism: https://docs.python.org/2/extending/extending.html
Anyway, it's late, and it's turned out to be a fantastic day to see the project on which I spent nearly two years go OSS.
Let me comment on file formats as someone familiar with both netCDF and deep learning.
I agree that netCDF is a sane binary file format for this application. It's designed for efficient serialization of large arrays of numbers. One downside is that netCDF does not support streaming without writing the data to intermediate files on disk.
Keep in mind that netCDF v4 is itself just a thin wrapper around HDF5. Given that your input format is basically a custom file format written in netCDF, I would have just used HDF5 directly. The API is about as convenient, and this would skip one layer of indirection.
The native file format for TensorFlow is its own custom TFRecords file format, but it also supports a number of other file formats. TFRecords is much simpler technology than NetCDF/HDF5. It's basically just a bunch of serialized protocol buffers [1]. About all you can do with a TFRecords file is pull out examples -- it doesn't support the fancy multi-dimensional indexing or hierarchical structure of netCDF/HDF5. But that's also most of what you need for building machine learning models, and it's quite straightforward to read/write them in a streaming fashion, which makes it a natural fit for technologies like map-reduce.
[1] https://www.tensorflow.org/versions/r0.8/api_docs/python/pyt...
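The "bunch of serialized records" idea is easy to see with a simplified sketch. This is not the real TFRecords wire format (which also adds CRC32C checksums around the length and payload); it only shows the length-prefixed streaming idea that makes such files map-reduce friendly:

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    # Length-prefixed record: an 8-byte little-endian length, then the bytes.
    # (Real TFRecords additionally checksums both the length and the data.)
    stream.write(struct.pack("<Q", len(payload)))
    stream.write(payload)

def read_records(stream):
    # Records are consumed one at a time, front to back, with no index
    # and no seeking -- which is exactly what streaming consumers need.
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        yield stream.read(length)

buf = io.BytesIO()
for example in [b"example-1", b"example-2", b"example-3"]:
    write_record(buf, example)
buf.seek(0)
print(list(read_records(buf)))
```

Each payload would be a serialized protocol buffer in the TFRecords case; the container itself knows nothing about the contents, which is why it stays so simple compared to netCDF/HDF5.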
I do want the multi-dimensional indexing for RNN data, though. Maybe supporting HDF5 directly is the path forward.
Thanks again!
Amazon DSSTNE - https://github.com/amznlabs/amazon-dsstne
Google TensorFlow - https://github.com/tensorflow/tensorflow/
Microsoft CNTK - https://github.com/Microsoft/CNTK/
Facebook fbcunn - https://github.com/facebook/fbcunn/
They all utilize C++11 or later. Just as Hadoop pushed Java in the big data, map-reduce realm, I think these libraries will push C++11 in the deep learning realm.
It's easily parallelizable on GPUs, or so the claim goes.
Its configuration language is much, much shorter than Caffe's, but upon inspection it looks like the configuration language is also much less flexible than Caffe's, and they implemented a damn sight less stuff. No recurrent anything, for example, no LSTM, no gating stuff that you would need if you were doing LSTM, no residual net stuff, just off the top of my head.
The docs look much, much less complete in comparison to TF and Theano and the like. Note that the dropout probability is given in the user docs, but the actual documentation for the dropout feature is hidden away inside the repo.
The important thing, however, is that they claim a significant improvement when training on extraordinarily sparse datasets, like recommender systems and the like. It seems very specialized for that exact purpose: see, for example, that it only accepts NetCDF-format data, which is common enough in climatology-land but less common in machine-learning-land proper.
The test coverage... To a first approximation, there is no test coverage. It seems quite research project-y.
DSSTNE instead uses “model-parallel training”, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs as data-parallel training.
https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
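The model-parallel idea can be illustrated with a toy sketch: each device owns a slice of a fully connected layer's weight matrix and computes only its slice of the output units, so the per-device work shrinks while the math stays identical. A pure-Python stand-in for the GPUs (not DSSTNE code; a minimal sketch of the concept):

```python
def matvec(W, x):
    # Dense layer forward pass: y = W @ x (no bias, for brevity).
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def model_parallel_layer(W, x, num_devices):
    """Split the layer's weight rows across 'devices': device d computes
    its own slice of the output units, and slices are concatenated.
    Every device sees the full input activation x."""
    rows_per_device = (len(W) + num_devices - 1) // num_devices
    out = []
    for d in range(num_devices):
        shard = W[d * rows_per_device:(d + 1) * rows_per_device]
        out.extend(matvec(shard, x))  # would run concurrently on GPU d
    return out

# The split computation matches the single-device result exactly --
# no gradient averaging, hence no data-parallel speed/accuracy trade-off.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2.0, 3.0, 4.0]
assert model_parallel_layer(W, x, 2) == matvec(W, x)
print(model_parallel_layer(W, x, 2))
```

Data-parallel training, by contrast, gives every device a full weight copy and a different minibatch, then averages gradients, which is where the speed/accuracy trade-offs the FAQ mentions come from.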
RTFM: https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md