- Only supports fully connected layers for now. No convnets or RNNs.
- Requires a GPU. No option to run on CPU, not even for development.
- Setup instructions for Ubuntu only. No Mac or Windows.
- Uses JSON to define the network architecture, which limits what you can build.
- Takes in data in NetCDF format only.
- Very little documentation.
- The name is bad. I'm not going to remember how to spell DSSTNE.
It seems like a very early proof of concept. I wouldn't expect it to be useful to most people at this point. Built-in support for sparse vectors is interesting, but not a strong selling point by itself. I hope Amazon continues to develop it. Or, even better, contribute to one of the existing more mature frameworks.

"My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."
And you're right that it's a specialized framework/engine. But IMO making it more general purpose is a matter of cutting and pasting the right cuDNN code or we can double down on emphasizing sparse data. Amazon OSSed this partially IMO to see what people would want here.
An interesting quote.
Replicating the functioning of the brain, or some major subsystem of it, is no doubt going to require far more than just billions of parameters. The cortex contains >15 billion neurons, but there are also the neurons contained in all the other brain structures. Furthermore, neurons connect via dense dendritic trees, the human brain having on the order of 100 trillion synapses.
Adding to the complexity, neurons have numerous "communication ports", including numerous pre- and postsynaptic neurotransmitter receptors, and a wide range of receptors for endocrine, immune system and other types of signals. Message propagation also typically involves a layer of complex intracellular "second-messenger" transformations.
While it's highly probable that future NNs will be developed that do even more amazing things than are now possible, I think the challenge of equaling what real brains do is, to say the least, enormously daunting.
Somebody smarter than me could probably estimate the magnitude: how many nodes or weights it would take for a NN to function like the brain. I imagine it will be a really impressive number.
Edit: typos
It's totally fine that it's a specialized framework, and it doesn't need to become general purpose. I just think the product description should do a better job positioning it and explaining what it's NOT intended for to set expectations correctly.
Agreed, it looks like a rushed response to TensorFlow.
My perception of Amazon is that they take everything from open source but don't actively give back. Amazon and open source have never gone hand in hand. Making their deep learning frameworks open source is cool. Kudos to the team that managed to do this. I am sure that internally it must have been a huge struggle to get approval from the execs.
[Edit: Grammar]
===From Glassdoor===
Cons
====
The management process is abusive, and I'm currently a manager. I've seen too much "behind the wall" and hate how our individual performers can be treated. You are forced to ride people and stack rank employees...I've been forced to give good employees bad overall ratings because of politics and stack ranking.

Advice to Management: Don't pretend that the recent NY Times article was all about "isolated incidents". The culture IS abusive and it WILL backfire once stock value starts to drop. I'm an 8 year veteran and I no longer recommend former peers to interview with Amazon.
== [Edit: Formatted to make it clear what was pulled from Glassdoor]
I just joined so I really am not a statistically significant case, but so far it's nowhere near what was in that NYT article.
Edit: I can't read apparently :) thanks heuving for clarifying and the commenter for reformatting
In my tenure at Amazon, I went from getting a 2 and a PIP, then to a 4, then having my promotion held up because my VP didn't like me. Finally, when I left for Google, they offered SDE3 and another $15k a year. I didn't take that offer.
This was all back in the 2001-2006 timeframe. Sounds like nothing has changed.
There's nothing wrong with this. There's no contract when using open-source and this is probably how 99% of people interact with it.
Another example is topology =)
* College or 11th std in India is the same as 11th grade High school in the US.
1. DSSTNE was designed two years ago specifically for product recommendations from Amazon's catalog. At that time, there was no TensorFlow, only Theano and Torch. DSSTNE differentiated from these two frameworks by optimizing for sparse data and multi-GPU spanning neural networks. What it's not currently is another framework for running AlexNet/VGG/GoogleNet etc, but about 500 lines of code plus cuDNN could change that if the demand exists. Implementing Krizhevsky's one weird trick is mostly trivial since the harder model parallel part has already been written.
2. DSSTNE does not yet explicitly support RNNs, but it does have support for shared weights, and that's more than enough to build an unrolled RNN. We tried a few, in fact. cuDNN 5 can be used to add LSTM support in a couple hundred lines of code. But since (I believe) the LSTM in cuDNN is a black box, it cannot be spread across multiple GPUs. Not too hard to write from the ground up, though.
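The shared-weights trick described above is framework-agnostic: an RNN unrolled for T steps is just T copies of the same dense layer reusing one weight matrix. Here's a minimal pure-Python sketch of that idea (not DSSTNE code; all names are made up for illustration):

```python
import math

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def unrolled_rnn(inputs, W_in, W_rec, h0):
    """Run an RNN by unrolling it in time: every step reuses (shares)
    W_in and W_rec, exactly as T feed-forward layers with tied weights."""
    h = h0
    for x in inputs:
        # h_t = tanh(W_in @ x_t + W_rec @ h_{t-1}) with the SAME weights each step
        h = tanh_vec(vec_add(matvec(W_in, x), matvec(W_rec, h)))
    return h

# Toy usage: 2-unit hidden state, 3 time steps of 1-d input.
W_in = [[0.5], [-0.25]]
W_rec = [[0.1, 0.0], [0.0, 0.1]]
h = unrolled_rnn([[1.0], [0.0], [1.0]], W_in, W_rec, [0.0, 0.0])
print(h)
```

In a framework with shared weights, each unrolled step becomes a layer whose weight tensor points at the same storage, which is why shared-weight support is "more than enough" to express this.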
3. There are a huge number of collaborators and people behind the scenes that made this happen. I'd love to acknowledge them openly, but I'm not sure they want their names known.
4. Say what you want about Amazon, and they're not perfect, but they let us build this from the ground up and now they have given it away. Google OTOH hired me away from NVIDIA in 2011 (another one of those offers I couldn't refuse), blind-allocated me into search, and would not let me work with GPUs, despite my being one of the founding members of NVIDIA's CUDA team, because they had not yet seen them as useful. I didn't stay there long. DSSTNE is 100% fresh code, warts and all, and I thank Amazon both for letting me work on a project like this and for OSSing the code.
5. NetCDF is a nice efficient format for big data files. What other formats would you suggest we support here?
6. I was boarding a plane when they finally released this. I will be benchmarking it in the next few days. TLDR spoilers: near-perfect scaling for hidden layers with 1000 or so hidden units per GPU in use, and effectively free sparse input layers because both activation and weight gradient calculation have custom sparse kernels.
7. The JSON format made sense in 2014, but IMO what this engine needs now is a TensorFlow graph importer. Since the engine builds networks from a rather simple underlying C struct, this isn't particularly hard, but it does require supporting some additional functionality to be 100% compatible.
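The point about the engine building networks from a simple underlying struct can be made concrete: importing any graph format mostly means translating it into a flat list of per-layer records. A hedged sketch, using a hypothetical JSON layout (this is NOT DSSTNE's actual schema, just an illustration of the mechanical nature of such a translation):

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical network description -- invented for illustration,
# not DSSTNE's real JSON format.
NETWORK_JSON = """
{
  "layers": [
    {"name": "input",  "kind": "Input",  "units": 256},
    {"name": "hidden", "kind": "Hidden", "units": 128, "source": "input"},
    {"name": "output", "kind": "Output", "units": 10,  "source": "hidden"}
  ]
}
"""

@dataclass
class LayerDescriptor:
    """Mirrors a simple C struct: one flat record per layer."""
    name: str
    kind: str
    units: int
    source: Optional[str] = None

def parse_network(text: str):
    """Translate the JSON graph definition into flat layer records."""
    spec = json.loads(text)
    return [LayerDescriptor(l["name"], l["kind"], l["units"], l.get("source"))
            for l in spec["layers"]]

layers = parse_network(NETWORK_JSON)
print([(l.name, l.units) for l in layers])
```

A TensorFlow graph importer would be the same shape of code, just walking TF's graph proto instead of a JSON document.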
8. I left Amazon 4 months ago after getting an offer I couldn't refuse. I was the sole GPU coder on this project. I can count the number of people I'd trust with an engine like this with two hands and most of them are already building deep learning engines elsewhere. I'm happy to add whatever functionality is desired here. CNN and RNN support seem like two good first steps and the spec already accounts for this.
9. Ditto for a Python interface, easily implemented IMO through the Python C/C++ extension mechanism: https://docs.python.org/2/extending/extending.html
Anyway, it's late, and it's turned out to be a fantastic day to see the project on which I spent nearly two years go OSS.
Let me comment on file formats as someone familiar with both netCDF and deep learning.
I agree that netCDF is a sane binary file format for this application. It's designed for efficient serialization of large arrays of numbers. One downside is that netCDF does not support streaming without writing the data to intermediate files on disk.
Keep in mind that netCDF v4 is itself just a thin wrapper around HDF5. Given that your input format is basically a custom file format written in netCDF, I would have just used HDF5 directly. The API is about as convenient, and this would skip one layer of indirection.
The native file format for TensorFlow is its own custom TFRecords file format, but it also supports a number of other file formats. TFRecords is much simpler technology than NetCDF/HDF5. It's basically just a bunch of serialized protocol buffers [1]. About all you can do with a TFRecords file is pull out examples -- it doesn't support the fancy multi-dimensional indexing or hierarchical structure of netCDF/HDF5. But that's also most of what you need for building machine learning models, and it's quite straightforward to read/write them in a streaming fashion, which makes it a natural fit for technologies like map-reduce.
[1] https://www.tensorflow.org/versions/r0.8/api_docs/python/pyt...
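The "bunch of serialized records" idea is easy to see with a simplified sketch. This is not the real TFRecords wire format (which also adds CRC32C checksums around the length and payload); it only shows the length-prefixed streaming idea that makes such files map-reduce friendly:

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    # Length-prefixed record: an 8-byte little-endian length, then the bytes.
    # (Real TFRecords additionally checksums both the length and the data.)
    stream.write(struct.pack("<Q", len(payload)))
    stream.write(payload)

def read_records(stream):
    # Records are consumed one at a time, front to back, with no index
    # and no seeking -- which is exactly what streaming consumers need.
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        yield stream.read(length)

buf = io.BytesIO()
for example in [b"example-1", b"example-2", b"example-3"]:
    write_record(buf, example)
buf.seek(0)
print(list(read_records(buf)))
```

Each payload would be a serialized protocol buffer in the TFRecords case; the container itself knows nothing about the contents, which is why it stays so simple compared to netCDF/HDF5.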
I do want the multi-dimensional indexing for RNN data, though. Maybe supporting HDF5 directly is the path forward.
Thanks again!
Amazon DSSTNE - https://github.com/amznlabs/amazon-dsstne
Google TensorFlow - https://github.com/tensorflow/tensorflow/
Microsoft CNTK - https://github.com/Microsoft/CNTK/
Facebook fbcunn - https://github.com/facebook/fbcunn/
They all utilize C++11 or later. Just as Hadoop pushed Java in the big data, map-reduce realm, I think these libraries will push C++11 in the deep learning realm.
It's easily parallelizable on GPUs, or so the claim goes.
Its configuration language is much, much shorter than Caffe's, but upon inspection it looks like the configuration language is also much less flexible than Caffe's, and they implemented a damn sight less stuff. No recurrent anything, for example, no LSTM, no gating stuff that you would need if you were doing LSTM, no residual net stuff, just off the top of my head.
The docs look much, much less complete in comparison to TF and Theano and the like. Note that the dropout probability is given in the user docs, but the actual documentation for the dropout feature is hidden away inside the repo.
The important thing, however, is that they claim a significant improvement when training on extraordinarily sparse datasets, like recommender systems and the like. It seems very specialized for that exact purpose: see, for example, that it only accepts NetCDF-format data, which is common enough in climatology-land but less common in machine-learning-land proper.
The test coverage... To a first approximation, there is no test coverage. It seems quite research project-y.
DSSTNE instead uses “model-parallel training”, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs as data-parallel training.
https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
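The model-parallel idea can be illustrated with a toy sketch: each device owns a slice of a fully connected layer's weight matrix and computes only its slice of the output units, so the per-device work shrinks while the math stays identical. A pure-Python stand-in for the GPUs (not DSSTNE code; a minimal sketch of the concept):

```python
def matvec(W, x):
    # Dense layer forward pass: y = W @ x (no bias, for brevity).
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def model_parallel_layer(W, x, num_devices):
    """Split the layer's weight rows across 'devices': device d computes
    its own slice of the output units, and slices are concatenated.
    Every device sees the full input activation x."""
    rows_per_device = (len(W) + num_devices - 1) // num_devices
    out = []
    for d in range(num_devices):
        shard = W[d * rows_per_device:(d + 1) * rows_per_device]
        out.extend(matvec(shard, x))  # would run concurrently on GPU d
    return out

# The split computation matches the single-device result exactly --
# no gradient averaging, hence no data-parallel speed/accuracy trade-off.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2.0, 3.0, 4.0]
assert model_parallel_layer(W, x, 2) == matvec(W, x)
print(model_parallel_layer(W, x, 2))
```

Data-parallel training, by contrast, gives every device a full weight copy and a different minibatch, then averages gradients, which is where the speed/accuracy trade-offs the FAQ mentions come from.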
RTFM: https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md