I updated my Keras container with the TF 1.5 RC, CUDA 9, and cuDNN 7 (https://github.com/minimaxir/keras-cntk-docker), but did not notice a significant speed increase on a K80 GPU (I'm unsure if Keras makes use of FP16 yet either).
The other two main features are eager execution and TensorFlow Lite.
When eager execution is enabled, you no longer need to worry about graphs: operations are executed immediately. The upshot is that eager execution lets you implement dynamic models, like recursive NNs, using Python control flow. We've published some example implementations of such models on Github:
https://github.com/tensorflow/tensorflow/tree/master/tensorf...
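To illustrate the kind of data-dependent control flow this enables, here is a minimal plain-Python sketch (no TensorFlow dependency; `tree_sum` is an illustrative name, not a TF API). The point is that under eager execution the computation is driven by ordinary Python recursion and `if` statements, so its structure can depend on the input itself, which is awkward to express as a static graph:

```python
def tree_sum(node):
    """Recursively reduce a nested list (a 'tree') to the sum of its leaves.

    With eager-style execution, each operation runs immediately, so plain
    Python recursion and conditionals can drive the computation.
    """
    if isinstance(node, list):
        return sum(tree_sum(child) for child in node)
    return node  # leaf value

# The recursion depth and branching are determined by the input data,
# not fixed ahead of time -- the hallmark of a dynamic model such as
# a recursive NN over parse trees.
print(tree_sum([1, [2, 3], [[4], 5]]))  # -> 15
```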
I'd be happy to answer other questions about eager execution, and feedback is welcome.
EDIT: Just because you don't have to worry about graphs doesn't mean that graph construction and eager execution aren't related; if you're curious about how they relate to each other, take a look at our research blog post (https://research.googleblog.com/2017/10/eager-execution-impe...).
I think I'm either going to change my workflow and use another OS or switch fully to PyTorch.
(I still used CUDA 8, but 9 should also work; you just need to find the version of the command-line tools it works with.)
(How good is OpenCL when it comes to this sort of thing? Could they support it without a crazy amount of effort?)
I guess we never know what's running on our cloud instances.