Introducing Amazon EC2 P3 Instances

(aws.amazon.com)

214 points · aseidl · 8y ago · 96 comments

81 comments · 18 top-level

DTE8y ago· 11 in thread

Hi guys, Dillon here from Paperspace (https://www.paperspace.com). We are a cloud that specializes in GPU infrastructure and software. We launched V100 instances a few days ago in our NY and CA regions and it's much less expensive than AWS.

Think of us as the DigitalOcean for GPUs with simple, transparent pricing and effortless setup & configuration:

AWS: $3.06/hr V100*

Paperspace: $2.30 /hr or $980/month for dedicated (effective hourly is only $1.3/hr)

Learn more here: https://www.paperspace.com/pricing

[Disclosure: I am one of the founders]
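
For reference, the effective hourly figure quoted above follows from spreading the monthly price over the hours in a month. A quick sketch of that arithmetic, assuming an average month of 730 hours and round-the-clock use (the 730-hour figure is an assumption for illustration, not Paperspace's stated billing basis):

```python
# Prices are the ones quoted in the comment above; HOURS_PER_MONTH is an assumption.
AWS_HOURLY = 3.06
PAPERSPACE_HOURLY = 2.30
PAPERSPACE_MONTHLY = 980.0
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def effective_hourly(monthly_price, hours_used=HOURS_PER_MONTH):
    """Monthly price spread over the hours actually used."""
    return monthly_price / hours_used


def break_even_hours(monthly_price, hourly_price):
    """Hours per month beyond which the monthly plan beats hourly billing."""
    return monthly_price / hourly_price


full_time = effective_hourly(PAPERSPACE_MONTHLY)
vs_paperspace = break_even_hours(PAPERSPACE_MONTHLY, PAPERSPACE_HOURLY)
vs_aws = break_even_hours(PAPERSPACE_MONTHLY, AWS_HOURLY)

print(f"dedicated, used 24/7: ${full_time:.2f}/hr")
print(f"break-even vs Paperspace hourly: {vs_paperspace:.0f} h/month")
print(f"break-even vs AWS on-demand: {vs_aws:.0f} h/month")
```

The $1.3/hr figure only holds if the machine is busy all month; at lower utilization the effective rate rises accordingly.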

Veratyr8y ago

Your pricing page notably omits transfer pricing. Do you have free bandwidth between yourself and AWS/GCP/Azure or do you peer at any major exchanges?

Getting the data into and out of compute services is the most difficult part financially, at least in my experience.

dkobran8y ago

Dan here (also Paperspace team). Totally agree that transfer costs are a significant pain point which is why we do not charge for it. We can peer with other providers (eg with AWS we can leverage Direct Connect directly from our datacenters) but most of our customers don't implement this unless they're moving major traffic.

ryanlol8y ago

>Getting the data into and out of compute services is the most difficult part financially, at least in my experience.

You can never forget that this is entirely because of compute services ripping you off, not because they're providing a valuable service in return for the transfer pricing.

illumin88y ago

One of the biggest challenges with deep learning is training data. AWS makes loading large datasets easy with S3. What does Paperspace have to help with this? If I have to perform deep learning on multi-TB datasets in S3, any compute cost benefits get cancelled out by the increased data transfer cost from S3.

haldora8y ago

I've really enjoyed using your service, especially the cloud desktops. I use them for running Fusion 360 (windows only) from my ubuntu xps when I'm away from home.

Both the interface and GPU prices are fantastic.

Keep up the good work!

sspiff8y ago

This is great!

I'm looking for a way to run serverless (Amazon Lambda style) GPU operations (preferably using OpenCL). Are there any plans for such a service in your platform?

DTE8y ago

We have definitely been thinking a lot about what that would look like (i.e. is it more of a job architecture, an API, clustering, etc). Would love to hear your thoughts on what GPU Lambda might look like. Feel free to hit me up directly dillon [@] paperspace [dot] com if you want to continue the conversation :)

dylanz8y ago

We're adding sync support to Worker (which has GPU support) at Iron.io soon! This will allow you to run long running background jobs (current behavior) as well as sync serverless/faas Lambda-like functions within a single API.

jnbiche8y ago

DigitalOcean for GPUs, awesome! For someone wanting to play around learning more about machine learning, would one of your Standard GPU units be ideal? If so, which one would you recommend? (Or do you think I'd need a dedicated GPU unit?)

DTE8y ago

For ML/deep learning tasks you should definitely use a dedicated GPU. I would recommend our GPU+ (NVIDIA Quadro M4000/P4000) which has 8GB of VRAM and 1664 CUDA cores.

zamland8y ago

do you have spot instance pricing as well?

Smerity8y ago· 10 in thread

The P3 instances are the first widely and easily accessible machines that use the NVIDIA Tesla V100 GPUs. These GPUs are straight up scary in terms of firepower. To give an understanding of the speed-up compared to the P2 instances for a research project of mine:

+ P2 (K80) with single GPU: ~95 seconds per epoch

+ P3 (V100) with single GPU: ~20 seconds per epoch

Admittedly this isn't exactly fair for either GPU - the K80 cards are straight up ancient now and the Volta isn't sitting at 100% GPU utilization as it burns through the data too quickly ([CUDA kernel, Python] overhead suddenly become major bottlenecks). This gives you an indication of what a leap this is if you're using GPUs on AWS however. Oh, and the V100 comes with 16GB of (faster) RAM compared to the K80's 12GB of RAM, so you win there too.

For anyone using the standard set of frameworks (Tensorflow, Keras, PyTorch, Chainer, MXNet, DyNet, DeepLearning4j, ...) this type of speed-up will likely require you to do nothing - except throw more money at the P3 instance :)

If you really want to get into the black magic of speed-ups, these cards also feature full FP16 support, which means you can double your TFLOPS by dropping to FP16 from FP32. You'll run into a million problems during training due to the lower precision but these aren't insurmountable and may well be worth the pain for the additional speed-up / better RAM usage.

- Good overview of Volta's advantages compared to even the recent P100: https://devblogs.nvidia.com/parallelforall/inside-volta/

- Simple table comparing V100 / P100 / K40 / M40: https://www.anandtech.com/show/11367/nvidia-volta-unveiled-g...

- NVIDIA's V100 GPU architecture white paper: http://www.nvidia.com/object/volta-architecture-whitepaper.h...

- The numbers above were using my PyTorch code at https://github.com/salesforce/awd-lstm-lm and the Quasi-Recurrent Neural Network (QRNN) at https://github.com/salesforce/pytorch-qrnn which features a custom CUDA kernel for speed

agibsonccc8y ago

Great write up as usual! Could you elaborate more on the python overhead a bit? We have fp16 support running in dl4j but I don't think we've really done much with volta yet beyond get it working. In practice, (especially when we do multi gpu async background loading of data) we find gpus being data starved. I would love to compare support for what you're seeing with pytorch.

Smerity8y ago

Honestly, I didn't spend enough time delving in to the Python overhead, especially in terms of the framework. Most of it would be an issue of my own causing however rather than the framework's. The original code I wrote was never written with data loading / saving in mind as the source for speed issues so I avoided what would have been premature optimization at the time.

Some of the slowdowns now just seem silly and aren't even listed in the per epoch timings: PyTorch doesn't have an asynchronous torch.save(). This means that if you save your model after each epoch, and the model save takes a few seconds, you're increasing your per epoch timings 5-10% just by saving the damn thing!
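
One way to hide that cost is to snapshot the weights and write them out on a background thread while training carries on. A minimal sketch of the idea using only the standard library, with `pickle` standing in for `torch.save` and a plain dict standing in for the model state; the helper name `save_async` is hypothetical and is not a PyTorch API:

```python
import copy
import os
import pickle
import tempfile
import threading


def save_async(state, path):
    """Snapshot `state` now, then write it to `path` on a background thread."""
    # Copy first so later training steps cannot mutate what is being written.
    snapshot = copy.deepcopy(state)

    def _write():
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(snapshot, f)
        # Rename into place so readers never see a half-written checkpoint.
        os.replace(tmp_path, path)

    thread = threading.Thread(target=_write)
    thread.start()
    return thread


# Toy stand-in for a model's state after one epoch.
state = {"epoch": 1, "weights": [0.1, 0.2, 0.3]}
checkpoint_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

writer = save_async(state, checkpoint_path)
state["weights"][0] = 99.0  # "training" continues while the write is in flight
writer.join()

with open(checkpoint_path, "rb") as f:
    restored = pickle.load(f)
print(restored)
```

The snapshot itself still runs on the training thread, so this only helps when copying is much cheaper than serializing and writing to disk.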

Regarding FP16, PyTorch supports it, and there's even a pull request that updates the examples repo with FP16 support for language modeling and ImageNet. It's not likely to be merged as it greatly complicates a codebase that's meant primarily for teaching purposes but it's lovely to look at. I also think many of the FP16 issues will get a general wrapper and they'll become far more agnostic to the end user. For the most part they're all outlined in NVIDIA / Baidu's "Mixed Precision Training" paper. Might be useful for DeepLearning4j to go through the most common heavy throughput use cases and get them running (just as an example of how to work around issues really) if customers were using P100s/V100s?

I'm really interested in exploring the FP16 aspect as the QRNN, especially for single GPU, is sitting at basically 100% utilization, with almost all the time spent on matrix multiplications. FP16 is about the only way to speed it up at that stage. This gets a tad more complicated regardless as the CUDA kernel is not written in FP16 (and is not easy to do so) but even converting FP16->FP32->(QRNN element-wise CUDA kernel)->FP16 ("pseudo" FP16) should still be a crazy speedup. I tested that on the P100 and it took per epoch AWD-QRNN from ~28 seconds to ~18.

- PyTorch async save issue: https://github.com/pytorch/pytorch/issues/1567

- PyTorch FP16 examples pull request: https://github.com/pytorch/examples/pull/203

- "Mixed Precision Training": https://arxiv.org/abs/1710.03740
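
To make the precision problem concrete: one issue the "Mixed Precision Training" paper addresses is small gradient values underflowing to zero in FP16, and its remedy is loss scaling. A toy illustration using Python's `struct` module to round values to half precision; the gradient value and the scale factor of 1024 are made up for the example:

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


tiny_grad = 1e-8     # hypothetical gradient, below FP16's smallest subnormal (~6e-8)
loss_scale = 1024.0  # hypothetical scale factor

# Stored directly in FP16, the gradient is lost entirely.
unscaled = to_fp16(tiny_grad)

# Scaling the loss scales every gradient by the same factor,
# which lifts this one into FP16's representable range.
scaled = to_fp16(tiny_grad * loss_scale)

# Unscale in higher precision before applying the weight update.
recovered = scaled / loss_scale

print(unscaled, scaled, recovered)
```

The recovered value is close to the original rather than exact, since the scaled gradient is still rounded to the nearest FP16 value.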

ablekh8y ago

Nice comment. In regard to your reference to reducing precision to FP16 for performance gains, you might want to read a recently published paper by Baidu Research and NVIDIA teams on mixed precision training of deep learning models (link to the paper is at the end of the following relevant post): https://www.nextplatform.com/2017/10/11/baidu-sheds-precisio.... Enjoy! :-)

mv48y ago

I've been using the P100 on Softlayer and was impressed. Looks like V100 may be 2..3x faster on some tasks, will be interesting to test it.

P.S. with that memory speed, it can probably run 300..400MH/s on ETH.

mamon8y ago

Genuinely curious: Given that Softlayer bare metal server prices start at $700 per month is there even a remote chance of this actually being profitable?

Beltiras8y ago

I can't easily find pricing information on the P3 instances. Have you come across a simple table with the prices?

detaro8y ago

On-demand prices from Amazon's pricing page https://aws.amazon.com/ec2/pricing/on-demand/ (select Virginia region):

p3.2xlarge: 8 vCPU, 61 GB RAM, $3.06/h

p3.8xlarge: 32 vCPU, 244 GB, $12.24/h

p3.16xlarge: 64 vCPU, 488 GB, $24.48/h

joelhaasnoot8y ago

Unfortunately, P3 isn't listed yet, but this is my go to site for EC2 pricing: http://www.ec2instances.info/

smn12348y ago

showing up for me on https://aws.amazon.com/ec2/pricing/ for each of On-Demand Instances, Reserved Instances, Spot Instances, and Dedicated Hosts pricing lists. Are you selecting the regions where this is available - US East (N. Virginia), US West (Oregon), EU West (Ireland) and Asia Pacific (Tokyo)?

ablekh8y ago

Oops, just saw that you referenced the same paper in a comment below. Sorry! :-)

science4048y ago· 7 in thread

Why Ireland and not the UK? I can imagine a lot of startups/banks in London could use this... Brexit fears?

maffydub8y ago

I wouldn't read too much into this - Amazon's Ireland region was deployed earlier (2008?) than London (2016?) and seems to receive updates earlier too.

remus8y ago

London only came online relatively recently, maybe there's some operational stuff getting in the way of deploying? Or perhaps London has relatively few users at the moment, so the number of clients who will be able to take advantage of more specialised instances is also relatively low?

tt2938y ago

Because Ireland is so close to the UK that the latency will not matter?

rsynnott8y ago

Ireland is one of the older and larger regions, and generally gets new stuff at the same time as Virginia or very shortly afterwards. London and Frankfurt tend to be delayed a bit. See https://aws.amazon.com/about-aws/global-infrastructure/regio...

maxehmookau8y ago

The London Region is really new. Amazon are using it as a way in to UK-only projects (specifically healthcare due to NHS regulations). I suspect their datacentres are considerably smaller than that of Ireland along with their client-base for the moment. I wouldn't read too much in to it, brexit-wise.

moonbug228y ago

London region is tiny and not the region to go to unless you have specific geolocation requirements.

puzzle8y ago

The Dublin data center is probably much larger than the one in London, for starters.

jeffbarr8y ago· 6 in thread

More details in my blog post at https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-wi...

mcherm8y ago

I thought the comparison of 1 second of computation today to the lifetime computation of older computers (since they were released) was clever.

SloopJon8y ago

This post states, "In order to take full advantage of the NVIDIA Tesla V100 GPUs and the Tensor cores, you will need to use CUDA 9 and cuDNN7." What version of TensorFlow does it use? From what I can tell, TensorFlow doesn't fully support the latest versions yet.

sumt8y ago

You can use the new AWS Deep Learning AMI which has a version of TensorFlow enhanced for CUDA 9 and Volta support https://aws.amazon.com/blogs/ai/announcing-new-aws-deep-lear...

sipherhex8y ago

Chris from NV here. You can also get a full complement of DL framework containers, as well as a CUDA 9/CuDNN 7/NCCL 2 base container, optimized for Volta by NVIDIA via this AMI https://aws.amazon.com/marketplace/pp/B076K31M1S?qid=1509089...

puzzle8y ago

TF 1.4 does, but you need to build it yourself. RC1 is out:

https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0...

  All our prebuilt binaries have been built with CUDA 8 and cuDNN 6.
  We anticipate releasing TensorFlow 1.5 with CUDA 9 and cuDNN 7.

ZeroCool2u8y ago

Thanks Jeff, I forwarded your blog post to our Chief Scientist.

psychometry8y ago· 6 in thread

Random question: Why are we still using mostly GPUs for computation rather than CPUs custom-designed for ML tasks?

zolthrowaway8y ago

GPUs are quite good at doing arithmetic in parallel. A large part of machine learning is doing arithmetic on large data sets. It makes sense to do these operations in parallel. For example, implementing k-nearest neighbors on a GPU is almost 2 orders of magnitude faster than on a CPU[0].

GPUs just work very well when you have a lot of data and you are able to run the operations on the data set in parallel. Machine learning seems to fit this model quite well which is why you see many GPUs used in this field. Other things that take advantage of parallelism would be graphics and crypto-currency mining.

[0] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159...

fooker8y ago

The definition of an 'ML task' tends to change.

scott_karana8y ago

If you want to offer PaaS with FPGAs or ASICs, by all means. I'm sure there'd be some interest :)

ML might be a bit of a moving target though.

dmoy8y ago

For some definition of "we", we are not.

Diederich8y ago

Can you expand on that?

oh-kumudo8y ago

Why not?

dharma18y ago· 4 in thread

Price: p3.2xlarge - $3/hr, p3.8xlarge - $12/hr, p3.16xlarge - $25/hr

These look very good for half precision training

moonbug228y ago

Come on, no one with any sense pays the on demand price for these things. Watch the spots.

maffydub8y ago

Yes, p3.2xlarge in us-east-1b is currently sitting at $0.3204 spot. That's only marginally more than p2.xlarge (at $0.2259 in us-east-1e).

I'm sure this will change with demand, though. :(
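
To put those spot numbers next to the on-demand price quoted elsewhere in the thread, a small sketch of the arithmetic. All prices are the ones posted here and are a snapshot; spot prices move with demand:

```python
# Prices in $/h as quoted in this thread.
ON_DEMAND_P3_2XLARGE = 3.06
SPOT_P3_2XLARGE = 0.3204  # us-east-1b
SPOT_P2_XLARGE = 0.2259   # us-east-1e


def discount(on_demand, spot):
    """Fraction saved by paying the spot price instead of on-demand."""
    return 1 - spot / on_demand


p3_spot_discount = discount(ON_DEMAND_P3_2XLARGE, SPOT_P3_2XLARGE)
p3_vs_p2_spot = SPOT_P3_2XLARGE / SPOT_P2_XLARGE

print(f"p3.2xlarge spot discount vs on-demand: {p3_spot_discount:.1%}")
print(f"p3.2xlarge spot / p2.xlarge spot: {p3_vs_p2_spot:.2f}x")
```

At these prices the P3 spot rate is roughly a tenth of on-demand and about 1.4 times the P2 spot rate.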

dx0348y ago

There are enough companies out there with deep pockets that want to do some ML. They'll pay those prices, no questions asked.

dharma18y ago

Yep agreed. Didn't want to post the spot pricing since it changes all the time :)

moconnor8y ago· 4 in thread

An exaflop of mixed-precision compute for $250M over 3 years. That’s ballpark what the HPC community is paying for their exaflop-class machines.

You’d still build your own for that money, I think, but it’s an interesting datapoint.
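
A back-of-the-envelope version of that estimate, as a sketch under stated assumptions: it takes one p3.16xlarge (8 V100s) as delivering about 1 petaflop of peak mixed-precision throughput, which is an assumption not stated in this thread, and uses the $24.48/h on-demand price quoted elsewhere in the discussion:

```python
PFLOPS_PER_INSTANCE = 1.0   # assumption: peak mixed precision for one p3.16xlarge
TARGET_PFLOPS = 1000.0      # 1 exaflop
ON_DEMAND_HOURLY = 24.48    # $/h for p3.16xlarge, as quoted in the thread
HOURS_3Y = 3 * 365 * 24     # ignoring leap days
QUOTED_TOTAL = 250e6        # the $250M figure from the comment above

instances_needed = TARGET_PFLOPS / PFLOPS_PER_INSTANCE
on_demand_total = instances_needed * ON_DEMAND_HOURLY * HOURS_3Y
implied_hourly = QUOTED_TOTAL / (instances_needed * HOURS_3Y)

print(f"instances needed: {instances_needed:.0f}")
print(f"3-year on-demand cost: ${on_demand_total / 1e6:.0f}M")
print(f"hourly rate implied by $250M: ${implied_hourly:.2f} per instance")
```

Under these assumptions on-demand pricing comes out well above $250M, so the quoted figure presumably reflects a reserved-instance rate rather than the list price.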

dx0348y ago

How long would it take to break even if you build your own, including electricity costs? If margins are similar to other EC2 instances, you'd probably break even after 6 months or so. Which makes EC2 uneconomical for any lab/company that can utilise the cluster 24/7.

Still nice if you quickly need to get some model results though.

laumars8y ago

Amazon prices are for the pay as you go model. You can shave a significant amount off the price if you know you're going to be running them for 12 months.

dharma18y ago

If you're going to be running it for 24x7 for 3 years, I think it'd be worth doing the apples-to-apples comparison of buying your own V100s vs renting them from AWS. The DGX Station with 4 V100s is $70k
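
A rough version of that comparison, as a sketch only. It pairs the DGX Station with a p3.8xlarge on the assumption that both carry 4 V100s (the GPU count of the p3.8xlarge is not stated in this thread), uses the on-demand price quoted above, and ignores electricity, hosting, staff time, resale value, and reserved or spot discounts:

```python
DGX_STATION_PRICE = 70_000.0   # $, as quoted in the comment above
P3_8XLARGE_HOURLY = 12.24      # $/h on-demand, as quoted in the thread
HOURS_3Y = 3 * 365 * 24

break_even_hours = DGX_STATION_PRICE / P3_8XLARGE_HOURLY
break_even_days = break_even_hours / 24
on_demand_3y = P3_8XLARGE_HOURLY * HOURS_3Y

print(f"break-even: {break_even_hours:.0f} h (~{break_even_days:.0f} days of 24/7 use)")
print(f"3 years on-demand: ${on_demand_3y:,.0f}")
```

That puts break-even at a bit under eight months of continuous use, in the same range as the six-month guess earlier in this sub-thread; discounted AWS pricing would push it out further.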

dexterdog8y ago

It wouldn't be $250K over 3 years. It would be $250K up-front to get the lowest current pricing.

sethgecko8y ago· 3 in thread

Is there an AMI that comes with Tensorflow/keras with GPU support preinstalled or you have to do it yourself?

Smerity8y ago

Amazon offer an official AMI which comes preloaded with various deep learning frameworks: MXNet, TensorFlow, CNTK, Caffe/2, Theano, Torch and Keras.

For the P3 (Volta V100) instances you'll want to ensure you use an AMI preloaded with CUDA 9, though not all DL frameworks are happy with that yet.

https://aws.amazon.com/amazon-ai/amis/

sipherhex8y ago

Be careful with the non-CUDA 9 AMIs.

CUDA 8 programs will run, but terribly slowly as they JIT their GPU code without optimization for Volta. You want the CUDA 9 AMI version (https://aws.amazon.com/marketplace/pp/B076TGJHY1?qid=1509090...), but it currently only has MXNet and TF.

If you need other frameworks there's the NVIDIA AMI (https://aws.amazon.com/marketplace/pp/B076K31M1S?qid=1509090...) and Volta optimized containers for NVCaffe, Caffe2, CNTK, Digits, MXNet, PyTorch, TensorFlow, Theano, Torch, CUDA 9/CuDNN7/NCCL.

mv48y ago

Didn't know they had an AMI like that. Thank you.

plantain8y ago· 2 in thread

But where are the C5 instances? It's been 11 months since Amazon announced Skylake C5's and we're still waiting!

https://aws.amazon.com/about-aws/whats-new/2016/11/coming-so...

STRML8y ago

Waiting for them as well. Most of all, we really need fast-CPU instances with the ENA, not the Intel NIC.

jsolson8y ago

Out of professional curiosity, what are you looking for from ENA?

(I'm an engineer on Google Compute Engine with a deep interest in customer networking use stories, particularly heavy utilization customers, even if they're not my customers :)

eggie58y ago· 2 in thread

Here are my results:

Testing new Tesla V100 on AWS. Fine-tuning VGG on DeepSent dataset for 10 epochs.

GRID 520K (4GB) (baseline):

* 780s/epoch @ minibatch 8 (GPU saturated)

V100 (16GB):

* 30s/epoch @ minibatch 8 (GPU not saturated)

* 6s/epoch @ minibatch 32 (GPU more saturated)

* 6s/epoch @ minibatch 256 (GPU saturated)
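
Expressed as speedups over the baseline, using only the timings listed above:

```python
BASELINE_SECONDS = 780  # GRID 520K at minibatch 8

# V100 seconds per epoch, keyed by minibatch size.
v100_seconds = {8: 30, 32: 6, 256: 6}

speedups = {batch: BASELINE_SECONDS / secs for batch, secs in v100_seconds.items()}

for batch, factor in speedups.items():
    print(f"minibatch {batch}: {factor:.0f}x faster than baseline")
```

Only the minibatch 8 row is a like-for-like comparison, since the baseline was measured at that size; the larger batches show what the extra memory and throughput allow once the GPU is kept busy.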

dharma18y ago

Thanks! Curious how this would scale on the 8x or 16x instances

eggie58y ago

What do you mean? 8 or 16 GPUs? That'd require changing the code to use distributed tensorflow...

corford8y ago· 2 in thread

Hmm just tried to spool up a p3.2xlarge in Ireland but hit an instance limit check (it's set at 0), went to request a service limit increase but P3 instances are not listed in the drop down box :(

avvakum8y ago

Same problem here and it does not seem to be zone specific. I wonder how others worked around this ...

corford8y ago

Maybe by being bigger customers... :)

bprasanna8y ago· 2 in thread

...advanced workloads such as machine learning (ML), high performance computing (HPC), data compression, and crypto__________.

Yuioup8y ago

How many bitcoins can you mine out of this on max power and would it be profitable? I'm sure that Amazon has done the math on this but I'm still curious.

geofft8y ago

It's not just that Amazon has done the math, it's that sufficiently liquid cryptocurrencies will, by the efficient market hypothesis, quickly gain enough value to make mining on whatever Amazon offers no longer profitable. As soon as you're able to profitably mine without an up-front capital investment, people will take advantage of the arbitrage opportunity until the market adjusts its price, and if the currency is designed at least somewhat competently and has enough of a working market (both of which are definitely true of Bitcoin), that won't take very long.

Cryptocurrencies are the invisible robot hand of the market. (Which is, I think, not a claim about whether they're good, but certainly a claim about whether they are to be feared. If you squint hard enough, the giant Bitcoin mines in China are the work of an unfriendly AI employing people to make paperclips.)

mamon8y ago· 1 in thread

Slightly off-topic but I'm curious: Nvidia Volta is advertised as having "tensor cores" - what does it take for a programmer to use them? Will typical Tensorflow or Caffe code take advantage of it? Or should we wait for some new optimized version of ML frameworks?

exDM698y ago

> Will typical Tensorflow or Caffe code take advantage of it?

Yes, the support should already be there for both frameworks.

againa8y ago· 1 in thread

Use reserved instances or use spot. The price decreases substantially. Then when you don’t need it... you don’t pay for it... it’s a good deal

jerianasmith8y ago

yaah

jerianasmith8y ago· 1 in thread

P3 instances no doubt provide a powerful platform and are going to be useful for data compression.

arnon8y ago

And GPU databases that use compression will gain another big advantage

g105b8y ago· 1 in thread

Bitcoin?

pyvpx8y ago

bitcoin mining at any hope of profitability comes with custom, specific ASICs.

kshnell8y ago

Looks like Paperspace announced Volta support yesterday: https://blog.paperspace.com/tesla-v100-available-today/ One nice thing here is you can do monthly plans instead of reserved on AWS which is a minimum $8-17k upfront. Really great to see the cloud providers adopting modern GPUs.

JeanMarcS8y ago

If ever you've got a password hash to decrypt :)
