GCP

(gpu.land)

168 points · ilmoi · 5y ago · 93 comments

ilmoi (OP) · 5y ago · 20 in thread

I’m a self-taught ML engineer, and when I was starting on my ML journey I was incredibly frustrated by cloud services like AWS/GCP: a) they were super expensive, and b) it took me longer to set up a working GPU instance than to learn to build my first model!

So I built https://gpu.land/.

It’s a simple service that only does one thing: rents out Tesla V100s in the cloud.

Why is it awesome?

- It’s dirt-cheap. You get a Tesla V100 for $0.99/hr, which is 1/3 the cost of AWS/GCP/Azure/[insert big cloud name].

- It’s dead simple. It takes 2mins from registration to a launched instance. Instances come pre-installed with everything you need for Deep Learning, including a 1-click Jupyter server.

- It sports a retro, MS-DOS-like look. Because why not:)

The most common question I get is - how is this so cheap? The answer is that AWS/GCP are charging you a huge markup and I’m not. In fact I’m charging just enough to break even, and built this project really to give back to the community (and to learn some of the tech in the process).

HN special: email me a few lines about yourself and what you’re working on and get $10 in free credit. I’m at hi@gpu.land.

Otherwise I’m around for any questions!

etaioinshrdlu · 5y ago

Can you use GeForce cards and then make us pinky swear to only use it for blockchain processing applications? This is what LeaderGPU does (https://www.leadergpu.com/#chose-best), and they had even lower pricing.

Although lately they have had demand outstrip supply.

Hetzner used to have GTX 1080 instances for about $100 a month, no longer though, and I'm lucky to be grandfathered in to about 8 of them.

I told myself a couple years ago that compute will only get cheaper over time. In reality, compute has gotten MORE expensive over time! I cannot match or scale the compute price I locked in a couple years ago with Hetzner. Some of that is the crypto market raising GPU prices, but it is also NVIDIA's licensing making their cheaper cards unavailable in cloud servers...

ghoomketu · 5y ago

For somebody who has 0 knowledge of ML but has always been wary of the huge costs involved, this sure looks like a great offering!

But I still don't know what it would cost to get something useful out of it. Can you (or anybody who knows about ML) tell me a very very ballpark amount of what costs it incurs to train a model like www.remove.bg that automatically removes background from photos? I'm not trying to build a clone but curious what sort of financial investment it takes to make such things.

perennate · 5y ago

Unless you're using unsupervised learning or can find a good dataset, most of the cost will be in labeling the data. I'm not too familiar with background removal, there may be some self-supervised/contrastive learning approaches, but generally they don't work as well as supervised learning. Even for a week of training, the compute cost is only $200.

Edit: maybe you can get some decent results just with COCO segmentation labels: https://towardsdatascience.com/background-removal-with-deep-...
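For reference, the "week of training" figure above is in the same ballpark as a week of single-V100 time at the advertised rate. A quick sketch of the arithmetic, assuming the $0.99/hr price from the original post and one GPU running non-stop (the function name is only illustrative):

```python
HOURLY_RATE_USD = 0.99  # per V100, taken from the original post


def compute_cost(hours: float, gpus: int = 1, rate: float = HOURLY_RATE_USD) -> float:
    """Compute-only cost in USD; storage is billed separately."""
    return round(hours * gpus * rate, 2)


# One V100 running around the clock for 7 days.
week = compute_cost(hours=7 * 24)
print(week)  # 166.32
```

Labeling cost, as noted above, is not captured here and may well dominate.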


benohear · 5y ago

This is all very cool, and FWIW I also love the retro design.

Sorry if I missed it, but one thing I couldn't find on the website is what legal entity is behind the service. It would be important for me and the organisations I work with to know who we are trusting with our data.

It might even be a legal obligation to have that info on the site depending on which country you are located in (Germany for sure, not sure about others).

ilmoi (OP) · 5y ago

Hey - for sure. Because I'm the only team member, the entity behind the business is registered as a "sole trader". If you email me on hi@gpu.land I'm happy to share the details.

ashish01 · 5y ago

Nice! I really like the top up feature. Really gives me peace of mind that I will not blow up my budget accidentally. I have been using https://datacrunch.io/ which has almost the same feature set.

typon · 5y ago

Hey ilmoi,

Do you support multi-node training? Particularly, can I reserve for example a 64-GPU instance via 8 nodes and perform distributed training?

ilmoi (OP) · 5y ago

You can! All machines within a single account are inside a private VLAN, which means they all talk to each other (out of the box, no setup required), but nobody else on gpu.land sees them.

See the "Are instances within my account connected?" item in the FAQ - https://gpu.land/faq

If you're going to need 64 GPUs I'll have to increase your account limit (currently 16 GPUs per account). Email me at hi@gpu.land
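A minimal sketch of what launching the 8-node case could look like, assuming PyTorch's `torchrun` launcher, 8 GPUs per node, and a training script named `train.py`. The private address and port below are placeholders, not gpu.land specifics:

```python
def torchrun_commands(master_addr: str, nnodes: int = 8, gpus_per_node: int = 8,
                      master_port: int = 29500, script: str = "train.py") -> list[str]:
    """Build one torchrun command per node; command i is meant to be run on node i."""
    return [
        f"torchrun --nnodes={nnodes} --nproc_per_node={gpus_per_node} "
        f"--node_rank={rank} --master_addr={master_addr} "
        f"--master_port={master_port} {script}"
        for rank in range(nnodes)
    ]


# 10.0.0.1 is a hypothetical private VLAN address for node 0 (the rendezvous host).
cmds = torchrun_commands("10.0.0.1")
for cmd in cmds:
    print(cmd)
```

Every node gets the same command apart from `--node_rank`, which is why a shared private network with no extra setup is convenient here.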

liuliu · 5y ago

Thank you! This looks indeed cheap.

Can you share more stories on data persistence / checkpointing? If I have a job that requires 8 V100s for 3 days, what types of reliability am I looking at?

joecot · 5y ago

I'm not very familiar with machine learning setups, but in general with hosts you trade SLA for pricing. You trade the 99.999% uptime and data consistency of hyperscaled hosts for cheaper pricing at smaller hosts. Assuming you can back up your data at checkpoints, consider running it on a setup like this for very cheap, and then back it up every hour/day to a consistently reliable data storage. AWS EC2 is very expensive for computing, but S3 is relatively cheap for storage. Backblaze B2 storage is even cheaper, but with less guaranteed reliability than S3. But the odds of both this going down on you and Backblaze failing at the same time are pretty low.

I run a credit card processor on AWS, my important personal websites on Linode, and my fun time websites and video conferencing on a small Chicago host called Genesis Hosting with a website out of the 90s and dirt cheap pricing (but excellent support). Match your price and SLA with how much pain it's going to cause you if it goes down, and don't pay extra to put all your stuff on the same 5 9s host if you don't really have to.

ilmoi (OP) · 5y ago

Exactly the same as you would be had you been running an EC2 instance at AWS. The machines are hosted, maintained and managed in exactly the same way in a whitelabel DC.


mraza007 · 5y ago

Hey, I loved the product you made. As a student I’m always concerned about spending on cloud services, as they can be pricey. I’m just curious: I have used Google Colab in the past and it charges $10/month, so I wanted to know how this is different compared to Google Colab.

I would love to hear your thoughts on this

ilmoi (OP) · 5y ago

Hey!

So the way I would describe your progression as a student (at least from my experience):

- Traditional ML -> probably can run on your laptop

- Simple DL -> colab is great. Cheapest there is.

- SOTA DL -> you're probably looking at training times well into 10s of hours / days, so you'll need something that can last longer than 12h of colab time. Plus at this point you're probably sophisticated enough that you want to set up your instance once and start/stop it rather than going through setup with colab every time. That's where https://gpu.land/ fits in.

So tl;dr; - absolutely use colab first (it's cheapest), but when you outgrow it, consider gpu.land.


teruakohatu · 5y ago

Are there any persistent storage options? I looked through the FAQ and comparisons and couldn't find any mention of it.

ilmoi (OP) · 5y ago

When you stop the instance (instead of killing it), you persist data. It costs $0.02/GB/mo, so $4/mo for a 200GB hard drive.

I will add to FAQ, thanks for pointing out.
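The storage arithmetic can be sketched as below, assuming the $0.02/GB/month and $0.99/hr figures quoted in this thread. The 40 GPU-hours in the second example is a made-up usage level for illustration:

```python
STORAGE_RATE_USD_PER_GB_MONTH = 0.02  # from the reply above
HOURLY_RATE_USD = 0.99                # per V100, from the original post


def monthly_storage_cost(disk_gb: float) -> float:
    """Storage is billed whether the instance is running or stopped."""
    return round(disk_gb * STORAGE_RATE_USD_PER_GB_MONTH, 2)


def monthly_bill(gpu_hours: float, disk_gb: float) -> float:
    """GPU time is only billed while the instance runs; storage is added on top."""
    return round(gpu_hours * HOURLY_RATE_USD + monthly_storage_cost(disk_gb), 2)


print(monthly_storage_cost(200))                # 4.0
print(monthly_bill(gpu_hours=40, disk_gb=200))  # 43.6
```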


ogiberstein · 5y ago

Wow, sounds really useful! Will check it out

taf2 · 5y ago

Because a lot of the data could be sensitive, do you sign BAAs for HIPAA?

ilmoi (OP) · 5y ago

Sorry I missed this. Yes, more than happy to. Email me at hi@gpu.land.

chovybizzass · 5y ago

was going to mine but you say not to.

ilmoi (OP) · 5y ago

thanks for reading the FAQ! Most people don't seem to bother:)


37ef_ced3 · 5y ago · 6 in thread

AVX-512 neural net inference on inexpensive, CPU-only cloud compute instances: https://NN-512.com

An AVX-512 Skylake-X cloud compute instance costs $10 per CPU-core per month at Vultr (https://www.vultr.com/products/cloud-compute/), and you can do about 18 DenseNet121 inferences per CPU-core per second (in series, not batched) using tools like NN-512

GPU cloud compute is almost unbelievably expensive. Even Linode charges $1000 per month, or $1.50 per hour (look at the GPU plans: https://www.linode.com/pricing/#row--compute). It's really hard to keep that GPU saturated, which is what you need to do to get your money's worth

As AVX-512 becomes better supported by Intel and AMD chips, it becomes more attractive as an alternative to expensive GPU instances for workloads with small amounts of inference mixed with other computation
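To put the CPU numbers above on a per-inference basis, here is a rough sketch. It assumes the quoted $10 per core per month and 18 DenseNet121 inferences per core per second, a 30-day month, and a core that is kept fully busy the whole time (an optimistic assumption):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def cost_per_million_inferences(core_cost_per_month: float,
                                inferences_per_sec: float) -> float:
    """USD per one million inferences on a single, fully saturated core."""
    inferences_per_month = inferences_per_sec * SECONDS_PER_MONTH
    return core_cost_per_month / inferences_per_month * 1_000_000


print(round(cost_per_million_inferences(10, 18), 3))  # 0.214
```

At lower utilization the per-inference cost scales up proportionally, which is the same saturation problem described for GPUs.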

rckoepke · 5y ago

What does this have to do with https://gpu.land/ ? I've been very impressed with NN-512 in your past postings but I'm failing to see the direct salience w.r.t the top level post.

Perhaps readers would benefit from an apples-to-apples comparison of V100 to some CPU in a training per dollar metric? Preferably using something like MLPerf. You do mention inference but I think most people looking at https://gpu.land/ will be far more interested in training rather than inference.

I think the most direct competitor to this ShowHN would be https://vast.ai/

ilmoi (OP) · 5y ago

You are correct that vast.ai is pretty close.

The biggest difference is probably security / guaranteed uptime. With vast you're getting what it says on the tin - a machine from a marketplace. Could come from anyone / anywhere. No idea what else is running on it. Ours are hosted in a professional DC, managed and secured as they should be.

If anyone's curious, there's a detailed comparison page with other platforms here - https://gpu.land/versus

perennate · 5y ago

I think Linode/Paperspace/AWS/GCP/etc. are closer to "direct competitors". I would be much more comfortable trusting a single entity that owns its servers than a GPU rental marketplace like vast.ai.

sharemywin · 5y ago

Here's an article about ROBLOX running BERT on cpus versus GPUs.

https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-...

They're not training so I think that's the difference.

37ef_ced3 · 5y ago

The suggestion is that you may not want to use a GPU if you can use AVX-512 CPUs for your machine learning workload. Reducing the cost of the GPU helps, but it's still relatively expensive

dindresto · 5y ago

"AVX512 instructions cannot be sustained on a high clock speed for very long -- the processor downclocks right away if you send too many of them within a short time frame, probably for thermal reasons. This will probably be much more true for laptop processors." https://github.com/pytorch/pytorch/issues/4825#issuecomment-...

How much does this apply to the Xeon CPUs found in servers?

DougN7 · 5y ago · 5 in thread

I know nothing about crypto mining, but why wouldn’t this be allowed? How would you stop it?

jsnell · 5y ago

Mining cryptocurrencies is a good way to launder stolen credit card numbers to cash. It is scalable (unlike most digital goods) and automatable (unlike most physical goods). It being unprofitable will not dissuade anyone from this use case: they are not paying with their own money to start with.

So it is best to forbid it in the TOS and just autokill the miners, rather than wait for the chargebacks. Especially if the compute is being sold at cost, since there is no buffer of profits to balance out the abuse.

ilmoi (OP) · 5y ago

Wow thanks for posting this! I defo haven't thought about this use case. So interesting.

So we actually explicitly prohibit mining in the T&Cs, it's just that in the FAQ I wanted to be a bit more human and dissuade people.

Also, there's quite a few protections built-in at the network layer to make sure mining isn't possible. Ports, ips, dns, even DPI. I learnt a lot about the early days of bitcoin and mining protocols when I was building gpu.land:)

But again, thanks for sharing this. Really good to know.

ilmoi (OP) · 5y ago

You'll be making around $6/d, while losing about $24/d paying for the service.
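A quick sketch of that margin, assuming the roughly $6/day mining revenue quoted above and the $0.99/hr rate from the original post (24 hours of which comes to roughly $24):

```python
def daily_mining_margin(revenue_per_day: float = 6.0, hourly_rate: float = 0.99) -> float:
    """Daily profit (negative means a loss) from mining on one rented V100."""
    daily_cost = hourly_rate * 24
    return round(revenue_per_day - daily_cost, 2)


print(daily_mining_margin())  # -17.76
```

This only covers someone paying with their own money; it does not change the stolen-card scenario raised earlier in the thread.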

ska · 5y ago

See jsnell's point, which is worth considering and mitigating if your (CC) processor isn't able to catch it.

imhoguy · 5y ago

Wouldn't pay off really, check the FAQ.

ffpip · 5y ago · 3 in thread

FYI, the site shows nothing until the scripts from stripe.com load. I know that these breakages might not happen to all users, but it shouldn't be completely broken until the browser connects to Stripe.

https://images2.imgbox.com/b2/42/n87NuBRT_o.png

ilmoi (OP) · 5y ago

Do you think it's the scripts from stripe that are blocking the view? I would guess it's auth0.

This was my first front-end project so any feedback 100% welcome.

ffpip · 5y ago

Yes. When stripe is blocked, the page is blank, but when the js from stripe.com is allowed, the page loads.

jandrese · 5y ago

FWIW when I went to the site I had to whitelist scripts from js.stripe.com to get anything more than a black screen.


usmannk · 5y ago · 2 in thread

This is seriously cool. I want to mention that on AWS you can reliably get V100s for $0.918/hr (I do this all the time) by using spot instances. The spot price hasn't changed by even 0.01 for at least a year, so you're extremely unlikely to get an unexpected shutdown. Also, how did you manage to get V100s? What was that like?

ilmoi (OP) · 5y ago

Yep you're correct that V100s are available as spot instances. If you're willing to write some extra code for saving down weights you could easily go with them.

Actually, I'm renting V100s. Got lucky to know the right person at the right time:)
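The "extra code for saving down weights" mentioned above usually amounts to a checkpoint-and-resume loop. A minimal, framework-free sketch follows; it uses a JSON file as a stand-in for real model weights (in practice something like `torch.save` would take its place), and the function names are illustrative:

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a kill mid-write keeps the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, save_every: int = 100) -> dict:
    state = load_checkpoint(path)  # resume where the previous run stopped
    for step in range(state["step"], total_steps):
        # ... one real training step would go here ...
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the instance is interrupted, rerunning `train` with the same path picks up from the last saved step, so at most `save_every` steps of work are lost.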

usmannk · 5y ago

Ah gotcha, did you have to move them to your DC and have them racked?

iujjkfjdkkdkf · 5y ago · 2 in thread

One more comment / suggestion: I watched the video and saw there are lots of pre-configured environments. I see this a lot, but to be honest this is not too helpful for me because I'm not doing development in your environment, I'm writing and checking the code somewhere else and then training on the GPUs.

For the same reason, I'm only getting value out of the GPUs when I am ready to train. So I would be much happier if I could push a Docker image, or maybe even a conda environment spec, along with my code, attach data storage, and run e.g. train.py (or more likely a shell script that calls it) to completion and then immediately release the GPUs. Everything else is just the overhead of trying to get the environment right as quickly as possible, run my script, and then shut down the instance as soon as it's done. It would be awesome to have this kind of functionality, or is there a way to do that that I missed?

ilmoi (OP) · 5y ago

Both are valid points. And no, there's currently no programmatic access to machines - but that's high on the roadmap.

Regarding environments, it's funny, when I was starting out I would kill for a pre-configured instance because I wanted to focus on modeling and didn't care if package X was version x.x.y or x.x.z. But as you grow as an ML engineer and develop your own toolkit these things start to matter.

So when creating a machine on gpu.land you have the choice of going pre-configured or just having a clean Ubuntu image. The former is meant for newcomers while the latter is for pros. That was my thinking!

iujjkfjdkkdkf · 5y ago

Thanks a lot for answering my questions. I am definitely going to give this a try. Right now I see this as a great option for an always on dev instance that I can work in all day with GPU access and not worry about breaking the bank.

I saw someone ask on reddit about separating storage from the GPU instance, so that one could do data transfer and other setup without reserving a GPU. I want to echo how important this is. Another case is where I might want to use only one GPU and then scale up to 8 for training, or I might want to have N GPU instances attached to the same storage to run jobs in parallel. There are lots of other examples, but overall it would add much flexibility.

On the other hand, it might encourage people to be more "peaky" in their use, which could be a challenge for you. From what I understand, you are much better off if I want 1 GPU for 70 hrs vs. 70 GPUs for an hour, in which case I understand how you might want to encourage steady use.
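The steady-versus-peaky point can be made concrete with a small sketch, assuming the $0.99/hr rate from the original post. Both profiles bill the same amount, but the hardware that has to sit ready differs by a factor of 70:

```python
HOURLY_RATE_USD = 0.99  # per V100, from the original post


def usage_profile(gpus: int, hours: float) -> dict:
    """Summarize a job by total GPU-hours, peak GPUs held, and the resulting bill."""
    return {
        "gpu_hours": gpus * hours,
        "peak_gpus": gpus,
        "bill_usd": round(gpus * hours * HOURLY_RATE_USD, 2),
    }


steady = usage_profile(gpus=1, hours=70)
bursty = usage_profile(gpus=70, hours=1)
print(steady)
print(bursty)
```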

dindresto · 5y ago · 2 in thread

How does the Tesla V100 compare to the Tesla P100, which Scaleway offers at a similar price of €1/hour?

https://www.scaleway.com/en/gpu-instances/

ilmoi (OP) · 5y ago

Tesla P100 is the previous generation. P100 > V100 > A100. You can see detailed benchmarking here - http://ai-benchmark.com/ranking_deeplearning.html

knrz · 5y ago

For others who may have been confused, like me, in order of GPU power:

(Least) P100 < V100 < A100 (Most)


Reubend · 5y ago · 2 in thread

Hey there, just wanted to share that for me (using Opera browser with ad block) scrolling is broken on your landing page.

ilmoi (OP) · 5y ago

Hey - I'm just trying to replicate the issue. Could you share which adblock you're using? I downloaded Opera but the browser seems to open gpu.land fine.

ilmoi (OP) · 5y ago

Woah, that's not good. Will look into asap (today). Thanks for the feedback!

paul_milovanov · 5y ago · 2 in thread

this is great, except apparently there's no support for containers? is that on the roadmap?

thanks!

ilmoi (OP) · 5y ago

Not externally (in the same way that vast.ai does it), but you could always boot the instance -> ssh in -> manually run nvidia-docker.

Added to feature requests!
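A rough sketch of the boot -> ssh -> run-a-container flow, built as a command that could be passed to `subprocess.run`. It assumes Docker's `--gpus all` flag (from the NVIDIA Container Toolkit) rather than the older `nvidia-docker` wrapper named above; the host, user, project directory, and image are all placeholders:

```python
import shlex


def remote_docker_command(host: str, image: str, script: str = "train.py",
                          user: str = "ubuntu",
                          workdir: str = "/home/ubuntu/project") -> list[str]:
    """Build an ssh command that runs a training script in a GPU-enabled container."""
    docker = [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{workdir}:/workspace",  # mount the code from the instance
        "-w", "/workspace",             # run from the mounted directory
        image, "python", script,
    ]
    return ["ssh", f"{user}@{host}", shlex.join(docker)]


# 203.0.113.5 is a documentation-only address; substitute the instance's real IP.
cmd = remote_docker_command("203.0.113.5", "pytorch/pytorch:latest")
print(cmd)
```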

paul_milovanov · 5y ago

thanks!

iujjkfjdkkdkf · 5y ago · 1 in thread

Sorry if I missed it in the FAQ: do you have finite capacity, or do you have an agreement with the data center where you can bring on more GPUs with demand? My worry would be to wake up one day and find that all GPUs are in use.

And (this may sound naive) I work with other companies' data, I have a responsibility to them to keep it safe, do you have any concerns with this being used for "professional" applications or are you targeting research / hobby?

To be clear, I am dealing with things that my clients are comfortable with me working with on mainstream cloud providers, not state secrets or data with legislative requirements.

ilmoi (OP) · 5y ago

Great questions.

1) There's a finite capacity in the high 10s of GPUs. Currently the service is utilized at <10% capacity, so there's a lot of room to grow before we run out. If that was to happen (or even if we get to 50%), I could go and request more and (hopefully) expand the capacity quickly.

2) When building gpu.land I specifically wanted to make it safe for users to upload / store sensitive data. Of course the service is more geared towards hobbyists / researchers, just because that's an easier market to reach for a solo dev - but there is nothing at the technical level that sacrifices data privacy. For example, data on instances is encrypted both at rest and in transit within the DC + you and only you have SSH keys to your instance and nobody else can access it. Check out our security & privacy section in our FAQ - https://gpu.land/faq

DIVx0 · 5y ago · 1 in thread

Could this be used for bespoke cloud gaming? I was considering setting up a parsec host with paperspace GPU. I want more control over the game client than geforce now or stadia provides and a better GPU and availability than Shadow.tech currently offers.

*edit: I see that only Linux hosts are available at the moment so that kills my use case but I'll keep the question up for grins and giggles

ilmoi (OP) · 5y ago

Not really. Firstly it will be way too expensive for you (cloud gaming is like $10/mo, this is $10/10h). Secondly the GPUs installed (Tesla V100s) are not designed for gaming - rather for ML & scientific computing.

You should check out some of the services people mention in https://www.reddit.com/r/cloudygamer/

perennate · 5y ago · 1 in thread

The website design put me off but the FAQ makes it clear that it is quite professional. They have information about physical security, GDPR compliance, and VLAN. Wish there was some information about storage, e.g. if it's local disk or distributed block storage, and how many replicas. Very nice that unused credit is refundable.

ilmoi (OP) · 5y ago

Great point re storage, I'll be sure to add to the FAQ. To answer your question it's SSD block storage.

stuartbman · 5y ago · 1 in thread

This is a very cool project, but can I ask if there's a particular CSS package you used to get this site appearance?

ilmoi (OP) · 5y ago

Thanks! I used Tailwind CSS and built the theme from scratch, but I drew inspiration from:

- https://nostalgic-css.github.io/NES.css/

- https://jdan.github.io/98.css/

mastermojo · 5y ago · 1 in thread

Very cool. Is there a way to persist disks at a cost and attach/unattach Voltas for training?

EDIT: found in the FAQ:

Compute: $0.99/hr / 1x Tesla V100 (running instance only)

Storage: $0.02/GB/month (running and stopped instances)

ilmoi (OP) · 5y ago

Yep - most people stop instances and come back whenever they're ready to train again. No charges for GPUs while the instance is stopped.

If you have, say, a 200GB hard drive you're only paying $4/month for storage.

tehsauce · 5y ago · 1 in thread

For anyone looking for cheap GPUs (quite a bit cheaper than this even), I highly recommend the marketplace on vast.ai. I’ve spent hundreds of dollars there and will spend many more.

ilmoi (OP) · 5y ago

You are right in that it's cheaper, but not that much cheaper if you look at renting V100s. Actually, on the website there's a whole section that compares us to them - https://gpu.land/versus

tombh · 5y ago · 1 in thread

This seems very similarly priced to AWS Spot Instances. But you're guaranteeing uptime?

perennate · 5y ago

In the FAQ it says:

> Is my instance guaranteed?

> Yes, unless you run out of credit. To prevent that from happening be sure to setup automatic top ups.

wrongdonf · 5y ago · 1 in thread

I love a dutch programmer

ilmoi (OP) · 5y ago

hah I am European indeed but not Dutch:) But I have spent a year living in the Netherlands.

maremmano · 5y ago

I came for the GPU, I stayed for the vintage CSS

bitobserver · 5y ago

> At 1/3 the Cost of AWS/GCP

For on-demand this is true, but spot GPU instances can have competitive pricing.

At this moment, a p3.2xlarge instance (1 Tesla V100) is 92 cents/hr on the AWS spot market in us-east-1. The p3.8xlarge (4 GPUs) is $3.62/hr.
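Normalizing those spot prices per GPU makes the comparison with the $0.99/hr rate easier to read. A small sketch, using only the figures quoted in this thread (spot prices fluctuate, so treat these as a snapshot):

```python
GPU_LAND_RATE_USD = 0.99  # per V100 per hour, from the original post


def per_gpu_hourly(instance_price: float, gpus: int) -> float:
    """Hourly price per GPU for a multi-GPU instance."""
    return round(instance_price / gpus, 3)


p3_2xlarge = per_gpu_hourly(0.92, gpus=1)
p3_8xlarge = per_gpu_hourly(3.62, gpus=4)
print(p3_2xlarge, p3_8xlarge)  # 0.92 0.905
print(p3_8xlarge < GPU_LAND_RATE_USD)  # True
```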

lostmsu · 5y ago

I wonder if this project will stay sustainable when V100 prices in other clouds drop below $1/h. This is not unthinkable considering A100 becoming more widespread.

akrymski · 5y ago

Congrats on the launch! Cheaper GPUs are always welcome ;-)

bayesianasian · 5y ago

Is there an option to request more persistent storage? From the website, I inferred that each 8x GPU machine comes with 2TB SSD storage? Would I be able to request another 2TB without requesting more GPU instances?

Also, I am assuming that requesting more GPU instances allows me to access more persistent storage?

2Gkashmiri · 5y ago

Offtopic, but I've been considering starting a "render farm" business. I can fork out some capital, take a loan and such, but the important thing is that rent and electricity are almost free. So, would it work?

neil1 · 5y ago

This is a pretty cool project

artem_mazur · 5y ago

Looks interesting! Cool!!

samsammurphy · 5y ago

Cooooool!

seefashion · 5y ago

looks great
