Note: This page isn't rendering properly on iPhone
edit: seems the JS is written in the straightforward, idiomatic JS way, without employing speed hacks or typed arrays. So I guess that explains some of it.
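To illustrate the kind of "speed hack" meant here: plain JS arrays can hold anything, so the engine may box values and deoptimize, while a Float32Array is a dense, homogeneous buffer the JIT can keep unboxed. This is a generic sketch, not jsNet's actual code:

```javascript
// Generic illustration (not jsNet code): the same dot-product loop run
// over a plain Array and over a Float32Array. The typed-array version
// gives the JIT a dense numeric buffer, which is usually faster for
// numeric kernels like the forward/backward passes profiled above.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

const plainA = [1, 2, 3, 4];
const plainB = [5, 6, 7, 8];

// Same data as typed arrays; the loop body is identical.
const typedA = Float32Array.from(plainA);
const typedB = Float32Array.from(plainB);

console.log(dot(plainA, plainB)); // 70
console.log(dot(typedA, typedB)); // 70
```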
Ran it in the Chrome profiler; most of the time was spent in these forward/backward functions: https://github.com/DanRuta/jsNet/blob/master/dev/js/ConvLaye...
Once you've decided that you do care about those other things, then getting the best performance you can is great and deeplearnjs is very useful.
But if what you care about most is performance, then using JS is not the way to go. I'm guessing deeplearnjs is an order of magnitude slower (at least) for training a modern convnet relative to cudnn or MKL. And I don't think it's currently possible to use multiple GPUs? Although you could imagine some crazy distributed asynchronous training running in browsers all over the world.
This is NOT a fair and honest comparison.
The WebAssembly implementation pegs 100% of one CPU core (as monitored in htop) on my system until it is completed.
The JS version sits at 66.2%-66.8% of one core when the tab is focused, and 19-26% when it is not.
The JS version does not use Web Workers. I can, however, see two references to setTimeout(). This leads me to assume that the JS version is being slowed down so that the UI does not lock up.
Completely understandable, but patently dishonest, as there is no mention of this fact on the webpage.
The JS version should be reimplemented so it can run at 100% speed.
I'm focusing more on a WebGL version, but once that's done, it should be a fairer comparison.
I'm currently most focused on designing and implementing a WebGL version (a partially working FC forward shader, so far), between uni assignments, and trying to see if I can get it working nicely together with WebAssembly (I need to figure out the best way to create the contexts with an off-screen canvas).
The JS version uses setTimeout to avoid locking up the browser. That's an old approach I'm about to replace with Web Workers, optionally collecting error data in an array so charts can be displayed at the end of training instead of during it. That should come out in version 3.3, unless the GPU stuff comes out first, in v4.0.
I didn't expect the link would get posted somewhere, so the library versions used were pretty old, haha, but I'll update them now. I've also added a note about the setTimeout thing, which, again, will be removed soon.
The repo is here, if anyone was interested: https://github.com/DanRuta/jsNet