I had a look at Spark, but its linear algebra packages seemed too limited (I guess abstraction comes at a cost). I can see that Spark would be nice if it does what you need out of the box.
Heard good things about Scala; is it straightforward to get a process on a remote machine to execute code?
Did you look at MLlib and/or just using Breeze directly? There's a bit of awkwardness in the initial setup of the cluster (mainly just having LAPACK installed on all nodes, see https://spark.apache.org/docs/1.1.0/mllib-guide.html ). Spark itself is essentially just sugar to let you write a map/reduce in natural Scala style and have it distributed across a cluster - it'll only work if you can factor your algorithm in a way that fits into that paradigm. (I've heard arguments that any distributable algorithm can be factored that way if you're clever enough, but I'm not sure I believe them.)
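To show what I mean by "natural Scala style", here's the classic word count written on plain local collections; a minimal sketch, with the equivalent Spark RDD pipeline in a comment. The `sc` SparkContext and input path in that comment are assumptions, since running the RDD version needs a Spark runtime:

```scala
// Word count in plain Scala: the same map/shuffle/reduce shape Spark distributes.
val lines = Seq("to be or not to be", "that is the question")

val counts: Map[String, Int] = lines
  .flatMap(_.split("\\s+"))                   // map: emit one record per word
  .groupBy(identity)                          // shuffle: gather records by key
  .map { case (word, ws) => (word, ws.size) } // reduce: count per key

// counts("to") == 2, counts("question") == 1

// On a Spark RDD the pipeline reads almost identically
// (hypothetical: assumes a SparkContext `sc` and an input file):
// sc.textFile("input.txt")
//   .flatMap(_.split("\\s+"))
//   .map(word => (word, 1))
//   .reduceByKey(_ + _)
```

The catch is exactly the factoring constraint above: if your algorithm doesn't decompose into per-record maps and per-key reduces, Spark's sugar doesn't help you.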
> Heard good things about Scala; is it straightforward to get a process on a remote machine to execute code?
Honestly, no. I love the language, but Spark is very much what I think of (perhaps unfairly) as typical scientific software. Spark clusters are finicky - they're cobbled together from a few unrelated projects (especially for cases where you need LAPACK as well), and it shows, especially when it comes to updating them. There are a few organizations like Cloudera (I think there was an open-source effort under the Apache umbrella somewhere too) that try to provide a working package, and various efforts with Puppet/Chef/etc. to automate the process of putting a cluster together. It's certainly a lot better than it was even a few years ago, but a cluster still needs at least a little bit of dedicated sysadmin time (or, at a bare minimum, a programmer with a bit of *nix admin experience who's willing to get their hands dirty - that was me at times) to keep it running reliably.
If you're part of an institution that already maintains a Spark cluster - or maintains an ordinary Hadoop cluster and you're friendly enough with the sysadmins to suggest they install it - it's wonderful. If you're having to do it all from scratch, I won't lie: it's going to involve a lot of fiddling and may well not be worth it for your problem.
I have yet to come across a linear algebra library for any other high-level language that provides the depth of integration available in the Julia base library. Want all eigenvalues of a symmetric tridiagonal 10x10 matrix between 1.0 and 12.0? Simply call T = SymTridiagonal(randn(10), randn(9)); eigvals(T, 1.0, 12.0). Or if you want to work closer to LAPACK, simply call LAPACK.stein!. I don't see a wrapper in Breeze or SciPy for this function. Want an LU factorization of a matrix of high-precision floats? lufact(big(randn(5,4))). And so on.
Julia may not have everything users want, but its base library really tries to make matrix computations easy and accessible.