C++ Library for Linear Algebra on Supercomputers

(libelemental.org)

39 points · poulson · 12y ago · 38 comments

Elemental is an open-source C++ library meant for performing dense linear algebra on tens of thousands of processes (via the Message Passing Interface). I have been the primary developer for the past four years, but a community is starting to emerge. In addition to the website, the project is also hosted on github: http://github.com/elemental/Elemental

poulson (OP) · 12y ago · 5 in thread

I'm happy to answer any questions about the pros and cons of the library (and to remain quite objective). The main feature that the library currently lacks that is available in ScaLAPACK is a parallel Schur decomposition, but one is in the works.

I'm also happy to answer general questions about parallel linear algebra and to point people to the appropriate literature.

dmlorenzetti · 12y ago

Do you see this getting put into the ACTS toolkit (http://acts.nersc.gov/tools.html)? I see some on your project team are at Argonne, which would seem like a natural fit.

jedbrown · 12y ago

Elemental is available via PETSc (http://mcs.anl.gov/petsc), which is part of ACTS. We might be able to talk ACTS into giving Jack a session at the tutorial session next year. (Disclaimer: I'm a PETSc developer and helped with the Elemental interface.)

poulson (OP) · 12y ago

Honestly, I doubt it. Getting added to such a list has both political and merit-based components.

balsam · 12y ago

What is the range of matrix sizes that would enjoy a speedup over ScaLAPACK? Let's assume a cluster of N nodes with 8 or 16 cores each, so the answer would be a function of N.

poulson (OP) · 12y ago

There is unfortunately no good answer to this question, as it would completely depend upon the algorithm, MPI implementation, communication network, vendor-tuned BLAS library, etc., etc. However, I can safely say that Elemental is always competitive with ScaLAPACK in performance. The Elemental papers on my website (pick between the preprint or journal article), http://www.stanford.edu/~poulson, go into this in more detail, but keep in mind that the article was written a couple of years ago, when it was still a fledgling library, and so the tone is a bit more aggressive than necessary.

With that said, the main advantage of the library is its high level of software engineering, which tends to encourage the rapid development of new features. It is actively used within a large number of research projects.

n00b101 · 12y ago · 3 in thread

Without GPU support I don't see how this is a good idea, if we are talking about HPC applications. Based on my understanding, a large number of dense matrix operations can be significantly accelerated on GPUs.

poulson (OP) · 12y ago

Accelerator support is something that happens within the node and is in some sense orthogonal to the high-level design. I recently received funding to add such support to the library (and I hope to add it within the next year).

Also, not all supercomputers have accelerators (consider Blue Gene/Q), and often simply having access to more memory is more of a concern than solving the problem at the absolute fastest rate.
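The "orthogonal to the high-level design" point can be illustrated with a minimal sketch, under stated assumptions. This is not Elemental's code or API: `LocalGemv`, `cpu_gemv`, and `distributed_gemv` are hypothetical names, the "processes" are simulated in a single loop, and the 1D cyclic column layout is chosen only for brevity. The idea shown is that the distribution logic only ever calls a node-local kernel, so a GPU-backed kernel with the same signature could be swapped in without touching the distributed layer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Node-local kernel: y += A * x, with A stored column-major as m x n.
// A GPU-backed implementation could expose the same signature (assumption).
using LocalGemv = std::function<void(std::size_t m, std::size_t n,
                                     const std::vector<double>& a,
                                     const std::vector<double>& x,
                                     std::vector<double>& y)>;

// Plain CPU reference kernel.
void cpu_gemv(std::size_t m, std::size_t n, const std::vector<double>& a,
              const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i)
            y[i] += a[i + j * m] * x[j];
}

// Simulated distributed layer: column j of A lives on "process" j % p.
// Each process applies the local kernel to its own columns; the partial
// results are then summed, which a real MPI code would do with a
// reduction such as MPI_Allreduce.
std::vector<double> distributed_gemv(std::size_t p, std::size_t m, std::size_t n,
                                     const std::vector<double>& a,
                                     const std::vector<double>& x,
                                     const LocalGemv& kernel) {
    assert(p > 0 && a.size() == m * n && x.size() == n);
    std::vector<double> y(m, 0.0);
    for (std::size_t rank = 0; rank < p; ++rank) {
        // Copy this rank's columns into contiguous local storage.
        std::vector<double> a_loc, x_loc;
        for (std::size_t j = rank; j < n; j += p) {
            for (std::size_t i = 0; i < m; ++i) a_loc.push_back(a[i + j * m]);
            x_loc.push_back(x[j]);
        }
        std::vector<double> partial(m, 0.0);
        kernel(m, x_loc.size(), a_loc, x_loc, partial);
        // Stand-in for the cross-process reduction.
        for (std::size_t i = 0; i < m; ++i) y[i] += partial[i];
    }
    return y;
}
```

Calling `distributed_gemv(2, m, n, a, x, cpu_gemv)` gives the same result as a single `cpu_gemv` call on the whole matrix; only the `kernel` argument would change for an accelerator.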

dmlorenzetti · 12y ago

This targets distributed-memory applications, so it's not clear why failing to support GPUs with the first iteration makes it a bad idea.

I imagine the authors could make a single node of the cluster use a GPU, if they wanted.

n00b101 · 12y ago

Maybe I should have just said that I think GPU support would be a great addition.

Distributed memory and GPUs are not mutually exclusive. Multi-GPU clusters are extremely common. In fact, the latest devices (e.g. Tesla K10) have multiple GPU processors packaged in a single card, so it is necessary for applications to target multiple GPUs. There is explicit support for distributed-memory applications on GPUs through the "GPUDirect" technology, which allows peer-to-peer DMA and RDMA transfers between GPUs.

Given that reports of 30-50x GPU performance gains (versus CPUs) are common, the issue is important because it means solving a problem with (say) $10,000 of kit instead of $500,000.

temp453463343 · 12y ago · 2 in thread

Great.. another C++ linear algebra library...

  boost::uBLAS
  eigen
  armadillo
  a dozen others...

Why not contribute to an existing project?

The recurring bifurcation of talent and resources in the open source community is really disheartening. Can't we focus on one or two libraries and make them actually good? Or at least fork off of something that already exists and add your own features. I look at benchmarks of the existing tools and one library will do one operation very efficiently, while another will work well with something else. Often the differences in speed are huge (more than a factor of 10). So I end up having to flip a coin in choosing which library to use.

poulson (OP) · 12y ago

With all due respect, did you even read the title of the post? This is a distributed-memory library, unlike all of the ones you just mentioned. This is a fundamental difference in design and capability. The only related libraries are ScaLAPACK, PLAPACK, and DPLASMA.
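For readers unfamiliar with the distinction, here is a minimal sketch of what "distributed-memory" implies for the data layout, assuming an element-wise cyclic distribution over a 2D process grid (the layout described in the Elemental papers, as opposed to ScaLAPACK's block-cyclic one). The names `Grid`, `owner_rank`, `local_length`, and `entries_per_rank` are hypothetical and not Elemental's API, and the column-major rank numbering with zero alignment is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not Elemental's API.
struct Grid {
    std::size_t rows;  // process-grid height r
    std::size_t cols;  // process-grid width c
};

// Rank (numbered column-major within the grid, by assumption) of the process
// that owns global entry (i, j) under an element-wise cyclic distribution.
std::size_t owner_rank(const Grid& g, std::size_t i, std::size_t j) {
    assert(g.rows > 0 && g.cols > 0);
    return (i % g.rows) + (j % g.cols) * g.rows;
}

// Number of indices in [0, n) that land on grid coordinate `coord` when
// indices are dealt out cyclically over `stride` processes.
std::size_t local_length(std::size_t n, std::size_t coord, std::size_t stride) {
    assert(stride > 0 && coord < stride);
    return n > coord ? (n - coord - 1) / stride + 1 : 0;
}

// Count how many entries of an m x n matrix each rank stores.
std::vector<std::size_t> entries_per_rank(const Grid& g, std::size_t m,
                                          std::size_t n) {
    std::vector<std::size_t> counts(g.rows * g.cols, 0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i)
            ++counts[owner_rank(g, i, j)];
    return counts;
}
```

No single process ever holds the whole matrix, which is why every algorithm has to be written around communication between processes. A shared-memory library such as Eigen or Armadillo assumes the opposite: one address space containing all entries.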

temp453463343 · 12y ago

And why exactly can't that be made part of an existing library?

ssawyer06 · 12y ago · 2 in thread

Nice work, this is important stuff. Next step... sparse linear algebra!

poulson (OP) · 12y ago

Thanks! I actually work on a lot of fast/sparse linear algebra. For example, see Clique: http://github.com/poulson/Clique

ssawyer06 · 12y ago

Interesting, a direct sparse solver (for structured sparse matrices?). The name "clique" implies graph theory, so I was expecting to see distributed iterative SVD. I've yet to see a good distributed SVD for huge real-world/power-law graphs.
