Octave uses UMFPACK to solve sparse systems of linear equations. Switching to KLU with AMD preordering might improve performance.
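As a rough illustration of why a fill-reducing preordering matters for sparse factorization, here is a toy sketch in Python. It uses a plain (exact) minimum-degree ordering on a small symmetric "arrow" sparsity pattern. It is not the AMD algorithm itself and does not call KLU or UMFPACK; the names `arrow_pattern`, `fill_in` and `minimum_degree_order` are made up for this example.

```python
def arrow_pattern(n):
    """Adjacency of an n x n symmetric 'arrow' matrix: node 0 touches every other node."""
    graph = {0: set(range(1, n))}
    for i in range(1, n):
        graph[i] = {0}
    return graph


def fill_in(adjacency, order):
    """Count the fill entries created by symbolic elimination in the given order."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v connects all of its remaining neighbours pairwise.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


def minimum_degree_order(adjacency):
    """Greedy ordering: always eliminate a node of smallest current degree."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while graph:
        # Ties are broken by node index to keep the result deterministic.
        v = min(graph, key=lambda x: (len(graph[x]), x))
        order.append(v)
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    graph[a].add(b)
    return order


pattern = arrow_pattern(6)
natural = list(range(6))
reordered = minimum_degree_order(pattern)

print(fill_in(pattern, natural))    # 10: eliminating the hub first fills the whole block
print(reordered)                    # [1, 2, 3, 4, 0, 5]
print(fill_in(pattern, reordered))  # 0: eliminating the hub late creates no fill
```

With the natural ordering the factor becomes completely dense, while the reordered elimination keeps the original sparsity. Real solvers use approximate variants such as AMD because computing exact degrees is too expensive on large matrices.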
I would like to know how best to compile BLAS and LAPACK to WebAssembly. Traditionally, implementations tuned for a particular machine architecture are used to extract maximum performance. WebAssembly, however, targets a stack-based virtual machine rather than specific hardware. At present, I use LAPACK v3.4.2, converted to C with f2c and then compiled to WebAssembly. It would be interesting to benchmark this against other implementations and to compare the results across browsers.