We will cover:
Core features of modern JavaScript
Programming with callbacks and promises
Creating objects and classes
Writing HTML and CSS
Creating interactive pages with React
Building data services
Testing
Data visualization
Combining everything to create a three-tier web application
- This isn't data science. It's more like all the stuff "for my data science", such as making a visualization website, etc.
Publishers should just offer a free e-book of said language, and make it a requirement.
That sentence from the book clarifies a lot for me. It is JavaScript for Data Science People. Taken in that context, this is an excellent book, written with empathy for the data science user who is usually making uneasy excursions into JavaScript, hoping and praying they are only temporary, and running back to Python the first time they encounter a Promise or a callback.
Compare this with "Data Wrangling with JavaScript" [1].
[1] https://www.amazon.de/Data-Wrangling-JavaScript-Ashley-Davis...
A better name would be “JS for data scientists”
I’d suggest that there are 3 important primitives for data science: flexible numeric types, fast math/algorithm libraries, and data manipulation being easy.
JavaScript doesn’t really have any of these. Numbers are 64-bit floats only: no integers, no big numbers. There aren’t equivalents to NumPy/pandas/scikit-learn, and the lack of a standard library and of expressiveness in data manipulation makes basic tasks harder.
JavaScript has its uses, but there’s really no reason to force data science to be one of them.
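To make the numeric and data-manipulation points concrete, here is a minimal sketch in plain JavaScript with no libraries. The sample rows and the `meanBy` helper are made up for illustration; this is just one way to hand-roll a grouped mean, not a claim about how any particular library does it.

```javascript
// Number is an IEEE 754 double, so integers are exact only up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER; // 9007199254740991
const collides = maxSafe + 1 === maxSafe + 2; // true: both round to 2^53
const decimalSumIsExact = 0.1 + 0.2 === 0.3; // false: binary rounding error

// Hypothetical sample data: plain objects stand in for data frame rows.
const rows = [
  { species: "a", mass: 2 },
  { species: "b", mass: 10 },
  { species: "a", mass: 4 },
];

// Hand-rolled "group by key, then take the mean" (hypothetical helper).
function meanBy(records, keyField, valueField) {
  const sums = new Map();
  for (const record of records) {
    const entry = sums.get(record[keyField]) ?? { total: 0, count: 0 };
    entry.total += record[valueField];
    entry.count += 1;
    sums.set(record[keyField], entry);
  }
  // Convert the running totals into means, keyed by group.
  return new Map(
    [...sums].map(([key, { total, count }]) => [key, total / count])
  );
}

const meanMass = meanBy(rows, "species", "mass"); // "a" => 3, "b" => 10
```

It works, but it is noticeably more ceremony than a one-line group-by in a data frame library.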
https://github.com/nicolaspanel/numjs
https://www.npmjs.com/package/fast-math
https://smartbear.com/de/blog/2013/four-serious-math-librari...
That's not the problem. The problem is mindshare and network effects. When analyzing why Python is used one way and JS another we're tempted to retroactively rationalize this with something fundamental about the language. There's nothing fundamental about it. It's just happenstance. Python was around longer as a general purpose script, and it filled that niche. JS is relatively new as a script outside the browser.
https://github.com/nicolaspanel/numjs/graphs/contributors
I sincerely believe it is possible for JavaScript to be a viable language ecosystem, but there is dire need for cohesion, collaboration, and longevity. As it stands, there are so many potentially viable projects strewn across the NPM landscape like old, discarded toys.
I'm not aware of an initiative, let alone ethos, in the JS community that comes anywhere close to something like NumFocus.
I'm not suggesting these are the first tools you'd reach for when doing data science in production, but I've found them extremely useful for prototyping, experimenting with algorithms, and visualization. I think it's got to the stage where they should be seriously considered for some types of relatively simple data processing work, due to their ease of deployment.
[0] https://github.com/uwdata/arquero
[1] https://observablehq.com/
* TensorFlow.js, which runs on GPUs: https://www.tensorflow.org/js and
* Danfo.js, which aims to be a pandas equivalent for JS: https://danfo.jsdata.org/
Given the powerful interactive visualisation capabilities available in JS, it's only a matter of time until JS becomes a serious contender IMO.
When you’re writing data science code, the value is in the answer more than the process of getting to that answer. Anything that complicates that gets in the way. This is why things like Pandas are so popular despite having some questionable engineering. Using a library for big number support, having to get that to play nicely with other libraries, it all goes against the aims.
Now for data engineering it’s very different. I wouldn’t choose JS myself, but it’s a much more reasonable choice. For engineering the process by which you get the answer matters far more - is it scalable, testable, repeatable, etc. Having to use a library for big number support is fine.
It’s two very different ways of working and I’m still fairly convinced that JS is not conducive to the former.
Is that generally true for data science type tasks, though, where the "fast" in Python is really NumPy, pandas, etc.?
> or can use bindings just like Python
But there's not really anything like numpy/pandas for it to bind to at the moment, is there? Meaning anything as broad in functionality, fast, mature, etc.
Both of those libraries are for arbitrary precision decimals, not floats.
Node.js's evented architecture is great for multitasking training (and prediction) jobs. I use Node Streams to extract and process data flows from several data sources on my MacBook Air M1, using the new 16-core Neural Engine to train CNN models with excellent results.
Data prediction then runs in a ReactJS app, which gives my users a way to model, transform and visualize data in a browser. Everything is in TypeScript, which reduces cognitive overload for our programmers and produces good end-to-end duck-type coherence and easy, integrated testing.
Now, most data science libs for JavaScript are either in their infancy, proofs of concept, or just abandoned, but TFJS is solid, and if you know what, why and to which extent you're using JS for data science, then it's absolutely fine.
JavaScript does have integers (e.g. `Uint8Array`) and it also has big numbers (e.g. `BigInt`). It's true that there's not yet an equivalent to NumPy/pandas/scikit-learn, but POCs show that it will be possible to create such a thing and that we will be able to use the WebGPU API to access higher performance than is available using Python [1].
I'm not saying that it will definitely happen, but why not?
[1] http://benschmidt.org/post/2020-01-15/2020-01-15-webgpu/
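For anyone who hasn't used them, the integer and big-number support mentioned above can be sketched like this. It only uses built-ins, and the variable names are arbitrary.

```javascript
// Typed arrays hold fixed-width integers; out-of-range writes wrap around.
const bytes = new Uint8Array([250]);
bytes[0] += 10; // 260 does not fit in 8 bits, so it is stored as 4

const ints = new Int32Array(1);
ints[0] = 2 ** 31; // one past the int32 maximum wraps to -2147483648

// Bitwise operators coerce ordinary Numbers to 32-bit integers.
const truncated = 7.9 | 0; // 7

// BigInt is arbitrary precision, so 2^64 + 1 is represented exactly...
const big = 2n ** 64n + 1n; // 18446744073709551617n

// ...whereas the Number version silently rounds back to 2^64.
const asFloat = 2 ** 64 + 1;
const floatLostTheOne = asFloat === 2 ** 64; // true
```

So integers exist, but only inside typed arrays, behind bitwise operators, or as `BigInt`; a bare numeric literal is still a double.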
That is not true. BigInt has been available for a bit already.
MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
Availability: https://caniuse.com/bigint
I don't want to argue for or against using JS for "data science" (I myself used R for that but I use JS a lot for other things), just a clarification on this one concrete claim.
Performance-wise, BigInts are terrible. I tried to use them, and it made things about a hundred times slower. What JS needs are 64-bit integer types, and some form of typing system that allows differentiating between various number types.
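For reference, the closest built-ins to a 64-bit integer type today are `BigInt64Array` and `BigInt.asIntN`, both of which still go through `BigInt`. The sketch below only shows their semantics; it is not a benchmark and says nothing about the speed difference described above.

```javascript
// BigInt64Array stores signed 64-bit integers in contiguous memory,
// but elements are read and written as BigInt values.
const cells = new BigInt64Array(2);
cells[0] = 2n ** 63n - 1n; // largest int64
cells[1] = cells[0] + 1n; // overflows on store and wraps to -2^63

// BigInt.asIntN(64, x) applies the same 64-bit wrap-around to a single value.
const wrapped = BigInt.asIntN(64, 2n ** 63n); // -9223372036854775808n

// BigInt and Number are kept apart at runtime: mixing them throws.
let mixError = null;
try {
  1n + 1;
} catch (error) {
  mixError = error; // TypeError
}

// An explicit conversion is required instead.
const explicit = 1n + BigInt(1); // 2n
```

The runtime does distinguish the two number types, but only by throwing when they are mixed, not through any static typing.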
There is absolutely nothing wrong with coders/analysts/scientists building solutions in any language. The "hate" that you mention -- and then proceed to echo -- is a narrow way of asserting the superiority of $mylanguage and the inferiority of $yourlanguage.
> flexible numeric types, fast math/algorithm libraries, and data manipulation
Your point b) is usually written in a performant, compiled language, and your point c) can be built from robust primitives in any language. However, I will add a point d) about speed and memory usage.
I do data analysis with the simplest set of performant tools: sqlite, bash-awk-sed-grep, Perl, Python, C++, SVG, and a browser to render. Any kind of glorified REPL beyond a terminal creates fragile complexity and dependency Hell.
My kit doesn't include Node.js or ECMAScript but I'm willing to open my mind enough to think it might, one day. The current tooling for data analysis (or "data science" if we want to be faddish) is a mess and I look forward to better tools in the future.
There is a lot of great work happening in this space already. In the R world, for example, Shiny makes heavy use of JS to the point that you often can't tell where R code ends and JavaScript begins. Plotly's Dash provides bindings for R, Python, and Julia. Personally, as a data scientist, I have been excitedly learning React because it really rips the landscape wide open for all the use cases I mentioned above. It then makes sense to have libraries that give JS users a good data model and can do most of the same numerical computation that we'd be doing in other languages. Again, you probably don't want to do serious numerical work in JS, but remember people said that about Python ten years ago too.
I love the framing of this book, because I want more data scientists to start thinking about the presentation of data and spark some bits of ingenuity to make datasets and model outputs accessible to non-data scientists. Data scientists should be the ones writing the tools that interface data with humans because of their domain knowledge. But this is a different skillset and usually the work of SW engineers. Of course, engineers can have great data intuition too, but I really do encourage data scientists to develop their front-end skills; it's well worth it.
Python definitely has some problems, but if you were going to have a new lingua franca for data science, it would probably be something like Julia, certainly not JavaScript.
So it seems a bit weird to me that they advocate using arrow function syntax instead of the regular syntax. They seem to be advocating using the new class syntax instead, so I guess they don't need the constructor or method creation features of the normal syntax, but I still don't see why they would specifically advocate for arrow function syntax. Is it faster? They say it interferes with other features, but which features?
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
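On the "which features?" question, the differences I'm sure of are that arrow functions have no `this` of their own and can't be used as constructors. Here is a small sketch; the `Scaler` class is made up for illustration and is not taken from the book.

```javascript
class Scaler {
  constructor(factor) {
    this.factor = factor;
  }

  // The arrow callback has no `this` of its own, so it sees the Scaler instance.
  withArrow(values) {
    return values.map((value) => value * this.factor);
  }

  // A regular function callback gets its own `this`, which is undefined here
  // because class bodies are always strict mode.
  withFunction(values) {
    return values.map(function (value) {
      return value * this.factor;
    });
  }
}

const scaler = new Scaler(3);
const scaled = scaler.withArrow([1, 2]); // [3, 6]

let thisError = null;
try {
  scaler.withFunction([1, 2]);
} catch (error) {
  thisError = error; // TypeError: `this` is undefined inside the callback
}

// Arrow functions also cannot be called with `new`.
const ArrowFn = () => {};
let constructError = null;
try {
  new ArrowFn();
} catch (error) {
  constructError = error; // TypeError: ArrowFn is not a constructor
}
```

If you already use `class` for objects, you lose nothing by giving up the constructor role of `function`, and callbacks stop surprising you with `this`.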
FWIW I personally like the function keyword, since it's clear what it is to non-JS readers, but primarily because it hoists to the top of its file, so unimportant utility functions can sit unobtrusively at the end of the file, thereby letting readers encounter more important logic earlier in the file.
side-note: Most of my JS experience is writing userscripts for myself, so I definitely do my share of 'this' shenanigans.
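The hoisting point can be sketched as follows. The function names are hypothetical; the contrast is between `function` declarations, which are usable before the line that defines them, and arrow functions assigned to `const`, which are not.

```javascript
// Important logic first: this call works because function declarations
// are hoisted together with their bodies.
const summary = report([2, 4, 9]); // "n=3, max=9"

// An arrow function bound with `const` is not usable yet at this point.
let earlyCallError = null;
try {
  mean([1]);
} catch (error) {
  earlyCallError = error; // ReferenceError: temporal dead zone
}

function report(values) {
  return `n=${values.length}, max=${max(values)}`;
}

// Unimportant utility, sitting unobtrusively at the end of the file.
function max(values) {
  return values.reduce((a, b) => (a > b ? a : b));
}

// Only callable from code that runs after this line has executed.
const mean = (values) => values.reduce((a, b) => a + b, 0) / values.length;
```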
Even Python is not really used for data science: all the heavy lifting is done in C/Fortran, and Python is just the glue.
I can't find any benchmark of how they compare to data.table or pandas.
Without a dominant and high performance data frame library as a foundation, I wouldn't even try.
Personally, I'm excited to build apps that don't require cloud computing and if they do, have access to one of the largest software engineering libraries through NPM. Sure, I'm not doing just Data Science in JavaScript but rather building apps that use DS/ML/AI, but that's still a valid use case. The alternative would be to use Python for prototyping then rewrite for production apps.