In my case, I admit, I don't have Windows ready to reproduce the result, but then I'm also not complaining about how the test was done, nor do I care about this at the current point.
If I ever have issues with the performance of an application, then I will do the profiling and evaluate other options. I certainly wouldn't use some benchmark I'm too lazy to reproduce as a basis for selecting the technology to use. Instead, I use the tool I'm most comfortable with, the one I see the most advantages in, or the one I just like best.
While the OP's benchmark might have been (and IMHO probably was) flawed, at least they provided source code and documentation for us to improve and re-run it. That's way more important than to know who they are and what their motivations might be.
As other comments already pointed out, the main flaw in this post is NOT that the node.js implementation is not as efficient as it could be. It is that the author chose to benchmark a CPU-intensive task, as opposed to an IO-intensive one. This is not what node.js is good at, nor what it is typically used for. That is something that cannot be fixed without creating an entirely new benchmark.
> I certainly wouldn't use some benchmark I'm too lazy to reproduce as a basis for selecting the technology to use.
However, it is good to have a general idea how different technologies compare. If for example you were to decide whether to learn node.js or C#, a benchmark like this might be one datapoint you could use.
One of the success factors of the Silicon Valley startup culture is open and honest feedback, even if it is sometimes not what you want to hear. There is no culture of "Do it better yourself before you criticize others".
suddenly you can't use httplistener anymore and you have to build an httpmodule for iis. so now you either build your own handler or use one of the existing ones.
i guess if you build your own isapi extension it's probably not very bad, but if you go through the asp.net "stack" it suddenly becomes a whole different story.
I'm glad he provided source code, but he clearly doesn't know Node well enough to provide an Apples to Apples comparison. Maybe .NET will blow Node away? Fine, but make sure you know Node well before you do the test.
Still, it's a nice conversation to start. I don't care how much I hate the company that builds it: if a tool is best for a job I want to know about it.
But as for your complaint that nobody is doing the tests: C'mon, how many Node experts even have Windows lying around?
The problem here is that not only is the code essentially broken and doing an apple to barnacles comparison, the core idea of the benchmark pretty certainly plays against node's strength (IO, lots of connections) as most of the hard work is not IO.
Why are you using an async sort function when you're not doing anything asynchronous in the sort callback? That's going to entail some overhead obviously, and it also can't take advantage of the sort optimizations v8 implements. Are you trying to show the overhead the asynchronous sort entails? Any overhead it causes would be easily outweighed by whatever asynchronous IO it's doing in that sort function anyway. It is negligible in comparison: not even worth pointing out.
> The key point I want to make is that I am using the async NPM instead of the default blocking Array.Sort.
I don't think you realize how this works. That async sort function is there to help deal with sort callbacks that already have to do something asynchronous. You're doing something synchronous and using that async sort function for no reason. If you're going to run these benchmarks, use the synchronous sort which can take advantage of certain optimizations. Or, you can also just do something actually asynchronous, which would justify using an async sort in the first place.
------------------
% time ./test.go "test1,test2,test3"
["test1" "test2" "test3"]
0.00s user 0.00s system 79% cpu 0.005 total
------------------
% time php -r ''
0.03s user 0.01s system 93% cpu 0.044 total
Fixed that for ya.
But that's not what async.sortBy does according to its documentation: async.sortBy is used to sort using an async comparator/key function, e.g. to sort file names using a stat(2) call. To be equivalent, the .Net version would have to sort using an asynchronous IComparer of some sort.
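A minimal sketch of the kind of job an async sort-by is meant for: ordering files by a key that can only be obtained asynchronously, here the size reported by stat. The helper `sortByAsyncKey` is hypothetical and promise-based (it is not the async library's code), and the file names and sizes are made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not the async library's implementation): resolve an
// asynchronous key for every item, then sort the items by those keys.
async function sortByAsyncKey(items, getKey) {
  const keyed = await Promise.all(
    items.map(async (item) => ({ item, key: await getKey(item) }))
  );
  keyed.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return keyed.map((entry) => entry.item);
}

// Example: order files by size, which needs an asynchronous stat call per file.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sortby-'));
  const files = {
    'big.txt': 'x'.repeat(300),
    'small.txt': 'x',
    'mid.txt': 'x'.repeat(20),
  };
  for (const [name, body] of Object.entries(files)) {
    fs.writeFileSync(path.join(dir, name), body);
  }
  const names = Object.keys(files).map((name) => path.join(dir, name));
  const sorted = await sortByAsyncKey(
    names,
    async (file) => (await fs.promises.stat(file)).size
  );
  // Clean up the temporary files before returning the result.
  names.forEach((file) => fs.unlinkSync(file));
  fs.rmdirSync(dir);
  return sorted.map((file) => path.basename(file));
}

demo().then((names) => console.log(names)); // smallest file first
```

Here the asynchronous machinery earns its keep because every key costs an IO round trip. With a purely in-memory key, as in the benchmark, there is nothing to wait for.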
edit: plus, for some reason the benchmark explicitly parses the input strings to floats in javascript (and sorts using that), but seems to sort the raw bytes in the .net version. The C# version further performs the string split in the sparked task, whereas in JS it's performed in the request handler (not that it'd help since, as I mentioned, the sort itself is incorrect)
second edit: to be fair, I like neither technology and am not a specialist in either. I have a pretty basic grasp of both and thus may have made mistakes in my cursory reading; if I have and somebody knows better, don't hesitate to correct me.
But as you may see from the paragraphs above, as far as I understand the code this comparison is complete nonsense (even ignoring the rather dubious higher level case of reading a single big file in memory at once, sorting it and extracting a bit of stat, which is unlikely to be IO-bound)
> I work for Microsoft as a software engineer for Bing in Silicon Valley. My group specializes in building platform and experiences for consumers. ( http://www.salmanq.com/ )
I suspect the author did this as an evaluation to see which would work best for a future project, not a teardown piece on the platform. We really love node.js - seriously!
I love .NET. I also love nodejs. The truth is that node is best as a glue system between technologies, and it works perfectly there.
The results do not surprise me, but what I find more interesting is how different ecosystems and frameworks can encourage your average developer to write scalable or non-scalable apps. And I still think node's model has a lot going for it in this regard.
.net is faster than node but you absolutely failed at showing it.
Now granted, every tool has its strengths and weaknesses, but this benchmark shows a very large difference in operations that aren't uncommon in web servers.
Sorting in memory should concern you, since caching happens in the memory. Again, I don't know the state of ORM technology in nodejs but any NHibernate user will be able to tell you that level 2 caches are a common thing and they will boost the benchmarks even further if you were to compare it to a nodejs stack.
The only scenarios where I see nodejs being faster and better than .NET are very small script scenarios, like a log receiver, where no state is needed and a minimal memory footprint is required.
- For very obvious reasons, JS (or Python or Ruby) would be slower than Java, C# at number crunching.
- Node.js would handle concurrent connections better than most Asp.Net apps since the framework is non-blocking. Now you could write non-blocking code with .Net, but that isn't how most people write Asp.Net code. OTOH, with Node.js that's the only way you could write an app.
- I am impressed that Node.js actually performed this well in that benchmark. For a dynamic, hard-to-optimize language, being 2.5x slower than .Net (or Java) code is a fairly good result.
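The concurrency point in the list above can be sketched roughly as follows. This is an illustration under made-up assumptions (a simulated 50 ms IO wait and 100 requests; `fakeIo`, `handleRequest` and `serveConcurrently` are hypothetical names), not a measurement of either platform.

```javascript
// Simulated IO: resolves after `ms` milliseconds without blocking the thread.
// The 50 ms latency and the request count below are made-up numbers.
const fakeIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleRequest(id) {
  await fakeIo(50); // stands in for waiting on a database or a file read
  return id;
}

async function serveConcurrently(count) {
  const started = Date.now();
  // All requests are in flight at once; the single thread is idle while they wait.
  const results = await Promise.all(
    Array.from({ length: count }, (_, id) => handleRequest(id))
  );
  return { handled: results.length, elapsedMs: Date.now() - started };
}

serveConcurrently(100).then((stats) => console.log(stats));
```

Because the waits overlap, the total time stays close to one wait rather than one hundred of them. None of this helps once the handler is busy with CPU work instead of waiting.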
Windows has come a long way in terms of development of non-.NET languages. But, there are still too many idiosyncrasies to function as a full-time platform for many languages.
Seems like my base need for Photoshop will push me to Mac sooner than I'd like.
There is a reason why people are using it to simultaneously develop for Win, OSX, Linux, Android, iOS, XBOX, Wii, etc....
Every tool user will think tools more complicated than theirs are ridiculous. Visual Studio users will think Sublime Text users are morons who are making their lives harder for no reason. Sublime Text users will think this about Vim users... but here's my analogy:
Visual Studio: training wheels
Sublime Text: fat tire 5 speed bike
Vim/Emacs: 21 speed skinny tire Tour de France bike
PS: I am a .NET fan, so I want fair comparisons, if I were to use them :)
What comparison do I find useful? The one that incorporates everything my application must address -- db queries, page rendering, cpu activity, memory consumption, operations, maintenance, etc. Given that I need lots of things for my application, single-point comparisons just really don't provide me with much value.
mind blown
(keep in mind, microsoft haters, that microsoft is currently a HUGE proponent of node.js - a lot of the Azure Mobile Services and the like are built on node)
However, I would guess that typical Node.js apps would be better at handling concurrent connections than typical Asp.Net apps; since Asp.Net apps are not usually written in non-blocking style.
First of all, you should be aware that Async.js is a mere flow-control library. It does not offload work to separate threads, and neither is it able to parallelize work. Internally, it mostly does bean-counting (but very helpfully so).
As you can see in the source https://github.com/caolan/async/blob/master/lib/async.js#L35... async.sortBy simply uses Array.sort for the actual sorting. The only reason you'd want to use async.sortBy is if the sort keys are not known beforehand (and need to be loaded through IO, asynchronously). This is clearly exemplified in the documentation: https://github.com/caolan/async#sortBy
The implication of this is that your call to async.sortBy can be replaced by a call to array.sort. This will remove two unnecessary runs of async.map, which potentially inflict a huge performance penalty.
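To illustrate, here is a rough sketch, not the library's actual source: a hypothetical callback-style `sortByViaCallbacks` that mimics the map, sort, map-back shape described above, next to a direct array sort. The sample numbers are made up.

```javascript
// Hypothetical stand-in for an async sort-by: map each item to {value, key}
// through a callback, sort on the keys, then map back to the values.
// It only mimics the general shape; it is not the async library's source.
function sortByViaCallbacks(items, iterator, done) {
  const keyed = new Array(items.length);
  let pending = items.length;
  if (pending === 0) return done(null, []);
  items.forEach((value, i) => {
    iterator(value, (err, key) => {
      if (err) return done(err);
      keyed[i] = { value, key };
      if (--pending === 0) {
        keyed.sort((a, b) => a.key - b.key);
        done(null, keyed.map((entry) => entry.value));
      }
    });
  });
}

const numbers = ['10.5', '2.25', '33.0', '4.75'].map(Number);

// The roundabout way: a "callback" key function that is really synchronous.
let viaCallbacks;
sortByViaCallbacks(numbers, (n, cb) => cb(null, n), (err, sorted) => {
  viaCallbacks = sorted;
});

// The direct way: copy the array and sort it with a numeric comparator.
const direct = [...numbers].sort((a, b) => a - b);

console.log(viaCallbacks, direct);
```

With a synchronous key both paths produce the same order, so the wrapper objects and callbacks in the first path are pure overhead.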
You do need to pass array.sort a comparator function, otherwise it will sort lexicographically (see https://developer.mozilla.org/en-US/docs/JavaScript/Referenc... ). That said, I'm not sure what the actual contents of your input file are. In your .Net example, you do not seem to bother to convert the array of strings to an array of ints (or floats). I think that .Net sort will sort an array of strings lexicographically as well. Furthermore, in the node.js example, you seem to be content with returning the resulting median as an int, not as a float. Do the input "decimals" in the input file represent ints or floats? Do they all have exactly the same number of decimal places? Are both the Node.js and .Net algorithms doing the same thing? I think not.
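A small sketch of the difference, using made-up sample values. The `median` helper is an assumption about what the benchmark computes, not code taken from it.

```javascript
const values = ['10', '9', '2.5', '100', '1'];

// Without a comparator, elements are compared as strings,
// so '100' sorts before '2.5'.
const lexicographic = [...values].sort();

// Parsing to numbers and passing a comparator gives numeric order.
const numeric = values.map(Number).sort((a, b) => a - b);

// Hypothetical median over an already sorted numeric array; it returns a
// float when the two middle elements have to be averaged.
function median(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(lexicographic); // [ '1', '10', '100', '2.5', '9' ]
console.log(numeric);       // [ 1, 2.5, 9, 10, 100 ]
console.log(median(numeric)); // 9
```

The two orderings put different elements in the middle, so a median taken from the string-sorted array would not match one taken from the numerically sorted array.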
Finally, we get to Array.sort. Array.sort blocks. Depending on the multi-threading efficiency of the underlying algorithm of Array.sort (which I don't have insight into), the code may not be able to use all available system resources. Keep in mind that Node.js is single-threaded. I know practically nothing about .Net, but I assume it will magically start new processes and/or threads if the runtime deems this beneficial. For Node.js, you may want to try the Cluster API, http://nodejs.org/api/cluster.html . You could try seeing if performance increases by adding one or more extra server processes.
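What "Array.sort blocks" means in practice can be sketched like this. The array size is an arbitrary choice for the illustration.

```javascript
const events = [];

// A timer that is due immediately. It can only fire once the current
// synchronous work has finished and control returns to the event loop.
setTimeout(() => events.push('timer fired'), 0);

// CPU-bound work: sort a large array of random numbers on the main thread.
const data = Array.from({ length: 200000 }, () => Math.random());
data.sort((a, b) => a - b);
events.push('sort finished');

// The timer has not run yet at this point, even though it was due first.
console.log(events);

// Shortly afterwards the timer callback has been recorded as well.
setTimeout(() => console.log(events), 10);
```

In a server, every other request waits in the same way while one request is sorting, which is why extra processes are the usual suggestion for this kind of workload.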
I can't comment about the quality of the .net code since I don't have any experience with it.
I think it would be fair (and very educational to others) if you'd rerun the benchmarks with 1. Async.sortBy replaced with array.sort 2. both the .Net and Node.js algorithms fully doing the same thing (i.e. let them both sort either floats, ints, or strings), and 3. at least one extra server process for Node. I think it would be most interesting if you made the changes step by step and ran the benchmarks at each step.
My guess is that step 1 would give the biggest difference. Depending on how you decide to resolve the differences in the two algorithms, performance of your .Net code may be slightly affected. It could potentially be sped up in fact, if somehow it's able to sort ints (or floats) faster than strings. The actual job of sorting probably overshadows it all though."
What do you guys think of this?
It's kind of funny that he talks about it being fast due to non-blocking IO:
> One of the key reasons most argue is that node.js is fast, scalable because of forced non-blocking IO, and it’s efficient use of a single threaded model.
...then goes on and sets up a benchmark which is more dependent on CPU than IO. Not to mention that, as others have pointed out here, the benchmark itself is flawed.
In my experience Node is faster in the case of most web apps that select a row from the db, read network requests to do aggregation, or update a column in a db. Anything that does do CPU computation generally uses a native hook or a different tech altogether.
Thank you. It amazes me how some people can write code in a framework like Node without even understanding the event-driven paradigm's strengths and weaknesses. I was sitting there reading it, and then he states he is using a file sort??? WTF? As if anyone would use Node for that purpose.
The big performance hit for the node version is the float conversion.
A multithreaded .NET version is a bit harder to write because usually this work is delegated to ASP.NET/IIS.
UPDATE: the .NET version is actually already multithreaded because of the Task system, so Node.JS seems to be actually faster in this scenario...
I'm by no means a .NET guru, but reading his code a couple non-optimal points jumped out at me:
- after parsing the file into an Array, he needlessly converts it into a List (while calling his variable 'array'). How bad this is is hard to say -- if the .NET compiler is smart enough, it could optimize this to just a wrap operation, since the default List implementation uses an Array internally. That seems unlikely to me though. You'd have to check the generated code and benchmark.
- by sorting with the default string comparator, he's doing a culturally-aware unicode sort. I.e., the values are being sorted to "alphabetical" order, for however Unicode defines "alphabetical" for his current culture setting. A lot of people seem to feel it's obviously faster to compare the strings than the parsed floats. I don't think that's at all obvious.
EDIT: I meant to imply that most people who write benchmarks usually aren't experts in every language/framework in the comparison, so it would be nice to see someone who is competent in both .NET and in nodejs put together a benchmark.
If I ever did such a thing, it would be more of a lesson in how to use node.js properly (even for non-typical tasks like sorting) than a performance comparison between node.js and .Net. Even then, I don't think it would be very interesting for people who like to read about node.js. It would be non-news to them, and for an atypical use-case.
I actually have no interest at all in getting a .net stack running on my mac.
The misleading information is what bothers me, so I hope to see it corrected. It's especially harmful because his blog will most likely be read by people who naturally favor .net. It's not even a tech blog per se, which makes objective information all the more important. (EDIT: Actually it does seem to be a tech blog, which surprises me a bit, given the superficiality of his analysis)
You can see at the end of his posts, how he diverts from the main subject and goes on about (I'm sure to be) wonderful technologies and possibilities that the .net stack offers.
1. How many threads does the .Net Runtime use?
I know little about .Net, but I know C# is like Java. The JVM with servlets will use many threads to serve requests, so many CPU cores will be involved.
However, this is not the case for node. One node process will use just one thread (though that is not exactly true).
So the comparison is unfair in terms of the programming model.
2. What hardware did you use, and how many cores are there?
Interesting!
There should be.