undefined | Better HN

0 pointsacqq10y ago0 comments

Ditto for computationally intensive work: if it is CPU dominated, more CPU's calculating in parallel will be of advantage, even if the data could fit some RAM.

There's no a single simple answer, but sure, whenever less computers are enough, less should be used.

The recent problem is, some people love "clouds" so much today that they push there the work that could really be done locally.

0 comments

vidarh10y ago

Part of the problem is that a lot of problems that are CPU dominated on a single system becomes IO dominated once you start distributing it at very low node counts without very careful attention to detail.

acqqOP10y ago

Yes, and starting from the distributed version first, one can even miss the fact that everything can be immensely faster once the communication is out of the way.

There's no a single simple answer, except: don't decide about the implementation first, instead competently and without prejudice evaluate your actual possibilities.

Also don't decide the language like "Python" or "Ruby" first. If you expect that the calculations are going to take a month, then code them in Ruby and wait the month for the result, you can miss the fact that you could have had the results in one day by just using another language, most probably without sacrifying the readability of the code much. Only if the task is really CPU-bound, of course.

On another side, if you have a ready solution in Python, and you'd need a month to develop the solution for other language, the first thing you have to consider is how often you plan to repeat the calculations afterwards. Etc.

acdha10y ago

I'd also toss in ease of correctness if you don't have a really well-known problem with a good test suite: I've seen a number of cases where people developed “tight” C code which gave results which appeared reasonable and then ran it for months before noticing logic errors which would either have been impossible or easier to find in a higher-level language or, particularly, one which had better numeric error handling characteristics.

There is a lot to be said for confirming that you have the right logic and then looking into something like PyPy/Cython/etc. or porting to a lower-level language.

1 more reply

exelius10y ago

The entire reason for the popularity of distributed systems is because application developers in general are very bad at managing I/O load. Most developers only think of CPU/memory constraints, but usually not about disk I/O. There's nothing wrong with that; because if your services are stateless then the only I/O you should have is logging.

In a stateless microservice architecture, disk I/O is only an issue on your database servers. Which is why database servers are often still run on bare metal as it gives you better control over your disk I/O - which you can usually saturate anyway on a database server.

In most advanced organizations, those database servers are often managed by a specialized team. Application servers are CPU/memory bound and can be located pretty much anywhere and managed with a DevOps model. DBAs have to worry about many more things, and there is a deeper reliance on hardware as well. And it doesn't matter which database you use; NoSQL is equally as finicky as a few of my developers recently learned when they tried to deploy a large Couchbase cluster on SAN-backed virtual machines.

vidarh10y ago

For web services, distributed usually makes sense, because the resource usage of a single connection is rarely high. It's largely uninteresting to consider huge machines for that use case for the reasons you outline.

But that's not really what the discussion is about. In the web services case, your dataset often/generally fits in memory because your data set is tiny. You don't need large servers for that, most of the time. Even most databases people have to work with are relatively small or easily sharded.

In the context of this discussion, consider that what matters is the size of the dataset for an individual "job". If you are processing many small jobs, then the memory size to consider is the memory size of an individual job, not the total memory required for all jobs you'd like to run in parallel. In that case many small servers is often cost effective.

If you are processing large jobs, on the other hand, you should seriously consider if there are data dependencies between different parts of your problem, in which case you very easily become I/O bound.

speeder10y ago

I own a ASUS gaming laptop...

It has two flaws: One, it has nVidia Optimus (that just suck, whoever implemented it should be shot).

Two, the I/O is not that good, even with a 7200 RPM disk, and Windows 8.1 make it much worse (windows for some reason keep running his anti-virus, superfetch, and other disk intensive stuff ALL THE TIME).

This is noticeable when playing emulated games: games that use emulator, even cartridge ones, need to read from the disc/cartridge on the real hardware, on the computer they need to read from disc quite frequently, the frequent slowndowns, specially during cutscenes or map changes is very noticeable.

The funny thing is: I had old computers, with poorer CPU (this one has a i7) that could run those games much better. It seems even hardware manufacturers forget I/O

1 more reply

jacquesm10y ago

0 comments

vidarh10y ago

acqqOP10y ago

Yes, and starting from the distributed version first, one can even miss the fact that everything can be immensely faster once the communication is out of the way.

There's no a single simple answer, except: don't decide about the implementation first, instead competently and without prejudice evaluate your actual possibilities.

acdha10y ago

There is a lot to be said for confirming that you have the right logic and then looking into something like PyPy/Cython/etc. or porting to a lower-level language.

1 more reply

exelius10y ago

vidarh10y ago

speeder10y ago

I own a ASUS gaming laptop...

It has two flaws: One, it has nVidia Optimus (that just suck, whoever implemented it should be shot).

The funny thing is: I had old computers, with poorer CPU (this one has a i7) that could run those games much better. It seems even hardware manufacturers forget I/O

1 more reply

jacquesm10y ago