Ask HN: What is the architectural legacy of Hadoop?

1 pointsibains4y ago2 comments

I was thinking through - what the computer systems / architectural legacy of Hadoop is.

In Databases, the physical query plan streams data. Hadoop seems to have contributed that when running on cheap hardware or at scale, you can write intermediate results to files, so failures are handled better.

Is there anything more to it?

2 comments

2 comments · 2 top-level

PaulHoule4y ago

There is HDFS, which has aged better, and the old MapReduce 'query' processing system which has aged worse. (Replaced by Spark and about 10 other things.)

There is a large supply of firms that would like to dethrone HDFS, because they think customers think that paying to 3x replicate the data is too much. (The winner is Amazon S3 where you pay even more!)

Maybe the scene has changed, ceph has made some inroads, but HDFS has the amazing property of being almost as fast running in degraded mode as it is normally, thus being fast enough that it can regrade faster than it degrades.

A big cluster is going to be partially degraded a lot so it matters.

CarbonCycles4y ago

Distributed processing at massive scale...used for data transformation, stateless algorithm deployment, data mining/discovery at scale. If memory serves me correctly, still used at EBay, Google and other companies for very low level work.

j / k navigate · click thread line to collapse