Show HN: Arvados, a free software storage and compute platform for big data

(github.com)

6 points · tetron · 10y ago · 5 comments

tetron (OP) · 10y ago

Hello all, Arvados developer here to answer your questions.

How does Arvados differ from other big data tools?

It looks to be focused mainly on bioinformatics. Do you see other use cases where its features provide more leverage than general-purpose tools?

tetron (OP) · 10y ago

There are several features that make Arvados unique.

* The content-addressed storage system hashes all data, so you can unambiguously reference an immutable data set, similar to git but capable of handling huge individual files (hundreds of gigabytes) and scaling to petabytes.

* Every compute job is recorded in a database with hashes identifying the inputs, Docker image, and outputs, so re-running past jobs is easy.

* It is designed to federate multiple instances, both to support "hybrid cloud" setups within an organization and to allow different organizations to share data.

These features are all particularly important to the bioinformatics community, but they solve problems that are common to many big data informatics workloads.
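
To make the first two points in the list above concrete, here is a toy Python sketch of content addressing and hash-keyed job records. It is not Arvados's actual API or storage format: the names `ContentStore`, `JobLog`, and `job_key`, the choice of SHA-256, and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib
import json


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not Arvados code)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves (SHA-256 is an
        # assumption here), so identical data always gets the same address.
        digest = hashlib.sha256(data).hexdigest()
        self._blocks[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blocks[digest]
        # The address doubles as an integrity check on read.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored block does not match its address")
        return data


def job_key(input_hashes, image_hash, command):
    """Derive a deterministic identifier from what defines a job.

    Sorting the inputs assumes their order does not matter; a real system
    might need to preserve it.
    """
    record = {
        "inputs": sorted(input_hashes),
        "image": image_hash,
        "command": list(command),
    }
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class JobLog:
    """Toy job database mapping a job key to the hash of its output."""

    def __init__(self):
        self._outputs = {}

    def record(self, key: str, output_hash: str) -> None:
        self._outputs[key] = output_hash

    def lookup(self, key: str):
        # Returns None when no job with this key has been recorded.
        return self._outputs.get(key)
```

In this sketch, storing the same bytes twice yields the same address, and a job with the same inputs, image, and command always maps to the same key, so a past result can be looked up by key or the job re-run from its recorded definition.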
