Garage: Open-Source Distributed Object Storage

j-pb1y ago

Yeah I've been following it on and off since it was camli-store. Maybe it tried to do too much at once and didn't focus on just the blob part enough, but I feel like it never really reached a coherent state and story.

BageDevimo1y ago

Have you seen https://github.com/willbryant/verm?

j-pb1y ago

Yeah, the subdirectories and mime-type seemed like an unnecessary complication. Also looks pretty dead.

jiggawatts1y ago

Something related that I've been thinking about is that there aren't many popular data storage systems out there that use HTTP/3 and/or gRPC for the lower latency. I don't just mean object storage, but database servers too.

Recently I benchmarked the latency to some popular RPC, cache, and DB platforms and was shocked at how high the latency was. Everyone still talks about 1 ms as the latency floor, when it should be the ceiling.

j-pb1y ago

Yeah QUIC would probably be a good protocol for such a system. Roundtrips are also expensive, ideally your client library would probably cache as much data as the local disk can hold.

singinwhale1y ago

Sounds a little like Kademlia, the DHT implementation that BitTorrent uses.

It's a distributed hash table where the value mapped to a hash is immutable after it is STOREd (at least in the implementations that I know)
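For readers unfamiliar with Kademlia, here is a minimal sketch of its core idea: IDs and keys share one number space, and "closeness" is the XOR of two IDs. The names `node_id`, `xor_distance`, and `closest_nodes` are made up for illustration, and the sketch leaves out routing tables, k-buckets, and networking entirely.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name via SHA-1 (160-bit IDs as in the Kademlia paper)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], k: int = 2) -> list[int]:
    """Return the k node IDs closest to `key`; these would be asked to STORE the value."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    nodes = [node_id(f"node-{i}") for i in range(8)]  # hypothetical node names
    key = node_id("some blob contents")
    for n in closest_nodes(key, nodes):
        print(f"{n:040x}")
```

Because the key is a hash of the content, whichever nodes end up closest can check what they were handed, which is what makes the immutable-after-STORE property cheap to enforce.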

j-pb1y ago

Kademlia could certainly be a part of a solution to this, but it's a long road from the algorithm to the binary that you can start on a bunch of machines to get the service, e.g. something like SeaweedFS. BitTorrent might actually be the closest thing we have to this, but it is at the opposite end of the latency-distribution axis.

rakoo1y ago

But you don't really handle blobs in real life: they can't really be handled, because they don't have memorable names (by design). So you need an abstraction layer on top of them. You can use zfs, which will deduplicate similar blobs. You can use restic for backups, which will also deduplicate similar parts of a file, in an idempotent way. And you can use git, which will deduplicate files based on their hash.

compressedgas1y ago

You might also be interested in Tahoe-LAFS https://www.tahoe-lafs.org/

j-pb1y ago

I get a

> Trac detected an internal error:

> IOError: [Errno 28] No space left on device

So it looks like it is pretty dead like most projects in this space?

snthpy1y ago

Have a look at LakeFS (https://docs.lakefs.io/understand/architecture.html).

Files are stored by hash on S3. Metadata is stored in a database. I run it locally and access it just like an S3 store. Metadata is in a Postgres DB.

ramses01y ago

Check also SeaweedFS, it has some interesting tradeoffs made, but I hear you with wanting some of the properties you're looking for.

tempest_1y ago

I am using seaweed for a project right now. Some things to consider with seaweed.

- It works pretty well, at least up to the 15B objects I am using it for. Running on 2 machines with about 300TB (500 raw) of storage on each.

- The documentation, specifically with regard to operations like how to back things up, or the different failure modes of the components, can be sparse.

- One example of the above is I spun up a second filer instance (which is supposed to sync automatically) which caused the master server to emit an error while it was syncing. The only way to know if it was working was watching the new filer's storage slowly grow.

- Seaweed has a pretty high bus factor, though the dev is pretty responsive and seems to accept PRs at a steady rate.

rkunnamp1y ago

IPFS like "coordination free" local S3 replacement! Yes. That is badly needed.

lima1y ago

The RADOS K/V store is pretty close. Ceph is built on top of it but you can also use it as a standalone database.

yencabulator1y ago

Nothing content-addressed in RADOS. It's just a key-value store with more powerful operations than get/put, and more in the strong consensus camp than the parent's request for coordination-free things.

(Disclaimer: ex-Ceph employee.)

https://en.m.wikipedia.org/wiki/Be_File_System

fijiaarone1y ago· 15 in thread

I don’t understand why everyone wants to replicate AWS APIs for things that are not AWS.

S3 is a horrible interface with a terrible lack of features. It’s just file storage without any of the benefits of a file system: no metadata, no directory structures, no ability to search, sort, or filter.

Combine that with high latency network file access and an overly verbose API. You literally have a bucket for storing files, when you used to have a toolbox with drawers, folders, and labels.

Replicating a real file system is not that hard, and when you lose the original reason for using a bucket (because you were stuck in the swamp with nothing else to carry your files in), why keep using it when you’re out of the mud?

vineyardmike1y ago

Does your file system have search? Mine doesn’t. Instead I have software that implements search on top of it. Does it support filtering? Mine uses software on top again. Which an S3 api totally supports.

Does your remote file server magically avoid network latency? Mine doesn’t.

In case you didn’t know, inside the bucket you can use a full path for S3 files. So you can have directories or folders or whatever.

One benefit of this system (KV style access) is that it supports concurrent usage better. Not every system needs it, but if you’re using an object store you might.
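To make the "full path inside the bucket" point concrete, here is a rough sketch of how a flat key space can be presented as folders, in the spirit of the `Prefix`, `Delimiter`, and `CommonPrefixes` fields of S3's ListObjectsV2. The function `list_prefix` is a local stand-in written for this illustration, not a call into any S3 client.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Emulate a delimiter-based listing over a flat set of keys.

    Returns (objects, common_prefixes): keys directly "inside" the prefix,
    and the pseudo-folders one level below it.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # There is another delimiter further down: report only the folder.
            common.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


if __name__ == "__main__":
    keys = ["photos/2023/a.jpg", "photos/2023/b.jpg", "photos/c.jpg", "readme.txt"]
    print(list_prefix(keys))             # top level
    print(list_prefix(keys, "photos/"))  # inside the "photos" folder
```

The folders exist only in how the listing is grouped. Nothing is created or renamed on the server, which is also why moving a "directory" means rewriting every key under it.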

psychoslave1y ago

Be OS FS at least has this

acdha1y ago

> Replicating a real file system is not that hard

What personal experience do you have in this area? In particular, how have you handled greater than single-server scale, storage-level corruption, network partitions, and atomicity under concurrent access?

nh21y ago

I use CephFS.

Blob storage is easier to implement than a POSIX file system.

With POSIX, you have server-client state: the concept of opened files, directories, and their states. Locks. The ability for multiple writers to write to the same file while still providing POSIX guarantees.

All of those need to correctly handle failure of both the client and the server.

CephFS implements that with a metadata server that has lots of logic and needs plenty of RAM.

A distributed file system like CephFS is more convenient than S3 in multiple ways, and I agree it's preferable for most use cases. But it's undoubtedly more complex to build.

crabbone1y ago

It's a legitimate question and I'm glad you asked! (I'm not the author of Garage and have no affiliation).

Filesystems impose a lot of constraints on data-consistency that make things go slow. In particular, when it comes to mutating directory structure. There's also another set of consistency constraints when it comes to dealing with file's contents. Object stores relax or remove these constraints, which allows them to "go faster". You should, however, carefully consider if the constraints are really unnecessary for your case. The typical use-case for object stores is something like storing volume snapshots, VM images, layers of layered filesystems etc. They would perform poorly if you wanted to use them to store the files of your programming project, for example.

favadi1y ago

> S3 is a horrible interface with a terrible lack of features.

Because it turns out that most applications do not require that many features when it comes to persistent storage.

duskwuff1y ago

> I don’t understand why everyone wants to replicate AWS APIs for things that are not AWS.

It's mostly just S3, really. You don't see anywhere near as many "clones" of other AWS services like EC2, for instance.

And there's a ton of value in being able to develop against an S3 clone like Garage or Minio and deploy against S3, or being able to retarget an existing application which expected S3 to one of those clones.

Scaevolus1y ago

S3 exposes effectively all the metadata that POSIX APIs do, in addition to all the custom metadata headers you can add.

Implementing a filesystem versus an object store involves severe tradeoffs in scalability and complexity that are rarely worth it for users that just want a giant bucket to dump things in.

The API doesn't matter that much, but everything already supports S3, so why not save time on client libraries and implement it? It's not like some alternative PUT/GET/DELETE API will be much simpler-- though naturally LIST could be implemented myriad ways.

nh21y ago

There are many POSIX APIs that S3 does not cover. For example directories, and thus efficient renames and atomic moves of sub hierarchies.

didntcheck1y ago

You wouldn't want your "interactive" user filesystem on S3, no, but as the storage backend for a server application it makes sense. In those cases you very often are just storing everything in a single flat folder with all the associated metadata in your application's DB instead.

By reducing the API surface (to essentially just GET, PUT, DELETE), it increases the flexibility of the backend. It's almost trivial to do a union mount with object storage, where half the files go to one server and half go to another (based on a hash of the name). This can be, and is, done with POSIX filesystems too, but it requires more work to fully satisfy the semantics. One of the biggest complications is having to support file modification and mmap. With S3 you can instead only modify a file by fully replacing it with PUT. Which again might be unacceptable for a desktop OS filesystem, but many server applications already satisfy this constraint by default.
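A minimal sketch of that "union mount by hash of the name" idea, assuming the backends are plain in-memory dicts standing in for separate servers. `ShardedStore` and its methods are hypothetical names for this illustration.

```python
import hashlib


class ShardedStore:
    """Route GET/PUT/DELETE to one of several backends based on a hash of the key."""

    def __init__(self, n_backends: int = 2):
        # Each dict stands in for a separate object storage server.
        self.backends = [dict() for _ in range(n_backends)]

    def _pick(self, key: str) -> dict:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]

    def put(self, key: str, data: bytes) -> None:
        # Whole-object replacement only, mirroring PUT semantics.
        self._pick(key)[key] = data

    def get(self, key: str) -> bytes:
        return self._pick(key)[key]

    def delete(self, key: str) -> None:
        self._pick(key).pop(key, None)


if __name__ == "__main__":
    store = ShardedStore(n_backends=2)
    for name in ("a.txt", "b.txt", "c.txt", "d.txt"):
        store.put(name, name.encode())
    print([len(b) for b in store.backends])
```

With only whole-object operations there is no rename, no partial write, and no mmap to coordinate across backends. Note that plain modulo routing reshuffles most keys when a backend is added; real systems usually use consistent hashing or an explicit layout table instead.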

klysm1y ago

> Replicating a real file system is not that hard

Ummmm what? Replicating a file system is insanely hard

Nathanba1y ago

it's because many other cloud services offer sending to S3, that's pretty much it

TheColorYellow1y ago

Because at this point it's a well known API. I bet people want to recreate AWS without the Amazon part, and so this is for them.

Which, to your point, makes no sense because as you rightly point out, people use S3 because of the Amazon services and ecosystem it is integrated with - not at all because it is "good tech"

acdha1y ago

S3 was the second AWS service, behind SQS, and saw rapid adoption which cannot be explained by integration with services introduced later.

0: https://how.wtf/aws-sigv4-requests-with-curl.html

S3 is just HTTP. There isn't really an ecosystem for S3, unless you just mean all the existing http clients.

computerfan4941y ago· 11 in thread

I have used Garage for a long time. It's great, but the AWS sigv4 protocol for accessing it is just frustrating. Why can't I just send my API key as a header? I don't need the full AWS SDK to get and put files, and the AWS sigv4 is a ton of extra complexity to add to my projects. I don't care about the "security benefits" of AWS sigv4. I hope the authors consider a different authentication scheme so I can recommend Garage more readily.

dopylitty1y ago

I read that curl recently added sigv4 for what that’s worth[0]

zipping15491y ago

Of course curl has it

6LLvveMx2koXfwn1y ago

Implementing v4 on the server side also requires the service to keep the token as plain text. If it's a persistent password, rather than an ephemeral key, that opens up another whole host of security issues around password storage. And on the flip side requiring the client to hit an endpoint to receive a session based token is even more crippling from a performance perspective.

ianopolous1y ago

You can implement S3 V4 signatures in a few hundred lines of code.

https://github.com/Peergos/Peergos/blob/master/src/peergos/s...

I have done this for my purposes, but it's slow and unnecessary bloat I wish I didn't have to have.
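For context on what the complexity actually is, here is a sketch of the signing-key derivation at the heart of SigV4, using only the standard library. It deliberately skips the fiddly part (building the canonical request and the string to sign); the region value `"garage"`, the example keys, and the helper names are assumptions for illustration.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the per-day, per-region, per-service key from the long-lived secret."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)  # date as YYYYMMDD
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign(secret_key: str, date: str, region: str, string_to_sign: str) -> str:
    """Return the hex signature that goes into the Authorization header."""
    key = signing_key(secret_key, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder string to sign; a real one embeds the hash of the canonical request.
    canonical_hash = hashlib.sha256(b"GET\n/bucket/key\n...").hexdigest()
    sts = "\n".join([
        "AWS4-HMAC-SHA256",
        "20240101T000000Z",
        "20240101/garage/s3/aws4_request",
        canonical_hash,
    ])
    print(sign("example-secret", "20240101", "garage", sts))
```

The server has to recompute the same HMAC chain to verify a request, which is why it needs the secret itself rather than a hash of it, as noted upthread.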

surfingdino1y ago

It makes sense to tap into the existing ecosystem of AWS S3-compatible clients.

Plain HTTP (as in curl without any extra headers) is already an S3-compatible client.

If this 'Garage' doesn't support the plain HTTP use case then it isn't S3 compatible.

neon_me1y ago

Check something like PicoS3 or https://github.com/sentienhq/ultralight-s3

There are a few "very minimal" sigv4 implementations ...

klysm1y ago

Sending your api key in the header is equivalent to basic auth.

https://github.com/seaweedfs/seaweedfs

Yep, and that's fine with me. I don't have a problem with basic auth.

vineyardmike1y ago

This is not intended for commercial services. Realistically, this software was made for people who keep servers in their basement. The security profile of LAN users is very different than public AWS.

TechDebtDevin1y ago· 5 in thread

SeaweedFS is great as well.

n_ary1y ago

Tried this for my own homelab. Either I misconfigured it, or its working memory grows linearly at 2x the size of the stored data. So, for example, if I put in 1GB of data, seaweed would immediately and constantly consume 2GB of memory!

Edit: memory = RAM

TechDebtDevin1y ago

That is odd. It likely has something to do with the index caching and how many replication volumes you configured. By default it indexes all file metadata in RAM (I think) but that wouldn't justify that type of memory usage. I've always used mostly default configurations in Docker Swarm, similar to this:

https://github.com/cycneuramus/seaweedfs-docker-swarm/blob/m...

crest1y ago

Are you claiming that SeaweedFS requires twice as much RAM as the sum of the sizes of the stored objects?

evanjrowley1y ago

Looks awesome. Been looking for some flexible self-hosted WebDAV solutions and SeaweedFS would be an interesting choice.

genewitch1y ago

depending on what you need it for nextcloud has WebDAV (clients can interact with it, and windows can mount your home folder directly, i just tried it out a couple days ago.) I've never used webdav before so i'm unsure of what other use cases there are, but the nextcloud implementation (whatever it may be) was friction-free - everything just worked.

neon_me1y ago· 5 in thread

What's the motivation behind a project like this one?

We've got ceph, minio, seaweedfs ... and a dozen others. I am genuinely curious what the goal is here?

rakoo1y ago

I can only answer for Garage and not others. Garage is the result of the desired organization of the collective behind it: deuxfleurs. The model is that of people willing to establish a horizontal governance, with none being forced to do anything because it all works by consensus. The idea is to have an infrastructure serving the collective, not a self hosted thing that everyone has to maintain, not something in a data center because it has clear ecological impacts, but something in-between. Something that can be hosted on second-hand machines, at home, but taking the low reliability of machines/electricity/residential internet into account. Some kind of cluster, but not the kind you find in the cloud where machines are supposed to be kind of always on, linked with high-bandwidth, low-latency network: quite the opposite actually.

deuxfleurs thought long and hard about the kind of infra this would translate to. The base came fast enough: some kind of storage, based on a standard (even de-facto only is good because it means it is proven), that would tolerate some nodes going down. The decision of doing a Dynamo-like thing to be accessed through S3 with eventual consistency made sense.

So Garage is not "simply" an S3 storage system: it is a system to store blobs in an unreliable but still trusted consumer-grade network of passable machines.

koito171y ago

Minio assumes each node has identical hardware. Garage is designed for use-cases like self-hosting, where nodes are not expected to have identical hardware.

Minio doesn't, it has bucket replication and it works okay.

WhereIsTheTruth1y ago

performance, therefore cheaper

iscoelho1y ago

not just about cost! improved performance/latency can let workloads that previously required a local SSD/NVMe actually run on distributed storage or an object store.

it cannot be overstated how slow Ceph/Minio/etc can be compared to local NVMe. there is plenty of room for improvement.

CyberDildonics1y ago· 4 in thread

What is the difference between a "distributed object storage" and a file system?

vineyardmike1y ago

It’s an S3 api compatible object store that supports distributed storage across different servers.

Object store = store blobs of bytes. Usually by bucket + key accessible over HTTP. No POSIX expectation.

Distributed = works spread across multiple servers in different locations.

CyberDildonics1y ago

store blobs of bytes

Files

by bucket

Directories

key accessible

File names

over HTTP

Web server

crest1y ago

Files are normally stored hierarchically (e.g. atomically move directories), and updated in place. Objects are normally considered to exist in a flat namespace and are written/replaced atomically. Object storage requires less expensive (in a distributed system) metadata operations. This means it's both easier and faster to scale out object storage.

crabbone1y ago

There are a few.

From the perspective of consistency guarantees, object storage gives fewer of such guarantees (this is seen as allowing implementations to be faster than typical file-systems). For example, since there isn't a concept of directories in object store, the implementation doesn't need to deal with the problems that arise while copying or moving directories with files open in those directories.

There are some non-storage functions that are performed only by filesystems, but not object storage. For example, suid bits.

It's also much more common to use object stores for larger chunks of data, such as whole disk snapshots, VM images etc., while filesystems aim for the middle size (small being RDBMSs), such as text files you'd open in a text editor. Consequently, each is optimized for its objective. Filesystems care a lot about what happens when random small incremental and possibly overlapping updates happen to the same file, while object stores care about performance of sequential reads and writes the most.

This excludes the notion of "distributed" as both can be distributed (and in different ways). I suppose you meant to ask about the difference between "distributed object storage" and "distributed filesystem".

makkesk81y ago· 3 in thread

We moved over to garage after running minio in production with ~2PB, following about 2 years of headaches. Minio does not deal with small files very well, understandably so, since they don't keep a separate index of the files other than what is straight on disk. While SSDs can mask this issue to some extent, spinning rust, not so much. And speaking of replication, in garage this just works... Minio's approach, even with synchronous mode turned on, tends to fall behind, and again small files will pretty much break it altogether.

We saw about 20-30x performance gain overall after moving to garage for our specific use case.

sandGorgon1y ago

quick question for advice - we have been evaluating minio for an in-house deployed storage for ML data. this is financial data, for which we have to comply with a crap ton of regulations.

so we wanted lots of compliance features - like access logs, access approvals, short lived (time bound) accesses, etc etc.

how would you compare garage vs minio on that front ?

withinboredom1y ago

You will probably put a proxy in front of it, so do your audit logging there (nginx ingress mirror mode works pretty well for that)

https://hub.docker.com/r/localstack/localstack

zimbatm1y ago

That's very cool; I didn't expect Garage to scale that well while being so young.

Are there other details you are willing/allowed to share, like the number of objects in the store and the number of servers you are balancing them on?

seaghost1y ago· 3 in thread

I want something very simple to run locally that has s3 compatibility just for the dev work and testing. Any recommendations?

zmj1y ago

rlonstein1y ago

https://min.io/

zX41ZdbW1y ago

Minio is fairly easy to set up locally or in CI.

We use it for CI in ClickHouse, for example: https://github.com/ClickHouse/ClickHouse/blob/master/docker/...

bluepuma771y ago· 3 in thread

Can it be easily deployed with old-school Docker Swarm?

TristanBall1y ago

I don't think I have ever personally felt older than having someone describe anything docker related as "old-school"

bluepuma771y ago

That was not my intention!

Docker is young and fashionable, every windows script kiddy uses it nowadays!

And then comes to the Docker forum complaining about strange issues, not realizing Docker Desktop is a different product, it uses a Linux VM to run the Docker engine, which was built for Linux ;-)

I explicitly wrote "old-school Docker Swarm", as that has been missing love for years and everyone with 2 IT FTEs seems to be moving to k8s.

CoolCold1y ago

I was insulted as well - luckily just mental insult, assuming my age and the context of what's old and what's not :)

Daviey1y ago· 1 in thread

Last time I looked at Garage it only supported paired storage replication, such that if I had a 10GB disk in location A and a 1TB disk in each of locations 2 and 3, it would only support "RAID1-esque" mirroring, so my storage would be limited to 10GB.

leansensei1y ago

That's a deliberate design decision.
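The arithmetic behind that limit, as a tiny sketch. It assumes the simplest case only: every object gets exactly one copy in each location, so the smallest location fills up first. The function name `usable_capacity_gb` is hypothetical, and this is not a model of how Garage actually computes its layout.

```python
def usable_capacity_gb(zone_capacities_gb: dict[str, int], replicas: int) -> int:
    """Usable capacity when each object must have one copy in every zone.

    Only covers the case where the replica count equals the number of zones;
    anything else needs a real placement algorithm.
    """
    if replicas != len(zone_capacities_gb):
        raise ValueError("this sketch only covers one copy per zone")
    # Every zone stores a full copy, so the smallest zone is the bottleneck.
    return min(zone_capacities_gb.values())


if __name__ == "__main__":
    zones = {"A": 10, "2": 1000, "3": 1000}  # sizes in GB, from the comment above
    print(usable_capacity_gb(zones, replicas=3))
```

With more locations than replicas, a placement algorithm has more freedom and the small disk stops being the hard limit, but that case is outside this sketch.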

icy1y ago· 1 in thread

I've been running this on K3s at home (for my website and file server) and it's been very well behaved: https://git.icyphox.sh/infra/tree/master/apps/garage

I find it interesting that they chose CRDTs over Raft for distributed consensus.

iscoelho1y ago

from an operations point of view, I am surprised anyone likes Raft. I have yet to see any application implement Raft in a way that does not spectacularly fail in production and require manual intervention to resolve.

CRDTs do not have the same failure scenarios and favor uptime over consistency.
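As a generic illustration of why CRDTs avoid those failure scenarios, here is a last-writer-wins register, one of the simplest CRDTs. This is a textbook sketch with made-up names and is not taken from Garage's code; it also assumes no two writes share both timestamp and node ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: the state is (timestamp, node_id, value)."""

    timestamp: int
    node_id: str
    value: bytes

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # The higher timestamp wins; node_id breaks ties deterministically.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


if __name__ == "__main__":
    a = LWWRegister(1, "n1", b"old")
    b = LWWRegister(2, "n2", b"new")
    # Replicas can exchange state in any order and still converge.
    print(a.merge(b) == b.merge(a))
```

Merge is commutative, associative, and idempotent, so nodes never need to elect a leader or agree on an order before accepting a write. The cost is the one named above: a concurrent write can be silently superseded rather than rejected.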

surfingdino1y ago· 1 in thread

There's also OpenStack Swift.

giulivo1y ago

I believe OpenStack Swift in particular is known to work well in some large organizations [1], NVIDIA is one of those and also invested in its maintenance [2].

1. https://www.youtube.com/watch?v=H1DunJM1zoc 2. https://platform.swiftstack.com/docs/

thecleaner1y ago· 1 in thread

Is this formally verified by any chance? I feel like there's space where formal designs could be expressed in TLA+ such that it's easier for the community to keep track of the design.

halfa1y ago

There is a formal proof for some parts of garage's layout system, see https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/...

comvidyarthi1y ago· 1 in thread

Is this open source ?

kevlened1y ago

AGPL https://git.deuxfleurs.fr/Deuxfleurs/garage

anonzzzies1y ago· 1 in thread

NLNet sponsored a lot of nice things.

lifty1y ago

The EU, but yeah. NLNet are the ones that judged the applications and disbursed the funds.

sunshine-o1y ago

I really appreciate the low memory usage of Garage compared to Minio.

The only thing I am missing is the ability to automatically replicate some buckets on AWS S3 for backup.

storagenerd1y ago

Check out this one - https://github.com/NVIDIA/aistore

https://aiatscale.org/

It is an object storage system and more..

dtag001y ago

A bit of an off-topic question: I would like to programmatically generate S3 credentials that allow only read access or r/w access to only a certain set of prefixes. Imagine something like "Dropbox": You have a set of users, each user has his own prefix, but also users want to be able to share certain prefixes with other users. (Users are managed externally in a Postgres DB - MinIO does currently not know about them).

I found this really difficult to achieve with MinIO, since this appears to require an AssumeRole request, which is almost not documented in any way and I did not find a Typescript example. Additionally, there's a weird set of restrictions in place for MinIO (and also AWS) that makes this really difficult to do, e.g. the size of policies is limited, which effectively limits the number of prefixes a user can share. I found this really difficult to work around.

Can anyone suggest a way to do this? Can garage do this? Am I just approaching this from the wrong side?

Thanks
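One way to picture the policy side of this question: generate an AWS-style policy document per user from the prefixes they may read or write. This is only a sketch of the document shape; `prefix_policy` is a hypothetical helper, and whether a given server (MinIO, Garage, or otherwise) accepts such a policy, and how it is attached to credentials, is an assumption not covered here.

```python
import json


def prefix_policy(bucket: str, read_prefixes, write_prefixes=()) -> dict:
    """Build an AWS-style policy allowing reads on some prefixes and writes on others.

    Write access implies read access in this sketch.
    """
    readable = sorted(set(read_prefixes) | set(write_prefixes))
    statements = [
        {   # Listing is a bucket-level action, restricted by the s3:prefix condition.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{p}*" for p in readable]}},
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{p}*" for p in readable],
        },
    ]
    if write_prefixes:
        statements.append({
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{p}*" for p in sorted(write_prefixes)],
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    doc = prefix_policy("files", ["users/bob/shared/"], ["users/alice/"])
    print(json.dumps(doc, indent=2))
```

Each shared prefix adds entries to the document, so the policy size limit mentioned above translates directly into a cap on how many prefixes one user can be granted this way.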

arcanemachiner1y ago

GitHub mirror: https://github.com/deuxfleurs-org/garage

moffkalast1y ago

Finally one can launch startups from their own Garage again.

MoodyMoon1y ago

Apache Ozone is an alternative for an object store running on top of Hadoop. Maybe someone who has experience running this in a production environment can comment on it.

https://ozone.apache.org/

j-pb1y ago· 25 in thread

What I'm really missing in this space is something like this for content addressed blob storage.

There is stuff like IPFS in the large, but I want this for local deployments as a S3 replacement, when the metadata is stored elsewhere like git or a database.
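To pin down what "content addressed blob storage" means here, a minimal in-memory sketch: the key is the hash of the bytes, so there is no separate naming step and reads can be verified. `BlobStore` is a hypothetical class for illustration, with SHA-256 chosen arbitrarily as the hash.

```python
import hashlib


class BlobStore:
    """Content-addressed store: PUT returns the hash, GET takes the hash."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op, so PUT is idempotent.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # The address doubles as an integrity check on every read.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored blob does not match its address")
        return data


if __name__ == "__main__":
    store = BlobStore()
    ref = store.put(b"hello blobs")
    print(ref, store.get(ref))
```

The caller keeps the returned hash wherever the metadata lives (git, a database), which is the split described above.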

amluto1y ago

IOW I would settle for content verification even without content addressing.

S3 has an extremely half-hearted implementation of this for “integrity”.

ianopolous1y ago

https://peergos.org/posts/direct-s3

the_duke1y ago

Garage splits the data into chunks for deduplication, so it basically already does content addressed storage under the hood.

They probably don't expose it publicly though.

j-pb1y ago

Yeah, and as far as I understood they use the key hash to address the overall object descriptor. So in theory using the hash of the file instead of the hash of the key should be a simple-ish change.

Tbh I'm not sure if content aware chunking isn't a siren's call:

  - It sounds great on paper, but once you start storing encrypted (which you have to do if you want e2e encryption) or compressed blobs (e.g. images) it won't work anymore.

  - Ideally you would store things with enough fine grained blobs that blob-level deduplication would suffice.

  - Storing a blob across your cluster has additional compute, lookup, bookkeeping, and communication overhead, resulting in worse latency. Storing an object as a contiguous unit makes the cache/storage hierarchies happy and allows for optimisations like using `sendfile`.

  - Storing the blobs as a unit makes computational storage easier to implement, where instead of reading the blob and processing it, you would send a small WASM program to the storage server (or drive? https://semiconductor.samsung.com/us/ssd/smart-ssd/) and only receive the computation result back.
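For comparison with whole-blob storage, here is a sketch of chunk-level deduplication using fixed-size chunks (not content-aware chunking, which picks boundaries from the data itself). `ChunkStore` and the tiny chunk size are made up for illustration.

```python
import hashlib


class ChunkStore:
    """Split blobs into fixed-size chunks and store each chunk once, by hash."""

    def __init__(self, chunk_size: int = 8):
        self.chunk_size = chunk_size  # tiny on purpose, for demonstration
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> list[str]:
        """Store a blob and return its manifest: the ordered list of chunk hashes."""
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble a blob; each chunk is a separate lookup."""
        return b"".join(self.chunks[d] for d in manifest)


if __name__ == "__main__":
    store = ChunkStore()
    a = store.put(b"A" * 8 + b"B" * 8 + b"C" * 8)
    b = store.put(b"A" * 8 + b"B" * 8 + b"D" * 8)
    print(len(store.chunks), "unique chunks for 6 logical chunks")
```

The manifest is the extra bookkeeping mentioned in the list above: every read becomes several lookups instead of one contiguous fetch, and the sharing only helps if identical plaintext chunks actually reach the store.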

od01y ago

Take a look at https://github.com/n0-computer/iroh

Open source project written in Rust that uses BLAKE3 (and QUIC, which you mentioned in another comment)

j-pb1y ago

All of these things are interesting problems, that I'd definitely like to see solved some day, but I'd be more than happy with an "S3 for blobs" :D.

khimaros1y ago

you might be interested in https://github.com/perkeep/perkeep

skinkestek1y ago

Perkeep has (at least until last I checked it) the very interesting property of being completely impossible for me to make heads or tails of while also looking extremely interesting and useful.

So in the hope of triggering someone to give me the missing link (maybe even a hyperlink) for me to understand it, here is the situation:

I could start an import of Twitter or something else and it kind of shows up. Same with anything else: photos etc.

It clearly does something but it was impossible to understand what I am supposed to do next, both from the ui and also from the docs.

didntcheck1y ago

Or some even older prior art (which I recall a Perkeep dev citing as an influence in a conference talk)

http://doc.cat-v.org/plan_9/4th_edition/papers/venti/

https://en.wikipedia.org/wiki/Venti_(software)

j-pb1y ago

BageDevimo1y ago

Have you seen https://github.com/willbryant/verm?

j-pb1y ago

Yeah, the subdirectories and mime-type seemed like an unnecessary complication. Also looks pretty dead.

jiggawatts1y ago

j-pb1y ago

Yeah QUIC would probably be a good protocol for such a system. Roundtrips are also expensive, ideally your client library would probably cache as much data as the local disk can hold.

singinwhale1y ago

Sounds a little like Kademlia, the DHT implementation that BitTorrent uses.

It's a distributed hash table where the value mapped to a hash is immutable after it is STOREd (at least in the implementations that I know)

j-pb1y ago

rakoo1y ago

compressedgas1y ago

You might also be interested in Tahoe-LAFS https://www.tahoe-lafs.org/

j-pb1y ago

I get a

> Trac detected an internal error:

> IOError: [Errno 28] No space left on device

So it looks like it is pretty dead like most projects in this space?

snthpy1y ago

Have a look at LakeFS (https://docs.lakefs.io/understand/architecture.html).

Files are stored by hash on S3. Metadata is stored in a database. I run it locally and access it just like an S3 store. Metadata is in a Postgres DB.

ramses01y ago

Check also SeaweedFS, it has some interesting tradeoffs made, but I hear you with wanting some of the properties you're looking for.

tempest_1y ago

I am using seaweed for a project right now. Some things to consider with seaweed.

- It works pretty well, at least up to the 15B objects I am using it for. Running on 2 machines with about 300TB, (500 raw) storage on each.

- The documentation, specifically with regards to operations like how to backup things, or different failure modes of the components can be sparse.

- Seaweed has a pretty high bus factor, though the dev is pretty responsive and seems to accept PRs at a steady rate.

rkunnamp1y ago

IPFS like "coordination free" local S3 replacement! Yes. That is badly needed.

lima1y ago

The RADOS K/V store is pretty close. Ceph is built on top of it but you can also use it as a standalone database.

yencabulator1y ago

(Disclaimer: ex-Ceph employee.)

https://en.m.wikipedia.org/wiki/Be_File_System

fijiaarone1y ago· 15 in thread

I don’t understand why everyone wants to replicate AWS APIs for things that are not AWS.

Combine that with high latency network file access and an overly verbose API. You literally have a bucket for storing files, when you used to have a toolbox with drawers, folders, and labels.

vineyardmike1y ago

Does your remote file server magically avoid network latency? Mine doesn’t.

In case you didn’t know, inside the bucket you can use a full path for S3 files. So you can have directories or folders or whatever.

Some benefits of this system (KV style access) is to support concurrent usage better. Not every system needs it, but if you’re using an object store you might.

psychoslave1y ago

Be OS FS at least has this

acdha1y ago

> Replicating a real file system is not that hard

nh21y ago

I use CephFS.

Blob storage is easier than POSIX file systems:

You have server-client state. The concept of opened files, directories, and their states. Locks. The ability for multiple writers to write to the same file while still providing POSIX guarantees.

All of those need to correctly handle failure of both the client and the server.

CephFS implements that with a Metadata server that has lots of logica and needs plenty of RAM.

A distributed file system like CephFS is more convenient than S3 in multiple ways, and I agree it's preferable for most use cases. But it's undoubtedly more complex to build.

crabbone1y ago

It's a legitimate question and I'm glad you asked! (I'm not the author of Garage and have no affiliation).

favadi1y ago

> S3 is a horrible interface with a terrible lack of features.

Because turn out that most applications do not require that many features when it comes to persistent storage.

duskwuff1y ago

> I don’t understand why everyone wants to replicate AWS APIs for things that are not AWS.

It's mostly just S3, really. You don't see anywhere near as many "clones" of other AWS services like EC2, for instance.

Scaevolus1y ago

S3 exposes effectively all the metadata that POSIX APIs do, in addition to all the custom metadata headers you can add.

Implementing a filesystem versus an object store involves severe tradeoffs in scalability and complexity that are rarely worth it for users that just want a giant bucket to dump things in.

nh21y ago

There are many POSIX APIs that S3 does not cover. For example directories, and thus efficient renames and atomic moves of sub hierarchies.
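A rough sketch of why the missing directories matter for renames, using plain Python dicts as stand-ins (both helpers, `rename_prefix` and `rename_dir`, are hypothetical and not part of any S3 client): in a flat key store, "renaming a directory" touches every key under the prefix, while a tree with real directory nodes can relink one entry.

```python
def rename_prefix(store, old, new):
    """Rename a 'directory' in a flat key/value store.

    Costs one copy plus one delete per key under the prefix, and a reader
    looking at the store midway sees a half-moved hierarchy.
    """
    moved = 0
    for key in [k for k in store if k.startswith(old)]:
        store[new + key[len(old):]] = store[key]  # copy to the new key
        del store[key]                            # delete the old key
        moved += 1
    return moved


def rename_dir(tree, old, new):
    """Rename a directory in a tree of nested dicts: one relink, any size."""
    tree[new] = tree.pop(old)


# Flat, object-store-like layout: two keys must be rewritten.
flat = {"a/1": b"x", "a/2": b"y", "b/3": b"z"}
moved = rename_prefix(flat, "a/", "c/")

# Tree, filesystem-like layout: the subtree moves as a single unit.
tree = {"a": {"1": b"x", "2": b"y"}, "b": {"3": b"z"}}
rename_dir(tree, "a", "c")
```

The flat version does work proportional to the number of keys and is not atomic as written, which is the gap the comment above describes.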

didntcheck1y ago

klysm1y ago

> Replicating a real file system is not that hard

Ummmm what? Replicating a file system is insanely hard

Nathanba1y ago

it's because many other cloud services offer sending to S3, that's pretty much it

TheColorYellow1y ago

Because at this point it's a well known API. I bet people want to recreate AWS without the Amazon part, and so this is for them.

Which, to your point, makes no sense because as you rightly point out, people use S3 because of the Amazon services and ecosystem it is integrated with - not at all because it is "good tech"

acdha1y ago

S3 was the second AWS service, behind SQS, and saw rapid adoption which cannot be explained by integration with services introduced later.

0: https://how.wtf/aws-sigv4-requests-with-curl.html

S3 is just HTTP. There isn't really an ecosystem for S3, unless you just mean all the existing http clients.

computerfan4941y ago· 11 in thread

dopylitty1y ago

I read that curl recently added sigv4, for what it’s worth[0]

zipping15491y ago

Of course curl has it

6LLvveMx2koXfwn1y ago

ianopolous1y ago

You can implement S3 V4 signatures in a few hundred lines of code.

https://github.com/Peergos/Peergos/blob/master/src/peergos/s...
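For a sense of scale, the cryptographic core of SigV4 is a short chain of HMAC-SHA256 calls that derives a signing key. A sketch using only the Python standard library follows. The secret key, date, and region values are placeholders, and the string to sign is a dummy. Building the canonical request and the real string to sign is where most of the "few hundred lines" go, and that part is omitted here.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 of msg under key, returned as raw bytes."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)  # date as YYYYMMDD
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign(secret_key: str, date: str, region: str, string_to_sign: str) -> str:
    """Hex signature over an already-built string to sign (construction not shown)."""
    key = signing_key(secret_key, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder inputs, for illustration only.
key = signing_key("EXAMPLE-SECRET", "20240101", "us-east-1")
signature = sign("EXAMPLE-SECRET", "20240101", "us-east-1", "dummy-string-to-sign")
```

Because the date and region are mixed into the key, a derived key is only valid for one day, one region, and one service.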

I have done this for my purposes, but it's slow and unnecessary bloat I wish I didn't have to have.

surfingdino1y ago

It makes sense to tap into the existing ecosystem of AWS S3-compatible clients.

Plain HTTP (as in curl without any extra headers) is already an S3-compatible client.

If this 'Garage' doesn't support the plain HTTP use case then it isn't S3 compatible.

neon_me1y ago

Check something like PicoS3 or https://github.com/sentienhq/ultralight-s3

There are a few "very minimal" sigv4 implementations ...

klysm1y ago

Sending your api key in the header is equivalent to basic auth.

https://github.com/seaweedfs/seaweedfs

Yep, and that's fine with me. I don't have a problem with basic auth.

vineyardmike1y ago

This is not intended for commercial services. Realistically, this software was made for people who keep servers in their basement. The security profile of LAN users is very different than public AWS.

TechDebtDevin1y ago· 5 in thread

SeaweedFS is great as well.

n_ary1y ago

Edit: memory = RAM

TechDebtDevin1y ago

https://github.com/cycneuramus/seaweedfs-docker-swarm/blob/m...

crest1y ago

Are you claiming that SeaweedFS requires twice as much RAM as the sum of the sizes of the stored objects?

evanjrowley1y ago

Looks awesome. Been looking for some flexible self-hosted WebDAV solutions and SeaweedFS would be an interesting choice.

genewitch1y ago

neon_me1y ago· 5 in thread

What's the motivation behind a project like this one?

We've got ceph, minio, seaweedfs ... and a dozen others. I am genuinely curious what the goal is here.

rakoo1y ago

So Garage is not "simply" an S3 storage system: it is a system to store blobs in an unreliable but still trusted consumer-grade network of passable machines.

koito171y ago

Minio assumes each node has identical hardware. Garage is designed for use-cases like self-hosting, where nodes are not expected to have identical hardware.

Minio doesn't, it has bucket replication and it works okay.

WhereIsTheTruth1y ago

performance, therefore cheaper

iscoelho1y ago

not just about cost! improved performance/latency can let workloads that previously required a local SSD/NVMe actually run on distributed storage or an object store.

it cannot be overstated how slow Ceph/Minio/etc can be compared to local NVMe. there is plenty of room for improvement.

CyberDildonics1y ago· 4 in thread

What is the difference between a "distributed object storage" and a file system?

vineyardmike1y ago

It’s an S3 api compatible object store that supports distributed storage across different servers.

Object store = store blobs of bytes. Usually by bucket + key accessible over HTTP. No POSIX expectation.

Distributed = works spread across multiple servers in different locations.

CyberDildonics1y ago

> store blobs of bytes

Files

> by bucket

Directories

> key accessible

File names

> over HTTP

Web server

crest1y ago

crabbone1y ago

There are a few.

There are some non-storage functions that are performed only by filesystems, but not object storage. For example, suid bits.

makkesk81y ago· 3 in thread

We saw about 20-30x performance gain overall after moving to garage for our specific use case.

sandGorgon1y ago

quick question for advice - we have been evaluating minio for an in-house deployed storage for ML data. this is financial data, for which we have to comply with a crap ton of regulations.

so we wanted lots of compliance features - like access logs, access approvals, short lived (time bound) accesses, etc etc.

how would you compare garage vs minio on that front?

withinboredom1y ago

You will probably put a proxy in front of it, so do your audit logging there (nginx ingress mirror mode works pretty well for that).