Write operations (write, writev, pwrite, pwritev) are not currently supported. In the future, Mountpoint for Amazon S3 will support sequential writes, but with some limitations:
Writes will only be supported to new files, and must be done sequentially.
Modifying existing files will not be supported.
Truncation will not be supported.
The sequential requirement for writes is the part I've been mulling over, since I'm not sure it's actually required by S3. Last year I discovered that S3 can do transactional I/O via multipart upload[2] operations combined with the CopyObject[3] operation. This should, in theory, allow for out-of-order writes, re-use of existing partial objects, and file appends.
[1] https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMAN...
[2] https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuove...
[3] https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObje...
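To make the part-copy idea concrete: reusing existing object bytes means pointing each part at an inclusive byte range of the source object, so the client has to plan those ranges up front. A minimal sketch of that planning step (the 8 MiB part size and the function name are my own assumptions, not anything from Mountpoint):

```python
def plan_copy_ranges(object_size, part_size):
    """Split an object into part-sized byte ranges of the form "bytes=first-last"
    (inclusive), as used when copying existing bytes into multipart-upload parts."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # inclusive last byte
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges

# Example: a 20 MiB object with 8 MiB parts yields three ranges.
MIB = 1024 * 1024
print(plan_copy_ranges(20 * MIB, 8 * MIB))
```

In a real implementation, each range would back one server-side part copy, with the modified regions uploaded as fresh parts before completing the multipart upload.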
What I did is:
1. Create 10000 files, each of 1MB size, so that the total usage is 10GB.
2. Mount each file as a loopback block device using `losetup`.
3. Create a RAID device over the 10000 loopback devices with `mdadm --build --level=linear`. This RAID device appears as a single 10GB block device. `--level=linear` means the RAID device is just a concatenation of the underlying devices. `--build` means that mdadm does not store metadata blocks in the devices, unlike `--create`, which does. Metadata blocks would use up a significant portion of the 1MB device size; I don't really need mdadm to "discover" this device automatically; and the metadata superblock doesn't support 10000 devices anyway (the max is 2000, IIRC).
4. From here the 10GB block device can be used as any other block device. In my case I created a LUKS device on top of this, then an XFS filesystem on the top of the LUKS device, then that XFS filesystem is my backup directory.
So any modification of files in the XFS layer eventually results in some of the 1MB blocks at the lowest layer being modified, and only those modified 1MB blocks need to be synced to the WebDAV server.
(Note: SI units. 1KB == 1000B, 1MB == 1000KB, 1GB == 1000MB.)
One caveat is that my 1MB (actually 999936B) block devices have 1953 sectors (999936B / 512B), but mdadm had silently used only 1920 sectors from each. In my first attempt at replacing mdadm with dm_linear I used 1953 as the number of sectors, which led to garbage when decrypted with dm_crypt. I discovered mdadm's behavior by inspecting the first two loopback devices and the RAID device in xxd. Using 1920 as the number of sectors fixed that, though I'll probably just nuke the LUKS partition and rebuild it on top of dm_linear with 1953 sectors each.
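The 1953-vs-1920 discrepancy is consistent with mdadm rounding each linear member down to a multiple of its rounding factor, which I believe defaults to 64 KiB for linear arrays (an assumption worth checking against your mdadm version). A quick sanity check of the arithmetic:

```python
SECTOR = 512
dev_bytes = 999_936                  # actual size of each "1MB" backing file
sectors = dev_bytes // SECTOR        # sectors per loop device
rounding = (64 * 1024) // SECTOR     # assumed linear-mode rounding: 64 KiB = 128 sectors
used = sectors - sectors % rounding  # sectors mdadm actually uses per member
print(sectors, used)                 # 1953 1920
```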
Did you run into any problems with discard/zeroing/trim support?
This was a problem with sshfs — I can’t change the version/settings on the other side, and files seemed to simply grow and become more fragmented.
I suspected WebDAV and Samba might have been the solution, but I never looked into it since sshfs is so solid.
I’m a little disappointed that this library (which is supposed to be “read optimized”) doesn’t take advantage of S3 Range requests to optimize read after seek. The simple example is a zip file in S3 for which you want only the listing of files from the central directory record at the end. As far as I can tell this library reads the entire zip to get that. I have some experience with this[1][2].
[1] https://github.com/mlhpdx/seekable-s3-stream [2] https://github.com/mlhpdx/s3-upload-stream
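For the zip example, the point of a Range request is that the listing only needs the end-of-central-directory (EOCD) record, which for an archive without a comment is the last 22 bytes. A sketch of what a seek-aware reader does, demonstrated on an in-memory zip (the helper names are mine; real usage would pass the header value to an S3 GetObject call):

```python
import io
import struct
import zipfile

EOCD_SIZE = 22  # minimal end-of-central-directory record (zero-length comment)

def suffix_range(n):
    """Range header value asking for only the last n bytes of an object."""
    return f"bytes=-{n}"

def parse_eocd(tail):
    """Parse a minimal EOCD record; returns (entry_count, cd_size, cd_offset)."""
    (sig, _disk, _cd_disk, _disk_entries,
     entries, cd_size, cd_offset, _comment_len) = struct.unpack("<IHHHHIIH", tail[-EOCD_SIZE:])
    if sig != 0x06054B50:
        raise ValueError("no EOCD at end of data (archive may have a comment)")
    return entries, cd_size, cd_offset

# Demo: listing two files needs only the trailing bytes, not the whole archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a.txt", "hello")
    z.writestr("b.txt", "world")
data = buf.getvalue()
print(suffix_range(EOCD_SIZE))           # bytes=-22
print(parse_eocd(data[-EOCD_SIZE:])[0])  # -> 2
```

The EOCD gives the central directory's offset and size, so a second Range request for exactly that span yields the full file listing.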
EDIT: In my personal experience with S3 it’s always been super slow.
Ceph was also released many years after S3 was released. And I've never seen a highly performant 9P implementation come anywhere close to even third party S3 implementations. There was nothing for Amazon to copy. That's why everyone else copied Amazon, instead.
It's not the most insanely hyper-optimized thing from the user POV (HTTP, etc) and in the past some semantics were pretty underspecified e.g. before full consistency guarantees several years ago, you only got "read your writes" and that's it. But it's not that hard to see why it's popular, IMO, given the historical context and use cases. It's hard to beat in the average case for both ease of use and commitment.
1. It offered a resilient key/object store over HTTP.
2. By the standards of the day for bandwidth and storage it was (and to a certain extent still is) very inexpensive.
Since then much of AWS has been built on the foundation of S3 and so its importance has changed from merely being a tool to basically a pervasive dependency of the AWS stack. Also, it very much is designed for objects larger than 1KB and for applications that need durable storage of many, many large objects.
The key benefit, at least according to AWS marketing, is that you don't have to host it yourself.
- Absurdly cheap storage
- Extremely HA
- Absurdly durable
- Effectively unlimited bandwidth
- Effectively unbounded storage without reservation or other management
- Everything supports its API
It’s not a file system. It’s a blob store. It’s useful for spraying vast amounts of data into it and getting vast amounts of data out of it at any scale. It’s not low latency, it’s not a block store, but it is really cheap and the scaling of bandwidth and storage and concurrency make it possible to build stuff like snowflake that couldn’t be built on Ceph in any reasonable way.
[1] https://martinfowler.com/articles/patterns-of-distributed-sy...
Using something like FSx [1] gives you a performant option for the use cases when the tooling involved prefers filesystem semantics.
1. Cost. It might vary depending on the vendor, but S3 is generally much cheaper than block storage, while still offering some welcome guarantees (like three copies of your data).
2. Pay for what you use.
3. Very easy to hand a URL off to a client rather than running some kind of file server. Also works with uploads, AFAIR.
4. Offloads traffic. Big files are often the main source of traffic on many websites. Using S3 removes that burden, and S3 is usually served by multiple servers, which further increases speed.
5. Provider-independent. I think every mature cloud offers an S3-compatible API.
I think there are more reasons (encryption, multi-region, and so on), though I didn't use those features. Of course you can implement everything with your own software, but reusing a good implementation is a good idea for most projects. You don't rewrite Postgres, so you don't rewrite S3.
Numbers? It's been a while, but in my experience it's in the 50ms latency range. That's fast enough that you can do most things. Your page loads might not be instant, but 50ms is fast enough for a wide range of applications.
The big mistake I see, though, is a lack of connection pooling: I find code going through the entire TCP connection setup and TLS setup just for a single request, tearing it all down, and repeating. boto also encourages some code patterns which result in GET Bucket or HEAD Object requests you don't need and can avoid; none of this gives you good latency.
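The fix is to create the client once per process and reuse it, so the underlying connection pool stays warm. A sketch of that pattern, with a counting stub standing in for the real client factory (e.g. `boto3.client("s3")` — an assumption; any client with an internal connection pool behaves similarly):

```python
import functools

def shared(factory):
    """Wrap a zero-arg client factory so it runs once; later calls reuse the client."""
    return functools.lru_cache(maxsize=None)(factory)

constructions = []
get_s3 = shared(lambda: constructions.append(1) or object())  # stub for boto3.client("s3")

client_a = get_s3()
client_b = get_s3()
assert client_a is client_b     # same client, same pooled connections
assert len(constructions) == 1  # TCP/TLS setup paid once, not per request
```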
Other protocols you mentioned, including NFS, do not work well over the internet. Some of them are designed exclusively to work within the same network, or are very sensitive to network latency.
S3 and DynamoDB are essentially a decoupled BigTable, in that both are KV databases: one is used for high-performance, small-object workloads; the other for high-throughput, large-object workloads.
Amazon should really just fix the underlying issue of semantics by providing a PatchObjectPart API call that overwrites a particular multipart upload chunk with a new chunk uploaded from the client. CopyObjectPart+CompleteMultipartUpload still requires the client to issue CopyObjectPart calls for the entire object.
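To make the overhead concrete: rewriting a single part still forces one server-side copy call per unchanged part. The sizes below are illustrative, not from the S3 docs:

```python
MIB = 1024 * 1024
object_size = 1024 * MIB  # hypothetical 1 GiB object
part_size = 8 * MIB
total_parts = object_size // part_size
changed = 1                          # only one part actually has new bytes
copy_calls = total_parts - changed   # server-side copies of parts that didn't change
print(total_parts, copy_calls)       # 128 127
```

A PatchObjectPart-style call would collapse those 127 copies into zero.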
Azure has a feature where you can mount a blob storage container into a container/VM; is this possibly aiming to match that feature?
I definitely think people should stop trying to pretend S3 is a file system and embrace what it's good at instead, but I have had many times when having an easy and fast read-only view into an S3 bucket would be insanely useful.
Some bad ideas work extremely well if they fit your use case, you understand very well the tradeoffs and you’re building safeguards (disaster recovery).
Some other companies try to convince (force?) you into a workflow or into a specific solution. Aws just gives you the tools and some guidance on how to use them best.
Later I used Panic's Transmit Disk but they removed the feature.
Recently I'd been looking at s3fs-fuse to use with gocryptfs but haven't actually installed it yet!
We've had to perform occasional maintenance, but it's operated for years with no major issues. 99% of problems are solved with a server restart plus a startup script that auto-re-mounts s3fs-fuse in all the appropriate places.
Give them a try, I recommend it!
BTW, Panic seemingly intends to re-build Transmit Disk. Hopefully it'll be part of Transmit 6: https://help.panic.com/transmit/transmit5/transmit-disk/#tec...
A supported macOS option appears to be Mountain Duck: https://mountainduck.io/
For example I'm not sure what they're doing here:
https://github.com/awslabs/mountpoint-s3/blob/main/mountpoin...
I've had really good luck with `gcloud storage ...` though, which takes essentially the same CLI args. It's much faster and IIRC written in golang.
It's considered a good replacement for C++ and, like Go, is really good for releasing tools: tools that work great when you can plop a single exe down as your install story, as opposed to, say, a Python install and app (like the AWS CLI).
But it's also still new. Releasing a tool like this is likely a big deal in the area, and they're likely quite proud of it, given the effort of things like getting legal approval, marketing, etc., let alone the cool nerd factor of a filesystem. Who doesn't want to show off by having written a filesystem, or hell, a FUSE plugin... file system over DNS, anyone?
Disclaimer: work for MSFT
[0] https://github.com/Azure/azure-storage-fuse
[1] https://learn.microsoft.com/en-us/azure/storage/blobs/data-l...
Could replace `goofys` with this and then stick `catfs` in front.
Not having used s3fs, I'm going to guess that s3fs is limited by its underlying language (Python): namely, poor overall performance and a poor multi-threading story.
I'd imagine s3fs is useful for stuff like backing up personal projects, quickly sharing files between developers etc.
For operating at any kind of scale - in terms of concurrent requests, number or size of files etc - I'd guess that Mountpoint would be the only viable solution.