
Fake S3 – Save time, money, and develop offline

blog.getspool.com

354 points | jubos | 14y ago | 55 comments


Fluxx14y ago

In my opinion, having to replicate S3 in development and test isn't the best idea. There are a few problems I see: you have tied yourself to S3's API; you must maintain this "other" S3 by making sure it behaves like the real S3; and your test and development code never actually hits the real API you're using...until staging or production.

There are a few better strategies I can see here:

1. For test, use something like VCR[1] to record real HTTP interactions with the real S3 API during first test runs, serialize them to disk, and then replay them later.

2. Go the more OO route and create an internal business object with a defined interface that handles persistence of your objects. You could have an S3Persister for production and staging, but then you can create a LocalDiskPersister or even MemoryPersister for tests. Hell, you can even keep your own S3 and create OurS3Persister as well. The main point here is that your application code is coded to one API/interface - the "persister" - and you can easily swap in different persisters for different reasons. All the individual persisters can then have their own tests that guarantee they adhere to the Persister interface and do their own individual things correctly.

3. Mock out the calls to your S3 library. It's the job of the library to provide an API interface for you as the application developer to S3, so you can mock out those API calls and trust the library works and is doing the right thing. Since you're mocking things out, you should still have integration tests with the real S3 to verify everything is working, but for quick unit tests mocking works great.

The blog post mentioned they had GB of data, so YMMV on these ideas, but these are strategies I and others have used in the past when dealing with APIs like S3 and they work great.

[1] https://github.com/myronmarston/vcr
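Strategy 2 above might look roughly like the following. This is a minimal Python sketch, not code from the post or from any library: the class names `Persister`, `MemoryPersister`, and `LocalDiskPersister` echo the comment, but the `put`/`get` signatures and the `check_round_trip` helper are assumptions chosen for illustration. An S3-backed persister would implement the same two methods.

```python
import os
from abc import ABC, abstractmethod


class Persister(ABC):
    """Hypothetical interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryPersister(Persister):
    """Keeps objects in a dict; intended for fast unit tests."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class LocalDiskPersister(Persister):
    """Stores each object as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self._root = root

    def _path(self, key):
        return os.path.join(self._root, key)

    def put(self, key, data):
        path = self._path(key)
        # Keys such as "images/cat.jpg" need their parent directory created.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


def check_round_trip(persister: Persister) -> bool:
    """Shared contract check that every persister should pass."""
    persister.put("images/cat.jpg", b"fake image bytes")
    return persister.get("images/cat.jpg") == b"fake image bytes"
```

Running the same `check_round_trip` against each implementation is one way to get the "tests that guarantee they adhere to the Persister interface" that the comment describes.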

jubosOP14y ago

Excellent points.

We work on the idea of different stages in the test and development pipeline. At different stages mock objects make sense, and at other stages having something like Fake S3 makes more sense.

For testing, the first stage would be unit testing. At that stage it is best to mock out your S3 interactions (with something like VCR or WebMock) and use an OO approach to wrap your persistence, so you could swap out S3 with another persistence engine without breaking APIs.

The second stage for us is integration testing where you might have multiple machines testing across the network. In this situation, I think it is great to have real network requests happening rather than mock requests. Also you can deal with real files (especially important with media files like images and video).

The last stage is taking out Fake S3 and using a true S3 connection to ensure that everything does work on a production environment (cuz Fake S3 could be faking you out, especially on things like authentication and versioning). We do that by launching a stage cluster and running a set of integration tests on that before doing a production release. Ideally, the first and second stages catch any errors before you start doing tests against the real AWS services.

As for the development pipeline, being able to work with real assets while you are making mobile or web interfaces is really useful. So is simulating latency to see how interfaces respond under a slow network connection, which would be difficult to truly mock.

Fluxx14y ago

Awesome, thanks for the extra info. I think your setup sounds really good :)

mb2214y ago

it's fine to mock, but for serious s3 users you need to emulate the exact behavior or you are setting up your users for failure.

Negitivefrags14y ago

Isn't this just a method of implementation for your option 3? I don't really see a substantial difference between mocking the server and mocking the API.

The same caveat still applies about needing an integration test with the real S3 in either case.

Fluxx14y ago

> Isn't this just a method of implementation for your option 3? I don't really see a substantial difference between mocking the server and mocking the API.

The short answer is that there is no difference. Just as you could mock out a call to S3API.get(object_id) and have it return my_object, you could write a server that responds to the S3 API call for getting object_id.

The long answer is that using mocks is a lot quicker to develop, easier, more straightforward, and faster at run time than maintaining a real runnable copy of S3 that behaves exactly the same as the real S3. With the fake S3 you're still spending CPU cycles inside your S3 client while it talks HTTP with your fake S3, which slows unit tests down a lot. Plus fake S3 may have slightly different behavior when your S3API library interacts with it, which could lead to really hard to track down bugs later on. Trusting your APIs is what unit testing is all about.
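The mocking approach can be sketched with Python's standard `unittest.mock`. The comment's `S3API.get(object_id)` is itself pseudocode, so the `s3_api` client, its `get` method, and the `load_avatar` function below are hypothetical names used only for illustration, not any real library's API.

```python
from unittest.mock import Mock


def load_avatar(s3_api, object_id):
    """Application code under test: fetches an object and summarizes it."""
    body = s3_api.get(object_id)
    return {"id": object_id, "size": len(body)}


def test_load_avatar_without_network():
    # Stand-in for the hypothetical S3 client; no HTTP request is made.
    s3_api = Mock()
    s3_api.get.return_value = b"my_object"

    result = load_avatar(s3_api, "avatars/42")

    # Check how the client was called, then hand back the result.
    s3_api.get.assert_called_once_with("avatars/42")
    return result
```

Nothing here exercises the real service, which is why the thread keeps returning to the need for a separate integration test against real S3.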


ben104014y ago

I had to do some work on an S3-backed project while out at sea on a cruise ship a few months ago (let's save the discussion about working on vacation for the 501 developer thread).

Thanks to git I was able to spool up my commits and then push when I pulled into port and had cellular access, but I wasn't really able to do everything I wanted with the paperclip-backed models without reliable/cheap network access.

An offline emulation mode for S3 sounds pretty nice, thanks for this!

mr_luc14y ago

On my last project, we used Dragonfly. Holy cow -- trivial to switch between file-backed and s3-backed storage in the various environment config files.

It was a lifesaver, because the wifi at the place I was couch-surfing was a little spotty.

andrewflnr14y ago

This dragonfly? https://github.com/markevans/dragonfly

It's kind of hard to google for "dragonfly".


justinsb14y ago

I'd recommend installing OpenStack's Swift component (S3 equivalent) and evaluating that as well. You can run it on one node for development purposes, you can scale it up if you want private object storage on your network, and many public clouds are offering it: Rackspace Cloud Servers, HP Cloud, AT&T, Korea Telecom, Internap, etc.

Wikipedia use OpenStack Swift to store their images, and have some good presentations on this.

jubosOP14y ago

Swift is a very powerful piece of technology, but it is also more involved to set up. Curious to try RiakCS as well and see how it compares to Swift for running production level S3 object storage.

nl14y ago

http://devstack.org has a script to deploy OpenStack in two lines (git clone the repo, then run the script)

justinsb14y ago

Looks like I need to post my blog post about how to set up Swift really easily!


StavrosK14y ago

Isn't RiakCS an online service, like S3 itself? That's what I understand from the basho page...

DenisM14y ago

How about failure simulations? Also, S3 has eventual consistency, so a read can miss a recent write. Frequently injecting errors and consistency issues would make this very helpful.

jubosOP14y ago

Great idea. I like the idea of a command line flag (like the rate limit flag) to run it with a percentage failure rate or something along those lines.
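One way to picture the percentage-failure idea is a wrapper that fails a configurable fraction of calls. This is a hypothetical Python sketch, not a Fake S3 feature or flag: `FlakyStore`, `failure_rate`, and `InjectedFailure` are invented names, and the random generator is injectable so that test runs can be made repeatable.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real response to simulate a server error."""


class FlakyStore:
    """In-memory store that fails a fraction of its operations."""

    def __init__(self, failure_rate, rng=None):
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self._failure_rate = failure_rate
        self._rng = rng or random.Random()
        self._objects = {}

    def _maybe_fail(self):
        # random() returns a float in [0.0, 1.0), so a rate of 0 never
        # fails and a rate of 1 always fails.
        if self._rng.random() < self._failure_rate:
            raise InjectedFailure("simulated failure")

    def put(self, key, data):
        self._maybe_fail()
        self._objects[key] = data

    def get(self, key):
        self._maybe_fail()
        return self._objects[key]
```

Passing `rng=random.Random(1234)` (any fixed seed) would make the sequence of injected failures the same on every run, which helps when a failure exposes a bug that needs to be reproduced.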

fennecfoxen14y ago

I'm mildly surprised you have in-application bandwidth limits instead of setting up clever firewall rules on your local box. (Latency in particular is a fun thing to add.)

jubosOP14y ago

I wanted a cross platform way to test slow connections with a single command line parameter. Whether it be Linux, FreeBSD, or OSX (maybe Windows (haven't tested :-P)), it is easy to set up.

iptables or putting nginx with rate limits in front of Fake S3 would be a more powerful approach, but also harder to get going.
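The in-application approach amounts to pausing between chunks so the average transfer rate stays under a limit. The following is a rough Python sketch of that idea, not Fake S3's actual implementation: `throttled_copy`, its parameters, and the default chunk size are assumptions, and `sleep` is injectable so the pacing can be checked without actually waiting.

```python
import time


def throttled_copy(src, dst, bytes_per_second, chunk_size=1024, sleep=time.sleep):
    """Copy src to dst, pausing after each chunk to approximate a rate limit."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # Pause for as long as this chunk would take at the target rate.
        sleep(len(chunk) / bytes_per_second)
    return total
```

This ignores the time spent in `read` and `write` themselves, so the real rate ends up somewhat below the limit; a firewall or proxy rule, as suggested above, gives finer control at the cost of more setup.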

EricR2314y ago

Why not just change the storage strategy to saving files locally while in your test environment? Fog lets you do this easily with its configuration options.

dennyabraham14y ago

I can only imagine this is specific to scenarios where you have to manipulate s3 objects directly and the fog::storage abstraction used for s3 isn't adequate (though I could not give a specific example of such a scenario).

hrabago14y ago

I did this on a smaller scale within our SOA environment. We're told our DEV must connect to everyone else's DEV. The problem is everybody's DEV is unstable, because by nature, everything deployed there is a work in progress. If someone's service goes down, it can prevent me from testing and block my progress.

So early on I developed a mock web service which could serve mock data based on the service I was calling. As a result, I always knew what data was available, had coherent data (foreign keys across systems were always valid), and whenever I needed to, I could bring a system down and test my own system's rigidity and error messages. It was great. And then we reengineered all the systems and everything changed.

RandallBrown14y ago

This has little to do with the contents of the article, but I found it interesting.

"For development, each engineer runs her own instance of Fake S3 where she can put gigabytes of images and video to develop and test against, and her setup will work offline because it is all local."

Is spool a team of all women engineers? (I'm just curious as to whether or not that's true because it's so rare. I don't want to turn this into a weird opposite day version of the sexism in computer science debate.)

anadiazhernandz14y ago

Female pronouns are used in many contexts to counteract the too-common use of male pronouns. I assume jubos was doing this because he knows the tech-world is flooded with male pronouns.

dlgeek14y ago

I have no problem with this in general... but for the love of god, please don't do what a coworker of mine did recently. In a threat model document, he had A sending HIS messages to B, using HER public key. I automatically translated the names to Alice and Bob (like anyone else would in a threat model), hit the pronouns, and my brain segfaulted.

eli14y ago

The English language lacks an appropriate word for this situation (singular personal pronoun that could apply to either gender). Many people feel using "he" to refer to both genders is sexist and will instead rephrase the sentence or use "they" as a singular (which is controversial). Using "she" is just another option. See http://en.wikipedia.org/wiki/Singular_they#Gender-neutral_la...

piggity14y ago

I much prefer gender neutral language to using "she" or "he".

Unless of course we are talking about a specific example like "John foolishly unscrewed the cap on his radiator while the engine was running".

In which case I would generally try and mix the gender across examples.

"Jane stabbed herself in the eye with a pencil when she was rear-ended by John on the way to the hospital"


arms14y ago

Maybe, maybe not, either way I don't think that's why it was phrased that way. I have no idea of the internals of the team, but I've seen the female terms used instead of the male terms more frequently when referencing software engineers than in other fields. It's just how some people write it.

Drbble14y ago

It's an unintended consequence of the stereotype that all women engineers are imaginary.

bdonlan14y ago

No license file? It's difficult to use software like this in many organizations if the licensing situation isn't clear...

jubosOP14y ago

Good point. I will put an MIT License file in there shortly.

deepakprakash14y ago

Talk about the timing!

We currently have a setup that needs S3 access to reliably develop/test the app I'm currently working on and I had just sat down planning to remove this dependency, since I will be on the road the next week or so.

This will save me a bunch of time immediately and probably some money later on. Thanks!

j4514y ago

While the triviality of maintaining a sync between the spec and functionality of a Fake and real S3 will show in time, I think this is a neat idea.

One thought that comes to my mind is if I could get away with building entire apps using this and spin it off to S3 where/if it's needed.

japherwocky14y ago

there was a python implementation of something like this in tornado (s3server and s3client), though now I don't see it. Anyone follow that project and know what happened to it?

dtwwtd14y ago

I think what you're talking about is one of Tornado's demos - not sure how current it is though.

https://github.com/facebook/tornado/tree/master/demos/s3serv...

zackattack14y ago

if anyone wants to port the Ruby script to python, put it on kickstarter and ill start u off with $50. i want a link to my webpage, CompassionPit, on the page for the final tool though. and one on the kickstarter page if ur feeling generous =)

p.s. anyone think we need a developer-tools kickstarter? paul, lmk before i give it to my friends at tech stars


kellysutton14y ago

Simple, elegant, awesome.

Have you tested it against paperclip?

jubosOP14y ago

Thanks! I haven't, but let me know if it works.

mikebabineau14y ago

A similar tool is available for SDB:

https://github.com/stephenh/fakesdb

deutronium14y ago

Could you use Eucalyptus for this?

LauriL14y ago

You could, but setting up Eucalyptus is a lot of work, compared to Fake S3, a Ruby tool.

js4all14y ago

I think so. Walrus is S3 API compatible.

sparknlaunch1214y ago

Great concept. Bandwidth cost savings are a big plus.

hashfold14y ago

great concept. will use it this weekend.
