> If OP actually has 10PB on S3 currently, the OP may want to fallback to leaving the existing data on S3 and accessing new data in the new location.
Another option would be to leave data on S3, store new data locally, and proxy all S3 download requests, i.e., all requests go to the local system first. If an object is only on S3, download it, store it locally, then pass it on to your customer. That way your data will gradually migrate away from S3. Of course you can speed this up to any degree you want by copying objects from S3 without waiting for a customer request.
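A minimal sketch of that read-through idea, in Python. Everything here is an assumption for illustration: the `ReadThroughProxy` class, its `capacity_bytes` limit, and the plain dicts standing in for S3 and the local store are hypothetical, not a real S3 client.

```python
from typing import Dict, Optional


class ReadThroughProxy:
    """Serve from the local store when possible; otherwise fetch from the
    remote store and keep a copy. Dicts stand in for S3 and local storage."""

    def __init__(self, remote: Dict[str, bytes], local: Dict[str, bytes],
                 capacity_bytes: int) -> None:
        self.remote = remote                  # hypothetical stand-in for S3
        self.local = local                    # hypothetical stand-in for local servers
        self.capacity_bytes = capacity_bytes  # assumed local capacity limit

    def _local_used(self) -> int:
        # Total bytes currently held locally.
        return sum(len(v) for v in self.local.values())

    def get(self, key: str) -> Optional[bytes]:
        data = self.local.get(key)
        if data is not None:
            return data                       # local hit: no remote request needed
        data = self.remote.get(key)
        if data is None:
            return None                       # object exists nowhere
        # Keep a local copy only if there is room; otherwise skip the store.
        if self._local_used() + len(data) <= self.capacity_bytes:
            self.local[key] = data
        return data
```

Each customer read that misses locally pulls one more object over, so the local store fills up at the pace of real traffic.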
An advantage of doing this is that you can phase in your solution gradually, for example:
Phase 1: direct all requests to local proxies, always get the data from S3, send it to customers. You can do this before any local storage servers are set up.
Phase 2: configure a local storage server, still send all requests to S3, and store the S3 data locally before sending it to customers. If the local storage server is full, skip the store.
Phase 3: still send every request to S3. If the local servers also have the data, verify that the local copy matches the S3 copy, then send it to the customer.
Phase 4: if local servers have the data, send it w/o an S3 request. If not, make the S3 request, store the data locally, and send it.
Phase 5: store new data both locally and on S3
At this point you are still storing data on S3, so it can be considered your master copy and your local copy is basically a cache. If you lose your entire local store, everything will still work, assuming your proxies work. For the next phase, your local copy becomes the master, so you need to make sure backups, replication, etc. are all working before proceeding.
Phase 6: start storing new content locally only.
Phase 7: as a background maintenance task, start sending list requests to S3. For objects that are stored locally, issue S3 delete requests, biggest objects first, at whatever rate you want. If an object isn't stored locally, make a note that you need to sync it sometime.
Phase 8: using the sync list, copy S3 objects locally, biggest objects first, and remove them from S3.
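The background drain in the last two phases could look roughly like this. It is only a sketch under assumptions: `plan_drain`, `drain`, and the `max_ops` rate limit are hypothetical names, dicts again stand in for S3 and the local store, and the byte-for-byte check before deleting is an extra safety step borrowed from Phase 3.

```python
from typing import Dict, List, Set, Tuple


def plan_drain(remote_sizes: Dict[str, int],
               local_keys: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a remote listing into (delete list, sync list), biggest first."""
    by_size = sorted(remote_sizes, key=lambda k: remote_sizes[k], reverse=True)
    to_delete = [k for k in by_size if k in local_keys]    # already held locally
    to_sync = [k for k in by_size if k not in local_keys]  # must be copied first
    return to_delete, to_sync


def drain(remote: Dict[str, bytes], local: Dict[str, bytes], max_ops: int) -> int:
    """Run one maintenance pass, removing at most max_ops objects from remote."""
    sizes = {k: len(v) for k, v in remote.items()}          # the "list request"
    to_delete, to_sync = plan_drain(sizes, set(local))
    ops = 0
    for key in to_delete:
        if ops >= max_ops:
            return ops
        if local[key] == remote[key]:   # assumed safety check: delete only verified copies
            del remote[key]
            ops += 1
    for key in to_sync:
        if ops >= max_ops:
            return ops
        local[key] = remote[key]        # copy locally, then remove from remote
        del remote[key]
        ops += 1
    return ops
```

Calling `drain` repeatedly with a small `max_ops` gives you the "whatever rate you want" knob, and going biggest first means the S3 footprint shrinks fastest early on.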
The advantage IMO is that it's a gradual cutover, so you don't have to have a complete, perfect local solution before you start gaining experience with new technology.