Since storage constantly gets cheaper, 100GB first stored in 2001 can be stored on updated media for a fraction of that original cost in 2024.
I think I read this quote on Tim Bray's blog[0], but I am not sure anymore. This is now my approach: my short/middle-term archival is designed to be easily transferred to the next short/middle-term store on a regular basis. I started with 500GB drives, now I am at 14TB.
Even the floppies were a step up from paper tape - the older guys used to have a cupboard of paper tapes on coathangers, and linked their code by feeding the tapes through the reader in the right order.
Kids these days etc :)
Back in the 90's to 00's a friend had a collection of cd's that he'd written, but he stored them in a big sleeved folder container. The container itself caused them to warp slightly, which made them unusable.
I took a few for testing and managed to unbend them after some time, which turned them back into a working state.
[Note: That's the most apostrophes I've ever used in a sentence, it feels dirty]
Tape also has a problem shared with hard disks: to achieve high density we've rapidly hit a stage where the technology is too complex to enable data archaeology at some point in the future; 90s-era complexity hard drives are about where the archaeological limit is. LTO-1 may even be beyond that complexity compared to DLT, QIC or even Data8 (helical scan may be too much of a spanner in the works).
Modern polymers may make microfiche/microfilm a longer term solution than it has been in the past with acetate film/slides, but I'm not sure how much research has been done into which polymers might be best.
For anything longer than a century, our best experience thus far is with clay and paper (assuming good-quality acid-free paper, rather than cheap modern consumer paper).
Assuming you store your own players, and have a converter from USB to whatever exists in fifty years, is that a real solution?
What killed the last one was an experiment with installing Emby. Like many similar systems, it bewilderingly has no rate-limiting function, and will thrash a disk to within an inch of its life in order to index it. And that was the most recent thing that killed one of my external Plex drives, with multiple series and movies on it.
So yes, just keep refreshing the media, at reasonable intervals.
PS Yes, I know this is a poor method of content storage. NAS is looming up for me one of these days.
AFAIK large data centers automate something like this.
I look forward to the first time logs from a few decades ago are required, and the media is absolutely dead.
EDIT: they weren’t even Azo dye, they were phthalocyanine. A decade was probably generous.
https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
Would love to help, too bad those logs disintegrated decades ago.
Good Luck!!!!
The aforementioned optical media storage was specific to the nuclear reactor and electric plant; I think everyone else’s were stored differently. Not positive.
EDIT: sibling comment below mentions performance data. Yes, that too. I graphed (nuclear) fuel consumption on one underway, and was surprised to find it didn’t match expected. My Captain was also surprised, and thrilled, because it meant he got to be more important (fair enough; who doesn’t want to be listened to by their boss?)
I noted in another comment that the National Archives say only "deck logs" are retained permanently, and it looks like this site lists what they contain: https://www.history.navy.mil/content/history/nhhc/research/a..., which includes all kinds of things.
Stuff like "Actions [combat]", "Appearances of Sea/Atmosphere/Unusual Objects", "Incidents at Sea", "Movement Orders", "Ship's Behavior [under different weather/sea conditions]", "Sightings [other ships; landfall; dangers to navigation]" seem like they'd be useful for history and other kinds of research.
Stuff like "Arrests/Suspensions", "Courts-Martial/Captain's Masts", "Deaths" seem like the kind of legal records that are typically kept permanently.
Stuff like "Soundings [depth of water]" were probably historically useful for map-making.
In the case of a ship (or sub), I'd assume that they'd rotate optical media archives off the vessel every year or two and transfer them to some central database. After all, a vessel can be lost and the data is also useful in the aggregate.
Sounds nice and healthy
1. Incomplete copies with missing dependencies.
2. Old software and their file formats with a poor virtualization story.
3. Poor cataloging.
4. Obsolete physical interfaces, file systems, etc.
5. Long-term cold storage on media neither proven nor marketed for the task.
Managing archives is just a cost center until it isn't, and it's hard to predict what will have value. The worst part of this is that TFA discusses mostly music industry materials. Outside parties and the public would have a huge interest in preserving all this, but of course it's impossible. All private, proprietary, copyrighted, and likely doomed to be lost one way or another.
Oh well.
It broke my heart seeing those librarians in disbelief when their national library was sold off to the highest bidder. When they said "It seems our country does not value our own culture anymore".
Books lasted hundreds of years. Good luck trying to read a floppy from the 90s, or even DVDs that are already beyond their lifetime and are a very recent medium.
It gets worse when you read the fine print of the SSD specifications, wherein they state that an SSD may lose all its data after 2 weeks without power, and data retention rates are at less than 99%, meaning they will degrade after the first year of use. And don't get me started on SMR HDDs, I lost enough drives already :D
Humanity has a backup problem. We surely live in Orwellian times because of it.
The way I remember it, if you tried to read a floppy from the very early 90s, or from the 80s, you'd probably have no trouble at all, even many years later. You can probably still read floppies from the 80s without issue.
However, if you tried to read a floppy from the late 90s, or the 2000s, even when the floppy was new, good luck! The quality of floppy disks and drives took a steep nose-dive sometime in the 90s, so even brand-new ones failed.
https://images.samsung.com/is/content/samsung/assets/pl/memo...
Damn. 3 months for my SSD.
https://en.wikipedia.org/wiki/Linear_Tape-Open#Generations
LTO-1 started in 2000 and the current LTO-9 spec is from 2021. But it only has backwards compatibility for 1 to 2 generations. You can't read an LTO-6 tape in an LTO-9 drive.
https://en.wikipedia.org/wiki/Sticky-shed_syndrome
> Sticky-shed syndrome is a condition created by the deterioration of the binders in a magnetic tape, which hold the ferric oxide magnetizable coating to its plastic carrier, or which hold the thinner back-coating on the outside of the tape.[1] This deterioration renders the tape unusable.
Stiction Reversal Treatment for Magnetic Tape Media
https://katalystdm.com/digital-transformation/tape-transcrip...
> Stiction can, in many cases, be reversed to a sufficient degree, allowing data to be recovered from previously unreadable tapes. This stiction reversal method involves heating tapes over a period of 24 or more hours at specific temperatures (depending on the brand of tape involved). This process hardens the binder and will provide a window of opportunity during which data recovery can be performed. The process is by no means a permanent cure nor is it effective on all brands of tape. Certain brands of tape (eg. Memorex Green- see picture below) respond very well to this treatment. Others such as Mira 1000 appear to be largely unaffected by it.
Data migration and periodic verification is the answer but it requires more money to hire people to actually do it.
I've got files from 1992 but I didn't just leave them on a 3.5" floppy disk. They have migrated from floppy disk -> hard drive -> PD phase change optical disk -> CD-R -> DVD-R -> back to hard drive
I verify all checksums twice a year and have 2 independent backups.
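For anyone wanting to replicate that routine, here is a minimal sketch of such a verification pass (not the poster's actual tooling), assuming a manifest of "sha256  relative/path" lines in sha256sum style stored alongside the archive:

    # Minimal sketch: verify an archive tree against a SHA-256 manifest.
    # Assumes manifest lines look like sha256sum output: "<hex digest>  <relative path>".
    import hashlib
    import sys
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(root: Path, manifest: Path) -> int:
        failures = 0
        for line in manifest.read_text().splitlines():
            expected, _, rel = line.partition("  ")
            target = root / rel
            if not target.exists():
                print(f"MISSING {rel}")
                failures += 1
            elif sha256_of(target) != expected:
                print(f"CORRUPT {rel}")
                failures += 1
        return failures

    if __name__ == "__main__":
        sys.exit(1 if verify(Path(sys.argv[1]), Path(sys.argv[2])) else 0)

Run it against each backup copy independently; a mismatch on one copy but not the other tells you which side rotted.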
A few weeks ago I wrote a restore utility for a customer, for LTO-4/5/6 tapes made with a now-deceased archival system from a deceased software company. Most of these tapes are up to 16 years old, have been kept in ordinary office cupboards, and work perfectly fine.
But you're right that archival isn't really about the media; it's a process. "Archive and forget" isn't the way.
The more serious problem is, as you say, that older drives become obsolete. Even so, if you start using an up-to-date LTO format you can expect suitable new tape drives to be available to buy for at least 10 years into the future.
For HDDs, the most you can hope for is a lifetime of 5 years, if you buy the HDDs with the longest warranties.
I've got 30 hard drives in use right now and at least 10 are older than 5 years. A few are over 7 years old. I've also had hard drives die in less than a week.
Even if the data is on tape, I want to emphasize that the tape needs to be periodically read to verify that the data is still readable and correct. Assuming the data is stable for 30 years and you can just leave it there is a dumb idea, unless you didn't care about the data in the first place.
The key point ought to be that data must be regenerated within a specified period to avoid bit rot (through loss of remanence).
No matter how many backups you have, if they're all made at the same time then they should all be regenerated at the same time (and within the storage safety-margin period).
IIRC, the goal was to turn over to new tape media every 8-10 years.
Yes, it's a bit of a PITA. OTOH, modern HDs are huge, so relatively few are needed. And we've lost 0 bits of our off-site data in our >25 years of using that system.
So what's actually wrong with hard drives for archival? Do they deteriorate? Do they "rot" like DVDs/blurays/etc have been known to do? Or is this just an ad for their archival service?
That's really the main disadvantage of hard drives: the media is permanently coupled to the drive. If your tape drive fails, you can just pop the tape into a working drive and still get your data back.
It's certainly inconvenient, but this is my untested understanding of how drive recovery services can work.
Happened to me when I got a call-out to a large UK outfit who'd had an extended power cut and knew recovery was going to be fun. First stop was a particularly critical PC which had exactly this problem, so open the case, touch the HDD just right and off it went - happy with that, and on to the next item.
Anyway, the recovery operation went well, and this particular incident came floating back by way of a hushed comment from a manager a few years later about this tech who'd come in to help with the recovery, and who'd "...laid his hands on the PC, and it came back to life!" :)
That's very unlikely. If you're thinking of the "capacitor plague" of the 2000s, that only affected electrolytic capacitors, since it was caused by the Chinese poorly copying the formula for capacitor electrolyte. I don't believe hard drives used electrolytic capacitors in that time period, simply due to their size, though I could be wrong.
Leave a HD, audio or videotape long enough without regenerating it and eventually you'll have nothing left.
Stiction and faulty caps etc. are incidental/secondary issues.
I'm struggling to understand why these miles of shelves filled with essentially hardware junk weren't digitized back when the media still worked and didn't have read issues.
The article doesn't really provide an explanation for this other than incompetence and the business biting off more than it can reasonably chew. I'd be furious if I paid for a service that promised to archive my data, and 10-15 years later told me 25% of it was unreadable. I mean it's not like it was a surprise either. These workflows became digital 2-3 decades ago. There was plenty of time to prepare and convert this.
That's kind of what I'm paying you for.
As always, seems like the simple folk of /r/datahoarder and other archivist communities are more competent than a legacy industry behemoth.
The article is very vague on this, but I thought this company was first doing something like a bank safety deposit box. Send us your media in whatever format and we will keep it secure in a climate controlled vault. They don't offer to archive your data, they offer to store your media. Now it seems they pivoted to archiving data. This is an ad for their existing media storage clients to buy their data archive service:
> Iron Mountain would like to alert the music industry at large to the fact that, even though you may have followed recommended best practices at the time, those archived drives may now be no more easily playable than a 40-year-old reel of Ampex 456 tape.
Artistic endeavors are a unique blend of "extremely chaotic workflows nobody bothers to remember the moment the work is 'done'", "90% of our output doesn't recoup costs so we don't want to burn cash on data storage", and "that one thing you made 20 years ago is now an indie darling and we want to remaster it". A lot of creatives and publishers were sold on the promise of digital 30-odd years ago. They recorded their masters - their "source code" - onto formats they believed would be still in use today. Then they paid Iron Mountain to store that media.
Iron Mountain is a safe deposit box on steroids, they use underground vaults to store physical media. You store media in Iron Mountain if you want that specific media to remain safe in any circumstance[0], but that's a strategy that doesn't make sense for electronic media. There is no electronic format that is shelf-stable and guaranteed to be economically readable 30 years out.
What you already know works is periodic remigration and verification[1], but that's an active measure that costs money to do. Publishers don't want to pay that cost, it breaks their business model, 90% of what they make will never be profitable. So now they're paying Iron Mountain even more for data recovery on the small fraction of data they care about. The key thing to remember is that they don't know what they need to recover at the time the data is being stored. If they did, publishers wouldn't be spending money on risky projects, they'd have a scientific formula to create a perfect movie or album or TV show that would recoup costs all the time.
[0] The original sales pitch being that these vaults were nuke-proof.
[1] Your cloud provider does this automatically and that's built into the monthly fees you would pay. People who are DIYing their storage setup and using BTRFS or ZFS are using filesystems that automate that for online disks, but you still pay for keeping the disks online.
It's hoarding behavior. They paid "a lot" of money for it, have no idea how to further exploit it, but can't shake the feeling that it might be massively valuable one day.
The only difference is they pay someone to hold their hoard for them.
With 2 parties involved in the data, you may want to impose additional restrictions regarding how and when it can be replicated. The party requesting escrow clearly has interest in the source being as durable as possible, but the party providing the source may not want it to be made available across an array of dropbox-style online/networked systems just to accommodate an unlikely black swan event.
A compromise could be to require that the source reside on the original backup media with multiple copies and media types available.
It would be extremely unlikely for both disks to fail together.
What I'm describing is the bare minimum. This is their job, by all accounts. Amazing.
Microsoft has demoed some cool technology where they store data in glass, Project Silica. Sadly, it seems unlikely this will ever be available to consumers. One neat aspect of the design is that writing data is significantly higher power than reading. So you can keep your writing devices physically separated from the readers and have no fear that malicious code could ever overwrite existing data plates.
Some blurbs
Project Silica is developing the world’s first storage technology designed and built from the media up to address humanity’s need for a long-term, sustainable storage technology. We store data in quartz glass: a low-cost, durable WORM media that is EMF-proof, and offers lifetimes of tens to hundreds of thousands of years. This has huge consequences for sustainability, as it means we can leave data in situ, and eliminate the costly cycle of periodically copying data to a new media generation.
We’re re-thinking how large-scale storage systems are built in order to fully exploit the properties of the glass media and create a sustainable and secure storage system to support archival storage for decades to come! We are co-designing the hardware and software stacks from scratch, from the media all the way up to the cloud user API. This includes a novel, low-power design for the media library that challenges what the robotics and mechanics of archival storage systems look like.
https://www.microsoft.com/en-us/research/project/project-sil...
What you're talking about already sort of exists, albeit the media hadn't reached "cheap" yet, because the manufacturing scale wasn't there. People weren't interested enough in it. Archival Disc was a standard that Sony and Panasonic produced, https://en.wikipedia.org/wiki/Archival_Disc. Before the standard was retired you could buy gen3 ones with 5.5TB of capacity, https://pro.sony/ue_US/products/optical-disc-archive-cartrid...
LTO tape was already at 15TB by the time their 300GB Discs came out, and reached 45TB capacity 3 years ago. Tape is still leaps and bounds ahead of anything achievable in optical media and isn't write-once. (https://en.wikipedia.org/wiki/Linear_Tape-Open)
Part of the problem is you can't just store and forget, you have to carry out fixity checks on a regular basis (https://blogs.loc.gov/thesignal/2014/02/check-yourself-how-a...). Same thing as with your backups, backups that don't have restores tested aren't really backups, they're just bitrot. You want to know that when you go to get something archived, it's actually there. That means you're having to load and validate every bit of media on a very regular basis, because you have to catch degradation before it's an issue. That's probably fine when you're talking a handful of discs, but it doesn't scale that well at all.
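To make the scaling problem concrete, here is a hypothetical scheduler (media IDs and interval are illustrative, not from the article) that only answers "which media are overdue for a fixity check?" - the easy part; actually loading and reading each one is what doesn't scale:

    # Hypothetical fixity-check scheduler: given when each medium was last
    # verified, return the ones due for a read-and-verify pass.
    from datetime import date, timedelta

    def media_due_for_check(last_verified: dict[str, date],
                            max_interval: timedelta = timedelta(days=180),
                            today: date | None = None) -> list[str]:
        today = today or date.today()
        return sorted(m for m, when in last_verified.items()
                      if today - when >= max_interval)

    if __name__ == "__main__":
        inventory = {
            "BD-R-0001": date(2023, 11, 2),
            "BD-R-0002": date(2024, 5, 20),
            "LTO-000042": date(2022, 1, 15),
        }
        print(media_due_for_check(inventory, today=date(2024, 7, 1)))
        # -> ['BD-R-0001', 'LTO-000042']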
The amount of space that it takes for the drives to read the optical disc, the machinery to handle the physical automation of shuffling discs around etc. combined with the costs of it, just make no sense compared to the pre-existing solutions in the space. You don't get the effective data density (GB/sq meter) you'd need to make it make sense, nor do the drives come at any kind of a price point that could possibly overcome those costs.
To top it all off, the storage environment requirements of optical media aren't really any different from tape, except maybe being slightly less sensitive to magnetic interference.
No, they didn't. The largest LTO tape is only 18TB; your numbers are bogus. Those are BS advertised numbers with compression (45TB is just 18TB native times an assumed 2.5:1 compression ratio). If you're storing a bunch of movies or photos, for instance, you can't compress that data any further. The actual amount of data that the medium can physically store is the only useful number when discussing data storage media.
On a side note, they keep touting how robust their data archival solution is. But I have my doubts. For example, if an image has a big patch of 0 or 1 bits, then it might be impossible to accurately align the bit positions ("reclocking"); this is the same issue with QR codes and why they have a masking (scrambling) technique. Another problem is that their format doesn't seem to mention error correction codes; adding Reed-Solomon ECC is an essential technique in many, many popular formats already.
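For readers unfamiliar with the masking idea: the usual fix for long runs of identical bits is to XOR the payload with a pseudo-random keystream (a scrambler/whitener) before writing and XOR it again on read. A self-contained toy illustration of the technique (a generic 16-bit LFSR scrambler, not the format being criticized here):

    # Toy additive scrambler: XOR data with a 16-bit LFSR keystream so that a
    # big patch of 0 or 1 bits still comes out looking pseudo-random on the
    # medium, which keeps clock recovery ("reclocking") workable.
    def lfsr_keystream(nbytes: int, state: int = 0xACE1) -> bytes:
        out = bytearray()
        for _ in range(nbytes):
            byte = 0
            for _ in range(8):
                byte = (byte << 1) | (state & 1)
                # Fibonacci LFSR, polynomial x^16 + x^14 + x^13 + x^11 + 1
                fb = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
                state = (state >> 1) | (fb << 15)
            out.append(byte)
        return bytes(out)

    def scramble(data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, lfsr_keystream(len(data))))

    if __name__ == "__main__":
        raw = bytes(4096)                 # a big patch of zero bits
        on_media = scramble(raw)          # no long runs any more
        assert scramble(on_media) == raw  # descrambling is the same XOR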
Your entire article sounds like a sales pitch. Your solution is, well, it's bad, but trust us, we can maybe recover it anyways. Otherwise your article fails to convey anything meaningful.
And once you start doing that, you've just quashed THE advantage tape had over disk. LTO doesn't provide any more reliability, it just shifts the failure points around. Instead of 20 year old sealed hard drives with bearings that will seize up and render your data unreadable, it'll be perfectly stable 20 year old tapes that no drive in the world can read. I'm also skeptical of the cost savings from cheap media once periodic remigrations are priced in, but it might still win out over disk for absolutely enormous libraries (e.g. entire Hollywood studio productions).
And no, there isn't some other tape format that has better long-term support. Oracle stopped upgrading T10000 around 2017, and IBM 3592 has an even worse backwards compatibility story than LTO.
[0] LTO-8 drives only have 1 generation of backwards compatibility because TMR heads get trashed by metal particulate tapes
We’re about to start a project to build an LTO-9 based in-house backup system. Any suggestions for DIY Linux based operation doing it “correctly” would be appreciated. Preliminary planning is to have one drive system on in our primary data center and another offsite at an office center where tapes are verified before storage in locked fireproof storage cabinet. Tips on good small business suppliers and gear models would be great help.
The problem quickly becomes:
- Do we have a drive that can read this tape?
- Do we have server we can connect it to?
- Do we have storage we can extract it to? (go ask your internal IT team for 10TB of drivespace...)
- What program did we create this tape with? Backup Exec, Veritas, ArcServe, SureStore
- You have the encryption keys, right?
- How much of this data already exists on the previous months backup?
- Who's going to pay for the storage to move it to Glacier/etc?
- How long is it going to take to upload?
Yes, though those ultra-thin M.2 NVMe drives could probably top that now.
> Do we have a drive that can read this tape?
Don't let this be a problem in the first place: buy 4 tape drives and keep 2 of them in your cold/offsite/airgapped storage site (2 in case 1 fails, so you can use the remaining drive to transfer everything to a newer format).
> Do we have server we can connect it to?
Significant hardware should not be (and is not) necessary: the LTO-7/8/9 drives I see for sale right now seem to use either USB 3.0, Thunderbolt, or SAS connections; USB and Thunderbolt can be handled by any computer you can find at a PC recycler today, while any old desktop can handle SAS with an $80 HBA card.
> Do we have storage we can extract it to? (go ask your internal IT team for 10TB of drivespace...)
10TB isn't a good example (case-in-point: I have a 3-year-old stack of unused 12TB WD drives less than 3 feet away from me).
That said, if you're enterprise-y enough for a $4000 LTO-9 drive, then you probably also have a SAN that's chock full of drives, so being able to provision a 10TB+ LUN should be implicit.
> What program did we create this tape with? Backup Exec, Veritas, ArcServe, SureStore
Ideally, none of those; instead, good ol' Perl and `dd`.
> You have the encryption keys, right?
I don't encrypt my backups to avoid this problem. My old data archives have little exploitable value for any potential attacker; and I imagine I'd store backup tapes this important in a fireproof safe in my parents' house or something. I'd only encrypt the entire tape if the tape were to leave my custody.
I appreciate that this is not for everyone, and it's probably illegal for some people/orgs to not encrypt backups too anyway (HIPAA, etc).
> How much of this data already exists on the previous months backup?
Incremental/Differential backups are still a thing.
> Who's going to pay for the storage to move it to Glacier/etc?
No-one should. Cold data backups should/must always be in the custody of a designated responsible officer of the company.
BUUUT, I guess there's nothing wrong with storing an encrypted copy in Cloud storage (as in S3/Glacier/AzureBlobs - not OneDrive...). I actually do this right now thanks to the smooth and painless integration in my Synology NAS. It costs me about $15/mo to store all these TBs in S3.
> How long is it going to take to upload?
Consider that it's 2024 - a company with LTO-9 and SAN is probably going to have a metro-ethernet IP connection at 10Gbps or even faster. At home I have a 10Gbps symmetric connection from Ziply (it's $300/mo and they give you an SFP module, which I put into my Ubiquiti UDM): so the limiting factor here is not my upload speed, but my drive read speed (LTO-9 drives seem to read at about 2-3Gbps raw/uncompressed?)
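Rough arithmetic for that last point, assuming LTO-9's nominal native figures (18 TB per cartridge, ~400 MB/s native transfer) and a clean 10 Gb/s uplink:

    # Back-of-the-envelope only; assumes LTO-9 native figures and ignores
    # protocol overhead, compression, and shoe-shining.
    cartridge_bytes = 18e12        # 18 TB native
    tape_read_Bps   = 400e6        # ~400 MB/s native drive speed
    uplink_bps      = 10e9         # 10 Gb/s link

    tape_hours   = cartridge_bytes / tape_read_Bps / 3600     # ~12.5 h
    uplink_hours = cartridge_bytes * 8 / uplink_bps / 3600    # ~4 h

    print(f"read a full cartridge from tape: {tape_hours:.1f} h")
    print(f"push the same bytes over 10 Gb/s: {uplink_hours:.1f} h")

So in that scenario the drive, not the link, is the bottleneck.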
Nothing is fireproof. Is the cabinet "fire suppression system liquid" proof?
> Tips on good small business suppliers and gear models would be great help.
Hire an auditor would be my advice. Every business is different.
I am, just now, having flashbacks of when I was in a SOX environment and had to regularly contract with them... and while the experience can be somewhat unpleasant I've often found good auditors to be extremely knowledgeable about solutions and their practical implementation considerations.
Also check my tape management primer: https://blogs.intellique.com/tech/2022/01/27#TapeCLI
The vendor does not matter, whichever happens to be cheaper at the moment is fine.
For the tape drives, the internal drives can be around 10% cheaper, but I prefer the tabletop drives, because they are less prone to accumulating dust, especially if you switch them on only when doing a backup or a retrieval. Tape drives usually have very noisy fans, because they are expected to be used in isolated server rooms.
I believe that the cheapest tape drives from a reputable manufacturer are those from Quantum. I have been using a Quantum LTO-7 tape drive for about 7 or 8 years and I have been content with it. Looking at prices now, it should be possible to find a tabletop LTO-9 drive for no more than $5000. Unfortunately, the prices of tape drives have been increasing. When I bought an LTO-7 tabletop drive many years ago, it was only slightly more than $3000.
The tapes are much cheaper and much more reliable than hard disks, but because of the very expensive tape drive you need to store a few hundred TB to begin to save money over hard disks. You should normally make at least two copies of any tape that is intended for long-term archiving (to be stored in different places), which will shorten the time until reaching the threshold of breaking even with HDDs.
While there are applications that simulate a file system on a tape, which even a naive user can use to just copy files to the tape as if copying between disks, they are quite slow and inefficient compared to just using raw tape commands with the traditional UNIX utility "mt".
It is possible to write some very simple scripts that use "mt" and which allow the appending of a number of files to a tape or the reading of a number of consecutive files from a tape, starting from the nth file since the beginning of a tape. So if you are using only raw "mt" commands, you can identify the archived files only by their ordinal number since the beginning of the tape.
This is enough for me, because I prepare the files for backup by copying them into some directory, making an index of that directory, then compressing and encrypting it. I send only encrypted and compressed archive files to the tape, so I disable the tape drive's internal compression, which would be useless.
I store the information about the content of the archives stored on tapes (which includes all relevant file metadata for each file contained in the compressed archives, including file name, path name, file length, modification time, a hash of the file content) in a database. Whenever I need archived data, I search the database, to determine that it can be found, for instance in tape 63, file 102. Then I can insert the corresponding cartridge in the drive and I give the command to retrieve file 102.
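As a rough sketch of that index (the schema and names here are mine, not the poster's actual database), it can be as small as one SQLite table:

    # Illustrative archive index: one row per file stored inside a compressed
    # archive, remembering which tape and which ordinal file on that tape holds it.
    import sqlite3

    conn = sqlite3.connect("tape_index.db")
    conn.execute("""
    CREATE TABLE IF NOT EXISTS archived_file (
        path     TEXT,     -- path inside the compressed archive
        length   INTEGER,  -- file size in bytes
        mtime    TEXT,     -- modification time
        sha256   TEXT,     -- hash of the file content
        tape_no  INTEGER,  -- cartridge number
        file_no  INTEGER   -- ordinal file number on that tape
    )""")

    def locate(fragment: str):
        """Which tape, and which file number on it, holds a matching path?"""
        return conn.execute(
            "SELECT path, tape_no, file_no FROM archived_file WHERE path LIKE ?",
            (f"%{fragment}%",)).fetchall()

    # e.g. locate("holiday-2019") -> [("photos/holiday-2019/img001.jpg", 63, 102), ...]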
I consider FreeBSD's "mt" utility much better than Linux's. The Linux magnetic tape utilities have seen little maintenance for many years.
Because of that, when I make backups or retrievals they go to a server that runs FreeBSD, on which the SAS HBA card is installed. When a tabletop drive is used, the SAS HBA card must have external SAS connectors, to allow the use of an appropriate cable. I actually reboot that server into FreeBSD for doing backups or retrievals, which is easy because I boot it from Ethernet with PXE, so I can select remotely what OS to be booted. One could also use a FreeBSD VM on a Linux server, with pass-through of the SAS HBA card, but I have not tried to do this.
My servers are connected with 10 Gb/s Ethernet links, which does not differ much from the SAS speed, so they do not slow down backup/retrieval much. I transfer the archive files with rsync over ssh. On slow computers and internal networks one can use rsync without ssh. I give the commands for the tape drive from the computer being backed up, as one-line commands executed remotely via ssh.
The archive being transferred is stored in a RAMdisk before being written to the tape, to ensure the tape is written at maximum speed. I write to the tape archive files that usually have a size of up to about 60 GB (I split any files bigger than that; e.g. there are BluRay movies of up to 100 GB). The server has 128 GB of memory, so I can configure a RAMdisk of up to 80 GB on it without problems. This method can be used even with a slow 1 Gb/s or 2.5 Gb/s network, but then uploading a file over Ethernet takes much more time than writing or reading the tape.
There is one weird feature of the raw "mt" commands, which is poorly documented, so it took me some time to discover it, during which I have wasted some tape space.
When you append files to a partially written tape, you first give a command to go to the end of the written part of the tape. However, you must not start writing yet, because the head is not positioned correctly. You must go 2 file marks backwards, then 1 file mark forwards. Only then is the head positioned correctly and you can write the next archived file. Otherwise an empty file gets intercalated at each point where you finished appending files, rewound the tape, and later appended more files at the end.
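For the record, a sketch of that positioning dance, shelling out to mt and dd (the device name, block size, and mt dialect differ between Linux and FreeBSD, so treat this as illustrative rather than a drop-in tool):

    # Append an archive file to a partially written tape, with the
    # eod / bsf 2 / fsf 1 sequence described above.
    import subprocess

    TAPE = "/dev/nst0"   # non-rewinding device (Linux naming; e.g. /dev/nsa0 on FreeBSD)

    def mt(*args: str) -> None:
        subprocess.run(["mt", "-f", TAPE, *args], check=True)

    def append_archive(archive_path: str) -> None:
        mt("eod")        # go to the end of the written part of the tape
        mt("bsf", "2")   # ...but don't write yet: 2 file marks back,
        mt("fsf", "1")   # then 1 forward, so the head is positioned correctly
        subprocess.run(["dd", f"if={archive_path}", f"of={TAPE}", "bs=1M"],
                       check=True)
        # the tape driver typically adds a file mark when the device is closed after writing

    append_archive("/ramdisk/archive-0042.tar.gz.gpg")  # hypothetical file name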
If you aren’t budget constrained today and had to set it all up again. What would you do?
While I’m a Linux guy, I’ll happily run BSDs when appropriate, like for pfSense, and if it really has better mt tools or drivers for LTO-9 drives, due to the culture/contributors being more old-school, then I’d just grab a 1U server to dedicate to it, run a BSD, and attach the drive to that.
You seem to have extensive practical hands-on experience, and while I was doing tapes 20 years ago, this will be the first time I’m hands-on with it since then. So I need to research the most reliable drive vendors and the state of kernel drivers and tools, just as you are alluding to.
Pretend you have $50K if needed (doubt it). 2PB existing data, 1PB/year targeted rate, probably 10-20%/year acceleration on that rate, with a data center rack location, 20Gb/s interconnect via bonded 10Gb NICs to storage servers (45drives Storinators), and then an office center cabinet/rack/desk (your choice). We will put a tape drive holding at least 8 tapes in the data center, planning for a worst case of 100TB a month, so data center visits to swap in new tapes shouldn’t be too frequent. Any details on what you would do would be interesting.
That was a long time ago but I’ve peeked in at backup systems in the intervening years and it does seem to hold true over time.
But it really depends how much data you have. My ex dropped a single HDD in a safety deposit box at CoB, N times per week and fetched back the oldest disk. I don’t think she ever said how many were in there but I doubt it was more than three. I think the CTO took one home with him once per week.
The silly thing about most of this set up is that the office, the bank, and the data center were all within half a kilometer of each other. If something bad happened to that part of town they only had the infrequent offsite backup.
I don't get where people get the impression that X was at the top right before tapes got that last innovation (where X here is most often HDDs, but not always). But that's always the impression, and tapes are always on top.
People also have been working with 3D phase change drives since the 90s. Those always promise to replace tapes. But nobody ever got them robust enough to leave the labs.
So it may sound like a sales pitch but I consider it more a warning notice
That aside, this sounds extremely old-fashioned, but it seems to me that the only media that is acceptable for long-term storage is going to be punched paper tape. How long does paper last? How long do the holes in it remain readable? Can it be spliced and repaired?
Why is this surprising?
It's been known for decades that magnetic media loses remanence at several percent a year. It's why old sound tape recordings sound noisy or why one's family videotapes of say a wedding are either very noisy or unreadable 20 or so years later.
Given that and the fact that hard disks are already on the margin of noise when working properly it's hardly surprising.
The designers of hard disks go to inordinate lengths to design efficient data separators. These circuits just manage to separate the hardly-recognizable data signal from the noise when the drive is new and working well so the margin for deterioration is very small.
The solution is simple, as the data is digital it should be regenerated every few years.
Frankly I'm amazed that such a lax situation can exist in a professional storage facility.
Edit: has this situation developed because the digital world doesn't know or has forgotten that storing data on magnetic media is an analog process and such signals are deteriorated by analog mechanisms?
https://www.backblaze.com/blog/top-5-not-so-obvious-backup-a...
The final warning was the 2019 news of the 2008 Universal fire.
We can't preserve bits like books.
The only reason we have any copies of "books" (i.e. long written works) from the ancient world is that they were painstakingly copied over centuries from one medium to another, by hand for most of that time.
Some 25 years ago, the hardest part in booting some Apollo workstations was to make the hard drives spin.
I think for the average person, the best thing to do for long-term archives is to take advantage of Sturgeon's law: "90 percent of everything is crap". Triage the things you want to archive down to a minimum, then print them out, at human scale, on paper. Have physical copies of the photos you want to keep, listings of the code you are proud of, correspondence that is dear to you.
This will last, with no intervention, a very long time. Because, as is increasingly becoming obvious, once the format drifts below human scale, the best way to preserve data is to manage the data separately from the medium it is stored on, with a constant effort to move it to a current medium - and it easily evaporates once that vigilance drifts.
Ah, this stupid industry.
"Data management" sometimes means destruction, sometimes means preservation.
Almost like the title was purposely crafted to mislead you to draw eyeballs.
Or I have to mount them in another OS that isn't Windows. It's more than just adjusting DAW settings and updating plugins at this point; you need to know that around 2000 the filesystems completely changed with NTFS, which added security that wasn't present before.
By the time of Vista/7, FAT hard drive support was gone from Microsoft land. There are of course add-ons and such, but you still need to _know_ this happened, and FAT drives look unformatted in modern Windows.
In fact, Microsoft is adding features to Windows' FAT driver: https://www.bleepingcomputer.com/news/microsoft/microsoft-re...
Any time you're physically warehousing old hard drives and whatnot, they're going to be turning into bricks.
Whereas with cloud providers, they're keeping highly redundant copies and every time a hard drive fails, data gets copied to another one. And you can achieve extreme redundancy and guard against engineering errors by archiving data simultaneously with two cloud providers.
Is there any situation where it makes sense to be physically hosting backups yourself, for long-term archival purposes? Purely from the perspective of preserving data, it seems worse in every way.
Whether we collectively need to store all these things is another question entirely. But if we want to keep it - we'll have to do the work to keep it maintained.
Or so they say. It's not like you can double-check.
> Is there any situation where it makes sense to be physically hosting backups yourself, for long-term archival purposes?
Yes, political and legal risks. There's no guarantee your cloud won't terminate your account for any of a thousand reasons in the future.
Don't they publish SLA numbers? The reliability of the major cloud providers seems quite well-established.
> There's no guarantee your cloud won't terminate your account
Two cloud providers pretty much guarantee against that -- the odds that both would terminate it simultaneously are vanishingly small.