archive.org does use torrents, and I have one such torrent lying around in my client, which occasionally connects to peers although the trackers are currently offline. I suppose a new client would find me and other peers through the DHT. I'd share a magnet link for someone to try, but it's a copyright-ignoring ROM dump archive, so it may not be the best idea to post it here.
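For anyone curious what such a tracker-less link looks like: a magnet URI only needs the torrent's info-hash, and a DHT-capable client can look up peers from that alone. Here's a rough sketch; `make_magnet` is a hypothetical helper I made up for illustration, and the hash in the usage note is a placeholder, not a real torrent.

```python
from typing import Iterable, Optional
from urllib.parse import quote


def make_magnet(info_hash_hex: str,
                name: Optional[str] = None,
                trackers: Iterable[str] = ()) -> str:
    """Build a BitTorrent v1 magnet URI from a 40-character hex info-hash.

    Hypothetical helper for illustration only. With no trackers given,
    the link relies on the client finding peers through the DHT.
    """
    info_hash_hex = info_hash_hex.lower()
    if len(info_hash_hex) != 40 or any(c not in "0123456789abcdef" for c in info_hash_hex):
        raise ValueError("expected a 40-character hex SHA-1 info-hash")

    # xt ("exact topic") identifies the torrent by its info-hash.
    parts = ["xt=urn:btih:" + info_hash_hex]
    if name:
        # dn ("display name") is optional and purely cosmetic.
        parts.append("dn=" + quote(name, safe=""))
    for tracker in trackers:
        # tr entries are optional tracker URLs; omit them for a DHT-only link.
        parts.append("tr=" + quote(tracker, safe=""))
    return "magnet:?" + "&".join(parts)
```

Calling `make_magnet("0123456789abcdef0123456789abcdef01234567")` gives a link with no `tr=` parameters at all, which is roughly the situation described above: the trackers are gone, so peer discovery is left entirely to the DHT.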
It's interesting that torrents may not be the first thing that comes to mind. They have the "PR issue" of being the now seemingly mundane way in which we've been downloading DVD rips for the last 20-odd years. Newer technology like IPFS does a better job of making the cool core of this technology actually sound cool.
(or r/archiveteam ?)
Personally, I have archived a few of the magazine collections.
- take offline
- purge 'problematic' archives
- return to service
Is that impossible? Are there redundancies that make this very hard?
The problem that multi-generational projects like this always have is tech debt. Any library/dependency chosen by the previous generation might go unmaintained for decades until it falls through the cracks and someone notices it.
Heritrix, for example, was written in a very old "Java way" of doing things. They also have lots of services that were built in the PHP4 age, with globals by default and stuff like that.
Always keep in mind that whatever you choose, it's a bet, essentially. Over time you'll realize that different language ecosystems have different aligned or misaligned goals to your project. Don't choose libraries because of hype, choose them because of maintainability.
> "And that's the twentieth-century model, what they used to call an electronic Pearl Harbour. Things have moved on since then. Footnotes inserted in government reports feeding into World Trade Organization negotiating positions. Nothing we'd notice at first, nothing that would be obvious for a couple of years. You don't want to halt the state in its tracks, you simply want to divert it into a siding of your choice."
Who knows what will appear after the archives are restored?
Re your comprehensive edit, I'm totally on board with that tech choice idea. It's a bet: avoid the fads, pick stuff that's robust (or at least a fit for your possible futures).
Software-wise I wouldn't know where to start, honestly, because the Internet Archive as a project is so vast [1] that it's hard to get an architectural overview of how the pieces are glued together. Unifying the tech stack seems to have been no concern at all in its development...
But from a pentesting perspective I'd try to find vulnerabilities in the Perl-based services first, then Java, then PHP, then NPM and so on... because older projects are more likely to be unmaintained or to use outdated libraries.
[1] (~242 public repositories) https://github.com/orgs/internetarchive/repositories
They definitely should.
Insane that the only examples of me ever doing anything are in printed copies from the 1970s, in national archives where some aunties still believe that the Internet is just a passing fad.