Internet Archive Services are "temporarily offline"

(archive.org)

104 points · pushedx · 1y ago · 34 comments

userbinator · 1y ago · 9 in thread

This incident brings up a good point: Who archives the archives?

divbzero · 1y ago

There have been collaborative computing projects like SETI@home [1] and Folding@Home [2] where unused computing power could be used for productive purposes. Could there be something similar for storage? Software that provides unused storage for Internet archiving? In the best case scenario, we could have redundant backups of the Internet Archive distributed around the world.

[1]: https://setiathome.berkeley.edu/

[2]: https://foldingathome.org/

boomlinde · 1y ago

Perhaps torrents?

archive.org does use torrents, and I have one such torrent lying around in my client, which occasionally connects to peers although the trackers are currently offline. I suppose a new client would find me and other peers through the DHT. I'd share a magnet link for someone to try, but it's a copyright-ignoring ROM dump archive so it may not be the best idea to post it here.

It's interesting that torrents may not be the first thing that comes to mind. They have the "PR issue" of being the now seemingly mundane way in which we've been downloading DVD rips for the last 20-odd years. Newer technology like IPFS does a better job of making the cool core of this technology actually sound cool.
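
For anyone unsure what a trackerless magnet link carries, here is a minimal sketch, assuming the BitTorrent v1 convention of a 40-character hex SHA-1 info-hash. The function name `magnet_link` and the hash and name in the usage lines are placeholders made up for illustration, not a real archive.org item. With no `tr=` tracker parameter, a client has to find peers through the DHT, which is the situation described above.

```python
from urllib.parse import quote


def magnet_link(info_hash_hex, name=None, trackers=()):
    """Build a magnet URI from a v1 (SHA-1) info-hash given as hex.

    Hypothetical helper for illustration only.
    """
    h = info_hash_hex.lower()
    if len(h) != 40 or any(c not in "0123456789abcdef" for c in h):
        raise ValueError("expected a 40-character hex SHA-1 info-hash")
    parts = ["xt=urn:btih:" + h]
    if name:
        # dn= is only a display name; peers are located by the hash alone.
        parts.append("dn=" + quote(name, safe=""))
    for tracker in trackers:
        # Trackers are optional; without them the client relies on the DHT.
        parts.append("tr=" + quote(tracker, safe=""))
    return "magnet:?" + "&".join(parts)


# Placeholder values, not a real torrent.
link = magnet_link("0123456789abcdef0123456789abcdef01234567", "example item")
print(link)
```

The info-hash is the part that matters: it identifies the torrent's metadata, so the link keeps working as long as at least one reachable peer still has the data.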

binaryroof · 1y ago

The vision behind IPFS is that (to an extent): https://ipfs.tech/

anacrolix · 1y ago

https://github.com/anacrolix/btlink

odo1242 · 1y ago

There is currently the ArchiveTeam effort going on.

Sakos · 1y ago

I really wish the EU had its own organisation for creating an internet archive that at the very minimum mirrored IA. This is our history, and there's only a single place now that has any significant archive of it. It seems like the EU should have a significant interest in preserving it for generations to come.

Unbefleckt · 1y ago

Demonising it is more fashionable right now. Everyone I know who contributes to the Internet Archive is right of center enough to be considered a horrible person.

JKCalhoun · 1y ago

r/DataHoarder

(or r/archiveteam ?)

Personally, I have archived a few of the magazine collections.

notpushkin · 1y ago

https://archiveteam.org/index.php/IA.BAK

keepamovin · 1y ago · 8 in thread

how vulnerable is IA to some malicious actor who wanted to rewrite history or run an 'information cleansing' operation?

- take offline

- purge 'problematic' archives

- return to service

is that impossible? are there redundancies to make this very hard?

cookiengineer · 1y ago

Don't give the SVR any ideas, man.

The problem that multi-generational projects like this always have is tech debt. Any library/dependency chosen by the previous generation might be unmaintained for decades until it falls through the cracks and someone notices it.

Heritrix, for example, was written in a very old "Java way" of doing things. They also have lots of services that were built in the PHP4 age, with globals by default and stuff like that.

Always keep in mind that whatever you choose, it's a bet, essentially. Over time you'll realize that different language ecosystems have goals that are more or less aligned with your project's. Don't choose libraries because of hype, choose them because of maintainability.

Apocryphon · 1y ago

I dunno about the state actor hypothesis, but if there is one, it all sounds like Charles Stross's description of a future cold war in Halting State:

> "And that's the twentieth-century model, what they used to call an electronic Pearl Harbour. Things have moved on since then. Footnotes inserted in government reports feeding into World Trade Organization negotiating positions. Nothing we'd notice at first, nothing that would be obvious for a couple of years. You don't want to halt the state in its tracks, you simply want to divert it into a siding of your choice."

Who knows what will appear after the archives are restored?

keepamovin · 1y ago

Hah! As if they need ideas. But that's not the point, how possible is it?

Re your comprehensive edit, I totally am on board with that tech choice idea. It's a bet, avoid the fads, pick stuff that's robust (or at least a fit for your possible futures)

cookiengineer · 1y ago

I'd say we have to differentiate between human error as an attack surface and software bugs / vulnerabilities as an attack surface here.

Software-wise I wouldn't know where to start, honestly, because the Internet Archive as a project is so vast [1] that it's hard to get an architectural overview of how the pieces are glued together. Unifying the tech stack seems to have been no concern at all in its development...

But from a pentesting perspective I'd try to find vulnerabilities in the Perl-based services first, then Java, then PHP, then NPM and so on... because older projects tend to have a higher likelihood of being unmaintained or using outdated libraries.

[1] (~242 public repositories) https://github.com/orgs/internetarchive/repositories

emmelaich · 1y ago

I hope that Google (for instance) has an occasional snapshot of everything tucked away somewhere on a tape in Norway or somewhere. Like the seed bank.

bubblesnort · 1y ago

- openly speculate the tactic to preemptively address concerns

keepamovin · 1y ago

Exactly! Red-team the situation to identify weaknesses, build defenses and devise overall strategy! :)

g-b-r · 1y ago

Yeah, last time I checked they weren't doing any timestamping.

They definitely should.
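
To make the timestamping idea concrete, here is a minimal sketch of a tamper-evident hash chain over archived items, assuming each log entry commits to the previous entry's hash. This is an illustration only, not how the Internet Archive works, and every name in it (`GENESIS`, `entry_hash`, `append_entry`, `verify_log`) is hypothetical. The chain only helps if the latest hash is also published somewhere the archive operator cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, timestamp, content_digest):
    """Hash one log entry together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "ts": timestamp, "digest": content_digest},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, timestamp, content):
    """Append a record for `content` (bytes) captured at `timestamp`."""
    prev = log[-1]["hash"] if log else GENESIS
    digest = hashlib.sha256(content).hexdigest()
    log.append(
        {"ts": timestamp, "digest": digest, "hash": entry_hash(prev, timestamp, digest)}
    )
    return log


def verify_log(log, expected_head=None):
    """Recompute the chain; optionally compare against a published head hash."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["ts"], entry["digest"]):
            return False
        prev = entry["hash"]
    return expected_head is None or prev == expected_head


log = []
append_entry(log, "2024-10-01T00:00:00Z", b"snapshot one")
append_entry(log, "2024-10-02T00:00:00Z", b"snapshot two")
append_entry(log, "2024-10-03T00:00:00Z", b"snapshot three")
head = log[-1]["hash"]  # this is the value to publish externally
print(verify_log(log, expected_head=head))
```

Deleting or altering an entry in the middle breaks every later hash, and silently dropping the newest entries is caught by the externally published head. A purge would therefore need to rewrite the whole chain and every outside copy of the head hash.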

Apocryphon · 1y ago

The timing of Google getting rid of the Google Cache couldn't be worse, with these ongoing DDoS attacks on and necessary hardening of the Internet Archive. Wonder what kind of twisty narrative one could posit about why this is happening?

jlund-molfese · 1y ago

timonoko · 1y ago

What we really need right now is a "black hole" of information. A place where you can push stuff, but retrieving is impossible until the time when ironclad legitimation can be automated.

Insane that the only examples of me myself ever doing anything are in printed copies from the 1970s. In the National Archives, where some aunties still believe that the Internet is just a passing fad.

ChrisArchitect · 1y ago

Re: https://news.ycombinator.com/item?id=41836677

jaredb3 · 1y ago

Fix the Internet Archive to be back online.

conormarcellus · 1y ago

internet archive
