How to become a pirate archivist

(annas-blog.org)

579 points · pilimi_anna · 3y ago · 99 comments

jancsika3y ago· 17 in thread

I'm curious how Sci-hub's approach compares to the What.cd/Redacted approach.

IIUC Sci-hub has scooped up science docs through a good enough UX that it was able to leverage the goodwill of science folks to upload docs (plus whatever other methods it has used to scoop up docs), and it uses a public blitzkrieg-style distribution mechanism. I.e., I guess if one had a big enough harddrive and a fast enough internet connection, one could start downloading the lib right now and see if they win the race against the copyright holders.

On the other hand, the What.cd/Redacted approach seems to use Bittorrent ratios to create a private-tracker economy. New users get a few gigs free download on joining. But apparently because a) there's a 1:1 upload/download ratio, and b) a few first-mover fat cats are sitting on enormous ratios, this means there is a scramble by everyone else to upload new FLACs to build up their ratio so they can continue to be able to download FLACs. It seems that would mean the library-in-its-entirety cannot be easily replicated at will. Yet the tracker was apparently already nuked off the internet as What.cd and reappeared later as Redacted. Was any data lost between the two services?

Oh yeah, there's also apparently another approach in rutracker, which seems to be blitzkrieg to add content and publish, at the (apparent?) cost of quality of content.

It's really a shame that the nerdy, completist domain of digital archiving through torrents isn't covered by fair use. Perhaps we could exclude the most recent 10 years of music so that the hopeful young musician streamers can get paid a few hundred dollars for millions of streams and then receive the silver lining of fair use protection against a label refusing to release one of their albums.

MontyCarloHall3y ago

Sci-Hub was interesting because until 2021, it could automatically add any article not in its database by querying proxy servers set up at universities that actually subscribed to the journals, which would then download the PDF from the journal’s website and forward it to Sci-Hub. (This was the approach most academics took in the pre-Sci-Hub days; they’d email friends at other universities and ask them if they had access to a given article.)

Sadly, Sci-Hub took down this “magic proxy” to try and win a court case in India, which its creator thinks might legitimize it elsewhere. It’s a huge shame, because it means that many obscure articles are now inaccessible via Sci-Hub.

Ironically, a pure proxy-based Sci-Hub that didn’t host any articles on its own might actually be legal in certain countries, since it’s not actually hosting any copyrighted content itself. It definitely would be a lot easier to host and a lot harder to shut down; indeed, it could be completely decentralized.

pdntspa3y ago

The Pirate Bay has never hosted any content; yet many, many cases have been decided against them. Why would that work for sci-hub?

genewitch3y ago

ohhhh, so that's why, when my people got their PhD and Master degrees from universities suddenly sci-hub stopped working correctly. I put out the word that it was possible to get any study ever and people asked me this year for a bunch of studies that, prior to this year, should have been the definition of ease.

Months later, I'm still waiting on sci-hub or anyone to get access to the studies.

The real WTF is science publishing. never-mind reproducibility of the studies, just getting the study in the first place is a predicament. at least genesis still works for 85% of requests i get.

throwhn00000013y ago

Redacted's ratio economy is fucked. There's groups of people using high speed seedboxes that grab every new upload to build up ratio in a normal way, if you can call it that.

Everyone else is left with scraps, trickling data out to whoever comes later. Or maybe you get lucky and you're the lone seeder, you get a 1:1 copy. You just better hope that was on a 24-Bit FLAC release, given how big they can get.

There are many, many threads on https://old.reddit.com/r/trackers about RED's economy problems. OPS has a bonus point system and suffers less, but has worse content than RED. My opinion only, though.

rutracker is all about putting content out there. A lot of stuff is mislabelled or lower quality, less retention, identified wrong, seeded slowly. But you can sure find a lot of oddities there.

> was any data lost

yes, absolutely. I've hundreds of albums not on RED, and I can't bother to reupload them. it's a total waste of time when you know they're unseeded in a week if not for me hanging on.

time better spent finding new music and sharing everything on soulseek instead.

Stevvo3y ago

Interesting. I haven't used Redacted, but was an avid user of Oink's Pink Palace, the OG music tracker. Ratio requirements were partly there to encourage people to upload stuff. The site was defined by the number (and quality) of uploads it had. If you upload a 24-bit FLAC yourself, you are guaranteed at least its file size added to your ratio. Seedboxes were pretty much accepted as a requirement if you didn't want to upload anything, i.e. each user was expected to contribute in some way.

That said, things are probably different nowadays. In Oink's day everyone was on ADSL at home. Was paying €20 a month for a 100/100 OVH box; today I have 1000/100 at home and can instead spend that €20 on Spotify and Bandcamp.

relaxing3y ago

What saves the economy is the generous freeleech and gift token events. (Currently one running if you haven’t logged in this week.)

It all serves the purpose of getting users to be good citizens participating in the community, and not just snatch and run.

loeg3y ago

> Yet the tracker was apparently already nuked off the internet as What.cd and reappeared later as Redacted.

You imply there is some continuity in the operation between these two trackers, but I don’t believe that’s the case. What.cd shut down. Subsequently, redacted (passtheheadphones) and apollo started, appealing to the same userbase. Neither of those trackers were privy to what.cd’s databases.

genewitch3y ago

in the sets of what.cd and waffles.fm and Redacted, approximate the intersection of the sets.

I like music a lot. I have a lot of vinyl and weird CDs, too. However, i can't be assed to rip to whatever draconian style-guide some of these private trackers want. So it's a matter of being a member of several servers and finding something that either isn't listed or seeded and "filling" or creating a torrent with some other tracker's set of files.

this is rewarding the wrong behavior. If i remember some song i heard in 1996, i should just be able to get it. It would be nice if all of the people who were involved in the creation and publication of the song got rewarded, somehow, but that's just not how art works in capitalism. I say this as someone who has personally released 11 CDs and a further 6 CDs in collaboration, of music. I haven't been paid a penny or more for anything i've ever produced in "art". I don't consider this a downside. People who know me and know i write music appreciate my music. People who don't know me will miss out. That's all there is to it.

soulofmischief3y ago

Unfortunately, a lot of data is currently yet to be restored. Some of that music might not surface again for decades to come.

luckylion3y ago

> Was any data lost between the two services?

Yes, absolutely, a lot.

Even though some people have automated their setup very well and have been downloading (and uploading on the newer trackers) a lot, it's just a giant amount of content, it's unlikely for any one person to have it all, and coordination was very limited back then.

The birthday release of the torrent db of what.cd includes 2.6m torrents in 1.2m groups (aka individual releases), in total weighing in at 588TB (or 421TB if you discard mp3, but there was content that hadn't been available in FLAC). That's doable on an SWE salary today if you're dedicated, but what.cd was shut down in 2016, and you'd still need to deal with the ratio system during collection.
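
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. The torrent, group and size figures are the ones quoted above; the use of decimal terabytes and the 20TB drive size are my own assumptions, picked only for illustration.

```python
# Figures quoted in the comment above.
torrents = 2_600_000
groups = 1_200_000
total_tb = 588
without_mp3_tb = 421

TB = 10**12  # assumption: decimal terabytes
MB = 10**6

avg_torrent_mb = total_tb * TB / torrents / MB
avg_group_mb = total_tb * TB / groups / MB
mp3_share = (total_tb - without_mp3_tb) / total_tb

# Hypothetical drive size, not from the thread.
drive_tb = 20
drives_needed = -(-total_tb // drive_tb)  # ceiling division, no redundancy

print(f"average torrent: {avg_torrent_mb:.0f} MB")
print(f"average release group: {avg_group_mb:.0f} MB")
print(f"share of bytes that is mp3: {mp3_share:.1%}")
print(f"{drive_tb}TB drives needed: {drives_needed}")
```

This ignores redundancy, filesystem overhead and the bandwidth needed to collect everything, so treat it as a lower bound on the hardware.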

throwhn00000013y ago

after wcd fell some private sftp and dc+ servers came up for about 100 of the top seeding and hoarding members who were all familiar with each other. people regularly shared new content and filled out missing music in their collections. i dont think it exists anymore but it might. i did not have access but knew several people that did who got me content to complete music sets

the combined amount of content on those servers was probably around 100TB but most likely more

jumelles3y ago

> Was any data lost between the two services?

Undoubtedly.

eimrine3y ago

> What.cd/Redacted approach seems to use Bittorrent ratios to create a private-tracker economy. New users get a few gigs free download on joining.

Isn't that "bittorrent ratio" easy to cheat? I remember the good old days when I had to download some popular files just for ratio; then, at the beginning of the '10s, everybody started to cheat and some of the biggest trackers switched to permanent freeleech.

ShowalkKama3y ago

>Isn't that "bittorrent ratio" easy to cheat?

Yes[0] but you have to be careful else you might get caught

0: https://wiki.installgentoo.com/wiki/Ghostleech

290830113977783y ago

Other users also report their upload and download. Ergo, if you say you uploaded 10gig, it would show up in other users' download. This is tracked and checked, and you'll be kicked if you try to play the system this way.
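
A minimal sketch of that kind of cross-check, assuming the tracker keeps per-torrent totals from peer announces. The report format, the function name `swarm_discrepancies` and the 5% tolerance are hypothetical; I don't know how any real tracker implements this.

```python
from collections import defaultdict

def swarm_discrepancies(reports, tolerance=0.05):
    """Flag torrents where claimed upload exceeds everything peers claim to have downloaded.

    reports: iterable of (torrent_id, peer_id, uploaded_bytes, downloaded_bytes).
    Returns {torrent_id: (excess_bytes, peer_with_largest_claimed_upload)}.
    """
    uploaded = defaultdict(int)
    downloaded = defaultdict(int)
    top_uploader = {}
    for torrent_id, peer_id, up, down in reports:
        uploaded[torrent_id] += up
        downloaded[torrent_id] += down
        if torrent_id not in top_uploader or up > top_uploader[torrent_id][1]:
            top_uploader[torrent_id] = (peer_id, up)

    flagged = {}
    for torrent_id in uploaded:
        excess = uploaded[torrent_id] - downloaded[torrent_id]
        # In an honest swarm every uploaded byte is someone else's downloaded byte.
        if excess > tolerance * max(downloaded[torrent_id], 1):
            flagged[torrent_id] = (excess, top_uploader[torrent_id][0])
    return flagged

GB = 10**9
reports = [
    ("t1", "alice", 10 * GB, 0),   # claims 10 GB uploaded
    ("t1", "bob", 0, 1 * GB),      # only 1 GB was downloaded in this swarm
    ("t2", "carol", 5 * GB, 0),
    ("t2", "dave", 0, 5 * GB),
]
print(swarm_discrepancies(reports))
```

The largest claimed uploader is only a starting point for a human to look at; the totals alone cannot prove which peer misreported.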

hatware3y ago

Bounty systems are a pretty common way to get insane upload ratios so that you can archive.

throwhn00000013y ago

>insane upload ratios

not with most RED bounties you can fill. if you only sort by biggest bounties, you get albums that realistically can't be filled, or that would require serious money and effort to find: rare asia-specific releases and stuff.

you can do specific requests if you have accounts for streaming platforms but nobody makes bounty requests for those

ynno3y ago· 11 in thread

I think Alexandra Elbakyan actually did not want to be revealed as the librarian behind Sci-Hub, it was her poor opsec that led to her being identified.

Basically her servers were set up to emit detailed error messages from PHP, including full path of faulting source file, which was under directory /home/ringo-ring, which could be traced to a username she had online on an unrelated site, attached to her real name. Before this revelation, she was anonymous.
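
As a hedged illustration of how little it takes: the sketch below pulls home-directory usernames out of error text. The sample message and the name `exampleuser` are made up, and the regex only covers Linux-style `/home/<user>/` paths.

```python
import re

# Matches the username component of Linux-style home directory paths.
HOME_PATH = re.compile(r"/home/([A-Za-z0-9._-]+)/")

def leaked_usernames(text):
    """Return the distinct usernames exposed via /home/<user>/ paths in text."""
    return sorted(set(HOME_PATH.findall(text)))

# Made-up error message in the general shape of a PHP warning.
sample = (
    "Warning: include(/home/exampleuser/www/lib.php): failed to open stream "
    "in /home/exampleuser/www/index.php on line 12"
)
print(leaked_usernames(sample))
```

On the server side, PHP's `display_errors` setting controls whether such messages are sent to visitors; turning it off and keeping `log_errors` on avoids this particular leak.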

est3y ago

> which was under directory /home/ringo-ring, which could be traced to a username

Ha, my home dir is always called "me" or "and". Try googling that.

NaturalPhallacy3y ago

This is deep infosec. Instead of security through obscurity, it's security through ubiquity.

pavel_lishin3y ago

People might be able to, now!

latchkey3y ago

Smart. Here's how to rename on OSX (might break a lot of tooling, but whatever).

https://support.apple.com/en-us/HT201548

teddyh3y ago

IIRC, on new installations of NeXTSTEP (based on 4.3BSD Unix), the username of the single installed desktop user was "me".

O__________O3y ago

Electron star - seems oddly specific, though maybe not spelling it same way you do.

genewitch3y ago

i like how people are trying to correlate your HN account with - ostensibly - wild /home/ directories. This is why i've recently moved off of this username, going forward. 20 years as genewitch has attached a lot of bad "OPSEC" to this username, and coupled with the fact that i have federal licenses means that people can just google my entire life story.

my last name, without any other information, has one tenth of the bits of information that "genewitch" does.

thakoppno3y ago

I’ve narrowed it down to the east coast of the US.

pilimi_annaOP3y ago

Did not know that detail. Will add to the post, thank you.

userbinator3y ago

And from your addendum:

So, use random usernames on the computers you use for this stuff, in case you misconfigure something.

...or a username that is so common as to be meaningless, like "user", "Administrator", or even "root".

O__________O3y ago

I was able to find an archived page of hers using ringo-ring back in 2010, which I believe predates Sci-Hub’s launch in 2011. I was not able to find a reference to “home/ringo-ring” anywhere else, but given the prior information, it at the very least seems plausible.

imhoguy3y ago· 10 in thread

As an active hoarder I think there is problem with "6. Distribution: Packaging it up in torrents, announcing it somewhere, getting people to spread it.".

I miss a p2p application with torrent packaging and Kademlia like per-file advertising and discovery, where I could point it to my hefty NAS directory of random things and they could be wired to released torrents. This way we could make torrents live much longer, even partially complete. In super extra option the app could even notify me to load DVD because somebody asked for a file which I indexed and advertised previously.

For years my program preferences changed, file locations changed - I have moved files around, made them offline, burned on DVDs, deleted some parts of torrent just to keep interesting stuff. Now these torrents are lost, at least my seeding contribution is gone. But I almost never change the content of these files, their checksum stays the same forever, so they could be still discoverable.

The digital preservation needs better distribution system.

londons_explore3y ago

The "v2" torrent file format allows most of this. Some clients have support already.

All you'd have to do is make a torrent of your whole hard drive and then seed that. You don't need to publish the torrent anywhere.

If anyone else in the world is downloading any other torrent that happens to contain a file you have, they will end up connecting to your machine to download it.

chhs3y ago

A v2 torrent allows clients to identify duplicate files across torrents via the "pieces root" entry in the metadata, so if they're downloading from torrent A and B, and each share file C, they can utilize peers from either swarm.

But there's no way for other clients to know that there exists another torrent containing the file they are interested in if they only have the metadata for torrent A. In other words, there's no lookup mechanism for a "pieces root" to know that torrent B exists and contains file C.

If you were to make a v2 torrent of your entire drive, other clients won't know to download from your torrent. They'd need to have the contents of the metadata to know it would contain a file they are interested in, and have no way of knowing which metadata contains the desired "pieces root" entries without downloading all of them.

I'm very interested in this problem space, if you are aware of clients/mechanisms that allow for this I would love to hear them.
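
To make the missing piece concrete, here is a sketch of the lookup that would be needed: a reverse index from "pieces root" to every torrent containing that file. The input shape (`{infohash: {path: pieces_root}}`) and the function names are hypothetical; it assumes the metadata has already been collected and parsed, which is exactly the hard part described above.

```python
from collections import defaultdict

def build_pieces_root_index(torrents):
    """Map each pieces root to the (infohash, path) pairs that contain it.

    torrents: {infohash: {file_path: pieces_root_hex}} (hypothetical pre-parsed form).
    """
    index = defaultdict(set)
    for infohash, files in torrents.items():
        for path, root in files.items():
            index[root].add((infohash, path))
    return index

def other_sources(index, pieces_root, known_infohash):
    """List other torrents that carry the same file as one we already know about."""
    return sorted(t for t in index.get(pieces_root, ()) if t[0] != known_infohash)

# Placeholder values standing in for real infohashes and roots.
torrents = {
    "torrent_a": {"album/01.flac": "root_c", "album/02.flac": "root_d"},
    "torrent_b": {"drive/music/track.flac": "root_c"},
}
index = build_pieces_root_index(torrents)
print(other_sources(index, "root_c", "torrent_a"))
```

With only torrent A's metadata a client has the key (`root_c`) but no index to query, so some shared or distributed version of this table would have to exist first.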

Retr0id3y ago

Hm, is there a security concern here?

Someone could use this to

A) Remotely check the presence of any specific file on your machine.

B) Exfiltrate the contents of any file they know the hash of (or possibly more specifically, the hash of each piece? I don't know the protocol details).

Fine if you have a dedicated "I expect the contents to be public" drive or directory, but not something I'd want to do on my OS drive.

chhs3y ago

I did work on a proof of concept program to accomplish this for my own content library. It would scan a directory to find files and compare them with locally stored metadata. For v2 torrents this is trivial to do via a "pieces root" lookup, for v1 torrents it involves basically checking that each piece matches, and since pieces may not align with the file then it's not possible to guarantee that it's the same file without having all of the other files in the torrent.
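
For the v1 case, a minimal sketch of the piece check for a single-file torrent, where pieces do align with the file. In v1 the `pieces` field is the concatenation of 20-byte SHA-1 digests, one per piece. The function names are mine, and this deliberately does not handle multi-file torrents, where a piece can span a file boundary.

```python
import hashlib
import io

def v1_piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenated SHA-1 digests of consecutive fixed-size pieces."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )

def stream_matches_v1(stream, piece_length: int, pieces: bytes) -> bool:
    """Check a binary stream against a single-file v1 torrent's piece hashes.

    Assumes stream.read(n) returns n bytes unless the end is reached
    (true for io.BytesIO and buffered files opened with "rb").
    """
    index = 0
    while True:
        chunk = stream.read(piece_length)
        if not chunk:
            break
        expected = pieces[index * 20:(index + 1) * 20]
        if hashlib.sha1(chunk).digest() != expected:
            return False
        index += 1
    # Every expected piece must have been seen, no more and no fewer.
    return index * 20 == len(pieces)

data = b"x" * 100_000                      # stand-in file contents
pieces = v1_piece_hashes(data, 32_768)     # what the .torrent would store
print(stream_matches_v1(io.BytesIO(data), 32_768, pieces))                 # True
print(stream_matches_v1(io.BytesIO(data[:-1] + b"y"), 32_768, pieces))     # False
```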

I built it with libtorrent and after loading in all of the torrents (multiple TBs of data), it would promptly and routinely crash. I couldn't find the cause of the error; it doesn't seem it was designed to run with thousands of torrents.

One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase. I haven't been able to find a publicly available database of torrent metadata. If you have an info hash then itorrents.org will give you the metadata, if it exists. I started scraping metadata via DHT announcements, but it's not exactly fast, and each client would have to do this unless they can share the database of metadata between them (I have an idea on how to accomplish this via BEP 46).

anacrolix3y ago

I have a solution to this, it's the successor to Magnetico.

trevyn3y ago

>One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase.

I think BEP 51 followed by BEP 9 is all you need.

3np3y ago

Would you mind sharing the source? Sounds like something others could build on.

justshowpost3y ago

But why? The current Kademlia implementation built in eMule/aMule does exactly this. Primary data advertised is file name string, there is also a support for several meta fields. Transfer proto sucks, but I don't think it really matters in the case of ultra-rare content.

imhoguy3y ago

As for `ultra-rare content`: eMule is a mess if you need a collection of files. Of course you can share a ZIP, but those somehow get repackaged too often. Torrents group the stuff nicely; a torrent is kind of an album.

I've dreamed of a federated meta-client that mixes all available p2p networks in the wild and can download a missing torrent part from eMule or Soulseek.

dangerface3y ago

> I miss a p2p application with torrent packaging and Kademlia like per-file advertising and discovery, where I could point it to my hefty NAS directory of random things and they could be wired to released torrents.

That was the problem with p2p: no one bothered to actually package their releases, instead just slinging them onto the internet with no care given as to whether they were complete or even correctly labeled.

Sites like The Pirate Bay had everything. Torrents are a super fast and error-resistant way to distribute content; it's the perfect system. The problem was that the content is illegal, so it was nuked by companies with more money and power than anyone can fight against.

The problem has never been technical or about distribution; the issue is the law.

yamrzou3y ago· 8 in thread

«That secrecy, however, comes with a psychological cost. Most people love being recognized for the work that they do, and yet you cannot take any credit for this in real life.»

Feels like the anonymous torrent seeder who keeps seeding a file for years just for the sake of keeping it alive. It's not easy, but some people seem to be able to derive full pleasure from accomplishing the task itself, whether recognition happens or not.

incompatible3y ago

It's probably not that uncommon that people working for companies, governments, or criminal organizations can't talk about their work in public.

One group I remember in particular are mathematicians working for the NSA, etc., who are not permitted to publish their research, then they watch as other mathematicians rediscover their work and get the credit.

michaelt3y ago

NSA mathematicians still get some recognition though:

* Fellow NSA employees, your coworkers and boss

* Cash money, my personal favourite form of recognition

* Your close friends and trusted loved ones will know the broad outlines of what you're doing.

It doesn't make you world-famous but it wouldn't be as lonely as a job that needed total secrecy.

genewitch3y ago

Is this really a bad thing, though? I mean, personally, recognition seems important. I've withheld patenting certain inventions that became commercial products a half dozen times in my life.

That some specific instance of a discovery or whatever becomes the mainstream version is, well, it's irrelevant. Who discovered calculus? It doesn't actually matter because calculus works without some belief system and worship. Traffic routing algorithms? yes, if the person is alive and kicking, being able to lay claim to some algorithm or novel solution is a CV bullet point, but, and i say this with the utmost respect: most people are one hit wonders. If they can ride that "fame" to higher pay or respect, cool. But in the grand scheme, it's irrelevant. Ideas should be spread far and wide, so that people who have a greater understanding can explain the ideas to those without an understanding.

Capitalism is the problem.

madmax1083y ago

One of my proudest achievements-that-dont-count-for-much-outside-of-internet-nerd-innercircles is maintaining a seed ratio of >7 over a decade and a half in a world where most folks are happy with a seed ratio of 0.1

genewitch3y ago

I seed until 2.0 or i move the files to their proper locations, whichever comes last. I don't set time limits or anything. I have the unadulterated bandwidth to do this, even if it takes a year or more. My main seedbox actually is on the fritz, and i have a small feeling of guilt that if i am too lazy to fix the box that has the .torrent files, that dozens of people will miss out.

I hand-ripped and released the netflix wii disc once upon a time, and my seed ratio on that was astronomically high.

jrochkind13y ago

I think you can still get a reddit account without giving them even an email address, and definitely not a phone number.

midoridensha3y ago

Or you can use a fake email address like from mailinator.

steve_john3y ago

Yes, you are right: you can get a reddit account without a phone number.

MasJ3y ago· 3 in thread

As the founder of emuparadise some 22 years ago, I can relate. I got into retrogames because I never got to play those games growing up in India. I thought, well let me archive these games and make them available for everyone else to play.

It was wildly successful. At its zenith EmuParadise was ranked 700 or so as per Alexa on the entire internet. We're talking millions of visitors per day and thousands of active users every single second. I ran it all by myself with an entire team of moderators, contributors, etc.

It did have ads. Heck, our server bills were in the range of tens of thousands of dollars a month. How could I pay for that without having ads on the site? Then we're in commercial copyright infringement territory. Basically if you get sued, you can go to prison, and you will be bankrupted for sure. At the time there were no torrents, no IPFS, no distributed hosting solutions in any case.

As time went by the stress became enormous. Of course threatening letters and DMCA takedown notices were the norm. And the fact that the site was hugely popular and government agencies such as the FBI could get involved at the behest of Nintendo et al just made it worse. But also keeping it online, through various CDNs, trying to keep it anonymously run at all times (my OpSec was terrible starting out, it started in the year 2000), keeping servers online and uptime to almost 100% and bandwidth flowing and hard drives spinning and RAID arrays working. It was a whole lot of everything all at once and I was just one guy doing it all.

After another website Loveroms got sued by Nintendo in 2018 (for $12MM) I decided I had had enough. Reading stories like the kickasstorrents guy getting arrested while on holiday with his wife and kid, loveroms getting sued, I decided that this was the end of the road for me. I pulled all the games from the site. Eighteen years of work down the drain.

My mental health had suffered tremendously, I was depressed and anxious almost all the time. The sight of a police officer on the street would set me panicking. The cost was too high.

Was it a blast? Oh yes it was. I used to receive thousands of emails from grateful people. Cancer patients who reminisced in their last days playing video games from their childhood, soldiers at war whose only escape was a few rounds of Bomberman (the irony is not lost on me), and so many more beautiful stories of nostalgia and connection.

But current copyright law is going to destroy all this art and culture. There is no real legal way to preserve it. And people like me may do it for a long while, but at what cost to ourselves? I firmly believe that a 7-10 year copyright (extendible even somehow? debatable) would be fair and would let authors get what they need out of their creations. It would help us preserve all this beautiful art and culture that we have enjoyed and share it with future generations.

I would love for a human kid living on a distant exoplanet in the far future be able to play Chrono Trigger and wonder about the history of the earth and our stories.

justin_3y ago

I just want to say thank you for creating emuparadise. As a kid (~2005) searching for ROMs online, I remember finding a ton of ad-ridden fake sites, endless demands for "voting" on link aggregator sites, and malware downloads. Emuparadise was like a breath of fresh air compared to those sites, and it basically instantly became my favorite ROM site. While not perfect, it actually had all the games I was looking for, and the community actually seemed to care. I was able to play so many classic games that way that I otherwise never would have had access to. (Including Chrono Trigger :)

Emuparadise is also the site that introduced me to BitTorrent, and my very first torrent was downloaded from there. That would get me interested in file sharing. In some way, it's partly responsible for why I'm interested in archiving and links like the OP. I'm sure I'm not the only one.

So, thank you for creating such a wonderful library and community back then! It was a great part of my childhood and adolescence, and it showed me how important preservation and sharing can be.

MasJ3y ago

Oh yes, I always tried to keep it friendly towards our users. There were too many sites out there that just kept you going in loops forever to get to whatever you wanted. Thanks for noticing that :)

Our bittorrent tracker was very short lived. It did well and had some pretty good sets on it! But in 2010 when bittorrent was getting a really bad name right after The Pirate Bay case it was easy to get torrent trackers shut down.

One day I got a downtime alert, I think it was mid-2010. I checked the site, gone. The server, unresponsive. I got in touch with the host. He said he'll check with the data center. After a while he got back to me and said: "German police came in and seized your servers." There had been no notice, no warning, no nothing. Just boom, and gone. I asked him: "How can they do this? What do we do?". He said: "Nothing, they just come and take whatever they want every now and then."

I hired a lawyer in Frankfurt to go and check on the case. He said that they had closed the case with no further process because the person in question was unknown. And he ended that email with: "But Nintendo may try something else".

Until that moment, I had no idea that Nintendo was behind the server seizure. I was relieved that the case was closed. Anyway, I still went ahead and resurrected the site sans bittorrent tracker. YOLO and all that.

For the next 8 years, we never really had much trouble after that except the usual DMCA takedown notice here and there and a threatening legal letter sometimes. But pressure kept piling up. I did consider myself small and unimportant fish to fry (compared to say, The Pirate Bay or even current gen videogame piracy websites) but that didn't stop them from going after LoveROMs.

There was always the chance that one day they would just catch me at an airport or immigration (like the kickasstorrents dude) or something and that would be it. Or the police would just knock at my door. I mean, they would have to know who I was provably but I don't think it would be that hard for a government agency. It was just a matter of time that the powers that be would need to lobby the government to get at me.

I didn't want to live my life like that any more.

birracerveza3y ago

The creator of Emuparadise? Feels like meeting a celebrity! Certainly one of my childhood-defining websites. And no, I did not delete my ROMs after 48h :D

I'm glad you didn't suffer consequences from it. Thank you so much for your work!

chatterhead3y ago· 2 in thread

While reading this I realized that the first impression for 'Pirate Archivists' that I was exposed to were the bums in Fahrenheit 451 who memorize books so they can't be burned.

I never realized that was my first true introduction to piracy. Really enjoyed the write up!

totetsu3y ago

200 years ago the Grimm brothers collected tales that had been memorized and shared for generations in the German oral tradition and made them into a book of fairy tales. 70 years ago Walt Disney made some animated movies based on these fairy tales. Today sharing a copy of Cinderella is Piracy.

justsomehnguy3y ago

> sharing a copy of Cinderella is Piracy

with Disney artwork.

mdaniel3y ago· 1 in thread

I'm not in the pirate archivist space, but sections 3 and 5 are relevant to my interests. I've had great luck with ZAP (https://github.com/zaproxy/zaproxy#readme) glued to a copy of Firefox (because it allows monkeying with the _browser_'s proxy without having to alter the system one as other browsers do) for archiving all content seen while surfing around a site. It even achieves the stated goal of preserving the HTML (etc) in a database since ZAP uses hsqldb

Then, section 5 reads like an advertisement for Scrapy since it is just stellar at following all pagination links and then either emitting the extracted payload as your own data structure and/or by telling Scrapy you want to download some media as-is. It will, by default, put the local content in a directory of your choice and hash the url to make the local filename. A separate json file serves as the "accounting" between the things it downloaded and their hashed on-disk filename
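
A standard-library sketch of that naming idea, to show what the "accounting" amounts to. This is not Scrapy's code; the function names and the example URLs are hypothetical, and Scrapy does the equivalent for you inside its media pipelines.

```python
import hashlib
import json
import os
from urllib.parse import urlparse

def local_name(url: str) -> str:
    """Derive a stable on-disk filename from a URL: SHA-1 hex digest plus the original extension."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    extension = os.path.splitext(urlparse(url).path)[1]
    return digest + extension

def manifest(urls) -> str:
    """JSON mapping each source URL to its hashed local filename."""
    return json.dumps({url: local_name(url) for url in urls}, indent=2)

# Placeholder URLs for illustration only.
urls = [
    "https://example.com/papers/one.pdf",
    "https://example.com/papers/two.pdf",
]
print(manifest(urls))
```

Hashing the URL keeps filenames safe and collision-free in practice, at the cost of needing the JSON mapping to find out what a given file actually is.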

Scrapy is also able to glue 3 and 5 together because it has a pluggable (everything, heh) dupe detection hook and also HTTP cache support that can be backed by anything, including the aforementioned hsqldb operating in network mode. Scrapy is also very test friendly, since each method accepts a well known python object and emits either a follow-on request, zero or more extracted objects, or nothing if pagination has ended. Thus, one need not rerun the whole scraping process just to test if a bug has been fixed, or during development

I can appreciate there may be other scraping frameworks, but of the ones I've tried Scrapy makes everything that I've asked it to do simple and transparent

genewitch3y ago

This is extremely relevant to my interests. I will try ZAP with a VM that has the explicit purpose of mirroring all content i view "online" within that VM.

there's the web archival projects - that i cannot remember right now - that have some sort of proxy front end, but realistically, it should be possible to record the "content" portion of all web interactions, without relying on such dalliances as OCR and screengrabs or even OBS studio or a screen recorder.

Sometimes i go on a deep dive of some concept, and when i am done i feel i have a decent enough understanding to explain the concept to an adult, and sometimes i do a deep enough dive to explain to a 6 year old. I'd like to archive the entire "session" that got me there. Ideally as plaintext, but never have i wanted video documentation. I only ever use video to prove to someone that their service is acting up, since audio/visual desktop captures can do that, without cheating and provably.

mrfinn3y ago· 1 in thread

I keep feeling we shouldn't accept the term "piracy" anymore. The problem, the big problem is on the so-called "legal" side, and the purpose of this system is not about retrieving authors anymore, is about some big economic groups hoarding goods (and power by doing that). But that's heavily against the common interest. I met quite a few years ago with a member of my country's senate with a solid proposition to end the "piracy" problem. Got an email asking for more info about my proposal. That was the end of it.

PS. Maybe instead of "pirates", we should call ourselves "keepers".

genewitch3y ago

The "DNA" of pirate-ish-ness has been baked into the US cultural zeitgeist since the 1860s. Freebooters, filibusters, the US excursions into Central America in the 1850s, and the overall Dutch influence on our language and culture still capture some part of us, whether we want it to or not. We're not averse to pirates, even though pirates are universally bad. Unless it's Jack Sparrow. I guess.

O__________O3y ago

>> That secrecy, however, comes with a psychological cost.

For someone concerned about OpSec, being acknowledged is a minor, if not completely unimportant, issue. In my experience the grind of maintaining OpSec is mind-numbing for most people, especially over an extended duration. One minor slip ends it all, and the risk of slipping increases with the significance of the related operations, since more eyes increase the odds that someone will notice something, they'll be forced into unfamiliar situations, etc.

Beyond that, research shows that the odds of being discovered grow as more people know:

https://www.bbc.com/news/science-environment-35411684.amp
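As a rough illustration of why the odds grow with headcount, here is a deliberately simplified sketch. It is not the model from the linked article; it assumes every person who knows leaks independently with the same small probability per year, and the numbers passed in are made up for illustration. Under that assumption the chance of at least one leak is 1 - (1 - p)^(N·t) for N people over t years.

```python
def leak_probability(people: int, p_per_person_year: float, years: float) -> float:
    """Chance of at least one leak, assuming independent, identical per-person yearly odds."""
    if people < 0 or years < 0 or not 0.0 <= p_per_person_year <= 1.0:
        raise ValueError("people and years must be non-negative; p must be in [0, 1]")
    # Probability that nobody leaks in any year, subtracted from 1.
    return 1.0 - (1.0 - p_per_person_year) ** (people * years)


# Illustrative only: a 1% per-person yearly leak chance over five years.
for n in (1, 5, 50):
    print(n, round(leak_probability(n, 0.01, 5), 3))
```

Even with a small per-person probability, the result climbs quickly as either the number of people or the duration increases, which matches the point about extended operations above.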

the-printer3y ago

Feels as though an organization such as this should have more domain-appropriate points of contact than Twitter or Reddit.

A very interesting thing nonetheless.

justshowpost3y ago

I just dropped in here to praise archivists and their merit in general. I value the preservation of content (regardless of its perceived quality) far more than the legal or even ethical problems associated with it.

Anecdote: Remember when Microsoft Corp. declared that they love open source software and launched the CodePlex platform, then lost their business interest in it (when they bought GitHub) and completely erased the CodePlex archives? I was able to reach several long-forgotten projects I was interested in thanks to the invaluable work of independent volunteer archivists. (It was quite a tough manual job for me: I had to download the database, then locate the desired archive segment, and only then could I transfer the required files via the BitTorrent protocol.)

standup3y ago

Having a local copy of an entire ebook archive is one way we can find information without having to use the Internet. Thus we can avoid being subjected to mass surveillance, which is excellent. I wonder if the archive is full text searchable?

Finally an alternative to this Orwellian nightmare we call the Internet. Can't wait to have a copy at home, and there will probably be times where I'll be pulling the plug on the router with relief. And it's one more step towards reducing my Internet usage, thus keeping the government and corporations out of my life.

tenacious_tuna3y ago

This article has me curious as to most people's "op-sec" around personal piracy practices, e.g. torrenting. Do people take requests from family members? How restrictive are you with these behaviors, especially when backed by something like Plex (which presumably just directly erodes any other opsec you may be practicing)?

pwdisswordfish03y ago

Would you be open to using an apostrophe ’ in your header instead of the straight single quote? I dig the Comic Sans, but it would look so much better. Cheers lol


