If you wish to donate bandwidth or storage, I personally know of at least a few mirroring efforts. Please get in touch with me at legatusR(at)protonmail(dot)com and I can direct you to the people behind them.
If you don't have storage or bandwidth available, you can still help. Bookwarrior has requested help [1] in developing an HTTP-based decentralization mechanism for LibGen's various forks. Those with software experience can help ensure these invaluable archives are never lost.
Another way of contributing is by donating bitcoin, as both LibGen [2] and The-Eye [3] accept donations.
Lastly, you can always contribute books. If you buy a textbook or any other book, consider uploading it (scanning it first, if it's a physical copy) in case it isn't already in the database.
In any case, this effort has a noble goal, and I believe people of this community can contribute.
P.S. The "Pirate Bay of Science" is actually LibGen, and I favor a title change (I posted it this way to comply with HN guidelines).
[0] http://185.39.10.101/stat.php
[1] https://imgur.com/a/gmLB5pm
[2] bitcoin:12hQANsSHXxyPPgkhoBMSyHpXmzgVbdDGd?label=libgen, as found at http://185.39.10.101/, listed in https://it.wikipedia.org/wiki/Library_Genesis
[3] Bitcoin address 3Mem5B2o3Qd2zAWEthJxUH28f7itbRttxM, as found at https://the-eye.eu/donate/. You can also buy merchandise from them at https://56k.pizza/.
Edit: Found the other comment where you link to the seeding stats: https://docs.google.com/spreadsheets/d/1hqT7dVe8u09eatT93V2x...
There's no easy solution for scanning physical books, is there?
[1] http://1dollarscan.com/ (no affiliation, just a satisfied customer; note they can't scan certain textbooks due to publisher threats of litigation)
Source: http://gen.lib.rus.ec/dbdumps/
Continuing to be dense, why is there a difference between their "database dump" and the total of all the files they have?
Thus: 32 TB of books (over 2 million titles), but only a 3.2 GB database. The dump contains the metadata, not the files themselves.
A more advanced version of this architecture is used by pirate addons for the Kodi media center software. Basically, you have a bunch of completely legal and above-board services like IMDb that contain video metadata. They provide the search results, the artwork, the plot descriptions, episode lists for TV shows, etc. Impossible to sue and shut down, as they're legal.

Then you have a large number of illegal services that essentially map IDs from websites like IMDb to links. Those links lead to websites like Openload, which let you host videos. They're in a gray area: if they comply with DMCA requests and are in a reasonably safe jurisdiction, they're unlikely to be shut down.

On the Kodi side, you have a bunch of addons. There are the legitimate ones that access IMDb and give you the IDs, the not-so-legitimate ones that map IDs to URLs, and the half-legitimate ones that can actually play stuff from those URLs (not an easy task, as websites usually try to prevent you from playing anything without seeing their ads). Those addons are distributed as libraries and are used as dependencies by user-friendly frontends. Those frontends usually depend on several addons in each category, so if one goes down, all the others still remain (the fallback pattern is sketched below).

It's all so decentralized and ownerless that there's no single point of failure. The best you can do is kill the frontend addon, but it's easy to make a new one, and users are used to switching them every few months.
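To make the fallback pattern concrete, here's a minimal sketch in Python (Kodi addons are written in Python, but the resolver names and functions here are invented for illustration, not real addons):

    # Hypothetical resolvers: each maps an IMDb ID to a playable URL, or None.
    # Real resolver addons would do this against hosters like Openload.

    def resolve_with_addon_a(imdb_id):
        return None  # stub: imagine this addon has been taken down

    def resolve_with_addon_b(imdb_id):
        return "https://example-hoster.example/stream/" + imdb_id  # stub

    RESOLVERS = [resolve_with_addon_a, resolve_with_addon_b]

    def resolve(imdb_id):
        """Try each resolver in turn; one addon dying doesn't break the frontend."""
        for resolver in RESOLVERS:
            try:
                url = resolver(imdb_id)
                if url:
                    return url
            except Exception:
                continue  # addon broken or gone; fall through to the next one
        return None

    print(resolve("tt0111161"))  # served by whichever resolver still works

The frontend stays useful as long as at least one resolver in the list is alive, which is exactly why takedowns don't stick.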
Just like any other distributed system, this is vulnerable to organized takedowns and scare tactics. There were a whole bunch of mirrors of The Pirate Bay, yet once most of Europe's legal systems adopted the "sharing is theft" mindset, it became pretty much impossible to find one.
A single decentralized service, providing access to all content, national and international, free of DRM, for all platforms, at a proper, fair, and non-monopolistic price.
That would pull all the users who are willing to pay for content over to the paid service, and those who remain were never going to pay regardless of what you did.
Alas, the Yongle Encyclopedia is almost completely lost now. Archiving is harder than you think.
Preservation is easy if you don't get invaded.
All joking aside, I do wonder whether digital or analogue formats are better able to survive into the distant future.
* What impact will DRM have on the accessibility of our knowledge to future historians?
* Is anything recoverable from a hard drive or flash media after 500 years in a landfill?
* Will compressed files be more or less recoverable? What about git archives?
* Will the future know the shape of our plastic G.I. Joe toys but not the content of the G.I. Joe cartoon?
There are 5000 year old clay tablets we can still read.
There are centuries old documents on paper, vellum etc. that we can still read.
I personally have decades-old paper documents I can easily read, and a box of floppies I can't.
It's not just a problem of unreadable physical media, I have a database file on a perfectly readable HD that was generated by an application that is no longer available. I might be able to interrogate it somehow, but it won't be easy.
Digital formats and connectivity make LOCKSS easier, so that's a plus. There's less chance of a fire or flood or space-limited librarian destroying the last known copy. However, without archivists actively transforming content to new formats as required, it might only take a few decades before a lot of content starts to require a massive effort to read.
Let's say the probability that a single copy of a physical book survives 1,000 years, is found, and is understood by an archaeologist is pB, and that the corresponding probability for a single copy of a book on an SSD is pD. Even if pB is far larger than pD, there might be so many more copies of a given book held on SSDs that the book is more likely to survive via an SSD than via a physical copy. On the other hand, the technology to recover data from SSDs might not exist in 1,000 years.
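To make the trade-off concrete: the chance that at least one of N independent copies survives is 1 - (1 - p)^N. The numbers below are purely illustrative assumptions of mine, not data:

    % Survival via at least one of N independent copies:
    \[
    P_{\text{survive}} = 1 - (1 - p)^{N} \approx N p \quad \text{when } Np \ll 1
    \]
    % Assumed numbers: physical p_B = 10^{-6} with N_B = 10^{3} copies;
    % SSD p_D = 10^{-9} with N_D = 10^{9} copies.
    \[
    N_B \, p_B = 10^{-3}, \qquad
    1 - \left(1 - 10^{-9}\right)^{10^{9}} \approx 1 - e^{-1} \approx 0.63
    \]

So even with a per-copy survival probability a thousand times worse, the SSD copies win on sheer numbers, provided someone can still read an SSD at all.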
It could also be the case that each generation copies these books onto new digital media, providing an unbroken chain of copies. The oldest copy of the Iliad is the Venetus A, which dates from around 1000 AD (1,000 years ago), even though the Iliad was probably first written down around 800 BC (2,800 years ago). It was copied from earlier copies of copies of copies.
I really don't know how this will play out and I've been unable to find research on how long SSD and flash memory based media survives especially if buried in a landfill.
* - If archaeologists exist in the future. The current push from the STEM boosters to defund and de-emphasize the humanities may result in a near-future without archaeologists or funded archaeological projects. Over 1,000 years the entire field could die.
Redundant, shared servers ARE a forever solution. Making sure your data is on one of the ones that makes it seems like a vastly easier proposition to me than writing data to clay tablets and trying to keep those from ending up in a dump somewhere.
If you found a mysterious archive object and had no idea what it was - CD-R, hard drive, SSD, whatever - not only would you have to reinvent an entire hardware reader around it, you would also have to work out the file structure, extract the data (some of which could be damaged), and reverse engineer the container file formats and the data structures inside them.
If you got all of that right, you'd eventually be able to start trying to translate the content of the text, audio, images, videos (how many compression formats are there?) into something you could understand.
A much more advanced civilisation would struggle with making a cold start on all of that. In our current state, we'd get nowhere if we didn't already have some records explaining where to begin.
1. Even if the CD-R has been crushed and shattered, you could use a modern, cheap microscope to read continuous pits and lands off the disk [0,1]. It would be clear to anyone familiar with information theory how to translate the pits and lands into a series of arbitrary symbols which encode data.
2. This data would at first be meaningless. However, the mathematical relationships of a simple error-correcting code would stand out. This would allow them to recover corrupted data. Once the error-correcting code was stripped out, they would have a transcript of the raw data.
3. They would notice a pattern in the data. There would be long high-entropy regions and then very short low-entropy regions. They would probably notice that some of the low-entropy regions had every 8th bit set to zero (7-bit ASCII), and that, taken in 8-bit chunks, these regions had roughly the same number of symbols as the Latin alphabet. If they were familiar with English, they might quickly decode these regions using letter-frequency correspondence with another English text (the entropy-scan sketch after the references below illustrates the idea).
4. The high-entropy regions would be far harder to decode. However, these future archaeologists would be faced with the obvious data patterns of the frames of an MP3 (see the sync-word sketch below). Decoding the first MP3 would be a serious project involving many institutions over many years, but once it was done it would allow the decoding of all artifacts that use MP3 and related encoding formats. Possibly someone would find a "rosetta file" [2], a disk that contained both a .wav file and an encoded MP3 of the same song. More likely someone would find an MP3 player and then reverse-engineer the decoding algorithm.
[0]: "Being able to see the tracks and bits in a CD-ROM" https://superuser.com/questions/870776/being-able-to-see-the...
[1]: "CD-ROM Under the Microscope" https://www.youtube.com/watch?v=RZUxemOE07Q
[1] https://www.lockss.org/ ; https://en.wikipedia.org/wiki/LOCKSS
If we can't effectively warn a future (>10,000 years) generation to stay away from something that may harm or kill them, what chance do we have of making a universally understandable archive of data?
Just about everybody in academia uses it, too, especially in the case of Sci-Hub. I can't imagine taking the time to actually check whether I have access to some journal when I want to read a paper, let alone jumping through all the hoops to get the PDF. The first thing we did when my partner's paper was recently published was check to see if it was on Sci-Hub yet. (It was!)
Today I learned that Library Genesis is actually "powered by Sci-Hub" as its primary source.
So I guess they're sister projects by similarly minded people (who seem to be mostly/originally based in Slavic countries, which I find interesting culturally - perhaps it's due to a looser legal environment + activist academics?).
> Just about everybody in academia knows about it.
That really says something about the state of society, this tension between copyright laws (and the motivations behind them) and the intellectual ideal of free and open access to knowledge.
You see the same situation in Asia: it's a collectivist culture, and they have a very different perspective on IP in general.
I realize that doesn't solve the access problem for most people, as most of the users who need this research might not know how to use Usenet or even be familiar with it at all, but I think the first major concern is to secure the entire repository on a stable network. Usenet seems like a good place for that even if it doesn't serve as a means of distribution. Encrypting the uploads would make them immune to DMCA takedowns, provided that the decryption keys weren't made public and were only shared with individuals involved in the maintenance of the LibGen project.
However, in my personal experience, I have seen no issues downloading old data from any binary group. At least not with the provider I have. In fact, just this past week I obtained something sizable (several GB) with no damaged parts, so I didn't even need the parchive recovery files at all. This has always been my experience. I've never seen anything like the pruning you're talking about; that sounds more like an issue with your specific provider to me.
The hours that LibGen saved me in gathering all the sources for my research must be in the hundreds. Thank you!
That might change though as people start including video + data within papers and have new notebook formats that are live and contain docker containers/ipython, etc.
It's a shame we can't just mail these around.
I picked up 32TB for just under $500 with discount over the holiday that way.
Encrypted shards partially solve this, but then you hit the quandary of "But what if I have a shard of something illegal, or undesired enough to upset the wrong people?", which has not been thoroughly tested in our legal system.
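A minimal sketch of the shard idea, assuming Python's cryptography package; without the key, each shard is indistinguishable from random bytes:

    from cryptography.fernet import Fernet

    def make_shards(plaintext, shard_size=1 << 20):
        """Encrypt, then split the ciphertext into fixed-size shards."""
        key = Fernet.generate_key()
        ciphertext = Fernet(key).encrypt(plaintext)
        shards = [ciphertext[i:i + shard_size]
                  for i in range(0, len(ciphertext), shard_size)]
        return key, shards

    def restore(key, shards):
        """Reassemble the shards in order and decrypt."""
        return Fernet(key).decrypt(b"".join(shards))

Whether hosting such an opaque shard actually shields you legally is, as you say, the untested part.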
For example right now in Germany I can get a WD 8TB USB 3.0 drive for 135€ but the cheapest internal 8TB drive costs 169€.
Any idea why? It's puzzling.
https://www.instructables.com/id/How-to-Fix-the-33V-Pin-Issu...
In large ZFS arrays, many people are using them with great success, at no greater or lesser annual failure rate than the expensive enterprise hard drives.
I've read these reports as well, but I can say that it's not my experience (we've gone through a few rounds of shucking at the Internet Archive, for economy and in one case necessity after the 2011 Thailand floods pinched the supply chain). Our raw failure rates on shucked drives are significantly higher, and the drives themselves are typically non-performant for high-throughput workloads (often being SMR disks/etc, though hopefully the move away from drive-managed SMR will finally kill that product category off).
Of course, that wouldn't explain the difference between a WD external drive and that same drive as an internal drive - assuming that WD actually manufactures both (and doesn't just license the name provided the 3rd party uses their drives)...
EDIT: To check the health of LibGen's torrents, see this Google sheet: https://docs.google.com/spreadsheets/d/1hqT7dVe8u09eatT93V2x...
Thanks frgtpsswrdlame for the heads up.
I want a full mirror, and ain't nobody got time to deal with 2000 torrents, many of which have no seeders. That's a really dumb way to run this particular railroad.
The frontend would still be a user-friendly HTTP web-application (or collection of several) that pulls (portions of) the archive from the distributed/resilient backend to serve individual files to clients.
The backend can be a relatively obscure, geeky, post-BitTorrent p2p software like IPFS or Dat, as long as those willing to donate bandwidth/storage can run it on their systems. This is a vastly different audience from 'most people'.
The real question is which software's features best fit the backend use-case (efficiently hosting a very large, growing/evolving, IP-infringing dataset). Dat [1] has features to (1) update data and efficiently synchronize changes, and (2) efficiently provide random access to larger datasets. Two quite compelling advancements over BitTorrent for this use-case.
[1] https://docs.datproject.org/docs/faq#how-is-dat-different-th...
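For a taste of what the backend workflow could look like, here's a sketch using IPFS via the Python ipfshttpclient package (this assumes a local IPFS daemon is running, and the filename is made up; Dat would be the analogous flow with its own tooling):

    import ipfshttpclient

    # Connect to a locally running IPFS daemon on its default API port.
    client = ipfshttpclient.connect("/dns/localhost/tcp/5001/http")

    # A donor adds a file; the returned content ID is its permanent,
    # location-independent address.
    result = client.add("some-book.pdf")  # hypothetical filename
    cid = result["Hash"]
    print("available as", cid)

    # A frontend (or anyone) can later fetch the same bytes from
    # whichever peers happen to hold them.
    data = client.cat(cid)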
There's also ZeroNet, though IDK if it can handle the traffic.
Keep in mind that they don't know our written or computational language and there's nothing about our technology that is inherently self-explaining/obvious.
Even the assumption that they'd use binary computers (rather than trinary, or other technology not based around electrical voltages) is open to debate.
If you assume motivated readers and human-level intelligence, you could end up with good results. It might take a decade or three, and a lot of mental firepower, but they could get there.
(The outer layer is the hardest, since our information density is lowest. Our "description of how to build a magnifying glass" might cover just the basic optics of curved glass and a very basic description of how to get to glass and how to curve it correctly, leaving a lot of the details up to the finder. After all, we did it without help. We're not so much trying to solve this problem for the finder as help them on their way.)
So, before jumping in to argue, remember I'm stipulating decades of dedicated effort by presumably an interested consortium of... whatever they are. I think we can safely stipulate an amount of effort at least as large as our society has dedicated to, say, Linear A and B, or the Voynich manuscript. I'm not trying to spec "Ugh wanders out of the jungle, sees our pretty rock, and personally has a 20th century civilization up and running in 10 years" or anything crazy.
Not necessarily for aliens ... but why not keep a backup in a safe place, outside the dangers of earthlings?
OTOH, I think a sufficiently advanced alien intelligence will be able to decipher the information structures we use regardless of differences in technology. It's possible, though, that there will be missing links in that archive, which will need to be supplemented with a primary, secondary, and high school curriculum.
If one were to receive an object, how would it be indicated that there is a message embedded in it? And given that an intelligence could recognize that there was an embedded message, could it eventually be deciphered?
- A tiny well behaved client that starts with the OS.
- It downloads rare bits of the archive at 1 kB/s, obtaining 1 GB every ~278 hours. It should stop somewhere around 100 MB to 5 GB.
- It periodically announces what chunks/documents it has.
- It seeds those chunks at 1 kB/s.
- Chunks/documents that have thousands of seeds already are not announced. Eventually those are pruned.
This scales the effort to the point where everyone can help without it costing them anything.
If someone is trying to obtain a 20 MB PDF, it would take about five and a half hours from a single 1 kB/s seed. With just 50 seeds, it's about 7 minutes.
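The arithmetic behind those figures, as a quick sanity check (plain Python; no assumptions beyond the numbers above):

    def transfer_time_s(size_bytes, seeds, rate_bps=1000):
        """Seconds to fetch a file when each seed contributes 1 kB/s."""
        return size_bytes / (seeds * rate_bps)

    GB, MB = 10**9, 10**6

    print(transfer_time_s(1 * GB, seeds=1) / 3600)   # ~277.8 h: 1 GB, one seed
    print(transfer_time_s(20 * MB, seeds=1) / 3600)  # ~5.6 h: 20 MB PDF, one seed
    print(transfer_time_s(20 * MB, seeds=50) / 60)   # ~6.7 min with 50 seeds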
Here's a blog post about our datastores for some background.
https://getpolarized.io/2019/03/22/portable-datastores-and-p...
... essentially Polar is a PDF manager and knowledge repository for academics, scientists, intellectuals, etc.
One secondary challenge we have is allowing for sharing of research but I'd like to do it in a secure and distributed manner.
Some of our users are concerned about their eBooks being stored unencrypted, and while for the majority of our users this will never be a problem, I can see this being an issue in countries with political regimes that are hostile to open research.
In the US we have an issue of researchers being harassed over climate change btw. Having a way to encrypt your knowledge repository (ebooks) would help academic freedom as your employer or government couldn't force you to give them your repository.
But what if we went beyond this and provided a way to ADD documents to the repository from a site like LibGen?
Then we'd have the ability to easily, with one click, encrypt the document (end to end) and add it to our repository.
If we can add support for Polar to allow colleagues to share directly, this would be a virtual mirror of LibGen.
Alice could add books b1, b2, b3 to her repo and then share them with Bob; only he would be able to see b1, b2, b3, because the two of them would generate a shared symmetric key for exchanging the books.
No 3rd party (including me) would have any knowledge of what's going on.
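A minimal sketch of that flow, again assuming Python's cryptography package; the function names are hypothetical, not Polar's actual API, and in practice the shared key would be exchanged via public-key crypto rather than handed over directly:

    from cryptography.fernet import Fernet

    # Alice and Bob agree on a shared symmetric key (exchanged out of band).
    shared_key = Fernet.generate_key()

    def share_book(book_bytes, key):
        """Alice encrypts a document so only key holders can read it."""
        return Fernet(key).encrypt(book_bytes)

    def receive_book(token, key):
        """Bob decrypts with the same shared key."""
        return Fernet(key).decrypt(token)

    b1 = b"%PDF-1.4 ..."  # stand-in for a real document
    assert receive_book(share_book(b1, shared_key), shared_key) == b1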
I'm going to assume our users are not going to do anything nefarious or pirate any books. I'm also certain that they're conforming to the necessary laws ...
The challenge though is that while we'd be able to have a mirror of LibGen and more material, it would be a probabilistic mirror - I'm sure we'd have like 60% of it but the obscure material wouldn't be mirrored.
Right now our datastores support just local disk, and Firebase (which is Google Cloud basically). While we would encrypt the data end to end in Google Cloud I can totally understand why users might not like to use that platform.
One major issue is China where it's blocked.
Something like IPFS could go a long way to solving this but it's still very new and I haven't hacked on it much.
While I do understand your point, it still does not justify encouraging modern-day Robin Hoods and breaking the law.