Indeed, what an intellectual tragedy...
> In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.
That seems like a surprisingly "small" number.
Well, when I try to picture a physical library with 130 million books, maybe that's a realistic estimate. But compared to, say, the recently discovered data hoard of more than 2 billion online identities, it's minuscule.
SciHub and LibGen are truly the modern-day Library of Alexandria. The fact that they're being called "Pirate Bays of Science" - and that providing free and open access to all books in the world is illegal - just goes to show that our civilization's priorities are misdirected.
- The total number of books -- not titles, but actual bound volumes -- in Europe as of 1500 CE was about 50,000. By 1800, the total was just under one billion.
- The library of the University of Paris circa 1000 CE comprised about 2,000 volumes. It was among the largest in Europe.
- The Library of Constantinople in the 5th century had 120,000 volumes, the largest in Europe at the time.
- A fair-sized city public library today has on the order of 300,000 volumes. A large university library generally has a million or so. The Harvard Library contains 20 million volumes. The University of California collection, across all ten campuses, totals more than 34 million volumes.
- The total surviving corpus of Greek literature is a few hundred titles. I believe many of those were only preserved through Arabic scholars, some possibly in Arabic translation rather than the original Greek.
- There's an online collection of cuneiform tablets. These generally correspond to a written page (or less) of text, with the largest collections numbering in the tens of thousands of items.
- As of about 1800, the library of the British Museum (now the British Library) had 50,000 volumes. Again, among the largest of its time.
- From roughly 1950 to 2000, about 300,000 titles were published annually in the United States and/or as English-language editions. R.R. Bowker issues ISBNs and tracks this. From ~2005 onward, "nontraditional" books (self- / vanity-published) have been at or above 1 million annually.
- The US Library of Congress, the largest contemporary library in the world, holds 24 million books in its main collection (another 16 million in large type), and has 126 million catalogued items in total (2015).
- At about 5 MB per book, in PDF form, total storage for the 38 million volumes of the Library of Congress would be slightly under 200 TB. At about $50/TB, that's $10,000 of raw disk storage. (Actual provisioning costs would be higher.) Costs are falling at 15%/year.
- Total data in the world comprises far more than books, and has been doubling about every 2 years. Or stated inversely: half of all the recorded information of humankind was created in the past two years.
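For anyone who wants to check the storage arithmetic in the list, here is a minimal Python sketch. It assumes decimal units (1 TB = 1,000,000 MB) and plugs in the figures quoted above (38 million volumes, 5 MB per book, $50/TB, costs falling 15%/year). The function names are made up for illustration, and this covers raw disk only, not actual provisioning.

```python
def storage_estimate(volumes, mb_per_book=5, usd_per_tb=50):
    """Return (total terabytes, raw disk cost in USD) for a book collection."""
    # Assumption: decimal units, 1 TB = 1,000,000 MB.
    total_tb = volumes * mb_per_book / 1_000_000
    return total_tb, total_tb * usd_per_tb


def projected_cost(cost_today, years, annual_decline=0.15):
    """Cost after `years`, assuming a constant fractional decline per year."""
    return cost_today * (1 - annual_decline) ** years


total_tb, cost = storage_estimate(38_000_000)
print(f"{total_tb:.0f} TB, about ${cost:,.0f}")  # prints "190 TB, about $9,500"
```

That matches "slightly under 200 TB" and roughly $10,000. If the 15%/year decline held, `projected_cost(cost, 5)` would put the same disk at a bit under half of today's price five years out.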
Sources:
Some of this is off the top of my head, but partial support for the facts from:
https://en.wikipedia.org/wiki/History_of_printing#/media/Fil...
https://en.wikipedia.org/wiki/History_of_libraries
http://www.bowker.com/tools-resources/Bowker-Data.html
https://www.loc.gov/item/prn-16-023/the-library-of-congress-...
https://en.wikipedia.org/wiki/Harvard_Library
https://en.wikipedia.org/wiki/University_of_California_Libra...
https://www.techpowerup.com/249972/ssds-are-cheaper-than-eve...
https://qz.com/472292/data-is-expected-to-double-every-two-y...
> half of all the recorded information of humankind was created in the past two years
That is shocking to imagine, and it's exponentially growing.
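The "half in the past two years" figure is just what a constant doubling time implies. A rough sketch, assuming the total really does double every 2 years (the function name is hypothetical): if the total at time t is proportional to 2^(t/T), the share created in the last w years is 1 - 2^(-w/T).

```python
def fraction_created_recently(window_years, doubling_years=2.0):
    """Share of all data created within the last `window_years`,
    assuming the total doubles every `doubling_years` (pure exponential growth)."""
    return 1 - 2 ** (-window_years / doubling_years)


print(fraction_created_recently(2))   # 0.5
print(fraction_created_recently(10))  # 0.96875
```

So under the same assumption, about 97% of everything recorded would be less than ten years old.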
It reminds me of Vannevar Bush's "As We May Think", pointing out the emerging information overload in society. It certainly puts things in perspective, how we (humanity) have been making a conscious, collaborative effort to develop globally networked computers, one of whose important functions is to help us organize all the information, including books.
The conundrum, it seems, is that technology is also a massive multiplier of the amount of data, so its capacity to help us organize may never catch up with what it's helping to produce.
> total storage for the 38 million volumes of the Library of Congress would be slightly under 200 TB
I guess it's redundant to say, but I'm sure in the near future that would fit on a thumb drive!
I wonder why the Copyright Office didn't just buy Google Books. Wouldn't it only have cost a few hundred million dollars?
This is where the project derailed and never quite recovered.