That said, the rest of the project, which focuses on preserving several independent copies of repositories hosted on GitHub with a handful of partner organizations, is quite useful. From the same post: "They are using a range of technologies, making feeds available over the Internet, and partnering with the Internet Archive, the Software Heritage Foundation and the Bodleian Library. These are mostly things which will get used in the foreseeable future, and should be applauded for that reason."
>> Of course, this is ridiculous. No-one will decode this archive in the foreseeable future.
Yes, no one will be digging code out of GitHub right after the apocalypse. But what about 200 years after the apocalypse? Or maybe just 1,000 years from now, no apocalypse needed? I could see the archive being of immense historical value.
- Thanks to flash memory cell charge leakage, I'd be surprised if the micro-SD card or USB drive kept its data for more than 3-5 years. They're designed for low cost, not longevity.
- The electrolytic caps will probably have dried out and failed by 50-100 years.
- The plasticizers used will have evaporated away by a century, leaving any plastic or rubber components brittle and crumbly.
- The lead-free solders used in modern electronics are prone to the "tin whiskers" phenomenon. Not sure about the mitigations or timeframe for growth, but a couple of centuries is far, far longer than any reasonable design timeframe, making it a distinct possibility in my mind.
- At 1,000 years, I'd wonder about diffusion effects in chips wrecking the circuits. It would be interesting to do a calculation to see how long that would take for an unpowered chip at room temperature.
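For what it's worth, the calculation wondered about in the last bullet can be roughed out with the Arrhenius form D = D0 * exp(-Ea / kT) and a characteristic diffusion length of 2 * sqrt(D * t). This is only a sketch: the parameters below (D0 of about 0.76 cm^2/s and Ea of about 3.46 eV, textbook-order values for boron in silicon) are my assumption, not something from the thread, and extrapolating fits made at furnace temperatures down to room temperature is shaky in itself.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K

# Assumed, textbook-order Arrhenius parameters for boron in silicon.
# They are illustrative only and not taken from the discussion.
D0_CM2_PER_S = 0.76
ACTIVATION_ENERGY_EV = 3.46

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def diffusivity(temp_k, d0=D0_CM2_PER_S, ea=ACTIVATION_ENERGY_EV):
    """Arrhenius diffusivity D = D0 * exp(-Ea / kT), in cm^2/s."""
    return d0 * math.exp(-ea / (BOLTZMANN_EV_PER_K * temp_k))


def diffusion_length_cm(temp_k, seconds):
    """Characteristic diffusion length 2 * sqrt(D * t), in cm."""
    return 2.0 * math.sqrt(diffusivity(temp_k) * seconds)


if __name__ == "__main__":
    for years in (100, 1_000, 1_000_000):
        length = diffusion_length_cm(300.0, years * SECONDS_PER_YEAR)
        print(f"{years:>9} years at 300 K: {length:.2e} cm")
```

With these assumed numbers the diffusion length stays many orders of magnitude below an atomic spacing (roughly 1e-8 cm) even at 1,000 years, so bulk dopant diffusion looks like the least of the problems. Corrosion, metal migration and packaging failures are not modeled here.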
Much of that is "if we forget technology which we realize somewhere down the road we actually might want to use again." History provides plenty of examples of this, and it's particularly important with a technology which mostly lives on ephemeral media that only last a few decades.
Even if you do expand your speculation to post-disaster scenarios, though, while it's true the archive wouldn't be an instant reset button, it would help greatly accelerate the recovery of technology. It's worth noting that it will come with a slew of (human-readable, not encoded) technical works regarding subjects ranging from modern software engineering to microprocessor design to photolithography to power systems, which we call the Tech Tree, along with a guide and index to all the stored repos. Wherever its inheritors / discoverers may be in terms of technological advancement, and especially if they have modern-ish hardware (which can last much, much longer than most storage media), recovering the archive's contents will be a lot faster than rediscovering them from scratch.
(Also worth noting we'll be storing "greatest hits" copies of the ~15,000 most-starred / most-relied-on repos, along with a sampling of several thousand repos with few/no stars, in a selection of places like Oxford's Bodleian Library; our hypothetical future tech seekers won't have to go all the way to Svalbard for those.)
I don't want to stress the doomsday scenarios too much, though, despite our ongoing pandemic. I think the most likely outcome by far is that progress will continue; the archive may be useful to recover a couple of otherwise forgotten technologies that suddenly become important / interesting; and it will ultimately be chiefly of interest to historians. That historical value is a key reason why it casts such a broad net. I too have a couple of fairly unsophisticated pet projects in there that the future won't be interested in individually - but collectively is another matter. One of the most interesting things our advisory committee told us is that history is replete with lists composed by wealthy people of the books they thought most important, carefully preserved for posterity, whereas what modern historians _really_ want is ordinary people's shopping lists, of which almost none survived. That's one reason there are millions of repos in the Arctic now, instead of eg just the most-starred 100K: some of those may be the modern technological equivalent of Renaissance shopping lists, for the historians who may take a particular interest in this (possibly) especially wacky and volatile era.
I know it's an inherently cinematic and dramatic project and so it's tempting to call it a PR stunt ... but I assure you, it's not, and, speaking personally, I would never have gotten involved with it if I thought it was.
There will always be "negative Nancies" -- especially here, they are everywhere -- but personally I'd just like to say thanks for having some vision outside of the normal day-to-day of making money for shareholders and keeping regular customers happy. More of this, please.
From there: "The OSCOMAK project is an attempt to create a core of communities more in control of their technological destiny and its social implications. No single design for a community or technology will please everyone, or even many people. Nor would a single design be likely to survive. So this project endeavors to gather information and to develop tools and processes that all fit together conceptually like Tinkertoys or Legos. The result will be a library of possibilities that individuals in a community can use to achieve any degree of self-sufficiency and self-replication within any size community, from one person to a billion people. Within every community people will interact with these possibilities by using them and extending them to design a community economy and physical layout that suits their needs and ideas. As the internet has grown, it has enabled collaborative work which has created many success stories, including Linux, Python, GCC, Squeak and other projects. We want to harness that power and apply it to organizing technological knowledge in concert with many interested individuals. The main project goal is to develop an on-line library of technology ideas, techniques, and tools, including a range from high-tech processes like plastics to medium-tech like ceramic houses to low-tech like spinning wheels. Also included will be biotechnology processes, like perennial agriculture, companion planting, sheep farming, and eventually cloning and DNA synthesis. One process to be included is a way to convert the high-tech computerized library to a low-tech paper one as desired. Key to the whole endeavor will be to present everything in a how-to fashion. Also needed is a way to map out and simulate the interrelations of processes; for instance, sheep raising requires veterinarians, antibiotics, feed, fencing, and shears; shears require a blacksmith, metal, and a furnace. 
This latter feature also would be used to keep track of the product flows into, out of, and within a community's entire economy."
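The "map out the interrelations of processes" part of that quote is basically a dependency graph. Here is a minimal sketch of walking one to collect everything a process transitively needs. Only the sheep and shears entries come from the quoted text; the dict layout and the function name are hypothetical, and this is not how OSCOMAK itself models anything.

```python
from collections import deque

# Hypothetical process graph. Only these two entries come from the quoted
# text; the dict-of-lists layout is an assumption made for illustration.
REQUIRES = {
    "sheep raising": ["veterinarians", "antibiotics", "feed", "fencing", "shears"],
    "shears": ["blacksmith", "metal", "furnace"],
}


def all_requirements(process, requires=REQUIRES):
    """Return every direct and indirect requirement of `process`."""
    seen = set()
    queue = deque(requires.get(process, []))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # already handled; this also guards against cycles
        seen.add(item)
        queue.extend(requires.get(item, []))
    return seen


if __name__ == "__main__":
    print(sorted(all_requirements("sheep raising")))
```

The same traversal is what you would need to pull in a repo together with everything it depends on.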
Making all of this code essentially useless. You'd need to store those repos and their entire dependency tree.
I forget that they do fundamentally host text, and not video etc.
I somehow thought it would be petabytes. The private repos might be more than that but those are historically paid.
Even a naive deduplication might yield some very interesting results
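To make "naive" concrete: the simplest version is to hash every file's bytes and group identical hashes. A rough sketch, assuming a local directory of checked-out repos. The function names are made up for illustration, and it only catches byte-for-byte identical files, not near-duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map content hash -> sorted paths, keeping only hashes seen more than once."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Git already does something similar internally, since identical blobs share one object ID within a repository; the sketch above just applies the idea across a pile of checkouts.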
Reminds me of a time I caught someone using someone else’s code in an interview and passing it off as their own. (Using was fine, it was the claim that it was theirs that bugged me)
The size of all file contents (including older versions of files) is a few hundred TB, and everything else (directory structures, revision history, etc.) is under 10 TB.
So for GitHub alone it would be a little under that
> We’ve archived 6,000 of the world’s most popular repositories as a proof of concept for future archives.
> The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size.
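As a rough illustration of what that snapshot rule means in practice, here is a sketch that walks a checked-out default branch and skips binaries over the size limit. Everything about it is an assumption on my part: the NUL-byte heuristic for "binary", reading "100KB" as 100 KiB, and the function names. The actual archive tooling is not described in the quote.

```python
from pathlib import Path

MAX_BINARY_BYTES = 100 * 1024  # "100KB" read as 100 KiB; an assumption


def looks_binary(path, probe_bytes=8192):
    """Heuristic (assumed, not the archive's rule): a NUL byte near the start means binary."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(probe_bytes)


def snapshot_files(checkout_dir):
    """Yield files from a checkout, skipping .git and binaries over the size limit."""
    root = Path(checkout_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        if path.stat().st_size > MAX_BINARY_BYTES and looks_binary(path):
            continue
        yield path
```

Note that large text files and small binaries both pass through; only files that are both binary and over the limit get dropped.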
You're free not to care about what situation humans might be in in a thousand years or what they might need then. I can't say I spend any effort day to day working with that distant future in mind myself. But I'm glad there are people in our civilization who do.
kinda with you on that one. looks cool, plausibly useful but we'll see.
> Archive Program director here - the 6,000 repos were on the single proof-of-concept reel we archived last autumn. The full archive consists of millions of repos, including all repos with at least one star with any commits in the year leading up to 02/02/2020.
- capnproto/capnproto
- sandstorm-io/sandstorm
- erlang/otp
(I don't remember the order).
I actively contribute heavily to sandstorm. I've sent patches here and there to capnproto, and it's vaguely a sister project to sandstorm. Those are probably some of the most popular projects I have multiple contributions to, though there are others.
otp feels a bit odd though, if there's an "and more" -- I sent them a one-line patch to fix a build error when building against musl. I haven't really been involved since, nor was I before. But it's a high-profile project.
Ironically, 500 years from now, they may think that the year of the Linux desktop was 2008 :-D
Honestly, nothing scares me more than losing all the code and all the technology we've developed in the past 70 or so years. There's been so much advancement, but it's also transferred in such a way (institutional knowledge, proprietary software, proprietary hardware, etc.) that it's super easy to lose. If we preserve open hardware and software, then we could rebuild in the case of civilizational decline and the accompanying knowledge loss, something which we would neither be the first nor the last to experience.
...can we?
I'm sometimes a little concerned about how complicated chip fabs are. They feel like something that could take generations to rebuild, even if we had all the knowledge on what to do.
Devices would be much bigger and less efficient, but we would be able to run code and pump out 8086 processors within 6 months.
I think we'll be fine (as in, our species will survive). If we lose it all, we can rebuild. We've already proven that we're capable. The code is just a record of our capabilities, not a barrier to entry.
For something that's meant to survive any catastrophe that might happen over centuries to come, it's not a good sign to see that happen so early. It's extra bad to see it driven by a trend, namely global warming, that we're continuing to push farther and farther and have shown few signs of stopping.
Additionally, large-scale GitHub-specific projects like https://gharchive.org (formerly GitHub Archive) have existed for some time.
In my experience, code is more likely than not to be preserved in a stale revision, if at all.
The most common forms of preservation are (a) simple tarballing and (b) git bundles.
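For anyone who hasn't done either, both are a few lines. A minimal sketch, assuming a local clone and `git` on the PATH; the helper names are made up for illustration. `git bundle create <file> --all` is the standard way to capture every ref in a single file.

```python
import subprocess
import tarfile
from pathlib import Path


def tarball_repo(repo_dir, out_path):
    """(a) Pack the whole directory, .git included, into a gzip-compressed tarball."""
    repo = Path(repo_dir)
    with tarfile.open(out_path, "w:gz") as archive:
        archive.add(repo, arcname=repo.name)
    return Path(out_path)


def bundle_command(repo_dir, out_path):
    """(b) Build the `git bundle` invocation that captures every ref."""
    # Resolve the output path, since `git -C` changes the working directory.
    target = str(Path(out_path).resolve())
    return ["git", "-C", str(repo_dir), "bundle", "create", target, "--all"]


def bundle_repo(repo_dir, out_path):
    """Run the bundle command; needs git on PATH and a repo with at least one commit."""
    subprocess.run(bundle_command(repo_dir, out_path), check=True)
    return Path(out_path)
```

The tarball preserves the working tree exactly as it was on disk, stale revision and all, while a bundle keeps the history and can be cloned from directly with `git clone repo.bundle`.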
What is the probability that we still have the required tech to read that code in 1,000 years?
Seems organisation work is ignored and only individual username fork/PRs respected (is this a bug?). Software is teamwork ;)
I mean, awesome-react, tldr-pages or homebrew-cask are probably not unimportant, but that's not where I contributed the most.
Tremendous news.
See https://earth.esa.int/documents/1656065/3222865/170922-Piql-... for piql's storage method.