This is quite disappointing (though probably wise from a legal standpoint). It becomes less useful if a party in a lawsuit can retroactively hide evidence of wrongdoing and thereby deny everyone else access to it.
It's also a problem if a domain name changes ownership and the new owner suddenly adds a restrictive robots.txt - the old content will no longer be accessible even though the current owner has no claim to it.
[1]: http://www.american-justice.org/upload/page/123/69/docket-18...
[2]: https://tonybox.net/tmp/ia_decl_pacer.pdf (downloaded from PACER)
How is this not indirect self-incrimination?
Subpoena?
The only thing that the judge ruled in this case is that it CAN be used.
At least they cannot add content.
And it is easy to tell whether they may have hidden something, since the robots.txt should be public, I hope. Maybe a judge can make them unhide it. I don't know.
These laws specify that the crawls are stored untampered and guarantee that the results can be considered valid evidence.
Interestingly, that's how the Internet Archive's Heritrix crawler came about - the Nordic national libraries were saddled with this requirement but didn't really have the technical infrastructure to implement it. They formed a coalition among themselves and brought the Internet Archive into it (the IIPC[0]), and used it to fund development of Heritrix.
There have been a few cases (in Italy, but not only) where someone sued someone else for defamation (or is it libel?), bringing a screenshot of a tweet, Facebook post etc. as "proof"; all such cases have been dropped because a screenshot cannot possibly be used as evidence.
It is not clear (at least not to me) how someone could proceed in order to obtain proof in these cases.
Both are right. Defamation is a general term that encompasses both libel and slander. Libel is written, slander is spoken. When in doubt, just say "defamation."
> a screenshot cannot possibly be used as evidence
Generally speaking, I see no reason why this should be the case. And screenshots are routinely used as evidence in U.S. courts. Of course, if the opposing side challenges the accuracy of the screenshot, then you'll need to give more evidence (testimony, probably) about how it was produced. But that doesn't mean that screenshots are per se unreliable.
That said, the specific examples of Twitter or Facebook posts are still a problem - those huge centralised services are hard to properly archive without direct assistance from the companies in question. I know some of the national libraries are trying, but it's far from a solved problem.
What use case do you have in mind where it helps if the archive proves timestamps?
(Timestamping can certainly help sometimes, but with an archive you're trusting them anyway.)
Do it right and the blockchain can provide trust by ensuring that the record demonstrates that a sufficient number of other parties have confirmed each part of the record.
You have a point in that if an archive is trusted, the motivations for doing this largely fall away. The problem is of course that we don't know if the archive will always be trustworthy (e.g. at some point they may accidentally hire someone who is not trustworthy into a position where they are able to do damage), and if/when they're not is when they're likely to be most resistant to putting in place means to prove they are trustworthy.
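To make the "each part of the record is confirmed" idea a bit more concrete, here is a rough sketch of just the hash-linking part. It is not a blockchain (there is no consensus and there are no other parties), and the names `GENESIS`, `build_chain` and `verify_chain` are made up for illustration. The point is only that once each entry commits to the one before it, changing an old snapshot breaks every later link.

```python
import hashlib

# Hypothetical starting value for the first link; any agreed constant works.
GENESIS = "0" * 64


def link_hash(prev_hash: str, content_hash: str) -> str:
    """Hash that commits to both the previous entry and this snapshot."""
    return hashlib.sha256((prev_hash + content_hash).encode("ascii")).hexdigest()


def build_chain(snapshots: list) -> list:
    """Build one chain entry per snapshot (each snapshot is raw bytes)."""
    chain = []
    prev = GENESIS
    for snap in snapshots:
        content_hash = hashlib.sha256(snap).hexdigest()
        entry_hash = link_hash(prev, content_hash)
        chain.append(
            {"prev": prev, "content_hash": content_hash, "entry_hash": entry_hash}
        )
        prev = entry_hash
    return chain


def verify_chain(chain: list, snapshots: list) -> bool:
    """Recompute every link and compare it with the stored chain."""
    if len(chain) != len(snapshots):
        return False
    prev = GENESIS
    for entry, snap in zip(chain, snapshots):
        content_hash = hashlib.sha256(snap).hexdigest()
        if entry["prev"] != prev or entry["content_hash"] != content_hash:
            return False
        if entry["entry_hash"] != link_hash(prev, content_hash):
            return False
        prev = entry["entry_hash"]
    return True


snapshots = [b"page v1", b"page v2", b"page v3"]
chain = build_chain(snapshots)
untouched_ok = verify_chain(chain, snapshots)
tampered_ok = verify_chain(chain, [b"page v1", b"page v2 (edited)", b"page v3"])
```

This only helps if copies of the chain (or at least the latest `entry_hash`) are held by parties other than the archive itself; otherwise whoever edits a snapshot can simply rebuild the chain.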
The crawls will probably originate from random AWS addresses, but you can target by User-Agent, since they identify themselves there.
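For anyone wanting to check what a given robots.txt actually does to a crawler, a minimal sketch with Python's standard `urllib.robotparser` follows. The user-agent token `example-archive-bot` and the URLs are placeholders I made up, not the name the archive's crawler really sends; substitute whatever shows up in your logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: example-archive-bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


archive_allowed = is_allowed(
    ROBOTS_TXT, "example-archive-bot", "https://example.com/page.html"
)
other_allowed = is_allowed(
    ROBOTS_TXT, "SomeOtherBot", "https://example.com/page.html"
)
other_private_allowed = is_allowed(
    ROBOTS_TXT, "SomeOtherBot", "https://example.com/private/notes.html"
)
```

Note that this only tells you what the file asks for. Whether a particular crawler honours it, or hides already-archived pages because of it, is up to that crawler's operator.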
I love the WM, but this is terrible, because they don't have controls around the chain of evidence. Any page can be modified in the archive by an employee, both before and after a page has been identified as evidence.
I'm not saying I don't trust them[1] but this seems like a perfect use case for saving the content hash to prove that content X existed at least as early as time Y.
[1]: Well maybe I am saying that...
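A minimal sketch of the content-hash idea, assuming SHA-256 and a made-up record layout (`fingerprint` and `matches` are illustrative names, not anything the archive provides):

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes, url: str, seen_at: datetime) -> dict:
    """Record a digest of the page content together with when it was seen."""
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "seen_at": seen_at.isoformat(),
    }


def matches(record: dict, content: bytes) -> bool:
    """Check whether content is byte-for-byte what the record was made from."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


page = b"<html><body>original text</body></html>"
record = fingerprint(
    page,
    "https://example.com/page.html",  # placeholder URL
    datetime(2017, 1, 1, tzinfo=timezone.utc),  # placeholder capture time
)
same = matches(record, page)
edited = matches(record, b"<html><body>edited text</body></html>")
```

The digest by itself proves nothing about time Y. It only becomes evidence once the record is published or lodged somewhere the archive cannot quietly rewrite, so that a third party can vouch for when the hash first appeared.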
Earlier for another situation, my lawyer stated that they've successfully used it in the past.
If you're going to get a bit crazy on your blog, https://archive.org/web/ (see Save Page Now section) is great stuff.
However, one big issue is that Facebook is now blocking the ability to archive links that go directly to comments made in public postings.
So does anyone have a workaround or an archive.org-like site that can archive Facebook comments, complete with working JS that allows the exact comment to get archived? (To get the URL of the comment, right-click on the timestamp and copy the link.)