Federal Judge Says Internet Archive's Wayback Machine a Legit Source of Evidence

(techdirt.com)

129 points · kpwags · 10y ago · 43 comments

throwaway7767 10y ago · 7 in thread

One point to note here is that the Wayback Machine obeys robots.txt retroactively, so sites can hide evidence by changing their robots.txt to disallow indexing of specific content. The data won't be purged from the servers, but it will no longer be displayed through the site.

This is quite disappointing (though probably wise from a legal standpoint). It makes it less useful if a party in a lawsuit can retroactively hide evidence of wrongdoing, and thereby deny access to evidence.

It's also a problem if a domain name changes ownership and the new owner suddenly adds a restrictive robots.txt - the old content will no longer be accessible even though the current owner has no claim to it.
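
A minimal sketch of the retroactive behaviour described above, assuming the display decision depends only on the site's current robots.txt. The `snapshot_visible` helper, the `ia_archiver` agent name as the one being checked, and the example rules and URL are illustrative assumptions, not the Internet Archive's actual implementation.

```python
from urllib.robotparser import RobotFileParser


def snapshot_visible(current_robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the site's *current* robots.txt lets `agent` fetch `url`.

    Hypothetical helper: it models a policy where old captures are shown
    only while the live robots.txt still permits the archive's crawler.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rules before and after the site owner's change.
old_rules = "User-agent: *\nDisallow:\n"
new_rules = "User-agent: ia_archiver\nDisallow: /press/\n"
url = "http://example.com/press/2009-statement.html"

print(snapshot_visible(old_rules, url))  # True: nothing is disallowed
print(snapshot_visible(new_rules, url))  # False: the old capture is now hidden
```

Nothing in this check looks at when the page was captured, which is why a new domain owner's robots.txt can hide content they never published.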

bonyt 10y ago

In 2009, there was an interesting case where a federal district court ordered a plaintiff to disable its robots.txt to allow the Wayback Machine to disclose an old version of the plaintiff's website.[1] Someone at the Internet Archive provided a declaration stating that it would place a significant burden on them[2] to respond to a subpoena themselves, whereas all the plaintiff had to do was modify its robots.txt. The court reasoned that since the plaintiff had the technical ability to un-block access, they could be compelled to do so, and made them disable their robots.txt.

[1]: http://www.american-justice.org/upload/page/123/69/docket-18...

[2]: https://tonybox.net/tmp/ia_decl_pacer.pdf (downloaded from PACER)

nightcracker 10y ago

> The court reasoned that since the plaintiff had the technical ability to un-block access, they could be compelled to do so, and made them disable their robots.txt.

How is this not indirect self-incrimination?

CJefferson 10y ago

Thanks for answering a question I had been worried about for ages -- did changes to robots.txt make the Internet Archive delete its history, or just hide it?

greglindahl 10y ago

Thanks for making the PACER doc available -- is it in RECAP?

pjc50 10y ago

> deny access to evidence

Subpoena?

epicaricacy 10y ago

That goes to weight more than admissibility.

The only thing that the judge ruled in this case is that it CAN be used.

huherto 10y ago

Very interesting.

At least they cannot add content.

And it is easy to tell whether they may have hidden something, since the robots.txt should be public. I hope. Maybe a judge can make them unhide it. I don't know.

throwaway7767 10y ago · 3 in thread

In many European countries, we have legal deposit laws that require the national library to run archival web crawls, at least of the country TLD and sometimes on a best-effort basis for material outside the TLD that's considered relevant (based on language, for example).

These laws specify that the crawls are stored untampered and guarantee that the results can be considered valid evidence.

Interestingly, that's how the Internet Archive's Heritrix crawler came about - the Nordic national libraries were saddled with this requirement but didn't really have the technical infrastructure to implement it. They formed a coalition among themselves and brought the Internet Archive into it (the IIPC[0]), and used it to fund development of Heritrix.

[0] http://netpreserve.org/

corecoder 10y ago

That's interesting, I'd like to know more about the specific European situation on this.

There have been a few cases (in Italy, but not only) where someone sued someone else for defamation (or is it libel?), bringing a screenshot of a tweet, Facebook post etc. as "proof"; all such cases have been dropped because a screenshot cannot possibly be used as evidence.

It is not clear (at least not to me) how someone could proceed in order to obtain proof in these cases.

pdabbadabba 10y ago

> defamation (or is it libel?)

Both are right. Defamation is a general term that encompasses both libel and slander. Libel is written, slander is spoken. When in doubt, just say "defamation."

> a screenshot cannot possibly be used as evidence

Generally speaking, I see no reason why this should be the case. And screenshots are routinely used as evidence in U.S. courts. Of course, if the opposing side challenges the accuracy of the screenshot, then you'll need to give more evidence (testimony, probably) about how it was produced. But that doesn't mean that screenshots are per se unreliable.

throwaway7767 10y ago

The IIPC has some more information (I'm not too knowledgeable about the legal side). It looks like Italy doesn't have such laws, or at least it's not listed: http://netpreserve.org/legal-deposit

That said, the specific examples of Twitter or Facebook posts are still a problem - those huge centralised services are hard to properly archive without direct assistance from the companies in question. I know some of the national libraries are trying, but it's far from a solved problem.

aakilfernandes 10y ago · 3 in thread

Would be great if archival services recursively hashed their documents and put the hashes in a blockchain. Then you'd get 100% certainty the records haven't been updated since they were first recorded.
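
One common reading of "recursively hashed" is a Merkle tree: hash every document, then hash pairs of hashes until a single root remains. The sketch below is only an illustration of that idea, assuming SHA-256 and a hypothetical `merkle_root` helper; it is not something the Internet Archive is known to do.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(documents: list[bytes]) -> str:
    """Return the hex Merkle root of `documents` (hypothetical helper).

    Leaves and inner nodes get different prefixes so that a leaf can
    never be passed off as an inner node.
    """
    if not documents:
        raise ValueError("need at least one document")
    level = [sha256(b"\x00" + doc) for doc in documents]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level.append(level[-1])
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


crawl = [b"<html>page one</html>", b"<html>page two</html>", b"<html>page three</html>"]
root = merkle_root(crawl)

# Changing any single document changes the root.
tampered = [crawl[0], b"<html>page 2, edited</html>", crawl[2]]
print(root != merkle_root(tampered))  # True
```

Only the root would need to be published somewhere the archive cannot rewrite. The certainty this gives is that the documents match the published root, not that the original capture was accurate.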

ics 10y ago

Was thinking something similar after seeing the above comment about doing this at a national level. "Best effort" could be a whole lot better if you could piggyback on and contribute to archiving. Lots of issues to explore but very interesting.

ikeboy 10y ago

If they aren't trusted, timestamping doesn't help. If they are, it's not needed.

What use case do you have in mind where it helps if the archive proves timestamps?

(Timestamping can certainly help sometimes, but with an archive you're trusting them anyway.)

vidarh 10y ago

A blockchain does not have to just confirm the timestamp. It can record a consensus of facts about the page as well. E.g. have multiple parties run crawlers and confirm that a majority of them agree about the content of the page to some delta (and include the deltas) at time X.

Do it right and the blockchain can provide trust by ensuring that the record demonstrates that a sufficient number of other parties have confirmed each part of the record.

You have a point in that if an archive is trusted, the motivation for doing this largely falls away. The problem is of course that we don't know if the archive will always be trustworthy (e.g. at some point they may accidentally hire someone who is not trustworthy into a position where they are able to do damage), and if/when they're not is when they're likely to be most resistant to putting in place means to prove they are trustworthy.
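
A rough sketch of the multi-crawler agreement idea, under a simplifying assumption: crawlers must agree on an exact SHA-256 digest, whereas the comment above allows agreement "to some delta". The `majority_digest` function, the crawler names and the quorum threshold are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import Optional


def majority_digest(
    captures: dict[str, bytes], quorum: float = 0.5
) -> tuple[Optional[str], list[str]]:
    """Return (agreed digest or None, crawlers that did not agree).

    `captures` maps a crawler name to the page body it fetched at time X.
    A digest is accepted only if strictly more than `quorum` of the
    crawlers produced it.
    """
    digests = {name: hashlib.sha256(body).hexdigest() for name, body in captures.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes / len(digests) > quorum:
        dissenters = sorted(name for name, d in digests.items() if d != digest)
        return digest, dissenters
    return None, sorted(digests)


captures = {
    "crawler-a": b"<p>We never said that.</p>",
    "crawler-b": b"<p>We never said that.</p>",
    "crawler-c": b"<p>We definitely said that.</p>",
}
agreed, dissenters = majority_digest(captures)
print(dissenters)  # ['crawler-c']
```

Exact matching is too strict for pages with ads or timestamps, which is presumably why the comment talks about recording deltas.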

ikeboy 10y ago · 3 in thread

How hard is it to MITM the IA over HTTP, thus producing fake evidence that a site said something once?

throwaway7767 10y ago

It's the same difficulty as any other MITM - if you control a point in the routing chain, or if you can affect the routing chain through e.g. BGP, you can do it.

The crawls will probably originate from random AWS addresses, but you can target by User-Agent, since they identify themselves there.
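
From the verification side, one way to notice content that is targeted by User-Agent is to request the same URL under several agents and compare digests. This is only a sketch: `varies_by_agent`, `fake_fetch` and the agent strings are illustrative assumptions, and `http_fetch` is included for completeness but not called here.

```python
import hashlib
import urllib.request
from typing import Callable

Fetch = Callable[[str, str], bytes]  # (url, user_agent) -> response body


def http_fetch(url: str, user_agent: str) -> bytes:
    """Fetch `url` over the network while presenting `user_agent`."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def digests_by_agent(fetch: Fetch, url: str, agents: list[str]) -> dict[str, str]:
    return {agent: hashlib.sha256(fetch(url, agent)).hexdigest() for agent in agents}


def varies_by_agent(fetch: Fetch, url: str, agents: list[str]) -> bool:
    """True if at least two agents received different bodies."""
    return len(set(digests_by_agent(fetch, url, agents).values())) > 1


def fake_fetch(url: str, user_agent: str) -> bytes:
    # Stand-in for a server (or a middlebox) that singles out one crawler.
    if "archive.org_bot" in user_agent:
        return b"<p>Nothing to see here.</p>"
    return b"<p>The original statement.</p>"


agents = ["Mozilla/5.0", "archive.org_bot"]
print(varies_by_agent(fake_fetch, "http://example.com/", agents))  # True
```

Dynamic pages differ between requests anyway, so a mismatch is a reason to look closer rather than proof of tampering.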

thatcat 10y ago

I wonder if they save the IP address for each scraping session.

ikeboy 10y ago

I also wonder if they respect HSTS and HPKP for a given site, making this attack only possible against sites without such protection.

jedberg 10y ago · 1 in thread

Sounds like now is a great time to be a WM employee if you don't mind taking bribes from criminals.

I love the WM, but this is terrible, because they don't have controls around the chain of evidence. Any page can be modified in the archive by an employee, both before and after a page has been identified as evidence.

jedberg 10y ago

A note to the above since I can't edit anymore: I'm basing what I said on a guess about their internal controls and assuming they don't spend the time and money to maintain a chain of evidence, but I have no direct knowledge of their internal controls.

rjdevereux 10y ago · 1 in thread

A Federal judge said "legit"? Times are a changin'

greglindahl 10y ago

No, it's a consequence of HN limiting the length of titles.

koolba 10y ago

Is there any sort of blockchain-based authentication of the data saved by archive.org?

I'm not saying I don't trust them[1] but this seems like a perfect use case for saving the content hash to prove that content X existed at least as early as time Y.

[1]: Well maybe I am saying that...
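
A minimal sketch of "content X existed at least as early as time Y" using a hash chain, where each log entry commits to the previous entry's hash. The `Entry` record, the `append`, `head` and `verify` helpers and the sample data are hypothetical. The guarantee holds only if the head hash is published somewhere the archive cannot rewrite, which is the role a blockchain would play.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Entry:
    url: str
    captured_at: str       # timestamp claimed by the archive
    content_sha256: str
    prev_hash: str

    def entry_hash(self) -> str:
        payload = json.dumps([self.url, self.captured_at, self.content_sha256, self.prev_hash])
        return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list[Entry], url: str, captured_at: str, content: bytes) -> None:
    prev = log[-1].entry_hash() if log else GENESIS
    log.append(Entry(url, captured_at, hashlib.sha256(content).hexdigest(), prev))


def head(log: list[Entry]) -> str:
    return log[-1].entry_hash() if log else GENESIS


def verify(log: list[Entry], published_head: str) -> bool:
    """Check every link and that the chain ends at the published head."""
    prev = GENESIS
    for entry in log:
        if entry.prev_hash != prev:
            return False
        prev = entry.entry_hash()
    return prev == published_head


log: list[Entry] = []
append(log, "http://example.com/", "2016-05-01T00:00:00Z", b"version one")
append(log, "http://example.com/", "2016-06-01T00:00:00Z", b"version two")
append(log, "http://example.com/", "2016-07-01T00:00:00Z", b"version three")
published = head(log)

# Rewriting an old capture breaks verification against the published head.
forged = [log[0], replace(log[1], content_sha256=hashlib.sha256(b"forged").hexdigest()), log[2]]
print(verify(log, published), verify(forged, published))  # True False
```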

MicroBerto 10y ago

We recently posted an exposé that created a legal "situation". It includes archive.org links which definitely help.

Earlier for another situation, my lawyer stated that they've successfully used it in the past.

If you're going to get a bit crazy on your blog, https://archive.org/web/ (see Save Page Now section) is great stuff.

However, one big issue is that Facebook is now blocking the ability to archive links that go directly to comments made in public postings.

So does anyone have a workaround, or an archive.org-like site that can archive Facebook comments, complete with working JS that allows the exact comment to be archived? (To get the URL of a comment, right-click on the timestamp and copy the link.)

grenoire 10y ago

I think it's time to create the Wayback Machine of the Wayback Machine.

awqrre 10y ago

Fabricating evidence just got easier... you only have to hack one site...
