If this were truly the case (that it didn't matter), the argument could be made that there is no reason to change the host at all -- just show it as google.com like it does now. The only reason that you'd want the address bar to show a different domain (i.e. the "author" rather than the "publisher") is exactly because it _does matter_ to the user!
Given Signed Exchanges are entirely opt-in by the publisher/website operator, what's the difference between this and a CDN? Isn't that "lying" about what site you're on? It's not theverge.com - it's Cloudflare!
> There has been discussion of allowing a publisher to restrict the set of distributors that can host its signed content. If that's added, then the privacy situation becomes more similar to the situation with CDNs, where a publisher chooses a CDN to serve their content, and the CDN learns about all requests for that content. Here the publisher would choose one or more distributors, and the distributor(s) would learn about requests for the content.
Basically, a signed exchange (.sxg) as it is right now allows anyone to be a distributor. It authenticates only that the content has not changed, not that whoever is distributing it was allowed to distribute it.
Imagine I use SXG to sign "myBlog.com/some-post/" as "some-post.sxg" and serve it via "myCdn.com/myBlog.com/some-post.sxg" on "myBlog.com/index", like I would with a CDN. EvilSearchEngine (now being both the source that sends a results page and a distributor CDN) can crawl this index, nab the .sxg and then when it serves results, serve it via "EvilSearchEngineCDN.com/myBlog.com/some-post.sxg" instead. The client browser checks the sxg, finding it is validly signed and thus shows it as "myBlog.com/some-post/".
However, (1) the EvilSearchEngine has now gained tracking data on visits to "myBlog.com/some-post/" (which it may or may not already have), but more importantly (2) I've now lost the ability to do first-party tracking of visits to "myBlog.com/some-post/" since EvilSearchEngine users never hit my server and I must now beg EvilSearchEngine to give me their tracking data for my content.
Maybe they'll introduce "allowing a publisher to restrict the set of distributors", maybe they won't. But until they do, it behaves nothing like a real CDN.
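To make the scenario above concrete, here is a toy sketch of the check as I understand it. It is not the real SXG format: actual signed exchanges use certificate-backed asymmetric signatures, and I'm substituting an HMAC with a made-up shared key (`PUBLISHER_KEY`) purely to keep the example stdlib-only. The names `SignedExchange`, `sign` and `address_bar_url` are hypothetical. The point is only that the distributor's URL never enters the validation.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Assumption: a shared demo key standing in for the publisher's real
# certificate-backed signing key. Real SXG does not use HMAC.
PUBLISHER_KEY = b"myBlog.com-demo-key"


@dataclass(frozen=True)
class SignedExchange:
    claimed_url: str   # the URL the publisher signed the content as
    body: bytes        # the response body
    signature: bytes   # toy signature over claimed_url + body


def _digest(claimed_url: str, body: bytes, key: bytes) -> bytes:
    msg = claimed_url.encode() + b"\n" + body
    return hmac.new(key, msg, hashlib.sha256).digest()


def sign(claimed_url: str, body: bytes, key: bytes = PUBLISHER_KEY) -> SignedExchange:
    """Publisher side: bind the body to the URL it should be shown as."""
    return SignedExchange(claimed_url, body, _digest(claimed_url, body, key))


def address_bar_url(served_from: str, sxg: SignedExchange,
                    key: bytes = PUBLISHER_KEY) -> str:
    """Browser side: decide which URL to display.

    Note that served_from (the distributor) plays no part in the check;
    it is only the fallback when the signature does not verify.
    """
    expected = _digest(sxg.claimed_url, sxg.body, key)
    if hmac.compare_digest(expected, sxg.signature):
        return sxg.claimed_url
    return served_from


post = sign("https://myBlog.com/some-post/", b"<h1>some post</h1>")
# Same result whether my own CDN or EvilSearchEngine's serves the file.
print(address_bar_url("https://myCdn.com/myBlog.com/some-post.sxg", post))
print(address_bar_url("https://EvilSearchEngineCDN.com/myBlog.com/some-post.sxg", post))
```

Both calls display the myBlog.com URL, which is exactly why the publisher's server never sees the second visit.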
See, this is one of the good things about the SE proposal. The source page already has all the tracking data; that's a given. But if the content is served via SE it reduces the number of other sites that get tracking data, which is very much a win for privacy.
> Maybe they'll introduce "allowing a publisher to restrict the set of distributors", maybe they won't.
It would make no sense for user agents to enforce such a restriction, since a valid signature shows that the content is authentic regardless of the identity of the distributor.
There is only a single way that the source[+] can guarantee[*] knowledge of a click through: Linking to their own domain with a unique identifying tag[**] and subsequently redirecting to the real target.
Without SE, this kind of tracking behavior is readily apparent to the user:
- The link goes to the source's own site [***]
- The link contains elements of entropy relevant to tracking
- The browser must briefly flash the source's domain being loaded before the redirect comes in. An obvious behavioral difference.
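For anyone who hasn't looked at how redirect tracking works, here is a minimal sketch of the idea described above. Everything in it is made up for illustration: the host `source.example`, the `q` and `tag` parameter names, and the in-memory `click_log` are assumptions, not how any particular search engine does it.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

SOURCE_HOST = "source.example"  # hypothetical domain of the page hosting the link

# tag -> record of which target it was issued for and whether it was clicked
click_log: dict[str, dict] = {}


def make_tracking_link(target_url: str) -> str:
    """Build a link to the source's own domain carrying a unique tag."""
    tag = secrets.token_urlsafe(8)  # the "element of entropy" in the URL
    click_log[tag] = {"target": target_url, "clicked": False}
    return f"https://{SOURCE_HOST}/url?" + urlencode({"q": target_url, "tag": tag})


def handle_click(link: str) -> str:
    """Simulate the source's redirect endpoint.

    Records the click server-side, then returns the real target the
    browser would be redirected to.
    """
    params = parse_qs(urlparse(link).query)
    tag = params["tag"][0]
    click_log[tag]["clicked"] = True
    return params["q"][0]


link = make_tracking_link("https://yourblog.example/post")
print(link)                # visibly points at source.example and carries a tag
print(handle_click(link))  # the real target, reached only after the redirect
```

Both tell-tale signs from the list are visible here: the link's host is the source rather than the publisher, and the URL carries a random tag.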
With SE, however, this behavior is no longer apparent:
- The link can visibly go to the publisher directly yet still actually hit the distributor (likely just also the source) even with scripts and referers disabled. Unless browsers offer a way to disable SE specifically, there is no way to prevent it.
- The browser has no observable behavior differences from a site not using SE. The only way to distinguish a site using SE (besides the browser making this explicitly visible in the UI) is by capturing connections at a layer below the browser (i.e. sniffing packets at the OS or upstream network level).
Like with most tech, it is certainly possible to implement SE in a privacy preserving way on the client side, but it also seems to be solving a problem that only the likes of Google will care about.
[+] I'm going to use the terminology in the proposal, where:
- source = the thing hosting the link (i.e. Google)
- publisher = the thing producing the content (i.e. YourBlog.com) that is linked to
- distributor = the thing re-distributing a copy of the content (i.e. also Google) in lieu of the publisher
[*] While other methods like JavaScript and 'Referer's may also leak this information, they depend on a cooperative client voluntarily disclosing such info. It is quite trivial for a privacy conscious client to disable both sources of leaking click through information.
[**] This must also be part of the URL, as using cookies or the like can be foiled by loading the link in a clean/fresh alternative sandbox environment (no identifiable info), and then sending back the address it resolved to. A lot of URL "un-shorteners" do this already.
[***] This can be hidden by JavaScript (and Google has done exactly that right now by replacing href= URLs on click), but again, a privacy conscious user would have disabled such a script.
I really think you overestimate how likely users are to notice a redirect URL appearing in the address bar for a fraction of a second. Regardless, even with SE the distributor's URL will appear for a short time until the content has been downloaded and the signature can be validated, just as it would with a redirect. Anyone who actually cares to see which connections are made (which covers much more than what you can infer from the address bar alone) can open up the browser's developer tools and inspect the network traffic. Or you can fire up Wireshark if you're paranoid, but there is nothing in the proposals about hiding the true endpoints of the connections from the developer tools, just the main URL shown in the address bar.
If you want to avoid giving the data to Google in the first place there is no substitute for checking the URL before clicking on the link, which will show the distributor's domain regardless of SE.