Google Analytics is on a substantial proportion of the Internet. 65% of the top 10k sites, 63.9% of the top 100k, and 50.5% of the top million[1]. My own partial results from a research project I'm doing using Common Crawl estimates approximately 39.7% of the 535 million pages processed so far have GA on them[2].
That means that you're basically either on a site that has Google Analytics or you've likely just left one that did.
If the page you're on has Google Analytics and isn't encrypted, the JavaScript request and response are in the clear. That JS request to GA also has your referrer in it, in the clear.
The aim of my research project is to end with understanding what proportion of links either start or end in a page with Google Analytics. If it starts with Google Analytics, your present "location" is known. If the link ends with Google Analytics, but doesn't start with it, then when you reach that end page, the referrer sent to GA in the clear will state where you came from. All of this is then tied to your identity.
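To make the measurement concrete, here is a minimal sketch of the proportion described above. It is not the actual Common Crawl pipeline; the function name `ga_exposure`, the link list, and the page names are all hypothetical, and a real run would stream link pairs instead of holding them in memory.

```python
def ga_exposure(links, ga_pages):
    """Return the fraction of links whose source or target page carries GA.

    links    -- iterable of (source_page, target_page) pairs (hypothetical format)
    ga_pages -- set of pages known to embed Google Analytics
    """
    links = list(links)
    if not links:
        return 0.0
    # A link counts as exposed if it starts OR ends on a GA page.
    exposed = sum(1 for src, dst in links if src in ga_pages or dst in ga_pages)
    return exposed / len(links)


# Toy data, invented purely for illustration.
links = [
    ("a.example/1", "b.example/1"),  # starts on a GA page
    ("b.example/1", "c.example/1"),  # ends on a GA page
    ("c.example/1", "d.example/1"),  # starts on a GA page
    ("d.example/1", "e.example/1"),  # neither end has GA
]
ga_pages = {"a.example/1", "c.example/1"}

share = ga_exposure(links, ga_pages)
print(f"{share:.0%} of links touch a GA page")
```

Note that only 2 of the 5 toy pages carry GA, yet 3 of the 4 links are exposed, which is why the per-link figure can be much higher than the per-page one.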
If people are interested when I get the results of my research, ping me. I'll also write it up and submit it to HN as it would seem to be of interest.
[1]: http://trends.builtwith.com/analytics/Google-Analytics
[2]: http://www.youtube.com/watch?v=pkoIUmP5ma8 (GA specific results at 1:20)
Of course, as a web developer, it's useful to be able to see where people came from. But we don't have any right to that information. As an end-user, why the hell is my browser giving you this information for no reason when it doesn't have to?
I've been using RefControl for Firefox for years now. It fakes the referrer, setting it to the root of the domain being requested. This hasn't ever caused me any problems, so there can't be that many sites that rely on it.
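For anyone wondering what "the root of the domain being requested" means in practice, here is a rough sketch of that rewriting rule. It only illustrates the behaviour described above and is not RefControl's actual code; `spoofed_referrer` is a made-up name.

```python
from urllib.parse import urlsplit


def spoofed_referrer(request_url):
    """Return the referrer to send: the root of the site being requested.

    The real previous page is never consulted, so nothing about where
    the user came from leaks to the destination.
    """
    parts = urlsplit(request_url)
    return f"{parts.scheme}://{parts.netloc}/"


print(spoofed_referrer("http://example.com/articles/42?x=1"))
```

Because the faked value is on the same domain as the request, a site that only checks "did this come from my own domain?" (hotlink protection, for example) would still see a plausible referrer.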
I don't give a shit about your analytics or how much money you think you'll lose from referrers disappearing. Privacy is more important.
The problem is, all of these "features" allowed by referrers are user-hostile actions.
If referrers went away tomorrow, users wouldn't notice the difference or care. Publishers would get angry and think "we can't milk our content/visitors for as much money anymore!" But that doesn't really change the relationship with the customers who value your product or business so I personally can't believe it will make a sizable difference in the end.
This is actually an interesting problem, because it's already solved but most people aren't using the solution: If you have a large file to distribute to a large number of people without authentication, use BitTorrent. As far as I can see there are two primary impediments to this:
A) Most browsers can't by default download large files P2P. You can actually write a BitTorrent client in javascript using Web Sockets if you really want to, but that's just horrible. What would be really nice is to be able to just e.g. embed a video into a webpage using a magnet link. There is no technical reason why this couldn't be implemented and rightly should be for large files.
B) Images are exactly the wrong size. They're big enough that you can't just ignore hotlinking but not big enough that you want to pay the overhead of connecting to 50 different peers instead of one to get a good transfer rate. But that just requires some adjustments to the protocol; if you're looking for realtime retrieval for display in a webpage you would probably want to use UDP and then use erasure coding to deal with slow/broken peers and packet loss. If you have a 60KB image, you can send a ~50 byte packet to each of a dozen peers and have ten of them each send 6KB (approximately four packets) to the target with 6KB worth of erasure bits from each of the others (which also allows the image to be constructed once 60KB of data is received in total from any collection of peers), and now the image is costing you ~600 bytes instead of 60KB. And if the image hasn't been received in 150ms, add more peers.
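The numbers in (B) can be checked with a few lines of arithmetic. This is only the bandwidth bookkeeping, not an erasure-coding implementation; the 1500-byte packet size is my assumption (the comment just says "approximately four packets"), and all variable names are invented.

```python
import math

IMAGE_BYTES = 60_000    # the 60KB image from the example
PEERS = 12              # "a dozen peers"
DATA_PEERS = 10         # shares needed to rebuild the image
REQUEST_BYTES = 50      # the ~50 byte packet sent to each peer
PACKET_BYTES = 1_500    # assumed packet size, not stated in the comment

# Each peer sends one share; any DATA_PEERS shares rebuild the image.
share_bytes = math.ceil(IMAGE_BYTES / DATA_PEERS)
packets_per_share = math.ceil(share_bytes / PACKET_BYTES)

# What the origin pays: one small request per peer instead of the image.
origin_cost = PEERS * REQUEST_BYTES

# What the swarm sends in total if every peer answers.
swarm_bytes = PEERS * share_bytes

print(f"share per peer: {share_bytes} bytes ({packets_per_share} packets)")
print(f"origin cost:    {origin_cost} bytes instead of {IMAGE_BYTES}")
print(f"swarm total:    {swarm_bytes} bytes if all {PEERS} peers respond")
```

The redundancy is the 12,000 extra bytes from the two spare peers: the client can stop as soon as any 60KB worth of shares has arrived.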
You are of course correct that we don't have a "right" to this information. But I've discovered, many times, through the referrers in my logs, links to my pages from some very interesting places that I might not have discovered otherwise (because the link information that Google discloses is woefully incomplete).
Any user who wants to hide referrer information can easily do so in a variety of ways. For example, I wrote a bookmarklet that does this for you: http://lee-phillips.org/norefBookmarklet/
It's irrelevant how useful you find the information. You'd probably find it useful to know the name and email address of everyone that visits your site too... So?
AdWords paid search clicks still send the info, if that's what you're talking about.
But this discussion is about completely removing referrers, not just stripping search keywords.
Right now, once you hit 10M pageviews a month you either have to sample or pay $150k/year for Premium.
I don't need support, an account manager, four-hour turnaround on data, an SLA, etc. I just need more pageviews sometimes.
I just thought I'm already using NoScript, AdBlock, RequestPolicy, BetterPrivacy, Cookie Monster, Blender, and HTTPS-Everywhere; might as well go all-in.
Firefox, ABE, NoScript, Request Policy, Ghostery, HTTPS-everywhere, hygiene.
The irony of my militant approach toward privacy is that I probably make myself more interesting to would-be eavesdroppers by my carefulness than I would if they could see it all -- I'm just not that interesting.
On the plus side, the LCD of legitimate-threat hostiles is greatly increased. I'm fairly boring even to neighbors and law enforcement and copyright holders and scam artists and advertisers. I imagine I'm pretty stultifying to nation-state actors. :)
Still, I'd like everyone else to join me so that I can get lost in the crowd. The untracked, encrypted, well-rested crowd.
Come on in, the water's fine.
I would only advise against Ghostery, as they whitelist some trackers if paid to do so. With every update I had to reselect these trackers.
And Evidon (Ghostery's mothership) selling usage information really bugs me: http://venturebeat.com/2012/07/31/ghostery-a-web-tracking-bl...
I would recommend the FF add-on Disconnect: https://addons.mozilla.org/en-US/firefox/addon/disconnect/
Does anybody have an idea, how I could make my own sites secure in a relatively cheap way? Just a personal site with not that much traffic, so spending much money seems a bit off to me.
Ideas?
I work on Disconnect. I don't understand why any hacker would still put Ghostery on their machine:
* Ghostery is run by former ad execs (7/9ths of their executive team): http://www.evidon.com/our-team
* They make their money (I've heard tens of millions of dollars per year) selling user data to ad co's and data brokers: http://www.evidon.com/#block-views-from_our_partners-block
Ghostery does what the user tells it to do. If you are seeing unblocked trackers, most likely it's because we've added new trackers and you didn't select "block" by default for the new trackers when the list gets updated. You can change this preference by going into Ghostery options, Advanced, and reviewing the "auto-update" section.
And here's a full explanation as to what Evidon gets and what it does with it: http://purplebox.ghostery.com/?p=1016023438
Regarding your sites' security, what sort of advice are you looking for? OS-level hardening? Service config?
BTW, if you're using Chrome, you might also want to look into the "Users" section of preferences. You can create multiple user profiles with separate history, cookies, cache, etc. You can have a different user profile per window at the same time. (After you create a second user, there will be an icon in the top right corner of the window to open a window as another user.)
I like to use this to protect against CSRF. (I do financial stuff as another profile and facebook as another profile.) It's also useful for QA if you need to be logged in as multiple people at the same time.
You can even modify your Firefox by changing values in about:config
geo.enabled ---> false
keyword.URL ---> Your Search engine query url
browser.urlbar.trimURLs ---> false
noscript.ABE.wanipCheckURL ---> 0
network.http.sendRefererHeader ---> 0
network.http.sendSecureXSiteReferrer ---> false
^ These ones break functionality on some sites that rely on them. It's rare, at least.
https://github.com/gorhill/httpswitchboard/wiki/How-does-HTT...
Curious, how would this differ from whitelisting cookies in the browser's own settings?
There's this add-on called Blender that is supposed to make your browser send headers like the average browser, you might be interested in that.
Like to point out what may be obvious to some but not others, when using NoScript you may want to remove Goog, Yahoo, etc from the default whitelist.
Before Blender I used various iterations of FF for testing & different surfing types(Waterfox/PaleMoon/ESR), but it appears I'll only be doing that for testing purposes anymore. https://addons.mozilla.org/en-US/firefox/addon/blender-1/?sr...
I also share your concern that my (lack of a) footprint makes me an outlier, and thus inherently more interesting to an adversary with the power and reach of No Such Agency. There's precisely zero I can do about that, without compromising my local objective of not being followed by every damned website, though, so I just carry on.
This is interesting. I would have actually expected more. The last time I remember someone analyzing this, I believe the result was "<script ... ga.js>" was the most popular tag on the web by far. This was, however, a few years ago.
First, as <bdt101> pointed out, you "cannot track a unique visitor across the web using GA cookies" because of the way they're designed: https://news.ycombinator.com/reply?id=6889120&whence=item%3f...
Second, the NSA doc as excerpted in the WashPost article talks only about Google's PREF cookie, which is set only when you go to, say, Google.com, not when you go to a non-Google property. It's a first-party cookie used for things like saving language preferences when you're not logged in, not for advertising across other properties. (That's what the Doubleclick cookie is for.)
I've been in the internet industry for a while now.
For what it's worth, I think your thesis has significant value.
Google Analytics requests are also only unencrypted if the site itself is unencrypted, so the fact that the GA request includes the referrer doesn't seem relevant (since the referrer would have already been transferred in the clear in the Referer header on the initial HTTP request.)
It does everything I want it to do (so far), but I'm not an analytics power user by any means.
[1] - http://piwik.org/
I'm curious. In that case, the GA JS is requested from what you call the "end page", so the referrer it has should be the "end page", not the one before it.
Add this line to your hosts file:
0.0.0.0 google-analytics.com ssl.google-analytics.com www.google-analytics.com
It would need to be a recognizable image and symbol. The image would link to the site above, that informs users how you respect their privacy and do not track them. Personally, I'd add it to my sites, because with all the recent concern about privacy, I think my users would appreciate this change, and it would provide some advantage over competing sites. I'd like to visit a site, see that image in the footer, and feel more confident using their service.
I think it would be a good way to encourage change from developers. Very few are going to pull Google Analytics on their own. However, if they get pressure from their users to follow a certain privacy standard, and by doing so they can drop an image on their site to illustrate the change and potentially increase trust and improve their reputation, we might see some improvements.
Comments and further improvements welcome.
Can be a little inconvenient at times but seems justified now.
https://addons.mozilla.org/en-US/firefox/addon/cookie-monste...
- Cross-site requests not allowed without whitelisting. This means some setup will be required at first (for example, for separate image domains used by Amazon, Google, Yahoo, etc.), but after a bit it shouldn't be a problem. This also serves as a "better adblock" in some ways, as it blocks ad networks without relying on a database that needs to be updated.
- All cookies blocked by default; whitelist as necessary
- JavaScript disabled by default; whitelist-enable as necessary
- No Flash or Java, period. If I need Flash for something, I'll launch a VM.
Sadly, Safari doesn't support whitelisting for any of this. Chrome supports whitelisting of cookies and JS by default, but the Chrome UX is worse than Safari's IMO (for a few reasons, but that's another topic entirely).
RequestPolicy handles the first one quite well, but is unfortunately Firefox-only.
Firefox is the answer. No other option makes any sense, if you're serious about this stuff. I understand that some people like the UI or process model of other browsers better, and that's where the evaluation of priorities comes in.
The good news is that the days of Chrome's technical superiority are truly over. Speed, memory consumption, rendering engine... Firefox is all there and sometimes better.
Firefox is also the only browser with an ability to sanely handle tabs on the side, which is the only sane place to put tabs on modern screens. If I had to choose between sane tabs and sane privacy policies, I might have some soul-searching to do. I understand that everyone has their own equivalent, but be sure not to dismiss Firefox based on historical issues.
It's incredible how much inertia there is with that. The majority of the people I know that switched to chrome did it back when firefox was blatantly slower and that's the image that's stuck in their head. It's incredibly hard to remove and to get someone to try it long enough to change their mind again.
Firefox has a tough issue with marketing right now. They need to start a nice "firefox is faster" campaign.
Posted on HackerNews two days ago.
Search - Check (goog.com)
Mail - Check (Gmail)
Browser - Check (chrome)
Devices - Check (Android/Chrome books)
Websites - Check (Double click/AdMob, Unknown number of other companies)
Google Analytics - Check
Your DNA - Check (23&Me)
Cars - Check (self-driving cars)
I am probably missing large chunks of tracking even with this list.
Where do you draw the line so that organizations like Google do not hand over (willingly or inadvertently) our lives on a platter to the NSA, GCHQ, ASIO, CSIS & whatever New Zealand's intelligence spooks go by?
Heterogeneity - Make the buggers at least have to work a little bit to invade your privacy.
If every site switched from Google Analytics to, say, Mixpanel... nothing would change. The NSA would just target the equivalent Mixpanel cookie. So long as there are popular third-party cookies, this will be a problem.
This is all assuming you consider your adversary to be the NSA. If it's google, well, choose other vendors. If it's both, you'll have to consider both your destination and wire-protocol axes.
FWIW, if your traffic is split evenly between 3-4 main vendors (e.g., google, amazon, bing, etc), and all HTTPS, it's hard to tell what you're doing.
Enormous amounts of things have some connection into Google. Other connections into Google's equipment potentially include Voice, Talk, Hangouts, embedded Google Plus +1 buttons, embedded YouTube, Blogspot sites, embedded Picasa images.
Google runs ReCAPTCHA. ( http://www.google.com/recaptcha/ )
If you email someone with a GMail account your email address is in Google's servers with the email header containing your IP address.
Google's SafeBrowsing URL check is built into Firefox. It normally works with hashed URLs, but it could still track that you are using it, and there is a simpler version of the API that applications could use to send plain-text URLs without you knowing ( https://developers.google.com/safe-browsing/ ).
Sites hosted on Google AppEngine ( https://developers.google.com/appengine/ ).
If you have iOS, Safari defaults to Google suggestions, i.e. sending everything you type in the address/search bar to Google.
Google Maps, built into other websites and services. Google Geolocation API built into other software ( https://developers.google.com/maps/documentation/business/ge... ).
Google DNS (last time I read the privacy policy, it said queries are not combined with other data Google collects).
Sites loading popular JavaScript from Google's hosted libraries ( https://developers.google.com/speed/libraries/devguide ).
Sites embedding Google Sparklines ( https://developers.google.com/chart/interactive/docs/gallery... )
Links going via Google's URL shortening service Goo.gl
Not counting things you choose to use (Chromecast, music, docs, drive, Now, voice search, News, Groups, Finance, Toolbar, Android sat nav, Chrome's open tab sync between your devices via Google Cloud, etc.).
That's not to say they are good or bad, or that they are or are not tracked. Just that it's way too late to "avoid Google" just by switching away from GMail and blocking Google Analytics.
Yes, I know Google likely didn't cooperate in this, but they built a giant tracking engine, so it's not surprising to see it repurposed.
I'm sure they have plausible deniability.
What kind of "preferences" change in that way each time the user browses away from the page, and how does that help "user experience"?
http://blogs.wsj.com/digits/2012/02/28/the-google-cookie-tha...
Disable 3rd party cookies. It solves a lot of these types of tracking issues.
Of course, I'm sure they have some other way to pwn me, but it's nice to know that I was doing something right.
Also, I'm on Iceweasel/ Firefox instead of Chrome. It's probably nothing to worry about, but you can never be too careful these days.
This news makes me happy to see there's a point to me having Google Analytics blocked the last two years. I've noticed a new thing, Google tag manager, lately. Any point in whitelisting this? Anyone know what it does?
As to your question: NoScript does a different thing -- it concentrates on limiting known security issues by disabling JavaScript. Tracking is accomplished in a variety of ways, and only some of them are JavaScript-based. Ghostery looks for all of these and lets users know who is tracking them on any given web page.
From a business perspective, why are Google and Facebook getting involved in this and calling for the government to not track users? Won't that just bring more attention to their two business models of... wait for it... tracking users and selling their information?
Previously, when the customers didn't care, they did nothing to involve themselves with this, and almost certainly aided the government.
It's purely business. Google and Facebook don't have morals, they have a bottom line. You can understand their actions by following the money.
Browser string, viewed content, frequency and magnitude of access, user authentication cookies, and ad-tracking cookies all would be tremendously helpful for this purpose.
Also, I'm betting they can easily tell when specific computers on a network are powered on or not based on fixed-interval network traffic from anything that polls regularly, such as anti-virus, news readers, mail clients and background updater services.
All of the above could aid in painting a more complete per-user picture behind the NAT, without actually having to compromise the local network or individual computers in question.
http://betanews.com/2013/12/09/tech-giants-surveillance-refo...
As long as these companies build the best tracking engines the world has ever seen, that can identify anyone and everything they're doing, it's just a matter of time before governments get their hands on that data, legally or illegally. It's just too tempting to pass.
If I were Google I'd start thinking long and hard about how to solve this problem, and try to make money by actually being on the user's side when it comes to privacy, not against them. Google will ultimately fail if their goals aren't aligned with those of the users anymore.
Disable Google tracking and log the user off from the Google search engine. It keeps you logged in to Gmail, also removes ads, and removes cookies and session/localStorage data. On first run, you need to refresh the Google page to be logged off.
-- Also remove Google anal-itics Cookie :)
https://addons.mozilla.org/pl/firefox/addon/googleantyspam/?...
You can get your open-source and locally running web analytics here: https://prism-break.org/
Like OWS protesters, for example.