This particular one is just the latest. But the really big one (IMHO) is when they simply started to ignore the EFF[0] when they were asked about the copyright status of Copilot. If the court decides against the EFF's position, that will have a big effect on the legality and enforcement of most OSS licenses (though I'm an armchair lawyer, not even in the US). Fun times ahead.
[0]: If I remember correctly, it was the EFF who mentioned that MS stopped responding to them. I have found the lawsuit, but it was not filed by the EFF. Google is more useless by the day.
No, I haven't. Notice that MS now loves Linux... provided you run it on Azure or as a component of Windows (WSL). They adopted Chrome's rendering engine and then abused their desktop OS market share to shove the result down people's throats. They don't have the leverage they once enjoyed, but the approach hasn't changed, at least not in general.
On one hand it is quite reasonable to operate like that; on the other, it is really unethical and bad for society as a whole.
In the worst-case scenario, if you lose a game stacked in your favour several times in a row, you pay a pittance, or performatively correct a now-obsolete injustice.
VS Code telemetry will remain opt-out because it yields very valuable information. Microsoft is not a democracy, and the outcry here is less than a rounding error, a footnote in some internal director's morning agenda.
Obtaining power at any cost requires the internal director to pretend he doesn't know what he's doing.
The vast majority of social capital is made by lying to people, pretending to not know you've done it and dropping relationships with anybody who is not pulling in your direction.
Silence is vastly underrated, I say ironically, so I shouldn't be typing this out.
> Please give an answer within the next week until the 16th of June.
I wouldn't respond to them either, out of spite.
Also remember he's not talking to a human, but to a soulless corporation. He was as cordial as could be given the circumstances.
And finally, remember that it doesn't matter if a product Microsoft develops to increase their control over developers (via vendor lock-in, mindshare, and forced telemetry) happens to result in a decent free text editor for the user. No one owes them gratitude. This isn't charity.
P.S. Did you know VS Code lets extensions not respect the user's "no telemetry" choice? It's been an open ticket for about 4 years now, one that MS has no intention of ever fixing, even though all it would take is a simple VS Code extension store EULA change.
Last time I had to use that sort of language was with a deranged ISP who had failed to deliver an internet connection, then decided to chase a debt for unpaid bills for this non-existent connection two years later.
But you're right, it could be any one of a number of them! I had problems with quite a few over the years.
Among the amusing misfeatures of Bulldog Broadband was their cancellation process, which required confirming by sending an email to "cancellationconfirmation@..."
Said cancellation confirmation address had not been set up and would just bounce.
But if they respond at all using other channels, you probably still have enough.
[1] https://commission.europa.eu/law/law-topic/data-protection/r...
Last year we had an argument about this regarding LibreOffice, where an option to collect some telemetry was proposed as a nagging opt-in. Opponents argued against it because some fraction of our users would press Accept just to get through the installation, or without understanding what they were accepting; plus we just didn't want this kind of mechanism in a respectable piece of software. For now the idea seems dead in the water.
Get a global firewall for outgoing connections.
At best, they raise it to their internal legal contact. The in-house lawyer rapidly advises them not to respond in any written or recorded medium. Issue goes nowhere.
At worst, they realize that this is a hairball with "vaguely legal stuff" and decide to review some other issue instead for a more productive and less stressful day. Issue goes nowhere.
And that method is not a github comment.
The commenter might have followed the correct route to complain too, but could then at least have said that "I have contacted microsoft at [..] as outlined in the Personal Data Protection Policy and expect a response within [..]"
That said, consent is not the only grounds on which you can process PII. Contract, legal obligation, vital interests, public task, or legitimate interests are also valid grounds. Of these, legitimate interests is the most applicable in this situation.
Yes, it's PII, which is of course why no one who does telemetry in a GDPR-compliant way would store the IP address. The fact that it's "sent" (you can't send anything at all over HTTP without it) isn't relevant. Only what's stored, for what reason, and for how long.
> If you add anything to link two subsequent telemetry reports together, that thing is PII (e.g. a hash or a uuid)
Again, no. PII is only information about physical people. Unless the data becomes sufficient to identify a person (in itself or together with other data), the data is not PII. Having a browser history associated with a random GUID might be PII (because the browser history might pinpoint the user, not the GUID!). But having a random GUID associated with, say, "has run VS Code 12 times this year" is not.
No, telemetry is not something MS needs to fulfil the primary purpose of VS Code. The best example is that the OSS version is there, without any telemetry enabled by default, still doing by and large the same job.
The OSS version obviously benefits from the telemetry (to the extent telemetry is useful) because it's downstream of the version developed based on the telemetry.
Haha sorry I couldn't continue past that! Neeeiiigggh!
> MacAddressHash - Used to identify a user of VS Code. This is hashed once on the client side and then hashed again on the pipeline side to make it impossible to identify a given user. On VS Code for the Web, a UUID is generated for this case.
A hash of a hash is no less identifying than a single hash: it still uniquely identifies a machine, tying telemetry events to a specific user's machine. Microsoft's own telemetry description generator calls the field "EndUserPseudonymizedInformation". Pseudonymisation is inherently not anonymisation.
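A tiny Python sketch (my own illustration, not Microsoft's pipeline) of why re-hashing doesn't help: any deterministic function of a stable hardware ID is itself a stable ID, so events from the same machine can still be joined together.

```python
import hashlib

def double_hash(value: str) -> str:
    """Hash once "client side", then again "pipeline side"."""
    first = hashlib.sha256(value.encode()).hexdigest()
    return hashlib.sha256(first.encode()).hexdigest()

# The same MAC address always maps to the same final ID, so the
# result is a stable pseudonymous identifier, not anonymous data.
mac = "00:1A:2B:3C:4D:5E"  # made-up example address
assert double_hash(mac) == double_hash(mac)
```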
This bullshit is why I keep my PiHole on for my dev environment.
It's important though, if you e.g. have multiple products, to use a _different_ pseudonymization (hash salt or whatever) for each; otherwise you run the risk of linking too much data on a user, de-pseudonymizing them in the worst case even though no individual app does. Having a user's behaviour across multiple applications could pose such a risk in extreme cases.
Edit: I think it's important to separate plain "hashing" from salted hashing. A properly hashed identifier uses a salt that is generated on the client, so that it can't be used to identify the user. Basically: the first time the app runs, you generate a random salt which is only stored on the client and NEVER sent in telemetry. Anything you would like to transmit over the wire that would risk identifying the user (e.g. a computer name, a MAC address) you hash with this local salt. This way no one can go to the database on the server side and try to match any data, e.g. check whether the hash abc123 matches the computer name jimbob because hash("jimbob") = abc123. Just sending hash(MacAddress) without a local random salt would NOT be properly pseudonymous, because an attacker on the server side could ask and answer the question "Does this come from this particular MAC address?".
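A minimal sketch of that scheme (the salt file path is hypothetical, and the salt never leaves the client):

```python
import hashlib
import os
import secrets

SALT_FILE = "telemetry_salt.bin"  # hypothetical local-only path

def get_local_salt() -> bytes:
    """Generate a random salt on first run; store it only on the client."""
    if os.path.exists(SALT_FILE):
        with open(SALT_FILE, "rb") as f:
            return f.read()
    salt = secrets.token_bytes(32)
    with open(SALT_FILE, "wb") as f:
        f.write(salt)
    return salt

def pseudonymize(identifier: str) -> str:
    """Hash an identifier (e.g. a computer name) with the local salt.

    A server-side observer cannot test guesses like hash("jimbob"),
    because they never see the salt."""
    salt = get_local_salt()
    return hashlib.sha256(salt + identifier.encode()).hexdigest()
```

Because the salt is random per install, even two users with the same computer name produce different pseudonyms, and nothing server-side can be matched back by brute force against known names.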
I think the massive amounts of behaviour analysis Microsoft does should be considered PII. They know when you turn on Visual Studio in the morning, and when you leave. They know when you go to lunch and don't click any buttons for a while, and they can see the colleagues with you in that boring meeting also not clicking any buttons at the same time. This type of behaviour analysis over time can associate you and the people you interact with, even if it's not directly tied to a reversible hardware ID.
This is why pseudonymisation isn't anonymisation, and why pseudonymisation isn't sufficient to comply with laws like the GDPR.
If the behaviour analysis was done without identifiers at all, you could say they're just counting button clicks, but they intentionally associate this data with your stable personal identifier for analysis over time.
MAC addresses aren't that big of a collision space either; any consumer GPU can generate a list of all hardware MAC addresses in use in a reasonable amount of time. MAC addresses may theoretically be 2^48 in size, but most of that space hasn't been assigned to vendors yet. It takes about 12 minutes to reverse any given MAC address on a single rented cloud GPU. The double hashing should take about twice that time.
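A rough illustration of why the search space is so small (a naive CPU version, using a made-up vendor OUI): each assigned vendor prefix leaves only a 24-bit device portion, so exhaustively hashing candidates under known OUIs is cheap.

```python
import hashlib

def hash_mac(mac: str) -> str:
    return hashlib.sha256(mac.encode()).hexdigest()

def reverse_mac(target: str, oui: str, search_limit: int = 1 << 24):
    """Brute-force the 24-bit device portion under a known vendor OUI.

    A full OUI is only 2^24 candidates; a GPU chews through all
    assigned OUIs quickly, and even this loop is feasible per OUI."""
    for n in range(search_limit):
        mac = f"{oui}:{(n >> 16) & 0xFF:02X}:{(n >> 8) & 0xFF:02X}:{n & 0xFF:02X}"
        if hash_mac(mac) == target:
            return mac
    return None
```

For example, `reverse_mac(hash_mac("00:1A:2B:00:00:2A"), "00:1A:2B")` recovers the original address after a few dozen hashes.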
The weird thing is that Microsoft intentionally chose to use a MAC address rather than a UUID like they use in their web version. If this were just a unique user token, they wouldn't need to use any hardware identifiers at all.
You're mixing up the terms pseudonymization and anonymization, though. If something is provably not PII, it is considered anonymous. Pseudonymization specifically means keeping the data as PII, but reducing the risk of misuse by making identification hard.
In practical terms, pseudonymous data is data that someone like a data scientist will only be able to link to a person by making a deliberate effort to do so, which will almost certainly mean that she KNOWS she is breaking some law. It may also mean that the link between the person and the pseudonym is stored in a locked-down database to which most data scientists (or others who may have an interest in doing the linking) do not even have access.
The GDPR does promote the use of pseudonymization as a layer of protection, and if a business does keep some PII around, properly categorizes its data as such (in compliance with Article 30 of the GDPR, with a defined "Legal Ground" for processing activities) AND properly protects the data through both "Security by Design" and "Privacy by Design" (of which pseudonymization is an important element), its legal exposure can be either completely negated or at least radically reduced if the "Legal Ground" is challenged.
Overall, though, fully understanding the GDPR is terribly difficult, as it requires significant understanding of Law (international AND local within each country covered by the GDPR), Computer Science (development AND IT security), AND Data Science.
I rarely meet people with enough understanding of all 3 to assess practices that are in the gray zone.
Lawyers (and most DPOs) tend to have little understanding of the IT or Data Science aspects, but tend to be good at stretching a "Legal Ground" to whatever is needed for the business to continue to be profitable.
Data Scientists tend to know how to de-pseudonymize data, and may even be taught "Privacy by Design" (this usually has to be forced on them, though, as it makes their job harder). Most data scientists struggle with IT security aspects, though, and would in many cases happily download all data to their laptops if they could.
Developers/engineers may understand concepts such as hashing, and even know the difference between hashed and encrypted data. However, as they live in a boolean world of True vs False, using judgement to evaluate the risk impact of some practice for data subjects tends to be alien to them. In a black and white world, this group tends to think that every bad practice is equally bad, instead of going for the "lesser wrong" or "good enough". Especially if the measures needed to be "good enough" makes the coding harder or the system slower.
Finally, IT security (the experts, not the drones) MAY have a better understanding of degrees of risk than developers, but tend to know/care less about the actual data than any other group.
And each group tends to hold the other groups to a higher standard than its own. The lawyers tend to assume that all aspects of development and infrastructure are properly hardened. Data scientists tend to interpret the "Legal Ground" to cover whatever they want to use the data for. Developers tend to think that the infra that runs their systems is fully secured by shell (perimeter) protection, and may even store "secrets" in more or less open git repos (and even if they delete them later, they don't clean up the git history or create new secrets). And networking people often do not even care about anything at the application layer or above in the networking stack.
So in practice, any large corporation will have a huge number of vulnerabilities. The only way any sensitive asset (from a privacy, intellectual property or operational stability perspective) can be considered properly protected is to have multiple layers of protection, all or most of which must fail for major incidents to happen.
If data from multiple users is aggregated, that is, I think, more what people have in mind when they say "anonymous".
Even if you have groups of size greater than k, though, information elements may be non-anonymous if there is not enough diversity within the group. For instance, if every 49-year-old male in a given postal code and a given occupation has a certain religion, then religion is non-anonymous for that group, according to l-diversity [2].
This can be narrowed down even more by t-closeness [3].
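As a toy illustration of the (distinct) l-diversity idea from [2] (my own sketch, not from the paper): every group sharing the same quasi-identifiers must contain at least l distinct sensitive values.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_keys, sensitive_key, l=2):
    """Check distinct l-diversity: each group that shares the same
    quasi-identifier values must have >= l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        group = tuple(r[k] for k in quasi_keys)
        groups[group].add(r[sensitive_key])
    return all(len(values) >= l for values in groups.values())

# Every 49-year-old male in postcode 1234 shares one religion,
# so religion leaks for that group: the data is not 2-diverse.
records = [
    {"age": 49, "sex": "M", "postcode": "1234", "religion": "X"},
    {"age": 49, "sex": "M", "postcode": "1234", "religion": "X"},
    {"age": 30, "sex": "F", "postcode": "5678", "religion": "X"},
    {"age": 30, "sex": "F", "postcode": "5678", "religion": "Y"},
]
print(is_l_diverse(records, ("age", "sex", "postcode"), "religion"))  # False
```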
[1] https://en.wikipedia.org/wiki/K-anonymity
[2] https://en.wikipedia.org/wiki/L-diversity
[3] https://en.wikipedia.org/wiki/T-closeness
And before the army of those who don't understand the GDPR comes up with "but then the whole internet cannot work": the crucial distinction lies in the answer to the question "can this tool fulfill its purpose without this connection?" If no, then the connection is essential to its functioning and does not require consent; if the tool can fulfill its purpose without the connection, it's optional and does require consent.
The GDPR makes a distinction between connections that are required to fulfill the purpose of the tool and connections that are not essential. So VS Code connecting to a Microsoft server to, say, download an extension update is allowed and does not require consent, because without that connection VS Code cannot fulfill its purpose of providing functionality.
Telemetry is not functionality, and VS Code can fulfill its purpose without this connection, so that makes it subject to the user consent requirement.
I do not recollect seeing any opt-in Privacy prompt enabling this feature. Surely an OS can function without the internet so it's not "essential to its functioning".
Same with Firefox's captive portal check [1] that helps determine if a Wifi network requires a web-based sign-in or acceptance of terms of use.
This is misinformed. There is nothing in the GDPR that relates to "exposing" or "transmitting" anything (other than transmitting onward from a processor to a third party). The GDPR relates to how data is stored or processed. A program can make any number of HTTP requests, for any reason, no matter how unnecessary, so long as the PII (the IP, or similar) isn't stored or otherwise processed/transmitted to a third party in a way that the GDPR concerns. The download web server's logs are such a storage (which is why these days you clear those daily, or never log IPs in them at all).
> Telemetry is not functionality, and VS Code can fulfill its purpose without this connection, so that makes it subject to the user consent requirement.
No. It's required because the telemetry data is stored whereas the IP of the update request is not. Had microsoft wanted to store every IP of everyone downloading an update, then that database of IP's/downloads would of course have been subject to the GDPR too. The data isn't less sensitive just because it was from a necessary function. Microsoft's responsibility for that data is exactly the same.
But the easiest way of doing telemetry properly, and not worrying about the GDPR, is to not store anything that is PII at all. And it's pretty easy to do. Nothing is "truly anonymous"; telemetry is usually pseudonymous. But properly pseudonymous telemetry is normally not a privacy concern in any way. The true gripes about telemetry (there are a few valid ones) aren't about that; they are:
- People getting a worse experience e.g. a slower product
- People not trusting the companies to adhere to the GDPR with the data transmitted, e.g. you might not trust the server to clear IPs from the transmission (basically the only piece of PII that can't be cleared on the client side, because otherwise the packet never arrives). But if you don't trust the company to adhere to the GDPR, then why would you trust their opt-out to do anything? Running any kind of software basically means trust to some extent.
- People feeling cheated because of automatic or hidden opt-in
- People on paid internet connections spending money to send the telemetry.
The GDPR deals with "processing" and this is the definition of processing:
" ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction; "
Note the "transmission, dissemination or otherwise making available".
Microsoft trawls their[1] endpoints mercilessly for every bit of telemetry that they possibly can, and they go out of their way to prevent customers from disabling this.
Windows 10 or 11 with Office requires something like 200+ individual forms of Microsoft telemetry to be disabled!
Notably:
- They keep changing the name of the environment variables[2] that disable telemetry. For unspecified "reasons".
- They've been caught using "typosquatting" domains like microsft.com for telemetry, because security-conscious admins block microsoft.com wholesale.
- Telemetry is implemented by each product group, which means each individual team has to learn the same lessons over and over, such as: GDPR compliance, asynchronous collection, size limiting, do not retry in a tight loop forever on network failure, etc...
- Customers often experience dramatic speedups by disabling telemetry, which ought not be possible, but that's the reality. Turning off telemetry was "the" trick to making PowerShell Core fast in VS Code, because it literally sent telemetry (synchronously!) from all of: Dotnet Core, PowerShell, the Az/AAD modules, and Visual Studio Code! Opening a new tab would take seconds while this was collected, zipped, and sent. Windows Terminal does the same thing, by the way, so opening a shell can result in like half a dozen network requests to god-knows-where.
[1] You thought, wait... that it's your computer!? It's Microsoft's ad-platform now.
[2] Notice the plural? It's one company! Why can't there be a single globally-obeyed policy setting for this? Oh... oh... because they don't want you to have this setting. That's right... I forgot.
Windows: https://learn.microsoft.com/en-us/windows/privacy/configure-...
PowerShell: https://learn.microsoft.com/en-us/powershell/module/microsof...
DotNet Core: https://learn.microsoft.com/en-us/dotnet/core/tools/telemetr...
Windows Terminal: https://github.com/microsoft/terminal/issues/5331
Az module: https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure...
Etc...
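For contrast, here is a hedged sketch of what the non-blocking collection mentioned above could look like: a background worker with a bounded queue that drops events instead of blocking the caller or retrying in a tight loop. The class name and transport hook are hypothetical, not any Microsoft API.

```python
import queue
import threading

class TelemetrySender:
    """Fire-and-forget telemetry: never block the caller, never retry
    forever; drop events when the queue is full or the network fails."""

    def __init__(self, transport, max_queued=100):
        self._q = queue.Queue(maxsize=max_queued)
        self._transport = transport  # e.g. a function doing an HTTP POST
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, event):
        try:
            self._q.put_nowait(event)  # drop rather than block the UI
        except queue.Full:
            pass

    def _run(self):
        while True:
            event = self._q.get()
            if event is None:  # shutdown sentinel
                return
            try:
                self._transport(event)
            except Exception:
                pass  # give up on failure; no tight retry loop

    def close(self):
        self._q.put(None)
        self._worker.join()
```

Had the products above shipped something like this, opening a shell or a new tab would not wait on telemetry uploads at all.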
This seems interesting. Do you have any references for this? I would assume that the main use of such typo-squatting domains is a simple redirect, a la [0][1].
[0]: https://gogle.com [1]: https://gooogle.com
I agree there should be a single place, at least on Windows, to control Microsoft telemetry on a per-app basis. It should be very easy to accomplish. On other platforms, less so.
In a desktop product I work on, we faced the dilemma of opt-in vs. opt-out, and of showing the query clearly vs. hiding it in settings. We ended up with the middle ground of showing it but having the checkbox pre-checked (so uncheck to opt out). We were still worried this would leave too few opting in, but it meant over 95% did.
For command line I’d be 100% happy with a note on first use describing that telemetry is enabled and how it is disabled. Leaving it disabled by default and requiring user action to enable is not realistic in such a situation.
Of course it's impossible to actually transmit anything anywhere without including the source IP in the packet headers, a fact we are ignoring completely. But that's similar to the topic of this discussion: Microsoft does exactly this under the assumption that non-PII data can be sent (even via HTTP) without the GDPR coming into play. Otherwise they couldn't have it enabled by default. If there is a ruling that says otherwise, then everyone will need to change.
It could also be that first party servers (Microsoft app talking to Microsoft servers) is acceptable and then everyone would route telemetry to their own servers.
What? Lol. How is this the users fault?
That's just dark patterns by companies to pressure users into enrolling. It doesn't have to be like this. It could be opt-in under settings, like just about anything else.
It's all about power play.
If a user asks for the software to nag people, and the developers then make the software nag properly, then it is the fault of the user for suggesting that behaviour be implemented.
>It could be opt-in under settings, like just about anything else.
Or there could be an opt out in settings like how it already works.
> If a user asks for the software to nag people and then the developers make the software start nagging proper then it is the fault of the user for suggesting that behaviour be implemented.
People are not asking for software to nag. They're asking for the software to NOT send telemetry at all unless the user agrees to it. As it stands now, vscode sends out telemetry before the user has a chance to opt out.
What people want is for software to not be hostile to the users in that way. Failing that, at least give the option before the hostile behavior begins. But really, it's not the users' fault. It's the software maker's fault for integrating that behavior in the first place and ramming it down our throats, whether we like it or not.
But no, apparently you think Microsoft needs a constant faucet of your information to prevent crashes. Golly, I wonder how developers managed before said faucets.